Sample records for workflow management tool

  1. Kwf-Grid workflow management system for Earth science applications

    NASA Astrophysics Data System (ADS)

    Tran, V.; Hluchy, L.

    2009-04-01

    In this paper, we present a workflow management tool for Earth science applications in EGEE. The workflow management tool was originally developed within the K-wf Grid project for the GT4 middleware and has many advanced features, such as semi-automatic workflow composition, a user-friendly GUI for managing workflows, and knowledge management. In EGEE, we are porting the workflow management tool to the gLite middleware for Earth science applications. The K-wf Grid workflow management system was developed within the "Knowledge-based Workflow System for Grid Applications" project under the 6th Framework Programme. The workflow management system is intended to semi-automatically compose a workflow of Grid services; execute the composed workflow application in a Grid computing environment; monitor the performance of the Grid infrastructure and the Grid applications; analyze the resulting monitoring information; capture the knowledge contained in that information by means of intelligent agents; and, finally, reuse the joined knowledge gathered from all participating users in a collaborative way in order to efficiently construct workflows for new Grid applications. K-wf Grid workflow engines can support different types of jobs (e.g., GRAM jobs and web services) in a workflow. A new class of gLite job has been added to the system, allowing it to manage and execute gLite jobs on the EGEE infrastructure. The GUI has been adapted to the requirements of EGEE users, and a new credential management servlet has been added to the portal. Porting the K-wf Grid workflow management system to gLite would allow EGEE users to use the system and benefit from its advanced features. The system is primarily tested and evaluated with applications from ES clusters.
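
    As a rough illustration of the job-type extensibility described in this record (not the actual K-wf Grid or gLite API), the following Python sketch shows how a workflow engine might dispatch GRAM, web-service, and newly added gLite jobs through one common interface; all class and function names are hypothetical.

      # Illustrative sketch only: a generic job-type abstraction in the spirit of
      # adding a new gLite job class to a workflow engine. All names are hypothetical.
      from abc import ABC, abstractmethod


      class Job(ABC):
          """Common interface the engine uses for every job type in a workflow."""

          def __init__(self, name, payload):
              self.name = name
              self.payload = payload

          @abstractmethod
          def submit(self):
              """Hand the job to the underlying middleware and return an identifier."""


      class GramJob(Job):
          def submit(self):
              return f"gram:{self.name}"        # placeholder for a GRAM submission call


      class WebServiceJob(Job):
          def submit(self):
              return f"ws:{self.name}"          # placeholder for a web-service invocation


      class GliteJob(Job):
          """The 'new class of job' added so the engine can run on gLite/EGEE."""

          def submit(self):
              return f"glite:{self.name}"       # placeholder for a gLite WMS submission


      def run_workflow(jobs):
          # The engine treats all job types uniformly through the Job interface.
          return [job.submit() for job in jobs]


      if __name__ == "__main__":
          print(run_workflow([GramJob("a", {}), WebServiceJob("b", {}), GliteJob("c", {})]))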

  2. Flexible Workflow Software enables the Management of an Increased Volume and Heterogeneity of Sensors, and evolves with the Expansion of Complex Ocean Observatory Infrastructures.

    NASA Astrophysics Data System (ADS)

    Tomlin, M. C.; Jenkyns, R.

    2015-12-01

    Ocean Networks Canada (ONC) collects data from observatories in the northeast Pacific, Salish Sea, Arctic Ocean, Atlantic Ocean, and land-based sites in British Columbia. Data are streamed, collected autonomously, or transmitted via satellite from a variety of instruments. The Software Engineering group at ONC develops and maintains Oceans 2.0, an in-house software system that acquires and archives data from sensors, and makes data available to scientists, the public, government and non-government agencies. The Oceans 2.0 workflow tool was developed by ONC to manage a large volume of tasks and processes required for instrument installation, recovery and maintenance activities. Since 2013, the workflow tool has supported 70 expeditions and grown to include 30 different workflow processes for the increasing complexity of infrastructures at ONC. The workflow tool strives to keep pace with an increasing heterogeneity of sensors, connections and environments by supporting versioning of existing workflows, and allowing the creation of new processes and tasks. Despite challenges in training and gaining mutual support from multidisciplinary teams, the workflow tool has become invaluable in project management in an innovative setting. It provides a collective place to contribute to ONC's diverse projects and expeditions and encourages more repeatable processes, while promoting interactions between the multidisciplinary teams who manage various aspects of instrument development and the data they produce. The workflow tool inspires documentation of terminologies and procedures, and effectively links to other tools at ONC such as JIRA, Alfresco and Wiki. Motivated by growing sensor schemes, modes of collecting data, archiving, and data distribution at ONC, the workflow tool ensures that infrastructure is managed completely from instrument purchase to data distribution. It integrates all areas of expertise and helps fulfill ONC's mandate to offer quality data to users.

  3. A Tool Supporting Collaborative Data Analytics Workflow Design and Management

    NASA Astrophysics Data System (ADS)

    Zhang, J.; Bao, Q.; Lee, T. J.

    2016-12-01

    Collaborative experiment design could significantly enhance the sharing and adoption of the data analytics algorithms and models that have emerged in Earth science. Existing data-oriented workflow tools, however, are not suitable for supporting collaborative design of such workflows: among other gaps, they do not support real-time co-design; they cannot track how a workflow evolves over time based on changing designs contributed by multiple Earth scientists; and they do not capture and retrieve the collaboration knowledge behind a workflow design (the discussions that lead to a design). To address these challenges, we have designed and developed a technique supporting collaborative data-oriented workflow composition and management, as a key component toward supporting big data collaboration through the Internet. Reproducibility and scalability are two major targets demanding fundamental infrastructural support. One outcome of the project is a software tool that supports an elastic number of groups of Earth scientists in collaboratively designing and composing data analytics workflows through the Internet. Instead of reinventing the wheel, we have extended an existing workflow tool, VisTrails, into an online collaborative environment as a proof of concept.

  4. NeuroManager: a workflow analysis based simulation management engine for computational neuroscience

    PubMed Central

    Stockton, David B.; Santamaria, Fidel

    2015-01-01

    We developed NeuroManager, an object-oriented simulation management software engine for computational neuroscience. NeuroManager automates the workflow of simulation job submissions when using heterogeneous computational resources, simulators, and simulation tasks. The object-oriented approach (1) provides flexibility to adapt to a variety of neuroscience simulators, (2) simplifies the use of heterogeneous computational resources, from desktops to supercomputer clusters, and (3) improves tracking of simulator/simulation evolution. We implemented NeuroManager in MATLAB, a widely used engineering and scientific language, for its signal and image processing tools, its prevalence in electrophysiology analysis, and its increasing use in college biology education. To design and develop NeuroManager, we analyzed the workflow of simulation submission for a variety of simulators, operating systems, and computational resources, including the handling of input parameters, data, models, results, and analyses. This resulted in a simulation submission workflow of 22 stages. The software incorporates progress notification; automatic organization, labeling, and time-stamping of data and results; and integrated access to MATLAB's analysis and visualization tools. NeuroManager provides users with the tools to automate daily tasks and assists principal investigators in tracking and recreating the evolution of research projects performed by multiple people. Overall, NeuroManager provides the infrastructure needed to improve workflow, manage multiple simultaneous simulations, and maintain provenance of the potentially large amounts of data produced during the course of a research project. PMID:26528175

  5. NeuroManager: a workflow analysis based simulation management engine for computational neuroscience.

    PubMed

    Stockton, David B; Santamaria, Fidel

    2015-01-01

    We developed NeuroManager, an object-oriented simulation management software engine for computational neuroscience. NeuroManager automates the workflow of simulation job submissions when using heterogeneous computational resources, simulators, and simulation tasks. The object-oriented approach (1) provides flexibility to adapt to a variety of neuroscience simulators, (2) simplifies the use of heterogeneous computational resources, from desktops to supercomputer clusters, and (3) improves tracking of simulator/simulation evolution. We implemented NeuroManager in MATLAB, a widely used engineering and scientific language, for its signal and image processing tools, its prevalence in electrophysiology analysis, and its increasing use in college biology education. To design and develop NeuroManager, we analyzed the workflow of simulation submission for a variety of simulators, operating systems, and computational resources, including the handling of input parameters, data, models, results, and analyses. This resulted in a simulation submission workflow of 22 stages. The software incorporates progress notification; automatic organization, labeling, and time-stamping of data and results; and integrated access to MATLAB's analysis and visualization tools. NeuroManager provides users with the tools to automate daily tasks and assists principal investigators in tracking and recreating the evolution of research projects performed by multiple people. Overall, NeuroManager provides the infrastructure needed to improve workflow, manage multiple simultaneous simulations, and maintain provenance of the potentially large amounts of data produced during the course of a research project.
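
    The records above describe a staged, time-stamped submission workflow implemented in MATLAB. The sketch below illustrates the same staging-and-labeling idea in Python as a conceptual outline only; the stage names and functions are placeholders, not NeuroManager's actual 22 stages or API.

      # Conceptual sketch (in Python, while NeuroManager itself is MATLAB): a staged
      # submission pipeline that labels and time-stamps each stage. Stage names are
      # illustrative placeholders, not NeuroManager's actual stages.
      from datetime import datetime, timezone


      def _log(label, message):
          stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
          print(f"[{stamp}] {label}: {message}")


      def run_submission(stages, context):
          """Run each stage in order, passing a shared context dict between them."""
          for label, stage in stages:
              _log(label, "started")
              stage(context)
              _log(label, "finished")
          return context


      if __name__ == "__main__":
          stages = [
              ("collect-parameters", lambda ctx: ctx.setdefault("params", {"dt": 0.025})),
              ("stage-input-files",  lambda ctx: ctx.setdefault("inputs", ["model.hoc"])),
              ("submit-simulation",  lambda ctx: ctx.setdefault("job_id", "job-0001")),
              ("archive-results",    lambda ctx: ctx.setdefault("results_dir", "run_0001/")),
          ]
          print(run_submission(stages, {}))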

  6. Managing and Communicating Operational Workflow: Designing and Implementing an Electronic Outpatient Whiteboard.

    PubMed

    Steitz, Bryan D; Weinberg, Stuart T; Danciu, Ioana; Unertl, Kim M

    2016-01-01

    Healthcare team members in emergency department contexts have used electronic whiteboard solutions to help manage operational workflow for many years. Ambulatory clinic settings have highly complex operational workflow, but are still limited in electronic assistance to communicate and coordinate work activities. This article describes and discusses the design, implementation, use, and ongoing evolution of a coordination and collaboration tool supporting ambulatory clinic operational workflow at Vanderbilt University Medical Center (VUMC). The outpatient whiteboard tool was initially designed to support healthcare work related to an electronic chemotherapy order-entry application. After a highly successful initial implementation in an oncology context, high demand emerged across the organization for the outpatient whiteboard. Over the past 10 years, developers have followed an iterative user-centered design process to evolve the tool. The electronic outpatient whiteboard system supports 194 separate whiteboards and is accessed by over 2800 distinct users on a typical day. Clinics can configure their whiteboards to support unique workflow elements. Since the initial release, features such as immunization clinical decision support have been integrated into the system based on requests from end users. The success of the electronic outpatient whiteboard demonstrates the usefulness of an operational workflow tool within the ambulatory clinic setting. Operational workflow tools can play a significant role in supporting coordination, collaboration, and teamwork in ambulatory healthcare settings.

  7. wft4galaxy: a workflow testing tool for galaxy.

    PubMed

    Piras, Marco Enrico; Pireddu, Luca; Zanetti, Gianluigi

    2017-12-01

    Workflow managers for scientific analysis provide a high-level programming platform facilitating standardization, automation, collaboration, and access to sophisticated computing resources. The Galaxy workflow manager provides a prime example of this type of platform. As compositions of simpler tools, workflows effectively comprise specialized computer programs implementing often very complex analysis procedures. To date, no simple way to automatically test Galaxy workflows and ensure their correctness has appeared in the literature. With wft4galaxy we offer a tool to bring automated testing to Galaxy workflows, making it feasible to bring continuous integration to their development and ensuring that defects are detected promptly. wft4galaxy can be easily installed as a regular Python program or launched directly as a Docker container, the latter reducing installation effort to a minimum. Availability: https://github.com/phnmnl/wft4galaxy under the Academic Free License v3.0. Contact: marcoenrico.piras@crs4.it. © The Author 2017. Published by Oxford University Press.
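
    To illustrate the general idea of automated workflow testing for continuous integration (not wft4galaxy's actual test format or API), a minimal sketch might compare the outputs produced by a workflow run against expected reference files; all paths and names below are hypothetical.

      # Hedged sketch of the idea behind automated workflow testing: run a workflow on
      # fixed inputs and compare its outputs against expected files. This is NOT
      # wft4galaxy's actual API; the test layout and function names are hypothetical.
      import hashlib
      from pathlib import Path


      def file_digest(path):
          return hashlib.sha256(Path(path).read_bytes()).hexdigest()


      def check_workflow_outputs(produced, expected):
          """Return a dict mapping output name -> True/False (matches expectation)."""
          report = {}
          for name, expected_path in expected.items():
              produced_path = produced.get(name)
              report[name] = (
                  produced_path is not None
                  and file_digest(produced_path) == file_digest(expected_path)
              )
          return report


      if __name__ == "__main__":
          # In a CI job, `produced` would come from an actual workflow invocation;
          # here two identical temporary files stand in for real outputs.
          import tempfile
          tmp = tempfile.mkdtemp()
          for p in ("a.tsv", "b.tsv"):
              Path(tmp, p).write_text("gene\tcount\nTP53\t42\n")
          print(check_workflow_outputs({"summary.tsv": Path(tmp, "a.tsv")},
                                       {"summary.tsv": Path(tmp, "b.tsv")}))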

  8. Managing and Communicating Operational Workflow

    PubMed Central

    Weinberg, Stuart T.; Danciu, Ioana; Unertl, Kim M.

    2016-01-01

    Background: Healthcare team members in emergency department contexts have used electronic whiteboard solutions to help manage operational workflow for many years. Ambulatory clinic settings have highly complex operational workflow, but are still limited in electronic assistance to communicate and coordinate work activities. Objective: To describe and discuss the design, implementation, use, and ongoing evolution of a coordination and collaboration tool supporting ambulatory clinic operational workflow at Vanderbilt University Medical Center (VUMC). Methods: The outpatient whiteboard tool was initially designed to support healthcare work related to an electronic chemotherapy order-entry application. After a highly successful initial implementation in an oncology context, high demand emerged across the organization for the outpatient whiteboard. Over the past 10 years, developers have followed an iterative user-centered design process to evolve the tool. Results: The electronic outpatient whiteboard system supports 194 separate whiteboards and is accessed by over 2800 distinct users on a typical day. Clinics can configure their whiteboards to support unique workflow elements. Since the initial release, features such as immunization clinical decision support have been integrated into the system based on requests from end users. Conclusions: The success of the electronic outpatient whiteboard demonstrates the usefulness of an operational workflow tool within the ambulatory clinic setting. Operational workflow tools can play a significant role in supporting coordination, collaboration, and teamwork in ambulatory healthcare settings. PMID:27081407

  9. JMS: An Open Source Workflow Management System and Web-Based Cluster Front-End for High Performance Computing.

    PubMed

    Brown, David K; Penkler, David L; Musyoka, Thommas M; Bishop, Özlem Tastan

    2015-01-01

    Complex computational pipelines are becoming a staple of modern scientific research. Often these pipelines are resource intensive and require days of computing time. In such cases, it makes sense to run them over high performance computing (HPC) clusters where they can take advantage of the aggregated resources of many powerful computers. In addition to this, researchers often want to integrate their workflows into their own web servers. In these cases, software is needed to manage the submission of jobs from the web interface to the cluster and then return the results once the job has finished executing. We have developed the Job Management System (JMS), a workflow management system and web interface for high performance computing (HPC). JMS provides users with a user-friendly web interface for creating complex workflows with multiple stages. It integrates this workflow functionality with the resource manager, a tool that is used to control and manage batch jobs on HPC clusters. As such, JMS combines workflow management functionality with cluster administration functionality. In addition, JMS provides developer tools including a code editor and the ability to version tools and scripts. JMS can be used by researchers from any field to build and run complex computational pipelines and provides functionality to include these pipelines in external interfaces. JMS is currently being used to house a number of bioinformatics pipelines at the Research Unit in Bioinformatics (RUBi) at Rhodes University. JMS is an open-source project and is freely available at https://github.com/RUBi-ZA/JMS.

  10. JMS: An Open Source Workflow Management System and Web-Based Cluster Front-End for High Performance Computing

    PubMed Central

    Brown, David K.; Penkler, David L.; Musyoka, Thommas M.; Bishop, Özlem Tastan

    2015-01-01

    Complex computational pipelines are becoming a staple of modern scientific research. Often these pipelines are resource intensive and require days of computing time. In such cases, it makes sense to run them over high performance computing (HPC) clusters where they can take advantage of the aggregated resources of many powerful computers. In addition to this, researchers often want to integrate their workflows into their own web servers. In these cases, software is needed to manage the submission of jobs from the web interface to the cluster and then return the results once the job has finished executing. We have developed the Job Management System (JMS), a workflow management system and web interface for high performance computing (HPC). JMS provides users with a user-friendly web interface for creating complex workflows with multiple stages. It integrates this workflow functionality with the resource manager, a tool that is used to control and manage batch jobs on HPC clusters. As such, JMS combines workflow management functionality with cluster administration functionality. In addition, JMS provides developer tools including a code editor and the ability to version tools and scripts. JMS can be used by researchers from any field to build and run complex computational pipelines and provides functionality to include these pipelines in external interfaces. JMS is currently being used to house a number of bioinformatics pipelines at the Research Unit in Bioinformatics (RUBi) at Rhodes University. JMS is an open-source project and is freely available at https://github.com/RUBi-ZA/JMS. PMID:26280450
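
    A minimal sketch of the kind of glue layer a JMS-like system needs between a web request and an HPC resource manager is shown below: generate a batch script and submit it to the scheduler. The PBS/Torque directives and the use of qsub are illustrative assumptions, not JMS's actual implementation.

      # Illustrative sketch only: write a batch script and submit it to a PBS/Torque
      # resource manager. Script contents and defaults are assumptions, not JMS code.
      import subprocess
      import tempfile
      from pathlib import Path


      def submit_batch_job(command, walltime="01:00:00", queue="batch"):
          script = "\n".join([
              "#!/bin/bash",
              f"#PBS -q {queue}",
              f"#PBS -l walltime={walltime}",
              command,
              "",
          ])
          script_path = Path(tempfile.mkdtemp()) / "job.sh"
          script_path.write_text(script)
          # On PBS/Torque systems, `qsub` prints the new job identifier on stdout.
          result = subprocess.run(["qsub", str(script_path)],
                                  capture_output=True, text=True, check=True)
          return result.stdout.strip()


      if __name__ == "__main__":
          print(submit_batch_job("echo 'stage 1: sequence alignment'"))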

  11. Tools for automated acoustic monitoring within the R package monitoR

    USGS Publications Warehouse

    Katz, Jonathan; Hafner, Sasha D.; Donovan, Therese

    2016-01-01

    The R package monitoR contains tools for managing an acoustic-monitoring program, including survey metadata, template creation and manipulation, automated detection, and results management. These tools are scalable for use with small projects as well as larger long-term projects and those with expansive spatial extents. Here, we describe the typical workflow when using the tools in monitoR. The typical workflow follows a generic sequence of functions, with the option of either binary point matching or spectrogram cross-correlation detectors.
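
    As a conceptual illustration of spectrogram cross-correlation detection (monitoR itself is an R package; this sketch is in Python and does not reproduce its functions), a template spectrogram can be slid along a survey spectrogram and detections flagged where the correlation exceeds a cutoff; all parameter values are illustrative.

      # Conceptual sketch of spectrogram cross-correlation detection; thresholds,
      # window sizes, and the scoring rule are illustrative, not monitoR's algorithm.
      import numpy as np
      from scipy.signal import spectrogram


      def detect(template, survey, fs, cutoff=0.6, nperseg=256):
          _, _, s_tpl = spectrogram(template, fs=fs, nperseg=nperseg)
          _, t, s_sur = spectrogram(survey, fs=fs, nperseg=nperseg)
          width = s_tpl.shape[1]
          tpl = (s_tpl - s_tpl.mean()) / s_tpl.std()
          scores = []
          for i in range(s_sur.shape[1] - width + 1):
              win = s_sur[:, i:i + width]
              win = (win - win.mean()) / (win.std() + 1e-12)
              scores.append(float((tpl * win).mean()))   # correlation-like score
          scores = np.array(scores)
          hits = np.flatnonzero(scores >= cutoff)
          return t[hits], scores


      if __name__ == "__main__":
          fs = 8000
          tone = np.sin(2 * np.pi * 1000 * np.arange(0, 0.5, 1 / fs))
          survey = np.concatenate([np.random.randn(fs), tone, np.random.randn(fs)])
          times, scores = detect(tone, survey, fs)
          print(times, scores.max())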

  12. Biowep: a workflow enactment portal for bioinformatics applications.

    PubMed

    Romano, Paolo; Bartocci, Ezio; Bertolini, Guglielmo; De Paoli, Flavio; Marra, Domenico; Mauri, Giancarlo; Merelli, Emanuela; Milanesi, Luciano

    2007-03-08

    The huge amount of biological information, its distribution over the Internet, and the heterogeneity of the available software tools make the adoption of new data integration and analysis network tools a necessity in bioinformatics. ICT standards and tools, such as Web Services and Workflow Management Systems (WMS), can support the creation and deployment of such systems. Many Web Services are already available and some WMS have been proposed. They assume that researchers know which bioinformatics resources can be reached through a programmatic interface and that they are skilled in programming and building workflows. Therefore, they are not viable for the majority of unskilled researchers. A portal enabling these researchers to profit from the new technologies is still missing. We designed biowep, a web-based client application that allows for the selection and execution of a set of predefined workflows. The system is available on-line. The biowep architecture includes a Workflow Manager, a User Interface, and a Workflow Executor. The task of the Workflow Manager is the creation and annotation of workflows. These can be created by using either the Taverna Workbench or BioWMS. Enactment of workflows is carried out by FreeFluo for Taverna workflows and by BioAgent/Hermes, a mobile agent-based middleware, for BioWMS ones. The main processing steps of workflows are annotated on the basis of their input and output, elaboration type, and application domain, using a classification of bioinformatics data and tasks. The interface supports user authentication and profiling. Workflows can be selected on the basis of users' profiles and can be searched through their annotations. Results can be saved. We developed a web system that supports the selection and execution of predefined workflows, thus simplifying access for all researchers. The implementation of Web Services allowing specialized software to interact with an exhaustive set of biomedical databases and analysis software, and the creation of effective workflows, can significantly improve the automation of in-silico analysis. Biowep is available for interested researchers as a reference portal. They are invited to submit their workflows to the workflow repository. Biowep is being further developed in the sphere of the Laboratory of Interdisciplinary Technologies in Bioinformatics - LITBIO.

  13. Biowep: a workflow enactment portal for bioinformatics applications

    PubMed Central

    Romano, Paolo; Bartocci, Ezio; Bertolini, Guglielmo; De Paoli, Flavio; Marra, Domenico; Mauri, Giancarlo; Merelli, Emanuela; Milanesi, Luciano

    2007-01-01

    Background: The huge amount of biological information, its distribution over the Internet, and the heterogeneity of the available software tools make the adoption of new data integration and analysis network tools a necessity in bioinformatics. ICT standards and tools, such as Web Services and Workflow Management Systems (WMS), can support the creation and deployment of such systems. Many Web Services are already available and some WMS have been proposed. They assume that researchers know which bioinformatics resources can be reached through a programmatic interface and that they are skilled in programming and building workflows. Therefore, they are not viable for the majority of unskilled researchers. A portal enabling these researchers to profit from the new technologies is still missing. Results: We designed biowep, a web-based client application that allows for the selection and execution of a set of predefined workflows. The system is available on-line. The biowep architecture includes a Workflow Manager, a User Interface, and a Workflow Executor. The task of the Workflow Manager is the creation and annotation of workflows. These can be created by using either the Taverna Workbench or BioWMS. Enactment of workflows is carried out by FreeFluo for Taverna workflows and by BioAgent/Hermes, a mobile agent-based middleware, for BioWMS ones. The main processing steps of workflows are annotated on the basis of their input and output, elaboration type, and application domain, using a classification of bioinformatics data and tasks. The interface supports user authentication and profiling. Workflows can be selected on the basis of users' profiles and can be searched through their annotations. Results can be saved. Conclusion: We developed a web system that supports the selection and execution of predefined workflows, thus simplifying access for all researchers. The implementation of Web Services allowing specialized software to interact with an exhaustive set of biomedical databases and analysis software, and the creation of effective workflows, can significantly improve the automation of in-silico analysis. Biowep is available for interested researchers as a reference portal. They are invited to submit their workflows to the workflow repository. Biowep is being further developed in the sphere of the Laboratory of Interdisciplinary Technologies in Bioinformatics – LITBIO. PMID:17430563

  14. FluxCTTX: A LIMS-based tool for management and analysis of cytotoxicity assays data

    PubMed Central

    2015-01-01

    Background: Cytotoxicity assays have been used by researchers to screen for cytotoxicity in compound libraries. Researchers can either look for cytotoxic compounds or screen "hits" from initial high-throughput drug screens for unwanted cytotoxic effects before investing in their development as pharmaceuticals. These assays may be used as an alternative to animal experimentation and are becoming increasingly important in modern laboratories. However, the execution of these assays at large scale and in different laboratories requires, among other things, the management of the protocols, reagents, and cell lines used, as well as of the data produced, which can be a challenge. The management of all this information is greatly improved by the use of computational tools that save time and guarantee quality. However, a tool that performs this task and is designed specifically for cytotoxicity assays is not yet available. Results: In this work, we have used a workflow-based LIMS (the Flux system) and the Together Workflow Editor as a framework to develop FluxCTTX, a tool for the management of data from cytotoxicity assays performed at different laboratories. The main result is a workflow, developed and uploaded in Flux, that represents all stages of the assay. This workflow models the activities of cytotoxicity assays performed as described in the OECD 129 Guidance Document. Conclusions: FluxCTTX presents a solution for the management of the data produced by cytotoxicity assays performed in interlaboratory comparisons. Its adoption will contribute to guaranteeing the quality of activities in the process of cytotoxicity testing and enforce the use of Good Laboratory Practices (GLP). Furthermore, the workflow developed is complete and can be adapted to other contexts and different tests for the management of other types of data. PMID:26696462

  15. Agile parallel bioinformatics workflow management using Pwrake.

    PubMed

    Mishima, Hiroyuki; Sasaki, Kensaku; Tanaka, Masahiro; Tatebe, Osamu; Yoshiura, Koh-Ichiro

    2011-09-08

    In bioinformatics projects, scientific workflow systems are widely used to manage computational procedures. Full-featured workflow systems have been proposed to fulfil the demand for workflow management. However, such systems tend to be over-weighted for actual bioinformatics practices. We realize that quick deployment of cutting-edge software implementing advanced algorithms and data formats, and continuous adaptation to changes in computational resources and the environment, are often prioritized in scientific workflow management. These features have a greater affinity with the agile software development method, with its iterative development phases after trial and error. Here, we show the application of the scientific workflow system Pwrake to bioinformatics workflows. Pwrake is a parallel workflow extension of Ruby's standard build tool Rake, the flexibility of which has been demonstrated in the astronomy domain. Therefore, we hypothesize that Pwrake also has advantages in actual bioinformatics workflows. We implemented Pwrake workflows to process next-generation sequencing data using the Genome Analysis Toolkit (GATK) and Dindel. The GATK and Dindel workflows are typical examples of sequential and parallel workflows, respectively. We found that, in practice, actual scientific workflow development iterates over two phases: the workflow definition phase and the parameter adjustment phase. We introduced separate workflow definitions to help focus on each of the two developmental phases, as well as helper methods to simplify the descriptions. This approach increased iterative development efficiency. Moreover, we implemented combined workflows to demonstrate the modularity of the GATK and Dindel workflows. Pwrake enables agile management of scientific workflows in the bioinformatics domain. The internal domain-specific language design built on Ruby gives rakefiles the flexibility needed for writing scientific workflows. Furthermore, the readability and maintainability of rakefiles may facilitate sharing workflows among the scientific community. Workflows for GATK and Dindel are available at http://github.com/misshie/Workflows.

  16. Agile parallel bioinformatics workflow management using Pwrake

    PubMed Central

    2011-01-01

    Background: In bioinformatics projects, scientific workflow systems are widely used to manage computational procedures. Full-featured workflow systems have been proposed to fulfil the demand for workflow management. However, such systems tend to be over-weighted for actual bioinformatics practices. We realize that quick deployment of cutting-edge software implementing advanced algorithms and data formats, and continuous adaptation to changes in computational resources and the environment, are often prioritized in scientific workflow management. These features have a greater affinity with the agile software development method, with its iterative development phases after trial and error. Here, we show the application of the scientific workflow system Pwrake to bioinformatics workflows. Pwrake is a parallel workflow extension of Ruby's standard build tool Rake, the flexibility of which has been demonstrated in the astronomy domain. Therefore, we hypothesize that Pwrake also has advantages in actual bioinformatics workflows. Findings: We implemented Pwrake workflows to process next-generation sequencing data using the Genome Analysis Toolkit (GATK) and Dindel. The GATK and Dindel workflows are typical examples of sequential and parallel workflows, respectively. We found that, in practice, actual scientific workflow development iterates over two phases: the workflow definition phase and the parameter adjustment phase. We introduced separate workflow definitions to help focus on each of the two developmental phases, as well as helper methods to simplify the descriptions. This approach increased iterative development efficiency. Moreover, we implemented combined workflows to demonstrate the modularity of the GATK and Dindel workflows. Conclusions: Pwrake enables agile management of scientific workflows in the bioinformatics domain. The internal domain-specific language design built on Ruby gives rakefiles the flexibility needed for writing scientific workflows. Furthermore, the readability and maintainability of rakefiles may facilitate sharing workflows among the scientific community. Workflows for GATK and Dindel are available at http://github.com/misshie/Workflows. PMID:21899774

  17. Scientific Workflow Management in Proteomics

    PubMed Central

    de Bruin, Jeroen S.; Deelder, André M.; Palmblad, Magnus

    2012-01-01

    Data processing in proteomics can be a challenging endeavor, requiring extensive knowledge of many different software packages, all with different algorithms, data format requirements, and user interfaces. In this article we describe the integration of a number of existing programs and tools in Taverna Workbench, a scientific workflow manager currently being developed in the bioinformatics community. We demonstrate how a workflow manager provides a single, visually clear and intuitive interface to complex data analysis tasks in proteomics, from raw mass spectrometry data to protein identifications and beyond. PMID:22411703

  18. Workflow technology: the new frontier. How to overcome the barriers and join the future.

    PubMed

    Shefter, Susan M

    2006-01-01

    Hospitals are catching up to the business world in the introduction of technology systems that support professional practice and workflow. The field of case management is highly complex and interrelates with diverse groups in diverse locations. The last few years have seen the introduction of Workflow Technology Tools, which can improve the quality and efficiency of discharge planning by the case manager. Despite the availability of these wonderful new programs, many case managers are hesitant to adopt the new technology and workflow. For a myriad of reasons, a computer-based workflow system can seem like a brick wall. This article discusses, from a practitioner's point of view, how professionals can gain confidence and skill to get around the brick wall and join the future.

  19. Cloud-based bioinformatics workflow platform for large-scale next-generation sequencing analyses

    PubMed Central

    Liu, Bo; Madduri, Ravi K; Sotomayor, Borja; Chard, Kyle; Lacinski, Lukasz; Dave, Utpal J; Li, Jianqiang; Liu, Chunchen; Foster, Ian T

    2014-01-01

    Due to the coming deluge of genome data, the need to store and process large-scale genome data, provide easy access to biomedical analysis tools, and support efficient data sharing and retrieval presents significant challenges. The variability in data volume results in variable computing and storage requirements; therefore, biomedical researchers are pursuing more reliable, dynamic, and convenient methods for conducting sequencing analyses. This paper proposes a cloud-based bioinformatics workflow platform for large-scale next-generation sequencing analyses, which enables reliable and highly scalable execution of sequencing analysis workflows in a fully automated manner. Our platform extends the existing Galaxy workflow system by adding data management capabilities for transferring large quantities of data efficiently and reliably (via Globus Transfer), domain-specific analysis tools preconfigured for immediate use by researchers (via user-specific tools integration), automatic deployment on the cloud for on-demand resource allocation and pay-as-you-go pricing (via Globus Provision), a cloud provisioning tool for auto-scaling (via the HTCondor scheduler), and support for validating the correctness of workflows (via semantic verification tools). Two bioinformatics workflow use cases as well as a performance evaluation are presented to validate the feasibility of the proposed approach. PMID:24462600

  20. Cloud-based bioinformatics workflow platform for large-scale next-generation sequencing analyses.

    PubMed

    Liu, Bo; Madduri, Ravi K; Sotomayor, Borja; Chard, Kyle; Lacinski, Lukasz; Dave, Utpal J; Li, Jianqiang; Liu, Chunchen; Foster, Ian T

    2014-06-01

    Due to the coming deluge of genome data, the need to store and process large-scale genome data, provide easy access to biomedical analysis tools, and support efficient data sharing and retrieval presents significant challenges. The variability in data volume results in variable computing and storage requirements; therefore, biomedical researchers are pursuing more reliable, dynamic, and convenient methods for conducting sequencing analyses. This paper proposes a cloud-based bioinformatics workflow platform for large-scale next-generation sequencing analyses, which enables reliable and highly scalable execution of sequencing analysis workflows in a fully automated manner. Our platform extends the existing Galaxy workflow system by adding data management capabilities for transferring large quantities of data efficiently and reliably (via Globus Transfer), domain-specific analysis tools preconfigured for immediate use by researchers (via user-specific tools integration), automatic deployment on the cloud for on-demand resource allocation and pay-as-you-go pricing (via Globus Provision), a cloud provisioning tool for auto-scaling (via the HTCondor scheduler), and support for validating the correctness of workflows (via semantic verification tools). Two bioinformatics workflow use cases as well as a performance evaluation are presented to validate the feasibility of the proposed approach. Copyright © 2014 Elsevier Inc. All rights reserved.

  1. A software tool to analyze clinical workflows from direct observations.

    PubMed

    Schweitzer, Marco; Lasierra, Nelia; Hoerbst, Alexander

    2015-01-01

    Observational data on clinical processes need to be managed in a convenient way, so that process information is reliable, valid, and viable for further analysis. However, existing tools for recording observations fall short of supporting systematic data collection for specific workflow recordings. We present a software tool that was developed to facilitate the analysis of clinical process observations. The tool was successfully used in the project OntoHealth to build, store, and analyze observations of routine diabetes consultations.

  2. Improving diabetes population management efficiency with an informatics solution.

    PubMed

    Zai, Adrian; Grant, Richard; Andrews, Carl; Yee, Ronnie; Chueh, Henry

    2007-10-11

    Despite intensive resource use for diabetes management in the U.S., our care continues to fall short of evidence-based goals, partly due to system inefficiencies. Diabetes registries are increasingly being utilized as a critical tool for population level disease management by providing real-time data. Since the successful adoption of a diabetes registry depends on how well it integrates with disease management workflows, we optimized our current diabetes management workflow and designed our registry application around it.

  3. Lessons from implementing a combined workflow-informatics system for diabetes management.

    PubMed

    Zai, Adrian H; Grant, Richard W; Estey, Greg; Lester, William T; Andrews, Carl T; Yee, Ronnie; Mort, Elizabeth; Chueh, Henry C

    2008-01-01

    Shortcomings surrounding the care of patients with diabetes have been attributed largely to a fragmented, disorganized, and duplicative health care system that focuses more on acute conditions and complications than on managing chronic disease. To address these shortcomings, we developed a diabetes registry population management application to change the way our staff manages patients with diabetes. Use of this new application has helped us coordinate the responsibilities for intervening and monitoring patients in the registry among different users. Our experiences using this combined workflow-informatics intervention system suggest that integrating a chronic disease registry into clinical workflow for the treatment of chronic conditions creates a useful and efficient tool for managing disease.

  4. Implementation of Cyberinfrastructure and Data Management Workflow for a Large-Scale Sensor Network

    NASA Astrophysics Data System (ADS)

    Jones, A. S.; Horsburgh, J. S.

    2014-12-01

    Monitoring with in situ environmental sensors and other forms of field-based observation presents many challenges for data management, particularly for large-scale networks consisting of multiple sites, sensors, and personnel. The availability and utility of these data in addressing scientific questions relies on effective cyberinfrastructure that facilitates transformation of raw sensor data into functional data products. It also depends on the ability of researchers to share and access the data in useable formats. In addition to addressing the challenges presented by the quantity of data, monitoring networks need practices to ensure high data quality, including procedures and tools for post-processing. Data quality is further enhanced if practitioners are able to track equipment, deployments, calibrations, and other events related to site maintenance and associate these details with observational data. In this presentation we will describe the overall workflow that we have developed for research groups and sites conducting long-term monitoring using in situ sensors. Features of the workflow include: software tools to automate the transfer of data from field sites to databases, a Python-based program for data quality control post-processing, a web-based application for online discovery and visualization of data, and a data model and web interface for managing physical infrastructure. By automating the data management workflow, the time from collection to analysis is reduced and sharing and publication are facilitated. The incorporation of metadata standards and descriptions and the use of open-source tools enhance the sustainability and reusability of the data. We will describe the workflow and tools that we have developed in the context of the iUTAH (innovative Urban Transitions and Aridregion Hydrosustainability) monitoring network. The iUTAH network consists of aquatic and climate sensors deployed in three watersheds to monitor Gradients Along Mountain to Urban Transitions (GAMUT). The variety of environmental sensors and the multi-watershed, multi-institutional nature of the network necessitate a well-planned and efficient workflow for acquiring, managing, and sharing sensor data, which should be useful for similar large-scale and long-term networks.
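
    A simplified sketch of one automated quality-control post-processing step of the kind described above is given below; the column names, thresholds, and gap-filling rule are assumptions for illustration, not the iUTAH system's actual logic.

      # Simplified sketch of a QC post-processing step: flag out-of-range sensor
      # values and interpolate only short gaps. Column names, thresholds, and the
      # gap rule are hypothetical, not the actual schema or rules of this network.
      import pandas as pd


      def qc_range_check(df, column, lower, upper, max_gap=3):
          out = df.copy()
          bad = (out[column] < lower) | (out[column] > upper)
          out[column + "_qc_flag"] = bad.map(lambda b: "out_of_range" if b else "ok")
          cleaned = out[column].mask(bad)
          # Fill only short gaps; longer gaps stay missing for manual review.
          out[column + "_qc"] = cleaned.interpolate(limit=max_gap, limit_area="inside")
          return out


      if __name__ == "__main__":
          df = pd.DataFrame({
              "timestamp": pd.date_range("2014-07-01", periods=6, freq="15min"),
              "water_temp_C": [14.2, 14.3, 95.0, 14.5, 14.6, 14.4],   # 95.0 is a spike
          })
          print(qc_range_check(df, "water_temp_C", lower=-1.0, upper=40.0))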

  5. Quality Metadata Management for Geospatial Scientific Workflows: from Retrieving to Assessing with Online Tools

    NASA Astrophysics Data System (ADS)

    Leibovici, D. G.; Pourabdollah, A.; Jackson, M.

    2011-12-01

    Experts and decision-makers use or develop models to monitor global and local changes of the environment. Their activities require the combination of data and processing services in a flow of operations and spatial data computations: a geospatial scientific workflow. The seamless ability to generate, re-use, and modify a geospatial scientific workflow is an important requirement, but the quality of the outcomes is equally important [1]. Metadata information attached to the data and processes, and particularly their quality, is essential to assess the reliability of the scientific model that a workflow represents [2]. Management tools dealing with qualitative and quantitative metadata measures of the quality associated with a workflow are therefore required by modellers. To ensure interoperability, ISO and OGC standards [3] are to be adopted, allowing one, for example, to define metadata profiles and to retrieve them via web service interfaces. However, these standards need a few extensions when looking at workflows, particularly in the context of geoprocess metadata. We propose to fill this gap (i) first, by providing a metadata profile for the quality of processes, and (ii) by providing a framework, based on XPDL [4], to manage the quality information. Web Processing Services are used to implement a range of metadata analyses on the workflow in order to evaluate and present quality information at different levels of the workflow. This generates the quality metadata, which is stored in the XPDL file. The focus is (a) on visual representations of the quality, summarizing the quality information retrieved either from the standardized metadata profiles of the components or from non-standard quality information, e.g., Web 2.0 information, and (b) on the estimated qualities of the outputs derived from meta-propagation of uncertainties (a principle that we have introduced [5]). An a priori validation of the future decision-making supported by the outputs of the workflow, once run, is then provided using the meta-propagated qualities obtained without running the workflow [6], together with a visualization pointing out, on the workflow graph itself, where the workflow needs to be improved with better data or better processes. [1] Leibovici, DG, Hobona, G, Stock, K, Jackson, M (2009) Qualifying geospatial workflow models for adaptive controlled validity and accuracy. In: IEEE 17th GeoInformatics, 1-5. [2] Leibovici, DG, Pourabdollah, A (2010a) Workflow Uncertainty using a Metamodel Framework and Metadata for Data and Processes. OGC TC/PC Meetings, September 2010, Toulouse, France. [3] OGC (2011) www.opengeospatial.org. [4] XPDL (2008) Workflow Process Definition Interface - XML Process Definition Language. Workflow Management Coalition, Document WfMC-TC-1025, 2008. [5] Leibovici, DG, Pourabdollah, A, Jackson, M (2011) Meta-propagation of Uncertainties for Scientific Workflow Management in Interoperable Spatial Data Infrastructures. In: Proceedings of the European Geosciences Union (EGU2011), April 2011, Austria. [6] Pourabdollah, A, Leibovici, DG, Jackson, M (2011) MetaPunT: an Open Source tool for Meta-Propagation of uncerTainties in Geospatial Processing. In: Proceedings of OSGIS2011, June 2011, Nottingham, UK.
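
    As a toy illustration of estimating output quality without running a workflow, the sketch below propagates relative uncertainties along a simple chain of processes using a quadrature rule; this rule is a simplifying assumption chosen for illustration and does not reproduce the meta-propagation method of [5].

      # Toy sketch of propagating a quantitative quality measure through a workflow
      # without executing it. The propagation rule (independent relative
      # uncertainties combined in quadrature, plus a per-process contribution) is a
      # simplifying assumption; the meta-propagation approach of [5] is richer.
      import math


      def propagate(workflow, input_uncertainty, process_uncertainty):
          """workflow: list of process names in execution order (a simple chain)."""
          current = math.sqrt(sum(v ** 2 for v in input_uncertainty.values()))
          trace = {}
          for process in workflow:
              current = math.sqrt(current ** 2 + process_uncertainty[process] ** 2)
              trace[process] = round(current, 3)
          return trace


      if __name__ == "__main__":
          trace = propagate(
              workflow=["reproject", "interpolate", "overlay"],
              input_uncertainty={"landcover": 0.05, "elevation": 0.02},
              process_uncertainty={"reproject": 0.01, "interpolate": 0.04, "overlay": 0.03},
          )
          print(trace)   # estimated quality after each step, before any execution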

  6. Optimizing CyberShake Seismic Hazard Workflows for Large HPC Resources

    NASA Astrophysics Data System (ADS)

    Callaghan, S.; Maechling, P. J.; Juve, G.; Vahi, K.; Deelman, E.; Jordan, T. H.

    2014-12-01

    The CyberShake computational platform is a well-integrated collection of scientific software and middleware that calculates 3D simulation-based probabilistic seismic hazard curves and hazard maps for the Los Angeles region. Currently each CyberShake model comprises about 235 million synthetic seismograms from about 415,000 rupture variations computed at 286 sites. CyberShake integrates large-scale parallel and high-throughput serial seismological research codes into a processing framework in which early stages produce files used as inputs by later stages. Scientific workflow tools are used to manage the jobs, data, and metadata. The Southern California Earthquake Center (SCEC) developed the CyberShake platform using USC High Performance Computing and Communications systems and open-science NSF resources. CyberShake calculations were migrated to the NSF Track 1 system NCSA Blue Waters when it became operational in 2013, via an interdisciplinary team approach including domain scientists, computer scientists, and middleware developers. Due to the excellent performance of Blue Waters and CyberShake software optimizations, we reduced the makespan (a measure of wallclock time-to-solution) of a CyberShake study from 1467 to 342 hours. We will describe the technical enhancements behind this improvement, including judicious introduction of new GPU software, improved scientific software components, increased workflow-based automation, and Blue Waters-specific workflow optimizations. Our CyberShake performance improvements highlight the benefits of scientific workflow tools. The CyberShake workflow software stack includes the Pegasus Workflow Management System (Pegasus-WMS, which includes Condor DAGMan), HTCondor, and Globus GRAM, with Pegasus-mpi-cluster managing the high-throughput tasks on the HPC resources. The workflow tools handle data management, automatically transferring about 13 TB back to SCEC storage. We will present performance metrics from the most recent CyberShake study, executed on Blue Waters. We will compare the performance of CPU and GPU versions of our large-scale parallel wave propagation code, AWP-ODC-SGT. Finally, we will discuss how these enhancements have enabled SCEC to move forward with plans to increase the CyberShake simulation frequency to 1.0 Hz.

  7. Contextual cloud-based service oriented architecture for clinical workflow.

    PubMed

    Moreno-Conde, Jesús; Moreno-Conde, Alberto; Núñez-Benjumea, Francisco J; Parra-Calderón, Carlos

    2015-01-01

    In the context of the acceptance of systems within the healthcare domain, multiple papers have highlighted the importance of integrating tools with the clinical workflow. This paper analyses how clinical context management could be deployed in order to promote the adoption of advanced cloud services within the clinical workflow. This deployment will be able to be integrated with the specifications promoted by the eHealth European Interoperability Framework. This paper proposes a cloud-based service-oriented architecture. This architecture will implement a context management system aligned with the HL7 standard known as CCOW.

  8. Medication Management: The Macrocognitive Workflow of Older Adults With Heart Failure.

    PubMed

    Mickelson, Robin S; Unertl, Kim M; Holden, Richard J

    2016-10-12

    Older adults with chronic disease struggle to manage complex medication regimens. Health information technology has the potential to improve medication management, but only if it is based on a thorough understanding of the complexity of medication management workflow as it occurs in natural settings. Prior research reveals that patient work related to medication management is complex, cognitive, and collaborative. Macrocognitive processes are theorized as how people individually and collaboratively think in complex, adaptive, and messy nonlaboratory settings supported by artifacts. The objective of this research was to describe and analyze the work of medication management by older adults with heart failure, using a macrocognitive workflow framework. We interviewed and observed 61 older patients along with 30 informal caregivers about self-care practices including medication management. Descriptive qualitative content analysis methods were used to develop categories, subcategories, and themes about macrocognitive processes used in medication management workflow. We identified 5 high-level macrocognitive processes affecting medication management (sensemaking, planning, coordination, monitoring, and decision making) and 15 subprocesses. Data revealed workflow as occurring in a highly collaborative, fragile system of interacting people, artifacts, time, and space. Process breakdowns were common and patients had little support for macrocognitive workflow from current tools. Macrocognitive processes affected medication management performance. Describing and analyzing this performance produced recommendations for technology supporting collaboration and sensemaking, decision making and problem detection, and planning and implementation.

  9. Medication Management: The Macrocognitive Workflow of Older Adults With Heart Failure

    PubMed Central

    2016-01-01

    Background: Older adults with chronic disease struggle to manage complex medication regimens. Health information technology has the potential to improve medication management, but only if it is based on a thorough understanding of the complexity of medication management workflow as it occurs in natural settings. Prior research reveals that patient work related to medication management is complex, cognitive, and collaborative. Macrocognitive processes are theorized as how people individually and collaboratively think in complex, adaptive, and messy nonlaboratory settings supported by artifacts. Objective: The objective of this research was to describe and analyze the work of medication management by older adults with heart failure, using a macrocognitive workflow framework. Methods: We interviewed and observed 61 older patients along with 30 informal caregivers about self-care practices including medication management. Descriptive qualitative content analysis methods were used to develop categories, subcategories, and themes about macrocognitive processes used in medication management workflow. Results: We identified 5 high-level macrocognitive processes affecting medication management (sensemaking, planning, coordination, monitoring, and decision making) and 15 subprocesses. Data revealed workflow as occurring in a highly collaborative, fragile system of interacting people, artifacts, time, and space. Process breakdowns were common and patients had little support for macrocognitive workflow from current tools. Conclusions: Macrocognitive processes affected medication management performance. Describing and analyzing this performance produced recommendations for technology supporting collaboration and sensemaking, decision making and problem detection, and planning and implementation. PMID:27733331

  10. An architecture model for multiple disease management information systems.

    PubMed

    Chen, Lichin; Yu, Hui-Chu; Li, Hao-Chun; Wang, Yi-Van; Chen, Huang-Jen; Wang, I-Ching; Wang, Chiou-Shiang; Peng, Hui-Yu; Hsu, Yu-Ling; Chen, Chi-Huang; Chuang, Lee-Ming; Lee, Hung-Chang; Chung, Yufang; Lai, Feipei

    2013-04-01

    Disease management is a program which attempts to overcome the fragmentation of the healthcare system and improve the quality of care. Many studies have proven the effectiveness of disease management. However, case managers spend the majority of their time on documentation and on coordinating the members of the care team. They need a tool to support their daily practice and optimize an inefficient workflow. Several discussions have indicated that information technology plays an important role in the era of disease management. Although applications have been developed, it is inefficient to develop an information system for each disease management program individually. The aim of this research is to support the work of disease management, reform the inefficient workflow, and propose an architecture model that enhances reusability and saves time in information system development. The proposed architecture model has been successfully implemented in two disease management information systems, and the result was evaluated through reusability analysis, time-consumed analysis, pre- and post-implementation workflow analysis, and a user questionnaire survey. The reusability of the proposed model was high, less than half of the development time was consumed, and the workflow was improved. The overall user response is positive, and the system's supportiveness during the daily workflow is high. The system empowers case managers with better information and leads to better decision making.

  11. Integrating the Allen Brain Institute Cell Types Database into Automated Neuroscience Workflow.

    PubMed

    Stockton, David B; Santamaria, Fidel

    2017-10-01

    We developed software tools to download, extract features, and organize the Cell Types Database from the Allen Brain Institute (ABI) in order to integrate its whole cell patch clamp characterization data into the automated modeling/data analysis cycle. To expand the potential user base we employed both Python and MATLAB. The basic set of tools downloads selected raw data and extracts cell, sweep, and spike features, using ABI's feature extraction code. To facilitate data manipulation we added a tool to build a local specialized database of raw data plus extracted features. Finally, to maximize automation, we extended our NeuroManager workflow automation suite to include these tools plus a separate investigation database. The extended suite allows the user to integrate ABI experimental and modeling data into an automated workflow deployed on heterogeneous computer infrastructures, from local servers, to high performance computing environments, to the cloud. Since our approach is focused on workflow procedures our tools can be modified to interact with the increasing number of neuroscience databases being developed to cover all scales and properties of the nervous system.
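
    A minimal sketch of the "local specialized database" idea, storing extracted per-cell features in SQLite so later workflow stages can query them, is shown below; the table schema, column names, and example values are hypothetical, not the actual schema used by these tools.

      # Minimal sketch: store extracted electrophysiology features in a local SQLite
      # database for later querying. Schema, columns, and values are hypothetical.
      import sqlite3


      def build_feature_db(path, cell_features):
          con = sqlite3.connect(path)
          con.execute("""CREATE TABLE IF NOT EXISTS cell_features (
                             cell_id INTEGER PRIMARY KEY,
                             resting_potential_mV REAL,
                             input_resistance_MOhm REAL,
                             mean_firing_rate_Hz REAL)""")
          con.executemany(
              "INSERT OR REPLACE INTO cell_features VALUES (?, ?, ?, ?)",
              cell_features,
          )
          con.commit()
          return con


      if __name__ == "__main__":
          con = build_feature_db(":memory:", [(1001, -71.2, 182.5, 9.4)])
          for row in con.execute("SELECT * FROM cell_features WHERE mean_firing_rate_Hz > 5"):
              print(row)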

  12. SigWin-detector: a Grid-enabled workflow for discovering enriched windows of genomic features related to DNA sequences.

    PubMed

    Inda, Márcia A; van Batenburg, Marinus F; Roos, Marco; Belloum, Adam S Z; Vasunin, Dmitry; Wibisono, Adianto; van Kampen, Antoine H C; Breit, Timo M

    2008-08-08

    Chromosome location is often used as a scaffold to organize genomic information in both the living cell and molecular biological research. Thus, ever-increasing amounts of data about genomic features are stored in public databases and can be readily visualized by genome browsers. To perform in silico experimentation conveniently with this genomics data, biologists need tools to process and compare datasets routinely and explore the obtained results interactively. The complexity of such experimentation requires these tools to be based on an e-Science approach, hence generic, modular, and reusable. A virtual laboratory environment with workflows, workflow management systems, and Grid computation are therefore essential. Here we apply an e-Science approach to develop SigWin-detector, a workflow-based tool that can detect significantly enriched windows of (genomic) features in a (DNA) sequence in a fast and reproducible way. For proof-of-principle, we utilize a biological use case to detect regions of increased and decreased gene expression (RIDGEs and anti-RIDGEs) in human transcriptome maps. We improved the original method for RIDGE detection by replacing the costly step of estimation by random sampling with a faster analytical formula for computing the distribution of the null hypothesis being tested and by developing a new algorithm for computing moving medians. SigWin-detector was developed using the WS-VLAM workflow management system and consists of several reusable modules that are linked together in a basic workflow. The configuration of this basic workflow can be adapted to satisfy the requirements of the specific in silico experiment. As we show with the results from analyses in the biological use case on RIDGEs, SigWin-detector is an efficient and reusable Grid-based tool for discovering windows enriched for features of a particular type in any sequence of values. Thus, SigWin-detector provides the proof-of-principle for the modular e-Science based concept of integrative bioinformatics experimentation.
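
    The core idea of detecting enriched windows can be illustrated with a short sketch that slides a fixed-size window along a sequence of values and reports windows whose median exceeds a threshold; SigWin-detector itself adds a proper significance test and a much faster moving-median algorithm, so this is a conceptual outline only.

      # Rough sketch of the underlying idea: slide a fixed-size window along a
      # sequence of values (e.g., per-gene expression ordered by chromosome position)
      # and report windows whose median exceeds a threshold. SigWin-detector uses a
      # significance test and a faster moving-median algorithm; this is illustrative.
      from statistics import median


      def enriched_windows(values, window, threshold):
          hits = []
          for start in range(len(values) - window + 1):
              m = median(values[start:start + window])
              if m >= threshold:
                  hits.append((start, start + window, m))
          return hits


      if __name__ == "__main__":
          expression = [1, 1, 2, 1, 9, 8, 10, 9, 7, 1, 2, 1]
          # Windows of 4 consecutive genes whose median expression is at least 5.
          for start, end, m in enriched_windows(expression, window=4, threshold=5):
              print(f"genes {start}-{end - 1}: median {m}")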

  13. Vel-IO 3D: A tool for 3D velocity model construction, optimization and time-depth conversion in 3D geological modeling workflow

    NASA Astrophysics Data System (ADS)

    Maesano, Francesco E.; D'Ambrogi, Chiara

    2017-02-01

    We present Vel-IO 3D, a tool for 3D velocity model creation and time-depth conversion, as part of a workflow for 3D model building. The workflow addresses the management of large subsurface datasets, mainly seismic lines and well logs, and the construction of a 3D velocity model able to describe the variation of velocity parameters related to strong facies and thickness variability and to high structural complexity. Although it is applicable in many geological contexts (e.g. foreland basins, large intermountain basins), it is particularly suitable in wide flat regions, where subsurface structures have no surface expression. The Vel-IO 3D tool is composed of three scripts, written in Python 2.7.11, that automate i) 3D instantaneous velocity model building, ii) velocity model optimization, and iii) time-depth conversion. They determine a 3D geological model that is consistent with the primary geological constraints (e.g. depth of the markers on wells). The proposed workflow and the Vel-IO 3D tool have been tested, during the EU-funded Project GeoMol, by the construction of the 3D geological model of a flat region, 5700 km2 in area, located in the central part of the Po Plain. The final 3D model showed the efficiency of the workflow and the Vel-IO 3D tool in managing large amounts of data in both the time and depth domains. A four-layer layer-cake velocity model has been applied to a several-thousand-metre-thick (5000-13,000 m) succession, with 15 horizons from Triassic up to Pleistocene, complicated by Mesozoic extensional tectonics and by buried thrusts related to the Southern Alps and Northern Apennines.
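
    The abstract does not give Vel-IO 3D's velocity parameterization; the sketch below assumes a common instantaneous-velocity law, V(z) = V0 + k*z, for which depth follows from one-way time t as z(t) = (V0/k)(exp(k*t) - 1).

        # Generic time-depth conversion for a layer with a linear instantaneous
        # velocity law V(z) = V0 + k*z; a common parameterization, not necessarily
        # the one implemented in Vel-IO 3D.
        import numpy as np

        def time_to_depth(twt_s, v0, k):
            """Convert two-way travel time (s) to depth (m) for V(z) = V0 + k*z."""
            t = np.asarray(twt_s, dtype=float) / 2.0   # one-way time
            if abs(k) < 1e-9:                          # constant-velocity limit
                return v0 * t
            return (v0 / k) * np.expm1(k * t)

        # Hypothetical example: a marker mapped at 2.4 s TWT, V0 = 1800 m/s, k = 0.45 1/s
        print(time_to_depth(2.4, v0=1800.0, k=0.45))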

  14. MyGeoHub: A Collaborative Geospatial Research and Education Platform

    NASA Astrophysics Data System (ADS)

    Kalyanam, R.; Zhao, L.; Biehl, L. L.; Song, C. X.; Merwade, V.; Villoria, N.

    2017-12-01

    Scientific research is increasingly collaborative and globally distributed; research groups now rely on web-based scientific tools and data management systems to simplify their day-to-day collaborative workflows. However, such tools often lack seamless interfaces, requiring researchers to contend with manual data transfers, annotation and sharing. MyGeoHub is a web platform that supports out-of-the-box, seamless workflows involving data ingestion, metadata extraction, analysis, sharing and publication. MyGeoHub is built on the HUBzero cyberinfrastructure platform and adds general-purpose software building blocks (GABBs) for geospatial data management, visualization and analysis. A data management building block, iData, processes geospatial files, extracting metadata for keyword and map-based search while enabling quick previews. iData is pervasive, allowing access through a web interface, scientific tools on MyGeoHub or even mobile field devices via a data service API. GABBs includes a Python map library as well as map widgets that, in a few lines of code, generate complete geospatial visualization web interfaces for scientific tools. GABBs also includes powerful tools that can be used with no programming effort. The GeoBuilder tool provides an intuitive wizard for importing multi-variable, geo-located time series data (typical of sensor readings, GPS trackers) to build visualizations supporting data filtering and plotting. MyGeoHub has been used in tutorials at scientific conferences and educational activities for K-12 students. MyGeoHub is also constantly evolving; the recent addition of Jupyter and R Shiny notebook environments enables reproducible, richly interactive geospatial analyses and applications ranging from simple pre-processing to published tools. MyGeoHub is not a monolithic geospatial science gateway; instead, it supports diverse needs ranging from a feature-rich data management system to complex scientific tools and workflows.

  15. Git Replacement for the

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Robinson, P.

    2014-09-23

    GRAPE is a tool for managing software project workflows for the Git version control system. It provides a suite of tools to simplify and configure branch-based development, integration with a project's testing suite, and integration with the Atlassian Stash repository hosting tool.

  16. Adaptive Workflows for Diabetes Management: Self-Management Assistant and Remote Treatment for Diabetes.

    PubMed

    Contreras, Iván; Kiefer, Stephan; Vehi, Josep

    2017-01-01

    Diabetes self-management is a crucial element for all people with diabetes and those at risk for developing the disease. Diabetic patients should be empowered to increase their self-management skills in order to prevent or delay the complications of diabetes. This work presents the proposal and first development stages of a smartphone application focused on the empowerment of patients with diabetes. The concept of this interventional tool is based on the personalization of the user experience from an adaptive and dynamic perspective. The segmentation of the population and the dynamic treatment of user profiles across the different experience levels are the main challenges of the implementation. The self-management assistant and remote treatment for diabetes aims to develop a platform that integrates a series of innovative models and tools, rigorously tested and supported by the diabetes research literature, together with a proven engine to manage healthcare workflows.

  17. Support for Taverna workflows in the VPH-Share cloud platform.

    PubMed

    Kasztelnik, Marek; Coto, Ernesto; Bubak, Marian; Malawski, Maciej; Nowakowski, Piotr; Arenas, Juan; Saglimbeni, Alfredo; Testi, Debora; Frangi, Alejandro F

    2017-07-01

    To address the increasing need for collaborative endeavours within the Virtual Physiological Human (VPH) community, the VPH-Share collaborative cloud platform allows researchers to expose and share sequences of complex biomedical processing tasks in the form of computational workflows. The Taverna Workflow System is a very popular tool for orchestrating complex biomedical and bioinformatics processing tasks in the VPH community. This paper describes the VPH-Share components that support the building and execution of Taverna workflows, and explains how they interact with other VPH-Share components to improve the capabilities of the VPH-Share platform. Taverna workflow support is delivered by the Atmosphere cloud management platform and the VPH-Share Taverna plugin. These components are explained in detail, along with the two main procedures that were developed to enable this seamless integration: workflow composition and execution. 1) Seamless integration of VPH-Share with other components and systems. 2) Extended range of different tools for workflows. 3) Successful integration of scientific workflows from other VPH projects. 4) Execution speed improvement for medical applications. The presented workflow integration provides VPH-Share users with a wide range of possibilities to compose and execute workflows, such as desktop or online composition, online batch execution, multithreading, remote execution, etc. The specific advantages of each supported tool are presented, as are the roles of Atmosphere and the VPH-Share plugin within the VPH-Share project. The combination of the VPH-Share plugin and Atmosphere endows the VPH-Share infrastructure with far more flexible, powerful and usable capabilities for the VPH-Share community. As both components continue to evolve and improve independently, further improvements are still to be developed and will be described. Copyright © 2017 Elsevier B.V. All rights reserved.

  18. A data management and publication workflow for a large-scale, heterogeneous sensor network.

    PubMed

    Jones, Amber Spackman; Horsburgh, Jeffery S; Reeder, Stephanie L; Ramírez, Maurier; Caraballo, Juan

    2015-06-01

    It is common for hydrology researchers to collect data using in situ sensors at high frequencies, for extended durations, and with spatial distributions that produce data volumes requiring infrastructure for data storage, management, and sharing. The availability and utility of these data in addressing scientific questions related to water availability, water quality, and natural disasters relies on effective cyberinfrastructure that facilitates transformation of raw sensor data into usable data products. It also depends on the ability of researchers to share and access the data in usable formats. In this paper, we describe a data management and publication workflow and software tools for research groups and sites conducting long-term monitoring using in situ sensors. Functionality includes the ability to track monitoring equipment inventory and events related to field maintenance. Linking this information to the observational data is imperative in ensuring the quality of sensor-based data products. We present these tools in the context of a case study for the innovative Urban Transitions and Aridregion Hydrosustainability (iUTAH) sensor network. The iUTAH monitoring network includes sensors at aquatic and terrestrial sites for continuous monitoring of common meteorological variables, snow accumulation and melt, soil moisture, surface water flow, and surface water quality. We present the overall workflow we have developed for effectively transferring data from field monitoring sites to ultimate end-users and describe the software tools we have deployed for storing, managing, and sharing the sensor data. These tools are all open source and available for others to use.

  19. Web-Accessible Scientific Workflow System for Performance Monitoring

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Roelof Versteeg; Roelof Versteeg; Trevor Rowe

    2006-03-01

    We describe the design and implementation of a web-accessible scientific workflow system for environmental monitoring. This workflow environment integrates distributed, automated data acquisition with server-side data management and information visualization through flexible browser-based data access tools. Component technologies include a rich browser-based client (using dynamic JavaScript and HTML/CSS) for data selection, a back-end server which uses PHP for data processing, user management, and result delivery, and third-party applications which are invoked by the back-end using web services. This environment allows for reproducible, transparent result generation by a diverse user base. It has been implemented for several monitoring systems with different degrees of complexity.

  20. A PDA study management tool (SMT) utilizing wireless broadband and full DICOM viewing capability

    NASA Astrophysics Data System (ADS)

    Documet, Jorge; Liu, Brent; Zhou, Zheng; Huang, H. K.; Documet, Luis

    2007-03-01

    During the last 4 years the IPI (Image Processing and Informatics) Laboratory has been developing a web-based Study Management Tool (SMT) application that allows radiologists, film librarians and PACS (Picture Archiving and Communication System) users to dynamically and remotely perform Query/Retrieve operations in a PACS network. Using a regular PDA (Personal Digital Assistant), users can remotely query a PACS archive and distribute any study to an existing DICOM (Digital Imaging and Communications in Medicine) node. This application, which has proven convenient for managing the study workflow [1, 2], has been extended to include DICOM viewing capability on the PDA. With this new feature, users can take a quick view of DICOM images, gaining mobility and convenience at the same time. In addition, we are extending this application to metropolitan-area wireless broadband networks. This feature requires smart phones that are capable of working as a PDA and have access to broadband wireless services. With the extension to wireless broadband technology and the preview of DICOM images, the Study Management Tool becomes an even more powerful tool for clinical workflow management.
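
    The paper's implementation is a web application and is not shown; the sketch below only illustrates the kind of DICOM Query/Retrieve (C-FIND) request behind such a tool, assuming the pynetdicom library. Host, port and AE titles are hypothetical placeholders.

        # Query a PACS archive at the STUDY level for a given patient ID.
        from pydicom.dataset import Dataset
        from pynetdicom import AE
        from pynetdicom.sop_class import StudyRootQueryRetrieveInformationModelFind

        ae = AE(ae_title="SMT_PDA")
        ae.add_requested_context(StudyRootQueryRetrieveInformationModelFind)

        query = Dataset()
        query.QueryRetrieveLevel = "STUDY"
        query.PatientID = "123456"
        query.StudyInstanceUID = ""          # empty value = requested return key

        assoc = ae.associate("pacs.example.org", 104, ae_title="PACS_ARCHIVE")
        if assoc.is_established:
            responses = assoc.send_c_find(query, StudyRootQueryRetrieveInformationModelFind)
            for status, identifier in responses:
                if status and status.Status in (0xFF00, 0xFF01):   # pending = match
                    print(identifier.StudyInstanceUID)
            assoc.release()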

  1. Towards seamless workflows in agile data science

    NASA Astrophysics Data System (ADS)

    Klump, J. F.; Robertson, J.

    2017-12-01

    Agile workflows are a response to projects with requirements that may change over time. They prioritise rapid and flexible responses to change, preferring to adapt to changes in requirements rather than predict them before a project starts. This suits the needs of research very well because research is inherently agile in its methodology. The adoption of agile methods has made collaborative data analysis much easier in a research environment fragmented across institutional data stores, HPC, personal and lab computers and more recently cloud environments. Agile workflows use tools that share a common worldview: in an agile environment, there may be more than one valid version of data, code or environment in play at any given time. All of these versions need references and identifiers. For example, a team of developers following the git-flow conventions (github.com/nvie/gitflow) may have several active branches, one for each strand of development. These workflows allow rapid and parallel iteration while maintaining identifiers pointing to individual snapshots of data and code, and allow rapid switching between strands. In contrast, the current focus of versioning in research data management is geared towards managing data for reproducibility and long-term preservation of the record of science. While both are important goals in the persistent curation domain of the institutional research data infrastructure, current tools emphasise planning over adaptation and can introduce unwanted rigidity by insisting on a single valid version or point of truth. In the collaborative curation domain of a research project, things are more fluid. However, there is no equivalent to the "versioning iso-surface" of the git protocol for the management and versioning of research data. At CSIRO we are developing concepts and tools for the agile management of software code and research data for virtual research environments, based on our experiences of actual data analytics projects in the geosciences. We use code management that allows researchers to interact with the code through tools like Jupyter Notebooks while data are held in an object store. Our aim is an architecture allowing seamless integration of code development, data management, and data processing in virtual research environments.
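
    One practical expression of this idea is to record, for every data snapshot, the identifier of the code snapshot that produced or consumed it. The sketch below is illustrative only: the bucket name and key layout are hypothetical, and boto3/S3 stands in for whatever object store a project actually uses.

        # Pair a data snapshot in an object store with the current git commit hash.
        import subprocess
        import boto3

        commit = subprocess.check_output(["git", "rev-parse", "HEAD"], text=True).strip()

        s3 = boto3.client("s3")
        key = f"experiments/{commit}/interim_results.csv"
        with open("interim_results.csv", "rb") as fh:
            s3.put_object(Bucket="project-data-snapshots", Key=key, Body=fh)

        print(f"data snapshot stored under code version {commit}")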

  2. Pathology economic model tool: a novel approach to workflow and budget cost analysis in an anatomic pathology laboratory.

    PubMed

    Muirhead, David; Aoun, Patricia; Powell, Michael; Juncker, Flemming; Mollerup, Jens

    2010-08-01

    The need for higher efficiency, maximum quality, and faster turnaround time is a continuous focus for anatomic pathology laboratories and drives changes in work scheduling, instrumentation, and management control systems. To determine the costs of generating routine, special, and immunohistochemical microscopic slides in a large, academic anatomic pathology laboratory using a top-down approach. The Pathology Economic Model Tool was used to analyze workflow processes at The Nebraska Medical Center's anatomic pathology laboratory. Data from the analysis were used to generate complete cost estimates, which included not only materials, consumables, and instrumentation but also specific labor and overhead components for each of the laboratory's subareas. The cost data generated by the Pathology Economic Model Tool were compared with the cost estimates generated using relative value units. Despite the use of automated systems for different processes, the workflow in the laboratory was found to be relatively labor intensive. The effect of labor and overhead on per-slide costs was significantly underestimated by traditional relative-value unit calculations when compared with the Pathology Economic Model Tool. Specific workflow defects with significant contributions to the cost per slide were identified. The cost of providing routine, special, and immunohistochemical slides may be significantly underestimated by traditional methods that rely on relative value units. Furthermore, a comprehensive analysis may identify specific workflow processes requiring improvement.
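
    The study's actual cost model is not reproduced in the abstract; the toy calculation below only illustrates a top-down, fully loaded per-slide cost that folds in labor and overhead, the components the authors found relative-value-unit estimates tend to understate. All figures are hypothetical placeholders.

        def cost_per_slide(materials, labor_hours, hourly_rate, overhead_rate, slides):
            """Fully loaded cost per slide for a laboratory subarea."""
            labor = labor_hours * hourly_rate
            overhead = (materials + labor) * overhead_rate
            return (materials + labor + overhead) / slides

        loaded = cost_per_slide(materials=12000.0, labor_hours=900.0,
                                hourly_rate=38.0, overhead_rate=0.35, slides=4000)
        materials_only = 12000.0 / 4000
        print(f"fully loaded: ${loaded:.2f}/slide vs materials only: ${materials_only:.2f}/slide")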

  3. Lessons from Implementing a Combined Workflow–Informatics System for Diabetes Management

    PubMed Central

    Zai, Adrian H.; Grant, Richard W.; Estey, Greg; Lester, William T.; Andrews, Carl T.; Yee, Ronnie; Mort, Elizabeth; Chueh, Henry C.

    2008-01-01

    Shortcomings surrounding the care of patients with diabetes have been attributed largely to a fragmented, disorganized, and duplicative health care system that focuses more on acute conditions and complications than on managing chronic disease. To address these shortcomings, we developed a diabetes registry population management application to change the way our staff manages patients with diabetes. Use of this new application has helped us coordinate, among different users, the responsibilities for intervening with and monitoring patients in the registry. Our experiences using this combined workflow-informatics intervention system suggest that integrating a chronic disease registry into the clinical workflow for the treatment of chronic conditions creates a useful and efficient tool for managing disease. PMID:18436907

  4. A patient workflow management system built on guidelines.

    PubMed Central

    Dazzi, L.; Fassino, C.; Saracco, R.; Quaglini, S.; Stefanelli, M.

    1997-01-01

    To provide high-quality, shared, and distributed medical care, clinical and organizational issues need to be integrated. This work describes a methodology for developing a Patient Workflow Management System, based on a detailed model of both the medical work process and the organizational structure. We assume that the medical work process is represented through clinical practice guidelines, and that an ontological description of the organization is available. Thus, we developed tools 1) to acquire the medical knowledge contained in a guideline, 2) to translate the derived formalized guideline into a computational formalism, namely a Petri net, and 3) to maintain different representation levels. The high-level representation guarantees that the patient workflow follows the guideline prescriptions, while the low level takes into account the specific organization characteristics and allows allocating resources for managing a specific patient in daily practice. PMID:9357606
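
    A minimal sketch of the Petri-net firing rule that such a computational formalism rests on; this is a generic illustration, not the authors' Patient Workflow Management System.

        def enabled(marking, pre):
            """A transition is enabled if every input place holds enough tokens."""
            return all(marking.get(p, 0) >= n for p, n in pre.items())

        def fire(marking, pre, post):
            """Consume tokens from input places and produce tokens in output places."""
            m = dict(marking)
            for p, n in pre.items():
                m[p] -= n
            for p, n in post.items():
                m[p] = m.get(p, 0) + n
            return m

        # One hypothetical guideline step: "order lab test" moves the patient
        # from 'admitted' to 'awaiting_results'.
        marking = {"admitted": 1}
        pre, post = {"admitted": 1}, {"awaiting_results": 1}
        if enabled(marking, pre):
            marking = fire(marking, pre, post)
        print(marking)   # {'admitted': 0, 'awaiting_results': 1}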

  5. Wireless remote control of clinical image workflow: using a PDA for off-site distribution and disaster recovery.

    PubMed

    Documet, Jorge; Liu, Brent J; Documet, Luis; Huang, H K

    2006-07-01

    This paper describes a picture archiving and communication system (PACS) tool based on Web technology that remotely manages medical images between a PACS archive and remote destinations. Successfully implemented in a clinical environment and also demonstrated for the past 3 years at the conferences of various organizations, including the Radiological Society of North America, this tool provides a very practical and simple way to manage a PACS, including off-site image distribution and disaster recovery. The application is robust and flexible and can be used on a standard PC workstation or a Tablet PC, but more important, it can be used with a personal digital assistant (PDA). With a PDA, the Web application becomes a powerful wireless and mobile image management tool. The application's quick and easy-to-use features allow users to perform Digital Imaging and Communications in Medicine (DICOM) queries and retrievals with a single interface, without having to worry about the underlying configuration of DICOM nodes. In addition, this frees up dedicated PACS workstations to perform their specialized roles within the PACS workflow. This tool has been used at Saint John's Health Center in Santa Monica, California, for 2 years. The average number of queries per month is 2,021, with 816 C-MOVE retrieve requests. Clinical staff members can use PDAs to manage image workflow and PACS examination distribution conveniently for off-site consultations by referring physicians and radiologists and for disaster recovery. This solution also improves radiologists' effectiveness and efficiency in health care delivery both within radiology departments and for off-site clinical coverage.

  6. The future of scientific workflows

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Deelman, Ewa; Peterka, Tom; Altintas, Ilkay

    Today's computational, experimental, and observational sciences rely on computations that involve many related tasks. The success of a scientific mission often hinges on the computer automation of these workflows. In April 2015, the US Department of Energy (DOE) invited a diverse group of domain and computer scientists from national laboratories supported by the Office of Science and the National Nuclear Security Administration, from industry, and from academia to review the workflow requirements of DOE's science and national security missions, to assess the current state of the art in science workflows, to understand the impact of emerging extreme-scale computing systems on those workflows, and to develop requirements for automated workflow management in future and existing environments. This article is a summary of the opinions of over 50 leading researchers attending this workshop. We highlight use cases, computing systems, and workflow needs, and conclude by summarizing the remaining challenges this community sees that inhibit large-scale scientific workflows from becoming a mainstream tool for extreme-scale science.

  7. Executing SADI services in Galaxy.

    PubMed

    Aranguren, Mikel Egaña; González, Alejandro Rodríguez; Wilkinson, Mark D

    2014-01-01

    In recent years Galaxy has become a popular workflow management system in bioinformatics, due to its ease of installation, use and extension. The availability of Semantic Web-oriented tools in Galaxy, however, is limited. This is also the case for Semantic Web Services such as those provided by the SADI project, i.e. services that consume and produce RDF. Here we present SADI-Galaxy, a tool generator that deploys selected SADI services as typical Galaxy tools. SADI-Galaxy is a Galaxy tool generator: through SADI-Galaxy, any SADI-compliant service becomes a Galaxy tool that can participate in other outstanding features of Galaxy such as data storage, history, workflow creation, and publication. Galaxy can also be used to execute and combine SADI services as it does with other Galaxy tools. Finally, we have semi-automated the packing and unpacking of data into RDF such that other Galaxy tools can easily be combined with SADI services, plugging the rich SADI Semantic Web Service environment into the popular Galaxy ecosystem. SADI-Galaxy bridges the gap between Galaxy, an easy-to-use but "static" workflow system with a wide user base, and SADI, a sophisticated, semantic, discovery-based framework for Web Services, thus benefiting both user communities.
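
    A sketch of what "packing" a tabular record into RDF can look like, using the rdflib library (an assumption; the paper's own packing/unpacking code is not shown). The namespace and predicate names are hypothetical; real SADI services are described by their own service ontologies.

        from rdflib import Graph, Literal, Namespace, RDF, URIRef

        EX = Namespace("http://example.org/vocab#")
        g = Graph()

        record = {"id": "P12345", "organism": "Homo sapiens"}
        subject = URIRef(f"http://example.org/protein/{record['id']}")
        g.add((subject, RDF.type, EX.Protein))
        g.add((subject, EX.organism, Literal(record["organism"])))

        print(g.serialize(format="turtle"))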

  8. Widening the adoption of workflows to include human and human-machine scientific processes

    NASA Astrophysics Data System (ADS)

    Salayandia, L.; Pinheiro da Silva, P.; Gates, A. Q.

    2010-12-01

    Scientific workflows capture knowledge in the form of technical recipes to access and manipulate data that help scientists manage and reuse established expertise to conduct their work. Libraries of scientific workflows are being created in particular fields, e.g., Bioinformatics, where, combined with cyber-infrastructure environments that provide on-demand access to data and tools, they result in powerful workbenches for scientists of those communities. The focus in these particular fields, however, has been more on automating rather than documenting scientific processes. As a result, technical barriers have impeded a wider adoption of scientific workflows by scientific communities that do not rely as heavily on cyber-infrastructure and computing environments. Semantic Abstract Workflows (SAWs) are introduced to widen the applicability of workflows as a tool to document scientific recipes or processes. SAWs intend to capture a scientist's perspective about the process of how she or he would collect, filter, curate, and manipulate data to create the artifacts that are relevant to her/his work. In contrast, scientific workflows describe the process from the point of view of how technical methods and tools are used to conduct the work. By focusing on a higher level of abstraction that is closer to a scientist's understanding, SAWs effectively capture the controlled vocabularies that reflect a particular scientific community, as well as the types of datasets and methods used in a particular domain. From there on, SAWs provide the flexibility to adapt to different environments to carry out the recipes or processes. These environments range from manual fieldwork to highly technical cyber-infrastructure environments, such as those already supported by scientific workflows. Two cases, one from Environmental Science and another from Geophysics, are presented as illustrative examples.

  9. From the desktop to the grid: scalable bioinformatics via workflow conversion.

    PubMed

    de la Garza, Luis; Veit, Johannes; Szolek, Andras; Röttig, Marc; Aiche, Stephan; Gesing, Sandra; Reinert, Knut; Kohlbacher, Oliver

    2016-03-12

    Reproducibility is one of the tenets of the scientific method. Scientific experiments often comprise complex data flows, selection of adequate parameters, and analysis and visualization of intermediate and end results. Breaking down the complexity of such experiments into the joint collaboration of small, repeatable, well-defined tasks, each with well-defined inputs, parameters, and outputs, offers immediate benefits such as identifying bottlenecks and pinpointing sections that could benefit from parallelization. Workflows rest upon the notion of splitting complex work into the joint effort of several manageable tasks. There are several engines that give users the ability to design and execute workflows. Each engine was created to address certain problems of a specific community, therefore each one has its advantages and shortcomings. Furthermore, not all features of all workflow engines are royalty-free, an aspect that could potentially drive away members of the scientific community. We have developed a set of tools that enables the scientific community to benefit from workflow interoperability. We developed a platform-free structured representation of the parameters, inputs, and outputs of command-line tools in so-called Common Tool Descriptor documents. We have also overcome the shortcomings and combined the features of two royalty-free workflow engines with a substantial user community: the Konstanz Information Miner, an engine which we see as a formidable workflow editor, and the Grid and User Support Environment, a web-based framework able to interact with several high-performance computing resources. We have thus created a free and highly accessible way to design workflows on a desktop computer and execute them on high-performance computing resources. Our work will not only reduce time spent on designing scientific workflows, but also make executing workflows on remote high-performance computing resources more accessible to technically inexperienced users. We strongly believe that our efforts not only decrease the turnaround time to obtain scientific results but also have a positive impact on reproducibility, thus elevating the quality of obtained scientific results.
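
    A platform-neutral tool description of this kind can be sketched as a small data structure. The dataclass below is only illustrative of the idea; the actual Common Tool Descriptor format is an XML schema and is not reproduced here, and the tool and parameter names are hypothetical.

        from dataclasses import dataclass, field
        from typing import Dict, List

        @dataclass
        class Parameter:
            name: str
            type: str                 # e.g. "input-file", "output-file", "double"
            required: bool = True

        @dataclass
        class ToolDescriptor:
            name: str
            version: str
            executable: str
            parameters: List[Parameter] = field(default_factory=list)

            def to_command(self, values: Dict[str, str]) -> List[str]:
                """Render a concrete command line from parameter values."""
                args = [self.executable]
                for p in self.parameters:
                    if p.name in values:
                        args += [f"--{p.name}", str(values[p.name])]
                return args

        peak_picker = ToolDescriptor(
            name="PeakPicker", version="1.0", executable="peakpicker",
            parameters=[Parameter("in", "input-file"), Parameter("out", "output-file")])
        print(peak_picker.to_command({"in": "spectra.mzML", "out": "picked.mzML"}))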

  10. Data Integration Tool: From Permafrost Data Translation Research Tool to A Robust Research Application

    NASA Astrophysics Data System (ADS)

    Wilcox, H.; Schaefer, K. M.; Jafarov, E. E.; Strawhacker, C.; Pulsifer, P. L.; Thurmes, N.

    2016-12-01

    The United States National Science Foundation-funded PermaData project, led by the National Snow and Ice Data Center (NSIDC) with a team from the Global Terrestrial Network for Permafrost (GTN-P), aimed to improve permafrost data access and discovery. We developed a Data Integration Tool (DIT) to significantly reduce the manual processing time needed to translate inconsistent, scattered historical permafrost data into files ready to ingest directly into the GTN-P. We leverage these data to support science research and policy decisions. DIT is a workflow manager that divides data preparation and analysis into a series of steps, or operations, called widgets. Each widget does a specific operation, such as read, multiply by a constant, sort, plot, and write data. DIT allows the user to select and order the widgets as desired to meet their specific needs. Originally it was written to capture a scientist's personal, iterative data manipulation and quality control process of visually and programmatically iterating through inconsistent input data, examining it to find problems, adding operations to address the problems, and rerunning until the data could be translated into the GTN-P standard format. Iterative development of this tool led first to a Fortran/Python hybrid and then, with consideration of users, licensing, version control, packaging, and workflow, to a publicly available, robust, usable application. Transitioning to Python allowed the use of open source frameworks for the workflow core and integration with a JavaScript graphical workflow interface. DIT is targeted to automatically handle 90% of the data processing for field scientists, modelers, and non-discipline scientists. It is available as an open source tool in GitHub packaged for a subset of Mac, Windows, and UNIX systems as a desktop application with a graphical workflow manager. DIT was used to completely translate one dataset (133 sites) that was successfully added to GTN-P, nearly translate three datasets (270 sites), and is scheduled to translate 10 more datasets (1000 sites) from the legacy inactive site data holdings of the Frozen Ground Data Center (FGDC). Iterative development has provided the permafrost and wider scientific community with an extendable tool designed specifically for the iterative process of translating unruly data.
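
    A minimal sketch of the widget-pipeline idea described above, using widget names taken from the abstract (read, multiply by a constant, sort, write); this is a generic illustration, not the Data Integration Tool's own code.

        def multiply(constant):
            return lambda values: [v * constant for v in values]

        def sort_ascending():
            return lambda values: sorted(values)

        def run_pipeline(values, widgets):
            """Apply the user-chosen sequence of widgets in order."""
            for widget in widgets:
                values = widget(values)
            return values

        raw_depths_cm = [230, 120, 480, 60]
        pipeline = [multiply(0.01), sort_ascending()]   # cm -> m, then sort
        print(run_pipeline(raw_depths_cm, pipeline))    # [0.6, 1.2, 2.3, 4.8]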

  11. CMS Configuration Editor: GUI based application for user analysis job

    NASA Astrophysics Data System (ADS)

    de Cosa, A.

    2011-12-01

    We present the user interface and the software architecture of the Configuration Editor for the CMS experiment. The analysis workflow is organized in a modular way and integrated within the CMS framework, which organizes user analysis code in a flexible way. The Python scripting language is adopted to define the job configuration that drives the analysis workflow. Developing analysis jobs and managing the configuration of the many required modules can be a challenging task for users, especially for newcomers. For this reason a graphical tool has been conceived in order to edit and inspect configuration files. A set of common analysis tools defined in the CMS Physics Analysis Toolkit (PAT) can be steered and configured using the Config Editor. A user-defined analysis workflow can be produced starting from a standard configuration file, applying and configuring PAT tools according to the specific user requirements. CMS users can adopt this tool, the Config Editor, to create their analyses, visualizing in real time the effects of their actions. They can visualize the structure of their configuration, look at the modules included in the workflow, inspect the dependencies existing among the modules and check the data flow. They can see the values to which parameters are set and change them according to what is required by their analysis task. The integration of common tools in the GUI required adopting an object-oriented structure in the Python definition of the PAT tools and defining a layer of abstraction from which all PAT tools inherit.
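
    The sketch below only illustrates the general pattern of a modular, Python-defined job configuration that such a GUI can inspect and edit; the class, module and parameter names are placeholders, not the actual CMSSW/PAT configuration API.

        class Module:
            """One configurable step of an analysis workflow."""
            def __init__(self, name, **parameters):
                self.name = name
                self.parameters = dict(parameters)

        class Workflow:
            def __init__(self):
                self.modules = []

            def add(self, module):
                self.modules.append(module)

            def describe(self):
                for m in self.modules:
                    print(m.name, m.parameters)

        wf = Workflow()
        wf.add(Module("muonSelector", ptMin=20.0, etaMax=2.4))
        wf.add(Module("jetCleaner", deltaR=0.4))
        wf.describe()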

  12. Purdue ionomics information management system. An integrated functional genomics platform.

    PubMed

    Baxter, Ivan; Ouzzani, Mourad; Orcun, Seza; Kennedy, Brad; Jandhyala, Shrinivas S; Salt, David E

    2007-02-01

    The advent of high-throughput phenotyping technologies has created a deluge of information that is difficult to deal with without the appropriate data management tools. These data management tools should integrate defined workflow controls for genomic-scale data acquisition and validation, data storage and retrieval, and data analysis, indexed around the genomic information of the organism of interest. To maximize the impact of these large datasets, it is critical that they are rapidly disseminated to the broader research community, allowing open access for data mining and discovery. We describe here a system that incorporates such functionalities developed around the Purdue University high-throughput ionomics phenotyping platform. The Purdue Ionomics Information Management System (PiiMS) provides integrated workflow control, data storage, and analysis to facilitate high-throughput data acquisition, along with integrated tools for data search, retrieval, and visualization for hypothesis development. PiiMS is deployed as a World Wide Web-enabled system, allowing for integration of distributed workflow processes and open access to raw data for analysis by numerous laboratories. PiiMS currently contains data on shoot concentrations of P, Ca, K, Mg, Cu, Fe, Zn, Mn, Co, Ni, B, Se, Mo, Na, As, and Cd in over 60,000 shoot tissue samples of Arabidopsis (Arabidopsis thaliana), including ethyl methanesulfonate, fast-neutron and defined T-DNA mutants, and natural accession and populations of recombinant inbred lines from over 800 separate experiments, representing over 1,000,000 fully quantitative elemental concentrations. PiiMS is accessible at www.purdue.edu/dp/ionomics.

  13. A tutorial of diverse genome analysis tools found in the CoGe web-platform using Plasmodium spp. as a model

    PubMed Central

    Castillo, Andreina I; Nelson, Andrew D L; Haug-Baltzell, Asher K; Lyons, Eric

    2018-01-01

    Integrated platforms for storage, management, analysis and sharing of large quantities of omics data have become fundamental to comparative genomics. CoGe (https://genomevolution.org/coge/) is an online platform designed to manage and study genomic data, enabling both data- and hypothesis-driven comparative genomics. CoGe’s tools and resources can be used to organize and analyse both publicly available and private genomic data from any species. Here, we demonstrate the capabilities of CoGe through three example workflows using 17 Plasmodium genomes as a model. Plasmodium genomes present unique challenges for comparative genomics due to their rapidly evolving and highly variable genomic AT/GC content. These example workflows are intended to serve as templates to help guide researchers who would like to use CoGe to examine diverse aspects of genome evolution. In the first workflow, trends in genome composition and amino acid usage are explored. In the second, changes in genome structure and the distribution of synonymous (Ks) and non-synonymous (Kn) substitution values are evaluated across species with different levels of evolutionary relatedness. In the third workflow, microsyntenic analyses of multigene families’ genomic organization are conducted using two Plasmodium-specific gene families—serine repeat antigen, and cytoadherence-linked asexual gene—as models. In general, these example workflows show how to achieve quick, reproducible and shareable results using the CoGe platform. We were able to replicate previously published results, as well as leverage CoGe’s tools and resources to gain additional insight into various aspects of Plasmodium genome evolution. Our results highlight the usefulness of the CoGe platform, particularly in understanding complex features of genome evolution. Database URL: https://genomevolution.org/coge/

  14. Scientific Data Management (SDM) Center for Enabling Technologies. 2007-2012

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ludascher, Bertram; Altintas, Ilkay

    Over the past five years, our activities have both established Kepler as a viable scientific workflow environment and demonstrated its value across multiple science applications. We have published numerous peer-reviewed papers on the technologies highlighted in this short paper and have given Kepler tutorials at SC06, SC07, SC08, and SciDAC 2007. Our outreach activities have allowed scientists to learn best practices and better utilize Kepler to address their individual workflow problems. Our contributions to advancing the state of the art in scientific workflows have focused on the following areas; progress in each is described in subsequent sections. Workflow development: the development of a deeper understanding of scientific workflows "in the wild" and of the requirements for support tools that allow easy construction of complex scientific workflows. Generic workflow components and templates: the development of generic actors (i.e. workflow components and processes) which can be broadly applied to scientific problems. Provenance collection and analysis: the design of a flexible provenance collection and analysis infrastructure within the workflow environment. Workflow reliability and fault tolerance: the improvement of the reliability and fault-tolerance of workflow environments.

  15. Schedule-Aware Workflow Management Systems

    NASA Astrophysics Data System (ADS)

    Mans, Ronny S.; Russell, Nick C.; van der Aalst, Wil M. P.; Moleman, Arnold J.; Bakker, Piet J. M.

    Contemporary workflow management systems offer work-items to users through specific work-lists. Users select the work-items they will perform without having a specific schedule in mind. However, in many environments work needs to be scheduled and performed at particular times. For example, in hospitals many work-items are linked to appointments, e.g., a doctor cannot perform surgery without reserving an operating theater and making sure that the patient is present. One of the problems when applying workflow technology in such domains is the lack of calendar-based scheduling support. In this paper, we present an approach that supports the seamless integration of unscheduled (flow) and scheduled (schedule) tasks. Using CPN Tools we have developed a specification and simulation model for schedule-aware workflow management systems. Based on this a system has been realized that uses YAWL, Microsoft Exchange Server 2007, Outlook, and a dedicated scheduling service. The approach is illustrated using a real-life case study at the AMC hospital in the Netherlands. In addition, we elaborate on the experiences obtained when developing and implementing a system of this scale using formal techniques.

  16. Systematic Redaction for Neuroimage Data

    PubMed Central

    Matlock, Matt; Schimke, Nakeisha; Kong, Liang; Macke, Stephen; Hale, John

    2013-01-01

    In neuroscience, collaboration and data sharing are undermined by concerns over the management of protected health information (PHI) and personal identifying information (PII) in neuroimage datasets. The HIPAA Privacy Rule mandates measures for the preservation of subject privacy in neuroimaging studies. Unfortunately for the researcher, the management of information privacy is a burdensome task. Wide scale data sharing of neuroimages is challenging for three primary reasons: (i) A dearth of tools to systematically expunge PHI/PII from neuroimage data sets, (ii) a facility for tracking patient identities in redacted datasets has not been produced, and (iii) a sanitization workflow remains conspicuously absent. This article describes the XNAT Redaction Toolkit—an integrated redaction workflow which extends a popular neuroimage data management toolkit to remove PHI/PII from neuroimages. Quickshear defacing is also presented as a complementary technique for deidentifying the image data itself. Together, these tools improve subject privacy through systematic removal of PII/PHI. PMID:24179597
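
    A sketch of scrubbing common PHI/PII header fields from a DICOM file with pydicom. This is not the XNAT Redaction Toolkit itself, and it does not deface the image pixels (the problem Quickshear addresses); the tag list and file names are illustrative.

        import pydicom

        PHI_TAGS = ["PatientName", "PatientBirthDate", "PatientAddress",
                    "ReferringPhysicianName", "InstitutionName"]

        def redact(in_path, out_path, subject_code):
            ds = pydicom.dcmread(in_path)
            for tag in PHI_TAGS:
                if tag in ds:
                    setattr(ds, tag, "")       # blank out identifying header fields
            ds.PatientID = subject_code        # replace identity with a study code
            ds.remove_private_tags()           # private tags often carry PHI
            ds.save_as(out_path)

        redact("scan_0001.dcm", "scan_0001_redacted.dcm", subject_code="SUBJ-042")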

  17. a Standardized Approach to Topographic Data Processing and Workflow Management

    NASA Astrophysics Data System (ADS)

    Wheaton, J. M.; Bailey, P.; Glenn, N. F.; Hensleigh, J.; Hudak, A. T.; Shrestha, R.; Spaete, L.

    2013-12-01

    An ever-increasing list of options exists for collecting high-resolution topographic data, including airborne LIDAR, terrestrial laser scanners, bathymetric SONAR and structure-from-motion. An equally rich, arguably overwhelming, variety of tools exists with which to organize, quality control, filter, analyze and summarize these data. However, scientists are often left to cobble together their analysis as a series of ad hoc steps, often using custom scripts and one-time processes that are poorly documented and rarely shared with the community. Even when literature-cited software tools are used, the input and output parameters differ from tool to tool. These parameters are rarely archived and the steps performed are lost, making the analysis virtually impossible to replicate precisely. What is missing is a coherent, robust framework for combining reliable, well-documented topographic data-processing steps into a workflow that can be repeated and even shared with others. We have taken several popular topographic data processing tools - including point cloud filtering and decimation as well as DEM differencing - and defined a common protocol for passing inputs and outputs between them. This presentation describes a free, public online portal that enables scientists to create custom workflows for processing topographic data using a number of popular topographic processing tools. Users provide the inputs required for each tool and the sequence in which they want to combine them. This information is then stored for future reuse (and optionally sharing with others) before the user downloads a single package that contains all the input and output specifications together with the software tools themselves. The user then launches the included batch file that executes the workflow on their local computer against their topographic data. This ZCloudTools architecture helps standardize, automate and archive topographic data processing. It also represents a forum for discovering and sharing effective topographic processing workflows.
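
    One of the processing steps named above, DEM differencing, sketched with rasterio. The file names are hypothetical, the sketch assumes both DEMs are already on the same grid, and this is not the ZCloudTools implementation.

        import rasterio

        with rasterio.open("dem_2012.tif") as src_a, rasterio.open("dem_2013.tif") as src_b:
            dem_a = src_a.read(1, masked=True)
            dem_b = src_b.read(1, masked=True)
            profile = src_a.profile

        dod = dem_b - dem_a                     # DEM of difference (elevation change)
        profile.update(dtype="float32", nodata=-9999.0)

        with rasterio.open("dod_2012_2013.tif", "w", **profile) as dst:
            dst.write(dod.filled(-9999.0).astype("float32"), 1)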

  18. Creating a comprehensive customer service program to help convey critical and acute results of radiology studies.

    PubMed

    Towbin, Alexander J; Hall, Seth; Moskovitz, Jay; Johnson, Neil D; Donnelly, Lane F

    2011-01-01

    Communication of acute or critical results between the radiology department and referring clinicians has been a deficiency of many radiology departments. The failure to perform or document these communications can lead to poor patient care, patient safety issues, medical-legal issues, and complaints from referring clinicians. To mitigate these factors, a communication and documentation tool was created and incorporated into our departmental customer service program. This article will describe the implementation of a comprehensive customer service program in a hospital-based radiology department. A comprehensive customer service program was created in the radiology department. Customer service representatives were hired to answer the telephone calls to the radiology reading rooms and to help convey radiology results. The radiologists, referring clinicians, and customer service representatives were then linked via a novel workflow management system. This workflow management system provided tools to help facilitate the communication needs of each group. The number of studies with results conveyed was recorded from the implementation of the workflow management system. Between the implementation of the workflow management system on August 1, 2005, and June 1, 2009, 116,844 radiology results were conveyed to the referring clinicians and documented in the system. This accounts for more than 14% of the 828,516 radiology cases performed in this time frame. We have been successful in creating a comprehensive customer service program to convey and document communication of radiology results. This program has been widely used by the ordering clinicians as well as radiologists since its inception.

  19. An ontology-based framework for bioinformatics workflows.

    PubMed

    Digiampietri, Luciano A; Perez-Alcazar, Jose de J; Medeiros, Claudia Bauzer

    2007-01-01

    The proliferation of bioinformatics activities brings new challenges - how to understand and organise these resources, how to exchange and reuse successful experimental procedures, and how to provide interoperability among data and tools. This paper describes an effort toward these directions. It is based on combining research on ontology management, AI and scientific workflows to design, reuse and annotate bioinformatics experiments. The resulting framework supports automatic or interactive composition of tasks based on AI planning techniques and takes advantage of ontologies to support the specification and annotation of bioinformatics workflows. We validate our proposal with a prototype running on real data.

  20. Coupling of a continuum ice sheet model and a discrete element calving model using a scientific workflow system

    NASA Astrophysics Data System (ADS)

    Memon, Shahbaz; Vallot, Dorothée; Zwinger, Thomas; Neukirchen, Helmut

    2017-04-01

    Scientific communities generate complex simulations through the orchestration of semi-structured analysis pipelines, which involves the execution of large workflows on multiple, distributed and heterogeneous computing and data resources. Modeling the ice dynamics of glaciers requires workflows consisting of many non-trivial, computationally expensive processing tasks which are coupled to each other. From this domain, we present an e-Science use case, a workflow, which requires the execution of a continuum ice flow model and a discrete element based calving model in an iterative manner. Apart from the model runs, this workflow also contains data format conversion tasks that link the ice flow and calving steps through sequential, nested and iterative stages. Thus, the management and monitoring of all the processing tasks, including data management and transfer, become more complex. From the implementation perspective, this workflow model was initially developed as a set of scripts using static data input and output references. As more scripts or modifications were introduced to meet user requirements, the debugging and validation of results became more cumbersome. To address these problems, we identified a need for a high-level scientific workflow tool through which all the above-mentioned processes can be achieved in an efficient and usable manner. We decided to make use of the e-Science middleware UNICORE (Uniform Interface to Computing Resources), which allows seamless and automated access to different heterogeneous and distributed resources and is supported by a scientific workflow engine. Based on this, we developed a high-level scientific workflow model for coupling of massively parallel High-Performance Computing (HPC) jobs: a continuum ice sheet model (Elmer/Ice) and a discrete element calving and crevassing model (HiDEM). In our talk we present how the use of a high-level scientific workflow middleware makes reproducibility of results more convenient and also provides a reusable and portable workflow template that can be deployed across different computing infrastructures. Acknowledgements This work was kindly supported by NordForsk as part of the Nordic Center of Excellence (NCoE) eSTICC (eScience Tools for Investigating Climate Change at High Northern Latitudes) and the Top-level Research Initiative NCoE SVALI (Stability and Variation of Arctic Land Ice).
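
    A bare-bones version of the iterative coupling loop described above, written as a plain Python driver. The executable and converter names are hypothetical placeholders; in the actual work the steps are realized as UNICORE workflow tasks running on HPC resources.

        import subprocess

        N_CYCLES = 5

        def run(cmd):
            print("running:", " ".join(cmd))
            subprocess.run(cmd, check=True)

        for cycle in range(N_CYCLES):
            # 1) continuum ice flow step (Elmer/Ice): update geometry and velocity
            run(["./elmer_ice_step.sh", f"cycle_{cycle}"])
            # 2) convert continuum output into particle input for the calving model
            run(["./convert_elmer_to_hidem.py", f"cycle_{cycle}"])
            # 3) discrete-element calving/crevassing step (HiDEM)
            run(["./hidem_step.sh", f"cycle_{cycle}"])
            # 4) feed the updated calving front back into the ice flow model
            run(["./convert_hidem_to_elmer.py", f"cycle_{cycle}"])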

  1. Applying Content Management to Automated Provenance Capture

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Schuchardt, Karen L.; Gibson, Tara D.; Stephan, Eric G.

    2008-04-10

    Workflows and data pipelines are becoming increasingly valuable in both computational and experimental sciences. These automated systems are capable of generating significantly more data within the same amount of time than their manual counterparts. Automatically capturing and recording data provenance and annotation as part of these workflows is critical for data management, verification, and dissemination. Our goal in addressing the provenance challenge was to develop an end-to-end system that demonstrates real-time capture, persistent content management, and ad-hoc searches of both provenance and metadata using open source software and standard protocols. We describe our prototype, which extends the Kepler workflow tools for the execution environment, the Scientific Annotation Middleware (SAM) content management software for data services, and an existing HTTP-based query protocol. Our implementation offers several unique capabilities, and through the use of standards, is able to provide access to the provenance record to a variety of commonly available client tools.
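
    A toy illustration of the general idea of capturing provenance as workflow steps execute: a decorator records each step's inputs, outputs and timestamps. This is not the Kepler/SAM prototype described above.

        import functools
        import json
        import time

        PROVENANCE_LOG = []

        def record_provenance(step):
            @functools.wraps(step)
            def wrapper(*args, **kwargs):
                start = time.time()
                result = step(*args, **kwargs)
                PROVENANCE_LOG.append({
                    "step": step.__name__,
                    "inputs": {"args": repr(args), "kwargs": repr(kwargs)},
                    "output": repr(result),
                    "started": start,
                    "ended": time.time(),
                })
                return result
            return wrapper

        @record_provenance
        def normalize(values):
            total = sum(values)
            return [v / total for v in values]

        normalize([2.0, 3.0, 5.0])
        print(json.dumps(PROVENANCE_LOG, indent=2))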

  2. Towards Exascale Seismic Imaging and Inversion

    NASA Astrophysics Data System (ADS)

    Tromp, J.; Bozdag, E.; Lefebvre, M. P.; Smith, J. A.; Lei, W.; Ruan, Y.

    2015-12-01

    Post-petascale supercomputers are now available to solve complex scientific problems that were thought unreachable a few decades ago. They also bring a cohort of concerns tied to obtaining optimum performance. Several issues are currently being investigated by the HPC community. These include energy consumption, fault resilience, scalability of the current parallel paradigms, workflow management, I/O performance and feature extraction with large datasets. In this presentation, we focus on the last three issues. In the context of seismic imaging and inversion, in particular for simulations based on adjoint methods, workflows are well defined. They consist of a few collective steps (e.g., mesh generation or model updates) and of a large number of independent steps (e.g., forward and adjoint simulations of each seismic event, pre- and postprocessing of seismic traces). The greater goal is to reduce the time to solution, that is, to obtain a more precise representation of the subsurface as fast as possible. This brings us to consider both the workflow in its entirety and the parts comprising it. The usual approach is to speed up the purely computational parts based on code optimization in order to reach higher FLOPS and better memory management. This remains an important concern, but larger scale experiments show that the imaging workflow suffers from severe I/O bottlenecks. Such limitations occur both for purely computational data and seismic time series. The latter are dealt with by the introduction of a new Adaptable Seismic Data Format (ASDF). Parallel I/O libraries, namely HDF5 and ADIOS, are used to drastically reduce the cost of disk access. Parallel visualization tools, such as VisIt, are able to take advantage of ADIOS metadata to extract features and display massive datasets. Because large parts of the workflow are embarrassingly parallel, we are investigating the possibility of automating the imaging process with the integration of scientific workflow management tools, specifically Pegasus.
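
    A sketch of storing seismic traces with per-trace metadata in a single HDF5 container using h5py, in the spirit of the ASDF motivation above. The group and attribute layout is illustrative only, not the actual ASDF specification, and parallel I/O through ADIOS is not shown.

        import numpy as np
        import h5py

        rng = np.random.default_rng(1)
        with h5py.File("event_0001.h5", "w") as f:
            waveforms = f.create_group("Waveforms")
            for station in ("II.AAK", "IU.ANMO"):
                trace = rng.standard_normal(3600).astype("float32")
                dset = waveforms.create_dataset(station, data=trace)
                dset.attrs["sampling_rate_hz"] = 1.0
                dset.attrs["starttime"] = "2015-01-01T00:00:00"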

  3. Report Central: quality reporting tool in an electronic health record.

    PubMed

    Jung, Eunice; Li, Qi; Mangalampalli, Anil; Greim, Julie; Eskin, Michael S; Housman, Dan; Isikoff, Jeremy; Abend, Aaron H; Middleton, Blackford; Einbinder, Jonathan S

    2006-01-01

    Quality reporting tools, integrated with ambulatory electronic health records, can help clinicians and administrators understand performance, manage populations, and improve quality. Report Central is a secure web report delivery tool built on Crystal Reports XI™ and ASP.NET technologies. Pilot evaluation of Report Central indicates that clinicians prefer a quality reporting tool that is integrated with our home-grown EHR to support clinical workflow.

  4. Purdue Ionomics Information Management System. An Integrated Functional Genomics Platform

    PubMed Central

    Baxter, Ivan; Ouzzani, Mourad; Orcun, Seza; Kennedy, Brad; Jandhyala, Shrinivas S.; Salt, David E.

    2007-01-01

    The advent of high-throughput phenotyping technologies has created a deluge of information that is difficult to deal with without the appropriate data management tools. These data management tools should integrate defined workflow controls for genomic-scale data acquisition and validation, data storage and retrieval, and data analysis, indexed around the genomic information of the organism of interest. To maximize the impact of these large datasets, it is critical that they are rapidly disseminated to the broader research community, allowing open access for data mining and discovery. We describe here a system that incorporates such functionalities developed around the Purdue University high-throughput ionomics phenotyping platform. The Purdue Ionomics Information Management System (PiiMS) provides integrated workflow control, data storage, and analysis to facilitate high-throughput data acquisition, along with integrated tools for data search, retrieval, and visualization for hypothesis development. PiiMS is deployed as a World Wide Web-enabled system, allowing for integration of distributed workflow processes and open access to raw data for analysis by numerous laboratories. PiiMS currently contains data on shoot concentrations of P, Ca, K, Mg, Cu, Fe, Zn, Mn, Co, Ni, B, Se, Mo, Na, As, and Cd in over 60,000 shoot tissue samples of Arabidopsis (Arabidopsis thaliana), including ethyl methanesulfonate, fast-neutron and defined T-DNA mutants, and natural accession and populations of recombinant inbred lines from over 800 separate experiments, representing over 1,000,000 fully quantitative elemental concentrations. PiiMS is accessible at www.purdue.edu/dp/ionomics. PMID:17189337

  5. ASCEM Data Browser (ASCEMDB) v0.8

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    ROMOSAN, ALEXANDRU

    Data management tool designed for the Advanced Simulation Capability for Environmental Management (ASCEM) framework. Distinguishing features of this gateway include: (1) handling of complex geometry data, (2) an advanced selection mechanism, (3) state-of-the-art rendering of spatiotemporal data records, and (4) seamless integration with a distributed workflow engine.

  6. Mobile task management tool that improves workflow of an acute general surgical service.

    PubMed

    Foo, Elizabeth; McDonald, Rod; Savage, Earle; Floyd, Richard; Butler, Anthony; Rumball-Smith, Alistair; Connor, Saxon

    2015-10-01

    Understanding and being able to measure constraints within a health system is crucial if outcomes are to be improved. Current systems lack the ability to capture decision making with regard to tasks performed within a patient journey. The aim of this study was to assess the impact of a mobile task management tool on clinical workflow within an acute general surgical service by analysing data capture and usability of the application tool. The Cortex iOS application was developed to digitize patient flow and provide real-time visibility over clinical decision making and task performance. Study outcomes measured were workflow data capture for patient and staff events. Usability was assessed using an electronic survey. There were 449 unique patient journeys tracked with a total of 3072 patient events recorded. The results repository was accessed 7792 times. The participants reported that the application sped up decision making, reduced redundancy of work and improved team communication. The mode of the estimated time the application saved participants was 5-9 min/h of work. Of the 14 respondents, nine discarded their analogue methods of tracking tasks by the end of the study period. The introduction of a mobile task management system improved the working efficiency of junior clinical staff. The application allowed capture of data not previously available to hospital systems. In the future, such data will contribute to the accurate mapping of patient journeys through the health system. © 2015 Royal Australasian College of Surgeons.

  7. Strategic Planning for Electronic Resources Management: A Case Study at Gustavus Adolphus College

    ERIC Educational Resources Information Center

    Hulseberg, Anna; Monson, Sarah

    2009-01-01

    Electronic resources, the tools we use to manage them, and the needs and expectations of our users are constantly evolving; at the same time, the roles, responsibilities, and workflow of the library staff who manage e-resources are also in flux. Recognizing a need to be more intentional and proactive about how we manage e-resources, the…

  8. Automation is key to managing a population's health.

    PubMed

    Matthews, Michael B; Hodach, Richard

    2012-04-01

    Online tools for automating population health management can help healthcare organizations meet their patients' needs both during and between encounters with the healthcare system. These tools can facilitate: the use of registries to track patients' health status and care gaps; outbound messaging to notify patients when they need care; care team management of more patients at different levels of risk; automation of workflows related to case management and transitions of care; online educational and mobile health interventions to engage patients in their care; and analytics programs to identify opportunities for improvement.

  9. From chart tracking to workflow management.

    PubMed Central

    Srinivasan, P.; Vignes, G.; Venable, C.; Hazelwood, A.; Cade, T.

    1994-01-01

    The current interest in system-wide integration appears to be based on the assumption that an organization, by digitizing information and accepting a common standard for the exchange of such information, will improve the accessibility of this information and automatically experience benefits resulting from its more productive use. We do not dispute this reasoning, but assert that an organization's capacity for effective change is proportional to the understanding of the current structure among its personnel. Our workflow manager is based on the use of a Parameterized Petri Net (PPN) model which can be configured to represent an arbitrarily detailed picture of an organization. The PPN model can be animated to observe the model organization in action, and the results of the animation analyzed. This simulation is a dynamic ongoing process which changes with the system and allows members of the organization to pose "what if" questions as a means of exploring opportunities for change. We present, the "workflow management system" as the natural successor to the tracking program, incorporating modeling, scheduling, reactive planning, performance evaluation, and simulation. This workflow management system is more than adequate for meeting the needs of a paper chart tracking system, and, as the patient record is computerized, will serve as a planning and evaluation tool in converting the paper-based health information system into a computer-based system. PMID:7950051
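
    As a rough illustration of the Parameterized Petri Net idea described above, the sketch below implements a minimal token-game simulation in Python. The places, transitions and initial marking model a hypothetical chart-request process and are not the authors' PPN implementation.

    ```python
    # Minimal Petri-net "token game", loosely in the spirit of the PPN-based
    # workflow manager described above. Places, transitions and the initial
    # marking are hypothetical; this is not the authors' implementation.

    from dataclasses import dataclass, field

    @dataclass
    class PetriNet:
        marking: dict                                     # place -> number of tokens
        transitions: dict = field(default_factory=dict)   # name -> (inputs, outputs)

        def add_transition(self, name, inputs, outputs):
            self.transitions[name] = (inputs, outputs)

        def enabled(self, name):
            inputs, _ = self.transitions[name]
            return all(self.marking.get(p, 0) >= n for p, n in inputs.items())

        def fire(self, name):
            if not self.enabled(name):
                raise ValueError(f"transition {name!r} is not enabled")
            inputs, outputs = self.transitions[name]
            for p, n in inputs.items():
                self.marking[p] -= n
            for p, n in outputs.items():
                self.marking[p] = self.marking.get(p, 0) + n

    # A toy chart-tracking process: a request consumes a chart from the archive,
    # the chart is delivered to a clinic, then returned and re-filed.
    net = PetriNet(marking={"request_pending": 1, "chart_in_archive": 1})
    net.add_transition("pull_chart", {"request_pending": 1, "chart_in_archive": 1},
                       {"chart_in_transit": 1})
    net.add_transition("deliver", {"chart_in_transit": 1}, {"chart_at_clinic": 1})
    net.add_transition("return_chart", {"chart_at_clinic": 1}, {"chart_in_archive": 1})

    for t in ["pull_chart", "deliver", "return_chart"]:
        net.fire(t)
        print(t, "->", net.marking)
    ```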

  10. Report Central: Quality Reporting Tool in an Electronic Health Record

    PubMed Central

    Jung, Eunice; Li, Qi; Mangalampalli, Anil; Greim, Julie; Eskin, Michael S.; Housman, Dan; Isikoff, Jeremy; Abend, Aaron H.; Middleton, Blackford; Einbinder, Jonathan S.

    2006-01-01

    Quality reporting tools, integrated with ambulatory electronic health records, can help clinicians and administrators understand performance, manage populations, and improve quality. Report Central is a secure web report delivery tool built on Crystal Reports XI™ and ASP.NET technologies. Pilot evaluation of Report Central indicates that clinicians prefer a quality reporting tool that is integrated with our home-grown EHR to support clinical workflow. PMID:17238590

  11. Using Kepler for Tool Integration in Microarray Analysis Workflows.

    PubMed

    Gan, Zhuohui; Stowe, Jennifer C; Altintas, Ilkay; McCulloch, Andrew D; Zambon, Alexander C

    Increasing numbers of genomic technologies are leading to massive amounts of genomic data, all of which requires complex analysis. More and more bioinformatics analysis tools are being developed by scientists to simplify these analyses. However, different pipelines have been developed using different software environments. This makes integration of these diverse bioinformatics tools difficult. Kepler provides an open source environment to integrate these disparate packages. Using Kepler, we integrated several external tools including Bioconductor packages, AltAnalyze (a Python-based open source tool), and an R-based comparison tool to build an automated workflow to meta-analyze both online and local microarray data. The automated workflow connects the integrated tools seamlessly, delivers data flow between the tools smoothly, and hence improves the efficiency and accuracy of complex data analyses. Our workflow exemplifies the usage of Kepler as a scientific workflow platform for bioinformatics pipelines.
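
    Kepler composes such pipelines graphically from reusable actors; the plain-Python sketch below illustrates only the underlying idea of chaining heterogeneous external tools with explicit data flow. The tool names, scripts and file paths are placeholders, not the actual Bioconductor or AltAnalyze invocations used in the paper.

    ```python
    # Illustrative chaining of external analysis tools with explicit data flow,
    # in the spirit of the Kepler workflow described above. Tool names and
    # arguments are placeholders, not the real AltAnalyze/Bioconductor commands.

    import subprocess
    from pathlib import Path

    def run_step(name, cmd, output):
        """Run one pipeline step and return the path of the file it produced."""
        print(f"[{name}] running: {' '.join(cmd)}")
        subprocess.run(cmd, check=True)          # raise if the tool fails
        return Path(output)

    def microarray_pipeline(raw_dir: str, workdir: str = "results"):
        Path(workdir).mkdir(exist_ok=True)
        normalized = run_step(
            "normalize",
            ["Rscript", "normalize_arrays.R", raw_dir, f"{workdir}/normalized.csv"],
            f"{workdir}/normalized.csv")
        annotated = run_step(
            "annotate",
            ["python", "alt_analyze_wrapper.py", str(normalized), f"{workdir}/annotated.csv"],
            f"{workdir}/annotated.csv")
        return run_step(
            "compare",
            ["Rscript", "compare_conditions.R", str(annotated), f"{workdir}/diff_expr.csv"],
            f"{workdir}/diff_expr.csv")

    if __name__ == "__main__":
        microarray_pipeline("raw_cel_files/")
    ```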

  12. A microseismic workflow for managing induced seismicity risk at CO2 storage projects

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Matzel, E.; Morency, C.; Pyle, M.

    2015-10-27

    It is well established that fluid injection has the potential to induce earthquakes—from microseismicity to large, damaging events—by altering state-of-stress conditions in the subsurface. While induced seismicity has not been a major operational issue for carbon storage projects to date, a seismicity hazard exists and must be carefully addressed. Two essential components of effective seismic risk management are (1) sensitive microseismic monitoring and (2) robust data interpretation tools. This report describes a novel workflow, based on advanced processing algorithms applied to microseismic data, to help improve management of seismic risk. This workflow has three main goals: (1) to improve the resolution and reliability of passive seismic monitoring, (2) to extract additional, valuable information from continuous waveform data that is often ignored in standard processing, and (3) to minimize the turn-around time between data collection, interpretation, and decision-making. These three objectives can allow for a better-informed and rapid response to changing subsurface conditions.

  13. Workflow Management for Complex HEP Analyses

    NASA Astrophysics Data System (ADS)

    Erdmann, M.; Fischer, R.; Rieger, M.; von Cube, R. F.

    2017-10-01

    We present the novel Analysis Workflow Management (AWM) that provides users with the tools and competences of professional large-scale workflow systems, e.g. Apache’s Airavata[1]. The approach presents a paradigm shift from executing parts of the analysis to defining the analysis. Within AWM an analysis consists of steps. For example, a step may define running a certain executable for multiple files of an input data collection. Each call to the executable for one of those input files can be submitted to the desired run location, which could be the local computer or a remote batch system. An integrated software manager enables automated user installation of dependencies in the working directory at the run location. Each execution of a step item creates one report for bookkeeping purposes containing error codes and output data or file references. Required files, e.g. created by previous steps, are retrieved automatically. Since data storage and run locations are exchangeable from the steps' perspective, computing resources can be used opportunistically. A visualization of the workflow as a graph of the steps in the web browser provides a high-level view on the analysis. The workflow system is developed and tested alongside a ttbb cross-section measurement where, for instance, the event selection is represented by one step and a Bayesian statistical inference is performed by another. The clear interface and dependencies between steps enable a make-like execution of the whole analysis.
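
    The sketch below illustrates the step/report/make-like ideas described above in plain Python; it is not the AWM code, and the step names, commands and file names are hypothetical.

    ```python
    # A minimal make-like step runner in the spirit of the AWM description above:
    # each step declares inputs and outputs, required inputs are produced by
    # earlier steps, and a step is skipped if its outputs already exist.
    # Step names and commands are hypothetical.

    import subprocess
    from pathlib import Path

    class Step:
        def __init__(self, name, cmd, inputs=(), outputs=()):
            self.name, self.cmd = name, cmd
            self.inputs = [Path(p) for p in inputs]
            self.outputs = [Path(p) for p in outputs]

        def up_to_date(self):
            return self.outputs and all(p.exists() for p in self.outputs)

        def run(self):
            if self.up_to_date():
                print(f"[{self.name}] outputs exist, skipping")
                return {"step": self.name, "status": "skipped"}
            missing = [str(p) for p in self.inputs if not p.exists()]
            if missing:
                raise FileNotFoundError(f"{self.name}: missing inputs {missing}")
            result = subprocess.run(self.cmd, shell=True)
            return {"step": self.name,                       # simple bookkeeping report
                    "status": "ok" if result.returncode == 0 else "failed",
                    "returncode": result.returncode}

    analysis = [
        Step("select_events", "echo selection > selected.txt", outputs=["selected.txt"]),
        Step("fit", "echo fit > fit.json", inputs=["selected.txt"], outputs=["fit.json"]),
    ]

    reports = [step.run() for step in analysis]   # steps run in dependency order
    print(reports)
    ```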

  14. The Perfect Neuroimaging-Genetics-Computation Storm: Collision of Petabytes of Data, Millions of Hardware Devices and Thousands of Software Tools

    PubMed Central

    Dinov, Ivo D.; Petrosyan, Petros; Liu, Zhizhong; Eggert, Paul; Zamanyan, Alen; Torri, Federica; Macciardi, Fabio; Hobel, Sam; Moon, Seok Woo; Sung, Young Hee; Jiang, Zhiguo; Labus, Jennifer; Kurth, Florian; Ashe-McNalley, Cody; Mayer, Emeran; Vespa, Paul M.; Van Horn, John D.; Toga, Arthur W.

    2013-01-01

    The volume, diversity and velocity of biomedical data are exponentially increasing providing petabytes of new neuroimaging and genetics data every year. At the same time, tens-of-thousands of computational algorithms are developed and reported in the literature along with thousands of software tools and services. Users demand intuitive, quick and platform-agnostic access to data, software tools, and infrastructure from millions of hardware devices. This explosion of information, scientific techniques, computational models, and technological advances leads to enormous challenges in data analysis, evidence-based biomedical inference and reproducibility of findings. The Pipeline workflow environment provides a crowd-based distributed solution for consistent management of these heterogeneous resources. The Pipeline allows multiple (local) clients and (remote) servers to connect, exchange protocols, control the execution, monitor the states of different tools or hardware, and share complete protocols as portable XML workflows. In this paper, we demonstrate several advanced computational neuroimaging and genetics case-studies, and end-to-end pipeline solutions. These are implemented as graphical workflow protocols in the context of analyzing imaging (sMRI, fMRI, DTI), phenotypic (demographic, clinical), and genetic (SNP) data. PMID:23975276

  15. The Protein Information Management System (PiMS): a generic tool for any structural biology research laboratory

    PubMed Central

    Morris, Chris; Pajon, Anne; Griffiths, Susanne L.; Daniel, Ed; Savitsky, Marc; Lin, Bill; Diprose, Jonathan M.; Wilter da Silva, Alan; Pilicheva, Katya; Troshin, Peter; van Niekerk, Johannes; Isaacs, Neil; Naismith, James; Nave, Colin; Blake, Richard; Wilson, Keith S.; Stuart, David I.; Henrick, Kim; Esnouf, Robert M.

    2011-01-01

    The techniques used in protein production and structural biology have been developing rapidly, but techniques for recording the laboratory information produced have not kept pace. One approach is the development of laboratory information-management systems (LIMS), which typically use a relational database schema to model and store results from a laboratory workflow. The underlying philosophy and implementation of the Protein Information Management System (PiMS), a LIMS development specifically targeted at the flexible and unpredictable workflows of protein-production research laboratories of all scales, is described. PiMS is a web-based Java application that uses either Postgres or Oracle as the underlying relational database-management system. PiMS is available under a free licence to all academic laboratories either for local installation or for use as a managed service. PMID:21460443

  16. The Protein Information Management System (PiMS): a generic tool for any structural biology research laboratory.

    PubMed

    Morris, Chris; Pajon, Anne; Griffiths, Susanne L; Daniel, Ed; Savitsky, Marc; Lin, Bill; Diprose, Jonathan M; da Silva, Alan Wilter; Pilicheva, Katya; Troshin, Peter; van Niekerk, Johannes; Isaacs, Neil; Naismith, James; Nave, Colin; Blake, Richard; Wilson, Keith S; Stuart, David I; Henrick, Kim; Esnouf, Robert M

    2011-04-01

    The techniques used in protein production and structural biology have been developing rapidly, but techniques for recording the laboratory information produced have not kept pace. One approach is the development of laboratory information-management systems (LIMS), which typically use a relational database schema to model and store results from a laboratory workflow. The underlying philosophy and implementation of the Protein Information Management System (PiMS), a LIMS development specifically targeted at the flexible and unpredictable workflows of protein-production research laboratories of all scales, is described. PiMS is a web-based Java application that uses either Postgres or Oracle as the underlying relational database-management system. PiMS is available under a free licence to all academic laboratories either for local installation or for use as a managed service.

  17. Workflow-Based Software Development Environment

    NASA Technical Reports Server (NTRS)

    Izygon, Michel E.

    2013-01-01

    The Software Developer's Assistant (SDA) helps software teams more efficiently and accurately conduct or execute software processes associated with NASA mission-critical software. SDA is a process enactment platform that guides software teams through project-specific standards, processes, and procedures. Software projects are decomposed into all of their required process steps or tasks, and each task is assigned to project personnel. SDA orchestrates the performance of work required to complete all process tasks in the correct sequence. The software then notifies team members when they may begin work on their assigned tasks and provides the tools, instructions, reference materials, and supportive artifacts that allow users to compliantly perform the work. A combination of technology components captures and enacts any software process used to support the software lifecycle. It creates an adaptive workflow environment that can be modified as needed. SDA achieves software process automation through a Business Process Management (BPM) approach to managing the software lifecycle for mission-critical projects. It contains five main parts: TieFlow (workflow engine), Business Rules (rules to alter process flow), Common Repository (storage for project artifacts, versions, history, schedules, etc.), SOA (interface to allow internal, GFE, or COTS tools integration), and the Web Portal Interface (collaborative web environment).
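
    As a toy illustration of the process-enactment idea (tasks assigned to personnel, with team members notified once their predecessor tasks are complete), consider the sketch below; it is not TieFlow or the SDA implementation, and all task and person names are examples.

    ```python
    # A toy process-enactment loop in the spirit of the SDA description above:
    # tasks are assigned to team members and a member is "notified" as soon as
    # all predecessor tasks are complete. This is not TieFlow; names are examples.

    from collections import deque

    tasks = {
        # task: (assignee, predecessors)
        "write_requirements": ("alice", []),
        "design":             ("bob",   ["write_requirements"]),
        "implement":          ("carol", ["design"]),
        "review":             ("alice", ["implement"]),
    }

    completed = set()
    ready = deque(t for t, (_, deps) in tasks.items() if not deps)

    while ready:
        task = ready.popleft()
        assignee, _ = tasks[task]
        print(f"notify {assignee}: '{task}' is ready to start")
        completed.add(task)                      # pretend the work gets done
        for name, (_, deps) in tasks.items():
            if name not in completed and name not in ready and set(deps) <= completed:
                ready.append(name)
    ```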

  18. Jflow: a workflow management system for web applications.

    PubMed

    Mariette, Jérôme; Escudié, Frédéric; Bardou, Philippe; Nabihoudine, Ibouniyamine; Noirot, Céline; Trotard, Marie-Stéphane; Gaspin, Christine; Klopp, Christophe

    2016-02-01

    Biologists produce large data sets and need rich and simple web portals in which they can upload and analyze their files. Providing such tools requires masking the complexity of the underlying High Performance Computing (HPC) environment. The connection between interface and computing infrastructure is usually specific to each portal. With Jflow, we introduce a Workflow Management System (WMS), composed of jQuery plug-ins which can easily be embedded in any web application and a Python library providing all required features to set up, run and monitor workflows. Jflow is available under the GNU General Public License (GPL) at http://bioinfo.genotoul.fr/jflow. The package comes with full documentation, a quick start guide and a running test portal. Jerome.Mariette@toulouse.inra.fr. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  19. IceProd 2 Usage Experience

    NASA Astrophysics Data System (ADS)

    Delventhal, D.; Schultz, D.; Diaz Velez, J. C.

    2017-10-01

    IceProd is a data processing and management framework developed by the IceCube Neutrino Observatory for processing of Monte Carlo simulations, detector data, and data driven analysis. It runs as a separate layer on top of grid and batch systems. This is accomplished by a set of daemons which process job workflow, maintaining configuration and status information on the job before, during, and after processing. IceProd can also manage complex workflow DAGs across distributed computing grids in order to optimize usage of resources. IceProd has recently been rewritten to increase its scaling capabilities, handle user analysis workflows together with simulation production, and facilitate the integration with 3rd party scheduling tools. IceProd 2, the second generation of IceProd, has been running in production for several months now. We share our experience setting up the system and things we’ve learned along the way.

  20. E-Science technologies in a workflow for personalized medicine using cancer screening as a case study.

    PubMed

    Spjuth, Ola; Karlsson, Andreas; Clements, Mark; Humphreys, Keith; Ivansson, Emma; Dowling, Jim; Eklund, Martin; Jauhiainen, Alexandra; Czene, Kamila; Grönberg, Henrik; Sparén, Pär; Wiklund, Fredrik; Cheddad, Abbas; Pálsdóttir, Þorgerður; Rantalainen, Mattias; Abrahamsson, Linda; Laure, Erwin; Litton, Jan-Eric; Palmgren, Juni

    2017-09-01

    We provide an e-Science perspective on the workflow from risk factor discovery and classification of disease to evaluation of personalized intervention programs. As case studies, we use personalized prostate and breast cancer screenings. We describe an e-Science initiative in Sweden, e-Science for Cancer Prevention and Control (eCPC), which supports biomarker discovery and offers decision support for personalized intervention strategies. The generic eCPC contribution is a workflow with 4 nodes applied iteratively, and the concept of e-Science signifies systematic use of tools from the mathematical, statistical, data, and computer sciences. The eCPC workflow is illustrated through 2 case studies. For prostate cancer, an in-house personalized screening tool, the Stockholm-3 model (S3M), is presented as an alternative to prostate-specific antigen testing alone. S3M is evaluated in a trial setting and plans for rollout in the population are discussed. For breast cancer, new biomarkers based on breast density and molecular profiles are developed and the US multicenter Women Informed to Screen Depending on Measures (WISDOM) trial is referred to for evaluation. While current eCPC data management uses a traditional data warehouse model, we discuss eCPC-developed features of a coherent data integration platform. E-Science tools are a key part of an evidence-based process for personalized medicine. This paper provides a structured workflow from data and models to evaluation of new personalized intervention strategies. The importance of multidisciplinary collaboration is emphasized. Importantly, the generic concepts of the suggested eCPC workflow are transferrable to other disease domains, although each disease will require tailored solutions. © The Author 2017. Published by Oxford University Press on behalf of the American Medical Informatics Association.

  1. Workflow management systems in radiology

    NASA Astrophysics Data System (ADS)

    Wendler, Thomas; Meetz, Kirsten; Schmidt, Joachim

    1998-07-01

    In a situation of shrinking health care budgets, increasing cost pressure and growing demands to increase the efficiency and the quality of medical services, health care enterprises are forced to optimize or completely re-design their processes. Although information technology is agreed to potentially contribute to cost reduction and efficiency improvement, the real success factors are the re-definition and automation of processes: Business Process Re-engineering and Workflow Management. In this paper we discuss architectures for the use of workflow management systems in radiology. We propose to move forward from information systems in radiology (RIS, PACS) to Radiology Management Systems, in which workflow functionality (process definitions and process automation) is implemented through autonomous workflow management systems (WfMS). In a workflow-oriented architecture, an autonomous workflow enactment service communicates with workflow client applications via standardized interfaces. In this paper, we discuss the need for and the benefits of such an approach. The separation of workflow management systems from application systems is emphasized, as are the consequences that arise for the architecture of workflow-oriented information systems. This includes an appropriate workflow terminology and the definition of standard interfaces for workflow-aware application systems. Workflow studies in various institutions have shown that most of the processes in radiology are well structured and suited for a workflow management approach. Numerous commercially available Workflow Management Systems (WfMS) were investigated, and some of them, which are process-oriented and application-independent, appear suitable for use in radiology.

  2. Flexible Early Warning Systems with Workflows and Decision Tables

    NASA Astrophysics Data System (ADS)

    Riedel, F.; Chaves, F.; Zeiner, H.

    2012-04-01

    An essential part of early warning systems and systems for crisis management are decision support systems that facilitate communication and collaboration. Often official policies specify how different organizations collaborate and what information is communicated to whom. For early warning systems it is crucial that information is exchanged dynamically in a timely manner and all participants get exactly the information they need to fulfil their role in the crisis management process. Information technology obviously lends itself to automate parts of the process. We have experienced however that in current operational systems the information logistics processes are hard-coded, even though they are subject to change. In addition, systems are tailored to the policies and requirements of a certain organization and changes can require major software refactoring. We seek to develop a system that can be deployed and adapted to multiple organizations with different dynamic runtime policies. A major requirement for such a system is that changes can be applied locally without affecting larger parts of the system. In addition to the flexibility regarding changes in policies and processes, the system needs to be able to evolve; when new information sources become available, it should be possible to integrate and use these in the decision process. In general, this kind of flexibility comes with a significant increase in complexity. This implies that only IT professionals can maintain a system that can be reconfigured and adapted; end-users are unable to utilise the provided flexibility. In the business world similar problems arise and previous work suggested using business process management systems (BPMS) or workflow management systems (WfMS) to guide and automate early warning processes or crisis management plans. However, the usability and flexibility of current WfMS are limited, because current notations and user interfaces are still not suitable for end-users, and workflows are usually only suited for rigid processes. We show how improvements can be achieved by using decision tables and rule-based adaptive workflows. Decision tables have been shown to be an intuitive tool that can be used by domain experts to express rule sets that can be interpreted automatically at runtime. Adaptive workflows use a rule-based approach to increase the flexibility of workflows by providing mechanisms to adapt workflows based on context changes, human intervention and availability of services. The combination of workflows, decision tables and rule-based adaption creates a framework that opens up new possibilities for flexible and adaptable workflows, especially, for use in early warning and crisis management systems.
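
    The following minimal sketch shows a decision table evaluated at runtime and adapted by inserting a new rule, in the spirit of the approach described above. The conditions, thresholds and recipients are hypothetical and do not represent an operational warning policy.

    ```python
    # Illustrative decision table evaluated at runtime, in the spirit of the
    # approach described above. Conditions, thresholds and recipients are
    # hypothetical examples, not an operational warning policy.

    decision_table = [
        # (condition on the event, action to take)
        (lambda e: e["water_level_cm"] >= 300, {"notify": ["civil_protection", "mayor"], "channel": "sms"}),
        (lambda e: e["water_level_cm"] >= 200, {"notify": ["civil_protection"], "channel": "email"}),
        (lambda e: True,                       {"notify": ["duty_officer"], "channel": "log"}),
    ]

    def decide(event, table):
        """Return the action of the first matching rule (rules are ordered)."""
        for condition, action in table:
            if condition(event):
                return action

    event = {"station": "river_gauge_7", "water_level_cm": 240}
    print(decide(event, decision_table))
    # -> {'notify': ['civil_protection'], 'channel': 'email'}

    # Rule-based adaptation: when a new information source becomes available,
    # a rule can be inserted without touching the rest of the workflow.
    decision_table.insert(0, (lambda e: e.get("rainfall_mm_per_h", 0) > 50,
                              {"notify": ["civil_protection"], "channel": "sms"}))
    ```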

  3. Dynamic Voltage Frequency Scaling Simulator for Real Workflows Energy-Aware Management in Green Cloud Computing

    PubMed Central

    Cotes-Ruiz, Iván Tomás; Prado, Rocío P.; García-Galán, Sebastián; Muñoz-Expósito, José Enrique; Ruiz-Reyes, Nicolás

    2017-01-01

    Nowadays, the growing computational capabilities of Cloud systems rely on the reduction of the consumed power of their data centers to make them sustainable and economically profitable. The efficient management of computing resources is at the heart of any energy-aware data center and of special relevance is the adaptation of its performance to workload. Intensive computing applications in diverse areas of science generate complex workload called workflows, whose successful management in terms of energy saving is still at its beginning. WorkflowSim is currently one of the most advanced simulators for research on workflows processing, offering advanced features such as task clustering and failure policies. In this work, an expected power-aware extension of WorkflowSim is presented. This new tool integrates a power model based on a computing-plus-communication design to allow the optimization of new management strategies in energy saving considering computing, reconfiguration and networks costs as well as quality of service, and it incorporates the preeminent strategy for on host energy saving: Dynamic Voltage Frequency Scaling (DVFS). The simulator is designed to be consistent in different real scenarios and to include a wide repertory of DVFS governors. Results showing the validity of the simulator in terms of resources utilization, frequency and voltage scaling, power, energy and time saving are presented. Also, results achieved by the intra-host DVFS strategy with different governors are compared to those of the data center using a recent and successful DVFS-based inter-host scheduling strategy as overlapped mechanism to the DVFS intra-host technique. PMID:28085932
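
    For orientation, the sketch below works through the standard dynamic-power reasoning behind DVFS (dynamic power roughly proportional to C * V^2 * f, so lowering frequency and voltage trades execution time against power). It is a generic textbook model with made-up parameters, not the power model implemented in the WorkflowSim extension.

    ```python
    # A back-of-the-envelope DVFS energy estimate using the standard dynamic-power
    # model P_dyn ~ C * V^2 * f. This is a generic textbook sketch with made-up
    # parameters, not the power model implemented in the WorkflowSim extension.

    def energy_joules(cycles, freq_hz, voltage, capacitance=1e-9, p_static=2.0):
        """Energy = (dynamic power + static power) * execution time."""
        time_s = cycles / freq_hz
        p_dyn = capacitance * voltage**2 * freq_hz
        return (p_dyn + p_static) * time_s, time_s

    task_cycles = 3e12                       # hypothetical workflow task

    for label, f, v in [("performance governor", 3.0e9, 1.20),
                        ("powersave governor",   1.5e9, 0.90)]:
        e, t = energy_joules(task_cycles, f, v)
        print(f"{label}: {t:6.1f} s, {e:7.1f} J")
    ```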

  4. Dynamic Voltage Frequency Scaling Simulator for Real Workflows Energy-Aware Management in Green Cloud Computing.

    PubMed

    Cotes-Ruiz, Iván Tomás; Prado, Rocío P; García-Galán, Sebastián; Muñoz-Expósito, José Enrique; Ruiz-Reyes, Nicolás

    2017-01-01

    Nowadays, the growing computational capabilities of Cloud systems rely on the reduction of the consumed power of their data centers to make them sustainable and economically profitable. The efficient management of computing resources is at the heart of any energy-aware data center and of special relevance is the adaptation of its performance to workload. Intensive computing applications in diverse areas of science generate complex workload called workflows, whose successful management in terms of energy saving is still at its beginning. WorkflowSim is currently one of the most advanced simulators for research on workflows processing, offering advanced features such as task clustering and failure policies. In this work, an expected power-aware extension of WorkflowSim is presented. This new tool integrates a power model based on a computing-plus-communication design to allow the optimization of new management strategies in energy saving considering computing, reconfiguration and networks costs as well as quality of service, and it incorporates the preeminent strategy for on host energy saving: Dynamic Voltage Frequency Scaling (DVFS). The simulator is designed to be consistent in different real scenarios and to include a wide repertory of DVFS governors. Results showing the validity of the simulator in terms of resources utilization, frequency and voltage scaling, power, energy and time saving are presented. Also, results achieved by the intra-host DVFS strategy with different governors are compared to those of the data center using a recent and successful DVFS-based inter-host scheduling strategy as overlapped mechanism to the DVFS intra-host technique.

  5. Case Report: Activity Diagrams for Integrating Electronic Prescribing Tools into Clinical Workflow

    PubMed Central

    Johnson, Kevin B.; FitzHenry, Fern

    2006-01-01

    To facilitate the future implementation of an electronic prescribing system, this case study modeled prescription management processes in various primary care settings. The Vanderbilt e-prescribing design team conducted initial interviews with clinic managers, physicians and nurses, and then represented the sequences of steps carried out to complete prescriptions in activity diagrams. The diagrams covered outpatient prescribing for patients during a clinic visit and between clinic visits. Practice size, practice setting, and practice specialty type influenced the prescribing processes used. The model developed may be useful to others engaged in building or tailoring an e-prescribing system to meet the specific workflows of various clinic settings. PMID:16622168

  6. From Provenance Standards and Tools to Queries and Actionable Provenance

    NASA Astrophysics Data System (ADS)

    Ludaescher, B.

    2017-12-01

    The W3C PROV standard provides a minimal core for sharing retrospective provenance information for scientific workflows and scripts. PROV extensions such as DataONE's ProvONE model are necessary for linking runtime observables in retrospective provenance records with conceptual-level prospective provenance information, i.e., workflow (or dataflow) graphs. Runtime provenance recorders, such as DataONE's RunManager for R, or noWorkflow for Python capture retrospective provenance automatically. YesWorkflow (YW) is a toolkit that allows researchers to declare high-level prospective provenance models of scripts via simple inline comments (YW-annotations), revealing the computational modules and dataflow dependencies in the script. By combining and linking both forms of provenance, important queries and use cases can be supported that neither provenance model can afford on its own. We present existing and emerging provenance tools developed for the DataONE and SKOPE (Synthesizing Knowledge of Past Environments) projects. We show how the different tools can be used individually and in combination to model, capture, share, query, and visualize provenance information. We also present challenges and opportunities for making provenance information more immediately actionable for the researchers who create it in the first place. We argue that such a shift towards "provenance-for-self" is necessary to accelerate the creation, sharing, and use of provenance in support of transparent, reproducible computational and data science.
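
    The sketch below shows what YesWorkflow-style prospective provenance annotations look like in practice: a small, hypothetical Python script whose blocks and data flows are declared in structured comments. The block and file names are invented; see the YesWorkflow documentation for the complete annotation vocabulary.

    ```python
    # A small script marked up with YesWorkflow-style annotations. The annotation
    # keywords (@BEGIN, @IN, @OUT, @END) follow the YW convention of structured
    # comments; the block and file names here are hypothetical examples.

    # @BEGIN clean_and_plot
    # @IN raw_data @URI file:raw_temps.csv
    # @OUT figure @URI file:temps.png

    import csv

    # @BEGIN load_and_clean
    # @IN raw_data @URI file:raw_temps.csv
    # @OUT clean_rows
    def load_and_clean(path="raw_temps.csv"):
        with open(path, newline="") as fh:
            rows = [r for r in csv.DictReader(fh) if r["temp_c"] not in ("", "NA")]
        return rows
    # @END load_and_clean

    # @BEGIN plot
    # @IN clean_rows
    # @OUT figure @URI file:temps.png
    def plot(rows, out="temps.png"):
        # plotting is elided; write a placeholder so the @OUT artifact exists
        with open(out, "w") as fh:
            fh.write(f"{len(rows)} cleaned observations\n")
    # @END plot

    if __name__ == "__main__":
        plot(load_and_clean())

    # @END clean_and_plot
    ```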

  7. Tools, Techniques, and Training: Results of an E-Resources Troubleshooting Survey

    ERIC Educational Resources Information Center

    Rathmel, Angela; Mobley, Liisa; Pennington, Buddy; Chandler, Adam

    2015-01-01

    A primary role of any e-resources librarian or staff is troubleshooting electronic resources (e-resources). While much progress has been made in many areas of e-resources management (ERM) to understand the ERM lifecycle and to manage workflows, troubleshooting access remains a challenge. This collaborative study is the result of the well-received…

  8. Checklist Manifesto for Electronic Resources: Getting Ready for the Fiscal Year and Beyond

    ERIC Educational Resources Information Center

    England, Lenore; Fu, Li; Miller, Stephen

    2011-01-01

    Organization of electronic resources workflow is critical in the increasingly complicated and complex world of library management. A simple organizational tool that can be readily applied to electronic resources management (ERM) is the use of checklists. Based on the principles discussed in The Checklist Manifesto: How to Get Things Right, the…

  9. xGDBvm: A Web GUI-Driven Workflow for Annotating Eukaryotic Genomes in the Cloud

    PubMed Central

    Merchant, Nirav

    2016-01-01

    Genome-wide annotation of gene structure requires the integration of numerous computational steps. Currently, annotation is arguably best accomplished through collaboration of bioinformatics and domain experts, with broad community involvement. However, such a collaborative approach is not scalable at today’s pace of sequence generation. To address this problem, we developed the xGDBvm software, which uses an intuitive graphical user interface to access a number of common genome analysis and gene structure tools, preconfigured in a self-contained virtual machine image. Once their virtual machine instance is deployed through iPlant’s Atmosphere cloud services, users access the xGDBvm workflow via a unified Web interface to manage inputs, set program parameters, configure links to high-performance computing (HPC) resources, view and manage output, apply analysis and editing tools, or access contextual help. The xGDBvm workflow will mask the genome, compute spliced alignments from transcript and/or protein inputs (locally or on a remote HPC cluster), predict gene structures and gene structure quality, and display output in a public or private genome browser complete with accessory tools. Problematic gene predictions are flagged and can be reannotated using the integrated yrGATE annotation tool. xGDBvm can also be configured to append or replace existing data or load precomputed data. Multiple genomes can be annotated and displayed, and outputs can be archived for sharing or backup. xGDBvm can be adapted to a variety of use cases including de novo genome annotation, reannotation, comparison of different annotations, and training or teaching. PMID:27020957

  10. xGDBvm: A Web GUI-Driven Workflow for Annotating Eukaryotic Genomes in the Cloud.

    PubMed

    Duvick, Jon; Standage, Daniel S; Merchant, Nirav; Brendel, Volker P

    2016-04-01

    Genome-wide annotation of gene structure requires the integration of numerous computational steps. Currently, annotation is arguably best accomplished through collaboration of bioinformatics and domain experts, with broad community involvement. However, such a collaborative approach is not scalable at today's pace of sequence generation. To address this problem, we developed the xGDBvm software, which uses an intuitive graphical user interface to access a number of common genome analysis and gene structure tools, preconfigured in a self-contained virtual machine image. Once their virtual machine instance is deployed through iPlant's Atmosphere cloud services, users access the xGDBvm workflow via a unified Web interface to manage inputs, set program parameters, configure links to high-performance computing (HPC) resources, view and manage output, apply analysis and editing tools, or access contextual help. The xGDBvm workflow will mask the genome, compute spliced alignments from transcript and/or protein inputs (locally or on a remote HPC cluster), predict gene structures and gene structure quality, and display output in a public or private genome browser complete with accessory tools. Problematic gene predictions are flagged and can be reannotated using the integrated yrGATE annotation tool. xGDBvm can also be configured to append or replace existing data or load precomputed data. Multiple genomes can be annotated and displayed, and outputs can be archived for sharing or backup. xGDBvm can be adapted to a variety of use cases including de novo genome annotation, reannotation, comparison of different annotations, and training or teaching. © 2016 American Society of Plant Biologists. All rights reserved.

  11. Business intelligence tools for radiology: creating a prototype model using open-source tools.

    PubMed

    Prevedello, Luciano M; Andriole, Katherine P; Hanson, Richard; Kelly, Pauline; Khorasani, Ramin

    2010-04-01

    Digital radiology departments could benefit from the ability to integrate and visualize data (e.g. information reflecting complex workflow states) from all of their imaging and information management systems in one composite presentation view. Leveraging data warehousing tools developed in the business world may be one way to achieve this capability. The overall concept of managing the information available in such a data repository is known as Business Intelligence (BI). This paper describes the concepts used in Business Intelligence, their importance to modern radiology, and the steps used in the creation of a prototype model of a data warehouse for BI using open-source tools.
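
    As a toy illustration of the warehouse-and-query idea behind such a BI prototype, the sketch below loads a few hypothetical operational events into SQLite (an open-source store) and aggregates a workflow metric of the kind a dashboard would surface. The schema and values are invented.

    ```python
    # Toy illustration of the warehouse-and-query idea behind a radiology BI
    # prototype: load operational events into SQLite (an open-source store) and
    # aggregate a workflow metric. The schema and numbers are hypothetical.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("""CREATE TABLE exam_events (
                      exam_id INTEGER, modality TEXT,
                      ordered_h REAL, reported_h REAL)""")   # hours since midnight
    conn.executemany("INSERT INTO exam_events VALUES (?, ?, ?, ?)", [
        (1, "CT", 8.0, 10.5),
        (2, "CT", 9.0, 12.0),
        (3, "MR", 9.5, 15.0),
    ])

    # Average order-to-report turnaround per modality, the kind of indicator a
    # BI dashboard would surface in a composite view.
    for modality, avg_tat in conn.execute(
            """SELECT modality, AVG(reported_h - ordered_h)
               FROM exam_events GROUP BY modality"""):
        print(f"{modality}: {avg_tat:.1f} h average turnaround")
    ```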

  12. Research and Implementation of Key Technologies in Multi-Agent System to Support Distributed Workflow

    NASA Astrophysics Data System (ADS)

    Pan, Tianheng

    2018-01-01

    In recent years, the combination of workflow management systems and multi-agent technology has become an active research field. The lack of flexibility in workflow management systems can be mitigated by introducing multi-agent collaborative management. The workflow management system described here adopts a distributed structure, which avoids the fragility of the traditional centralized workflow architecture. In this paper, the agents of the distributed workflow management system are divided according to their functions, the execution process of each type of agent is analyzed, and key technologies such as process execution and resource management are discussed.

  13. Opportunistic Computing with Lobster: Lessons Learned from Scaling up to 25k Non-Dedicated Cores

    NASA Astrophysics Data System (ADS)

    Wolf, Matthias; Woodard, Anna; Li, Wenzhao; Hurtado Anampa, Kenyi; Yannakopoulos, Anna; Tovar, Benjamin; Donnelly, Patrick; Brenner, Paul; Lannon, Kevin; Hildreth, Mike; Thain, Douglas

    2017-10-01

    We previously described Lobster, a workflow management tool for exploiting volatile opportunistic computing resources for computation in HEP. We will discuss the various challenges that have been encountered while scaling up the simultaneous CPU core utilization and the software improvements required to overcome these challenges. Categories: Workflows can now be divided into categories based on their required system resources. This allows the batch queueing system to optimize assignment of tasks to nodes with the appropriate capabilities. Within each category, limits can be specified for the number of running jobs to regulate the utilization of communication bandwidth. System resource specifications for a task category can now be modified while a project is running, avoiding the need to restart the project if resource requirements differ from the initial estimates. Lobster now implements time limits on each task category to voluntarily terminate tasks. This allows partially completed work to be recovered. Workflow dependency specification: One workflow often requires data from other workflows as input. Rather than waiting for earlier workflows to be completed before beginning later ones, Lobster now allows dependent tasks to begin as soon as sufficient input data has accumulated. Resource monitoring: Lobster utilizes a new capability in Work Queue to monitor the system resources each task requires in order to identify bottlenecks and optimally assign tasks. The capability of the Lobster opportunistic workflow management system for HEP computation has been significantly increased. We have demonstrated efficient utilization of 25 000 non-dedicated cores and achieved a data input rate of 30 Gb/s and an output rate of 500GB/h. This has required new capabilities in task categorization, workflow dependency specification, and resource monitoring.
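
    The sketch below illustrates the task-category idea described above: per-category resource requirements and running-task limits that can be adjusted while a project runs, plus a simple check of which categories fit a given node. It is an illustration only, not Lobster's implementation.

    ```python
    # Sketch of the "task category" idea described above: each category carries
    # resource requirements and a running-task limit that can be adjusted while a
    # project runs. This is an illustration, not Lobster's actual implementation.

    from dataclasses import dataclass

    @dataclass
    class Category:
        name: str
        cores: int
        memory_gb: float
        wall_minutes: int
        max_running: int        # cap on simultaneous tasks in this category

        def fits(self, node_cores, node_memory_gb):
            return self.cores <= node_cores and self.memory_gb <= node_memory_gb

    categories = {
        "simulation":    Category("simulation", cores=1, memory_gb=2.0, wall_minutes=240, max_running=5000),
        "merge_outputs": Category("merge_outputs", cores=4, memory_gb=8.0, wall_minutes=60, max_running=200),
    }

    # Resource specifications can be modified mid-project if the initial
    # estimates turn out to be wrong, without restarting the project.
    categories["simulation"].memory_gb = 3.0

    node = {"cores": 2, "memory_gb": 4.0}
    runnable = [c.name for c in categories.values() if c.fits(node["cores"], node["memory_gb"])]
    print("categories this node can serve:", runnable)   # -> ['simulation']
    ```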

  14. qPortal: A platform for data-driven biomedical research.

    PubMed

    Mohr, Christopher; Friedrich, Andreas; Wojnar, David; Kenar, Erhan; Polatkan, Aydin Can; Codrea, Marius Cosmin; Czemmel, Stefan; Kohlbacher, Oliver; Nahnsen, Sven

    2018-01-01

    Modern biomedical research aims at drawing biological conclusions from large, highly complex biological datasets. It has become common practice to make extensive use of high-throughput technologies that produce big amounts of heterogeneous data. In addition to the ever-improving accuracy, methods are getting faster and cheaper, resulting in a steadily increasing need for scalable data management and easily accessible means of analysis. We present qPortal, a platform providing users with an intuitive way to manage and analyze quantitative biological data. The backend leverages a variety of concepts and technologies, such as relational databases, data stores, data models and means of data transfer, as well as front-end solutions to give users access to data management and easy-to-use analysis options. Users are empowered to conduct their experiments from the experimental design to the visualization of their results through the platform. Here, we illustrate the feature-rich portal by simulating a biomedical study based on publically available data. We demonstrate the software's strength in supporting the entire project life cycle. The software supports the project design and registration, empowers users to do all-digital project management and finally provides means to perform analysis. We compare our approach to Galaxy, one of the most widely used scientific workflow and analysis platforms in computational biology. Application of both systems to a small case study shows the differences between a data-driven approach (qPortal) and a workflow-driven approach (Galaxy). qPortal, a one-stop-shop solution for biomedical projects offers up-to-date analysis pipelines, quality control workflows, and visualization tools. Through intensive user interactions, appropriate data models have been developed. These models build the foundation of our biological data management system and provide possibilities to annotate data, query metadata for statistics and future re-analysis on high-performance computing systems via coupling of workflow management systems. Integration of project and data management as well as workflow resources in one place present clear advantages over existing solutions.

  15. Scientific Workflows + Provenance = Better (Meta-)Data Management

    NASA Astrophysics Data System (ADS)

    Ludaescher, B.; Cuevas-Vicenttín, V.; Missier, P.; Dey, S.; Kianmajd, P.; Wei, Y.; Koop, D.; Chirigati, F.; Altintas, I.; Belhajjame, K.; Bowers, S.

    2013-12-01

    The origin and processing history of an artifact is known as its provenance. Data provenance is an important form of metadata that explains how a particular data product came about, e.g., how and when it was derived in a computational process, which parameter settings and input data were used, etc. Provenance information provides transparency and helps to explain and interpret data products. Other common uses and applications of provenance include quality control, data curation, result debugging, and more generally, 'reproducible science'. Scientific workflow systems (e.g. Kepler, Taverna, VisTrails, and others) provide controlled environments for developing computational pipelines with built-in provenance support. Workflow results can then be explained in terms of workflow steps, parameter settings, input data, etc. using provenance that is automatically captured by the system. Scientific workflows themselves provide a user-friendly abstraction of the computational process and are thus a form of ('prospective') provenance in their own right. The full potential of provenance information is realized when combining workflow-level information (prospective provenance) with trace-level information (retrospective provenance). To this end, the DataONE Provenance Working Group (ProvWG) has developed an extension of the W3C PROV standard, called D-PROV. Whereas PROV provides a 'least common denominator' for exchanging and integrating provenance information, D-PROV adds new 'observables' that described workflow-level information (e.g., the functional steps in a pipeline), as well as workflow-specific trace-level information ( timestamps for each workflow step executed, the inputs and outputs used, etc.) Using examples, we will demonstrate how the combination of prospective and retrospective provenance provides added value in managing scientific data. The DataONE ProvWG is also developing tools based on D-PROV that allow scientists to get more mileage from provenance metadata. DataONE is a federation of member nodes that store data and metadata for discovery and access. By enriching metadata with provenance information, search and reuse of data is enhanced, and the 'social life' of data (being the product of many workflow runs, different people, etc.) is revealed. We are currently prototyping a provenance repository (PBase) to demonstrate what can be achieved with advanced provenance queries. The ProvExplorer and ProPub tools support advanced ad-hoc querying and visualization of provenance as well as customized provenance publications (e.g., to address privacy issues, or to focus provenance to relevant details). In a parallel line of work, we are exploring ways to add provenance support to widely-used scripting platforms (e.g. R and Python) and then expose that information via D-PROV.
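
    A minimal example of linking retrospective provenance (a concrete run) to prospective provenance (the workflow step it instantiates) is sketched below using the W3C PROV notion of a plan. It assumes the Python prov package is installed and uses illustrative identifiers; D-PROV's additional observables are not reproduced here, and exact call signatures should be checked against the package documentation.

    ```python
    # Linking retrospective provenance (what ran) to prospective provenance (the
    # workflow step it instantiates) via the W3C PROV "plan" notion. Assumes the
    # Python `prov` package is installed; identifiers and the workflow namespace
    # are illustrative, and D-PROV's additional terms are not reproduced here.

    from prov.model import ProvDocument

    doc = ProvDocument()
    doc.add_namespace("run", "http://example.org/run/")       # retrospective
    doc.add_namespace("wf", "http://example.org/workflow/")   # prospective

    raw = doc.entity("run:raw_gauge_data.csv")
    clean = doc.entity("run:clean_gauge_data.csv")
    step = doc.entity("wf:clean_timeseries")        # the workflow-level step (a PROV plan)
    execution = doc.activity("run:clean_timeseries_2024_01_15")
    scientist = doc.agent("run:analyst_1")

    doc.used(execution, raw)
    doc.wasGeneratedBy(clean, execution)
    # The association's plan ties this concrete run back to the pipeline step.
    doc.wasAssociatedWith(execution, scientist, plan=step)

    print(doc.get_provn())
    ```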

  16. Big Data Challenges in Global Seismic 'Adjoint Tomography' (Invited)

    NASA Astrophysics Data System (ADS)

    Tromp, J.; Bozdag, E.; Krischer, L.; Lefebvre, M.; Lei, W.; Smith, J.

    2013-12-01

    The challenge of imaging Earth's interior on a global scale is closely linked to the challenge of handling large data sets. The related iterative workflow involves five distinct phases, namely, 1) data gathering and culling, 2) synthetic seismogram calculations, 3) pre-processing (time-series analysis and time-window selection), 4) data assimilation and adjoint calculations, 5) post-processing (pre-conditioning, regularization, model update). In order to implement this workflow on modern high-performance computing systems, a new seismic data format is being developed. The Adaptable Seismic Data Format (ASDF) is designed to replace currently used data formats with a more flexible format that allows for fast parallel I/O. The metadata is divided into abstract categories, such as "source" and "receiver", along with provenance information for complete reproducibility. The structure of ASDF is designed keeping in mind three distinct applications: earthquake seismology, seismic interferometry, and exploration seismology. Existing time-series analysis tool kits, such as SAC and ObsPy, can be easily interfaced with ASDF so that seismologists can use robust, previously developed software packages. ASDF accommodates an automated, efficient workflow for global adjoint tomography. Manually managing the large number of simulations associated with the workflow can rapidly become a burden, especially with increasing numbers of earthquakes and stations. Therefore, it is of importance to investigate the possibility of automating the entire workflow. Scientific Workflow Management Software (SWfMS) allows users to execute workflows almost routinely. SWfMS provides additional advantages. In particular, it is possible to group independent simulations in a single job to fit the available computational resources. They also give a basic level of fault resilience as the workflow can be resumed at the correct state preceding a failure. Some of the best candidates for our particular workflow are Kepler and Swift, and the latter appears to be the most serious candidate for a large-scale workflow on a single supercomputer, remaining sufficiently simple to accommodate further modifications and improvements.
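
    The sketch below conveys, in very rough terms, the kind of HDF5 container layout such a format implies: waveforms grouped by station, with source links and a provenance record stored alongside. It uses h5py for illustration; the group names are invented and do not follow the exact ASDF specification (see the pyasdf implementation for that).

    ```python
    # Rough sketch of the kind of container layout a parallel-I/O seismic format
    # needs: waveforms grouped by station, with source/receiver metadata and a
    # provenance record stored alongside. Uses h5py for illustration; the group
    # names are illustrative and not the exact ASDF specification (see pyasdf).

    import numpy as np
    import h5py

    with h5py.File("toy_waveforms.h5", "w") as f:
        wf = f.create_group("Waveforms/IU.ANMO")           # network.station
        trace = wf.create_dataset("BHZ_2024-01-15", data=np.random.randn(3600))
        trace.attrs["sampling_rate_hz"] = 1.0
        trace.attrs["event_id"] = "quakeml:example/evt001" # link to the source

        f.create_group("AuxiliaryData")                    # e.g. cross-correlations
        prov = f.create_group("Provenance")
        prov.attrs["processing"] = "detrend; bandpass 0.01-0.1 Hz"   # illustrative

    with h5py.File("toy_waveforms.h5", "r") as f:
        data = f["Waveforms/IU.ANMO/BHZ_2024-01-15"][:]
        print(data.shape, f["Provenance"].attrs["processing"])
    ```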

  17. Pervasive access to images and data--the use of computing grids and mobile/wireless devices across healthcare enterprises.

    PubMed

    Pohjonen, Hanna; Ross, Peeter; Blickman, Johan G; Kamman, Richard

    2007-01-01

    Emerging technologies are transforming the workflows in healthcare enterprises. Computing grids and handheld mobile/wireless devices are providing clinicians with enterprise-wide access to all patient data and analysis tools on a pervasive basis. In this paper, emerging technologies are presented that provide computing grids and streaming-based access to image and data management functions, and system architectures that enable pervasive computing on a cost-effective basis. Finally, the implications of such technologies are investigated regarding the positive impacts on clinical workflows.

  18. An ontological knowledge framework for adaptive medical workflow.

    PubMed

    Dang, Jiangbo; Hedayati, Amir; Hampel, Ken; Toklu, Candemir

    2008-10-01

    As emerging technologies, the semantic Web and SOA (Service-Oriented Architecture) allow a BPMS (Business Process Management System) to automate business processes that can be described as services, which in turn can be used to wrap existing enterprise applications. A BPMS provides tools and methodologies to compose Web services that can be executed as business processes and monitored by BPM (Business Process Management) consoles. Ontologies are a formal, declarative knowledge representation model; they provide a foundation upon which machine-understandable knowledge can be obtained and, as a result, make machine intelligence possible. Healthcare systems can adopt these technologies to become ubiquitous, adaptive, and intelligent, and thereby serve patients better. This paper presents an ontological knowledge framework that covers the healthcare domains a hospital encompasses, from medical and administrative tasks to hospital assets, medical insurance, patient records, drugs, and regulations. Our ontology therefore makes our vision of personalized healthcare possible by capturing all necessary knowledge for a complex personalized healthcare scenario involving patient care, insurance policies, drug prescriptions, and compliance. For example, our ontology enables a workflow management system to allow users, from physicians to administrative assistants, to manage and even create new context-aware medical workflows and execute them on the fly.
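
    As a small, invented slice of such an ontology, the rdflib sketch below declares that prescription tasks require an insurance-coverage check and shows the kind of query a workflow engine could issue before executing a task. The vocabulary is illustrative, not the authors' ontology.

    ```python
    # A tiny slice of a healthcare ontology expressed as RDF triples with rdflib,
    # to illustrate how declarative knowledge can drive a workflow decision. The
    # vocabulary below is invented for this example, not the authors' ontology.

    from rdflib import Graph, Namespace, RDF, RDFS

    EX = Namespace("http://example.org/hospital#")
    g = Graph()
    g.bind("ex", EX)

    # Schema: a prescription task requires an insurance-coverage check.
    g.add((EX.PrescriptionTask, RDF.type, RDFS.Class))
    g.add((EX.PrescriptionTask, EX.requiresCheck, EX.InsuranceCoverageCheck))

    # Instance data: one concrete task for one patient and drug.
    g.add((EX.task42, RDF.type, EX.PrescriptionTask))
    g.add((EX.task42, EX.forPatient, EX.patient007))
    g.add((EX.task42, EX.prescribes, EX.drugAmoxicillin))
    g.add((EX.patient007, EX.coveredBy, EX.policyBasic))

    # A workflow engine could ask: which checks must run before task42 executes?
    query = """
        PREFIX ex: <http://example.org/hospital#>
        SELECT ?check WHERE {
            ex:task42 a ?cls .
            ?cls ex:requiresCheck ?check .
        }"""
    for row in g.query(query):
        print("required check:", row.check)
    ```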

  19. GUEST EDITOR'S INTRODUCTION: Guest Editor's introduction

    NASA Astrophysics Data System (ADS)

    Chrysanthis, Panos K.

    1996-12-01

    Computer Science Department, University of Pittsburgh, Pittsburgh, PA 15260, USA This special issue focuses on current efforts to represent and support workflows that integrate information systems and human resources within a business or manufacturing enterprise. Workflows may also be viewed as an emerging computational paradigm for effective structuring of cooperative applications involving human users and access to diverse data types not necessarily maintained by traditional database management systems. A workflow is an automated organizational process (also called business process) which consists of a set of activities or tasks that need to be executed in a particular controlled order over a combination of heterogeneous database systems and legacy systems. Within workflows, tasks are performed cooperatively by either human or computational agents in accordance with their roles in the organizational hierarchy. The challenge in facilitating the implementation of workflows lies in developing efficient workflow management systems. A workflow management system (also called workflow server, workflow engine or workflow enactment system) provides the necessary interfaces for coordination and communication among human and computational agents to execute the tasks involved in a workflow and controls the execution orderings of tasks as well as the flow of data that these tasks manipulate. That is, the workflow management system is responsible for correctly and reliably supporting the specification, execution, and monitoring of workflows. The six papers selected (out of the twenty-seven submitted for this special issue of Distributed Systems Engineering) address different aspects of these three functional components of a workflow management system. In the first paper, `Correctness issues in workflow management', Kamath and Ramamritham discuss the important issue of correctness in workflow management that constitutes a prerequisite for the use of workflows in the automation of the critical organizational/business processes. In particular, this paper examines the issues of execution atomicity and failure atomicity, differentiating between correctness requirements of system failures and logical failures, and surveys techniques that can be used to ensure data consistency in workflow management systems. While the first paper is concerned with correctness assuming transactional workflows in which selective transactional properties are associated with individual tasks or the entire workflow, the second paper, `Scheduling workflows by enforcing intertask dependencies' by Attie et al, assumes that the tasks can be either transactions or other activities involving legacy systems. This second paper describes the modelling and specification of conditions involving events and dependencies among tasks within a workflow using temporal logic and finite state automata. It also presents a scheduling algorithm that enforces all stated dependencies by executing at any given time only those events that are allowed by all the dependency automata and in an order as specified by the dependencies. In any system with decentralized control, there is a need to effectively cope with the tension that exists between autonomy and consistency requirements. In `A three-level atomicity model for decentralized workflow management systems', Ben-Shaul and Heineman focus on the specific requirement of enforcing failure atomicity in decentralized, autonomous and interacting workflow management systems. 
Their paper describes a model in which each workflow manager must be able to specify the sequence of tasks that comprise an atomic unit for the purposes of correctness, and the degrees of local and global atomicity for the purpose of cooperation with other workflow managers. The paper also discusses a realization of this model in which treaties and summits provide an agreement mechanism, while underlying transaction managers are responsible for maintaining failure atomicity. The fourth and fifth papers are experience papers describing a workflow management system and a large scale workflow application, respectively. Schill and Mittasch, in `Workflow management systems on top of OSF DCE and OMG CORBA', describe a decentralized workflow management system and discuss its implementation using two standardized middleware platforms, namely, OSF DCE and OMG CORBA. The system supports a new approach to workflow management, introducing several new concepts such as data type management for integrating various types of data and quality of service for various services provided by servers. A problem common to both database applications and workflows is the handling of missing and incomplete information. This is particularly pervasive in an `electronic market' with a huge number of retail outlets producing and exchanging volumes of data, the application discussed in `Information flow in the DAMA project beyond database managers: information flow managers'. Motivated by the need for a method that allows a task to proceed in a timely manner if not all data produced by other tasks are available by its deadline, Russell et al propose an architectural framework and a language that can be used to detect, approximate and, later on, to adjust missing data if necessary. The final paper, `The evolution towards flexible workflow systems' by Nutt, is complementary to the other papers and is a survey of issues and of work related to both workflow and computer supported collaborative work (CSCW) areas. In particular, the paper provides a model and a categorization of the dimensions which workflow management and CSCW systems share. Besides summarizing the recent advancements towards efficient workflow management, the papers in this special issue suggest areas open to investigation and it is our hope that they will also provide the stimulus for further research and development in the area of workflow management systems.

  20. VIPER: Visualization Pipeline for RNA-seq, a Snakemake workflow for efficient and complete RNA-seq analysis.

    PubMed

    Cornwell, MacIntosh; Vangala, Mahesh; Taing, Len; Herbert, Zachary; Köster, Johannes; Li, Bo; Sun, Hanfei; Li, Taiwen; Zhang, Jian; Qiu, Xintao; Pun, Matthew; Jeselsohn, Rinath; Brown, Myles; Liu, X Shirley; Long, Henry W

    2018-04-12

    RNA sequencing has become a ubiquitous technology used throughout life sciences as an effective method of measuring RNA abundance quantitatively in tissues and cells. The increase in use of RNA-seq technology has led to the continuous development of new tools for every step of analysis from alignment to downstream pathway analysis. However, effectively using these analysis tools in a scalable and reproducible way can be challenging, especially for non-experts. Using the workflow management system Snakemake we have developed a user friendly, fast, efficient, and comprehensive pipeline for RNA-seq analysis. VIPER (Visualization Pipeline for RNA-seq analysis) is an analysis workflow that combines some of the most popular tools to take RNA-seq analysis from raw sequencing data, through alignment and quality control, into downstream differential expression and pathway analysis. VIPER has been created in a modular fashion to allow for the rapid incorporation of new tools to expand the capabilities. This capacity has already been exploited to include very recently developed tools that explore immune infiltrate and T-cell CDR (Complementarity-Determining Regions) reconstruction abilities. The pipeline has been conveniently packaged such that minimal computational skills are required to download and install the dozens of software packages that VIPER uses. VIPER is a comprehensive solution that performs most standard RNA-seq analyses quickly and effectively with a built-in capacity for customization and expansion.

  1. Managing the CMS Data and Monte Carlo Processing during LHC Run 2

    NASA Astrophysics Data System (ADS)

    Wissing, C.; CMS Collaboration

    2017-10-01

    In order to cope with the challenges expected during LHC Run 2, CMS put a number of enhancements into the main software packages and the tools used for centrally managed processing. In the presentation we will highlight these improvements, which allow CMS to deal with the increased trigger output rate, the increased pileup and the evolution in computing technology. The overall system aims at high operational flexibility and largely automated procedures. The tight coupling of workflow classes to types of sites has been drastically relaxed. Reliable and high-performing networking between most of the computing sites and the successful deployment of a data federation allow the execution of workflows using remote data access. That required the development of a largely automated system to assign workflows and to handle the necessary pre-staging of data. Another step towards flexibility has been the introduction of one large global HTCondor pool for all types of processing workflows and analysis jobs. Besides classical Grid resources, some opportunistic resources as well as Cloud resources have also been integrated into that pool, which provides access to more than 200k CPU cores.

  2. A Knowledge Management Technology Architecture for Educational Research Organisations: Scaffolding Research Projects and Workflow Processing

    ERIC Educational Resources Information Center

    Muthukumar; Hedberg, John G.

    2005-01-01

    There is growing recognition that the economic climate of the world is shifting towards a knowledge-based economy where knowledge will be cherished as the most prized asset. In this regard, technology can be leveraged as a useful tool in effectually managing the knowledge capital of an organisation. Although several research studies have advanced…

  3. A Knowledge Management Technology Architecture for Educational Research Organisations: Scaffolding Research Projects and Workflow Processing

    ERIC Educational Resources Information Center

    Muthukumar; Hedberg, John G.

    2005-01-01

    There is growing recognition that the economic climate of the world is shifting towards a knowledge-based economy where knowledge will be cherished as the most prized asset. In this regard, technology can be leveraged as a useful tool in effectually managing the knowledge capital of an organisation. Although several research studies have advanced…

  4. Cyberinfrastructure for End-to-End Environmental Explorations

    NASA Astrophysics Data System (ADS)

    Merwade, V.; Kumar, S.; Song, C.; Zhao, L.; Govindaraju, R.; Niyogi, D.

    2007-12-01

    The design and implementation of a cyberinfrastructure for End-to-End Environmental Exploration (C4E4) is presented. The C4E4 framework addresses the need for an integrated data/computation platform for studying broad environmental impacts by combining heterogeneous data resources with state-of-the-art modeling and visualization tools. With Purdue being a TeraGrid Resource Provider, C4E4 builds on top of the Purdue TeraGrid data management system and Grid resources, and integrates them through a service-oriented workflow system. It allows researchers to construct environmental workflows for data discovery, access, transformation, modeling, and visualization. Using the C4E4 framework, we have implemented an end-to-end SWAT simulation and analysis workflow that connects our TeraGrid data and computation resources. It enables researchers to conduct comprehensive studies on the impact of land management practices in the St. Joseph watershed using data from various sources in hydrologic, atmospheric, agricultural, and other related disciplines.

  5. A standard-enabled workflow for synthetic biology.

    PubMed

    Myers, Chris J; Beal, Jacob; Gorochowski, Thomas E; Kuwahara, Hiroyuki; Madsen, Curtis; McLaughlin, James Alastair; Mısırlı, Göksel; Nguyen, Tramy; Oberortner, Ernst; Samineni, Meher; Wipat, Anil; Zhang, Michael; Zundel, Zach

    2017-06-15

    A synthetic biology workflow is composed of data repositories that provide information about genetic parts, sequence-level design tools to compose these parts into circuits, visualization tools to depict these designs, genetic design tools to select parts to create systems, and modeling and simulation tools to evaluate alternative design choices. Data standards enable the ready exchange of information within such a workflow, allowing repositories and tools to be connected from a diversity of sources. The present paper describes one such workflow that utilizes, among others, the Synthetic Biology Open Language (SBOL) to describe genetic designs, the Systems Biology Markup Language to model these designs, and SBOL Visual to visualize these designs. We describe how a standard-enabled workflow can be used to produce types of design information, including multiple repositories and software tools exchanging information using a variety of data standards. Recently, the ACS Synthetic Biology journal has recommended the use of SBOL in their publications. © 2017 The Author(s); published by Portland Press Limited on behalf of the Biochemical Society.

  6. Tigres Workflow Library: Supporting Scientific Pipelines on HPC Systems

    DOE PAGES

    Hendrix, Valerie; Fox, James; Ghoshal, Devarshi; ...

    2016-07-21

    The growth in scientific data volumes has resulted in the need for new tools that enable users to operate on and analyze data on large-scale resources. In the last decade, a number of scientific workflow tools have emerged. These tools often target distributed environments, and often need expert help to compose and execute the workflows. Data-intensive workflows are often ad hoc; they involve an iterative development process that includes users composing and testing their workflows on desktops, and scaling up to larger systems. In this paper, we present the design and implementation of Tigres, a workflow library that supports the iterative workflow development cycle of data-intensive workflows. Tigres provides an application programming interface to a set of programming templates (i.e., sequence, parallel, split, merge) that can be used to compose and execute computational and data pipelines. We discuss the results of our evaluation of scientific and synthetic workflows, showing that Tigres performs with minimal template overheads (mean of 13 seconds over all experiments). We also discuss various factors (e.g., I/O performance, execution mechanisms) that affect the performance of scientific workflows on HPC systems.
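
    The template idea can be approximated in plain Python as a sketch; this is a conceptual analogue only, and Tigres' actual API, template signatures, and execution mechanisms differ.

        # Conceptual sketch of "sequence" and "parallel" templates in plain Python
        from concurrent.futures import ProcessPoolExecutor

        def sequence(tasks, data):
            """Run tasks one after another, feeding each output to the next task."""
            for task in tasks:
                data = task(data)
            return data

        def parallel(task, items, workers=4):
            """Apply the same task to many independent inputs concurrently."""
            with ProcessPoolExecutor(max_workers=workers) as pool:
                return list(pool.map(task, items))

        # Hypothetical pipeline steps
        def clean(values):
            return [v for v in values if v is not None]

        def analyze(values):
            return sum(values) / len(values)

        def pipeline(dataset):
            return sequence([clean, analyze], dataset)

        if __name__ == "__main__":
            datasets = [[1, 2, None, 4], [5, None, 7]]
            print(parallel(pipeline, datasets))               # e.g. [2.33..., 6.0]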

  7. Tigres Workflow Library: Supporting Scientific Pipelines on HPC Systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hendrix, Valerie; Fox, James; Ghoshal, Devarshi

    The growth in scientific data volumes has resulted in the need for new tools that enable users to operate on and analyze data on large-scale resources. In the last decade, a number of scientific workflow tools have emerged. These tools often target distributed environments, and often need expert help to compose and execute the workflows. Data-intensive workflows are often ad hoc; they involve an iterative development process that includes users composing and testing their workflows on desktops, and scaling up to larger systems. In this paper, we present the design and implementation of Tigres, a workflow library that supports the iterative workflow development cycle of data-intensive workflows. Tigres provides an application programming interface to a set of programming templates (i.e., sequence, parallel, split, merge) that can be used to compose and execute computational and data pipelines. We discuss the results of our evaluation of scientific and synthetic workflows, showing that Tigres performs with minimal template overheads (mean of 13 seconds over all experiments). We also discuss various factors (e.g., I/O performance, execution mechanisms) that affect the performance of scientific workflows on HPC systems.

  8. Integrated workflows for spiking neuronal network simulations

    PubMed Central

    Antolík, Ján; Davison, Andrew P.

    2013-01-01

    The increasing availability of computational resources is enabling more detailed, realistic modeling in computational neuroscience, resulting in a shift toward more heterogeneous models of neuronal circuits, and employment of complex experimental protocols. This poses a challenge for existing tool chains, as the set of tools involved in a typical modeler's workflow is expanding concomitantly, with growing complexity in the metadata flowing between them. For many parts of the workflow, a range of tools is available; however, numerous areas lack dedicated tools, while integration of existing tools is limited. This forces modelers to either handle the workflow manually, leading to errors, or to write substantial amounts of code to automate parts of the workflow, in both cases reducing their productivity. To address these issues, we have developed Mozaik: a workflow system for spiking neuronal network simulations written in Python. Mozaik integrates model, experiment and stimulation specification, simulation execution, data storage, data analysis and visualization into a single automated workflow, ensuring that all relevant metadata are available to all workflow components. It is based on several existing tools, including PyNN, Neo, and Matplotlib. It offers a declarative way to specify models and recording configurations using hierarchically organized configuration files. Mozaik automatically records all data together with all relevant metadata about the experimental context, allowing automation of the analysis and visualization stages. Mozaik has a modular architecture, and the existing modules are designed to be extensible with minimal programming effort. Mozaik increases the productivity of running virtual experiments on highly structured neuronal networks by automating the entire experimental cycle, while increasing the reliability of modeling studies by relieving the user from manual handling of the flow of metadata between the individual workflow stages. PMID:24368902
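
    A minimal sketch of what a declarative, hierarchically organized experiment specification can look like in Python follows; the keys and values below are hypothetical and are not Mozaik's actual parameter format.

        # Hypothetical hierarchical experiment specification stored as metadata
        import json

        experiment = {
            "model": {
                "sheets": {
                    "V1_exc": {"size": 1000, "cell_type": "excitatory"},
                    "V1_inh": {"size": 250,  "cell_type": "inhibitory"},
                },
            },
            "stimulation": {"protocol": "drifting_grating", "trials": 10},
            "recording": {"variables": ["spikes", "membrane_potential"],
                          "sheets": ["V1_exc"]},
        }

        # keeping the full specification next to the results preserves the
        # experimental context needed for automated analysis and visualization
        with open("experiment_spec.json", "w") as fh:
            json.dump(experiment, fh, indent=2)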

  9. Integrated workflows for spiking neuronal network simulations.

    PubMed

    Antolík, Ján; Davison, Andrew P

    2013-01-01

    The increasing availability of computational resources is enabling more detailed, realistic modeling in computational neuroscience, resulting in a shift toward more heterogeneous models of neuronal circuits, and employment of complex experimental protocols. This poses a challenge for existing tool chains, as the set of tools involved in a typical modeler's workflow is expanding concomitantly, with growing complexity in the metadata flowing between them. For many parts of the workflow, a range of tools is available; however, numerous areas lack dedicated tools, while integration of existing tools is limited. This forces modelers to either handle the workflow manually, leading to errors, or to write substantial amounts of code to automate parts of the workflow, in both cases reducing their productivity. To address these issues, we have developed Mozaik: a workflow system for spiking neuronal network simulations written in Python. Mozaik integrates model, experiment and stimulation specification, simulation execution, data storage, data analysis and visualization into a single automated workflow, ensuring that all relevant metadata are available to all workflow components. It is based on several existing tools, including PyNN, Neo, and Matplotlib. It offers a declarative way to specify models and recording configurations using hierarchically organized configuration files. Mozaik automatically records all data together with all relevant metadata about the experimental context, allowing automation of the analysis and visualization stages. Mozaik has a modular architecture, and the existing modules are designed to be extensible with minimal programming effort. Mozaik increases the productivity of running virtual experiments on highly structured neuronal networks by automating the entire experimental cycle, while increasing the reliability of modeling studies by relieving the user from manual handling of the flow of metadata between the individual workflow stages.

  10. Scientist-Centered Workflow Abstractions via Generic Actors, Workflow Templates, and Context-Awareness for Groundwater Modeling and Analysis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chin, George; Sivaramakrishnan, Chandrika; Critchlow, Terence J.

    2011-07-04

    A drawback of existing scientific workflow systems is the lack of support for domain scientists in designing and executing their own scientific workflows. Many domain scientists avoid developing and using workflows because the basic objects of workflows are too low-level and high-level tools and mechanisms to aid in workflow construction and use are largely unavailable. In our research, we are prototyping higher-level abstractions and tools to better support scientists in their workflow activities. Specifically, we are developing generic actors that provide abstract interfaces to specific functionality, workflow templates that encapsulate workflow and data patterns that can be reused and adapted by scientists, and context-awareness mechanisms to gather contextual information from the workflow environment on behalf of the scientist. To evaluate these scientist-centered abstractions on real problems, we apply them to construct and execute scientific workflows in the specific domain area of groundwater modeling and analysis.
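
    A rough Python sketch of the generic-actor idea, i.e. an abstract interface plus a context-aware default mechanism that concrete, domain-specific actors fill in; the class and method names here are hypothetical, not the authors' actual actors.

        # Hypothetical sketch of a generic actor with a context-aware specialization
        from abc import ABC, abstractmethod

        class GenericActor(ABC):
            """Abstract interface; concrete actors supply tool-specific behaviour."""

            def __init__(self, context=None):
                # context-awareness: defaults gathered from the workflow environment
                self.context = context or {}

            @abstractmethod
            def run(self, inputs: dict) -> dict:
                ...

        class GroundwaterSimulator(GenericActor):
            def run(self, inputs: dict) -> dict:
                grid = inputs.get("grid", self.context.get("default_grid", "coarse"))
                # a real actor would launch the simulation code here
                return {"result": f"head field simulated on {grid} grid"}

        actor = GroundwaterSimulator(context={"default_grid": "fine"})
        print(actor.run({}))          # falls back to the contextual default grid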

  11. Parallel workflow tools to facilitate human brain MRI post-processing

    PubMed Central

    Cui, Zaixu; Zhao, Chenxi; Gong, Gaolang

    2015-01-01

    Multi-modal magnetic resonance imaging (MRI) techniques are widely applied in human brain studies. To obtain specific brain measures of interest from MRI datasets, a number of complex image post-processing steps are typically required. Parallel workflow tools have recently been developed, concatenating individual processing steps and enabling fully automated processing of raw MRI data to obtain the final results. These workflow tools are also designed to make optimal use of available computational resources and to support the parallel processing of different subjects or of independent processing steps for a single subject. Automated, parallel MRI post-processing tools can greatly facilitate relevant brain investigations and are being increasingly applied. In this review, we briefly summarize these parallel workflow tools and discuss relevant issues. PMID:26029043
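
    The subject-level parallelism these tools exploit can be sketched in a few lines of Python; the per-subject function below is a hypothetical placeholder, not any specific MRI tool.

        # Toy sketch of parallel per-subject post-processing
        from multiprocessing import Pool

        def process_subject(subject_id):
            # placeholder for a chain of real post-processing steps
            return subject_id, f"brain measures for {subject_id}"

        if __name__ == "__main__":
            subjects = ["sub-01", "sub-02", "sub-03", "sub-04"]
            with Pool(processes=4) as pool:
                results = dict(pool.map(process_subject, subjects))
            print(results)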

  12. A practical workflow for making anatomical atlases for biological research.

    PubMed

    Wan, Yong; Lewis, A Kelsey; Colasanto, Mary; van Langeveld, Mark; Kardon, Gabrielle; Hansen, Charles

    2012-01-01

    The anatomical atlas has been at the intersection of science and art for centuries. These atlases are essential to biological research, but high-quality atlases are often scarce. Recent advances in imaging technology have made high-quality 3D atlases possible. However, until now there has been a lack of practical workflows using standard tools to generate atlases from images of biological samples. With certain adaptations, CG artists' workflow and tools, traditionally used in the film industry, are practical for building high-quality biological atlases. Researchers have developed a workflow for generating a 3D anatomical atlas using accessible artists' tools. They used this workflow to build a mouse limb atlas for studying the musculoskeletal system's development. This research aims to raise the awareness of using artists' tools in scientific research and promote interdisciplinary collaborations between artists and scientists. This video (http://youtu.be/g61C-nia9ms) demonstrates a workflow for creating an anatomical atlas.

  13. A characterization of workflow management systems for extreme-scale applications

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ferreira da Silva, Rafael; Filgueira, Rosa; Pietri, Ilia

    The automation of the execution of computational tasks is at the heart of improving scientific productivity. Over the last years, scientific workflows have been established as an important abstraction that captures data processing and computation of large and complex scientific applications. By allowing scientists to model and express entire data processing steps and their dependencies, workflow management systems relieve scientists from the details of an application and manage its execution on a computational infrastructure. As the resource requirements of today’s computational and data science applications that process vast amounts of data keep increasing, there is a compelling case for a new generation of advances in high-performance computing, commonly termed extreme-scale computing, which will bring forth multiple challenges for the design of workflow applications and management systems. This paper presents a novel characterization of workflow management systems using features commonly associated with extreme-scale computing applications. We classify 15 popular workflow management systems in terms of workflow execution models, heterogeneous computing environments, and data access methods. Finally, the paper also surveys workflow applications and identifies gaps for future research on the road to extreme-scale workflows and management systems.

  14. A characterization of workflow management systems for extreme-scale applications

    DOE PAGES

    Ferreira da Silva, Rafael; Filgueira, Rosa; Pietri, Ilia; ...

    2017-02-16

    The automation of the execution of computational tasks is at the heart of improving scientific productivity. Over the last years, scientific workflows have been established as an important abstraction that captures data processing and computation of large and complex scientific applications. By allowing scientists to model and express entire data processing steps and their dependencies, workflow management systems relieve scientists from the details of an application and manage its execution on a computational infrastructure. As the resource requirements of today’s computational and data science applications that process vast amounts of data keep increasing, there is a compelling case for a new generation of advances in high-performance computing, commonly termed extreme-scale computing, which will bring forth multiple challenges for the design of workflow applications and management systems. This paper presents a novel characterization of workflow management systems using features commonly associated with extreme-scale computing applications. We classify 15 popular workflow management systems in terms of workflow execution models, heterogeneous computing environments, and data access methods. Finally, the paper also surveys workflow applications and identifies gaps for future research on the road to extreme-scale workflows and management systems.

  15. SMITH: a LIMS for handling next-generation sequencing workflows

    PubMed Central

    2014-01-01

    Background Life-science laboratories make increasing use of Next Generation Sequencing (NGS) for studying bio-macromolecules and their interactions. Array-based methods for measuring gene expression or protein-DNA interactions are being replaced by RNA-Seq and ChIP-Seq. Sequencing is generally performed by specialized facilities that have to keep track of sequencing requests, trace samples, ensure quality and make data available according to predefined privileges. An integrated tool helps to troubleshoot problems, maintain a high quality standard, and reduce time and costs. Commercial and non-commercial tools called LIMS (Laboratory Information Management Systems) are available for this purpose. However, they often come at prohibitive cost and/or lack the flexibility and scalability needed to adjust seamlessly to the frequently changing protocols employed. In order to manage the flow of sequencing data produced at the Genomic Unit of the Italian Institute of Technology (IIT), we developed SMITH (Sequencing Machine Information Tracking and Handling). Methods SMITH is a web application with a MySQL server at the backend. SMITH was developed by wet-lab scientists of the Centre for Genomic Science and database experts from the Politecnico of Milan in the context of a Genomic Data Model Project. The database schema stores all the information of an NGS experiment, including the descriptions of all protocols and algorithms used in the process. Notably, an attribute-value table allows associating an unconstrained textual description with each sample and all the data produced afterwards. This method permits the creation of metadata that can be used to search the database for specific files as well as for statistical analyses. Results SMITH runs automatically and limits direct human interaction mainly to administrative tasks. SMITH data-delivery procedures were standardized, making it easier for biologists and analysts to navigate the data. Automation also helps save time. The workflows are available through an API provided by the workflow management system. The parameters and input data are passed to the workflow engine that performs de-multiplexing, quality control, alignments, etc. Conclusions SMITH standardizes, automates, and speeds up sequencing workflows. Annotation of data with key-value pairs facilitates meta-analysis. PMID:25471934
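
    The attribute-value mechanism can be sketched with a minimal SQLite schema in Python; this is a generic entity-attribute-value pattern, and SMITH's real MySQL schema is considerably richer.

        # Generic entity-attribute-value (EAV) sketch for sample metadata
        import sqlite3

        con = sqlite3.connect(":memory:")
        con.executescript("""
            CREATE TABLE sample (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
            CREATE TABLE sample_attribute (
                sample_id INTEGER REFERENCES sample(id),
                key   TEXT NOT NULL,
                value TEXT
            );
        """)
        con.execute("INSERT INTO sample (id, name) VALUES (1, 'lib_001')")
        con.executemany(
            "INSERT INTO sample_attribute VALUES (1, ?, ?)",
            [("organism", "mouse"), ("protocol", "ChIP-Seq"), ("antibody", "H3K27ac")],
        )

        # unconstrained key-value annotations make samples searchable later on
        rows = con.execute(
            "SELECT s.name FROM sample s JOIN sample_attribute a ON a.sample_id = s.id "
            "WHERE a.key = 'protocol' AND a.value = 'ChIP-Seq'"
        ).fetchall()
        print(rows)                                  # [('lib_001',)]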

  16. SMITH: a LIMS for handling next-generation sequencing workflows.

    PubMed

    Venco, Francesco; Vaskin, Yuriy; Ceol, Arnaud; Muller, Heiko

    2014-01-01

    Life-science laboratories make increasing use of Next Generation Sequencing (NGS) for studying bio-macromolecules and their interactions. Array-based methods for measuring gene expression or protein-DNA interactions are being replaced by RNA-Seq and ChIP-Seq. Sequencing is generally performed by specialized facilities that have to keep track of sequencing requests, trace samples, ensure quality and make data available according to predefined privileges. An integrated tool helps to troubleshoot problems, maintain a high quality standard, and reduce time and costs. Commercial and non-commercial tools called LIMS (Laboratory Information Management Systems) are available for this purpose. However, they often come at prohibitive cost and/or lack the flexibility and scalability needed to adjust seamlessly to the frequently changing protocols employed. In order to manage the flow of sequencing data produced at the Genomic Unit of the Italian Institute of Technology (IIT), we developed SMITH (Sequencing Machine Information Tracking and Handling). SMITH is a web application with a MySQL server at the backend. SMITH was developed by wet-lab scientists of the Centre for Genomic Science and database experts from the Politecnico of Milan in the context of a Genomic Data Model Project. The database schema stores all the information of an NGS experiment, including the descriptions of all protocols and algorithms used in the process. Notably, an attribute-value table allows associating an unconstrained textual description with each sample and all the data produced afterwards. This method permits the creation of metadata that can be used to search the database for specific files as well as for statistical analyses. SMITH runs automatically and limits direct human interaction mainly to administrative tasks. SMITH data-delivery procedures were standardized, making it easier for biologists and analysts to navigate the data. Automation also helps save time. The workflows are available through an API provided by the workflow management system. The parameters and input data are passed to the workflow engine that performs de-multiplexing, quality control, alignments, etc. SMITH standardizes, automates, and speeds up sequencing workflows. Annotation of data with key-value pairs facilitates meta-analysis.

  17. Integrated Automatic Workflow for Phylogenetic Tree Analysis Using Public Access and Local Web Services.

    PubMed

    Damkliang, Kasikrit; Tandayya, Pichaya; Sangket, Unitsa; Pasomsub, Ekawat

    2016-11-28

    At present, coding sequences (CDS) are being discovered and larger CDS are being revealed frequently. Approaches and related tools have also been developed and upgraded concurrently, especially for phylogenetic tree analysis. This paper proposes an integrated automatic Taverna workflow for phylogenetic tree inference using public access web services at the European Bioinformatics Institute (EMBL-EBI) and the Swiss Institute of Bioinformatics (SIB), and our own deployed local web services. The workflow input is a set of CDS in the Fasta format. The workflow supports 1,000 to 20,000 bootstrap replicates. The workflow performs tree inference using the Parsimony (PARS), Distance Matrix - Neighbor Joining (DIST-NJ), and Maximum Likelihood (ML) algorithms of the EMBOSS PHYLIPNEW package, based on our proposed Multiple Sequence Alignment (MSA) similarity score. The local web services are implemented and deployed in two forms, using Soaplab2 and Apache Axis2. Both SOAP and Java Web Services (JWS) provide WSDL endpoints to Taverna Workbench, a workflow manager. The workflow has been validated, its performance has been measured, and its results have been verified. Our workflow's execution time is less than ten minutes for inferring a tree with 10,000 bootstrap replicates. This paper proposes a new integrated automatic workflow which will be beneficial to bioinformaticians with an intermediate level of knowledge and experience. All local services have been deployed at our portal http://bioservices.sci.psu.ac.th.

  18. Integrated Automatic Workflow for Phylogenetic Tree Analysis Using Public Access and Local Web Services.

    PubMed

    Damkliang, Kasikrit; Tandayya, Pichaya; Sangket, Unitsa; Pasomsub, Ekawat

    2016-03-01

    At present, coding sequences (CDS) are being discovered and larger CDS are being revealed frequently. Approaches and related tools have also been developed and upgraded concurrently, especially for phylogenetic tree analysis. This paper proposes an integrated automatic Taverna workflow for phylogenetic tree inference using public access web services at the European Bioinformatics Institute (EMBL-EBI) and the Swiss Institute of Bioinformatics (SIB), and our own deployed local web services. The workflow input is a set of CDS in the Fasta format. The workflow supports 1,000 to 20,000 bootstrap replicates. The workflow performs tree inference using the Parsimony (PARS), Distance Matrix - Neighbor Joining (DIST-NJ), and Maximum Likelihood (ML) algorithms of the EMBOSS PHYLIPNEW package, based on our proposed Multiple Sequence Alignment (MSA) similarity score. The local web services are implemented and deployed in two forms, using Soaplab2 and Apache Axis2. Both SOAP and Java Web Services (JWS) provide WSDL endpoints to Taverna Workbench, a workflow manager. The workflow has been validated, its performance has been measured, and its results have been verified. Our workflow's execution time is less than ten minutes for inferring a tree with 10,000 bootstrap replicates. This paper proposes a new integrated automatic workflow which will be beneficial to bioinformaticians with an intermediate level of knowledge and experience. All local services have been deployed at our portal http://bioservices.sci.psu.ac.th.

  19. Information Management Workflow and Tools Enabling Multiscale Modeling Within ICME Paradigm

    NASA Technical Reports Server (NTRS)

    Arnold, Steven M.; Bednarcyk, Brett A.; Austin, Nic; Terentjev, Igor; Cebon, Dave; Marsden, Will

    2016-01-01

    With the increased emphasis on reducing the cost and time to market of new materials, the need for analytical tools that enable the virtual design and optimization of materials throughout their processing - internal structure - property - performance envelope, along with the capturing and storing of the associated material and model information across its lifecycle, has become critical. This need is also fueled by the demands for higher efficiency in material testing; consistency, quality and traceability of data; product design; engineering analysis; as well as control of access to proprietary or sensitive information. Fortunately, material information management systems and physics-based multiscale modeling methods have kept pace with the growing user demands. Herein, recent efforts to establish workflow for and demonstrate a unique set of web application tools for linking NASA GRC's Integrated Computational Materials Engineering (ICME) Granta MI database schema and NASA GRC's Integrated multiscale Micromechanics Analysis Code (ImMAC) software toolset are presented. The goal is to enable seamless coupling between both test data and simulation data, which is captured and tracked automatically within Granta MI®, with full model pedigree information. These tools, and this type of linkage, are foundational to realizing the full potential of ICME, in which materials processing, microstructure, properties, and performance are coupled to enable application-driven design and optimization of materials and structures.

  20. CERES AuTomAted job Loading SYSTem (CATALYST): An automated workflow manager for satellite data production

    NASA Astrophysics Data System (ADS)

    Gleason, J. L.; Hillyer, T. N.; Wilkins, J.

    2012-12-01

    The CERES Science Team integrates data from 5 CERES instruments onboard the Terra, Aqua and NPP missions. The processing chain fuses CERES observations with data from 19 other unique sources. The addition of CERES Flight Model 5 (FM5) onboard NPP, coupled with ground processing system upgrades, further emphasizes the need for an automated job-submission utility to manage multiple processing streams concurrently. The operator-driven, legacy-processing approach relied on manually staging data from magnetic tape to limited spinning disk attached to a shared memory architecture system. The migration of CERES production code to a distributed, cluster computing environment with approximately one petabyte of spinning disk containing all precursor input data products facilitates the development of a CERES-specific, automated workflow manager. In the cluster environment, I/O is the primary system resource in contention across jobs. Therefore, system load can be maximized with a throttling workload manager. This poster discusses a Java and Perl implementation of an automated job management tool tailored for CERES processing.
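
    The throttling idea, i.e. capping how many I/O-heavy jobs run at once, can be illustrated with a short Python sketch; CATALYST itself is implemented in Java and Perl, so this is only a conceptual analogue with placeholder commands.

        # Conceptual sketch of a throttled job submitter
        import threading
        import subprocess

        MAX_CONCURRENT_JOBS = 4                       # I/O is the contended resource
        slots = threading.Semaphore(MAX_CONCURRENT_JOBS)

        def run_job(cmd):
            with slots:                               # wait for a free slot
                subprocess.run(cmd, check=True)       # placeholder for a real job launch

        jobs = [["echo", f"processing granule {i}"] for i in range(10)]
        threads = [threading.Thread(target=run_job, args=(cmd,)) for cmd in jobs]
        for t in threads:
            t.start()
        for t in threads:
            t.join()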

  1. Installation and Testing of ITER Integrated Modeling and Analysis Suite (IMAS) on DIII-D

    NASA Astrophysics Data System (ADS)

    Lao, L.; Kostuk, M.; Meneghini, O.; Smith, S.; Staebler, G.; Kalling, R.; Pinches, S.

    2017-10-01

    A critical objective of the ITER Integrated Modeling Program is the development of IMAS to support ITER plasma operation and research activities. An IMAS framework has been established based on the earlier work carried out within the EU. It consists of a physics data model and a workflow engine. The data model is capable of representing both simulation and experimental data and is applicable to ITER and other devices. IMAS has been successfully installed on a local DIII-D server using a flexible installer capable of managing the core data access tools (Access Layer and Data Dictionary) and optionally the Kepler workflow engine and coupling tools. A general adaptor for OMFIT (a workflow engine) is being built for adaptation of any analysis code to IMAS using a new IMAS universal access layer (UAL) interface developed from an existing OMFIT EU Integrated Tokamak Modeling UAL. Ongoing work includes development of a general adaptor for EFIT and TGLF based on this new UAL that can be readily extended for other physics codes within OMFIT. Work supported by US DOE under DE-FC02-04ER54698.

  2. The iPlant Collaborative: Cyberinfrastructure for Enabling Data to Discovery for the Life Sciences.

    PubMed

    Merchant, Nirav; Lyons, Eric; Goff, Stephen; Vaughn, Matthew; Ware, Doreen; Micklos, David; Antin, Parker

    2016-01-01

    The iPlant Collaborative provides life science research communities access to comprehensive, scalable, and cohesive computational infrastructure for data management; identity management; collaboration tools; and cloud, high-performance, high-throughput computing. iPlant provides training, learning material, and best practice resources to help all researchers make the best use of their data, expand their computational skill set, and effectively manage their data and computation when working as distributed teams. iPlant's platform permits researchers to easily deposit and share their data and deploy new computational tools and analysis workflows, allowing the broader community to easily use and reuse those data and computational analyses.

  3. Leveraging workflow control patterns in the domain of clinical practice guidelines.

    PubMed

    Kaiser, Katharina; Marcos, Mar

    2016-02-10

    Clinical practice guidelines (CPGs) include recommendations describing appropriate care for the management of patients with a specific clinical condition. A number of representation languages have been developed to support executable CPGs, with associated authoring/editing tools. Even with tool assistance, authoring of CPG models is a labor-intensive task. We aim at facilitating the early stages of the CPG modeling task. In this context, we propose to support the authoring of CPG models based on a set of suitable procedural patterns described in an implementation-independent notation that can then be semi-automatically transformed into one of the alternative executable CPG languages. We have started with the workflow control patterns which have been identified in the fields of workflow systems and business process management. We have analyzed the suitability of these patterns by means of a qualitative analysis of CPG texts. Following our analysis, we have implemented a selection of workflow patterns in the Asbru and PROforma CPG languages. As the implementation-independent notation for the description of patterns, we have chosen BPMN 2.0. Finally, we have developed XSLT transformations to convert the BPMN 2.0 version of the patterns into the Asbru and PROforma languages. We showed that although a significant number of workflow control patterns are suitable to describe CPG procedural knowledge, not all of them are applicable in the context of CPGs due to their focus on single-patient care. Moreover, CPGs may require additional patterns not included in the set of workflow control patterns. We also showed that nearly all the CPG-suitable patterns can be conveniently implemented in the Asbru and PROforma languages. Finally, we demonstrated that individual patterns can be semi-automatically transformed from a process specification in BPMN 2.0 to executable implementations in these languages. We propose a pattern- and transformation-based approach for the development of CPG models. Such an approach can form the basis of a valid framework for the authoring of CPG models. The identification of adequate patterns and the implementation of transformations to convert patterns from a process specification into different executable implementations are the first necessary steps for our approach.
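
    The BPMN-to-executable-language step rests on standard XSLT processing, which can be sketched in Python with lxml; the file names below are hypothetical, and the real Asbru/PROforma transformations are far more elaborate.

        # Applying an XSLT stylesheet to a BPMN 2.0 pattern (generic sketch)
        from lxml import etree

        bpmn_doc   = etree.parse("parallel_split_pattern.bpmn")   # hypothetical input
        stylesheet = etree.parse("bpmn_to_asbru.xsl")             # hypothetical XSLT
        transform  = etree.XSLT(stylesheet)

        asbru_doc = transform(bpmn_doc)                           # semi-automatic conversion
        with open("parallel_split_pattern.asbru.xml", "wb") as fh:
            fh.write(etree.tostring(asbru_doc, pretty_print=True,
                                    xml_declaration=True, encoding="UTF-8"))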

  4. Designing Health Information Technology Tools to Prevent Gaps in Public Health Insurance.

    PubMed

    Hall, Jennifer D; Harding, Rose L; DeVoe, Jennifer E; Gold, Rachel; Angier, Heather; Sumic, Aleksandra; Nelson, Christine A; Likumahuwa-Ackman, Sonja; Cohen, Deborah J

    2017-06-23

    Changes in health insurance policies have increased coverage opportunities, but enrollees are required to annually reapply for benefits, which, if not managed appropriately, can lead to insurance gaps. Electronic health records (EHRs) can automate processes for assisting patients with health insurance enrollment and re-enrollment. We describe community health centers' (CHC) workflow, documentation, and tracking needs for assisting families with insurance application processes, and the health information technology (IT) tool components that were developed to meet those needs. We conducted a qualitative study using semi-structured interviews and observation of clinic operations and insurance application assistance processes. Data were analyzed using a grounded theory approach. We diagrammed workflows and shared information with a team of developers who built the EHR-based tools. Four steps to the insurance assistance workflow were common among CHCs: 1) Identifying patients for public health insurance application assistance; 2) Completing and submitting the public health insurance application when clinic staff met with patients to collect requisite information and helped them apply for benefits; 3) Tracking public health insurance approval to monitor for decisions; and 4) Assisting with annual health insurance reapplication. We developed EHR-based tools to support clinical staff with each of these steps. CHCs are uniquely positioned to help patients and families with public health insurance applications. CHCs have invested in staff to assist patients with insurance applications and help prevent coverage gaps. To best assist patients and to foster efficiency, EHR-based insurance tools need comprehensive, timely, and accurate health insurance information.

  5. Curriculum Online Review System: Proposing Curriculum with Collaboration

    ERIC Educational Resources Information Center

    Rhinehart, Marilyn; Barlow, Rhonda; Shafer, Stu; Hassur, Debby

    2009-01-01

    The Curriculum Online Review System (CORS) at Johnson County Community College (JCCC) uses SharePoint as a Web platform for the JCCC Curriculum Proposals Process. The CORS application manages proposals throughout the approval process using collaboration tools and workflows to notify all stakeholders. This innovative new program has changed the way…

  6. Akuna: An Open Source User Environment for Managing Subsurface Simulation Workflows

    NASA Astrophysics Data System (ADS)

    Freedman, V. L.; Agarwal, D.; Bensema, K.; Finsterle, S.; Gable, C. W.; Keating, E. H.; Krishnan, H.; Lansing, C.; Moeglein, W.; Pau, G. S. H.; Porter, E.; Scheibe, T. D.

    2014-12-01

    The U.S. Department of Energy (DOE) is investing in development of a numerical modeling toolset called ASCEM (Advanced Simulation Capability for Environmental Management) to support modeling analyses at legacy waste sites. ASCEM is an open source and modular computing framework that incorporates new advances and tools for predicting contaminant fate and transport in natural and engineered systems. The ASCEM toolset includes both a Platform with Integrated Toolsets (called Akuna) and a High-Performance Computing multi-process simulator (called Amanzi). The focus of this presentation is on Akuna, an open-source user environment that manages subsurface simulation workflows and associated data and metadata. In this presentation, key elements of Akuna are demonstrated, which includes toolsets for model setup, database management, sensitivity analysis, parameter estimation, uncertainty quantification, and visualization of both model setup and simulation results. A key component of the workflow is in the automated job launching and monitoring capabilities, which allow a user to submit and monitor simulation runs on high-performance, parallel computers. Visualization of large outputs can also be performed without moving data back to local resources. These capabilities make high-performance computing accessible to the users who might not be familiar with batch queue systems and usage protocols on different supercomputers and clusters.

  7. Bioinformatics workflows and web services in systems biology made easy for experimentalists.

    PubMed

    Jimenez, Rafael C; Corpas, Manuel

    2013-01-01

    Workflows are useful for performing data analysis and integration in systems biology. Workflow management systems can help users create workflows without any previous knowledge of programming and web services. However, the computational skills required to build such workflows are usually above the level most biological experimentalists are comfortable with. In this chapter, we introduce workflow management systems that reuse existing workflows instead of creating them, making it easier for experimentalists to perform computational tasks.

  8. An evolving computational platform for biological mass spectrometry: workflows, statistics and data mining with MASSyPup64.

    PubMed

    Winkler, Robert

    2015-01-01

    In biological mass spectrometry, crude instrumental data need to be converted into meaningful theoretical models. Several data processing and data evaluation steps are required to come to the final results. These operations are often difficult to reproduce because they depend on overly specific computing platforms. This effect, known as 'workflow decay', can be diminished by using a standardized informatic infrastructure. Thus, we compiled an integrated platform, which contains ready-to-use tools and workflows for mass spectrometry data analysis. Apart from general unit operations, such as peak picking and identification of proteins and metabolites, we put a strong emphasis on the statistical validation of results and Data Mining. MASSyPup64 includes, e.g., the OpenMS/TOPPAS framework, the Trans-Proteomic-Pipeline programs, the ProteoWizard tools, X!Tandem, Comet and SpiderMass. The statistical computing language R is installed with packages for MS data analyses, such as XCMS/metaXCMS and MetabR. The R package Rattle provides user-friendly access to multiple Data Mining methods. Further, we added the non-conventional spreadsheet program teapot for editing large data sets and a command line tool for transposing large matrices. Individual programs, console commands and modules can be integrated using the Workflow Management System (WMS) Taverna. We explain the useful combination of the tools using practical examples: (1) A workflow for protein identification and validation, with subsequent Association Analysis of peptides, (2) Cluster analysis and Data Mining in targeted Metabolomics, and (3) Raw data processing, Data Mining and identification of metabolites in untargeted Metabolomics. Association Analyses reveal relationships between variables across different sample sets. We present its application for finding co-occurring peptides, which can be used for targeted proteomics, the discovery of alternative biomarkers and protein-protein interactions. Data Mining-derived models displayed higher robustness and accuracy for classifying sample groups in targeted Metabolomics than cluster analyses. Random Forest models do not only provide predictive models, which can be deployed for new data sets, but also the variable importance. We demonstrate that the latter is especially useful for tracking down significant signals and affected pathways in untargeted Metabolomics. Thus, Random Forest modeling supports the unbiased search for relevant biological features in Metabolomics. Our results clearly manifest the importance of Data Mining methods for disclosing non-obvious information in biological mass spectrometry. The application of a Workflow Management System and the integration of all required programs and data in a consistent platform make the presented data analysis strategies reproducible for non-expert users. The simple remastering process and the Open Source licenses of MASSyPup64 (http://www.bioprocess.org/massypup/) enable the continuous improvement of the system.
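
    The Random Forest classification and variable-importance step can be sketched with scikit-learn on synthetic data; this illustrates the general technique only, not the MASSyPup64 workflow itself.

        # Random Forest classification with variable importance (generic sketch)
        import numpy as np
        from sklearn.ensemble import RandomForestClassifier

        rng = np.random.default_rng(0)
        X = rng.normal(size=(60, 5))                  # 60 samples, 5 hypothetical features
        y = (X[:, 2] + 0.1 * rng.normal(size=60) > 0).astype(int)   # labels driven by feature 2

        rf = RandomForestClassifier(n_estimators=500, random_state=0)
        rf.fit(X, y)

        # the importance ranking points to the features that separate the groups
        for idx in np.argsort(rf.feature_importances_)[::-1]:
            print(f"feature {idx}: importance {rf.feature_importances_[idx]:.3f}")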

  9. A Web application for the management of clinical workflow in image-guided and adaptive proton therapy for prostate cancer treatments.

    PubMed

    Yeung, Daniel; Boes, Peter; Ho, Meng Wei; Li, Zuofeng

    2015-05-08

    Image-guided radiotherapy (IGRT), based on radiopaque markers placed in the prostate gland, was used for proton therapy of prostate patients. Orthogonal X-rays and the IBA Digital Image Positioning System (DIPS) were used for setup correction prior to treatment and were repeated after treatment delivery. Following a rationale for margin estimates similar to that of van Herk,(1) the daily post-treatment DIPS data were analyzed to determine if an adaptive radiotherapy plan was necessary. A Web application using ASP.NET MVC5, Entity Framework, and an SQL database was designed to automate this process. The designed features included state-of-the-art Web technologies, a domain model closely matching the workflow, a database supporting concurrency and data mining, access to the DIPS database, secured user access and roles management, and graphing and analysis tools. The Model-View-Controller (MVC) paradigm allowed clean domain logic, unit testing, and extensibility. Client-side technologies, such as jQuery, jQuery Plug-ins, and Ajax, were adopted to achieve a rich user environment and fast response. Data models included patients, staff, treatment fields and records, correction vectors, DIPS images, and association logics. Data entry, analysis, workflow logics, and notifications were implemented. The system effectively modeled the clinical workflow and IGRT process.

  10. Real-Time System for Water Modeling and Management

    NASA Astrophysics Data System (ADS)

    Lee, J.; Zhao, T.; David, C. H.; Minsker, B.

    2012-12-01

    Working closely with the Texas Commission on Environmental Quality (TCEQ) and the University of Texas at Austin (UT-Austin), we are developing a real-time system for water modeling and management using advanced cyberinfrastructure, data integration and geospatial visualization, and numerical modeling. The state of Texas suffered a severe drought in 2011 that cost the state $7.62 billion in agricultural losses (crops and livestock). Devastating situations such as this could potentially be avoided with better water modeling and management strategies that incorporate state of the art simulation and digital data integration. The goal of the project is to prototype a near-real-time decision support system for river modeling and management in Texas that can serve as a national and international model to promote more sustainable and resilient water systems. The system uses National Weather Service current and predicted precipitation data as input to the Noah-MP Land Surface model, which forecasts runoff, soil moisture, evapotranspiration, and water table levels given land surface features. These results are then used by a river model called RAPID, along with an error model currently under development at UT-Austin, to forecast stream flows in the rivers. Model forecasts are visualized as a Web application for TCEQ decision makers, who issue water diversion (withdrawal) permits and any needed drought restrictions; permit holders; and reservoir operation managers. Users will be able to adjust model parameters to predict the impacts of alternative curtailment scenarios or weather forecasts. A real-time optimization system under development will help TCEQ to identify optimal curtailment strategies to minimize impacts on permit holders and protect health and safety. To develop the system we have implemented RAPID as a remotely-executed modeling service using the Cyberintegrator workflow system with input data downloaded from the North American Land Data Assimilation System. The Cyberintegrator workflow system provides RESTful web services for users to provide inputs, execute workflows, and retrieve outputs. Along with REST endpoints, PAW (Publishable Active Workflows) provides the web user interface toolkit for us to develop web applications with scientific workflows. The prototype web application is built on top of workflows with PAW, so that users will have a user-friendly web environment to provide input parameters, execute the model, and visualize/retrieve the results using geospatial mapping tools. In future work the optimization model will be developed and integrated into the workflow.
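
    The REST-driven pattern described above (submit inputs, execute a workflow, retrieve outputs) can be sketched with Python's requests library; the endpoint URLs and payload fields are hypothetical placeholders, not the actual Cyberintegrator/PAW API.

        # Hypothetical REST interaction with a workflow execution service
        import time
        import requests

        BASE = "https://workflow.example.org/api"     # placeholder service URL

        # 1) provide inputs and start an execution
        run = requests.post(f"{BASE}/executions",
                            json={"workflow": "rapid-streamflow",
                                  "inputs": {"forecast_date": "2012-09-01"}}).json()

        # 2) poll until the run finishes
        while True:
            status = requests.get(f"{BASE}/executions/{run['id']}").json()
            if status["state"] in ("FINISHED", "FAILED"):
                break
            time.sleep(30)

        # 3) retrieve outputs for geospatial visualization
        if status["state"] == "FINISHED":
            flows = requests.get(f"{BASE}/executions/{run['id']}/outputs/streamflow").json()
            print(len(flows), "river reaches in forecast")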

  11. Radiology information system: a workflow-based approach.

    PubMed

    Zhang, Jinyan; Lu, Xudong; Nie, Hongchao; Huang, Zhengxing; van der Aalst, W M P

    2009-09-01

    Introducing workflow management technology in healthcare is a promising way to deal with the problem that current healthcare information systems cannot provide sufficient support for process management, although several challenges still exist. The purpose of this paper is to study a method of developing a workflow-based information system, taking a radiology department as a use case. First, a workflow model of a typical radiology process was established. Second, based on the model, the system could be designed and implemented as a group of loosely coupled components. Each component corresponded to one task in the process and could be assembled by the workflow management system. The legacy systems could be treated as special components, which also corresponded to tasks and were integrated by transforming non-workflow-aware interfaces into standard ones. Finally, a workflow dashboard was designed and implemented to provide an integral view of radiology processes. The workflow-based Radiology Information System was deployed in the radiology department of Zhejiang Chinese Medicine Hospital in China. The results showed that it could be adjusted flexibly in response to the needs of changing processes, and that it enhanced process management in the department. It also provides a more workflow-aware integration method compared with other methods such as IHE-based ones. The workflow-based approach is a new method of developing a radiology information system with more flexibility, more process management functionality and more workflow-aware integration. The work of this paper is an initial endeavor in introducing workflow management technology in healthcare.

  12. Galaxy tools and workflows for sequence analysis with applications in molecular plant pathology.

    PubMed

    Cock, Peter J A; Grüning, Björn A; Paszkiewicz, Konrad; Pritchard, Leighton

    2013-01-01

    The Galaxy Project offers the popular web browser-based platform Galaxy for running bioinformatics tools and constructing simple workflows. Here, we present a broad collection of additional Galaxy tools for large scale analysis of gene and protein sequences. The motivating research theme is the identification of specific genes of interest in a range of non-model organisms, and our central example is the identification and prediction of "effector" proteins produced by plant pathogens in order to manipulate their host plant. This functional annotation of a pathogen's predicted capacity for virulence is a key step in translating sequence data into potential applications in plant pathology. This collection includes novel tools, and widely-used third-party tools such as NCBI BLAST+ wrapped for use within Galaxy. Individual bioinformatics software tools are typically available separately as standalone packages, or in online browser-based form. The Galaxy framework enables the user to combine these and other tools to automate organism scale analyses as workflows, without demanding familiarity with command line tools and scripting. Workflows created using Galaxy can be saved and are reusable, so may be distributed within and between research groups, facilitating the construction of a set of standardised, reusable bioinformatic protocols. The Galaxy tools and workflows described in this manuscript are open source and freely available from the Galaxy Tool Shed (http://usegalaxy.org/toolshed or http://toolshed.g2.bx.psu.edu).

  13. SWS: accessing SRS sites contents through Web Services.

    PubMed

    Romano, Paolo; Marra, Domenico

    2008-03-26

    Web Services and Workflow Management Systems can support the creation and deployment of network systems able to automate data analysis and retrieval processes in biomedical research. Web Services have been implemented at bioinformatics centres and workflow systems have been proposed for biological data analysis. New databanks are often developed by taking into account these technologies, but many existing databases do not allow programmatic access. Only a fraction of available databanks can thus be queried through programmatic interfaces. SRS is a well-known indexing and search engine for biomedical databanks offering public access to many databanks and analysis tools. Unfortunately, these data are not easily and efficiently accessible through Web Services. We have developed 'SRS by WS' (SWS), a tool that makes information available in SRS sites accessible through Web Services. Information on known sites is maintained in a database, srsdb. SWS consists of a suite of Web Services that can query both srsdb, for information on sites and databases, and SRS sites. SWS returns results in a text-only format and can be accessed through a WSDL compliant client. SWS enables interoperability between workflow systems and SRS implementations, by also managing access to alternative sites, in order to cope with network and maintenance problems, and selecting the most up-to-date among available systems. The development and implementation of Web Services allowing programmatic access to an exhaustive set of biomedical databases can significantly improve the automation of in-silico analyses. SWS supports this activity by making biological databanks that are managed in public SRS sites available through a programmatic interface.
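
    Consuming a WSDL-described service of this kind from Python can be sketched with the zeep SOAP client; the WSDL URL, operation name, and parameters below are hypothetical stand-ins rather than SWS's actual interface.

        # Hypothetical SOAP call against a WSDL-described service
        from zeep import Client

        client = Client("https://sws.example.org/services/sws?wsdl")   # placeholder WSDL URL

        # invoke a hypothetical query operation exposed by the service
        result = client.service.searchEntries(databank="uniprot", query="kinase")
        print(result)                     # SWS-style services return text-only results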

  14. Progress in digital color workflow understanding in the International Color Consortium (ICC) Workflow WG

    NASA Astrophysics Data System (ADS)

    McCarthy, Ann

    2006-01-01

    The ICC Workflow WG serves as the bridge between ICC color management technologies and use of those technologies in real world color production applications. ICC color management is applicable to and is used in a wide range of color systems, from highly specialized digital cinema color special effects to high volume publications printing to home photography. The ICC Workflow WG works to align ICC technologies so that the color management needs of these diverse use case systems are addressed in an open, platform independent manner. This report provides a high level summary of the ICC Workflow WG objectives and work to date, focusing on the ways in which workflow can impact image quality and color systems performance. The 'ICC Workflow Primitives' and 'ICC Workflow Patterns and Dimensions' workflow models are covered in some detail. Consider the questions, "How much of dissatisfaction with color management today is the result of 'the wrong color transformation at the wrong time' and 'I can't get to the right conversion at the right point in my work process'?" Put another way, consider how image quality through a workflow can be negatively affected when the coordination and control level of the color management system is not sufficient.

  15. Creating Access Points to Instrument-Based Atmospheric Data: Perspectives from the ARM Metadata Manager

    NASA Astrophysics Data System (ADS)

    Troyan, D.

    2016-12-01

    The Atmospheric Radiation Measurement (ARM) program has been collecting data from instruments in diverse climate regions for nearly twenty-five years. These data are made available to all interested parties at no cost via specially designed tools found on the ARM website (www.arm.gov). Metadata is created and applied to the various datastreams to facilitate information retrieval using the ARM website, the ARM Data Discovery Tool, and data quality reporting tools. Over the last year, the Metadata Manager - a relatively new position within the ARM program - created two documents that summarize the state of ARM metadata processes: ARM Metadata Workflow, and ARM Metadata Standards. These documents serve as guides to the creation and management of ARM metadata. With many of ARM's data functions spread around the Department of Energy national laboratory complex and with many of the original architects of the metadata structure no longer working for ARM, there is increased importance on using these documents to resolve issues from data flow bottlenecks and inaccurate metadata to improving data discovery and organizing web pages. This presentation will provide some examples from the workflow and standards documents. The examples will illustrate the complexity of the ARM metadata processes and the efficiency by which the metadata team works towards achieving the goal of providing access to data collected under the auspices of the ARM program.

  16. Scaling Agile Infrastructure to People

    NASA Astrophysics Data System (ADS)

    Jones, B.; McCance, G.; Traylen, S.; Barrientos Arias, N.

    2015-12-01

    When CERN migrated its infrastructure away from homegrown fabric management tools to emerging industry-standard open-source solutions, the immediate technical challenges and motivation were clear. The move to a multi-site Cloud Computing model meant that the tool chains growing around this ecosystem would be a good choice; the challenge was to leverage them. The use of open-source tools brings challenges beyond merely how to deploy them. Homegrown software, for all the deficiencies identified at the outset of the project, has the benefit of growing with the organization. This paper will examine the challenges of adapting open-source tools to the needs of the organization, particularly in the areas of multi-group development and security. Additionally, the increase in scale of the plant required changes to how Change Management was organized and managed. Continuous Integration techniques are used to manage the rate of change across multiple groups, and the tools and workflow for this will be examined.

  17. KNIME4NGS: a comprehensive toolbox for next generation sequencing analysis.

    PubMed

    Hastreiter, Maximilian; Jeske, Tim; Hoser, Jonathan; Kluge, Michael; Ahomaa, Kaarin; Friedl, Marie-Sophie; Kopetzky, Sebastian J; Quell, Jan-Dominik; Mewes, H Werner; Küffner, Robert

    2017-05-15

    Analysis of Next Generation Sequencing (NGS) data requires the processing of large datasets by chaining various tools with complex input and output formats. In order to automate data analysis, we propose to standardize NGS tasks into modular workflows. This simplifies reliable handling and processing of NGS data, and corresponding solutions become substantially more reproducible and easier to maintain. Here, we present a documented, Linux-based toolbox of 42 processing modules that are combined to construct workflows facilitating a variety of tasks such as DNAseq and RNAseq analysis. We also describe important technical extensions. The high throughput executor (HTE) helps to increase reliability and to reduce manual intervention when processing complex datasets. We also provide a dedicated binary manager that assists users in obtaining the modules' executables and keeping them up to date. As the basis for this actively developed toolbox we use the workflow management software KNIME. See http://ibisngs.github.io/knime4ngs for nodes and user manual (GPLv3 license). robert.kueffner@helmholtz-muenchen.de. Supplementary data are available at Bioinformatics online.

  18. [Implementation of modern operating room management -- experiences made at an university hospital].

    PubMed

    Hensel, M; Wauer, H; Bloch, A; Volk, T; Kox, W J; Spies, C

    2005-07-01

    Owing to structural changes in health care, the need for cost control is evident for all hospitals. As the operating room is one of the most cost-intensive sectors of a hospital, optimisation of workflow processes in this area is of particular interest to health care providers. While modern operating room management is already established in several hospitals, others are less prepared for these economic challenges. We therefore present the operating room statute of the Charité university hospital, which may be useful for other hospitals developing their own concepts. In addition, we describe the experiences made in implementing the new management structures and report the results obtained over the last 5 years. Whereas the total number of operative procedures increased by 15 %, operating room utilization increased even more markedly in terms of both time and cases. In summary, central operating room management has proved to be an effective tool for increasing the efficiency of workflow processes in the operating room.

  19. Automation of Educational Tasks for Academic Radiology.

    PubMed

    Lamar, David L; Richardson, Michael L; Carlson, Blake

    2016-07-01

    The process of education involves a variety of repetitious tasks. We believe that appropriate computer tools can automate many of these chores, and allow both educators and their students to devote a lot more of their time to actual teaching and learning. This paper details tools that we have used to automate a broad range of academic radiology-specific tasks on Mac OS X, iOS, and Windows platforms. Some of the tools we describe here require little expertise or time to use; others require some basic knowledge of computer programming. We used TextExpander (Mac, iOS) and AutoHotKey (Win) for automated generation of text files, such as resident performance reviews and radiology interpretations. Custom statistical calculations were performed using TextExpander and the Python programming language. A workflow for automated note-taking was developed using Evernote (Mac, iOS, Win) and Hazel (Mac). Automated resident procedure logging was accomplished using Editorial (iOS) and Python. We created three variants of a teaching session logger using Drafts (iOS) and Pythonista (iOS). Editorial and Drafts were used to create flashcards for knowledge review. We developed a mobile reference management system for iOS using Editorial. We used the Workflow app (iOS) to automatically generate a text message reminder for daily conferences. Finally, we developed two separate automated workflows-one with Evernote (Mac, iOS, Win) and one with Python (Mac, Win)-that generate simple automated teaching file collections. We have beta-tested these workflows, techniques, and scripts on several of our fellow radiologists. All of them expressed enthusiasm for these tools and were able to use one or more of them to automate their own educational activities. Appropriate computer tools can automate many educational tasks, and thereby allow both educators and their students to devote a lot more of their time to actual teaching and learning. Copyright © 2016 The Association of University Radiologists. Published by Elsevier Inc. All rights reserved.
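
    The scripts themselves are not reproduced in the abstract. As a rough, hypothetical sketch of one of the tasks described (building a simple automated teaching file collection), a short Python script could copy flagged images into a dated folder and write an index; the file and folder names below are illustrative only.

      import csv
      import shutil
      from datetime import date
      from pathlib import Path

      # Hypothetical folder layout: images exported from the reading room are
      # dropped into INBOX with a one-line sidecar .txt holding the teaching point.
      INBOX = Path("inbox")
      COLLECTION = Path("teaching_files") / date.today().isoformat()
      COLLECTION.mkdir(parents=True, exist_ok=True)

      with open(COLLECTION / "index.csv", "w", newline="") as fh:
          writer = csv.writer(fh)
          writer.writerow(["image", "teaching_point"])
          for image in sorted(INBOX.glob("*.jpg")):
              note = image.with_suffix(".txt")
              teaching_point = note.read_text().strip() if note.exists() else ""
              shutil.copy2(image, COLLECTION / image.name)   # archive the image
              writer.writerow([image.name, teaching_point])  # add it to the index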

  20. Workflows in bioinformatics: meta-analysis and prototype implementation of a workflow generator.

    PubMed

    Garcia Castro, Alexander; Thoraval, Samuel; Garcia, Leyla J; Ragan, Mark A

    2005-04-07

    Computational methods for problem solving need to interleave information access and algorithm execution in a problem-specific workflow. The structures of these workflows are defined by a scaffold of syntactic, semantic and algebraic objects capable of representing them. Despite the proliferation of GUIs (Graphic User Interfaces) in bioinformatics, only some of them provide workflow capabilities; surprisingly, no meta-analysis of workflow operators and components in bioinformatics has been reported. We present a set of syntactic components and algebraic operators capable of representing analytical workflows in bioinformatics. Iteration, recursion, the use of conditional statements, and management of suspend/resume tasks have traditionally been implemented on an ad hoc basis and hard-coded; by having these operators properly defined it is possible to use and parameterize them as generic re-usable components. To illustrate how these operations can be orchestrated, we present GPIPE, a prototype graphic pipeline generator for PISE that allows the definition of a pipeline, parameterization of its component methods, and storage of metadata in XML formats. This implementation goes beyond the macro capacities currently in PISE. As the entire analysis protocol is defined in XML, a complete bioinformatic experiment (linked sets of methods, parameters and results) can be reproduced or shared among users. http://if-web1.imb.uq.edu.au/Pise/5.a/gpipe.html (interactive), ftp://ftp.pasteur.fr/pub/GenSoft/unix/misc/Pise/ (download). From our meta-analysis we have identified syntactic structures and algebraic operators common to many workflows in bioinformatics. The workflow components and algebraic operators can be assimilated into re-usable software components. GPIPE, a prototype implementation of this framework, provides a GUI builder to facilitate the generation of workflows and integration of heterogeneous analytical tools.
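
    Because the entire protocol is stored as XML, a pipeline can also be generated or inspected programmatically. The fragment below is a hypothetical sketch using Python's standard library; the element and attribute names are illustrative and do not reproduce the actual GPIPE/PISE schema.

      import xml.etree.ElementTree as ET

      # Illustrative schema only; the real GPIPE/PISE XML uses its own element names.
      pipeline = ET.Element("pipeline", name="protein_analysis")

      step1 = ET.SubElement(pipeline, "task", id="1", method="clustalw")
      ET.SubElement(step1, "param", name="infile").text = "sequences.fasta"

      step2 = ET.SubElement(pipeline, "task", id="2", method="protpars")
      # The second task iterates over the alignment produced by the first one.
      ET.SubElement(step2, "input", ref="1")
      ET.SubElement(step2, "iterate", over="bootstrap", count="100")

      ET.ElementTree(pipeline).write("pipeline.xml",
                                     xml_declaration=True, encoding="utf-8")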

  1. A Virtual Environment for Process Management. A Step by Step Implementation

    ERIC Educational Resources Information Center

    Mayer, Sergio Valenzuela

    2003-01-01

    This paper presents a virtual organizational environment conceived through the integration of three computer programs: a manufacturing simulation package, business process automation (workflow) software, and business intelligence (Balanced Scorecard) software. It was created as a supporting tool for teaching IE; its purpose is to give…

  2. Critical care physician cognitive task analysis: an exploratory study

    PubMed Central

    Fackler, James C; Watts, Charles; Grome, Anna; Miller, Thomas; Crandall, Beth; Pronovost, Peter

    2009-01-01

    Introduction For better or worse, the imposition of work-hour limitations on house-staff has imperiled continuity and/or improved decision-making. Regardless, the workflow of every physician team in every academic medical centre has been irrevocably altered. We explored the use of cognitive task analysis (CTA) techniques, most commonly used in other high-stress and time-sensitive environments, to analyse key cognitive activities in critical care medicine. The study objective was to assess the usefulness of CTA as an analytical tool in order that physician cognitive tasks may be understood and redistributed within the work-hour limited medical decision-making teams. Methods After approval from each Institutional Review Board, two intensive care units (ICUs) within major university teaching hospitals served as data collection sites for CTA observations and interviews of critical care providers. Results Five broad categories of cognitive activities were identified: pattern recognition; uncertainty management; strategic vs. tactical thinking; team coordination and maintenance of common ground; and creation and transfer of meaning through stories. Conclusions CTA within the framework of Naturalistic Decision Making is a useful tool to understand the critical care process of decision-making and communication. The separation of strategic and tactical thinking has implications for workflow redesign. Given the global push for work-hour limitations, such workflow redesign is occurring. Further work with CTA techniques will provide important insights toward rational, rather than random, workflow changes. PMID:19265517

  3. Critical care physician cognitive task analysis: an exploratory study.

    PubMed

    Fackler, James C; Watts, Charles; Grome, Anna; Miller, Thomas; Crandall, Beth; Pronovost, Peter

    2009-01-01

    For better or worse, the imposition of work-hour limitations on house-staff has imperiled continuity and/or improved decision-making. Regardless, the workflow of every physician team in every academic medical centre has been irrevocably altered. We explored the use of cognitive task analysis (CTA) techniques, most commonly used in other high-stress and time-sensitive environments, to analyse key cognitive activities in critical care medicine. The study objective was to assess the usefulness of CTA as an analytical tool in order that physician cognitive tasks may be understood and redistributed within the work-hour limited medical decision-making teams. After approval from each Institutional Review Board, two intensive care units (ICUs) within major university teaching hospitals served as data collection sites for CTA observations and interviews of critical care providers. Five broad categories of cognitive activities were identified: pattern recognition; uncertainty management; strategic vs. tactical thinking; team coordination and maintenance of common ground; and creation and transfer of meaning through stories. CTA within the framework of Naturalistic Decision Making is a useful tool to understand the critical care process of decision-making and communication. The separation of strategic and tactical thinking has implications for workflow redesign. Given the global push for work-hour limitations, such workflow redesign is occurring. Further work with CTA techniques will provide important insights toward rational, rather than random, workflow changes.

  4. A three-level atomicity model for decentralized workflow management systems

    NASA Astrophysics Data System (ADS)

    Ben-Shaul, Israel Z.; Heineman, George T.

    1996-12-01

    A workflow management system (WFMS) employs a workflow manager (WM) to execute and automate the various activities within a workflow. To protect the consistency of data, the WM encapsulates each activity with a transaction; a transaction manager (TM) then guarantees the atomicity of activities. Since workflows often group several activities together, the TM is responsible for guaranteeing the atomicity of these units. There are scalability issues, however, with centralized WFMSs. Decentralized WFMSs provide an architecture for multiple autonomous WFMSs to interoperate, thus accommodating multiple workflows and geographically-dispersed teams. When atomic units are composed of activities spread across multiple WFMSs, however, there is a conflict between global atomicity and local autonomy of each WFMS. This paper describes a decentralized atomicity model that enables workflow administrators to specify the scope of multi-site atomicity based upon the desired semantics of multi-site tasks in the decentralized WFMS. We describe an architecture that realizes our model and execution paradigm.
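
    The basic intuition behind such an atomic unit can be sketched in a few lines of Python (this is an illustration of the general rollback idea, not the authors' three-level model): activities are executed together with compensating actions, and completed activities are undone in reverse order if a later one fails.

      # Minimal sketch of grouping workflow activities into one atomic unit.
      # Each activity supplies a compensating "undo" action used on rollback.
      class AtomicUnit:
          def __init__(self):
              self._completed = []          # (name, undo) pairs, in execution order

          def run(self, name, action, undo):
              action()                      # execute the activity
              self._completed.append((name, undo))

          def rollback(self):
              # Undo completed activities in reverse order to restore consistency.
              for name, undo in reversed(self._completed):
                  undo()
              self._completed.clear()


      unit = AtomicUnit()
      try:
          unit.run("reserve", lambda: print("reserve resource"),
                   lambda: print("release resource"))
          unit.run("notify", lambda: print("notify remote WFMS"),
                   lambda: print("cancel notification"))
      except Exception:
          unit.rollback()                   # keep the multi-activity unit atomic
          raise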

  5. Generic worklist handler for workflow-enabled products

    NASA Astrophysics Data System (ADS)

    Schmidt, Joachim; Meetz, Kirsten; Wendler, Thomas

    1999-07-01

    Workflow management (WfM) is an emerging field of medical information technology. It appears to be a promising key technology for modelling, optimizing and automating processes, for the sake of improved efficiency, reduced costs and improved patient care. The application of WfM concepts requires the standardization of architectures and interfaces. A component of central interest proposed in this report is a generic worklist handler: a standardized interface between a workflow enactment service and an application system. Application systems with embedded worklist handlers will be called 'Workflow Enabled Application Systems'. In this paper we discuss the functional requirements of worklist handlers, as well as their integration into workflow architectures and interfaces. To lay the foundation for this specification, basic workflow terminology, the fundamentals of workflow management and - later in the paper - the available standards as defined by the Workflow Management Coalition are briefly reviewed.
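
    As an illustrative sketch only (the real interface follows the Workflow Management Coalition specifications referenced in the paper), a generic worklist handler can be reduced to a small set of operations that an application system calls to receive and complete work items:

      from dataclasses import dataclass, field

      @dataclass
      class WorkItem:
          item_id: str
          activity: str
          attributes: dict = field(default_factory=dict)

      class WorklistHandler:
          """Sketch of the interface between an enactment service and an application."""

          def __init__(self):
              self._items = {}              # item_id -> WorkItem offered by the engine

          def offer(self, item: WorkItem):
              # Called by the workflow enactment service.
              self._items[item.item_id] = item

          def fetch(self):
              # Called by the application system to obtain pending work items.
              return list(self._items.values())

          def complete(self, item_id: str, result: dict):
              # Report completion (and result data) back to the enactment service.
              item = self._items.pop(item_id)
              print(f"completed {item.activity} ({item_id}) with {result}")


      handler = WorklistHandler()
      handler.offer(WorkItem("42", "review-images", {"patient": "anonymous"}))
      for item in handler.fetch():
          handler.complete(item.item_id, {"status": "ok"})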

  6. The iPlant Collaborative: Cyberinfrastructure for Enabling Data to Discovery for the Life Sciences

    PubMed Central

    Merchant, Nirav; Lyons, Eric; Goff, Stephen; Vaughn, Matthew; Ware, Doreen; Micklos, David; Antin, Parker

    2016-01-01

    The iPlant Collaborative provides life science research communities access to comprehensive, scalable, and cohesive computational infrastructure for data management; identity management; collaboration tools; and cloud, high-performance, high-throughput computing. iPlant provides training, learning material, and best practice resources to help all researchers make the best use of their data, expand their computational skill set, and effectively manage their data and computation when working as distributed teams. iPlant’s platform permits researchers to easily deposit and share their data and deploy new computational tools and analysis workflows, allowing the broader community to easily use and reuse those data and computational analyses. PMID:26752627

  7. Communication strategies and volunteer management for the IAU-OAD

    NASA Astrophysics Data System (ADS)

    Sankatsing Nava, Tibisay

    2015-08-01

    The IAU Office of Astronomy for Development will be developing a new communication strategy to promote its projects in a way that is relevant to stakeholders and the general public. Ideas include a magazine featuring best practices within the field of astronomy for development and setting up a workflow of communication that integrates the different outputs of the office and effectively uses the information collection tools developed by OAD team members. To accomplish these tasks the OAD will also develop a community management strategy with existing tools to effectively harness the skills of OAD volunteers for communication purposes. This talk will discuss the new communication strategy of the OAD as well as the expanded community management plans.

  8. Galaxy tools and workflows for sequence analysis with applications in molecular plant pathology

    PubMed Central

    Grüning, Björn A.; Paszkiewicz, Konrad; Pritchard, Leighton

    2013-01-01

    The Galaxy Project offers the popular web browser-based platform Galaxy for running bioinformatics tools and constructing simple workflows. Here, we present a broad collection of additional Galaxy tools for large scale analysis of gene and protein sequences. The motivating research theme is the identification of specific genes of interest in a range of non-model organisms, and our central example is the identification and prediction of “effector” proteins produced by plant pathogens in order to manipulate their host plant. This functional annotation of a pathogen’s predicted capacity for virulence is a key step in translating sequence data into potential applications in plant pathology. This collection includes novel tools, and widely-used third-party tools such as NCBI BLAST+ wrapped for use within Galaxy. Individual bioinformatics software tools are typically available separately as standalone packages, or in online browser-based form. The Galaxy framework enables the user to combine these and other tools to automate organism scale analyses as workflows, without demanding familiarity with command line tools and scripting. Workflows created using Galaxy can be saved and are reusable, so may be distributed within and between research groups, facilitating the construction of a set of standardised, reusable bioinformatic protocols. The Galaxy tools and workflows described in this manuscript are open source and freely available from the Galaxy Tool Shed (http://usegalaxy.org/toolshed or http://toolshed.g2.bx.psu.edu). PMID:24109552
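
    Outside Galaxy, the same third-party tools are driven from the command line. The sketch below shows the kind of step a Galaxy wrapper automates, a tabular BLAST+ protein search launched from Python; the query file and the local database name are placeholders.

      import subprocess

      # Placeholder inputs: a FASTA file of candidate effectors and a local BLAST database.
      query = "candidate_effectors.fasta"
      database = "host_proteins"

      # Tabular output (-outfmt 6) is easy to post-process or load into Galaxy.
      subprocess.run(
          ["blastp", "-query", query, "-db", database,
           "-evalue", "1e-5", "-outfmt", "6", "-out", "hits.tsv"],
          check=True,
      )

      # Keep only strong hits (the bit score is the last column of outfmt 6).
      with open("hits.tsv") as fh:
          strong = [line for line in fh if float(line.rstrip().split("\t")[-1]) >= 50]
      print(f"{len(strong)} hits with bit score >= 50")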

  9. An Auto-management Thesis Program WebMIS Based on Workflow

    NASA Astrophysics Data System (ADS)

    Chang, Li; Jie, Shi; Weibo, Zhong

    This paper presents an auto-management WebMIS based on workflow for a bachelor thesis program. A module for workflow dispatching is designed and realized using MySQL and J2EE according to the working principle of a workflow engine. The module can automatically dispatch the workflow according to the system date, login information, and the work status of the user. The WebMIS shifts management from manual work to computer-based work, which not only standardizes the thesis program but also keeps the data and documents clean and consistent.

  10. A big data approach for climate change indicators processing in the CLIP-C project

    NASA Astrophysics Data System (ADS)

    D'Anca, Alessandro; Conte, Laura; Palazzo, Cosimo; Fiore, Sandro; Aloisio, Giovanni

    2016-04-01

    Defining and implementing processing chains with multiple (e.g. tens or hundreds of) data analytics operators can be a real challenge in many practical scientific use cases such as climate change indicators. This is usually done via scripts (e.g. bash) on the client side and requires climate scientists to take care of, implement and replicate workflow-like control logic (which may be error-prone too) in their scripts, along with the expected application-level part. Moreover, the large amount of data and the strong I/O demand pose additional performance challenges. In this regard, production-level tools for climate data analysis are mostly sequential, and there is a lack of big data analytics solutions implementing fine-grain data parallelism or adopting stronger parallel I/O strategies, data locality, workflow optimization, etc. High-level solutions leveraging workflow-enabled big data analytics frameworks for eScience could help scientists define and implement the workflows related to their experiments by exploiting a more declarative, efficient and powerful approach. This talk will start by introducing the main needs and challenges regarding big data analytics workflow management for eScience and will then provide some insights into the implementation of real use cases related to climate change indicators on large datasets produced in the context of the CLIP-C project - an EU FP7 project aiming at providing access to climate information of direct relevance to a wide variety of users, from scientists to policy makers and private sector decision makers. All the proposed use cases have been implemented using the Ophidia big data analytics framework. The software stack includes an internal workflow management system, which coordinates, orchestrates, and optimises the execution of multiple scientific data analytics and visualization tasks. Real-time monitoring of workflow execution is also supported through a graphical user interface. In order to address the challenges of the use cases, the implemented data analytics workflows include parallel data analysis, metadata management, virtual file system tasks, map generation, rolling of datasets, and import/export of datasets in NetCDF format. The use cases have been implemented on 8 nodes (16 cores/node) of the Athena HPC cluster available at the CMCC Supercomputing Centre. Benchmark results will also be presented during the talk.
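
    As a rough standalone illustration of the kind of indicator such workflows compute (this is not the Ophidia implementation, and the file and variable names are placeholders following common CMIP conventions), the sketch below counts 'summer days' per grid point from a NetCDF file of daily maximum temperature:

      import numpy as np
      from netCDF4 import Dataset

      # Placeholder file/variable names; CMIP-style daily maximum near-surface
      # air temperature is conventionally called "tasmax" and stored in kelvin.
      with Dataset("tasmax_day_model_rcp85.nc") as ds:
          tasmax = ds.variables["tasmax"][:]        # shape: (time, lat, lon)

      THRESHOLD_K = 25.0 + 273.15
      # Summer-days indicator: number of days per grid point with tasmax > 25 degC.
      summer_days = np.sum(tasmax > THRESHOLD_K, axis=0)

      print("max summer days at any grid point:", int(summer_days.max()))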

  11. Echo

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Harvey, Dustin Yewell

    This document is a white paper marketing proposal for Echo™, a data analysis platform designed for efficient, robust, and scalable creation and execution of complex workflows. Echo’s analysis management system refers to the ability to track, understand, and reproduce the workflows used for arriving at results and decisions. Echo improves on traditional scripted data analysis in MATLAB, Python, R, and other languages, allowing analysts to make better use of their time. Additionally, the Echo platform provides a powerful data management and curation solution that allows analysts to quickly find, access, and consume datasets. After two years of development and a first release in early 2016, Echo is now available for use with many data types in a wide range of application domains. Echo provides tools that allow users to focus on data analysis and decisions with confidence that results are reported accurately.

  12. Knowledge Annotations in Scientific Workflows: An Implementation in Kepler

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gandara, Aida G.; Chin, George; Pinheiro Da Silva, Paulo

    2011-07-20

    Scientific research products are the result of long-term collaborations between teams. Scientific workflows are capable of helping scientists in many ways, including the collection of information on how research was conducted; e.g., scientific workflow tools often collect and manage information about datasets used and data transformations. However, knowledge about why data was collected is rarely documented in scientific workflows. In this paper we describe a prototype system built to support the collection of scientific expertise that influences scientific analysis. Through evaluating a scientific research effort underway at Pacific Northwest National Laboratory, we identified features that would most benefit PNNL scientists in documenting how and why they conduct their research, making this information available to the entire team. The prototype system was built by enhancing the Kepler Scientific Workflow System to create knowledge-annotated scientific workflows and to publish them as semantic annotations.

  13. Clinic Workflow Simulations using Secondary EHR Data

    PubMed Central

    Hribar, Michelle R.; Biermann, David; Read-Brown, Sarah; Reznick, Leah; Lombardi, Lorinna; Parikh, Mansi; Chamberlain, Winston; Yackel, Thomas R.; Chiang, Michael F.

    2016-01-01

    Clinicians today face increased patient loads, decreased reimbursements and potential negative productivity impacts of using electronic health records (EHR), but have little guidance on how to improve clinic efficiency. Discrete event simulation models are powerful tools for evaluating clinical workflow and improving efficiency, particularly when they are built from secondary EHR timing data. The purpose of this study is to demonstrate that these simulation models can be used for resource allocation decision making as well as for evaluating novel scheduling strategies in outpatient ophthalmology clinics. Key findings from this study are that: 1) secondary use of EHR timestamp data in simulation models represents clinic workflow, 2) simulations provide insight into the best allocation of resources in a clinic, 3) simulations provide critical information for schedule creation and decision making by clinic managers, and 4) simulation models built from EHR data are potentially generalizable. PMID:28269861
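
    A minimal sketch of such a discrete event simulation, written with the SimPy library and using made-up service-time distributions in place of the EHR-derived timestamps, is shown below:

      import random
      import simpy

      random.seed(1)
      wait_times = []

      def patient(env, name, technicians, physician):
          arrival = env.now
          with technicians.request() as req:          # work-up by a technician
              yield req
              yield env.timeout(random.expovariate(1 / 15))   # ~15 min work-up
          with physician.request() as req:            # exam by the ophthalmologist
              yield req
              wait_times.append(env.now - arrival)
              yield env.timeout(random.expovariate(1 / 10))   # ~10 min exam

      def arrivals(env, technicians, physician):
          for i in range(60):                          # one clinic session of patients
              env.process(patient(env, f"p{i}", technicians, physician))
              yield env.timeout(random.expovariate(1 / 8))    # ~8 min between arrivals

      env = simpy.Environment()
      technicians = simpy.Resource(env, capacity=2)    # resource allocation to evaluate
      physician = simpy.Resource(env, capacity=1)
      env.process(arrivals(env, technicians, physician))
      env.run()

      print(f"mean time from arrival to physician: {sum(wait_times)/len(wait_times):.1f} min")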

  14. Modeling Complex Workflow in Molecular Diagnostics

    PubMed Central

    Gomah, Mohamed E.; Turley, James P.; Lu, Huimin; Jones, Dan

    2010-01-01

    One of the hurdles to achieving personalized medicine has been implementing the laboratory processes for performing and reporting complex molecular tests. The rapidly changing test rosters and complex analysis platforms in molecular diagnostics have meant that many clinical laboratories still use labor-intensive manual processing and testing without the level of automation seen in high-volume chemistry and hematology testing. We provide here a discussion of design requirements and the results of implementation of a suite of lab management tools that incorporate the many elements required for use of molecular diagnostics in personalized medicine, particularly in cancer. These applications provide the functionality required for sample accessioning and tracking, material generation, and testing that are particular to the evolving needs of individualized molecular diagnostics. On implementation, the applications described here resulted in improvements in the turn-around time for reporting of more complex molecular test sets, and significant changes in the workflow. Therefore, careful mapping of workflow can permit design of software applications that simplify even the complex demands of specialized molecular testing. By incorporating design features for order review, software tools can permit a more personalized approach to sample handling and test selection without compromising efficiency. PMID:20007844

  15. A case study for cloud based high throughput analysis of NGS data using the globus genomics system

    DOE PAGES

    Bhuvaneshwar, Krithika; Sulakhe, Dinanath; Gauba, Robinder; ...

    2015-01-01

    Next generation sequencing (NGS) technologies produce massive amounts of data requiring a powerful computational infrastructure, high quality bioinformatics software, and skilled personnel to operate the tools. We present a case study of a practical solution to this data management and analysis challenge that simplifies terabyte scale data handling and provides advanced tools for NGS data analysis. These capabilities are implemented using the “Globus Genomics” system, which is an enhanced Galaxy workflow system made available as a service that offers users the capability to process and transfer data easily, reliably and quickly to address end-to-end NGS analysis requirements. The Globus Genomics system is built on Amazon's cloud computing infrastructure. The system takes advantage of elastic scaling of compute resources to run multiple workflows in parallel and it also helps meet the scale-out analysis needs of modern translational genomics research.

  16. A case study for cloud based high throughput analysis of NGS data using the globus genomics system

    PubMed Central

    Bhuvaneshwar, Krithika; Sulakhe, Dinanath; Gauba, Robinder; Rodriguez, Alex; Madduri, Ravi; Dave, Utpal; Lacinski, Lukasz; Foster, Ian; Gusev, Yuriy; Madhavan, Subha

    2014-01-01

    Next generation sequencing (NGS) technologies produce massive amounts of data requiring a powerful computational infrastructure, high quality bioinformatics software, and skilled personnel to operate the tools. We present a case study of a practical solution to this data management and analysis challenge that simplifies terabyte scale data handling and provides advanced tools for NGS data analysis. These capabilities are implemented using the “Globus Genomics” system, which is an enhanced Galaxy workflow system made available as a service that offers users the capability to process and transfer data easily, reliably and quickly to address end-to-end NGS analysis requirements. The Globus Genomics system is built on Amazon's cloud computing infrastructure. The system takes advantage of elastic scaling of compute resources to run multiple workflows in parallel and it also helps meet the scale-out analysis needs of modern translational genomics research. PMID:26925205

  17. Experiences and lessons learned from creating a generalized workflow for data publication of field campaign datasets

    NASA Astrophysics Data System (ADS)

    Santhana Vannan, S. K.; Ramachandran, R.; Deb, D.; Beaty, T.; Wright, D.

    2017-12-01

    This paper summarizes the workflow challenges of curating and publishing data produced from disparate data sources and provides a generalized workflow solution to efficiently archive data generated by researchers. The Oak Ridge National Laboratory Distributed Active Archive Center (ORNL DAAC) for biogeochemical dynamics and the Global Hydrology Resource Center (GHRC) DAAC have been collaborating on the development of a generalized workflow solution to efficiently manage the data publication process. The generalized workflow presented here is built on lessons learned from implementations of the workflow system. Data publication consists of the following steps: (1) accepting the data package from the data providers and ensuring the full integrity of the data files; (2) identifying and addressing data quality issues; (3) assembling standardized, detailed metadata and documentation, including file-level details, processing methodology, and characteristics of the data files; (4) setting up data access mechanisms; (5) setting up the data in data tools and services for improved data dissemination and user experience; (6) registering the dataset in online search and discovery catalogues; and (7) preserving the data location through Digital Object Identifiers (DOIs). We will describe the steps taken to automate the above process and realize efficiencies. The goals of the workflow system are to reduce the time taken to publish a dataset, to increase the quality of documentation and metadata, and to track individual datasets through the data curation process. Utilities developed to achieve these goals will be described. We will also share the metrics-driven value of the workflow system and discuss future steps towards the creation of a common software framework.

  18. Neuroimaging Study Designs, Computational Analyses and Data Provenance Using the LONI Pipeline

    PubMed Central

    Dinov, Ivo; Lozev, Kamen; Petrosyan, Petros; Liu, Zhizhong; Eggert, Paul; Pierce, Jonathan; Zamanyan, Alen; Chakrapani, Shruthi; Van Horn, John; Parker, D. Stott; Magsipoc, Rico; Leung, Kelvin; Gutman, Boris; Woods, Roger; Toga, Arthur

    2010-01-01

    Modern computational neuroscience employs diverse software tools and multidisciplinary expertise to analyze heterogeneous brain data. The classical problems of gathering meaningful data, fitting specific models, and discovering appropriate analysis and visualization tools give way to a new class of computational challenges—management of large and incongruous data, integration and interoperability of computational resources, and data provenance. We designed, implemented and validated a new paradigm for addressing these challenges in the neuroimaging field. Our solution is based on the LONI Pipeline environment [3], [4], a graphical workflow environment for constructing and executing complex data processing protocols. We developed study-design, database and visual language programming functionalities within the LONI Pipeline that enable the construction of complete, elaborate and robust graphical workflows for analyzing neuroimaging and other data. These workflows facilitate open sharing and communication of data and metadata, concrete processing protocols, result validation, and study replication among different investigators and research groups. The LONI Pipeline features include distributed grid-enabled infrastructure, virtualized execution environment, efficient integration, data provenance, validation and distribution of new computational tools, automated data format conversion, and an intuitive graphical user interface. We demonstrate the new LONI Pipeline features using large scale neuroimaging studies based on data from the International Consortium for Brain Mapping [5] and the Alzheimer's Disease Neuroimaging Initiative [6]. User guides, forums, instructions and downloads of the LONI Pipeline environment are available at http://pipeline.loni.ucla.edu. PMID:20927408

  19. A Web application for the management of clinical workflow in image‐guided and adaptive proton therapy for prostate cancer treatments

    PubMed Central

    Boes, Peter; Ho, Meng Wei; Li, Zuofeng

    2015-01-01

    Image‐guided radiotherapy (IGRT), based on radiopaque markers placed in the prostate gland, was used for proton therapy of prostate patients. Orthogonal X‐rays and the IBA Digital Image Positioning System (DIPS) were used for setup correction prior to treatment and were repeated after treatment delivery. Following a rationale for margin estimates similar to that of van Herk,(1) the daily post‐treatment DIPS data were analyzed to determine if an adaptive radiotherapy plan was necessary. A Web application using ASP.NET MVC5, Entity Framework, and an SQL database was designed to automate this process. The designed features included state‐of‐the‐art Web technologies, a domain model closely matching the workflow, a database‐supporting concurrency and data mining, access to the DIPS database, secured user access and roles management, and graphing and analysis tools. The Model‐View‐Controller (MVC) paradigm allowed clean domain logic, unit testing, and extensibility. Client‐side technologies, such as jQuery, jQuery Plug‐ins, and Ajax, were adopted to achieve a rich user environment and fast response. Data models included patients, staff, treatment fields and records, correction vectors, DIPS images, and association logics. Data entry, analysis, workflow logics, and notifications were implemented. The system effectively modeled the clinical workflow and IGRT process. PACS number: 87 PMID:26103504

  20. Embracing the Archives: How NPR Librarians Turned Their Collection into a Workflow Tool

    ERIC Educational Resources Information Center

    Sin, Lauren; Daugert, Katie

    2013-01-01

    Several years ago, National Public Radio (NPR) librarians began developing a new content management system (CMS). It was intended to offer desktop access for all NPR-produced content, including transcripts, audio, and metadata. Fast-forward to 2011, and their shiny, new database, Artemis, was ready for debut. Their next challenge: to teach a staff…

  1. DataUp: Helping manage and archive data within the researcher's workflow

    NASA Astrophysics Data System (ADS)

    Strasser, C.

    2012-12-01

    There are many barriers to data management and sharing among earth and environmental scientists; among the most significant is a lack of knowledge about best practices for data management, metadata standards, and appropriate data repositories for archiving and sharing data. We have developed an open-source add-in for Excel and an open source web application intended to help researchers overcome these barriers. DataUp helps scientists to (1) determine whether their file is CSV compatible, (2) generate metadata in a standard format, (3) retrieve an identifier to facilitate data citation, and (4) deposit their data into a repository. The researcher does not need a prior relationship with a data repository to use DataUp; the newly implemented ONEShare repository, a DataONE member node, is available for any researcher to archive and share their data. By meeting researchers where they already work, in spreadsheets, DataUp becomes part of the researcher's workflow, and data management and sharing becomes easier. Future enhancement of DataUp will rely on members of the community adopting and adapting the DataUp tools to meet their unique needs, including connecting to analytical tools, adding new metadata schema, and expanding the list of connected data repositories. DataUp is a collaborative project between Microsoft Research Connections, the University of California's California Digital Library, the Gordon and Betty Moore Foundation, and DataONE.
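
    A rough sketch of the first of those checks (not the DataUp code itself; the file name is a placeholder) shows how a few lines of Python can flag spreadsheet habits that break CSV compatibility, such as ragged rows or blank column headings:

      import csv

      def check_csv_compatible(path):
          """Report simple problems that keep a spreadsheet from being clean CSV."""
          problems = []
          with open(path, newline="") as fh:
              rows = list(csv.reader(fh))
          if not rows:
              return ["file is empty"]
          header = rows[0]
          if any(cell.strip() == "" for cell in header):
              problems.append("blank column heading in the first row")
          for i, row in enumerate(rows[1:], start=2):
              if len(row) != len(header):
                  problems.append(f"row {i} has {len(row)} cells, header has {len(header)}")
          return problems

      # Placeholder file name for illustration.
      for issue in check_csv_compatible("field_measurements.csv"):
          print("WARNING:", issue)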

  2. Integration and Value of Earth Observations Data for Water Management Decision-Making in the Western U.S.

    NASA Astrophysics Data System (ADS)

    Larsen, S. G.; Willardson, T.

    2017-12-01

    Some exciting new science and tools are under development for water management decision-making in the Western U.S. This session will highlight a number of examples where remotely-sensed observation data has been directly beneficial to water resource stakeholders, and discuss the steps needed between receipt of the data and their delivery as a finished data product or tool. We will explore case studies of how NASA scientists and researchers have worked together with western state water agencies and other stakeholders as a team, to develop and interpret remotely-sensed data observations, implement easy-to-use software and tools, train team members on their operation, and transition those tools into the institution's workflows. The benefits of integrating these tools into stakeholder, agency, and end-user operations can be seen on the ground, when water is optimally managed for the decision-maker's objectives. These cases also point to the importance of building relationships and conduits for communication between researchers and their institutional counterparts.

  3. Integration and Value of Earth Observations Data for Water Management Decision-Making in the Western U.S.

    NASA Astrophysics Data System (ADS)

    Larsen, S. G.; Willardson, T.

    2016-12-01

    Some exciting new science and tools are under development for water management decision-making in the Western U.S. This session will highlight a number of examples where remotely-sensed observation data has been directly beneficial to water resource stakeholders, and discuss the steps needed between receipt of the data and their delivery as a finished data product or tool. We will explore case studies of how NASA scientists and researchers have worked together with western state water agencies and other stakeholders as a team, to develop and interpret remotely-sensed data observations, implement easy-to-use software and tools, train team members on their operation, and transition those tools into the institution's workflows. The benefits of integrating these tools into stakeholder, agency, and end-user operations can be seen on the ground, when water is optimally managed for the decision-maker's objectives. These cases also point to the importance of building relationships and conduits for communication between researchers and their institutional counterparts.

  4. High-volume workflow management in the ITN/FBI system

    NASA Astrophysics Data System (ADS)

    Paulson, Thomas L.

    1997-02-01

    The Identification Tasking and Networking (ITN) Federal Bureau of Investigation system will manage the processing of more than 70,000 submissions per day. The workflow manager controls the routing of each submission through a combination of automated and manual processing steps whose exact sequence is dynamically determined by the results at each step. For most submissions, one or more of the steps involve the visual comparison of fingerprint images. The ITN workflow manager is implemented within a scalable client/server architecture. The paper describes the key aspects of the ITN workflow manager design that allow the high volume of daily processing to be successfully accomplished.
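
    A highly simplified sketch of result-driven routing is shown below; it is illustrative only and does not reflect the actual ITN design. Each processing step returns an outcome, and the outcome selects the next step for that submission.

      # Each processing step returns an outcome; the outcome selects the next step.
      def automated_match(submission):
          return "hit" if submission["score"] > 0.9 else "review"

      def visual_comparison(submission):
          return "hit" if submission["examiner_confirms"] else "no_hit"

      ROUTES = {
          ("automated_match", "hit"): None,                    # finished
          ("automated_match", "review"): "visual_comparison",  # route to an examiner
          ("visual_comparison", "hit"): None,
          ("visual_comparison", "no_hit"): None,
      }
      STEPS = {"automated_match": automated_match, "visual_comparison": visual_comparison}

      def route(submission):
          step = "automated_match"
          while step is not None:
              outcome = STEPS[step](submission)
              print(f"{step} -> {outcome}")
              step = ROUTES[(step, outcome)]

      route({"score": 0.75, "examiner_confirms": True})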

  5. Towards a Unified Architecture for Data-Intensive Seismology in VERCE

    NASA Astrophysics Data System (ADS)

    Klampanos, I.; Spinuso, A.; Trani, L.; Krause, A.; Garcia, C. R.; Atkinson, M.

    2013-12-01

    Modern seismology involves managing, storing and processing large datasets, typically geographically distributed across organisations. Performing computational experiments using these data generates more data, which in turn have to be managed, further analysed and frequently made available within or outside the scientific community. As part of the EU-funded project VERCE (http://verce.eu), we research and develop a number of use cases and interfacing technologies to satisfy the data-intensive requirements of modern seismology. Our solution seeks to support: (1) familiar programming environments to develop and execute experiments, in particular via Python/ObsPy, (2) a unified view of heterogeneous computing resources, public or private, through the adoption of workflows, (3) monitoring the experiments and validating the data products at varying granularities, via a comprehensive provenance system, (4) reproducibility of experiments and consistency in collaboration, via a shared registry of processing units and contextual metadata (computing resources, data, etc.). Here, we provide a brief account of these components and their roles in the proposed architecture. Our design integrates heterogeneous distributed systems, while allowing researchers to retain current practices and control data handling and execution via higher-level abstractions. At the core of our solution lies the workflow language Dispel. While Dispel can be used to express workflows at fine detail, it may also be used as part of meta- or job-submission workflows. User interaction can be provided through a visual editor or through custom applications on top of parameterisable workflows, which is the approach VERCE follows. According to our design, the scientist may use versions of Dispel/workflow processing elements offered by the VERCE library or override them by introducing custom scientific code using ObsPy. This approach has the advantage that, while the scientist uses a familiar tool, the resulting workflow can be executed transparently on a number of underlying stream-processing engines, such as STORM or OGSA-DAI. While making efficient use of arbitrarily distributed resources and large datasets is a priority, such processing requires adequate provenance tracking and monitoring. Hiding computation and orchestration details via a workflow system allows us to embed provenance harvesting where appropriate without impeding the user's regular working patterns. Our provenance model is based on the W3C PROV standard and can provide information of varying granularity regarding execution, systems and data consumption/production. A video demonstrating a prototype provenance exploration tool can be found at http://bit.ly/15t0Fz0. Keeping experimental methodology and results open and accessible, as well as encouraging reproducibility and collaboration, is of central importance to modern science. As our users are expected to be based at different geographical locations, to have access to different computing resources and to employ customised scientific codes, the use of a shared registry of workflow components, implementations, data and computing resources is critical.
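
    A small sketch of the kind of ObsPy code a scientist might embed in such a workflow processing element is shown below; it uses ObsPy's bundled example data rather than a real federated dataset, and the processing parameters are arbitrary.

      from obspy import read

      # ObsPy ships a small example data set, so this runs without remote access;
      # in VERCE the same code would operate on streams delivered by the workflow.
      stream = read()                         # three traces of the example event
      stream.detrend("demean")                # remove the mean from each trace
      stream.filter("bandpass", freqmin=1.0, freqmax=10.0)

      for trace in stream:
          print(trace.id, "peak amplitude:", abs(trace.data).max())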

  6. A UIMA wrapper for the NCBO annotator.

    PubMed

    Roeder, Christophe; Jonquet, Clement; Shah, Nigam H; Baumgartner, William A; Verspoor, Karin; Hunter, Lawrence

    2010-07-15

    The Unstructured Information Management Architecture (UIMA) framework and web services are emerging as useful tools for integrating biomedical text mining tools. This note describes our work, which wraps the National Center for Biomedical Ontology (NCBO) Annotator-an ontology-based annotation service-to make it available as a component in UIMA workflows. This wrapper is freely available on the web at http://bionlp-uima.sourceforge.net/ as part of the UIMA tools distribution from the Center for Computational Pharmacology (CCP) at the University of Colorado School of Medicine. It has been implemented in Java for support on Mac OS X, Linux and MS Windows.
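
    The same annotation service is also reachable over plain HTTP. As a sketch (the API key is a placeholder, and the parameters should be checked against the current NCBO BioPortal documentation), a REST call from Python might look like this:

      import requests

      # Placeholder API key; NCBO BioPortal issues one per registered user.
      API_KEY = "your-bioportal-apikey"
      TEXT = "Melanoma is a malignant tumor of melanocytes."

      response = requests.get(
          "http://data.bioontology.org/annotator",
          params={"text": TEXT, "apikey": API_KEY},
          timeout=30,
      )
      response.raise_for_status()

      # Print each annotated ontology class and the text spans it matched.
      for annotation in response.json():
          cls = annotation["annotatedClass"]
          print(cls["@id"], [a["text"] for a in annotation["annotations"]])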

  7. Primary care physicians' perspectives on computer-based health risk assessment tools for chronic diseases: a mixed methods study.

    PubMed

    Voruganti, Teja R; O'Brien, Mary Ann; Straus, Sharon E; McLaughlin, John R; Grunfeld, Eva

    2015-09-24

    Health risk assessment tools compute an individual's risk of developing a disease. Routine use of such tools by primary care physicians (PCPs) is potentially useful in chronic disease prevention. We sought physicians' awareness and perceptions of the usefulness, usability and feasibility of performing assessments with computer-based risk assessment tools in primary care settings. Focus groups and usability testing with a computer-based risk assessment tool were conducted with PCPs from both university-affiliated and community-based practices. Analysis was derived from grounded theory methodology. PCPs (n = 30) were aware of several risk assessment tools although only select tools were used routinely. The decision to use a tool depended on how use impacted practice workflow and whether the tool had credibility. Participants felt that embedding tools in the electronic medical records (EMRs) system might allow for health information from the medical record to auto-populate into the tool. User comprehension of risk could also be improved with computer-based interfaces that present risk in different formats. In this study, PCPs chose to use certain tools more regularly because of usability and credibility. Despite there being differences in the particular tools a clinical practice used, there was general appreciation for the usefulness of tools for different clinical situations. Participants characterised particular features of an ideal tool, feeling strongly that embedding risk assessment tools in the EMR would maximise accessibility and use of the tool for chronic disease management. However, appropriate practice workflow integration and features that facilitate patient understanding at point-of-care are also essential.

  8. Maestro Workflow Conductor

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Di Natale, Francesco

    2017-06-01

    MaestroWF is a Python tool and software package for loading YAML study specifications that represent a simulation campaign. The package is capable of parameterizing a study, pulling dependencies automatically, formatting output directories, and managing the flow and execution of the campaign. MaestroWF also provides a set of abstracted objects that can be used to develop user-specific scripts for launching simulation campaigns.

  9. Mobile Technologies: Tools for Organizational Learning and Management in Schools. iPrincipals: Analyzing the Use of iPads by School Administrators

    ERIC Educational Resources Information Center

    Winslow, Joe; Dickerson, Jeremy; Lee, Cheng-Yuan; Geer, Gregory

    2012-01-01

    This paper reports findings from an evaluation of a district-wide initiative deploying iPads to school administrators (principals) to improve workflow efficiencies and promote technology leadership self-efficacy. The findings indicate that iPad utilization not only facilitated administrative tasks (memos, calendars, etc.), but also improved…

  10. Designing the safety of healthcare. Participation of ergonomics to the design of cooperative systems in radiotherapy.

    PubMed

    Munoz, Maria Isabel; Bouldi, Nadia; Barcellini, Flore; Nascimento, Adelaide

    2012-01-01

    This communication deals with the involvement of ergonomists in a research-action design process for a software platform in radiotherapy. The goal of the design project is to enhance patient safety by designing workflow software that supports cooperation between the professionals delivering treatment in radiotherapy. The general framework of our approach is the ergonomic management of a design process, which is based on activity analysis and grounded in participatory design. The present action concerns two fields: a design environment, namely a participatory design process that involves software designers, caregivers as future users and ergonomists; and a reference real work setting in radiotherapy. Observations, semi-structured interviews and participatory workshops allow the characterization of activity in radiotherapy, covering the use of cooperative tools, sources of variability and non-ruled strategies for managing the variability of situations. This production of knowledge about work seeks to enhance the articulation between technocentric and anthropocentric approaches, and helps clarify design requirements. One aim of this research-action is to develop a framework to define the parameters of the workflow tool and the conditions of its deployment.

  11. Towards Automatic Validation and Healing of Citygml Models for Geometric and Semantic Consistency

    NASA Astrophysics Data System (ADS)

    Alam, N.; Wagner, D.; Wewetzer, M.; von Falkenhausen, J.; Coors, V.; Pries, M.

    2013-09-01

    A steadily growing number of application fields for large 3D city models have emerged in recent years. As in many other domains, data quality is recognized as a key factor for successful business. Quality management is mandatory in the production chain nowadays. Automated domain-specific tools are widely used for validation of business-critical data, but common standards defining correct geometric modeling are still not precise enough to define a sound basis for data validation of 3D city models. Although the workflow for 3D city models is well established from data acquisition to processing, analysis and visualization, quality management is not yet a standard during this workflow. Processing data sets with unclear specifications leads to erroneous results and application defects. We show that this problem persists even if data are standard compliant. Validation results of real-world city models are presented to demonstrate the potential of the approach. A tool to repair the errors detected during the validation process is under development; first results are presented and discussed. The goal is to heal defects of the models automatically and export a corrected CityGML model.
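
    Part of such a validation step can be scripted with standard XML tooling. The sketch below runs an XSD check with lxml; the schema and model paths are placeholders, and real CityGML validation goes well beyond schema conformance into geometric and semantic rules.

      from lxml import etree

      # Placeholder paths: a local copy of the CityGML XSD and the model under test.
      schema = etree.XMLSchema(etree.parse("CityGML_2.0/CityGML.xsd"))
      document = etree.parse("city_model.gml")

      if schema.validate(document):
          print("schema-valid CityGML document")
      else:
          # Schema validation catches structural errors only; geometric/semantic
          # consistency (e.g. closed solids, correct orientation) needs extra checks.
          for error in schema.error_log:
              print(error.line, error.message)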

  12. Context-aware workflow management of mobile health applications.

    PubMed

    Salden, Alfons; Poortinga, Remco

    2006-01-01

    We propose a medical application management architecture that allows medical (IT) experts to readily design, develop and deploy context-aware mobile health (m-health) applications or services. In particular, we elaborate on how our application workflow management architecture enables chaining, coordinating, composing, and adapting context-sensitive medical application components such that the critical Quality of Service (QoS) and Quality of Context (QoC) requirements typical of m-health applications or services can be met. This functional architectural support requires learning modules for distilling application-critical selection of attention and anticipation models. These models will help medical experts construct and adjust m-health application workflows and workflow strategies on the fly. We illustrate our context-aware workflow management paradigm for an m-health data delivery problem, in which optimal communication network configurations have to be determined.

  13. Performing statistical analyses on quantitative data in Taverna workflows: an example using R and maxdBrowse to identify differentially-expressed genes from microarray data.

    PubMed

    Li, Peter; Castrillo, Juan I; Velarde, Giles; Wassink, Ingo; Soiland-Reyes, Stian; Owen, Stuart; Withers, David; Oinn, Tom; Pocock, Matthew R; Goble, Carole A; Oliver, Stephen G; Kell, Douglas B

    2008-08-07

    There has been a dramatic increase in the amount of quantitative data derived from the measurement of changes at different levels of biological complexity during the post-genomic era. However, there are a number of issues associated with the use of computational tools employed for the analysis of such data. For example, computational tools such as R and MATLAB require prior knowledge of their programming languages in order to implement statistical analyses on data. Combining two or more tools in an analysis may also be problematic since data may have to be manually copied and pasted between separate user interfaces for each tool. Furthermore, this transfer of data may require a reconciliation step in order for there to be interoperability between computational tools. Developments in the Taverna workflow system have enabled pipelines to be constructed and enacted for generic and ad hoc analyses of quantitative data. Here, we present an example of such a workflow involving the statistical identification of differentially-expressed genes from microarray data followed by the annotation of their relationships to cellular processes. This workflow makes use of customised maxdBrowse web services, a system that allows Taverna to query and retrieve gene expression data from the maxdLoad2 microarray database. These data are then analysed by R to identify differentially-expressed genes using the Taverna RShell processor which has been developed for invoking this tool when it has been deployed as a service using the RServe library. In addition, the workflow uses Beanshell scripts to reconcile mismatches of data between services as well as to implement a form of user interaction for selecting subsets of microarray data for analysis as part of the workflow execution. A new plugin system in the Taverna software architecture is demonstrated by the use of renderers for displaying PDF files and CSV formatted data within the Taverna workbench. Taverna can be used by data analysis experts as a generic tool for composing ad hoc analyses of quantitative data by combining the use of scripts written in the R programming language with tools exposed as services in workflows. When these workflows are shared with colleagues and the wider scientific community, they provide an approach for other scientists wanting to use tools such as R without having to learn the corresponding programming language to analyse their own data.
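
    Within the workflow the statistics are delegated to R via the RShell processor; as a rough standalone illustration of the kind of per-gene test involved (synthetic data, and a plain t-test in Python rather than the R service used by the workflow), see the sketch below:

      import numpy as np
      from scipy import stats

      rng = np.random.default_rng(0)

      # Synthetic expression matrix: 100 genes, 3 control and 3 treated arrays;
      # the first 10 genes are shifted upwards in the treated group.
      control = rng.normal(8.0, 1.0, size=(100, 3))
      treated = rng.normal(8.0, 1.0, size=(100, 3))
      treated[:10] += 2.0

      t_stat, p_values = stats.ttest_ind(treated, control, axis=1)

      # Crude multiple-testing correction (Bonferroni) for the illustration.
      significant = np.where(p_values * len(p_values) < 0.05)[0]
      print("differentially expressed genes:", significant.tolist())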

  14. Performing statistical analyses on quantitative data in Taverna workflows: An example using R and maxdBrowse to identify differentially-expressed genes from microarray data

    PubMed Central

    Li, Peter; Castrillo, Juan I; Velarde, Giles; Wassink, Ingo; Soiland-Reyes, Stian; Owen, Stuart; Withers, David; Oinn, Tom; Pocock, Matthew R; Goble, Carole A; Oliver, Stephen G; Kell, Douglas B

    2008-01-01

    Background There has been a dramatic increase in the amount of quantitative data derived from the measurement of changes at different levels of biological complexity during the post-genomic era. However, there are a number of issues associated with the use of computational tools employed for the analysis of such data. For example, computational tools such as R and MATLAB require prior knowledge of their programming languages in order to implement statistical analyses on data. Combining two or more tools in an analysis may also be problematic since data may have to be manually copied and pasted between separate user interfaces for each tool. Furthermore, this transfer of data may require a reconciliation step in order for there to be interoperability between computational tools. Results Developments in the Taverna workflow system have enabled pipelines to be constructed and enacted for generic and ad hoc analyses of quantitative data. Here, we present an example of such a workflow involving the statistical identification of differentially-expressed genes from microarray data followed by the annotation of their relationships to cellular processes. This workflow makes use of customised maxdBrowse web services, a system that allows Taverna to query and retrieve gene expression data from the maxdLoad2 microarray database. These data are then analysed by R to identify differentially-expressed genes using the Taverna RShell processor which has been developed for invoking this tool when it has been deployed as a service using the RServe library. In addition, the workflow uses Beanshell scripts to reconcile mismatches of data between services as well as to implement a form of user interaction for selecting subsets of microarray data for analysis as part of the workflow execution. A new plugin system in the Taverna software architecture is demonstrated by the use of renderers for displaying PDF files and CSV formatted data within the Taverna workbench. Conclusion Taverna can be used by data analysis experts as a generic tool for composing ad hoc analyses of quantitative data by combining the use of scripts written in the R programming language with tools exposed as services in workflows. When these workflows are shared with colleagues and the wider scientific community, they provide an approach for other scientists wanting to use tools such as R without having to learn the corresponding programming language to analyse their own data. PMID:18687127

  15. Implementing bioinformatic workflows within the bioextract server

    USDA-ARS?s Scientific Manuscript database

    Computational workflows in bioinformatics are becoming increasingly important in the achievement of scientific advances. These workflows typically require the integrated use of multiple, distributed data sources and analytic tools. The BioExtract Server (http://bioextract.org) is a distributed servi...

  16. GeneSeqToFamily: a Galaxy workflow to find gene families based on the Ensembl Compara GeneTrees pipeline.

    PubMed

    Thanki, Anil S; Soranzo, Nicola; Haerty, Wilfried; Davey, Robert P

    2018-03-01

    Gene duplication is a major factor contributing to evolutionary novelty, and the contraction or expansion of gene families has often been associated with morphological, physiological, and environmental adaptations. The study of homologous genes helps us to understand the evolution of gene families. It plays a vital role in finding ancestral gene duplication events as well as identifying genes that have diverged from a common ancestor under positive selection. There are various tools available, such as MSOAR, OrthoMCL, and HomoloGene, to identify gene families and visualize syntenic information between species, providing an overview of the evolution of syntenic regions at the family level. Unfortunately, none of them provide information about structural changes within genes, such as the conservation of ancestral exon boundaries among multiple genomes. The Ensembl GeneTrees computational pipeline generates gene trees based on coding sequences, provides details about exon conservation, and is used in the Ensembl Compara project to discover gene families. A certain amount of expertise is required to configure and run the Ensembl Compara GeneTrees pipeline via command line. Therefore, we converted this pipeline into a Galaxy workflow, called GeneSeqToFamily, and provided additional functionality. This workflow uses existing tools from the Galaxy ToolShed, as well as providing additional wrappers and tools that are required to run the workflow. GeneSeqToFamily represents the Ensembl GeneTrees pipeline as a set of interconnected Galaxy tools, so they can be run interactively within Galaxy's user-friendly workflow environment while still providing the flexibility to tailor the analysis by changing configurations and tools if necessary. Additional tools allow users to subsequently visualize the gene families produced by the workflow, using the Aequatus.js interactive tool, which has been developed as part of the Aequatus software project.
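
    As a rough illustration of how such a Galaxy workflow can be driven programmatically rather than through the web interface, the sketch below uses the BioBlend client library; the server URL, API key, dataset and step identifiers are placeholders, and the input-step mapping of the real GeneSeqToFamily workflow may differ.

```python
# Sketch: driving a Galaxy workflow (e.g. an imported GeneSeqToFamily copy)
# from Python with BioBlend. URL, API key, and all IDs are placeholders;
# the input step mapping depends on the workflow actually imported.
from bioblend.galaxy import GalaxyInstance

gi = GalaxyInstance(url="https://usegalaxy.example.org", key="MY_API_KEY")

history = gi.histories.create_history(name="GeneSeqToFamily run")
upload = gi.tools.upload_file("coding_sequences.fasta", history["id"])
dataset_id = upload["outputs"][0]["id"]

# Assume the workflow has already been imported; look it up by name.
workflow = gi.workflows.get_workflows(name="GeneSeqToFamily")[0]

invocation = gi.workflows.invoke_workflow(
    workflow["id"],
    inputs={"0": {"src": "hda", "id": dataset_id}},  # step "0" assumed to be the FASTA input
    history_id=history["id"],
)
print("Invocation state:", invocation["state"])
```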

  17. LACO-Wiki: A land cover validation tool and a new, innovative teaching resource for remote sensing and the geosciences

    NASA Astrophysics Data System (ADS)

    See, Linda; Perger, Christoph; Dresel, Christopher; Hofer, Martin; Weichselbaum, Juergen; Mondel, Thomas; Steffen, Fritz

    2016-04-01

    The validation of land cover products is an important step in the workflow of generating a land cover map from remotely-sensed imagery. Many students of remote sensing will be given exercises on classifying a land cover map, followed by the validation process. Many algorithms exist for classification, embedded within proprietary image processing software or increasingly as open source tools. However, there is little standardization for land cover validation, nor is there a set of open tools available for implementing this process. The LACO-Wiki tool was developed as a way of filling this gap, bringing together standardized land cover validation methods and workflows into a single portal. This includes the storage and management of land cover maps and validation data; step-by-step instructions to guide users through the validation process; sound sampling designs; an easy-to-use environment for validation sample interpretation; and the generation of accuracy reports based on the validation process. The tool was developed for a range of users including producers of land cover maps, researchers, teachers and students. The use of such a tool could be embedded within the curriculum of remote sensing courses at a university level but is simple enough for use by students aged 13-18. A beta version of the tool is available for testing at: http://www.laco-wiki.net.
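
    The accuracy reports such a validation produces rest on standard confusion-matrix arithmetic; the following minimal sketch, with invented class labels and sample data, computes overall, user's and producer's accuracy from interpreted validation samples. It illustrates the method only and is not LACO-Wiki code.

```python
# Minimal sketch of the accuracy-report arithmetic behind land cover validation:
# confusion matrix plus overall, user's and producer's accuracy. Labels are invented.
import numpy as np

classes = ["forest", "cropland", "urban", "water"]
# Hypothetical per-sample labels: map class vs. label assigned by the interpreter.
mapped    = np.array([0, 0, 1, 2, 2, 3, 1, 0, 2, 1, 3, 0])
reference = np.array([0, 1, 1, 2, 2, 3, 1, 0, 0, 1, 3, 0])

n = len(classes)
confusion = np.zeros((n, n), dtype=int)
for m, r in zip(mapped, reference):
    confusion[m, r] += 1          # rows: map, columns: reference

overall = np.trace(confusion) / confusion.sum()
users = np.diag(confusion) / confusion.sum(axis=1)      # commission view (per map class)
producers = np.diag(confusion) / confusion.sum(axis=0)  # omission view (per reference class)

print(f"Overall accuracy: {overall:.2f}")
for c, u, p in zip(classes, users, producers):
    print(f"{c:10s} user's={u:.2f} producer's={p:.2f}")
```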

  18. Integration of EGA secure data access into Galaxy.

    PubMed

    Hoogstrate, Youri; Zhang, Chao; Senf, Alexander; Bijlard, Jochem; Hiltemann, Saskia; van Enckevort, David; Repo, Susanna; Heringa, Jaap; Jenster, Guido; J A Fijneman, Remond; Boiten, Jan-Willem; A Meijer, Gerrit; Stubbs, Andrew; Rambla, Jordi; Spalding, Dylan; Abeln, Sanne

    2016-01-01

    High-throughput molecular profiling techniques are routinely generating vast amounts of data for translational medicine studies. Secure access controlled systems are needed to manage, store, transfer and distribute these data due to their personally identifiable nature. The European Genome-phenome Archive (EGA) was created to facilitate access to, and management of, long-term archived bio-molecular data. Each data provider is responsible for ensuring a Data Access Committee is in place to grant access to data stored in the EGA. Moreover, the transfer of data during upload and download is encrypted. ELIXIR, a European research infrastructure for life-science data, initiated a project (2016 Human Data Implementation Study) to understand and document the ELIXIR requirements for secure management of controlled-access data. As part of this project, a full ecosystem was designed to connect archived raw experimental molecular profiling data with interpreted data and the computational workflows, using the CTMM Translational Research IT (CTMM-TraIT) infrastructure http://www.ctmm-trait.nl as an example. Here we present the first outcomes of this project, a framework to enable the download of EGA data to a Galaxy server in a secure way. Galaxy provides an intuitive user interface for molecular biologists and bioinformaticians to run and design data analysis workflows. More specifically, we developed a tool, ega_download_streamer, that can download data securely from EGA into a Galaxy server, where it can subsequently be further processed. This tool will allow a user within the browser to run an entire analysis containing sensitive data from EGA, and to make this analysis available for other researchers in a reproducible manner, as shown with a proof of concept study. The tool ega_download_streamer is available in the Galaxy tool shed: https://toolshed.g2.bx.psu.edu/view/yhoogstrate/ega_download_streamer.
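
    As a sketch of how such a Galaxy tool might be triggered programmatically, the snippet below uses BioBlend's tool-running call; the tool id and its parameter names are assumptions made for illustration (they are not taken from the ega_download_streamer tool definition), and the credentials and accession are placeholders.

```python
# Sketch only: launching a Galaxy tool such as ega_download_streamer through the
# Galaxy API with BioBlend. The tool id and parameter names below are assumptions
# (not taken from the tool's XML) and the EGA credentials are placeholders.
from bioblend.galaxy import GalaxyInstance

gi = GalaxyInstance(url="https://galaxy.example.org", key="MY_API_KEY")
history = gi.histories.create_history(name="EGA secure download")

tool_inputs = {
    "dataset_accession": "EGAD00000000000",   # hypothetical parameter name
    "username": "user@example.org",           # hypothetical parameter name
}
result = gi.tools.run_tool(
    history_id=history["id"],
    tool_id="ega_download_streamer",          # assumed id; check the ToolShed entry
    tool_inputs=tool_inputs,
)
print("Created datasets:", [d["id"] for d in result["outputs"]])
```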

  19. Integration of EGA secure data access into Galaxy

    PubMed Central

    Hoogstrate, Youri; Zhang, Chao; Senf, Alexander; Bijlard, Jochem; Hiltemann, Saskia; van Enckevort, David; Repo, Susanna; Heringa, Jaap; Jenster, Guido; Fijneman, Remond J.A.; Boiten, Jan-Willem; A. Meijer, Gerrit; Stubbs, Andrew; Rambla, Jordi; Spalding, Dylan; Abeln, Sanne

    2016-01-01

    High-throughput molecular profiling techniques are routinely generating vast amounts of data for translational medicine studies. Secure access controlled systems are needed to manage, store, transfer and distribute these data due to their personally identifiable nature. The European Genome-phenome Archive (EGA) was created to facilitate access to, and management of, long-term archived bio-molecular data. Each data provider is responsible for ensuring a Data Access Committee is in place to grant access to data stored in the EGA. Moreover, the transfer of data during upload and download is encrypted. ELIXIR, a European research infrastructure for life-science data, initiated a project (2016 Human Data Implementation Study) to understand and document the ELIXIR requirements for secure management of controlled-access data. As part of this project, a full ecosystem was designed to connect archived raw experimental molecular profiling data with interpreted data and the computational workflows, using the CTMM Translational Research IT (CTMM-TraIT) infrastructure http://www.ctmm-trait.nl as an example. Here we present the first outcomes of this project, a framework to enable the download of EGA data to a Galaxy server in a secure way. Galaxy provides an intuitive user interface for molecular biologists and bioinformaticians to run and design data analysis workflows. More specifically, we developed a tool, ega_download_streamer, that can download data securely from EGA into a Galaxy server, where it can subsequently be further processed. This tool will allow a user within the browser to run an entire analysis containing sensitive data from EGA, and to make this analysis available for other researchers in a reproducible manner, as shown with a proof of concept study. The tool ega_download_streamer is available in the Galaxy tool shed: https://toolshed.g2.bx.psu.edu/view/yhoogstrate/ega_download_streamer. PMID:28232859

  20. chemalot and chemalot_knime: Command line programs as workflow tools for drug discovery.

    PubMed

    Lee, Man-Ling; Aliagas, Ignacio; Feng, Jianwen A; Gabriel, Thomas; O'Donnell, T J; Sellers, Benjamin D; Wiswedel, Bernd; Gobbi, Alberto

    2017-06-12

    Analyzing files containing chemical information is at the core of cheminformatics. Each analysis may require a unique workflow. This paper describes the chemalot and chemalot_knime open source packages. Chemalot is a set of command line programs with a wide range of functionalities for cheminformatics. The chemalot_knime package allows command line programs that read SD files from stdin and write to stdout to be wrapped into KNIME nodes. The combination of chemalot and chemalot_knime not only facilitates the compilation and maintenance of sequences of command line programs but also allows KNIME workflows to take advantage of the compute power of a LINUX cluster. Use of the command line programs is demonstrated in three different workflow examples: (1) a workflow to create a data file with project-relevant data for structure-activity or property analysis and other types of investigations, (2) the creation of a quantitative structure-property-relationship model using the command line programs via KNIME nodes, and (3) the analysis of strain energy in small molecule ligand conformations from the Protein Data Bank database. The chemalot and chemalot_knime packages provide lightweight and powerful tools for many tasks in cheminformatics. They are easily integrated with other open source and commercial command line tools and can be combined to build new and even more powerful tools. The chemalot_knime package facilitates the generation and maintenance of user-defined command line workflows, taking advantage of the graphical design capabilities in KNIME. Graphical abstract: Example KNIME workflow with chemalot nodes and the corresponding command line pipe.
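
    The chemalot programs are designed to be chained through stdin/stdout; the generic Python sketch below shows one way such a pipe can be composed with the standard library. The command names and options are placeholders, not actual chemalot programs or flags.

```python
# Generic sketch of composing stdin/stdout command line tools into a pipe from
# Python, in the spirit of chaining chemalot programs. The command names and
# options below are placeholders, not actual chemalot flags.
import subprocess

filter_cmd  = ["sdfFilterTool", "--min-mw", "150"]      # hypothetical
profile_cmd = ["sdfPropertyCalc", "--props", "logP"]    # hypothetical

with open("input.sdf", "rb") as src, open("annotated.sdf", "wb") as dst:
    p1 = subprocess.Popen(filter_cmd, stdin=src, stdout=subprocess.PIPE)
    p2 = subprocess.Popen(profile_cmd, stdin=p1.stdout, stdout=dst)
    p1.stdout.close()      # allow p1 to receive SIGPIPE if p2 exits early
    returncode = p2.wait()
    p1.wait()

print("pipeline exit code:", returncode)
```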

  1. Inferring Clinical Workflow Efficiency via Electronic Medical Record Utilization

    PubMed Central

    Chen, You; Xie, Wei; Gunter, Carl A; Liebovitz, David; Mehrotra, Sanjay; Zhang, He; Malin, Bradley

    2015-01-01

    Complexity in clinical workflows can lead to inefficiency in making diagnoses, ineffectiveness of treatment plans and uninformed management of healthcare organizations (HCOs). Traditional strategies to manage workflow complexity are based on measuring the gaps between workflows defined by HCO administrators and the actual processes followed by staff in the clinic. However, existing methods tend to neglect the influences of EMR systems on the utilization of workflows, which could be leveraged to optimize workflows facilitated through the EMR. In this paper, we introduce a framework to infer clinical workflows through the utilization of an EMR and show how such workflows roughly partition into four types according to their efficiency. Our framework infers workflows at several levels of granularity through data mining technologies. We study four months of EMR event logs from a large medical center, including 16,569 inpatient stays, and illustrate that approximately 95% of workflows are efficient and that 80% of patients are on such workflows. At the same time, we show that the remaining 5% of workflows may be inefficient due to a variety of factors, such as complex patients. PMID:26958173
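
    A toy sketch of the first step of such an analysis, grouping EMR audit-log events into per-stay sequences and counting transitions between consecutive events, is shown below; the field names and events are invented, and the paper's framework goes considerably further.

```python
# Toy sketch of the first step in inferring workflows from EMR utilization logs:
# group audit events by inpatient stay, order them in time, and count transitions
# between consecutive event types. Field names and events are invented.
from collections import Counter, defaultdict

events = [  # (stay_id, timestamp, event_type) -- hypothetical audit-log rows
    ("stay1", 1, "admit"), ("stay1", 2, "order_labs"), ("stay1", 3, "review_results"),
    ("stay2", 1, "admit"), ("stay2", 2, "order_imaging"), ("stay2", 3, "review_results"),
    ("stay1", 4, "discharge"), ("stay2", 4, "discharge"),
]

sequences = defaultdict(list)
for stay, ts, ev in sorted(events, key=lambda e: (e[0], e[1])):
    sequences[stay].append(ev)

transitions = Counter()
for seq in sequences.values():
    transitions.update(zip(seq, seq[1:]))

# The transition counts form the weighted graph from which workflow variants
# (and their relative efficiency) can be derived.
for (a, b), count in transitions.most_common():
    print(f"{a} -> {b}: {count}")
```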

  2. Technology in the OR: AORN Members' Perceptions of the Effects on Workflow Efficiency and Quality Patient Care.

    PubMed

    Sipes, Carolyn; Baker, Joy Don

    2015-09-01

    This collaborative study sought to describe technology used by AORN members at work, inclusive of radio-frequency identification or barcode scanning (RFID), data collection tools (DATA), workflow or dashboard management tools (DASHBOARD), and environmental services/room decontamination technologies (ENVIRON), and to identify the perceived effects of each technology on workflow efficiency (WFE) and quality patient care (QPC). The 462 respondents to the AORN Technology in the OR survey reported use of technology (USE) in all categories. Eleven of 17 RFID items had a strong positive correlation between the designated USE item and the perceived effect on WFE and QPC. Five of the most-used technology items were found in the DATA category. Two of the five related to Intraoperative Nursing Documentation and the use of the Perioperative Nursing Data Set. The other three related to Imaging Integration for Radiology Equipment, Video Camera Systems, and Fiber-optic Systems. All three elements explored in the DASHBOARD category (ie, Patient Update, OR Case, OR Efficiency) demonstrated approximately 50% or greater perceived effectiveness in WFE and QPC. There was a low reported use of ENVIRON technologies, resulting in limited WFE and QPC data for this category. Copyright © 2015 AORN, Inc. Published by Elsevier Inc. All rights reserved.

  3. A Mixed-Methods Research Framework for Healthcare Process Improvement.

    PubMed

    Bastian, Nathaniel D; Munoz, David; Ventura, Marta

    2016-01-01

    The healthcare system in the United States is spiraling out of control due to ever-increasing costs without significant improvements in quality, access to care, satisfaction, and efficiency. Efficient workflow is paramount to improving healthcare value while maintaining the utmost standards of patient care and provider satisfaction in high stress environments. This article provides healthcare managers and quality engineers with a practical healthcare process improvement framework to assess, measure and improve clinical workflow processes. The proposed mixed-methods research framework integrates qualitative and quantitative tools to foster the improvement of processes and workflow in a systematic way. The framework consists of three distinct phases: 1) stakeholder analysis, 2a) survey design, 2b) time-motion study, and 3) process improvement. The proposed framework is applied to the pediatric intensive care unit of the Penn State Hershey Children's Hospital. The implementation of this methodology led to identification and categorization of different workflow tasks and activities into both value-added and non-value added in an effort to provide more valuable and higher quality patient care. Based upon the lessons learned from the case study, the three-phase methodology provides a better, broader, leaner, and holistic assessment of clinical workflow. The proposed framework can be implemented in various healthcare settings to support continuous improvement efforts in which complexity is a daily element that impacts workflow. We proffer a general methodology for process improvement in a healthcare setting, providing decision makers and stakeholders with a useful framework to help their organizations improve efficiency. Published by Elsevier Inc.

  4. NASA SensorWeb and OGC Standards for Disaster Management

    NASA Technical Reports Server (NTRS)

    Mandl, Dan

    2010-01-01

    I. Goal: Enable users to cost-effectively find and create customized data products to help manage disasters; a) On-demand; b) Low cost and non-specialized tools such as Google Earth and browsers; c) Access via open network but with sufficient security. II. Use standards to interface various sensors and resultant data: a) Wrap sensors in Open Geospatial Consortium (OGC) standards; b) Wrap data processing algorithms and servers with OGC standards; c) Use standardized workflows to orchestrate and script the creation of these data products. III. Target Web 2.0 mass market: a) Make it simple and easy to use; b) Leverage new capabilities and tools that are emerging; c) Improve speed and responsiveness.

  5. MetaboTools: A comprehensive toolbox for analysis of genome-scale metabolic models

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Aurich, Maike K.; Fleming, Ronan M. T.; Thiele, Ines

    Metabolomic data sets provide a direct read-out of cellular phenotypes and are increasingly generated to study biological questions. Previous work, by us and others, revealed the potential of analyzing extracellular metabolomic data in the context of the metabolic model using constraint-based modeling. With the MetaboTools, we make our methods available to the broader scientific community. The MetaboTools consist of a protocol, a toolbox, and tutorials of two use cases. The protocol describes, in a step-wise manner, the workflow of data integration and computational analysis. The MetaboTools comprise the Matlab code required to complete the workflow described in the protocol. Tutorials explain the computational steps for integration of two different data sets and demonstrate a comprehensive set of methods for the computational analysis of metabolic models and stratification thereof into different phenotypes. The presented workflow supports integrative analysis of multiple omics data sets. Importantly, all analysis tools can be applied to metabolic models without performing the entire workflow. Taken together, the MetaboTools constitute a comprehensive guide to the intra-model analysis of extracellular metabolomic data from microbial, plant, or human cells. In conclusion, this computational modeling resource offers a broad set of computational analysis tools for a wide biomedical and non-biomedical research community.

  6. A Model of Workflow Composition for Emergency Management

    NASA Astrophysics Data System (ADS)

    Xin, Chen; Bin-ge, Cui; Feng, Zhang; Xue-hui, Xu; Shan-shan, Fu

    The commonly used workflow technology is not flexible enough to deal with concurrent emergency situations. The paper proposes a novel model for defining emergency plans, in which workflow segments appear as a constituent part. A formal abstraction, which contains four operations, is defined to compose workflow segments under constraint rules. The software system for business process resource construction and composition is implemented and integrated into the Emergency Plan Management Application System.

  7. Worklist handling in workflow-enabled radiological application systems

    NASA Astrophysics Data System (ADS)

    Wendler, Thomas; Meetz, Kirsten; Schmidt, Joachim; von Berg, Jens

    2000-05-01

    For the next generation of integrated information systems for health care applications, more emphasis has to be put on systems which, by design, support the reduction of cost, the increase in efficiency and the improvement of the quality of services. A substantial contribution to this will be the modeling, optimization, automation and enactment of processes in health care institutions. One of the perceived key success factors for the system integration of processes will be the application of workflow management, with workflow management systems as key technology components. In this paper we address workflow management in radiology. We focus on an important aspect of workflow management, the generation and handling of worklists, which automatically provide workflow participants with work items that reflect tasks to be performed. The display of worklists and the functions associated with work items are the visible part for the end-users of an information system using a workflow management approach. Appropriate worklist design and implementation will influence the user friendliness of a system and will largely influence work efficiency. Technically, in current imaging department information system environments (modality-PACS-RIS installations), a data-driven approach has been taken: worklists -- if present at all -- are generated from filtered views on application databases. In a future workflow-based approach, worklists will be generated by autonomous workflow services based on explicit process models and organizational models. This process-oriented approach will provide us with an integral view of entire health care processes or sub-processes. The paper describes the basic mechanisms of this approach and summarizes its benefits.

  8. Taverna: a tool for building and running workflows of services

    PubMed Central

    Hull, Duncan; Wolstencroft, Katy; Stevens, Robert; Goble, Carole; Pocock, Mathew R.; Li, Peter; Oinn, Tom

    2006-01-01

    Taverna is an application that eases the use and integration of the growing number of molecular biology tools and databases available on the web, especially web services. It allows bioinformaticians to construct workflows or pipelines of services to perform a range of different analyses, such as sequence analysis and genome annotation. These high-level workflows can integrate many different resources into a single analysis. Taverna is available freely under the terms of the GNU Lesser General Public License (LGPL) from . PMID:16845108

  9. Metaworkflows and Workflow Interoperability for Heliophysics

    NASA Astrophysics Data System (ADS)

    Pierantoni, Gabriele; Carley, Eoin P.

    2014-06-01

    Heliophysics is a relatively new branch of physics that investigates the relationship between the Sun and the other bodies of the solar system. To investigate such relationships, heliophysicists can rely on various tools developed by the community. Some of these tools are on-line catalogues that list events (such as Coronal Mass Ejections, CMEs) and their characteristics as they were observed on the surface of the Sun or on the other bodies of the Solar System. Other tools offer on-line data analysis and access to images and data catalogues. During their research, heliophysicists often perform investigations that need to coordinate several of these services and to repeat these complex operations until the phenomena under investigation are fully analyzed. Heliophysicists combine the results of these services; this service orchestration is best suited for workflows. This approach has been investigated in the HELIO project. The HELIO project developed an infrastructure for a Virtual Observatory for Heliophysics and implemented service orchestration using TAVERNA workflows. HELIO developed a set of workflows that proved to be useful but lacked flexibility and re-usability. The TAVERNA workflows also needed to be executed directly in the TAVERNA workbench, and this forced all users to learn how to use the workbench. Within the SCI-BUS and ER-FLOW projects, we have started an effort to re-think and re-design the heliophysics workflows with the aim of fostering re-usability and ease of use. We base our approach on two key concepts, that of meta-workflows and that of workflow interoperability. We have divided the produced workflows into three different layers. The first layer is Basic Workflows, developed both in the TAVERNA and WS-PGRADE languages. They are building blocks that users compose to address their scientific challenges. They implement well-defined Use Cases that usually involve only one service. The second layer is Science Workflows, usually developed in TAVERNA. They implement Science Cases (the definition of a scientific challenge) by composing different Basic Workflows. The third and last layer, Iterative Science Workflows, is developed in WS-PGRADE. It executes sub-workflows (either Basic or Science Workflows) as parameter sweep jobs to investigate Science Cases on large multiple data sets. So far, this approach has proven fruitful for three Science Cases, of which one has been completed and two are still being tested.

  10. A python framework for environmental model uncertainty analysis

    USGS Publications Warehouse

    White, Jeremy; Fienen, Michael N.; Doherty, John E.

    2016-01-01

    We have developed pyEMU, a Python framework for Environmental Modeling Uncertainty analyses; it is an open-source tool that is non-intrusive, easy to use, computationally efficient, and scalable to highly parameterized inverse problems. The framework implements several types of linear (first-order, second-moment (FOSM)) and non-linear uncertainty analyses. The FOSM-based analyses can also be completed prior to parameter estimation to help inform important modeling decisions, such as parameterization and objective function formulation. Complete workflows for several types of FOSM-based and non-linear analyses are documented in example notebooks implemented using Jupyter that are available in the online pyEMU repository. Example workflows include basic parameter and forecast analyses, data worth analyses, and error-variance analyses, as well as usage of parameter ensemble generation and management capabilities. These workflows document the necessary steps and provide insights into the results, with the goal of educating users not only in how to apply pyEMU, but also in the underlying theory of applied uncertainty quantification.
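
    The linear (FOSM) analyses pyEMU implements reduce to a Schur-complement update of the prior parameter covariance; the sketch below shows that arithmetic with plain NumPy on random matrices. It illustrates the underlying math only and does not use the pyEMU API.

```python
# Not pyEMU's API -- just the linear (FOSM / Schur-complement) update that such
# analyses implement: posterior parameter covariance conditioned on observations,
# given a Jacobian, a prior parameter covariance, and observation noise.
import numpy as np

rng = np.random.default_rng(1)
n_par, n_obs = 6, 4
J = rng.normal(size=(n_obs, n_par))            # sensitivity (Jacobian) matrix
prior = np.diag(np.full(n_par, 2.0))           # prior parameter covariance
obs_noise = np.diag(np.full(n_obs, 0.5))       # observation noise covariance

innovation_cov = J @ prior @ J.T + obs_noise
posterior = prior - prior @ J.T @ np.linalg.solve(innovation_cov, J @ prior)

# Uncertainty reduction per parameter: how much the data are worth.
reduction = 1.0 - np.diag(posterior) / np.diag(prior)
for i, r in enumerate(reduction):
    print(f"parameter {i}: {100 * r:.1f}% variance reduction")
```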

  11. Create, run, share, publish, and reference your LC-MS, FIA-MS, GC-MS, and NMR data analysis workflows with the Workflow4Metabolomics 3.0 Galaxy online infrastructure for metabolomics.

    PubMed

    Guitton, Yann; Tremblay-Franco, Marie; Le Corguillé, Gildas; Martin, Jean-François; Pétéra, Mélanie; Roger-Mele, Pierrick; Delabrière, Alexis; Goulitquer, Sophie; Monsoor, Misharl; Duperier, Christophe; Canlet, Cécile; Servien, Rémi; Tardivel, Patrick; Caron, Christophe; Giacomoni, Franck; Thévenot, Etienne A

    2017-12-01

    Metabolomics is a key approach in modern functional genomics and systems biology. Due to the complexity of metabolomics data, the variety of experimental designs, and the multiplicity of bioinformatics tools, providing experimenters with a simple and efficient resource to conduct comprehensive and rigorous analysis of their data is of utmost importance. In 2014, we launched the Workflow4Metabolomics (W4M; http://workflow4metabolomics.org) online infrastructure for metabolomics built on the Galaxy environment, which offers user-friendly features to build and run data analysis workflows including preprocessing, statistical analysis, and annotation steps. Here we present the new W4M 3.0 release, which contains twice as many tools as the first version, and provides two features which are, to our knowledge, unique among online resources. First, data from the four major metabolomics technologies (i.e., LC-MS, FIA-MS, GC-MS, and NMR) can be analyzed on a single platform. By using three studies in human physiology, alga evolution, and animal toxicology, we demonstrate how the 40 available tools can be easily combined to address biological issues. Second, the full analysis (including the workflow, the parameter values, the input data and output results) can be referenced with a permanent digital object identifier (DOI). Publication of data analyses is of major importance for robust and reproducible science. Furthermore, the publicly shared workflows are of high-value for e-learning and training. The Workflow4Metabolomics 3.0 e-infrastructure thus not only offers a unique online environment for analysis of data from the main metabolomics technologies, but it is also the first reference repository for metabolomics workflows. Copyright © 2017 Elsevier Ltd. All rights reserved.

  12. MetaboTools: A comprehensive toolbox for analysis of genome-scale metabolic models

    DOE PAGES

    Aurich, Maike K.; Fleming, Ronan M. T.; Thiele, Ines

    2016-08-03

    Metabolomic data sets provide a direct read-out of cellular phenotypes and are increasingly generated to study biological questions. Previous work, by us and others, revealed the potential of analyzing extracellular metabolomic data in the context of the metabolic model using constraint-based modeling. With the MetaboTools, we make our methods available to the broader scientific community. The MetaboTools consist of a protocol, a toolbox, and tutorials of two use cases. The protocol describes, in a step-wise manner, the workflow of data integration and computational analysis. The MetaboTools comprise the Matlab code required to complete the workflow described in the protocol. Tutorials explain the computational steps for integration of two different data sets and demonstrate a comprehensive set of methods for the computational analysis of metabolic models and stratification thereof into different phenotypes. The presented workflow supports integrative analysis of multiple omics data sets. Importantly, all analysis tools can be applied to metabolic models without performing the entire workflow. Taken together, the MetaboTools constitute a comprehensive guide to the intra-model analysis of extracellular metabolomic data from microbial, plant, or human cells. In conclusion, this computational modeling resource offers a broad set of computational analysis tools for a wide biomedical and non-biomedical research community.
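
    MetaboTools itself is MATLAB code built on the COBRA Toolbox; purely as a language-neutral illustration of the constraint-based step it builds on, the sketch below runs a flux balance analysis on a toy two-metabolite network with SciPy's linear-programming solver.

```python
# MetaboTools itself is Matlab/COBRA code; this is an analogous, minimal flux
# balance analysis (FBA) sketch in Python on a toy 2-metabolite network, to
# illustrate the constraint-based modeling step the toolbox builds on.
import numpy as np
from scipy.optimize import linprog

# Reactions: R1 (uptake -> A), R2 (A -> B), R3 (B -> export, the "objective").
# Stoichiometric matrix S: rows = metabolites (A, B), columns = reactions.
S = np.array([[ 1, -1,  0],
              [ 0,  1, -1]], dtype=float)

c = np.array([0.0, 0.0, -1.0])        # maximize v3  <=>  minimize -v3
bounds = [(0, 10), (0, 10), (0, 10)]  # flux bounds (e.g. measured uptake limits)

res = linprog(c, A_eq=S, b_eq=np.zeros(2), bounds=bounds, method="highs")
print("optimal fluxes:", res.x)       # expected: [10, 10, 10]
print("objective flux:", -res.fun)
```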

  13. Basics of Confocal Microscopy and the Complexity of Diagnosing Skin Tumors: New Imaging Tools in Clinical Practice, Diagnostic Workflows, Cost-Estimate, and New Trends.

    PubMed

    Que, Syril Keena T; Grant-Kels, Jane M; Longo, Caterina; Pellacani, Giovanni

    2016-10-01

    The use of reflectance confocal microscopy (RCM) and other noninvasive imaging devices can potentially streamline clinical care, leading to more precise and efficient management of skin cancer. This article explores the potential role of RCM in cutaneous oncology, as an adjunct to more established techniques of detecting and monitoring for skin cancer, such as dermoscopy and total body photography. Discussed are current barriers to the adoption of RCM, diagnostic workflows and standards of care in the United States and Europe, and medicolegal issues. The potential role of RCM and other similar technological innovations in the enhancement of dermatologic care is evaluated. Copyright © 2016 Elsevier Inc. All rights reserved.

  14. Electronic Risk Assessment System as an Appropriate Tool for the Prevention of Cancer: a Qualitative Study.

    PubMed

    Javan Amoli, Amir Hossein; Maserat, Elham; Safdari, Reza; Zali, Mohammad Reza

    2015-01-01

    Decision making modalities for screening for many cancer conditions and different stages have become increasingly complex. Computer-based risk assessment systems facilitate scheduling and decision making and support the delivery of cancer screening services. The aim of this article was to survey an electronic risk assessment system as an appropriate tool for the prevention of cancer. A qualitative design was used involving 21 face-to-face interviews. Interviewing involved asking questions and getting answers exclusively from managers of cancer screening. Of the participants 6 were female and 15 were male, and ages ranged from 32 to 78 years. The study was based on a grounded theory approach and the tool was a semi-structured interview. Researchers studied 5 dimensions, comprising electronic guideline standards of colorectal cancer screening, workflow of clinical and genetic activities, pathways of colorectal cancer screening, functionality of computer-based guidelines, and barriers. Electronic guideline standards of colorectal cancer screening were described in the 3 categories of content standards, telecommunication and technical standards, and nomenclature and classification standards. According to the participants' views, workflow and genetic pathways of colorectal cancer screening were identified. The study demonstrated an effective role of computer-guided consultation for screening management. Electronic systems facilitate real-time decision making during a clinical interaction. Electronic pathways have been applied for clinical and genetic decision support, workflow management, update of recommendations and resource estimates. A suitable technical and clinical infrastructure is an integral part of a clinical practice guideline for screening. In conclusion, it is recommended to consider the necessity of architecture assessment and also integration standards.

  15. Performances of the PIPER scalable child human body model in accident reconstruction

    PubMed Central

    Giordano, Chiara; Kleiven, Svein

    2017-01-01

    Human body models (HBMs) have the potential to provide significant insights into the pediatric response to impact. This study describes a scalable/posable approach to perform child accident reconstructions using the Position and Personalize Advanced Human Body Models for Injury Prediction (PIPER) scalable child HBM of different ages and in different positions obtained by the PIPER tool. Overall, the PIPER scalable child HBM managed reasonably well to predict the injury severity and location of the children involved in real-life crash scenarios documented in the medical records. The developed methodology and workflow is essential for future work to determine child injury tolerances based on the full Child Advanced Safety Project for European Roads (CASPER) accident reconstruction database. With the workflow presented in this study, the open-source PIPER scalable HBM combined with the PIPER tool is also foreseen to have implications for improved safety designs for a better protection of children in traffic accidents. PMID:29135997

  16. Data management routines for reproducible research using the G-Node Python Client library

    PubMed Central

    Sobolev, Andrey; Stoewer, Adrian; Pereira, Michael; Kellner, Christian J.; Garbers, Christian; Rautenberg, Philipp L.; Wachtler, Thomas

    2014-01-01

    Structured, efficient, and secure storage of experimental data and associated meta-information constitutes one of the most pressing technical challenges in modern neuroscience, and does so particularly in electrophysiology. The German INCF Node aims to provide open-source solutions for this domain that support the scientific data management and analysis workflow, and thus facilitate future data access and reproducible research. G-Node provides a data management system, accessible through an application interface, that is based on a combination of standardized data representation and flexible data annotation to account for the variety of experimental paradigms in electrophysiology. The G-Node Python Library exposes these services to the Python environment, enabling researchers to organize and access their experimental data using their familiar tools while gaining the advantages that centralized storage entails. The library provides powerful query features, including data slicing and selection by metadata, as well as fine-grained permission control for collaboration and data sharing. Here we demonstrate key actions in working with experimental neuroscience data, such as building a metadata structure, organizing recorded data in datasets, annotating data, or selecting data regions of interest, that can be automated to a large degree using the library. Compliant with existing de-facto standards, the G-Node Python Library is compatible with many Python tools in the field of neurophysiology and thus enables seamless integration of data organization into the scientific data workflow. PMID:24634654

  17. Data management routines for reproducible research using the G-Node Python Client library.

    PubMed

    Sobolev, Andrey; Stoewer, Adrian; Pereira, Michael; Kellner, Christian J; Garbers, Christian; Rautenberg, Philipp L; Wachtler, Thomas

    2014-01-01

    Structured, efficient, and secure storage of experimental data and associated meta-information constitutes one of the most pressing technical challenges in modern neuroscience, and does so particularly in electrophysiology. The German INCF Node aims to provide open-source solutions for this domain that support the scientific data management and analysis workflow, and thus facilitate future data access and reproducible research. G-Node provides a data management system, accessible through an application interface, that is based on a combination of standardized data representation and flexible data annotation to account for the variety of experimental paradigms in electrophysiology. The G-Node Python Library exposes these services to the Python environment, enabling researchers to organize and access their experimental data using their familiar tools while gaining the advantages that centralized storage entails. The library provides powerful query features, including data slicing and selection by metadata, as well as fine-grained permission control for collaboration and data sharing. Here we demonstrate key actions in working with experimental neuroscience data, such as building a metadata structure, organizing recorded data in datasets, annotating data, or selecting data regions of interest, that can be automated to a large degree using the library. Compliant with existing de-facto standards, the G-Node Python Library is compatible with many Python tools in the field of neurophysiology and thus enables seamless integration of data organization into the scientific data workflow.

  18. Construction of databases: advances and significance in clinical research.

    PubMed

    Long, Erping; Huang, Bingjie; Wang, Liming; Lin, Xiaoyu; Lin, Haotian

    2015-12-01

    Widely used in clinical research, the database is a new type of data management automation technology and the most efficient tool for data management. In this article, we first explain some basic concepts, such as the definition, classification, and establishment of databases. Afterward, the workflow for establishing databases, inputting data, verifying data, and managing databases is presented. Meanwhile, by discussing the application of databases in clinical research, we illuminate the important role of databases in clinical research practice. Lastly, we introduce the reanalysis of randomized controlled trials (RCTs) and cloud computing techniques, showing the most recent advancements of databases in clinical research.
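
    As a minimal illustration of the establish-input-verify workflow outlined above, the sketch below uses Python's built-in sqlite3 module; the table, fields and check constraints are invented examples.

```python
# Minimal sketch of the establish -> input -> verify workflow described above,
# using Python's built-in sqlite3. Table and field names are invented.
import sqlite3

conn = sqlite3.connect("trial.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS patients (
        patient_id TEXT PRIMARY KEY,
        age        INTEGER CHECK (age BETWEEN 0 AND 120),  -- simple entry validation
        arm        TEXT CHECK (arm IN ('control', 'treatment')),
        baseline_score REAL
    )""")

rows = [("P001", 54, "control", 12.5), ("P002", 61, "treatment", 14.0)]
conn.executemany("INSERT OR REPLACE INTO patients VALUES (?, ?, ?, ?)", rows)
conn.commit()

# Verification step: flag records with missing baseline measurements.
missing = conn.execute(
    "SELECT patient_id FROM patients WHERE baseline_score IS NULL").fetchall()
print("records needing review:", missing)
conn.close()
```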

  19. Design and implementation of workflow engine for service-oriented architecture

    NASA Astrophysics Data System (ADS)

    Peng, Shuqing; Duan, Huining; Chen, Deyun

    2009-04-01

    As computer networks develop rapidly and enterprise applications become increasingly distributed, traditional workflow engines show deficiencies such as complex structure, poor stability, poor portability, limited reusability and difficult maintenance. In this paper, in order to improve the stability, scalability and flexibility of workflow management systems, a four-layer architecture for a workflow engine based on SOA is put forward according to the XPDL standard of the Workflow Management Coalition, the route control mechanism in the control model is implemented, scheduling strategies for cyclic and acyclic routing are designed, and the workflow engine is implemented using technologies such as XML, JSP and EJB.
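
    A toy Python sketch of route control over a task graph, supporting an acyclic path plus a bounded cyclic (rework) route, is shown below; it illustrates the routing idea only and bears no relation to the paper's XPDL/EJB implementation.

```python
# Not the paper's engine -- a toy sketch of route control over a task graph,
# supporting an acyclic path plus a bounded cyclic (rework) route.
def run(routes, start, end, context, max_steps=20):
    task = start
    for _ in range(max_steps):
        print("executing:", task)
        if task == end:
            return context
        task = routes[task](context)          # route decision = transition function
    raise RuntimeError("routing did not terminate")

def after_review(ctx):
    # Cyclic route: send the case back for rework until it passes review.
    ctx["revisions"] = ctx.get("revisions", 0) + 1
    return "archive" if ctx["revisions"] >= 2 else "draft"

routes = {
    "draft":   lambda ctx: "review",
    "review":  after_review,
    "archive": lambda ctx: "end",
}
print(run(routes, start="draft", end="end", context={}))
```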

  20. A UIMA wrapper for the NCBO annotator

    PubMed Central

    Roeder, Christophe; Jonquet, Clement; Shah, Nigam H.; Baumgartner, William A.; Verspoor, Karin; Hunter, Lawrence

    2010-01-01

    Summary: The Unstructured Information Management Architecture (UIMA) framework and web services are emerging as useful tools for integrating biomedical text mining tools. This note describes our work, which wraps the National Center for Biomedical Ontology (NCBO) Annotator—an ontology-based annotation service—to make it available as a component in UIMA workflows. Availability: This wrapper is freely available on the web at http://bionlp-uima.sourceforge.net/ as part of the UIMA tools distribution from the Center for Computational Pharmacology (CCP) at the University of Colorado School of Medicine. It has been implemented in Java for support on Mac OS X, Linux and MS Windows. Contact: chris.roeder@ucdenver.edu PMID:20505005

  1. SU-F-P-03: Management of Time to Treatment Initiation: Case for An Electronic Whiteboard

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Adnani, N

    2016-06-15

    Purpose: To determine if data mining of an electronic whiteboard improves the management of the Time to Treatment Initiation (TTI) in radiation oncology. Methods: An electronic whiteboard, designed to help manage the planning workflow and improve communication regarding patient planning progress, was used to record the dates at which each phase of the planning process began or was completed. These are CT Sim date, Plan Start, Physician Review, Physicist Review, Approval for Treatment Delivery, and Setup or Verification of Simulation. Results: During clinical implementation, the electronic whiteboard was able to fulfill its primary objective of providing a transparent account of the planning progress of each patient. Peer pressure also meant that individual tasks, such as contouring, were easily brought to the attention of the responsible party and prioritized accordingly. Data mining of the electronic whiteboard per patient (figure 1), per diagnosis (figure 2), per treatment modality (figure 3), per physician (figure 4), per planner (figure 5), etc., added another sophisticated tool to the management of Time to Treatment Initiation without compromising the quality of the plans being generated. A longer than necessary time between CT Sim and Plan Start can be discussed among the members of the treatment team as an indication of an inadequate/outdated CT Simulator, Contouring Tools, Image Fusion Tools, Other Imaging Studies (MRI, PET/CT) performed, etc. The same applies to the interval from Plan Start to Physician Review, where a longer time than expected may be due to unrealistic planning goals, limited planning system features, etc. Conclusion: An electronic whiteboard in radiation oncology not only helps with organizing the planning workflow, it is also a potent tool that can be used to reduce the Time to Treatment Initiation by providing the clinic with hard data about the duration of each phase of treatment planning as a function of the different variables affecting the planning process. The work is supported by the Global Medical Physics Institute.
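
    The data-mining step amounts to computing per-phase durations and the overall TTI from the recorded whiteboard dates; the sketch below shows that calculation on an invented record, with hypothetical phase and patient names.

```python
# Sketch of the data-mining step: per-phase durations and overall time to
# treatment initiation (TTI) from whiteboard dates. Names and dates are invented.
from datetime import date

phases = ["ct_sim", "plan_start", "physician_review",
          "physicist_review", "approval", "setup_verification"]

whiteboard = {  # hypothetical per-patient phase completion dates
    "pt_001": {"ct_sim": date(2016, 3, 1), "plan_start": date(2016, 3, 3),
               "physician_review": date(2016, 3, 7), "physicist_review": date(2016, 3, 8),
               "approval": date(2016, 3, 9), "setup_verification": date(2016, 3, 10)},
}

for patient, dates in whiteboard.items():
    durations = {f"{a}->{b}": (dates[b] - dates[a]).days
                 for a, b in zip(phases, phases[1:]) if a in dates and b in dates}
    tti = (dates[phases[-1]] - dates[phases[0]]).days
    print(patient, durations, f"TTI={tti} days")
```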

  2. A knowledge-based decision support system in bioinformatics: an application to protein complex extraction

    PubMed Central

    2013-01-01

    Background We introduce a Knowledge-based Decision Support System (KDSS) to address the Protein Complex Extraction issue. Using a Knowledge Base (KB) coding the expertise about the proposed scenario, our KDSS is able to suggest both strategies and tools, according to the features of the input dataset. Our system provides a navigable workflow for the current experiment and furthermore it offers support in the configuration and running of every processing component of that workflow. This last feature makes our system a crossover between classical DSS and Workflow Management Systems. Results We briefly present the KDSS' architecture and the basic concepts used in the design of the knowledge base and the reasoning component. The system is then tested using a subset of the Saccharomyces cerevisiae Protein-Protein interaction dataset. We used this subset because it has been well studied in the literature by several research groups in the field of complex extraction: in this way we could easily compare the results obtained through our KDSS with theirs. Our system suggests both a preprocessing and a clustering strategy, and for each of them it proposes and eventually runs suitable algorithms. Our system's final results are then composed of a workflow of tasks, which can be reused for other experiments, and the specific numerical results for that particular trial. Conclusions The proposed approach, using the KDSS' knowledge base, provides a novel workflow that gives the best results with regard to the other workflows produced by the system. This workflow and its numeric results have been compared with other approaches to PPI network analysis found in the literature, offering similar results. PMID:23368995
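
    As a toy illustration of the kind of preprocessing-plus-clustering strategy the KDSS composes, the sketch below filters low-confidence interactions and extracts connected components as candidate complexes using NetworkX; the interaction data are invented and the real system selects among far richer algorithms.

```python
# Toy sketch of one preprocessing + clustering strategy of the kind the KDSS
# composes: filter low-confidence interactions, then take connected components
# as candidate complexes. The real system chooses among richer algorithms.
import networkx as nx

interactions = [  # (protein_a, protein_b, confidence) -- invented data
    ("YAL001C", "YBR123W", 0.92), ("YBR123W", "YDL005C", 0.88),
    ("YAL001C", "YDL005C", 0.75), ("YGR218W", "YMR308C", 0.95),
    ("YAL001C", "YGR218W", 0.20),  # low-confidence edge, removed by preprocessing
]

g = nx.Graph()
for a, b, conf in interactions:
    if conf >= 0.7:                # preprocessing: confidence threshold
        g.add_edge(a, b, weight=conf)

complexes = [sorted(c) for c in nx.connected_components(g) if len(c) >= 2]
for i, members in enumerate(complexes, 1):
    print(f"candidate complex {i}: {members}")
```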

  3. How I do it: a practical database management system to assist clinical research teams with data collection, organization, and reporting.

    PubMed

    Lee, Howard; Chapiro, Julius; Schernthaner, Rüdiger; Duran, Rafael; Wang, Zhijun; Gorodetski, Boris; Geschwind, Jean-François; Lin, MingDe

    2015-04-01

    The objective of this study was to demonstrate that an intra-arterial liver therapy clinical research database system is a more workflow-efficient and robust tool for clinical research than a spreadsheet storage system. The database system could be used to generate clinical research study populations easily with custom search and retrieval criteria. A questionnaire was designed and distributed to 21 board-certified radiologists to assess current data storage problems and clinician reception to a database management system. Based on the questionnaire findings, a customized database and user interface system were created to perform automatic calculations of clinical scores, including staging systems such as the Child-Pugh and Barcelona Clinic Liver Cancer, and to facilitate data input and output. Questionnaire participants were favorable to a database system. The interface retrieved study-relevant data accurately and effectively. The database effectively produced easy-to-read study-specific patient populations with custom-defined inclusion/exclusion criteria. The database management system is workflow-efficient and robust in retrieving, storing, and analyzing data. Copyright © 2015 AUR. Published by Elsevier Inc. All rights reserved.
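
    The core retrieval feature, generating study populations from custom inclusion/exclusion criteria, can be sketched in a few lines; the patient fields and criteria below are invented examples, not the study's actual schema.

```python
# Sketch of generating a study population with custom inclusion/exclusion
# criteria, the kind of retrieval the database system provides. Fields invented.
patients = [  # hypothetical records
    {"id": "P01", "child_pugh": "A", "bclc": "B", "prior_tace": False, "age": 63},
    {"id": "P02", "child_pugh": "C", "bclc": "C", "prior_tace": False, "age": 71},
    {"id": "P03", "child_pugh": "B", "bclc": "B", "prior_tace": True,  "age": 58},
]

inclusion = [lambda p: p["child_pugh"] in ("A", "B"),
             lambda p: p["bclc"] in ("A", "B")]
exclusion = [lambda p: p["prior_tace"]]

cohort = [p for p in patients
          if all(rule(p) for rule in inclusion)
          and not any(rule(p) for rule in exclusion)]
print([p["id"] for p in cohort])   # -> ['P01']
```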

  4. Scientific Data Management (SDM) Center for Enabling Technologies. Final Report, 2007-2012

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ludascher, Bertram; Altintas, Ilkay

    Our contributions to advancing the State of the Art in scientific workflows have focused on the following areas: Workflow development; Generic workflow components and templates; Provenance collection and analysis; and, Workflow reliability and fault tolerance.

  5. Oqtans: the RNA-seq workbench in the cloud for complete and reproducible quantitative transcriptome analysis.

    PubMed

    Sreedharan, Vipin T; Schultheiss, Sebastian J; Jean, Géraldine; Kahles, André; Bohnert, Regina; Drewe, Philipp; Mudrakarta, Pramod; Görnitz, Nico; Zeller, Georg; Rätsch, Gunnar

    2014-05-01

    We present Oqtans, an open-source workbench for quantitative transcriptome analysis that is integrated into Galaxy. Its distinguishing features include customizable computational workflows and a modular pipeline architecture that facilitates comparative assessment of tool and data quality. Oqtans integrates an assortment of machine learning-powered tools into Galaxy, which show superior or equal performance to state-of-the-art tools. Implemented tools comprise a complete transcriptome analysis workflow: short-read alignment, transcript identification/quantification and differential expression analysis. Oqtans and Galaxy facilitate persistent storage, data exchange and documentation of intermediate results and analysis workflows. We illustrate how Oqtans aids the interpretation of data from different experiments in easy-to-understand use cases. Users can easily create their own workflows and extend Oqtans by integrating specific tools. Oqtans is available as (i) a cloud machine image with a demo instance at cloud.oqtans.org, (ii) a public Galaxy instance at galaxy.cbio.mskcc.org, (iii) a git repository containing all installed software (oqtans.org/git); most of which is also available from (iv) the Galaxy Toolshed and (v) a share string to use along with Galaxy CloudMan.

  6. Applications of the pipeline environment for visual informatics and genomics computations

    PubMed Central

    2011-01-01

    Background Contemporary informatics and genomics research require efficient, flexible and robust management of large heterogeneous data, advanced computational tools, powerful visualization, reliable hardware infrastructure, interoperability of computational resources, and detailed data and analysis-protocol provenance. The Pipeline is a client-server distributed computational environment that facilitates the visual graphical construction, execution, monitoring, validation and dissemination of advanced data analysis protocols. Results This paper reports on the applications of the LONI Pipeline environment to address two informatics challenges - graphical management of diverse genomics tools, and the interoperability of informatics software. Specifically, this manuscript presents the concrete details of deploying general informatics suites and individual software tools to new hardware infrastructures, the design, validation and execution of new visual analysis protocols via the Pipeline graphical interface, and integration of diverse informatics tools via the Pipeline eXtensible Markup Language syntax. We demonstrate each of these processes using several established informatics packages (e.g., miBLAST, EMBOSS, mrFAST, GWASS, MAQ, SAMtools, Bowtie) for basic local sequence alignment and search, molecular biology data analysis, and genome-wide association studies. These examples demonstrate the power of the Pipeline graphical workflow environment to enable integration of bioinformatics resources which provide a well-defined syntax for dynamic specification of the input/output parameters and the run-time execution controls. Conclusions The LONI Pipeline environment http://pipeline.loni.ucla.edu provides a flexible graphical infrastructure for efficient biomedical computing and distributed informatics research. The interactive Pipeline resource manager enables the utilization and interoperability of diverse types of informatics resources. The Pipeline client-server model provides computational power to a broad spectrum of informatics investigators - experienced developers and novice users, users with or without access to advanced computational resources (e.g., Grid, data), as well as basic and translational scientists. The open development, validation and dissemination of computational networks (pipeline workflows) facilitates the sharing of knowledge, tools, protocols and best practices, and enables the unbiased validation and replication of scientific findings by the entire community. PMID:21791102

  7. Common Workflow Service: Standards Based Solution for Managing Operational Processes

    NASA Astrophysics Data System (ADS)

    Tinio, A. W.; Hollins, G. A.

    2017-06-01

    The Common Workflow Service is a collaborative and standards-based solution for managing mission operations processes using techniques from the Business Process Management (BPM) discipline. This presentation describes the CWS and its benefits.

  8. Grid-based platform for training in Earth Observation

    NASA Astrophysics Data System (ADS)

    Petcu, Dana; Zaharie, Daniela; Panica, Silviu; Frincu, Marc; Neagul, Marian; Gorgan, Dorian; Stefanut, Teodor

    2010-05-01

    The GiSHEO platform [1], providing on-demand services for training and higher education in Earth Observation, is being developed in the frame of an ESA-funded project through its PECS programme, to respond to the need for powerful education resources in the remote sensing field. It is intended to be a Grid-based platform whose potential for experimentation and extensibility are the key benefits compared with a desktop software solution. Near-real-time applications requiring multiple simultaneous short-response-time, data-intensive tasks, as in the case of a short training event, are the ones that have proved to be ideal for this platform. The platform is based on Globus Toolkit 4 facilities for security and process management, and on the clusters of four academic institutions involved in the project. The authorization uses a VOMS service. The main public services are the following: the EO processing services (represented through special WSRF-type services); the workflow service exposing a particular workflow engine; the data indexing and discovery service for accessing the data management mechanisms; and the processing services, a collection allowing easy access to the processing platform. The WSRF-type services for basic satellite image processing reuse free image processing tools, OpenCV and GDAL. New algorithms and workflows were developed to tackle challenging problems like detecting the underground remains of old fortifications, walls or houses. More details can be found in [2]. Composed services can be specified through workflows and are easy to deploy. The workflow engine, OSyRIS (Orchestration System using a Rule based Inference Solution), is based on DROOLS, and a new rule-based workflow language, SILK (SImple Language for worKflow), has been built. Workflow creation in SILK can be done with or without a visual design tool. The basics of SILK are the tasks and the relations (rules) between them. It is similar to the SCUFL language, but does not rely on XML, in order to allow the introduction of more workflow-specific features. Moreover, an event-condition-action (ECA) approach allows greater flexibility when expressing data and task dependencies, as well as the creation of adaptive workflows which can react to changes in the configuration of the Grid or in the workflow itself. Changes inside the Grid are handled by creating specific rules which allow resource selection based on various task scheduling criteria. Modifications of the workflow are usually accomplished either by inserting or retracting, at runtime, rules belonging to it, or by modifying the executor of the task in case a better one is found. The former implies changes in its structure, while the latter does not necessarily mean changes of the resource but rather changes of the algorithm used for solving the task. More details can be found in [3]. Another important platform component is the data indexing and storage service, GDIS, providing features for data storage, indexing data using a specialized RDBMS, finding data by various conditions, querying external services and keeping track of temporary data generated by other components. The data storage component of GDIS is responsible for storing the data by using available storage backends such as local disk file systems (ext3), local cluster storage (GFS) or distributed file systems (HDFS). A front-end GridFTP service is capable of interacting with the storage domains on behalf of the clients in a uniform way and also enforces the security restrictions provided by other specialized services related to data access. The data indexing is performed by PostGIS. An advanced and flexible interface for searching the project's geographical repository is built around a custom query language (LLQL - Lisp Like Query Language) designed to provide fine-grained access to the data in the repository and to query external services (e.g. for exploiting the connection with the GENESI-DR catalog). More details can be found in [4]. The Workload Management System (WMS) provides two types of resource managers. The first one will be based on Condor HTC and use Condor as a job manager for task dispatching and working nodes (for development purposes), while the second one will use GT4 GRAM (for production purposes). The WMS main component, the Grid Task Dispatcher (GTD), is responsible for the interaction with other internal services such as the composition engine in order to facilitate access to the processing platform. Its main responsibilities are to receive tasks from the workflow engine or directly from the user interface, to use a task description language (the ClassAd meta-language in the case of Condor HTC) for job units, to submit and check the status of jobs inside the workload management system, and to retrieve job logs for debugging purposes. More details can be found in [4]. A particular component of the platform is eGLE, the eLearning environment. It provides the functionalities necessary to create the visual appearance of the lessons through the usage of visual containers like tools, patterns and templates. The teacher uses the platform for testing the already created lessons, as well as for developing new lesson resources, such as new images and workflows describing graph-based processing. The students execute the lessons or describe and experiment with new workflows or different data. The eGLE database includes several workflow-based lesson descriptions, teaching materials and lesson resources, and selected satellite and spatial data. More details can be found in [5]. A first training event using the platform was organized in September 2009 during the 11th SYNASC symposium (links to the demos, testing interface, and exercises are available on the project site [1]). The eGLE component was presented at the 4th GPC conference in May 2009. Moreover, the functionality of the platform will be presented as a demo in April 2010 at the 5th EGEE User Forum. References: [1] GiSHEO consortium, Project site, http://gisheo.info.uvt.ro [2] D. Petcu, D. Zaharie, M. Neagul, S. Panica, M. Frincu, D. Gorgan, T. Stefanut, V. Bacu, Remote Sensed Image Processing on Grids for Training in Earth Observation. In Image Processing, V. Kordic (ed.), In-Tech, January 2010. [3] M. Neagul, S. Panica, D. Petcu, D. Zaharie, D. Gorgan, Web and Grid Services for Training in Earth Observation, IDAACS 2009, IEEE Computer Press, 241-246. [4] M. Frincu, S. Panica, M. Neagul, D. Petcu, GiSHEO: On Demand Grid Service Based Platform for EO Data Processing. HiperGrid 2009, Politehnica Press, 415-422. [5] D. Gorgan, T. Stefanut, V. Bacu, Grid Based Training Environment for Earth Observation, GPC 2009, LNCS 5529, 98-109.

  9. Performance Analysis Tool for HPC and Big Data Applications on Scientific Clusters

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yoo, Wucherl; Koo, Michelle; Cao, Yu

    Big data is prevalent in HPC. Many HPC projects rely on complex workflows to analyze terabytes or petabytes of data. These workflows often require running over thousands of CPU cores and performing simultaneous data accesses, data movements, and computation. It is challenging to analyze performance when terabytes or petabytes of workflow data or execution measurement data are involved, and when complex workflows run over a large number of nodes with multiple parallel task executions. To help identify performance bottlenecks or debug performance issues in large-scale scientific applications and scientific clusters, we have developed a performance analysis framework using state-of-the-art open-source big data processing tools. Our tool can ingest system logs and application performance measurements to extract key performance features, and apply sophisticated statistical tools and data mining methods to the performance data. It utilizes an efficient data processing engine to allow users to interactively analyze a large amount of different types of logs and measurements. To illustrate the functionality of the big data analysis framework, we conduct case studies on the workflows from an astronomy project known as the Palomar Transient Factory (PTF) and the job logs from a genome analysis scientific cluster. Our study processed many terabytes of system logs and application performance measurements collected on the HPC systems at NERSC. The implementation of our tool is generic enough to be used for analyzing the performance of other HPC systems and big data workflows.
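
    The kind of feature extraction described above can be sketched with pandas; the CSV log layout and column names below are hypothetical and stand in for whatever schema the framework actually ingests.

        # Sketch of extracting simple performance features from job logs with pandas.
        # The log format and column names are hypothetical, not the tool's actual schema.
        import pandas as pd

        # Assume a CSV export of job records: one row per task execution.
        logs = pd.read_csv("job_logs.csv", parse_dates=["start_time", "end_time"])
        logs["duration_s"] = (logs["end_time"] - logs["start_time"]).dt.total_seconds()

        # Aggregate per node to surface slow hosts and I/O-heavy tasks.
        features = (logs.groupby("hostname")
                        .agg(jobs=("job_id", "count"),
                             mean_duration_s=("duration_s", "mean"),
                             p95_duration_s=("duration_s", lambda s: s.quantile(0.95)),
                             total_read_gb=("bytes_read", lambda s: s.sum() / 1e9)))

        # Nodes whose mean duration is far above the cluster median are bottleneck candidates.
        suspects = features[features["mean_duration_s"]
                            > 2 * features["mean_duration_s"].median()]
        print(suspects)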

  10. SPARTA: Simple Program for Automated reference-based bacterial RNA-seq Transcriptome Analysis.

    PubMed

    Johnson, Benjamin K; Scholz, Matthew B; Teal, Tracy K; Abramovitch, Robert B

    2016-02-04

    Many tools exist in the analysis of bacterial RNA sequencing (RNA-seq) transcriptional profiling experiments to identify differentially expressed genes between experimental conditions. Generally, the workflow includes quality control of reads, mapping to a reference, counting transcript abundance, and statistical tests for differentially expressed genes. In spite of the numerous tools developed for each component of an RNA-seq analysis workflow, easy-to-use bacterially oriented workflow applications to combine multiple tools and automate the process are lacking. With many tools to choose from for each step, the task of identifying a specific tool, adapting the input/output options to the specific use-case, and integrating the tools into a coherent analysis pipeline is not a trivial endeavor, particularly for microbiologists with limited bioinformatics experience. To make bacterial RNA-seq data analysis more accessible, we developed a Simple Program for Automated reference-based bacterial RNA-seq Transcriptome Analysis (SPARTA). SPARTA is a reference-based bacterial RNA-seq analysis workflow application for single-end Illumina reads. SPARTA is turnkey software that simplifies the process of analyzing RNA-seq data sets, making bacterial RNA-seq analysis a routine process that can be undertaken on a personal computer or in the classroom. The easy-to-install, complete workflow processes whole transcriptome shotgun sequencing data files by trimming reads and removing adapters, mapping reads to a reference, counting gene features, calculating differential gene expression, and, importantly, checking for potential batch effects within the data set. SPARTA outputs quality analysis reports, gene feature counts and differential gene expression tables and scatterplots. SPARTA provides an easy-to-use bacterial RNA-seq transcriptional profiling workflow to identify differentially expressed genes between experimental conditions. This software will enable microbiologists with limited bioinformatics experience to analyze their data and integrate next generation sequencing (NGS) technologies into the classroom. The SPARTA software and tutorial are available at sparta.readthedocs.org.
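
    A driver for the stages listed above (trimming, mapping, counting, differential expression testing) can be sketched as a simple chain of subprocess calls; the command names and flags here are placeholders, not the actual tools or options SPARTA wraps.

        # Sketch of a reference-based RNA-seq pipeline driver in the spirit described above.
        # Command names and flags are placeholders; SPARTA's real wrapper differs.
        import subprocess

        def run(cmd):
            """Run one pipeline stage and stop the workflow on failure."""
            print("running:", " ".join(cmd))
            subprocess.run(cmd, check=True)

        sample = "condition_A_rep1"
        run(["trim_reads", f"{sample}.fastq", "-o", f"{sample}.trimmed.fastq"])   # quality/adapter trimming
        run(["map_reads", "-x", "reference_index",                                # alignment to the reference
             "-U", f"{sample}.trimmed.fastq", "-S", f"{sample}.sam"])
        run(["count_features", f"{sample}.sam", "annotation.gff",                 # per-gene read counts
             "-o", f"{sample}.counts.txt"])
        run(["test_differential_expression", "counts_table.txt",                  # statistics across conditions
             "-o", "de_results.txt"])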

  11. Design and implementation of a secure workflow system based on PKI/PMI

    NASA Astrophysics Data System (ADS)

    Yan, Kai; Jiang, Chao-hui

    2013-03-01

    The traditional workflow system has the following weaknesses in privilege management: low privilege-management efficiency, an overburdened administrator, and the lack of a trusted authority. After studying the security requirements of workflow systems in depth, a secure workflow model based on PKI/PMI is proposed. This model achieves static and dynamic authorization by verifying the user's identity through a public key certificate (PKC) and validating the user's privilege information using an attribute certificate (AC) in the workflow system. Practice shows that this system can meet the security requirements of a WfMS. Moreover, it not only improves system security but also ensures the integrity, confidentiality, availability and non-repudiation of the data in the system.
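
    The PKC-plus-AC check described above can be sketched at a very simplified level: identity comes from the public key certificate, privileges come from the attribute certificate, and authorization requires both. Certificate parsing and signature verification are elided here, and the data structures are hypothetical, not a real PKI/PMI library.

        # Simplified sketch of the PKC + attribute certificate (AC) authorization check.
        # Certificate parsing and signature verification are elided; structures are hypothetical.
        from dataclasses import dataclass

        @dataclass
        class PublicKeyCertificate:      # stand-in for an X.509 PKC binding an identity
            subject: str
            verified: bool               # result of a signature/chain check done elsewhere

        @dataclass
        class AttributeCertificate:      # stand-in for a PMI AC binding roles to the subject
            holder: str
            roles: frozenset

        def authorize(pkc, ac, required_role):
            """Static authorization: identity from the PKC, privilege from the AC."""
            if not pkc.verified:
                return False                      # authentication failed
            if ac.holder != pkc.subject:
                return False                      # AC was issued to someone else
            return required_role in ac.roles      # privilege check

        pkc = PublicKeyCertificate(subject="alice", verified=True)
        ac = AttributeCertificate(holder="alice", roles=frozenset({"task-approver"}))
        print(authorize(pkc, ac, "task-approver"))   # True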

  12. Innovations in Medication Preparation Safety and Wastage Reduction: Use of a Workflow Management System in a Pediatric Hospital.

    PubMed

    Davis, Stephen Jerome; Hurtado, Josephine; Nguyen, Rosemary; Huynh, Tran; Lindon, Ivan; Hudnall, Cedric; Bork, Sara

    2017-01-01

    Background: USP <797> regulatory requirements have mandated that pharmacies improve aseptic techniques and cleanliness of the medication preparation areas. In addition, the Institute for Safe Medication Practices (ISMP) recommends that technology and automation be used as much as possible for preparing and verifying compounded sterile products. Objective: To determine the benefits associated with the implementation of the workflow management system, such as reducing medication preparation and delivery errors, reducing quantity and frequency of medication errors, avoiding costs, and enhancing the organization's decision to move toward positive patient identification (PPID). Methods: At Texas Children's Hospital, data were collected and analyzed from January 2014 through August 2014 in the pharmacy areas in which the workflow management system would be implemented. Data were excluded for September 2014 during the workflow management system oral liquid implementation phase. Data were collected and analyzed from October 2014 through June 2015 to determine whether the implementation of the workflow management system reduced the quantity and frequency of reported medication errors. Data collected and analyzed during the study period included the quantity of doses prepared, number of incorrect medication scans, number of doses discontinued from the workflow management system queue, and the number of doses rejected. Data were collected and analyzed to identify patterns of incorrect medication scans, to determine reasons for rejected medication doses, and to determine the reduction in wasted medications. Results: During the 17-month study period, the pharmacy department dispensed 1,506,220 oral liquid and injectable medication doses. From October 2014 through June 2015, the pharmacy department dispensed 826,220 medication doses that were prepared and checked via the workflow management system. Of those 826,220 medication doses, there were 16 reported incorrect volume errors. The error rate after the implementation of the workflow management system averaged 8.4%, which was a 1.6% reduction. After the implementation of the workflow management system, the average number of reported oral liquid medication and injectable medication errors decreased to 0.4 and 0.2 times per week, respectively. Conclusion: The organization was able to achieve its purpose and goal of improving the provision of quality pharmacy care through optimal medication use and safety by reducing medication preparation errors. Error rates decreased and the workflow processes were streamlined, which has led to seamless operations within the pharmacy department. There has been significant cost avoidance and waste reduction and enhanced interdepartmental satisfaction due to the reduction of reported medication errors.

  13. LHCb migration from Subversion to Git

    NASA Astrophysics Data System (ADS)

    Clemencic, M.; Couturier, B.; Closier, J.; Cattaneo, M.

    2017-10-01

    Due to user demand and to support new development workflows based on code review and multiple development streams, LHCb decided to port its source code management from Subversion to Git, using the CERN GitLab hosting service. Although tools exist for this kind of migration, LHCb specificities and development models required careful planning of the migration, development of migration tools, changes to the development model, and redefinition of the release procedures. Moreover, we had to support a hybrid situation with some software projects hosted in Git and others still in Subversion, or even branches of one project hosted in different systems. We present the way we addressed the special LHCb requirements, the technical details of migrating large non-standard Subversion repositories, and how we managed to smoothly migrate the software projects following the schedule of each project manager.

  14. Closha: bioinformatics workflow system for the analysis of massive sequencing data.

    PubMed

    Ko, GunHwan; Kim, Pan-Gyu; Yoon, Jongcheol; Han, Gukhee; Park, Seong-Jin; Song, Wangho; Lee, Byungwook

    2018-02-19

    While next-generation sequencing (NGS) costs have fallen in recent years, the cost and complexity of computation remain substantial obstacles to the use of NGS in bio-medical care and genomic research. The rapidly increasing amounts of data available from the new high-throughput methods have made data processing infeasible without automated pipelines. The integration of data and analytic resources into workflow systems provides a solution to the problem by simplifying the task of data analysis. To address this challenge, we developed a cloud-based workflow management system, Closha, to provide fast and cost-effective analysis of massive genomic data. We implemented complex workflows making optimal use of high-performance computing clusters. Closha allows users to create multi-step analyses using drag and drop functionality and to modify the parameters of pipeline tools. Users can also import the Galaxy pipelines into Closha. Closha is a hybrid system that enables users to use both analysis programs providing traditional tools and MapReduce-based big data analysis programs simultaneously in a single pipeline. Thus, the execution of analytics algorithms can be parallelized, speeding up the whole process. We also developed a high-speed data transmission solution, KoDS, to transmit a large amount of data at a fast rate. KoDS has a file transfer speed of up to 10 times that of normal FTP and HTTP. The computer hardware for Closha is 660 CPU cores and 800 TB of disk storage, enabling 500 jobs to run at the same time. Closha is a scalable, cost-effective, and publicly available web service for large-scale genomic data analysis. Closha supports the reliable and highly scalable execution of sequencing analysis workflows in a fully automated manner. Closha provides a user-friendly interface to all genomic scientists to try to derive accurate results from NGS platform data. The Closha cloud server is freely available for use from http://closha.kobic.re.kr/ .

  15. Facilitating hydrological data analysis workflows in R: the RHydro package

    NASA Astrophysics Data System (ADS)

    Buytaert, Wouter; Moulds, Simon; Skoien, Jon; Pebesma, Edzer; Reusser, Dominik

    2015-04-01

    The advent of new technologies such as web services and big data analytics holds great promise for hydrological data analysis and simulation. Driven by the need for better water management tools, it allows for the construction of much more complex workflows that integrate more, and potentially more heterogeneous, data sources with longer tool chains of algorithms and models. With the scientific challenge of designing the most adequate processing workflow comes the technical challenge of implementing the workflow with minimal risk of errors. A wide variety of new workbench technologies and other data handling systems are being developed. At the same time, the functionality of available data processing languages such as R and Python is increasing at an accelerating pace. Because of the large diversity of scientific questions and simulation needs in hydrology, it is unlikely that one single optimal method for constructing hydrological data analysis workflows will emerge. Nevertheless, languages such as R and Python are quickly gaining popularity because they combine a wide array of functionality with high flexibility and versatility. The object-oriented nature of high-level data processing languages makes them particularly suited for the handling of complex and potentially large datasets. In this paper, we explore how handling and processing of hydrological data in R can be facilitated further by designing and implementing a set of relevant classes and methods in the experimental R package RHydro. We build upon existing efforts such as the sp and raster packages for spatial data and the spacetime package for spatiotemporal data to define classes for hydrological data (HydroST). In order to handle simulation data from hydrological models conveniently, an HM class is defined. Relevant methods are implemented to allow for an optimal integration of the HM class with existing model fitting and simulation functionality in R. Lastly, we discuss some of the design challenges of the RHydro package, including integration with big data technologies, web technologies, and emerging data models in hydrology.

  16. PANORAMA: An approach to performance modeling and diagnosis of extreme-scale workflows

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Deelman, Ewa; Carothers, Christopher; Mandal, Anirban

    Here we report that computational science is well established as the third pillar of scientific discovery and is on par with experimentation and theory. However, as we move closer toward the ability to execute exascale calculations and process the ensuing extreme-scale amounts of data produced by both experiments and computations alike, the complexity of managing the compute and data analysis tasks has grown beyond the capabilities of domain scientists. Therefore, workflow management systems are absolutely necessary to ensure current and future scientific discoveries. A key research question for these workflow management systems concerns the performance optimization of complex calculation and data analysis tasks. The central contribution of this article is a description of the PANORAMA approach for modeling and diagnosing the run-time performance of complex scientific workflows. This approach integrates extreme-scale systems testbed experimentation, structured analytical modeling, and parallel systems simulation into a comprehensive workflow framework called Pegasus for understanding and improving the overall performance of complex scientific workflows.

  17. PANORAMA: An approach to performance modeling and diagnosis of extreme-scale workflows

    DOE PAGES

    Deelman, Ewa; Carothers, Christopher; Mandal, Anirban; ...

    2015-07-14

    Here we report that computational science is well established as the third pillar of scientific discovery and is on par with experimentation and theory. However, as we move closer toward the ability to execute exascale calculations and process the ensuing extreme-scale amounts of data produced by both experiments and computations alike, the complexity of managing the compute and data analysis tasks has grown beyond the capabilities of domain scientists. Therefore, workflow management systems are absolutely necessary to ensure current and future scientific discoveries. A key research question for these workflow management systems concerns the performance optimization of complex calculation and data analysis tasks. The central contribution of this article is a description of the PANORAMA approach for modeling and diagnosing the run-time performance of complex scientific workflows. This approach integrates extreme-scale systems testbed experimentation, structured analytical modeling, and parallel systems simulation into a comprehensive workflow framework called Pegasus for understanding and improving the overall performance of complex scientific workflows.

  18. Advances in Grid Computing for the Fabric for Frontier Experiments Project at Fermilab

    NASA Astrophysics Data System (ADS)

    Herner, K.; Alba Hernandez, A. F.; Bhat, S.; Box, D.; Boyd, J.; Di Benedetto, V.; Ding, P.; Dykstra, D.; Fattoruso, M.; Garzoglio, G.; Kirby, M.; Kreymer, A.; Levshina, T.; Mazzacane, A.; Mengel, M.; Mhashilkar, P.; Podstavkov, V.; Retzke, K.; Sharma, N.; Teheran, J.

    2017-10-01

    The Fabric for Frontier Experiments (FIFE) project is a major initiative within the Fermilab Scientific Computing Division charged with leading the computing model for Fermilab experiments. Work within the FIFE project creates close collaboration between experimenters and computing professionals to serve high-energy physics experiments of differing size, scope, and physics area. The FIFE project has worked to develop common tools for job submission, certificate management, software and reference data distribution through CVMFS repositories, robust data transfer, job monitoring, and databases for project tracking. Since the project's inception, the experiments under the FIFE umbrella have significantly matured and present an increasingly complex list of requirements to service providers. To meet these requirements, the FIFE project has been involved in transitioning the Fermilab General Purpose Grid cluster to support a partitionable slot model, expanding the resources available to experiments via the Open Science Grid, assisting with commissioning dedicated high-throughput computing resources for individual experiments, supporting the efforts of the HEP Cloud projects to provision a variety of back-end resources, including public clouds and high-performance computers, and developing rapid onboarding procedures for new experiments and collaborations. The larger demands also require enhanced job monitoring tools, which the project has developed using tools such as ElasticSearch and Grafana, to help experiments manage their large-scale production workflows. Managing these workflows in turn requires a structured service to facilitate smooth handling of experiment requests, which FIFE provides in the form of the Production Operations Management Service (POMS). POMS is designed to track and manage requests from the FIFE experiments to run particular workflows, and to support troubleshooting and triage in case of problems. Recently a new certificate management infrastructure called Distributed Computing Access with Federated Identities (DCAFI) has been put in place; it has eliminated our dependence on a Fermilab-specific third-party Certificate Authority service and better accommodates FIFE collaborators without a Fermilab Kerberos account. DCAFI integrates the existing InCommon federated identity infrastructure, the CILogon Basic CA, and a MyProxy service using a new general-purpose open source tool. We will discuss the general FIFE onboarding strategy, progress in expanding FIFE experiments' presence on the Open Science Grid, new tools for job monitoring, the POMS service, and the DCAFI project.

  19. The implementation of e-learning tools to enhance undergraduate bioinformatics teaching and learning: a case study in the National University of Singapore

    PubMed Central

    2009-01-01

    Background The rapid advancement of computer and information technology in recent years has resulted in the rise of e-learning technologies to enhance and complement traditional classroom teaching in many fields, including bioinformatics. This paper records the experience of implementing e-learning technology to support problem-based learning (PBL) in the teaching of two undergraduate bioinformatics classes in the National University of Singapore. Results Survey results further established the efficiency and suitability of e-learning tools to supplement PBL in bioinformatics education. 63.16% of year three bioinformatics students showed a positive response regarding the usefulness of the Learning Activity Management System (LAMS) e-learning tool in guiding the learning and discussion process involved in PBL and in enhancing the learning experience by breaking down PBL activities into a sequential workflow. On the other hand, 89.81% of year two bioinformatics students indicated that their revision process was positively impacted by the use of LAMS for guiding the learning process, while 60.19% agreed that the breakdown of activities into a sequential step-by-step workflow by LAMS enhances the learning experience. Conclusion We show that e-learning tools are useful for supplementing PBL in bioinformatics education. The results suggest that it is feasible to develop and adopt e-learning tools to supplement a variety of instructional strategies in the future. PMID:19958511

  20. Modelling and analysis of workflow for lean supply chains

    NASA Astrophysics Data System (ADS)

    Ma, Jinping; Wang, Kanliang; Xu, Lida

    2011-11-01

    Cross-organisational workflow systems are a component of enterprise information systems which support collaborative business processes among organisations in a supply chain. Currently, the majority of workflow systems are developed from the perspective of information modelling, without considering the actual requirements of supply chain management. In this article, we focus on the modelling and analysis of cross-organisational workflow systems in the context of the lean supply chain (LSC) using Petri nets. First, the article describes the assumed conditions of a cross-organisational workflow net according to the idea of the LSC and then discusses the standardisation of collaborative business processes between organisations in the context of the LSC. Second, the concept of labelled time Petri nets (LTPNs) is defined by combining labelled Petri nets with time Petri nets, and the concept of labelled time workflow nets (LTWNs) is also defined based on LTPNs. Cross-organisational labelled time workflow nets (CLTWNs) are then defined based on LTWNs. Third, the article proposes the notion of OR-silent CLTWNs and a verification approach to the soundness of LTWNs and CLTWNs. Finally, this article illustrates the proposed method with a simple example. The purpose of this research is to establish a formal method for the modelling and analysis of workflow systems for LSCs. This study initiates a new perspective of research on cross-organisational workflow management and promotes operation management of LSCs in real-world settings.
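
    For readers unfamiliar with the formalism, a labelled time Petri net can be written in the usual textbook form shown below; this is a common formulation and not necessarily the exact definition used in the article.

        % Hedged sketch of a labelled time Petri net (LTPN), following the usual
        % time-Petri-net formulation; the article's exact definition may differ.
        \[
          N = (P,\; T,\; F,\; M_0,\; I,\; \ell), \qquad
          F \subseteq (P \times T) \cup (T \times P),
        \]
        \[
          I : T \to \mathbb{Q}_{\ge 0} \times (\mathbb{Q}_{\ge 0} \cup \{\infty\}),
          \qquad
          \ell : T \to \Sigma \cup \{\tau\},
        \]
        where $P$ is the set of places, $T$ the set of transitions, $M_0 : P \to \mathbb{N}$
        the initial marking, $I(t)$ the static firing interval of transition $t$, and
        $\ell$ a labelling of transitions over an alphabet $\Sigma$ (with $\tau$ for
        silent transitions).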

  1. A Computational Workflow for the Automated Generation of Models of Genetic Designs.

    PubMed

    Misirli, Göksel; Nguyen, Tramy; McLaughlin, James Alastair; Vaidyanathan, Prashant; Jones, Timothy S; Densmore, Douglas; Myers, Chris; Wipat, Anil

    2018-06-05

    Computational models are essential to engineer predictable biological systems and to scale up this process for complex systems. Computational modeling often requires expert knowledge and data to build models. Clearly, manual creation of models is not scalable for large designs. Despite several automated model construction approaches, computational methodologies to bridge knowledge in design repositories and the process of creating computational models have still not been established. This paper describes a workflow for automatic generation of computational models of genetic circuits from data stored in design repositories using existing standards. This workflow leverages the software tool SBOLDesigner to build structural models that are then enriched by the Virtual Parts Repository API using Systems Biology Open Language (SBOL) data fetched from the SynBioHub design repository. The iBioSim software tool is then utilized to convert this SBOL description into a computational model encoded using the Systems Biology Markup Language (SBML). Finally, this SBML model can be simulated using a variety of methods. This workflow provides synthetic biologists with easy to use tools to create predictable biological systems, hiding away the complexity of building computational models. This approach can further be incorporated into other computational workflows for design automation.

  2. Proteomics data exchange and storage: the need for common standards and public repositories.

    PubMed

    Jiménez, Rafael C; Vizcaíno, Juan Antonio

    2013-01-01

    Both the existence of data standards and public databases or repositories have been key factors behind the development of the existing "omics" approaches. In this book chapter we first review the main existing mass spectrometry (MS)-based proteomics resources: PRIDE, PeptideAtlas, GPMDB, and Tranche. Second, we report on the current status of the different proteomics data standards developed by the Proteomics Standards Initiative (PSI): the formats mzML, mzIdentML, mzQuantML, TraML, and PSI-MI XML are then reviewed. Finally, we present an easy way to query and access MS proteomics data in the PRIDE database, as a representative of the existing repositories, using the workflow management system (WMS) tool Taverna. Two different publicly available workflows are explained and described.

  3. Development of a user customizable imaging informatics-based intelligent workflow engine system to enhance rehabilitation clinical trials

    NASA Astrophysics Data System (ADS)

    Wang, Ximing; Martinez, Clarisa; Wang, Jing; Liu, Ye; Liu, Brent

    2014-03-01

    Clinical trials usually have a demand to collect, track and analyze multimedia data according to the workflow. Currently, the clinical trial data management requirements are normally addressed with custom-built systems. Challenges occur in the workflow design within different trials. The traditional pre-defined custom-built system is usually limited to a specific clinical trial and normally requires time-consuming and resource-intensive software development. To provide a solution, we present a user customizable imaging informatics-based intelligent workflow engine system for managing stroke rehabilitation clinical trials with intelligent workflow. The intelligent workflow engine provides flexibility in building and tailoring the workflow in various stages of clinical trials. By providing a solution to tailor and automate the workflow, the system will save time and reduce errors for clinical trials. Although our system is designed for clinical trials for rehabilitation, it may be extended to other imaging based clinical trials as well.

  4. The medical simulation markup language - simplifying the biomechanical modeling workflow.

    PubMed

    Suwelack, Stefan; Stoll, Markus; Schalck, Sebastian; Schoch, Nicolai; Dillmann, Rüdiger; Bendl, Rolf; Heuveline, Vincent; Speidel, Stefanie

    2014-01-01

    Modeling and simulation of the human body by means of continuum mechanics has become an important tool in diagnostics, computer-assisted interventions and training. This modeling approach seeks to construct patient-specific biomechanical models from tomographic data. Usually many different tools, such as segmentation and meshing algorithms, are involved in this workflow. In this paper we present a generalized and flexible description for biomechanical models. The unique feature of the new modeling language is that it not only describes the final biomechanical simulation, but also the workflow by which the biomechanical model is constructed from tomographic data. In this way, the medical simulation markup language (MSML) can act as middleware between all tools used in the modeling pipeline. The MSML thus greatly facilitates the prototyping of medical simulation workflows for clinical and research purposes. In this paper, we not only detail the XML-based modeling scheme, but also present a concrete implementation. Different examples highlight the flexibility, robustness and ease-of-use of the approach.

  5. Nexus: A modular workflow management system for quantum simulation codes

    NASA Astrophysics Data System (ADS)

    Krogel, Jaron T.

    2016-01-01

    The management of simulation workflows represents a significant task for the individual computational researcher. Automation of the required tasks involved in simulation work can decrease the overall time to solution and reduce sources of human error. A new simulation workflow management system, Nexus, is presented to address these issues. Nexus is capable of automated job management on workstations and resources at several major supercomputing centers. Its modular design allows many quantum simulation codes to be supported within the same framework. Current support includes quantum Monte Carlo calculations with QMCPACK, density functional theory calculations with Quantum Espresso or VASP, and quantum chemical calculations with GAMESS. Users can compose workflows through a transparent, text-based interface, resembling the input file of a typical simulation code. A usage example is provided to illustrate the process.
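
    The declarative, text-like composition of dependent simulation steps described above can be sketched as follows; the class and function names are invented for illustration and are not the Nexus API.

        # Hypothetical sketch of composing a simulation workflow through a declarative,
        # text-like Python interface; names are invented and are not the Nexus API.
        from dataclasses import dataclass, field

        @dataclass
        class SimulationStep:
            code: str                       # e.g. "dft_scf" or "qmc_dmc"
            inputs: dict
            depends_on: list = field(default_factory=list)

        def run_workflow(steps):
            """Submit steps whose dependencies have completed (simple sequential driver)."""
            completed = set()
            pending = list(steps)
            while pending:
                ready = [s for s in pending if all(d in completed for d in s.depends_on)]
                if not ready:
                    raise RuntimeError("unsatisfiable dependencies")
                for step in ready:
                    print(f"submitting {step.code} with {step.inputs}")
                    completed.add(step.code)
                    pending.remove(step)

        scf = SimulationStep(code="dft_scf", inputs={"functional": "PBE", "ecut_ry": 200})
        qmc = SimulationStep(code="qmc_dmc", inputs={"timestep": 0.01}, depends_on=["dft_scf"])
        run_workflow([scf, qmc])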

  6. Big data analytics workflow management for eScience

    NASA Astrophysics Data System (ADS)

    Fiore, Sandro; D'Anca, Alessandro; Palazzo, Cosimo; Elia, Donatello; Mariello, Andrea; Nassisi, Paola; Aloisio, Giovanni

    2015-04-01

    In many domains, such as climate and astrophysics, scientific data is often n-dimensional and requires tools that support specialized data types and primitives if it is to be properly stored, accessed, analysed and visualized. Currently, scientific data analytics relies on domain-specific software and libraries providing a huge set of operators and functionalities. However, most of these software tools fail at large scale since they: (i) are desktop based, rely on local computing capabilities and need the data locally; (ii) cannot benefit from available multicore/parallel machines since they are based on sequential codes; (iii) do not provide declarative languages to express scientific data analysis tasks; and (iv) do not provide newer or more scalable storage models to better support data multidimensionality. Additionally, most of them: (v) are domain-specific, which also means they support a limited set of data formats, and (vi) do not provide workflow support to enable the construction, execution and monitoring of more complex "experiments". The Ophidia project aims to address most of the challenges highlighted above by providing a big data analytics framework for eScience. Ophidia provides several parallel operators to manipulate large datasets. Some relevant examples include: (i) data sub-setting (slicing and dicing), (ii) data aggregation, (iii) array-based primitives (the same operator applies to all the implemented UDF extensions), (iv) data cube duplication, (v) data cube pivoting, and (vi) NetCDF import and export. Metadata operators are available too. Additionally, the Ophidia framework provides array-based primitives to perform data sub-setting, data aggregation (e.g. max, min, avg), array concatenation, algebraic expressions and predicate evaluation on large arrays of scientific data. Bit-oriented plugins have also been implemented to manage binary data cubes. Defining processing chains and workflows with tens or hundreds of data analytics operators is the real challenge in many practical scientific use cases. This talk will specifically address the main needs, requirements and challenges regarding data analytics workflow management applied to large scientific datasets. Three real use cases, concerning analytics workflows for sea situational awareness, fire danger prevention, and climate change and biodiversity, will be discussed in detail.
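
    The cube operations named above (sub-setting, aggregation, predicate evaluation) can be illustrated on a small in-memory array with plain NumPy; this is only an illustration of the concepts, not the Ophidia operator API.

        # Illustration of data-cube sub-setting, aggregation and predicate evaluation
        # using plain NumPy; this is not the Ophidia operator API.
        import numpy as np

        # Hypothetical data cube: time x latitude x longitude temperature field (K).
        rng = np.random.default_rng(0)
        cube = rng.normal(loc=288.0, scale=5.0, size=(365, 180, 360))

        # "Slicing and dicing": keep one season and a regional lat/lon window.
        subset = cube[0:90, 60:120, 100:200]

        # Array-based aggregation primitives: max, min, avg collapsing the time axis.
        t_max = subset.max(axis=0)
        t_min = subset.min(axis=0)
        t_avg = subset.mean(axis=0)

        # Predicate evaluation over large arrays: fraction of grid cells above a threshold.
        frac_warm = (subset > 290.0).mean()
        print(t_avg.shape, round(float(frac_warm), 3))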

  7. Comprehensive, powerful, efficient, intuitive: a new software framework for clinical imaging applications

    NASA Astrophysics Data System (ADS)

    Augustine, Kurt E.; Holmes, David R., III; Hanson, Dennis P.; Robb, Richard A.

    2006-03-01

    One of the greatest challenges for a software engineer is to create a complex application that is comprehensive enough to be useful to a diverse set of users, yet focused enough for individual tasks to be carried out efficiently with minimal training. This "powerful yet simple" paradox is particularly prevalent in advanced medical imaging applications. Recent research in the Biomedical Imaging Resource (BIR) at Mayo Clinic has been directed toward development of an imaging application framework that provides powerful image visualization/analysis tools in an intuitive, easy-to-use interface. It is based on two concepts very familiar to physicians - Cases and Workflows. Each case is associated with a unique patient and a specific set of routine clinical tasks, or a workflow. Each workflow is comprised of an ordered set of general-purpose modules which can be re-used for each unique workflow. Clinicians help describe and design the workflows, and then are provided with an intuitive interface to both patient data and analysis tools. Since most of the individual steps are common to many different workflows, the use of general-purpose modules reduces development time and results in applications that are consistent, stable, and robust. While the development of individual modules may reflect years of research by imaging scientists, new customized workflows based on the new modules can be developed extremely fast. If a powerful, comprehensive application is difficult to learn and complicated to use, it will be unacceptable to most clinicians. Clinical image analysis tools must be intuitive and effective or they simply will not be used.
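
    The Case/Workflow/module composition described above can be sketched with a few small classes; the class and module names below are illustrative only and are not the framework's actual API.

        # Minimal sketch of the Case / Workflow / reusable-module pattern described above;
        # class and module names are illustrative, not the BIR framework's API.
        class Module:
            """A general-purpose step that can be reused across many workflows."""
            def __init__(self, name, func):
                self.name, self.func = name, func

            def run(self, data):
                print(f"  step: {self.name}")
                return self.func(data)


        class Workflow:
            """An ordered set of modules covering one routine clinical task."""
            def __init__(self, name, modules):
                self.name, self.modules = name, modules

            def run(self, data):
                for module in self.modules:
                    data = module.run(data)
                return data


        class Case:
            """Associates a unique patient with the workflow applied to their images."""
            def __init__(self, patient_id, workflow):
                self.patient_id, self.workflow = patient_id, workflow

            def process(self, image):
                print(f"case {self.patient_id}: {self.workflow.name}")
                return self.workflow.run(image)


        # The same general-purpose modules are recombined into different customized workflows.
        load = Module("load image series", lambda d: d)
        segment = Module("segment structure", lambda d: d)
        measure = Module("measure volume", lambda d: d)

        planning = Workflow("tumor volumetry", [load, segment, measure])
        Case("patient-001", planning).process(image="image-series-placeholder")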

  8. Content and Workflow Management for Library Websites: Case Studies

    ERIC Educational Resources Information Center

    Yu, Holly, Ed.

    2005-01-01

    Using database-driven web pages or web content management (WCM) systems to manage increasingly diverse web content and to streamline workflows is a commonly practiced solution recognized in libraries today. However, limited library web content management models and funding constraints prevent many libraries from purchasing commercially available…

  9. Identifying impact of software dependencies on replicability of biomedical workflows.

    PubMed

    Miksa, Tomasz; Rauber, Andreas; Mina, Eleni

    2016-12-01

    Complex data-driven experiments form the basis of biomedical research. Recent findings warn that the context in which the software is run, that is, the infrastructure and the third-party dependencies, can have a crucial impact on the final results delivered by a computational experiment. This implies that in order to replicate the same result, not only must the same data be used, but the experiment must also be run on an equivalent software stack. In this paper we present the VFramework, which enables assessing the replicability of workflows. It identifies whether any differences in software dependencies exist between two executions of the same workflow and whether they have an impact on the produced results. We also conduct a case study in which we investigate the impact of software dependencies on the replicability of Taverna workflows used in biomedical research on Huntington's disease. We re-execute the analysed workflows in environments differing in operating system distribution and configuration. The results show that the VFramework can be used to identify the impact of software dependencies on the replicability of biomedical workflows. Furthermore, we observe that despite the fact that the workflows are executed in a controlled environment, they still depend on specific tools installed in the environment. The context model used by the VFramework addresses the deficiencies of provenance traces and also documents such tools. Based on our findings, we define guidelines for workflow owners that enable them to improve the replicability of their workflows. Copyright © 2016 Elsevier Inc. All rights reserved.
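
    The core idea of comparing the software stacks recorded for two executions can be sketched as a simple dictionary diff; the package names and versions below are hypothetical and this is not the VFramework implementation.

        # Sketch of the dependency comparison idea: diff the software stacks recorded
        # for two executions of the same workflow. Not the VFramework implementation.
        def diff_dependencies(run_a, run_b):
            """Return packages that are missing or version-mismatched between two runs."""
            issues = {}
            for pkg in set(run_a) | set(run_b):
                va, vb = run_a.get(pkg), run_b.get(pkg)
                if va != vb:
                    issues[pkg] = (va, vb)
            return issues

        # Hypothetical context models captured at execution time.
        original = {"python": "2.7.3", "numpy": "1.6.2", "samtools": "0.1.18"}
        reexec   = {"python": "2.7.9", "numpy": "1.6.2", "bcftools": "1.2"}

        for pkg, (va, vb) in sorted(diff_dependencies(original, reexec).items()):
            print(f"{pkg}: original={va}  re-execution={vb}")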

  10. A recipe for consistent 3D management of velocity data and time-depth conversion using Vel-IO 3D

    NASA Astrophysics Data System (ADS)

    Maesano, Francesco E.; D'Ambrogi, Chiara

    2017-04-01

    3D geological model production and related basin analyses need a large and consistent seismic dataset and, ideally, well logs to support correlation and calibration; the workflow and tools used to manage and integrate different types of data control the soundness of the final 3D model. Even though seismic interpretation is a basic early step in such a workflow, the most critical steps in obtaining a comprehensive 3D model useful for further analyses are the construction of an effective 3D velocity model and a well-constrained time-depth conversion. We present a complex workflow that includes comprehensive management of large seismic datasets and velocity data, the construction of a 3D instantaneous multilayer-cake velocity model, and the time-depth conversion of a highly heterogeneous geological framework, including both depositional and structural complexities. The core of the workflow is the construction of the 3D velocity model using the Vel-IO 3D tool (Maesano and D'Ambrogi, 2017; https://github.com/framae80/Vel-IO3D), which is composed of the following three scripts, written in Python 2.7.11 in the ArcGIS ArcPy environment: i) the 3D instantaneous velocity model builder creates a preliminary 3D instantaneous velocity model using key horizons in the time domain and velocity data obtained from the analysis of well and pseudo-well logs. The script applies spatial interpolation to the velocity parameters and calculates the depth of each point on each horizon bounding the layer-cake velocity model. ii) the velocity model optimizer improves the consistency of the velocity model by adding new velocity data indirectly derived from measured depths, thus reducing the geometrical uncertainties in the areas located far from the original velocity data. iii) the time-depth converter runs the time-depth conversion of any object located inside the 3D velocity model. The Vel-IO 3D tool allows one to create 3D geological models consistent with the primary geological constraints (e.g. depth of the markers on wells). The workflow and the Vel-IO 3D tool have been developed and tested for the construction of the 3D geological model of a flat region, 5700 km2 in area, located in the central part of the Po Plain (Northern Italy), in the frame of the European-funded project GeoMol. The study area was covered by a dense dataset of seismic lines (ca. 12000 km) and exploration wells (130 drillings), mainly deriving from oil and gas exploration activities. The interpretation of the seismic dataset led to the construction of a 3D model in the time domain that was depth-converted using Vel-IO 3D with a four-layer-cake 3D instantaneous velocity model. The resulting 3D geological model, composed of 15 horizons and 150 faults, has been used for basin analysis at regional scale, for geothermal assessment, and for updating the seismotectonic knowledge of the Po Plain. Vel-IO 3D has further been used for the depth conversion of the accretionary prism of the Calabrian subduction (Southern Italy) and for a basin-scale analysis of the Plio-Pleistocene evolution of the Po Plain. Maesano, F.E. and D'Ambrogi, C. (2017), Computers and Geosciences, doi: 10.1016/j.cageo.2016.11.013. Vel-IO 3D is available at https://github.com/framae80/Vel-IO3D
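
    The basic arithmetic behind a time-depth conversion driven by an instantaneous velocity model can be illustrated for a single layer with the common linear velocity law V(z) = V0 + k*z; the multilayer handling in Vel-IO 3D is more involved, and the parameter values below are invented for the example.

        # Sketch of a single-layer time-depth conversion with a linear instantaneous
        # velocity law V(z) = v0 + k*z (a common parameterization; Vel-IO 3D's exact
        # multilayer treatment is more involved).
        import numpy as np

        def time_to_depth(twt_s, v0, k):
            """Convert two-way travel time (s) to depth (m) for V(z) = v0 + k*z.

            Integrating dz/dt = v0 + k*z over one-way time t gives
            z = (v0/k) * (exp(k*t) - 1).
            """
            t_one_way = np.asarray(twt_s) / 2.0
            if abs(k) < 1e-12:                      # constant-velocity limit
                return v0 * t_one_way
            return (v0 / k) * np.expm1(k * t_one_way)

        # Example: a horizon picked at 1.8 s TWT, v0 = 1800 m/s, k = 0.35 1/s.
        print(round(float(time_to_depth(1.8, v0=1800.0, k=0.35)), 1), "m")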

  11. XML schemas for common bioinformatic data types and their application in workflow systems

    PubMed Central

    Seibel, Philipp N; Krüger, Jan; Hartmeier, Sven; Schwarzer, Knut; Löwenthal, Kai; Mersch, Henning; Dandekar, Thomas; Giegerich, Robert

    2006-01-01

    Background Today, there is a growing need in bioinformatics to combine available software tools into chains, thus building complex applications from existing single-task tools. To create such workflows, the tools involved have to be able to work with each other's data – therefore, a common set of well-defined data formats is needed. Unfortunately, current bioinformatic tools use a great variety of heterogeneous formats. Results Acknowledging the need for common formats, the Helmholtz Open BioInformatics Technology network (HOBIT) identified several basic data types used in bioinformatics and developed appropriate format descriptions, formally defined by XML schemas, and incorporated them in a Java library (BioDOM). These schemas currently cover sequence, sequence alignment, RNA secondary structure and RNA secondary structure alignment formats in a form that is independent of any specific program, thus enabling seamless interoperation of different tools. All XML formats are available at , the BioDOM library can be obtained at . Conclusion The HOBIT XML schemas and the BioDOM library simplify adding XML support to newly created and existing bioinformatic tools, enabling these tools to interoperate seamlessly in workflow scenarios. PMID:17087823

  12. Introducing W.A.T.E.R.S.: a workflow for the alignment, taxonomy, and ecology of ribosomal sequences.

    PubMed

    Hartman, Amber L; Riddle, Sean; McPhillips, Timothy; Ludäscher, Bertram; Eisen, Jonathan A

    2010-06-12

    For more than two decades microbiologists have used a highly conserved microbial gene as a phylogenetic marker for bacteria and archaea. The small-subunit ribosomal RNA gene, also known as 16S rRNA, is encoded by ribosomal DNA, 16S rDNA, and has provided a powerful comparative tool to microbial ecologists. Over time, the microbial ecology field has matured from small-scale studies in a select number of environments to massive collections of sequence data that are paired with dozens of corresponding collection variables. As the complexity of data and tool sets has grown, the need for flexible automation and maintenance of the core processes of 16S rDNA sequence analysis has increased correspondingly. We present WATERS, an integrated approach for 16S rDNA analysis that bundles a suite of publicly available 16S rDNA analysis software tools into a single software package. The "toolkit" includes sequence alignment, chimera removal, OTU determination, taxonomy assignment, and phylogenetic tree construction, as well as a host of ecological analysis and visualization tools. WATERS employs a flexible, collection-oriented 'workflow' approach using the open-source Kepler system as a platform. By packaging available software tools into a single automated workflow, WATERS simplifies 16S rDNA analyses, especially for those without specialized bioinformatics or programming expertise. In addition, WATERS, like some of the newer comprehensive rRNA analysis tools, allows researchers to minimize the time dedicated to carrying out tedious informatics steps and to focus their attention instead on the biological interpretation of the results. One advantage of WATERS over other comprehensive tools is that the use of the Kepler workflow system facilitates result interpretation and reproducibility via a data provenance sub-system. Furthermore, new "actors" can be added to the workflow as desired, and we see WATERS as an initial seed for a sizeable and growing repository of interoperable, easy-to-combine tools for asking increasingly complex microbial ecology questions.

  13. It's All About the Data: Workflow Systems and Weather

    NASA Astrophysics Data System (ADS)

    Plale, B.

    2009-05-01

    Digital data is fueling new advances in the computational sciences, particularly geospatial research as environmental sensing grows more practical through reduced technology costs, broader network coverage, and better instruments. e-Science research (i.e., cyberinfrastructure research) has responded to data intensive computing with tools, systems, and frameworks that support computationally oriented activities such as modeling, analysis, and data mining. Workflow systems support execution of sequences of tasks on behalf of a scientist. These systems, such as Taverna, Apache ODE, and Kepler, when built as part of a larger cyberinfrastructure framework, give the scientist tools to construct task graphs of execution sequences, often through a visual interface for connecting task boxes together with arcs representing control flow or data flow. Unlike business processing workflows, scientific workflows expose a high degree of detail and control during configuration and execution. Data-driven science imposes unique needs on workflow frameworks. Our research is focused on two issues. The first is the support for workflow-driven analysis over all kinds of data sets, including real time streaming data and locally owned and hosted data. The second is the essential role metadata/provenance collection plays in data driven science, for discovery, determining quality, for science reproducibility, and for long-term preservation. The research has been conducted over the last 6 years in the context of cyberinfrastructure for mesoscale weather research carried out as part of the Linked Environments for Atmospheric Discovery (LEAD) project. LEAD has pioneered new approaches for integrating complex weather data, assimilation, modeling, mining, and cyberinfrastructure systems. Workflow systems have the potential to generate huge volumes of data. Without some form of automated metadata capture, either metadata description becomes largely a manual task that is difficult if not impossible under high-volume conditions, or the searchability and manageability of the resulting data products is disappointingly low. The provenance of a data product is a record of its lineage, or trace of the execution history that resulted in the product. The provenance of a forecast model result, e.g., captures information about the executable version of the model, configuration parameters, input data products, execution environment, and owner. Provenance enables data to be properly attributed and captures critical parameters about the model run so the quality of the result can be ascertained. Proper provenance is essential to providing reproducible scientific computing results. Workflow languages used in science discovery are complete programming languages, and in theory can support any logic expressible by a programming language. The execution environments supporting the workflow engines, on the other hand, are subject to constraints on physical resources, and hence in practice the workflow task graphs used in science utilize relatively few of the cataloged workflow patterns. It is important to note that these workflows are executed on demand, and are executed once. Into this context is introduced the need for science discovery that is responsive to real time information. 
If we can use simple programming models and abstractions to make scientific discovery involving real-time data accessible to specialists who share and utilize data across scientific domains, we bring science one step closer to solving the largest of human problems.
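
    The provenance fields mentioned above (executable version, configuration parameters, input products, execution environment, owner) can be captured in a simple lineage record; the schema and values below are illustrative only, not the LEAD provenance system's actual format.

        # Minimal sketch of automated provenance capture for one workflow task,
        # recording the fields mentioned above; schema and values are illustrative.
        import json
        import platform
        from datetime import datetime, timezone

        def provenance_record(task_name, executable_version, parameters, inputs, owner):
            """Assemble a lineage record for one execution of a workflow task."""
            return {
                "task": task_name,
                "executable_version": executable_version,
                "parameters": parameters,
                "input_products": inputs,               # upstream data the task consumed
                "execution_environment": platform.platform(),
                "owner": owner,
                "captured_at": datetime.now(timezone.utc).isoformat(),
            }

        record = provenance_record(
            task_name="forecast-model-run",
            executable_version="model-3.1.1",
            parameters={"grid_spacing_km": 4, "forecast_hours": 36},
            inputs=["obs/surface-2009-05-01.nc", "radar/level2-2009-05-01.nc"],
            owner="lead-user-42",
        )
        print(json.dumps(record, indent=2))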

  14. Workflow Automation: A Collective Case Study

    ERIC Educational Resources Information Center

    Harlan, Jennifer

    2013-01-01

    Knowledge management has proven to be a sustainable competitive advantage for many organizations. Knowledge management systems are abundant, with multiple functionalities. The literature reinforces the use of workflow automation with knowledge management systems to benefit organizations; however, it was not known if process automation yielded…

  15. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Williams, Edward J., Jr.; Henry, Karen Lynne

    Sandia National Laboratories develops technologies to: (1) sustain, modernize, and protect our nuclear arsenal; (2) prevent the spread of weapons of mass destruction; (3) provide new capabilities to our armed forces; (4) protect our national infrastructure; (5) ensure the stability of our nation's energy and water supplies; and (6) defend our nation against terrorist threats. We identified the need for a single overarching Integrated Workplace Management System (IWMS) that would enable us to focus on customer missions and improve FMOC processes. Our team selected highly configurable commercial-off-the-shelf (COTS) software with out-of-the-box workflow processes that integrate strategic planning, project management, facility assessments, and space management, and can interface with existing systems, such as Oracle, PeopleSoft, Maximo, Bentley, and FileNet. We selected the Integrated Workplace Management System (IWMS) from Tririga, Inc. The Facility Management System (FMS) benefits are to: (1) create a single reliable source for facility data; (2) improve transparency with oversight organizations; (3) streamline FMOC business processes with a single, integrated facility-management tool; (4) give customers simple tools and real-time information; (5) reduce indirect costs; (6) replace approximately 30 FMOC systems and 60 homegrown tools (such as Microsoft Access databases); and (7) integrate with FIMS.

  16. The PEcAn Project: Accessible Tools for On-demand Ecosystem Modeling

    NASA Astrophysics Data System (ADS)

    Cowdery, E.; Kooper, R.; LeBauer, D.; Desai, A. R.; Mantooth, J.; Dietze, M.

    2014-12-01

    Ecosystem models play a critical role in understanding the terrestrial biosphere and forecasting changes in the carbon cycle; however, current forecasts have considerable uncertainty. The amount of data being collected and produced is increasing on a daily basis as we enter the "big data" era, but only a fraction of these data is being used to constrain models. Until we can improve the problems of model accessibility and model-data communication, none of these resources can be used to their full potential. The Predictive Ecosystem Analyzer (PEcAn) is an ecoinformatics toolbox and a set of workflows that wrap around an ecosystem model and manage the flow of information in and out of regional-scale terrestrial biosphere models (TBMs). Here we present new modules developed in PEcAn to manage the processing of meteorological data, one of the primary driver dependencies for ecosystem models. The module downloads, reads, extracts, and converts meteorological observations to the Unidata Climate Forecast (CF) NetCDF community standard, a convention used for most climate forecast and weather models. The module also automates the conversion from NetCDF to model-specific formats, including basic merging, gap-filling, and downscaling procedures. PEcAn currently supports tower-based micrometeorological observations at Ameriflux and FluxNET sites, site-level CSV-formatted data, and regional and global reanalysis products such as the North American Regional Reanalysis and CRU-NCEP. The workflow is easily extensible to additional products and processing algorithms. These meteorological workflows have been coupled with the PEcAn web interface and now allow anyone to run multiple ecosystem models for any location on the Earth by simply clicking on an intuitive Google-map-based interface. This will allow users to more readily compare models to observations at those sites, leading to better calibration and validation. Current work is extending these workflows to also process field, remotely sensed, and historical observations of vegetation composition and structure. The processing of heterogeneous met and veg data within PEcAn is made possible using the Brown Dog cyberinfrastructure tools for unstructured data.
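
    Writing a meteorological series to a CF-style NetCDF file, as the conversion step above requires, can be sketched with the netCDF4 package; the variable and attribute choices below follow common CF usage and are not necessarily identical to PEcAn's actual output.

        # Sketch of converting a meteorological time series to a CF-style NetCDF file
        # with the netCDF4 package; attribute choices follow common CF conventions and
        # are not necessarily identical to PEcAn's output.
        import numpy as np
        from netCDF4 import Dataset

        temps_k = np.array([281.2, 282.9, 284.1, 283.4])        # hourly air temperature

        with Dataset("met_cf.nc", "w") as ds:
            ds.Conventions = "CF-1.6"
            ds.createDimension("time", None)                     # unlimited time axis

            time = ds.createVariable("time", "f8", ("time",))
            time.units = "hours since 2014-01-01 00:00:00"
            time.calendar = "standard"
            time[:] = np.arange(len(temps_k))

            tas = ds.createVariable("air_temperature", "f4", ("time",))
            tas.units = "K"
            tas.standard_name = "air_temperature"
            tas[:] = temps_k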

  17. Using CyberShake Workflows to Manage Big Seismic Hazard Data on Large-Scale Open-Science HPC Resources

    NASA Astrophysics Data System (ADS)

    Callaghan, S.; Maechling, P. J.; Juve, G.; Vahi, K.; Deelman, E.; Jordan, T. H.

    2015-12-01

    The CyberShake computational platform, developed by the Southern California Earthquake Center (SCEC), is an integrated collection of scientific software and middleware that performs 3D physics-based probabilistic seismic hazard analysis (PSHA) for Southern California. CyberShake integrates large-scale and high-throughput research codes to produce probabilistic seismic hazard curves for individual locations of interest and hazard maps for an entire region. A recent CyberShake calculation produced about 500,000 two-component seismograms for each of 336 locations, resulting in over 300 million synthetic seismograms in a Los Angeles-area probabilistic seismic hazard model. CyberShake calculations require a series of scientific software programs. Early computational stages produce data used as inputs by later stages, so we describe CyberShake calculations using a workflow definition language. Scientific workflow tools automate and manage the input and output data and enable remote job execution on large-scale HPC systems. To satisfy the requests of broad impact users of CyberShake data, such as seismologists, utility companies, and building code engineers, we successfully completed CyberShake Study 15.4 in April and May 2015, calculating a 1 Hz urban seismic hazard map for Los Angeles. We distributed the calculation between the NSF Track 1 system NCSA Blue Waters, the DOE Leadership-class system OLCF Titan, and USC's Center for High Performance Computing. This study ran for over 5 weeks, burning about 1.1 million node-hours and producing over half a petabyte of data. The CyberShake Study 15.4 results doubled the maximum simulated seismic frequency from 0.5 Hz to 1.0 Hz as compared to previous studies, representing a factor of 16 increase in computational complexity. We will describe how our workflow tools supported splitting the calculation across multiple systems. We will explain how we modified CyberShake software components, including GPU implementations and migrating from file-based communication to MPI messaging, to greatly reduce the I/O demands and node-hour requirements of CyberShake. We will also present performance metrics from CyberShake Study 15.4, and discuss challenges that producers of Big Data on open-science HPC resources face moving forward.
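
    The basic PSHA arithmetic behind hazard curves of the kind described above sums rupture occurrence rates weighted by the probability that each rupture's ground motion exceeds a given intensity level. The sketch below uses synthetic numbers to illustrate that calculation only; it is not CyberShake's code or data.

        # Illustration of the basic PSHA hazard-curve arithmetic: sum rupture rates
        # weighted by the probability of exceeding each intensity level. Numbers are synthetic.
        import numpy as np

        rng = np.random.default_rng(1)

        # Hypothetical rupture set: annual rate and simulated intensities (g) at one site.
        ruptures = [
            {"rate": 1e-3, "intensities": rng.lognormal(mean=np.log(0.30), sigma=0.5, size=200)},
            {"rate": 5e-3, "intensities": rng.lognormal(mean=np.log(0.10), sigma=0.5, size=200)},
        ]

        levels = np.array([0.05, 0.1, 0.2, 0.4, 0.8])            # spectral acceleration levels (g)

        annual_rate = np.zeros_like(levels)
        for rup in ruptures:
            # P(IM > level | rupture) estimated from the synthetic seismogram intensities.
            p_exceed = np.array([(rup["intensities"] > x).mean() for x in levels])
            annual_rate += rup["rate"] * p_exceed

        # Poisson probability of exceedance in 50 years at each level.
        p50 = 1.0 - np.exp(-annual_rate * 50.0)
        for x, p in zip(levels, p50):
            print(f"SA > {x:.2f} g: {p:.4f} in 50 yr")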

  18. Enabling Big Geoscience Data Analytics with a Cloud-Based, MapReduce-Enabled and Service-Oriented Workflow Framework

    PubMed Central

    Li, Zhenlong; Yang, Chaowei; Jin, Baoxuan; Yu, Manzhu; Liu, Kai; Sun, Min; Zhan, Matthew

    2015-01-01

    Geoscience observations and model simulations are generating vast amounts of multi-dimensional data. Effectively analyzing these data is essential for geoscience studies. However, the tasks are challenging for geoscientists because processing the massive amount of data is both computing- and data-intensive, in that data analytics requires complex procedures and multiple tools. To tackle these challenges, a scientific workflow framework is proposed for big geoscience data analytics. In this framework, techniques are proposed that leverage cloud computing, MapReduce, and Service Oriented Architecture (SOA). Specifically, HBase is adopted for storing and managing big geoscience data across distributed computers. A MapReduce-based algorithm framework is developed to support parallel processing of geoscience data, and a service-oriented workflow architecture is built to support on-demand complex data analytics in the cloud environment. A proof-of-concept prototype tests the performance of the framework. Results show that this innovative framework significantly improves the efficiency of big geoscience data analytics by reducing the data processing time as well as simplifying data analytical procedures for geoscientists. PMID:25742012

  19. Enabling big geoscience data analytics with a cloud-based, MapReduce-enabled and service-oriented workflow framework.

    PubMed

    Li, Zhenlong; Yang, Chaowei; Jin, Baoxuan; Yu, Manzhu; Liu, Kai; Sun, Min; Zhan, Matthew

    2015-01-01

    Geoscience observations and model simulations are generating vast amounts of multi-dimensional data. Effectively analyzing these data is essential for geoscience studies. However, the tasks are challenging for geoscientists because processing the massive amount of data is both computing- and data-intensive, in that data analytics requires complex procedures and multiple tools. To tackle these challenges, a scientific workflow framework is proposed for big geoscience data analytics. In this framework, techniques are proposed that leverage cloud computing, MapReduce, and Service Oriented Architecture (SOA). Specifically, HBase is adopted for storing and managing big geoscience data across distributed computers. A MapReduce-based algorithm framework is developed to support parallel processing of geoscience data, and a service-oriented workflow architecture is built to support on-demand complex data analytics in the cloud environment. A proof-of-concept prototype tests the performance of the framework. Results show that this innovative framework significantly improves the efficiency of big geoscience data analytics by reducing the data processing time as well as simplifying data analytical procedures for geoscientists.
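
    The map/reduce decomposition used by frameworks like the one described above can be illustrated in plain Python: records are mapped to key/value pairs, grouped by key, and reduced per key. The records and keys below are invented for the example; the real framework runs this pattern on Hadoop/HBase rather than in a single process.

        # Plain-Python illustration of the MapReduce pattern (map to keys, reduce per key);
        # the real framework distributes this over Hadoop/HBase.
        from collections import defaultdict

        # Hypothetical records: (region, variable, value) extracted from gridded files.
        records = [
            ("pacific", "sst", 18.2), ("pacific", "sst", 18.9),
            ("atlantic", "sst", 18.2), ("atlantic", "sst", 21.0),
        ]

        def map_phase(record):
            region, variable, value = record
            yield (region, variable), (value, 1)          # key -> partial (sum, count)

        def reduce_phase(key, values):
            total = sum(v for v, _ in values)
            count = sum(c for _, c in values)
            return key, total / count                     # per-key mean

        grouped = defaultdict(list)
        for record in records:
            for key, value in map_phase(record):
                grouped[key].append(value)

        for key, mean in (reduce_phase(k, vs) for k, vs in grouped.items()):
            print(key, round(mean, 2))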

  20. PATHA: Performance Analysis Tool for HPC Applications

    DOE PAGES

    Yoo, Wucherl; Koo, Michelle; Cao, Yi; ...

    2016-02-18

    Large science projects rely on complex workflows to analyze terabytes or petabytes of data. These jobs often run on thousands of CPU cores and simultaneously perform data accesses, data movements, and computation. It is difficult to identify bottlenecks or to debug the performance issues in these large workflows. In order to address these challenges, we have developed the Performance Analysis Tool for HPC Applications (PATHA) using state-of-the-art open source big data processing tools. Our framework can ingest system logs to extract key performance measures, and apply sophisticated statistical tools and data mining methods to the performance data. Furthermore, it utilizes an efficient data processing engine to allow users to interactively analyze a large amount of different types of logs and measurements. To illustrate the functionality of PATHA, we conduct a case study on the workflows from an astronomy project known as the Palomar Transient Factory (PTF). This study processed 1.6 TB of system logs collected on the NERSC supercomputer Edison. Using PATHA, we were able to identify performance bottlenecks, which reside in three tasks of the PTF workflow and depend on the density of celestial objects.
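
    To make the log-analysis idea concrete, the following sketch shows the general pattern of extracting task durations from parsed logs and ranking tasks by where the time goes, using pandas. The log fields and task names are hypothetical; PATHA's actual processing engine and log schema are not reproduced here.

      # Illustrative only: PATHA's engine and log schema are not reproduced here.
      # This shows the general pattern of deriving task durations from parsed logs
      # and ranking tasks by where the time goes.
      import pandas as pd

      # Hypothetical parsed log records (task name, start/end timestamps in seconds).
      logs = pd.DataFrame({
          "task":  ["ingest", "ingest", "subtract", "subtract", "catalog"],
          "start": [0.0, 1.0, 10.0, 12.0, 40.0],
          "end":   [9.0, 8.5, 39.0, 41.0, 45.0],
      })
      logs["duration"] = logs["end"] - logs["start"]

      # Rank tasks by total and mean duration to spot likely bottlenecks.
      summary = (logs.groupby("task")["duration"]
                     .agg(total="sum", mean="mean", runs="count")
                     .sort_values("total", ascending=False))
      print(summary)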

  1. Workflow based framework for life science informatics.

    PubMed

    Tiwari, Abhishek; Sekhar, Arvind K T

    2007-10-01

    Workflow technology is a generic mechanism to integrate diverse types of available resources (databases, servers, software applications and different services), and it facilitates knowledge exchange across traditionally divergent fields such as molecular biology, clinical research, computational science, physics, chemistry and statistics. Researchers can easily incorporate and access diverse, distributed tools and data to develop their own research protocols for scientific analysis. Application of workflow technology has been reported in areas like drug discovery, genomics, large-scale gene expression analysis, proteomics, and systems biology. In this article, we discuss existing workflow systems and trends in the application of workflow-based systems.

  2. Galaxy-M: a Galaxy workflow for processing and analyzing direct infusion and liquid chromatography mass spectrometry-based metabolomics data.

    PubMed

    Davidson, Robert L; Weber, Ralf J M; Liu, Haoyu; Sharma-Oates, Archana; Viant, Mark R

    2016-01-01

    Metabolomics is increasingly recognized as an invaluable tool in the biological, medical and environmental sciences yet lags behind the methodological maturity of other omics fields. To achieve its full potential, including the integration of multiple omics modalities, the accessibility, standardization and reproducibility of computational metabolomics tools must be improved significantly. Here we present our end-to-end mass spectrometry metabolomics workflow in the widely used platform, Galaxy. Named Galaxy-M, our workflow has been developed for both direct infusion mass spectrometry (DIMS) and liquid chromatography mass spectrometry (LC-MS) metabolomics. The range of tools presented spans from processing of raw data, e.g. peak picking and alignment, through data cleansing, e.g. missing value imputation, to preparation for statistical analysis, e.g. normalization and scaling, and principal components analysis (PCA) with associated statistical evaluation. We demonstrate the ease of using these Galaxy workflows via the analysis of DIMS and LC-MS datasets, and provide PCA scores and associated statistics to help other users to ensure that they can accurately repeat the processing and analysis of these two datasets. Galaxy and data are all provided pre-installed in a virtual machine (VM) that can be downloaded from the GigaDB repository. Additionally, source code, executables and installation instructions are available from GitHub. The Galaxy platform has enabled us to produce an easily accessible and reproducible computational metabolomics workflow. More tools could be added by the community to expand its functionality. We recommend that Galaxy-M workflow files are included within the supplementary information of publications, enabling metabolomics studies to achieve greater reproducibility.
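
    The later stages of such a pipeline (missing-value imputation, scaling, and PCA) can be sketched in a few lines with scikit-learn, as below. This is a generic stand-in for those steps, not the Galaxy-M tool code, and the peak-intensity matrix is invented for illustration.

      # A compact stand-in for the later stages Galaxy-M wraps (missing-value
      # imputation, scaling, PCA); it is not the Galaxy-M code itself.
      import numpy as np
      from sklearn.impute import SimpleImputer
      from sklearn.preprocessing import StandardScaler
      from sklearn.decomposition import PCA
      from sklearn.pipeline import make_pipeline

      # Hypothetical peak-intensity matrix: rows = samples, columns = m/z features.
      X = np.array([[1.0, 2.0, np.nan],
                    [1.2, 1.9, 0.5],
                    [0.9, np.nan, 0.4],
                    [1.1, 2.1, 0.6]])

      pipeline = make_pipeline(
          SimpleImputer(strategy="median"),   # missing value imputation
          StandardScaler(),                   # scaling
          PCA(n_components=2),                # principal components analysis
      )
      scores = pipeline.fit_transform(X)
      print(scores)                           # PCA scores for each sample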

  3. Nexus: a modular workflow management system for quantum simulation codes

    DOE PAGES

    Krogel, Jaron T.

    2015-08-24

    The management of simulation workflows is a significant task for the individual computational researcher. Automation of the required tasks involved in simulation work can decrease the overall time to solution and reduce sources of human error. A new simulation workflow management system, Nexus, is presented to address these issues. Nexus is capable of automated job management on workstations and resources at several major supercomputing centers. Its modular design allows many quantum simulation codes to be supported within the same framework. Current support includes quantum Monte Carlo calculations with QMCPACK, density functional theory calculations with Quantum Espresso or VASP, and quantum chemical calculations with GAMESS. Users can compose workflows through a transparent, text-based interface, resembling the input file of a typical simulation code. A usage example is provided to illustrate the process.
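
    The following sketch hints at what such a short, input-file-like workflow script can look like; the function names and fields are hypothetical and are not Nexus's actual API, only an illustration of composing a simulation chain as plain text and letting a manager resolve job dependencies.

      # Hypothetical sketch only -- these names are not Nexus's real API. It
      # illustrates the style of composing a simulation chain in a short,
      # input-file-like Python script and letting the manager resolve dependencies.
      def simulation(code, inputs, depends_on=()):
          return {"code": code, "inputs": inputs, "depends_on": list(depends_on)}

      relax = simulation("dft_relax", {"structure": "diamond.xyz", "ecut": 200})
      scf   = simulation("dft_scf",   {"kgrid": (4, 4, 4)}, depends_on=[relax])
      qmc   = simulation("qmc",       {"walkers": 1024},    depends_on=[scf])

      def run_project(*sims):
          """Submit each simulation once its dependencies have completed (stubbed)."""
          submitted = []
          def submit(sim):
              for dep in sim["depends_on"]:
                  submit(dep)
              if sim not in submitted:
                  submitted.append(sim)
                  print("submitting", sim["code"], "with", sim["inputs"])
          for sim in sims:
              submit(sim)

      run_project(relax, scf, qmc)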

  4. Successful Completion of FY18/Q1 ASC L2 Milestone 6355: Electrical Analysis Calibration Workflow Capability Demonstration.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Copps, Kevin D.

    The Sandia Analysis Workbench (SAW) project has developed and deployed a production capability for SIERRA computational mechanics analysis workflows. However, the electrical analysis workflow capability requirements have only been demonstrated in early prototype states, with no real capability deployed for analysts’ use. This milestone aims to improve the electrical analysis workflow capability (via SAW and related tools) and deploy it for ongoing use. We propose to focus on a QASPR electrical analysis calibration workflow use case. We will include a number of new capabilities (versus today’s SAW), such as: 1) support for the XYCE code workflow component, 2) data management coupled to electrical workflow, 3) human-in-the-loop workflow capability, and 4) electrical analysis workflow capability deployed on the restricted (and possibly classified) network at Sandia. While far from the complete set of capabilities required for electrical analysis workflow over the long term, this is a substantial first step toward full production support for the electrical analysts.

  5. A Portable Regional Weather and Climate Downscaling System Using GEOS-5, LIS-6, WRF, and the NASA Workflow Tool

    NASA Astrophysics Data System (ADS)

    Kemp, E. M.; Putman, W. M.; Gurganus, J.; Burns, R. W.; Damon, M. R.; McConaughy, G. R.; Seablom, M. S.; Wojcik, G. S.

    2009-12-01

    We present a regional downscaling system (RDS) suitable for high-resolution weather and climate simulations in multiple supercomputing environments. The RDS is built on the NASA Workflow Tool, a software framework for configuring, running, and managing computer models on multiple platforms with a graphical user interface. The Workflow Tool is used to run the NASA Goddard Earth Observing System Model Version 5 (GEOS-5), a global atmospheric-ocean model for weather and climate simulations down to 1/4 degree resolution; the NASA Land Information System Version 6 (LIS-6), a land surface modeling system that can simulate soil temperature and moisture profiles; and the Weather Research and Forecasting (WRF) community model, a limited-area atmospheric model for weather and climate simulations down to 1-km resolution. The Workflow Tool allows users to customize model settings to their needs; saves and organizes simulation experiments; distributes model runs across different computer clusters (e.g., the DISCOVER cluster at Goddard Space Flight Center, the Cray CX-1 Desktop Supercomputer, etc.); and handles all file transfers and network communications (e.g., scp connections). Overall, the RDS is intended to aid researchers by making simulations as easy as possible to generate on the computer resources available. Initial conditions for LIS-6 and GEOS-5 are provided by Modern Era Retrospective-Analysis for Research and Applications (MERRA) reanalysis data stored on DISCOVER. The LIS-6 is first run for 2-4 years forced by MERRA atmospheric analyses, generating initial conditions for the WRF soil physics. GEOS-5 is then initialized from MERRA data and run for the period of interest. Large-scale atmospheric data, sea-surface temperatures, and sea ice coverage from GEOS-5 are used as boundary conditions for WRF, which is run for the same period of interest. Multiply nested grids are used for both LIS-6 and WRF, with the innermost grid run at a resolution sufficient for typical local weather features (terrain, convection, etc.). All model runs, restarts, and file transfers are coordinated by the Workflow Tool. Two use cases are being pursued. First, the RDS generates regional climate simulations down to 4-km for the Chesapeake Bay region, with WRF output provided as input to more specialized models (e.g., ocean/lake, hydrological, marine biology, and air pollution). This will allow assessment of climate impact on local interests (e.g., changes in Bay water levels and temperatures, inundation, fish kills, etc.). Second, the RDS generates high-resolution hurricane simulations in the tropical North Atlantic. This use case will support Observing System Simulation Experiments (OSSEs) of dynamically-targeted lidar observations as part of the NASA Sensor Web Simulator project. Sample results will be presented at the AGU Fall Meeting.

  6. Usability Testing of a National Substance Use Screening Tool Embedded in Electronic Health Records.

    PubMed

    Press, Anne; DeStio, Catherine; McCullagh, Lauren; Kapoor, Sandeep; Morley, Jeanne; Conigliaro, Joseph

    2016-07-08

    Screening, brief intervention, and referral to treatment (SBIRT) is currently being implemented in health systems nationally via paper and electronic methods. The purpose of this study was to evaluate the integration of an electronic SBIRT tool into an existing paper-based SBIRT clinical workflow in a patient-centered medical home. Usability testing was conducted in an academic ambulatory clinic. Two rounds of usability testing were done with medical office assistants (MOAs) using a paper and electronic version of the SBIRT tool, with two and four participants, respectively. Qualitative and quantitative data were analyzed to determine the impact of both tools on clinical workflow. A second round of usability testing was done with the revised electronic version and compared with the first version. Personal workflow barriers cited in the first round of testing were that the electronic health record (EHR) tool was disruptive to patients' visits. In Round 2 of testing, MOAs reported favoring the electronic version due to improved layout and the inclusion of an alert system embedded in the EHR. For example, using the system usability scale (SUS), MOAs reported a grade "1" for the statement, "I would like to use this system frequently" during the first round of testing but a "5" during the second round of analysis. The findings of this study highlight the importance of testing the usability of the various media through which health care screening tools are delivered. In the first round of testing, the electronic tool was reported as less user-friendly, being difficult to navigate, and time-consuming. Many issues faced in the first generation of the tool were improved in the second generation after usability was evaluated. This study demonstrates how usability testing of an electronic SBIRT tool can help to identify challenges that can impact clinical workflow. However, a limitation of this study was the small sample size of MOAs that participated. The results may have been biased toward Northwell Health workers' perceptions of the SBIRT tool and their specific clinical workflow.

  7. Using the iPlant collaborative discovery environment.

    PubMed

    Oliver, Shannon L; Lenards, Andrew J; Barthelson, Roger A; Merchant, Nirav; McKay, Sheldon J

    2013-06-01

    The iPlant Collaborative is an academic consortium whose mission is to develop an informatics and social infrastructure to address the "grand challenges" in plant biology. Its cyberinfrastructure supports the computational needs of the research community and facilitates solving major challenges in plant science. The Discovery Environment provides a powerful and rich graphical interface to the iPlant Collaborative cyberinfrastructure by creating an accessible virtual workbench that enables all levels of expertise, ranging from students to traditional biology researchers and computational experts, to explore, analyze, and share their data. By providing access to iPlant's robust data-management system and high-performance computing resources, the Discovery Environment also creates a unified space in which researchers can access scalable tools. Researchers can use available Applications (Apps) to execute analyses on their data, as well as customize or integrate their own tools to better meet the specific needs of their research. These Apps can also be used in workflows that automate more complicated analyses. This module describes how to use the main features of the Discovery Environment, using bioinformatics workflows for high-throughput sequence data as examples. © 2013 by John Wiley & Sons, Inc.

  8. Organizational and technological insight as important factors for successful implementation of IT.

    PubMed

    Nikula, R E

    1999-01-01

    Politicians and hospital management in Sweden and Denmark focus on IT, and especially the Electronic Patient Record (EPR), as a tool for changes that will lead to better economy as well as better quality and service to patients. These changes are not direct effects of the new medium for patient records but indirect effects of the possibilities embedded in the new technology. To ensure that an implementation is successful, i.e. leads to changes in organizational structure and workflow, we need tools to prepare clinicians and management. The focus of this paper is the individual's insight into technology and organization, and it proposes a model to assess and categorize the capacity of individuals and groups to participate in and strengthen an implementation process.

  9. MassCascade: Visual Programming for LC-MS Data Processing in Metabolomics.

    PubMed

    Beisken, Stephan; Earll, Mark; Portwood, David; Seymour, Mark; Steinbeck, Christoph

    2014-04-01

    Liquid chromatography coupled to mass spectrometry (LC-MS) is commonly applied to investigate the small molecule complement of organisms. Several software tools are typically joined in custom pipelines to semi-automatically process and analyse the resulting data. General workflow environments like the Konstanz Information Miner (KNIME) offer the potential of an all-in-one solution to process LC-MS data by allowing easy integration of different tools and scripts. We describe MassCascade and its workflow plug-in for processing LC-MS data. The Java library integrates frequently used algorithms in a modular fashion, thus enabling it to serve as back-end for graphical front-ends. The functions available in MassCascade have been encapsulated in a plug-in for the workflow environment KNIME, allowing combined use with e.g. statistical workflow nodes from other providers and making the tool intuitive to use without knowledge of programming. The design of the software guarantees a high level of modularity where processing functions can be quickly replaced or concatenated. MassCascade is an open-source library for LC-MS data processing in metabolomics. It embraces the concept of visual programming through its KNIME plug-in, simplifying the process of building complex workflows. The library was validated using open data.

  10. A free tool integrating GIS features and workflows to evaluate sediment connectivity in alpine catchments

    NASA Astrophysics Data System (ADS)

    Crema, Stefano; Schenato, Luca; Goldin, Beatrice; Marchi, Lorenzo; Cavalli, Marco

    2014-05-01

    The increased interest in sediment connectivity has led the geomorphological community to focus on sediment fluxes as a key process (Cavalli et al., 2013; Heckmann and Schwanghart, 2013). The challenge of dealing with erosion-related processes in alpine catchments is of primary relevance for different fields of investigation and application, including, but not limited to, natural hazards, hydraulic structures design, ecology and stream restoration. The present work focuses on the development of a free tool for sediment connectivity assessment as described in Cavalli et al. (2013), introducing some novel improvements. The choice of free software is motivated by the need to widen access and improve participation beyond the restrictions on algorithm customization typical of commercial software. Two features further enhance the tool. First, being completely free and adopting a user-friendly interface, its target audience includes both researchers and stakeholders (e.g., local managers and civil protection authorities in charge of planning the priorities of intervention in the territory). Second, being written in the Python programming language, it can benefit from optimized algorithms for handling high-resolution DEMs (Digital Elevation Models) and for implementing propagation workflows; these two factors make the tool computationally competitive with the most recent commercial GIS products. The overall goal of this tool is to support the analysis of sediment connectivity while facing the challenge of widening, as much as possible, the users' community among scientists and stakeholders. This aspect is crucial, as future improvement of this tool will benefit from user feedback aimed at improving the quantitative assessment of sediment connectivity as a major input for the optimal management of mountain areas. References: Cavalli, M., Trevisani, S., Comiti, F., Marchi, L., 2013. Geomorphometric assessment of spatial sediment connectivity in small Alpine catchments. Geomorphology 188, 31-41. Heckmann, T., Schwanghart, W., 2013. Geomorphic coupling and sediment connectivity in an alpine catchment - Exploring sediment cascades using graph theory. Geomorphology 182, 89-103.
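
    As a toy illustration of the kind of raster algebra such a tool performs, the numpy sketch below assumes the index of connectivity takes the log-ratio form IC = log10(D_up/D_dn) described by Cavalli et al. (2013); the upslope and downslope terms are supplied as small precomputed rasters rather than derived by full flow routing, and the values are invented.

      # Toy numpy sketch of the raster algebra involved. It assumes the connectivity
      # index has the log-ratio form IC = log10(D_up/D_dn) of Cavalli et al. (2013);
      # the component rasters are given here, not derived by flow routing.
      import numpy as np

      weight = np.array([[0.8, 0.6], [0.9, 0.7]])              # roughness-based weighting factor
      slope  = np.array([[0.20, 0.35], [0.15, 0.40]])
      area   = np.array([[1.0e4, 2.5e4], [5.0e3, 4.0e4]])      # upslope area, m^2
      d_down = np.array([[120.0, 80.0], [200.0, 60.0]])        # downslope path term

      d_up = weight * slope * np.sqrt(area)                    # upslope component
      ic = np.log10(d_up / d_down)                             # index of connectivity
      print(np.round(ic, 2))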

  11. Realising the Uncertainty Enabled Model Web

    NASA Astrophysics Data System (ADS)

    Cornford, D.; Bastin, L.; Pebesma, E. J.; Williams, M.; Stasch, C.; Jones, R.; Gerharz, L.

    2012-12-01

    The FP7 funded UncertWeb project aims to create the "uncertainty enabled model web". The central concept here is that geospatial models and data resources are exposed via standard web service interfaces, such as the Open Geospatial Consortium (OGC) suite of encodings and interface standards, allowing the creation of complex workflows combining both data and models. The focus of UncertWeb is on the issue of managing uncertainty in such workflows, and providing the standards, architecture, tools and software support necessary to realise the "uncertainty enabled model web". In this paper we summarise the developments in the first two years of UncertWeb, illustrating several key points with examples taken from the use case requirements that motivate the project. Firstly we address the issue of encoding specifications. We explain the usage of UncertML 2.0, a flexible encoding for representing uncertainty based on a probabilistic approach. This is designed to be used within existing standards such as Observations and Measurements (O&M) and data quality elements of ISO19115 / 19139 (geographic information metadata and encoding specifications) as well as more broadly outside the OGC domain. We show profiles of O&M that have been developed within UncertWeb and how UncertML 2.0 is used within these. We also show encodings based on NetCDF and discuss possible future directions for encodings in JSON. We then discuss the issues of workflow construction, considering discovery of resources (both data and models). We discuss why a brokering approach to service composition is necessary in a world where the web service interfaces remain relatively heterogeneous, including many non-OGC approaches, in particular the more mainstream SOAP and WSDL approaches. We discuss the trade-offs between delegating uncertainty management functions to the service interfaces themselves and integrating the functions in the workflow management system. We describe two utility services to address conversion between uncertainty types, and between the spatial / temporal support of service inputs / outputs. Finally we describe the tools being generated within the UncertWeb project, considering three main aspects: i) Elicitation of uncertainties on model inputs. We are developing tools to enable domain experts to provide judgements about input uncertainties from UncertWeb model components (e.g. parameters in meteorological models) which allow panels of experts to engage in the process and reach a consensus view on the current knowledge / beliefs about that parameter or variable. We are developing systems for continuous and categorical variables as well as stationary spatial fields. ii) Visualisation of the resulting uncertain outputs from the end of the workflow, but also at intermediate steps. At this point we have prototype implementations driven by the requirements from the use cases that motivate UncertWeb. iii) Sensitivity and uncertainty analysis on model outputs. Here we show the design of the overall system we are developing, including the deployment of an emulator framework to allow computationally efficient approaches. We conclude with a summary of the open issues and remaining challenges we are facing in UncertWeb, and provide a brief overview of how we plan to tackle these.

  12. Provenance-Powered Automatic Workflow Generation and Composition

    NASA Astrophysics Data System (ADS)

    Zhang, J.; Lee, S.; Pan, L.; Lee, T. J.

    2015-12-01

    In recent years, scientists have learned how to codify tools into reusable software modules that can be chained into multi-step executable workflows. Existing scientific workflow tools, created by computer scientists, require domain scientists to meticulously design their multi-step experiments before analyzing data. However, this often contradicts a domain scientist's daily routine of conducting research and exploration. We hope to resolve this tension. Imagine this: An Earth scientist starts her day applying climate data processing algorithms published by the NASA Jet Propulsion Laboratory (JPL) to ARGO deep ocean temperature and AMSRE sea surface temperature datasets. Throughout the day, she tunes the algorithm parameters to study various aspects of the data. Suddenly, she notices some interesting results. She then turns to a computer scientist and asks, "can you reproduce my results?" By tracking and reverse engineering her activities, the computer scientist creates a workflow. The Earth scientist can now rerun the workflow to validate her findings, modify the workflow to discover further variations, or publish the workflow to share the knowledge. In this way, we aim to revolutionize computer-supported Earth science. We have developed a prototyping system to realize the aforementioned vision, in the context of service-oriented science. We have studied how Earth scientists conduct service-oriented data analytics research in their daily work, developed a provenance model to record their activities, and developed a technology to automatically generate workflows from user behavior and to support the adaptability and reuse of these workflows for replicating and improving scientific studies. A data-centric repository infrastructure is established to capture richer provenance to further facilitate collaboration in the science community. We have also established a Petri nets-based verification instrument for provenance-based automatic workflow generation and recommendation.

  13. Modeling workflow to design machine translation applications for public health practice

    PubMed Central

    Turner, Anne M.; Brownstein, Megumu K.; Cole, Kate; Karasz, Hilary; Kirchhoff, Katrin

    2014-01-01

    Objective Provide a detailed understanding of the information workflow processes related to translating health promotion materials for limited English proficiency individuals in order to inform the design of context-driven machine translation (MT) tools for public health (PH). Materials and Methods We applied a cognitive work analysis framework to investigate the translation information workflow processes of two large health departments in Washington State. Researchers conducted interviews, performed a task analysis, and validated results with PH professionals to model translation workflow and identify functional requirements for a translation system for PH. Results The study resulted in a detailed description of work related to translation of PH materials, an information workflow diagram, and a description of attitudes towards MT technology. We identified a number of themes that hold design implications for incorporating MT in PH translation practice. A PH translation tool prototype was designed based on these findings. Discussion This study underscores the importance of understanding the work context and information workflow for which systems will be designed. Based on themes and translation information workflow processes, we identified key design guidelines for incorporating MT into PH translation work. Primary amongst these is that MT should be followed by human review for translations to be of high quality and for the technology to be adopted into practice. Conclusion The time and costs of creating multilingual health promotion materials are barriers to translation. PH personnel were interested in MT's potential to improve access to low-cost translated PH materials, but expressed concerns about ensuring quality. We outline design considerations and a potential machine translation tool to best fit MT systems into PH practice. PMID:25445922

  14. Design of a decision-support architecture for management of remotely monitored patients.

    PubMed

    Basilakis, Jim; Lovell, Nigel H; Redmond, Stephen J; Celler, Branko G

    2010-09-01

    Telehealth is the provision of health services at a distance. Typically, this occurs in unsupervised or remote environments, such as a patient's home. We describe one such telehealth system and the integration of extracted clinical measurement parameters with a decision-support system (DSS). An enterprise application-server framework, combined with a rules engine and statistical analysis tools, is used to analyze the acquired telehealth data, searching for trends and shifts in parameter values, as well as identifying individual measurements that exceed predetermined or adaptive thresholds. An overarching business process engine is used to manage the core DSS knowledge base and coordinate workflow outputs of the DSS. The primary role for such a DSS is to provide an effective means to reduce the data overload and to provide a means of health risk stratification to allow appropriate targeting of clinical resources to best manage the health of the patient. In this way, the system may ultimately influence changes in workflow by targeting scarce clinical resources to patients of most need. A single case study extracted from an initial pilot trial of the system, in patients with chronic obstructive pulmonary disease and chronic heart failure, will be reviewed to illustrate the potential benefit of integrating telehealth and decision support in the management of both acute and chronic disease.
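
    The two kinds of checks mentioned above, single measurements beyond a threshold and sustained shifts in a parameter series, can be sketched generically as follows. This is not the system described in the abstract; the thresholds, window length, and readings are hypothetical.

      # Not the system described above -- just a minimal sketch of the two checks it
      # mentions: flagging single measurements beyond a threshold and detecting a
      # sustained shift (trend) in a parameter series.
      import statistics

      def exceeds_threshold(value, low, high):
          return value < low or value > high

      def shifted(series, window=5, k=2.0):
          """Flag a shift when the recent mean drifts k std-devs from the baseline."""
          if len(series) < 2 * window:
              return False
          baseline, recent = series[:-window], series[-window:]
          sd = statistics.stdev(baseline) or 1e-9
          return abs(statistics.mean(recent) - statistics.mean(baseline)) > k * sd

      spo2 = [97, 96, 97, 96, 97, 96, 93, 92, 91, 92, 90]     # hypothetical readings
      print(exceeds_threshold(spo2[-1], low=92, high=100))    # True: below threshold
      print(shifted(spo2))                                    # True: downward shift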

  15. Data Integration Tool: Permafrost Data Debugging

    NASA Astrophysics Data System (ADS)

    Wilcox, H.; Schaefer, K. M.; Jafarov, E. E.; Pulsifer, P. L.; Strawhacker, C.; Yarmey, L.; Basak, R.

    2017-12-01

    We developed a Data Integration Tool (DIT) to significantly reduce the manual processing time needed to translate inconsistent, scattered historical permafrost data into files ready to ingest directly into the Global Terrestrial Network-Permafrost (GTN-P). The United States National Science Foundation funded this project through the National Snow and Ice Data Center (NSIDC) with the GTN-P to improve permafrost data access and discovery. We leverage these data to support science research and policy decisions. DIT is a workflow manager that divides data preparation and analysis into a series of steps or operations called widgets (https://github.com/PermaData/DIT). Each widget does a specific operation, such as read, multiply by a constant, sort, plot, and write data. DIT allows the user to select and order the widgets as desired to meet their specific needs, incrementally interact with and evolve the widget workflows, and save those workflows for reproducibility. Taking ideas from visual programming found in the art and design domain, debugging and iterative design principles from software engineering, and the scientific data processing and analysis power of Fortran and Python, DIT was written for interactive, iterative data manipulation, quality control, processing, and analysis of inconsistent data in an easily installable application. DIT was used to completely translate one dataset (133 sites) that was successfully added to GTN-P and to nearly translate three more datasets (270 sites), and it is scheduled to translate 10 more datasets (~1000 sites) from the legacy inactive site data holdings of the Frozen Ground Data Center (FGDC). Iterative development has provided the permafrost and wider scientific community with an extendable tool designed specifically for the iterative process of translating unruly data.
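
    In the spirit of the widget idea (but not DIT's actual code), the sketch below shows how single-operation widgets can be chained by a workflow that is itself just data, and therefore savable and re-runnable; the file names and values are hypothetical.

      # A small sketch in the spirit of DIT's widget idea (not its actual code):
      # each widget is one operation, and the workflow is a saved list of widgets.
      def read_widget(path):
          # Stand-in for a real reader; a real widget would parse the file at `path`.
          return [31.2, 29.8, 30.5, 28.9]

      def multiply_widget(values, constant):
          return [v * constant for v in values]

      def sort_widget(values):
          return sorted(values)

      def write_widget(values, path):
          with open(path, "w") as f:
              f.writelines(f"{v}\n" for v in values)

      widgets = {"read": read_widget, "multiply": multiply_widget,
                 "sort": sort_widget, "write": write_widget}

      # The workflow itself is just data, so it can be saved and re-run later
      # for reproducibility (hypothetical file names).
      workflow = [
          ("read",     {"path": "ground_temps_raw.txt"}),
          ("multiply", {"constant": 0.1}),      # e.g. a unit conversion
          ("sort",     {}),
          ("write",    {"path": "ground_temps_clean.txt"}),
      ]

      data = None
      for name, kwargs in workflow:
          step = widgets[name]
          data = step(**kwargs) if data is None else step(data, **kwargs)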

  16. Computational Tools for Metabolic Engineering

    PubMed Central

    Copeland, Wilbert B.; Bartley, Bryan A.; Chandran, Deepak; Galdzicki, Michal; Kim, Kyung H.; Sleight, Sean C.; Maranas, Costas D.; Sauro, Herbert M.

    2012-01-01

    A great variety of software applications are now employed in the metabolic engineering field. These applications have been created to support a wide range of experimental and analysis techniques. Computational tools are utilized throughout the metabolic engineering workflow to extract and interpret relevant information from large data sets, to present complex models in a more manageable form, and to propose efficient network design strategies. In this review, we present a number of tools that can assist in modifying and understanding cellular metabolic networks. The review covers seven areas of relevance to metabolic engineers. These include metabolic reconstruction efforts, network visualization, nucleic acid and protein engineering, metabolic flux analysis, pathway prospecting, post-structural network analysis and culture optimization. The list of available tools is extensive and we can only highlight a small, representative portion of the tools from each area. PMID:22629572

  17. CyREST: Turbocharging Cytoscape Access for External Tools via a RESTful API.

    PubMed

    Ono, Keiichiro; Muetze, Tanja; Kolishovski, Georgi; Shannon, Paul; Demchak, Barry

    2015-01-01

    As bioinformatic workflows become increasingly complex and involve multiple specialized tools, so does the difficulty of reliably reproducing those workflows. Cytoscape is a critical workflow component for executing network visualization, analysis, and publishing tasks, but it can be operated only manually via a point-and-click user interface. Consequently, Cytoscape-oriented tasks are laborious and often error prone, especially with multistep protocols involving many networks. In this paper, we present the new cyREST Cytoscape app and accompanying harmonization libraries. Together, they improve workflow reproducibility and researcher productivity by enabling popular languages (e.g., Python and R, JavaScript, and C#) and tools (e.g., IPython/Jupyter Notebook and RStudio) to directly define and query networks, and perform network analysis, layouts and renderings. We describe cyREST's API and overall construction, and present Python- and R-based examples that illustrate how Cytoscape can be integrated into large scale data analysis pipelines. cyREST is available in the Cytoscape app store (http://apps.cytoscape.org) where it has been downloaded over 1900 times since its release in late 2014.
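
    A minimal sketch of driving Cytoscape from Python over a REST interface is shown below, assuming a local Cytoscape instance with cyREST listening on its default port and v1-style endpoint paths; the response field name is an assumption, so the cyREST documentation remains the authoritative reference.

      # Minimal sketch, assuming a local Cytoscape instance with cyREST on its
      # default port; endpoint paths follow the v1 convention, and the response
      # field name is assumed -- consult the cyREST docs for the real API.
      import requests

      BASE = "http://localhost:1234/v1"

      # A tiny network in Cytoscape's cyjs-style JSON.
      network = {
          "data": {"name": "demo"},
          "elements": {
              "nodes": [{"data": {"id": "a"}}, {"data": {"id": "b"}}],
              "edges": [{"data": {"source": "a", "target": "b"}}],
          },
      }

      resp = requests.post(f"{BASE}/networks", json=network)
      resp.raise_for_status()
      suid = resp.json()["networkSUID"]                # assumed response field
      print("created network", suid)
      print(requests.get(f"{BASE}/networks").json())   # list known network SUIDs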

  18. CyREST: Turbocharging Cytoscape Access for External Tools via a RESTful API

    PubMed Central

    Ono, Keiichiro; Muetze, Tanja; Kolishovski, Georgi; Shannon, Paul; Demchak, Barry

    2015-01-01

    As bioinformatic workflows become increasingly complex and involve multiple specialized tools, so does the difficulty of reliably reproducing those workflows. Cytoscape is a critical workflow component for executing network visualization, analysis, and publishing tasks, but it can be operated only manually via a point-and-click user interface. Consequently, Cytoscape-oriented tasks are laborious and often error prone, especially with multistep protocols involving many networks. In this paper, we present the new cyREST Cytoscape app and accompanying harmonization libraries. Together, they improve workflow reproducibility and researcher productivity by enabling popular languages (e.g., Python and R, JavaScript, and C#) and tools (e.g., IPython/Jupyter Notebook and RStudio) to directly define and query networks, and perform network analysis, layouts and renderings. We describe cyREST’s API and overall construction, and present Python- and R-based examples that illustrate how Cytoscape can be integrated into large scale data analysis pipelines. cyREST is available in the Cytoscape app store (http://apps.cytoscape.org) where it has been downloaded over 1900 times since its release in late 2014. PMID:26672762

  19. Flexible workflow sharing and execution services for e-scientists

    NASA Astrophysics Data System (ADS)

    Kacsuk, Péter; Terstyanszky, Gábor; Kiss, Tamas; Sipos, Gergely

    2013-04-01

    The sequence of computational and data manipulation steps required to perform a specific scientific analysis is called a workflow. Workflows that orchestrate data- and/or compute-intensive applications on Distributed Computing Infrastructures (DCIs) recently became standard tools in e-science. At the same time, the broad and fragmented landscape of workflows and DCIs slows down the uptake of workflow-based work. The development, sharing, integration and execution of workflows is still a challenge for many scientists. The FP7 "Sharing Interoperable Workflow for Large-Scale Scientific Simulation on Available DCIs" (SHIWA) project significantly improved the situation, with a simulation platform that connects different workflow systems, different workflow languages, different DCIs and workflows into a single, interoperable unit. The SHIWA Simulation Platform is a service package, already used by various scientific communities, and used as a tool by the recently started ER-flow FP7 project to expand the use of workflows among European scientists. The presentation will introduce the SHIWA Simulation Platform and the services that ER-flow provides based on the platform to space and earth science researchers. The SHIWA Simulation Platform includes: 1. SHIWA Repository: A database where workflows and meta-data about workflows can be stored. The database is a central repository to discover and share workflows within and among communities. 2. SHIWA Portal: A web portal that is integrated with the SHIWA Repository and includes a workflow executor engine that can orchestrate various types of workflows on various grid and cloud platforms. 3. SHIWA Desktop: A desktop environment that provides access capabilities similar to those of the SHIWA Portal; however, it runs on the users' desktops/laptops instead of a portal server. 4. Workflow engines: the ASKALON, Galaxy, GWES, Kepler, LONI Pipeline, MOTEUR, Pegasus, P-GRADE, ProActive, Triana, Taverna and WS-PGRADE workflow engines are already integrated with the execution engine of the SHIWA Portal. Other engines can be added when required. Through the SHIWA Portal one can define and run simulations on the SHIWA Virtual Organisation, an e-infrastructure that gathers computing and data resources from various DCIs, including the European Grid Infrastructure. Via third-party workflow engines, the Portal supports the most widely used academic workflow engines, and it can be extended with other engines on demand. Such extensions translate between workflow languages and facilitate the nesting of workflows into larger workflows even when those are written in different languages and require different interpreters for execution. Through the workflow repository and the portal, individual scientists and scientific collaborations can share and offer workflows for reuse and execution. Given the integrated nature of the SHIWA Simulation Platform, the shared workflows can be executed online, without installing any special client environment or downloading workflows. The FP7 "Building a European Research Community through Interoperable Workflows and Data" (ER-flow) project disseminates the achievements of the SHIWA project and uses these achievements to build workflow user communities across Europe. ER-flow provides application support to research communities within and beyond the project consortium to develop, share and run workflows with the SHIWA Simulation Platform.

  20. Integrating Visualizations into Modeling NEST Simulations

    PubMed Central

    Nowke, Christian; Zielasko, Daniel; Weyers, Benjamin; Peyser, Alexander; Hentschel, Bernd; Kuhlen, Torsten W.

    2015-01-01

    Modeling large-scale spiking neural networks showing realistic biological behavior in their dynamics is a complex and tedious task. Since these networks consist of millions of interconnected neurons, their simulation produces an immense amount of data. In recent years it has become possible to simulate even larger networks. However, solutions to assist researchers in understanding the simulation's complex emergent behavior by means of visualization are still lacking. While developing tools to partially fill this gap, we encountered the challenge of integrating these tools easily into the neuroscientists' daily workflow. To understand what makes this so challenging, we looked into the workflows of our collaborators and analyzed how they use the visualizations to solve their daily problems. We identified two major issues: first, the analysis process can rapidly change focus, which requires switching the visualization tool that assists in the current problem domain. Second, because of the heterogeneous data that result from simulations, researchers want to relate different data modalities in order to investigate them effectively. Since a monolithic application model, processing and visualizing all data modalities and reflecting all combinations of possible workflows in a holistic way, is most likely impossible to develop and to maintain, a software architecture that offers specialized visualization tools that run simultaneously and can be linked together to reflect the current workflow is a more feasible approach. To this end, we have developed a software architecture that allows neuroscientists to integrate visualization tools more closely into the modeling tasks. In addition, it forms the basis for semantic linking of different visualizations to reflect the current workflow. In this paper, we present this architecture and substantiate the usefulness of our approach with common use cases we encountered in our collaborative work. PMID:26733860

  1. Integrating Clinical Trial Imaging Data Resources Using Service-Oriented Architecture and Grid Computing

    PubMed Central

    Cladé, Thierry; Snyder, Joshua C.

    2010-01-01

    Clinical trials which use imaging typically require data management and workflow integration across several parties. We identify opportunities for all parties involved to realize benefits with a modular interoperability model based on service-oriented architecture and grid computing principles. We discuss middleware products for implementation of this model, and propose caGrid as an ideal candidate due to its healthcare focus; free, open source license; and mature developer tools and support. PMID:20449775

  2. XML schemas for common bioinformatic data types and their application in workflow systems.

    PubMed

    Seibel, Philipp N; Krüger, Jan; Hartmeier, Sven; Schwarzer, Knut; Löwenthal, Kai; Mersch, Henning; Dandekar, Thomas; Giegerich, Robert

    2006-11-06

    Today, there is a growing need in bioinformatics to combine available software tools into chains, thus building complex applications from existing single-task tools. To create such workflows, the tools involved have to be able to work with each other's data--therefore, a common set of well-defined data formats is needed. Unfortunately, current bioinformatic tools use a great variety of heterogeneous formats. Acknowledging the need for common formats, the Helmholtz Open BioInformatics Technology network (HOBIT) identified several basic data types used in bioinformatics and developed appropriate format descriptions, formally defined by XML schemas, and incorporated them in a Java library (BioDOM). These schemas currently cover sequence, sequence alignment, RNA secondary structure and RNA secondary structure alignment formats in a form that is independent of any specific program, thus enabling seamless interoperation of different tools. All XML formats are available at http://bioschemas.sourceforge.net, the BioDOM library can be obtained at http://biodom.sourceforge.net. The HOBIT XML schemas and the BioDOM library simplify adding XML support to newly created and existing bioinformatic tools, enabling these tools to interoperate seamlessly in workflow scenarios.
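
    The pattern of exchanging tool output as schema-conformant XML can be sketched as follows; the element names are placeholders rather than the actual HOBIT schema vocabulary, and the example only illustrates one tool emitting a document that another tool parses back.

      # Illustrative only: the element names below are placeholders, not the actual
      # HOBIT schema vocabulary. The point is the pattern of emitting tool output as
      # schema-conformant XML so downstream workflow steps can consume it.
      import xml.etree.ElementTree as ET

      root = ET.Element("sequenceSet")                      # hypothetical root element
      seq = ET.SubElement(root, "sequence", id="example_1")
      ET.SubElement(seq, "alphabet").text = "dna"
      ET.SubElement(seq, "residues").text = "ACGTACGTTAGC"

      xml_bytes = ET.tostring(root, encoding="utf-8")
      print(xml_bytes.decode())

      # A consuming tool parses the same document back into its own data structures.
      parsed = ET.fromstring(xml_bytes)
      for s in parsed.findall("sequence"):
          print(s.get("id"), s.findtext("residues"))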

  3. Enabling Efficient Climate Science Workflows in High Performance Computing Environments

    NASA Astrophysics Data System (ADS)

    Krishnan, H.; Byna, S.; Wehner, M. F.; Gu, J.; O'Brien, T. A.; Loring, B.; Stone, D. A.; Collins, W.; Prabhat, M.; Liu, Y.; Johnson, J. N.; Paciorek, C. J.

    2015-12-01

    A typical climate science workflow often involves a combination of acquisition of data, modeling, simulation, analysis, visualization, publishing, and storage of results. Each of these tasks presents a myriad of challenges when running in a high-performance computing environment such as Hopper or Edison at NERSC. Hurdles such as data transfer and management, job scheduling, parallel analysis routines, and publication require a lot of forethought and planning to ensure that proper quality control mechanisms are in place. These steps require effectively utilizing a combination of well-tested and newly developed functionality to move data, perform analysis, apply statistical routines, and finally, serve results and tools to the greater scientific community. As part of the CAlibrated and Systematic Characterization, Attribution and Detection of Extremes (CASCADE) project we highlight a stack of tools our team uses and has developed to make large-scale simulation and analysis work commonplace, with operations that assist in everything from generation and procurement of data (HTAR/Globus) to automated publication of results to portals like the Earth System Grid Federation (ESGF), all while executing everything in between in a scalable, task-parallel fashion (MPI). We highlight the use and benefit of these tools by showing several climate science analysis use cases they have been applied to.
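
    The task-parallel execution mentioned above (MPI) can be sketched with mpi4py as below: each rank takes a share of independent analysis tasks, and results are gathered on rank 0. The file names and the per-file analysis are placeholders, not the CASCADE tools themselves.

      # A small mpi4py sketch of the task-parallel pattern mentioned above: each MPI
      # rank takes a share of the analysis tasks (placeholder file names) and
      # processes them independently. Run with e.g. `mpirun -n 4 python analyze.py`.
      from mpi4py import MPI

      comm = MPI.COMM_WORLD
      rank = comm.Get_rank()
      size = comm.Get_size()

      # Hypothetical list of per-year model output files to analyze.
      tasks = [f"tas_year_{year}.nc" for year in range(1990, 2010)]

      def analyze(path):
          # Placeholder for the real per-file analysis (statistics, extremes, ...).
          return f"rank {rank} processed {path}"

      # Static round-robin decomposition of tasks across ranks.
      results = [analyze(t) for i, t in enumerate(tasks) if i % size == rank]

      # Gather everything on rank 0, e.g. for publication to a portal.
      all_results = comm.gather(results, root=0)
      if rank == 0:
          for chunk in all_results:
              print("\n".join(chunk))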

  4. Workflow computing. Improving management and efficiency of pathology diagnostic services.

    PubMed

    Buffone, G J; Moreau, D; Beck, J R

    1996-04-01

    Traditionally, information technology in health care has helped practitioners to collect, store, and present information and also to add a degree of automation to simple tasks (instrument interfaces supporting result entry, for example). Thus commercially available information systems do little to support the need to model, execute, monitor, coordinate, and revise the various complex clinical processes required to support health-care delivery. Workflow computing, which is already implemented and improving the efficiency of operations in several nonmedical industries, can address the need to manage complex clinical processes. Workflow computing not only provides a means to define and manage the events, roles, and information integral to health-care delivery but also supports the explicit implementation of policy or rules appropriate to the process. This article explains how workflow computing may be applied to health-care and the inherent advantages of the technology, and it defines workflow system requirements for use in health-care delivery with special reference to diagnostic pathology.

  5. ImTK: an open source multi-center information management toolkit

    NASA Astrophysics Data System (ADS)

    Alaoui, Adil; Ingeholm, Mary Lou; Padh, Shilpa; Dorobantu, Mihai; Desai, Mihir; Cleary, Kevin; Mun, Seong K.

    2008-03-01

    The Information Management Toolkit (ImTK) Consortium is an open source initiative to develop robust, freely available tools related to the information management needs of basic, clinical, and translational research. An open source framework and agile programming methodology can enable distributed software development while an open architecture will encourage interoperability across different environments. The ISIS Center has conceptualized a prototype data sharing network that simulates a multi-center environment based on a federated data access model. This model includes the development of software tools to enable efficient exchange, sharing, management, and analysis of multimedia medical information such as clinical information, images, and bioinformatics data from multiple data sources. The envisioned ImTK data environment will include an open architecture and data model implementation that complies with existing standards such as Digital Imaging and Communications in Medicine (DICOM), Health Level 7 (HL7), and the technical framework and workflow defined by the Integrating the Healthcare Enterprise (IHE) Information Technology Infrastructure initiative, mainly the Cross Enterprise Document Sharing (XDS) specifications.

  6. Data management and data enrichment for systems biology projects.

    PubMed

    Wittig, Ulrike; Rey, Maja; Weidemann, Andreas; Müller, Wolfgang

    2017-11-10

    Collecting, curating, interlinking, and sharing high-quality data are central to de.NBI-SysBio, the systems biology data management service center within the de.NBI network (German Network for Bioinformatics Infrastructure). The work of the center is guided by the FAIR principles for scientific data management and stewardship. FAIR stands for the four foundational principles of Findability, Accessibility, Interoperability, and Reusability, which were established to enhance the ability of machines to automatically find, access, exchange and use data. Within this overview paper we describe three tools (SABIO-RK, Excemplify, SEEK) that exemplify the contribution of de.NBI-SysBio services to FAIR data, models, and experimental methods storage and exchange. The interconnectivity of the tools and the data workflow within systems biology projects will be explained. For many years we have been the German partner in the FAIRDOM initiative (http://fair-dom.org) to establish a European data and model management service facility for systems biology. Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.

  7. Lowering the Barriers to Integrative Aquatic Ecosystem Science: Semantic Provenance, Open Linked Data, and Workflows

    NASA Astrophysics Data System (ADS)

    Harmon, T.; Hofmann, A. F.; Utz, R.; Deelman, E.; Hanson, P. C.; Szekely, P.; Villamizar, S. R.; Knoblock, C.; Guo, Q.; Crichton, D. J.; McCann, M. P.; Gil, Y.

    2011-12-01

    Environmental cyber-observatory (ECO) planning and implementation has been ongoing for more than a decade now, and several major efforts have recently come online or will soon. Some investigators in the relevant research communities will use ECO data, traditionally by developing their own client-side services to acquire data and then manually creating custom tools to integrate and analyze it. However, a significant portion of the aquatic ecosystem science community will need more custom services to manage locally collected data. The latter group represents enormous intellectual capacity when one envisions thousands of ecosystem scientists supplementing ECO baseline data by sharing their own locally intensive observational efforts. This poster summarizes the outcomes of the June 2011 Workshop for Aquatic Ecosystem Sustainability (WAES), which focused on the needs of aquatic ecosystem research on inland waters and oceans. Here we advocate new approaches to support scientists in modeling, integrating, and analyzing data, based on: 1) a new breed of software tools in which semantic provenance is automatically created and used by the system, 2) the use of open standards based on RDF and Linked Data Principles to facilitate sharing of data and provenance annotations, 3) the use of workflows to represent explicitly all data preparation, integration, and processing steps in a way that is automatically repeatable. Aquatic ecosystem workflow exemplars are provided and discussed in terms of their potential to broaden data sharing, analysis and synthesis, thereby increasing the impact of aquatic ecosystem research.
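
    The second point, provenance shared as RDF following Linked Data principles, can be sketched with rdflib as below; the dataset and workflow URIs are invented, and only standard PROV-O properties are used.

      # A short rdflib sketch of the idea: record provenance for a derived dataset
      # as RDF triples using the W3C PROV-O vocabulary, so it can be shared and
      # queried. The dataset and workflow-run URIs are made up for illustration.
      from rdflib import Graph, Namespace, Literal
      from rdflib.namespace import RDFS

      PROV = Namespace("http://www.w3.org/ns/prov#")
      EX = Namespace("http://example.org/aquatic/")

      g = Graph()
      g.bind("prov", PROV)
      g.bind("ex", EX)

      derived = EX["chlorophyll_gridded_v2"]     # indexing a Namespace yields a URIRef
      source = EX["buoy_raw_2011"]
      run = EX["gridding_workflow_run_42"]

      g.add((derived, RDFS.label, Literal("Gridded chlorophyll product, v2")))
      g.add((derived, PROV.wasDerivedFrom, source))
      g.add((derived, PROV.wasGeneratedBy, run))
      g.add((run, PROV.used, source))

      print(g.serialize(format="turtle"))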

  8. A cognitive task analysis of a visual analytic workflow: Exploring molecular interaction networks in systems biology.

    PubMed

    Mirel, Barbara; Eichinger, Felix; Keller, Benjamin J; Kretzler, Matthias

    2011-03-21

    Bioinformatics visualization tools are often not robust enough to support biomedical specialists’ complex exploratory analyses. Tools need to accommodate the workflows that scientists actually perform for specific translational research questions. To understand and model one of these workflows, we conducted a case-based, cognitive task analysis of a biomedical specialist’s exploratory workflow for the question: What functional interactions among gene products of high throughput expression data suggest previously unknown mechanisms of a disease? From our cognitive task analysis four complementary representations of the targeted workflow were developed. They include: usage scenarios, flow diagrams, a cognitive task taxonomy, and a mapping between cognitive tasks and user-centered visualization requirements. The representations capture the flows of cognitive tasks that led a biomedical specialist to inferences critical to hypothesizing. We created representations at levels of detail that could strategically guide visualization development, and we confirmed this by making a trial prototype based on user requirements for a small portion of the workflow. Our results imply that visualizations should make available to scientific users “bundles of features” consonant with the compositional cognitive tasks purposefully enacted at specific points in the workflow. We also highlight certain aspects of visualizations that: (a) need more built-in flexibility; (b) are critical for negotiating meaning; and (c) are necessary for essential metacognitive support.

  9. A graph-based computational framework for simulation and optimisation of coupled infrastructure networks

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jalving, Jordan; Abhyankar, Shrirang; Kim, Kibaek

    Here, we present a computational framework that facilitates the construction, instantiation, and analysis of large-scale optimization and simulation applications of coupled energy networks. The framework integrates the optimization modeling package PLASMO and the simulation package DMNetwork (built around PETSc). These tools use a common graph-based abstraction that enables us to achieve compatibility between data structures and to build applications that use network models of different physical fidelity. We also describe how to embed these tools within complex computational workflows using SWIFT, which is a tool that facilitates parallel execution of multiple simulation runs and management of input and output data. We discuss how to use these capabilities to target coupled natural gas and electricity systems.

  10. A graph-based computational framework for simulation and optimisation of coupled infrastructure networks

    DOE PAGES

    Jalving, Jordan; Abhyankar, Shrirang; Kim, Kibaek; ...

    2017-04-24

    Here, we present a computational framework that facilitates the construction, instantiation, and analysis of large-scale optimization and simulation applications of coupled energy networks. The framework integrates the optimization modeling package PLASMO and the simulation package DMNetwork (built around PETSc). These tools use a common graph-based abstraction that enables us to achieve compatibility between data structures and to build applications that use network models of different physical fidelity. We also describe how to embed these tools within complex computational workflows using SWIFT, which is a tool that facilitates parallel execution of multiple simulation runs and management of input and output data. We discuss how to use these capabilities to target coupled natural gas and electricity systems.
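
    The shared graph abstraction described above can be illustrated with networkx, as in the sketch below: nodes and edges carry physical attributes, and a coupling edge ties a gas delivery point to a gas-fired generator. This is an illustration only, not PLASMO or DMNetwork.

      # Not PLASMO or DMNetwork -- just a networkx sketch of the shared graph
      # abstraction: nodes and edges carry physical attributes, and a coupling edge
      # ties a gas delivery point to a gas-fired generator in the electric network.
      import networkx as nx

      g = nx.DiGraph()

      # Gas network: junctions connected by pipelines.
      g.add_node("gas_junction_1", kind="gas", pressure_bar=60.0)
      g.add_node("gas_delivery_A", kind="gas", pressure_bar=55.0)
      g.add_edge("gas_junction_1", "gas_delivery_A", kind="pipeline", length_km=40.0)

      # Electric network: buses connected by transmission lines.
      g.add_node("bus_1", kind="electric", demand_mw=120.0)
      g.add_node("gen_gasfired_A", kind="electric", capacity_mw=200.0)
      g.add_edge("gen_gasfired_A", "bus_1", kind="line", reactance=0.02)

      # Coupling: the generator's fuel comes from the gas delivery point.
      g.add_edge("gas_delivery_A", "gen_gasfired_A", kind="coupling")

      coupled = [(u, v) for u, v, d in g.edges(data=True) if d["kind"] == "coupling"]
      print("coupling edges:", coupled)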

  11. Data and Workflow Management Challenges in Global Adjoint Tomography

    NASA Astrophysics Data System (ADS)

    Lei, W.; Ruan, Y.; Smith, J. A.; Modrak, R. T.; Orsvuran, R.; Krischer, L.; Chen, Y.; Balasubramanian, V.; Hill, J.; Turilli, M.; Bozdag, E.; Lefebvre, M. P.; Jha, S.; Tromp, J.

    2017-12-01

    It is crucial to take the complete physics of wave propagation into account in seismic tomography to further improve the resolution of tomographic images. The adjoint method is an efficient way of incorporating 3D wave simulations in seismic tomography. However, global adjoint tomography is computationally expensive, requiring thousands of wavefield simulations and massive data processing. Through our collaboration with the Oak Ridge National Laboratory (ORNL) computing group and an allocation on Titan, ORNL's GPU-accelerated supercomputer, we are now performing our global inversions by assimilating waveform data from over 1,000 earthquakes. The first challenge we encountered is dealing with the sheer amount of seismic data. Data processing based on conventional data formats and processing tools (such as SAC), which are not designed for parallel systems, becomes our major bottleneck. To facilitate the data processing procedures, we designed the Adaptive Seismic Data Format (ASDF) and developed a set of Python-based processing tools to replace legacy FORTRAN-based software. These tools greatly enhance reproducibility and accountability while taking full advantage of highly parallel systems and showing superior scaling on modern computational platforms. The second challenge is that the data processing workflow contains more than 10 sub-procedures, making it difficult to handle and prone to human error. To reduce human intervention as much as possible, we are developing a framework specifically designed for seismic inversion based on state-of-the-art workflow management research, specifically the Ensemble Toolkit (EnTK), in collaboration with the RADICAL team from Rutgers University. Using the initial developments of the EnTK, we are able to utilize the full computing power of the data processing cluster RHEA at ORNL while keeping human interaction to a minimum and greatly reducing the data processing time. Thanks to all the improvements, we are now able to iterate quickly on a dataset of more than 1,000 earthquakes. Starting from model GLAD-M15 (Bozdag et al., 2016), an elastic 3D model with a transversely isotropic upper mantle, we have successfully performed 5 iterations. Our goal is to finish 10 iterations, i.e., to generate GLAD-M25, by the end of this year.
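
    The kind of per-seismogram processing step that such Python-based pipelines apply can be sketched generically with ObsPy, as below; this is illustrative of the detrend/taper/filter pattern only and does not reproduce the project's own ASDF-based tools.

      # The project's own tools are not reproduced here; this is a generic ObsPy
      # sketch of the kind of per-trace processing step (detrend, taper, bandpass)
      # that such Python-based pipelines apply to every seismogram.
      from obspy import read

      st = read()                      # ObsPy's bundled example waveforms
      st.detrend("demean")
      st.taper(max_percentage=0.05)
      st.filter("bandpass", freqmin=0.01, freqmax=1.0)   # e.g. up to 1 Hz

      for tr in st:
          print(tr.id, tr.stats.sampling_rate, len(tr.data))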

  12. The BioExtract Server: a web-based bioinformatic workflow platform

    PubMed Central

    Lushbough, Carol M.; Jennewein, Douglas M.; Brendel, Volker P.

    2011-01-01

    The BioExtract Server (bioextract.org) is an open, web-based system designed to aid researchers in the analysis of genomic data by providing a platform for the creation of bioinformatic workflows. Scientific workflows are created within the system by recording tasks performed by the user. These tasks may include querying multiple, distributed data sources, saving query results as searchable data extracts, and executing local and web-accessible analytic tools. The series of recorded tasks can then be saved as a reproducible, sharable workflow available for subsequent execution with the original or modified inputs and parameter settings. Integrated data resources include interfaces to the National Center for Biotechnology Information (NCBI) nucleotide and protein databases, the European Molecular Biology Laboratory (EMBL-Bank) non-redundant nucleotide database, the Universal Protein Resource (UniProt), and the UniProt Reference Clusters (UniRef) database. The system offers access to numerous preinstalled, curated analytic tools and also provides researchers with the option of selecting computational tools from a large list of web services including the European Molecular Biology Open Software Suite (EMBOSS), BioMoby, and the Kyoto Encyclopedia of Genes and Genomes (KEGG). The system further allows users to integrate local command line tools residing on their own computers through a client-side Java applet. PMID:21546552

  13. Building asynchronous geospatial processing workflows with web services

    NASA Astrophysics Data System (ADS)

    Zhao, Peisheng; Di, Liping; Yu, Genong

    2012-02-01

    Geoscience research and applications often involve a geospatial processing workflow. This workflow includes a sequence of operations that use a variety of tools to collect, translate, and analyze distributed heterogeneous geospatial data. Asynchronous mechanisms, by which clients initiate a request and then resume their processing without waiting for a response, are very useful for complicated workflows that take a long time to run. Geospatial contents and capabilities are increasingly becoming available online as interoperable Web services. This online availability significantly enhances the ability to use Web service chains to build distributed geospatial processing workflows. This paper focuses on how to orchestrate Web services for implementing asynchronous geospatial processing workflows. The theoretical bases for asynchronous Web services and workflows, including asynchrony patterns and message transmission, are examined to explore different asynchronous approaches and workflow architectures that support asynchronous behavior. A sample geospatial processing workflow, issued by the Open Geospatial Consortium (OGC) Web Services, Phase 6 (OWS-6), is provided to illustrate the implementation of asynchronous geospatial processing workflows and the challenges in using Web Services Business Process Execution Language (WS-BPEL) to develop them.
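
    As a concrete illustration of the submit-then-poll style of asynchrony discussed here, the following sketch issues a WPS 1.0.0 Execute request with stored, asynchronous status and then polls the status document; the endpoint, process identifier and the exact location of the status URL are hypothetical and vary by server.

        import time
        import requests

        WPS_URL = "https://example.org/wps"           # hypothetical WPS endpoint

        params = {
            "service": "WPS", "version": "1.0.0", "request": "Execute",
            "identifier": "gs:ReprojectImage",        # hypothetical process name
            "storeExecuteResponse": "true",           # ask for asynchronous execution
            "status": "true",                         # ask for a status document to poll
        }

        # Submit; an asynchronous server answers immediately with a status location.
        response = requests.get(WPS_URL, params=params, timeout=30)
        status_url = response.headers.get("Location", WPS_URL)   # server-specific placement

        # Poll the long-running job while the client remains free to do other work.
        while True:
            status_doc = requests.get(status_url, timeout=30).text
            if "ProcessSucceeded" in status_doc or "ProcessFailed" in status_doc:
                break
            time.sleep(10)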

  14. A Proof of Concept to Bridge the Gap between Mass Spectrometry Imaging, Protein Identification and Relative Quantitation: MSI~LC-MS/MS-LF.

    PubMed

    Théron, Laëtitia; Centeno, Delphine; Coudy-Gandilhon, Cécile; Pujos-Guillot, Estelle; Astruc, Thierry; Rémond, Didier; Barthelemy, Jean-Claude; Roche, Frédéric; Feasson, Léonard; Hébraud, Michel; Béchet, Daniel; Chambon, Christophe

    2016-10-26

    Mass spectrometry imaging (MSI) is a powerful tool to visualize the spatial distribution of molecules on a tissue section. The main limitation of MALDI-MSI of proteins is the lack of direct identification. Therefore, this study focuses on a MSI~LC-MS/MS-LF workflow to link the results from MALDI-MSI with potential peak identification and label-free quantitation, using only one tissue section. At first, we studied the impact of matrix deposition and laser ablation on protein extraction from the tissue section. Then, we did a back-correlation of the m/z of the proteins detected by MALDI-MSI to those identified by label-free quantitation. This allowed us to compare the label-free quantitation of proteins obtained in LC-MS/MS with the peak intensities observed in MALDI-MSI. We managed to link identification to nine peaks observed by MALDI-MSI. The results showed that the MSI~LC-MS/MS-LF workflow (i) allowed us to study a representative muscle proteome compared to a classical bottom-up workflow; and (ii) was sparsely impacted by matrix deposition and laser ablation. This workflow, performed as a proof-of-concept, suggests that a single tissue section can be used to perform MALDI-MSI and protein extraction, identification, and relative quantitation.

  15. A Proof of Concept to Bridge the Gap between Mass Spectrometry Imaging, Protein Identification and Relative Quantitation: MSI~LC-MS/MS-LF

    PubMed Central

    Théron, Laëtitia; Centeno, Delphine; Coudy-Gandilhon, Cécile; Pujos-Guillot, Estelle; Astruc, Thierry; Rémond, Didier; Barthelemy, Jean-Claude; Roche, Frédéric; Feasson, Léonard; Hébraud, Michel; Béchet, Daniel; Chambon, Christophe

    2016-01-01

    Mass spectrometry imaging (MSI) is a powerful tool to visualize the spatial distribution of molecules on a tissue section. The main limitation of MALDI-MSI of proteins is the lack of direct identification. Therefore, this study focuses on a MSI~LC-MS/MS-LF workflow to link the results from MALDI-MSI with potential peak identification and label-free quantitation, using only one tissue section. At first, we studied the impact of matrix deposition and laser ablation on protein extraction from the tissue section. Then, we did a back-correlation of the m/z of the proteins detected by MALDI-MSI to those identified by label-free quantitation. This allowed us to compare the label-free quantitation of proteins obtained in LC-MS/MS with the peak intensities observed in MALDI-MSI. We managed to link identification to nine peaks observed by MALDI-MSI. The results showed that the MSI~LC-MS/MS-LF workflow (i) allowed us to study a representative muscle proteome compared to a classical bottom-up workflow; and (ii) was sparsely impacted by matrix deposition and laser ablation. This workflow, performed as a proof-of-concept, suggests that a single tissue section can be used to perform MALDI-MSI and protein extraction, identification, and relative quantitation. PMID:28248242

  16. [Measures to prevent patient identification errors in blood collection/physiological function testing utilizing a laboratory information system].

    PubMed

    Shimazu, Chisato; Hoshino, Satoshi; Furukawa, Taiji

    2013-08-01

    We constructed an integrated personal identification workflow chart using both bar code reading and an all-in-one laboratory information system. The information system not only handles test data but also the information needed for patient guidance in the laboratory department. The reception terminals at the entrance, displays for patient guidance and patient identification tools at blood-sampling booths are all controlled by the information system. The number of patient identification errors was greatly reduced by the system. However, identification errors have not been eliminated in the ultrasound department. After re-evaluation of the patient identification process in this department, we recognized that the major reason for the errors was an excessive identification workflow. Ordinarily, an ultrasound test requires patient identification 3 times, because 3 different systems are required during the entire test process, i.e. the ultrasound modality system, the laboratory information system and a system for producing reports. We are trying to connect the 3 different systems to develop a one-time identification workflow, but it is not a simple task and has not been completed yet. Utilization of the laboratory information system is effective but not yet perfect for patient identification. Even today, the most fundamental procedure for patient identification is to ask a person's name. Everyday checks in the ordinary workflow and everyone's participation in safety-management activities are important for the prevention of patient identification errors.

  17. Database management systems for process safety.

    PubMed

    Early, William F

    2006-03-17

    Several elements of the process safety management regulation (PSM) require tracking and documentation of actions: process hazard analyses, management of change, process safety information, operating procedures, training, contractor safety programs, pre-startup safety reviews, incident investigations, emergency planning, and compliance audits. These elements can generate hundreds of action items annually that require tracking and follow-up. This tracking and documentation is commonly identified as a failing in compliance audits, and is difficult to manage through action lists, spreadsheets, or other tools that are comfortably manipulated by plant personnel. This paper discusses the recent implementation of a database management system at a chemical plant and chronicles the improvements accomplished through the introduction of a customized system. The system as implemented modeled the normal plant workflows, and provided simple, recognizable user interfaces for ease of use.
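
    The kind of action-item tracking the paper describes can be pictured with a very small relational sketch; the table and column names below are hypothetical and stand in for whatever the customized plant system actually used.

        import sqlite3

        conn = sqlite3.connect("psm_actions.db")
        conn.execute("""
            CREATE TABLE IF NOT EXISTS action_items (
                id          INTEGER PRIMARY KEY,
                psm_element TEXT NOT NULL,       -- e.g. 'management of change', 'incident investigation'
                description TEXT NOT NULL,
                owner       TEXT,
                due_date    TEXT,
                status      TEXT DEFAULT 'open'  -- 'open', 'in progress', 'closed'
            )""")
        conn.execute(
            "INSERT INTO action_items (psm_element, description, owner, due_date) VALUES (?, ?, ?, ?)",
            ("management of change", "Update P&ID after valve replacement", "J. Smith", "2024-06-30"),
        )
        conn.commit()

        # The overdue-item report that compliance audits typically ask for.
        for row in conn.execute(
                "SELECT psm_element, description, owner, due_date FROM action_items "
                "WHERE status != 'closed' AND due_date < date('now')"):
            print(row)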

  18. Analysis, Mining and Visualization Service at NCSA

    NASA Astrophysics Data System (ADS)

    Wilhelmson, R.; Cox, D.; Welge, M.

    2004-12-01

    NCSA's goal is to create a balanced system that fully supports high-end computing as well as: 1) high-end data management and analysis; 2) visualization of massive, highly complex data collections; 3) large databases; 4) geographically distributed Grid computing; and 5) collaboratories, all based on a secure computational environment and driven with workflow-based services. To this end NCSA has defined a new technology path that includes the integration and provision of cyberservices in support of data analysis, mining, and visualization. NCSA has begun to develop and apply a data mining system, NCSA Data-to-Knowledge (D2K), in conjunction with both the application and research communities. NCSA D2K will enable the formation of model-based application workflows and visual programming interfaces for rapid data analysis. The Java-based D2K framework, which integrates analytical data mining methods with data management, data transformation, and information visualization tools, will be configurable from the cyberservices (web and grid services, tools, ...) viewpoint to solve a wide range of important data mining problems. This effort will use modules, such as new classification methods for the detection of high-risk geoscience events, and existing D2K data management, machine learning, and information visualization modules. A D2K cyberservices interface will be developed to seamlessly connect client applications with remote back-end D2K servers, providing computational resources for data mining and integration with local or remote data stores. This work is being coordinated with SDSC's data and services efforts. The new NCSA Visualization embedded workflow environment (NVIEW) will be integrated with D2K functionality to tightly couple informatics and scientific visualization with the data analysis and management services. Visualization services will access and filter disparate data sources, simplifying tasks such as fusing related data from distinct sources into a coherent visual representation. This approach enables collaboration among geographically dispersed researchers via portals and front-end clients, and the coupling with data management services enables recording associations among datasets and building annotation systems into visualization tools and portals, giving scientists a persistent, shareable, virtual lab notebook. To facilitate provision of these cyberservices to the national community, NCSA will be providing a computational environment for large-scale data assimilation, analysis, mining, and visualization. This will be initially implemented on the new 512-processor shared-memory SGI systems recently purchased by NCSA. In addition to standard batch capabilities, NCSA will provide on-demand capabilities for those projects requiring rapid response (e.g., development of severe weather, earthquake events) for decision makers. It will also be used for non-sequential interactive analysis of data sets where it is important to have access to large data volumes over space and time.

  19. RABIX: AN OPEN-SOURCE WORKFLOW EXECUTOR SUPPORTING RECOMPUTABILITY AND INTEROPERABILITY OF WORKFLOW DESCRIPTIONS

    PubMed Central

    Ivkovic, Sinisa; Simonovic, Janko; Tijanic, Nebojsa; Davis-Dusenbery, Brandi; Kural, Deniz

    2016-01-01

    As biomedical data has become increasingly easy to generate in large quantities, the methods used to analyze it have proliferated rapidly. Reproducible and reusable methods are required to learn from large volumes of data reliably. To address this issue, numerous groups have developed workflow specifications or execution engines, which provide a framework with which to perform a sequence of analyses. One such specification is the Common Workflow Language, an emerging standard which provides a robust and flexible framework for describing data analysis tools and workflows. In addition, reproducibility can be furthered by executors or workflow engines which interpret the specification and enable additional features, such as error logging, file organization, optimizations to computation and job scheduling, and allow for easy computing on large volumes of data. To this end, we have developed the Rabix Executor, an open-source workflow engine for the purposes of improving reproducibility through reusability and interoperability of workflow descriptions. PMID:27896971

  20. RABIX: AN OPEN-SOURCE WORKFLOW EXECUTOR SUPPORTING RECOMPUTABILITY AND INTEROPERABILITY OF WORKFLOW DESCRIPTIONS.

    PubMed

    Kaushik, Gaurav; Ivkovic, Sinisa; Simonovic, Janko; Tijanic, Nebojsa; Davis-Dusenbery, Brandi; Kural, Deniz

    2017-01-01

    As biomedical data has become increasingly easy to generate in large quantities, the methods used to analyze it have proliferated rapidly. Reproducible and reusable methods are required to learn from large volumes of data reliably. To address this issue, numerous groups have developed workflow specifications or execution engines, which provide a framework with which to perform a sequence of analyses. One such specification is the Common Workflow Language, an emerging standard which provides a robust and flexible framework for describing data analysis tools and workflows. In addition, reproducibility can be furthered by executors or workflow engines which interpret the specification and enable additional features, such as error logging, file organization, optimizations to computation and job scheduling, and allow for easy computing on large volumes of data. To this end, we have developed the Rabix Executor, an open-source workflow engine for the purposes of improving reproducibility through reusability and interoperability of workflow descriptions.

  1. Managing Written Directives: A Software Solution to Streamline Workflow.

    PubMed

    Wagner, Robert H; Savir-Baruch, Bital; Gabriel, Medhat S; Halama, James R; Bova, Davide

    2017-06-01

    A written directive is required by the U.S. Nuclear Regulatory Commission for any use of 131I above 1.11 MBq (30 μCi) and for patients receiving radiopharmaceutical therapy. This requirement has also been adopted and must be enforced by the agreement states. As the introduction of new radiopharmaceuticals increases therapeutic options in nuclear medicine, time spent on regulatory paperwork also increases. The pressure of managing these time-consuming regulatory requirements may heighten the potential for inaccurate or incomplete directive data and subsequent regulatory violations. To improve on the paper-trail method of directive management, we created a software tool using a Health Insurance Portability and Accountability Act (HIPAA)-compliant database. This software allows for secure data-sharing among physicians, technologists, and managers while saving time, reducing errors, and eliminating the possibility of loss and duplication. Methods: The software tool was developed using Visual Basic, which is part of the Visual Studio development environment for the Windows platform. Patient data are deposited in an Access database on a local HIPAA-compliant secure server or hard disk. Once a working version had been developed, it was installed at our institution and used to manage directives. Updates and modifications of the software were released regularly until no more significant problems were found with its operation. Results: The software has been used at our institution for over 2 y and has reliably kept track of all directives. All physicians and technologists use the software daily and find it superior to paper directives. They can retrieve active directives at any stage of completion, as well as completed directives. Conclusion: We have developed a software solution for the management of written directives that streamlines and structures the departmental workflow. This solution saves time, centralizes the information for all staff to share, and decreases confusion about the creation, completion, filing, and retrieval of directives. © 2017 by the Society of Nuclear Medicine and Molecular Imaging.

  2. Data Management and Archiving - a Long Process

    NASA Astrophysics Data System (ADS)

    Gebauer, Petra; Bertelmann, Roland; Hasler, Tim; Kirchner, Ingo; Klump, Jens; Mettig, Nora; Peters-Kottig, Wolfgang; Rusch, Beate; Ulbricht, Damian

    2014-05-01

    Implementing policies for research data management through to data archiving at university institutions takes a long time. Even though, especially in the geosciences, most scientists are used to analyzing different sorts of data, presenting statistical results and writing publications sometimes based on big data records, only some of them manage their data in a standardized manner. Much more often they have learned how to measure and generate large volumes of data than how to document these measurements and preserve them for the future. Changing staff and limited funding make this work more difficult, but it is essential in a progressively digital and networked world. Results from the project EWIG (the name translates to "Developing workflow components for long-term archiving of research data in geosciences"), funded by the Deutsche Forschungsgemeinschaft, will help address these issues. Together with the project partners Deutsches GeoForschungsZentrum Potsdam and Konrad-Zuse-Zentrum für Informationstechnik Berlin, a workflow was developed to transfer continuously recorded data from a meteorological city monitoring network into a long-term archive. This workflow includes quality assurance of the data as well as description of metadata, and uses tools to prepare data packages for long-term archiving. It will serve as an exemplary model for other institutions working with similar data. The development of this workflow is closely intertwined with the educational curriculum at the Institut für Meteorologie. Designing modules to run quality checks on meteorological time series measured every minute and preparing metadata are tasks in current bachelor theses. Students will also test the usability of the generated working environment. Based on these experiences, a practical guideline for integrating research data management in curricula will be one of the results of this project, for postgraduates as well as for younger students. Especially at the beginning of a scientific career it is necessary to become familiar with all issues concerning data management. The outcomes of EWIG are intended to be generic enough to be easily adopted by other institutions. University lectures in meteorology have been started to teach future scientific generations right from the start how to deal with all sorts of different data in a transparent way. The progress of the project EWIG can be followed on the web at ewig.gfz-potsdam.de
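
    As a rough sketch of the minute-resolution quality checks mentioned above, the snippet below flags out-of-range values, missing minutes and stuck sensors with pandas; the file name, column name and thresholds are hypothetical and would depend on the actual monitoring network.

        import pandas as pd

        # One day of minute-resolution 2 m temperature readings (names are hypothetical).
        df = pd.read_csv("station_t2m.csv", parse_dates=["timestamp"], index_col="timestamp")

        # Range check: flag physically implausible values.
        df["flag_range"] = ~df["t2m_degC"].between(-60.0, 60.0)

        # Completeness check: list missing minutes in the record.
        expected = pd.date_range(df.index.min(), df.index.max(), freq="1min")
        missing = expected.difference(df.index)

        # Persistence check: flag hour-long runs of identical readings (stuck sensor).
        df["flag_stuck"] = df["t2m_degC"].rolling(60).std() == 0

        print(f"{int(df['flag_range'].sum())} out-of-range values, "
              f"{len(missing)} missing minutes, {int(df['flag_stuck'].sum())} stuck-sensor flags")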

  3. Server-based enterprise collaboration software improves safety and quality in high-volume PET/CT practice.

    PubMed

    McDonald, James E; Kessler, Marcus M; Hightower, Jeremy L; Henry, Susan D; Deloney, Linda A

    2013-12-01

    With increasing volumes of complex imaging cases and rising economic pressure on physician staffing, timely reporting will become progressively challenging. Current and planned iterations of PACS and electronic medical record systems do not offer workflow management tools to coordinate delivery of imaging interpretations with the needs of the patient and ordering physician. The adoption of a server-based enterprise collaboration software system by our Division of Nuclear Medicine has significantly improved our efficiency and quality of service.

  4. Workflows for microarray data processing in the Kepler environment.

    PubMed

    Stropp, Thomas; McPhillips, Timothy; Ludäscher, Bertram; Bieda, Mark

    2012-05-17

    Microarray data analysis has been the subject of extensive and ongoing pipeline development due to its complexity, the availability of several options at each analysis step, and the development of new analysis demands, including integration with new data sources. Bioinformatics pipelines are usually custom built for different applications, making them typically difficult to modify, extend and repurpose. Scientific workflow systems are intended to address these issues by providing general-purpose frameworks in which to develop and execute such pipelines. The Kepler workflow environment is a well-established system under continual development that is employed in several areas of scientific research. Kepler provides a flexible graphical interface, featuring clear display of parameter values, for design and modification of workflows. It has capabilities for developing novel computational components in the R, Python, and Java programming languages, all of which are widely used for bioinformatics algorithm development, along with capabilities for invoking external applications and using web services. We developed a series of fully functional bioinformatics pipelines addressing common tasks in microarray processing in the Kepler workflow environment. These pipelines consist of a set of tools for GFF file processing of NimbleGen chromatin immunoprecipitation on microarray (ChIP-chip) datasets and more comprehensive workflows for Affymetrix gene expression microarray bioinformatics and basic primer design for PCR experiments, which are often used to validate microarray results. Although functional in themselves, these workflows can be easily customized, extended, or repurposed to match the needs of specific projects and are designed to be a toolkit and starting point for specific applications. These workflows illustrate a workflow programming paradigm focusing on local resources (programs and data) and therefore are close to traditional shell scripting or R/BioConductor scripting approaches to pipeline design. Finally, we suggest that microarray data processing task workflows may provide a basis for future example-based comparison of different workflow systems. We provide a set of tools and complete workflows for microarray data analysis in the Kepler environment, which has the advantages of offering graphical, clear display of conceptual steps and parameters and the ability to easily integrate other resources such as remote data and web services.
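
    The pipelines described here wrap local programs and scripts; as a rough illustration of the sort of local GFF-processing step such a workflow automates (this is plain Python, not Kepler actor code, and the file name is hypothetical), consider:

        # Count feature types in a GFF file, the kind of local step a workflow actor might wrap.
        from collections import Counter

        feature_counts = Counter()
        with open("chipchip_peaks.gff") as gff:
            for line in gff:
                if line.startswith("#") or not line.strip():
                    continue                      # skip headers and blank lines
                fields = line.rstrip("\n").split("\t")
                feature_type = fields[2]          # GFF column 3 holds the feature type
                feature_counts[feature_type] += 1

        print(feature_counts.most_common())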

  5. SU-F-T-251: The Quality Assurance for the Heavy Patient Load Department in the Developing Country: The Primary Experience of An Entire Workflow QA Process Management in Radiotherapy

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Xie, J; Wang, J; Peng, J

    Purpose: To implement an entire workflow quality assurance (QA) process in the radiotherapy department and to reduce the error rates of radiotherapy based on entire workflow management in a developing country. Methods: The entire workflow QA process management starts from patient registration and runs to the end of the last treatment, including all steps throughout the radiotherapy process. The error rate from chart checks is used to evaluate the entire workflow QA process. Two to three qualified senior medical physicists checked the documents before the first treatment fraction of every patient. Random checks of the treatment history during treatment were also performed. Treatment data from a total of around 6,000 patients before and after implementing the entire workflow QA process were compared from May 2014 to December 2015. Results: A systematic checklist was established. It mainly includes patient registration, treatment plan QA, information export to the OIS (Oncology Information System), treatment QA documents and QA of the treatment history. The error rate derived from the chart checks decreased from 1.7% to 0.9% after introducing the entire workflow QA process. All errors detected before the first treatment fraction were corrected as soon as the oncologist re-confirmed them, and reinforced staff training followed accordingly to prevent those errors. Conclusion: The entire workflow QA process improved the safety and quality of radiotherapy in our department, and we consider that our QA experience can be applicable to heavily loaded radiotherapy departments in developing countries.

  6. Empowering file-based radio production through media asset management systems

    NASA Astrophysics Data System (ADS)

    Muylaert, Bjorn; Beckers, Tom

    2006-10-01

    In recent years, IT-based production and archiving of media has matured to a level which enables broadcasters to switch over from tape- or CD-based to file-based workflows for the production of their radio and television programs. This technology is essential for the future of broadcasters as it provides the flexibility and speed of execution the customer demands by enabling, among others, concurrent access and production, faster-than-real-time ingest, edit during ingest, centrally managed annotation and quality preservation of media. In terms of automation of program production, the radio department is the most advanced within the VRT, the Flemish broadcaster. For the past several years, the radio department has been working with digital equipment and producing its programs mainly on standard IT equipment. Historically, the shift from analogue to digital production has been a step-by-step process initiated and coordinated by each radio station separately, resulting in a multitude of tools and metadata collections, some of them developed in-house and lacking integration. To make matters worse, each of those stations adopted a slightly different production methodology. The planned introduction of a company-wide Media Asset Management System allows a coordinated overhaul to a unified production architecture. Benefits include the centralized ingest and annotation of audio material and the uniform, integrated (in terms of IT infrastructure) workflow model. Needless to say, the ingest strategy, metadata management and integration with radio production systems play a major role in the level of success of any improvement effort. This paper presents a data model for audio-specific concepts relevant to radio production. It includes an investigation of ingest techniques and strategies. Cooperation with external, professional production tools is demonstrated through a use-case scenario: the integration of an existing multi-track editing tool with a commercially available Media Asset Management System. This will enable an uncomplicated production chain, with a recognizable look and feel for all system users, regardless of their affiliated radio station, as well as central retrieval and storage of information and metadata.

  7. Workflow as a Service in the Cloud: Architecture and Scheduling Algorithms.

    PubMed

    Wang, Jianwu; Korambath, Prakashan; Altintas, Ilkay; Davis, Jim; Crawl, Daniel

    2014-01-01

    With more and more workflow systems adopting the cloud as their execution environment, it becomes increasingly challenging to efficiently manage various workflows, virtual machines (VMs) and workflow execution on VM instances. To make the system scalable and easy to extend, we design a Workflow as a Service (WFaaS) architecture with independent services. A core part of the architecture is how to efficiently respond to continuous workflow requests from users and schedule their executions in the cloud. Based on different targets, we propose four heuristic workflow scheduling algorithms for the WFaaS architecture, and analyze the differences and best usages of the algorithms in terms of performance, cost and the price/performance ratio via experimental studies.
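
    To make the idea of a price/performance-driven heuristic concrete, here is a toy greedy chooser that picks the cheapest VM type still meeting a deadline; it is purely illustrative, is not one of the four algorithms proposed in the paper, and the VM catalogue is invented.

        # Toy scheduler: cheapest VM type that meets the deadline for one workflow request.
        VM_TYPES = [                    # hypothetical catalogue: (name, relative speed, $/hour)
            ("small",  1.0, 0.05),
            ("medium", 2.0, 0.12),
            ("large",  4.0, 0.30),
        ]

        def choose_vm(base_runtime_h, deadline_h):
            feasible = [(name, speed, price)
                        for name, speed, price in VM_TYPES
                        if base_runtime_h / speed <= deadline_h]
            if not feasible:
                return None                               # no VM type can meet the deadline
            # Minimise total cost = hourly price * actual runtime on that VM type.
            return min(feasible, key=lambda v: v[2] * base_runtime_h / v[1])

        print(choose_vm(base_runtime_h=8.0, deadline_h=3.0))   # -> ('large', 4.0, 0.3)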

  8. Ergonomic design for dental offices.

    PubMed

    Ahearn, David J; Sanders, Martha J; Turcotte, Claudia

    2010-01-01

    The increasing complexity of the dental office environment influences productivity and workflow for dental clinicians. Advances in technology, and with it the range of products needed to provide services, have led to sprawl in operatory setups and the potential for awkward postures for dental clinicians during the delivery of oral health services. Although ergonomics often addresses the prevention of musculoskeletal disorders for specific populations of workers, concepts of workflow and productivity are integral to improved practice in work environments. This article provides suggestions for improving workflow and productivity for dental clinicians. The article applies ergonomic principles to dental practice issues such as equipment and supply management, office design, and workflow management. Implications for improved ergonomic processes and future research are explored.

  9. Driving external chemistry optimization via operations management principles.

    PubMed

    Bi, F Christopher; Frost, Heather N; Ling, Xiaolan; Perry, David A; Sakata, Sylvie K; Bailey, Simon; Fobian, Yvette M; Sloan, Leslie; Wood, Anthony

    2014-03-01

    Confronted with the need to significantly raise the productivity of remotely located chemistry CROs, Pfizer embraced a commitment to continuous improvement that leveraged tools from both Lean Six Sigma and queue management theory to deliver positive, measurable outcomes. During 2012, cycle times were reduced by 48% by optimizing the work in progress and conducting a detailed workflow analysis to identify and address pinch points. Compound flow was increased by 29% by optimizing the request process and de-risking the chemistry. Underpinning both achievements was the development of close working relationships and productive communication between Pfizer and CRO chemists. Copyright © 2013 Elsevier Ltd. All rights reserved.

  10. [What Surgeons Should Know about Risk Management].

    PubMed

    Strametz, R; Tannheimer, M; Rall, M

    2017-02-01

    Background: The fact that medical treatment is associated with errors has long been recognized. Based on the principle of "first do no harm", numerous efforts have since been made to prevent such errors or limit their impact. However, recent statistics show that these measures do not sufficiently prevent grave mistakes with serious consequences. Preventable mistakes such as wrong patient or wrong site surgery still frequently occur in error statistics. Methods: Based on insight from research on human error, in due consideration of recent legislative regulations in Germany, the authors give an overview of the clinical risk management tools needed to identify risks in surgery, analyse their causes, and determine adequate measures to manage those risks depending on their relevance. The use and limitations of critical incident reporting systems (CIRS), safety checklists and crisis resource management (CRM) are highlighted. Also the rationale for IT systems to support the risk management process is addressed. Results/Conclusion: No single tool of risk management can be effective as a standalone instrument, but unfolds its effect only when embedded in a superordinate risk management system, which integrates tailor-made elements to increase patient safety into the workflows of each organisation. Competence in choosing adequate tools, effective IT systems to support the risk management process as well as leadership and commitment to constructive handling of human error are crucial components to establish a safety culture in surgery. Georg Thieme Verlag KG Stuttgart · New York.

  11. A recommended workflow methodology in the creation of an educational and training application incorporating a digital reconstruction of the cerebral ventricular system and cerebrospinal fluid circulation to aid anatomical understanding.

    PubMed

    Manson, Amy; Poyade, Matthieu; Rea, Paul

    2015-10-19

    The use of computer-aided learning in education can be advantageous, especially when interactive three-dimensional (3D) models are used to aid learning of complex 3D structures. The anatomy of the ventricular system of the brain is difficult to fully understand as it is seldom seen in 3D, as is the flow of cerebrospinal fluid (CSF). This article outlines a workflow for the creation of an interactive training tool for the cerebral ventricular system, an educationally challenging area of anatomy. This outline is based on the use of widely available computer software packages. Using MR images of the cerebral ventricular system and several widely available commercial and free software packages, the techniques of 3D modelling, texturing, sculpting, image editing and animation were combined into a workflow for the creation of an interactive educational and training tool. This was focussed on cerebral ventricular system anatomy and the flow of cerebrospinal fluid. We have successfully created a robust methodology by using key software packages in the creation of an interactive education and training tool. This has resulted in an application being developed which details the anatomy of the ventricular system, and the flow of cerebrospinal fluid, using an anatomically accurate 3D model. In addition, our established workflow pattern presented here also shows how tutorials, animations and self-assessment tools can be embedded into the training application. The workflow we have established for generating educational and training material demonstrating cerebral ventricular anatomy and the flow of cerebrospinal fluid has enormous potential to be adopted into student training in this field. With the digital age advancing rapidly, it has the potential to be used as an innovative tool alongside other methodologies for the training of future healthcare practitioners and scientists. This workflow could be used in the creation of other tools, which could be developed for use not only on desktop and laptop computers but also on smartphones, tablets and fully immersive stereoscopic environments. It could also form the basis on which to build surgical simulations enhanced with haptic interaction.

  12. Improved cyberinfrastructure for integrated hydrometeorological predictions within the fully-coupled WRF-Hydro modeling system

    NASA Astrophysics Data System (ADS)

    gochis, David; hooper, Rick; parodi, Antonio; Jha, Shantenu; Yu, Wei; Zaslavsky, Ilya; Ganapati, Dinesh

    2014-05-01

    The community WRF-Hydro system is currently being used in a variety of flood prediction and regional hydroclimate impacts assessment applications around the world. Despite its increasingly wide use, certain cyberinfrastructure bottlenecks exist in the setup, execution and post-processing of WRF-Hydro model runs. These bottlenecks result in wasted time, labor, data transfer bandwidth and computational resource use. Appropriate development and use of cyberinfrastructure to set up and manage WRF-Hydro modeling applications will streamline the entire workflow of hydrologic model predictions. This talk will present recent advances in the development and use of new open-source cyberinfrastructure tools for the WRF-Hydro architecture. These tools include new web-accessible pre-processing applications, supercomputer job management applications and automated verification and visualization applications. The tools will be described successively and then demonstrated in a set of flash flood use cases for recent destructive flood events in the U.S. and in Europe. Throughout, emphasis is placed on the implementation and use of community data standards for data exchange.

  13. The coffee genome hub: a resource for coffee genomes

    PubMed Central

    Dereeper, Alexis; Bocs, Stéphanie; Rouard, Mathieu; Guignon, Valentin; Ravel, Sébastien; Tranchant-Dubreuil, Christine; Poncet, Valérie; Garsmeur, Olivier; Lashermes, Philippe; Droc, Gaëtan

    2015-01-01

    The whole genome sequence of Coffea canephora, the perennial diploid species known as Robusta, has been recently released. In the context of the C. canephora genome sequencing project and to support post-genomics efforts, we developed the Coffee Genome Hub (http://coffee-genome.org/), an integrative genome information system that allows centralized access to genomics and genetics data and analysis tools to facilitate translational and applied research in coffee. We provide the complete genome sequence of C. canephora along with gene structure, gene product information, metabolism, gene families, transcriptomics, syntenic blocks, genetic markers and genetic maps. The hub relies on generic software (e.g. GMOD tools) for easy querying, visualizing and downloading research data. It includes a Genome Browser enhanced by a Community Annotation System, enabling the improvement of automatic gene annotation through an annotation editor. In addition, the hub aims at developing interoperability among other existing South Green tools managing coffee data (phylogenomics resources, SNPs) and/or supporting data analyses with the Galaxy workflow manager. PMID:25392413

  14. Detecting distant homologies on protozoans metabolic pathways using scientific workflows.

    PubMed

    da Cruz, Sérgio Manuel Serra; Batista, Vanessa; Silva, Edno; Tosta, Frederico; Vilela, Clarissa; Cuadrat, Rafael; Tschoeke, Diogo; Dávila, Alberto M R; Campos, Maria Luiza Machado; Mattoso, Marta

    2010-01-01

    Bioinformatics experiments are typically composed of programs in pipelines manipulating an enormous quantity of data. An interesting approach for managing those experiments is through workflow management systems (WfMS). In this work we discuss WfMS features to support genome homology workflows and present some relevant issues for typical genomic experiments. Our evaluation used the Kepler WfMS to manage a real genomic pipeline, named OrthoSearch, originally defined as a Perl script. We show a case study detecting distant homologies on trypanosomatid metabolic pathways. Our results reinforce the benefits of WfMS over scripting languages and point out challenges to WfMS in distributed environments.

  15. AstroGrid: Taverna in the Virtual Observatory .

    NASA Astrophysics Data System (ADS)

    Benson, K. M.; Walton, N. A.

    This paper reports on the implementation of the Taverna workbench by AstroGrid, a tool for designing and executing workflows of tasks in the Virtual Observatory. The workflow approach helps astronomers perform complex task sequences with little technical effort. The visual approach to workflow construction streamlines highly complex analyses over public and private data while requiring computational resources as modest as a desktop computer. Some integration issues and future work are discussed in this article.

  16. Toward server-side, high performance climate change data analytics in the Earth System Grid Federation (ESGF) eco-system

    NASA Astrophysics Data System (ADS)

    Fiore, Sandro; Williams, Dean; Aloisio, Giovanni

    2016-04-01

    In many scientific domains such as climate, data is often n-dimensional and requires tools that support specialized data types and primitives to be properly stored, accessed, analysed and visualized. Moreover, new challenges arise in large-scale scenarios and eco-systems where petabytes (PB) of data can be available and data can be distributed and/or replicated (e.g., the Earth System Grid Federation (ESGF) serving the Coupled Model Intercomparison Project, Phase 5 (CMIP5) experiment, providing access to 2.5 PB of data for the Intergovernmental Panel on Climate Change (IPCC) Fifth Assessment Report (AR5)). Most of the tools currently available for scientific data analysis in the climate domain fail at large scale since they: (1) are desktop based and need the data locally; (2) are sequential, so do not benefit from available multicore/parallel machines; (3) do not provide declarative languages to express scientific data analysis tasks; (4) are domain-specific, which ties their adoption to a specific domain; and (5) do not provide workflow support to enable the definition of complex "experiments". The Ophidia project aims at addressing most of the challenges highlighted above by providing a big data analytics framework for eScience. Ophidia provides declarative, server-side, and parallel data analysis, jointly with an internal storage model able to efficiently deal with multidimensional data and a hierarchical data organization to manage large data volumes ("datacubes"). The project relies on a strong background in high-performance database management and OLAP systems to manage large scientific data sets. It also provides native workflow management support, to define processing chains and workflows with tens to hundreds of data analytics operators to build real scientific use cases. With regard to interoperability aspects, the talk will present the contributions provided both to the RDA Working Group on Array Databases and to the Earth System Grid Federation (ESGF) Compute Working Team. Also highlighted will be the results of large-scale climate model intercomparison data analysis experiments, for example: (1) defined in the context of the EU H2020 INDIGO-DataCloud project; (2) implemented in a real geographically distributed environment involving CMCC (Italy) and LLNL (US) sites; (3) exploiting Ophidia as a server-side, parallel analytics engine; and (4) applied to real CMIP5 data sets available through ESGF.
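
    The datacube reductions Ophidia runs server-side can be pictured with a small array example; the snippet below uses numpy on synthetic data purely to illustrate the concept and does not use the actual Ophidia operators or clients.

        import numpy as np

        # Toy "datacube": time x lat x lon monthly temperatures on a coarse grid (synthetic).
        rng = np.random.default_rng(0)
        cube = 15 + 10 * rng.standard_normal((120, 90, 180))   # 10 years of monthly fields

        # The kind of reduction a datacube operator performs close to the data:
        climatology = cube.mean(axis=0)     # collapse time -> lat x lon climatological mean
        global_mean = climatology.mean()    # collapse space -> one scalar summary
        print(round(float(global_mean), 2))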

  17. User Requirements for a Chronic Kidney Disease Clinical Decision Support Tool to Promote Timely Referral.

    PubMed

    Gulla, Joy; Neri, Pamela M; Bates, David W; Samal, Lipika

    2017-05-01

    Timely referral of patients with CKD has been associated with cost and mortality benefits, but referrals are often done too late in the course of the disease. Clinical decision support (CDS) offers a potential solution, but interventions have failed because they were not designed to support the physician workflow. We sought to identify user requirements for a chronic kidney disease (CKD) CDS system to promote timely referral. We interviewed primary care physicians (PCPs) to identify data needs for a CKD CDS system that would encourage timely referral and also gathered information about workflow to assess risk factors for progression of CKD. Interviewees were general internists recruited from a network of 14 primary care clinics affiliated with Brigham and Women's Hospital (BWH). We then performed a qualitative analysis to identify user requirements and system attributes for a CKD CDS system. Of the 12 participants, 25% were women, the mean age was 53 (range 37-82), mean years in clinical practice was 27 (range 11-58). We identified 21 user requirements. Seven of these user requirements were related to support for the referral process workflow, including access to pertinent information and support for longitudinal co-management. Six user requirements were relevant to PCP management of CKD, including management of risk factors for progression, interpretation of biomarkers of CKD severity, and diagnosis of the cause of CKD. Finally, eight user requirements addressed user-centered design of CDS, including the need for actionable information, links to guidelines and reference materials, and visualization of trends. These 21 user requirements can be used to design an intuitive and usable CDS system with the attributes necessary to promote timely referral. Copyright © 2017 Elsevier B.V. All rights reserved.

  18. Systems engineering implementation in the preliminary design phase of the Giant Magellan Telescope

    NASA Astrophysics Data System (ADS)

    Maiten, J.; Johns, M.; Trancho, G.; Sawyer, D.; Mady, P.

    2012-09-01

    Like many telescope projects today, the 24.5-meter Giant Magellan Telescope (GMT) is truly a complex system. The primary and secondary mirrors of the GMT are segmented and actuated to support two operating modes: natural seeing and adaptive optics. GMT is a general-purpose telescope supporting multiple science instruments operated in those modes. GMT is a large, diverse collaboration and development includes geographically distributed teams. The need to implement good systems engineering processes for managing the development of systems like GMT becomes imperative. The management of the requirements flow down from the science requirements to the component level requirements is an inherently difficult task in itself. The interfaces must also be negotiated so that the interactions between subsystems and assemblies are well defined and controlled. This paper will provide an overview of the systems engineering processes and tools implemented for the GMT project during the preliminary design phase. This will include requirements management, documentation and configuration control, interface development and technical risk management. Because of the complexity of the GMT system and the distributed team, using web-accessible tools for collaboration is vital. To accomplish this GMTO has selected three tools: Cognition Cockpit, Xerox Docushare, and Solidworks Enterprise Product Data Management (EPDM). Key to this is the use of Cockpit for managing and documenting the product tree, architecture, error budget, requirements, interfaces, and risks. Additionally, drawing management is accomplished using an EPDM vault. Docushare, a documentation and configuration management tool is used to manage workflow of documents and drawings for the GMT project. These tools electronically facilitate collaboration in real time, enabling the GMT team to track, trace and report on key project metrics and design parameters.

  19. SynTrack: DNA Assembly Workflow Management (SynTrack) v2.0.1

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    MENG, XIANWEI; SIMIRENKO, LISA

    2016-12-01

    SynTrack is a dynamic, workflow-driven data management system that tracks the DNA build process: management of the hierarchical relationships of DNA fragments; monitoring of process tasks for the assembly of multiple DNA fragments into final constructs; creation of vendor order forms with selectable building blocks; organization of plate layout barcodes for vendor/pcr/fusion/chewback/bioassay/glycerol/master plate maps (default/condensed); creation or updating of Pre-Assembly/Assembly process workflows with selected building blocks; generation of Echo pooling instructions based on plate maps; tracking of building block orders, received parts and final assemblies for delivery; bulk updating of colony or PCR amplification information, fusion PCR and chewback results; updating of QA/QC outcomes with .csv and .xlsx template files; re-working of assembly workflows before and after sequencing validation; and tracking of plate/well data changes and status updates, with reporting of master plate status and QC outcomes.

  20. MetaNET--a web-accessible interactive platform for biological metabolic network analysis.

    PubMed

    Narang, Pankaj; Khan, Shawez; Hemrom, Anmol Jaywant; Lynn, Andrew Michael

    2014-01-01

    Metabolic reactions have been extensively studied and compiled over the last century. These have provided a theoretical base to implement models, simulations of which are used to identify drug targets and optimize metabolic throughput at a systemic level. While tools for the perturbation of metabolic networks are available, their applications are limited and restricted as they require varied dependencies and often a commercial platform for full functionality. We have developed MetaNET, an open source, user-friendly, platform-independent and web-accessible resource consisting of several pre-defined workflows for metabolic network analysis. MetaNET is a web-accessible platform that incorporates a range of functions which can be combined to produce different simulations related to metabolic networks. These include (i) optimization of an objective function for the wild-type strain and gene/catalyst/reaction knock-out/knock-down analysis using flux balance analysis; (ii) flux variability analysis; (iii) chemical species participation; (iv) cycle and extreme path identification; and (v) choke-point reaction analysis to facilitate the identification of potential drug targets. The platform is built using custom scripts along with the open-source Galaxy workflow system and the Systems Biology Research Tool as components. Pre-defined workflows are available for common processes, and an exhaustive list of over 50 functions is provided for user-defined workflows. MetaNET, available at http://metanet.osdd.net , provides a user-friendly, rich interface allowing the analysis of genome-scale metabolic networks under various genetic and environmental conditions. The framework permits the storage of previous results, the ability to repeat analyses and share results with other users over the internet, as well as to run different tools simultaneously using pre-defined workflows and user-created custom workflows.
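
    MetaNET itself is web-based and built on Galaxy and the Systems Biology Research Tool; as a hedged, local analogue of the same analyses (wild-type FBA, gene deletions, flux variability), the sketch below uses the cobrapy library, which is an assumption for illustration and not part of MetaNET, with a hypothetical SBML model file.

        import cobra
        from cobra.flux_analysis import flux_variability_analysis, single_gene_deletion

        # Load a genome-scale model from SBML (file name is hypothetical).
        model = cobra.io.read_sbml_model("organism_model.xml")

        # (i) Optimise the objective function for the wild-type strain.
        wild_type = model.optimize()
        print("wild-type objective:", wild_type.objective_value)

        # (i) Gene knock-out analysis across the whole model.
        knockouts = single_gene_deletion(model)

        # (ii) Flux variability analysis near the optimum.
        fva = flux_variability_analysis(model, fraction_of_optimum=0.9)
        print(fva.head())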

  1. RetroPath2.0: A retrosynthesis workflow for metabolic engineers.

    PubMed

    Delépine, Baudoin; Duigou, Thomas; Carbonell, Pablo; Faulon, Jean-Loup

    2018-01-01

    Synthetic biology applied to industrial biotechnology is transforming the way we produce chemicals. However, despite advances in the scale and scope of metabolic engineering, the research and development process still remains costly. In order to expand the chemical repertoire for the production of next-generation compounds, a major engineering biology effort is required in the development of novel design tools that target chemical diversity through rapid and predictable protocols. Addressing that goal involves retrosynthesis approaches that explore the chemical biosynthetic space. However, the complexity associated with the large combinatorial retrosynthesis design space has often been recognized as the main challenge hindering the approach. Here, we provide RetroPath2.0, an automated open source workflow for retrosynthesis based on generalized reaction rules that perform the retrosynthesis search from chassis to target through an efficient and well-controlled protocol. Its ease of use and the versatility of its applications make this tool a valuable addition to the biological engineer's bench desk. We show through several examples the application of the workflow to biotechnologically relevant problems, including the identification of alternative biosynthetic routes through enzyme promiscuity and the development of biosensors. In this way we demonstrate the ability of the workflow to streamline retrosynthesis pathway design and its major role in reshaping the design, build, test and learn pipeline by driving the process toward the objective of optimizing bioproduction. The RetroPath2.0 workflow is built using tools developed by the bioinformatics and cheminformatics community; because it is open source, we anticipate that community contributions will further expand the features of the workflow. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.
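
    RetroPath2.0 itself ships as a KNIME workflow; to show what applying a generalized, SMARTS-encoded reaction rule looks like in code, the sketch below uses RDKit with a deliberately simple, made-up retro rule (it is not one of RetroPath2.0's curated rules).

        from rdkit import Chem
        from rdkit.Chem import AllChem

        # Toy retrosynthetic rule: "an aldehyde can be obtained from the corresponding
        # primary alcohol", encoded as atom-mapped reaction SMARTS (illustrative only).
        rule = AllChem.ReactionFromSmarts("[C:1][CH:2]=[O:3]>>[C:1][CH2:2][OH:3]")

        target = Chem.MolFromSmiles("CC=O")          # acetaldehyde as a stand-in target
        precursor_sets = rule.RunReactants((target,))

        for precursors in precursor_sets:
            for mol in precursors:
                Chem.SanitizeMol(mol)
                print(Chem.MolToSmiles(mol))         # prints CCO (ethanol) as the proposed precursor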

  2. Echo™ User Manual

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Harvey, Dustin Yewell

    Echo™ is a MATLAB-based software package designed for robust and scalable analysis of complex data workflows. An alternative to tedious, error-prone conventional processes, Echo is based on three transformative principles for data analysis: self-describing data, name-based indexing, and dynamic resource allocation. The software takes an object-oriented approach to data analysis, intimately connecting measurement data with associated metadata. Echo operations in an analysis workflow automatically track and merge metadata and computation parameters to provide a complete history of the process used to generate final results, while automated figure and report generation tools eliminate the potential to mislabel those results. History reporting and visualization methods provide straightforward auditability of analysis processes. Furthermore, name-based indexing on metadata greatly improves code readability for analyst collaboration and reduces opportunities for errors to occur. Echo efficiently manages large data sets using a framework that seamlessly allocates resources such that only the necessary computations to produce a given result are executed. Echo provides a versatile and extensible framework, allowing advanced users to add their own tools and data classes tailored to their own specific needs. Applying these transformative principles and powerful features, Echo greatly improves analyst efficiency and quality of results in many application areas.

  3. Guideline validation in multiple trauma care through business process modeling.

    PubMed

    Stausberg, Jürgen; Bilir, Hüseyin; Waydhas, Christian; Ruchholtz, Steffen

    2003-07-01

    Clinical guidelines can improve the quality of care in multiple trauma. In our Department of Trauma Surgery, a specific guideline is available in paper form as a set of flowcharts. This format is appropriate for use by experienced physicians but insufficient for electronic support of learning, workflow and process optimization. A formal and logically consistent version represented with a standardized meta-model is necessary for automatic processing. In our project we transferred the paper-based guideline into an electronic format and analyzed its structure with respect to formal errors. Several errors were detected in seven error categories. The errors were corrected to reach a formally and logically consistent process model. In a second step the clinical content of the guideline was revised interactively using a process-modeling tool. Our study reveals that guideline development should be assisted by process modeling tools which check the content against a meta-model. The meta-model itself could support the domain experts in formulating their knowledge systematically. To assure the sustainability of guideline development, a representation independent of specific applications or providers is necessary. Clinical guidelines could then be used for eLearning, process optimization and workflow management as well.

  4. Public Health Surveillance via Template Management in Electronic Health Records: Tri-Service Workflow's Rapid Response to an Infectious Disease Crisis.

    PubMed

    Berkley, Holly; Barnes, Matthew; Carnahan, David; Hayhurst, Janet; Bockhorst, Archie; Neville, James

    2017-03-01

    To describe the use of template-based screening for risk of infectious disease exposure of patients presenting to primary care medical facilities during the 2014 West African Ebola virus outbreak. The Military Health System implemented an Ebola risk-screening tool in primary care settings in order to create early notifications and early responses to potentially infected persons. Three time-sensitive, evidence-based screening questions were developed and posted to Tri-Service Workflow (TSWF) AHLTA templates in conjunction with appropriate training. Data were collected in January 2015, to assess the adoption of the TSWF-based Ebola risk-screening tool. Among encounters documented using TSWF templates, 41% of all encounters showed use of the TSWF-based Ebola risk-screening questions by the fourth day. The screening rate increased over the next 3 weeks, and reached a plateau at approximately 50%. This report demonstrates the MHS capability to deploy a standardized, globally applicable decision support aid that could be seen the same day by all primary care clinics across the military health direct care system, potentially improving rapid compliance with screening directives. Reprint & Copyright © 2017 Association of Military Surgeons of the U.S.

  5. geoKepler Workflow Module for Computationally Scalable and Reproducible Geoprocessing and Modeling

    NASA Astrophysics Data System (ADS)

    Cowart, C.; Block, J.; Crawl, D.; Graham, J.; Gupta, A.; Nguyen, M.; de Callafon, R.; Smarr, L.; Altintas, I.

    2015-12-01

    The NSF-funded WIFIRE project has developed an open-source, online geospatial workflow platform for unifying geoprocessing tools and models for fire and other geospatially dependent modeling applications. It is a product of WIFIRE's objective to build an end-to-end cyberinfrastructure for real-time and data-driven simulation, prediction and visualization of wildfire behavior. geoKepler includes a set of reusable GIS components, or actors, for the Kepler Scientific Workflow System (https://kepler-project.org). Actors exist for reading and writing GIS data in formats such as Shapefile, GeoJSON, KML, and using OGC web services such as WFS. The actors also allow for calling geoprocessing tools in other packages such as GDAL and GRASS. Kepler integrates functions from multiple platforms and file formats into one framework, thus enabling optimal GIS interoperability, model coupling, and scalability. Products of the GIS actors can be fed directly to models such as FARSITE and WRF. Kepler's ability to schedule and scale processes using Hadoop and Spark also makes geoprocessing ultimately extensible and computationally scalable. The reusable workflows in geoKepler can be made to run automatically when alerted by real-time environmental conditions. Here, we show breakthroughs in the speed of creating complex data for hazard assessments with this platform. We also demonstrate geoKepler workflows that use Data Assimilation to ingest real-time weather data into wildfire simulations, and data mining techniques to gain insight into environmental conditions affecting fire behavior. Existing machine learning tools and libraries such as R and MLlib are being leveraged for this purpose in Kepler, as well as Kepler's Distributed Data Parallel (DDP) capability to provide a framework for scalable processing. geoKepler workflows can be executed via an iPython notebook as part of a Jupyter hub at UC San Diego for sharing and reporting of the scientific analysis and results from various runs of geoKepler workflows. The communication between iPython and Kepler workflow executions is established through an iPython magic function for Kepler that we have implemented. In summary, geoKepler is an ecosystem that makes geospatial processing and analysis of any kind programmable, reusable, scalable and sharable.
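
    As a sketch of the kind of GIS-reading step a geoKepler actor wraps, the snippet below uses the GDAL/OGR Python bindings directly; the file name is hypothetical and this is not the actor code itself.

        from osgeo import ogr

        # Read a GeoJSON layer the way an actor might before handing features to a model.
        datasource = ogr.Open("fire_perimeters.geojson")    # hypothetical input file
        layer = datasource.GetLayer()

        print("feature count:", layer.GetFeatureCount())
        for feature in layer:
            geometry = feature.GetGeometryRef()
            print(feature.GetFID(), geometry.GetGeometryName(), geometry.Centroid().ExportToWkt())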

  6. Hermes: Seamless delivery of containerized bioinformatics workflows in hybrid cloud (HTC) environments

    NASA Astrophysics Data System (ADS)

    Kintsakis, Athanassios M.; Psomopoulos, Fotis E.; Symeonidis, Andreas L.; Mitkas, Pericles A.

    Hermes introduces a new "describe once, run anywhere" paradigm for the execution of bioinformatics workflows in hybrid cloud environments. It combines the traditional features of parallelization-enabled workflow management systems and of distributed computing platforms in a container-based approach. It offers seamless deployment, overcoming the burden of setting up and configuring the software and network requirements. Most importantly, Hermes fosters the reproducibility of scientific workflows by supporting standardization of the software execution environment, thus leading to consistent scientific workflow results and accelerating scientific output.

  7. Dawn: A Simulation Model for Evaluating Costs and Tradeoffs of Big Data Science Architectures

    NASA Astrophysics Data System (ADS)

    Cinquini, L.; Crichton, D. J.; Braverman, A. J.; Kyo, L.; Fuchs, T.; Turmon, M.

    2014-12-01

    In many scientific disciplines, scientists and data managers are bracing for an upcoming deluge of big data volumes, which will increase the size of current data archives by a factor of 10-100. For example, the next Coupled Model Intercomparison Project (CMIP6) will generate a global archive of model output of approximately 10-20 petabytes, while the upcoming next generation of NASA decadal Earth Observing instruments is expected to collect tens of gigabytes per day. In radio astronomy, the Square Kilometre Array (SKA) will collect data in the exabytes-per-day range, of which (after reduction and processing) around 1.5 exabytes per year will be stored. The effective and timely processing of these enormous data streams will require the design of new data reduction and processing algorithms, new system architectures, and new techniques for evaluating computation uncertainty. Yet at present no general software tool or framework exists that allows system architects to model their expected data processing workflow and determine the network, computational and storage resources needed to prepare their data for scientific analysis. In order to fill this gap, at NASA/JPL we have been developing a preliminary model named DAWN (Distributed Analytics, Workflows and Numerics) for simulating arbitrarily complex workflows composed of any number of data processing and movement tasks. The model can be configured with a representation of the problem at hand (the data volumes, the processing algorithms, the available computing and network resources), and is able to evaluate tradeoffs between different possible workflows based on several estimators: overall elapsed time, separate computation and transfer times, resulting uncertainty, and others. So far, we have been applying DAWN to analyze architectural solutions for 4 different use cases from distinct science disciplines: climate science, astronomy, hydrology and a generic cloud computing use case. This talk will present preliminary results and discuss how DAWN can be evolved into a powerful tool for designing system architectures for data intensive science.
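
    DAWN's internal estimators are not detailed in this record, so the following is only a toy analogue of the idea: a workflow is modeled as a chain of tasks, each with a data volume, a compute throughput and a network bandwidth, and the overall elapsed time is the sum of transfer and compute times. All task names and rates below are invented for illustration.

    ```python
    # Toy analogue of a DAWN-style estimate: given a linear chain of processing
    # steps, each moving and then crunching a data volume, estimate elapsed time.
    # All task definitions and rates are hypothetical illustration values.
    from dataclasses import dataclass

    @dataclass
    class Task:
        name: str
        data_tb: float           # data volume handled by the task, in terabytes
        compute_tb_per_h: float  # processing throughput of the assigned resource
        network_tb_per_h: float  # bandwidth available for moving the input data

        def transfer_hours(self) -> float:
            return self.data_tb / self.network_tb_per_h

        def compute_hours(self) -> float:
            return self.data_tb / self.compute_tb_per_h

    def elapsed_hours(workflow: list[Task]) -> float:
        """Sequential execution: total time is the sum of transfer + compute."""
        return sum(t.transfer_hours() + t.compute_hours() for t in workflow)

    workflow = [
        Task("ingest",   data_tb=50.0, compute_tb_per_h=20.0, network_tb_per_h=5.0),
        Task("regrid",   data_tb=50.0, compute_tb_per_h=10.0, network_tb_per_h=25.0),
        Task("analysis", data_tb=10.0, compute_tb_per_h=2.0,  network_tb_per_h=25.0),
    ]
    print(f"Estimated elapsed time: {elapsed_hours(workflow):.1f} h")
    ```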

  8. RESTFul based heterogeneous Geoprocessing workflow interoperation for Sensor Web Service

    NASA Astrophysics Data System (ADS)

    Yang, Chao; Chen, Nengcheng; Di, Liping

    2012-10-01

    Advanced sensors on board satellites offer detailed Earth observations. A workflow is one approach for designing, implementing and constructing a flexible and live link between these sensor resources and users. It can coordinate, organize and aggregate distributed sensor Web services to meet the requirements of a complex Earth observation scenario. A RESTful workflow interoperation method is proposed to integrate heterogeneous workflows into an interoperable unit. The Atom protocols are applied to describe and manage workflow resources. The XML Process Definition Language (XPDL) and Business Process Execution Language (BPEL) workflow standards are applied to structure, separately, a workflow that accesses sensor information and one that processes it. Then, a scenario involving nitrogen dioxide (NO2) from a volcanic eruption is used to investigate the feasibility of the proposed method. The RESTful workflow interoperation system can describe, publish, discover, access and coordinate heterogeneous geoprocessing workflows.
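
    As an illustration of the Atom-based resource management described above (not the authors' actual service), the sketch below publishes a workflow description as an Atom entry to a hypothetical RESTful collection endpoint; the URL, identifiers and entry fields are placeholders.

    ```python
    # Illustrative sketch (not the paper's service): publish a workflow resource
    # as an Atom entry to a hypothetical RESTful collection endpoint, in the
    # spirit of the Atom Publishing Protocol. URL and identifiers are placeholders.
    import requests

    ATOM_ENTRY = """<?xml version="1.0" encoding="utf-8"?>
    <entry xmlns="http://www.w3.org/2005/Atom">
      <title>NO2 plume processing workflow</title>
      <id>urn:uuid:0d5bafe0-0000-4e9e-9c9f-example</id>
      <updated>2012-10-01T00:00:00Z</updated>
      <summary>BPEL workflow chaining sensor access and NO2 retrieval</summary>
      <content type="application/xml" src="http://example.org/workflows/no2.bpel"/>
    </entry>
    """

    def publish_workflow(collection_url: str) -> str:
        """POST the Atom entry; return the Location header of the created resource."""
        resp = requests.post(
            collection_url,
            data=ATOM_ENTRY.encode("utf-8"),
            headers={"Content-Type": "application/atom+xml;type=entry"},
            timeout=30,
        )
        resp.raise_for_status()
        return resp.headers.get("Location", "")

    if __name__ == "__main__":
        print(publish_workflow("http://example.org/workflow-collection"))
    ```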

  9. Workflow as a Service in the Cloud: Architecture and Scheduling Algorithms

    PubMed Central

    Wang, Jianwu; Korambath, Prakashan; Altintas, Ilkay; Davis, Jim; Crawl, Daniel

    2017-01-01

    With more and more workflow systems adopting the cloud as their execution environment, it becomes increasingly challenging to efficiently manage various workflows, virtual machines (VMs) and workflow executions on VM instances. To make the system scalable and easy to extend, we design a Workflow as a Service (WFaaS) architecture with independent services. A core part of the architecture is how to efficiently respond to continuous workflow requests from users and schedule their executions in the cloud. Based on different targets, we propose four heuristic workflow scheduling algorithms for the WFaaS architecture, and analyze the differences and best usages of the algorithms in terms of performance, cost and the price/performance ratio via experimental studies. PMID:29399237
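
    The paper's four scheduling algorithms are not reproduced in this record; as a hedged illustration of the general problem, the sketch below implements a simple cheapest-feasible-VM heuristic under hypothetical runtime estimates and VM prices. It is not one of the WFaaS algorithms.

    ```python
    # Illustrative heuristic (not one of the paper's four algorithms): for each
    # incoming workflow request, pick the cheapest VM type whose estimated runtime
    # still meets the request's deadline. VM types and estimates are hypothetical.
    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class VMType:
        name: str
        price_per_hour: float
        speedup: float           # relative speed w.r.t. a baseline VM

    @dataclass
    class WorkflowRequest:
        name: str
        baseline_hours: float    # estimated runtime on the baseline VM
        deadline_hours: float

    def cheapest_feasible_vm(req: WorkflowRequest, vms: list[VMType]) -> Optional[VMType]:
        feasible = [vm for vm in vms if req.baseline_hours / vm.speedup <= req.deadline_hours]
        if not feasible:
            return None  # no VM type can meet the deadline
        # Cost = hours actually used * hourly price; pick the minimum.
        return min(feasible, key=lambda vm: (req.baseline_hours / vm.speedup) * vm.price_per_hour)

    vms = [VMType("small", 0.05, 1.0), VMType("medium", 0.12, 2.2), VMType("large", 0.30, 4.0)]
    req = WorkflowRequest("montage-run-42", baseline_hours=10.0, deadline_hours=4.0)
    choice = cheapest_feasible_vm(req, vms)
    print(choice.name if choice else "reject or queue the request")
    ```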

  10. Disseminating Metaproteomic Informatics Capabilities and Knowledge Using the Galaxy-P Framework

    PubMed Central

    Easterly, Caleb; Gruening, Bjoern; Johnson, James; Kolmeder, Carolin A.; Kumar, Praveen; May, Damon; Mehta, Subina; Mesuere, Bart; Brown, Zachary; Elias, Joshua E.; Hervey, W. Judson; McGowan, Thomas; Muth, Thilo; Rudney, Joel; Griffin, Timothy J.

    2018-01-01

    The impact of microbial communities, also known as the microbiome, on human health and the environment is receiving increased attention. Studying translated gene products (proteins) and comparing metaproteomic profiles may elucidate how microbiomes respond to specific environmental stimuli, and interact with host organisms. Characterizing proteins expressed by a complex microbiome and interpreting their functional signature requires sophisticated informatics tools and workflows tailored to metaproteomics. Additionally, there is a need to disseminate these informatics resources to researchers undertaking metaproteomic studies, who could use them to make new and important discoveries in microbiome research. The Galaxy for proteomics platform (Galaxy-P) offers an open source, web-based bioinformatics platform for disseminating metaproteomics software and workflows. Within this platform, we have developed easily accessible and documented metaproteomic software tools and workflows aimed at training researchers in their operation and disseminating the tools for more widespread use. The modular workflows encompass the core requirements of metaproteomic informatics: (a) database generation; (b) peptide spectral matching; (c) taxonomic analysis and (d) functional analysis. Much of the software available via the Galaxy-P platform was selected, packaged and deployed through an online metaproteomics "Contribution Fest" undertaken by a unique consortium of expert software developers and users from the metaproteomics research community, who have co-authored this manuscript. These resources are documented on GitHub and freely available through the Galaxy Toolshed, as well as a publicly accessible metaproteomics gateway Galaxy instance. These documented workflows are well suited for the training of novice metaproteomics researchers, through online resources such as the Galaxy Training Network, as well as hands-on training workshops. Here, we describe the metaproteomics tools available within these Galaxy-based resources, as well as the process by which they were selected and implemented in our community-based work. We hope this description will increase access to and utilization of metaproteomics tools, as well as offer a framework for continued community-based development and dissemination of cutting-edge metaproteomics software. PMID:29385081

  11. Disseminating Metaproteomic Informatics Capabilities and Knowledge Using the Galaxy-P Framework.

    PubMed

    Blank, Clemens; Easterly, Caleb; Gruening, Bjoern; Johnson, James; Kolmeder, Carolin A; Kumar, Praveen; May, Damon; Mehta, Subina; Mesuere, Bart; Brown, Zachary; Elias, Joshua E; Hervey, W Judson; McGowan, Thomas; Muth, Thilo; Nunn, Brook; Rudney, Joel; Tanca, Alessandro; Griffin, Timothy J; Jagtap, Pratik D

    2018-01-31

    The impact of microbial communities, also known as the microbiome, on human health and the environment is receiving increased attention. Studying translated gene products (proteins) and comparing metaproteomic profiles may elucidate how microbiomes respond to specific environmental stimuli, and interact with host organisms. Characterizing proteins expressed by a complex microbiome and interpreting their functional signature requires sophisticated informatics tools and workflows tailored to metaproteomics. Additionally, there is a need to disseminate these informatics resources to researchers undertaking metaproteomic studies, who could use them to make new and important discoveries in microbiome research. The Galaxy for proteomics platform (Galaxy-P) offers an open source, web-based bioinformatics platform for disseminating metaproteomics software and workflows. Within this platform, we have developed easily-accessible and documented metaproteomic software tools and workflows aimed at training researchers in their operation and disseminating the tools for more widespread use. The modular workflows encompass the core requirements of metaproteomic informatics: (a) database generation; (b) peptide spectral matching; (c) taxonomic analysis and (d) functional analysis. Much of the software available via the Galaxy-P platform was selected, packaged and deployed through an online metaproteomics "Contribution Fest" undertaken by a unique consortium of expert software developers and users from the metaproteomics research community, who have co-authored this manuscript. These resources are documented on GitHub and freely available through the Galaxy Toolshed, as well as a publicly accessible metaproteomics gateway Galaxy instance. These documented workflows are well suited for the training of novice metaproteomics researchers, through online resources such as the Galaxy Training Network, as well as hands-on training workshops. Here, we describe the metaproteomics tools available within these Galaxy-based resources, as well as the process by which they were selected and implemented in our community-based work. We hope this description will increase access to and utilization of metaproteomics tools, as well as offer a framework for continued community-based development and dissemination of cutting edge metaproteomics software.

  12. A Two-Stage Probabilistic Approach to Manage Personal Worklist in Workflow Management Systems

    NASA Astrophysics Data System (ADS)

    Han, Rui; Liu, Yingbo; Wen, Lijie; Wang, Jianmin

    The application of workflow scheduling to managing an individual actor's personal worklist is one area that can bring great improvement to business processes. However, current deterministic approaches cannot adapt to the dynamics and uncertainties in the management of personal worklists. To address this issue, this paper proposes a two-stage probabilistic approach that aims to assist actors in flexibly managing their personal worklists. Specifically, in the first stage the approach analyzes each activity instance's continuous probability of satisfying its deadline. Based on this stochastic analysis, in the second stage an innovative scheduling strategy is proposed to minimize the overall deadline violation cost for an actor's personal worklist. Simultaneously, the strategy recommends to the actor a feasible worklist of activity instances that meet the required bottom line of successful execution. The effectiveness of our approach is evaluated in a real-world workflow management system and with large-scale simulation experiments.
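
    The paper's exact probabilistic model is not given in this record; the sketch below is only an illustrative analogue of the two stages, assuming each activity instance has a normally distributed remaining duration. Stage 1 estimates the probability of meeting the deadline; stage 2 orders the worklist by expected violation cost. All numbers are hypothetical.

    ```python
    # Illustrative two-stage analogue (not the paper's exact model), assuming each
    # activity instance has a normally distributed remaining duration.
    import math
    from dataclasses import dataclass

    def normal_cdf(x: float, mean: float, std: float) -> float:
        return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))

    @dataclass
    class ActivityInstance:
        name: str
        mean_hours: float        # expected remaining work
        std_hours: float         # uncertainty of that estimate
        deadline_hours: float    # time left until the deadline
        violation_cost: float    # business cost if the deadline is missed

        def p_on_time(self) -> float:                 # stage 1
            return normal_cdf(self.deadline_hours, self.mean_hours, self.std_hours)

        def expected_violation_cost(self) -> float:   # input to stage 2
            return (1.0 - self.p_on_time()) * self.violation_cost

    worklist = [
        ActivityInstance("review contract", 3.0, 1.0, 4.0, violation_cost=500.0),
        ActivityInstance("approve invoice", 1.0, 0.5, 1.2, violation_cost=200.0),
        ActivityInstance("update report",   2.0, 0.8, 6.0, violation_cost=100.0),
    ]
    # Stage 2: tackle the items with the highest expected violation cost first.
    for a in sorted(worklist, key=ActivityInstance.expected_violation_cost, reverse=True):
        print(f"{a.name:>16}: P(on time) = {a.p_on_time():.2f}, "
              f"expected cost = {a.expected_violation_cost():.0f}")
    ```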

  13. Development of a web-based toolkit to support improvement of care coordination in primary care.

    PubMed

    Ganz, David A; Barnard, Jenny M; Smith, Nina Z Y; Miake-Lye, Isomi M; Delevan, Deborah M; Simon, Alissa; Rose, Danielle E; Stockdale, Susan E; Chang, Evelyn T; Noël, Polly H; Finley, Erin P; Lee, Martin L; Zulman, Donna M; Cordasco, Kristina M; Rubenstein, Lisa V

    2018-05-23

    Promising practices for the coordination of chronic care exist, but how to select and share these practices to support quality improvement within a healthcare system is uncertain. This study describes an approach for selecting high-quality tools for an online care coordination toolkit to be used in Veterans Health Administration (VA) primary care practices. We evaluated tools in three steps: (1) an initial screening to identify tools relevant to care coordination in VA primary care, (2) a two-clinician expert review process assessing tool characteristics (e.g., frequency of the problem addressed, linkage to patients' experience of care, effect on practice workflow, and sustainability with existing resources) and assigning each tool a summary rating, and (3) semi-structured interviews with VA patients and frontline clinicians and staff. Of 300 potentially relevant tools identified by searching online resources, 65, 38, and 18 remained after steps one, two and three, respectively. The 18 tools cover the following topics: managing referrals to specialty care, medication management, patient after-visit summary, patient activation materials, agenda setting, patient pre-visit packet, and provider contact information for patients. The final toolkit provides access to the 18 tools, as well as detailed information about each tool's expected benefits and the resources required for its implementation. Future care coordination efforts can benefit from systematically reviewing available tools to identify those that are high quality and relevant.

  14. Cloud-Based Tools to Support High-Resolution Modeling (Invited)

    NASA Astrophysics Data System (ADS)

    Jones, N.; Nelson, J.; Swain, N.; Christensen, S.

    2013-12-01

    The majority of watershed models developed to support decision-making by water management agencies are simple, lumped-parameter models. Maturity in research codes and advances in the computational power available from multi-core processors on desktop machines, commercial cloud-computing resources, and supercomputers with thousands of cores have created new opportunities for employing more accurate, high-resolution distributed models for routine use in decision support. The barriers to using such models on a more routine basis include the massive amounts of spatial data that must be processed for each new scenario and the lack of efficient visualization tools. In this presentation we will review a current NSF-funded project called CI-WATER that is intended to overcome many of the roadblocks associated with high-resolution modeling. We are developing a suite of tools that will make it possible to deploy customized web-based apps for running custom scenarios for high-resolution models with minimal effort. These tools are based on a software stack that includes 52 North, MapServer, PostGIS, HTCondor, CKAN, and Python. This open source stack provides a simple scripting environment for quickly configuring new custom applications for running high-resolution models as geoprocessing workflows. The HTCondor component facilitates simple access to local distributed computers or commercial cloud resources when necessary for stochastic simulations. The CKAN framework provides a powerful suite of tools for hosting such workflows in a web-based environment that includes visualization tools and storage of model simulations in a database for archival, querying, and sharing of model results. Prototype applications including land use change, snow melt, and burned area analysis will be presented. This material is based upon work supported by the National Science Foundation under Grant No. 1135482.
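
    As a hedged illustration of how the HTCondor layer of such a stack can be driven from Python (this is not CI-WATER code), the sketch below queues a batch of hypothetical model runs through the htcondor bindings, assuming their modern Schedd.submit() API (HTCondor 9+); the executable name and arguments are placeholders.

    ```python
    # Sketch of dispatching a geoprocessing step through HTCondor, assuming the
    # htcondor Python bindings and their modern Schedd.submit() API are installed.
    # The executable and arguments are hypothetical placeholders.
    import os
    import htcondor

    def submit_geoprocessing_job(script: str, watershed_id: str, runs: int = 1):
        os.makedirs("logs", exist_ok=True)             # HTCondor writes logs here
        sub = htcondor.Submit({
            "executable": script,                      # e.g. a snowmelt model wrapper
            "arguments": f"--watershed {watershed_id} --run $(ProcId)",
            "output": "logs/run_$(ProcId).out",
            "error": "logs/run_$(ProcId).err",
            "log": "logs/cluster.log",
            "request_cpus": "1",
            "request_memory": "2GB",
        })
        schedd = htcondor.Schedd()                     # local scheduler daemon
        result = schedd.submit(sub, count=runs)        # queue one job per run
        return result.cluster()

    if __name__ == "__main__":
        print("Submitted cluster", submit_geoprocessing_job("run_model.sh", "provo_river", runs=10))
    ```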

  15. Overcoming Barriers to Technology Adoption in Small Manufacturing Enterprises (SMEs)

    DTIC Science & Technology

    2003-06-01

    automates quote-generation, order-processing workflow management, performance analysis, and accounting functions. Ultimately, it will enable Magdic... that Magdic implement an MES instead. The MES, in addition to solving the problem of document management, would automate quote-generation, order processing, workflow management, performance analysis, and accounting functions. To help Magdic personnel learn about the MES, TIDE personnel provided

  16. Iterative Development of an Application to Support Nuclear Magnetic Resonance Data Analysis of Proteins.

    PubMed

    Ellis, Heidi J C; Nowling, Ronald J; Vyas, Jay; Martyn, Timothy O; Gryk, Michael R

    2011-04-11

    The CONNecticut Joint University Research (CONNJUR) team is a group of biochemical and software engineering researchers at multiple institutions. The vision of the team is to develop a comprehensive application that integrates a variety of existing analysis tools with workflow and data management to support the process of protein structure determination using Nuclear Magnetic Resonance (NMR). The use of multiple disparate tools and lack of data management, currently the norm in NMR data processing, provides strong motivation for such an integrated environment. This manuscript briefly describes the domain of NMR as used for protein structure determination and explains the formation of the CONNJUR team and its operation in developing the CONNJUR application. The manuscript also describes the evolution of the CONNJUR application through four prototypes and describes the challenges faced while developing the CONNJUR application and how those challenges were met.

  17. Development of a novel imaging informatics-based system with an intelligent workflow engine (IWEIS) to support imaging-based clinical trials

    PubMed Central

    Wang, Ximing; Liu, Brent J; Martinez, Clarisa; Zhang, Xuejun; Winstein, Carolee J

    2015-01-01

    Imaging-based clinical trials can benefit from a solution to efficiently collect, analyze, and distribute multimedia data at various stages within the workflow. Currently, the data management needs of these trials are typically addressed with custom-built systems. However, software development of custom-built systems for versatile workflows can be resource-consuming. To address these challenges, we present a system with a workflow engine for imaging-based clinical trials. The system enables a project coordinator to build a data collection and management system specifically tailored to the study protocol workflow without programming. A Web Access to DICOM Objects (WADO) module with novel features is integrated to further facilitate imaging-related studies. The system was initially evaluated in an imaging-based rehabilitation clinical trial. The evaluation shows that development costs can be much reduced compared to a custom-built system. By providing a way to customize a system and automate the workflow, the system saves development time and reduces errors, especially for imaging clinical trials. PMID:25870169
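
    The record mentions a Web Access to DICOM Objects (WADO) module; as a hedged illustration of the standard WADO-URI request style such a module builds on (not the IWEIS implementation), the sketch below retrieves one DICOM object rendered as JPEG. The server URL and UIDs are placeholders.

    ```python
    # Illustrative WADO-URI retrieval (DICOM PS3.18 style). The server URL and the
    # study/series/object UIDs are placeholders, not the IWEIS system's endpoint.
    import requests

    WADO_URL = "https://imaging.example.org/wado"      # hypothetical WADO service

    def fetch_dicom_as_jpeg(study_uid: str, series_uid: str, object_uid: str) -> bytes:
        """Ask the WADO service to render a single DICOM object as JPEG."""
        params = {
            "requestType": "WADO",
            "studyUID": study_uid,
            "seriesUID": series_uid,
            "objectUID": object_uid,
            "contentType": "image/jpeg",               # omit to get application/dicom
        }
        resp = requests.get(WADO_URL, params=params, timeout=60)
        resp.raise_for_status()
        return resp.content

    if __name__ == "__main__":
        jpeg = fetch_dicom_as_jpeg("1.2.840.111", "1.2.840.111.1", "1.2.840.111.1.3")
        with open("slice.jpg", "wb") as fh:
            fh.write(jpeg)
    ```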

  18. How to Take HRMS Process Management to the Next Level with Workflow Business Event System

    NASA Technical Reports Server (NTRS)

    Rajeshuni, Sarala; Yagubian, Aram; Kunamaneni, Krishna

    2006-01-01

    Oracle Workflow with the Business Event System offers a complete process management solution for enterprises to manage business processes cost-effectively. Using Workflow event messaging, event subscriptions, AQ Servlet and advanced queuing technologies, this presentation will demonstrate the step-by-step design and implementation of system solutions in order to integrate two dissimilar systems and establish communication remotely. As a case study, the presentation walks you through the process of propagating organization name changes in other applications that originated from the HRMS module without changing applications code. The solution can be applied to your particular business cases for streamlining or modifying business processes across Oracle and non-Oracle applications.

  19. MO-B-BRB-03: Systems Engineering Tools for Treatment Planning Process Optimization in Radiation Medicine

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kapur, A.

    The radiotherapy treatment planning process has evolved over the years with innovations in treatment planning, treatment delivery and imaging systems. Treatment modality and simulation technologies are also rapidly improving and affecting the planning process. For example, image-guided radiation therapy has been widely adopted for patient setup, leading to margin reduction and isocenter repositioning after simulation. Stereotactic body radiation therapy (SBRT) and radiosurgery (SRS) have gradually become the standard of care for many treatment sites, which demand a higher throughput for treatment plans even if the number of treatments per day remains the same. Finally, simulation, planning and treatment are traditionally sequential events. However, with emerging adaptive radiotherapy, they are becoming more tightly intertwined, leading to iterative processes. Enhanced planning efficiency is therefore becoming more critical and poses a serious challenge to the treatment planning process; Lean Six Sigma approaches are being utilized increasingly to balance the competing needs for speed and quality. In this symposium we will discuss the treatment planning process and illustrate effective techniques for managing workflow. Topics will include: Planning techniques: (a) beam placement, (b) dose optimization, (c) plan evaluation, (d) export to RVS. Planning workflow: (a) import images, (b) image fusion, (c) contouring, (d) plan approval, (e) plan check, (f) chart check, (g) sequential and iterative processes. Influence of upstream and downstream operations: (a) simulation, (b) immobilization, (c) motion management, (d) QA, (e) IGRT, (f) treatment delivery, (g) SBRT/SRS, (h) adaptive planning. Reduction of delay between planning steps with Lean systems due to: (a) communication, (b) limited resources, (c) contouring, (d) plan approval, (e) treatment. Optimizing planning processes: (a) contour validation, (b) consistent planning protocols, (c) protocol/template sharing, (d) semi-automatic plan evaluation, (e) quality checklists for error prevention, (f) iterative processes, (g) balance of speed and quality. Learning Objectives: Gain familiarity with the workflow of the modern treatment planning process. Understand the scope and challenges of managing modern treatment planning processes. Gain familiarity with Lean Six Sigma approaches and their implementation in the treatment planning workflow.

  20. Defining Usability Heuristics for Adoption and Efficiency of an Electronic Workflow Document Management System

    ERIC Educational Resources Information Center

    Fuentes, Steven

    2017-01-01

    Usability heuristics have been established for different uses and applications as general guidelines for user interfaces. These can affect the implementation of industry solutions and play a significant role regarding cost reduction and process efficiency. The area of electronic workflow document management (EWDM) solutions, also known as…

  1. Towards an intelligent hospital environment: OR of the future.

    PubMed

    Sutherland, Jeffrey V; van den Heuvel, Willem-Jan; Ganous, Tim; Burton, Matthew M; Kumar, Animesh

    2005-01-01

    Patients, providers, payers, and government demand more effective and efficient healthcare services, and the healthcare industry needs innovative ways to re-invent core processes. Business process reengineering (BPR) showed that adopting new hospital information systems can leverage this transformation and that workflow management technologies can automate process management. Our research indicates that workflow technologies in healthcare require real-time patient monitoring, detection of adverse events, and adaptive responses to breakdowns in normal processes. Adaptive workflow systems are rarely implemented, making current workflow implementations inappropriate for healthcare. The advent of evidence-based medicine, guideline-based practice, and better understanding of cognitive workflow, combined with novel technologies including Radio Frequency Identification (RFID), mobile/wireless technologies, internet workflow, intelligent agents, and Service Oriented Architectures (SOA), opens up new and exciting ways of automating business processes. Total situational awareness of events, timing, and location of healthcare activities can generate self-organizing change in the behaviors of humans and machines. A test bed of a novel approach towards continuous process management was designed for the new Weinberg Surgery Building at the University of Maryland Medical Center. Early results based on clinical process mapping and analysis of patient flow bottlenecks demonstrated 100% improvement in delivery of supplies and instruments at surgery start time. This work has been directly applied to the design of the DARPA Trauma Pod research program, where robotic surgery will be performed on wounded soldiers on the battlefield.

  2. CASAS: A tool for composing automatically and semantically astrophysical services

    NASA Astrophysics Data System (ADS)

    Louge, T.; Karray, M. H.; Archimède, B.; Knödlseder, J.

    2017-07-01

    Multiple astronomical datasets are available through the internet and the astrophysical Distributed Computing Infrastructure (DCI) called the Virtual Observatory (VO). Some scientific workflow technologies exist for retrieving and combining data from those sources. However, the selection of relevant services, the automation of workflow composition and the lack of user-friendly platforms remain a concern. This paper presents CASAS, a tool for semantic web service composition in astrophysics. The tool provides automatic, semantics-based composition of astrophysical web services into workflows; it widens the choice of services and eases the use of heterogeneous services. Semantic web service composition relies on ontologies for elaborating the composition; this work is based on the Astrophysical Services ONtology (ASON). ASON's structure is largely inherited from VO service capabilities. Nevertheless, our approach is not limited to the VO and brings VO and non-VO services together without the need for premade recipes. CASAS is available for use through a simple web interface.

  3. Use of mechanistic simulations as a quantitative risk-ranking tool within the quality by design framework.

    PubMed

    Stocker, Elena; Toschkoff, Gregor; Sacher, Stephan; Khinast, Johannes G

    2014-11-20

    The purpose of this study is to evaluate the use of computer simulations for generating quantitative knowledge as a basis for risk ranking and mechanistic process understanding, as required by ICH Q9 on quality risk management systems. The main focus of this publication is the demonstration of a risk assessment workflow, including a computer simulation for the generation of mechanistic understanding of active tablet coating in a pan coater. Process parameter screening studies are statistically planned under consideration of impacts on a potentially critical quality attribute, i.e., coating mass uniformity. Based on the computer simulation data, a process failure mode and effects analysis of the risk factors is performed. This results in a quantitative criticality assessment of process parameters and a risk priority evaluation of failure modes. The factor for the quantitative reassessment of criticality and risk priority is the coefficient of variation, which represents coating mass uniformity. The major conclusion drawn from this work is a successful demonstration of the integration of computer simulation into the risk management workflow, leading to an objective and quantitative risk assessment. Copyright © 2014. Published by Elsevier B.V.
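
    Since the quantitative reassessment hinges on the coefficient of variation of the simulated coating mass, a minimal sketch of that step is given below; the parameter settings and mass values are hypothetical, and the ranking logic is only illustrative of the idea, not the authors' FMEA procedure.

    ```python
    # Minimal sketch of the quantitative reassessment step: for each simulated
    # process-parameter setting, compute the coefficient of variation (CV) of the
    # per-tablet coating mass and rank the settings by it. Values are hypothetical.
    import statistics

    def coefficient_of_variation(coating_masses_mg: list[float]) -> float:
        mean = statistics.mean(coating_masses_mg)
        return statistics.stdev(coating_masses_mg) / mean

    # Hypothetical simulation output: coating mass per tablet for three settings.
    simulated = {
        "low spray rate / high pan speed":  [19.2, 20.1, 19.8, 20.4, 19.5],
        "high spray rate / low pan speed":  [17.9, 21.6, 19.0, 22.3, 18.4],
        "nominal setting":                  [19.8, 20.0, 20.2, 19.9, 20.1],
    }

    ranked = sorted(simulated.items(), key=lambda kv: coefficient_of_variation(kv[1]),
                    reverse=True)  # highest CV = worst uniformity = highest risk
    for setting, masses in ranked:
        print(f"{setting:32s} CV = {coefficient_of_variation(masses):.3f}")
    ```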

  4. The EVER-EST portal as support for the Sea Monitoring Virtual Research Community, through the sharing of resources, enabling dynamic collaboration and promoting community engagement

    NASA Astrophysics Data System (ADS)

    Foglini, Federica; Grande, Valentina; De Leo, Francesco; Mantovani, Simone; Ferraresi, Sergio

    2017-04-01

    EVER-EST offers a framework based on advanced services delivered both at the e-infrastructure and at the domain-specific level, with the objective of supporting each phase of the Earth Science research and information lifecycle. It provides innovative e-research services to Earth Science user communities for communication, cross-validation and the sharing of knowledge and science outputs. The project follows a user-centric approach: real use cases taken from pre-selected Virtual Research Communities (VRC) covering different Earth Science research scenarios drive the implementation of the Virtual Research Environment (VRE) services and capabilities. The Sea Monitoring community is involved in the evaluation of the EVER-EST infrastructure. The community of potential users is wide and heterogeneous, including both multi-disciplinary scientists and national/international agencies and authorities (e.g. MPA directors, technicians from regional agencies such as ARPA in Italy, and technicians working for the Ministry of the Environment) seeking better ways of measuring the quality of the environment. The scientific community has the main role of assessing the best criteria and indicators for defining Good Environmental Status (GES) in their own sub-regions, and of implementing methods, protocols and tools for monitoring the GES descriptors. According to the Marine Strategy Framework Directive (MSFD), the environmental status of marine waters is defined by 11 descriptors, with a proposed set of 29 associated criteria and 56 different indicators. The objective of the Sea Monitoring VRC is to provide useful and applicable contributions to the evaluation of the descriptors D1 (Biodiversity), D2 (Non-indigenous species) and D6 (Seafloor Integrity) (http://ec.europa.eu/environment/marine/good-environmental-status/index_en.htm). The main challenges for the community members are: 1. discovery of existing data and products distributed among different infrastructures; 2. sharing methodologies for GES evaluation and monitoring; 3. working on the same workflows and data; 4. adopting shared, powerful tools for data processing (e.g. software and servers). The Sea Monitoring portal provides the VRC users with tools and services aimed at enhancing their ability to interoperate and share knowledge, experience and methods for GES assessment and monitoring, such as: • digital information services for data management, exploitation and preservation (accessibility of heterogeneous data sources including associated documentation); • e-collaboration services to communicate and share knowledge, ideas, protocols and workflows; • e-learning services to facilitate the use of common workflows for assessing GES indicators; • e-research services for workflow management, validation and verification, as well as visualization and interactive services. The current study is co-financed by the European Union's Horizon 2020 research and innovation programme under the EVER-EST project (Grant Agreement No. 674907).

  5. Simulation environment and graphical visualization environment: a COPD use-case.

    PubMed

    Huertas-Migueláñez, Mercedes; Mora, Daniel; Cano, Isaac; Maier, Dieter; Gomez-Cabrero, David; Lluch-Ariet, Magí; Miralles, Felip

    2014-11-28

    Today, many different tools have been developed to execute and visualize physiological models that represent human physiology. Most of these tools run models written in very specific programming languages, which in turn simplifies communication among models. Nevertheless, not all of these tools are able to run models written in different programming languages. In addition, interoperability between such models remains an unresolved issue. In this paper we present a simulation environment that allows, first, the execution of models developed in different programming languages and, second, the communication of parameters to interconnect these models. This simulation environment, developed within the Synergy-COPD project, aims at helping and supporting bio-researchers and medical students in understanding the internal mechanisms of the human body through the use of physiological models. The tool is composed of a graphical visualization environment, which is a web interface through which the user interacts with the models, and a simulation workflow management system composed of a control module and a data warehouse manager. The control module monitors the correct functioning of the whole system. The data warehouse manager is responsible for managing the stored information and supporting its flow among the different modules. The simulation environment presented here has been shown to allow users to research and study the internal mechanisms of human physiology through the use of models via a graphical visualization environment. A new tool for bio-researchers is ready for deployment in various use-case scenarios.

  6. Using bio.tools to generate and annotate workbench tool descriptions

    PubMed Central

    Hillion, Kenzo-Hugo; Kuzmin, Ivan; Khodak, Anton; Rasche, Eric; Crusoe, Michael; Peterson, Hedi; Ison, Jon; Ménager, Hervé

    2017-01-01

    Workbench and workflow systems such as Galaxy, Taverna, Chipster, or Common Workflow Language (CWL)-based frameworks, facilitate the access to bioinformatics tools in a user-friendly, scalable and reproducible way. Still, the integration of tools in such environments remains a cumbersome, time consuming and error-prone process. A major consequence is the incomplete or outdated description of tools that are often missing important information, including parameters and metadata such as publication or links to documentation. ToolDog (Tool DescriptiOn Generator) facilitates the integration of tools - which have been registered in the ELIXIR tools registry (https://bio.tools) - into workbench environments by generating tool description templates. ToolDog includes two modules. The first module analyses the source code of the bioinformatics software with language-specific plugins, and generates a skeleton for a Galaxy XML or CWL tool description. The second module is dedicated to the enrichment of the generated tool description, using metadata provided by bio.tools. This last module can also be used on its own to complete or correct existing tool descriptions with missing metadata. PMID:29333231

  7. BioVeL: a virtual laboratory for data analysis and modelling in biodiversity science and ecology.

    PubMed

    Hardisty, Alex R; Bacall, Finn; Beard, Niall; Balcázar-Vargas, Maria-Paula; Balech, Bachir; Barcza, Zoltán; Bourlat, Sarah J; De Giovanni, Renato; de Jong, Yde; De Leo, Francesca; Dobor, Laura; Donvito, Giacinto; Fellows, Donal; Guerra, Antonio Fernandez; Ferreira, Nuno; Fetyukova, Yuliya; Fosso, Bruno; Giddy, Jonathan; Goble, Carole; Güntsch, Anton; Haines, Robert; Ernst, Vera Hernández; Hettling, Hannes; Hidy, Dóra; Horváth, Ferenc; Ittzés, Dóra; Ittzés, Péter; Jones, Andrew; Kottmann, Renzo; Kulawik, Robert; Leidenberger, Sonja; Lyytikäinen-Saarenmaa, Päivi; Mathew, Cherian; Morrison, Norman; Nenadic, Aleksandra; de la Hidalga, Abraham Nieva; Obst, Matthias; Oostermeijer, Gerard; Paymal, Elisabeth; Pesole, Graziano; Pinto, Salvatore; Poigné, Axel; Fernandez, Francisco Quevedo; Santamaria, Monica; Saarenmaa, Hannu; Sipos, Gergely; Sylla, Karl-Heinz; Tähtinen, Marko; Vicario, Saverio; Vos, Rutger Aldo; Williams, Alan R; Yilmaz, Pelin

    2016-10-20

    Making forecasts about biodiversity and giving support to policy relies increasingly on large collections of data held electronically, and on substantial computational capability and capacity to analyse, model, simulate and predict using such data. However, the physically distributed nature of data resources and of expertise in advanced analytical tools creates many challenges for the modern scientist. Across the wider biological sciences, presenting such capabilities on the Internet (as "Web services") and using scientific workflow systems to compose them for particular tasks is a practical way to carry out robust "in silico" science. However, use of this approach in biodiversity science and ecology has thus far been quite limited. BioVeL is a virtual laboratory for data analysis and modelling in biodiversity science and ecology, freely accessible via the Internet. BioVeL includes functions for accessing and analysing data through curated Web services; for performing complex in silico analysis through exposure of R programs, workflows, and batch processing functions; for on-line collaboration through sharing of workflows and workflow runs; for experiment documentation through reproducibility and repeatability; and for computational support via seamless connections to supporting computing infrastructures. We developed and improved more than 60 Web services with significant potential in many different kinds of data analysis and modelling tasks. We composed reusable workflows using these Web services, also incorporating R programs. Deploying these tools into an easy-to-use and accessible 'virtual laboratory', free via the Internet, we applied the workflows in several diverse case studies. We opened the virtual laboratory for public use and through a programme of external engagement we actively encouraged scientists and third party application and tool developers to try out the services and contribute to the activity. Our work shows we can deliver an operational, scalable and flexible Internet-based virtual laboratory to meet new demands for data processing and analysis in biodiversity science and ecology. In particular, we have successfully integrated existing and popular tools and practices from different scientific disciplines to be used in biodiversity and ecological research.

  8. Accounting for aquifer heterogeneity from geological data to management tools.

    PubMed

    Blouin, Martin; Martel, Richard; Gloaguen, Erwan

    2013-01-01

    A nested workflow of multiple-point geostatistics (MPG) and sequential Gaussian simulation (SGS) was tested on a study area of 6 km² located about 20 km northwest of Quebec City, Canada. In order to assess its geological and hydrogeological parameter heterogeneity and to provide tools to evaluate uncertainties in aquifer management, direct and indirect field measurements are used as inputs to the geostatistical simulations to reproduce large- and small-scale heterogeneities. To do so, the lithological information is first associated with equivalent hydrogeological facies (hydrofacies) according to hydraulic properties measured at several wells. Then, heterogeneous hydrofacies (HF) realizations are generated using a prior geological model as a training image (TI) with the MPG algorithm. The hydraulic conductivity (K) heterogeneity within each HF is finally modeled using the SGS algorithm. Different K models are integrated into a finite-element hydrogeological model to calculate multiple transport simulations. Different scenarios exhibit variations in mass transport path and dispersion associated with the large- and small-scale heterogeneity, respectively. Three-dimensional maps showing the probability of exceeding different thresholds are presented as examples of management tools. © 2012, The Author(s). Groundwater © 2012, National Ground Water Association.
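
    The threshold-exceedance probability maps mentioned at the end can be computed directly from an ensemble of simulated fields; the sketch below shows that step for a hypothetical stack of K realizations (grid size, distribution and threshold are invented for illustration, and no MPG or SGS engine is implemented here).

    ```python
    # Sketch of deriving a threshold-exceedance probability map from an ensemble of
    # simulated hydraulic conductivity (K) fields: the probability at each cell is
    # the fraction of realizations exceeding the threshold. Shapes and threshold
    # are hypothetical.
    import numpy as np

    def exceedance_probability(realizations: np.ndarray, threshold: float) -> np.ndarray:
        """realizations: array of shape (n_realizations, ny, nx) of K values."""
        return (realizations > threshold).mean(axis=0)

    rng = np.random.default_rng(42)
    # 200 hypothetical realizations of log10(K) on a 50 x 50 grid.
    ensemble = rng.normal(loc=-4.0, scale=0.5, size=(200, 50, 50))

    p_map = exceedance_probability(ensemble, threshold=-3.5)   # P[log10(K) > -3.5]
    print(p_map.shape, float(p_map.min()), float(p_map.max()))
    ```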

  9. The United States Geological Survey Science Data Lifecycle Model

    USGS Publications Warehouse

    Faundeen, John L.; Burley, Thomas E.; Carlino, Jennifer A.; Govoni, David L.; Henkel, Heather S.; Holl, Sally L.; Hutchison, Vivian B.; Martín, Elizabeth; Montgomery, Ellyn T.; Ladino, Cassandra; Tessler, Steven; Zolly, Lisa S.

    2014-01-01

    U.S. Geological Survey (USGS) data represent corporate assets with potential value beyond any immediate research use, and therefore need to be accounted for and properly managed throughout their lifecycle. Recognizing these motives, a USGS team developed a Science Data Lifecycle Model (SDLM) as a high-level view of data—from conception through preservation and sharing—to illustrate how data management activities relate to project workflows, and to assist with understanding the expectations of proper data management. In applying the Model to research activities, USGS scientists can ensure that data products will be well-described, preserved, accessible, and fit for re-use. The Model also serves as a structure to help the USGS evaluate and improve policies and practices for managing scientific data, and to identify areas in which new tools and standards are needed.

  10. Enhancing population pharmacokinetic modeling efficiency and quality using an integrated workflow.

    PubMed

    Schmidt, Henning; Radivojevic, Andrijana

    2014-08-01

    Population pharmacokinetic (popPK) analyses are at the core of pharmacometrics and need to be performed regularly. Although these analyses are relatively standard, large variability can be observed both in the time they take (efficiency) and in the way they are performed (quality). The main reasons for this variability include the level of experience of the modeler, personal preferences and tools. This paper examines how the process of popPK model building can be supported in order to increase its efficiency and quality. The presented approach to the conduct of popPK analyses is centered around three key components: (1) identification of the most common and important popPK model features, (2) the required information content and formatting of the data for modeling, and (3) methodology, workflow and workflow-supporting tools. This approach has been used in several popPK modeling projects and a documented example is provided in the supplementary material. Efficiency of model building is improved by avoiding repetitive coding and other labor-intensive tasks and by putting the emphasis on a fit-for-purpose model. Quality is improved by ensuring that the workflow and tools are in alignment with a popPK modeling guidance established within the organization. The main conclusion of this paper is that workflow-based approaches to popPK modeling are feasible and have significant potential to ameliorate its various aspects. However, the implementation of such an approach in a pharmacometric organization requires openness towards innovation and change, the key ingredient for the evolution of integrative and quantitative drug development in the pharmaceutical industry.

  11. gProcess and ESIP Platforms for Satellite Imagery Processing over the Grid

    NASA Astrophysics Data System (ADS)

    Bacu, Victor; Gorgan, Dorian; Rodila, Denisa; Pop, Florin; Neagu, Gabriel; Petcu, Dana

    2010-05-01

    The Environment oriented Satellite Data Processing Platform (ESIP) has been developed within SEE-GRID-SCI (SEE-GRID eInfrastructure for regional eScience), co-funded by the European Commission through FP7 [1]. The gProcess platform [2] is a set of tools and services supporting the development and execution over the Grid of workflow-based processing, particularly satellite imagery processing. ESIP [3], [4] is built on top of the gProcess platform by adding a set of satellite image processing software modules and meteorological algorithms. Satellite images can reveal and supply important information on Earth surface parameters, climate data, pollution levels and weather conditions that can be used in different research areas. Generally, satellite image processing algorithms can be decomposed into a set of modules that form a graph representation of the processing workflow. Two types of workflows can be defined in the gProcess platform: the abstract workflow (PDG, Process Description Graph), in which the user defines the algorithm conceptually, and the instantiated workflow (iPDG, instantiated PDG), which maps the PDG pattern onto particular satellite images and meteorological data [5]. The gProcess platform allows the definition of complex workflows by combining data resources, operators, services and sub-graphs. The gProcess platform is developed for the gLite middleware available in the EGEE and SEE-GRID infrastructures [6], and exposes its functionality through web services [7]. The Editor Web Service retrieves information on the available resources that are used to develop complex workflows (available operators, sub-graphs, services, supported resources, etc.). The Manager Web Service deals with resource management (uploading new resources such as workflows, operators, services, data, etc.) and in addition retrieves information on workflows. The Executor Web Service manages the execution of instantiated workflows on the Grid infrastructure; it also monitors the execution and generates statistical data that are important for evaluating performance and optimizing execution. The Viewer Web Service allows access to input and output data. To prove and validate the utility of the gProcess and ESIP platforms, the GreenView and GreenLand applications were developed. The GreenView-related functionality includes the refinement of meteorological data such as temperature, and the calibration of satellite images based on field measurements. The GreenLand application performs the classification of satellite images using a set of vegetation indices. The gProcess and ESIP platforms are also used in the GiSHEO project [8] to support the processing of Earth Observation data over the Grid in eGLE (the GiSHEO eLearning Environment). Performance assessment experiments revealed that workflow-based execution can improve the execution time of a satellite image processing algorithm [9]. Executing every workflow node on a different machine is not always a reliable solution: some nodes take longer than others and slow down the overall execution, so the total execution time suffers. It is therefore important to balance the workflow nodes correctly. Based on an optimization strategy, the workflow nodes can be grouped horizontally, vertically or in a hybrid approach. In this way, grouped operators are executed on one machine and the data transfer between workflow nodes is reduced. The dynamic nature of the Grid infrastructure makes it more exposed to failures, which can occur at the level of worker nodes, service availability, storage elements, etc. Currently gProcess supports some basic error prevention and error management solutions; more advanced solutions will be integrated into the gProcess platform in the future.
    References:
    [1] SEE-GRID-SCI Project, http://www.see-grid-sci.eu/
    [2] Bacu V., Stefanut T., Rodila D., Gorgan D., Process Description Graph Composition by gProcess Platform. HiPerGRID - 3rd International Workshop on High Performance Grid Middleware, 28 May, Bucharest. Proceedings of CSCS-17 Conference, Vol. 2, ISSN 2066-4451, pp. 423-430 (2009).
    [3] ESIP Platform, http://wiki.egee-see.org/index.php/JRA1_Commonalities
    [4] Gorgan D., Bacu V., Rodila D., Pop Fl., Petcu D., Experiments on ESIP - Environment oriented Satellite Data Processing Platform. SEE-GRID-SCI User Forum, 9-10 Dec 2009, Bogazici University, Istanbul, Turkey, ISBN: 978-975-403-510-0, pp. 157-166 (2009).
    [5] Radu A., Bacu V., Gorgan D., Diagrammatic Description of Satellite Image Processing Workflow. Workshop on Grid Computing Applications Development (GridCAD) at the SYNASC Symposium, 28 September 2007, Timisoara, IEEE Computer Press, ISBN 0-7695-3078-8, pp. 341-348 (2007).
    [6] Gorgan D., Bacu V., Stefanut T., Rodila D., Mihon D., Grid based Satellite Image Processing Platform for Earth Observation Applications Development. IDAACS'2009 - IEEE Fifth International Workshop on "Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications", 21-23 September, Cosenza, Italy, IEEE Computer Press, pp. 247-252 (2009).
    [7] Rodila D., Bacu V., Gorgan D., Integration of Satellite Image Operators as Workflows in the gProcess Application. Proceedings of ICCP2009 - IEEE 5th International Conference on Intelligent Computer Communication and Processing, 27-29 Aug 2009, Cluj-Napoca, ISBN: 978-1-4244-5007-7, pp. 355-358 (2009).
    [8] GiSHEO consortium, Project site, http://gisheo.info.uvt.ro
    [9] Bacu V., Gorgan D., Graph Based Evaluation of Satellite Imagery Processing over Grid. ISPDC 2008 - 7th International Symposium on Parallel and Distributed Computing, July 1-5, 2008, Krakow, Poland, IEEE Computer Society, ISBN: 978-0-7695-3472-5, pp. 147-154 (2008).

  12. Biocuration workflows and text mining: overview of the BioCreative 2012 Workshop Track II.

    PubMed

    Lu, Zhiyong; Hirschman, Lynette

    2012-01-01

    Manual curation of data from the biomedical literature is a rate-limiting factor for many expert curated databases. Despite the continuing advances in biomedical text mining and the pressing needs of biocurators for better tools, few existing text-mining tools have been successfully integrated into production literature curation systems such as those used by the expert curated databases. To close this gap and better understand all aspects of literature curation, we invited submissions of written descriptions of curation workflows from expert curated databases for the BioCreative 2012 Workshop Track II. We received seven qualified contributions, primarily from model organism databases. Based on these descriptions, we identified commonalities and differences across the workflows, the common ontologies and controlled vocabularies used and the current and desired uses of text mining for biocuration. Compared to a survey done in 2009, our 2012 results show that many more databases are now using text mining in parts of their curation workflows. In addition, the workshop participants identified text-mining aids for finding gene names and symbols (gene indexing), prioritization of documents for curation (document triage) and ontology concept assignment as those most desired by the biocurators. DATABASE URL: http://www.biocreative.org/tasks/bc-workshop-2012/workflow/.

  13. Confidentiality Protection of User Data and Adaptive Resource Allocation for Managing Multiple Workflow Performance in Service-Based Systems

    ERIC Educational Resources Information Center

    An, Ho

    2012-01-01

    In this dissertation, two interrelated problems of service-based systems (SBS) are addressed: protecting users' data confidentiality from service providers, and managing performance of multiple workflows in SBS. Current SBSs pose serious limitations to protecting users' data confidentiality. Since users' sensitive data is sent in…

  14. The Implementation and Evaluation of the Patient Admission Prediction Tool: Assessing Its Impact on Decision-Making Strategies and Patient Flow Outcomes in 2 Australian Hospitals.

    PubMed

    Crilly, Julia L; Boyle, Justin; Jessup, Melanie; Wallis, Marianne; Lind, James; Green, David; FitzGerald, Gerry

    2015-01-01

    To evaluate the implementation of a Patient Admission Prediction Tool (PAPT) in terms of patient flow outcomes and decision-making strategies. The PAPT was implemented in 2 Australian public teaching hospitals during October-December 2010 (hospital A) and October-December 2011 (hospital B). A multisite, prospective, comparative (before and after) design was used. Patient flow outcomes measured included access block and hospital occupancy. Daily and weekly data were collected by the site champion and researchers from patient flow reports and routinely collected emergency department information. Daily decision-making strategies ranged from business as usual to the use of overcensus beds. Weekly strategies included advance approval to use overcensus beds and prebooking of nursing staff. These strategies resulted in improved weekend discharges to manage incoming demand for the following week. Following the introduction of the PAPT and workflow guidelines, patient access and hospital occupancy levels could be maintained despite increases in patient presentations (hospital A). The use of a PAPT, embedded in patient flow management processes and championed by a manager, can benefit bed and staff management. Further research that incorporates wider evaluation of the use of the tool at other sites is warranted.

  15. Changes in the cardiac rehabilitation workflow process needed for the implementation of a self-management system.

    PubMed

    Wiggers, Anne-Marieke; Vosbergen, Sandra; Kraaijenhagen, Roderik; Jaspers, Monique; Peek, Niels

    2013-01-01

    E-health interventions are of growing importance for the self-management of chronic conditions. This study aimed to describe the process adaptations that are needed in cardiac rehabilitation (CR) to implement a self-management system called MyCARDSS. We created a generic workflow model based on interviews and observations at three CR clinics. Subsequently, a workflow model of the ideal situation after implementation of MyCARDSS was created. We found that the implementation will increase the complexity of existing working procedures because 1) not all patients will use MyCARDSS, 2) there is a transfer of tasks and responsibilities from professionals to patients, and 3) information in MyCARDSS needs to be synchronized with the EPR system for professionals.

  16. Automatic system testing of a decision support system for insulin dosing using Google Android.

    PubMed

    Spat, Stephan; Höll, Bernhard; Petritsch, Georg; Schaupp, Lukas; Beck, Peter; Pieber, Thomas R

    2013-01-01

    Hyperglycaemia in hospitalized patients is a common and costly health care problem. The GlucoTab system is a mobile workflow and decision support system aiming to facilitate efficient and safe glycemic control of non-critically ill patients. Being a medical device, GlucoTab requires extensive and reproducible testing. A framework for high-volume, reproducible and automated system testing of the GlucoTab system was set up, applying several open source tools for test automation and system time handling. The REACTION insulin titration protocol was investigated in a paper-based clinical trial (PBCT). In order to validate the GlucoTab system, data from this trial were used for simulation and system tests. In total, 1190 decision support action points were identified and simulated. Four data points (0.3%) resulted in a GlucoTab system error caused by a defective implementation. In 144 data points (12.1%), calculation errors made by physicians and nurses in the PBCT were detected. The test framework was able to verify the manual calculation of insulin doses and detected a relatively large number of user errors and workflow anomalies in the PBCT data. This shows the high potential of the electronic decision support application to improve the safety of implementing an insulin titration protocol and workflow management system in clinical wards.

  17. Using Workflows to Explore and Optimise Named Entity Recognition for Chemistry

    PubMed Central

    Kolluru, BalaKrishna; Hawizy, Lezan; Murray-Rust, Peter; Tsujii, Junichi; Ananiadou, Sophia

    2011-01-01

    Chemistry text mining tools should be interoperable and adaptable regardless of system-level implementation, installation or even programming issues. We aim to abstract the functionality of these tools from the underlying implementation via reconfigurable workflows for automatically identifying chemical names. To achieve this, we refactored an established named entity recogniser in the chemistry domain, OSCAR, and studied the impact of each component on the net performance. We developed two reconfigurable workflows from OSCAR using an interoperable text mining framework, U-Compare. These workflows can be altered using the drag-&-drop mechanism of the graphical user interface of U-Compare. They also provide a platform to study the relationship between text mining components such as tokenisation and named entity recognition (using maximum entropy Markov model (MEMM) and pattern recognition based classifiers). Results indicate that, for chemistry in particular, eliminating the noise generated by tokenisation techniques leads to slightly better performance in terms of named entity recognition (NER) accuracy. Poor tokenisation translates into poorer input to the classifier components, which in turn leads to an increase in Type I or Type II errors, thus lowering the overall performance. On the Sciborg corpus, the workflow-based system, which uses a new tokeniser whilst retaining the same MEMM component, increases the F-score from 82.35% to 84.44%. On the PubMed corpus, it recorded an F-score of 84.84% as against 84.23% by OSCAR. PMID:21633495
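
    The reported F-scores follow the standard relation between Type I errors (false positives), Type II errors (false negatives) and precision/recall; a minimal reminder of that computation is sketched below. The counts used are hypothetical and are not taken from the paper.

    ```python
    # Precision is hurt by Type I errors (false positives), recall by Type II
    # errors (false negatives); the F-score is their harmonic mean. Counts are
    # hypothetical illustration values, not results from the paper.
    def f_score(tp: int, fp: int, fn: int) -> float:
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        return 2 * precision * recall / (precision + recall)

    # Hypothetical counts for two tokenisation settings of the same MEMM classifier.
    print(f"noisy tokeniser : F = {100 * f_score(tp=820, fp=190, fn=160):.2f}%")
    print(f"better tokeniser: F = {100 * f_score(tp=845, fp=160, fn=135):.2f}%")
    ```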

  18. Using workflows to explore and optimise named entity recognition for chemistry.

    PubMed

    Kolluru, Balakrishna; Hawizy, Lezan; Murray-Rust, Peter; Tsujii, Junichi; Ananiadou, Sophia

    2011-01-01

    Chemistry text mining tools should be interoperable and adaptable regardless of system-level implementation, installation or even programming issues. We aim to abstract the functionality of these tools from the underlying implementation via reconfigurable workflows for automatically identifying chemical names. To achieve this, we refactored an established named entity recogniser in the chemistry domain, OSCAR, and studied the impact of each component on the net performance. We developed two reconfigurable workflows from OSCAR using an interoperable text mining framework, U-Compare. These workflows can be altered using the drag-and-drop mechanism of the graphical user interface of U-Compare. They also provide a platform to study the relationship between text mining components such as tokenisation and named entity recognition (using maximum entropy Markov model (MEMM) and pattern recognition based classifiers). Results indicate that, for chemistry in particular, eliminating the noise generated by tokenisation leads to slightly better named entity recognition (NER) accuracy. Poor tokenisation translates into poorer input to the classifier components, which in turn leads to an increase in Type I or Type II errors, thus lowering the overall performance. On the Sciborg corpus, the workflow-based system, which uses a new tokeniser whilst retaining the same MEMM component, increases the F-score from 82.35% to 84.44%. On the PubMed corpus, it recorded an F-score of 84.84% compared with 84.23% for OSCAR.

  19. Sustaining an Online, Shared Community Resource for Models, Robust Open source Software Tools and Data for Volcanology - the Vhub Experience

    NASA Astrophysics Data System (ADS)

    Patra, A. K.; Valentine, G. A.; Bursik, M. I.; Connor, C.; Connor, L.; Jones, M.; Simakov, N.; Aghakhani, H.; Jones-Ivey, R.; Kosar, T.; Zhang, B.

    2015-12-01

    Over the last 5 years we have created a community collaboratory, Vhub.org [Palma et al, J. App. Volc. 3:2 doi:10.1186/2191-5040-3-2], as a place to find volcanology-related resources, a venue for users to disseminate tools, teaching resources and data, and an online platform to support collaborative efforts. As the community (current active users > 6000, from an estimated community of comparable size) has embedded the tools in the collaboratory into educational and research workflows, it has become imperative to: a) redesign tools into robust, open source, reusable software for online and offline usage and enhancement; b) share large datasets with remote collaborators and other users seamlessly and securely; c) support complex workflows for uncertainty analysis, validation and verification, and data assimilation with large data sets. The focus on tool development and redevelopment has been twofold: firstly, to use best practices in software engineering and to exploit new hardware such as multi-core and graphics processing units; secondly, to enhance capabilities for inverse modeling, uncertainty quantification using large ensembles and design of experiments, calibration and validation. The software engineering practices we follow include open-source licensing to facilitate community contributions, modularity and reusability. Our initial targets are four popular tools on Vhub - TITAN2D, TEPHRA2, PUFF and LAVA. Use of tools like these requires many observation-driven data sets, e.g. digital elevation models of topography, satellite imagery and field observations of deposits. These data are often maintained in private repositories and shared privately by "sneaker-net". As a partial solution to this, we tested mechanisms using iRODS software for online sharing of private data with public metadata and access limits. Finally, we adapted workflow engines (e.g. Pegasus) to support the complex data and computing workflows needed for uses such as uncertainty quantification for hazard analysis using physical models.
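    As a rough illustration of the ensemble-style usage described above, the sketch below drives many runs of a stand-in hazard model over a sampled parameter space in parallel and summarises the outputs; the model, parameter ranges and probability threshold are hypothetical and do not represent TITAN2D or any other Vhub tool:

        # Parameter-sweep ensemble driver: sample inputs, run a model for each sample
        # in parallel, and summarise an output quantity for hazard analysis.
        import random
        from concurrent.futures import ProcessPoolExecutor
        from statistics import mean, pstdev

        def hazard_model(volume_m3: float, friction_deg: float) -> float:
            """Hypothetical stand-in for a physical model returning flow runout (km)."""
            return 0.5 * (volume_m3 / 1e6) ** 0.5 * (35.0 / friction_deg)

        def sample_parameters(n, seed=42):
            rng = random.Random(seed)
            return [(rng.uniform(1e5, 1e7), rng.uniform(10.0, 35.0)) for _ in range(n)]

        if __name__ == "__main__":
            samples = sample_parameters(200)
            with ProcessPoolExecutor() as pool:
                runouts = list(pool.map(hazard_model, *zip(*samples)))
            print(f"mean runout {mean(runouts):.2f} km, spread {pstdev(runouts):.2f} km")
            print(f"P(runout > 2 km) = {sum(r > 2.0 for r in runouts) / len(runouts):.2f}")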

  20. New ArcGIS tools developed for stream network extraction and basin delineations using Python and java script

    NASA Astrophysics Data System (ADS)

    Omran, Adel; Dietrich, Schröder; Abouelmagd, Abdou; Michael, Märker

    2016-09-01

    Damage caused by flash flood hazards is an increasing phenomenon, especially in arid and semi-arid areas. Thus, the need to evaluate these areas for their flash flood risk using maps and hydrological models is also becoming more important. For ungauged watersheds, a tentative analysis can be carried out based on the geomorphometric characteristics of the terrain. To process regions with larger watersheds, where perhaps hundreds of watersheds have to be delineated, processed and classified, the overall process needs to be automated. GIS packages such as ESRI's ArcGIS offer a number of sophisticated tools that support such analyses. Yet there are still gaps and pitfalls that need to be considered if the tools are combined into a geoprocessing model to automate the complete assessment workflow. These gaps include issues such as i) assigning stream order according to the Strahler method, ii) calculating the threshold value for the stream network extraction, and iii) determining the pour points for each of the nodes of the Strahler-ordered stream network. In this study, a completely automated workflow based on ArcGIS Model Builder using standard tools is introduced and discussed. Some additional tools have been implemented to complete the overall workflow; these tools have been programmed using Python and Java in the context of ArcObjects. The workflow has been applied to digital data from the southwestern Sinai Peninsula, Egypt. An optimum threshold value has been selected to optimize the drainage configuration by statistically comparing all of the stream configurations extracted from the DEM with the available reference data from topographic maps. The code succeeds in assigning the correct ranking of specific stream orders automatically, without additional manual steps. As a result, the code has proven to save time and effort; hence it is considered a very useful tool for processing large catchment basins.
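    The Strahler ranking step can also be expressed compactly outside ArcGIS; the sketch below computes Strahler orders for a small synthetic stream network represented as downstream links, illustrating the ordering rule rather than reproducing the published ArcObjects code:

        # Strahler order: a head-water link has order 1; where two tributaries of equal
        # order o join, the downstream link gets order o + 1; otherwise it keeps the
        # maximum order of its tributaries.
        from collections import defaultdict

        def strahler_orders(downstream):
            """downstream maps each link id to the link it flows into (None at outlet)."""
            upstream = defaultdict(list)
            for link, down in downstream.items():
                if down is not None:
                    upstream[down].append(link)

            order = {}
            def visit(link):
                if link in order:
                    return order[link]
                tributaries = [visit(up) for up in upstream[link]]
                if not tributaries:
                    order[link] = 1                      # head-water segment
                elif tributaries.count(max(tributaries)) >= 2:
                    order[link] = max(tributaries) + 1   # equal-order tributaries join
                else:
                    order[link] = max(tributaries)
                return order[link]

            for link in downstream:
                visit(link)
            return order

        # Two first-order streams meet at C; C and another first-order stream meet at D.
        network = {"A": "C", "B": "C", "C": "D", "E": "D", "D": None}
        print(strahler_orders(network))   # {'A': 1, 'B': 1, 'C': 2, 'E': 1, 'D': 2}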

  1. Data management integration for biomedical core facilities

    NASA Astrophysics Data System (ADS)

    Zhang, Guo-Qiang; Szymanski, Jacek; Wilson, David

    2007-03-01

    We present the design, development, and pilot-deployment experiences of MIMI, a web-based, Multi-modality Multi-Resource Information Integration environment for biomedical core facilities. This is an easily customizable, web-based software tool that integrates scientific and administrative support for a biomedical core facility involving a common set of entities: researchers; projects; equipment and devices; support staff; services; samples and materials; experimental workflow; large and complex data. With this software, one can register users, manage projects, schedule resources, bill services, perform site-wide searches, and archive, back up, and share data. With its customizable, expandable, and scalable characteristics, MIMI not only provides a cost-effective solution, unavailable in the marketplace, to the overarching data management problem of biomedical core facilities, but also lays a foundation for data federation to facilitate and support discovery-driven research.
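    A minimal sketch of the kind of shared entity model such a system revolves around, using Python's standard sqlite3 module; the tables, columns and example billing query are illustrative, not MIMI's actual schema:

        import sqlite3

        schema = """
        CREATE TABLE researcher (id INTEGER PRIMARY KEY, name TEXT, email TEXT);
        CREATE TABLE project    (id INTEGER PRIMARY KEY, title TEXT,
                                 pi_id INTEGER REFERENCES researcher(id));
        CREATE TABLE instrument (id INTEGER PRIMARY KEY, name TEXT, hourly_rate REAL);
        CREATE TABLE booking    (id INTEGER PRIMARY KEY,
                                 project_id INTEGER REFERENCES project(id),
                                 instrument_id INTEGER REFERENCES instrument(id),
                                 hours REAL);
        """

        con = sqlite3.connect(":memory:")   # a real deployment would use a server database
        con.executescript(schema)
        con.execute("INSERT INTO researcher VALUES (1, 'A. Smith', 'smith@example.org')")
        con.execute("INSERT INTO project VALUES (1, 'Mouse imaging study', 1)")
        con.execute("INSERT INTO instrument VALUES (1, '7T MRI', 120.0)")
        con.execute("INSERT INTO booking VALUES (1, 1, 1, 3.5)")

        # Service billing rolls up instrument usage per project.
        for row in con.execute("""SELECT p.title, SUM(b.hours * i.hourly_rate)
                                  FROM booking b
                                  JOIN project p ON p.id = b.project_id
                                  JOIN instrument i ON i.id = b.instrument_id
                                  GROUP BY p.title"""):
            print(row)   # ('Mouse imaging study', 420.0)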

  2. Dispel4py: An Open-Source Python library for Data-Intensive Seismology

    NASA Astrophysics Data System (ADS)

    Filgueira, Rosa; Krause, Amrey; Spinuso, Alessandro; Klampanos, Iraklis; Danecek, Peter; Atkinson, Malcolm

    2015-04-01

    Scientific workflows are a necessary tool for many scientific communities as they enable easy composition and execution of applications on computing resources, while scientists can focus on their research without being distracted by computation management. Nowadays, scientific communities (e.g. seismology) have access to a large variety of computing resources, and their computational problems are best addressed using parallel computing technology. However, successful use of these technologies requires a lot of additional machinery whose use is not straightforward for non-experts: different parallel frameworks (MPI, Storm, multiprocessing, etc.) must be used depending on the computing resources (local machines, grids, clouds, clusters) where applications are run. This implies that, to achieve the best application performance, users usually have to change their code depending on the features of the platform selected for running it. This work presents dispel4py, a new open-source Python library for describing abstract stream-based workflows for distributed data-intensive applications. Special care has been taken to provide dispel4py with the ability to map abstract workflows to different platforms dynamically at run-time. Currently dispel4py has four mappings: Apache Storm, MPI, multiprocessing and sequential. The main goal of dispel4py is to provide an easy-to-use tool to develop and test workflows on local resources by using the sequential mode with a small dataset. Later, once a workflow is ready for long runs, it can be automatically executed on different parallel resources, with dispel4py taking care of the underlying mappings by performing an efficient parallelisation. Processing Elements (PEs) represent the basic computational activities of any dispel4py workflow; a PE can be a seismological algorithm or a data transformation process. To create a dispel4py workflow, users only have to write a few lines of Python code describing their PEs and how they are connected; Python is widely supported on many platforms and is popular in many scientific domains, such as the geosciences. Once a dispel4py workflow is written, a user only has to select which mapping to use, and everything else (parallelisation, distribution of data) is carried out by dispel4py at no extra cost to the user. Among dispel4py's features we would like to highlight the following: the PEs are connected by streams rather than by writing to and reading from intermediate files, avoiding many IO operations; the PEs can be stored in a registry, so different users can recombine PEs in many different workflows; and dispel4py has been enriched with a provenance mechanism to support runtime provenance analysis. We have adopted the W3C-PROV data model, which is accessible via a prototype browser-based user interface and a web API. It supports users with the visualisation of graphical products and offers combined operations to access and download the data, which may be selectively stored at runtime into dedicated data archives. dispel4py has already been used by seismologists in the VERCE project to develop different seismic workflows. One of them is the Seismic Ambient Noise Cross-Correlation workflow, which preprocesses and cross-correlates traces from several stations. This workflow was first tested on a local machine using a small number of stations as input data, and later executed on different parallel platforms (the SuperMUC cluster and the Terracorrelator machine), automatically scaling up via the MPI and multiprocessing mappings with up to 1000 stations as input data. The results show that dispel4py achieves scalable performance with both mappings on the different parallel platforms tested.
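    The processing-element idea can be pictured, independently of dispel4py's actual API, as small units connected by streams; the generator pipeline below is a plain-Python illustration of that abstraction (with a toy correlation step), not dispel4py code:

        # Three "processing elements" connected by streams: a reader, a transform step,
        # and a cross-correlation-like consumer. Each PE only sees the stream it
        # receives, so the same composition could later be remapped onto MPI ranks,
        # processes or a single thread without changing the PEs themselves.
        import math

        def read_traces(n_stations):
            """Source PE: emit (station, samples) tuples."""
            for s in range(n_stations):
                yield s, [math.sin(0.1 * t + s) for t in range(100)]

        def demean(stream):
            """Transform PE: remove the mean from every trace."""
            for station, samples in stream:
                m = sum(samples) / len(samples)
                yield station, [x - m for x in samples]

        def correlate_pairs(stream):
            """Sink PE: zero-lag correlation between consecutive stations."""
            previous = None
            for station, samples in stream:
                if previous is not None:
                    p_station, p_samples = previous
                    corr = sum(a * b for a, b in zip(p_samples, samples))
                    yield (p_station, station), corr
                previous = (station, samples)

        pipeline = correlate_pairs(demean(read_traces(4)))
        for pair, corr in pipeline:
            print(pair, round(corr, 3))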

  3. How credit scores can make a difference for your revenue cycle.

    PubMed

    Jackson, Garett

    2008-02-01

    Questions an organization should consider before implementing credit scoring include: What action will be taken once a score is obtained? Should the score impact the workflow? Are the appropriate tools available to alter the workflow? How will accounts receivable staff be redirected or allocated? What is the risk tolerance for scoring mistakes?

  4. Analyzing data flows of WLCG jobs at batch job level

    NASA Astrophysics Data System (ADS)

    Kuehn, Eileen; Fischer, Max; Giffels, Manuel; Jung, Christopher; Petzold, Andreas

    2015-05-01

    With the introduction of federated data access to the workflows of WLCG, it is becoming increasingly important for data centers to understand specific data flows regarding storage element accesses and firewall configurations, as well as the scheduling of batch jobs themselves. As existing batch system monitoring and related system monitoring tools do not support measurements at the batch job level, a new tool has been developed and put into operation at the GridKa Tier 1 center for monitoring continuous data streams and characteristics of WLCG jobs and pilots. Long-term measurements and data collection are in progress. These measurements have already proven useful for analyzing misbehaviors and various issues; we therefore aim for an automated, real-time approach to anomaly detection. As a requirement, prototypes for standard workflows have to be examined. Based on several months of measurements, different features of HEP jobs are evaluated for their effectiveness in data mining approaches to identify these common workflows. The paper introduces the measurement approach and statistics, as well as the general concept and first results of classifying different HEP job workflows derived from the measurements at GridKa.
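    A toy version of the feature-based classification described above, using scikit-learn and synthetic per-job summaries (bytes read, bytes written, CPU efficiency); the feature set and the choice of k-means are illustrative, not the GridKa implementation:

        # Cluster batch jobs by simple per-job features to surface recurring
        # workflow types; outliers relative to their cluster would be candidates
        # for anomaly follow-up.
        import random
        from sklearn.cluster import KMeans
        from sklearn.preprocessing import StandardScaler

        random.seed(0)
        # Synthetic jobs: analysis-like (read-heavy) and production-like (CPU-heavy).
        jobs = [[random.gauss(50, 5), random.gauss(1, 0.2), random.gauss(0.4, 0.05)]
                for _ in range(50)]
        jobs += [[random.gauss(5, 1), random.gauss(20, 3), random.gauss(0.9, 0.03)]
                 for _ in range(50)]

        features = StandardScaler().fit_transform(jobs)
        labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(features)

        for cluster in set(labels):
            members = [j for j, l in zip(jobs, labels) if l == cluster]
            read = sum(m[0] for m in members) / len(members)
            eff = sum(m[2] for m in members) / len(members)
            print(f"cluster {cluster}: {len(members)} jobs, "
                  f"mean read {read:.1f} GB, mean CPU efficiency {eff:.2f}")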

  5. Connecting proteins with drug-like compounds: Open source drug discovery workflows with BindingDB and KNIME

    PubMed Central

    Berthold, Michael R.; Hedrick, Michael P.; Gilson, Michael K.

    2015-01-01

    Today’s large, public databases of protein–small molecule interaction data are creating important new opportunities for data mining and integration. At the same time, new graphical user interface-based workflow tools offer facile alternatives to custom scripting for informatics and data analysis. Here, we illustrate how the large protein-ligand database BindingDB may be incorporated into KNIME workflows as a step toward the integration of pharmacological data with broader biomolecular analyses. Thus, we describe a collection of KNIME workflows that access BindingDB data via RESTful webservices and, for more intensive queries, via a local distillation of the full BindingDB dataset. We focus in particular on the KNIME implementation of knowledge-based tools to generate informed hypotheses regarding protein targets of bioactive compounds, based on notions of chemical similarity. A number of variants of this basic approach are tested for seven existing drugs with relatively ill-defined therapeutic targets, leading to replication of some previously confirmed results and discovery of new, high-quality hits. Implications for future development are discussed. Database URL: www.bindingdb.org PMID:26384374
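    The chemical-similarity step at the heart of such target-hypothesis workflows can be sketched with the open-source RDKit toolkit; the SMILES strings, target annotations and similarity threshold below are purely illustrative, and the BindingDB query itself (which the paper performs via RESTful webservices or a local copy of the database) is omitted:

        # Rank known ligands (each annotated with a protein target) by Tanimoto
        # similarity to a query compound; highly similar ligands suggest candidate
        # targets for the query.
        from rdkit import Chem, DataStructs
        from rdkit.Chem import AllChem

        known_ligands = [
            ("CC(=O)Oc1ccccc1C(=O)O", "COX-1 (example annotation)"),        # aspirin
            ("CC(C)Cc1ccc(cc1)C(C)C(=O)O", "COX-2 (example annotation)"),   # ibuprofen
            ("CN1CCC[C@H]1c1cccnc1", "nAChR (example annotation)"),         # nicotine
        ]

        def fingerprint(smiles):
            mol = Chem.MolFromSmiles(smiles)
            return AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=2048)

        query = fingerprint("CC(=O)Oc1ccccc1C(=O)OC")  # hypothetical aspirin analogue
        hits = []
        for smiles, target in known_ligands:
            sim = DataStructs.TanimotoSimilarity(query, fingerprint(smiles))
            if sim >= 0.4:            # similarity cut-off chosen for illustration
                hits.append((sim, target))

        for sim, target in sorted(hits, reverse=True):
            print(f"{sim:.2f}  {target}")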

  6. Scientific workflows as productivity tools for drug discovery.

    PubMed

    Shon, John; Ohkawa, Hitomi; Hammer, Juergen

    2008-05-01

    Large pharmaceutical companies annually invest tens to hundreds of millions of US dollars in research informatics to support their early drug discovery processes. Traditionally, most of these investments are designed to increase the efficiency of drug discovery. The introduction of do-it-yourself scientific workflow platforms has enabled research informatics organizations to shift their efforts toward scientific innovation, ultimately resulting in a possible increase in return on their investments. Unlike the handling of most scientific data and application integration approaches, researchers apply scientific workflows to in silico experimentation and exploration, leading to scientific discoveries that lie beyond automation and integration. This review highlights some key requirements for scientific workflow environments in the pharmaceutical industry that are necessary for increasing research productivity. Examples of the application of scientific workflows in research and a summary of recent platform advances are also provided.

  7. Integration and validation testing for PhEDEx, DBS and DAS with the PhEDEx LifeCycle agent

    NASA Astrophysics Data System (ADS)

    Boeser, C.; Chwalek, T.; Giffels, M.; Kuznetsov, V.; Wildish, T.

    2014-06-01

    The ever-increasing amount of data handled by the CMS dataflow and workflow management tools poses new challenges for cross-validation among the different systems within the CMS experiment at the LHC. To approach this problem we developed an integration test suite based on the LifeCycle agent, a tool originally conceived for stress-testing new releases of PhEDEx, the CMS data-placement tool. The LifeCycle agent provides a framework for customising the test workflow in arbitrary ways, and can scale to levels of activity well beyond those seen in normal running. This means we can run realistic performance tests at scales not likely to be seen by the experiment for some years, or with custom topologies to examine particular situations that may cause concern at some time in the future. The LifeCycle agent has recently been enhanced to become a general-purpose integration and validation testing tool for major CMS services. It allows cross-system integration tests of all three components to be performed in controlled environments, without interfering with production services. In this paper we discuss the design and implementation of the LifeCycle agent. We describe how it is used for small-scale debugging and validation tests, and how we extend that to large-scale tests of whole groups of sub-systems. We show how the LifeCycle agent can emulate the actions of operators, physicists, or software agents external to the system under test, and how it can be scaled to large and complex systems.

  8. A workflow learning model to improve geovisual analytics utility

    PubMed Central

    Roth, Robert E; MacEachren, Alan M; McCabe, Craig A

    2011-01-01

    Introduction This paper describes the design and implementation of the G-EX Portal Learn Module, a web-based, geocollaborative application for organizing and distributing digital learning artifacts. G-EX falls into the broader context of geovisual analytics, a new research area with the goal of supporting visually-mediated reasoning about large, multivariate, spatiotemporal information. Because this information is unprecedented in amount and complexity, GIScientists are tasked with the development of new tools and techniques to make sense of it. Our research addresses the challenge of implementing these geovisual analytics tools and techniques in a useful manner. Objectives The objective of this paper is to develop and implement a method for improving the utility of geovisual analytics software. The success of software is measured by its usability (i.e., how easy the software is to use?) and utility (i.e., how useful the software is). The usability and utility of software can be improved by refining the software, increasing user knowledge about the software, or both. It is difficult to achieve transparent usability (i.e., software that is immediately usable without training) of geovisual analytics software because of the inherent complexity of the included tools and techniques. In these situations, improving user knowledge about the software through the provision of learning artifacts is as important, if not more so, than iterative refinement of the software itself. Therefore, our approach to improving utility is focused on educating the user. Methodology The research reported here was completed in two steps. First, we developed a model for learning about geovisual analytics software. Many existing digital learning models assist only with use of the software to complete a specific task and provide limited assistance with its actual application. To move beyond task-oriented learning about software use, we propose a process-oriented approach to learning based on the concept of scientific workflows. Second, we implemented an interface in the G-EX Portal Learn Module to demonstrate the workflow learning model. The workflow interface allows users to drag learning artifacts uploaded to the G-EX Portal onto a central whiteboard and then annotate the workflow using text and drawing tools. Once completed, users can visit the assembled workflow to get an idea of the kind, number, and scale of analysis steps, view individual learning artifacts associated with each node in the workflow, and ask questions about the overall workflow or individual learning artifacts through the associated forums. An example learning workflow in the domain of epidemiology is provided to demonstrate the effectiveness of the approach. Results/Conclusions In the context of geovisual analytics, GIScientists are not only responsible for developing software to facilitate visually-mediated reasoning about large and complex spatiotemporal information, but also for ensuring that this software works. The workflow learning model discussed in this paper and demonstrated in the G-EX Portal Learn Module is one approach to improving the utility of geovisual analytics software. While development of the G-EX Portal Learn Module is ongoing, we expect to release the G-EX Portal Learn Module by Summer 2009. PMID:21983545

  9. A workflow learning model to improve geovisual analytics utility.

    PubMed

    Roth, Robert E; Maceachren, Alan M; McCabe, Craig A

    2009-01-01

    INTRODUCTION: This paper describes the design and implementation of the G-EX Portal Learn Module, a web-based, geocollaborative application for organizing and distributing digital learning artifacts. G-EX falls into the broader context of geovisual analytics, a new research area with the goal of supporting visually-mediated reasoning about large, multivariate, spatiotemporal information. Because this information is unprecedented in amount and complexity, GIScientists are tasked with the development of new tools and techniques to make sense of it. Our research addresses the challenge of implementing these geovisual analytics tools and techniques in a useful manner. OBJECTIVES: The objective of this paper is to develop and implement a method for improving the utility of geovisual analytics software. The success of software is measured by its usability (i.e., how easy the software is to use?) and utility (i.e., how useful the software is). The usability and utility of software can be improved by refining the software, increasing user knowledge about the software, or both. It is difficult to achieve transparent usability (i.e., software that is immediately usable without training) of geovisual analytics software because of the inherent complexity of the included tools and techniques. In these situations, improving user knowledge about the software through the provision of learning artifacts is as important, if not more so, than iterative refinement of the software itself. Therefore, our approach to improving utility is focused on educating the user. METHODOLOGY: The research reported here was completed in two steps. First, we developed a model for learning about geovisual analytics software. Many existing digital learning models assist only with use of the software to complete a specific task and provide limited assistance with its actual application. To move beyond task-oriented learning about software use, we propose a process-oriented approach to learning based on the concept of scientific workflows. Second, we implemented an interface in the G-EX Portal Learn Module to demonstrate the workflow learning model. The workflow interface allows users to drag learning artifacts uploaded to the G-EX Portal onto a central whiteboard and then annotate the workflow using text and drawing tools. Once completed, users can visit the assembled workflow to get an idea of the kind, number, and scale of analysis steps, view individual learning artifacts associated with each node in the workflow, and ask questions about the overall workflow or individual learning artifacts through the associated forums. An example learning workflow in the domain of epidemiology is provided to demonstrate the effectiveness of the approach. RESULTS/CONCLUSIONS: In the context of geovisual analytics, GIScientists are not only responsible for developing software to facilitate visually-mediated reasoning about large and complex spatiotemporal information, but also for ensuring that this software works. The workflow learning model discussed in this paper and demonstrated in the G-EX Portal Learn Module is one approach to improving the utility of geovisual analytics software. While development of the G-EX Portal Learn Module is ongoing, we expect to release the G-EX Portal Learn Module by Summer 2009.

  10. A web-based rapid assessment tool for production publishing solutions

    NASA Astrophysics Data System (ADS)

    Sun, Tong

    2010-02-01

    Solution assessment is a critical first step in understanding and measuring the business process efficiency enabled by an integrated solution package. However, assessing the effectiveness of any solution is usually a very expensive and time-consuming task that involves considerable domain knowledge, collecting and understanding the specific customer's operational context, defining validation scenarios, and estimating the expected performance and operational cost. This paper presents an intelligent web-based tool that can rapidly assess any given solution package for production publishing workflows via a simulation engine and create a report of various estimated performance metrics (e.g. throughput, turnaround time, resource utilization) and operational cost. By integrating the digital publishing workflow ontology and an activity-based costing model with a Petri-net-based workflow simulation engine, this web-based tool allows users to quickly evaluate potential digital publishing solutions side by side within their desired operational contexts, and provides organizations with a low-cost and rapid assessment before committing to any purchase. The tool also helps solution providers shorten sales cycles, establish trustworthy customer relationships, and supplement professional assessment services with proven quantitative simulation and estimation technology.
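    A drastically simplified, discrete-event flavour of such a simulation engine is sketched below: three production stations, fixed processing times and a steady job stream, reporting throughput, turnaround time and utilisation. The stations, timings and job counts are hypothetical stand-ins for the ontology- and costing-driven models of the real tool:

        # Simulate a three-station print-production line (prepress -> print -> finish)
        # and report throughput, mean turnaround time and per-station utilisation.
        STATIONS = [("prepress", 5.0), ("print", 12.0), ("finish", 4.0)]  # minutes/job
        N_JOBS, INTERARRIVAL = 100, 10.0

        free_at = {name: 0.0 for name, _ in STATIONS}
        busy = {name: 0.0 for name, _ in STATIONS}
        turnarounds = []

        for j in range(N_JOBS):
            t = j * INTERARRIVAL                    # job arrival time
            start = t
            for name, duration in STATIONS:
                begin = max(start, free_at[name])   # wait if the station is occupied
                free_at[name] = begin + duration
                busy[name] += duration
                start = begin + duration
            turnarounds.append(start - t)

        makespan = max(free_at.values())
        print(f"throughput: {N_JOBS / makespan * 60:.1f} jobs/hour")
        print(f"mean turnaround: {sum(turnarounds) / N_JOBS:.1f} min")
        for name, _ in STATIONS:
            print(f"utilisation {name}: {busy[name] / makespan:.0%}")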

  11. Agile based "Semi-"Automated Data ingest process : ORNL DAAC example

    NASA Astrophysics Data System (ADS)

    Santhana Vannan, S. K.; Beaty, T.; Cook, R. B.; Devarakonda, R.; Hook, L.; Wei, Y.; Wright, D.

    2015-12-01

    The ORNL DAAC archives and publishes data and information relevant to biogeochemical, ecological, and environmental processes. The data archived at the ORNL DAAC must be well formatted, self-descriptive, and documented, as well as referenced in a peer-reviewed publication. The ORNL DAAC ingest team curates diverse data sets from multiple data providers simultaneously. To streamline the ingest process, the data set submission process at the ORNL DAAC has recently been updated to use an agile process, and a semi-automated workflow system has been developed to provide a consistent data provider experience and to create a uniform data product. The goals of the semi-automated agile ingest process are to: 1) provide the ability to track a data set from acceptance to publication; 2) automate steps that can be automated to improve efficiency and reduce redundancy; 3) update the legacy ingest infrastructure; and 4) provide a centralized system to manage the various aspects of ingest. This talk will cover the agile methodology, workflow, and tools developed through this system.
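    Tracking a data set from acceptance to publication is essentially a small state machine; the sketch below is an illustrative Python model with made-up state names, not the ORNL DAAC's actual ingest software:

        # Track each data set through a fixed sequence of ingest states; illegal
        # jumps raise an error, so progress reports stay consistent.
        from enum import Enum

        class IngestState(Enum):
            ACCEPTED = 1
            FORMAT_CHECKED = 2
            DOCUMENTED = 3
            ARCHIVED = 4
            PUBLISHED = 5

        class DatasetRecord:
            def __init__(self, name):
                self.name = name
                self.state = IngestState.ACCEPTED
                self.history = [IngestState.ACCEPTED]

            def advance(self, new_state: IngestState):
                if new_state.value != self.state.value + 1:
                    raise ValueError(f"{self.name}: cannot go from {self.state.name} "
                                     f"to {new_state.name}")
                self.state = new_state
                self.history.append(new_state)

        ds = DatasetRecord("site_biomass_2015")   # hypothetical data set name
        ds.advance(IngestState.FORMAT_CHECKED)
        ds.advance(IngestState.DOCUMENTED)
        print(ds.name, "->", [s.name for s in ds.history])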

  12. Wireless remote control clinical image workflow: utilizing a PDA for offsite distribution

    NASA Astrophysics Data System (ADS)

    Liu, Brent J.; Documet, Luis; Documet, Jorge; Huang, H. K.; Muldoon, Jean

    2004-04-01

    Last year at RSNA we presented an application to perform wireless remote control of PACS image distribution utilizing a handheld device such as a Personal Digital Assistant (PDA). This paper describes the clinical experiences, including workflow scenarios, of implementing the PDA application to route exams from the clinical PACS archive server to various locations for offsite distribution of clinical PACS exams. By utilizing this remote control application, radiologists can manage image workflow distribution with a single wireless handheld device without impacting their clinical workflow on diagnostic PACS workstations. A PDA application was designed and developed to allow a physician to perform DICOM Query and C-Move requests from a clinical PACS archive to a CD-burning device for automatic burning of PACS data for offsite distribution. In addition, it was used for convenient routing of historical PACS exams to the local web server, local workstations, and teleradiology systems. The application was evaluated by radiologists as well as other clinical staff who need to distribute PACS exams to offsite referring physicians' offices and offsite radiologists. An application for image workflow management utilizing wireless technology was implemented in a clinical environment and evaluated. The PDA application was successfully utilized to perform DICOM Query and C-Move requests from the clinical PACS archive to various offsite exam distribution devices. Clinical staff can utilize the PDA to manage image workflow and PACS exam distribution conveniently for offsite consultations by referring physicians and radiologists. This solution allows radiologists to expand their effectiveness in health care delivery, both within the radiology department and offsite, by improving their clinical workflow.

  13. Workflow-enabled distributed component-based information architecture for digital medical imaging enterprises.

    PubMed

    Wong, Stephen T C; Tjandra, Donny; Wang, Huili; Shen, Weimin

    2003-09-01

    Few information systems today offer a flexible means to define and manage the automated part of radiology processes, which provide clinical imaging services for the entire healthcare organization. Even fewer of them provide a coherent architecture that can easily cope with heterogeneity and the inevitable local adaptation of applications, and that can integrate clinical and administrative information to aid better clinical, operational, and business decisions. We describe an innovative enterprise architecture of image information management systems to fill these needs. Such a system is based on the interplay of production workflow management, distributed object computing, Java and Web techniques, and in-depth domain knowledge of radiology operations. Our design adapts the "4+1" architectural-view approach. In this new architecture, PACS and RIS become one, while user interaction can be automated by customized workflow processes. Clinical service applications are implemented as active components. They can be substituted by locally adapted applications and replicated for fault tolerance and load balancing. Furthermore, the workflow-enabled digital radiology system provides powerful query and statistical functions for managing resources and improving productivity. This work could lead to a new direction in image information management. We illustrate the innovative design with examples taken from an implemented system.

  14. Workflow4Metabolomics: a collaborative research infrastructure for computational metabolomics

    PubMed Central

    Giacomoni, Franck; Le Corguillé, Gildas; Monsoor, Misharl; Landi, Marion; Pericard, Pierre; Pétéra, Mélanie; Duperier, Christophe; Tremblay-Franco, Marie; Martin, Jean-François; Jacob, Daniel; Goulitquer, Sophie; Thévenot, Etienne A.; Caron, Christophe

    2015-01-01

    Summary: The complex, rapidly evolving field of computational metabolomics calls for collaborative infrastructures where the large volume of new algorithms for data pre-processing, statistical analysis and annotation can be readily integrated whatever the language, evaluated on reference datasets and chained to build ad hoc workflows for users. We have developed Workflow4Metabolomics (W4M), the first fully open-source and collaborative online platform for computational metabolomics. W4M is a virtual research environment built upon the Galaxy web-based platform technology. It enables ergonomic integration, exchange and running of individual modules and workflows. Alternatively, the whole W4M framework and computational tools can be downloaded as a virtual machine for local installation. Availability and implementation: http://workflow4metabolomics.org homepage enables users to open a private account and access the infrastructure. W4M is developed and maintained by the French Bioinformatics Institute (IFB) and the French Metabolomics and Fluxomics Infrastructure (MetaboHUB). Contact: contact@workflow4metabolomics.org PMID:25527831

  15. Workflow4Metabolomics: a collaborative research infrastructure for computational metabolomics.

    PubMed

    Giacomoni, Franck; Le Corguillé, Gildas; Monsoor, Misharl; Landi, Marion; Pericard, Pierre; Pétéra, Mélanie; Duperier, Christophe; Tremblay-Franco, Marie; Martin, Jean-François; Jacob, Daniel; Goulitquer, Sophie; Thévenot, Etienne A; Caron, Christophe

    2015-05-01

    The complex, rapidly evolving field of computational metabolomics calls for collaborative infrastructures where the large volume of new algorithms for data pre-processing, statistical analysis and annotation can be readily integrated whatever the language, evaluated on reference datasets and chained to build ad hoc workflows for users. We have developed Workflow4Metabolomics (W4M), the first fully open-source and collaborative online platform for computational metabolomics. W4M is a virtual research environment built upon the Galaxy web-based platform technology. It enables ergonomic integration, exchange and running of individual modules and workflows. Alternatively, the whole W4M framework and computational tools can be downloaded as a virtual machine for local installation. http://workflow4metabolomics.org homepage enables users to open a private account and access the infrastructure. W4M is developed and maintained by the French Bioinformatics Institute (IFB) and the French Metabolomics and Fluxomics Infrastructure (MetaboHUB). contact@workflow4metabolomics.org. © The Author 2014. Published by Oxford University Press.

  16. CamBAfx: Workflow Design, Implementation and Application for Neuroimaging

    PubMed Central

    Ooi, Cinly; Bullmore, Edward T.; Wink, Alle-Meije; Sendur, Levent; Barnes, Anna; Achard, Sophie; Aspden, John; Abbott, Sanja; Yue, Shigang; Kitzbichler, Manfred; Meunier, David; Maxim, Voichita; Salvador, Raymond; Henty, Julian; Tait, Roger; Subramaniam, Naresh; Suckling, John

    2009-01-01

    CamBAfx is a workflow application designed for both researchers who use workflows to process data (consumers) and those who design them (designers). It provides a front-end (user interface) optimized for data processing designed in a way familiar to consumers. The back-end uses a pipeline model to represent workflows since this is a common and useful metaphor used by designers and is easy to manipulate compared to other representations like programming scripts. As an Eclipse Rich Client Platform application, CamBAfx's pipelines and functions can be bundled with the software or downloaded post-installation. The user interface contains all the workflow facilities expected by consumers. Using the Eclipse Extension Mechanism designers are encouraged to customize CamBAfx for their own pipelines. CamBAfx wraps a workflow facility around neuroinformatics software without modification. CamBAfx's design, licensing and Eclipse Branding Mechanism allow it to be used as the user interface for other software, facilitating exchange of innovative computational tools between originating labs. PMID:19826470

  17. Point-of-Care-Testing in Acute Stroke Management: An Unmet Need Ripe for Technological Harvest

    PubMed Central

    Eltzov, Evgeni; Seet, Raymond C. S.; Marks, Robert S.; Tok, Alfred I. Y.

    2017-01-01

    Stroke, the second highest leading cause of death, is caused by an abrupt interruption of blood to the brain. Supply of blood needs to be promptly restored to salvage brain tissues from irreversible neuronal death. Existing assessment of stroke patients is based largely on detailed clinical evaluation that is complemented by neuroimaging methods. However, emerging data point to the potential use of blood-derived biomarkers in aiding clinical decision-making especially in the diagnosis of ischemic stroke, triaging patients for acute reperfusion therapies, and in informing stroke mechanisms and prognosis. The demand for newer techniques to deliver individualized information on-site for incorporation into a time-sensitive work-flow has become greater. In this review, we examine the roles of a portable and easy to use point-of-care-test (POCT) in shortening the time-to-treatment, classifying stroke subtypes and improving patient’s outcome. We first examine the conventional stroke management workflow, then highlight situations where a bedside biomarker assessment might aid clinical decision-making. A novel stroke POCT approach is presented, which combines the use of quantitative and multiplex POCT platforms for the detection of specific stroke biomarkers, as well as data-mining tools to drive analytical processes. Further work is needed in the development of POCTs to fulfill an unmet need in acute stroke management. PMID:28771209

  18. AtomPy: an open atomic-data curation environment

    NASA Astrophysics Data System (ADS)

    Bautista, Manuel; Mendoza, Claudio; Boswell, Josiah S; Ajoku, Chukwuemeka

    2014-06-01

    We present a cloud-computing environment for atomic data curation, networking among atomic data providers and users, teaching-and-learning, and interfacing with spectral modeling software. The system is based on Google-Drive Sheets, Pandas (Python Data Analysis Library) DataFrames, and IPython Notebooks for open community-driven curation of atomic data for scientific and technological applications. The atomic model for each ionic species is contained in a multi-sheet Google-Drive workbook, where the atomic parameters from all known public sources are progressively stored. Metadata (provenance, community discussion, etc.) accompanying every entry in the database are stored through Notebooks. Education tools on the physics of atomic processes as well as their relevance to plasma and spectral modeling are based on IPython Notebooks that integrate written material, images, videos, and active computer-tool workflows. Data processing workflows and collaborative software developments are encouraged and managed through the GitHub social network. Relevant issues this platform intends to address are: (i) data quality by allowing open access to both data producers and users in order to attain completeness, accuracy, consistency, provenance and currentness; (ii) comparisons of different datasets to facilitate accuracy assessment; (iii) downloading to local data structures (i.e. Pandas DataFrames) for further manipulation and analysis by prospective users; and (iv) data preservation by avoiding the discard of outdated sets.
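    Pulling one of these shared workbooks into a local Pandas DataFrame for further analysis takes only a few lines; the column layout and the published-CSV URL pattern in the comment are illustrative assumptions, not AtomPy's actual sheet structure:

        import io
        import pandas as pd

        # In practice the sheet would be fetched from a published Google-Drive CSV URL,
        # e.g. pd.read_csv("https://docs.google.com/spreadsheets/d/<id>/export?format=csv");
        # a small inline sample is used here so the sketch runs offline.
        csv_text = """ion,lower_level,upper_level,wavelength_A,A_value_s-1,source
        Fe II,a6D9/2,z6D9/2,2599.4,2.35e8,source_A
        Fe II,a6D9/2,z6F11/2,2382.8,3.13e8,source_B
        """

        levels = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)
        # Local manipulation, e.g. sorting transitions of one ion by wavelength
        # before comparing values from different data sources.
        print(levels[levels["ion"].str.strip() == "Fe II"].sort_values("wavelength_A"))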

  19. Understanding the dispensary workflow at the Birmingham Free Clinic: a proposed framework for an informatics intervention.

    PubMed

    Fisher, Arielle M; Herbert, Mary I; Douglas, Gerald P

    2016-02-19

    The Birmingham Free Clinic (BFC) in Pittsburgh, Pennsylvania, USA is a free, walk-in clinic that serves medically uninsured populations through the use of volunteer health care providers and an on-site medication dispensary. The introduction of an electronic medical record (EMR) has improved several aspects of clinic workflow. However, pharmacists' tasks involving medication management and dispensing have become more challenging since EMR implementation due to its inability to support workflows between the medical and pharmaceutical services. To inform the design of a systematic intervention, we conducted a needs assessment study to identify workflow challenges and process inefficiencies in the dispensary. We used contextual inquiry to document the dispensary workflow and facilitate identification of critical aspects of intervention design specific to the user. Pharmacists were observed according to contextual inquiry guidelines. Graphical models were produced to aid data and process visualization. We created a list of themes describing workflow challenges and asked the pharmacists to rank them in order of significance to narrow the scope of intervention design. Three pharmacists were observed at the BFC. Observer notes were documented and analyzed to produce 13 themes outlining the primary challenges pharmacists encounter during dispensation at the BFC. The dispensary workflow is labor intensive, redundant, and inefficient when integrated with the clinical service. Observations identified inefficiencies that may benefit from the introduction of informatics interventions including: medication labeling, insufficient process notification, triple documentation, and inventory control. We propose a system for Prescription Management and General Inventory Control (RxMAGIC). RxMAGIC is a framework designed to mitigate workflow challenges and improve the processes of medication management and inventory control. While RxMAGIC is described in the context of the BFC dispensary, we believe it will be generalizable to pharmacies in other low-resource settings, both domestically and internationally.

  20. A qualitative evaluation of the implementation of guidelines and a support tool for asthma management in primary care.

    PubMed

    Watkins, Kim; Fisher, Colleen; Misaghian, Jila; Schneider, Carl R; Clifford, Rhonda

    2016-01-01

    Asthma management in Australia is suboptimal. The "Guidelines for provision of a Pharmacist Only medicine: short acting beta agonists" (SABA guidelines) and a novel West Australian "Asthma Action Plan card" (AAP card) were concurrently developed to improve asthma management. The aim of this qualitative research was to evaluate the collaborative, multidisciplinary and multifaceted implementation of these asthma resources and identify the lessons learnt to inform future initiatives. Feedback was sought about the implementation of the SABA guidelines and the AAP card using focus groups with key stakeholders including pharmacists (×2), pharmacy assistants, asthma educators, general practitioners, practice nurses and people with asthma (patients). Audio recordings were transcribed verbatim. Data were analysed thematically using constant comparison. The common themes identified from the focus groups were categorised according to a taxonomy of barriers, including barriers related to knowledge, attitudes and behaviour. Seven focus group sessions were held with 57 participants. The knowledge barriers identified included a lack of awareness and a lack of familiarity with the resources. There was a significant lack of awareness of the AAP card where passive implementation methods had been utilised. Pharmacists had good awareness of the SABA guidelines, but pharmacy assistants were unaware of the guidelines despite significant involvement in the sale of SABAs. Environmental barriers included time and workflow issues and the role of the pharmacy assistant in the organisational workflow of the pharmacy. The attitudes and behaviours of health professionals and patients with asthma were discordant, and this undermined optimal asthma management. Suggestions to improve asthma management included the use of legislation, the use of electronic resources integrated into workflows, and training pharmacists or practice nurses to provide patients with written asthma action plans. Greater consideration needs to be given to the implementation of resources to improve awareness and overcome barriers to utilisation. The attitudes and behaviours of both health professionals and patients with asthma need to be addressed. Interventions directed toward health professionals should focus on the skills needed to achieve improved communication and patient behaviour change.

  1. Developing integrated workflows for the digitisation of herbarium specimens using a modular and scalable approach.

    PubMed

    Haston, Elspeth; Cubey, Robert; Pullan, Martin; Atkins, Hannah; Harris, David J

    2012-01-01

    Digitisation programmes in many institutes frequently involve disparate and irregular funding, diverse selection criteria and scope, with different members of staff managing and operating the processes. These factors have influenced the decision at the Royal Botanic Garden Edinburgh to develop an integrated workflow for the digitisation of herbarium specimens which is modular and scalable, enabling a single overall workflow to be used for all digitisation projects. This integrated workflow comprises three principal elements: a specimen workflow, a data workflow and an image workflow. The specimen workflow is strongly linked to curatorial processes which will impact on the prioritisation, selection and preparation of the specimens. The importance of including a conservation element within the digitisation workflow is highlighted. The data workflow includes the concept of three main categories of collection data: label data, curatorial data and supplementary data. It is shown that each category of data has its own properties which influence the timing of data capture within the workflow. Software has been developed for the rapid capture of curatorial data, and optical character recognition (OCR) software is being used to increase the efficiency of capturing label data and supplementary data. The large number and size of the images has necessitated the inclusion of automated systems within the image workflow.

  2. Cytoscape: the network visualization tool for GenomeSpace workflows.

    PubMed

    Demchak, Barry; Hull, Tim; Reich, Michael; Liefeld, Ted; Smoot, Michael; Ideker, Trey; Mesirov, Jill P

    2014-01-01

    Modern genomic analysis often requires workflows incorporating multiple best-of-breed tools. GenomeSpace is a web-based visual workbench that combines a selection of these tools with mechanisms that create data flows between them. One such tool is Cytoscape 3, a popular application that enables analysis and visualization of graph-oriented genomic networks. As Cytoscape runs on the desktop, and not in a web browser, integrating it into GenomeSpace required special care in creating a seamless user experience and enabling appropriate data flows. In this paper, we present the design and operation of the Cytoscape GenomeSpace app, which accomplishes this integration, thereby providing critical analysis and visualization functionality for GenomeSpace users. It has been downloaded over 850 times since the release of its first version in September, 2013.

  3. Cytoscape: the network visualization tool for GenomeSpace workflows

    PubMed Central

    Demchak, Barry; Hull, Tim; Reich, Michael; Liefeld, Ted; Smoot, Michael; Ideker, Trey; Mesirov, Jill P.

    2014-01-01

    Modern genomic analysis often requires workflows incorporating multiple best-of-breed tools. GenomeSpace is a web-based visual workbench that combines a selection of these tools with mechanisms that create data flows between them. One such tool is Cytoscape 3, a popular application that enables analysis and visualization of graph-oriented genomic networks. As Cytoscape runs on the desktop, and not in a web browser, integrating it into GenomeSpace required special care in creating a seamless user experience and enabling appropriate data flows. In this paper, we present the design and operation of the Cytoscape GenomeSpace app, which accomplishes this integration, thereby providing critical analysis and visualization functionality for GenomeSpace users. It has been downloaded over 850 times since the release of its first version in September, 2013. PMID:25165537

  4. The “Common Solutions” Strategy of the Experiment Support group at CERN for the LHC Experiments

    NASA Astrophysics Data System (ADS)

    Girone, M.; Andreeva, J.; Barreiro Megino, F. H.; Campana, S.; Cinquilli, M.; Di Girolamo, A.; Dimou, M.; Giordano, D.; Karavakis, E.; Kenyon, M. J.; Kokozkiewicz, L.; Lanciotti, E.; Litmaath, M.; Magini, N.; Negri, G.; Roiser, S.; Saiz, P.; Saiz Santos, M. D.; Schovancova, J.; Sciabà, A.; Spiga, D.; Trentadue, R.; Tuckett, D.; Valassi, A.; Van der Ster, D. C.; Shiers, J. D.

    2012-12-01

    After two years of LHC data taking, processing and analysis and with numerous changes in computing technology, a number of aspects of the experiments’ computing, as well as WLCG deployment and operations, need to evolve. As part of the activities of the Experiment Support group in CERN's IT department, and reinforced by effort from the EGI-InSPIRE project, we present work aimed at common solutions across all LHC experiments. Such solutions allow us not only to optimize development manpower but also offer lower long-term maintenance and support costs. The main areas cover Distributed Data Management, Data Analysis, Monitoring and the LCG Persistency Framework. Specific tools have been developed including the HammerCloud framework, automated services for data placement, data cleaning and data integrity (such as the data popularity service for CMS, the common Victor cleaning agent for ATLAS and CMS and tools for catalogue/storage consistency), the Dashboard Monitoring framework (job monitoring, data management monitoring, File Transfer monitoring) and the Site Status Board. This talk focuses primarily on the strategic aspects of providing such common solutions and how this relates to the overall goals of long-term sustainability and the relationship to the various WLCG Technical Evolution Groups. The success of the service components has given us confidence in the process, and has developed the trust of the stakeholders. We are now attempting to expand the development of common solutions into the more critical workflows. The first is a feasibility study of common analysis workflow execution elements between ATLAS and CMS. We look forward to additional common development in the future.

  5. Towards a geophysical decision-support system for monitoring and managing unstable slopes

    NASA Astrophysics Data System (ADS)

    Chambers, J. E.; Meldrum, P.; Wilkinson, P. B.; Uhlemann, S.; Swift, R. T.; Inauen, C.; Gunn, D.; Kuras, O.; Whiteley, J.; Kendall, J. M.

    2017-12-01

    Conventional approaches to condition monitoring, such as walkover surveys, remote sensing or intrusive sampling, are often inadequate for predicting instabilities in natural and engineered slopes. Surface observations cannot detect the subsurface precursors to failure events; instead they can only identify failure once it has begun. On the other hand, intrusive investigations using boreholes sample only a very small volume of ground, and hence small-scale deterioration processes in heterogeneous ground conditions can easily be missed. It is increasingly being recognised that geophysical techniques can complement conventional approaches by providing spatial subsurface information. Here we describe the development and testing of a new geophysical slope monitoring system. It is built around low-cost electrical resistivity tomography instrumentation, combined with integrated geotechnical logging capability, and coupled with data telemetry. An automated data processing and analysis workflow is being developed to streamline information delivery. The development of this approach has provided the basis of a decision-support tool for monitoring and managing unstable slopes. The hardware component of the system has been operational at a number of field sites associated with a range of natural and engineered slopes for up to two years. We report on the monitoring results from these sites, discuss the practicalities of installing and maintaining long-term geophysical monitoring infrastructure, and consider the requirements of a fully automated data processing and analysis workflow. We propose that the result of this development work is a practical decision-support tool that can provide near-real-time information on the internal condition of problematic slopes.

  6. Evaluation of user interface and workflow design of a bedside nursing clinical decision support system.

    PubMed

    Yuan, Michael Juntao; Finley, George Mike; Long, Ju; Mills, Christy; Johnson, Ron Kim

    2013-01-31

    Clinical decision support systems (CDSS) are important tools to improve health care outcomes and reduce preventable medical adverse events. However, the effectiveness and success of CDSS depend on their implementation context and usability in complex health care settings. As a result, usability design and validation, especially in real world clinical settings, are crucial aspects of successful CDSS implementations. Our objective was to develop a novel CDSS to help frontline nurses better manage critical symptom changes in hospitalized patients, hence reducing preventable failure to rescue cases. A robust user interface and implementation strategy that fit into existing workflows was key for the success of the CDSS. Guided by a formal usability evaluation framework, UFuRT (user, function, representation, and task analysis), we developed a high-level specification of the product that captures key usability requirements and is flexible to implement. We interviewed users of the proposed CDSS to identify requirements, listed functions, and operations the system must perform. We then designed visual and workflow representations of the product to perform the operations. The user interface and workflow design were evaluated via heuristic and end user performance evaluation. The heuristic evaluation was done after the first prototype, and its results were incorporated into the product before the end user evaluation was conducted. First, we recruited 4 evaluators with strong domain expertise to study the initial prototype. Heuristic violations were coded and rated for severity. Second, after development of the system, we assembled a panel of nurses, consisting of 3 licensed vocational nurses and 7 registered nurses, to evaluate the user interface and workflow via simulated use cases. We recorded whether each session was successfully completed and its completion time. Each nurse was asked to use the National Aeronautics and Space Administration (NASA) Task Load Index to self-evaluate the amount of cognitive and physical burden associated with using the device. A total of 83 heuristic violations were identified in the studies. The distribution of the heuristic violations and their average severity are reported. The nurse evaluators successfully completed all 30 sessions of the performance evaluations. All nurses were able to use the device after a single training session. On average, the nurses took 111 seconds (SD 30 seconds) to complete the simulated task. The NASA Task Load Index results indicated that the work overhead on the nurses was low. In fact, most of the burden measures were consistent with zero. The only potentially significant burden was temporal demand, which was consistent with the primary use case of the tool. The evaluation has shown that our design was functional and met the requirements demanded by the nurses' tight schedules and heavy workloads. The user interface embedded in the tool provided compelling utility to the nurse with minimal distraction.

  7. The XChemExplorer graphical workflow tool for routine or large-scale protein-ligand structure determination.

    PubMed

    Krojer, Tobias; Talon, Romain; Pearce, Nicholas; Collins, Patrick; Douangamath, Alice; Brandao-Neto, Jose; Dias, Alexandre; Marsden, Brian; von Delft, Frank

    2017-03-01

    XChemExplorer (XCE) is a data-management and workflow tool to support large-scale simultaneous analysis of protein-ligand complexes during structure-based ligand discovery (SBLD). The user interfaces of established crystallographic software packages such as CCP4 [Winn et al. (2011), Acta Cryst. D67, 235-242] or PHENIX [Adams et al. (2010), Acta Cryst. D66, 213-221] have entrenched the paradigm that a `project' is concerned with solving one structure. This does not hold for SBLD, where many almost identical structures need to be solved and analysed quickly in one batch of work. Functionality to track progress and annotate structures is essential. XCE provides an intuitive graphical user interface which guides the user from data processing, initial map calculation, ligand identification and refinement up until data dissemination. It provides multiple entry points depending on the need of each project, enables batch processing of multiple data sets and records metadata, progress and annotations in an SQLite database. XCE is freely available and works on any Linux and Mac OS X system, and the only dependency is to have the latest version of CCP4 installed. The design and usage of this tool are described here, and its usefulness is demonstrated in the context of fragment-screening campaigns at the Diamond Light Source. It is routinely used to analyse projects comprising 1000 data sets or more, and therefore scales well to even very large ligand-design projects.
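    The SQLite bookkeeping mentioned above is a lightweight pattern worth illustrating; the table layout below is a made-up example of campaign progress tracking, not XCE's actual database schema:

        import sqlite3

        con = sqlite3.connect(":memory:")          # a real tool would keep this on disk
        con.execute("""CREATE TABLE dataset (
                           crystal_id   TEXT PRIMARY KEY,
                           stage        TEXT,       -- e.g. processed, refined, deposited
                           resolution_A REAL,
                           annotation   TEXT)""")

        datasets = [("x0001", "refined", 1.8, "clear ligand density"),
                    ("x0002", "processed", 2.3, ""),
                    ("x0003", "refined", 1.6, "possible alternate conformation")]
        con.executemany("INSERT INTO dataset VALUES (?, ?, ?, ?)", datasets)

        # Progress summary of the whole campaign in one query.
        for stage, n in con.execute(
                "SELECT stage, COUNT(*) FROM dataset GROUP BY stage ORDER BY stage"):
            print(f"{stage}: {n} data sets")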

  8. The XChemExplorer graphical workflow tool for routine or large-scale protein–ligand structure determination

    PubMed Central

    Krojer, Tobias; Talon, Romain; Pearce, Nicholas; Douangamath, Alice; Brandao-Neto, Jose; Dias, Alexandre; Marsden, Brian

    2017-01-01

    XChemExplorer (XCE) is a data-management and workflow tool to support large-scale simultaneous analysis of protein–ligand complexes during structure-based ligand discovery (SBLD). The user interfaces of established crystallographic software packages such as CCP4 [Winn et al. (2011), Acta Cryst. D67, 235–242] or PHENIX [Adams et al. (2010), Acta Cryst. D66, 213–221] have entrenched the paradigm that a ‘project’ is concerned with solving one structure. This does not hold for SBLD, where many almost identical structures need to be solved and analysed quickly in one batch of work. Functionality to track progress and annotate structures is essential. XCE provides an intuitive graphical user interface which guides the user from data processing, initial map calculation, ligand identification and refinement up until data dissemination. It provides multiple entry points depending on the need of each project, enables batch processing of multiple data sets and records metadata, progress and annotations in an SQLite database. XCE is freely available and works on any Linux and Mac OS X system, and the only dependency is to have the latest version of CCP4 installed. The design and usage of this tool are described here, and its usefulness is demonstrated in the context of fragment-screening campaigns at the Diamond Light Source. It is routinely used to analyse projects comprising 1000 data sets or more, and therefore scales well to even very large ligand-design projects. PMID:28291762

  9. Reproducible research in palaeomagnetism

    NASA Astrophysics Data System (ADS)

    Lurcock, Pontus; Florindo, Fabio

    2015-04-01

    The reproducibility of research findings is attracting increasing attention across all scientific disciplines. In palaeomagnetism as elsewhere, computer-based analysis techniques are becoming more commonplace, complex, and diverse. Analyses can often be difficult to reproduce from scratch, both for the original researchers and for others seeking to build on the work. We present a palaeomagnetic plotting and analysis program designed to make reproducibility easier. Part of the problem is the divide between interactive and scripted (batch) analysis programs. An interactive desktop program with a graphical interface is a powerful tool for exploring data and iteratively refining analyses, but usually cannot operate without human interaction. This makes it impossible to re-run an analysis automatically, or to integrate it into a larger automated scientific workflow - for example, a script to generate figures and tables for a paper. In some cases the parameters of the analysis process itself are not saved explicitly, making it hard to repeat or improve the analysis even with human interaction. Conversely, non-interactive batch tools can be controlled by pre-written scripts and configuration files, allowing an analysis to be 'replayed' automatically from the raw data. However, this advantage comes at the expense of exploratory capability: iteratively improving an analysis entails a time-consuming cycle of editing scripts, running them, and viewing the output. Batch tools also tend to require more computer expertise from their users. PuffinPlot is a palaeomagnetic plotting and analysis program which aims to bridge this gap. First released in 2012, it offers both an interactive, user-friendly desktop interface and a batch scripting interface, both making use of the same core library of palaeomagnetic functions. We present new improvements to the program that help to integrate the interactive and batch approaches, allowing an analysis to be interactively explored and refined, then saved as a self-contained configuration which can be re-run without human interaction. PuffinPlot can thus be used as a component of a larger scientific workflow, integrated with workflow management tools such as Kepler, without compromising its capabilities as an exploratory tool. Since both PuffinPlot and the platform it runs on (Java) are Free/Open Source software, even the most fundamental components of an analysis can be verified and reproduced.
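
    As a generic sketch of the interactive-versus-batch bridge described above (this is not PuffinPlot's actual scripting API; the parameter names and analysis step are invented), an interactively refined set of analysis parameters can be saved as a self-contained configuration and replayed later without human interaction:

      import json

      def analyse(samples, params):
          """Placeholder for a palaeomagnetic analysis step (e.g. PCA on demagnetisation data)."""
          return {"n_samples": len(samples), "params_used": params}

      # Interactive phase: parameters refined by hand, then saved as a configuration.
      params = {"demag_steps": [10, 20, 30, 40], "anchored_pca": False}
      with open("analysis_config.json", "w") as fh:
          json.dump(params, fh, indent=2)

      # Batch phase: the same analysis replayed automatically, e.g. by a script
      # that regenerates the figures and tables for a paper.
      with open("analysis_config.json") as fh:
          replayed = analyse(samples=["s1", "s2", "s3"], params=json.load(fh))
      print(replayed)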

  10. Developing a Taxonomy of Characteristics and Features of Collaboration Tools for Teams in Distributed Environments

    DTIC Science & Technology

    2007-09-01

    Search-result excerpt from a flattened feature table (only fragments are recoverable): the table lists collaboration software with company, URL and workflow-related features, including BlackBerry (http://www.blackberry.com/products/blackberry/index.shtml), Bricolage (http://www.bricolage.cc; workflow feature: customizable control over editorial content) and Nuxeo (Collaborative Portal content and workspaces for adding, editing and deleting content through a web interface).

  11. Semantic Document Library: A Virtual Research Environment for Documents, Data and Workflows Sharing

    NASA Astrophysics Data System (ADS)

    Kotwani, K.; Liu, Y.; Myers, J.; Futrelle, J.

    2008-12-01

    The Semantic Document Library (SDL) was driven by use cases from the environmental observatory communities and is designed to provide conventional document repository features of uploading, downloading, editing and versioning of documents as well as value-adding features of tagging, querying, sharing, annotating, ranking, provenance, social networking and geo-spatial mapping services. It allows users to organize a catalogue of watershed observation data, model output, workflows, as well as publications and documents related to the same watershed study through the tagging capability. Users can tag all relevant materials using the same watershed name and find all of them easily later using this tag. The underpinning semantic content repository can store materials from other cyberenvironments such as workflow or simulation tools, and SDL provides an effective interface to query and organize materials from various sources. Advanced features of the SDL allow users to visualize the provenance of the materials, such as the source and how the output data is derived. Other novel features include visualizing all geo-referenced materials on a geospatial map. SDL, as a component of a cyberenvironment portal (the NCSA Cybercollaboratory), has the goal of efficiently managing information and relationships between published artifacts (validated models, vetted data, workflows, annotations, best practices, reviews and papers) produced from raw research artifacts (data, notes, plans etc.) through agents (people, sensors etc.). The tremendous scientific potential of artifacts is realized through mechanisms of sharing, reuse and collaboration - empowering scientists to spread their knowledge and protocols and to benefit from the knowledge of others. SDL successfully implements web 2.0 technologies and design patterns along with a semantic content management approach that enables the use of multiple ontologies and dynamic evolution (e.g. folksonomies) of terminology. Scientific documents involved with many interconnected entities (artifacts or agents) are represented as RDF triples using the semantic content repository middleware Tupelo in one or more data/metadata RDF stores. Queries to the RDF stores enable the discovery of relations among data, processes and people, digging out valuable aspects and making recommendations to users, such as which tools are typically used to answer certain kinds of questions or with certain types of dataset. This innovative concept brings out coherent information about entities from four different perspectives: the social context (Who - human relations and interactions), the causal context (Why - provenance and history), the geo-spatial context (Where - location or spatially referenced information) and the conceptual context (What - domain-specific relations, ontologies etc.).
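
    The RDF-triple representation described above can be illustrated with a small Python sketch using the rdflib library; the namespace, predicates and tag values below are invented for illustration and are not the actual Tupelo/SDL vocabulary.

      from rdflib import Graph, Literal, Namespace, URIRef

      EX = Namespace("http://example.org/sdl/")   # hypothetical vocabulary
      g = Graph()

      # Tag a document and a data set with the same watershed name.
      doc = URIRef("http://example.org/docs/report-2008.pdf")
      data = URIRef("http://example.org/data/streamflow-2008.csv")
      g.add((doc, EX.taggedWith, Literal("Clear Creek watershed")))
      g.add((data, EX.taggedWith, Literal("Clear Creek watershed")))
      g.add((data, EX.derivedFrom, URIRef("http://example.org/sensors/gauge-42")))

      # Query: find every artifact tagged with the watershed, regardless of type.
      q = """SELECT ?artifact
             WHERE { ?artifact <http://example.org/sdl/taggedWith> "Clear Creek watershed" }"""
      for row in g.query(q):
          print(row.artifact)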

  12. Exploiting volatile opportunistic computing resources with Lobster

    NASA Astrophysics Data System (ADS)

    Woodard, Anna; Wolf, Matthias; Mueller, Charles; Tovar, Ben; Donnelly, Patrick; Hurtado Anampa, Kenyi; Brenner, Paul; Lannon, Kevin; Hildreth, Mike; Thain, Douglas

    2015-12-01

    Analysis of high energy physics experiments using the Compact Muon Solenoid (CMS) at the Large Hadron Collider (LHC) can be limited by the availability of computing resources. As a joint effort involving computer scientists and CMS physicists at Notre Dame, we have developed an opportunistic workflow management tool, Lobster, to harvest available cycles from university campus computing pools. Lobster consists of a management server, file server, and worker processes which can be submitted to any available computing resource without requiring root access. Lobster makes use of the Work Queue system to perform task management, while the CMS-specific software environment is provided via CVMFS and Parrot. Data is handled via Chirp and Hadoop for local data storage and XrootD for access to the CMS wide-area data federation. An extensive set of monitoring and diagnostic tools has been developed to facilitate system optimisation. We have tested Lobster using the 20 000-core cluster at Notre Dame, achieving approximately 8-10k tasks running simultaneously, sustaining approximately 9 Gbit/s of input data and 340 Mbit/s of output data.
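
    The task-management layer Lobster builds on can be sketched with the Work Queue Python bindings (assuming the CCTools bindings are installed and importable as work_queue; the script and file names are placeholders, and this is not Lobster's own interface):

      import work_queue as wq

      # Start a manager; workers submitted to campus pools connect back on this port.
      q = wq.WorkQueue(port=9123)

      # Describe one analysis task: its command, inputs and expected output.
      t = wq.Task("python analyze.py events_000.dat > summary_000.txt")
      t.specify_input_file("analyze.py")        # placeholder analysis script
      t.specify_input_file("events_000.dat")    # placeholder input data
      t.specify_output_file("summary_000.txt")
      q.submit(t)

      # Wait for tasks to come back from the opportunistic workers.
      while not q.empty():
          done = q.wait(5)
          if done:
              print("task", done.id, "finished with status", done.return_status)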

  13. New Tools For Understanding Microbial Diversity Using High-throughput Sequence Data

    NASA Astrophysics Data System (ADS)

    Knight, R.; Hamady, M.; Liu, Z.; Lozupone, C.

    2007-12-01

    High-throughput sequencing techniques such as 454 are straining the limits of tools traditionally used to build trees, choose OTUs, and perform other essential sequencing tasks. We have developed a workflow for phylogenetic analysis of large-scale sequence data sets that combines existing tools, such as the Arb phylogeny package and the NAST multiple sequence alignment tool, with new methods for choosing and clustering OTUs and for performing phylogenetic community analysis with UniFrac. This talk discusses the cyberinfrastructure we are developing to support the human microbiome project, and the application of these workflows to analyze very large data sets that contrast the gut microbiota with a range of physical environments. These tools will ultimately help to define core and peripheral microbiomes in a range of environments, and will allow us to understand the physical and biotic factors that contribute most to differences in microbial diversity.

  14. Automated lattice data generation

    NASA Astrophysics Data System (ADS)

    Ayyar, Venkitesh; Hackett, Daniel C.; Jay, William I.; Neil, Ethan T.

    2018-03-01

    The process of generating ensembles of gauge configurations (and measuring various observables over them) can be tedious and error-prone when done "by hand". In practice, most of this procedure can be automated with the use of a workflow manager. We discuss how this automation can be accomplished using Taxi, a minimal Python-based workflow manager built for generating lattice data. We present a case study demonstrating this technology.
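
    Taxi itself is not shown here; the sketch below is a generic, heavily simplified illustration of what a minimal Python workflow manager must do for lattice-style job chains: order tasks by their dependencies and run each step exactly once.

      # Each task: the tasks it depends on, and the work to perform (placeholders).
      tasks = {
          "gauge_0010": {"deps": [],             "run": lambda: print("generate configuration 10")},
          "gauge_0020": {"deps": ["gauge_0010"], "run": lambda: print("generate configuration 20")},
          "measure_10": {"deps": ["gauge_0010"], "run": lambda: print("measure observables on 10")},
          "measure_20": {"deps": ["gauge_0020"], "run": lambda: print("measure observables on 20")},
      }

      done = set()

      def run(name):
          """Run a task after recursively running its dependencies (no cycle handling)."""
          if name in done:
              return
          for dep in tasks[name]["deps"]:
              run(dep)
          tasks[name]["run"]()
          done.add(name)

      for name in tasks:
          run(name)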

  15. KDE Bioscience: platform for bioinformatics analysis workflows.

    PubMed

    Lu, Qiang; Hao, Pei; Curcin, Vasa; He, Weizhong; Li, Yuan-Yuan; Luo, Qing-Ming; Guo, Yi-Ke; Li, Yi-Xue

    2006-08-01

    Bioinformatics is a dynamic research area in which a large number of algorithms and programs have been developed rapidly and independently without much consideration so far of the need for standardization. The lack of such common standards, combined with unfriendly interfaces, makes it difficult for biologists to learn how to use these tools and to translate the data formats from one to another. Consequently, the construction of an integrative bioinformatics platform to facilitate biologists' research is an urgent and challenging task. KDE Bioscience is a Java-based software platform that collects a variety of bioinformatics tools and provides a workflow mechanism to integrate them. Nucleotide and protein sequences from local flat files, web sites, and relational databases can be entered, annotated, and aligned. Several home-made or third-party viewers are built in to provide visualization of annotations or alignments. KDE Bioscience can also be deployed in client-server mode where simultaneous execution of the same workflow is supported for multiple users. Moreover, workflows can be published as web pages that can be executed from a web browser. The power of KDE Bioscience comes from the integrated algorithms and data sources. With its generic workflow mechanism, other novel calculations and simulations can be integrated to augment the current sequence analysis functions. Because of this flexible and extensible architecture, KDE Bioscience makes an ideal integrated informatics environment for future bioinformatics or systems biology research.

  16. Web service module for access to g-Lite

    NASA Astrophysics Data System (ADS)

    Goranova, R.; Goranov, G.

    2012-10-01

    gLite is a lightweight middleware for grid computing, installed on all clusters of the European Grid Infrastructure (EGI). The middleware is partially service-oriented and does not provide well-defined Web services for job management. The existing Web services in the environment cannot be directly used by grid users for building service compositions in the EGI. In this article we present a module of well-defined Web services for job management in the EGI. We describe the architecture of the module and the design of the developed Web services. The presented Web services are composable and can participate in service compositions (workflows). An example of using the module with tools for service composition in gLite is shown.

  17. GCE Data Toolbox for MATLAB - a software framework for automating environmental data processing, quality control and documentation

    NASA Astrophysics Data System (ADS)

    Sheldon, W.; Chamblee, J.; Cary, R. H.

    2013-12-01

    Environmental scientists are under increasing pressure from funding agencies and journal publishers to release quality-controlled data in a timely manner, as well as to produce comprehensive metadata for submitting data to long-term archives (e.g. DataONE, Dryad and BCO-DMO). At the same time, the volume of digital data that researchers collect and manage is increasing rapidly due to advances in high-frequency electronic data collection from flux towers, instrumented moorings and sensor networks. However, few pre-built software tools are available to meet these data management needs, and those tools that do exist typically focus on part of the data management lifecycle or one class of data. The GCE Data Toolbox has proven to be both a generalized and effective software solution for environmental data management in the Long Term Ecological Research Network (LTER). This open source MATLAB software library, developed by the Georgia Coastal Ecosystems LTER program, integrates metadata capture, creation and management with data processing, quality control and analysis to support the entire data lifecycle. Raw data can be imported directly from common data logger formats (e.g. SeaBird, Campbell Scientific, YSI, Hobo), as well as delimited text files, MATLAB files and relational database queries. Basic metadata are derived from the data source itself (e.g. parsed from file headers) and by value inspection, and then augmented using editable metadata templates containing boilerplate documentation, attribute descriptors, code definitions and quality control rules. Data and metadata content, quality control rules and qualifier flags are then managed together in a robust data structure that supports database functionality and ensures data validity throughout processing. A growing suite of metadata-aware editing, quality control, analysis and synthesis tools is provided with the software to support managing data using graphical forms and command-line functions, as well as developing automated workflows for unattended processing. Finalized data and structured metadata can be exported in a wide variety of text and MATLAB formats or uploaded to a relational database for long-term archiving and distribution. The GCE Data Toolbox can be used as a complete, lightweight solution for environmental data and metadata management, but it can also be used in conjunction with other cyber infrastructure to provide a more comprehensive solution. For example, newly acquired data can be retrieved from a Data Turbine or Campbell LoggerNet Database server for quality control and processing, then transformed to CUAHSI Observations Data Model format and uploaded to a HydroServer for distribution through the CUAHSI Hydrologic Information System. The GCE Data Toolbox can also be leveraged in analytical workflows developed using Kepler or other systems that support MATLAB integration or tool chaining. This software can therefore be leveraged in many ways to help researchers manage, analyze and distribute the data they collect.
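
    The GCE Data Toolbox itself is MATLAB software; the pandas sketch below is only a language-neutral illustration of the metadata-driven quality-control idea, with invented column names and rule thresholds.

      import pandas as pd

      # Raw sensor records (invented values); in practice these would be parsed
      # from a data-logger file together with header-derived metadata.
      data = pd.DataFrame({
          "timestamp": pd.to_datetime(["2013-07-01 00:00", "2013-07-01 01:00",
                                       "2013-07-01 02:00", "2013-07-01 03:00"]),
          "water_temp_c": [24.1, 25.0, 57.3, 24.8],   # one obviously bad value
      })

      # Quality-control rules kept alongside the data, template-style.
      qc_rules = {"water_temp_c": {"min": -5.0, "max": 45.0}}

      # Apply the rules and store qualifier flags next to the values they describe.
      for column, rule in qc_rules.items():
          bad = (data[column] < rule["min"]) | (data[column] > rule["max"])
          data[column + "_flag"] = bad.map({True: "Q", False: ""})   # 'Q' = questionable

      print(data)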

  18. Teach-Discover-Treat (TDT): Collaborative Computational Drug Discovery for Neglected Diseases

    PubMed Central

    Jansen, Johanna M.; Cornell, Wendy; Tseng, Y. Jane; Amaro, Rommie E.

    2012-01-01

    Teach – Discover – Treat (TDT) is an initiative to promote the development and sharing of computational tools solicited through a competition with the aim to impact education and collaborative drug discovery for neglected diseases. Collaboration, multidisciplinary integration, and innovation are essential for successful drug discovery. This requires a workforce that is trained in state-of-the-art workflows and equipped with the ability to collaborate on platforms that are accessible and free. The TDT competition solicits high quality computational workflows for neglected disease targets, using freely available, open access tools. PMID:23085175

  19. MetaGenSense: A web-application for analysis and exploration of high throughput sequencing metagenomic data

    PubMed Central

    Denis, Jean-Baptiste; Vandenbogaert, Mathias; Caro, Valérie

    2016-01-01

    The detection and characterization of emerging infectious agents have been a continuing public health concern. High Throughput Sequencing (HTS) or Next-Generation Sequencing (NGS) technologies have proven to be promising approaches for efficient and unbiased detection of pathogens in complex biological samples, providing access to comprehensive analyses. As NGS approaches typically yield millions of putatively representative reads per sample, efficient data management and visualization resources have become mandatory. Usually, those resources are implemented through a dedicated Laboratory Information Management System (LIMS), solely to provide perspective regarding the available information. We developed an easily deployable web interface that facilitates the management and bioinformatics analysis of metagenomic data samples. It was engineered to run associated and dedicated Galaxy workflows for the detection and, eventually, the classification of pathogens. The web application allows easy interaction with existing Galaxy metagenomic workflows and facilitates the organization, exploration and aggregation of the most relevant sample-specific sequences among millions of genomic sequences, allowing users to determine their relative abundance and to associate them with the most closely related organism or pathogen. The user-friendly Django-based interface associates the users' input data and its metadata through a set of bio-IT resources (a Galaxy instance, together with sufficient storage and grid computing power). Galaxy is used to handle and analyze the user's input data, from loading and indexing through mapping, assembly and database searches. Interaction between our application and Galaxy is ensured by the BioBlend library, which gives API-based access to Galaxy's main features. Metadata about samples and runs, as well as the workflow results, are stored in the LIMS. For metagenomic classification and exploration purposes, we show, as a proof of concept, that integration of intuitive exploratory tools, like Krona for representation of taxonomic classification, can be achieved very easily. In the spirit of Galaxy, the interface enables the sharing of scientific results with fellow team members. PMID:28451381
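
    The BioBlend interaction described above might look roughly like the sketch below; the Galaxy URL, API key, workflow name, input mapping and file names are placeholders, and exact call signatures can vary between BioBlend versions.

      from bioblend.galaxy import GalaxyInstance

      # Connect to a Galaxy server (placeholder URL and API key).
      gi = GalaxyInstance(url="https://galaxy.example.org", key="YOUR_API_KEY")

      # Look up the metagenomic classification workflow by name (placeholder name).
      workflow_id = gi.workflows.get_workflows(name="pathogen-detection")[0]["id"]

      # Create a history for this sample and upload its reads (placeholder file).
      history = gi.histories.create_history(name="sample-042")
      upload = gi.tools.upload_file("sample-042_reads.fastq", history["id"])
      dataset_id = upload["outputs"][0]["id"]

      # Invoke the workflow with the uploaded dataset as its first input.
      invocation = gi.workflows.invoke_workflow(
          workflow_id,
          inputs={"0": {"src": "hda", "id": dataset_id}},
          history_id=history["id"])
      print("invocation state:", invocation["state"])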

  20. MetaGenSense: A web-application for analysis and exploration of high throughput sequencing metagenomic data.

    PubMed

    Correia, Damien; Doppelt-Azeroual, Olivia; Denis, Jean-Baptiste; Vandenbogaert, Mathias; Caro, Valérie

    2015-01-01

    The detection and characterization of emerging infectious agents have been a continuing public health concern. High Throughput Sequencing (HTS) or Next-Generation Sequencing (NGS) technologies have proven to be promising approaches for efficient and unbiased detection of pathogens in complex biological samples, providing access to comprehensive analyses. As NGS approaches typically yield millions of putatively representative reads per sample, efficient data management and visualization resources have become mandatory. Usually, those resources are implemented through a dedicated Laboratory Information Management System (LIMS), solely to provide perspective regarding the available information. We developed an easily deployable web interface that facilitates the management and bioinformatics analysis of metagenomic data samples. It was engineered to run associated and dedicated Galaxy workflows for the detection and, eventually, the classification of pathogens. The web application allows easy interaction with existing Galaxy metagenomic workflows and facilitates the organization, exploration and aggregation of the most relevant sample-specific sequences among millions of genomic sequences, allowing users to determine their relative abundance and to associate them with the most closely related organism or pathogen. The user-friendly Django-based interface associates the users' input data and its metadata through a set of bio-IT resources (a Galaxy instance, together with sufficient storage and grid computing power). Galaxy is used to handle and analyze the user's input data, from loading and indexing through mapping, assembly and database searches. Interaction between our application and Galaxy is ensured by the BioBlend library, which gives API-based access to Galaxy's main features. Metadata about samples and runs, as well as the workflow results, are stored in the LIMS. For metagenomic classification and exploration purposes, we show, as a proof of concept, that integration of intuitive exploratory tools, like Krona for representation of taxonomic classification, can be achieved very easily. In the spirit of Galaxy, the interface enables the sharing of scientific results with fellow team members.

  1. Designing healthcare information technology to catalyse change in clinical care.

    PubMed

    Lester, William T; Zai, Adrian H; Grant, Richard W; Chueh, Henry C

    2008-01-01

    The gap between best practice and actual patient care continues to be a pervasive problem in our healthcare system. Efforts to improve on this knowledge-performance gap have included computerised disease management programs designed to improve guideline adherence. However, current computerised reminder and decision support interventions directed at changing physician behaviour have had only a limited and variable effect on clinical outcomes. Further, immediate pay-for-performance financial pressures on institutions have created an environment where disease management systems are often created under duress, appended to existing clinical systems and poorly integrated into the existing workflow, potentially limiting their real-world effectiveness. The authors present a review of disease management as well as a conceptual framework to guide the development of more effective health information technology (HIT) tools for translating clinical information into clinical action.

  2. Implementation of Task-Tracking Software for Clinical IT Management.

    PubMed

    Purohit, Anne-Maria; Brutscheck, Clemens; Prokosch, Hans-Ulrich; Ganslandt, Thomas; Schneider, Martin

    2017-01-01

    Often in clinical IT departments, many different methods and IT systems are used for task-tracking and project organization. Based on managers' personal preferences and knowledge about project management methods, tools differ from team to team and even from employee to employee. This causes communication problems, especially when tasks need to be done in cooperation with different teams. Monitoring tasks and resources becomes impossible: there are no defined deliverables, which prevents reliable deadlines. Because of these problems, we implemented task-tracking software which is now in use across all seven teams at the University Hospital Erlangen. Over a period of seven months, a working group defined types of tasks (project, routine task, etc.), workflows, and views to monitor the tasks of the 7 divisions, 20 teams and 340 different IT services. The software has been in use since December 2016.

  3. Developing integrated workflows for the digitisation of herbarium specimens using a modular and scalable approach

    PubMed Central

    Haston, Elspeth; Cubey, Robert; Pullan, Martin; Atkins, Hannah; Harris, David J

    2012-01-01

    Digitisation programmes in many institutes frequently involve disparate and irregular funding, diverse selection criteria and scope, with different members of staff managing and operating the processes. These factors have influenced the decision at the Royal Botanic Garden Edinburgh to develop an integrated workflow for the digitisation of herbarium specimens which is modular and scalable to enable a single overall workflow to be used for all digitisation projects. This integrated workflow comprises three principal elements: a specimen workflow, a data workflow and an image workflow. The specimen workflow is strongly linked to curatorial processes which will impact on the prioritisation, selection and preparation of the specimens. The importance of including a conservation element within the digitisation workflow is highlighted. The data workflow includes the concept of three main categories of collection data: label data, curatorial data and supplementary data. It is shown that each category of data has its own properties which influence the timing of data capture within the workflow. Development of software has been carried out for the rapid capture of curatorial data, and optical character recognition (OCR) software is being used to increase the efficiency of capturing label data and supplementary data. The large number and size of the images has necessitated the inclusion of automated systems within the image workflow. PMID:22859881
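
    As a generic illustration of OCR-assisted label-data capture (not the Royal Botanic Garden Edinburgh's actual software), a minimal sketch using the pytesseract wrapper around Tesseract could look like this; the image path is a placeholder and the raw output would still need verification and mapping to database fields.

      from PIL import Image
      import pytesseract

      # Run OCR over a cropped label region of a specimen image (placeholder path).
      label_text = pytesseract.image_to_string(Image.open("specimen_00123_label.png"))

      # Rough split of the OCR output into candidate label-data lines for review.
      candidate_fields = [line.strip() for line in label_text.splitlines() if line.strip()]
      for line in candidate_fields:
          print(line)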

  4. Conceptual-level workflow modeling of scientific experiments using NMR as a case study

    PubMed Central

    Verdi, Kacy K; Ellis, Heidi JC; Gryk, Michael R

    2007-01-01

    Background: Scientific workflows improve the process of scientific experiments by making computations explicit, underscoring data flow, and emphasizing the participation of humans in the process when intuition and human reasoning are required. Workflows for experiments also highlight transitions among experimental phases, allowing intermediate results to be verified and supporting the proper handling of semantic mismatches and different file formats among the various tools used in the scientific process. Thus, scientific workflows are important for the modeling and subsequent capture of bioinformatics-related data. While much research has been conducted on the implementation of scientific workflows, the initial process of actually designing and generating the workflow at the conceptual level has received little consideration. Results: We propose a structured process to capture scientific workflows at the conceptual level that allows workflows to be documented efficiently, results in concise models of the workflow and more-correct workflow implementations, and provides insight into the scientific process itself. The approach uses three modeling techniques to model the structural, data flow, and control flow aspects of the workflow. The domain of biomolecular structure determination using Nuclear Magnetic Resonance spectroscopy is used to demonstrate the process. Specifically, we show the application of the approach to capture the workflow for the process of conducting biomolecular analysis using Nuclear Magnetic Resonance (NMR) spectroscopy. Conclusion: Using the approach, we were able to accurately document, in a short amount of time, numerous steps in the process of conducting an experiment using NMR spectroscopy. The resulting models are correct and precise, as outside validation of the models identified only minor omissions in the models. In addition, the models provide an accurate visual description of the control flow for conducting biomolecular analysis using NMR spectroscopy. PMID:17263870

  5. Conceptual-level workflow modeling of scientific experiments using NMR as a case study.

    PubMed

    Verdi, Kacy K; Ellis, Heidi Jc; Gryk, Michael R

    2007-01-30

    Scientific workflows improve the process of scientific experiments by making computations explicit, underscoring data flow, and emphasizing the participation of humans in the process when intuition and human reasoning are required. Workflows for experiments also highlight transitions among experimental phases, allowing intermediate results to be verified and supporting the proper handling of semantic mismatches and different file formats among the various tools used in the scientific process. Thus, scientific workflows are important for the modeling and subsequent capture of bioinformatics-related data. While much research has been conducted on the implementation of scientific workflows, the initial process of actually designing and generating the workflow at the conceptual level has received little consideration. We propose a structured process to capture scientific workflows at the conceptual level that allows workflows to be documented efficiently, results in concise models of the workflow and more-correct workflow implementations, and provides insight into the scientific process itself. The approach uses three modeling techniques to model the structural, data flow, and control flow aspects of the workflow. The domain of biomolecular structure determination using Nuclear Magnetic Resonance spectroscopy is used to demonstrate the process. Specifically, we show the application of the approach to capture the workflow for the process of conducting biomolecular analysis using Nuclear Magnetic Resonance (NMR) spectroscopy. Using the approach, we were able to accurately document, in a short amount of time, numerous steps in the process of conducting an experiment using NMR spectroscopy. The resulting models are correct and precise, as outside validation of the models identified only minor omissions in the models. In addition, the models provide an accurate visual description of the control flow for conducting biomolecular analysis using NMR spectroscopy.

  6. A scientific workflow framework for (13)C metabolic flux analysis.

    PubMed

    Dalman, Tolga; Wiechert, Wolfgang; Nöh, Katharina

    2016-08-20

    Metabolic flux analysis (MFA) with (13)C labeling data is a high-precision technique to quantify intracellular reaction rates (fluxes). One of the major challenges of (13)C MFA is the interactivity of the computational workflow according to which the fluxes are determined from the input data (metabolic network model, labeling data, and physiological rates). Here, the workflow assembly is inevitably determined by the scientist who has to consider interacting biological, experimental, and computational aspects. Decision-making is context dependent and requires expertise, rendering an automated evaluation process hardly possible. Here, we present a scientific workflow framework (SWF) for creating, executing, and controlling on demand (13)C MFA workflows. (13)C MFA-specific tools and libraries, such as the high-performance simulation toolbox 13CFLUX2, are wrapped as web services and thereby integrated into a service-oriented architecture. Besides workflow steering, the SWF features transparent provenance collection and enables full flexibility for ad hoc scripting solutions. To handle compute-intensive tasks, cloud computing is supported. We demonstrate how the challenges posed by (13)C MFA workflows can be solved with our approach on the basis of two proof-of-concept use cases. Copyright © 2015 Elsevier B.V. All rights reserved.
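
    The service-oriented wrapping described above can be illustrated generically; the sketch below exposes a dummy flux-estimation function as a small HTTP service using Flask and is not the actual 13CFLUX2 or SWF interface.

      from flask import Flask, jsonify, request

      app = Flask(__name__)

      def estimate_fluxes(model, labeling_data):
          """Placeholder for a compute-intensive 13C MFA fitting step."""
          return {"model": model, "n_measurements": len(labeling_data), "fluxes": {"v1": 1.23}}

      @app.route("/fit", methods=["POST"])
      def fit():
          # Expect a JSON body with the network model name and labeling measurements.
          payload = request.get_json()
          return jsonify(estimate_fluxes(payload["model"], payload["labeling_data"]))

      if __name__ == "__main__":
          app.run(port=5000)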

  7. Tools for monitoring system suitability in LC MS/MS centric proteomic experiments.

    PubMed

    Bereman, Michael S

    2015-03-01

    With advances in liquid chromatography coupled to tandem mass spectrometry technologies combined with the continued goals of biomarker discovery, clinical applications of established biomarkers, and integrating large multiomic datasets (i.e. "big data"), there remains an urgent need for robust tools to assess instrument performance (i.e. system suitability) in proteomic workflows. To this end, several freely available tools have been introduced that monitor a number of peptide identification (ID) and/or peptide ID free metrics. Peptide ID metrics include numbers of proteins, peptides, or peptide spectral matches identified from a complex mixture. Peptide ID free metrics include retention time reproducibility, full width half maximum, ion injection times, and integrated peptide intensities. The main driving force in the development of these tools is to monitor both intra- and interexperiment performance variability and to identify sources of variation. The purpose of this review is to summarize and evaluate these tools based on versatility, automation, vendor neutrality, metrics monitored, and visualization capabilities. In addition, the implementation of a robust system suitability workflow is discussed in terms of metrics, type of standard, and frequency of evaluation along with the obstacles to overcome prior to incorporating a more proactive approach to overall quality control in liquid chromatography coupled to tandem mass spectrometry based proteomic workflows. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  8. Using aerial images for establishing a workflow for the quantification of water management measures

    NASA Astrophysics Data System (ADS)

    Leuschner, Annette; Merz, Christoph; van Gasselt, Stephan; Steidl, Jörg

    2017-04-01

    Quantified landscape characteristics, such as morphology, land use or hydrological conditions, play an important role in hydrological investigations as landscape parameters directly control the overall water balance. A powerful assimilation and geospatial analysis of remote sensing datasets in combination with hydrological modeling allows landscape parameters and water balances to be quantified efficiently. This study focuses on the development of a workflow to extract hydrologically relevant data from aerial image datasets and derived products in order to allow an effective parametrization of a hydrological model. Consistent and self-contained data sources are indispensable for achieving reasonable modeling results. In order to minimize uncertainties and inconsistencies, input parameters for modeling should, where possible, be extracted mainly from a single remote-sensing dataset. Here, aerial images have been chosen because of their high spatial and spectral resolution, which permits the extraction of various model-relevant parameters, such as morphology, land use or artificial drainage systems. The methodological repertoire to extract environmental parameters ranges from analyses of digital terrain models to multispectral classification and segmentation of land use distribution maps and mapping of artificial drainage systems based on spectral and visual inspection. The workflow has been tested for a mesoscale catchment area which forms a characteristic hydrological system of a young moraine landscape located in the state of Brandenburg, Germany. This dataset was used as input for multi-temporal hydrological modelling of water balances to detect and quantify anthropogenic and meteorological impacts. ArcSWAT, a GIS-implemented extension and graphical user input interface for the Soil and Water Assessment Tool (SWAT), was chosen. The results of this modeling approach provide the basis for anticipating future development of the hydrological system and for adapting water resource management decisions to system changes.

  9. SECIMTools: a suite of metabolomics data analysis tools.

    PubMed

    Kirpich, Alexander S; Ibarra, Miguel; Moskalenko, Oleksandr; Fear, Justin M; Gerken, Joseph; Mi, Xinlei; Ashrafi, Ali; Morse, Alison M; McIntyre, Lauren M

    2018-04-20

    Metabolomics has the promise to transform the area of personalized medicine with the rapid development of high throughput technology for untargeted analysis of metabolites. Open-access, easy-to-use analytic tools that are broadly accessible to the biological community need to be developed. While technology used in metabolomics varies, most metabolomics studies have a set of features identified. Galaxy is an open access platform that enables scientists at all levels to interact with big data. Galaxy promotes reproducibility by saving histories and enabling the sharing of workflows among scientists. SECIMTools (SouthEast Center for Integrated Metabolomics) is a set of Python applications that are available both as standalone tools and wrapped for use in Galaxy. The suite includes a comprehensive set of quality control metrics (retention time window evaluation and various peak evaluation tools), visualization techniques (hierarchical cluster heatmap, principal component analysis, modular modularity clustering), basic statistical analysis methods (partial least squares - discriminant analysis, analysis of variance, t-test, Kruskal-Wallis non-parametric test), advanced classification methods (random forest, support vector machines), and advanced variable selection tools (least absolute shrinkage and selection operator (LASSO) and Elastic Net). SECIMTools leverages the Galaxy platform and enables integrated workflows for metabolomics data analysis made from building blocks designed for easy use and interpretability. Standard data formats and a set of utilities allow arbitrary linkages between tools to encourage novel workflow designs. The Galaxy framework enables future data integration for metabolomics studies with other omics data.
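
    As a rough illustration of the kinds of analyses the suite wraps (not SECIMTools' own code; the feature matrix and response values are invented), a dimension-reduction step and a variable-selection step might be chained like this:

      import numpy as np
      from sklearn.decomposition import PCA
      from sklearn.linear_model import Lasso

      # Toy feature matrix: 6 samples x 4 metabolite features (invented values).
      X = np.array([[1.0, 2.1, 0.3, 4.2],
                    [0.9, 2.0, 0.4, 4.1],
                    [1.1, 2.2, 0.2, 4.3],
                    [3.0, 0.5, 2.8, 1.0],
                    [2.9, 0.6, 2.9, 1.1],
                    [3.1, 0.4, 2.7, 0.9]])
      y = np.array([0.2, 0.1, 0.3, 1.8, 1.9, 1.7])   # invented response

      # Visualisation-style step: project samples onto two principal components.
      scores = PCA(n_components=2).fit_transform(X)

      # Variable-selection-style step: LASSO shrinks uninformative features to zero.
      coefficients = Lasso(alpha=0.1).fit(X, y).coef_
      print(scores)
      print(coefficients)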

  10. The Archive Solution for Distributed Workflow Management Agents of the CMS Experiment at LHC

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kuznetsov, Valentin; Fischer, Nils Leif; Guo, Yuyi

    The CMS experiment at the CERN LHC developed the Workflow Management Archive system to persistently store unstructured framework job report documents produced by distributed workflow management agents. In this paper we present its architecture, implementation, deployment, and integration with the CMS and CERN computing infrastructures, such as central HDFS and Hadoop Spark cluster. The system leverages modern technologies such as a document oriented database and the Hadoop eco-system to provide the necessary flexibility to reliably process, store, and aggregate O(1M) documents on a daily basis. We describe the data transformation, the short and long term storage layers, the query language, along with the aggregation pipeline developed to visualize various performance metrics to assist CMS data operators in assessing the performance of the CMS computing system.

  11. The Archive Solution for Distributed Workflow Management Agents of the CMS Experiment at LHC

    DOE PAGES

    Kuznetsov, Valentin; Fischer, Nils Leif; Guo, Yuyi

    2018-03-19

    The CMS experiment at the CERN LHC developed the Workflow Management Archive system to persistently store unstructured framework job report documents produced by distributed workflow management agents. In this paper we present its architecture, implementation, deployment, and integration with the CMS and CERN computing infrastructures, such as central HDFS and Hadoop Spark cluster. The system leverages modern technologies such as a document oriented database and the Hadoop eco-system to provide the necessary flexibility to reliably process, store, and aggregate O(1M) documents on a daily basis. We describe the data transformation, the short and long term storage layers, the query language, along with the aggregation pipeline developed to visualize various performance metrics to assist CMS data operators in assessing the performance of the CMS computing system.
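
    As a rough sketch of the kind of Spark aggregation described above (not the CMS archive's actual pipeline; the HDFS path and field names are invented), per-site summaries of JSON job-report documents could be computed like this:

      from pyspark.sql import SparkSession
      from pyspark.sql import functions as F

      spark = SparkSession.builder.appName("job-report-aggregation").getOrCreate()

      # Job-report documents stored as JSON on HDFS (placeholder path and fields).
      reports = spark.read.json("hdfs:///archive/job_reports/2018-03-19/")

      # Daily per-site summary: number of jobs and average wall-clock time.
      summary = (reports
                 .groupBy("site")
                 .agg(F.count("*").alias("n_jobs"),
                      F.avg("wallclock_seconds").alias("avg_wallclock_s")))
      summary.show()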

  12. The personal health record: consumers banking on their health.

    PubMed

    Ball, Marion J; Costin, Melinda Y; Lehmann, Christoph

    2008-01-01

    With personal health records (PHRs) acting much like ATM cards, increasingly wired consumers can "bank on health", accessing their own personal health information and a wide array of services. Consumer-owned, the PHR is dependent upon the existence of the legal electronic medical record (EMR) and interoperability. Working PHRs are in place in Veterans Health Administration, private health care institutions, and in the commercial sector. By allowing consumers to become involved in their own care, the PHR creates new roles and relationships. New tools change the clinician's workflow and thought flow, and pose new challenges for consumers. Key components of the PHR include the EMR and regional health information organizations (RHIOs); key strategies focus on human factors in successful project management. Online resources provided by the National Library of Medicine and Health On the Net help address consumer needs for information that is reliable and understandable. The growth of self-management tools adds to the challenge and the promise of PHRs for clinicians and consumers alike.

  13. Severe community-acquired pneumonia: timely management measures in the first 24 hours.

    PubMed

    Phua, Jason; Dean, Nathan C; Guo, Qi; Kuan, Win Sen; Lim, Hui Fang; Lim, Tow Keang

    2016-08-28

    Mortality rates for severe community-acquired pneumonia (CAP) range from 17 to 48% in published studies. In this review, we searched PubMed for relevant papers published between 1981 and June 2016 and relevant files. We explored how early and aggressive management measures, implemented within 24 hours of recognition of severe CAP and carried out both in the emergency department and in the ICU, decrease mortality in severe CAP. These measures begin with the use of severity assessment tools and the application of care bundles via clinical decision support tools. The bundles include early guideline-concordant antibiotics including macrolides, early haemodynamic support (lactate measurement, intravenous fluids, and vasopressors), and early respiratory support (high-flow nasal cannulae, lung-protective ventilation, prone positioning, and neuromuscular blockade for acute respiratory distress syndrome). While the proposed interventions appear straightforward, multiple barriers to their implementation exist. To successfully decrease mortality for severe CAP, early and close collaboration between emergency medicine and respiratory and critical care medicine teams is required. We propose a workflow incorporating these interventions.

  14. A Cloud-based Infrastructure and Architecture for Environmental System Research

    NASA Astrophysics Data System (ADS)

    Wang, D.; Wei, Y.; Shankar, M.; Quigley, J.; Wilson, B. E.

    2016-12-01

    The present availability of high-capacity networks, low-cost computers and storage devices, and the widespread adoption of hardware virtualization and service-oriented architecture provide a great opportunity to enable data and computing infrastructure sharing between closely related research activities. By taking advantage of these approaches, along with the world-class high-performance computing and data infrastructure located at Oak Ridge National Laboratory, a cloud-based infrastructure and architecture has been developed to efficiently deliver essential data and informatics services and utilities to the environmental system research community, and will provide unique capabilities that allow terrestrial ecosystem research projects to share their software utilities (tools), data and even data submission workflows in a straightforward fashion. The infrastructure will minimize large disruptions to current project-based data submission workflows for better acceptance by existing projects, since many ecosystem research projects already have their own requirements or preferences for data submission and collection. The infrastructure will eliminate scalability problems with current project silos by providing unified data services and infrastructure. The infrastructure consists of two key components: (1) a collection of configurable virtual computing environments and user management systems that expedite data submission and collection from the environmental system research community, and (2) scalable data management services and systems, originated and developed by ORNL data centers.

  15. Large scale and cloud-based multi-model analytics experiments on climate change data in the Earth System Grid Federation

    NASA Astrophysics Data System (ADS)

    Fiore, Sandro; Płóciennik, Marcin; Doutriaux, Charles; Blanquer, Ignacio; Barbera, Roberto; Donvito, Giacinto; Williams, Dean N.; Anantharaj, Valentine; Salomoni, Davide D.; Aloisio, Giovanni

    2017-04-01

    In many scientific domains such as climate, data is often n-dimensional and requires tools that support specialized data types and primitives to be properly stored, accessed, analysed and visualized. Moreover, new challenges arise in large-scale scenarios and eco-systems where petabytes (PB) of data can be available and data can be distributed and/or replicated, such as the Earth System Grid Federation (ESGF) serving the Coupled Model Intercomparison Project, Phase 5 (CMIP5) experiment, providing access to 2.5PB of data for the Intergovernmental Panel on Climate Change (IPCC) Fifth Assessment Report (AR5). A case study on climate models intercomparison data analysis addressing several classes of multi-model experiments is being implemented in the context of the EU H2020 INDIGO-DataCloud project. Such experiments require the availability of large amounts of data (multi-terabyte order) related to the output of several climate model simulations as well as the exploitation of scientific data management tools for large-scale data analytics. More specifically, the talk discusses in detail a use case on precipitation trend analysis in terms of requirements, architectural design solution, and infrastructural implementation. The experiment has been tested and validated on CMIP5 datasets, in the context of a large-scale distributed testbed across the EU and US involving three ESGF sites (LLNL, ORNL, and CMCC) and one central orchestrator site (PSNC). The general "environment" of the case study relates to (i) multi-model data analysis inter-comparison challenges, (ii) addressed on CMIP5 data, and (iii) made available through the IS-ENES/ESGF infrastructure. The added value of the solution proposed in the INDIGO-DataCloud project is summarized in the following: (i) it implements a different paradigm (from client- to server-side); (ii) it intrinsically reduces data movement; (iii) it makes the end-user setup lightweight; (iv) it fosters re-usability (of data, final/intermediate products, workflows, sessions, etc.) since everything is managed on the server-side; (v) it complements, extends and interoperates with the ESGF stack; (vi) it provides a "tool" for scientists to run multi-model experiments; and, finally, (vii) it can drastically reduce the time-to-solution for these experiments from weeks to hours. At the time the contribution is being written, the proposed testbed represents the first concrete implementation of a distributed multi-model experiment in the ESGF/CMIP context joining server-side and parallel processing, end-to-end workflow management and cloud computing. As opposed to the current scenario based on search & discovery, data download, and client-based data analysis, the INDIGO-DataCloud architectural solution described in this contribution addresses the scientific computing & analytics requirements by providing a paradigm shift based on server-side and high performance big data frameworks jointly with two-level workflow management systems realized at the PaaS level via a cloud infrastructure.

  16. Rethinking Clinical Workflow.

    PubMed

    Schlesinger, Joseph J; Burdick, Kendall; Baum, Sarah; Bellomy, Melissa; Mueller, Dorothee; MacDonald, Alistair; Chern, Alex; Chrouser, Kristin; Burger, Christie

    2018-03-01

    The concept of clinical workflow borrows from management and leadership principles outside of medicine. The only way to rethink clinical workflow is to understand the neuroscience principles that underlie attention and vigilance. With any implementation to improve practice, there are human factors that can promote or impede progress. Modulating the environment and working as a team to take care of patients is paramount. Clinicians must continually rethink clinical workflow, evaluate progress, and understand that other industries have something to offer. Then, novel approaches can be implemented to take the best care of patients. Copyright © 2017 Elsevier Inc. All rights reserved.

  17. A Community-Driven Workflow Recommendations and Reuse Infrastructure

    NASA Astrophysics Data System (ADS)

    Zhang, J.; Votava, P.; Lee, T. J.; Lee, C.; Xiao, S.; Nemani, R. R.; Foster, I.

    2013-12-01

    Aiming to connect the Earth science community to accelerate the rate of discovery, NASA Earth Exchange (NEX) has established an online repository and platform, so that researchers can publish and share their tools and models with colleagues. In recent years, workflow has become a popular technique at NEX for Earth scientists to define executable multi-step procedures for data processing and analysis. The ability to discover and reuse knowledge (sharable workflows or workflow) is critical to the future advancement of science. However, as reported in our earlier study, the reusability of scientific artifacts at the current time is very low. Scientists often do not feel confident in using other researchers' tools and utilities. One major reason is that researchers are often unaware of the existence of others' data preprocessing processes. Meanwhile, researchers often do not have time to fully document the processes and expose them to others in a standard way. These issues cannot be overcome by the existing workflow search technologies used in NEX and other data projects. Therefore, this project aims to develop a proactive recommendation technology based on collective NEX user behaviors. In this way, we aim to promote and encourage process and workflow reuse within NEX. Particularly, we focus on leveraging peer scientists' best practices to support the recommendation of artifacts developed by others. Our underlying theoretical foundation is rooted in the social cognitive theory, which holds that people learn by watching what others do. Our fundamental hypothesis is that sharable artifacts have network properties, much like humans in social networks. More generally, reusable artifacts form various types of social relationships (ties), and may be viewed as forming what organizational sociologists who use network analysis to study human interactions call a 'knowledge network.' In particular, we will tackle two research questions: R1: What hidden knowledge may be extracted from usage history to help Earth scientists better understand existing artifacts and how to use them in a proper manner? R2: Informed by insights derived from their computing contexts, how could such hidden knowledge be used to facilitate artifact reuse by Earth scientists? Our study of the two research questions will provide answers to three technical questions aiming to assist NEX users during workflow development: 1) How to determine what topics interest the researcher? 2) How to find appropriate artifacts? and 3) How to advise the researcher in artifact reuse? In this paper, we report our ongoing efforts to leverage social networking theory and analysis techniques to provide dynamic advice on artifact reuse to NEX users based on their surrounding contexts. As a proof of concept, we have designed and developed a plug-in to the VisTrails workflow design tool. When users develop workflows using VisTrails, our plug-in will proactively recommend the most relevant sub-workflows to the users.
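
    A heavily simplified sketch of the usage-history idea (generic, not the NEX/VisTrails plug-in itself; the session contents are invented): count how often artifacts co-occurred in past workflow sessions and recommend, for the artifacts in the user's current context, their most frequent companions.

      from collections import Counter
      from itertools import combinations

      # Past workflow sessions, each listing the artifacts (tools/sub-workflows) used.
      sessions = [
          {"ndvi_calc", "cloud_mask", "regrid"},
          {"ndvi_calc", "cloud_mask", "trend_fit"},
          {"regrid", "trend_fit"},
      ]

      # Build symmetric co-occurrence counts between artifact pairs.
      co_occurrence = Counter()
      for session in sessions:
          for a, b in combinations(sorted(session), 2):
              co_occurrence[(a, b)] += 1
              co_occurrence[(b, a)] += 1

      def recommend(current_artifacts, top_n=2):
          """Rank artifacts that most often accompanied the user's current selection."""
          scores = Counter()
          for artifact in current_artifacts:
              for (a, b), count in co_occurrence.items():
                  if a == artifact and b not in current_artifacts:
                      scores[b] += count
          return [name for name, _ in scores.most_common(top_n)]

      print(recommend({"ndvi_calc"}))   # e.g. ['cloud_mask', ...]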

  18. A comprehensive quality control workflow for paired tumor-normal NGS experiments.

    PubMed

    Schroeder, Christopher M; Hilke, Franz J; Löffler, Markus W; Bitzer, Michael; Lenz, Florian; Sturm, Marc

    2017-06-01

    Quality control (QC) is an important part of all NGS data analysis stages. Many available tools calculate QC metrics from different analysis steps of single sample experiments (raw reads, mapped reads and variant lists). Multi-sample experiments, such as sequencing of tumor-normal pairs, require additional QC metrics to ensure validity of results. These multi-sample QC metrics still lack standardization. We therefore suggest a new workflow for QC of DNA sequencing of tumor-normal pairs. With this workflow, well-known single-sample QC metrics and additional metrics specific for tumor-normal pairs can be calculated. The segmentation into different tools offers a high flexibility and allows reuse for other purposes. All tools produce qcML, a generic XML format for QC of -omics experiments. qcML uses quality metrics defined in an ontology, which was adapted for NGS. All QC tools are implemented in C++ and run both under Linux and Windows. Plotting requires Python 2.7 and matplotlib. The software is available under the 'GNU General Public License version 2' as part of the ngs-bits project: https://github.com/imgag/ngs-bits. christopher.schroeder@med.uni-tuebingen.de. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com

  19. A Workflow-based Intelligent Network Data Movement Advisor with End-to-end Performance Optimization

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhu, Michelle M.; Wu, Chase Q.

    2013-11-07

    Next-generation eScience applications often generate large amounts of simulation, experimental, or observational data that must be shared and managed by collaborative organizations. Advanced networking technologies and services have been rapidly developed and deployed to facilitate such massive data transfer. However, these technologies and services have not been fully utilized, mainly because their use typically requires significant domain knowledge and in many cases application users are not even aware of their existence. By leveraging the functionalities of an existing Network-Aware Data Movement Advisor (NADMA) utility, we propose a new Workflow-based Intelligent Network Data Movement Advisor (WINDMA) with end-to-end performance optimization for this DOE funded project. This WINDMA system integrates three major components: resource discovery, data movement, and status monitoring, and supports the sharing of common data movement workflows through account and database management. This system provides a web interface and interacts with existing data/space management and discovery services such as Storage Resource Management, transport methods such as GridFTP and GlobusOnline, and network resource provisioning brokers such as ION and OSCARS. We demonstrate the efficacy of the proposed transport-support workflow system in several use cases based on its implementation and deployment in DOE wide-area networks.

  20. Taking advantage of HTML5 browsers to realize the concepts of session state and workflow sharing in web-tool applications

    NASA Astrophysics Data System (ADS)

    Suftin, I.; Read, J. S.; Walker, J.

    2013-12-01

    Scientists prefer not having to be tied down to a specific machine or operating system in order to analyze local and remote data sets or publish work. Increasingly, analysis has been migrating to decentralized web services and data sets, using web clients to provide the analysis interface. While simplifying workflow access, analysis, and publishing of data, the move does bring with it its own unique set of issues. Web clients used for analysis typically offer workflows geared towards a single user, with steps and results that are often difficult to recreate and share with others. Furthermore, workflow results often may not be easily used as input for further analysis. Older browsers further complicate things by having no way to maintain larger chunks of information, often offloading the job of storage to the back-end server or trying to squeeze it into a cookie. It has been difficult to provide a concept of "session storage" or "workflow sharing" without a complex orchestration of the back-end for storage depending on either a centralized file system or database. With the advent of HTML5, browsers gained the ability to store more information through the use of the Web Storage API (a browser-cookie holds a maximum of 4 kilobytes). Web Storage gives us the ability to store megabytes of arbitrary data in-browser either with an expiration date or just for a session. This allows scientists to create, update, persist and share their workflow without depending on the backend to store session information, providing the flexibility for new web-based workflows to emerge. In the DSASWeb portal ( http://cida.usgs.gov/DSASweb/ ), using these techniques, the representation of every step in the analyst's workflow is stored as plain-text serialized JSON, which we can generate as a text file and provide to the analyst as an upload. This file may then be shared with others and loaded back into the application, restoring the application to the state it was in when the session file was generated. A user may then view results produced during that session or go back and alter input parameters, creating new results and producing new, unique sessions which they can then again share. This technique not only provides independence for the user to manage their session as they like, but also allows much greater freedom for the application provider to scale out without having to worry about carrying over user information or maintaining it in a central location.

  1. Early experiences in developing and managing the neuroscience gateway.

    PubMed

    Sivagnanam, Subhashini; Majumdar, Amit; Yoshimoto, Kenneth; Astakhov, Vadim; Bandrowski, Anita; Martone, MaryAnn; Carnevale, Nicholas T

    2015-02-01

    The last few decades have seen the emergence of computational neuroscience as a mature field where researchers are interested in modeling complex and large neuronal systems and require access to high performance computing machines and associated cyber infrastructure to manage computational workflows and data. The neuronal simulation tools used in this research field are also implemented for parallel computers and are suitable for high performance computing machines. But using these tools on complex high performance computing machines remains a challenge because of issues with acquiring computer time on these machines located at national supercomputer centers, dealing with the complex user interfaces of these machines, and handling data management and retrieval. The Neuroscience Gateway is being developed to alleviate and/or hide these barriers to entry for computational neuroscientists. It hides or eliminates, from the point of view of the users, all the administrative and technical barriers and makes parallel neuronal simulation tools easily available and accessible on complex high performance computing machines. It handles the running of jobs and data management and retrieval. This paper shares the early experiences in bringing up this gateway and describes the software architecture it is based on, how it is implemented, and how users can use it for computational neuroscience research using high performance computing at the back end. We also look at parallel scaling of some publicly available neuronal models and analyze the recent usage data of the neuroscience gateway.

  2. Early experiences in developing and managing the neuroscience gateway

    PubMed Central

    Sivagnanam, Subhashini; Majumdar, Amit; Yoshimoto, Kenneth; Astakhov, Vadim; Bandrowski, Anita; Martone, MaryAnn; Carnevale, Nicholas. T.

    2015-01-01

    The last few decades have seen the emergence of computational neuroscience as a mature field where researchers are interested in modeling complex and large neuronal systems and require access to high performance computing machines and associated cyber infrastructure to manage computational workflows and data. The neuronal simulation tools used in this research field are also implemented for parallel computers and are suitable for high performance computing machines. But using these tools on complex high performance computing machines remains a challenge because of issues with acquiring computer time on these machines located at national supercomputer centers, dealing with the complex user interfaces of these machines, and handling data management and retrieval. The Neuroscience Gateway is being developed to alleviate and/or hide these barriers to entry for computational neuroscientists. It hides or eliminates, from the point of view of the users, all the administrative and technical barriers and makes parallel neuronal simulation tools easily available and accessible on complex high performance computing machines. It handles the running of jobs and data management and retrieval. This paper shares the early experiences in bringing up this gateway and describes the software architecture it is based on, how it is implemented, and how users can use it for computational neuroscience research using high performance computing at the back end. We also look at parallel scaling of some publicly available neuronal models and analyze the recent usage data of the neuroscience gateway. PMID:26523124

  3. a Restoration Oriented Hbim System for Cultural Heritage Documentation: the Case Study of Parma Cathedral

    NASA Astrophysics Data System (ADS)

    Bruno, N.; Roncella, R.

    2018-05-01

    The need to safeguard and preserve Cultural Heritage (CH) is increasing, and especially in Italy, where the amount of historical buildings is considerable, having efficient and standardized processes of CH management and conservation becomes strategic. At present, there are no tools capable of fulfilling all the specific functions required by Cultural Heritage documentation and, due to the complexity of historical assets, no solutions as flexible and customizable as CH-specific needs require. Nevertheless, BIM methodology can represent the most effective solution, on condition that proper methodologies, tools and functions are made available. The paper describes ongoing research on the implementation of a Historical BIM system for the Parma cathedral, aimed at its maintenance, conservation and restoration. Its main goal is to give a concrete answer to the lack of specific tools required by Cultural Heritage documentation: organized and coordinated storage and management of historical data, easy analysis and query, time management, 3D modelling of irregular shapes, flexibility, user-friendliness, etc. The paper will describe the project and the implemented methodology, focusing mainly on the survey and modelling phases. In describing the methodology, critical issues about the creation of a HBIM will be highlighted, trying to outline a workflow applicable also in other similar contexts.

  4. Optimizing high performance computing workflow for protein functional annotation.

    PubMed

    Stanberry, Larissa; Rekepalli, Bhanu; Liu, Yuan; Giblock, Paul; Higdon, Roger; Montague, Elizabeth; Broomall, William; Kolker, Natali; Kolker, Eugene

    2014-09-10

    Functional annotation of newly sequenced genomes is one of the major challenges in modern biology. With modern sequencing technologies, the protein sequence universe is rapidly expanding. Newly sequenced bacterial genomes alone contain over 7.5 million proteins. The rate of data generation has far surpassed that of protein annotation. The volume of protein data makes manual curation infeasible, whereas a high compute cost limits the utility of existing automated approaches. In this work, we present an improved and optimized automated workflow to enable large-scale protein annotation. The workflow uses high performance computing architectures and a low complexity classification algorithm to assign proteins into existing clusters of orthologous groups of proteins. Based on the Position-Specific Iterative Basic Local Alignment Search Tool (PSI-BLAST), the algorithm ensures at least 80% specificity and sensitivity of the resulting classifications. The workflow utilizes highly scalable parallel applications for classification and sequence alignment. Using Extreme Science and Engineering Discovery Environment supercomputers, the workflow processed 1,200,000 newly sequenced bacterial proteins. With the rapid expansion of the protein sequence universe, the proposed workflow will enable scientists to annotate big genome data.
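    The workflow assigns proteins to clusters of orthologous groups with PSI-BLAST, run at scale on HPC systems. As a much-reduced illustration, the sketch below fans PSI-BLAST out over local cores and keeps the best-scoring hit per query; it assumes the BLAST+ psiblast executable is installed, and the database name (cog_profiles), directory layout, and thresholds are placeholders rather than the paper's settings.

    ```python
    import subprocess
    from concurrent.futures import ProcessPoolExecutor
    from pathlib import Path

    def classify_batch(fasta_file: Path) -> Path:
        """Run PSI-BLAST for one batch of proteins and write tabular hits."""
        out = fasta_file.with_suffix(".tsv")
        subprocess.run(
            ["psiblast",
             "-query", str(fasta_file),
             "-db", "cog_profiles",      # assumed pre-built COG profile database
             "-num_iterations", "3",
             "-evalue", "1e-5",
             "-outfmt", "6",             # tabular: qseqid sseqid ... evalue bitscore
             "-out", str(out)],
            check=True)
        return out

    def best_hits(result_file: Path) -> dict:
        """Keep the highest-scoring COG hit for each query protein."""
        assignments = {}
        for line in result_file.read_text().splitlines():
            fields = line.split("\t")
            query, subject, bitscore = fields[0], fields[1], float(fields[-1])
            if query not in assignments or bitscore > assignments[query][1]:
                assignments[query] = (subject, bitscore)
        return assignments

    if __name__ == "__main__":
        batches = sorted(Path("batches").glob("*.fasta"))   # assumed input layout
        with ProcessPoolExecutor() as pool:
            for result in pool.map(classify_batch, batches):
                print(best_hits(result))
    ```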

  5. Optimizing high performance computing workflow for protein functional annotation

    PubMed Central

    Stanberry, Larissa; Rekepalli, Bhanu; Liu, Yuan; Giblock, Paul; Higdon, Roger; Montague, Elizabeth; Broomall, William; Kolker, Natali; Kolker, Eugene

    2014-01-01

    Functional annotation of newly sequenced genomes is one of the major challenges in modern biology. With modern sequencing technologies, the protein sequence universe is rapidly expanding. Newly sequenced bacterial genomes alone contain over 7.5 million proteins. The rate of data generation has far surpassed that of protein annotation. The volume of protein data makes manual curation infeasible, whereas a high compute cost limits the utility of existing automated approaches. In this work, we present an improved and optimized automated workflow to enable large-scale protein annotation. The workflow uses high performance computing architectures and a low complexity classification algorithm to assign proteins into existing clusters of orthologous groups of proteins. Based on the Position-Specific Iterative Basic Local Alignment Search Tool (PSI-BLAST), the algorithm ensures at least 80% specificity and sensitivity of the resulting classifications. The workflow utilizes highly scalable parallel applications for classification and sequence alignment. Using Extreme Science and Engineering Discovery Environment supercomputers, the workflow processed 1,200,000 newly sequenced bacterial proteins. With the rapid expansion of the protein sequence universe, the proposed workflow will enable scientists to annotate big genome data. PMID:25313296

  6. Scheduling Multilevel Deadline-Constrained Scientific Workflows on Clouds Based on Cost Optimization

    DOE PAGES

    Malawski, Maciej; Figiela, Kamil; Bubak, Marian; ...

    2015-01-01

    This paper presents a cost optimization model for scheduling scientific workflows on IaaS clouds such as Amazon EC2 or RackSpace. We assume multiple IaaS clouds with heterogeneous virtual machine instances, with a limited number of instances per cloud and hourly billing. Input and output data are stored on a cloud object store such as Amazon S3. Applications are scientific workflows modeled as DAGs, as in the Pegasus Workflow Management System. We assume that tasks in the workflows are grouped into levels of identical tasks. Our model is specified using mathematical programming languages (AMPL and CMPL) and allows us to minimize the cost of workflow execution under deadline constraints. We present results obtained using our model and the benchmark workflows representing real scientific applications in a variety of domains. The data used for evaluation come from the synthetic workflows and from general purpose cloud benchmarks, as well as from data measured in our own experiments with Montage, an astronomical application, executed on the Amazon EC2 cloud. We indicate how this model can be used for scenarios that require resource planning for scientific workflows and their ensembles.
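    The paper's model is a mathematical program (AMPL/CMPL) that minimizes execution cost under a deadline, with hourly-billed VMs and levels of identical tasks. The sketch below is a toy, standard-library-only analogue of that idea, not the authors' formulation: it enumerates one VM type and instance count per level and picks the cheapest plan that meets the deadline. All prices, speeds, and task counts are invented.

    ```python
    import itertools
    import math

    # Toy deadline-constrained cost model (illustrative numbers only).
    vm_types = {"small": {"price": 0.10, "task_hours": 1.0},   # price per instance-hour
                "large": {"price": 0.40, "task_hours": 0.25}}  # faster but pricier
    levels = [{"tasks": 40}, {"tasks": 10}]                    # identical tasks per level
    deadline_hours = 6
    max_instances = 8

    def level_cost_and_time(level, vm, count):
        """Time = waves of tasks times per-task runtime; billing is per started hour."""
        hours = math.ceil(level["tasks"] / count) * vm_types[vm]["task_hours"]
        billed_hours = count * math.ceil(hours)
        return billed_hours * vm_types[vm]["price"], hours

    best = None
    choices = [(vm, n) for vm in vm_types for n in range(1, max_instances + 1)]
    for assignment in itertools.product(choices, repeat=len(levels)):
        cost = time = 0.0
        for level, (vm, n) in zip(levels, assignment):
            c, t = level_cost_and_time(level, vm, n)
            cost, time = cost + c, time + t                    # levels run sequentially
        if time <= deadline_hours and (best is None or cost < best[0]):
            best = (cost, assignment)

    print(best)   # cheapest (cost, per-level choice) meeting the deadline
    ```

    A real solver-based formulation scales far better than this brute force, which is exactly why the authors use mathematical programming.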

  7. Digitization workflows for flat sheets and packets of plants, algae, and fungi

    PubMed Central

    Nelson, Gil; Sweeney, Patrick; Wallace, Lisa E.; Rabeler, Richard K.; Allard, Dorothy; Brown, Herrick; Carter, J. Richard; Denslow, Michael W.; Ellwood, Elizabeth R.; Germain-Aubrey, Charlotte C.; Gilbert, Ed; Gillespie, Emily; Goertzen, Leslie R.; Legler, Ben; Marchant, D. Blaine; Marsico, Travis D.; Morris, Ashley B.; Murrell, Zack; Nazaire, Mare; Neefus, Chris; Oberreiter, Shanna; Paul, Deborah; Ruhfel, Brad R.; Sasek, Thomas; Shaw, Joey; Soltis, Pamela S.; Watson, Kimberly; Weeks, Andrea; Mast, Austin R.

    2015-01-01

    Effective workflows are essential components in the digitization of biodiversity specimen collections. To date, no comprehensive, community-vetted workflows have been published for digitizing flat sheets and packets of plants, algae, and fungi, even though latest estimates suggest that only 33% of herbarium specimens have been digitally transcribed, 54% of herbaria use a specimen database, and 24% are imaging specimens. In 2012, iDigBio, the U.S. National Science Foundation’s (NSF) coordinating center and national resource for the digitization of public, nonfederal U.S. collections, launched several working groups to address this deficiency. Here, we report the development of 14 workflow modules with 7–36 tasks each. These workflows represent the combined work of approximately 35 curators, directors, and collections managers representing more than 30 herbaria, including 15 NSF-supported plant-related Thematic Collections Networks and collaboratives. The workflows are provided for download as Portable Document Format (PDF) and Microsoft Word files. Customization of these workflows for specific institutional implementation is encouraged. PMID:26421256

  8. Bioinformatics resource manager v2.3: an integrated software environment for systems biology with microRNA and cross-species analysis tools

    PubMed Central

    2012-01-01

    Background MicroRNAs (miRNAs) are noncoding RNAs that direct post-transcriptional regulation of protein coding genes. Recent studies have shown miRNAs are important for controlling many biological processes, including nervous system development, and are highly conserved across species. Given their importance, computational tools are necessary for analysis, interpretation and integration of high-throughput (HTP) miRNA data in an increasing number of model species. The Bioinformatics Resource Manager (BRM) v2.3 is a software environment for data management, mining, integration and functional annotation of HTP biological data. In this study, we report recent updates to BRM for miRNA data analysis and cross-species comparisons across datasets. Results BRM v2.3 has the capability to query predicted miRNA targets from multiple databases, retrieve potential regulatory miRNAs for known genes, integrate experimentally derived miRNA and mRNA datasets, perform ortholog mapping across species, and retrieve annotation and cross-reference identifiers for an expanded number of species. Here we use BRM to show that developmental exposure of zebrafish to 30 uM nicotine from 6–48 hours post fertilization (hpf) results in behavioral hyperactivity in larval zebrafish and alteration of putative miRNA gene targets in whole embryos at developmental stages that encompass early neurogenesis. We show typical workflows for using BRM to integrate experimental zebrafish miRNA and mRNA microarray datasets with example retrievals for zebrafish, including pathway annotation and mapping to human ortholog. Functional analysis of differentially regulated (p<0.05) gene targets in BRM indicates that nicotine exposure disrupts genes involved in neurogenesis, possibly through misregulation of nicotine-sensitive miRNAs. Conclusions BRM provides the ability to mine complex data for identification of candidate miRNAs or pathways that drive phenotypic outcome and, therefore, is a useful hypothesis generation tool for systems biology. The miRNA workflow in BRM allows for efficient processing of multiple miRNA and mRNA datasets in a single software environment with the added capability to interact with public data sources and visual analytic tools for HTP data analysis at a systems level. BRM is developed using Java™ and other open-source technologies for free distribution (http://www.sysbio.org/dataresources/brm.stm). PMID:23174015

  9. A Comprehensive Automated 3D Approach for Building Extraction, Reconstruction, and Regularization from Airborne Laser Scanning Point Clouds

    PubMed Central

    Dorninger, Peter; Pfeifer, Norbert

    2008-01-01

    Three dimensional city models are necessary for supporting numerous management applications. For the determination of city models for visualization purposes, several standardized workflows do exist. They are either based on photogrammetry or on LiDAR or on a combination of both data acquisition techniques. However, the automated determination of reliable and highly accurate city models is still a challenging task, requiring a workflow comprising several processing steps. The most relevant are building detection, building outline generation, building modeling, and finally, building quality analysis. Commercial software tools for building modeling require, generally, a high degree of human interaction and most automated approaches described in literature stress the steps of such a workflow individually. In this article, we propose a comprehensive approach for automated determination of 3D city models from airborne acquired point cloud data. It is based on the assumption that individual buildings can be modeled properly by a composition of a set of planar faces. Hence, it is based on a reliable 3D segmentation algorithm, detecting planar faces in a point cloud. This segmentation is of crucial importance for the outline detection and for the modeling approach. We describe the theoretical background, the segmentation algorithm, the outline detection, and the modeling approach, and we present and discuss several actual projects. PMID:27873931
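    The approach rests on a segmentation step that detects planar faces in the point cloud before building outlines and models are derived. As a simplified illustration only (not the authors' segmentation algorithm), the sketch below fits a single dominant plane with RANSAC, assuming NumPy is available; a full pipeline would extract planes iteratively and handle many faces per building.

    ```python
    import numpy as np

    def fit_plane(points: np.ndarray):
        """Least-squares plane (unit normal n, offset d) through >= 3 points."""
        centroid = points.mean(axis=0)
        _, _, vt = np.linalg.svd(points - centroid)
        normal = vt[-1]                       # direction of smallest variance
        return normal, -normal @ centroid

    def ransac_plane(points: np.ndarray, threshold=0.05, iterations=200, rng=None):
        """Return the inlier mask of the best single plane found by RANSAC."""
        if rng is None:
            rng = np.random.default_rng(0)
        best_mask = np.zeros(len(points), dtype=bool)
        for _ in range(iterations):
            sample = points[rng.choice(len(points), 3, replace=False)]
            normal, d = fit_plane(sample)
            distances = np.abs(points @ normal + d)
            mask = distances < threshold
            if mask.sum() > best_mask.sum():
                best_mask = mask
        return best_mask

    if __name__ == "__main__":
        rng = np.random.default_rng(1)
        xy = rng.uniform(0, 10, (500, 2))
        roof = np.c_[xy, 0.2 * xy[:, 0] + rng.normal(0, 0.01, 500)]   # synthetic roof face
        clutter = rng.uniform(0, 10, (100, 3))                        # vegetation/outliers
        cloud = np.vstack([roof, clutter])
        inliers = ransac_plane(cloud)
        print(inliers.sum(), "points assigned to the dominant planar face")
    ```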

  10. Digital Workflows for Restoration and Management of the Museum Affandi - a Case Study in Challenging Circumstances

    NASA Astrophysics Data System (ADS)

    Herbig, U.; Styhler-Aydın, G.; Grandits, D.; Stampfer, L.; Pont, U.; Mayer, I.

    2017-08-01

    The appropriate restoration of architectural heritage needs careful and comprehensive documentation of the existing structures, which becomes even more elaborate if the function of the building needs special attention, as in museums. In a collaborative project between the Universitas Gadjah Mada, Yogyakarta, Indonesia and two universities in Austria (TU Wien and the Danube University Krems), a restoration and adaptation concept for the Affandi Museum in Yogyakarta is currently in progress. It provides a perfect case study for the development of a workflow to combine data from a building survey, architectural research, indoor climate measurements and the documentation of artwork in a challenging environment, ranging from a hot and humid tropical climate to continuous threats from natural hazards such as earthquakes or volcanic eruptions. The Affandi Museum houses the collection of Affandi, who is considered to be Indonesia's foremost Expressionist painter and who partly designed and constructed the museum himself. With the spirit of the artist still perceptible in the complex, the Affandi Museum is an important part of the Indonesian cultural heritage. Its preservation therefore requires special attention and adds to the complexity of developing a monitoring and maintenance concept. This paper describes the ongoing development of a workflow from the measurement and study of the objects, both architectural and artwork, to a semantically enriched BIM model serving as the basis for a sustainable monitoring tool for the Affandi Museum.

  11. Building an efficient curation workflow for the Arabidopsis literature corpus

    PubMed Central

    Li, Donghui; Berardini, Tanya Z.; Muller, Robert J.; Huala, Eva

    2012-01-01

    TAIR (The Arabidopsis Information Resource) is the model organism database (MOD) for Arabidopsis thaliana, a model plant with a literature corpus of about 39 000 articles in PubMed, with over 4300 new articles added in 2011. We have developed a literature curation workflow incorporating both automated and manual elements to cope with this flood of new research articles. The current workflow can be divided into two phases: article selection and curation. Structured controlled vocabularies, such as the Gene Ontology and Plant Ontology are used to capture free text information in the literature as succinct ontology-based annotations suitable for the application of computational analysis methods. We also describe our curation platform and the use of text mining tools in our workflow. Database URL: www.arabidopsis.org PMID:23221298

  12. ballaxy: web services for structural bioinformatics.

    PubMed

    Hildebrandt, Anna Katharina; Stöckel, Daniel; Fischer, Nina M; de la Garza, Luis; Krüger, Jens; Nickels, Stefan; Röttig, Marc; Schärfe, Charlotta; Schumann, Marcel; Thiel, Philipp; Lenhof, Hans-Peter; Kohlbacher, Oliver; Hildebrandt, Andreas

    2015-01-01

    Web-based workflow systems have gained considerable momentum in sequence-oriented bioinformatics. In structural bioinformatics, however, such systems are still relatively rare; while commercial stand-alone workflow applications are common in the pharmaceutical industry, academic researchers often still rely on command-line scripting to glue individual tools together. In this work, we address the problem of building a web-based system for workflows in structural bioinformatics. For the underlying molecular modelling engine, we opted for the BALL framework because of its extensive and well-tested functionality in the field of structural bioinformatics. The large number of molecular data structures and algorithms implemented in BALL allows for elegant and sophisticated development of new approaches in the field. We hence connected the versatile BALL library and its visualization and editing front end BALLView with the Galaxy workflow framework. The result, which we call ballaxy, enables the user to simply and intuitively create sophisticated pipelines for applications in structure-based computational biology, integrated into a standard tool for molecular modelling. ballaxy consists of three parts: some minor modifications to the Galaxy system, a collection of tools and an integration into the BALL framework and the BALLView application for molecular modelling. Modifications to Galaxy will be submitted to the Galaxy project, and the BALL and BALLView integrations will be integrated in the next major BALL release. After acceptance of the modifications into the Galaxy project, we will publish all ballaxy tools via the Galaxy toolshed. In the meantime, all three components are available from http://www.ball-project.org/ballaxy. Also, docker images for ballaxy are available at https://registry.hub.docker.com/u/anhi/ballaxy/dockerfile/. ballaxy is licensed under the terms of the GPL.

  13. A Multiagent Modeling Environment for Simulating Work Practice in Organizations

    NASA Technical Reports Server (NTRS)

    Sierhuis, Maarten; Clancey, William J.; vanHoof, Ron

    2004-01-01

    In this paper we position Brahms as a tool for simulating organizational processes. Brahms is a modeling and simulation environment for analyzing human work practice, and for using such models to develop intelligent software agents to support the work practice in organizations. Brahms is the result of more than ten years of research at the Institute for Research on Learning (IRL), NYNEX Science & Technology (the former R&D institute of the Baby Bell telephone company in New York, now Verizon), and for the last six years at NASA Ames Research Center, in the Work Systems Design and Evaluation group, part of the Computational Sciences Division (Code IC). Brahms has been used on more than ten modeling and simulation research projects, and recently has been used as a distributed multiagent development environment for developing work practice support tools for human in-situ science exploration on planetary surfaces, in particular a human mission to Mars. Brahms was originally conceived of as a business process modeling and simulation tool that incorporates the social systems of work, by illuminating how formal process flow descriptions relate to people's actual located activities in the workplace. Our research started in the early nineties as a reaction to experiences with work process modeling and simulation. Although an effective tool for convincing management of the potential cost-savings of the newly designed work processes, the modeling and simulation environment was only able to describe work as a normative workflow. However, the social systems uncovered in the work practices studied by the design team played a significant role in how work actually got done: actual lived work. Multitasking, informal assistance and circumstantial work interactions could not easily be represented in a tool with a strict workflow modeling paradigm. In response, we began to develop a tool that would have the benefits of work process modeling and simulation, but be distinctively able to represent the relations of people, locations, systems, artifacts, communication and information content.

  14. Ultra-scale Visualization Climate Data Analysis Tools (UV-CDAT)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Williams, Dean N.

    2011-07-20

    This report summarizes work carried out by the Ultra-scale Visualization Climate Data Analysis Tools (UV-CDAT) Team for the period of January 1, 2011 through June 30, 2011. It discusses highlights, overall progress, period goals, and collaborations and lists papers and presentations. To learn more about our project, please visit our UV-CDAT website (URL: http://uv-cdat.org). This report will be forwarded to the program manager for the Department of Energy (DOE) Office of Biological and Environmental Research (BER), national and international collaborators and stakeholders, and to researchers working on a wide range of other climate model, reanalysis, and observation evaluation activities. The UV-CDAT executive committee consists of Dean N. Williams of Lawrence Livermore National Laboratory (LLNL); Dave Bader and Galen Shipman of Oak Ridge National Laboratory (ORNL); Phil Jones and James Ahrens of Los Alamos National Laboratory (LANL); Claudio Silva of Polytechnic Institute of New York University (NYU-Poly); and Berk Geveci of Kitware, Inc. The UV-CDAT team consists of researchers and scientists with diverse domain knowledge whose home institutions also include the National Aeronautics and Space Administration (NASA) and the University of Utah. All work is accomplished under DOE open-source guidelines and in close collaboration with the project's stakeholders, domain researchers, and scientists. Working directly with BER climate science analysis projects, this consortium will develop and deploy data and computational resources useful to a wide variety of stakeholders, including scientists, policymakers, and the general public. Members of this consortium already collaborate with other institutions and universities in researching data discovery, management, visualization, workflow analysis, and provenance. The UV-CDAT team will address the following high-level visualization requirements: (1) Alternative parallel streaming statistics and analysis pipelines - Data parallelism, Task parallelism, Visualization parallelism; (2) Optimized parallel input/output (I/O); (3) Remote interactive execution; (4) Advanced intercomparison visualization; (5) Data provenance processing and capture; and (6) Interfaces for scientists - Workflow data analysis and visualization construction tools, and Visualization interfaces.

  15. P19-S Managing Proteomics Data from Data Generation and Data Warehousing to Central Data Repository and Journal Reviewing Processes

    PubMed Central

    Thiele, H.; Glandorf, J.; Koerting, G.; Reidegeld, K.; Blüggel, M.; Meyer, H.; Stephan, C.

    2007-01-01

    In today’s proteomics research, various techniques, instrumentation, and bioinformatics tools are necessary to manage the large amount of heterogeneous data, with automatic quality control, to produce reliable and comparable results. Therefore, a data-processing pipeline is mandatory for data validation and comparison in a data-warehousing system. The proteome bioinformatics platform ProteinScape has been proven to cover these needs. The reprocessing of HUPO BPP participants’ MS data was done within ProteinScape. The reprocessed information was transferred into the global data repository PRIDE. ProteinScape as a data-warehousing system covers two main aspects: archiving relevant data of the proteomics workflow and information extraction functionality (protein identification, quantification and generation of biological knowledge). As a strategy for automatic data validation, different protein search engines are integrated. Result analysis is performed using a decoy database search strategy, which allows the measurement of the false-positive identification rate. Peptide identifications across different workflows, different MS techniques, and different search engines are merged to obtain a quality-controlled protein list. The proteomics identifications database (PRIDE), as a public data repository, is an archiving system where data are finally stored and no longer changed by further processing steps. Data submission to PRIDE is open to proteomics laboratories generating protein and peptide identifications. An export tool has been developed for transferring all relevant HUPO BPP data from ProteinScape into PRIDE using the PRIDE.xml format. The EU-funded ProDac project will coordinate the development of software tools covering international standards for the representation of proteomics data. The implementation of data submission pipelines and systematic data collection in public standards-compliant repositories will cover all aspects, from the generation of MS data in each laboratory to the conversion of all the annotating information and identifications to a standardized format. Such datasets can be used in the course of publishing in scientific journals.
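    ProteinScape's automatic validation relies on a decoy-database search to measure the false-positive identification rate. The snippet below shows the common target-decoy estimate (decoy hits divided by target hits above a score threshold) on toy data; the peptides, scores, and thresholds are illustrative and not tied to ProteinScape or any particular search engine.

    ```python
    # Schematic target-decoy filtering: each peptide-spectrum match (PSM) carries a
    # search-engine score and a flag saying whether it hit the decoy database.
    psms = [
        {"peptide": "LSSPATLNSR", "score": 78.2, "decoy": False},
        {"peptide": "REVERSED_1", "score": 31.0, "decoy": True},
        {"peptide": "VATVSLPR",   "score": 55.4, "decoy": False},
        {"peptide": "REVERSED_2", "score": 52.9, "decoy": True},
    ]   # toy values

    def fdr_at_threshold(psms, threshold):
        """Estimate the false-positive rate as decoy hits / target hits above threshold."""
        accepted = [p for p in psms if p["score"] >= threshold]
        targets = sum(not p["decoy"] for p in accepted)
        decoys = sum(p["decoy"] for p in accepted)
        return decoys / targets if targets else 0.0

    for t in (30, 50, 70):
        print(f"score >= {t}: estimated false-positive rate = {fdr_at_threshold(psms, t):.2f}")
    ```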

  16. Python Processing and Version Control using VisTrails for the Netherlands Hydrological Instrument (Invited)

    NASA Astrophysics Data System (ADS)

    Verkaik, J.

    2013-12-01

    The Netherlands Hydrological Instrument (NHI) model predicts water demands in periods of drought, supporting the Dutch decision makers in taking operational as well as long-term decisions with respect to the water supply. Other applications of NHI are predicting fresh-salt interaction, nutrient loadings, and agricultural change. The NHI model consists of several coupled models: a saturated groundwater model (MODFLOW), an unsaturated groundwater model (MetaSWAP), a sub-catchment surface water model (MOZART), and a distribution network of surface waters model (DM/SOBEK). Each of these models requires specific, usually large, input data that may be the result of sophisticated schematization workflows. Input data can also be dependent on each other; for example, the precipitation data is input for the unsaturated zone model (cells) as well as for the surface water models (polygons). For efficient data management, we developed several Python tools so that the modeler or stakeholder can use the model in a user-friendly manner, and data is managed in a consistent, transparent and reproducible way. Two open source Python tools are presented here: the data version control module for the workflow manager VisTrails called FileSync, and the NHI model control script that uses FileSync. VisTrails is an open-source scientific workflow and provenance management system that provides support for simulations, data exploration and visualization. Since VisTrails does not directly support version control, we developed a version control module called FileSync. With this generic module, the user can synchronize data from and to his workflow through a dialog window. The FileSync dialog calls the FileSync script, which is command-line based and performs the actual data synchronization. This script allows the user to easily create a model repository, upload and download data, create releases and define scenarios. The data synchronization approach applied here differs from systems such as Subversion or Git, since these systems do not perform well for large (binary) model data files. For this reason, a new concept of parameterization and data splitting has been implemented. Each file, or set of files, is uniquely labeled as a parameter, and for each parameter metadata is maintained in Subversion. The metadata contains file hashes to identify the data content and the location where the actual bulk data are stored, which can be reached by FTP. The NHI model control script is a command-line driven Python script for pre-processing, running, and post-processing the NHI model and uses one single configuration file for all computational kernels. This configuration file is an easy-to-use, keyword-driven, Windows INI-file, having separate sections for all the kernels. It also includes a FileSync data section where the user can specify version-controlled model data to be used as input. The NHI control script keeps all the data consistent during the pre-processing. Furthermore, this script is able to handle the model state when the NHI model is used for ensemble forecasting.
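    FileSync keeps small, version-controlled metadata (including file hashes) separate from bulk model data reachable over FTP. The sketch below illustrates that split in plain Python; the record fields, file names, and FTP URL are placeholders for illustration, not the actual FileSync format.

    ```python
    import hashlib
    import json
    from pathlib import Path

    def file_hash(path: Path) -> str:
        """Content hash used to identify a bulk data file without storing it in SVN."""
        h = hashlib.sha256()
        with path.open("rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()

    def register_parameter(name: str, data_file: Path, ftp_base: str, metadata_dir: Path) -> Path:
        """Write a small metadata record that a version control system can track."""
        record = {
            "parameter": name,
            "sha256": file_hash(data_file),
            "bulk_location": f"{ftp_base}/{data_file.name}",   # where the large file lives
        }
        out = metadata_dir / f"{name}.json"
        out.write_text(json.dumps(record, indent=2))
        return out

    if __name__ == "__main__":
        meta = Path("metadata"); meta.mkdir(exist_ok=True)
        grid = Path("precipitation_grid.idf"); grid.write_bytes(b"example bulk data")
        print(register_parameter("precipitation", grid,
                                 "ftp://models.example.org/nhi", meta))
    ```

    Keeping only the hash and location under version control is what lets the metadata stay small while the multi-gigabyte binaries live on a plain file server.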

  17. A Drupal-Based Collaborative Framework for Science Workflows

    NASA Astrophysics Data System (ADS)

    Pinheiro da Silva, P.; Gandara, A.

    2010-12-01

    Cyber-infrastructure is built by utilizing technical infrastructure to support organizational practices and social norms, providing support for scientific teams working together or dependent on each other to conduct scientific research. Such cyber-infrastructure enables the sharing of information and data so that scientists can leverage knowledge and expertise through automation. Scientific workflow systems have been used to build automated scientific systems used by scientists to conduct scientific research and, as a result, create artifacts in support of scientific discoveries. These complex systems are often developed by teams of scientists who are located in different places, e.g., scientists working in distinct buildings, and sometimes in different time zones, e.g., scientists working in distinct national laboratories. The sharing of these workflow specifications is currently supported by the use of version control systems such as CVS or Subversion. Discussions about the design, improvement, and testing of these specifications, however, often happen elsewhere, e.g., through the exchange of email messages and IM chatting. Carrying on a discussion about these specifications is challenging because comments and specifications are not necessarily connected. For instance, the person reading a comment about a given workflow specification may not be able to see the workflow, and even if the person can see the workflow, the person may not know to which part of the workflow a given comment applies. In this paper, we discuss the design, implementation and use of CI-Server, a Drupal-based infrastructure, to support the collaboration of both local and distributed teams of scientists using scientific workflows. CI-Server has three primary goals: to enable information sharing by providing tools that scientists can use within their scientific research to process data, publish and share artifacts; to build community by providing tools that support discussions between scientists about artifacts used or created through scientific processes; and to leverage the knowledge collected within the artifacts and scientific collaborations to support scientific discoveries.

  18. An Efficient Workflow Environment to Support the Collaborative Development of Actionable Climate Information Using the NCAR Climate Risk Management Engine (CRMe)

    NASA Astrophysics Data System (ADS)

    Ammann, C. M.; Vigh, J. L.; Lee, J. A.

    2016-12-01

    Society's growing needs for robust and relevant climate information have fostered an explosion in tools and frameworks for processing climate projections. Many top-down workflows might be employed to generate sets of pre-computed data and plots, frequently served in a "loading-dock style" through a metadata-enabled search and discovery engine. Despite these increasing resources, the diverse needs of applications-driven projects often result in data processing workflow requirements that cannot be fully satisfied using past approaches. In parallel to the data processing challenges, the provision of climate information to users in a form that is also usable represents a formidable challenge of its own. Finally, many users do not have the time nor the desire to synthesize and distill massive volumes of climate information to find the relevant information for their particular application. All of these considerations call for new approaches to developing actionable climate information. CRMe seeks to bridge the gap between the diversity and richness of bottom-up needs of practitioners, with discrete, structured top-down workflows typically implemented for rapid delivery. Additionally, CRMe has implemented web-based data services capable of providing focused climate information in usable form for a given location, or as spatially aggregated information for entire regions or countries following the needs of users and sectors. Making climate data actionable also involves summarizing and presenting it in concise and approachable ways. CRMe is developing the concept of dashboards, co-developed with the users, to condense the key information into a quick summary of the most relevant, curated climate data for a given discipline, application, or location, while still enabling users to efficiently conduct deeper discovery into rich datasets on an as-needed basis.

  19. Producing an Infrared Multiwavelength Galactic Plane Atlas Using Montage, Pegasus, and Amazon Web Services

    NASA Astrophysics Data System (ADS)

    Rynge, M.; Juve, G.; Kinney, J.; Good, J.; Berriman, B.; Merrihew, A.; Deelman, E.

    2014-05-01

    In this paper, we describe how to leverage cloud resources to generate large-scale mosaics of the galactic plane in multiple wavelengths. Our goal is to generate a 16-wavelength infrared Atlas of the Galactic Plane at a common spatial sampling of 1 arcsec, processed so that the mosaics appear to have been measured with a single instrument. This will be achieved by using the Montage image mosaic engine to process observations from the 2MASS, GLIMPSE, MIPSGAL, MSX and WISE datasets, over a wavelength range of 1 μm to 24 μm, and by using the Pegasus Workflow Management System for managing the workload. When complete, the Atlas will be made available to the community as a data product. We are generating images that cover ±180° in Galactic longitude and ±20° in Galactic latitude, to the extent permitted by the spatial coverage of each dataset. Each image will be 5°x5° in size (including an overlap of 1° with neighboring tiles), resulting in an atlas of 1,001 images. The final size will be about 50 TB. This paper will focus on the computational challenges, solutions, and lessons learned in producing the Atlas. To manage the computation we are using the Pegasus Workflow Management System, a mature, highly fault-tolerant system now in release 4.2.2 that has found wide applicability across many science disciplines. A scientific workflow describes the dependencies between the tasks; in most cases the workflow is described as a directed acyclic graph, where the nodes are tasks and the edges denote the task dependencies. A defining property of a scientific workflow is that it manages data flow between tasks. Applied to the galactic plane project, each 5°x5° mosaic is a Pegasus workflow. Pegasus is used to fetch the source images, execute the image mosaicking steps of Montage, and store the final outputs in a storage system. As these workflows are very I/O intensive, care has to be taken when choosing the infrastructure on which to execute them. In our setup, we chose to use dynamically provisioned compute clusters running on the Amazon Elastic Compute Cloud (EC2). All our instances use the same base image, which is configured to come up as a master node by default. The master node is a central instance from which the workflow can be managed. Additional worker instances are provisioned and configured to accept work assignments from the master node. The system allows for adding/removing workers in an ad hoc fashion and can be run in large configurations. To date, we have performed 245,000 CPU hours of computing and generated 7,029 images totaling 30 TB. With the current setup, the runtime would be about 340,000 CPU hours for the whole project. Using spot m2.4xlarge instances, the cost would be approximately $5,950. Using faster AWS instances, such as cc2.8xlarge, could potentially decrease the total CPU hours and further reduce the compute costs. The paper will explore these tradeoffs.
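    The quoted figures allow a quick back-of-the-envelope check of the cost estimate. The sketch below reproduces that arithmetic, assuming (this is not stated in the abstract) that one m2.4xlarge instance provides 8 vCPUs and that the implied spot price is roughly $0.14 per instance-hour.

    ```python
    # Back-of-the-envelope version of the cost estimate quoted in the abstract.
    total_cpu_hours = 340_000        # projected for the full Atlas
    vcpus_per_instance = 8           # m2.4xlarge (assumed)
    spot_price_per_hour = 0.14       # implied by the quoted ~$5,950 total

    instance_hours = total_cpu_hours / vcpus_per_instance
    estimated_cost = instance_hours * spot_price_per_hour
    print(f"{instance_hours:,.0f} instance hours -> ${estimated_cost:,.0f}")
    # 42,500 instance hours -> $5,950, consistent with the abstract's figure.
    ```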

  20. End-to-end interoperability and workflows from building architecture design to one or more simulations

    DOEpatents

    Chao, Tian-Jy; Kim, Younghun

    2015-02-10

    An end-to-end interoperability and workflows from building architecture design to one or more simulations, in one aspect, may comprise establishing a BIM enablement platform architecture. A data model defines data entities and entity relationships for enabling the interoperability and workflows. A data definition language may be implemented that defines and creates a table schema of a database associated with the data model. Data management services and/or application programming interfaces may be implemented for interacting with the data model. Web services may also be provided for interacting with the data model via the Web. A user interface may be implemented that communicates with users and uses the BIM enablement platform architecture, the data model, the data definition language, data management services and application programming interfaces to provide functions to the users to perform work related to building information management.
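    The patent describes a data definition language that creates a table schema for the data model behind the BIM enablement platform. As a loose illustration only (the entities and columns below are invented for the sketch, not taken from the patent), here is a minimal SQLite schema relating buildings, zones, and simulations, the kind of structure data management services and web APIs would then operate on.

    ```python
    import sqlite3

    # Toy schema: invented building-information entities and relationships.
    DDL = """
    CREATE TABLE building   (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE zone       (id INTEGER PRIMARY KEY, building_id INTEGER NOT NULL,
                             name TEXT NOT NULL,
                             FOREIGN KEY (building_id) REFERENCES building(id));
    CREATE TABLE simulation (id INTEGER PRIMARY KEY, zone_id INTEGER NOT NULL,
                             kind TEXT NOT NULL, result_uri TEXT,
                             FOREIGN KEY (zone_id) REFERENCES zone(id));
    """

    def create_schema(db_path: str = ":memory:") -> sqlite3.Connection:
        conn = sqlite3.connect(db_path)
        conn.executescript(DDL)
        return conn

    if __name__ == "__main__":
        conn = create_schema()
        conn.execute("INSERT INTO building (name) VALUES (?)", ("Office Block A",))
        conn.execute("INSERT INTO zone (building_id, name) VALUES (1, 'Atrium')")
        rows = conn.execute(
            "SELECT b.name, z.name FROM zone z JOIN building b ON b.id = z.building_id")
        print(list(rows))   # entity relationships exposed to data-management services
    ```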

  1. Identification and Management of Information Problems by Emergency Department Staff

    PubMed Central

    Murphy, Alison R.; Reddy, Madhu C.

    2014-01-01

    Patient-care teams frequently encounter information problems during their daily activities. These information problems include wrong, outdated, conflicting, incomplete, or missing information. Information problems can negatively impact the patient-care workflow, lead to misunderstandings about patient information, and potentially lead to medical errors. Existing research focuses on understanding the cause of these information problems and the impact that they can have on the hospital’s workflow. However, there is limited research on how patient-care teams currently identify and manage information problems that they encounter during their work. Through qualitative observations and interviews in an emergency department (ED), we identified the types of information problems encountered by ED staff, and examined how they identified and managed the information problems. We also discuss the impact that these information problems can have on the patient-care teams, including the cascading effects of information problems on workflow and the ambiguous accountability for fixing information problems within collaborative teams. PMID:25954457

  2. Omics Metadata Management Software (OMMS).

    PubMed

    Perez-Arriaga, Martha O; Wilson, Susan; Williams, Kelly P; Schoeniger, Joseph; Waymire, Russel L; Powell, Amy Jo

    2015-01-01

    Next-generation sequencing projects have underappreciated information management tasks requiring detailed attention to specimen curation, nucleic acid sample preparation and sequence production methods required for downstream data processing, comparison, interpretation, sharing and reuse. The few existing metadata management tools for genome-based studies provide weak curatorial frameworks for experimentalists to store and manage idiosyncratic, project-specific information, typically offering no automation supporting unified naming and numbering conventions for sequencing production environments that routinely deal with hundreds, if not thousands, of samples at a time. Moreover, existing tools are not readily interfaced with bioinformatics executables (e.g., BLAST, Bowtie2, custom pipelines). Our application, the Omics Metadata Management Software (OMMS), answers both needs, empowering experimentalists to generate intuitive, consistent metadata, and perform analyses and information management tasks via an intuitive web-based interface. Several use cases with short-read sequence datasets are provided to validate installation and integrated function, and suggest possible methodological road maps for prospective users. Provided examples highlight possible OMMS workflows for metadata curation, multistep analyses, and results management and downloading. The OMMS can be implemented as a stand-alone package for individual laboratories, or can be configured for web-based deployment supporting geographically dispersed projects. The OMMS was developed using an open-source software base, is flexible, extensible and easily installed and executed. The OMMS can be obtained at http://omms.sandia.gov.

  3. Omics Metadata Management Software (OMMS)

    PubMed Central

    Perez-Arriaga, Martha O; Wilson, Susan; Williams, Kelly P; Schoeniger, Joseph; Waymire, Russel L; Powell, Amy Jo

    2015-01-01

    Next-generation sequencing projects have underappreciated information management tasks requiring detailed attention to specimen curation, nucleic acid sample preparation and sequence production methods required for downstream data processing, comparison, interpretation, sharing and reuse. The few existing metadata management tools for genome-based studies provide weak curatorial frameworks for experimentalists to store and manage idiosyncratic, project-specific information, typically offering no automation supporting unified naming and numbering conventions for sequencing production environments that routinely deal with hundreds, if not thousands, of samples at a time. Moreover, existing tools are not readily interfaced with bioinformatics executables (e.g., BLAST, Bowtie2, custom pipelines). Our application, the Omics Metadata Management Software (OMMS), answers both needs, empowering experimentalists to generate intuitive, consistent metadata, and perform analyses and information management tasks via an intuitive web-based interface. Several use cases with short-read sequence datasets are provided to validate installation and integrated function, and suggest possible methodological road maps for prospective users. Provided examples highlight possible OMMS workflows for metadata curation, multistep analyses, and results management and downloading. The OMMS can be implemented as a stand-alone package for individual laboratories, or can be configured for web-based deployment supporting geographically dispersed projects. The OMMS was developed using an open-source software base, is flexible, extensible and easily installed and executed. The OMMS can be obtained at http://omms.sandia.gov. PMID:26124554

  4. SHIWA Services for Workflow Creation and Sharing in Hydrometeorology

    NASA Astrophysics Data System (ADS)

    Terstyanszky, Gabor; Kiss, Tamas; Kacsuk, Peter; Sipos, Gergely

    2014-05-01

    Researchers want to run scientific experiments on Distributed Computing Infrastructures (DCI) to access large pools of resources and services. To run these experiments requires specific expertise that they may not have. Workflows can hide resources and services as a virtualisation layer providing a user interface that researchers can use. There are many scientific workflow systems but they are not interoperable. To learn a workflow system and create workflows may require significant efforts. Considering these efforts it is not reasonable to expect that researchers will learn new workflow systems if they want to run workflows developed in other workflow systems. To overcome it requires creating workflow interoperability solutions to allow workflow sharing. The FP7 'Sharing Interoperable Workflow for Large-Scale Scientific Simulation on Available DCIs' (SHIWA) project developed the Coarse-Grained Interoperability concept (CGI). It enables recycling and sharing workflows of different workflow systems and executing them on different DCIs. SHIWA developed the SHIWA Simulation Platform (SSP) to implement the CGI concept integrating three major components: the SHIWA Science Gateway, the workflow engines supported by the CGI concept and DCI resources where workflows are executed. The science gateway contains a portal, a submission service, a workflow repository and a proxy server to support the whole workflow life-cycle. The SHIWA Portal allows workflow creation, configuration, execution and monitoring through a Graphical User Interface using the WS-PGRADE workflow system as the host workflow system. The SHIWA Repository stores the formal description of workflows and workflow engines plus executables and data needed to execute them. It offers a wide-range of browse and search operations. To support non-native workflow execution the SHIWA Submission Service imports the workflow and workflow engine from the SHIWA Repository. This service either invokes locally or remotely pre-deployed workflow engines or submits workflow engines with the workflow to local or remote resources to execute workflows. The SHIWA Proxy Server manages certificates needed to execute the workflows on different DCIs. Currently SSP supports sharing of ASKALON, Galaxy, GWES, Kepler, LONI Pipeline, MOTEUR, Pegasus, P-GRADE, ProActive, Triana, Taverna and WS-PGRADE workflows. Further workflow systems can be added to the simulation platform as required by research communities. The FP7 'Building a European Research Community through Interoperable Workflows and Data' (ER-flow) project disseminates the achievements of the SHIWA project to build workflow user communities across Europe. ER-flow provides application supports to research communities within (Astrophysics, Computational Chemistry, Heliophysics and Life Sciences) and beyond (Hydrometeorology and Seismology) to develop, share and run workflows through the simulation platform. The simulation platform supports four usage scenarios: creating and publishing workflows in the repository, searching and selecting workflows in the repository, executing non-native workflows and creating and running meta-workflows. The presentation will outline the CGI concept, the SHIWA Simulation Platform, the ER-flow usage scenarios and how the Hydrometeorology research community runs simulations on SSP.

  5. Characterizing Strain Variation in Engineered E. coli Using a Multi-Omics-Based Workflow

    DOE PAGES

    Brunk, Elizabeth; George, Kevin W.; Alonso-Gutierrez, Jorge; ...

    2016-05-19

    Understanding the complex interactions that occur between heterologous and native biochemical pathways represents a major challenge in metabolic engineering and synthetic biology. We present a workflow that integrates metabolomics, proteomics, and genome-scale models of Escherichia coli metabolism to study the effects of introducing a heterologous pathway into a microbial host. This workflow incorporates complementary approaches from computational systems biology, metabolic engineering, and synthetic biology; provides molecular insight into how the host organism microenvironment changes due to pathway engineering; and demonstrates how biological mechanisms underlying strain variation can be exploited as an engineering strategy to increase product yield. As a proof of concept, we present the analysis of eight engineered strains producing three biofuels: isopentenol, limonene, and bisabolene. Application of this workflow identified the roles of candidate genes, pathways, and biochemical reactions in observed experimental phenomena and facilitated the construction of a mutant strain with improved productivity. The contributed workflow is available as an open-source tool in the form of iPython notebooks.

  6. Barriers to critical thinking: workflow interruptions and task switching among nurses.

    PubMed

    Cornell, Paul; Riordan, Monica; Townsend-Gervis, Mary; Mobley, Robin

    2011-10-01

    Nurses are increasingly called upon to engage in critical thinking. However, current workflow inhibits this goal with frequent task switching and unpredictable demands. To assess workflow's cognitive impact, nurses were observed at 2 hospitals with different patient loads and acuity levels. Workflow on a medical/surgical and pediatric oncology unit was observed, recording tasks, tools, collaborators, and locations. Nineteen nurses were observed for a total of 85.2 hours. Tasks were short with a mean duration of 62.4 and 81.6 seconds on the 2 units. More than 50% of the recorded tasks were less than 30 seconds in length. An analysis of task sequence revealed few patterns and little pairwise repetition. Performance on specific tasks differed between the 2 units, but the character of the workflow was highly similar. The nonrepetitive flow and high amount of switching indicate nurses experience a heavy cognitive load with little uninterrupted time. This implies that nurses rarely have the conditions necessary for critical thinking.

  7. Advantages and Disadvantages of 1-Incision, 2-Incision, 3-Incision, and 4-Incision Laparoscopic Cholecystectomy: A Workflow Comparison Study.

    PubMed

    Bartnicka, Joanna; Zietkiewicz, Agnieszka A; Kowalski, Grzegorz J

    2016-08-01

    A comparison of 1-port, 2-port, 3-port, and 4-port laparoscopic cholecystectomy techniques from the point of view of workflow criteria was made to both identify specific workflow components that can cause surgical disturbances and indicate good and bad practices. As a case study, laparoscopic cholecystectomies, including manual tasks and interactions within teamwork members, were video-recorded and analyzed on the basis of specially encoded workflow information. The parameters for comparison were defined as follows: surgery time, tool and hand activeness, operator's passive work, collisions, and operator interventions. It was found that 1-port cholecystectomy is the worst technique because of nonergonomic body position, technical complexity, organizational anomalies, and operational dynamism. The differences between laparoscopic techniques are closely linked to the costs of the medical procedures. Hence, knowledge about the surgical workflow can be used for both planning surgical procedures and balancing the expenses associated with surgery.

  8. Optimization of tomographic reconstruction workflows on geographically distributed resources

    DOE PAGES

    Bicer, Tekin; Gursoy, Doga; Kettimuthu, Rajkumar; ...

    2016-01-01

    New technological advancements in synchrotron light sources enable data acquisitions at unprecedented levels. This emergent trend affects not only the size of the generated data but also the need for larger computational resources. Although beamline scientists and users have access to local computational resources, these are typically limited and can result in extended execution times. Applications that are based on iterative processing as in tomographic reconstruction methods require high-performance compute clusters for timely analysis of data. Here, time-sensitive analysis and processing of Advanced Photon Source data on geographically distributed resources are focused on. Two main challenges are considered: (i) modeling of the performance of tomographic reconstruction workflows and (ii) transparent execution of these workflows on distributed resources. For the former, three main stages are considered: (i) data transfer between storage and computational resources, (ii) wait/queue time of reconstruction jobs at compute resources, and (iii) computation of reconstruction tasks. These performance models allow evaluation and estimation of the execution time of any given iterative tomographic reconstruction workflow that runs on geographically distributed resources. For the latter challenge, a workflow management system is built, which can automate the execution of workflows and minimize the user interaction with the underlying infrastructure. The system utilizes Globus to perform secure and efficient data transfer operations. The proposed models and the workflow management system are evaluated by using three high-performance computing and two storage resources, all of which are geographically distributed. Workflows were created with different computational requirements using two compute-intensive tomographic reconstruction algorithms. Experimental evaluation shows that the proposed models and system can be used for selecting the optimum resources, which in turn can provide up to 3.13× speedup (on experimented resources). Furthermore, the error rates of the models range between 2.1 and 23.3% (considering workflow execution times), where the accuracy of the model estimations increases with higher computational demands in reconstruction tasks.
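    The performance model decomposes end-to-end workflow time into three stages: data transfer, queue wait, and reconstruction compute. The sketch below is a minimal, illustrative rendering of that decomposition, not the authors' fitted models; the site parameters, linear scaling assumptions, and workload numbers are all made up.

    ```python
    # Minimal sketch of a three-stage time model:
    # total time = data transfer + queue wait + reconstruction compute (hours).

    def transfer_time(data_gb: float, bandwidth_gbps: float) -> float:
        return data_gb * 8 / bandwidth_gbps / 3600

    def compute_time(slices: int, iters: int, slice_iter_seconds: float, cores: int) -> float:
        return slices * iters * slice_iter_seconds / cores / 3600

    def estimate(site: dict, data_gb: float, slices: int, iters: int) -> float:
        return (transfer_time(data_gb, site["bandwidth_gbps"])
                + site["queue_hours"]
                + compute_time(slices, iters, site["slice_iter_seconds"], site["cores"]))

    sites = {   # hypothetical remote resources
        "cluster_a": {"bandwidth_gbps": 10, "queue_hours": 0.5,
                      "slice_iter_seconds": 2.0, "cores": 512},
        "cluster_b": {"bandwidth_gbps": 40, "queue_hours": 2.0,
                      "slice_iter_seconds": 2.0, "cores": 4096},
    }

    workload = {"data_gb": 800, "slices": 2048, "iters": 100}
    for name, site in sites.items():
        print(name, round(estimate(site, **workload), 2), "hours")
    ```

    Comparing such estimates across sites is the essence of using the model for resource selection, as the abstract describes.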

  9. Optimization of tomographic reconstruction workflows on geographically distributed resources

    PubMed Central

    Bicer, Tekin; Gürsoy, Doǧa; Kettimuthu, Rajkumar; De Carlo, Francesco; Foster, Ian T.

    2016-01-01

    New technological advancements in synchrotron light sources enable data acquisitions at unprecedented levels. This emergent trend affects not only the size of the generated data but also the need for larger computational resources. Although beamline scientists and users have access to local computational resources, these are typically limited and can result in extended execution times. Applications that are based on iterative processing as in tomographic reconstruction methods require high-performance compute clusters for timely analysis of data. Here, time-sensitive analysis and processing of Advanced Photon Source data on geographically distributed resources are focused on. Two main challenges are considered: (i) modeling of the performance of tomographic reconstruction workflows and (ii) transparent execution of these workflows on distributed resources. For the former, three main stages are considered: (i) data transfer between storage and computational resources, (ii) wait/queue time of reconstruction jobs at compute resources, and (iii) computation of reconstruction tasks. These performance models allow evaluation and estimation of the execution time of any given iterative tomographic reconstruction workflow that runs on geographically distributed resources. For the latter challenge, a workflow management system is built, which can automate the execution of workflows and minimize the user interaction with the underlying infrastructure. The system utilizes Globus to perform secure and efficient data transfer operations. The proposed models and the workflow management system are evaluated by using three high-performance computing and two storage resources, all of which are geographically distributed. Workflows were created with different computational requirements using two compute-intensive tomographic reconstruction algorithms. Experimental evaluation shows that the proposed models and system can be used for selecting the optimum resources, which in turn can provide up to 3.13× speedup (on experimented resources). Moreover, the error rates of the models range between 2.1 and 23.3% (considering workflow execution times), where the accuracy of the model estimations increases with higher computational demands in reconstruction tasks. PMID:27359149

  10. Optimization of tomographic reconstruction workflows on geographically distributed resources

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bicer, Tekin; Gursoy, Doga; Kettimuthu, Rajkumar

    New technological advancements in synchrotron light sources enable data acquisitions at unprecedented levels. This emergent trend affects not only the size of the generated data but also the need for larger computational resources. Although beamline scientists and users have access to local computational resources, these are typically limited and can result in extended execution times. Applications that are based on iterative processing, as in tomographic reconstruction methods, require high-performance compute clusters for timely analysis of data. Here, the focus is on time-sensitive analysis and processing of Advanced Photon Source data on geographically distributed resources. Two main challenges are considered: (i) modeling of the performance of tomographic reconstruction workflows and (ii) transparent execution of these workflows on distributed resources. For the former, three main stages are considered: (i) data transfer between storage and computational resources, (ii) wait/queue time of reconstruction jobs at compute resources, and (iii) computation of reconstruction tasks. These performance models allow evaluation and estimation of the execution time of any given iterative tomographic reconstruction workflow that runs on geographically distributed resources. For the latter challenge, a workflow management system is built, which can automate the execution of workflows and minimize the user interaction with the underlying infrastructure. The system utilizes Globus to perform secure and efficient data transfer operations. The proposed models and the workflow management system are evaluated by using three high-performance computing and two storage resources, all of which are geographically distributed. Workflows were created with different computational requirements using two compute-intensive tomographic reconstruction algorithms. Experimental evaluation shows that the proposed models and system can be used for selecting the optimum resources, which in turn can provide up to 3.13× speedup (on the resources used in the experiments). Furthermore, the error rates of the models range between 2.1 and 23.3% (considering workflow execution times), where the accuracy of the model estimations increases with higher computational demands in reconstruction tasks.

  11. Genomic Tools in Cowpea Breeding Programs: Status and Perspectives

    PubMed Central

    Boukar, Ousmane; Fatokun, Christian A.; Huynh, Bao-Lam; Roberts, Philip A.; Close, Timothy J.

    2016-01-01

    Cowpea is one of the most important grain legumes in sub-Saharan Africa (SSA). It provides strong support to the livelihood of small-scale farmers through its contributions to their nutritional security, income generation and soil fertility enhancement. Worldwide about 6.5 million metric tons of cowpea are produced annually on about 14.5 million hectares. The low productivity of cowpea is attributable to numerous abiotic and biotic constraints. The abiotic stress factors comprise drought, low soil fertility, and heat while biotic constraints include insects, diseases, parasitic weeds, and nematodes. Cowpea farmers also have limited access to quality seeds of improved varieties for planting. Some progress has been made through conventional breeding at international and national research institutions in the last three decades. Cowpea improvement could also benefit from modern breeding methods based on molecular genetic tools. A number of advances in cowpea genetic linkage maps, and quantitative trait loci associated with some desirable traits such as resistance to Striga, Macrophomina, Fusarium wilt, bacterial blight, root-knot nematodes, aphids, and foliar thrips have been reported. An improved consensus genetic linkage map has been developed and used to identify QTLs of additional traits. In order to take advantage of these developments single nucleotide polymorphism (SNP) genotyping is being streamlined to establish an efficient workflow supported by genotyping support service (GSS)-client interactions. About 1100 SNPs mapped on the cowpea genome were converted by LGC Genomics to KASP assays. Several cowpea breeding programs have been exploiting these resources to implement molecular breeding, especially for MARS and MABC, to accelerate cowpea variety improvement. The combination of conventional breeding and molecular breeding strategies, with workflow managed through the CGIAR breeding management system (BMS), promises an increase in the number of improved varieties available to farmers, thereby boosting cowpea production and productivity in SSA. PMID:27375632

  12. An integrated workflow for analysis of ChIP-chip data.

    PubMed

    Weigelt, Karin; Moehle, Christoph; Stempfl, Thomas; Weber, Bernhard; Langmann, Thomas

    2008-08-01

    Although ChIP-chip is a powerful tool for genome-wide discovery of transcription factor target genes, the steps involving raw data analysis, identification of promoters, and correlation with binding sites are still laborious processes. Therefore, we report an integrated workflow for the analysis of promoter tiling arrays with the Genomatix ChipInspector system. We compare this tool with open-source software packages to identify PU.1 regulated genes in mouse macrophages. Our results suggest that ChipInspector data analysis, comparative genomics for binding site prediction, and pathway/network modeling significantly facilitate and enhance whole-genome promoter profiling to reveal in vivo sites of transcription factor-DNA interactions.

  13. I'll take that to go: Big data bags and minimal identifiers for exchange of large, complex datasets

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chard, Kyle; D'Arcy, Mike; Heavner, Benjamin D.

    Big data workflows often require the assembly and exchange of complex, multi-element datasets. For example, in biomedical applications, the input to an analytic pipeline can be a dataset consisting of thousands of images and genome sequences assembled from diverse repositories, requiring a description of the contents of the dataset in a concise and unambiguous form. Typical approaches to creating datasets for big data workflows assume that all data reside in a single location, requiring costly data marshaling and permitting errors of omission and commission because dataset members are not explicitly specified. We address these issues by proposing simple methods and tools for assembling, sharing, and analyzing large and complex datasets that scientists can easily integrate into their daily workflows. These tools combine a simple and robust method for describing data collections (BDBags), data descriptions (Research Objects), and simple persistent identifiers (Minids) to create a powerful ecosystem of tools and services for big data analysis and sharing. We present these tools and use biomedical case studies to illustrate their use for the rapid assembly, sharing, and analysis of large datasets.
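
    The central idea of a BDBag, a data collection whose members are explicitly enumerated with fixity information so that omissions and modifications are detectable, can be sketched in a few lines. The snippet below is a simplified, hypothetical illustration of such a BagIt-style manifest, not the project's actual bdbag tooling; the directory names are placeholders.

```python
# Simplified sketch of a BagIt-style manifest: explicitly enumerate every
# member of a dataset with a checksum so that the collection is unambiguous
# and verifiable after exchange. Paths below are illustrative placeholders.
import hashlib
from pathlib import Path

def build_manifest(data_dir: str) -> dict:
    manifest = {}
    for path in sorted(Path(data_dir).rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[str(path.relative_to(data_dir))] = digest
    return manifest

def verify(data_dir: str, manifest: dict) -> bool:
    # Detects both omitted members and silently modified ones.
    return build_manifest(data_dir) == manifest

if __name__ == "__main__":
    manifest = build_manifest("my_dataset/data")   # hypothetical payload directory
    print(f"{len(manifest)} members enumerated")
```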

  14. Big data analytics in immunology: a knowledge-based approach.

    PubMed

    Zhang, Guang Lan; Sun, Jing; Chitkushev, Lou; Brusic, Vladimir

    2014-01-01

    With the vast amount of immunological data available, immunology research is entering the big data era. These data vary in granularity, quality, and complexity and are stored in various formats, including publications, technical reports, and databases. The challenge is to make the transition from data to actionable knowledge and wisdom and bridge the knowledge gap and application gap. We report a knowledge-based approach based on a framework called KB-builder that facilitates data mining by enabling fast development and deployment of web-accessible immunological data knowledge warehouses. Immunological knowledge discovery relies heavily on both the availability of accurate, up-to-date, and well-organized data and the proper analytics tools. We propose the use of knowledge-based approaches by developing knowledgebases combining well-annotated data with specialized analytical tools and integrating them into analytical workflows. A set of well-defined workflow types with rich summarization and visualization capacity facilitates the transformation from data to critical information and knowledge. By using KB-builder, we enabled streamlining of normally time-consuming processes of database development. The knowledgebases built using KB-builder will speed up rational vaccine design by providing accurate and well-annotated data coupled with tailored computational analysis tools and workflows.

  15. A simple tool for stereological assessment of digital images: the STEPanizer.

    PubMed

    Tschanz, S A; Burri, P H; Weibel, E R

    2011-07-01

    STEPanizer is an easy-to-use computer-based software tool for the stereological assessment of digitally captured images from all kinds of microscopical (LM, TEM, LSM) and macroscopical (radiology, tomography) imaging modalities. The program design focuses on providing the user with a defined workflow adapted to most basic stereological tasks. The software is compact, that is, user friendly without being bulky. STEPanizer comprises the creation of test systems, the appropriate display of digital images with superimposed test systems, a scaling facility, a counting module and an export function for the transfer of results to spreadsheet programs. Here we describe the major workflow of the tool, illustrating its application with two examples from transmission electron microscopy and light microscopy, respectively. © 2011 The Authors Journal of Microscopy © 2011 Royal Microscopical Society.
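
    The basic counting task supported by such a tool reduces to a simple ratio estimator: the fraction of test points falling on a structure estimates its area fraction, and each point represents a fixed reference area. The sketch below illustrates that arithmetic; it is a generic example, not code from STEPanizer, and the numbers are hypothetical.

```python
# Generic point-counting estimators used in design-based stereology:
# the area fraction of a structure is estimated by the fraction of test
# points hitting it, and absolute area follows from the area that each
# test point represents. Example values are hypothetical.
def area_fraction(points_on_structure: int, total_points: int) -> float:
    return points_on_structure / total_points

def absolute_area_um2(points_on_structure: int, area_per_point_um2: float) -> float:
    # area_per_point is set by the test-grid spacing and the image scale
    return points_on_structure * area_per_point_um2

# Example: 37 of 200 grid points hit the structure; each point represents 25 µm².
print(round(area_fraction(37, 200), 3))    # 0.185
print(absolute_area_um2(37, 25.0))         # 925.0 µm²
```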

  16. Experimental evaluation of a flexible I/O architecture for accelerating workflow engines in ultrascale environments

    DOE PAGES

    Duro, Francisco Rodrigo; Blas, Javier Garcia; Isaila, Florin; ...

    2016-10-06

    The increasing volume of scientific data and the limited scalability and performance of storage systems are currently presenting a significant limitation for the productivity of the scientific workflows running on both high-performance computing (HPC) and cloud platforms. Clearly needed is better integration of storage systems and workflow engines to address this problem. This paper presents and evaluates a novel solution that leverages codesign principles for integrating Hercules—an in-memory data store—with a workflow management system. We consider four main aspects: workflow representation, task scheduling, task placement, and task termination. As a result, the experimental evaluation on both cloud and HPC systems demonstrates significant performance and scalability improvements over existing state-of-the-art approaches.

  17. Business intelligence for the radiologist: making your data work for you.

    PubMed

    Cook, Tessa S; Nagy, Paul

    2014-12-01

    Although it remains absent from most programs today, business intelligence (BI) has become an integral part of modern radiology practice management. BI facilitates the transition away from lack of understanding about a system and the data it produces toward incrementally more sophisticated comprehension of what has happened, could happen, and should happen. The individual components that make up BI are common across industries and include data extraction and transformation, process analysis and improvement, outcomes measures, performance assessment, graphical dashboarding, alerting, workflow analysis, and scenario modeling. As in other fields, these components can be directly applied in radiology to improve workflow, throughput, safety, efficacy, outcomes, and patient satisfaction. When approaching the subject of BI in radiology, it is important to know what data are available in your various electronic medical records, as well as where and how they are stored. In addition, it is critical to verify that the data actually represent what you think they do. Finally, it is critical for success to identify the features and limitations of the BI tools you choose to use and to plan your practice modifications on the basis of collected data. It is equally important to remember that BI plays a critical role in continuous process improvement; whichever BI tools you choose should be flexible to grow and evolve with your practice. Published by Elsevier Inc.

  18. Metadata Management on the SCEC PetaSHA Project: Helping Users Describe, Discover, Understand, and Use Simulation Data in a Large-Scale Scientific Collaboration

    NASA Astrophysics Data System (ADS)

    Okaya, D.; Deelman, E.; Maechling, P.; Wong-Barnum, M.; Jordan, T. H.; Meyers, D.

    2007-12-01

    Large scientific collaborations, such as the SCEC Petascale Cyberfacility for Physics-based Seismic Hazard Analysis (PetaSHA) Project, involve interactions between many scientists who exchange ideas and research results. These groups must organize, manage, and make accessible their community materials of observational data, derivative (research) results, computational products, and community software. The integration of scientific workflows as a paradigm to solve complex computations provides advantages of efficiency, reliability, repeatability, choices, and ease of use. The underlying resource needed for a scientific workflow to function and create discoverable and exchangeable products is the construction, tracking, and preservation of metadata. In the scientific workflow environment there is a two-tier structure of metadata. Workflow-level metadata and provenance describe operational steps, identity of resources, execution status, and product locations and names. Domain-level metadata essentially define the scientific meaning of data, codes and products. To a large degree the metadata at these two levels are separate. However, between these two levels is a subset of metadata produced at one level but is needed by the other. This crossover metadata suggests that some commonality in metadata handling is needed. SCEC researchers are collaborating with computer scientists at SDSC, the USC Information Sciences Institute, and Carnegie Mellon Univ. in order to perform earthquake science using high-performance computational resources. A primary objective of the "PetaSHA" collaboration is to perform physics-based estimations of strong ground motion associated with real and hypothetical earthquakes located within Southern California. Construction of 3D earth models, earthquake representations, and numerical simulation of seismic waves are key components of these estimations. Scientific workflows are used to orchestrate the sequences of scientific tasks and to access distributed computational facilities such as the NSF TeraGrid. Different types of metadata are produced and captured within the scientific workflows. One workflow within PetaSHA ("Earthworks") performs a linear sequence of tasks with workflow and seismological metadata preserved. Downstream scientific codes ingest these metadata produced by upstream codes. The seismological metadata uses attribute-value pairing in plain text; an identified need is to use more advanced handling methods. Another workflow system within PetaSHA ("Cybershake") involves several complex workflows in order to perform statistical analysis of ground shaking due to thousands of hypothetical but plausible earthquakes. Metadata management has been challenging due to its construction around a number of legacy scientific codes. We describe difficulties arising in the scientific workflow due to the lack of this metadata and suggest corrective steps, which in some cases include the cultural shift of domain science programmers coding for metadata.
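
    As a concrete illustration of the plain-text attribute-value pairing mentioned above, the sketch below parses such a block into a dictionary that a downstream code could ingest. The "key = value" format and the field names shown are hypothetical examples, not the PetaSHA metadata schema.

```python
# Hypothetical example of parsing plain-text attribute-value metadata
# ("key = value" per line) into a dictionary for downstream codes.
def parse_metadata(text: str) -> dict:
    metadata = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue                          # skip blank lines and comments
        key, _, value = line.partition("=")
        metadata[key.strip()] = value.strip()
    return metadata

example = """
# seismological metadata (illustrative field names only)
source_model = hypothetical_socal_event
velocity_model = example_3d_model_v1
min_frequency_hz = 0.5
"""
print(parse_metadata(example)["min_frequency_hz"])   # "0.5" (as a string)
```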

  19. Creating a Collaborative Research Network for Scientists

    NASA Astrophysics Data System (ADS)

    Gunn, W.

    2012-12-01

    This abstract proposes a discussion of how professional science communication and scientific cooperation can become more efficient through the use of modern social network technology, using the example of Mendeley. Mendeley is a research workflow and collaboration tool which crowdsources real-time research trend information and semantic annotations of research papers in a central data store, thereby creating a "social research network" that is emergent from the research data added to the platform. We describe how Mendeley's model can overcome barriers for collaboration by turning research papers into social objects, making academic data publicly available via an open API, and promoting more efficient collaboration. Central to the success of Mendeley has been the creation of a tool that works for the researcher without the requirement of being part of an explicit social network. Mendeley automatically extracts metadata from research papers, and allows a researcher to annotate, tag and organize their research collection. The tool integrates with the paper writing workflow and provides advanced collaboration options, thus significantly improving researchers' productivity. By anonymously aggregating usage data, Mendeley enables the emergence of social metrics and real-time usage stats on top of the articles' abstract metadata. In this way a social network of collaborators, and people genuinely interested in content, emerges. By building this research network around the article as the social object, a social layer of direct relevance to academia emerges. As science, and particularly the Earth sciences with their large shared resources, becomes more and more global, the management and coordination of research is more and more dependent on technology to support these distributed collaborations.

  20. Using NERSC High-Performance Computing (HPC) systems for high-energy nuclear physics applications with ALICE

    NASA Astrophysics Data System (ADS)

    Fasel, Markus

    2016-10-01

    High-Performance Computing Systems are powerful tools tailored to support large-scale applications that rely on low-latency inter-process communications to run efficiently. By design, these systems often impose constraints on application workflows, such as limited external network connectivity and whole node scheduling, that make more general-purpose computing tasks, such as those commonly found in high-energy nuclear physics applications, more difficult to carry out. In this work, we present a tool designed to simplify access to such complicated environments by handling the common tasks of job submission, software management, and local data management, in a framework that is easily adaptable to the specific requirements of various computing systems. The tool, initially constructed to process stand-alone ALICE simulations for detector and software development, was successfully deployed on the NERSC computing systems, Carver, Hopper and Edison, and is being configured to provide access to the next generation NERSC system, Cori. In this report, we describe the tool and discuss our experience running ALICE applications on NERSC HPC systems. The discussion will include our initial benchmarks of Cori compared to other systems and our attempts to leverage the new capabilities offered with Cori to support data-intensive applications, with a future goal of full integration of such systems into ALICE grid operations.

  1. Knowledge as a Service at the Point of Care.

    PubMed

    Shellum, Jane L; Freimuth, Robert R; Peters, Steve G; Nishimura, Rick A; Chaudhry, Rajeev; Demuth, Steve J; Knopp, Amy L; Miksch, Timothy A; Milliner, Dawn S

    2016-01-01

    An electronic health record (EHR) can assist the delivery of high-quality patient care, in part by providing the capability for a broad range of clinical decision support, including contextual references (e.g., Infobuttons), alerts and reminders, order sets, and dashboards. All of these decision support tools are based on clinical knowledge; unfortunately, the mechanisms for managing rules, order sets, Infobuttons, and dashboards are often unrelated, making it difficult to coordinate the application of clinical knowledge to various components of the clinical workflow. Additional complexity is encountered when updating enterprise-wide knowledge bases and delivering the content through multiple modalities to different consumers. We present the experience of Mayo Clinic as a case study to examine the requirements and implementation challenges related to knowledge management across a large, multi-site medical center. The lessons learned through the development of our knowledge management and delivery platform will help inform the future development of interoperable knowledge resources.

  2. Knowledge as a Service at the Point of Care

    PubMed Central

    Shellum, Jane L.; Freimuth, Robert R.; Peters, Steve G.; Nishimura, Rick A.; Chaudhry, Rajeev; Demuth, Steve J.; Knopp, Amy L.; Miksch, Timothy A.; Milliner, Dawn S.

    2016-01-01

    An electronic health record (EHR) can assist the delivery of high-quality patient care, in part by providing the capability for a broad range of clinical decision support, including contextual references (e.g., Infobuttons), alerts and reminders, order sets, and dashboards. All of these decision support tools are based on clinical knowledge; unfortunately, the mechanisms for managing rules, order sets, Infobuttons, and dashboards are often unrelated, making it difficult to coordinate the application of clinical knowledge to various components of the clinical workflow. Additional complexity is encountered when updating enterprise-wide knowledge bases and delivering the content through multiple modalities to different consumers. We present the experience of Mayo Clinic as a case study to examine the requirements and implementation challenges related to knowledge management across a large, multi-site medical center. The lessons learned through the development of our knowledge management and delivery platform will help inform the future development of interoperable knowledge resources. PMID:28269911

  3. Video fingerprinting for copy identification: from research to industry applications

    NASA Astrophysics Data System (ADS)

    Lu, Jian

    2009-02-01

    Research that began a decade ago in video copy detection has developed into a technology known as "video fingerprinting". Today, video fingerprinting is an essential and enabling tool adopted by the industry for video content identification and management in online video distribution. This paper provides a comprehensive review of video fingerprinting technology and its applications in identifying, tracking, and managing copyrighted content on the Internet. The review includes a survey on video fingerprinting algorithms and some fundamental design considerations, such as robustness, discriminability, and compactness. It also discusses fingerprint matching algorithms, including complexity analysis, and approximation and optimization for fast fingerprint matching. On the application side, it provides an overview of a number of industry-driven applications that rely on video fingerprinting. Examples are given based on real-world systems and workflows to demonstrate applications in detecting and managing copyrighted content, and in monitoring and tracking video distribution on the Internet.
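
    Fingerprint matching of the kind surveyed here is commonly framed as nearest-neighbour search over compact binary signatures, with robustness coming from tolerating a small number of differing bits. The sketch below shows that basic Hamming-distance comparison; it is a generic illustration under those assumptions, not any particular commercial algorithm.

```python
# Generic illustration of binary fingerprint matching: each video segment
# is reduced to a compact bit signature, and a query matches a reference
# if the Hamming distance between signatures stays below a threshold.
# The 64-bit fingerprints below are made-up example values.
def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")

def best_match(query: int, references: dict, max_distance: int = 8):
    distance, video_id = min((hamming(query, fp), vid) for vid, fp in references.items())
    return (video_id, distance) if distance <= max_distance else (None, distance)

references = {
    "title_a": 0x9F3B62D1A5C47E08,
    "title_b": 0x0F1E2D3C4B5A6978,
}
print(best_match(0x9F3B62D1A5C47E0C, references))   # ('title_a', 1)
```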

  4. Modeling Business Processes in Public Administration

    NASA Astrophysics Data System (ADS)

    Repa, Vaclav

    During more than 10 years of its existence, business process modeling has become a regular part of organization management practice. It is mostly regarded as a part of information system development, or even as a way to implement some supporting technology (for instance, a workflow system). Although I do not agree with such a reduction of the real meaning of a business process, it is necessary to admit that information technologies play an essential role in business processes (see [1] for more information). Consequently, an information system is inseparable from the business process itself, because it is a cornerstone of the general basic infrastructure of a business. This fact impacts all dimensions of business process management. One of these dimensions is methodology, which postulates that information systems development provide business process management with exact methods and tools for modeling business processes. The methodology underlying the approach presented in this paper also has its roots in information systems development methodology.

  5. A workflow to process 3D+time microscopy images of developing organisms and reconstruct their cell lineage

    PubMed Central

    Faure, Emmanuel; Savy, Thierry; Rizzi, Barbara; Melani, Camilo; Stašová, Olga; Fabrèges, Dimitri; Špir, Róbert; Hammons, Mark; Čúnderlík, Róbert; Recher, Gaëlle; Lombardot, Benoît; Duloquin, Louise; Colin, Ingrid; Kollár, Jozef; Desnoulez, Sophie; Affaticati, Pierre; Maury, Benoît; Boyreau, Adeline; Nief, Jean-Yves; Calvat, Pascal; Vernier, Philippe; Frain, Monique; Lutfalla, Georges; Kergosien, Yannick; Suret, Pierre; Remešíková, Mariana; Doursat, René; Sarti, Alessandro; Mikula, Karol; Peyriéras, Nadine; Bourgine, Paul

    2016-01-01

    The quantitative and systematic analysis of embryonic cell dynamics from in vivo 3D+time image data sets is a major challenge at the forefront of developmental biology. Despite recent breakthroughs in the microscopy imaging of living systems, producing an accurate cell lineage tree for any developing organism remains a difficult task. We present here the BioEmergences workflow integrating all reconstruction steps from image acquisition and processing to the interactive visualization of reconstructed data. Original mathematical methods and algorithms underlie image filtering, nucleus centre detection, nucleus and membrane segmentation, and cell tracking. They are demonstrated on zebrafish, ascidian and sea urchin embryos with stained nuclei and membranes. Subsequent validation and annotations are carried out using Mov-IT, a custom-made graphical interface. Compared with eight other software tools, our workflow achieved the best lineage score. Delivered in standalone or web service mode, BioEmergences and Mov-IT offer a unique set of tools for in silico experimental embryology. PMID:26912388

  6. Automated metadata--final project report

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Schissel, David

    This report summarizes the work of the Automated Metadata, Provenance Cataloging, and Navigable Interfaces: Ensuring the Usefulness of Extreme-Scale Data Project (MPO Project) funded by the United States Department of Energy (DOE), Offices of Advanced Scientific Computing Research and Fusion Energy Sciences. Initially funded for three years starting in 2012, it was extended for 6 months with additional funding. The project was a collaboration between scientists at General Atomics, Lawrence Berkeley National Laboratory (LBNL), and Massachusetts Institute of Technology (MIT). The group leveraged existing computer science technology where possible, and extended or created new capabilities where required. The MPO project was able to successfully create a suite of software tools that can be used by a scientific community to automatically document their scientific workflows. These tools were integrated into workflows for fusion energy and climate research illustrating the general applicability of the project’s toolkit. Feedback was very positive on the project’s toolkit and the value of such automatic workflow documentation to the scientific endeavor.

  7. Metavisitor, a Suite of Galaxy Tools for Simple and Rapid Detection and Discovery of Viruses in Deep Sequence Data

    PubMed Central

    Vernick, Kenneth D.

    2017-01-01

    Metavisitor is a software package that allows biologists and clinicians without specialized bioinformatics expertise to detect and assemble viral genomes from deep sequence datasets. The package is composed of a set of modular bioinformatic tools and workflows that are implemented in the Galaxy framework. Using the graphical Galaxy workflow editor, users with minimal computational skills can use existing Metavisitor workflows or adapt them to suit specific needs by adding or modifying analysis modules. Metavisitor works with DNA, RNA or small RNA sequencing data over a range of read lengths and can use a combination of de novo and guided approaches to assemble genomes from sequencing reads. We show that the software has the potential for quick diagnosis as well as discovery of viruses from a vast array of organisms. Importantly, we provide here executable Metavisitor use cases, which increase the accessibility and transparency of the software, ultimately enabling biologists or clinicians to focus on biological or medical questions. PMID:28045932

  8. A pattern-based analysis of clinical computer-interpretable guideline modeling languages.

    PubMed

    Mulyar, Nataliya; van der Aalst, Wil M P; Peleg, Mor

    2007-01-01

    Languages used to specify computer-interpretable guidelines (CIGs) differ in their approaches to addressing particular modeling challenges. The main goals of this article are: (1) to examine the expressive power of CIG modeling languages, and (2) to define the differences, from the control-flow perspective, between process languages in workflow management systems and modeling languages used to design clinical guidelines. The pattern-based analysis was applied to the guideline modeling languages Asbru, EON, GLIF, and PROforma. We focused on control-flow and left other perspectives out of consideration. We evaluated the selected CIG modeling languages and identified their degree of support of 43 control-flow patterns. We used a set of explicitly defined evaluation criteria to determine whether each pattern is supported directly, indirectly, or not at all. PROforma offers direct support for 22 of 43 patterns, Asbru 20, GLIF 17, and EON 11. All four directly support basic control-flow patterns, cancellation patterns, and some advanced branching and synchronization patterns. None support multiple-instance patterns. They offer varying levels of support for synchronizing merge patterns and state-based patterns. Some support a few scenarios not covered by the 43 control-flow patterns. CIG modeling languages are remarkably close to traditional workflow languages from the control-flow perspective, but cover many fewer workflow patterns. CIG languages offer some flexibility that supports modeling of complex decisions and provide ways for modeling some decisions not covered by workflow management systems. Workflow management systems may be suitable for clinical guideline applications.

  9. Simulation environment and graphical visualization environment: a COPD use-case

    PubMed Central

    2014-01-01

    Background Today, many different tools are developed to execute and visualize physiological models that represent the human physiology. Most of these tools run models written in very specific programming languages which in turn simplify the communication among models. Nevertheless, not all of these tools are able to run models written in different programming languages. In addition, interoperability between such models remains an unresolved issue. Results In this paper we present a simulation environment that allows, first, the execution of models developed in different programming languages and, second, the communication of parameters to interconnect these models. This simulation environment, developed within the Synergy-COPD project, aims at helping and supporting bio-researchers and medical students understand the internal mechanisms of the human body through the use of physiological models. This tool is composed of a graphical visualization environment, which is a web interface through which the user can interact with the models, and a simulation workflow management system composed of a control module and a data warehouse manager. The control module monitors the correct functioning of the whole system. The data warehouse manager is responsible for managing the stored information and supporting its flow among the different modules. This simulation environment has been validated with the integration of three models: two deterministic, i.e. based on linear and differential equations, and one probabilistic, i.e., based on probability theory. These models have been selected based on the disease under study in this project, i.e., chronic obstructive pulmonary disease. Conclusion It has been proved that the simulation environment presented here allows the user to research and study the internal mechanisms of the human physiology by the use of models via a graphical visualization environment. A new tool for bio-researchers is ready for deployment in various use case scenarios. PMID:25471327

  10. An approach for software-driven and standard-based support of cross-enterprise tumor boards.

    PubMed

    Mangesius, Patrick; Fischer, Bernd; Schabetsberger, Thomas

    2015-01-01

    For tumor boards, the networking of different medical disciplines' expertise continues to gain importance. However, interdisciplinary tumor boards spread across several institutions are rarely supported by information technology tools today. The aim of this paper is to point out an approach for a tumor board management system prototype. For analyzing the requirements, an incremental process was used. The requirements were surveyed using Informal Conversational Interview and documented with Use Case Diagrams defined by the Unified Modeling Language (UML). Analyses of current EHR standards were conducted to evaluate technical requirements. Functional and technical requirements of clinical conference applications were evaluated and documented. In several steps, workflows were derived and application mockups were created. Although there is a vast amount of common understanding concerning how clinical conferences should be conducted and how their workflows should be structured, these are hardly standardized, neither on a functional nor on a technical level. This results in drawbacks for participants and patients. Using modern EHR technologies based on profiles such as IHE Cross Enterprise document sharing (XDS), these deficits could be overcome.

  11. Seamless online science workflow development and collaboration using IDL and the ENVI Services Engine

    NASA Astrophysics Data System (ADS)

    Harris, A. T.; Ramachandran, R.; Maskey, M.

    2013-12-01

    The Exelis-developed IDL and ENVI software are ubiquitous tools in Earth science research environments. The IDL Workbench is used by the Earth science community for programming custom data analysis and visualization modules. ENVI is a software solution for processing and analyzing geospatial imagery that combines support for multiple Earth observation scientific data types (optical, thermal, multi-spectral, hyperspectral, SAR, LiDAR) with advanced image processing and analysis algorithms. The ENVI & IDL Services Engine (ESE) is an Earth science data processing engine that allows researchers to use open standards to rapidly create, publish and deploy advanced Earth science data analytics within any existing enterprise infrastructure. Although powerful in many ways, the tools lack collaborative features out-of-box. Thus, as part of the NASA funded project, Collaborative Workbench to Accelerate Science Algorithm Development, researchers at the University of Alabama in Huntsville and Exelis have developed plugins that allow seamless research collaboration from within IDL workbench. Such additional features within IDL workbench are possible because IDL workbench is built using the Eclipse Rich Client Platform (RCP). RCP applications allow custom plugins to be dropped in for extended functionalities. Specific functionalities of the plugins include creating complex workflows based on IDL application source code, submitting workflows to be executed by ESE in the cloud, and sharing and cloning of workflows among collaborators. All these functionalities are available to scientists without leaving their IDL workbench. Because ESE can interoperate with any middleware, scientific programmers can readily string together IDL processing tasks (or tasks written in other languages like C++, Java or Python) to create complex workflows for deployment within their current enterprise architecture (e.g. ArcGIS Server, GeoServer, Apache ODE or SciFlo from JPL). Using the collaborative IDL Workbench, coupled with ESE for execution in the cloud, asynchronous workflows could be executed in batch mode on large data in the cloud. We envision that a scientist will initially develop a scientific workflow locally on a small set of data. Once tested, the scientist will deploy the workflow to the cloud for execution. Depending on the results, the scientist may share the workflow and results, allowing them to be stored in a community catalog and instantly loaded into the IDL Workbench of other scientists. Thereupon, scientists can clone and modify or execute the workflow with different input parameters. The Collaborative Workbench will provide a platform for collaboration in the cloud, helping Earth scientists solve big-data problems in the Earth and planetary sciences.
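
    Because ESE publishes analytics behind open web standards, a workflow step can in principle be invoked with an ordinary HTTP request from any middleware. The sketch below is purely illustrative: the endpoint URL, task name, and parameter names are hypothetical placeholders and do not describe the actual ESE interface.

```python
# Purely illustrative HTTP submission of a processing task to a service
# engine exposed over REST. The URL, task name, and parameter names are
# hypothetical placeholders, not the real ENVI Services Engine API.
import json
from urllib import request

def submit_task(base_url: str, task_name: str, params: dict) -> dict:
    payload = json.dumps({"taskName": task_name, "inputParameters": params}).encode()
    req = request.Request(f"{base_url}/services/{task_name}/submit",
                          data=payload,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:        # server returns a job handle as JSON
        return json.load(resp)

if __name__ == "__main__":
    job = submit_task("http://ese.example.org",                 # hypothetical server
                      "SpectralIndex",
                      {"INPUT_RASTER": "scene_001.dat", "INDEX": "NDVI"})
    print(job)
```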

  12. Evaluation of User Interface and Workflow Design of a Bedside Nursing Clinical Decision Support System

    PubMed Central

    Yuan, Michael Juntao; Finley, George Mike; Mills, Christy; Johnson, Ron Kim

    2013-01-01

    Background Clinical decision support systems (CDSS) are important tools to improve health care outcomes and reduce preventable medical adverse events. However, the effectiveness and success of CDSS depend on their implementation context and usability in complex health care settings. As a result, usability design and validation, especially in real world clinical settings, are crucial aspects of successful CDSS implementations. Objective Our objective was to develop a novel CDSS to help frontline nurses better manage critical symptom changes in hospitalized patients, hence reducing preventable failure to rescue cases. A robust user interface and implementation strategy that fit into existing workflows was key for the success of the CDSS. Methods Guided by a formal usability evaluation framework, UFuRT (user, function, representation, and task analysis), we developed a high-level specification of the product that captures key usability requirements and is flexible to implement. We interviewed users of the proposed CDSS to identify requirements, listed functions, and operations the system must perform. We then designed visual and workflow representations of the product to perform the operations. The user interface and workflow design were evaluated via heuristic and end user performance evaluation. The heuristic evaluation was done after the first prototype, and its results were incorporated into the product before the end user evaluation was conducted. First, we recruited 4 evaluators with strong domain expertise to study the initial prototype. Heuristic violations were coded and rated for severity. Second, after development of the system, we assembled a panel of nurses, consisting of 3 licensed vocational nurses and 7 registered nurses, to evaluate the user interface and workflow via simulated use cases. We recorded whether each session was successfully completed and its completion time. Each nurse was asked to use the National Aeronautics and Space Administration (NASA) Task Load Index to self-evaluate the amount of cognitive and physical burden associated with using the device. Results A total of 83 heuristic violations were identified in the studies. The distribution of the heuristic violations and their average severity are reported. The nurse evaluators successfully completed all 30 sessions of the performance evaluations. All nurses were able to use the device after a single training session. On average, the nurses took 111 seconds (SD 30 seconds) to complete the simulated task. The NASA Task Load Index results indicated that the work overhead on the nurses was low. In fact, most of the burden measures were consistent with zero. The only potentially significant burden was temporal demand, which was consistent with the primary use case of the tool. Conclusions The evaluation has shown that our design was functional and met the requirements demanded by the nurses’ tight schedules and heavy workloads. The user interface embedded in the tool provided compelling utility to the nurse with minimal distraction. PMID:23612350

  13. Asterism: an integrated, complete, and open-source approach for running seismologist continuous data-intensive analysis on heterogeneous systems

    NASA Astrophysics Data System (ADS)

    Ferreira da Silva, R.; Filgueira, R.; Deelman, E.; Atkinson, M.

    2016-12-01

    We present Asterism, an open source data-intensive framework, which combines the Pegasus and dispel4py workflow systems. Asterism aims to simplify the effort required to develop data-intensive applications that run across multiple heterogeneous resources, without users having to: re-formulate their methods according to different enactment systems; manage the data distribution across systems; parallelize their methods; co-place and schedule their methods with computing resources; and store and transfer large/small volumes of data. Asterism's key element is to leverage the strengths of each workflow system: dispel4py allows developing scientific applications locally and then automatically parallelize and scale them on a wide range of HPC infrastructures with no changes to the application's code; Pegasus orchestrates the distributed execution of applications while providing portability, automated data management, recovery, debugging, and monitoring, without users needing to worry about the particulars of the target execution systems. Asterism leverages the level of abstractions provided by each workflow system to describe hybrid workflows where no information about the underlying infrastructure is required beforehand. The feasibility of Asterism has been evaluated using the seismic ambient noise cross-correlation application, a common data-intensive analysis pattern used by many seismologists. The application preprocesses (Phase1) and cross-correlates (Phase2) traces from several seismic stations. The Asterism workflow is implemented as a Pegasus workflow composed of two tasks (Phase1 and Phase2), where each phase represents a dispel4py workflow. Pegasus tasks describe the in/output data at a logical level, the data dependency between tasks, and the e-Infrastructures and the execution engine to run each dispel4py workflow. We have instantiated the workflow using data from 1000 stations from the IRIS services, and run it across two heterogeneous resources described as Docker containers: MPI (Container2) and Storm (Container3) clusters (Figure 1). Each dispel4py workflow is mapped to a particular execution engine, and data transfers between resources are automatically handled by Pegasus. Asterism is freely available online at http://github.com/dispel4py/pegasus_dispel4py.

  14. Use Cases for Combining Web Services with ArcPython Tools for Enabling Quality Control of Land Remote Sensing Data Products.

    NASA Astrophysics Data System (ADS)

    Krehbiel, C.; Maiersperger, T.; Friesz, A.; Harriman, L.; Quenzer, R.; Impecoven, K.

    2016-12-01

    Three major obstacles facing big Earth data users include data storage, management, and analysis. As the amount of satellite remote sensing data increases, so does the need for better data storage and management strategies to exploit the plethora of data now available. Standard GIS tools can help big Earth data users who interact with and analyze increasingly large and diverse datasets. In this presentation we highlight how NASA's Land Processes Distributed Active Archive Center (LP DAAC) is tackling these big Earth data challenges. We provide a real-life use case example to describe three tools and services provided by the LP DAAC to more efficiently exploit big Earth data in a GIS environment. First, we describe the Open-source Project for a Network Data Access Protocol (OPeNDAP), which allows requests for specific subsets of data, minimizing the amount of data that a user downloads and improving the efficiency of data downloading and processing. Next, we cover the LP DAAC's Application for Extracting and Exploring Analysis Ready Samples (AppEEARS), a web application interface for extracting and analyzing land remote sensing data. From there, we review an ArcPython toolbox that was developed to provide quality control services to land remote sensing data products. Locating and extracting specific subsets of larger big Earth datasets improves data storage and management efficiency for the end user, and quality control services provide a straightforward interpretation of big Earth data. These tools and services are beneficial to the GIS user community in terms of standardizing workflows and improving data storage, management, and analysis tactics.
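
    The benefit of OPeNDAP access described above is that only the requested subset crosses the network. The sketch below shows the general pattern with the xarray library; the dataset URL and variable name are hypothetical placeholders rather than an actual LP DAAC endpoint.

```python
# General pattern for server-side subsetting over OPeNDAP: open the remote
# dataset lazily, select only the variable and region of interest, and let
# the backend transfer just that subset. The URL and variable name are
# hypothetical placeholders, not an actual LP DAAC endpoint.
import xarray as xr

URL = "https://opendap.example.org/hyrax/example_granule.nc"   # hypothetical

ds = xr.open_dataset(URL)                          # lazy: no payload pulled yet
subset = ds["NDVI"].sel(lat=slice(44.0, 43.5),     # slice order follows the server's
                        lon=slice(-104.0, -103.5)) # coordinate ordering
subset.to_netcdf("ndvi_subset.nc")                 # only the subset is downloaded
```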

  15. Bringing the CMS distributed computing system into scalable operations

    NASA Astrophysics Data System (ADS)

    Belforte, S.; Fanfani, A.; Fisk, I.; Flix, J.; Hernández, J. M.; Kress, T.; Letts, J.; Magini, N.; Miccio, V.; Sciabà, A.

    2010-04-01

    Establishing efficient and scalable operations of the CMS distributed computing system critically relies on the proper integration, commissioning and scale testing of the data and workload management tools, the various computing workflows and the underlying computing infrastructure, located at more than 50 computing centres worldwide and interconnected by the Worldwide LHC Computing Grid. Computing challenges periodically undertaken by CMS in the past years with increasing scale and complexity have revealed the need for a sustained effort on computing integration and commissioning activities. The Processing and Data Access (PADA) Task Force was established at the beginning of 2008 within the CMS Computing Program with the mandate of validating the infrastructure for organized processing and user analysis including the sites and the workload and data management tools, validating the distributed production system by performing functionality, reliability and scale tests, helping sites to commission, configure and optimize the networking and storage through scale testing data transfers and data processing, and improving the efficiency of accessing data across the CMS computing system from global transfers to local access. This contribution reports on the tools and procedures developed by CMS for computing commissioning and scale testing as well as the improvements accomplished towards efficient, reliable and scalable computing operations. The activities include the development and operation of load generators for job submission and data transfers with the aim of stressing the experiment and Grid data management and workload management systems, site commissioning procedures and tools to monitor and improve site availability and reliability, as well as activities targeted to the commissioning of the distributed production, user analysis and monitoring systems.

  16. Streamlining Change Management with Business Rules

    NASA Technical Reports Server (NTRS)

    Savela, Christopher

    2015-01-01

    This presentation discusses how the organization is streamlining workflows and the change management process with business rules. One way to improve efficiency and reduce cost is to reduce the work that workflow task approvers must do when reviewing affected items. The presentation shares the technical details of the business rules, how to implement them, and how to speed up the development process by using the API to demonstrate the rules in action.

  17. A strategy for systemic toxicity assessment based on non-animal approaches: The Cosmetics Europe Long Range Science Strategy programme.

    PubMed

    Desprez, Bertrand; Dent, Matt; Keller, Detlef; Klaric, Martina; Ouédraogo, Gladys; Cubberley, Richard; Duplan, Hélène; Eilstein, Joan; Ellison, Corie; Grégoire, Sébastien; Hewitt, Nicola J; Jacques-Jamin, Carine; Lange, Daniela; Roe, Amy; Rothe, Helga; Blaauboer, Bas J; Schepky, Andreas; Mahony, Catherine

    2018-08-01

    When performing safety assessment of chemicals, the evaluation of their systemic toxicity based only on non-animal approaches is a challenging objective. The Safety Evaluation Ultimately Replacing Animal Test programme (SEURAT-1) addressed this question from 2011 to 2015 and showed that further research and development of adequate tools in toxicokinetic and toxicodynamic are required for performing non-animal safety assessments. It also showed how to implement tools like thresholds of toxicological concern (TTCs) and read-across in this context. This paper shows a tiered scientific workflow and how each tier addresses the four steps of the risk assessment paradigm. Cosmetics Europe established its Long Range Science Strategy (LRSS) programme, running from 2016 to 2020, based on the outcomes of SEURAT-1 to implement this workflow. Dedicated specific projects address each step of this workflow, which is introduced here. It tackles the question of evaluating the internal dose when systemic exposure happens. The applicability of the workflow will be shown through a series of case studies, which will be published separately. Even if the LRSS puts the emphasis on safety assessment of cosmetic relevant chemicals, it remains applicable to any type of chemical. Copyright © 2018. Published by Elsevier Ltd.

  18. Safety and feasibility of STAT RAD: Improvement of a novel rapid tomotherapy-based radiation therapy workflow by failure mode and effects analysis.

    PubMed

    Jones, Ryan T; Handsfield, Lydia; Read, Paul W; Wilson, David D; Van Ausdal, Ray; Schlesinger, David J; Siebers, Jeffrey V; Chen, Quan

    2015-01-01

    The clinical challenge of radiation therapy (RT) for painful bone metastases requires clinicians to consider both treatment efficacy and patient prognosis when selecting a radiation therapy regimen. The traditional RT workflow requires several weeks for common palliative RT schedules of 30 Gy in 10 fractions or 20 Gy in 5 fractions. At our institution, we have created a new RT workflow termed "STAT RAD" that allows clinicians to perform computed tomographic (CT) simulation, planning, and highly conformal single fraction treatment delivery within 2 hours. In this study, we evaluate the safety and feasibility of the STAT RAD workflow. A failure mode and effects analysis (FMEA) was performed on the STAT RAD workflow, including development of a process map, identification of potential failure modes, description of the cause and effect, temporal occurrence, and team member involvement in each failure mode, and examination of existing safety controls. A risk probability number (RPN) was calculated for each failure mode. As necessary, workflow adjustments were then made to safeguard against failure modes with significant RPN values. After these workflow alterations, RPN values were recomputed. A total of 72 potential failure modes were identified in the pre-FMEA STAT RAD workflow, of which 22 met the RPN threshold for clinical significance. Workflow adjustments included the addition of a team member checklist, changing simulation from megavoltage CT to kilovoltage CT, alteration of patient-specific quality assurance testing, and allocating increased time for critical workflow steps. After these modifications, only 1 failure mode retained a significant RPN: patient motion after alignment or during treatment. Performing the FMEA for the STAT RAD workflow before clinical implementation has significantly strengthened the safety and feasibility of STAT RAD. The FMEA proved a valuable evaluation tool, identifying potential problem areas so that we could create a safer workflow. Copyright © 2015 American Society for Radiation Oncology. Published by Elsevier Inc. All rights reserved.
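
    The FMEA scoring used in this workflow multiplies three ordinal ratings into a risk probability number and flags failure modes above a threshold for mitigation. The sketch below illustrates that arithmetic with hypothetical ratings and an assumed cutoff; the study's actual scales and threshold are not reproduced here.

```python
# Illustrative FMEA arithmetic: RPN = severity * occurrence * detectability,
# and failure modes whose RPN exceeds a chosen threshold are flagged for
# mitigation. Ratings and the threshold below are hypothetical examples.
from dataclasses import dataclass

@dataclass
class FailureMode:
    name: str
    severity: int     # 1 (negligible) .. 10 (catastrophic)
    occurrence: int   # 1 (rare) .. 10 (frequent)
    detection: int    # 1 (always caught) .. 10 (essentially undetectable)

    @property
    def rpn(self) -> int:
        return self.severity * self.occurrence * self.detection

def significant(modes, threshold: int = 100):
    return sorted((m for m in modes if m.rpn >= threshold),
                  key=lambda m: m.rpn, reverse=True)

modes = [
    FailureMode("patient motion after alignment", severity=8, occurrence=4, detection=6),
    FailureMode("wrong CT dataset selected", severity=9, occurrence=2, detection=2),
]
for m in significant(modes):
    print(m.name, m.rpn)     # only modes above the assumed threshold print
```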

  19. Deploying and sharing U-Compare workflows as web services.

    PubMed

    Kontonatsios, Georgios; Korkontzelos, Ioannis; Kolluru, Balakrishna; Thompson, Paul; Ananiadou, Sophia

    2013-02-18

    U-Compare is a text mining platform that allows the construction, evaluation and comparison of text mining workflows. U-Compare contains a large library of components that are tuned to the biomedical domain. Users can rapidly develop biomedical text mining workflows by mixing and matching U-Compare's components. Workflows developed using U-Compare can be exported and sent to other users who, in turn, can import and re-use them. However, the resulting workflows are standalone applications, i.e., software tools that run and are accessible only via a local machine, and that can only be run with the U-Compare platform. We address the above issues by extending U-Compare to convert standalone workflows into web services automatically, via a two-click process. The resulting web services can be registered on a central server and made publicly available. Alternatively, users can make web services available on their own servers, after installing the web application framework, which is part of the extension to U-Compare. We have performed a user-oriented evaluation of the proposed extension, by asking users who have tested the enhanced functionality of U-Compare to complete questionnaires that assess its functionality, reliability, usability, efficiency and maintainability. The results obtained reveal that the new functionality is well received by users. The web services produced by U-Compare are built on top of open standards, i.e., REST and SOAP protocols, and therefore, they are decoupled from the underlying platform. Exported workflows can be integrated with any application that supports these open standards. We demonstrate how the newly extended U-Compare enhances the cross-platform interoperability of workflows, by seamlessly importing a number of text mining workflow web services exported from U-Compare into Taverna, i.e., a generic scientific workflow construction platform.
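
    Since the exported workflows are exposed through open protocols, they can be invoked from any HTTP client without the U-Compare platform installed. The sketch below shows that generic pattern; the service URL and request fields are hypothetical placeholders, not the actual interface produced by the U-Compare extension.

```python
# Generic pattern for calling a text mining workflow published as a REST
# service: post the input text, read back the annotations. The URL and
# JSON fields are hypothetical placeholders, not U-Compare's real interface.
import json
from urllib import request

SERVICE_URL = "https://textmining.example.org/workflows/ner/run"   # hypothetical

def run_workflow(text: str) -> dict:
    payload = json.dumps({"text": text}).encode()
    req = request.Request(SERVICE_URL, data=payload,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.load(resp)        # e.g. a list of annotations with offsets

if __name__ == "__main__":
    result = run_workflow("BRCA1 mutations are associated with breast cancer.")
    print(result)
```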

  20. Deploying and sharing U-Compare workflows as web services

    PubMed Central

    2013-01-01

    Background U-Compare is a text mining platform that allows the construction, evaluation and comparison of text mining workflows. U-Compare contains a large library of components that are tuned to the biomedical domain. Users can rapidly develop biomedical text mining workflows by mixing and matching U-Compare’s components. Workflows developed using U-Compare can be exported and sent to other users who, in turn, can import and re-use them. However, the resulting workflows are standalone applications, i.e., software tools that run and are accessible only via a local machine, and that can only be run with the U-Compare platform. Results We address the above issues by extending U-Compare to convert standalone workflows into web services automatically, via a two-click process. The resulting web services can be registered on a central server and made publicly available. Alternatively, users can make web services available on their own servers, after installing the web application framework, which is part of the extension to U-Compare. We have performed a user-oriented evaluation of the proposed extension, by asking users who have tested the enhanced functionality of U-Compare to complete questionnaires that assess its functionality, reliability, usability, efficiency and maintainability. The results obtained reveal that the new functionality is well received by users. Conclusions The web services produced by U-Compare are built on top of open standards, i.e., REST and SOAP protocols, and therefore, they are decoupled from the underlying platform. Exported workflows can be integrated with any application that supports these open standards. We demonstrate how the newly extended U-Compare enhances the cross-platform interoperability of workflows, by seamlessly importing a number of text mining workflow web services exported from U-Compare into Taverna, i.e., a generic scientific workflow construction platform. PMID:23419017

  1. SIMBA: a web tool for managing bacterial genome assembly generated by Ion PGM sequencing technology.

    PubMed

    Mariano, Diego C B; Pereira, Felipe L; Aguiar, Edgar L; Oliveira, Letícia C; Benevides, Leandro; Guimarães, Luís C; Folador, Edson L; Sousa, Thiago J; Ghosh, Preetam; Barh, Debmalya; Figueiredo, Henrique C P; Silva, Artur; Ramos, Rommel T J; Azevedo, Vasco A C

    2016-12-15

    The evolution of Next-Generation Sequencing (NGS) has considerably reduced the cost per sequenced base, allowing a significant rise in sequencing projects, mainly in prokaryotes. However, the range of available NGS platforms requires different strategies and software to correctly assemble genomes, in addition to the installation or modification of various software packages. This requires users to have significant expertise with this software and command-line scripting experience on Unix platforms, besides basic knowledge of genome assembly methodologies and techniques. These difficulties often delay complete genome assembly projects. To overcome this, we developed SIMBA (SImple Manager for Bacterial Assemblies), a freely available web tool that integrates several component tools for assembling and finishing bacterial genomes. SIMBA provides a friendly and intuitive user interface so that bioinformaticians, even those with limited computational expertise, can work under a centralized administrative control system of assemblies managed by the assembly center head. SIMBA guides users through the assembly process via simple and interactive pages. The SIMBA workflow is divided into three modules: (i) projects, which gives a general view of genome sequencing projects, in addition to data quality analysis and data format conversions; (ii) assemblies, which supports de novo assemblies with the software Mira, Minia, Newbler and SPAdes, as well as assembly quality validation using the QUAST software; and (iii) curation, which presents methods for finishing assemblies through tools for scaffolding contigs and closing gaps. We also present a case study that validates the efficacy of SIMBA for managing bacterial assembly projects sequenced using Ion Torrent PGM. Besides being a web tool for genome assembly, SIMBA is a complete assembly project management system, which can be useful for managing several projects in laboratories. The SIMBA source code is available for download and installation on local web servers at http://ufmg-simba.sourceforge.net .
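
    As an illustration of the kind of assembly step the SIMBA modules wrap, the sketch below launches SPAdes from Python via subprocess; the paths are placeholders and the flags are assumptions to be checked against the SPAdes documentation, not code taken from SIMBA itself.

      # Hedged sketch: run a de novo assembly with SPAdes on single-end Ion Torrent
      # reads. File names are placeholders; flags are assumptions, verify locally.
      import subprocess

      def run_spades(reads_fastq, out_dir, threads=4):
          cmd = [
              "spades.py",
              "--iontorrent",      # assumed flag for Ion Torrent data
              "-s", reads_fastq,   # single-end reads
              "-o", out_dir,
              "-t", str(threads),
          ]
          subprocess.run(cmd, check=True)

      if __name__ == "__main__":
          run_spades("sample_reads.fastq", "assembly_out")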

  2. COSMOS: Python library for massively parallel workflows

    PubMed Central

    Gafni, Erik; Luquette, Lovelace J.; Lancaster, Alex K.; Hawkins, Jared B.; Jung, Jae-Yoon; Souilmi, Yassine; Wall, Dennis P.; Tonellato, Peter J.

    2014-01-01

    Summary: Efficient workflows to shepherd clinically generated genomic data through the multiple stages of a next-generation sequencing pipeline are of critical importance in translational biomedical science. Here we present COSMOS, a Python library for workflow management that allows formal description of pipelines and partitioning of jobs. In addition, it includes a user interface for tracking the progress of jobs, abstraction of the queuing system and fine-grained control over the workflow. Workflows can be created on traditional computing clusters as well as cloud-based services. Availability and implementation: Source code is available for academic non-commercial research purposes. Links to code and documentation are provided at http://lpm.hms.harvard.edu and http://wall-lab.stanford.edu. Contact: dpwall@stanford.edu or peter_tonellato@hms.harvard.edu. Supplementary information: Supplementary data are available at Bioinformatics online. PMID:24982428

  3. COSMOS: Python library for massively parallel workflows.

    PubMed

    Gafni, Erik; Luquette, Lovelace J; Lancaster, Alex K; Hawkins, Jared B; Jung, Jae-Yoon; Souilmi, Yassine; Wall, Dennis P; Tonellato, Peter J

    2014-10-15

    Efficient workflows to shepherd clinically generated genomic data through the multiple stages of a next-generation sequencing pipeline are of critical importance in translational biomedical science. Here we present COSMOS, a Python library for workflow management that allows formal description of pipelines and partitioning of jobs. In addition, it includes a user interface for tracking the progress of jobs, abstraction of the queuing system and fine-grained control over the workflow. Workflows can be created on traditional computing clusters as well as cloud-based services. Source code is available for academic non-commercial research purposes. Links to code and documentation are provided at http://lpm.hms.harvard.edu and http://wall-lab.stanford.edu. dpwall@stanford.edu or peter_tonellato@hms.harvard.edu. Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press.
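
    The abstract describes formal pipeline description and job partitioning; the toy sketch below illustrates the underlying idea with a minimal task graph executed in dependency order. It is a generic illustration in plain Python, not COSMOS's actual API.

      # Generic sketch of a formally described pipeline: each task names its
      # dependencies and runs once they have completed. Not the COSMOS API.
      from collections import deque

      class Task:
          def __init__(self, name, func, deps=()):
              self.name, self.func, self.deps = name, func, list(deps)

      def run_pipeline(tasks):
          done = set()
          queue = deque(t for t in tasks if not t.deps)
          pending = [t for t in tasks if t.deps]
          while queue:
              task = queue.popleft()
              print(f"running {task.name}")
              task.func()
              done.add(task.name)
              for t in [p for p in pending if set(p.deps) <= done]:
                  pending.remove(t)
                  queue.append(t)
          if pending:
              raise RuntimeError("cyclic or unsatisfiable dependencies")

      run_pipeline([
          Task("align", lambda: None),
          Task("dedup", lambda: None, deps=["align"]),
          Task("call_variants", lambda: None, deps=["dedup"]),
      ])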

  4. Usability test of an internet-based informatics tool for diabetes care providers: the comprehensive diabetes management program.

    PubMed

    Fonda, Stephanie J; Paulsen, Christine A; Perkins, Joan; Kedziora, Richard J; Rodbard, David; Bursell, Sven-Erik

    2008-02-01

    Research suggests Internet-based care management tools are associated with improvements in care and patient outcomes. However, although such tools change workflow, rarely is their usability addressed and reported. This article presents a usability study of an Internet-based informatics application called the Comprehensive Diabetes Management Program (CDMP), developed by content experts and technologists. Our aim is to demonstrate a process for conducting a usability study of such a tool and to report results. We conducted the usability test with six diabetes care providers under controlled conditions. Each provider worked with the CDMP in a single session using a "think aloud" process. Providers performed standardized tasks with fictitious patient data, and we observed how they approached these tasks, documenting verbalizations and subjective ratings. The providers then completed a usability questionnaire and interviews. Overall, the scores on the usability questionnaire were neutral to favorable. For specific subdomains of the questionnaire, the providers reported problems with the application's ease of use, performance, and support features, but were satisfied with its visual appeal and content. The results from the observational and interview data indicated areas for improvement, particularly in navigation and terminology. The usability study identified several issues for improvement, confirming the need for usability testing of Internet-based informatics applications, even those developed by experts. To our knowledge, there have been no other usability studies of an Internet-based informatics application with the functionality of the CDMP. Such studies can form the foundation for translation of Internet-based medical informatics tools into clinical practice.

  5. ERM Ideas and Innovations

    ERIC Educational Resources Information Center

    Schmidt, Kari

    2012-01-01

    In this column, the author discusses how the management of e-books has introduced, at many libraries and in varying degrees, the challenges of maintaining effective technical services workflows. Four different e-book workflows are identified and explored, and the author takes a closer look at how particular variables for each are affected, such as…

  6. Managing large-scale workflow execution from resource provisioning to provenance tracking: The CyberShake example

    USGS Publications Warehouse

    Deelman, E.; Callaghan, S.; Field, E.; Francoeur, H.; Graves, R.; Gupta, N.; Gupta, V.; Jordan, T.H.; Kesselman, C.; Maechling, P.; Mehringer, J.; Mehta, G.; Okaya, D.; Vahi, K.; Zhao, L.

    2006-01-01

    This paper discusses the process of building an environment where large-scale, complex, scientific analysis can be scheduled onto a heterogeneous collection of computational and storage resources. The example application is the Southern California Earthquake Center (SCEC) CyberShake project, an analysis designed to compute probabilistic seismic hazard curves for sites in the Los Angeles area. We explain which software tools were used to build the system and describe their functionality and interactions. We show the results of running the CyberShake analysis that included over 250,000 jobs using resources available through SCEC and the TeraGrid. © 2006 IEEE.

  7. Interventional-Cardiovascular MR: Role of the Interventional MR Technologist

    PubMed Central

    Mazal, Jonathan R; Rogers, Toby; Schenke, William H; Faranesh, Anthony Z; Hansen, Michael; O’Brien, Kendall; Ratnayaka, Kanishka; Lederman, Robert J

    2016-01-01

    Background Interventional-cardiovascular magnetic resonance (iCMR) is a promising clinical tool for adults and children who need a comprehensive hemodynamic catheterization of the heart. Magnetic resonance (MR) imaging-guided cardiac catheterization offers radiation-free examination with increased soft tissue contrast and unconstrained imaging planes for catheter guidance. The interventional MR technologist plays an important role in the care of patients undergoing such procedures. It is therefore helpful for technologists to understand the unique iCMR preprocedural preparation, procedural and imaging workflows, and management of emergencies. The authors report their team’s experience from the National Institutes of Health Clinical Center and a collaborating pediatric site. PMID:26721838

  8. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Duro, Francisco Rodrigo; Blas, Javier Garcia; Isaila, Florin

    The increasing volume of scientific data and the limited scalability and performance of storage systems are currently presenting a significant limitation for the productivity of the scientific workflows running on both high-performance computing (HPC) and cloud platforms. Clearly needed is better integration of storage systems and workflow engines to address this problem. This paper presents and evaluates a novel solution that leverages codesign principles for integrating Hercules—an in-memory data store—with a workflow management system. We consider four main aspects: workflow representation, task scheduling, task placement, and task termination. As a result, the experimental evaluation on both cloud and HPC systems demonstrates significant performance and scalability improvements over existing state-of-the-art approaches.
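
    A toy sketch of the codesign idea described above: workflow tasks exchange intermediate data through an in-memory key-value store instead of the parallel file system. This is a single-process illustration of the concept, not the Hercules API (a real deployment distributes the store across nodes).

      # Conceptual sketch: an in-memory key-value store stands in for shared storage,
      # so a producer task hands its intermediate result directly to a consumer task.
      class InMemoryStore:
          def __init__(self):
              self._data = {}

          def put(self, key, value):
              self._data[key] = value

          def get(self, key):
              return self._data[key]

      def producer(store):
          store.put("stage1/partial_sums", [1.0, 2.5, 4.0])

      def consumer(store):
          return sum(store.get("stage1/partial_sums"))

      store = InMemoryStore()
      producer(store)
      print(consumer(store))  # 7.5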

  9. Exploring Dental Providers’ Workflow in an Electronic Dental Record Environment

    PubMed Central

    Schwei, Kelsey M; Cooper, Ryan; Mahnke, Andrea N.; Ye, Zhan

    2016-01-01

    Summary Background A workflow is defined as a predefined set of work steps and partial ordering of these steps in any environment to achieve the expected outcome. Few studies have investigated the workflow of providers in a dental office. It is important to understand the interaction of dental providers with the existing technologies at point of care to assess breakdowns in the workflow, which could contribute to better technology designs. Objective The study objective was to assess electronic dental record (EDR) workflows using time and motion methodology in order to identify breakdowns and opportunities for process improvement. Methods A time and motion methodology was used to study the human-computer interaction and workflow of dental providers with an EDR in four dental centers at a large healthcare organization. A data collection tool was developed to capture the workflow of dental providers and staff while they interacted with an EDR during initial, planned, and emergency patient visits, and at the front desk. Qualitative and quantitative analysis was conducted on the observational data. Results Breakdowns in workflow were identified while posting charges, viewing radiographs, e-prescribing, and interacting with the patient scheduler. EDR interaction time was significantly different between dentists and dental assistants (6:20 min vs. 10:57 min, p = 0.013) and between dentists and dental hygienists (6:20 min vs. 9:36 min, p = 0.003). Conclusions On average, a dentist spent far less time than dental assistants and dental hygienists in data recording within the EDR. PMID:27437058

  10. Providing traceability for neuroimaging analyses.

    PubMed

    McClatchey, Richard; Branson, Andrew; Anjum, Ashiq; Bloodsworth, Peter; Habib, Irfan; Munir, Kamran; Shamdasani, Jetendr; Soomro, Kamran

    2013-09-01

    With the increasingly digital nature of biomedical data and as the complexity of analyses in medical research increases, the need for accurate information capture, traceability and accessibility has become crucial to medical researchers in the pursuance of their research goals. Grid- or Cloud-based technologies, often based on so-called Service Oriented Architectures (SOA), are increasingly being seen as viable solutions for managing distributed data and algorithms in the bio-medical domain. For neuroscientific analyses, especially those centred on complex image analysis, traceability of processes and datasets is essential but up to now this has not been captured in a manner that facilitates collaborative study. Few examples exist of deployed medical systems based on Grids that provide the traceability of research data needed to facilitate complex analyses, and none have been evaluated in practice. Over the past decade, we have been working with mammographers, paediatricians and neuroscientists in three generations of projects to provide the data management and provenance services now required for 21st century medical research. This paper outlines the findings of a requirements study and a resulting system architecture for the production of services to support neuroscientific studies of biomarkers for Alzheimer's disease. The paper proposes a software infrastructure and services that provide the foundation for such support. It introduces the use of the CRISTAL software to provide provenance management as one of a number of services delivered on a SOA, deployed to manage neuroimaging projects that have been studying biomarkers for Alzheimer's disease. In the neuGRID and N4U projects a Provenance Service has been delivered that captures and reconstructs the workflow information needed to facilitate researchers in conducting neuroimaging analyses. The software enables neuroscientists to track the evolution of workflows and datasets. It also tracks the outcomes of various analyses and provides provenance traceability throughout the lifecycle of their studies. As the Provenance Service has been designed to be generic, it can be applied across the medical domain as a reusable tool for supporting medical researchers, thus providing communities of researchers for the first time with the necessary tools to conduct widely distributed collaborative programmes of medical analysis. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
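
    A minimal sketch of the kind of record such a provenance service captures for every workflow step, so analyses can later be reconstructed; the field names and steps are illustrative, not the schema used by CRISTAL or the neuGRID/N4U Provenance Service.

      # Illustrative provenance capture: each executed step records what ran, on which
      # inputs, producing which outputs, and when. Field names are assumptions.
      import json
      from dataclasses import dataclass, field, asdict
      from datetime import datetime, timezone

      @dataclass
      class ProvenanceRecord:
          step: str
          inputs: list
          outputs: list
          timestamp: str = field(
              default_factory=lambda: datetime.now(timezone.utc).isoformat())

      log = []

      def run_step(name, inputs, outputs):
          # ... the actual image analysis would run here ...
          log.append(ProvenanceRecord(name, inputs, outputs))

      run_step("skull_strip", ["subj01_T1.nii"], ["subj01_brain.nii"])
      run_step("segment", ["subj01_brain.nii"], ["subj01_gm.nii", "subj01_wm.nii"])
      print(json.dumps([asdict(r) for r in log], indent=2))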

  11. On the Scene: Developing a Nurse Care Coordinator Role at City of Hope.

    PubMed

    Johnson, Shirley A; Giesie, Pamela D; Ireland, Anne M; Rice, Robert David; Thomson, Brenda K

    2016-01-01

    We describe the development of an oncology solid tumor disease-focused care coordination model. Consistent with our strategic plan to provide patient- and family-centered care and to organize care around disease management teams, we developed the role of nurse care coordinator as an integral team member in our care delivery model. Managing a defined high-risk patient population across the care trajectory, these nurses provide stable points of contact and continuity for patients and families as they navigate the complex treatments and systems required to deliver cancer care. We describe role delineation and staffing models; role clarity between the role of the nurse care coordinator and the case manager; core curriculum development; the use of workflow management tools to support the touch points of the patient and members of the care team; and the incorporation of electronic medical records and data streams to inform the care delivery model. We identify measures that we will use to evaluate the success of our program.

  12. BGDMdocker: a Docker workflow for data mining and visualization of bacterial pan-genomes and biosynthetic gene clusters.

    PubMed

    Cheng, Gong; Lu, Quan; Ma, Ling; Zhang, Guocai; Xu, Liang; Zhou, Zongshan

    2017-01-01

    Recently, Docker technology has received increasing attention throughout the bioinformatics community. However, its implementation has not yet been mastered by most biologists; accordingly, its application in biological research has been limited. In order to popularize this technology in the field of bioinformatics and to promote the use of publicly available bioinformatics tools, such as Dockerfiles and Images from communities, government sources, and private owners in the Docker Hub Registry and other Docker-based resources, we introduce here a complete and accurate bioinformatics workflow based on Docker. The present workflow enables analysis and visualization of pan-genomes and biosynthetic gene clusters of bacteria. This provides a new solution for bioinformatics mining of big data from various publicly available biological databases. The present step-by-step guide creates an integrative workflow through a Dockerfile to allow researchers to build their own Image and run Container easily.

  13. BGDMdocker: a Docker workflow for data mining and visualization of bacterial pan-genomes and biosynthetic gene clusters

    PubMed Central

    Cheng, Gong; Zhang, Guocai; Xu, Liang

    2017-01-01

    Recently, Docker technology has received increasing attention throughout the bioinformatics community. However, its implementation has not yet been mastered by most biologists; accordingly, its application in biological research has been limited. In order to popularize this technology in the field of bioinformatics and to promote the use of publicly available bioinformatics tools, such as Dockerfiles and Images from communities, government sources, and private owners in the Docker Hub Registry and other Docker-based resources, we introduce here a complete and accurate bioinformatics workflow based on Docker. The present workflow enables analysis and visualization of pan-genomes and biosynthetic gene clusters of bacteria. This provides a new solution for bioinformatics mining of big data from various publicly available biological databases. The present step-by-step guide creates an integrative workflow through a Dockerfile to allow researchers to build their own Image and run Container easily. PMID:29204317
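
    A minimal sketch of driving the two Docker steps described above (build an Image from a Dockerfile, then run a Container) from Python; the image tag, data directory and command are placeholders, not the BGDMdocker configuration.

      # Illustrative sketch: build an Image from a Dockerfile in the current
      # directory and run a Container with a mounted data volume. Placeholders only.
      import subprocess

      def build_image(tag, context_dir="."):
          subprocess.run(["docker", "build", "-t", tag, context_dir], check=True)

      def run_container(tag, host_data_dir, command):
          subprocess.run(
              ["docker", "run", "--rm",
               "-v", f"{host_data_dir}:/data",  # mount input/output data
               tag] + command,
              check=True)

      if __name__ == "__main__":
          build_image("pangenome-demo")
          run_container("pangenome-demo", "/tmp/genomes", ["ls", "/data"])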

  14. Project management: importance for diagnostic laboratories.

    PubMed

    Croxatto, A; Greub, G

    2017-07-01

    The need for diagnostic laboratories to improve both quality and productivity alongside personnel shortages incites laboratory managers to constantly optimize laboratory workflows, organization, and technology. These continuous modifications of the laboratories should be conducted using efficient project and change management approaches to maximize the opportunities for successful completion of the project. This review aims to present a general overview of project management with an emphasis on selected critical aspects. Conventional project management tools and models described in the literature, such as HERMES, together with personal experience and educational courses on management, have been used to illustrate this review. This review presents general guidelines of project management and highlights their importance for microbiology diagnostic laboratories. As an example, some critical aspects of project management will be illustrated with an automation project, as experienced at the laboratories of bacteriology and hygiene of the University Hospital of Lausanne. It is important to define clearly beforehand the objective of a project, its perimeter, its costs, and its time frame including precise duration estimates of each step. Then, a project management plan including explanations and descriptions on how to manage, execute, and control the project is necessary to continuously monitor the progression of a project to achieve its defined goals. Moreover, a thorough risk analysis with contingency and mitigation measures should be performed at each phase of a project to minimize the impact of project failures. The increasing complexities of modern laboratories mean clinical microbiologists must use several management tools including project and change management to improve the outcome of major projects and activities. Copyright © 2017 The Author(s). Published by Elsevier Ltd. All rights reserved.

  15. The State of Cloud-Based Biospecimen and Biobank Data Management Tools.

    PubMed

    Paul, Shonali; Gade, Aditi; Mallipeddi, Sumani

    2017-04-01

    Biobanks are critical for collecting and managing high-quality biospecimens from donors with appropriate clinical annotation. High-quality human biospecimens and associated data are required to better understand disease processes. Therefore, biobanks have become an important and essential resource for healthcare research and drug discovery. However, collecting and managing huge volumes of data (biospecimens and associated clinical data) necessitates that biobanks use appropriate data management solutions that can keep pace with the ever-changing requirements of research. To automate biobank data management, biobanks have been investing in traditional Laboratory Information Management Systems (LIMS). However, biobanks face a myriad of challenges in acquiring traditional LIMS. Traditional LIMS are cost-intensive and often lack the flexibility to accommodate changes in data sources and workflows. Cloud technology is emerging as an alternative that provides small and medium-sized biobanks the opportunity to automate their operations in a cost-effective manner, even without IT personnel. Cloud-based solutions offer heightened security, rapid scalability, and dynamic allocation of services, and can facilitate collaboration between different research groups by using a shared environment on a "pay-as-you-go" basis. The benefits offered by cloud technology have resulted in the development of cloud-based data management solutions as an alternative to traditional on-premise software. After evaluating the advantages offered by cloud technology, several biobanks have started adopting cloud-based tools. Cloud-based tools provide biobanks with easy access to biospecimen data for real-time sharing with clinicians. Another major benefit biobanks realize by implementing cloud-based applications is unlimited data storage on the cloud and automatic backups that protect against data loss in the face of natural calamities.

  16. Low Latency Workflow Scheduling and an Application of Hyperspectral Brightness Temperatures

    NASA Astrophysics Data System (ADS)

    Nguyen, P. T.; Chapman, D. R.; Halem, M.

    2012-12-01

    New system analytics for Big Data computing holds the promise of major scientific breakthroughs and discoveries from the exploration and mining of the massive data sets becoming available to the science community. However, such data intensive scientific applications face severe challenges in accessing, managing and analyzing petabytes of data. While the Hadoop MapReduce environment has been successfully applied to data intensive problems arising in business, there are still many scientific problem domains where limitations in the functionality of MapReduce systems prevent its wide adoption by those communities. This is mainly because MapReduce does not readily support the unique science discipline needs such as special science data formats, graphic and computational data analysis tools, maintaining high degrees of computational accuracies, and interfacing with application's existing components across heterogeneous computing processors. We address some of these limitations by exploiting the MapReduce programming model for satellite data intensive scientific problems and address scalability, reliability, scheduling, and data management issues when dealing with climate data records and their complex observational challenges. In addition, we will present techniques to support the unique Earth science discipline needs such as dealing with special science data formats (HDF and NetCDF). We have developed a Hadoop task scheduling algorithm that improves latency by 2x for a scientific workflow including the gridding of the EOS AIRS hyperspectral Brightness Temperatures (BT). This workflow processing algorithm has been tested at the Multicore Computing Center private Hadoop-based Intel Nehalem cluster, as well as in a virtual mode under the Open Source Eucalyptus cloud. The 55 TB AIRS hyperspectral L1b Brightness Temperature record has been gridded at a resolution of 0.5 x 1.0 degrees, and we have computed a 0.9 annual anti-correlation to the El Niño-Southern Oscillation in the Niño 4 region, as well as a 1.9 Kelvin decadal Arctic warming in the 4 μm and 12 μm spectral regions. Additionally, we will present the frequency of extreme global warming events by the use of a normalized maximum BT in a grid cell relative to its local standard deviation. A low-latency Hadoop scheduling environment maintains data integrity and fault tolerance in a MapReduce data intensive Cloud environment while improving the "time to solution" metric by 35% when compared to a more traditional parallel processing system for the same dataset. Our next step will be to improve the usability of our Hadoop task scheduling system, to enable rapid prototyping of data intensive experiments by means of processing "kernels". We will report on the performance and experience of implementing these experiments on the NEX testbed, and propose the use of a graphical directed acyclic graph (DAG) interface to help us develop on-demand scientific experiments. Our workflow system works within Hadoop infrastructure as a replacement for the FIFO or FairScheduler, thus the use of Apache Pig Latin or other Apache tools may also be worth investigating on the NEX system to improve the usability of our workflow scheduling infrastructure for rapid experimentation.
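
    A toy, single-machine sketch of the gridding step (binning brightness temperatures into 0.5 x 1.0 degree latitude/longitude cells and averaging), written in map/reduce style; the actual workflow runs the equivalent computation under Hadoop over the full 55 TB record, and the sample observations below are made up.

      # Toy map/reduce-style gridding: map each (lat, lon, BT) observation to a
      # 0.5 deg x 1.0 deg cell key, then reduce by averaging per cell.
      from collections import defaultdict

      def cell_key(lat, lon):
          return (int(lat // 0.5), int(lon // 1.0))  # 0.5 deg lat x 1.0 deg lon bins

      def grid_brightness_temps(observations):
          sums = defaultdict(lambda: [0.0, 0])
          for lat, lon, bt in observations:          # "map" + shuffle by cell key
              cell = sums[cell_key(lat, lon)]
              cell[0] += bt
              cell[1] += 1
          return {k: s / n for k, (s, n) in sums.items()}  # "reduce": mean BT per cell

      obs = [(70.2, -150.3, 245.1), (70.4, -150.9, 246.7), (10.1, 160.2, 291.3)]
      print(grid_brightness_temps(obs))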

  17. Free web-based modelling platform for managed aquifer recharge (MAR) applications

    NASA Astrophysics Data System (ADS)

    Stefan, Catalin; Junghanns, Ralf; Glaß, Jana; Sallwey, Jana; Fatkhutdinov, Aybulat; Fichtner, Thomas; Barquero, Felix; Moreno, Miguel; Bonilla, José; Kwoyiga, Lydia

    2017-04-01

    Managed aquifer recharge represents a valuable instrument for sustainable water resources management. The concept implies purposeful infiltration of surface water into the subsurface for later recovery or environmental benefits. Over decades, MAR schemes were successfully installed worldwide for a variety of reasons: maximizing the natural storage capacity of aquifers, physical aquifer management, water quality management, and ecological benefits. The INOWAS-DSS platform provides a collection of free web-based tools for planning, management and optimization of the main components of MAR schemes. The tools are grouped into 13 specific applications that cover the most relevant challenges encountered at MAR sites, both from quantitative and qualitative perspectives. The applications include, among others, the optimization of MAR site location, the assessment of saltwater intrusion, the restoration of groundwater levels in overexploited aquifers, the maximization of natural storage capacity of aquifers, the improvement of water quality, the design and operational optimization of MAR schemes, clogging development and risk assessment. The platform contains a collection of about 35 web-based tools of various degrees of complexity, which are either included in application-specific workflows or used as standalone modelling instruments. Among them are simple tools derived from data mining and empirical equations, analytical groundwater-related equations, as well as complex numerical flow and transport models (MODFLOW, MT3DMS and SEAWAT). Up to now, the simulation core of the INOWAS-DSS, which is based on the finite-difference groundwater flow model MODFLOW, is implemented and runs on the web. A scenario analyser helps to easily set up and evaluate new management options as well as future developments such as land use and climate change, and to compare them to previous scenarios. Additionally, simple tools such as analytical equations to assess saltwater intrusion are already running online. Besides the simulation tools, a web-based database is under development where geospatial and time series data can be stored, managed, and processed. Furthermore, a web-based information system containing user guides for the various developed tools and applications as well as basic information on MAR and related topics is published and will be regularly expanded as new tools are implemented. The INOWAS-DSS, including its simulation tools, database and information system, provides an extensive framework to manage, plan and optimize MAR facilities. As the INOWAS-DSS is open-source software accessible via the internet using standard web browsers, it offers new ways for data sharing and collaboration among various partners and decision makers.
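
    One of the simple analytical groundwater tools mentioned above is the assessment of saltwater intrusion; a classic back-of-the-envelope estimate is the Ghyben-Herzberg relation, sketched below for illustration (the INOWAS-DSS tools may implement more elaborate formulations).

      # Ghyben-Herzberg sketch: estimate the depth (below mean sea level) of the
      # freshwater/saltwater interface from the freshwater head above sea level.
      # Textbook approximation shown for illustration; not necessarily the exact
      # formulation used by the INOWAS-DSS saltwater intrusion tools.
      RHO_FRESH = 1000.0  # kg/m^3
      RHO_SALT = 1025.0   # kg/m^3

      def interface_depth(head_above_msl):
          """Depth of the saltwater interface below mean sea level, in metres."""
          return RHO_FRESH / (RHO_SALT - RHO_FRESH) * head_above_msl  # roughly 40 * h

      for h in (0.5, 1.0, 2.0):
          print(f"head {h:.1f} m  ->  interface ~{interface_depth(h):.0f} m below MSL")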

  18. An integrated, open-source set of tools for urban vulnerability monitoring from Earth observation data

    NASA Astrophysics Data System (ADS)

    De Vecchi, Daniele; Harb, Mostapha; Dell'Acqua, Fabio; Aurelio Galeazzo, Daniel

    2015-04-01

    Aim: The paper introduces an integrated set of open-source tools designed to process medium and high-resolution imagery with the aim of extracting vulnerability indicators [1]. Problem: In the context of risk monitoring [2], a series of vulnerability proxies can be defined, such as the extent of a built-up area or building regularity [3]. Different open-source C and Python libraries are already available for image processing and geospatial information (e.g. OrfeoToolbox, OpenCV and GDAL). They include basic processing tools but not vulnerability-oriented workflows. Therefore, it is of significant importance to provide end-users with a set of tools capable of returning information at a higher level. Solution: The proposed set of Python algorithms is a combination of low-level image processing and geospatial information handling tools along with high-level workflows. In particular, two main products are released under the GPL license: developer-oriented source code and a QGIS plugin. These tools were produced within the SENSUM project framework (ended December 2014) where the main focus was on earthquake and landslide risk. Further development and maintenance are guaranteed by the decision to include them in the platform designed within the FP7 RASOR project. Conclusion: Given the lack of a unified software suite for vulnerability indicator extraction, the proposed solution can provide inputs for already available models like the Global Earthquake Model. The inclusion of the proposed set of algorithms within the RASOR platform can guarantee support and enlarge the community of end-users. Keywords: Vulnerability monitoring, remote sensing, optical imagery, open-source software tools References [1] M. Harb, D. De Vecchi, F. Dell'Acqua, "Remote sensing-based vulnerability proxies in the EU FP7 project SENSUM", Symposium on earthquake and landslide risk in Central Asia and Caucasus: exploiting remote sensing and geo-spatial information management, 29-30th January 2014, Bishkek, Kyrgyz Republic. [2] UNISDR, "Living with Risk", Geneva, Switzerland, 2004. [3] P. Bisch, E. Carvalho, H. Degree, P. Fajfar, M. Fardis, P. Franchin, M. Kreslin, A. Pecker, "Eurocode 8: Seismic Design of Buildings", Lisbon, 2011. (SENSUM: www.sensum-project.eu, grant number: 312972) (RASOR: www.rasor-project.eu, grant number: 606888)
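
    As a sketch of one such vulnerability proxy, the snippet below computes a crude building "regularity" index (footprint area divided by convex hull area) with the shapely library; the index definition and example footprints are illustrative, not the exact proxy implemented in the SENSUM tools.

      # Illustrative proxy: regularity = footprint area / convex hull area
      # (1.0 means a perfectly convex footprint). Definition assumed for illustration.
      from shapely.geometry import Polygon

      def regularity_index(footprint):
          return footprint.area / footprint.convex_hull.area

      # An L-shaped footprint is less "regular" than a rectangle.
      l_shape = Polygon([(0, 0), (4, 0), (4, 1), (1, 1), (1, 3), (0, 3)])
      rectangle = Polygon([(0, 0), (4, 0), (4, 3), (0, 3)])
      print(round(regularity_index(l_shape), 2))    # ~0.67
      print(round(regularity_index(rectangle), 2))  # 1.0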

  19. Community-driven computational biology with Debian Linux.

    PubMed

    Möller, Steffen; Krabbenhöft, Hajo Nils; Tille, Andreas; Paleino, David; Williams, Alan; Wolstencroft, Katy; Goble, Carole; Holland, Richard; Belhachemi, Dominique; Plessy, Charles

    2010-12-21

    The Open Source movement and its technologies are popular in the bioinformatics community because they provide freely available tools and resources for research. In order to feed the steady demand for updates on software and associated data, a service infrastructure is required for sharing and providing these tools to heterogeneous computing environments. The Debian Med initiative provides ready and coherent software packages for medical informatics and bioinformatics. These packages can be used together in Taverna workflows via the UseCase plugin to manage execution on local or remote machines. If such packages are available in cloud computing environments, the underlying hardware and the analysis pipelines can be shared along with the software. Debian Med closes the gap between developers and users. It provides a simple method for offering new releases of software and data resources, thus provisioning a local infrastructure for computational biology. For geographically distributed teams it can ensure they are working on the same versions of tools, in the same conditions. This contributes to the world-wide networking of researchers.

  20. Dynamic reusable workflows for ocean science

    USGS Publications Warehouse

    Signell, Richard; Fernandez, Filipe; Wilcox, Kyle

    2016-01-01

    Digital catalogs of ocean data have been available for decades, but advances in standardized services and software for catalog search and data access make it now possible to create catalog-driven workflows that automate — end-to-end — data search, analysis and visualization of data from multiple distributed sources. Further, these workflows may be shared, reused and adapted with ease. Here we describe a workflow developed within the US Integrated Ocean Observing System (IOOS) which automates the skill-assessment of water temperature forecasts from multiple ocean forecast models, allowing improved forecast products to be delivered for an open water swim event. A series of Jupyter Notebooks are used to capture and document the end-to-end workflow using a collection of Python tools that facilitate working with standardized catalog and data services. The workflow first searches a catalog of metadata using the Open Geospatial Consortium (OGC) Catalog Service for the Web (CSW), then accesses data service endpoints found in the metadata records using the OGC Sensor Observation Service (SOS) for in situ sensor data and OPeNDAP services for remotely-sensed and model data. Skill metrics are computed and time series comparisons of forecast model and observed data are displayed interactively, leveraging the capabilities of modern web browsers. The resulting workflow not only solves a challenging specific problem, but highlights the benefits of dynamic, reusable workflows in general. These workflows adapt as new data enters the data system, facilitate reproducible science, provide templates from which new scientific workflows can be developed, and encourage data providers to use standardized services. As applied to the ocean swim event, the workflow exposed problems with two of the ocean forecast products which led to improved regional forecasts once errors were corrected. While the example is specific, the approach is general, and we hope to see increased use of dynamic notebooks across the geoscience domains.
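
    A condensed sketch of the catalog-search step with OWSLib is shown below; the endpoint URL and search text are placeholders, and the published IOOS notebooks contain the full, tested workflow.

      # Hedged sketch: query an OGC CSW catalog for records matching a free-text
      # filter. Endpoint and search term are placeholders.
      from owslib.csw import CatalogueServiceWeb
      from owslib.fes import PropertyIsLike

      CSW_URL = "https://data.ioos.us/csw"  # placeholder endpoint

      csw = CatalogueServiceWeb(CSW_URL, timeout=60)
      query = PropertyIsLike("csw:AnyText", "%sea_water_temperature%")
      csw.getrecords2(constraints=[query], maxrecords=10)

      for rec_id, rec in csw.records.items():
          # data access endpoints (SOS, OPeNDAP, ...) are listed in rec.references
          print(rec.title)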

  1. Digital transformation in home care. A case study.

    PubMed

    Bennis, Sandy; Costanzo, Diane; Flynn, Ann Marie; Reidy, Agatha; Tronni, Catherine

    2007-01-01

    Simply implementing software and technology does not assure that an organization's targeted clinical and financial goals will be realized. No longer is it possible to roll out a new system--by solely providing end user training and overlaying it on top of already inefficient workflows and outdated roles--and know with certainty that targets will be met. At Virtua Health's Home Care, based in south New Jersey, implementation of their electronic system initially followed this more traditional approach. Unable to completely attain their earlier identified return on investment, they enlisted the help of a new role within their health system, that of the nurse informaticist. Knowledgeable in complex clinical processes and not bound by the technology at hand, the informaticist analyzed physical workflow, digital workflow, roles and physical layout. Leveraging specific tools such as change acceleration, workouts and LEAN, the informaticist was able to redesign workflow and support new levels of functionality. This article provides a view from the "finish line", recounting how this role worked with home care to assimilate information delivery into more efficient processes and align resources to support the new workflow, ultimately achieving real tangible returns.

  2. The View from a Few Hundred Feet : A New Transparent and Integrated Workflow for UAV-collected Data

    NASA Astrophysics Data System (ADS)

    Peterson, F. S.; Barbieri, L.; Wyngaard, J.

    2015-12-01

    Unmanned Aerial Vehicles (UAVs) allow scientists and civilians to monitor earth and atmospheric conditions in remote locations. To keep up with the rapid evolution of UAV technology, data workflows must also be flexible, integrated, and introspective. Here, we present our data workflow for a project to assess the feasibility of detecting threshold levels of methane, carbon-dioxide, and other aerosols by mounting consumer-grade gas analysis sensors on UAV's. Particularly, we highlight our use of Project Jupyter, a set of open-source software tools and documentation designed for developing "collaborative narratives" around scientific workflows. By embracing the GitHub-backed, multi-language systems available in Project Jupyter, we enable interaction and exploratory computation while simultaneously embracing distributed version control. Additionally, the transparency of this method builds trust with civilians and decision-makers and leverages collaboration and communication to resolve problems. The goal of this presentation is to provide a generic data workflow for scientific inquiries involving UAVs and to invite the participation of the AGU community in its improvement and curation.

  3. Implementing CORAL: An Electronic Resource Management System

    ERIC Educational Resources Information Center

    Whitfield, Sharon

    2011-01-01

    A 2010 electronic resource management survey conducted by Maria Collins of North Carolina State University and Jill E. Grogg of University of Alabama Libraries found that the top six electronic resources management priorities included workflow management, communications management, license management, statistics management, administrative…

  4. ATAQS: A computational software tool for high throughput transition optimization and validation for selected reaction monitoring mass spectrometry

    PubMed Central

    2011-01-01

    Background Since its inception, proteomics has essentially operated in a discovery mode with the goal of identifying and quantifying the maximal number of proteins in a sample. Increasingly, proteomic measurements are also supporting hypothesis-driven studies, in which a predetermined set of proteins is consistently detected and quantified in multiple samples. Selected reaction monitoring (SRM) is a targeted mass spectrometric technique that supports the detection and quantification of specific proteins in complex samples at high sensitivity and reproducibility. Here, we describe ATAQS, an integrated software platform that supports all stages of targeted, SRM-based proteomics experiments including target selection, transition optimization and post acquisition data analysis. This software will significantly facilitate the use of targeted proteomic techniques and contribute to the generation of highly sensitive, reproducible and complete datasets that are particularly critical for the discovery and validation of targets in hypothesis-driven studies in systems biology. Result We introduce a new open source software pipeline, ATAQS (Automated and Targeted Analysis with Quantitative SRM), which consists of a number of modules that collectively support the SRM assay development workflow for targeted proteomic experiments (project management and generation of protein, peptide and transitions and the validation of peptide detection by SRM). ATAQS provides a flexible pipeline for end-users by allowing the workflow to start or end at any point of the pipeline, and for computational biologists, by enabling the easy extension of java algorithm classes for their own algorithm plug-in or connection via an external web site. This integrated system supports all steps in a SRM-based experiment and provides a user-friendly GUI that can be run by any operating system that allows the installation of the Mozilla Firefox web browser. Conclusions Targeted proteomics via SRM is a powerful new technique that enables the reproducible and accurate identification and quantification of sets of proteins of interest. ATAQS is the first open-source software that supports all steps of the targeted proteomics workflow. ATAQS also provides software API (Application Program Interface) documentation that enables the addition of new algorithms to each of the workflow steps. The software, installation guide and sample dataset can be found in http://tools.proteomecenter.org/ATAQS/ATAQS.html PMID:21414234

  5. A framework for service enterprise workflow simulation with multi-agents cooperation

    NASA Astrophysics Data System (ADS)

    Tan, Wenan; Xu, Wei; Yang, Fujun; Xu, Lida; Jiang, Chuanqun

    2013-11-01

    Dynamic process modelling for service businesses is a key technique for service-oriented information systems and service business management, and the workflow model of business processes is the core part of service systems. Service business workflow simulation is the prevalent approach for analysing service business processes dynamically. The generic method for service business workflow simulation is based on discrete-event queuing theory, which lacks flexibility and scalability. In this paper, we propose a service workflow-oriented framework for the process simulation of service businesses using multi-agent cooperation to address the above issues. The social rationality of agents is introduced into the proposed framework. Adopting rationality as one social factor in decision-making strategies, a flexible scheduling mechanism for activity instances has been implemented. A system prototype has been developed to validate the proposed simulation framework through a business case study.

  6. Improving data collection, documentation, and workflow in a dementia screening study.

    PubMed

    Read, Kevin B; LaPolla, Fred Willie Zametkin; Tolea, Magdalena I; Galvin, James E; Surkis, Alisa

    2017-04-01

    A clinical study team performing three multicultural dementia screening studies identified the need to improve data management practices and facilitate data sharing. A collaboration was initiated with librarians as part of the National Library of Medicine (NLM) informationist supplement program. The librarians identified areas for improvement in the studies' data collection, entry, and processing workflows. The librarians' role in this project was to meet needs expressed by the study team around improving data collection and processing workflows to increase study efficiency and ensure data quality. The librarians addressed the data collection, entry, and processing weaknesses through standardizing and renaming variables, creating an electronic data capture system using REDCap, and developing well-documented, reproducible data processing workflows. NLM informationist supplements provide librarians with valuable experience in collaborating with study teams to address their data needs. For this project, the librarians gained skills in project management, REDCap, and understanding of the challenges and specifics of a clinical research study. However, the time and effort required to provide targeted and intensive support for one study team was not scalable to the library's broader user community.
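
    A small sketch of the variable-standardization step is shown below, using pandas; the column names and rename mapping are hypothetical examples, not the study's actual data dictionary.

      # Sketch: rename ad hoc variable names to a documented data dictionary and
      # enforce one naming convention before import into REDCap. Names are made up.
      import pandas as pd

      RENAME_MAP = {
          "PtAge": "age_years",
          "MMSE_tot": "mmse_total",
          "EduYrs": "education_years",
      }

      def standardize(raw):
          df = raw.rename(columns=RENAME_MAP)
          df.columns = [c.lower() for c in df.columns]
          return df

      raw = pd.DataFrame({"PtAge": [74, 81], "MMSE_tot": [27, 22], "EduYrs": [12, 16]})
      print(standardize(raw))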

  7. Scalable and cost-effective NGS genotyping in the cloud.

    PubMed

    Souilmi, Yassine; Lancaster, Alex K; Jung, Jae-Yoon; Rizzo, Ettore; Hawkins, Jared B; Powles, Ryan; Amzazi, Saaïd; Ghazal, Hassan; Tonellato, Peter J; Wall, Dennis P

    2015-10-15

    While next-generation sequencing (NGS) costs have plummeted in recent years, cost and complexity of computation remain substantial barriers to the use of NGS in routine clinical care. The clinical potential of NGS will not be realized until robust and routine whole genome sequencing data can be accurately rendered to medically actionable reports within a time window of hours and at economies of scale in the tens of dollars. We take a step towards addressing this challenge by using COSMOS, a cloud-enabled workflow management system, to develop GenomeKey, an NGS whole genome analysis workflow. COSMOS implements complex workflows, making optimal use of high-performance compute clusters. Here we show that the Amazon Web Services (AWS) implementation of GenomeKey via COSMOS provides a fast, scalable, and cost-effective analysis of both public benchmarking and large-scale heterogeneous clinical NGS datasets. Our systematic benchmarking reveals important new insights and considerations for achieving clinical turn-around of whole genome analysis, including workflow optimization, strategic batching of individual genomes, and efficient cluster resource configuration.

  8. Making Sense of Complexity with FRE, a Scientific Workflow System for Climate Modeling (Invited)

    NASA Astrophysics Data System (ADS)

    Langenhorst, A. R.; Balaji, V.; Yakovlev, A.

    2010-12-01

    A workflow is a description of a sequence of activities that is both precise and comprehensive. Capturing the workflow of climate experiments provides a record which can be queried or compared, and allows reproducibility of the experiments - sometimes even to the bit level of the model output. This reproducibility helps to verify the integrity of the output data, and enables easy perturbation experiments. GFDL's Flexible Modeling System Runtime Environment (FRE) is a production-level software project which defines and implements building blocks of the workflow as command line tools. The scientific, numerical and technical input needed to complete the workflow of an experiment is recorded in an experiment description file in XML format. Several key features add convenience and automation to the FRE workflow: ● Experiment inheritance makes it possible to define a new experiment with only a reference to the parent experiment and the parameters to override. ● Testing is a basic element of the FRE workflow: experiments define short test runs which are verified before the main experiment is run, and a set of standard experiments are verified with new code releases. ● FRE is flexible enough to support everything from short runs with mere megabytes of data to high-resolution experiments that run on thousands of processors for months, producing terabytes of output data. Experiments run in segments of model time; after each segment, the state is saved and the model can be checkpointed at that level. Segment length is defined by the user, but the number of segments per system job is calculated to fit optimally within the batch scheduler requirements. FRE provides job control across multiple segments, and tools to monitor and alter the state of long-running experiments. ● Experiments are entered into a Curator Database, which stores query-able metadata about the experiment and the experiment's output. ● FRE includes a set of standardized post-processing functions as well as the ability to incorporate user-level functions. FRE post-processing can take us all the way to preparing graphical output for a scientific audience and publishing data on a public portal. ● Recent FRE development includes incorporating a distributed workflow to support remote computing.
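
    A toy sketch of the experiment-inheritance idea follows: a child experiment names its parent and overrides only selected parameters. The XML layout is invented for illustration and is not FRE's actual experiment schema.

      # Toy illustration of experiment inheritance; the XML schema here is invented.
      import xml.etree.ElementTree as ET

      XML = """
      <experiments>
        <experiment name="CM_control">
          <param name="ocean_timestep">3600</param>
          <param name="segment_length">1 year</param>
        </experiment>
        <experiment name="CM_high_freq" inherits="CM_control">
          <param name="ocean_timestep">1800</param>
        </experiment>
      </experiments>
      """

      def resolve(name, root):
          """Return an experiment's parameters, applying inherited defaults first."""
          exp = root.find(f"./experiment[@name='{name}']")
          params = {}
          if exp.get("inherits"):
              params.update(resolve(exp.get("inherits"), root))
          params.update({p.get("name"): p.text for p in exp.findall("param")})
          return params

      root = ET.fromstring(XML)
      print(resolve("CM_high_freq", root))
      # {'ocean_timestep': '1800', 'segment_length': '1 year'}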

  9. PGen: large-scale genomic variations analysis workflow and browser in SoyKB.

    PubMed

    Liu, Yang; Khan, Saad M; Wang, Juexin; Rynge, Mats; Zhang, Yuanxun; Zeng, Shuai; Chen, Shiyuan; Maldonado Dos Santos, Joao V; Valliyodan, Babu; Calyam, Prasad P; Merchant, Nirav; Nguyen, Henry T; Xu, Dong; Joshi, Trupti

    2016-10-06

    With the advances in next-generation sequencing (NGS) technology and significant reductions in sequencing costs, it is now possible to sequence large collections of germplasm in crops for detecting genome-scale genetic variations and to apply the knowledge towards improvements in traits. To efficiently facilitate large-scale NGS resequencing data analysis of genomic variations, we have developed "PGen", an integrated and optimized workflow using the Extreme Science and Engineering Discovery Environment (XSEDE) high-performance computing (HPC) virtual system, iPlant cloud data storage resources and Pegasus workflow management system (Pegasus-WMS). The workflow allows users to identify single nucleotide polymorphisms (SNPs) and insertion-deletions (indels), perform SNP annotations and conduct copy number variation analyses on multiple resequencing datasets in a user-friendly and seamless way. We have developed both a Linux version in GitHub ( https://github.com/pegasus-isi/PGen-GenomicVariations-Workflow ) and a web-based implementation of the PGen workflow integrated within the Soybean Knowledge Base (SoyKB), ( http://soykb.org/Pegasus/index.php ). Using PGen, we identified 10,218,140 single-nucleotide polymorphisms (SNPs) and 1,398,982 indels from analysis of 106 soybean lines sequenced at 15X coverage. 297,245 non-synonymous SNPs and 3330 copy number variation (CNV) regions were identified from this analysis. SNPs identified using PGen from additional soybean resequencing projects adding to 500+ soybean germplasm lines in total have been integrated. These SNPs are being utilized for trait improvement using genotype to phenotype prediction approaches developed in-house. In order to browse and access NGS data easily, we have also developed an NGS resequencing data browser ( http://soykb.org/NGS_Resequence/NGS_index.php ) within SoyKB to provide easy access to SNP and downstream analysis results for soybean researchers. PGen workflow has been optimized for the most efficient analysis of soybean data using thorough testing and validation. This research serves as an example of best practices for development of genomics data analysis workflows by integrating remote HPC resources and efficient data management with ease of use for biological users. PGen workflow can also be easily customized for analysis of data in other species.

  10. Updates in metabolomics tools and resources: 2014-2015.

    PubMed

    Misra, Biswapriya B; van der Hooft, Justin J J

    2016-01-01

    Data processing and interpretation represent the most challenging and time-consuming steps in high-throughput metabolomic experiments, regardless of the analytical platforms (MS or NMR spectroscopy based) used for data acquisition. Improved machinery in metabolomics generates increasingly complex datasets that create the need for more and better processing and analysis software and in silico approaches to understand the resulting data. However, a comprehensive source of information describing the utility of the most recently developed and released metabolomics resources--in the form of tools, software, and databases--is currently lacking. Thus, here we provide an overview of freely available and open-source tools, algorithms, and frameworks to make both upcoming and established metabolomics researchers aware of recent developments, in an attempt to advance and facilitate data processing workflows in their metabolomics research. The major topics include tools and resources for data processing, data annotation, and data visualization in MS- and NMR-based metabolomics. Most of the tools described in this review are dedicated to untargeted metabolomics workflows; however, some more specialist tools are described as well. All tools and resources described, including their analytical and computational platform dependencies, are summarized in an overview table. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  11. Talkoot Portals: Discover, Tag, Share, and Reuse Collaborative Science Workflows

    NASA Astrophysics Data System (ADS)

    Wilson, B. D.; Ramachandran, R.; Lynnes, C.

    2009-05-01

    A small but growing number of scientists are beginning to harness Web 2.0 technologies, such as wikis, blogs, and social tagging, as a transformative way of doing science. These technologies provide researchers easy mechanisms to critique, suggest and share ideas, data and algorithms. At the same time, large suites of algorithms for science analysis are being made available as remotely-invokable Web Services, which can be chained together to create analysis workflows. This provides the research community an unprecedented opportunity to collaborate by sharing their workflows with one another, reproducing and analyzing research results, and leveraging colleagues' expertise to expedite the process of scientific discovery. However, wikis and similar technologies are limited to text, static images and hyperlinks, providing little support for collaborative data analysis. A team of information technology and Earth science researchers from multiple institutions have come together to improve community collaboration in science analysis by developing a customizable "software appliance" to build collaborative portals for Earth Science services and analysis workflows. The critical requirement is that researchers (not just information technologists) be able to build collaborative sites around service workflows within a few hours. We envision online communities coming together, much like Finnish "talkoot" (a barn raising), to build a shared research space. Talkoot extends a freely available, open source content management framework with a series of modules specific to Earth Science for registering, creating, managing, discovering, tagging and sharing Earth Science web services and workflows for science data processing, analysis and visualization. Users will be able to author a "science story" in shareable web notebooks, including plots or animations, backed up by an executable workflow that directly reproduces the science analysis. New services and workflows of interest will be discoverable using tag search, and advertised using "service casts" and "interest casts" (Atom feeds). Multiple science workflow systems will be plugged into the system, with initial support for UAH's Mining Workflow Composer and the open-source Active BPEL engine, and JPL's SciFlo engine and the VizFlow visual programming interface. With the ability to share and execute analysis workflows, Talkoot portals can be used to do collaborative science in addition to communicate ideas and results. It will be useful for different science domains, mission teams, research projects and organizations. Thus, it will help to solve the "sociological" problem of bringing together disparate groups of researchers, and the technical problem of advertising, discovering, developing, documenting, and maintaining inter-agency science workflows. The presentation will discuss the goals of and barriers to Science 2.0, the social web technologies employed in the Talkoot software appliance (e.g. CMS, social tagging, personal presence, advertising by feeds, etc.), illustrate the resulting collaborative capabilities, and show early prototypes of the web interfaces (e.g. embedded workflows).

  12. Reducing Missed Laboratory Results: Defining Temporal Responsibility, Generating User Interfaces for Test Process Tracking, and Retrospective Analyses to Identify Problems

    PubMed Central

    Tarkan, Sureyya; Plaisant, Catherine; Shneiderman, Ben; Hettinger, A. Zachary

    2011-01-01

    Researchers have conducted numerous case studies reporting the details on how laboratory test results of patients were missed by the ordering medical providers. Given the importance of timely test results in an outpatient setting, there is limited discussion of electronic versions of test result management tools to help clinicians and medical staff with this complex process. This paper presents three ideas to reduce missed results with a system that facilitates tracking laboratory tests from order to completion as well as during follow-up: (1) define a workflow management model that clarifies responsible agents and associated time frame, (2) generate a user interface for tracking that could eventually be integrated into current electronic health record (EHR) systems, (3) help identify common problems in past orders through retrospective analyses. PMID:22195201
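
    A minimal sketch of idea (1) follows: each step in the test process carries a responsible agent and a due date, so overdue steps can be surfaced to the right person. Field names and time frames are illustrative, not a validated clinical model.

      # Sketch: track responsibility and time frame per test-process step and flag
      # overdue steps. Illustrative fields only.
      from dataclasses import dataclass
      from datetime import date

      @dataclass
      class TestStep:
          order_id: str
          step: str         # e.g. "result received", "result reviewed"
          responsible: str  # e.g. "lab", "ordering provider", "front desk"
          due: date
          completed: bool = False

      def overdue(steps, today):
          return [s for s in steps if not s.completed and s.due < today]

      steps = [
          TestStep("A123", "result received", "lab", date(2011, 6, 1), completed=True),
          TestStep("A123", "result reviewed", "ordering provider", date(2011, 6, 3)),
          TestStep("A123", "patient notified", "front desk", date(2011, 6, 10)),
      ]
      for s in overdue(steps, today=date(2011, 6, 5)):
          print(f"OVERDUE: {s.step} (responsible: {s.responsible}, due {s.due})")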

  13. Productivity, part 1: getting things done, using e-mail, scanners, reference managers, note-taking applications, and text expanders.

    PubMed

    Lackey, Amanda E; Moshiri, Mariam; Pandey, Tarun; Lall, Chandana; Lalwani, Neeraj; Bhargava, Puneet

    2014-05-01

    In an era of declining reimbursements and a tightening job market, today's radiologists are forced to "make do with less." With the rollout of the Patient Protection and Affordable Care Act, commonly called "Obamacare," radiologists will be expected not only to interpret studies but also to take on many additional roles, adding a new layer of complexity to already demanding daily duties. These changes make it more important than ever to develop a personal workflow management system incorporating some of the most potent productivity tools. In this article, the authors discuss current productivity techniques and related software with the most potential to help radiologists keep up with the ever-increasing demands on their time in the workplace and lead more balanced lives. Published by Elsevier Inc.

  14. Time management in acute vertebrobasilar occlusion.

    PubMed

    Kamper, Lars; Rybacki, Konrad; Mansour, Michael; Winkler, Sven B; Kempkes, Udo; Haage, Patrick

    2009-03-01

    Acute vertebrobasilar occlusion (VBO) is associated with a high risk of stroke and death. Although local thrombolysis may achieve recanalization and improve outcome, mortality is still between 35% and 75%. Without recanalization, however, the chance of a good outcome is extremely poor, with mortality rates of 80-90%. Early treatment is a fundamental factor, but detailed studies of the exact time management of the diagnostic and interventional workflow are still lacking. Data on 18 patients were retrospectively evaluated. Time periods between symptom onset, admission to hospital, time of diagnosis, and beginning of intervention were correlated with postinterventional neurological status. The Glasgow Coma Scale and National Institutes of Health Stroke Scale (NIHSS) were used to examine patients before and after local thrombolysis. Additionally, multivariate statistics were applied to reveal similarities between patients with neurological improvement. Primary recanalization was achieved in 77% of patients. The overall mortality was 55%. Major complications were intracranial hemorrhage and peripheral embolism. The time period from symptom onset to intervention showed a strong correlation with the postinterventional NIHSS, as did patient age, with the best results within a 4-h interval. Multivariate statistics revealed similarities among the patients. Evaluation of time management in acute VBO by multivariate statistics is a helpful tool for defining similarities in this patient group. As with the door-to-balloon time for acute coronary interventions, the chances of a good outcome depend on a short interval between symptom onset and intervention. Since the only time period that can be influenced begins with hospital admission, our results emphasize the necessity of an efficient intrahospital workflow.
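
    A toy illustration of the kind of correlation analysis reported above, relating onset-to-intervention time to postinterventional NIHSS. The data are entirely synthetic (not the study's patient data) and the Spearman test is one plausible choice, not necessarily the one used by the authors.

    ```python
    # Entirely synthetic data illustrating the reported association between
    # onset-to-intervention time and postinterventional NIHSS; not study data.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)
    onset_to_intervention_h = rng.uniform(1.0, 12.0, size=18)   # 18 patients, as in the study
    post_nihss = 5.0 + 1.5 * onset_to_intervention_h + rng.normal(0.0, 3.0, size=18)

    rho, p = stats.spearmanr(onset_to_intervention_h, post_nihss)
    print(f"Spearman rho = {rho:.2f}, p = {p:.3f}")
    ```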

  15. Geospatial Data Processing for 3d City Model Generation, Management and Visualization

    NASA Astrophysics Data System (ADS)

    Toschi, I.; Nocerino, E.; Remondino, F.; Revolti, A.; Soria, G.; Piffer, S.

    2017-05-01

    Recent developments of 3D technologies and tools have increased the availability and relevance of 3D data (from 3D points to complete city models) in the geospatial and geo-information domains. Nevertheless, the potential of 3D data is still underexploited and mainly confined to visualization purposes. Therefore, the major challenge today is to create automatic procedures that make the best use of available technologies and data to meet the needs of public administrations (PA) and national mapping agencies (NMA) involved in "smart city" applications. The paper aims to demonstrate a step forward in this process by presenting the results of the SENECA project (Smart and SustaiNablE City from Above - http://seneca.fbk.eu). State-of-the-art processing solutions are investigated in order to (i) efficiently exploit the photogrammetric workflow (aerial triangulation and dense image matching), (ii) derive topologically and geometrically accurate 3D geo-objects (i.e. building models) at various levels of detail and (iii) link geometries with non-spatial information within a 3D geo-database management system accessible via a web-based client. The developed methodology is tested on two case studies, i.e. the cities of Trento (Italy) and Graz (Austria). Both spatial (i.e. nadir and oblique imagery) and non-spatial (i.e. cadastral information and building energy consumption) data are collected and used as input for the project workflow, from 3D geometry capture and modelling in urban scenarios to geometry enrichment and management within a dedicated webGIS platform.
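
    The sketch below illustrates, in highly simplified form, the final linking step: joining building geometries with non-spatial attributes in a geo-database. SQLite and WKT strings stand in for the project's dedicated 3D geo-database and webGIS platform, and all table names and values are invented.

    ```python
    # Simplified sketch of linking building geometry with non-spatial attributes.
    # SQLite and WKT strings stand in for the project's 3D geo-database; all
    # names and values are invented for illustration.
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE building (
            id INTEGER PRIMARY KEY,
            lod INTEGER,                 -- level of detail of the 3D model
            footprint_wkt TEXT           -- geometry, e.g. from dense image matching
        );
        CREATE TABLE building_attributes (
            building_id INTEGER REFERENCES building(id),
            cadastral_parcel TEXT,
            energy_kwh_per_year REAL
        );
    """)
    conn.execute("INSERT INTO building VALUES (1, 2, 'POLYGON((0 0, 10 0, 10 8, 0 8, 0 0))')")
    conn.execute("INSERT INTO building_attributes VALUES (1, 'parcel-0042', 14500.0)")

    # A webGIS client would issue a join like this to enrich geometry with attributes.
    row = conn.execute("""
        SELECT b.id, b.lod, a.cadastral_parcel, a.energy_kwh_per_year
        FROM building b JOIN building_attributes a ON a.building_id = b.id
    """).fetchone()
    print(row)
    ```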

  16. Potential of knowledge discovery using workflows implemented in the C3Grid

    NASA Astrophysics Data System (ADS)

    Engel, Thomas; Fink, Andreas; Ulbrich, Uwe; Schartner, Thomas; Dobler, Andreas; Fritzsch, Bernadette; Hiller, Wolfgang; Bräuer, Benny

    2013-04-01

    With the increasing number of climate simulations, reanalyses and observations, new infrastructures to search and analyse distributed data are necessary. In recent years, the Grid architecture has become an important technology to fulfill these demands. For the German project "Collaborative Climate Community Data and Processing Grid" (C3Grid), computer scientists and meteorologists developed a system that offers its users a web interface to search and download climate data and to apply implemented analysis tools (called workflows) for further investigation. In this contribution, two workflows implemented in the C3Grid architecture are presented: the Cyclone Tracking (CT) and Stormtrack workflows. They serve as an example of how numerous investigations of midlatitude winter storms can be performed on large amounts of analysis and climate model data without insight into the data source or program code, and with only a low-to-moderate understanding of the theoretical background. CT is based on the work of Murray and Simmonds (1991) and identifies and tracks local minima in the mean sea level pressure (MSLP) field of the selected dataset. Adjustable thresholds for the curvature of the isobars as well as the minimum lifetime of a cyclone allow the distinction between weak subtropical heat lows and stronger midlatitude cyclones, e.g. in the North Atlantic. The user receives the resulting track data, including statistics on track density, average central pressure, average central curvature, cyclogenesis and cyclolysis, as well as pre-built visualizations of these results. Stormtrack calculates the 2.5-6 day bandpass-filtered standard deviation of the geopotential height on a selected pressure level. Although this workflow needs much less computational effort than CT, it shows structures that are in good agreement with the track density of the CT workflow. To what extent changes in the mid-level tropospheric storm track are reflected in the density and intensity of surface cyclones can thus be investigated by combining both workflows. A specific feature of C3Grid is the flexible Workflow Scheduling Service (WSS), which also allows automated nightly analysis runs of CT, Stormtrack, etc. with different input parameter sets. The statistical results of these workflows can be accumulated afterwards by a scheduled final analysis step, thereby providing a tool for data-intensive analytics on the massive amounts of climate model data accessible through C3Grid. First tests with these automated analysis workflows show promising results for speeding up the investigation of high-volume model data. This example is relevant to the thorough analysis of future changes in storminess in Europe and is just one example of the potential of knowledge discovery using automated workflows implemented in the C3Grid architecture.
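
    As a concrete illustration of the Stormtrack diagnostic, the sketch below computes the 2.5-6 day bandpass-filtered standard deviation of a synthetic daily 500 hPa geopotential height field. The Butterworth filter and the random test data are assumptions made for illustration; the C3Grid implementation may differ in detail.

    ```python
    # Illustrative sketch of the Stormtrack idea: the 2.5-6 day bandpass-filtered
    # standard deviation of geopotential height. Synthetic daily data and a simple
    # Butterworth filter are assumptions; the C3Grid implementation may differ.
    import numpy as np
    from scipy.signal import butter, filtfilt

    rng = np.random.default_rng(0)
    ndays, nlat, nlon = 360, 20, 40
    z500 = rng.normal(5500.0, 80.0, size=(ndays, nlat, nlon))   # synthetic 500 hPa height (gpm)

    # Daily sampling (fs = 1/day): keep variability with periods of 2.5-6 days.
    b, a = butter(4, [1.0 / 6.0, 1.0 / 2.5], btype="band", fs=1.0)
    filtered = filtfilt(b, a, z500, axis=0)

    # The storm-track measure: temporal standard deviation of the filtered field.
    storm_track = filtered.std(axis=0)          # shape (nlat, nlon)
    print(storm_track.shape, float(storm_track.mean()))
    ```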

  17. An open source software for analysis of dynamic contrast enhanced magnetic resonance images: UMMPerfusion revisited.

    PubMed

    Zöllner, Frank G; Daab, Markus; Sourbron, Steven P; Schad, Lothar R; Schoenberg, Stefan O; Weisser, Gerald

    2016-01-14

    Perfusion imaging has become an important image-based tool to derive physiological information in various applications, like tumor diagnostics and therapy, stroke, (cardio-)vascular diseases, or functional assessment of organs. However, even after 20 years of intense research in this field, perfusion imaging still remains a research tool without broad clinical usage. One problem is the lack of standardization in the technical aspects that have to be considered for successful quantitative evaluation; the second problem is a lack of tools that allow direct integration into the diagnostic workflow in radiology. Five compartment models, namely a one-compartment model (1CP), a two-compartment exchange model (2CXM), a two-compartment uptake model (2CUM), a two-compartment filtration model (2FM) and finally the extended Tofts model (ETM), were implemented as a plugin for the DICOM workstation OsiriX. Moreover, the plugin has a clean graphical user interface and provides means for quality management during perfusion data analysis. Based on reference test data, the implementation was validated against a reference implementation. No differences were found in the calculated parameters. We developed open-source software to analyse DCE-MRI perfusion data. The software is designed as a plugin for the DICOM workstation OsiriX. It features a clean GUI and provides a simple workflow for data analysis, while it can also be seen as a toolbox providing implementations of several recent compartment models to be applied in research tasks. Integration into the infrastructure of a radiology department is given via OsiriX. Results can be saved automatically, and reports generated during data analysis ensure quality control.
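
    The extended Tofts model mentioned above describes the tissue concentration as C_t(t) = v_p C_a(t) + Ktrans * integral from 0 to t of C_a(tau) exp(-(Ktrans/v_e)(t - tau)) dtau. The sketch below evaluates this numerically for a toy arterial input function; the parameter values and AIF shape are illustrative assumptions, not those used by UMMPerfusion.

    ```python
    # Minimal numerical sketch of the extended Tofts model (ETM) for DCE-MRI:
    # C_t(t) = v_p*C_a(t) + Ktrans * integral of C_a(tau)*exp(-(Ktrans/v_e)*(t-tau)) dtau.
    # Parameter values and the simple AIF below are illustrative assumptions only.
    import numpy as np

    def extended_tofts(t, aif, ktrans, ve, vp):
        """Tissue concentration from an arterial input function via discrete convolution."""
        dt = t[1] - t[0]
        kernel = np.exp(-(ktrans / ve) * t)             # impulse response of the EES
        conv = np.convolve(aif, kernel)[: len(t)] * dt  # discrete approximation of the integral
        return vp * aif + ktrans * conv

    t = np.arange(0, 300, 1.0)                          # seconds
    aif = 5.0 * (t / 60.0) * np.exp(-t / 60.0)          # toy arterial input function (mM)
    ct = extended_tofts(t, aif, ktrans=0.25 / 60, ve=0.3, vp=0.05)   # Ktrans in 1/s
    print(float(ct.max()))
    ```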

  18. VDJServer: A Cloud-Based Analysis Portal and Data Commons for Immune Repertoire Sequences and Rearrangements.

    PubMed

    Christley, Scott; Scarborough, Walter; Salinas, Eddie; Rounds, William H; Toby, Inimary T; Fonner, John M; Levin, Mikhail K; Kim, Min; Mock, Stephen A; Jordan, Christopher; Ostmeyer, Jared; Buntzman, Adam; Rubelt, Florian; Davila, Marco L; Monson, Nancy L; Scheuermann, Richard H; Cowell, Lindsay G

    2018-01-01

    Recent technological advances in immune repertoire sequencing have created tremendous potential for advancing our understanding of adaptive immune response dynamics in various states of health and disease. Immune repertoire sequencing produces large, highly complex data sets, however, which require specialized methods and software tools for their effective analysis and interpretation. VDJServer is a cloud-based analysis portal for immune repertoire sequence data that provides access to a suite of tools for a complete analysis workflow, including modules for preprocessing and quality control of sequence reads, V(D)J gene segment assignment, repertoire characterization, and repertoire comparison. VDJServer also provides sophisticated visualizations for exploratory analysis. It is accessible through a standard web browser via a graphical user interface designed for use by immunologists, clinicians, and bioinformatics researchers. VDJServer provides a data commons for public sharing of repertoire sequencing data, as well as private sharing of data between users. We describe the main functionality and architecture of VDJServer and demonstrate its capabilities with use cases from cancer immunology and autoimmunity. VDJServer provides a complete analysis suite for human and mouse T-cell and B-cell receptor repertoire sequencing data. The combination of its user-friendly interface and high-performance computing allows large immune repertoire sequencing projects to be analyzed with no programming or software installation required. VDJServer is a web-accessible cloud platform that provides access through a graphical user interface to a data management infrastructure, a collection of analysis tools covering all steps in an analysis, and an infrastructure for sharing data along with workflows, results, and computational provenance. VDJServer is a free, publicly available, and open-source licensed resource.
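
    A conceptual sketch of the staged analysis workflow described above (preprocessing and quality control, V(D)J assignment, repertoire characterization) with a simple provenance record attached to each stage. The stage functions and record format are invented for illustration and are not VDJServer's actual API.

    ```python
    # Conceptual sketch of a staged repertoire-analysis workflow with provenance
    # tracking; stage functions and record format are invented, not VDJServer's API.
    from datetime import datetime, timezone

    def preprocess(reads):          # quality control / filtering placeholder
        return [r for r in reads if len(r) > 10]

    def assign_vdj(reads):          # V(D)J gene segment assignment placeholder
        return [{"sequence": r, "v_call": "IGHV?", "j_call": "IGHJ?"} for r in reads]

    def characterize(rearrangements):
        return {"n_rearrangements": len(rearrangements)}

    def run_workflow(reads):
        provenance = []
        data = reads
        for stage in (preprocess, assign_vdj, characterize):
            data = stage(data)
            provenance.append({"stage": stage.__name__,
                               "timestamp": datetime.now(timezone.utc).isoformat()})
        return data, provenance

    summary, prov = run_workflow(["ACGT" * 5, "ACG"])
    print(summary, [p["stage"] for p in prov])
    ```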

  19. Translating research into practice through user-centered design: An application for osteoarthritis healthcare planning.

    PubMed

    Carr, Eloise Cj; Babione, Julie N; Marshall, Deborah

    2017-08-01

    To identify the needs and requirements of end users in order to inform the development of a user interface that translates an existing evidence-based decision support tool into a practical and usable interface for health service planning for osteoarthritis (OA) care. We used a user-centered design (UCD) approach that emphasizes the role of end users and is well suited to knowledge translation (KT). The first phase used a needs assessment focus group (n=8) and interviews (n=5) with target users (health care planners) within a provincial health care organization. The second phase used a participatory design approach, with two small group sessions (n=6) to explore the workflow, thought processes, and needs of intended users. The needs assessment identified five design recommendations: ensuring the user interface supports the target user group, allowing for user-directed data exploration, providing input parameter flexibility, presenting information clearly, and providing relevant definitions. The second phase identified workflow insights from a proposed scenario. Graphs, the need for a visual overview of the data, and interactivity were key considerations for meaningful use of the model and for knowledge translation. A UCD approach is well suited to identifying health care planners' requirements when using a decision support tool to improve health service planning and management of OA. We believe this is one of the first applications to be used in planning for health service delivery. We identified specific design recommendations that will increase user acceptability and uptake of the user interface and underlying decision support tool in practice. Our approach demonstrated how UCD can be used to enable knowledge translation. Copyright © 2017 Elsevier B.V. All rights reserved.

  20. BioInfra.Prot: A comprehensive proteomics workflow including data standardization, protein inference, expression analysis and data publication.

    PubMed

    Turewicz, Michael; Kohl, Michael; Ahrens, Maike; Mayer, Gerhard; Uszkoreit, Julian; Naboulsi, Wael; Bracht, Thilo; Megger, Dominik A; Sitek, Barbara; Marcus, Katrin; Eisenacher, Martin

    2017-11-10

    The analysis of high-throughput mass spectrometry-based proteomics data must address the specific challenges of this technology. To this end, the comprehensive proteomics workflow offered by the de.NBI service center BioInfra.Prot provides indispensable components for the computational and statistical analysis of this kind of data. These components include tools and methods for spectrum identification and protein inference, protein quantification, expression analysis, as well as data standardization and data publication. All methods in the workflow that address these tasks are state-of-the-art or cutting-edge. As has been shown in previous publications, each of these methods is adequate to solve its specific task and gives competitive results. However, the methods included in the workflow are continuously reviewed, updated and improved to adapt to new scientific developments. All of these components and methods are available as stand-alone BioInfra.Prot services or as a complete workflow. Since BioInfra.Prot provides multiple fast communication channels for access to all components of the workflow (e.g., via the BioInfra.Prot ticket system: bioinfraprot@rub.de), users can easily benefit from this service and get support from experts. Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.
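
    A hedged sketch of the expression-analysis step only: comparing log2 protein intensities between two sample groups with per-protein t-tests. The data are synthetic and the plain p-value cutoff (no multiple-testing correction) is a simplification; this is not BioInfra.Prot's actual pipeline.

    ```python
    # Hedged sketch of a differential expression test on synthetic protein
    # intensities; not BioInfra.Prot's actual pipeline.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    group_a = rng.normal(20.0, 1.0, size=(50, 6))    # 50 proteins x 6 samples (log2 intensities)
    group_b = group_a + rng.normal(0.0, 1.0, size=(50, 6))
    group_b[:5] += 2.0                               # simulate 5 up-regulated proteins

    t, p = stats.ttest_ind(group_a, group_b, axis=1)
    log2_fc = group_b.mean(axis=1) - group_a.mean(axis=1)
    # In practice a multiple-testing correction (e.g. FDR) would be applied here.
    significant = np.flatnonzero(p < 0.05)
    print(f"{significant.size} proteins with p < 0.05; max |log2 FC| = {np.abs(log2_fc).max():.2f}")
    ```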
