scientific workflow system: Topics by Science.gov

Sample records for scientific workflow system

Agile parallel bioinformatics workflow management using Pwrake.

PubMed

Mishima, Hiroyuki; Sasaki, Kensaku; Tanaka, Masahiro; Tatebe, Osamu; Yoshiura, Koh-Ichiro

2011-09-08

In bioinformatics projects, scientific workflow systems are widely used to manage computational procedures. Full-featured workflow systems have been proposed to fulfil the demand for workflow management. However, such systems tend to be over-weighted for actual bioinformatics practices. We realize that quick deployment of cutting-edge software implementing advanced algorithms and data formats, and continuous adaptation to changes in computational resources and the environment are often prioritized in scientific workflow management. These features have a greater affinity with the agile software development method through iterative development phases after trial and error.Here, we show the application of a scientific workflow system Pwrake to bioinformatics workflows. Pwrake is a parallel workflow extension of Ruby's standard build tool Rake, the flexibility of which has been demonstrated in the astronomy domain. Therefore, we hypothesize that Pwrake also has advantages in actual bioinformatics workflows. We implemented the Pwrake workflows to process next generation sequencing data using the Genomic Analysis Toolkit (GATK) and Dindel. GATK and Dindel workflows are typical examples of sequential and parallel workflows, respectively. We found that in practice, actual scientific workflow development iterates over two phases, the workflow definition phase and the parameter adjustment phase. We introduced separate workflow definitions to help focus on each of the two developmental phases, as well as helper methods to simplify the descriptions. This approach increased iterative development efficiency. Moreover, we implemented combined workflows to demonstrate modularity of the GATK and Dindel workflows. Pwrake enables agile management of scientific workflows in the bioinformatics domain. The internal domain specific language design built on Ruby gives the flexibility of rakefiles for writing scientific workflows. Furthermore, readability and maintainability of rakefiles may facilitate sharing workflows among the scientific community. Workflows for GATK and Dindel are available at http://github.com/misshie/Workflows.
Agile parallel bioinformatics workflow management using Pwrake

PubMed Central

2011-01-01

Background In bioinformatics projects, scientific workflow systems are widely used to manage computational procedures. Full-featured workflow systems have been proposed to fulfil the demand for workflow management. However, such systems tend to be over-weighted for actual bioinformatics practices. We realize that quick deployment of cutting-edge software implementing advanced algorithms and data formats, and continuous adaptation to changes in computational resources and the environment are often prioritized in scientific workflow management. These features have a greater affinity with the agile software development method through iterative development phases after trial and error. Here, we show the application of a scientific workflow system Pwrake to bioinformatics workflows. Pwrake is a parallel workflow extension of Ruby's standard build tool Rake, the flexibility of which has been demonstrated in the astronomy domain. Therefore, we hypothesize that Pwrake also has advantages in actual bioinformatics workflows. Findings We implemented the Pwrake workflows to process next generation sequencing data using the Genomic Analysis Toolkit (GATK) and Dindel. GATK and Dindel workflows are typical examples of sequential and parallel workflows, respectively. We found that in practice, actual scientific workflow development iterates over two phases, the workflow definition phase and the parameter adjustment phase. We introduced separate workflow definitions to help focus on each of the two developmental phases, as well as helper methods to simplify the descriptions. This approach increased iterative development efficiency. Moreover, we implemented combined workflows to demonstrate modularity of the GATK and Dindel workflows. Conclusions Pwrake enables agile management of scientific workflows in the bioinformatics domain. The internal domain specific language design built on Ruby gives the flexibility of rakefiles for writing scientific workflows. Furthermore, readability and maintainability of rakefiles may facilitate sharing workflows among the scientific community. Workflows for GATK and Dindel are available at http://github.com/misshie/Workflows. PMID:21899774
Tigres Workflow Library: Supporting Scientific Pipelines on HPC Systems

DOE PAGES

Hendrix, Valerie; Fox, James; Ghoshal, Devarshi; ...

2016-07-21

The growth in scientific data volumes has resulted in the need for new tools that enable users to operate on and analyze data on large-scale resources. In the last decade, a number of scientific workflow tools have emerged. These tools often target distributed environments, and often need expert help to compose and execute the workflows. Data-intensive workflows are often ad-hoc, they involve an iterative development process that includes users composing and testing their workflows on desktops, and scaling up to larger systems. In this paper, we present the design and implementation of Tigres, a workflow library that supports the iterativemore » workflow development cycle of data-intensive workflows. Tigres provides an application programming interface to a set of programming templates i.e., sequence, parallel, split, merge, that can be used to compose and execute computational and data pipelines. We discuss the results of our evaluation of scientific and synthetic workflows showing Tigres performs with minimal template overheads (mean of 13 seconds over all experiments). We also discuss various factors (e.g., I/O performance, execution mechanisms) that affect the performance of scientific workflows on HPC systems.« less
Tigres Workflow Library: Supporting Scientific Pipelines on HPC Systems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hendrix, Valerie; Fox, James; Ghoshal, Devarshi

The growth in scientific data volumes has resulted in the need for new tools that enable users to operate on and analyze data on large-scale resources. In the last decade, a number of scientific workflow tools have emerged. These tools often target distributed environments, and often need expert help to compose and execute the workflows. Data-intensive workflows are often ad-hoc, they involve an iterative development process that includes users composing and testing their workflows on desktops, and scaling up to larger systems. In this paper, we present the design and implementation of Tigres, a workflow library that supports the iterativemore » workflow development cycle of data-intensive workflows. Tigres provides an application programming interface to a set of programming templates i.e., sequence, parallel, split, merge, that can be used to compose and execute computational and data pipelines. We discuss the results of our evaluation of scientific and synthetic workflows showing Tigres performs with minimal template overheads (mean of 13 seconds over all experiments). We also discuss various factors (e.g., I/O performance, execution mechanisms) that affect the performance of scientific workflows on HPC systems.« less
PANORAMA: An approach to performance modeling and diagnosis of extreme-scale workflows

DOE Office of Scientific and Technical Information (OSTI.GOV)

Deelman, Ewa; Carothers, Christopher; Mandal, Anirban

Here we report that computational science is well established as the third pillar of scientific discovery and is on par with experimentation and theory. However, as we move closer toward the ability to execute exascale calculations and process the ensuing extreme-scale amounts of data produced by both experiments and computations alike, the complexity of managing the compute and data analysis tasks has grown beyond the capabilities of domain scientists. Therefore, workflow management systems are absolutely necessary to ensure current and future scientific discoveries. A key research question for these workflow management systems concerns the performance optimization of complex calculation andmore » data analysis tasks. The central contribution of this article is a description of the PANORAMA approach for modeling and diagnosing the run-time performance of complex scientific workflows. This approach integrates extreme-scale systems testbed experimentation, structured analytical modeling, and parallel systems simulation into a comprehensive workflow framework called Pegasus for understanding and improving the overall performance of complex scientific workflows.« less
PANORAMA: An approach to performance modeling and diagnosis of extreme-scale workflows

DOE PAGES

Deelman, Ewa; Carothers, Christopher; Mandal, Anirban; ...

2015-07-14

Here we report that computational science is well established as the third pillar of scientific discovery and is on par with experimentation and theory. However, as we move closer toward the ability to execute exascale calculations and process the ensuing extreme-scale amounts of data produced by both experiments and computations alike, the complexity of managing the compute and data analysis tasks has grown beyond the capabilities of domain scientists. Therefore, workflow management systems are absolutely necessary to ensure current and future scientific discoveries. A key research question for these workflow management systems concerns the performance optimization of complex calculation andmore » data analysis tasks. The central contribution of this article is a description of the PANORAMA approach for modeling and diagnosing the run-time performance of complex scientific workflows. This approach integrates extreme-scale systems testbed experimentation, structured analytical modeling, and parallel systems simulation into a comprehensive workflow framework called Pegasus for understanding and improving the overall performance of complex scientific workflows.« less
A characterization of workflow management systems for extreme-scale applications

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ferreira da Silva, Rafael; Filgueira, Rosa; Pietri, Ilia

We present that the automation of the execution of computational tasks is at the heart of improving scientific productivity. Over the last years, scientific workflows have been established as an important abstraction that captures data processing and computation of large and complex scientific applications. By allowing scientists to model and express entire data processing steps and their dependencies, workflow management systems relieve scientists from the details of an application and manage its execution on a computational infrastructure. As the resource requirements of today’s computational and data science applications that process vast amounts of data keep increasing, there is a compellingmore » case for a new generation of advances in high-performance computing, commonly termed as extreme-scale computing, which will bring forth multiple challenges for the design of workflow applications and management systems. This paper presents a novel characterization of workflow management systems using features commonly associated with extreme-scale computing applications. We classify 15 popular workflow management systems in terms of workflow execution models, heterogeneous computing environments, and data access methods. Finally, the paper also surveys workflow applications and identifies gaps for future research on the road to extreme-scale workflows and management systems.« less
A characterization of workflow management systems for extreme-scale applications

DOE PAGES

Ferreira da Silva, Rafael; Filgueira, Rosa; Pietri, Ilia; ...

2017-02-16

We present that the automation of the execution of computational tasks is at the heart of improving scientific productivity. Over the last years, scientific workflows have been established as an important abstraction that captures data processing and computation of large and complex scientific applications. By allowing scientists to model and express entire data processing steps and their dependencies, workflow management systems relieve scientists from the details of an application and manage its execution on a computational infrastructure. As the resource requirements of today’s computational and data science applications that process vast amounts of data keep increasing, there is a compellingmore » case for a new generation of advances in high-performance computing, commonly termed as extreme-scale computing, which will bring forth multiple challenges for the design of workflow applications and management systems. This paper presents a novel characterization of workflow management systems using features commonly associated with extreme-scale computing applications. We classify 15 popular workflow management systems in terms of workflow execution models, heterogeneous computing environments, and data access methods. Finally, the paper also surveys workflow applications and identifies gaps for future research on the road to extreme-scale workflows and management systems.« less
Hermes: Seamless delivery of containerized bioinformatics workflows in hybrid cloud (HTC) environments

NASA Astrophysics Data System (ADS)

Kintsakis, Athanassios M.; Psomopoulos, Fotis E.; Symeonidis, Andreas L.; Mitkas, Pericles A.

Hermes introduces a new "describe once, run anywhere" paradigm for the execution of bioinformatics workflows in hybrid cloud environments. It combines the traditional features of parallelization-enabled workflow management systems and of distributed computing platforms in a container-based approach. It offers seamless deployment, overcoming the burden of setting up and configuring the software and network requirements. Most importantly, Hermes fosters the reproducibility of scientific workflows by supporting standardization of the software execution environment, thus leading to consistent scientific workflow results and accelerating scientific output.
A Drupal-Based Collaborative Framework for Science Workflows

NASA Astrophysics Data System (ADS)

Pinheiro da Silva, P.; Gandara, A.

2010-12-01

Cyber-infrastructure is built from utilizing technical infrastructure to support organizational practices and social norms to provide support for scientific teams working together or dependent on each other to conduct scientific research. Such cyber-infrastructure enables the sharing of information and data so that scientists can leverage knowledge and expertise through automation. Scientific workflow systems have been used to build automated scientific systems used by scientists to conduct scientific research and, as a result, create artifacts in support of scientific discoveries. These complex systems are often developed by teams of scientists who are located in different places, e.g., scientists working in distinct buildings, and sometimes in different time zones, e.g., scientist working in distinct national laboratories. The sharing of these specifications is currently supported by the use of version control systems such as CVS or Subversion. Discussions about the design, improvement, and testing of these specifications, however, often happen elsewhere, e.g., through the exchange of email messages and IM chatting. Carrying on a discussion about these specifications is challenging because comments and specifications are not necessarily connected. For instance, the person reading a comment about a given workflow specification may not be able to see the workflow and even if the person can see the workflow, the person may not specifically know to which part of the workflow a given comments applies to. In this paper, we discuss the design, implementation and use of CI-Server, a Drupal-based infrastructure, to support the collaboration of both local and distributed teams of scientists using scientific workflows. CI-Server has three primary goals: to enable information sharing by providing tools that scientists can use within their scientific research to process data, publish and share artifacts; to build community by providing tools that support discussions between scientists about artifacts used or created through scientific processes; and to leverage the knowledge collected within the artifacts and scientific collaborations to support scientific discoveries.
The future of scientific workflows

DOE Office of Scientific and Technical Information (OSTI.GOV)

Deelman, Ewa; Peterka, Tom; Altintas, Ilkay

Today’s computational, experimental, and observational sciences rely on computations that involve many related tasks. The success of a scientific mission often hinges on the computer automation of these workflows. In April 2015, the US Department of Energy (DOE) invited a diverse group of domain and computer scientists from national laboratories supported by the Office of Science, the National Nuclear Security Administration, from industry, and from academia to review the workflow requirements of DOE’s science and national security missions, to assess the current state of the art in science workflows, to understand the impact of emerging extreme-scale computing systems on thosemore » workflows, and to develop requirements for automated workflow management in future and existing environments. This article is a summary of the opinions of over 50 leading researchers attending this workshop. We highlight use cases, computing systems, workflow needs and conclude by summarizing the remaining challenges this community sees that inhibit large-scale scientific workflows from becoming a mainstream tool for extreme-scale science.« less
Scientist-Centered Workflow Abstractions via Generic Actors, Workflow Templates, and Context-Awareness for Groundwater Modeling and Analysis

DOE Office of Scientific and Technical Information (OSTI.GOV)

Chin, George; Sivaramakrishnan, Chandrika; Critchlow, Terence J.

2011-07-04

A drawback of existing scientific workflow systems is the lack of support to domain scientists in designing and executing their own scientific workflows. Many domain scientists avoid developing and using workflows because the basic objects of workflows are too low-level and high-level tools and mechanisms to aid in workflow construction and use are largely unavailable. In our research, we are prototyping higher-level abstractions and tools to better support scientists in their workflow activities. Specifically, we are developing generic actors that provide abstract interfaces to specific functionality, workflow templates that encapsulate workflow and data patterns that can be reused and adaptedmore » by scientists, and context-awareness mechanisms to gather contextual information from the workflow environment on behalf of the scientist. To evaluate these scientist-centered abstractions on real problems, we apply them to construct and execute scientific workflows in the specific domain area of groundwater modeling and analysis.« less
Scheduling Multilevel Deadline-Constrained Scientific Workflows on Clouds Based on Cost Optimization

DOE PAGES

Malawski, Maciej; Figiela, Kamil; Bubak, Marian; ...

2015-01-01

This paper presents a cost optimization model for scheduling scientific workflows on IaaS clouds such as Amazon EC2 or RackSpace. We assume multiple IaaS clouds with heterogeneous virtual machine instances, with limited number of instances per cloud and hourly billing. Input and output data are stored on a cloud object store such as Amazon S3. Applications are scientific workflows modeled as DAGs as in the Pegasus Workflow Management System. We assume that tasks in the workflows are grouped into levels of identical tasks. Our model is specified using mathematical programming languages (AMPL and CMPL) and allows us to minimize themore » cost of workflow execution under deadline constraints. We present results obtained using our model and the benchmark workflows representing real scientific applications in a variety of domains. The data used for evaluation come from the synthetic workflows and from general purpose cloud benchmarks, as well as from the data measured in our own experiments with Montage, an astronomical application, executed on Amazon EC2 cloud. We indicate how this model can be used for scenarios that require resource planning for scientific workflows and their ensembles.« less
SHIWA Services for Workflow Creation and Sharing in Hydrometeorolog

NASA Astrophysics Data System (ADS)

Terstyanszky, Gabor; Kiss, Tamas; Kacsuk, Peter; Sipos, Gergely

2014-05-01

Researchers want to run scientific experiments on Distributed Computing Infrastructures (DCI) to access large pools of resources and services. To run these experiments requires specific expertise that they may not have. Workflows can hide resources and services as a virtualisation layer providing a user interface that researchers can use. There are many scientific workflow systems but they are not interoperable. To learn a workflow system and create workflows may require significant efforts. Considering these efforts it is not reasonable to expect that researchers will learn new workflow systems if they want to run workflows developed in other workflow systems. To overcome it requires creating workflow interoperability solutions to allow workflow sharing. The FP7 'Sharing Interoperable Workflow for Large-Scale Scientific Simulation on Available DCIs' (SHIWA) project developed the Coarse-Grained Interoperability concept (CGI). It enables recycling and sharing workflows of different workflow systems and executing them on different DCIs. SHIWA developed the SHIWA Simulation Platform (SSP) to implement the CGI concept integrating three major components: the SHIWA Science Gateway, the workflow engines supported by the CGI concept and DCI resources where workflows are executed. The science gateway contains a portal, a submission service, a workflow repository and a proxy server to support the whole workflow life-cycle. The SHIWA Portal allows workflow creation, configuration, execution and monitoring through a Graphical User Interface using the WS-PGRADE workflow system as the host workflow system. The SHIWA Repository stores the formal description of workflows and workflow engines plus executables and data needed to execute them. It offers a wide-range of browse and search operations. To support non-native workflow execution the SHIWA Submission Service imports the workflow and workflow engine from the SHIWA Repository. This service either invokes locally or remotely pre-deployed workflow engines or submits workflow engines with the workflow to local or remote resources to execute workflows. The SHIWA Proxy Server manages certificates needed to execute the workflows on different DCIs. Currently SSP supports sharing of ASKALON, Galaxy, GWES, Kepler, LONI Pipeline, MOTEUR, Pegasus, P-GRADE, ProActive, Triana, Taverna and WS-PGRADE workflows. Further workflow systems can be added to the simulation platform as required by research communities. The FP7 'Building a European Research Community through Interoperable Workflows and Data' (ER-flow) project disseminates the achievements of the SHIWA project to build workflow user communities across Europe. ER-flow provides application supports to research communities within (Astrophysics, Computational Chemistry, Heliophysics and Life Sciences) and beyond (Hydrometeorology and Seismology) to develop, share and run workflows through the simulation platform. The simulation platform supports four usage scenarios: creating and publishing workflows in the repository, searching and selecting workflows in the repository, executing non-native workflows and creating and running meta-workflows. The presentation will outline the CGI concept, the SHIWA Simulation Platform, the ER-flow usage scenarios and how the Hydrometeorology research community runs simulations on SSP.
A virtual data language and system for scientific workflow management in data grid environments

NASA Astrophysics Data System (ADS)

Zhao, Yong

With advances in scientific instrumentation and simulation, scientific data is growing fast in both size and analysis complexity. So-called Data Grids aim to provide high performance, distributed data analysis infrastructure for data- intensive sciences, where scientists distributed worldwide need to extract information from large collections of data, and to share both data products and the resources needed to produce and store them. However, the description, composition, and execution of even logically simple scientific workflows are often complicated by the need to deal with "messy" issues like heterogeneous storage formats and ad-hoc file system structures. We show how these difficulties can be overcome via a typed workflow notation called virtual data language, within which issues of physical representation are cleanly separated from logical typing, and by the implementation of this notation within the context of a powerful virtual data system that supports distributed execution. The resulting language and system are capable of expressing complex workflows in a simple compact form, enacting those workflows in distributed environments, monitoring and recording the execution processes, and tracing the derivation history of data products. We describe the motivation, design, implementation, and evaluation of the virtual data language and system, and the application of the virtual data paradigm in various science disciplines, including astronomy, cognitive neuroscience.
Experimental evaluation of a flexible I/O architecture for accelerating workflow engines in ultrascale environments

DOE PAGES

Duro, Francisco Rodrigo; Blas, Javier Garcia; Isaila, Florin; ...

2016-10-06

The increasing volume of scientific data and the limited scalability and performance of storage systems are currently presenting a significant limitation for the productivity of the scientific workflows running on both high-performance computing (HPC) and cloud platforms. Clearly needed is better integration of storage systems and workflow engines to address this problem. This paper presents and evaluates a novel solution that leverages codesign principles for integrating Hercules—an in-memory data store—with a workflow management system. We consider four main aspects: workflow representation, task scheduling, task placement, and task termination. As a result, the experimental evaluation on both cloud and HPC systemsmore » demonstrates significant performance and scalability improvements over existing state-of-the-art approaches.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)

Duro, Francisco Rodrigo; Blas, Javier Garcia; Isaila, Florin

The increasing volume of scientific data and the limited scalability and performance of storage systems are currently presenting a significant limitation for the productivity of the scientific workflows running on both high-performance computing (HPC) and cloud platforms. Clearly needed is better integration of storage systems and workflow engines to address this problem. This paper presents and evaluates a novel solution that leverages codesign principles for integrating Hercules—an in-memory data store—with a workflow management system. We consider four main aspects: workflow representation, task scheduling, task placement, and task termination. As a result, the experimental evaluation on both cloud and HPC systemsmore » demonstrates significant performance and scalability improvements over existing state-of-the-art approaches.« less
The TimeStudio Project: An open source scientific workflow system for the behavioral and brain sciences.

PubMed

Nyström, Pär; Falck-Ytter, Terje; Gredebäck, Gustaf

2016-06-01

This article describes a new open source scientific workflow system, the TimeStudio Project, dedicated to the behavioral and brain sciences. The program is written in MATLAB and features a graphical user interface for the dynamic pipelining of computer algorithms developed as TimeStudio plugins. TimeStudio includes both a set of general plugins (for reading data files, modifying data structures, visualizing data structures, etc.) and a set of plugins specifically developed for the analysis of event-related eyetracking data as a proof of concept. It is possible to create custom plugins to integrate new or existing MATLAB code anywhere in a workflow, making TimeStudio a flexible workbench for organizing and performing a wide range of analyses. The system also features an integrated sharing and archiving tool for TimeStudio workflows, which can be used to share workflows both during the data analysis phase and after scientific publication. TimeStudio thus facilitates the reproduction and replication of scientific studies, increases the transparency of analyses, and reduces individual researchers' analysis workload. The project website ( http://timestudioproject.com ) contains the latest releases of TimeStudio, together with documentation and user forums.
Towards a Scalable and Adaptive Application Support Platform for Large-Scale Distributed E-Sciences in High-Performance Network Environments

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wu, Chase Qishi; Zhu, Michelle Mengxia

The advent of large-scale collaborative scientific applications has demonstrated the potential for broad scientific communities to pool globally distributed resources to produce unprecedented data acquisition, movement, and analysis. System resources including supercomputers, data repositories, computing facilities, network infrastructures, storage systems, and display devices have been increasingly deployed at national laboratories and academic institutes. These resources are typically shared by large communities of users over Internet or dedicated networks and hence exhibit an inherent dynamic nature in their availability, accessibility, capacity, and stability. Scientific applications using either experimental facilities or computation-based simulations with various physical, chemical, climatic, and biological models featuremore » diverse scientific workflows as simple as linear pipelines or as complex as a directed acyclic graphs, which must be executed and supported over wide-area networks with massively distributed resources. Application users oftentimes need to manually configure their computing tasks over networks in an ad hoc manner, hence significantly limiting the productivity of scientists and constraining the utilization of resources. The success of these large-scale distributed applications requires a highly adaptive and massively scalable workflow platform that provides automated and optimized computing and networking services. This project is to design and develop a generic Scientific Workflow Automation and Management Platform (SWAMP), which contains a web-based user interface specially tailored for a target application, a set of user libraries, and several easy-to-use computing and networking toolkits for application scientists to conveniently assemble, execute, monitor, and control complex computing workflows in heterogeneous high-performance network environments. SWAMP will enable the automation and management of the entire process of scientific workflows with the convenience of a few mouse clicks while hiding the implementation and technical details from end users. Particularly, we will consider two types of applications with distinct performance requirements: data-centric and service-centric applications. For data-centric applications, the main workflow task involves large-volume data generation, catalog, storage, and movement typically from supercomputers or experimental facilities to a team of geographically distributed users; while for service-centric applications, the main focus of workflow is on data archiving, preprocessing, filtering, synthesis, visualization, and other application-specific analysis. We will conduct a comprehensive comparison of existing workflow systems and choose the best suited one with open-source code, a flexible system structure, and a large user base as the starting point for our development. Based on the chosen system, we will develop and integrate new components including a black box design of computing modules, performance monitoring and prediction, and workflow optimization and reconfiguration, which are missing from existing workflow systems. A modular design for separating specification, execution, and monitoring aspects will be adopted to establish a common generic infrastructure suited for a wide spectrum of science applications. We will further design and develop efficient workflow mapping and scheduling algorithms to optimize the workflow performance in terms of minimum end-to-end delay, maximum frame rate, and highest reliability. We will develop and demonstrate the SWAMP system in a local environment, the grid network, and the 100Gpbs Advanced Network Initiative (ANI) testbed. The demonstration will target scientific applications in climate modeling and high energy physics and the functions to be demonstrated include workflow deployment, execution, steering, and reconfiguration. Throughout the project period, we will work closely with the science communities in the fields of climate modeling and high energy physics including Spallation Neutron Source (SNS) and Large Hadron Collider (LHC) projects to mature the system for production use.« less
Chang'E-3 data pre-processing system based on scientific workflow

NASA Astrophysics Data System (ADS)

tan, xu; liu, jianjun; wang, yuanyuan; yan, wei; zhang, xiaoxia; li, chunlai

2016-04-01

The Chang'E-3(CE3) mission have obtained a huge amount of lunar scientific data. Data pre-processing is an important segment of CE3 ground research and application system. With a dramatic increase in the demand of data research and application, Chang'E-3 data pre-processing system(CEDPS) based on scientific workflow is proposed for the purpose of making scientists more flexible and productive by automating data-driven. The system should allow the planning, conduct and control of the data processing procedure with the following possibilities: • describe a data processing task, include:1)define input data/output data, 2)define the data relationship, 3)define the sequence of tasks,4)define the communication between tasks,5)define mathematical formula, 6)define the relationship between task and data. • automatic processing of tasks. Accordingly, Describing a task is the key point whether the system is flexible. We design a workflow designer which is a visual environment for capturing processes as workflows, the three-level model for the workflow designer is discussed:1) The data relationship is established through product tree.2)The process model is constructed based on directed acyclic graph(DAG). Especially, a set of process workflow constructs, including Sequence, Loop, Merge, Fork are compositional one with another.3)To reduce the modeling complexity of the mathematical formulas using DAG, semantic modeling based on MathML is approached. On top of that, we will present how processed the CE3 data with CEDPS.

Workflow based framework for life science informatics.

PubMed

Tiwari, Abhishek; Sekhar, Arvind K T

2007-10-01

Workflow technology is a generic mechanism to integrate diverse types of available resources (databases, servers, software applications and different services) which facilitate knowledge exchange within traditionally divergent fields such as molecular biology, clinical research, computational science, physics, chemistry and statistics. Researchers can easily incorporate and access diverse, distributed tools and data to develop their own research protocols for scientific analysis. Application of workflow technology has been reported in areas like drug discovery, genomics, large-scale gene expression analysis, proteomics, and system biology. In this article, we have discussed the existing workflow systems and the trends in applications of workflow based systems.
Performance Analysis Tool for HPC and Big Data Applications on Scientific Clusters

DOE Office of Scientific and Technical Information (OSTI.GOV)

Yoo, Wucherl; Koo, Michelle; Cao, Yu

Big data is prevalent in HPC computing. Many HPC projects rely on complex workflows to analyze terabytes or petabytes of data. These workflows often require running over thousands of CPU cores and performing simultaneous data accesses, data movements, and computation. It is challenging to analyze the performance involving terabytes or petabytes of workflow data or measurement data of the executions, from complex workflows over a large number of nodes and multiple parallel task executions. To help identify performance bottlenecks or debug the performance issues in large-scale scientific applications and scientific clusters, we have developed a performance analysis framework, using state-ofthe-more » art open-source big data processing tools. Our tool can ingest system logs and application performance measurements to extract key performance features, and apply the most sophisticated statistical tools and data mining methods on the performance data. It utilizes an efficient data processing engine to allow users to interactively analyze a large amount of different types of logs and measurements. To illustrate the functionality of the big data analysis framework, we conduct case studies on the workflows from an astronomy project known as the Palomar Transient Factory (PTF) and the job logs from the genome analysis scientific cluster. Our study processed many terabytes of system logs and application performance measurements collected on the HPC systems at NERSC. The implementation of our tool is generic enough to be used for analyzing the performance of other HPC systems and Big Data workows.« less
Web-Accessible Scientific Workflow System for Performance Monitoring

DOE Office of Scientific and Technical Information (OSTI.GOV)

Roelof Versteeg; Roelof Versteeg; Trevor Rowe

2006-03-01

We describe the design and implementation of a web accessible scientific workflow system for environmental monitoring. This workflow environment integrates distributed, automated data acquisition with server side data management and information visualization through flexible browser based data access tools. Component technologies include a rich browser-based client (using dynamic Javascript and HTML/CSS) for data selection, a back-end server which uses PHP for data processing, user management, and result delivery, and third party applications which are invoked by the back-end using webservices. This environment allows for reproducible, transparent result generation by a diverse user base. It has been implemented for several monitoringmore » systems with different degrees of complexity.« less
Metadata Management on the SCEC PetaSHA Project: Helping Users Describe, Discover, Understand, and Use Simulation Data in a Large-Scale Scientific Collaboration

NASA Astrophysics Data System (ADS)

Okaya, D.; Deelman, E.; Maechling, P.; Wong-Barnum, M.; Jordan, T. H.; Meyers, D.

2007-12-01

Large scientific collaborations, such as the SCEC Petascale Cyberfacility for Physics-based Seismic Hazard Analysis (PetaSHA) Project, involve interactions between many scientists who exchange ideas and research results. These groups must organize, manage, and make accessible their community materials of observational data, derivative (research) results, computational products, and community software. The integration of scientific workflows as a paradigm to solve complex computations provides advantages of efficiency, reliability, repeatability, choices, and ease of use. The underlying resource needed for a scientific workflow to function and create discoverable and exchangeable products is the construction, tracking, and preservation of metadata. In the scientific workflow environment there is a two-tier structure of metadata. Workflow-level metadata and provenance describe operational steps, identity of resources, execution status, and product locations and names. Domain-level metadata essentially define the scientific meaning of data, codes and products. To a large degree the metadata at these two levels are separate. However, between these two levels is a subset of metadata produced at one level but is needed by the other. This crossover metadata suggests that some commonality in metadata handling is needed. SCEC researchers are collaborating with computer scientists at SDSC, the USC Information Sciences Institute, and Carnegie Mellon Univ. in order to perform earthquake science using high-performance computational resources. A primary objective of the "PetaSHA" collaboration is to perform physics-based estimations of strong ground motion associated with real and hypothetical earthquakes located within Southern California. Construction of 3D earth models, earthquake representations, and numerical simulation of seismic waves are key components of these estimations. Scientific workflows are used to orchestrate the sequences of scientific tasks and to access distributed computational facilities such as the NSF TeraGrid. Different types of metadata are produced and captured within the scientific workflows. One workflow within PetaSHA ("Earthworks") performs a linear sequence of tasks with workflow and seismological metadata preserved. Downstream scientific codes ingest these metadata produced by upstream codes. The seismological metadata uses attribute-value pairing in plain text; an identified need is to use more advanced handling methods. Another workflow system within PetaSHA ("Cybershake") involves several complex workflows in order to perform statistical analysis of ground shaking due to thousands of hypothetical but plausible earthquakes. Metadata management has been challenging due to its construction around a number of legacy scientific codes. We describe difficulties arising in the scientific workflow due to the lack of this metadata and suggest corrective steps, which in some cases include the cultural shift of domain science programmers coding for metadata.
Science Gateways, Scientific Workflows and Open Community Software

NASA Astrophysics Data System (ADS)

Pierce, M. E.; Marru, S.

2014-12-01

Science gateways and scientific workflows occupy different ends of the spectrum of user-focused cyberinfrastructure. Gateways, sometimes called science portals, provide a way for enabling large numbers of users to take advantage of advanced computing resources (supercomputers, advanced storage systems, science clouds) by providing Web and desktop interfaces and supporting services. Scientific workflows, at the other end of the spectrum, support advanced usage of cyberinfrastructure that enable "power users" to undertake computational experiments that are not easily done through the usual mechanisms (managing simulations across multiple sites, for example). Despite these different target communities, gateways and workflows share many similarities and can potentially be accommodated by the same software system. For example, pipelines to process InSAR imagery sets or to datamine GPS time series data are workflows. The results and the ability to make downstream products may be made available through a gateway, and power users may want to provide their own custom pipelines. In this abstract, we discuss our efforts to build an open source software system, Apache Airavata, that can accommodate both gateway and workflow use cases. Our approach is general, and we have applied the software to problems in a number of scientific domains. In this talk, we discuss our applications to usage scenarios specific to earth science, focusing on earthquake physics examples drawn from the QuakSim.org and GeoGateway.org efforts. We also examine the role of the Apache Software Foundation's open community model as a way to build up common commmunity codes that do not depend upon a single "owner" to sustain. Pushing beyond open source software, we also see the need to provide gateways and workflow systems as cloud services. These services centralize operations, provide well-defined programming interfaces, scale elastically, and have global-scale fault tolerance. We discuss our work providing Apache Airavata as a hosted service to provide these features.
Exploring Two Approaches for an End-to-End Scientific Analysis Workflow

NASA Astrophysics Data System (ADS)

Dodelson, Scott; Kent, Steve; Kowalkowski, Jim; Paterno, Marc; Sehrish, Saba

2015-12-01

The scientific discovery process can be advanced by the integration of independently-developed programs run on disparate computing facilities into coherent workflows usable by scientists who are not experts in computing. For such advancement, we need a system which scientists can use to formulate analysis workflows, to integrate new components to these workflows, and to execute different components on resources that are best suited to run those components. In addition, we need to monitor the status of the workflow as components get scheduled and executed, and to access the intermediate and final output for visual exploration and analysis. Finally, it is important for scientists to be able to share their workflows with collaborators. We have explored two approaches for such an analysis framework for the Large Synoptic Survey Telescope (LSST) Dark Energy Science Collaboration (DESC); the first one is based on the use and extension of Galaxy, a web-based portal for biomedical research, and the second one is based on a programming language, Python. In this paper, we present a brief description of the two approaches, describe the kinds of extensions to the Galaxy system we have found necessary in order to support the wide variety of scientific analysis in the cosmology community, and discuss how similar efforts might be of benefit to the HEP community.
Scientific Data Management (SDM) Center for Enabling Technologies. 2007-2012

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ludascher, Bertram; Altintas, Ilkay

Over the past five years, our activities have both established Kepler as a viable scientific workflow environment and demonstrated its value across multiple science applications. We have published numerous peer-reviewed papers on the technologies highlighted in this short paper and have given Kepler tutorials at SC06,SC07,SC08,and SciDAC 2007. Our outreach activities have allowed scientists to learn best practices and better utilize Kepler to address their individual workflow problems. Our contributions to advancing the state-of-the-art in scientific workflows have focused on the following areas. Progress in each of these areas is described in subsequent sections. Workflow development. The development of amore » deeper understanding of scientific workflows "in the wild" and of the requirements for support tools that allow easy construction of complex scientific workflows; Generic workflow components and templates. The development of generic actors (i.e.workflow components and processes) which can be broadly applied to scientific problems; Provenance collection and analysis. The design of a flexible provenance collection and analysis infrastructure within the workflow environment; and, Workflow reliability and fault tolerance. The improvement of the reliability and fault-tolerance of workflow environments.« less
Coupling of a continuum ice sheet model and a discrete element calving model using a scientific workflow system

NASA Astrophysics Data System (ADS)

Memon, Shahbaz; Vallot, Dorothée; Zwinger, Thomas; Neukirchen, Helmut

2017-04-01

Scientific communities generate complex simulations through orchestration of semi-structured analysis pipelines which involves execution of large workflows on multiple, distributed and heterogeneous computing and data resources. Modeling ice dynamics of glaciers requires workflows consisting of many non-trivial, computationally expensive processing tasks which are coupled to each other. From this domain, we present an e-Science use case, a workflow, which requires the execution of a continuum ice flow model and a discrete element based calving model in an iterative manner. Apart from the execution, this workflow also contains data format conversion tasks that support the execution of ice flow and calving by means of transition through sequential, nested and iterative steps. Thus, the management and monitoring of all the processing tasks including data management and transfer of the workflow model becomes more complex. From the implementation perspective, this workflow model was initially developed on a set of scripts using static data input and output references. In the course of application usage when more scripts or modifications introduced as per user requirements, the debugging and validation of results were more cumbersome to achieve. To address these problems, we identified a need to have a high-level scientific workflow tool through which all the above mentioned processes can be achieved in an efficient and usable manner. We decided to make use of the e-Science middleware UNICORE (Uniform Interface to Computing Resources) that allows seamless and automated access to different heterogenous and distributed resources which is supported by a scientific workflow engine. Based on this, we developed a high-level scientific workflow model for coupling of massively parallel High-Performance Computing (HPC) jobs: a continuum ice sheet model (Elmer/Ice) and a discrete element calving and crevassing model (HiDEM). In our talk we present how the use of a high-level scientific workflow middleware enables reproducibility of results more convenient and also provides a reusable and portable workflow template that can be deployed across different computing infrastructures. Acknowledgements This work was kindly supported by NordForsk as part of the Nordic Center of Excellence (NCoE) eSTICC (eScience Tools for Investigating Climate Change at High Northern Latitudes) and the Top-level Research Initiative NCoE SVALI (Stability and Variation of Arctic Land Ice).
Exploring Two Approaches for an End-to-End Scientific Analysis Workflow

DOE PAGES

Dodelson, Scott; Kent, Steve; Kowalkowski, Jim; ...

2015-12-23

The advance of the scientific discovery process is accomplished by the integration of independently-developed programs run on disparate computing facilities into coherent workflows usable by scientists who are not experts in computing. For such advancement, we need a system which scientists can use to formulate analysis workflows, to integrate new components to these workflows, and to execute different components on resources that are best suited to run those components. In addition, we need to monitor the status of the workflow as components get scheduled and executed, and to access the intermediate and final output for visual exploration and analysis. Finally,more » it is important for scientists to be able to share their workflows with collaborators. Moreover we have explored two approaches for such an analysis framework for the Large Synoptic Survey Telescope (LSST) Dark Energy Science Collaboration (DESC), the first one is based on the use and extension of Galaxy, a web-based portal for biomedical research, and the second one is based on a programming language, Python. In our paper, we present a brief description of the two approaches, describe the kinds of extensions to the Galaxy system we have found necessary in order to support the wide variety of scientific analysis in the cosmology community, and discuss how similar efforts might be of benefit to the HEP community.« less
Conceptual-level workflow modeling of scientific experiments using NMR as a case study

PubMed Central

Verdi, Kacy K; Ellis, Heidi JC; Gryk, Michael R

2007-01-01

Background Scientific workflows improve the process of scientific experiments by making computations explicit, underscoring data flow, and emphasizing the participation of humans in the process when intuition and human reasoning are required. Workflows for experiments also highlight transitions among experimental phases, allowing intermediate results to be verified and supporting the proper handling of semantic mismatches and different file formats among the various tools used in the scientific process. Thus, scientific workflows are important for the modeling and subsequent capture of bioinformatics-related data. While much research has been conducted on the implementation of scientific workflows, the initial process of actually designing and generating the workflow at the conceptual level has received little consideration. Results We propose a structured process to capture scientific workflows at the conceptual level that allows workflows to be documented efficiently, results in concise models of the workflow and more-correct workflow implementations, and provides insight into the scientific process itself. The approach uses three modeling techniques to model the structural, data flow, and control flow aspects of the workflow. The domain of biomolecular structure determination using Nuclear Magnetic Resonance spectroscopy is used to demonstrate the process. Specifically, we show the application of the approach to capture the workflow for the process of conducting biomolecular analysis using Nuclear Magnetic Resonance (NMR) spectroscopy. Conclusion Using the approach, we were able to accurately document, in a short amount of time, numerous steps in the process of conducting an experiment using NMR spectroscopy. The resulting models are correct and precise, as outside validation of the models identified only minor omissions in the models. In addition, the models provide an accurate visual description of the control flow for conducting biomolecular analysis using NMR spectroscopy experiment. PMID:17263870
Conceptual-level workflow modeling of scientific experiments using NMR as a case study.

PubMed

Verdi, Kacy K; Ellis, Heidi Jc; Gryk, Michael R

2007-01-30

Scientific workflows improve the process of scientific experiments by making computations explicit, underscoring data flow, and emphasizing the participation of humans in the process when intuition and human reasoning are required. Workflows for experiments also highlight transitions among experimental phases, allowing intermediate results to be verified and supporting the proper handling of semantic mismatches and different file formats among the various tools used in the scientific process. Thus, scientific workflows are important for the modeling and subsequent capture of bioinformatics-related data. While much research has been conducted on the implementation of scientific workflows, the initial process of actually designing and generating the workflow at the conceptual level has received little consideration. We propose a structured process to capture scientific workflows at the conceptual level that allows workflows to be documented efficiently, results in concise models of the workflow and more-correct workflow implementations, and provides insight into the scientific process itself. The approach uses three modeling techniques to model the structural, data flow, and control flow aspects of the workflow. The domain of biomolecular structure determination using Nuclear Magnetic Resonance spectroscopy is used to demonstrate the process. Specifically, we show the application of the approach to capture the workflow for the process of conducting biomolecular analysis using Nuclear Magnetic Resonance (NMR) spectroscopy. Using the approach, we were able to accurately document, in a short amount of time, numerous steps in the process of conducting an experiment using NMR spectroscopy. The resulting models are correct and precise, as outside validation of the models identified only minor omissions in the models. In addition, the models provide an accurate visual description of the control flow for conducting biomolecular analysis using NMR spectroscopy experiment.
Optimizing CyberShake Seismic Hazard Workflows for Large HPC Resources

NASA Astrophysics Data System (ADS)

Callaghan, S.; Maechling, P. J.; Juve, G.; Vahi, K.; Deelman, E.; Jordan, T. H.

2014-12-01

The CyberShake computational platform is a well-integrated collection of scientific software and middleware that calculates 3D simulation-based probabilistic seismic hazard curves and hazard maps for the Los Angeles region. Currently each CyberShake model comprises about 235 million synthetic seismograms from about 415,000 rupture variations computed at 286 sites. CyberShake integrates large-scale parallel and high-throughput serial seismological research codes into a processing framework in which early stages produce files used as inputs by later stages. Scientific workflow tools are used to manage the jobs, data, and metadata. The Southern California Earthquake Center (SCEC) developed the CyberShake platform using USC High Performance Computing and Communications systems and open-science NSF resources.CyberShake calculations were migrated to the NSF Track 1 system NCSA Blue Waters when it became operational in 2013, via an interdisciplinary team approach including domain scientists, computer scientists, and middleware developers. Due to the excellent performance of Blue Waters and CyberShake software optimizations, we reduced the makespan (a measure of wallclock time-to-solution) of a CyberShake study from 1467 to 342 hours. We will describe the technical enhancements behind this improvement, including judicious introduction of new GPU software, improved scientific software components, increased workflow-based automation, and Blue Waters-specific workflow optimizations.Our CyberShake performance improvements highlight the benefits of scientific workflow tools. The CyberShake workflow software stack includes the Pegasus Workflow Management System (Pegasus-WMS, which includes Condor DAGMan), HTCondor, and Globus GRAM, with Pegasus-mpi-cluster managing the high-throughput tasks on the HPC resources. The workflow tools handle data management, automatically transferring about 13 TB back to SCEC storage.We will present performance metrics from the most recent CyberShake study, executed on Blue Waters. We will compare the performance of CPU and GPU versions of our large-scale parallel wave propagation code, AWP-ODC-SGT. Finally, we will discuss how these enhancements have enabled SCEC to move forward with plans to increase the CyberShake simulation frequency to 1.0 Hz.
Scientific workflows as productivity tools for drug discovery.

PubMed

Shon, John; Ohkawa, Hitomi; Hammer, Juergen

2008-05-01

Large pharmaceutical companies annually invest tens to hundreds of millions of US dollars in research informatics to support their early drug discovery processes. Traditionally, most of these investments are designed to increase the efficiency of drug discovery. The introduction of do-it-yourself scientific workflow platforms has enabled research informatics organizations to shift their efforts toward scientific innovation, ultimately resulting in a possible increase in return on their investments. Unlike the handling of most scientific data and application integration approaches, researchers apply scientific workflows to in silico experimentation and exploration, leading to scientific discoveries that lie beyond automation and integration. This review highlights some key requirements for scientific workflow environments in the pharmaceutical industry that are necessary for increasing research productivity. Examples of the application of scientific workflows in research and a summary of recent platform advances are also provided.
It's All About the Data: Workflow Systems and Weather

NASA Astrophysics Data System (ADS)

Plale, B.

2009-05-01

Digital data is fueling new advances in the computational sciences, particularly geospatial research as environmental sensing grows more practical through reduced technology costs, broader network coverage, and better instruments. e-Science research (i.e., cyberinfrastructure research) has responded to data intensive computing with tools, systems, and frameworks that support computationally oriented activities such as modeling, analysis, and data mining. Workflow systems support execution of sequences of tasks on behalf of a scientist. These systems, such as Taverna, Apache ODE, and Kepler, when built as part of a larger cyberinfrastructure framework, give the scientist tools to construct task graphs of execution sequences, often through a visual interface for connecting task boxes together with arcs representing control flow or data flow. Unlike business processing workflows, scientific workflows expose a high degree of detail and control during configuration and execution. Data-driven science imposes unique needs on workflow frameworks. Our research is focused on two issues. The first is the support for workflow-driven analysis over all kinds of data sets, including real time streaming data and locally owned and hosted data. The second is the essential role metadata/provenance collection plays in data driven science, for discovery, determining quality, for science reproducibility, and for long-term preservation. The research has been conducted over the last 6 years in the context of cyberinfrastructure for mesoscale weather research carried out as part of the Linked Environments for Atmospheric Discovery (LEAD) project. LEAD has pioneered new approaches for integrating complex weather data, assimilation, modeling, mining, and cyberinfrastructure systems. Workflow systems have the potential to generate huge volumes of data. Without some form of automated metadata capture, either metadata description becomes largely a manual task that is difficult if not impossible under high-volume conditions, or the searchability and manageability of the resulting data products is disappointingly low. The provenance of a data product is a record of its lineage, or trace of the execution history that resulted in the product. The provenance of a forecast model result, e.g., captures information about the executable version of the model, configuration parameters, input data products, execution environment, and owner. Provenance enables data to be properly attributed and captures critical parameters about the model run so the quality of the result can be ascertained. Proper provenance is essential to providing reproducible scientific computing results. Workflow languages used in science discovery are complete programming languages, and in theory can support any logic expressible by a programming language. The execution environments supporting the workflow engines, on the other hand, are subject to constraints on physical resources, and hence in practice the workflow task graphs used in science utilize relatively few of the cataloged workflow patterns. It is important to note that these workflows are executed on demand, and are executed once. Into this context is introduced the need for science discovery that is responsive to real time information. If we can use simple programming models and abstractions to make scientific discovery involving real-time data accessible to specialists who share and utilize data across scientific domains, we bring science one step closer to solving the largest of human problems.
Scientific Data Management (SDM) Center for Enabling Technologies. Final Report, 2007-2012

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ludascher, Bertram; Altintas, Ilkay

Our contributions to advancing the State of the Art in scientific workflows have focused on the following areas: Workflow development; Generic workflow components and templates; Provenance collection and analysis; and, Workflow reliability and fault tolerance.
Flexible workflow sharing and execution services for e-scientists

NASA Astrophysics Data System (ADS)

Kacsuk, Péter; Terstyanszky, Gábor; Kiss, Tamas; Sipos, Gergely

2013-04-01

The sequence of computational and data manipulation steps required to perform a specific scientific analysis is called a workflow. Workflows that orchestrate data and/or compute intensive applications on Distributed Computing Infrastructures (DCIs) recently became standard tools in e-science. At the same time the broad and fragmented landscape of workflows and DCIs slows down the uptake of workflow-based work. The development, sharing, integration and execution of workflows is still a challenge for many scientists. The FP7 "Sharing Interoperable Workflow for Large-Scale Scientific Simulation on Available DCIs" (SHIWA) project significantly improved the situation, with a simulation platform that connects different workflow systems, different workflow languages, different DCIs and workflows into a single, interoperable unit. The SHIWA Simulation Platform is a service package, already used by various scientific communities, and used as a tool by the recently started ER-flow FP7 project to expand the use of workflows among European scientists. The presentation will introduce the SHIWA Simulation Platform and the services that ER-flow provides based on the platform to space and earth science researchers. The SHIWA Simulation Platform includes: 1. SHIWA Repository: A database where workflows and meta-data about workflows can be stored. The database is a central repository to discover and share workflows within and among communities . 2. SHIWA Portal: A web portal that is integrated with the SHIWA Repository and includes a workflow executor engine that can orchestrate various types of workflows on various grid and cloud platforms. 3. SHIWA Desktop: A desktop environment that provides similar access capabilities than the SHIWA Portal, however it runs on the users' desktops/laptops instead of a portal server. 4. Workflow engines: the ASKALON, Galaxy, GWES, Kepler, LONI Pipeline, MOTEUR, Pegasus, P-GRADE, ProActive, Triana, Taverna and WS-PGRADE workflow engines are already integrated with the execution engine of the SHIWA Portal. Other engines can be added when required. Through the SHIWA Portal one can define and run simulations on the SHIWA Virtual Organisation, an e-infrastructure that gathers computing and data resources from various DCIs, including the European Grid Infrastructure. The Portal via third party workflow engines provides support for the most widely used academic workflow engines and it can be extended with other engines on demand. Such extensions translate between workflow languages and facilitate the nesting of workflows into larger workflows even when those are written in different languages and require different interpreters for execution. Through the workflow repository and the portal lonely scientists and scientific collaborations can share and offer workflows for reuse and execution. Given the integrated nature of the SHIWA Simulation Platform the shared workflows can be executed online, without installing any special client environment and downloading workflows. The FP7 "Building a European Research Community through Interoperable Workflows and Data" (ER-flow) project disseminates the achievements of the SHIWA project and use these achievements to build workflow user communities across Europe. ER-flow provides application supports to research communities within and beyond the project consortium to develop, share and run workflows with the SHIWA Simulation Platform.
Multi-level meta-workflows: new concept for regularly occurring tasks in quantum chemistry.

PubMed

Arshad, Junaid; Hoffmann, Alexander; Gesing, Sandra; Grunzke, Richard; Krüger, Jens; Kiss, Tamas; Herres-Pawlis, Sonja; Terstyanszky, Gabor

2016-01-01

In Quantum Chemistry, many tasks are reoccurring frequently, e.g. geometry optimizations, benchmarking series etc. Here, workflows can help to reduce the time of manual job definition and output extraction. These workflows are executed on computing infrastructures and may require large computing and data resources. Scientific workflows hide these infrastructures and the resources needed to run them. It requires significant efforts and specific expertise to design, implement and test these workflows. Many of these workflows are complex and monolithic entities that can be used for particular scientific experiments. Hence, their modification is not straightforward and it makes almost impossible to share them. To address these issues we propose developing atomic workflows and embedding them in meta-workflows. Atomic workflows deliver a well-defined research domain specific function. Publishing workflows in repositories enables workflow sharing inside and/or among scientific communities. We formally specify atomic and meta-workflows in order to define data structures to be used in repositories for uploading and sharing them. Additionally, we present a formal description focused at orchestration of atomic workflows into meta-workflows. We investigated the operations that represent basic functionalities in Quantum Chemistry, developed the relevant atomic workflows and combined them into meta-workflows. Having these workflows we defined the structure of the Quantum Chemistry workflow library and uploaded these workflows in the SHIWA Workflow Repository.Graphical AbstractMeta-workflows and embedded workflows in the template representation.
Widening the adoption of workflows to include human and human-machine scientific processes

NASA Astrophysics Data System (ADS)

Salayandia, L.; Pinheiro da Silva, P.; Gates, A. Q.

2010-12-01

Scientific workflows capture knowledge in the form of technical recipes to access and manipulate data that help scientists manage and reuse established expertise to conduct their work. Libraries of scientific workflows are being created in particular fields, e.g., Bioinformatics, where combined with cyber-infrastructure environments that provide on-demand access to data and tools, result in powerful workbenches for scientists of those communities. The focus in these particular fields, however, has been more on automating rather than documenting scientific processes. As a result, technical barriers have impeded a wider adoption of scientific workflows by scientific communities that do not rely as heavily on cyber-infrastructure and computing environments. Semantic Abstract Workflows (SAWs) are introduced to widen the applicability of workflows as a tool to document scientific recipes or processes. SAWs intend to capture a scientists’ perspective about the process of how she or he would collect, filter, curate, and manipulate data to create the artifacts that are relevant to her/his work. In contrast, scientific workflows describe the process from the point of view of how technical methods and tools are used to conduct the work. By focusing on a higher level of abstraction that is closer to a scientist’s understanding, SAWs effectively capture the controlled vocabularies that reflect a particular scientific community, as well as the types of datasets and methods used in a particular domain. From there on, SAWs provide the flexibility to adapt to different environments to carry out the recipes or processes. These environments range from manual fieldwork to highly technical cyber-infrastructure environments, i.e., such as those already supported by scientific workflows. Two cases, one from Environmental Science and another from Geophysics, are presented as illustrative examples.
Integrate Data into Scientific Workflows for Terrestrial Biosphere Model Evaluation through Brokers

NASA Astrophysics Data System (ADS)

Wei, Y.; Cook, R. B.; Du, F.; Dasgupta, A.; Poco, J.; Huntzinger, D. N.; Schwalm, C. R.; Boldrini, E.; Santoro, M.; Pearlman, J.; Pearlman, F.; Nativi, S.; Khalsa, S.

2013-12-01

Terrestrial biosphere models (TBMs) have become integral tools for extrapolating local observations and process-level understanding of land-atmosphere carbon exchange to larger regions. Model-model and model-observation intercomparisons are critical to understand the uncertainties within model outputs, to improve model skill, and to improve our understanding of land-atmosphere carbon exchange. The DataONE Exploration, Visualization, and Analysis (EVA) working group is evaluating TBMs using scientific workflows in UV-CDAT/VisTrails. This workflow-based approach promotes collaboration and improved tracking of evaluation provenance. But challenges still remain. The multi-scale and multi-discipline nature of TBMs makes it necessary to include diverse and distributed data resources in model evaluation. These include, among others, remote sensing data from NASA, flux tower observations from various organizations including DOE, and inventory data from US Forest Service. A key challenge is to make heterogeneous data from different organizations and disciplines discoverable and readily integrated for use in scientific workflows. This presentation introduces the brokering approach taken by the DataONE EVA to fill the gap between TBMs' evaluation scientific workflows and cross-organization and cross-discipline data resources. The DataONE EVA started the development of an Integrated Model Intercomparison Framework (IMIF) that leverages standards-based discovery and access brokers to dynamically discover, access, and transform (e.g. subset and resampling) diverse data products from DataONE, Earth System Grid (ESG), and other data repositories into a format that can be readily used by scientific workflows in UV-CDAT/VisTrails. The discovery and access brokers serve as an independent middleware that bridge existing data repositories and TBMs evaluation scientific workflows but introduce little overhead to either component. In the initial work, an OpenSearch-based discovery broker is leveraged to provide a consistent mechanism for data discovery. Standards-based data services, including Open Geospatial Consortium (OGC) Web Coverage Service (WCS) and THREDDS are leveraged to provide on-demand data access and transformations through the data access broker. To ease the adoption of broker services, a package of broker client VisTrails modules have been developed to be easily plugged into scientific workflows. The initial IMIF has been successfully tested in selected model evaluation scenarios involved in the NASA-funded Multi-scale Synthesis and Terrestrial Model Intercomparison Project (MsTMIP).
The MPO system for automatic workflow documentation

DOE Office of Scientific and Technical Information (OSTI.GOV)

Abla, G.; Coviello, E. N.; Flanagan, S. M.

Data from large-scale experiments and extreme-scale computing is expensive to produce and may be used for critical applications. However, it is not the mere existence of data that is important, but our ability to make use of it. Experience has shown that when metadata is better organized and more complete, the underlying data becomes more useful. Traditionally, capturing the steps of scientific workflows and metadata was the role of the lab notebook, but the digital era has resulted instead in the fragmentation of data, processing, and annotation. Here, this article presents the Metadata, Provenance, and Ontology (MPO) System, the softwaremore » that can automate the documentation of scientific workflows and associated information. Based on recorded metadata, it provides explicit information about the relationships among the elements of workflows in notebook form augmented with directed acyclic graphs. A set of web-based graphical navigation tools and Application Programming Interface (API) have been created for searching and browsing, as well as programmatically accessing the workflows and data. We describe the MPO concepts and its software architecture. We also report the current status of the software as well as the initial deployment experience.« less

The MPO system for automatic workflow documentation

DOE PAGES

Abla, G.; Coviello, E. N.; Flanagan, S. M.; ...

2016-04-18

Data from large-scale experiments and extreme-scale computing is expensive to produce and may be used for critical applications. However, it is not the mere existence of data that is important, but our ability to make use of it. Experience has shown that when metadata is better organized and more complete, the underlying data becomes more useful. Traditionally, capturing the steps of scientific workflows and metadata was the role of the lab notebook, but the digital era has resulted instead in the fragmentation of data, processing, and annotation. Here, this article presents the Metadata, Provenance, and Ontology (MPO) System, the softwaremore » that can automate the documentation of scientific workflows and associated information. Based on recorded metadata, it provides explicit information about the relationships among the elements of workflows in notebook form augmented with directed acyclic graphs. A set of web-based graphical navigation tools and Application Programming Interface (API) have been created for searching and browsing, as well as programmatically accessing the workflows and data. We describe the MPO concepts and its software architecture. We also report the current status of the software as well as the initial deployment experience.« less
Auspice: Automatic Service Planning in Cloud/Grid Environments

NASA Astrophysics Data System (ADS)

Chiu, David; Agrawal, Gagan

Recent scientific advances have fostered a mounting number of services and data sets available for utilization. These resources, though scattered across disparate locations, are often loosely coupled both semantically and operationally. This loosely coupled relationship implies the possibility of linking together operations and data sets to answer queries. This task, generally known as automatic service composition, therefore abstracts the process of complex scientific workflow planning from the user. We have been exploring a metadata-driven approach toward automatic service workflow composition, among other enabling mechanisms, in our system, Auspice: Automatic Service Planning in Cloud/Grid Environments. In this paper, we present a complete overview of our system's unique features and outlooks for future deployment as the Cloud computing paradigm becomes increasingly eminent in enabling scientific computing.
Data Intensive Scientific Workflows on a Federated Cloud: CRADA Final Report

DOE Office of Scientific and Technical Information (OSTI.GOV)

Garzoglio, Gabriele

The Fermilab Scientific Computing Division and the KISTI Global Science Experimental Data Hub Center have built a prototypical large-scale infrastructure to handle scientific workflows of stakeholders to run on multiple cloud resources. The demonstrations have been in the areas of (a) Data-Intensive Scientific Workflows on Federated Clouds, (b) Interoperability and Federation of Cloud Resources, and (c) Virtual Infrastructure Automation to enable On-Demand Services.
Workflows for microarray data processing in the Kepler environment.

PubMed

Stropp, Thomas; McPhillips, Timothy; Ludäscher, Bertram; Bieda, Mark

2012-05-17

Microarray data analysis has been the subject of extensive and ongoing pipeline development due to its complexity, the availability of several options at each analysis step, and the development of new analysis demands, including integration with new data sources. Bioinformatics pipelines are usually custom built for different applications, making them typically difficult to modify, extend and repurpose. Scientific workflow systems are intended to address these issues by providing general-purpose frameworks in which to develop and execute such pipelines. The Kepler workflow environment is a well-established system under continual development that is employed in several areas of scientific research. Kepler provides a flexible graphical interface, featuring clear display of parameter values, for design and modification of workflows. It has capabilities for developing novel computational components in the R, Python, and Java programming languages, all of which are widely used for bioinformatics algorithm development, along with capabilities for invoking external applications and using web services. We developed a series of fully functional bioinformatics pipelines addressing common tasks in microarray processing in the Kepler workflow environment. These pipelines consist of a set of tools for GFF file processing of NimbleGen chromatin immunoprecipitation on microarray (ChIP-chip) datasets and more comprehensive workflows for Affymetrix gene expression microarray bioinformatics and basic primer design for PCR experiments, which are often used to validate microarray results. Although functional in themselves, these workflows can be easily customized, extended, or repurposed to match the needs of specific projects and are designed to be a toolkit and starting point for specific applications. These workflows illustrate a workflow programming paradigm focusing on local resources (programs and data) and therefore are close to traditional shell scripting or R/BioConductor scripting approaches to pipeline design. Finally, we suggest that microarray data processing task workflows may provide a basis for future example-based comparison of different workflow systems. We provide a set of tools and complete workflows for microarray data analysis in the Kepler environment, which has the advantages of offering graphical, clear display of conceptual steps and parameters and the ability to easily integrate other resources such as remote data and web services.
Scientific Workflows + Provenance = Better (Meta-)Data Management

NASA Astrophysics Data System (ADS)

Ludaescher, B.; Cuevas-Vicenttín, V.; Missier, P.; Dey, S.; Kianmajd, P.; Wei, Y.; Koop, D.; Chirigati, F.; Altintas, I.; Belhajjame, K.; Bowers, S.

2013-12-01

The origin and processing history of an artifact is known as its provenance. Data provenance is an important form of metadata that explains how a particular data product came about, e.g., how and when it was derived in a computational process, which parameter settings and input data were used, etc. Provenance information provides transparency and helps to explain and interpret data products. Other common uses and applications of provenance include quality control, data curation, result debugging, and more generally, 'reproducible science'. Scientific workflow systems (e.g. Kepler, Taverna, VisTrails, and others) provide controlled environments for developing computational pipelines with built-in provenance support. Workflow results can then be explained in terms of workflow steps, parameter settings, input data, etc. using provenance that is automatically captured by the system. Scientific workflows themselves provide a user-friendly abstraction of the computational process and are thus a form of ('prospective') provenance in their own right. The full potential of provenance information is realized when combining workflow-level information (prospective provenance) with trace-level information (retrospective provenance). To this end, the DataONE Provenance Working Group (ProvWG) has developed an extension of the W3C PROV standard, called D-PROV. Whereas PROV provides a 'least common denominator' for exchanging and integrating provenance information, D-PROV adds new 'observables' that described workflow-level information (e.g., the functional steps in a pipeline), as well as workflow-specific trace-level information ( timestamps for each workflow step executed, the inputs and outputs used, etc.) Using examples, we will demonstrate how the combination of prospective and retrospective provenance provides added value in managing scientific data. The DataONE ProvWG is also developing tools based on D-PROV that allow scientists to get more mileage from provenance metadata. DataONE is a federation of member nodes that store data and metadata for discovery and access. By enriching metadata with provenance information, search and reuse of data is enhanced, and the 'social life' of data (being the product of many workflow runs, different people, etc.) is revealed. We are currently prototyping a provenance repository (PBase) to demonstrate what can be achieved with advanced provenance queries. The ProvExplorer and ProPub tools support advanced ad-hoc querying and visualization of provenance as well as customized provenance publications (e.g., to address privacy issues, or to focus provenance to relevant details). In a parallel line of work, we are exploring ways to add provenance support to widely-used scripting platforms (e.g. R and Python) and then expose that information via D-PROV.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Garzoglio, Gabriele

The Fermilab Grid and Cloud Computing Department and the KISTI Global Science experimental Data hub Center are working on a multi-year Collaborative Research and Development Agreement.With the knowledge developed in the first year on how to provision and manage a federation of virtual machines through Cloud management systems. In this second year, we expanded the work on provisioning and federation, increasing both scale and diversity of solutions, and we started to build on-demand services on the established fabric, introducing the paradigm of Platform as a Service to assist with the execution of scientific workflows. We have enabled scientific workflows ofmore » stakeholders to run on multiple cloud resources at the scale of 1,000 concurrent machines. The demonstrations have been in the areas of (a) Virtual Infrastructure Automation and Provisioning, (b) Interoperability and Federation of Cloud Resources, and (c) On-demand Services for ScientificWorkflows.« less
From the desktop to the grid: scalable bioinformatics via workflow conversion.

PubMed

de la Garza, Luis; Veit, Johannes; Szolek, Andras; Röttig, Marc; Aiche, Stephan; Gesing, Sandra; Reinert, Knut; Kohlbacher, Oliver

2016-03-12

Reproducibility is one of the tenets of the scientific method. Scientific experiments often comprise complex data flows, selection of adequate parameters, and analysis and visualization of intermediate and end results. Breaking down the complexity of such experiments into the joint collaboration of small, repeatable, well defined tasks, each with well defined inputs, parameters, and outputs, offers the immediate benefit of identifying bottlenecks, pinpoint sections which could benefit from parallelization, among others. Workflows rest upon the notion of splitting complex work into the joint effort of several manageable tasks. There are several engines that give users the ability to design and execute workflows. Each engine was created to address certain problems of a specific community, therefore each one has its advantages and shortcomings. Furthermore, not all features of all workflow engines are royalty-free -an aspect that could potentially drive away members of the scientific community. We have developed a set of tools that enables the scientific community to benefit from workflow interoperability. We developed a platform-free structured representation of parameters, inputs, outputs of command-line tools in so-called Common Tool Descriptor documents. We have also overcome the shortcomings and combined the features of two royalty-free workflow engines with a substantial user community: the Konstanz Information Miner, an engine which we see as a formidable workflow editor, and the Grid and User Support Environment, a web-based framework able to interact with several high-performance computing resources. We have thus created a free and highly accessible way to design workflows on a desktop computer and execute them on high-performance computing resources. Our work will not only reduce time spent on designing scientific workflows, but also make executing workflows on remote high-performance computing resources more accessible to technically inexperienced users. We strongly believe that our efforts not only decrease the turnaround time to obtain scientific results but also have a positive impact on reproducibility, thus elevating the quality of obtained scientific results.
A Scientific Software Product Line for the Bioinformatics domain.

PubMed

Costa, Gabriella Castro B; Braga, Regina; David, José Maria N; Campos, Fernanda

2015-08-01

Most specialized users (scientists) that use bioinformatics applications do not have suitable training on software development. Software Product Line (SPL) employs the concept of reuse considering that it is defined as a set of systems that are developed from a common set of base artifacts. In some contexts, such as in bioinformatics applications, it is advantageous to develop a collection of related software products, using SPL approach. If software products are similar enough, there is the possibility of predicting their commonalities, differences and then reuse these common features to support the development of new applications in the bioinformatics area. This paper presents the PL-Science approach which considers the context of SPL and ontology in order to assist scientists to define a scientific experiment, and to specify a workflow that encompasses bioinformatics applications of a given experiment. This paper also focuses on the use of ontologies to enable the use of Software Product Line in biological domains. In the context of this paper, Scientific Software Product Line (SSPL) differs from the Software Product Line due to the fact that SSPL uses an abstract scientific workflow model. This workflow is defined according to a scientific domain and using this abstract workflow model the products (scientific applications/algorithms) are instantiated. Through the use of ontology as a knowledge representation model, we can provide domain restrictions as well as add semantic aspects in order to facilitate the selection and organization of bioinformatics workflows in a Scientific Software Product Line. The use of ontologies enables not only the expression of formal restrictions but also the inferences on these restrictions, considering that a scientific domain needs a formal specification. This paper presents the development of the PL-Science approach, encompassing a methodology and an infrastructure, and also presents an approach evaluation. This evaluation presents case studies in bioinformatics, which were conducted in two renowned research institutions in Brazil. Copyright © 2015 Elsevier Inc. All rights reserved.
Provenance Storage, Querying, and Visualization in PBase

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kianmajd, Parisa; Ludascher, Bertram; Missier, Paolo

2015-01-01

We present PBase, a repository for scientific workflows and their corresponding provenance information that facilitates the sharing of experiments among the scientific community. PBase is interoperable since it uses ProvONE, a standard provenance model for scientific workflows. Workflows and traces are stored in RDF, and with the support of SPARQL and the tree cover encoding, the repository provides a scalable infrastructure for querying the provenance data. Furthermore, through its user interface, it is possible to: visualize workflows and execution traces; visualize reachability relations within these traces; issue SPARQL queries; and visualize query results.
Addressing informatics challenges in Translational Research with workflow technology.

PubMed

Beaulah, Simon A; Correll, Mick A; Munro, Robin E J; Sheldon, Jonathan G

2008-09-01

Interest in Translational Research has been growing rapidly in recent years. In this collision of different data, technologies and cultures lie tremendous opportunities for the advancement of science and business for organisations that are able to integrate, analyse and deliver this information effectively to users. Workflow-based integration and analysis systems are becoming recognised as a fast and flexible way to build applications that are tailored to scientific areas, yet are built on a common platform. Workflow systems are allowing organisations to meet the key informatics challenges in Translational Research and improve disease understanding and patient care.
The View from a Few Hundred Feet : A New Transparent and Integrated Workflow for UAV-collected Data

NASA Astrophysics Data System (ADS)

Peterson, F. S.; Barbieri, L.; Wyngaard, J.

2015-12-01

Unmanned Aerial Vehicles (UAVs) allow scientists and civilians to monitor earth and atmospheric conditions in remote locations. To keep up with the rapid evolution of UAV technology, data workflows must also be flexible, integrated, and introspective. Here, we present our data workflow for a project to assess the feasibility of detecting threshold levels of methane, carbon-dioxide, and other aerosols by mounting consumer-grade gas analysis sensors on UAV's. Particularly, we highlight our use of Project Jupyter, a set of open-source software tools and documentation designed for developing "collaborative narratives" around scientific workflows. By embracing the GitHub-backed, multi-language systems available in Project Jupyter, we enable interaction and exploratory computation while simultaneously embracing distributed version control. Additionally, the transparency of this method builds trust with civilians and decision-makers and leverages collaboration and communication to resolve problems. The goal of this presentation is to provide a generic data workflow for scientific inquiries involving UAVs and to invite the participation of the AGU community in its improvement and curation.
Scientific Workflows and the Sensor Web for Virtual Environmental Observatories

NASA Astrophysics Data System (ADS)

Simonis, I.; Vahed, A.

2008-12-01

Virtual observatories mature from their original domain and become common practice for earth observation research and policy building. The term Virtual Observatory originally came from the astronomical research community. Here, virtual observatories provide universal access to the available astronomical data archives of space and ground-based observatories. Further on, as those virtual observatories aim at integrating heterogeneous ressources provided by a number of participating organizations, the virtual observatory acts as a coordinating entity that strives for common data analysis techniques and tools based on common standards. The Sensor Web is on its way to become one of the major virtual observatories outside of the astronomical research community. Like the original observatory that consists of a number of telescopes, each observing a specific part of the wave spectrum and with a collection of astronomical instruments, the Sensor Web provides a multi-eyes perspective on the current, past, as well as future situation of our planet and its surrounding spheres. The current view of the Sensor Web is that of a single worldwide collaborative, coherent, consistent and consolidated sensor data collection, fusion and distribution system. The Sensor Web can perform as an extensive monitoring and sensing system that provides timely, comprehensive, continuous and multi-mode observations. This technology is key to monitoring and understanding our natural environment, including key areas such as climate change, biodiversity, or natural disasters on local, regional, and global scales. The Sensor Web concept has been well established with ongoing global research and deployment of Sensor Web middleware and standards and represents the foundation layer of systems like the Global Earth Observation System of Systems (GEOSS). The Sensor Web consists of a huge variety of physical and virtual sensors as well as observational data, made available on the Internet at standardized interfaces. All data sets and sensor communication follow well-defined abstract models and corresponding encodings, mostly developed by the OGC Sensor Web Enablement initiative. Scientific progress is currently accelerated by an emerging new concept called scientific workflows, which organize and manage complex distributed computations. A scientific workflow represents and records the highly complex processes that a domain scientist typically would follow in exploration, discovery and ultimately, transformation of raw data to publishable results. The challenge is now to integrate the benefits of scientific workflows with those provided by the Sensor Web in order to leverage all resources for scientific exploration, problem solving, and knowledge generation. Scientific workflows for the Sensor Web represent the next evolutionary step towards efficient, powerful, and flexible earth observation frameworks and platforms. Those platforms support the entire process from capturing data, sharing and integrating, to requesting additional observations. Multiple sites and organizations will participate on single platforms and scientists from different countries and organizations interact and contribute to large-scale research projects. Simultaneously, the data- and information overload becomes manageable, as multiple layers of abstraction will free scientists to deal with underlying data-, processing or storage peculiarities. The vision are automated investigation and discovery mechanisms that allow scientists to pose queries to the system, which in turn would identify potentially related resources, schedules processing tasks and assembles all parts in workflows that may satisfy the query.
Making Sense of Complexity with FRE, a Scientific Workflow System for Climate Modeling (Invited)

NASA Astrophysics Data System (ADS)

Langenhorst, A. R.; Balaji, V.; Yakovlev, A.

2010-12-01

A workflow is a description of a sequence of activities that is both precise and comprehensive. Capturing the workflow of climate experiments provides a record which can be queried or compared, and allows reproducibility of the experiments - sometimes even to the bit level of the model output. This reproducibility helps to verify the integrity of the output data, and enables easy perturbation experiments. GFDL's Flexible Modeling System Runtime Environment (FRE) is a production-level software project which defines and implements building blocks of the workflow as command line tools. The scientific, numerical and technical input needed to complete the workflow of an experiment is recorded in an experiment description file in XML format. Several key features add convenience and automation to the FRE workflow: ● Experiment inheritance makes it possible to define a new experiment with only a reference to the parent experiment and the parameters to override. ● Testing is a basic element of the FRE workflow: experiments define short test runs which are verified before the main experiment is run, and a set of standard experiments are verified with new code releases. ● FRE is flexible enough to support short runs with mere megabytes of data, to high-resolution experiments that run on thousands of processors for months, producing terabytes of output data. Experiments run in segments of model time; after each segment, the state is saved and the model can be checkpointed at that level. Segment length is defined by the user, but the number of segments per system job is calculated to fit optimally in the batch scheduler requirements. FRE provides job control across multiple segments, and tools to monitor and alter the state of long-running experiments. ● Experiments are entered into a Curator Database, which stores query-able metadata about the experiment and the experiment's output. ● FRE includes a set of standardized post-processing functions as well as the ability to incorporate user-level functions. FRE post-processing can take us all the way to the preparing of graphical output for a scientific audience, and publication of data on a public portal. ● Recent FRE development includes incorporating a distributed workflow to support remote computing.
Metaworkflows and Workflow Interoperability for Heliophysics

NASA Astrophysics Data System (ADS)

Pierantoni, Gabriele; Carley, Eoin P.

2014-06-01

Heliophysics is a relatively new branch of physics that investigates the relationship between the Sun and the other bodies of the solar system. To investigate such relationships, heliophysicists can rely on various tools developed by the community. Some of these tools are on-line catalogues that list events (such as Coronal Mass Ejections, CMEs) and their characteristics as they were observed on the surface of the Sun or on the other bodies of the Solar System. Other tools offer on-line data analysis and access to images and data catalogues. During their research, heliophysicists often perform investigations that need to coordinate several of these services and to repeat these complex operations until the phenomena under investigation are fully analyzed. Heliophysicists combine the results of these services; this service orchestration is best suited for workflows. This approach has been investigated in the HELIO project. The HELIO project developed an infrastructure for a Virtual Observatory for Heliophysics and implemented service orchestration using TAVERNA workflows. HELIO developed a set of workflows that proved to be useful but lacked flexibility and re-usability. The TAVERNA workflows also needed to be executed directly in TAVERNA workbench, and this forced all users to learn how to use the workbench. Within the SCI-BUS and ER-FLOW projects, we have started an effort to re-think and re-design the heliophysics workflows with the aim of fostering re-usability and ease of use. We base our approach on two key concepts, that of meta-workflows and that of workflow interoperability. We have divided the produced workflows in three different layers. The first layer is Basic Workflows, developed both in the TAVERNA and WS-PGRADE languages. They are building blocks that users compose to address their scientific challenges. They implement well-defined Use Cases that usually involve only one service. The second layer is Science Workflows usually developed in TAVERNA. They- implement Science Cases (the definition of a scientific challenge) by composing different Basic Workflows. The third and last layer,Iterative Science Workflows, is developed in WSPGRADE. It executes sub-workflows (either Basic or Science Workflows) as parameter sweep jobs to investigate Science Cases on large multiple data sets. So far, this approach has proven fruitful for three Science Cases of which one has been completed and two are still being tested.
A Scientific Workflow Platform for Generic and Scalable Object Recognition on Medical Images

NASA Astrophysics Data System (ADS)

Möller, Manuel; Tuot, Christopher; Sintek, Michael

In the research project THESEUS MEDICO we aim at a system combining medical image information with semantic background knowledge from ontologies to give clinicians fully cross-modal access to biomedical image repositories. Therefore joint efforts have to be made in more than one dimension: Object detection processes have to be specified in which an abstraction is performed starting from low-level image features across landmark detection utilizing abstract domain knowledge up to high-level object recognition. We propose a system based on a client-server extension of the scientific workflow platform Kepler that assists the collaboration of medical experts and computer scientists during development and parameter learning.
Provenance-Powered Automatic Workflow Generation and Composition

NASA Astrophysics Data System (ADS)

Zhang, J.; Lee, S.; Pan, L.; Lee, T. J.

2015-12-01

In recent years, scientists have learned how to codify tools into reusable software modules that can be chained into multi-step executable workflows. Existing scientific workflow tools, created by computer scientists, require domain scientists to meticulously design their multi-step experiments before analyzing data. However, this is oftentimes contradictory to a domain scientist's daily routine of conducting research and exploration. We hope to resolve this dispute. Imagine this: An Earth scientist starts her day applying NASA Jet Propulsion Laboratory (JPL) published climate data processing algorithms over ARGO deep ocean temperature and AMSRE sea surface temperature datasets. Throughout the day, she tunes the algorithm parameters to study various aspects of the data. Suddenly, she notices some interesting results. She then turns to a computer scientist and asks, "can you reproduce my results?" By tracking and reverse engineering her activities, the computer scientist creates a workflow. The Earth scientist can now rerun the workflow to validate her findings, modify the workflow to discover further variations, or publish the workflow to share the knowledge. In this way, we aim to revolutionize computer-supported Earth science. We have developed a prototyping system to realize the aforementioned vision, in the context of service-oriented science. We have studied how Earth scientists conduct service-oriented data analytics research in their daily work, developed a provenance model to record their activities, and developed a technology to automatically generate workflow starting from user behavior and adaptability and reuse of these workflows for replicating/improving scientific studies. A data-centric repository infrastructure is established to catch richer provenance to further facilitate collaboration in the science community. We have also established a Petri nets-based verification instrument for provenance-based automatic workflow generation and recommendation.
DIaaS: Data-Intensive workflows as a service - Enabling easy composition and deployment of data-intensive workflows on Virtual Research Environments

NASA Astrophysics Data System (ADS)

Filgueira, R.; Ferreira da Silva, R.; Deelman, E.; Atkinson, M.

2016-12-01

We present the Data-Intensive workflows as a Service (DIaaS) model for enabling easy data-intensive workflow composition and deployment on clouds using containers. DIaaS model backbone is Asterism, an integrated solution for running data-intensive stream-based applications on heterogeneous systems, which combines the benefits of dispel4py with Pegasus workflow systems. The stream-based executions of an Asterism workflow are managed by dispel4py, while the data movement between different e-Infrastructures, and the coordination of the application execution are automatically managed by Pegasus. DIaaS combines Asterism framework with Docker containers to provide an integrated, complete, easy-to-use, portable approach to run data-intensive workflows on distributed platforms. Three containers integrate the DIaaS model: a Pegasus node, and an MPI and an Apache Storm clusters. Container images are described as Dockerfiles (available online at http://github.com/dispel4py/pegasus_dispel4py), linked to Docker Hub for providing continuous integration (automated image builds), and image storing and sharing. In this model, all required software (workflow systems and execution engines) for running scientific applications are packed into the containers, which significantly reduces the effort (and possible human errors) required by scientists or VRE administrators to build such systems. The most common use of DIaaS will be to act as a backend of VREs or Scientific Gateways to run data-intensive applications, deploying cloud resources upon request. We have demonstrated the feasibility of DIaaS using the data-intensive seismic ambient noise cross-correlation application (Figure 1). The application preprocesses (Phase1) and cross-correlates (Phase2) traces from several seismic stations. The application is submitted via Pegasus (Container1), and Phase1 and Phase2 are executed in the MPI (Container2) and Storm (Container3) clusters respectively. Although both phases could be executed within the same environment, this setup demonstrates the flexibility of DIaaS to run applications across e-Infrastructures. In summary, DIaaS delivers specialized software to execute data-intensive applications in a scalable, efficient, and robust manner reducing the engineering time and computational cost.
Integration of Earth System Models and Workflow Management under iRODS for the Northeast Regional Earth System Modeling Project

NASA Astrophysics Data System (ADS)

Lengyel, F.; Yang, P.; Rosenzweig, B.; Vorosmarty, C. J.

2012-12-01

The Northeast Regional Earth System Model (NE-RESM, NSF Award #1049181) integrates weather research and forecasting models, terrestrial and aquatic ecosystem models, a water balance/transport model, and mesoscale and energy systems input-out economic models developed by interdisciplinary research team from academia and government with expertise in physics, biogeochemistry, engineering, energy, economics, and policy. NE-RESM is intended to forecast the implications of planning decisions on the region's environment, ecosystem services, energy systems and economy through the 21st century. Integration of model components and the development of cyberinfrastructure for interacting with the system is facilitated with the integrated Rule Oriented Data System (iRODS), a distributed data grid that provides archival storage with metadata facilities and a rule-based workflow engine for automating and auditing scientific workflows.
The Symbiotic Relationship between Scientific Workflow and Provenance (Invited)

NASA Astrophysics Data System (ADS)

Stephan, E.

2010-12-01

The purpose of this presentation is to describe the symbiotic nature of scientific workflows and provenance. We will also discuss the current trends and real world challenges facing these two distinct research areas. Although motivated differently, the needs of the international science communities are the glue that binds this relationship together. Understanding and articulating the science drivers to these communities is paramount as these technologies evolve and mature. Originally conceived for managing business processes, workflows are now becoming invaluable assets in both computational and experimental sciences. These reconfigurable, automated systems provide essential technology to perform complex analyses by coupling together geographically distributed disparate data sources and applications. As a result, workflows are capable of higher throughput in a shorter amount of time than performing the steps manually. Today many different workflow products exist; these could include Kepler and Taverna or similar products like MeDICI, developed at PNNL, that are standardized on the Business Process Execution Language (BPEL). Provenance, originating from the French term Provenir “to come from”, is used to describe the curation process of artwork as art is passed from owner to owner. The concept of provenance was adopted by digital libraries as a means to track the lineage of documents while standards such as the DublinCore began to emerge. In recent years the systems science community has increasingly expressed the need to expand the concept of provenance to formally articulate the history of scientific data. Communities such as the International Provenance and Annotation Workshop (IPAW) have formalized a provenance data model. The Open Provenance Model, and the W3C is hosting a provenance incubator group featuring the Proof Markup Language. Although both workflows and provenance have risen from different communities and operate independently, their mutual success is tied together, forming a symbiotic relationship where research and development advances in one effort can provide tremendous benefits to the other. For example, automating provenance extraction within scientific applications is still a relatively new concept; the workflow engine provides the framework to capture application specific operations, inputs, and resulting data. It provides a description of the process history and data flow by wrapping workflow components around the applications and data sources. On the other hand, a lack of cooperation between workflows and provenance can inhibit usefulness of both to science. Blindly tracking the execution history without having a true understanding of what kinds of questions end users may have makes the provenance indecipherable to the target users. Over the past nine years PNNL has been actively involved in provenance research in support of computational chemistry, molecular dynamics, biology, hydrology, and climate. PNNL has also been actively involved in efforts by the international community to develop open standards for provenance and the development of architectures to support provenance capture, storage, and querying. This presentation will provide real world use cases of how provenance and workflow can be leveraged and implemented to meet different needs and the challenges that lie ahead.
The Ophidia Stack: Toward Large Scale, Big Data Analytics Experiments for Climate Change

NASA Astrophysics Data System (ADS)

Fiore, S.; Williams, D. N.; D'Anca, A.; Nassisi, P.; Aloisio, G.

2015-12-01

The Ophidia project is a research effort on big data analytics facing scientific data analysis challenges in multiple domains (e.g. climate change). It provides a "datacube-oriented" framework responsible for atomically processing and manipulating scientific datasets, by providing a common way to run distributive tasks on large set of data fragments (chunks). Ophidia provides declarative, server-side, and parallel data analysis, jointly with an internal storage model able to efficiently deal with multidimensional data and a hierarchical data organization to manage large data volumes. The project relies on a strong background on high performance database management and On-Line Analytical Processing (OLAP) systems to manage large scientific datasets. The Ophidia analytics platform provides several data operators to manipulate datacubes (about 50), and array-based primitives (more than 100) to perform data analysis on large scientific data arrays. To address interoperability, Ophidia provides multiple server interfaces (e.g. OGC-WPS). From a client standpoint, a Python interface enables the exploitation of the framework into Python-based eco-systems/applications (e.g. IPython) and the straightforward adoption of a strong set of related libraries (e.g. SciPy, NumPy). The talk will highlight a key feature of the Ophidia framework stack: the "Analytics Workflow Management System" (AWfMS). The Ophidia AWfMS coordinates, orchestrates, optimises and monitors the execution of multiple scientific data analytics and visualization tasks, thus supporting "complex analytics experiments". Some real use cases related to the CMIP5 experiment will be discussed. In particular, with regard to the "Climate models intercomparison data analysis" case study proposed in the EU H2020 INDIGO-DataCloud project, workflows related to (i) anomalies, (ii) trend, and (iii) climate change signal analysis will be presented. Such workflows will be distributed across multiple sites - according to the datasets distribution - and will include intercomparison, ensemble, and outlier analysis. The two-level workflow solution envisioned in INDIGO (coarse grain for distributed tasks orchestration, and fine grain, at the level of a single data analytics cluster instance) will be presented and discussed.

Towards a Unified Architecture for Data-Intensive Seismology in VERCE

NASA Astrophysics Data System (ADS)

Klampanos, I.; Spinuso, A.; Trani, L.; Krause, A.; Garcia, C. R.; Atkinson, M.

2013-12-01

Modern seismology involves managing, storing and processing large datasets, typically geographically distributed across organisations. Performing computational experiments using these data generates more data, which in turn have to be managed, further analysed and frequently be made available within or outside the scientific community. As part of the EU-funded project VERCE (http://verce.eu), we research and develop a number of use-cases, interfacing technologies to satisfy the data-intensive requirements of modern seismology. Our solution seeks to support: (1) familiar programming environments to develop and execute experiments, in particular via Python/ObsPy, (2) a unified view of heterogeneous computing resources, public or private, through the adoption of workflows, (3) monitoring the experiments and validating the data products at varying granularities, via a comprehensive provenance system, (4) reproducibility of experiments and consistency in collaboration, via a shared registry of processing units and contextual metadata (computing resources, data, etc.) Here, we provide a brief account of these components and their roles in the proposed architecture. Our design integrates heterogeneous distributed systems, while allowing researchers to retain current practices and control data handling and execution via higher-level abstractions. At the core of our solution lies the workflow language Dispel. While Dispel can be used to express workflows at fine detail, it may also be used as part of meta- or job-submission workflows. User interaction can be provided through a visual editor or through custom applications on top of parameterisable workflows, which is the approach VERCE follows. According to our design, the scientist may use versions of Dispel/workflow processing elements offered by the VERCE library or override them introducing custom scientific code, using ObsPy. This approach has the advantage that, while the scientist uses a familiar tool, the resulting workflow can be executed on a number of underlying stream-processing engines, such as STORM or OGSA-DAI, transparently. While making efficient use of arbitrarily distributed resources and large data-sets is of priority, such processing requires adequate provenance tracking and monitoring. Hiding computation and orchestration details via a workflow system, allows us to embed provenance harvesting where appropriate without impeding the user's regular working patterns. Our provenance model is based on the W3C PROV standard and can provide information of varying granularity regarding execution, systems and data consumption/production. A video demonstrating a prototype provenance exploration tool can be found at http://bit.ly/15t0Fz0. Keeping experimental methodology and results open and accessible, as well as encouraging reproducibility and collaboration, is of central importance to modern science. As our users are expected to be based at different geographical locations, to have access to different computing resources and to employ customised scientific codes, the use of a shared registry of workflow components, implementations, data and computing resources is critical.
MyGeoHub: A Collaborative Geospatial Research and Education Platform

NASA Astrophysics Data System (ADS)

Kalyanam, R.; Zhao, L.; Biehl, L. L.; Song, C. X.; Merwade, V.; Villoria, N.

2017-12-01

Scientific research is increasingly collaborative and globally distributed; research groups now rely on web-based scientific tools and data management systems to simplify their day-to-day collaborative workflows. However, such tools often lack seamless interfaces, requiring researchers to contend with manual data transfers, annotation and sharing. MyGeoHub is a web platform that supports out-of-the-box, seamless workflows involving data ingestion, metadata extraction, analysis, sharing and publication. MyGeoHub is built on the HUBzero cyberinfrastructure platform and adds general-purpose software building blocks (GABBs), for geospatial data management, visualization and analysis. A data management building block iData, processes geospatial files, extracting metadata for keyword and map-based search while enabling quick previews. iData is pervasive, allowing access through a web interface, scientific tools on MyGeoHub or even mobile field devices via a data service API. GABBs includes a Python map library as well as map widgets that in a few lines of code, generate complete geospatial visualization web interfaces for scientific tools. GABBs also includes powerful tools that can be used with no programming effort. The GeoBuilder tool provides an intuitive wizard for importing multi-variable, geo-located time series data (typical of sensor readings, GPS trackers) to build visualizations supporting data filtering and plotting. MyGeoHub has been used in tutorials at scientific conferences and educational activities for K-12 students. MyGeoHub is also constantly evolving; the recent addition of Jupyter and R Shiny notebook environments enable reproducible, richly interactive geospatial analyses and applications ranging from simple pre-processing to published tools. MyGeoHub is not a monolithic geospatial science gateway, instead it supports diverse needs ranging from just a feature-rich data management system, to complex scientific tools and workflows.
Structured recording of intraoperative surgical workflows

NASA Astrophysics Data System (ADS)

Neumuth, T.; Durstewitz, N.; Fischer, M.; Strauss, G.; Dietz, A.; Meixensberger, J.; Jannin, P.; Cleary, K.; Lemke, H. U.; Burgert, O.

2006-03-01

Surgical Workflows are used for the methodical and scientific analysis of surgical interventions. The approach described here is a step towards developing surgical assist systems based on Surgical Workflows and integrated control systems for the operating room of the future. This paper describes concepts and technologies for the acquisition of Surgical Workflows by monitoring surgical interventions and their presentation. Establishing systems which support the Surgical Workflow in operating rooms requires a multi-staged development process beginning with the description of these workflows. A formalized description of surgical interventions is needed to create a Surgical Workflow. This description can be used to analyze and evaluate surgical interventions in detail. We discuss the subdivision of surgical interventions into work steps regarding different levels of granularity and propose a recording scheme for the acquisition of manual surgical work steps from running interventions. To support the recording process during the intervention, we introduce a new software architecture. Core of the architecture is our Surgical Workflow editor that is intended to deal with the manifold, complex and concurrent relations during an intervention. Furthermore, a method for an automatic generation of graphs is shown which is able to display the recorded surgical work steps of the interventions. Finally we conclude with considerations about extensions of our recording scheme to close the gap to S-PACS systems. The approach was used to record 83 surgical interventions from 6 intervention types from 3 different surgical disciplines: ENT surgery, neurosurgery and interventional radiology. The interventions were recorded at the University Hospital Leipzig, Germany and at the Georgetown University Hospital, Washington, D.C., USA.
AutoDrug: fully automated macromolecular crystallography workflows for fragment-based drug discovery

DOE Office of Scientific and Technical Information (OSTI.GOV)

Tsai, Yingssu; Stanford University, 333 Campus Drive, Mudd Building, Stanford, CA 94305-5080; McPhillips, Scott E.

New software has been developed for automating the experimental and data-processing stages of fragment-based drug discovery at a macromolecular crystallography beamline. A new workflow-automation framework orchestrates beamline-control and data-analysis software while organizing results from multiple samples. AutoDrug is software based upon the scientific workflow paradigm that integrates the Stanford Synchrotron Radiation Lightsource macromolecular crystallography beamlines and third-party processing software to automate the crystallography steps of the fragment-based drug-discovery process. AutoDrug screens a cassette of fragment-soaked crystals, selects crystals for data collection based on screening results and user-specified criteria and determines optimal data-collection strategies. It then collects and processes diffraction data,more » performs molecular replacement using provided models and detects electron density that is likely to arise from bound fragments. All processes are fully automated, i.e. are performed without user interaction or supervision. Samples can be screened in groups corresponding to particular proteins, crystal forms and/or soaking conditions. A single AutoDrug run is only limited by the capacity of the sample-storage dewar at the beamline: currently 288 samples. AutoDrug was developed in conjunction with RestFlow, a new scientific workflow-automation framework. RestFlow simplifies the design of AutoDrug by managing the flow of data and the organization of results and by orchestrating the execution of computational pipeline steps. It also simplifies the execution and interaction of third-party programs and the beamline-control system. Modeling AutoDrug as a scientific workflow enables multiple variants that meet the requirements of different user groups to be developed and supported. A workflow tailored to mimic the crystallography stages comprising the drug-discovery pipeline of CoCrystal Discovery Inc. has been deployed and successfully demonstrated. This workflow was run once on the same 96 samples that the group had examined manually and the workflow cycled successfully through all of the samples, collected data from the same samples that were selected manually and located the same peaks of unmodeled density in the resulting difference Fourier maps.« less
A practical workflow for making anatomical atlases for biological research.

PubMed

Wan, Yong; Lewis, A Kelsey; Colasanto, Mary; van Langeveld, Mark; Kardon, Gabrielle; Hansen, Charles

2012-01-01

The anatomical atlas has been at the intersection of science and art for centuries. These atlases are essential to biological research, but high-quality atlases are often scarce. Recent advances in imaging technology have made high-quality 3D atlases possible. However, until now there has been a lack of practical workflows using standard tools to generate atlases from images of biological samples. With certain adaptations, CG artists' workflow and tools, traditionally used in the film industry, are practical for building high-quality biological atlases. Researchers have developed a workflow for generating a 3D anatomical atlas using accessible artists' tools. They used this workflow to build a mouse limb atlas for studying the musculoskeletal system's development. This research aims to raise the awareness of using artists' tools in scientific research and promote interdisciplinary collaborations between artists and scientists. This video (http://youtu.be/g61C-nia9ms) demonstrates a workflow for creating an anatomical atlas.
Detecting distant homologies on protozoans metabolic pathways using scientific workflows.

PubMed

da Cruz, Sérgio Manuel Serra; Batista, Vanessa; Silva, Edno; Tosta, Frederico; Vilela, Clarissa; Cuadrat, Rafael; Tschoeke, Diogo; Dávila, Alberto M R; Campos, Maria Luiza Machado; Mattoso, Marta

2010-01-01

Bioinformatics experiments are typically composed of programs in pipelines manipulating an enormous quantity of data. An interesting approach for managing those experiments is through workflow management systems (WfMS). In this work we discuss WfMS features to support genome homology workflows and present some relevant issues for typical genomic experiments. Our evaluation used Kepler WfMS to manage a real genomic pipeline, named OrthoSearch, originally defined as a Perl script. We show a case study detecting distant homologies on trypanomatids metabolic pathways. Our results reinforce the benefits of WfMS over script languages and point out challenges to WfMS in distributed environments.
RISA: Remote Interface for Science Analysis

NASA Astrophysics Data System (ADS)

Gabriel, C.; Ibarra, A.; de La Calle, I.; Salgado, J.; Osuna, P.; Tapiador, D.

2008-08-01

The Scientific Analysis System (SAS) is the package for interactive and pipeline data reduction of all XMM-Newton data. Freely distributed by ESA to run under many different operating systems, the SAS has been used by almost every one of the 1600 refereed scientific publications obtained so far from the mission. We are developing RISA, the Remote Interface for Science Analysis, which makes it possible to run SAS through fully configurable web service workflows, enabling observers to access and analyse data making use of all of the existing SAS functionalities, without any installation/download of software/data. The workflows run primarily but not exclusively on the ESAC Grid, which offers scalable processing resources, directly connected to the XMM-Newton Science Archive. A first project internal version of RISA was issued in May 2007, a public release is expected already within this year.
Asterism: an integrated, complete, and open-source approach for running seismologist continuous data-intensive analysis on heterogeneous systems

NASA Astrophysics Data System (ADS)

Ferreira da Silva, R.; Filgueira, R.; Deelman, E.; Atkinson, M.

2016-12-01

We present Asterism, an open source data-intensive framework, which combines the Pegasus and dispel4py workflow systems. Asterism aims to simplify the effort required to develop data-intensive applications that run across multiple heterogeneous resources, without users having to: re-formulate their methods according to different enactment systems; manage the data distribution across systems; parallelize their methods; co-place and schedule their methods with computing resources; and store and transfer large/small volumes of data. Asterism's key element is to leverage the strengths of each workflow system: dispel4py allows developing scientific applications locally and then automatically parallelize and scale them on a wide range of HPC infrastructures with no changes to the application's code; Pegasus orchestrates the distributed execution of applications while providing portability, automated data management, recovery, debugging, and monitoring, without users needing to worry about the particulars of the target execution systems. Asterism leverages the level of abstractions provided by each workflow system to describe hybrid workflows where no information about the underlying infrastructure is required beforehand. The feasibility of Asterism has been evaluated using the seismic ambient noise cross-correlation application, a common data-intensive analysis pattern used by many seismologists. The application preprocesses (Phase1) and cross-correlates (Phase2) traces from several seismic stations. The Asterism workflow is implemented as a Pegasus workflow composed of two tasks (Phase1 and Phase2), where each phase represents a dispel4py workflow. Pegasus tasks describe the in/output data at a logical level, the data dependency between tasks, and the e-Infrastructures and the execution engine to run each dispel4py workflow. We have instantiated the workflow using data from 1000 stations from the IRIS services, and run it across two heterogeneous resources described as Docker containers: MPI (Container2) and Storm (Container3) clusters (Figure 1). Each dispel4py workflow is mapped to a particular execution engine, and data transfers between resources are automatically handled by Pegasus. Asterism is freely available online at http://github.com/dispel4py/pegasus_dispel4py.
From Peer-Reviewed to Peer-Reproduced in Scholarly Publishing: The Complementary Roles of Data Models and Workflows in Bioinformatics

PubMed Central

Zhao, Jun; Avila-Garcia, Maria Susana; Roos, Marco; Thompson, Mark; van der Horst, Eelke; Kaliyaperumal, Rajaram; Luo, Ruibang; Lee, Tin-Lap; Lam, Tak-wah; Edmunds, Scott C.; Sansone, Susanna-Assunta

2015-01-01

Motivation Reproducing the results from a scientific paper can be challenging due to the absence of data and the computational tools required for their analysis. In addition, details relating to the procedures used to obtain the published results can be difficult to discern due to the use of natural language when reporting how experiments have been performed. The Investigation/Study/Assay (ISA), Nanopublications (NP), and Research Objects (RO) models are conceptual data modelling frameworks that can structure such information from scientific papers. Computational workflow platforms can also be used to reproduce analyses of data in a principled manner. We assessed the extent by which ISA, NP, and RO models, together with the Galaxy workflow system, can capture the experimental processes and reproduce the findings of a previously published paper reporting on the development of SOAPdenovo2, a de novo genome assembler. Results Executable workflows were developed using Galaxy, which reproduced results that were consistent with the published findings. A structured representation of the information in the SOAPdenovo2 paper was produced by combining the use of ISA, NP, and RO models. By structuring the information in the published paper using these data and scientific workflow modelling frameworks, it was possible to explicitly declare elements of experimental design, variables, and findings. The models served as guides in the curation of scientific information and this led to the identification of inconsistencies in the original published paper, thereby allowing its authors to publish corrections in the form of an errata. Availability SOAPdenovo2 scripts, data, and results are available through the GigaScience Database: http://dx.doi.org/10.5524/100044; the workflows are available from GigaGalaxy: http://galaxy.cbiit.cuhk.edu.hk; and the representations using the ISA, NP, and RO models are available through the SOAPdenovo2 case study website http://isa-tools.github.io/soapdenovo2/. Contact: philippe.rocca-serra@oerc.ox.ac.uk and susanna-assunta.sansone@oerc.ox.ac.uk. PMID:26154165
Cancer Diagnosis Epigenomics Scientific Workflow Scheduling in the Cloud Computing Environment Using an Improved PSO Algorithm

PubMed

N, Sadhasivam; R, Balamurugan; M, Pandi

2018-01-27

Objective: Epigenetic modifications involving DNA methylation and histone statud are responsible for the stable maintenance of cellular phenotypes. Abnormalities may be causally involved in cancer development and therefore could have diagnostic potential. The field of epigenomics refers to all epigenetic modifications implicated in control of gene expression, with a focus on better understanding of human biology in both normal and pathological states. Epigenomics scientific workflow is essentially a data processing pipeline to automate the execution of various genome sequencing operations or tasks. Cloud platform is a popular computing platform for deploying large scale epigenomics scientific workflow. Its dynamic environment provides various resources to scientific users on a pay-per-use billing model. Scheduling epigenomics scientific workflow tasks is a complicated problem in cloud platform. We here focused on application of an improved particle swam optimization (IPSO) algorithm for this purpose. Methods: The IPSO algorithm was applied to find suitable resources and allocate epigenomics tasks so that the total cost was minimized for detection of epigenetic abnormalities of potential application for cancer diagnosis. Result: The results showed that IPSO based task to resource mapping reduced total cost by 6.83 percent as compared to the traditional PSO algorithm. Conclusion: The results for various cancer diagnosis tasks showed that IPSO based task to resource mapping can achieve better costs when compared to PSO based mapping for epigenomics scientific application workflow. Creative Commons Attribution License
Low Latency Workflow Scheduling and an Application of Hyperspectral Brightness Temperatures

NASA Astrophysics Data System (ADS)

Nguyen, P. T.; Chapman, D. R.; Halem, M.

2012-12-01

New system analytics for Big Data computing holds the promise of major scientific breakthroughs and discoveries from the exploration and mining of the massive data sets becoming available to the science community. However, such data intensive scientific applications face severe challenges in accessing, managing and analyzing petabytes of data. While the Hadoop MapReduce environment has been successfully applied to data intensive problems arising in business, there are still many scientific problem domains where limitations in the functionality of MapReduce systems prevent its wide adoption by those communities. This is mainly because MapReduce does not readily support the unique science discipline needs such as special science data formats, graphic and computational data analysis tools, maintaining high degrees of computational accuracies, and interfacing with application's existing components across heterogeneous computing processors. We address some of these limitations by exploiting the MapReduce programming model for satellite data intensive scientific problems and address scalability, reliability, scheduling, and data management issues when dealing with climate data records and their complex observational challenges. In addition, we will present techniques to support the unique Earth science discipline needs such as dealing with special science data formats (HDF and NetCDF). We have developed a Hadoop task scheduling algorithm that improves latency by 2x for a scientific workflow including the gridding of the EOS AIRS hyperspectral Brightness Temperatures (BT). This workflow processing algorithm has been tested at the Multicore Computing Center private Hadoop based Intel Nehalem cluster, as well as in a virtual mode under the Open Source Eucalyptus cloud. The 55TB AIRS hyperspectral L1b Brightness Temperature record has been gridded at the resolution of 0.5x1.0 degrees, and we have computed a 0.9 annual anti-correlation to the El Nino Southern oscillation in the Nino 4 region, as well as a 1.9 Kelvin decadal Arctic warming in the 4u and 12u spectral regions. Additionally, we will present the frequency of extreme global warming events by the use of a normalized maximum BT in a grid cell relative to its local standard deviation. A low-latency Hadoop scheduling environment maintains data integrity and fault tolerance in a MapReduce data intensive Cloud environment while improving the "time to solution" metric by 35% when compared to a more traditional parallel processing system for the same dataset. Our next step will be to improve the usability of our Hadoop task scheduling system, to enable rapid prototyping of data intensive experiments by means of processing "kernels". We will report on the performance and experience of implementing these experiments on the NEX testbed, and propose the use of a graphical directed acyclic graph (DAG) interface to help us develop on-demand scientific experiments. Our workflow system works within Hadoop infrastructure as a replacement for the FIFO or FairScheduler, thus the use of Apache "Pig" latin or other Apache tools may also be worth investigating on the NEX system to improve the usability of our workflow scheduling infrastructure for rapid experimentation.
Cloud Bursting with GlideinWMS: Means to satisfy ever increasing computing needs for Scientific Workflows

NASA Astrophysics Data System (ADS)

Mhashilkar, Parag; Tiradani, Anthony; Holzman, Burt; Larson, Krista; Sfiligoi, Igor; Rynge, Mats

2014-06-01

Scientific communities have been in the forefront of adopting new technologies and methodologies in the computing. Scientific computing has influenced how science is done today, achieving breakthroughs that were impossible to achieve several decades ago. For the past decade several such communities in the Open Science Grid (OSG) and the European Grid Infrastructure (EGI) have been using GlideinWMS to run complex application workflows to effectively share computational resources over the grid. GlideinWMS is a pilot-based workload management system (WMS) that creates on demand, a dynamically sized overlay HTCondor batch system on grid resources. At present, the computational resources shared over the grid are just adequate to sustain the computing needs. We envision that the complexity of the science driven by "Big Data" will further push the need for computational resources. To fulfill their increasing demands and/or to run specialized workflows, some of the big communities like CMS are investigating the use of cloud computing as Infrastructure-As-A-Service (IAAS) with GlideinWMS as a potential alternative to fill the void. Similarly, communities with no previous access to computing resources can use GlideinWMS to setup up a batch system on the cloud infrastructure. To enable this, the architecture of GlideinWMS has been extended to enable support for interfacing GlideinWMS with different Scientific and commercial cloud providers like HLT, FutureGrid, FermiCloud and Amazon EC2. In this paper, we describe a solution for cloud bursting with GlideinWMS. The paper describes the approach, architectural changes and lessons learned while enabling support for cloud infrastructures in GlideinWMS.
Cloud Bursting with GlideinWMS: Means to satisfy ever increasing computing needs for Scientific Workflows

DOE Office of Scientific and Technical Information (OSTI.GOV)

Mhashilkar, Parag; Tiradani, Anthony; Holzman, Burt

Scientific communities have been in the forefront of adopting new technologies and methodologies in the computing. Scientific computing has influenced how science is done today, achieving breakthroughs that were impossible to achieve several decades ago. For the past decade several such communities in the Open Science Grid (OSG) and the European Grid Infrastructure (EGI) have been using GlideinWMS to run complex application workflows to effectively share computational resources over the grid. GlideinWMS is a pilot-based workload management system (WMS) that creates on demand, a dynamically sized overlay HTCondor batch system on grid resources. At present, the computational resources shared overmore » the grid are just adequate to sustain the computing needs. We envision that the complexity of the science driven by 'Big Data' will further push the need for computational resources. To fulfill their increasing demands and/or to run specialized workflows, some of the big communities like CMS are investigating the use of cloud computing as Infrastructure-As-A-Service (IAAS) with GlideinWMS as a potential alternative to fill the void. Similarly, communities with no previous access to computing resources can use GlideinWMS to setup up a batch system on the cloud infrastructure. To enable this, the architecture of GlideinWMS has been extended to enable support for interfacing GlideinWMS with different Scientific and commercial cloud providers like HLT, FutureGrid, FermiCloud and Amazon EC2. In this paper, we describe a solution for cloud bursting with GlideinWMS. The paper describes the approach, architectural changes and lessons learned while enabling support for cloud infrastructures in GlideinWMS.« less
An Integrated Framework for Parameter-based Optimization of Scientific Workflows.

PubMed

Kumar, Vijay S; Sadayappan, P; Mehta, Gaurang; Vahi, Karan; Deelman, Ewa; Ratnakar, Varun; Kim, Jihie; Gil, Yolanda; Hall, Mary; Kurc, Tahsin; Saltz, Joel

2009-01-01

Data analysis processes in scientific applications can be expressed as coarse-grain workflows of complex data processing operations with data flow dependencies between them. Performance optimization of these workflows can be viewed as a search for a set of optimal values in a multi-dimensional parameter space. While some performance parameters such as grouping of workflow components and their mapping to machines do not a ect the accuracy of the output, others may dictate trading the output quality of individual components (and of the whole workflow) for performance. This paper describes an integrated framework which is capable of supporting performance optimizations along multiple dimensions of the parameter space. Using two real-world applications in the spatial data analysis domain, we present an experimental evaluation of the proposed framework.
Scientific Workflow Management in Proteomics

PubMed Central

de Bruin, Jeroen S.; Deelder, André M.; Palmblad, Magnus

2012-01-01

Data processing in proteomics can be a challenging endeavor, requiring extensive knowledge of many different software packages, all with different algorithms, data format requirements, and user interfaces. In this article we describe the integration of a number of existing programs and tools in Taverna Workbench, a scientific workflow manager currently being developed in the bioinformatics community. We demonstrate how a workflow manager provides a single, visually clear and intuitive interface to complex data analysis tasks in proteomics, from raw mass spectrometry data to protein identifications and beyond. PMID:22411703
Implementing bioinformatic workflows within the bioextract server

USDA-ARS?s Scientific Manuscript database

Computational workflows in bioinformatics are becoming increasingly important in the achievement of scientific advances. These workflows typically require the integrated use of multiple, distributed data sources and analytic tools. The BioExtract Server (http://bioextract.org) is a distributed servi...
Quality Metadata Management for Geospatial Scientific Workflows: from Retrieving to Assessing with Online Tools

NASA Astrophysics Data System (ADS)

Leibovici, D. G.; Pourabdollah, A.; Jackson, M.

2011-12-01

Experts and decision-makers use or develop models to monitor global and local changes of the environment. Their activities require the combination of data and processing services in a flow of operations and spatial data computations: a geospatial scientific workflow. The seamless ability to generate, re-use and modify a geospatial scientific workflow is an important requirement but the quality of outcomes is equally much important [1]. Metadata information attached to the data and processes, and particularly their quality, is essential to assess the reliability of the scientific model that represents a workflow [2]. Managing tools, dealing with qualitative and quantitative metadata measures of the quality associated with a workflow, are, therefore, required for the modellers. To ensure interoperability, ISO and OGC standards [3] are to be adopted, allowing for example one to define metadata profiles and to retrieve them via web service interfaces. However these standards need a few extensions when looking at workflows, particularly in the context of geoprocesses metadata. We propose to fill this gap (i) at first through the provision of a metadata profile for the quality of processes, and (ii) through providing a framework, based on XPDL [4], to manage the quality information. Web Processing Services are used to implement a range of metadata analyses on the workflow in order to evaluate and present quality information at different levels of the workflow. This generates the metadata quality, stored in the XPDL file. The focus is (a) on the visual representations of the quality, summarizing the retrieved quality information either from the standardized metadata profiles of the components or from non-standard quality information e.g., Web 2.0 information, and (b) on the estimated qualities of the outputs derived from meta-propagation of uncertainties (a principle that we have introduced [5]). An a priori validation of the future decision-making supported by the outputs of the workflow once run, is then provided using the meta-propagated qualities, obtained without running the workflow [6], together with the visualization pointing out the need to improve the workflow with better data or better processes on the workflow graph itself. [1] Leibovici, DG, Hobona, G Stock, K Jackson, M (2009) Qualifying geospatial workfow models for adaptive controlled validity and accuracy. In: IEEE 17th GeoInformatics, 1-5 [2] Leibovici, DG, Pourabdollah, A (2010a) Workflow Uncertainty using a Metamodel Framework and Metadata for Data and Processes. OGC TC/PC Meetings, September 2010, Toulouse, France [3] OGC (2011) www.opengeospatial.org [4] XPDL (2008) Workflow Process Definition Interface - XML Process Definition Language.Workflow Management Coalition, Document WfMC-TC-1025, 2008 [5] Leibovici, DG Pourabdollah, A Jackson, M (2011) Meta-propagation of Uncertainties for Scientific Workflow Management in Interoperable Spatial Data Infrastructures. In: Proceedings of the European Geosciences Union (EGU2011), April 2011, Austria [6] Pourabdollah, A Leibovici, DG Jackson, M (2011) MetaPunT: an Open Source tool for Meta-Propagation of uncerTainties in Geospatial Processing. In: Proceedings of OSGIS2011, June 2011, Nottingham, UK
Knowledge Annotations in Scientific Workflows: An Implementation in Kepler

DOE Office of Scientific and Technical Information (OSTI.GOV)

Gandara, Aida G.; Chin, George; Pinheiro Da Silva, Paulo

2011-07-20

Abstract. Scientic research products are the result of long-term collaborations between teams. Scientic workfows are capable of helping scientists in many ways including the collection of information as to howresearch was conducted, e.g. scientic workfow tools often collect and manage information about datasets used and data transformations. However,knowledge about why data was collected is rarely documented in scientic workflows. In this paper we describe a prototype system built to support the collection of scientic expertise that infuences scientic analysis. Through evaluating a scientic research eort underway at Pacific Northwest National Laboratory, we identied features that would most benefit PNNL scientistsmore » in documenting how and why they conduct their research making this information available to the entire team. The prototype system was built by enhancing the Kepler Scientic Work-flow System to create knowledge-annotated scientic workfows and topublish them as semantic annotations.« less
Support for Taverna workflows in the VPH-Share cloud platform.

PubMed

Kasztelnik, Marek; Coto, Ernesto; Bubak, Marian; Malawski, Maciej; Nowakowski, Piotr; Arenas, Juan; Saglimbeni, Alfredo; Testi, Debora; Frangi, Alejandro F

2017-07-01

To address the increasing need for collaborative endeavours within the Virtual Physiological Human (VPH) community, the VPH-Share collaborative cloud platform allows researchers to expose and share sequences of complex biomedical processing tasks in the form of computational workflows. The Taverna Workflow System is a very popular tool for orchestrating complex biomedical & bioinformatics processing tasks in the VPH community. This paper describes the VPH-Share components that support the building and execution of Taverna workflows, and explains how they interact with other VPH-Share components to improve the capabilities of the VPH-Share platform. Taverna workflow support is delivered by the Atmosphere cloud management platform and the VPH-Share Taverna plugin. These components are explained in detail, along with the two main procedures that were developed to enable this seamless integration: workflow composition and execution. 1) Seamless integration of VPH-Share with other components and systems. 2) Extended range of different tools for workflows. 3) Successful integration of scientific workflows from other VPH projects. 4) Execution speed improvement for medical applications. The presented workflow integration provides VPH-Share users with a wide range of different possibilities to compose and execute workflows, such as desktop or online composition, online batch execution, multithreading, remote execution, etc. The specific advantages of each supported tool are presented, as are the roles of Atmosphere and the VPH-Share plugin within the VPH-Share project. The combination of the VPH-Share plugin and Atmosphere engenders the VPH-Share infrastructure with far more flexible, powerful and usable capabilities for the VPH-Share community. As both components can continue to evolve and improve independently, we acknowledge that further improvements are still to be developed and will be described. Copyright © 2017 Elsevier B.V. All rights reserved.
Multi-Objective Approach for Energy-Aware Workflow Scheduling in Cloud Computing Environments

PubMed Central

Kadima, Hubert; Granado, Bertrand

2013-01-01

We address the problem of scheduling workflow applications on heterogeneous computing systems like cloud computing infrastructures. In general, the cloud workflow scheduling is a complex optimization problem which requires considering different criteria so as to meet a large number of QoS (Quality of Service) requirements. Traditional research in workflow scheduling mainly focuses on the optimization constrained by time or cost without paying attention to energy consumption. The main contribution of this study is to propose a new approach for multi-objective workflow scheduling in clouds, and present the hybrid PSO algorithm to optimize the scheduling performance. Our method is based on the Dynamic Voltage and Frequency Scaling (DVFS) technique to minimize energy consumption. This technique allows processors to operate in different voltage supply levels by sacrificing clock frequencies. This multiple voltage involves a compromise between the quality of schedules and energy. Simulation results on synthetic and real-world scientific applications highlight the robust performance of the proposed approach. PMID:24319361

Multi-objective approach for energy-aware workflow scheduling in cloud computing environments.

PubMed

Yassa, Sonia; Chelouah, Rachid; Kadima, Hubert; Granado, Bertrand

2013-01-01

We address the problem of scheduling workflow applications on heterogeneous computing systems like cloud computing infrastructures. In general, the cloud workflow scheduling is a complex optimization problem which requires considering different criteria so as to meet a large number of QoS (Quality of Service) requirements. Traditional research in workflow scheduling mainly focuses on the optimization constrained by time or cost without paying attention to energy consumption. The main contribution of this study is to propose a new approach for multi-objective workflow scheduling in clouds, and present the hybrid PSO algorithm to optimize the scheduling performance. Our method is based on the Dynamic Voltage and Frequency Scaling (DVFS) technique to minimize energy consumption. This technique allows processors to operate in different voltage supply levels by sacrificing clock frequencies. This multiple voltage involves a compromise between the quality of schedules and energy. Simulation results on synthetic and real-world scientific applications highlight the robust performance of the proposed approach.
Distributed Data Integration Infrastructure

DOE Office of Scientific and Technical Information (OSTI.GOV)

Critchlow, T; Ludaescher, B; Vouk, M

The Internet is becoming the preferred method for disseminating scientific data from a variety of disciplines. This can result in information overload on the part of the scientists, who are unable to query all of the relevant sources, even if they knew where to find them, what they contained, how to interact with them, and how to interpret the results. A related issue is keeping up with current trends in information technology often taxes the end-user's expertise and time. Thus instead of benefiting from this information rich environment, scientists become experts on a small number of sources and technologies, usemore » them almost exclusively, and develop a resistance to innovations that can enhance their productivity. Enabling information based scientific advances, in domains such as functional genomics, requires fully utilizing all available information and the latest technologies. In order to address this problem we are developing a end-user centric, domain-sensitive workflow-based infrastructure, shown in Figure 1, that will allow scientists to design complex scientific workflows that reflect the data manipulation required to perform their research without an undue burden. We are taking a three-tiered approach to designing this infrastructure utilizing (1) abstract workflow definition, construction, and automatic deployment, (2) complex agent-based workflow execution and (3) automatic wrapper generation. In order to construct a workflow, the scientist defines an abstract workflow (AWF) in terminology (semantics and context) that is familiar to him/her. This AWF includes all of the data transformations, selections, and analyses required by the scientist, but does not necessarily specify particular data sources. This abstract workflow is then compiled into an executable workflow (EWF, in our case XPDL) that is then evaluated and executed by the workflow engine. This EWF contains references to specific data source and interfaces capable of performing the desired actions. In order to provide access to the largest number of resources possible, our lowest level utilizes automatic wrapper generation techniques to create information and data wrappers capable of interacting with the complex interfaces typical in scientific analysis. The remainder of this document outlines our work in these three areas, the impact our work has made, and our plans for the future.« less
Automated metadata, provenance cataloging and navigable interfaces: ensuring the usefulness of extreme-scale data

DOE Office of Scientific and Technical Information (OSTI.GOV)

Schissel, David; Greenwald, Martin

The MPO (Metadata, Provenance, Ontology) Project successfully addressed the goal of improving the usefulness and traceability of scientific data by building a system that could capture and display all steps in the process of creating, analyzing and disseminating that data. Throughout history, scientists have generated handwritten logbooks to keep track of data, their hypotheses, assumptions, experimental setup, and computational processes as well as reflections on observations and issues encountered. Over the last several decades, with the growth of personal computers, handheld devices, and the World Wide Web, the handwritten logbook has begun to be replaced by electronic logbooks. This transitionmore » has brought increased capability such as supporting multi-media, hypertext, and fast searching. However, content creation and metadata (a set of data that describes and gives information about other data) capturing has for the most part remained a manual activity just as it was with handwritten logbooks. This has led to a fragmentation of data, processing, and annotation that has only accelerated as scientific workflows continue to increase in complexity. From a scientific perspective, it is very important to be able to understand the lineage of any piece of data: who, what, when, how, and why. This is typically referred to as data provenance. The fragmentation discussed previously often means that data provenance is lost. As scientific workflows move to powerful computers and become more complex, the ability to track all of the steps involved in creating a piece of data become even more difficult. It was the goal of the MPO (Metadata, Provenance, Ontology) Project to create a system (the MPO System) that allows for automatic provenance and metadata capturing in such a way to allow easy searching and browsing. This goal needed to be accomplished in a general way so that it may be used across a broad range of scientific domains, yet allow the addition of vocabulary (Ontology) that is domain specific as is required for intelligent searching and browsing in the scientific context. Through the creation and deployment of the MPO system, the goals of the project were achieved. An enhanced metadata, provenance, and ontology storage system was created. This was combined with innovative methodologies for navigating and exploring these data using a web browser for both experimental and simulation-based scientific research. In addition, a system to allow scientists to instrument their existing workflows for automatic metadata and provenance is part of the MPO system. In that way, a scientist can continue to use their existing methodology yet easily document their work. Workflows and data provenance can be displayed either graphically or in an electronic notebook format and support advanced search features including via ontology. The MPO system was successfully used in both Climate and Magnetic Fusion Energy Research. The software for the MPO system is located at https://github.com/MPO-Group/MPO and is open source distributed under the Revised BSD License. A demonstration site of the MPO system is open to the public and is available at https://mpo.psfc.mit.edu/. A Docker container release of the command line client is available for public download using the command docker pull jcwright/mpo-cli at https://hub.docker.com/r/jcwright/mpo-cli.« less
Using CyberShake Workflows to Manage Big Seismic Hazard Data on Large-Scale Open-Science HPC Resources

NASA Astrophysics Data System (ADS)

Callaghan, S.; Maechling, P. J.; Juve, G.; Vahi, K.; Deelman, E.; Jordan, T. H.

2015-12-01

The CyberShake computational platform, developed by the Southern California Earthquake Center (SCEC), is an integrated collection of scientific software and middleware that performs 3D physics-based probabilistic seismic hazard analysis (PSHA) for Southern California. CyberShake integrates large-scale and high-throughput research codes to produce probabilistic seismic hazard curves for individual locations of interest and hazard maps for an entire region. A recent CyberShake calculation produced about 500,000 two-component seismograms for each of 336 locations, resulting in over 300 million synthetic seismograms in a Los Angeles-area probabilistic seismic hazard model. CyberShake calculations require a series of scientific software programs. Early computational stages produce data used as inputs by later stages, so we describe CyberShake calculations using a workflow definition language. Scientific workflow tools automate and manage the input and output data and enable remote job execution on large-scale HPC systems. To satisfy the requests of broad impact users of CyberShake data, such as seismologists, utility companies, and building code engineers, we successfully completed CyberShake Study 15.4 in April and May 2015, calculating a 1 Hz urban seismic hazard map for Los Angeles. We distributed the calculation between the NSF Track 1 system NCSA Blue Waters, the DOE Leadership-class system OLCF Titan, and USC's Center for High Performance Computing. This study ran for over 5 weeks, burning about 1.1 million node-hours and producing over half a petabyte of data. The CyberShake Study 15.4 results doubled the maximum simulated seismic frequency from 0.5 Hz to 1.0 Hz as compared to previous studies, representing a factor of 16 increase in computational complexity. We will describe how our workflow tools supported splitting the calculation across multiple systems. We will explain how we modified CyberShake software components, including GPU implementations and migrating from file-based communication to MPI messaging, to greatly reduce the I/O demands and node-hour requirements of CyberShake. We will also present performance metrics from CyberShake Study 15.4, and discuss challenges that producers of Big Data on open-science HPC resources face moving forward.
An open source workflow for 3D printouts of scientific data volumes

NASA Astrophysics Data System (ADS)

Loewe, P.; Klump, J. F.; Wickert, J.; Ludwig, M.; Frigeri, A.

2013-12-01

As the amount of scientific data continues to grow, researchers need new tools to help them visualize complex data. Immersive data-visualisations are helpful, yet fail to provide tactile feedback and sensory feedback on spatial orientation, as provided from tangible objects. The gap in sensory feedback from virtual objects leads to the development of tangible representations of geospatial information to solve real world problems. Examples are animated globes [1], interactive environments like tangible GIS [2], and on demand 3D prints. The production of a tangible representation of a scientific data set is one step in a line of scientific thinking, leading from the physical world into scientific reasoning and back: The process starts with a physical observation, or from a data stream generated by an environmental sensor. This data stream is turned into a geo-referenced data set. This data is turned into a volume representation which is converted into command sequences for the printing device, leading to the creation of a 3D printout. As a last, but crucial step, this new object has to be documented and linked to the associated metadata, and curated in long term repositories to preserve its scientific meaning and context. The workflow to produce tangible 3D data-prints from science data at the German Research Centre for Geosciences (GFZ) was implemented as a software based on the Free and Open Source Geoinformatics tools GRASS GIS and Paraview. The workflow was successfully validated in various application scenarios at GFZ using a RapMan printer to create 3D specimens of elevation models, geological underground models, ice penetrating radar soundings for planetology, and space time stacks for Tsunami model quality assessment. While these first pilot applications have demonstrated the feasibility of the overall approach [3], current research focuses on the provision of the workflow as Software as a Service (SAAS), thematic generalisation of information content and long term curation. [1] http://www.arcscience.com/systemDetails/omniTechnology.html [2] http://video.esri.com/watch/53/landscape-design-with-tangible-gis [3] Löwe et al. (2013), Geophysical Research Abstracts, Vol. 15, EGU2013-1544-1.
A big data approach for climate change indicators processing in the CLIP-C project

NASA Astrophysics Data System (ADS)

D'Anca, Alessandro; Conte, Laura; Palazzo, Cosimo; Fiore, Sandro; Aloisio, Giovanni

2016-04-01

Defining and implementing processing chains with multiple (e.g. tens or hundreds of) data analytics operators can be a real challenge in many practical scientific use cases such as climate change indicators. This is usually done via scripts (e.g. bash) on the client side and requires climate scientists to take care of, implement and replicate workflow-like control logic aspects (which may be error-prone too) in their scripts, along with the expected application-level part. Moreover, the big amount of data and the strong I/O demand pose additional challenges related to the performance. In this regard, production-level tools for climate data analysis are mostly sequential and there is a lack of big data analytics solutions implementing fine-grain data parallelism or adopting stronger parallel I/O strategies, data locality, workflow optimization, etc. High-level solutions leveraging on workflow-enabled big data analytics frameworks for eScience could help scientists in defining and implementing the workflows related to their experiments by exploiting a more declarative, efficient and powerful approach. This talk will start introducing the main needs and challenges regarding big data analytics workflow management for eScience and will then provide some insights about the implementation of some real use cases related to some climate change indicators on large datasets produced in the context of the CLIP-C project - a EU FP7 project aiming at providing access to climate information of direct relevance to a wide variety of users, from scientists to policy makers and private sector decision makers. All the proposed use cases have been implemented exploiting the Ophidia big data analytics framework. The software stack includes an internal workflow management system, which coordinates, orchestrates, and optimises the execution of multiple scientific data analytics and visualization tasks. Real-time workflow monitoring execution is also supported through a graphical user interface. In order to address the challenges of the use cases, the implemented data analytics workflows include parallel data analysis, metadata management, virtual file system tasks, maps generation, rolling of datasets, and import/export of datasets in NetCDF format. The use cases have been implemented on a HPC cluster of 8-nodes (16-cores/node) of the Athena Cluster available at the CMCC Supercomputing Centre. Benchmark results will be also presented during the talk.
IQ-Station: A Low Cost Portable Immersive Environment

DOE Office of Scientific and Technical Information (OSTI.GOV)

Eric Whiting; Patrick O'Leary; William Sherman

2010-11-01

The emergence of inexpensive 3D TV’s, affordable input and rendering hardware and open-source software has created a yeasty atmosphere for the development of low-cost immersive environments (IE). A low cost IE system, or IQ-station, fashioned from commercial off the shelf technology (COTS), coupled with a targeted immersive application can be a viable laboratory instrument for enhancing scientific workflow for exploration and analysis. The use of an IQ-station in a laboratory setting also has the potential of quickening the adoption of a more sophisticated immersive environment as a critical enabler in modern scientific and engineering workflows. Prior work in immersive environmentsmore » generally required either a head mounted display (HMD) system or a large projector-based implementation both of which have limitations in terms of cost, usability, or space requirements. The solution presented here provides an alternative platform providing a reasonable immersive experience that addresses those limitations. Our work brings together the needed hardware and software to create a fully integrated immersive display and interface system that can be readily deployed in laboratories and common workspaces. By doing so, it is now feasible for immersive technologies to be included in researchers’ day-to-day workflows. The IQ-Station sets the stage for much wider adoption of immersive environments outside the small communities of virtual reality centers.« less
Dynamic reusable workflows for ocean science

USGS Publications Warehouse

Signell, Richard; Fernandez, Filipe; Wilcox, Kyle

2016-01-01

Digital catalogs of ocean data have been available for decades, but advances in standardized services and software for catalog search and data access make it now possible to create catalog-driven workflows that automate — end-to-end — data search, analysis and visualization of data from multiple distributed sources. Further, these workflows may be shared, reused and adapted with ease. Here we describe a workflow developed within the US Integrated Ocean Observing System (IOOS) which automates the skill-assessment of water temperature forecasts from multiple ocean forecast models, allowing improved forecast products to be delivered for an open water swim event. A series of Jupyter Notebooks are used to capture and document the end-to-end workflow using a collection of Python tools that facilitate working with standardized catalog and data services. The workflow first searches a catalog of metadata using the Open Geospatial Consortium (OGC) Catalog Service for the Web (CSW), then accesses data service endpoints found in the metadata records using the OGC Sensor Observation Service (SOS) for in situ sensor data and OPeNDAP services for remotely-sensed and model data. Skill metrics are computed and time series comparisons of forecast model and observed data are displayed interactively, leveraging the capabilities of modern web browsers. The resulting workflow not only solves a challenging specific problem, but highlights the benefits of dynamic, reusable workflows in general. These workflows adapt as new data enters the data system, facilitate reproducible science, provide templates from which new scientific workflows can be developed, and encourage data providers to use standardized services. As applied to the ocean swim event, the workflow exposed problems with two of the ocean forecast products which led to improved regional forecasts once errors were corrected. While the example is specific, the approach is general, and we hope to see increased use of dynamic notebooks across the geoscience domains.
A scientific workflow framework for (13)C metabolic flux analysis.

PubMed

Dalman, Tolga; Wiechert, Wolfgang; Nöh, Katharina

2016-08-20

Metabolic flux analysis (MFA) with (13)C labeling data is a high-precision technique to quantify intracellular reaction rates (fluxes). One of the major challenges of (13)C MFA is the interactivity of the computational workflow according to which the fluxes are determined from the input data (metabolic network model, labeling data, and physiological rates). Here, the workflow assembly is inevitably determined by the scientist who has to consider interacting biological, experimental, and computational aspects. Decision-making is context dependent and requires expertise, rendering an automated evaluation process hardly possible. Here, we present a scientific workflow framework (SWF) for creating, executing, and controlling on demand (13)C MFA workflows. (13)C MFA-specific tools and libraries, such as the high-performance simulation toolbox 13CFLUX2, are wrapped as web services and thereby integrated into a service-oriented architecture. Besides workflow steering, the SWF features transparent provenance collection and enables full flexibility for ad hoc scripting solutions. To handle compute-intensive tasks, cloud computing is supported. We demonstrate how the challenges posed by (13)C MFA workflows can be solved with our approach on the basis of two proof-of-concept use cases. Copyright © 2015 Elsevier B.V. All rights reserved.
Toward server-side, high performance climate change data analytics in the Earth System Grid Federation (ESGF) eco-system

NASA Astrophysics Data System (ADS)

Fiore, Sandro; Williams, Dean; Aloisio, Giovanni

2016-04-01

In many scientific domains such as climate, data is often n-dimensional and requires tools that support specialized data types and primitives to be properly stored, accessed, analysed and visualized. Moreover, new challenges arise in large-scale scenarios and eco-systems where petabytes (PB) of data can be available and data can be distributed and/or replicated (e.g., the Earth System Grid Federation (ESGF) serving the Coupled Model Intercomparison Project, Phase 5 (CMIP5) experiment, providing access to 2.5PB of data for the Intergovernmental Panel on Climate Change (IPCC) Fifth Assessment Report (AR5). Most of the tools currently available for scientific data analysis in the climate domain fail at large scale since they: (1) are desktop based and need the data locally; (2) are sequential, so do not benefit from available multicore/parallel machines; (3) do not provide declarative languages to express scientific data analysis tasks; (4) are domain-specific, which ties their adoption to a specific domain; and (5) do not provide a workflow support, to enable the definition of complex "experiments". The Ophidia project aims at facing most of the challenges highlighted above by providing a big data analytics framework for eScience. Ophidia provides declarative, server-side, and parallel data analysis, jointly with an internal storage model able to efficiently deal with multidimensional data and a hierarchical data organization to manage large data volumes ("datacubes"). The project relies on a strong background of high performance database management and OLAP systems to manage large scientific data sets. It also provides a native workflow management support, to define processing chains and workflows with tens to hundreds of data analytics operators to build real scientific use cases. With regard to interoperability aspects, the talk will present the contribution provided both to the RDA Working Group on Array Databases, and the Earth System Grid Federation (ESGF) Compute Working Team. Also highlighted will be the results of large scale climate model intercomparison data analysis experiments, for example: (1) defined in the context of the EU H2020 INDIGO-DataCloud project; (2) implemented in a real geographically distributed environment involving CMCC (Italy) and LLNL (US) sites; (3) exploiting Ophidia as server-side, parallel analytics engine; and (4) applied on real CMIP5 data sets available through ESGF.
The BioExtract Server: a web-based bioinformatic workflow platform

PubMed Central

Lushbough, Carol M.; Jennewein, Douglas M.; Brendel, Volker P.

2011-01-01

The BioExtract Server (bioextract.org) is an open, web-based system designed to aid researchers in the analysis of genomic data by providing a platform for the creation of bioinformatic workflows. Scientific workflows are created within the system by recording tasks performed by the user. These tasks may include querying multiple, distributed data sources, saving query results as searchable data extracts, and executing local and web-accessible analytic tools. The series of recorded tasks can then be saved as a reproducible, sharable workflow available for subsequent execution with the original or modified inputs and parameter settings. Integrated data resources include interfaces to the National Center for Biotechnology Information (NCBI) nucleotide and protein databases, the European Molecular Biology Laboratory (EMBL-Bank) non-redundant nucleotide database, the Universal Protein Resource (UniProt), and the UniProt Reference Clusters (UniRef) database. The system offers access to numerous preinstalled, curated analytic tools and also provides researchers with the option of selecting computational tools from a large list of web services including the European Molecular Biology Open Software Suite (EMBOSS), BioMoby, and the Kyoto Encyclopedia of Genes and Genomes (KEGG). The system further allows users to integrate local command line tools residing on their own computers through a client-side Java applet. PMID:21546552
JMS: An Open Source Workflow Management System and Web-Based Cluster Front-End for High Performance Computing.

PubMed

Brown, David K; Penkler, David L; Musyoka, Thommas M; Bishop, Özlem Tastan

2015-01-01

Complex computational pipelines are becoming a staple of modern scientific research. Often these pipelines are resource intensive and require days of computing time. In such cases, it makes sense to run them over high performance computing (HPC) clusters where they can take advantage of the aggregated resources of many powerful computers. In addition to this, researchers often want to integrate their workflows into their own web servers. In these cases, software is needed to manage the submission of jobs from the web interface to the cluster and then return the results once the job has finished executing. We have developed the Job Management System (JMS), a workflow management system and web interface for high performance computing (HPC). JMS provides users with a user-friendly web interface for creating complex workflows with multiple stages. It integrates this workflow functionality with the resource manager, a tool that is used to control and manage batch jobs on HPC clusters. As such, JMS combines workflow management functionality with cluster administration functionality. In addition, JMS provides developer tools including a code editor and the ability to version tools and scripts. JMS can be used by researchers from any field to build and run complex computational pipelines and provides functionality to include these pipelines in external interfaces. JMS is currently being used to house a number of bioinformatics pipelines at the Research Unit in Bioinformatics (RUBi) at Rhodes University. JMS is an open-source project and is freely available at https://github.com/RUBi-ZA/JMS.
JMS: An Open Source Workflow Management System and Web-Based Cluster Front-End for High Performance Computing

PubMed Central

Brown, David K.; Penkler, David L.; Musyoka, Thommas M.; Bishop, Özlem Tastan

2015-01-01

Complex computational pipelines are becoming a staple of modern scientific research. Often these pipelines are resource intensive and require days of computing time. In such cases, it makes sense to run them over high performance computing (HPC) clusters where they can take advantage of the aggregated resources of many powerful computers. In addition to this, researchers often want to integrate their workflows into their own web servers. In these cases, software is needed to manage the submission of jobs from the web interface to the cluster and then return the results once the job has finished executing. We have developed the Job Management System (JMS), a workflow management system and web interface for high performance computing (HPC). JMS provides users with a user-friendly web interface for creating complex workflows with multiple stages. It integrates this workflow functionality with the resource manager, a tool that is used to control and manage batch jobs on HPC clusters. As such, JMS combines workflow management functionality with cluster administration functionality. In addition, JMS provides developer tools including a code editor and the ability to version tools and scripts. JMS can be used by researchers from any field to build and run complex computational pipelines and provides functionality to include these pipelines in external interfaces. JMS is currently being used to house a number of bioinformatics pipelines at the Research Unit in Bioinformatics (RUBi) at Rhodes University. JMS is an open-source project and is freely available at https://github.com/RUBi-ZA/JMS. PMID:26280450
Cyberinfrastructure at IRIS: Challenges and Solutions Providing Integrated Data Access to EarthScope and Other Earth Science Data

NASA Astrophysics Data System (ADS)

Ahern, T. K.; Barga, R.; Casey, R.; Kamb, L.; Parastatidis, S.; Stromme, S.; Weertman, B. T.

2008-12-01

While mature methods of accessing seismic data from the IRIS DMC have existed for decades, the demands for improved interdisciplinary data integration call for new approaches. Talented software teams at the IRIS DMC, UNAVCO and the ICDP in Germany, have been developing web services for all EarthScope data including data from USArray, PBO and SAFOD. These web services are based upon SOAP and WSDL. The EarthScope Data Portal was the first external system to access data holdings from the IRIS DMC using Web Services. EarthScope will also draw more heavily upon products to aid in cross-disciplinary data reuse. A Product Management System called SPADE allows archive of and access to heterogeneous data products, presented as XML documents, at the IRIS DMC. Searchable metadata are extracted from the XML and enable powerful searches for products from EarthScope and other data sources. IRIS is teaming with the External Research Group at Microsoft Research to leverage a powerful Scientific Workflow Engine (Trident) and interact with the web services developed at centers such as IRIS to enable access to data services as well as computational services. We believe that this approach will allow web- based control of workflows and the invocation of computational services that transform data. This capability will greatly improve access to data across scientific disciplines. This presentation will review some of the traditional access tools as well as many of the newer approaches that use web services, scientific workflow to improve interdisciplinary data access.
A cyber-enabled spatial decision support system to inventory Mangroves in Mozambique: coupling scientific workflows and cloud computing

Treesearch

Wenwu Tang; Wenpeng Feng; Meijuan Jia; Jiyang Shi; Huifang Zuo; Christina E. Stringer; Carl C. Trettin

2017-01-01

Mangroves are an important terrestrial carbon reservoir with numerous ecosystem services. Yet, it is difficult to inventory mangroves because of their low accessibility. A sampling approach that produces accurate assessment while maximizing logistical integrity of inventory operation is often required. Spatial decision support systems (SDSSs) provide support for...
geoKepler Workflow Module for Computationally Scalable and Reproducible Geoprocessing and Modeling

NASA Astrophysics Data System (ADS)

Cowart, C.; Block, J.; Crawl, D.; Graham, J.; Gupta, A.; Nguyen, M.; de Callafon, R.; Smarr, L.; Altintas, I.

2015-12-01

The NSF-funded WIFIRE project has developed an open-source, online geospatial workflow platform for unifying geoprocessing tools and models for for fire and other geospatially dependent modeling applications. It is a product of WIFIRE's objective to build an end-to-end cyberinfrastructure for real-time and data-driven simulation, prediction and visualization of wildfire behavior. geoKepler includes a set of reusable GIS components, or actors, for the Kepler Scientific Workflow System (https://kepler-project.org). Actors exist for reading and writing GIS data in formats such as Shapefile, GeoJSON, KML, and using OGC web services such as WFS. The actors also allow for calling geoprocessing tools in other packages such as GDAL and GRASS. Kepler integrates functions from multiple platforms and file formats into one framework, thus enabling optimal GIS interoperability, model coupling, and scalability. Products of the GIS actors can be fed directly to models such as FARSITE and WRF. Kepler's ability to schedule and scale processes using Hadoop and Spark also makes geoprocessing ultimately extensible and computationally scalable. The reusable workflows in geoKepler can be made to run automatically when alerted by real-time environmental conditions. Here, we show breakthroughs in the speed of creating complex data for hazard assessments with this platform. We also demonstrate geoKepler workflows that use Data Assimilation to ingest real-time weather data into wildfire simulations, and for data mining techniques to gain insight into environmental conditions affecting fire behavior. Existing machine learning tools and libraries such as R and MLlib are being leveraged for this purpose in Kepler, as well as Kepler's Distributed Data Parallel (DDP) capability to provide a framework for scalable processing. geoKepler workflows can be executed via an iPython notebook as a part of a Jupyter hub at UC San Diego for sharing and reporting of the scientific analysis and results from various runs of geoKepler workflows. The communication between iPython and Kepler workflow executions is established through an iPython magic function for Kepler that we have implemented. In summary, geoKepler is an ecosystem that makes geospatial processing and analysis of any kind programmable, reusable, scalable and sharable.
Automated metadata--final project report

DOE Office of Scientific and Technical Information (OSTI.GOV)

Schissel, David

This report summarizes the work of the Automated Metadata, Provenance Cataloging, and Navigable Interfaces: Ensuring the Usefulness of Extreme-Scale Data Project (MPO Project) funded by the United States Department of Energy (DOE), Offices of Advanced Scientific Computing Research and Fusion Energy Sciences. Initially funded for three years starting in 2012, it was extended for 6 months with additional funding. The project was a collaboration between scientists at General Atomics, Lawrence Berkley National Laboratory (LBNL), and Massachusetts Institute of Technology (MIT). The group leveraged existing computer science technology where possible, and extended or created new capabilities where required. The MPO projectmore » was able to successfully create a suite of software tools that can be used by a scientific community to automatically document their scientific workflows. These tools were integrated into workflows for fusion energy and climate research illustrating the general applicability of the project’s toolkit. Feedback was very positive on the project’s toolkit and the value of such automatic workflow documentation to the scientific endeavor.« less
Producing an Infrared Multiwavelength Galactic Plane Atlas Using Montage, Pegasus, and Amazon Web Services

NASA Astrophysics Data System (ADS)

Rynge, M.; Juve, G.; Kinney, J.; Good, J.; Berriman, B.; Merrihew, A.; Deelman, E.

2014-05-01

In this paper, we describe how to leverage cloud resources to generate large-scale mosaics of the galactic plane in multiple wavelengths. Our goal is to generate a 16-wavelength infrared Atlas of the Galactic Plane at a common spatial sampling of 1 arcsec, processed so that they appear to have been measured with a single instrument. This will be achieved by using the Montage image mosaic engine process observations from the 2MASS, GLIMPSE, MIPSGAL, MSX and WISE datasets, over a wavelength range of 1 μm to 24 μm, and by using the Pegasus Workflow Management System for managing the workload. When complete, the Atlas will be made available to the community as a data product. We are generating images that cover ±180° in Galactic longitude and ±20° in Galactic latitude, to the extent permitted by the spatial coverage of each dataset. Each image will be 5°x5° in size (including an overlap of 1° with neighboring tiles), resulting in an atlas of 1,001 images. The final size will be about 50 TBs. This paper will focus on the computational challenges, solutions, and lessons learned in producing the Atlas. To manage the computation we are using the Pegasus Workflow Management System, a mature, highly fault-tolerant system now in release 4.2.2 that has found wide applicability across many science disciplines. A scientific workflow describes the dependencies between the tasks and in most cases the workflow is described as a directed acyclic graph, where the nodes are tasks and the edges denote the task dependencies. A defining property for a scientific workflow is that it manages data flow between tasks. Applied to the galactic plane project, each 5 by 5 mosaic is a Pegasus workflow. Pegasus is used to fetch the source images, execute the image mosaicking steps of Montage, and store the final outputs in a storage system. As these workflows are very I/O intensive, care has to be taken when choosing what infrastructure to execute the workflow on. In our setup, we choose to use dynamically provisioned compute clusters running on the Amazon Elastic Compute Cloud (EC2). All our instances are using the same base image, which is configured to come up as a master node by default. The master node is a central instance from where the workflow can be managed. Additional worker instances are provisioned and configured to accept work assignments from the master node. The system allows for adding/removing workers in an ad hoc fashion, and could be run in large configurations. To-date we have performed 245,000 CPU hours of computing and generated 7,029 images and totaling 30 TB. With the current set up our runtime would be 340,000 CPU hours for the whole project. Using spot m2.4xlarge instances, the cost would be approximately $5,950. Using faster AWS instances, such as cc2.8xlarge could potentially decrease the total CPU hours and further reduce the compute costs. The paper will explore these tradeoffs.
What Not To Do: Anti-patterns for Developing Scientific Workflow Software Components

NASA Astrophysics Data System (ADS)

Futrelle, J.; Maffei, A. R.; Sosik, H. M.; Gallager, S. M.; York, A.

2013-12-01

Scientific workflows promise to enable efficient scaling-up of researcher code to handle large datasets and workloads, as well as documentation of scientific processing via standardized provenance records, etc. Workflow systems and related frameworks for coordinating the execution of otherwise separate components are limited, however, in their ability to overcome software engineering design problems commonly encountered in pre-existing components, such as scripts developed externally by scientists in their laboratories. In practice, this often means that components must be rewritten or replaced in a time-consuming, expensive process. In the course of an extensive workflow development project involving large-scale oceanographic image processing, we have begun to identify and codify 'anti-patterns'--problematic design characteristics of software--that make components fit poorly into complex automated workflows. We have gone on to develop and document low-effort solutions and best practices that efficiently address the anti-patterns we have identified. The issues, solutions, and best practices can be used to evaluate and improve existing code, as well as guiding the development of new components. For example, we have identified a common anti-pattern we call 'batch-itis' in which a script fails and then cannot perform more work, even if that work is not precluded by the failure. The solution we have identified--removing unnecessary looping over independent units of work--is often easier to code than the anti-pattern, as it eliminates the need for complex control flow logic in the component. Other anti-patterns we have identified are similarly easy to identify and often easy to fix. We have drawn upon experience working with three science teams at Woods Hole Oceanographic Institution, each of which has designed novel imaging instruments and associated image analysis code. By developing use cases and prototypes within these teams, we have undertaken formal evaluations of software components developed by programmers with widely varying levels of expertise, and have been able to discover and characterize a number of anti-patterns. Our evaluation methodology and testbed have also enabled us to assess the efficacy of strategies to address these anti-patterns according to scientifically relevant metrics, such as ability of algorithms to perform faster than the rate of data acquisition and the accuracy of workflow component output relative to ground truth. The set of anti-patterns and solutions we have identified augments of the body of more well-known software engineering anti-patterns by addressing additional concerns that obtain when a software component has to function as part of a workflow assembled out of independently-developed codebases. Our experience shows that identifying and resolving these anti-patterns reduces development time and improves performance without reducing component reusability.
A strategy for systemic toxicity assessment based on non-animal approaches: The Cosmetics Europe Long Range Science Strategy programme.

PubMed

Desprez, Bertrand; Dent, Matt; Keller, Detlef; Klaric, Martina; Ouédraogo, Gladys; Cubberley, Richard; Duplan, Hélène; Eilstein, Joan; Ellison, Corie; Grégoire, Sébastien; Hewitt, Nicola J; Jacques-Jamin, Carine; Lange, Daniela; Roe, Amy; Rothe, Helga; Blaauboer, Bas J; Schepky, Andreas; Mahony, Catherine

2018-08-01

When performing safety assessment of chemicals, the evaluation of their systemic toxicity based only on non-animal approaches is a challenging objective. The Safety Evaluation Ultimately Replacing Animal Test programme (SEURAT-1) addressed this question from 2011 to 2015 and showed that further research and development of adequate tools in toxicokinetic and toxicodynamic are required for performing non-animal safety assessments. It also showed how to implement tools like thresholds of toxicological concern (TTCs) and read-across in this context. This paper shows a tiered scientific workflow and how each tier addresses the four steps of the risk assessment paradigm. Cosmetics Europe established its Long Range Science Strategy (LRSS) programme, running from 2016 to 2020, based on the outcomes of SEURAT-1 to implement this workflow. Dedicated specific projects address each step of this workflow, which is introduced here. It tackles the question of evaluating the internal dose when systemic exposure happens. The applicability of the workflow will be shown through a series of case studies, which will be published separately. Even if the LRSS puts the emphasis on safety assessment of cosmetic relevant chemicals, it remains applicable to any type of chemical. Copyright © 2018. Published by Elsevier Ltd.

A cognitive task analysis of a visual analytic workflow: Exploring molecular interaction networks in systems biology.

PubMed

Mirel, Barbara; Eichinger, Felix; Keller, Benjamin J; Kretzler, Matthias

2011-03-21

Bioinformatics visualization tools are often not robust enough to support biomedical specialistsâ€™ complex exploratory analyses. Tools need to accommodate the workflows that scientists actually perform for specific translational research questions. To understand and model one of these workflows, we conducted a case-based, cognitive task analysis of a biomedical specialistâ€™s exploratory workflow for the question: What functional interactions among gene products of high throughput expression data suggest previously unknown mechanisms of a disease? From our cognitive task analysis four complementary representations of the targeted workflow were developed. They include: usage scenarios, flow diagrams, a cognitive task taxonomy, and a mapping between cognitive tasks and user-centered visualization requirements. The representations capture the flows of cognitive tasks that led a biomedical specialist to inferences critical to hypothesizing. We created representations at levels of detail that could strategically guide visualization development, and we confirmed this by making a trial prototype based on user requirements for a small portion of the workflow. Our results imply that visualizations should make available to scientific users â€œbundles of featuresâ€ consonant with the compositional cognitive tasks purposefully enacted at specific points in the workflow. We also highlight certain aspects of visualizations that: (a) need more built-in flexibility; (b) are critical for negotiating meaning; and (c) are necessary for essential metacognitive support.
Integrating advanced visualization technology into the planetary Geoscience workflow

NASA Astrophysics Data System (ADS)

Huffman, John; Forsberg, Andrew; Loomis, Andrew; Head, James; Dickson, James; Fassett, Caleb

2011-09-01

Recent advances in computer visualization have allowed us to develop new tools for analyzing the data gathered during planetary missions, which is important, since these data sets have grown exponentially in recent years to tens of terabytes in size. As part of the Advanced Visualization in Solar System Exploration and Research (ADVISER) project, we utilize several advanced visualization techniques created specifically with planetary image data in mind. The Geoviewer application allows real-time active stereo display of images, which in aggregate have billions of pixels. The ADVISER desktop application platform allows fast three-dimensional visualization of planetary images overlain on digital terrain models. Both applications include tools for easy data ingest and real-time analysis in a programmatic manner. Incorporation of these tools into our everyday scientific workflow has proved important for scientific analysis, discussion, and publication, and enabled effective and exciting educational activities for students from high school through graduate school.
BioInfra.Prot: A comprehensive proteomics workflow including data standardization, protein inference, expression analysis and data publication.

PubMed

Turewicz, Michael; Kohl, Michael; Ahrens, Maike; Mayer, Gerhard; Uszkoreit, Julian; Naboulsi, Wael; Bracht, Thilo; Megger, Dominik A; Sitek, Barbara; Marcus, Katrin; Eisenacher, Martin

2017-11-10

The analysis of high-throughput mass spectrometry-based proteomics data must address the specific challenges of this technology. To this end, the comprehensive proteomics workflow offered by the de.NBI service center BioInfra.Prot provides indispensable components for the computational and statistical analysis of this kind of data. These components include tools and methods for spectrum identification and protein inference, protein quantification, expression analysis as well as data standardization and data publication. All particular methods of the workflow which address these tasks are state-of-the-art or cutting edge. As has been shown in previous publications, each of these methods is adequate to solve its specific task and gives competitive results. However, the methods included in the workflow are continuously reviewed, updated and improved to adapt to new scientific developments. All of these particular components and methods are available as stand-alone BioInfra.Prot services or as a complete workflow. Since BioInfra.Prot provides manifold fast communication channels to get access to all components of the workflow (e.g., via the BioInfra.Prot ticket system: bioinfraprot@rub.de) users can easily benefit from this service and get support by experts. Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.
Applying Content Management to Automated Provenance Capture

DOE Office of Scientific and Technical Information (OSTI.GOV)

Schuchardt, Karen L.; Gibson, Tara D.; Stephan, Eric G.

2008-04-10

Workflows and data pipelines are becoming increasingly valuable in both computational and experimen-tal sciences. These automated systems are capable of generating significantly more data within the same amount of time than their manual counterparts. Automatically capturing and recording data prove-nance and annotation as part of these workflows is critical for data management, verification, and dis-semination. Our goal in addressing the provenance challenge was to develop and end-to-end system that demonstrates real-time capture, persistent content management, and ad-hoc searches of both provenance and metadata using open source software and standard protocols. We describe our prototype, which extends the Kepler workflow toolsmore » for the execution environment, the Scientific Annotation Middleware (SAM) content management software for data services, and an existing HTTP-based query protocol. Our implementation offers several unique capabilities, and through the use of standards, is able to pro-vide access to the provenance record to a variety of commonly available client tools.« less
Tackling the "so what" problem in scientific research: a systems-based approach to resource and publication tracking.

PubMed

Harris, Paul A; Kirby, Jacqueline; Swafford, Jonathan A; Edwards, Terri L; Zhang, Minhua; Yarbrough, Tonya R; Lane, Lynda D; Helmer, Tara; Bernard, Gordon R; Pulley, Jill M

2015-08-01

Peer-reviewed publications are one measure of scientific productivity. From a project, program, or institutional perspective, publication tracking provides the quantitative data necessary to guide the prudent stewardship of federal, foundation, and institutional investments by identifying the scientific return for the types of support provided. In this article, the authors describe the Vanderbilt Institute for Clinical and Translational Research's (VICTR's) development and implementation of a semiautomated process through which publications are automatically detected in PubMed and adjudicated using a "just-in-time" workflow by a known pool of researchers (from Vanderbilt University School of Medicine and Meharry Medical College) who receive support from Vanderbilt's Clinical and Translational Science Award. Since implementation, the authors have (1) seen a marked increase in the number of publications citing VICTR support, (2) captured at a more granular level the relationship between specific resources/services and scientific output, (3) increased awareness of VICTR's scientific portfolio, and (4) increased efficiency in complying with annual National Institutes of Health progress reports. They present the methodological framework and workflow, measures of impact for the first 30 months, and a set of practical lessons learned to inform others considering a systems-based approach for resource and publication tracking. They learned that contacting multiple authors from a single publication can increase the accuracy of the resource attribution process in the case of multidisciplinary scientific projects. They also found that combining positive (e.g., congratulatory e-mails) and negative (e.g., not allowing future resource requests until adjudication is complete) triggers can increase compliance with publication attribution requests.
Dawn: A Simulation Model for Evaluating Costs and Tradeoffs of Big Data Science Architectures

NASA Astrophysics Data System (ADS)

Cinquini, L.; Crichton, D. J.; Braverman, A. J.; Kyo, L.; Fuchs, T.; Turmon, M.

2014-12-01

In many scientific disciplines, scientists and data managers are bracing for an upcoming deluge of big data volumes, which will increase the size of current data archives by a factor of 10-100 times. For example, the next Climate Model Inter-comparison Project (CMIP6) will generate a global archive of model output of approximately 10-20 Peta-bytes, while the upcoming next generation of NASA decadal Earth Observing instruments are expected to collect tens of Giga-bytes/day. In radio-astronomy, the Square Kilometre Array (SKA) will collect data in the Exa-bytes/day range, of which (after reduction and processing) around 1.5 Exa-bytes/year will be stored. The effective and timely processing of these enormous data streams will require the design of new data reduction and processing algorithms, new system architectures, and new techniques for evaluating computation uncertainty. Yet at present no general software tool or framework exists that will allow system architects to model their expected data processing workflow, and determine the network, computational and storage resources needed to prepare their data for scientific analysis. In order to fill this gap, at NASA/JPL we have been developing a preliminary model named DAWN (Distributed Analytics, Workflows and Numerics) for simulating arbitrary complex workflows composed of any number of data processing and movement tasks. The model can be configured with a representation of the problem at hand (the data volumes, the processing algorithms, the available computing and network resources), and is able to evaluate tradeoffs between different possible workflows based on several estimators: overall elapsed time, separate computation and transfer times, resulting uncertainty, and others. So far, we have been applying DAWN to analyze architectural solutions for 4 different use cases from distinct science disciplines: climate science, astronomy, hydrology and a generic cloud computing use case. This talk will present preliminary results and discuss how DAWN can be evolved into a powerful tool for designing system architectures for data intensive science.
Agent-Based Scientific Workflow Composition

NASA Astrophysics Data System (ADS)

Barker, A.; Mann, B.

2006-07-01

Agents are active autonomous entities that interact with one another to achieve their objectives. This paper addresses how these active agents are a natural fit to consume the passive Service Oriented Architecture which is found in Internet and Grid Systems, in order to compose, coordinate and execute e-Science experiments. A framework is introduced which allows an e-Science experiment to be described as a MultiAgent System.
Seamless Provenance Representation and Use in Collaborative Science Scenarios

NASA Astrophysics Data System (ADS)

Missier, P.; Ludaescher, B.; Bowers, S.; Altintas, I.; Anand, M. K.; Dey, S.; Sarkar, A.; Shrestha, B.; Goble, C.

2010-12-01

The notion of sharing scientific data has only recently begun to gain ground in science, where data is still considered a private asset. There is growing evidence, however, that the benefits of scientific collaboration through early data sharing during the course of a science project may outgrow the risk of losing exclusive ownership of the data. As exemplar success stories are making the headlines[1], principles of effective information sharing have become the subject of e-science research. In particular, any piece of published data should be self-describing, to the extent necessary for consumers to determine its suitability for reuse in their own projects. This is accomplished by associating a body of formally specified and machine-processable metadata to the data. When data is produced and reused by independent groups, however, metadata interoperability issues emerge. This is the case for provenance, a form of metadata that describes the history of a data product, Y. Provenance is typically expressed as a graph-structured set of dependencies that account for the sequence of computational or interactive steps that led to Y, often starting from some primary, observational data. Traversing dependency graphs is one of the mechanisms used to answer questions on data reliability. In the context of the NSF DataONE project[2], we have been studying issues of provenance interoperability in scientific collaboration scenarios. Consider a first scientist, Alice, who publishes a data product X along with its provenance, and a second scientist who further transforms X into a new product Y, also along with its provenance. A third scientist, who is interested in Y, expects to be able to trace Y's history up to the inputs used by Alice. This is only possible, however, if provenance accumulates into a single, uniform graph that can be seamlessly traversed. This becomes problematic when provenance is captured using different tools and computational models (i.e. workflow systems), as well as when data is published and reused using mechanisms that are not provenance-aware. In this presentation we discuss requirements for ensuring provenance-aware data publishing and reuse, and describe the design and implementation of a prototype toolkit that involves two specific, and broadly used, workflow models, Kepler [3] and Taverna [4]. The implementation is expected to be adopted as part of DataONE's investigators' toolkit, in support of its mission of large-scale data preservation. Refs. [1]Sharing of Data Leads to Progress on Alzheimer’s, G. Kolata, NYT, 8/12/2010 [2]http://www.dataone.org [3]Ludaescher B., Altintas I. et al. Scientific Workflow Management and the Kepler System. Special Issue: Workflow in Grid Systems. Concurrency and Computation: Practice & Experience 18(10): 1039-1065, 2006 [4]D. Hull, K. Wolstencroft, R. Stevens, C. Goble, M. R. Pocock, P. Li, T. Oinn. Taverna: a tool for building and running workflows of services. Nucl. Acids Res. 34: W729-W732, 2006
Big data analytics workflow management for eScience

NASA Astrophysics Data System (ADS)

Fiore, Sandro; D'Anca, Alessandro; Palazzo, Cosimo; Elia, Donatello; Mariello, Andrea; Nassisi, Paola; Aloisio, Giovanni

2015-04-01

In many domains such as climate and astrophysics, scientific data is often n-dimensional and requires tools that support specialized data types and primitives if it is to be properly stored, accessed, analysed and visualized. Currently, scientific data analytics relies on domain-specific software and libraries providing a huge set of operators and functionalities. However, most of these software fail at large scale since they: (i) are desktop based, rely on local computing capabilities and need the data locally; (ii) cannot benefit from available multicore/parallel machines since they are based on sequential codes; (iii) do not provide declarative languages to express scientific data analysis tasks, and (iv) do not provide newer or more scalable storage models to better support the data multidimensionality. Additionally, most of them: (v) are domain-specific, which also means they support a limited set of data formats, and (vi) do not provide a workflow support, to enable the construction, execution and monitoring of more complex "experiments". The Ophidia project aims at facing most of the challenges highlighted above by providing a big data analytics framework for eScience. Ophidia provides several parallel operators to manipulate large datasets. Some relevant examples include: (i) data sub-setting (slicing and dicing), (ii) data aggregation, (iii) array-based primitives (the same operator applies to all the implemented UDF extensions), (iv) data cube duplication, (v) data cube pivoting, (vi) NetCDF-import and export. Metadata operators are available too. Additionally, the Ophidia framework provides array-based primitives to perform data sub-setting, data aggregation (i.e. max, min, avg), array concatenation, algebraic expressions and predicate evaluation on large arrays of scientific data. Bit-oriented plugins have also been implemented to manage binary data cubes. Defining processing chains and workflows with tens, hundreds of data analytics operators is the real challenge in many practical scientific use cases. This talk will specifically address the main needs, requirements and challenges regarding data analytics workflow management applied to large scientific datasets. Three real use cases concerning analytics workflows for sea situational awareness, fire danger prevention, climate change and biodiversity will be discussed in detail.
A Domain Analysis Model for eIRB Systems: Addressing the Weak Link in Clinical Research Informatics

PubMed Central

He, Shan; Narus, Scott P.; Facelli, Julio C.; Lau, Lee Min; Botkin, Jefferey R.; Hurdle, John F.

2014-01-01

Institutional Review Boards (IRBs) are a critical component of clinical research and can become a significant bottleneck due to the dramatic increase, in both volume and complexity of clinical research. Despite the interest in developing clinical research informatics (CRI) systems and supporting data standards to increase clinical research efficiency and interoperability, informatics research in the IRB domain has not attracted much attention in the scientific community. The lack of standardized and structured application forms across different IRBs causes inefficient and inconsistent proposal reviews and cumbersome workflows. These issues are even more prominent in multi-institutional clinical research that is rapidly becoming the norm. This paper proposes and evaluates a domain analysis model for electronic IRB (eIRB) systems, paving the way for streamlined clinical research workflow via integration with other CRI systems and improved IRB application throughput via computer-assisted decision support. PMID:24929181
Tackling the “So What” Problem in Scientific Research: A Systems-Based Approach to Resource and Publication Tracking

PubMed Central

Harris, Paul A.; Kirby, Jacqueline; Swafford, Jonathan A.; Edwards, Terri L.; Zhang, Minhua; Yarbrough, Tonya R.; Lane, Lynda D.; Helmer, Tara; Bernard, Gordon R.; Pulley, Jill M.

2015-01-01

Peer-reviewed publications are one measure of scientific productivity. From a project, program, or institutional perspective, publication tracking provides the quantitative data necessary to guide the prudent stewardship of federal, foundation, and institutional investments by identifying the scientific return for the types of support provided. In this article, the authors describe the Vanderbilt Institute for Clinical and Translational Research’s (VICTR’s) development and implementation of a semi-automated process through which publications are automatically detected in PubMed and adjudicated using a “just-in-time” workflow by a known pool of researchers (from Vanderbilt University School of Medicine and Meharry Medical College) who receive support from Vanderbilt’s Clinical and Translational Science Award. Since implementation, the authors have: (1) seen a marked increase in the number of publications citing VICTR support; (2) captured at a more granular level the relationship between specific resources/services and scientific output; (3) increased awareness of VICTR’s scientific portfolio; and (4) increased efficiency in complying with annual National Institutes of Health progress reports. They present the methodological framework and workflow, measures of impact for the first 30 months, and a set of practical lessons learned to inform others considering a systems-based approach for resource and publication tracking. They learned that contacting multiple authors from a single publication can increase the accuracy of the resource attribution process in the case of multidisciplinary scientific projects. They also found that combining positive (e.g., congratulatory e-mails) and negative (e.g., not allowing future resource requests until adjudication is complete) triggers can increase compliance with publication attribution requests. PMID:25901872
Leveraging the Power of High Performance Computing for Next Generation Sequencing Data Analysis: Tricks and Twists from a High Throughput Exome Workflow

PubMed Central

Wonczak, Stephan; Thiele, Holger; Nieroda, Lech; Jabbari, Kamel; Borowski, Stefan; Sinha, Vishal; Gunia, Wilfried; Lang, Ulrich; Achter, Viktor; Nürnberg, Peter

2015-01-01

Next generation sequencing (NGS) has been a great success and is now a standard method of research in the life sciences. With this technology, dozens of whole genomes or hundreds of exomes can be sequenced in rather short time, producing huge amounts of data. Complex bioinformatics analyses are required to turn these data into scientific findings. In order to run these analyses fast, automated workflows implemented on high performance computers are state of the art. While providing sufficient compute power and storage to meet the NGS data challenge, high performance computing (HPC) systems require special care when utilized for high throughput processing. This is especially true if the HPC system is shared by different users. Here, stability, robustness and maintainability are as important for automated workflows as speed and throughput. To achieve all of these aims, dedicated solutions have to be developed. In this paper, we present the tricks and twists that we utilized in the implementation of our exome data processing workflow. It may serve as a guideline for other high throughput data analysis projects using a similar infrastructure. The code implementing our solutions is provided in the supporting information files. PMID:25942438
Climate Data Analytics Workflow Management

NASA Astrophysics Data System (ADS)

Zhang, J.; Lee, S.; Pan, L.; Mattmann, C. A.; Lee, T. J.

2016-12-01

In this project we aim to pave a novel path to create a sustainable building block toward Earth science big data analytics and knowledge sharing. Closely studying how Earth scientists conduct data analytics research in their daily work, we have developed a provenance model to record their activities, and to develop a technology to automatically generate workflows for scientists from the provenance. On top of it, we have built the prototype of a data-centric provenance repository, and establish a PDSW (People, Data, Service, Workflow) knowledge network to support workflow recommendation. To ensure the scalability and performance of the expected recommendation system, we have leveraged the Apache OODT system technology. The community-approved, metrics-based performance evaluation web-service will allow a user to select a metric from the list of several community-approved metrics and to evaluate model performance using the metric as well as the reference dataset. This service will facilitate the use of reference datasets that are generated in support of the model-data intercomparison projects such as Obs4MIPs and Ana4MIPs. The data-centric repository infrastructure will allow us to catch richer provenance to further facilitate knowledge sharing and scientific collaboration in the Earth science community. This project is part of Apache incubator CMDA project.
A web accessible scientific workflow system for vadoze zone performance monitoring: design and implementation examples

NASA Astrophysics Data System (ADS)

Mattson, E.; Versteeg, R.; Ankeny, M.; Stormberg, G.

2005-12-01

Long term performance monitoring has been identified by DOE, DOD and EPA as one of the most challenging and costly elements of contaminated site remedial efforts. Such monitoring should provide timely and actionable information relevant to a multitude of stakeholder needs. This information should be obtained in a manner which is auditable, cost effective and transparent. Over the last several years INL staff has designed and implemented a web accessible scientific workflow system for environmental monitoring. This workflow environment integrates distributed, automated data acquisition from diverse sensors (geophysical, geochemical and hydrological) with server side data management and information visualization through flexible browser based data access tools. Component technologies include a rich browser-based client (using dynamic javascript and html/css) for data selection, a back-end server which uses PHP for data processing, user management, and result delivery, and third party applications which are invoked by the back-end using webservices. This system has been implemented and is operational for several sites, including the Ruby Gulch Waste Rock Repository (a capped mine waste rock dump on the Gilt Edge Mine Superfund Site), the INL Vadoze Zone Research Park and an alternative cover landfill. Implementations for other vadoze zone sites are currently in progress. These systems allow for autonomous performance monitoring through automated data analysis and report generation. This performance monitoring has allowed users to obtain insights into system dynamics, regulatory compliance and residence times of water. Our system uses modular components for data selection and graphing and WSDL compliant webservices for external functions such as statistical analyses and model invocations. Thus, implementing this system for novel sites and extending functionality (e.g. adding novel models) is relatively straightforward. As system access requires a standard webbrowser and uses intuitive functionality, stakeholders with diverse degrees of technical insight can use this system with little or no training.
Real-Time System for Water Modeling and Management

NASA Astrophysics Data System (ADS)

Lee, J.; Zhao, T.; David, C. H.; Minsker, B.

2012-12-01

Working closely with the Texas Commission on Environmental Quality (TCEQ) and the University of Texas at Austin (UT-Austin), we are developing a real-time system for water modeling and management using advanced cyberinfrastructure, data integration and geospatial visualization, and numerical modeling. The state of Texas suffered a severe drought in 2011 that cost the state $7.62 billion in agricultural losses (crops and livestock). Devastating situations such as this could potentially be avoided with better water modeling and management strategies that incorporate state of the art simulation and digital data integration. The goal of the project is to prototype a near-real-time decision support system for river modeling and management in Texas that can serve as a national and international model to promote more sustainable and resilient water systems. The system uses National Weather Service current and predicted precipitation data as input to the Noah-MP Land Surface model, which forecasts runoff, soil moisture, evapotranspiration, and water table levels given land surface features. These results are then used by a river model called RAPID, along with an error model currently under development at UT-Austin, to forecast stream flows in the rivers. Model forecasts are visualized as a Web application for TCEQ decision makers, who issue water diversion (withdrawal) permits and any needed drought restrictions; permit holders; and reservoir operation managers. Users will be able to adjust model parameters to predict the impacts of alternative curtailment scenarios or weather forecasts. A real-time optimization system under development will help TCEQ to identify optimal curtailment strategies to minimize impacts on permit holders and protect health and safety. To develop the system we have implemented RAPID as a remotely-executed modeling service using the Cyberintegrator workflow system with input data downloaded from the North American Land Data Assimilation System. The Cyberintegrator workflow system provides RESTful web services for users to provide inputs, execute workflows, and retrieve outputs. Along with REST endpoints, PAW (Publishable Active Workflows) provides the web user interface toolkit for us to develop web applications with scientific workflows. The prototype web application is built on top of workflows with PAW, so that users will have a user-friendly web environment to provide input parameters, execute the model, and visualize/retrieve the results using geospatial mapping tools. In future work the optimization model will be developed and integrated into the workflow.; Real-Time System for Water Modeling and Management
Data Curation: Improving Environmental Health Data Quality.

PubMed

Yang, Lin; Li, Jiao; Hou, Li; Qian, Qing

2015-01-01

With the growing recognition of the influence of climate change on human health, scientists' attention to analyzing the relationship between meteorological factors and adverse health effects. However, the paucity of high quality integrated data is one of the great challenges, especially when scientific studies rely on data-intensive computing. This paper aims to design an appropriate curation process to address this problem. We present a data curation workflow that: (i) follows the guidance of DCC Curation Lifecycle Model; (ii) combines manual curation with automatic curation; (iii) and solves environmental health data curation problem. The workflow was applied to a medical knowledge service system and showed that it was capable of improving work efficiency and data quality.
Big Data Challenges in Global Seismic 'Adjoint Tomography' (Invited)

NASA Astrophysics Data System (ADS)

Tromp, J.; Bozdag, E.; Krischer, L.; Lefebvre, M.; Lei, W.; Smith, J.

2013-12-01

The challenge of imaging Earth's interior on a global scale is closely linked to the challenge of handling large data sets. The related iterative workflow involves five distinct phases, namely, 1) data gathering and culling, 2) synthetic seismogram calculations, 3) pre-processing (time-series analysis and time-window selection), 4) data assimilation and adjoint calculations, 5) post-processing (pre-conditioning, regularization, model update). In order to implement this workflow on modern high-performance computing systems, a new seismic data format is being developed. The Adaptable Seismic Data Format (ASDF) is designed to replace currently used data formats with a more flexible format that allows for fast parallel I/O. The metadata is divided into abstract categories, such as "source" and "receiver", along with provenance information for complete reproducibility. The structure of ASDF is designed keeping in mind three distinct applications: earthquake seismology, seismic interferometry, and exploration seismology. Existing time-series analysis tool kits, such as SAC and ObsPy, can be easily interfaced with ASDF so that seismologists can use robust, previously developed software packages. ASDF accommodates an automated, efficient workflow for global adjoint tomography. Manually managing the large number of simulations associated with the workflow can rapidly become a burden, especially with increasing numbers of earthquakes and stations. Therefore, it is of importance to investigate the possibility of automating the entire workflow. Scientific Workflow Management Software (SWfMS) allows users to execute workflows almost routinely. SWfMS provides additional advantages. In particular, it is possible to group independent simulations in a single job to fit the available computational resources. They also give a basic level of fault resilience as the workflow can be resumed at the correct state preceding a failure. Some of the best candidates for our particular workflow are Kepler and Swift, and the latter appears to be the most serious candidate for a large-scale workflow on a single supercomputer, remaining sufficiently simple to accommodate further modifications and improvements.
Seamless online science workflow development and collaboration using IDL and the ENVI Services Engine

NASA Astrophysics Data System (ADS)

Harris, A. T.; Ramachandran, R.; Maskey, M.

2013-12-01

The Exelis-developed IDL and ENVI software are ubiquitous tools in Earth science research environments. The IDL Workbench is used by the Earth science community for programming custom data analysis and visualization modules. ENVI is a software solution for processing and analyzing geospatial imagery that combines support for multiple Earth observation scientific data types (optical, thermal, multi-spectral, hyperspectral, SAR, LiDAR) with advanced image processing and analysis algorithms. The ENVI & IDL Services Engine (ESE) is an Earth science data processing engine that allows researchers to use open standards to rapidly create, publish and deploy advanced Earth science data analytics within any existing enterprise infrastructure. Although powerful in many ways, the tools lack collaborative features out-of-box. Thus, as part of the NASA funded project, Collaborative Workbench to Accelerate Science Algorithm Development, researchers at the University of Alabama in Huntsville and Exelis have developed plugins that allow seamless research collaboration from within IDL workbench. Such additional features within IDL workbench are possible because IDL workbench is built using the Eclipse Rich Client Platform (RCP). RCP applications allow custom plugins to be dropped in for extended functionalities. Specific functionalities of the plugins include creating complex workflows based on IDL application source code, submitting workflows to be executed by ESE in the cloud, and sharing and cloning of workflows among collaborators. All these functionalities are available to scientists without leaving their IDL workbench. Because ESE can interoperate with any middleware, scientific programmers can readily string together IDL processing tasks (or tasks written in other languages like C++, Java or Python) to create complex workflows for deployment within their current enterprise architecture (e.g. ArcGIS Server, GeoServer, Apache ODE or SciFlo from JPL). Using the collaborative IDL Workbench, coupled with ESE for execution in the cloud, asynchronous workflows could be executed in batch mode on large data in the cloud. We envision that a scientist will initially develop a scientific workflow locally on a small set of data. Once tested, the scientist will deploy the workflow to the cloud for execution. Depending on the results, the scientist may share the workflow and results, allowing them to be stored in a community catalog and instantly loaded into the IDL Workbench of other scientists. Thereupon, scientists can clone and modify or execute the workflow with different input parameters. The Collaborative Workbench will provide a platform for collaboration in the cloud, helping Earth scientists solve big-data problems in the Earth and planetary sciences.
SAHM:VisTrails (Software for Assisted Habitat Modeling for VisTrails): training course

USGS Publications Warehouse

Holcombe, Tracy

2014-01-01

VisTrails is an open-source management and scientific workflow system designed to integrate the best of both scientific workflow and scientific visualization systems. Developers can extend the functionality of the VisTrails system by creating custom modules for bundled VisTrails packages. The Invasive Species Science Branch of the U.S. Geological Survey (USGS) Fort Collins Science Center (FORT) and the U.S. Department of the Interior’s North Central Climate Science Center have teamed up to develop and implement such a module—the Software for Assisted Habitat Modeling (SAHM). SAHM expedites habitat modeling and helps maintain a record of the various input data, the steps before and after processing, and the modeling options incorporated in the construction of an ecological response model. There are four main advantages to using the SAHM:VisTrails combined package for species distribution modeling: (1) formalization and tractable recording of the entire modeling process; (2) easier collaboration through a common modeling framework; (3) a user-friendly graphical interface to manage file input, model runs, and output; and (4) extensibility to incorporate future and additional modeling routines and tools. In order to meet increased interest in the SAHM:VisTrails package, the FORT offers a training course twice a year. The course includes a combination of lecture, hands-on work, and discussion. Please join us and other ecological modelers to learn the capabilities of the SAHM:VisTrails package.
BioVeL: a virtual laboratory for data analysis and modelling in biodiversity science and ecology.

PubMed

Hardisty, Alex R; Bacall, Finn; Beard, Niall; Balcázar-Vargas, Maria-Paula; Balech, Bachir; Barcza, Zoltán; Bourlat, Sarah J; De Giovanni, Renato; de Jong, Yde; De Leo, Francesca; Dobor, Laura; Donvito, Giacinto; Fellows, Donal; Guerra, Antonio Fernandez; Ferreira, Nuno; Fetyukova, Yuliya; Fosso, Bruno; Giddy, Jonathan; Goble, Carole; Güntsch, Anton; Haines, Robert; Ernst, Vera Hernández; Hettling, Hannes; Hidy, Dóra; Horváth, Ferenc; Ittzés, Dóra; Ittzés, Péter; Jones, Andrew; Kottmann, Renzo; Kulawik, Robert; Leidenberger, Sonja; Lyytikäinen-Saarenmaa, Päivi; Mathew, Cherian; Morrison, Norman; Nenadic, Aleksandra; de la Hidalga, Abraham Nieva; Obst, Matthias; Oostermeijer, Gerard; Paymal, Elisabeth; Pesole, Graziano; Pinto, Salvatore; Poigné, Axel; Fernandez, Francisco Quevedo; Santamaria, Monica; Saarenmaa, Hannu; Sipos, Gergely; Sylla, Karl-Heinz; Tähtinen, Marko; Vicario, Saverio; Vos, Rutger Aldo; Williams, Alan R; Yilmaz, Pelin

2016-10-20

Making forecasts about biodiversity and giving support to policy relies increasingly on large collections of data held electronically, and on substantial computational capability and capacity to analyse, model, simulate and predict using such data. However, the physically distributed nature of data resources and of expertise in advanced analytical tools creates many challenges for the modern scientist. Across the wider biological sciences, presenting such capabilities on the Internet (as "Web services") and using scientific workflow systems to compose them for particular tasks is a practical way to carry out robust "in silico" science. However, use of this approach in biodiversity science and ecology has thus far been quite limited. BioVeL is a virtual laboratory for data analysis and modelling in biodiversity science and ecology, freely accessible via the Internet. BioVeL includes functions for accessing and analysing data through curated Web services; for performing complex in silico analysis through exposure of R programs, workflows, and batch processing functions; for on-line collaboration through sharing of workflows and workflow runs; for experiment documentation through reproducibility and repeatability; and for computational support via seamless connections to supporting computing infrastructures. We developed and improved more than 60 Web services with significant potential in many different kinds of data analysis and modelling tasks. We composed reusable workflows using these Web services, also incorporating R programs. Deploying these tools into an easy-to-use and accessible 'virtual laboratory', free via the Internet, we applied the workflows in several diverse case studies. We opened the virtual laboratory for public use and through a programme of external engagement we actively encouraged scientists and third party application and tool developers to try out the services and contribute to the activity. Our work shows we can deliver an operational, scalable and flexible Internet-based virtual laboratory to meet new demands for data processing and analysis in biodiversity science and ecology. In particular, we have successfully integrated existing and popular tools and practices from different scientific disciplines to be used in biodiversity and ecological research.

A Web Interface for Eco System Modeling

NASA Astrophysics Data System (ADS)

McHenry, K.; Kooper, R.; Serbin, S. P.; LeBauer, D. S.; Desai, A. R.; Dietze, M. C.

2012-12-01

We have developed the Predictive Ecosystem Analyzer (PEcAn) as an open-source scientific workflow system and ecoinformatics toolbox that manages the flow of information in and out of regional-scale terrestrial biosphere models, facilitates heterogeneous data assimilation, tracks data provenance, and enables more effective feedback between models and field research. The over-arching goal of PEcAn is to make otherwise complex analyses transparent, repeatable, and accessible to a diverse array of researchers, allowing both novice and expert users to focus on using the models to examine complex ecosystems rather than having to deal with complex computer system setup and configuration questions in order to run the models. Through the developed web interface we hide much of the data and model details and allow the user to simply select locations, ecosystem models, and desired data sources as inputs to the model. Novice users are guided by the web interface through setting up a model execution and plotting the results. At the same time expert users are given enough freedom to modify specific parameters before the model gets executed. This will become more important as more and more models are added to the PEcAn workflow as well as more and more data that will become available as NEON comes online. On the backend we support the execution of potentially computationally expensive models on different High Performance Computers (HPC) and/or clusters. The system can be configured with a single XML file that gives it the flexibility needed for configuring and running the different models on different systems using a combination of information stored in a database as well as pointers to files on the hard disk. While the web interface usually creates this configuration file, expert users can still directly edit it to fine tune the configuration.. Once a workflow is finished the web interface will allow for the easy creation of plots over result data while also allowing the user to download the results for further processing. The current workflow in the web interface is a simple linear workflow, but will be expanded to allow for more complex workflows. We are working with Kepler and Cyberintegrator to allow for these more complex workflows as well as collecting provenance of the workflow being executed. This provenance regarding model executions is stored in a database along with the derived results. All of this information is then accessible using the BETY database web frontend. The PEcAn interface.
iLAP: a workflow-driven software for experimental protocol development, data acquisition and analysis

PubMed Central

2009-01-01

Background In recent years, the genome biology community has expended considerable effort to confront the challenges of managing heterogeneous data in a structured and organized way and developed laboratory information management systems (LIMS) for both raw and processed data. On the other hand, electronic notebooks were developed to record and manage scientific data, and facilitate data-sharing. Software which enables both, management of large datasets and digital recording of laboratory procedures would serve a real need in laboratories using medium and high-throughput techniques. Results We have developed iLAP (Laboratory data management, Analysis, and Protocol development), a workflow-driven information management system specifically designed to create and manage experimental protocols, and to analyze and share laboratory data. The system combines experimental protocol development, wizard-based data acquisition, and high-throughput data analysis into a single, integrated system. We demonstrate the power and the flexibility of the platform using a microscopy case study based on a combinatorial multiple fluorescence in situ hybridization (m-FISH) protocol and 3D-image reconstruction. iLAP is freely available under the open source license AGPL from http://genome.tugraz.at/iLAP/. Conclusion iLAP is a flexible and versatile information management system, which has the potential to close the gap between electronic notebooks and LIMS and can therefore be of great value for a broad scientific community. PMID:19941647
Facilitating hydrological data analysis workflows in R: the RHydro package

NASA Astrophysics Data System (ADS)

Buytaert, Wouter; Moulds, Simon; Skoien, Jon; Pebesma, Edzer; Reusser, Dominik

2015-04-01

The advent of new technologies such as web-services and big data analytics holds great promise for hydrological data analysis and simulation. Driven by the need for better water management tools, it allows for the construction of much more complex workflows, that integrate more and potentially more heterogeneous data sources with longer tool chains of algorithms and models. With the scientific challenge of designing the most adequate processing workflow comes the technical challenge of implementing the workflow with a minimal risk for errors. A wide variety of new workbench technologies and other data handling systems are being developed. At the same time, the functionality of available data processing languages such as R and Python is increasing at an accelerating pace. Because of the large diversity of scientific questions and simulation needs in hydrology, it is unlikely that one single optimal method for constructing hydrological data analysis workflows will emerge. Nevertheless, languages such as R and Python are quickly gaining popularity because they combine a wide array of functionality with high flexibility and versatility. The object-oriented nature of high-level data processing languages makes them particularly suited for the handling of complex and potentially large datasets. In this paper, we explore how handling and processing of hydrological data in R can be facilitated further by designing and implementing a set of relevant classes and methods in the experimental R package RHydro. We build upon existing efforts such as the sp and raster packages for spatial data and the spacetime package for spatiotemporal data to define classes for hydrological data (HydroST). In order to handle simulation data from hydrological models conveniently, a HM class is defined. Relevant methods are implemented to allow for an optimal integration of the HM class with existing model fitting and simulation functionality in R. Lastly, we discuss some of the design challenges of the RHydro package, including integration with big data technologies, web technologies, and emerging data models in hydrology.
Enhancing reproducibility in scientific computing: Metrics and registry for Singularity containers.

PubMed

Sochat, Vanessa V; Prybol, Cameron J; Kurtzer, Gregory M

2017-01-01

Here we present Singularity Hub, a framework to build and deploy Singularity containers for mobility of compute, and the singularity-python software with novel metrics for assessing reproducibility of such containers. Singularity containers make it possible for scientists and developers to package reproducible software, and Singularity Hub adds automation to this workflow by building, capturing metadata for, visualizing, and serving containers programmatically. Our novel metrics, based on custom filters of content hashes of container contents, allow for comparison of an entire container, including operating system, custom software, and metadata. First we will review Singularity Hub's primary use cases and how the infrastructure has been designed to support modern, common workflows. Next, we conduct three analyses to demonstrate build consistency, reproducibility metric and performance and interpretability, and potential for discovery. This is the first effort to demonstrate a rigorous assessment of measurable similarity between containers and operating systems. We provide these capabilities within Singularity Hub, as well as the source software singularity-python that provides the underlying functionality. Singularity Hub is available at https://singularity-hub.org, and we are excited to provide it as an openly available platform for building, and deploying scientific containers.
Enhancing reproducibility in scientific computing: Metrics and registry for Singularity containers

PubMed Central

Prybol, Cameron J.; Kurtzer, Gregory M.

2017-01-01

Here we present Singularity Hub, a framework to build and deploy Singularity containers for mobility of compute, and the singularity-python software with novel metrics for assessing reproducibility of such containers. Singularity containers make it possible for scientists and developers to package reproducible software, and Singularity Hub adds automation to this workflow by building, capturing metadata for, visualizing, and serving containers programmatically. Our novel metrics, based on custom filters of content hashes of container contents, allow for comparison of an entire container, including operating system, custom software, and metadata. First we will review Singularity Hub’s primary use cases and how the infrastructure has been designed to support modern, common workflows. Next, we conduct three analyses to demonstrate build consistency, reproducibility metric and performance and interpretability, and potential for discovery. This is the first effort to demonstrate a rigorous assessment of measurable similarity between containers and operating systems. We provide these capabilities within Singularity Hub, as well as the source software singularity-python that provides the underlying functionality. Singularity Hub is available at https://singularity-hub.org, and we are excited to provide it as an openly available platform for building, and deploying scientific containers. PMID:29186161
Big Data Smart Socket (BDSS): a system that abstracts data transfer habits from end users.

PubMed

Watts, Nicholas A; Feltus, Frank A

2017-02-15

The ability to centralize and store data for long periods on an end user's computational resources is increasingly difficult for many scientific disciplines. For example, genomics data is increasingly large and distributed, and the data needs to be moved into workflow execution sites ranging from lab workstations to the cloud. However, the typical user is not always informed on emerging network technology or the most efficient methods to move and share data. Thus, the user defaults to using inefficient methods for transfer across the commercial internet. To accelerate large data transfer, we created a tool called the Big Data Smart Socket (BDSS) that abstracts data transfer methodology from the user. The user provides BDSS with a manifest of datasets stored in a remote storage repository. BDSS then queries a metadata repository for curated data transfer mechanisms and optimal path to move each of the files in the manifest to the site of workflow execution. BDSS functions as a standalone tool or can be directly integrated into a computational workflow such as provided by the Galaxy Project. To demonstrate applicability, we use BDSS within a biological context, although it is applicable to any scientific domain. BDSS is available under version 2 of the GNU General Public License at https://github.com/feltus/BDSS . ffeltus@clemson.edu. © The Author 2016. Published by Oxford University Press.
Big Data Smart Socket (BDSS): a system that abstracts data transfer habits from end users

PubMed Central

Watts, Nicholas A.

2017-01-01

Motivation: The ability to centralize and store data for long periods on an end user’s computational resources is increasingly difficult for many scientific disciplines. For example, genomics data is increasingly large and distributed, and the data needs to be moved into workflow execution sites ranging from lab workstations to the cloud. However, the typical user is not always informed on emerging network technology or the most efficient methods to move and share data. Thus, the user defaults to using inefficient methods for transfer across the commercial internet. Results: To accelerate large data transfer, we created a tool called the Big Data Smart Socket (BDSS) that abstracts data transfer methodology from the user. The user provides BDSS with a manifest of datasets stored in a remote storage repository. BDSS then queries a metadata repository for curated data transfer mechanisms and optimal path to move each of the files in the manifest to the site of workflow execution. BDSS functions as a standalone tool or can be directly integrated into a computational workflow such as provided by the Galaxy Project. To demonstrate applicability, we use BDSS within a biological context, although it is applicable to any scientific domain. Availability and Implementation: BDSS is available under version 2 of the GNU General Public License at https://github.com/feltus/BDSS. Contact: ffeltus@clemson.edu PMID:27797780
Nationwide Buildings Energy Research enabled through an integrated Data Intensive Scientific Workflow and Advanced Analysis Environment

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kleese van Dam, Kerstin; Lansing, Carina S.; Elsethagen, Todd O.

2014-01-28

Modern workflow systems enable scientists to run ensemble simulations at unprecedented scales and levels of complexity, allowing them to study system sizes previously impossible to achieve, due to the inherent resource requirements needed for the modeling work. However as a result of these new capabilities the science teams suddenly also face unprecedented data volumes that they are unable to analyze with their existing tools and methodologies in a timely fashion. In this paper we will describe the ongoing development work to create an integrated data intensive scientific workflow and analysis environment that offers researchers the ability to easily create andmore » execute complex simulation studies and provides them with different scalable methods to analyze the resulting data volumes. The integration of simulation and analysis environments is hereby not only a question of ease of use, but supports fundamental functions in the correlated analysis of simulation input, execution details and derived results for multi-variant, complex studies. To this end the team extended and integrated the existing capabilities of the Velo data management and analysis infrastructure, the MeDICi data intensive workflow system and RHIPE the R for Hadoop version of the well-known statistics package, as well as developing a new visual analytics interface for the result exploitation by multi-domain users. The capabilities of the new environment are demonstrated on a use case that focusses on the Pacific Northwest National Laboratory (PNNL) building energy team, showing how they were able to take their previously local scale simulations to a nationwide level by utilizing data intensive computing techniques not only for their modeling work, but also for the subsequent analysis of their modeling results. As part of the PNNL research initiative PRIMA (Platform for Regional Integrated Modeling and Analysis) the team performed an initial 3 year study of building energy demands for the US Eastern Interconnect domain, which they are now planning to extend to predict the demand for the complete century. The initial study raised their data demands from a few GBs to 400GB for the 3year study and expected tens of TBs for the full century.« less
Talkoot Portals: Discover, Tag, Share, and Reuse Collaborative Science Workflows

NASA Astrophysics Data System (ADS)

Wilson, B. D.; Ramachandran, R.; Lynnes, C.

2009-05-01

A small but growing number of scientists are beginning to harness Web 2.0 technologies, such as wikis, blogs, and social tagging, as a transformative way of doing science. These technologies provide researchers easy mechanisms to critique, suggest and share ideas, data and algorithms. At the same time, large suites of algorithms for science analysis are being made available as remotely-invokable Web Services, which can be chained together to create analysis workflows. This provides the research community an unprecedented opportunity to collaborate by sharing their workflows with one another, reproducing and analyzing research results, and leveraging colleagues' expertise to expedite the process of scientific discovery. However, wikis and similar technologies are limited to text, static images and hyperlinks, providing little support for collaborative data analysis. A team of information technology and Earth science researchers from multiple institutions have come together to improve community collaboration in science analysis by developing a customizable "software appliance" to build collaborative portals for Earth Science services and analysis workflows. The critical requirement is that researchers (not just information technologists) be able to build collaborative sites around service workflows within a few hours. We envision online communities coming together, much like Finnish "talkoot" (a barn raising), to build a shared research space. Talkoot extends a freely available, open source content management framework with a series of modules specific to Earth Science for registering, creating, managing, discovering, tagging and sharing Earth Science web services and workflows for science data processing, analysis and visualization. Users will be able to author a "science story" in shareable web notebooks, including plots or animations, backed up by an executable workflow that directly reproduces the science analysis. New services and workflows of interest will be discoverable using tag search, and advertised using "service casts" and "interest casts" (Atom feeds). Multiple science workflow systems will be plugged into the system, with initial support for UAH's Mining Workflow Composer and the open-source Active BPEL engine, and JPL's SciFlo engine and the VizFlow visual programming interface. With the ability to share and execute analysis workflows, Talkoot portals can be used to do collaborative science in addition to communicate ideas and results. It will be useful for different science domains, mission teams, research projects and organizations. Thus, it will help to solve the "sociological" problem of bringing together disparate groups of researchers, and the technical problem of advertising, discovering, developing, documenting, and maintaining inter-agency science workflows. The presentation will discuss the goals of and barriers to Science 2.0, the social web technologies employed in the Talkoot software appliance (e.g. CMS, social tagging, personal presence, advertising by feeds, etc.), illustrate the resulting collaborative capabilities, and show early prototypes of the web interfaces (e.g. embedded workflows).
Data Provenance Hybridization Supporting Extreme-Scale Scientific WorkflowApplications

DOE Office of Scientific and Technical Information (OSTI.GOV)

Elsethagen, Todd O.; Stephan, Eric G.; Raju, Bibi

As high performance computing (HPC) infrastructures continue to grow in capability and complexity, so do the applications that they serve. HPC and distributed-area computing (DAC) (e.g. grid and cloud) users are looking increasingly toward workflow solutions to orchestrate their complex application coupling, pre- and post-processing needs To gain insight and a more quantitative understanding of a workflow’s performance our method includes not only the capture of traditional provenance information, but also the capture and integration of system environment metrics helping to give context and explanation for a workflow’s execution. In this paper, we describe IPPD’s provenance management solution (ProvEn) andmore » its hybrid data store combining both of these data provenance perspectives.« less
NeuroManager: a workflow analysis based simulation management engine for computational neuroscience

PubMed Central

Stockton, David B.; Santamaria, Fidel

2015-01-01

We developed NeuroManager, an object-oriented simulation management software engine for computational neuroscience. NeuroManager automates the workflow of simulation job submissions when using heterogeneous computational resources, simulators, and simulation tasks. The object-oriented approach (1) provides flexibility to adapt to a variety of neuroscience simulators, (2) simplifies the use of heterogeneous computational resources, from desktops to super computer clusters, and (3) improves tracking of simulator/simulation evolution. We implemented NeuroManager in MATLAB, a widely used engineering and scientific language, for its signal and image processing tools, prevalence in electrophysiology analysis, and increasing use in college Biology education. To design and develop NeuroManager we analyzed the workflow of simulation submission for a variety of simulators, operating systems, and computational resources, including the handling of input parameters, data, models, results, and analyses. This resulted in 22 stages of simulation submission workflow. The software incorporates progress notification, automatic organization, labeling, and time-stamping of data and results, and integrated access to MATLAB's analysis and visualization tools. NeuroManager provides users with the tools to automate daily tasks, and assists principal investigators in tracking and recreating the evolution of research projects performed by multiple people. Overall, NeuroManager provides the infrastructure needed to improve workflow, manage multiple simultaneous simulations, and maintain provenance of the potentially large amounts of data produced during the course of a research project. PMID:26528175
NeuroManager: a workflow analysis based simulation management engine for computational neuroscience.

PubMed

Stockton, David B; Santamaria, Fidel

2015-01-01

We developed NeuroManager, an object-oriented simulation management software engine for computational neuroscience. NeuroManager automates the workflow of simulation job submissions when using heterogeneous computational resources, simulators, and simulation tasks. The object-oriented approach (1) provides flexibility to adapt to a variety of neuroscience simulators, (2) simplifies the use of heterogeneous computational resources, from desktops to super computer clusters, and (3) improves tracking of simulator/simulation evolution. We implemented NeuroManager in MATLAB, a widely used engineering and scientific language, for its signal and image processing tools, prevalence in electrophysiology analysis, and increasing use in college Biology education. To design and develop NeuroManager we analyzed the workflow of simulation submission for a variety of simulators, operating systems, and computational resources, including the handling of input parameters, data, models, results, and analyses. This resulted in 22 stages of simulation submission workflow. The software incorporates progress notification, automatic organization, labeling, and time-stamping of data and results, and integrated access to MATLAB's analysis and visualization tools. NeuroManager provides users with the tools to automate daily tasks, and assists principal investigators in tracking and recreating the evolution of research projects performed by multiple people. Overall, NeuroManager provides the infrastructure needed to improve workflow, manage multiple simultaneous simulations, and maintain provenance of the potentially large amounts of data produced during the course of a research project.
Workflow management systems in radiology

NASA Astrophysics Data System (ADS)

Wendler, Thomas; Meetz, Kirsten; Schmidt, Joachim

1998-07-01

In a situation of shrinking health care budgets, increasing cost pressure and growing demands to increase the efficiency and the quality of medical services, health care enterprises are forced to optimize or complete re-design their processes. Although information technology is agreed to potentially contribute to cost reduction and efficiency improvement, the real success factors are the re-definition and automation of processes: Business Process Re-engineering and Workflow Management. In this paper we discuss architectures for the use of workflow management systems in radiology. We propose to move forward from information systems in radiology (RIS, PACS) to Radiology Management Systems, in which workflow functionality (process definitions and process automation) is implemented through autonomous workflow management systems (WfMS). In a workflow oriented architecture, an autonomous workflow enactment service communicates with workflow client applications via standardized interfaces. In this paper, we discuss the need for and the benefits of such an approach. The separation of workflow management system and application systems is emphasized, and the consequences that arise for the architecture of workflow oriented information systems. This includes an appropriate workflow terminology, and the definition of standard interfaces for workflow aware application systems. Workflow studies in various institutions have shown that most of the processes in radiology are well structured and suited for a workflow management approach. Numerous commercially available Workflow Management Systems (WfMS) were investigated, and some of them, which are process- oriented and application independent, appear suitable for use in radiology.
Industrial methodology for process verification in research (IMPROVER): toward systems biology verification

PubMed Central

Meyer, Pablo; Hoeng, Julia; Rice, J. Jeremy; Norel, Raquel; Sprengel, Jörg; Stolle, Katrin; Bonk, Thomas; Corthesy, Stephanie; Royyuru, Ajay; Peitsch, Manuel C.; Stolovitzky, Gustavo

2012-01-01

Motivation: Analyses and algorithmic predictions based on high-throughput data are essential for the success of systems biology in academic and industrial settings. Organizations, such as companies and academic consortia, conduct large multi-year scientific studies that entail the collection and analysis of thousands of individual experiments, often over many physical sites and with internal and outsourced components. To extract maximum value, the interested parties need to verify the accuracy and reproducibility of data and methods before the initiation of such large multi-year studies. However, systematic and well-established verification procedures do not exist for automated collection and analysis workflows in systems biology which could lead to inaccurate conclusions. Results: We present here, a review of the current state of systems biology verification and a detailed methodology to address its shortcomings. This methodology named ‘Industrial Methodology for Process Verification in Research’ or IMPROVER, consists on evaluating a research program by dividing a workflow into smaller building blocks that are individually verified. The verification of each building block can be done internally by members of the research program or externally by ‘crowd-sourcing’ to an interested community. www.sbvimprover.com Implementation: This methodology could become the preferred choice to verify systems biology research workflows that are becoming increasingly complex and sophisticated in industrial and academic settings. Contact: gustavo@us.ibm.com PMID:22423044
A Scientific Workflow System for Satellite Data Processing with Real-Time Monitoring

NASA Astrophysics Data System (ADS)

Nguyen, Minh Duc

2018-02-01

This paper provides a case study on satellite data processing, storage, and distribution in the space weather domain by introducing the Satellite Data Downloading System (SDDS). The approach proposed in this paper was evaluated through real-world scenarios and addresses the challenges related to the specific field. Although SDDS is used for satellite data processing, it can potentially be adapted to a wide range of data processing scenarios in other fields of physics.
Theoretical and technological building blocks for an innovation accelerator

NASA Astrophysics Data System (ADS)

van Harmelen, F.; Kampis, G.; Börner, K.; van den Besselaar, P.; Schultes, E.; Goble, C.; Groth, P.; Mons, B.; Anderson, S.; Decker, S.; Hayes, C.; Buecheler, T.; Helbing, D.

2012-11-01

Modern science is a main driver of technological innovation. The efficiency of the scientific system is of key importance to ensure the competitiveness of a nation or region. However, the scientific system that we use today was devised centuries ago and is inadequate for our current ICT-based society: the peer review system encourages conservatism, journal publications are monolithic and slow, data is often not available to other scientists, and the independent validation of results is limited. The resulting scientific process is hence slow and sloppy. Building on the Innovation Accelerator paper by Helbing and Balietti [1], this paper takes the initial global vision and reviews the theoretical and technological building blocks that can be used for implementing an innovation (in first place: science) accelerator platform driven by re-imagining the science system. The envisioned platform would rest on four pillars: (i) Redesign the incentive scheme to reduce behavior such as conservatism, herding and hyping; (ii) Advance scientific publications by breaking up the monolithic paper unit and introducing other building blocks such as data, tools, experiment workflows, resources; (iii) Use machine readable semantics for publications, debate structures, provenance etc. in order to include the computer as a partner in the scientific process, and (iv) Build an online platform for collaboration, including a network of trust and reputation among the different types of stakeholders in the scientific system: scientists, educators, funding agencies, policy makers, students and industrial innovators among others. Any such improvements to the scientific system must support the entire scientific process (unlike current tools that chop up the scientific process into disconnected pieces), must facilitate and encourage collaboration and interdisciplinarity (again unlike current tools), must facilitate the inclusion of intelligent computing in the scientific process, must facilitate not only the core scientific process, but also accommodate other stakeholders such science policy makers, industrial innovators, and the general public. We first describe the current state of the scientific system together with up to a dozen new key initiatives, including an analysis of the role of science as an innovation accelerator. Our brief survey will show that there exist many separate ideas and concepts and diverse stand-alone demonstrator systems for different components of the ecosystem with many parts are still unexplored, and overall integration lacking. By analyzing a matrix of stakeholders vs. functionalities, we identify the required innovations. We (non-exhaustively) discuss a few of them: Publications that are meaningful to machines, innovative reviewing processes, data publication, workflow archiving and reuse, alternative impact metrics, tools for the detection of trends, community formation and emergence, as well as modular publications, citation objects and debate graphs. To summarize, the core idea behind the Innovation Accelerator is to develop new incentive models, rules, and interaction mechanisms to stimulate true innovation, revolutionizing the way in which we create knowledge and disseminate information.
qPortal: A platform for data-driven biomedical research.

PubMed

Mohr, Christopher; Friedrich, Andreas; Wojnar, David; Kenar, Erhan; Polatkan, Aydin Can; Codrea, Marius Cosmin; Czemmel, Stefan; Kohlbacher, Oliver; Nahnsen, Sven

2018-01-01

Modern biomedical research aims at drawing biological conclusions from large, highly complex biological datasets. It has become common practice to make extensive use of high-throughput technologies that produce big amounts of heterogeneous data. In addition to the ever-improving accuracy, methods are getting faster and cheaper, resulting in a steadily increasing need for scalable data management and easily accessible means of analysis. We present qPortal, a platform providing users with an intuitive way to manage and analyze quantitative biological data. The backend leverages a variety of concepts and technologies, such as relational databases, data stores, data models and means of data transfer, as well as front-end solutions to give users access to data management and easy-to-use analysis options. Users are empowered to conduct their experiments from the experimental design to the visualization of their results through the platform. Here, we illustrate the feature-rich portal by simulating a biomedical study based on publically available data. We demonstrate the software's strength in supporting the entire project life cycle. The software supports the project design and registration, empowers users to do all-digital project management and finally provides means to perform analysis. We compare our approach to Galaxy, one of the most widely used scientific workflow and analysis platforms in computational biology. Application of both systems to a small case study shows the differences between a data-driven approach (qPortal) and a workflow-driven approach (Galaxy). qPortal, a one-stop-shop solution for biomedical projects offers up-to-date analysis pipelines, quality control workflows, and visualization tools. Through intensive user interactions, appropriate data models have been developed. These models build the foundation of our biological data management system and provide possibilities to annotate data, query metadata for statistics and future re-analysis on high-performance computing systems via coupling of workflow management systems. Integration of project and data management as well as workflow resources in one place present clear advantages over existing solutions.
The Virtual Geophysics Laboratory (VGL): Scientific Workflows Operating Across Organizations and Across Infrastructures

NASA Astrophysics Data System (ADS)

Cox, S. J.; Wyborn, L. A.; Fraser, R.; Rankine, T.; Woodcock, R.; Vote, J.; Evans, B.

2012-12-01

The Virtual Geophysics Laboratory (VGL) is web portal that provides geoscientists with an integrated online environment that: seamlessly accesses geophysical and geoscience data services from the AuScope national geoscience information infrastructure; loosely couples these data to a variety of gesocience software tools; and provides large scale processing facilities via cloud computing. VGL is a collaboration between CSIRO, Geoscience Australia, National Computational Infrastructure, Monash University, Australian National University and the University of Queensland. The VGL provides a distributed system whereby a user can enter an online virtual laboratory to seamlessly connect to OGC web services for geoscience data. The data is supplied in open standards formats using international standards like GeoSciML. A VGL user uses a web mapping interface to discover and filter the data sources using spatial and attribute filters to define a subset. Once the data is selected the user is not required to download the data. VGL collates the service query information for later in the processing workflow where it will be staged directly to the computing facilities. The combination of deferring data download and access to Cloud computing enables VGL users to access their data at higher resolutions and to undertake larger scale inversions, more complex models and simulations than their own local computing facilities might allow. Inside the Virtual Geophysics Laboratory, the user has access to a library of existing models, complete with exemplar workflows for specific scientific problems based on those models. For example, the user can load a geological model published by Geoscience Australia, apply a basic deformation workflow provided by a CSIRO scientist, and have it run in a scientific code from Monash. Finally the user can publish these results to share with a colleague or cite in a paper. This opens new opportunities for access and collaboration as all the resources (models, code, data, processing) are shared in the one virtual laboratory. VGL provides end users with access to an intuitive, user-centered interface that leverages cloud storage and cloud and cluster processing from both the research communities and commercial suppliers (e.g. Amazon). As the underlying data and information services are agnostic of the scientific domain, they can support many other data types. This fundamental characteristic results in a highly reusable virtual laboratory infrastructure that could also be used for example natural hazards, satellite processing, soil geochemistry, climate modeling, agriculture crop modeling.
Genetic Design Automation: engineering fantasy or scientific renewal?

PubMed Central

Lux, Matthew W.; Bramlett, Brian W.; Ball, David A.; Peccoud, Jean

2013-01-01

Synthetic biology aims to make genetic systems more amenable to engineering, which has naturally led to the development of Computer-Aided Design (CAD) tools. Experimentalists still primarily rely on project-specific ad-hoc workflows instead of domain-specific tools, suggesting that CAD tools are lagging behind the front line of the field. Here, we discuss the scientific hurdles that have limited the productivity gains anticipated from existing tools. We argue that the real value of efforts to develop CAD tools is the formalization of genetic design rules that determine the complex relationships between genotype and phenotype. PMID:22001068
Using Kepler for Tool Integration in Microarray Analysis Workflows.

PubMed

Gan, Zhuohui; Stowe, Jennifer C; Altintas, Ilkay; McCulloch, Andrew D; Zambon, Alexander C

Increasing numbers of genomic technologies are leading to massive amounts of genomic data, all of which requires complex analysis. More and more bioinformatics analysis tools are being developed by scientist to simplify these analyses. However, different pipelines have been developed using different software environments. This makes integrations of these diverse bioinformatics tools difficult. Kepler provides an open source environment to integrate these disparate packages. Using Kepler, we integrated several external tools including Bioconductor packages, AltAnalyze, a python-based open source tool, and R-based comparison tool to build an automated workflow to meta-analyze both online and local microarray data. The automated workflow connects the integrated tools seamlessly, delivers data flow between the tools smoothly, and hence improves efficiency and accuracy of complex data analyses. Our workflow exemplifies the usage of Kepler as a scientific workflow platform for bioinformatics pipelines.

Bioinformatics workflows and web services in systems biology made easy for experimentalists.

PubMed

Jimenez, Rafael C; Corpas, Manuel

2013-01-01

Workflows are useful to perform data analysis and integration in systems biology. Workflow management systems can help users create workflows without any previous knowledge in programming and web services. However the computational skills required to build such workflows are usually above the level most biological experimentalists are comfortable with. In this chapter we introduce workflow management systems that reuse existing workflows instead of creating them, making it easier for experimentalists to perform computational tasks.
Kwf-Grid workflow management system for Earth science applications

NASA Astrophysics Data System (ADS)

Tran, V.; Hluchy, L.

2009-04-01

In this paper, we present workflow management tool for Earth science applications in EGEE. The workflow management tool was originally developed within K-wf Grid project for GT4 middleware and has many advanced features like semi-automatic workflow composition, user-friendly GUI for managing workflows, knowledge management. In EGEE, we are porting the workflow management tool to gLite middleware for Earth science applications K-wf Grid workflow management system was developed within "Knowledge-based Workflow System for Grid Applications" under the 6th Framework Programme. The workflow mangement system intended to - semi-automatically compose a workflow of Grid services, - execute the composed workflow application in a Grid computing environment, - monitor the performance of the Grid infrastructure and the Grid applications, - analyze the resulting monitoring information, - capture the knowledge that is contained in the information by means of intelligent agents, - and finally to reuse the joined knowledge gathered from all participating users in a collaborative way in order to efficiently construct workflows for new Grid applications. Kwf Grid workflow engines can support different types of jobs (e.g. GRAM job, web services) in a workflow. New class of gLite job has been added to the system, allows system to manage and execute gLite jobs in EGEE infrastructure. The GUI has been adapted to the requirements of EGEE users, new credential management servlet is added to portal. Porting K-wf Grid workflow management system to gLite would allow EGEE users to use the system and benefit from its avanced features. The system is primarly tested and evaluated with applications from ES clusters.
Tavaxy: integrating Taverna and Galaxy workflows with cloud computing support.

PubMed

Abouelhoda, Mohamed; Issa, Shadi Alaa; Ghanem, Moustafa

2012-05-04

Over the past decade the workflow system paradigm has evolved as an efficient and user-friendly approach for developing complex bioinformatics applications. Two popular workflow systems that have gained acceptance by the bioinformatics community are Taverna and Galaxy. Each system has a large user-base and supports an ever-growing repository of application workflows. However, workflows developed for one system cannot be imported and executed easily on the other. The lack of interoperability is due to differences in the models of computation, workflow languages, and architectures of both systems. This lack of interoperability limits sharing of workflows between the user communities and leads to duplication of development efforts. In this paper, we present Tavaxy, a stand-alone system for creating and executing workflows based on using an extensible set of re-usable workflow patterns. Tavaxy offers a set of new features that simplify and enhance the development of sequence analysis applications: It allows the integration of existing Taverna and Galaxy workflows in a single environment, and supports the use of cloud computing capabilities. The integration of existing Taverna and Galaxy workflows is supported seamlessly at both run-time and design-time levels, based on the concepts of hierarchical workflows and workflow patterns. The use of cloud computing in Tavaxy is flexible, where the users can either instantiate the whole system on the cloud, or delegate the execution of certain sub-workflows to the cloud infrastructure. Tavaxy reduces the workflow development cycle by introducing the use of workflow patterns to simplify workflow creation. It enables the re-use and integration of existing (sub-) workflows from Taverna and Galaxy, and allows the creation of hybrid workflows. Its additional features exploit recent advances in high performance cloud computing to cope with the increasing data size and complexity of analysis.The system can be accessed either through a cloud-enabled web-interface or downloaded and installed to run within the user's local environment. All resources related to Tavaxy are available at http://www.tavaxy.org.
wft4galaxy: a workflow testing tool for galaxy.

PubMed

Piras, Marco Enrico; Pireddu, Luca; Zanetti, Gianluigi

2017-12-01

Workflow managers for scientific analysis provide a high-level programming platform facilitating standardization, automation, collaboration and access to sophisticated computing resources. The Galaxy workflow manager provides a prime example of this type of platform. As compositions of simpler tools, workflows effectively comprise specialized computer programs implementing often very complex analysis procedures. To date, no simple way to automatically test Galaxy workflows and ensure their correctness has appeared in the literature. With wft4galaxy we offer a tool to bring automated testing to Galaxy workflows, making it feasible to bring continuous integration to their development and ensuring that defects are detected promptly. wft4galaxy can be easily installed as a regular Python program or launched directly as a Docker container-the latter reducing installation effort to a minimum. Available at https://github.com/phnmnl/wft4galaxy under the Academic Free License v3.0. marcoenrico.piras@crs4.it. © The Author 2017. Published by Oxford University Press.
Realizing the potential of the CUAHSI Water Data Center to advance Earth Science

NASA Astrophysics Data System (ADS)

Hooper, R. P.; Seul, M.; Pollak, J.; Couch, A.

2015-12-01

The CUAHSI Water Data Center has developed a cloud-based system for data publication, discovery and access. Key features of this system are a semantically enabled catalog to discover data across more than 100 different services and delivery of data and metadata in a standard format. While this represents a significant technical achievement, the purpose of this system is to support data reanalysis for advancing science. A new web-based client, HydroClient, improves access to the data from previous clients. This client is envisioned as the first step in a workflow that can involve visualization and analysis using web-processing services, followed by download to local computers for further analysis. The release of the WaterML library in the R package CRAN repository is an initial attempt at linking the WDC services in a larger analysis workflow. We are seeking community input on other resources required to make the WDC services more valuable in scientific research and education.
Workflows for Full Waveform Inversions

NASA Astrophysics Data System (ADS)

Boehm, Christian; Krischer, Lion; Afanasiev, Michael; van Driel, Martin; May, Dave A.; Rietmann, Max; Fichtner, Andreas

2017-04-01

Despite many theoretical advances and the increasing availability of high-performance computing clusters, full seismic waveform inversions still face considerable challenges regarding data and workflow management. While the community has access to solvers which can harness modern heterogeneous computing architectures, the computational bottleneck has fallen to these often manpower-bounded issues that need to be overcome to facilitate further progress. Modern inversions involve huge amounts of data and require a tight integration between numerical PDE solvers, data acquisition and processing systems, nonlinear optimization libraries, and job orchestration frameworks. To this end we created a set of libraries and applications revolving around Salvus (http://salvus.io), a novel software package designed to solve large-scale full waveform inverse problems. This presentation focuses on solving passive source seismic full waveform inversions from local to global scales with Salvus. We discuss (i) design choices for the aforementioned components required for full waveform modeling and inversion, (ii) their implementation in the Salvus framework, and (iii) how it is all tied together by a usable workflow system. We combine state-of-the-art algorithms ranging from high-order finite-element solutions of the wave equation to quasi-Newton optimization algorithms using trust-region methods that can handle inexact derivatives. All is steered by an automated interactive graph-based workflow framework capable of orchestrating all necessary pieces. This naturally facilitates the creation of new Earth models and hopefully sparks new scientific insights. Additionally, and even more importantly, it enhances reproducibility and reliability of the final results.
Dispel4py: An Open-Source Python library for Data-Intensive Seismology

NASA Astrophysics Data System (ADS)

Filgueira, Rosa; Krause, Amrey; Spinuso, Alessandro; Klampanos, Iraklis; Danecek, Peter; Atkinson, Malcolm

2015-04-01

Scientific workflows are a necessary tool for many scientific communities as they enable easy composition and execution of applications on computing resources while scientists can focus on their research without being distracted by the computation management. Nowadays, scientific communities (e.g. Seismology) have access to a large variety of computing resources and their computational problems are best addressed using parallel computing technology. However, successful use of these technologies requires a lot of additional machinery whose use is not straightforward for non-experts: different parallel frameworks (MPI, Storm, multiprocessing, etc.) must be used depending on the computing resources (local machines, grids, clouds, clusters) where applications are run. This implies that for achieving the best applications' performance, users usually have to change their codes depending on the features of the platform selected for running them. This work presents dispel4py, a new open-source Python library for describing abstract stream-based workflows for distributed data-intensive applications. Special care has been taken to provide dispel4py with the ability to map abstract workflows to different platforms dynamically at run-time. Currently dispel4py has four mappings: Apache Storm, MPI, multi-threading and sequential. The main goal of dispel4py is to provide an easy-to-use tool to develop and test workflows in local resources by using the sequential mode with a small dataset. Later, once a workflow is ready for long runs, it can be automatically executed on different parallel resources. dispel4py takes care of the underlying mappings by performing an efficient parallelisation. Processing Elements (PE) represent the basic computational activities of any dispel4Py workflow, which can be a seismologic algorithm, or a data transformation process. For creating a dispel4py workflow, users only have to write very few lines of code to describe their PEs and how they are connected by using Python, which is widely supported on many platforms and is popular in many scientific domains, such as in geosciences. Once, a dispel4py workflow is written, a user only has to select which mapping they would like to use, and everything else (parallelisation, distribution of data) is carried on by dispel4py without any cost to the user. Among all dispel4py features we would like to highlight the following: * The PEs are connected by streams and not by writing to and reading from intermediate files, avoiding many IO operations. * The PEs can be stored into a registry. Therefore, different users can recombine PEs in many different workflows. * dispel4py has been enriched with a provenance mechanism to support runtime provenance analysis. We have adopted the W3C-PROV data model, which is accessible via a prototypal browser-based user interface and a web API. It supports the users with the visualisation of graphical products and offers combined operations to access and download the data, which may be selectively stored at runtime, into dedicated data archives. dispel4py has been already used by seismologists in the VERCE project to develop different seismic workflows. One of them is the Seismic Ambient Noise Cross-Correlation workflow, which preprocesses and cross-correlates traces from several stations. First, this workflow was tested on a local machine by using a small number of stations as input data. Later, it was executed on different parallel platforms (SuperMUC cluster, and Terracorrelator machine), automatically scaling up by using MPI and multiprocessing mappings and up to 1000 stations as input data. The results show that the dispel4py achieves scalable performance in both mappings tested on different parallel platforms.
DEWEY: the DICOM-enabled workflow engine system.

PubMed

Erickson, Bradley J; Langer, Steve G; Blezek, Daniel J; Ryan, William J; French, Todd L

2014-06-01

Workflow is a widely used term to describe the sequence of steps to accomplish a task. The use of workflow technology in medicine and medical imaging in particular is limited. In this article, we describe the application of a workflow engine to improve workflow in a radiology department. We implemented a DICOM-enabled workflow engine system in our department. We designed it in a way to allow for scalability, reliability, and flexibility. We implemented several workflows, including one that replaced an existing manual workflow and measured the number of examinations prepared in time without and with the workflow system. The system significantly increased the number of examinations prepared in time for clinical review compared to human effort. It also met the design goals defined at its outset. Workflow engines appear to have value as ways to efficiently assure that complex workflows are completed in a timely fashion.
Next Generation Global Navigation Satellite Systems (GNSS) Processing at NASA CDDIS

NASA Astrophysics Data System (ADS)

Michael, B. P.; Noll, C. E.

2016-12-01

The Crustal Dynamics Data Information System (CDDIS) has been providing access to space geodesy and related data sets since 1982, and in particular, Global Navigation Satellite Systems (GNSS) data and derived products since 1992. The CDDIS became one of the Earth Observing System Data and Information System (EOSDIS) archive centers in 2007. As such, CDDIS has evolved to offer a broad range of data ingest services, from data upload, quality control, documentation, metadata extraction, and ancillary information. With a growing understanding of the needs and goals of its science users CDDIS continues to improve these services. Due to the importance of GNSS data and derived products in scientific studies over the last decade, CDDIS has seen its ingest volume explode to over 30 million files per year or more than one file per second from over hundreds of simultaneous data providers. In order to accommodate this increase and to streamline operations and fully automate the workflow, CDDIS has recently updated the data submission process and GNSS processing. This poster will cover this new ingest infrastructure, workflow, and the agile techniques applied in its development and current operations.
CyberShake: Running Seismic Hazard Workflows on Distributed HPC Resources

NASA Astrophysics Data System (ADS)

Callaghan, S.; Maechling, P. J.; Graves, R. W.; Gill, D.; Olsen, K. B.; Milner, K. R.; Yu, J.; Jordan, T. H.

2013-12-01

As part of its program of earthquake system science research, the Southern California Earthquake Center (SCEC) has developed a simulation platform, CyberShake, to perform physics-based probabilistic seismic hazard analysis (PSHA) using 3D deterministic wave propagation simulations. CyberShake performs PSHA by simulating a tensor-valued wavefield of Strain Green Tensors, and then using seismic reciprocity to calculate synthetic seismograms for about 415,000 events per site of interest. These seismograms are processed to compute ground motion intensity measures, which are then combined with probabilities from an earthquake rupture forecast to produce a site-specific hazard curve. Seismic hazard curves for hundreds of sites in a region can be used to calculate a seismic hazard map, representing the seismic hazard for a region. We present a recently completed PHSA study in which we calculated four CyberShake seismic hazard maps for the Southern California area to compare how CyberShake hazard results are affected by different SGT computational codes (AWP-ODC and AWP-RWG) and different community velocity models (Community Velocity Model - SCEC (CVM-S4) v11.11 and Community Velocity Model - Harvard (CVM-H) v11.9). We present our approach to running workflow applications on distributed HPC resources, including systems without support for remote job submission. We show how our approach extends the benefits of scientific workflows, such as job and data management, to large-scale applications on Track 1 and Leadership class open-science HPC resources. We used our distributed workflow approach to perform CyberShake Study 13.4 on two new NSF open-science HPC computing resources, Blue Waters and Stampede, executing over 470 million tasks to calculate physics-based hazard curves for 286 locations in the Southern California region. For each location, we calculated seismic hazard curves with two different community velocity models and two different SGT codes, resulting in over 1100 hazard curves. We will report on the performance of this CyberShake study, four times larger than previous studies. Additionally, we will examine the challenges we face applying these workflow techniques to additional open-science HPC systems and discuss whether our workflow solutions continue to provide value to our large-scale PSHA calculations.
Talkoot Portals: Discover, Tag, Share, and Reuse Collaborative Science Workflows (Invited)

NASA Astrophysics Data System (ADS)

Wilson, B. D.; Ramachandran, R.; Lynnes, C.

2009-12-01

A small but growing number of scientists are beginning to harness Web 2.0 technologies, such as wikis, blogs, and social tagging, as a transformative way of doing science. These technologies provide researchers easy mechanisms to critique, suggest and share ideas, data and algorithms. At the same time, large suites of algorithms for science analysis are being made available as remotely-invokable Web Services, which can be chained together to create analysis workflows. This provides the research community an unprecedented opportunity to collaborate by sharing their workflows with one another, reproducing and analyzing research results, and leveraging colleagues’ expertise to expedite the process of scientific discovery. However, wikis and similar technologies are limited to text, static images and hyperlinks, providing little support for collaborative data analysis. A team of information technology and Earth science researchers from multiple institutions have come together to improve community collaboration in science analysis by developing a customizable “software appliance” to build collaborative portals for Earth Science services and analysis workflows. The critical requirement is that researchers (not just information technologists) be able to build collaborative sites around service workflows within a few hours. We envision online communities coming together, much like Finnish “talkoot” (a barn raising), to build a shared research space. Talkoot extends a freely available, open source content management framework with a series of modules specific to Earth Science for registering, creating, managing, discovering, tagging and sharing Earth Science web services and workflows for science data processing, analysis and visualization. Users will be able to author a “science story” in shareable web notebooks, including plots or animations, backed up by an executable workflow that directly reproduces the science analysis. New services and workflows of interest will be discoverable using tag search, and advertised using “service casts” and “interest casts” (Atom feeds). Multiple science workflow systems will be plugged into the system, with initial support for UAH’s Mining Workflow Composer and the open-source Active BPEL engine, and JPL’s SciFlo engine and the VizFlow visual programming interface. With the ability to share and execute analysis workflows, Talkoot portals can be used to do collaborative science in addition to communicate ideas and results. It will be useful for different science domains, mission teams, research projects and organizations. Thus, it will help to solve the “sociological” problem of bringing together disparate groups of researchers, and the technical problem of advertising, discovering, developing, documenting, and maintaining inter-agency science workflows. The presentation will discuss the goals of and barriers to Science 2.0, the social web technologies employed in the Talkoot software appliance (e.g. CMS, social tagging, personal presence, advertising by feeds, etc.), illustrate the resulting collaborative capabilities, and show early prototypes of the web interfaces (e.g. embedded workflows).
An ontology-based framework for bioinformatics workflows.

PubMed

Digiampietri, Luciano A; Perez-Alcazar, Jose de J; Medeiros, Claudia Bauzer

2007-01-01

The proliferation of bioinformatics activities brings new challenges - how to understand and organise these resources, how to exchange and reuse successful experimental procedures, and to provide interoperability among data and tools. This paper describes an effort toward these directions. It is based on combining research on ontology management, AI and scientific workflows to design, reuse and annotate bioinformatics experiments. The resulting framework supports automatic or interactive composition of tasks based on AI planning techniques and takes advantage of ontologies to support the specification and annotation of bioinformatics workflows. We validate our proposal with a prototype running on real data.
Tavaxy: Integrating Taverna and Galaxy workflows with cloud computing support

PubMed Central

2012-01-01

Background Over the past decade the workflow system paradigm has evolved as an efficient and user-friendly approach for developing complex bioinformatics applications. Two popular workflow systems that have gained acceptance by the bioinformatics community are Taverna and Galaxy. Each system has a large user-base and supports an ever-growing repository of application workflows. However, workflows developed for one system cannot be imported and executed easily on the other. The lack of interoperability is due to differences in the models of computation, workflow languages, and architectures of both systems. This lack of interoperability limits sharing of workflows between the user communities and leads to duplication of development efforts. Results In this paper, we present Tavaxy, a stand-alone system for creating and executing workflows based on using an extensible set of re-usable workflow patterns. Tavaxy offers a set of new features that simplify and enhance the development of sequence analysis applications: It allows the integration of existing Taverna and Galaxy workflows in a single environment, and supports the use of cloud computing capabilities. The integration of existing Taverna and Galaxy workflows is supported seamlessly at both run-time and design-time levels, based on the concepts of hierarchical workflows and workflow patterns. The use of cloud computing in Tavaxy is flexible, where the users can either instantiate the whole system on the cloud, or delegate the execution of certain sub-workflows to the cloud infrastructure. Conclusions Tavaxy reduces the workflow development cycle by introducing the use of workflow patterns to simplify workflow creation. It enables the re-use and integration of existing (sub-) workflows from Taverna and Galaxy, and allows the creation of hybrid workflows. Its additional features exploit recent advances in high performance cloud computing to cope with the increasing data size and complexity of analysis. The system can be accessed either through a cloud-enabled web-interface or downloaded and installed to run within the user's local environment. All resources related to Tavaxy are available at http://www.tavaxy.org. PMID:22559942
The Science DMZ: A Network Design Pattern for Data-Intensive Science

DOE PAGES

Dart, Eli; Rotman, Lauren; Tierney, Brian; ...

2014-01-01

The ever-increasing scale of scientific data has become a significant challenge for researchers that rely on networks to interact with remote computing systems and transfer results to collaborators worldwide. Despite the availability of high-capacity connections, scientists struggle with inadequate cyberinfrastructure that cripples data transfer performance, and impedes scientific progress. The Science DMZ paradigm comprises a proven set of network design patterns that collectively address these problems for scientists. We explain the Science DMZ model, including network architecture, system configuration, cybersecurity, and performance tools, that creates an optimized network environment for science. We describe use cases from universities, supercomputing centers andmore » research laboratories, highlighting the effectiveness of the Science DMZ model in diverse operational settings. In all, the Science DMZ model is a solid platform that supports any science workflow, and flexibly accommodates emerging network technologies. As a result, the Science DMZ vastly improves collaboration, accelerating scientific discovery.« less
Performing statistical analyses on quantitative data in Taverna workflows: an example using R and maxdBrowse to identify differentially-expressed genes from microarray data.

PubMed

Li, Peter; Castrillo, Juan I; Velarde, Giles; Wassink, Ingo; Soiland-Reyes, Stian; Owen, Stuart; Withers, David; Oinn, Tom; Pocock, Matthew R; Goble, Carole A; Oliver, Stephen G; Kell, Douglas B

2008-08-07

There has been a dramatic increase in the amount of quantitative data derived from the measurement of changes at different levels of biological complexity during the post-genomic era. However, there are a number of issues associated with the use of computational tools employed for the analysis of such data. For example, computational tools such as R and MATLAB require prior knowledge of their programming languages in order to implement statistical analyses on data. Combining two or more tools in an analysis may also be problematic since data may have to be manually copied and pasted between separate user interfaces for each tool. Furthermore, this transfer of data may require a reconciliation step in order for there to be interoperability between computational tools. Developments in the Taverna workflow system have enabled pipelines to be constructed and enacted for generic and ad hoc analyses of quantitative data. Here, we present an example of such a workflow involving the statistical identification of differentially-expressed genes from microarray data followed by the annotation of their relationships to cellular processes. This workflow makes use of customised maxdBrowse web services, a system that allows Taverna to query and retrieve gene expression data from the maxdLoad2 microarray database. These data are then analysed by R to identify differentially-expressed genes using the Taverna RShell processor which has been developed for invoking this tool when it has been deployed as a service using the RServe library. In addition, the workflow uses Beanshell scripts to reconcile mismatches of data between services as well as to implement a form of user interaction for selecting subsets of microarray data for analysis as part of the workflow execution. A new plugin system in the Taverna software architecture is demonstrated by the use of renderers for displaying PDF files and CSV formatted data within the Taverna workbench. Taverna can be used by data analysis experts as a generic tool for composing ad hoc analyses of quantitative data by combining the use of scripts written in the R programming language with tools exposed as services in workflows. When these workflows are shared with colleagues and the wider scientific community, they provide an approach for other scientists wanting to use tools such as R without having to learn the corresponding programming language to analyse their own data.
Performing statistical analyses on quantitative data in Taverna workflows: An example using R and maxdBrowse to identify differentially-expressed genes from microarray data

PubMed Central

Li, Peter; Castrillo, Juan I; Velarde, Giles; Wassink, Ingo; Soiland-Reyes, Stian; Owen, Stuart; Withers, David; Oinn, Tom; Pocock, Matthew R; Goble, Carole A; Oliver, Stephen G; Kell, Douglas B

2008-01-01

Background There has been a dramatic increase in the amount of quantitative data derived from the measurement of changes at different levels of biological complexity during the post-genomic era. However, there are a number of issues associated with the use of computational tools employed for the analysis of such data. For example, computational tools such as R and MATLAB require prior knowledge of their programming languages in order to implement statistical analyses on data. Combining two or more tools in an analysis may also be problematic since data may have to be manually copied and pasted between separate user interfaces for each tool. Furthermore, this transfer of data may require a reconciliation step in order for there to be interoperability between computational tools. Results Developments in the Taverna workflow system have enabled pipelines to be constructed and enacted for generic and ad hoc analyses of quantitative data. Here, we present an example of such a workflow involving the statistical identification of differentially-expressed genes from microarray data followed by the annotation of their relationships to cellular processes. This workflow makes use of customised maxdBrowse web services, a system that allows Taverna to query and retrieve gene expression data from the maxdLoad2 microarray database. These data are then analysed by R to identify differentially-expressed genes using the Taverna RShell processor which has been developed for invoking this tool when it has been deployed as a service using the RServe library. In addition, the workflow uses Beanshell scripts to reconcile mismatches of data between services as well as to implement a form of user interaction for selecting subsets of microarray data for analysis as part of the workflow execution. A new plugin system in the Taverna software architecture is demonstrated by the use of renderers for displaying PDF files and CSV formatted data within the Taverna workbench. Conclusion Taverna can be used by data analysis experts as a generic tool for composing ad hoc analyses of quantitative data by combining the use of scripts written in the R programming language with tools exposed as services in workflows. When these workflows are shared with colleagues and the wider scientific community, they provide an approach for other scientists wanting to use tools such as R without having to learn the corresponding programming language to analyse their own data. PMID:18687127
A Scalable, Open Source Platform for Data Processing, Archiving and Dissemination

DTIC Science & Technology

2016-01-01

Object Oriented Data Technology (OODT) big data toolkit developed by NASA and the Work-flow INstance Generation and Selection (WINGS) scientific work...to several challenge big data problems and demonstrated the utility of OODT-WINGS in addressing them. Specific demonstrated analyses address i...source software, Apache, Object Oriented Data Technology, OODT, semantic work-flows, WINGS, big data , work- flow management 16. SECURITY CLASSIFICATION OF
GUEST EDITOR'S INTRODUCTION: Guest Editor's introduction

NASA Astrophysics Data System (ADS)

Chrysanthis, Panos K.

1996-12-01

Computer Science Department, University of Pittsburgh, Pittsburgh, PA 15260, USA This special issue focuses on current efforts to represent and support workflows that integrate information systems and human resources within a business or manufacturing enterprise. Workflows may also be viewed as an emerging computational paradigm for effective structuring of cooperative applications involving human users and access to diverse data types not necessarily maintained by traditional database management systems. A workflow is an automated organizational process (also called business process) which consists of a set of activities or tasks that need to be executed in a particular controlled order over a combination of heterogeneous database systems and legacy systems. Within workflows, tasks are performed cooperatively by either human or computational agents in accordance with their roles in the organizational hierarchy. The challenge in facilitating the implementation of workflows lies in developing efficient workflow management systems. A workflow management system (also called workflow server, workflow engine or workflow enactment system) provides the necessary interfaces for coordination and communication among human and computational agents to execute the tasks involved in a workflow and controls the execution orderings of tasks as well as the flow of data that these tasks manipulate. That is, the workflow management system is responsible for correctly and reliably supporting the specification, execution, and monitoring of workflows. The six papers selected (out of the twenty-seven submitted for this special issue of Distributed Systems Engineering) address different aspects of these three functional components of a workflow management system. In the first paper, `Correctness issues in workflow management', Kamath and Ramamritham discuss the important issue of correctness in workflow management that constitutes a prerequisite for the use of workflows in the automation of the critical organizational/business processes. In particular, this paper examines the issues of execution atomicity and failure atomicity, differentiating between correctness requirements of system failures and logical failures, and surveys techniques that can be used to ensure data consistency in workflow management systems. While the first paper is concerned with correctness assuming transactional workflows in which selective transactional properties are associated with individual tasks or the entire workflow, the second paper, `Scheduling workflows by enforcing intertask dependencies' by Attie et al, assumes that the tasks can be either transactions or other activities involving legacy systems. This second paper describes the modelling and specification of conditions involving events and dependencies among tasks within a workflow using temporal logic and finite state automata. It also presents a scheduling algorithm that enforces all stated dependencies by executing at any given time only those events that are allowed by all the dependency automata and in an order as specified by the dependencies. In any system with decentralized control, there is a need to effectively cope with the tension that exists between autonomy and consistency requirements. In `A three-level atomicity model for decentralized workflow management systems', Ben-Shaul and Heineman focus on the specific requirement of enforcing failure atomicity in decentralized, autonomous and interacting workflow management systems. Their paper describes a model in which each workflow manager must be able to specify the sequence of tasks that comprise an atomic unit for the purposes of correctness, and the degrees of local and global atomicity for the purpose of cooperation with other workflow managers. The paper also discusses a realization of this model in which treaties and summits provide an agreement mechanism, while underlying transaction managers are responsible for maintaining failure atomicity. The fourth and fifth papers are experience papers describing a workflow management system and a large scale workflow application, respectively. Schill and Mittasch, in `Workflow management systems on top of OSF DCE and OMG CORBA', describe a decentralized workflow management system and discuss its implementation using two standardized middleware platforms, namely, OSF DCE and OMG CORBA. The system supports a new approach to workflow management, introducing several new concepts such as data type management for integrating various types of data and quality of service for various services provided by servers. A problem common to both database applications and workflows is the handling of missing and incomplete information. This is particularly pervasive in an `electronic market' with a huge number of retail outlets producing and exchanging volumes of data, the application discussed in `Information flow in the DAMA project beyond database managers: information flow managers'. Motivated by the need for a method that allows a task to proceed in a timely manner if not all data produced by other tasks are available by its deadline, Russell et al propose an architectural framework and a language that can be used to detect, approximate and, later on, to adjust missing data if necessary. The final paper, `The evolution towards flexible workflow systems' by Nutt, is complementary to the other papers and is a survey of issues and of work related to both workflow and computer supported collaborative work (CSCW) areas. In particular, the paper provides a model and a categorization of the dimensions which workflow management and CSCW systems share. Besides summarizing the recent advancements towards efficient workflow management, the papers in this special issue suggest areas open to investigation and it is our hope that they will also provide the stimulus for further research and development in the area of workflow management systems.
Linking data repositories - an illustration of agile data curation principles through robust documentation and multiple application programming interfaces

NASA Astrophysics Data System (ADS)

Benedict, K. K.; Servilla, M. S.; Vanderbilt, K.; Wheeler, J.

2015-12-01

The growing volume, variety and velocity of production of Earth science data magnifies the impact of inefficiencies in data acquisition, processing, analysis, and sharing workflows, potentially to the point of impairing the ability of researchers to accomplish their desired scientific objectives. The adaptation of agile software development principles (http://agilemanifesto.org/principles.html) to data curation processes has significant potential to lower barriers to effective scientific data discovery and reuse - barriers that otherwise may force the development of new data to replace existing but unusable data, or require substantial effort to make data usable in new research contexts. This paper outlines a data curation process that was developed at the University of New Mexico that provides a cross-walk of data and associated documentation between the data archive developed by the Long Term Ecological Research (LTER) Network Office (PASTA - http://lno.lternet.edu/content/network-information-system) and UNM's institutional repository (LoboVault - http://repository.unm.edu). The developed automated workflow enables the replication of versioned data objects and their associated standards-based metadata between the LTER system and LoboVault - providing long-term preservation for those data/metadata packages within LoboVault while maintaining the value-added services that the PASTA platform provides. The relative ease with which this workflow was developed is a product of the capabilities independently developed on both platforms - including the simplicity of providing a well-documented application programming interface (API) for each platform enabling scripted interaction and the use of well-established documentation standards (EML in the case of PASTA, Dublin Core in the case of LoboVault) by both systems. These system characteristics, when combined with an iterative process of interaction between the Data Curation Librarian (on the LoboVault side of the process), the Sevilleta LTER Information Manager and the LTER Network Information System developer, yielded a rapid and relatively streamlined process for targeted replication of data and metadata between the two systems - increasing the discoverability and usability of the LTER data assets.
Genetic design automation: engineering fantasy or scientific renewal?

PubMed

Lux, Matthew W; Bramlett, Brian W; Ball, David A; Peccoud, Jean

2012-02-01

The aim of synthetic biology is to make genetic systems more amenable to engineering, which has naturally led to the development of computer-aided design (CAD) tools. Experimentalists still primarily rely on project-specific ad hoc workflows instead of domain-specific tools, which suggests that CAD tools are lagging behind the front line of the field. Here, we discuss the scientific hurdles that have limited the productivity gains anticipated from existing tools. We argue that the real value of efforts to develop CAD tools is the formalization of genetic design rules that determine the complex relationships between genotype and phenotype. Copyright © 2011 Elsevier Ltd. All rights reserved.

Cloud services for the Fermilab scientific stakeholders

DOE PAGES

Timm, S.; Garzoglio, G.; Mhashilkar, P.; ...

2015-12-23

As part of the Fermilab/KISTI cooperative research project, Fermilab has successfully run an experimental simulation workflow at scale on a federation of Amazon Web Services (AWS), FermiCloud, and local FermiGrid resources. We used the CernVM-FS (CVMFS) file system to deliver the application software. We established Squid caching servers in AWS as well, using the Shoal system to let each individual virtual machine find the closest squid server. We also developed an automatic virtual machine conversion system so that we could transition virtual machines made on FermiCloud to Amazon Web Services. We used this system to successfully run a cosmic raymore » simulation of the NOvA detector at Fermilab, making use of both AWS spot pricing and network bandwidth discounts to minimize the cost. On FermiCloud we also were able to run the workflow at the scale of 1000 virtual machines, using a private network routable inside of Fermilab. As a result, we present in detail the technological improvements that were used to make this work a reality.« less
Cloud services for the Fermilab scientific stakeholders

DOE Office of Scientific and Technical Information (OSTI.GOV)

Timm, S.; Garzoglio, G.; Mhashilkar, P.

As part of the Fermilab/KISTI cooperative research project, Fermilab has successfully run an experimental simulation workflow at scale on a federation of Amazon Web Services (AWS), FermiCloud, and local FermiGrid resources. We used the CernVM-FS (CVMFS) file system to deliver the application software. We established Squid caching servers in AWS as well, using the Shoal system to let each individual virtual machine find the closest squid server. We also developed an automatic virtual machine conversion system so that we could transition virtual machines made on FermiCloud to Amazon Web Services. We used this system to successfully run a cosmic raymore » simulation of the NOvA detector at Fermilab, making use of both AWS spot pricing and network bandwidth discounts to minimize the cost. On FermiCloud we also were able to run the workflow at the scale of 1000 virtual machines, using a private network routable inside of Fermilab. As a result, we present in detail the technological improvements that were used to make this work a reality.« less
Grid workflow validation using ontology-based tacit knowledge: A case study for quantitative remote sensing applications

NASA Astrophysics Data System (ADS)

Liu, Jia; Liu, Longli; Xue, Yong; Dong, Jing; Hu, Yingcui; Hill, Richard; Guang, Jie; Li, Chi

2017-01-01

Workflow for remote sensing quantitative retrieval is the ;bridge; between Grid services and Grid-enabled application of remote sensing quantitative retrieval. Workflow averts low-level implementation details of the Grid and hence enables users to focus on higher levels of application. The workflow for remote sensing quantitative retrieval plays an important role in remote sensing Grid and Cloud computing services, which can support the modelling, construction and implementation of large-scale complicated applications of remote sensing science. The validation of workflow is important in order to support the large-scale sophisticated scientific computation processes with enhanced performance and to minimize potential waste of time and resources. To research the semantic correctness of user-defined workflows, in this paper, we propose a workflow validation method based on tacit knowledge research in the remote sensing domain. We first discuss the remote sensing model and metadata. Through detailed analysis, we then discuss the method of extracting the domain tacit knowledge and expressing the knowledge with ontology. Additionally, we construct the domain ontology with Protégé. Through our experimental study, we verify the validity of this method in two ways, namely data source consistency error validation and parameters matching error validation.
BPELPower—A BPEL execution engine for geospatial web services

NASA Astrophysics Data System (ADS)

Yu, Genong (Eugene); Zhao, Peisheng; Di, Liping; Chen, Aijun; Deng, Meixia; Bai, Yuqi

2012-10-01

The Business Process Execution Language (BPEL) has become a popular choice for orchestrating and executing workflows in the Web environment. As one special kind of scientific workflow, geospatial Web processing workflows are data-intensive, deal with complex structures in data and geographic features, and execute automatically with limited human intervention. To enable the proper execution and coordination of geospatial workflows, a specially enhanced BPEL execution engine is required. BPELPower was designed, developed, and implemented as a generic BPEL execution engine with enhancements for executing geospatial workflows. The enhancements are especially in its capabilities in handling Geography Markup Language (GML) and standard geospatial Web services, such as the Web Processing Service (WPS) and the Web Feature Service (WFS). BPELPower has been used in several demonstrations over the decade. Two scenarios were discussed in detail to demonstrate the capabilities of BPELPower. That study showed a standard-compliant, Web-based approach for properly supporting geospatial processing, with the only enhancement at the implementation level. Pattern-based evaluation and performance improvement of the engine are discussed: BPELPower directly supports 22 workflow control patterns and 17 workflow data patterns. In the future, the engine will be enhanced with high performance parallel processing and broad Web paradigms.
Developing a workflow to identify inconsistencies in volunteered geographic information: a phenological case study

USGS Publications Warehouse

Mehdipoor, Hamed; Zurita-Milla, Raul; Rosemartin, Alyssa; Gerst, Katharine L.; Weltzin, Jake F.

2015-01-01

Recent improvements in online information communication and mobile location-aware technologies have led to the production of large volumes of volunteered geographic information. Widespread, large-scale efforts by volunteers to collect data can inform and drive scientific advances in diverse fields, including ecology and climatology. Traditional workflows to check the quality of such volunteered information can be costly and time consuming as they heavily rely on human interventions. However, identifying factors that can influence data quality, such as inconsistency, is crucial when these data are used in modeling and decision-making frameworks. Recently developed workflows use simple statistical approaches that assume that the majority of the information is consistent. However, this assumption is not generalizable, and ignores underlying geographic and environmental contextual variability that may explain apparent inconsistencies. Here we describe an automated workflow to check inconsistency based on the availability of contextual environmental information for sampling locations. The workflow consists of three steps: (1) dimensionality reduction to facilitate further analysis and interpretation of results, (2) model-based clustering to group observations according to their contextual conditions, and (3) identification of inconsistent observations within each cluster. The workflow was applied to volunteered observations of flowering in common and cloned lilac plants (Syringa vulgaris and Syringa x chinensis) in the United States for the period 1980 to 2013. About 97% of the observations for both common and cloned lilacs were flagged as consistent, indicating that volunteers provided reliable information for this case study. Relative to the original dataset, the exclusion of inconsistent observations changed the apparent rate of change in lilac bloom dates by two days per decade, indicating the importance of inconsistency checking as a key step in data quality assessment for volunteered geographic information. Initiatives that leverage volunteered geographic information can adapt this workflow to improve the quality of their datasets and the robustness of their scientific analyses.
Research and Implementation of Key Technologies in Multi-Agent System to Support Distributed Workflow

NASA Astrophysics Data System (ADS)

Pan, Tianheng

2018-01-01

In recent years, the combination of workflow management system and Multi-agent technology is a hot research field. The problem of lack of flexibility in workflow management system can be improved by introducing multi-agent collaborative management. The workflow management system adopts distributed structure. It solves the problem that the traditional centralized workflow structure is fragile. In this paper, the agent of Distributed workflow management system is divided according to its function. The execution process of each type of agent is analyzed. The key technologies such as process execution and resource management are analyzed.
GeNNet: an integrated platform for unifying scientific workflows and graph databases for transcriptome data analysis

PubMed Central

Gadelha, Luiz; Ribeiro-Alves, Marcelo; Porto, Fábio

2017-01-01

There are many steps in analyzing transcriptome data, from the acquisition of raw data to the selection of a subset of representative genes that explain a scientific hypothesis. The data produced can be represented as networks of interactions among genes and these may additionally be integrated with other biological databases, such as Protein-Protein Interactions, transcription factors and gene annotation. However, the results of these analyses remain fragmented, imposing difficulties, either for posterior inspection of results, or for meta-analysis by the incorporation of new related data. Integrating databases and tools into scientific workflows, orchestrating their execution, and managing the resulting data and its respective metadata are challenging tasks. Additionally, a great amount of effort is equally required to run in-silico experiments to structure and compose the information as needed for analysis. Different programs may need to be applied and different files are produced during the experiment cycle. In this context, the availability of a platform supporting experiment execution is paramount. We present GeNNet, an integrated transcriptome analysis platform that unifies scientific workflows with graph databases for selecting relevant genes according to the evaluated biological systems. It includes GeNNet-Wf, a scientific workflow that pre-loads biological data, pre-processes raw microarray data and conducts a series of analyses including normalization, differential expression inference, clusterization and gene set enrichment analysis. A user-friendly web interface, GeNNet-Web, allows for setting parameters, executing, and visualizing the results of GeNNet-Wf executions. To demonstrate the features of GeNNet, we performed case studies with data retrieved from GEO, particularly using a single-factor experiment in different analysis scenarios. As a result, we obtained differentially expressed genes for which biological functions were analyzed. The results are integrated into GeNNet-DB, a database about genes, clusters, experiments and their properties and relationships. The resulting graph database is explored with queries that demonstrate the expressiveness of this data model for reasoning about gene interaction networks. GeNNet is the first platform to integrate the analytical process of transcriptome data with graph databases. It provides a comprehensive set of tools that would otherwise be challenging for non-expert users to install and use. Developers can add new functionality to components of GeNNet. The derived data allows for testing previous hypotheses about an experiment and exploring new ones through the interactive graph database environment. It enables the analysis of different data on humans, rhesus, mice and rat coming from Affymetrix platforms. GeNNet is available as an open source platform at https://github.com/raquele/GeNNet and can be retrieved as a software container with the command docker pull quelopes/gennet. PMID:28695067
GeNNet: an integrated platform for unifying scientific workflows and graph databases for transcriptome data analysis.

PubMed

Costa, Raquel L; Gadelha, Luiz; Ribeiro-Alves, Marcelo; Porto, Fábio

2017-01-01

There are many steps in analyzing transcriptome data, from the acquisition of raw data to the selection of a subset of representative genes that explain a scientific hypothesis. The data produced can be represented as networks of interactions among genes and these may additionally be integrated with other biological databases, such as Protein-Protein Interactions, transcription factors and gene annotation. However, the results of these analyses remain fragmented, imposing difficulties, either for posterior inspection of results, or for meta-analysis by the incorporation of new related data. Integrating databases and tools into scientific workflows, orchestrating their execution, and managing the resulting data and its respective metadata are challenging tasks. Additionally, a great amount of effort is equally required to run in-silico experiments to structure and compose the information as needed for analysis. Different programs may need to be applied and different files are produced during the experiment cycle. In this context, the availability of a platform supporting experiment execution is paramount. We present GeNNet, an integrated transcriptome analysis platform that unifies scientific workflows with graph databases for selecting relevant genes according to the evaluated biological systems. It includes GeNNet-Wf, a scientific workflow that pre-loads biological data, pre-processes raw microarray data and conducts a series of analyses including normalization, differential expression inference, clusterization and gene set enrichment analysis. A user-friendly web interface, GeNNet-Web, allows for setting parameters, executing, and visualizing the results of GeNNet-Wf executions. To demonstrate the features of GeNNet, we performed case studies with data retrieved from GEO, particularly using a single-factor experiment in different analysis scenarios. As a result, we obtained differentially expressed genes for which biological functions were analyzed. The results are integrated into GeNNet-DB, a database about genes, clusters, experiments and their properties and relationships. The resulting graph database is explored with queries that demonstrate the expressiveness of this data model for reasoning about gene interaction networks. GeNNet is the first platform to integrate the analytical process of transcriptome data with graph databases. It provides a comprehensive set of tools that would otherwise be challenging for non-expert users to install and use. Developers can add new functionality to components of GeNNet. The derived data allows for testing previous hypotheses about an experiment and exploring new ones through the interactive graph database environment. It enables the analysis of different data on humans, rhesus, mice and rat coming from Affymetrix platforms. GeNNet is available as an open source platform at https://github.com/raquele/GeNNet and can be retrieved as a software container with the command docker pull quelopes/gennet.
sbtools: A package connecting R to cloud-based data for collaborative online research

USGS Publications Warehouse

Winslow, Luke; Chamberlain, Scott; Appling, Alison P.; Read, Jordan S.

2016-01-01

The adoption of high-quality tools for collaboration and reproducible research such as R and Github is becoming more common in many research fields. While Github and other version management systems are excellent resources, they were originally designed to handle code and scale poorly to large text-based or binary datasets. A number of scientific data repositories are coming online and are often focused on dataset archival and publication. To handle collaborative workflows using large scientific datasets, there is increasing need to connect cloud-based online data storage to R. In this article, we describe how the new R package sbtools enables direct access to the advanced online data functionality provided by ScienceBase, the U.S. Geological Survey’s online scientific data storage platform.
From data point timelines to a well curated data set, data mining of experimental data and chemical structure data from scientific articles, problems and possible solutions.

PubMed

Ruusmann, Villu; Maran, Uko

2013-07-01

The scientific literature is important source of experimental and chemical structure data. Very often this data has been harvested into smaller or bigger data collections leaving the data quality and curation issues on shoulders of users. The current research presents a systematic and reproducible workflow for collecting series of data points from scientific literature and assembling a database that is suitable for the purposes of high quality modelling and decision support. The quality assurance aspect of the workflow is concerned with the curation of both chemical structures and associated toxicity values at (1) single data point level and (2) collection of data points level. The assembly of a database employs a novel "timeline" approach. The workflow is implemented as a software solution and its applicability is demonstrated on the example of the Tetrahymena pyriformis acute aquatic toxicity endpoint. A literature collection of 86 primary publications for T. pyriformis was found to contain 2,072 chemical compounds and 2,498 unique toxicity values, which divide into 2,440 numerical and 58 textual values. Every chemical compound was assigned to a preferred toxicity value. Examples for most common chemical and toxicological data curation scenarios are discussed.
Cloud-based opportunities in scientific computing: insights from processing Suomi National Polar-Orbiting Partnership (S-NPP) Direct Broadcast data

NASA Astrophysics Data System (ADS)

Evans, J. D.; Hao, W.; Chettri, S.

2013-12-01

The cloud is proving to be a uniquely promising platform for scientific computing. Our experience with processing satellite data using Amazon Web Services highlights several opportunities for enhanced performance, flexibility, and cost effectiveness in the cloud relative to traditional computing -- for example: - Direct readout from a polar-orbiting satellite such as the Suomi National Polar-Orbiting Partnership (S-NPP) requires bursts of processing a few times a day, separated by quiet periods when the satellite is out of receiving range. In the cloud, by starting and stopping virtual machines in minutes, we can marshal significant computing resources quickly when needed, but not pay for them when not needed. To take advantage of this capability, we are automating a data-driven approach to the management of cloud computing resources, in which new data availability triggers the creation of new virtual machines (of variable size and processing power) which last only until the processing workflow is complete. - 'Spot instances' are virtual machines that run as long as one's asking price is higher than the provider's variable spot price. Spot instances can greatly reduce the cost of computing -- for software systems that are engineered to withstand unpredictable interruptions in service (as occurs when a spot price exceeds the asking price). We are implementing an approach to workflow management that allows data processing workflows to resume with minimal delays after temporary spot price spikes. This will allow systems to take full advantage of variably-priced 'utility computing.' - Thanks to virtual machine images, we can easily launch multiple, identical machines differentiated only by 'user data' containing individualized instructions (e.g., to fetch particular datasets or to perform certain workflows or algorithms) This is particularly useful when (as is the case with S-NPP data) we need to launch many very similar machines to process an unpredictable number of data files concurrently. Our experience shows the viability and flexibility of this approach to workflow management for scientific data processing. - Finally, cloud computing is a promising platform for distributed volunteer ('interstitial') computing, via mechanisms such as the Berkeley Open Infrastructure for Network Computing (BOINC) popularized with the SETI@Home project and others such as ClimatePrediction.net and NASA's Climate@Home. Interstitial computing faces significant challenges as commodity computing shifts from (always on) desktop computers towards smartphones and tablets (untethered and running on scarce battery power); but cloud computing offers significant slack capacity. This capacity includes virtual machines with unused RAM or underused CPUs; virtual storage volumes allocated (& paid for) but not full; and virtual machines that are paid up for the current hour but whose work is complete. We are devising ways to facilitate the reuse of these resources (i.e., cloud-based interstitial computing) for satellite data processing and related analyses. We will present our findings and research directions on these and related topics.
The QuakeSim Project: Web Services for Managing Geophysical Data and Applications

NASA Astrophysics Data System (ADS)

Pierce, Marlon E.; Fox, Geoffrey C.; Aktas, Mehmet S.; Aydin, Galip; Gadgil, Harshawardhan; Qi, Zhigang; Sayar, Ahmet

2008-04-01

We describe our distributed systems research efforts to build the “cyberinfrastructure” components that constitute a geophysical Grid, or more accurately, a Grid of Grids. Service-oriented computing principles are used to build a distributed infrastructure of Web accessible components for accessing data and scientific applications. Our data services fall into two major categories: Archival, database-backed services based around Geographical Information System (GIS) standards from the Open Geospatial Consortium, and streaming services that can be used to filter and route real-time data sources such as Global Positioning System data streams. Execution support services include application execution management services and services for transferring remote files. These data and execution service families are bound together through metadata information and workflow services for service orchestration. Users may access the system through the QuakeSim scientific Web portal, which is built using a portlet component approach.
Radiology information system: a workflow-based approach.

PubMed

Zhang, Jinyan; Lu, Xudong; Nie, Hongchao; Huang, Zhengxing; van der Aalst, W M P

2009-09-01

Introducing workflow management technology in healthcare seems to be prospective in dealing with the problem that the current healthcare Information Systems cannot provide sufficient support for the process management, although several challenges still exist. The purpose of this paper is to study the method of developing workflow-based information system in radiology department as a use case. First, a workflow model of typical radiology process was established. Second, based on the model, the system could be designed and implemented as a group of loosely coupled components. Each component corresponded to one task in the process and could be assembled by the workflow management system. The legacy systems could be taken as special components, which also corresponded to the tasks and were integrated through transferring non-work- flow-aware interfaces to the standard ones. Finally, a workflow dashboard was designed and implemented to provide an integral view of radiology processes. The workflow-based Radiology Information System was deployed in the radiology department of Zhejiang Chinese Medicine Hospital in China. The results showed that it could be adjusted flexibly in response to the needs of changing process, and enhance the process management in the department. It can also provide a more workflow-aware integration method, comparing with other methods such as IHE-based ones. The workflow-based approach is a new method of developing radiology information system with more flexibility, more functionalities of process management and more workflow-aware integration. The work of this paper is an initial endeavor for introducing workflow management technology in healthcare.
Distinguishing Provenance Equivalence of Earth Science Data

NASA Technical Reports Server (NTRS)

Tilmes, Curt; Yesha, Ye; Halem, M.

2010-01-01

Reproducibility of scientific research relies on accurate and precise citation of data and the provenance of that data. Earth science data are often the result of applying complex data transformation and analysis workflows to vast quantities of data. Provenance information of data processing is used for a variety of purposes, including understanding the process and auditing as well as reproducibility. Certain provenance information is essential for producing scientifically equivalent data. Capturing and representing that provenance information and assigning identifiers suitable for precisely distinguishing data granules and datasets is needed for accurate comparisons. This paper discusses scientific equivalence and essential provenance for scientific reproducibility. We use the example of an operational earth science data processing system to illustrate the application of the technique of cascading digital signatures or hash chains to precisely identify sets of granules and as provenance equivalence identifiers to distinguish data made in an an equivalent manner.
MouseNet database: digital management of a large-scale mutagenesis project.

PubMed

Pargent, W; Heffner, S; Schäble, K F; Soewarto, D; Fuchs, H; Hrabé de Angelis, M

2000-07-01

The Munich ENU Mouse Mutagenesis Screen is a large-scale mutant production, phenotyping, and mapping project. It encompasses two animal breeding facilities and a number of screening groups located in the general area of Munich. A central database is required to manage and process the immense amount of data generated by the mutagenesis project. This database, which we named MouseNet(c), runs on a Sybase platform and will finally store and process all data from the entire project. In addition, the system comprises a portfolio of functions needed to support the workflow management of the core facility and the screening groups. MouseNet(c) will make all of the data available to the participating screening groups, and later to the international scientific community. MouseNet(c) will consist of three major software components:* Animal Management System (AMS)* Sample Tracking System (STS)* Result Documentation System (RDS)MouseNet(c) provides the following major advantages:* being accessible from different client platforms via the Internet* being a full-featured multi-user system (including access restriction and data locking mechanisms)* relying on a professional RDBMS (relational database management system) which runs on a UNIX server platform* supplying workflow functions and a variety of plausibility checks.
Automated Web-Based Request Mechanism for Workflow Enhancement in an Academic Customer-Focused Biorepository.

PubMed

McDonald, Sandra A; Ryan, Benjamin J; Brink, Amy; Holtschlag, Victoria L

2012-02-01

Informatics systems, particularly those that provide capabilities for data storage, specimen tracking, retrieval, and order fulfillment, are critical to the success of biorepositories and other laboratories engaged in translational medical research. A crucial item-one easily overlooked-is an efficient way to receive and process investigator-initiated requests. A successful electronic ordering system should allow request processing in a maximally efficient manner, while also allowing streamlined tracking and mining of request data such as turnaround times and numerical categorizations (user groups, funding sources, protocols, and so on). Ideally, an electronic ordering system also facilitates the initial contact between the laboratory and customers, while still allowing for downstream communications and other steps toward scientific partnerships. We describe here the recently established Web-based ordering system for the biorepository at Washington University Medical Center, along with its benefits for workflow, tracking, and customer service. Because of the system's numerous value-added impacts, we think our experience can serve as a good model for other customer-focused biorepositories, especially those currently using manual or non-Web-based request systems. Our lessons learned also apply to the informatics developers who serve such biobanks.
Automated Web-Based Request Mechanism for Workflow Enhancement in an Academic Customer-Focused Biorepository

PubMed Central

Ryan, Benjamin J.; Brink, Amy; Holtschlag, Victoria L.

2012-01-01

Informatics systems, particularly those that provide capabilities for data storage, specimen tracking, retrieval, and order fulfillment, are critical to the success of biorepositories and other laboratories engaged in translational medical research. A crucial item—one easily overlooked—is an efficient way to receive and process investigator-initiated requests. A successful electronic ordering system should allow request processing in a maximally efficient manner, while also allowing streamlined tracking and mining of request data such as turnaround times and numerical categorizations (user groups, funding sources, protocols, and so on). Ideally, an electronic ordering system also facilitates the initial contact between the laboratory and customers, while still allowing for downstream communications and other steps toward scientific partnerships. We describe here the recently established Web-based ordering system for the biorepository at Washington University Medical Center, along with its benefits for workflow, tracking, and customer service. Because of the system's numerous value-added impacts, we think our experience can serve as a good model for other customer-focused biorepositories, especially those currently using manual or non-Web–based request systems. Our lessons learned also apply to the informatics developers who serve such biobanks. PMID:23386921
High Energy Physics Exascale Requirements Review. An Office of Science review sponsored jointly by Advanced Scientific Computing Research and High Energy Physics, June 10-12, 2015, Bethesda, Maryland

DOE Office of Scientific and Technical Information (OSTI.GOV)

Habib, Salman; Roser, Robert; Gerber, Richard

The U.S. Department of Energy (DOE) Office of Science (SC) Offices of High Energy Physics (HEP) and Advanced Scientific Computing Research (ASCR) convened a programmatic Exascale Requirements Review on June 10–12, 2015, in Bethesda, Maryland. This report summarizes the findings, results, and recommendations derived from that meeting. The high-level findings and observations are as follows. Larger, more capable computing and data facilities are needed to support HEP science goals in all three frontiers: Energy, Intensity, and Cosmic. The expected scale of the demand at the 2025 timescale is at least two orders of magnitude — and in some cases greatermore » — than that available currently. The growth rate of data produced by simulations is overwhelming the current ability of both facilities and researchers to store and analyze it. Additional resources and new techniques for data analysis are urgently needed. Data rates and volumes from experimental facilities are also straining the current HEP infrastructure in its ability to store and analyze large and complex data volumes. Appropriately configured leadership-class facilities can play a transformational role in enabling scientific discovery from these datasets. A close integration of high-performance computing (HPC) simulation and data analysis will greatly aid in interpreting the results of HEP experiments. Such an integration will minimize data movement and facilitate interdependent workflows. Long-range planning between HEP and ASCR will be required to meet HEP’s research needs. To best use ASCR HPC resources, the experimental HEP program needs (1) an established, long-term plan for access to ASCR computational and data resources, (2) the ability to map workflows to HPC resources, (3) the ability for ASCR facilities to accommodate workflows run by collaborations potentially comprising thousands of individual members, (4) to transition codes to the next-generation HPC platforms that will be available at ASCR facilities, (5) to build up and train a workforce capable of developing and using simulations and analysis to support HEP scientific research on next-generation systems.« less
Development of a user customizable imaging informatics-based intelligent workflow engine system to enhance rehabilitation clinical trials

NASA Astrophysics Data System (ADS)

Wang, Ximing; Martinez, Clarisa; Wang, Jing; Liu, Ye; Liu, Brent

2014-03-01

Clinical trials usually have a demand to collect, track and analyze multimedia data according to the workflow. Currently, the clinical trial data management requirements are normally addressed with custom-built systems. Challenges occur in the workflow design within different trials. The traditional pre-defined custom-built system is usually limited to a specific clinical trial and normally requires time-consuming and resource-intensive software development. To provide a solution, we present a user customizable imaging informatics-based intelligent workflow engine system for managing stroke rehabilitation clinical trials with intelligent workflow. The intelligent workflow engine provides flexibility in building and tailoring the workflow in various stages of clinical trials. By providing a solution to tailor and automate the workflow, the system will save time and reduce errors for clinical trials. Although our system is designed for clinical trials for rehabilitation, it may be extended to other imaging based clinical trials as well.
Ontology-Driven Discovery of Scientific Computational Entities

ERIC Educational Resources Information Center

Brazier, Pearl W.

2010-01-01

Many geoscientists use modern computational resources, such as software applications, Web services, scientific workflows and datasets that are readily available on the Internet, to support their research and many common tasks. These resources are often shared via human contact and sometimes stored in data portals; however, they are not necessarily…

Diagnostic procedures for non-small-cell lung cancer (NSCLC): recommendations of the European Expert Group

PubMed Central

Dietel, Manfred; Bubendorf, Lukas; Dingemans, Anne-Marie C; Dooms, Christophe; Elmberger, Göran; García, Rosa Calero; Kerr, Keith M; Lim, Eric; López-Ríos, Fernando; Thunnissen, Erik; Van Schil, Paul E; von Laffert, Maximilian

2016-01-01

Background There is currently no Europe-wide consensus on the appropriate preanalytical measures and workflow to optimise procedures for tissue-based molecular testing of non-small-cell lung cancer (NSCLC). To address this, a group of lung cancer experts (see list of authors) convened to discuss and propose standard operating procedures (SOPs) for NSCLC. Methods Based on earlier meetings and scientific expertise on lung cancer, a multidisciplinary group meeting was aligned. The aim was to include all relevant aspects concerning NSCLC diagnosis. After careful consideration, the following topics were selected and each was reviewed by the experts: surgical resection and sampling; biopsy procedures for analysis; preanalytical and other variables affecting quality of tissue; tissue conservation; testing procedures for epidermal growth factor receptor, anaplastic lymphoma kinase and ROS proto-oncogene 1, receptor tyrosine kinase (ROS1) in lung tissue and cytological specimens; as well as standardised reporting and quality control (QC). Finally, an optimal workflow was described. Results Suggested optimal procedures and workflows are discussed in detail. The broad consensus was that the complex workflow presented can only be executed effectively by an interdisciplinary approach using a well-trained team. Conclusions To optimise diagnosis and treatment of patients with NSCLC, it is essential to establish SOPs that are adaptable to the local situation. In addition, a continuous QC system and a local multidisciplinary tumour-type-oriented board are essential. PMID:26530085
KNIME for reproducible cross-domain analysis of life science data.

PubMed

Fillbrunn, Alexander; Dietz, Christian; Pfeuffer, Julianus; Rahn, René; Landrum, Gregory A; Berthold, Michael R

2017-11-10

Experiments in the life sciences often involve tools from a variety of domains such as mass spectrometry, next generation sequencing, or image processing. Passing the data between those tools often involves complex scripts for controlling data flow, data transformation, and statistical analysis. Such scripts are not only prone to be platform dependent, they also tend to grow as the experiment progresses and are seldomly well documented, a fact that hinders the reproducibility of the experiment. Workflow systems such as KNIME Analytics Platform aim to solve these problems by providing a platform for connecting tools graphically and guaranteeing the same results on different operating systems. As an open source software, KNIME allows scientists and programmers to provide their own extensions to the scientific community. In this review paper we present selected extensions from the life sciences that simplify data exploration, analysis, and visualization and are interoperable due to KNIME's unified data model. Additionally, we name other workflow systems that are commonly used in the life sciences and highlight their similarities and differences to KNIME. Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.
AGILE/GRID Science Alert Monitoring System: The Workflow and the Crab Flare Case

NASA Astrophysics Data System (ADS)

Bulgarelli, A.; Trifoglio, M.; Gianotti, F.; Tavani, M.; Conforti, V.; Parmiggiani, N.

2013-10-01

During the first five years of the AGILE mission we have observed many gamma-ray transients of Galactic and extragalactic origin. A fast reaction to unexpected transient events is a crucial part of the AGILE monitoring program, because the follow-up of astrophysical transients is a key point for this space mission. We present the workflow and the software developed by the AGILE Team to perform the automatic analysis for the detection of gamma-ray transients. In addition, an App for iPhone will be released enabling the Team to access the monitoring system through mobile phones. In 2010 September the science alert monitoring system presented in this paper recorded a transient phenomena from the Crab Nebula, generating an automated alert sent via email and SMS two hours after the end of an AGILE satellite orbit, i.e. two hours after the Crab flare itself: for this discovery AGILE won the 2012 Bruno Rossi prize. The design of this alert system is maximized to reach the maximum speed, and in this, as in many other cases, AGILE has demonstrated that the reaction speed of the monitoring system is crucial for the scientific return of the mission.
Visualisation methods for large provenance collections in data-intensive collaborative platforms

NASA Astrophysics Data System (ADS)

Spinuso, Alessandro; Fligueira, Rosa; Atkinson, Malcolm; Gemuend, Andre

2016-04-01

This work investigates improving the methods of visually representing provenance information in the context of modern data-driven scientific research. It explores scenarios where data-intensive workflows systems are serving communities of researchers within collaborative environments, supporting the sharing of data and methods, and offering a variety of computation facilities, including HPC, HTC and Cloud. It focuses on the exploration of big-data visualization techniques aiming at producing comprehensive and interactive views on top of large and heterogeneous provenance data. The same approach is applicable to control-flow and data-flow workflows or to combinations of the two. This flexibility is achieved using the W3C-PROV recommendation as a reference model, especially its workflow oriented profiles such as D-PROV (Messier et al. 2013). Our implementation is based on the provenance records produced by the dispel4py data-intensive processing library (Filgueira et al. 2015). dispel4py is an open-source Python framework for describing abstract stream-based workflows for distributed data-intensive applications, developed during the VERCE project. dispel4py enables scientists to develop their scientific methods and applications on their laptop and then run them at scale on a wide range of e-Infrastructures (Cloud, Cluster, etc.) without making changes. Users can therefore focus on designing their workflows at an abstract level, describing actions, input and output streams, and how they are connected. The dispel4py system then maps these descriptions to the enactment platforms, such as MPI, Storm, multiprocessing. It provides a mechanism which allows users to determine the provenance information to be collected and to analyze it at runtime. For this work we consider alternative visualisation methods for provenance data, from infinite lists and localised interactive graphs, to radial-views. The latter technique has been positively explored in many fields, from text data visualisation to genomics and social networking analysis. Its adoption for provenance has been presented in literature (Borkin et al. 2013) in the context of parent-child relationships across processes, constructed from control-flow information. Computer graphics research has focused on the advantage of this radial distribution of interlinked information and on ways to improve the visual efficiency and tunability of such representations, like the Hierarchical Edge Bundles visualisation method, (Holten et al. 2006), which aims at reducing visual clutter of highly connected structures via the generation of bundles. Our approach explores the potential of the combination of these methods. It serves environments where the size of the provenance collection, coupled with the diversity of the infrastructures and the domain metadata, make the extrapolation of usage trends extremely challenging. Applications of such visualisation systems can engage groups of scientists, data providers and computational engineers, by serving visual snapshots that highlight relationships between an item and its connected processes. We will present examples of comprehensive views on the distribution of processing and data transfers during a workflow's execution in HPC, as well as cross workflows interactions and internal dynamics. The latter in the context of faceted searches on domain metadata values-range. These are obtained from the analysis of real provenance data generated by the processing of seismic traces performed through the VERCE platform.
Design and implementation of a secure workflow system based on PKI/PMI

NASA Astrophysics Data System (ADS)

Yan, Kai; Jiang, Chao-hui

2013-03-01

As the traditional workflow system in privilege management has the following weaknesses: low privilege management efficiency, overburdened for administrator, lack of trust authority etc. A secure workflow model based on PKI/PMI is proposed after studying security requirements of the workflow systems in-depth. This model can achieve static and dynamic authorization after verifying user's ID through PKC and validating user's privilege information by using AC in workflow system. Practice shows that this system can meet the security requirements of WfMS. Moreover, it can not only improve system security, but also ensures integrity, confidentiality, availability and non-repudiation of the data in the system.
Generic worklist handler for workflow-enabled products

NASA Astrophysics Data System (ADS)

Schmidt, Joachim; Meetz, Kirsten; Wendler, Thomas

1999-07-01

Workflow management (WfM) is an emerging field of medical information technology. It appears as a promising key technology to model, optimize and automate processes, for the sake of improved efficiency, reduced costs and improved patient care. The Application of WfM concepts requires the standardization of architectures and interfaces. A component of central interest proposed in this report is a generic work list handler: A standardized interface between a workflow enactment service and application system. Application systems with embedded work list handlers will be called 'Workflow Enabled Application Systems'. In this paper we discus functional requirements of work list handlers, as well as their integration into workflow architectures and interfaces. To lay the foundation for this specification, basic workflow terminology, the fundamentals of workflow management and - later in the paper - the available standards as defined by the Workflow Management Coalition are briefly reviewed.
A WorkFlow Engine Oriented Modeling System for Hydrologic Sciences

NASA Astrophysics Data System (ADS)

Lu, B.; Piasecki, M.

2009-12-01

In recent years the use of workflow engines for carrying out modeling and data analyses tasks has gained increased attention in the science and engineering communities. Tasks like processing raw data coming from sensors and passing these raw data streams to filters for QA/QC procedures possibly require multiple and complicated steps that need to be repeated over and over again. A workflow sequence that carries out a number of steps of various complexity is an ideal approach to deal with these tasks because the sequence can be stored, called up and repeated over again and again. This has several advantages: for one it ensures repeatability of processing steps and with that provenance, an issue that is increasingly important in the science and engineering communities. It also permits the hand off of lengthy and time consuming tasks that can be error prone to a chain of processing actions that are carried out automatically thus reducing the chance for error on the one side and freeing up time to carry out other tasks on the other hand. This paper aims to present the development of a workflow engine embedded modeling system which allows to build up working sequences for carrying out numerical modeling tasks regarding to hydrologic science. Trident, which facilitates creating, running and sharing scientific data analysis workflows, is taken as the central working engine of the modeling system. Current existing functionalities of the modeling system involve digital watershed processing, online data retrieval, hydrologic simulation and post-event analysis. They are stored as sequences or modules respectively. The sequences can be invoked to implement their preset tasks in orders, for example, triangulating a watershed from raw DEM. Whereas the modules encapsulated certain functions can be selected and connected through a GUI workboard to form sequences. This modeling system is demonstrated by setting up a new sequence for simulating rainfall-runoff processes which involves embedded Penn State Integrated Hydrologic Model(PIHM) module for hydrologic simulation as a kernel, DEM processing sub-sequence which prepares geospatial data for PIHM, data retrieval module which access time series data from online data repository via web services or from local database, post- data management module which stores , visualizes and analyzes model outputs.
SADI, SHARE, and the in silico scientific method

PubMed Central

2010-01-01

Background The emergence and uptake of Semantic Web technologies by the Life Sciences provides exciting opportunities for exploring novel ways to conduct in silico science. Web Service Workflows are already becoming first-class objects in “the new way”, and serve as explicit, shareable, referenceable representations of how an experiment was done. In turn, Semantic Web Service projects aim to facilitate workflow construction by biological domain-experts such that workflows can be edited, re-purposed, and re-published by non-informaticians. However the aspects of the scientific method relating to explicit discourse, disagreement, and hypothesis generation have remained relatively impervious to new technologies. Results Here we present SADI and SHARE - a novel Semantic Web Service framework, and a reference implementation of its client libraries. Together, SADI and SHARE allow the semi- or fully-automatic discovery and pipelining of Semantic Web Services in response to ad hoc user queries. Conclusions The semantic behaviours exhibited by SADI and SHARE extend the functionalities provided by Description Logic Reasoners such that novel assertions can be automatically added to a data-set without logical reasoning, but rather by analytical or annotative services. This behaviour might be applied to achieve the “semantification” of those aspects of the in silico scientific method that are not yet supported by Semantic Web technologies. We support this suggestion using an example in the clinical research space. PMID:21210986
Towards Exascale Seismic Imaging and Inversion

NASA Astrophysics Data System (ADS)

Tromp, J.; Bozdag, E.; Lefebvre, M. P.; Smith, J. A.; Lei, W.; Ruan, Y.

2015-12-01

Post-petascale supercomputers are now available to solve complex scientific problems that were thought unreachable a few decades ago. They also bring a cohort of concerns tied to obtaining optimum performance. Several issues are currently being investigated by the HPC community. These include energy consumption, fault resilience, scalability of the current parallel paradigms, workflow management, I/O performance and feature extraction with large datasets. In this presentation, we focus on the last three issues. In the context of seismic imaging and inversion, in particular for simulations based on adjoint methods, workflows are well defined.They consist of a few collective steps (e.g., mesh generation or model updates) and of a large number of independent steps (e.g., forward and adjoint simulations of each seismic event, pre- and postprocessing of seismic traces). The greater goal is to reduce the time to solution, that is, obtaining a more precise representation of the subsurface as fast as possible. This brings us to consider both the workflow in its entirety and the parts comprising it. The usual approach is to speedup the purely computational parts based on code optimization in order to reach higher FLOPS and better memory management. This still remains an important concern, but larger scale experiments show that the imaging workflow suffers from severe I/O bottlenecks. Such limitations occur both for purely computational data and seismic time series. The latter are dealt with by the introduction of a new Adaptable Seismic Data Format (ASDF). Parallel I/O libraries, namely HDF5 and ADIOS, are used to drastically reduce the cost of disk access. Parallel visualization tools, such as VisIt, are able to take advantage of ADIOS metadata to extract features and display massive datasets. Because large parts of the workflow are embarrassingly parallel, we are investigating the possibility of automating the imaging process with the integration of scientific workflow management tools, specifically Pegasus.
Scientific workflow and support for high resolution global climate modeling at the Oak Ridge Leadership Computing Facility

NASA Astrophysics Data System (ADS)

Anantharaj, V.; Mayer, B.; Wang, F.; Hack, J.; McKenna, D.; Hartman-Baker, R.

2012-04-01

The Oak Ridge Leadership Computing Facility (OLCF) facilitates the execution of computational experiments that require tens of millions of CPU hours (typically using thousands of processors simultaneously) while generating hundreds of terabytes of data. A set of ultra high resolution climate experiments in progress, using the Community Earth System Model (CESM), will produce over 35,000 files, ranging in sizes from 21 MB to 110 GB each. The execution of the experiments will require nearly 70 Million CPU hours on the Jaguar and Titan supercomputers at OLCF. The total volume of the output from these climate modeling experiments will be in excess of 300 TB. This model output must then be archived, analyzed, distributed to the project partners in a timely manner, and also made available more broadly. Meeting this challenge would require efficient movement of the data, staging the simulation output to a large and fast file system that provides high volume access to other computational systems used to analyze the data and synthesize results. This file system also needs to be accessible via high speed networks to an archival system that can provide long term reliable storage. Ideally this archival system is itself directly available to other systems that can be used to host services making the data and analysis available to the participants in the distributed research project and to the broader climate community. The various resources available at the OLCF now support this workflow. The available systems include the new Jaguar Cray XK6 2.63 petaflops (estimated) supercomputer, the 10 PB Spider center-wide parallel file system, the Lens/EVEREST analysis and visualization system, the HPSS archival storage system, the Earth System Grid (ESG), and the ORNL Climate Data Server (CDS). The ESG features federated services, search & discovery, extensive data handling capabilities, deep storage access, and Live Access Server (LAS) integration. The scientific workflow enabled on these systems, and developed as part of the Ultra-High Resolution Climate Modeling Project, allows users of OLCF resources to efficiently share simulated data, often multi-terabyte in volume, as well as the results from the modeling experiments and various synthesized products derived from these simulations. The final objective in the exercise is to ensure that the simulation results and the enhanced understanding will serve the needs of a diverse group of stakeholders across the world, including our research partners in U.S. Department of Energy laboratories & universities, domain scientists, students (K-12 as well as higher education), resource managers, decision makers, and the general public.
Reengineering observatory operations for the time domain

NASA Astrophysics Data System (ADS)

Seaman, Robert L.; Vestrand, W. T.; Hessman, Frederic V.

2014-07-01

Observatories are complex scientific and technical institutions serving diverse users and purposes. Their telescopes, instruments, software, and human resources engage in interwoven workflows over a broad range of timescales. These workflows have been tuned to be responsive to concepts of observatory operations that were applicable when various assets were commissioned, years or decades in the past. The astronomical community is entering an era of rapid change increasingly characterized by large time domain surveys, robotic telescopes and automated infrastructures, and - most significantly - of operating modes and scientific consortia that span our individual facilities, joining them into complex network entities. Observatories must adapt and numerous initiatives are in progress that focus on redesigning individual components out of the astronomical toolkit. New instrumentation is both more capable and more complex than ever, and even simple instruments may have powerful observation scripting capabilities. Remote and queue observing modes are now widespread. Data archives are becoming ubiquitous. Virtual observatory standards and protocols and astroinformatics data-mining techniques layered on these are areas of active development. Indeed, new large-aperture ground-based telescopes may be as expensive as space missions and have similarly formal project management processes and large data management requirements. This piecewise approach is not enough. Whatever challenges of funding or politics facing the national and international astronomical communities it will be more efficient - scientifically as well as in the usual figures of merit of cost, schedule, performance, and risks - to explicitly address the systems engineering of the astronomical community as a whole.
Deploying and sharing U-Compare workflows as web services.

PubMed

Kontonatsios, Georgios; Korkontzelos, Ioannis; Kolluru, Balakrishna; Thompson, Paul; Ananiadou, Sophia

2013-02-18

U-Compare is a text mining platform that allows the construction, evaluation and comparison of text mining workflows. U-Compare contains a large library of components that are tuned to the biomedical domain. Users can rapidly develop biomedical text mining workflows by mixing and matching U-Compare's components. Workflows developed using U-Compare can be exported and sent to other users who, in turn, can import and re-use them. However, the resulting workflows are standalone applications, i.e., software tools that run and are accessible only via a local machine, and that can only be run with the U-Compare platform. We address the above issues by extending U-Compare to convert standalone workflows into web services automatically, via a two-click process. The resulting web services can be registered on a central server and made publicly available. Alternatively, users can make web services available on their own servers, after installing the web application framework, which is part of the extension to U-Compare. We have performed a user-oriented evaluation of the proposed extension, by asking users who have tested the enhanced functionality of U-Compare to complete questionnaires that assess its functionality, reliability, usability, efficiency and maintainability. The results obtained reveal that the new functionality is well received by users. The web services produced by U-Compare are built on top of open standards, i.e., REST and SOAP protocols, and therefore, they are decoupled from the underlying platform. Exported workflows can be integrated with any application that supports these open standards. We demonstrate how the newly extended U-Compare enhances the cross-platform interoperability of workflows, by seamlessly importing a number of text mining workflow web services exported from U-Compare into Taverna, i.e., a generic scientific workflow construction platform.
Deploying and sharing U-Compare workflows as web services

PubMed Central

2013-01-01

Background U-Compare is a text mining platform that allows the construction, evaluation and comparison of text mining workflows. U-Compare contains a large library of components that are tuned to the biomedical domain. Users can rapidly develop biomedical text mining workflows by mixing and matching U-Compare’s components. Workflows developed using U-Compare can be exported and sent to other users who, in turn, can import and re-use them. However, the resulting workflows are standalone applications, i.e., software tools that run and are accessible only via a local machine, and that can only be run with the U-Compare platform. Results We address the above issues by extending U-Compare to convert standalone workflows into web services automatically, via a two-click process. The resulting web services can be registered on a central server and made publicly available. Alternatively, users can make web services available on their own servers, after installing the web application framework, which is part of the extension to U-Compare. We have performed a user-oriented evaluation of the proposed extension, by asking users who have tested the enhanced functionality of U-Compare to complete questionnaires that assess its functionality, reliability, usability, efficiency and maintainability. The results obtained reveal that the new functionality is well received by users. Conclusions The web services produced by U-Compare are built on top of open standards, i.e., REST and SOAP protocols, and therefore, they are decoupled from the underlying platform. Exported workflows can be integrated with any application that supports these open standards. We demonstrate how the newly extended U-Compare enhances the cross-platform interoperability of workflows, by seamlessly importing a number of text mining workflow web services exported from U-Compare into Taverna, i.e., a generic scientific workflow construction platform. PMID:23419017
Nanocuration workflows: Establishing best practices for identifying, inputting, and sharing data to inform decisions on nanomaterials

PubMed Central

Powers, Christina M; Mills, Karmann A; Morris, Stephanie A; Klaessig, Fred; Gaheen, Sharon; Lewinski, Nastassja

2015-01-01

Summary There is a critical opportunity in the field of nanoscience to compare and integrate information across diverse fields of study through informatics (i.e., nanoinformatics). This paper is one in a series of articles on the data curation process in nanoinformatics (nanocuration). Other articles in this series discuss key aspects of nanocuration (temporal metadata, data completeness, database integration), while the focus of this article is on the nanocuration workflow, or the process of identifying, inputting, and reviewing nanomaterial data in a data repository. In particular, the article discusses: 1) the rationale and importance of a defined workflow in nanocuration, 2) the influence of organizational goals or purpose on the workflow, 3) established workflow practices in other fields, 4) current workflow practices in nanocuration, 5) key challenges for workflows in emerging fields like nanomaterials, 6) examples to make these challenges more tangible, and 7) recommendations to address the identified challenges. Throughout the article, there is an emphasis on illustrating key concepts and current practices in the field. Data on current practices in the field are from a group of stakeholders active in nanocuration. In general, the development of workflows for nanocuration is nascent, with few individuals formally trained in data curation or utilizing available nanocuration resources (e.g., ISA-TAB-Nano). Additional emphasis on the potential benefits of cultivating nanomaterial data via nanocuration processes (e.g., capability to analyze data from across research groups) and providing nanocuration resources (e.g., training) will likely prove crucial for the wider application of nanocuration workflows in the scientific community. PMID:26425437
Paperless Transaction for Publication Incentive System

NASA Astrophysics Data System (ADS)

Ibrahim, Rosziati; Madon, Hamiza Diana; Nazri, Nurul Hashida Amira Mohd; Saarani, Norhafizah; Mustapha, Aida

2017-08-01

Within the Malaysian context, incentive system in scientific publishing rewards authors for publishing journal articles or conference papers that are indexed by Scopus. At Universiti Tun Hussein Onn Malaysia, the incentive system is going into its third year in operational. The main challenge lies in preparing the evidences as required by the application guideline. This paper presents an online module for publication incentive within the University Publication Information System (SMPU). The module was developed using the Scrum methodology based on the existing workflow of paper-based application. The module is hoped to increase the quality of the system deliverables of SMPU as well as having the ability to cope with change of university requirements in the future.
Reproducible Large-Scale Neuroimaging Studies with the OpenMOLE Workflow Management System.

PubMed

Passerat-Palmbach, Jonathan; Reuillon, Romain; Leclaire, Mathieu; Makropoulos, Antonios; Robinson, Emma C; Parisot, Sarah; Rueckert, Daniel

2017-01-01

OpenMOLE is a scientific workflow engine with a strong emphasis on workload distribution. Workflows are designed using a high level Domain Specific Language (DSL) built on top of Scala. It exposes natural parallelism constructs to easily delegate the workload resulting from a workflow to a wide range of distributed computing environments. OpenMOLE hides the complexity of designing complex experiments thanks to its DSL. Users can embed their own applications and scale their pipelines from a small prototype running on their desktop computer to a large-scale study harnessing distributed computing infrastructures, simply by changing a single line in the pipeline definition. The construction of the pipeline itself is decoupled from the execution context. The high-level DSL abstracts the underlying execution environment, contrary to classic shell-script based pipelines. These two aspects allow pipelines to be shared and studies to be replicated across different computing environments. Workflows can be run as traditional batch pipelines or coupled with OpenMOLE's advanced exploration methods in order to study the behavior of an application, or perform automatic parameter tuning. In this work, we briefly present the strong assets of OpenMOLE and detail recent improvements targeting re-executability of workflows across various Linux platforms. We have tightly coupled OpenMOLE with CARE, a standalone containerization solution that allows re-executing on a Linux host any application that has been packaged on another Linux host previously. The solution is evaluated against a Python-based pipeline involving packages such as scikit-learn as well as binary dependencies. All were packaged and re-executed successfully on various HPC environments, with identical numerical results (here prediction scores) obtained on each environment. Our results show that the pair formed by OpenMOLE and CARE is a reliable solution to generate reproducible results and re-executable pipelines. A demonstration of the flexibility of our solution showcases three neuroimaging pipelines harnessing distributed computing environments as heterogeneous as local clusters or the European Grid Infrastructure (EGI).
Reproducible Large-Scale Neuroimaging Studies with the OpenMOLE Workflow Management System

PubMed Central

Passerat-Palmbach, Jonathan; Reuillon, Romain; Leclaire, Mathieu; Makropoulos, Antonios; Robinson, Emma C.; Parisot, Sarah; Rueckert, Daniel

2017-01-01

OpenMOLE is a scientific workflow engine with a strong emphasis on workload distribution. Workflows are designed using a high level Domain Specific Language (DSL) built on top of Scala. It exposes natural parallelism constructs to easily delegate the workload resulting from a workflow to a wide range of distributed computing environments. OpenMOLE hides the complexity of designing complex experiments thanks to its DSL. Users can embed their own applications and scale their pipelines from a small prototype running on their desktop computer to a large-scale study harnessing distributed computing infrastructures, simply by changing a single line in the pipeline definition. The construction of the pipeline itself is decoupled from the execution context. The high-level DSL abstracts the underlying execution environment, contrary to classic shell-script based pipelines. These two aspects allow pipelines to be shared and studies to be replicated across different computing environments. Workflows can be run as traditional batch pipelines or coupled with OpenMOLE's advanced exploration methods in order to study the behavior of an application, or perform automatic parameter tuning. In this work, we briefly present the strong assets of OpenMOLE and detail recent improvements targeting re-executability of workflows across various Linux platforms. We have tightly coupled OpenMOLE with CARE, a standalone containerization solution that allows re-executing on a Linux host any application that has been packaged on another Linux host previously. The solution is evaluated against a Python-based pipeline involving packages such as scikit-learn as well as binary dependencies. All were packaged and re-executed successfully on various HPC environments, with identical numerical results (here prediction scores) obtained on each environment. Our results show that the pair formed by OpenMOLE and CARE is a reliable solution to generate reproducible results and re-executable pipelines. A demonstration of the flexibility of our solution showcases three neuroimaging pipelines harnessing distributed computing environments as heterogeneous as local clusters or the European Grid Infrastructure (EGI). PMID:28381997
Progress in digital color workflow understanding in the International Color Consortium (ICC) Workflow WG

NASA Astrophysics Data System (ADS)

McCarthy, Ann

2006-01-01

The ICC Workflow WG serves as the bridge between ICC color management technologies and use of those technologies in real world color production applications. ICC color management is applicable to and is used in a wide range of color systems, from highly specialized digital cinema color special effects to high volume publications printing to home photography. The ICC Workflow WG works to align ICC technologies so that the color management needs of these diverse use case systems are addressed in an open, platform independent manner. This report provides a high level summary of the ICC Workflow WG objectives and work to date, focusing on the ways in which workflow can impact image quality and color systems performance. The 'ICC Workflow Primitives' and 'ICC Workflow Patterns and Dimensions' workflow models are covered in some detail. Consider the questions, "How much of dissatisfaction with color management today is the result of 'the wrong color transformation at the wrong time' and 'I can't get to the right conversion at the right point in my work process'?" Put another way, consider how image quality through a workflow can be negatively affected when the coordination and control level of the color management system is not sufficient.
A Classroom-Based Distributed Workflow Initiative for the Early Involvement of Undergraduate Students in Scientific Research

ERIC Educational Resources Information Center

Friedrich, Jon M.

2014-01-01

Engaging freshman and sophomore students in meaningful scientific research is challenging because of their developing skill set and their necessary time commitments to regular classwork. A project called the Chondrule Analysis Project was initiated to engage first- and second-year students in an initial research experience and also accomplish…
ASCR/HEP Exascale Requirements Review Report

DOE Office of Scientific and Technical Information (OSTI.GOV)

Habib, Salman; Roser, Robert; Gerber, Richard

This draft report summarizes and details the findings, results, and recommendations derived from the ASCR/HEP Exascale Requirements Review meeting held in June, 2015. The main conclusions are as follows. 1) Larger, more capable computing and data facilities are needed to support HEP science goals in all three frontiers: Energy, Intensity, and Cosmic. The expected scale of the demand at the 2025 timescale is at least two orders of magnitude -- and in some cases greater -- than that available currently. 2) The growth rate of data produced by simulations is overwhelming the current ability, of both facilities and researchers, tomore » store and analyze it. Additional resources and new techniques for data analysis are urgently needed. 3) Data rates and volumes from HEP experimental facilities are also straining the ability to store and analyze large and complex data volumes. Appropriately configured leadership-class facilities can play a transformational role in enabling scientific discovery from these datasets. 4) A close integration of HPC simulation and data analysis will aid greatly in interpreting results from HEP experiments. Such an integration will minimize data movement and facilitate interdependent workflows. 5) Long-range planning between HEP and ASCR will be required to meet HEP's research needs. To best use ASCR HPC resources the experimental HEP program needs a) an established long-term plan for access to ASCR computational and data resources, b) an ability to map workflows onto HPC resources, c) the ability for ASCR facilities to accommodate workflows run by collaborations that can have thousands of individual members, d) to transition codes to the next-generation HPC platforms that will be available at ASCR facilities, e) to build up and train a workforce capable of developing and using simulations and analysis to support HEP scientific research on next-generation systems.« less

ASCR/HEP Exascale Requirements Review Report

DOE Office of Scientific and Technical Information (OSTI.GOV)

Habib, Salman; et al.

2016-03-30

This draft report summarizes and details the findings, results, and recommendations derived from the ASCR/HEP Exascale Requirements Review meeting held in June, 2015. The main conclusions are as follows. 1) Larger, more capable computing and data facilities are needed to support HEP science goals in all three frontiers: Energy, Intensity, and Cosmic. The expected scale of the demand at the 2025 timescale is at least two orders of magnitude -- and in some cases greater -- than that available currently. 2) The growth rate of data produced by simulations is overwhelming the current ability, of both facilities and researchers, tomore » store and analyze it. Additional resources and new techniques for data analysis are urgently needed. 3) Data rates and volumes from HEP experimental facilities are also straining the ability to store and analyze large and complex data volumes. Appropriately configured leadership-class facilities can play a transformational role in enabling scientific discovery from these datasets. 4) A close integration of HPC simulation and data analysis will aid greatly in interpreting results from HEP experiments. Such an integration will minimize data movement and facilitate interdependent workflows. 5) Long-range planning between HEP and ASCR will be required to meet HEP's research needs. To best use ASCR HPC resources the experimental HEP program needs a) an established long-term plan for access to ASCR computational and data resources, b) an ability to map workflows onto HPC resources, c) the ability for ASCR facilities to accommodate workflows run by collaborations that can have thousands of individual members, d) to transition codes to the next-generation HPC platforms that will be available at ASCR facilities, e) to build up and train a workforce capable of developing and using simulations and analysis to support HEP scientific research on next-generation systems.« less
Spatiotemporal analysis of tropical disease research combining Europe PMC and affiliation mapping web services.

PubMed

Palmblad, Magnus; Torvik, Vetle I

2017-01-01

Tropical medicine appeared as a distinct sub-discipline in the late nineteenth century, during a period of rapid European colonial expansion in Africa and Asia. After a dramatic drop after World War II, research on tropical diseases have received more attention and research funding in the twenty-first century. We used Apache Taverna to integrate Europe PMC and MapAffil web services, containing the spatiotemporal analysis workflow from a list of PubMed queries to a list of publication years and author affiliations geoparsed to latitudes and longitudes. The results could then be visualized in the Quantum Geographic Information System (QGIS). Our workflows automatically matched 253,277 affiliations to geographical coordinates for the first authors of 379,728 papers on tropical diseases in a single execution. The bibliometric analyses show how research output in tropical diseases follow major historical shifts in the twentieth century and renewed interest in and funding for tropical disease research in the twenty-first century. They show the effects of disease outbreaks, WHO eradication programs, vaccine developments, wars, refugee migrations, and peace treaties. Literature search and geoparsing web services can be combined in scientific workflows performing a complete spatiotemporal bibliometric analyses of research in tropical medicine. The workflows and datasets are freely available and can be used to reproduce or refine the analyses and test specific hypotheses or look into particular diseases or geographic regions. This work exceeds all previously published bibliometric analyses on tropical diseases in both scale and spatiotemporal range.
An Adaptable Seismic Data Format for Modern Scientific Workflows

NASA Astrophysics Data System (ADS)

Smith, J. A.; Bozdag, E.; Krischer, L.; Lefebvre, M.; Lei, W.; Podhorszki, N.; Tromp, J.

2013-12-01

Data storage, exchange, and access play a critical role in modern seismology. Current seismic data formats, such as SEED, SAC, and SEG-Y, were designed with specific applications in mind and are frequently a major bottleneck in implementing efficient workflows. We propose a new modern parallel format that can be adapted for a variety of seismic workflows. The Adaptable Seismic Data Format (ASDF) features high-performance parallel read and write support and the ability to store an arbitrary number of traces of varying sizes. Provenance information is stored inside the file so that users know the origin of the data as well as the precise operations that have been applied to the waveforms. The design of the new format is based on several real-world use cases, including earthquake seismology and seismic interferometry. The metadata is based on the proven XML schemas StationXML and QuakeML. Existing time-series analysis tool-kits are easily interfaced with this new format so that seismologists can use robust, previously developed software packages, such as ObsPy and the SAC library. ADIOS, netCDF4, and HDF5 can be used as the underlying container format. At Princeton University, we have chosen to use ADIOS as the container format because it has shown superior scalability for certain applications, such as dealing with big data on HPC systems. In the context of high-performance computing, we have implemented ASDF into the global adjoint tomography workflow on Oak Ridge National Laboratory's supercomputer Titan.
Pegasus Workflow Management System: Helping Applications From Earth and Space

NASA Astrophysics Data System (ADS)

Mehta, G.; Deelman, E.; Vahi, K.; Silva, F.

2010-12-01

Pegasus WMS is a Workflow Management System that can manage large-scale scientific workflows across Grid, local and Cloud resources simultaneously. Pegasus WMS provides a means for representing the workflow of an application in an abstract XML form, agnostic of the resources available to run it and the location of data and executables. It then compiles these workflows into concrete plans by querying catalogs and farming computations across local and distributed computing resources, as well as emerging commercial and community cloud environments in an easy and reliable manner. Pegasus WMS optimizes the execution as well as data movement by leveraging existing Grid and cloud technologies via a flexible pluggable interface and provides advanced features like reusing existing data, automatic cleanup of generated data, and recursive workflows with deferred planning. It also captures all the provenance of the workflow from the planning stage to the execution of the generated data, helping scientists to accurately measure performance metrics of their workflow as well as data reproducibility issues. Pegasus WMS was initially developed as part of the GriPhyN project to support large-scale high-energy physics and astrophysics experiments. Direct funding from the NSF enabled support for a wide variety of applications from diverse domains including earthquake simulation, bacterial RNA studies, helioseismology and ocean modeling. Earthquake Simulation: Pegasus WMS was recently used in a large scale production run in 2009 by the Southern California Earthquake Centre to run 192 million loosely coupled tasks and about 2000 tightly coupled MPI style tasks on National Cyber infrastructure for generating a probabilistic seismic hazard map of the Southern California region. SCEC ran 223 workflows over a period of eight weeks, using on average 4,420 cores, with a peak of 14,540 cores. A total of 192 million files were produced totaling about 165TB out of which 11TB of data was saved. Astrophysics: The Laser Interferometer Gravitational-Wave Observatory (LIGO) uses Pegasus WMS to search for binary inspiral gravitational waves. A month of LIGO data requires many thousands of jobs, running for days on hundreds of CPUs on the LIGO Data Grid (LDG) and Open Science Grid (OSG). Ocean Temperature Forecast: Researchers at the Jet Propulsion Laboratory are exploring Pegasus WMS to run ocean forecast ensembles of the California coastal region. These models produce a number of daily forecasts for water temperature, salinity, and other measures. Helioseismology: The Solar Dynamics Observatory (SDO) is NASA's most important solar physics mission of this coming decade. Pegasus WMS is being used to analyze the data from SDO, which will be predominantly used to learn about solar magnetic activity and to probe the internal structure and dynamics of the Sun with helioseismology. Bacterial RNA studies: SIPHT is an application in bacterial genomics, which predicts sRNA (small non-coding RNAs)-encoding genes in bacteria. This project currently provides a web-based interface using Pegasus WMS at the backend to facilitate large-scale execution of the workflows on varied resources and provide better notifications of task/workflow completion.
Launching an EarthCube Interoperability Workbench for Constructing Workflows and Employing Service Interfaces

NASA Astrophysics Data System (ADS)

Fulker, D. W.; Pearlman, F.; Pearlman, J.; Arctur, D. K.; Signell, R. P.

2016-12-01

A major challenge for geoscientists—and a key motivation for the National Science Foundation's EarchCube initiative—is to integrate data across disciplines, as is necessary for complex Earth-system studies such as climate change. The attendant technical and social complexities have led EarthCube participants to devise a system-of-systems architectural concept. Its centerpiece is a (virtual) interoperability workbench, around which a learning community can coalesce, supported in their evolving quests to join data from diverse sources, to synthesize new forms of data depicting Earth phenomena, and to overcome immense obstacles that arise, for example, from mismatched nomenclatures, projections, mesh geometries and spatial-temporal scales. The full architectural concept will require significant time and resources to implement, but this presentation describes a (minimal) starter kit. With a keep-it-simple mantra this workbench starter kit can fulfill the following four objectives: 1) demonstrate the feasibility of an interoperability workbench by mid-2017; 2) showcase scientifically useful examples of cross-domain interoperability, drawn, e.g., from funded EarthCube projects; 3) highlight selected aspects of EarthCube's architectural concept, such as a system of systems (SoS) linked via service interfaces; 4) demonstrate how workflows can be designed and used in a manner that enables sharing, promotes collaboration and fosters learning. The outcome, despite its simplicity, will embody service interfaces sufficient to construct—from extant components—data-integration and data-synthesis workflows involving multiple geoscience domains. Tentatively, the starter kit will build on the Jupyter Notebook web application, augmented with libraries for interfacing current services (at data centers involved in EarthCube's Council of Data Facilities, e.g.) and services developed specifically for EarthCube and spanning most geoscience domains.
SWARM : a scientific workflow for supporting Bayesian approaches to improve metabolic models.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Shi, X.; Stevens, R.; Mathematics and Computer Science

2008-01-01

With the exponential growth of complete genome sequences, the analysis of these sequences is becoming a powerful approach to build genome-scale metabolic models. These models can be used to study individual molecular components and their relationships, and eventually study cells as systems. However, constructing genome-scale metabolic models manually is time-consuming and labor-intensive. This property of manual model-building process causes the fact that much fewer genome-scale metabolic models are available comparing to hundreds of genome sequences available. To tackle this problem, we design SWARM, a scientific workflow that can be utilized to improve genome-scale metabolic models in high-throughput fashion. SWARM dealsmore » with a range of issues including the integration of data across distributed resources, data format conversions, data update, and data provenance. Putting altogether, SWARM streamlines the whole modeling process that includes extracting data from various resources, deriving training datasets to train a set of predictors and applying Bayesian techniques to assemble the predictors, inferring on the ensemble of predictors to insert missing data, and eventually improving draft metabolic networks automatically. By the enhancement of metabolic model construction, SWARM enables scientists to generate many genome-scale metabolic models within a short period of time and with less effort.« less
DOE High Performance Computing Operational Review (HPCOR): Enabling Data-Driven Scientific Discovery at HPC Facilities

DOE Office of Scientific and Technical Information (OSTI.GOV)

Gerber, Richard; Allcock, William; Beggio, Chris

2014-10-17

U.S. Department of Energy (DOE) High Performance Computing (HPC) facilities are on the verge of a paradigm shift in the way they deliver systems and services to science and engineering teams. Research projects are producing a wide variety of data at unprecedented scale and level of complexity, with community-specific services that are part of the data collection and analysis workflow. On June 18-19, 2014 representatives from six DOE HPC centers met in Oakland, CA at the DOE High Performance Operational Review (HPCOR) to discuss how they can best provide facilities and services to enable large-scale data-driven scientific discovery at themore » DOE national laboratories. The report contains findings from that review.« less
Singularity: Scientific containers for mobility of compute.

PubMed

Kurtzer, Gregory M; Sochat, Vanessa; Bauer, Michael W

2017-01-01

Here we present Singularity, software developed to bring containers and reproducibility to scientific computing. Using Singularity containers, developers can work in reproducible environments of their choosing and design, and these complete environments can easily be copied and executed on other platforms. Singularity is an open source initiative that harnesses the expertise of system and software engineers and researchers alike, and integrates seamlessly into common workflows for both of these groups. As its primary use case, Singularity brings mobility of computing to both users and HPC centers, providing a secure means to capture and distribute software and compute environments. This ability to create and deploy reproducible environments across these centers, a previously unmet need, makes Singularity a game changing development for computational science.
Singularity: Scientific containers for mobility of compute

PubMed Central

Kurtzer, Gregory M.; Bauer, Michael W.

2017-01-01

Here we present Singularity, software developed to bring containers and reproducibility to scientific computing. Using Singularity containers, developers can work in reproducible environments of their choosing and design, and these complete environments can easily be copied and executed on other platforms. Singularity is an open source initiative that harnesses the expertise of system and software engineers and researchers alike, and integrates seamlessly into common workflows for both of these groups. As its primary use case, Singularity brings mobility of computing to both users and HPC centers, providing a secure means to capture and distribute software and compute environments. This ability to create and deploy reproducible environments across these centers, a previously unmet need, makes Singularity a game changing development for computational science. PMID:28494014
Experiences with Deriva: An Asset Management Platform for Accelerating eScience.

PubMed

Bugacov, Alejandro; Czajkowski, Karl; Kesselman, Carl; Kumar, Anoop; Schuler, Robert E; Tangmunarunkit, Hongsuda

2017-10-01

The pace of discovery in eScience is increasingly dependent on a scientist's ability to acquire, curate, integrate, analyze, and share large and diverse collections of data. It is all too common for investigators to spend inordinate amounts of time developing ad hoc procedures to manage their data. In previous work, we presented Deriva, a Scientific Asset Management System, designed to accelerate data driven discovery. In this paper, we report on the use of Deriva in a number of substantial and diverse eScience applications. We describe the lessons we have learned, both from the perspective of the Deriva technology, as well as the ability and willingness of scientists to incorporate Scientific Asset Management into their daily workflows.
Reproducibility and Knowledge Capture Architecture for the NASA Earth Exchange (NEX)

NASA Astrophysics Data System (ADS)

Votava, P.; Michaelis, A.; Nemani, R. R.

2015-12-01

NASA Earth Exchange (NEX) is a data, supercomputing and knowledge collaboratory that houses NASA satellite, climate and ancillary data where a focused community can come together to address large-scale challenges in Earth sciences. As NEX has been growing into a platform for analysis, experiments and production of data on the order of petabytes in volume, it has been increasingly important to enable users to easily retrace their steps, identify what datasets were produced by which process or chain of processes, and give them ability to readily reproduce their results. This can be a tedious and difficult task even for a small project, but is almost impossible on large processing pipelines. For example, the NEX Landsat pipeline is deployed to process hundreds of thousands of Landsat scenes in a non-linear production workflow with many-to-many mappings of files between 40 separate processing stages where over 100 million processes get executed. At this scale it is almost impossible to easily examine the entire provenance of each file, let alone easily reproduce it. We have developed an initial solution for the NEX system - a transparent knowledge capture and reproducibility architecture that does not require any special code instrumentation and other actions on user's part. Users can automatically capture their work through a transparent provenance tracking system and the information can subsequently be queried and/or converted into workflows. The provenance information is streamed to a MongoDB document store and a subset is converted to an RDF format and inserted into our triple-store. The triple-store already contains semantic information about other aspects of the NEX system and adding provenance enhances the ability to relate workflows and data to users, locations, projects and other NEX concepts that can be queried in a standard way. The provenance system has the ability to track data throughout NEX and across number of executions and can recreate and re-execute the entire history and reproduce the results. The information can also be used to automatically create individual workflow components and full workflows that can be visually examined, modified, executed and extended by researchers. This provides a key component for accelerating research through knowledge capture and scientific reproducibility on NEX.
Development of a novel imaging informatics-based system with an intelligent workflow engine (IWEIS) to support imaging-based clinical trials

PubMed Central

Wang, Ximing; Liu, Brent J; Martinez, Clarisa; Zhang, Xuejun; Winstein, Carolee J

2015-01-01

Imaging based clinical trials can benefit from a solution to efficiently collect, analyze, and distribute multimedia data at various stages within the workflow. Currently, the data management needs of these trials are typically addressed with custom-built systems. However, software development of the custom- built systems for versatile workflows can be resource-consuming. To address these challenges, we present a system with a workflow engine for imaging based clinical trials. The system enables a project coordinator to build a data collection and management system specifically related to study protocol workflow without programming. Web Access to DICOM Objects (WADO) module with novel features is integrated to further facilitate imaging related study. The system was initially evaluated by an imaging based rehabilitation clinical trial. The evaluation shows that the cost of the development of system can be much reduced compared to the custom-built system. By providing a solution to customize a system and automate the workflow, the system will save on development time and reduce errors especially for imaging clinical trials. PMID:25870169
Facilitating Stewardship of scientific data through standards based workflows

NASA Astrophysics Data System (ADS)

Bastrakova, I.; Kemp, C.; Potter, A. K.

2013-12-01

There are main suites of standards that can be used to define the fundamental scientific methodology of data, methods and results. These are firstly Metadata standards to enable discovery of the data (ISO 19115), secondly the Sensor Web Enablement (SWE) suite of standards that include the O&M and SensorML standards and thirdly Ontology that provide vocabularies to define the scientific concepts and relationships between these concepts. All three types of standards have to be utilised by the practicing scientist to ensure that those who ultimately have to steward the data stewards to ensure that the data can be preserved curated and reused and repurposed. Additional benefits of this approach include transparency of scientific processes from the data acquisition to creation of scientific concepts and models, and provision of context to inform data use. Collecting and recording metadata is the first step in scientific data flow. The primary role of metadata is to provide details of geographic extent, availability and high-level description of data suitable for its initial discovery through common search engines. The SWE suite provides standardised patterns to describe observations and measurements taken for these data, capture detailed information about observation or analytical methods, used instruments and define quality determinations. This information standardises browsing capability over discrete data types. The standardised patterns of the SWE standards simplify aggregation of observation and measurement data enabling scientists to transfer disintegrated data to scientific concepts. The first two steps provide a necessary basis for the reasoning about concepts of ';pure' science, building relationship between concepts of different domains (linked-data), and identifying domain classification and vocabularies. Geoscience Australia is re-examining its marine data flows, including metadata requirements and business processes, to achieve a clearer link between scientific data acquisition and analysis requirements and effective interoperable data management and delivery. This includes participating in national and international dialogue on development of standards, embedding data management activities in business processes, and developing scientific staff as effective data stewards. Similar approach is applied to the geophysical data. By ensuring the geophysical datasets at GA strictly follow metadata and industry standards we are able to implement a provenance based workflow where the data is easily discoverable, geophysical processing can be applied to it and results can be stored. The provenance based workflow enables metadata records for the results to be produced automatically from the input dataset metadata.
Resilient workflows for computational mechanics platforms

NASA Astrophysics Data System (ADS)

Nguyên, Toàn; Trifan, Laurentiu; Désidéri, Jean-Antoine

2010-06-01

Workflow management systems have recently been the focus of much interest and many research and deployment for scientific applications worldwide [26, 27]. Their ability to abstract the applications by wrapping application codes have also stressed the usefulness of such systems for multidiscipline applications [23, 24]. When complex applications need to provide seamless interfaces hiding the technicalities of the computing infrastructures, their high-level modeling, monitoring and execution functionalities help giving production teams seamless and effective facilities [25, 31, 33]. Software integration infrastructures based on programming paradigms such as Python, Mathlab and Scilab have also provided evidence of the usefulness of such approaches for the tight coupling of multidisciplne application codes [22, 24]. Also high-performance computing based on multi-core multi-cluster infrastructures open new opportunities for more accurate, more extensive and effective robust multi-discipline simulations for the decades to come [28]. This supports the goal of full flight dynamics simulation for 3D aircraft models within the next decade, opening the way to virtual flight-tests and certification of aircraft in the future [23, 24, 29].
Worklist handling in workflow-enabled radiological application systems

NASA Astrophysics Data System (ADS)

Wendler, Thomas; Meetz, Kirsten; Schmidt, Joachim; von Berg, Jens

2000-05-01

For the next generation integrated information systems for health care applications, more emphasis has to be put on systems which, by design, support the reduction of cost, the increase inefficiency and the improvement of the quality of services. A substantial contribution to this will be the modeling. optimization, automation and enactment of processes in health care institutions. One of the perceived key success factors for the system integration of processes will be the application of workflow management, with workflow management systems as key technology components. In this paper we address workflow management in radiology. We focus on an important aspect of workflow management, the generation and handling of worklists, which provide workflow participants automatically with work items that reflect tasks to be performed. The display of worklists and the functions associated with work items are the visible part for the end-users of an information system using a workflow management approach. Appropriate worklist design and implementation will influence user friendliness of a system and will largely influence work efficiency. Technically, in current imaging department information system environments (modality-PACS-RIS installations), a data-driven approach has been taken: Worklist -- if present at all -- are generated from filtered views on application data bases. In a future workflow-based approach, worklists will be generated by autonomous workflow services based on explicit process models and organizational models. This process-oriented approach will provide us with an integral view of entire health care processes or sub- processes. The paper describes the basic mechanisms of this approach and summarizes its benefits.
Workflow as a Service in the Cloud: Architecture and Scheduling Algorithms.

PubMed

Wang, Jianwu; Korambath, Prakashan; Altintas, Ilkay; Davis, Jim; Crawl, Daniel

2014-01-01

With more and more workflow systems adopting cloud as their execution environment, it becomes increasingly challenging on how to efficiently manage various workflows, virtual machines (VMs) and workflow execution on VM instances. To make the system scalable and easy-to-extend, we design a Workflow as a Service (WFaaS) architecture with independent services. A core part of the architecture is how to efficiently respond continuous workflow requests from users and schedule their executions in the cloud. Based on different targets, we propose four heuristic workflow scheduling algorithms for the WFaaS architecture, and analyze the differences and best usages of the algorithms in terms of performance, cost and the price/performance ratio via experimental studies.
Abstracted Workow Framework with a Structure from Motion Application

NASA Astrophysics Data System (ADS)

Rossi, Adam J.

In scientific and engineering disciplines, from academia to industry, there is an increasing need for the development of custom software to perform experiments, construct systems, and develop products. The natural mindset initially is to shortcut and bypass all overhead and process rigor in order to obtain an immediate result for the problem at hand, with the misconception that the software will simply be thrown away at the end. In a majority of the cases, it turns out the software persists for many years, and likely ends up in production systems for which it was not initially intended. In the current study, a framework that can be used in both industry and academic applications mitigates underlying problems associated with developing scientific and engineering software. This results in software that is much more maintainable, documented, and usable by others, specifically allowing new users to extend capabilities of components already implemented in the framework. There is a multi-disciplinary need in the fields of imaging science, computer science, and software engineering for a unified implementation model, which motivates the development of an abstracted software framework. Structure from motion (SfM) has been identified as one use case where the abstracted workflow framework can improve research efficiencies and eliminate implementation redundancies in scientific fields. The SfM process begins by obtaining 2D images of a scene from different perspectives. Features from the images are extracted and correspondences are established. This provides a sufficient amount of information to initialize the problem for fully automated processing. Transformations are established between views, and 3D points are established via triangulation algorithms. The parameters for the camera models for all views / images are solved through bundle adjustment, establishing a highly consistent point cloud. The initial sparse point cloud and camera matrices are used to generate a dense point cloud through patch based techniques or densification algorithms such as Semi-Global Matching (SGM). The point cloud can be visualized or exploited by both humans and automated techniques. In some cases the point cloud is "draped" with original imagery in order to enhance the 3D model for a human viewer. The SfM workflow can be implemented in the abstracted framework, making it easily leverageable and extensible by multiple users. Like many processes in scientific and engineering domains, the workflow described for SfM is complex and requires many disparate components to form a functional system, often utilizing algorithms implemented by many users in different languages / environments and without knowledge of how the component fits into the larger system. In practice, this generally leads to issues interfacing the components, building the software for desired platforms, understanding its concept of operations, and how it can be manipulated in order to fit the desired function for a particular application. In addition, other scientists and engineers instinctively wish to analyze the performance of the system, establish new algorithms, optimize existing processes, and establish new functionality based on current research. This requires a framework whereby new components can be easily plugged in without affecting the current implemented functionality. The need for a universal programming environment establishes the motivation for the development of the abstracted workflow framework. This software implementation, named Catena, provides base classes from which new components must derive in order to operate within the framework. The derivation mandates requirements be satisfied in order to provide a complete implementation. Additionally, the developer must provide documentation of the component in terms of its overall function and inputs. The interface input and output values corresponding to the component must be defined in terms of their respective data types, and the implementation uses mechanisms within the framework to retrieve and send the values. This process requires the developer to componentize their algorithm rather than implement it monolithically. Although the requirements of the developer are slightly greater, the benefits realized from using Catena far outweigh the overhead, and results in extensible software. This thesis provides a basis for the abstracted workflow framework concept and the Catena software implementation. The benefits are also illustrated using a detailed examination of the SfM process as an example application.
Enabling a new Paradigm to Address Big Data and Open Science Challenges

NASA Astrophysics Data System (ADS)

Ramamurthy, Mohan; Fisher, Ward

2017-04-01

Data are not only the lifeblood of the geosciences but they have become the currency of the modern world in science and society. Rapid advances in computing, communi¬cations, and observational technologies — along with concomitant advances in high-resolution modeling, ensemble and coupled-systems predictions of the Earth system — are revolutionizing nearly every aspect of our field. Modern data volumes from high-resolution ensemble prediction/projection/simulation systems and next-generation remote-sensing systems like hyper-spectral satellite sensors and phased-array radars are staggering. For example, CMIP efforts alone will generate many petabytes of climate projection data for use in assessments of climate change. And NOAA's National Climatic Data Center projects that it will archive over 350 petabytes by 2030. For researchers and educators, this deluge and the increasing complexity of data brings challenges along with the opportunities for discovery and scientific breakthroughs. The potential for big data to transform the geosciences is enormous, but realizing the next frontier depends on effectively managing, analyzing, and exploiting these heterogeneous data sources, extracting knowledge and useful information from heterogeneous data sources in ways that were previously impossible, to enable discoveries and gain new insights. At the same time, there is a growing focus on the area of "Reproducibility or Replicability in Science" that has implications for Open Science. The advent of cloud computing has opened new avenues for not only addressing both big data and Open Science challenges to accelerate scientific discoveries. However, to successfully leverage the enormous potential of cloud technologies, it will require the data providers and the scientific communities to develop new paradigms to enable next-generation workflows and transform the conduct of science. Making data readily available is a necessary but not a sufficient condition. Data providers also need to give scientists an ecosystem that includes data, tools, workflows and other services needed to perform analytics, integration, interpretation, and synthesis - all in the same environment or platform. Instead of moving data to processing systems near users, as is the tradition, the cloud permits one to bring processing, computing, analysis and visualization to data - so called data proximate workbench capabilities, also known as server-side processing. In this talk, I will present the ongoing work at Unidata to facilitate a new paradigm for doing science by offering a suite of tools, resources, and platforms to leverage cloud technologies for addressing both big data and Open Science/reproducibility challenges. That work includes the development and deployment of new protocols for data access and server-side operations and Docker container images of key applications, JupyterHub Python notebook tools, and cloud-based analysis and visualization capability via the CloudIDV tool to enable reproducible workflows and effectively use the accessed data.
PACS-Graz, 1985-2000: from a scientific pilot to a state-wide multimedia radiological information system

NASA Astrophysics Data System (ADS)

Gell, Guenther

2000-05-01

1971/72 began the implementation of a computerized radiological documentation system as the Department of Radiology of the University of Graz, which developed over the years into a full RIS. 1985 started a scientific cooperation with SIEMENS to develop a PACS. The two systems were linked and evolved into a highly integrated RIS-PACS for the state wide hospital system in Styria. During its lifetime the RIS, originally implemented in FORTRAN on a UNIVAC 494 mainframe migrated to a PDP15, on to a PDP11, then VAX and Alphas. The flexible original record structure with variable length fields and the powerful retrieval language were retained. The data acquisition part with the user interface was rewritten several times and many service programs have been added. During our PACS cooperation many ideas like the folder concept or functionalities of the GUI have been designed and tested and were then implemented in the SIENET product. The actual RIS/PACS supports the whole workflow in the Radiology Department. It is installed in a 2.300 bed university hospital and the smaller hospitals of the State of Styria. Modalities from different vendors are connected via DICOM to the RIS (modality worklist) and to the PACS. PACSubsystems from other vendors have been integrated. Images are distributed to referring clinics and for teleconsultation and image processing and reports are available on line to all connected hospitals. We spent great efforts to guarantee optimal support of the workflow and to ensure an enhanced cost/benefit ratio for each user (class). Another special feature is selective image distribution. Using the high level retrieval language individual filters can be constructed easily to implement any image distribution policy agreed upon by radiologists and referring clinicians.
EPIGEN-Brazil Initiative resources: a Latin American imputation panel and the Scientific Workflow.

PubMed

Magalhães, Wagner C S; Araujo, Nathalia M; Leal, Thiago P; Araujo, Gilderlanio S; Viriato, Paula J S; Kehdy, Fernanda S; Costa, Gustavo N; Barreto, Mauricio L; Horta, Bernardo L; Lima-Costa, Maria Fernanda; Pereira, Alexandre C; Tarazona-Santos, Eduardo; Rodrigues, Maíra R

2018-06-14

EPIGEN-Brazil is one of the largest Latin American initiatives at the interface of human genomics, public health, and computational biology. Here, we present two resources to address two challenges to the global dissemination of precision medicine and the development of the bioinformatics know-how to support it. To address the underrepresentation of non-European individuals in human genome diversity studies, we present the EPIGEN-5M+1KGP imputation panel-the fusion of the public 1000 Genomes Project (1KGP) Phase 3 imputation panel with haplotypes derived from the EPIGEN-5M data set (a product of the genotyping of 4.3 million SNPs in 265 admixed individuals from the EPIGEN-Brazil Initiative). When we imputed a target SNPs data set (6487 admixed individuals genotyped for 2.2 million SNPs from the EPIGEN-Brazil project) with the EPIGEN-5M+1KGP panel, we gained 140,452 more SNPs in total than when using the 1KGP Phase 3 panel alone and 788,873 additional high confidence SNPs ( info score ≥ 0.8). Thus, the major effect of the inclusion of the EPIGEN-5M data set in this new imputation panel is not only to gain more SNPs but also to improve the quality of imputation. To address the lack of transparency and reproducibility of bioinformatics protocols, we present a conceptual Scientific Workflow in the form of a website that models the scientific process (by including publications, flowcharts, masterscripts, documents, and bioinformatics protocols), making it accessible and interactive. Its applicability is shown in the context of the development of our EPIGEN-5M+1KGP imputation panel. The Scientific Workflow also serves as a repository of bioinformatics resources. © 2018 Magalhães et al.; Published by Cold Spring Harbor Laboratory Press.

The Kiel data management infrastructure - arising from a generic data model

NASA Astrophysics Data System (ADS)

Fleischer, D.; Mehrtens, H.; Schirnick, C.; Springer, P.

2010-12-01

The Kiel Data Management Infrastructure (KDMI) started from a cooperation of three large-scale projects (SFB574, SFB754 and Cluster of Excellence The Future Ocean) and the Leibniz Institute of Marine Sciences (IFM-GEOMAR). The common strategy for project data management is a single person collecting and transforming data according to the requirements of the targeted data center(s). The intention of the KDMI cooperation is to avoid redundant and potentially incompatible data management efforts for scientists and data managers and to create a single sustainable infrastructure. An increased level of complexity in the conceptual planing arose from the diversity of marine disciplines and approximately 1000 scientists involved. KDMI key features focus on the data provenance which we consider to comprise the entire workflow from field sampling thru labwork to data calculation and evaluation. Managing the data of each individual project participant in this way yields the data management for the entire project and warrants the reusability of (meta)data. Accordingly scientists provide a workflow definition of their data creation procedures resulting in their target variables. The central idea in the development of the KDMI presented here is based on the object oriented programming concept which allows to have one object definition (workflow) and infinite numbers of object instances (data). Each definition is created by a graphical user interface and produces XML output stored in a database using a generic data model. On creation of a data instance the KDMI translates the definition into web forms for the scientist, the generic data model then accepts all information input following the given data provenance definition. An important aspect of the implementation phase is the possibility of a successive transition from daily measurement routines resulting in single spreadsheet files with well known points of failure and limited reuseability to a central infrastructure as a single point of truth. The data provenance approach has the following positive side effects: (1) the scientist designs the extend and timing of data and metadata prompts by workflow definitions himself while (2) consistency and completeness (mandatory information) of metadata in the resulting XML document can be checked by XML validation. (3) Storage of the entire data creation process (including raw data and processing steps) provides a multidimensional quality history accessible by all researchers in addition to the commonly applied one dimensional quality flag system. (4) The KDMI can be extended to other scientific disciplines by adding new workflows and domain specific outputs assisted by the KDMI-Team. The KDMI is a social network inspired system but instead of sharing privacy it is a sharing platform for daily scientific work, data and their provenance.
Modelling and analysis of workflow for lean supply chains

NASA Astrophysics Data System (ADS)

Ma, Jinping; Wang, Kanliang; Xu, Lida

2011-11-01

Cross-organisational workflow systems are a component of enterprise information systems which support collaborative business process among organisations in supply chain. Currently, the majority of workflow systems is developed in perspectives of information modelling without considering actual requirements of supply chain management. In this article, we focus on the modelling and analysis of the cross-organisational workflow systems in the context of lean supply chain (LSC) using Petri nets. First, the article describes the assumed conditions of cross-organisation workflow net according to the idea of LSC and then discusses the standardisation of collaborating business process between organisations in the context of LSC. Second, the concept of labelled time Petri nets (LTPNs) is defined through combining labelled Petri nets with time Petri nets, and the concept of labelled time workflow nets (LTWNs) is also defined based on LTPNs. Cross-organisational labelled time workflow nets (CLTWNs) is then defined based on LTWNs. Third, the article proposes the notion of OR-silent CLTWNS and a verifying approach to the soundness of LTWNs and CLTWNs. Finally, this article illustrates how to use the proposed method by a simple example. The purpose of this research is to establish a formal method of modelling and analysis of workflow systems for LSC. This study initiates a new perspective of research on cross-organisational workflow management and promotes operation management of LSC in real world settings.
Ontology for Transforming Geo-Spatial Data for Discovery and Integration of Scientific Data

NASA Astrophysics Data System (ADS)

Nguyen, L.; Chee, T.; Minnis, P.

2013-12-01

Discovery and access to geo-spatial scientific data across heterogeneous repositories and multi-discipline datasets can present challenges for scientist. We propose to build a workflow for transforming geo-spatial datasets into semantic environment by using relationships to describe the resource using OWL Web Ontology, RDF, and a proposed geo-spatial vocabulary. We will present methods for transforming traditional scientific dataset, use of a semantic repository, and querying using SPARQL to integrate and access datasets. This unique repository will enable discovery of scientific data by geospatial bound or other criteria.
Biowep: a workflow enactment portal for bioinformatics applications.

PubMed

Romano, Paolo; Bartocci, Ezio; Bertolini, Guglielmo; De Paoli, Flavio; Marra, Domenico; Mauri, Giancarlo; Merelli, Emanuela; Milanesi, Luciano

2007-03-08

The huge amount of biological information, its distribution over the Internet and the heterogeneity of available software tools makes the adoption of new data integration and analysis network tools a necessity in bioinformatics. ICT standards and tools, like Web Services and Workflow Management Systems (WMS), can support the creation and deployment of such systems. Many Web Services are already available and some WMS have been proposed. They assume that researchers know which bioinformatics resources can be reached through a programmatic interface and that they are skilled in programming and building workflows. Therefore, they are not viable to the majority of unskilled researchers. A portal enabling these to take profit from new technologies is still missing. We designed biowep, a web based client application that allows for the selection and execution of a set of predefined workflows. The system is available on-line. Biowep architecture includes a Workflow Manager, a User Interface and a Workflow Executor. The task of the Workflow Manager is the creation and annotation of workflows. These can be created by using either the Taverna Workbench or BioWMS. Enactment of workflows is carried out by FreeFluo for Taverna workflows and by BioAgent/Hermes, a mobile agent-based middleware, for BioWMS ones. Main workflows' processing steps are annotated on the basis of their input and output, elaboration type and application domain by using a classification of bioinformatics data and tasks. The interface supports users authentication and profiling. Workflows can be selected on the basis of users' profiles and can be searched through their annotations. Results can be saved. We developed a web system that support the selection and execution of predefined workflows, thus simplifying access for all researchers. The implementation of Web Services allowing specialized software to interact with an exhaustive set of biomedical databases and analysis software and the creation of effective workflows can significantly improve automation of in-silico analysis. Biowep is available for interested researchers as a reference portal. They are invited to submit their workflows to the workflow repository. Biowep is further being developed in the sphere of the Laboratory of Interdisciplinary Technologies in Bioinformatics - LITBIO.
Biowep: a workflow enactment portal for bioinformatics applications

PubMed Central

Romano, Paolo; Bartocci, Ezio; Bertolini, Guglielmo; De Paoli, Flavio; Marra, Domenico; Mauri, Giancarlo; Merelli, Emanuela; Milanesi, Luciano

2007-01-01

Background The huge amount of biological information, its distribution over the Internet and the heterogeneity of available software tools makes the adoption of new data integration and analysis network tools a necessity in bioinformatics. ICT standards and tools, like Web Services and Workflow Management Systems (WMS), can support the creation and deployment of such systems. Many Web Services are already available and some WMS have been proposed. They assume that researchers know which bioinformatics resources can be reached through a programmatic interface and that they are skilled in programming and building workflows. Therefore, they are not viable to the majority of unskilled researchers. A portal enabling these to take profit from new technologies is still missing. Results We designed biowep, a web based client application that allows for the selection and execution of a set of predefined workflows. The system is available on-line. Biowep architecture includes a Workflow Manager, a User Interface and a Workflow Executor. The task of the Workflow Manager is the creation and annotation of workflows. These can be created by using either the Taverna Workbench or BioWMS. Enactment of workflows is carried out by FreeFluo for Taverna workflows and by BioAgent/Hermes, a mobile agent-based middleware, for BioWMS ones. Main workflows' processing steps are annotated on the basis of their input and output, elaboration type and application domain by using a classification of bioinformatics data and tasks. The interface supports users authentication and profiling. Workflows can be selected on the basis of users' profiles and can be searched through their annotations. Results can be saved. Conclusion We developed a web system that support the selection and execution of predefined workflows, thus simplifying access for all researchers. The implementation of Web Services allowing specialized software to interact with an exhaustive set of biomedical databases and analysis software and the creation of effective workflows can significantly improve automation of in-silico analysis. Biowep is available for interested researchers as a reference portal. They are invited to submit their workflows to the workflow repository. Biowep is further being developed in the sphere of the Laboratory of Interdisciplinary Technologies in Bioinformatics – LITBIO. PMID:17430563
The standard-based open workflow system in GeoBrain (Invited)

NASA Astrophysics Data System (ADS)

Di, L.; Yu, G.; Zhao, P.; Deng, M.

2013-12-01

GeoBrain is an Earth science Web-service system developed and operated by the Center for Spatial Information Science and Systems, George Mason University. In GeoBrain, a standard-based open workflow system has been implemented to accommodate the automated processing of geospatial data through a set of complex geo-processing functions for advanced production generation. The GeoBrain models the complex geoprocessing at two levels, the conceptual and concrete. At the conceptual level, the workflows exist in the form of data and service types defined by ontologies. The workflows at conceptual level are called geo-processing models and cataloged in GeoBrain as virtual product types. A conceptual workflow is instantiated into a concrete, executable workflow when a user requests a product that matches a virtual product type. Both conceptual and concrete workflows are encoded in Business Process Execution Language (BPEL). A BPEL workflow engine, called BPELPower, has been implemented to execute the workflow for the product generation. A provenance capturing service has been implemented to generate the ISO 19115-compliant complete product provenance metadata before and after the workflow execution. The generation of provenance metadata before the workflow execution allows users to examine the usability of the final product before the lengthy and expensive execution takes place. The three modes of workflow executions defined in the ISO 19119, transparent, translucent, and opaque, are available in GeoBrain. A geoprocessing modeling portal has been developed to allow domain experts to develop geoprocessing models at the type level with the support of both data and service/processing ontologies. The geoprocessing models capture the knowledge of the domain experts and are become the operational offering of the products after a proper peer review of models is conducted. An automated workflow composition has been experimented successfully based on ontologies and artificial intelligence technology. The GeoBrain workflow system has been used in multiple Earth science applications, including the monitoring of global agricultural drought, the assessment of flood damage, the derivation of national crop condition and progress information, and the detection of nuclear proliferation facilities and events.
Eleven quick tips for architecting biomedical informatics workflows with cloud computing.

PubMed

Cole, Brian S; Moore, Jason H

2018-03-01

Cloud computing has revolutionized the development and operations of hardware and software across diverse technological arenas, yet academic biomedical research has lagged behind despite the numerous and weighty advantages that cloud computing offers. Biomedical researchers who embrace cloud computing can reap rewards in cost reduction, decreased development and maintenance workload, increased reproducibility, ease of sharing data and software, enhanced security, horizontal and vertical scalability, high availability, a thriving technology partner ecosystem, and much more. Despite these advantages that cloud-based workflows offer, the majority of scientific software developed in academia does not utilize cloud computing and must be migrated to the cloud by the user. In this article, we present 11 quick tips for architecting biomedical informatics workflows on compute clouds, distilling knowledge gained from experience developing, operating, maintaining, and distributing software and virtualized appliances on the world's largest cloud. Researchers who follow these tips stand to benefit immediately by migrating their workflows to cloud computing and embracing the paradigm of abstraction.
Eleven quick tips for architecting biomedical informatics workflows with cloud computing

PubMed Central

Moore, Jason H.

2018-01-01

Cloud computing has revolutionized the development and operations of hardware and software across diverse technological arenas, yet academic biomedical research has lagged behind despite the numerous and weighty advantages that cloud computing offers. Biomedical researchers who embrace cloud computing can reap rewards in cost reduction, decreased development and maintenance workload, increased reproducibility, ease of sharing data and software, enhanced security, horizontal and vertical scalability, high availability, a thriving technology partner ecosystem, and much more. Despite these advantages that cloud-based workflows offer, the majority of scientific software developed in academia does not utilize cloud computing and must be migrated to the cloud by the user. In this article, we present 11 quick tips for architecting biomedical informatics workflows on compute clouds, distilling knowledge gained from experience developing, operating, maintaining, and distributing software and virtualized appliances on the world’s largest cloud. Researchers who follow these tips stand to benefit immediately by migrating their workflows to cloud computing and embracing the paradigm of abstraction. PMID:29596416
GO2OGS 1.0: a versatile workflow to integrate complex geological information with fault data into numerical simulation models

NASA Astrophysics Data System (ADS)

Fischer, T.; Naumov, D.; Sattler, S.; Kolditz, O.; Walther, M.

2015-11-01

We offer a versatile workflow to convert geological models built with the ParadigmTM GOCAD© (Geological Object Computer Aided Design) software into the open-source VTU (Visualization Toolkit unstructured grid) format for usage in numerical simulation models. Tackling relevant scientific questions or engineering tasks often involves multidisciplinary approaches. Conversion workflows are needed as a way of communication between the diverse tools of the various disciplines. Our approach offers an open-source, platform-independent, robust, and comprehensible method that is potentially useful for a multitude of environmental studies. With two application examples in the Thuringian Syncline, we show how a heterogeneous geological GOCAD model including multiple layers and faults can be used for numerical groundwater flow modeling, in our case employing the OpenGeoSys open-source numerical toolbox for groundwater flow simulations. The presented workflow offers the chance to incorporate increasingly detailed data, utilizing the growing availability of computational power to simulate numerical models.
Workflow as a Service in the Cloud: Architecture and Scheduling Algorithms

PubMed Central

Wang, Jianwu; Korambath, Prakashan; Altintas, Ilkay; Davis, Jim; Crawl, Daniel

2017-01-01

With more and more workflow systems adopting cloud as their execution environment, it becomes increasingly challenging on how to efficiently manage various workflows, virtual machines (VMs) and workflow execution on VM instances. To make the system scalable and easy-to-extend, we design a Workflow as a Service (WFaaS) architecture with independent services. A core part of the architecture is how to efficiently respond continuous workflow requests from users and schedule their executions in the cloud. Based on different targets, we propose four heuristic workflow scheduling algorithms for the WFaaS architecture, and analyze the differences and best usages of the algorithms in terms of performance, cost and the price/performance ratio via experimental studies. PMID:29399237
The Nasa-Isro SAR Mission Science Data Products and Processing Workflows

NASA Astrophysics Data System (ADS)

Rosen, P. A.; Agram, P. S.; Lavalle, M.; Cohen, J.; Buckley, S.; Kumar, R.; Misra-Ray, A.; Ramanujam, V.; Agarwal, K. M.

2017-12-01

The NASA-ISRO SAR (NISAR) Mission is currently in the development phase and in the process of specifying its suite of data products and algorithmic workflows, responding to inputs from the NISAR Science and Applications Team. NISAR will provide raw data (Level 0), full-resolution complex imagery (Level 1), and interferometric and polarimetric image products (Level 2) for the entire data set, in both natural radar and geocoded coordinates. NASA and ISRO are coordinating the formats, meta-data layers, and algorithms for these products, for both the NASA-provided L-band radar and the ISRO-provided S-band radar. Higher level products will be also be generated for the purpose of calibration and validation, over large areas of Earth, including tectonic plate boundaries, ice sheets and sea-ice, and areas of ecosystem disturbance and change. This level of comprehensive product generation has been unprecedented for SAR missions in the past, and leads to storage processing challenges for the production system and the archive center. Further, recognizing the potential to support applications that require low latency product generation and delivery, the NISAR team is optimizing the entire end-to-end ground data system for such response, including exploring the advantages of cloud-based processing, algorithmic acceleration using GPUs, and on-demand processing schemes that minimize computational and transport costs, but allow rapid delivery to science and applications users. This paper will review the current products, workflows, and discuss the scientific and operational trade-space of mission capabilities.
A framework for streamlining research workflow in neuroscience and psychology

PubMed Central

Kubilius, Jonas

2014-01-01

Successful accumulation of knowledge is critically dependent on the ability to verify and replicate every part of scientific conduct. However, such principles are difficult to enact when researchers continue to resort on ad-hoc workflows and with poorly maintained code base. In this paper I examine the needs of neuroscience and psychology community, and introduce psychopy_ext, a unifying framework that seamlessly integrates popular experiment building, analysis and manuscript preparation tools by choosing reasonable defaults and implementing relatively rigid patterns of workflow. This structure allows for automation of multiple tasks, such as generated user interfaces, unit testing, control analyses of stimuli, single-command access to descriptive statistics, and publication quality plotting. Taken together, psychopy_ext opens an exciting possibility for a faster, more robust code development and collaboration for researchers. PMID:24478691
A very simple, re-executable neuroimaging publication

PubMed Central

Ghosh, Satrajit S.; Poline, Jean-Baptiste; Keator, David B.; Halchenko, Yaroslav O.; Thomas, Adam G.; Kessler, Daniel A.; Kennedy, David N.

2017-01-01

Reproducible research is a key element of the scientific process. Re-executability of neuroimaging workflows that lead to the conclusions arrived at in the literature has not yet been sufficiently addressed and adopted by the neuroimaging community. In this paper, we document a set of procedures, which include supplemental additions to a manuscript, that unambiguously define the data, workflow, execution environment and results of a neuroimaging analysis, in order to generate a verifiable re-executable publication. Re-executability provides a starting point for examination of the generalizability and reproducibility of a given finding. PMID:28781753
Toward a geoinformatics framework for understanding the social and biophysical influences on urban nutrient pollution due to residential impervious service connectivity

NASA Astrophysics Data System (ADS)

Miles, B.; Band, L. E.

2012-12-01

Water sustainability has been recognized as a fundamental problem of science whose solution relies in part on high-performance computing. Stormwater management is a major concern of urban sustainability. Understanding interactions between urban landcover and stormwater nutrient pollution requires consideration of fine-scale residential stormwater management, which in turn requires high-resolution LIDAR and landcover data not provided through national spatial data infrastructure, as well as field observation at the household scale. The objectives of my research are twofold: (1) advance understanding of the relationship between residential stormwater management practices and the export of nutrient pollution from stormwater in urbanized ecosystems; and (2) improve the informatics workflows used in community ecohydrology modeling as applied to heterogeneous urbanized ecosystems. In support of these objectives, I present preliminary results from initial work to: (1) develop an ecohydrology workflow platform that automates data preparation while maintaining data provenance and model metadata to yield reproducible workflows and support model benchmarking; (2) perform field observation of existing patterns of residential rooftop impervious surface connectivity to stormwater networks; and (3) develop Regional Hydro-Ecological Simulation System (RHESSys) models for watersheds in Baltimore, MD (as part of the Baltimore Ecosystem Study (BES) NSF Long-Term Ecological Research (LTER) site) and Durham, NC (as part of the NSF Urban Long-Term Research Area (ULTRA) program); these models will be used to simulate nitrogen loading resulting from both baseline residential rooftop impervious connectivity and for disconnection scenarios (e.g. roof drainage to lawn v. engineered rain garden, upslope v. riparian). This research builds on work done as part of the NSF EarthCube Layered Architecture Concept Award where a RHESSys workflow is being implemented in an iRODS (integrated Rule-Oriented Data System) environment. Modeling the ecohydrology of urban ecosystems in a reliable and reproducible manner requires a flexible scientific workflow platform that allows rapid prototyping with large-scale spatial datasets and model refinement integrating expert knowledge with local datasets and household surveys.
MIMI: multimodality, multiresource, information integration environment for biomedical core facilities.

PubMed

Szymanski, Jacek; Wilson, David L; Zhang, Guo-Qiang

2009-10-01

The rapid expansion of biomedical research has brought substantial scientific and administrative data management challenges to modern core facilities. Scientifically, a core facility must be able to manage experimental workflow and the corresponding set of large and complex scientific data. It must also disseminate experimental data to relevant researchers in a secure and expedient manner that facilitates collaboration and provides support for data interpretation and analysis. Administratively, a core facility must be able to manage the scheduling of its equipment and to maintain a flexible and effective billing system to track material, resource, and personnel costs and charge for services to sustain its operation. It must also have the ability to regularly monitor the usage and performance of its equipment and to provide summary statistics on resources spent on different categories of research. To address these informatics challenges, we introduce a comprehensive system called MIMI (multimodality, multiresource, information integration environment) that integrates the administrative and scientific support of a core facility into a single web-based environment. We report the design, development, and deployment experience of a baseline MIMI system at an imaging core facility and discuss the general applicability of such a system in other types of core facilities. These initial results suggest that MIMI will be a unique, cost-effective approach to addressing the informatics infrastructure needs of core facilities and similar research laboratories.
HEPCloud, a New Paradigm for HEP Facilities: CMS Amazon Web Services Investigation

DOE PAGES

Holzman, Burt; Bauerdick, Lothar A. T.; Bockelman, Brian; ...

2017-09-29

Historically, high energy physics computing has been performed on large purpose-built computing systems. These began as single-site compute facilities, but have evolved into the distributed computing grids used today. Recently, there has been an exponential increase in the capacity and capability of commercial clouds. Cloud resources are highly virtualized and intended to be able to be flexibly deployed for a variety of computing tasks. There is a growing interest among the cloud providers to demonstrate the capability to perform large-scale scientific computing. In this paper, we discuss results from the CMS experiment using the Fermilab HEPCloud facility, which utilized bothmore » local Fermilab resources and virtual machines in the Amazon Web Services Elastic Compute Cloud. We discuss the planning, technical challenges, and lessons learned involved in performing physics workflows on a large-scale set of virtualized resources. Additionally, we will discuss the economics and operational efficiencies when executing workflows both in the cloud and on dedicated resources.« less
HEPCloud, a New Paradigm for HEP Facilities: CMS Amazon Web Services Investigation

DOE Office of Scientific and Technical Information (OSTI.GOV)

Holzman, Burt; Bauerdick, Lothar A. T.; Bockelman, Brian

Historically, high energy physics computing has been performed on large purpose-built computing systems. These began as single-site compute facilities, but have evolved into the distributed computing grids used today. Recently, there has been an exponential increase in the capacity and capability of commercial clouds. Cloud resources are highly virtualized and intended to be able to be flexibly deployed for a variety of computing tasks. There is a growing interest among the cloud providers to demonstrate the capability to perform large-scale scientific computing. In this paper, we discuss results from the CMS experiment using the Fermilab HEPCloud facility, which utilized bothmore » local Fermilab resources and virtual machines in the Amazon Web Services Elastic Compute Cloud. We discuss the planning, technical challenges, and lessons learned involved in performing physics workflows on a large-scale set of virtualized resources. Additionally, we will discuss the economics and operational efficiencies when executing workflows both in the cloud and on dedicated resources.« less
Workflow and Electronic Health Records in Small Medical Practices

PubMed Central

Ramaiah, Mala; Subrahmanian, Eswaran; Sriram, Ram D; Lide, Bettijoyce B

2012-01-01

This paper analyzes the workflow and implementation of electronic health record (EHR) systems across different functions in small physician offices. We characterize the differences in the offices based on the levels of computerization in terms of workflow, sources of time delay, and barriers to using EHR systems to support the entire workflow. The study was based on a combination of questionnaires, interviews, in situ observations, and data collection efforts. This study was not intended to be a full-scale time-and-motion study with precise measurements but was intended to provide an overview of the potential sources of delays while performing office tasks. The study follows an interpretive model of case studies rather than a large-sample statistical survey of practices. To identify time-consuming tasks, workflow maps were created based on the aggregated data from the offices. The results from the study show that specialty physicians are more favorable toward adopting EHR systems than primary care physicians are. The barriers to adoption of EHR systems by primary care physicians can be attributed to the complex workflows that exist in primary care physician offices, leading to nonstandardized workflow structures and practices. Also, primary care physicians would benefit more from EHR systems if the systems could interact with external entities. PMID:22737096
The impact of computerized provider order entry systems on inpatient clinical workflow: a literature review.

PubMed

Niazkhani, Zahra; Pirnejad, Habibollah; Berg, Marc; Aarts, Jos

2009-01-01

Previous studies have shown the importance of workflow issues in the implementation of CPOE systems and patient safety practices. To understand the impact of CPOE on clinical workflow, we developed a conceptual framework and conducted a literature search for CPOE evaluations between 1990 and June 2007. Fifty-one publications were identified that disclosed mixed effects of CPOE systems. Among the frequently reported workflow advantages were the legible orders, remote accessibility of the systems, and the shorter order turnaround times. Among the frequently reported disadvantages were the time-consuming and problematic user-system interactions, and the enforcement of a predefined relationship between clinical tasks and between providers. Regarding the diversity of findings in the literature, we conclude that more multi-method research is needed to explore CPOE's multidimensional and collective impact on especially collaborative workflow.
Workflow continuity--moving beyond business continuity in a multisite 24-7 healthcare organization.

PubMed

Kolowitz, Brian J; Lauro, Gonzalo Romero; Barkey, Charles; Black, Harry; Light, Karen; Deible, Christopher

2012-12-01

As hospitals move towards providing in-house 24 × 7 services, there is an increasing need for information systems to be available around the clock. This study investigates one organization's need for a workflow continuity solution that provides around the clock availability for information systems that do not provide highly available services. The organization investigated is a large multifacility healthcare organization that consists of 20 hospitals and more than 30 imaging centers. A case analysis approach was used to investigate the organization's efforts. The results show an overall reduction in downtimes where radiologists could not continue their normal workflow on the integrated Picture Archiving and Communications System (PACS) solution by 94 % from 2008 to 2011. The impact of unplanned downtimes was reduced by 72 % while the impact of planned downtimes was reduced by 99.66 % over the same period. Additionally more than 98 h of radiologist impact due to a PACS upgrade in 2008 was entirely eliminated in 2011 utilizing the system created by the workflow continuity approach. Workflow continuity differs from high availability and business continuity in its design process and available services. Workflow continuity only ensures that critical workflows are available when the production system is unavailable due to scheduled or unscheduled downtimes. Workflow continuity works in conjunction with business continuity and highly available system designs. The results of this investigation revealed that this approach can add significant value to organizations because impact on users is minimized if not eliminated entirely.

Bridging experiment and theory: A template for unifying NMR data and electronic structure calculations

DOE PAGES

Brown, David M. L.; Cho, Herman; de Jong, Wibe A.

2016-02-09

Here, the testing of theoretical models with experimental data is an integral part of the scientific method, and a logical place to search for new ways of stimulating scientific productivity. Often experiment/theory comparisons may be viewed as a workflow comprised of well-defined, rote operations distributed over several distinct computers, as exemplified by the way in which predictions from electronic structure theories are evaluated with results from spectroscopic experiments. For workflows such as this, which may be laborious and time consuming to perform manually, software that could orchestrate the operations and transfer results between computers in a seamless and automated fashionmore » would offer major efficiency gains. Such tools also promise to alter how researchers interact with data outside their field of specialization by, e.g., making raw experimental results more accessible to theorists, and the outputs of theoretical calculations more readily comprehended by experimentalists.« less
Bridging experiment and theory: A template for unifying NMR data and electronic structure calculations

DOE Office of Scientific and Technical Information (OSTI.GOV)

Brown, David M. L.; Cho, Herman; de Jong, Wibe A.

Here, the testing of theoretical models with experimental data is an integral part of the scientific method, and a logical place to search for new ways of stimulating scientific productivity. Often experiment/theory comparisons may be viewed as a workflow comprised of well-defined, rote operations distributed over several distinct computers, as exemplified by the way in which predictions from electronic structure theories are evaluated with results from spectroscopic experiments. For workflows such as this, which may be laborious and time consuming to perform manually, software that could orchestrate the operations and transfer results between computers in a seamless and automated fashionmore » would offer major efficiency gains. Such tools also promise to alter how researchers interact with data outside their field of specialization by, e.g., making raw experimental results more accessible to theorists, and the outputs of theoretical calculations more readily comprehended by experimentalists.« less
A Model of Workflow Composition for Emergency Management

NASA Astrophysics Data System (ADS)

Xin, Chen; Bin-ge, Cui; Feng, Zhang; Xue-hui, Xu; Shan-shan, Fu

The common-used workflow technology is not flexible enough in dealing with concurrent emergency situations. The paper proposes a novel model for defining emergency plans, in which workflow segments appear as a constituent part. A formal abstraction, which contains four operations, is defined to compose workflow segments under constraint rule. The software system of the business process resources construction and composition is implemented and integrated into Emergency Plan Management Application System.
Automated reporting of pharmacokinetic study results: gaining efficiency downstream from the laboratory.

PubMed

Schaefer, Peter

2011-07-01

The purpose of bioanalysis in the pharmaceutical industry is to provide 'raw' data about the concentration of a drug candidate and its metabolites as input for studies of drug properties such as pharmacokinetic (PK), toxicokinetic, bioavailability/bioequivalence and other studies. Building a seamless workflow from the laboratory to final reports is an ongoing challenge for IT groups and users alike. In such a workflow, PK automation can provide companies with the means to vastly increase the productivity of their scientific staff while improving the quality and consistency of their reports on PK analyses. This report presents the concept and benefits of PK automation and discuss which features of an automated reporting workflow should be translated into software requirements that pharmaceutical companies can use to select or build an efficient and effective PK automation solution that best meets their needs.
Data management routines for reproducible research using the G-Node Python Client library

PubMed Central

Sobolev, Andrey; Stoewer, Adrian; Pereira, Michael; Kellner, Christian J.; Garbers, Christian; Rautenberg, Philipp L.; Wachtler, Thomas

2014-01-01

Structured, efficient, and secure storage of experimental data and associated meta-information constitutes one of the most pressing technical challenges in modern neuroscience, and does so particularly in electrophysiology. The German INCF Node aims to provide open-source solutions for this domain that support the scientific data management and analysis workflow, and thus facilitate future data access and reproducible research. G-Node provides a data management system, accessible through an application interface, that is based on a combination of standardized data representation and flexible data annotation to account for the variety of experimental paradigms in electrophysiology. The G-Node Python Library exposes these services to the Python environment, enabling researchers to organize and access their experimental data using their familiar tools while gaining the advantages that a centralized storage entails. The library provides powerful query features, including data slicing and selection by metadata, as well as fine-grained permission control for collaboration and data sharing. Here we demonstrate key actions in working with experimental neuroscience data, such as building a metadata structure, organizing recorded data in datasets, annotating data, or selecting data regions of interest, that can be automated to large degree using the library. Compliant with existing de-facto standards, the G-Node Python Library is compatible with many Python tools in the field of neurophysiology and thus enables seamless integration of data organization into the scientific data workflow. PMID:24634654
Data management routines for reproducible research using the G-Node Python Client library.

PubMed

Sobolev, Andrey; Stoewer, Adrian; Pereira, Michael; Kellner, Christian J; Garbers, Christian; Rautenberg, Philipp L; Wachtler, Thomas

2014-01-01

Structured, efficient, and secure storage of experimental data and associated meta-information constitutes one of the most pressing technical challenges in modern neuroscience, and does so particularly in electrophysiology. The German INCF Node aims to provide open-source solutions for this domain that support the scientific data management and analysis workflow, and thus facilitate future data access and reproducible research. G-Node provides a data management system, accessible through an application interface, that is based on a combination of standardized data representation and flexible data annotation to account for the variety of experimental paradigms in electrophysiology. The G-Node Python Library exposes these services to the Python environment, enabling researchers to organize and access their experimental data using their familiar tools while gaining the advantages that a centralized storage entails. The library provides powerful query features, including data slicing and selection by metadata, as well as fine-grained permission control for collaboration and data sharing. Here we demonstrate key actions in working with experimental neuroscience data, such as building a metadata structure, organizing recorded data in datasets, annotating data, or selecting data regions of interest, that can be automated to large degree using the library. Compliant with existing de-facto standards, the G-Node Python Library is compatible with many Python tools in the field of neurophysiology and thus enables seamless integration of data organization into the scientific data workflow.
The BEL information extraction workflow (BELIEF): evaluation in the BioCreative V BEL and IAT track

PubMed Central

Madan, Sumit; Hodapp, Sven; Senger, Philipp; Ansari, Sam; Szostak, Justyna; Hoeng, Julia; Peitsch, Manuel; Fluck, Juliane

2016-01-01

Network-based approaches have become extremely important in systems biology to achieve a better understanding of biological mechanisms. For network representation, the Biological Expression Language (BEL) is well designed to collate findings from the scientific literature into biological network models. To facilitate encoding and biocuration of such findings in BEL, a BEL Information Extraction Workflow (BELIEF) was developed. BELIEF provides a web-based curation interface, the BELIEF Dashboard, that incorporates text mining techniques to support the biocurator in the generation of BEL networks. The underlying UIMA-based text mining pipeline (BELIEF Pipeline) uses several named entity recognition processes and relationship extraction methods to detect concepts and BEL relationships in literature. The BELIEF Dashboard allows easy curation of the automatically generated BEL statements and their context annotations. Resulting BEL statements and their context annotations can be syntactically and semantically verified to ensure consistency in the BEL network. In summary, the workflow supports experts in different stages of systems biology network building. Based on the BioCreative V BEL track evaluation, we show that the BELIEF Pipeline automatically extracts relationships with an F-score of 36.4% and fully correct statements can be obtained with an F-score of 30.8%. Participation in the BioCreative V Interactive task (IAT) track with BELIEF revealed a systems usability scale (SUS) of 67. Considering the complexity of the task for new users—learning BEL, working with a completely new interface, and performing complex curation—a score so close to the overall SUS average highlights the usability of BELIEF. Database URL: BELIEF is available at http://www.scaiview.com/belief/ PMID:27694210
The BEL information extraction workflow (BELIEF): evaluation in the BioCreative V BEL and IAT track.

PubMed

Madan, Sumit; Hodapp, Sven; Senger, Philipp; Ansari, Sam; Szostak, Justyna; Hoeng, Julia; Peitsch, Manuel; Fluck, Juliane

2016-01-01

Network-based approaches have become extremely important in systems biology to achieve a better understanding of biological mechanisms. For network representation, the Biological Expression Language (BEL) is well designed to collate findings from the scientific literature into biological network models. To facilitate encoding and biocuration of such findings in BEL, a BEL Information Extraction Workflow (BELIEF) was developed. BELIEF provides a web-based curation interface, the BELIEF Dashboard, that incorporates text mining techniques to support the biocurator in the generation of BEL networks. The underlying UIMA-based text mining pipeline (BELIEF Pipeline) uses several named entity recognition processes and relationship extraction methods to detect concepts and BEL relationships in literature. The BELIEF Dashboard allows easy curation of the automatically generated BEL statements and their context annotations. Resulting BEL statements and their context annotations can be syntactically and semantically verified to ensure consistency in the BEL network. In summary, the workflow supports experts in different stages of systems biology network building. Based on the BioCreative V BEL track evaluation, we show that the BELIEF Pipeline automatically extracts relationships with an F-score of 36.4% and fully correct statements can be obtained with an F-score of 30.8%. Participation in the BioCreative V Interactive task (IAT) track with BELIEF revealed a systems usability scale (SUS) of 67. Considering the complexity of the task for new users-learning BEL, working with a completely new interface, and performing complex curation-a score so close to the overall SUS average highlights the usability of BELIEF.Database URL: BELIEF is available at http://www.scaiview.com/belief/. © The Author(s) 2016. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Integrating text mining into the MGI biocuration workflow

PubMed Central

Dowell, K.G.; McAndrews-Hill, M.S.; Hill, D.P.; Drabkin, H.J.; Blake, J.A.

2009-01-01

A major challenge for functional and comparative genomics resource development is the extraction of data from the biomedical literature. Although text mining for biological data is an active research field, few applications have been integrated into production literature curation systems such as those of the model organism databases (MODs). Not only are most available biological natural language (bioNLP) and information retrieval and extraction solutions difficult to adapt to existing MOD curation workflows, but many also have high error rates or are unable to process documents available in those formats preferred by scientific journals. In September 2008, Mouse Genome Informatics (MGI) at The Jackson Laboratory initiated a search for dictionary-based text mining tools that we could integrate into our biocuration workflow. MGI has rigorous document triage and annotation procedures designed to identify appropriate articles about mouse genetics and genome biology. We currently screen ∼1000 journal articles a month for Gene Ontology terms, gene mapping, gene expression, phenotype data and other key biological information. Although we do not foresee that curation tasks will ever be fully automated, we are eager to implement named entity recognition (NER) tools for gene tagging that can help streamline our curation workflow and simplify gene indexing tasks within the MGI system. Gene indexing is an MGI-specific curation function that involves identifying which mouse genes are being studied in an article, then associating the appropriate gene symbols with the article reference number in the MGI database. Here, we discuss our search process, performance metrics and success criteria, and how we identified a short list of potential text mining tools for further evaluation. We provide an overview of our pilot projects with NCBO's Open Biomedical Annotator and Fraunhofer SCAI's ProMiner. In doing so, we prove the potential for the further incorporation of semi-automated processes into the curation of the biomedical literature. PMID:20157492
Integrating text mining into the MGI biocuration workflow.

PubMed

Dowell, K G; McAndrews-Hill, M S; Hill, D P; Drabkin, H J; Blake, J A

2009-01-01

A major challenge for functional and comparative genomics resource development is the extraction of data from the biomedical literature. Although text mining for biological data is an active research field, few applications have been integrated into production literature curation systems such as those of the model organism databases (MODs). Not only are most available biological natural language (bioNLP) and information retrieval and extraction solutions difficult to adapt to existing MOD curation workflows, but many also have high error rates or are unable to process documents available in those formats preferred by scientific journals.In September 2008, Mouse Genome Informatics (MGI) at The Jackson Laboratory initiated a search for dictionary-based text mining tools that we could integrate into our biocuration workflow. MGI has rigorous document triage and annotation procedures designed to identify appropriate articles about mouse genetics and genome biology. We currently screen approximately 1000 journal articles a month for Gene Ontology terms, gene mapping, gene expression, phenotype data and other key biological information. Although we do not foresee that curation tasks will ever be fully automated, we are eager to implement named entity recognition (NER) tools for gene tagging that can help streamline our curation workflow and simplify gene indexing tasks within the MGI system. Gene indexing is an MGI-specific curation function that involves identifying which mouse genes are being studied in an article, then associating the appropriate gene symbols with the article reference number in the MGI database.Here, we discuss our search process, performance metrics and success criteria, and how we identified a short list of potential text mining tools for further evaluation. We provide an overview of our pilot projects with NCBO's Open Biomedical Annotator and Fraunhofer SCAI's ProMiner. In doing so, we prove the potential for the further incorporation of semi-automated processes into the curation of the biomedical literature.
Developing a Collection of Composable Data Translation Software Units to Improve Efficiency and Reproducibility in Ecohydrologic Modeling Workflows

NASA Astrophysics Data System (ADS)

Olschanowsky, C.; Flores, A. N.; FitzGerald, K.; Masarik, M. T.; Rudisill, W. J.; Aguayo, M.

2017-12-01

Dynamic models of the spatiotemporal evolution of water, energy, and nutrient cycling are important tools to assess impacts of climate and other environmental changes on ecohydrologic systems. These models require spatiotemporally varying environmental forcings like precipitation, temperature, humidity, windspeed, and solar radiation. These input data originate from a variety of sources, including global and regional weather and climate models, global and regional reanalysis products, and geostatistically interpolated surface observations. Data translation measures, often subsetting in space and/or time and transforming and converting variable units, represent a seemingly mundane, but critical step in the application workflows. Translation steps can introduce errors, misrepresentations of data, slow execution time, and interrupt data provenance. We leverage a workflow that subsets a large regional dataset derived from the Weather Research and Forecasting (WRF) model and prepares inputs to the Parflow integrated hydrologic model to demonstrate the impact translation tool software quality on scientific workflow results and performance. We propose that such workflows will benefit from a community approved collection of data transformation components. The components should be self-contained composable units of code. This design pattern enables automated parallelization and software verification, improving performance and reliability. Ensuring that individual translation components are self-contained and target minute tasks increases reliability. The small code size of each component enables effective unit and regression testing. The components can be automatically composed for efficient execution. An efficient data translation framework should be written to minimize data movement. Composing components within a single streaming process reduces data movement. Each component will typically have a low arithmetic intensity, meaning that it requires about the same number of bytes to be read as the number of computations it performs. When several components' executions are coordinated the overall arithmetic intensity increases, leading to increased efficiency.
A three-level atomicity model for decentralized workflow management systems

NASA Astrophysics Data System (ADS)

Ben-Shaul, Israel Z.; Heineman, George T.

1996-12-01

A workflow management system (WFMS) employs a workflow manager (WM) to execute and automate the various activities within a workflow. To protect the consistency of data, the WM encapsulates each activity with a transaction; a transaction manager (TM) then guarantees the atomicity of activities. Since workflows often group several activities together, the TM is responsible for guaranteeing the atomicity of these units. There are scalability issues, however, with centralized WFMSs. Decentralized WFMSs provide an architecture for multiple autonomous WFMSs to interoperate, thus accommodating multiple workflows and geographically-dispersed teams. When atomic units are composed of activities spread across multiple WFMSs, however, there is a conflict between global atomicity and local autonomy of each WFMS. This paper describes a decentralized atomicity model that enables workflow administrators to specify the scope of multi-site atomicity based upon the desired semantics of multi-site tasks in the decentralized WFMS. We describe an architecture that realizes our model and execution paradigm.
SYRMEP Tomo Project: a graphical user interface for customizing CT reconstruction workflows.

PubMed

Brun, Francesco; Massimi, Lorenzo; Fratini, Michela; Dreossi, Diego; Billé, Fulvio; Accardo, Agostino; Pugliese, Roberto; Cedola, Alessia

2017-01-01

When considering the acquisition of experimental synchrotron radiation (SR) X-ray CT data, the reconstruction workflow cannot be limited to the essential computational steps of flat fielding and filtered back projection (FBP). More refined image processing is often required, usually to compensate artifacts and enhance the quality of the reconstructed images. In principle, it would be desirable to optimize the reconstruction workflow at the facility during the experiment (beamtime). However, several practical factors affect the image reconstruction part of the experiment and users are likely to conclude the beamtime with sub-optimal reconstructed images. Through an example of application, this article presents SYRMEP Tomo Project (STP), an open-source software tool conceived to let users design custom CT reconstruction workflows. STP has been designed for post-beamtime (off-line use) and for a new reconstruction of past archived data at user's home institution where simple computing resources are available. Releases of the software can be downloaded at the Elettra Scientific Computing group GitHub repository https://github.com/ElettraSciComp/STP-Gui.
NASA's Hybrid Reality Lab: One Giant Leap for Full Dive

NASA Technical Reports Server (NTRS)

Delgado, Francisco J.; Noyes, Matthew

2017-01-01

This presentation demonstrates how NASA is using consumer VR headsets, game engine technology and NVIDIA's GPUs to create highly immersive future training systems augmented with extremely realistic haptic feedback, sound, additional sensory information, and how these can be used to improve the engineering workflow. Include in this presentation is an environment simulation of the ISS, where users can interact with virtual objects, handrails, and tracked physical objects while inside VR, integration of consumer VR headsets with the Active Response Gravity Offload System, and a space habitat architectural evaluation tool. Attendees will learn how the best elements of real and virtual worlds can be combined into a hybrid reality environment with tangible engineering and scientific applications.
Flexible End2End Workflow Automation of Hit-Discovery Research.

PubMed

Holzmüller-Laue, Silke; Göde, Bernd; Thurow, Kerstin

2014-08-01

The article considers a new approach of more complex laboratory automation at the workflow layer. The authors purpose the automation of end2end workflows. The combination of all relevant subprocesses-whether automated or manually performed, independently, and in which organizational unit-results in end2end processes that include all result dependencies. The end2end approach focuses on not only the classical experiments in synthesis or screening, but also on auxiliary processes such as the production and storage of chemicals, cell culturing, and maintenance as well as preparatory activities and analyses of experiments. Furthermore, the connection of control flow and data flow in the same process model leads to reducing of effort of the data transfer between the involved systems, including the necessary data transformations. This end2end laboratory automation can be realized effectively with the modern methods of business process management (BPM). This approach is based on a new standardization of the process-modeling notation Business Process Model and Notation 2.0. In drug discovery, several scientific disciplines act together with manifold modern methods, technologies, and a wide range of automated instruments for the discovery and design of target-based drugs. The article discusses the novel BPM-based automation concept with an implemented example of a high-throughput screening of previously synthesized compound libraries. © 2014 Society for Laboratory Automation and Screening.
Application Examples for Handle System Usage

NASA Astrophysics Data System (ADS)

Toussaint, F.; Weigel, T.; Thiemann, H.; Höck, H.; Stockhause, M.; Lautenschlager, M.

2012-12-01

Besides the well-known DOI (Digital Object Identifiers) as a special form of Handles that resolve to scientific publications there are various other applications in use. Others perhaps are just not yet. We present some examples for the existing ones and some ideas for the future. The national German project C3-Grid provides a framework to implement a first solution for provenance tracing and explore unforeseen implications. Though project-specific, the high-level architecture is generic and represents well a common notion of data derivation. Users select one or many input datasets and a workflow software module (an agent in this context) to execute on the data. The output data is deposited in a repository to be delivered to the user. All data is accompanied by an XML metadata document. All input and output data, metadata and the workflow module receive Handles and are linked together to establish a directed acyclic graph of derived data objects and involved agents. Data that has been modified by a workflow module is linked to its predecessor data and the workflow module involved. Version control systems such as svn or git provide Internet access to software repositories using URLs. To refer to a specific state of the source code of for instance a C3 workflow module, it is sufficient to reference the URL to the svn revision or git hash. In consequence, individual revisions and the repository as a whole receive PIDs. Moreover, the revision specific PIDs are linked to their respective predecessors and become part of the provenance graph. Another example for usage of PIDs in a current major project is given in EUDAT (European Data Infrastructure) which will link scientific data of several research communities together. In many fields it is necessary to provide data objects at multiple locations for a variety of applications. To ensure consistency, not only the master of a data object but also its copies shall be provided with a PID. To verify transaction safety and to keep all copies consistent requires that the chain from master to copy and vice versa has to be resolvable, preferably through PIDs directly. As part of EUDAT necessary services are created on the basis of iRODS. These form the core structure of the data infrastructure developed within EUDAT. Though many implementations of PID systems already exist, many valuable web accessible data sources come with unresolvable identifiers like UUIDs, with instable recognition patterns like URLs, or even with proprietary implementations. However, other data collections would like to link to them in the data descriptions of their metadata. In addition, by usage of PIDs one can decouple the responsibilities for data and MD in projects where necessary. For some metadata entities like persons or even institutes it makes sense to give them single PIDs that point to contact and/or location information. ORCID (Open Researcher & Contributor ID), e.g., keeps track of persons working in scholarly fields, independent of name changes and linguistic variances. The ISO 27729 based International Standard Name Identifier (ISNI) also identifies legal entities and fictional characters besides natural persons. Other systems exist that, e.g., reference geographic localities. IDs of this kind may resolve to a URL where detailed information is given.
Workflow-Oriented Cyberinfrastructure for Sensor Data Analytics

NASA Astrophysics Data System (ADS)

Orcutt, J. A.; Rajasekar, A.; Moore, R. W.; Vernon, F.

2015-12-01

Sensor streams comprise an increasingly large part of Earth Science data. Analytics based on sensor data require an easy way to perform operations such as acquisition, conversion to physical units, metadata linking, sensor fusion, analysis and visualization on distributed sensor streams. Furthermore, embedding real-time sensor data into scientific workflows is of growing interest. We have implemented a scalable networked architecture that can be used to dynamically access packets of data in a stream from multiple sensors, and perform synthesis and analysis across a distributed network. Our system is based on the integrated Rule Oriented Data System (irods.org), which accesses sensor data from the Antelope Real Time Data System (brtt.com), and provides virtualized access to collections of data streams. We integrate real-time data streaming from different sources, collected for different purposes, on different time and spatial scales, and sensed by different methods. iRODS, noted for its policy-oriented data management, brings to sensor processing features and facilities such as single sign-on, third party access control lists ( ACLs), location transparency, logical resource naming, and server-side modeling capabilities while reducing the burden on sensor network operators. Rich integrated metadata support also makes it straightforward to discover data streams of interest and maintain data provenance. The workflow support in iRODS readily integrates sensor processing into any analytical pipeline. The system is developed as part of the NSF-funded Datanet Federation Consortium (datafed.org). APIs for selecting, opening, reaping and closing sensor streams are provided, along with other helper functions to associate metadata and convert sensor packets into NetCDF and JSON formats. Near real-time sensor data including seismic sensors, environmental sensors, LIDAR and video streams are available through this interface. A system for archiving sensor data and metadata in NetCDF format has been implemented and will be demonstrated at AGU.
PyMICE: APython library for analysis of IntelliCage data.

PubMed

Dzik, Jakub M; Puścian, Alicja; Mijakowska, Zofia; Radwanska, Kasia; Łęski, Szymon

2018-04-01

IntelliCage is an automated system for recording the behavior of a group of mice housed together. It produces rich, detailed behavioral data calling for new methods and software for their analysis. Here we present PyMICE, a free and open-source library for analysis of IntelliCage data in the Python programming language. We describe the design and demonstrate the use of the library through a series of examples. PyMICE provides easy and intuitive access to IntelliCage data, and thus facilitates the possibility of using numerous other Python scientific libraries to form a complete data analysis workflow.
ScyFlow: An Environment for the Visual Specification and Execution of Scientific Workflows

NASA Technical Reports Server (NTRS)

McCann, Karen M.; Yarrow, Maurice; DeVivo, Adrian; Mehrotra, Piyush

2004-01-01

With the advent of grid technologies, scientists and engineers are building more and more complex applications to utilize distributed grid resources. The core grid services provide a path for accessing and utilizing these resources in a secure and seamless fashion. However what the scientists need is an environment that will allow them to specify their application runs at a high organizational level, and then support efficient execution across any given set or sets of resources. We have been designing and implementing ScyFlow, a dual-interface architecture (both GUT and APT) that addresses this problem. The scientist/user specifies the application tasks along with the necessary control and data flow, and monitors and manages the execution of the resulting workflow across the distributed resources. In this paper, we utilize two scenarios to provide the details of the two modules of the project, the visual editor and the runtime workflow engine.
A workflow to preserve genome-quality tissue samples from plants in botanical gardens and arboreta.

PubMed

Gostel, Morgan R; Kelloff, Carol; Wallick, Kyle; Funk, Vicki A

2016-09-01

Internationally, gardens hold diverse living collections that can be preserved for genomic research. Workflows have been developed for genomic tissue sampling in other taxa (e.g., vertebrates), but are inadequate for plants. We outline a workflow for tissue sampling intended for two audiences: botanists interested in genomics research and garden staff who plan to voucher living collections. Standard herbarium methods are used to collect vouchers, label information and images are entered into a publicly accessible database, and leaf tissue is preserved in silica and liquid nitrogen. A five-step approach for genomic tissue sampling is presented for sampling from living collections according to current best practices. Collecting genome-quality samples from gardens is an economical and rapid way to make available for scientific research tissue from the diversity of plants on Earth. The Global Genome Initiative will facilitate and lead this endeavor through international partnerships.

Dynamic Data Citation through Provenance - new approach for reproducible science in Geoscience Australia.

NASA Astrophysics Data System (ADS)

Bastrakova, I.; Car, N.

2017-12-01

Geoscience Australia (GA) is recognised and respected as the National Repository and steward of multiple nationally significance data collections that provides geoscience information, services and capability to the Australian Government, industry and stakeholders. Internally, this brings a challenge of managing large volume (11 PB) of diverse and highly complex data distributed through a significant number of catalogues, applications, portals, virtual laboratories, and direct downloads from multiple locations. Externally, GA is facing constant changer in the Government regulations (e.g. open data and archival laws), growing stakeholder demands for high quality and near real-time delivery of data and products, and rapid technological advances enabling dynamic data access. Traditional approach to citing static data and products cannot satisfy increasing demands for the results from scientific workflows, or items within the workflows to be open, discoverable, thrusted and reproducible. Thus, citation of data, products, codes and applications through the implementation of provenance records is being implemented. This approach involves capturing the provenance of many GA processes according to a standardised data model and storing it, as well as metadata for the elements it references, in a searchable set of systems. This provides GA with ability to cite workflows unambiguously as well as each item within each workflow, including inputs and outputs and many other registered components. Dynamic objects can therefore be referenced flexibly in relation to their generation process - a dataset's metadata indicates where to obtain its provenance from - meaning the relevant facts of its dynamism need not be crammed into a single citation object with a single set of attributes. This allows for simple citations, similar to traditional static document citations such as references in journals, to be used for complex dynamic data and other objects such as software code.
Development of the workflow kine systems for support on KAIZEN.

PubMed

Mizuno, Yuki; Ito, Toshihiko; Yoshikawa, Toru; Yomogida, Satoshi; Morio, Koji; Sakai, Kazuhiro

2012-01-01

In this paper, we introduce the new workflow line system consisted of the location and image recording, which led to the acquisition of workflow information and the analysis display. From the results of workflow line investigation, we considered the anticipated effects and the problems on KAIZEN. Workflow line information included the location information and action contents information. These technologies suggest the viewpoints to help improvement, for example, exclusion of useless movement, the redesign of layout and the review of work procedure. Manufacturing factory, it was clear that there was much movement from the standard operation place and accumulation residence time. The following was shown as a result of this investigation, to be concrete, the efficient layout was suggested by this system. In the case of the hospital, similarly, it is pointed out that the workflow has the problem of layout and setup operations based on the effective movement pattern of the experts. This system could adapt to routine work, including as well as non-routine work. By the development of this system which can fit and adapt to industrial diversification, more effective "visual management" (visualization of work) is expected in the future.
Decaf: Decoupled Dataflows for In Situ High-Performance Workflows

DOE Office of Scientific and Technical Information (OSTI.GOV)

Dreher, M.; Peterka, T.

Decaf is a dataflow system for the parallel communication of coupled tasks in an HPC workflow. The dataflow can perform arbitrary data transformations ranging from simply forwarding data to complex data redistribution. Decaf does this by allowing the user to allocate resources and execute custom code in the dataflow. All communication through the dataflow is efficient parallel message passing over MPI. The runtime for calling tasks is entirely message-driven; Decaf executes a task when all messages for the task have been received. Such a messagedriven runtime allows cyclic task dependencies in the workflow graph, for example, to enact computational steeringmore » based on the result of downstream tasks. Decaf includes a simple Python API for describing the workflow graph. This allows Decaf to stand alone as a complete workflow system, but Decaf can also be used as the dataflow layer by one or more other workflow systems to form a heterogeneous task-based computing environment. In one experiment, we couple a molecular dynamics code with a visualization tool using the FlowVR and Damaris workflow systems and Decaf for the dataflow. In another experiment, we test the coupling of a cosmology code with Voronoi tessellation and density estimation codes using MPI for the simulation, the DIY programming model for the two analysis codes, and Decaf for the dataflow. Such workflows consisting of heterogeneous software infrastructures exist because components are developed separately with different programming models and runtimes, and this is the first time that such heterogeneous coupling of diverse components was demonstrated in situ on HPC systems.« less
Realizing the Living Paper using the ProvONE Model for Reproducible Research

NASA Astrophysics Data System (ADS)

Jones, M. B.; Jones, C. S.; Ludäscher, B.; Missier, P.; Walker, L.; Slaughter, P.; Schildhauer, M.; Cuevas-Vicenttín, V.

2015-12-01

Science has advanced through traditional publications that codify research results as a permenant part of the scientific record. But because publications are static and atomic, researchers can only cite and reference a whole work when building on prior work of colleagues. The open source software model has demonstrated a new approach in which strong version control in an open environment can nurture an open ecosystem of software. Developers now commonly fork and extend software giving proper credit, with less repetition, and with confidence in the relationship to original software. Through initiatives like 'Beyond the PDF', an analogous model has been imagined for open science, in which software, data, analyses, and derived products become first class objects within a publishing ecosystem that has evolved to be finer-grained and is realized through a web of linked open data. We have prototyped a Living Paper concept by developing the ProvONE provenance model for scientific workflows, with prototype deployments in DataONE. ProvONE promotes transparency and openness by describing the authenticity, origin, structure, and processing history of research artifacts and by detailing the steps in computational workflows that produce derived products. To realize the Living Paper, we decompose scientific papers into their constituent products and publish these as compound objects in the DataONE federation of archival repositories. Each individual finding and sub-product of a reseach project (such as a derived data table, a workflow or script, a figure, an image, or a finding) can be independently stored, versioned, and cited. ProvONE provenance traces link these fine-grained products within and across versions of a paper, and across related papers that extend an original analysis. This allows for open scientific publishing in which researchers extend and modify findings, creating a dynamic, evolving web of results that collectively represent the scientific enterprise. The Living Paper provides detailed metadata for properly interpreting and verifying individual research findings, for tracing the origin of ideas, for launching new lines of inquiry, and for implementing transitive credit for research and engineering.
Requirements for Workflow-Based EHR Systems - Results of a Qualitative Study.

PubMed

Schweitzer, Marco; Lasierra, Nelia; Hoerbst, Alexander

2016-01-01

Today's high quality healthcare delivery strongly relies on efficient electronic health records (EHR). These EHR systems or in general healthcare IT-systems are usually developed in a static manner according to a given workflow. Hence, they are not flexible enough to enable access to EHR data and to execute individual actions within a consultation. This paper reports on requirements pointed by experts in the domain of diabetes mellitus to design a system for supporting dynamic workflows to serve personalization within a medical activity. Requirements were collected by means of expert interviews. These interviews completed a conducted triangulation approach, aimed to gather requirements for workflow-based EHR interactions. The data from the interviews was analyzed through a qualitative approach resulting in a set of requirements enhancing EHR functionality from the user's perspective. Requirements were classified according to four different categorizations: (1) process-related requirements, (2) information needs, (3) required functions, (4) non-functional requirements. Workflow related requirements were identified which should be considered when developing and deploying EHR systems.
A knowledge-based decision support system in bioinformatics: an application to protein complex extraction

PubMed Central

2013-01-01

Background We introduce a Knowledge-based Decision Support System (KDSS) in order to face the Protein Complex Extraction issue. Using a Knowledge Base (KB) coding the expertise about the proposed scenario, our KDSS is able to suggest both strategies and tools, according to the features of input dataset. Our system provides a navigable workflow for the current experiment and furthermore it offers support in the configuration and running of every processing component of that workflow. This last feature makes our system a crossover between classical DSS and Workflow Management Systems. Results We briefly present the KDSS' architecture and basic concepts used in the design of the knowledge base and the reasoning component. The system is then tested using a subset of Saccharomyces cerevisiae Protein-Protein interaction dataset. We used this subset because it has been well studied in literature by several research groups in the field of complex extraction: in this way we could easily compare the results obtained through our KDSS with theirs. Our system suggests both a preprocessing and a clustering strategy, and for each of them it proposes and eventually runs suited algorithms. Our system's final results are then composed of a workflow of tasks, that can be reused for other experiments, and the specific numerical results for that particular trial. Conclusions The proposed approach, using the KDSS' knowledge base, provides a novel workflow that gives the best results with regard to the other workflows produced by the system. This workflow and its numeric results have been compared with other approaches about PPI network analysis found in literature, offering similar results. PMID:23368995
Enhancing and Customizing Laboratory Information Systems to Improve/Enhance Pathologist Workflow.

PubMed

Hartman, Douglas J

2015-06-01

Optimizing pathologist workflow can be difficult because it is affected by many variables. Surgical pathologists must complete many tasks that culminate in a final pathology report. Several software systems can be used to enhance/improve pathologist workflow. These include voice recognition software, pre-sign-out quality assurance, image utilization, and computerized provider order entry. Recent changes in the diagnostic coding and the more prominent role of centralized electronic health records represent potential areas for increased ways to enhance/improve the workflow for surgical pathologists. Additional unforeseen changes to the pathologist workflow may accompany the introduction of whole-slide imaging technology to the routine diagnostic work. Copyright © 2015 Elsevier Inc. All rights reserved.
Enhancing and Customizing Laboratory Information Systems to Improve/Enhance Pathologist Workflow.

PubMed

Hartman, Douglas J

2016-03-01

Optimizing pathologist workflow can be difficult because it is affected by many variables. Surgical pathologists must complete many tasks that culminate in a final pathology report. Several software systems can be used to enhance/improve pathologist workflow. These include voice recognition software, pre-sign-out quality assurance, image utilization, and computerized provider order entry. Recent changes in the diagnostic coding and the more prominent role of centralized electronic health records represent potential areas for increased ways to enhance/improve the workflow for surgical pathologists. Additional unforeseen changes to the pathologist workflow may accompany the introduction of whole-slide imaging technology to the routine diagnostic work. Copyright © 2016 Elsevier Inc. All rights reserved.
Open innovation: Towards sharing of data, models and workflows.

PubMed

Conrado, Daniela J; Karlsson, Mats O; Romero, Klaus; Sarr, Céline; Wilkins, Justin J

2017-11-15

Sharing of resources across organisations to support open innovation is an old idea, but which is being taken up by the scientific community at increasing speed, concerning public sharing in particular. The ability to address new questions or provide more precise answers to old questions through merged information is among the attractive features of sharing. Increased efficiency through reuse, and increased reliability of scientific findings through enhanced transparency, are expected outcomes from sharing. In the field of pharmacometrics, efforts to publicly share data, models and workflow have recently started. Sharing of individual-level longitudinal data for modelling requires solving legal, ethical and proprietary issues similar to many other fields, but there are also pharmacometric-specific aspects regarding data formats, exchange standards, and database properties. Several organisations (CDISC, C-Path, IMI, ISoP) are working to solve these issues and propose standards. There are also a number of initiatives aimed at collecting disease-specific databases - Alzheimer's Disease (ADNI, CAMD), malaria (WWARN), oncology (PDS), Parkinson's Disease (PPMI), tuberculosis (CPTR, TB-PACTS, ReSeqTB) - suitable for drug-disease modelling. Organized sharing of pharmacometric executable model code and associated information has in the past been sparse, but a model repository (DDMoRe Model Repository) intended for the purpose has recently been launched. In addition several other services can facilitate model sharing more generally. Pharmacometric workflows have matured over the last decades and initiatives to more fully capture those applied to analyses are ongoing. In order to maximize both the impact of pharmacometrics and the knowledge extracted from clinical data, the scientific community needs to take ownership of and create opportunities for open innovation. Copyright © 2017 Elsevier B.V. All rights reserved.
76 FR 71928 - Defense Federal Acquisition Regulation Supplement; Updates to Wide Area WorkFlow (DFARS Case 2011...

Federal Register 2010, 2011, 2012, 2013, 2014

2011-11-21

... Defense Federal Acquisition Regulation Supplement; Updates to Wide Area WorkFlow (DFARS Case 2011-D027... Wide Area WorkFlow (WAWF) and TRICARE Encounter Data System (TEDS). WAWF, which electronically... civil emergencies, when access to Wide Area WorkFlow by those contractors is not feasible; (4) Purchases...
Flexible Early Warning Systems with Workflows and Decision Tables

NASA Astrophysics Data System (ADS)

Riedel, F.; Chaves, F.; Zeiner, H.

2012-04-01

An essential part of early warning systems and systems for crisis management are decision support systems that facilitate communication and collaboration. Often official policies specify how different organizations collaborate and what information is communicated to whom. For early warning systems it is crucial that information is exchanged dynamically in a timely manner and all participants get exactly the information they need to fulfil their role in the crisis management process. Information technology obviously lends itself to automate parts of the process. We have experienced however that in current operational systems the information logistics processes are hard-coded, even though they are subject to change. In addition, systems are tailored to the policies and requirements of a certain organization and changes can require major software refactoring. We seek to develop a system that can be deployed and adapted to multiple organizations with different dynamic runtime policies. A major requirement for such a system is that changes can be applied locally without affecting larger parts of the system. In addition to the flexibility regarding changes in policies and processes, the system needs to be able to evolve; when new information sources become available, it should be possible to integrate and use these in the decision process. In general, this kind of flexibility comes with a significant increase in complexity. This implies that only IT professionals can maintain a system that can be reconfigured and adapted; end-users are unable to utilise the provided flexibility. In the business world similar problems arise and previous work suggested using business process management systems (BPMS) or workflow management systems (WfMS) to guide and automate early warning processes or crisis management plans. However, the usability and flexibility of current WfMS are limited, because current notations and user interfaces are still not suitable for end-users, and workflows are usually only suited for rigid processes. We show how improvements can be achieved by using decision tables and rule-based adaptive workflows. Decision tables have been shown to be an intuitive tool that can be used by domain experts to express rule sets that can be interpreted automatically at runtime. Adaptive workflows use a rule-based approach to increase the flexibility of workflows by providing mechanisms to adapt workflows based on context changes, human intervention and availability of services. The combination of workflows, decision tables and rule-based adaption creates a framework that opens up new possibilities for flexible and adaptable workflows, especially, for use in early warning and crisis management systems.
Large scale and cloud-based multi-model analytics experiments on climate change data in the Earth System Grid Federation

NASA Astrophysics Data System (ADS)

Fiore, Sandro; Płóciennik, Marcin; Doutriaux, Charles; Blanquer, Ignacio; Barbera, Roberto; Donvito, Giacinto; Williams, Dean N.; Anantharaj, Valentine; Salomoni, Davide D.; Aloisio, Giovanni

2017-04-01

In many scientific domains such as climate, data is often n-dimensional and requires tools that support specialized data types and primitives to be properly stored, accessed, analysed and visualized. Moreover, new challenges arise in large-scale scenarios and eco-systems where petabytes (PB) of data can be available and data can be distributed and/or replicated, such as the Earth System Grid Federation (ESGF) serving the Coupled Model Intercomparison Project, Phase 5 (CMIP5) experiment, providing access to 2.5PB of data for the Intergovernmental Panel on Climate Change (IPCC) Fifth Assessment Report (AR5). A case study on climate models intercomparison data analysis addressing several classes of multi-model experiments is being implemented in the context of the EU H2020 INDIGO-DataCloud project. Such experiments require the availability of large amount of data (multi-terabyte order) related to the output of several climate models simulations as well as the exploitation of scientific data management tools for large-scale data analytics. More specifically, the talk discusses in detail a use case on precipitation trend analysis in terms of requirements, architectural design solution, and infrastructural implementation. The experiment has been tested and validated on CMIP5 datasets, in the context of a large scale distributed testbed across EU and US involving three ESGF sites (LLNL, ORNL, and CMCC) and one central orchestrator site (PSNC). The general "environment" of the case study relates to: (i) multi-model data analysis inter-comparison challenges; (ii) addressed on CMIP5 data; and (iii) which are made available through the IS-ENES/ESGF infrastructure. The added value of the solution proposed in the INDIGO-DataCloud project are summarized in the following: (i) it implements a different paradigm (from client- to server-side); (ii) it intrinsically reduces data movement; (iii) it makes lightweight the end-user setup; (iv) it fosters re-usability (of data, final/intermediate products, workflows, sessions, etc.) since everything is managed on the server-side; (v) it complements, extends and interoperates with the ESGF stack; (vi) it provides a "tool" for scientists to run multi-model experiments, and finally; and (vii) it can drastically reduce the time-to-solution for these experiments from weeks to hours. At the time the contribution is being written, the proposed testbed represents the first concrete implementation of a distributed multi-model experiment in the ESGF/CMIP context joining server-side and parallel processing, end-to-end workflow management and cloud computing. As opposed to the current scenario based on search & discovery, data download, and client-based data analysis, the INDIGO-DataCloud architectural solution described in this contribution addresses the scientific computing & analytics requirements by providing a paradigm shift based on server-side and high performance big data frameworks jointly with two-level workflow management systems realized at the PaaS level via a cloud infrastructure.
Opportunistic Computing with Lobster: Lessons Learned from Scaling up to 25k Non-Dedicated Cores

NASA Astrophysics Data System (ADS)

Wolf, Matthias; Woodard, Anna; Li, Wenzhao; Hurtado Anampa, Kenyi; Yannakopoulos, Anna; Tovar, Benjamin; Donnelly, Patrick; Brenner, Paul; Lannon, Kevin; Hildreth, Mike; Thain, Douglas

2017-10-01

We previously described Lobster, a workflow management tool for exploiting volatile opportunistic computing resources for computation in HEP. We will discuss the various challenges that have been encountered while scaling up the simultaneous CPU core utilization and the software improvements required to overcome these challenges. Categories: Workflows can now be divided into categories based on their required system resources. This allows the batch queueing system to optimize assignment of tasks to nodes with the appropriate capabilities. Within each category, limits can be specified for the number of running jobs to regulate the utilization of communication bandwidth. System resource specifications for a task category can now be modified while a project is running, avoiding the need to restart the project if resource requirements differ from the initial estimates. Lobster now implements time limits on each task category to voluntarily terminate tasks. This allows partially completed work to be recovered. Workflow dependency specification: One workflow often requires data from other workflows as input. Rather than waiting for earlier workflows to be completed before beginning later ones, Lobster now allows dependent tasks to begin as soon as sufficient input data has accumulated. Resource monitoring: Lobster utilizes a new capability in Work Queue to monitor the system resources each task requires in order to identify bottlenecks and optimally assign tasks. The capability of the Lobster opportunistic workflow management system for HEP computation has been significantly increased. We have demonstrated efficient utilization of 25 000 non-dedicated cores and achieved a data input rate of 30 Gb/s and an output rate of 500GB/h. This has required new capabilities in task categorization, workflow dependency specification, and resource monitoring.
Inferring Clinical Workflow Efficiency via Electronic Medical Record Utilization

PubMed Central

Chen, You; Xie, Wei; Gunter, Carl A; Liebovitz, David; Mehrotra, Sanjay; Zhang, He; Malin, Bradley

2015-01-01

Complexity in clinical workflows can lead to inefficiency in making diagnoses, ineffectiveness of treatment plans and uninformed management of healthcare organizations (HCOs). Traditional strategies to manage workflow complexity are based on measuring the gaps between workflows defined by HCO administrators and the actual processes followed by staff in the clinic. However, existing methods tend to neglect the influences of EMR systems on the utilization of workflows, which could be leveraged to optimize workflows facilitated through the EMR. In this paper, we introduce a framework to infer clinical workflows through the utilization of an EMR and show how such workflows roughly partition into four types according to their efficiency. Our framework infers workflows at several levels of granularity through data mining technologies. We study four months of EMR event logs from a large medical center, including 16,569 inpatient stays, and illustrate that over approximately 95% of workflows are efficient and that 80% of patients are on such workflows. At the same time, we show that the remaining 5% of workflows may be inefficient due to a variety of factors, such as complex patients. PMID:26958173
Innovations in Medication Preparation Safety and Wastage Reduction: Use of a Workflow Management System in a Pediatric Hospital.

PubMed

Davis, Stephen Jerome; Hurtado, Josephine; Nguyen, Rosemary; Huynh, Tran; Lindon, Ivan; Hudnall, Cedric; Bork, Sara

2017-01-01

Background: USP <797> regulatory requirements have mandated that pharmacies improve aseptic techniques and cleanliness of the medication preparation areas. In addition, the Institute for Safe Medication Practices (ISMP) recommends that technology and automation be used as much as possible for preparing and verifying compounded sterile products. Objective: To determine the benefits associated with the implementation of the workflow management system, such as reducing medication preparation and delivery errors, reducing quantity and frequency of medication errors, avoiding costs, and enhancing the organization's decision to move toward positive patient identification (PPID). Methods: At Texas Children's Hospital, data were collected and analyzed from January 2014 through August 2014 in the pharmacy areas in which the workflow management system would be implemented. Data were excluded for September 2014 during the workflow management system oral liquid implementation phase. Data were collected and analyzed from October 2014 through June 2015 to determine whether the implementation of the workflow management system reduced the quantity and frequency of reported medication errors. Data collected and analyzed during the study period included the quantity of doses prepared, number of incorrect medication scans, number of doses discontinued from the workflow management system queue, and the number of doses rejected. Data were collected and analyzed to identify patterns of incorrect medication scans, to determine reasons for rejected medication doses, and to determine the reduction in wasted medications. Results: During the 17-month study period, the pharmacy department dispensed 1,506,220 oral liquid and injectable medication doses. From October 2014 through June 2015, the pharmacy department dispensed 826,220 medication doses that were prepared and checked via the workflow management system. Of those 826,220 medication doses, there were 16 reported incorrect volume errors. The error rate after the implementation of the workflow management system averaged 8.4%, which was a 1.6% reduction. After the implementation of the workflow management system, the average number of reported oral liquid medication and injectable medication errors decreased to 0.4 and 0.2 times per week, respectively. Conclusion: The organization was able to achieve its purpose and goal of improving the provision of quality pharmacy care through optimal medication use and safety by reducing medication preparation errors. Error rates decreased and the workflow processes were streamlined, which has led to seamless operations within the pharmacy department. There has been significant cost avoidance and waste reduction and enhanced interdepartmental satisfaction due to the reduction of reported medication errors.
A Six‐Stage Workflow for Robust Application of Systems Pharmacology

PubMed Central

Gadkar, K; Kirouac, DC; Mager, DE; van der Graaf, PH

2016-01-01

Quantitative and systems pharmacology (QSP) is increasingly being applied in pharmaceutical research and development. One factor critical to the ultimate success of QSP is the establishment of commonly accepted language, technical criteria, and workflows. We propose an integrated workflow that bridges conceptual objectives with underlying technical detail to support the execution, communication, and evaluation of QSP projects. PMID:27299936
Managing the life cycle of electronic clinical documents.

PubMed

Payne, Thomas H; Graham, Gail

2006-01-01

To develop a model of the life cycle of clinical documents from inception to use in a person's medical record, including workflow requirements from clinical practice, local policy, and regulation. We propose a model for the life cycle of clinical documents as a framework for research on documentation within electronic medical record (EMR) systems. Our proposed model includes three axes: the stages of the document, the roles of those involved with the document, and the actions those involved may take on the document at each stage. The model includes the rules to describe who (in what role) can perform what actions on the document, and at what stages they can perform them. Rules are derived from needs of clinicians, and requirements of hospital bylaws and regulators. Our model encompasses current practices for paper medical records and workflow in some EMR systems. Commercial EMR systems include methods for implementing document workflow rules. Workflow rules that are part of this model mirror functionality in the Department of Veterans Affairs (VA) EMR system where the Authorization/ Subscription Utility permits document life cycle rules to be written in English-like fashion. Creating a model of the life cycle of clinical documents serves as a framework for discussion of document workflow, how rules governing workflow can be implemented in EMR systems, and future research of electronic documentation.
Analog to digital workflow improvement: a quantitative study.

PubMed

Wideman, Catherine; Gallet, Jacqueline

2006-01-01

This study tracked a radiology department's conversion from utilization of a Kodak Amber analog system to a Kodak DirectView DR 5100 digital system. Through the use of ProModel Optimization Suite, a workflow simulation software package, significant quantitative information was derived from workflow process data measured before and after the change to a digital system. Once the digital room was fully operational and the radiology staff comfortable with the new system, average patient examination time was reduced from 9.24 to 5.28 min, indicating that a higher patient throughput could be achieved. Compared to the analog system, chest examination time for modality specific activities was reduced by 43%. The percentage of repeat examinations experienced with the digital system also decreased to 8% vs. the level of 9.5% experienced with the analog system. The study indicated that it is possible to quantitatively study clinical workflow and productivity by using commercially available software.
Text mining meets workflow: linking U-Compare with Taverna

PubMed Central

Kano, Yoshinobu; Dobson, Paul; Nakanishi, Mio; Tsujii, Jun'ichi; Ananiadou, Sophia

2010-01-01

Summary: Text mining from the biomedical literature is of increasing importance, yet it is not easy for the bioinformatics community to create and run text mining workflows due to the lack of accessibility and interoperability of the text mining resources. The U-Compare system provides a wide range of bio text mining resources in a highly interoperable workflow environment where workflows can very easily be created, executed, evaluated and visualized without coding. We have linked U-Compare to Taverna, a generic workflow system, to expose text mining functionality to the bioinformatics community. Availability: http://u-compare.org/taverna.html, http://u-compare.org Contact: kano@is.s.u-tokyo.ac.jp Supplementary information: Supplementary data are available at Bioinformatics online. PMID:20709690
A framework for service enterprise workflow simulation with multi-agents cooperation

NASA Astrophysics Data System (ADS)

Tan, Wenan; Xu, Wei; Yang, Fujun; Xu, Lida; Jiang, Chuanqun

2013-11-01

Process dynamic modelling for service business is the key technique for Service-Oriented information systems and service business management, and the workflow model of business processes is the core part of service systems. Service business workflow simulation is the prevalent approach to be used for analysis of service business process dynamically. Generic method for service business workflow simulation is based on the discrete event queuing theory, which is lack of flexibility and scalability. In this paper, we propose a service workflow-oriented framework for the process simulation of service businesses using multi-agent cooperation to address the above issues. Social rationality of agent is introduced into the proposed framework. Adopting rationality as one social factor for decision-making strategies, a flexible scheduling for activity instances has been implemented. A system prototype has been developed to validate the proposed simulation framework through a business case study.

RESTFul based heterogeneous Geoprocessing workflow interoperation for Sensor Web Service

NASA Astrophysics Data System (ADS)

Yang, Chao; Chen, Nengcheng; Di, Liping

2012-10-01

Advanced sensors on board satellites offer detailed Earth observations. A workflow is one approach for designing, implementing and constructing a flexible and live link between these sensors' resources and users. It can coordinate, organize and aggregate the distributed sensor Web services to meet the requirement of a complex Earth observation scenario. A RESTFul based workflow interoperation method is proposed to integrate heterogeneous workflows into an interoperable unit. The Atom protocols are applied to describe and manage workflow resources. The XML Process Definition Language (XPDL) and Business Process Execution Language (BPEL) workflow standards are applied to structure a workflow that accesses sensor information and one that processes it separately. Then, a scenario for nitrogen dioxide (NO2) from a volcanic eruption is used to investigate the feasibility of the proposed method. The RESTFul based workflows interoperation system can describe, publish, discover, access and coordinate heterogeneous Geoprocessing workflows.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Aurich, Maike K.; Fleming, Ronan M. T.; Thiele, Ines

Metabolomic data sets provide a direct read-out of cellular phenotypes and are increasingly generated to study biological questions. Previous work, by us and others, revealed the potential of analyzing extracellular metabolomic data in the context of the metabolic model using constraint-based modeling. With the MetaboTools, we make our methods available to the broader scientific community. The MetaboTools consist of a protocol, a toolbox, and tutorials of two use cases. The protocol describes, in a step-wise manner, the workflow of data integration, and computational analysis. The MetaboTools comprise the Matlab code required to complete the workflow described in the protocol. Tutorialsmore » explain the computational steps for integration of two different data sets and demonstrate a comprehensive set of methods for the computational analysis of metabolic models and stratification thereof into different phenotypes. The presented workflow supports integrative analysis of multiple omics data sets. Importantly, all analysis tools can be applied to metabolic models without performing the entire workflow. Taken together, the MetaboTools constitute a comprehensive guide to the intra-model analysis of extracellular metabolomic data from microbial, plant, or human cells. In conclusion, this computational modeling resource offers a broad set of computational analysis tools for a wide biomedical and non-biomedical research community.« less
Using location tracking data to assess efficiency in established clinical workflows.

PubMed

Meyer, Mark; Fairbrother, Pamela; Egan, Marie; Chueh, Henry; Sandberg, Warren S

2006-01-01

Location tracking systems are becoming more prevalent in clinical settings yet applications still are not common. We have designed a system to aid in the assessment of clinical workflow efficiency. Location data is captured from active RFID tags and processed into usable data. These data are stored and presented visually with trending capability over time. The system allows quick assessments of the impact of process changes on workflow, and isolates areas for improvement.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Lingerfelt, Eric J; Endeve, Eirik; Hui, Yawei

Improvements in scientific instrumentation allow imaging at mesoscopic to atomic length scales, many spectroscopic modes, and now--with the rise of multimodal acquisition systems and the associated processing capability--the era of multidimensional, informationally dense data sets has arrived. Technical issues in these combinatorial scientific fields are exacerbated by computational challenges best summarized as a necessity for drastic improvement in the capability to transfer, store, and analyze large volumes of data. The Bellerophon Environment for Analysis of Materials (BEAM) platform provides material scientists the capability to directly leverage the integrated computational and analytical power of High Performance Computing (HPC) to perform scalablemore » data analysis and simulation and manage uploaded data files via an intuitive, cross-platform client user interface. This framework delivers authenticated, "push-button" execution of complex user workflows that deploy data analysis algorithms and computational simulations utilizing compute-and-data cloud infrastructures and HPC environments like Titan at the Oak Ridge Leadershp Computing Facility (OLCF).« less
A framework for integration of scientific applications into the OpenTopography workflow

NASA Astrophysics Data System (ADS)

Nandigam, V.; Crosby, C.; Baru, C.

2012-12-01

The NSF-funded OpenTopography facility provides online access to Earth science-oriented high-resolution LIDAR topography data, online processing tools, and derivative products. The underlying cyberinfrastructure employs a multi-tier service oriented architecture that is comprised of an infrastructure tier, a processing services tier, and an application tier. The infrastructure tier consists of storage, compute resources as well as supporting databases. The services tier consists of the set of processing routines each deployed as a Web service. The applications tier provides client interfaces to the system. (e.g. Portal). We propose a "pluggable" infrastructure design that will allow new scientific algorithms and processing routines developed and maintained by the community to be integrated into the OpenTopography system so that the wider earth science community can benefit from its availability. All core components in OpenTopography are available as Web services using a customized open-source Opal toolkit. The Opal toolkit provides mechanisms to manage and track job submissions, with the help of a back-end database. It allows monitoring of job and system status by providing charting tools. All core components in OpenTopography have been developed, maintained and wrapped as Web services using Opal by OpenTopography developers. However, as the scientific community develops new processing and analysis approaches this integration approach is not scalable efficiently. Most of the new scientific applications will have their own active development teams performing regular updates, maintenance and other improvements. It would be optimal to have the application co-located where its developers can continue to actively work on it while still making it accessible within the OpenTopography workflow for processing capabilities. We will utilize a software framework for remote integration of these scientific applications into the OpenTopography system. This will be accomplished by virtually extending the OpenTopography service over the various infrastructures running these scientific applications and processing routines. This involves packaging and distributing a customized instance of the Opal toolkit that will wrap the software application as an OPAL-based web service and integrate it into the OpenTopography framework. We plan to make this as automated as possible. A structured specification of service inputs and outputs along with metadata annotations encoded in XML can be utilized to automate the generation of user interfaces, with appropriate tools tips and user help features, and generation of other internal software. The OpenTopography Opal toolkit will also include the customizations that will enable security authentication, authorization and the ability to write application usage and job statistics back to the OpenTopography databases. This usage information could then be reported to the original service providers and used for auditing and performance improvements. This pluggable framework will enable the application developers to continue to work on enhancing their application while making the latest iteration available in a timely manner to the earth sciences community. This will also help us establish an overall framework that other scientific application providers will also be able to use going forward.
Prototype of Kepler Processing Workflows For Microscopy And Neuroinformatics

PubMed Central

Astakhov, V.; Bandrowski, A.; Gupta, A.; Kulungowski, A.W.; Grethe, J.S.; Bouwer, J.; Molina, T.; Rowley, V.; Penticoff, S.; Terada, M.; Wong, W.; Hakozaki, H.; Kwon, O.; Martone, M.E.; Ellisman, M.

2016-01-01

We report on progress of employing the Kepler workflow engine to prototype “end-to-end” application integration workflows that concern data coming from microscopes deployed at the National Center for Microscopy Imaging Research (NCMIR). This system is built upon the mature code base of the Cell Centered Database (CCDB) and integrated rule-oriented data system (IRODS) for distributed storage. It provides integration with external projects such as the Whole Brain Catalog (WBC) and Neuroscience Information Framework (NIF), which benefit from NCMIR data. We also report on specific workflows which spawn from main workflows and perform data fusion and orchestration of Web services specific for the NIF project. This “Brain data flow” presents a user with categorized information about sources that have information on various brain regions. PMID:28479932
Workflow technology: the new frontier. How to overcome the barriers and join the future.

PubMed

Shefter, Susan M

2006-01-01

Hospitals are catching up to the business world in the introduction of technology systems that support professional practice and workflow. The field of case management is highly complex and interrelates with diverse groups in diverse locations. The last few years have seen the introduction of Workflow Technology Tools, which can improve the quality and efficiency of discharge planning by the case manager. Despite the availability of these wonderful new programs, many case managers are hesitant to adopt the new technology and workflow. For a myriad of reasons, a computer-based workflow system can seem like a brick wall. This article discusses, from a practitioner's point of view, how professionals can gain confidence and skill to get around the brick wall and join the future.
DIANA-microT web server v5.0: service integration into miRNA functional analysis workflows.

PubMed

Paraskevopoulou, Maria D; Georgakilas, Georgios; Kostoulas, Nikos; Vlachos, Ioannis S; Vergoulis, Thanasis; Reczko, Martin; Filippidis, Christos; Dalamagas, Theodore; Hatzigeorgiou, A G

2013-07-01

MicroRNAs (miRNAs) are small endogenous RNA molecules that regulate gene expression through mRNA degradation and/or translation repression, affecting many biological processes. DIANA-microT web server (http://www.microrna.gr/webServer) is dedicated to miRNA target prediction/functional analysis, and it is being widely used from the scientific community, since its initial launch in 2009. DIANA-microT v5.0, the new version of the microT server, has been significantly enhanced with an improved target prediction algorithm, DIANA-microT-CDS. It has been updated to incorporate miRBase version 18 and Ensembl version 69. The in silico-predicted miRNA-gene interactions in Homo sapiens, Mus musculus, Drosophila melanogaster and Caenorhabditis elegans exceed 11 million in total. The web server was completely redesigned, to host a series of sophisticated workflows, which can be used directly from the on-line web interface, enabling users without the necessary bioinformatics infrastructure to perform advanced multi-step functional miRNA analyses. For instance, one available pipeline performs miRNA target prediction using different thresholds and meta-analysis statistics, followed by pathway enrichment analysis. DIANA-microT web server v5.0 also supports a complete integration with the Taverna Workflow Management System (WMS), using the in-house developed DIANA-Taverna Plug-in. This plug-in provides ready-to-use modules for miRNA target prediction and functional analysis, which can be used to form advanced high-throughput analysis pipelines.
DIANA-microT web server v5.0: service integration into miRNA functional analysis workflows

PubMed Central

Paraskevopoulou, Maria D.; Georgakilas, Georgios; Kostoulas, Nikos; Vlachos, Ioannis S.; Vergoulis, Thanasis; Reczko, Martin; Filippidis, Christos; Dalamagas, Theodore; Hatzigeorgiou, A.G.

2013-01-01

MicroRNAs (miRNAs) are small endogenous RNA molecules that regulate gene expression through mRNA degradation and/or translation repression, affecting many biological processes. DIANA-microT web server (http://www.microrna.gr/webServer) is dedicated to miRNA target prediction/functional analysis, and it is being widely used from the scientific community, since its initial launch in 2009. DIANA-microT v5.0, the new version of the microT server, has been significantly enhanced with an improved target prediction algorithm, DIANA-microT-CDS. It has been updated to incorporate miRBase version 18 and Ensembl version 69. The in silico-predicted miRNA–gene interactions in Homo sapiens, Mus musculus, Drosophila melanogaster and Caenorhabditis elegans exceed 11 million in total. The web server was completely redesigned, to host a series of sophisticated workflows, which can be used directly from the on-line web interface, enabling users without the necessary bioinformatics infrastructure to perform advanced multi-step functional miRNA analyses. For instance, one available pipeline performs miRNA target prediction using different thresholds and meta-analysis statistics, followed by pathway enrichment analysis. DIANA-microT web server v5.0 also supports a complete integration with the Taverna Workflow Management System (WMS), using the in-house developed DIANA-Taverna Plug-in. This plug-in provides ready-to-use modules for miRNA target prediction and functional analysis, which can be used to form advanced high-throughput analysis pipelines. PMID:23680784
Managing large-scale workflow execution from resource provisioning to provenance tracking: The CyberShake example

USGS Publications Warehouse

Deelman, E.; Callaghan, S.; Field, E.; Francoeur, H.; Graves, R.; Gupta, N.; Gupta, V.; Jordan, T.H.; Kesselman, C.; Maechling, P.; Mehringer, J.; Mehta, G.; Okaya, D.; Vahi, K.; Zhao, L.

2006-01-01

This paper discusses the process of building an environment where large-scale, complex, scientific analysis can be scheduled onto a heterogeneous collection of computational and storage resources. The example application is the Southern California Earthquake Center (SCEC) CyberShake project, an analysis designed to compute probabilistic seismic hazard curves for sites in the Los Angeles area. We explain which software tools were used to build to the system, describe their functionality and interactions. We show the results of running the CyberShake analysis that included over 250,000 jobs using resources available through SCEC and the TeraGrid. ?? 2006 IEEE.
An Auto-management Thesis Program WebMIS Based on Workflow

NASA Astrophysics Data System (ADS)

Chang, Li; Jie, Shi; Weibo, Zhong

An auto-management WebMIS based on workflow for bachelor thesis program is given in this paper. A module used for workflow dispatching is designed and realized using MySQL and J2EE according to the work principle of workflow engine. The module can automatively dispatch the workflow according to the date of system, login information and the work status of the user. The WebMIS changes the management from handwork to computer-work which not only standardizes the thesis program but also keeps the data and documents clean and consistent.
Optimization of tomographic reconstruction workflows on geographically distributed resources

DOE PAGES

Bicer, Tekin; Gursoy, Doga; Kettimuthu, Rajkumar; ...

2016-01-01

New technological advancements in synchrotron light sources enable data acquisitions at unprecedented levels. This emergent trend affects not only the size of the generated data but also the need for larger computational resources. Although beamline scientists and users have access to local computational resources, these are typically limited and can result in extended execution times. Applications that are based on iterative processing as in tomographic reconstruction methods require high-performance compute clusters for timely analysis of data. Here, time-sensitive analysis and processing of Advanced Photon Source data on geographically distributed resources are focused on. Two main challenges are considered: (i) modelingmore » of the performance of tomographic reconstruction workflows and (ii) transparent execution of these workflows on distributed resources. For the former, three main stages are considered: (i) data transfer between storage and computational resources, (i) wait/queue time of reconstruction jobs at compute resources, and (iii) computation of reconstruction tasks. These performance models allow evaluation and estimation of the execution time of any given iterative tomographic reconstruction workflow that runs on geographically distributed resources. For the latter challenge, a workflow management system is built, which can automate the execution of workflows and minimize the user interaction with the underlying infrastructure. The system utilizes Globus to perform secure and efficient data transfer operations. The proposed models and the workflow management system are evaluated by using three high-performance computing and two storage resources, all of which are geographically distributed. Workflows were created with different computational requirements using two compute-intensive tomographic reconstruction algorithms. Experimental evaluation shows that the proposed models and system can be used for selecting the optimum resources, which in turn can provide up to 3.13× speedup (on experimented resources). Furthermore, the error rates of the models range between 2.1 and 23.3% (considering workflow execution times), where the accuracy of the model estimations increases with higher computational demands in reconstruction tasks.« less
Optimization of tomographic reconstruction workflows on geographically distributed resources

PubMed Central

Bicer, Tekin; Gürsoy, Doǧa; Kettimuthu, Rajkumar; De Carlo, Francesco; Foster, Ian T.

2016-01-01

New technological advancements in synchrotron light sources enable data acquisitions at unprecedented levels. This emergent trend affects not only the size of the generated data but also the need for larger computational resources. Although beamline scientists and users have access to local computational resources, these are typically limited and can result in extended execution times. Applications that are based on iterative processing as in tomographic reconstruction methods require high-performance compute clusters for timely analysis of data. Here, time-sensitive analysis and processing of Advanced Photon Source data on geographically distributed resources are focused on. Two main challenges are considered: (i) modeling of the performance of tomographic reconstruction workflows and (ii) transparent execution of these workflows on distributed resources. For the former, three main stages are considered: (i) data transfer between storage and computational resources, (i) wait/queue time of reconstruction jobs at compute resources, and (iii) computation of reconstruction tasks. These performance models allow evaluation and estimation of the execution time of any given iterative tomographic reconstruction workflow that runs on geographically distributed resources. For the latter challenge, a workflow management system is built, which can automate the execution of workflows and minimize the user interaction with the underlying infrastructure. The system utilizes Globus to perform secure and efficient data transfer operations. The proposed models and the workflow management system are evaluated by using three high-performance computing and two storage resources, all of which are geographically distributed. Workflows were created with different computational requirements using two compute-intensive tomographic reconstruction algorithms. Experimental evaluation shows that the proposed models and system can be used for selecting the optimum resources, which in turn can provide up to 3.13× speedup (on experimented resources). Moreover, the error rates of the models range between 2.1 and 23.3% (considering workflow execution times), where the accuracy of the model estimations increases with higher computational demands in reconstruction tasks. PMID:27359149
Optimization of tomographic reconstruction workflows on geographically distributed resources

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bicer, Tekin; Gursoy, Doga; Kettimuthu, Rajkumar

New technological advancements in synchrotron light sources enable data acquisitions at unprecedented levels. This emergent trend affects not only the size of the generated data but also the need for larger computational resources. Although beamline scientists and users have access to local computational resources, these are typically limited and can result in extended execution times. Applications that are based on iterative processing as in tomographic reconstruction methods require high-performance compute clusters for timely analysis of data. Here, time-sensitive analysis and processing of Advanced Photon Source data on geographically distributed resources are focused on. Two main challenges are considered: (i) modelingmore » of the performance of tomographic reconstruction workflows and (ii) transparent execution of these workflows on distributed resources. For the former, three main stages are considered: (i) data transfer between storage and computational resources, (i) wait/queue time of reconstruction jobs at compute resources, and (iii) computation of reconstruction tasks. These performance models allow evaluation and estimation of the execution time of any given iterative tomographic reconstruction workflow that runs on geographically distributed resources. For the latter challenge, a workflow management system is built, which can automate the execution of workflows and minimize the user interaction with the underlying infrastructure. The system utilizes Globus to perform secure and efficient data transfer operations. The proposed models and the workflow management system are evaluated by using three high-performance computing and two storage resources, all of which are geographically distributed. Workflows were created with different computational requirements using two compute-intensive tomographic reconstruction algorithms. Experimental evaluation shows that the proposed models and system can be used for selecting the optimum resources, which in turn can provide up to 3.13× speedup (on experimented resources). Furthermore, the error rates of the models range between 2.1 and 23.3% (considering workflow execution times), where the accuracy of the model estimations increases with higher computational demands in reconstruction tasks.« less
The Cherenkov Telescope Array Observatory: top level use cases

NASA Astrophysics Data System (ADS)

Bulgarelli, A.; Kosack, K.; Hinton, J.; Tosti, G.; Schwanke, U.; Schwarz, J.; Colomé, P.; Conforti, V.; Khelifi, B.; Goullon, J.; Ong, R.; Markoff, S.; Contreras, J. L.; Lucarelli, F.; Antonelli, L. A.; Bigongiari, C.; Boisson, C.; Bosnjak, Z.; Brau-Nogué, S.; Carosi, A.; Chen, A.; Cotter, G.; Covino, S.; Daniel, M.; De Cesare, G.; de Ona Wilhelmi, E.; Della Volpe, M.; Di Pierro, F.; Fioretti, V.; Füßling, M.; Garczarczyk, M.; Gaug, M.; Glicenstein, J. F.; Goldoni, P.; Götz, D.; Grandi, P.; Heller, M.; Hermann, G.; Inoue, S.; Knödlseder, J.; Lenain, J.-P.; Lindfors, E.; Lombardi, S.; Luque-Escamilla, P.; Maier, G.; Marisaldi, M.; Mundell, C.; Neyroud, N.; Noda, K.; O'Brien, P.; Petrucci, P. O.; Martí Ribas, J.; Ribó, M.; Rodriguez, J.; Romano, P.; Schmid, J.; Serre, N.; Sol, H.; Schussler, F.; Stamerra, A.; Stolarczyk, T.; Vandenbrouck, J.; Vercellone, S.; Vergani, S.; Zech, A.; Zoli, A.

2016-08-01

Today the scientific community is facing an increasing complexity of the scientific projects, from both a technological and a management point of view. The reason for this is in the advance of science itself, where new experiments with unprecedented levels of accuracy, precision and coverage (time and spatial) are realised. Astronomy is one of the fields of the physical sciences where a strong interaction between the scientists, the instrument and software developers is necessary to achieve the goals of any Big Science Project. The Cherenkov Telescope Array (CTA) will be the largest ground-based very high-energy gamma-ray observatory of the next decades. To achieve the full potential of the CTA Observatory, the system must be put into place to enable users to operate the telescopes productively. The software will cover all stages of the CTA system, from the preparation of the observing proposals to the final data reduction, and must also fit into the overall system. Scientists, engineers, operators and others will use the system to operate the Observatory, hence they should be involved in the design process from the beginning. We have organised a workgroup and a workflow for the definition of the CTA Top Level Use Cases in the context of the Requirement Management activities of the CTA Observatory. Scientists, instrument and software developers are collaborating and sharing information to provide a common and general understanding of the Observatory from a functional point of view. Scientists that will use the CTA Observatory will provide mainly Science Driven Use Cases, whereas software engineers will subsequently provide more detailed Use Cases, comments and feedbacks. The main purposes are to define observing modes and strategies, and to provide a framework for the flow down of the Use Cases and requirements to check missing requirements and the already developed Use-Case models at CTA sub-system level. Use Cases will also provide the basis for the definition of the Acceptance Test Plan for the validation of the overall CTA system. In this contribution we present the organisation and the workflow of the Top Level Use Cases workgroup.
WRF4SG: A Scientific Gateway for climate experiment workflows

NASA Astrophysics Data System (ADS)

Blanco, Carlos; Cofino, Antonio S.; Fernandez-Quiruelas, Valvanuz

2013-04-01

The Weather Research and Forecasting model (WRF) is a community-driven and public domain model widely used by the weather and climate communities. As opposite to other application-oriented models, WRF provides a flexible and computationally-efficient framework which allows solving a variety of problems for different time-scales, from weather forecast to climate change projection. Furthermore, WRF is also widely used as a research tool in modeling physics, dynamics, and data assimilation by the research community. Climate experiment workflows based on Weather Research and Forecasting (WRF) are nowadays among the one of the most cutting-edge applications. These workflows are complex due to both large storage and the huge number of simulations executed. In order to manage that, we have developed a scientific gateway (SG) called WRF for Scientific Gateway (WRF4SG) based on WS-PGRADE/gUSE and WRF4G frameworks to ease achieve WRF users needs (see [1] and [2]). WRF4SG provides services for different use cases that describe the different interactions between WRF users and the WRF4SG interface in order to show how to run a climate experiment. As WS-PGRADE/gUSE uses portlets (see [1]) to interact with users, its portlets will support these use cases. A typical experiment to be carried on by a WRF user will consist on a high-resolution regional re-forecast. These re-forecasts are common experiments used as input data form wind power energy and natural hazards (wind and precipitation fields). In the cases below, the user is able to access to different resources such as Grid due to the fact that WRF needs a huge amount of computing resources in order to generate useful simulations: * Resource configuration and user authentication: The first step is to authenticate on users' Grid resources by virtual organizations. After login, the user is able to select which virtual organization is going to be used by the experiment. * Data assimilation: In order to assimilate the data sources, the user has to select them browsing through LFC Portlet. * Design Experiment workflow: In order to configure the experiment, the user will define the type of experiment (i.e. re-forecast), and its attributes to simulate. In this case the main attributes are: the field of interest (wind, precipitation, ...), the start and end date simulation and the requirements of the experiment. * Monitor workflow: In order to monitor the experiment the user will receive notification messages based on events and also the gateway will display the progress of the experiment. * Data storage: Like Data assimilation case, the user is able to browse and view the output data simulations using LFC Portlet. The objectives of WRF4SG can be described by considering two goals. The first goal is to show how WRF4SG facilitates to execute, monitor and manage climate workflows based on the WRF4G framework. And the second goal of WRF4SG is to help WRF users to execute their experiment workflows concurrently using heterogeneous computing resources such as HPC and Grid. [1] Kacsuk, P.: P-GRADE portal family for grid infrastructures. Concurrency and Computation: Practice and Experience. 23, 235-245 (2011). [2] http://www.meteo.unican.es/software/wrf4g
Design and implementation of land reservation system

NASA Astrophysics Data System (ADS)

Gao, Yurong; Gao, Qingqiang

2009-10-01

Land reservation is defined as a land management policy for insuring the government to control primary land market. It requires the government to obtain the land first, according to plan, by purchase, confiscation and exchanging, and then exploit and consolidate the land for reservation. Underlying this policy, it is possible for the government to satisfy and manipulate the needs of land for urban development. The author designs and develops "Land Reservation System for Eastern Lake Development District" (LRSELDD), which deals with the realistic land requirement problems in Wuhan Eastern Lake Development Districts. The LRSELDD utilizes modern technologies and solutions of computer science and GIS to process multiple source data related with land. Based on experiments on the system, this paper will first analyze workflow land reservation system and design the system structure based on its principles, then illustrate the approach of organization and management of spatial data, describe the system functions according to the characteristics of land reservation and consolidation finally. The system is running to serve for current work in Eastern Lake Development Districts. It is able to scientifically manage both current and planning land information, as well as the information about land supplying. We use the LRSELDD in our routine work, and with such information, decisions on land confiscation and allocation will be made wisely and scientifically.
Implementation of workflow engine technology to deliver basic clinical decision support functionality.

PubMed

Huser, Vojtech; Rasmussen, Luke V; Oberg, Ryan; Starren, Justin B

2011-04-10

Workflow engine technology represents a new class of software with the ability to graphically model step-based knowledge. We present application of this novel technology to the domain of clinical decision support. Successful implementation of decision support within an electronic health record (EHR) remains an unsolved research challenge. Previous research efforts were mostly based on healthcare-specific representation standards and execution engines and did not reach wide adoption. We focus on two challenges in decision support systems: the ability to test decision logic on retrospective data prior prospective deployment and the challenge of user-friendly representation of clinical logic. We present our implementation of a workflow engine technology that addresses the two above-described challenges in delivering clinical decision support. Our system is based on a cross-industry standard of XML (extensible markup language) process definition language (XPDL). The core components of the system are a workflow editor for modeling clinical scenarios and a workflow engine for execution of those scenarios. We demonstrate, with an open-source and publicly available workflow suite, that clinical decision support logic can be executed on retrospective data. The same flowchart-based representation can also function in a prospective mode where the system can be integrated with an EHR system and respond to real-time clinical events. We limit the scope of our implementation to decision support content generation (which can be EHR system vendor independent). We do not focus on supporting complex decision support content delivery mechanisms due to lack of standardization of EHR systems in this area. We present results of our evaluation of the flowchart-based graphical notation as well as architectural evaluation of our implementation using an established evaluation framework for clinical decision support architecture. We describe an implementation of a free workflow technology software suite (available at http://code.google.com/p/healthflow) and its application in the domain of clinical decision support. Our implementation seamlessly supports clinical logic testing on retrospective data and offers a user-friendly knowledge representation paradigm. With the presented software implementation, we demonstrate that workflow engine technology can provide a decision support platform which evaluates well against an established clinical decision support architecture evaluation framework. Due to cross-industry usage of workflow engine technology, we can expect significant future functionality enhancements that will further improve the technology's capacity to serve as a clinical decision support platform.
Data Integration Tool: From Permafrost Data Translation Research Tool to A Robust Research Application

NASA Astrophysics Data System (ADS)

Wilcox, H.; Schaefer, K. M.; Jafarov, E. E.; Strawhacker, C.; Pulsifer, P. L.; Thurmes, N.

2016-12-01

The United States National Science Foundation funded PermaData project led by the National Snow and Ice Data Center (NSIDC) with a team from the Global Terrestrial Network for Permafrost (GTN-P) aimed to improve permafrost data access and discovery. We developed a Data Integration Tool (DIT) to significantly speed up the time of manual processing needed to translate inconsistent, scattered historical permafrost data into files ready to ingest directly into the GTN-P. We leverage this data to support science research and policy decisions. DIT is a workflow manager that divides data preparation and analysis into a series of steps or operations called widgets. Each widget does a specific operation, such as read, multiply by a constant, sort, plot, and write data. DIT allows the user to select and order the widgets as desired to meet their specific needs. Originally it was written to capture a scientist's personal, iterative, data manipulation and quality control process of visually and programmatically iterating through inconsistent input data, examining it to find problems, adding operations to address the problems, and rerunning until the data could be translated into the GTN-P standard format. Iterative development of this tool led to a Fortran/Python hybrid then, with consideration of users, licensing, version control, packaging, and workflow, to a publically available, robust, usable application. Transitioning to Python allowed the use of open source frameworks for the workflow core and integration with a javascript graphical workflow interface. DIT is targeted to automatically handle 90% of the data processing for field scientists, modelers, and non-discipline scientists. It is available as an open source tool in GitHub packaged for a subset of Mac, Windows, and UNIX systems as a desktop application with a graphical workflow manager. DIT was used to completely translate one dataset (133 sites) that was successfully added to GTN-P, nearly translate three datasets (270 sites), and is scheduled to translate 10 more datasets ( 1000 sites) from the legacy inactive site data holdings of the Frozen Ground Data Center (FGDC). Iterative development has provided the permafrost and wider scientific community with an extendable tool designed specifically for the iterative process of translating unruly data.
A workflow to preserve genome-quality tissue samples from plants in botanical gardens and arboreta1

PubMed Central

Gostel, Morgan R.; Kelloff, Carol; Wallick, Kyle; Funk, Vicki A.

2016-01-01

Premise of the study: Internationally, gardens hold diverse living collections that can be preserved for genomic research. Workflows have been developed for genomic tissue sampling in other taxa (e.g., vertebrates), but are inadequate for plants. We outline a workflow for tissue sampling intended for two audiences: botanists interested in genomics research and garden staff who plan to voucher living collections. Methods and Results: Standard herbarium methods are used to collect vouchers, label information and images are entered into a publicly accessible database, and leaf tissue is preserved in silica and liquid nitrogen. A five-step approach for genomic tissue sampling is presented for sampling from living collections according to current best practices. Conclusions: Collecting genome-quality samples from gardens is an economical and rapid way to make available for scientific research tissue from the diversity of plants on Earth. The Global Genome Initiative will facilitate and lead this endeavor through international partnerships. PMID:27672517

A Semi-Automated Workflow Solution for Data Set Publication

DOE PAGES

Vannan, Suresh; Beaty, Tammy W.; Cook, Robert B.; ...

2016-03-08

In order to address the need for published data, considerable effort has gone into formalizing the process of data publication. From funding agencies to publishers, data publication has rapidly become a requirement. Digital Object Identifiers (DOI) and data citations have enhanced the integration and availability of data. The challenge facing data publishers now is to deal with the increased number of publishable data products and most importantly the difficulties of publishing diverse data products into an online archive. The Oak Ridge National Laboratory Distributed Active Archive Center (ORNL DAAC), a NASA-funded data center, faces these challenges as it deals withmore » data products created by individual investigators. This paper summarizes the challenges of curating data and provides a summary of a workflow solution that ORNL DAAC researcher and technical staffs have created to deal with publication of the diverse data products. Finally, the workflow solution presented here is generic and can be applied to data from any scientific domain and data located at any data center.« less
CASAS: A tool for composing automatically and semantically astrophysical services

NASA Astrophysics Data System (ADS)

Louge, T.; Karray, M. H.; Archimède, B.; Knödlseder, J.

2017-07-01

Multiple astronomical datasets are available through internet and the astrophysical Distributed Computing Infrastructure (DCI) called Virtual Observatory (VO). Some scientific workflow technologies exist for retrieving and combining data from those sources. However selection of relevant services, automation of the workflows composition and the lack of user-friendly platforms remain a concern. This paper presents CASAS, a tool for semantic web services composition in astrophysics. This tool proposes automatic composition of astrophysical web services and brings a semantics-based, automatic composition of workflows. It widens the services choice and eases the use of heterogeneous services. Semantic web services composition relies on ontologies for elaborating the services composition; this work is based on Astrophysical Services ONtology (ASON). ASON had its structure mostly inherited from the VO services capacities. Nevertheless, our approach is not limited to the VO and brings VO plus non-VO services together without the need for premade recipes. CASAS is available for use through a simple web interface.
SimpleITK Image-Analysis Notebooks: a Collaborative Environment for Education and Reproducible Research.

PubMed

Yaniv, Ziv; Lowekamp, Bradley C; Johnson, Hans J; Beare, Richard

2018-06-01

Modern scientific endeavors increasingly require team collaborations to construct and interpret complex computational workflows. This work describes an image-analysis environment that supports the use of computational tools that facilitate reproducible research and support scientists with varying levels of software development skills. The Jupyter notebook web application is the basis of an environment that enables flexible, well-documented, and reproducible workflows via literate programming. Image-analysis software development is made accessible to scientists with varying levels of programming experience via the use of the SimpleITK toolkit, a simplified interface to the Insight Segmentation and Registration Toolkit. Additional features of the development environment include user friendly data sharing using online data repositories and a testing framework that facilitates code maintenance. SimpleITK provides a large number of examples illustrating educational and research-oriented image analysis workflows for free download from GitHub under an Apache 2.0 license: github.com/InsightSoftwareConsortium/SimpleITK-Notebooks .
A Semi-Automated Workflow Solution for Data Set Publication

DOE Office of Scientific and Technical Information (OSTI.GOV)

Vannan, Suresh; Beaty, Tammy W.; Cook, Robert B.

In order to address the need for published data, considerable effort has gone into formalizing the process of data publication. From funding agencies to publishers, data publication has rapidly become a requirement. Digital Object Identifiers (DOI) and data citations have enhanced the integration and availability of data. The challenge facing data publishers now is to deal with the increased number of publishable data products and most importantly the difficulties of publishing diverse data products into an online archive. The Oak Ridge National Laboratory Distributed Active Archive Center (ORNL DAAC), a NASA-funded data center, faces these challenges as it deals withmore » data products created by individual investigators. This paper summarizes the challenges of curating data and provides a summary of a workflow solution that ORNL DAAC researcher and technical staffs have created to deal with publication of the diverse data products. Finally, the workflow solution presented here is generic and can be applied to data from any scientific domain and data located at any data center.« less
Information Issues and Contexts that Impair Team Based Communication Workflow: A Palliative Sedation Case Study.

PubMed

Cornett, Alex; Kuziemsky, Craig

2015-01-01

Implementing team based workflows can be complex because of the scope of providers involved and the extent of information exchange and communication that needs to occur. While a workflow may represent the ideal structure of communication that needs to occur, information issues and contextual factors may impact how the workflow is implemented in practice. Understanding these issues will help us better design systems to support team based workflows. In this paper we use a case study of palliative sedation therapy (PST) to model a PST workflow and then use it to identify purposes of communication, information issues and contextual factors that impact them. We then suggest how our findings could inform health information technology (HIT) design to support team based communication workflows.
Visualizing the Big (and Large) Data from an HPC Resource

NASA Astrophysics Data System (ADS)

Sisneros, R.

2015-10-01

Supercomputers are built to endure painfully large simulations and contend with resulting outputs. These are characteristics that scientists are all too willing to test the limits of in their quest for science at scale. The data generated during a scientist's workflow through an HPC center (large data) is the primary target for analysis and visualization. However, the hardware itself is also capable of generating volumes of diagnostic data (big data); this presents compelling opportunities to deploy analogous analytic techniques. In this paper we will provide a survey of some of the many ways in which visualization and analysis may be crammed into the scientific workflow as well as utilized on machine-specific data.
High-volume workflow management in the ITN/FBI system

NASA Astrophysics Data System (ADS)

Paulson, Thomas L.

1997-02-01

The Identification Tasking and Networking (ITN) Federal Bureau of Investigation system will manage the processing of more than 70,000 submissions per day. The workflow manager controls the routing of each submission through a combination of automated and manual processing steps whose exact sequence is dynamically determined by the results at each step. For most submissions, one or more of the steps involve the visual comparison of fingerprint images. The ITN workflow manager is implemented within a scaleable client/server architecture. The paper describes the key aspects of the ITN workflow manager design which allow the high volume of daily processing to be successfully accomplished.
[Integration of the radiotherapy irradiation planning in the digital workflow].

PubMed

Röhner, F; Schmucker, M; Henne, K; Momm, F; Bruggmoser, G; Grosu, A-L; Frommhold, H; Heinemann, F E

2013-02-01

At the Clinic of Radiotherapy at the University Hospital Freiburg, all relevant workflow is paperless. After implementing the Operating Schedule System (OSS) as a framework, all processes are being implemented into the departmental system MOSAIQ. Designing a digital workflow for radiotherapy irradiation planning is a large challenge, it requires interdisciplinary expertise and therefore the interfaces between the professions also have to be interdisciplinary. For every single step of radiotherapy irradiation planning, distinct responsibilities have to be defined and documented. All aspects of digital storage, backup and long-term availability of data were considered and have already been realized during the OSS project. After an analysis of the complete workflow and the statutory requirements, a detailed project plan was designed. In an interdisciplinary workgroup, problems were discussed and a detailed flowchart was developed. The new functionalities were implemented in a testing environment by the Clinical and Administrative IT Department (CAI). After extensive tests they were integrated into the new modular department system. The Clinic of Radiotherapy succeeded in realizing a completely digital workflow for radiotherapy irradiation planning. During the testing phase, our digital workflow was examined and afterwards was approved by the responsible authority.
Design and implementation of workflow engine for service-oriented architecture

NASA Astrophysics Data System (ADS)

Peng, Shuqing; Duan, Huining; Chen, Deyun

2009-04-01

As computer network is developed rapidly and in the situation of the appearance of distribution specialty in enterprise application, traditional workflow engine have some deficiencies, such as complex structure, bad stability, poor portability, little reusability and difficult maintenance. In this paper, in order to improve the stability, scalability and flexibility of workflow management system, a four-layer architecture structure of workflow engine based on SOA is put forward according to the XPDL standard of Workflow Management Coalition, the route control mechanism in control model is accomplished and the scheduling strategy of cyclic routing and acyclic routing is designed, and the workflow engine which adopts the technology such as XML, JSP, EJB and so on is implemented.
A Computational Workflow for the Automated Generation of Models of Genetic Designs.

PubMed

Misirli, Göksel; Nguyen, Tramy; McLaughlin, James Alastair; Vaidyanathan, Prashant; Jones, Timothy S; Densmore, Douglas; Myers, Chris; Wipat, Anil

2018-06-05

Computational models are essential to engineer predictable biological systems and to scale up this process for complex systems. Computational modeling often requires expert knowledge and data to build models. Clearly, manual creation of models is not scalable for large designs. Despite several automated model construction approaches, computational methodologies to bridge knowledge in design repositories and the process of creating computational models have still not been established. This paper describes a workflow for automatic generation of computational models of genetic circuits from data stored in design repositories using existing standards. This workflow leverages the software tool SBOLDesigner to build structural models that are then enriched by the Virtual Parts Repository API using Systems Biology Open Language (SBOL) data fetched from the SynBioHub design repository. The iBioSim software tool is then utilized to convert this SBOL description into a computational model encoded using the Systems Biology Markup Language (SBML). Finally, this SBML model can be simulated using a variety of methods. This workflow provides synthetic biologists with easy to use tools to create predictable biological systems, hiding away the complexity of building computational models. This approach can further be incorporated into other computational workflows for design automation.
Developing integrated workflows for the digitisation of herbarium specimens using a modular and scalable approach.

PubMed

Haston, Elspeth; Cubey, Robert; Pullan, Martin; Atkins, Hannah; Harris, David J

2012-01-01

Digitisation programmes in many institutes frequently involve disparate and irregular funding, diverse selection criteria and scope, with different members of staff managing and operating the processes. These factors have influenced the decision at the Royal Botanic Garden Edinburgh to develop an integrated workflow for the digitisation of herbarium specimens which is modular and scalable to enable a single overall workflow to be used for all digitisation projects. This integrated workflow is comprised of three principal elements: a specimen workflow, a data workflow and an image workflow.The specimen workflow is strongly linked to curatorial processes which will impact on the prioritisation, selection and preparation of the specimens. The importance of including a conservation element within the digitisation workflow is highlighted. The data workflow includes the concept of three main categories of collection data: label data, curatorial data and supplementary data. It is shown that each category of data has its own properties which influence the timing of data capture within the workflow. Development of software has been carried out for the rapid capture of curatorial data, and optical character recognition (OCR) software is being used to increase the efficiency of capturing label data and supplementary data. The large number and size of the images has necessitated the inclusion of automated systems within the image workflow.
An automated analysis workflow for optimization of force-field parameters using neutron scattering data

DOE Office of Scientific and Technical Information (OSTI.GOV)

Lynch, Vickie E.; Borreguero, Jose M.; Bhowmik, Debsindhu

Graphical abstract: - Highlights: • An automated workflow to optimize force-field parameters. • Used the workflow to optimize force-field parameter for a system containing nanodiamond and tRNA. • The mechanism relies on molecular dynamics simulation and neutron scattering experimental data. • The workflow can be generalized to any other experimental and simulation techniques. - Abstract: Large-scale simulations and data analysis are often required to explain neutron scattering experiments to establish a connection between the fundamental physics at the nanoscale and data probed by neutrons. However, to perform simulations at experimental conditions it is critical to use correct force-field (FF) parametersmore » which are unfortunately not available for most complex experimental systems. In this work, we have developed a workflow optimization technique to provide optimized FF parameters by comparing molecular dynamics (MD) to neutron scattering data. We describe the workflow in detail by using an example system consisting of tRNA and hydrophilic nanodiamonds in a deuterated water (D{sub 2}O) environment. Quasi-elastic neutron scattering (QENS) data show a faster motion of the tRNA in the presence of nanodiamond than without the ND. To compare the QENS and MD results quantitatively, a proper choice of FF parameters is necessary. We use an efficient workflow to optimize the FF parameters between the hydrophilic nanodiamond and water by comparing to the QENS data. Our results show that we can obtain accurate FF parameters by using this technique. The workflow can be generalized to other types of neutron data for FF optimization, such as vibrational spectroscopy and spin echo.« less
Experiences and lessons learned from creating a generalized workflow for data publication of field campaign datasets

NASA Astrophysics Data System (ADS)

Santhana Vannan, S. K.; Ramachandran, R.; Deb, D.; Beaty, T.; Wright, D.

2017-12-01

This paper summarizes the workflow challenges of curating and publishing data produced from disparate data sources and provides a generalized workflow solution to efficiently archive data generated by researchers. The Oak Ridge National Laboratory Distributed Active Archive Center (ORNL DAAC) for biogeochemical dynamics and the Global Hydrology Resource Center (GHRC) DAAC have been collaborating on the development of a generalized workflow solution to efficiently manage the data publication process. The generalized workflow presented here are built on lessons learned from implementations of the workflow system. Data publication consists of the following steps: Accepting the data package from the data providers, ensuring the full integrity of the data files. Identifying and addressing data quality issues Assembling standardized, detailed metadata and documentation, including file level details, processing methodology, and characteristics of data files Setting up data access mechanisms Setup of the data in data tools and services for improved data dissemination and user experience Registering the dataset in online search and discovery catalogues Preserving the data location through Digital Object Identifiers (DOI) We will describe the steps taken to automate, and realize efficiencies to the above process. The goals of the workflow system are to reduce the time taken to publish a dataset, to increase the quality of documentation and metadata, and to track individual datasets through the data curation process. Utilities developed to achieve these goal will be described. We will also share metrics driven value of the workflow system and discuss the future steps towards creation of a common software framework.
Enhancing GIS Capabilities for High Resolution Earth Science Grids

NASA Astrophysics Data System (ADS)

Koziol, B. W.; Oehmke, R.; Li, P.; O'Kuinghttons, R.; Theurich, G.; DeLuca, C.

2017-12-01

Applications for high performance GIS will continue to increase as Earth system models pursue more realistic representations of Earth system processes. Finer spatial resolution model input and output, unstructured or irregular modeling grids, data assimilation, and regional coordinate systems present novel challenges for GIS frameworks operating in the Earth system modeling domain. This presentation provides an overview of two GIS-driven applications that combine high performance software with big geospatial datasets to produce value-added tools for the modeling and geoscientific community. First, a large-scale interpolation experiment using National Hydrography Dataset (NHD) catchments, a high resolution rectilinear CONUS grid, and the Earth System Modeling Framework's (ESMF) conservative interpolation capability will be described. ESMF is a parallel, high-performance software toolkit that provides capabilities (e.g. interpolation) for building and coupling Earth science applications. ESMF is developed primarily by the NOAA Environmental Software Infrastructure and Interoperability (NESII) group. The purpose of this experiment was to test and demonstrate the utility of high performance scientific software in traditional GIS domains. Special attention will be paid to the nuanced requirements for dealing with high resolution, unstructured grids in scientific data formats. Second, a chunked interpolation application using ESMF and OpenClimateGIS (OCGIS) will demonstrate how spatial subsetting can virtually remove computing resource ceilings for very high spatial resolution interpolation operations. OCGIS is a NESII-developed Python software package designed for the geospatial manipulation of high-dimensional scientific datasets. An overview of the data processing workflow, why a chunked approach is required, and how the application could be adapted to meet operational requirements will be discussed here. In addition, we'll provide a general overview of OCGIS's parallel subsetting capabilities including challenges in the design and implementation of a scientific data subsetter.
Using EHR audit trail logs to analyze clinical workflow: A case study from community-based ambulatory clinics.

PubMed

Wu, Danny T Y; Smart, Nikolas; Ciemins, Elizabeth L; Lanham, Holly J; Lindberg, Curt; Zheng, Kai

2017-01-01

To develop a workflow-supported clinical documentation system, it is a critical first step to understand clinical workflow. While Time and Motion studies has been regarded as the gold standard of workflow analysis, this method can be resource consuming and its data may be biased due to the cognitive limitation of human observers. In this study, we aimed to evaluate the feasibility and validity of using EHR audit trail logs to analyze clinical workflow. Specifically, we compared three known workflow changes from our previous study with the corresponding EHR audit trail logs of the study participants. The results showed that EHR audit trail logs can be a valid source for clinical workflow analysis, and can provide an objective view of clinicians' behaviors, multi-dimensional comparisons, and a highly extensible analysis framework.
An architecture model for multiple disease management information systems.

PubMed

Chen, Lichin; Yu, Hui-Chu; Li, Hao-Chun; Wang, Yi-Van; Chen, Huang-Jen; Wang, I-Ching; Wang, Chiou-Shiang; Peng, Hui-Yu; Hsu, Yu-Ling; Chen, Chi-Huang; Chuang, Lee-Ming; Lee, Hung-Chang; Chung, Yufang; Lai, Feipei

2013-04-01

Disease management is a program which attempts to overcome the fragmentation of healthcare system and improve the quality of care. Many studies have proven the effectiveness of disease management. However, the case managers were spending the majority of time in documentation, coordinating the members of the care team. They need a tool to support them with daily practice and optimizing the inefficient workflow. Several discussions have indicated that information technology plays an important role in the era of disease management. Whereas applications have been developed, it is inefficient to develop information system for each disease management program individually. The aim of this research is to support the work of disease management, reform the inefficient workflow, and propose an architecture model that enhance on the reusability and time saving of information system development. The proposed architecture model had been successfully implemented into two disease management information system, and the result was evaluated through reusability analysis, time consumed analysis, pre- and post-implement workflow analysis, and user questionnaire survey. The reusability of the proposed model was high, less than half of the time was consumed, and the workflow had been improved. The overall user aspect is positive. The supportiveness during daily workflow is high. The system empowers the case managers with better information and leads to better decision making.
Jenkins-CI, an Open-Source Continuous Integration System, as a Scientific Data and Image-Processing Platform.

PubMed

Moutsatsos, Ioannis K; Hossain, Imtiaz; Agarinis, Claudia; Harbinski, Fred; Abraham, Yann; Dobler, Luc; Zhang, Xian; Wilson, Christopher J; Jenkins, Jeremy L; Holway, Nicholas; Tallarico, John; Parker, Christian N

2017-03-01

High-throughput screening generates large volumes of heterogeneous data that require a diverse set of computational tools for management, processing, and analysis. Building integrated, scalable, and robust computational workflows for such applications is challenging but highly valuable. Scientific data integration and pipelining facilitate standardized data processing, collaboration, and reuse of best practices. We describe how Jenkins-CI, an "off-the-shelf," open-source, continuous integration system, is used to build pipelines for processing images and associated data from high-content screening (HCS). Jenkins-CI provides numerous plugins for standard compute tasks, and its design allows the quick integration of external scientific applications. Using Jenkins-CI, we integrated CellProfiler, an open-source image-processing platform, with various HCS utilities and a high-performance Linux cluster. The platform is web-accessible, facilitates access and sharing of high-performance compute resources, and automates previously cumbersome data and image-processing tasks. Imaging pipelines developed using the desktop CellProfiler client can be managed and shared through a centralized Jenkins-CI repository. Pipelines and managed data are annotated to facilitate collaboration and reuse. Limitations with Jenkins-CI (primarily around the user interface) were addressed through the selection of helper plugins from the Jenkins-CI community.
Jenkins-CI, an Open-Source Continuous Integration System, as a Scientific Data and Image-Processing Platform

PubMed Central

Moutsatsos, Ioannis K.; Hossain, Imtiaz; Agarinis, Claudia; Harbinski, Fred; Abraham, Yann; Dobler, Luc; Zhang, Xian; Wilson, Christopher J.; Jenkins, Jeremy L.; Holway, Nicholas; Tallarico, John; Parker, Christian N.

2016-01-01

High-throughput screening generates large volumes of heterogeneous data that require a diverse set of computational tools for management, processing, and analysis. Building integrated, scalable, and robust computational workflows for such applications is challenging but highly valuable. Scientific data integration and pipelining facilitate standardized data processing, collaboration, and reuse of best practices. We describe how Jenkins-CI, an “off-the-shelf,” open-source, continuous integration system, is used to build pipelines for processing images and associated data from high-content screening (HCS). Jenkins-CI provides numerous plugins for standard compute tasks, and its design allows the quick integration of external scientific applications. Using Jenkins-CI, we integrated CellProfiler, an open-source image-processing platform, with various HCS utilities and a high-performance Linux cluster. The platform is web-accessible, facilitates access and sharing of high-performance compute resources, and automates previously cumbersome data and image-processing tasks. Imaging pipelines developed using the desktop CellProfiler client can be managed and shared through a centralized Jenkins-CI repository. Pipelines and managed data are annotated to facilitate collaboration and reuse. Limitations with Jenkins-CI (primarily around the user interface) were addressed through the selection of helper plugins from the Jenkins-CI community. PMID:27899692
Task Delegation Based Access Control Models for Workflow Systems

NASA Astrophysics Data System (ADS)

Gaaloul, Khaled; Charoy, François

e-Government organisations are facilitated and conducted using workflow management systems. Role-based access control (RBAC) is recognised as an efficient access control model for large organisations. The application of RBAC in workflow systems cannot, however, grant permissions to users dynamically while business processes are being executed. We currently observe a move away from predefined strict workflow modelling towards approaches supporting flexibility on the organisational level. One specific approach is that of task delegation. Task delegation is a mechanism that supports organisational flexibility, and ensures delegation of authority in access control systems. In this paper, we propose a Task-oriented Access Control (TAC) model based on RBAC to address these requirements. We aim to reason about task from organisational perspectives and resources perspectives to analyse and specify authorisation constraints. Moreover, we present a fine grained access control protocol to support delegation based on the TAC model.
MetaboTools: A comprehensive toolbox for analysis of genome-scale metabolic models

DOE PAGES

Aurich, Maike K.; Fleming, Ronan M. T.; Thiele, Ines

2016-08-03

Metabolomic data sets provide a direct read-out of cellular phenotypes and are increasingly generated to study biological questions. Previous work, by us and others, revealed the potential of analyzing extracellular metabolomic data in the context of the metabolic model using constraint-based modeling. With the MetaboTools, we make our methods available to the broader scientific community. The MetaboTools consist of a protocol, a toolbox, and tutorials of two use cases. The protocol describes, in a step-wise manner, the workflow of data integration, and computational analysis. The MetaboTools comprise the Matlab code required to complete the workflow described in the protocol. Tutorialsmore » explain the computational steps for integration of two different data sets and demonstrate a comprehensive set of methods for the computational analysis of metabolic models and stratification thereof into different phenotypes. The presented workflow supports integrative analysis of multiple omics data sets. Importantly, all analysis tools can be applied to metabolic models without performing the entire workflow. Taken together, the MetaboTools constitute a comprehensive guide to the intra-model analysis of extracellular metabolomic data from microbial, plant, or human cells. In conclusion, this computational modeling resource offers a broad set of computational analysis tools for a wide biomedical and non-biomedical research community.« less

π Scope: python based scientific workbench with visualization tool for MDSplus data

NASA Astrophysics Data System (ADS)

Shiraiwa, S.

2014-10-01

π Scope is a python based scientific data analysis and visualization tool constructed on wxPython and Matplotlib. Although it is designed to be a generic tool, the primary motivation for developing the new software is 1) to provide an updated tool to browse MDSplus data, with functionalities beyond dwscope and jScope, and 2) to provide a universal foundation to construct interface tools to perform computer simulation and modeling for Alcator C-Mod. It provides many features to visualize MDSplus data during tokamak experiments including overplotting different signals and discharges, various plot types (line, contour, image, etc.), in-panel data analysis using python scripts, and publication quality graphics generation. Additionally, the logic to produce multi-panel plots is designed to be backward compatible with dwscope, enabling smooth migration for dwscope users. πScope uses multi-threading to reduce data transfer latency, and its object-oriented design makes it easy to modify and expand while the open source nature allows portability. A built-in tree data browser allows a user to approach the data structure both from a GUI and a script, enabling relatively complex data analysis workflow to be built quickly. As an example, an IDL-based interface to perform GENRAY/CQL3D simulations was ported on πScope, thus allowing LHCD simulation to be run between-shot using C-Mod experimental profiles. This workflow is being used to generate a large database to develop a LHCD actuator model for the plasma control system. Supported by USDoE Award DE-FC02-99ER54512.
Assembling Large, Multi-Sensor Climate Datasets Using the SciFlo Grid Workflow System

NASA Astrophysics Data System (ADS)

Wilson, B. D.; Manipon, G.; Xing, Z.; Fetzer, E.

2008-12-01

NASA's Earth Observing System (EOS) is the world's most ambitious facility for studying global climate change. The mandate now is to combine measurements from the instruments on the A-Train platforms (AIRS, AMSR-E, MODIS, MISR, MLS, and CloudSat) and other Earth probes to enable large-scale studies of climate change over periods of years to decades. However, moving from predominantly single-instrument studies to a multi-sensor, measurement-based model for long-duration analysis of important climate variables presents serious challenges for large-scale data mining and data fusion. For example, one might want to compare temperature and water vapor retrievals from one instrument (AIRS) to another instrument (MODIS), and to a model (ECMWF), stratify the comparisons using a classification of the cloud scenes from CloudSat, and repeat the entire analysis over years of AIRS data. To perform such an analysis, one must discover & access multiple datasets from remote sites, find the space/time matchups between instruments swaths and model grids, understand the quality flags and uncertainties for retrieved physical variables, and assemble merged datasets for further scientific and statistical analysis. To meet these large-scale challenges, we are utilizing a Grid computing and dataflow framework, named SciFlo, in which we are deploying a set of versatile and reusable operators for data query, access, subsetting, co-registration, mining, fusion, and advanced statistical analysis. SciFlo is a semantically-enabled ("smart") Grid Workflow system that ties together a peer-to-peer network of computers into an efficient engine for distributed computation. The SciFlo workflow engine enables scientists to do multi-instrument Earth Science by assembling remotely-invokable Web Services (SOAP or http GET URLs), native executables, command-line scripts, and Python codes into a distributed computing flow. A scientist visually authors the graph of operation in the VizFlow GUI, or uses a text editor to modify the simple XML workflow documents. The SciFlo client & server engines optimize the execution of such distributed workflows and allow the user to transparently find and use datasets and operators without worrying about the actual location of the Grid resources. The engine transparently moves data to the operators, and moves operators to the data (on the dozen trusted SciFlo nodes). SciFlo also deploys a variety of Data Grid services to: query datasets in space and time, locate & retrieve on-line data granules, provide on-the-fly variable and spatial subsetting, and perform pairwise instrument matchups for A-Train datasets. These services are combined into efficient workflows to assemble the desired large-scale, merged climate datasets. SciFlo is currently being applied in several large climate studies: comparisons of aerosol optical depth between MODIS, MISR, AERONET ground network, and U. Michigan's IMPACT aerosol transport model; characterization of long-term biases in microwave and infrared instruments (AIRS, MLS) by comparisons to GPS temperature retrievals accurate to 0.1 degrees Kelvin; and construction of a decade-long, multi-sensor water vapor climatology stratified by classified cloud scene by bringing together datasets from AIRS/AMSU, AMSR-E, MLS, MODIS, and CloudSat (NASA MEASUREs grant, Fetzer PI). The presentation will discuss the SciFlo technologies, their application in these distributed workflows, and the many challenges encountered in assembling and analyzing these massive datasets.
Assembling Large, Multi-Sensor Climate Datasets Using the SciFlo Grid Workflow System

NASA Astrophysics Data System (ADS)

Wilson, B.; Manipon, G.; Xing, Z.; Fetzer, E.

2009-04-01

NASA's Earth Observing System (EOS) is an ambitious facility for studying global climate change. The mandate now is to combine measurements from the instruments on the "A-Train" platforms (AIRS, AMSR-E, MODIS, MISR, MLS, and CloudSat) and other Earth probes to enable large-scale studies of climate change over periods of years to decades. However, moving from predominantly single-instrument studies to a multi-sensor, measurement-based model for long-duration analysis of important climate variables presents serious challenges for large-scale data mining and data fusion. For example, one might want to compare temperature and water vapor retrievals from one instrument (AIRS) to another instrument (MODIS), and to a model (ECMWF), stratify the comparisons using a classification of the "cloud scenes" from CloudSat, and repeat the entire analysis over years of AIRS data. To perform such an analysis, one must discover & access multiple datasets from remote sites, find the space/time "matchups" between instruments swaths and model grids, understand the quality flags and uncertainties for retrieved physical variables, assemble merged datasets, and compute fused products for further scientific and statistical analysis. To meet these large-scale challenges, we are utilizing a Grid computing and dataflow framework, named SciFlo, in which we are deploying a set of versatile and reusable operators for data query, access, subsetting, co-registration, mining, fusion, and advanced statistical analysis. SciFlo is a semantically-enabled ("smart") Grid Workflow system that ties together a peer-to-peer network of computers into an efficient engine for distributed computation. The SciFlo workflow engine enables scientists to do multi-instrument Earth Science by assembling remotely-invokable Web Services (SOAP or http GET URLs), native executables, command-line scripts, and Python codes into a distributed computing flow. A scientist visually authors the graph of operation in the VizFlow GUI, or uses a text editor to modify the simple XML workflow documents. The SciFlo client & server engines optimize the execution of such distributed workflows and allow the user to transparently find and use datasets and operators without worrying about the actual location of the Grid resources. The engine transparently moves data to the operators, and moves operators to the data (on the dozen trusted SciFlo nodes). SciFlo also deploys a variety of Data Grid services to: query datasets in space and time, locate & retrieve on-line data granules, provide on-the-fly variable and spatial subsetting, perform pairwise instrument matchups for A-Train datasets, and compute fused products. These services are combined into efficient workflows to assemble the desired large-scale, merged climate datasets. SciFlo is currently being applied in several large climate studies: comparisons of aerosol optical depth between MODIS, MISR, AERONET ground network, and U. Michigan's IMPACT aerosol transport model; characterization of long-term biases in microwave and infrared instruments (AIRS, MLS) by comparisons to GPS temperature retrievals accurate to 0.1 degrees Kelvin; and construction of a decade-long, multi-sensor water vapor climatology stratified by classified cloud scene by bringing together datasets from AIRS/AMSU, AMSR-E, MLS, MODIS, and CloudSat (NASA MEASUREs grant, Fetzer PI). The presentation will discuss the SciFlo technologies, their application in these distributed workflows, and the many challenges encountered in assembling and analyzing these massive datasets.
Use of contextual inquiry to understand anatomic pathology workflow: Implications for digital pathology adoption

PubMed Central

Ho, Jonhan; Aridor, Orly; Parwani, Anil V.

2012-01-01

Background: For decades anatomic pathology (AP) workflow have been a highly manual process based on the use of an optical microscope and glass slides. Recent innovations in scanning and digitizing of entire glass slides are accelerating a move toward widespread adoption and implementation of a workflow based on digital slides and their supporting information management software. To support the design of digital pathology systems and ensure their adoption into pathology practice, the needs of the main users within the AP workflow, the pathologists, should be identified. Contextual inquiry is a qualitative, user-centered, social method designed to identify and understand users’ needs and is utilized for collecting, interpreting, and aggregating in-detail aspects of work. Objective: Contextual inquiry was utilized to document current AP workflow, identify processes that may benefit from the introduction of digital pathology systems, and establish design requirements for digital pathology systems that will meet pathologists’ needs. Materials and Methods: Pathologists were observed and interviewed at a large academic medical center according to contextual inquiry guidelines established by Holtzblatt et al. 1998. Notes representing user-provided data were documented during observation sessions. An affinity diagram, a hierarchal organization of the notes based on common themes in the data, was created. Five graphical models were developed to help visualize the data including sequence, flow, artifact, physical, and cultural models. Results: A total of six pathologists were observed by a team of two researchers. A total of 254 affinity notes were documented and organized using a system based on topical hierarchy, including 75 third-level, 24 second-level, and five main-level categories, including technology, communication, synthesis/preparation, organization, and workflow. Current AP workflow was labor intensive and lacked scalability. A large number of processes that may possibly improve following the introduction of digital pathology systems were identified. These work processes included case management, case examination and review, and final case reporting. Furthermore, a digital slide system should integrate with the anatomic pathologic laboratory information system. Conclusions: To our knowledge, this is the first study that utilized the contextual inquiry method to document AP workflow. Findings were used to establish key requirements for the design of digital pathology systems. PMID:23243553
The BepiColombo Archive Core System (BACS)

NASA Astrophysics Data System (ADS)

Macfarlane, A. J.; Osuna, P.; Pérez-López, F.; Vallejo, J. C.; Martinez, S.; Arviset, C.; Casale, M.

2015-09-01

BepiColombo is an interdisciplinary ESA mission to explore the planet Mercury in cooperation with JAXA. The mission consists of two separate orbiters: ESA's Mercury Planetary Orbiter (MPO) and JAXA's Mercury Magnetospheric Orbiter (MMO), which are dedicated to the detailed study of the planet and its magnetosphere. The MPO scientific payload comprises 11 instruments covering different scientific disciplines developed by several European teams. The MPO science operations will be prepared by the MPO Science Ground Segment (SGS) located at the European Space Astronomy Centre (ESAC) in Madrid. The BepiColombo Archive Core System (BACS) will be the central archive in which all mission operational data will be stored and is being developed by the Science Archives and Virtual Observatory Team (SAT) also at ESAC. The BACS will act as one of the modular subsystems within the BepiColombo Science Operations Control System (BSCS), (Vallejo 2014; Pérez-López 2014) which is under the responsibility of the SGS, with the purpose of facilitating the information exchange of data and metadata between the other subsystems of the BSCS as well as with the MPO Instrument Teams. This paper gives an overview of the concept and design of the BACS and how it integrates into the science ground segment workflow.
Developing integrated workflows for the digitisation of herbarium specimens using a modular and scalable approach

PubMed Central

Haston, Elspeth; Cubey, Robert; Pullan, Martin; Atkins, Hannah; Harris, David J

2012-01-01

Abstract Digitisation programmes in many institutes frequently involve disparate and irregular funding, diverse selection criteria and scope, with different members of staff managing and operating the processes. These factors have influenced the decision at the Royal Botanic Garden Edinburgh to develop an integrated workflow for the digitisation of herbarium specimens which is modular and scalable to enable a single overall workflow to be used for all digitisation projects. This integrated workflow is comprised of three principal elements: a specimen workflow, a data workflow and an image workflow. The specimen workflow is strongly linked to curatorial processes which will impact on the prioritisation, selection and preparation of the specimens. The importance of including a conservation element within the digitisation workflow is highlighted. The data workflow includes the concept of three main categories of collection data: label data, curatorial data and supplementary data. It is shown that each category of data has its own properties which influence the timing of data capture within the workflow. Development of software has been carried out for the rapid capture of curatorial data, and optical character recognition (OCR) software is being used to increase the efficiency of capturing label data and supplementary data. The large number and size of the images has necessitated the inclusion of automated systems within the image workflow. PMID:22859881
Facilitating NASA Earth Science Data Processing Using Nebula Cloud Computing

NASA Astrophysics Data System (ADS)

Chen, A.; Pham, L.; Kempler, S.; Theobald, M.; Esfandiari, A.; Campino, J.; Vollmer, B.; Lynnes, C.

2011-12-01

Cloud Computing technology has been used to offer high-performance and low-cost computing and storage resources for both scientific problems and business services. Several cloud computing services have been implemented in the commercial arena, e.g. Amazon's EC2 & S3, Microsoft's Azure, and Google App Engine. There are also some research and application programs being launched in academia and governments to utilize Cloud Computing. NASA launched the Nebula Cloud Computing platform in 2008, which is an Infrastructure as a Service (IaaS) to deliver on-demand distributed virtual computers. Nebula users can receive required computing resources as a fully outsourced service. NASA Goddard Earth Science Data and Information Service Center (GES DISC) migrated several GES DISC's applications to the Nebula as a proof of concept, including: a) The Simple, Scalable, Script-based Science Processor for Measurements (S4PM) for processing scientific data; b) the Atmospheric Infrared Sounder (AIRS) data process workflow for processing AIRS raw data; and c) the GES-DISC Interactive Online Visualization ANd aNalysis Infrastructure (GIOVANNI) for online access to, analysis, and visualization of Earth science data. This work aims to evaluate the practicability and adaptability of the Nebula. The initial work focused on the AIRS data process workflow to evaluate the Nebula. The AIRS data process workflow consists of a series of algorithms being used to process raw AIRS level 0 data and output AIRS level 2 geophysical retrievals. Migrating the entire workflow to the Nebula platform is challenging, but practicable. After installing several supporting libraries and the processing code itself, the workflow is able to process AIRS data in a similar fashion to its current (non-cloud) configuration. We compared the performance of processing 2 days of AIRS level 0 data through level 2 using a Nebula virtual computer and a local Linux computer. The result shows that Nebula has significantly better performance than the local machine. Much of the difference was due to newer equipment in the Nebula than the legacy computer, which is suggestive of a potential economic advantage beyond elastic power, i.e., access to up-to-date hardware vs. legacy hardware that must be maintained past its prime to amortize the cost. In addition to a trade study of advantages and challenges of porting complex processing to the cloud, a tutorial was developed to enable further progress in utilizing the Nebula for Earth Science applications and understanding better the potential for Cloud Computing in further data- and computing-intensive Earth Science research. In particular, highly bursty computing such as that experienced in the user-demand-driven Giovanni system may become more tractable in a Cloud environment. Our future work will continue to focus on migrating more GES DISC's applications/instances, e.g. Giovanni instances, to the Nebula platform and making matured migrated applications to be in operation on the Nebula.
ESGF and WDCC: The Double Structure of the Digital Data Storage at DKRZ

NASA Astrophysics Data System (ADS)

Toussaint, F.; Höck, H.

2016-12-01

Since a couple of years, Digital Repositories of climate science face new challenges: International projects are global collaborations. The data storage in parallel moved to federated, distributed storage systems like ESGF. For the long term archival storage (LTA) on the other hand, communities, funders, and data users make stronger demands for data and metadata quality to facilitate data use and reuse. At DKRZ, this situation led to a twofold data dissemination system - a situation which has influence on administration, workflows, and sustainability of the data. The ESGF system is focused on the needs of users as partners in global projects. It includes replication tools, detailed global project standards, and efficient search for the data to download. In contrast, DKRZ's classical CERA LTA storage aims for long term data holding and data curation as well as for data reuse requiring high metadata quality standards. In addition, for LTA data a Digital Object Identifier publication service for the direct integration of research data in scientific publications has been implemented. The editorial process at DKRZ-LTA ensures the quality of metadata and research data. The DOI and a citation code are provided and afterwards registered under DataCite's (datacite.org) regulations. In the overall data life cycle continuous reliability of the data and metadata quality is essential to allow for data handling at Petabytes level, data long term usability, and adequate publication of the results. These considerations lead to the question "What is quality" - with respect to data, to the repository itself, to the publisher, and the user? Global consensus is needed for these assessments as the phases of the end to end workflow gear into each other: For data and metadata, checks need to go hand in hand with the processes of production and storage. The results can be judged following a Quality Maturity Matrix (QMM). Repositories can be certified according to their trustworthiness. For the publication of any scientific conclusions, scientific community, funders, media, and policy makers ask for the publisher's impact in terms of readers' credit, run, and presentation quality. The paper describes the data life cycle. Emphasis is put on the different levels of quality assessment which at DKRZ ensure the data and metadata quality.
How iSamples (Internet of Samples in the Earth Sciences) Improves Sample and Data Stewardship in the Next Generation of Geoscientists

NASA Astrophysics Data System (ADS)

Hallett, B. W.; Dere, A. L. D.; Lehnert, K.; Carter, M.

2016-12-01

Vast numbers of physical samples are routinely collected by geoscientists to probe key scientific questions related to global climate change, biogeochemical cycles, magmatic processes, mantle dynamics, etc. Despite their value as irreplaceable records of nature the majority of these samples remain undiscoverable by the broader scientific community because they lack a digital presence or are not well-documented enough to facilitate their discovery and reuse for future scientific and educational use. The NSF EarthCube iSamples Research Coordination Network seeks to develop a unified approach across all Earth Science disciplines for the registration, description, identification, and citation of physical specimens in order to take advantage of the new opportunities that cyberinfrastructure offers. Even as consensus around best practices begins to emerge, such as the use of the International Geo Sample Number (IGSN), more work is needed to communicate these practices to investigators to encourage widespread adoption. Recognizing the importance of students and early career scientists in particular to transforming data and sample management practices, the iSamples Education and Training Working Group is developing training modules for sample collection, documentation, and management workflows. These training materials are made available to educators/research supervisors online at http://earthcube.org/group/isamples and can be modularized for supervisors to create a customized research workflow. This study details the design and development of several sample management tutorials, created by early career scientists and documented in collaboration with undergraduate research students in field and lab settings. Modules under development focus on rock outcrops, rock cores, soil cores, and coral samples, with an emphasis on sample management throughout the collection, analysis and archiving process. We invite others to share their sample management/registration workflows and to develop training modules. This educational approach, with evolving digital materials, can help prepare future scientists to perform research in a way that will contribute to EarthCube data integration and discovery.
Design of low noise imaging system

NASA Astrophysics Data System (ADS)

Hu, Bo; Chen, Xiaolai

2017-10-01

In order to meet the needs of engineering applications for low noise imaging system under the mode of global shutter, a complete imaging system is designed based on the SCMOS (Scientific CMOS) image sensor CIS2521F. The paper introduces hardware circuit and software system design. Based on the analysis of key indexes and technologies about the imaging system, the paper makes chips selection and decides SCMOS + FPGA+ DDRII+ Camera Link as processing architecture. Then it introduces the entire system workflow and power supply and distribution unit design. As for the software system, which consists of the SCMOS control module, image acquisition module, data cache control module and transmission control module, the paper designs in Verilog language and drives it to work properly based on Xilinx FPGA. The imaging experimental results show that the imaging system exhibits a 2560*2160 pixel resolution, has a maximum frame frequency of 50 fps. The imaging quality of the system satisfies the requirement of the index.
A case study on the impacts of computerized provider order entry (CPOE) system on hospital clinical workflow.

PubMed

Mominah, Maher; Yunus, Faisel; Househ, Mowafa S

2013-01-01

Computerized provider order entry (CPOE) is a health informatics system that helps health care providers create and manage orders for medications and other health care services. Through the automation of the ordering process, CPOE has improved the overall efficiency of hospital processes and workflow. In Saudi Arabia, CPOE has been used for years, with only a few studies evaluating the impacts of CPOE on clinical workflow. In this paper, we discuss the experience of a local hospital with the use of CPOE and its impacts on clinical workflow. Results show that there are many issues related to the implementation and use of CPOE within Saudi Arabia that must be addressed, including design, training, medication errors, alert fatigue, and system dep Recommendations for improving CPOE use within Saudi Arabia are also discussed.
A Chaotic Particle Swarm Optimization-Based Heuristic for Market-Oriented Task-Level Scheduling in Cloud Workflow Systems.

PubMed

Li, Xuejun; Xu, Jia; Yang, Yun

2015-01-01

Cloud workflow system is a kind of platform service based on cloud computing. It facilitates the automation of workflow applications. Between cloud workflow system and its counterparts, market-oriented business model is one of the most prominent factors. The optimization of task-level scheduling in cloud workflow system is a hot topic. As the scheduling is a NP problem, Ant Colony Optimization (ACO) and Particle Swarm Optimization (PSO) have been proposed to optimize the cost. However, they have the characteristic of premature convergence in optimization process and therefore cannot effectively reduce the cost. To solve these problems, Chaotic Particle Swarm Optimization (CPSO) algorithm with chaotic sequence and adaptive inertia weight factor is applied to present the task-level scheduling. Chaotic sequence with high randomness improves the diversity of solutions, and its regularity assures a good global convergence. Adaptive inertia weight factor depends on the estimate value of cost. It makes the scheduling avoid premature convergence by properly balancing between global and local exploration. The experimental simulation shows that the cost obtained by our scheduling is always lower than the other two representative counterparts.
A Chaotic Particle Swarm Optimization-Based Heuristic for Market-Oriented Task-Level Scheduling in Cloud Workflow Systems

PubMed Central

Li, Xuejun; Xu, Jia; Yang, Yun

2015-01-01

Cloud workflow system is a kind of platform service based on cloud computing. It facilitates the automation of workflow applications. Between cloud workflow system and its counterparts, market-oriented business model is one of the most prominent factors. The optimization of task-level scheduling in cloud workflow system is a hot topic. As the scheduling is a NP problem, Ant Colony Optimization (ACO) and Particle Swarm Optimization (PSO) have been proposed to optimize the cost. However, they have the characteristic of premature convergence in optimization process and therefore cannot effectively reduce the cost. To solve these problems, Chaotic Particle Swarm Optimization (CPSO) algorithm with chaotic sequence and adaptive inertia weight factor is applied to present the task-level scheduling. Chaotic sequence with high randomness improves the diversity of solutions, and its regularity assures a good global convergence. Adaptive inertia weight factor depends on the estimate value of cost. It makes the scheduling avoid premature convergence by properly balancing between global and local exploration. The experimental simulation shows that the cost obtained by our scheduling is always lower than the other two representative counterparts. PMID:26357510
Implementation of workflow engine technology to deliver basic clinical decision support functionality

PubMed Central

2011-01-01

Background Workflow engine technology represents a new class of software with the ability to graphically model step-based knowledge. We present application of this novel technology to the domain of clinical decision support. Successful implementation of decision support within an electronic health record (EHR) remains an unsolved research challenge. Previous research efforts were mostly based on healthcare-specific representation standards and execution engines and did not reach wide adoption. We focus on two challenges in decision support systems: the ability to test decision logic on retrospective data prior prospective deployment and the challenge of user-friendly representation of clinical logic. Results We present our implementation of a workflow engine technology that addresses the two above-described challenges in delivering clinical decision support. Our system is based on a cross-industry standard of XML (extensible markup language) process definition language (XPDL). The core components of the system are a workflow editor for modeling clinical scenarios and a workflow engine for execution of those scenarios. We demonstrate, with an open-source and publicly available workflow suite, that clinical decision support logic can be executed on retrospective data. The same flowchart-based representation can also function in a prospective mode where the system can be integrated with an EHR system and respond to real-time clinical events. We limit the scope of our implementation to decision support content generation (which can be EHR system vendor independent). We do not focus on supporting complex decision support content delivery mechanisms due to lack of standardization of EHR systems in this area. We present results of our evaluation of the flowchart-based graphical notation as well as architectural evaluation of our implementation using an established evaluation framework for clinical decision support architecture. Conclusions We describe an implementation of a free workflow technology software suite (available at http://code.google.com/p/healthflow) and its application in the domain of clinical decision support. Our implementation seamlessly supports clinical logic testing on retrospective data and offers a user-friendly knowledge representation paradigm. With the presented software implementation, we demonstrate that workflow engine technology can provide a decision support platform which evaluates well against an established clinical decision support architecture evaluation framework. Due to cross-industry usage of workflow engine technology, we can expect significant future functionality enhancements that will further improve the technology's capacity to serve as a clinical decision support platform. PMID:21477364
Web-video-mining-supported workflow modeling for laparoscopic surgeries.

PubMed

Liu, Rui; Zhang, Xiaoli; Zhang, Hao

2016-11-01

As quality assurance is of strong concern in advanced surgeries, intelligent surgical systems are expected to have knowledge such as the knowledge of the surgical workflow model (SWM) to support their intuitive cooperation with surgeons. For generating a robust and reliable SWM, a large amount of training data is required. However, training data collected by physically recording surgery operations is often limited and data collection is time-consuming and labor-intensive, severely influencing knowledge scalability of the surgical systems. The objective of this research is to solve the knowledge scalability problem in surgical workflow modeling with a low cost and labor efficient way. A novel web-video-mining-supported surgical workflow modeling (webSWM) method is developed. A novel video quality analysis method based on topic analysis and sentiment analysis techniques is developed to select high-quality videos from abundant and noisy web videos. A statistical learning method is then used to build the workflow model based on the selected videos. To test the effectiveness of the webSWM method, 250 web videos were mined to generate a surgical workflow for the robotic cholecystectomy surgery. The generated workflow was evaluated by 4 web-retrieved videos and 4 operation-room-recorded videos, respectively. The evaluation results (video selection consistency n-index ≥0.60; surgical workflow matching degree ≥0.84) proved the effectiveness of the webSWM method in generating robust and reliable SWM knowledge by mining web videos. With the webSWM method, abundant web videos were selected and a reliable SWM was modeled in a short time with low labor cost. Satisfied performances in mining web videos and learning surgery-related knowledge show that the webSWM method is promising in scaling knowledge for intelligent surgical systems. Copyright © 2016 Elsevier B.V. All rights reserved.
Lowering the Barrier to Reproducible Research by Publishing Provenance from Common Analytical Tools

NASA Astrophysics Data System (ADS)

Jones, M. B.; Slaughter, P.; Walker, L.; Jones, C. S.; Missier, P.; Ludäscher, B.; Cao, Y.; McPhillips, T.; Schildhauer, M.

2015-12-01

Scientific provenance describes the authenticity, origin, and processing history of research products and promotes scientific transparency by detailing the steps in computational workflows that produce derived products. These products include papers, findings, input data, software products to perform computations, and derived data and visualizations. The geosciences community values this type of information, and, at least theoretically, strives to base conclusions on computationally replicable findings. In practice, capturing detailed provenance is laborious and thus has been a low priority; beyond a lab notebook describing methods and results, few researchers capture and preserve detailed records of scientific provenance. We have built tools for capturing and publishing provenance that integrate into analytical environments that are in widespread use by geoscientists (R and Matlab). These tools lower the barrier to provenance generation by automating capture of critical information as researchers prepare data for analysis, develop, test, and execute models, and create visualizations. The 'recordr' library in R and the `matlab-dataone` library in Matlab provide shared functions to capture provenance with minimal changes to normal working procedures. Researchers can capture both scripted and interactive sessions, tag and manage these executions as they iterate over analyses, and then prune and publish provenance metadata and derived products to the DataONE federation of archival repositories. Provenance traces conform to the ProvONE model extension of W3C PROV, enabling interoperability across tools and languages. The capture system supports fine-grained versioning of science products and provenance traces. By assigning global identifiers such as DOIs, reseachers can cite the computational processes used to reach findings. And, finally, DataONE has built a web portal to search, browse, and clearly display provenance relationships between input data, the software used to execute analyses and models, and derived data and products that arise from these computations. This provenance is vital to interpretation and understanding of science, and provides an audit trail that researchers can use to understand and replicate computational workflows in the geosciences.
Modernizing Earth and Space Science Modeling Workflows in the Big Data Era

NASA Astrophysics Data System (ADS)

Kinter, J. L.; Feigelson, E.; Walker, R. J.; Tino, C.

2017-12-01

Modeling is a major aspect of the Earth and space science research. The development of numerical models of the Earth system, planetary systems or astrophysical systems is essential to linking theory with observations. Optimal use of observations that are quite expensive to obtain and maintain typically requires data assimilation that involves numerical models. In the Earth sciences, models of the physical climate system are typically used for data assimilation, climate projection, and inter-disciplinary research, spanning applications from analysis of multi-sensor data sets to decision-making in climate-sensitive sectors with applications to ecosystems, hazards, and various biogeochemical processes. In space physics, most models are from first principles, require considerable expertise to run and are frequently modified significantly for each case study. The volume and variety of model output data from modeling Earth and space systems are rapidly increasing and have reached a scale where human interaction with data is prohibitively inefficient. A major barrier to progress is that modeling workflows isn't deemed by practitioners to be a design problem. Existing workflows have been created by a slow accretion of software, typically based on undocumented, inflexible scripts haphazardly modified by a succession of scientists and students not trained in modern software engineering methods. As a result, existing modeling workflows suffer from an inability to onboard new datasets into models; an inability to keep pace with accelerating data production rates; and irreproducibility, among other problems. These factors are creating an untenable situation for those conducting and supporting Earth system and space science. Improving modeling workflows requires investments in hardware, software and human resources. This paper describes the critical path issues that must be targeted to accelerate modeling workflows, including script modularization, parallelization, and automation in the near term, and longer term investments in virtualized environments for improved scalability, tolerance for lossy data compression, novel data-centric memory and storage technologies, and tools for peer reviewing, preserving and sharing workflows, as well as fundamental statistical and machine learning algorithms.
Build and Execute Environment

DOE Office of Scientific and Technical Information (OSTI.GOV)

Guan, Qiang

At exascale, the challenge becomes to develop applications that run at scale and use exascale platforms reliably, efficiently, and flexibly. Workflows become much more complex because they must seamlessly integrate simulation and data analytics. They must include down-sampling, post-processing, feature extraction, and visualization. Power and data transfer limitations require these analysis tasks to be run in-situ or in-transit. We expect successful workflows will comprise multiple linked simulations along with tens of analysis routines. Users will have limited development time at scale and, therefore, must have rich tools to develop, debug, test, and deploy applications. At this scale, successful workflows willmore » compose linked computations from an assortment of reliable, well-defined computation elements, ones that can come and go as required, based on the needs of the workflow over time. We propose a novel framework that utilizes both virtual machines (VMs) and software containers to create a workflow system that establishes a uniform build and execution environment (BEE) beyond the capabilities of current systems. In this environment, applications will run reliably and repeatably across heterogeneous hardware and software. Containers, both commercial (Docker and Rocket) and open-source (LXC and LXD), define a runtime that isolates all software dependencies from the machine operating system. Workflows may contain multiple containers that run different operating systems, different software, and even different versions of the same software. We will run containers in open-source virtual machines (KVM) and emulators (QEMU) so that workflows run on any machine entirely in user-space. On this platform of containers and virtual machines, we will deliver workflow software that provides services, including repeatable execution, provenance, checkpointing, and future proofing. We will capture provenance about how containers were launched and how they interact to annotate workflows for repeatable and partial re-execution. We will coordinate the physical snapshots of virtual machines with parallel programming constructs, such as barriers, to automate checkpoint and restart. We will also integrate with HPC-specific container runtimes to gain access to accelerators and other specialized hardware to preserve native performance. Containers will link development to continuous integration. When application developers check code in, it will automatically be tested on a suite of different software and hardware architectures.« less
Observing System Simulation Experiment (OSSE) for the HyspIRI Spectrometer Mission

NASA Technical Reports Server (NTRS)

Turmon, Michael J.; Block, Gary L.; Green, Robert O.; Hua, Hook; Jacob, Joseph C.; Sobel, Harold R.; Springer, Paul L.; Zhang, Qingyuan

2010-01-01

The OSSE software provides an integrated end-to-end environment to simulate an Earth observing system by iteratively running a distributed modeling workflow based on the HyspIRI Mission, including atmospheric radiative transfer, surface albedo effects, detection, and retrieval for agile exploration of the mission design space. The software enables an Observing System Simulation Experiment (OSSE) and can be used for design trade space exploration of science return for proposed instruments by modeling the whole ground truth, sensing, and retrieval chain and to assess retrieval accuracy for a particular instrument and algorithm design. The OSSE in fra struc ture is extensible to future National Research Council (NRC) Decadal Survey concept missions where integrated modeling can improve the fidelity of coupled science and engineering analyses for systematic analysis and science return studies. This software has a distributed architecture that gives it a distinct advantage over other similar efforts. The workflow modeling components are typically legacy computer programs implemented in a variety of programming languages, including MATLAB, Excel, and FORTRAN. Integration of these diverse components is difficult and time-consuming. In order to hide this complexity, each modeling component is wrapped as a Web Service, and each component is able to pass analysis parameterizations, such as reflectance or radiance spectra, on to the next component downstream in the service workflow chain. In this way, the interface to each modeling component becomes uniform and the entire end-to-end workflow can be run using any existing or custom workflow processing engine. The architecture lets users extend workflows as new modeling components become available, chain together the components using any existing or custom workflow processing engine, and distribute them across any Internet-accessible Web Service endpoints. The workflow components can be hosted on any Internet-accessible machine. This has the advantages that the computations can be distributed to make best use of the available computing resources, and each workflow component can be hosted and maintained by their respective domain experts.
Digital transformation in home care. A case study.

PubMed

Bennis, Sandy; Costanzo, Diane; Flynn, Ann Marie; Reidy, Agatha; Tronni, Catherine

2007-01-01

Simply implementing software and technology does not assure that an organization's targeted clinical and financial goals will be realized. No longer is it possible to roll out a new system--by solely providing end user training and overlaying it on top of already inefficient workflows and outdated roles--and know with certainty that targets will be met. At Virtua Health's Home Care, based in south New Jersey, implementation of their electronic system initially followed this more traditional approach. Unable to completely attain their earlier identified return on investment, they enlisted the help of a new role within their health system, that of the nurse informaticist. Knowledgeable in complex clinical processes and not bound by the technology at hand, the informaticist analyzed physical workflow, digital workflow, roles and physical layout. Leveraging specific tools such as change acceleration, workouts and LEAN, the informaticist was able to redesign workflow and support new levels of functionality. This article provides a view from the "finish line", recounting how this role worked with home care to assimilate information delivery into more efficient processes and align resources to support the new workflow, ultimately achieving real tangible returns.

Big Data Provenance: Challenges, State of the Art and Opportunities.

PubMed

Wang, Jianwu; Crawl, Daniel; Purawat, Shweta; Nguyen, Mai; Altintas, Ilkay

2015-01-01

Ability to track provenance is a key feature of scientific workflows to support data lineage and reproducibility. The challenges that are introduced by the volume, variety and velocity of Big Data, also pose related challenges for provenance and quality of Big Data, defined as veracity. The increasing size and variety of distributed Big Data provenance information bring new technical challenges and opportunities throughout the provenance lifecycle including recording, querying, sharing and utilization. This paper discusses the challenges and opportunities of Big Data provenance related to the veracity of the datasets themselves and the provenance of the analytical processes that analyze these datasets. It also explains our current efforts towards tracking and utilizing Big Data provenance using workflows as a programming model to analyze Big Data.
Managing and Communicating Operational Workflow: Designing and Implementing an Electronic Outpatient Whiteboard.

PubMed

Steitz, Bryan D; Weinberg, Stuart T; Danciu, Ioana; Unertl, Kim M

2016-01-01

Healthcare team members in emergency department contexts have used electronic whiteboard solutions to help manage operational workflow for many years. Ambulatory clinic settings have highly complex operational workflow, but are still limited in electronic assistance to communicate and coordinate work activities. To describe and discuss the design, implementation, use, and ongoing evolution of a coordination and collaboration tool supporting ambulatory clinic operational workflow at Vanderbilt University Medical Center (VUMC). The outpatient whiteboard tool was initially designed to support healthcare work related to an electronic chemotherapy order-entry application. After a highly successful initial implementation in an oncology context, a high demand emerged across the organization for the outpatient whiteboard implementation. Over the past 10 years, developers have followed an iterative user-centered design process to evolve the tool. The electronic outpatient whiteboard system supports 194 separate whiteboards and is accessed by over 2800 distinct users on a typical day. Clinics can configure their whiteboards to support unique workflow elements. Since initial release, features such as immunization clinical decision support have been integrated into the system, based on requests from end users. The success of the electronic outpatient whiteboard demonstrates the usefulness of an operational workflow tool within the ambulatory clinic setting. Operational workflow tools can play a significant role in supporting coordination, collaboration, and teamwork in ambulatory healthcare settings.
Integrating visualization and interaction research to improve scientific workflows.

PubMed

Keefe, Daniel F

2010-01-01

Scientific-visualization research is, nearly by necessity, interdisciplinary. In addition to their collaborators in application domains (for example, cell biology), researchers regularly build on close ties with disciplines related to visualization, such as graphics, human-computer interaction, and cognitive science. One of these ties is the connection between visualization and interaction research. This isn't a new direction for scientific visualization (see the "Early Connections" sidebar). However, momentum recently seems to be increasing toward integrating visualization research (for example, effective visual presentation of data) with interaction research (for example, innovative interactive techniques that facilitate manipulating and exploring data). We see evidence of this trend in several places, including the visualization literature and conferences.
Facebook for scientists: requirements and services for optimizing how scientific collaborations are established.

PubMed

Schleyer, Titus; Spallek, Heiko; Butler, Brian S; Subramanian, Sushmita; Weiss, Daniel; Poythress, M Louisa; Rattanathikun, Phijarana; Mueller, Gregory

2008-08-13

As biomedical research projects become increasingly interdisciplinary and complex, collaboration with appropriate individuals, teams, and institutions becomes ever more crucial to project success. While social networks are extremely important in determining how scientific collaborations are formed, social networking technologies have not yet been studied as a tool to help form scientific collaborations. Many currently emerging expertise locating systems include social networking technologies, but it is unclear whether they make the process of finding collaborators more efficient and effective. This study was conducted to answer the following questions: (1) Which requirements should systems for finding collaborators in biomedical science fulfill? and (2) Which information technology services can address these requirements? The background research phase encompassed a thorough review of the literature, affinity diagramming, contextual inquiry, and semistructured interviews. This phase yielded five themes suggestive of requirements for systems to support the formation of collaborations. In the next phase, the generative phase, we brainstormed and selected design ideas for formal concept validation with end users. Then, three related, well-validated ideas were selected for implementation and evaluation in a prototype. Five main themes of systems requirements emerged: (1) beyond expertise, successful collaborations require compatibility with respect to personality, work style, productivity, and many other factors (compatibility); (2) finding appropriate collaborators requires the ability to effectively search in domains other than your own using information that is comprehensive and descriptive (communication); (3) social networks are important for finding potential collaborators, assessing their suitability and compatibility, and establishing contact with them (intermediation); (4) information profiles must be complete, correct, up-to-date, and comprehensive and allow fine-grained control over access to information by different audiences (information quality and access); (5) keeping online profiles up-to-date should require little or no effort and be integrated into the scientist's existing workflow (motivation). Based on the requirements, 16 design ideas underwent formal validation with end users. Of those, three were chosen to be implemented and evaluated in a system prototype, "Digital|Vita": maintaining, formatting, and semi-automated updating of biographical information; searching for experts; and building and maintaining the social network and managing document flow. In addition to quantitative and factual information about potential collaborators, social connectedness, personal and professional compatibility, and power differentials also influence whether collaborations are formed. Current systems only partially model these requirements. Services in Digital|Vita combine an existing workflow, maintaining and formatting biographical information, with collaboration-searching functions in a novel way. Several barriers to the adoption of systems such as Digital|Vita exist, such as potential adoption asymmetries between junior and senior researchers and the tension between public and private information. Developers and researchers may consider one or more of the services described in this paper for implementation in their own expertise locating systems.
[Change in process management by implementing RIS, PACS and flat-panel detectors].

PubMed

Imhof, H; Dirisamer, A; Fischer, H; Grampp, S; Heiner, L; Kaderk, M; Krestan, C; Kainberger, F

2002-05-01

Implementation of radiological information systems (RIS) and picture archiving and communicating systems (PACS) results in significant changes of workflow in a radiological department. Additional connection with flat-panel detectors leads to a shortening of the work process. RIS and PACS implementation alone reduces the complete workflow by 21-80%. With flatpanel technology the image production process is further shortened by 25-30%. The workflow-steps are changed from original 17-12 with the implementation of RIS and PACS and to 5 with the integrated use of flatpanels. This clearly recognizable advantages in the workflow need an according financial investment. Several studies could show that the capitalisation-factor calculated over eight years is positive, with a gain range between 5-25%. Whether the additional implementation of flatpanel detectors results also in a positive capitalisation over the years, cannot be estimated exactly, at the moment, because the experiences are too short. Particularly critical are the interfaces, which needs a constant quality control. Our flatpanel detector-system is fixed, special images--as we have them in about 3-5% of all cases--need still conventional filmscreen or phosphorplate-systems. Full-spine and long-leg examinations cannot be performed with sufficient exactness. Without any questions implementation of integrated RIS, PACS and flatpanel detector-system needs excellent training of the employees, because of the changes in workflow etc. The main profits of such an integrated implementation are an increase in quality in image and report datas, easier handling--there are almost no more cassettes necessary--and excessive shortening of workflow.
Workflow Automation: A Collective Case Study

ERIC Educational Resources Information Center

Harlan, Jennifer

2013-01-01

Knowledge management has proven to be a sustainable competitive advantage for many organizations. Knowledge management systems are abundant, with multiple functionalities. The literature reinforces the use of workflow automation with knowledge management systems to benefit organizations; however, it was not known if process automation yielded…
Design Principles as a Guide for Constraint Based and Dynamic Modeling: Towards an Integrative Workflow.

PubMed

Sehr, Christiana; Kremling, Andreas; Marin-Sanguino, Alberto

2015-10-16

During the last 10 years, systems biology has matured from a fuzzy concept combining omics, mathematical modeling and computers into a scientific field on its own right. In spite of its incredible potential, the multilevel complexity of its objects of study makes it very difficult to establish a reliable connection between data and models. The great number of degrees of freedom often results in situations, where many different models can explain/fit all available datasets. This has resulted in a shift of paradigm from the initially dominant, maybe naive, idea of inferring the system out of a number of datasets to the application of different techniques that reduce the degrees of freedom before any data set is analyzed. There is a wide variety of techniques available, each of them can contribute a piece of the puzzle and include different kinds of experimental information. But the challenge that remains is their meaningful integration. Here we show some theoretical results that enable some of the main modeling approaches to be applied sequentially in a complementary manner, and how this workflow can benefit from evolutionary reasoning to keep the complexity of the problem in check. As a proof of concept, we show how the synergies between these modeling techniques can provide insight into some well studied problems: Ammonia assimilation in bacteria and an unbranched linear pathway with end-product inhibition.
Improving adherence to the Epic Beacon ambulatory workflow.

PubMed

Chackunkal, Ellen; Dhanapal Vogel, Vishnuprabha; Grycki, Meredith; Kostoff, Diana

2017-06-01

Computerized physician order entry has been shown to significantly improve chemotherapy safety by reducing the number of prescribing errors. Epic's Beacon Oncology Information System of computerized physician order entry and electronic medication administration was implemented in Henry Ford Health System's ambulatory oncology infusion centers on 9 November 2013. Since that time, compliance to the infusion workflow had not been assessed. The objective of this study was to optimize the current workflow and improve the compliance to this workflow in the ambulatory oncology setting. This study was a retrospective, quasi-experimental study which analyzed the composite workflow compliance rate of patient encounters from 9 to 23 November 2014. Based on this analysis, an intervention was identified and implemented in February 2015 to improve workflow compliance. The primary endpoint was to compare the composite compliance rate to the Beacon workflow before and after a pharmacy-initiated intervention. The intervention, which was education of infusion center staff, was initiated by ambulatory-based, oncology pharmacists and implemented by a multi-disciplinary team of pharmacists and nurses. The composite compliance rate was then reassessed for patient encounters from 2 to 13 March 2015 in order to analyze the effects of the determined intervention on compliance. The initial analysis in November 2014 revealed a composite compliance rate of 38%, and data analysis after the intervention revealed a statistically significant increase in the composite compliance rate to 83% ( p < 0.001). This study supports a pharmacist-initiated educational intervention can improve compliance to an ambulatory, oncology infusion workflow.
Next Generation Workload Management System For Big Data on Heterogeneous Distributed Computing

NASA Astrophysics Data System (ADS)

Klimentov, A.; Buncic, P.; De, K.; Jha, S.; Maeno, T.; Mount, R.; Nilsson, P.; Oleynik, D.; Panitkin, S.; Petrosyan, A.; Porter, R. J.; Read, K. F.; Vaniachine, A.; Wells, J. C.; Wenaus, T.

2015-05-01

The Large Hadron Collider (LHC), operating at the international CERN Laboratory in Geneva, Switzerland, is leading Big Data driven scientific explorations. Experiments at the LHC explore the fundamental nature of matter and the basic forces that shape our universe, and were recently credited for the discovery of a Higgs boson. ATLAS and ALICE are the largest collaborations ever assembled in the sciences and are at the forefront of research at the LHC. To address an unprecedented multi-petabyte data processing challenge, both experiments rely on a heterogeneous distributed computational infrastructure. The ATLAS experiment uses PanDA (Production and Data Analysis) Workload Management System (WMS) for managing the workflow for all data processing on hundreds of data centers. Through PanDA, ATLAS physicists see a single computing facility that enables rapid scientific breakthroughs for the experiment, even though the data centers are physically scattered all over the world. The scale is demonstrated by the following numbers: PanDA manages O(102) sites, O(105) cores, O(108) jobs per year, O(103) users, and ATLAS data volume is O(1017) bytes. In 2013 we started an ambitious program to expand PanDA to all available computing resources, including opportunistic use of commercial and academic clouds and Leadership Computing Facilities (LCF). The project titled ‘Next Generation Workload Management and Analysis System for Big Data’ (BigPanDA) is funded by DOE ASCR and HEP. Extending PanDA to clouds and LCF presents new challenges in managing heterogeneity and supporting workflow. The BigPanDA project is underway to setup and tailor PanDA at the Oak Ridge Leadership Computing Facility (OLCF) and at the National Research Center "Kurchatov Institute" together with ALICE distributed computing and ORNL computing professionals. Our approach to integration of HPC platforms at the OLCF and elsewhere is to reuse, as much as possible, existing components of the PanDA system. We will present our current accomplishments with running the PanDA WMS at OLCF and other supercomputers and demonstrate our ability to use PanDA as a portal independent of the computing facilities infrastructure for High Energy and Nuclear Physics as well as other data-intensive science applications.
In-database processing of a large collection of remote sensing data: applications and implementation

NASA Astrophysics Data System (ADS)

Kikhtenko, Vladimir; Mamash, Elena; Chubarov, Dmitri; Voronina, Polina

2016-04-01

Large archives of remote sensing data are now available to scientists, yet the need to work with individual satellite scenes or product files constrains studies that span a wide temporal range or spatial extent. The resources (storage capacity, computing power and network bandwidth) required for such studies are often beyond the capabilities of individual geoscientists. This problem has been tackled before in remote sensing research and inspired several information systems. Some of them such as NASA Giovanni [1] and Google Earth Engine have already proved their utility for science. Analysis tasks involving large volumes of numerical data are not unique to Earth Sciences. Recent advances in data science are enabled by the development of in-database processing engines that bring processing closer to storage, use declarative query languages to facilitate parallel scalability and provide high-level abstraction of the whole dataset. We build on the idea of bridging the gap between file archives containing remote sensing data and databases by integrating files into relational database as foreign data sources and performing analytical processing inside the database engine. Thereby higher level query language can efficiently address problems of arbitrary size: from accessing the data associated with a specific pixel or a grid cell to complex aggregation over spatial or temporal extents over a large number of individual data files. This approach was implemented using PostgreSQL for a Siberian regional archive of satellite data products holding hundreds of terabytes of measurements from multiple sensors and missions taken over a decade-long span. While preserving the original storage layout and therefore compatibility with existing applications the in-database processing engine provides a toolkit for provisioning remote sensing data in scientific workflows and applications. The use of SQL - a widely used higher level declarative query language - simplifies interoperability between desktop GIS, web applications and geographic web services and interactive scientific applications (MATLAB, IPython). The system is also automatically ingesting direct readout data from meteorological and research satellites in near-real time with distributed acquisition workflows managed by Taverna workflow engine [2]. The system has demonstrated its utility in performing non-trivial analytic processing such as the computation of the Robust Satellite Technique (RST) indices [3]. It had been useful in different tasks such as studying urban heat islands, analyzing patterns in the distribution of wildfire occurrences, detecting phenomena related to seismic and earthquake activity. Initial experience has highlighted several limitations of the proposed approach yet it has demonstrated ability to facilitate the use of large archives of remote sensing data by geoscientists. 1. J.G. Acker, G. Leptoukh, Online analysis enhances use of NASA Earth science data. EOS Trans. AGU, 2007, 88(2), P. 14-17. 2. D. Hull, K. Wolsfencroft, R. Stevens, C. Goble, M.R. Pocock, P. Li and T. Oinn, Taverna: a tool for building and running workflows of services. Nucleic Acids Research. 2006. V. 34. P. W729-W732. 3. V. Tramutoli, G. Di Bello, N. Pergola, S. Piscitelli, Robust satellite techniques for remote sensing of seismically active areas // Annals of Geophysics. 2001. no. 44(2). P. 295-312.
Computer imaging and workflow systems in the business office.

PubMed

Adams, W T; Veale, F H; Helmick, P M

1999-05-01

Computer imaging and workflow technology automates many business processes that currently are performed using paper processes. Documents are scanned into the imaging system and placed in electronic patient account folders. Authorized users throughout the organization, including preadmission, verification, admission, billing, cash posting, customer service, and financial counseling staff, have online access to the information they need when they need it. Such streamlining of business functions can increase collections and customer satisfaction while reducing labor, supply, and storage costs. Because the costs of a comprehensive computer imaging and workflow system can be considerable, healthcare organizations should consider implementing parts of such systems that can be cost-justified or include implementation as part of a larger strategic technology initiative.
How to Take HRMS Process Management to the Next Level with Workflow Business Event System

NASA Technical Reports Server (NTRS)

Rajeshuni, Sarala; Yagubian, Aram; Kunamaneni, Krishna

2006-01-01

Oracle Workflow with the Business Event System offers a complete process management solution for enterprises to manage business processes cost-effectively. Using Workflow event messaging, event subscriptions, AQ Servlet and advanced queuing technologies, this presentation will demonstrate the step-by-step design and implementation of system solutions in order to integrate two dissimilar systems and establish communication remotely. As a case study, the presentation walks you through the process of propagating organization name changes in other applications that originated from the HRMS module without changing applications code. The solution can be applied to your particular business cases for streamlining or modifying business processes across Oracle and non-Oracle applications.
Enabling Big Geoscience Data Analytics with a Cloud-Based, MapReduce-Enabled and Service-Oriented Workflow Framework

PubMed Central

Li, Zhenlong; Yang, Chaowei; Jin, Baoxuan; Yu, Manzhu; Liu, Kai; Sun, Min; Zhan, Matthew

2015-01-01

Geoscience observations and model simulations are generating vast amounts of multi-dimensional data. Effectively analyzing these data are essential for geoscience studies. However, the tasks are challenging for geoscientists because processing the massive amount of data is both computing and data intensive in that data analytics requires complex procedures and multiple tools. To tackle these challenges, a scientific workflow framework is proposed for big geoscience data analytics. In this framework techniques are proposed by leveraging cloud computing, MapReduce, and Service Oriented Architecture (SOA). Specifically, HBase is adopted for storing and managing big geoscience data across distributed computers. MapReduce-based algorithm framework is developed to support parallel processing of geoscience data. And service-oriented workflow architecture is built for supporting on-demand complex data analytics in the cloud environment. A proof-of-concept prototype tests the performance of the framework. Results show that this innovative framework significantly improves the efficiency of big geoscience data analytics by reducing the data processing time as well as simplifying data analytical procedures for geoscientists. PMID:25742012
Enabling big geoscience data analytics with a cloud-based, MapReduce-enabled and service-oriented workflow framework.

PubMed

Li, Zhenlong; Yang, Chaowei; Jin, Baoxuan; Yu, Manzhu; Liu, Kai; Sun, Min; Zhan, Matthew

2015-01-01

Geoscience observations and model simulations are generating vast amounts of multi-dimensional data. Effectively analyzing these data are essential for geoscience studies. However, the tasks are challenging for geoscientists because processing the massive amount of data is both computing and data intensive in that data analytics requires complex procedures and multiple tools. To tackle these challenges, a scientific workflow framework is proposed for big geoscience data analytics. In this framework techniques are proposed by leveraging cloud computing, MapReduce, and Service Oriented Architecture (SOA). Specifically, HBase is adopted for storing and managing big geoscience data across distributed computers. MapReduce-based algorithm framework is developed to support parallel processing of geoscience data. And service-oriented workflow architecture is built for supporting on-demand complex data analytics in the cloud environment. A proof-of-concept prototype tests the performance of the framework. Results show that this innovative framework significantly improves the efficiency of big geoscience data analytics by reducing the data processing time as well as simplifying data analytical procedures for geoscientists.
Implementing for Sustainability: Promoting Use of a Measurement Feedback System for Innovation and Quality Improvement.

PubMed

Douglas, Susan; Button, Suzanne; Casey, Susan E

2016-05-01

Measurement feedback systems (MFSs) are increasingly recognized as evidence-based treatments for improving mental health outcomes, in addition to being a useful administrative tool for service planning and reporting. Promising research findings have driven practice administrators and policymakers to emphasize the incorporation of outcomes monitoring into electronic health systems. To promote MFS integrity and protect against potentially negative outcomes, it is vital that adoption and implementation be guided by scientifically rigorous yet practical principles. In this point of view, the authors discuss and provide examples of three user-centered and theory-based principles: emphasizing integration with clinical values and workflow, promoting administrative leadership with the 'golden thread' of data-informed decision-making, and facilitating sustainability by encouraging innovation. In our experience, enacting these principles serves to promote sustainable implementation of MFSs in the community while also allowing innovation to occur, which can inform improvements to guide future MFS research.
The markup is the model: reasoning about systems biology models in the Semantic Web era.

PubMed

Kell, Douglas B; Mendes, Pedro

2008-06-07

Metabolic control analysis, co-invented by Reinhart Heinrich, is a formalism for the analysis of biochemical networks, and is a highly important intellectual forerunner of modern systems biology. Exchanging ideas and exchanging models are part of the international activities of science and scientists, and the Systems Biology Markup Language (SBML) allows one to perform the latter with great facility. Encoding such models in SBML allows their distributed analysis using loosely coupled workflows, and with the advent of the Internet the various software modules that one might use to analyze biochemical models can reside on entirely different computers and even on different continents. Optimization is at the core of many scientific and biotechnological activities, and Reinhart made many major contributions in this area, stimulating our own activities in the use of the methods of evolutionary computing for optimization.
Impact of digital radiography on clinical workflow.

PubMed

May, G A; Deer, D D; Dackiewicz, D

2000-05-01

It is commonly accepted that digital radiography (DR) improves workflow and patient throughput compared with traditional film radiography or computed radiography (CR). DR eliminates the film development step and the time to acquire the image from a CR reader. In addition, the wide dynamic range of DR is such that the technologist can perform the quality-control (QC) step directly at the modality in a few seconds, rather than having to transport the newly acquired image to a centralized QC station for review. Furthermore, additional workflow efficiencies can be achieved with DR by employing tight radiology information system (RIS) integration. In the DR imaging environment, this provides for patient demographic information to be automatically downloaded from the RIS to populate the DR Digital Imaging and Communications in Medicine (DICOM) image header. To learn more about this workflow efficiency improvement, we performed a comparative study of workflow steps under three different conditions: traditional film/screen x-ray, DR without RIS integration (ie, manual entry of patient demographics), and DR with RIS integration. This study was performed at the Cleveland Clinic Foundation (Cleveland, OH) using a newly acquired amorphous silicon flat-panel DR system from Canon Medical Systems (Irvine, CA). Our data show that DR without RIS results in substantial workflow savings over traditional film/screen practice. There is an additional 30% reduction in total examination time using DR with RIS integration.
Biomedical text mining and its applications in cancer research.

PubMed

Zhu, Fei; Patumcharoenpol, Preecha; Zhang, Cheng; Yang, Yang; Chan, Jonathan; Meechai, Asawin; Vongsangnak, Wanwipa; Shen, Bairong

2013-04-01

Cancer is a malignant disease that has caused millions of human deaths. Its study has a long history of well over 100years. There have been an enormous number of publications on cancer research. This integrated but unstructured biomedical text is of great value for cancer diagnostics, treatment, and prevention. The immense body and rapid growth of biomedical text on cancer has led to the appearance of a large number of text mining techniques aimed at extracting novel knowledge from scientific text. Biomedical text mining on cancer research is computationally automatic and high-throughput in nature. However, it is error-prone due to the complexity of natural language processing. In this review, we introduce the basic concepts underlying text mining and examine some frequently used algorithms, tools, and data sets, as well as assessing how much these algorithms have been utilized. We then discuss the current state-of-the-art text mining applications in cancer research and we also provide some resources for cancer text mining. With the development of systems biology, researchers tend to understand complex biomedical systems from a systems biology viewpoint. Thus, the full utilization of text mining to facilitate cancer systems biology research is fast becoming a major concern. To address this issue, we describe the general workflow of text mining in cancer systems biology and each phase of the workflow. We hope that this review can (i) provide a useful overview of the current work of this field; (ii) help researchers to choose text mining tools and datasets; and (iii) highlight how to apply text mining to assist cancer systems biology research. Copyright © 2012 Elsevier Inc. All rights reserved.
Contextual cloud-based service oriented architecture for clinical workflow.

PubMed

Moreno-Conde, Jesús; Moreno-Conde, Alberto; Núñez-Benjumea, Francisco J; Parra-Calderón, Carlos

2015-01-01

Given that acceptance of systems within the healthcare domain multiple papers highlighted the importance of integrating tools with the clinical workflow. This paper analyse how clinical context management could be deployed in order to promote the adoption of cloud advanced services and within the clinical workflow. This deployment will be able to be integrated with the eHealth European Interoperability Framework promoted specifications. Throughout this paper, it is proposed a cloud-based service-oriented architecture. This architecture will implement a context management system aligned with the HL7 standard known as CCOW.
A standard-enabled workflow for synthetic biology.

PubMed

Myers, Chris J; Beal, Jacob; Gorochowski, Thomas E; Kuwahara, Hiroyuki; Madsen, Curtis; McLaughlin, James Alastair; Mısırlı, Göksel; Nguyen, Tramy; Oberortner, Ernst; Samineni, Meher; Wipat, Anil; Zhang, Michael; Zundel, Zach

2017-06-15

A synthetic biology workflow is composed of data repositories that provide information about genetic parts, sequence-level design tools to compose these parts into circuits, visualization tools to depict these designs, genetic design tools to select parts to create systems, and modeling and simulation tools to evaluate alternative design choices. Data standards enable the ready exchange of information within such a workflow, allowing repositories and tools to be connected from a diversity of sources. The present paper describes one such workflow that utilizes, among others, the Synthetic Biology Open Language (SBOL) to describe genetic designs, the Systems Biology Markup Language to model these designs, and SBOL Visual to visualize these designs. We describe how a standard-enabled workflow can be used to produce types of design information, including multiple repositories and software tools exchanging information using a variety of data standards. Recently, the ACS Synthetic Biology journal has recommended the use of SBOL in their publications. © 2017 The Author(s); published by Portland Press Limited on behalf of the Biochemical Society.

Workflow computing. Improving management and efficiency of pathology diagnostic services.

PubMed

Buffone, G J; Moreau, D; Beck, J R

1996-04-01

Traditionally, information technology in health care has helped practitioners to collect, store, and present information and also to add a degree of automation to simple tasks (instrument interfaces supporting result entry, for example). Thus commercially available information systems do little to support the need to model, execute, monitor, coordinate, and revise the various complex clinical processes required to support health-care delivery. Workflow computing, which is already implemented and improving the efficiency of operations in several nonmedical industries, can address the need to manage complex clinical processes. Workflow computing not only provides a means to define and manage the events, roles, and information integral to health-care delivery but also supports the explicit implementation of policy or rules appropriate to the process. This article explains how workflow computing may be applied to health-care and the inherent advantages of the technology, and it defines workflow system requirements for use in health-care delivery with special reference to diagnostic pathology.
SU-E-J-73: Extension of a Clinical OIS/EMR/R&V System to Deliver Safe and Efficient Adaptive Plan-Of-The-Day Treatments Using a Fully Customizable Plan-Library-Based Workflow

DOE Office of Scientific and Technical Information (OSTI.GOV)

Akhiat, A.; Elekta, Sunnyvale, CA; Kanis, A.P.

Purpose: To extend a clinical Record and Verify (R&V) system to enable a safe and fast workflow for Plan-of-the-Day (PotD) adaptive treatments based on patient-specific plan libraries. Methods: Plan libraries for PotD adaptive treatments contain for each patient several pre-treatment generated treatment plans. They may be generated for various patient anatomies or CTV-PTV margins. For each fraction, a Cone Beam CT scan is acquired to support the selection of the plan that best fits the patient’s anatomy-of-the-day. To date, there are no commercial R&V systems that support PotD delivery strategies. Consequently, the clinical workflow requires many manual interventions. Moreover, multiplemore » scheduled plans have a high risk of excessive dose delivery. In this work we extended a commercial R&V system (MOSAIQ) to support PotD workflows using IQ-scripting. The PotD workflow was designed after extensive risk analysis of the manual procedure, and all identified risks were incorporated as logical checks. Results: All manual PotD activities were automated. The workflow first identifies if the patient is scheduled for PotD, then performs safety checks, and continues to treatment plan selection only if no issues were found. The user selects the plan to deliver from a list of candidate plans. After plan selection, the workflow makes the treatment fields of the selected plan available for delivery by adding them to the treatment calendar. Finally, control is returned to the R&V system to commence treatment. Additional logic was added to incorporate off-line changes such as updating the plan library. After extensive testing including treatment fraction interrupts and plan-library updates during the treatment course, the workflow is running successfully in a clinical pilot, in which 35 patients have been treated since October 2014. Conclusion: We have extended a commercial R&V system for improved safety and efficiency in library-based adaptive strategies enabling a wide-spread implementation of those strategies. This work was in part funded by a research grant of Elekta AB, Stockholm, Sweden.« less
A Scientific Data Provenance API for Distributed Applications

DOE Office of Scientific and Technical Information (OSTI.GOV)

Raju, Bibi; Elsethagen, Todd O.; Stephan, Eric G.

Data provenance has been an active area of research as a means to standardize how the origin of data, process event history, and what or who was responsible for influencing results is explained. There are two approaches to capture provenance information. The first approach is to collect observed evidence produced by an executing application using log files, event listeners, and temporary files that are used by the application or application developer. The provenance translated from these observations is an interpretation of the provided evidence. The second approach is called disclosed because the application provides a firsthand account of the provenancemore » based on the anticipated questions on data flow, process flow, and responsible agents. Most observed provenance collection systems collect lot of provenance information during an application run or workflow execution. The common trend in capturing provenance is to collect all possible information, then attempt to find relevant information, which is not efficient. Existing disclosed provenance system APIs do not work well in distributed environment and have trouble finding where to fit the individual pieces of provenance information. This work focuses on determining more reliable solutions for provenance capture. As part of the Integrated End-to-end Performance Prediction and Diagnosis for Extreme Scientific Workflows (IPPD) project, an API was developed, called Producer API (PAPI), which can disclose application targeted provenance, designed to work in distributed environments by means of unique object identification methods. The provenance disclosure approach used adds additional metadata to the provenance information to uniquely identify the pieces and connect them together. PAPI uses a common provenance model to support this provenance integration across disclosure sources. The API also provides the flexibility to let the user decide what to do with the collected provenance. The collected provenance can be sent to a triple store using REST services or it can be logged to a file.« less
Schedule-Aware Workflow Management Systems

NASA Astrophysics Data System (ADS)

Mans, Ronny S.; Russell, Nick C.; van der Aalst, Wil M. P.; Moleman, Arnold J.; Bakker, Piet J. M.

Contemporary workflow management systems offer work-items to users through specific work-lists. Users select the work-items they will perform without having a specific schedule in mind. However, in many environments work needs to be scheduled and performed at particular times. For example, in hospitals many work-items are linked to appointments, e.g., a doctor cannot perform surgery without reserving an operating theater and making sure that the patient is present. One of the problems when applying workflow technology in such domains is the lack of calendar-based scheduling support. In this paper, we present an approach that supports the seamless integration of unscheduled (flow) and scheduled (schedule) tasks. Using CPN Tools we have developed a specification and simulation model for schedule-aware workflow management systems. Based on this a system has been realized that uses YAWL, Microsoft Exchange Server 2007, Outlook, and a dedicated scheduling service. The approach is illustrated using a real-life case study at the AMC hospital in the Netherlands. In addition, we elaborate on the experiences obtained when developing and implementing a system of this scale using formal techniques.
Nexus: A modular workflow management system for quantum simulation codes

NASA Astrophysics Data System (ADS)

Krogel, Jaron T.

2016-01-01

The management of simulation workflows represents a significant task for the individual computational researcher. Automation of the required tasks involved in simulation work can decrease the overall time to solution and reduce sources of human error. A new simulation workflow management system, Nexus, is presented to address these issues. Nexus is capable of automated job management on workstations and resources at several major supercomputing centers. Its modular design allows many quantum simulation codes to be supported within the same framework. Current support includes quantum Monte Carlo calculations with QMCPACK, density functional theory calculations with Quantum Espresso or VASP, and quantum chemical calculations with GAMESS. Users can compose workflows through a transparent, text-based interface, resembling the input file of a typical simulation code. A usage example is provided to illustrate the process.
Managing and Communicating Operational Workflow

PubMed Central

Weinberg, Stuart T.; Danciu, Ioana; Unertl, Kim M.

2016-01-01

Summary Background Healthcare team members in emergency department contexts have used electronic whiteboard solutions to help manage operational workflow for many years. Ambulatory clinic settings have highly complex operational workflow, but are still limited in electronic assistance to communicate and coordinate work activities. Objective To describe and discuss the design, implementation, use, and ongoing evolution of a coordination and collaboration tool supporting ambulatory clinic operational workflow at Vanderbilt University Medical Center (VUMC). Methods The outpatient whiteboard tool was initially designed to support healthcare work related to an electronic chemotherapy order-entry application. After a highly successful initial implementation in an oncology context, a high demand emerged across the organization for the outpatient whiteboard implementation. Over the past 10 years, developers have followed an iterative user-centered design process to evolve the tool. Results The electronic outpatient whiteboard system supports 194 separate whiteboards and is accessed by over 2800 distinct users on a typical day. Clinics can configure their whiteboards to support unique workflow elements. Since initial release, features such as immunization clinical decision support have been integrated into the system, based on requests from end users. Conclusions The success of the electronic outpatient whiteboard demonstrates the usefulness of an operational workflow tool within the ambulatory clinic setting. Operational workflow tools can play a significant role in supporting coordination, collaboration, and teamwork in ambulatory healthcare settings. PMID:27081407
From Provenance Standards and Tools to Queries and Actionable Provenance

NASA Astrophysics Data System (ADS)

Ludaescher, B.

2017-12-01

The W3C PROV standard provides a minimal core for sharing retrospective provenance information for scientific workflows and scripts. PROV extensions such as DataONE's ProvONE model are necessary for linking runtime observables in retrospective provenance records with conceptual-level prospective provenance information, i.e., workflow (or dataflow) graphs. Runtime provenance recorders, such as DataONE's RunManager for R, or noWorkflow for Python capture retrospective provenance automatically. YesWorkflow (YW) is a toolkit that allows researchers to declare high-level prospective provenance models of scripts via simple inline comments (YW-annotations), revealing the computational modules and dataflow dependencies in the script. By combining and linking both forms of provenance, important queries and use cases can be supported that neither provenance model can afford on its own. We present existing and emerging provenance tools developed for the DataONE and SKOPE (Synthesizing Knowledge of Past Environments) projects. We show how the different tools can be used individually and in combination to model, capture, share, query, and visualize provenance information. We also present challenges and opportunities for making provenance information more immediately actionable for the researchers who create it in the first place. We argue that such a shift towards "provenance-for-self" is necessary to accelerate the creation, sharing, and use of provenance in support of transparent, reproducible computational and data science.
Ultra-scale Visualization Climate Data Analysis Tools (UV-CDAT)

DOE Office of Scientific and Technical Information (OSTI.GOV)

Williams, Dean N.; Silva, Claudio

2013-09-30

For the past three years, a large analysis and visualization effort—funded by the Department of Energy’s Office of Biological and Environmental Research (BER), the National Aeronautics and Space Administration (NASA), and the National Oceanic and Atmospheric Administration (NOAA)—has brought together a wide variety of industry-standard scientific computing libraries and applications to create Ultra-scale Visualization Climate Data Analysis Tools (UV-CDAT) to serve the global climate simulation and observational research communities. To support interactive analysis and visualization, all components connect through a provenance application–programming interface to capture meaningful history and workflow. Components can be loosely coupled into the framework for fast integrationmore » or tightly coupled for greater system functionality and communication with other components. The overarching goal of UV-CDAT is to provide a new paradigm for access to and analysis of massive, distributed scientific data collections by leveraging distributed data architectures located throughout the world. The UV-CDAT framework addresses challenges in analysis and visualization and incorporates new opportunities, including parallelism for better efficiency, higher speed, and more accurate scientific inferences. Today, it provides more than 600 users access to more analysis and visualization products than any other single source.« less
Standardizing clinical trials workflow representation in UML for international site comparison.

PubMed

de Carvalho, Elias Cesar Araujo; Jayanti, Madhav Kishore; Batilana, Adelia Portero; Kozan, Andreia M O; Rodrigues, Maria J; Shah, Jatin; Loures, Marco R; Patil, Sunita; Payne, Philip; Pietrobon, Ricardo

2010-11-09

With the globalization of clinical trials, a growing emphasis has been placed on the standardization of the workflow in order to ensure the reproducibility and reliability of the overall trial. Despite the importance of workflow evaluation, to our knowledge no previous studies have attempted to adapt existing modeling languages to standardize the representation of clinical trials. Unified Modeling Language (UML) is a computational language that can be used to model operational workflow, and a UML profile can be developed to standardize UML models within a given domain. This paper's objective is to develop a UML profile to extend the UML Activity Diagram schema into the clinical trials domain, defining a standard representation for clinical trial workflow diagrams in UML. Two Brazilian clinical trial sites in rheumatology and oncology were examined to model their workflow and collect time-motion data. UML modeling was conducted in Eclipse, and a UML profile was developed to incorporate information used in discrete event simulation software. Ethnographic observation revealed bottlenecks in workflow: these included tasks requiring full commitment of CRCs, transferring notes from paper to computers, deviations from standard operating procedures, and conflicts between different IT systems. Time-motion analysis revealed that nurses' activities took up the most time in the workflow and contained a high frequency of shorter duration activities. Administrative assistants performed more activities near the beginning and end of the workflow. Overall, clinical trial tasks had a greater frequency than clinic routines or other general activities. This paper describes a method for modeling clinical trial workflow in UML and standardizing these workflow diagrams through a UML profile. In the increasingly global environment of clinical trials, the standardization of workflow modeling is a necessary precursor to conducting a comparative analysis of international clinical trials workflows.
Standardizing Clinical Trials Workflow Representation in UML for International Site Comparison

PubMed Central

de Carvalho, Elias Cesar Araujo; Jayanti, Madhav Kishore; Batilana, Adelia Portero; Kozan, Andreia M. O.; Rodrigues, Maria J.; Shah, Jatin; Loures, Marco R.; Patil, Sunita; Payne, Philip; Pietrobon, Ricardo

2010-01-01

Background With the globalization of clinical trials, a growing emphasis has been placed on the standardization of the workflow in order to ensure the reproducibility and reliability of the overall trial. Despite the importance of workflow evaluation, to our knowledge no previous studies have attempted to adapt existing modeling languages to standardize the representation of clinical trials. Unified Modeling Language (UML) is a computational language that can be used to model operational workflow, and a UML profile can be developed to standardize UML models within a given domain. This paper's objective is to develop a UML profile to extend the UML Activity Diagram schema into the clinical trials domain, defining a standard representation for clinical trial workflow diagrams in UML. Methods Two Brazilian clinical trial sites in rheumatology and oncology were examined to model their workflow and collect time-motion data. UML modeling was conducted in Eclipse, and a UML profile was developed to incorporate information used in discrete event simulation software. Results Ethnographic observation revealed bottlenecks in workflow: these included tasks requiring full commitment of CRCs, transferring notes from paper to computers, deviations from standard operating procedures, and conflicts between different IT systems. Time-motion analysis revealed that nurses' activities took up the most time in the workflow and contained a high frequency of shorter duration activities. Administrative assistants performed more activities near the beginning and end of the workflow. Overall, clinical trial tasks had a greater frequency than clinic routines or other general activities. Conclusions This paper describes a method for modeling clinical trial workflow in UML and standardizing these workflow diagrams through a UML profile. In the increasingly global environment of clinical trials, the standardization of workflow modeling is a necessary precursor to conducting a comparative analysis of international clinical trials workflows. PMID:21085484
Financial and workflow analysis of radiology reporting processes in the planning phase of implementation of a speech recognition system

NASA Astrophysics Data System (ADS)

Whang, Tom; Ratib, Osman M.; Umamoto, Kathleen; Grant, Edward G.; McCoy, Michael J.

2002-05-01

The goal of this study is to determine the financial value and workflow improvements achievable by replacing traditional transcription services with a speech recognition system in a large, university hospital setting. Workflow metrics were measured at two hospitals, one of which exclusively uses a transcription service (UCLA Medical Center), and the other which exclusively uses speech recognition (West Los Angeles VA Hospital). Workflow metrics include time spent per report (the sum of time spent interpreting, dictating, reviewing, and editing), transcription turnaround, and total report turnaround. Compared to traditional transcription, speech recognition resulted in radiologists spending 13-32% more time per report, but it also resulted in reduction of report turnaround time by 22-62% and reduction of marginal cost per report by 94%. The model developed here helps justify the introduction of a speech recognition system by showing that the benefits of reduced operating costs and decreased turnaround time outweigh the cost of increased time spent per report. Whether the ultimate goal is to achieve a financial objective or to improve operational efficiency, it is important to conduct a thorough analysis of workflow before implementation.
The TMT instrumentation program

NASA Astrophysics Data System (ADS)

Simard, Luc; Crampton, David; Ellerbroek, Brent; Boyer, Corinne

2010-07-01

An overview of the current status of the Thirty Meter Telescope (TMT) instrumentation program is presented. Conceptual designs for the three first light instruments (IRIS, WFOS and IRMS) are in progress, as well as feasibility studies of MIRES. Considerable effort is underway to understand the end-to-end performance of the complete telescopeadaptive optics-instrument system under realistic conditions on Mauna Kea. Highly efficient operation is being designed into the TMT system, based on a detailed investigation of the observation workflow to ensure very fast target acquisition and set up of all subsystems. Future TMT instruments will almost certainly involve contributions from institutions in many different locations in North America and partner nations. Coordinating and optimizing the design and construction of the instruments to ensure delivery of the best possible scientific capabilities is an interesting challenge. TMT welcomes involvement from all interested instrument teams.
Managing bioengineering complexity with AI techniques.

PubMed

Beal, Jacob; Adler, Aaron; Yaman, Fusun

2016-10-01

Our capabilities for systematic design and engineering of biological systems are rapidly increasing. Effectively engineering such systems, however, requires the synthesis of a rapidly expanding and changing complex body of knowledge, protocols, and methodologies. Many of the problems in managing this complexity, however, appear susceptible to being addressed by artificial intelligence (AI) techniques, i.e., methods enabling computers to represent, acquire, and employ knowledge. Such methods can be employed to automate physical and informational "routine" work and thus better allow humans to focus their attention on the deeper scientific and engineering issues. This paper examines the potential impact of AI on the engineering of biological organisms through the lens of a typical organism engineering workflow. We identify a number of key opportunities for significant impact, as well as challenges that must be overcome. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
iTOUGH2: A multiphysics simulation-optimization framework for analyzing subsurface systems

NASA Astrophysics Data System (ADS)

Finsterle, S.; Commer, M.; Edmiston, J. K.; Jung, Y.; Kowalsky, M. B.; Pau, G. S. H.; Wainwright, H. M.; Zhang, Y.

2017-11-01

iTOUGH2 is a simulation-optimization framework for the TOUGH suite of nonisothermal multiphase flow models and related simulators of geophysical, geochemical, and geomechanical processes. After appropriate parameterization of subsurface structures and their properties, iTOUGH2 runs simulations for multiple parameter sets and analyzes the resulting output for parameter estimation through automatic model calibration, local and global sensitivity analyses, data-worth analyses, and uncertainty propagation analyses. Development of iTOUGH2 is driven by scientific challenges and user needs, with new capabilities continually added to both the forward simulator and the optimization framework. This review article provides a summary description of methods and features implemented in iTOUGH2, and discusses the usefulness and limitations of an integrated simulation-optimization workflow in support of the characterization and analysis of complex multiphysics subsurface systems.
Evolution of Medication Administration Workflow in Implementing Electronic Health Record System

ERIC Educational Resources Information Center

Huang, Yuan-Han

2013-01-01

This study focused on the clinical workflow evolutions when implementing the health information technology (HIT). The study especially emphasized on administrating medication when the electronic health record (EHR) systems were adopted at rural healthcare facilities. Mixed-mode research methods, such as survey, observation, and focus group, were…
SU-F-T-251: The Quality Assurance for the Heavy Patient Load Department in the Developing Country: The Primary Experience of An Entire Workflow QA Process Management in Radiotherapy

DOE Office of Scientific and Technical Information (OSTI.GOV)

Xie, J; Wang, J; Peng, J

Purpose: To implement an entire workflow quality assurance (QA) process in the radiotherapy department and to reduce the error rates of radiotherapy based on the entire workflow management in the developing country. Methods: The entire workflow QA process management starts from patient registration to the end of last treatment including all steps through the entire radiotherapy process. Error rate of chartcheck is used to evaluate the the entire workflow QA process. Two to three qualified senior medical physicists checked the documents before the first treatment fraction of every patient. Random check of the treatment history during treatment was also performed.more » A total of around 6000 patients treatment data before and after implementing the entire workflow QA process were compared from May, 2014 to December, 2015. Results: A systemic checklist was established. It mainly includes patient’s registration, treatment plan QA, information exporting to OIS(Oncology Information System), documents of treatment QAand QA of the treatment history. The error rate derived from the chart check decreases from 1.7% to 0.9% after our the entire workflow QA process. All checked errors before the first treatment fraction were corrected as soon as oncologist re-confirmed them and reinforce staff training was accordingly followed to prevent those errors. Conclusion: The entire workflow QA process improved the safety, quality of radiotherapy in our department and we consider that our QA experience can be applicable for the heavily-loaded radiotherapy departments in developing country.« less
Proving Value in Radiology: Experience Developing and Implementing a Shareable Open Source Registry Platform Driven by Radiology Workflow.

PubMed

Gichoya, Judy Wawira; Kohli, Marc D; Haste, Paul; Abigail, Elizabeth Mills; Johnson, Matthew S

2017-10-01

Numerous initiatives are in place to support value based care in radiology including decision support using appropriateness criteria, quality metrics like radiation dose monitoring, and efforts to improve the quality of the radiology report for consumption by referring providers. These initiatives are largely data driven. Organizations can choose to purchase proprietary registry systems, pay for software as a service solution, or deploy/build their own registry systems. Traditionally, registries are created for a single purpose like radiation dosage or specific disease tracking like diabetes registry. This results in a fragmented view of the patient, and increases overhead to maintain such single purpose registry system by requiring an alternative data entry workflow and additional infrastructure to host and maintain multiple registries for different clinical needs. This complexity is magnified in the health care enterprise whereby radiology systems usually are run parallel to other clinical systems due to the different clinical workflow for radiologists. In the new era of value based care where data needs are increasing with demand for a shorter turnaround time to provide data that can be used for information and decision making, there is a critical gap to develop registries that are more adapt to the radiology workflow with minimal overhead on resources for maintenance and setup. We share our experience of developing and implementing an open source registry system for quality improvement and research in our academic institution that is driven by our radiology workflow.
Unrealized potential and residual consequences of electronic prescribing on pharmacy workflow in the outpatient pharmacy.

PubMed

Nanji, Karen C; Rothschild, Jeffrey M; Boehne, Jennifer J; Keohane, Carol A; Ash, Joan S; Poon, Eric G

2014-01-01

Electronic prescribing systems have often been promoted as a tool for reducing medication errors and adverse drug events. Recent evidence has revealed that adoption of electronic prescribing systems can lead to unintended consequences such as the introduction of new errors. The purpose of this study is to identify and characterize the unrealized potential and residual consequences of electronic prescribing on pharmacy workflow in an outpatient pharmacy. A multidisciplinary team conducted direct observations of workflow in an independent pharmacy and semi-structured interviews with pharmacy staff members about their perceptions of the unrealized potential and residual consequences of electronic prescribing systems. We used qualitative methods to iteratively analyze text data using a grounded theory approach, and derive a list of major themes and subthemes related to the unrealized potential and residual consequences of electronic prescribing. We identified the following five themes: Communication, workflow disruption, cost, technology, and opportunity for new errors. These contained 26 unique subthemes representing different facets of our observations and the pharmacy staff's perceptions of the unrealized potential and residual consequences of electronic prescribing. We offer targeted solutions to improve electronic prescribing systems by addressing the unrealized potential and residual consequences that we identified. These recommendations may be applied not only to improve staff perceptions of electronic prescribing systems but also to improve the design and/or selection of these systems in order to optimize communication and workflow within pharmacies while minimizing both cost and the potential for the introduction of new errors.
An Integrated Cyberenvironment for Event-Driven Environmental Observatory Research and Education

NASA Astrophysics Data System (ADS)

Myers, J.; Minsker, B.; Butler, R.

2006-12-01

National environmental observatories will soon provide large-scale data from diverse sensor networks and community models. While much attention is focused on piping data from sensors to archives and users, truly integrating these resources into the everyday research activities of scientists and engineers across the community, and enabling their results and innovations to be brought back into the observatory, also critical to long-term success of the observatories, is often neglected. This talk will give an overview of the Environmental Cyberinfrastructure Demonstrator (ECID) Cyberenvironment for observatory-centric environmental research and education, under development at the National Center for Supercomputing Applications (NCSA), which is designed to address these issues. Cyberenvironments incorporate collaboratory and grid technologies, web services, and other cyberinfrastructure into an overall framework that balances needs for efficient coordination and the ability to innovate. They are designed to support the full scientific lifecycle both in terms of individual experiments moving from data to workflows to publication and at the macro level where new discoveries lead to additional data, models, tools, and conceptual frameworks that augment and evolve community-scale systems such as observatories. The ECID cyberenvironment currently integrates five major components a collaborative portal, workflow engine, event manager, metadata repository, and social network personalization capabilities - that have novel features inspired by the Cyberenvironment concept and enabling powerful environmental research scenarios. A summary of these components and the overall cyberenvironment will be given in this talk, while other posters will give details on several of the components. The summary will be presented within the context of environmental use case scenarios created in collaboration with researchers from the WATERS (WATer and Environmental Research Systems) Network, a joint National Science Foundation-funded initiative of the hydrology and environmental engineering communities. The use case scenarios include identifying sensor anomalies in point- and streaming sensor data and notifying data managers in near-real time; and referring users of data or data products (e.g., workflows, publications) to related data or data products.
Developing science gateways for drug discovery in a grid environment.

PubMed

Pérez-Sánchez, Horacio; Rezaei, Vahid; Mezhuyev, Vitaliy; Man, Duhu; Peña-García, Jorge; den-Haan, Helena; Gesing, Sandra

2016-01-01

Methods for in silico screening of large databases of molecules increasingly complement and replace experimental techniques to discover novel compounds to combat diseases. As these techniques become more complex and computationally costly we are faced with an increasing problem to provide the research community of life sciences with a convenient tool for high-throughput virtual screening on distributed computing resources. To this end, we recently integrated the biophysics-based drug-screening program FlexScreen into a service, applicable for large-scale parallel screening and reusable in the context of scientific workflows. Our implementation is based on Pipeline Pilot and Simple Object Access Protocol and provides an easy-to-use graphical user interface to construct complex workflows, which can be executed on distributed computing resources, thus accelerating the throughput by several orders of magnitude.

Big Data Provenance: Challenges, State of the Art and Opportunities

PubMed Central

Wang, Jianwu; Crawl, Daniel; Purawat, Shweta; Nguyen, Mai; Altintas, Ilkay

2017-01-01

Ability to track provenance is a key feature of scientific workflows to support data lineage and reproducibility. The challenges that are introduced by the volume, variety and velocity of Big Data, also pose related challenges for provenance and quality of Big Data, defined as veracity. The increasing size and variety of distributed Big Data provenance information bring new technical challenges and opportunities throughout the provenance lifecycle including recording, querying, sharing and utilization. This paper discusses the challenges and opportunities of Big Data provenance related to the veracity of the datasets themselves and the provenance of the analytical processes that analyze these datasets. It also explains our current efforts towards tracking and utilizing Big Data provenance using workflows as a programming model to analyze Big Data. PMID:29399671
Reply to comment by Añel on "Most computational hydrology is not reproducible, so is it really science?"

NASA Astrophysics Data System (ADS)

Hutton, Christopher; Wagener, Thorsten; Freer, Jim; Han, Dawei; Duffy, Chris; Arheimer, Berit

2017-03-01

In this article, we reply to a comment made on our previous commentary regarding reproducibility in computational hydrology. Software licensing and version control of code are important technical aspects of making code and workflows of scientific experiments open and reproducible. However, in our view, it is the cultural change that is the greatest challenge to overcome to achieve reproducible scientific research in computational hydrology. We believe that from changing the culture and attitude among hydrological scientists, details will evolve to cover more (technical) aspects over time.
ObsPy: A Python Toolbox for Seismology

NASA Astrophysics Data System (ADS)

Wassermann, J. M.; Krischer, L.; Megies, T.; Barsch, R.; Beyreuther, M.

2013-12-01

Python combines the power of a full-blown programming language with the flexibility and accessibility of an interactive scripting language. Its extensive standard library and large variety of freely available high quality scientific modules cover most needs in developing scientific processing workflows. ObsPy is a community-driven, open-source project extending Python's capabilities to fit the specific needs that arise when working with seismological data. It a) comes with a continuously growing signal processing toolbox that covers most tasks common in seismological analysis, b) provides read and write support for many common waveform, station and event metadata formats and c) enables access to various data centers, webservices and databases to retrieve waveform data and station/event metadata. In combination with mature and free Python packages like NumPy, SciPy, Matplotlib, IPython, Pandas, lxml, and PyQt, ObsPy makes it possible to develop complete workflows in Python, ranging from reading locally stored data or requesting data from one or more different data centers via signal analysis and data processing to visualization in GUI and web applications, output of modified/derived data and the creation of publication-quality figures. All functionality is extensively documented and the ObsPy Tutorial and Gallery give a good impression of the wide range of possible use cases. ObsPy is tested and running on Linux, OS X and Windows and comes with installation routines for these systems. ObsPy is developed in a test-driven approach and is available under the LGPLv3 open source licence. Users are welcome to request help, report bugs, propose enhancements or contribute code via either the user mailing list or the project page on GitHub.
Automatically exposing OpenLifeData via SADI semantic Web Services.

PubMed

González, Alejandro Rodríguez; Callahan, Alison; Cruz-Toledo, José; Garcia, Adrian; Egaña Aranguren, Mikel; Dumontier, Michel; Wilkinson, Mark D

2014-01-01

Two distinct trends are emerging with respect to how data is shared, collected, and analyzed within the bioinformatics community. First, Linked Data, exposed as SPARQL endpoints, promises to make data easier to collect and integrate by moving towards the harmonization of data syntax, descriptive vocabularies, and identifiers, as well as providing a standardized mechanism for data access. Second, Web Services, often linked together into workflows, normalize data access and create transparent, reproducible scientific methodologies that can, in principle, be re-used and customized to suit new scientific questions. Constructing queries that traverse semantically-rich Linked Data requires substantial expertise, yet traditional RESTful or SOAP Web Services cannot adequately describe the content of a SPARQL endpoint. We propose that content-driven Semantic Web Services can enable facile discovery of Linked Data, independent of their location. We use a well-curated Linked Dataset - OpenLifeData - and utilize its descriptive metadata to automatically configure a series of more than 22,000 Semantic Web Services that expose all of its content via the SADI set of design principles. The OpenLifeData SADI services are discoverable via queries to the SHARE registry and easy to integrate into new or existing bioinformatics workflows and analytical pipelines. We demonstrate the utility of this system through comparison of Web Service-mediated data access with traditional SPARQL, and note that this approach not only simplifies data retrieval, but simultaneously provides protection against resource-intensive queries. We show, through a variety of different clients and examples of varying complexity, that data from the myriad OpenLifeData can be recovered without any need for prior-knowledge of the content or structure of the SPARQL endpoints. We also demonstrate that, via clients such as SHARE, the complexity of federated SPARQL queries is dramatically reduced.
Solutions for Mining Distributed Scientific Data

NASA Astrophysics Data System (ADS)

Lynnes, C.; Pham, L.; Graves, S.; Ramachandran, R.; Maskey, M.; Keiser, K.

2007-12-01

Researchers at the University of Alabama in Huntsville (UAH) and the Goddard Earth Sciences Data and Information Services Center (GES DISC) are working on approaches and methodologies facilitating the analysis of large amounts of distributed scientific data. Despite the existence of full-featured analysis tools, such as the Algorithm Development and Mining (ADaM) toolkit from UAH, and data repositories, such as the GES DISC, that provide online access to large amounts of data, there remain obstacles to getting the analysis tools and the data together in a workable environment. Does one bring the data to the tools or deploy the tools close to the data? The large size of many current Earth science datasets incurs significant overhead in network transfer for analysis workflows, even with the advanced networking capabilities that are available between many educational and government facilities. The UAH and GES DISC team are developing a capability to define analysis workflows using distributed services and online data resources. We are developing two solutions for this problem that address different analysis scenarios. The first is a Data Center Deployment of the analysis services for large data selections, orchestrated by a remotely defined analysis workflow. The second is a Data Mining Center approach of providing a cohesive analysis solution for smaller subsets of data. The two approaches can be complementary and thus provide flexibility for researchers to exploit the best solution for their data requirements. The Data Center Deployment of the analysis services has been implemented by deploying ADaM web services at the GES DISC so they can access the data directly, without the need of network transfers. Using the Mining Workflow Composer, a user can define an analysis workflow that is then submitted through a Web Services interface to the GES DISC for execution by a processing engine. The workflow definition is composed, maintained and executed at a distributed location, but most of the actual services comprising the workflow are available local to the GES DISC data repository. Additional refinements will ultimately provide a package that is easily implemented and configured at additional data centers for analysis of additional science data sets. Enhancements to the ADaM toolkit allow the staging of distributed data wherever the services are deployed, to support a Data Mining Center that can provide additional computational resources, large storage of output, easier addition and updates to available services, and access to data from multiple repositories. The Data Mining Center case provides researchers more flexibility to quickly try different workflow configurations and refine the process, using smaller amounts of data that may likely be transferred from distributed online repositories. This environment is sufficient for some analyses, but can also be used as an initial sandbox to test and refine a solution before staging the execution at a Data Center Deployment. Detection of airborne dust both over water and land in MODIS imagery using mining services for both solutions will be presented. The dust detection is just one possible example of the mining and analysis capabilities the proposed mining services solutions will provide to the science community. More information about the available services and the current status of this project is available at http://www.itsc.uah.edu/mws/
EPOS Data and Service Provision

NASA Astrophysics Data System (ADS)

Bailo, Daniele; Jeffery, Keith G.; Atakan, Kuvvet; Harrison, Matt

2017-04-01

EPOS is now in IP (implementation phase) after a successful PP (preparatory phase). EPOS consists of essentially two components, one ICS (Integrated Core Services) representing the integrating ICT (Information and Communication Technology) and many TCS (Thematic Core Services) representing the scientific domains. The architecture developed, demonstrated and agreed within the project during the PP is now being developed utilising co-design with the TCS teams and agile, spiral methods within the ICS team. The 'heart' of EPOS is the metadata catalog. This provides for the ICS a digital representation of the TCS assets (services, data, software, equipment, expertise…) thus facilitating access, interoperation and (re-)use. A major part of the work has been interactions with the TCS. The original intention to harvest information from the TCS required (and still requires) discussions to understand fully the TCS organisational structures linked with rights, security and privacy; their (meta)data syntax (structure) and semantics (meaning); their workflows and methods of working and the services offered. To complicate matters further the TCS are each at varying stages of development and the ICS design has to accommodate pre-existing, developing and expected future standards for metadata, data, software and processes. Through information documents, questionnaires and interviews/meetings the EPOS ICS team has collected DDSS (Data, Data Products, Software and Services) information from the TCS. The ICS team developed a simplified metadata model for presentation to the TCS and the ICS team will perform the mapping and conversion from this model to the internal detailed technical metadata model using (CERIF: a EU recommendation to Member States maintained, developed and promoted by euroCRIS www.eurocris.org ). At the time of writing the final modifications of the EPOS metadata model are being made, and the mappings to CERIF designed, prior to the main phase of (meta)data collection into the EPOS metadata catalog. In parallel work proceeds on the user interface softsare, the APIs (Application Programming Interfaces) to the TCS services, the harvesting method and software, the AAAI (Authentication, Authorisation, Accounting Infrastructure) and the system manager. The next steps will involve interfaces to ICS-D (Distributed ICS i.e. facilities and services for computing, data storage, detectors and instruments for data collection etc.) to which requests, software and data will be deployed and from which data will be generated. Associated with this will be the development of the workflow system which will assist the end-user in building a workflow to achieve the scientific objectives.
Cloud parallel processing of tandem mass spectrometry based proteomics data.

PubMed

Mohammed, Yassene; Mostovenko, Ekaterina; Henneman, Alex A; Marissen, Rob J; Deelder, André M; Palmblad, Magnus

2012-10-05

Data analysis in mass spectrometry based proteomics struggles to keep pace with the advances in instrumentation and the increasing rate of data acquisition. Analyzing this data involves multiple steps requiring diverse software, using different algorithms and data formats. Speed and performance of the mass spectral search engines are continuously improving, although not necessarily as needed to face the challenges of acquired big data. Improving and parallelizing the search algorithms is one possibility; data decomposition presents another, simpler strategy for introducing parallelism. We describe a general method for parallelizing identification of tandem mass spectra using data decomposition that keeps the search engine intact and wraps the parallelization around it. We introduce two algorithms for decomposing mzXML files and recomposing resulting pepXML files. This makes the approach applicable to different search engines, including those relying on sequence databases and those searching spectral libraries. We use cloud computing to deliver the computational power and scientific workflow engines to interface and automate the different processing steps. We show how to leverage these technologies to achieve faster data analysis in proteomics and present three scientific workflows for parallel database as well as spectral library search using our data decomposition programs, X!Tandem and SpectraST.
Modeling workflow to design machine translation applications for public health practice

PubMed Central

Turner, Anne M.; Brownstein, Megumu K.; Cole, Kate; Karasz, Hilary; Kirchhoff, Katrin

2014-01-01

Objective Provide a detailed understanding of the information workflow processes related to translating health promotion materials for limited English proficiency individuals in order to inform the design of context-driven machine translation (MT) tools for public health (PH). Materials and Methods We applied a cognitive work analysis framework to investigate the translation information workflow processes of two large health departments in Washington State. Researchers conducted interviews, performed a task analysis, and validated results with PH professionals to model translation workflow and identify functional requirements for a translation system for PH. Results The study resulted in a detailed description of work related to translation of PH materials, an information workflow diagram, and a description of attitudes towards MT technology. We identified a number of themes that hold design implications for incorporating MT in PH translation practice. A PH translation tool prototype was designed based on these findings. Discussion This study underscores the importance of understanding the work context and information workflow for which systems will be designed. Based on themes and translation information workflow processes, we identified key design guidelines for incorporating MT into PH translation work. Primary amongst these is that MT should be followed by human review for translations to be of high quality and for the technology to be adopted into practice. Counclusion The time and costs of creating multilingual health promotion materials are barriers to translation. PH personnel were interested in MT's potential to improve access to low-cost translated PH materials, but expressed concerns about ensuring quality. We outline design considerations and a potential machine translation tool to best fit MT systems into PH practice. PMID:25445922
System on Mobile Devices Middleware: Thinking beyond Basic Phones and PDAs

NASA Astrophysics Data System (ADS)

Prasad, Sushil K.

Several classes of emerging applications, spanning domains such as medical informatics, homeland security, mobile commerce, and scientific applications, are collaborative, and a significant portion of these will harness the capabilities of both the stable and mobile infrastructures (the “mobile grid”). Currently, it is possible to develop a collaborative application running on a collection of heterogeneous, possibly mobile, devices, each potentially hosting data stores, using existing middleware technologies such as JXTA, BREW, Compact .NET and J2ME. However, they require too many ad-hoc techniques as well as cumbersome and time-consuming programming. Our System on Mobile Devices (SyD) middleware, on the other hand, has a modular architecture that makes such application development very systematic and streamlined. The architecture supports transactions over mobile data stores, with a range of remote group invocation options and embedded interdependencies among such data store objects. The architecture further provides a persistent uniform object view, group transaction with Quality of Service (QoS) specifications, and XML vocabulary for inter-device communication. I will present the basic SyD concepts, introduce the architecture and the design of the SyD middleware and its components. We will discuss the basic performance figures of SyD components and a few SyD applications on PDAs. SyD platform has led to developments in distributed web service coordination and workflow technologies, which we will briefly discuss. There is a vital need to develop methodologies and systems to empower common users, such as computational scientists, for rapid development of such applications. Our BondFlow system enables rapid configuration and execution of workflows over web services. The small footprint of the system enables them to reside on Java-enabled handheld devices.
Scientific Utopia: An agenda for improving scientific communication (Invited)

NASA Astrophysics Data System (ADS)

Nosek, B.

2013-12-01

The scientist's primary incentive is publication. In the present culture, open practices do not increase chances of publication, and they often require additional work. Practicing the abstract scientific values of openness and reproducibility thus requires behaviors in addition to those relevant for the primary, concrete rewards. When in conflict, concrete rewards are likely to dominate over abstract ones. As a consequence, the reward structure for scientists does not encourage openness and reproducibility. This can be changed by nudging incentives to align scientific practices with scientific values. Science will benefit by creating and connecting technologies that nudge incentives while supporting and improving the scientific workflow. For example, it should be as easy to search the research literature for my topic as it is to search the Internet to find hilarious videos of cats falling off of furniture. I will introduce the Center for Open Science (http://centerforopenscience.org/) and efforts to improve openness and reproducibility such as http://openscienceframework.org/. There will be no cats.
Interactive Scripting for Analysis and Visualization of Arbitrarily Large, Disparately Located Climate Data Ensembles Using a Progressive Runtime Server

NASA Astrophysics Data System (ADS)

Christensen, C.; Summa, B.; Scorzelli, G.; Lee, J. W.; Venkat, A.; Bremer, P. T.; Pascucci, V.

2017-12-01

Massive datasets are becoming more common due to increasingly detailed simulations and higher resolution acquisition devices. Yet accessing and processing these huge data collections for scientific analysis is still a significant challenge. Solutions that rely on extensive data transfers are increasingly untenable and often impossible due to lack of sufficient storage at the client side as well as insufficient bandwidth to conduct such large transfers, that in some cases could entail petabytes of data. Large-scale remote computing resources can be useful, but utilizing such systems typically entails some form of offline batch processing with long delays, data replications, and substantial cost for any mistakes. Both types of workflows can severely limit the flexible exploration and rapid evaluation of new hypotheses that are crucial to the scientific process and thereby impede scientific discovery. In order to facilitate interactivity in both analysis and visualization of these massive data ensembles, we introduce a dynamic runtime system suitable for progressive computation and interactive visualization of arbitrarily large, disparately located spatiotemporal datasets. Our system includes an embedded domain-specific language (EDSL) that allows users to express a wide range of data analysis operations in a simple and abstract manner. The underlying runtime system transparently resolves issues such as remote data access and resampling while at the same time maintaining interactivity through progressive and interruptible processing. Computations involving large amounts of data can be performed remotely in an incremental fashion that dramatically reduces data movement, while the client receives updates progressively thereby remaining robust to fluctuating network latency or limited bandwidth. This system facilitates interactive, incremental analysis and visualization of massive remote datasets up to petabytes in size. Our system is now available for general use in the community through both docker and anaconda.
Lessons from implementing a combined workflow-informatics system for diabetes management.

PubMed

Zai, Adrian H; Grant, Richard W; Estey, Greg; Lester, William T; Andrews, Carl T; Yee, Ronnie; Mort, Elizabeth; Chueh, Henry C

2008-01-01

Shortcomings surrounding the care of patients with diabetes have been attributed largely to a fragmented, disorganized, and duplicative health care system that focuses more on acute conditions and complications than on managing chronic disease. To address these shortcomings, we developed a diabetes registry population management application to change the way our staff manages patients with diabetes. Use of this new application has helped us coordinate the responsibilities for intervening and monitoring patients in the registry among different users. Our experiences using this combined workflow-informatics intervention system suggest that integrating a chronic disease registry into clinical workflow for the treatment of chronic conditions creates a useful and efficient tool for managing disease.
Development and Appraisal of Multiple Accounting Record System (Mars).

PubMed

Yu, H C; Chen, M C

2016-01-01

The aim of the system is to achieve simplification of workflow, reduction of recording time, and increase the income for the study hospital. The project team decided to develop a multiple accounting record system that generates the account records based on the nursing records automatically, reduces the time and effort for nurses to review the procedure and provide another note of material consumption. Three configuration files were identified to demonstrate the relationship of treatments and reimbursement items. The workflow was simplified. The nurses averagely reduced 10 minutes of daily recording time, and the reimbursement points have been increased by 7.49%. The project streamlined the workflow and provides the institute a better way in finical management.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Duro, Francisco Rodrigo; Garcia Blas, Javier; Isaila, Florin

This paper explores novel techniques for improving the performance of many-task workflows based on the Swift scripting language. We propose novel programmer options for automated distributed data placement and task scheduling. These options trigger a data placement mechanism used for distributing intermediate workflow data over the servers of Hercules, a distributed key-value store that can be used to cache file system data. We demonstrate that these new mechanisms can significantly improve the aggregated throughput of many-task workflows with up to 86x, reduce the contention on the shared file system, exploit the data locality, and trade off locality and load balance.
Nexus: a modular workflow management system for quantum simulation codes

DOE PAGES

Krogel, Jaron T.

2015-08-24

The management of simulation workflows is a significant task for the individual computational researcher. Automation of the required tasks involved in simulation work can decrease the overall time to solution and reduce sources of human error. A new simulation workflow management system, Nexus, is presented to address these issues. Nexus is capable of automated job management on workstations and resources at several major supercomputing centers. Its modular design allows many quantum simulation codes to be supported within the same framework. Current support includes quantum Monte Carlo calculations with QMCPACK, density functional theory calculations with Quantum Espresso or VASP, and quantummore » chemical calculations with GAMESS. Users can compose workflows through a transparent, text-based interface, resembling the input file of a typical simulation code. A usage example is provided to illustrate the process.« less
Provenance Datasets Highlighting Capture Disparities

DTIC Science & Technology

2014-01-01

Vistrails [20], Taverna [21] or Kepler [6], and an OS -observing system like PASS [18]. In less granular workflow systems, the data files, scripts...run, etc. are capturable as long as they are executed within the workflow system. In more granular OS -observing systems, the actual reads, writes...rolling up” very granular information to less granular information. OS -level capture knows that a socket was opened and that data was sent to a foreign
JACOB: an enterprise framework for computational chemistry.

PubMed

Waller, Mark P; Dresselhaus, Thomas; Yang, Jack

2013-06-15

Here, we present just a collection of beans (JACOB): an integrated batch-based framework designed for the rapid development of computational chemistry applications. The framework expedites developer productivity by handling the generic infrastructure tier, and can be easily extended by user-specific scientific code. Paradigms from enterprise software engineering were rigorously applied to create a scalable, testable, secure, and robust framework. A centralized web application is used to configure and control the operation of the framework. The application-programming interface provides a set of generic tools for processing large-scale noninteractive jobs (e.g., systematic studies), or for coordinating systems integration (e.g., complex workflows). The code for the JACOB framework is open sourced and is available at: www.wallerlab.org/jacob. Copyright © 2013 Wiley Periodicals, Inc.
A Scheduling Algorithm for the Distributed Student Registration System in Transaction-Intensive Environment

ERIC Educational Resources Information Center

Li, Wenhao

2011-01-01

Distributed workflow technology has been widely used in modern education and e-business systems. Distributed web applications have shown cross-domain and cooperative characteristics to meet the need of current distributed workflow applications. In this paper, the author proposes a dynamic and adaptive scheduling algorithm PCSA (Pre-Calculated…
Data Integration Tool: Permafrost Data Debugging

NASA Astrophysics Data System (ADS)

Wilcox, H.; Schaefer, K. M.; Jafarov, E. E.; Pulsifer, P. L.; Strawhacker, C.; Yarmey, L.; Basak, R.

2017-12-01

We developed a Data Integration Tool (DIT) to significantly speed up the time of manual processing needed to translate inconsistent, scattered historical permafrost data into files ready to ingest directly into the Global Terrestrial Network-Permafrost (GTN-P). The United States National Science Foundation funded this project through the National Snow and Ice Data Center (NSIDC) with the GTN-P to improve permafrost data access and discovery. We leverage this data to support science research and policy decisions. DIT is a workflow manager that divides data preparation and analysis into a series of steps or operations called widgets (https://github.com/PermaData/DIT). Each widget does a specific operation, such as read, multiply by a constant, sort, plot, and write data. DIT allows the user to select and order the widgets as desired to meet their specific needs, incrementally interact with and evolve the widget workflows, and save those workflows for reproducibility. Taking ideas from visual programming found in the art and design domain, debugging and iterative design principles from software engineering, and the scientific data processing and analysis power of Fortran and Python it was written for interactive, iterative data manipulation, quality control, processing, and analysis of inconsistent data in an easily installable application. DIT was used to completely translate one dataset (133 sites) that was successfully added to GTN-P, nearly translate three datasets (270 sites), and is scheduled to translate 10 more datasets ( 1000 sites) from the legacy inactive site data holdings of the Frozen Ground Data Center (FGDC). Iterative development has provided the permafrost and wider scientific community with an extendable tool designed specifically for the iterative process of translating unruly data.
[Measures to prevent patient identification errors in blood collection/physiological function testing utilizing a laboratory information system].

PubMed

Shimazu, Chisato; Hoshino, Satoshi; Furukawa, Taiji

2013-08-01

We constructed an integrated personal identification workflow chart using both bar code reading and an all in-one laboratory information system. The information system not only handles test data but also the information needed for patient guidance in the laboratory department. The reception terminals at the entrance, displays for patient guidance and patient identification tools at blood-sampling booths are all controlled by the information system. The number of patient identification errors was greatly reduced by the system. However, identification errors have not been abolished in the ultrasound department. After re-evaluation of the patient identification process in this department, we recognized that the major reason for the errors came from excessive identification workflow. Ordinarily, an ultrasound test requires patient identification 3 times, because 3 different systems are required during the entire test process, i.e. ultrasound modality system, laboratory information system and a system for producing reports. We are trying to connect the 3 different systems to develop a one-time identification workflow, but it is not a simple task and has not been completed yet. Utilization of the laboratory information system is effective, but is not yet perfect for patient identification. The most fundamental procedure for patient identification is to ask a person's name even today. Everyday checks in the ordinary workflow and everyone's participation in safety-management activity are important for the prevention of patient identification errors.

Opal web services for biomedical applications.

PubMed

Ren, Jingyuan; Williams, Nadya; Clementi, Luca; Krishnan, Sriram; Li, Wilfred W

2010-07-01

Biomedical applications have become increasingly complex, and they often require large-scale high-performance computing resources with a large number of processors and memory. The complexity of application deployment and the advances in cluster, grid and cloud computing require new modes of support for biomedical research. Scientific Software as a Service (sSaaS) enables scalable and transparent access to biomedical applications through simple standards-based Web interfaces. Towards this end, we built a production web server (http://ws.nbcr.net) in August 2007 to support the bioinformatics application called MEME. The server has grown since to include docking analysis with AutoDock and AutoDock Vina, electrostatic calculations using PDB2PQR and APBS, and off-target analysis using SMAP. All the applications on the servers are powered by Opal, a toolkit that allows users to wrap scientific applications easily as web services without any modification to the scientific codes, by writing simple XML configuration files. Opal allows both web forms-based access and programmatic access of all our applications. The Opal toolkit currently supports SOAP-based Web service access to a number of popular applications from the National Biomedical Computation Resource (NBCR) and affiliated collaborative and service projects. In addition, Opal's programmatic access capability allows our applications to be accessed through many workflow tools, including Vision, Kepler, Nimrod/K and VisTrails. From mid-August 2007 to the end of 2009, we have successfully executed 239,814 jobs. The number of successfully executed jobs more than doubled from 205 to 411 per day between 2008 and 2009. The Opal-enabled service model is useful for a wide range of applications. It provides for interoperation with other applications with Web Service interfaces, and allows application developers to focus on the scientific tool and workflow development. Web server availability: http://ws.nbcr.net.
Barriers to effective, safe communication and workflow between nurses and non-consultant hospital doctors during out-of-hours.

PubMed

Brady, Anne-Marie; Byrne, Gobnait; Quirke, Mary Brigid; Lynch, Aine; Ennis, Shauna; Bhangu, Jaspreet; Prendergast, Meabh

2017-11-01

This study aimed to evaluate the nature and type of communication and workflow arrangements between nurses and doctors out-of-hours (OOH). Effective communication and workflow arrangements between nurses and doctors are essential to minimize risk in hospital settings, particularly in the out-of-hour's period. Timely patient flow is a priority for all healthcare organizations and the quality of communication and workflow arrangements influences patient safety. Qualitative descriptive design and data collection methods included focus groups and individual interviews. A 500 bed tertiary referral acute hospital in Ireland. Junior and senior Non-Consultant Hospital Doctors, staff nurses and nurse managers. Both nurses and doctors acknowledged the importance of good interdisciplinary communication and collaborative working, in sustaining effective workflow and enabling a supportive working environment and patient safety. Indeed, issues of safety and missed care OOH were found to be primarily due to difficulties of communication and workflow. Medical workflow OOH is often dependent on cues and communication to/from nursing. However, communication systems and, in particular the bleep system, considered central to the process of communication between doctors and nurses OOH, can contribute to workflow challenges and increased staff stress. It was reported as commonplace for routine work, that should be completed during normal hours, to fall into OOH when resources were most limited, further compounding risk to patient safety. Enhancement of communication strategies between nurses and doctors has the potential to remove barriers to effective decision-making and patient flow. © The Author 2017. Published by Oxford University Press in association with the International Society for Quality in Health Care. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com
Cloud-Hosted Real-time Data Services for the Geosciences (CHORDS)

NASA Astrophysics Data System (ADS)

Daniels, M. D.; Graves, S. J.; Vernon, F.; Kerkez, B.; Chandra, C. V.; Keiser, K.; Martin, C.

2014-12-01

Cloud-Hosted Real-time Data Services for the Geosciences (CHORDS) Access, utilization and management of real-time data continue to be challenging for decision makers, as well as researchers in several scientific fields. This presentation will highlight infrastructure aimed at addressing some of the gaps in handling real-time data, particularly in increasing accessibility of these data to the scientific community through cloud services. The Cloud-Hosted Real-time Data Services for the Geosciences (CHORDS) system addresses the ever-increasing importance of real-time scientific data, particularly in mission critical scenarios, where informed decisions must be made rapidly. Advances in the distribution of real-time data are leading many new transient phenomena in space-time to be observed, however real-time decision-making is infeasible in many cases that require streaming scientific data as these data are locked down and sent only to proprietary in-house tools or displays. This lack of accessibility to the broader scientific community prohibits algorithm development and workflows initiated by these data streams. As part of NSF's EarthCube initiative, CHORDS proposes to make real-time data available to the academic community via cloud services. The CHORDS infrastructure will enhance the role of real-time data within the geosciences, specifically expanding the potential of streaming data sources in enabling adaptive experimentation and real-time hypothesis testing. Adherence to community data and metadata standards will promote the integration of CHORDS real-time data with existing standards-compliant analysis, visualization and modeling tools.
The pipeline system for Octave and Matlab (PSOM): a lightweight scripting framework and execution engine for scientific workflows.

PubMed

Bellec, Pierre; Lavoie-Courchesne, Sébastien; Dickinson, Phil; Lerch, Jason P; Zijdenbos, Alex P; Evans, Alan C

2012-01-01

The analysis of neuroimaging databases typically involves a large number of inter-connected steps called a pipeline. The pipeline system for Octave and Matlab (PSOM) is a flexible framework for the implementation of pipelines in the form of Octave or Matlab scripts. PSOM does not introduce new language constructs to specify the steps and structure of the workflow. All steps of analysis are instead described by a regular Matlab data structure, documenting their associated command and options, as well as their input, output, and cleaned-up files. The PSOM execution engine provides a number of automated services: (1) it executes jobs in parallel on a local computing facility as long as the dependencies between jobs allow for it and sufficient resources are available; (2) it generates a comprehensive record of the pipeline stages and the history of execution, which is detailed enough to fully reproduce the analysis; (3) if an analysis is started multiple times, it executes only the parts of the pipeline that need to be reprocessed. PSOM is distributed under an open-source MIT license and can be used without restriction for academic or commercial projects. The package has no external dependencies besides Matlab or Octave, is straightforward to install and supports of variety of operating systems (Linux, Windows, Mac). We ran several benchmark experiments on a public database including 200 subjects, using a pipeline for the preprocessing of functional magnetic resonance images (fMRI). The benchmark results showed that PSOM is a powerful solution for the analysis of large databases using local or distributed computing resources.
The pipeline system for Octave and Matlab (PSOM): a lightweight scripting framework and execution engine for scientific workflows

PubMed Central

Bellec, Pierre; Lavoie-Courchesne, Sébastien; Dickinson, Phil; Lerch, Jason P.; Zijdenbos, Alex P.; Evans, Alan C.

2012-01-01

The analysis of neuroimaging databases typically involves a large number of inter-connected steps called a pipeline. The pipeline system for Octave and Matlab (PSOM) is a flexible framework for the implementation of pipelines in the form of Octave or Matlab scripts. PSOM does not introduce new language constructs to specify the steps and structure of the workflow. All steps of analysis are instead described by a regular Matlab data structure, documenting their associated command and options, as well as their input, output, and cleaned-up files. The PSOM execution engine provides a number of automated services: (1) it executes jobs in parallel on a local computing facility as long as the dependencies between jobs allow for it and sufficient resources are available; (2) it generates a comprehensive record of the pipeline stages and the history of execution, which is detailed enough to fully reproduce the analysis; (3) if an analysis is started multiple times, it executes only the parts of the pipeline that need to be reprocessed. PSOM is distributed under an open-source MIT license and can be used without restriction for academic or commercial projects. The package has no external dependencies besides Matlab or Octave, is straightforward to install and supports of variety of operating systems (Linux, Windows, Mac). We ran several benchmark experiments on a public database including 200 subjects, using a pipeline for the preprocessing of functional magnetic resonance images (fMRI). The benchmark results showed that PSOM is a powerful solution for the analysis of large databases using local or distributed computing resources. PMID:22493575
From chart tracking to workflow management.

PubMed Central

Srinivasan, P.; Vignes, G.; Venable, C.; Hazelwood, A.; Cade, T.

1994-01-01

The current interest in system-wide integration appears to be based on the assumption that an organization, by digitizing information and accepting a common standard for the exchange of such information, will improve the accessibility of this information and automatically experience benefits resulting from its more productive use. We do not dispute this reasoning, but assert that an organization's capacity for effective change is proportional to the understanding of the current structure among its personnel. Our workflow manager is based on the use of a Parameterized Petri Net (PPN) model which can be configured to represent an arbitrarily detailed picture of an organization. The PPN model can be animated to observe the model organization in action, and the results of the animation analyzed. This simulation is a dynamic ongoing process which changes with the system and allows members of the organization to pose "what if" questions as a means of exploring opportunities for change. We present, the "workflow management system" as the natural successor to the tracking program, incorporating modeling, scheduling, reactive planning, performance evaluation, and simulation. This workflow management system is more than adequate for meeting the needs of a paper chart tracking system, and, as the patient record is computerized, will serve as a planning and evaluation tool in converting the paper-based health information system into a computer-based system. PMID:7950051
Enabling Efficient Climate Science Workflows in High Performance Computing Environments

NASA Astrophysics Data System (ADS)

Krishnan, H.; Byna, S.; Wehner, M. F.; Gu, J.; O'Brien, T. A.; Loring, B.; Stone, D. A.; Collins, W.; Prabhat, M.; Liu, Y.; Johnson, J. N.; Paciorek, C. J.

2015-12-01

A typical climate science workflow often involves a combination of acquisition of data, modeling, simulation, analysis, visualization, publishing, and storage of results. Each of these tasks provide a myriad of challenges when running on a high performance computing environment such as Hopper or Edison at NERSC. Hurdles such as data transfer and management, job scheduling, parallel analysis routines, and publication require a lot of forethought and planning to ensure that proper quality control mechanisms are in place. These steps require effectively utilizing a combination of well tested and newly developed functionality to move data, perform analysis, apply statistical routines, and finally, serve results and tools to the greater scientific community. As part of the CAlibrated and Systematic Characterization, Attribution and Detection of Extremes (CASCADE) project we highlight a stack of tools our team utilizes and has developed to ensure that large scale simulation and analysis work are commonplace and provide operations that assist in everything from generation/procurement of data (HTAR/Globus) to automating publication of results to portals like the Earth Systems Grid Federation (ESGF), all while executing everything in between in a scalable environment in a task parallel way (MPI). We highlight the use and benefit of these tools by showing several climate science analysis use cases they have been applied to.
AtomPy: an open atomic-data curation environment

NASA Astrophysics Data System (ADS)

Bautista, Manuel; Mendoza, Claudio; Boswell, Josiah S; Ajoku, Chukwuemeka

2014-06-01

We present a cloud-computing environment for atomic data curation, networking among atomic data providers and users, teaching-and-learning, and interfacing with spectral modeling software. The system is based on Google-Drive Sheets, Pandas (Python Data Analysis Library) DataFrames, and IPython Notebooks for open community-driven curation of atomic data for scientific and technological applications. The atomic model for each ionic species is contained in a multi-sheet Google-Drive workbook, where the atomic parameters from all known public sources are progressively stored. Metadata (provenance, community discussion, etc.) accompanying every entry in the database are stored through Notebooks. Education tools on the physics of atomic processes as well as their relevance to plasma and spectral modeling are based on IPython Notebooks that integrate written material, images, videos, and active computer-tool workflows. Data processing workflows and collaborative software developments are encouraged and managed through the GitHub social network. Relevant issues this platform intends to address are: (i) data quality by allowing open access to both data producers and users in order to attain completeness, accuracy, consistency, provenance and currentness; (ii) comparisons of different datasets to facilitate accuracy assessment; (iii) downloading to local data structures (i.e. Pandas DataFrames) for further manipulation and analysis by prospective users; and (iv) data preservation by avoiding the discard of outdated sets.
Multiple hybrid de novo genome assembly of finger millet, an orphan allotetraploid crop

PubMed Central

Hatakeyama, Masaomi; Aluri, Sirisha; Balachadran, Mathi Thumilan; Sivarajan, Sajeevan Radha; Patrignani, Andrea; Grüter, Simon; Poveda, Lucy; Shimizu-Inatsugi, Rie; Baeten, John; Francoijs, Kees-Jan; Nataraja, Karaba N; Reddy, Yellodu A Nanja; Phadnis, Shamprasad; Ravikumar, Ramapura L; Schlapbach, Ralph; Sreeman, Sheshshayee M; Shimizu, Kentaro K

2018-01-01

Abstract Finger millet (Eleusine coracana (L.) Gaertn) is an important crop for food security because of its tolerance to drought, which is expected to be exacerbated by global climate changes. Nevertheless, it is often classified as an orphan/underutilized crop because of the paucity of scientific attention. Among several small millets, finger millet is considered as an excellent source of essential nutrient elements, such as iron and zinc; hence, it has potential as an alternate coarse cereal. However, high-quality genome sequence data of finger millet are currently not available. One of the major problems encountered in the genome assembly of this species was its polyploidy, which hampers genome assembly compared with a diploid genome. To overcome this problem, we sequenced its genome using diverse technologies with sufficient coverage and assembled it via a novel multiple hybrid assembly workflow that combines next-generation with single-molecule sequencing, followed by whole-genome optical mapping using the Bionano Irys® system. The total number of scaffolds was 1,897 with an N50 length >2.6 Mb and detection of 96% of the universal single-copy orthologs. The majority of the homeologs were assembled separately. This indicates that the proposed workflow is applicable to the assembly of other allotetraploid genomes. PMID:28985356
Automated Finite State Workflow for Distributed Data Production

NASA Astrophysics Data System (ADS)

Hajdu, L.; Didenko, L.; Lauret, J.; Amol, J.; Betts, W.; Jang, H. J.; Noh, S. Y.

2016-10-01

In statistically hungry science domains, data deluges can be both a blessing and a curse. They allow the narrowing of statistical errors from known measurements, and open the door to new scientific opportunities as research programs mature. They are also a testament to the efficiency of experimental operations. However, growing data samples may need to be processed with little or no opportunity for huge increases in computing capacity. A standard strategy has thus been to share resources across multiple experiments at a given facility. Another has been to use middleware that “glues” resources across the world so they are able to locally run the experimental software stack (either natively or virtually). We describe a framework STAR has successfully used to reconstruct a ~400 TB dataset consisting of over 100,000 jobs submitted to a remote site in Korea from STAR's Tier 0 facility at the Brookhaven National Laboratory. The framework automates the full workflow, taking raw data files from tape and writing Physics-ready output back to tape without operator or remote site intervention. Through hardening we have demonstrated 97(±2)% efficiency, over a period of 7 months of operation. The high efficiency is attributed to finite state checking with retries to encourage resilience in the system over capricious and fallible infrastructure.
Nonlinear filtering techniques for noisy geophysical data: Using big data to predict the future

NASA Astrophysics Data System (ADS)

Moore, J. M.

2014-12-01

Chaos is ubiquitous in physical systems. Within the Earth sciences it is readily evident in seismology, groundwater flows and drilling data. Models and workflows have been applied successfully to understand and even to predict chaotic systems in other scientific fields, including electrical engineering, neurology and oceanography. Unfortunately, the high levels of noise characteristic of our planet's chaotic processes often render these frameworks ineffective. This contribution presents techniques for the reduction of noise associated with measurements of nonlinear systems. Our ultimate aim is to develop data assimilation techniques for forward models that describe chaotic observations, such as episodic tremor and slip (ETS) events in fault zones. A series of nonlinear filters are presented and evaluated using classical chaotic systems. To investigate whether the filters can successfully mitigate the effect of noise typical of Earth science, they are applied to sunspot data. The filtered data can be used successfully to forecast sunspot evolution for up to eight years (see figure).
Data management and data enrichment for systems biology projects.

PubMed

Wittig, Ulrike; Rey, Maja; Weidemann, Andreas; Müller, Wolfgang

2017-11-10

Collecting, curating, interlinking, and sharing high quality data are central to de.NBI-SysBio, the systems biology data management service center within the de.NBI network (German Network for Bioinformatics Infrastructure). The work of the center is guided by the FAIR principles for scientific data management and stewardship. FAIR stands for the four foundational principles Findability, Accessibility, Interoperability, and Reusability which were established to enhance the ability of machines to automatically find, access, exchange and use data. Within this overview paper we describe three tools (SABIO-RK, Excemplify, SEEK) that exemplify the contribution of de.NBI-SysBio services to FAIR data, models, and experimental methods storage and exchange. The interconnectivity of the tools and the data workflow within systems biology projects will be explained. For many years we are the German partner in the FAIRDOM initiative (http://fair-dom.org) to establish a European data and model management service facility for systems biology. Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.
Policy Driven Development: Flexible Policy Insertion for Large Scale Systems.

PubMed

Demchak, Barry; Krüger, Ingolf

2012-07-01

The success of a software system depends critically on how well it reflects and adapts to stakeholder requirements. Traditional development methods often frustrate stakeholders by creating long latencies between requirement articulation and system deployment, especially in large scale systems. One source of latency is the maintenance of policy decisions encoded directly into system workflows at development time, including those involving access control and feature set selection. We created the Policy Driven Development (PDD) methodology to address these development latencies by enabling the flexible injection of decision points into existing workflows at runtime , thus enabling policy composition that integrates requirements furnished by multiple, oblivious stakeholder groups. Using PDD, we designed and implemented a production cyberinfrastructure that demonstrates policy and workflow injection that quickly implements stakeholder requirements, including features not contemplated in the original system design. PDD provides a path to quickly and cost effectively evolve such applications over a long lifetime.
Drug discovery chemistry: a primer for the non-specialist.

PubMed

Jordan, Allan M; Roughley, Stephen D

2009-08-01

Like all scientific disciplines, drug discovery chemistry is rife with terminology and methodology that can seem intractable to those outside the sphere of synthetic chemistry. Derived from a successful in-house workshop, this Foundation Review aims to demystify some of this inherent terminology, providing the non-specialist with a general insight into the nomenclature, terminology and workflow of medicinal chemists within the pharmaceutical industry.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Garzoglio, Gabriele

The Fermilab Grid and Cloud Computing Department and the KISTI Global Science experimental Data hub Center propose a joint project. The goals are to enable scientific workflows of stakeholders to run on multiple cloud resources by use of (a) Virtual Infrastructure Automation and Provisioning, (b) Interoperability and Federat ion of Cloud Resources , and (c) High-Throughput Fabric Virtualization. This is a matching fund project in which Fermilab and KISTI will contribute equal resources .
Confidentiality Protection of User Data and Adaptive Resource Allocation for Managing Multiple Workflow Performance in Service-Based Systems

ERIC Educational Resources Information Center

An, Ho

2012-01-01

In this dissertation, two interrelated problems of service-based systems (SBS) are addressed: protecting users' data confidentiality from service providers, and managing performance of multiple workflows in SBS. Current SBSs pose serious limitations to protecting users' data confidentiality. Since users' sensitive data is sent in…
Changes in the cardiac rehabilitation workflow process needed for the implementation of a self-management system.

PubMed

Wiggers, Anne-Marieke; Vosbergen, Sandra; Kraaijenhagen, Roderik; Jaspers, Monique; Peek, Niels

2013-01-01

E-health interventions are of a growing importance for self-management of chronic conditions. This study aimed to describe the process adaptions that are needed in cardiac rehabilitation (CR) to implement a self-management system, called MyCARDSS. We created a generic workflow model based on interviews and observations at three CR clinics. Subsequently, a workflow model of the ideal situation after implementation of MyCARDSS was created. We found that the implementation will increase the complexity of existing working procedures because 1) not all patients will use MyCARDSS, 2) there is a transfer of tasks and responsibilities from professionals to patients, and 3) information in MyCARDSS needs to be synchronized with the EPR system for professionals.
Applied and implied semantics in crystallographic publishing

PubMed Central

2012-01-01

Background Crystallography is a data-rich, software-intensive scientific discipline with a community that has undertaken direct responsibility for publishing its own scientific journals. That community has worked actively to develop information exchange standards allowing readers of structure reports to access directly, and interact with, the scientific content of the articles. Results Structure reports submitted to some journals of the International Union of Crystallography (IUCr) can be automatically validated and published through an efficient and cost-effective workflow. Readers can view and interact with the structures in three-dimensional visualization applications, and can access the experimental data should they wish to perform their own independent structure solution and refinement. The journals also layer on top of this facility a number of automated annotations and interpretations to add further scientific value. Conclusions The benefits of semantically rich information exchange standards have revolutionised the scholarly publishing process for crystallography, and establish a model relevant to many other physical science disciplines. PMID:22932420
BP-Broker use-cases in the UncertWeb framework

NASA Astrophysics Data System (ADS)

Roncella, Roberto; Bigagli, Lorenzo; Schulz, Michael; Stasch, Christoph; Proß, Benjamin; Jones, Richard; Santoro, Mattia

2013-04-01

The UncertWeb framework is a distributed, Web-based Information and Communication Technology (ICT) system to support scientific data modeling in presence of uncertainty. We designed and prototyped a core component of the UncertWeb framework: the Business Process Broker. The BP-Broker implements several functionalities, such as: discovery of available processes/BPs, preprocessing of a BP into its executable form (EBP), publication of EBPs and their execution through a workflow-engine. According to the Composition-as-a-Service (CaaS) approach, the BP-Broker supports discovery and chaining of modeling resources (and processing resources in general), providing the necessary interoperability services for creating, validating, editing, storing, publishing, and executing scientific workflows. The UncertWeb project targeted several scenarios, which were used to evaluate and test the BP-Broker. The scenarios cover the following environmental application domains: biodiversity and habitat change, land use and policy modeling, local air quality forecasting, and individual activity in the environment. This work reports on the study of a number of use-cases, by means of the BP-Broker, namely: - eHabitat use-case: implements a Monte Carlo simulation performed on a deterministic ecological model; an extended use-case supports inter-comparison of model outputs; - FERA use-case: is composed of a set of models for predicting land-use and crop yield response to climatic and economic change; - NILU use-case: is composed of a Probabilistic Air Quality Forecasting model for predicting concentrations of air pollutants; - Albatross use-case: includes two model services for simulating activity-travel patterns of individuals in time and space; - Overlay use-case: integrates the NILU scenario with the Albatross scenario to calculate the exposure to air pollutants of individuals. Our aim was to prove the feasibility of describing composite modeling processes with a high-level, abstract notation (i.e. BPMN 2.0), and delegating the resolution of technical issues (e.g. I/O matching) as much as possible to an external service. The results of the experimented solution indicate that this approach facilitates the integration of environmental model workflows into the standard geospatial Web Services framework (e.g. the GEOSS Common Infrastructure), mitigating its inherent complexity. The research leading to these results has received funding from the European Community's Seventh Framework Programme (FP7/2007-2013) under Grant Agreement n° 248488.
The Perfect Neuroimaging-Genetics-Computation Storm: Collision of Petabytes of Data, Millions of Hardware Devices and Thousands of Software Tools

PubMed Central

Dinov, Ivo D.; Petrosyan, Petros; Liu, Zhizhong; Eggert, Paul; Zamanyan, Alen; Torri, Federica; Macciardi, Fabio; Hobel, Sam; Moon, Seok Woo; Sung, Young Hee; Jiang, Zhiguo; Labus, Jennifer; Kurth, Florian; Ashe-McNalley, Cody; Mayer, Emeran; Vespa, Paul M.; Van Horn, John D.; Toga, Arthur W.

2013-01-01

The volume, diversity and velocity of biomedical data are exponentially increasing providing petabytes of new neuroimaging and genetics data every year. At the same time, tens-of-thousands of computational algorithms are developed and reported in the literature along with thousands of software tools and services. Users demand intuitive, quick and platform-agnostic access to data, software tools, and infrastructure from millions of hardware devices. This explosion of information, scientific techniques, computational models, and technological advances leads to enormous challenges in data analysis, evidence-based biomedical inference and reproducibility of findings. The Pipeline workflow environment provides a crowd-based distributed solution for consistent management of these heterogeneous resources. The Pipeline allows multiple (local) clients and (remote) servers to connect, exchange protocols, control the execution, monitor the states of different tools or hardware, and share complete protocols as portable XML workflows. In this paper, we demonstrate several advanced computational neuroimaging and genetics case-studies, and end-to-end pipeline solutions. These are implemented as graphical workflow protocols in the context of analyzing imaging (sMRI, fMRI, DTI), phenotypic (demographic, clinical), and genetic (SNP) data. PMID:23975276

An automated curation procedure for addressing chemical errors and inconsistencies in public datasets used in QSAR modelling.

PubMed

Mansouri, K; Grulke, C M; Richard, A M; Judson, R S; Williams, A J

2016-11-01

The increasing availability of large collections of chemical structures and associated experimental data provides an opportunity to build robust QSAR models for applications in different fields. One common concern is the quality of both the chemical structure information and associated experimental data. Here we describe the development of an automated KNIME workflow to curate and correct errors in the structure and identity of chemicals using the publicly available PHYSPROP physicochemical properties and environmental fate datasets. The workflow first assembles structure-identity pairs using up to four provided chemical identifiers, including chemical name, CASRNs, SMILES, and MolBlock. Problems detected included errors and mismatches in chemical structure formats, identifiers and various structure validation issues, including hypervalency and stereochemistry descriptions. Subsequently, a machine learning procedure was applied to evaluate the impact of this curation process. The performance of QSAR models built on only the highest-quality subset of the original dataset was compared with the larger curated and corrected dataset. The latter showed statistically improved predictive performance. The final workflow was used to curate the full list of PHYSPROP datasets, and is being made publicly available for further usage and integration by the scientific community.
Separating Business Logic from Medical Knowledge in Digital Clinical Workflows Using Business Process Model and Notation and Arden Syntax.

PubMed

de Bruin, Jeroen S; Adlassnig, Klaus-Peter; Leitich, Harald; Rappelsberger, Andrea

2018-01-01

Evidence-based clinical guidelines have a major positive effect on the physician's decision-making process. Computer-executable clinical guidelines allow for automated guideline marshalling during a clinical diagnostic process, thus improving the decision-making process. Implementation of a digital clinical guideline for the prevention of mother-to-child transmission of hepatitis B as a computerized workflow, thereby separating business logic from medical knowledge and decision-making. We used the Business Process Model and Notation language system Activiti for business logic and workflow modeling. Medical decision-making was performed by an Arden-Syntax-based medical rule engine, which is part of the ARDENSUITE software. We succeeded in creating an electronic clinical workflow for the prevention of mother-to-child transmission of hepatitis B, where institution-specific medical decision-making processes could be adapted without modifying the workflow business logic. Separation of business logic and medical decision-making results in more easily reusable electronic clinical workflows.
Facebook for Scientists: Requirements and Services for Optimizing How Scientific Collaborations Are Established

PubMed Central

Spallek, Heiko; Butler, Brian S; Subramanian, Sushmita; Weiss, Daniel; Poythress, M Louisa; Rattanathikun, Phijarana; Mueller, Gregory

2008-01-01

Background As biomedical research projects become increasingly interdisciplinary and complex, collaboration with appropriate individuals, teams, and institutions becomes ever more crucial to project success. While social networks are extremely important in determining how scientific collaborations are formed, social networking technologies have not yet been studied as a tool to help form scientific collaborations. Many currently emerging expertise locating systems include social networking technologies, but it is unclear whether they make the process of finding collaborators more efficient and effective. Objective This study was conducted to answer the following questions: (1) Which requirements should systems for finding collaborators in biomedical science fulfill? and (2) Which information technology services can address these requirements? Methods The background research phase encompassed a thorough review of the literature, affinity diagramming, contextual inquiry, and semistructured interviews. This phase yielded five themes suggestive of requirements for systems to support the formation of collaborations. In the next phase, the generative phase, we brainstormed and selected design ideas for formal concept validation with end users. Then, three related, well-validated ideas were selected for implementation and evaluation in a prototype. Results Five main themes of systems requirements emerged: (1) beyond expertise, successful collaborations require compatibility with respect to personality, work style, productivity, and many other factors (compatibility); (2) finding appropriate collaborators requires the ability to effectively search in domains other than your own using information that is comprehensive and descriptive (communication); (3) social networks are important for finding potential collaborators, assessing their suitability and compatibility, and establishing contact with them (intermediation); (4) information profiles must be complete, correct, up-to-date, and comprehensive and allow fine-grained control over access to information by different audiences (information quality and access); (5) keeping online profiles up-to-date should require little or no effort and be integrated into the scientist’s existing workflow (motivation). Based on the requirements, 16 design ideas underwent formal validation with end users. Of those, three were chosen to be implemented and evaluated in a system prototype, “Digital|Vita”: maintaining, formatting, and semi-automated updating of biographical information; searching for experts; and building and maintaining the social network and managing document flow. Conclusions In addition to quantitative and factual information about potential collaborators, social connectedness, personal and professional compatibility, and power differentials also influence whether collaborations are formed. Current systems only partially model these requirements. Services in Digital|Vita combine an existing workflow, maintaining and formatting biographical information, with collaboration-searching functions in a novel way. Several barriers to the adoption of systems such as Digital|Vita exist, such as potential adoption asymmetries between junior and senior researchers and the tension between public and private information. Developers and researchers may consider one or more of the services described in this paper for implementation in their own expertise locating systems. PMID:18701421
Highlights of X-Stack ExM Deliverable: MosaStore

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ripeanu, Matei

2016-07-20

This brief report highlights the experience gained with MosaStore, an exploratory part of the X-Stack project “ExM: System support for extreme-scale, many-task applications”. The ExM project proposed to use concurrent workflows supported by the Swift language and runtime as an innovative programming model to exploit parallelism in exascale computers. MosaStore aims to support this endeavor by improving storage support for workflow-based applications, more precisely by exploring the gains that can be obtained from co-designing the storage system and the workflow runtime engine. MosaStore has been developed primarily at the University of British Columbia.
Real-Time Field Data Acquisition and Remote Sensor Reconfiguration Using Scientific Workflows

NASA Astrophysics Data System (ADS)

Silva, F.; Mehta, G.; Vahi, K.; Deelman, E.

2010-12-01

Despite many technological advances, field data acquisition still consists of several manual and laborious steps. Once sensors and data loggers are deployed in the field, scientists often have to periodically return to their study sites in order to collect their data. Even when field deployments have a way to communicate and transmit data back to the laboratory (e.g. by using a satellite or a cellular modem), data analysis still requires several repetitive steps. Because data often needs to be processed and inspected manually, there is usually a significant time delay between data collection and analysis. As a result, sensor failures that could be detected almost in real-time are not noted for weeks or months. Finally, sensor reconfiguration as a result of interesting events in the field is still done manually, making rapid response nearly impossible and causing important data to be missed. By working closely with scientists from different application domains, we identified several tasks that, if automated, could greatly improve the way field data is collected, processed, and distributed. Our goals are to enable real-time data collection and validation, automate sensor reconfiguration in response to interest events in the field, and allow scientists to easily automate their data processing. We began our design by employing the Sensor Processing and Acquisition Network (SPAN) architecture. SPAN uses an embedded processor in the field to coordinate sensor data acquisition from analog and digital sensors by interfacing with different types of devices and data loggers. SPAN is also able to interact with various types of communication devices in order to provide real-time communication to and from field sites. We use the Pegasus Workflow Management System (Pegasus WMS) to coordinate data collection and control sensors and deployments in the field. Because scientific workflows can be used to automate multi-step, repetitive tasks, scientists can create simple workflows to download sensor data, perform basic QA/QC, and identify events of interest as well as sensor and data logger failures almost in real-time. As a result of this automation, scientists can quickly be notified (e.g. via e-mail or SMS) so that important events are not missed. In addition, Pegasus WMS has the ability to abstract the execution environment of where programs run. By placing a Pegasus WMS agent inside an embedded processor in the field, we allow scientists to ship simple computational models to the field, enabling remote data processing at the field site. As an example, scientists can send an image processing algorithm to the field so that the embedded processor can analyze images, thus reducing the bandwidth necessary for communication. In addition, when real-time communication to the laboratory is not possible, scientists can create simple computational models that can be run on sensor nodes autonomously, monitoring sensor data and making adjustments without any human intervention. We believe our system lowers the bar for the adoption of reconfigurable sensor networks by field scientists. In this poster, we will show how this technology can be used to provide not only data acquisition, but also real-time data validation and sensor reconfiguration.
COSMOS: Python library for massively parallel workflows

PubMed Central

Gafni, Erik; Luquette, Lovelace J.; Lancaster, Alex K.; Hawkins, Jared B.; Jung, Jae-Yoon; Souilmi, Yassine; Wall, Dennis P.; Tonellato, Peter J.

2014-01-01

Summary: Efficient workflows to shepherd clinically generated genomic data through the multiple stages of a next-generation sequencing pipeline are of critical importance in translational biomedical science. Here we present COSMOS, a Python library for workflow management that allows formal description of pipelines and partitioning of jobs. In addition, it includes a user interface for tracking the progress of jobs, abstraction of the queuing system and fine-grained control over the workflow. Workflows can be created on traditional computing clusters as well as cloud-based services. Availability and implementation: Source code is available for academic non-commercial research purposes. Links to code and documentation are provided at http://lpm.hms.harvard.edu and http://wall-lab.stanford.edu. Contact: dpwall@stanford.edu or peter_tonellato@hms.harvard.edu. Supplementary information: Supplementary data are available at Bioinformatics online. PMID:24982428
COSMOS: Python library for massively parallel workflows.

PubMed

Gafni, Erik; Luquette, Lovelace J; Lancaster, Alex K; Hawkins, Jared B; Jung, Jae-Yoon; Souilmi, Yassine; Wall, Dennis P; Tonellato, Peter J

2014-10-15

Efficient workflows to shepherd clinically generated genomic data through the multiple stages of a next-generation sequencing pipeline are of critical importance in translational biomedical science. Here we present COSMOS, a Python library for workflow management that allows formal description of pipelines and partitioning of jobs. In addition, it includes a user interface for tracking the progress of jobs, abstraction of the queuing system and fine-grained control over the workflow. Workflows can be created on traditional computing clusters as well as cloud-based services. Source code is available for academic non-commercial research purposes. Links to code and documentation are provided at http://lpm.hms.harvard.edu and http://wall-lab.stanford.edu. dpwall@stanford.edu or peter_tonellato@hms.harvard.edu. Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press.
Web-based interactive visualization in a Grid-enabled neuroimaging application using HTML5.

PubMed

Siewert, René; Specovius, Svenja; Wu, Jie; Krefting, Dagmar

2012-01-01

Interactive visualization and correction of intermediate results are required in many medical image analysis pipelines. To allow certain interaction in the remote execution of compute- and data-intensive applications, new features of HTML5 are used. They allow for transparent integration of user interaction into Grid- or Cloud-enabled scientific workflows. Both 2D and 3D visualization and data manipulation can be performed through a scientific gateway without the need to install specific software or web browser plugins. The possibilities of web-based visualization are presented along the FreeSurfer-pipeline, a popular compute- and data-intensive software tool for quantitative neuroimaging.
Structuring clinical workflows for diabetes care: an overview of the OntoHealth approach.

PubMed

Schweitzer, M; Lasierra, N; Oberbichler, S; Toma, I; Fensel, A; Hoerbst, A

2014-01-01

Electronic health records (EHRs) play an important role in the treatment of chronic diseases such as diabetes mellitus. Although the interoperability and selected functionality of EHRs are already addressed by a number of standards and best practices, such as IHE or HL7, the majority of these systems are still monolithic from a user-functionality perspective. The purpose of the OntoHealth project is to foster a functionally flexible, standards-based use of EHRs to support clinical routine task execution by means of workflow patterns and to shift the present EHR usage to a more comprehensive integration concerning complete clinical workflows. The goal of this paper is, first, to introduce the basic architecture of the proposed OntoHealth project and, second, to present selected functional needs and a functional categorization regarding workflow-based interactions with EHRs in the domain of diabetes. A systematic literature review regarding attributes of workflows in the domain of diabetes was conducted. Eligible references were gathered and analyzed using a qualitative content analysis. Subsequently, a functional workflow categorization was derived from diabetes-specific raw data together with existing general workflow patterns. This paper presents the design of the architecture as well as a categorization model which makes it possible to describe the components or building blocks within clinical workflows. The results of our study lead us to identify basic building blocks, named as actions, decisions, and data elements, which allow the composition of clinical workflows within five identified contexts. The categorization model allows for a description of the components or building blocks of clinical workflows from a functional view.
Structuring Clinical Workflows for Diabetes Care

PubMed Central

Lasierra, N.; Oberbichler, S.; Toma, I.; Fensel, A.; Hoerbst, A.

2014-01-01

Summary Background Electronic health records (EHRs) play an important role in the treatment of chronic diseases such as diabetes mellitus. Although the interoperability and selected functionality of EHRs are already addressed by a number of standards and best practices, such as IHE or HL7, the majority of these systems are still monolithic from a user-functionality perspective. The purpose of the OntoHealth project is to foster a functionally flexible, standards-based use of EHRs to support clinical routine task execution by means of workflow patterns and to shift the present EHR usage to a more comprehensive integration concerning complete clinical workflows. Objectives The goal of this paper is, first, to introduce the basic architecture of the proposed OntoHealth project and, second, to present selected functional needs and a functional categorization regarding workflow-based interactions with EHRs in the domain of diabetes. Methods A systematic literature review regarding attributes of workflows in the domain of diabetes was conducted. Eligible references were gathered and analyzed using a qualitative content analysis. Subsequently, a functional workflow categorization was derived from diabetes-specific raw data together with existing general workflow patterns. Results This paper presents the design of the architecture as well as a categorization model which makes it possible to describe the components or building blocks within clinical workflows. The results of our study lead us to identify basic building blocks, named as actions, decisions, and data elements, which allow the composition of clinical workflows within five identified contexts. Conclusions The categorization model allows for a description of the components or building blocks of clinical workflows from a functional view. PMID:25024765
The Archive Solution for Distributed Workflow Management Agents of the CMS Experiment at LHC

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kuznetsov, Valentin; Fischer, Nils Leif; Guo, Yuyi

The CMS experiment at the CERN LHC developed the Workflow Management Archive system to persistently store unstructured framework job report documents produced by distributed workflow management agents. In this paper we present its architecture, implementation, deployment, and integration with the CMS and CERN computing infrastructures, such as central HDFS and Hadoop Spark cluster. The system leverages modern technologies such as a document oriented database and the Hadoop eco-system to provide the necessary flexibility to reliably process, store, and aggregatemore » $$\\mathcal{O}$$(1M) documents on a daily basis. We describe the data transformation, the short and long term storage layers, the query language, along with the aggregation pipeline developed to visualize various performance metrics to assist CMS data operators in assessing the performance of the CMS computing system.« less
The Archive Solution for Distributed Workflow Management Agents of the CMS Experiment at LHC

DOE PAGES

Kuznetsov, Valentin; Fischer, Nils Leif; Guo, Yuyi

2018-03-19

The CMS experiment at the CERN LHC developed the Workflow Management Archive system to persistently store unstructured framework job report documents produced by distributed workflow management agents. In this paper we present its architecture, implementation, deployment, and integration with the CMS and CERN computing infrastructures, such as central HDFS and Hadoop Spark cluster. The system leverages modern technologies such as a document oriented database and the Hadoop eco-system to provide the necessary flexibility to reliably process, store, and aggregatemore » $$\\mathcal{O}$$(1M) documents on a daily basis. We describe the data transformation, the short and long term storage layers, the query language, along with the aggregation pipeline developed to visualize various performance metrics to assist CMS data operators in assessing the performance of the CMS computing system.« less
Towards an intelligent hospital environment: OR of the future.

PubMed

Sutherland, Jeffrey V; van den Heuvel, Willem-Jan; Ganous, Tim; Burton, Matthew M; Kumar, Animesh

2005-01-01

Patients, providers, payers, and government demand more effective and efficient healthcare services, and the healthcare industry needs innovative ways to re-invent core processes. Business process reengineering (BPR) showed adopting new hospital information systems can leverage this transformation and workflow management technologies can automate process management. Our research indicates workflow technologies in healthcare require real time patient monitoring, detection of adverse events, and adaptive responses to breakdown in normal processes. Adaptive workflow systems are rarely implemented making current workflow implementations inappropriate for healthcare. The advent of evidence based medicine, guideline based practice, and better understanding of cognitive workflow combined with novel technologies including Radio Frequency Identification (RFID), mobile/wireless technologies, internet workflow, intelligent agents, and Service Oriented Architectures (SOA) opens up new and exciting ways of automating business processes. Total situational awareness of events, timing, and location of healthcare activities can generate self-organizing change in behaviors of humans and machines. A test bed of a novel approach towards continuous process management was designed for the new Weinburg Surgery Building at the University of Maryland Medical. Early results based on clinical process mapping and analysis of patient flow bottlenecks demonstrated 100% improvement in delivery of supplies and instruments at surgery start time. This work has been directly applied to the design of the DARPA Trauma Pod research program where robotic surgery will be performed on wounded soldiers on the battlefield.
Planning bioinformatics workflows using an expert system.

PubMed

Chen, Xiaoling; Chang, Jeffrey T

2017-04-15

Bioinformatic analyses are becoming formidably more complex due to the increasing number of steps required to process the data, as well as the proliferation of methods that can be used in each step. To alleviate this difficulty, pipelines are commonly employed. However, pipelines are typically implemented to automate a specific analysis, and thus are difficult to use for exploratory analyses requiring systematic changes to the software or parameters used. To automate the development of pipelines, we have investigated expert systems. We created the Bioinformatics ExperT SYstem (BETSY) that includes a knowledge base where the capabilities of bioinformatics software is explicitly and formally encoded. BETSY is a backwards-chaining rule-based expert system comprised of a data model that can capture the richness of biological data, and an inference engine that reasons on the knowledge base to produce workflows. Currently, the knowledge base is populated with rules to analyze microarray and next generation sequencing data. We evaluated BETSY and found that it could generate workflows that reproduce and go beyond previously published bioinformatics results. Finally, a meta-investigation of the workflows generated from the knowledge base produced a quantitative measure of the technical burden imposed by each step of bioinformatics analyses, revealing the large number of steps devoted to the pre-processing of data. In sum, an expert system approach can facilitate exploratory bioinformatic analysis by automating the development of workflows, a task that requires significant domain expertise. https://github.com/jefftc/changlab. jeffrey.t.chang@uth.tmc.edu. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
Planning bioinformatics workflows using an expert system

PubMed Central

Chen, Xiaoling; Chang, Jeffrey T.

2017-01-01

Abstract Motivation: Bioinformatic analyses are becoming formidably more complex due to the increasing number of steps required to process the data, as well as the proliferation of methods that can be used in each step. To alleviate this difficulty, pipelines are commonly employed. However, pipelines are typically implemented to automate a specific analysis, and thus are difficult to use for exploratory analyses requiring systematic changes to the software or parameters used. Results: To automate the development of pipelines, we have investigated expert systems. We created the Bioinformatics ExperT SYstem (BETSY) that includes a knowledge base where the capabilities of bioinformatics software is explicitly and formally encoded. BETSY is a backwards-chaining rule-based expert system comprised of a data model that can capture the richness of biological data, and an inference engine that reasons on the knowledge base to produce workflows. Currently, the knowledge base is populated with rules to analyze microarray and next generation sequencing data. We evaluated BETSY and found that it could generate workflows that reproduce and go beyond previously published bioinformatics results. Finally, a meta-investigation of the workflows generated from the knowledge base produced a quantitative measure of the technical burden imposed by each step of bioinformatics analyses, revealing the large number of steps devoted to the pre-processing of data. In sum, an expert system approach can facilitate exploratory bioinformatic analysis by automating the development of workflows, a task that requires significant domain expertise. Availability and Implementation: https://github.com/jefftc/changlab Contact: jeffrey.t.chang@uth.tmc.edu PMID:28052928
A web service for service composition to aid geospatial modelers

NASA Astrophysics Data System (ADS)

Bigagli, L.; Santoro, M.; Roncella, R.; Mazzetti, P.

2012-04-01

The identification of appropriate mechanisms for process reuse, chaining and composition is considered a key enabler for the effective uptake of a global Earth Observation infrastructure, currently pursued by the international geospatial research community. In the Earth and Space Sciences, such a facility could primarily enable integrated and interoperable modeling, for what several approaches have been proposed and developed, over the last years. In fact, GEOSS is specifically tasked with the development of the so-called "Model Web". At increasing levels of abstraction and generalization, the initial stove-pipe software tools have evolved to community-wide modeling frameworks, to Component-Based Architecture solution, and, more recently, started to embrace Service-Oriented Architectures technologies, such as the OGC WPS specification and the WS-* stack of W3C standards for service composition. However, so far, the level of abstraction seems too low for implementing the Model Web vision, and far too complex technological aspects must still be addressed by both providers and users, resulting in limited usability and, eventually, difficult uptake. As by the recent ICT trend of resource virtualization, it has been suggested that users in need of a particular processing capability, required by a given modeling workflow, may benefit from outsourcing the composition activities into an external first-class service, according to the Composition as a Service (CaaS) approach. A CaaS system provides the necessary interoperability service framework for adaptation, reuse and complementation of existing processing resources (including models and geospatial services in general) in the form of executable workflows. This work introduces the architecture of a CaaS system, as a distributed information system for creating, validating, editing, storing, publishing, and executing geospatial workflows. This way, the users can be freed from the need of a composition infrastructure and alleviated from the technicalities of workflow definitions (type matching, identification of external services endpoints, binding issues, etc.) and focus on their intended application. Moreover, the user may submit an incomplete workflow definition, and leverage CaaS recommendations (that may derive from an aggregated knowledge base of user feedback, underpinned by Web 2.0 technologies) to execute it. This is of particular interest for multidisciplinary scientific contexts, where different communities may benefit of each other knowledge through model chaining. Indeed, the CaaS approach is presented as an attempt to combine the recent advances in service-oriented computing with collaborative research principles, and social network information in general. Arguably, it may be considered a fundamental capability of the Model Web. The CaaS concept is being investigated in several application scenarios identified in the FP7 UncertWeb and EuroGEOSS projects. Key aspects of the described CaaS solution are: it provides a standard WPS interface for invoking Business Processes and allows on the fly recursive compositions of Business Processes into other Composite Processes; it is designed according to the extended SOA (broker-based) and the System-of-Systems approach, to support the reuse and integration of existing resources, in compliance with the GEOSS Model Web architecture. The research leading to these results has received funding from the European Community's Seventh Framework Programme (FP7/2007-2013) under Grant Agreement n° 248488.
A Community-Driven Workflow Recommendations and Reuse Infrastructure

NASA Astrophysics Data System (ADS)

Zhang, J.; Votava, P.; Lee, T. J.; Lee, C.; Xiao, S.; Nemani, R. R.; Foster, I.

2013-12-01

Aiming to connect the Earth science community to accelerate the rate of discovery, NASA Earth Exchange (NEX) has established an online repository and platform, so that researchers can publish and share their tools and models with colleagues. In recent years, workflow has become a popular technique at NEX for Earth scientists to define executable multi-step procedures for data processing and analysis. The ability to discover and reuse knowledge (sharable workflows or workflow) is critical to the future advancement of science. However, as reported in our earlier study, the reusability of scientific artifacts at current time is very low. Scientists often do not feel confident in using other researchers' tools and utilities. One major reason is that researchers are often unaware of the existence of others' data preprocessing processes. Meanwhile, researchers often do not have time to fully document the processes and expose them to others in a standard way. These issues cannot be overcome by the existing workflow search technologies used in NEX and other data projects. Therefore, this project aims to develop a proactive recommendation technology based on collective NEX user behaviors. In this way, we aim to promote and encourage process and workflow reuse within NEX. Particularly, we focus on leveraging peer scientists' best practices to support the recommendation of artifacts developed by others. Our underlying theoretical foundation is rooted in the social cognitive theory, which declares people learn by watching what others do. Our fundamental hypothesis is that sharable artifacts have network properties, much like humans in social networks. More generally, reusable artifacts form various types of social relationships (ties), and may be viewed as forming what organizational sociologists who use network analysis to study human interactions call a 'knowledge network.' In particular, we will tackle two research questions: R1: What hidden knowledge may be extracted from usage history to help Earth scientists better understand existing artifacts and how to use them in a proper manner? R2: Informed by insights derived from their computing contexts, how could such hidden knowledge be used to facilitate artifact reuse by Earth scientists? Our study of the two research questions will provide answers to three technical questions aiming to assist NEX users during workflow development: 1) How to determine what topics interest the researcher? 2) How to find appropriate artifacts? and 3) How to advise the researcher in artifact reuse? In this paper, we report our on-going efforts of leveraging social networking theory and analysis techniques to provide dynamic advice on artifact reuse to NEX users based on their surrounding contexts. As a proof of concept, we have designed and developed a plug-in to the VisTrails workflow design tool. When users develop workflows using VisTrails, our plug-in will proactively recommend most relevant sub-workflows to the users.
Identifying impact of software dependencies on replicability of biomedical workflows.

PubMed

Miksa, Tomasz; Rauber, Andreas; Mina, Eleni

2016-12-01

Complex data driven experiments form the basis of biomedical research. Recent findings warn that the context in which the software is run, that is the infrastructure and the third party dependencies, can have a crucial impact on the final results delivered by a computational experiment. This implies that in order to replicate the same result, not only the same data must be used, but also it must be run on an equivalent software stack. In this paper we present the VFramework that enables assessing replicability of workflows. It identifies whether any differences in software dependencies among two executions of the same workflow exist and whether they have impact on the produced results. We also conduct a case study in which we investigate the impact of software dependencies on replicability of Taverna workflows used in biomedical research of Huntington's disease. We re-execute analysed workflows in environments differing in operating system distribution and configuration. The results show that the VFramework can be used to identify the impact of software dependencies on the replicability of biomedical workflows. Furthermore, we observe that despite the fact that the workflows are executed in a controlled environment, they still depend on specific tools installed in the environment. The context model used by the VFramework improves the deficiencies of provenance traces and documents also such tools. Based on our findings we define guidelines for workflow owners that enable them to improve replicability of their workflows. Copyright Â© 2016 Elsevier Inc. All rights reserved.
Automated quality control in a file-based broadcasting workflow

NASA Astrophysics Data System (ADS)

Zhang, Lina

2014-04-01

Benefit from the development of information and internet technologies, television broadcasting is transforming from inefficient tape-based production and distribution to integrated file-based workflows. However, no matter how many changes have took place, successful broadcasting still depends on the ability to deliver a consistent high quality signal to the audiences. After the transition from tape to file, traditional methods of manual quality control (QC) become inadequate, subjective, and inefficient. Based on China Central Television's full file-based workflow in the new site, this paper introduces an automated quality control test system for accurate detection of hidden troubles in media contents. It discusses the system framework and workflow control when the automated QC is added. It puts forward a QC criterion and brings forth a QC software followed this criterion. It also does some experiments on QC speed by adopting parallel processing and distributed computing. The performance of the test system shows that the adoption of automated QC can make the production effective and efficient, and help the station to achieve a competitive advantage in the media market.
Using conceptual work products of health care to design health IT.

PubMed

Berry, Andrew B L; Butler, Keith A; Harrington, Craig; Braxton, Melissa O; Walker, Amy J; Pete, Nikki; Johnson, Trevor; Oberle, Mark W; Haselkorn, Jodie; Paul Nichol, W; Haselkorn, Mark

2016-02-01

This paper introduces a new, model-based design method for interactive health information technology (IT) systems. This method extends workflow models with models of conceptual work products. When the health care work being modeled is substantially cognitive, tacit, and complex in nature, graphical workflow models can become too complex to be useful to designers. Conceptual models complement and simplify workflows by providing an explicit specification for the information product they must produce. We illustrate how conceptual work products can be modeled using standard software modeling language, which allows them to provide fundamental requirements for what the workflow must accomplish and the information that a new system should provide. Developers can use these specifications to envision how health IT could enable an effective cognitive strategy as a workflow with precise information requirements. We illustrate the new method with a study conducted in an outpatient multiple sclerosis (MS) clinic. This study shows specifically how the different phases of the method can be carried out, how the method allows for iteration across phases, and how the method generated a health IT design for case management of MS that is efficient and easy to use. Copyright © 2015 Elsevier Inc. All rights reserved.

The equivalency between logic Petri workflow nets and workflow nets.

PubMed

Wang, Jing; Yu, ShuXia; Du, YuYue

2015-01-01

Logic Petri nets (LPNs) can describe and analyze batch processing functions and passing value indeterminacy in cooperative systems. Logic Petri workflow nets (LPWNs) are proposed based on LPNs in this paper. Process mining is regarded as an important bridge between modeling and analysis of data mining and business process. Workflow nets (WF-nets) are the extension to Petri nets (PNs), and have successfully been used to process mining. Some shortcomings cannot be avoided in process mining, such as duplicate tasks, invisible tasks, and the noise of logs. The online shop in electronic commerce in this paper is modeled to prove the equivalence between LPWNs and WF-nets, and advantages of LPWNs are presented.
The Equivalency between Logic Petri Workflow Nets and Workflow Nets

PubMed Central

Wang, Jing; Yu, ShuXia; Du, YuYue

2015-01-01

Logic Petri nets (LPNs) can describe and analyze batch processing functions and passing value indeterminacy in cooperative systems. Logic Petri workflow nets (LPWNs) are proposed based on LPNs in this paper. Process mining is regarded as an important bridge between modeling and analysis of data mining and business process. Workflow nets (WF-nets) are the extension to Petri nets (PNs), and have successfully been used to process mining. Some shortcomings cannot be avoided in process mining, such as duplicate tasks, invisible tasks, and the noise of logs. The online shop in electronic commerce in this paper is modeled to prove the equivalence between LPWNs and WF-nets, and advantages of LPWNs are presented. PMID:25821845
Workflow Management for Complex HEP Analyses

NASA Astrophysics Data System (ADS)

Erdmann, M.; Fischer, R.; Rieger, M.; von Cube, R. F.

2017-10-01

We present the novel Analysis Workflow Management (AWM) that provides users with the tools and competences of professional large scale workflow systems, e.g. Apache’s Airavata[1]. The approach presents a paradigm shift from executing parts of the analysis to defining the analysis. Within AWM an analysis consists of steps. For example, a step defines to run a certain executable for multiple files of an input data collection. Each call to the executable for one of those input files can be submitted to the desired run location, which could be the local computer or a remote batch system. An integrated software manager enables automated user installation of dependencies in the working directory at the run location. Each execution of a step item creates one report for bookkeeping purposes containing error codes and output data or file references. Required files, e.g. created by previous steps, are retrieved automatically. Since data storage and run locations are exchangeable from the steps perspective, computing resources can be used opportunistically. A visualization of the workflow as a graph of the steps in the web browser provides a high-level view on the analysis. The workflow system is developed and tested alongside of a ttbb cross section measurement where, for instance, the event selection is represented by one step and a Bayesian statistical inference is performed by another. The clear interface and dependencies between steps enables a make-like execution of the whole analysis.
Scalability of Several Asynchronous Many-Task Models for In Situ Statistical Analysis.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Pebay, Philippe Pierre; Bennett, Janine Camille; Kolla, Hemanth

This report is a sequel to [PB16], in which we provided a first progress report on research and development towards a scalable, asynchronous many-task, in situ statistical analysis engine using the Legion runtime system. This earlier work included a prototype implementation of a proposed solution, using a proxy mini-application as a surrogate for a full-scale scientific simulation code. The first scalability studies were conducted with the above on modestly-sized experimental clusters. In contrast, in the current work we have integrated our in situ analysis engines with a full-size scientific application (S3D, using the Legion-SPMD model), and have conducted nu- mericalmore » tests on the largest computational platform currently available for DOE science ap- plications. We also provide details regarding the design and development of a light-weight asynchronous collectives library. We describe how this library is utilized within our SPMD- Legion S3D workflow, and compare the data aggregation technique deployed herein to the approach taken within our previous work.« less
RGB color calibration for quantitative image analysis: the "3D thin-plate spline" warping approach.

PubMed

Menesatti, Paolo; Angelini, Claudio; Pallottino, Federico; Antonucci, Francesca; Aguzzi, Jacopo; Costa, Corrado

2012-01-01

In the last years the need to numerically define color by its coordinates in n-dimensional space has increased strongly. Colorimetric calibration is fundamental in food processing and other biological disciplines to quantitatively compare samples' color during workflow with many devices. Several software programmes are available to perform standardized colorimetric procedures, but they are often too imprecise for scientific purposes. In this study, we applied the Thin-Plate Spline interpolation algorithm to calibrate colours in sRGB space (the corresponding Matlab code is reported in the Appendix). This was compared with other two approaches. The first is based on a commercial calibration system (ProfileMaker) and the second on a Partial Least Square analysis. Moreover, to explore device variability and resolution two different cameras were adopted and for each sensor, three consecutive pictures were acquired under four different light conditions. According to our results, the Thin-Plate Spline approach reported a very high efficiency of calibration allowing the possibility to create a revolution in the in-field applicative context of colour quantification not only in food sciences, but also in other biological disciplines. These results are of great importance for scientific color evaluation when lighting conditions are not controlled. Moreover, it allows the use of low cost instruments while still returning scientifically sound quantitative data.
Reducing Time to Science: Unidata and JupyterHub Technology Using the Jetstream Cloud

NASA Astrophysics Data System (ADS)

Chastang, J.; Signell, R. P.; Fischer, J. L.

2017-12-01

Cloud computing can accelerate scientific workflows, discovery, and collaborations by reducing research and data friction. We describe the deployment of Unidata and JupyterHub technologies on the NSF-funded XSEDE Jetstream cloud. With the aid of virtual machines and Docker technology, we deploy a Unidata JupyterHub server co-located with a Local Data Manager (LDM), THREDDS data server (TDS), and RAMADDA geoscience content management system. We provide Jupyter Notebooks and the pre-built Python environments needed to run them. The notebooks can be used for instruction and as templates for scientific experimentation and discovery. We also supply a large quantity of NCEP forecast model results to allow data-proximate analysis and visualization. In addition, users can transfer data using Globus command line tools, and perform their own data-proximate analysis and visualization with Notebook technology. These data can be shared with others via a dedicated TDS server for scientific distribution and collaboration. There are many benefits of this approach. Not only is the cloud computing environment fast, reliable and scalable, but scientists can analyze, visualize, and share data using only their web browser. No local specialized desktop software or a fast internet connection is required. This environment will enable scientists to spend less time managing their software and more time doing science.
Workflow-enabled distributed component-based information architecture for digital medical imaging enterprises.

PubMed

Wong, Stephen T C; Tjandra, Donny; Wang, Huili; Shen, Weimin

2003-09-01

Few information systems today offer a flexible means to define and manage the automated part of radiology processes, which provide clinical imaging services for the entire healthcare organization. Even fewer of them provide a coherent architecture that can easily cope with heterogeneity and inevitable local adaptation of applications and can integrate clinical and administrative information to aid better clinical, operational, and business decisions. We describe an innovative enterprise architecture of image information management systems to fill the needs. Such a system is based on the interplay of production workflow management, distributed object computing, Java and Web techniques, and in-depth domain knowledge in radiology operations. Our design adapts the approach of "4+1" architectural view. In this new architecture, PACS and RIS become one while the user interaction can be automated by customized workflow process. Clinical service applications are implemented as active components. They can be reasonably substituted by applications of local adaptations and can be multiplied for fault tolerance and load balancing. Furthermore, the workflow-enabled digital radiology system would provide powerful query and statistical functions for managing resources and improving productivity. This paper will potentially lead to a new direction of image information management. We illustrate the innovative design with examples taken from an implemented system.
Using technology to improve and support communication and workflow processes.

PubMed

Bahlman, Deborah Tuke; Johnson, Fay C

2005-07-01

In conjunction with a large expansion project, a team of perioperative staff members reviewed their workflow processes and designed their ideal patient tracking and communication system. Technologies selected and deployed included a passive infrared tracking system, an enhanced nurse call system, wireless telephones, and a web-based electronic grease board. The new system provides staff members with an easy way to obtain critical pieces of patient information, as well as track the progress of patients and locate equipment.
Next Generation Workload Management System For Big Data on Heterogeneous Distributed Computing

DOE PAGES

Klimentov, A.; Buncic, P.; De, K.; ...

2015-05-22

The Large Hadron Collider (LHC), operating at the international CERN Laboratory in Geneva, Switzerland, is leading Big Data driven scientific explorations. Experiments at the LHC explore the fundamental nature of matter and the basic forces that shape our universe, and were recently credited for the discovery of a Higgs boson. ATLAS and ALICE are the largest collaborations ever assembled in the sciences and are at the forefront of research at the LHC. To address an unprecedented multi-petabyte data processing challenge, both experiments rely on a heterogeneous distributed computational infrastructure. The ATLAS experiment uses PanDA (Production and Data Analysis) Workload Managementmore » System (WMS) for managing the workflow for all data processing on hundreds of data centers. Through PanDA, ATLAS physicists see a single computing facility that enables rapid scientific breakthroughs for the experiment, even though the data centers are physically scattered all over the world. The scale is demonstrated by the following numbers: PanDA manages O(10 2) sites, O(10 5) cores, O(10 8) jobs per year, O(10 3) users, and ATLAS data volume is O(10 17) bytes. In 2013 we started an ambitious program to expand PanDA to all available computing resources, including opportunistic use of commercial and academic clouds and Leadership Computing Facilities (LCF). The project titled 'Next Generation Workload Management and Analysis System for Big Data' (BigPanDA) is funded by DOE ASCR and HEP. Extending PanDA to clouds and LCF presents new challenges in managing heterogeneity and supporting workflow. The BigPanDA project is underway to setup and tailor PanDA at the Oak Ridge Leadership Computing Facility (OLCF) and at the National Research Center "Kurchatov Institute" together with ALICE distributed computing and ORNL computing professionals. Our approach to integration of HPC platforms at the OLCF and elsewhere is to reuse, as much as possible, existing components of the PanDA system. Finally, we will present our current accomplishments with running the PanDA WMS at OLCF and other supercomputers and demonstrate our ability to use PanDA as a portal independent of the computing facilities infrastructure for High Energy and Nuclear Physics as well as other data-intensive science applications.« less
Next Generation Workload Management System For Big Data on Heterogeneous Distributed Computing

DOE Office of Scientific and Technical Information (OSTI.GOV)

Klimentov, A.; Buncic, P.; De, K.

The Large Hadron Collider (LHC), operating at the international CERN Laboratory in Geneva, Switzerland, is leading Big Data driven scientific explorations. Experiments at the LHC explore the fundamental nature of matter and the basic forces that shape our universe, and were recently credited for the discovery of a Higgs boson. ATLAS and ALICE are the largest collaborations ever assembled in the sciences and are at the forefront of research at the LHC. To address an unprecedented multi-petabyte data processing challenge, both experiments rely on a heterogeneous distributed computational infrastructure. The ATLAS experiment uses PanDA (Production and Data Analysis) Workload Managementmore » System (WMS) for managing the workflow for all data processing on hundreds of data centers. Through PanDA, ATLAS physicists see a single computing facility that enables rapid scientific breakthroughs for the experiment, even though the data centers are physically scattered all over the world. The scale is demonstrated by the following numbers: PanDA manages O(10 2) sites, O(10 5) cores, O(10 8) jobs per year, O(10 3) users, and ATLAS data volume is O(10 17) bytes. In 2013 we started an ambitious program to expand PanDA to all available computing resources, including opportunistic use of commercial and academic clouds and Leadership Computing Facilities (LCF). The project titled 'Next Generation Workload Management and Analysis System for Big Data' (BigPanDA) is funded by DOE ASCR and HEP. Extending PanDA to clouds and LCF presents new challenges in managing heterogeneity and supporting workflow. The BigPanDA project is underway to setup and tailor PanDA at the Oak Ridge Leadership Computing Facility (OLCF) and at the National Research Center "Kurchatov Institute" together with ALICE distributed computing and ORNL computing professionals. Our approach to integration of HPC platforms at the OLCF and elsewhere is to reuse, as much as possible, existing components of the PanDA system. Finally, we will present our current accomplishments with running the PanDA WMS at OLCF and other supercomputers and demonstrate our ability to use PanDA as a portal independent of the computing facilities infrastructure for High Energy and Nuclear Physics as well as other data-intensive science applications.« less
A Scientific Data Provenance Harvester for Distributed Applications

DOE Office of Scientific and Technical Information (OSTI.GOV)

Stephan, Eric G.; Raju, Bibi; Elsethagen, Todd O.

Data provenance provides a way for scientists to observe how experimental data originates, conveys process history, and explains influential factors such as experimental rationale and associated environmental factors from system metrics measured at runtime. The US Department of Energy Office of Science Integrated end-to-end Performance Prediction and Diagnosis for Extreme Scientific Workflows (IPPD) project has developed a provenance harvester that is capable of collecting observations from file based evidence typically produced by distributed applications. To achieve this, file based evidence is extracted and transformed into an intermediate data format inspired in part by W3C CSV on the Web recommendations, calledmore » the Harvester Provenance Application Interface (HAPI) syntax. This syntax provides a general means to pre-stage provenance into messages that are both human readable and capable of being written to a provenance store, Provenance Environment (ProvEn). HAPI is being applied to harvest provenance from climate ensemble runs for Accelerated Climate Modeling for Energy (ACME) project funded under the U.S. Department of Energy’s Office of Biological and Environmental Research (BER) Earth System Modeling (ESM) program. ACME informally provides provenance in a native form through configuration files, directory structures, and log files that contain success/failure indicators, code traces, and performance measurements. Because of its generic format, HAPI is also being applied to harvest tabular job management provenance from Belle II DIRAC scheduler relational database tables as well as other scientific applications that log provenance related information.« less
The Pharmacogenomics Research Network Translational Pharmacogenetics Program: Outcomes and Metrics of Pharmacogenetic Implementations Across Diverse Healthcare Systems.

PubMed

Luzum, J A; Pakyz, R E; Elsey, A R; Haidar, C E; Peterson, J F; Whirl-Carrillo, M; Handelman, S K; Palmer, K; Pulley, J M; Beller, M; Schildcrout, J S; Field, J R; Weitzel, K W; Cooper-DeHoff, R M; Cavallari, L H; O'Donnell, P H; Altman, R B; Pereira, N; Ratain, M J; Roden, D M; Embi, P J; Sadee, W; Klein, T E; Johnson, J A; Relling, M V; Wang, L; Weinshilboum, R M; Shuldiner, A R; Freimuth, R R

2017-09-01

Numerous pharmacogenetic clinical guidelines and recommendations have been published, but barriers have hindered the clinical implementation of pharmacogenetics. The Translational Pharmacogenetics Program (TPP) of the National Institutes of Health (NIH) Pharmacogenomics Research Network was established in 2011 to catalog and contribute to the development of pharmacogenetic implementations at eight US healthcare systems, with the goal to disseminate real-world solutions for the barriers to clinical pharmacogenetic implementation. The TPP collected and normalized pharmacogenetic implementation metrics through June 2015, including gene-drug pairs implemented, interpretations of alleles and diplotypes, numbers of tests performed and actionable results, and workflow diagrams. TPP participant institutions developed diverse solutions to overcome many barriers, but the use of Clinical Pharmacogenetics Implementation Consortium (CPIC) guidelines provided some consistency among the institutions. The TPP also collected some pharmacogenetic implementation outcomes (scientific, educational, financial, and informatics), which may inform healthcare systems seeking to implement their own pharmacogenetic testing programs. © 2017, The American Society for Clinical Pharmacology and Therapeutics.
Implementation of Cyberinfrastructure and Data Management Workflow for a Large-Scale Sensor Network

NASA Astrophysics Data System (ADS)

Jones, A. S.; Horsburgh, J. S.

2014-12-01

Monitoring with in situ environmental sensors and other forms of field-based observation presents many challenges for data management, particularly for large-scale networks consisting of multiple sites, sensors, and personnel. The availability and utility of these data in addressing scientific questions relies on effective cyberinfrastructure that facilitates transformation of raw sensor data into functional data products. It also depends on the ability of researchers to share and access the data in useable formats. In addition to addressing the challenges presented by the quantity of data, monitoring networks need practices to ensure high data quality, including procedures and tools for post processing. Data quality is further enhanced if practitioners are able to track equipment, deployments, calibrations, and other events related to site maintenance and associate these details with observational data. In this presentation we will describe the overall workflow that we have developed for research groups and sites conducting long term monitoring using in situ sensors. Features of the workflow include: software tools to automate the transfer of data from field sites to databases, a Python-based program for data quality control post-processing, a web-based application for online discovery and visualization of data, and a data model and web interface for managing physical infrastructure. By automating the data management workflow, the time from collection to analysis is reduced and sharing and publication is facilitated. The incorporation of metadata standards and descriptions and the use of open-source tools enhances the sustainability and reusability of the data. We will describe the workflow and tools that we have developed in the context of the iUTAH (innovative Urban Transitions and Aridregion Hydrosustainability) monitoring network. The iUTAH network consists of aquatic and climate sensors deployed in three watersheds to monitor Gradients Along Mountain to Urban Transitions (GAMUT). The variety of environmental sensors and the multi-watershed, multi-institutional nature of the network necessitate a well-planned and efficient workflow for acquiring, managing, and sharing sensor data, which should be useful for similar large-scale and long-term networks.
Quantifying nursing workflow in medication administration.

PubMed

Keohane, Carol A; Bane, Anne D; Featherstone, Erica; Hayes, Judy; Woolf, Seth; Hurley, Ann; Bates, David W; Gandhi, Tejal K; Poon, Eric G

2008-01-01

New medication administration systems are showing promise in improving patient safety at the point of care, but adoption of these systems requires significant changes in nursing workflow. To prepare for these changes, the authors report on a time-motion study that measured the proportion of time that nurses spend on various patient care activities, focusing on medication administration-related activities. Implications of their findings are discussed.
PGen: large-scale genomic variations analysis workflow and browser in SoyKB.

PubMed

Liu, Yang; Khan, Saad M; Wang, Juexin; Rynge, Mats; Zhang, Yuanxun; Zeng, Shuai; Chen, Shiyuan; Maldonado Dos Santos, Joao V; Valliyodan, Babu; Calyam, Prasad P; Merchant, Nirav; Nguyen, Henry T; Xu, Dong; Joshi, Trupti

2016-10-06

With the advances in next-generation sequencing (NGS) technology and significant reductions in sequencing costs, it is now possible to sequence large collections of germplasm in crops for detecting genome-scale genetic variations and to apply the knowledge towards improvements in traits. To efficiently facilitate large-scale NGS resequencing data analysis of genomic variations, we have developed "PGen", an integrated and optimized workflow using the Extreme Science and Engineering Discovery Environment (XSEDE) high-performance computing (HPC) virtual system, iPlant cloud data storage resources and Pegasus workflow management system (Pegasus-WMS). The workflow allows users to identify single nucleotide polymorphisms (SNPs) and insertion-deletions (indels), perform SNP annotations and conduct copy number variation analyses on multiple resequencing datasets in a user-friendly and seamless way. We have developed both a Linux version in GitHub ( https://github.com/pegasus-isi/PGen-GenomicVariations-Workflow ) and a web-based implementation of the PGen workflow integrated within the Soybean Knowledge Base (SoyKB), ( http://soykb.org/Pegasus/index.php ). Using PGen, we identified 10,218,140 single-nucleotide polymorphisms (SNPs) and 1,398,982 indels from analysis of 106 soybean lines sequenced at 15X coverage. 297,245 non-synonymous SNPs and 3330 copy number variation (CNV) regions were identified from this analysis. SNPs identified using PGen from additional soybean resequencing projects adding to 500+ soybean germplasm lines in total have been integrated. These SNPs are being utilized for trait improvement using genotype to phenotype prediction approaches developed in-house. In order to browse and access NGS data easily, we have also developed an NGS resequencing data browser ( http://soykb.org/NGS_Resequence/NGS_index.php ) within SoyKB to provide easy access to SNP and downstream analysis results for soybean researchers. PGen workflow has been optimized for the most efficient analysis of soybean data using thorough testing and validation. This research serves as an example of best practices for development of genomics data analysis workflows by integrating remote HPC resources and efficient data management with ease of use for biological users. PGen workflow can also be easily customized for analysis of data in other species.
An automated and reproducible workflow for running and analyzing neural simulations using Lancet and IPython Notebook

PubMed Central

Stevens, Jean-Luc R.; Elver, Marco; Bednar, James A.

2013-01-01

Lancet is a new, simulator-independent Python utility for succinctly specifying, launching, and collating results from large batches of interrelated computationally demanding program runs. This paper demonstrates how to combine Lancet with IPython Notebook to provide a flexible, lightweight, and agile workflow for fully reproducible scientific research. This informal and pragmatic approach uses IPython Notebook to capture the steps in a scientific computation as it is gradually automated and made ready for publication, without mandating the use of any separate application that can constrain scientific exploration and innovation. The resulting notebook concisely records each step involved in even very complex computational processes that led to a particular figure or numerical result, allowing the complete chain of events to be replicated automatically. Lancet was originally designed to help solve problems in computational neuroscience, such as analyzing the sensitivity of a complex simulation to various parameters, or collecting the results from multiple runs with different random starting points. However, because it is never possible to know in advance what tools might be required in future tasks, Lancet has been designed to be completely general, supporting any type of program as long as it can be launched as a process and can return output in the form of files. For instance, Lancet is also heavily used by one of the authors in a separate research group for launching batches of microprocessor simulations. This general design will allow Lancet to continue supporting a given research project even as the underlying approaches and tools change. PMID:24416014
iCollections methodology: workflow, results and lessons learned

PubMed Central

Penn, Malcolm; Sadka, Mike; Hine, Adrian; Brooks, Stephen; Siebert, Darrell J.; Sleep, Chris; Cafferty, Steve; Cane, Elisa; Martin, Geoff; Toloni, Flavia; Wing, Peter; Chainey, John; Duffell, Liz; Huxley, Rob; Ledger, Sophie; McLaughlin, Caitlin; Mazzetta, Gerardo; Perera, Jasmin; Crowther, Robyn; Douglas, Lyndsey; Durant, Joanna; Scialabba, Elisabetta; Honey, Martin; Huertas, Blanca; Howard, Theresa; Carter, Victoria; Albuquerque, Sara; Paterson, Gordon; Kitching, Ian J.

2017-01-01

Abstract The Natural History Museum, London (NHMUK) has embarked on an ambitious programme to digitise its collections. The first phase of this programme was to undertake a series of pilot projects to develop the workflows and infrastructure needed to support mass digitisation of very large scientific collections. This paper presents the results of one of the pilot projects – iCollections. This project digitised all the lepidopteran specimens usually considered as butterflies, 181,545 specimens representing 89 species from the British Isles and Ireland. The data digitised includes, species name, georeferenced location, collector and collection date - the what, where, who and when of specimen data. In addition, a digital image of each specimen was taken. A previous paper explained the way the data were obtained and the background to the collections that made up the project. The present paper describes the technical, logistical, and economic aspects of managing the project. PMID:29104442
iCollections methodology: workflow, results and lessons learned

PubMed Central

Penn, Malcolm; Sadka, Mike; Hine, Adrian; Brooks, Stephen; Siebert, Darrell J.; Sleep, Chris; Cafferty, Steve; Cane, Elisa; Martin, Geoff; Toloni, Flavia; Wing, Peter; Chainey, John; Duffell, Liz; Huxley, Rob; Ledger, Sophie; McLaughlin, Caitlin; Mazzetta, Gerardo; Perera, Jasmin; Crowther, Robyn; Douglas, Lyndsey; Durant, Joanna; Honey, Martin; Huertas, Blanca; Howard, Theresa; Carter, Victoria; Albuquerque, Sara; Paterson, Gordon; Kitching, Ian J.

2017-01-01

Abstract The Natural History Museum, London (NHMUK) has embarked on an ambitious programme to digitise its collections. The first phase of this programme was to undertake a series of pilot projects to develop the workflows and infrastructure needed to support mass digitisation of very large scientific collections. This paper presents the results of one of the pilot projects – iCollections. This project digitised all the lepidopteran specimens usually considered as butterflies, 181,545 specimens representing 89 species from the British Isles and Ireland. The data digitised includes, species name, georeferenced location, collector and collection date - the what, where, who and when of specimen data. In addition, a digital image of each specimen was taken. A previous paper explained the way the data were obtained and the background to the collections that made up the project. The present paper describes the technical, logistical, and economic aspects of managing the project. PMID:29104435
iCollections methodology: workflow, results and lessons learned.

PubMed

Blagoderov, Vladimir; Penn, Malcolm; Sadka, Mike; Hine, Adrian; Brooks, Stephen; Siebert, Darrell J; Sleep, Chris; Cafferty, Steve; Cane, Elisa; Martin, Geoff; Toloni, Flavia; Wing, Peter; Chainey, John; Duffell, Liz; Huxley, Rob; Ledger, Sophie; McLaughlin, Caitlin; Mazzetta, Gerardo; Perera, Jasmin; Crowther, Robyn; Douglas, Lyndsey; Durant, Joanna; Honey, Martin; Huertas, Blanca; Howard, Theresa; Carter, Victoria; Albuquerque, Sara; Paterson, Gordon; Kitching, Ian J

2017-01-01

The Natural History Museum, London (NHMUK) has embarked on an ambitious programme to digitise its collections. The first phase of this programme was to undertake a series of pilot projects to develop the workflows and infrastructure needed to support mass digitisation of very large scientific collections. This paper presents the results of one of the pilot projects - iCollections. This project digitised all the lepidopteran specimens usually considered as butterflies, 181,545 specimens representing 89 species from the British Isles and Ireland. The data digitised includes, species name, georeferenced location, collector and collection date - the what, where, who and when of specimen data. In addition, a digital image of each specimen was taken. A previous paper explained the way the data were obtained and the background to the collections that made up the project. The present paper describes the technical, logistical, and economic aspects of managing the project.
How the provenance of electronic health record data matters for research: a case example using system mapping.

PubMed

Johnson, Karin E; Kamineni, Aruna; Fuller, Sharon; Olmstead, Danielle; Wernli, Karen J

2014-01-01

The use of electronic health records (EHRs) for research is proceeding rapidly, driven by computational power, analytical techniques, and policy. However, EHR-based research is limited by the complexity of EHR data and a lack of understanding about data provenance, meaning the context under which the data were collected. This paper presents system flow mapping as a method to help researchers more fully understand the provenance of their EHR data as it relates to local workflow. We provide two specific examples of how this method can improve data identification, documentation, and processing. EHRs store clinical and administrative data, often in unstructured fields. Each clinical system has a unique and dynamic workflow, as well as an EHR customized for local use. The EHR customization may be influenced by a broader context such as documentation required for billing. We present a case study with two examples of using system flow mapping to characterize EHR data for a local colorectal cancer screening process. System flow mapping demonstrated that information entered into the EHR during clinical practice required interpretation and transformation before it could be accurately applied to research. We illustrate how system flow mapping shaped our knowledge of the quality and completeness of data in two examples: (1) determining colonoscopy indication as recorded in the EHR, and (2) discovering a specific EHR form that captured family history. Researchers who do not consider data provenance risk compiling data that are systematically incomplete or incorrect. For example, researchers who are not familiar with the clinical workflow under which data were entered might miss or misunderstand patient information or procedure and diagnostic codes. Data provenance is a fundamental characteristic of research data from EHRs. Given the diversity of EHR platforms and system workflows, researchers need tools for evaluating and reporting data availability, quality, and transformations. Our case study illustrates how system mapping can inform researchers about the provenance of their data as it pertains to local workflows.

Cognitive Learning, Monitoring and Assistance of Industrial Workflows Using Egocentric Sensor Networks

PubMed Central

Bleser, Gabriele; Damen, Dima; Behera, Ardhendu; Hendeby, Gustaf; Mura, Katharina; Miezal, Markus; Gee, Andrew; Petersen, Nils; Maçães, Gustavo; Domingues, Hugo; Gorecky, Dominic; Almeida, Luis; Mayol-Cuevas, Walterio; Calway, Andrew; Cohn, Anthony G.; Hogg, David C.; Stricker, Didier

2015-01-01

Today, the workflows that are involved in industrial assembly and production activities are becoming increasingly complex. To efficiently and safely perform these workflows is demanding on the workers, in particular when it comes to infrequent or repetitive tasks. This burden on the workers can be eased by introducing smart assistance systems. This article presents a scalable concept and an integrated system demonstrator designed for this purpose. The basic idea is to learn workflows from observing multiple expert operators and then transfer the learnt workflow models to novice users. Being entirely learning-based, the proposed system can be applied to various tasks and domains. The above idea has been realized in a prototype, which combines components pushing the state of the art of hardware and software designed with interoperability in mind. The emphasis of this article is on the algorithms developed for the prototype: 1) fusion of inertial and visual sensor information from an on-body sensor network (BSN) to robustly track the user’s pose in magnetically polluted environments; 2) learning-based computer vision algorithms to map the workspace, localize the sensor with respect to the workspace and capture objects, even as they are carried; 3) domain-independent and robust workflow recovery and monitoring algorithms based on spatiotemporal pairwise relations deduced from object and user movement with respect to the scene; and 4) context-sensitive augmented reality (AR) user feedback using a head-mounted display (HMD). A distinguishing key feature of the developed algorithms is that they all operate solely on data from the on-body sensor network and that no external instrumentation is needed. The feasibility of the chosen approach for the complete action-perception-feedback loop is demonstrated on three increasingly complex datasets representing manual industrial tasks. These limited size datasets indicate and highlight the potential of the chosen technology as a combined entity as well as point out limitations of the system. PMID:26126116
Workflow in clinical trial sites & its association with near miss events for data quality: ethnographic, workflow & systems simulation.

PubMed

de Carvalho, Elias Cesar Araujo; Batilana, Adelia Portero; Claudino, Wederson; Reis, Luiz Fernando Lima; Schmerling, Rafael A; Shah, Jatin; Pietrobon, Ricardo

2012-01-01

With the exponential expansion of clinical trials conducted in (Brazil, Russia, India, and China) and VISTA (Vietnam, Indonesia, South Africa, Turkey, and Argentina) countries, corresponding gains in cost and enrolment efficiency quickly outpace the consonant metrics in traditional countries in North America and European Union. However, questions still remain regarding the quality of data being collected in these countries. We used ethnographic, mapping and computer simulation studies to identify/address areas of threat to near miss events for data quality in two cancer trial sites in Brazil. Two sites in Sao Paolo and Rio Janeiro were evaluated using ethnographic observations of workflow during subject enrolment and data collection. Emerging themes related to threats to near miss events for data quality were derived from observations. They were then transformed into workflows using UML-AD and modeled using System Dynamics. 139 tasks were observed and mapped through the ethnographic study. The UML-AD detected four major activities in the workflow evaluation of potential research subjects prior to signature of informed consent, visit to obtain subject́s informed consent, regular data collection sessions following study protocol and closure of study protocol for a given project. Field observations pointed to three major emerging themes: (a) lack of standardized process for data registration at source document, (b) multiplicity of data repositories and (c) scarcity of decision support systems at the point of research intervention. Simulation with policy model demonstrates a reduction of the rework problem. Patterns of threats to data quality at the two sites were similar to the threats reported in the literature for American sites. The clinical trial site managers need to reorganize staff workflow by using information technology more efficiently, establish new standard procedures and manage professionals to reduce near miss events and save time/cost. Clinical trial sponsors should improve relevant support systems.
Workflow in Clinical Trial Sites & Its Association with Near Miss Events for Data Quality: Ethnographic, Workflow & Systems Simulation

PubMed Central

Araujo de Carvalho, Elias Cesar; Batilana, Adelia Portero; Claudino, Wederson; Lima Reis, Luiz Fernando; Schmerling, Rafael A.; Shah, Jatin; Pietrobon, Ricardo

2012-01-01

Background With the exponential expansion of clinical trials conducted in (Brazil, Russia, India, and China) and VISTA (Vietnam, Indonesia, South Africa, Turkey, and Argentina) countries, corresponding gains in cost and enrolment efficiency quickly outpace the consonant metrics in traditional countries in North America and European Union. However, questions still remain regarding the quality of data being collected in these countries. We used ethnographic, mapping and computer simulation studies to identify/address areas of threat to near miss events for data quality in two cancer trial sites in Brazil. Methodology/Principal Findings Two sites in Sao Paolo and Rio Janeiro were evaluated using ethnographic observations of workflow during subject enrolment and data collection. Emerging themes related to threats to near miss events for data quality were derived from observations. They were then transformed into workflows using UML-AD and modeled using System Dynamics. 139 tasks were observed and mapped through the ethnographic study. The UML-AD detected four major activities in the workflow evaluation of potential research subjects prior to signature of informed consent, visit to obtain subject́s informed consent, regular data collection sessions following study protocol and closure of study protocol for a given project. Field observations pointed to three major emerging themes: (a) lack of standardized process for data registration at source document, (b) multiplicity of data repositories and (c) scarcity of decision support systems at the point of research intervention. Simulation with policy model demonstrates a reduction of the rework problem. Conclusions/Significance Patterns of threats to data quality at the two sites were similar to the threats reported in the literature for American sites. The clinical trial site managers need to reorganize staff workflow by using information technology more efficiently, establish new standard procedures and manage professionals to reduce near miss events and save time/cost. Clinical trial sponsors should improve relevant support systems. PMID:22768105
Using Workflows to Explore and Optimise Named Entity Recognition for Chemistry

PubMed Central

Kolluru, BalaKrishna; Hawizy, Lezan; Murray-Rust, Peter; Tsujii, Junichi; Ananiadou, Sophia

2011-01-01

Chemistry text mining tools should be interoperable and adaptable regardless of system-level implementation, installation or even programming issues. We aim to abstract the functionality of these tools from the underlying implementation via reconfigurable workflows for automatically identifying chemical names. To achieve this, we refactored an established named entity recogniser (in the chemistry domain), OSCAR and studied the impact of each component on the net performance. We developed two reconfigurable workflows from OSCAR using an interoperable text mining framework, U-Compare. These workflows can be altered using the drag-&-drop mechanism of the graphical user interface of U-Compare. These workflows also provide a platform to study the relationship between text mining components such as tokenisation and named entity recognition (using maximum entropy Markov model (MEMM) and pattern recognition based classifiers). Results indicate that, for chemistry in particular, eliminating noise generated by tokenisation techniques lead to a slightly better performance than others, in terms of named entity recognition (NER) accuracy. Poor tokenisation translates into poorer input to the classifier components which in turn leads to an increase in Type I or Type II errors, thus, lowering the overall performance. On the Sciborg corpus, the workflow based system, which uses a new tokeniser whilst retaining the same MEMM component, increases the F-score from 82.35% to 84.44%. On the PubMed corpus, it recorded an F-score of 84.84% as against 84.23% by OSCAR. PMID:21633495
Using workflows to explore and optimise named entity recognition for chemistry.

PubMed

Kolluru, Balakrishna; Hawizy, Lezan; Murray-Rust, Peter; Tsujii, Junichi; Ananiadou, Sophia

2011-01-01

Chemistry text mining tools should be interoperable and adaptable regardless of system-level implementation, installation or even programming issues. We aim to abstract the functionality of these tools from the underlying implementation via reconfigurable workflows for automatically identifying chemical names. To achieve this, we refactored an established named entity recogniser (in the chemistry domain), OSCAR and studied the impact of each component on the net performance. We developed two reconfigurable workflows from OSCAR using an interoperable text mining framework, U-Compare. These workflows can be altered using the drag-&-drop mechanism of the graphical user interface of U-Compare. These workflows also provide a platform to study the relationship between text mining components such as tokenisation and named entity recognition (using maximum entropy Markov model (MEMM) and pattern recognition based classifiers). Results indicate that, for chemistry in particular, eliminating noise generated by tokenisation techniques lead to a slightly better performance than others, in terms of named entity recognition (NER) accuracy. Poor tokenisation translates into poorer input to the classifier components which in turn leads to an increase in Type I or Type II errors, thus, lowering the overall performance. On the Sciborg corpus, the workflow based system, which uses a new tokeniser whilst retaining the same MEMM component, increases the F-score from 82.35% to 84.44%. On the PubMed corpus, it recorded an F-score of 84.84% as against 84.23% by OSCAR.
ITK: enabling reproducible research and open science

PubMed Central

McCormick, Matthew; Liu, Xiaoxiao; Jomier, Julien; Marion, Charles; Ibanez, Luis

2014-01-01

Reproducibility verification is essential to the practice of the scientific method. Researchers report their findings, which are strengthened as other independent groups in the scientific community share similar outcomes. In the many scientific fields where software has become a fundamental tool for capturing and analyzing data, this requirement of reproducibility implies that reliable and comprehensive software platforms and tools should be made available to the scientific community. The tools will empower them and the public to verify, through practice, the reproducibility of observations that are reported in the scientific literature. Medical image analysis is one of the fields in which the use of computational resources, both software and hardware, are an essential platform for performing experimental work. In this arena, the introduction of the Insight Toolkit (ITK) in 1999 has transformed the field and facilitates its progress by accelerating the rate at which algorithmic implementations are developed, tested, disseminated and improved. By building on the efficiency and quality of open source methodologies, ITK has provided the medical image community with an effective platform on which to build a daily workflow that incorporates the true scientific practices of reproducibility verification. This article describes the multiple tools, methodologies, and practices that the ITK community has adopted, refined, and followed during the past decade, in order to become one of the research communities with the most modern reproducibility verification infrastructure. For example, 207 contributors have created over 2400 unit tests that provide over 84% code line test coverage. The Insight Journal, an open publication journal associated with the toolkit, has seen over 360,000 publication downloads. The median normalized closeness centrality, a measure of knowledge flow, resulting from the distributed peer code review system was high, 0.46. PMID:24600387
ITK: enabling reproducible research and open science.

PubMed

McCormick, Matthew; Liu, Xiaoxiao; Jomier, Julien; Marion, Charles; Ibanez, Luis

2014-01-01

Reproducibility verification is essential to the practice of the scientific method. Researchers report their findings, which are strengthened as other independent groups in the scientific community share similar outcomes. In the many scientific fields where software has become a fundamental tool for capturing and analyzing data, this requirement of reproducibility implies that reliable and comprehensive software platforms and tools should be made available to the scientific community. The tools will empower them and the public to verify, through practice, the reproducibility of observations that are reported in the scientific literature. Medical image analysis is one of the fields in which the use of computational resources, both software and hardware, are an essential platform for performing experimental work. In this arena, the introduction of the Insight Toolkit (ITK) in 1999 has transformed the field and facilitates its progress by accelerating the rate at which algorithmic implementations are developed, tested, disseminated and improved. By building on the efficiency and quality of open source methodologies, ITK has provided the medical image community with an effective platform on which to build a daily workflow that incorporates the true scientific practices of reproducibility verification. This article describes the multiple tools, methodologies, and practices that the ITK community has adopted, refined, and followed during the past decade, in order to become one of the research communities with the most modern reproducibility verification infrastructure. For example, 207 contributors have created over 2400 unit tests that provide over 84% code line test coverage. The Insight Journal, an open publication journal associated with the toolkit, has seen over 360,000 publication downloads. The median normalized closeness centrality, a measure of knowledge flow, resulting from the distributed peer code review system was high, 0.46.
Medical image computing for computer-supported diagnostics and therapy. Advances and perspectives.

PubMed

Handels, H; Ehrhardt, J

2009-01-01

Medical image computing has become one of the most challenging fields in medical informatics. In image-based diagnostics of the future software assistance will become more and more important, and image analysis systems integrating advanced image computing methods are needed to extract quantitative image parameters to characterize the state and changes of image structures of interest (e.g. tumors, organs, vessels, bones etc.) in a reproducible and objective way. Furthermore, in the field of software-assisted and navigated surgery medical image computing methods play a key role and have opened up new perspectives for patient treatment. However, further developments are needed to increase the grade of automation, accuracy, reproducibility and robustness. Moreover, the systems developed have to be integrated into the clinical workflow. For the development of advanced image computing systems methods of different scientific fields have to be adapted and used in combination. The principal methodologies in medical image computing are the following: image segmentation, image registration, image analysis for quantification and computer assisted image interpretation, modeling and simulation as well as visualization and virtual reality. Especially, model-based image computing techniques open up new perspectives for prediction of organ changes and risk analysis of patients and will gain importance in diagnostic and therapy of the future. From a methodical point of view the authors identify the following future trends and perspectives in medical image computing: development of optimized application-specific systems and integration into the clinical workflow, enhanced computational models for image analysis and virtual reality training systems, integration of different image computing methods, further integration of multimodal image data and biosignals and advanced methods for 4D medical image computing. The development of image analysis systems for diagnostic support or operation planning is a complex interdisciplinary process. Image computing methods enable new insights into the patient's image data and have the future potential to improve medical diagnostics and patient treatment.
Using Technology, Clinical Workflow Redesign, and Team Solutions to Achieve the Patient Centered Medical Home

DTIC Science & Technology

2011-01-01

The Quadruple Aim: Working Together, Achieving Success 2011 Military Health System Conference TMA and Services Using Technology, Clinical Workflow...Redesign, and Team Solutions to Achieve the Patient Centered Medical Home LTC Nicole Kerkenbush, MHA, MN Army Medical Department, Office of the...Surgeon General Chief Medical Information Officer 1 Military Health System Conference Report Documentation Page Form ApprovedOMB No. 0704-0188 Public
An access control model with high security for distributed workflow and real-time application

NASA Astrophysics Data System (ADS)

Han, Ruo-Fei; Wang, Hou-Xiang

2007-11-01

The traditional mandatory access control policy (MAC) is regarded as a policy with strict regulation and poor flexibility. The security policy of MAC is so compelling that few information systems would adopt it at the cost of facility, except some particular cases with high security requirement as military or government application. However, with the increasing requirement for flexibility, even some access control systems in military application have switched to role-based access control (RBAC) which is well known as flexible. Though RBAC can meet the demands for flexibility but it is weak in dynamic authorization and consequently can not fit well in the workflow management systems. The task-role-based access control (T-RBAC) is then introduced to solve the problem. It combines both the advantages of RBAC and task-based access control (TBAC) which uses task to manage permissions dynamically. To satisfy the requirement of system which is distributed, well defined with workflow process and critically for time accuracy, this paper will analyze the spirit of MAC, introduce it into the improved T&RBAC model which is based on T-RBAC. At last, a conceptual task-role-based access control model with high security for distributed workflow and real-time application (A_T&RBAC) is built, and its performance is simply analyzed.
Interactive Parallel Data Analysis within Data-Centric Cluster Facilities using the IPython Notebook

NASA Astrophysics Data System (ADS)

Pascoe, S.; Lansdowne, J.; Iwi, A.; Stephens, A.; Kershaw, P.

2012-12-01

The data deluge is making traditional analysis workflows for many researchers obsolete. Support for parallelism within popular tools such as matlab, IDL and NCO is not well developed and rarely used. However parallelism is necessary for processing modern data volumes on a timescale conducive to curiosity-driven analysis. Furthermore, for peta-scale datasets such as the CMIP5 archive, it is no longer practical to bring an entire dataset to a researcher's workstation for analysis, or even to their institutional cluster. Therefore, there is an increasing need to develop new analysis platforms which both enable processing at the point of data storage and which provides parallelism. Such an environment should, where possible, maintain the convenience and familiarity of our current analysis environments to encourage curiosity-driven research. We describe how we are combining the interactive python shell (IPython) with our JASMIN data-cluster infrastructure. IPython has been specifically designed to bridge the gap between the HPC-style parallel workflows and the opportunistic curiosity-driven analysis usually carried out using domain specific languages and scriptable tools. IPython offers a web-based interactive environment, the IPython notebook, and a cluster engine for parallelism all underpinned by the well-respected Python/Scipy scientific programming stack. JASMIN is designed to support the data analysis requirements of the UK and European climate and earth system modeling community. JASMIN, with its sister facility CEMS focusing the earth observation community, has 4.5 PB of fast parallel disk storage alongside over 370 computing cores provide local computation. Through the IPython interface to JASMIN, users can make efficient use of JASMIN's multi-core virtual machines to perform interactive analysis on all cores simultaneously or can configure IPython clusters across multiple VMs. Larger-scale clusters can be provisioned through JASMIN's batch scheduling system. Outputs can be summarised and visualised using the full power of Python's many scientific tools, including Scipy, Matplotlib, Pandas and CDAT. This rich user experience is delivered through the user's web browser; maintaining the interactive feel of a workstation-based environment with the parallel power of a remote data-centric processing facility.
The Globus Galaxies Platform. Delivering Science Gateways as a Service

DOE Office of Scientific and Technical Information (OSTI.GOV)

Madduri, Ravi; Chard, Kyle; Chard, Ryan

We use public cloud computers to host sophisticated scientific data; software is then used to transform scientific practice by enabling broad access to capabilities previously available only to the few. The primary obstacle to more widespread use of public clouds to host scientific software (‘cloud-based science gateways’) has thus far been the considerable gap between the specialized needs of science applications and the capabilities provided by cloud infrastructures. We describe here a domain-independent, cloud-based science gateway platform, the Globus Galaxies platform, which overcomes this gap by providing a set of hosted services that directly address the needs of science gatewaymore » developers. The design and implementation of this platform leverages our several years of experience with Globus Genomics, a cloud-based science gateway that has served more than 200 genomics researchers across 30 institutions. Building on that foundation, we have also implemented a platform that leverages the popular Galaxy system for application hosting and workflow execution; Globus services for data transfer, user and group management, and authentication; and a cost-aware elastic provisioning model specialized for public cloud resources. We describe here the capabilities and architecture of this platform, present six scientific domains in which we have successfully applied it, report on user experiences, and analyze the economics of our deployments. Published 2015. This article is a U.S. Government work and is in the public domain in the USA.« less
Imaging workflow and calibration for CT-guided time-domain fluorescence tomography

PubMed Central

Tichauer, Kenneth M.; Holt, Robert W.; El-Ghussein, Fadi; Zhu, Qun; Dehghani, Hamid; Leblond, Frederic; Pogue, Brian W.

2011-01-01

In this study, several key optimization steps are outlined for a non-contact, time-correlated single photon counting small animal optical tomography system, using simultaneous collection of both fluorescence and transmittance data. The system is presented for time-domain image reconstruction in vivo, illustrating the sensitivity from single photon counting and the calibration steps needed to accurately process the data. In particular, laser time- and amplitude-referencing, detector and filter calibrations, and collection of a suitable instrument response function are all presented in the context of time-domain fluorescence tomography and a fully automated workflow is described. Preliminary phantom time-domain reconstructed images demonstrate the fidelity of the workflow for fluorescence tomography based on signal from multiple time gates. PMID:22076264
Data Management and Archiving - a Long Process

NASA Astrophysics Data System (ADS)

Gebauer, Petra; Bertelmann, Roland; Hasler, Tim; Kirchner, Ingo; Klump, Jens; Mettig, Nora; Peters-Kottig, Wolfgang; Rusch, Beate; Ulbricht, Damian

2014-05-01

Implementing policies for research data management to the end of data archiving at university institutions takes a long time. Even though, especially in geosciences, most of the scientists are familiar to analyze different sorts of data, to present statistical results and to write publications sometimes based on big data records, only some of them manage their data in a standardized manner. Much more often they have learned how to measure and to generate large volumes of data than to document these measurements and to preserve them for the future. Changing staff and limited funding make this work more difficult, but it is essential in a progressively developing digital and networked world. Results from the project EWIG (Translates to: Developing workflow components for long-term archiving of research data in geosciences), funded by Deutsche Forschungsgemeinschaft, will help on these theme. Together with the project partners Deutsches GeoForschungsZentrum Potsdam and Konrad-Zuse-Zentrum für Informationstechnik Berlin a workflow to transfer continuously recorded data from a meteorological city monitoring network into a long-term archive was developed. This workflow includes quality assurance of the data as well as description of metadata and using tools to prepare data packages for long term archiving. It will be an exemplary model for other institutions working with similar data. The development of this workflow is closely intertwined with the educational curriculum at the Institut für Meteorologie. Designing modules to run quality checks for meteorological time series of data measured every minute and preparing metadata are tasks in actual bachelor theses. Students will also test the usability of the generated working environment. Based on these experiences a practical guideline for integrating research data management in curricula will be one of the results of this project, for postgraduates as well as for younger students. Especially at the beginning of the scientific career it is necessary to become familiar with all issues concerning data management. The outcomes of EWIG are intended to be generic enough to be easily adopted by other institutions. University lectures in meteorology were started to teach future scientific generations right from the start how to deal with all sorts of different data in a transparent way. The progress of the project EWIG can be followed on the web via ewig.gfz-potsdam.de
A pattern-based analysis of clinical computer-interpretable guideline modeling languages.

PubMed

Mulyar, Nataliya; van der Aalst, Wil M P; Peleg, Mor

2007-01-01

Languages used to specify computer-interpretable guidelines (CIGs) differ in their approaches to addressing particular modeling challenges. The main goals of this article are: (1) to examine the expressive power of CIG modeling languages, and (2) to define the differences, from the control-flow perspective, between process languages in workflow management systems and modeling languages used to design clinical guidelines. The pattern-based analysis was applied to guideline modeling languages Asbru, EON, GLIF, and PROforma. We focused on control-flow and left other perspectives out of consideration. We evaluated the selected CIG modeling languages and identified their degree of support of 43 control-flow patterns. We used a set of explicitly defined evaluation criteria to determine whether each pattern is supported directly, indirectly, or not at all. PROforma offers direct support for 22 of 43 patterns, Asbru 20, GLIF 17, and EON 11. All four directly support basic control-flow patterns, cancellation patterns, and some advance branching and synchronization patterns. None support multiple instances patterns. They offer varying levels of support for synchronizing merge patterns and state-based patterns. Some support a few scenarios not covered by the 43 control-flow patterns. CIG modeling languages are remarkably close to traditional workflow languages from the control-flow perspective, but cover many fewer workflow patterns. CIG languages offer some flexibility that supports modeling of complex decisions and provide ways for modeling some decisions not covered by workflow management systems. Workflow management systems may be suitable for clinical guideline applications.
Hydrography for the non-Hydrographer: A Paradigm shift in Data Processing

NASA Astrophysics Data System (ADS)

Malzone, C.; Bruce, S.

2017-12-01

Advancements in technology have led to overall systematic improvements including; hardware design, software architecture, data transmission/ telepresence. Historically, utilization of this technology has required a high knowledge level obtained with many years of experience, training and/or education. High training costs are incurred to achieve and maintain an acceptable level proficiency within an organization. Recently, engineers have developed off-the-shelf software technology called Qimera that has simplified the processing of hydrographic data. The core technology is centered around the isolation of tasks within the work- flow to capitalize on the technological advances in computing technology to automate the mundane error prone tasks to bring more value to the stages in which the human brain brings value. Key design features include: guided workflow, transcription automation, processing state management, real-time QA, dynamic workflow for validation, collaborative cleaning and production line processing. Since, Qimera is designed to guide the user, it allows expedition leaders to focus on science while providing an educational opportunity for students to quickly learn the hydrographic processing workflow including ancillary data analysis, trouble-shooting, calibration and cleaning. This paper provides case studies on how Qimera is currently implemented in scientific expeditions, benefits of implementation and how it is directing the future of on-board research for the non-hydrographer.
A high throughput geocomputing system for remote sensing quantitative retrieval and a case study

NASA Astrophysics Data System (ADS)

Xue, Yong; Chen, Ziqiang; Xu, Hui; Ai, Jianwen; Jiang, Shuzheng; Li, Yingjie; Wang, Ying; Guang, Jie; Mei, Linlu; Jiao, Xijuan; He, Xingwei; Hou, Tingting

2011-12-01

The quality and accuracy of remote sensing instruments have been improved significantly, however, rapid processing of large-scale remote sensing data becomes the bottleneck for remote sensing quantitative retrieval applications. The remote sensing quantitative retrieval is a data-intensive computation application, which is one of the research issues of high throughput computation. The remote sensing quantitative retrieval Grid workflow is a high-level core component of remote sensing Grid, which is used to support the modeling, reconstruction and implementation of large-scale complex applications of remote sensing science. In this paper, we intend to study middleware components of the remote sensing Grid - the dynamic Grid workflow based on the remote sensing quantitative retrieval application on Grid platform. We designed a novel architecture for the remote sensing Grid workflow. According to this architecture, we constructed the Remote Sensing Information Service Grid Node (RSSN) with Condor. We developed a graphic user interface (GUI) tools to compose remote sensing processing Grid workflows, and took the aerosol optical depth (AOD) retrieval as an example. The case study showed that significant improvement in the system performance could be achieved with this implementation. The results also give a perspective on the potential of applying Grid workflow practices to remote sensing quantitative retrieval problems using commodity class PCs.
Multiple hybrid de novo genome assembly of finger millet, an orphan allotetraploid crop.

PubMed

Hatakeyama, Masaomi; Aluri, Sirisha; Balachadran, Mathi Thumilan; Sivarajan, Sajeevan Radha; Patrignani, Andrea; Grüter, Simon; Poveda, Lucy; Shimizu-Inatsugi, Rie; Baeten, John; Francoijs, Kees-Jan; Nataraja, Karaba N; Reddy, Yellodu A Nanja; Phadnis, Shamprasad; Ravikumar, Ramapura L; Schlapbach, Ralph; Sreeman, Sheshshayee M; Shimizu, Kentaro K

2017-09-05

Finger millet (Eleusine coracana (L.) Gaertn) is an important crop for food security because of its tolerance to drought, which is expected to be exacerbated by global climate changes. Nevertheless, it is often classified as an orphan/underutilized crop because of the paucity of scientific attention. Among several small millets, finger millet is considered as an excellent source of essential nutrient elements, such as iron and zinc; hence, it has potential as an alternate coarse cereal. However, high-quality genome sequence data of finger millet are currently not available. One of the major problems encountered in the genome assembly of this species was its polyploidy, which hampers genome assembly compared with a diploid genome. To overcome this problem, we sequenced its genome using diverse technologies with sufficient coverage and assembled it via a novel multiple hybrid assembly workflow that combines next-generation with single-molecule sequencing, followed by whole-genome optical mapping using the Bionano Irys® system. The total number of scaffolds was 1,897 with an N50 length >2.6 Mb and detection of 96% of the universal single-copy orthologs. The majority of the homeologs were assembled separately. This indicates that the proposed workflow is applicable to the assembly of other allotetraploid genomes. © The Author 2017. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
About a method for compressing x-ray computed microtomography data

NASA Astrophysics Data System (ADS)

Mancini, Lucia; Kourousias, George; Billè, Fulvio; De Carlo, Francesco; Fidler, Aleš

2018-04-01

The management of scientific data is of high importance especially for experimental techniques that produce big data volumes. Such a technique is x-ray computed tomography (CT) and its community has introduced advanced data formats which allow for better management of experimental data. Rather than the organization of the data and the associated meta-data, the main topic on this work is data compression and its applicability to experimental data collected from a synchrotron-based CT beamline at the Elettra-Sincrotrone Trieste facility (Italy) and studies images acquired from various types of samples. This study covers parallel beam geometry, but it could be easily extended to a cone-beam one. The reconstruction workflow used is the one currently in operation at the beamline. Contrary to standard image compression studies, this manuscript proposes a systematic framework and workflow for the critical examination of different compression techniques and does so by applying it to experimental data. Beyond the methodology framework, this study presents and examines the use of JPEG-XR in combination with HDF5 and TIFF formats providing insights and strategies on data compression and image quality issues that can be used and implemented at other synchrotron facilities and laboratory systems. In conclusion, projection data compression using JPEG-XR appears as a promising, efficient method to reduce data file size and thus to facilitate data handling and image reconstruction.
Reproducible Research in the Geosciences at Scale: Achievable Goal or Elusive Dream?

NASA Astrophysics Data System (ADS)

Wyborn, L. A.; Evans, B. J. K.

2016-12-01

Reproducibility is a fundamental tenant of the scientific method: it implies that any researcher, or a third party working independently, can duplicate any experiment or investigation and produce the same results. Historically computationally based research involved an individual using their own data and processing it in their own private area, often using software they wrote or inherited from close collaborators. Today, a researcher is likely to be part of a large team that will use a subset of data from an external repository and then process the data on a public or private cloud or on a large centralised supercomputer, using a mixture of their own code, third party software and libraries, or global community codes. In 'Big Geoscience' research it is common for data inputs to be extracts from externally managed dynamic data collections, where new data is being regularly appended, or existing data is revised when errors are detected and/or as processing methods are improved. New workflows increasingly use services to access data dynamically to create subsets on-the-fly from distributed sources, each of which can have a complex history. At major computational facilities, underlying systems, libraries, software and services are being constantly tuned and optimised, or as new or replacement infrastructure being installed. Likewise code used from a community repository is continually being refined, re-packaged and ported to the target platform. To achieve reproducibility, today's researcher increasingly needs to track their workflow, including querying information on the current or historical state of facilities used. Versioning methods are standard practice for software repositories or packages, but it is not common for either data repositories or data services to provide information about their state, or for systems to provide query-able access to changes in the underlying software. While a researcher can achieve transparency and describe steps in their workflow so that others can repeat them and replicate processes undertaken, they cannot achieve exact reproducibility or even transparency of results generated. In Big Geoscience, full reproducibiliy will be an elusive dream until data repositories and compute facilities can provide provenance information in a standards compliant, machine query-able way.

OntoMate: a text-mining tool aiding curation at the Rat Genome Database

PubMed Central

Liu, Weisong; Laulederkind, Stanley J. F.; Hayman, G. Thomas; Wang, Shur-Jen; Nigam, Rajni; Smith, Jennifer R.; De Pons, Jeff; Dwinell, Melinda R.; Shimoyama, Mary

2015-01-01

The Rat Genome Database (RGD) is the premier repository of rat genomic, genetic and physiologic data. Converting data from free text in the scientific literature to a structured format is one of the main tasks of all model organism databases. RGD spends considerable effort manually curating gene, Quantitative Trait Locus (QTL) and strain information. The rapidly growing volume of biomedical literature and the active research in the biological natural language processing (bioNLP) community have given RGD the impetus to adopt text-mining tools to improve curation efficiency. Recently, RGD has initiated a project to use OntoMate, an ontology-driven, concept-based literature search engine developed at RGD, as a replacement for the PubMed (http://www.ncbi.nlm.nih.gov/pubmed) search engine in the gene curation workflow. OntoMate tags abstracts with gene names, gene mutations, organism name and most of the 16 ontologies/vocabularies used at RGD. All terms/ entities tagged to an abstract are listed with the abstract in the search results. All listed terms are linked both to data entry boxes and a term browser in the curation tool. OntoMate also provides user-activated filters for species, date and other parameters relevant to the literature search. Using the system for literature search and import has streamlined the process compared to using PubMed. The system was built with a scalable and open architecture, including features specifically designed to accelerate the RGD gene curation process. With the use of bioNLP tools, RGD has added more automation to its curation workflow. Database URL: http://rgd.mcw.edu PMID:25619558
Virtual Geophysics Laboratory: Exploiting the Cloud and Empowering Geophysicsts

NASA Astrophysics Data System (ADS)

Fraser, Ryan; Vote, Josh; Goh, Richard; Cox, Simon

2013-04-01

Over the last five decades geoscientists from Australian state and federal agencies have collected and assembled around 3 Petabytes of geoscience data sets under public funding. As a consequence of technological progress, data is now being acquired at exponential rates and in higher resolution than ever before. Effective use of these big data sets challenges the storage and computational infrastructure of most organizations. The Virtual Geophysics Laboratory (VGL) is a scientific workflow portal addresses some of the resulting issues by providing Australian geophysicists with access to a Web 2.0 or Rich Internet Application (RIA) based integrated environment that exploits eResearch tools and Cloud computing technology, and promotes collaboration between the user community. VGL simplifies and automates large portions of what were previously manually intensive scientific workflow processes, allowing scientists to focus on the natural science problems, rather than computer science and IT. A number of geophysical processing codes are incorporated to support multiple workflows. For example a gravity inversion can be performed by combining the Escript/Finley codes (from the University of Queensland) with the gravity data registered in VGL. Likewise, tectonic processes can also be modeled by combining the Underworld code (from Monash University) with one of the various 3D models available to VGL. Cloud services provide scalable and cost effective compute resources. VGL is built on top of mature standards-compliant information services, many deployed using the Spatial Information Services Stack (SISS), which provides direct access to geophysical data. A large number of data sets from Geoscience Australia assist users in data discovery. GeoNetwork provides a metadata catalog to store workflow results for future use, discovery and provenance tracking. VGL has been developed in collaboration with the research community using incremental software development practices and open source tools. While developed to provide the geophysics research community with a sustainable platform and scalable infrastructure; VGL has also developed a number of concepts, patterns and generic components of which have been reused for cases beyond geophysics, including natural hazards, satellite processing and other areas requiring spatial data discovery and processing. Future plans for VGL include a number of improvements in both functional and non-functional areas in response to its user community needs and advancement in information technologies. In particular, research is underway in the following areas (a) distributed and parallel workflow processing in the cloud, (b) seamless integration with various cloud providers, and (c) integration with virtual laboratories representing other science domains. Acknowledgements: VGL was developed by CSIRO in collaboration with Geoscience Australia, National Computational Infrastructure, Australia National University, Monash University and University of Queensland, and has been supported by the Australian Government's Education Investment Funds through NeCTAR.
Content and Workflow Management for Library Websites: Case Studies

ERIC Educational Resources Information Center

Yu, Holly, Ed.

2005-01-01

Using database-driven web pages or web content management (WCM) systems to manage increasingly diverse web content and to streamline workflows is a commonly practiced solution recognized in libraries today. However, limited library web content management models and funding constraints prevent many libraries from purchasing commercially available…
A workflow learning model to improve geovisual analytics utility

PubMed Central

Roth, Robert E; MacEachren, Alan M; McCabe, Craig A

2011-01-01

Introduction This paper describes the design and implementation of the G-EX Portal Learn Module, a web-based, geocollaborative application for organizing and distributing digital learning artifacts. G-EX falls into the broader context of geovisual analytics, a new research area with the goal of supporting visually-mediated reasoning about large, multivariate, spatiotemporal information. Because this information is unprecedented in amount and complexity, GIScientists are tasked with the development of new tools and techniques to make sense of it. Our research addresses the challenge of implementing these geovisual analytics tools and techniques in a useful manner. Objectives The objective of this paper is to develop and implement a method for improving the utility of geovisual analytics software. The success of software is measured by its usability (i.e., how easy the software is to use?) and utility (i.e., how useful the software is). The usability and utility of software can be improved by refining the software, increasing user knowledge about the software, or both. It is difficult to achieve transparent usability (i.e., software that is immediately usable without training) of geovisual analytics software because of the inherent complexity of the included tools and techniques. In these situations, improving user knowledge about the software through the provision of learning artifacts is as important, if not more so, than iterative refinement of the software itself. Therefore, our approach to improving utility is focused on educating the user. Methodology The research reported here was completed in two steps. First, we developed a model for learning about geovisual analytics software. Many existing digital learning models assist only with use of the software to complete a specific task and provide limited assistance with its actual application. To move beyond task-oriented learning about software use, we propose a process-oriented approach to learning based on the concept of scientific workflows. Second, we implemented an interface in the G-EX Portal Learn Module to demonstrate the workflow learning model. The workflow interface allows users to drag learning artifacts uploaded to the G-EX Portal onto a central whiteboard and then annotate the workflow using text and drawing tools. Once completed, users can visit the assembled workflow to get an idea of the kind, number, and scale of analysis steps, view individual learning artifacts associated with each node in the workflow, and ask questions about the overall workflow or individual learning artifacts through the associated forums. An example learning workflow in the domain of epidemiology is provided to demonstrate the effectiveness of the approach. Results/Conclusions In the context of geovisual analytics, GIScientists are not only responsible for developing software to facilitate visually-mediated reasoning about large and complex spatiotemporal information, but also for ensuring that this software works. The workflow learning model discussed in this paper and demonstrated in the G-EX Portal Learn Module is one approach to improving the utility of geovisual analytics software. While development of the G-EX Portal Learn Module is ongoing, we expect to release the G-EX Portal Learn Module by Summer 2009. PMID:21983545
A workflow learning model to improve geovisual analytics utility.

PubMed

Roth, Robert E; Maceachren, Alan M; McCabe, Craig A

2009-01-01

INTRODUCTION: This paper describes the design and implementation of the G-EX Portal Learn Module, a web-based, geocollaborative application for organizing and distributing digital learning artifacts. G-EX falls into the broader context of geovisual analytics, a new research area with the goal of supporting visually-mediated reasoning about large, multivariate, spatiotemporal information. Because this information is unprecedented in amount and complexity, GIScientists are tasked with the development of new tools and techniques to make sense of it. Our research addresses the challenge of implementing these geovisual analytics tools and techniques in a useful manner. OBJECTIVES: The objective of this paper is to develop and implement a method for improving the utility of geovisual analytics software. The success of software is measured by its usability (i.e., how easy the software is to use?) and utility (i.e., how useful the software is). The usability and utility of software can be improved by refining the software, increasing user knowledge about the software, or both. It is difficult to achieve transparent usability (i.e., software that is immediately usable without training) of geovisual analytics software because of the inherent complexity of the included tools and techniques. In these situations, improving user knowledge about the software through the provision of learning artifacts is as important, if not more so, than iterative refinement of the software itself. Therefore, our approach to improving utility is focused on educating the user. METHODOLOGY: The research reported here was completed in two steps. First, we developed a model for learning about geovisual analytics software. Many existing digital learning models assist only with use of the software to complete a specific task and provide limited assistance with its actual application. To move beyond task-oriented learning about software use, we propose a process-oriented approach to learning based on the concept of scientific workflows. Second, we implemented an interface in the G-EX Portal Learn Module to demonstrate the workflow learning model. The workflow interface allows users to drag learning artifacts uploaded to the G-EX Portal onto a central whiteboard and then annotate the workflow using text and drawing tools. Once completed, users can visit the assembled workflow to get an idea of the kind, number, and scale of analysis steps, view individual learning artifacts associated with each node in the workflow, and ask questions about the overall workflow or individual learning artifacts through the associated forums. An example learning workflow in the domain of epidemiology is provided to demonstrate the effectiveness of the approach. RESULTS/CONCLUSIONS: In the context of geovisual analytics, GIScientists are not only responsible for developing software to facilitate visually-mediated reasoning about large and complex spatiotemporal information, but also for ensuring that this software works. The workflow learning model discussed in this paper and demonstrated in the G-EX Portal Learn Module is one approach to improving the utility of geovisual analytics software. While development of the G-EX Portal Learn Module is ongoing, we expect to release the G-EX Portal Learn Module by Summer 2009.
Using R for large spatiotemporal data sets

NASA Astrophysics Data System (ADS)

Pebesma, Edzer

2017-04-01

Writing and sharing scientific software is a means to communicate scientific ideas for finding scientific consensus, no more and no less than writing and sharing scientific papers is. Important factors for successful communication are adopting an open source environment, and using a language that is understood by many. For many scientist, R's combination of rich data abstraction and highly exposed data structures makes it an attractive communication tool. This paper discusses the development of spatial and spatiotemporal data handling and analysis with R since 2000, and will point to some of R's strengths and weaknesses in a historical perspective. We will also discuss a new, S3-based package for feature data ("Simple Features for R"), and point to a way forward into the data science realm, where pipeline-based workflows are assumed. Finally, we will discuss how, in a similar vein, massive satellite or climate model data sets, potentially held in a cloud environment, can be handled and analyzed with R.
Integrated workflows for spiking neuronal network simulations

PubMed Central

Antolík, Ján; Davison, Andrew P.

2013-01-01

The increasing availability of computational resources is enabling more detailed, realistic modeling in computational neuroscience, resulting in a shift toward more heterogeneous models of neuronal circuits, and employment of complex experimental protocols. This poses a challenge for existing tool chains, as the set of tools involved in a typical modeler's workflow is expanding concomitantly, with growing complexity in the metadata flowing between them. For many parts of the workflow, a range of tools is available; however, numerous areas lack dedicated tools, while integration of existing tools is limited. This forces modelers to either handle the workflow manually, leading to errors, or to write substantial amounts of code to automate parts of the workflow, in both cases reducing their productivity. To address these issues, we have developed Mozaik: a workflow system for spiking neuronal network simulations written in Python. Mozaik integrates model, experiment and stimulation specification, simulation execution, data storage, data analysis and visualization into a single automated workflow, ensuring that all relevant metadata are available to all workflow components. It is based on several existing tools, including PyNN, Neo, and Matplotlib. It offers a declarative way to specify models and recording configurations using hierarchically organized configuration files. Mozaik automatically records all data together with all relevant metadata about the experimental context, allowing automation of the analysis and visualization stages. Mozaik has a modular architecture, and the existing modules are designed to be extensible with minimal programming effort. Mozaik increases the productivity of running virtual experiments on highly structured neuronal networks by automating the entire experimental cycle, while increasing the reliability of modeling studies by relieving the user from manual handling of the flow of metadata between the individual workflow stages. PMID:24368902
Create, run, share, publish, and reference your LC-MS, FIA-MS, GC-MS, and NMR data analysis workflows with the Workflow4Metabolomics 3.0 Galaxy online infrastructure for metabolomics.

PubMed

Guitton, Yann; Tremblay-Franco, Marie; Le Corguillé, Gildas; Martin, Jean-François; Pétéra, Mélanie; Roger-Mele, Pierrick; Delabrière, Alexis; Goulitquer, Sophie; Monsoor, Misharl; Duperier, Christophe; Canlet, Cécile; Servien, Rémi; Tardivel, Patrick; Caron, Christophe; Giacomoni, Franck; Thévenot, Etienne A

2017-12-01

Metabolomics is a key approach in modern functional genomics and systems biology. Due to the complexity of metabolomics data, the variety of experimental designs, and the multiplicity of bioinformatics tools, providing experimenters with a simple and efficient resource to conduct comprehensive and rigorous analysis of their data is of utmost importance. In 2014, we launched the Workflow4Metabolomics (W4M; http://workflow4metabolomics.org) online infrastructure for metabolomics built on the Galaxy environment, which offers user-friendly features to build and run data analysis workflows including preprocessing, statistical analysis, and annotation steps. Here we present the new W4M 3.0 release, which contains twice as many tools as the first version, and provides two features which are, to our knowledge, unique among online resources. First, data from the four major metabolomics technologies (i.e., LC-MS, FIA-MS, GC-MS, and NMR) can be analyzed on a single platform. By using three studies in human physiology, alga evolution, and animal toxicology, we demonstrate how the 40 available tools can be easily combined to address biological issues. Second, the full analysis (including the workflow, the parameter values, the input data and output results) can be referenced with a permanent digital object identifier (DOI). Publication of data analyses is of major importance for robust and reproducible science. Furthermore, the publicly shared workflows are of high-value for e-learning and training. The Workflow4Metabolomics 3.0 e-infrastructure thus not only offers a unique online environment for analysis of data from the main metabolomics technologies, but it is also the first reference repository for metabolomics workflows. Copyright © 2017 Elsevier Ltd. All rights reserved.
Integrated workflows for spiking neuronal network simulations.

PubMed

Antolík, Ján; Davison, Andrew P

2013-01-01

The increasing availability of computational resources is enabling more detailed, realistic modeling in computational neuroscience, resulting in a shift toward more heterogeneous models of neuronal circuits, and employment of complex experimental protocols. This poses a challenge for existing tool chains, as the set of tools involved in a typical modeler's workflow is expanding concomitantly, with growing complexity in the metadata flowing between them. For many parts of the workflow, a range of tools is available; however, numerous areas lack dedicated tools, while integration of existing tools is limited. This forces modelers to either handle the workflow manually, leading to errors, or to write substantial amounts of code to automate parts of the workflow, in both cases reducing their productivity. To address these issues, we have developed Mozaik: a workflow system for spiking neuronal network simulations written in Python. Mozaik integrates model, experiment and stimulation specification, simulation execution, data storage, data analysis and visualization into a single automated workflow, ensuring that all relevant metadata are available to all workflow components. It is based on several existing tools, including PyNN, Neo, and Matplotlib. It offers a declarative way to specify models and recording configurations using hierarchically organized configuration files. Mozaik automatically records all data together with all relevant metadata about the experimental context, allowing automation of the analysis and visualization stages. Mozaik has a modular architecture, and the existing modules are designed to be extensible with minimal programming effort. Mozaik increases the productivity of running virtual experiments on highly structured neuronal networks by automating the entire experimental cycle, while increasing the reliability of modeling studies by relieving the user from manual handling of the flow of metadata between the individual workflow stages.
Active Provenance in Data-intensive Research

NASA Astrophysics Data System (ADS)

Spinuso, Alessandro; Mihajlovski, Andrej; Filgueira, Rosa; Atkinson, Malcolm

2017-04-01

Scientific communities are building platforms where the usage of data-intensive workflows is crucial to conduct their research campaigns. However managing and effectively support the understanding of the 'live' processes, fostering computational steering, sharing and re-use of data and methods, present several bottlenecks. These are often caused by the poor level of documentation on the methods and the data and how users interact with it. This work wants to explore how in such systems, flexibility in the management of the provenance and its adaptation to the different users and application contexts can lead to new opportunities for its exploitation, improving productivity. In particular, this work illustrates a conceptual and technical framework enabling tunable and actionable provenance in data-intensive workflow systems in support of reproducible science. It introduces the concept of Agile data-intensive systems to define the characteristic of our target platform. It shows a novel approach to the integration of provenance mechanisms, offering flexibility in the scale and in the precision of the provenance data collected, ensuring its relevance to the domain of the data-intensive task, fostering its rapid exploitation. The contributions address aspects of the scale of the provenance records, their usability and active role in the research life-cycle. We will discuss the use of dynamically generated provenance types as the approach for the integration of provenance mechanisms into a data-intensive workflow system. Enabling provenance can be transparent to the workflow user and developer, as well as fully controllable and customisable, depending from their expertise and the application's reproducibility, monitoring and validation requirements. The API that allows the realisation and adoption of a provenance type is presented, especially for what concerns the support of provenance profiling, contextualisation and precision. An actionable approach to provenance management will be also discussed, enabling provenance-driven operations at runtime, regardless of the enactment technologies and connectivity impediments. We proposes a framework based on concepts such as provenance clusters and provenance sensors, envisaging new potential for exploiting large quantities of provenance traces at runtime. Finally the work will also introduce how the underlying provenance model can be explored with big-data visualization techniques, aiming at producing comprehensive and interactive views on top of large and heterogeneous provenance data. We will demonstrate the adoption of alternative visualisation methods, from detailed and localised interactive graphs to radial-views, serving different purposes and expertise. Combining provenance types, selective rules, extensible metadata with reactive clustering opens a new and more versatile role of the lineage information in the research life-cycle, thanks to its improved usability. The flexible profiling of the proposed framework offers aid to the human analysis of the process, with the support of advanced and intuitive interactive graphical tools. The Active provenance methods are discussed in the context of a real implementation for a data-intensive library (dispel4py) and its adoption within use cases for computational seismology, climate studies and generic correlation analysis.
Enabling Real-time Water Decision Support Services Using Model as a Service

NASA Astrophysics Data System (ADS)

Zhao, T.; Minsker, B. S.; Lee, J. S.; Salas, F. R.; Maidment, D. R.; David, C. H.

2014-12-01

Through application of computational methods and an integrated information system, data and river modeling services can help researchers and decision makers more rapidly understand river conditions under alternative scenarios. To enable this capability, workflows (i.e., analysis and model steps) are created and published as Web services delivered through an internet browser, including model inputs, a published workflow service, and visualized outputs. The RAPID model, which is a river routing model developed at University of Texas Austin for parallel computation of river discharge, has been implemented as a workflow and published as a Web application. This allows non-technical users to remotely execute the model and visualize results as a service through a simple Web interface. The model service and Web application has been prototyped in the San Antonio and Guadalupe River Basin in Texas, with input from university and agency partners. In the future, optimization model workflows will be developed to link with the RAPID model workflow to provide real-time water allocation decision support services.
Characterizing Strain Variation in Engineered E. coli Using a Multi-Omics-Based Workflow

DOE PAGES

Brunk, Elizabeth; George, Kevin W.; Alonso-Gutierrez, Jorge; ...

2016-05-19

Understanding the complex interactions that occur between heterologous and native biochemical pathways represents a major challenge in metabolic engineering and synthetic biology. We present a workflow that integrates metabolomics, proteomics, and genome-scale models of Escherichia coli metabolism to study the effects of introducing a heterologous pathway into a microbial host. This workflow incorporates complementary approaches from computational systems biology, metabolic engineering, and synthetic biology; provides molecular insight into how the host organism microenvironment changes due to pathway engineering; and demonstrates how biological mechanisms underlying strain variation can be exploited as an engineering strategy to increase product yield. As a proofmore » of concept, we present the analysis of eight engineered strains producing three biofuels: isopentenol, limonene, and bisabolene. Application of this workflow identified the roles of candidate genes, pathways, and biochemical reactions in observed experimental phenomena and facilitated the construction of a mutant strain with improved productivity. The contributed workflow is available as an open-source tool in the form of iPython notebooks.« less
Camera-augmented mobile C-arm (CamC): A feasibility study of augmented reality imaging in the operating room.

PubMed

von der Heide, Anna Maria; Fallavollita, Pascal; Wang, Lejing; Sandner, Philipp; Navab, Nassir; Weidert, Simon; Euler, Ekkehard

2018-04-01

In orthopaedic trauma surgery, image-guided procedures are mostly based on fluoroscopy. The reduction of radiation exposure is an important goal. The purpose of this work was to investigate the impact of a camera-augmented mobile C-arm (CamC) on radiation exposure and the surgical workflow during a first clinical trial. Applying a workflow-oriented approach, 10 general workflow steps were defined to compare the CamC to traditional C-arms. The surgeries included were arbitrarily identified and assigned to the study. The evaluation criteria were radiation exposure and operation time for each workflow step and the entire surgery. The evaluation protocol was designed and conducted in a single-centre study. The radiation exposure was remarkably reduced by 18 X-ray shots 46% using the CamC while keeping similar surgery times. The intuitiveness of the system, its easy integration into the surgical workflow, and its great potential to reduce radiation have been demonstrated. Copyright © 2017 John Wiley & Sons, Ltd.
High-Performance Compute Infrastructure in Astronomy: 2020 Is Only Months Away

NASA Astrophysics Data System (ADS)

Berriman, B.; Deelman, E.; Juve, G.; Rynge, M.; Vöckler, J. S.

2012-09-01

By 2020, astronomy will be awash with as much as 60 PB of public data. Full scientific exploitation of such massive volumes of data will require high-performance computing on server farms co-located with the data. Development of this computing model will be a community-wide enterprise that has profound cultural and technical implications. Astronomers must be prepared to develop environment-agnostic applications that support parallel processing. The community must investigate the applicability and cost-benefit of emerging technologies such as cloud computing to astronomy, and must engage the Computer Science community to develop science-driven cyberinfrastructure such as workflow schedulers and optimizers. We report here the results of collaborations between a science center, IPAC, and a Computer Science research institute, ISI. These collaborations may be considered pathfinders in developing a high-performance compute infrastructure in astronomy. These collaborations investigated two exemplar large-scale science-driver workflow applications: 1) Calculation of an infrared atlas of the Galactic Plane at 18 different wavelengths by placing data from multiple surveys on a common plate scale and co-registering all the pixels; 2) Calculation of an atlas of periodicities present in the public Kepler data sets, which currently contain 380,000 light curves. These products have been generated with two workflow applications, written in C for performance and designed to support parallel processing on multiple environments and platforms, but with different compute resource needs: the Montage image mosaic engine is I/O-bound, and the NASA Star and Exoplanet Database periodogram code is CPU-bound. Our presentation will report cost and performance metrics and lessons-learned for continuing development. Applicability of Cloud Computing: Commercial Cloud providers generally charge for all operations, including processing, transfer of input and output data, and for storage of data, and so the costs of running applications vary widely according to how they use resources. The cloud is well suited to processing CPU-bound (and memory bound) workflows such as the periodogram code, given the relatively low cost of processing in comparison with I/O operations. I/O-bound applications such as Montage perform best on high-performance clusters with fast networks and parallel file-systems. Science-driven Cyberinfrastructure: Montage has been widely used as a driver application to develop workflow management services, such as task scheduling in distributed environments, designing fault tolerance techniques for job schedulers, and developing workflow orchestration techniques. Running Parallel Applications Across Distributed Cloud Environments: Data processing will eventually take place in parallel distributed across cyber infrastructure environments having different architectures. We have used the Pegasus Work Management System (WMS) to successfully run applications across three very different environments: TeraGrid, OSG (Open Science Grid), and FutureGrid. Provisioning resources across different grids and clouds (also referred to as Sky Computing), involves establishing a distributed environment, where issues of, e.g, remote job submission, data management, and security need to be addressed. This environment also requires building virtual machine images that can run in different environments. Usually, each cloud provides basic images that can be customized with additional software and services. In most of our work, we provisioned compute resources using a custom application, called Wrangler. Pegasus WMS abstracts the architectures of the compute environments away from the end-user, and can be considered a first-generation tool suitable for scientists to run their applications on disparate environments.
An XML Representation for Crew Procedures

NASA Technical Reports Server (NTRS)

Simpson, Richard C.

2005-01-01

NASA ensures safe operation of complex systems through the use of formally-documented procedures, which encode the operational knowledge of the system as derived from system experts. Crew members use procedure documentation on the ground for training purposes and on-board space shuttle and space station to guide their activities. Investigators at JSC are developing a new representation for procedures that is content-based (as opposed to display-based). Instead of specifying how a procedure should look on the printed page, the content-based representation will identify the components of a procedure and (more importantly) how the components are related (e.g., how the activities within a procedure are sequenced; what resources need to be available for each activity). This approach will allow different sets of rules to be created for displaying procedures on a computer screen, on a hand-held personal digital assistant (PDA), verbally, or on a printed page, and will also allow intelligent reasoning processes to automatically interpret and use procedure definitions. During his NASA fellowship, Dr. Simpson examined how various industries represent procedures (also called business processes or workflows), in areas such as manufacturing, accounting, shipping, or customer service. A useful method for designing and evaluating workflow representation languages is by determining their ability to encode various workflow patterns, which depict abstract relationships between the components of a procedure removed from the context of a specific procedure or industry. Investigators have used this type of analysis to evaluate how well-suited existing workflow representation languages are for various industries based on the workflow patterns that commonly arise across industry-specific procedures. Based on this type of analysis, it is already clear that existing workflow representations capture discrete flow of control (i.e., when one activity should start and stop based on when other activities start and stop), but do not capture the flow of data, materials, resources or priorities. Existing workflow representation languages are also limited to representing sequences of discrete activities, and cannot encode procedures involving continuous flow of information or materials between activities.
Documenting Models for Interoperability and Reusability ...

EPA Pesticide Factsheets

Many modeling frameworks compartmentalize science via individual models that link sets of small components to create larger modeling workflows. Developing integrated watershed models increasingly requires coupling multidisciplinary, independent models, as well as collaboration between scientific communities, since component-based modeling can integrate models from different disciplines. Integrated Environmental Modeling (IEM) systems focus on transferring information between components by capturing a conceptual site model; establishing local metadata standards for input/output of models and databases; managing data flow between models and throughout the system; facilitating quality control of data exchanges (e.g., checking units, unit conversions, transfers between software languages); warning and error handling; and coordinating sensitivity/uncertainty analyses. Although many computational software systems facilitate communication between, and execution of, components, there are no common approaches, protocols, or standards for turn-key linkages between software systems and models, especially if modifying components is not the intent. Using a standard ontology, this paper reviews how models can be described for discovery, understanding, evaluation, access, and implementation to facilitate interoperability and reusability. In the proceedings of the International Environmental Modelling and Software Society (iEMSs), 8th International Congress on Environmental Mod
Internet's impact on publishing

NASA Astrophysics Data System (ADS)

Beretta, Giordano B.

1997-04-01

In 1990, the first monochrome print-on-demand (POD) systems wee successfully brought to market. Subsequent color versions have been less successful, in my view mostly because they require a different workflow than traditional systems and the highly skilled specialists have not been trained. This hypothesis is based on the observation that direct-to-plate systems for short run printing, which do not require a new workflow, are quite successful in the market place. The internet and the World Wide Web are the enabling technologies that are fostering a new print model that is very likely to replace color POD before the latter can establish itself. In this model the consumers locate the material they desire from a contents provider, pay through a digital cash clearinghouse, and print the material at their own cost on their local printer. All the basic technologies for this model are in place; the main challenge is to make the workflow sufficiently robust for individual use.
Medical Devices Transition to Information Systems: Lessons Learned

PubMed Central

Charters, Kathleen G.

2012-01-01

Medical devices designed to network can share data with a Clinical Information System (CIS), making that data available within clinician workflow. Some lessons learned by transitioning anesthesia reporting and monitoring devices (ARMDs) on a local area network (LAN) to integration of anesthesia documentation within a CIS include the following categories: access, contracting, deployment, implementation, planning, security, support, training and workflow integration. Areas identified for improvement include: Vendor requirements for access reconciled with the organizations’ security policies and procedures. Include clauses supporting transition from stand-alone devices to information integrated into clinical workflow in the medical device procurement contract. Resolve deployment and implementation barriers that make the process less efficient and more costly. Include effective field communication and creative alternatives in planning. Build training on the baseline knowledge of trainees. Include effective help desk processes and metrics. Have a process for determining where problems originate when systems share information. PMID:24199054
Integrating pathology and radiology disciplines: an emerging opportunity?

PubMed Central

2012-01-01

Pathology and radiology form the core of cancer diagnosis, yet the workflows of both specialties remain ad hoc and occur in separate "silos," with no direct linkage between their case accessioning and/or reporting systems, even when both departments belong to the same host institution. Because both radiologists' and pathologists' data are essential to making correct diagnoses and appropriate patient management and treatment decisions, this isolation of radiology and pathology workflows can be detrimental to the quality and outcomes of patient care. These detrimental effects underscore the need for pathology and radiology workflow integration and for systems that facilitate the synthesis of all data produced by both specialties. With the enormous technological advances currently occurring in both fields, the opportunity has emerged to develop an integrated diagnostic reporting system that supports both specialties and, therefore, improves the overall quality of patient care. PMID:22950414
Using AI and Semantic Web Technologies to attack Process Complexity in Open Systems

NASA Astrophysics Data System (ADS)

Thompson, Simon; Giles, Nick; Li, Yang; Gharib, Hamid; Nguyen, Thuc Duong

Recently many vendors and groups have advocated using BPEL and WS-BPEL as a workflow language to encapsulate business logic. While encapsulating workflow and process logic in one place is a sensible architectural decision the implementation of complex workflows suffers from the same problems that made managing and maintaining hierarchical procedural programs difficult. BPEL lacks constructs for logical modularity such as the requirements construct from the STL [12] or the ability to adapt constructs like pure abstract classes for the same purpose. We describe a system that uses semantic web and agent concepts to implement an abstraction layer for BPEL based on the notion of Goals and service typing. AI planning was used to enable process engineers to create and validate systems that used services and goals as first class concepts and compiled processes at run time for execution.

Some links on this page may take you to non-federal websites. Their policies may differ from this site.