Sample records for robust scientific workflows

  1. The future of scientific workflows

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Deelman, Ewa; Peterka, Tom; Altintas, Ilkay

    Today’s computational, experimental, and observational sciences rely on computations that involve many related tasks. The success of a scientific mission often hinges on the computer automation of these workflows. In April 2015, the US Department of Energy (DOE) invited a diverse group of domain and computer scientists from national laboratories supported by the Office of Science, the National Nuclear Security Administration, from industry, and from academia to review the workflow requirements of DOE’s science and national security missions, to assess the current state of the art in science workflows, to understand the impact of emerging extreme-scale computing systems on those workflows, and to develop requirements for automated workflow management in future and existing environments. This article is a summary of the opinions of over 50 leading researchers attending this workshop. We highlight use cases, computing systems, and workflow needs, and conclude by summarizing the remaining challenges this community sees that inhibit large-scale scientific workflows from becoming a mainstream tool for extreme-scale science.

  2. Scientific workflows as productivity tools for drug discovery.

    PubMed

    Shon, John; Ohkawa, Hitomi; Hammer, Juergen

    2008-05-01

    Large pharmaceutical companies annually invest tens to hundreds of millions of US dollars in research informatics to support their early drug discovery processes. Traditionally, most of these investments are designed to increase the efficiency of drug discovery. The introduction of do-it-yourself scientific workflow platforms has enabled research informatics organizations to shift their efforts toward scientific innovation, ultimately resulting in a possible increase in return on their investments. Unlike the handling of most scientific data and application integration approaches, researchers apply scientific workflows to in silico experimentation and exploration, leading to scientific discoveries that lie beyond automation and integration. This review highlights some key requirements for scientific workflow environments in the pharmaceutical industry that are necessary for increasing research productivity. Examples of the application of scientific workflows in research and a summary of recent platform advances are also provided.

  3. Tigres Workflow Library: Supporting Scientific Pipelines on HPC Systems

    DOE PAGES

    Hendrix, Valerie; Fox, James; Ghoshal, Devarshi; ...

    2016-07-21

    The growth in scientific data volumes has resulted in the need for new tools that enable users to operate on and analyze data on large-scale resources. In the last decade, a number of scientific workflow tools have emerged. These tools often target distributed environments, and often need expert help to compose and execute the workflows. Data-intensive workflows are often ad hoc; they involve an iterative development process in which users compose and test their workflows on desktops and then scale up to larger systems. In this paper, we present the design and implementation of Tigres, a workflow library that supports the iterative workflow development cycle of data-intensive workflows. Tigres provides an application programming interface to a set of programming templates (i.e., sequence, parallel, split, and merge) that can be used to compose and execute computational and data pipelines. We discuss the results of our evaluation of scientific and synthetic workflows, showing that Tigres performs with minimal template overheads (mean of 13 seconds over all experiments). We also discuss various factors (e.g., I/O performance, execution mechanisms) that affect the performance of scientific workflows on HPC systems.
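
    The template vocabulary mentioned in this abstract (sequence, parallel, split, merge) maps naturally onto ordinary function composition. The sketch below illustrates that pattern in plain Python; the function names and toy tasks are illustrative and do not reproduce the actual Tigres API.

```python
# Illustrative sketch (not the actual Tigres API): composing a pipeline from
# "sequence" and "parallel" templates, the pattern the abstract describes.
from concurrent.futures import ProcessPoolExecutor
from typing import Callable, Iterable, List


def sequence(data, tasks: Iterable[Callable]):
    """Run tasks one after another, feeding each output to the next."""
    for task in tasks:
        data = task(data)
    return data


def parallel(items: List, task: Callable, workers: int = 4) -> List:
    """Apply one task to many independent items concurrently (split/merge)."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(task, items))


# Hypothetical analysis steps standing in for real pipeline stages.
def load(path):
    return list(range(10))                     # pretend to read a file


def clean(values):
    return [v for v in values if v % 2 == 0]   # keep "good" records


def square(v):
    return v * v                               # per-item computation


if __name__ == "__main__":
    cleaned = sequence("input.dat", [load, clean])   # sequence template
    results = parallel(cleaned, square)              # parallel template
    print(results)
```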

  4. Tigres Workflow Library: Supporting Scientific Pipelines on HPC Systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hendrix, Valerie; Fox, James; Ghoshal, Devarshi

    The growth in scientific data volumes has resulted in the need for new tools that enable users to operate on and analyze data on large-scale resources. In the last decade, a number of scientific workflow tools have emerged. These tools often target distributed environments, and often need expert help to compose and execute the workflows. Data-intensive workflows are often ad hoc; they involve an iterative development process in which users compose and test their workflows on desktops and then scale up to larger systems. In this paper, we present the design and implementation of Tigres, a workflow library that supports the iterative workflow development cycle of data-intensive workflows. Tigres provides an application programming interface to a set of programming templates (i.e., sequence, parallel, split, and merge) that can be used to compose and execute computational and data pipelines. We discuss the results of our evaluation of scientific and synthetic workflows, showing that Tigres performs with minimal template overheads (mean of 13 seconds over all experiments). We also discuss various factors (e.g., I/O performance, execution mechanisms) that affect the performance of scientific workflows on HPC systems.

  5. Scientific Workflow Management in Proteomics

    PubMed Central

    de Bruin, Jeroen S.; Deelder, André M.; Palmblad, Magnus

    2012-01-01

    Data processing in proteomics can be a challenging endeavor, requiring extensive knowledge of many different software packages, all with different algorithms, data format requirements, and user interfaces. In this article we describe the integration of a number of existing programs and tools in Taverna Workbench, a scientific workflow manager currently being developed in the bioinformatics community. We demonstrate how a workflow manager provides a single, visually clear and intuitive interface to complex data analysis tasks in proteomics, from raw mass spectrometry data to protein identifications and beyond. PMID:22411703

  6. Scheduling Multilevel Deadline-Constrained Scientific Workflows on Clouds Based on Cost Optimization

    DOE PAGES

    Malawski, Maciej; Figiela, Kamil; Bubak, Marian; ...

    2015-01-01

    This paper presents a cost optimization model for scheduling scientific workflows on IaaS clouds such as Amazon EC2 or RackSpace. We assume multiple IaaS clouds with heterogeneous virtual machine instances, with a limited number of instances per cloud and hourly billing. Input and output data are stored on a cloud object store such as Amazon S3. Applications are scientific workflows modeled as DAGs, as in the Pegasus Workflow Management System. We assume that tasks in the workflows are grouped into levels of identical tasks. Our model is specified using mathematical programming languages (AMPL and CMPL) and allows us to minimize the cost of workflow execution under deadline constraints. We present results obtained using our model and the benchmark workflows representing real scientific applications in a variety of domains. The data used for evaluation come from the synthetic workflows and from general purpose cloud benchmarks, as well as from the data measured in our own experiments with Montage, an astronomical application, executed on the Amazon EC2 cloud. We indicate how this model can be used for scenarios that require resource planning for scientific workflows and their ensembles.
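
    The scheduling model described above (levels of identical tasks, hourly billing, a global deadline) can be illustrated with a small brute-force search over per-level instance choices. The instance types, prices, runtimes, and deadline below are hypothetical; the sketch is a toy stand-in for the paper's AMPL/CMPL formulation, not a reproduction of it.

```python
# A minimal sketch: choose an instance type and instance count for each workflow level so
# that the summed makespan meets the deadline at minimum cost, assuming hourly billing.
import itertools
import math

INSTANCE_TYPES = {"small": (1.0, 0.10), "large": (0.4, 0.30)}  # (task_runtime_h, price_per_h)
LEVELS = [20, 5, 40]          # number of identical tasks per level (hypothetical workflow)
DEADLINE_H = 6.0
MAX_INSTANCES = 8


def level_options(n_tasks):
    """All (makespan, cost) choices for one level."""
    opts = []
    for runtime, price in INSTANCE_TYPES.values():
        for m in range(1, MAX_INSTANCES + 1):
            makespan = math.ceil(n_tasks / m) * runtime
            cost = m * math.ceil(makespan) * price           # hourly billing
            opts.append((makespan, cost))
    return opts


best = None
for combo in itertools.product(*(level_options(n) for n in LEVELS)):
    makespan = sum(m for m, _ in combo)                       # levels run one after another
    cost = sum(c for _, c in combo)
    if makespan <= DEADLINE_H and (best is None or cost < best[0]):
        best = (cost, makespan)

print("cheapest feasible schedule (cost, makespan):", best)
```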

  7. Scientific Workflows + Provenance = Better (Meta-)Data Management

    NASA Astrophysics Data System (ADS)

    Ludaescher, B.; Cuevas-Vicenttín, V.; Missier, P.; Dey, S.; Kianmajd, P.; Wei, Y.; Koop, D.; Chirigati, F.; Altintas, I.; Belhajjame, K.; Bowers, S.

    2013-12-01

    The origin and processing history of an artifact is known as its provenance. Data provenance is an important form of metadata that explains how a particular data product came about, e.g., how and when it was derived in a computational process, which parameter settings and input data were used, etc. Provenance information provides transparency and helps to explain and interpret data products. Other common uses and applications of provenance include quality control, data curation, result debugging, and more generally, 'reproducible science'. Scientific workflow systems (e.g. Kepler, Taverna, VisTrails, and others) provide controlled environments for developing computational pipelines with built-in provenance support. Workflow results can then be explained in terms of workflow steps, parameter settings, input data, etc. using provenance that is automatically captured by the system. Scientific workflows themselves provide a user-friendly abstraction of the computational process and are thus a form of ('prospective') provenance in their own right. The full potential of provenance information is realized when combining workflow-level information (prospective provenance) with trace-level information (retrospective provenance). To this end, the DataONE Provenance Working Group (ProvWG) has developed an extension of the W3C PROV standard, called D-PROV. Whereas PROV provides a 'least common denominator' for exchanging and integrating provenance information, D-PROV adds new 'observables' that describe workflow-level information (e.g., the functional steps in a pipeline), as well as workflow-specific trace-level information (timestamps for each workflow step executed, the inputs and outputs used, etc.). Using examples, we will demonstrate how the combination of prospective and retrospective provenance provides added value in managing scientific data. The DataONE ProvWG is also developing tools based on D-PROV that allow scientists to get more mileage from provenance metadata.
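
    The added value of joining prospective and retrospective provenance can be shown with a small data-structure sketch: a workflow definition and an execution trace linked by step name. The dictionaries and field names below are illustrative; they are not the D-PROV vocabulary.

```python
# A minimal sketch of joining prospective provenance (the workflow definition) with
# retrospective provenance (one execution trace) to explain a data product.
from datetime import datetime

prospective = {  # workflow-level description of the pipeline (hypothetical steps)
    "steps": [
        {"name": "align", "inputs": ["reads"], "outputs": ["bam"]},
        {"name": "call",  "inputs": ["bam"],   "outputs": ["vcf"]},
    ]
}

retrospective = [  # trace-level record captured at run time
    {"step": "align", "started": datetime(2013, 12, 1, 9, 0),  "used": "reads.fq", "generated": "run1.bam"},
    {"step": "call",  "started": datetime(2013, 12, 1, 9, 42), "used": "run1.bam", "generated": "run1.vcf"},
]


def explain(product):
    """Walk the trace backwards and attach the workflow step each product came from."""
    for record in retrospective:
        if record["generated"] == product:
            step = next(s for s in prospective["steps"] if s["name"] == record["step"])
            print(f"{product} was generated by step {step['name']!r} "
                  f"(declared outputs: {step['outputs']}) from {record['used']} at {record['started']}")
            explain(record["used"])


explain("run1.vcf")
```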

  8. Conceptual-level workflow modeling of scientific experiments using NMR as a case study

    PubMed Central

    Verdi, Kacy K; Ellis, Heidi JC; Gryk, Michael R

    2007-01-01

    Background Scientific workflows improve the process of scientific experiments by making computations explicit, underscoring data flow, and emphasizing the participation of humans in the process when intuition and human reasoning are required. Workflows for experiments also highlight transitions among experimental phases, allowing intermediate results to be verified and supporting the proper handling of semantic mismatches and different file formats among the various tools used in the scientific process. Thus, scientific workflows are important for the modeling and subsequent capture of bioinformatics-related data. While much research has been conducted on the implementation of scientific workflows, the initial process of actually designing and generating the workflow at the conceptual level has received little consideration. Results We propose a structured process to capture scientific workflows at the conceptual level that allows workflows to be documented efficiently, results in concise models of the workflow and more-correct workflow implementations, and provides insight into the scientific process itself. The approach uses three modeling techniques to model the structural, data flow, and control flow aspects of the workflow. The domain of biomolecular structure determination using Nuclear Magnetic Resonance spectroscopy is used to demonstrate the process. Specifically, we show the application of the approach to capture the workflow for the process of conducting biomolecular analysis using Nuclear Magnetic Resonance (NMR) spectroscopy. Conclusion Using the approach, we were able to accurately document, in a short amount of time, numerous steps in the process of conducting an experiment using NMR spectroscopy. The resulting models are correct and precise, as outside validation of the models identified only minor omissions in the models. In addition, the models provide an accurate visual description of the control flow for conducting biomolecular analysis using

  9. Conceptual-level workflow modeling of scientific experiments using NMR as a case study.

    PubMed

    Verdi, Kacy K; Ellis, Heidi JC; Gryk, Michael R

    2007-01-30

    Scientific workflows improve the process of scientific experiments by making computations explicit, underscoring data flow, and emphasizing the participation of humans in the process when intuition and human reasoning are required. Workflows for experiments also highlight transitions among experimental phases, allowing intermediate results to be verified and supporting the proper handling of semantic mismatches and different file formats among the various tools used in the scientific process. Thus, scientific workflows are important for the modeling and subsequent capture of bioinformatics-related data. While much research has been conducted on the implementation of scientific workflows, the initial process of actually designing and generating the workflow at the conceptual level has received little consideration. We propose a structured process to capture scientific workflows at the conceptual level that allows workflows to be documented efficiently, results in concise models of the workflow and more-correct workflow implementations, and provides insight into the scientific process itself. The approach uses three modeling techniques to model the structural, data flow, and control flow aspects of the workflow. The domain of biomolecular structure determination using Nuclear Magnetic Resonance spectroscopy is used to demonstrate the process. Specifically, we show the application of the approach to capture the workflow for the process of conducting biomolecular analysis using Nuclear Magnetic Resonance (NMR) spectroscopy. Using the approach, we were able to accurately document, in a short amount of time, numerous steps in the process of conducting an experiment using NMR spectroscopy. The resulting models are correct and precise, as outside validation of the models identified only minor omissions in the models. In addition, the models provide an accurate visual description of the control flow for conducting biomolecular analysis using an NMR spectroscopy experiment.

  10. Science Gateways, Scientific Workflows and Open Community Software

    NASA Astrophysics Data System (ADS)

    Pierce, M. E.; Marru, S.

    2014-12-01

    Science gateways and scientific workflows occupy different ends of the spectrum of user-focused cyberinfrastructure. Gateways, sometimes called science portals, provide a way for enabling large numbers of users to take advantage of advanced computing resources (supercomputers, advanced storage systems, science clouds) by providing Web and desktop interfaces and supporting services. Scientific workflows, at the other end of the spectrum, support advanced usage of cyberinfrastructure that enables "power users" to undertake computational experiments that are not easily done through the usual mechanisms (managing simulations across multiple sites, for example). Despite these different target communities, gateways and workflows share many similarities and can potentially be accommodated by the same software system. For example, pipelines to process InSAR imagery sets or to datamine GPS time series data are workflows. The results and the ability to make downstream products may be made available through a gateway, and power users may want to provide their own custom pipelines. In this abstract, we discuss our efforts to build an open source software system, Apache Airavata, that can accommodate both gateway and workflow use cases. Our approach is general, and we have applied the software to problems in a number of scientific domains. In this talk, we discuss our applications to usage scenarios specific to earth science, focusing on earthquake physics examples drawn from the QuakeSim.org and GeoGateway.org efforts. We also examine the role of the Apache Software Foundation's open community model as a way to build up common community codes that do not depend upon a single "owner" to sustain. Pushing beyond open source software, we also see the need to provide gateways and workflow systems as cloud services. These services centralize operations, provide well-defined programming interfaces, scale elastically, and have global-scale fault tolerance. We discuss our work providing

  11. Data Intensive Scientific Workflows on a Federated Cloud: CRADA Final Report

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Garzoglio, Gabriele

    The Fermilab Scientific Computing Division and the KISTI Global Science Experimental Data Hub Center have built a prototypical large-scale infrastructure to handle scientific workflows of stakeholders to run on multiple cloud resources. The demonstrations have been in the areas of (a) Data-Intensive Scientific Workflows on Federated Clouds, (b) Interoperability and Federation of Cloud Resources, and (c) Virtual Infrastructure Automation to enable On-Demand Services.

  12. Widening the adoption of workflows to include human and human-machine scientific processes

    NASA Astrophysics Data System (ADS)

    Salayandia, L.; Pinheiro da Silva, P.; Gates, A. Q.

    2010-12-01

    Scientific workflows capture knowledge in the form of technical recipes to access and manipulate data that help scientists manage and reuse established expertise to conduct their work. Libraries of scientific workflows are being created in particular fields, e.g., Bioinformatics, where, combined with cyber-infrastructure environments that provide on-demand access to data and tools, they result in powerful workbenches for scientists of those communities. The focus in these particular fields, however, has been more on automating rather than documenting scientific processes. As a result, technical barriers have impeded a wider adoption of scientific workflows by scientific communities that do not rely as heavily on cyber-infrastructure and computing environments. Semantic Abstract Workflows (SAWs) are introduced to widen the applicability of workflows as a tool to document scientific recipes or processes. SAWs intend to capture a scientist's perspective about the process of how she or he would collect, filter, curate, and manipulate data to create the artifacts that are relevant to her/his work. In contrast, scientific workflows describe the process from the point of view of how technical methods and tools are used to conduct the work. By focusing on a higher level of abstraction that is closer to a scientist's understanding, SAWs effectively capture the controlled vocabularies that reflect a particular scientific community, as well as the types of datasets and methods used in a particular domain. From there on, SAWs provide the flexibility to adapt to different environments to carry out the recipes or processes. These environments range from manual fieldwork to highly technical cyber-infrastructure environments, such as those already supported by scientific workflows. Two cases, one from Environmental Science and another from Geophysics, are presented as illustrative examples.

  13. Integrate Data into Scientific Workflows for Terrestrial Biosphere Model Evaluation through Brokers

    NASA Astrophysics Data System (ADS)

    Wei, Y.; Cook, R. B.; Du, F.; Dasgupta, A.; Poco, J.; Huntzinger, D. N.; Schwalm, C. R.; Boldrini, E.; Santoro, M.; Pearlman, J.; Pearlman, F.; Nativi, S.; Khalsa, S.

    2013-12-01

    Terrestrial biosphere models (TBMs) have become integral tools for extrapolating local observations and process-level understanding of land-atmosphere carbon exchange to larger regions. Model-model and model-observation intercomparisons are critical to understand the uncertainties within model outputs, to improve model skill, and to improve our understanding of land-atmosphere carbon exchange. The DataONE Exploration, Visualization, and Analysis (EVA) working group is evaluating TBMs using scientific workflows in UV-CDAT/VisTrails. This workflow-based approach promotes collaboration and improved tracking of evaluation provenance. But challenges still remain. The multi-scale and multi-discipline nature of TBMs makes it necessary to include diverse and distributed data resources in model evaluation. These include, among others, remote sensing data from NASA, flux tower observations from various organizations including DOE, and inventory data from the US Forest Service. A key challenge is to make heterogeneous data from different organizations and disciplines discoverable and readily integrated for use in scientific workflows. This presentation introduces the brokering approach taken by the DataONE EVA to fill the gap between TBM evaluation workflows and cross-organization and cross-discipline data resources. The DataONE EVA started the development of an Integrated Model Intercomparison Framework (IMIF) that leverages standards-based discovery and access brokers to dynamically discover, access, and transform (e.g., subsetting and resampling) diverse data products from DataONE, Earth System Grid (ESG), and other data repositories into a format that can be readily used by scientific workflows in UV-CDAT/VisTrails. The discovery and access brokers serve as independent middleware that bridges existing data repositories and TBM evaluation workflows but introduces little overhead to either component. In the initial work, an OpenSearch-based discovery broker

  14. The Symbiotic Relationship between Scientific Workflow and Provenance (Invited)

    NASA Astrophysics Data System (ADS)

    Stephan, E.

    2010-12-01

    The purpose of this presentation is to describe the symbiotic nature of scientific workflows and provenance. We will also discuss the current trends and real world challenges facing these two distinct research areas. Although motivated differently, the needs of the international science communities are the glue that binds this relationship together. Understanding and articulating the science drivers to these communities is paramount as these technologies evolve and mature. Originally conceived for managing business processes, workflows are now becoming invaluable assets in both computational and experimental sciences. These reconfigurable, automated systems provide essential technology to perform complex analyses by coupling together geographically distributed disparate data sources and applications. As a result, workflows are capable of higher throughput in a shorter amount of time than performing the steps manually. Today many different workflow products exist; these could include Kepler and Taverna or similar products like MeDICI, developed at PNNL, that are standardized on the Business Process Execution Language (BPEL). Provenance, originating from the French term Provenir “to come from”, is used to describe the curation process of artwork as art is passed from owner to owner. The concept of provenance was adopted by digital libraries as a means to track the lineage of documents while standards such as the DublinCore began to emerge. In recent years the systems science community has increasingly expressed the need to expand the concept of provenance to formally articulate the history of scientific data. Communities such as the International Provenance and Annotation Workshop (IPAW) have formalized a provenance data model, the Open Provenance Model, and the W3C is hosting a provenance incubator group featuring the Proof Markup Language. Although both workflows and provenance have risen from different communities and operate independently, their mutual

  15. An Integrated Framework for Parameter-based Optimization of Scientific Workflows.

    PubMed

    Kumar, Vijay S; Sadayappan, P; Mehta, Gaurang; Vahi, Karan; Deelman, Ewa; Ratnakar, Varun; Kim, Jihie; Gil, Yolanda; Hall, Mary; Kurc, Tahsin; Saltz, Joel

    2009-01-01

    Data analysis processes in scientific applications can be expressed as coarse-grain workflows of complex data processing operations with data flow dependencies between them. Performance optimization of these workflows can be viewed as a search for a set of optimal values in a multi-dimensional parameter space. While some performance parameters such as grouping of workflow components and their mapping to machines do not affect the accuracy of the output, others may dictate trading the output quality of individual components (and of the whole workflow) for performance. This paper describes an integrated framework which is capable of supporting performance optimizations along multiple dimensions of the parameter space. Using two real-world applications in the spatial data analysis domain, we present an experimental evaluation of the proposed framework.
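
    The abstract's framing of optimization as a search over a multi-dimensional parameter space, where some parameters trade output quality for performance, can be sketched as a constrained grid search. The parameters and the runtime/quality models below are hypothetical stand-ins, not the framework's own.

```python
# A minimal sketch: treat workflow settings as a point in a parameter space and search for
# the fastest configuration whose output quality stays above a threshold.
import itertools

CHUNK_SIZES = [64, 128, 256]      # grouping of workflow components (no quality impact)
RESOLUTIONS = [0.25, 0.5, 1.0]    # per-component quality/performance trade-off


def runtime(chunk, resolution):   # stand-in performance model
    return 1000 * resolution / chunk + 5 / resolution


def quality(resolution):          # stand-in output-quality model
    return resolution


best = None
for chunk, res in itertools.product(CHUNK_SIZES, RESOLUTIONS):
    if quality(res) < 0.5:        # quality constraint on the whole workflow
        continue
    t = runtime(chunk, res)
    if best is None or t < best[0]:
        best = (t, {"chunk": chunk, "resolution": res})

print("fastest acceptable configuration:", best)
```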

  16. A robust scientific workflow for assessing fire danger levels using open-source software

    NASA Astrophysics Data System (ADS)

    Vitolo, Claudia; Di Giuseppe, Francesca; Smith, Paul

    2017-04-01

    Modelling forest fires is theoretically and computationally challenging because it involves the use of a wide variety of information, in large volumes and affected by high uncertainties. In-situ observations of wildfire, for instance, are highly sparse and need to be complemented by remotely sensed data measuring biomass burning to achieve homogeneous coverage at global scale. Fire models use weather reanalysis products to measure energy release and rate of spread but can only assess the potential predictability of fire danger, as the actual ignition is due to human behaviour and, therefore, very unpredictable. Lastly, fire forecasting systems rely on weather forecasts to extend the advance warning but are currently calibrated using fire danger thresholds that are defined at global scale and do not take into account the spatial variability of fuel availability. As a consequence, uncertainties sharply increase, cascading from the observational to the modelling stage, and they might be further inflated by non-reproducible analyses. Although uncertainties in observations will only decrease with technological advances over the next decades, the other uncertainties (i.e., those generated during modelling and post-processing) can already be addressed by developing transparent and reproducible analysis workflows, all the more so if implemented within open-source initiatives. This is because reproducible workflows aim to streamline the processing task as they present ready-made solutions to handle and manipulate complex and heterogeneous datasets. Also, opening the code to the scrutiny of other experts increases the chances of implementing more robust solutions and avoids duplication of efforts. In this work we present our contribution to the forest fire modelling community: an open-source tool called "caliver" for the calibration and verification of forest fire model results. This tool is developed in the R programming language and publicly available under an open license. We will present
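
    One plausible piece of such a calibration step is deriving fire danger class thresholds from percentiles of a fire danger index climatology. The sketch below, written in Python rather than caliver's R interface, uses synthetic data and assumed percentile cut-offs; it is not caliver's actual implementation.

```python
# A hedged sketch of percentile-based threshold calibration for fire danger classes.
import numpy as np

rng = np.random.default_rng(0)
fwi_climatology = rng.gamma(shape=2.0, scale=8.0, size=20_000)  # stand-in daily FWI sample

CLASS_PERCENTILES = {"moderate": 50, "high": 75, "very high": 90, "extreme": 98}  # assumed
thresholds = {name: float(np.percentile(fwi_climatology, p))
              for name, p in CLASS_PERCENTILES.items()}


def danger_class(fwi_value, thresholds):
    """Map a forecast FWI value to the highest class whose threshold it exceeds."""
    label = "low"
    for name, threshold in thresholds.items():   # thresholds are in increasing order
        if fwi_value >= threshold:
            label = name
    return label


print(thresholds)
print(danger_class(45.0, thresholds))
```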

  17. Web-Accessible Scientific Workflow System for Performance Monitoring

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Roelof Versteeg; Trevor Rowe

    2006-03-01

    We describe the design and implementation of a web-accessible scientific workflow system for environmental monitoring. This workflow environment integrates distributed, automated data acquisition with server-side data management and information visualization through flexible browser-based data access tools. Component technologies include a rich browser-based client (using dynamic JavaScript and HTML/CSS) for data selection, a back-end server which uses PHP for data processing, user management, and result delivery, and third-party applications which are invoked by the back-end using web services. This environment allows for reproducible, transparent result generation by a diverse user base. It has been implemented for several monitoring systems with different degrees of complexity.

  18. Exploring Two Approaches for an End-to-End Scientific Analysis Workflow

    NASA Astrophysics Data System (ADS)

    Dodelson, Scott; Kent, Steve; Kowalkowski, Jim; Paterno, Marc; Sehrish, Saba

    2015-12-01

    The scientific discovery process can be advanced by the integration of independently-developed programs run on disparate computing facilities into coherent workflows usable by scientists who are not experts in computing. For such advancement, we need a system which scientists can use to formulate analysis workflows, to integrate new components to these workflows, and to execute different components on resources that are best suited to run those components. In addition, we need to monitor the status of the workflow as components get scheduled and executed, and to access the intermediate and final output for visual exploration and analysis. Finally, it is important for scientists to be able to share their workflows with collaborators. We have explored two approaches for such an analysis framework for the Large Synoptic Survey Telescope (LSST) Dark Energy Science Collaboration (DESC); the first one is based on the use and extension of Galaxy, a web-based portal for biomedical research, and the second one is based on a programming language, Python. In this paper, we present a brief description of the two approaches, describe the kinds of extensions to the Galaxy system we have found necessary in order to support the wide variety of scientific analysis in the cosmology community, and discuss how similar efforts might be of benefit to the HEP community.

  19. Exploring Two Approaches for an End-to-End Scientific Analysis Workflow

    DOE PAGES

    Dodelson, Scott; Kent, Steve; Kowalkowski, Jim; ...

    2015-12-23

    The advance of the scientific discovery process is accomplished by the integration of independently-developed programs run on disparate computing facilities into coherent workflows usable by scientists who are not experts in computing. For such advancement, we need a system which scientists can use to formulate analysis workflows, to integrate new components to these workflows, and to execute different components on resources that are best suited to run those components. In addition, we need to monitor the status of the workflow as components get scheduled and executed, and to access the intermediate and final output for visual exploration and analysis. Finally, it is important for scientists to be able to share their workflows with collaborators. We have explored two approaches for such an analysis framework for the Large Synoptic Survey Telescope (LSST) Dark Energy Science Collaboration (DESC); the first is based on the use and extension of Galaxy, a web-based portal for biomedical research, and the second is based on a programming language, Python. In our paper, we present a brief description of the two approaches, describe the kinds of extensions to the Galaxy system we have found necessary in order to support the wide variety of scientific analysis in the cosmology community, and discuss how similar efforts might be of benefit to the HEP community.

  20. Chang'E-3 data pre-processing system based on scientific workflow

    NASA Astrophysics Data System (ADS)

    Tan, Xu; Liu, Jianjun; Wang, Yuanyuan; Yan, Wei; Zhang, Xiaoxia; Li, Chunlai

    2016-04-01

    The Chang'E-3 (CE3) mission has obtained a huge amount of lunar scientific data. Data pre-processing is an important segment of the CE3 ground research and application system. With a dramatic increase in the demand for data research and application, the Chang'E-3 data pre-processing system (CEDPS), based on scientific workflow, is proposed for the purpose of making scientists more flexible and productive by automating data-driven processing. The system should allow the planning, conduct and control of the data processing procedure with the following possibilities: • describe a data processing task, including: 1) define input/output data, 2) define the data relationships, 3) define the sequence of tasks, 4) define the communication between tasks, 5) define mathematical formulas, 6) define the relationship between tasks and data; • automatic processing of tasks. Accordingly, describing a task is the key to whether the system is flexible. We design a workflow designer, a visual environment for capturing processes as workflows, and discuss its three-level model: 1) the data relationships are established through a product tree; 2) the process model is constructed as a directed acyclic graph (DAG), in which a set of workflow constructs, including Sequence, Loop, Merge and Fork, are composable with one another; 3) to reduce the modeling complexity of the mathematical formulas in the DAG, semantic modeling based on MathML is adopted. On top of that, we will present how the CE3 data were processed with CEDPS.
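
    The DAG-based process model described above can be sketched with a small task graph executed in topological order; Sequence is implied by the ordering and Fork/Merge by the branching of dependencies. The task names and functions below are illustrative and are not part of CEDPS.

```python
# A minimal sketch of a DAG process model: tasks plus dependency edges, run in
# topological order.
from graphlib import TopologicalSorter   # Python 3.9+


def calibrate(ctx):
    ctx["cal"] = "calibrated frames"


def geolocate(ctx):
    ctx["geo"] = "geolocated frames"


def mosaic(ctx):
    ctx["map"] = f"mosaic of {ctx['cal']} + {ctx['geo']}"


tasks = {"calibrate": calibrate, "geolocate": geolocate, "mosaic": mosaic}
# A fork (calibrate and geolocate are independent) merging into 'mosaic'.
dependencies = {"mosaic": {"calibrate", "geolocate"}, "calibrate": set(), "geolocate": set()}

context = {}
for name in TopologicalSorter(dependencies).static_order():
    tasks[name](context)          # sequencing is implied by the topological order
print(context["map"])
```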

  1. A scientific workflow framework for (13)C metabolic flux analysis.

    PubMed

    Dalman, Tolga; Wiechert, Wolfgang; Nöh, Katharina

    2016-08-20

    Metabolic flux analysis (MFA) with (13)C labeling data is a high-precision technique to quantify intracellular reaction rates (fluxes). One of the major challenges of (13)C MFA is the interactivity of the computational workflow according to which the fluxes are determined from the input data (metabolic network model, labeling data, and physiological rates). Here, the workflow assembly is inevitably determined by the scientist who has to consider interacting biological, experimental, and computational aspects. Decision-making is context dependent and requires expertise, rendering an automated evaluation process hardly possible. Here, we present a scientific workflow framework (SWF) for creating, executing, and controlling on demand (13)C MFA workflows. (13)C MFA-specific tools and libraries, such as the high-performance simulation toolbox 13CFLUX2, are wrapped as web services and thereby integrated into a service-oriented architecture. Besides workflow steering, the SWF features transparent provenance collection and enables full flexibility for ad hoc scripting solutions. To handle compute-intensive tasks, cloud computing is supported. We demonstrate how the challenges posed by (13)C MFA workflows can be solved with our approach on the basis of two proof-of-concept use cases. Copyright © 2015 Elsevier B.V. All rights reserved.

  2. Quality Metadata Management for Geospatial Scientific Workflows: from Retrieving to Assessing with Online Tools

    NASA Astrophysics Data System (ADS)

    Leibovici, D. G.; Pourabdollah, A.; Jackson, M.

    2011-12-01

    Experts and decision-makers use or develop models to monitor global and local changes of the environment. Their activities require the combination of data and processing services in a flow of operations and spatial data computations: a geospatial scientific workflow. The seamless ability to generate, re-use and modify a geospatial scientific workflow is an important requirement, but the quality of outcomes is equally important [1]. Metadata attached to the data and processes, and particularly their quality, is essential to assess the reliability of the scientific model that a workflow represents [2]. Tools for managing qualitative and quantitative metadata measures of the quality associated with a workflow are therefore required by modellers. To ensure interoperability, ISO and OGC standards [3] are to be adopted, allowing, for example, metadata profiles to be defined and retrieved via web service interfaces. However, these standards need a few extensions when applied to workflows, particularly in the context of geoprocess metadata. We propose to fill this gap (i) through the provision of a metadata profile for the quality of processes, and (ii) through a framework, based on XPDL [4], to manage the quality information. Web Processing Services are used to implement a range of metadata analyses on the workflow in order to evaluate and present quality information at different levels of the workflow. This generates the metadata quality, stored in the XPDL file. The focus is (a) on visual representations of the quality, summarizing the quality information retrieved either from the standardized metadata profiles of the components or from non-standard quality information, e.g., Web 2.0 information, and (b) on the estimated qualities of the outputs derived from meta-propagation of uncertainties (a principle that we have introduced [5]). An a priori validation of the future decision-making supported by the

  3. Scientific Workflows and the Sensor Web for Virtual Environmental Observatories

    NASA Astrophysics Data System (ADS)

    Simonis, I.; Vahed, A.

    2008-12-01

    interfaces. All data sets and sensor communication follow well-defined abstract models and corresponding encodings, mostly developed by the OGC Sensor Web Enablement initiative. Scientific progress is currently accelerated by an emerging new concept called scientific workflows, which organize and manage complex distributed computations. A scientific workflow represents and records the highly complex processes that a domain scientist typically would follow in exploration, discovery and ultimately, transformation of raw data to publishable results. The challenge is now to integrate the benefits of scientific workflows with those provided by the Sensor Web in order to leverage all resources for scientific exploration, problem solving, and knowledge generation. Scientific workflows for the Sensor Web represent the next evolutionary step towards efficient, powerful, and flexible earth observation frameworks and platforms. Those platforms support the entire process from capturing data, sharing and integrating, to requesting additional observations. Multiple sites and organizations will participate on single platforms, and scientists from different countries and organizations will interact and contribute to large-scale research projects. Simultaneously, the data and information overload becomes manageable, as multiple layers of abstraction will free scientists from dealing with underlying data, processing, or storage peculiarities. The vision is automated investigation and discovery mechanisms that allow scientists to pose queries to the system, which in turn would identify potentially related resources, schedule processing tasks, and assemble all parts into workflows that may satisfy the query.

  4. A virtual data language and system for scientific workflow management in data grid environments

    NASA Astrophysics Data System (ADS)

    Zhao, Yong

    With advances in scientific instrumentation and simulation, scientific data is growing fast in both size and analysis complexity. So-called Data Grids aim to provide high performance, distributed data analysis infrastructure for data- intensive sciences, where scientists distributed worldwide need to extract information from large collections of data, and to share both data products and the resources needed to produce and store them. However, the description, composition, and execution of even logically simple scientific workflows are often complicated by the need to deal with "messy" issues like heterogeneous storage formats and ad-hoc file system structures. We show how these difficulties can be overcome via a typed workflow notation called virtual data language, within which issues of physical representation are cleanly separated from logical typing, and by the implementation of this notation within the context of a powerful virtual data system that supports distributed execution. The resulting language and system are capable of expressing complex workflows in a simple compact form, enacting those workflows in distributed environments, monitoring and recording the execution processes, and tracing the derivation history of data products. We describe the motivation, design, implementation, and evaluation of the virtual data language and system, and the application of the virtual data paradigm in various science disciplines, including astronomy, cognitive neuroscience.
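
    The separation the abstract emphasizes, between logical typing and physical representation, can be sketched as a typed dataset class plus a mapper that hides the on-disk layout. The class names and directory layout below are illustrative, not the actual virtual data language or system.

```python
# A minimal sketch: workflow code deals with a logical, typed dataset while a mapper
# hides the "messy" physical layout on disk.
from dataclasses import dataclass
from pathlib import Path
from typing import List


@dataclass
class ImageSet:                 # logical type used by workflow procedures
    subject: str
    files: List[Path]


class DirectoryMapper:
    """Maps the logical ImageSet type onto an ad hoc per-subject directory structure."""

    def __init__(self, root: str):
        self.root = Path(root)

    def load(self, subject: str) -> ImageSet:
        subject_dir = self.root / subject
        files = sorted(subject_dir.glob("*.img")) if subject_dir.is_dir() else []
        return ImageSet(subject, files)


def average_volume(images: ImageSet) -> str:
    # A workflow step written purely against the logical type.
    return f"mean volume over {len(images.files)} images of subject {images.subject}"


if __name__ == "__main__":
    mapper = DirectoryMapper("/data/study")      # hypothetical data root
    print(average_volume(mapper.load("sub-01")))
```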

  5. Coupling of a continuum ice sheet model and a discrete element calving model using a scientific workflow system

    NASA Astrophysics Data System (ADS)

    Memon, Shahbaz; Vallot, Dorothée; Zwinger, Thomas; Neukirchen, Helmut

    2017-04-01

    Scientific communities generate complex simulations through the orchestration of semi-structured analysis pipelines, which involves the execution of large workflows on multiple, distributed and heterogeneous computing and data resources. Modeling the ice dynamics of glaciers requires workflows consisting of many non-trivial, computationally expensive processing tasks which are coupled to each other. From this domain, we present an e-Science use case, a workflow which requires the execution of a continuum ice flow model and a discrete element based calving model in an iterative manner. Apart from the execution, this workflow also contains data format conversion tasks that support the execution of ice flow and calving by means of transition through sequential, nested and iterative steps. Thus, the management and monitoring of all the processing tasks, including data management and transfer, becomes more complex. From the implementation perspective, this workflow model was initially developed as a set of scripts using static data input and output references. In the course of application usage, as more scripts or modifications were introduced per user requirements, debugging and validation of results became more cumbersome. To address these problems, we identified the need for a high-level scientific workflow tool through which all the above-mentioned processes can be achieved in an efficient and usable manner. We decided to make use of the e-Science middleware UNICORE (Uniform Interface to Computing Resources), which allows seamless and automated access to different heterogeneous and distributed resources and is supported by a scientific workflow engine. Based on this, we developed a high-level scientific workflow model for the coupling of massively parallel High-Performance Computing (HPC) jobs: a continuum ice sheet model (Elmer/Ice) and a discrete element calving and crevassing model (HiDEM). In our talk we present how the use of a high
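
    The coupling pattern described above, alternating a continuum ice-flow step and a discrete calving step with format conversions in between, reduces to an iterative loop. The sketch below uses placeholder functions only; the real workflow submits Elmer/Ice and HiDEM jobs through UNICORE.

```python
# A schematic sketch of the iterative coupling loop; all functions are stand-ins.

def run_ice_flow(geometry, dt_years):
    # stand-in for submitting a continuum ice-flow HPC job and collecting its output
    return {"front_position": geometry["front_position"] + 50 * dt_years}


def continuum_to_particles(ice_state):
    # stand-in for converting the continuum mesh output to a particle/block description
    return {"blocks": ice_state["front_position"]}


def run_calving(particle_model):
    # stand-in for the discrete-element calving model
    retreat = 20.0
    return {"front_position": particle_model["blocks"] - retreat}


def particles_to_mesh(calving_result):
    # stand-in for regenerating the continuum geometry after calving
    return {"front_position": calving_result["front_position"]}


geometry = {"front_position": 0.0}
for cycle in range(3):                       # iterative, nested workflow steps
    ice_state = run_ice_flow(geometry, dt_years=1.0)
    calved = run_calving(continuum_to_particles(ice_state))
    geometry = particles_to_mesh(calved)
    print(f"cycle {cycle}: calving front at {geometry['front_position']:+.1f} m")
```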

  6. Cancer Diagnosis Epigenomics Scientific Workflow Scheduling in the Cloud Computing Environment Using an Improved PSO Algorithm

    PubMed

    Sadhasivam, N; Balamurugan, R; Pandi, M

    2018-01-27

    Objective: Epigenetic modifications involving DNA methylation and histone status are responsible for the stable maintenance of cellular phenotypes. Abnormalities may be causally involved in cancer development and therefore could have diagnostic potential. The field of epigenomics refers to all epigenetic modifications implicated in control of gene expression, with a focus on better understanding of human biology in both normal and pathological states. An epigenomics scientific workflow is essentially a data processing pipeline to automate the execution of various genome sequencing operations or tasks. The cloud is a popular computing platform for deploying large scale epigenomics scientific workflows. Its dynamic environment provides various resources to scientific users on a pay-per-use billing model. Scheduling epigenomics scientific workflow tasks is a complicated problem in cloud platforms. We here focused on application of an improved particle swarm optimization (IPSO) algorithm for this purpose. Methods: The IPSO algorithm was applied to find suitable resources and allocate epigenomics tasks so that the total cost was minimized for detection of epigenetic abnormalities of potential application for cancer diagnosis. Result: The results showed that IPSO based task to resource mapping reduced total cost by 6.83 percent as compared to the traditional PSO algorithm. Conclusion: The results for various cancer diagnosis tasks showed that IPSO based task to resource mapping can achieve better costs when compared to PSO based mapping for epigenomics scientific application workflows.
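
    The underlying task-to-resource mapping problem can be sketched with a standard particle swarm optimization over a continuous relaxation of the mapping (the paper uses an improved PSO variant, not reproduced here). The runtimes, prices, and PSO constants below are assumptions for illustration only.

```python
# A hedged sketch: particles encode a task-to-VM-type mapping; the swarm searches for the
# mapping with the lowest total execution cost.
import random

random.seed(1)
RUNTIME = [[4, 2, 1], [6, 3, 2], [2, 1, 1], [8, 4, 2]]   # hours: task x vm_type (hypothetical)
PRICE = [0.05, 0.12, 0.30]                               # $/hour per vm_type (hypothetical)
N_TASKS, N_TYPES = len(RUNTIME), len(PRICE)


def cost(mapping):
    return sum(RUNTIME[t][vm] * PRICE[vm] for t, vm in enumerate(mapping))


def decode(position):
    """Round a continuous position to a discrete VM-type index per task."""
    return [min(N_TYPES - 1, max(0, int(round(x)))) for x in position]


W, C1, C2, SWARM, ITERS = 0.7, 1.5, 1.5, 20, 100
particles = [[random.uniform(0, N_TYPES - 1) for _ in range(N_TASKS)] for _ in range(SWARM)]
velocities = [[0.0] * N_TASKS for _ in range(SWARM)]
pbest = [p[:] for p in particles]
gbest = min(pbest, key=lambda p: cost(decode(p)))[:]

for _ in range(ITERS):
    for i, p in enumerate(particles):
        for d in range(N_TASKS):
            r1, r2 = random.random(), random.random()
            velocities[i][d] = (W * velocities[i][d]
                                + C1 * r1 * (pbest[i][d] - p[d])
                                + C2 * r2 * (gbest[d] - p[d]))
            p[d] = min(N_TYPES - 1.0, max(0.0, p[d] + velocities[i][d]))
        if cost(decode(p)) < cost(decode(pbest[i])):
            pbest[i] = p[:]
        if cost(decode(p)) < cost(decode(gbest)):
            gbest = p[:]

print("best mapping:", decode(gbest), "cost:", round(cost(decode(gbest)), 2))
```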

  7. The TimeStudio Project: An open source scientific workflow system for the behavioral and brain sciences.

    PubMed

    Nyström, Pär; Falck-Ytter, Terje; Gredebäck, Gustaf

    2016-06-01

    This article describes a new open source scientific workflow system, the TimeStudio Project, dedicated to the behavioral and brain sciences. The program is written in MATLAB and features a graphical user interface for the dynamic pipelining of computer algorithms developed as TimeStudio plugins. TimeStudio includes both a set of general plugins (for reading data files, modifying data structures, visualizing data structures, etc.) and a set of plugins specifically developed for the analysis of event-related eyetracking data as a proof of concept. It is possible to create custom plugins to integrate new or existing MATLAB code anywhere in a workflow, making TimeStudio a flexible workbench for organizing and performing a wide range of analyses. The system also features an integrated sharing and archiving tool for TimeStudio workflows, which can be used to share workflows both during the data analysis phase and after scientific publication. TimeStudio thus facilitates the reproduction and replication of scientific studies, increases the transparency of analyses, and reduces individual researchers' analysis workload. The project website (http://timestudioproject.com) contains the latest releases of TimeStudio, together with documentation and user forums.

  8. Agile parallel bioinformatics workflow management using Pwrake.

    PubMed

    Mishima, Hiroyuki; Sasaki, Kensaku; Tanaka, Masahiro; Tatebe, Osamu; Yoshiura, Koh-Ichiro

    2011-09-08

    In bioinformatics projects, scientific workflow systems are widely used to manage computational procedures. Full-featured workflow systems have been proposed to fulfil the demand for workflow management. However, such systems tend to be over-weighted for actual bioinformatics practices. We realize that quick deployment of cutting-edge software implementing advanced algorithms and data formats, and continuous adaptation to changes in computational resources and the environment are often prioritized in scientific workflow management. These features have a greater affinity with the agile software development method through iterative development phases after trial and error. Here, we show the application of a scientific workflow system Pwrake to bioinformatics workflows. Pwrake is a parallel workflow extension of Ruby's standard build tool Rake, the flexibility of which has been demonstrated in the astronomy domain. Therefore, we hypothesize that Pwrake also has advantages in actual bioinformatics workflows. We implemented the Pwrake workflows to process next generation sequencing data using the Genomic Analysis Toolkit (GATK) and Dindel. GATK and Dindel workflows are typical examples of sequential and parallel workflows, respectively. We found that in practice, actual scientific workflow development iterates over two phases, the workflow definition phase and the parameter adjustment phase. We introduced separate workflow definitions to help focus on each of the two developmental phases, as well as helper methods to simplify the descriptions. This approach increased iterative development efficiency. Moreover, we implemented combined workflows to demonstrate modularity of the GATK and Dindel workflows. Pwrake enables agile management of scientific workflows in the bioinformatics domain. The internal domain specific language design built on Ruby gives the flexibility of rakefiles for writing scientific workflows. Furthermore, readability and maintainability of rakefiles

  9. Agile parallel bioinformatics workflow management using Pwrake

    PubMed Central

    2011-01-01

    Background In bioinformatics projects, scientific workflow systems are widely used to manage computational procedures. Full-featured workflow systems have been proposed to fulfil the demand for workflow management. However, such systems tend to be over-weighted for actual bioinformatics practices. We realize that quick deployment of cutting-edge software implementing advanced algorithms and data formats, and continuous adaptation to changes in computational resources and the environment are often prioritized in scientific workflow management. These features have a greater affinity with the agile software development method through iterative development phases after trial and error. Here, we show the application of a scientific workflow system Pwrake to bioinformatics workflows. Pwrake is a parallel workflow extension of Ruby's standard build tool Rake, the flexibility of which has been demonstrated in the astronomy domain. Therefore, we hypothesize that Pwrake also has advantages in actual bioinformatics workflows. Findings We implemented the Pwrake workflows to process next generation sequencing data using the Genomic Analysis Toolkit (GATK) and Dindel. GATK and Dindel workflows are typical examples of sequential and parallel workflows, respectively. We found that in practice, actual scientific workflow development iterates over two phases, the workflow definition phase and the parameter adjustment phase. We introduced separate workflow definitions to help focus on each of the two developmental phases, as well as helper methods to simplify the descriptions. This approach increased iterative development efficiency. Moreover, we implemented combined workflows to demonstrate modularity of the GATK and Dindel workflows. Conclusions Pwrake enables agile management of scientific workflows in the bioinformatics domain. The internal domain specific language design built on Ruby gives the flexibility of rakefiles for writing scientific workflows. Furthermore, readability

  10. Facilitating Stewardship of scientific data through standards based workflows

    NASA Astrophysics Data System (ADS)

    Bastrakova, I.; Kemp, C.; Potter, A. K.

    2013-12-01

    scientific data acquisition and analysis requirements and effective interoperable data management and delivery. This includes participating in national and international dialogue on development of standards, embedding data management activities in business processes, and developing scientific staff as effective data stewards. A similar approach is applied to the geophysical data. By ensuring that the geophysical datasets at GA strictly follow metadata and industry standards, we are able to implement a provenance-based workflow where the data is easily discoverable, geophysical processing can be applied to it, and results can be stored. The provenance-based workflow enables metadata records for the results to be produced automatically from the input dataset metadata.

  11. An Adaptable Seismic Data Format for Modern Scientific Workflows

    NASA Astrophysics Data System (ADS)

    Smith, J. A.; Bozdag, E.; Krischer, L.; Lefebvre, M.; Lei, W.; Podhorszki, N.; Tromp, J.

    2013-12-01

    Data storage, exchange, and access play a critical role in modern seismology. Current seismic data formats, such as SEED, SAC, and SEG-Y, were designed with specific applications in mind and are frequently a major bottleneck in implementing efficient workflows. We propose a new modern parallel format that can be adapted for a variety of seismic workflows. The Adaptable Seismic Data Format (ASDF) features high-performance parallel read and write support and the ability to store an arbitrary number of traces of varying sizes. Provenance information is stored inside the file so that users know the origin of the data as well as the precise operations that have been applied to the waveforms. The design of the new format is based on several real-world use cases, including earthquake seismology and seismic interferometry. The metadata is based on the proven XML schemas StationXML and QuakeML. Existing time-series analysis tool-kits are easily interfaced with this new format so that seismologists can use robust, previously developed software packages, such as ObsPy and the SAC library. ADIOS, netCDF4, and HDF5 can be used as the underlying container format. At Princeton University, we have chosen to use ADIOS as the container format because it has shown superior scalability for certain applications, such as dealing with big data on HPC systems. In the context of high-performance computing, we have implemented ASDF into the global adjoint tomography workflow on Oak Ridge National Laboratory's supercomputer Titan.
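
    The core idea of keeping waveforms, metadata, and provenance together in a single container file can be sketched with plain h5py. The group layout and attribute names below are illustrative only and do not follow the actual ASDF specification or the pyasdf API.

```python
# A hedged sketch: one HDF5 container holding a waveform, its metadata, and its
# provenance, so the data never travels without its processing history.
import h5py
import numpy as np

trace = np.sin(np.linspace(0, 20 * np.pi, 6000)).astype("float32")   # synthetic waveform

with h5py.File("example_container.h5", "w") as f:
    ds = f.create_dataset("Waveforms/XX.STA01/BHZ", data=trace)
    ds.attrs["sampling_rate_hz"] = 20.0
    ds.attrs["starttime"] = "2013-12-01T00:00:00"
    # provenance stored alongside the data it describes
    ds.attrs["processing_history"] = "detrend; taper(0.05); bandpass(0.01, 1.0 Hz)"
    f.create_dataset("AuxiliaryData/StationXML/XX.STA01",
                     data=np.frombuffer(b"<FDSNStationXML>...</FDSNStationXML>", dtype="uint8"))

with h5py.File("example_container.h5", "r") as f:
    ds = f["Waveforms/XX.STA01/BHZ"]
    print(ds.shape, ds.attrs["processing_history"])
```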

  12. What Not To Do: Anti-patterns for Developing Scientific Workflow Software Components

    NASA Astrophysics Data System (ADS)

    Futrelle, J.; Maffei, A. R.; Sosik, H. M.; Gallager, S. M.; York, A.

    2013-12-01

    Scientific workflows promise to enable efficient scaling-up of researcher code to handle large datasets and workloads, as well as documentation of scientific processing via standardized provenance records, etc. Workflow systems and related frameworks for coordinating the execution of otherwise separate components are limited, however, in their ability to overcome software engineering design problems commonly encountered in pre-existing components, such as scripts developed externally by scientists in their laboratories. In practice, this often means that components must be rewritten or replaced in a time-consuming, expensive process. In the course of an extensive workflow development project involving large-scale oceanographic image processing, we have begun to identify and codify 'anti-patterns'--problematic design characteristics of software--that make components fit poorly into complex automated workflows. We have gone on to develop and document low-effort solutions and best practices that efficiently address the anti-patterns we have identified. The issues, solutions, and best practices can be used to evaluate and improve existing code, as well as to guide the development of new components. For example, we have identified a common anti-pattern we call 'batch-itis' in which a script fails and then cannot perform more work, even if that work is not precluded by the failure. The solution we have identified--removing unnecessary looping over independent units of work--is often easier to code than the anti-pattern, as it eliminates the need for complex control flow logic in the component. Other anti-patterns we have identified are similarly easy to identify and often easy to fix. We have drawn upon experience working with three science teams at Woods Hole Oceanographic Institution, each of which has designed novel imaging instruments and associated image analysis code. By developing use cases and prototypes within these teams, we have undertaken formal evaluations of
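
    The 'batch-itis' anti-pattern and its fix are easy to show side by side: the batch version abandons all remaining work on the first failure, while the per-unit version lets the surrounding workflow isolate failures. The toy analysis function below is hypothetical, not drawn from the project's code.

```python
# A small sketch contrasting the batch-itis anti-pattern with per-unit processing.
def analyze(image):
    if image == "img_02.png":                 # simulate one bad input
        raise ValueError("corrupt image")
    return f"features({image})"


def batch_itis(images):                       # anti-pattern: one failure kills the batch
    results = []
    for image in images:
        results.append(analyze(image))        # raises here and loses the rest of the batch
    return results


def per_unit(image):                          # fix: let the workflow engine do the looping
    return analyze(image)


images = ["img_01.png", "img_02.png", "img_03.png"]

try:
    batch_itis(images)
except ValueError as err:
    print("batch version stopped early:", err)

for image in images:                          # the engine (here, a plain loop) isolates failures
    try:
        print(per_unit(image))
    except ValueError as err:
        print(f"skipped {image}: {err}")
```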

  13. Scientist-Centered Workflow Abstractions via Generic Actors, Workflow Templates, and Context-Awareness for Groundwater Modeling and Analysis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chin, George; Sivaramakrishnan, Chandrika; Critchlow, Terence J.

    2011-07-04

    A drawback of existing scientific workflow systems is the lack of support to domain scientists in designing and executing their own scientific workflows. Many domain scientists avoid developing and using workflows because the basic objects of workflows are too low-level and high-level tools and mechanisms to aid in workflow construction and use are largely unavailable. In our research, we are prototyping higher-level abstractions and tools to better support scientists in their workflow activities. Specifically, we are developing generic actors that provide abstract interfaces to specific functionality, workflow templates that encapsulate workflow and data patterns that can be reused and adapted by scientists, and context-awareness mechanisms to gather contextual information from the workflow environment on behalf of the scientist. To evaluate these scientist-centered abstractions on real problems, we apply them to construct and execute scientific workflows in the specific domain area of groundwater modeling and analysis.

  14. An open source workflow for 3D printouts of scientific data volumes

    NASA Astrophysics Data System (ADS)

    Loewe, P.; Klump, J. F.; Wickert, J.; Ludwig, M.; Frigeri, A.

    2013-12-01

    As the amount of scientific data continues to grow, researchers need new tools to help them visualize complex data. Immersive data-visualisations are helpful, yet fail to provide tactile feedback and sensory feedback on spatial orientation, as provided by tangible objects. The gap in sensory feedback from virtual objects leads to the development of tangible representations of geospatial information to solve real-world problems. Examples are animated globes [1], interactive environments like tangible GIS [2], and on-demand 3D prints. The production of a tangible representation of a scientific data set is one step in a line of scientific thinking, leading from the physical world into scientific reasoning and back: The process starts with a physical observation, or from a data stream generated by an environmental sensor. This data stream is turned into a geo-referenced data set. This data is turned into a volume representation which is converted into command sequences for the printing device, leading to the creation of a 3D printout. As a last but crucial step, this new object has to be documented and linked to the associated metadata, and curated in long-term repositories to preserve its scientific meaning and context. The workflow to produce tangible 3D data-prints from science data at the German Research Centre for Geosciences (GFZ) was implemented as software based on the Free and Open Source Geoinformatics tools GRASS GIS and Paraview. The workflow was successfully validated in various application scenarios at GFZ using a RapMan printer to create 3D specimens of elevation models, geological underground models, ice penetrating radar soundings for planetology, and space time stacks for Tsunami model quality assessment. While these first pilot applications have demonstrated the feasibility of the overall approach [3], current research focuses on the provision of the workflow as Software as a Service (SAAS), thematic generalisation of information content and
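
    One step of such a workflow, turning a gridded elevation model into a printable surface, can be sketched by writing an ASCII STL file directly. The toy grid below is illustrative, and the sketch merely stands in for the GRASS GIS/Paraview tooling used in the actual GFZ workflow.

```python
# A simplified sketch: convert a small elevation grid into an ASCII STL surface.
import numpy as np

z = np.array([[0.0, 1.0, 1.5],
              [0.5, 2.0, 1.0],
              [0.0, 1.0, 0.5]]) * 10.0          # toy elevation grid (mm)
dx = dy = 10.0                                  # horizontal cell size (mm)


def write_surface_stl(z, dx, dy, path):
    rows, cols = z.shape
    with open(path, "w") as f:
        f.write("solid elevation\n")
        for i in range(rows - 1):
            for j in range(cols - 1):
                # corner points of one grid cell
                p = [np.array([j * dx,       i * dy,       z[i, j]]),
                     np.array([(j + 1) * dx, i * dy,       z[i, j + 1]]),
                     np.array([(j + 1) * dx, (i + 1) * dy, z[i + 1, j + 1]]),
                     np.array([j * dx,       (i + 1) * dy, z[i + 1, j]])]
                for tri in ((p[0], p[1], p[2]), (p[0], p[2], p[3])):   # two triangles per cell
                    n = np.cross(tri[1] - tri[0], tri[2] - tri[0])
                    n = n / (np.linalg.norm(n) or 1.0)
                    f.write(f"  facet normal {n[0]:.4f} {n[1]:.4f} {n[2]:.4f}\n    outer loop\n")
                    for v in tri:
                        f.write(f"      vertex {v[0]:.4f} {v[1]:.4f} {v[2]:.4f}\n")
                    f.write("    endloop\n  endfacet\n")
        f.write("endsolid elevation\n")


write_surface_stl(z, dx, dy, "elevation_model.stl")
```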

  15. A robust post-processing workflow for datasets with motion artifacts in diffusion kurtosis imaging.

    PubMed

    Li, Xianjun; Yang, Jian; Gao, Jie; Luo, Xue; Zhou, Zhenyu; Hu, Yajie; Wu, Ed X; Wan, Mingxi

    2014-01-01

    The aim of this study was to develop a robust post-processing workflow for motion-corrupted datasets in diffusion kurtosis imaging (DKI). The proposed workflow consisted of brain extraction, rigid registration, distortion correction, artifact rejection, spatial smoothing and tensor estimation. Rigid registration was utilized to correct misalignments. Motion artifacts were rejected by using the local Pearson correlation coefficient (LPCC). The performance of LPCC in characterizing relative differences between artifacts and artifact-free images was compared with that of the conventional correlation coefficient in 10 randomly selected DKI datasets. The influence of the rejected artifacts, together with their gradient direction and b-value information, on parameter estimation was investigated by using the mean square error (MSE). The variance of the noise was used as the criterion for the MSEs. The clinical practicality of the proposed workflow was evaluated by the image quality and by measurements in regions of interest on 36 DKI datasets, including 18 artifact-free (18 pediatric subjects) and 18 motion-corrupted datasets (15 pediatric subjects and 3 essential tremor patients). The relative difference between artifacts and artifact-free images calculated by LPCC was larger than that of the conventional correlation coefficient (p<0.05), indicating that LPCC was more sensitive in detecting motion artifacts. The MSEs of all parameters derived from the data retained after artifact rejection were smaller than the variance of the noise, suggesting that the influence of the rejected artifacts on the precision of the derived parameters was less than that of the noise. The proposed workflow improved the image quality and reduced the measurement biases significantly on motion-corrupted datasets (p<0.05). The proposed post-processing workflow reliably improved the image quality and the measurement precision of the derived parameters on motion-corrupted DKI datasets. The workflow provided an effective post
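
    The artifact-rejection step lends itself to a short illustration. The sketch below scores each diffusion-weighted slice by a patch-wise ("local") Pearson correlation against a reference and discards slices falling below a threshold; the patch size, threshold and synthetic data are assumptions for demonstration and need not match the exact LPCC formulation used in the study.

      import numpy as np

      def local_pearson(img, ref, patch=8):
          """Mean Pearson correlation over non-overlapping patches of an image pair."""
          scores = []
          for x in range(0, img.shape[0] - patch + 1, patch):
              for y in range(0, img.shape[1] - patch + 1, patch):
                  a = img[x:x + patch, y:y + patch].ravel()
                  b = ref[x:x + patch, y:y + patch].ravel()
                  if a.std() > 0 and b.std() > 0:
                      scores.append(np.corrcoef(a, b)[0, 1])
          return float(np.mean(scores)) if scores else 0.0

      def reject_motion_corrupted(volumes, reference, threshold=0.6):
          """Return indices of volumes kept after patch-wise correlation screening."""
          return [i for i, v in enumerate(volumes)
                  if local_pearson(v, reference) >= threshold]

      # Synthetic example: 10 slices, two of them corrupted by simulated motion.
      rng = np.random.default_rng(1)
      reference = rng.random((64, 64))
      volumes = [reference + 0.05 * rng.random((64, 64)) for _ in range(10)]
      volumes[3] = np.roll(volumes[3], 9, axis=0)   # simulated through-plane shift
      volumes[7] = rng.random((64, 64))             # severely corrupted slice
      print(reject_motion_corrupted(volumes, reference))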

  16. Making Sense of Complexity with FRE, a Scientific Workflow System for Climate Modeling (Invited)

    NASA Astrophysics Data System (ADS)

    Langenhorst, A. R.; Balaji, V.; Yakovlev, A.

    2010-12-01

    A workflow is a description of a sequence of activities that is both precise and comprehensive. Capturing the workflow of climate experiments provides a record which can be queried or compared, and allows reproducibility of the experiments - sometimes even to the bit level of the model output. This reproducibility helps to verify the integrity of the output data and enables easy perturbation experiments. GFDL's Flexible Modeling System Runtime Environment (FRE) is a production-level software project which defines and implements the building blocks of the workflow as command-line tools. The scientific, numerical and technical input needed to complete the workflow of an experiment is recorded in an experiment description file in XML format. Several key features add convenience and automation to the FRE workflow:
    ● Experiment inheritance makes it possible to define a new experiment with only a reference to the parent experiment and the parameters to override.
    ● Testing is a basic element of the FRE workflow: experiments define short test runs which are verified before the main experiment is run, and a set of standard experiments is verified with new code releases.
    ● FRE is flexible enough to support everything from short runs with mere megabytes of data to high-resolution experiments that run on thousands of processors for months, producing terabytes of output data. Experiments run in segments of model time; after each segment, the state is saved and the model can be checkpointed at that level. Segment length is defined by the user, but the number of segments per system job is calculated to fit optimally within the batch scheduler's requirements. FRE provides job control across multiple segments, and tools to monitor and alter the state of long-running experiments.
    ● Experiments are entered into a Curator Database, which stores query-able metadata about the experiment and the experiment's output.
    ● FRE includes a set of standardized post-processing functions as well as the ability
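
    The segment-packing arithmetic mentioned above can be illustrated with a small calculation: given a user-chosen segment length, an estimated wall-clock cost per segment and the scheduler's job-length limit, work out how many segments fit into each batch job. The numbers and function name below are illustrative assumptions, not FRE's actual implementation.

      import math

      def plan_jobs(total_model_months, segment_months, hours_per_segment, max_job_hours):
          """Pack model-time segments into batch jobs without exceeding the scheduler limit."""
          segments = math.ceil(total_model_months / segment_months)
          segments_per_job = max(1, int(max_job_hours // hours_per_segment))
          jobs = math.ceil(segments / segments_per_job)
          return segments, segments_per_job, jobs

      # Hypothetical example: a 50-year run in 6-month segments, 3 wall-clock hours each,
      # on a scheduler with a 24-hour job limit -> 100 segments, 8 per job, 13 jobs.
      print(plan_jobs(total_model_months=600, segment_months=6,
                      hours_per_segment=3.0, max_job_hours=24.0))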

  17. A Robust Post-Processing Workflow for Datasets with Motion Artifacts in Diffusion Kurtosis Imaging

    PubMed Central

    Li, Xianjun; Yang, Jian; Gao, Jie; Luo, Xue; Zhou, Zhenyu; Hu, Yajie; Wu, Ed X.; Wan, Mingxi

    2014-01-01

    Purpose: The aim of this study was to develop a robust post-processing workflow for motion-corrupted datasets in diffusion kurtosis imaging (DKI). Materials and methods: The proposed workflow consisted of brain extraction, rigid registration, distortion correction, artifact rejection, spatial smoothing and tensor estimation. Rigid registration was utilized to correct misalignments. Motion artifacts were rejected by using the local Pearson correlation coefficient (LPCC). The performance of LPCC in characterizing relative differences between artifacts and artifact-free images was compared with that of the conventional correlation coefficient in 10 randomly selected DKI datasets. The influence of the rejected artifacts, together with their gradient direction and b-value information, on parameter estimation was investigated by using the mean square error (MSE). The variance of the noise was used as the criterion for the MSEs. The clinical practicality of the proposed workflow was evaluated by the image quality and by measurements in regions of interest on 36 DKI datasets, including 18 artifact-free (18 pediatric subjects) and 18 motion-corrupted datasets (15 pediatric subjects and 3 essential tremor patients). Results: The relative difference between artifacts and artifact-free images calculated by LPCC was larger than that of the conventional correlation coefficient (p<0.05), indicating that LPCC was more sensitive in detecting motion artifacts. The MSEs of all parameters derived from the data retained after artifact rejection were smaller than the variance of the noise, suggesting that the influence of the rejected artifacts on the precision of the derived parameters was less than that of the noise. The proposed workflow improved the image quality and reduced the measurement biases significantly on motion-corrupted datasets (p<0.05). Conclusion: The proposed post-processing workflow reliably improved the image quality and the measurement precision of the derived parameters on motion-corrupted DKI

  18. WRF4SG: A Scientific Gateway for climate experiment workflows

    NASA Astrophysics Data System (ADS)

    Blanco, Carlos; Cofino, Antonio S.; Fernandez-Quiruelas, Valvanuz

    2013-04-01

    The Weather Research and Forecasting model (WRF) is a community-driven and public-domain model widely used by the weather and climate communities. In contrast to other application-oriented models, WRF provides a flexible and computationally efficient framework which allows solving a variety of problems for different time scales, from weather forecast to climate change projection. Furthermore, WRF is also widely used as a research tool in modeling physics, dynamics, and data assimilation by the research community. Climate experiment workflows based on Weather Research and Forecasting (WRF) are nowadays among the most cutting-edge applications. These workflows are complex due to both the large storage requirements and the huge number of simulations executed. In order to manage this, we have developed a scientific gateway (SG) called WRF for Scientific Gateway (WRF4SG), based on the WS-PGRADE/gUSE and WRF4G frameworks, to help meet WRF users' needs (see [1] and [2]). WRF4SG provides services for different use cases that describe the interactions between WRF users and the WRF4SG interface in order to show how to run a climate experiment. As WS-PGRADE/gUSE uses portlets (see [1]) to interact with users, its portlets will support these use cases. A typical experiment to be carried out by a WRF user will consist of a high-resolution regional re-forecast. These re-forecasts are common experiments used as input data for wind power energy and natural hazards applications (wind and precipitation fields). In the cases below, the user is able to access different resources, such as the Grid, because WRF needs a huge amount of computing resources in order to generate useful simulations:
    * Resource configuration and user authentication: The first step is to authenticate on users' Grid resources by virtual organizations. After login, the user is able to select which virtual organization is going to be used by the experiment.
    * Data assimilation: In order to assimilate the data sources

  19. Multi-Objective Approach for Energy-Aware Workflow Scheduling in Cloud Computing Environments

    PubMed Central

    Kadima, Hubert; Granado, Bertrand

    2013-01-01

    We address the problem of scheduling workflow applications on heterogeneous computing systems such as cloud computing infrastructures. In general, cloud workflow scheduling is a complex optimization problem which requires considering different criteria so as to meet a large number of QoS (Quality of Service) requirements. Traditional research in workflow scheduling mainly focuses on optimization constrained by time or cost, without paying attention to energy consumption. The main contribution of this study is to propose a new approach for multi-objective workflow scheduling in clouds and to present a hybrid PSO algorithm to optimize the scheduling performance. Our method is based on the Dynamic Voltage and Frequency Scaling (DVFS) technique to minimize energy consumption. This technique allows processors to operate at different voltage supply levels by sacrificing clock frequency; this use of multiple voltages involves a compromise between schedule quality and energy consumption. Simulation results on synthetic and real-world scientific applications highlight the robust performance of the proposed approach. PMID:24319361

  20. Multi-objective approach for energy-aware workflow scheduling in cloud computing environments.

    PubMed

    Yassa, Sonia; Chelouah, Rachid; Kadima, Hubert; Granado, Bertrand

    2013-01-01

    We address the problem of scheduling workflow applications on heterogeneous computing systems such as cloud computing infrastructures. In general, cloud workflow scheduling is a complex optimization problem which requires considering different criteria so as to meet a large number of QoS (Quality of Service) requirements. Traditional research in workflow scheduling mainly focuses on optimization constrained by time or cost, without paying attention to energy consumption. The main contribution of this study is to propose a new approach for multi-objective workflow scheduling in clouds and to present a hybrid PSO algorithm to optimize the scheduling performance. Our method is based on the Dynamic Voltage and Frequency Scaling (DVFS) technique to minimize energy consumption. This technique allows processors to operate at different voltage supply levels by sacrificing clock frequency; this use of multiple voltages involves a compromise between schedule quality and energy consumption. Simulation results on synthetic and real-world scientific applications highlight the robust performance of the proposed approach.
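
    The two records above describe the same DVFS-based multi-objective approach; the sketch below illustrates the underlying trade-off. It scores a candidate task-to-(processor, DVFS level) assignment with a simple dynamic-power model, E ≈ C·V²·f·t, and a weighted makespan/energy fitness of the kind a PSO particle could minimize. The voltage levels, capacitance constant and weights are illustrative assumptions, not the parameters used in the paper.

      # Each DVFS level: (relative frequency, supply voltage) -- hypothetical operating points.
      DVFS_LEVELS = [(1.0, 1.2), (0.8, 1.0), (0.6, 0.9)]
      CAPACITANCE = 1.0   # lumped switching constant

      def evaluate(schedule, base_runtimes):
          """schedule: list of (processor, dvfs_level) per task; returns (makespan, energy)."""
          finish = {}
          energy = 0.0
          for task, (proc, level) in enumerate(schedule):
              freq, volt = DVFS_LEVELS[level]
              runtime = base_runtimes[task] / freq          # slower clock -> longer runtime
              finish[proc] = finish.get(proc, 0.0) + runtime
              energy += CAPACITANCE * volt ** 2 * freq * runtime
          return max(finish.values()), energy

      def fitness(schedule, base_runtimes, w_time=0.5, w_energy=0.5):
          """Scalarized objective a PSO particle could minimize."""
          makespan, energy = evaluate(schedule, base_runtimes)
          return w_time * makespan + w_energy * energy

      runtimes = [4.0, 3.0, 5.0, 2.0]            # base runtimes at full frequency
      fast = [(0, 0), (1, 0), (0, 0), (1, 0)]    # all tasks at the highest DVFS level
      frugal = [(0, 2), (1, 2), (0, 2), (1, 2)]  # all tasks at the lowest DVFS level
      print(evaluate(fast, runtimes), evaluate(frugal, runtimes))
      print(fitness(fast, runtimes), fitness(frugal, runtimes))

    Running the sketch shows the compromise named in the abstract: the low-voltage schedule uses less energy but has a longer makespan than the full-frequency one.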

  1. RABIX: AN OPEN-SOURCE WORKFLOW EXECUTOR SUPPORTING RECOMPUTABILITY AND INTEROPERABILITY OF WORKFLOW DESCRIPTIONS

    PubMed Central

    Ivkovic, Sinisa; Simonovic, Janko; Tijanic, Nebojsa; Davis-Dusenbery, Brandi; Kural, Deniz

    2016-01-01

    As biomedical data has become increasingly easy to generate in large quantities, the methods used to analyze it have proliferated rapidly. Reproducible and reusable methods are required to learn from large volumes of data reliably. To address this issue, numerous groups have developed workflow specifications or execution engines, which provide a framework with which to perform a sequence of analyses. One such specification is the Common Workflow Language, an emerging standard which provides a robust and flexible framework for describing data analysis tools and workflows. In addition, reproducibility can be furthered by executors or workflow engines which interpret the specification and enable additional features, such as error logging, file organization, optimizations to computation and job scheduling, and allow for easy computing on large volumes of data. To this end, we have developed the Rabix Executor, an open-source workflow engine for the purposes of improving reproducibility through reusability and interoperability of workflow descriptions. PMID:27896971

  2. RABIX: AN OPEN-SOURCE WORKFLOW EXECUTOR SUPPORTING RECOMPUTABILITY AND INTEROPERABILITY OF WORKFLOW DESCRIPTIONS.

    PubMed

    Kaushik, Gaurav; Ivkovic, Sinisa; Simonovic, Janko; Tijanic, Nebojsa; Davis-Dusenbery, Brandi; Kural, Deniz

    2017-01-01

    As biomedical data has become increasingly easy to generate in large quantities, the methods used to analyze it have proliferated rapidly. Reproducible and reusable methods are required to learn from large volumes of data reliably. To address this issue, numerous groups have developed workflow specifications or execution engines, which provide a framework with which to perform a sequence of analyses. One such specification is the Common Workflow Language, an emerging standard which provides a robust and flexible framework for describing data analysis tools and workflows. In addition, reproducibility can be furthered by executors or workflow engines which interpret the specification and enable additional features, such as error logging, file organization, optimizations to computation and job scheduling, and allow for easy computing on large volumes of data. To this end, we have developed the Rabix Executor, an open-source workflow engine for the purposes of improving reproducibility through reusability and interoperability of workflow descriptions.
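
    To make the separation between a portable workflow description and the executor that interprets it concrete, here is a deliberately simplified sketch: a workflow expressed as plain data (steps, commands, dependencies) and a tiny engine that runs the steps in dependency order. This is an illustration of the concept only, not the Common Workflow Language schema or the Rabix Executor API.

      from graphlib import TopologicalSorter   # Python 3.9+
      import subprocess

      # A toy, CWL-inspired description: each step names a command and its upstream steps.
      workflow = {
          "fetch":  {"cmd": ["echo", "downloading reads"], "after": []},
          "align":  {"cmd": ["echo", "aligning reads"],    "after": ["fetch"]},
          "qc":     {"cmd": ["echo", "running QC"],        "after": ["fetch"]},
          "report": {"cmd": ["echo", "writing report"],    "after": ["align", "qc"]},
      }

      def execute(description):
          """Run steps in topological order, logging each command (a crude 'executor')."""
          order = TopologicalSorter({name: step["after"] for name, step in description.items()})
          for name in order.static_order():
              print(f"[executor] step {name}")
              subprocess.run(description[name]["cmd"], check=True)

      execute(workflow)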

  3. A framework for streamlining research workflow in neuroscience and psychology

    PubMed Central

    Kubilius, Jonas

    2014-01-01

    Successful accumulation of knowledge is critically dependent on the ability to verify and replicate every part of scientific conduct. However, such principles are difficult to enact when researchers continue to resort to ad-hoc workflows and poorly maintained code bases. In this paper I examine the needs of the neuroscience and psychology community, and introduce psychopy_ext, a unifying framework that seamlessly integrates popular experiment-building, analysis and manuscript-preparation tools by choosing reasonable defaults and implementing relatively rigid patterns of workflow. This structure allows for automation of multiple tasks, such as generating user interfaces, unit testing, control analyses of stimuli, single-command access to descriptive statistics, and publication-quality plotting. Taken together, psychopy_ext opens an exciting possibility for faster, more robust code development and collaboration for researchers. PMID:24478691

  4. A Drupal-Based Collaborative Framework for Science Workflows

    NASA Astrophysics Data System (ADS)

    Pinheiro da Silva, P.; Gandara, A.

    2010-12-01

    Cyber-infrastructure builds on technical infrastructure, organizational practices, and social norms to support scientific teams working together, or dependent on each other, to conduct scientific research. Such cyber-infrastructure enables the sharing of information and data so that scientists can leverage knowledge and expertise through automation. Scientific workflow systems have been used to build automated scientific systems used by scientists to conduct scientific research and, as a result, create artifacts in support of scientific discoveries. These complex systems are often developed by teams of scientists who are located in different places, e.g., scientists working in distinct buildings, and sometimes in different time zones, e.g., scientists working in distinct national laboratories. The sharing of these specifications is currently supported by the use of version control systems such as CVS or Subversion. Discussions about the design, improvement, and testing of these specifications, however, often happen elsewhere, e.g., through the exchange of email messages and IM chatting. Carrying on a discussion about these specifications is challenging because comments and specifications are not necessarily connected. For instance, the person reading a comment about a given workflow specification may not be able to see the workflow, and even if the person can see the workflow, the person may not specifically know to which part of the workflow a given comment applies. In this paper, we discuss the design, implementation and use of CI-Server, a Drupal-based infrastructure, to support the collaboration of both local and distributed teams of scientists using scientific workflows. CI-Server has three primary goals: to enable information sharing by providing tools that scientists can use within their scientific research to process data, publish and share artifacts; to build community by providing tools that support discussions between

  5. A Six‐Stage Workflow for Robust Application of Systems Pharmacology

    PubMed Central

    Gadkar, K; Kirouac, DC; Mager, DE; van der Graaf, PH

    2016-01-01

    Quantitative and systems pharmacology (QSP) is increasingly being applied in pharmaceutical research and development. One factor critical to the ultimate success of QSP is the establishment of commonly accepted language, technical criteria, and workflows. We propose an integrated workflow that bridges conceptual objectives with underlying technical detail to support the execution, communication, and evaluation of QSP projects. PMID:27299936

  6. DIaaS: Data-Intensive workflows as a service - Enabling easy composition and deployment of data-intensive workflows on Virtual Research Environments

    NASA Astrophysics Data System (ADS)

    Filgueira, R.; Ferreira da Silva, R.; Deelman, E.; Atkinson, M.

    2016-12-01

    We present the Data-Intensive workflows as a Service (DIaaS) model for enabling easy data-intensive workflow composition and deployment on clouds using containers. The backbone of the DIaaS model is Asterism, an integrated solution for running data-intensive, stream-based applications on heterogeneous systems, which combines the benefits of the dispel4py and Pegasus workflow systems. The stream-based executions of an Asterism workflow are managed by dispel4py, while the data movement between different e-Infrastructures and the coordination of the application execution are automatically managed by Pegasus. DIaaS combines the Asterism framework with Docker containers to provide an integrated, complete, easy-to-use, portable approach to running data-intensive workflows on distributed platforms. Three containers make up the DIaaS model: a Pegasus node, an MPI cluster, and an Apache Storm cluster. Container images are described as Dockerfiles (available online at http://github.com/dispel4py/pegasus_dispel4py), linked to Docker Hub for continuous integration (automated image builds) and for image storing and sharing. In this model, all the software required to run scientific applications (workflow systems and execution engines) is packed into the containers, which significantly reduces the effort (and possible human errors) required by scientists or VRE administrators to build such systems. The most common use of DIaaS will be to act as a backend of VREs or Scientific Gateways to run data-intensive applications, deploying cloud resources upon request. We have demonstrated the feasibility of DIaaS using the data-intensive seismic ambient noise cross-correlation application (Figure 1). The application preprocesses (Phase1) and cross-correlates (Phase2) traces from several seismic stations. The application is submitted via Pegasus (Container1), and Phase1 and Phase2 are executed in the MPI (Container2) and Storm (Container3) clusters respectively. Although both phases could be executed

  7. The Virtual Geophysics Laboratory (VGL): Scientific Workflows Operating Across Organizations and Across Infrastructures

    NASA Astrophysics Data System (ADS)

    Cox, S. J.; Wyborn, L. A.; Fraser, R.; Rankine, T.; Woodcock, R.; Vote, J.; Evans, B.

    2012-12-01

    The Virtual Geophysics Laboratory (VGL) is a web portal that provides geoscientists with an integrated online environment that: seamlessly accesses geophysical and geoscience data services from the AuScope national geoscience information infrastructure; loosely couples these data to a variety of geoscience software tools; and provides large-scale processing facilities via cloud computing. VGL is a collaboration between CSIRO, Geoscience Australia, National Computational Infrastructure, Monash University, Australian National University and the University of Queensland. VGL provides a distributed system whereby a user can enter an online virtual laboratory to seamlessly connect to OGC web services for geoscience data. The data is supplied in open, standards-based formats using international standards like GeoSciML. A VGL user uses a web mapping interface to discover and filter the data sources, using spatial and attribute filters to define a subset. Once the data is selected, the user is not required to download it. VGL collates the service query information for later use in the processing workflow, where it will be staged directly to the computing facilities. The combination of deferring data download and access to cloud computing enables VGL users to access their data at higher resolutions and to undertake larger-scale inversions and more complex models and simulations than their own local computing facilities might allow. Inside the Virtual Geophysics Laboratory, the user has access to a library of existing models, complete with exemplar workflows for specific scientific problems based on those models. For example, the user can load a geological model published by Geoscience Australia, apply a basic deformation workflow provided by a CSIRO scientist, and have it run in a scientific code from Monash. Finally, the user can publish these results to share with a colleague or cite in a paper. This opens new opportunities for access and collaboration as all the resources (models

  8. EPIGEN-Brazil Initiative resources: a Latin American imputation panel and the Scientific Workflow.

    PubMed

    Magalhães, Wagner C S; Araujo, Nathalia M; Leal, Thiago P; Araujo, Gilderlanio S; Viriato, Paula J S; Kehdy, Fernanda S; Costa, Gustavo N; Barreto, Mauricio L; Horta, Bernardo L; Lima-Costa, Maria Fernanda; Pereira, Alexandre C; Tarazona-Santos, Eduardo; Rodrigues, Maíra R

    2018-06-14

    EPIGEN-Brazil is one of the largest Latin American initiatives at the interface of human genomics, public health, and computational biology. Here, we present two resources to address two challenges to the global dissemination of precision medicine and the development of the bioinformatics know-how to support it. To address the underrepresentation of non-European individuals in human genome diversity studies, we present the EPIGEN-5M+1KGP imputation panel: the fusion of the public 1000 Genomes Project (1KGP) Phase 3 imputation panel with haplotypes derived from the EPIGEN-5M data set (a product of the genotyping of 4.3 million SNPs in 265 admixed individuals from the EPIGEN-Brazil Initiative). When we imputed a target SNP data set (6487 admixed individuals genotyped for 2.2 million SNPs from the EPIGEN-Brazil project) with the EPIGEN-5M+1KGP panel, we gained 140,452 more SNPs in total than when using the 1KGP Phase 3 panel alone, and 788,873 additional high-confidence SNPs (info score ≥ 0.8). Thus, the major effect of the inclusion of the EPIGEN-5M data set in this new imputation panel is not only to gain more SNPs but also to improve the quality of imputation. To address the lack of transparency and reproducibility of bioinformatics protocols, we present a conceptual Scientific Workflow in the form of a website that models the scientific process (by including publications, flowcharts, masterscripts, documents, and bioinformatics protocols), making it accessible and interactive. Its applicability is shown in the context of the development of our EPIGEN-5M+1KGP imputation panel. The Scientific Workflow also serves as a repository of bioinformatics resources. © 2018 Magalhães et al.; Published by Cold Spring Harbor Laboratory Press.
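
    As a small illustration of the "high confidence" criterion quoted above, the snippet below keeps imputed variants whose info score meets a threshold; the tab-separated layout and column names are hypothetical, since the panel itself is distributed in standard imputation formats.

      import csv

      def high_confidence_variants(path, threshold=0.8):
          """Yield variant IDs whose imputation info score meets the threshold."""
          with open(path, newline="") as handle:
              for row in csv.DictReader(handle, delimiter="\t"):
                  if float(row["info"]) >= threshold:   # 'info' column name is an assumption
                      yield row["snp_id"]

      # kept = list(high_confidence_variants("imputed_variants.tsv"))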

  9. Hermes: Seamless delivery of containerized bioinformatics workflows in hybrid cloud (HTC) environments

    NASA Astrophysics Data System (ADS)

    Kintsakis, Athanassios M.; Psomopoulos, Fotis E.; Symeonidis, Andreas L.; Mitkas, Pericles A.

    Hermes introduces a new "describe once, run anywhere" paradigm for the execution of bioinformatics workflows in hybrid cloud environments. It combines the traditional features of parallelization-enabled workflow management systems and of distributed computing platforms in a container-based approach. It offers seamless deployment, overcoming the burden of setting up and configuring the software and network requirements. Most importantly, Hermes fosters the reproducibility of scientific workflows by supporting standardization of the software execution environment, thus leading to consistent scientific workflow results and accelerating scientific output.

  10. Flexible workflow sharing and execution services for e-scientists

    NASA Astrophysics Data System (ADS)

    Kacsuk, Péter; Terstyanszky, Gábor; Kiss, Tamas; Sipos, Gergely

    2013-04-01

    The sequence of computational and data manipulation steps required to perform a specific scientific analysis is called a workflow. Workflows that orchestrate data- and/or compute-intensive applications on Distributed Computing Infrastructures (DCIs) recently became standard tools in e-science. At the same time, the broad and fragmented landscape of workflows and DCIs slows down the uptake of workflow-based work. The development, sharing, integration and execution of workflows is still a challenge for many scientists. The FP7 "Sharing Interoperable Workflow for Large-Scale Scientific Simulation on Available DCIs" (SHIWA) project significantly improved the situation with a simulation platform that connects different workflow systems, different workflow languages, different DCIs and workflows into a single, interoperable unit. The SHIWA Simulation Platform is a service package, already used by various scientific communities, and used as a tool by the recently started ER-flow FP7 project to expand the use of workflows among European scientists. The presentation will introduce the SHIWA Simulation Platform and the services that ER-flow provides, based on the platform, to space and earth science researchers. The SHIWA Simulation Platform includes:
    1. SHIWA Repository: A database where workflows and metadata about workflows can be stored. The database is a central repository for discovering and sharing workflows within and among communities.
    2. SHIWA Portal: A web portal that is integrated with the SHIWA Repository and includes a workflow executor engine that can orchestrate various types of workflows on various grid and cloud platforms.
    3. SHIWA Desktop: A desktop environment that provides similar access capabilities to the SHIWA Portal; however, it runs on the users' desktops/laptops instead of a portal server.
    4. Workflow engines: the ASKALON, Galaxy, GWES, Kepler, LONI Pipeline, MOTEUR, Pegasus, P-GRADE, ProActive, Triana, Taverna and WS-PGRADE workflow engines are already

  11. From the desktop to the grid: scalable bioinformatics via workflow conversion.

    PubMed

    de la Garza, Luis; Veit, Johannes; Szolek, Andras; Röttig, Marc; Aiche, Stephan; Gesing, Sandra; Reinert, Knut; Kohlbacher, Oliver

    2016-03-12

    Reproducibility is one of the tenets of the scientific method. Scientific experiments often comprise complex data flows, selection of adequate parameters, and analysis and visualization of intermediate and end results. Breaking down the complexity of such experiments into the joint collaboration of small, repeatable, well-defined tasks, each with well-defined inputs, parameters, and outputs, offers immediate benefits such as identifying bottlenecks and pinpointing sections that could benefit from parallelization. Workflows rest upon the notion of splitting complex work into the joint effort of several manageable tasks. There are several engines that give users the ability to design and execute workflows. Each engine was created to address certain problems of a specific community, and therefore each one has its advantages and shortcomings. Furthermore, not all features of all workflow engines are royalty-free, an aspect that could potentially drive away members of the scientific community. We have developed a set of tools that enables the scientific community to benefit from workflow interoperability. We developed a platform-free structured representation of the parameters, inputs, and outputs of command-line tools in so-called Common Tool Descriptor documents. We have also overcome the shortcomings and combined the features of two royalty-free workflow engines with a substantial user community: the Konstanz Information Miner, an engine which we see as a formidable workflow editor, and the Grid and User Support Environment, a web-based framework able to interact with several high-performance computing resources. We have thus created a free and highly accessible way to design workflows on a desktop computer and execute them on high-performance computing resources. Our work will not only reduce the time spent on designing scientific workflows, but also make executing workflows on remote high-performance computing resources more accessible to technically inexperienced users. We
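
    The "platform-free structured representation" of a command-line tool can be sketched as a declarative description of parameters, inputs and outputs from which a concrete command line is generated. The class and the example tool below are illustrative stand-ins, not the actual Common Tool Descriptor (CTD) XML schema used by the authors.

      from dataclasses import dataclass, field

      @dataclass
      class ToolDescriptor:
          """Minimal, engine-neutral description of a command-line tool."""
          executable: str
          parameters: dict = field(default_factory=dict)   # flag -> value
          inputs: dict = field(default_factory=dict)       # flag -> input file path
          outputs: dict = field(default_factory=dict)      # flag -> output file path

          def command_line(self):
              parts = [self.executable]
              for mapping in (self.parameters, self.inputs, self.outputs):
                  for flag, value in mapping.items():
                      parts += [flag, str(value)]
              return parts

      # Hypothetical peptide-search tool; names and flags are made up for illustration.
      search = ToolDescriptor(
          executable="peptide_search",
          parameters={"--tolerance": 10, "--enzyme": "trypsin"},
          inputs={"--spectra": "run01.mzML"},
          outputs={"--out": "hits.idXML"},
      )
      print(" ".join(search.command_line()))

    A workflow engine (or a converter between engines) would consume such a description to render the tool as a node with typed ports, which is the role the CTD documents play in the authors' toolchain.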

  12. Multi-level meta-workflows: new concept for regularly occurring tasks in quantum chemistry.

    PubMed

    Arshad, Junaid; Hoffmann, Alexander; Gesing, Sandra; Grunzke, Richard; Krüger, Jens; Kiss, Tamas; Herres-Pawlis, Sonja; Terstyanszky, Gabor

    2016-01-01

    In Quantum Chemistry, many tasks reoccur frequently, e.g., geometry optimizations, benchmarking series, etc. Here, workflows can help to reduce the time spent on manual job definition and output extraction. These workflows are executed on computing infrastructures and may require large computing and data resources. Scientific workflows hide these infrastructures and the resources needed to run them. It requires significant effort and specific expertise to design, implement and test these workflows. Many of these workflows are complex and monolithic entities that can be used for particular scientific experiments. Hence, their modification is not straightforward, which makes it almost impossible to share them. To address these issues we propose developing atomic workflows and embedding them in meta-workflows. Atomic workflows deliver a well-defined, research-domain-specific function. Publishing workflows in repositories enables workflow sharing inside and/or among scientific communities. We formally specify atomic and meta-workflows in order to define data structures to be used in repositories for uploading and sharing them. Additionally, we present a formal description focused on the orchestration of atomic workflows into meta-workflows. We investigated the operations that represent basic functionalities in Quantum Chemistry, developed the relevant atomic workflows and combined them into meta-workflows. Having these workflows, we defined the structure of the Quantum Chemistry workflow library and uploaded these workflows to the SHIWA Workflow Repository. Graphical Abstract: Meta-workflows and embedded workflows in the template representation.
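
    The atomic-versus-meta-workflow idea can be sketched as follows: atomic workflows as self-contained functions with well-defined inputs and outputs, and a meta-workflow that orchestrates them for a recurring quantum-chemistry task. The function names and the trivial placeholder "computations" are assumptions for illustration, not workflows from the SHIWA Repository.

      # Atomic workflows: each delivers one well-defined, domain-specific function.
      def geometry_optimization(structure):
          """Placeholder for an atomic workflow wrapping a QC geometry optimization."""
          return {"structure": structure, "energy": -76.4}   # fabricated demo value

      def single_point_energy(optimized):
          """Placeholder for an atomic workflow computing a single-point energy."""
          return optimized["energy"] - 0.01

      def extract_results(energies):
          """Placeholder for an atomic workflow collecting outputs into a benchmark table."""
          return {name: round(e, 3) for name, e in energies.items()}

      # Meta-workflow: orchestrates the atomic workflows for a benchmarking series.
      def benchmark_series(structures):
          energies = {}
          for name, structure in structures.items():
              optimized = geometry_optimization(structure)
              energies[name] = single_point_energy(optimized)
          return extract_results(energies)

      print(benchmark_series({"water": "O 0 0 0; H 0 0 1; H 0 1 0"}))

    The point of the layering is that each atomic piece can be published and reused on its own, while the meta-workflow only expresses how the pieces are combined.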

  13. A characterization of workflow management systems for extreme-scale applications

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ferreira da Silva, Rafael; Filgueira, Rosa; Pietri, Ilia

    The automation of the execution of computational tasks is at the heart of improving scientific productivity. Over the last years, scientific workflows have been established as an important abstraction that captures data processing and computation of large and complex scientific applications. By allowing scientists to model and express entire data processing steps and their dependencies, workflow management systems relieve scientists from the details of an application and manage its execution on a computational infrastructure. As the resource requirements of today’s computational and data science applications that process vast amounts of data keep increasing, there is a compelling case for a new generation of advances in high-performance computing, commonly termed extreme-scale computing, which will bring forth multiple challenges for the design of workflow applications and management systems. This paper presents a novel characterization of workflow management systems using features commonly associated with extreme-scale computing applications. We classify 15 popular workflow management systems in terms of workflow execution models, heterogeneous computing environments, and data access methods. Finally, the paper also surveys workflow applications and identifies gaps for future research on the road to extreme-scale workflows and management systems.

  14. A characterization of workflow management systems for extreme-scale applications

    DOE PAGES

    Ferreira da Silva, Rafael; Filgueira, Rosa; Pietri, Ilia; ...

    2017-02-16

    The automation of the execution of computational tasks is at the heart of improving scientific productivity. Over the last years, scientific workflows have been established as an important abstraction that captures data processing and computation of large and complex scientific applications. By allowing scientists to model and express entire data processing steps and their dependencies, workflow management systems relieve scientists from the details of an application and manage its execution on a computational infrastructure. As the resource requirements of today’s computational and data science applications that process vast amounts of data keep increasing, there is a compelling case for a new generation of advances in high-performance computing, commonly termed extreme-scale computing, which will bring forth multiple challenges for the design of workflow applications and management systems. This paper presents a novel characterization of workflow management systems using features commonly associated with extreme-scale computing applications. We classify 15 popular workflow management systems in terms of workflow execution models, heterogeneous computing environments, and data access methods. Finally, the paper also surveys workflow applications and identifies gaps for future research on the road to extreme-scale workflows and management systems.

  15. SHIWA Services for Workflow Creation and Sharing in Hydrometeorology

    NASA Astrophysics Data System (ADS)

    Terstyanszky, Gabor; Kiss, Tamas; Kacsuk, Peter; Sipos, Gergely

    2014-05-01

    Researchers want to run scientific experiments on Distributed Computing Infrastructures (DCI) to access large pools of resources and services. Running these experiments requires specific expertise that they may not have. Workflows can hide resources and services, acting as a virtualisation layer that provides a user interface researchers can use. There are many scientific workflow systems, but they are not interoperable. Learning a workflow system and creating workflows may require significant effort. Considering these efforts, it is not reasonable to expect that researchers will learn new workflow systems if they want to run workflows developed in other workflow systems. Overcoming this requires workflow interoperability solutions that allow workflow sharing. The FP7 'Sharing Interoperable Workflow for Large-Scale Scientific Simulation on Available DCIs' (SHIWA) project developed the Coarse-Grained Interoperability (CGI) concept. It enables recycling and sharing workflows of different workflow systems and executing them on different DCIs. SHIWA developed the SHIWA Simulation Platform (SSP) to implement the CGI concept, integrating three major components: the SHIWA Science Gateway, the workflow engines supported by the CGI concept, and the DCI resources where workflows are executed. The science gateway contains a portal, a submission service, a workflow repository and a proxy server to support the whole workflow life-cycle. The SHIWA Portal allows workflow creation, configuration, execution and monitoring through a Graphical User Interface, using the WS-PGRADE workflow system as the host workflow system. The SHIWA Repository stores the formal description of workflows and workflow engines plus the executables and data needed to execute them. It offers a wide range of browse and search operations. To support non-native workflow execution, the SHIWA Submission Service imports the workflow and workflow engine from the SHIWA Repository. This service either invokes locally or remotely

  16. Implementing bioinformatic workflows within the bioextract server

    USDA-ARS?s Scientific Manuscript database

    Computational workflows in bioinformatics are becoming increasingly important in the achievement of scientific advances. These workflows typically require the integrated use of multiple, distributed data sources and analytic tools. The BioExtract Server (http://bioextract.org) is a distributed servi...

  17. Developing a workflow to identify inconsistencies in volunteered geographic information: a phenological case study

    USGS Publications Warehouse

    Mehdipoor, Hamed; Zurita-Milla, Raul; Rosemartin, Alyssa; Gerst, Katharine L.; Weltzin, Jake F.

    2015-01-01

    assessment for volunteered geographic information. Initiatives that leverage volunteered geographic information can adapt this workflow to improve the quality of their datasets and the robustness of their scientific analyses.

  18. Cloud Bursting with GlideinWMS: Means to satisfy ever increasing computing needs for Scientific Workflows

    NASA Astrophysics Data System (ADS)

    Mhashilkar, Parag; Tiradani, Anthony; Holzman, Burt; Larson, Krista; Sfiligoi, Igor; Rynge, Mats

    2014-06-01

    Scientific communities have been at the forefront of adopting new computing technologies and methodologies. Scientific computing has influenced how science is done today, achieving breakthroughs that were impossible several decades ago. For the past decade, several such communities in the Open Science Grid (OSG) and the European Grid Infrastructure (EGI) have been using GlideinWMS to run complex application workflows and effectively share computational resources over the grid. GlideinWMS is a pilot-based workload management system (WMS) that creates, on demand, a dynamically sized overlay HTCondor batch system on grid resources. At present, the computational resources shared over the grid are just adequate to sustain the computing needs. We envision that the complexity of the science driven by "Big Data" will further push the need for computational resources. To fulfill their increasing demands and/or to run specialized workflows, some of the big communities like CMS are investigating the use of cloud computing as Infrastructure-as-a-Service (IaaS), with GlideinWMS as a potential alternative to fill the void. Similarly, communities with no previous access to computing resources can use GlideinWMS to set up a batch system on cloud infrastructure. To enable this, the architecture of GlideinWMS has been extended to support interfacing GlideinWMS with different scientific and commercial cloud providers like HLT, FutureGrid, FermiCloud and Amazon EC2. In this paper, we describe a solution for cloud bursting with GlideinWMS. The paper describes the approach, architectural changes and lessons learned while enabling support for cloud infrastructures in GlideinWMS.

  19. Cloud Bursting with GlideinWMS: Means to satisfy ever increasing computing needs for Scientific Workflows

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mhashilkar, Parag; Tiradani, Anthony; Holzman, Burt

    Scientific communities have been at the forefront of adopting new computing technologies and methodologies. Scientific computing has influenced how science is done today, achieving breakthroughs that were impossible several decades ago. For the past decade, several such communities in the Open Science Grid (OSG) and the European Grid Infrastructure (EGI) have been using GlideinWMS to run complex application workflows and effectively share computational resources over the grid. GlideinWMS is a pilot-based workload management system (WMS) that creates, on demand, a dynamically sized overlay HTCondor batch system on grid resources. At present, the computational resources shared over the grid are just adequate to sustain the computing needs. We envision that the complexity of the science driven by 'Big Data' will further push the need for computational resources. To fulfill their increasing demands and/or to run specialized workflows, some of the big communities like CMS are investigating the use of cloud computing as Infrastructure-as-a-Service (IaaS), with GlideinWMS as a potential alternative to fill the void. Similarly, communities with no previous access to computing resources can use GlideinWMS to set up a batch system on cloud infrastructure. To enable this, the architecture of GlideinWMS has been extended to support interfacing GlideinWMS with different scientific and commercial cloud providers like HLT, FutureGrid, FermiCloud and Amazon EC2. In this paper, we describe a solution for cloud bursting with GlideinWMS. The paper describes the approach, architectural changes and lessons learned while enabling support for cloud infrastructures in GlideinWMS.

  20. A Scientific Workflow Platform for Generic and Scalable Object Recognition on Medical Images

    NASA Astrophysics Data System (ADS)

    Möller, Manuel; Tuot, Christopher; Sintek, Michael

    In the research project THESEUS MEDICO we aim at a system combining medical image information with semantic background knowledge from ontologies to give clinicians fully cross-modal access to biomedical image repositories. Therefore, joint efforts have to be made in more than one dimension: object detection processes have to be specified in which an abstraction is performed, starting from low-level image features, through landmark detection utilizing abstract domain knowledge, up to high-level object recognition. We propose a system based on a client-server extension of the scientific workflow platform Kepler that assists the collaboration of medical experts and computer scientists during development and parameter learning.

  1. PANORAMA: An approach to performance modeling and diagnosis of extreme-scale workflows

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Deelman, Ewa; Carothers, Christopher; Mandal, Anirban

    Here we report that computational science is well established as the third pillar of scientific discovery and is on par with experimentation and theory. However, as we move closer toward the ability to execute exascale calculations and process the ensuing extreme-scale amounts of data produced by both experiments and computations alike, the complexity of managing the compute and data analysis tasks has grown beyond the capabilities of domain scientists. Therefore, workflow management systems are absolutely necessary to ensure current and future scientific discoveries. A key research question for these workflow management systems concerns the performance optimization of complex calculation and data analysis tasks. The central contribution of this article is a description of the PANORAMA approach for modeling and diagnosing the run-time performance of complex scientific workflows. This approach integrates extreme-scale systems testbed experimentation, structured analytical modeling, and parallel systems simulation into a comprehensive workflow framework called Pegasus for understanding and improving the overall performance of complex scientific workflows.

  2. PANORAMA: An approach to performance modeling and diagnosis of extreme-scale workflows

    DOE PAGES

    Deelman, Ewa; Carothers, Christopher; Mandal, Anirban; ...

    2015-07-14

    Here we report that computational science is well established as the third pillar of scientific discovery and is on par with experimentation and theory. However, as we move closer toward the ability to execute exascale calculations and process the ensuing extreme-scale amounts of data produced by both experiments and computations alike, the complexity of managing the compute and data analysis tasks has grown beyond the capabilities of domain scientists. Therefore, workflow management systems are absolutely necessary to ensure current and future scientific discoveries. A key research question for these workflow management systems concerns the performance optimization of complex calculation and data analysis tasks. The central contribution of this article is a description of the PANORAMA approach for modeling and diagnosing the run-time performance of complex scientific workflows. This approach integrates extreme-scale systems testbed experimentation, structured analytical modeling, and parallel systems simulation into a comprehensive workflow framework called Pegasus for understanding and improving the overall performance of complex scientific workflows.

  3. Metaworkflows and Workflow Interoperability for Heliophysics

    NASA Astrophysics Data System (ADS)

    Pierantoni, Gabriele; Carley, Eoin P.

    2014-06-01

    Heliophysics is a relatively new branch of physics that investigates the relationship between the Sun and the other bodies of the solar system. To investigate such relationships, heliophysicists can rely on various tools developed by the community. Some of these tools are on-line catalogues that list events (such as Coronal Mass Ejections, CMEs) and their characteristics as they were observed on the surface of the Sun or on the other bodies of the Solar System. Other tools offer on-line data analysis and access to images and data catalogues. During their research, heliophysicists often perform investigations that need to coordinate several of these services and to repeat these complex operations until the phenomena under investigation are fully analyzed. Heliophysicists combine the results of these services; this kind of service orchestration is best suited to workflows. This approach has been investigated in the HELIO project. The HELIO project developed an infrastructure for a Virtual Observatory for Heliophysics and implemented service orchestration using TAVERNA workflows. HELIO developed a set of workflows that proved to be useful but lacked flexibility and re-usability. The TAVERNA workflows also needed to be executed directly in the TAVERNA workbench, and this forced all users to learn how to use the workbench. Within the SCI-BUS and ER-FLOW projects, we have started an effort to re-think and re-design the heliophysics workflows with the aim of fostering re-usability and ease of use. We base our approach on two key concepts: that of meta-workflows and that of workflow interoperability. We have divided the produced workflows into three different layers. The first layer is Basic Workflows, developed both in the TAVERNA and WS-PGRADE languages. They are building blocks that users compose to address their scientific challenges. They implement well-defined Use Cases that usually involve only one service. The second layer is Science Workflows, usually developed in TAVERNA. They

  4. Workflow based framework for life science informatics.

    PubMed

    Tiwari, Abhishek; Sekhar, Arvind K T

    2007-10-01

    Workflow technology is a generic mechanism to integrate diverse types of available resources (databases, servers, software applications and different services) and facilitates knowledge exchange within traditionally divergent fields such as molecular biology, clinical research, computational science, physics, chemistry and statistics. Researchers can easily incorporate and access diverse, distributed tools and data to develop their own research protocols for scientific analysis. Application of workflow technology has been reported in areas like drug discovery, genomics, large-scale gene expression analysis, proteomics, and systems biology. In this article, we discuss the existing workflow systems and the trends in applications of workflow-based systems.

  5. Provenance-Powered Automatic Workflow Generation and Composition

    NASA Astrophysics Data System (ADS)

    Zhang, J.; Lee, S.; Pan, L.; Lee, T. J.

    2015-12-01

    In recent years, scientists have learned how to codify tools into reusable software modules that can be chained into multi-step executable workflows. Existing scientific workflow tools, created by computer scientists, require domain scientists to meticulously design their multi-step experiments before analyzing data. However, this is oftentimes contradictory to a domain scientist's daily routine of conducting research and exploration. We hope to resolve this dispute. Imagine this: an Earth scientist starts her day applying NASA Jet Propulsion Laboratory (JPL) published climate data processing algorithms over ARGO deep ocean temperature and AMSRE sea surface temperature datasets. Throughout the day, she tunes the algorithm parameters to study various aspects of the data. Suddenly, she notices some interesting results. She then turns to a computer scientist and asks, "can you reproduce my results?" By tracking and reverse engineering her activities, the computer scientist creates a workflow. The Earth scientist can now rerun the workflow to validate her findings, modify the workflow to discover further variations, or publish the workflow to share the knowledge. In this way, we aim to revolutionize computer-supported Earth science. We have developed a prototyping system to realize the aforementioned vision, in the context of service-oriented science. We have studied how Earth scientists conduct service-oriented data analytics research in their daily work, developed a provenance model to record their activities, and developed a technology to automatically generate workflows from user behavior, supporting the adaptability and reuse of these workflows for replicating/improving scientific studies. A data-centric repository infrastructure is established to capture richer provenance and further facilitate collaboration in the science community. We have also established a Petri nets-based verification instrument for provenance-based automatic workflow generation and recommendation.
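
    A minimal sketch of the "track, then reverse engineer" idea: user actions are recorded as provenance events and the resulting log is replayed as a workflow. The decorator-based recorder and the step names are assumptions for illustration, not the prototype system described in the abstract.

      import functools

      PROVENANCE = []   # ordered log of (function, parameters) captured during exploration

      def tracked(func):
          """Record each call so the interactive session can be turned into a workflow."""
          @functools.wraps(func)
          def wrapper(**kwargs):
              PROVENANCE.append((func, kwargs))   # store the undecorated step and its inputs
              return func(**kwargs)
          return wrapper

      @tracked
      def load_dataset(name):
          return f"data:{name}"

      @tracked
      def regrid(resolution):
          return f"regridded@{resolution}"

      @tracked
      def compute_anomaly(baseline):
          return f"anomaly vs {baseline}"

      # The scientist explores interactively; every step is logged behind the scenes.
      load_dataset(name="AMSRE_SST")
      regrid(resolution="1deg")
      compute_anomaly(baseline="1981-2010")

      # "Can you reproduce my results?" -- the provenance log is replayed as a workflow.
      generated_workflow = list(PROVENANCE)
      for step, params in generated_workflow:
          print("replaying", step.__name__, "->", step(**params))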

  6. A cognitive task analysis of a visual analytic workflow: Exploring molecular interaction networks in systems biology.

    PubMed

    Mirel, Barbara; Eichinger, Felix; Keller, Benjamin J; Kretzler, Matthias

    2011-03-21

    Bioinformatics visualization tools are often not robust enough to support biomedical specialists’ complex exploratory analyses. Tools need to accommodate the workflows that scientists actually perform for specific translational research questions. To understand and model one of these workflows, we conducted a case-based, cognitive task analysis of a biomedical specialist’s exploratory workflow for the question: What functional interactions among gene products of high throughput expression data suggest previously unknown mechanisms of a disease? From our cognitive task analysis four complementary representations of the targeted workflow were developed. They include: usage scenarios, flow diagrams, a cognitive task taxonomy, and a mapping between cognitive tasks and user-centered visualization requirements. The representations capture the flows of cognitive tasks that led a biomedical specialist to inferences critical to hypothesizing. We created representations at levels of detail that could strategically guide visualization development, and we confirmed this by making a trial prototype based on user requirements for a small portion of the workflow. Our results imply that visualizations should make available to scientific users “bundles of features” consonant with the compositional cognitive tasks purposefully enacted at specific points in the workflow. We also highlight certain aspects of visualizations that: (a) need more built-in flexibility; (b) are critical for negotiating meaning; and (c) are necessary for essential metacognitive support.

  7. A Practitioner Friendly and Scientifically Robust Training Evaluation Approach

    ERIC Educational Resources Information Center

    Griffin, Richard

    2012-01-01

    Purpose: This article seeks to review the current state of workplace learning evaluation, to set out the rationale for evaluation along with the barriers that practitioners face when seeking to assess the effectiveness of training and development. Finally, it aims to propose a scientifically robust and practitioner friendly approach to evaluation.…

  8. The MPO system for automatic workflow documentation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Abla, G.; Coviello, E. N.; Flanagan, S. M.

    Data from large-scale experiments and extreme-scale computing is expensive to produce and may be used for critical applications. However, it is not the mere existence of data that is important, but our ability to make use of it. Experience has shown that when metadata is better organized and more complete, the underlying data becomes more useful. Traditionally, capturing the steps of scientific workflows and metadata was the role of the lab notebook, but the digital era has resulted instead in the fragmentation of data, processing, and annotation. Here, this article presents the Metadata, Provenance, and Ontology (MPO) System, the software that can automate the documentation of scientific workflows and associated information. Based on recorded metadata, it provides explicit information about the relationships among the elements of workflows in notebook form augmented with directed acyclic graphs. A set of web-based graphical navigation tools and Application Programming Interface (API) have been created for searching and browsing, as well as programmatically accessing the workflows and data. We describe the MPO concepts and its software architecture. We also report the current status of the software as well as the initial deployment experience.

  9. The MPO system for automatic workflow documentation

    DOE PAGES

    Abla, G.; Coviello, E. N.; Flanagan, S. M.; ...

    2016-04-18

    Data from large-scale experiments and extreme-scale computing is expensive to produce and may be used for critical applications. However, it is not the mere existence of data that is important, but our ability to make use of it. Experience has shown that when metadata is better organized and more complete, the underlying data becomes more useful. Traditionally, capturing the steps of scientific workflows and metadata was the role of the lab notebook, but the digital era has resulted instead in the fragmentation of data, processing, and annotation. Here, this article presents the Metadata, Provenance, and Ontology (MPO) System, the software that can automate the documentation of scientific workflows and associated information. Based on recorded metadata, it provides explicit information about the relationships among the elements of workflows in notebook form augmented with directed acyclic graphs. A set of web-based graphical navigation tools and Application Programming Interface (API) have been created for searching and browsing, as well as programmatically accessing the workflows and data. We describe the MPO concepts and its software architecture. We also report the current status of the software as well as the initial deployment experience.

  10. GeNNet: an integrated platform for unifying scientific workflows and graph databases for transcriptome data analysis.

    PubMed

    Costa, Raquel L; Gadelha, Luiz; Ribeiro-Alves, Marcelo; Porto, Fábio

    2017-01-01

    There are many steps in analyzing transcriptome data, from the acquisition of raw data to the selection of a subset of representative genes that explain a scientific hypothesis. The data produced can be represented as networks of interactions among genes, and these may additionally be integrated with other biological databases, such as Protein-Protein Interactions, transcription factors and gene annotation. However, the results of these analyses remain fragmented, imposing difficulties either for subsequent inspection of results or for meta-analysis through the incorporation of new related data. Integrating databases and tools into scientific workflows, orchestrating their execution, and managing the resulting data and its respective metadata are challenging tasks. Additionally, a great amount of effort is equally required to run in-silico experiments to structure and compose the information as needed for analysis. Different programs may need to be applied and different files are produced during the experiment cycle. In this context, the availability of a platform supporting experiment execution is paramount. We present GeNNet, an integrated transcriptome analysis platform that unifies scientific workflows with graph databases for selecting relevant genes according to the evaluated biological systems. It includes GeNNet-Wf, a scientific workflow that pre-loads biological data, pre-processes raw microarray data and conducts a series of analyses including normalization, differential expression inference, clustering and gene set enrichment analysis. A user-friendly web interface, GeNNet-Web, allows for setting parameters, executing, and visualizing the results of GeNNet-Wf executions. To demonstrate the features of GeNNet, we performed case studies with data retrieved from GEO, particularly using a single-factor experiment in different analysis scenarios. As a result, we obtained differentially expressed genes for which biological functions were analyzed. The results

  11. GeNNet: an integrated platform for unifying scientific workflows and graph databases for transcriptome data analysis

    PubMed Central

    Gadelha, Luiz; Ribeiro-Alves, Marcelo; Porto, Fábio

    2017-01-01

    There are many steps in analyzing transcriptome data, from the acquisition of raw data to the selection of a subset of representative genes that explain a scientific hypothesis. The data produced can be represented as networks of interactions among genes and these may additionally be integrated with other biological databases, such as Protein-Protein Interactions, transcription factors and gene annotation. However, the results of these analyses remain fragmented, imposing difficulties, either for posterior inspection of results, or for meta-analysis by the incorporation of new related data. Integrating databases and tools into scientific workflows, orchestrating their execution, and managing the resulting data and its respective metadata are challenging tasks. Additionally, a great amount of effort is equally required to run in-silico experiments to structure and compose the information as needed for analysis. Different programs may need to be applied and different files are produced during the experiment cycle. In this context, the availability of a platform supporting experiment execution is paramount. We present GeNNet, an integrated transcriptome analysis platform that unifies scientific workflows with graph databases for selecting relevant genes according to the evaluated biological systems. It includes GeNNet-Wf, a scientific workflow that pre-loads biological data, pre-processes raw microarray data and conducts a series of analyses including normalization, differential expression inference, clusterization and gene set enrichment analysis. A user-friendly web interface, GeNNet-Web, allows for setting parameters, executing, and visualizing the results of GeNNet-Wf executions. To demonstrate the features of GeNNet, we performed case studies with data retrieved from GEO, particularly using a single-factor experiment in different analysis scenarios. As a result, we obtained differentially expressed genes for which biological functions were analyzed. The results

  12. Scientific Data Management (SDM) Center for Enabling Technologies. 2007-2012

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ludascher, Bertram; Altintas, Ilkay

    Over the past five years, our activities have both established Kepler as a viable scientific workflow environment and demonstrated its value across multiple science applications. We have published numerous peer-reviewed papers on the technologies highlighted in this short paper and have given Kepler tutorials at SC06, SC07, SC08, and SciDAC 2007. Our outreach activities have allowed scientists to learn best practices and better utilize Kepler to address their individual workflow problems. Our contributions to advancing the state-of-the-art in scientific workflows have focused on the following areas; progress in each of these areas is described in subsequent sections. Workflow development: the development of a deeper understanding of scientific workflows "in the wild" and of the requirements for support tools that allow easy construction of complex scientific workflows. Generic workflow components and templates: the development of generic actors (i.e., workflow components and processes) which can be broadly applied to scientific problems. Provenance collection and analysis: the design of a flexible provenance collection and analysis infrastructure within the workflow environment. Workflow reliability and fault tolerance: the improvement of the reliability and fault-tolerance of workflow environments.

  13. Optimizing CyberShake Seismic Hazard Workflows for Large HPC Resources

    NASA Astrophysics Data System (ADS)

    Callaghan, S.; Maechling, P. J.; Juve, G.; Vahi, K.; Deelman, E.; Jordan, T. H.

    2014-12-01

    The CyberShake computational platform is a well-integrated collection of scientific software and middleware that calculates 3D simulation-based probabilistic seismic hazard curves and hazard maps for the Los Angeles region. Currently each CyberShake model comprises about 235 million synthetic seismograms from about 415,000 rupture variations computed at 286 sites. CyberShake integrates large-scale parallel and high-throughput serial seismological research codes into a processing framework in which early stages produce files used as inputs by later stages. Scientific workflow tools are used to manage the jobs, data, and metadata. The Southern California Earthquake Center (SCEC) developed the CyberShake platform using USC High Performance Computing and Communications systems and open-science NSF resources. CyberShake calculations were migrated to the NSF Track 1 system NCSA Blue Waters when it became operational in 2013, via an interdisciplinary team approach including domain scientists, computer scientists, and middleware developers. Due to the excellent performance of Blue Waters and CyberShake software optimizations, we reduced the makespan (a measure of wallclock time-to-solution) of a CyberShake study from 1467 to 342 hours. We will describe the technical enhancements behind this improvement, including judicious introduction of new GPU software, improved scientific software components, increased workflow-based automation, and Blue Waters-specific workflow optimizations. Our CyberShake performance improvements highlight the benefits of scientific workflow tools. The CyberShake workflow software stack includes the Pegasus Workflow Management System (Pegasus-WMS, which includes Condor DAGMan), HTCondor, and Globus GRAM, with Pegasus-mpi-cluster managing the high-throughput tasks on the HPC resources. The workflow tools handle data management, automatically transferring about 13 TB back to SCEC storage. We will present performance metrics from the most recent Cyber

  14. Detecting distant homologies on protozoans metabolic pathways using scientific workflows.

    PubMed

    da Cruz, Sérgio Manuel Serra; Batista, Vanessa; Silva, Edno; Tosta, Frederico; Vilela, Clarissa; Cuadrat, Rafael; Tschoeke, Diogo; Dávila, Alberto M R; Campos, Maria Luiza Machado; Mattoso, Marta

    2010-01-01

    Bioinformatics experiments are typically composed of programs in pipelines manipulating an enormous quantity of data. An interesting approach for managing those experiments is through workflow management systems (WfMS). In this work we discuss WfMS features to support genome homology workflows and present some relevant issues for typical genomic experiments. Our evaluation used the Kepler WfMS to manage a real genomic pipeline, named OrthoSearch, originally defined as a Perl script. We show a case study detecting distant homologies on trypanosomatid metabolic pathways. Our results reinforce the benefits of WfMS over script languages and point out challenges to WfMS in distributed environments.

  15. Focus: a robust workflow for one-dimensional NMR spectral analysis.

    PubMed

    Alonso, Arnald; Rodríguez, Miguel A; Vinaixa, Maria; Tortosa, Raül; Correig, Xavier; Julià, Antonio; Marsal, Sara

    2014-01-21

    One-dimensional ¹H NMR represents one of the most commonly used analytical techniques in metabolomic studies. The increase in the number of samples analyzed as well as the technical improvements involving instrumentation and spectral acquisition demand increasingly accurate and efficient high-throughput data processing workflows. We present FOCUS, an integrated and innovative methodology that provides a complete data analysis workflow for one-dimensional NMR-based metabolomics. This tool will allow users to easily obtain an NMR peak feature matrix ready for chemometric analysis as well as metabolite identification scores for each peak that greatly simplify the biological interpretation of the results. The algorithm development has been focused on solving the critical difficulties that appear at each data processing step and that can dramatically affect the quality of the results. As well as method integration, simplicity has been one of the main objectives in FOCUS development, requiring very little user input to perform accurate peak alignment, peak picking, and metabolite identification. The new spectral alignment algorithm, RUNAS, allows peak alignment with no need of a reference spectrum, and therefore, it reduces the bias introduced by other alignment approaches. Spectral alignment has been tested against previous methodologies, obtaining substantial improvements in the case of moderately or highly unaligned spectra. Metabolite identification has also been significantly improved, using the positional and correlation peak patterns in contrast to a reference metabolite panel. Furthermore, the complete workflow has been tested using NMR data sets from 60 human urine samples and 120 aqueous liver extracts, reaching a successful identification of 42 metabolites from the two data sets. The open-source software implementation of this methodology is available at http://www.urr.cat/FOCUS.
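
    The peak-picking step can be illustrated with a generic sketch using scipy.signal.find_peaks on a synthetic one-dimensional spectrum. This only illustrates the kind of operation FOCUS automates; it is not the FOCUS or RUNAS algorithm, and the spectrum, threshold, and spacing values are invented.

    ```python
    # Generic 1D peak picking on a synthetic NMR-like spectrum (illustrative only).
    import numpy as np
    from scipy.signal import find_peaks

    rng = np.random.default_rng(0)
    ppm = np.linspace(0.5, 9.5, 4096)                      # chemical-shift axis
    spectrum = (np.exp(-((ppm - 3.2) ** 2) / 0.001)        # two synthetic peaks
                + 0.6 * np.exp(-((ppm - 7.1) ** 2) / 0.002)
                + 0.01 * rng.normal(size=ppm.size))        # baseline noise

    # Keep peaks that rise well above the estimated noise level.
    noise = np.std(spectrum[:200])
    peak_idx, _ = find_peaks(spectrum, height=5 * noise, distance=10)
    for i in peak_idx:
        print(f"peak at {ppm[i]:.3f} ppm, intensity {spectrum[i]:.2f}")
    ```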

  16. Knowledge Annotations in Scientific Workflows: An Implementation in Kepler

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gandara, Aida G.; Chin, George; Pinheiro Da Silva, Paulo

    2011-07-20

    Scientific research products are the result of long-term collaborations between teams. Scientific workflows are capable of helping scientists in many ways, including the collection of information as to how research was conducted; e.g., scientific workflow tools often collect and manage information about datasets used and data transformations. However, knowledge about why data was collected is rarely documented in scientific workflows. In this paper we describe a prototype system built to support the collection of scientific expertise that influences scientific analysis. Through evaluating a scientific research effort underway at Pacific Northwest National Laboratory, we identified features that would most benefit PNNL scientists in documenting how and why they conduct their research, making this information available to the entire team. The prototype system was built by enhancing the Kepler Scientific Workflow System to create knowledge-annotated scientific workflows and to publish them as semantic annotations.

  17. GO2OGS 1.0: a versatile workflow to integrate complex geological information with fault data into numerical simulation models

    NASA Astrophysics Data System (ADS)

    Fischer, T.; Naumov, D.; Sattler, S.; Kolditz, O.; Walther, M.

    2015-11-01

    We offer a versatile workflow to convert geological models built with the Paradigm™ GOCAD© (Geological Object Computer Aided Design) software into the open-source VTU (Visualization Toolkit unstructured grid) format for usage in numerical simulation models. Tackling relevant scientific questions or engineering tasks often involves multidisciplinary approaches. Conversion workflows are needed as a way of communication between the diverse tools of the various disciplines. Our approach offers an open-source, platform-independent, robust, and comprehensible method that is potentially useful for a multitude of environmental studies. With two application examples in the Thuringian Syncline, we show how a heterogeneous geological GOCAD model including multiple layers and faults can be used for numerical groundwater flow modeling, in our case employing the OpenGeoSys open-source numerical toolbox for groundwater flow simulations. The presented workflow offers the chance to incorporate increasingly detailed data, utilizing the growing availability of computational power to simulate numerical models.

  18. Web-video-mining-supported workflow modeling for laparoscopic surgeries.

    PubMed

    Liu, Rui; Zhang, Xiaoli; Zhang, Hao

    2016-11-01

    As quality assurance is of strong concern in advanced surgeries, intelligent surgical systems are expected to have knowledge such as the knowledge of the surgical workflow model (SWM) to support their intuitive cooperation with surgeons. For generating a robust and reliable SWM, a large amount of training data is required. However, training data collected by physically recording surgery operations is often limited and data collection is time-consuming and labor-intensive, severely influencing knowledge scalability of the surgical systems. The objective of this research is to solve the knowledge scalability problem in surgical workflow modeling with a low cost and labor efficient way. A novel web-video-mining-supported surgical workflow modeling (webSWM) method is developed. A novel video quality analysis method based on topic analysis and sentiment analysis techniques is developed to select high-quality videos from abundant and noisy web videos. A statistical learning method is then used to build the workflow model based on the selected videos. To test the effectiveness of the webSWM method, 250 web videos were mined to generate a surgical workflow for the robotic cholecystectomy surgery. The generated workflow was evaluated by 4 web-retrieved videos and 4 operation-room-recorded videos, respectively. The evaluation results (video selection consistency n-index ≥0.60; surgical workflow matching degree ≥0.84) proved the effectiveness of the webSWM method in generating robust and reliable SWM knowledge by mining web videos. With the webSWM method, abundant web videos were selected and a reliable SWM was modeled in a short time with low labor cost. Satisfied performances in mining web videos and learning surgery-related knowledge show that the webSWM method is promising in scaling knowledge for intelligent surgical systems. Copyright © 2016 Elsevier B.V. All rights reserved.

  19. Dynamic reusable workflows for ocean science

    USGS Publications Warehouse

    Signell, Richard; Fernandez, Filipe; Wilcox, Kyle

    2016-01-01

    Digital catalogs of ocean data have been available for decades, but advances in standardized services and software for catalog search and data access make it now possible to create catalog-driven workflows that automate — end-to-end — data search, analysis and visualization of data from multiple distributed sources. Further, these workflows may be shared, reused and adapted with ease. Here we describe a workflow developed within the US Integrated Ocean Observing System (IOOS) which automates the skill-assessment of water temperature forecasts from multiple ocean forecast models, allowing improved forecast products to be delivered for an open water swim event. A series of Jupyter Notebooks are used to capture and document the end-to-end workflow using a collection of Python tools that facilitate working with standardized catalog and data services. The workflow first searches a catalog of metadata using the Open Geospatial Consortium (OGC) Catalog Service for the Web (CSW), then accesses data service endpoints found in the metadata records using the OGC Sensor Observation Service (SOS) for in situ sensor data and OPeNDAP services for remotely-sensed and model data. Skill metrics are computed and time series comparisons of forecast model and observed data are displayed interactively, leveraging the capabilities of modern web browsers. The resulting workflow not only solves a challenging specific problem, but highlights the benefits of dynamic, reusable workflows in general. These workflows adapt as new data enters the data system, facilitate reproducible science, provide templates from which new scientific workflows can be developed, and encourage data providers to use standardized services. As applied to the ocean swim event, the workflow exposed problems with two of the ocean forecast products which led to improved regional forecasts once errors were corrected. While the example is specific, the approach is general, and we hope to see increased use of dynamic
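
    The catalog-driven first step of such a workflow can be sketched with the OWSLib package; the CSW endpoint URL and search phrase below are placeholders, not the exact IOOS configuration used in the notebooks described above.

    ```python
    # Sketch of a CSW catalog search, the entry point of the catalog-driven workflow.
    from owslib.csw import CatalogueServiceWeb
    from owslib.fes import PropertyIsLike

    csw = CatalogueServiceWeb("https://example.org/csw")          # hypothetical endpoint
    query = PropertyIsLike("csw:AnyText", "%sea_water_temperature%")
    csw.getrecords2(constraints=[query], maxrecords=10)

    for rec_id, rec in csw.records.items():
        # Each record carries service references (SOS, OPeNDAP, ...) for later steps.
        urls = [ref.get("url") for ref in rec.references]
        print(rec.title, urls)
    ```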

  20. It's All About the Data: Workflow Systems and Weather

    NASA Astrophysics Data System (ADS)

    Plale, B.

    2009-05-01

    Digital data is fueling new advances in the computational sciences, particularly geospatial research as environmental sensing grows more practical through reduced technology costs, broader network coverage, and better instruments. e-Science research (i.e., cyberinfrastructure research) has responded to data intensive computing with tools, systems, and frameworks that support computationally oriented activities such as modeling, analysis, and data mining. Workflow systems support execution of sequences of tasks on behalf of a scientist. These systems, such as Taverna, Apache ODE, and Kepler, when built as part of a larger cyberinfrastructure framework, give the scientist tools to construct task graphs of execution sequences, often through a visual interface for connecting task boxes together with arcs representing control flow or data flow. Unlike business processing workflows, scientific workflows expose a high degree of detail and control during configuration and execution. Data-driven science imposes unique needs on workflow frameworks. Our research is focused on two issues. The first is the support for workflow-driven analysis over all kinds of data sets, including real time streaming data and locally owned and hosted data. The second is the essential role metadata/provenance collection plays in data driven science, for discovery, determining quality, for science reproducibility, and for long-term preservation. The research has been conducted over the last 6 years in the context of cyberinfrastructure for mesoscale weather research carried out as part of the Linked Environments for Atmospheric Discovery (LEAD) project. LEAD has pioneered new approaches for integrating complex weather data, assimilation, modeling, mining, and cyberinfrastructure systems. Workflow systems have the potential to generate huge volumes of data. Without some form of automated metadata capture, either metadata description becomes largely a manual task that is difficult if not impossible

  1. Workflows for microarray data processing in the Kepler environment.

    PubMed

    Stropp, Thomas; McPhillips, Timothy; Ludäscher, Bertram; Bieda, Mark

    2012-05-17

    Microarray data analysis has been the subject of extensive and ongoing pipeline development due to its complexity, the availability of several options at each analysis step, and the development of new analysis demands, including integration with new data sources. Bioinformatics pipelines are usually custom built for different applications, making them typically difficult to modify, extend and repurpose. Scientific workflow systems are intended to address these issues by providing general-purpose frameworks in which to develop and execute such pipelines. The Kepler workflow environment is a well-established system under continual development that is employed in several areas of scientific research. Kepler provides a flexible graphical interface, featuring clear display of parameter values, for design and modification of workflows. It has capabilities for developing novel computational components in the R, Python, and Java programming languages, all of which are widely used for bioinformatics algorithm development, along with capabilities for invoking external applications and using web services. We developed a series of fully functional bioinformatics pipelines addressing common tasks in microarray processing in the Kepler workflow environment. These pipelines consist of a set of tools for GFF file processing of NimbleGen chromatin immunoprecipitation on microarray (ChIP-chip) datasets and more comprehensive workflows for Affymetrix gene expression microarray bioinformatics and basic primer design for PCR experiments, which are often used to validate microarray results. Although functional in themselves, these workflows can be easily customized, extended, or repurposed to match the needs of specific projects and are designed to be a toolkit and starting point for specific applications. These workflows illustrate a workflow programming paradigm focusing on local resources (programs and data) and therefore are close to traditional shell scripting or R

  2. Resilient workflows for computational mechanics platforms

    NASA Astrophysics Data System (ADS)

    Nguyên, Toàn; Trifan, Laurentiu; Désidéri, Jean-Antoine

    2010-06-01

    Workflow management systems have recently been the focus of much interest and of many research and deployment efforts for scientific applications worldwide [26, 27]. Their ability to abstract applications by wrapping application codes has also stressed the usefulness of such systems for multidiscipline applications [23, 24]. When complex applications need to provide seamless interfaces hiding the technicalities of the computing infrastructures, their high-level modeling, monitoring and execution functionalities help give production teams seamless and effective facilities [25, 31, 33]. Software integration infrastructures based on programming paradigms such as Python, Matlab and Scilab have also provided evidence of the usefulness of such approaches for the tight coupling of multidiscipline application codes [22, 24]. Also, high-performance computing based on multi-core, multi-cluster infrastructures opens new opportunities for more accurate, more extensive and more robust multi-discipline simulations in the decades to come [28]. This supports the goal of full flight dynamics simulation for 3D aircraft models within the next decade, opening the way to virtual flight tests and certification of aircraft in the future [23, 24, 29].

  3. Big data analytics workflow management for eScience

    NASA Astrophysics Data System (ADS)

    Fiore, Sandro; D'Anca, Alessandro; Palazzo, Cosimo; Elia, Donatello; Mariello, Andrea; Nassisi, Paola; Aloisio, Giovanni

    2015-04-01

    In many domains such as climate and astrophysics, scientific data is often n-dimensional and requires tools that support specialized data types and primitives if it is to be properly stored, accessed, analysed and visualized. Currently, scientific data analytics relies on domain-specific software and libraries providing a huge set of operators and functionalities. However, most of these software packages fail at large scale since they: (i) are desktop based, rely on local computing capabilities and need the data locally; (ii) cannot benefit from available multicore/parallel machines since they are based on sequential codes; (iii) do not provide declarative languages to express scientific data analysis tasks, and (iv) do not provide newer or more scalable storage models to better support the data multidimensionality. Additionally, most of them: (v) are domain-specific, which also means they support a limited set of data formats, and (vi) do not provide workflow support to enable the construction, execution and monitoring of more complex "experiments". The Ophidia project aims at facing most of the challenges highlighted above by providing a big data analytics framework for eScience. Ophidia provides several parallel operators to manipulate large datasets. Some relevant examples include: (i) data sub-setting (slicing and dicing), (ii) data aggregation, (iii) array-based primitives (the same operator applies to all the implemented UDF extensions), (iv) data cube duplication, (v) data cube pivoting, (vi) NetCDF import and export. Metadata operators are available too. Additionally, the Ophidia framework provides array-based primitives to perform data sub-setting, data aggregation (i.e. max, min, avg), array concatenation, algebraic expressions and predicate evaluation on large arrays of scientific data. Bit-oriented plugins have also been implemented to manage binary data cubes. Defining processing chains and workflows with tens or hundreds of data analytics operators is the
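
    The array-based primitives listed above (sub-setting followed by aggregation) can be made concrete with a plain NumPy sketch on a toy data cube; this illustrates the operator semantics only and does not use the Ophidia framework or its client libraries. The cube dimensions and index ranges are invented.

    ```python
    # Toy n-dimensional "data cube" with NumPy, illustrating sub-setting and aggregation.
    import numpy as np

    rng = np.random.default_rng(0)
    cube = rng.normal(15.0, 5.0, size=(365, 180, 360))   # (time, lat, lon)

    # Slicing and dicing: first 90 days, a latitude/longitude box.
    subset = cube[0:90, 60:120, 100:200]

    # Aggregation primitives over the time axis (avg, min, max).
    time_avg = subset.mean(axis=0)
    time_min = subset.min(axis=0)
    time_max = subset.max(axis=0)
    print(subset.shape, time_avg.shape)    # (90, 60, 100) (60, 100)
    ```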

  4. FRIEDA: Flexible Robust Intelligent Elastic Data Management Framework

    DOE PAGES

    Ghoshal, Devarshi; Hendrix, Valerie; Fox, William; ...

    2017-02-01

    Scientific applications are increasingly using cloud resources for their data analysis workflows. However, managing data effectively and efficiently over these cloud resources is challenging due to the myriad storage choices with different performance and cost trade-offs, complex application choices, and the complexity associated with elasticity and failure rates in these environments. The different data access patterns of data-intensive scientific applications require a more flexible and robust data management solution than the ones currently in existence. FRIEDA is a Flexible Robust Intelligent Elastic Data Management framework that employs a range of data management strategies in cloud environments. FRIEDA can manage the storage and data lifecycle of applications in cloud environments. There are four different stages in the data management lifecycle of FRIEDA – (i) storage planning, (ii) provisioning and preparation, (iii) data placement, and (iv) execution. FRIEDA defines a data control plane and an execution plane. The data control plane defines the data partition and distribution strategy, whereas the execution plane manages the execution of the application using a master-worker paradigm. FRIEDA also provides different data management strategies, either to partition the data in real time or to predetermine the data partitions prior to application execution.
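
    The split between a data control plane (deciding partitions) and an execution plane (master-worker processing) can be sketched in a few lines of Python. This is a toy illustration of the predetermined-partition strategy only; the function names and file names are hypothetical and do not represent FRIEDA's actual interfaces.

    ```python
    # Toy master-worker sketch of predetermined data partitioning (not FRIEDA's API).
    from concurrent.futures import ProcessPoolExecutor

    def plan_partitions(files, n_workers):
        """Control plane: decide up front which worker gets which files."""
        return [files[i::n_workers] for i in range(n_workers)]

    def run_worker(chunk):
        """Execution plane: each worker processes its assigned partition."""
        return [f"processed:{name}" for name in chunk]

    if __name__ == "__main__":
        inputs = [f"sample_{i:03d}.dat" for i in range(10)]
        partitions = plan_partitions(inputs, n_workers=3)
        with ProcessPoolExecutor(max_workers=3) as pool:
            results = list(pool.map(run_worker, partitions))
        print(results)
    ```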

  5. FRIEDA: Flexible Robust Intelligent Elastic Data Management Framework

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ghoshal, Devarshi; Hendrix, Valerie; Fox, William

    Scientific applications are increasingly using cloud resources for their data analysis workflows. However, managing data effectively and efficiently over these cloud resources is challenging due to the myriad storage choices with different performance and cost trade-offs, complex application choices, and the complexity associated with elasticity and failure rates in these environments. The different data access patterns of data-intensive scientific applications require a more flexible and robust data management solution than the ones currently in existence. FRIEDA is a Flexible Robust Intelligent Elastic Data Management framework that employs a range of data management strategies in cloud environments. FRIEDA can manage the storage and data lifecycle of applications in cloud environments. There are four different stages in the data management lifecycle of FRIEDA – (i) storage planning, (ii) provisioning and preparation, (iii) data placement, and (iv) execution. FRIEDA defines a data control plane and an execution plane. The data control plane defines the data partition and distribution strategy, whereas the execution plane manages the execution of the application using a master-worker paradigm. FRIEDA also provides different data management strategies, either to partition the data in real time or to predetermine the data partitions prior to application execution.

  6. Structured recording of intraoperative surgical workflows

    NASA Astrophysics Data System (ADS)

    Neumuth, T.; Durstewitz, N.; Fischer, M.; Strauss, G.; Dietz, A.; Meixensberger, J.; Jannin, P.; Cleary, K.; Lemke, H. U.; Burgert, O.

    2006-03-01

    Surgical Workflows are used for the methodical and scientific analysis of surgical interventions. The approach described here is a step towards developing surgical assist systems based on Surgical Workflows and integrated control systems for the operating room of the future. This paper describes concepts and technologies for the acquisition of Surgical Workflows by monitoring surgical interventions, and for their presentation. Establishing systems which support the Surgical Workflow in operating rooms requires a multi-staged development process beginning with the description of these workflows. A formalized description of surgical interventions is needed to create a Surgical Workflow. This description can be used to analyze and evaluate surgical interventions in detail. We discuss the subdivision of surgical interventions into work steps at different levels of granularity and propose a recording scheme for the acquisition of manual surgical work steps from running interventions. To support the recording process during the intervention, we introduce a new software architecture. The core of the architecture is our Surgical Workflow editor, which is intended to deal with the manifold, complex and concurrent relations during an intervention. Furthermore, a method for the automatic generation of graphs is shown which is able to display the recorded surgical work steps of the interventions. Finally, we conclude with considerations about extensions of our recording scheme to close the gap to S-PACS systems. The approach was used to record 83 surgical interventions from 6 intervention types from 3 different surgical disciplines: ENT surgery, neurosurgery and interventional radiology. The interventions were recorded at the University Hospital Leipzig, Germany, and at the Georgetown University Hospital, Washington, D.C., USA.

  7. wft4galaxy: a workflow testing tool for galaxy.

    PubMed

    Piras, Marco Enrico; Pireddu, Luca; Zanetti, Gianluigi

    2017-12-01

    Workflow managers for scientific analysis provide a high-level programming platform facilitating standardization, automation, collaboration and access to sophisticated computing resources. The Galaxy workflow manager provides a prime example of this type of platform. As compositions of simpler tools, workflows effectively comprise specialized computer programs implementing often very complex analysis procedures. To date, no simple way to automatically test Galaxy workflows and ensure their correctness has appeared in the literature. With wft4galaxy we offer a tool to bring automated testing to Galaxy workflows, making it feasible to bring continuous integration to their development and ensuring that defects are detected promptly. wft4galaxy can be easily installed as a regular Python program or launched directly as a Docker container, with the latter reducing installation effort to a minimum. Available at https://github.com/phnmnl/wft4galaxy under the Academic Free License v3.0. marcoenrico.piras@crs4.it. © The Author 2017. Published by Oxford University Press.

  8. Experimental evaluation of a flexible I/O architecture for accelerating workflow engines in ultrascale environments

    DOE PAGES

    Duro, Francisco Rodrigo; Blas, Javier Garcia; Isaila, Florin; ...

    2016-10-06

    The increasing volume of scientific data and the limited scalability and performance of storage systems are currently presenting a significant limitation for the productivity of the scientific workflows running on both high-performance computing (HPC) and cloud platforms. Clearly needed is better integration of storage systems and workflow engines to address this problem. This paper presents and evaluates a novel solution that leverages codesign principles for integrating Hercules—an in-memory data store—with a workflow management system. We consider four main aspects: workflow representation, task scheduling, task placement, and task termination. As a result, the experimental evaluation on both cloud and HPC systems demonstrates significant performance and scalability improvements over existing state-of-the-art approaches.

  9. An ontology-based framework for bioinformatics workflows.

    PubMed

    Digiampietri, Luciano A; Perez-Alcazar, Jose de J; Medeiros, Claudia Bauzer

    2007-01-01

    The proliferation of bioinformatics activities brings new challenges: how to understand and organise these resources, how to exchange and reuse successful experimental procedures, and how to provide interoperability among data and tools. This paper describes an effort in these directions. It is based on combining research on ontology management, AI and scientific workflows to design, reuse and annotate bioinformatics experiments. The resulting framework supports automatic or interactive composition of tasks based on AI planning techniques and takes advantage of ontologies to support the specification and annotation of bioinformatics workflows. We validate our proposal with a prototype running on real data.

  10. Using Kepler for Tool Integration in Microarray Analysis Workflows.

    PubMed

    Gan, Zhuohui; Stowe, Jennifer C; Altintas, Ilkay; McCulloch, Andrew D; Zambon, Alexander C

    Increasing numbers of genomic technologies are leading to massive amounts of genomic data, all of which require complex analysis. More and more bioinformatics analysis tools are being developed by scientists to simplify these analyses. However, different pipelines have been developed using different software environments, which makes integration of these diverse bioinformatics tools difficult. Kepler provides an open source environment to integrate these disparate packages. Using Kepler, we integrated several external tools, including Bioconductor packages, AltAnalyze (a Python-based open source tool), and an R-based comparison tool, to build an automated workflow to meta-analyze both online and local microarray data. The automated workflow connects the integrated tools seamlessly, delivers data flow between the tools smoothly, and hence improves the efficiency and accuracy of complex data analyses. Our workflow exemplifies the usage of Kepler as a scientific workflow platform for bioinformatics pipelines.

  11. A practical workflow for making anatomical atlases for biological research.

    PubMed

    Wan, Yong; Lewis, A Kelsey; Colasanto, Mary; van Langeveld, Mark; Kardon, Gabrielle; Hansen, Charles

    2012-01-01

    The anatomical atlas has been at the intersection of science and art for centuries. These atlases are essential to biological research, but high-quality atlases are often scarce. Recent advances in imaging technology have made high-quality 3D atlases possible. However, until now there has been a lack of practical workflows using standard tools to generate atlases from images of biological samples. With certain adaptations, CG artists' workflow and tools, traditionally used in the film industry, are practical for building high-quality biological atlases. Researchers have developed a workflow for generating a 3D anatomical atlas using accessible artists' tools. They used this workflow to build a mouse limb atlas for studying the musculoskeletal system's development. This research aims to raise the awareness of using artists' tools in scientific research and promote interdisciplinary collaborations between artists and scientists. This video (http://youtu.be/g61C-nia9ms) demonstrates a workflow for creating an anatomical atlas.

  12. SWARM : a scientific workflow for supporting Bayesian approaches to improve metabolic models.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Shi, X.; Stevens, R.; Mathematics and Computer Science

    2008-01-01

    With the exponential growth of complete genome sequences, the analysis of these sequences is becoming a powerful approach to build genome-scale metabolic models. These models can be used to study individual molecular components and their relationships, and eventually to study cells as systems. However, constructing genome-scale metabolic models manually is time-consuming and labor-intensive, which is why far fewer genome-scale metabolic models are available compared to the hundreds of genome sequences available. To tackle this problem, we design SWARM, a scientific workflow that can be utilized to improve genome-scale metabolic models in a high-throughput fashion. SWARM deals with a range of issues including the integration of data across distributed resources, data format conversions, data updates, and data provenance. Put together, SWARM streamlines the whole modeling process: extracting data from various resources, deriving training datasets to train a set of predictors, applying Bayesian techniques to assemble the predictors, inferring on the ensemble of predictors to insert missing data, and eventually improving draft metabolic networks automatically. By enhancing metabolic model construction, SWARM enables scientists to generate many genome-scale metabolic models within a short period of time and with less effort.
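
    The ensemble step can be illustrated with a toy weighted combination of per-predictor probabilities for a candidate reaction. The predictor names, weights, and decision threshold are hypothetical and only sketch the Bayesian-ensemble idea, not SWARM's implementation.

    ```python
    # Toy ensemble vote: combine predictor probabilities for inserting a missing reaction.
    import numpy as np

    def combine(probs, weights):
        """Weighted average of per-predictor probabilities for one candidate."""
        p, w = np.asarray(probs, float), np.asarray(weights, float)
        return float(np.dot(p, w) / w.sum())

    candidate = "succinate_dehydrogenase"                 # hypothetical gap reaction
    p = combine(probs=[0.92, 0.75, 0.60],                 # e.g. sequence, pathway, phylogeny
                weights=[0.5, 0.3, 0.2])
    if p > 0.7:
        print(f"insert {candidate} into the draft model (p = {p:.2f})")
    ```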

  13. A framework for integration of scientific applications into the OpenTopography workflow

    NASA Astrophysics Data System (ADS)

    Nandigam, V.; Crosby, C.; Baru, C.

    2012-12-01

    The NSF-funded OpenTopography facility provides online access to Earth science-oriented high-resolution LIDAR topography data, online processing tools, and derivative products. The underlying cyberinfrastructure employs a multi-tier service-oriented architecture that is comprised of an infrastructure tier, a processing services tier, and an application tier. The infrastructure tier consists of storage and compute resources as well as supporting databases. The services tier consists of the set of processing routines, each deployed as a Web service. The applications tier provides client interfaces to the system (e.g., a portal). We propose a "pluggable" infrastructure design that will allow new scientific algorithms and processing routines developed and maintained by the community to be integrated into the OpenTopography system so that the wider earth science community can benefit from their availability. All core components in OpenTopography are available as Web services using a customized open-source Opal toolkit. The Opal toolkit provides mechanisms to manage and track job submissions, with the help of a back-end database. It allows monitoring of job and system status by providing charting tools. All core components in OpenTopography have been developed, maintained and wrapped as Web services using Opal by OpenTopography developers. However, as the scientific community develops new processing and analysis approaches, this integration approach does not scale efficiently. Most of the new scientific applications will have their own active development teams performing regular updates, maintenance and other improvements. It would be optimal to have each application co-located where its developers can continue to actively work on it while still making it accessible within the OpenTopography workflow for processing capabilities. We will utilize a software framework for remote integration of these scientific applications into the OpenTopography system. This will be accomplished by

  14. Nationwide Buildings Energy Research enabled through an integrated Data Intensive Scientific Workflow and Advanced Analysis Environment

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kleese van Dam, Kerstin; Lansing, Carina S.; Elsethagen, Todd O.

    2014-01-28

    Modern workflow systems enable scientists to run ensemble simulations at unprecedented scales and levels of complexity, allowing them to study system sizes previously impossible to achieve due to the inherent resource requirements of the modeling work. However, as a result of these new capabilities, science teams suddenly face unprecedented data volumes that they are unable to analyze with their existing tools and methodologies in a timely fashion. In this paper we describe the ongoing development work to create an integrated data-intensive scientific workflow and analysis environment that offers researchers the ability to easily create and execute complex simulation studies and provides them with different scalable methods to analyze the resulting data volumes. The integration of simulation and analysis environments is not only a question of ease of use, but also supports fundamental functions in the correlated analysis of simulation input, execution details and derived results for multi-variant, complex studies. To this end the team extended and integrated the existing capabilities of the Velo data management and analysis infrastructure, the MeDICi data-intensive workflow system and RHIPE, the R for Hadoop version of the well-known statistics package, as well as developing a new visual analytics interface for result exploitation by multi-domain users. The capabilities of the new environment are demonstrated on a use case that focuses on the Pacific Northwest National Laboratory (PNNL) building energy team, showing how they were able to take their previously local-scale simulations to a nationwide level by utilizing data-intensive computing techniques not only for their modeling work, but also for the subsequent analysis of their modeling results. As part of the PNNL research initiative PRIMA (Platform for Regional Integrated Modeling and Analysis), the team performed an initial 3-year study of building energy demands for the US

  15. AutoDrug: fully automated macromolecular crystallography workflows for fragment-based drug discovery

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tsai, Yingssu; McPhillips, Scott E.

    New software has been developed for automating the experimental and data-processing stages of fragment-based drug discovery at a macromolecular crystallography beamline. A new workflow-automation framework orchestrates beamline-control and data-analysis software while organizing results from multiple samples. AutoDrug is software based upon the scientific workflow paradigm that integrates the Stanford Synchrotron Radiation Lightsource macromolecular crystallography beamlines and third-party processing software to automate the crystallography steps of the fragment-based drug-discovery process. AutoDrug screens a cassette of fragment-soaked crystals, selects crystals for data collection based on screening results and user-specified criteria and determines optimal data-collection strategies. It then collects and processes diffraction data, performs molecular replacement using provided models and detects electron density that is likely to arise from bound fragments. All processes are fully automated, i.e. are performed without user interaction or supervision. Samples can be screened in groups corresponding to particular proteins, crystal forms and/or soaking conditions. A single AutoDrug run is only limited by the capacity of the sample-storage dewar at the beamline: currently 288 samples. AutoDrug was developed in conjunction with RestFlow, a new scientific workflow-automation framework. RestFlow simplifies the design of AutoDrug by managing the flow of data and the organization of results and by orchestrating the execution of computational pipeline steps. It also simplifies the execution and interaction of third-party programs and the beamline-control system. Modeling AutoDrug as a scientific workflow enables multiple variants that meet the requirements of different user groups to be developed and supported. A workflow tailored to mimic the crystallography stages comprising the drug-discovery pipeline of CoCrystal Discovery Inc. has been deployed and

  16. Leveraging the Power of High Performance Computing for Next Generation Sequencing Data Analysis: Tricks and Twists from a High Throughput Exome Workflow

    PubMed Central

    Wonczak, Stephan; Thiele, Holger; Nieroda, Lech; Jabbari, Kamel; Borowski, Stefan; Sinha, Vishal; Gunia, Wilfried; Lang, Ulrich; Achter, Viktor; Nürnberg, Peter

    2015-01-01

    Next generation sequencing (NGS) has been a great success and is now a standard method of research in the life sciences. With this technology, dozens of whole genomes or hundreds of exomes can be sequenced in rather short time, producing huge amounts of data. Complex bioinformatics analyses are required to turn these data into scientific findings. In order to run these analyses fast, automated workflows implemented on high performance computers are state of the art. While providing sufficient compute power and storage to meet the NGS data challenge, high performance computing (HPC) systems require special care when utilized for high throughput processing. This is especially true if the HPC system is shared by different users. Here, stability, robustness and maintainability are as important for automated workflows as speed and throughput. To achieve all of these aims, dedicated solutions have to be developed. In this paper, we present the tricks and twists that we utilized in the implementation of our exome data processing workflow. It may serve as a guideline for other high throughput data analysis projects using a similar infrastructure. The code implementing our solutions is provided in the supporting information files. PMID:25942438

  17. Scientific Data Management (SDM) Center for Enabling Technologies. Final Report, 2007-2012

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ludascher, Bertram; Altintas, Ilkay

    Our contributions to advancing the State of the Art in scientific workflows have focused on the following areas: Workflow development; Generic workflow components and templates; Provenance collection and analysis; and, Workflow reliability and fault tolerance.

  18. Climate Data Analytics Workflow Management

    NASA Astrophysics Data System (ADS)

    Zhang, J.; Lee, S.; Pan, L.; Mattmann, C. A.; Lee, T. J.

    2016-12-01

    In this project we aim to pave a novel path toward a sustainable building block for Earth science big data analytics and knowledge sharing. Closely studying how Earth scientists conduct data analytics research in their daily work, we have developed a provenance model to record their activities and a technology to automatically generate workflows for scientists from that provenance. On top of this, we have built a prototype of a data-centric provenance repository and established a PDSW (People, Data, Service, Workflow) knowledge network to support workflow recommendation. To ensure the scalability and performance of the expected recommendation system, we have leveraged the Apache OODT system technology. The community-approved, metrics-based performance evaluation web service will allow a user to select a metric from a list of several community-approved metrics and to evaluate model performance using that metric as well as a reference dataset. This service will facilitate the use of reference datasets that are generated in support of model-data intercomparison projects such as Obs4MIPs and Ana4MIPs. The data-centric repository infrastructure will allow us to capture richer provenance to further facilitate knowledge sharing and scientific collaboration in the Earth science community. This project is part of the Apache incubator CMDA project.
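
    A single metric evaluation of the kind the web service exposes can be sketched as follows. The series values and the choice of RMSE are illustrative stand-ins for scoring a model series against an Obs4MIPs-style reference dataset, not the project's actual metric list or data.

    ```python
    # Illustrative metric computation: score a model series against a reference series.
    import numpy as np

    def rmse(model, reference):
        model, reference = np.asarray(model, float), np.asarray(reference, float)
        return float(np.sqrt(np.mean((model - reference) ** 2)))

    model_series = [14.2, 15.1, 16.0, 15.4]        # hypothetical model output
    reference_series = [14.0, 15.3, 15.8, 15.9]    # stand-in for an Obs4MIPs reference
    print(f"RMSE = {rmse(model_series, reference_series):.2f}")
    ```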

  19. Real-Time Field Data Acquisition and Remote Sensor Reconfiguration Using Scientific Workflows

    NASA Astrophysics Data System (ADS)

    Silva, F.; Mehta, G.; Vahi, K.; Deelman, E.

    2010-12-01

    Despite many technological advances, field data acquisition still consists of several manual and laborious steps. Once sensors and data loggers are deployed in the field, scientists often have to periodically return to their study sites in order to collect their data. Even when field deployments have a way to communicate and transmit data back to the laboratory (e.g. by using a satellite or a cellular modem), data analysis still requires several repetitive steps. Because data often needs to be processed and inspected manually, there is usually a significant time delay between data collection and analysis. As a result, sensor failures that could be detected almost in real-time are not noted for weeks or months. Finally, sensor reconfiguration as a result of interesting events in the field is still done manually, making rapid response nearly impossible and causing important data to be missed. By working closely with scientists from different application domains, we identified several tasks that, if automated, could greatly improve the way field data is collected, processed, and distributed. Our goals are to enable real-time data collection and validation, automate sensor reconfiguration in response to events of interest in the field, and allow scientists to easily automate their data processing. We began our design by employing the Sensor Processing and Acquisition Network (SPAN) architecture. SPAN uses an embedded processor in the field to coordinate sensor data acquisition from analog and digital sensors by interfacing with different types of devices and data loggers. SPAN is also able to interact with various types of communication devices in order to provide real-time communication to and from field sites. We use the Pegasus Workflow Management System (Pegasus WMS) to coordinate data collection and control sensors and deployments in the field. Because scientific workflows can be used to automate multi-step, repetitive tasks, scientists can create simple workflows to

  20. A web accessible scientific workflow system for vadose zone performance monitoring: design and implementation examples

    NASA Astrophysics Data System (ADS)

    Mattson, E.; Versteeg, R.; Ankeny, M.; Stormberg, G.

    2005-12-01

    Long term performance monitoring has been identified by DOE, DOD and EPA as one of the most challenging and costly elements of contaminated site remedial efforts. Such monitoring should provide timely and actionable information relevant to a multitude of stakeholder needs. This information should be obtained in a manner which is auditable, cost effective and transparent. Over the last several years INL staff has designed and implemented a web accessible scientific workflow system for environmental monitoring. This workflow environment integrates distributed, automated data acquisition from diverse sensors (geophysical, geochemical and hydrological) with server side data management and information visualization through flexible browser based data access tools. Component technologies include a rich browser-based client (using dynamic JavaScript and HTML/CSS) for data selection, a back-end server which uses PHP for data processing, user management, and result delivery, and third party applications which are invoked by the back-end using web services. This system has been implemented and is operational for several sites, including the Ruby Gulch Waste Rock Repository (a capped mine waste rock dump on the Gilt Edge Mine Superfund Site), the INL Vadose Zone Research Park and an alternative cover landfill. Implementations for other vadose zone sites are currently in progress. These systems allow for autonomous performance monitoring through automated data analysis and report generation. This performance monitoring has allowed users to obtain insights into system dynamics, regulatory compliance and residence times of water. Our system uses modular components for data selection and graphing and WSDL compliant web services for external functions such as statistical analyses and model invocations. Thus, implementing this system for novel sites and extending functionality (e.g. adding novel models) is relatively straightforward. As system access requires a standard web browser

  1. Deploying and sharing U-Compare workflows as web services.

    PubMed

    Kontonatsios, Georgios; Korkontzelos, Ioannis; Kolluru, Balakrishna; Thompson, Paul; Ananiadou, Sophia

    2013-02-18

    U-Compare is a text mining platform that allows the construction, evaluation and comparison of text mining workflows. U-Compare contains a large library of components that are tuned to the biomedical domain. Users can rapidly develop biomedical text mining workflows by mixing and matching U-Compare's components. Workflows developed using U-Compare can be exported and sent to other users who, in turn, can import and re-use them. However, the resulting workflows are standalone applications, i.e., software tools that run and are accessible only via a local machine, and that can only be run with the U-Compare platform. We address the above issues by extending U-Compare to convert standalone workflows into web services automatically, via a two-click process. The resulting web services can be registered on a central server and made publicly available. Alternatively, users can make web services available on their own servers, after installing the web application framework, which is part of the extension to U-Compare. We have performed a user-oriented evaluation of the proposed extension, by asking users who have tested the enhanced functionality of U-Compare to complete questionnaires that assess its functionality, reliability, usability, efficiency and maintainability. The results obtained reveal that the new functionality is well received by users. The web services produced by U-Compare are built on top of open standards, i.e., REST and SOAP protocols, and therefore, they are decoupled from the underlying platform. Exported workflows can be integrated with any application that supports these open standards. We demonstrate how the newly extended U-Compare enhances the cross-platform interoperability of workflows, by seamlessly importing a number of text mining workflow web services exported from U-Compare into Taverna, i.e., a generic scientific workflow construction platform.
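
    Invoking one of the exported workflow web services can be sketched with a plain HTTP client; the endpoint URL and payload fields below are hypothetical placeholders and are not U-Compare's actual service interface.

    ```python
    # Hypothetical REST call to a text mining workflow exposed as a web service.
    import requests

    endpoint = "https://example.org/services/ner-workflow"      # placeholder endpoint
    payload = {"text": "BRCA1 mutations increase breast cancer risk."}

    resp = requests.post(endpoint, json=payload, timeout=30)
    resp.raise_for_status()
    print(resp.json())    # e.g. the annotations produced by the workflow
    ```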

  2. Deploying and sharing U-Compare workflows as web services

    PubMed Central

    2013-01-01

    Background U-Compare is a text mining platform that allows the construction, evaluation and comparison of text mining workflows. U-Compare contains a large library of components that are tuned to the biomedical domain. Users can rapidly develop biomedical text mining workflows by mixing and matching U-Compare’s components. Workflows developed using U-Compare can be exported and sent to other users who, in turn, can import and re-use them. However, the resulting workflows are standalone applications, i.e., software tools that run and are accessible only via a local machine, and that can only be run with the U-Compare platform. Results We address the above issues by extending U-Compare to convert standalone workflows into web services automatically, via a two-click process. The resulting web services can be registered on a central server and made publicly available. Alternatively, users can make web services available on their own servers, after installing the web application framework, which is part of the extension to U-Compare. We have performed a user-oriented evaluation of the proposed extension, by asking users who have tested the enhanced functionality of U-Compare to complete questionnaires that assess its functionality, reliability, usability, efficiency and maintainability. The results obtained reveal that the new functionality is well received by users. Conclusions The web services produced by U-Compare are built on top of open standards, i.e., REST and SOAP protocols, and therefore, they are decoupled from the underlying platform. Exported workflows can be integrated with any application that supports these open standards. We demonstrate how the newly extended U-Compare enhances the cross-platform interoperability of workflows, by seamlessly importing a number of text mining workflow web services exported from U-Compare into Taverna, i.e., a generic scientific workflow construction platform. PMID:23419017

  3. Addressing informatics challenges in Translational Research with workflow technology.

    PubMed

    Beaulah, Simon A; Correll, Mick A; Munro, Robin E J; Sheldon, Jonathan G

    2008-09-01

    Interest in Translational Research has been growing rapidly in recent years. In this collision of different data, technologies and cultures lie tremendous opportunities for the advancement of science and business for organisations that are able to integrate, analyse and deliver this information effectively to users. Workflow-based integration and analysis systems are becoming recognised as a fast and flexible way to build applications that are tailored to scientific areas, yet are built on a common platform. Workflow systems are allowing organisations to meet the key informatics challenges in Translational Research and improve disease understanding and patient care.

  4. The medical simulation markup language - simplifying the biomechanical modeling workflow.

    PubMed

    Suwelack, Stefan; Stoll, Markus; Schalck, Sebastian; Schoch, Nicolai; Dillmann, Rüdiger; Bendl, Rolf; Heuveline, Vincent; Speidel, Stefanie

    2014-01-01

    Modeling and simulation of the human body by means of continuum mechanics has become an important tool in diagnostics, computer-assisted interventions and training. This modeling approach seeks to construct patient-specific biomechanical models from tomographic data. Usually many different tools, such as segmentation and meshing algorithms, are involved in this workflow. In this paper we present a generalized and flexible description for biomechanical models. The unique feature of the new modeling language is that it not only describes the final biomechanical simulation, but also the workflow by which the biomechanical model is constructed from tomographic data. In this way, the MSML can act as middleware between all tools used in the modeling pipeline. The MSML thus greatly facilitates the prototyping of medical simulation workflows for clinical and research purposes. In this paper, we not only detail the XML-based modeling scheme, but also present a concrete implementation. Different examples highlight the flexibility, robustness and ease-of-use of the approach.
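
    The idea of one XML document that describes both the model-construction workflow and the final simulation can be sketched as below; the element and attribute names are invented for illustration and do not reproduce the actual MSML schema.

    ```python
    # Toy MSML-like description parsed with the standard library (schema is invented).
    import xml.etree.ElementTree as ET

    msml_like = """
    <msml>
      <workflow>
        <segmentation input="patient_ct.nii" output="liver_mask.nii"/>
        <meshing input="liver_mask.nii" output="liver_mesh.vtk" elementType="tet4"/>
      </workflow>
      <simulation mesh="liver_mesh.vtk" material="neo-hookean" youngsModulus="3000"/>
    </msml>
    """

    root = ET.fromstring(msml_like)
    for step in root.find("workflow"):
        print("workflow step:", step.tag, dict(step.attrib))
    print("simulation:", dict(root.find("simulation").attrib))
    ```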

  5. Support for Taverna workflows in the VPH-Share cloud platform.

    PubMed

    Kasztelnik, Marek; Coto, Ernesto; Bubak, Marian; Malawski, Maciej; Nowakowski, Piotr; Arenas, Juan; Saglimbeni, Alfredo; Testi, Debora; Frangi, Alejandro F

    2017-07-01

    To address the increasing need for collaborative endeavours within the Virtual Physiological Human (VPH) community, the VPH-Share collaborative cloud platform allows researchers to expose and share sequences of complex biomedical processing tasks in the form of computational workflows. The Taverna Workflow System is a very popular tool for orchestrating complex biomedical & bioinformatics processing tasks in the VPH community. This paper describes the VPH-Share components that support the building and execution of Taverna workflows, and explains how they interact with other VPH-Share components to improve the capabilities of the VPH-Share platform. Taverna workflow support is delivered by the Atmosphere cloud management platform and the VPH-Share Taverna plugin. These components are explained in detail, along with the two main procedures that were developed to enable this seamless integration: workflow composition and execution. 1) Seamless integration of VPH-Share with other components and systems. 2) Extended range of different tools for workflows. 3) Successful integration of scientific workflows from other VPH projects. 4) Execution speed improvement for medical applications. The presented workflow integration provides VPH-Share users with a wide range of different possibilities to compose and execute workflows, such as desktop or online composition, online batch execution, multithreading, remote execution, etc. The specific advantages of each supported tool are presented, as are the roles of Atmosphere and the VPH-Share plugin within the VPH-Share project. The combination of the VPH-Share plugin and Atmosphere engenders the VPH-Share infrastructure with far more flexible, powerful and usable capabilities for the VPH-Share community. As both components can continue to evolve and improve independently, we acknowledge that further improvements are still to be developed and will be described. Copyright © 2017 Elsevier B.V. All rights reserved.

  6. Data Provenance Hybridization Supporting Extreme-Scale Scientific Workflow Applications

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Elsethagen, Todd O.; Stephan, Eric G.; Raju, Bibi

    As high performance computing (HPC) infrastructures continue to grow in capability and complexity, so do the applications that they serve. HPC and distributed-area computing (DAC) (e.g., grid and cloud) users are looking increasingly toward workflow solutions to orchestrate their complex application coupling and pre- and post-processing needs. To gain insight and a more quantitative understanding of a workflow’s performance, our method includes not only the capture of traditional provenance information, but also the capture and integration of system environment metrics, helping to give context and explanation for a workflow’s execution. In this paper, we describe IPPD’s provenance management solution (ProvEn) and its hybrid data store combining both of these data provenance perspectives.
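    ProvEn's key idea, as summarized here, is to store traditional workflow provenance alongside system environment metrics so that a run can be explained in context. The sketch below shows one plausible way to pair the two record types in application code; the field names and the hybrid_record structure are illustrative assumptions, not the ProvEn data model.

      # Illustrative pairing of workflow provenance with system environment
      # metrics, in the spirit of the hybrid store described for ProvEn.
      # Field names and structure are assumptions made for this sketch.
      import platform
      import time
      from dataclasses import asdict, dataclass, field

      @dataclass
      class TaskProvenance:
          task_name: str
          inputs: list
          outputs: list
          started: float
          finished: float

      @dataclass
      class EnvironmentMetrics:
          hostname: str = field(default_factory=platform.node)
          python_version: str = field(default_factory=platform.python_version)
          wall_clock: float = field(default_factory=time.time)

      def hybrid_record(prov: TaskProvenance, env: EnvironmentMetrics) -> dict:
          """Combine both provenance perspectives into one queryable record."""
          return {"provenance": asdict(prov), "environment": asdict(env)}

      if __name__ == "__main__":
          t0 = time.time()
          prov = TaskProvenance("preprocess", ["raw.dat"], ["clean.dat"], t0, time.time())
          print(hybrid_record(prov, EnvironmentMetrics()))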

  7. Seamless online science workflow development and collaboration using IDL and the ENVI Services Engine

    NASA Astrophysics Data System (ADS)

    Harris, A. T.; Ramachandran, R.; Maskey, M.

    2013-12-01

    The Exelis-developed IDL and ENVI software are ubiquitous tools in Earth science research environments. The IDL Workbench is used by the Earth science community for programming custom data analysis and visualization modules. ENVI is a software solution for processing and analyzing geospatial imagery that combines support for multiple Earth observation scientific data types (optical, thermal, multi-spectral, hyperspectral, SAR, LiDAR) with advanced image processing and analysis algorithms. The ENVI & IDL Services Engine (ESE) is an Earth science data processing engine that allows researchers to use open standards to rapidly create, publish and deploy advanced Earth science data analytics within any existing enterprise infrastructure. Although powerful in many ways, the tools lack collaborative features out-of-box. Thus, as part of the NASA funded project, Collaborative Workbench to Accelerate Science Algorithm Development, researchers at the University of Alabama in Huntsville and Exelis have developed plugins that allow seamless research collaboration from within IDL workbench. Such additional features within IDL workbench are possible because IDL workbench is built using the Eclipse Rich Client Platform (RCP). RCP applications allow custom plugins to be dropped in for extended functionalities. Specific functionalities of the plugins include creating complex workflows based on IDL application source code, submitting workflows to be executed by ESE in the cloud, and sharing and cloning of workflows among collaborators. All these functionalities are available to scientists without leaving their IDL workbench. Because ESE can interoperate with any middleware, scientific programmers can readily string together IDL processing tasks (or tasks written in other languages like C++, Java or Python) to create complex workflows for deployment within their current enterprise architecture (e.g. ArcGIS Server, GeoServer, Apache ODE or SciFlo from JPL). Using the collaborative IDL

  8. Enabling scientific workflows in virtual reality

    USGS Publications Warehouse

    Kreylos, O.; Bawden, G.; Bernardin, T.; Billen, M.I.; Cowgill, E.S.; Gold, R.D.; Hamann, B.; Jadamec, M.; Kellogg, L.H.; Staadt, O.G.; Sumner, D.Y.

    2006-01-01

    To advance research and improve the scientific return on data collection and interpretation efforts in the geosciences, we have developed methods of interactive visualization, with a special focus on immersive virtual reality (VR) environments. Earth sciences employ a strongly visual approach to the measurement and analysis of geologic data due to the spatial and temporal scales over which such data ranges. As observations and simulations increase in size and complexity, the Earth sciences are challenged to manage and interpret increasing amounts of data. Reaping the full intellectual benefits of immersive VR requires us to tailor exploratory approaches to scientific problems. These applications build on the visualization method's strengths, using both 3D perception and interaction with data and models, to take advantage of the skills and training of the geological scientists exploring their data in the VR environment. This interactive approach has enabled us to develop a suite of tools that are adaptable to a range of problems in the geosciences and beyond. Copyright © 2008 by the Association for Computing Machinery, Inc.

  9. Validation of the Applied Biosystems RapidFinder Shiga Toxin-Producing E. coli (STEC) Detection Workflow.

    PubMed

    Cloke, Jonathan; Matheny, Sharon; Swimley, Michelle; Tebbs, Robert; Burrell, Angelia; Flannery, Jonathan; Bastin, Benjamin; Bird, Patrick; Benzinger, M Joseph; Crowley, Erin; Agin, James; Goins, David; Salfinger, Yvonne; Brodsky, Michael; Fernandez, Maria Cristina

    2016-11-01

    The Applied Biosystems™ RapidFinder™ STEC Detection Workflow (Thermo Fisher Scientific) is a complete protocol for the rapid qualitative detection of Escherichia coli (E. coli) O157:H7 and the "Big 6" non-O157 Shiga-like toxin-producing E. coli (STEC) serotypes (defined as serogroups: O26, O45, O103, O111, O121, and O145). The RapidFinder STEC Detection Workflow makes use of either the automated preparation of PCR-ready DNA using the Applied Biosystems PrepSEQ™ Nucleic Acid Extraction Kit in conjunction with the Applied Biosystems MagMAX™ Express 96-well magnetic particle processor or the Applied Biosystems PrepSEQ Rapid Spin kit for manual preparation of PCR-ready DNA. Two separate assays comprise the RapidFinder STEC Detection Workflow, the Applied Biosystems RapidFinder STEC Screening Assay and the Applied Biosystems RapidFinder STEC Confirmation Assay. The RapidFinder STEC Screening Assay includes primers and probes to detect the presence of stx1 (Shiga toxin 1), stx2 (Shiga toxin 2), eae (intimin), and E. coli O157 gene targets. The RapidFinder STEC Confirmation Assay includes primers and probes for the "Big 6" non-O157 STEC and E. coli O157:H7. The use of these two assays in tandem allows a user to detect accurately the presence of the "Big 6" STECs and E. coli O157:H7. The performance of the RapidFinder STEC Detection Workflow was evaluated in a method comparison study, in inclusivity and exclusivity studies, and in a robustness evaluation. The assays were compared to the U.S. Department of Agriculture (USDA), Food Safety and Inspection Service (FSIS) Microbiology Laboratory Guidebook (MLG) 5.09: Detection, Isolation and Identification of Escherichia coli O157:H7 from Meat Products and Carcass and Environmental Sponges for raw ground beef (73% lean) and USDA/FSIS-MLG 5B.05: Detection, Isolation and Identification of Escherichia coli non-O157:H7 from Meat Products and Carcass and Environmental Sponges for raw beef trim. No statistically significant
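    The workflow pairs a screening assay (stx1, stx2, eae, and an O157 marker) with a confirmation assay for O157:H7 and the "Big 6" serogroups. The small function below is only a schematic of that tandem decision logic for reporting purposes; it is not part of the RapidFinder software, and the rule encoded here is a simplification.

      # Simplified sketch of combining a screening result with a confirmation
      # result, mirroring the tandem use of the two assays described above.
      # The decision rule is illustrative only and not the vendor's algorithm.

      BIG6 = {"O26", "O45", "O103", "O111", "O121", "O145"}

      def presumptive_positive(stx1: bool, stx2: bool, eae: bool) -> bool:
          """Screening calls a sample presumptive positive when a Shiga toxin
          gene is detected together with the intimin gene."""
          return (stx1 or stx2) and eae

      def report(stx1: bool, stx2: bool, eae: bool, confirmed: set[str]) -> str:
          if not presumptive_positive(stx1, stx2, eae):
              return "negative at screening"
          hits = sorted(confirmed & (BIG6 | {"O157:H7"}))
          return "confirmed: " + ", ".join(hits) if hits else "not confirmed"

      if __name__ == "__main__":
          print(report(True, False, True, {"O26"}))       # confirmed: O26
          print(report(False, False, True, {"O157:H7"}))  # negative at screening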

  10. ScyFlow: An Environment for the Visual Specification and Execution of Scientific Workflows

    NASA Technical Reports Server (NTRS)

    McCann, Karen M.; Yarrow, Maurice; DeVivo, Adrian; Mehrotra, Piyush

    2004-01-01

    With the advent of grid technologies, scientists and engineers are building more and more complex applications to utilize distributed grid resources. The core grid services provide a path for accessing and utilizing these resources in a secure and seamless fashion. However, what the scientists need is an environment that will allow them to specify their application runs at a high organizational level, and then support efficient execution across any given set or sets of resources. We have been designing and implementing ScyFlow, a dual-interface architecture (both GUI and API) that addresses this problem. The scientist/user specifies the application tasks along with the necessary control and data flow, and monitors and manages the execution of the resulting workflow across the distributed resources. In this paper, we utilize two scenarios to provide the details of the two modules of the project, the visual editor and the runtime workflow engine.

  11. A Scientific Software Product Line for the Bioinformatics domain.

    PubMed

    Costa, Gabriella Castro B; Braga, Regina; David, José Maria N; Campos, Fernanda

    2015-08-01

    Most specialized users (scientists) who use bioinformatics applications do not have suitable training in software development. Software Product Line (SPL) employs the concept of reuse, considering that it is defined as a set of systems that are developed from a common set of base artifacts. In some contexts, such as in bioinformatics applications, it is advantageous to develop a collection of related software products using an SPL approach. If software products are similar enough, there is the possibility of predicting their commonalities and differences, and then reusing these common features to support the development of new applications in the bioinformatics area. This paper presents the PL-Science approach, which considers the context of SPL and ontology in order to assist scientists to define a scientific experiment, and to specify a workflow that encompasses bioinformatics applications of a given experiment. This paper also focuses on the use of ontologies to enable the use of Software Product Line in biological domains. In the context of this paper, a Scientific Software Product Line (SSPL) differs from a Software Product Line due to the fact that an SSPL uses an abstract scientific workflow model. This workflow is defined according to a scientific domain and, using this abstract workflow model, the products (scientific applications/algorithms) are instantiated. Through the use of ontology as a knowledge representation model, we can provide domain restrictions as well as add semantic aspects in order to facilitate the selection and organization of bioinformatics workflows in a Scientific Software Product Line. The use of ontologies enables not only the expression of formal restrictions but also the inferences on these restrictions, considering that a scientific domain needs a formal specification. This paper presents the development of the PL-Science approach, encompassing a methodology and an infrastructure, and also presents an approach evaluation. This evaluation

  12. Workflow management systems in radiology

    NASA Astrophysics Data System (ADS)

    Wendler, Thomas; Meetz, Kirsten; Schmidt, Joachim

    1998-07-01

    In a situation of shrinking health care budgets, increasing cost pressure and growing demands to increase the efficiency and the quality of medical services, health care enterprises are forced to optimize or completely re-design their processes. Although information technology is agreed to potentially contribute to cost reduction and efficiency improvement, the real success factors are the re-definition and automation of processes: Business Process Re-engineering and Workflow Management. In this paper we discuss architectures for the use of workflow management systems in radiology. We propose to move forward from information systems in radiology (RIS, PACS) to Radiology Management Systems, in which workflow functionality (process definitions and process automation) is implemented through autonomous workflow management systems (WfMS). In a workflow-oriented architecture, an autonomous workflow enactment service communicates with workflow client applications via standardized interfaces. In this paper, we discuss the need for and the benefits of such an approach. The separation of workflow management systems from application systems is emphasized, as are the consequences that arise for the architecture of workflow-oriented information systems. This includes an appropriate workflow terminology, and the definition of standard interfaces for workflow-aware application systems. Workflow studies in various institutions have shown that most of the processes in radiology are well structured and suited for a workflow management approach. Numerous commercially available Workflow Management Systems (WfMS) were investigated, and some of them, which are process-oriented and application independent, appear suitable for use in radiology.

  13. The View from a Few Hundred Feet : A New Transparent and Integrated Workflow for UAV-collected Data

    NASA Astrophysics Data System (ADS)

    Peterson, F. S.; Barbieri, L.; Wyngaard, J.

    2015-12-01

    Unmanned Aerial Vehicles (UAVs) allow scientists and civilians to monitor earth and atmospheric conditions in remote locations. To keep up with the rapid evolution of UAV technology, data workflows must also be flexible, integrated, and introspective. Here, we present our data workflow for a project to assess the feasibility of detecting threshold levels of methane, carbon dioxide, and other aerosols by mounting consumer-grade gas analysis sensors on UAVs. Particularly, we highlight our use of Project Jupyter, a set of open-source software tools and documentation designed for developing "collaborative narratives" around scientific workflows. By embracing the GitHub-backed, multi-language systems available in Project Jupyter, we enable interaction and exploratory computation while simultaneously embracing distributed version control. Additionally, the transparency of this method builds trust with civilians and decision-makers and leverages collaboration and communication to resolve problems. The goal of this presentation is to provide a generic data workflow for scientific inquiries involving UAVs and to invite the participation of the AGU community in its improvement and curation.

  14. Cognitive Learning, Monitoring and Assistance of Industrial Workflows Using Egocentric Sensor Networks

    PubMed Central

    Bleser, Gabriele; Damen, Dima; Behera, Ardhendu; Hendeby, Gustaf; Mura, Katharina; Miezal, Markus; Gee, Andrew; Petersen, Nils; Maçães, Gustavo; Domingues, Hugo; Gorecky, Dominic; Almeida, Luis; Mayol-Cuevas, Walterio; Calway, Andrew; Cohn, Anthony G.; Hogg, David C.; Stricker, Didier

    2015-01-01

    Today, the workflows that are involved in industrial assembly and production activities are becoming increasingly complex. To efficiently and safely perform these workflows is demanding on the workers, in particular when it comes to infrequent or repetitive tasks. This burden on the workers can be eased by introducing smart assistance systems. This article presents a scalable concept and an integrated system demonstrator designed for this purpose. The basic idea is to learn workflows from observing multiple expert operators and then transfer the learnt workflow models to novice users. Being entirely learning-based, the proposed system can be applied to various tasks and domains. The above idea has been realized in a prototype, which combines components pushing the state of the art of hardware and software designed with interoperability in mind. The emphasis of this article is on the algorithms developed for the prototype: 1) fusion of inertial and visual sensor information from an on-body sensor network (BSN) to robustly track the user’s pose in magnetically polluted environments; 2) learning-based computer vision algorithms to map the workspace, localize the sensor with respect to the workspace and capture objects, even as they are carried; 3) domain-independent and robust workflow recovery and monitoring algorithms based on spatiotemporal pairwise relations deduced from object and user movement with respect to the scene; and 4) context-sensitive augmented reality (AR) user feedback using a head-mounted display (HMD). A distinguishing key feature of the developed algorithms is that they all operate solely on data from the on-body sensor network and that no external instrumentation is needed. The feasibility of the chosen approach for the complete action-perception-feedback loop is demonstrated on three increasingly complex datasets representing manual industrial tasks. These limited size datasets indicate and highlight the potential of the chosen technology as a

  15. geoKepler Workflow Module for Computationally Scalable and Reproducible Geoprocessing and Modeling

    NASA Astrophysics Data System (ADS)

    Cowart, C.; Block, J.; Crawl, D.; Graham, J.; Gupta, A.; Nguyen, M.; de Callafon, R.; Smarr, L.; Altintas, I.

    2015-12-01

    The NSF-funded WIFIRE project has developed an open-source, online geospatial workflow platform for unifying geoprocessing tools and models for fire and other geospatially dependent modeling applications. It is a product of WIFIRE's objective to build an end-to-end cyberinfrastructure for real-time and data-driven simulation, prediction and visualization of wildfire behavior. geoKepler includes a set of reusable GIS components, or actors, for the Kepler Scientific Workflow System (https://kepler-project.org). Actors exist for reading and writing GIS data in formats such as Shapefile, GeoJSON, KML, and using OGC web services such as WFS. The actors also allow for calling geoprocessing tools in other packages such as GDAL and GRASS. Kepler integrates functions from multiple platforms and file formats into one framework, thus enabling optimal GIS interoperability, model coupling, and scalability. Products of the GIS actors can be fed directly to models such as FARSITE and WRF. Kepler's ability to schedule and scale processes using Hadoop and Spark also makes geoprocessing ultimately extensible and computationally scalable. The reusable workflows in geoKepler can be made to run automatically when alerted by real-time environmental conditions. Here, we show breakthroughs in the speed of creating complex data for hazard assessments with this platform. We also demonstrate geoKepler workflows that use Data Assimilation to ingest real-time weather data into wildfire simulations, and that use data mining techniques to gain insight into environmental conditions affecting fire behavior. Existing machine learning tools and libraries such as R and MLlib are being leveraged for this purpose in Kepler, as well as Kepler's Distributed Data Parallel (DDP) capability to provide a framework for scalable processing. geoKepler workflows can be executed via an iPython notebook as a part of a Jupyter hub at UC San Diego for sharing and reporting of the scientific analysis and results from
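    geoKepler wraps GIS reads and writes, and calls to external tools such as GDAL, as reusable Kepler actors. Outside of Kepler, the same pattern can be sketched in a few lines of Python; the file names below are placeholders, and the GDAL command shown (gdal_translate) is a standard utility invoked here only to illustrate how an actor might shell out to a geoprocessing tool.

      # Sketch of a geoprocessing "actor": read a GeoJSON file, filter features,
      # and hand a raster off to an external GDAL utility. File names are
      # placeholders; gdal_translate must be installed for the last step.
      import json
      import subprocess

      def load_features(path: str, min_area: float) -> list[dict]:
          """Keep only features whose (precomputed) 'area' property is large enough."""
          with open(path) as fh:
              collection = json.load(fh)
          return [f for f in collection.get("features", [])
                  if f.get("properties", {}).get("area", 0.0) >= min_area]

      def convert_raster(src: str, dst: str) -> None:
          """Delegate format conversion to GDAL, as a Kepler actor might."""
          subprocess.run(["gdal_translate", "-of", "GTiff", src, dst], check=True)

      if __name__ == "__main__":
          burns = load_features("fire_perimeters.geojson", min_area=1.0e6)
          print(f"{len(burns)} large perimeters retained")
          convert_raster("fuel_moisture.asc", "fuel_moisture.tif")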

  16. Characterising and correcting batch variation in an automated direct infusion mass spectrometry (DIMS) metabolomics workflow.

    PubMed

    Kirwan, J A; Broadhurst, D I; Davidson, R L; Viant, M R

    2013-06-01

    Direct infusion mass spectrometry (DIMS)-based untargeted metabolomics measures many hundreds of metabolites in a single experiment. While every effort is made to reduce within-experiment analytical variation in untargeted metabolomics, unavoidable sources of measurement error are introduced. This is particularly true for large-scale multi-batch experiments, necessitating the development of robust workflows that minimise batch-to-batch variation. Here, we conducted a purpose-designed, eight-batch DIMS metabolomics study using nanoelectrospray (nESI) Fourier transform ion cyclotron resonance mass spectrometric analyses of mammalian heart extracts. First, we characterised the intrinsic analytical variation of this approach to determine whether our existing workflows are fit for purpose when applied to a multi-batch investigation. Batch-to-batch variation was readily observed across the 7-day experiment, both in terms of its absolute measurement using quality control (QC) and biological replicate samples, as well as its adverse impact on our ability to discover significant metabolic information within the data. Subsequently, we developed and implemented a computational workflow that includes total-ion-current filtering, QC-robust spline batch correction and spectral cleaning, and provide conclusive evidence that this workflow reduces analytical variation and increases the proportion of significant peaks. We report an overall analytical precision of 15.9%, measured as the median relative standard deviation (RSD) for the technical replicates of the biological samples, across eight batches and 7 days of measurements. When compared against the FDA guidelines for biomarker studies, which specify an RSD of <20% as an acceptable level of precision, we conclude that our new workflows are fit for purpose for large-scale, high-throughput nESI DIMS metabolomics studies.
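    The reported figure of merit is the median relative standard deviation (RSD) across technical replicates, judged against the <20% FDA guideline mentioned in the abstract. The short sketch below shows how that summary statistic can be computed for a peak-intensity matrix; the synthetic array and the 20% threshold check are the only assumptions.

      # Compute the median relative standard deviation (RSD) across technical
      # replicates for a matrix of peak intensities (replicates x peaks).
      # The example data are synthetic; the 20% cut-off follows the FDA
      # biomarker guideline cited in the abstract.
      import numpy as np

      def median_rsd(intensities: np.ndarray) -> float:
          """RSD per peak = std / mean (in %); return the median over peaks."""
          means = intensities.mean(axis=0)
          stds = intensities.std(axis=0, ddof=1)
          rsd_percent = 100.0 * stds / means
          return float(np.median(rsd_percent))

      if __name__ == "__main__":
          rng = np.random.default_rng(0)
          peaks = rng.normal(loc=1000.0, scale=120.0, size=(3, 500))  # 3 replicates
          rsd = median_rsd(peaks)
          print(f"median RSD = {rsd:.1f}% ({'passes' if rsd < 20 else 'fails'} the 20% guideline)")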

  17. Workflows for Full Waveform Inversions

    NASA Astrophysics Data System (ADS)

    Boehm, Christian; Krischer, Lion; Afanasiev, Michael; van Driel, Martin; May, Dave A.; Rietmann, Max; Fichtner, Andreas

    2017-04-01

    Despite many theoretical advances and the increasing availability of high-performance computing clusters, full seismic waveform inversions still face considerable challenges regarding data and workflow management. While the community has access to solvers which can harness modern heterogeneous computing architectures, the computational bottleneck has fallen to these often manpower-bounded issues that need to be overcome to facilitate further progress. Modern inversions involve huge amounts of data and require a tight integration between numerical PDE solvers, data acquisition and processing systems, nonlinear optimization libraries, and job orchestration frameworks. To this end we created a set of libraries and applications revolving around Salvus (http://salvus.io), a novel software package designed to solve large-scale full waveform inverse problems. This presentation focuses on solving passive source seismic full waveform inversions from local to global scales with Salvus. We discuss (i) design choices for the aforementioned components required for full waveform modeling and inversion, (ii) their implementation in the Salvus framework, and (iii) how it is all tied together by a usable workflow system. We combine state-of-the-art algorithms ranging from high-order finite-element solutions of the wave equation to quasi-Newton optimization algorithms using trust-region methods that can handle inexact derivatives. All is steered by an automated interactive graph-based workflow framework capable of orchestrating all necessary pieces. This naturally facilitates the creation of new Earth models and hopefully sparks new scientific insights. Additionally, and even more importantly, it enhances reproducibility and reliability of the final results.

  18. PeptidePicker: a scientific workflow with web interface for selecting appropriate peptides for targeted proteomics experiments.

    PubMed

    Mohammed, Yassene; Domański, Dominik; Jackson, Angela M; Smith, Derek S; Deelder, André M; Palmblad, Magnus; Borchers, Christoph H

    2014-06-25

    One challenge in Multiple Reaction Monitoring (MRM)-based proteomics is to select the most appropriate surrogate peptides to represent a target protein. We present here a software package to automatically generate these most appropriate surrogate peptides for an LC/MRM-MS analysis. Our method integrates information about the proteins, their tryptic peptides, and the suitability of these peptides for MRM, which is available online in UniProtKB, NCBI's dbSNP, ExPASy, PeptideAtlas, PRIDE, and GPMDB. The scoring algorithm reflects our knowledge in choosing the best candidate peptides for MRM, based on the uniqueness of the peptide in the targeted proteome, its physicochemical properties, and whether it previously has been observed. The modularity of the workflow allows further extension and additional selection criteria to be incorporated. We have developed a simple Web interface where the researcher provides the protein accession number, the subject organism, and peptide-specific options. Currently, the software is designed for human and mouse proteomes, but additional species can easily be added. Our software improved the peptide selection by eliminating human error, considering multiple data sources and all of the isoforms of the protein, and resulted in faster peptide selection - approximately 50 proteins per hour compared to 8 per day. Compiling a list of optimal surrogate peptides for target proteins to be analyzed by LC/MRM-MS has been a cumbersome process, in which expert researchers retrieved information from different online repositories and used their own reasoning to find the most appropriate peptides. Our scientific workflow automates this process by integrating information from different data sources including UniProt, Global Proteome Machine, NCBI's dbSNP, and PeptideAtlas, simulating the researchers' reasoning, and incorporating their knowledge of how to select the best proteotypic peptides for an MRM analysis. The developed software can help to
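    The abstract describes a scoring algorithm that ranks candidate surrogate peptides by uniqueness in the target proteome, physicochemical suitability, and prior observation in repositories. The weighted score below is a toy stand-in for that idea; the weights, length limits, and excluded residues are assumptions made for illustration and are not the published PeptidePicker rules.

      # Toy ranking of candidate surrogate peptides for MRM, loosely following
      # the criteria named in the abstract (uniqueness, physicochemical
      # properties, prior observation). Weights and rules are illustrative only.

      PROBLEMATIC = set("MC")  # residues prone to modification (assumption)

      def peptide_score(seq: str, unique: bool, times_observed: int) -> float:
          score = 0.0
          score += 3.0 if unique else 0.0                      # proteome uniqueness
          score += 2.0 if 7 <= len(seq) <= 20 else 0.0         # MRM-friendly length
          score += 1.0 if not (PROBLEMATIC & set(seq)) else 0.0
          score += min(times_observed, 5) * 0.5                # previously observed
          return score

      def rank(candidates: list[tuple[str, bool, int]]) -> list[tuple[float, str]]:
          return sorted(((peptide_score(s, u, n), s) for s, u, n in candidates),
                        reverse=True)

      if __name__ == "__main__":
          peptides = [("LVNELTEFAK", True, 12), ("MCPSSTR", False, 0),
                      ("AEFVEVTK", True, 3)]
          for score, seq in rank(peptides):
              print(f"{seq:12s} {score:.1f}")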

  19. Create, run, share, publish, and reference your LC-MS, FIA-MS, GC-MS, and NMR data analysis workflows with the Workflow4Metabolomics 3.0 Galaxy online infrastructure for metabolomics.

    PubMed

    Guitton, Yann; Tremblay-Franco, Marie; Le Corguillé, Gildas; Martin, Jean-François; Pétéra, Mélanie; Roger-Mele, Pierrick; Delabrière, Alexis; Goulitquer, Sophie; Monsoor, Misharl; Duperier, Christophe; Canlet, Cécile; Servien, Rémi; Tardivel, Patrick; Caron, Christophe; Giacomoni, Franck; Thévenot, Etienne A

    2017-12-01

    Metabolomics is a key approach in modern functional genomics and systems biology. Due to the complexity of metabolomics data, the variety of experimental designs, and the multiplicity of bioinformatics tools, providing experimenters with a simple and efficient resource to conduct comprehensive and rigorous analysis of their data is of utmost importance. In 2014, we launched the Workflow4Metabolomics (W4M; http://workflow4metabolomics.org) online infrastructure for metabolomics built on the Galaxy environment, which offers user-friendly features to build and run data analysis workflows including preprocessing, statistical analysis, and annotation steps. Here we present the new W4M 3.0 release, which contains twice as many tools as the first version, and provides two features which are, to our knowledge, unique among online resources. First, data from the four major metabolomics technologies (i.e., LC-MS, FIA-MS, GC-MS, and NMR) can be analyzed on a single platform. By using three studies in human physiology, alga evolution, and animal toxicology, we demonstrate how the 40 available tools can be easily combined to address biological issues. Second, the full analysis (including the workflow, the parameter values, the input data and output results) can be referenced with a permanent digital object identifier (DOI). Publication of data analyses is of major importance for robust and reproducible science. Furthermore, the publicly shared workflows are of high-value for e-learning and training. The Workflow4Metabolomics 3.0 e-infrastructure thus not only offers a unique online environment for analysis of data from the main metabolomics technologies, but it is also the first reference repository for metabolomics workflows. Copyright © 2017 Elsevier Ltd. All rights reserved.

  20. Eleven quick tips for architecting biomedical informatics workflows with cloud computing.

    PubMed

    Cole, Brian S; Moore, Jason H

    2018-03-01

    Cloud computing has revolutionized the development and operations of hardware and software across diverse technological arenas, yet academic biomedical research has lagged behind despite the numerous and weighty advantages that cloud computing offers. Biomedical researchers who embrace cloud computing can reap rewards in cost reduction, decreased development and maintenance workload, increased reproducibility, ease of sharing data and software, enhanced security, horizontal and vertical scalability, high availability, a thriving technology partner ecosystem, and much more. Despite these advantages that cloud-based workflows offer, the majority of scientific software developed in academia does not utilize cloud computing and must be migrated to the cloud by the user. In this article, we present 11 quick tips for architecting biomedical informatics workflows on compute clouds, distilling knowledge gained from experience developing, operating, maintaining, and distributing software and virtualized appliances on the world's largest cloud. Researchers who follow these tips stand to benefit immediately by migrating their workflows to cloud computing and embracing the paradigm of abstraction.

  1. Eleven quick tips for architecting biomedical informatics workflows with cloud computing

    PubMed Central

    Moore, Jason H.

    2018-01-01

    Cloud computing has revolutionized the development and operations of hardware and software across diverse technological arenas, yet academic biomedical research has lagged behind despite the numerous and weighty advantages that cloud computing offers. Biomedical researchers who embrace cloud computing can reap rewards in cost reduction, decreased development and maintenance workload, increased reproducibility, ease of sharing data and software, enhanced security, horizontal and vertical scalability, high availability, a thriving technology partner ecosystem, and much more. Despite these advantages that cloud-based workflows offer, the majority of scientific software developed in academia does not utilize cloud computing and must be migrated to the cloud by the user. In this article, we present 11 quick tips for architecting biomedical informatics workflows on compute clouds, distilling knowledge gained from experience developing, operating, maintaining, and distributing software and virtualized appliances on the world’s largest cloud. Researchers who follow these tips stand to benefit immediately by migrating their workflows to cloud computing and embracing the paradigm of abstraction. PMID:29596416

  2. The Live Access Server Scientific Product Generation Through Workflow Orchestration

    NASA Astrophysics Data System (ADS)

    Hankin, S.; Calahan, J.; Li, J.; Manke, A.; O'Brien, K.; Schweitzer, R.

    2006-12-01

    The Live Access Server (LAS) is a well-established Web-application for display and analysis of geo-science data sets. The software, which can be downloaded and installed by anyone, gives data providers an easy way to establish services for their on-line data holdings, so their users can make plots; create and download data sub-sets; compare (difference) fields; and perform simple analyses. Now at version 7.0, LAS has been in operation since 1994. The current "Armstrong" release of LAS V7 consists of three components in a tiered architecture: user interface, workflow orchestration and Web Services. The LAS user interface (UI) communicates with the LAS Product Server via an XML protocol embedded in an HTTP "get" URL. Libraries (APIs) have been developed in Java, JavaScript and perl that can readily generate this URL. As a result of this flexibility it is common to find LAS user interfaces of radically different character, tailored to the nature of specific datasets or the mindset of specific users. When a request is received by the LAS Product Server (LPS -- the workflow orchestration component), business logic converts this request into a series of Web Service requests invoked via SOAP. These "back- end" Web services perform data access and generate products (visualizations, data subsets, analyses, etc.). LPS then packages these outputs into final products (typically HTML pages) via Jakarta Velocity templates for delivery to the end user. "Fine grained" data access is performed by back-end services that may utilize JDBC for data base access; the OPeNDAP "DAPPER" protocol; or (in principle) the OGC WFS protocol. Back-end visualization services are commonly legacy science applications wrapped in Java or Python (or perl) classes and deployed as Web Services accessible via SOAP. Ferret is the default visualization application used by LAS, though other applications such as Matlab, CDAT, and GrADS can also be used. Other back-end services may include generation of Google
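    The abstract explains that the LAS user interface drives the product server with an XML request embedded in an HTTP GET URL. The few lines below sketch that pattern generically; the host, the lasRequest element, and its attributes are placeholders invented for the example rather than the actual LAS V7 protocol.

      # Sketch of embedding an XML request document in an HTTP GET URL, the
      # pattern described for the LAS UI -> Product Server protocol. The host
      # name and XML vocabulary here are placeholders, not the real LAS schema.
      from urllib.parse import urlencode
      from xml.etree.ElementTree import Element, SubElement, tostring

      def build_request_url(dataset: str, variable: str, operation: str) -> str:
          req = Element("lasRequest")                   # placeholder element name
          req.set("operation", operation)
          args = SubElement(req, "args")
          SubElement(args, "dataset").text = dataset
          SubElement(args, "variable").text = variable
          xml_doc = tostring(req, encoding="unicode")
          query = urlencode({"xml": xml_doc})
          return f"https://example.org/las/products?{query}"  # placeholder host

      if __name__ == "__main__":
          print(build_request_url("coads_climatology", "sst", "plot"))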

  3. Progress in digital color workflow understanding in the International Color Consortium (ICC) Workflow WG

    NASA Astrophysics Data System (ADS)

    McCarthy, Ann

    2006-01-01

    The ICC Workflow WG serves as the bridge between ICC color management technologies and use of those technologies in real world color production applications. ICC color management is applicable to and is used in a wide range of color systems, from highly specialized digital cinema color special effects to high volume publications printing to home photography. The ICC Workflow WG works to align ICC technologies so that the color management needs of these diverse use case systems are addressed in an open, platform independent manner. This report provides a high level summary of the ICC Workflow WG objectives and work to date, focusing on the ways in which workflow can impact image quality and color systems performance. The 'ICC Workflow Primitives' and 'ICC Workflow Patterns and Dimensions' workflow models are covered in some detail. Consider the questions, "How much of dissatisfaction with color management today is the result of 'the wrong color transformation at the wrong time' and 'I can't get to the right conversion at the right point in my work process'?" Put another way, consider how image quality through a workflow can be negatively affected when the coordination and control level of the color management system is not sufficient.

  4. Workflows in bioinformatics: meta-analysis and prototype implementation of a workflow generator.

    PubMed

    Garcia Castro, Alexander; Thoraval, Samuel; Garcia, Leyla J; Ragan, Mark A

    2005-04-07

    Computational methods for problem solving need to interleave information access and algorithm execution in a problem-specific workflow. The structures of these workflows are defined by a scaffold of syntactic, semantic and algebraic objects capable of representing them. Despite the proliferation of GUIs (Graphic User Interfaces) in bioinformatics, only some of them provide workflow capabilities; surprisingly, no meta-analysis of workflow operators and components in bioinformatics has been reported. We present a set of syntactic components and algebraic operators capable of representing analytical workflows in bioinformatics. Iteration, recursion, the use of conditional statements, and management of suspend/resume tasks have traditionally been implemented on an ad hoc basis and hard-coded; by having these operators properly defined it is possible to use and parameterize them as generic re-usable components. To illustrate how these operations can be orchestrated, we present GPIPE, a prototype graphic pipeline generator for PISE that allows the definition of a pipeline, parameterization of its component methods, and storage of metadata in XML formats. This implementation goes beyond the macro capacities currently in PISE. As the entire analysis protocol is defined in XML, a complete bioinformatic experiment (linked sets of methods, parameters and results) can be reproduced or shared among users. http://if-web1.imb.uq.edu.au/Pise/5.a/gpipe.html (interactive), ftp://ftp.pasteur.fr/pub/GenSoft/unix/misc/Pise/ (download). From our meta-analysis we have identified syntactic structures and algebraic operators common to many workflows in bioinformatics. The workflow components and algebraic operators can be assimilated into re-usable software components. GPIPE, a prototype implementation of this framework, provides a GUI builder to facilitate the generation of workflows and integration of heterogeneous analytical tools.

  5. Disruption of Radiologist Workflow.

    PubMed

    Kansagra, Akash P; Liu, Kevin; Yu, John-Paul J

    2016-01-01

    The effect of disruptions has been studied extensively in surgery and emergency medicine, and a number of solutions-such as preoperative checklists-have been implemented to enforce the integrity of critical safety-related workflows. Disruptions of the highly complex and cognitively demanding workflow of modern clinical radiology have only recently attracted attention as a potential safety hazard. In this article, we describe the variety of disruptions that arise in the reading room environment, review approaches that other specialties have taken to mitigate workflow disruption, and suggest possible solutions for workflow improvement in radiology. Copyright © 2015 Mosby, Inc. All rights reserved.

  6. A Community-Driven Workflow Recommendations and Reuse Infrastructure

    NASA Astrophysics Data System (ADS)

    Zhang, J.; Votava, P.; Lee, T. J.; Lee, C.; Xiao, S.; Nemani, R. R.; Foster, I.

    2013-12-01

    Aiming to connect the Earth science community to accelerate the rate of discovery, NASA Earth Exchange (NEX) has established an online repository and platform, so that researchers can publish and share their tools and models with colleagues. In recent years, workflow has become a popular technique at NEX for Earth scientists to define executable multi-step procedures for data processing and analysis. The ability to discover and reuse knowledge (sharable workflows or workflow) is critical to the future advancement of science. However, as reported in our earlier study, the reusability of scientific artifacts at current time is very low. Scientists often do not feel confident in using other researchers' tools and utilities. One major reason is that researchers are often unaware of the existence of others' data preprocessing processes. Meanwhile, researchers often do not have time to fully document the processes and expose them to others in a standard way. These issues cannot be overcome by the existing workflow search technologies used in NEX and other data projects. Therefore, this project aims to develop a proactive recommendation technology based on collective NEX user behaviors. In this way, we aim to promote and encourage process and workflow reuse within NEX. Particularly, we focus on leveraging peer scientists' best practices to support the recommendation of artifacts developed by others. Our underlying theoretical foundation is rooted in the social cognitive theory, which declares people learn by watching what others do. Our fundamental hypothesis is that sharable artifacts have network properties, much like humans in social networks. More generally, reusable artifacts form various types of social relationships (ties), and may be viewed as forming what organizational sociologists who use network analysis to study human interactions call a 'knowledge network.' In particular, we will tackle two research questions: R1: What hidden knowledge may be extracted from

  7. The impact of missing sensor information on surgical workflow management.

    PubMed

    Liebmann, Philipp; Meixensberger, Jürgen; Wiedemann, Peter; Neumuth, Thomas

    2013-09-01

    Sensor systems in the operating room may encounter intermittent data losses that reduce the performance of surgical workflow management systems (SWFMS). Sensor data loss could impact SWFMS-based decision support, device parameterization, and information presentation. The purpose of this study was to understand the robustness of surgical process models when sensor information is partially missing. SWFMS changes caused by wrong or no data from the sensor system which tracks the progress of a surgical intervention were tested. The individual surgical process models (iSPMs) from 100 different cataract procedures of 3 ophthalmologic surgeons were used to select a randomized subset and create a generalized surgical process model (gSPM). A disjoint subset was selected from the iSPMs and used to simulate the surgical process against the gSPM. The loss of sensor data was simulated by removing some information from one task in the iSPM. The effect of missing sensor data was measured using several metrics: (a) successful relocation of the path in the gSPM, (b) the number of steps to find the converging point, and (c) the perspective with the highest occurrence of unsuccessful path findings. A gSPM built using 30% of the iSPMs successfully found the correct path in 90% of the cases. The most critical sensor data were the information regarding the instrument used by the surgeon. We found that use of a gSPM to provide input data for a SWFMS is robust and can be accurate despite missing sensor data. A surgical workflow management system can provide the surgeon with workflow guidance in the OR for most cases. Sensor systems for surgical process tracking can be evaluated based on the stability and accuracy of functional and spatial operative results.
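    The study simulates sensor dropout by removing information from one task of an individual process model and then checking whether the path can still be relocated in the generalized model. The sketch below reproduces only the flavour of that experiment on toy task sequences; the sequences and the alignment rule (the remaining observations must appear, in order, in the reference) are invented for illustration and are not the authors' metrics.

      # Toy simulation of losing one sensor observation from a surgical task
      # sequence and testing whether the remaining observations can still be
      # located, in order, in a generalized reference sequence. The sequences
      # and alignment rule are invented for this sketch.
      import random

      GENERALIZED = ["incision", "phaco", "irrigation", "lens_insert", "suture", "closure"]

      INDIVIDUALS = [
          ["incision", "phaco", "irrigation", "lens_insert", "closure"],
          ["incision", "irrigation", "phaco", "lens_insert", "closure"],  # order deviates
          ["incision", "phaco", "lens_insert", "suture", "closure"],
      ]

      def is_ordered_subsequence(observed: list[str], reference: list[str]) -> bool:
          it = iter(reference)
          return all(task in it for task in observed)

      def simulate(rng: random.Random, trials: int = 1000) -> float:
          successes = 0
          for _ in range(trials):
              seq = list(rng.choice(INDIVIDUALS))
              del seq[rng.randrange(len(seq))]          # lose one sensor observation
              successes += is_ordered_subsequence(seq, GENERALIZED)
          return successes / trials

      if __name__ == "__main__":
          print(f"path relocated in {100 * simulate(random.Random(1)):.1f}% of trials")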

  8. Talkoot Portals: Discover, Tag, Share, and Reuse Collaborative Science Workflows

    NASA Astrophysics Data System (ADS)

    Wilson, B. D.; Ramachandran, R.; Lynnes, C.

    2009-05-01

    A small but growing number of scientists are beginning to harness Web 2.0 technologies, such as wikis, blogs, and social tagging, as a transformative way of doing science. These technologies provide researchers easy mechanisms to critique, suggest and share ideas, data and algorithms. At the same time, large suites of algorithms for science analysis are being made available as remotely-invokable Web Services, which can be chained together to create analysis workflows. This provides the research community an unprecedented opportunity to collaborate by sharing their workflows with one another, reproducing and analyzing research results, and leveraging colleagues' expertise to expedite the process of scientific discovery. However, wikis and similar technologies are limited to text, static images and hyperlinks, providing little support for collaborative data analysis. A team of information technology and Earth science researchers from multiple institutions have come together to improve community collaboration in science analysis by developing a customizable "software appliance" to build collaborative portals for Earth Science services and analysis workflows. The critical requirement is that researchers (not just information technologists) be able to build collaborative sites around service workflows within a few hours. We envision online communities coming together, much like Finnish "talkoot" (a barn raising), to build a shared research space. Talkoot extends a freely available, open source content management framework with a series of modules specific to Earth Science for registering, creating, managing, discovering, tagging and sharing Earth Science web services and workflows for science data processing, analysis and visualization. Users will be able to author a "science story" in shareable web notebooks, including plots or animations, backed up by an executable workflow that directly reproduces the science analysis. New services and workflows of interest will be

  9. Development of HEATHER for cochlear implant stimulation using a new modeling workflow.

    PubMed

    Tran, Phillip; Sue, Andrian; Wong, Paul; Li, Qing; Carter, Paul

    2015-02-01

    The current conduction pathways resulting from monopolar stimulation of the cochlear implant were studied by developing a human electroanatomical total head reconstruction (namely, HEATHER). HEATHER was created from serially sectioned images of the female Visible Human Project dataset to encompass a total of 12 different tissues, and included computer-aided design geometries of the cochlear implant. Since existing methods were unable to generate the required complexity for HEATHER, a new modeling workflow was proposed. The results of the finite-element analysis agree with the literature, showing that the injected current exits the cochlea via the modiolus (14%), the basal end of the cochlea (22%), and through the cochlear walls (64%). It was also found that, once leaving the cochlea, the current travels to the implant body via the cranial cavity or scalp. The modeling workflow proved to be robust and flexible, allowing for meshes to be generated with substantial user control. Furthermore, the workflow could easily be employed to create realistic anatomical models of the human head for different bioelectric applications, such as deep brain stimulation, electroencephalography, and other biophysical phenomena.

  10. Missing Value Monitoring Enhances the Robustness in Proteomics Quantitation.

    PubMed

    Matafora, Vittoria; Corno, Andrea; Ciliberto, Andrea; Bachi, Angela

    2017-04-07

    In global proteomic analysis, it is estimated that proteins span from millions to less than 100 copies per cell. The challenge of protein quantitation by classic shotgun proteomic techniques lies in the presence of missing values for peptides belonging to low-abundance proteins, which lowers intra-run reproducibility and affects downstream statistical analysis. Here, we present a new analytical workflow, MvM (missing value monitoring), able to recover quantitation of missing values generated by shotgun analysis. In particular, we used confident data-dependent acquisition (DDA) quantitation only for proteins measured in all the runs, while we filled the missing values with data-independent acquisition analysis using the library previously generated in DDA. We analyzed cell cycle regulated proteins, as they are low abundance proteins with highly dynamic expression levels. Indeed, we found that cell cycle related proteins are the major components of the missing values-rich proteome. Using the MvM workflow, we doubled the number of robustly quantified cell cycle related proteins, and we reduced the number of missing values, achieving robust quantitation for proteins over ∼50 molecules per cell. MvM achieves lower quantification variance among replicates for low abundance proteins with respect to DDA analysis, which demonstrates the potential of this novel workflow to measure low abundance, dynamically regulated proteins.
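    The MvM idea summarized above is to trust DDA quantitation only for proteins observed in every run and to back-fill the remaining gaps from a matched DIA analysis. The pandas sketch below shows that selection-and-fill step on a toy table; the column layout and the notion of a matched DIA table are assumptions for illustration, not the authors' implementation.

      # Toy version of the fill step described for MvM: keep DDA values for
      # proteins quantified in every run, and fill the remaining missing values
      # from a matched DIA table. Data layout is an assumption for this sketch.
      import numpy as np
      import pandas as pd

      dda = pd.DataFrame(
          {"run1": [1.0e6, np.nan, 2.0e3], "run2": [1.1e6, 4.0e3, np.nan]},
          index=["ACTB", "CDC20", "SGO1"],
      )
      dia = pd.DataFrame(
          {"run1": [0.9e6, 3.5e3, 1.8e3], "run2": [1.0e6, 4.2e3, 2.1e3]},
          index=dda.index,
      )

      def mvm_fill(dda_table: pd.DataFrame, dia_table: pd.DataFrame) -> pd.DataFrame:
          """Proteins complete in DDA keep their DDA values; gaps elsewhere are
          filled with the corresponding DIA measurements."""
          filled = dda_table.copy()
          incomplete = filled.isna().any(axis=1)
          filled.loc[incomplete] = filled.loc[incomplete].fillna(dia_table.loc[incomplete])
          return filled

      if __name__ == "__main__":
          print(mvm_fill(dda, dia))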

  11. Rethinking Clinical Workflow.

    PubMed

    Schlesinger, Joseph J; Burdick, Kendall; Baum, Sarah; Bellomy, Melissa; Mueller, Dorothee; MacDonald, Alistair; Chern, Alex; Chrouser, Kristin; Burger, Christie

    2018-03-01

    The concept of clinical workflow borrows from management and leadership principles outside of medicine. The only way to rethink clinical workflow is to understand the neuroscience principles that underlie attention and vigilance. With any implementation to improve practice, there are human factors that can promote or impede progress. Modulating the environment and working as a team to take care of patients is paramount. Clinicians must continually rethink clinical workflow, evaluate progress, and understand that other industries have something to offer. Then, novel approaches can be implemented to take the best care of patients. Copyright © 2017 Elsevier Inc. All rights reserved.

  12. Facilitating hydrological data analysis workflows in R: the RHydro package

    NASA Astrophysics Data System (ADS)

    Buytaert, Wouter; Moulds, Simon; Skoien, Jon; Pebesma, Edzer; Reusser, Dominik

    2015-04-01

    The advent of new technologies such as web-services and big data analytics holds great promise for hydrological data analysis and simulation. Driven by the need for better water management tools, it allows for the construction of much more complex workflows, that integrate more and potentially more heterogeneous data sources with longer tool chains of algorithms and models. With the scientific challenge of designing the most adequate processing workflow comes the technical challenge of implementing the workflow with a minimal risk for errors. A wide variety of new workbench technologies and other data handling systems are being developed. At the same time, the functionality of available data processing languages such as R and Python is increasing at an accelerating pace. Because of the large diversity of scientific questions and simulation needs in hydrology, it is unlikely that one single optimal method for constructing hydrological data analysis workflows will emerge. Nevertheless, languages such as R and Python are quickly gaining popularity because they combine a wide array of functionality with high flexibility and versatility. The object-oriented nature of high-level data processing languages makes them particularly suited for the handling of complex and potentially large datasets. In this paper, we explore how handling and processing of hydrological data in R can be facilitated further by designing and implementing a set of relevant classes and methods in the experimental R package RHydro. We build upon existing efforts such as the sp and raster packages for spatial data and the spacetime package for spatiotemporal data to define classes for hydrological data (HydroST). In order to handle simulation data from hydrological models conveniently, a HM class is defined. Relevant methods are implemented to allow for an optimal integration of the HM class with existing model fitting and simulation functionality in R. Lastly, we discuss some of the design challenges

  13. Metadata Management on the SCEC PetaSHA Project: Helping Users Describe, Discover, Understand, and Use Simulation Data in a Large-Scale Scientific Collaboration

    NASA Astrophysics Data System (ADS)

    Okaya, D.; Deelman, E.; Maechling, P.; Wong-Barnum, M.; Jordan, T. H.; Meyers, D.

    2007-12-01

    Large scientific collaborations, such as the SCEC Petascale Cyberfacility for Physics-based Seismic Hazard Analysis (PetaSHA) Project, involve interactions between many scientists who exchange ideas and research results. These groups must organize, manage, and make accessible their community materials of observational data, derivative (research) results, computational products, and community software. The integration of scientific workflows as a paradigm to solve complex computations provides advantages of efficiency, reliability, repeatability, choices, and ease of use. The underlying resource needed for a scientific workflow to function and create discoverable and exchangeable products is the construction, tracking, and preservation of metadata. In the scientific workflow environment there is a two-tier structure of metadata. Workflow-level metadata and provenance describe operational steps, identity of resources, execution status, and product locations and names. Domain-level metadata essentially define the scientific meaning of data, codes and products. To a large degree the metadata at these two levels are separate. However, between these two levels is a subset of metadata produced at one level but is needed by the other. This crossover metadata suggests that some commonality in metadata handling is needed. SCEC researchers are collaborating with computer scientists at SDSC, the USC Information Sciences Institute, and Carnegie Mellon Univ. in order to perform earthquake science using high-performance computational resources. A primary objective of the "PetaSHA" collaboration is to perform physics-based estimations of strong ground motion associated with real and hypothetical earthquakes located within Southern California. Construction of 3D earth models, earthquake representations, and numerical simulation of seismic waves are key components of these estimations. Scientific workflows are used to orchestrate the sequences of scientific tasks and to access

  14. Grid workflow validation using ontology-based tacit knowledge: A case study for quantitative remote sensing applications

    NASA Astrophysics Data System (ADS)

    Liu, Jia; Liu, Longli; Xue, Yong; Dong, Jing; Hu, Yingcui; Hill, Richard; Guang, Jie; Li, Chi

    2017-01-01

    Workflow for remote sensing quantitative retrieval is the "bridge" between Grid services and Grid-enabled application of remote sensing quantitative retrieval. Workflow abstracts away low-level implementation details of the Grid and hence enables users to focus on higher levels of application. The workflow for remote sensing quantitative retrieval plays an important role in remote sensing Grid and Cloud computing services, which can support the modelling, construction and implementation of large-scale complicated applications of remote sensing science. The validation of workflow is important in order to support the large-scale sophisticated scientific computation processes with enhanced performance and to minimize potential waste of time and resources. To research the semantic correctness of user-defined workflows, in this paper, we propose a workflow validation method based on tacit knowledge research in the remote sensing domain. We first discuss the remote sensing model and metadata. Through detailed analysis, we then discuss the method of extracting the domain tacit knowledge and expressing the knowledge with ontology. Additionally, we construct the domain ontology with Protégé. Through our experimental study, we verify the validity of this method in two ways, namely data source consistency error validation and parameter-matching error validation.
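    The validation method checks user-composed workflows against domain knowledge captured in an ontology, for example that a step's input data source and parameters match what the retrieval model expects. The dictionary-based check below is only a cartoon of that idea; the "ontology" here is a hand-written lookup table, whereas the paper builds a real domain ontology in Protégé.

      # Cartoon of ontology-backed workflow validation: each step's declared
      # input type and parameters are checked against domain constraints.
      # The constraint table is hand-written here; the paper uses a domain
      # ontology built with Protégé instead.

      CONSTRAINTS = {
          "aerosol_retrieval": {"input_type": "TOA_radiance",
                                "required_params": {"wavelength_nm", "solar_zenith"}},
          "ndvi": {"input_type": "surface_reflectance",
                   "required_params": {"red_band", "nir_band"}},
      }

      def validate_step(step: dict) -> list[str]:
          """Return a list of validation errors for one workflow step."""
          errors = []
          rules = CONSTRAINTS.get(step["model"])
          if rules is None:
              return [f"unknown model: {step['model']}"]
          if step["input_type"] != rules["input_type"]:
              errors.append(f"{step['model']}: expects {rules['input_type']}, "
                            f"got {step['input_type']}")
          missing = rules["required_params"] - set(step.get("params", {}))
          if missing:
              errors.append(f"{step['model']}: missing parameters {sorted(missing)}")
          return errors

      if __name__ == "__main__":
          step = {"model": "ndvi", "input_type": "TOA_radiance", "params": {"red_band": 3}}
          for err in validate_step(step):
              print("ERROR:", err)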

  15. From Peer-Reviewed to Peer-Reproduced in Scholarly Publishing: The Complementary Roles of Data Models and Workflows in Bioinformatics

    PubMed Central

    Zhao, Jun; Avila-Garcia, Maria Susana; Roos, Marco; Thompson, Mark; van der Horst, Eelke; Kaliyaperumal, Rajaram; Luo, Ruibang; Lee, Tin-Lap; Lam, Tak-wah; Edmunds, Scott C.; Sansone, Susanna-Assunta

    2015-01-01

    Motivation Reproducing the results from a scientific paper can be challenging due to the absence of data and the computational tools required for their analysis. In addition, details relating to the procedures used to obtain the published results can be difficult to discern due to the use of natural language when reporting how experiments have been performed. The Investigation/Study/Assay (ISA), Nanopublications (NP), and Research Objects (RO) models are conceptual data modelling frameworks that can structure such information from scientific papers. Computational workflow platforms can also be used to reproduce analyses of data in a principled manner. We assessed the extent by which ISA, NP, and RO models, together with the Galaxy workflow system, can capture the experimental processes and reproduce the findings of a previously published paper reporting on the development of SOAPdenovo2, a de novo genome assembler. Results Executable workflows were developed using Galaxy, which reproduced results that were consistent with the published findings. A structured representation of the information in the SOAPdenovo2 paper was produced by combining the use of ISA, NP, and RO models. By structuring the information in the published paper using these data and scientific workflow modelling frameworks, it was possible to explicitly declare elements of experimental design, variables, and findings. The models served as guides in the curation of scientific information and this led to the identification of inconsistencies in the original published paper, thereby allowing its authors to publish corrections in the form of an errata. Availability SOAPdenovo2 scripts, data, and results are available through the GigaScience Database: http://dx.doi.org/10.5524/100044; the workflows are available from GigaGalaxy: http://galaxy.cbiit.cuhk.edu.hk; and the representations using the ISA, NP, and RO models are available through the SOAPdenovo2 case study website http

  16. DEWEY: the DICOM-enabled workflow engine system.

    PubMed

    Erickson, Bradley J; Langer, Steve G; Blezek, Daniel J; Ryan, William J; French, Todd L

    2014-06-01

    Workflow is a widely used term to describe the sequence of steps to accomplish a task. The use of workflow technology in medicine and medical imaging in particular is limited. In this article, we describe the application of a workflow engine to improve workflow in a radiology department. We implemented a DICOM-enabled workflow engine system in our department. We designed it in a way to allow for scalability, reliability, and flexibility. We implemented several workflows, including one that replaced an existing manual workflow and measured the number of examinations prepared in time without and with the workflow system. The system significantly increased the number of examinations prepared in time for clinical review compared to human effort. It also met the design goals defined at its outset. Workflow engines appear to have value as ways to efficiently assure that complex workflows are completed in a timely fashion.

  17. Classical workflow nets and workflow nets with reset arcs: using Lyapunov stability for soundness verification

    NASA Astrophysics Data System (ADS)

    Clempner, Julio B.

    2017-01-01

    This paper presents a novel analytical method for soundness verification of workflow nets and reset workflow nets, using the well-known stability results of Lyapunov for Petri nets. We also prove that the soundness property is decidable for workflow nets and reset workflow nets. In addition, we provide evidence of several outcomes related to properties such as boundedness, liveness, reversibility and blocking, using stability. Our approach is validated theoretically and by a numerical example related to traffic signal-control synchronisation.
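
    The soundness notion analysed in this record can be illustrated with a brute-force reachability check. The sketch below is emphatically not the paper's Lyapunov-stability method; it enumerates the markings of a tiny, safe workflow net (an invented three-task net) and tests the usual soundness criteria: every reachable marking can still reach the final marking, and no transition is dead.

```python
# A minimal explicit-state soundness check for a small, safe workflow net.
# Brute-force reachability only works for bounded toy examples; the paper's
# Lyapunov-based analysis is not implemented here.
from collections import deque

# Net structure: transition name -> (places consumed, places produced)
TRANSITIONS = {
    "register": ({"i"}, {"p1"}),
    "approve":  ({"p1"}, {"p2"}),
    "archive":  ({"p2"}, {"o"}),
}
INITIAL = frozenset({"i"})   # one token in the source place
FINAL = frozenset({"o"})     # one token in the sink place, nothing elsewhere

def successors(marking):
    for name, (pre, post) in TRANSITIONS.items():
        if pre <= marking:
            yield name, frozenset((marking - pre) | post)

def reaches(start, target):
    seen, queue = {start}, deque([start])
    while queue:
        m = queue.popleft()
        if m == target:
            return True
        for _, nxt in successors(m):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

def is_sound(initial=INITIAL, final=FINAL, limit=10_000):
    seen, queue, fired = {initial}, deque([initial]), set()
    while queue and len(seen) < limit:
        m = queue.popleft()
        for name, nxt in successors(m):
            fired.add(name)
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    proper_completion = all(reaches(m, final) for m in seen)
    no_dead_transitions = fired == set(TRANSITIONS)
    return proper_completion and no_dead_transitions

if __name__ == "__main__":
    print(is_sound())  # True for this linear three-task net
```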

  18. Talkoot Portals: Discover, Tag, Share, and Reuse Collaborative Science Workflows (Invited)

    NASA Astrophysics Data System (ADS)

    Wilson, B. D.; Ramachandran, R.; Lynnes, C.

    2009-12-01

    A small but growing number of scientists are beginning to harness Web 2.0 technologies, such as wikis, blogs, and social tagging, as a transformative way of doing science. These technologies provide researchers easy mechanisms to critique, suggest and share ideas, data and algorithms. At the same time, large suites of algorithms for science analysis are being made available as remotely-invokable Web Services, which can be chained together to create analysis workflows. This provides the research community an unprecedented opportunity to collaborate by sharing their workflows with one another, reproducing and analyzing research results, and leveraging colleagues’ expertise to expedite the process of scientific discovery. However, wikis and similar technologies are limited to text, static images and hyperlinks, providing little support for collaborative data analysis. A team of information technology and Earth science researchers from multiple institutions have come together to improve community collaboration in science analysis by developing a customizable “software appliance” to build collaborative portals for Earth Science services and analysis workflows. The critical requirement is that researchers (not just information technologists) be able to build collaborative sites around service workflows within a few hours. We envision online communities coming together, much like Finnish “talkoot” (a barn raising), to build a shared research space. Talkoot extends a freely available, open source content management framework with a series of modules specific to Earth Science for registering, creating, managing, discovering, tagging and sharing Earth Science web services and workflows for science data processing, analysis and visualization. Users will be able to author a “science story” in shareable web notebooks, including plots or animations, backed up by an executable workflow that directly reproduces the science analysis. New services and workflows of

  19. Low Latency Workflow Scheduling and an Application of Hyperspectral Brightness Temperatures

    NASA Astrophysics Data System (ADS)

    Nguyen, P. T.; Chapman, D. R.; Halem, M.

    2012-12-01

    New system analytics for Big Data computing holds the promise of major scientific breakthroughs and discoveries from the exploration and mining of the massive data sets becoming available to the science community. However, such data intensive scientific applications face severe challenges in accessing, managing and analyzing petabytes of data. While the Hadoop MapReduce environment has been successfully applied to data intensive problems arising in business, there are still many scientific problem domains where limitations in the functionality of MapReduce systems prevent its wide adoption by those communities. This is mainly because MapReduce does not readily support unique science discipline needs such as special science data formats, graphical and computational data analysis tools, maintaining high degrees of computational accuracy, and interfacing with an application's existing components across heterogeneous computing processors. We address some of these limitations by exploiting the MapReduce programming model for satellite data intensive scientific problems and address scalability, reliability, scheduling, and data management issues when dealing with climate data records and their complex observational challenges. In addition, we will present techniques to support unique Earth science discipline needs such as dealing with special science data formats (HDF and NetCDF). We have developed a Hadoop task scheduling algorithm that improves latency by 2x for a scientific workflow including the gridding of the EOS AIRS hyperspectral Brightness Temperatures (BT). This workflow processing algorithm has been tested on the Multicore Computing Center's private Hadoop-based Intel Nehalem cluster, as well as in a virtual mode under the open-source Eucalyptus cloud. The 55TB AIRS hyperspectral L1b Brightness Temperature record has been gridded at a resolution of 0.5x1.0 degrees, and we have computed a 0.9 annual anti-correlation to the El Nino Southern oscillation in
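
    The gridding step mentioned above maps naturally onto the MapReduce pattern. The following pure-Python sketch stands in for a real Hadoop job over HDF/NetCDF granules: the map phase bins each (lat, lon, brightness temperature) observation into a 0.5 x 1.0 degree cell, and the reduce phase averages each cell. The observation tuples and the in-memory "shuffle" are illustrative only.

```python
# A minimal, pure-Python sketch of the map/reduce pattern behind gridding
# swath brightness temperatures onto a 0.5 deg (lat) x 1.0 deg (lon) grid.
from collections import defaultdict

def map_phase(observations):
    """Emit (grid_cell, brightness_temperature) pairs."""
    for lat, lon, bt in observations:
        cell = (int(lat // 0.5), int(lon // 1.0))
        yield cell, bt

def reduce_phase(pairs):
    """Average all observations falling in the same cell."""
    buckets = defaultdict(list)
    for cell, bt in pairs:          # stands in for the framework's shuffle/sort
        buckets[cell].append(bt)
    return {cell: sum(v) / len(v) for cell, v in buckets.items()}

if __name__ == "__main__":
    obs = [(10.2, 120.7, 285.3), (10.4, 120.1, 287.1), (-33.0, 151.2, 290.8)]
    print(reduce_phase(map_phase(obs)))
```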

  20. PyDBS: an automated image processing workflow for deep brain stimulation surgery.

    PubMed

    D'Albis, Tiziano; Haegelen, Claire; Essert, Caroline; Fernández-Vidal, Sara; Lalys, Florent; Jannin, Pierre

    2015-02-01

    Deep brain stimulation (DBS) is a surgical procedure for treating motor-related neurological disorders. DBS clinical efficacy hinges on precise surgical planning and accurate electrode placement, which in turn call upon several image processing and visualization tasks, such as image registration, image segmentation, image fusion, and 3D visualization. These tasks are often performed by a heterogeneous set of software tools, which adopt differing formats and geometrical conventions and require patient-specific parameterization or interactive tuning. To overcome these issues, we introduce in this article PyDBS, a fully integrated and automated image processing workflow for DBS surgery. PyDBS consists of three image processing pipelines and three visualization modules assisting clinicians through the entire DBS surgical workflow, from the preoperative planning of electrode trajectories to the postoperative assessment of electrode placement. The system's robustness, speed, and accuracy were assessed by means of a retrospective validation, based on 92 clinical cases. The complete PyDBS workflow achieved satisfactory results in 92 % of tested cases, with a median processing time of 28 min per patient. The results obtained are compatible with the adoption of PyDBS in clinical practice.

  1. Coupling between a multi-physics workflow engine and an optimization framework

    NASA Astrophysics Data System (ADS)

    Di Gallo, L.; Reux, C.; Imbeaux, F.; Artaud, J.-F.; Owsiak, M.; Saoutic, B.; Aiello, G.; Bernardi, P.; Ciraolo, G.; Bucalossi, J.; Duchateau, J.-L.; Fausser, C.; Galassi, D.; Hertout, P.; Jaboulay, J.-C.; Li-Puma, A.; Zani, L.

    2016-03-01

    A generic coupling method between a multi-physics workflow engine and an optimization framework is presented in this paper. The coupling architecture has been developed in order to preserve the integrity of the two frameworks. The objective is to provide the possibility to replace a framework, a workflow or an optimizer by another one without changing the whole coupling procedure or modifying the main content in each framework. The coupling is achieved by using a socket-based communication library for exchanging data between the two frameworks. Among a number of algorithms provided by optimization frameworks, Genetic Algorithms (GAs) have demonstrated their efficiency on single- and multiple-criteria optimization. In addition to their robustness, GAs can handle non-valid data which may appear during the optimization. Consequently, GAs work in the most general cases. A parallelized framework has been developed to reduce the time spent on optimizations and the evaluation of large samples. A test has shown a good scaling efficiency of this parallelized framework. This coupling method has been applied to the case of SYCOMORE (SYstem COde for MOdeling tokamak REactor), a system code developed in the form of a modular workflow for designing magnetic fusion reactors. The coupling of SYCOMORE with the optimization platform URANIE enables design optimization along various figures of merit and constraints.
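
    The socket-based exchange described here can be sketched in a few lines. The snippet below is a toy stand-in, not the SYCOMORE/URANIE coupling: a "workflow engine" thread evaluates candidate parameters received as JSON lines over a local socket and returns either a figure of merit or an "invalid" status (mirroring the non-valid candidates a GA must tolerate). The parameter name, port number, and cost function are invented.

```python
# A minimal sketch of socket-based coupling between an optimizer process and
# a workflow-engine process. The one-line "engine" below stands in for a full
# multi-physics workflow run; all names and values are illustrative.
import json
import socket
import threading
import time

HOST, PORT = "127.0.0.1", 50551  # illustrative address

def workflow_engine_server():
    with socket.create_server((HOST, PORT)) as srv:
        conn, _ = srv.accept()
        with conn, conn.makefile("rw") as stream:
            for line in stream:
                params = json.loads(line)
                if params["major_radius"] <= 0:          # non-valid candidate
                    reply = {"status": "invalid"}
                else:                                    # toy figure of merit
                    reply = {"status": "ok",
                             "cost": (params["major_radius"] - 9.0) ** 2}
                stream.write(json.dumps(reply) + "\n")
                stream.flush()

def optimizer_client(candidates):
    results = []
    with socket.create_connection((HOST, PORT)) as sock, sock.makefile("rw") as stream:
        for c in candidates:
            stream.write(json.dumps(c) + "\n")
            stream.flush()
            results.append(json.loads(stream.readline()))
    return results

if __name__ == "__main__":
    threading.Thread(target=workflow_engine_server, daemon=True).start()
    time.sleep(0.3)  # give the server thread time to start listening
    print(optimizer_client([{"major_radius": 8.5}, {"major_radius": -1.0}]))
```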

  2. Nurturing reliable and robust open-source scientific software

    NASA Astrophysics Data System (ADS)

    Uieda, L.; Wessel, P.

    2017-12-01

    Scientific results are increasingly the product of software. The reproducibility and validity of published results cannot be ensured without access to the source code of the software used to produce them. Therefore, the code itself is a fundamental part of the methodology and must be published along with the results. With such a reliance on software, it is troubling that most scientists do not receive formal training in software development. Tools such as version control, continuous integration, and automated testing are routinely used in industry to ensure the correctness and robustness of software. However, many scientists do not even know of their existence (although efforts like Software Carpentry are having an impact on this issue; software-carpentry.org). Publishing the source code is only the first step in creating an open-source project. For a project to grow it must provide documentation, participation guidelines, and a welcoming environment for new contributors. Expanding the project community is often more challenging than the technical aspects of software development. Maintainers must invest time to enforce the rules of the project and to onboard new members, which can be difficult to justify in the context of the "publish or perish" mentality. This problem will continue as long as software contributions are not recognized as valid scholarship by hiring and tenure committees. Furthermore, there are still unsolved problems in providing attribution for software contributions. Many journals and metrics of academic productivity do not recognize citations to sources other than traditional publications. Thus, some authors choose to publish an article about the software and use it as a citation marker. One issue with this approach is that updating the reference to include new contributors involves writing and publishing a new article. A better approach would be to cite a permanent archive of individual versions of the source code in services such as Zenodo
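
    As one concrete flavour of the automated testing mentioned above, the sketch below shows the kind of small regression test that a continuous-integration service would run on every commit (assuming pytest is available); the rms function and the checked values are illustrative, not taken from any particular project.

```python
# A minimal sketch of an automated test suite for a numerical routine.
# Run with `pytest` so it executes on every commit under CI.
import math
import pytest

def rms(values):
    """Root-mean-square of a sequence of numbers."""
    if not values:
        raise ValueError("rms() requires at least one value")
    return math.sqrt(sum(v * v for v in values) / len(values))

def test_rms_known_value():
    # (3^2 + 4^2) / 2 = 12.5, so the RMS is sqrt(12.5).
    assert rms([3.0, 4.0]) == pytest.approx(math.sqrt(12.5))

def test_rms_rejects_empty_input():
    with pytest.raises(ValueError):
        rms([])
```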

  3. Building a Robust 21st Century Chemical Testing Program at the U.S. Environmental Protection Agency: Recommendations for Strengthening Scientific Engagement

    PubMed Central

    Dantzker, Heather C.; Portier, Christopher J.

    2014-01-01

    Background: Biological pathway-based chemical testing approaches are central to the National Research Council’s vision for 21st century toxicity testing. Approaches such as high-throughput in vitro screening offer the potential to evaluate thousands of chemicals faster and cheaper than ever before and to reduce testing on laboratory animals. Collaborative scientific engagement is important in addressing scientific issues arising in new federal chemical testing programs and for achieving stakeholder support of their use. Objectives: We present two recommendations specifically focused on increasing scientific engagement in the U.S. Environmental Protection Agency (EPA) ToxCast™ initiative. Through these recommendations we seek to bolster the scientific foundation of federal chemical testing efforts such as ToxCast™ and the public health decisions that rely upon them. Discussion: Environmental Defense Fund works across disciplines and with diverse groups to improve the science underlying environmental health decisions. We propose that the U.S. EPA can strengthen the scientific foundation of its new chemical testing efforts and increase support for them in the scientific research community by a) expanding and diversifying scientific input into the development and application of new chemical testing methods through collaborative workshops, and b) seeking out mutually beneficial research partnerships. Conclusions: Our recommendations provide concrete actions for the U.S. EPA to increase and diversify engagement with the scientific research community in its ToxCast™ initiative. We believe that such engagement will help ensure that new chemical testing data are scientifically robust and that the U.S. EPA gains the support and acceptance needed to sustain new testing efforts to protect public health. Citation: McPartland J, Dantzker HC, Portier CJ. 2015. Building a robust 21st century chemical testing program at the U.S. Environmental Protection Agency

  4. Open innovation: Towards sharing of data, models and workflows.

    PubMed

    Conrado, Daniela J; Karlsson, Mats O; Romero, Klaus; Sarr, Céline; Wilkins, Justin J

    2017-11-15

    Sharing of resources across organisations to support open innovation is an old idea, but one that is being taken up by the scientific community at increasing speed, particularly where public sharing is concerned. The ability to address new questions or provide more precise answers to old questions through merged information is among the attractive features of sharing. Increased efficiency through reuse, and increased reliability of scientific findings through enhanced transparency, are expected outcomes from sharing. In the field of pharmacometrics, efforts to publicly share data, models and workflows have recently started. Sharing of individual-level longitudinal data for modelling requires solving legal, ethical and proprietary issues similar to many other fields, but there are also pharmacometric-specific aspects regarding data formats, exchange standards, and database properties. Several organisations (CDISC, C-Path, IMI, ISoP) are working to solve these issues and propose standards. There are also a number of initiatives aimed at collecting disease-specific databases - Alzheimer's Disease (ADNI, CAMD), malaria (WWARN), oncology (PDS), Parkinson's Disease (PPMI), tuberculosis (CPTR, TB-PACTS, ReSeqTB) - suitable for drug-disease modelling. Organized sharing of pharmacometric executable model code and associated information has in the past been sparse, but a model repository (DDMoRe Model Repository) intended for the purpose has recently been launched. In addition, several other services can facilitate model sharing more generally. Pharmacometric workflows have matured over the last decades and initiatives to more fully capture those applied to analyses are ongoing. In order to maximize both the impact of pharmacometrics and the knowledge extracted from clinical data, the scientific community needs to take ownership of and create opportunities for open innovation. Copyright © 2017 Elsevier B.V. All rights reserved.

  5. Nanocuration workflows: Establishing best practices for identifying, inputting, and sharing data to inform decisions on nanomaterials

    PubMed Central

    Powers, Christina M; Mills, Karmann A; Morris, Stephanie A; Klaessig, Fred; Gaheen, Sharon; Lewinski, Nastassja

    2015-01-01

    Summary There is a critical opportunity in the field of nanoscience to compare and integrate information across diverse fields of study through informatics (i.e., nanoinformatics). This paper is one in a series of articles on the data curation process in nanoinformatics (nanocuration). Other articles in this series discuss key aspects of nanocuration (temporal metadata, data completeness, database integration), while the focus of this article is on the nanocuration workflow, or the process of identifying, inputting, and reviewing nanomaterial data in a data repository. In particular, the article discusses: 1) the rationale and importance of a defined workflow in nanocuration, 2) the influence of organizational goals or purpose on the workflow, 3) established workflow practices in other fields, 4) current workflow practices in nanocuration, 5) key challenges for workflows in emerging fields like nanomaterials, 6) examples to make these challenges more tangible, and 7) recommendations to address the identified challenges. Throughout the article, there is an emphasis on illustrating key concepts and current practices in the field. Data on current practices in the field are from a group of stakeholders active in nanocuration. In general, the development of workflows for nanocuration is nascent, with few individuals formally trained in data curation or utilizing available nanocuration resources (e.g., ISA-TAB-Nano). Additional emphasis on the potential benefits of cultivating nanomaterial data via nanocuration processes (e.g., capability to analyze data from across research groups) and providing nanocuration resources (e.g., training) will likely prove crucial for the wider application of nanocuration workflows in the scientific community. PMID:26425437

  6. SYRMEP Tomo Project: a graphical user interface for customizing CT reconstruction workflows.

    PubMed

    Brun, Francesco; Massimi, Lorenzo; Fratini, Michela; Dreossi, Diego; Billé, Fulvio; Accardo, Agostino; Pugliese, Roberto; Cedola, Alessia

    2017-01-01

    When considering the acquisition of experimental synchrotron radiation (SR) X-ray CT data, the reconstruction workflow cannot be limited to the essential computational steps of flat fielding and filtered back projection (FBP). More refined image processing is often required, usually to compensate for artifacts and enhance the quality of the reconstructed images. In principle, it would be desirable to optimize the reconstruction workflow at the facility during the experiment (beamtime). However, several practical factors affect the image reconstruction part of the experiment and users are likely to conclude the beamtime with sub-optimal reconstructed images. Through an example of application, this article presents SYRMEP Tomo Project (STP), an open-source software tool conceived to let users design custom CT reconstruction workflows. STP has been designed for post-beamtime (off-line) use and for new reconstructions of past archived data at the user's home institution, where simple computing resources are available. Releases of the software can be downloaded at the Elettra Scientific Computing group GitHub repository https://github.com/ElettraSciComp/STP-Gui.
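
    The two essential steps named in the abstract, flat fielding and filtered back projection, can be sketched with NumPy and scikit-image (assuming both are installed). The synthetic sinogram and the flat and dark frames below are placeholders; a real STP workflow would add further refinements such as ring removal or phase retrieval.

```python
# A minimal sketch of flat-field correction followed by filtered back
# projection on a synthetic sinogram. Array shapes and count levels are
# illustrative only.
import numpy as np
from skimage.transform import iradon

def flat_field(projections, flat, dark):
    """Normalise raw projections: -log((proj - dark) / (flat - dark))."""
    corrected = (projections - dark) / np.clip(flat - dark, 1e-6, None)
    return -np.log(np.clip(corrected, 1e-6, None))   # absorption contrast

if __name__ == "__main__":
    n_pixels, n_angles = 128, 180
    theta = np.linspace(0.0, 180.0, n_angles, endpoint=False)
    raw = np.random.poisson(1000, size=(n_pixels, n_angles)).astype(float)
    flat = np.full((n_pixels, 1), 1100.0)   # flat (beam-only) frame
    dark = np.full((n_pixels, 1), 50.0)     # dark (no-beam) frame

    sinogram = flat_field(raw, flat, dark)
    reconstruction = iradon(sinogram, theta=theta)   # filtered back projection
    print(reconstruction.shape)
```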

  7. The BioExtract Server: a web-based bioinformatic workflow platform

    PubMed Central

    Lushbough, Carol M.; Jennewein, Douglas M.; Brendel, Volker P.

    2011-01-01

    The BioExtract Server (bioextract.org) is an open, web-based system designed to aid researchers in the analysis of genomic data by providing a platform for the creation of bioinformatic workflows. Scientific workflows are created within the system by recording tasks performed by the user. These tasks may include querying multiple, distributed data sources, saving query results as searchable data extracts, and executing local and web-accessible analytic tools. The series of recorded tasks can then be saved as a reproducible, sharable workflow available for subsequent execution with the original or modified inputs and parameter settings. Integrated data resources include interfaces to the National Center for Biotechnology Information (NCBI) nucleotide and protein databases, the European Molecular Biology Laboratory (EMBL-Bank) non-redundant nucleotide database, the Universal Protein Resource (UniProt), and the UniProt Reference Clusters (UniRef) database. The system offers access to numerous preinstalled, curated analytic tools and also provides researchers with the option of selecting computational tools from a large list of web services including the European Molecular Biology Open Software Suite (EMBOSS), BioMoby, and the Kyoto Encyclopedia of Genes and Genomes (KEGG). The system further allows users to integrate local command line tools residing on their own computers through a client-side Java applet. PMID:21546552

  8. An efficient field and laboratory workflow for plant phylotranscriptomic projects1

    PubMed Central

    Yang, Ya; Moore, Michael J.; Brockington, Samuel F.; Timoneda, Alfonso; Feng, Tao; Marx, Hannah E.; Walker, Joseph F.; Smith, Stephen A.

    2017-01-01

    Premise of the study: We describe a field and laboratory workflow developed for plant phylotranscriptomic projects that involves cryogenic tissue collection in the field, RNA extraction and quality control, and library preparation. We also make recommendations for sample curation. Methods and Results: A total of 216 frozen tissue samples of Caryophyllales and other angiosperm taxa were collected from the field or botanical gardens. RNA was extracted, stranded mRNA libraries were prepared, and libraries were sequenced on Illumina HiSeq platforms. These included difficult mucilaginous tissues such as those of Cactaceae and Droseraceae. Conclusions: Our workflow is not only cost effective (ca. $270 per sample, as of August 2016, from tissue to reads) and time efficient (less than 50 h for 10–12 samples including all laboratory work and sample curation), but also has proven robust for extraction of difficult samples such as tissues containing high levels of secondary compounds. PMID:28337391

  9. NeuroManager: a workflow analysis based simulation management engine for computational neuroscience

    PubMed Central

    Stockton, David B.; Santamaria, Fidel

    2015-01-01

    We developed NeuroManager, an object-oriented simulation management software engine for computational neuroscience. NeuroManager automates the workflow of simulation job submissions when using heterogeneous computational resources, simulators, and simulation tasks. The object-oriented approach (1) provides flexibility to adapt to a variety of neuroscience simulators, (2) simplifies the use of heterogeneous computational resources, from desktops to supercomputer clusters, and (3) improves tracking of simulator/simulation evolution. We implemented NeuroManager in MATLAB, a widely used engineering and scientific language, for its signal and image processing tools, prevalence in electrophysiology analysis, and increasing use in college biology education. To design and develop NeuroManager we analyzed the workflow of simulation submission for a variety of simulators, operating systems, and computational resources, including the handling of input parameters, data, models, results, and analyses. This resulted in 22 stages of simulation submission workflow. The software incorporates progress notification, automatic organization, labeling, and time-stamping of data and results, and integrated access to MATLAB's analysis and visualization tools. NeuroManager provides users with the tools to automate daily tasks, and assists principal investigators in tracking and recreating the evolution of research projects performed by multiple people. Overall, NeuroManager provides the infrastructure needed to improve workflow, manage multiple simultaneous simulations, and maintain provenance of the potentially large amounts of data produced during the course of a research project. PMID:26528175

  10. NeuroManager: a workflow analysis based simulation management engine for computational neuroscience.

    PubMed

    Stockton, David B; Santamaria, Fidel

    2015-01-01

    We developed NeuroManager, an object-oriented simulation management software engine for computational neuroscience. NeuroManager automates the workflow of simulation job submissions when using heterogeneous computational resources, simulators, and simulation tasks. The object-oriented approach (1) provides flexibility to adapt to a variety of neuroscience simulators, (2) simplifies the use of heterogeneous computational resources, from desktops to supercomputer clusters, and (3) improves tracking of simulator/simulation evolution. We implemented NeuroManager in MATLAB, a widely used engineering and scientific language, for its signal and image processing tools, prevalence in electrophysiology analysis, and increasing use in college biology education. To design and develop NeuroManager we analyzed the workflow of simulation submission for a variety of simulators, operating systems, and computational resources, including the handling of input parameters, data, models, results, and analyses. This resulted in 22 stages of simulation submission workflow. The software incorporates progress notification, automatic organization, labeling, and time-stamping of data and results, and integrated access to MATLAB's analysis and visualization tools. NeuroManager provides users with the tools to automate daily tasks, and assists principal investigators in tracking and recreating the evolution of research projects performed by multiple people. Overall, NeuroManager provides the infrastructure needed to improve workflow, manage multiple simultaneous simulations, and maintain provenance of the potentially large amounts of data produced during the course of a research project.

  11. Pegasus Workflow Management System: Helping Applications From Earth and Space

    NASA Astrophysics Data System (ADS)

    Mehta, G.; Deelman, E.; Vahi, K.; Silva, F.

    2010-12-01

    Pegasus WMS is a Workflow Management System that can manage large-scale scientific workflows across Grid, local and Cloud resources simultaneously. Pegasus WMS provides a means for representing the workflow of an application in an abstract XML form, agnostic of the resources available to run it and the location of data and executables. It then compiles these workflows into concrete plans by querying catalogs and farming computations across local and distributed computing resources, as well as emerging commercial and community cloud environments, in an easy and reliable manner. Pegasus WMS optimizes the execution as well as data movement by leveraging existing Grid and cloud technologies via a flexible pluggable interface and provides advanced features like reusing existing data, automatic cleanup of generated data, and recursive workflows with deferred planning. It also captures all the provenance of the workflow from the planning stage to the execution of the generated data, helping scientists to accurately measure performance metrics of their workflow as well as data reproducibility issues. Pegasus WMS was initially developed as part of the GriPhyN project to support large-scale high-energy physics and astrophysics experiments. Direct funding from the NSF enabled support for a wide variety of applications from diverse domains including earthquake simulation, bacterial RNA studies, helioseismology and ocean modeling. Earthquake Simulation: Pegasus WMS was recently used in a large-scale production run in 2009 by the Southern California Earthquake Center to run 192 million loosely coupled tasks and about 2000 tightly coupled MPI-style tasks on national cyberinfrastructure to generate a probabilistic seismic hazard map of the Southern California region. SCEC ran 223 workflows over a period of eight weeks, using on average 4,420 cores, with a peak of 14,540 cores. A total of 192 million files were produced totaling about 165TB, out of which 11TB of data was saved
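
    The abstract-to-concrete planning idea is easy to illustrate. The sketch below is not Pegasus's DAX format or planner API; it uses an invented three-task DAG plus toy "transformation" and "replica" catalogs to show how logical task and file names could be bound to executables and physical file locations in topological order.

```python
# A minimal sketch of compiling an abstract workflow into a concrete plan by
# querying catalogs. Catalog contents, paths, and URLs are placeholders.
from graphlib import TopologicalSorter

ABSTRACT_DAG = {                       # task -> tasks it depends on
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
}
TRANSFORMATION_CATALOG = {             # logical task -> executable on the site
    "extract": "/opt/bin/extract",
    "transform": "/opt/bin/transform",
    "load": "/opt/bin/load",
}
REPLICA_CATALOG = {                    # logical file -> physical location
    "raw.dat": "gsiftp://storage.example.org/raw.dat",
}

def plan(dag, inputs):
    """Return an ordered list of concrete commands for one execution site."""
    concrete = []
    for task in TopologicalSorter(dag).static_order():
        exe = TRANSFORMATION_CATALOG[task]
        staged = [REPLICA_CATALOG.get(f, f) for f in inputs.get(task, [])]
        concrete.append(f"{exe} {' '.join(staged)}".strip())
    return concrete

if __name__ == "__main__":
    for cmd in plan(ABSTRACT_DAG, {"extract": ["raw.dat"]}):
        print(cmd)
```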

  12. Integrating visualization and interaction research to improve scientific workflows.

    PubMed

    Keefe, Daniel F

    2010-01-01

    Scientific-visualization research is, nearly by necessity, interdisciplinary. In addition to their collaborators in application domains (for example, cell biology), researchers regularly build on close ties with disciplines related to visualization, such as graphics, human-computer interaction, and cognitive science. One of these ties is the connection between visualization and interaction research. This isn't a new direction for scientific visualization (see the "Early Connections" sidebar). However, momentum recently seems to be increasing toward integrating visualization research (for example, effective visual presentation of data) with interaction research (for example, innovative interactive techniques that facilitate manipulating and exploring data). We see evidence of this trend in several places, including the visualization literature and conferences.

  13. a Standardized Approach to Topographic Data Processing and Workflow Management

    NASA Astrophysics Data System (ADS)

    Wheaton, J. M.; Bailey, P.; Glenn, N. F.; Hensleigh, J.; Hudak, A. T.; Shrestha, R.; Spaete, L.

    2013-12-01

    An ever-increasing list of options exists for collecting high-resolution topographic data, including airborne LIDAR, terrestrial laser scanners, bathymetric SONAR and structure-from-motion. An equally rich, arguably overwhelming, variety of tools exists with which to organize, quality control, filter, analyze and summarize these data. However, scientists are often left to cobble together their analysis as a series of ad hoc steps, often using custom scripts and one-time processes that are poorly documented and rarely shared with the community. Even when literature-cited software tools are used, the input and output parameters differ from tool to tool. These parameters are rarely archived and the steps performed are lost, making the analysis virtually impossible to replicate precisely. What is missing is a coherent, robust framework for combining reliable, well-documented topographic data-processing steps into a workflow that can be repeated and even shared with others. We have taken several popular topographic data processing tools - including point cloud filtering and decimation as well as DEM differencing - and defined a common protocol for passing inputs and outputs between them. This presentation describes a free, public online portal that enables scientists to create custom workflows for processing topographic data using a number of popular topographic processing tools. Users provide the inputs required for each tool and the sequence in which they want to combine them. This information is then stored for future reuse (and optionally sharing with others) before the user downloads a single package that contains all the input and output specifications together with the software tools themselves. The user then launches the included batch file that executes the workflow on their local computer against their topographic data. This ZCloudTools architecture helps standardize, automate and archive topographic data processing. It also represents a forum for discovering and
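
    The core idea, recording each processing tool together with the exact parameters used so the chain can be re-run or shared, can be sketched as follows; the tool names, parameters and JSON layout are invented placeholders rather than the portal's actual schema.

```python
# A minimal sketch of archiving a topographic processing workflow as an
# explicit, replayable description: each step records the tool and the exact
# parameters used, and the whole chain is saved to JSON for reuse and sharing.
import json

workflow = [
    {"tool": "decimate_point_cloud", "params": {"fraction": 0.25}},
    {"tool": "filter_ground_points", "params": {"max_angle_deg": 6.0}},
    {"tool": "dem_difference", "params": {"cell_size_m": 0.5,
                                          "uncertainty_model": "fis"}},
]

def run(step):
    # Stand-in for invoking the real tool; a batch runner would shell out here.
    print(f"running {step['tool']} with {step['params']}")

if __name__ == "__main__":
    with open("workflow.json", "w") as fh:      # archive for replication
        json.dump(workflow, fh, indent=2)
    for step in workflow:
        run(step)
```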

  14. Reproducible Bioconductor workflows using browser-based interactive notebooks and containers.

    PubMed

    Almugbel, Reem; Hung, Ling-Hong; Hu, Jiaming; Almutairy, Abeer; Ortogero, Nicole; Tamta, Yashaswi; Yeung, Ka Yee

    2018-01-01

    Bioinformatics publications typically include complex software workflows that are difficult to describe in a manuscript. We describe and demonstrate the use of interactive software notebooks to document and distribute bioinformatics research. We provide a user-friendly tool, BiocImageBuilder, that allows users to easily distribute their bioinformatics protocols through interactive notebooks uploaded to either a GitHub repository or a private server. We present four different interactive Jupyter notebooks using R and Bioconductor workflows to infer differential gene expression, analyze cross-platform datasets, process RNA-seq data and KinomeScan data. These interactive notebooks are available on GitHub. The analytical results can be viewed in a browser. Most importantly, the software contents can be executed and modified. This is accomplished using Binder, which runs the notebook inside software containers, thus avoiding the need to install any software and ensuring reproducibility. All the notebooks were produced using custom files generated by BiocImageBuilder. BiocImageBuilder facilitates the publication of workflows with a point-and-click user interface. We demonstrate that interactive notebooks can be used to disseminate a wide range of bioinformatics analyses. The use of software containers to mirror the original software environment ensures reproducibility of results. Parameters and code can be dynamically modified, allowing for robust verification of published results and encouraging rapid adoption of new methods. Given the increasing complexity of bioinformatics workflows, we anticipate that these interactive software notebooks will become as necessary for documenting software methods as traditional laboratory notebooks have been for documenting bench protocols, and as ubiquitous. © The Author 2017. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com
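
    The container-building idea behind this approach can be illustrated with a few lines that emit a Dockerfile pinning the analysis environment; the base image tag, install command and package list below are assumptions for illustration, not the files BiocImageBuilder actually generates.

```python
# A minimal sketch of generating a container recipe for a Bioconductor-based
# notebook environment. The base image and packages are illustrative only.
def make_dockerfile(packages, base="bioconductor/bioconductor_docker:RELEASE_3_17"):
    install = ", ".join(f'"{p}"' for p in packages)
    return "\n".join([
        f"FROM {base}",
        f"RUN R -e 'BiocManager::install(c({install}))'",
        'CMD ["R"]',
    ])

if __name__ == "__main__":
    print(make_dockerfile(["limma", "edgeR"]))
```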

  15. Structuring Clinical Workflows for Diabetes Care

    PubMed Central

    Lasierra, N.; Oberbichler, S.; Toma, I.; Fensel, A.; Hoerbst, A.

    2014-01-01

    Summary Background Electronic health records (EHRs) play an important role in the treatment of chronic diseases such as diabetes mellitus. Although the interoperability and selected functionality of EHRs are already addressed by a number of standards and best practices, such as IHE or HL7, the majority of these systems are still monolithic from a user-functionality perspective. The purpose of the OntoHealth project is to foster a functionally flexible, standards-based use of EHRs to support clinical routine task execution by means of workflow patterns and to shift the present EHR usage to a more comprehensive integration concerning complete clinical workflows. Objectives The goal of this paper is, first, to introduce the basic architecture of the proposed OntoHealth project and, second, to present selected functional needs and a functional categorization regarding workflow-based interactions with EHRs in the domain of diabetes. Methods A systematic literature review regarding attributes of workflows in the domain of diabetes was conducted. Eligible references were gathered and analyzed using a qualitative content analysis. Subsequently, a functional workflow categorization was derived from diabetes-specific raw data together with existing general workflow patterns. Results This paper presents the design of the architecture as well as a categorization model which makes it possible to describe the components or building blocks within clinical workflows. The results of our study lead us to identify basic building blocks, named as actions, decisions, and data elements, which allow the composition of clinical workflows within five identified contexts. Conclusions The categorization model allows for a description of the components or building blocks of clinical workflows from a functional view. PMID:25024765
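
    The three building blocks identified in the study, actions, decisions and data elements, can be expressed as small data classes that compose into a clinical workflow. The sketch below is a schematic reading of that categorization with invented diabetes-care steps; it is not the OntoHealth data model.

```python
# A minimal sketch of composing a clinical workflow from actions, decisions,
# and data elements. The concrete steps and threshold are illustrative only.
from dataclasses import dataclass, field

@dataclass
class DataElement:
    name: str
    value: float | None = None

@dataclass
class Action:
    name: str
    produces: list[DataElement] = field(default_factory=list)

@dataclass
class Decision:
    name: str
    condition: str                  # e.g. an expression over data elements
    if_true: Action | None = None
    if_false: Action | None = None

if __name__ == "__main__":
    hba1c = DataElement("HbA1c", 8.2)
    measure = Action("measure HbA1c", produces=[hba1c])
    adjust = Action("adjust therapy")
    recall = Action("schedule routine follow-up")
    decide = Decision("HbA1c above target?", "HbA1c > 7.0", adjust, recall)
    workflow = [measure, decide]
    print([step.name for step in workflow])
```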

  16. Performance Analysis Tool for HPC and Big Data Applications on Scientific Clusters

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yoo, Wucherl; Koo, Michelle; Cao, Yu

    Big data is prevalent in HPC computing. Many HPC projects rely on complex workflows to analyze terabytes or petabytes of data. These workflows often require running over thousands of CPU cores and performing simultaneous data accesses, data movements, and computation. It is challenging to analyze the performance involving terabytes or petabytes of workflow data or measurement data of the executions, from complex workflows over a large number of nodes and multiple parallel task executions. To help identify performance bottlenecks or debug the performance issues in large-scale scientific applications and scientific clusters, we have developed a performance analysis framework using state-of-the-art open-source big data processing tools. Our tool can ingest system logs and application performance measurements to extract key performance features, and apply the most sophisticated statistical tools and data mining methods on the performance data. It utilizes an efficient data processing engine to allow users to interactively analyze a large amount of different types of logs and measurements. To illustrate the functionality of the big data analysis framework, we conduct case studies on the workflows from an astronomy project known as the Palomar Transient Factory (PTF) and the job logs from the genome analysis scientific cluster. Our study processed many terabytes of system logs and application performance measurements collected on the HPC systems at NERSC. The implementation of our tool is generic enough to be used for analyzing the performance of other HPC systems and Big Data workflows.
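
    The ingest-and-extract step can be illustrated in miniature. The pure-Python sketch below parses timing lines from an invented log format and summarises durations and I/O per task; the real framework ingests system logs and application measurements at terabyte scale with big-data tooling rather than a regex loop.

```python
# A minimal sketch of extracting per-task performance features from job logs.
# The log format, task names, and fields are invented for illustration.
import re
from statistics import mean

LOG = """\
2017-05-01T02:10:11 task=gridding duration=128.4 io_mb=512
2017-05-01T02:12:20 task=gridding duration=131.9 io_mb=498
2017-05-01T02:15:02 task=photometry duration=64.2 io_mb=1024
"""
PATTERN = re.compile(r"task=(\S+) duration=([\d.]+) io_mb=([\d.]+)")

def extract_features(log_text):
    per_task = {}
    for task, duration, io_mb in PATTERN.findall(log_text):
        per_task.setdefault(task, []).append((float(duration), float(io_mb)))
    return {task: {"runs": len(rows),
                   "mean_duration_s": mean(d for d, _ in rows),
                   "max_io_mb": max(io for _, io in rows)}
            for task, rows in per_task.items()}

if __name__ == "__main__":
    print(extract_features(LOG))
```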

  17. Scientific Utopia: An agenda for improving scientific communication (Invited)

    NASA Astrophysics Data System (ADS)

    Nosek, B.

    2013-12-01

    The scientist's primary incentive is publication. In the present culture, open practices do not increase chances of publication, and they often require additional work. Practicing the abstract scientific values of openness and reproducibility thus requires behaviors in addition to those relevant for the primary, concrete rewards. When in conflict, concrete rewards are likely to dominate over abstract ones. As a consequence, the reward structure for scientists does not encourage openness and reproducibility. This can be changed by nudging incentives to align scientific practices with scientific values. Science will benefit by creating and connecting technologies that nudge incentives while supporting and improving the scientific workflow. For example, it should be as easy to search the research literature for my topic as it is to search the Internet to find hilarious videos of cats falling off of furniture. I will introduce the Center for Open Science (http://centerforopenscience.org/) and efforts to improve openness and reproducibility such as http://openscienceframework.org/. There will be no cats.

  18. Generic worklist handler for workflow-enabled products

    NASA Astrophysics Data System (ADS)

    Schmidt, Joachim; Meetz, Kirsten; Wendler, Thomas

    1999-07-01

    Workflow management (WfM) is an emerging field of medical information technology. It appears as a promising key technology to model, optimize and automate processes, for the sake of improved efficiency, reduced costs and improved patient care. The application of WfM concepts requires the standardization of architectures and interfaces. A component of central interest proposed in this report is a generic worklist handler: a standardized interface between a workflow enactment service and application systems. Application systems with embedded worklist handlers will be called 'Workflow-Enabled Application Systems'. In this paper we discuss the functional requirements of worklist handlers, as well as their integration into workflow architectures and interfaces. To lay the foundation for this specification, basic workflow terminology, the fundamentals of workflow management and - later in the paper - the available standards as defined by the Workflow Management Coalition are briefly reviewed.

  19. Managing and Communicating Operational Workflow

    PubMed Central

    Weinberg, Stuart T.; Danciu, Ioana; Unertl, Kim M.

    2016-01-01

    Summary Background Healthcare team members in emergency department contexts have used electronic whiteboard solutions to help manage operational workflow for many years. Ambulatory clinic settings have highly complex operational workflow, but are still limited in electronic assistance to communicate and coordinate work activities. Objective To describe and discuss the design, implementation, use, and ongoing evolution of a coordination and collaboration tool supporting ambulatory clinic operational workflow at Vanderbilt University Medical Center (VUMC). Methods The outpatient whiteboard tool was initially designed to support healthcare work related to an electronic chemotherapy order-entry application. After a highly successful initial implementation in an oncology context, a high demand emerged across the organization for the outpatient whiteboard implementation. Over the past 10 years, developers have followed an iterative user-centered design process to evolve the tool. Results The electronic outpatient whiteboard system supports 194 separate whiteboards and is accessed by over 2800 distinct users on a typical day. Clinics can configure their whiteboards to support unique workflow elements. Since initial release, features such as immunization clinical decision support have been integrated into the system, based on requests from end users. Conclusions The success of the electronic outpatient whiteboard demonstrates the usefulness of an operational workflow tool within the ambulatory clinic setting. Operational workflow tools can play a significant role in supporting coordination, collaboration, and teamwork in ambulatory healthcare settings. PMID:27081407

  20. Understanding latent structures of clinical information logistics: A bottom-up approach for model building and validating the workflow composite score.

    PubMed

    Esdar, Moritz; Hübner, Ursula; Liebe, Jan-David; Hüsers, Jens; Thye, Johannes

    2017-01-01

    Clinical information logistics is a construct that aims to describe and explain various phenomena of information provision to drive clinical processes. It can be measured by the workflow composite score, an aggregated indicator of the degree of IT support in clinical processes. This study primarily aimed to investigate the yet unknown empirical patterns constituting this construct. The second goal was to derive a data-driven weighting scheme for the constituents of the workflow composite score and to contrast this scheme with a literature based, top-down procedure. This approach should finally test the validity and robustness of the workflow composite score. Based on secondary data from 183 German hospitals, a tiered factor analytic approach (confirmatory and subsequent exploratory factor analysis) was pursued. A weighting scheme, which was based on factor loadings obtained in the analyses, was put into practice. We were able to identify five statistically significant factors of clinical information logistics that accounted for 63% of the overall variance. These factors were "flow of data and information", "mobility", "clinical decision support and patient safety", "electronic patient record" and "integration and distribution". The system of weights derived from the factor loadings resulted in values for the workflow composite score that differed only slightly from the score values that had been previously published based on a top-down approach. Our findings give insight into the internal composition of clinical information logistics both in terms of factors and weights. They also allowed us to propose a coherent model of clinical information logistics from a technical perspective that joins empirical findings with theoretical knowledge. Despite the new scheme of weights applied to the calculation of the workflow composite score, the score behaved robustly, which is yet another hint of its validity and therefore its usefulness. Copyright © 2016 Elsevier Ireland
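
    The weighting step described above, deriving item weights from factor loadings and aggregating them into the workflow composite score, is sketched below with NumPy; the item names, loadings and hospital scores are illustrative stand-ins for the study's data.

```python
# A minimal sketch of a loading-derived weighting scheme for a composite
# score. Items, loadings, and the example hospital values are invented.
import numpy as np

items = ["data_flow", "mobility", "decision_support", "epr", "integration"]
loadings = np.array([0.81, 0.62, 0.74, 0.77, 0.69])   # from a factor analysis
weights = loadings / loadings.sum()                    # data-driven weights

def workflow_composite_score(item_scores):
    """Weighted aggregate of per-item IT-support scores (each in [0, 1])."""
    return float(np.dot(weights, item_scores))

if __name__ == "__main__":
    hospital = np.array([0.9, 0.4, 0.7, 0.8, 0.6])
    print(round(workflow_composite_score(hospital), 3))
```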

  1. A workflow learning model to improve geovisual analytics utility

    PubMed Central

    Roth, Robert E; MacEachren, Alan M; McCabe, Craig A

    2011-01-01

    concept of scientific workflows. Second, we implemented an interface in the G-EX Portal Learn Module to demonstrate the workflow learning model. The workflow interface allows users to drag learning artifacts uploaded to the G-EX Portal onto a central whiteboard and then annotate the workflow using text and drawing tools. Once completed, users can visit the assembled workflow to get an idea of the kind, number, and scale of analysis steps, view individual learning artifacts associated with each node in the workflow, and ask questions about the overall workflow or individual learning artifacts through the associated forums. An example learning workflow in the domain of epidemiology is provided to demonstrate the effectiveness of the approach. Results/Conclusions In the context of geovisual analytics, GIScientists are not only responsible for developing software to facilitate visually-mediated reasoning about large and complex spatiotemporal information, but also for ensuring that this software works. The workflow learning model discussed in this paper and demonstrated in the G-EX Portal Learn Module is one approach to improving the utility of geovisual analytics software. While development of the G-EX Portal Learn Module is ongoing, we expect to release the G-EX Portal Learn Module by Summer 2009. PMID:21983545

  2. A workflow learning model to improve geovisual analytics utility.

    PubMed

    Roth, Robert E; Maceachren, Alan M; McCabe, Craig A

    2009-01-01

    the concept of scientific workflows. Second, we implemented an interface in the G-EX Portal Learn Module to demonstrate the workflow learning model. The workflow interface allows users to drag learning artifacts uploaded to the G-EX Portal onto a central whiteboard and then annotate the workflow using text and drawing tools. Once completed, users can visit the assembled workflow to get an idea of the kind, number, and scale of analysis steps, view individual learning artifacts associated with each node in the workflow, and ask questions about the overall workflow or individual learning artifacts through the associated forums. An example learning workflow in the domain of epidemiology is provided to demonstrate the effectiveness of the approach. RESULTS/CONCLUSIONS: In the context of geovisual analytics, GIScientists are not only responsible for developing software to facilitate visually-mediated reasoning about large and complex spatiotemporal information, but also for ensuring that this software works. The workflow learning model discussed in this paper and demonstrated in the G-EX Portal Learn Module is one approach to improving the utility of geovisual analytics software. While development of the G-EX Portal Learn Module is ongoing, we expect to release the G-EX Portal Learn Module by Summer 2009.

  3. Radiology information system: a workflow-based approach.

    PubMed

    Zhang, Jinyan; Lu, Xudong; Nie, Hongchao; Huang, Zhengxing; van der Aalst, W M P

    2009-09-01

    Introducing workflow management technology in healthcare seems promising for dealing with the problem that current healthcare information systems cannot provide sufficient support for process management, although several challenges still exist. The purpose of this paper is to study the method of developing a workflow-based information system, using the radiology department as a use case. First, a workflow model of a typical radiology process was established. Second, based on the model, the system could be designed and implemented as a group of loosely coupled components. Each component corresponded to one task in the process and could be assembled by the workflow management system. The legacy systems could be taken as special components, which also corresponded to tasks and were integrated by transforming non-workflow-aware interfaces into standard ones. Finally, a workflow dashboard was designed and implemented to provide an integral view of radiology processes. The workflow-based Radiology Information System was deployed in the radiology department of Zhejiang Chinese Medicine Hospital in China. The results showed that it could be adjusted flexibly in response to the needs of changing processes, and that it enhanced process management in the department. It can also provide a more workflow-aware integration method compared with other methods such as IHE-based ones. The workflow-based approach is a new method of developing radiology information systems with more flexibility, more functionality for process management and more workflow-aware integration. The work of this paper is an initial endeavor toward introducing workflow management technology in healthcare.

  4. Data Integration Tool: From Permafrost Data Translation Research Tool to A Robust Research Application

    NASA Astrophysics Data System (ADS)

    Wilcox, H.; Schaefer, K. M.; Jafarov, E. E.; Strawhacker, C.; Pulsifer, P. L.; Thurmes, N.

    2016-12-01

    The United States National Science Foundation-funded PermaData project, led by the National Snow and Ice Data Center (NSIDC) with a team from the Global Terrestrial Network for Permafrost (GTN-P), aimed to improve permafrost data access and discovery. We developed a Data Integration Tool (DIT) to significantly speed up the manual processing needed to translate inconsistent, scattered historical permafrost data into files ready to ingest directly into the GTN-P. We leverage this data to support science research and policy decisions. DIT is a workflow manager that divides data preparation and analysis into a series of steps or operations called widgets. Each widget does a specific operation, such as read, multiply by a constant, sort, plot, and write data. DIT allows the user to select and order the widgets as desired to meet their specific needs. Originally it was written to capture a scientist's personal, iterative data manipulation and quality control process of visually and programmatically iterating through inconsistent input data, examining it to find problems, adding operations to address the problems, and rerunning until the data could be translated into the GTN-P standard format. Iterative development of this tool led first to a Fortran/Python hybrid and then, with consideration of users, licensing, version control, packaging, and workflow, to a publicly available, robust, usable application. Transitioning to Python allowed the use of open-source frameworks for the workflow core and integration with a JavaScript graphical workflow interface. DIT is targeted to automatically handle 90% of the data processing for field scientists, modelers, and non-discipline scientists. It is available as an open-source tool on GitHub, packaged for a subset of Mac, Windows, and UNIX systems as a desktop application with a graphical workflow manager. DIT was used to completely translate one dataset (133 sites) that was successfully added to GTN-P, nearly translate three datasets
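
    The widget-chain idea, small single-purpose operations (read, multiply by a constant, sort, write) assembled in a user-chosen order, is sketched below in plain Python; the widget implementations and column names are invented and are not DIT's actual code.

```python
# A minimal sketch of a widget-based workflow: each widget transforms a list
# of records, and a workflow is an ordered list of (widget, parameters).
def read_widget(_, path="borehole_temps.csv"):
    # Stand-in for reading a messy historical file; returns toy records.
    return [{"depth_m": 1.0, "temp_c": -2.4}, {"depth_m": 0.5, "temp_c": -1.9}]

def multiply_widget(records, column="temp_c", constant=1.0):
    return [{**r, column: r[column] * constant} for r in records]

def sort_widget(records, column="depth_m"):
    return sorted(records, key=lambda r: r[column])

def write_widget(records, path="gtnp_ready.csv"):
    print(f"would write {len(records)} records to {path}")
    return records

def run_workflow(widgets, data=None):
    for widget, kwargs in widgets:
        data = widget(data, **kwargs)
    return data

if __name__ == "__main__":
    workflow = [
        (read_widget, {}),
        (multiply_widget, {"column": "temp_c", "constant": 1.0}),
        (sort_widget, {"column": "depth_m"}),
        (write_widget, {"path": "gtnp_ready.csv"}),
    ]
    run_workflow(workflow)
```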

  5. A Semi-Automated Workflow Solution for Data Set Publication

    DOE PAGES

    Vannan, Suresh; Beaty, Tammy W.; Cook, Robert B.; ...

    2016-03-08

    In order to address the need for published data, considerable effort has gone into formalizing the process of data publication. From funding agencies to publishers, data publication has rapidly become a requirement. Digital Object Identifiers (DOI) and data citations have enhanced the integration and availability of data. The challenge facing data publishers now is to deal with the increased number of publishable data products and, most importantly, the difficulties of publishing diverse data products into an online archive. The Oak Ridge National Laboratory Distributed Active Archive Center (ORNL DAAC), a NASA-funded data center, faces these challenges as it deals with data products created by individual investigators. This paper summarizes the challenges of curating data and provides a summary of a workflow solution that ORNL DAAC research and technical staff have created to deal with publication of the diverse data products. Finally, the workflow solution presented here is generic and can be applied to data from any scientific domain and data located at any data center.

  6. A Semi-Automated Workflow Solution for Data Set Publication

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Vannan, Suresh; Beaty, Tammy W.; Cook, Robert B.

    In order to address the need for published data, considerable effort has gone into formalizing the process of data publication. From funding agencies to publishers, data publication has rapidly become a requirement. Digital Object Identifiers (DOI) and data citations have enhanced the integration and availability of data. The challenge facing data publishers now is to deal with the increased number of publishable data products and, most importantly, the difficulties of publishing diverse data products into an online archive. The Oak Ridge National Laboratory Distributed Active Archive Center (ORNL DAAC), a NASA-funded data center, faces these challenges as it deals with data products created by individual investigators. This paper summarizes the challenges of curating data and provides a summary of a workflow solution that ORNL DAAC research and technical staff have created to deal with publication of the diverse data products. Finally, the workflow solution presented here is generic and can be applied to data from any scientific domain and data located at any data center.

  7. Radiology Workflow Dynamics: How Workflow Patterns Impact Radiologist Perceptions of Workplace Satisfaction.

    PubMed

    Lee, Matthew H; Schemmel, Andrew J; Pooler, B Dustin; Hanley, Taylor; Kennedy, Tabassum; Field, Aaron; Wiegmann, Douglas; Yu, John-Paul J

    2017-04-01

    The study aimed to assess perceptions of reading room workflow and the impact separating image-interpretive and nonimage-interpretive task workflows can have on radiologist perceptions of workplace disruptions, workload, and overall satisfaction. A 14-question survey instrument was developed to measure radiologist perceptions of workplace interruptions, satisfaction, and workload prior to and following implementation of separate image-interpretive and nonimage-interpretive reading room workflows. The results were collected over 2 weeks preceding the intervention and 2 weeks following the end of the intervention. The results were anonymized and analyzed using univariate analysis. A total of 18 people responded to the preintervention survey: 6 neuroradiology fellows and 12 attending neuroradiologists. Fifteen people who were then present for the 1-month intervention period responded to the postintervention survey. Perceptions of workplace disruptions, image interpretation, quality of trainee education, ability to perform nonimage-interpretive tasks, and quality of consultations (P < 0.0001) all improved following the intervention. Mental effort and workload also improved across all assessment domains, as did satisfaction with quality of image interpretation and consultative work. Implementation of parallel dedicated image-interpretive and nonimage-interpretive workflows may improve markers of radiologist perceptions of workplace satisfaction. Copyright © 2017 The Association of University Radiologists. Published by Elsevier Inc. All rights reserved.

  8. BioInfra.Prot: A comprehensive proteomics workflow including data standardization, protein inference, expression analysis and data publication.

    PubMed

    Turewicz, Michael; Kohl, Michael; Ahrens, Maike; Mayer, Gerhard; Uszkoreit, Julian; Naboulsi, Wael; Bracht, Thilo; Megger, Dominik A; Sitek, Barbara; Marcus, Katrin; Eisenacher, Martin

    2017-11-10

    The analysis of high-throughput mass spectrometry-based proteomics data must address the specific challenges of this technology. To this end, the comprehensive proteomics workflow offered by the de.NBI service center BioInfra.Prot provides indispensable components for the computational and statistical analysis of this kind of data. These components include tools and methods for spectrum identification and protein inference, protein quantification, expression analysis as well as data standardization and data publication. All particular methods of the workflow which address these tasks are state-of-the-art or cutting edge. As has been shown in previous publications, each of these methods is adequate to solve its specific task and gives competitive results. However, the methods included in the workflow are continuously reviewed, updated and improved to adapt to new scientific developments. All of these particular components and methods are available as stand-alone BioInfra.Prot services or as a complete workflow. Since BioInfra.Prot provides manifold fast communication channels to get access to all components of the workflow (e.g., via the BioInfra.Prot ticket system: bioinfraprot@rub.de) users can easily benefit from this service and get support by experts. Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.

  9. Biowep: a workflow enactment portal for bioinformatics applications.

    PubMed

    Romano, Paolo; Bartocci, Ezio; Bertolini, Guglielmo; De Paoli, Flavio; Marra, Domenico; Mauri, Giancarlo; Merelli, Emanuela; Milanesi, Luciano

    2007-03-08

    The huge amount of biological information, its distribution over the Internet and the heterogeneity of available software tools make the adoption of new data integration and analysis network tools a necessity in bioinformatics. ICT standards and tools, like Web Services and Workflow Management Systems (WMS), can support the creation and deployment of such systems. Many Web Services are already available and some WMS have been proposed. They assume that researchers know which bioinformatics resources can be reached through a programmatic interface and that they are skilled in programming and building workflows. Therefore, they are not viable for the majority of researchers, who lack these skills. A portal enabling such researchers to benefit from new technologies is still missing. We designed biowep, a web-based client application that allows for the selection and execution of a set of predefined workflows. The system is available on-line. The biowep architecture includes a Workflow Manager, a User Interface and a Workflow Executor. The task of the Workflow Manager is the creation and annotation of workflows. These can be created by using either the Taverna Workbench or BioWMS. Enactment of workflows is carried out by FreeFluo for Taverna workflows and by BioAgent/Hermes, a mobile agent-based middleware, for BioWMS ones. The main processing steps of workflows are annotated on the basis of their input and output, elaboration type and application domain, using a classification of bioinformatics data and tasks. The interface supports user authentication and profiling. Workflows can be selected on the basis of users' profiles and can be searched through their annotations. Results can be saved. We developed a web system that supports the selection and execution of predefined workflows, thus simplifying access for all researchers. The implementation of Web Services allowing specialized software to interact with an exhaustive set of biomedical databases and analysis software and the creation of

  10. Biowep: a workflow enactment portal for bioinformatics applications

    PubMed Central

    Romano, Paolo; Bartocci, Ezio; Bertolini, Guglielmo; De Paoli, Flavio; Marra, Domenico; Mauri, Giancarlo; Merelli, Emanuela; Milanesi, Luciano

    2007-01-01

    Background The huge amount of biological information, its distribution over the Internet and the heterogeneity of available software tools make the adoption of new data integration and analysis network tools a necessity in bioinformatics. ICT standards and tools, like Web Services and Workflow Management Systems (WMS), can support the creation and deployment of such systems. Many Web Services are already available and some WMS have been proposed. They assume that researchers know which bioinformatics resources can be reached through a programmatic interface and that they are skilled in programming and building workflows. Therefore, they are not viable for the majority of researchers, who lack these skills. A portal enabling such researchers to benefit from new technologies is still missing. Results We designed biowep, a web-based client application that allows for the selection and execution of a set of predefined workflows. The system is available on-line. The biowep architecture includes a Workflow Manager, a User Interface and a Workflow Executor. The task of the Workflow Manager is the creation and annotation of workflows. These can be created by using either the Taverna Workbench or BioWMS. Enactment of workflows is carried out by FreeFluo for Taverna workflows and by BioAgent/Hermes, a mobile agent-based middleware, for BioWMS ones. The main processing steps of workflows are annotated on the basis of their input and output, elaboration type and application domain, using a classification of bioinformatics data and tasks. The interface supports user authentication and profiling. Workflows can be selected on the basis of users' profiles and can be searched through their annotations. Results can be saved. Conclusion We developed a web system that supports the selection and execution of predefined workflows, thus simplifying access for all researchers. The implementation of Web Services allowing specialized software to interact with an exhaustive set of biomedical databases and analysis

  11. Building asynchronous geospatial processing workflows with web services

    NASA Astrophysics Data System (ADS)

    Zhao, Peisheng; Di, Liping; Yu, Genong

    2012-02-01

    Geoscience research and applications often involve a geospatial processing workflow. This workflow includes a sequence of operations that use a variety of tools to collect, translate, and analyze distributed heterogeneous geospatial data. Asynchronous mechanisms, by which clients initiate a request and then resume their processing without waiting for a response, are very useful for complicated workflows that take a long time to run. Geospatial contents and capabilities are increasingly becoming available online as interoperable Web services. This online availability significantly enhances the ability to use Web service chains to build distributed geospatial processing workflows. This paper focuses on how to orchestrate Web services for implementing asynchronous geospatial processing workflows. The theoretical bases for asynchronous Web services and workflows, including asynchrony patterns and message transmission, are examined to explore different asynchronous approaches to, and architectures of, workflow code that support asynchronous behavior. A sample geospatial processing workflow, issued by the Open Geospatial Consortium (OGC) Web Service, Phase 6 (OWS-6), is provided to illustrate the implementation of asynchronous geospatial processing workflows and the challenges in using Web Services Business Process Execution Language (WS-BPEL) to develop them.
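
    As a rough illustration of the submit-then-poll asynchrony pattern discussed above, the sketch below submits a long-running geoprocessing request and later polls its status. The endpoint URL, JSON fields, and process name are hypothetical placeholders, not part of any OGC specification or of the OWS-6 workflow itself.

      # Minimal sketch of the submit-then-poll asynchrony pattern for a
      # long-running geoprocessing request. The endpoint URL and the JSON
      # fields are hypothetical placeholders, not an OGC-defined API.
      import time
      import requests

      SUBMIT_URL = "https://example.org/wps/jobs"  # hypothetical submit endpoint

      def submit_job(payload: dict) -> str:
          """Initiate the request and return immediately with a job identifier."""
          response = requests.post(SUBMIT_URL, json=payload, timeout=30)
          response.raise_for_status()
          return response.json()["jobId"]  # assumed response field

      def wait_for_result(job_id: str, interval_s: float = 10.0) -> dict:
          """Poll the job resource until the service reports completion."""
          status_url = f"{SUBMIT_URL}/{job_id}"
          while True:
              status = requests.get(status_url, timeout=30).json()
              if status.get("state") in ("succeeded", "failed"):
                  return status
              time.sleep(interval_s)  # a real client would do other work here

      if __name__ == "__main__":
          job = submit_job({"process": "ndvi", "bbox": [-120, 35, -119, 36]})
          print(wait_for_result(job))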

  12. Inferring Clinical Workflow Efficiency via Electronic Medical Record Utilization

    PubMed Central

    Chen, You; Xie, Wei; Gunter, Carl A; Liebovitz, David; Mehrotra, Sanjay; Zhang, He; Malin, Bradley

    2015-01-01

    Complexity in clinical workflows can lead to inefficiency in making diagnoses, ineffectiveness of treatment plans and uninformed management of healthcare organizations (HCOs). Traditional strategies to manage workflow complexity are based on measuring the gaps between workflows defined by HCO administrators and the actual processes followed by staff in the clinic. However, existing methods tend to neglect the influences of EMR systems on the utilization of workflows, which could be leveraged to optimize workflows facilitated through the EMR. In this paper, we introduce a framework to infer clinical workflows through the utilization of an EMR and show how such workflows roughly partition into four types according to their efficiency. Our framework infers workflows at several levels of granularity through data mining technologies. We study four months of EMR event logs from a large medical center, including 16,569 inpatient stays, and illustrate that approximately 95% of workflows are efficient and that 80% of patients are on such workflows. At the same time, we show that the remaining 5% of workflows may be inefficient due to a variety of factors, such as complex patients. PMID:26958173
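
    As a simplified illustration of the general idea (not the authors' framework), the sketch below groups EMR audit-log events by inpatient stay and counts the ordered action sequences; frequent sequences approximate the common, efficient workflows, while rare, long ones are candidates for closer inspection. The column names are assumptions.

      # Simplified illustration of inferring coarse workflows from EMR event
      # logs: group audit events by inpatient stay, order them by time, and
      # count how often each ordered sequence of actions occurs. The column
      # names (stay_id, timestamp, action) are hypothetical; the published
      # framework mines workflows at several levels of granularity.
      from collections import Counter
      import csv

      def mine_sequences(path: str) -> Counter:
          stays = {}
          with open(path, newline="") as f:
              for row in csv.DictReader(f):
                  stays.setdefault(row["stay_id"], []).append((row["timestamp"], row["action"]))
          sequences = Counter()
          for events in stays.values():
              ordered = tuple(action for _, action in sorted(events))
              sequences[ordered] += 1
          return sequences

      # Frequent sequences approximate the workflows most patients follow;
      # rare, long sequences are candidates for inefficient workflows.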

  13. Kwf-Grid workflow management system for Earth science applications

    NASA Astrophysics Data System (ADS)

    Tran, V.; Hluchy, L.

    2009-04-01

    In this paper, we present a workflow management tool for Earth science applications in EGEE. The workflow management tool was originally developed within the K-wf Grid project for GT4 middleware and has many advanced features, such as semi-automatic workflow composition, a user-friendly GUI for managing workflows, and knowledge management. In EGEE, we are porting the workflow management tool to gLite middleware for Earth science applications. The K-wf Grid workflow management system was developed within "Knowledge-based Workflow System for Grid Applications" under the 6th Framework Programme. The workflow management system is intended to semi-automatically compose a workflow of Grid services; execute the composed workflow application in a Grid computing environment; monitor the performance of the Grid infrastructure and the Grid applications; analyze the resulting monitoring information; capture the knowledge contained in that information by means of intelligent agents; and, finally, reuse the joint knowledge gathered from all participating users in a collaborative way in order to efficiently construct workflows for new Grid applications. K-wf Grid workflow engines can support different types of jobs (e.g. GRAM jobs, web services) in a workflow. A new class of gLite job has been added to the system, which allows the system to manage and execute gLite jobs in the EGEE infrastructure. The GUI has been adapted to the requirements of EGEE users, and a new credential management servlet has been added to the portal. Porting the K-wf Grid workflow management system to gLite would allow EGEE users to use the system and benefit from its advanced features. The system is primarily tested and evaluated with applications from ES clusters.

  14. A microseismic workflow for managing induced seismicity risk at CO2 storage projects

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Matzel, E.; Morency, C.; Pyle, M.

    2015-10-27

    It is well established that fluid injection has the potential to induce earthquakes—from microseismicity to large, damaging events—by altering state-of-stress conditions in the subsurface. While induced seismicity has not been a major operational issue for carbon storage projects to date, a seismicity hazard exists and must be carefully addressed. Two essential components of effective seismic risk management are (1) sensitive microseismic monitoring and (2) robust data interpretation tools. This report describes a novel workflow, based on advanced processing algorithms applied to microseismic data, to help improve management of seismic risk. This workflow has three main goals: (1) to improve the resolution and reliability of passive seismic monitoring, (2) to extract additional, valuable information from continuous waveform data that is often ignored in standard processing, and (3) to minimize the turn-around time between data collection, interpretation, and decision-making. These three objectives can allow for a better-informed and rapid response to changing subsurface conditions.

  15. Introducing students to digital geological mapping: A workflow based on cheap hardware and free software

    NASA Astrophysics Data System (ADS)

    Vrabec, Marko; Dolžan, Erazem

    2016-04-01

    The undergraduate field course in Geological Mapping at the University of Ljubljana involves 20-40 students per year, which precludes the use of specialized rugged digital field equipment as the costs would be way beyond the capabilities of the Department. A different mapping area is selected each year with the aim to provide typical conditions that a professional geologist might encounter when doing fieldwork in Slovenia, which includes rugged relief, dense tree cover, and moderately-well- to poorly-exposed bedrock due to vegetation and urbanization. It is therefore mandatory that the digital tools and workflows are combined with classical methods of fieldwork, since, for example, full-time precise GNSS positioning is not viable under such circumstances. Additionally, due to the prevailing combination of complex geological structure with generally poor exposure, students cannot be expected to produce line (vector) maps of geological contacts on the go, so there is no need for such functionality in hardware and software that we use in the field. Our workflow therefore still relies on paper base maps, but is strongly complemented with digital tools to provide robust positioning, track recording, and acquisition of various point-based data. Primary field hardware are students' Android-based smartphones and optionally tablets. For our purposes, the built-in GNSS chips provide adequate positioning precision most of the time, particularly if they are GLONASS-capable. We use Oruxmaps, a powerful free offline map viewer for the Android platform, which facilitates the use of custom-made geopositioned maps. For digital base maps, which we prepare in free Windows QGIS software, we use scanned topographic maps provided by the National Geodetic Authority, but also other maps such as aerial imagery, processed Digital Elevation Models, scans of existing geological maps, etc. Point data, like important outcrop locations or structural measurements, are entered into Oruxmaps as

  16. The equivalency between logic Petri workflow nets and workflow nets.

    PubMed

    Wang, Jing; Yu, ShuXia; Du, YuYue

    2015-01-01

    Logic Petri nets (LPNs) can describe and analyze batch processing functions and passing value indeterminacy in cooperative systems. Logic Petri workflow nets (LPWNs) are proposed based on LPNs in this paper. Process mining is regarded as an important bridge between modeling and analysis of data mining and business process. Workflow nets (WF-nets) are an extension of Petri nets (PNs) and have successfully been used in process mining. Some shortcomings cannot be avoided in process mining, such as duplicate tasks, invisible tasks, and the noise of logs. An online shop in electronic commerce is modeled in this paper to prove the equivalence between LPWNs and WF-nets, and the advantages of LPWNs are presented.

  17. The Equivalency between Logic Petri Workflow Nets and Workflow Nets

    PubMed Central

    Wang, Jing; Yu, ShuXia; Du, YuYue

    2015-01-01

    Logic Petri nets (LPNs) can describe and analyze batch processing functions and passing value indeterminacy in cooperative systems. Logic Petri workflow nets (LPWNs) are proposed based on LPNs in this paper. Process mining is regarded as an important bridge between modeling and analysis of data mining and business process. Workflow nets (WF-nets) are an extension of Petri nets (PNs) and have successfully been used in process mining. Some shortcomings cannot be avoided in process mining, such as duplicate tasks, invisible tasks, and the noise of logs. An online shop in electronic commerce is modeled in this paper to prove the equivalence between LPWNs and WF-nets, and the advantages of LPWNs are presented. PMID:25821845

  18. Scientific workflow and support for high resolution global climate modeling at the Oak Ridge Leadership Computing Facility

    NASA Astrophysics Data System (ADS)

    Anantharaj, V.; Mayer, B.; Wang, F.; Hack, J.; McKenna, D.; Hartman-Baker, R.

    2012-04-01

    The Oak Ridge Leadership Computing Facility (OLCF) facilitates the execution of computational experiments that require tens of millions of CPU hours (typically using thousands of processors simultaneously) while generating hundreds of terabytes of data. A set of ultra high resolution climate experiments in progress, using the Community Earth System Model (CESM), will produce over 35,000 files, ranging in sizes from 21 MB to 110 GB each. The execution of the experiments will require nearly 70 Million CPU hours on the Jaguar and Titan supercomputers at OLCF. The total volume of the output from these climate modeling experiments will be in excess of 300 TB. This model output must then be archived, analyzed, distributed to the project partners in a timely manner, and also made available more broadly. Meeting this challenge would require efficient movement of the data, staging the simulation output to a large and fast file system that provides high volume access to other computational systems used to analyze the data and synthesize results. This file system also needs to be accessible via high speed networks to an archival system that can provide long term reliable storage. Ideally this archival system is itself directly available to other systems that can be used to host services making the data and analysis available to the participants in the distributed research project and to the broader climate community. The various resources available at the OLCF now support this workflow. The available systems include the new Jaguar Cray XK6 2.63 petaflops (estimated) supercomputer, the 10 PB Spider center-wide parallel file system, the Lens/EVEREST analysis and visualization system, the HPSS archival storage system, the Earth System Grid (ESG), and the ORNL Climate Data Server (CDS). The ESG features federated services, search & discovery, extensive data handling capabilities, deep storage access, and Live Access Server (LAS) integration. The scientific workflow enabled on

  19. CyberShake: Running Seismic Hazard Workflows on Distributed HPC Resources

    NASA Astrophysics Data System (ADS)

    Callaghan, S.; Maechling, P. J.; Graves, R. W.; Gill, D.; Olsen, K. B.; Milner, K. R.; Yu, J.; Jordan, T. H.

    2013-12-01

    As part of its program of earthquake system science research, the Southern California Earthquake Center (SCEC) has developed a simulation platform, CyberShake, to perform physics-based probabilistic seismic hazard analysis (PSHA) using 3D deterministic wave propagation simulations. CyberShake performs PSHA by simulating a tensor-valued wavefield of Strain Green Tensors, and then using seismic reciprocity to calculate synthetic seismograms for about 415,000 events per site of interest. These seismograms are processed to compute ground motion intensity measures, which are then combined with probabilities from an earthquake rupture forecast to produce a site-specific hazard curve. Seismic hazard curves for hundreds of sites in a region can be used to calculate a seismic hazard map, representing the seismic hazard for a region. We present a recently completed PSHA study in which we calculated four CyberShake seismic hazard maps for the Southern California area to compare how CyberShake hazard results are affected by different SGT computational codes (AWP-ODC and AWP-RWG) and different community velocity models (Community Velocity Model - SCEC (CVM-S4) v11.11 and Community Velocity Model - Harvard (CVM-H) v11.9). We present our approach to running workflow applications on distributed HPC resources, including systems without support for remote job submission. We show how our approach extends the benefits of scientific workflows, such as job and data management, to large-scale applications on Track 1 and Leadership class open-science HPC resources. We used our distributed workflow approach to perform CyberShake Study 13.4 on two new NSF open-science HPC computing resources, Blue Waters and Stampede, executing over 470 million tasks to calculate physics-based hazard curves for 286 locations in the Southern California region. For each location, we calculated seismic hazard curves with two different community velocity models and two different SGT codes, resulting in over
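
    A minimal sketch of the final combination step in PSHA of the kind described above: per-rupture annual rates are weighted by the fraction of that rupture's simulated seismograms whose intensity measure exceeds each threshold, giving an annual exceedance rate per threshold (a hazard curve). The numbers are illustrative and this is not CyberShake code.

      # Simplified PSHA combination step: combine per-rupture annual rates
      # with the fraction of that rupture's simulated intensity measures (IM)
      # exceeding a threshold, yielding an annual exceedance rate for each
      # threshold. Rates and IM values below are illustrative only.
      import numpy as np

      def hazard_curve(rates, intensity_measures, thresholds):
          """rates[i]: annual rate of rupture i; intensity_measures[i]: array
          of simulated IM values (e.g., spectral acceleration) for rupture i."""
          curve = []
          for x in thresholds:
              annual_rate = sum(r * np.mean(ims > x) for r, ims in zip(rates, intensity_measures))
              curve.append(annual_rate)
          return np.array(curve)

      rates = [1e-3, 5e-4]
      ims = [np.array([0.1, 0.3, 0.5]), np.array([0.2, 0.6, 0.9])]
      print(hazard_curve(rates, ims, thresholds=np.array([0.2, 0.4, 0.8])))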

  20. Tavaxy: Integrating Taverna and Galaxy workflows with cloud computing support

    PubMed Central

    2012-01-01

    Background Over the past decade the workflow system paradigm has evolved as an efficient and user-friendly approach for developing complex bioinformatics applications. Two popular workflow systems that have gained acceptance by the bioinformatics community are Taverna and Galaxy. Each system has a large user-base and supports an ever-growing repository of application workflows. However, workflows developed for one system cannot be imported and executed easily on the other. The lack of interoperability is due to differences in the models of computation, workflow languages, and architectures of both systems. This lack of interoperability limits sharing of workflows between the user communities and leads to duplication of development efforts. Results In this paper, we present Tavaxy, a stand-alone system for creating and executing workflows based on using an extensible set of re-usable workflow patterns. Tavaxy offers a set of new features that simplify and enhance the development of sequence analysis applications: It allows the integration of existing Taverna and Galaxy workflows in a single environment, and supports the use of cloud computing capabilities. The integration of existing Taverna and Galaxy workflows is supported seamlessly at both run-time and design-time levels, based on the concepts of hierarchical workflows and workflow patterns. The use of cloud computing in Tavaxy is flexible, where the users can either instantiate the whole system on the cloud, or delegate the execution of certain sub-workflows to the cloud infrastructure. Conclusions Tavaxy reduces the workflow development cycle by introducing the use of workflow patterns to simplify workflow creation. It enables the re-use and integration of existing (sub-) workflows from Taverna and Galaxy, and allows the creation of hybrid workflows. Its additional features exploit recent advances in high performance cloud computing to cope with the increasing data size and complexity of analysis. The system

  1. Tavaxy: integrating Taverna and Galaxy workflows with cloud computing support.

    PubMed

    Abouelhoda, Mohamed; Issa, Shadi Alaa; Ghanem, Moustafa

    2012-05-04

    Over the past decade the workflow system paradigm has evolved as an efficient and user-friendly approach for developing complex bioinformatics applications. Two popular workflow systems that have gained acceptance by the bioinformatics community are Taverna and Galaxy. Each system has a large user-base and supports an ever-growing repository of application workflows. However, workflows developed for one system cannot be imported and executed easily on the other. The lack of interoperability is due to differences in the models of computation, workflow languages, and architectures of both systems. This lack of interoperability limits sharing of workflows between the user communities and leads to duplication of development efforts. In this paper, we present Tavaxy, a stand-alone system for creating and executing workflows based on using an extensible set of re-usable workflow patterns. Tavaxy offers a set of new features that simplify and enhance the development of sequence analysis applications: It allows the integration of existing Taverna and Galaxy workflows in a single environment, and supports the use of cloud computing capabilities. The integration of existing Taverna and Galaxy workflows is supported seamlessly at both run-time and design-time levels, based on the concepts of hierarchical workflows and workflow patterns. The use of cloud computing in Tavaxy is flexible, where the users can either instantiate the whole system on the cloud, or delegate the execution of certain sub-workflows to the cloud infrastructure. Tavaxy reduces the workflow development cycle by introducing the use of workflow patterns to simplify workflow creation. It enables the re-use and integration of existing (sub-) workflows from Taverna and Galaxy, and allows the creation of hybrid workflows. Its additional features exploit recent advances in high performance cloud computing to cope with the increasing data size and complexity of analysis.The system can be accessed either through a

  2. Integrating text mining into the MGI biocuration workflow

    PubMed Central

    Dowell, K.G.; McAndrews-Hill, M.S.; Hill, D.P.; Drabkin, H.J.; Blake, J.A.

    2009-01-01

    A major challenge for functional and comparative genomics resource development is the extraction of data from the biomedical literature. Although text mining for biological data is an active research field, few applications have been integrated into production literature curation systems such as those of the model organism databases (MODs). Not only are most available biological natural language (bioNLP) and information retrieval and extraction solutions difficult to adapt to existing MOD curation workflows, but many also have high error rates or are unable to process documents available in those formats preferred by scientific journals. In September 2008, Mouse Genome Informatics (MGI) at The Jackson Laboratory initiated a search for dictionary-based text mining tools that we could integrate into our biocuration workflow. MGI has rigorous document triage and annotation procedures designed to identify appropriate articles about mouse genetics and genome biology. We currently screen ∼1000 journal articles a month for Gene Ontology terms, gene mapping, gene expression, phenotype data and other key biological information. Although we do not foresee that curation tasks will ever be fully automated, we are eager to implement named entity recognition (NER) tools for gene tagging that can help streamline our curation workflow and simplify gene indexing tasks within the MGI system. Gene indexing is an MGI-specific curation function that involves identifying which mouse genes are being studied in an article, then associating the appropriate gene symbols with the article reference number in the MGI database. Here, we discuss our search process, performance metrics and success criteria, and how we identified a short list of potential text mining tools for further evaluation. We provide an overview of our pilot projects with NCBO's Open Biomedical Annotator and Fraunhofer SCAI's ProMiner. In doing so, we prove the potential for the further incorporation of semi

  3. Integrating text mining into the MGI biocuration workflow.

    PubMed

    Dowell, K G; McAndrews-Hill, M S; Hill, D P; Drabkin, H J; Blake, J A

    2009-01-01

    A major challenge for functional and comparative genomics resource development is the extraction of data from the biomedical literature. Although text mining for biological data is an active research field, few applications have been integrated into production literature curation systems such as those of the model organism databases (MODs). Not only are most available biological natural language (bioNLP) and information retrieval and extraction solutions difficult to adapt to existing MOD curation workflows, but many also have high error rates or are unable to process documents available in those formats preferred by scientific journals.In September 2008, Mouse Genome Informatics (MGI) at The Jackson Laboratory initiated a search for dictionary-based text mining tools that we could integrate into our biocuration workflow. MGI has rigorous document triage and annotation procedures designed to identify appropriate articles about mouse genetics and genome biology. We currently screen approximately 1000 journal articles a month for Gene Ontology terms, gene mapping, gene expression, phenotype data and other key biological information. Although we do not foresee that curation tasks will ever be fully automated, we are eager to implement named entity recognition (NER) tools for gene tagging that can help streamline our curation workflow and simplify gene indexing tasks within the MGI system. Gene indexing is an MGI-specific curation function that involves identifying which mouse genes are being studied in an article, then associating the appropriate gene symbols with the article reference number in the MGI database.Here, we discuss our search process, performance metrics and success criteria, and how we identified a short list of potential text mining tools for further evaluation. We provide an overview of our pilot projects with NCBO's Open Biomedical Annotator and Fraunhofer SCAI's ProMiner. In doing so, we prove the potential for the further incorporation of semi

  4. Ontology-Driven Discovery of Scientific Computational Entities

    ERIC Educational Resources Information Center

    Brazier, Pearl W.

    2010-01-01

    Many geoscientists use modern computational resources, such as software applications, Web services, scientific workflows and datasets that are readily available on the Internet, to support their research and many common tasks. These resources are often shared via human contact and sometimes stored in data portals; however, they are not necessarily…

  5. Integrating advanced visualization technology into the planetary Geoscience workflow

    NASA Astrophysics Data System (ADS)

    Huffman, John; Forsberg, Andrew; Loomis, Andrew; Head, James; Dickson, James; Fassett, Caleb

    2011-09-01

    Recent advances in computer visualization have allowed us to develop new tools for analyzing the data gathered during planetary missions, which is important, since these data sets have grown exponentially in recent years to tens of terabytes in size. As part of the Advanced Visualization in Solar System Exploration and Research (ADVISER) project, we utilize several advanced visualization techniques created specifically with planetary image data in mind. The Geoviewer application allows real-time active stereo display of images, which in aggregate have billions of pixels. The ADVISER desktop application platform allows fast three-dimensional visualization of planetary images overlain on digital terrain models. Both applications include tools for easy data ingest and real-time analysis in a programmatic manner. Incorporation of these tools into our everyday scientific workflow has proved important for scientific analysis, discussion, and publication, and enabled effective and exciting educational activities for students from high school through graduate school.

  6. Use of apparent thickness for preprocessing of low-frequency electromagnetic data in inversion-based multibarrier evaluation workflow

    NASA Astrophysics Data System (ADS)

    Omar, Saad; Omeragic, Dzevat

    2018-04-01

    The concept of apparent thicknesses is introduced for the inversion-based, multicasing evaluation interpretation workflow using multifrequency and multispacing electromagnetic measurements. A thickness value is assigned to each measurement, enabling the development of two new preprocessing algorithms to remove casing collar artifacts. First, long-spacing apparent thicknesses are used to remove, from the pipe sections, artifacts ("ghosts") caused by the transmitter crossing a casing collar or corrosion. Second, a collar identification, localization, and assignment algorithm is developed to enable robust inversion in collar sections. Last, casing eccentering can also be identified on the basis of opposite deviation of short-spacing phase and magnitude apparent thicknesses from the nominal value. The proposed workflow can handle an arbitrary number of nested casings and has been validated on synthetic and field data.
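
    As a loose illustration of the eccentering criterion mentioned above (opposite deviation of short-spacing phase and magnitude apparent thicknesses from the nominal value), the sketch below flags such depths. The tolerance, array names, and thresholding are assumptions for illustration; the published workflow is considerably more involved.

      # Illustrative sketch (not the authors' algorithm): flag possible casing
      # eccentering at depths where the phase- and magnitude-derived
      # short-spacing apparent thicknesses deviate from the nominal thickness
      # in opposite directions. The tolerance and array names are assumptions.
      import numpy as np

      def flag_eccentering(t_app_phase, t_app_mag, t_nominal, tol=0.05):
          dev_phase = np.asarray(t_app_phase) - t_nominal
          dev_mag = np.asarray(t_app_mag) - t_nominal
          significant = (np.abs(dev_phase) > tol) & (np.abs(dev_mag) > tol)
          return significant & (np.sign(dev_phase) != np.sign(dev_mag))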

  7. An integrated workflow for robust alignment and simplified quantitative analysis of NMR spectrometry data.

    PubMed

    Vu, Trung N; Valkenborg, Dirk; Smets, Koen; Verwaest, Kim A; Dommisse, Roger; Lemière, Filip; Verschoren, Alain; Goethals, Bart; Laukens, Kris

    2011-10-20

    Nuclear magnetic resonance spectroscopy (NMR) is a powerful technique to reveal and compare quantitative metabolic profiles of biological tissues. However, chemical and physical sample variations make the analysis of the data challenging, and typically require the application of a number of preprocessing steps prior to data interpretation. For example, noise reduction, normalization, baseline correction, peak picking, spectrum alignment and statistical analysis are indispensable components in any NMR analysis pipeline. We introduce a novel suite of informatics tools for the quantitative analysis of NMR metabolomic profile data. The core of the processing cascade is a novel peak alignment algorithm, called hierarchical Cluster-based Peak Alignment (CluPA). The algorithm aligns a target spectrum to the reference spectrum in a top-down fashion by building a hierarchical cluster tree from peak lists of reference and target spectra and then dividing the spectra into smaller segments based on the most distant clusters of the tree. To reduce the computational time to estimate the spectral misalignment, the method makes use of Fast Fourier Transformation (FFT) cross-correlation. Since the method returns a high-quality alignment, we can propose a simple methodology to study the variability of the NMR spectra. For each aligned NMR data point the ratio of the between-group and within-group sum of squares (BW-ratio) is calculated to quantify the difference in variability between and within predefined groups of NMR spectra. This differential analysis is related to the calculation of the F-statistic or a one-way ANOVA, but without distributional assumptions. Statistical inference based on the BW-ratio is achieved by bootstrapping the null distribution from the experimental data. The workflow performance was evaluated using a previously published dataset. Correlation maps, spectral and grey scale plots show clear improvements in comparison to other methods, and the down
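
    The BW-ratio defined above has a direct worked form: for a single aligned data point, divide the between-group sum of squares by the within-group sum of squares over the predefined groups of spectra. The sketch below is a plain re-implementation of that definition, not the published tool.

      # Worked sketch of the BW-ratio for one aligned spectral data point:
      # between-group sum of squares divided by within-group sum of squares
      # across predefined groups of spectra. Values below are made up.
      import numpy as np

      def bw_ratio(values, labels):
          """values: intensities of one aligned data point across all spectra;
          labels: group membership of each spectrum."""
          values, labels = np.asarray(values, float), np.asarray(labels)
          grand_mean = values.mean()
          between = within = 0.0
          for g in np.unique(labels):
              group = values[labels == g]
              between += len(group) * (group.mean() - grand_mean) ** 2
              within += ((group - group.mean()) ** 2).sum()
          return between / within if within else float("inf")

      print(bw_ratio([1.0, 1.2, 0.9, 3.1, 2.8, 3.0], ["a", "a", "a", "b", "b", "b"]))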

  8. Integrated workflows for spiking neuronal network simulations

    PubMed Central

    Antolík, Ján; Davison, Andrew P.

    2013-01-01

    The increasing availability of computational resources is enabling more detailed, realistic modeling in computational neuroscience, resulting in a shift toward more heterogeneous models of neuronal circuits, and employment of complex experimental protocols. This poses a challenge for existing tool chains, as the set of tools involved in a typical modeler's workflow is expanding concomitantly, with growing complexity in the metadata flowing between them. For many parts of the workflow, a range of tools is available; however, numerous areas lack dedicated tools, while integration of existing tools is limited. This forces modelers to either handle the workflow manually, leading to errors, or to write substantial amounts of code to automate parts of the workflow, in both cases reducing their productivity. To address these issues, we have developed Mozaik: a workflow system for spiking neuronal network simulations written in Python. Mozaik integrates model, experiment and stimulation specification, simulation execution, data storage, data analysis and visualization into a single automated workflow, ensuring that all relevant metadata are available to all workflow components. It is based on several existing tools, including PyNN, Neo, and Matplotlib. It offers a declarative way to specify models and recording configurations using hierarchically organized configuration files. Mozaik automatically records all data together with all relevant metadata about the experimental context, allowing automation of the analysis and visualization stages. Mozaik has a modular architecture, and the existing modules are designed to be extensible with minimal programming effort. Mozaik increases the productivity of running virtual experiments on highly structured neuronal networks by automating the entire experimental cycle, while increasing the reliability of modeling studies by relieving the user from manual handling of the flow of metadata between the individual workflow stages. PMID

  9. Integrated workflows for spiking neuronal network simulations.

    PubMed

    Antolík, Ján; Davison, Andrew P

    2013-01-01

    The increasing availability of computational resources is enabling more detailed, realistic modeling in computational neuroscience, resulting in a shift toward more heterogeneous models of neuronal circuits, and employment of complex experimental protocols. This poses a challenge for existing tool chains, as the set of tools involved in a typical modeler's workflow is expanding concomitantly, with growing complexity in the metadata flowing between them. For many parts of the workflow, a range of tools is available; however, numerous areas lack dedicated tools, while integration of existing tools is limited. This forces modelers to either handle the workflow manually, leading to errors, or to write substantial amounts of code to automate parts of the workflow, in both cases reducing their productivity. To address these issues, we have developed Mozaik: a workflow system for spiking neuronal network simulations written in Python. Mozaik integrates model, experiment and stimulation specification, simulation execution, data storage, data analysis and visualization into a single automated workflow, ensuring that all relevant metadata are available to all workflow components. It is based on several existing tools, including PyNN, Neo, and Matplotlib. It offers a declarative way to specify models and recording configurations using hierarchically organized configuration files. Mozaik automatically records all data together with all relevant metadata about the experimental context, allowing automation of the analysis and visualization stages. Mozaik has a modular architecture, and the existing modules are designed to be extensible with minimal programming effort. Mozaik increases the productivity of running virtual experiments on highly structured neuronal networks by automating the entire experimental cycle, while increasing the reliability of modeling studies by relieving the user from manual handling of the flow of metadata between the individual workflow stages.

  10. Reproducible Large-Scale Neuroimaging Studies with the OpenMOLE Workflow Management System.

    PubMed

    Passerat-Palmbach, Jonathan; Reuillon, Romain; Leclaire, Mathieu; Makropoulos, Antonios; Robinson, Emma C; Parisot, Sarah; Rueckert, Daniel

    2017-01-01

    OpenMOLE is a scientific workflow engine with a strong emphasis on workload distribution. Workflows are designed using a high level Domain Specific Language (DSL) built on top of Scala. It exposes natural parallelism constructs to easily delegate the workload resulting from a workflow to a wide range of distributed computing environments. OpenMOLE hides the complexity of designing complex experiments thanks to its DSL. Users can embed their own applications and scale their pipelines from a small prototype running on their desktop computer to a large-scale study harnessing distributed computing infrastructures, simply by changing a single line in the pipeline definition. The construction of the pipeline itself is decoupled from the execution context. The high-level DSL abstracts the underlying execution environment, contrary to classic shell-script based pipelines. These two aspects allow pipelines to be shared and studies to be replicated across different computing environments. Workflows can be run as traditional batch pipelines or coupled with OpenMOLE's advanced exploration methods in order to study the behavior of an application, or perform automatic parameter tuning. In this work, we briefly present the strong assets of OpenMOLE and detail recent improvements targeting re-executability of workflows across various Linux platforms. We have tightly coupled OpenMOLE with CARE, a standalone containerization solution that allows re-executing on a Linux host any application that has been packaged on another Linux host previously. The solution is evaluated against a Python-based pipeline involving packages such as scikit-learn as well as binary dependencies. All were packaged and re-executed successfully on various HPC environments, with identical numerical results (here prediction scores) obtained on each environment. Our results show that the pair formed by OpenMOLE and CARE is a reliable solution to generate reproducible results and re-executable pipelines. A

  11. Reproducible Large-Scale Neuroimaging Studies with the OpenMOLE Workflow Management System

    PubMed Central

    Passerat-Palmbach, Jonathan; Reuillon, Romain; Leclaire, Mathieu; Makropoulos, Antonios; Robinson, Emma C.; Parisot, Sarah; Rueckert, Daniel

    2017-01-01

    OpenMOLE is a scientific workflow engine with a strong emphasis on workload distribution. Workflows are designed using a high level Domain Specific Language (DSL) built on top of Scala. It exposes natural parallelism constructs to easily delegate the workload resulting from a workflow to a wide range of distributed computing environments. OpenMOLE hides the complexity of designing complex experiments thanks to its DSL. Users can embed their own applications and scale their pipelines from a small prototype running on their desktop computer to a large-scale study harnessing distributed computing infrastructures, simply by changing a single line in the pipeline definition. The construction of the pipeline itself is decoupled from the execution context. The high-level DSL abstracts the underlying execution environment, contrary to classic shell-script based pipelines. These two aspects allow pipelines to be shared and studies to be replicated across different computing environments. Workflows can be run as traditional batch pipelines or coupled with OpenMOLE's advanced exploration methods in order to study the behavior of an application, or perform automatic parameter tuning. In this work, we briefly present the strong assets of OpenMOLE and detail recent improvements targeting re-executability of workflows across various Linux platforms. We have tightly coupled OpenMOLE with CARE, a standalone containerization solution that allows re-executing on a Linux host any application that has been packaged on another Linux host previously. The solution is evaluated against a Python-based pipeline involving packages such as scikit-learn as well as binary dependencies. All were packaged and re-executed successfully on various HPC environments, with identical numerical results (here prediction scores) obtained on each environment. Our results show that the pair formed by OpenMOLE and CARE is a reliable solution to generate reproducible results and re-executable pipelines. A

  12. A novel spectral library workflow to enhance protein identifications.

    PubMed

    Li, Haomin; Zong, Nobel C; Liang, Xiangbo; Kim, Allen K; Choi, Jeong Ho; Deng, Ning; Zelaya, Ivette; Lam, Maggie; Duan, Huilong; Ping, Peipei

    2013-04-09

    The innovations in mass spectrometry-based investigations in proteome biology enable systematic characterization of molecular details in pathophysiological phenotypes. However, the process of delineating large-scale raw proteomic datasets into a biological context requires high-throughput data acquisition and processing. A spectral library search engine makes use of previously annotated experimental spectra as references for subsequent spectral analyses. This workflow delivers many advantages, including elevated analytical efficiency and specificity as well as reduced demands in computational capacity. In this study, we created a spectral matching engine to address challenges commonly associated with a library search workflow. In particular, an improved sliding dot product algorithm that is robust to systematic drifts of mass measurement in spectra is introduced. Furthermore, a noise management protocol distinguishes spectral correlation attributable to noise from that attributable to peptide fragments. It enables elevated separation between target spectral matches and false matches, thereby suppressing the possibility of propagating inaccurate peptide annotations from library spectra to query spectra. Moreover, preservation of original spectra also accommodates user contributions to further enhance the quality of the library. Collectively, this search engine supports reproducible data analyses using curated references, thereby broadening the accessibility of proteomics resources to biomedical investigators. This article is part of a Special Issue entitled: From protein structures to clinical applications. Copyright © 2013 Elsevier B.V. All rights reserved.
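
    As a generic, simplified stand-in for the sliding dot product idea described above (not the authors' engine), the sketch below compares a query spectrum with a library spectrum binned to a common m/z grid at several small bin offsets and keeps the best normalized dot product, so that a systematic mass drift does not destroy the match score.

      # Generic sliding dot product between binned spectra: try several small
      # bin offsets and keep the best normalized dot product, absorbing a
      # systematic m/z drift. Simplified illustration only; np.roll wraps
      # around, whereas a more faithful version would zero-pad the shift.
      import numpy as np

      def sliding_dot_product(query, library, max_shift=3):
          q = np.asarray(query, float)
          lib = np.asarray(library, float)
          q = q / (np.linalg.norm(q) or 1.0)
          lib = lib / (np.linalg.norm(lib) or 1.0)
          best = 0.0
          for shift in range(-max_shift, max_shift + 1):
              best = max(best, float(np.dot(q, np.roll(lib, shift))))
          return best

      print(sliding_dot_product([0, 1, 0, 2, 0], [1, 0, 2, 0, 0]))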

  13. Traversing the many paths of workflow research: developing a conceptual framework of workflow terminology through a systematic literature review

    PubMed Central

    Novak, Laurie L; Johnson, Kevin B; Lorenzi, Nancy M

    2010-01-01

    The objective of this review was to describe methods used to study and model workflow. The authors included studies set in a variety of industries using qualitative, quantitative and mixed methods. Of the 6221 matching abstracts, 127 articles were included in the final corpus. The authors collected data from each article on researcher perspective, study type, methods type, specific methods, approaches to evaluating quality of results, definition of workflow and dependent variables. Ethnographic observation and interviews were the most frequently used methods. Long study durations revealed the large time commitment required for descriptive workflow research. The most frequently discussed technique for evaluating quality of study results was triangulation. The definition of the term “workflow” and choice of methods for studying workflow varied widely across research areas and researcher perspectives. The authors developed a conceptual framework of workflow-related terminology for use in future research and present this model for use by other researchers. PMID:20442143

  14. SADI, SHARE, and the in silico scientific method

    PubMed Central

    2010-01-01

    Background The emergence and uptake of Semantic Web technologies by the Life Sciences provides exciting opportunities for exploring novel ways to conduct in silico science. Web Service Workflows are already becoming first-class objects in “the new way”, and serve as explicit, shareable, referenceable representations of how an experiment was done. In turn, Semantic Web Service projects aim to facilitate workflow construction by biological domain-experts such that workflows can be edited, re-purposed, and re-published by non-informaticians. However the aspects of the scientific method relating to explicit discourse, disagreement, and hypothesis generation have remained relatively impervious to new technologies. Results Here we present SADI and SHARE - a novel Semantic Web Service framework, and a reference implementation of its client libraries. Together, SADI and SHARE allow the semi- or fully-automatic discovery and pipelining of Semantic Web Services in response to ad hoc user queries. Conclusions The semantic behaviours exhibited by SADI and SHARE extend the functionalities provided by Description Logic Reasoners such that novel assertions can be automatically added to a data-set without logical reasoning, but rather by analytical or annotative services. This behaviour might be applied to achieve the “semantification” of those aspects of the in silico scientific method that are not yet supported by Semantic Web technologies. We support this suggestion using an example in the clinical research space. PMID:21210986

  15. Make Your Workflows Smarter

    NASA Technical Reports Server (NTRS)

    Jones, Corey; Kapatos, Dennis; Skradski, Cory

    2012-01-01

    Do you have workflows with many manual tasks that slow down your business? Or, do you scale back workflows because there are simply too many manual tasks? Basic workflow robots can automate some common tasks, but not everything. This presentation will show how advanced robots called "expression robots" can be set up to perform everything from simple tasks such as moving, creating folders, renaming, changing or creating an attribute, and revising, to more complex tasks like creating a PDF, or even launching a session of Creo Parametric and performing a specific modeling task. Expression robots are able to utilize the Java API and Info*Engine to do almost anything you can imagine! Best of all, these tools are supported by PTC and will work with later releases of Windchill. Limited knowledge of Java, Info*Engine, and XML is required. The attendee will learn what tasks expression robots are capable of performing. The attendee will learn what is involved in setting up an expression robot. The attendee will gain a basic understanding of simple Info*Engine tasks.

  16. RESTFul based heterogeneous Geoprocessing workflow interoperation for Sensor Web Service

    NASA Astrophysics Data System (ADS)

    Yang, Chao; Chen, Nengcheng; Di, Liping

    2012-10-01

    Advanced sensors on board satellites offer detailed Earth observations. A workflow is one approach for designing, implementing and constructing a flexible and live link between these sensors' resources and users. It can coordinate, organize and aggregate the distributed sensor Web services to meet the requirement of a complex Earth observation scenario. A RESTFul based workflow interoperation method is proposed to integrate heterogeneous workflows into an interoperable unit. The Atom protocols are applied to describe and manage workflow resources. The XML Process Definition Language (XPDL) and Business Process Execution Language (BPEL) workflow standards are applied to structure a workflow that accesses sensor information and one that processes it separately. Then, a scenario for nitrogen dioxide (NO2) from a volcanic eruption is used to investigate the feasibility of the proposed method. The RESTFul based workflows interoperation system can describe, publish, discover, access and coordinate heterogeneous Geoprocessing workflows.

  17. Worklist handling in workflow-enabled radiological application systems

    NASA Astrophysics Data System (ADS)

    Wendler, Thomas; Meetz, Kirsten; Schmidt, Joachim; von Berg, Jens

    2000-05-01

    For the next generation of integrated information systems for health care applications, more emphasis has to be put on systems which, by design, support the reduction of cost, the increase in efficiency and the improvement of the quality of services. A substantial contribution to this will be the modeling, optimization, automation and enactment of processes in health care institutions. One of the perceived key success factors for the system integration of processes will be the application of workflow management, with workflow management systems as key technology components. In this paper we address workflow management in radiology. We focus on an important aspect of workflow management, the generation and handling of worklists, which automatically provide workflow participants with work items that reflect tasks to be performed. The display of worklists and the functions associated with work items are the visible part for the end-users of an information system using a workflow management approach. Appropriate worklist design and implementation will influence the user friendliness of a system and will largely influence work efficiency. Technically, in current imaging department information system environments (modality-PACS-RIS installations), a data-driven approach has been taken: worklists -- if present at all -- are generated from filtered views on application databases. In a future workflow-based approach, worklists will be generated by autonomous workflow services based on explicit process models and organizational models. This process-oriented approach will provide us with an integral view of entire health care processes or sub-processes. The paper describes the basic mechanisms of this approach and summarizes its benefits.
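
    A toy sketch of the process-oriented idea described above, in which a workflow service derives role-based worklists from an explicit process model rather than from filtered database views. The process steps, roles, and case representation are hypothetical.

      # Toy illustration of process-oriented worklist generation: a workflow
      # service walks an explicit process model and pushes the next enabled
      # task of each case onto the worklist of the responsible role, instead
      # of deriving worklists from filtered database views. The model, roles
      # and case structure here are hypothetical.
      from collections import defaultdict

      PROCESS_MODEL = [("register", "clerk"), ("acquire", "technologist"), ("report", "radiologist")]

      def build_worklists(cases):
          """cases: {case_id: index of the next step to perform}."""
          worklists = defaultdict(list)
          for case_id, step in cases.items():
              if step < len(PROCESS_MODEL):
                  task, role = PROCESS_MODEL[step]
                  worklists[role].append((case_id, task))
          return dict(worklists)

      print(build_worklists({"pat-001": 0, "pat-002": 2}))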

  18. The complete digital workflow in fixed prosthodontics: a systematic review.

    PubMed

    Joda, Tim; Zarone, Fernando; Ferrari, Marco

    2017-09-19

    generation, allocation concealment, blinding, completeness of outcome data, selective reporting, and other bias using the Cochrane Collaboration tool. A judgment of risk of bias was assigned if one or more key domains had a high or unclear risk of bias. An official registration of the systematic review was not performed. The systematic search identified 67 titles, 32 abstracts thereof were screened, and subsequently, three full-texts included for data extraction. Analysed RCTs were heterogeneous without follow-up. One study demonstrated that fully digitally produced dental crowns revealed the feasibility of the process itself; however, the marginal precision was lower for lithium disilicate (LS2) restorations (113.8 μm) compared to conventional metal-ceramic (92.4 μm) and zirconium dioxide (ZrO2) crowns (68.5 μm) (p < 0.05). Another study showed that leucite-reinforced glass ceramic crowns were esthetically favoured by the patients (8/2 crowns) and clinicians (7/3 crowns) (p < 0.05). The third study investigated implant crowns. The complete digital workflow was more than twofold faster (75.3 min) in comparison to the mixed analog-digital workflow (156.6 min) (p < 0.05). No RCTs could be found investigating multi-unit fixed dental prostheses (FDP). The number of RCTs testing complete digital workflows in fixed prosthodontics is low. Scientifically proven recommendations for clinical routine cannot be given at this time. Research with high-quality trials seems to be slower than the industrial progress of available digital applications. Future research with well-designed RCTs including follow-up observation is compellingly necessary in the field of complete digital processing.

  19. Implementation of Cyberinfrastructure and Data Management Workflow for a Large-Scale Sensor Network

    NASA Astrophysics Data System (ADS)

    Jones, A. S.; Horsburgh, J. S.

    2014-12-01

    Monitoring with in situ environmental sensors and other forms of field-based observation presents many challenges for data management, particularly for large-scale networks consisting of multiple sites, sensors, and personnel. The availability and utility of these data in addressing scientific questions relies on effective cyberinfrastructure that facilitates transformation of raw sensor data into functional data products. It also depends on the ability of researchers to share and access the data in useable formats. In addition to addressing the challenges presented by the quantity of data, monitoring networks need practices to ensure high data quality, including procedures and tools for post processing. Data quality is further enhanced if practitioners are able to track equipment, deployments, calibrations, and other events related to site maintenance and associate these details with observational data. In this presentation we will describe the overall workflow that we have developed for research groups and sites conducting long term monitoring using in situ sensors. Features of the workflow include: software tools to automate the transfer of data from field sites to databases, a Python-based program for data quality control post-processing, a web-based application for online discovery and visualization of data, and a data model and web interface for managing physical infrastructure. By automating the data management workflow, the time from collection to analysis is reduced and sharing and publication is facilitated. The incorporation of metadata standards and descriptions and the use of open-source tools enhances the sustainability and reusability of the data. We will describe the workflow and tools that we have developed in the context of the iUTAH (innovative Urban Transitions and Aridregion Hydrosustainability) monitoring network. The iUTAH network consists of aquatic and climate sensors deployed in three watersheds to monitor Gradients Along Mountain to Urban
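
    As a minimal sketch of one post-processing quality-control step of the kind the workflow automates, the snippet below flags sensor readings that fall outside a plausible range so they can be reviewed before publication. The column names and thresholds are assumptions, not the iUTAH configuration.

      # Minimal quality-control sketch: flag raw sensor values outside a
      # plausible range for later review. Column names and thresholds are
      # assumptions for illustration only.
      import pandas as pd

      RANGES = {"water_temp_C": (-1.0, 40.0), "specific_conductance_uScm": (0.0, 5000.0)}

      def range_check(df: pd.DataFrame) -> pd.DataFrame:
          flagged = df.copy()
          for column, (lo, hi) in RANGES.items():
              if column in flagged:
                  bad = (flagged[column] < lo) | (flagged[column] > hi)
                  flagged[column + "_qc"] = bad.map({True: "out_of_range", False: "ok"})
          return flagged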

  20. COSMOS: Python library for massively parallel workflows

    PubMed Central

    Gafni, Erik; Luquette, Lovelace J.; Lancaster, Alex K.; Hawkins, Jared B.; Jung, Jae-Yoon; Souilmi, Yassine; Wall, Dennis P.; Tonellato, Peter J.

    2014-01-01

    Summary: Efficient workflows to shepherd clinically generated genomic data through the multiple stages of a next-generation sequencing pipeline are of critical importance in translational biomedical science. Here we present COSMOS, a Python library for workflow management that allows formal description of pipelines and partitioning of jobs. In addition, it includes a user interface for tracking the progress of jobs, abstraction of the queuing system and fine-grained control over the workflow. Workflows can be created on traditional computing clusters as well as cloud-based services. Availability and implementation: Source code is available for academic non-commercial research purposes. Links to code and documentation are provided at http://lpm.hms.harvard.edu and http://wall-lab.stanford.edu. Contact: dpwall@stanford.edu or peter_tonellato@hms.harvard.edu. Supplementary information: Supplementary data are available at Bioinformatics online. PMID:24982428

  1. COSMOS: Python library for massively parallel workflows.

    PubMed

    Gafni, Erik; Luquette, Lovelace J; Lancaster, Alex K; Hawkins, Jared B; Jung, Jae-Yoon; Souilmi, Yassine; Wall, Dennis P; Tonellato, Peter J

    2014-10-15

    Efficient workflows to shepherd clinically generated genomic data through the multiple stages of a next-generation sequencing pipeline are of critical importance in translational biomedical science. Here we present COSMOS, a Python library for workflow management that allows formal description of pipelines and partitioning of jobs. In addition, it includes a user interface for tracking the progress of jobs, abstraction of the queuing system and fine-grained control over the workflow. Workflows can be created on traditional computing clusters as well as cloud-based services. Source code is available for academic non-commercial research purposes. Links to code and documentation are provided at http://lpm.hms.harvard.edu and http://wall-lab.stanford.edu. dpwall@stanford.edu or peter_tonellato@hms.harvard.edu. Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press.

  2. Modelling and analysis of workflow for lean supply chains

    NASA Astrophysics Data System (ADS)

    Ma, Jinping; Wang, Kanliang; Xu, Lida

    2011-11-01

    Cross-organisational workflow systems are a component of enterprise information systems which support collaborative business processes among organisations in a supply chain. Currently, the majority of workflow systems are developed from the perspective of information modelling without considering the actual requirements of supply chain management. In this article, we focus on the modelling and analysis of cross-organisational workflow systems in the context of lean supply chain (LSC) using Petri nets. First, the article describes the assumed conditions of a cross-organisational workflow net according to the idea of LSC and then discusses the standardisation of collaborative business processes between organisations in the context of LSC. Second, the concept of labelled time Petri nets (LTPNs) is defined by combining labelled Petri nets with time Petri nets, and the concept of labelled time workflow nets (LTWNs) is also defined based on LTPNs. Cross-organisational labelled time workflow nets (CLTWNs) are then defined based on LTWNs. Third, the article proposes the notion of OR-silent CLTWNs and an approach for verifying the soundness of LTWNs and CLTWNs. Finally, the article illustrates how to use the proposed method with a simple example. The purpose of this research is to establish a formal method for the modelling and analysis of workflow systems for LSC. This study initiates a new perspective of research on cross-organisational workflow management and promotes the operation management of LSC in real-world settings.
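
    To make the token-based semantics that such workflow nets build on concrete, here is a minimal Petri-net firing sketch in Python; labels and timing intervals are omitted, and the places, transitions and initial marking are hypothetical:

      # Minimal Petri-net firing sketch: a transition is enabled when its input
      # places hold enough tokens, and firing it moves tokens from inputs to
      # outputs. The labels and time intervals of LTPNs/LTWNs are omitted here.
      marking = {"order_received": 1, "order_checked": 0, "order_shipped": 0}

      transitions = {                      # name: (tokens consumed, tokens produced)
          "check": ({"order_received": 1}, {"order_checked": 1}),
          "ship":  ({"order_checked": 1},  {"order_shipped": 1}),
      }

      def enabled(name: str) -> bool:
          consumes, _ = transitions[name]
          return all(marking[place] >= n for place, n in consumes.items())

      def fire(name: str) -> None:
          assert enabled(name), f"{name} is not enabled"
          consumes, produces = transitions[name]
          for place, n in consumes.items():
              marking[place] -= n
          for place, n in produces.items():
              marking[place] += n

      fire("check")
      fire("ship")
      print(marking)  # {'order_received': 0, 'order_checked': 0, 'order_shipped': 1}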

  3. JMS: An Open Source Workflow Management System and Web-Based Cluster Front-End for High Performance Computing.

    PubMed

    Brown, David K; Penkler, David L; Musyoka, Thommas M; Bishop, Özlem Tastan

    2015-01-01

    Complex computational pipelines are becoming a staple of modern scientific research. Often these pipelines are resource intensive and require days of computing time. In such cases, it makes sense to run them over high performance computing (HPC) clusters where they can take advantage of the aggregated resources of many powerful computers. In addition to this, researchers often want to integrate their workflows into their own web servers. In these cases, software is needed to manage the submission of jobs from the web interface to the cluster and then return the results once the job has finished executing. We have developed the Job Management System (JMS), a workflow management system and web interface for high performance computing (HPC). JMS provides users with a user-friendly web interface for creating complex workflows with multiple stages. It integrates this workflow functionality with the resource manager, a tool that is used to control and manage batch jobs on HPC clusters. As such, JMS combines workflow management functionality with cluster administration functionality. In addition, JMS provides developer tools including a code editor and the ability to version tools and scripts. JMS can be used by researchers from any field to build and run complex computational pipelines and provides functionality to include these pipelines in external interfaces. JMS is currently being used to house a number of bioinformatics pipelines at the Research Unit in Bioinformatics (RUBi) at Rhodes University. JMS is an open-source project and is freely available at https://github.com/RUBi-ZA/JMS.

  4. JMS: An Open Source Workflow Management System and Web-Based Cluster Front-End for High Performance Computing

    PubMed Central

    Brown, David K.; Penkler, David L.; Musyoka, Thommas M.; Bishop, Özlem Tastan

    2015-01-01

    Complex computational pipelines are becoming a staple of modern scientific research. Often these pipelines are resource intensive and require days of computing time. In such cases, it makes sense to run them over high performance computing (HPC) clusters where they can take advantage of the aggregated resources of many powerful computers. In addition to this, researchers often want to integrate their workflows into their own web servers. In these cases, software is needed to manage the submission of jobs from the web interface to the cluster and then return the results once the job has finished executing. We have developed the Job Management System (JMS), a workflow management system and web interface for high performance computing (HPC). JMS provides users with a user-friendly web interface for creating complex workflows with multiple stages. It integrates this workflow functionality with the resource manager, a tool that is used to control and manage batch jobs on HPC clusters. As such, JMS combines workflow management functionality with cluster administration functionality. In addition, JMS provides developer tools including a code editor and the ability to version tools and scripts. JMS can be used by researchers from any field to build and run complex computational pipelines and provides functionality to include these pipelines in external interfaces. JMS is currently being used to house a number of bioinformatics pipelines at the Research Unit in Bioinformatics (RUBi) at Rhodes University. JMS is an open-source project and is freely available at https://github.com/RUBi-ZA/JMS. PMID:26280450

  5. A Model of Workflow Composition for Emergency Management

    NASA Astrophysics Data System (ADS)

    Xin, Chen; Bin-ge, Cui; Feng, Zhang; Xue-hui, Xu; Shan-shan, Fu

    Commonly used workflow technology is not flexible enough to deal with concurrent emergency situations. The paper proposes a novel model for defining emergency plans, in which workflow segments appear as a constituent part. A formal abstraction, which contains four operations, is defined to compose workflow segments under constraint rules. A software system for constructing and composing business process resources is implemented and integrated into an Emergency Plan Management Application System.

  6. Improving adherence to the Epic Beacon ambulatory workflow.

    PubMed

    Chackunkal, Ellen; Dhanapal Vogel, Vishnuprabha; Grycki, Meredith; Kostoff, Diana

    2017-06-01

    Computerized physician order entry has been shown to significantly improve chemotherapy safety by reducing the number of prescribing errors. Epic's Beacon Oncology Information System of computerized physician order entry and electronic medication administration was implemented in Henry Ford Health System's ambulatory oncology infusion centers on 9 November 2013. Since that time, compliance with the infusion workflow had not been assessed. The objective of this study was to optimize the current workflow and improve compliance with this workflow in the ambulatory oncology setting. This study was a retrospective, quasi-experimental study which analyzed the composite workflow compliance rate of patient encounters from 9 to 23 November 2014. Based on this analysis, an intervention was identified and implemented in February 2015 to improve workflow compliance. The primary endpoint was to compare the composite compliance rate with the Beacon workflow before and after a pharmacy-initiated intervention. The intervention, which was education of infusion center staff, was initiated by ambulatory-based oncology pharmacists and implemented by a multi-disciplinary team of pharmacists and nurses. The composite compliance rate was then reassessed for patient encounters from 2 to 13 March 2015 in order to analyze the effects of the determined intervention on compliance. The initial analysis in November 2014 revealed a composite compliance rate of 38%, and data analysis after the intervention revealed a statistically significant increase in the composite compliance rate to 83% (p < 0.001). This study suggests that a pharmacist-initiated educational intervention can improve compliance with an ambulatory oncology infusion workflow.

  7. The standard-based open workflow system in GeoBrain (Invited)

    NASA Astrophysics Data System (ADS)

    Di, L.; Yu, G.; Zhao, P.; Deng, M.

    2013-12-01

    GeoBrain is an Earth science Web-service system developed and operated by the Center for Spatial Information Science and Systems, George Mason University. In GeoBrain, a standard-based open workflow system has been implemented to accommodate the automated processing of geospatial data through a set of complex geo-processing functions for advanced product generation. GeoBrain models complex geoprocessing at two levels, the conceptual and the concrete. At the conceptual level, the workflows exist in the form of data and service types defined by ontologies. The workflows at the conceptual level are called geo-processing models and are cataloged in GeoBrain as virtual product types. A conceptual workflow is instantiated into a concrete, executable workflow when a user requests a product that matches a virtual product type. Both conceptual and concrete workflows are encoded in Business Process Execution Language (BPEL). A BPEL workflow engine, called BPELPower, has been implemented to execute the workflow for product generation. A provenance capturing service has been implemented to generate ISO 19115-compliant complete product provenance metadata before and after the workflow execution. The generation of provenance metadata before the workflow execution allows users to examine the usability of the final product before the lengthy and expensive execution takes place. The three modes of workflow execution defined in ISO 19119, transparent, translucent, and opaque, are available in GeoBrain. A geoprocessing modeling portal has been developed to allow domain experts to develop geoprocessing models at the type level with the support of both data and service/processing ontologies. The geoprocessing models capture the knowledge of the domain experts and become the operational offerings of the products after a proper peer review of the models is conducted. Automated workflow composition has been experimented with successfully based on ontologies and artificial

  8. A workflow to preserve genome-quality tissue samples from plants in botanical gardens and arboreta.

    PubMed

    Gostel, Morgan R; Kelloff, Carol; Wallick, Kyle; Funk, Vicki A

    2016-09-01

    Internationally, gardens hold diverse living collections that can be preserved for genomic research. Workflows have been developed for genomic tissue sampling in other taxa (e.g., vertebrates), but are inadequate for plants. We outline a workflow for tissue sampling intended for two audiences: botanists interested in genomics research and garden staff who plan to voucher living collections. Standard herbarium methods are used to collect vouchers, label information and images are entered into a publicly accessible database, and leaf tissue is preserved in silica and liquid nitrogen. A five-step approach is presented for genomic tissue sampling from living collections according to current best practices. Collecting genome-quality samples from gardens is an economical and rapid way to make tissue from the diversity of plants on Earth available for scientific research. The Global Genome Initiative will facilitate and lead this endeavor through international partnerships.

  9. A WorkFlow Engine Oriented Modeling System for Hydrologic Sciences

    NASA Astrophysics Data System (ADS)

    Lu, B.; Piasecki, M.

    2009-12-01

    In recent years the use of workflow engines for carrying out modeling and data analysis tasks has gained increased attention in the science and engineering communities. Tasks like processing raw data coming from sensors and passing these raw data streams to filters for QA/QC procedures may require multiple and complicated steps that need to be repeated over and over again. A workflow sequence that carries out a number of steps of varying complexity is an ideal approach for dealing with these tasks because the sequence can be stored, called up and repeated again and again. This has several advantages: for one, it ensures repeatability of processing steps and with that provenance, an issue that is increasingly important in the science and engineering communities. It also permits the hand-off of lengthy, time-consuming and error-prone tasks to a chain of processing actions that are carried out automatically, reducing the chance of error on the one hand and freeing up time to carry out other tasks on the other. This paper presents the development of a workflow-engine-embedded modeling system that allows users to build up working sequences for carrying out numerical modeling tasks in the hydrologic sciences. Trident, which facilitates creating, running and sharing scientific data analysis workflows, is taken as the central working engine of the modeling system. The current functionality of the modeling system covers digital watershed processing, online data retrieval, hydrologic simulation and post-event analysis, stored as sequences or modules respectively. The sequences can be invoked to carry out their preset tasks in order, for example, triangulating a watershed from a raw DEM, whereas the modules, each encapsulating certain functions, can be selected and connected through a GUI workboard to form sequences. This modeling system is demonstrated by setting up a new sequence for simulating rainfall-runoff processes which

  10. Bioinformatics workflows and web services in systems biology made easy for experimentalists.

    PubMed

    Jimenez, Rafael C; Corpas, Manuel

    2013-01-01

    Workflows are useful to perform data analysis and integration in systems biology. Workflow management systems can help users create workflows without any previous knowledge of programming and web services. However, the computational skills required to build such workflows are usually above the level most biological experimentalists are comfortable with. In this chapter we introduce workflow management systems that reuse existing workflows instead of creating them, making it easier for experimentalists to perform computational tasks.

  11. A Tool Supporting Collaborative Data Analytics Workflow Design and Management

    NASA Astrophysics Data System (ADS)

    Zhang, J.; Bao, Q.; Lee, T. J.

    2016-12-01

    Collaborative experiment design could significantly enhance the sharing and adoption of the data analytics algorithms and models emerging in Earth science. Existing data-oriented workflow tools, however, are not suited to support collaborative design of such workflows: for example, they do not support real-time co-design, track how a workflow evolves over time based on changing designs contributed by multiple Earth scientists, or capture and retrieve collaboration knowledge on workflow design (the discussions that lead to a design). To address the aforementioned challenges, we have designed and developed a technique supporting collaborative data-oriented workflow composition and management, as a key component toward supporting big data collaboration through the Internet. Reproducibility and scalability are two major targets demanding fundamental infrastructural support. One outcome of the project is a software tool that supports an elastic number of groups of Earth scientists collaboratively designing and composing data analytics workflows through the Internet. Instead of recreating the wheel, we have extended an existing workflow tool, VisTrails, into an online collaborative environment as a proof of concept.

  12. Context-aware workflow management of mobile health applications.

    PubMed

    Salden, Alfons; Poortinga, Remco

    2006-01-01

    We propose a medical application management architecture that allows medical (IT) experts to readily design, develop and deploy context-aware mobile health (m-health) applications or services. In particular, we elaborate on how our application workflow management architecture enables chaining, coordinating, composing, and adapting context-sensitive medical application components such that critical Quality of Service (QoS) and Quality of Context (QoC) requirements typical for m-health applications or services can be met. This functional architectural support requires learning modules for distilling application-critical selections of attention and anticipation models. These models will help medical experts construct and adjust m-health application workflows and workflow strategies on the fly. We illustrate our context-aware workflow management paradigm for an m-health data delivery problem, in which optimal communication network configurations have to be determined.

  13. Impact of digital radiography on clinical workflow.

    PubMed

    May, G A; Deer, D D; Dackiewicz, D

    2000-05-01

    It is commonly accepted that digital radiography (DR) improves workflow and patient throughput compared with traditional film radiography or computed radiography (CR). DR eliminates the film development step and the time to acquire the image from a CR reader. In addition, the wide dynamic range of DR is such that the technologist can perform the quality-control (QC) step directly at the modality in a few seconds, rather than having to transport the newly acquired image to a centralized QC station for review. Furthermore, additional workflow efficiencies can be achieved with DR by employing tight radiology information system (RIS) integration. In the DR imaging environment, this provides for patient demographic information to be automatically downloaded from the RIS to populate the DR Digital Imaging and Communications in Medicine (DICOM) image header. To learn more about this workflow efficiency improvement, we performed a comparative study of workflow steps under three different conditions: traditional film/screen x-ray, DR without RIS integration (ie, manual entry of patient demographics), and DR with RIS integration. This study was performed at the Cleveland Clinic Foundation (Cleveland, OH) using a newly acquired amorphous silicon flat-panel DR system from Canon Medical Systems (Irvine, CA). Our data show that DR without RIS results in substantial workflow savings over traditional film/screen practice. There is an additional 30% reduction in total examination time using DR with RIS integration.

  14. Conventions and workflows for using Situs

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wriggers, Willy, E-mail: wriggers@biomachina.org

    2012-04-01

    Recent developments of the Situs software suite for multi-scale modeling are reviewed. Typical workflows and conventions encountered during processing of biophysical data from electron microscopy, tomography or small-angle X-ray scattering are described. Situs is a modular program package for the multi-scale modeling of atomic resolution structures and low-resolution biophysical data from electron microscopy, tomography or small-angle X-ray scattering. This article provides an overview of recent developments in the Situs package, with an emphasis on workflows and conventions that are important for practical applications. The modular design of the programs facilitates scripting in the bash shell that allows specific programs to be combined in creative ways that go beyond the original intent of the developers. Several scripting-enabled functionalities, such as flexible transformations of data type, the use of symmetry constraints or the creation of two-dimensional projection images, are described. The processing of low-resolution biophysical maps in such workflows follows not only first principles but often relies on implicit conventions. Situs conventions related to map formats, resolution, correlation functions and feature detection are reviewed and summarized. The compatibility of the Situs workflow with CCP4 conventions and programs is discussed.

  15. An MRM-based workflow for quantifying cardiac mitochondrial protein phosphorylation in murine and human tissue.

    PubMed

    Lam, Maggie P Y; Scruggs, Sarah B; Kim, Tae-Young; Zong, Chenggong; Lau, Edward; Wang, Ding; Ryan, Christopher M; Faull, Kym F; Ping, Peipei

    2012-08-03

    The regulation of mitochondrial function is essential for cardiomyocyte adaptation to cellular stress. While it has long been understood that phosphorylation regulates flux through metabolic pathways, novel phosphorylation sites are continually being discovered in all functionally distinct areas of the mitochondrial proteome. Extracting biologically meaningful information from these phosphorylation sites requires an adaptable, sensitive, specific and robust method for their quantification. Here we report a multiple reaction monitoring-based mass spectrometric workflow for quantifying site-specific phosphorylation of mitochondrial proteins. Specifically, chromatographic and mass spectrometric conditions for 68 transitions derived from 23 murine and human phosphopeptides, and their corresponding unmodified peptides, were optimized. These methods enabled the quantification of endogenous phosphopeptides from the outer mitochondrial membrane protein VDAC, and the inner membrane proteins ANT and ETC complexes I, III and V. The development of this quantitative workflow is a pivotal step for advancing our knowledge and understanding of the regulatory effects of mitochondrial protein phosphorylation in cardiac physiology and pathophysiology. This article is part of a Special Issue entitled: Translational Proteomics. Copyright © 2012 Elsevier B.V. All rights reserved.
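
    As a back-of-envelope illustration of the quantification idea, not the published workflow, and ignoring heavy-labelled standards and transition-specific response factors, site occupancy can be expressed from the peak areas of a phosphopeptide and its unmodified counterpart; the numbers below are hypothetical:

      # Back-of-envelope phosphorylation-site occupancy from MRM peak areas
      # (hypothetical values; real quantification also needs labelled standards
      # and per-transition response-factor calibration).
      def phospho_fraction(area_phospho: float, area_unmodified: float) -> float:
          return area_phospho / (area_phospho + area_unmodified)

      print(round(phospho_fraction(2.4e5, 7.6e5), 3))  # 0.24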

  16. Standardizing clinical trials workflow representation in UML for international site comparison.

    PubMed

    de Carvalho, Elias Cesar Araujo; Jayanti, Madhav Kishore; Batilana, Adelia Portero; Kozan, Andreia M O; Rodrigues, Maria J; Shah, Jatin; Loures, Marco R; Patil, Sunita; Payne, Philip; Pietrobon, Ricardo

    2010-11-09

    With the globalization of clinical trials, a growing emphasis has been placed on the standardization of the workflow in order to ensure the reproducibility and reliability of the overall trial. Despite the importance of workflow evaluation, to our knowledge no previous studies have attempted to adapt existing modeling languages to standardize the representation of clinical trials. Unified Modeling Language (UML) is a computational language that can be used to model operational workflow, and a UML profile can be developed to standardize UML models within a given domain. This paper's objective is to develop a UML profile to extend the UML Activity Diagram schema into the clinical trials domain, defining a standard representation for clinical trial workflow diagrams in UML. Two Brazilian clinical trial sites in rheumatology and oncology were examined to model their workflow and collect time-motion data. UML modeling was conducted in Eclipse, and a UML profile was developed to incorporate information used in discrete event simulation software. Ethnographic observation revealed bottlenecks in workflow: these included tasks requiring full commitment of CRCs, transferring notes from paper to computers, deviations from standard operating procedures, and conflicts between different IT systems. Time-motion analysis revealed that nurses' activities took up the most time in the workflow and contained a high frequency of shorter duration activities. Administrative assistants performed more activities near the beginning and end of the workflow. Overall, clinical trial tasks had a greater frequency than clinic routines or other general activities. This paper describes a method for modeling clinical trial workflow in UML and standardizing these workflow diagrams through a UML profile. In the increasingly global environment of clinical trials, the standardization of workflow modeling is a necessary precursor to conducting a comparative analysis of international clinical trials

  17. Standardizing Clinical Trials Workflow Representation in UML for International Site Comparison

    PubMed Central

    de Carvalho, Elias Cesar Araujo; Jayanti, Madhav Kishore; Batilana, Adelia Portero; Kozan, Andreia M. O.; Rodrigues, Maria J.; Shah, Jatin; Loures, Marco R.; Patil, Sunita; Payne, Philip; Pietrobon, Ricardo

    2010-01-01

    Background With the globalization of clinical trials, a growing emphasis has been placed on the standardization of the workflow in order to ensure the reproducibility and reliability of the overall trial. Despite the importance of workflow evaluation, to our knowledge no previous studies have attempted to adapt existing modeling languages to standardize the representation of clinical trials. Unified Modeling Language (UML) is a computational language that can be used to model operational workflow, and a UML profile can be developed to standardize UML models within a given domain. This paper's objective is to develop a UML profile to extend the UML Activity Diagram schema into the clinical trials domain, defining a standard representation for clinical trial workflow diagrams in UML. Methods Two Brazilian clinical trial sites in rheumatology and oncology were examined to model their workflow and collect time-motion data. UML modeling was conducted in Eclipse, and a UML profile was developed to incorporate information used in discrete event simulation software. Results Ethnographic observation revealed bottlenecks in workflow: these included tasks requiring full commitment of CRCs, transferring notes from paper to computers, deviations from standard operating procedures, and conflicts between different IT systems. Time-motion analysis revealed that nurses' activities took up the most time in the workflow and contained a high frequency of shorter duration activities. Administrative assistants performed more activities near the beginning and end of the workflow. Overall, clinical trial tasks had a greater frequency than clinic routines or other general activities. Conclusions This paper describes a method for modeling clinical trial workflow in UML and standardizing these workflow diagrams through a UML profile. In the increasingly global environment of clinical trials, the standardization of workflow modeling is a necessary precursor to conducting a comparative

  18. Workflow as a Service in the Cloud: Architecture and Scheduling Algorithms

    PubMed Central

    Wang, Jianwu; Korambath, Prakashan; Altintas, Ilkay; Davis, Jim; Crawl, Daniel

    2017-01-01

    With more and more workflow systems adopting the cloud as their execution environment, it becomes increasingly challenging to efficiently manage various workflows, virtual machines (VMs) and workflow execution on VM instances. To make the system scalable and easy to extend, we design a Workflow as a Service (WFaaS) architecture with independent services. A core part of the architecture is how to efficiently respond to continuous workflow requests from users and schedule their executions in the cloud. Based on different targets, we propose four heuristic workflow scheduling algorithms for the WFaaS architecture, and analyze the differences and best usages of the algorithms in terms of performance, cost and the price/performance ratio via experimental studies. PMID:29399237

  19. Workflow as a Service in the Cloud: Architecture and Scheduling Algorithms.

    PubMed

    Wang, Jianwu; Korambath, Prakashan; Altintas, Ilkay; Davis, Jim; Crawl, Daniel

    2014-01-01

    With more and more workflow systems adopting the cloud as their execution environment, it becomes increasingly challenging to efficiently manage various workflows, virtual machines (VMs) and workflow execution on VM instances. To make the system scalable and easy to extend, we design a Workflow as a Service (WFaaS) architecture with independent services. A core part of the architecture is how to efficiently respond to continuous workflow requests from users and schedule their executions in the cloud. Based on different targets, we propose four heuristic workflow scheduling algorithms for the WFaaS architecture, and analyze the differences and best usages of the algorithms in terms of performance, cost and the price/performance ratio via experimental studies.
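
    A toy sketch of the kind of price/performance trade-off such heuristics navigate (not the paper's four algorithms): each incoming workflow request is assigned to the VM type that minimizes a weighted combination of estimated runtime and cost; the VM catalogue, runtime model and weights are hypothetical:

      # Toy greedy assignment of a workflow request to a VM type by a weighted
      # runtime/cost score (hypothetical catalogue and model; not the four
      # heuristics proposed in the paper).
      VM_TYPES = {            # name: (relative speed, $ per hour)
          "small":  (1.0, 0.05),
          "medium": (2.0, 0.12),
          "large":  (4.0, 0.30),
      }

      def pick_vm(workload_hours: float, cost_weight: float = 0.5) -> str:
          def score(vm: str) -> float:
              speed, price = VM_TYPES[vm]
              runtime = workload_hours / speed      # estimated hours on this VM
              cost = runtime * price                # estimated dollars
              return (1 - cost_weight) * runtime + cost_weight * cost
          return min(VM_TYPES, key=score)

      for w in (0.0, 0.5, 1.0):
          print("cost_weight", w, "->", pick_vm(8, cost_weight=w))
      # Pure-runtime and balanced weights pick 'large'; pure-cost picks 'small'.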

  20. Design and implementation of workflow engine for service-oriented architecture

    NASA Astrophysics Data System (ADS)

    Peng, Shuqing; Duan, Huining; Chen, Deyun

    2009-04-01

    As computer networks have developed rapidly and enterprise applications have become increasingly distributed, traditional workflow engines show some deficiencies, such as complex structure, poor stability, poor portability, little reusability and difficult maintenance. In this paper, in order to improve the stability, scalability and flexibility of workflow management systems, a four-layer architecture for a workflow engine based on SOA is put forward according to the XPDL standard of the Workflow Management Coalition, the route control mechanism in the control model is accomplished, the scheduling strategy for cyclic routing and acyclic routing is designed, and the workflow engine is implemented using technologies such as XML, JSP and EJB.

  1. Replication and robustness in developmental research.

    PubMed

    Duncan, Greg J; Engel, Mimi; Claessens, Amy; Dowsett, Chantelle J

    2014-11-01

    Replications and robustness checks are key elements of the scientific method and a staple in many disciplines. However, leading journals in developmental psychology rarely include explicit replications of prior research conducted by different investigators, and few require authors to establish in their articles or online appendices that their key results are robust across estimation methods, data sets, and demographic subgroups. This article makes the case for prioritizing both explicit replications and, especially, within-study robustness checks in developmental psychology. It provides evidence on variation in effect sizes in developmental studies and documents strikingly different replication and robustness-checking practices in a sample of journals in developmental psychology and a sister behavioral science, applied economics. Our goal is not to show that any one behavioral science has a monopoly on best practices, but rather to show how journals from a related discipline address vital concerns of replication and generalizability shared by all social and behavioral sciences. We provide recommendations for promoting graduate training in replication and robustness-checking methods and for editorial policies that encourage these practices. Although some of our recommendations may shift the form and substance of developmental research articles, we argue that they would generate considerable scientific benefits for the field. (PsycINFO Database Record (c) 2014 APA, all rights reserved).

  2. SU-F-BRD-05: Robustness of Dose Painting by Numbers in Proton Therapy

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Montero, A Barragan; Sterpin, E; Lee, J

    Purpose: Proton range uncertainties may cause important dose perturbations within the target volume, especially when steep dose gradients are present as in dose painting. The aim of this study is to assess the robustness against setup and range errors for highly heterogeneous dose prescriptions (i.e., dose painting by numbers), delivered by proton pencil beam scanning. Methods: An automatic workflow, based on MATLAB functions, was implemented through scripting in RayStation (RaySearch Laboratories). It performs a gradient-based segmentation of the dose painting volume from 18FDG-PET images (GTVPET), and calculates the dose prescription as a linear function of the FDG-uptake value on each voxel. The workflow was applied to two patients with head and neck cancer. Robustness against setup and range errors of the conventional PTV margin strategy (prescription dilated by 2.5 mm) versus CTV-based (minimax) robust optimization (2.5 mm setup, 3% range error) was assessed by comparing the prescription with the planned dose for a set of error scenarios. Results: In order to ensure dose coverage above 95% of the prescribed dose in more than 95% of the GTVPET voxels while compensating for the uncertainties, the plans with a PTV generated a high overdose. For the nominal case, up to 35% of the GTVPET received doses 5% beyond prescription. For the worst of the evaluated error scenarios, the volume with 5% overdose increased to 50%. In contrast, for CTV-based plans this 5% overdose was present only in a small fraction of the GTVPET, which ranged from 7% in the nominal case to 15% in the worst of the evaluated scenarios. Conclusion: The use of a PTV leads to non-robust dose distributions with excessive overdose in the painted volume. In contrast, robust optimization yields robust dose distributions with limited overdose. RaySearch Laboratories is sincerely acknowledged for providing us with the RayStation treatment planning system and for the support provided.
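
    A minimal numerical sketch of the two ingredients described above, assuming hypothetical uptake values and linear-mapping coefficients (this is not the RayStation/MATLAB scripting used in the study): a voxel-wise dose prescription that is a linear function of normalized FDG uptake, and the 95%-of-prescription-in-95%-of-voxels coverage check:

      # Dose painting by numbers, toy version: prescribe dose per voxel as a
      # linear function of normalized FDG uptake, then check the coverage
      # criterion (>= 95% of voxels receive >= 95% of their prescription).
      # Uptake values, dose range and the fake "planned" dose are hypothetical.
      import numpy as np

      def prescription(uptake, d_min=60.0, d_max=80.0):
          """Map normalized uptake in [0, 1] linearly onto [d_min, d_max] Gy."""
          return d_min + (d_max - d_min) * np.clip(uptake, 0.0, 1.0)

      def coverage_ok(planned, prescribed, dose_frac=0.95, vol_frac=0.95):
          return np.mean(planned >= dose_frac * prescribed) >= vol_frac

      rng = np.random.default_rng(0)
      uptake = rng.uniform(0.0, 1.0, size=10_000)                      # fake GTV voxels
      prescribed = prescription(uptake)
      planned = prescribed * rng.normal(1.0, 0.02, size=uptake.size)   # fake plan
      print(coverage_ok(planned, prescribed))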

  3. Workflow Dynamics and the Imaging Value Chain: Quantifying the Effect of Designating a Nonimage-Interpretive Task Workflow.

    PubMed

    Lee, Matthew H; Schemmel, Andrew J; Pooler, B Dustin; Hanley, Taylor; Kennedy, Tabassum A; Field, Aaron S; Wiegmann, Douglas; Yu, John-Paul J

    To assess the impact of separate non-image interpretive task (NIT) and image-interpretive task (IIT) workflows in an academic neuroradiology practice. A prospective, randomized, observational investigation of a centralized academic neuroradiology reading room was performed. The primary reading room fellow was observed over a one-month period using a time-and-motion methodology, recording the frequency and duration of tasks performed. Tasks were categorized into separate image-interpretive and non-image interpretive workflows. Post-intervention observation of the primary fellow was repeated following the implementation of a consult assistant (CA) responsible for non-image interpretive tasks. Pre- and post-intervention data were compared. Following separation of image-interpretive and non-image interpretive workflows, time spent on image-interpretive tasks by the primary fellow increased from 53.8% to 73.2% while non-image interpretive tasks decreased from 20.4% to 4.4%. Mean duration of image interpretation nearly doubled, from 05:44 to 11:01 (p = 0.002). Decreases in specific non-image interpretive tasks, including phone calls/paging (2.86/hr versus 0.80/hr), in-room consultations (1.36/hr versus 0.80/hr), and protocoling (0.99/hr versus 0.10/hr), were observed. The consult assistant experienced 29.4 task switching events (TSEs) per hour. Rates of specific non-image interpretive tasks for the CA were 6.41/hr for phone calls/paging, 3.60/hr for in-room consultations, and 3.83/hr for protocoling. Separating responsibilities into NIT and IIT workflows substantially increased image interpretation time and decreased TSEs for the primary fellow. Consolidation of NITs into a separate workflow may allow for more efficient task completion. Copyright © 2017 Elsevier Inc. All rights reserved.

  4. A standard-enabled workflow for synthetic biology.

    PubMed

    Myers, Chris J; Beal, Jacob; Gorochowski, Thomas E; Kuwahara, Hiroyuki; Madsen, Curtis; McLaughlin, James Alastair; Mısırlı, Göksel; Nguyen, Tramy; Oberortner, Ernst; Samineni, Meher; Wipat, Anil; Zhang, Michael; Zundel, Zach

    2017-06-15

    A synthetic biology workflow is composed of data repositories that provide information about genetic parts, sequence-level design tools to compose these parts into circuits, visualization tools to depict these designs, genetic design tools to select parts to create systems, and modeling and simulation tools to evaluate alternative design choices. Data standards enable the ready exchange of information within such a workflow, allowing repositories and tools to be connected from a diversity of sources. The present paper describes one such workflow that utilizes, among others, the Synthetic Biology Open Language (SBOL) to describe genetic designs, the Systems Biology Markup Language to model these designs, and SBOL Visual to visualize these designs. We describe how a standard-enabled workflow can be used to produce types of design information, including multiple repositories and software tools exchanging information using a variety of data standards. Recently, the ACS Synthetic Biology journal has recommended the use of SBOL in their publications. © 2017 The Author(s); published by Portland Press Limited on behalf of the Biochemical Society.

  5. A three-level atomicity model for decentralized workflow management systems

    NASA Astrophysics Data System (ADS)

    Ben-Shaul, Israel Z.; Heineman, George T.

    1996-12-01

    A workflow management system (WFMS) employs a workflow manager (WM) to execute and automate the various activities within a workflow. To protect the consistency of data, the WM encapsulates each activity with a transaction; a transaction manager (TM) then guarantees the atomicity of activities. Since workflows often group several activities together, the TM is responsible for guaranteeing the atomicity of these units. There are scalability issues, however, with centralized WFMSs. Decentralized WFMSs provide an architecture for multiple autonomous WFMSs to interoperate, thus accommodating multiple workflows and geographically-dispersed teams. When atomic units are composed of activities spread across multiple WFMSs, however, there is a conflict between global atomicity and local autonomy of each WFMS. This paper describes a decentralized atomicity model that enables workflow administrators to specify the scope of multi-site atomicity based upon the desired semantics of multi-site tasks in the decentralized WFMS. We describe an architecture that realizes our model and execution paradigm.

  6. Data processing workflows from low-cost digital survey to various applications: three case studies of Chinese historic architecture

    NASA Astrophysics Data System (ADS)

    Sun, Z.; Cao, Y. K.

    2015-08-01

    The paper focuses on the versatility of data processing workflows ranging from BIM-based survey to structural analysis and reverse modeling. In China nowadays, a large number of historic buildings are in need of restoration, reinforcement and renovation, but architects are not prepared for the shift from the booming AEC industry to architectural preservation. As surveyors working with architects in such projects, we have to develop efficient, low-cost digital survey workflows that are robust to various types of architecture, and to process the captured data for architects. Although laser scanning yields high accuracy in architectural heritage documentation and the workflow is quite straightforward, its cost and portability hinder it from being used in projects where budget and efficiency are of prime concern. We integrate Structure from Motion techniques with UAV and total station in data acquisition. The captured data are processed for various purposes, illustrated with three case studies: the first is as-built BIM for a historic building based on point clouds registered according to Ground Control Points; the second concerns structural analysis for a damaged bridge using Finite Element Analysis software; the last relates to parametric automated feature extraction from captured point clouds for reverse modeling and fabrication.

  7. CamBAfx: Workflow Design, Implementation and Application for Neuroimaging

    PubMed Central

    Ooi, Cinly; Bullmore, Edward T.; Wink, Alle-Meije; Sendur, Levent; Barnes, Anna; Achard, Sophie; Aspden, John; Abbott, Sanja; Yue, Shigang; Kitzbichler, Manfred; Meunier, David; Maxim, Voichita; Salvador, Raymond; Henty, Julian; Tait, Roger; Subramaniam, Naresh; Suckling, John

    2009-01-01

    CamBAfx is a workflow application designed for both researchers who use workflows to process data (consumers) and those who design them (designers). It provides a front-end (user interface) optimized for data processing designed in a way familiar to consumers. The back-end uses a pipeline model to represent workflows since this is a common and useful metaphor used by designers and is easy to manipulate compared to other representations like programming scripts. As an Eclipse Rich Client Platform application, CamBAfx's pipelines and functions can be bundled with the software or downloaded post-installation. The user interface contains all the workflow facilities expected by consumers. Using the Eclipse Extension Mechanism designers are encouraged to customize CamBAfx for their own pipelines. CamBAfx wraps a workflow facility around neuroinformatics software without modification. CamBAfx's design, licensing and Eclipse Branding Mechanism allow it to be used as the user interface for other software, facilitating exchange of innovative computational tools between originating labs. PMID:19826470

  8. Identifying impact of software dependencies on replicability of biomedical workflows.

    PubMed

    Miksa, Tomasz; Rauber, Andreas; Mina, Eleni

    2016-12-01

    Complex data driven experiments form the basis of biomedical research. Recent findings warn that the context in which the software is run, that is the infrastructure and the third party dependencies, can have a crucial impact on the final results delivered by a computational experiment. This implies that in order to replicate the same result, not only the same data must be used, but also it must be run on an equivalent software stack. In this paper we present the VFramework that enables assessing replicability of workflows. It identifies whether any differences in software dependencies among two executions of the same workflow exist and whether they have impact on the produced results. We also conduct a case study in which we investigate the impact of software dependencies on replicability of Taverna workflows used in biomedical research of Huntington's disease. We re-execute analysed workflows in environments differing in operating system distribution and configuration. The results show that the VFramework can be used to identify the impact of software dependencies on the replicability of biomedical workflows. Furthermore, we observe that despite the fact that the workflows are executed in a controlled environment, they still depend on specific tools installed in the environment. The context model used by the VFramework improves the deficiencies of provenance traces and documents also such tools. Based on our findings we define guidelines for workflow owners that enable them to improve replicability of their workflows. Copyright © 2016 Elsevier Inc. All rights reserved.
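
    A minimal sketch of the basic comparison such an assessment builds on (not the VFramework itself): diffing the package/version listings of two execution environments to surface dependencies that could explain differences in results; the two environment listings below are hypothetical:

      # Diff two execution environments' package listings to spot dependency
      # drift (hypothetical listings; not the VFramework's context model).
      env_a = {"python": "2.7.6", "numpy": "1.8.2", "taverna-cli": "2.5.0"}
      env_b = {"python": "2.7.12", "numpy": "1.8.2", "blast+": "2.2.31"}

      def diff_environments(a: dict, b: dict) -> dict:
          return {
              "only_in_a": sorted(set(a) - set(b)),
              "only_in_b": sorted(set(b) - set(a)),
              "version_mismatch": sorted(k for k in set(a) & set(b) if a[k] != b[k]),
          }

      print(diff_environments(env_a, env_b))
      # {'only_in_a': ['taverna-cli'], 'only_in_b': ['blast+'], 'version_mismatch': ['python']}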

  9. Workflow and Electronic Health Records in Small Medical Practices

    PubMed Central

    Ramaiah, Mala; Subrahmanian, Eswaran; Sriram, Ram D; Lide, Bettijoyce B

    2012-01-01

    This paper analyzes the workflow and implementation of electronic health record (EHR) systems across different functions in small physician offices. We characterize the differences in the offices based on the levels of computerization in terms of workflow, sources of time delay, and barriers to using EHR systems to support the entire workflow. The study was based on a combination of questionnaires, interviews, in situ observations, and data collection efforts. This study was not intended to be a full-scale time-and-motion study with precise measurements but was intended to provide an overview of the potential sources of delays while performing office tasks. The study follows an interpretive model of case studies rather than a large-sample statistical survey of practices. To identify time-consuming tasks, workflow maps were created based on the aggregated data from the offices. The results from the study show that specialty physicians are more favorable toward adopting EHR systems than primary care physicians are. The barriers to adoption of EHR systems by primary care physicians can be attributed to the complex workflows that exist in primary care physician offices, leading to nonstandardized workflow structures and practices. Also, primary care physicians would benefit more from EHR systems if the systems could interact with external entities. PMID:22737096

  10. An Auto-management Thesis Program WebMIS Based on Workflow

    NASA Astrophysics Data System (ADS)

    Chang, Li; Jie, Shi; Weibo, Zhong

    An auto-management WebMIS based on workflow for a bachelor thesis program is given in this paper. A module used for workflow dispatching is designed and realized using MySQL and J2EE according to the working principles of a workflow engine. The module can automatically dispatch the workflow according to the system date, login information and the work status of the user. The WebMIS shifts management from manual work to computer-based work, which not only standardizes the thesis program but also keeps the data and documents clean and consistent.

  11. Flexible End2End Workflow Automation of Hit-Discovery Research.

    PubMed

    Holzmüller-Laue, Silke; Göde, Bernd; Thurow, Kerstin

    2014-08-01

    The article considers a new approach to more complex laboratory automation at the workflow layer. The authors propose the automation of end2end workflows. The combination of all relevant subprocesses, whether automated or manually performed and regardless of the organizational unit in which they run, results in end2end processes that include all result dependencies. The end2end approach focuses not only on the classical experiments in synthesis or screening, but also on auxiliary processes such as the production and storage of chemicals, cell culturing, and maintenance, as well as preparatory activities and analyses of experiments. Furthermore, the connection of control flow and data flow in the same process model reduces the effort of data transfer between the involved systems, including the necessary data transformations. This end2end laboratory automation can be realized effectively with the modern methods of business process management (BPM). This approach is based on the new standardization of the process-modeling notation, Business Process Model and Notation 2.0. In drug discovery, several scientific disciplines act together with manifold modern methods, technologies, and a wide range of automated instruments for the discovery and design of target-based drugs. The article discusses the novel BPM-based automation concept with an implemented example of a high-throughput screening of previously synthesized compound libraries. © 2014 Society for Laboratory Automation and Screening.

  12. Decaf: Decoupled Dataflows for In Situ High-Performance Workflows

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dreher, M.; Peterka, T.

    Decaf is a dataflow system for the parallel communication of coupled tasks in an HPC workflow. The dataflow can perform arbitrary data transformations ranging from simply forwarding data to complex data redistribution. Decaf does this by allowing the user to allocate resources and execute custom code in the dataflow. All communication through the dataflow is efficient parallel message passing over MPI. The runtime for calling tasks is entirely message-driven; Decaf executes a task when all messages for the task have been received. Such a message-driven runtime allows cyclic task dependencies in the workflow graph, for example, to enact computational steering based on the result of downstream tasks. Decaf includes a simple Python API for describing the workflow graph. This allows Decaf to stand alone as a complete workflow system, but Decaf can also be used as the dataflow layer by one or more other workflow systems to form a heterogeneous task-based computing environment. In one experiment, we couple a molecular dynamics code with a visualization tool using the FlowVR and Damaris workflow systems and Decaf for the dataflow. In another experiment, we test the coupling of a cosmology code with Voronoi tessellation and density estimation codes using MPI for the simulation, the DIY programming model for the two analysis codes, and Decaf for the dataflow. Such workflows consisting of heterogeneous software infrastructures exist because components are developed separately with different programming models and runtimes, and this is the first time that such heterogeneous coupling of diverse components was demonstrated in situ on HPC systems.
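
    As a toy illustration of a message-driven runtime (this is not the Decaf Python API), the sketch below declares the input links each task expects and executes the task only after a message has arrived on every one of them; the task and link names are hypothetical:

      # Toy message-driven runtime: a task runs only once all of its declared
      # input messages have been delivered (hypothetical tasks and links; not
      # the Decaf API).
      class MessageDrivenRunner:
          def __init__(self, tasks):
              # tasks: {task_name: set of input link names it must receive}
              self.waiting = {t: set(links) for t, links in tasks.items()}
              self.inbox = {t: {} for t in tasks}

          def deliver(self, task, link, payload):
              self.inbox[task][link] = payload
              self.waiting[task].discard(link)
              if not self.waiting[task]:
                  self.run(task)

          def run(self, task):
              print(f"{task} runs with inputs {sorted(self.inbox[task])}")

      runner = MessageDrivenRunner({"render": {"sim_frame", "camera"}})
      runner.deliver("render", "camera", {"pos": (0, 0, 5)})   # still waiting
      runner.deliver("render", "sim_frame", b"...")            # both arrived -> runs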

  13. Schedule-Aware Workflow Management Systems

    NASA Astrophysics Data System (ADS)

    Mans, Ronny S.; Russell, Nick C.; van der Aalst, Wil M. P.; Moleman, Arnold J.; Bakker, Piet J. M.

    Contemporary workflow management systems offer work-items to users through specific work-lists. Users select the work-items they will perform without having a specific schedule in mind. However, in many environments work needs to be scheduled and performed at particular times. For example, in hospitals many work-items are linked to appointments, e.g., a doctor cannot perform surgery without reserving an operating theater and making sure that the patient is present. One of the problems when applying workflow technology in such domains is the lack of calendar-based scheduling support. In this paper, we present an approach that supports the seamless integration of unscheduled (flow) and scheduled (schedule) tasks. Using CPN Tools we have developed a specification and simulation model for schedule-aware workflow management systems. Based on this a system has been realized that uses YAWL, Microsoft Exchange Server 2007, Outlook, and a dedicated scheduling service. The approach is illustrated using a real-life case study at the AMC hospital in the Netherlands. In addition, we elaborate on the experiences obtained when developing and implementing a system of this scale using formal techniques.

  14. Text mining meets workflow: linking U-Compare with Taverna

    PubMed Central

    Kano, Yoshinobu; Dobson, Paul; Nakanishi, Mio; Tsujii, Jun'ichi; Ananiadou, Sophia

    2010-01-01

    Summary: Text mining from the biomedical literature is of increasing importance, yet it is not easy for the bioinformatics community to create and run text mining workflows due to the lack of accessibility and interoperability of the text mining resources. The U-Compare system provides a wide range of bio text mining resources in a highly interoperable workflow environment where workflows can very easily be created, executed, evaluated and visualized without coding. We have linked U-Compare to Taverna, a generic workflow system, to expose text mining functionality to the bioinformatics community. Availability: http://u-compare.org/taverna.html, http://u-compare.org Contact: kano@is.s.u-tokyo.ac.jp Supplementary information: Supplementary data are available at Bioinformatics online. PMID:20709690

  15. iCollections methodology: workflow, results and lessons learned.

    PubMed

    Blagoderov, Vladimir; Penn, Malcolm; Sadka, Mike; Hine, Adrian; Brooks, Stephen; Siebert, Darrell J; Sleep, Chris; Cafferty, Steve; Cane, Elisa; Martin, Geoff; Toloni, Flavia; Wing, Peter; Chainey, John; Duffell, Liz; Huxley, Rob; Ledger, Sophie; McLaughlin, Caitlin; Mazzetta, Gerardo; Perera, Jasmin; Crowther, Robyn; Douglas, Lyndsey; Durant, Joanna; Honey, Martin; Huertas, Blanca; Howard, Theresa; Carter, Victoria; Albuquerque, Sara; Paterson, Gordon; Kitching, Ian J

    2017-01-01

    The Natural History Museum, London (NHMUK) has embarked on an ambitious programme to digitise its collections. The first phase of this programme was to undertake a series of pilot projects to develop the workflows and infrastructure needed to support mass digitisation of very large scientific collections. This paper presents the results of one of the pilot projects - iCollections. This project digitised all the lepidopteran specimens usually considered as butterflies, 181,545 specimens representing 89 species from the British Isles and Ireland. The data digitised include species name, georeferenced location, collector and collection date - the what, where, who and when of specimen data. In addition, a digital image of each specimen was taken. A previous paper explained the way the data were obtained and the background to the collections that made up the project. The present paper describes the technical, logistical, and economic aspects of managing the project.

  16. Integrating prediction, provenance, and optimization into high energy workflows

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Schram, M.; Bansal, V.; Friese, R. D.

    We propose a novel approach for efficient execution of workflows on distributed resources. The key components of this framework include: performance modeling to quantitatively predict workflow component behavior; optimization-based scheduling such as choosing an optimal subset of resources to meet demand and assignment of tasks to resources; distributed I/O optimizations such as prefetching; and provenance methods for collecting performance data. In preliminary results, these techniques improve throughput on a small Belle II workflow by 20%.

  17. The Frictionless Data Package: Data Containerization for Automated Scientific Workflows

    NASA Astrophysics Data System (ADS)

    Shepherd, A.; Fils, D.; Kinkade, D.; Saito, M. A.

    2017-12-01

    As cross-disciplinary geoscience research increasingly relies on machines to discover and access data, one of the critical questions facing data repositories is how data and supporting materials should be packaged for consumption. Traditionally, data repositories have relied on a human's involvement throughout discovery and access workflows. This human could assess fitness for purpose by reading loosely coupled, unstructured information from web pages and documentation. In attempts to shorten the time to science and access data resources across many disciplines, expectations for machines to mediate the process of discovery and access are challenging data repository infrastructure. The challenge is to deliver data and information in ways that enable machines to make better decisions, by enabling them to understand the data and metadata of many data types. Additionally, once machines have recommended a data resource as relevant to an investigator's needs, the data resource should be easy to integrate into that investigator's toolkits for analysis and visualization. The Biological and Chemical Oceanography Data Management Office (BCO-DMO) supports NSF-funded OCE and PLR investigators with their projects' data management needs. These needs involve a number of varying data types, some of which require multiple files with differing formats. Presently, BCO-DMO has described these data types and the important relationships between each type's data files through human-readable documentation on web pages. For machines directly accessing data files from BCO-DMO, this documentation could be overlooked, leading to misinterpretation of the data. Instead, BCO-DMO is exploring the idea of data containerization, or packaging data and related information for easier transport, interpretation, and use. In researching the landscape of data containerization, the Frictionless Data Package (http://frictionlessdata.io/) provides a number of valuable advantages over similar
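
    A minimal sketch of what such containerization can look like, assuming a hypothetical CSV resource: a datapackage.json descriptor in the Frictionless Data Package layout of named resources with typed table schemas, written here with only the Python standard library:

      # Containerize a dataset as a minimal Frictionless Data Package descriptor:
      # name the resource, point at its file, and carry a typed schema so a
      # machine can interpret the columns without consulting web documentation.
      # The dataset, column names and file path are hypothetical.
      import json

      descriptor = {
          "name": "bottle-nutrients-example",
          "title": "Example bottle nutrient concentrations",
          "resources": [{
              "name": "nutrients",
              "path": "data/nutrients.csv",
              "format": "csv",
              "schema": {
                  "fields": [
                      {"name": "station", "type": "string"},
                      {"name": "depth_m", "type": "number"},
                      {"name": "nitrate_umol_kg", "type": "number"},
                  ]
              },
          }],
      }

      with open("datapackage.json", "w") as fh:
          json.dump(descriptor, fh, indent=2)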

  18. Integration of Earth System Models and Workflow Management under iRODS for the Northeast Regional Earth System Modeling Project

    NASA Astrophysics Data System (ADS)

    Lengyel, F.; Yang, P.; Rosenzweig, B.; Vorosmarty, C. J.

    2012-12-01

    The Northeast Regional Earth System Model (NE-RESM, NSF Award #1049181) integrates weather research and forecasting models, terrestrial and aquatic ecosystem models, a water balance/transport model, and mesoscale and energy systems input-output economic models developed by an interdisciplinary research team from academia and government with expertise in physics, biogeochemistry, engineering, energy, economics, and policy. NE-RESM is intended to forecast the implications of planning decisions on the region's environment, ecosystem services, energy systems and economy through the 21st century. Integration of model components and the development of cyberinfrastructure for interacting with the system are facilitated with the integrated Rule Oriented Data System (iRODS), a distributed data grid that provides archival storage with metadata facilities and a rule-based workflow engine for automating and auditing scientific workflows.

  19. A Comprehensive Workflow of Mass Spectrometry-Based Untargeted Metabolomics in Cancer Metabolic Biomarker Discovery Using Human Plasma and Urine

    PubMed Central

    Zou, Wei; She, Jianwen; Tolstikov, Vladimir V.

    2013-01-01

    Currently available biomarkers lack sensitivity and/or specificity for early detection of cancer. To address this challenge, a robust and complete workflow for metabolic profiling and data mining is described in detail. Three independent and complementary analytical techniques for metabolic profiling are applied: hydrophilic interaction liquid chromatography (HILIC–LC), reversed-phase liquid chromatography (RP–LC), and gas chromatography (GC). All three techniques are coupled to a mass spectrometer (MS) in the full scan acquisition mode, and both unsupervised and supervised methods are used for data mining. Univariate and multivariate feature selection are used to determine subsets of potentially discriminative predictors. These predictors are further identified by obtaining accurate masses and isotopic ratios using selected ion monitoring (SIM) and data-dependent MS/MS and/or accurate mass MSn ion tree scans utilizing high resolution MS. A list combining all of the identified potential biomarkers generated from different platforms and algorithms is used for pathway analysis. Such a workflow combining comprehensive metabolic profiling and advanced data mining techniques may provide a powerful approach for metabolic pathway analysis and biomarker discovery in cancer research. Two case studies with previously published data are adapted and included in the context to elucidate the application of the workflow. PMID:24958150
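
    The univariate feature-selection step can be illustrated with a short scikit-learn sketch on synthetic data; this is a generic stand-in for the workflow's statistics, not the published implementation.

    ```python
    import numpy as np
    from sklearn.feature_selection import SelectKBest, f_classif

    rng = np.random.default_rng(0)
    X = rng.normal(size=(40, 500))     # 40 plasma samples x 500 metabolite features (synthetic)
    y = np.repeat([0, 1], 20)          # control vs. cancer labels (synthetic)
    X[y == 1, :10] += 1.5              # plant a signal in the first 10 features

    # Univariate filter: keep the 20 features with the strongest class association.
    selector = SelectKBest(score_func=f_classif, k=20).fit(X, y)
    candidates = np.flatnonzero(selector.get_support())
    print("candidate discriminative features:", candidates)
    ```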

  20. Workflow-Oriented Cyberinfrastructure for Sensor Data Analytics

    NASA Astrophysics Data System (ADS)

    Orcutt, J. A.; Rajasekar, A.; Moore, R. W.; Vernon, F.

    2015-12-01

    Sensor streams comprise an increasingly large part of Earth Science data. Analytics based on sensor data require an easy way to perform operations such as acquisition, conversion to physical units, metadata linking, sensor fusion, analysis and visualization on distributed sensor streams. Furthermore, embedding real-time sensor data into scientific workflows is of growing interest. We have implemented a scalable networked architecture that can be used to dynamically access packets of data in a stream from multiple sensors, and perform synthesis and analysis across a distributed network. Our system is based on the integrated Rule Oriented Data System (irods.org), which accesses sensor data from the Antelope Real Time Data System (brtt.com), and provides virtualized access to collections of data streams. We integrate real-time data streaming from different sources, collected for different purposes, on different time and spatial scales, and sensed by different methods. iRODS, noted for its policy-oriented data management, brings features and facilities to sensor processing such as single sign-on, third-party access control lists (ACLs), location transparency, logical resource naming, and server-side modeling capabilities while reducing the burden on sensor network operators. Rich integrated metadata support also makes it straightforward to discover data streams of interest and maintain data provenance. The workflow support in iRODS readily integrates sensor processing into any analytical pipeline. The system is developed as part of the NSF-funded Datanet Federation Consortium (datafed.org). APIs for selecting, opening, reaping and closing sensor streams are provided, along with other helper functions to associate metadata and convert sensor packets into NetCDF and JSON formats. Near real-time sensor data, including data from seismic sensors, environmental sensors, LIDAR and video streams, are available through this interface. A system for archiving sensor data and metadata in Net
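
    The following sketch only mimics the shape of such a stream API (open a stream, iterate packets, convert to JSON); every class and method name in it is invented for illustration and is not the actual iRODS or Antelope interface.

    ```python
    import json
    import time
    from typing import Iterator

    class SensorStream:
        """Stand-in for a remote packet stream; a real client would replace this."""
        def __init__(self, stream_id: str):
            self.stream_id = stream_id

        def packets(self, n: int = 3) -> Iterator[dict]:
            # Pretend packets arriving from the stream.
            for i in range(n):
                yield {"stream": self.stream_id, "seq": i, "t": time.time(), "value": 20.0 + 0.1 * i}

    def packet_to_json(packet: dict) -> str:
        """Convert one raw packet to JSON, the lighter of the two formats mentioned above."""
        return json.dumps(packet)

    stream = SensorStream("seismic/STA01/BHZ")
    for pkt in stream.packets():
        print(packet_to_json(pkt))
    ```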

  1. A workflow to preserve genome-quality tissue samples from plants in botanical gardens and arboreta

    PubMed Central

    Gostel, Morgan R.; Kelloff, Carol; Wallick, Kyle; Funk, Vicki A.

    2016-01-01

    Premise of the study: Internationally, gardens hold diverse living collections that can be preserved for genomic research. Workflows have been developed for genomic tissue sampling in other taxa (e.g., vertebrates), but are inadequate for plants. We outline a workflow for tissue sampling intended for two audiences: botanists interested in genomics research and garden staff who plan to voucher living collections. Methods and Results: Standard herbarium methods are used to collect vouchers, label information and images are entered into a publicly accessible database, and leaf tissue is preserved in silica and liquid nitrogen. A five-step approach for genomic tissue sampling is presented for sampling from living collections according to current best practices. Conclusions: Collecting genome-quality samples from gardens is an economical and rapid way to make available for scientific research tissue from the diversity of plants on Earth. The Global Genome Initiative will facilitate and lead this endeavor through international partnerships. PMID:27672517

  2. Optimization of tomographic reconstruction workflows on geographically distributed resources

    DOE PAGES

    Bicer, Tekin; Gursoy, Doga; Kettimuthu, Rajkumar; ...

    2016-01-01

    New technological advancements in synchrotron light sources enable data acquisitions at unprecedented levels. This emergent trend affects not only the size of the generated data but also the need for larger computational resources. Although beamline scientists and users have access to local computational resources, these are typically limited and can result in extended execution times. Applications that are based on iterative processing as in tomographic reconstruction methods require high-performance compute clusters for timely analysis of data. Here, time-sensitive analysis and processing of Advanced Photon Source data on geographically distributed resources are focused on. Two main challenges are considered: (i) modeling of the performance of tomographic reconstruction workflows and (ii) transparent execution of these workflows on distributed resources. For the former, three main stages are considered: (i) data transfer between storage and computational resources, (ii) wait/queue time of reconstruction jobs at compute resources, and (iii) computation of reconstruction tasks. These performance models allow evaluation and estimation of the execution time of any given iterative tomographic reconstruction workflow that runs on geographically distributed resources. For the latter challenge, a workflow management system is built, which can automate the execution of workflows and minimize the user interaction with the underlying infrastructure. The system utilizes Globus to perform secure and efficient data transfer operations. The proposed models and the workflow management system are evaluated by using three high-performance computing and two storage resources, all of which are geographically distributed. Workflows were created with different computational requirements using two compute-intensive tomographic reconstruction algorithms. Experimental evaluation shows that the proposed models and system can be used for selecting the optimum resources, which in
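
    The three-stage performance model described above can be summarized as total time = transfer time + queue wait + compute time; the sketch below encodes that relation with illustrative parameter values, not measurements from the study.

    ```python
    def estimate_workflow_time(dataset_gb: float, bandwidth_gbps: float,
                               expected_queue_s: float, slices: int, iterations: int,
                               slice_iter_time_s: float, nodes: int) -> float:
        """Total time = data transfer + queue wait + (parallel) reconstruction compute."""
        transfer_s = dataset_gb * 8.0 / bandwidth_gbps
        compute_s = slices * iterations * slice_iter_time_s / nodes
        return transfer_s + expected_queue_s + compute_s

    # e.g. 200 GB over a 10 Gb/s link, 15 min expected queue, 2048 slices x 100 iterations on 128 nodes
    print(estimate_workflow_time(200, 10, 900, 2048, 100, 0.5, 128))
    ```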

  3. Optimization of tomographic reconstruction workflows on geographically distributed resources

    PubMed Central

    Bicer, Tekin; Gürsoy, Doǧa; Kettimuthu, Rajkumar; De Carlo, Francesco; Foster, Ian T.

    2016-01-01

    New technological advancements in synchrotron light sources enable data acquisitions at unprecedented levels. This emergent trend affects not only the size of the generated data but also the need for larger computational resources. Although beamline scientists and users have access to local computational resources, these are typically limited and can result in extended execution times. Applications that are based on iterative processing as in tomographic reconstruction methods require high-performance compute clusters for timely analysis of data. Here, time-sensitive analysis and processing of Advanced Photon Source data on geographically distributed resources are focused on. Two main challenges are considered: (i) modeling of the performance of tomographic reconstruction workflows and (ii) transparent execution of these workflows on distributed resources. For the former, three main stages are considered: (i) data transfer between storage and computational resources, (ii) wait/queue time of reconstruction jobs at compute resources, and (iii) computation of reconstruction tasks. These performance models allow evaluation and estimation of the execution time of any given iterative tomographic reconstruction workflow that runs on geographically distributed resources. For the latter challenge, a workflow management system is built, which can automate the execution of workflows and minimize the user interaction with the underlying infrastructure. The system utilizes Globus to perform secure and efficient data transfer operations. The proposed models and the workflow management system are evaluated by using three high-performance computing and two storage resources, all of which are geographically distributed. Workflows were created with different computational requirements using two compute-intensive tomographic reconstruction algorithms. Experimental evaluation shows that the proposed models and system can be used for selecting the optimum resources, which in turn can

  4. Optimization of tomographic reconstruction workflows on geographically distributed resources

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bicer, Tekin; Gursoy, Doga; Kettimuthu, Rajkumar

    New technological advancements in synchrotron light sources enable data acquisitions at unprecedented levels. This emergent trend affects not only the size of the generated data but also the need for larger computational resources. Although beamline scientists and users have access to local computational resources, these are typically limited and can result in extended execution times. Applications that are based on iterative processing as in tomographic reconstruction methods require high-performance compute clusters for timely analysis of data. Here, time-sensitive analysis and processing of Advanced Photon Source data on geographically distributed resources are focused on. Two main challenges are considered: (i) modeling of the performance of tomographic reconstruction workflows and (ii) transparent execution of these workflows on distributed resources. For the former, three main stages are considered: (i) data transfer between storage and computational resources, (ii) wait/queue time of reconstruction jobs at compute resources, and (iii) computation of reconstruction tasks. These performance models allow evaluation and estimation of the execution time of any given iterative tomographic reconstruction workflow that runs on geographically distributed resources. For the latter challenge, a workflow management system is built, which can automate the execution of workflows and minimize the user interaction with the underlying infrastructure. The system utilizes Globus to perform secure and efficient data transfer operations. The proposed models and the workflow management system are evaluated by using three high-performance computing and two storage resources, all of which are geographically distributed. Workflows were created with different computational requirements using two compute-intensive tomographic reconstruction algorithms. Experimental evaluation shows that the proposed models and system can be used for selecting the optimum resources, which in

  5. speaq 2.0: A complete workflow for high-throughput 1D NMR spectra processing and quantification.

    PubMed

    Beirnaert, Charlie; Meysman, Pieter; Vu, Trung Nghia; Hermans, Nina; Apers, Sandra; Pieters, Luc; Covaci, Adrian; Laukens, Kris

    2018-03-01

    Nuclear Magnetic Resonance (NMR) spectroscopy is, together with liquid chromatography-mass spectrometry (LC-MS), the most established platform to perform metabolomics. In contrast to LC-MS however, NMR data is predominantly being processed with commercial software. Meanwhile its data processing remains tedious and dependent on user interventions. As a follow-up to speaq, a previously released workflow for NMR spectral alignment and quantitation, we present speaq 2.0. This completely revised framework to automatically analyze 1D NMR spectra uses wavelets to efficiently summarize the raw spectra with minimal information loss or user interaction. The tool offers a fast and easy workflow that starts with the common approach of peak-picking, followed by grouping, thus avoiding the binning step. This yields a matrix consisting of features, samples and peak values that can be conveniently processed either by using included multivariate statistical functions or by using many other recently developed methods for NMR data analysis. speaq 2.0 facilitates robust and high-throughput metabolomics based on 1D NMR but is also compatible with other NMR frameworks or complementary LC-MS workflows. The methods are benchmarked using a simulated dataset and two publicly available datasets. speaq 2.0 is distributed through the existing speaq R package to provide a complete solution for NMR data processing. The package and the code for the presented case studies are freely available on CRAN (https://cran.r-project.org/package=speaq) and GitHub (https://github.com/beirnaert/speaq).
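
    The peak-picking and grouping steps can be illustrated generically in Python; speaq 2.0 itself is an R package and uses wavelet-based peak detection, so the simple local-maximum approach below is only an analogy.

    ```python
    import numpy as np
    from scipy.signal import find_peaks

    rng = np.random.default_rng(1)
    ppm = np.linspace(0, 10, 2000)

    def synthetic_spectrum():
        centers = [2.0, 3.5, 7.1]
        spec = sum(np.exp(-((ppm - c + rng.normal(0, 0.01)) ** 2) / 0.001) for c in centers)
        return spec + rng.normal(0, 0.02, ppm.size)

    spectra = [synthetic_spectrum() for _ in range(5)]

    # Peak picking per spectrum, then crude grouping: peaks that round to the same 0.1 ppm bin.
    groups = {}
    for sample, spec in enumerate(spectra):
        idx, _ = find_peaks(spec, height=0.3)
        for i in idx:
            groups.setdefault(round(ppm[i], 1), {})[sample] = spec[i]

    # Feature matrix: rows = samples, columns = grouped peaks (no binning of the full spectrum).
    features = sorted(groups)
    matrix = np.array([[groups[f].get(s, 0.0) for f in features] for s in range(len(spectra))])
    print(features, matrix.shape)
    ```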

  6. speaq 2.0: A complete workflow for high-throughput 1D NMR spectra processing and quantification

    PubMed Central

    Pieters, Luc; Covaci, Adrian

    2018-01-01

    Nuclear Magnetic Resonance (NMR) spectroscopy is, together with liquid chromatography-mass spectrometry (LC-MS), the most established platform to perform metabolomics. In contrast to LC-MS however, NMR data is predominantly being processed with commercial software. Meanwhile its data processing remains tedious and dependent on user interventions. As a follow-up to speaq, a previously released workflow for NMR spectral alignment and quantitation, we present speaq 2.0. This completely revised framework to automatically analyze 1D NMR spectra uses wavelets to efficiently summarize the raw spectra with minimal information loss or user interaction. The tool offers a fast and easy workflow that starts with the common approach of peak-picking, followed by grouping, thus avoiding the binning step. This yields a matrix consisting of features, samples and peak values that can be conveniently processed either by using included multivariate statistical functions or by using many other recently developed methods for NMR data analysis. speaq 2.0 facilitates robust and high-throughput metabolomics based on 1D NMR but is also compatible with other NMR frameworks or complementary LC-MS workflows. The methods are benchmarked using a simulated dataset and two publicly available datasets. speaq 2.0 is distributed through the existing speaq R package to provide a complete solution for NMR data processing. The package and the code for the presented case studies are freely available on CRAN (https://cran.r-project.org/package=speaq) and GitHub (https://github.com/beirnaert/speaq). PMID:29494588

  7. Workflow Management for Complex HEP Analyses

    NASA Astrophysics Data System (ADS)

    Erdmann, M.; Fischer, R.; Rieger, M.; von Cube, R. F.

    2017-10-01

    We present the novel Analysis Workflow Management (AWM) that provides users with the tools and competences of professional large scale workflow systems, e.g. Apache’s Airavata [1]. The approach presents a paradigm shift from executing parts of the analysis to defining the analysis. Within AWM an analysis consists of steps. For example, a step can define running a certain executable for multiple files of an input data collection. Each call to the executable for one of those input files can be submitted to the desired run location, which could be the local computer or a remote batch system. An integrated software manager enables automated user installation of dependencies in the working directory at the run location. Each execution of a step item creates one report for bookkeeping purposes containing error codes and output data or file references. Required files, e.g. created by previous steps, are retrieved automatically. Since data storage and run locations are exchangeable from the steps' perspective, computing resources can be used opportunistically. A visualization of the workflow as a graph of the steps in the web browser provides a high-level view of the analysis. The workflow system is developed and tested alongside a ttbb cross-section measurement where, for instance, the event selection is represented by one step and a Bayesian statistical inference is performed by another. The clear interface and dependencies between steps enable a make-like execution of the whole analysis.
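
    A make-like execution over steps with dependencies, as described above, can be sketched in a few lines; this is an illustrative toy, not the AWM implementation, and the step names are hypothetical.

    ```python
    from typing import Callable, Dict, List, Optional

    class Step:
        def __init__(self, name: str, action: Callable[[], dict],
                     requires: Optional[List[str]] = None):
            self.name, self.action, self.requires = name, action, list(requires or [])

    def run(steps: Dict[str, Step], target: str, reports: Optional[dict] = None) -> dict:
        """Run `target` after recursively running its requirements (make-like)."""
        reports = {} if reports is None else reports
        if target in reports:          # step already executed, reuse its report
            return reports
        for dep in steps[target].requires:
            run(steps, dep, reports)
        reports[target] = {"step": target, "exit_code": 0, **steps[target].action()}
        return reports

    steps = {
        "select": Step("select", lambda: {"events": 12345}),
        "infer":  Step("infer",  lambda: {"xsec_pb": 3.2}, requires=["select"]),
    }
    print(run(steps, "infer"))
    ```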

  8. Prototype of Kepler Processing Workflows For Microscopy And Neuroinformatics

    PubMed Central

    Astakhov, V.; Bandrowski, A.; Gupta, A.; Kulungowski, A.W.; Grethe, J.S.; Bouwer, J.; Molina, T.; Rowley, V.; Penticoff, S.; Terada, M.; Wong, W.; Hakozaki, H.; Kwon, O.; Martone, M.E.; Ellisman, M.

    2016-01-01

    We report on progress of employing the Kepler workflow engine to prototype “end-to-end” application integration workflows that concern data coming from microscopes deployed at the National Center for Microscopy Imaging Research (NCMIR). This system is built upon the mature code base of the Cell Centered Database (CCDB) and integrated rule-oriented data system (IRODS) for distributed storage. It provides integration with external projects such as the Whole Brain Catalog (WBC) and Neuroscience Information Framework (NIF), which benefit from NCMIR data. We also report on specific workflows which spawn from main workflows and perform data fusion and orchestration of Web services specific for the NIF project. This “Brain data flow” presents a user with categorized information about sources that have information on various brain regions. PMID:28479932

  9. Quantitative workflow based on NN for weighting criteria in landfill suitability mapping

    NASA Astrophysics Data System (ADS)

    Abujayyab, Sohaib K. M.; Ahamad, Mohd Sanusi S.; Yahya, Ahmad Shukri; Ahmad, Siti Zubaidah; Alkhasawneh, Mutasem Sh.; Aziz, Hamidi Abdul

    2017-10-01

    Our study aims to introduce a new quantitative workflow that integrates neural networks (NNs) and multi-criteria decision analysis (MCDA). Existing MCDA workflows reveal a number of drawbacks because of their reliance on human knowledge in the weighting stage. Thus, a new workflow is presented to form suitability maps at the regional scale for solid waste planning based on NNs. A feed-forward neural network is employed in the workflow. A total of 34 criteria were pre-processed to establish the input dataset for NN modelling. The final learned network is used to acquire the weights of the criteria. Accuracies of 95.2% and 93.2% were achieved for the training dataset and testing dataset, respectively. The workflow was found to be capable of reducing human interference to generate highly reliable maps. The proposed workflow reveals the applicability of NNs in generating landfill suitability maps and the feasibility of integrating them with existing MCDA workflows.
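
    One simple way to read criteria weights off a trained feed-forward network is sketched below using scikit-learn and permutation importance on synthetic data; the study's own weight-extraction procedure may differ.

    ```python
    import numpy as np
    from sklearn.neural_network import MLPClassifier
    from sklearn.inspection import permutation_importance

    rng = np.random.default_rng(0)
    n_samples, n_criteria = 500, 34          # 34 criteria, as in the study
    X = rng.uniform(size=(n_samples, n_criteria))
    # Synthetic "suitability" labels driven mainly by the first three criteria.
    score = 0.5 * X[:, 0] + 0.3 * X[:, 1] + 0.2 * X[:, 2] + rng.normal(0, 0.05, n_samples)
    y = (score > 0.5).astype(int)

    net = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0).fit(X, y)

    # Permutation importance as a simple proxy for per-criterion weights.
    imp = permutation_importance(net, X, y, n_repeats=10, random_state=0)
    raw = np.clip(imp.importances_mean, 0, None)
    weights = raw / raw.sum()
    print("largest criterion weights:", np.round(np.sort(weights)[::-1][:5], 3))
    ```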

  10. Ontology for Transforming Geo-Spatial Data for Discovery and Integration of Scientific Data

    NASA Astrophysics Data System (ADS)

    Nguyen, L.; Chee, T.; Minnis, P.

    2013-12-01

    Discovery and access to geo-spatial scientific data across heterogeneous repositories and multi-discipline datasets can present challenges for scientists. We propose to build a workflow for transforming geo-spatial datasets into a semantic environment by using relationships to describe each resource using the OWL Web Ontology Language, RDF, and a proposed geo-spatial vocabulary. We will present methods for transforming traditional scientific datasets, using a semantic repository, and querying with SPARQL to integrate and access datasets. This unique repository will enable discovery of scientific data by geospatial bounds or other criteria.
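
    A small rdflib/SPARQL sketch shows the kind of geospatial-bound query such a repository would answer; the example namespace and dataset URIs are made up, while the WGS84 latitude/longitude vocabulary is the standard W3C one.

    ```python
    from rdflib import Graph

    ttl = """
    @prefix geo: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
    @prefix ex:  <http://example.org/dataset/> .

    ex:cloudprops_001 geo:lat 36.6 ; geo:long -97.5 .
    ex:cloudprops_002 geo:lat 10.0 ; geo:long 20.0 .
    """

    g = Graph()
    g.parse(data=ttl, format="turtle")

    # Find datasets whose reference point falls inside a latitude/longitude box.
    query = """
    PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
    SELECT ?ds WHERE {
      ?ds geo:lat ?lat ; geo:long ?lon .
      FILTER (?lat > 30 && ?lat < 40 && ?lon > -100 && ?lon < -90)
    }
    """
    for row in g.query(query):
        print(row.ds)
    ```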

  11. Using CyberShake Workflows to Manage Big Seismic Hazard Data on Large-Scale Open-Science HPC Resources

    NASA Astrophysics Data System (ADS)

    Callaghan, S.; Maechling, P. J.; Juve, G.; Vahi, K.; Deelman, E.; Jordan, T. H.

    2015-12-01

    The CyberShake computational platform, developed by the Southern California Earthquake Center (SCEC), is an integrated collection of scientific software and middleware that performs 3D physics-based probabilistic seismic hazard analysis (PSHA) for Southern California. CyberShake integrates large-scale and high-throughput research codes to produce probabilistic seismic hazard curves for individual locations of interest and hazard maps for an entire region. A recent CyberShake calculation produced about 500,000 two-component seismograms for each of 336 locations, resulting in over 300 million synthetic seismograms in a Los Angeles-area probabilistic seismic hazard model. CyberShake calculations require a series of scientific software programs. Early computational stages produce data used as inputs by later stages, so we describe CyberShake calculations using a workflow definition language. Scientific workflow tools automate and manage the input and output data and enable remote job execution on large-scale HPC systems. To satisfy the requests of broad impact users of CyberShake data, such as seismologists, utility companies, and building code engineers, we successfully completed CyberShake Study 15.4 in April and May 2015, calculating a 1 Hz urban seismic hazard map for Los Angeles. We distributed the calculation between the NSF Track 1 system NCSA Blue Waters, the DOE Leadership-class system OLCF Titan, and USC's Center for High Performance Computing. This study ran for over 5 weeks, burning about 1.1 million node-hours and producing over half a petabyte of data. The CyberShake Study 15.4 results doubled the maximum simulated seismic frequency from 0.5 Hz to 1.0 Hz as compared to previous studies, representing a factor of 16 increase in computational complexity. We will describe how our workflow tools supported splitting the calculation across multiple systems. We will explain how we modified CyberShake software components, including GPU implementations and

  12. Resilient and Robust High Performance Computing Platforms for Scientific Computing Integrity

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jin, Yier

    As technology advances, computer systems are subject to increasingly sophisticated cyber-attacks that compromise both their security and integrity. High performance computing platforms used in commercial and scientific applications involving sensitive or even classified data are frequently targeted by powerful adversaries. This situation is made worse by a lack of fundamental security solutions that both perform efficiently and are effective at preventing threats. Current security solutions fail to address the threat landscape and ensure the integrity of sensitive data. As challenges rise, both private and public sectors will require robust technologies to protect their computing infrastructure. The research outcomes from this project try to address all these challenges. For example, we present LAZARUS, a novel technique to harden kernel Address Space Layout Randomization (KASLR) against paging-based side-channel attacks. In particular, our scheme allows for fine-grained protection of the virtual memory mappings that implement the randomization. We demonstrate the effectiveness of our approach by hardening a recent Linux kernel with LAZARUS, mitigating all of the previously presented side-channel attacks on KASLR. Our extensive evaluation shows that LAZARUS incurs only 0.943% overhead for standard benchmarks, and is therefore highly practical. We also introduced HA2lloc, a hardware-assisted allocator that is capable of leveraging an extended memory management unit to detect memory errors in the heap. We also performed testing using HA2lloc in a simulation environment and found that the approach is capable of preventing common memory vulnerabilities.

  13. A novel image processing workflow for the in vivo quantification of skin microvasculature using dynamic optical coherence tomography.

    PubMed

    Zugaj, D; Chenet, A; Petit, L; Vaglio, J; Pascual, T; Piketty, C; Bourdes, V

    2018-02-04

    Currently, imaging technologies that can accurately assess or provide surrogate markers of the human cutaneous microvessel network are limited. Dynamic optical coherence tomography (D-OCT) allows the detection of blood flow in vivo and visualization of the skin microvasculature. However, image processing is necessary to correct images, filter artifacts, and exclude irrelevant signals. The objective of this study was to develop a novel image processing workflow to enhance the technical capabilities of D-OCT. Single-center, vehicle-controlled study including healthy volunteers aged 18-50 years. A capsaicin solution was applied topically on the subject's forearm to induce local inflammation. Measurements of capsaicin-induced increase in dermal blood flow, within the region of interest, were performed by laser Doppler imaging (LDI) (reference method) and D-OCT. Sixteen subjects were enrolled. A good correlation was shown between D-OCT and LDI, using the image processing workflow. Therefore, D-OCT offers an easy-to-use alternative to LDI, with good repeatability, new robust morphological features (dermal-epidermal junction localization), and quantification of the distribution of vessel size and changes in this distribution induced by capsaicin. The visualization of the vessel network was improved through bloc filtering and artifact removal. Moreover, the assessment of vessel size distribution allows a fine analysis of the vascular patterns. The newly developed image processing workflow enhances the technical capabilities of D-OCT for the accurate detection and characterization of microcirculation in the skin. A direct clinical application of this image processing workflow is the quantification of the effect of topical treatment on skin vascularization. © 2018 The Authors. Skin Research and Technology Published by John Wiley & Sons Ltd.
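
    The flavor of the filtering-and-quantification steps can be sketched on synthetic data as below (median filtering, thresholding, vessel area fraction); the study's actual pipeline, including artifact removal and dermal-epidermal junction localization, is considerably more involved.

    ```python
    import numpy as np
    from scipy import ndimage

    rng = np.random.default_rng(2)
    img = rng.normal(0.1, 0.05, (128, 128))   # background speckle (synthetic en-face flow image)
    img[60:68, :] += 0.6                      # a horizontal "vessel"
    img[:, 30:34] += 0.5                      # a vertical "vessel"

    smoothed = ndimage.median_filter(img, size=3)        # suppress speckle noise
    threshold = smoothed.mean() + 2 * smoothed.std()     # simple global threshold
    vessel_mask = smoothed > threshold

    print(f"vessel area fraction: {vessel_mask.mean():.3f}")
    ```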

  14. Optimizing high performance computing workflow for protein functional annotation

    PubMed Central

    Stanberry, Larissa; Rekepalli, Bhanu; Liu, Yuan; Giblock, Paul; Higdon, Roger; Montague, Elizabeth; Broomall, William; Kolker, Natali; Kolker, Eugene

    2014-01-01

    Functional annotation of newly sequenced genomes is one of the major challenges in modern biology. With modern sequencing technologies, the protein sequence universe is rapidly expanding. Newly sequenced bacterial genomes alone contain over 7.5 million proteins. The rate of data generation has far surpassed that of protein annotation. The volume of protein data makes manual curation infeasible, whereas a high compute cost limits the utility of existing automated approaches. In this work, we present an improved and optimized automated workflow to enable large-scale protein annotation. The workflow uses high performance computing architectures and a low complexity classification algorithm to assign proteins into existing clusters of orthologous groups of proteins. On the basis of the Position-Specific Iterative Basic Local Alignment Search Tool, the algorithm ensures at least 80% specificity and sensitivity of the resulting classifications. The workflow utilizes highly scalable parallel applications for classification and sequence alignment. Using Extreme Science and Engineering Discovery Environment supercomputers, the workflow processed 1,200,000 newly sequenced bacterial proteins. With the rapid expansion of the protein sequence universe, the proposed workflow will enable scientists to annotate big genome data. PMID:25313296
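
    The per-protein classification step could look roughly like the sketch below, which wraps a PSI-BLAST search and keeps the best-scoring hit; the command-line options are standard NCBI BLAST+ psiblast options, while the database name and file path are hypothetical.

    ```python
    import subprocess
    from typing import Optional

    def classify_protein(fasta_path: str, db: str = "cog_profiles") -> Optional[str]:
        """Run PSI-BLAST and return the subject id of the best hit, if any."""
        cmd = [
            "psiblast",
            "-query", fasta_path,
            "-db", db,
            "-num_iterations", "3",
            "-evalue", "1e-5",
            "-outfmt", "6 qseqid sseqid evalue bitscore",
        ]
        out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
        hits = [line.split("\t") for line in out.splitlines() if line]
        if not hits:
            return None
        # Best hit by bit score; its subject id stands in for the assigned cluster.
        return max(hits, key=lambda h: float(h[3]))[1]

    # classify_protein("protein_0001.fasta")  # requires BLAST+ and a formatted profile db
    ```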

  15. Optimizing high performance computing workflow for protein functional annotation.

    PubMed

    Stanberry, Larissa; Rekepalli, Bhanu; Liu, Yuan; Giblock, Paul; Higdon, Roger; Montague, Elizabeth; Broomall, William; Kolker, Natali; Kolker, Eugene

    2014-09-10

    Functional annotation of newly sequenced genomes is one of the major challenges in modern biology. With modern sequencing technologies, the protein sequence universe is rapidly expanding. Newly sequenced bacterial genomes alone contain over 7.5 million proteins. The rate of data generation has far surpassed that of protein annotation. The volume of protein data makes manual curation infeasible, whereas a high compute cost limits the utility of existing automated approaches. In this work, we present an improved and optimized automated workflow to enable large-scale protein annotation. The workflow uses high performance computing architectures and a low complexity classification algorithm to assign proteins into existing clusters of orthologous groups of proteins. On the basis of the Position-Specific Iterative Basic Local Alignment Search Tool, the algorithm ensures at least 80% specificity and sensitivity of the resulting classifications. The workflow utilizes highly scalable parallel applications for classification and sequence alignment. Using Extreme Science and Engineering Discovery Environment supercomputers, the workflow processed 1,200,000 newly sequenced bacterial proteins. With the rapid expansion of the protein sequence universe, the proposed workflow will enable scientists to annotate big genome data.

  16. Jenkins-CI, an Open-Source Continuous Integration System, as a Scientific Data and Image-Processing Platform.

    PubMed

    Moutsatsos, Ioannis K; Hossain, Imtiaz; Agarinis, Claudia; Harbinski, Fred; Abraham, Yann; Dobler, Luc; Zhang, Xian; Wilson, Christopher J; Jenkins, Jeremy L; Holway, Nicholas; Tallarico, John; Parker, Christian N

    2017-03-01

    High-throughput screening generates large volumes of heterogeneous data that require a diverse set of computational tools for management, processing, and analysis. Building integrated, scalable, and robust computational workflows for such applications is challenging but highly valuable. Scientific data integration and pipelining facilitate standardized data processing, collaboration, and reuse of best practices. We describe how Jenkins-CI, an "off-the-shelf," open-source, continuous integration system, is used to build pipelines for processing images and associated data from high-content screening (HCS). Jenkins-CI provides numerous plugins for standard compute tasks, and its design allows the quick integration of external scientific applications. Using Jenkins-CI, we integrated CellProfiler, an open-source image-processing platform, with various HCS utilities and a high-performance Linux cluster. The platform is web-accessible, facilitates access and sharing of high-performance compute resources, and automates previously cumbersome data and image-processing tasks. Imaging pipelines developed using the desktop CellProfiler client can be managed and shared through a centralized Jenkins-CI repository. Pipelines and managed data are annotated to facilitate collaboration and reuse. Limitations with Jenkins-CI (primarily around the user interface) were addressed through the selection of helper plugins from the Jenkins-CI community.
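
    The stage-by-stage structure of such a pipeline (fetch plate images, run image analysis, archive results) is sketched below in a language-neutral way; Jenkins-CI pipelines are actually configured through its own jobs and plugins, and this mock does not call the Jenkins or CellProfiler APIs.

    ```python
    import logging
    from typing import Callable, Dict, List, Tuple

    logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")

    def stage(name: str, func: Callable[[Dict], Dict], ctx: Dict) -> Dict:
        """Run one pipeline stage, logging start/finish as a CI server would."""
        logging.info("stage started: %s", name)
        ctx = func(ctx)
        logging.info("stage finished: %s", name)
        return ctx

    def fetch_plate(ctx):  return {**ctx, "images": [f"well_{i}.tif" for i in range(4)]}
    def run_analysis(ctx): return {**ctx, "measurements": {img: {"cells": 100} for img in ctx["images"]}}
    def archive(ctx):      return {**ctx, "archived": True}

    pipeline: List[Tuple[str, Callable[[Dict], Dict]]] = [
        ("fetch plate images", fetch_plate),
        ("image analysis", run_analysis),
        ("archive results", archive),
    ]

    ctx: Dict = {"plate": "PLATE-0001"}
    for name, func in pipeline:
        ctx = stage(name, func, ctx)
    print(ctx["archived"], len(ctx["measurements"]))
    ```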

  17. Jenkins-CI, an Open-Source Continuous Integration System, as a Scientific Data and Image-Processing Platform

    PubMed Central

    Moutsatsos, Ioannis K.; Hossain, Imtiaz; Agarinis, Claudia; Harbinski, Fred; Abraham, Yann; Dobler, Luc; Zhang, Xian; Wilson, Christopher J.; Jenkins, Jeremy L.; Holway, Nicholas; Tallarico, John; Parker, Christian N.

    2016-01-01

    High-throughput screening generates large volumes of heterogeneous data that require a diverse set of computational tools for management, processing, and analysis. Building integrated, scalable, and robust computational workflows for such applications is challenging but highly valuable. Scientific data integration and pipelining facilitate standardized data processing, collaboration, and reuse of best practices. We describe how Jenkins-CI, an “off-the-shelf,” open-source, continuous integration system, is used to build pipelines for processing images and associated data from high-content screening (HCS). Jenkins-CI provides numerous plugins for standard compute tasks, and its design allows the quick integration of external scientific applications. Using Jenkins-CI, we integrated CellProfiler, an open-source image-processing platform, with various HCS utilities and a high-performance Linux cluster. The platform is web-accessible, facilitates access and sharing of high-performance compute resources, and automates previously cumbersome data and image-processing tasks. Imaging pipelines developed using the desktop CellProfiler client can be managed and shared through a centralized Jenkins-CI repository. Pipelines and managed data are annotated to facilitate collaboration and reuse. Limitations with Jenkins-CI (primarily around the user interface) were addressed through the selection of helper plugins from the Jenkins-CI community. PMID:27899692

  18. The impact of electronic medical record systems on outpatient workflows: a longitudinal evaluation of its workflow effects.

    PubMed

    Vishwanath, Arun; Singh, Sandeep Rajan; Winkelstein, Peter

    2010-11-01

    The promise of the electronic medical record (EMR) lies in its ability to reduce the costs of health care delivery and improve the overall quality of care--a promise that is realized through major changes in workflows within the health care organization. Yet little systematic information exists about the workflow effects of EMRs. Moreover, some of the research to date points to reduced satisfaction among physicians after implementation of the EMR and increased time, i.e., negative workflow effects. A better understanding of the impact of the EMR on workflows is, hence, vital to understanding what the technology really does offer that is new and unique. (i) To empirically develop a physician-centric conceptual model of the workflow effects of EMRs; (ii) To use the model to understand the antecedents to the physicians' workflow expectation from the new EMR; (iii) To track physicians' satisfaction over time, 3 months and 20 months after implementation of the EMR; (iv) To explore the impact of technology learning curves on physicians' reported satisfaction levels. The current research uses the mixed-method technique of concept mapping to empirically develop the conceptual model of an EMR's workflow effects. The model is then used within a controlled study to track physician expectations from a new EMR system as well as their assessments of the EMR's performance 3 months and 20 months after implementation. The research tracks the actual implementation of a new EMR within the outpatient clinics of a large northeastern research hospital. The pre-implementation survey netted 20 physician responses; the post-implementation Time 1 survey netted 22 responses, and the Time 2 survey netted 26 physician responses. The implementation of the actual EMR served as the intervention. Since the study was conducted within the same setting and tracked a homogenous group of respondents, the overall study design ensured against extraneous influences on the results. Outcome measures were derived

  19. Workflow4Metabolomics: a collaborative research infrastructure for computational metabolomics

    PubMed Central

    Giacomoni, Franck; Le Corguillé, Gildas; Monsoor, Misharl; Landi, Marion; Pericard, Pierre; Pétéra, Mélanie; Duperier, Christophe; Tremblay-Franco, Marie; Martin, Jean-François; Jacob, Daniel; Goulitquer, Sophie; Thévenot, Etienne A.; Caron, Christophe

    2015-01-01

    Summary: The complex, rapidly evolving field of computational metabolomics calls for collaborative infrastructures where the large volume of new algorithms for data pre-processing, statistical analysis and annotation can be readily integrated whatever the language, evaluated on reference datasets and chained to build ad hoc workflows for users. We have developed Workflow4Metabolomics (W4M), the first fully open-source and collaborative online platform for computational metabolomics. W4M is a virtual research environment built upon the Galaxy web-based platform technology. It enables ergonomic integration, exchange and running of individual modules and workflows. Alternatively, the whole W4M framework and computational tools can be downloaded as a virtual machine for local installation. Availability and implementation: http://workflow4metabolomics.org homepage enables users to open a private account and access the infrastructure. W4M is developed and maintained by the French Bioinformatics Institute (IFB) and the French Metabolomics and Fluxomics Infrastructure (MetaboHUB). Contact: contact@workflow4metabolomics.org PMID:25527831

  20. iCollections methodology: workflow, results and lessons learned

    PubMed Central

    Penn, Malcolm; Sadka, Mike; Hine, Adrian; Brooks, Stephen; Siebert, Darrell J.; Sleep, Chris; Cafferty, Steve; Cane, Elisa; Martin, Geoff; Toloni, Flavia; Wing, Peter; Chainey, John; Duffell, Liz; Huxley, Rob; Ledger, Sophie; McLaughlin, Caitlin; Mazzetta, Gerardo; Perera, Jasmin; Crowther, Robyn; Douglas, Lyndsey; Durant, Joanna; Scialabba, Elisabetta; Honey, Martin; Huertas, Blanca; Howard, Theresa; Carter, Victoria; Albuquerque, Sara; Paterson, Gordon; Kitching, Ian J.

    2017-01-01

    The Natural History Museum, London (NHMUK) has embarked on an ambitious programme to digitise its collections. The first phase of this programme was to undertake a series of pilot projects to develop the workflows and infrastructure needed to support mass digitisation of very large scientific collections. This paper presents the results of one of the pilot projects – iCollections. This project digitised all the lepidopteran specimens usually considered as butterflies, 181,545 specimens representing 89 species from the British Isles and Ireland. The data digitised include species name, georeferenced location, collector and collection date - the what, where, who and when of specimen data. In addition, a digital image of each specimen was taken. A previous paper explained the way the data were obtained and the background to the collections that made up the project. The present paper describes the technical, logistical, and economic aspects of managing the project. PMID:29104442

  1. iCollections methodology: workflow, results and lessons learned

    PubMed Central

    Penn, Malcolm; Sadka, Mike; Hine, Adrian; Brooks, Stephen; Siebert, Darrell J.; Sleep, Chris; Cafferty, Steve; Cane, Elisa; Martin, Geoff; Toloni, Flavia; Wing, Peter; Chainey, John; Duffell, Liz; Huxley, Rob; Ledger, Sophie; McLaughlin, Caitlin; Mazzetta, Gerardo; Perera, Jasmin; Crowther, Robyn; Douglas, Lyndsey; Durant, Joanna; Honey, Martin; Huertas, Blanca; Howard, Theresa; Carter, Victoria; Albuquerque, Sara; Paterson, Gordon; Kitching, Ian J.

    2017-01-01

    The Natural History Museum, London (NHMUK) has embarked on an ambitious programme to digitise its collections. The first phase of this programme was to undertake a series of pilot projects to develop the workflows and infrastructure needed to support mass digitisation of very large scientific collections. This paper presents the results of one of the pilot projects – iCollections. This project digitised all the lepidopteran specimens usually considered as butterflies, 181,545 specimens representing 89 species from the British Isles and Ireland. The data digitised include species name, georeferenced location, collector and collection date - the what, where, who and when of specimen data. In addition, a digital image of each specimen was taken. A previous paper explained the way the data were obtained and the background to the collections that made up the project. The present paper describes the technical, logistical, and economic aspects of managing the project. PMID:29104435

  2. Text mining for the biocuration workflow.

    PubMed

    Hirschman, Lynette; Burns, Gully A P C; Krallinger, Martin; Arighi, Cecilia; Cohen, K Bretonnel; Valencia, Alfonso; Wu, Cathy H; Chatr-Aryamontri, Andrew; Dowell, Karen G; Huala, Eva; Lourenço, Anália; Nash, Robert; Veuthey, Anne-Lise; Wiegers, Thomas; Winter, Andrew G

    2012-01-01

    Molecular biology has become heavily dependent on biological knowledge encoded in expert curated biological databases. As the volume of biological literature increases, biocurators need help in keeping up with the literature; (semi-) automated aids for biocuration would seem to be an ideal application for natural language processing and text mining. However, to date, there have been few documented successes for improving biocuration throughput using text mining. Our initial investigations took place for the workshop on 'Text Mining for the BioCuration Workflow' at the third International Biocuration Conference (Berlin, 2009). We interviewed biocurators to obtain workflows from eight biological databases. This initial study revealed high-level commonalities, including (i) selection of documents for curation; (ii) indexing of documents with biologically relevant entities (e.g. genes); and (iii) detailed curation of specific relations (e.g. interactions); however, the detailed workflows also showed many variabilities. Following the workshop, we conducted a survey of biocurators. The survey identified biocurator priorities, including the handling of full text indexed with biological entities and support for the identification and prioritization of documents for curation. It also indicated that two-thirds of the biocuration teams had experimented with text mining and almost half were using text mining at that time. Analysis of our interviews and survey provide a set of requirements for the integration of text mining into the biocuration workflow. These can guide the identification of common needs across curated databases and encourage joint experimentation involving biocurators, text mining developers and the larger biomedical research community.

  3. Scientific rigor through videogames.

    PubMed

    Treuille, Adrien; Das, Rhiju

    2014-11-01

    Hypothesis-driven experimentation - the scientific method - can be subverted by fraud, irreproducibility, and lack of rigorous predictive tests. A robust solution to these problems may be the 'massive open laboratory' model, recently embodied in the internet-scale videogame EteRNA. Deploying similar platforms throughout biology could enforce the scientific method more broadly. Copyright © 2014 Elsevier Ltd. All rights reserved.

  4. Exploring Dental Providers’ Workflow in an Electronic Dental Record Environment

    PubMed Central

    Schwei, Kelsey M; Cooper, Ryan; Mahnke, Andrea N.; Ye, Zhan

    2016-01-01

    Background: A workflow is defined as a predefined set of work steps and partial ordering of these steps in any environment to achieve the expected outcome. Few studies have investigated the workflow of providers in a dental office. It is important to understand the interaction of dental providers with the existing technologies at point of care to assess breakdown in the workflow which could contribute to better technology designs. Objective: The study objective was to assess electronic dental record (EDR) workflows using time and motion methodology in order to identify breakdowns and opportunities for process improvement. Methods: A time and motion methodology was used to study the human-computer interaction and workflow of dental providers with an EDR in four dental centers at a large healthcare organization. A data collection tool was developed to capture the workflow of dental providers and staff while they interacted with an EDR during initial, planned, and emergency patient visits, and at the front desk. Qualitative and quantitative analysis was conducted on the observational data. Results: Breakdowns in workflow were identified while posting charges, viewing radiographs, e-prescribing, and interacting with the patient scheduler. EDR interaction time was significantly different between dentists and dental assistants (6:20 min vs. 10:57 min, p = 0.013) and between dentists and dental hygienists (6:20 min vs. 9:36 min, p = 0.003). Conclusions: On average, a dentist spent far less time than dental assistants and dental hygienists in data recording within the EDR. PMID:27437058

  5. The Scientific Filesystem

    PubMed Central

    Sochat, Vanessa

    2018-01-01

    Background: Here, we present the Scientific Filesystem (SCIF), an organizational format that supports exposure of executables and metadata for discoverability of scientific applications. The format includes a known filesystem structure, a definition for a set of environment variables describing it, and functions for generation of the variables and interaction with the libraries, metadata, and executables located within. SCIF makes it easy to expose metadata, multiple environments, installation steps, files, and entry points to render scientific applications consistent, modular, and discoverable. A SCIF can be installed on a traditional host or in a container technology such as Docker or Singularity. We start by reviewing the background and rationale for the SCIF, followed by an overview of the specification and the different levels of internal modules (“apps”) that the organizational format affords. Finally, we demonstrate that SCIF is useful by implementing and discussing several use cases that improve user interaction and understanding of scientific applications. SCIF is released along with a client and integration in the Singularity 2.4 software to quickly install and interact with SCIF. When used inside of a reproducible container, a SCIF is a recipe for reproducibility and introspection of the functions and users that it serves. Results: We use SCIF to evaluate container software, provide metrics, serve scientific workflows, and execute a primary function under different contexts. To encourage collaboration and sharing of applications, we developed tools along with an open source, version-controlled, tested, and programmatically accessible web infrastructure. SCIF and associated resources are available at https://sci-f.github.io. The ease of using SCIF, especially in the context of containers, offers promise for scientists’ work to be self-documenting and programmatically parseable for maximum reproducibility. SCIF opens up an abstraction from
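
    A rough sketch of installing a module ("app") into a SCIF-style layout is shown below; the directory layout and environment-variable names are my approximation of the format described here and may not match the specification exactly (see https://sci-f.github.io).

    ```python
    from pathlib import Path

    def install_app(base: str, name: str, runscript: str) -> dict:
        """Create an app folder with an executable entry point and return its environment."""
        app_root = Path(base) / "apps" / name
        (app_root / "bin").mkdir(parents=True, exist_ok=True)
        entry = app_root / "scif" / "runscript"
        entry.parent.mkdir(parents=True, exist_ok=True)
        entry.write_text(runscript)
        entry.chmod(0o755)
        # Hypothetical environment variables exposing the app to workflows.
        return {"SCIF_APPNAME": name, "SCIF_APPROOT": str(app_root)}

    env = install_app("/tmp/scif-demo", "align-reads", "#!/bin/sh\necho running alignment\n")
    print(env)
    ```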

  6. The Scientific Filesystem.

    PubMed

    Sochat, Vanessa

    2018-05-01

    Here, we present the Scientific Filesystem (SCIF), an organizational format that supports exposure of executables and metadata for discoverability of scientific applications. The format includes a known filesystem structure, a definition for a set of environment variables describing it, and functions for generation of the variables and interaction with the libraries, metadata, and executables located within. SCIF makes it easy to expose metadata, multiple environments, installation steps, files, and entry points to render scientific applications consistent, modular, and discoverable. A SCIF can be installed on a traditional host or in a container technology such as Docker or Singularity. We start by reviewing the background and rationale for the SCIF, followed by an overview of the specification and the different levels of internal modules ("apps") that the organizational format affords. Finally, we demonstrate that SCIF is useful by implementing and discussing several use cases that improve user interaction and understanding of scientific applications. SCIF is released along with a client and integration in the Singularity 2.4 software to quickly install and interact with SCIF. When used inside of a reproducible container, a SCIF is a recipe for reproducibility and introspection of the functions and users that it serves. We use SCIF to evaluate container software, provide metrics, serve scientific workflows, and execute a primary function under different contexts. To encourage collaboration and sharing of applications, we developed tools along with an open source, version-controlled, tested, and programmatically accessible web infrastructure. SCIF and associated resources are available at https://sci-f.github.io. The ease of using SCIF, especially in the context of containers, offers promise for scientists' work to be self-documenting and programmatically parseable for maximum reproducibility. SCIF opens up an abstraction from underlying programming languages and

  7. A Workflow to Investigate Exposure and Pharmacokinetic ...

    EPA Pesticide Factsheets

    Background: Adverse outcome pathways (AOPs) link adverse effects in individuals or populations to a molecular initiating event (MIE) that can be quantified using in vitro methods. Practical application of AOPs in chemical-specific risk assessment requires incorporation of knowledge on exposure, along with absorption, distribution, metabolism, and excretion (ADME) properties of chemicals. Objectives: We developed a conceptual workflow to examine exposure and ADME properties in relation to an MIE. The utility of this workflow was evaluated using a previously established AOP, acetylcholinesterase (AChE) inhibition. Methods: Thirty chemicals found to inhibit human AChE in the ToxCast™ assay were examined with respect to their exposure, absorption potential, and ability to cross the blood–brain barrier (BBB). Structures of active chemicals were compared against structures of 1,029 inactive chemicals to detect possible parent compounds that might have active metabolites. Results: Application of the workflow screened out 10 “low-priority” chemicals of the 30 active chemicals. Fifty-two of the 1,029 inactive chemicals exhibited a similarity of ≥ 75% with their nearest active neighbors. Of these 52 compounds, 30 were excluded due to poor absorption or distribution. The remaining 22 compounds may inhibit AChE in vivo either directly or as a result of metabolic activation. Conclusions: The incorporation of exposure and ADME properties into the conceptual workflow e
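
    The structural-similarity screen (inactive chemicals versus their nearest active neighbors at a >= 75% threshold) can be illustrated with RDKit Morgan fingerprints and Tanimoto similarity as below; the descriptor settings and the example SMILES are arbitrary stand-ins for the paper's actual choices.

    ```python
    from rdkit import Chem, DataStructs
    from rdkit.Chem import AllChem

    actives = {"active_1": "CC(=O)Oc1ccccc1C(=O)O", "active_2": "c1ccccc1O"}
    inactives = {"inactive_1": "CC(=O)Oc1ccccc1C(=O)OC", "inactive_2": "CCO"}

    def fingerprint(smiles: str):
        return AllChem.GetMorganFingerprintAsBitVect(Chem.MolFromSmiles(smiles), 2, nBits=2048)

    active_fps = {name: fingerprint(s) for name, s in actives.items()}
    for name, smiles in inactives.items():
        best_name, best_sim = max(
            ((a, DataStructs.TanimotoSimilarity(fingerprint(smiles), fp)) for a, fp in active_fps.items()),
            key=lambda pair: pair[1],
        )
        flag = "possible parent of an active metabolite" if best_sim >= 0.75 else "below threshold"
        print(f"{name}: nearest active {best_name}, Tanimoto {best_sim:.2f} -> {flag}")
    ```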

  8. Workflow4Metabolomics: a collaborative research infrastructure for computational metabolomics.

    PubMed

    Giacomoni, Franck; Le Corguillé, Gildas; Monsoor, Misharl; Landi, Marion; Pericard, Pierre; Pétéra, Mélanie; Duperier, Christophe; Tremblay-Franco, Marie; Martin, Jean-François; Jacob, Daniel; Goulitquer, Sophie; Thévenot, Etienne A; Caron, Christophe

    2015-05-01

    The complex, rapidly evolving field of computational metabolomics calls for collaborative infrastructures where the large volume of new algorithms for data pre-processing, statistical analysis and annotation can be readily integrated whatever the language, evaluated on reference datasets and chained to build ad hoc workflows for users. We have developed Workflow4Metabolomics (W4M), the first fully open-source and collaborative online platform for computational metabolomics. W4M is a virtual research environment built upon the Galaxy web-based platform technology. It enables ergonomic integration, exchange and running of individual modules and workflows. Alternatively, the whole W4M framework and computational tools can be downloaded as a virtual machine for local installation. http://workflow4metabolomics.org homepage enables users to open a private account and access the infrastructure. W4M is developed and maintained by the French Bioinformatics Institute (IFB) and the French Metabolomics and Fluxomics Infrastructure (MetaboHUB). contact@workflow4metabolomics.org. © The Author 2014. Published by Oxford University Press.

  9. A Workflow to Improve the Alignment of Prostate Imaging with Whole-mount Histopathology.

    PubMed

    Yamamoto, Hidekazu; Nir, Dror; Vyas, Lona; Chang, Richard T; Popert, Rick; Cahill, Declan; Challacombe, Ben; Dasgupta, Prokar; Chandra, Ashish

    2014-08-01

    Evaluation of prostate imaging tests against whole-mount histology specimens requires accurate alignment between radiologic and histologic data sets. Misalignment results in false-positive and -negative zones as assessed by imaging. We describe a workflow for three-dimensional alignment of prostate imaging data against whole-mount prostatectomy reference specimens and assess its performance against a standard workflow. Ethical approval was granted. Patients underwent motorized transrectal ultrasound (Prostate Histoscanning) to generate a three-dimensional image of the prostate before radical prostatectomy. The test workflow incorporated steps for axial alignment between imaging and histology, size adjustments following formalin fixation, and use of custom-made parallel cutters and digital caliper instruments. The control workflow comprised freehand cutting and assumed homogeneous block thicknesses at the same relative angles between pathology and imaging sections. Thirty radical prostatectomy specimens were histologically and radiologically processed, either by an alignment-optimized workflow (n = 20) or a control workflow (n = 10). The optimized workflow generated tissue blocks of heterogeneous thicknesses but with no significant drifting in the cutting plane. The control workflow resulted in significantly nonparallel blocks, accurately matching only one out of four histology blocks to their respective imaging data. The image-to-histology alignment accuracy was 20% greater in the optimized workflow (P < .0001), with higher sensitivity (85% vs. 69%) and specificity (94% vs. 73%) for margin prediction in a 5 × 5-mm grid analysis. A significantly better alignment was observed in the optimized workflow. Evaluation of prostate imaging biomarkers using whole-mount histology references should include a test-to-reference spatial alignment workflow. Copyright © 2014 AUR. Published by Elsevier Inc. All rights reserved.
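
    The reported sensitivity and specificity figures follow directly from grid-cell confusion counts; the sketch below shows the arithmetic with invented cell counts chosen only to reproduce the quoted percentages.

    ```python
    def sens_spec(tp: int, fp: int, tn: int, fn: int) -> tuple:
        sensitivity = tp / (tp + fn)   # tumour-bearing grid cells correctly flagged by imaging
        specificity = tn / (tn + fp)   # tumour-free grid cells correctly cleared
        return sensitivity, specificity

    # Invented grid-cell counts chosen to reproduce the reported percentages.
    print("optimized workflow:", sens_spec(tp=85, fp=6, tn=94, fn=15))   # ~85% / ~94%
    print("control workflow:  ", sens_spec(tp=69, fp=27, tn=73, fn=31))  # ~69% / ~73%
    ```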

  10. Combining the AFLOW GIBBS and elastic libraries to efficiently and robustly screen thermomechanical properties of solids

    NASA Astrophysics Data System (ADS)

    Toher, Cormac; Oses, Corey; Plata, Jose J.; Hicks, David; Rose, Frisco; Levy, Ohad; de Jong, Maarten; Asta, Mark; Fornari, Marco; Buongiorno Nardelli, Marco; Curtarolo, Stefano

    2017-06-01

    Thorough characterization of the thermomechanical properties of materials requires difficult and time-consuming experiments. This severely limits the availability of data and is one of the main obstacles for the development of effective accelerated materials design strategies. The rapid screening of new potential materials requires highly integrated, sophisticated, and robust computational approaches. We tackled the challenge by developing an automated, integrated workflow with robust error-correction within the AFLOW framework which combines the newly developed "Automatic Elasticity Library" with the previously implemented GIBBS method. The first extracts the mechanical properties from automatic self-consistent stress-strain calculations, while the latter employs those mechanical properties to evaluate the thermodynamics within the Debye model. This new thermoelastic workflow is benchmarked against a set of 74 experimentally characterized systems to pinpoint a robust computational methodology for the evaluation of bulk and shear moduli, Poisson ratios, Debye temperatures, Grüneisen parameters, and thermal conductivities of a wide variety of materials. The effect of different choices of equations of state and exchange-correlation functionals is examined and the optimum combination of properties for the Leibfried-Schlömann prediction of thermal conductivity is identified, leading to better agreement with experimental results than the GIBBS-only approach. The framework has been applied to the AFLOW.org data repositories to compute the thermoelastic properties of over 3500 unique materials. The results are now available online by using an expanded version of the REST-API described in the Appendix.
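
    For orientation, Debye-model evaluations of this kind typically obtain the Debye temperature from the computed elastic moduli via the standard relations below; the exact expressions and conventions used in the AFLOW workflow should be taken from the paper itself.

    ```latex
    % Sound velocities from the computed bulk modulus B, shear modulus G and density rho:
    v_L = \sqrt{\frac{B + \tfrac{4}{3}G}{\rho}}, \qquad
    v_T = \sqrt{\frac{G}{\rho}}, \qquad
    v_m = \left[\frac{1}{3}\left(\frac{2}{v_T^{3}} + \frac{1}{v_L^{3}}\right)\right]^{-1/3}
    % Debye temperature from the mean sound velocity and atomic number density n:
    \theta_D = \frac{\hbar}{k_B}\left(6\pi^{2} n\right)^{1/3} v_m
    ```

    Here B is the bulk modulus, G the shear modulus, rho the mass density, and n the number density of atoms; the Grüneisen parameter then enters the thermal-conductivity prediction.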

  11. High-volume workflow management in the ITN/FBI system

    NASA Astrophysics Data System (ADS)

    Paulson, Thomas L.

    1997-02-01

    The Identification Tasking and Networking (ITN) Federal Bureau of Investigation system will manage the processing of more than 70,000 submissions per day. The workflow manager controls the routing of each submission through a combination of automated and manual processing steps whose exact sequence is dynamically determined by the results at each step. For most submissions, one or more of the steps involve the visual comparison of fingerprint images. The ITN workflow manager is implemented within a scalable client/server architecture. The paper describes the key aspects of the ITN workflow manager design which allow the high volume of daily processing to be successfully accomplished.

  12. A Framework for Modeling Workflow Execution by an Interdisciplinary Healthcare Team.

    PubMed

    Kezadri-Hamiaz, Mounira; Rosu, Daniela; Wilk, Szymon; Kuziemsky, Craig; Michalowski, Wojtek; Carrier, Marc

    2015-01-01

    The use of business workflow models in healthcare is limited because of insufficient capture of complexities associated with behavior of interdisciplinary healthcare teams that execute healthcare workflows. In this paper we present a novel framework that builds on the well-founded business workflow model formalism and related infrastructures and introduces a formal semantic layer that describes selected aspects of team dynamics and supports their real-time operationalization.

  13. KDE Bioscience: platform for bioinformatics analysis workflows.

    PubMed

    Lu, Qiang; Hao, Pei; Curcin, Vasa; He, Weizhong; Li, Yuan-Yuan; Luo, Qing-Ming; Guo, Yi-Ke; Li, Yi-Xue

    2006-08-01

    Bioinformatics is a dynamic research area in which a large number of algorithms and programs have been developed rapidly and independently without much consideration so far of the need for standardization. The lack of such common standards, combined with unfriendly interfaces, makes it difficult for biologists to learn how to use these tools and to translate data formats from one to another. Consequently, the construction of an integrative bioinformatics platform to facilitate biologists' research is an urgent and challenging task. KDE Bioscience is a Java-based software platform that collects a variety of bioinformatics tools and provides a workflow mechanism to integrate them. Nucleotide and protein sequences from local flat files, web sites, and relational databases can be entered, annotated, and aligned. Several home-made or third-party viewers are built in to provide visualization of annotations or alignments. KDE Bioscience can also be deployed in client-server mode, where simultaneous execution of the same workflow is supported for multiple users. Moreover, workflows can be published as web pages that can be executed from a web browser. The power of KDE Bioscience comes from the integrated algorithms and data sources. With its generic workflow mechanism, other novel calculations and simulations can be integrated to augment the current sequence analysis functions. Because of this flexible and extensible architecture, KDE Bioscience makes an ideal integrated informatics environment for future bioinformatics or systems biology research.

  14. Semantic Document Library: A Virtual Research Environment for Documents, Data and Workflows Sharing

    NASA Astrophysics Data System (ADS)

    Kotwani, K.; Liu, Y.; Myers, J.; Futrelle, J.

    2008-12-01

    The Semantic Document Library (SDL) was driven by use cases from the environmental observatory communities and is designed to provide conventional document repository features of uploading, downloading, editing and versioning of documents, as well as value-adding features of tagging, querying, sharing, annotating, ranking, provenance, social networking and geo-spatial mapping services. It allows users to organize a catalogue of watershed observation data, model output, workflows, as well as publications and documents related to the same watershed study through the tagging capability. Users can tag all relevant materials using the same watershed name and find all of them easily later using this tag. The underpinning semantic content repository can store materials from other cyberenvironments such as workflow or simulation tools, and SDL provides an effective interface to query and organize materials from various sources. Advanced features of the SDL allow users to visualize the provenance of the materials, such as the source and how the output data were derived. Other novel features include visualizing all geo-referenced materials on a geospatial map. SDL, as a component of a cyberenvironment portal (the NCSA Cybercollaboratory), has the goal of efficient management of information and relationships between published artifacts (validated models, vetted data, workflows, annotations, best practices, reviews and papers) produced from raw research artifacts (data, notes, plans etc.) through agents (people, sensors etc.). The tremendous scientific potential of artifacts is achieved through mechanisms of sharing, reuse and collaboration - empowering scientists to spread their knowledge and protocols and to benefit from the knowledge of others. SDL successfully implements web 2.0 technologies and design patterns along with a semantic content management approach that enables use of multiple ontologies and dynamic evolution (e.g. folksonomies) of terminology. Scientific documents involved with

  15. Development of the workflow kine systems for support on KAIZEN.

    PubMed

    Mizuno, Yuki; Ito, Toshihiko; Yoshikawa, Toru; Yomogida, Satoshi; Morio, Koji; Sakai, Kazuhiro

    2012-01-01

    In this paper, we introduce a new workflow line system consisting of location and image recording, which enables the acquisition of workflow information and its analysis and display. From the results of the workflow line investigation, we considered the anticipated effects on KAIZEN and the remaining problems. Workflow line information included location information and action-content information. These technologies suggest viewpoints to help improvement, for example the elimination of wasted movement, the redesign of layouts and the review of work procedures. In a manufacturing factory, it was clear that there was considerable movement away from the standard operation place and accumulated residence time. As a concrete result of this investigation, a more efficient layout was suggested by this system. In the case of a hospital, similarly, the system pointed out problems of layout and setup operations, based on the effective movement patterns of the experts. This system can be adapted to routine as well as non-routine work. Through the development of this system, which can fit and adapt to industrial diversification, more effective "visual management" (visualization of work) is expected in the future.

  16. [Integration of the radiotherapy irradiation planning in the digital workflow].

    PubMed

    Röhner, F; Schmucker, M; Henne, K; Momm, F; Bruggmoser, G; Grosu, A-L; Frommhold, H; Heinemann, F E

    2013-02-01

    At the Clinic of Radiotherapy at the University Hospital Freiburg, all relevant workflows are paperless. After implementing the Operating Schedule System (OSS) as a framework, all processes are being implemented into the departmental system MOSAIQ. Designing a digital workflow for radiotherapy irradiation planning is a major challenge: it requires interdisciplinary expertise, and the interfaces between the professions therefore also have to be interdisciplinary. For every single step of radiotherapy irradiation planning, distinct responsibilities have to be defined and documented. All aspects of digital storage, backup and long-term availability of data were considered and had already been realized during the OSS project. After an analysis of the complete workflow and the statutory requirements, a detailed project plan was designed. In an interdisciplinary workgroup, problems were discussed and a detailed flowchart was developed. The new functionalities were implemented in a testing environment by the Clinical and Administrative IT Department (CAI). After extensive tests they were integrated into the new modular department system. The Clinic of Radiotherapy succeeded in realizing a completely digital workflow for radiotherapy irradiation planning. During the testing phase, our digital workflow was examined and afterwards approved by the responsible authority.

  17. Enhancing and Customizing Laboratory Information Systems to Improve/Enhance Pathologist Workflow.

    PubMed

    Hartman, Douglas J

    2015-06-01

    Optimizing pathologist workflow can be difficult because it is affected by many variables. Surgical pathologists must complete many tasks that culminate in a final pathology report. Several software systems can be used to enhance/improve pathologist workflow. These include voice recognition software, pre-sign-out quality assurance, image utilization, and computerized provider order entry. Recent changes in diagnostic coding and the more prominent role of centralized electronic health records represent further opportunities to enhance/improve the workflow for surgical pathologists. Additional unforeseen changes to pathologist workflow may accompany the introduction of whole-slide imaging technology into routine diagnostic work. Copyright © 2015 Elsevier Inc. All rights reserved.

  18. Enhancing and Customizing Laboratory Information Systems to Improve/Enhance Pathologist Workflow.

    PubMed

    Hartman, Douglas J

    2016-03-01

    Optimizing pathologist workflow can be difficult because it is affected by many variables. Surgical pathologists must complete many tasks that culminate in a final pathology report. Several software systems can be used to enhance/improve pathologist workflow. These include voice recognition software, pre-sign-out quality assurance, image utilization, and computerized provider order entry. Recent changes in diagnostic coding and the more prominent role of centralized electronic health records represent further opportunities to enhance/improve the workflow for surgical pathologists. Additional unforeseen changes to pathologist workflow may accompany the introduction of whole-slide imaging technology into routine diagnostic work. Copyright © 2016 Elsevier Inc. All rights reserved.

  19. HoloVir: A Workflow for Investigating the Diversity and Function of Viruses in Invertebrate Holobionts

    PubMed Central

    Laffy, Patrick W.; Wood-Charlson, Elisha M.; Turaev, Dmitrij; Weynberg, Karen D.; Botté, Emmanuelle S.; van Oppen, Madeleine J. H.; Webster, Nicole S.; Rattei, Thomas

    2016-01-01

    Abundant bioinformatics resources are available for the study of complex microbial metagenomes; however, their utility in viral metagenomics is limited. HoloVir is a robust and flexible data analysis pipeline that provides an optimized and validated workflow for taxonomic and functional characterization of viral metagenomes derived from invertebrate holobionts. Simulated viral metagenomes comprising varying levels of viral diversity and abundance were used to determine the optimal assembly and gene prediction strategy, and multiple sequence assembly methods and gene prediction tools were tested in order to optimize our analysis workflow. HoloVir performs pairwise comparisons of single-read and predicted-gene datasets against the viral RefSeq database to assign taxonomy, and additional comparison to phage-specific and cellular markers is undertaken to support the taxonomic assignments and identify potential cellular contamination. Broad functional classification of the predicted genes is provided by assignment of COG microbial functional category classifications using EggNOG, and higher-resolution functional analysis is achieved by searching for enrichment of specific Swiss-Prot keywords within the viral metagenome. Application of HoloVir to viral metagenomes from the coral Pocillopora damicornis and the sponge Rhopaloeides odorabile demonstrated that HoloVir provides a valuable tool to characterize holobiont viral communities across species, environments, or experiments. PMID:27375564
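
    A small sketch of the keyword-enrichment idea from the functional analysis step, assuming SciPy is available: each Swiss-Prot keyword's frequency among the metagenome's predicted genes is compared against a background set with Fisher's exact test. The counts below are invented, and the real HoloVir pipeline is considerably more involved.

    from scipy.stats import fisher_exact

    def keyword_enrichment(sample_counts, sample_total, background_counts, background_total):
        """Return {keyword: (odds_ratio, p_value)} for one-sided enrichment tests."""
        results = {}
        for kw, k in sample_counts.items():
            b = background_counts.get(kw, 0)
            # 2x2 contingency table: keyword vs. not-keyword, sample vs. background
            table = [[k, sample_total - k], [b, background_total - b]]
            odds, p = fisher_exact(table, alternative="greater")
            results[kw] = (odds, p)
        return results

    # Invented counts for illustration only.
    sample = {"Capsid protein": 40, "DNA replication": 25}
    background = {"Capsid protein": 120, "DNA replication": 900}
    print(keyword_enrichment(sample, 1_000, background, 100_000))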

  20. Developing integrated workflows for the digitisation of herbarium specimens using a modular and scalable approach.

    PubMed

    Haston, Elspeth; Cubey, Robert; Pullan, Martin; Atkins, Hannah; Harris, David J

    2012-01-01

    Digitisation programmes in many institutes frequently involve disparate and irregular funding, diverse selection criteria and scope, with different members of staff managing and operating the processes. These factors have influenced the decision at the Royal Botanic Garden Edinburgh to develop an integrated workflow for the digitisation of herbarium specimens which is modular and scalable to enable a single overall workflow to be used for all digitisation projects. This integrated workflow comprises three principal elements: a specimen workflow, a data workflow and an image workflow. The specimen workflow is strongly linked to curatorial processes which will impact on the prioritisation, selection and preparation of the specimens. The importance of including a conservation element within the digitisation workflow is highlighted. The data workflow includes the concept of three main categories of collection data: label data, curatorial data and supplementary data. It is shown that each category of data has its own properties which influence the timing of data capture within the workflow. Development of software has been carried out for the rapid capture of curatorial data, and optical character recognition (OCR) software is being used to increase the efficiency of capturing label data and supplementary data. The large number and size of the images has necessitated the inclusion of automated systems within the image workflow.
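
    A minimal sketch of OCR-assisted label capture of the kind described, assuming the Tesseract engine is installed along with the pytesseract and Pillow packages; the barcode pattern and parsing heuristics are hypothetical and are not the software developed at Edinburgh.

    import re
    from PIL import Image
    import pytesseract

    def capture_label_data(image_path):
        # Run OCR over the whole sheet image and keep the raw text for later review.
        text = pytesseract.image_to_string(Image.open(image_path))
        record = {"raw_ocr": text}
        # Toy heuristics: pull out a barcode-style catalogue number and a year.
        barcode = re.search(r"\bE\d{8}\b", text)       # hypothetical barcode format
        year = re.search(r"\b(18|19|20)\d{2}\b", text)
        if barcode:
            record["catalogue_number"] = barcode.group(0)
        if year:
            record["collection_year"] = year.group(0)
        return record

    # record = capture_label_data("herbarium_sheet_0001.jpg")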

  1. The Nasa-Isro SAR Mission Science Data Products and Processing Workflows

    NASA Astrophysics Data System (ADS)

    Rosen, P. A.; Agram, P. S.; Lavalle, M.; Cohen, J.; Buckley, S.; Kumar, R.; Misra-Ray, A.; Ramanujam, V.; Agarwal, K. M.

    2017-12-01

    The NASA-ISRO SAR (NISAR) Mission is currently in the development phase and in the process of specifying its suite of data products and algorithmic workflows, responding to inputs from the NISAR Science and Applications Team. NISAR will provide raw data (Level 0), full-resolution complex imagery (Level 1), and interferometric and polarimetric image products (Level 2) for the entire data set, in both natural radar and geocoded coordinates. NASA and ISRO are coordinating the formats, metadata layers, and algorithms for these products, for both the NASA-provided L-band radar and the ISRO-provided S-band radar. Higher-level products will also be generated for the purpose of calibration and validation, over large areas of Earth, including tectonic plate boundaries, ice sheets and sea ice, and areas of ecosystem disturbance and change. This level of comprehensive product generation is unprecedented for SAR missions and leads to storage and processing challenges for the production system and the archive center. Further, recognizing the potential to support applications that require low-latency product generation and delivery, the NISAR team is optimizing the entire end-to-end ground data system for such response, including exploring the advantages of cloud-based processing, algorithmic acceleration using GPUs, and on-demand processing schemes that minimize computational and transport costs but allow rapid delivery to science and applications users. This paper reviews the current products and workflows and discusses the scientific and operational trade-space of mission capabilities.

  2. Processes in scientific workflows for information seeking related to physical sample materials

    NASA Astrophysics Data System (ADS)

    Ramdeen, S.

    2014-12-01

    The majority of State Geological Surveys have repositories containing cores, cuttings, fossils or other physical sample material. State surveys maintain these collections to support their own research as well as the research conducted by external users from other organizations, including government agencies (state and federal), academia, industry and the public. The preliminary results presented in this paper examine the research processes of these external users, in particular how they discover, access and use the digital surrogates through which they evaluate and gain access to physical items in these collections. Data such as physical samples are materials that cannot be completely replaced with digital surrogates. Digital surrogates may be represented as metadata, which enable discovery and ultimately access to these samples. These surrogates may be found in records, databases, publications, etc. However, surrogates do not completely remove the need for access to the physical item, as the surrogates cannot be subjected to chemical testing and/or other similar analysis. The goal of this research is to document the various processes external users perform in order to access physical materials. Data for this study will be collected by conducting interviews with these external users. During the interviews, participants will be asked to describe the workflow that led them to interact with state survey repositories and what steps they took afterward. High-level processes/categories of behavior will be identified. These processes will be used in the development of an information-seeking behavior model. This model may be used to facilitate the development of management tools and other aspects of cyberinfrastructure related to physical samples.

  3. Deployment of precise and robust sensors on board ISS-for scientific experiments and for operation of the station.

    PubMed

    Stenzel, Christian

    2016-09-01

    The International Space Station (ISS) is the largest technical vehicle ever built by mankind. It provides a living area for six astronauts and also represents a laboratory in which scientific experiments are conducted in an extraordinary environment. The deployed sensor technology contributes significantly to the operational and scientific success of the station. The sensors on board the ISS can thereby be classified into two categories which differ significantly in their key features: (1) sensors related to crew and station health, and (2) sensors to provide specific measurements in research facilities. The operation of the station requires robust, long-term stable and reliable sensors, since they assure the survival of the astronauts and the intactness of the station. Recently, a wireless sensor network for measuring environmental parameters like temperature, pressure, and humidity was established, and its function could be successfully verified over several months. Such a network enhances the operational reliability and stability of monitoring these critical parameters compared to single sensors. The sensors which are implemented in the research facilities have to fulfil other objectives. The high performance of the scientific experiments that are conducted in the different research facilities on board demands the perfect embedding of the sensor in the respective instrumental setup, which forms the complete measurement chain. It is shown that the performance of the single sensor alone does not determine the success of the measurement task; moreover, the synergy between different sensors and actuators, as well as appropriate sample taking followed by appropriate sample preparation, plays an essential role. The application in a space environment adds additional challenges to the sensor technology, for example the necessity for miniaturisation, automation, reliability, and long-term operation. An alternative is the repetitive calibration of the sensors. This approach
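
    A toy sketch of why redundant networked sensors improve reliability over a single sensor: readings are filtered against plausibility limits and fused with a median vote. The limits and values are invented for illustration and have no relation to the actual ISS network.

    from statistics import median

    # Arbitrary illustrative plausibility limits per measured quantity.
    PLAUSIBLE_RANGE = {"temperature_C": (10.0, 40.0), "pressure_hPa": (900.0, 1100.0)}

    def fuse(readings, quantity):
        lo, hi = PLAUSIBLE_RANGE[quantity]
        valid = [r for r in readings if lo <= r <= hi]
        if not valid:
            raise ValueError(f"no plausible {quantity} readings")
        return median(valid)

    # One node drifted badly; the fused value is unaffected.
    print(fuse([22.1, 22.4, 55.0, 22.2], "temperature_C"))  # -> 22.2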

  4. Exploring the impact of an automated prescription-filling device on community pharmacy technician workflow.

    PubMed

    Walsh, Kristin E; Chui, Michelle Anne; Kieser, Mara A; Williams, Staci M; Sutter, Susan L; Sutter, John G

    2011-01-01

    To explore community pharmacy technician workflow change after implementation of an automated robotic prescription-filling device. At an independent community pharmacy in rural Mayville, WI, pharmacy technicians were observed before and 3 months after installation of an automated robotic prescription-filling device. The main outcome measures were sequences and timing of technician workflow steps, workflow interruptions, automation surprises, and workarounds. Of the 77 and 80 observations made before and 3 months after robot installation, respectively, 17 different workflow sequences were observed before installation and 38 after installation. Average prescription filling time was reduced by 40 seconds per prescription with use of the robot. Workflow interruptions per observation increased from 1.49 to 1.79 (P = 0.11), and workarounds increased from 10% to 36% after robot use. Although automated prescription-filling devices can increase efficiency, workflow interruptions and workarounds may negate that efficiency. Assessing changes in workflow and sequencing of tasks that may result from the use of automation can help uncover opportunities for workflow policy and procedure redesign.

  5. Exploring the impact of an automated prescription-filling device on community pharmacy technician workflow

    PubMed Central

    Walsh, Kristin E.; Chui, Michelle Anne; Kieser, Mara A.; Williams, Staci M.; Sutter, Susan L.; Sutter, John G.

    2012-01-01

    Objective To explore community pharmacy technician workflow change after implementation of an automated robotic prescription-filling device. Methods At an independent community pharmacy in rural Mayville, WI, pharmacy technicians were observed before and 3 months after installation of an automated robotic prescription-filling device. The main outcome measures were sequences and timing of technician workflow steps, workflow interruptions, automation surprises, and workarounds. Results Of the 77 and 80 observations made before and 3 months after robot installation, respectively, 17 different workflow sequences were observed before installation and 38 after installation. Average prescription filling time was reduced by 40 seconds per prescription with use of the robot. Workflow interruptions per observation increased from 1.49 to 1.79 (P = 0.11), and workarounds increased from 10% to 36% after robot use. Conclusion Although automated prescription-filling devices can increase efficiency, workflow interruptions and workarounds may negate that efficiency. Assessing changes in workflow and sequencing of tasks that may result from the use of automation can help uncover opportunities for workflow policy and procedure redesign. PMID:21896459

  6. Flexible Early Warning Systems with Workflows and Decision Tables

    NASA Astrophysics Data System (ADS)

    Riedel, F.; Chaves, F.; Zeiner, H.

    2012-04-01

    An essential part of early warning systems and systems for crisis management are decision support systems that facilitate communication and collaboration. Often, official policies specify how different organizations collaborate and what information is communicated to whom. For early warning systems it is crucial that information is exchanged dynamically in a timely manner and that all participants get exactly the information they need to fulfil their role in the crisis management process. Information technology obviously lends itself to automating parts of the process. We have found, however, that in current operational systems the information logistics processes are hard-coded, even though they are subject to change. In addition, systems are tailored to the policies and requirements of a certain organization, and changes can require major software refactoring. We seek to develop a system that can be deployed and adapted to multiple organizations with different dynamic runtime policies. A major requirement for such a system is that changes can be applied locally without affecting larger parts of the system. In addition to the flexibility regarding changes in policies and processes, the system needs to be able to evolve; when new information sources become available, it should be possible to integrate and use these in the decision process. In general, this kind of flexibility comes with a significant increase in complexity. This implies that only IT professionals can maintain a system that can be reconfigured and adapted; end-users are unable to utilise the provided flexibility. In the business world similar problems arise, and previous work suggested using business process management systems (BPMS) or workflow management systems (WfMS) to guide and automate early warning processes or crisis management plans. However, the usability and flexibility of current WfMS are limited, because current notations and user interfaces are still not suitable for end-users, and workflows
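
    A minimal sketch of a decision table kept as data rather than hard-coded logic, which is the kind of flexibility argued for here: notification rules can be edited without touching the engine. Field names, thresholds and recipients are hypothetical.

    DECISION_TABLE = [
        # (condition, recipients to notify)
        (lambda e: e["water_level_m"] >= 7.0, ["civil_protection", "mayor", "public_broadcast"]),
        (lambda e: 5.0 <= e["water_level_m"] < 7.0, ["civil_protection", "mayor"]),
        (lambda e: e["sensor_status"] == "failed", ["maintenance_team"]),
    ]

    def evaluate(event):
        notified = []
        for condition, recipients in DECISION_TABLE:
            if condition(event):
                notified.extend(r for r in recipients if r not in notified)
        return notified

    print(evaluate({"water_level_m": 7.3, "sensor_status": "ok"}))
    # -> ['civil_protection', 'mayor', 'public_broadcast']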

  7. Considering Time in Orthophotography Production: from a General Workflow to a Shortened Workflow for a Faster Disaster Response

    NASA Astrophysics Data System (ADS)

    Lucas, G.

    2015-08-01

    This article deals with the production time of orthophoto imagery acquired with a medium-size digital frame camera. The workflow examination covers two main parts: data acquisition and post-processing. The objectives of the research are fourfold: 1/ gathering time references for the most important steps of orthophoto production (it turned out that the literature is lacking on this topic); these figures are used later for total production time estimation; 2/ identifying levers for reducing orthophoto production time; 3/ building a simplified production workflow for emergency response that is less demanding with respect to accuracy and faster, and comparing it to a classical workflow; 4/ providing methodical elements for the estimation of production time for a custom project. In the data acquisition part, a comprehensive review lists and describes all the factors that may affect acquisition efficiency. Using a simulation with different variables (average line length, turn time, flight speed), their effect on acquisition efficiency is quantitatively examined. Regarding post-processing, the time reference figures were collected from the processing of a 1000-frame case study with 15 cm GSD covering a rectangular area of 447 km2; the time required to achieve each step during the production is recorded. When several technical options are possible, each one is tested and its time documented so that all alternatives are available. Based on a technical choice for the workflow and using the compiled time references of the elementary steps, a total time is calculated for the post-processing of the 1000 frames. Two scenarios are compared with regard to time and accuracy. The first one follows the "normal" practices, comprising triangulation, orthorectification and advanced mosaicking methods (feature detection, seam line editing and seam applicator); the second is simplified and makes compromises on positional accuracy (using direct geo-referencing) and seamline preparation in order to achieve
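
    A back-of-the-envelope sketch of the acquisition-efficiency variables mentioned above (average line length, turn time, flight speed): total flight time splits into line time and turn time, and the line-time share is one simple efficiency measure. All numbers are invented, not figures from the study.

    def acquisition_time_h(n_lines, avg_line_length_km, flight_speed_kmh, turn_time_min):
        line_time_h = n_lines * avg_line_length_km / flight_speed_kmh
        turn_time_h = (n_lines - 1) * turn_time_min / 60.0
        total_h = line_time_h + turn_time_h
        efficiency = line_time_h / total_h   # share of time spent flying lines
        return total_h, efficiency

    for turn in (2.0, 4.0):  # minutes per turn
        total, eff = acquisition_time_h(n_lines=20, avg_line_length_km=25.0,
                                        flight_speed_kmh=220.0, turn_time_min=turn)
        print(f"turn={turn:.0f} min: total={total:.2f} h, efficiency={eff:.0%}")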

  8. Planning bioinformatics workflows using an expert system

    PubMed Central

    Chen, Xiaoling; Chang, Jeffrey T.

    2017-01-01

    Abstract Motivation: Bioinformatic analyses are becoming formidably more complex due to the increasing number of steps required to process the data, as well as the proliferation of methods that can be used in each step. To alleviate this difficulty, pipelines are commonly employed. However, pipelines are typically implemented to automate a specific analysis, and thus are difficult to use for exploratory analyses requiring systematic changes to the software or parameters used. Results: To automate the development of pipelines, we have investigated expert systems. We created the Bioinformatics ExperT SYstem (BETSY) that includes a knowledge base where the capabilities of bioinformatics software are explicitly and formally encoded. BETSY is a backwards-chaining rule-based expert system comprising a data model that can capture the richness of biological data, and an inference engine that reasons on the knowledge base to produce workflows. Currently, the knowledge base is populated with rules to analyze microarray and next generation sequencing data. We evaluated BETSY and found that it could generate workflows that reproduce and go beyond previously published bioinformatics results. Finally, a meta-investigation of the workflows generated from the knowledge base produced a quantitative measure of the technical burden imposed by each step of bioinformatics analyses, revealing the large number of steps devoted to the pre-processing of data. In sum, an expert system approach can facilitate exploratory bioinformatic analysis by automating the development of workflows, a task that requires significant domain expertise. Availability and Implementation: https://github.com/jefftc/changlab Contact: jeffrey.t.chang@uth.tmc.edu PMID:28052928
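
    A toy backward-chaining planner in the spirit of such a system, under the assumption that each rule records which data type a tool produces and which inputs it needs; the rule set and data-type names are invented and are not BETSY's knowledge base.

    RULES = [
        # (produced_type, required_types, tool)
        ("aligned_reads", ["fastq_reads", "reference_genome"], "aligner"),
        ("expression_matrix", ["aligned_reads", "gene_annotation"], "quantifier"),
        ("de_genes", ["expression_matrix", "sample_groups"], "diff_expression"),
    ]

    def plan(goal, available):
        """Chain backwards from `goal` to data in `available`; return tool order or None."""
        if goal in available:
            return []
        for produced, required, tool in RULES:
            if produced != goal:
                continue
            steps = []
            for req in required:
                sub = plan(req, available)
                if sub is None:
                    break
                steps.extend(s for s in sub if s not in steps)
            else:
                steps.append(tool)
                return steps
        return None  # goal cannot be derived from the available data

    print(plan("de_genes", {"fastq_reads", "reference_genome",
                            "gene_annotation", "sample_groups"}))
    # -> ['aligner', 'quantifier', 'diff_expression']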

  9. Developing integrated workflows for the digitisation of herbarium specimens using a modular and scalable approach

    PubMed Central

    Haston, Elspeth; Cubey, Robert; Pullan, Martin; Atkins, Hannah; Harris, David J

    2012-01-01

    Abstract Digitisation programmes in many institutes frequently involve disparate and irregular funding, diverse selection criteria and scope, with different members of staff managing and operating the processes. These factors have influenced the decision at the Royal Botanic Garden Edinburgh to develop an integrated workflow for the digitisation of herbarium specimens which is modular and scalable to enable a single overall workflow to be used for all digitisation projects. This integrated workflow comprises three principal elements: a specimen workflow, a data workflow and an image workflow. The specimen workflow is strongly linked to curatorial processes which will impact on the prioritisation, selection and preparation of the specimens. The importance of including a conservation element within the digitisation workflow is highlighted. The data workflow includes the concept of three main categories of collection data: label data, curatorial data and supplementary data. It is shown that each category of data has its own properties which influence the timing of data capture within the workflow. Development of software has been carried out for the rapid capture of curatorial data, and optical character recognition (OCR) software is being used to increase the efficiency of capturing label data and supplementary data. The large number and size of the images has necessitated the inclusion of automated systems within the image workflow. PMID:22859881

  10. Structuring clinical workflows for diabetes care: an overview of the OntoHealth approach.

    PubMed

    Schweitzer, M; Lasierra, N; Oberbichler, S; Toma, I; Fensel, A; Hoerbst, A

    2014-01-01

    Electronic health records (EHRs) play an important role in the treatment of chronic diseases such as diabetes mellitus. Although the interoperability and selected functionality of EHRs are already addressed by a number of standards and best practices, such as IHE or HL7, the majority of these systems are still monolithic from a user-functionality perspective. The purpose of the OntoHealth project is to foster a functionally flexible, standards-based use of EHRs to support clinical routine task execution by means of workflow patterns and to shift the present EHR usage to a more comprehensive integration concerning complete clinical workflows. The goal of this paper is, first, to introduce the basic architecture of the proposed OntoHealth project and, second, to present selected functional needs and a functional categorization regarding workflow-based interactions with EHRs in the domain of diabetes. A systematic literature review regarding attributes of workflows in the domain of diabetes was conducted. Eligible references were gathered and analyzed using a qualitative content analysis. Subsequently, a functional workflow categorization was derived from diabetes-specific raw data together with existing general workflow patterns. This paper presents the design of the architecture as well as a categorization model which makes it possible to describe the components or building blocks within clinical workflows. The results of our study led us to identify basic building blocks, named actions, decisions, and data elements, which allow the composition of clinical workflows within five identified contexts. The categorization model allows for a description of the components or building blocks of clinical workflows from a functional view.
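
    A minimal sketch of the three building-block categories (actions, decisions, data elements) as plain types composed into a tiny workflow fragment; the clinical content is invented for illustration and does not reflect the OntoHealth categorization in detail.

    from dataclasses import dataclass
    from typing import Callable, List, Union

    @dataclass
    class DataElement:
        name: str
        value: float

    @dataclass
    class Action:
        name: str
        run: Callable[[dict], None]

    @dataclass
    class Decision:
        name: str
        predicate: Callable[[dict], bool]
        if_true: List[Union["Action", "Decision"]]
        if_false: List[Union["Action", "Decision"]]

    def execute(blocks, context):
        for block in blocks:
            if isinstance(block, Action):
                block.run(context)
            elif isinstance(block, Decision):
                execute(block.if_true if block.predicate(context) else block.if_false, context)

    # Hypothetical diabetes-care fragment: record a lab value, then branch on it.
    workflow = [
        Action("record HbA1c", lambda ctx: ctx.update(hba1c=DataElement("HbA1c_percent", 8.4))),
        Decision("HbA1c above target?",
                 lambda ctx: ctx["hba1c"].value > 7.0,
                 if_true=[Action("flag for therapy review", lambda ctx: print("review therapy"))],
                 if_false=[Action("schedule routine follow-up", lambda ctx: print("routine follow-up"))]),
    ]
    execute(workflow, {})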

  11. Research and Implementation of Key Technologies in Multi-Agent System to Support Distributed Workflow

    NASA Astrophysics Data System (ADS)

    Pan, Tianheng

    2018-01-01

    In recent years, the combination of workflow management systems and multi-agent technology has become a hot research field. The lack of flexibility in workflow management systems can be mitigated by introducing multi-agent collaborative management. The workflow management system adopts a distributed structure, which solves the problem that the traditional centralized workflow structure is fragile. In this paper, the agents of a distributed workflow management system are divided according to their functions, the execution process of each type of agent is analyzed, and key technologies such as process execution and resource management are examined.

  12. myExperiment: a repository and social network for the sharing of bioinformatics workflows

    PubMed Central

    Goble, Carole A.; Bhagat, Jiten; Aleksejevs, Sergejs; Cruickshank, Don; Michaelides, Danius; Newman, David; Borkum, Mark; Bechhofer, Sean; Roos, Marco; Li, Peter; De Roure, David

    2010-01-01

    myExperiment (http://www.myexperiment.org) is an online research environment that supports the social sharing of bioinformatics workflows. These workflows are procedures consisting of a series of computational tasks using web services, which may be performed on data from its retrieval, integration and analysis, to the visualization of the results. As a public repository of workflows, myExperiment allows anybody to discover those that are relevant to their research, which can then be reused and repurposed to their specific requirements. Conversely, developers can submit their workflows to myExperiment and enable them to be shared in a secure manner. Since its release in 2007, myExperiment has attracted over 3500 registered users and contains more than 1000 workflows. The social aspect to the sharing of these workflows is facilitated by registered users forming virtual communities bound together by a common interest or research project. Contributors of workflows can build their reputation within these communities by receiving feedback and credit from individuals who reuse their work. Further documentation about myExperiment, including its REST web service, is available from http://wiki.myexperiment.org. Feedback and requests for support can be sent to bugs@myexperiment.org. PMID:20501605

  13. Enabling Big Geoscience Data Analytics with a Cloud-Based, MapReduce-Enabled and Service-Oriented Workflow Framework

    PubMed Central

    Li, Zhenlong; Yang, Chaowei; Jin, Baoxuan; Yu, Manzhu; Liu, Kai; Sun, Min; Zhan, Matthew

    2015-01-01

    Geoscience observations and model simulations are generating vast amounts of multi-dimensional data. Effectively analyzing these data is essential for geoscience studies. However, the tasks are challenging for geoscientists because processing the massive amount of data is both computing- and data-intensive, in that data analytics requires complex procedures and multiple tools. To tackle these challenges, a scientific workflow framework is proposed for big geoscience data analytics. In this framework, techniques are proposed that leverage cloud computing, MapReduce, and Service Oriented Architecture (SOA). Specifically, HBase is adopted for storing and managing big geoscience data across distributed computers. A MapReduce-based algorithm framework is developed to support parallel processing of geoscience data. In addition, a service-oriented workflow architecture is built for supporting on-demand complex data analytics in the cloud environment. A proof-of-concept prototype tests the performance of the framework. Results show that this innovative framework significantly improves the efficiency of big geoscience data analytics by reducing the data processing time as well as simplifying data analytical procedures for geoscientists. PMID:25742012
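
    A minimal in-process illustration of the MapReduce pattern the framework builds on: mappers emit (key, value) pairs, a shuffle groups them by key, and a reducer aggregates each group. The gridded records are invented; the actual framework runs over HBase on a distributed cluster.

    from collections import defaultdict

    records = [
        {"cell": "A1", "temperature": 21.5}, {"cell": "A1", "temperature": 22.3},
        {"cell": "B2", "temperature": 18.0}, {"cell": "B2", "temperature": 18.6},
    ]

    def mapper(record):
        yield record["cell"], record["temperature"]

    def reducer(key, values):
        return key, sum(values) / len(values)  # mean temperature per grid cell

    def map_reduce(data, mapper, reducer):
        groups = defaultdict(list)
        for record in data:                      # map + shuffle
            for key, value in mapper(record):
                groups[key].append(value)
        return dict(reducer(k, v) for k, v in groups.items())  # reduce

    print(map_reduce(records, mapper, reducer))  # {'A1': 21.9, 'B2': 18.3}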

  14. Enabling big geoscience data analytics with a cloud-based, MapReduce-enabled and service-oriented workflow framework.

    PubMed

    Li, Zhenlong; Yang, Chaowei; Jin, Baoxuan; Yu, Manzhu; Liu, Kai; Sun, Min; Zhan, Matthew

    2015-01-01

    Geoscience observations and model simulations are generating vast amounts of multi-dimensional data. Effectively analyzing these data is essential for geoscience studies. However, the tasks are challenging for geoscientists because processing the massive amount of data is both computing- and data-intensive, in that data analytics requires complex procedures and multiple tools. To tackle these challenges, a scientific workflow framework is proposed for big geoscience data analytics. In this framework, techniques are proposed that leverage cloud computing, MapReduce, and Service Oriented Architecture (SOA). Specifically, HBase is adopted for storing and managing big geoscience data across distributed computers. A MapReduce-based algorithm framework is developed to support parallel processing of geoscience data. In addition, a service-oriented workflow architecture is built for supporting on-demand complex data analytics in the cloud environment. A proof-of-concept prototype tests the performance of the framework. Results show that this innovative framework significantly improves the efficiency of big geoscience data analytics by reducing the data processing time as well as simplifying data analytical procedures for geoscientists.

  15. Workflow interruptions, cognitive failure and near-accidents in health care.

    PubMed

    Elfering, Achim; Grebner, Simone; Ebener, Corinne

    2015-01-01

    Errors are frequent in health care. A specific model was tested in which failure in cognitive action regulation mediates the influence of nurses' workflow interruptions and safety conscientiousness on near-accidents in health care. One hundred and sixty-five nurses from seven Swiss hospitals participated in a questionnaire survey. Structural equation modelling confirmed the hypothesised mediation model. Cognitive failure in action regulation significantly mediated the influence of workflow interruptions on near-accidents (p < .05). An indirect path from conscientiousness to near-accidents via cognitive failure in action regulation was also significant (p < .05). Compliance with safety regulations was significantly related to cognitive failure and near-accidents; moreover, cognitive failure mediated the association between compliance and near-accidents (p < .05). Contrary to expectations, compliance with safety regulations was not related to workflow interruptions. Workflow interruptions caused by colleagues, patients and organisational constraints are likely to trigger errors in nursing. Work redesign is recommended to reduce cognitive failure and improve the safety of nurses and patients.
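
    A simplified illustration of the mediation logic (interruptions to cognitive failure to near-accidents) on synthetic data: paths a and b are estimated with ordinary least squares and the indirect effect a*b is bootstrapped. The study itself used structural equation modelling on survey measures; this sketch only shows the shape of such an analysis.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 165
    interruptions = rng.normal(size=n)
    cognitive_failure = 0.5 * interruptions + rng.normal(scale=0.8, size=n)   # true path a
    near_accidents = 0.4 * cognitive_failure + rng.normal(scale=0.8, size=n)  # true path b

    def coef(X_cols, y, which):
        """OLS coefficient of predictor `which` (0-based, after the intercept)."""
        X = np.column_stack([np.ones(len(y))] + list(X_cols))
        return np.linalg.lstsq(X, y, rcond=None)[0][1 + which]

    boot = []
    for _ in range(2000):
        idx = rng.integers(0, n, n)
        a = coef([interruptions[idx]], cognitive_failure[idx], 0)
        b = coef([interruptions[idx], cognitive_failure[idx]], near_accidents[idx], 1)
        boot.append(a * b)

    lo, hi = np.percentile(boot, [2.5, 97.5])
    print(f"indirect effect a*b: 95% CI [{lo:.2f}, {hi:.2f}]")  # CI excluding zero suggests mediation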

  16. Workflow computing. Improving management and efficiency of pathology diagnostic services.

    PubMed

    Buffone, G J; Moreau, D; Beck, J R

    1996-04-01

    Traditionally, information technology in health care has helped practitioners to collect, store, and present information and also to add a degree of automation to simple tasks (instrument interfaces supporting result entry, for example). Thus commercially available information systems do little to support the need to model, execute, monitor, coordinate, and revise the various complex clinical processes required to support health-care delivery. Workflow computing, which is already implemented and improving the efficiency of operations in several nonmedical industries, can address the need to manage complex clinical processes. Workflow computing not only provides a means to define and manage the events, roles, and information integral to health-care delivery but also supports the explicit implementation of policy or rules appropriate to the process. This article explains how workflow computing may be applied to health-care and the inherent advantages of the technology, and it defines workflow system requirements for use in health-care delivery with special reference to diagnostic pathology.

  17. Digitization workflows for flat sheets and packets of plants, algae, and fungi1

    PubMed Central

    Nelson, Gil; Sweeney, Patrick; Wallace, Lisa E.; Rabeler, Richard K.; Allard, Dorothy; Brown, Herrick; Carter, J. Richard; Denslow, Michael W.; Ellwood, Elizabeth R.; Germain-Aubrey, Charlotte C.; Gilbert, Ed; Gillespie, Emily; Goertzen, Leslie R.; Legler, Ben; Marchant, D. Blaine; Marsico, Travis D.; Morris, Ashley B.; Murrell, Zack; Nazaire, Mare; Neefus, Chris; Oberreiter, Shanna; Paul, Deborah; Ruhfel, Brad R.; Sasek, Thomas; Shaw, Joey; Soltis, Pamela S.; Watson, Kimberly; Weeks, Andrea; Mast, Austin R.

    2015-01-01

    Effective workflows are essential components in the digitization of biodiversity specimen collections. To date, no comprehensive, community-vetted workflows have been published for digitizing flat sheets and packets of plants, algae, and fungi, even though latest estimates suggest that only 33% of herbarium specimens have been digitally transcribed, 54% of herbaria use a specimen database, and 24% are imaging specimens. In 2012, iDigBio, the U.S. National Science Foundation’s (NSF) coordinating center and national resource for the digitization of public, nonfederal U.S. collections, launched several working groups to address this deficiency. Here, we report the development of 14 workflow modules with 7–36 tasks each. These workflows represent the combined work of approximately 35 curators, directors, and collections managers representing more than 30 herbaria, including 15 NSF-supported plant-related Thematic Collections Networks and collaboratives. The workflows are provided for download as Portable Document Format (PDF) and Microsoft Word files. Customization of these workflows for specific institutional implementation is encouraged. PMID:26421256

  18. Workflow oriented software support for image guided radiofrequency ablation of focal liver malignancies

    NASA Astrophysics Data System (ADS)

    Weihusen, Andreas; Ritter, Felix; Kröger, Tim; Preusser, Tobias; Zidowitz, Stephan; Peitgen, Heinz-Otto

    2007-03-01

    Image-guided radiofrequency (RF) ablation has become a significant part of clinical routine as a minimally invasive method for the treatment of focal liver malignancies. Medical imaging is used in all parts of the clinical workflow of an RF ablation, incorporating treatment planning, interventional targeting and result assessment. This paper describes a software application which has been designed to support the RF ablation workflow under consideration of the requirements of clinical routine, such as easy user interaction and a high degree of robust and fast automatic procedures, in order to keep the physician from spending too much time at the computer. The application therefore provides a collection of specialized image processing and visualization methods for treatment planning and result assessment. The algorithms are adapted to CT as well as to MR imaging. The planning support contains semi-automatic methods for the segmentation of liver tumors and the surrounding vascular system, as well as an interactive virtual positioning of RF applicators and a concluding numerical estimation of the achievable heat distribution. The assessment of the ablation result is supported by the segmentation of the coagulative necrosis and an interactive registration of pre- and post-interventional image data for the comparison of tumor and necrosis segmentation masks. An automatic quantification of surface distances is performed to verify the embedding of the tumor area within the thermal lesion area. The visualization methods support representations in the commonly used orthogonal 2D view as well as in 3D scenes.

  19. Design and implementation of a secure workflow system based on PKI/PMI

    NASA Astrophysics Data System (ADS)

    Yan, Kai; Jiang, Chao-hui

    2013-03-01

    Traditional workflow systems have several weaknesses in privilege management: low privilege-management efficiency, an overburdened administrator, and the lack of a trusted authority. After studying the security requirements of workflow systems in depth, a secure workflow model based on PKI/PMI is proposed. This model achieves static and dynamic authorization by verifying a user's identity through a public key certificate (PKC) and validating the user's privilege information using an attribute certificate (AC) in the workflow system. Practice shows that this system can meet the security requirements of a WfMS; moreover, it not only improves system security but also ensures the integrity, confidentiality, availability and non-repudiation of the data in the system.
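
    A conceptual sketch of the PKC/AC split described above: identity is checked against a public key certificate and privileges are read from a separate attribute certificate before a workflow task is released. No real cryptography or X.509 parsing is performed here; the data structures only illustrate the control flow.

    from dataclasses import dataclass

    @dataclass
    class PublicKeyCertificate:      # stands in for an X.509 identity certificate
        subject: str
        signature_valid: bool

    @dataclass
    class AttributeCertificate:      # stands in for an X.509 attribute certificate
        holder: str
        roles: frozenset
        signature_valid: bool

    def authorize(task_role, pkc, ac):
        if not pkc.signature_valid:
            raise PermissionError("identity certificate rejected")
        if not ac.signature_valid or ac.holder != pkc.subject:
            raise PermissionError("attribute certificate rejected")
        return task_role in ac.roles   # static authorization; dynamic rules could refine this

    pkc = PublicKeyCertificate("alice", signature_valid=True)
    ac = AttributeCertificate("alice", frozenset({"approver"}), signature_valid=True)
    print(authorize("approver", pkc, ac))  # True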

  20. Solutions for Mining Distributed Scientific Data

    NASA Astrophysics Data System (ADS)

    Lynnes, C.; Pham, L.; Graves, S.; Ramachandran, R.; Maskey, M.; Keiser, K.

    2007-12-01

    Researchers at the University of Alabama in Huntsville (UAH) and the Goddard Earth Sciences Data and Information Services Center (GES DISC) are working on approaches and methodologies facilitating the analysis of large amounts of distributed scientific data. Despite the existence of full-featured analysis tools, such as the Algorithm Development and Mining (ADaM) toolkit from UAH, and data repositories, such as the GES DISC, that provide online access to large amounts of data, there remain obstacles to getting the analysis tools and the data together in a workable environment. Does one bring the data to the tools or deploy the tools close to the data? The large size of many current Earth science datasets incurs significant overhead in network transfer for analysis workflows, even with the advanced networking capabilities that are available between many educational and government facilities. The UAH and GES DISC team are developing a capability to define analysis workflows using distributed services and online data resources. We are developing two solutions for this problem that address different analysis scenarios. The first is a Data Center Deployment of the analysis services for large data selections, orchestrated by a remotely defined analysis workflow. The second is a Data Mining Center approach of providing a cohesive analysis solution for smaller subsets of data. The two approaches can be complementary and thus provide flexibility for researchers to exploit the best solution for their data requirements. The Data Center Deployment of the analysis services has been implemented by deploying ADaM web services at the GES DISC so they can access the data directly, without the need of network transfers. Using the Mining Workflow Composer, a user can define an analysis workflow that is then submitted through a Web Services interface to the GES DISC for execution by a processing engine. The workflow definition is composed, maintained and executed at a distributed

  1. Analog to digital workflow improvement: a quantitative study.

    PubMed

    Wideman, Catherine; Gallet, Jacqueline

    2006-01-01

    This study tracked a radiology department's conversion from a Kodak Amber analog system to a Kodak DirectView DR 5100 digital system. Through the use of ProModel Optimization Suite, a workflow simulation software package, significant quantitative information was derived from workflow process data measured before and after the change to a digital system. Once the digital room was fully operational and the radiology staff comfortable with the new system, average patient examination time was reduced from 9.24 to 5.28 min, indicating that a higher patient throughput could be achieved. Compared to the analog system, chest examination time for modality-specific activities was reduced by 43%. The percentage of repeat examinations with the digital system also decreased to 8%, versus the 9.5% experienced with the analog system. The study indicated that it is possible to quantitatively study clinical workflow and productivity by using commercially available software.
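
    The reported examination times translate directly into room throughput; the quick arithmetic check below uses the 9.24 and 5.28 minute figures cited above, while the throughput numbers are derived here rather than taken from the study.

    analog_min, digital_min = 9.24, 5.28

    def patients_per_hour(exam_minutes):
        return 60.0 / exam_minutes

    print(f"analog:  {patients_per_hour(analog_min):.1f} patients/hour")
    print(f"digital: {patients_per_hour(digital_min):.1f} patients/hour")
    print(f"gain:    {patients_per_hour(digital_min) / patients_per_hour(analog_min) - 1:.0%}")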

  2. From chart tracking to workflow management.

    PubMed Central

    Srinivasan, P.; Vignes, G.; Venable, C.; Hazelwood, A.; Cade, T.

    1994-01-01

    The current interest in system-wide integration appears to be based on the assumption that an organization, by digitizing information and accepting a common standard for the exchange of such information, will improve the accessibility of this information and automatically experience benefits resulting from its more productive use. We do not dispute this reasoning, but assert that an organization's capacity for effective change is proportional to the understanding of the current structure among its personnel. Our workflow manager is based on the use of a Parameterized Petri Net (PPN) model which can be configured to represent an arbitrarily detailed picture of an organization. The PPN model can be animated to observe the model organization in action, and the results of the animation analyzed. This simulation is a dynamic ongoing process which changes with the system and allows members of the organization to pose "what if" questions as a means of exploring opportunities for change. We present the "workflow management system" as the natural successor to the tracking program, incorporating modeling, scheduling, reactive planning, performance evaluation, and simulation. This workflow management system is more than adequate for meeting the needs of a paper chart tracking system and, as the patient record is computerized, will serve as a planning and evaluation tool in converting the paper-based health information system into a computer-based system. PMID:7950051
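
    A tiny place/transition net illustrating the Petri-net idea behind such a workflow manager: a transition fires only when all of its input places hold tokens, moving the chart through the organization. Place and transition names are invented, and parameterization is omitted.

    marking = {"chart_in_archive": 1, "request_pending": 1,
               "chart_at_clinic": 0, "chart_returned": 0}

    TRANSITIONS = {
        "deliver_chart": {"inputs": ["chart_in_archive", "request_pending"],
                          "outputs": ["chart_at_clinic"]},
        "return_chart":  {"inputs": ["chart_at_clinic"],
                          "outputs": ["chart_returned"]},
    }

    def enabled(name, marking):
        return all(marking[p] > 0 for p in TRANSITIONS[name]["inputs"])

    def fire(name, marking):
        if not enabled(name, marking):
            raise RuntimeError(f"{name} is not enabled")
        for p in TRANSITIONS[name]["inputs"]:
            marking[p] -= 1
        for p in TRANSITIONS[name]["outputs"]:
            marking[p] += 1

    fire("deliver_chart", marking)
    fire("return_chart", marking)
    print(marking)  # the single chart token ends in 'chart_returned'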

  3. An MRM-based workflow for absolute quantitation of lysine-acetylated metabolic enzymes in mouse liver.

    PubMed

    Xu, Leilei; Wang, Fang; Xu, Ying; Wang, Yi; Zhang, Cuiping; Qin, Xue; Yu, Hongxiu; Yang, Pengyuan

    2015-12-07

    As a key post-translational modification mechanism, protein acetylation plays critical roles in regulating and/or coordinating cell metabolism. Acetylation is a prevalent modification in enzymes. Protein acetylation occurs in sub-stoichiometric amounts; therefore, extracting biologically meaningful information from these acetylation sites requires an adaptable, sensitive, specific, and robust method for their quantification. In this work, we combine immunoassays and multiple reaction monitoring-mass spectrometry (MRM-MS) technology to develop an absolute quantification method for acetylation modification. With this hybrid method, we quantified the acetylation level of metabolic enzymes, which helps to elucidate the regulatory mechanisms of the studied enzymes. The development of this quantitative workflow is a pivotal step for advancing our knowledge and understanding of the regulatory effects of protein acetylation in physiology and pathophysiology.
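
    A sketch of the light/heavy internal-standard arithmetic commonly used in MRM-based absolute quantification, extended to a simple site-occupancy estimate; the peak areas and spike amounts are invented, and the calculation is a simplification of the published workflow.

    def absolute_amount(light_area, heavy_area, heavy_spike_fmol):
        """Amount of endogenous (light) peptide from the light/heavy peak-area ratio."""
        return (light_area / heavy_area) * heavy_spike_fmol

    acetylated = absolute_amount(light_area=1.8e5, heavy_area=3.0e5, heavy_spike_fmol=50.0)
    total      = absolute_amount(light_area=2.4e6, heavy_area=3.0e5, heavy_spike_fmol=50.0)

    occupancy = acetylated / total
    print(f"acetylated peptide: {acetylated:.0f} fmol")
    print(f"total peptide:      {total:.0f} fmol")
    print(f"site occupancy:     {occupancy:.1%}")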

  4. Workflow in clinical trial sites & its association with near miss events for data quality: ethnographic, workflow & systems simulation.

    PubMed

    de Carvalho, Elias Cesar Araujo; Batilana, Adelia Portero; Claudino, Wederson; Reis, Luiz Fernando Lima; Schmerling, Rafael A; Shah, Jatin; Pietrobon, Ricardo

    2012-01-01

    With the exponential expansion of clinical trials conducted in BRIC (Brazil, Russia, India, and China) and VISTA (Vietnam, Indonesia, South Africa, Turkey, and Argentina) countries, corresponding gains in cost and enrolment efficiency quickly outpace the consonant metrics in traditional countries in North America and the European Union. However, questions still remain regarding the quality of data being collected in these countries. We used ethnographic, mapping and computer simulation studies to identify and address areas of threat to near miss events for data quality in two cancer trial sites in Brazil. Two sites in São Paulo and Rio de Janeiro were evaluated using ethnographic observations of workflow during subject enrolment and data collection. Emerging themes related to threats to near miss events for data quality were derived from observations. They were then transformed into workflows using UML-AD and modeled using System Dynamics. 139 tasks were observed and mapped through the ethnographic study. The UML-AD detected four major activities in the workflow: evaluation of potential research subjects prior to signature of informed consent, the visit to obtain the subject's informed consent, regular data collection sessions following the study protocol, and closure of the study protocol for a given project. Field observations pointed to three major emerging themes: (a) lack of a standardized process for data registration at the source document, (b) multiplicity of data repositories and (c) scarcity of decision support systems at the point of research intervention. Simulation with the policy model demonstrates a reduction of the rework problem. Patterns of threats to data quality at the two sites were similar to the threats reported in the literature for American sites. Clinical trial site managers need to reorganize staff workflow by using information technology more efficiently, establish new standard procedures and manage professionals to reduce near miss events and save time/cost. Clinical trial

  5. Workflow in Clinical Trial Sites & Its Association with Near Miss Events for Data Quality: Ethnographic, Workflow & Systems Simulation

    PubMed Central

    Araujo de Carvalho, Elias Cesar; Batilana, Adelia Portero; Claudino, Wederson; Lima Reis, Luiz Fernando; Schmerling, Rafael A.; Shah, Jatin; Pietrobon, Ricardo

    2012-01-01

    Background With the exponential expansion of clinical trials conducted in BRIC (Brazil, Russia, India, and China) and VISTA (Vietnam, Indonesia, South Africa, Turkey, and Argentina) countries, corresponding gains in cost and enrolment efficiency quickly outpace the consonant metrics in traditional countries in North America and the European Union. However, questions still remain regarding the quality of data being collected in these countries. We used ethnographic, mapping and computer simulation studies to identify and address areas of threat to near miss events for data quality in two cancer trial sites in Brazil. Methodology/Principal Findings Two sites in São Paulo and Rio de Janeiro were evaluated using ethnographic observations of workflow during subject enrolment and data collection. Emerging themes related to threats to near miss events for data quality were derived from observations. They were then transformed into workflows using UML-AD and modeled using System Dynamics. 139 tasks were observed and mapped through the ethnographic study. The UML-AD detected four major activities in the workflow: evaluation of potential research subjects prior to signature of informed consent, the visit to obtain the subject's informed consent, regular data collection sessions following the study protocol, and closure of the study protocol for a given project. Field observations pointed to three major emerging themes: (a) lack of a standardized process for data registration at the source document, (b) multiplicity of data repositories and (c) scarcity of decision support systems at the point of research intervention. Simulation with the policy model demonstrates a reduction of the rework problem. Conclusions/Significance Patterns of threats to data quality at the two sites were similar to the threats reported in the literature for American sites. Clinical trial site managers need to reorganize staff workflow by using information technology more efficiently, establish new standard procedures and manage

  6. Planning bioinformatics workflows using an expert system.

    PubMed

    Chen, Xiaoling; Chang, Jeffrey T

    2017-04-15

    Bioinformatic analyses are becoming formidably more complex due to the increasing number of steps required to process the data, as well as the proliferation of methods that can be used in each step. To alleviate this difficulty, pipelines are commonly employed. However, pipelines are typically implemented to automate a specific analysis, and thus are difficult to use for exploratory analyses requiring systematic changes to the software or parameters used. To automate the development of pipelines, we have investigated expert systems. We created the Bioinformatics ExperT SYstem (BETSY) that includes a knowledge base where the capabilities of bioinformatics software are explicitly and formally encoded. BETSY is a backwards-chaining rule-based expert system comprising a data model that can capture the richness of biological data, and an inference engine that reasons on the knowledge base to produce workflows. Currently, the knowledge base is populated with rules to analyze microarray and next generation sequencing data. We evaluated BETSY and found that it could generate workflows that reproduce and go beyond previously published bioinformatics results. Finally, a meta-investigation of the workflows generated from the knowledge base produced a quantitative measure of the technical burden imposed by each step of bioinformatics analyses, revealing the large number of steps devoted to the pre-processing of data. In sum, an expert system approach can facilitate exploratory bioinformatic analysis by automating the development of workflows, a task that requires significant domain expertise. https://github.com/jefftc/changlab. jeffrey.t.chang@uth.tmc.edu. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com

  7. Developing a Collection of Composable Data Translation Software Units to Improve Efficiency and Reproducibility in Ecohydrologic Modeling Workflows

    NASA Astrophysics Data System (ADS)

    Olschanowsky, C.; Flores, A. N.; FitzGerald, K.; Masarik, M. T.; Rudisill, W. J.; Aguayo, M.

    2017-12-01

    Dynamic models of the spatiotemporal evolution of water, energy, and nutrient cycling are important tools to assess impacts of climate and other environmental changes on ecohydrologic systems. These models require spatiotemporally varying environmental forcings like precipitation, temperature, humidity, windspeed, and solar radiation. These input data originate from a variety of sources, including global and regional weather and climate models, global and regional reanalysis products, and geostatistically interpolated surface observations. Data translation measures, often subsetting in space and/or time and transforming and converting variable units, represent a seemingly mundane, but critical step in the application workflows. Translation steps can introduce errors, misrepresentations of data, slow execution time, and interrupt data provenance. We leverage a workflow that subsets a large regional dataset derived from the Weather Research and Forecasting (WRF) model and prepares inputs to the Parflow integrated hydrologic model to demonstrate the impact of translation tool software quality on scientific workflow results and performance. We propose that such workflows will benefit from a community-approved collection of data transformation components. The components should be self-contained, composable units of code. This design pattern enables automated parallelization and software verification, improving performance and reliability. Ensuring that individual translation components are self-contained and target minute tasks increases reliability. The small code size of each component enables effective unit and regression testing. The components can be automatically composed for efficient execution. An efficient data translation framework should be written to minimize data movement. Composing components within a single streaming process reduces data movement. Each component will typically have a low arithmetic intensity, meaning that it requires about the same number of
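
    As a concrete illustration of this design pattern, the sketch below (an assumption-laden toy, not the authors' components) chains two self-contained translation units, a temporal subset and a unit conversion, into a single streaming pass so no intermediate files are written:

        def subset_time(records, start, end):
            """Keep only records whose timestamp falls in [start, end]."""
            for rec in records:
                if start <= rec["time"] <= end:
                    yield rec

        def kelvin_to_celsius(records):
            """Convert the temperature field from K to degrees C."""
            for rec in records:
                yield {**rec, "temperature": rec["temperature"] - 273.15}

        def compose(source, *stages):
            """Wrap a record stream in successive translation units."""
            stream = source
            for stage in stages:
                stream = stage(stream)
            return stream

        forcing = ({"time": t, "temperature": 280.0 + t} for t in range(10))
        pipeline = compose(forcing,
                           lambda s: subset_time(s, start=2, end=5),
                           kelvin_to_celsius)
        for rec in pipeline:
            print(rec)

    Because each unit is a small generator with a single responsibility, it can be unit-tested in isolation and recombined without touching the others, which is the reliability argument the abstract makes.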

  8. Replication and Robustness in Developmental Research

    ERIC Educational Resources Information Center

    Duncan, Greg J.; Engel, Mimi; Claessens, Amy; Dowsett, Chantelle J.

    2014-01-01

    Replications and robustness checks are key elements of the scientific method and a staple in many disciplines. However, leading journals in developmental psychology rarely include explicit replications of prior research conducted by different investigators, and few require authors to establish in their articles or online appendices that their key…

  9. Extension of specification language for soundness and completeness of service workflow

    NASA Astrophysics Data System (ADS)

    Viriyasitavat, Wattana; Xu, Li Da; Bi, Zhuming; Sapsomboon, Assadaporn

    2018-05-01

    A Service Workflow is an aggregation of distributed services to fulfill specific functionalities. With ever more services available, methodologies for selecting services against given requirements have become a main research subject in multiple disciplines. A few researchers have contributed formal specification languages and model-checking methods; however, existing methods have difficulty tackling the complexity of workflow compositions. In this paper, we propose to formalize the specification language to reduce the complexity of workflow composition. To this end, we extend a specification language with formal logic, so that effective theorems can be derived for the verification of syntax, semantics, and inference rules in the workflow composition. The logic-based approach automates compliance checking effectively. The Service Workflow Specification (SWSpec) has been extended and formulated, and the soundness, completeness, and consistency of SWSpec applications have been verified; note that a logic-based SWSpec is a prerequisite for model checking. The application of the proposed SWSpec is demonstrated by examples addressing soundness, completeness, and consistency.
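
    A minimal sketch of the kind of automated check such a formalization enables is shown below; the service names and their input/output flows are illustrative assumptions, not the SWSpec notation itself. A sequential composition is accepted only if every input a service requires is provided by the initial request or by an earlier service:

        services = [
            {"name": "fetch_order",  "requires": {"order_id"},           "provides": {"order"}},
            {"name": "check_credit", "requires": {"order"},              "provides": {"credit_ok"}},
            {"name": "ship",         "requires": {"order", "credit_ok"}, "provides": {"tracking_no"}},
        ]

        def is_sound(workflow, initial_inputs):
            """Check that each service's required inputs are available when it runs."""
            available = set(initial_inputs)
            for svc in workflow:
                missing = svc["requires"] - available
                if missing:
                    return False, f'{svc["name"]} is missing {missing}'
                available |= svc["provides"]
            return True, "all requirements satisfiable in this order"

        print(is_sound(services, {"order_id"}))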

  10. Embedding Scientific Integrity and Ethics into the Scientific Process and Research Data Lifecycle

    NASA Astrophysics Data System (ADS)

    Gundersen, L. C.

    2016-12-01

    Predicting climate change, developing resources sustainably, and mitigating natural hazard risk are complex interdisciplinary challenges in the geosciences that require the integration of data and knowledge from disparate disciplines and scales. This kind of interdisciplinary science can only thrive if scientific communities work together and adhere to common standards of scientific integrity, ethics, data management, curation, and sharing. Science and data without integrity and ethics can erode the very fabric of the scientific enterprise and potentially harm society and the planet. Inaccurate risk analyses of natural hazards can lead to poor choices in construction, insurance, and emergency response. Incorrect assessment of mineral resources can bankrupt a company, destroy a local economy, and contaminate an ecosystem. This paper presents key ethics and integrity questions paired with the major components of the research data life cycle. The questions can be used by the researcher during the scientific process to help ensure the integrity and ethics of their research and adherence to sound data management practice. Questions include considerations for open, collaborative science, which is fundamentally changing the responsibility of scientists regarding data sharing and reproducibility. The publication of primary data, methods, models, software, and workflows must become a norm of science. There are also questions that prompt the scientist to think about the benefit of their work to society; ensuring equity, respect, and fairness in working with others; and always striving for honesty, excellence, and transparency.

  11. INA-Rxiv: The Missing Puzzle in Indonesia’s Scientific Publishing Workflow

    NASA Astrophysics Data System (ADS)

    Rahim, R.; Irawan, D. E.; Zulfikar, A.; Hardi, R.; Arliman S, L.; Gultom, E. R.; Ginting, G.; Wahyuni, S. S.; Mesran, M.; Mahjudin, M.; Saputra, I.; Waruwu, F. T.; Suginam, S.; Buulolo, E.; Abraham, J.

    2018-04-01

    INA-Rxiv is Indonesia's first preprint server, marking a new development initiated by the open science community. This study aimed to describe the development of INA-Rxiv and the conversations around it. It used the Inarxiv.id analyzer, WhatsApp Group Analyzer, and Twitter Analytics as tools for data analysis, complemented with observation. The results showed that INA-Rxiv users are growing because of the numerous discussions in social media, e.g. WhatsApp, as well as positive responses from writers who have been using INA-Rxiv. The perspective of a growth mindset and the implications of the INA-Rxiv movement for filling the gap in accelerating the scientific dissemination process are presented at the end of this article.

  12. Information Issues and Contexts that Impair Team Based Communication Workflow: A Palliative Sedation Case Study.

    PubMed

    Cornett, Alex; Kuziemsky, Craig

    2015-01-01

    Implementing team based workflows can be complex because of the scope of providers involved and the extent of information exchange and communication that needs to occur. While a workflow may represent the ideal structure of communication that needs to occur, information issues and contextual factors may impact how the workflow is implemented in practice. Understanding these issues will help us better design systems to support team based workflows. In this paper we use a case study of palliative sedation therapy (PST) to model a PST workflow and then use it to identify purposes of communication, information issues and contextual factors that impact them. We then suggest how our findings could inform health information technology (HIT) design to support team based communication workflows.

  13. A framework for service enterprise workflow simulation with multi-agents cooperation

    NASA Astrophysics Data System (ADS)

    Tan, Wenan; Xu, Wei; Yang, Fujun; Xu, Lida; Jiang, Chuanqun

    2013-11-01

    Dynamic process modelling for service business is the key technique for service-oriented information systems and service business management, and the workflow model of business processes is the core part of service systems. Service business workflow simulation is the prevalent approach for analysing service business processes dynamically. The generic method for service business workflow simulation is based on discrete-event queueing theory, which lacks flexibility and scalability. In this paper, we propose a service workflow-oriented framework for the process simulation of service businesses using multi-agent cooperation to address these issues. Social rationality of agents is introduced into the proposed framework. By adopting rationality as one social factor in decision-making strategies, flexible scheduling of activity instances has been implemented. A system prototype has been developed to validate the proposed simulation framework through a business case study.

  14. Robust automated mass spectra interpretation and chemical formula calculation using mixed integer linear programming.

    PubMed

    Baran, Richard; Northen, Trent R

    2013-10-15

    Untargeted metabolite profiling using liquid chromatography and mass spectrometry coupled via electrospray ionization is a powerful tool for the discovery of novel natural products, metabolic capabilities, and biomarkers. However, the elucidation of the identities of uncharacterized metabolites from spectral features remains challenging. A critical step in the metabolite identification workflow is the assignment of redundant spectral features (adducts, fragments, multimers) and calculation of the underlying chemical formula. Inspection of the data by experts using computational tools solving partial problems (e.g., chemical formula calculation for individual ions) can be performed to disambiguate alternative solutions and provide reliable results. However, manual curation is tedious and not readily scalable or standardized. Here we describe an automated procedure for the robust automated mass spectra interpretation and chemical formula calculation using mixed integer linear programming optimization (RAMSI). Chemical rules among related ions are expressed as linear constraints and both the spectra interpretation and chemical formula calculation are performed in a single optimization step. This approach is unbiased in that it does not require predefined sets of neutral losses and positive and negative polarity spectra can be combined in a single optimization. The procedure was evaluated with 30 experimental mass spectra and was found to effectively identify the protonated or deprotonated molecule ([M + H](+) or [M - H](-)) while being robust to the presence of background ions. RAMSI provides a much-needed standardized tool for interpreting ions for subsequent identification in untargeted metabolomics workflows.
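
    The following sketch (assuming the PuLP library; the ions, adduct labels, and the single consistency rule are illustrative, not the published RAMSI formulation) shows how interpretation choices can be cast as 0/1 variables with linear constraints and resolved in a single optimization step:

        from pulp import LpProblem, LpVariable, LpMaximize, lpSum, LpBinary, LpStatus

        # candidate interpretations: observed ion -> list of (adduct label, implied neutral mass)
        candidates = {
            "ion_a": [("[M+H]+", 180.063), ("[M+Na]+", 158.082)],
            "ion_b": [("[M+Na]+", 180.063)],
        }

        prob = LpProblem("spectrum_interpretation", LpMaximize)
        x = {(ion, j): LpVariable(f"x_{ion}_{j}", cat=LpBinary)
             for ion, options in candidates.items() for j, _ in enumerate(options)}

        # each observed ion receives at most one interpretation
        for ion, options in candidates.items():
            prob += lpSum(x[ion, j] for j in range(len(options))) <= 1

        # illustrative consistency rule among related ions: ion_b may only be read as the
        # sodium adduct of a neutral mass if ion_a is read as the protonated form of that mass
        prob += x["ion_b", 0] <= x["ion_a", 0]

        # objective: explain as many ions as possible
        prob += lpSum(x.values())

        prob.solve()
        print(LpStatus[prob.status], {key: var.value() for key, var in x.items()})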

  15. Data and Communications in Basic Energy Sciences: Creating a Pathway for Scientific Discovery

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Nugent, Peter E.; Simonson, J. Michael

    2011-10-24

    This report is based on the Department of Energy (DOE) Workshop on “Data and Communications in Basic Energy Sciences: Creating a Pathway for Scientific Discovery” that was held at the Bethesda Marriott in Maryland on October 24-25, 2011. The workshop brought together leading researchers from the Basic Energy Sciences (BES) facilities and Advanced Scientific Computing Research (ASCR). The workshop was co-sponsored by these two Offices to identify opportunities and needs for data analysis, ownership, storage, mining, provenance and data transfer at light sources, neutron sources, microscopy centers and other facilities. Their charge was to identify current and anticipated issues in the acquisition, analysis, communication and storage of experimental data that could impact the progress of scientific discovery, ascertain what knowledge, methods and tools are needed to mitigate present and projected shortcomings and to create the foundation for information exchanges and collaboration between ASCR and BES supported researchers and facilities. The workshop was organized in the context of the impending data tsunami that will be produced by DOE’s BES facilities. Current facilities, like SLAC National Accelerator Laboratory’s Linac Coherent Light Source, can produce up to 18 terabytes (TB) per day, while upgraded detectors at Lawrence Berkeley National Laboratory’s Advanced Light Source will generate ~10TB per hour. The expectation is that these rates will increase by over an order of magnitude in the coming decade. The urgency to develop new strategies and methods in order to stay ahead of this deluge and extract the most science from these facilities was recognized by all. The four focus areas addressed in this workshop were: Workflow Management - Experiment to Science: Identifying and managing the data path from experiment to publication. Theory and Algorithms: Recognizing the need for new tools for computation at scale, supporting large data sets and

  16. Performance of an Automated Versus a Manual Whole-Body Magnetic Resonance Imaging Workflow.

    PubMed

    Stocker, Daniel; Finkenstaedt, Tim; Kuehn, Bernd; Nanz, Daniel; Klarhoefer, Markus; Guggenberger, Roman; Andreisek, Gustav; Kiefer, Berthold; Reiner, Caecilia S

    2018-04-24

    The aim of this study was to evaluate the performance of an automated workflow for whole-body magnetic resonance imaging (WB-MRI), which reduces user interaction compared with the manual WB-MRI workflow. This prospective study was approved by the local ethics committee. Twenty patients underwent WB-MRI for myopathy evaluation on a 3 T MRI scanner. Ten patients (7 women; age, 52 ± 13 years; body weight, 69.9 ± 13.3 kg; height, 173 ± 9.3 cm; body mass index, 23.2 ± 3.0) were examined with a prototypical automated WB-MRI workflow, which automatically segments the whole body, and 10 patients (6 women; age, 35.9 ± 12.4 years; body weight, 72 ± 21 kg; height, 169.2 ± 10.4 cm; body mass index, 24.9 ± 5.6) with a manual scan. Overall image quality (IQ; 5-point scale: 5, excellent; 1, poor) and coverage of the study volume were assessed by 2 readers for each sequence (coronal T2-weighted turbo inversion recovery magnitude [TIRM] and axial contrast-enhanced T1-weighted [ce-T1w] gradient dual-echo sequence). Interreader agreement was evaluated with intraclass correlation coefficients. Examination time, number of user interactions, and MR technicians' acceptance rating (1, highest; 10, lowest) was compared between both groups. Total examination time was significantly shorter for automated WB-MRI workflow versus manual WB-MRI workflow (30.0 ± 4.2 vs 41.5 ± 3.4 minutes, P < 0.0001) with significantly shorter planning time (2.5 ± 0.8 vs 14.0 ± 7.0 minutes, P < 0.0001). Planning took 8% of the total examination time with automated versus 34% with manual WB-MRI workflow (P < 0.0001). The number of user interactions with automated WB-MRI workflow was significantly lower compared with manual WB-MRI workflow (10.2 ± 4.4 vs 48.2 ± 17.2, P < 0.0001). Planning efforts were rated significantly lower by the MR technicians for the automated WB-MRI workflow than for the manual WB-MRI workflow (2.20 ± 0.92 vs 4.80 ± 2.39, respectively; P = 0.005). Overall IQ was similar

  17. Grid workflow job execution service 'Pilot'

    NASA Astrophysics Data System (ADS)

    Shamardin, Lev; Kryukov, Alexander; Demichev, Andrey; Ilyin, Vyacheslav

    2011-12-01

    'Pilot' is a grid job execution service for workflow jobs. The main goal for the service is to automate computations with multiple stages since they can be expressed as simple workflows. Each job is a directed acyclic graph of tasks and each task is an execution of something on a grid resource (or 'computing element'). Tasks may be submitted to any WS-GRAM (Globus Toolkit 4) service. The target resources for the tasks execution are selected by the Pilot service from the set of available resources which match the specific requirements from the task and/or job definition. Some simple conditional execution logic is also provided. The 'Pilot' service is built on the REST concepts and provides a simple API through authenticated HTTPS. This service is deployed and used in production in a Russian national grid project GridNNN.
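
    A minimal sketch of the job model, a directed acyclic graph of tasks executed in dependency order, is given below; the task names and the submission stub are illustrative assumptions, not the Pilot REST API:

        from graphlib import TopologicalSorter  # Python 3.9+

        tasks = {                 # task -> set of tasks it depends on
            "stage_in": set(),
            "simulate": {"stage_in"},
            "analyze":  {"simulate"},
            "stage_out": {"analyze"},
        }

        def submit_to_resource(task):
            # stand-in for submitting the task to a matching computing element
            print(f"submitting {task}")

        # execute tasks only after all of their dependencies have completed
        for task in TopologicalSorter(tasks).static_order():
            submit_to_resource(task)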

  18. Managing and Communicating Operational Workflow: Designing and Implementing an Electronic Outpatient Whiteboard.

    PubMed

    Steitz, Bryan D; Weinberg, Stuart T; Danciu, Ioana; Unertl, Kim M

    2016-01-01

    Healthcare team members in emergency department contexts have used electronic whiteboard solutions to help manage operational workflow for many years. Ambulatory clinic settings have highly complex operational workflow, but are still limited in electronic assistance to communicate and coordinate work activities. To describe and discuss the design, implementation, use, and ongoing evolution of a coordination and collaboration tool supporting ambulatory clinic operational workflow at Vanderbilt University Medical Center (VUMC). The outpatient whiteboard tool was initially designed to support healthcare work related to an electronic chemotherapy order-entry application. After a highly successful initial implementation in an oncology context, a high demand emerged across the organization for the outpatient whiteboard implementation. Over the past 10 years, developers have followed an iterative user-centered design process to evolve the tool. The electronic outpatient whiteboard system supports 194 separate whiteboards and is accessed by over 2800 distinct users on a typical day. Clinics can configure their whiteboards to support unique workflow elements. Since initial release, features such as immunization clinical decision support have been integrated into the system, based on requests from end users. The success of the electronic outpatient whiteboard demonstrates the usefulness of an operational workflow tool within the ambulatory clinic setting. Operational workflow tools can play a significant role in supporting coordination, collaboration, and teamwork in ambulatory healthcare settings.

  19. Thermal Remote Sensing with Uav-Based Workflows

    NASA Astrophysics Data System (ADS)

    Boesch, R.

    2017-08-01

    Climate change will have a significant influence on vegetation health and growth. Predictions of higher mean summer temperatures and prolonged summer droughts may pose a threat to agricultural areas and forest canopies. Rising canopy temperatures can be an indicator of plant stress because of the closure of stomata and a decrease in the transpiration rate. Thermal cameras have been available for decades, but are still often used only for single-image analysis, in an oblique-view manner, or with visual evaluations of video sequences. Remote sensing using a thermal camera can therefore be an important data source for understanding transpiration processes. Photogrammetric workflows allow thermal images to be processed much like RGB data, but the low spatial resolution of thermal cameras, significant optical distortion and typically low contrast require an adapted workflow. Temperature distribution in forest canopies is typically completely unknown and less distinct than for urban or industrial areas, where metal constructions and surfaces yield high contrast and sharp edge information. The aim of this paper is to investigate the influence of interior camera orientation, tie point matching and ground control points on the resulting accuracy of bundle adjustment and dense cloud generation with a typically used photogrammetric workflow for UAV-based thermal imagery in natural environments.

  20. A workflow to investigate exposure and pharmacokinetic ...

    EPA Pesticide Factsheets

    Adverse outcome pathways (AOPs) link known population outcomes to a molecular initiating event (MIE) that can be quantified using high-throughput in vitro methods. Practical application of AOPs in chemical-specific risk assessment requires consideration of exposure and absorption, distribution, metabolism, excretion (ADME) properties of chemicals. We developed a conceptual workflow to consider exposure and ADME properties in relationship to an MIE and demonstrated the utility of this workflow using a previously established AOP, acetylcholinesterase (AChE) inhibition. Thirty active chemicals found to inhibit AChE in the ToxCast™ assay were examined with respect to their exposure and absorption potentials, and their ability to cross the blood-brain barrier. Structural similarities of active compounds were compared against structures of inactive compounds to detect possible non-active parents that might have active metabolites. Fifty-two of the 1,029 inactive compounds exhibited a similarity threshold above 75% with their nearest active neighbors. Excluding compounds that may not be absorbed, 22 could be potentially toxic following metabolism. The incorporation of exposure and ADME properties into the conceptual workflow resulted in prioritization of 20 out of 30 active compounds identified in an AChE inhibition assay for further analysis, along with identification of several inactive parent compounds of active metabolites. This qualitative approach can minimize co
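
    The structural-similarity screen described above can be sketched as follows, assuming the RDKit toolkit is available; the two SMILES strings and the use of Morgan fingerprints with a 0.75 Tanimoto cutoff are illustrative stand-ins for the actual ToxCast compound sets and similarity method:

        from rdkit import Chem, DataStructs
        from rdkit.Chem import AllChem

        actives   = {"active_organothiophosphate": "CCOP(=S)(OCC)Oc1cc(Cl)c(Cl)cn1"}
        inactives = {"inactive_parent_candidate":  "CCOP(=O)(OCC)Oc1cc(Cl)c(Cl)cn1"}

        def fingerprint(smiles):
            """Morgan (ECFP-like) bit-vector fingerprint for a SMILES string."""
            return AllChem.GetMorganFingerprintAsBitVect(Chem.MolFromSmiles(smiles), 2, nBits=2048)

        for inactive, smi_i in inactives.items():
            for active, smi_a in actives.items():
                similarity = DataStructs.TanimotoSimilarity(fingerprint(smi_i), fingerprint(smi_a))
                note = " -> possible active metabolite, keep for ADME review" if similarity > 0.75 else ""
                print(f"{inactive} vs {active}: Tanimoto {similarity:.2f}{note}")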

  1. Parallel workflow tools to facilitate human brain MRI post-processing

    PubMed Central

    Cui, Zaixu; Zhao, Chenxi; Gong, Gaolang

    2015-01-01

    Multi-modal magnetic resonance imaging (MRI) techniques are widely applied in human brain studies. To obtain specific brain measures of interest from MRI datasets, a number of complex image post-processing steps are typically required. Parallel workflow tools have recently been developed, concatenating individual processing steps and enabling fully automated processing of raw MRI data to obtain the final results. These workflow tools are also designed to make optimal use of available computational resources and to support the parallel processing of different subjects or of independent processing steps for a single subject. Automated, parallel MRI post-processing tools can greatly facilitate relevant brain investigations and are being increasingly applied. In this review, we briefly summarize these parallel workflow tools and discuss relevant issues. PMID:26029043
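
    The parallelization pattern these tools provide can be sketched as below, where the processing steps are illustrative placeholders rather than any specific toolkit's interface; independent subjects are distributed over a worker pool while each subject's steps run in order:

        from multiprocessing import Pool

        def skull_strip(subject):   return f"{subject}: stripped"
        def register(result):       return result + ", registered"
        def segment(result):        return result + ", segmented"

        def process_subject(subject):
            # steps for one subject run sequentially; different subjects run in parallel
            return segment(register(skull_strip(subject)))

        if __name__ == "__main__":
            subjects = [f"sub-{i:02d}" for i in range(1, 9)]
            with Pool(processes=4) as pool:
                for line in pool.map(process_subject, subjects):
                    print(line)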

  2. Barriers to critical thinking: workflow interruptions and task switching among nurses.

    PubMed

    Cornell, Paul; Riordan, Monica; Townsend-Gervis, Mary; Mobley, Robin

    2011-10-01

    Nurses are increasingly called upon to engage in critical thinking. However, current workflow inhibits this goal with frequent task switching and unpredictable demands. To assess workflow's cognitive impact, nurses were observed at 2 hospitals with different patient loads and acuity levels. Workflow on a medical/surgical and pediatric oncology unit was observed, recording tasks, tools, collaborators, and locations. Nineteen nurses were observed for a total of 85.2 hours. Tasks were short with a mean duration of 62.4 and 81.6 seconds on the 2 units. More than 50% of the recorded tasks were less than 30 seconds in length. An analysis of task sequence revealed few patterns and little pairwise repetition. Performance on specific tasks differed between the 2 units, but the character of the workflow was highly similar. The nonrepetitive flow and high amount of switching indicate nurses experience a heavy cognitive load with little uninterrupted time. This implies that nurses rarely have the conditions necessary for critical thinking.

  3. Aligning HST Images to Gaia: A Faster Mosaicking Workflow

    NASA Astrophysics Data System (ADS)

    Bajaj, V.

    2017-11-01

    We present a fully programmatic workflow for aligning HST images using the high-quality astrometry provided by Gaia Data Release 1. Code provided in a Jupyter Notebook works through this procedure, including parsing the data to determine the query area parameters, querying Gaia for the coordinate catalog, and using the catalog with TweakReg as reference catalog. This workflow greatly simplifies the normally time-consuming process of aligning HST images, especially those taken as part of mosaics.

  4. Task Delegation Based Access Control Models for Workflow Systems

    NASA Astrophysics Data System (ADS)

    Gaaloul, Khaled; Charoy, François

    Work in e-Government organisations is facilitated and conducted using workflow management systems. Role-based access control (RBAC) is recognised as an efficient access control model for large organisations. The application of RBAC in workflow systems cannot, however, grant permissions to users dynamically while business processes are being executed. We currently observe a move away from predefined, strict workflow modelling towards approaches supporting flexibility at the organisational level. One specific approach is that of task delegation. Task delegation is a mechanism that supports organisational flexibility and ensures delegation of authority in access control systems. In this paper, we propose a Task-oriented Access Control (TAC) model based on RBAC to address these requirements. We reason about tasks from organisational and resource perspectives to analyse and specify authorisation constraints. Moreover, we present a fine-grained access control protocol to support delegation based on the TAC model.
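
    In the spirit of the TAC idea (the roles, tasks, and functions below are illustrative assumptions, not the authors' formal protocol), a minimal sketch of role-based permissions extended with task-scoped delegation might look like this:

        role_permissions = {
            "clerk":   {"register_request"},
            "officer": {"register_request", "approve_request"},
        }
        user_roles = {"alice": "officer", "bob": "clerk"}
        delegations = set()          # (delegator, delegatee, task)

        def delegate(delegator, delegatee, task):
            """A user may delegate only a task their own role permits."""
            if task in role_permissions[user_roles[delegator]]:
                delegations.add((delegator, delegatee, task))

        def can_execute(user, task):
            """Permission comes from the user's role or from an explicit, task-scoped delegation."""
            if task in role_permissions[user_roles[user]]:
                return True
            return any(d[1] == user and d[2] == task for d in delegations)

        delegate("alice", "bob", "approve_request")      # alice hands over one task while away
        print(can_execute("bob", "approve_request"))     # True, via delegation only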

  5. NOW: A Workflow Language for Orchestration in Nomadic Networks

    NASA Astrophysics Data System (ADS)

    Philips, Eline; van der Straeten, Ragnhild; Jonckers, Viviane

    Existing workflow languages for nomadic or mobile ad hoc networks do not offer adequate support for dealing with the volatile connections inherent to these environments. Services residing on mobile devices are exposed to (temporary) network failures, which should be considered the rule rather than the exception. This paper proposes a nomadic workflow language built on top of an ambient-oriented programming language which supports dynamic service discovery and communication primitives resilient to network failures. Our proposed language provides high level workflow abstractions for control flow and supports rich network and service failure detection and handling through compensating actions. Moreover, we introduce a powerful variable binding mechanism which enables dynamic data flow between services in a nomadic environment. By adding this extra layer of abstraction on top of an ambient-oriented programming language, the application programmer is offered a flexible way to develop applications for nomadic networks.
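
    The failure-handling pattern the language supports, compensating an already committed step when a later step fails over a volatile connection, can be sketched as follows; the service stubs and the failure probability are illustrative assumptions, not the NOW primitives:

        import random

        def reserve_room():
            return "room reserved"

        def notify_participants():
            if random.random() < 0.5:   # simulate a volatile connection in a nomadic network
                raise ConnectionError("notification service unreachable")
            return "participants notified"

        def cancel_room():
            print("compensating: room reservation cancelled")

        def run_workflow():
            state = [reserve_room()]
            try:
                state.append(notify_participants())
            except ConnectionError:
                cancel_room()            # undo the committed first step
                return "workflow aborted after compensation"
            return "; ".join(state)

        print(run_workflow())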

  6. Integrated Automatic Workflow for Phylogenetic Tree Analysis Using Public Access and Local Web Services.

    PubMed

    Damkliang, Kasikrit; Tandayya, Pichaya; Sangket, Unitsa; Pasomsub, Ekawat

    2016-11-28

    At present, coding sequences (CDS) are being discovered and larger CDS are being revealed frequently. Approaches and related tools have been developed and upgraded concurrently, especially for phylogenetic tree analysis. This paper proposes an integrated automatic Taverna workflow for phylogenetic tree inference using public-access web services at the European Bioinformatics Institute (EMBL-EBI) and the Swiss Institute of Bioinformatics (SIB), and our own locally deployed web services. The workflow input is a set of CDS in Fasta format. The workflow supports 1,000 to 20,000 bootstrap replicates. The workflow performs tree inference with the Parsimony (PARS), Distance Matrix - Neighbor Joining (DIST-NJ), and Maximum Likelihood (ML) algorithms of the EMBOSS PHYLIPNEW package based on our proposed Multiple Sequence Alignment (MSA) similarity score. The local web services are implemented and deployed in two forms, using Soaplab2 and Apache Axis2 deployment. Both SOAP and Java Web Service (JWS) interfaces provide WSDL endpoints to Taverna Workbench, a workflow manager. The workflow has been validated, its performance has been measured, and its results have been verified. The workflow's execution time is less than ten minutes for inferring a tree with 10,000 bootstrap replicates. This paper proposes a new integrated automatic workflow which will be beneficial to bioinformaticians with an intermediate level of knowledge and experience. All local services have been deployed at our portal http://bioservices.sci.psu.ac.th.

  7. Integrated Automatic Workflow for Phylogenetic Tree Analysis Using Public Access and Local Web Services.

    PubMed

    Damkliang, Kasikrit; Tandayya, Pichaya; Sangket, Unitsa; Pasomsub, Ekawat

    2016-03-01

    At present, coding sequences (CDS) are being discovered and larger CDS are being revealed frequently. Approaches and related tools have been developed and upgraded concurrently, especially for phylogenetic tree analysis. This paper proposes an integrated automatic Taverna workflow for phylogenetic tree inference using public-access web services at the European Bioinformatics Institute (EMBL-EBI) and the Swiss Institute of Bioinformatics (SIB), and our own locally deployed web services. The workflow input is a set of CDS in Fasta format. The workflow supports 1,000 to 20,000 bootstrap replicates. The workflow performs tree inference with the Parsimony (PARS), Distance Matrix - Neighbor Joining (DIST-NJ), and Maximum Likelihood (ML) algorithms of the EMBOSS PHYLIPNEW package based on our proposed Multiple Sequence Alignment (MSA) similarity score. The local web services are implemented and deployed in two forms, using Soaplab2 and Apache Axis2 deployment. Both SOAP and Java Web Service (JWS) interfaces provide WSDL endpoints to Taverna Workbench, a workflow manager. The workflow has been validated, its performance has been measured, and its results have been verified. The workflow's execution time is less than ten minutes for inferring a tree with 10,000 bootstrap replicates. This paper proposes a new integrated automatic workflow which will be beneficial to bioinformaticians with an intermediate level of knowledge and experience. All local services have been deployed at our portal http://bioservices.sci.psu.ac.th.

  8. Automated selected reaction monitoring data analysis workflow for large-scale targeted proteomic studies.

    PubMed

    Surinova, Silvia; Hüttenhain, Ruth; Chang, Ching-Yun; Espona, Lucia; Vitek, Olga; Aebersold, Ruedi

    2013-08-01

    Targeted proteomics based on selected reaction monitoring (SRM) mass spectrometry is commonly used for accurate and reproducible quantification of protein analytes in complex biological mixtures. Strictly hypothesis-driven, SRM assays quantify each targeted protein by collecting measurements on its peptide fragment ions, called transitions. To achieve sensitive and accurate quantitative results, experimental design and data analysis must consistently account for the variability of the quantified transitions. This consistency is especially important in large experiments, which increasingly require profiling up to hundreds of proteins over hundreds of samples. Here we describe a robust and automated workflow for the analysis of large quantitative SRM data sets that integrates data processing, statistical protein identification and quantification, and dissemination of the results. The integrated workflow combines three software tools: mProphet for peptide identification via probabilistic scoring; SRMstats for protein significance analysis with linear mixed-effect models; and PASSEL, a public repository for storage, retrieval and query of SRM data. The input requirements for the protocol are files with SRM traces in mzXML format, and a file with a list of transitions in a text tab-separated format. The protocol is especially suited for data with heavy isotope-labeled peptide internal standards. We demonstrate the protocol on a clinical data set in which the abundances of 35 biomarker candidates were profiled in 83 blood plasma samples of subjects with ovarian cancer or benign ovarian tumors. The time frame to realize the protocol is 1-2 weeks, depending on the number of replicates used in the experiment.
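
    A minimal sketch of the per-protein significance test used in this style of analysis, fitting a linear mixed-effects model with statsmodels on toy data as a stand-in for the SRMstats step (the column names, group sizes, and injected effect are illustrative assumptions):

        import numpy as np
        import pandas as pd
        import statsmodels.formula.api as smf

        rng = np.random.default_rng(0)
        # toy data for one protein: 3 transitions x (4 benign + 4 cancer) samples
        df = pd.DataFrame({
            "transition": np.repeat(["t1", "t2", "t3"], 8),
            "condition": np.tile(np.repeat(["benign", "cancer"], 4), 3),
            "log_intensity": rng.normal(10.0, 0.3, 24),
        })
        df.loc[df["condition"] == "cancer", "log_intensity"] += 0.8  # injected abundance change

        # fixed effect of condition, random intercept per transition to model transition variability
        model = smf.mixedlm("log_intensity ~ condition", df, groups=df["transition"])
        result = model.fit()
        print(result.summary())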

  9. Workflow continuity--moving beyond business continuity in a multisite 24-7 healthcare organization.

    PubMed

    Kolowitz, Brian J; Lauro, Gonzalo Romero; Barkey, Charles; Black, Harry; Light, Karen; Deible, Christopher

    2012-12-01

    As hospitals move towards providing in-house 24 × 7 services, there is an increasing need for information systems to be available around the clock. This study investigates one organization's need for a workflow continuity solution that provides around-the-clock availability for information systems that do not provide highly available services. The organization investigated is a large multifacility healthcare organization that consists of 20 hospitals and more than 30 imaging centers. A case analysis approach was used to investigate the organization's efforts. The results show a 94% reduction from 2008 to 2011 in downtime during which radiologists could not continue their normal workflow on the integrated Picture Archiving and Communication System (PACS) solution. The impact of unplanned downtimes was reduced by 72% while the impact of planned downtimes was reduced by 99.66% over the same period. Additionally, more than 98 hours of radiologist impact due to a PACS upgrade in 2008 was entirely eliminated in 2011 utilizing the system created by the workflow continuity approach. Workflow continuity differs from high availability and business continuity in its design process and available services. Workflow continuity only ensures that critical workflows are available when the production system is unavailable due to scheduled or unscheduled downtimes. Workflow continuity works in conjunction with business continuity and highly available system designs. The results of this investigation revealed that this approach can add significant value to organizations because impact on users is minimized if not eliminated entirely.

  10. Modeling workflow to design machine translation applications for public health practice

    PubMed Central

    Turner, Anne M.; Brownstein, Megumu K.; Cole, Kate; Karasz, Hilary; Kirchhoff, Katrin

    2014-01-01

    Objective Provide a detailed understanding of the information workflow processes related to translating health promotion materials for limited English proficiency individuals in order to inform the design of context-driven machine translation (MT) tools for public health (PH). Materials and Methods We applied a cognitive work analysis framework to investigate the translation information workflow processes of two large health departments in Washington State. Researchers conducted interviews, performed a task analysis, and validated results with PH professionals to model translation workflow and identify functional requirements for a translation system for PH. Results The study resulted in a detailed description of work related to translation of PH materials, an information workflow diagram, and a description of attitudes towards MT technology. We identified a number of themes that hold design implications for incorporating MT in PH translation practice. A PH translation tool prototype was designed based on these findings. Discussion This study underscores the importance of understanding the work context and information workflow for which systems will be designed. Based on themes and translation information workflow processes, we identified key design guidelines for incorporating MT into PH translation work. Primary amongst these is that MT should be followed by human review for translations to be of high quality and for the technology to be adopted into practice. Conclusion The time and costs of creating multilingual health promotion materials are barriers to translation. PH personnel were interested in MT's potential to improve access to low-cost translated PH materials, but expressed concerns about ensuring quality. We outline design considerations and a potential machine translation tool to best fit MT systems into PH practice. PMID:25445922

  11. Cloud-based bioinformatics workflow platform for large-scale next-generation sequencing analyses

    PubMed Central

    Liu, Bo; Madduri, Ravi K; Sotomayor, Borja; Chard, Kyle; Lacinski, Lukasz; Dave, Utpal J; Li, Jianqiang; Liu, Chunchen; Foster, Ian T

    2014-01-01

    Due to the coming deluge of genome data, the need to store and process large-scale genome data, provide easy access to biomedical analysis tools, and enable efficient data sharing and retrieval presents significant challenges. The variability in data volume results in variable computing and storage requirements; therefore biomedical researchers are pursuing more reliable, dynamic and convenient methods for conducting sequencing analyses. This paper proposes a Cloud-based bioinformatics workflow platform for large-scale next-generation sequencing analyses, which enables reliable and highly scalable execution of sequencing analysis workflows in a fully automated manner. Our platform extends the existing Galaxy workflow system by adding data management capabilities for transferring large quantities of data efficiently and reliably (via Globus Transfer), domain-specific analysis tools preconfigured for immediate use by researchers (via user-specific tools integration), automatic deployment on the Cloud for on-demand resource allocation and pay-as-you-go pricing (via Globus Provision), a Cloud provisioning tool for auto-scaling (via the HTCondor scheduler), and support for validating the correctness of workflows (via semantic verification tools). Two bioinformatics workflow use cases as well as a performance evaluation are presented to validate the feasibility of the proposed approach. PMID:24462600

  12. Medication Management: The Macrocognitive Workflow of Older Adults With Heart Failure.

    PubMed

    Mickelson, Robin S; Unertl, Kim M; Holden, Richard J

    2016-10-12

    Older adults with chronic disease struggle to manage complex medication regimens. Health information technology has the potential to improve medication management, but only if it is based on a thorough understanding of the complexity of medication management workflow as it occurs in natural settings. Prior research reveals that patient work related to medication management is complex, cognitive, and collaborative. Macrocognitive processes are theorized as how people individually and collaboratively think in complex, adaptive, and messy nonlaboratory settings supported by artifacts. The objective of this research was to describe and analyze the work of medication management by older adults with heart failure, using a macrocognitive workflow framework. We interviewed and observed 61 older patients along with 30 informal caregivers about self-care practices including medication management. Descriptive qualitative content analysis methods were used to develop categories, subcategories, and themes about macrocognitive processes used in medication management workflow. We identified 5 high-level macrocognitive processes affecting medication management-sensemaking, planning, coordination, monitoring, and decision making-and 15 subprocesses. Data revealed workflow as occurring in a highly collaborative, fragile system of interacting people, artifacts, time, and space. Process breakdowns were common and patients had little support for macrocognitive workflow from current tools. Macrocognitive processes affected medication management performance. Describing and analyzing this performance produced recommendations for technology supporting collaboration and sensemaking, decision making and problem detection, and planning and implementation.

  13. Load-sensitive dynamic workflow re-orchestration and optimisation for faster patient healthcare.

    PubMed

    Meli, Christopher L; Khalil, Ibrahim; Tari, Zahir

    2014-01-01

    Hospital waiting times are considerably long, with no signs of decreasing anytime soon. A number of factors, including population growth, the ageing population and a lack of new infrastructure, are expected to further exacerbate waiting times in the near future. In this work, we show how healthcare services can be modelled as queueing nodes within healthcare service workflows, such that these workflows can be optimised during execution in order to reduce patient waiting times. Services such as X-ray, computed tomography, and magnetic resonance imaging often form queues; thus, by taking into account the waiting times of each service, the workflow can be re-orchestrated and optimised. Experimental results indicate average waiting time reductions are achievable by optimising workflows using dynamic re-orchestration. Crown Copyright © 2013. Published by Elsevier Ireland Ltd. All rights reserved.
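
    A minimal sketch of the load-sensitive idea, modelling each imaging service as an M/M/1 queueing node and ordering a patient's independent steps by expected wait, is shown below; the arrival and service rates are illustrative assumptions, not data from the study:

        def mm1_expected_wait(arrival_rate, service_rate):
            """Expected time in queue (excluding service) for an M/M/1 node: W_q = rho / (mu - lambda)."""
            rho = arrival_rate / service_rate
            assert rho < 1, "queue is unstable"
            return rho / (service_rate - arrival_rate)

        services = {                 # per-hour rates
            "X-ray": {"arrival": 5.0, "service": 6.0},
            "CT":    {"arrival": 3.0, "service": 4.0},
            "MRI":   {"arrival": 1.5, "service": 2.0},
        }

        waits = {name: mm1_expected_wait(s["arrival"], s["service"]) for name, s in services.items()}
        # for a patient whose remaining steps are order-independent, start with the shortest queue
        order = sorted(waits, key=waits.get)
        print(waits, "-> suggested order:", order)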

  14. Workflow in interventional radiology: nerve blocks and facet blocks

    NASA Astrophysics Data System (ADS)

    Siddoway, Donald; Ingeholm, Mary Lou; Burgert, Oliver; Neumuth, Thomas; Watson, Vance; Cleary, Kevin

    2006-03-01

    Workflow analysis has the potential to dramatically improve the efficiency and clinical outcomes of medical procedures. In this study, we recorded the workflow for nerve block and facet block procedures in the interventional radiology suite at Georgetown University Hospital in Washington, DC, USA. We employed a custom client/server software architecture developed by the Innovation Center for Computer Assisted Surgery (ICCAS) at the University of Leipzig, Germany. This software runs in an internet browser, and allows the user to record the actions taken by the physician during a procedure. The data recorded during the procedure is stored as an XML document, which can then be further processed. We have successfully gathered data on a number of cases using a tablet PC, and these preliminary results show the feasibility of using this software in an interventional radiology setting. We are currently accruing additional cases and when more data has been collected we will analyze the workflow of these procedures to look for inefficiencies and potential improvements.

  15. Nexus: A modular workflow management system for quantum simulation codes

    NASA Astrophysics Data System (ADS)

    Krogel, Jaron T.

    2016-01-01

    The management of simulation workflows represents a significant task for the individual computational researcher. Automation of the required tasks involved in simulation work can decrease the overall time to solution and reduce sources of human error. A new simulation workflow management system, Nexus, is presented to address these issues. Nexus is capable of automated job management on workstations and resources at several major supercomputing centers. Its modular design allows many quantum simulation codes to be supported within the same framework. Current support includes quantum Monte Carlo calculations with QMCPACK, density functional theory calculations with Quantum Espresso or VASP, and quantum chemical calculations with GAMESS. Users can compose workflows through a transparent, text-based interface, resembling the input file of a typical simulation code. A usage example is provided to illustrate the process.

  16. The radiologist's workflow environment: evaluation of disruptors and potential implications.

    PubMed

    Yu, John-Paul J; Kansagra, Akash P; Mongan, John

    2014-06-01

    Workflow interruptions in the health care delivery environment are a major contributor to medical errors and have been extensively studied within numerous hospital settings, including the nursing environment and the operating room, along with their effects on physician workflow. Less understood, though, is the role of interruptions in other highly specialized clinical domains and subspecialty services, such as diagnostic radiology. The workflow of the on-call radiologist, in particular, is especially susceptible to disruption by telephone calls and other modes of physician-to-physician communication. Herein, the authors describe their initial efforts to quantify the degree of interruption experienced by on-call radiologists and examine its potential implications in patient safety and overall clinical care. Copyright © 2014 American College of Radiology. Published by Elsevier Inc. All rights reserved.

  17. Building an efficient curation workflow for the Arabidopsis literature corpus

    PubMed Central

    Li, Donghui; Berardini, Tanya Z.; Muller, Robert J.; Huala, Eva

    2012-01-01

    TAIR (The Arabidopsis Information Resource) is the model organism database (MOD) for Arabidopsis thaliana, a model plant with a literature corpus of about 39 000 articles in PubMed, with over 4300 new articles added in 2011. We have developed a literature curation workflow incorporating both automated and manual elements to cope with this flood of new research articles. The current workflow can be divided into two phases: article selection and curation. Structured controlled vocabularies, such as the Gene Ontology and Plant Ontology are used to capture free text information in the literature as succinct ontology-based annotations suitable for the application of computational analysis methods. We also describe our curation platform and the use of text mining tools in our workflow. Database URL: www.arabidopsis.org PMID:23221298

  18. Text mining for the biocuration workflow

    PubMed Central

    Hirschman, Lynette; Burns, Gully A. P. C; Krallinger, Martin; Arighi, Cecilia; Cohen, K. Bretonnel; Valencia, Alfonso; Wu, Cathy H.; Chatr-Aryamontri, Andrew; Dowell, Karen G.; Huala, Eva; Lourenço, Anália; Nash, Robert; Veuthey, Anne-Lise; Wiegers, Thomas; Winter, Andrew G.

    2012-01-01

    Molecular biology has become heavily dependent on biological knowledge encoded in expert curated biological databases. As the volume of biological literature increases, biocurators need help in keeping up with the literature; (semi-) automated aids for biocuration would seem to be an ideal application for natural language processing and text mining. However, to date, there have been few documented successes for improving biocuration throughput using text mining. Our initial investigations took place for the workshop on ‘Text Mining for the BioCuration Workflow’ at the third International Biocuration Conference (Berlin, 2009). We interviewed biocurators to obtain workflows from eight biological databases. This initial study revealed high-level commonalities, including (i) selection of documents for curation; (ii) indexing of documents with biologically relevant entities (e.g. genes); and (iii) detailed curation of specific relations (e.g. interactions); however, the detailed workflows also showed many variabilities. Following the workshop, we conducted a survey of biocurators. The survey identified biocurator priorities, including the handling of full text indexed with biological entities and support for the identification and prioritization of documents for curation. It also indicated that two-thirds of the biocuration teams had experimented with text mining and almost half were using text mining at that time. Analysis of our interviews and survey provide a set of requirements for the integration of text mining into the biocuration workflow. These can guide the identification of common needs across curated databases and encourage joint experimentation involving biocurators, text mining developers and the larger biomedical research community. PMID:22513129

  19. Workflow Automation: A Collective Case Study

    ERIC Educational Resources Information Center

    Harlan, Jennifer

    2013-01-01

    Knowledge management has proven to be a sustainable competitive advantage for many organizations. Knowledge management systems are abundant, with multiple functionalities. The literature reinforces the use of workflow automation with knowledge management systems to benefit organizations; however, it was not known if process automation yielded…

  20. Automating radiologist workflow, part 3: education and training.

    PubMed

    Reiner, Bruce

    2008-12-01

    The current model for radiologist education consists largely of mentorship during residency, followed by peer-to-peer training thereafter. The traditional focus of this radiologist education has historically been restricted to anatomy, pathology, and imaging modality. This "human" mentoring model becomes a limiting factor in the current practice environment because of rapid and dramatic changes in imaging and information technologies, along with the increased time demands placed on practicing radiologists. One novel way to address these burgeoning education and training challenges is to leverage technology, with the creation of user-specific and context-specific automated workflow templates. These automated templates would provide a low-stress, time-efficient, and easy-to-use equivalent of "computerized" mentoring. A radiologist could identify the workflow template of interest on the basis of the specific computer application, pathology, anatomy, or modality of interest. While the corresponding workflow template is activated, the radiologist "student" could effectively start and stop at areas of interest and use the functionality of an electronic wizard to identify additional educational resource of interest. An additional training feature of the technology is the ability to review "proven" cases for the purposes of establishing competence and credentialing.

  1. Formalizing an integrative, multidisciplinary cancer therapy discovery workflow

    PubMed Central

    McGuire, Mary F.; Enderling, Heiko; Wallace, Dorothy I.; Batra, Jaspreet; Jordan, Marie; Kumar, Sushil; Panetta, John C.; Pasquier, Eddy

    2014-01-01

    Although many clinicians and researchers work to understand cancer, there has been limited success to effectively combine forces and collaborate over time, distance, data and budget constraints. Here we present a workflow template for multidisciplinary cancer therapy that was developed during the 2nd Annual Workshop on Cancer Systems Biology sponsored by Tufts University, Boston, MA in July 2012. The template was applied to the development of a metronomic therapy backbone for neuroblastoma. Three primary groups were identified: clinicians, biologists, and scientists (mathematicians, computer scientists, physicists and engineers). The workflow described their integrative interactions; parallel or sequential processes; data sources and computational tools at different stages as well as the iterative nature of therapeutic development from clinical observations to in vitro, in vivo, and clinical trials. We found that theoreticians in dialog with experimentalists could develop calibrated and parameterized predictive models that inform and formalize sets of testable hypotheses, thus speeding up discovery and validation while reducing laboratory resources and costs. The developed template outlines an interdisciplinary collaboration workflow designed to systematically investigate the mechanistic underpinnings of a new therapy and validate that therapy to advance development and clinical acceptance. PMID:23955390

  2. Routine Digital Pathology Workflow: The Catania Experience.

    PubMed

    Fraggetta, Filippo; Garozzo, Salvatore; Zannoni, Gian Franco; Pantanowitz, Liron; Rossi, Esther Diana

    2017-01-01

    Successful implementation of whole slide imaging (WSI) for routine clinical practice has been accomplished in only a few pathology laboratories worldwide. We report the transition to an effective and complete digital surgical pathology workflow in the pathology laboratory at Cannizzaro Hospital in Catania, Italy. All (100%) permanent histopathology glass slides were digitized at ×20 using Aperio AT2 scanners. Compatible stain and scanning slide racks were employed to streamline operations. eSlide Manager software was bidirectionally interfaced with the anatomic pathology laboratory information system. Virtual slide trays connected to the two-dimensional (2D) barcode tracking system allowed pathologists to confirm that they were correctly assigned slides and that all tissues on these glass slides were scanned. Over 115,000 glass slides were digitized with a scan fail rate of around 1%. Drying glass slides before scanning minimized them sticking to scanner racks. Implementation required introduction of a 2D barcode tracking system and modification of histology workflow processes. Our experience indicates that effective adoption of WSI for primary diagnostic use was more dependent on optimizing preimaging variables and integration with the laboratory information system than on information technology infrastructure and ensuring pathologist buy-in. Implementation of digital pathology for routine practice not only leveraged the benefits of digital imaging but also creates an opportunity for establishing standardization of workflow processes in the pathology laboratory.

  3. Improved compliance by BPM-driven workflow automation.

    PubMed

    Holzmüller-Laue, Silke; Göde, Bernd; Fleischer, Heidi; Thurow, Kerstin

    2014-12-01

    Using methods and technologies of business process management (BPM) for laboratory automation has important benefits (i.e., the agility of high-level automation processes, rapid interdisciplinary prototyping and implementation of laboratory tasks and procedures, and efficient real-time process documentation). A principal goal of model-driven development is the improved transparency of processes and the alignment of process diagrams and technical code. First experiences of using the business process model and notation (BPMN) show that easy-to-read graphical process models can achieve and provide standardization of laboratory workflows. Model-based development allows processes to be changed quickly and adapted easily to changing requirements. The process models are able to host work procedures and their scheduling in compliance with predefined guidelines and policies. Finally, the process-controlled documentation of complex workflow results addresses modern laboratory needs of quality assurance. BPMN 2.0, as an automation language able to control every kind of activity or subprocess, addresses complete workflows in end-to-end relationships. BPMN is applicable as a system-independent and cross-disciplinary graphical language to document all methods in laboratories (i.e., screening procedures or analytical processes). This means that, with the BPM standard, a method for communicating and sharing process knowledge across laboratories is also available. © 2014 Society for Laboratory Automation and Screening.

  4. Assembling Large, Multi-Sensor Climate Datasets Using the SciFlo Grid Workflow System

    NASA Astrophysics Data System (ADS)

    Wilson, B. D.; Manipon, G.; Xing, Z.; Fetzer, E.

    2008-12-01

    NASA's Earth Observing System (EOS) is the world's most ambitious facility for studying global climate change. The mandate now is to combine measurements from the instruments on the A-Train platforms (AIRS, AMSR-E, MODIS, MISR, MLS, and CloudSat) and other Earth probes to enable large-scale studies of climate change over periods of years to decades. However, moving from predominantly single-instrument studies to a multi-sensor, measurement-based model for long-duration analysis of important climate variables presents serious challenges for large-scale data mining and data fusion. For example, one might want to compare temperature and water vapor retrievals from one instrument (AIRS) to another instrument (MODIS), and to a model (ECMWF), stratify the comparisons using a classification of the cloud scenes from CloudSat, and repeat the entire analysis over years of AIRS data. To perform such an analysis, one must discover & access multiple datasets from remote sites, find the space/time matchups between instruments swaths and model grids, understand the quality flags and uncertainties for retrieved physical variables, and assemble merged datasets for further scientific and statistical analysis. To meet these large-scale challenges, we are utilizing a Grid computing and dataflow framework, named SciFlo, in which we are deploying a set of versatile and reusable operators for data query, access, subsetting, co-registration, mining, fusion, and advanced statistical analysis. SciFlo is a semantically-enabled ("smart") Grid Workflow system that ties together a peer-to-peer network of computers into an efficient engine for distributed computation. The SciFlo workflow engine enables scientists to do multi-instrument Earth Science by assembling remotely-invokable Web Services (SOAP or http GET URLs), native executables, command-line scripts, and Python codes into a distributed computing flow. A scientist visually authors the graph of operation in the VizFlow GUI, or uses a

  5. Assembling Large, Multi-Sensor Climate Datasets Using the SciFlo Grid Workflow System

    NASA Astrophysics Data System (ADS)

    Wilson, B.; Manipon, G.; Xing, Z.; Fetzer, E.

    2009-04-01

    NASA's Earth Observing System (EOS) is an ambitious facility for studying global climate change. The mandate now is to combine measurements from the instruments on the "A-Train" platforms (AIRS, AMSR-E, MODIS, MISR, MLS, and CloudSat) and other Earth probes to enable large-scale studies of climate change over periods of years to decades. However, moving from predominantly single-instrument studies to a multi-sensor, measurement-based model for long-duration analysis of important climate variables presents serious challenges for large-scale data mining and data fusion. For example, one might want to compare temperature and water vapor retrievals from one instrument (AIRS) to another instrument (MODIS), and to a model (ECMWF), stratify the comparisons using a classification of the "cloud scenes" from CloudSat, and repeat the entire analysis over years of AIRS data. To perform such an analysis, one must discover & access multiple datasets from remote sites, find the space/time "matchups" between instrument swaths and model grids, understand the quality flags and uncertainties for retrieved physical variables, assemble merged datasets, and compute fused products for further scientific and statistical analysis. To meet these large-scale challenges, we are utilizing a Grid computing and dataflow framework, named SciFlo, in which we are deploying a set of versatile and reusable operators for data query, access, subsetting, co-registration, mining, fusion, and advanced statistical analysis. SciFlo is a semantically-enabled ("smart") Grid Workflow system that ties together a peer-to-peer network of computers into an efficient engine for distributed computation. The SciFlo workflow engine enables scientists to do multi-instrument Earth Science by assembling remotely-invokable Web Services (SOAP or http GET URLs), native executables, command-line scripts, and Python codes into a distributed computing flow. A scientist visually authors the graph of operations in the Viz

  6. PGen: large-scale genomic variations analysis workflow and browser in SoyKB.

    PubMed

    Liu, Yang; Khan, Saad M; Wang, Juexin; Rynge, Mats; Zhang, Yuanxun; Zeng, Shuai; Chen, Shiyuan; Maldonado Dos Santos, Joao V; Valliyodan, Babu; Calyam, Prasad P; Merchant, Nirav; Nguyen, Henry T; Xu, Dong; Joshi, Trupti

    2016-10-06

    With the advances in next-generation sequencing (NGS) technology and significant reductions in sequencing costs, it is now possible to sequence large collections of germplasm in crops for detecting genome-scale genetic variations and to apply the knowledge towards improvements in traits. To efficiently facilitate large-scale NGS resequencing data analysis of genomic variations, we have developed "PGen", an integrated and optimized workflow using the Extreme Science and Engineering Discovery Environment (XSEDE) high-performance computing (HPC) virtual system, iPlant cloud data storage resources and Pegasus workflow management system (Pegasus-WMS). The workflow allows users to identify single nucleotide polymorphisms (SNPs) and insertion-deletions (indels), perform SNP annotations and conduct copy number variation analyses on multiple resequencing datasets in a user-friendly and seamless way. We have developed both a Linux version in GitHub ( https://github.com/pegasus-isi/PGen-GenomicVariations-Workflow ) and a web-based implementation of the PGen workflow integrated within the Soybean Knowledge Base (SoyKB) ( http://soykb.org/Pegasus/index.php ). Using PGen, we identified 10,218,140 SNPs and 1,398,982 indels from analysis of 106 soybean lines sequenced at 15X coverage. A total of 297,245 non-synonymous SNPs and 3330 copy number variation (CNV) regions were identified from this analysis. SNPs identified using PGen from additional soybean resequencing projects, bringing the total to 500+ soybean germplasm lines, have also been integrated. These SNPs are being utilized for trait improvement using genotype to phenotype prediction approaches developed in-house. In order to browse and access NGS data easily, we have also developed an NGS resequencing data browser ( http://soykb.org/NGS_Resequence/NGS_index.php ) within SoyKB to provide easy access to SNP and downstream analysis results for soybean researchers. The PGen workflow has been optimized for the most
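
    PGen chains read alignment, sorting, and SNP/indel calling on an HPC back end. As a generic, hedged illustration of that kind of resequencing pipeline (not PGen's actual Pegasus workflow; the tool choices, file names, and options are assumptions, and reference indexes are presumed to exist), the common command-line steps could be chained as follows.

      # Illustrative only: a generic resequencing variant-calling pipeline
      # driven from Python via subprocess. Paths and sample names are
      # placeholders; PGen's real workflow runs these steps through Pegasus-WMS.
      import subprocess

      ref = "soybean_ref.fa"     # reference genome (index and dictionary assumed to exist)
      sample = "line_001"        # placeholder sample name

      # 1) Align paired-end reads; bwa mem writes SAM to stdout.
      with open(f"{sample}.sam", "w") as sam:
          subprocess.run(["bwa", "mem", "-t", "8", ref,
                          f"{sample}_R1.fastq.gz", f"{sample}_R2.fastq.gz"],
                         stdout=sam, check=True)

      # 2) Coordinate-sort and index the alignment.
      subprocess.run(["samtools", "sort", "-o", f"{sample}.bam", f"{sample}.sam"], check=True)
      subprocess.run(["samtools", "index", f"{sample}.bam"], check=True)

      # 3) Call SNPs and indels into a VCF.
      subprocess.run(["gatk", "HaplotypeCaller", "-R", ref, "-I", f"{sample}.bam",
                      "-O", f"{sample}.vcf.gz"], check=True)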

  7. Requirements for Workflow-Based EHR Systems - Results of a Qualitative Study.

    PubMed

    Schweitzer, Marco; Lasierra, Nelia; Hoerbst, Alexander

    2016-01-01

    Today's high-quality healthcare delivery strongly relies on efficient electronic health records (EHR). These EHR systems, or healthcare IT systems in general, are usually developed in a static manner according to a given workflow. Hence, they are not flexible enough to enable access to EHR data and to execute individual actions within a consultation. This paper reports on requirements identified by experts in the domain of diabetes mellitus for designing a system that supports dynamic workflows to serve personalization within a medical activity. Requirements were collected by means of expert interviews. These interviews completed a triangulation approach aimed at gathering requirements for workflow-based EHR interactions. The data from the interviews were analyzed through a qualitative approach, resulting in a set of requirements enhancing EHR functionality from the user's perspective. Requirements were classified according to four different categorizations: (1) process-related requirements, (2) information needs, (3) required functions, (4) non-functional requirements. Workflow-related requirements were identified that should be considered when developing and deploying EHR systems.

  8. Successful Completion of FY18/Q1 ASC L2 Milestone 6355: Electrical Analysis Calibration Workflow Capability Demonstration.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Copps, Kevin D.

    The Sandia Analysis Workbench (SAW) project has developed and deployed a production capability for SIERRA computational mechanics analysis workflows. However, the electrical analysis workflow capability requirements have only been demonstrated in early prototype states, with no real capability deployed for analysts’ use. This milestone aims to improve the electrical analysis workflow capability (via SAW and related tools) and deploy it for ongoing use. We propose to focus on a QASPR electrical analysis calibration workflow use case. We will include a number of new capabilities (versus today’s SAW), such as: 1) support for the XYCE code workflow component, 2) data management coupled to electrical workflow, 3) human-in-the-loop workflow capability, and 4) electrical analysis workflow capability deployed on the restricted (and possibly classified) network at Sandia. While far from the complete set of capabilities required for electrical analysis workflow over the long term, this is a substantial first step toward full production support for the electrical analysts.

  9. An evolving computational platform for biological mass spectrometry: workflows, statistics and data mining with MASSyPup64.

    PubMed

    Winkler, Robert

    2015-01-01

    displayed a higher robustness and accuracy for classifying sample groups in targeted Metabolomics than cluster analyses. Random Forest models not only provide predictive models, which can be deployed for new data sets, but also report variable importance. We demonstrate that the latter is especially useful for tracking down significant signals and affected pathways in untargeted Metabolomics. Thus, Random Forest modeling supports the unbiased search for relevant biological features in Metabolomics. Our results clearly manifest the importance of Data Mining methods to disclose non-obvious information in biological mass spectrometry. The application of a Workflow Management System and the integration of all required programs and data in a consistent platform make the presented data analysis strategies reproducible for non-expert users. The simple remastering process and the Open Source licenses of MASSyPup64 (http://www.bioprocess.org/massypup/) enable the continuous improvement of the system.
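
    The abstract emphasizes Random Forest variable importance for tracking down significant signals. As a minimal sketch of that idea (scikit-learn is used here as a stand-in for the platform's own tooling, and the feature matrix and labels are synthetic placeholders), a model can be fitted and its features ranked as follows.

      # Minimal sketch: fit a Random Forest on a metabolite intensity matrix and
      # rank features by importance. X, y, and the informative feature are synthetic.
      import numpy as np
      from sklearn.ensemble import RandomForestClassifier

      rng = np.random.default_rng(0)
      n_samples, n_features = 40, 200
      X = rng.normal(size=(n_samples, n_features))          # metabolite intensities
      y = np.repeat([0, 1], n_samples // 2)                 # two sample groups
      X[y == 1, 3] += 2.0                                   # make feature 3 informative

      model = RandomForestClassifier(n_estimators=500, random_state=0).fit(X, y)

      # Rank putative biomarkers by Gini importance.
      ranked = np.argsort(model.feature_importances_)[::-1]
      for idx in ranked[:5]:
          print(f"feature_{idx}: importance={model.feature_importances_[idx]:.3f}")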

  10. RetroPath2.0: A retrosynthesis workflow for metabolic engineers.

    PubMed

    Delépine, Baudoin; Duigou, Thomas; Carbonell, Pablo; Faulon, Jean-Loup

    2018-01-01

    Synthetic biology applied to industrial biotechnology is transforming the way we produce chemicals. However, despite advances in the scale and scope of metabolic engineering, the research and development process still remains costly. In order to expand the chemical repertoire for the production of next generation compounds, a major engineering biology effort is required in the development of novel design tools that target chemical diversity through rapid and predictable protocols. Addressing that goal involves retrosynthesis approaches that explore the chemical biosynthetic space. However, the complexity associated with the large combinatorial retrosynthesis design space has often been recognized as the main challenge hindering the approach. Here, we provide RetroPath2.0, an automated open source workflow for retrosynthesis based on generalized reaction rules that perform the retrosynthesis search from chassis to target through an efficient and well-controlled protocol. Its ease of use and the versatility of its applications make this tool a valuable addition to the biological engineer's bench. We show through several examples the application of the workflow to biotechnologically relevant problems, including the identification of alternative biosynthetic routes through enzyme promiscuity or the development of biosensors. In this way, we demonstrate the ability of the workflow to streamline retrosynthesis pathway design and its major role in reshaping the design, build, test and learn pipeline by driving the process toward the objective of optimizing bioproduction. The RetroPath2.0 workflow is built using tools developed by the bioinformatics and cheminformatics community; because it is open source, we anticipate that community contributions will further expand the features of the workflow. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.
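
    RetroPath2.0 is built around generalized reaction rules applied in the retro direction. As a heavily simplified, conceptual illustration only (not RetroPath2.0's rule engine, which is distributed as a graphical workflow; the rule and target compound below are toy examples), a single retrosynthetic rule written as reaction SMARTS can be applied with RDKit.

      # Conceptual illustration: apply one retrosynthetic reaction rule, written
      # as reaction SMARTS, to a target molecule and enumerate the precursor
      # candidates it proposes. The rule and target are toy examples.
      from rdkit import Chem
      from rdkit.Chem import AllChem

      # Toy rule: cleave an ester back into a carboxylic acid and an alcohol.
      rule = AllChem.ReactionFromSmarts(
          "[C:1](=[O:2])[O:3][C:4]>>[C:1](=[O:2])[OH].[OH][C:4]"
      )

      target = Chem.MolFromSmiles("CC(=O)OCC")  # ethyl acetate as a stand-in target

      for products in rule.RunReactants((target,)):
          for p in products:
              Chem.SanitizeMol(p)
          print(" + ".join(Chem.MolToSmiles(p) for p in products))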

  11. Observing health professionals' workflow patterns for diabetes care - First steps towards an ontology for EHR services.

    PubMed

    Schweitzer, M; Lasierra, N; Hoerbst, A

    2015-01-01

    Increasing flexibility from a user perspective and enabling workflow-based interaction facilitates easy, user-friendly utilization of EHRs in healthcare professionals' daily work. To offer such versatile EHR functionality, our approach is based on the execution of clinical workflows by means of a composition of semantic web services. The backbone of such an architecture is an ontology which enables the representation of clinical workflows and facilitates the selection of suitable services. In this paper we present the methods and results of observations of routine diabetes consultations, which were conducted in order to identify those workflows and the relations among the included tasks. These workflows were first modeled in BPMN and then generalized. As a next step in our study, interviews will be conducted with clinical personnel to validate the modeled workflows.

  12. Automation of lidar-based hydrologic feature extraction workflows using GIS

    NASA Astrophysics Data System (ADS)

    Borlongan, Noel Jerome B.; de la Cruz, Roel M.; Olfindo, Nestor T.; Perez, Anjillyn Mae C.

    2016-10-01

    With the advent of LiDAR technology, higher-resolution datasets have become available for use in different remote sensing and GIS applications. One significant application of LiDAR datasets in the Philippines is in resource feature extraction. Feature extraction using LiDAR datasets requires complex and repetitive workflows which can take a lot of researcher time through manual execution and supervision. The Development of the Philippine Hydrologic Dataset for Watersheds from LiDAR Surveys (PHD), a project under the Nationwide Detailed Resources Assessment Using LiDAR (Phil-LiDAR 2) program, created a set of scripts, the PHD Toolkit, to automate its processes and workflows necessary for hydrologic feature extraction, specifically Streams and Drainages, Irrigation Network, and Inland Wetlands, using LiDAR datasets. These scripts are written in Python and can be added to the ArcGIS® environment as a toolbox. The toolkit is currently being used as an aid for the researchers in hydrologic feature extraction by simplifying the workflows, eliminating human errors when providing the inputs, and providing quick and easy-to-use tools for repetitive tasks. This paper discusses the actual implementation of different workflows developed by Phil-LiDAR 2 Project 4 in Streams, Irrigation Network and Inland Wetlands extraction.
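
    The toolkit automates hydrologic feature extraction from LiDAR-derived rasters inside ArcGIS. As a generic, hypothetical sketch of one common step in such workflows (thresholding a flow-accumulation grid to delineate streams), shown here with numpy/scipy rather than the toolkit's actual arcpy code, the idea looks like this.

      # Generic sketch (not the PHD Toolkit's arcpy code): delineate a stream
      # network by thresholding a flow-accumulation grid derived from a LiDAR DEM,
      # then label connected stream segments. The input array is a placeholder.
      import numpy as np
      from scipy import ndimage

      flow_accumulation = np.load("flow_accumulation.npy")  # upstream cells draining to each cell

      # Cells draining more than `threshold` upstream cells are treated as streams.
      threshold = 1000
      streams = flow_accumulation > threshold

      # Label connected stream cells (8-connectivity) into individual segments.
      labels, n_segments = ndimage.label(streams, structure=np.ones((3, 3)))
      print(f"extracted {n_segments} connected stream segments")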

  13. A reliable computational workflow for the selection of optimal screening libraries.

    PubMed

    Gilad, Yocheved; Nadassy, Katalin; Senderowitz, Hanoch

    2015-01-01

    The experimental screening of compound collections is a common starting point in many drug discovery projects. Successes of such screening campaigns critically depend on the quality of the screened library. Many libraries are currently available from different vendors, yet the selection of the optimal screening library for a specific project is challenging. We have devised a novel workflow for the rational selection of project-specific screening libraries. The workflow accepts as input a set of virtual candidate libraries and applies the following steps to each library: (1) data curation; (2) assessment of ADME/T profile; (3) assessment of the number of promiscuous binders/frequent HTS hitters; (4) assessment of internal diversity; (5) assessment of similarity to known active compound(s) (optional); (6) assessment of similarity to in-house or otherwise accessible compound collections (optional). For ADME/T profiling, Lipinski's and Veber's rule-based filters were implemented and a new blood-brain barrier permeation model was developed and validated (85 and 74% success rates for the training and test sets, respectively). Diversity and similarity descriptors that demonstrated the best performance in terms of their ability to select either diverse or focused sets of compounds from three databases (DrugBank, CMC and CHEMBL) were identified and used for diversity and similarity assessments. The workflow was used to analyze nine common screening libraries available from six vendors. The results of this analysis are reported for each library, providing an assessment of its quality. Furthermore, a consensus approach was developed to combine the results of these analyses into a single score for selecting the optimal library under different scenarios. We have devised and tested a new workflow for the rational selection of screening libraries under different scenarios. The current workflow was implemented using the Pipeline Pilot software, yet due to the usage of generic
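
    Step (2) applies Lipinski's and Veber's rule-based filters. The sketch below shows that ADME/T step with RDKit as a stand-in for the Pipeline Pilot components the paper describes; the two example SMILES are arbitrary illustrations.

      # Sketch of the rule-based ADME/T filtering step, using RDKit in place of
      # the Pipeline Pilot implementation described in the paper.
      from rdkit import Chem
      from rdkit.Chem import Descriptors, Lipinski

      def passes_lipinski(mol):
          """Rule of five: MW <= 500, logP <= 5, H-bond donors <= 5, acceptors <= 10."""
          return (Descriptors.MolWt(mol) <= 500
                  and Descriptors.MolLogP(mol) <= 5
                  and Lipinski.NumHDonors(mol) <= 5
                  and Lipinski.NumHAcceptors(mol) <= 10)

      def passes_veber(mol):
          """Veber criteria: rotatable bonds <= 10 and polar surface area <= 140 A^2."""
          return (Descriptors.NumRotatableBonds(mol) <= 10
                  and Descriptors.TPSA(mol) <= 140)

      library = ["CC(=O)Oc1ccccc1C(=O)O", "CCN(CC)CCCC(C)Nc1ccnc2cc(Cl)ccc12"]
      for smiles in library:
          mol = Chem.MolFromSmiles(smiles)
          print(smiles, passes_lipinski(mol) and passes_veber(mol))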

  14. Nexus: a modular workflow management system for quantum simulation codes

    DOE PAGES

    Krogel, Jaron T.

    2015-08-24

    The management of simulation workflows is a significant task for the individual computational researcher. Automation of the required tasks involved in simulation work can decrease the overall time to solution and reduce sources of human error. A new simulation workflow management system, Nexus, is presented to address these issues. Nexus is capable of automated job management on workstations and resources at several major supercomputing centers. Its modular design allows many quantum simulation codes to be supported within the same framework. Current support includes quantum Monte Carlo calculations with QMCPACK, density functional theory calculations with Quantum Espresso or VASP, and quantum chemical calculations with GAMESS. Users can compose workflows through a transparent, text-based interface, resembling the input file of a typical simulation code. A usage example is provided to illustrate the process.

  15. Argo: enabling the development of bespoke workflows and services for disease annotation.

    PubMed

    Batista-Navarro, Riza; Carter, Jacob; Ananiadou, Sophia

    2016-01-01

    Argo (http://argo.nactem.ac.uk) is a generic text mining workbench that can cater to a variety of use cases, including the semi-automatic annotation of literature. It enables its technical users to build their own customised text mining solutions by providing a wide array of interoperable and configurable elementary components that can be seamlessly integrated into processing workflows. With Argo's graphical annotation interface, domain experts can then make use of the workflows' automatically generated output to curate information of interest. With the continuously rising need to understand the aetiology of diseases as well as the demand for their informed diagnosis and personalised treatment, the curation of disease-relevant information from medical and clinical documents has become an indispensable scientific activity. In the Fifth BioCreative Challenge Evaluation Workshop (BioCreative V), there was substantial interest in the mining of literature for disease-relevant information. Apart from a panel discussion focussed on disease annotations, the chemical-disease relations (CDR) track was also organised to foster the sharing and advancement of disease annotation tools and resources. This article presents the application of Argo's capabilities to the literature-based annotation of diseases. As part of our participation in BioCreative V's User Interactive Track (IAT), we demonstrated and evaluated Argo's suitability to the semi-automatic curation of chronic obstructive pulmonary disease (COPD) phenotypes. Furthermore, the workbench facilitated the development of some of the CDR track's top-performing web services for normalising disease mentions against the Medical Subject Headings (MeSH) database. In this work, we highlight Argo's support for developing various types of bespoke workflows ranging from ones which enabled us to easily incorporate information from various databases, to those which train and apply machine learning-based concept recognition models

  16. Argo: enabling the development of bespoke workflows and services for disease annotation

    PubMed Central

    Batista-Navarro, Riza; Carter, Jacob; Ananiadou, Sophia

    2016-01-01

    Argo (http://argo.nactem.ac.uk) is a generic text mining workbench that can cater to a variety of use cases, including the semi-automatic annotation of literature. It enables its technical users to build their own customised text mining solutions by providing a wide array of interoperable and configurable elementary components that can be seamlessly integrated into processing workflows. With Argo's graphical annotation interface, domain experts can then make use of the workflows' automatically generated output to curate information of interest. With the continuously rising need to understand the aetiology of diseases as well as the demand for their informed diagnosis and personalised treatment, the curation of disease-relevant information from medical and clinical documents has become an indispensable scientific activity. In the Fifth BioCreative Challenge Evaluation Workshop (BioCreative V), there was substantial interest in the mining of literature for disease-relevant information. Apart from a panel discussion focussed on disease annotations, the chemical-disease relations (CDR) track was also organised to foster the sharing and advancement of disease annotation tools and resources. This article presents the application of Argo’s capabilities to the literature-based annotation of diseases. As part of our participation in BioCreative V’s User Interactive Track (IAT), we demonstrated and evaluated Argo’s suitability to the semi-automatic curation of chronic obstructive pulmonary disease (COPD) phenotypes. Furthermore, the workbench facilitated the development of some of the CDR track’s top-performing web services for normalising disease mentions against the Medical Subject Headings (MeSH) database. In this work, we highlight Argo’s support for developing various types of bespoke workflows ranging from ones which enabled us to easily incorporate information from various databases, to those which train and apply machine learning-based concept recognition

  17. Contextual cloud-based service oriented architecture for clinical workflow.

    PubMed

    Moreno-Conde, Jesús; Moreno-Conde, Alberto; Núñez-Benjumea, Francisco J; Parra-Calderón, Carlos

    2015-01-01

    Regarding the acceptance of systems within the healthcare domain, multiple papers have highlighted the importance of integrating tools with the clinical workflow. This paper analyses how clinical context management could be deployed in order to promote the adoption of advanced cloud services within the clinical workflow. This deployment can be integrated with the specifications promoted by the eHealth European Interoperability Framework. Throughout this paper, a cloud-based service-oriented architecture is proposed. This architecture implements a context management system aligned with the HL7 standard known as CCOW.

  18. Using Workflows to Explore and Optimise Named Entity Recognition for Chemistry

    PubMed Central

    Kolluru, BalaKrishna; Hawizy, Lezan; Murray-Rust, Peter; Tsujii, Junichi; Ananiadou, Sophia

    2011-01-01

    Chemistry text mining tools should be interoperable and adaptable regardless of system-level implementation, installation or even programming issues. We aim to abstract the functionality of these tools from the underlying implementation via reconfigurable workflows for automatically identifying chemical names. To achieve this, we refactored an established named entity recogniser (in the chemistry domain), OSCAR, and studied the impact of each component on the net performance. We developed two reconfigurable workflows from OSCAR using an interoperable text mining framework, U-Compare. These workflows can be altered using the drag-&-drop mechanism of the graphical user interface of U-Compare. These workflows also provide a platform to study the relationship between text mining components such as tokenisation and named entity recognition (using maximum entropy Markov model (MEMM) and pattern recognition based classifiers). Results indicate that, for chemistry in particular, eliminating noise generated by tokenisation techniques leads to slightly better performance than others, in terms of named entity recognition (NER) accuracy. Poor tokenisation translates into poorer input to the classifier components, which in turn leads to an increase in Type I or Type II errors, thus lowering the overall performance. On the Sciborg corpus, the workflow-based system, which uses a new tokeniser whilst retaining the same MEMM component, increases the F-score from 82.35% to 84.44%. On the PubMed corpus, it recorded an F-score of 84.84% as against 84.23% by OSCAR. PMID:21633495
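
    The workflows above are compared by NER F-score. For reference, the sketch below shows how precision, recall, and F1 relate; the counts are made-up examples, not the paper's data.

      # How reported F-scores follow from true positives, false positives, and
      # false negatives; the counts below are invented for illustration.
      def precision_recall_f1(tp, fp, fn):
          precision = tp / (tp + fp) if tp + fp else 0.0
          recall = tp / (tp + fn) if tp + fn else 0.0
          f1 = (2 * precision * recall / (precision + recall)) if precision + recall else 0.0
          return precision, recall, f1

      p, r, f1 = precision_recall_f1(tp=840, fp=150, fn=160)
      print(f"precision={p:.4f} recall={r:.4f} F1={f1:.4f}")  # F1 comes out near 0.844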

  19. Medication Management: The Macrocognitive Workflow of Older Adults With Heart Failure

    PubMed Central

    2016-01-01

    Background Older adults with chronic disease struggle to manage complex medication regimens. Health information technology has the potential to improve medication management, but only if it is based on a thorough understanding of the complexity of medication management workflow as it occurs in natural settings. Prior research reveals that patient work related to medication management is complex, cognitive, and collaborative. Macrocognitive processes are theorized as how people individually and collaboratively think in complex, adaptive, and messy nonlaboratory settings supported by artifacts. Objective The objective of this research was to describe and analyze the work of medication management by older adults with heart failure, using a macrocognitive workflow framework. Methods We interviewed and observed 61 older patients along with 30 informal caregivers about self-care practices including medication management. Descriptive qualitative content analysis methods were used to develop categories, subcategories, and themes about macrocognitive processes used in medication management workflow. Results We identified 5 high-level macrocognitive processes affecting medication management—sensemaking, planning, coordination, monitoring, and decision making—and 15 subprocesses. Data revealed workflow as occurring in a highly collaborative, fragile system of interacting people, artifacts, time, and space. Process breakdowns were common and patients had little support for macrocognitive workflow from current tools. Conclusions Macrocognitive processes affected medication management performance. Describing and analyzing this performance produced recommendations for technology supporting collaboration and sensemaking, decision making and problem detection, and planning and implementation. PMID:27733331

  20. Using workflows to explore and optimise named entity recognition for chemistry.

    PubMed

    Kolluru, Balakrishna; Hawizy, Lezan; Murray-Rust, Peter; Tsujii, Junichi; Ananiadou, Sophia

    2011-01-01

    Chemistry text mining tools should be interoperable and adaptable regardless of system-level implementation, installation or even programming issues. We aim to abstract the functionality of these tools from the underlying implementation via reconfigurable workflows for automatically identifying chemical names. To achieve this, we refactored an established named entity recogniser (in the chemistry domain), OSCAR, and studied the impact of each component on the net performance. We developed two reconfigurable workflows from OSCAR using an interoperable text mining framework, U-Compare. These workflows can be altered using the drag-&-drop mechanism of the graphical user interface of U-Compare. These workflows also provide a platform to study the relationship between text mining components such as tokenisation and named entity recognition (using maximum entropy Markov model (MEMM) and pattern recognition based classifiers). Results indicate that, for chemistry in particular, eliminating noise generated by tokenisation techniques leads to slightly better performance than others, in terms of named entity recognition (NER) accuracy. Poor tokenisation translates into poorer input to the classifier components, which in turn leads to an increase in Type I or Type II errors, thus lowering the overall performance. On the Sciborg corpus, the workflow-based system, which uses a new tokeniser whilst retaining the same MEMM component, increases the F-score from 82.35% to 84.44%. On the PubMed corpus, it recorded an F-score of 84.84% as against 84.23% by OSCAR.

  1. Describing and Modeling Workflow and Information Flow in Chronic Disease Care

    PubMed Central

    Unertl, Kim M.; Weinger, Matthew B.; Johnson, Kevin B.; Lorenzi, Nancy M.

    2009-01-01

    Objectives The goal of the study was to develop an in-depth understanding of work practices, workflow, and information flow in chronic disease care, to facilitate development of context-appropriate informatics tools. Design The study was conducted over a 10-month period in three ambulatory clinics providing chronic disease care. The authors iteratively collected data using direct observation and semi-structured interviews. Measurements The authors observed all aspects of care in three different chronic disease clinics for over 150 hours, including 157 patient-provider interactions. Observation focused on interactions among people, processes, and technology. Observation data were analyzed through an open coding approach. The authors then developed models of workflow and information flow using Hierarchical Task Analysis and Soft Systems Methodology. The authors also conducted nine semi-structured interviews to confirm and refine the models. Results The study had three primary outcomes: models of workflow for each clinic, models of information flow for each clinic, and an in-depth description of work practices and the role of health information technology (HIT) in the clinics. The authors identified gaps between the existing HIT functionality and the needs of chronic disease providers. Conclusions In response to the analysis of workflow and information flow, the authors developed ten guidelines for design of HIT to support chronic disease care, including recommendations to pursue modular approaches to design that would support disease-specific needs. The study demonstrates the importance of evaluating workflow and information flow in HIT design and implementation. PMID:19717802

  2. Cloud-based bioinformatics workflow platform for large-scale next-generation sequencing analyses.

    PubMed

    Liu, Bo; Madduri, Ravi K; Sotomayor, Borja; Chard, Kyle; Lacinski, Lukasz; Dave, Utpal J; Li, Jianqiang; Liu, Chunchen; Foster, Ian T

    2014-06-01

    The upcoming deluge of genome data has presented significant challenges: storing and processing large-scale genome data, providing easy access to biomedical analysis tools, and enabling efficient data sharing and retrieval. The variability in data volume results in variable computing and storage requirements; therefore, biomedical researchers are pursuing more reliable, dynamic and convenient methods for conducting sequencing analyses. This paper proposes a Cloud-based bioinformatics workflow platform for large-scale next-generation sequencing analyses, which enables reliable and highly scalable execution of sequencing analysis workflows in a fully automated manner. Our platform extends the existing Galaxy workflow system by adding data management capabilities for transferring large quantities of data efficiently and reliably (via Globus Transfer), domain-specific analysis tools preconfigured for immediate use by researchers (via user-specific tools integration), automatic deployment on Cloud for on-demand resource allocation and pay-as-you-go pricing (via Globus Provision), a Cloud provisioning tool for auto-scaling (via HTCondor scheduler), and the support for validating the correctness of workflows (via semantic verification tools). Two bioinformatics workflow use cases as well as a performance evaluation are presented to validate the feasibility of the proposed approach. Copyright © 2014 Elsevier Inc. All rights reserved.

  3. Potential of knowledge discovery using workflows implemented in the C3Grid

    NASA Astrophysics Data System (ADS)

    Engel, Thomas; Fink, Andreas; Ulbrich, Uwe; Schartner, Thomas; Dobler, Andreas; Fritzsch, Bernadette; Hiller, Wolfgang; Bräuer, Benny

    2013-04-01

    With the increasing number of climate simulations, reanalyses and observations, new infrastructures to search and analyse distributed data are necessary. In recent years, the Grid architecture has become an important technology to fulfill these demands. For the German project "Collaborative Climate Community Data and Processing Grid" (C3Grid), computer scientists and meteorologists developed a system that offers its users a web interface to search and download climate data and use implemented analysis tools (called workflows) to further investigate them. In this contribution, two workflows that are implemented in the C3Grid architecture are presented: the Cyclone Tracking (CT) and Stormtrack workflow. They serve as an example of how to perform numerous investigations of midlatitude winter storms on a large amount of analysis and climate model data without insight into the data source or program code and with only a low-to-moderate understanding of the theoretical background. CT is based on the work of Murray and Simmonds (1991) to identify and track local minima in the mean sea level pressure (MSLP) field of the selected dataset. Adjustable thresholds for the curvature of the isobars as well as the minimum lifetime of a cyclone allow the distinction of weak subtropical heat low systems and stronger midlatitude cyclones, e.g., in the Northern Atlantic. The user gets the resulting track data, including statistics about the track density, average central pressure, average central curvature, cyclogenesis and cyclolysis, as well as pre-built visualizations of these results. Stormtrack calculates the 2.5-6 day bandpass-filtered standard deviation of the geopotential height on a selected pressure level. Although this workflow needs much less computational effort compared to CT, it shows structures that are in good agreement with the track density of the CT workflow. To what extent changes in the mid-level tropospheric storm track are reflected in trough density and intensity
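
    The CT workflow identifies and tracks local minima in the MSLP field. As a minimal, hypothetical sketch of the detection step only (not the full Murray and Simmonds scheme, which also applies curvature and lifetime thresholds; the field, neighbourhood size, and pressure cutoff are placeholders), candidate cyclone centres could be located like this.

      # Minimal sketch of cyclone-centre detection as local minima in a mean sea
      # level pressure field; detection only, not the full tracking scheme.
      import numpy as np
      from scipy import ndimage

      mslp = np.load("mslp_hPa.npy")     # 2-D MSLP field (lat x lon), placeholder file

      # A grid point is a candidate centre if it is the minimum of its 5x5
      # neighbourhood and lies below a loose background threshold.
      neighbourhood_min = ndimage.minimum_filter(mslp, size=5, mode="nearest")
      candidates = (mslp == neighbourhood_min) & (mslp < 1010.0)

      for i, j in zip(*np.where(candidates)):
          print(f"candidate cyclone centre at grid point ({i}, {j}), p = {mslp[i, j]:.1f} hPa")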

  4. Genetic Design Automation: engineering fantasy or scientific renewal?

    PubMed Central

    Lux, Matthew W.; Bramlett, Brian W.; Ball, David A.; Peccoud, Jean

    2013-01-01

    Synthetic biology aims to make genetic systems more amenable to engineering, which has naturally led to the development of Computer-Aided Design (CAD) tools. Experimentalists still primarily rely on project-specific ad-hoc workflows instead of domain-specific tools, suggesting that CAD tools are lagging behind the front line of the field. Here, we discuss the scientific hurdles that have limited the productivity gains anticipated from existing tools. We argue that the real value of efforts to develop CAD tools is the formalization of genetic design rules that determine the complex relationships between genotype and phenotype. PMID:22001068

  5. Tackling the "so what" problem in scientific research: a systems-based approach to resource and publication tracking.

    PubMed

    Harris, Paul A; Kirby, Jacqueline; Swafford, Jonathan A; Edwards, Terri L; Zhang, Minhua; Yarbrough, Tonya R; Lane, Lynda D; Helmer, Tara; Bernard, Gordon R; Pulley, Jill M

    2015-08-01

    Peer-reviewed publications are one measure of scientific productivity. From a project, program, or institutional perspective, publication tracking provides the quantitative data necessary to guide the prudent stewardship of federal, foundation, and institutional investments by identifying the scientific return for the types of support provided. In this article, the authors describe the Vanderbilt Institute for Clinical and Translational Research's (VICTR's) development and implementation of a semiautomated process through which publications are automatically detected in PubMed and adjudicated using a "just-in-time" workflow by a known pool of researchers (from Vanderbilt University School of Medicine and Meharry Medical College) who receive support from Vanderbilt's Clinical and Translational Science Award. Since implementation, the authors have (1) seen a marked increase in the number of publications citing VICTR support, (2) captured at a more granular level the relationship between specific resources/services and scientific output, (3) increased awareness of VICTR's scientific portfolio, and (4) increased efficiency in complying with annual National Institutes of Health progress reports. They present the methodological framework and workflow, measures of impact for the first 30 months, and a set of practical lessons learned to inform others considering a systems-based approach for resource and publication tracking. They learned that contacting multiple authors from a single publication can increase the accuracy of the resource attribution process in the case of multidisciplinary scientific projects. They also found that combining positive (e.g., congratulatory e-mails) and negative (e.g., not allowing future resource requests until adjudication is complete) triggers can increase compliance with publication attribution requests.
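
    The VICTR workflow automatically detects candidate publications in PubMed before routing them to adjudication. As a hedged sketch of such a detection query (Biopython's Entrez module is used as a stand-in; the search term and email are placeholders, and this is not VICTR's actual implementation), one might do the following.

      # Hedged sketch of automatic publication detection in PubMed via NCBI
      # E-utilities; search term and email are placeholders.
      from Bio import Entrez

      Entrez.email = "curator@example.org"  # NCBI requires a contact address

      query = '"Vanderbilt Institute for Clinical and Translational Research"[Affiliation]'
      handle = Entrez.esearch(db="pubmed", term=query, retmax=50)
      record = Entrez.read(handle)
      handle.close()

      print(f"{record['Count']} candidate publications found")
      for pmid in record["IdList"]:
          print("PMID:", pmid)  # each PMID would then enter the adjudication workflow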

  6. Development of a user customizable imaging informatics-based intelligent workflow engine system to enhance rehabilitation clinical trials

    NASA Astrophysics Data System (ADS)

    Wang, Ximing; Martinez, Clarisa; Wang, Jing; Liu, Ye; Liu, Brent

    2014-03-01

    Clinical trials usually need to collect, track, and analyze multimedia data according to the workflow. Currently, the clinical trial data management requirements are normally addressed with custom-built systems. Challenges occur in the workflow design within different trials. The traditional pre-defined custom-built system is usually limited to a specific clinical trial and normally requires time-consuming and resource-intensive software development. To provide a solution, we present a user-customizable, imaging informatics-based intelligent workflow engine system for managing stroke rehabilitation clinical trials with intelligent workflow. The intelligent workflow engine provides flexibility in building and tailoring the workflow in various stages of clinical trials. By providing a solution to tailor and automate the workflow, the system will save time and reduce errors for clinical trials. Although our system is designed for clinical trials for rehabilitation, it may be extended to other imaging-based clinical trials as well.

  7. Differentiated protection services with failure probability guarantee for workflow-based applications

    NASA Astrophysics Data System (ADS)

    Zhong, Yaoquan; Guo, Wei; Jin, Yaohui; Sun, Weiqiang; Hu, Weisheng

    2010-12-01

    A cost-effective and service-differentiated provisioning strategy is very desirable to service providers so that they can offer users satisfactory services, while optimizing network resource allocation. Providing differentiated protection services to connections for surviving link failure has been extensively studied in recent years. However, the differentiated protection services for workflow-based applications, which consist of many interdependent tasks, have scarcely been studied. This paper investigates the problem of providing differentiated services for workflow-based applications in optical grid. In this paper, we develop three differentiated protection services provisioning strategies which can provide security level guarantee and network-resource optimization for workflow-based applications. The simulation demonstrates that these heuristic algorithms provide protection cost-effectively while satisfying the applications' failure probability requirements.
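
    The strategies above guarantee a failure probability for workflow-based applications. As a small worked sketch of how an end-to-end failure probability follows from per-link failure probabilities (assuming independent link failures, which is an assumption made here for illustration and not necessarily the paper's model), the decision to add protection could look like this.

      # Worked sketch: end-to-end failure probability of a workflow whose tasks
      # traverse a chain of links, assuming independent link failures.
      def workflow_failure_probability(link_failure_probs):
          survive = 1.0
          for p in link_failure_probs:
              survive *= (1.0 - p)          # all links must survive
          return 1.0 - survive

      links = [0.001, 0.002, 0.0005]        # hypothetical per-link failure probabilities
      p_fail = workflow_failure_probability(links)
      print(f"unprotected failure probability: {p_fail:.4%}")

      # A differentiated service would add protection (e.g., a backup path) only
      # when p_fail exceeds the application's requested guarantee.
      requirement = 0.002
      print("needs protection:", p_fail > requirement)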

  8. Automated quality control in a file-based broadcasting workflow

    NASA Astrophysics Data System (ADS)

    Zhang, Lina

    2014-04-01

    Benefiting from the development of information and internet technologies, television broadcasting is transforming from inefficient tape-based production and distribution to integrated file-based workflows. However, no matter how many changes have taken place, successful broadcasting still depends on the ability to deliver a consistent, high-quality signal to audiences. After the transition from tape to file, traditional methods of manual quality control (QC) become inadequate, subjective, and inefficient. Based on China Central Television's full file-based workflow at its new site, this paper introduces an automated quality control test system for accurate detection of hidden problems in media content. It discusses the system framework and workflow control when the automated QC is added. It puts forward a QC criterion and presents QC software that follows this criterion. It also reports experiments on QC speed using parallel processing and distributed computing. The performance of the test system shows that the adoption of automated QC can make production effective and efficient, and help the station to achieve a competitive advantage in the media market.
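
    The speed-up reported above comes from parallelising the QC checks. As a generic, hypothetical sketch of that pattern (the file names and checks are placeholders; real broadcast QC decodes and inspects the audio/video essence rather than file metadata), per-file checks can be fanned out over a process pool.

      # Generic sketch of parallelised file-based QC: each media file gets a set
      # of cheap checks in a worker process. Checks and paths are placeholders.
      import os
      from multiprocessing import Pool

      def qc_check(path):
          issues = []
          if not os.path.exists(path):
              issues.append("missing file")
          elif os.path.getsize(path) == 0:
              issues.append("zero-byte file")
          elif not path.lower().endswith((".mxf", ".mov", ".mp4")):
              issues.append("unexpected container")
          return path, issues

      if __name__ == "__main__":
          files = ["news_0800.mxf", "promo_12.mov", "corrupt_clip.mp4"]
          with Pool(processes=4) as pool:
              for path, issues in pool.map(qc_check, files):
                  status = "OK" if not issues else "; ".join(issues)
                  print(f"{path}: {status}")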

  9. IT-benchmarking of clinical workflows: concept, implementation, and evaluation.

    PubMed

    Thye, Johannes; Straede, Matthias-Christopher; Liebe, Jan-David; Hübner, Ursula

    2014-01-01

    Due to the emerging evidence of health IT as both an opportunity and a risk for clinical workflows, health IT must undergo continuous measurement of its efficacy and efficiency. IT-benchmarks are a proven means for providing this information. The aim of this study was to enhance the methodology of an existing benchmarking procedure by including, in particular, new indicators of clinical workflows and by proposing new types of visualisation. Drawing on the concept of information logistics, we propose four workflow descriptors that were applied to four clinical processes. General and specific indicators were derived from these descriptors and processes. A total of 199 chief information officers (CIOs) took part in the benchmarking. Their hospitals were assigned to reference groups of similar size and ownership from a total of 259 hospitals. Stepwise and comprehensive feedback was given to the CIOs. Most participants who evaluated the benchmark rated the procedure as very good, good, or rather good (98.4%). Benchmark information was used by CIOs for getting a general overview, advancing IT, preparing negotiations with board members, and arguing for a new IT project.

  10. Wireless remote control clinical image workflow: utilizing a PDA for offsite distribution

    NASA Astrophysics Data System (ADS)

    Liu, Brent J.; Documet, Luis; Documet, Jorge; Huang, H. K.; Muldoon, Jean

    2004-04-01

    Last year we presented at RSNA an application to perform wireless remote control of PACS image distribution utilizing a handheld device such as a Personal Digital Assistant (PDA). This paper describes the clinical experiences, including workflow scenarios, of implementing the PDA application to route exams from the clinical PACS archive server to various locations for offsite distribution of clinical PACS exams. By utilizing this remote control application, radiologists can manage image workflow distribution with a single wireless handheld device without impacting their clinical workflow on diagnostic PACS workstations. A PDA application was designed and developed to perform DICOM Query and C-Move requests by a physician from a clinical PACS Archive to a CD-burning device for automatic burning of PACS data for distribution offsite. In addition, it was also used for convenient routing of historical PACS exams to the local web server, local workstations, and teleradiology systems. The application was evaluated by radiologists as well as other clinical staff who need to distribute PACS exams to offsite referring physicians' offices and offsite radiologists. An application for image workflow management utilizing wireless technology was implemented in a clinical environment and evaluated. A PDA application was successfully utilized to perform DICOM Query and C-Move requests from the clinical PACS archive to various offsite exam distribution devices. Clinical staff can utilize the PDA to manage image workflow and PACS exam distribution conveniently for offsite consultations by referring physicians and radiologists. This solution allows radiologists to expand their effectiveness in health care delivery both within the radiology department and offsite by improving their clinical workflow.

  11. Emergency Medicine Resident Physicians’ Perceptions of Electronic Documentation and Workflow

    PubMed Central

    Neri, P.M.; Redden, L.; Poole, S.; Pozner, C.N.; Horsky, J.; Raja, A.S.; Poon, E.; Schiff, G.

    2015-01-01

    Summary Objective To understand emergency department (ED) physicians’ use of electronic documentation in order to identify usability and workflow considerations for the design of future ED information system (EDIS) physician documentation modules. Methods We invited emergency medicine resident physicians to participate in a mixed methods study using task analysis and qualitative interviews. Participants completed a simulated, standardized patient encounter in a medical simulation center while documenting in the test environment of a currently used EDIS. We recorded the time on task, type and sequence of tasks performed by the participants (including tasks performed in parallel). We then conducted semi-structured interviews with each participant. We analyzed these qualitative data using the constant comparative method to generate themes. Results Eight resident physicians participated. The simulation session averaged 17 minutes and participants spent 11 minutes on average on tasks that included electronic documentation. Participants performed tasks in parallel, such as history taking and electronic documentation. Five of the 8 participants performed a similar workflow sequence during the first part of the session while the remaining three used different workflows. Three themes characterize electronic documentation: (1) physicians report that location and timing of documentation varies based on patient acuity and workload, (2) physicians report a need for features that support improved efficiency; and (3) physicians like viewing available patient data but struggle with integration of the EDIS with other information sources. Conclusion We confirmed that physicians spend much of their time on documentation (65%) during an ED patient visit. Further, we found that resident physicians did not all use the same workflow and approach even when presented with an identical standardized patient scenario. Future EHR design should consider these varied workflows while trying to

  12. Routine Digital Pathology Workflow: The Catania Experience

    PubMed Central

    Fraggetta, Filippo; Garozzo, Salvatore; Zannoni, Gian Franco; Pantanowitz, Liron; Rossi, Esther Diana

    2017-01-01

    Introduction: Successful implementation of whole slide imaging (WSI) for routine clinical practice has been accomplished in only a few pathology laboratories worldwide. We report the transition to an effective and complete digital surgical pathology workflow in the pathology laboratory at Cannizzaro Hospital in Catania, Italy. Methods: All (100%) permanent histopathology glass slides were digitized at ×20 using Aperio AT2 scanners. Compatible stain and scanning slide racks were employed to streamline operations. eSlide Manager software was bidirectionally interfaced with the anatomic pathology laboratory information system. Virtual slide trays connected to the two-dimensional (2D) barcode tracking system allowed pathologists to confirm that they were correctly assigned slides and that all tissues on these glass slides were scanned. Results: Over 115,000 glass slides were digitized with a scan fail rate of around 1%. Drying glass slides before scanning minimized their sticking to scanner racks. Implementation required introduction of a 2D barcode tracking system and modification of histology workflow processes. Conclusion: Our experience indicates that effective adoption of WSI for primary diagnostic use was more dependent on optimizing preimaging variables and integration with the laboratory information system than on information technology infrastructure and ensuring pathologist buy-in. Implementation of digital pathology for routine practice not only leveraged the benefits of digital imaging but also created an opportunity for establishing standardization of workflow processes in the pathology laboratory. PMID:29416914

  13. Implementation of workflow engine technology to deliver basic clinical decision support functionality

    PubMed Central

    2011-01-01

    Background Workflow engine technology represents a new class of software with the ability to graphically model step-based knowledge. We present application of this novel technology to the domain of clinical decision support. Successful implementation of decision support within an electronic health record (EHR) remains an unsolved research challenge. Previous research efforts were mostly based on healthcare-specific representation standards and execution engines and did not reach wide adoption. We focus on two challenges in decision support systems: the ability to test decision logic on retrospective data prior to prospective deployment and the challenge of user-friendly representation of clinical logic. Results We present our implementation of a workflow engine technology that addresses the two above-described challenges in delivering clinical decision support. Our system is based on a cross-industry standard of XML (extensible markup language) process definition language (XPDL). The core components of the system are a workflow editor for modeling clinical scenarios and a workflow engine for execution of those scenarios. We demonstrate, with an open-source and publicly available workflow suite, that clinical decision support logic can be executed on retrospective data. The same flowchart-based representation can also function in a prospective mode where the system can be integrated with an EHR system and respond to real-time clinical events. We limit the scope of our implementation to decision support content generation (which can be EHR system vendor independent). We do not focus on supporting complex decision support content delivery mechanisms due to lack of standardization of EHR systems in this area. We present results of our evaluation of the flowchart-based graphical notation as well as architectural evaluation of our implementation using an established evaluation framework for clinical decision support architecture. Conclusions We describe an implementation of
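
    The system above models clinical scenarios in XPDL. As a hedged sketch of reading such a process definition with the standard library (namespaces are wildcarded because XPDL versions differ, and the element and attribute names follow common XPDL usage rather than the authors' own files; the file name is a placeholder), activities and transitions could be listed like this.

      # Hedged sketch: list the activities and transitions of an XPDL process
      # definition. Element/attribute names should be verified against the
      # XPDL schema actually used; this is not the authors' code.
      import xml.etree.ElementTree as ET

      tree = ET.parse("decision_support_scenario.xpdl")   # placeholder file name
      root = tree.getroot()

      for activity in root.findall(".//{*}Activity"):
          print("activity:", activity.get("Id"), "-", activity.get("Name"))

      for transition in root.findall(".//{*}Transition"):
          print("transition:", transition.get("From"), "->", transition.get("To"))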

  14. Implementation of workflow engine technology to deliver basic clinical decision support functionality.

    PubMed

    Huser, Vojtech; Rasmussen, Luke V; Oberg, Ryan; Starren, Justin B

    2011-04-10

    Workflow engine technology represents a new class of software with the ability to graphically model step-based knowledge. We present application of this novel technology to the domain of clinical decision support. Successful implementation of decision support within an electronic health record (EHR) remains an unsolved research challenge. Previous research efforts were mostly based on healthcare-specific representation standards and execution engines and did not reach wide adoption. We focus on two challenges in decision support systems: the ability to test decision logic on retrospective data prior to prospective deployment and the challenge of user-friendly representation of clinical logic. We present our implementation of a workflow engine technology that addresses the two above-described challenges in delivering clinical decision support. Our system is based on a cross-industry standard of XML (extensible markup language) process definition language (XPDL). The core components of the system are a workflow editor for modeling clinical scenarios and a workflow engine for execution of those scenarios. We demonstrate, with an open-source and publicly available workflow suite, that clinical decision support logic can be executed on retrospective data. The same flowchart-based representation can also function in a prospective mode where the system can be integrated with an EHR system and respond to real-time clinical events. We limit the scope of our implementation to decision support content generation (which can be EHR system vendor independent). We do not focus on supporting complex decision support content delivery mechanisms due to lack of standardization of EHR systems in this area. We present results of our evaluation of the flowchart-based graphical notation as well as architectural evaluation of our implementation using an established evaluation framework for clinical decision support architecture. We describe an implementation of a free workflow technology

  15. Automated Finite State Workflow for Distributed Data Production

    NASA Astrophysics Data System (ADS)

    Hajdu, L.; Didenko, L.; Lauret, J.; Amol, J.; Betts, W.; Jang, H. J.; Noh, S. Y.

    2016-10-01

    In statistically hungry science domains, data deluges can be both a blessing and a curse. They allow the narrowing of statistical errors from known measurements, and open the door to new scientific opportunities as research programs mature. They are also a testament to the efficiency of experimental operations. However, growing data samples may need to be processed with little or no opportunity for huge increases in computing capacity. A standard strategy has thus been to share resources across multiple experiments at a given facility. Another has been to use middleware that “glues” resources across the world so they are able to locally run the experimental software stack (either natively or virtually). We describe a framework STAR has successfully used to reconstruct a ~400 TB dataset consisting of over 100,000 jobs submitted to a remote site in Korea from STAR's Tier 0 facility at the Brookhaven National Laboratory. The framework automates the full workflow, taking raw data files from tape and writing Physics-ready output back to tape without operator or remote site intervention. Through hardening we have demonstrated 97(±2)% efficiency, over a period of 7 months of operation. The high efficiency is attributed to finite state checking with retries to encourage resilience in the system over capricious and fallible infrastructure.
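
    The high efficiency reported above is attributed to finite-state checking with retries. As a toy sketch of that idea (the states, job names, and retry limit are invented for illustration and are not STAR's production code), a per-job state machine with resubmission could look like this.

      # Toy sketch of finite-state job handling with retries, the mechanism
      # credited for the framework's resilience over fallible infrastructure.
      import random

      MAX_RETRIES = 3

      def attempt(job):
          """Pretend to run one processing job; fail roughly 20% of the time."""
          return random.random() > 0.2

      def process(job):
          retries = 0
          state = "staged"
          while state not in ("done", "failed"):
              state = "running"
              if attempt(job):
                  state = "done"
              elif retries < MAX_RETRIES:
                  retries += 1
                  state = "submitted"        # resubmit and try again
              else:
                  state = "failed"
          return state, retries

      for job in ["raw_file_0001.daq", "raw_file_0002.daq"]:
          print(job, *process(job))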

  16. Taverna: a tool for building and running workflows of services

    PubMed Central

    Hull, Duncan; Wolstencroft, Katy; Stevens, Robert; Goble, Carole; Pocock, Mathew R.; Li, Peter; Oinn, Tom

    2006-01-01

    Taverna is an application that eases the use and integration of the growing number of molecular biology tools and databases available on the web, especially web services. It allows bioinformaticians to construct workflows or pipelines of services to perform a range of different analyses, such as sequence analysis and genome annotation. These high-level workflows can integrate many different resources into a single analysis. Taverna is available freely under the terms of the GNU Lesser General Public License (LGPL) from . PMID:16845108

  17. A data management and publication workflow for a large-scale, heterogeneous sensor network.

    PubMed

    Jones, Amber Spackman; Horsburgh, Jeffery S; Reeder, Stephanie L; Ramírez, Maurier; Caraballo, Juan

    2015-06-01

    It is common for hydrology researchers to collect data using in situ sensors at high frequencies, for extended durations, and with spatial distributions that produce data volumes requiring infrastructure for data storage, management, and sharing. The availability and utility of these data in addressing scientific questions related to water availability, water quality, and natural disasters relies on effective cyberinfrastructure that facilitates transformation of raw sensor data into usable data products. It also depends on the ability of researchers to share and access the data in useable formats. In this paper, we describe a data management and publication workflow and software tools for research groups and sites conducting long-term monitoring using in situ sensors. Functionality includes the ability to track monitoring equipment inventory and events related to field maintenance. Linking this information to the observational data is imperative in ensuring the quality of sensor-based data products. We present these tools in the context of a case study for the innovative Urban Transitions and Aridregion Hydrosustainability (iUTAH) sensor network. The iUTAH monitoring network includes sensors at aquatic and terrestrial sites for continuous monitoring of common meteorological variables, snow accumulation and melt, soil moisture, surface water flow, and surface water quality. We present the overall workflow we have developed for effectively transferring data from field monitoring sites to ultimate end-users and describe the software tools we have deployed for storing, managing, and sharing the sensor data. These tools are all open source and available for others to use.
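
    Transforming raw sensor data into usable data products typically starts with simple automated quality checks. As a generic, hypothetical sketch of such a first-pass QC step (column names, thresholds, and the flat-line rule are placeholders, not the iUTAH network's actual QC rules), out-of-range and flat-lined values could be flagged with pandas.

      # Generic sketch of first-pass sensor QC: flag out-of-range and flat-lined
      # water-temperature readings. Schema and thresholds are placeholders.
      import pandas as pd

      df = pd.read_csv("site01_water_temperature.csv", parse_dates=["timestamp"])

      df["qc_flag"] = "ok"
      out_of_range = (df["value"] < -1.0) | (df["value"] > 40.0)          # degrees C
      flatlined = df["value"].diff().abs().rolling(window=12).sum() == 0  # no change over 12 readings

      df.loc[out_of_range, "qc_flag"] = "out_of_range"
      df.loc[flatlined, "qc_flag"] = "flatlined"

      print(df["qc_flag"].value_counts())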

  18. Using EHR audit trail logs to analyze clinical workflow: A case study from community-based ambulatory clinics.

    PubMed

    Wu, Danny T Y; Smart, Nikolas; Ciemins, Elizabeth L; Lanham, Holly J; Lindberg, Curt; Zheng, Kai

    2017-01-01

    To develop a workflow-supported clinical documentation system, understanding clinical workflow is a critical first step. While time and motion studies have been regarded as the gold standard of workflow analysis, this method can be resource-consuming and its data may be biased due to the cognitive limitations of human observers. In this study, we aimed to evaluate the feasibility and validity of using EHR audit trail logs to analyze clinical workflow. Specifically, we compared three known workflow changes from our previous study with the corresponding EHR audit trail logs of the study participants. The results showed that EHR audit trail logs can be a valid source for clinical workflow analysis, and can provide an objective view of clinicians' behaviors, multi-dimensional comparisons, and a highly extensible analysis framework.
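
    Audit-trail-based workflow analysis derives measures from timestamped EHR events. As a generic sketch of one such derivation (the log schema is hypothetical, not the study's data), time spent per clinician per EHR module can be estimated with pandas by attributing the gap before each event to the previous event's module.

      # Generic sketch (hypothetical log schema): estimate time spent per user
      # per EHR module from timestamped audit-trail events.
      import pandas as pd

      logs = pd.read_csv("ehr_audit_trail.csv", parse_dates=["timestamp"])
      # assumed columns: user_id, timestamp, module (e.g. "notes", "orders", "results")

      logs = logs.sort_values(["user_id", "timestamp"])
      logs["next_timestamp"] = logs.groupby("user_id")["timestamp"].shift(-1)
      logs["duration_min"] = (logs["next_timestamp"] - logs["timestamp"]).dt.total_seconds() / 60

      # Cap long gaps (e.g. breaks between patients) so they do not dominate totals.
      logs["duration_min"] = logs["duration_min"].clip(upper=15)

      time_per_module = logs.groupby(["user_id", "module"])["duration_min"].sum()
      print(time_per_module.round(1))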

  19. Performing statistical analyses on quantitative data in Taverna workflows: an example using R and maxdBrowse to identify differentially-expressed genes from microarray data.

    PubMed

    Li, Peter; Castrillo, Juan I; Velarde, Giles; Wassink, Ingo; Soiland-Reyes, Stian; Owen, Stuart; Withers, David; Oinn, Tom; Pocock, Matthew R; Goble, Carole A; Oliver, Stephen G; Kell, Douglas B

    2008-08-07

    experts as a generic tool for composing ad hoc analyses of quantitative data by combining the use of scripts written in the R programming language with tools exposed as services in workflows. When these workflows are shared with colleagues and the wider scientific community, they provide an approach for other scientists wanting to use tools such as R without having to learn the corresponding programming language to analyse their own data.

  20. Performing statistical analyses on quantitative data in Taverna workflows: An example using R and maxdBrowse to identify differentially-expressed genes from microarray data

    PubMed Central

    Li, Peter; Castrillo, Juan I; Velarde, Giles; Wassink, Ingo; Soiland-Reyes, Stian; Owen, Stuart; Withers, David; Oinn, Tom; Pocock, Matthew R; Goble, Carole A; Oliver, Stephen G; Kell, Douglas B

    2008-01-01

    be used by data analysis experts as a generic tool for composing ad hoc analyses of quantitative data by combining the use of scripts written in the R programming language with tools exposed as services in workflows. When these workflows are shared with colleagues and the wider scientific community, they provide an approach for other scientists wanting to use tools such as R without having to learn the corresponding programming language to analyse their own data. PMID:18687127

  1. Workflow technology: the new frontier. How to overcome the barriers and join the future.

    PubMed

    Shefter, Susan M

    2006-01-01

    Hospitals are catching up to the business world in the introduction of technology systems that support professional practice and workflow. The field of case management is highly complex and interrelates with diverse groups in diverse locations. The last few years have seen the introduction of Workflow Technology Tools, which can improve the quality and efficiency of discharge planning by the case manager. Despite the availability of these wonderful new programs, many case managers are hesitant to adopt the new technology and workflow. For a myriad of reasons, a computer-based workflow system can seem like a brick wall. This article discusses, from a practitioner's point of view, how professionals can gain confidence and skill to get around the brick wall and join the future.

  2. DNA barcode-based delineation of putative species: efficient start for taxonomic workflows

    PubMed Central

    Kekkonen, Mari; Hebert, Paul D N

    2014-01-01

    The analysis of DNA barcode sequences with varying techniques for cluster recognition provides an efficient approach for recognizing putative species (operational taxonomic units, OTUs). This approach accelerates and improves taxonomic workflows by exposing cryptic species and decreasing the risk of synonymy. This study tested the congruence of OTUs resulting from the application of three analytical methods (ABGD, BIN, GMYC) to sequence data for Australian hypertrophine moths. OTUs supported by all three approaches were viewed as robust, but 20% of the OTUs were only recognized by one or two of the methods. These OTUs were examined for three criteria to clarify their status. Monophyly and diagnostic nucleotides were both uninformative, but information on ranges was useful as sympatric sister OTUs were viewed as distinct, while allopatric OTUs were merged. This approach revealed 124 OTUs of Hypertrophinae, a more than twofold increase from the currently recognized 51 species. Because this analytical protocol is both fast and repeatable, it provides a valuable tool for establishing a basic understanding of species boundaries that can be validated with subsequent studies. PMID:24479435
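
    The congruence check at the heart of this protocol can be illustrated with a short Python sketch; the specimen-to-OTU assignments below are invented placeholders rather than the hypertrophine data, and real assignments would be parsed from the ABGD, BIN, and GMYC outputs.

        # Illustrative sketch of checking OTU congruence across delimitation methods.
        abgd = {"sp1": "A1", "sp2": "A1", "sp3": "A2", "sp4": "A3"}
        bin_ = {"sp1": "B1", "sp2": "B1", "sp3": "B2", "sp4": "B2"}
        gmyc = {"sp1": "G1", "sp2": "G1", "sp3": "G2", "sp4": "G3"}

        def partitions(assignments):
            """Group specimens into frozensets, one per OTU, ignoring OTU labels."""
            groups = {}
            for specimen, otu in assignments.items():
                groups.setdefault(otu, set()).add(specimen)
            return {frozenset(members) for members in groups.values()}

        methods = [partitions(m) for m in (abgd, bin_, gmyc)]
        all_otus = set().union(*methods)

        # An OTU is "robust" if the identical set of specimens is recovered by all
        # methods; the rest would be examined for monophyly, diagnostic sites, ranges.
        robust = set.intersection(*methods)
        conflicting = all_otus - robust
        print(f"{len(robust)} OTUs supported by all methods, {len(conflicting)} need review")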

  3. Health information exchange technology on the front lines of healthcare: workflow factors and patterns of use

    PubMed Central

    Johnson, Kevin B; Lorenzi, Nancy M

    2011-01-01

    Objective The goal of this study was to develop an in-depth understanding of how a health information exchange (HIE) fits into clinical workflow at multiple clinical sites. Materials and Methods The ethnographic qualitative study was conducted over a 9-month period in six emergency departments (ED) and eight ambulatory clinics in Memphis, Tennessee, USA. Data were collected using direct observation, informal interviews during observation, and formal semi-structured interviews. The authors observed for over 180 h, during which providers used the exchange 130 times. Results HIE-related workflow was modeled for each ED site and ambulatory clinic group and substantial site-to-site workflow differences were identified. Common patterns in HIE-related workflow were also identified across all sites, leading to the development of two role-based workflow models: nurse based and physician based. The workflow elements framework was applied to the two role-based patterns. An in-depth description was developed of how providers integrated HIE into existing clinical workflow, including prompts for HIE use. Discussion Workflow differed substantially among sites, but two general role-based HIE usage models were identified. Although providers used HIE to improve continuity of patient care, patient–provider trust played a significant role. Types of information retrieved related to roles, with nurses seeking to retrieve recent hospitalization data and more open-ended usage by nurse practitioners and physicians. User and role-specific customization to accommodate differences in workflow and information needs may increase the adoption and use of HIE. Conclusion Understanding end users' perspectives towards HIE technology is crucial to the long-term success of HIE. By applying qualitative methods, an in-depth understanding of HIE usage was developed. PMID:22003156

  4. Workflow-Based Software Development Environment

    NASA Technical Reports Server (NTRS)

    Izygon, Michel E.

    2013-01-01

    The Software Developer's Assistant (SDA) helps software teams more efficiently and accurately conduct or execute software processes associated with NASA mission-critical software. SDA is a process enactment platform that guides software teams through project-specific standards, processes, and procedures. Software projects are decomposed into all of their required process steps or tasks, and each task is assigned to project personnel. SDA orchestrates the performance of work required to complete all process tasks in the correct sequence. The software then notifies team members when they may begin work on their assigned tasks and provides the tools, instructions, reference materials, and supportive artifacts that allow users to compliantly perform the work. A combination of technology components captures and enacts any software process used to support the software lifecycle. It creates an adaptive workflow environment that can be modified as needed. SDA achieves software process automation through a Business Process Management (BPM) approach to managing the software lifecycle for mission-critical projects. It contains five main parts: TieFlow (workflow engine), Business Rules (rules to alter process flow), Common Repository (storage for project artifacts, versions, history, schedules, etc.), SOA (interface to allow internal, GFE, or COTS tools integration), and the Web Portal Interface (collaborative web environment).

  5. chemalot and chemalot_knime: Command line programs as workflow tools for drug discovery.

    PubMed

    Lee, Man-Ling; Aliagas, Ignacio; Feng, Jianwen A; Gabriel, Thomas; O'Donnell, T J; Sellers, Benjamin D; Wiswedel, Bernd; Gobbi, Alberto

    2017-06-12

    Analyzing files containing chemical information is at the core of cheminformatics. Each analysis may require a unique workflow. This paper describes the chemalot and chemalot_knime open source packages. Chemalot is a set of command line programs with a wide range of functionalities for cheminformatics. The chemalot_knime package allows command line programs that read SD files from stdin and write them to stdout to be wrapped into KNIME nodes. The combination of chemalot and chemalot_knime not only facilitates the compilation and maintenance of sequences of command line programs but also allows KNIME workflows to take advantage of the compute power of a Linux cluster. Use of the command line programs is demonstrated in three different workflow examples: (1) a workflow to create a data file with project-relevant data for structure-activity or property analysis and other types of investigation, (2) the creation of a quantitative structure-property-relationship model using the command line programs via KNIME nodes, and (3) the analysis of strain energy in small molecule ligand conformations from the Protein Data Bank database. The chemalot and chemalot_knime packages provide lightweight and powerful tools for many tasks in cheminformatics. They are easily integrated with other open source and commercial command line tools and can be combined to build new and even more powerful tools. The chemalot_knime package facilitates the generation and maintenance of user-defined command line workflows, taking advantage of the graphical design capabilities in KNIME. Graphical abstract: Example KNIME workflow with chemalot nodes and the corresponding command line pipe.

  6. Clinic Workflow Simulations using Secondary EHR Data

    PubMed Central

    Hribar, Michelle R.; Biermann, David; Read-Brown, Sarah; Reznick, Leah; Lombardi, Lorinna; Parikh, Mansi; Chamberlain, Winston; Yackel, Thomas R.; Chiang, Michael F.

    2016-01-01

    Clinicians today face increased patient loads, decreased reimbursements and potential negative productivity impacts of using electronic health records (EHR), but have little guidance on how to improve clinic efficiency. Discrete event simulation models are powerful tools for evaluating clinical workflow and improving efficiency, particularly when they are built from secondary EHR timing data. The purpose of this study is to demonstrate that these simulation models can be used for resource allocation decision making as well as for evaluating novel scheduling strategies in outpatient ophthalmology clinics. Key findings from this study are that: 1) secondary use of EHR timestamp data in simulation models represents clinic workflow, 2) simulations provide insight into the best allocation of resources in a clinic, 3) simulations provide critical information for schedule creation and decision making by clinic managers, and 4) simulation models built from EHR data are potentially generalizable. PMID:28269861
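
    The core of such a discrete event simulation can be sketched in a few lines of Python; the arrival and exam-time distributions below are assumed placeholders, whereas in the study they would be estimated from EHR timestamp data.

        # Minimal discrete-event sketch of an outpatient clinic with two providers.
        import heapq
        import random

        random.seed(0)
        N_PATIENTS, N_PROVIDERS = 40, 2
        ARRIVAL_MEAN, EXAM_MEAN = 6.0, 11.0   # minutes (assumed, not EHR-derived)

        arrivals, t = [], 0.0
        for _ in range(N_PATIENTS):
            t += random.expovariate(1.0 / ARRIVAL_MEAN)
            arrivals.append(t)

        free_at = [0.0] * N_PROVIDERS          # min-heap of times providers become free
        heapq.heapify(free_at)
        waits = []
        for arrival in arrivals:
            provider_free = heapq.heappop(free_at)
            start = max(arrival, provider_free)
            waits.append(start - arrival)
            exam = random.expovariate(1.0 / EXAM_MEAN)
            heapq.heappush(free_at, start + exam)

        print(f"mean wait: {sum(waits)/len(waits):.1f} min, max wait: {max(waits):.1f} min")

    Swapping the assumed distributions for empirical ones built from EHR timestamps, and varying the number of providers or the schedule, is how such a model supports resource-allocation decisions.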

  7. Computer vision and machine learning for robust phenotyping in genome-wide studies

    PubMed Central

    Zhang, Jiaoping; Naik, Hsiang Sing; Assefa, Teshale; Sarkar, Soumik; Reddy, R. V. Chowda; Singh, Arti; Ganapathysubramanian, Baskar; Singh, Asheesh K.

    2017-01-01

    Traditional evaluation of crop biotic and abiotic stresses is time-consuming and labor-intensive, limiting the ability to dissect the genetic basis of quantitative traits. A machine learning (ML)-enabled image-phenotyping pipeline for the genetic studies of abiotic stress iron deficiency chlorosis (IDC) of soybean is reported. IDC classification and severity for an association panel of 461 diverse plant-introduction accessions was evaluated using an end-to-end phenotyping workflow. The workflow consisted of a multi-stage procedure including: (1) optimized protocols for consistent image capture across plant canopies, (2) canopy identification and registration from cluttered backgrounds, (3) extraction of domain expert informed features from the processed images to accurately represent IDC expression, and (4) supervised ML-based classifiers that linked the automatically extracted features with expert-rating equivalent IDC scores. ML-generated phenotypic data were subsequently utilized for the genome-wide association study and genomic prediction. The results illustrate the reliability and advantage of the ML-enabled image-phenotyping pipeline by identifying a previously reported locus and a novel locus harboring a gene homolog involved in iron acquisition. This study demonstrates a promising path for integrating the phenotyping pipeline into genomic prediction, and provides a systematic framework enabling robust and quicker phenotyping through ground-based systems.
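
    The final supervised step of such a pipeline, linking extracted image features to expert-equivalent severity scores, might look like the following sketch; the feature matrix and IDC labels are synthetic stand-ins rather than the soybean data.

        # Sketch of mapping automatically extracted canopy features to IDC scores.
        import numpy as np
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.model_selection import cross_val_score

        rng = np.random.default_rng(42)
        n_plots, n_features = 400, 12             # e.g. color, texture, shape descriptors
        X = rng.normal(size=(n_plots, n_features))
        y = rng.integers(1, 6, size=n_plots)      # IDC severity scores 1-5 (simulated)

        clf = RandomForestClassifier(n_estimators=200, random_state=0)
        scores = cross_val_score(clf, X, y, cv=5)
        print(f"cross-validated accuracy: {scores.mean():.2f} +/- {scores.std():.2f}")

        # The fitted classifier would then score new images, and the predicted
        # phenotypes feed the genome-wide association and genomic prediction steps.
        clf.fit(X, y)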

  8. An automated analysis workflow for optimization of force-field parameters using neutron scattering data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lynch, Vickie E.; Borreguero, Jose M.; Bhowmik, Debsindhu

    Graphical abstract: - Highlights: • An automated workflow to optimize force-field parameters. • Used the workflow to optimize force-field parameter for a system containing nanodiamond and tRNA. • The mechanism relies on molecular dynamics simulation and neutron scattering experimental data. • The workflow can be generalized to any other experimental and simulation techniques. - Abstract: Large-scale simulations and data analysis are often required to explain neutron scattering experiments to establish a connection between the fundamental physics at the nanoscale and data probed by neutrons. However, to perform simulations at experimental conditions it is critical to use correct force-field (FF) parameters which are unfortunately not available for most complex experimental systems. In this work, we have developed a workflow optimization technique to provide optimized FF parameters by comparing molecular dynamics (MD) to neutron scattering data. We describe the workflow in detail by using an example system consisting of tRNA and hydrophilic nanodiamonds in a deuterated water (D2O) environment. Quasi-elastic neutron scattering (QENS) data show a faster motion of the tRNA in the presence of nanodiamond than without the ND. To compare the QENS and MD results quantitatively, a proper choice of FF parameters is necessary. We use an efficient workflow to optimize the FF parameters between the hydrophilic nanodiamond and water by comparing to the QENS data. Our results show that we can obtain accurate FF parameters by using this technique. The workflow can be generalized to other types of neutron data for FF optimization, such as vibrational spectroscopy and spin echo.
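
    The optimization loop itself can be sketched as a chi-square minimization over trial force-field parameters; in the sketch below the expensive MD-plus-scattering calculation is replaced by a hypothetical placeholder function, and the data and parameter names are illustrative only.

        # Conceptual sketch: adjust force-field parameters until the simulated
        # intermediate scattering function matches the QENS observation.
        import numpy as np
        from scipy.optimize import minimize

        t_grid = np.linspace(0.1, 100.0, 50)            # ps (illustrative)
        observed = np.exp(-t_grid / 35.0)               # stand-in for measured S(Q,t)
        sigma = 0.02 * np.ones_like(observed)           # assumed experimental uncertainty

        def run_md_and_compute_sqt(params, t):
            """Placeholder: in reality this would launch an MD run with the trial
            parameters and compute S(Q,t) from the resulting trajectory."""
            epsilon, sigma_lj = params
            tau = 50.0 * epsilon / sigma_lj             # toy relaxation-time model
            return np.exp(-t / tau)

        def chi2(params):
            model = run_md_and_compute_sqt(params, t_grid)
            return np.sum(((model - observed) / sigma) ** 2)

        result = minimize(chi2, x0=[0.5, 1.0], method="Nelder-Mead")
        print("optimized parameters:", result.x, "chi^2:", result.fun)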

  9. From data point timelines to a well curated data set, data mining of experimental data and chemical structure data from scientific articles, problems and possible solutions.

    PubMed

    Ruusmann, Villu; Maran, Uko

    2013-07-01

    The scientific literature is an important source of experimental and chemical structure data. Very often these data have been harvested into smaller or larger data collections, leaving data quality and curation issues on the shoulders of users. The current research presents a systematic and reproducible workflow for collecting series of data points from scientific literature and assembling a database that is suitable for the purposes of high quality modelling and decision support. The quality assurance aspect of the workflow is concerned with the curation of both chemical structures and associated toxicity values at (1) the single data point level and (2) the collection of data points level. The assembly of a database employs a novel "timeline" approach. The workflow is implemented as a software solution and its applicability is demonstrated on the example of the Tetrahymena pyriformis acute aquatic toxicity endpoint. A literature collection of 86 primary publications for T. pyriformis was found to contain 2,072 chemical compounds and 2,498 unique toxicity values, which divide into 2,440 numerical and 58 textual values. Every chemical compound was assigned to a preferred toxicity value. Examples of the most common chemical and toxicological data curation scenarios are discussed.
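
    The "one preferred value per compound" step of such a curation workflow might be prototyped as follows; the column names and the preference rule (median of numeric values, textual values only as a fallback) are assumptions for illustration, not the paper's exact rules.

        # Sketch of assigning a preferred toxicity value per compound with pandas.
        import pandas as pd

        records = pd.DataFrame({
            "compound":   ["phenol", "phenol", "phenol", "aniline", "aniline"],
            "value":      [1.95, 2.01, None, 0.41, None],
            "text_value": [None, None, "not toxic at saturation", None, "ambiguous"],
            "source":     ["paper_12", "paper_30", "paper_55", "paper_07", "paper_81"],
        })

        def preferred(group):
            numeric = group["value"].dropna()
            if len(numeric) > 0:
                return pd.Series({"preferred_value": numeric.median(),
                                  "n_points": len(group)})
            # Fall back to the first textual value when no numeric point exists.
            return pd.Series({"preferred_value": group["text_value"].dropna().iloc[0],
                              "n_points": len(group)})

        curated = records.groupby("compound")[["value", "text_value"]].apply(preferred)
        print(curated)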

  10. Development of a High-Throughput Ion-Exchange Resin Characterization Workflow.

    PubMed

    Liu, Chun; Dermody, Daniel; Harris, Keith; Boomgaard, Thomas; Sweeney, Jeff; Gisch, Daryl; Goltz, Bob

    2017-06-12

    A novel high-throughput (HTR) ion-exchange (IEX) resin workflow has been developed for characterizing the ion exchange equilibrium of commercial and experimental IEX resins against a range of different applications in which the water environment differs from site to site. Because of its much higher throughput, design of experiment (DOE) methodology can be easily applied for studying the effects of multiple factors on resin performance. Two case studies are presented to illustrate the efficacy of the combined HTR workflow and DOE method. In case study one, a series of anion exchange resins have been screened for selective removal of NO3- and NO2- in water environments consisting of multiple other anions, varied pH, and ionic strength. A response surface model (RSM) is developed to statistically correlate the resin performance with the water composition and predict the best resin candidate. In case study two, the same HTR workflow and DOE method have been applied for screening different cation exchange resins in terms of the selective removal of Mg2+, Ca2+, and Ba2+ from high total dissolved salt (TDS) water. A master DOE model including all of the cation exchange resins is created to predict divalent cation removal by different IEX resins under specific conditions, from which the best resin candidates can be identified. The successful adoption of the HTR workflow and DOE method for studying the ion exchange of IEX resins can significantly reduce the resources and time needed to address industry and application needs.
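
    The response surface modelling step can be sketched with a quadratic fit over the designed factors; the factor names (pH, ionic strength) follow the first case study, but the data points and prediction below are invented for illustration.

        # Sketch of fitting a quadratic response surface to designed-experiment results.
        import numpy as np
        from sklearn.preprocessing import PolynomialFeatures
        from sklearn.linear_model import LinearRegression
        from sklearn.pipeline import make_pipeline

        # Columns are pH and ionic strength (mM); response is % nitrate removal (toy data).
        X = np.array([[5, 10], [5, 100], [7, 10], [7, 100], [9, 10], [9, 100], [7, 55]])
        y = np.array([82.0, 61.0, 88.0, 70.0, 79.0, 58.0, 80.0])

        rsm = make_pipeline(PolynomialFeatures(degree=2, include_bias=False),
                            LinearRegression())
        rsm.fit(X, y)

        # Predict performance at an untested condition to rank resin candidates.
        print("predicted removal at pH 6.5, 40 mM:", rsm.predict([[6.5, 40.0]])[0])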

  11. A robust ambient temperature collection and stabilization strategy: Enabling worldwide functional studies of the human microbiome

    PubMed Central

    Anderson, Ericka L.; Li, Weizhong; Klitgord, Niels; Highlander, Sarah K.; Dayrit, Mark; Seguritan, Victor; Yooseph, Shibu; Biggs, William; Venter, J. Craig; Nelson, Karen E.; Jones, Marcus B.

    2016-01-01

    As reports on possible associations between microbes and the host increase in number, more meaningful interpretations of this information require an ability to compare data sets across studies. This is dependent upon standardization of workflows to ensure comparability both within and between studies. Here we propose the standard use of an alternate collection and stabilization method that would facilitate such comparisons. The DNA Genotek OMNIgene∙Gut Stool Microbiome Kit was compared to the currently accepted community standard of freezing to store human stool samples prior to whole genome sequencing (WGS) for microbiome studies. This stabilization and collection device allows for ambient temperature storage, automation, and ease of shipping/transfer of samples. The device permitted the same data reproducibility as with frozen samples, and yielded higher recovery of nucleic acids. Collection and stabilization of stool microbiome samples with the DNA Genotek collection device, combined with our extraction and WGS, provides a robust, reproducible workflow that enables standardized global collection, storage, and analysis of stool for microbiome studies. PMID:27558918

  12. Genetic design automation: engineering fantasy or scientific renewal?

    PubMed

    Lux, Matthew W; Bramlett, Brian W; Ball, David A; Peccoud, Jean

    2012-02-01

    The aim of synthetic biology is to make genetic systems more amenable to engineering, which has naturally led to the development of computer-aided design (CAD) tools. Experimentalists still primarily rely on project-specific ad hoc workflows instead of domain-specific tools, which suggests that CAD tools are lagging behind the front line of the field. Here, we discuss the scientific hurdles that have limited the productivity gains anticipated from existing tools. We argue that the real value of efforts to develop CAD tools is the formalization of genetic design rules that determine the complex relationships between genotype and phenotype. Copyright © 2011 Elsevier Ltd. All rights reserved.

  13. Structuring research methods and data with the research object model: genomics workflows as a case study.

    PubMed

    Hettne, Kristina M; Dharuri, Harish; Zhao, Jun; Wolstencroft, Katherine; Belhajjame, Khalid; Soiland-Reyes, Stian; Mina, Eleni; Thompson, Mark; Cruickshank, Don; Verdes-Montenegro, Lourdes; Garrido, Julian; de Roure, David; Corcho, Oscar; Klyne, Graham; van Schouwen, Reinout; 't Hoen, Peter A C; Bechhofer, Sean; Goble, Carole; Roos, Marco

    2014-01-01

    One of the main challenges for biomedical research lies in the computer-assisted integrative study of large and increasingly complex combinations of data in order to understand molecular mechanisms. The preservation of the materials and methods of such computational experiments with clear annotations is essential for understanding an experiment, and this is increasingly recognized in the bioinformatics community. Our assumption is that offering means of digital, structured aggregation and annotation of the objects of an experiment will provide necessary meta-data for a scientist to understand and recreate the results of an experiment. To support this, we explored a model for the semantic description of a workflow-centric Research Object (RO), where an RO is defined as a resource that aggregates other resources, e.g., datasets, software, spreadsheets, text, etc. We applied this model to a case study where we analysed human metabolite variation by workflows. We present the application of the workflow-centric RO model for our bioinformatics case study. Three workflows were produced following recently defined Best Practices for workflow design. By modelling the experiment as an RO, we were able to automatically query the experiment and answer questions such as "which particular data was input to a particular workflow to test a particular hypothesis?", and "which particular conclusions were drawn from a particular workflow?". Applying a workflow-centric RO model to aggregate and annotate the resources used in a bioinformatics experiment allowed us to retrieve the conclusions of the experiment in the context of the driving hypothesis, the executed workflows and their input data. The RO model is an extendable reference model that can be used by other systems as well. The Research Object is available at http://www.myexperiment.org/packs/428. The Wf4Ever Research Object Model is available at http://wf4ever.github.io/ro.
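
    A query of the kind quoted above can be illustrated with rdflib on a toy aggregation; the EX namespace and property names are simplified placeholders rather than the actual Wf4Ever/RO vocabulary terms.

        # Conceptual sketch of querying a workflow-centric Research Object with rdflib.
        from rdflib import Graph, Namespace, Literal

        EX = Namespace("http://example.org/ro#")
        g = Graph()
        g.bind("ex", EX)

        # Toy aggregation: a workflow run, its input dataset and the drawn conclusion.
        g.add((EX.run1, EX.executedWorkflow, EX.metaboliteWorkflow))
        g.add((EX.run1, EX.usedInput, EX.snpDataset))
        g.add((EX.run1, EX.testedHypothesis, Literal("metabolite variation is genetic")))
        g.add((EX.run1, EX.hasConclusion, Literal("three loci associated with variation")))

        # "Which particular data was input to the workflow that tested this hypothesis,
        # and which conclusions were drawn from it?"
        q = """
        PREFIX ex: <http://example.org/ro#>
        SELECT ?input ?conclusion WHERE {
            ?run ex:testedHypothesis ?h ;
                 ex:usedInput ?input ;
                 ex:hasConclusion ?conclusion .
        }
        """
        for row in g.query(q):
            print(row.input, "->", row.conclusion)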

  14. Using location tracking data to assess efficiency in established clinical workflows.

    PubMed

    Meyer, Mark; Fairbrother, Pamela; Egan, Marie; Chueh, Henry; Sandberg, Warren S

    2006-01-01

    Location tracking systems are becoming more prevalent in clinical settings yet applications still are not common. We have designed a system to aid in the assessment of clinical workflow efficiency. Location data is captured from active RFID tags and processed into usable data. These data are stored and presented visually with trending capability over time. The system allows quick assessments of the impact of process changes on workflow, and isolates areas for improvement.

  15. Ergatis: a web interface and scalable software system for bioinformatics workflows

    PubMed Central

    Orvis, Joshua; Crabtree, Jonathan; Galens, Kevin; Gussman, Aaron; Inman, Jason M.; Lee, Eduardo; Nampally, Sreenath; Riley, David; Sundaram, Jaideep P.; Felix, Victor; Whitty, Brett; Mahurkar, Anup; Wortman, Jennifer; White, Owen; Angiuoli, Samuel V.

    2010-01-01

    Motivation: The growth of sequence data has been accompanied by an increasing need to analyze data on distributed computer clusters. The use of these systems for routine analysis requires scalable and robust software for data management of large datasets. Software is also needed to simplify data management and make large-scale bioinformatics analysis accessible and reproducible to a wide class of target users. Results: We have developed a workflow management system named Ergatis that enables users to build, execute and monitor pipelines for computational analysis of genomics data. Ergatis contains preconfigured components and template pipelines for a number of common bioinformatics tasks such as prokaryotic genome annotation and genome comparisons. Outputs from many of these components can be loaded into a Chado relational database. Ergatis was designed to be accessible to a broad class of users and provides a user friendly, web-based interface. Ergatis supports high-throughput batch processing on distributed compute clusters and has been used for data management in a number of genome annotation and comparative genomics projects. Availability: Ergatis is an open-source project and is freely available at http://ergatis.sourceforge.net Contact: jorvis@users.sourceforge.net PMID:20413634

  16. An Intercompany Perspective on Biopharmaceutical Drug Product Robustness Studies.

    PubMed

    Morar-Mitrica, Sorina; Adams, Monica L; Crotts, George; Wurth, Christine; Ihnat, Peter M; Tabish, Tanvir; Antochshuk, Valentyn; DiLuzio, Willow; Dix, Daniel B; Fernandez, Jason E; Gupta, Kapil; Fleming, Michael S; He, Bing; Kranz, James K; Liu, Dingjiang; Narasimhan, Chakravarthy; Routhier, Eric; Taylor, Katherine D; Truong, Nobel; Stokes, Elaine S E

    2018-02-01

    The Biophorum Development Group (BPDG) is an industry-wide consortium enabling networking and sharing of best practices for the development of biopharmaceuticals. To gain a better understanding of current industry approaches for establishing biopharmaceutical drug product (DP) robustness, the BPDG-Formulation Point Share group conducted an intercompany collaboration exercise, which included a benchmarking survey and extensive group discussions around the scope, design, and execution of robustness studies. The results of this industry collaboration revealed several key common themes: (1) overall DP robustness is defined by both the formulation and the manufacturing process robustness; (2) robustness integrates the principles of quality by design (QbD); (3) DP robustness is an important factor in setting critical quality attribute control strategies and commercial specifications; (4) most companies employ robustness studies, along with prior knowledge, risk assessments, and statistics, to develop the DP design space; (5) studies are tailored to commercial development needs and the practices of each company. Three case studies further illustrate how a robustness study design for a biopharmaceutical DP balances experimental complexity, statistical power, scientific understanding, and risk assessment to provide the desired product and process knowledge. The BPDG-Formulation Point Share discusses identified industry challenges with regard to biopharmaceutical DP robustness and presents some recommendations for best practices. Copyright © 2018 American Pharmacists Association®. Published by Elsevier Inc. All rights reserved.

  17. Scientific Data Management Center for Enabling Technologies

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Vouk, Mladen A.

    Managing scientific data has been identified by the scientific community as one of the most important emerging needs because of the sheer volume and increasing complexity of data being collected. Effectively generating, managing, and analyzing this information requires a comprehensive, end-to-end approach to data management that encompasses all of the stages from the initial data acquisition to the final analysis of the data. Fortunately, the data management problems encountered by most scientific domains are common enough to be addressed through shared technology solutions. Based on community input, we have identified three significant requirements. First, more efficient access to storage systems is needed. In particular, parallel file system and I/O system improvements are needed to write and read large volumes of data without slowing a simulation, analysis, or visualization engine. These processes are complicated by the fact that scientific data are structured differently for specific application domains, and are stored in specialized file formats. Second, scientists require technologies to facilitate better understanding of their data, in particular the ability to effectively perform complex data analysis and searches over extremely large data sets. Specialized feature discovery and statistical analysis techniques are needed before the data can be understood or visualized. Furthermore, interactive analysis requires techniques for efficiently selecting subsets of the data. Finally, generating the data, collecting and storing the results, keeping track of data provenance, data post-processing, and analysis of results is a tedious, fragmented process. Tools for automation of this process in a robust, tractable, and recoverable fashion are required to enhance scientific exploration. The SDM center was established under the SciDAC program to address these issues. The SciDAC-1 Scientific Data Management (SDM) Center succeeded in bringing an initial set of

  18. Workflow opportunities using JPEG 2000

    NASA Astrophysics Data System (ADS)

    Foshee, Scott

    2002-11-01

    JPEG 2000 is a new image compression standard from ISO/IEC JTC1 SC29 WG1, the Joint Photographic Experts Group (JPEG) committee. Better thought of as a sibling to JPEG rather than a descendant, the JPEG 2000 standard offers wavelet-based compression as well as companion file formats and related standardized technology. This paper examines the JPEG 2000 standard for features in four specific areas (compression, file formats, client-server, and conformance/compliance) that enable image workflows.

  19. An end-to-end workflow for engineering of biological networks from high-level specifications.

    PubMed

    Beal, Jacob; Weiss, Ron; Densmore, Douglas; Adler, Aaron; Appleton, Evan; Babb, Jonathan; Bhatia, Swapnil; Davidsohn, Noah; Haddock, Traci; Loyall, Joseph; Schantz, Richard; Vasilev, Viktor; Yaman, Fusun

    2012-08-17

    We present a workflow for the design and production of biological networks from high-level program specifications. The workflow is based on a sequence of intermediate models that incrementally translate high-level specifications into DNA samples that implement them. We identify algorithms for translating between adjacent models and implement them as a set of software tools, organized into a four-stage toolchain: Specification, Compilation, Part Assignment, and Assembly. The specification stage begins with a Boolean logic computation specified in the Proto programming language. The compilation stage uses a library of network motifs and cellular platforms, also specified in Proto, to transform the program into an optimized Abstract Genetic Regulatory Network (AGRN) that implements the programmed behavior. The part assignment stage assigns DNA parts to the AGRN, drawing the parts from a database for the target cellular platform, to create a DNA sequence implementing the AGRN. Finally, the assembly stage computes an optimized assembly plan to create the DNA sequence from available part samples, yielding a protocol for producing a sample of engineered plasmids with robotics assistance. Our workflow is the first to automate the production of biological networks from a high-level program specification. Furthermore, the workflow's modular design allows the same program to be realized on different cellular platforms simply by swapping workflow configurations. We validated our workflow by specifying a small-molecule sensor-reporter program and verifying the resulting plasmids in both HEK 293 mammalian cells and in E. coli bacterial cells.

  20. 76 FR 71928 - Defense Federal Acquisition Regulation Supplement; Updates to Wide Area WorkFlow (DFARS Case 2011...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-11-21

    ... Defense Federal Acquisition Regulation Supplement; Updates to Wide Area WorkFlow (DFARS Case 2011-D027... Wide Area WorkFlow (WAWF) and TRICARE Encounter Data System (TEDS). WAWF, which electronically... civil emergencies, when access to Wide Area WorkFlow by those contractors is not feasible; (4) Purchases...

  1. A Computational Workflow for the Automated Generation of Models of Genetic Designs.

    PubMed

    Misirli, Göksel; Nguyen, Tramy; McLaughlin, James Alastair; Vaidyanathan, Prashant; Jones, Timothy S; Densmore, Douglas; Myers, Chris; Wipat, Anil

    2018-06-05

    Computational models are essential to engineer predictable biological systems and to scale up this process for complex systems. Computational modeling often requires expert knowledge and data to build models. Clearly, manual creation of models is not scalable for large designs. Despite several automated model construction approaches, computational methodologies to bridge knowledge in design repositories and the process of creating computational models have still not been established. This paper describes a workflow for the automatic generation of computational models of genetic circuits from data stored in design repositories using existing standards. This workflow leverages the software tool SBOLDesigner to build structural models that are then enriched by the Virtual Parts Repository API using Systems Biology Open Language (SBOL) data fetched from the SynBioHub design repository. The iBioSim software tool is then utilized to convert this SBOL description into a computational model encoded using the Systems Biology Markup Language (SBML). Finally, this SBML model can be simulated using a variety of methods. This workflow provides synthetic biologists with easy-to-use tools to create predictable biological systems, hiding away the complexity of building computational models. This approach can further be incorporated into other computational workflows for design automation.
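
    The end product of this workflow is an SBML document; a minimal sketch of building one programmatically with python-libsbml is shown below, with illustrative species names, and without the SBOL enrichment performed by SBOLDesigner, the Virtual Parts Repository, and iBioSim.

        # Minimal sketch of the workflow's end product: an SBML model built with
        # python-libsbml. Species and identifiers are illustrative placeholders.
        import libsbml

        doc = libsbml.SBMLDocument(3, 1)
        model = doc.createModel()
        model.setId("genetic_design_model")

        comp = model.createCompartment()
        comp.setId("cell")
        comp.setConstant(True)
        comp.setSize(1.0)

        for species_id in ("TetR", "GFP"):          # hypothetical parts from a design
            s = model.createSpecies()
            s.setId(species_id)
            s.setCompartment("cell")
            s.setInitialAmount(0.0)
            s.setHasOnlySubstanceUnits(False)
            s.setBoundaryCondition(False)
            s.setConstant(False)

        print(libsbml.writeSBMLToString(doc))       # serialized model, ready for simulation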

  2. Jupyter meets Earth: Creating Comprehensible and Reproducible Scientific Workflows with Jupyter Notebooks and Google Earth Engine

    NASA Astrophysics Data System (ADS)

    Erickson, T.

    2016-12-01

    Deriving actionable information from Earth observation data obtained from sensors or models can be quite complicated, and sharing those insights with others in a form that they can understand, reproduce, and improve upon is equally difficult. Journal articles, even if digital, commonly present just a summary of an analysis that cannot be understood in depth or reproduced without major effort on the part of the reader. Here we show a method of improving scientific literacy by pairing a recently developed scientific presentation technology (Jupyter Notebooks) with a petabyte-scale platform for accessing and analyzing Earth observation and model data (Google Earth Engine). Jupyter Notebooks are interactive web documents that mix live code with annotations such as rich-text markup, equations, images, videos, hyperlinks and dynamic output. Notebooks were first introduced as part of the IPython project in 2011, and have since gained wide acceptance in the scientific programming community, initially among Python programmers but later across a wide range of scientific programming languages. While Jupyter Notebooks have been widely adopted for general data analysis, data visualization, and machine learning, to date there have been relatively few examples of using Jupyter Notebooks to analyze geospatial datasets. Google Earth Engine is a cloud-based platform for analyzing geospatial data, such as satellite remote sensing imagery and/or Earth system model output. Through its Python API, Earth Engine makes petabytes of Earth observation data accessible, and provides hundreds of algorithmic building blocks that can be chained together to produce high-level algorithms and outputs in real-time. We anticipate that this technology pairing will facilitate a better way of creating, documenting, and sharing complex analyses that derive information on our Earth that can be used to promote broader understanding of the complex issues that it faces. http://jupyter.org https://earthengine.google.com
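
    A typical notebook cell pairing the two technologies might look like the following sketch; the dataset identifier, band, coordinates, and scale are assumptions for illustration, and the calls require an authenticated Earth Engine account.

        # Minimal sketch of a chained Earth Engine call run inside a Jupyter cell.
        import ee

        ee.Authenticate()    # interactive, first run only; may require a Cloud project
        ee.Initialize()

        point = ee.Geometry.Point([-121.5, 38.6])                 # example location
        ndvi_2020 = (ee.ImageCollection("MODIS/006/MOD13A2")      # assumed 16-day NDVI dataset
                     .filterDate("2020-01-01", "2020-12-31")
                     .select("NDVI")
                     .mean())

        # Pull a single summary number back to the notebook for annotation and plotting.
        value = ndvi_2020.reduceRegion(reducer=ee.Reducer.mean(),
                                       geometry=point, scale=1000).getInfo()
        print("mean 2020 NDVI (scaled integer):", value)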

  3. Tackling the “So What” Problem in Scientific Research: A Systems-Based Approach to Resource and Publication Tracking

    PubMed Central

    Harris, Paul A.; Kirby, Jacqueline; Swafford, Jonathan A.; Edwards, Terri L.; Zhang, Minhua; Yarbrough, Tonya R.; Lane, Lynda D.; Helmer, Tara; Bernard, Gordon R.; Pulley, Jill M.

    2015-01-01

    Peer-reviewed publications are one measure of scientific productivity. From a project, program, or institutional perspective, publication tracking provides the quantitative data necessary to guide the prudent stewardship of federal, foundation, and institutional investments by identifying the scientific return for the types of support provided. In this article, the authors describe the Vanderbilt Institute for Clinical and Translational Research’s (VICTR’s) development and implementation of a semi-automated process through which publications are automatically detected in PubMed and adjudicated using a “just-in-time” workflow by a known pool of researchers (from Vanderbilt University School of Medicine and Meharry Medical College) who receive support from Vanderbilt’s Clinical and Translational Science Award. Since implementation, the authors have: (1) seen a marked increase in the number of publications citing VICTR support; (2) captured at a more granular level the relationship between specific resources/services and scientific output; (3) increased awareness of VICTR’s scientific portfolio; and (4) increased efficiency in complying with annual National Institutes of Health progress reports. They present the methodological framework and workflow, measures of impact for the first 30 months, and a set of practical lessons learned to inform others considering a systems-based approach for resource and publication tracking. They learned that contacting multiple authors from a single publication can increase the accuracy of the resource attribution process in the case of multidisciplinary scientific projects. They also found that combining positive (e.g., congratulatory e-mails) and negative (e.g., not allowing future resource requests until adjudication is complete) triggers can increase compliance with publication attribution requests. PMID:25901872
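
    The automatic-detection half of such a pipeline can be sketched with Biopython's Entrez module; the grant number and e-mail address below are placeholders, not VICTR's actual award identifiers.

        # Sketch of querying PubMed for publications citing a grant in their
        # acknowledgements, using Biopython's Entrez utilities.
        from Bio import Entrez

        Entrez.email = "tracking-admin@example.org"      # required by NCBI etiquette

        query = '"UL1 TR000445"[Grant Number]'           # hypothetical CTSA award number
        handle = Entrez.esearch(db="pubmed", term=query, retmax=100)
        record = Entrez.read(handle)
        handle.close()

        pmids = record["IdList"]
        print(f"{len(pmids)} candidate publications found")

        # Each PMID would then enter the "just-in-time" adjudication queue, where the
        # detected authors confirm or reject the attribution to specific resources.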

  4. Computer imaging and workflow systems in the business office.

    PubMed

    Adams, W T; Veale, F H; Helmick, P M

    1999-05-01

    Computer imaging and workflow technology automates many business processes that currently are performed using paper processes. Documents are scanned into the imaging system and placed in electronic patient account folders. Authorized users throughout the organization, including preadmission, verification, admission, billing, cash posting, customer service, and financial counseling staff, have online access to the information they need when they need it. Such streamlining of business functions can increase collections and customer satisfaction while reducing labor, supply, and storage costs. Because the costs of a comprehensive computer imaging and workflow system can be considerable, healthcare organizations should consider implementing parts of such systems that can be cost-justified or include implementation as part of a larger strategic technology initiative.

  5. High throughput workflow for coacervate formation and characterization in shampoo systems.

    PubMed

    Kalantar, T H; Tucker, C J; Zalusky, A S; Boomgaard, T A; Wilson, B E; Ladika, M; Jordan, S L; Li, W K; Zhang, X; Goh, C G

    2007-01-01

    Cationic cellulosic polymers find wide utility as benefit agents in shampoo. Deposition of these polymers onto hair has been shown to mend split-ends, improve appearance and wet combing, as well as provide controlled delivery of insoluble actives. The deposition is thought to be enhanced by the formation of a polymer/surfactant complex that phase-separates from the bulk solution upon dilution. A standard method exists for characterizing coacervate formation upon dilution, but the test is prohibitive in both time and material. We have developed a semi-automated high throughput workflow to characterize the coacervate-forming behavior of different shampoo formulations. A procedure that allows testing of real use shampoo dilutions without first formulating a complete shampoo was identified. This procedure was adapted to a Tecan liquid handler by optimizing the parameters for liquid dispensing as well as for mixing. The high throughput workflow enabled preparation and testing of hundreds of formulations with different types and levels of cationic cellulosic polymers and surfactants, and for each formulation a haze diagram was constructed. Optimal formulations and their dilutions that give substantial coacervate formation (determined by haze measurements) were identified. Results from this high throughput workflow were shown to reproduce standard haze and bench-top turbidity measurements, and this workflow has the advantages of using less material and allowing more variables to be tested with significant time savings.

  6. Workflow interruptions and mental workload in hospital pediatricians: an observational study.

    PubMed

    Weigl, Matthias; Müller, Andreas; Angerer, Peter; Hoffmann, Florian

    2014-09-24

    Pediatricians' workload is increasingly thought to affect their quality of work life and patient safety. Workflow interruptions are a frequent stressor in clinical work, impeding clinicians' attention and contributing to clinical malpractice. We aimed to investigate prospective associations of workflow interruptions with multiple dimensions of mental workload in pediatricians during clinical day shifts. In an academic children's hospital, a prospective study of 28 full-shift observations was conducted among pediatricians providing ward coverage. The prevalence of workflow interruptions was based on expert observation using a validated observation instrument. Concurrently, pediatricians' workload ratings were assessed with three workload dimensions of the well-validated NASA-Task Load Index: mental demands, effort, and frustration. Observed pediatricians were, on average, disrupted 4.7 times per hour. Most frequent were interruptions by colleagues (30.2%), nursing staff (29.7%), and by telephone/beeper calls (16.3%). Interruption measures were correlated with two workload outcomes of interest: frequent workflow interruptions were related to lower cognitive demands, but frequent interruptions were associated with increased frustration. With regard to single sources, interruptions by colleagues showed the strongest associations with workload. The findings provide insights into specific pathways between different types of interruptions and pediatricians' mental workload. These findings point to avenues for further research and yield a number of work and organization redesign suggestions for pediatric care.

  7. Modeling Complex Workflow in Molecular Diagnostics

    PubMed Central

    Gomah, Mohamed E.; Turley, James P.; Lu, Huimin; Jones, Dan

    2010-01-01

    One of the hurdles to achieving personalized medicine has been implementing the laboratory processes for performing and reporting complex molecular tests. The rapidly changing test rosters and complex analysis platforms in molecular diagnostics have meant that many clinical laboratories still use labor-intensive manual processing and testing without the level of automation seen in high-volume chemistry and hematology testing. We provide here a discussion of design requirements and the results of implementation of a suite of lab management tools that incorporate the many elements required for use of molecular diagnostics in personalized medicine, particularly in cancer. These applications provide the functionality required for sample accessioning and tracking, material generation, and testing that are particular to the evolving needs of individualized molecular diagnostics. On implementation, the applications described here resulted in improvements in the turn-around time for reporting of more complex molecular test sets, and significant changes in the workflow. Therefore, careful mapping of workflow can permit design of software applications that simplify even the complex demands of specialized molecular testing. By incorporating design features for order review, software tools can permit a more personalized approach to sample handling and test selection without compromising efficiency. PMID:20007844

  8. Modernizing Earth and Space Science Modeling Workflows in the Big Data Era

    NASA Astrophysics Data System (ADS)

    Kinter, J. L.; Feigelson, E.; Walker, R. J.; Tino, C.

    2017-12-01

    Modeling is a major aspect of Earth and space science research. The development of numerical models of the Earth system, planetary systems or astrophysical systems is essential to linking theory with observations. Optimal use of observations that are quite expensive to obtain and maintain typically requires data assimilation that involves numerical models. In the Earth sciences, models of the physical climate system are typically used for data assimilation, climate projection, and inter-disciplinary research, spanning applications from analysis of multi-sensor data sets to decision-making in climate-sensitive sectors with applications to ecosystems, hazards, and various biogeochemical processes. In space physics, most models are from first principles, require considerable expertise to run and are frequently modified significantly for each case study. The volume and variety of model output data from modeling Earth and space systems are rapidly increasing and have reached a scale where human interaction with data is prohibitively inefficient. A major barrier to progress is that modeling workflows aren't deemed by practitioners to be a design problem. Existing workflows have been created by a slow accretion of software, typically based on undocumented, inflexible scripts haphazardly modified by a succession of scientists and students not trained in modern software engineering methods. As a result, existing modeling workflows suffer from an inability to onboard new datasets into models; an inability to keep pace with accelerating data production rates; and irreproducibility, among other problems. These factors are creating an untenable situation for those conducting and supporting Earth system and space science. Improving modeling workflows requires investments in hardware, software and human resources. This paper describes the critical path issues that must be targeted to accelerate modeling workflows, including script modularization, parallelization, and

  9. The road map towards providing a robust Raman spectroscopy-based cancer diagnostic platform and integration into clinic

    NASA Astrophysics Data System (ADS)

    Lau, Katherine; Isabelle, Martin; Lloyd, Gavin R.; Old, Oliver; Shepherd, Neil; Bell, Ian M.; Dorney, Jennifer; Lewis, Aaran; Gaifulina, Riana; Rodriguez-Justo, Manuel; Kendall, Catherine; Stone, Nicolas; Thomas, Geraint; Reece, David

    2016-03-01

    Despite its demonstrated potential as an accurate cancer diagnostic tool, Raman spectroscopy (RS) is yet to be adopted by the clinic for histopathology reviews. The Stratified Medicine through Advanced Raman Technologies (SMART) consortium has begun to address some of the hurdles in its adoption for cancer diagnosis. These hurdles include awareness and acceptance of the technology, practicality of integration into the histopathology workflow, data reproducibility and availability of transferrable models. We have formed a consortium, in joint efforts, to develop optimised protocols for tissue sample preparation, data collection and analysis. These protocols will be supported by provision of suitable hardware and software tools to allow statistically sound classification models to be built and transferred for use on different systems. In addition, we are building a validated gastrointestinal (GI) cancers model, which can be trialled as part of the histopathology workflow at hospitals, and a classification tool. At the end of the project, we aim to deliver a robust Raman-based diagnostic platform to enable clinical researchers to stage cancer, define tumour margin, build cancer diagnostic models and discover novel disease biomarkers.

  10. The VERCE platform: Enabling Computational Seismology via Streaming Workflows and Science Gateways

    NASA Astrophysics Data System (ADS)

    Spinuso, Alessandro; Filgueira, Rosa; Krause, Amrey; Matser, Jonas; Casarotti, Emanuele; Magnoni, Federica; Gemund, Andre; Frobert, Laurent; Krischer, Lion; Atkinson, Malcolm

    2015-04-01

    The VERCE project is creating an e-Science platform to facilitate innovative data analysis and coding methods that fully exploit the wealth of data in global seismology. One of the technologies developed within the project is the Dispel4Py python library, which makes it possible to describe abstract stream-based workflows for data-intensive applications and to execute them in a distributed environment. At runtime Dispel4Py is able to map workflow descriptions dynamically onto a number of computational resources (Apache Storm clusters, MPI-powered clusters, shared-memory multi-core machines, and single-core machines), setting it apart from other workflow frameworks. Therefore, Dispel4Py enables scientists to focus on their computation instead of being distracted by details of the computing infrastructure they use. Among the workflows developed with Dispel4Py in VERCE, we mention here those for Seismic Ambient Noise Cross-Correlation and MISFIT calculation, which address two data-intensive problems that are common in computational seismology. The former, also called Passive Imaging, allows the detection of relative seismic-wave velocity variations during the time of recording, which can be associated with the stress-field changes that occurred in the test area. The MISFIT calculation, instead, takes as input the synthetic seismograms generated from HPC simulations for a certain Earth model and earthquake and, after a preprocessing stage, compares them with real observations in order to foster subsequent model updates and improvement (Inversion). The VERCE Science Gateway exposes the MISFIT calculation workflow as a service, in combination with the simulation phase. Both phases can be configured, controlled and monitored by the user via a rich user interface which is integrated within the gUSE Science Gateway framework, hiding the complexity of accessing third-party data services, security mechanisms and enactment on the target resources. Thanks to a modular extension to the Dispel4Py framework
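
    The stream-based composition idea can be conveyed with plain Python generators standing in for processing elements; the sketch below is conceptual and deliberately does not use the actual Dispel4Py API.

        # Language-level sketch of stream-based workflow composition with generators.
        import numpy as np

        def read_traces(n_traces, n_samples=1024):
            """Producer stage: emit synthetic seismic traces one at a time."""
            rng = np.random.default_rng(0)
            for _ in range(n_traces):
                yield rng.normal(size=n_samples)

        def preprocess(traces):
            """Intermediate stage: detrend and normalize each incoming trace."""
            for tr in traces:
                tr = tr - tr.mean()
                yield tr / (np.abs(tr).max() or 1.0)

        def cross_correlate(traces):
            """Consumer stage: cross-correlate successive trace pairs."""
            previous = None
            for tr in traces:
                if previous is not None:
                    yield np.correlate(previous, tr, mode="full").max()
                previous = tr

        # Composing the pipeline is just chaining the streams; a workflow engine would
        # instead map each stage onto MPI ranks, Storm bolts or threads.
        for peak in cross_correlate(preprocess(read_traces(5))):
            print(f"peak correlation: {peak:.3f}")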

  11. Research on a dynamic workflow access control model

    NASA Astrophysics Data System (ADS)

    Liu, Yiliang; Deng, Jinxia

    2007-12-01

    In recent years, access control technology has been researched widely in workflow systems; two typical approaches are the RBAC (Role-Based Access Control) and TBAC (Task-Based Access Control) models, which have been used successfully for role authorization and assignment to a certain extent. However, as a system's structure becomes more complex, these two technologies cannot by themselves minimize privileges and separate duties, and they are inapplicable when users frequently need to change the workflow's process. To avoid these weaknesses in practice, a dynamic, variable-flow role-task-view fine-grained access control model (DRTVBAC) is constructed on the basis of the existing models. In applying this model, an algorithm is constructed to satisfy users' application requirements and security needs under the fine-grained principles of least privilege and dynamic separation of duties. The DRTVBAC model was implemented in a real system. The results show that tying dynamic role management and role assignment to tasks makes granting and revoking authority more flexible: least privilege is enforced because a role's permissions are activated only for the specific task being performed; authority is separated across the duties completed in the workflow; a concise, dynamic view interface prevents the disclosure of sensitive information; and the requirement for frequently varying task flows is satisfied.
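
    The combination of role- and task-scoped permissions can be illustrated with a toy Python sketch; the class, role, and permission names are invented for illustration and do not reproduce the DRTVBAC formalism.

        # Toy sketch of task-scoped, least-privilege permission checks in a workflow.
        ROLE_PERMISSIONS = {
            "clerk":    {"create_order"},
            "reviewer": {"approve_order"},
            "auditor":  {"read_order"},
        }

        class WorkflowSession:
            def __init__(self, user, roles):
                self.user = user
                self.roles = set(roles)
                self.active_task = None          # permissions exist only inside a task

            def activate_task(self, task, required_permission):
                granted = set().union(*(ROLE_PERMISSIONS.get(r, set()) for r in self.roles))
                if required_permission not in granted:
                    raise PermissionError(f"{self.user} may not perform {task}")
                self.active_task = (task, required_permission)

            def complete_task(self):
                self.active_task = None          # privileges are revoked on completion

        session = WorkflowSession("alice", ["clerk"])
        session.activate_task("enter purchase order", "create_order")       # allowed
        session.complete_task()
        # session.activate_task("approve purchase order", "approve_order")  # would raise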

  12. Quantifying nursing workflow in medication administration.

    PubMed

    Keohane, Carol A; Bane, Anne D; Featherstone, Erica; Hayes, Judy; Woolf, Seth; Hurley, Ann; Bates, David W; Gandhi, Tejal K; Poon, Eric G

    2008-01-01

    New medication administration systems are showing promise in improving patient safety at the point of care, but adoption of these systems requires significant changes in nursing workflow. To prepare for these changes, the authors report on a time-motion study that measured the proportion of time that nurses spend on various patient care activities, focusing on medication administration-related activities. Implications of their findings are discussed.

  13. Agent-Based Scientific Workflow Composition

    NASA Astrophysics Data System (ADS)

    Barker, A.; Mann, B.

    2006-07-01

    Agents are active autonomous entities that interact with one another to achieve their objectives. This paper addresses how these active agents are a natural fit to consume the passive Service Oriented Architecture which is found in Internet and Grid Systems, in order to compose, coordinate and execute e-Science experiments. A framework is introduced which allows an e-Science experiment to be described as a MultiAgent System.

  14. Enabling Smart Workflows over Heterogeneous ID-Sensing Technologies

    PubMed Central

    Giner, Pau; Cetina, Carlos; Lacuesta, Raquel; Palacios, Guillermo

    2012-01-01

    Sensing technologies in mobile devices play a key role in reducing the gap between the physical and the digital world. The use of automatic identification capabilities can improve user participation in business processes where physical elements are involved (Smart Workflows). However, identifying all objects in the user surroundings does not automatically translate into meaningful services to the user. This work introduces Parkour, an architecture that allows the development of services that match the goals of each of the participants in a smart workflow. Parkour is based on a pluggable architecture that can be extended to provide support for new tasks and technologies. In order to facilitate the development of these plug-ins, tools that automate the development process are also provided. Several Parkour-based systems have been developed in order to validate the applicability of the proposal. PMID:23202193

  15. A patient workflow management system built on guidelines.

    PubMed Central

    Dazzi, L.; Fassino, C.; Saracco, R.; Quaglini, S.; Stefanelli, M.

    1997-01-01

    To provide high quality, shared, and distributed medical care, clinical and organizational issues need to be integrated. This work describes a methodology for developing a Patient Workflow Management System, based on a detailed model of both the medical work process and the organizational structure. We assume that the medical work process is represented through clinical practice guidelines, and that an ontological description of the organization is available. Thus, we developed tools (1) to acquire the medical knowledge contained in a guideline, (2) to translate the resulting formalized guideline into a computational formalism, specifically a Petri net, and (3) to maintain different representation levels. The high-level representation guarantees that the Patient Workflow follows the guideline prescriptions, while the low-level representation takes into account the specific organization characteristics and allows allocating resources for managing a specific patient in daily practice. PMID:9357606
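
    The Petri net execution underlying such a system can be illustrated with a minimal interpreter; the guideline fragment modeled below (order test, then review result) is an invented example rather than one from the paper.

        # Minimal Petri-net executor: a transition fires only when all its input
        # places hold tokens, mirroring how a formalized guideline step is enacted.
        class PetriNet:
            def __init__(self, places, transitions):
                self.marking = dict(places)                  # place -> token count
                self.transitions = transitions               # name -> (inputs, outputs)

            def enabled(self, name):
                inputs, _ = self.transitions[name]
                return all(self.marking[p] > 0 for p in inputs)

            def fire(self, name):
                if not self.enabled(name):
                    raise RuntimeError(f"transition {name!r} is not enabled")
                inputs, outputs = self.transitions[name]
                for p in inputs:
                    self.marking[p] -= 1
                for p in outputs:
                    self.marking[p] += 1

        net = PetriNet(
            places={"patient_admitted": 1, "test_ordered": 0, "result_reviewed": 0},
            transitions={
                "order_test":    (["patient_admitted"], ["test_ordered"]),
                "review_result": (["test_ordered"], ["result_reviewed"]),
            },
        )
        net.fire("order_test")
        net.fire("review_result")
        print(net.marking)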

  16. Hypothesis testing of scientific Monte Carlo calculations.

    PubMed

    Wallerberger, Markus; Gull, Emanuel

    2017-11-01

    The steadily increasing size of scientific Monte Carlo simulations and the desire for robust, correct, and reproducible results necessitates rigorous testing procedures for scientific simulations in order to detect numerical problems and programming bugs. However, the testing paradigms developed for deterministic algorithms have proven to be ill suited for stochastic algorithms. In this paper we demonstrate explicitly how the technique of statistical hypothesis testing, which is in wide use in other fields of science, can be used to devise automatic and reliable tests for Monte Carlo methods, and we show that these tests are able to detect some of the common problems encountered in stochastic scientific simulations. We argue that hypothesis testing should become part of the standard testing toolkit for scientific simulations.

  17. Hypothesis testing of scientific Monte Carlo calculations

    NASA Astrophysics Data System (ADS)

    Wallerberger, Markus; Gull, Emanuel

    2017-11-01

    The steadily increasing size of scientific Monte Carlo simulations and the desire for robust, correct, and reproducible results necessitate rigorous testing procedures for scientific simulations in order to detect numerical problems and programming bugs. However, the testing paradigms developed for deterministic algorithms have proven to be ill suited for stochastic algorithms. In this paper we demonstrate explicitly how the technique of statistical hypothesis testing, which is in wide use in other fields of science, can be used to devise automatic and reliable tests for Monte Carlo methods, and we show that these tests are able to detect some of the common problems encountered in stochastic scientific simulations. We argue that hypothesis testing should become part of the standard testing toolkit for scientific simulations.

  18. The impact of computerized provider order entry systems on inpatient clinical workflow: a literature review.

    PubMed

    Niazkhani, Zahra; Pirnejad, Habibollah; Berg, Marc; Aarts, Jos

    2009-01-01

    Previous studies have shown the importance of workflow issues in the implementation of CPOE systems and patient safety practices. To understand the impact of CPOE on clinical workflow, we developed a conceptual framework and conducted a literature search for CPOE evaluations between 1990 and June 2007. Fifty-one publications were identified that disclosed mixed effects of CPOE systems. Among the frequently reported workflow advantages were legible orders, remote accessibility of the systems, and shorter order turnaround times. Among the frequently reported disadvantages were time-consuming and problematic user-system interactions and the enforcement of predefined relationships between clinical tasks and between providers. Given the diversity of findings in the literature, we conclude that more multi-method research is needed to explore the multidimensional and collective impact of CPOE, especially on collaborative workflow.

  19. Towards a General Scientific Reasoning Engine.

    ERIC Educational Resources Information Center

    Carbonell, Jaime G.; And Others

    Expert reasoning in the natural sciences appears to make extensive use of a relatively small number of general principles and reasoning strategies, each associated with a larger number of more specific inference patterns. Using a dual declarative hierarchy to represent strategic and factual knowledge, a framework for a robust scientific reasoning…

  20. A streamlined workflow for single-cells genome-wide copy-number profiling by low-pass sequencing of LM-PCR whole-genome amplification products.

    PubMed

    Ferrarini, Alberto; Forcato, Claudio; Buson, Genny; Tononi, Paola; Del Monaco, Valentina; Terracciano, Mario; Bolognesi, Chiara; Fontana, Francesca; Medoro, Gianni; Neves, Rui; Möhlendick, Birte; Rihawi, Karim; Ardizzoni, Andrea; Sumanasuriya, Semini; Flohr, Penny; Lambros, Maryou; de Bono, Johann; Stoecklein, Nikolas H; Manaresi, Nicolò

    2018-01-01

    Chromosomal instability and associated chromosomal aberrations are hallmarks of cancer and play a critical role in disease progression and development of resistance to drugs. Single-cell genome analysis has gained interest in recent years as a source of biomarkers for targeted-therapy selection and drug resistance, and several methods have been developed to amplify the genomic DNA and to produce libraries suitable for Whole Genome Sequencing (WGS). However, most protocols require several enzymatic and cleanup steps, thus increasing the complexity and length of protocols, while robustness and speed are key factors for clinical applications. To tackle this issue, we developed a single-tube, single-step, streamlined protocol, exploiting the ligation-mediated PCR (LM-PCR) Whole Genome Amplification (WGA) method, for low-pass genome sequencing with the Ion Torrent™ platform and copy number alterations (CNAs) calling from single cells. The method was evaluated on single cells isolated from 6 aberrant cell lines of the NCI-H series. In addition, to demonstrate the feasibility of the workflow on clinical samples, we analyzed single circulating tumor cells (CTCs) and white blood cells (WBCs) isolated from the blood of patients affected by prostate cancer or lung adenocarcinoma. The results obtained show that the developed workflow generates data accurately representing whole genome absolute copy number profiles of single cells and allows calling alterations at resolutions down to 100 kbp with as few as 200,000 reads. The presented data demonstrate the feasibility of the Ampli1™ WGA-based low-pass workflow for detection of CNAs in single tumor cells, which would be of particular interest for genome-driven targeted therapy selection and for monitoring of disease progression.

  1. Building Digital Audio Preservation Infrastructure and Workflows

    ERIC Educational Resources Information Center

    Young, Anjanette; Olivieri, Blynne; Eckler, Karl; Gerontakos, Theodore

    2010-01-01

    In 2009, the University of Washington (UW) Libraries special collections received funding for the digital preservation of its indigenous-language audio holdings. The university libraries, where the authors work in various capacities, had begun digitizing image and text collections in 1997. Because of this, at the onset of the project, workflows (a…

  2. Singularity: Scientific containers for mobility of compute.

    PubMed

    Kurtzer, Gregory M; Sochat, Vanessa; Bauer, Michael W

    2017-01-01

    Here we present Singularity, software developed to bring containers and reproducibility to scientific computing. Using Singularity containers, developers can work in reproducible environments of their choosing and design, and these complete environments can easily be copied and executed on other platforms. Singularity is an open source initiative that harnesses the expertise of system and software engineers and researchers alike, and integrates seamlessly into common workflows for both of these groups. As its primary use case, Singularity brings mobility of computing to both users and HPC centers, providing a secure means to capture and distribute software and compute environments. This ability to create and deploy reproducible environments across these centers, a previously unmet need, makes Singularity a game changing development for computational science.
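
    Singularity itself is driven from the command line; as a hedged sketch of how a workflow step might call a containerized tool from Python (the image name, bind path, and command here are placeholders, not taken from the paper), the wrapper below shells out to `singularity exec`:

    ```python
    import subprocess

    def run_in_container(image, command, bind=None):
        """Run a command inside a Singularity image and return its captured stdout."""
        cmd = ["singularity", "exec"]
        if bind:                                   # e.g. "/data:/data" to expose host data
            cmd += ["--bind", bind]
        cmd += [image] + list(command)
        result = subprocess.run(cmd, check=True, capture_output=True, text=True)
        return result.stdout

    # Hypothetical usage: run a tool packaged in "analysis.sif".
    if __name__ == "__main__":
        print(run_in_container("analysis.sif", ["python", "--version"], bind="/data:/data"))
    ```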

  3. Singularity: Scientific containers for mobility of compute

    PubMed Central

    Kurtzer, Gregory M.; Bauer, Michael W.

    2017-01-01

    Here we present Singularity, software developed to bring containers and reproducibility to scientific computing. Using Singularity containers, developers can work in reproducible environments of their choosing and design, and these complete environments can easily be copied and executed on other platforms. Singularity is an open source initiative that harnesses the expertise of system and software engineers and researchers alike, and integrates seamlessly into common workflows for both of these groups. As its primary use case, Singularity brings mobility of computing to both users and HPC centers, providing a secure means to capture and distribute software and compute environments. This ability to create and deploy reproducible environments across these centers, a previously unmet need, makes Singularity a game changing development for computational science. PMID:28494014

  4. Integrated petrophysical and reservoir characterization workflow to enhance permeability and water saturation prediction

    NASA Astrophysics Data System (ADS)

    Al-Amri, Meshal; Mahmoud, Mohamed; Elkatatny, Salaheldin; Al-Yousef, Hasan; Al-Ghamdi, Tariq

    2017-07-01

    Accurate estimation of permeability is essential in reservoir characterization and in determining fluid flow in porous media, which greatly assists in optimizing the production of a field. Permeability prediction techniques such as porosity-permeability transforms and, more recently, artificial intelligence and neural networks are encouraging but still show only a moderate to good match to core data. This may be because they are limited to homogeneous media, while knowledge about geology and heterogeneity is only indirectly incorporated or absent. Geological information from core descriptions, such as lithofacies that include diagenetic information, shows a link to permeability when categorized into rock types exposed to similar depositional environments. The objective of this paper is to develop a robust combined workflow integrating geology, petrophysics, and wireline logs in an extremely heterogeneous carbonate reservoir to accurately predict permeability. Permeability prediction is carried out using a pattern recognition algorithm called multi-resolution graph-based clustering (MRGC). We benchmark the prediction results against hard data from core and well-test analysis and show how much the permeability prediction improves when geology is integrated into the analysis. Finally, we use the predicted permeability as an input parameter to the J-function and correct for uncertainties in the saturation calculated from wireline logs using the classical Archie equation. Ultimately, a high level of confidence in hydrocarbon volume estimation is reached when robust permeability and saturation-height functions are estimated in the presence of geological details that are petrophysically meaningful.
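
    The saturation-height step mentioned above rests on the Leverett J-function; as a minimal sketch (assuming the common field-unit form with capillary pressure in psi, permeability in mD, porosity as a fraction, and illustrative interfacial-tension and contact-angle values, none of which are taken from the paper):

    ```python
    import numpy as np

    def leverett_j(pc_psi, k_md, phi_frac, sigma_dyn_cm=26.0, theta_deg=0.0):
        """Leverett J-function in field units; 0.21645 is the usual unit-conversion constant."""
        j = 0.21645 * pc_psi / (sigma_dyn_cm * np.cos(np.radians(theta_deg)))
        return j * np.sqrt(k_md / phi_frac)

    # Illustrative numbers only: one capillary pressure curve for a single rock type.
    pc = np.array([1.0, 2.0, 5.0, 10.0, 20.0])          # psi
    print(np.round(leverett_j(pc, k_md=150.0, phi_frac=0.22), 3))
    ```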

  5. A Bioinformatics Workflow for Variant Peptide Detection in Shotgun Proteomics*

    PubMed Central

    Li, Jing; Su, Zengliu; Ma, Ze-Qiang; Slebos, Robbert J. C.; Halvey, Patrick; Tabb, David L.; Liebler, Daniel C.; Pao, William; Zhang, Bing

    2011-01-01

    Shotgun proteomics data analysis usually relies on database search. However, commonly used protein sequence databases do not contain information on protein variants and thus prevent variant peptides and proteins from being identified. Including known coding variations in protein sequence databases could help alleviate this problem. Based on our recently published human Cancer Proteome Variation Database, we have created a protein sequence database that comprehensively annotates thousands of cancer-related coding variants collected in the Cancer Proteome Variation Database as well as noncancer-specific ones from the Single Nucleotide Polymorphism Database (dbSNP). Using this database, we then developed a data analysis workflow for variant peptide identification in shotgun proteomics. The high risk of false positive variant identifications was addressed by a modified false discovery rate estimation method. Analysis of colorectal cancer cell lines SW480, RKO, and HCT-116 revealed a total of 81 peptides that contain either noncancer-specific or cancer-related variations. Twenty-three out of 26 variants randomly selected from the 81 were confirmed by genomic sequencing. We further applied the workflow to data sets from three individual colorectal tumor specimens. A total of 204 distinct variant peptides were detected, and five carried known cancer-related mutations. Each individual showed a specific pattern of cancer-related mutations, suggesting potential use of this type of information for personalized medicine. Compatibility of the workflow has been tested with four popular database search engines, including Sequest, Mascot, X!Tandem, and MyriMatch. In summary, we have developed a workflow that effectively uses existing genomic data to enable variant peptide detection in proteomics. PMID:21389108
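
    The paper's modified false discovery rate method is not detailed in this abstract; purely as a baseline illustration of the standard target-decoy idea it builds on, the sketch below finds the lowest score cutoff that keeps the estimated FDR under a chosen level, using simulated scores in place of real peptide-spectrum matches.

    ```python
    import numpy as np

    def fdr_threshold(target_scores, decoy_scores, fdr=0.01):
        """Lowest score cutoff with estimated FDR <= fdr (FDR ~ decoys/targets above cutoff)."""
        targets = np.sort(np.asarray(target_scores))[::-1]
        decoys = np.asarray(decoy_scores)
        best = None
        for t in targets:
            n_tgt = np.sum(targets >= t)
            n_dec = np.sum(decoys >= t)
            if n_dec / n_tgt <= fdr:
                best = t                      # remember the lowest cutoff that still passes
        return best

    # Simulated scores: half of the "targets" are true hits, decoys are all noise.
    rng = np.random.default_rng(0)
    tgt = np.concatenate([rng.normal(3.0, 1.0, 500), rng.normal(0.0, 1.0, 500)])
    dec = rng.normal(0.0, 1.0, 1000)
    print("score cutoff at 1% FDR:", fdr_threshold(tgt, dec))
    ```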

  6. A Hybrid Task Graph Scheduler for High Performance Image Processing Workflows.

    PubMed

    Blattner, Timothy; Keyrouz, Walid; Bhattacharyya, Shuvra S; Halem, Milton; Brady, Mary

    2017-12-01

    Designing applications for scalability is key to improving their performance in hybrid and cluster computing. Scheduling code to utilize parallelism is difficult, particularly when dealing with data dependencies, memory management, data motion, and processor occupancy. The Hybrid Task Graph Scheduler (HTGS) is an abstract execution model, framework, and API that improves programmer productivity when implementing hybrid workflows for multi-core and multi-GPU systems. HTGS manages dependencies between tasks, represents CPU and GPU memories independently, overlaps computations with disk I/O and memory transfers, keeps multiple GPUs occupied, and uses all available compute resources. Through these abstractions, data motion and memory are explicit; this makes data locality decisions more accessible. To demonstrate the HTGS application programming interface (API), we present implementations of two example algorithms: (1) a matrix multiplication that shows how easily task graphs can be used; and (2) a hybrid implementation of microscopy image stitching that reduces code size by ≈ 43% compared to a manually coded hybrid workflow implementation and showcases the minimal overhead of task graphs in HTGS. Both HTGS-based implementations show good performance. In image stitching, the HTGS implementation achieves performance similar to the hybrid workflow implementation. Matrix multiplication with HTGS achieves 1.3× and 1.8× speedups over the multi-threaded OpenBLAS library for 16k × 16k and 32k × 32k matrices, respectively.
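
    HTGS itself is a C++ framework; to illustrate only the general idea of overlapping pipeline stages through queues (this is not the HTGS API), a toy two-stage task graph might look like the following, with the stage functions standing in for compute and write tasks:

    ```python
    import threading, queue

    def stage(in_q, out_q, fn):
        """Consume items, apply fn, forward results; None is the shutdown signal."""
        while True:
            item = in_q.get()
            if item is None:
                if out_q is not None:
                    out_q.put(None)
                break
            if out_q is not None:
                out_q.put(fn(item))

    q1, q2, q3 = queue.Queue(), queue.Queue(), queue.Queue()
    threads = [
        threading.Thread(target=stage, args=(q1, q2, lambda x: x * x)),        # "compute" stage
        threading.Thread(target=stage, args=(q2, q3, lambda x: f"tile {x}")),  # "write" stage
    ]
    for t in threads:
        t.start()
    for i in range(5):
        q1.put(i)
    q1.put(None)                       # shut the pipeline down after the work items
    for t in threads:
        t.join()
    while not q3.empty():
        item = q3.get()
        if item is not None:
            print(item)
    ```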

  7. Evaluation of user interface and workflow design of a bedside nursing clinical decision support system.

    PubMed

    Yuan, Michael Juntao; Finley, George Mike; Long, Ju; Mills, Christy; Johnson, Ron Kim

    2013-01-31

    Clinical decision support systems (CDSS) are important tools to improve health care outcomes and reduce preventable medical adverse events. However, the effectiveness and success of CDSS depend on their implementation context and usability in complex health care settings. As a result, usability design and validation, especially in real world clinical settings, are crucial aspects of successful CDSS implementations. Our objective was to develop a novel CDSS to help frontline nurses better manage critical symptom changes in hospitalized patients, hence reducing preventable failure to rescue cases. A robust user interface and implementation strategy that fit into existing workflows was key for the success of the CDSS. Guided by a formal usability evaluation framework, UFuRT (user, function, representation, and task analysis), we developed a high-level specification of the product that captures key usability requirements and is flexible to implement. We interviewed users of the proposed CDSS to identify requirements, listed functions, and operations the system must perform. We then designed visual and workflow representations of the product to perform the operations. The user interface and workflow design were evaluated via heuristic and end user performance evaluation. The heuristic evaluation was done after the first prototype, and its results were incorporated into the product before the end user evaluation was conducted. First, we recruited 4 evaluators with strong domain expertise to study the initial prototype. Heuristic violations were coded and rated for severity. Second, after development of the system, we assembled a panel of nurses, consisting of 3 licensed vocational nurses and 7 registered nurses, to evaluate the user interface and workflow via simulated use cases. We recorded whether each session was successfully completed and its completion time. Each nurse was asked to use the National Aeronautics and Space Administration (NASA) Task Load Index to self

  8. Evaluation of User Interface and Workflow Design of a Bedside Nursing Clinical Decision Support System

    PubMed Central

    Yuan, Michael Juntao; Finley, George Mike; Mills, Christy; Johnson, Ron Kim

    2013-01-01

    Background Clinical decision support systems (CDSS) are important tools to improve health care outcomes and reduce preventable medical adverse events. However, the effectiveness and success of CDSS depend on their implementation context and usability in complex health care settings. As a result, usability design and validation, especially in real world clinical settings, are crucial aspects of successful CDSS implementations. Objective Our objective was to develop a novel CDSS to help frontline nurses better manage critical symptom changes in hospitalized patients, hence reducing preventable failure to rescue cases. A robust user interface and implementation strategy that fit into existing workflows was key for the success of the CDSS. Methods Guided by a formal usability evaluation framework, UFuRT (user, function, representation, and task analysis), we developed a high-level specification of the product that captures key usability requirements and is flexible to implement. We interviewed users of the proposed CDSS to identify requirements, listed functions, and operations the system must perform. We then designed visual and workflow representations of the product to perform the operations. The user interface and workflow design were evaluated via heuristic and end user performance evaluation. The heuristic evaluation was done after the first prototype, and its results were incorporated into the product before the end user evaluation was conducted. First, we recruited 4 evaluators with strong domain expertise to study the initial prototype. Heuristic violations were coded and rated for severity. Second, after development of the system, we assembled a panel of nurses, consisting of 3 licensed vocational nurses and 7 registered nurses, to evaluate the user interface and workflow via simulated use cases. We recorded whether each session was successfully completed and its completion time. Each nurse was asked to use the National Aeronautics and Space Administration

  9. DIANA-microT web server v5.0: service integration into miRNA functional analysis workflows.

    PubMed

    Paraskevopoulou, Maria D; Georgakilas, Georgios; Kostoulas, Nikos; Vlachos, Ioannis S; Vergoulis, Thanasis; Reczko, Martin; Filippidis, Christos; Dalamagas, Theodore; Hatzigeorgiou, A G

    2013-07-01

    MicroRNAs (miRNAs) are small endogenous RNA molecules that regulate gene expression through mRNA degradation and/or translation repression, affecting many biological processes. The DIANA-microT web server (http://www.microrna.gr/webServer) is dedicated to miRNA target prediction/functional analysis and has been widely used by the scientific community since its initial launch in 2009. DIANA-microT v5.0, the new version of the microT server, has been significantly enhanced with an improved target prediction algorithm, DIANA-microT-CDS. It has been updated to incorporate miRBase version 18 and Ensembl version 69. The in silico-predicted miRNA-gene interactions in Homo sapiens, Mus musculus, Drosophila melanogaster and Caenorhabditis elegans exceed 11 million in total. The web server was completely redesigned to host a series of sophisticated workflows, which can be used directly from the on-line web interface, enabling users without the necessary bioinformatics infrastructure to perform advanced multi-step functional miRNA analyses. For instance, one available pipeline performs miRNA target prediction using different thresholds and meta-analysis statistics, followed by pathway enrichment analysis. DIANA-microT web server v5.0 also supports a complete integration with the Taverna Workflow Management System (WMS), using the in-house developed DIANA-Taverna Plug-in. This plug-in provides ready-to-use modules for miRNA target prediction and functional analysis, which can be used to form advanced high-throughput analysis pipelines.

  10. DIANA-microT web server v5.0: service integration into miRNA functional analysis workflows

    PubMed Central

    Paraskevopoulou, Maria D.; Georgakilas, Georgios; Kostoulas, Nikos; Vlachos, Ioannis S.; Vergoulis, Thanasis; Reczko, Martin; Filippidis, Christos; Dalamagas, Theodore; Hatzigeorgiou, A.G.

    2013-01-01

    MicroRNAs (miRNAs) are small endogenous RNA molecules that regulate gene expression through mRNA degradation and/or translation repression, affecting many biological processes. The DIANA-microT web server (http://www.microrna.gr/webServer) is dedicated to miRNA target prediction/functional analysis and has been widely used by the scientific community since its initial launch in 2009. DIANA-microT v5.0, the new version of the microT server, has been significantly enhanced with an improved target prediction algorithm, DIANA-microT-CDS. It has been updated to incorporate miRBase version 18 and Ensembl version 69. The in silico-predicted miRNA–gene interactions in Homo sapiens, Mus musculus, Drosophila melanogaster and Caenorhabditis elegans exceed 11 million in total. The web server was completely redesigned to host a series of sophisticated workflows, which can be used directly from the on-line web interface, enabling users without the necessary bioinformatics infrastructure to perform advanced multi-step functional miRNA analyses. For instance, one available pipeline performs miRNA target prediction using different thresholds and meta-analysis statistics, followed by pathway enrichment analysis. DIANA-microT web server v5.0 also supports a complete integration with the Taverna Workflow Management System (WMS), using the in-house developed DIANA-Taverna Plug-in. This plug-in provides ready-to-use modules for miRNA target prediction and functional analysis, which can be used to form advanced high-throughput analysis pipelines. PMID:23680784

  11. Ab initio chemical safety assessment: A workflow based on exposure considerations and non-animal methods.

    PubMed

    Berggren, Elisabet; White, Andrew; Ouedraogo, Gladys; Paini, Alicia; Richarz, Andrea-Nicole; Bois, Frederic Y; Exner, Thomas; Leite, Sofia; Grunsven, Leo A van; Worth, Andrew; Mahony, Catherine

    2017-11-01

    We describe and illustrate a workflow for chemical safety assessment that completely avoids animal testing. The workflow, which was developed within the SEURAT-1 initiative, is designed to be applicable to cosmetic ingredients as well as to other types of chemicals, e.g. active ingredients in plant protection products, biocides or pharmaceuticals. The aim of this work was to develop a workflow to assess chemical safety without relying on any animal testing, but instead by constructing a hypothesis based on existing data, in silico modelling and biokinetic considerations, and then by targeted non-animal testing. For illustrative purposes, we consider a hypothetical new ingredient x as a new component in a body lotion formulation. The workflow is divided into tiers in which points of departure are established through in vitro testing and in silico prediction, as the basis for estimating a safe external dose in a repeated use scenario. The workflow includes a series of possible exit (decision) points, with increasing levels of confidence, based on the sequential application of the Threshold of Toxicological Concern (TTC) approach and read-across, followed by an "ab initio" assessment, in which chemical safety is determined entirely by new in vitro testing and in vitro to in vivo extrapolation by means of mathematical modelling. We believe that this workflow could be applied as a tool to inform targeted and toxicologically relevant in vitro testing, where necessary, and to gain confidence in safety decision making without the need for animal testing.

  12. Experiences and lessons learned from creating a generalized workflow for data publication of field campaign datasets

    NASA Astrophysics Data System (ADS)

    Santhana Vannan, S. K.; Ramachandran, R.; Deb, D.; Beaty, T.; Wright, D.

    2017-12-01

    This paper summarizes the workflow challenges of curating and publishing data produced from disparate data sources and provides a generalized workflow solution to efficiently archive data generated by researchers. The Oak Ridge National Laboratory Distributed Active Archive Center (ORNL DAAC) for biogeochemical dynamics and the Global Hydrology Resource Center (GHRC) DAAC have been collaborating on the development of a generalized workflow solution to efficiently manage the data publication process. The generalized workflow presented here is built on lessons learned from implementations of the workflow system. Data publication consists of the following steps: (1) accepting the data package from the data providers and ensuring the full integrity of the data files; (2) identifying and addressing data quality issues; (3) assembling standardized, detailed metadata and documentation, including file-level details, processing methodology, and characteristics of data files; (4) setting up data access mechanisms; (5) setting up the data in data tools and services for improved data dissemination and user experience; (6) registering the dataset in online search and discovery catalogues; and (7) preserving the data location through Digital Object Identifiers (DOIs). We describe the steps taken to automate the above process and realize efficiencies, and the utilities developed to achieve these goals. The goals of the workflow system are to reduce the time taken to publish a dataset, to increase the quality of documentation and metadata, and to track individual datasets through the data curation process. We also share the metrics-driven value of the workflow system and discuss future steps towards the creation of a common software framework.
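
    As one hedged illustration of step (1) above, verifying the integrity of a submitted data package can be automated with a checksum manifest; the directory layout and manifest format below are assumptions made for the example, not the DAACs' actual tooling.

    ```python
    import hashlib
    from pathlib import Path

    def sha256sum(path, chunk=1 << 20):
        """Stream a file and return its SHA-256 hex digest."""
        h = hashlib.sha256()
        with open(path, "rb") as f:
            while block := f.read(chunk):
                h.update(block)
        return h.hexdigest()

    def verify_package(package_dir, manifest="manifest.sha256"):
        """Compare each '<digest>  <relative path>' manifest entry against the file on disk."""
        root = Path(package_dir)
        failures = []
        for line in (root / manifest).read_text().splitlines():
            digest, name = line.split(maxsplit=1)
            if sha256sum(root / name) != digest:
                failures.append(name)
        return failures

    # Hypothetical usage for an incoming submission:
    # print("corrupted files:", verify_package("incoming/campaign_2017_package") or "none")
    ```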

  13. Jflow: a workflow management system for web applications.

    PubMed

    Mariette, Jérôme; Escudié, Frédéric; Bardou, Philippe; Nabihoudine, Ibouniyamine; Noirot, Céline; Trotard, Marie-Stéphane; Gaspin, Christine; Klopp, Christophe

    2016-02-01

    Biologists produce large data sets and are in need of rich and simple web portals in which they can upload and analyze their files. Providing such tools requires masking the complexity of the underlying High Performance Computing (HPC) environment, and the connection between interface and computing infrastructure is usually specific to each portal. With Jflow, we introduce a Workflow Management System (WMS) composed of jQuery plug-ins, which can easily be embedded in any web application, and a Python library providing all the features required to set up, run, and monitor workflows. Jflow is available under the GNU General Public License (GPL) at http://bioinfo.genotoul.fr/jflow. The package comes with full documentation, a quick-start guide, and a running test portal.

  14. PREDON Scientific Data Preservation 2014

    NASA Astrophysics Data System (ADS)

    Diaconu, C.; Kraml, S.; Surace, C.; Chateigner, D.; Libourel, T.; Laurent, A.; Lin, Y.; Schaming, M.; Benbernou, S.; Lebbah, M.; Boucon, D.; Cérin, C.; Azzag, H.; Mouron, P.; Nief, J.-Y.; Coutin, S.; Beckmann, V.

    Scientific data collected with modern sensors or dedicated detectors very often exceed the perimeter of the initial scientific design. These data are obtained more and more frequently with large material and human effort. A large class of scientific experiments is in fact unique because of their large scale, with very small chances of being repeated or superseded by new experiments in the same domain: for instance, high energy physics and astrophysics experiments involve multi-annual developments, and a simple duplication of effort in order to reproduce old data is simply not affordable. Other scientific experiments are unique by nature, as in earth science or the medical sciences, since the collected data are "time-stamped" and thereby non-reproducible by new experiments or observations. In addition, scientific data collection has increased dramatically in recent years, contributing to the so-called "data deluge" and inviting common reflection in the context of "big data" investigations. The new knowledge obtained using these data should be preserved for the long term, such that access and re-use remain possible and enhance the initial investment. Data observatories, based on open-access policies and coupled with multi-disciplinary techniques for indexing and mining, may lead to truly new paradigms in science. It is therefore of utmost importance to pursue a coherent and vigorous approach to preserving scientific data for the long term. The preservation nevertheless remains a challenge due to the complexity of the data structures, the fragility of custom-made software environments, as well as the lack of rigorous approaches in workflows and algorithms. To address this challenge, the PREDON project was initiated in France in 2012 within the MASTODONS program, a Big Data scientific challenge initiated and supported by the Interdisciplinary Mission of the National Centre for Scientific Research (CNRS). PREDON is a study group formed by

  15. Using Workflow Diagrams to Address Hand Hygiene in Pediatric Long-Term Care Facilities

    PubMed Central

    Carter, Eileen J.; Cohen, Bevin; Murray, Meghan T.; Saiman, Lisa; Larson, Elaine L.

    2015-01-01

    Hand hygiene (HH) in pediatric long-term care settings has been found to be sub-optimal. Multidisciplinary teams at three pediatric long-term care facilities developed step-by-step workflow diagrams of commonly performed tasks highlighting HH opportunities. Diagrams were validated through observation of tasks and concurrent diagram assessment. Facility teams developed six workflow diagrams that underwent 22 validation observations. Four main themes emerged: 1) diagram specificity, 2) wording and layout, 3) timing of HH indications, and 4) environmental hygiene. The development of workflow diagrams is an opportunity to identify and address the complexity of HH in pediatric long-term care facilities. PMID:25773517

  16. An Efficient Workflow Environment to Support the Collaborative Development of Actionable Climate Information Using the NCAR Climate Risk Management Engine (CRMe)

    NASA Astrophysics Data System (ADS)

    Ammann, C. M.; Vigh, J. L.; Lee, J. A.

    2016-12-01

    Society's growing needs for robust and relevant climate information have fostered an explosion in tools and frameworks for processing climate projections. Many top-down workflows might be employed to generate sets of pre-computed data and plots, frequently served in a "loading-dock style" through a metadata-enabled search and discovery engine. Despite these increasing resources, the diverse needs of applications-driven projects often result in data processing workflow requirements that cannot be fully satisfied using past approaches. In parallel to the data processing challenges, the provision of climate information to users in a form that is also usable represents a formidable challenge of its own. Finally, many users have neither the time nor the desire to synthesize and distill massive volumes of climate information to find the relevant information for their particular application. All of these considerations call for new approaches to developing actionable climate information. CRMe seeks to bridge the gap between the diversity and richness of practitioners' bottom-up needs and the discrete, structured top-down workflows typically implemented for rapid delivery. Additionally, CRMe has implemented web-based data services capable of providing focused climate information in usable form for a given location, or as spatially aggregated information for entire regions or countries, following the needs of users and sectors. Making climate data actionable also involves summarizing and presenting it in concise and approachable ways. CRMe is developing the concept of dashboards, co-developed with the users, to condense the key information into a quick summary of the most relevant, curated climate data for a given discipline, application, or location, while still enabling users to efficiently conduct deeper discovery into rich datasets on an as-needed basis.

  17. Spherical: an iterative workflow for assembling metagenomic datasets.

    PubMed

    Hitch, Thomas C A; Creevey, Christopher J

    2018-01-24

    The consensus emerging from the study of microbiomes is that they are far more complex than previously thought, requiring better assemblies and increasingly deeper sequencing. However, current metagenomic assembly techniques regularly fail to incorporate all, or in some cases even the majority, of the sequence information generated for many microbiomes, negating this effort. This can especially bias the information gathered and the perceived importance of the minor taxa in a microbiome. We propose a simple but effective approach, implemented in Python, to address this problem. Based on an iterative methodology, our workflow (called Spherical) carries out successive rounds of assemblies with the sequencing reads not yet utilised. This approach also allows the user to reduce the resources required for very large datasets, by assembling random subsets of the whole in a "divide and conquer" manner. We demonstrate the accuracy of Spherical using simulated data based on completely sequenced genomes and the effectiveness of the workflow at retrieving lost information for taxa in three published metagenomics studies of varying sizes. Our results show that Spherical increased the amount of reads utilized in the assembly by up to 109% compared to the base assembly. The additional contigs assembled by the Spherical workflow resulted in significant (P < 0.05) changes in the predicted taxonomic profile of all datasets analysed. Spherical is implemented in Python 2.7 and freely available for use under the MIT license. Source code and documentation are hosted publicly at: https://github.com/thh32/Spherical.
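
    Spherical's actual implementation lives in the repository linked above; the stub below only sketches the iterative control flow described in the abstract (assemble, map reads back, carry the unmapped reads into the next round), with hypothetical wrapper functions standing in for the assembler and aligner calls.

    ```python
    # Conceptual sketch of an iterative assembly loop; not Spherical's code.
    def run_assembler(reads_fastq, out_prefix):
        """Assemble reads and return the path to the resulting contigs (stub)."""
        raise NotImplementedError

    def extract_unmapped(reads_fastq, contigs_fasta, out_fastq):
        """Align reads to contigs, write the unaligned ones, return how many remain (stub)."""
        raise NotImplementedError

    def iterative_assembly(reads_fastq, rounds=3, min_remaining=10_000):
        contig_sets, remaining = [], reads_fastq
        for i in range(rounds):
            contigs = run_assembler(remaining, out_prefix=f"round_{i}")
            contig_sets.append(contigs)
            unmapped = f"unmapped_round_{i}.fastq"
            if extract_unmapped(remaining, contigs, unmapped) < min_remaining:
                break                      # little sequence information left to recover
            remaining = unmapped           # next round assembles only the leftover reads
        return contig_sets
    ```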

  18. a Workflow for UAV's Integration Into a Geodesign Platform

    NASA Astrophysics Data System (ADS)

    Anca, P.; Calugaru, A.; Alixandroae, I.; Nazarie, R.

    2016-06-01

    This paper presents a workflow for the development of various Geodesign scenarios. The subject is important in the context of identifying patterns and designing solutions for a Smart City with optimized public transportation, efficient buildings, efficient utilities, recreational facilities, and so on. The workflow describes the procedures starting with acquiring data in the field, followed by data processing, orthophoto generation, DTM generation, integration into a GIS platform, and analysis to better support Geodesign. Esri's CityEngine is used mostly for its 3D modeling capabilities, which enable the user to obtain realistic 3D models. The workflow uses as inputs information extracted from images acquired using UAV technologies, namely eBee, existing 2D GIS geodatabases, and a set of CGA rules. The method used, called procedural modeling, applies rules to extrude buildings, the street network, parcel zoning, and side details based on the initial attributes from the geodatabase. The resulting products are various scenarios for redesign and for analyzing new exploitation sites. Finally, these scenarios can be published as interactive web scenes for internal, group, or public consultation. In this way, problems such as the impact of new constructions being built, re-arranging green spaces, or changing routes for public transportation are revealed through impact, visibility, or shadowing analysis and are brought to the citizens' attention. This leads to better decisions.

  19. ESO Reflex: a graphical workflow engine for data reduction

    NASA Astrophysics Data System (ADS)

    Hook, Richard; Ullgrén, Marko; Romaniello, Martino; Maisala, Sami; Oittinen, Tero; Solin, Otto; Savolainen, Ville; Järveläinen, Pekka; Tyynelä, Jani; Péron, Michèle; Ballester, Pascal; Gabasch, Armin; Izzo, Carlo

    ESO Reflex is a prototype software tool that provides a novel approach to astronomical data reduction by integrating a modern graphical workflow system (Taverna) with existing legacy data reduction algorithms. Most of the raw data produced by instruments at the ESO Very Large Telescope (VLT) in Chile are reduced using recipes. These are compiled C applications following an ESO standard and utilising routines provided by the Common Pipeline Library (CPL). Currently these are run in batch mode as part of the data flow system to generate the input to the ESO/VLT quality control process and are also exported for use offline. ESO Reflex can invoke CPL-based recipes in a flexible way through a general purpose graphical interface. ESO Reflex is based on the Taverna system that was originally developed within the UK life-sciences community. Workflows have been created so far for three VLT/VLTI instruments, and the GUI allows the user to make changes to these or create workflows of their own. Python scripts or IDL procedures can be easily brought into workflows and a variety of visualisation and display options, including custom product inspection and validation steps, are available. Taverna is intended for use with web services and experiments using ESO Reflex to access Virtual Observatory web services have been successfully performed. ESO Reflex is the main product developed by Sampo, a project led by ESO and conducted by a software development team from Finland as an in-kind contribution to joining ESO. The goal was to look into the needs of the ESO community in the area of data reduction environments and to create pilot software products that illustrate critical steps along the road to a new system. Sampo concluded early in 2008. This contribution will describe ESO Reflex and show several examples of its use both locally and using Virtual Observatory remote web services. ESO Reflex is expected to be released to the community in early 2009.

  20. ESO Reflex: A Graphical Workflow Engine for Data Reduction

    NASA Astrophysics Data System (ADS)

    Hook, R.; Romaniello, M.; Péron, M.; Ballester, P.; Gabasch, A.; Izzo, C.; Ullgrén, M.; Maisala, S.; Oittinen, T.; Solin, O.; Savolainen, V.; Järveläinen, P.; Tyynelä, J.

    2008-08-01

    Sampo {http://www.eso.org/sampo} (Hook et al. 2005) is a project led by ESO and conducted by a software development team from Finland as an in-kind contribution to joining ESO. The goal is to assess the needs of the ESO community in the area of data reduction environments and to create pilot software products that illustrate critical steps along the road to a new system. Those prototypes will not only be used to validate concepts and understand requirements but will also be tools of immediate value for the community. Most of the raw data produced by ESO instruments can be reduced using CPL {http://www.eso.org/cpl} recipes: compiled C programs following an ESO standard and utilizing routines provided by the Common Pipeline Library. Currently reduction recipes are run in batch mode as part of the data flow system to generate the input to the ESO VLT/VLTI quality control process and are also made public for external users. Sampo has developed a prototype application called ESO Reflex {http://www.eso.org/sampo/reflex/} that integrates a graphical user interface and existing data reduction algorithms. ESO Reflex can invoke CPL-based recipes in a flexible way through a dedicated interface. ESO Reflex is based on the graphical workflow engine Taverna {http://taverna.sourceforge.net} that was originally developed by the UK eScience community, mostly for work in the life sciences. Workflows have been created so far for three VLT/VLTI instrument modes ( VIMOS/IFU {http://www.eso.org/instruments/vimos/}, FORS spectroscopy {http://www.eso.org/instruments/fors/} and AMBER {http://www.eso.org/instruments/amber/}), and the easy-to-use GUI allows the user to make changes to these or create workflows of their own. Python scripts and IDL procedures can be easily brought into workflows and a variety of visualisation and display options, including custom product inspection and validation steps, are available.

  1. Large datasets, logistics, sharing and workflow in screening.

    PubMed

    Cook, Tessa S

    2018-03-29

    Cancer screening initiatives exist around the world for different malignancies, most frequently breast, colorectal, and cervical cancer. A number of cancer registries exist to collect relevant data, but while these data may include imaging findings, they rarely, if ever, include actual images. Additionally, the data submitted to the registry are usually correlated with eventual cancer diagnoses and patient outcomes, rather than used with the individual's future screenings. Developing screening programs that allow for images to be submitted to a central location in addition to patient metadata and used for comparison to future screening exams would be very valuable in increasing access to care and ensuring that individuals are effectively screened at appropriate intervals. It would also change the way imaging results and additional patient data are correlated to eventual outcomes. However, it introduces logistical challenges surrounding secure storage and transmission of data to subsequent screening sites. In addition, in the absence of standardized protocols for screening, comparing current and prior imaging, especially from different equipment, can be challenging. Implementing a large-scale screening program with an image-enriched screening registry (effectively, an image-enriched electronic screening record) also requires that incentives exist for screening sites, physicians, and patients to participate; to maximize coverage, participation may have to be supported by government agencies. Workflows will also have to be adjusted to support registry participation for all screening patients in an effort to create a large, robust data set that can be used for future screening efforts as well as research initiatives.

  2. Successful adaption of a forensic toxicological screening workflow employing nontargeted liquid chromatography-tandem mass spectrometry to water analysis.

    PubMed

    Steger, Julia; Arnhard, Kathrin; Haslacher, Sandra; Geiger, Klemens; Singer, Klaus; Schlapp, Michael; Pitterl, Florian; Oberacher, Herbert

    2016-04-01

    Forensic toxicology and environmental water analysis share a common interest in, and responsibility for, ensuring comprehensive and reliable confirmation of drugs and pharmaceutical compounds in the samples analyzed. Because they deal with similar analytes, detection and identification techniques should be exchangeable between the two disciplines. Herein, we demonstrate the successful adaption of a forensic toxicological screening workflow employing nontargeted LC/MS/MS under data-dependent acquisition control and subsequent database search to water analysis. The main modification involved processing an increased sample volume with SPE (500 mL vs. 1-10 mL) to reach LODs in the low ng/L range. Tandem mass spectra acquired with a qTOF instrument were submitted to database search. The targeted data mining strategy was found to be sensitive and specific; automated search produced hardly any false results. To demonstrate the applicability of the adapted workflow to complex samples, 14 wastewater effluent samples collected on seven consecutive days at the local wastewater-treatment plant were analyzed. Of the 88,970 fragment ion mass spectra produced, 8.8% were successfully assigned to one of the 1040 reference compounds included in the database, and this enabled the identification of 51 compounds representing important illegal drugs, members of various pharmaceutical compound classes, and metabolites thereof.

  3. Improving data collection, documentation, and workflow in a dementia screening study.

    PubMed

    Read, Kevin B; LaPolla, Fred Willie Zametkin; Tolea, Magdalena I; Galvin, James E; Surkis, Alisa

    2017-04-01

    A clinical study team performing three multicultural dementia screening studies identified the need to improve data management practices and facilitate data sharing. A collaboration was initiated with librarians as part of the National Library of Medicine (NLM) informationist supplement program. The librarians identified areas for improvement in the studies' data collection, entry, and processing workflows. The librarians' role in this project was to meet needs expressed by the study team around improving data collection and processing workflows to increase study efficiency and ensure data quality. The librarians addressed the data collection, entry, and processing weaknesses through standardizing and renaming variables, creating an electronic data capture system using REDCap, and developing well-documented, reproducible data processing workflows. NLM informationist supplements provide librarians with valuable experience in collaborating with study teams to address their data needs. For this project, the librarians gained skills in project management, REDCap, and understanding of the challenges and specifics of a clinical research study. However, the time and effort required to provide targeted and intensive support for one study team was not scalable to the library's broader user community.
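
    As a hedged illustration of what reproducible data processing against such a REDCap project might look like (assuming the community PyCap client; the API URL and token below are placeholders, and this is not the team's code):

    ```python
    from redcap import Project   # PyCap client: pip install pycap

    API_URL = "https://redcap.example.edu/api/"     # placeholder endpoint
    API_TOKEN = "REPLACE_WITH_PROJECT_TOKEN"        # per-project token issued by REDCap

    project = Project(API_URL, API_TOKEN)
    records = project.export_records()              # list of dicts, one per record
    print(f"exported {len(records)} records")
    ```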

  4. Wireless remote control of clinical image workflow: using a PDA for off-site distribution and disaster recovery.

    PubMed

    Documet, Jorge; Liu, Brent J; Documet, Luis; Huang, H K

    2006-07-01

    This paper describes a picture archiving and communication system (PACS) tool based on Web technology that remotely manages medical images between a PACS archive and remote destinations. Successfully implemented in a clinical environment and also demonstrated for the past 3 years at the conferences of various organizations, including the Radiological Society of North America, this tool provides a very practical and simple way to manage a PACS, including off-site image distribution and disaster recovery. The application is robust and flexible and can be used on a standard PC workstation or a Tablet PC, but more important, it can be used with a personal digital assistant (PDA). With a PDA, the Web application becomes a powerful wireless and mobile image management tool. The application's quick and easy-to-use features allow users to perform Digital Imaging and Communications in Medicine (DICOM) queries and retrievals with a single interface, without having to worry about the underlying configuration of DICOM nodes. In addition, this frees up dedicated PACS workstations to perform their specialized roles within the PACS workflow. This tool has been used at Saint John's Health Center in Santa Monica, California, for 2 years. The average number of queries per month is 2,021, with 816 C-MOVE retrieve requests. Clinical staff members can use PDAs to manage image workflow and PACS examination distribution conveniently for off-site consultations by referring physicians and radiologists and for disaster recovery. This solution also improves radiologists' effectiveness and efficiency in health care delivery both within radiology departments and for off-site clinical coverage.
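
    Underneath such a tool, a study-level DICOM C-FIND against the archive is the basic building block; as an illustration only (not the authors' implementation), a query with the pynetdicom library might look like the sketch below, where the host, port, AE title, and patient ID are placeholders.

    ```python
    from pydicom.dataset import Dataset
    from pynetdicom import AE
    from pynetdicom.sop_class import StudyRootQueryRetrieveInformationModelFind

    ae = AE(ae_title="WORKSTATION")                       # placeholder calling AE title
    ae.add_requested_context(StudyRootQueryRetrieveInformationModelFind)

    query = Dataset()
    query.QueryRetrieveLevel = "STUDY"
    query.PatientID = "12345"                             # placeholder patient identifier
    query.StudyInstanceUID = ""                           # ask the archive to return this tag

    assoc = ae.associate("pacs.example.org", 104)         # placeholder PACS host and port
    if assoc.is_established:
        responses = assoc.send_c_find(query, StudyRootQueryRetrieveInformationModelFind)
        for status, identifier in responses:
            if status and status.Status in (0xFF00, 0xFF01):   # "pending" means a match
                print(identifier.StudyInstanceUID)
        assoc.release()
    ```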

  5. Launching an EarthCube Interoperability Workbench for Constructing Workflows and Employing Service Interfaces

    NASA Astrophysics Data System (ADS)

    Fulker, D. W.; Pearlman, F.; Pearlman, J.; Arctur, D. K.; Signell, R. P.

    2016-12-01

    A major challenge for geoscientists, and a key motivation for the National Science Foundation's EarthCube initiative, is to integrate data across disciplines, as is necessary for complex Earth-system studies such as climate change. The attendant technical and social complexities have led EarthCube participants to devise a system-of-systems architectural concept. Its centerpiece is a (virtual) interoperability workbench, around which a learning community can coalesce, supported in their evolving quests to join data from diverse sources, to synthesize new forms of data depicting Earth phenomena, and to overcome immense obstacles that arise, for example, from mismatched nomenclatures, projections, mesh geometries and spatial-temporal scales. The full architectural concept will require significant time and resources to implement, but this presentation describes a (minimal) starter kit. With a keep-it-simple mantra, this workbench starter kit can fulfill the following four objectives: 1) demonstrate the feasibility of an interoperability workbench by mid-2017; 2) showcase scientifically useful examples of cross-domain interoperability, drawn, e.g., from funded EarthCube projects; 3) highlight selected aspects of EarthCube's architectural concept, such as a system of systems (SoS) linked via service interfaces; 4) demonstrate how workflows can be designed and used in a manner that enables sharing, promotes collaboration and fosters learning. The outcome, despite its simplicity, will embody service interfaces sufficient to construct, from extant components, data-integration and data-synthesis workflows involving multiple geoscience domains. Tentatively, the starter kit will build on the Jupyter Notebook web application, augmented with libraries for interfacing current services (e.g., at data centers involved in EarthCube's Council of Data Facilities) and services developed specifically for EarthCube and spanning most geoscience domains.

  6. Predictive QSAR modeling workflow, model applicability domains, and virtual screening.

    PubMed

    Tropsha, Alexander; Golbraikh, Alexander

    2007-01-01

    Quantitative Structure Activity Relationship (QSAR) modeling has traditionally been applied as an evaluative approach, i.e., with the focus on developing retrospective and explanatory models of existing data. Model extrapolation was considered, if at all, only in a hypothetical sense, in terms of potential modifications of known biologically active chemicals that could improve compounds' activity. This critical review re-examines the strategy and the output of modern QSAR modeling approaches. We provide examples and arguments suggesting that current methodologies may afford robust and validated models capable of accurate prediction of compound properties for molecules not included in the training sets. We discuss a data-analytical modeling workflow developed in our laboratory that incorporates modules for combinatorial QSAR model development (i.e., using all possible binary combinations of available descriptor sets and statistical data modeling techniques), rigorous model validation, and virtual screening of available chemical databases to identify novel biologically active compounds. Our approach places particular emphasis on model validation as well as the need to define model applicability domains in the chemistry space. We present examples of studies where the application of rigorously validated QSAR models to virtual screening identified computational hits that were confirmed by subsequent experimental investigations. The emerging focus of QSAR modeling on target property forecasting brings it forward as a predictive, as opposed to an evaluative, modeling approach.
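
    A minimal sketch of the model-building-plus-validation pattern the review advocates (cross-validation, an external test set, and a crude applicability-domain check) is shown below with scikit-learn; the random descriptor matrix and the distance-based domain rule are stand-ins chosen for the example, not the authors' combinatorial workflow.

    ```python
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import cross_val_score, train_test_split

    rng = np.random.default_rng(1)
    X = rng.normal(size=(300, 20))                                   # stand-in descriptors
    y = 2.0 * X[:, 0] - X[:, 3] + rng.normal(scale=0.3, size=300)    # stand-in activity

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

    model = RandomForestRegressor(n_estimators=200, random_state=0)
    print("5-fold CV R2:", cross_val_score(model, X_train, y_train, cv=5, scoring="r2").mean())
    model.fit(X_train, y_train)
    print("external R2:", model.score(X_test, y_test))

    # Crude applicability-domain check: flag test compounds far from the training data.
    centroid = X_train.mean(axis=0)
    radius = np.percentile(np.linalg.norm(X_train - centroid, axis=1), 95)
    inside = np.linalg.norm(X_test - centroid, axis=1) <= radius
    print(f"{inside.mean():.0%} of test compounds fall inside the applicability domain")
    ```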

  7. Characterizing Strain Variation in Engineered E. coli Using a Multi-Omics-Based Workflow

    DOE PAGES

    Brunk, Elizabeth; George, Kevin W.; Alonso-Gutierrez, Jorge; ...

    2016-05-19

    Understanding the complex interactions that occur between heterologous and native biochemical pathways represents a major challenge in metabolic engineering and synthetic biology. We present a workflow that integrates metabolomics, proteomics, and genome-scale models of Escherichia coli metabolism to study the effects of introducing a heterologous pathway into a microbial host. This workflow incorporates complementary approaches from computational systems biology, metabolic engineering, and synthetic biology; provides molecular insight into how the host organism microenvironment changes due to pathway engineering; and demonstrates how biological mechanisms underlying strain variation can be exploited as an engineering strategy to increase product yield. As a proof of concept, we present the analysis of eight engineered strains producing three biofuels: isopentenol, limonene, and bisabolene. Application of this workflow identified the roles of candidate genes, pathways, and biochemical reactions in observed experimental phenomena and facilitated the construction of a mutant strain with improved productivity. The contributed workflow is available as an open-source tool in the form of iPython notebooks.

  8. Enhancing population pharmacokinetic modeling efficiency and quality using an integrated workflow.

    PubMed

    Schmidt, Henning; Radivojevic, Andrijana

    2014-08-01

    Population pharmacokinetic (popPK) analyses are at the core of Pharmacometrics and need to be performed regularly. Although these analyses are relatively standard, a large variability can be observed in both the time (efficiency) and the way they are performed (quality). Main reasons for this variability include the modeler's level of experience, personal preferences, and tools. This paper aims to examine how the process of popPK model building can be supported in order to increase its efficiency and quality. The presented approach to the conduct of popPK analyses is centered around three key components: (1) identification of the most common and important popPK model features, (2) the required information content and formatting of the data for modeling, and (3) methodology, workflow, and workflow-supporting tools. This approach has been used in several popPK modeling projects, and a documented example is provided in the supplementary material. Efficiency of model building is improved by avoiding repetitive coding and other labor-intensive tasks and by putting the emphasis on a fit-for-purpose model. Quality is improved by ensuring that the workflow and tools are in alignment with popPK modeling guidance established within an organization. The main conclusion of this paper is that workflow-based approaches to popPK modeling are feasible and have significant potential to improve its various aspects. However, the implementation of such an approach in a pharmacometric organization requires openness towards innovation and change, the key ingredient for the evolution of integrative and quantitative drug development in the pharmaceutical industry.

  9. A computational workflow for designing silicon donor qubits

    DOE PAGES

    Humble, Travis S.; Ericson, M. Nance; Jakowski, Jacek; ...

    2016-09-19

    Developing devices that can reliably and accurately demonstrate the principles of superposition and entanglement is an on-going challenge for the quantum computing community. Modeling and simulation offer attractive means of testing early device designs and establishing expectations for operational performance. However, the complex integrated material systems required by quantum device designs are not captured by any single existing computational modeling method. We examine the development and analysis of a multi-staged computational workflow that can be used to design and characterize silicon donor qubit systems with modeling and simulation. Our approach integrates quantum chemistry calculations with electrostatic field solvers to perform detailed simulations of a phosphorus dopant in silicon. We show how atomistic details can be synthesized into an operational model for the logical gates that define quantum computation in this particular technology. In conclusion, the resulting computational workflow realizes a design tool for silicon donor qubits that can help verify and validate current and near-term experimental devices.

  10. Workflow in interventional radiology: uterine fibroid embolization (UFE)

    NASA Astrophysics Data System (ADS)

    Lindisch, David; Neumuth, Thomas; Burgert, Oliver; Spies, James; Cleary, Kevin

    2008-03-01

    Workflow analysis can be used to record the steps taken during clinical interventions with the goal of identifying bottlenecks and improving procedure efficiency. In this study, we recorded the workflow for uterine fibroid embolization (UFE) procedures in the interventional radiology suite at Georgetown University Hospital in Washington, DC, USA. We employed a custom client/server software architecture developed by the Innovation Center for Computer Assisted Surgery (ICCAS) at the University of Leipzig, Germany. This software runs in a Java environment and enables an observer to record the actions taken by the physician and surgical team during these interventions. The recorded data are stored as an XML document, which can then be further processed. We recorded data from 30 patients and found a mean intervention time of 1 h 49 min 46 s (+/- 16 min 4 s). The critical intervention step, the embolization, had a mean time of 15 min 42 s (+/- 5 min 49 s), which was only about 15% of the total intervention time.
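
    The ICCAS recording schema is not given in the abstract, so the element and attribute names below are assumptions; the sketch only shows how per-step durations and their means could be computed from XML records of this kind.

    ```python
    # Assumed schema: <procedure><step name="embolization" start="..." stop="..."/></procedure>
    import xml.etree.ElementTree as ET
    from collections import defaultdict
    from datetime import datetime
    from statistics import mean

    def step_durations(xml_path, fmt="%Y-%m-%dT%H:%M:%S"):
        durations = defaultdict(list)
        for step in ET.parse(xml_path).getroot().iter("step"):
            start = datetime.strptime(step.get("start"), fmt)
            stop = datetime.strptime(step.get("stop"), fmt)
            durations[step.get("name")].append((stop - start).total_seconds())
        return durations

    # Aggregate over many recorded cases to get mean step times:
    # merged = defaultdict(list)
    # for path in ["case01.xml", "case02.xml"]:
    #     for name, secs in step_durations(path).items():
    #         merged[name].extend(secs)
    # for name, secs in merged.items():
    #     print(f"{name}: mean {mean(secs) / 60:.1f} min over {len(secs)} cases")
    ```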

  11. Enabling a Scientific Cloud Marketplace: VGL (Invited)

    NASA Astrophysics Data System (ADS)

    Fraser, R.; Woodcock, R.; Wyborn, L. A.; Vote, J.; Rankine, T.; Cox, S. J.

    2013-12-01

    The Virtual Geophysics Laboratory (VGL) provides a flexible, web-based environment where researchers can browse data and use a variety of scientific software packaged into toolkits that run in the Cloud. Both data and toolkits are published by multiple researchers and registered with the VGL infrastructure, forming a data and application marketplace. The VGL provides the basic workflow of Discovery and Access to the disparate data sources and a Library for toolkits and scripting to drive the scientific codes. Computation is then performed on the Research or Commercial Clouds. Provenance information is collected throughout the workflow and can be published alongside the results, allowing for experiment comparison and sharing with other researchers. VGL's "mix and match" approach to data, computational resources and scientific codes enables a dynamic approach to scientific collaboration. VGL allows scientists to publish their specific contribution, be it data, code, compute or workflow, knowing the VGL framework will provide the other components needed for a complete application. Other scientists can choose the pieces that suit them best to assemble an experiment. The coarse-grain workflow of the VGL framework combined with the flexibility of the scripting library and computational toolkits allows for significant customisation and sharing amongst the community. The VGL utilises the cloud computational and storage resources from the Australian academic research cloud provided by the NeCTAR initiative and a large variety of data accessible from national and state agencies via the Spatial Information Services Stack (SISS - http://siss.auscope.org). VGL v1.2 screenshot - http://vgl.auscope.org

  12. Use of contextual inquiry to understand anatomic pathology workflow: Implications for digital pathology adoption

    PubMed Central

    Ho, Jonhan; Aridor, Orly; Parwani, Anil V.

    2012-01-01

    Background: For decades, the anatomic pathology (AP) workflow has been a highly manual process based on the use of an optical microscope and glass slides. Recent innovations in scanning and digitizing of entire glass slides are accelerating a move toward widespread adoption and implementation of a workflow based on digital slides and their supporting information management software. To support the design of digital pathology systems and ensure their adoption into pathology practice, the needs of the main users within the AP workflow, the pathologists, should be identified. Contextual inquiry is a qualitative, user-centered, social method designed to identify and understand users’ needs and is utilized for collecting, interpreting, and aggregating detailed aspects of work. Objective: Contextual inquiry was utilized to document the current AP workflow, identify processes that may benefit from the introduction of digital pathology systems, and establish design requirements for digital pathology systems that will meet pathologists’ needs. Materials and Methods: Pathologists were observed and interviewed at a large academic medical center according to the contextual inquiry guidelines established by Holtzblatt et al. 1998. Notes representing user-provided data were documented during observation sessions. An affinity diagram, a hierarchical organization of the notes based on common themes in the data, was created. Five graphical models were developed to help visualize the data, including sequence, flow, artifact, physical, and cultural models. Results: A total of six pathologists were observed by a team of two researchers. A total of 254 affinity notes were documented and organized using a system based on topical hierarchy, including 75 third-level, 24 second-level, and five main-level categories, including technology, communication, synthesis/preparation, organization, and workflow. The current AP workflow was labor intensive and lacked scalability. A large number of processes that

  13. SynTrack: DNA Assembly Workflow Management (SynTrack) v2.0.1

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    MENG, XIANWEI; SIMIRENKO, LISA

    2016-12-01

    SynTrack is a dynamic, workflow-driven data management system that tracks the DNA build process: management of the hierarchical relationships of the DNA fragments; monitoring of process tasks for the assembly of multiple DNA fragments into final constructs; creation of vendor order forms with selectable building blocks; organization of plate layouts and barcodes for vendor/PCR/fusion/chewback/bioassay/glycerol/master plate maps (default/condensed); creation or updating of pre-assembly/assembly process workflows with selected building blocks; generation of Echo pooling instructions based on plate maps; tracking of building-block orders from receipt through final assembly and delivery; bulk updating of colony or PCR amplification information, fusion PCR and chewback results; updating of QA/QC outcomes from .csv and .xlsx template files; re-work of assembly workflows before and after sequencing validation; and tracking of plate/well data changes and status updates, with reporting of master plate status and QC outcomes.
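
    A minimal Python sketch of the kind of plate bookkeeping described above: laying out building-block barcodes onto a 96-well plate map (rows A-H, columns 1-12). The barcode values and layout order are illustrative assumptions, not SynTrack's actual plate formats.

      # Minimal sketch: lay out building-block barcodes onto a 96-well plate map.
      # Barcode values are placeholders.
      import string

      def plate_map(barcodes, rows=string.ascii_uppercase[:8], columns=range(1, 13)):
          wells = [f"{r}{c}" for r in rows for c in columns]
          if len(barcodes) > len(wells):
              raise ValueError("more barcodes than wells on a 96-well plate")
          return dict(zip(wells, barcodes))

      if __name__ == "__main__":
          layout = plate_map([f"BB-{i:05d}" for i in range(24)])  # placeholder barcodes
          print(layout["A1"], layout["B12"])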

  14. iLAP: a workflow-driven software for experimental protocol development, data acquisition and analysis

    PubMed Central

    2009-01-01

    Background In recent years, the genome biology community has expended considerable effort to confront the challenges of managing heterogeneous data in a structured and organized way and has developed laboratory information management systems (LIMS) for both raw and processed data. On the other hand, electronic notebooks were developed to record and manage scientific data and facilitate data sharing. Software that enables both the management of large datasets and the digital recording of laboratory procedures would serve a real need in laboratories using medium- and high-throughput techniques. Results We have developed iLAP (Laboratory data management, Analysis, and Protocol development), a workflow-driven information management system specifically designed to create and manage experimental protocols and to analyze and share laboratory data. The system combines experimental protocol development, wizard-based data acquisition, and high-throughput data analysis into a single, integrated system. We demonstrate the power and the flexibility of the platform using a microscopy case study based on a combinatorial multiple fluorescence in situ hybridization (m-FISH) protocol and 3D-image reconstruction. iLAP is freely available under the open source license AGPL from http://genome.tugraz.at/iLAP/. Conclusion iLAP is a flexible and versatile information management system, which has the potential to close the gap between electronic notebooks and LIMS and can therefore be of great value for a broad scientific community. PMID:19941647

  15. The need for scientific software engineering in the pharmaceutical industry

    NASA Astrophysics Data System (ADS)

    Luty, Brock; Rose, Peter W.

    2017-03-01

    Scientific software engineering is a distinct discipline from both computational chemistry project support and research informatics. A scientific software engineer not only has a deep understanding of the science of drug discovery but also the desire, skills and time to apply good software engineering practices. A good team of scientific software engineers can create a software foundation that is maintainable, validated and robust. If done correctly, this foundation enables the organization to investigate new and novel computational ideas with a very high level of efficiency.

  16. The need for scientific software engineering in the pharmaceutical industry.

    PubMed

    Luty, Brock; Rose, Peter W

    2017-03-01

    Scientific software engineering is a distinct discipline from both computational chemistry project support and research informatics. A scientific software engineer not only has a deep understanding of the science of drug discovery but also the desire, skills and time to apply good software engineering practices. A good team of scientific software engineers can create a software foundation that is maintainable, validated and robust. If done correctly, this foundation enables the organization to investigate new and novel computational ideas with a very high level of efficiency.

  17. Enabling Efficient Climate Science Workflows in High Performance Computing Environments

    NASA Astrophysics Data System (ADS)

    Krishnan, H.; Byna, S.; Wehner, M. F.; Gu, J.; O'Brien, T. A.; Loring, B.; Stone, D. A.; Collins, W.; Prabhat, M.; Liu, Y.; Johnson, J. N.; Paciorek, C. J.

    2015-12-01

    A typical climate science workflow often involves a combination of acquisition of data, modeling, simulation, analysis, visualization, publishing, and storage of results. Each of these tasks provides a myriad of challenges when running in a high performance computing environment such as Hopper or Edison at NERSC. Hurdles such as data transfer and management, job scheduling, parallel analysis routines, and publication require a lot of forethought and planning to ensure that proper quality control mechanisms are in place. These steps require effectively utilizing a combination of well-tested and newly developed functionality to move data, perform analysis, apply statistical routines, and finally, serve results and tools to the greater scientific community. As part of the CAlibrated and Systematic Characterization, Attribution and Detection of Extremes (CASCADE) project we highlight a stack of tools our team utilizes and has developed to ensure that large-scale simulation and analysis work is commonplace, providing operations that assist in everything from generation/procurement of data (HTAR/Globus) to automating publication of results to portals like the Earth System Grid Federation (ESGF), all while executing everything in between in a scalable environment in a task-parallel way (MPI). We highlight the use and benefit of these tools by showing several climate science analysis use cases they have been applied to.
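
    As a minimal illustration of the task-parallel (MPI) execution mentioned above, the following mpi4py sketch distributes independent per-file analysis tasks across ranks; the file list and the analyze() body are placeholders, not the CASCADE tool stack.

      # Minimal sketch of task parallelism with MPI: each rank processes its own
      # share of an input file list independently.
      # Run with, e.g.:  mpirun -n 8 python task_parallel.py
      from mpi4py import MPI

      def analyze(path):
          # stand-in for a per-file climate analysis routine
          return f"processed {path}"

      def main():
          comm = MPI.COMM_WORLD
          rank, size = comm.Get_rank(), comm.Get_size()
          files = [f"sim_output_{i:04d}.nc" for i in range(100)]  # placeholder inputs
          local_results = [analyze(f) for f in files[rank::size]]  # round-robin split
          results = comm.gather(local_results, root=0)
          if rank == 0:
              flat = [r for chunk in results for r in chunk]
              print(f"collected {len(flat)} results")

      if __name__ == "__main__":
          main()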

  18. PyGOLD: a python based API for docking based virtual screening workflow generation.

    PubMed

    Patel, Hitesh; Brinkjost, Tobias; Koch, Oliver

    2017-08-15

    Molecular docking is one of the successful approaches in structure-based discovery and development of bioactive molecules in chemical biology and medicinal chemistry. Due to the huge amount of computational time that is still required, docking is often the last step in a virtual screening approach. Such screenings are set up as workflows spanning many steps, each aiming at a different filtering task. These workflows can be automated in large parts using Python-based toolkits, except for docking using the docking software GOLD. However, within an automated virtual screening workflow it is not feasible to use the GUI in between every step to change the GOLD configuration file. Thus, a Python module called PyGOLD was developed to parse, edit and write the GOLD configuration file and to automate docking-based virtual screening workflows. The latest version of PyGOLD, its documentation and example scripts are available at: http://www.ccb.tu-dortmund.de/koch or http://www.agkoch.de. PyGOLD is implemented in Python and can be imported as a standard Python module without any further dependencies. oliver.koch@agkoch.de, oliver.koch@tu-dortmund.de. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com
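
    Since PyGOLD's own API is not reproduced here, the following minimal Python sketch only illustrates the general idea of automating GOLD runs by treating the configuration file as a text template and writing one edited copy per ligand batch; the placeholder token and file names are assumptions for illustration.

      # Minimal sketch (not PyGOLD's actual API): treat the GOLD configuration file
      # as a text template and write one edited copy per ligand batch, so docking
      # can be launched on each copy without opening the GUI.
      from pathlib import Path

      def write_batch_configs(template_path, ligand_files, out_dir="gold_runs"):
          template = Path(template_path).read_text()
          Path(out_dir).mkdir(exist_ok=True)
          configs = []
          for i, ligand in enumerate(ligand_files):
              conf = template.replace("{LIGAND_FILE}", ligand)  # hypothetical token
              conf_path = Path(out_dir) / f"gold_{i:03d}.conf"
              conf_path.write_text(conf)
              configs.append(conf_path)
          return configs

      if __name__ == "__main__":
          confs = write_batch_configs("gold_template.conf",
                                      ["batch_000.mol2", "batch_001.mol2"])
          print("wrote", len(confs), "configuration files")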

  19. Understanding the dispensary workflow at the Birmingham Free Clinic: a proposed framework for an informatics intervention.

    PubMed

    Fisher, Arielle M; Herbert, Mary I; Douglas, Gerald P

    2016-02-19

    The Birmingham Free Clinic (BFC) in Pittsburgh, Pennsylvania, USA is a free, walk-in clinic that serves medically uninsured populations through the use of volunteer health care providers and an on-site medication dispensary. The introduction of an electronic medical record (EMR) has improved several aspects of clinic workflow. However, pharmacists' tasks involving medication management and dispensing have become more challenging since EMR implementation due to its inability to support workflows between the medical and pharmaceutical services. To inform the design of a systematic intervention, we conducted a needs assessment study to identify workflow challenges and process inefficiencies in the dispensary. We used contextual inquiry to document the dispensary workflow and facilitate identification of critical aspects of intervention design specific to the user. Pharmacists were observed according to contextual inquiry guidelines. Graphical models were produced to aid data and process visualization. We created a list of themes describing workflow challenges and asked the pharmacists to rank them in order of significance to narrow the scope of intervention design. Three pharmacists were observed at the BFC. Observer notes were documented and analyzed to produce 13 themes outlining the primary challenges pharmacists encounter during dispensation at the BFC. The dispensary workflow is labor intensive, redundant, and inefficient when integrated with the clinical service. Observations identified inefficiencies that may benefit from the introduction of informatics interventions including: medication labeling, insufficient process notification, triple documentation, and inventory control. We propose a system for Prescription Management and General Inventory Control (RxMAGIC). RxMAGIC is a framework designed to mitigate workflow challenges and improve the processes of medication management and inventory control. While RxMAGIC is described in the context of the BFC

  20. Differential gene expression in the siphonophore Nanomia bijuga (Cnidaria) assessed with multiple next-generation sequencing workflows.

    PubMed

    Siebert, Stefan; Robinson, Mark D; Tintori, Sophia C; Goetz, Freya; Helm, Rebecca R; Smith, Stephen A; Shaner, Nathan; Haddock, Steven H D; Dunn, Casey W

    2011-01-01

    We investigated differential gene expression between functionally specialized feeding polyps and swimming medusae in the siphonophore Nanomia bijuga (Cnidaria) with a hybrid long-read/short-read sequencing strategy. We assembled a set of partial gene reference sequences from long-read data (Roche 454), and generated short-read sequences from replicated tissue samples that were mapped to the references to quantify expression. We collected and compared expression data with three short-read expression workflows that differ in sample preparation, sequencing technology, and mapping tools. These workflows were Illumina mRNA-Seq, which generates sequence reads from random locations along each transcript, and two tag-based approaches, SOLiD SAGE and Helicos DGE, which generate reads from particular tag sites. Differences in expression results across workflows were mostly due to the differential impact of missing data in the partial reference sequences. When all 454-derived gene reference sequences were considered, Illumina mRNA-Seq detected more than twice as many differentially expressed (DE) reference sequences as the tag-based workflows. This discrepancy was largely due to missing tag sites in the partial reference that led to false negatives in the tag-based workflows. When only the subset of reference sequences that unambiguously have tag sites was considered, we found broad congruence across workflows, and they all identified a similar set of DE sequences. Our results are promising in several regards for gene expression studies in non-model organisms. First, we demonstrate that a hybrid long-read/short-read sequencing strategy is an effective way to collect gene expression data when an annotated genome sequence is not available. Second, our replicated sampling indicates that expression profiles are highly consistent across field-collected animals in this case. Third, the impacts of partial reference sequences on the ability to detect DE can be mitigated through

  1. Differential Gene Expression in the Siphonophore Nanomia bijuga (Cnidaria) Assessed with Multiple Next-Generation Sequencing Workflows

    PubMed Central

    Siebert, Stefan; Robinson, Mark D.; Tintori, Sophia C.; Goetz, Freya; Helm, Rebecca R.; Smith, Stephen A.; Shaner, Nathan; Haddock, Steven H. D.; Dunn, Casey W.

    2011-01-01

    We investigated differential gene expression between functionally specialized feeding polyps and swimming medusae in the siphonophore Nanomia bijuga (Cnidaria) with a hybrid long-read/short-read sequencing strategy. We assembled a set of partial gene reference sequences from long-read data (Roche 454), and generated short-read sequences from replicated tissue samples that were mapped to the references to quantify expression. We collected and compared expression data with three short-read expression workflows that differ in sample preparation, sequencing technology, and mapping tools. These workflows were Illumina mRNA-Seq, which generates sequence reads from random locations along each transcript, and two tag-based approaches, SOLiD SAGE and Helicos DGE, which generate reads from particular tag sites. Differences in expression results across workflows were mostly due to the differential impact of missing data in the partial reference sequences. When all 454-derived gene reference sequences were considered, Illumina mRNA-Seq detected more than twice as many differentially expressed (DE) reference sequences as the tag-based workflows. This discrepancy was largely due to missing tag sites in the partial reference that led to false negatives in the tag-based workflows. When only the subset of reference sequences that unambiguously have tag sites was considered, we found broad congruence across workflows, and they all identified a similar set of DE sequences. Our results are promising in several regards for gene expression studies in non-model organisms. First, we demonstrate that a hybrid long-read/short-read sequencing strategy is an effective way to collect gene expression data when an annotated genome sequence is not available. Second, our replicated sampling indicates that expression profiles are highly consistent across field-collected animals in this case. Third, the impacts of partial reference sequences on the ability to detect DE can be mitigated through

  2. Security and Dependability Solutions for Web Services and Workflows

    NASA Astrophysics Data System (ADS)

    Kokolakis, Spyros; Rizomiliotis, Panagiotis; Benameur, Azzedine; Sinha, Smriti Kumar

    In this chapter we present an innovative approach towards the design and application of Security and Dependability (S&D) solutions for Web services and service-based workflows. Recently, several standards have been published that prescribe S&D solutions for Web services, e.g. OASIS WS-Security. However, the application of these solutions in specific contexts has proven problematic. We propose a new framework for the application of such solutions based on the SERENITY S&D Pattern concept. An S&D Pattern comprises all the necessary information for the implementation, verification, deployment, and active monitoring of an S&D Solution. Thus, system developers may rely on proven solutions that are dynamically deployed and monitored by the Serenity Runtime Framework. Finally, we further extend this approach to cover the case of executable workflows which are realised through the orchestration of Web services.

  3. Enhancing reproducibility in scientific computing: Metrics and registry for Singularity containers.

    PubMed

    Sochat, Vanessa V; Prybol, Cameron J; Kurtzer, Gregory M

    2017-01-01

    Here we present Singularity Hub, a framework to build and deploy Singularity containers for mobility of compute, and the singularity-python software with novel metrics for assessing reproducibility of such containers. Singularity containers make it possible for scientists and developers to package reproducible software, and Singularity Hub adds automation to this workflow by building, capturing metadata for, visualizing, and serving containers programmatically. Our novel metrics, based on custom filters of content hashes of container contents, allow for comparison of an entire container, including operating system, custom software, and metadata. First we will review Singularity Hub's primary use cases and how the infrastructure has been designed to support modern, common workflows. Next, we conduct three analyses to demonstrate build consistency, reproducibility metric performance and interpretability, and potential for discovery. This is the first effort to demonstrate a rigorous assessment of measurable similarity between containers and operating systems. We provide these capabilities within Singularity Hub, as well as the source software singularity-python that provides the underlying functionality. Singularity Hub is available at https://singularity-hub.org, and we are excited to provide it as an openly available platform for building and deploying scientific containers.
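
    A minimal Python sketch of the underlying idea of hash-based container comparison, computing content hashes over two unpacked container trees and reporting their Jaccard similarity; this is an illustration only, not the singularity-python implementation or its metrics.

      # Minimal sketch: hash every regular file under two unpacked container
      # directory trees and report the Jaccard similarity of the two hash sets.
      import hashlib
      import os

      def content_hashes(root):
          hashes = set()
          for dirpath, _dirnames, filenames in os.walk(root):
              for name in filenames:
                  path = os.path.join(dirpath, name)
                  if not os.path.isfile(path):
                      continue
                  h = hashlib.sha256()
                  with open(path, "rb") as fh:
                      for chunk in iter(lambda: fh.read(1 << 20), b""):
                          h.update(chunk)
                  hashes.add(h.hexdigest())
          return hashes

      def similarity(root_a, root_b):
          a, b = content_hashes(root_a), content_hashes(root_b)
          return len(a & b) / len(a | b) if (a or b) else 1.0

      if __name__ == "__main__":
          print(f"similarity: {similarity('container_a_root', 'container_b_root'):.3f}")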

  4. Enhancing reproducibility in scientific computing: Metrics and registry for Singularity containers

    PubMed Central

    Prybol, Cameron J.; Kurtzer, Gregory M.

    2017-01-01

    Here we present Singularity Hub, a framework to build and deploy Singularity containers for mobility of compute, and the singularity-python software with novel metrics for assessing reproducibility of such containers. Singularity containers make it possible for scientists and developers to package reproducible software, and Singularity Hub adds automation to this workflow by building, capturing metadata for, visualizing, and serving containers programmatically. Our novel metrics, based on custom filters of content hashes of container contents, allow for comparison of an entire container, including operating system, custom software, and metadata. First we will review Singularity Hub’s primary use cases and how the infrastructure has been designed to support modern, common workflows. Next, we conduct three analyses to demonstrate build consistency, reproducibility metric performance and interpretability, and potential for discovery. This is the first effort to demonstrate a rigorous assessment of measurable similarity between containers and operating systems. We provide these capabilities within Singularity Hub, as well as the source software singularity-python that provides the underlying functionality. Singularity Hub is available at https://singularity-hub.org, and we are excited to provide it as an openly available platform for building and deploying scientific containers. PMID:29186161

  5. A Classroom-Based Distributed Workflow Initiative for the Early Involvement of Undergraduate Students in Scientific Research

    ERIC Educational Resources Information Center

    Friedrich, Jon M.

    2014-01-01

    Engaging freshman and sophomore students in meaningful scientific research is challenging because of their developing skill set and their necessary time commitments to regular classwork. A project called the Chondrule Analysis Project was initiated to engage first- and second-year students in an initial research experience and also accomplish…

  6. Trading Robustness Requirements in Mars Entry Trajectory Design

    NASA Technical Reports Server (NTRS)

    Lafleur, Jarret M.

    2009-01-01

    One of the most important metrics characterizing an atmospheric entry trajectory in preliminary design is the size of its predicted landing ellipse. Often, requirements for this ellipse are set early in design and significantly influence both the expected scientific return from a particular mission and the cost of development. Requirements typically specify a certain probability level (σ-level) for the prescribed ellipse, and frequently this latter requirement is taken at 3σ. However, searches for the justification of 3σ as a robustness requirement suggest it is an empirical rule of thumb borrowed from non-aerospace fields. This paper presents an investigation into the sensitivity of trajectory performance to varying robustness (σ-level) requirements. The treatment of robustness as a distinct objective is discussed, and an analysis framework is presented involving the manipulation of design variables to effect trades between performance and robustness objectives. The scenario for which this method is illustrated is the ballistic entry of an MSL-class Mars entry vehicle. Here, the design variable is entry flight path angle, and objectives are parachute deploy altitude performance and error ellipse robustness. Resulting plots show the sensitivities between these objectives and trends in the entry flight path angles required to design to these objectives. Relevance to the trajectory designer is discussed, as are potential steps for further development and use of this type of analysis.

  7. Leveraging workflow control patterns in the domain of clinical practice guidelines.

    PubMed

    Kaiser, Katharina; Marcos, Mar

    2016-02-10

    Clinical practice guidelines (CPGs) include recommendations describing appropriate care for the management of patients with a specific clinical condition. A number of representation languages have been developed to support executable CPGs, with associated authoring/editing tools. Even with tool assistance, authoring of CPG models is a labor-intensive task. We aim to facilitate the early stages of the CPG modeling task. In this context, we propose to support the authoring of CPG models based on a set of suitable procedural patterns described in an implementation-independent notation that can then be semi-automatically transformed into one of the alternative executable CPG languages. We have started with the workflow control patterns which have been identified in the fields of workflow systems and business process management. We have analyzed the suitability of these patterns by means of a qualitative analysis of CPG texts. Following our analysis we have implemented a selection of workflow patterns in the Asbru and PROforma CPG languages. As the implementation-independent notation for the description of patterns we have chosen BPMN 2.0. Finally, we have developed XSLT transformations to convert the BPMN 2.0 version of the patterns into the Asbru and PROforma languages. We showed that although a significant number of workflow control patterns are suitable to describe CPG procedural knowledge, not all of them are applicable in the context of CPGs due to their focus on single-patient care. Moreover, CPGs may require additional patterns not included in the set of workflow control patterns. We also showed that nearly all the CPG-suitable patterns can be conveniently implemented in the Asbru and PROforma languages. Finally, we demonstrated that individual patterns can be semi-automatically transformed from a process specification in BPMN 2.0 to executable implementations in these languages. We propose a pattern- and transformation-based approach for the development of CPG models
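
    A minimal Python sketch of the final transformation step described above, applying an XSLT stylesheet to a BPMN 2.0 pattern document with lxml; the file names are placeholders, and the stylesheets that map BPMN constructs onto Asbru or PROforma are the paper's artifacts, not shown here.

      # Minimal sketch: apply an XSLT stylesheet to a BPMN 2.0 pattern document
      # with lxml. File names are placeholders.
      from lxml import etree

      def transform(bpmn_path, xslt_path, out_path):
          doc = etree.parse(bpmn_path)
          stylesheet = etree.XSLT(etree.parse(xslt_path))
          result = stylesheet(doc)
          with open(out_path, "wb") as fh:
              fh.write(etree.tostring(result, pretty_print=True,
                                      xml_declaration=True, encoding="UTF-8"))

      if __name__ == "__main__":
          transform("sequence_pattern.bpmn", "bpmn_to_asbru.xslt",
                    "sequence_pattern.asbru.xml")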

  8. Improving data collection, documentation, and workflow in a dementia screening study

    PubMed Central

    Read, Kevin B.; LaPolla, Fred Willie Zametkin; Tolea, Magdalena I.; Galvin, James E.; Surkis, Alisa

    2017-01-01

    Background A clinical study team performing three multicultural dementia screening studies identified the need to improve data management practices and facilitate data sharing. A collaboration was initiated with librarians as part of the National Library of Medicine (NLM) informationist supplement program. The librarians identified areas for improvement in the studies’ data collection, entry, and processing workflows. Case Presentation The librarians’ role in this project was to meet needs expressed by the study team around improving data collection and processing workflows to increase study efficiency and ensure data quality. The librarians addressed the data collection, entry, and processing weaknesses through standardizing and renaming variables, creating an electronic data capture system using REDCap, and developing well-documented, reproducible data processing workflows. Conclusions NLM informationist supplements provide librarians with valuable experience in collaborating with study teams to address their data needs. For this project, the librarians gained skills in project management, REDCap, and understanding of the challenges and specifics of a clinical research study. However, the time and effort required to provide targeted and intensive support for one study team was not scalable to the library’s broader user community. PMID:28377680

  9. An automated workflow for parallel processing of large multiview SPIM recordings.

    PubMed

    Schmied, Christopher; Steinbach, Peter; Pietzsch, Tobias; Preibisch, Stephan; Tomancak, Pavel

    2016-04-01

    Selective Plane Illumination Microscopy (SPIM) makes it possible to image developing organisms in 3D at unprecedented temporal resolution over long periods of time. The resulting massive amounts of raw image data require extensive processing, traditionally done interactively via dedicated graphical user interface (GUI) applications. The consecutive processing steps can be easily automated and the individual time points can be processed independently, which lends itself to trivial parallelization on a high performance computing (HPC) cluster. Here, we introduce an automated workflow for processing large multiview, multichannel, multiillumination time-lapse SPIM data on a single workstation or in parallel on an HPC cluster. The pipeline relies on snakemake to resolve dependencies among consecutive processing steps and can be easily adapted to any cluster environment for processing SPIM data in a fraction of the time required to collect it. The code is distributed free and open source under the MIT license (http://opensource.org/licenses/MIT). The source code can be downloaded from GitHub: https://github.com/mpicbg-scicomp/snakemake-workflows. Documentation can be found at http://fiji.sc/Automated_workflow_for_parallel_Multiview_Reconstruction. Contact: schmied@mpi-cbg.de. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.
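
    A minimal Python sketch of the per-time-point parallelism described above, using a process pool as a generic stand-in for the authors' snakemake pipeline; process_timepoint() is a placeholder for the registration and fusion steps applied to one time point.

      # Minimal sketch: process independent time points in parallel with a
      # process pool (generic stand-in for the snakemake pipeline).
      from concurrent.futures import ProcessPoolExecutor

      def process_timepoint(t):
          # placeholder: load views for time point t, register, fuse, write output
          return f"timepoint {t:04d} done"

      def main(n_timepoints=240, workers=8):
          with ProcessPoolExecutor(max_workers=workers) as pool:
              for message in pool.map(process_timepoint, range(n_timepoints)):
                  print(message)

      if __name__ == "__main__":
          main()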

  10. An automated workflow for parallel processing of large multiview SPIM recordings

    PubMed Central

    Schmied, Christopher; Steinbach, Peter; Pietzsch, Tobias; Preibisch, Stephan; Tomancak, Pavel

    2016-01-01

    Summary: Selective Plane Illumination Microscopy (SPIM) makes it possible to image developing organisms in 3D at unprecedented temporal resolution over long periods of time. The resulting massive amounts of raw image data require extensive processing, traditionally done interactively via dedicated graphical user interface (GUI) applications. The consecutive processing steps can be easily automated and the individual time points can be processed independently, which lends itself to trivial parallelization on a high performance computing (HPC) cluster. Here, we introduce an automated workflow for processing large multiview, multichannel, multiillumination time-lapse SPIM data on a single workstation or in parallel on an HPC cluster. The pipeline relies on snakemake to resolve dependencies among consecutive processing steps and can be easily adapted to any cluster environment for processing SPIM data in a fraction of the time required to collect it. Availability and implementation: The code is distributed free and open source under the MIT license http://opensource.org/licenses/MIT. The source code can be downloaded from github: https://github.com/mpicbg-scicomp/snakemake-workflows. Documentation can be found here: http://fiji.sc/Automated_workflow_for_parallel_Multiview_Reconstruction. Contact: schmied@mpi-cbg.de Supplementary information: Supplementary data are available at Bioinformatics online. PMID:26628585

  11. A semi-automated workflow for biodiversity data retrieval, cleaning, and quality control

    PubMed Central

    Mathew, Cherian; Obst, Matthias; Vicario, Saverio; Haines, Robert; Williams, Alan R.; de Jong, Yde; Goble, Carole

    2014-01-01

    Abstract The compilation and cleaning of data needed for analyses and prediction of species distributions is a time consuming process requiring a solid understanding of data formats and service APIs provided by biodiversity informatics infrastructures. We designed and implemented a Taverna-based Data Refinement Workflow which integrates taxonomic data retrieval, data cleaning, and data selection into a consistent, standards-based, and effective system hiding the complexity of underlying service infrastructures. The workflow can be freely used both locally and through a web-portal which does not require additional software installations by users. PMID:25535486

  12. Common Data Models and Efficient Reproducible Workflows for Distributed Ocean Model Skill Assessment

    NASA Astrophysics Data System (ADS)

    Signell, R. P.; Snowden, D. P.; Howlett, E.; Fernandes, F. A.

    2014-12-01

    Model skill assessment requires discovery, access, analysis, and visualization of information from both sensors and models, and traditionally has been possible only for a few experts. The US Integrated Ocean Observing System (US-IOOS) consists of 17 Federal Agencies and 11 Regional Associations that produce data from various sensors and numerical models; exactly the information required for model skill assessment. US-IOOS is seeking to develop documented skill assessment workflows that are standardized, efficient, and reproducible so that a much wider community can participate in the use and assessment of model results. Standardization requires common data models for observational and model data. US-IOOS relies on the CF Conventions for observations and structured grid data, and on the UGRID Conventions for unstructured (e.g. triangular) grid data. This allows applications to obtain only the data they require in a uniform and parsimonious way using web services: OPeNDAP for model output and the OGC Sensor Observation Service (SOS) for observed data. Reproducibility is enabled with IPython Notebooks shared on GitHub (http://github.com/ioos). These capture the entire skill assessment workflow, including user input, search, access, analysis, and visualization, ensuring that workflows are self-documenting and reproducible by anyone, using free software. The Python packages required to run the workflows (pyugrid, pyoos, and the British Met Office Iris package) are available on GitHub and on Binstar.org so that users can run scenarios using the free Anaconda Python distribution. Hosted services such as Wakari enable anyone to reproduce these workflows for free, without installing any software locally, using just their web browser. We are also experimenting with Wakari Enterprise, which allows multi-user access from a web browser to an IPython Server running where large quantities of
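
    A minimal Python sketch of the standards-based access these workflows rely on: reading a subset of a variable from an OPeNDAP endpoint with netCDF4 so that only the requested data cross the network. The endpoint URL and variable name are placeholders, not US-IOOS services.

      # Minimal sketch: subset a model variable over OPeNDAP with netCDF4; only
      # the requested slice is transferred. URL and variable name are placeholders.
      from netCDF4 import Dataset

      URL = "http://example.org/thredds/dodsC/ocean_model_output.nc"  # placeholder

      def surface_field(url=URL, var_name="temp"):
          with Dataset(url) as nc:
              var = nc.variables[var_name]
              # last time step, top layer, full horizontal extent (assumed 4-D variable)
              return var[-1, 0, :, :]

      if __name__ == "__main__":
          field = surface_field()
          print("surface field shape:", field.shape)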

  13. GPU-accelerated automatic identification of robust beam setups for proton and carbon-ion radiotherapy

    NASA Astrophysics Data System (ADS)

    Ammazzalorso, F.; Bednarz, T.; Jelen, U.

    2014-03-01

    We demonstrate acceleration on graphic processing units (GPU) of automatic identification of robust particle therapy beam setups, minimizing negative dosimetric effects of Bragg peak displacement caused by treatment-time patient positioning errors. Our particle therapy research toolkit, RobuR, was extended with OpenCL support and used to implement calculation on GPU of the Port Homogeneity Index, a metric scoring irradiation port robustness through analysis of tissue density patterns prior to dose optimization and computation. Results were benchmarked against an independent native CPU implementation. Numerical results were in agreement between the GPU implementation and native CPU implementation. For 10 skull base cases, the GPU-accelerated implementation was employed to select beam setups for proton and carbon ion treatment plans, which proved to be dosimetrically robust, when recomputed in presence of various simulated positioning errors. From the point of view of performance, average running time on the GPU decreased by at least one order of magnitude compared to the CPU, rendering the GPU-accelerated analysis a feasible step in a clinical treatment planning interactive session. In conclusion, selection of robust particle therapy beam setups can be effectively accelerated on a GPU and become an unintrusive part of the particle therapy treatment planning workflow. Additionally, the speed gain opens new usage scenarios, like interactive analysis manipulation (e.g. constraining of some setup) and re-execution. Finally, through OpenCL portable parallelism, the new implementation is suitable also for CPU-only use, taking advantage of multiple cores, and can potentially exploit types of accelerators other than GPUs.

  14. Sustaining an Online, Shared Community Resource for Models, Robust Open source Software Tools and Data for Volcanology - the Vhub Experience

    NASA Astrophysics Data System (ADS)

    Patra, A. K.; Valentine, G. A.; Bursik, M. I.; Connor, C.; Connor, L.; Jones, M.; Simakov, N.; Aghakhani, H.; Jones-Ivey, R.; Kosar, T.; Zhang, B.

    2015-12-01

    Over the last 5 years we have created a community collaboratory, Vhub.org [Palma et al, J. App. Volc. 3:2 doi:10.1186/2191-5040-3-2], as a place to find volcanology-related resources, a venue for users to disseminate tools, teaching resources, and data, and an online platform to support collaborative efforts. As the community (current active users > 6000, from an estimated community of comparable size) embeds the tools in the collaboratory into educational and research workflows, it became imperative to: a) redesign tools into robust, open source, reusable software for online and offline usage/enhancement; b) share large datasets with remote collaborators and other users seamlessly and securely; c) support complex workflows for uncertainty analysis, validation and verification, and data assimilation with large data. The focus on tool development/redevelopment has been twofold: firstly, to use best practices in software engineering and new hardware like multi-core and graphic processing units; secondly, to enhance capabilities to support inverse modeling, uncertainty quantification using large ensembles and design of experiments, calibration, and validation. Among the software engineering practices we follow are open-source development (facilitating community contributions), modularity, and reusability. Our initial targets are four popular tools on Vhub: TITAN2D, TEPHRA2, PUFF and LAVA. Use of tools like these requires many observation-driven data sets, e.g. digital elevation models of topography, satellite imagery, field observations on deposits, etc. These data are often maintained in private repositories that are privately shared by "sneaker-net". As a partial solution to this we tested mechanisms using iRODS software for online sharing of private data with public metadata and access limits. Finally, we adopted the use of workflow engines (e.g. Pegasus) to support the complex data and computing workflows needed for usage like uncertainty quantification for hazard analysis using physical

  15. SMITH: a LIMS for handling next-generation sequencing workflows.

    PubMed

    Venco, Francesco; Vaskin, Yuriy; Ceol, Arnaud; Muller, Heiko

    2014-01-01

    Life-science laboratories make increasing use of Next Generation Sequencing (NGS) for studying bio-macromolecules and their interactions. Array-based methods for measuring gene expression or protein-DNA interactions are being replaced by RNA-Seq and ChIP-Seq. Sequencing is generally performed by specialized facilities that have to keep track of sequencing requests, trace samples, ensure quality and make data available according to predefined privileges. An integrated tool helps to troubleshoot problems, to maintain a high quality standard, and to reduce time and costs. Commercial and non-commercial tools called LIMS (Laboratory Information Management Systems) are available for this purpose. However, they often come at prohibitive cost and/or lack the flexibility and scalability needed to adjust seamlessly to the frequently changing protocols employed. In order to manage the flow of sequencing data produced at the Genomic Unit of the Italian Institute of Technology (IIT), we developed SMITH (Sequencing Machine Information Tracking and Handling). SMITH is a web application with a MySQL server at the backend. It was developed by wet-lab scientists of the Centre for Genomic Science and database experts from the Politecnico di Milano in the context of a Genomic Data Model Project. The database schema stores all the information of an NGS experiment, including the descriptions of all protocols and algorithms used in the process. Notably, an attribute-value table allows associating an unconstrained textual description with each sample and all the data produced afterwards. This method permits the creation of metadata that can be used to search the database for specific files as well as for statistical analyses. SMITH runs automatically and limits direct human interaction mainly to administrative tasks. SMITH data-delivery procedures were standardized, making it easier for biologists and analysts to navigate the data. Automation also helps save time. The workflows are available
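
    A minimal Python/sqlite3 sketch of the attribute-value design described above, in which free-text annotations are attached to samples and can later be searched; the table and column names are illustrative only, not SMITH's schema.

      # Minimal sketch of an attribute-value (key/value) annotation table.
      # Table and column names are illustrative, not SMITH's schema.
      import sqlite3

      conn = sqlite3.connect(":memory:")
      conn.execute("CREATE TABLE sample (id INTEGER PRIMARY KEY, name TEXT)")
      conn.execute("""CREATE TABLE sample_attribute (
                          sample_id INTEGER REFERENCES sample(id),
                          attribute TEXT,
                          value     TEXT)""")

      sample_id = conn.execute("INSERT INTO sample(name) VALUES ('lib_001')").lastrowid
      conn.executemany(
          "INSERT INTO sample_attribute(sample_id, attribute, value) VALUES (?, ?, ?)",
          [(sample_id, "organism", "H. sapiens"),
           (sample_id, "protocol", "ChIP-Seq"),
           (sample_id, "antibody", "H3K4me3")])

      # Metadata search: find samples annotated with a given key/value pair.
      rows = conn.execute(
          """SELECT s.name FROM sample s
             JOIN sample_attribute a ON a.sample_id = s.id
             WHERE a.attribute = ? AND a.value = ?""",
          ("protocol", "ChIP-Seq")).fetchall()
      print(rows)   # [('lib_001',)]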

  16. SMITH: a LIMS for handling next-generation sequencing workflows

    PubMed Central

    2014-01-01

    workflows are available through an API provided by the workflow management system. The parameters and input data are passed to the workflow engine that performs de-multiplexing, quality control, alignments, etc. Conclusions SMITH standardizes, automates, and speeds up sequencing workflows. Annotation of data with key-value pairs facilitates meta-analysis. PMID:25471934

  17. Linking data repositories - an illustration of agile data curation principles through robust documentation and multiple application programming interfaces

    NASA Astrophysics Data System (ADS)

    Benedict, K. K.; Servilla, M. S.; Vanderbilt, K.; Wheeler, J.

    2015-12-01

    The growing volume, variety and velocity of production of Earth science data magnifies the impact of inefficiencies in data acquisition, processing, analysis, and sharing workflows, potentially to the point of impairing the ability of researchers to accomplish their desired scientific objectives. The adaptation of agile software development principles (http://agilemanifesto.org/principles.html) to data curation processes has significant potential to lower barriers to effective scientific data discovery and reuse - barriers that otherwise may force the development of new data to replace existing but unusable data, or require substantial effort to make data usable in new research contexts. This paper outlines a data curation process that was developed at the University of New Mexico that provides a cross-walk of data and associated documentation between the data archive developed by the Long Term Ecological Research (LTER) Network Office (PASTA - http://lno.lternet.edu/content/network-information-system) and UNM's institutional repository (LoboVault - http://repository.unm.edu). The developed automated workflow enables the replication of versioned data objects and their associated standards-based metadata between the LTER system and LoboVault - providing long-term preservation for those data/metadata packages within LoboVault while maintaining the value-added services that the PASTA platform provides. The relative ease with which this workflow was developed is a product of the capabilities independently developed on both platforms - including the simplicity of providing a well-documented application programming interface (API) for each platform enabling scripted interaction and the use of well-established documentation standards (EML in the case of PASTA, Dublin Core in the case of LoboVault) by both systems. These system characteristics, when combined with an iterative process of interaction between the Data Curation Librarian (on the LoboVault side of the process

  18. SU-E-T-419: Workflow and FMEA in a New Proton Therapy (PT) Facility

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Cheng, C; Wessels, B; Hamilton, H

    2014-06-01

    Purpose: Workflow is an important component in the operational planning of a new proton facility. By integrating the concept of failure mode and effect analysis (FMEA) and traditional QA requirements, a workflow for a proton therapy treatment course is set up. This workflow serves as the blueprint for the planning of computer hardware/software requirements and network flow. A slight modification of the workflow generates a process map (PM) for FMEA and the planning of a QA program in PT. Methods: A flowchart is first developed outlining the sequence of processes involved in a PT treatment course. Each process consists of a number of sub-processes to encompass a broad scope of treatment and QA procedures. For each sub-process, the personnel involved, the equipment needed and the computer hardware/software as well as network requirements are defined by a team of clinical staff, administrators and IT personnel. Results: Eleven intermediate processes with a total of 70 sub-processes involved in a PT treatment course are identified. The number of sub-processes varies, ranging from 2-12. The sub-processes within each process are used for the operational planning. For example, in the CT-Sim process, there are 12 sub-processes: three involve data entry/retrieval from a record-and-verify system, two are controlled by the CT computer, two require the department/hospital network, and the other five are setup procedures. IT then decides the number of computers needed and the software and network requirements. By removing the traditional QA procedures from the workflow, a PM is generated for FMEA analysis to design a QA program for PT. Conclusion: Significant efforts are involved in the development of the workflow in a PT treatment course. Our hybrid model of combining FMEA and a traditional QA program serves the dual purpose of efficient operational planning and designing of a QA program in PT.

  19. CAD/CAM-produced surgical guides: Optimizing the treatment workflow.

    PubMed

    Neugebauer, J; Kistler, F; Kistler, S; Züdorf, G; Freyer, D; Ritter, L; Dreiseidler, T; Kusch, J; Zöller, J E

    2011-01-01

    The increased availability of devices for 3D radiological diagnosis allows the more frequent use of CAD/CAM-produced surgical guides for implant placement. The conventional workflow requires a complex logistic chain which is time-consuming and costly. In a pilot study, the workflow of directly milled surgical guides was evaluated. These surgical guides were designed based on the fusion of an optical impression and the radiological data. The clinical use showed that the surgical guides could be accurately placed on the residual dentition without tipping movements. The conventional surgical guides were used as a control for the manual check of the deviation of the implant axis. The direct transfer of the digital planning data allows the fabrication of surgical guides in an external center without the need of physical transport, which reduces the logistic effort and expense of the central fabrication of surgical guides.

  20. A data-independent acquisition workflow for qualitative screening of new psychoactive substances in biological samples.

    PubMed

    Kinyua, Juliet; Negreira, Noelia; Ibáñez, María; Bijlsma, Lubertus; Hernández, Félix; Covaci, Adrian; van Nuijs, Alexander L N

    2015-11-01

    Identification of new psychoactive substances (NPS) is challenging. Developing targeted methods for their analysis can be difficult and costly due to their impermanence on the drug scene. Accurate-mass mass spectrometry (AMMS) using a quadrupole time-of-flight (QTOF) analyzer can be useful for wide-scope screening since it provides sensitive, full-spectrum MS data. Our article presents a qualitative screening workflow based on a data-independent acquisition mode (all-ions MS/MS) on liquid chromatography (LC) coupled to QTOFMS for the detection and identification of NPS in biological matrices. The workflow combines fundamentals of target and suspect screening data processing techniques in a structured algorithm, allowing the detection and tentative identification of NPS and their metabolites. We have applied the workflow to two actual case studies involving drug intoxications, in which we detected and confirmed the parent compounds ketamine, 25B-NBOMe, and 25C-NBOMe, and several predicted phase I and II metabolites not previously reported in urine and serum samples. The screening workflow demonstrates added value for the detection and identification of NPS in biological matrices.
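
    A minimal Python sketch of the suspect-matching step such a workflow includes: comparing measured accurate masses (assumed [M+H]+ ions) against a suspect list within a ppm tolerance. The suspect names and masses below are placeholder values, not the compounds or libraries used in the paper.

      # Minimal sketch: match measured m/z values (assumed [M+H]+ adducts) against
      # exact monoisotopic masses from a suspect list within a ppm tolerance.
      PROTON = 1.007276  # proton mass used for the [M+H]+ adduct

      SUSPECTS = {       # name -> monoisotopic neutral mass (placeholder values)
          "suspect_A": 237.0920,
          "suspect_B": 335.1288,
      }

      def match(measured_mz, tolerance_ppm=5.0):
          hits = []
          for name, neutral_mass in SUSPECTS.items():
              theoretical_mz = neutral_mass + PROTON
              ppm_error = (measured_mz - theoretical_mz) / theoretical_mz * 1e6
              if abs(ppm_error) <= tolerance_ppm:
                  hits.append((name, round(ppm_error, 2)))
          return hits

      if __name__ == "__main__":
          print(match(238.0994))  # within 5 ppm of suspect_A's [M+H]+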

  1. Common Workflow Service: Standards Based Solution for Managing Operational Processes

    NASA Astrophysics Data System (ADS)

    Tinio, A. W.; Hollins, G. A.

    2017-06-01

    The Common Workflow Service is a collaborative and standards-based solution for managing mission operations processes using techniques from the Business Process Management (BPM) discipline. This presentation describes the CWS and its benefits.

  2. Streamlining Workflow for Endovascular Mechanical Thrombectomy: Lessons Learned from a Comprehensive Stroke Center.

    PubMed

    Wang, Hongjin; Thevathasan, Arthur; Dowling, Richard; Bush, Steven; Mitchell, Peter; Yan, Bernard

    2017-08-01

    Recently, 5 randomized controlled trials confirmed the superiority of endovascular mechanical thrombectomy (EMT) over intravenous thrombolysis in acute ischemic stroke with large-vessel occlusion. The implication is that our health systems will see an increasing number of patients treated with EMT. However, in-hospital delays, leading to increased time to reperfusion, are associated with poor clinical outcomes. This review outlines the in-hospital workflow of the treatment of acute ischemic stroke at a comprehensive stroke center and the lessons learned in the reduction of in-hospital delays. The in-hospital workflow for acute ischemic stroke was described from prehospital notification to femoral arterial puncture in preparation for EMT. A systematic review of the literature was also performed with PubMed. The implementation of workflow streamlining could result in reduction of in-hospital time delays for patients who were eligible for EMT. In particular, time-critical measures, including prehospital notification, the transfer of patients from door to computed tomography (CT) room, initiation of intravenous thrombolysis in the CT room, and the mobilization of the neurointervention team in parallel with thrombolysis, all contributed to reduction in time delays. We have identified issues resulting in in-hospital time delays and have reported possible solutions to improve workflow efficiencies. We believe that these measures may help stroke centers initiate an EMT service for eligible patients. Copyright © 2017 National Stroke Association. Published by Elsevier Inc. All rights reserved.

  3. Galaxy tools and workflows for sequence analysis with applications in molecular plant pathology.

    PubMed

    Cock, Peter J A; Grüning, Björn A; Paszkiewicz, Konrad; Pritchard, Leighton

    2013-01-01

    The Galaxy Project offers the popular web browser-based platform Galaxy for running bioinformatics tools and constructing simple workflows. Here, we present a broad collection of additional Galaxy tools for large scale analysis of gene and protein sequences. The motivating research theme is the identification of specific genes of interest in a range of non-model organisms, and our central example is the identification and prediction of "effector" proteins produced by plant pathogens in order to manipulate their host plant. This functional annotation of a pathogen's predicted capacity for virulence is a key step in translating sequence data into potential applications in plant pathology. This collection includes novel tools, and widely-used third-party tools such as NCBI BLAST+ wrapped for use within Galaxy. Individual bioinformatics software tools are typically available separately as standalone packages, or in online browser-based form. The Galaxy framework enables the user to combine these and other tools to automate organism scale analyses as workflows, without demanding familiarity with command line tools and scripting. Workflows created using Galaxy can be saved and are reusable, so may be distributed within and between research groups, facilitating the construction of a set of standardised, reusable bioinformatic protocols. The Galaxy tools and workflows described in this manuscript are open source and freely available from the Galaxy Tool Shed (http://usegalaxy.org/toolshed or http://toolshed.g2.bx.psu.edu).
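
    A minimal Python sketch of the kind of standalone NCBI BLAST+ invocation that the Galaxy wrappers described above expose through the web interface; the input, database, and output names are placeholders.

      # Minimal sketch: run NCBI BLAST+ (blastp) as a standalone command with
      # tabular output; file and database names are placeholders.
      import subprocess

      cmd = [
          "blastp",
          "-query", "candidate_effectors.fasta",  # placeholder input sequences
          "-db", "reference_proteins",            # placeholder BLAST database
          "-evalue", "1e-5",
          "-outfmt", "6",                         # tabular output
          "-out", "effector_hits.tsv",
      ]
      subprocess.run(cmd, check=True)
      print("wrote effector_hits.tsv")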

  4. Integrating the Allen Brain Institute Cell Types Database into Automated Neuroscience Workflow.

    PubMed

    Stockton, David B; Santamaria, Fidel

    2017-10-01

    We developed software tools to download, extract features, and organize the Cell Types Database from the Allen Brain Institute (ABI) in order to integrate its whole cell patch clamp characterization data into the automated modeling/data analysis cycle. To expand the potential user base we employed both Python and MATLAB. The basic set of tools downloads selected raw data and extracts cell, sweep, and spike features, using ABI's feature extraction code. To facilitate data manipulation we added a tool to build a local specialized database of raw data plus extracted features. Finally, to maximize automation, we extended our NeuroManager workflow automation suite to include these tools plus a separate investigation database. The extended suite allows the user to integrate ABI experimental and modeling data into an automated workflow deployed on heterogeneous computer infrastructures, from local servers, to high performance computing environments, to the cloud. Since our approach is focused on workflow procedures our tools can be modified to interact with the increasing number of neuroscience databases being developed to cover all scales and properties of the nervous system.
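
    A minimal sketch assuming the Allen SDK's CellTypesCache interface (get_cells, get_ephys_features); the record field names accessed are also assumptions, and the authors' own Python/MATLAB tools and their NeuroManager integration are not reproduced here.

      # Minimal sketch: pull cell metadata and precomputed electrophysiology
      # features into plain Python structures via the Allen SDK. The manifest path
      # and the field names ('id', 'specimen_id', 'vrest') are assumptions.
      from allensdk.core.cell_types_cache import CellTypesCache

      ctc = CellTypesCache(manifest_file="cell_types/manifest.json")

      cells = ctc.get_cells()              # metadata for cells in the database
      features = ctc.get_ephys_features()  # precomputed whole-cell ephys features

      by_specimen = {f.get("specimen_id"): f for f in features}
      for cell in cells[:5]:
          feat = by_specimen.get(cell.get("id"), {})
          print(cell.get("id"), feat.get("vrest"))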

  5. Modeling of workflow-engaged networks on radiology transfers across a metro network.

    PubMed

    Camorlinga, Sergio; Schofield, Bruce

    2006-04-01

    Radiology metro networks bear the challenging proposition of interconnecting several hospitals in a region to provide a comprehensive diagnostic imaging service. A poorly designed and implemented metro network could cause delays, or prevent access altogether, when health care providers try to retrieve medical cases across the network. This could translate into limited diagnostic services to patients, resulting in negative impacts on the patients' medical treatment. A workflow-engaged network (WEN) is a new network paradigm. A WEN appreciates radiology workflows and priorities in using the network. A WEN greatly improves network performance by guaranteeing that critical image transfers experience minimal delay. It adjusts network settings to ensure the application's requirements are met. This means that high-priority image transfers will have guaranteed and known delay times, whereas lower-priority traffic will have increased delays. This paper introduces a modeling approach for understanding the benefits that a WEN brings to a radiology metro network. The modeling uses actual data patterns and flows found in a hospital metro region. The workflows considered are based on the Integrating the Healthcare Enterprise profiles. This modeling has been applied to metropolitan workflows of a health region. The modeling helps identify the kind of metro network that supports data patterns and flows in a metro area. The results of the modeling show that a 155-Mb/s metropolitan area network (MAN) with WEN operates virtually equal to a normal 622-Mb/s MAN without WEN, with potential cost savings for leased line services measured in the millions of dollars per year.

  6. An automated and reproducible workflow for running and analyzing neural simulations using Lancet and IPython Notebook

    PubMed Central

    Stevens, Jean-Luc R.; Elver, Marco; Bednar, James A.

    2013-01-01

    Lancet is a new, simulator-independent Python utility for succinctly specifying, launching, and collating results from large batches of interrelated computationally demanding program runs. This paper demonstrates how to combine Lancet with IPython Notebook to provide a flexible, lightweight, and agile workflow for fully reproducible scientific research. This informal and pragmatic approach uses IPython Notebook to capture the steps in a scientific computation as it is gradually automated and made ready for publication, without mandating the use of any separate application that can constrain scientific exploration and innovation. The resulting notebook concisely records each step involved in even very complex computational processes that led to a particular figure or numerical result, allowing the complete chain of events to be replicated automatically. Lancet was originally designed to help solve problems in computational neuroscience, such as analyzing the sensitivity of a complex simulation to various parameters, or collecting the results from multiple runs with different random starting points. However, because it is never possible to know in advance what tools might be required in future tasks, Lancet has been designed to be completely general, supporting any type of program as long as it can be launched as a process and can return output in the form of files. For instance, Lancet is also heavily used by one of the authors in a separate research group for launching batches of microprocessor simulations. This general design will allow Lancet to continue supporting a given research project even as the underlying approaches and tools change. PMID:24416014
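
    The launch-and-collate pattern that Lancet automates can be sketched without its API (Lancet's actual interface is not reproduced here): expand a parameter grid, launch each run as a separate process, and collect whatever output files the runs leave behind. The simulator binary and its flags below are hypothetical placeholders.

```python
# Conceptual sketch of the batch launch-and-collate pattern described above;
# this is plain subprocess code, not Lancet's API. "./my_simulator" and its
# flags are hypothetical placeholders.
import itertools
import pathlib
import subprocess

grid = {"learning_rate": [0.01, 0.1], "seed": [0, 1, 2]}
outdir = pathlib.Path("runs")
outdir.mkdir(exist_ok=True)

for values in itertools.product(*grid.values()):
    params = dict(zip(grid.keys(), values))
    tag = "_".join(f"{k}{v}" for k, v in params.items())
    out_file = outdir / f"{tag}.out"
    # Any program qualifies as long as it can be launched as a process and
    # returns its output in the form of files.
    cmd = (["./my_simulator", "--out", str(out_file)]
           + [f"--{k}={v}" for k, v in params.items()])
    subprocess.run(cmd, check=True)

# Collation step: gather the result files the whole batch produced.
results = sorted(outdir.glob("*.out"))
print(f"collected {len(results)} result files")
```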

  7. Parametric Workflow (BIM) for the Repair Construction of Traditional Historic Architecture in Taiwan

    NASA Astrophysics Data System (ADS)

    Ma, Y.-P.; Hsu, C. C.; Lin, M.-C.; Tsai, Z.-W.; Chen, J.-Y.

    2015-08-01

    In Taiwan, numerous existing traditional buildings are constructed with wooden, brick, or stone structures. This paper focuses on Taiwanese traditional historic architecture, takes traditional wooden-structure buildings as its design proposition, and develops a BIM workflow for modeling complex combined wooden geometry, integrating it with traditional 2D documents, and visualizing repair-construction assumptions within the 3D model representation. The goal of this article is to explore the current problems to be overcome in the conservation of wooden historic buildings, and to introduce BIM technology for conserving, documenting, and managing such buildings and for creating full engineering drawings and information that effectively support historic conservation. Although BIM is mostly oriented to current construction praxis, there have been some attempts to investigate its applicability to historic conservation projects. This article also illustrates the importance and advantages of using a BIM workflow in the repair-construction process compared with a generic workflow.

  8. Model robustness as a confirmatory virtue: The case of climate science.

    PubMed

    Lloyd, Elisabeth A

    2015-02-01

    I propose a distinct type of robustness, which I suggest can support a confirmatory role in scientific reasoning, contrary to the usual philosophical claims. In model robustness, repeated production of the empirically successful model prediction or retrodiction against a background of independently supported and varying model constructions, within a group of models containing a shared causal factor, may suggest how confident we can be in the causal factor and the predictions/retrodictions, especially once supported by a variety-of-evidence framework. I present climate models of greenhouse-gas-driven global warming of the 20th century as an example, and emphasize climate scientists' discussions of robust models and causal aspects. The account is intended to be applicable to a broad array of sciences that use complex modeling techniques. Copyright © 2014 Elsevier Ltd. All rights reserved.

  9. Lessons from implementing a combined workflow-informatics system for diabetes management.

    PubMed

    Zai, Adrian H; Grant, Richard W; Estey, Greg; Lester, William T; Andrews, Carl T; Yee, Ronnie; Mort, Elizabeth; Chueh, Henry C

    2008-01-01

    Shortcomings surrounding the care of patients with diabetes have been attributed largely to a fragmented, disorganized, and duplicative health care system that focuses more on acute conditions and complications than on managing chronic disease. To address these shortcomings, we developed a diabetes registry population management application to change the way our staff manages patients with diabetes. Use of this new application has helped us coordinate, among different users, the responsibilities for intervening on and monitoring patients in the registry. Our experiences with this combined workflow-informatics intervention system suggest that integrating a chronic disease registry into the clinical workflow for the treatment of chronic conditions creates a useful and efficient tool for managing disease.

  10. Workflows and Provenance: Toward Information Science Solutions for the Natural Sciences.

    PubMed

    Gryk, Michael R; Ludäscher, Bertram

    2017-01-01

    The era of big data and ubiquitous computation has brought with it concerns about ensuring reproducibility in this new research environment. It is easy to assume that computational methods self-document by virtue of being exact, deterministic processes. However, as with laboratory experiments, ensuring reproducibility in the computational realm requires documentation of both the protocols used (workflows) and a detailed description of the computational environment: algorithms, implementations, and software environments, as well as the data ingested and the execution logs of the computation. These two aspects of computational reproducibility (workflows and execution details) are discussed in the context of biomolecular Nuclear Magnetic Resonance spectroscopy (bioNMR) as well as the PRIMAD model for computational reproducibility.
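
    The "execution details" half of the documentation the abstract calls for can be captured mechanically. The sketch below records the software environment and the checksums of ingested data as a JSON record; the field layout is illustrative and is not the PRIMAD model itself.

```python
# Minimal sketch: write an execution-environment record (interpreter,
# platform, installed package versions, input-file checksums) alongside a
# computation's outputs. The JSON layout here is illustrative only.
import hashlib
import json
import platform
import sys
from importlib import metadata

def sha256(path, chunk=1 << 20):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()

def provenance_record(input_files):
    return {
        "python": sys.version,
        "platform": platform.platform(),
        "packages": {dist.metadata["Name"]: dist.version
                     for dist in metadata.distributions()},
        "inputs": {path: sha256(path) for path in input_files},
    }

if __name__ == "__main__":
    # Usage: python provenance.py input1.dat input2.dat > provenance.json
    json.dump(provenance_record(sys.argv[1:]), sys.stdout, indent=2)
```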

  11. Safety and feasibility of STAT RAD: Improvement of a novel rapid tomotherapy-based radiation therapy workflow by failure mode and effects analysis.

    PubMed

    Jones, Ryan T; Handsfield, Lydia; Read, Paul W; Wilson, David D; Van Ausdal, Ray; Schlesinger, David J; Siebers, Jeffrey V; Chen, Quan

    2015-01-01

    The clinical challenge of radiation therapy (RT) for painful bone metastases requires clinicians to consider both treatment efficacy and patient prognosis when selecting a radiation therapy regimen. The traditional RT workflow requires several weeks for common palliative RT schedules of 30 Gy in 10 fractions or 20 Gy in 5 fractions. At our institution, we have created a new RT workflow termed "STAT RAD" that allows clinicians to perform computed tomographic (CT) simulation, planning, and highly conformal single-fraction treatment delivery within 2 hours. In this study, we evaluate the safety and feasibility of the STAT RAD workflow. A failure mode and effects analysis (FMEA) was performed on the STAT RAD workflow, including development of a process map; identification of potential failure modes; description of the cause and effect, temporal occurrence, and team member involvement for each failure mode; and examination of existing safety controls. A risk probability number (RPN) was calculated for each failure mode. As necessary, workflow adjustments were then made to safeguard failure modes with significant RPN values, and the RPNs were recomputed after these alterations. A total of 72 potential failure modes were identified in the pre-FMEA STAT RAD workflow, of which 22 met the RPN threshold for clinical significance. Workflow adjustments included the addition of a team member checklist, changing simulation from megavoltage CT to kilovoltage CT, alteration of patient-specific quality assurance testing, and allocating increased time for critical workflow steps. After these modifications, only one failure mode maintained RPN significance: patient motion after alignment or during treatment. Performing the FMEA for the STAT RAD workflow before clinical implementation has significantly strengthened the safety and feasibility of STAT RAD, and the FMEA proved a valuable evaluation tool, identifying potential problem areas so that we could create a safer workflow.
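
    In a conventional FMEA, the RPN is the product of severity, occurrence, and detectability scores, and failure modes above a chosen threshold are flagged for mitigation. The sketch below shows that bookkeeping; the failure modes, scores, scales, and threshold are invented examples, not the study's actual values.

```python
# Illustrative FMEA bookkeeping: RPN = severity * occurrence * detectability,
# each scored here on a 1-10 scale. All entries below are invented examples.
from dataclasses import dataclass

@dataclass
class FailureMode:
    name: str
    severity: int       # harm if the failure occurs (1-10)
    occurrence: int     # likelihood of the failure occurring (1-10)
    detectability: int  # 10 = very hard to detect before it causes harm

    @property
    def rpn(self) -> int:
        return self.severity * self.occurrence * self.detectability

modes = [
    FailureMode("Patient motion after alignment", 8, 4, 7),
    FailureMode("Wrong CT dataset loaded at planning", 9, 2, 3),
    FailureMode("Checklist step skipped under time pressure", 5, 5, 4),
]

THRESHOLD = 100  # invented clinical-significance cut-off
for mode in sorted(modes, key=lambda m: m.rpn, reverse=True):
    flag = "REVIEW" if mode.rpn >= THRESHOLD else "ok"
    print(f"{mode.rpn:4d}  {flag:6s}  {mode.name}")
```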

  12. A Unifying Mathematical Framework for Genetic Robustness, Environmental Robustness, Network Robustness and their Tradeoff on Phenotype Robustness in Biological Networks Part II: Ecological Networks

    PubMed Central

    Chen, Bor-Sen; Lin, Ying-Po

    2013-01-01

    In ecological networks, network robustness should be large enough to confer intrinsic robustness for tolerating intrinsic parameter fluctuations, as well as environmental robustness for resisting environmental disturbances, so that the phenotype stability of ecological networks can be maintained, thus guaranteeing phenotype robustness. However, it is difficult to analyze the network robustness of ecological systems because they are complex nonlinear partial differential stochastic systems. This paper develops a unifying mathematical framework for investigating the principles of both robust stabilization and environmental disturbance sensitivity in ecological networks. We found that the phenotype robustness criterion for ecological networks is that if intrinsic robustness + environmental robustness ≤ network robustness, then phenotype robustness can be maintained in spite of intrinsic parameter fluctuations and environmental disturbances. These results for robust ecological networks are similar to those for robust gene regulatory networks and evolutionary networks, even though they have different spatio-temporal scales. PMID:23515112
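
    Written out symbolically, the stated criterion reads as below; the symbols are introduced here only for readability and are not the paper's own notation.

```latex
% R_I: intrinsic robustness, R_E: environmental robustness, R_N: network robustness.
% Symbols introduced here for readability; not the paper's notation.
\[
  R_I + R_E \le R_N
  \quad \Longrightarrow \quad
  \text{phenotype robustness of the ecological network is maintained.}
\]
```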

  13. Content and Workflow Management for Library Websites: Case Studies

    ERIC Educational Resources Information Center

    Yu, Holly, Ed.

    2005-01-01

    Using database-driven web pages or web content management (WCM) systems to manage increasingly diverse web content and to streamline workflows is a common, widely recognized solution in libraries today. However, limited library web content management models and funding constraints prevent many libraries from purchasing commercially available…

  14. Biocuration workflows and text mining: overview of the BioCreative 2012 Workshop Track II.

    PubMed

    Lu, Zhiyong; Hirschman, Lynette

    2012-01-01

    Manual curation of data from the biomedical literature is a rate-limiting factor for many expert curated databases. Despite the continuing advances in biomedical text mining and the pressing needs of biocurators for better tools, few existing text-mining tools have been successfully integrated into production literature curation systems such as those used by the expert curated databases. To close this gap and better understand all aspects of literature curation, we invited submissions of written descriptions of curation workflows from expert curated databases for the BioCreative 2012 Workshop Track II. We received seven qualified contributions, primarily from model organism databases. Based on these descriptions, we identified commonalities and differences across the workflows, the common ontologies and controlled vocabularies used and the current and desired uses of text mining for biocuration. Compared to a survey done in 2009, our 2012 results show that many more databases are now using text mining in parts of their curation workflows. In addition, the workshop participants identified text-mining aids for finding gene names and symbols (gene indexing), prioritization of documents for curation (document triage) and ontology concept assignment as those most desired by the biocurators. DATABASE URL: http://www.biocreative.org/tasks/bc-workshop-2012/workflow/.

  15. Design and validation of general biology learning program based on scientific inquiry skills

    NASA Astrophysics Data System (ADS)

    Cahyani, R.; Mardiana, D.; Noviantoro, N.

    2018-03-01

    Scientific inquiry is highly recommended for teaching science. The reality in schools and colleges, however, is that many educators have still not implemented inquiry learning because they lack understanding of it. The study aims to 1) analyze students’ difficulties in learning General Biology, 2) design a General Biology learning program based on multimedia-assisted scientific inquiry learning, and 3) validate the proposed design. The method used was Research and Development. The subjects of the study were 27 pre-service students preparing to teach in general elementary schools/Islamic elementary schools. The workflow of the program design includes identifying learning difficulties in General Biology, designing course programs, and designing instruments and assessment rubrics. The program design covers four lecture sessions, and all learning tools were validated by expert judges. The results showed that: 1) several problems were identified in General Biology lectures; 2) the designed products include learning programs, multimedia characteristics, worksheet characteristics, and scientific attitudes; and 3) expert validation shows that all program designs are valid and can be used with minor revisions.

  16. Automatic Image Processing Workflow for the Keck/NIRC2 Vortex Coronagraph

    NASA Astrophysics Data System (ADS)

    Xuan, Wenhao; Cook, Therese; Ngo, Henry; Zawol, Zoe; Ruane, Garreth; Mawet, Dimitri

    2018-01-01

    The Keck/NIRC2 camera, equipped with the vortex coronagraph, is an instrument targeted at the high contrast imaging of extrasolar planets. To uncover a faint planet signal from the overwhelming starlight, we utilize the Vortex Image Processing (VIP) library, which carries out principal component analysis to model and remove the stellar point spread function. To bridge the gap between data acquisition and data reduction, we implement a workflow that 1) downloads, sorts, and processes data with VIP, 2) stores the analysis products into a database, and 3) displays the reduced images, contrast curves, and auxiliary information on a web interface. Both angular differential imaging and reference star differential imaging are implemented in the analysis module. A real-time version of the workflow runs during observations, allowing observers to make educated decisions about time distribution on different targets, hence optimizing science yield. The post-night version performs a standardized reduction after the observation, building up a valuable database that not only helps uncover new discoveries, but also enables a statistical study of the instrument itself. We present the workflow, and an examination of the contrast performance of the NIRC2 vortex with respect to factors including target star properties and observing conditions.
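
    The PSF-subtraction step that the workflow delegates to VIP can be sketched directly: build a principal-component model of the stellar halo from the image cube, subtract each frame's projection onto that model, derotate by the parallactic angles, and median-combine. The NumPy/SciPy sketch below illustrates generic ADI-PCA and is not VIP's API; the cube, angles, rotation sign convention, and number of components are placeholders.

```python
# Generic ADI-PCA sketch (illustrative; not the VIP library's API).
# cube: (n_frames, ny, nx) coronagraphic image cube; angles: parallactic
# angles in degrees; ncomp: number of principal components to subtract.
import numpy as np
from scipy.ndimage import rotate

def adi_pca(cube, angles, ncomp=5):
    n, ny, nx = cube.shape
    flat = cube.reshape(n, ny * nx)
    flat = flat - flat.mean(axis=0)          # remove the mean frame
    # Principal components of the (mostly stellar) signal via SVD.
    _, _, vt = np.linalg.svd(flat, full_matrices=False)
    basis = vt[:ncomp]                       # (ncomp, ny * nx)
    # Subtract each frame's projection onto the PCA model of the PSF.
    residual = (flat - (flat @ basis.T) @ basis).reshape(n, ny, nx)
    # Derotate so any planet signal aligns across frames, then combine.
    derotated = [rotate(frame, -ang, reshape=False, order=1)
                 for frame, ang in zip(residual, angles)]
    return np.median(derotated, axis=0)

# Tiny synthetic example: a pure-noise cube with arbitrary rotation angles.
cube = np.random.normal(size=(20, 64, 64))
angles = np.linspace(0.0, 30.0, 20)
print(adi_pca(cube, angles, ncomp=5).shape)
```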

  17. Fabrication of Zirconia-Reinforced Lithium Silicate Ceramic Restorations Using a Complete Digital Workflow

    PubMed Central

    Rödiger, Matthias; Ziebolz, Dirk; Schmidt, Anne-Kathrin

    2015-01-01

    This case report describes the fabrication of monolithic all-ceramic restorations using zirconia-reinforced lithium silicate (ZLS) ceramics. The use of a powder-free intraoral scanner, generative (additive) fabrication of the working model, and CAD/CAM production of the restorations in the dental laboratory allows a completely digital workflow. The newly introduced ZLS ceramics offer a unique combination of fracture strength (>420 MPa), excellent optical properties, and optimum polishing characteristics, making them an interesting material option for monolithic restorations in the digital workflow. PMID:26509088

  18. How robust is a robust policy? A comparative analysis of alternative robustness metrics for supporting robust decision analysis.

    NASA Astrophysics Data System (ADS)

    Kwakkel, Jan; Haasnoot, Marjolijn

    2015-04-01

    In response to climate and socio-economic change, there is an increasing call in various policy domains for robust plans or policies, that is, plans or policies that perform well over a very large range of plausible futures. In the literature, a wide range of alternative robustness metrics can be found; the relative merit of these alternative conceptualizations of robustness has, however, received less attention. Evidently, different robustness metrics can result in different plans or policies being adopted. This paper investigates the consequences of several robustness metrics for decision making, illustrated here by the design of a flood risk management plan. A fictitious case, inspired by a river reach in the Netherlands, is used. The performance of this system in terms of casualties, damages, and costs for flood and damage mitigation actions is explored using a time horizon of 100 years and accounting for uncertainties pertaining to climate change and land use change. A set of candidate policy options is specified up front; this set includes dike raising, dike strengthening, creating more space for the river, and flood-proof building and evacuation options. The overarching aim is to design an effective flood risk mitigation strategy that is designed from the outset to be adapted over time in response to how the future actually unfolds. To this end, the plan is based on the dynamic adaptive policy pathway approach (Haasnoot, Kwakkel et al. 2013) being used in the Dutch Delta Program. The policy problem is formulated as a multi-objective robust optimization problem (Kwakkel, Haasnoot et al. 2014). We solve the multi-objective robust optimization problem using several alternative robustness metrics, including both satisficing robustness metrics and regret-based robustness metrics. Satisficing robustness metrics focus on the performance of candidate plans across a large ensemble of plausible futures. Regret-based robustness metrics compare the
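
    The contrast between the two metric families can be made concrete with a toy performance matrix of candidate policies evaluated across an ensemble of plausible futures: a satisficing metric counts the futures in which a policy meets a stated target, while a regret metric measures how far the policy falls behind the best achievable performance in each future. The numbers and threshold below are invented and bear no relation to the flood risk case.

```python
# Toy comparison of robustness metric families (all values invented).
# Rows: candidate policies; columns: plausible futures; entries: damage cost
# from some model run (lower is better).
import numpy as np

performance = np.array([
    [3.0, 4.0, 9.0, 2.0],   # policy A
    [5.0, 5.0, 5.0, 5.0],   # policy B
    [2.0, 3.0, 12.0, 1.0],  # policy C
])
policies = ["A", "B", "C"]
TARGET = 6.0  # satisficing threshold: maximum "acceptable" damage

# Satisficing metric: fraction of futures in which the target is met.
domain = (performance <= TARGET).mean(axis=1)

# Regret metric: shortfall relative to the best policy in each future,
# summarised here by the worst case (minimax regret).
regret = performance - performance.min(axis=0)
max_regret = regret.max(axis=1)

for name, d, r in zip(policies, domain, max_regret):
    print(f"policy {name}: satisficing score {d:.2f}, max regret {r:.1f}")
```

    In this toy case the steady policy B meets the target in every future, while the often-cheaper policy C carries the largest worst-case regret, illustrating how the two metric families can rank the same options differently.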

  19. Barriers to effective, safe communication and workflow between nurses and non-consultant hospital doctors during out-of-hours.

    PubMed

    Brady, Anne-Marie; Byrne, Gobnait; Quirke, Mary Brigid; Lynch, Aine; Ennis, Shauna; Bhangu, Jaspreet; Prendergast, Meabh

    2017-11-01

    This study aimed to evaluate the nature and type of communication and workflow arrangements between nurses and doctors out-of-hours (OOH). Effective communication and workflow arrangements between nurses and doctors are essential to minimize risk in hospital settings, particularly in the out-of-hours period. Timely patient flow is a priority for all healthcare organizations, and the quality of communication and workflow arrangements influences patient safety. The study used a qualitative descriptive design; data collection methods included focus groups and individual interviews, conducted in a 500-bed tertiary referral acute hospital in Ireland with junior and senior non-consultant hospital doctors, staff nurses, and nurse managers. Both nurses and doctors acknowledged the importance of good interdisciplinary communication and collaborative working in sustaining effective workflow and enabling a supportive working environment and patient safety. Indeed, issues of safety and missed care OOH were found to be due primarily to difficulties of communication and workflow. Medical workflow OOH is often dependent on cues and communication to/from nursing. However, communication systems, and in particular the bleep system, considered central to the process of communication between doctors and nurses OOH, can contribute to workflow challenges and increased staff stress. It was reported as commonplace for routine work that should be completed during normal hours to fall into OOH when resources were most limited, further compounding risk to patient safety. Enhancement of communication strategies between nurses and doctors has the potential to remove barriers to effective decision-making and patient flow. © The Author 2017. Published by Oxford University Press in association with the International Society for Quality in Health Care. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com

  20. Separating Business Logic from Medical Knowledge in Digital Clinical Workflows Using Business Process Model and Notation and Arden Syntax.

    PubMed

    de Bruin, Jeroen S; Adlassnig, Klaus-Peter; Leitich, Harald; Rappelsberger, Andrea

    2018-01-01

    Evidence-based clinical guidelines have a major positive effect on the physician's decision-making process, and computer-executable clinical guidelines allow for automated guideline marshalling during the clinical diagnostic process, thus improving decision-making. Our objective was to implement a digital clinical guideline for the prevention of mother-to-child transmission of hepatitis B as a computerized workflow, thereby separating business logic from medical knowledge and decision-making. We used the Business Process Model and Notation (BPMN) language system Activiti for business logic and workflow modeling; medical decision-making was performed by an Arden-Syntax-based medical rule engine, which is part of the ARDENSUITE software. We succeeded in creating an electronic clinical workflow for the prevention of mother-to-child transmission of hepatitis B in which institution-specific medical decision-making processes can be adapted without modifying the workflow's business logic. Separation of business logic and medical decision-making thus results in more easily reusable electronic clinical workflows.
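
    The architectural point, that workflow (business) logic stays in the process engine while medical knowledge sits behind a swappable decision-service boundary, can be sketched generically. The class names and the simplified hepatitis-B rule below are illustrative only and are not Activiti or ARDENSUITE code.

```python
# Conceptual sketch of the separation described above: the workflow step only
# knows a decision-service interface, so the medical knowledge behind it can
# be replaced per institution without touching the business logic.
from typing import Protocol

class DecisionService(Protocol):
    def evaluate(self, patient: dict) -> dict: ...

class HepBProphylaxisRules:
    """Stand-in for an institution-specific medical rule module."""
    def evaluate(self, patient: dict) -> dict:
        # Hypothetical, simplified rule for illustration only.
        positive = patient.get("mother_hbsag_positive", False)
        return {"recommend_immunoprophylaxis": positive}

def check_newborn_step(patient: dict, rules: DecisionService) -> str:
    """Business logic: routing only; no medical knowledge embedded here."""
    decision = rules.evaluate(patient)
    return ("schedule_immunoprophylaxis"
            if decision["recommend_immunoprophylaxis"]
            else "routine_followup")

print(check_newborn_step({"mother_hbsag_positive": True},
                         HepBProphylaxisRules()))
```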