Science.gov

Sample records for robust scientific workflows

  1. Structured Composition of Dataflow and Control-Flow for Reusable and Robust Scientific Workflows

    SciTech Connect

    Bowers, S; Ludaescher, B; Ngu, A; Critchlow, T

    2005-09-07

    Data-centric scientific workflows are often modeled as dataflow process networks. The simplicity of the dataflow framework facilitates workflow design, analysis, and optimization. However, some workflow tasks are particularly "control-flow intensive", e.g., procedures to make workflows more fault-tolerant and adaptive in an unreliable, distributed computing environment. Modeling complex control-flow directly within a dataflow framework often leads to overly complicated workflows that are hard to comprehend, reuse, schedule, and maintain. In this paper, we develop a framework that allows a structured embedding of control-flow intensive subtasks within dataflow process networks. In this way, we can seamlessly handle complex control-flows without sacrificing the benefits of dataflow. We build upon a flexible actor-oriented modeling and design approach and extend it with (actor) frames and (workflow) templates. A frame is a placeholder for an (existing or planned) collection of components with similar function and signature. A template partially specifies the behavior of a subworkflow by leaving "holes" (i.e., frames) in the subworkflow definition. Taken together, these abstraction mechanisms facilitate the separation and structured re-combination of control-flow and dataflow in scientific workflow applications. We illustrate our approach with a real-world scientific workflow from the astrophysics domain. This data-intensive workflow requires remote execution and file transfer in a semi-reliable environment. For such workflows, we propose a 3-layered architecture: The top level, typically a dataflow process network, includes Generic Data Transfer (GDT) frames and Generic remote eXecution (GX) frames. At the second level, the user can specialize the behavior of these generic components by embedding a suitable template (here: transducer templates for control-flow intensive tasks). At the third level, frames inside the transducer template are specialized by embedding
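    The frame/template idea above lends itself to a compact illustration. The following is a minimal plain-Python sketch of the concept only; the paper's actual realization lives in the Kepler/Ptolemy actor framework, and the names Frame, embed, and transfer_template here are invented for illustration.

      # Invented sketch of "frames" (placeholders) and "templates"
      # (subworkflows with holes); not the paper's Kepler implementation.
      class Frame:
          """Placeholder for any component matching a given signature."""
          def __init__(self, name, signature):
              self.name = name
              self.signature = signature  # documents the expected inputs
              self.component = None

          def embed(self, component):
              """Specialize the frame by embedding a concrete component."""
              self.component = component

          def __call__(self, *args):
              if self.component is None:
                  raise RuntimeError(f"frame '{self.name}' is unspecialized")
              return self.component(*args)

      def transfer_template(gdt_frame, src, dst, retries=3):
          """Control-flow intensive wrapper: retry a generic data transfer."""
          for _ in range(retries):
              try:
                  return gdt_frame(src, dst)
              except IOError:
                  continue
          raise IOError(f"transfer {src} -> {dst} failed after {retries} tries")

      gdt = Frame("GenericDataTransfer", signature=("src", "dst"))
      gdt.embed(lambda src, dst: f"copied {src} to {dst}")  # e.g., a transfer actor
      print(transfer_template(gdt, "remote:/data/run1", "local:/scratch/run1"))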

  2. Scientific Workflows in Astronomy

    NASA Astrophysics Data System (ADS)

    Schaaff, A.; Verdes-Montenegro, L.; Ruiz, J. E.; Santander-Vela, J.

    2012-09-01

    We will soon be facing a new generation of facilities and archives dealing with huge amounts of data (ALMA, LSST, Pan-STARRS, LOFAR, SKA pathfinders, …) where scientific workflows will play an important role in the working methodology of astronomers. While traditional pipelines tend to produce exploitable products, scientific workflows are aimed at producing scientific insight. Virtual Observatory standards provide the tools to design reproducible scientific workflows. A detailed analysis of the state of the art of workflows covers languages, design tools, execution engines, use cases, etc. A major topic is also the preservation of workflows and the capability to replay a workflow several years after its design and implementation. Discussions on these topics have recently been held in IVOA forums and are part of the work being done in the Wf4Ever project. The purpose of the BoF was to present to the community the work in progress at the IVOA, collect ideas, and identify needs not yet addressed.

  3. Scientific workflows for bibliometrics.

    PubMed

    Guler, Arzu Tugce; Waaijer, Cathelijn J F; Palmblad, Magnus

    Scientific workflows organize the assembly of specialized software into an overall data flow and are particularly well suited for multi-step analyses using different types of software tools. They are also favorable in terms of reusability, as previously designed workflows can be made publicly available through the myExperiment community and then used in other workflows. We here illustrate how scientific workflows, and the Taverna workbench in particular, can be used in bibliometrics. We discuss the specific capabilities of Taverna that make this software a powerful tool in this field, such as automated data import via Web services, data extraction from XML with XPath, and statistical analysis and visualization with R. The support of the latter is particularly relevant, as it allows integration of a number of recently developed R packages specifically for bibliometrics. Examples are used to illustrate the possibilities of Taverna in the fields of bibliometrics and scientometrics.
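    As a rough illustration of the extraction pattern the abstract describes (XML returned by a Web service, mined with XPath), the following standalone Python sketch stands in for what Taverna wires together graphically; the record structure and field values are invented sample data, not a real service response.

      # The pattern Taverna automates here: call a bibliographic web
      # service, then extract fields from the returned XML with XPath.
      # The XML below is invented sample data standing in for a response.
      import xml.etree.ElementTree as ET

      response = """
      <records>
        <record><title>Scientific workflows for bibliometrics</title>
                <citations>42</citations></record>
        <record><title>Scientific Workflow Management in Proteomics</title>
                <citations>97</citations></record>
      </records>
      """

      root = ET.fromstring(response)
      # ElementTree supports a useful subset of XPath expressions.
      for rec in root.findall("./record"):
          title = rec.findtext("title")
          cites = int(rec.findtext("citations"))
          print(f"{title}: {cites} citations")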

  4. A robust scientific workflow for assessing fire danger levels using open-source software

    NASA Astrophysics Data System (ADS)

    Vitolo, Claudia; Di Giuseppe, Francesca; Smith, Paul

    2017-04-01

    Modelling forest fires is theoretically and computationally challenging because it involves the use of a wide variety of information, in large volumes and affected by high uncertainties. In-situ observations of wildfire, for instance, are highly sparse and need to be complemented by remotely sensed data measuring biomass burning to achieve homogeneous coverage at global scale. Fire models use weather reanalysis products to measure energy release and rate of spread but can only assess the potential predictability of fire danger as the actual ignition is due to human behaviour and, therefore, very unpredictable. Lastly, fire forecasting systems rely on weather forecasts to extend the advance warning but are currently calibrated using fire danger thresholds that are defined at global scale and do not take into account the spatial variability of fuel availability. As a consequence, uncertainties sharply increase cascading from the observational to the modelling stage and they might be further inflated by non-reproducible analyses. Although uncertainties in observations will only decrease with technological advances over the next decades, the other uncertainties (i.e. generated during modelling and post-processing) can already be addressed by developing transparent and reproducible analysis workflows, even more if implemented within open-source initiatives. This is because reproducible workflows aim to streamline the processing task as they present ready-made solutions to handle and manipulate complex and heterogeneous datasets. Also, opening the code to the scrutiny of other experts increases the chances to implement more robust solutions and avoids duplication of efforts. In this work we present our contribution to the forest fire modelling community: an open-source tool called "caliver" for the calibration and verification of forest fire model results. This tool is developed in the R programming language and publicly available under an open license. We will present

  5. Managing and Documenting Legacy Scientific Workflows.

    PubMed

    Acuña, Ruben; Chomilier, Jacques; Lacroix, Zoé

    2015-10-06

    Scientific legacy workflows are often developed over many years, poorly documented and implemented with scripting languages. In the context of our cross-disciplinary projects we face the problem of maintaining such scientific workflows. This paper presents the Workflow Instrumentation for Structure Extraction (WISE) method used to process several ad-hoc legacy workflows written in Python and automatically produce their workflow structural skeleton. Unlike many existing methods, WISE does not assume input workflows to be preprocessed in a known workflow formalism. It is also able to identify and analyze calls to external tools. We present the method and report its results on several scientific workflows.
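    WISE's actual implementation is not shown here; the sketch below only illustrates the general idea of statically scanning a legacy Python script for calls to external tools, using the standard ast module (Python 3.9+ for ast.unparse). The legacy script and tool names are invented.

      # Minimal sketch of the general idea behind WISE (not its actual
      # code): statically scan a legacy Python script and record calls
      # to external tools, yielding a structural workflow skeleton.
      import ast
      import textwrap

      legacy_script = textwrap.dedent("""
          import subprocess, os
          subprocess.call(["blastp", "-query", "seqs.fa", "-db", "nr"])
          os.system("muscle -in hits.fa -out aligned.fa")
      """)

      class ToolCallFinder(ast.NodeVisitor):
          EXTERNAL = {("subprocess", "call"), ("subprocess", "run"), ("os", "system")}

          def __init__(self):
              self.calls = []

          def visit_Call(self, node):
              f = node.func
              if (isinstance(f, ast.Attribute) and isinstance(f.value, ast.Name)
                      and (f.value.id, f.attr) in self.EXTERNAL):
                  self.calls.append(ast.unparse(node.args[0]))
              self.generic_visit(node)

      finder = ToolCallFinder()
      finder.visit(ast.parse(legacy_script))
      print(finder.calls)  # the external-tool steps of the workflow skeleton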

  6. Scientific Workflow Management in Proteomics

    PubMed Central

    de Bruin, Jeroen S.; Deelder, André M.; Palmblad, Magnus

    2012-01-01

    Data processing in proteomics can be a challenging endeavor, requiring extensive knowledge of many different software packages, all with different algorithms, data format requirements, and user interfaces. In this article we describe the integration of a number of existing programs and tools in Taverna Workbench, a scientific workflow manager currently being developed in the bioinformatics community. We demonstrate how a workflow manager provides a single, visually clear and intuitive interface to complex data analysis tasks in proteomics, from raw mass spectrometry data to protein identifications and beyond. PMID:22411703

  7. Scientific Process Automation and Workflow Management

    SciTech Connect

    Ludaescher, Bertram T.; Altintas, Ilkay; Bowers, Shawn; Cummings, J.; Critchlow, Terence J.; Deelman, Ewa; De Roure, D.; Freire, Juliana; Goble, Carole; Jones, Matt; Klasky, S.; McPhillips, Timothy; Podhorszki, Norbert; Silva, C.; Taylor, I.; Vouk, M.

    2010-01-01

    We introduce and describe scientific workflows, i.e., executable descriptions of automatable scientific processes such as computational science simulations and data analyses. Scientific workflows are often expressed in terms of tasks and their (dataflow) dependencies. This chapter first provides an overview of the characteristic features of scientific workflows and outlines their life cycle. A detailed case study highlights workflow challenges and solutions in simulation management. We then provide a brief overview of how some concrete systems support the various phases of the workflow life cycle, i.e., design, resource management, execution, and provenance management. We conclude with a discussion on community-based workflow sharing.

  8. Context-Aware Scientific Workflow Systems using KEPLER

    SciTech Connect

    Ngu, Anne H.; Jamnagarwala, Arwa; Chin, George; Sivaramakrishnan, Chandrika; Critchlow, Terence J.

    2010-04-01

    Data-intensive scientific workflows are often modeled using a dataflow-oriented model. The simplicity of a dataflow model facilitates intuitive workflow design, analysis, and optimization. However, some amount of control-flow modeling is often necessary for engineering fault-tolerant, robust, and adaptive workflows. Modeling the control-flow using inherent dataflow constructs quickly ends up producing a workflow that is hard to comprehend, reuse, and maintain. In this paper, we propose a context-aware architecture for scientific workflows. By incorporating contexts within a dataflow-oriented scientific workflow system, we enable the development of context-aware scientific workflows without the need to use numerous low-level control-flow actors. This results in a workflow that is aware of its environment during execution with minimal user input and responds intelligently based on such awareness at runtime. A further advantage of our approach is that the defined contexts can be reused and shared across other workflows. We demonstrate our approach with two prototype implementations of context-aware actors in KEPLER.

  9. Working with Workflows: Highlights from 5 years Building Scientific Workflows

    SciTech Connect

    Critchlow, Terence J.; Altintas, Ilkay; Chin, George; Crawl, Daniel; Iyer, H.; Khan, Ayla; Klasky, S.; Koehler, Sven; Ludaescher, Bertram T.; Mouallem, Pierre; Nagappan, Mie; Podhorszki, Norbert; Shoshani, Arie; Silva, C.; Tchoua, Roselynne; Vouk, M.

    2011-07-30

    In 2006, the SciDAC Scientific Data Management (SDM) Center proposed to continue its work deploying leading-edge data management and analysis capabilities to scientific applications. One of three thrust areas within the proposed center was focused on Scientific Process Automation (SPA) using workflow technology. As a founding member of the Kepler consortium [LAB+09], the SDM Center team was well positioned to begin deploying workflows immediately. We were also keenly aware of some of the deficiencies in Kepler when applied to high performance computing workflows, which allowed us to focus our research and development efforts on critical new capabilities that were ultimately integrated into the Kepler open source distribution, benefiting the entire community. Significant work was required to ensure Kepler was capable of supporting large-scale production runs for SciDAC applications. Our work on generic actors and templates has improved the portability of workflows across machines and provided a higher level of abstraction for workflow developers. Fault tolerance and provenance tracking were obvious areas for improvement within Kepler given the longevity and complexity of our target workflows. To monitor workflow execution, we developed and deployed a web-based dashboard. We then generalized this interface and released it so it could be deployed at other locations. Outreach has always been a primary focus of our work, and we have had many successful deployments across a number of scientific domains while continually publishing and presenting our work. This short paper describes our most significant accomplishments over the past 5 years. Additional information about the SDM Center can be found in the companion paper: The Scientific Data Management Center: Available Technologies and Highlights.

  10. Automation of Network-Based Scientific Workflows

    SciTech Connect

    Altintas, I.; Barreto, R.; Blondin, J. M.; Cheng, Z.; Critchlow, T.; Khan, A.; Klasky, Scott A; Ligon, J.; Ludaescher, B.; Mouallem, P. A.; Parker, S.; Podhorszki, Norbert; Shoshani, A.; Silva, C.; Vouk, M. A.

    2007-01-01

    Comprehensive, end-to-end data and workflow management solutions are needed to handle the increasing complexity of processes and data volumes associated with modern distributed scientific problem solving, such as ultra-scale simulations and high-throughput experiments. The key to the solution is an integrated network-based framework that is functional, dependable, fault-tolerant, and supports data and process provenance. Such a framework needs to make the development and use of application workflows dramatically easier so that scientists' efforts can shift away from data management and utility software development to scientific research and discovery. An integrated view of these activities is provided by the notion of scientific workflows - a series of structured activities and computations that arise in scientific problem-solving. An information technology framework that supports scientific workflows is the Ptolemy II based environment called Kepler. This paper discusses the issues associated with practical automation of scientific processes and workflows and illustrates this with workflows developed using the Kepler framework and tools.

  11. Software and Workflow Provenance: Documenting Scientific Methods

    NASA Astrophysics Data System (ADS)

    Gil, Y.

    2016-12-01

    While major effort has been devoted to data provenance, the provenance of software and workflows is not as well understood. Software artifacts have their own provenance in terms of how they were created, what theories or models are behind them, and how they are derived from earlier versions or prior implementations. Workflows also evolve to reflect changes in science methods, as they are extended or their assumptions are lifted. These provenance records may be created separately from the provenance records of the articles themselves, but are needed to fully document the provenance of new scientific results in the published record. These are issues that the EarthCube OntoSoft project and the Geoscience Papers of the Future initiative are exploring. This presentation will discuss the requirements for capturing software and workflow provenance, the integration of those provenance records with the provenance of new results in an article, and the integration of data provenance with software and workflow provenance. The presentation will also discuss how provenance records can support automated credit assignment, so that the explicit provenance records of the results in a paper can be automatically analyzed to assign credit to the software and workflows used to generate those results.

  12. Kepler Scientific Workflow Design and Execution with Contexts

    SciTech Connect

    Ngu, Anne Hee Hiong; Jamnagarwala, Arwa; Chin, George; Sivaramakrishnan, Chandrika; Critchlow, Terence J.

    2011-09-01

    A context-aware scientific workflow is a typical scientific workflow enhanced with context binding and awareness mechanisms. Context facilitates further configuration of the scientific workflow at runtime such that it is tuned to its environment during execution and responds intelligently based on such awareness, without customized coding of the workflow. In this paper, we present a context annotation framework, which supports rapid development of context-aware scientific workflows. Context annotation enables diverse types of actors in Kepler to bind with different sensed environmental information as part of the actor's regular data. Context-aware actors simplify the construction of scientific workflows that would otherwise require intricate knowledge to initialize and configure a large number of parameters to cover all the different execution conditions. This paper presents the motivation, system design, implementation, and usage of context annotation in relation to the Kepler scientific workflow system.
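    Kepler actors are Java components, so the following Python decorator is only an analogy for the binding mechanism described above; sense_context and its fields are invented for illustration.

      # Analogy for context binding (invented names, not Kepler's API):
      # sensed environment values are injected into an actor's inputs at
      # runtime instead of being hard-coded as configuration parameters.
      import functools

      def sense_context():
          """Stand-in for real environment sensing (hosts up, queue load, ...)."""
          return {"remote_host_up": True, "queue_load": 0.3}

      def context_aware(actor):
          @functools.wraps(actor)
          def wrapper(*args, **kwargs):
              kwargs["context"] = sense_context()  # bind context at runtime
              return actor(*args, **kwargs)
          return wrapper

      @context_aware
      def submit_job(job, context=None):
          # The actor adapts to its environment without extra control-flow actors.
          target = "remote" if context["remote_host_up"] else "local"
          return f"submitted {job} to {target} (load {context['queue_load']})"

      print(submit_job("simulation_42"))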

  13. Scientific Workflows Composition and Deployment on SOA Frameworks

    SciTech Connect

    Liu, Yan; Gorton, Ian; Wynne, Adam S.; Kulkarni, Anand V.

    2011-12-12

    Scientific workflows normally consist of multiple applications acquiring and transforming data, running data-intensive analyses, and visualizing the results for scientific discovery. To compose and deploy such scientific workflows, an SOA platform can provide integration of third-party components, services, and tools. In this paper, we present our application of Service-Oriented Architecture (SOA) to compose and deploy systems biology workflows. In developing this application, our solution uses MeDICi, a middleware framework built on SOA platforms, as an integration layer. We discuss our experience and lessons learnt about this solution that are generally applicable to scientific workflows in other domains.

  14. Scientific workflows as productivity tools for drug discovery.

    PubMed

    Shon, John; Ohkawa, Hitomi; Hammer, Juergen

    2008-05-01

    Large pharmaceutical companies annually invest tens to hundreds of millions of US dollars in research informatics to support their early drug discovery processes. Traditionally, most of these investments are designed to increase the efficiency of drug discovery. The introduction of do-it-yourself scientific workflow platforms has enabled research informatics organizations to shift their efforts toward scientific innovation, ultimately resulting in a possible increase in return on their investments. Unlike most approaches to scientific data handling and application integration, researchers apply scientific workflows to in silico experimentation and exploration, leading to scientific discoveries that lie beyond automation and integration. This review highlights some key requirements for scientific workflow environments in the pharmaceutical industry that are necessary for increasing research productivity. Examples of the application of scientific workflows in research and a summary of recent platform advances are also provided.

  15. Tigres Workflow Library: Supporting Scientific Pipelines on HPC Systems

    DOE PAGES

    Hendrix, Valerie; Fox, James; Ghoshal, Devarshi; ...

    2016-07-21

    The growth in scientific data volumes has resulted in the need for new tools that enable users to operate on and analyze data on large-scale resources. In the last decade, a number of scientific workflow tools have emerged. These tools often target distributed environments, and often need expert help to compose and execute the workflows. Data-intensive workflows are often ad hoc; they involve an iterative development process that includes users composing and testing their workflows on desktops, and scaling up to larger systems. In this paper, we present the design and implementation of Tigres, a workflow library that supports the iterative workflow development cycle of data-intensive workflows. Tigres provides an application programming interface to a set of programming templates (i.e., sequence, parallel, split, merge) that can be used to compose and execute computational and data pipelines. We discuss the results of our evaluation of scientific and synthetic workflows, showing that Tigres performs with minimal template overheads (a mean of 13 seconds over all experiments). We also discuss various factors (e.g., I/O performance, execution mechanisms) that affect the performance of scientific workflows on HPC systems.
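    To make the template vocabulary concrete, here is an invented Python rendering of the two simplest templates; the real Tigres API differs in names and signatures, so treat these helpers strictly as stand-ins.

      # Illustrative reimplementation of the sequence and parallel
      # template ideas (invented API, not actual Tigres signatures).
      from concurrent.futures import ThreadPoolExecutor

      def sequence(tasks, data):
          """Feed each task's output into the next (a pipeline)."""
          for task in tasks:
              data = task(data)
          return data

      def parallel(task, inputs):
          """Run one task over many inputs concurrently (a fan-out)."""
          with ThreadPoolExecutor() as pool:
              return list(pool.map(task, inputs))

      clean   = lambda s: s.strip()
      analyze = lambda s: len(s)

      # split/merge behavior composed from the two primitives above:
      results = parallel(lambda s: sequence([clean, analyze], s),
                         ["  alpha ", " beta", "gamma  "])
      print(results)  # [5, 4, 5]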

  16. Tigres Workflow Library: Supporting Scientific Pipelines on HPC Systems

    SciTech Connect

    Hendrix, Valerie; Fox, James; Ghoshal, Devarshi; Ramakrishnan, Lavanya

    2016-07-21

    The growth in scientific data volumes has resulted in the need for new tools that enable users to operate on and analyze data on large-scale resources. In the last decade, a number of scientific workflow tools have emerged. These tools often target distributed environments, and often need expert help to compose and execute the workflows. Data-intensive workflows are often ad hoc; they involve an iterative development process that includes users composing and testing their workflows on desktops, and scaling up to larger systems. In this paper, we present the design and implementation of Tigres, a workflow library that supports the iterative workflow development cycle of data-intensive workflows. Tigres provides an application programming interface to a set of programming templates (i.e., sequence, parallel, split, merge) that can be used to compose and execute computational and data pipelines. We discuss the results of our evaluation of scientific and synthetic workflows, showing that Tigres performs with minimal template overheads (a mean of 13 seconds over all experiments). We also discuss various factors (e.g., I/O performance, execution mechanisms) that affect the performance of scientific workflows on HPC systems.

  17. Comparison of Resource Platform Selection Approaches for Scientific Workflows

    SciTech Connect

    Simmhan, Yogesh; Ramakrishnan, Lavanya

    2010-03-05

    Cloud computing is increasingly considered as an additional computational resource platform for scientific workflows. The cloud offers the opportunity to scale out applications from desktops and local cluster resources. At the same time, it can eliminate the challenges of restricted software environments and queue delays in shared high performance computing environments. Choosing from these diverse resource platforms for a workflow execution poses a challenge for many scientists. Scientists are often faced with deciding resource platform selection trade-offs with limited information on the actual workflows. While many workflow planning methods have explored task scheduling onto different resources, these methods often require fine-scale characterization of the workflow that is onerous for a scientist. In this position paper, we describe our early exploratory work into using blackbox characteristics to do a cost-benefit analysis of using cloud platforms. We use only very limited high-level information on the workflow length, width, and data sizes. The length and width are indicative of the workflow duration and parallelism. The data size characterizes the I/O requirements. We compare the effectiveness of this approach to other resource selection models using two exemplar scientific workflows scheduled on desktops, local clusters, HPC centers, and clouds. Early results suggest that the blackbox model often makes the same resource selections as a more fine-grained whitebox model. We believe the simplicity of the blackbox model can help inform a scientist on the applicability of cloud computing resources even before porting an existing workflow.
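    The blackbox idea can be made concrete with a toy calculation that uses only the three characteristics named above (length, width, data size); every platform rate below is invented for illustration.

      # Toy version of the blackbox cost-benefit idea: estimate makespan
      # and cost per platform from only length, width, and data size.
      platforms = {
          # name: (cores available, $/core-hour, queue wait h, transfer MB/s)
          "desktop": (4,    0.00, 0.0, float("inf")),  # data already local
          "cluster": (64,   0.00, 2.0, 100.0),
          "cloud":   (1000, 0.09, 0.1, 20.0),
      }

      def estimate(length_h, width, data_mb, cores, price, wait_h, mbps):
          waves = -(-width // cores)              # ceil: batches of parallel tasks
          compute_h = waves * length_h
          transfer_h = (data_mb / mbps) / 3600
          makespan = wait_h + transfer_h + compute_h
          cost = compute_h * min(width, cores) * price
          return makespan, cost

      # Workflow: 1.5 h tasks, width 200, 50 GB of input data.
      for name, spec in platforms.items():
          makespan, cost = estimate(1.5, 200, 50_000, *spec)
          print(f"{name:8s} makespan={makespan:6.1f} h  cost=${cost:8.2f}")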

  18. Distilling structure in Taverna scientific workflows: a refactoring approach

    PubMed Central

    2014-01-01

    Background Scientific workflow management systems are increasingly used to specify and manage bioinformatics experiments. Their programming model appeals to bioinformaticians, who can use them to easily specify complex data processing pipelines. Such a model is underpinned by a graph structure, where nodes represent bioinformatics tasks and links represent the dataflow. The complexity of such graph structures is increasing over time, with possible impacts on scientific workflow reuse. In this work, we propose effective methods for workflow design, with a focus on the Taverna model. We argue that one of the contributing factors to the difficulties in reuse is the presence of "anti-patterns", a term broadly used in program design to indicate the use of idiomatic forms that lead to over-complicated design. The main contribution of this work is a method for automatically detecting such anti-patterns and replacing them with different patterns that reduce the workflow's overall structural complexity. Rewriting workflows in this way is beneficial both in terms of user experience (easier design and maintenance) and in terms of operational efficiency (easier to manage, and sometimes possible to exploit the latent parallelism among the tasks). Results We have conducted a thorough study of the workflow structures available in Taverna, with the aim of finding workflow fragments whose structure could be made simpler without altering the workflow semantics. We provide four contributions. Firstly, we identify a set of anti-patterns that contribute to structural workflow complexity. Secondly, we design a series of refactoring transformations to replace each anti-pattern by a new semantically equivalent pattern with less redundancy and simplified structure. Thirdly, we introduce a distilling algorithm that takes in a workflow and produces a distilled semantically equivalent workflow. Lastly, we provide an implementation of our refactoring approach

  19. A Multi-Dimensional Classification Model for Scientific Workflow Characteristics

    SciTech Connect

    Ramakrishnan, Lavanya; Plale, Beth

    2010-04-05

    Workflows have been used to model repeatable tasks or operations in manufacturing, business processes, and software. In recent years, workflows are increasingly used for orchestration of science discovery tasks that use distributed resources and web services environments through resource models such as grid and cloud computing. Workflows have disparate requirements and constraints that affect how they might be managed in distributed environments. In this paper, we present a multi-dimensional classification model illustrated by workflow examples obtained through a survey of scientists from different domains, including bioinformatics and biomedicine, weather and ocean modeling, and astronomy, detailing their data and computational requirements. The survey results and classification model contribute to a high-level understanding of scientific workflows.

  20. Kepler + MeDICi - Service-Oriented Scientific Workflow Applications

    SciTech Connect

    Chase, Jared M.; Gorton, Ian; Sivaramakrishnan, Chandrika; Almquist, Justin P.; Wynne, Adam S.; Chin, George; Critchlow, Terence J.

    2009-07-30

    Scientific applications are often structured as workflows that execute a series of interdependent, distributed software modules to analyze large data sets. The order of execution of the tasks in a workflow is commonly controlled by complex scripts, which over time become difficult to maintain and evolve. In this paper, we describe how we have integrated the Kepler scientific workflow platform with the MeDICi Integration Framework, which has been specifically designed to provide a standards-based, lightweight, and flexible integration platform. The MeDICi technology provides a scalable, component-based architecture that efficiently handles integration with heterogeneous, distributed software systems. This paper describes the MeDICi Integration Framework and the mechanisms we used to integrate MeDICi components with Kepler workflow actors. We illustrate this solution with a workflow for an atmospheric sciences application. The resulting solution promotes a strong separation of concerns, simplifying the Kepler workflow description and promoting the creation of a reusable collection of components available for other workflow applications in this domain.

  1. Enabling On-Demand Scientific Workflows on a Federated Cloud

    SciTech Connect

    Garzoglio, Gabriele

    2014-11-05

    The Fermilab Grid and Cloud Computing Department and the KISTI Global Science experimental Data hub Center are working on a multi-year Collaborative Research and Development Agreement. The first year of this work established how to provision and manage a federation of virtual machines through cloud management systems. In this second year, we expanded the work on provisioning and federation, increasing both the scale and the diversity of solutions, and we started to build on-demand services on the established fabric, introducing the Platform as a Service paradigm to assist with the execution of scientific workflows. We have enabled scientific workflows of stakeholders to run on multiple cloud resources at the scale of 1,000 concurrent machines. The demonstrations have been in the areas of (a) Virtual Infrastructure Automation and Provisioning, (b) Interoperability and Federation of Cloud Resources, and (c) On-demand Services for Scientific Workflows.

  2. A scientific workflow framework for (13)C metabolic flux analysis.

    PubMed

    Dalman, Tolga; Wiechert, Wolfgang; Nöh, Katharina

    2016-08-20

    Metabolic flux analysis (MFA) with (13)C labeling data is a high-precision technique to quantify intracellular reaction rates (fluxes). One of the major challenges of (13)C MFA is the interactivity of the computational workflow according to which the fluxes are determined from the input data (metabolic network model, labeling data, and physiological rates). Here, the workflow assembly is inevitably determined by the scientist, who has to consider interacting biological, experimental, and computational aspects. Decision-making is context dependent and requires expertise, rendering an automated evaluation process hardly possible. We present a scientific workflow framework (SWF) for creating, executing, and controlling (13)C MFA workflows on demand. (13)C MFA-specific tools and libraries, such as the high-performance simulation toolbox 13CFLUX2, are wrapped as web services and thereby integrated into a service-oriented architecture. Besides workflow steering, the SWF features transparent provenance collection and enables full flexibility for ad hoc scripting solutions. To handle compute-intensive tasks, cloud computing is supported. We demonstrate how the challenges posed by (13)C MFA workflows can be solved with our approach on the basis of two proof-of-concept use cases.

  3. Scientific Workflows + Provenance = Better (Meta-)Data Management

    NASA Astrophysics Data System (ADS)

    Ludaescher, B.; Cuevas-Vicenttín, V.; Missier, P.; Dey, S.; Kianmajd, P.; Wei, Y.; Koop, D.; Chirigati, F.; Altintas, I.; Belhajjame, K.; Bowers, S.

    2013-12-01

    The origin and processing history of an artifact is known as its provenance. Data provenance is an important form of metadata that explains how a particular data product came about, e.g., how and when it was derived in a computational process, which parameter settings and input data were used, etc. Provenance information provides transparency and helps to explain and interpret data products. Other common uses and applications of provenance include quality control, data curation, result debugging, and, more generally, 'reproducible science'. Scientific workflow systems (e.g., Kepler, Taverna, VisTrails, and others) provide controlled environments for developing computational pipelines with built-in provenance support. Workflow results can then be explained in terms of workflow steps, parameter settings, input data, etc., using provenance that is automatically captured by the system. Scientific workflows themselves provide a user-friendly abstraction of the computational process and are thus a form of ('prospective') provenance in their own right. The full potential of provenance information is realized when combining workflow-level information (prospective provenance) with trace-level information (retrospective provenance). To this end, the DataONE Provenance Working Group (ProvWG) has developed an extension of the W3C PROV standard, called D-PROV. Whereas PROV provides a 'least common denominator' for exchanging and integrating provenance information, D-PROV adds new 'observables' that describe workflow-level information (e.g., the functional steps in a pipeline) as well as workflow-specific trace-level information (timestamps for each workflow step executed, the inputs and outputs used, etc.). Using examples, we will demonstrate how the combination of prospective and retrospective provenance provides added value in managing scientific data. The DataONE ProvWG is also developing tools based on D-PROV that allow scientists to get more mileage from provenance metadata.
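    The prospective/retrospective split can be illustrated with two toy records; the dictionary structure below is invented and is not the actual D-PROV vocabulary.

      # Sketch of the two provenance layers (invented structure): the
      # workflow definition says what *should* happen; the execution
      # trace says what *did*, linked step by step.
      prospective = {
          "workflow": "ndvi_pipeline",
          "steps": [
              {"id": "s1", "name": "fetch_scene",  "outputs": ["scene"]},
              {"id": "s2", "name": "compute_ndvi", "inputs": ["scene"]},
          ],
      }

      retrospective = {
          "run": "run-2013-09-14T03:22",
          "executions": [
              {"step": "s1", "start": "03:22:01", "end": "03:24:40",
               "outputs": {"scene": "LC8_044034.tif"}},
              {"step": "s2", "start": "03:24:41", "end": "03:25:02",
               "inputs": {"scene": "LC8_044034.tif"}},
          ],
      }

      # Joining the layers answers questions neither answers alone,
      # e.g. which concrete file flowed through the compute_ndvi step.
      by_id = {s["id"]: s["name"] for s in prospective["steps"]}
      for ex in retrospective["executions"]:
          print(by_id[ex["step"]], ex.get("inputs", ex.get("outputs")))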

  4. Science Gateways, Scientific Workflows and Open Community Software

    NASA Astrophysics Data System (ADS)

    Pierce, M. E.; Marru, S.

    2014-12-01

    Science gateways and scientific workflows occupy different ends of the spectrum of user-focused cyberinfrastructure. Gateways, sometimes called science portals, provide a way for enabling large numbers of users to take advantage of advanced computing resources (supercomputers, advanced storage systems, science clouds) by providing Web and desktop interfaces and supporting services. Scientific workflows, at the other end of the spectrum, support advanced usage of cyberinfrastructure that enables "power users" to undertake computational experiments that are not easily done through the usual mechanisms (managing simulations across multiple sites, for example). Despite these different target communities, gateways and workflows share many similarities and can potentially be accommodated by the same software system. For example, pipelines to process InSAR imagery sets or to datamine GPS time series data are workflows. The results and the ability to make downstream products may be made available through a gateway, and power users may want to provide their own custom pipelines. In this abstract, we discuss our efforts to build an open source software system, Apache Airavata, that can accommodate both gateway and workflow use cases. Our approach is general, and we have applied the software to problems in a number of scientific domains. In this talk, we discuss our applications to usage scenarios specific to earth science, focusing on earthquake physics examples drawn from the QuakSim.org and GeoGateway.org efforts. We also examine the role of the Apache Software Foundation's open community model as a way to build up common community codes that do not depend upon a single "owner" to sustain them. Pushing beyond open source software, we also see the need to provide gateways and workflow systems as cloud services. These services centralize operations, provide well-defined programming interfaces, scale elastically, and have global-scale fault tolerance. We discuss our work providing

  5. Enabling scientific workflows in virtual reality

    USGS Publications Warehouse

    Kreylos, O.; Bawden, G.; Bernardin, T.; Billen, M.I.; Cowgill, E.S.; Gold, R.D.; Hamann, B.; Jadamec, M.; Kellogg, L.H.; Staadt, O.G.; Sumner, D.Y.

    2006-01-01

    To advance research and improve the scientific return on data collection and interpretation efforts in the geosciences, we have developed methods of interactive visualization, with a special focus on immersive virtual reality (VR) environments. Earth sciences employ a strongly visual approach to the measurement and analysis of geologic data due to the spatial and temporal scales over which such data ranges. As observations and simulations increase in size and complexity, the Earth sciences are challenged to manage and interpret increasing amounts of data. Reaping the full intellectual benefits of immersive VR requires us to tailor exploratory approaches to scientific problems. These applications build on the visualization method's strengths, using both 3D perception and interaction with data and models, to take advantage of the skills and training of the geological scientists exploring their data in the VR environment. This interactive approach has enabled us to develop a suite of tools that are adaptable to a range of problems in the geosciences and beyond. Copyright © 2008 by the Association for Computing Machinery, Inc.

  6. Web-accessible scientific workflow system for performance monitoring.

    PubMed

    Versteeg, Roelof J; Richardson, Alexander N; Rowe, Trevor

    2006-04-15

    We describe the design and implementation of a web-accessible scientific workflow system for environmental performance monitoring. This workflow environment integrates distributed automated data acquisition with server side data management and information visualization through flexible browser-based data access tools. Component technologies include a rich browser-based client, a back-end server for methodical data processing, user management, and result delivery, and third party applications which are invoked by the back-end using web services. This environment allows for reproducible, transparent result generation by a diverse user base, and provides a seamless integration between data selection, analysis applications, and result delivery. This workflow system has been implemented for several sites and monitoring systems with different degrees of complexity.

  7. The Symbiotic Relationship between Scientific Workflow and Provenance (Invited)

    NASA Astrophysics Data System (ADS)

    Stephan, E.

    2010-12-01

    The purpose of this presentation is to describe the symbiotic nature of scientific workflows and provenance. We will also discuss the current trends and real-world challenges facing these two distinct research areas. Although motivated differently, the needs of the international science communities are the glue that binds this relationship together. Understanding and articulating the science drivers to these communities is paramount as these technologies evolve and mature. Originally conceived for managing business processes, workflows are now becoming invaluable assets in both computational and experimental sciences. These reconfigurable, automated systems provide essential technology to perform complex analyses by coupling together geographically distributed, disparate data sources and applications. As a result, workflows are capable of higher throughput in a shorter amount of time than performing the steps manually. Today many different workflow products exist; these include Kepler and Taverna, or similar products like MeDICi, developed at PNNL, that are standardized on the Business Process Execution Language (BPEL). Provenance, originating from the French term "provenir" ("to come from"), is used to describe the curation process of artwork as art is passed from owner to owner. The concept of provenance was adopted by digital libraries as a means to track the lineage of documents while standards such as the Dublin Core began to emerge. In recent years the systems science community has increasingly expressed the need to expand the concept of provenance to formally articulate the history of scientific data. Communities such as the International Provenance and Annotation Workshop (IPAW) have formalized a provenance data model, the Open Provenance Model, and the W3C is hosting a provenance incubator group featuring the Proof Markup Language. Although both workflows and provenance have risen from different communities and operate independently, their mutual

  8. Web-Accessible Scientific Workflow System for Performance Monitoring

    SciTech Connect

    Roelof Versteeg; Trevor Rowe

    2006-03-01

    We describe the design and implementation of a web-accessible scientific workflow system for environmental monitoring. This workflow environment integrates distributed, automated data acquisition with server-side data management and information visualization through flexible browser-based data access tools. Component technologies include a rich browser-based client (using dynamic Javascript and HTML/CSS) for data selection, a back-end server which uses PHP for data processing, user management, and result delivery, and third-party applications which are invoked by the back-end using web services. This environment allows for reproducible, transparent result generation by a diverse user base. It has been implemented for several monitoring systems with different degrees of complexity.

  9. Building Scientific Workflows for the Geosciences with Open Community Software

    NASA Astrophysics Data System (ADS)

    Pierce, M. E.; Marru, S.; Weerawarana, S. M.

    2012-12-01

    We describe the design and development of the Apache Airavata scientific workflow software and its application to problems in geosciences. Airavata is based on Service Oriented Architecture principles and is developed as general purpose software for managing large-scale science applications on supercomputing resources such as the NSF's XSEDE. Based on the NSF-funded EarthCube Workflow Working Group activities, we discuss the application of this software relative to specific requirements (such as data stream data processing, event triggering, dealing with large data sets, and advanced distributed execution patterns involved in data mining). We also consider the role of governance in EarthCube software development and present the development of Airavata software through the Apache Software Foundation's community development model. We discuss the potential impacts on software accountability and sustainability using this model.

  10. Scientific Workflows and the Sensor Web for Virtual Environmental Observatories

    NASA Astrophysics Data System (ADS)

    Simonis, I.; Vahed, A.

    2008-12-01

    interfaces. All data sets and sensor communication follow well-defined abstract models and corresponding encodings, mostly developed by the OGC Sensor Web Enablement initiative. Scientific progress is currently accelerated by an emerging new concept called scientific workflows, which organize and manage complex distributed computations. A scientific workflow represents and records the highly complex processes that a domain scientist typically would follow in exploration, discovery and, ultimately, transformation of raw data to publishable results. The challenge is now to integrate the benefits of scientific workflows with those provided by the Sensor Web in order to leverage all resources for scientific exploration, problem solving, and knowledge generation. Scientific workflows for the Sensor Web represent the next evolutionary step towards efficient, powerful, and flexible earth observation frameworks and platforms. Those platforms support the entire process from capturing data, sharing and integrating, to requesting additional observations. Multiple sites and organizations will participate on single platforms, and scientists from different countries and organizations will interact and contribute to large-scale research projects. Simultaneously, the data and information overload becomes manageable, as multiple layers of abstraction will free scientists from dealing with underlying data, processing, or storage peculiarities. The vision is automated investigation and discovery mechanisms that allow scientists to pose queries to the system, which in turn would identify potentially related resources, schedule processing tasks, and assemble all parts into workflows that may satisfy the query.

  11. Facilitating Stewardship of scientific data through standards based workflows

    NASA Astrophysics Data System (ADS)

    Bastrakova, I.; Kemp, C.; Potter, A. K.

    2013-12-01

    scientific data acquisition and analysis requirements and effective interoperable data management and delivery. This includes participating in national and international dialogue on the development of standards, embedding data management activities in business processes, and developing scientific staff as effective data stewards. A similar approach is applied to the geophysical data. By ensuring that the geophysical datasets at GA strictly follow metadata and industry standards, we are able to implement a provenance-based workflow where the data is easily discoverable, geophysical processing can be applied to it, and results can be stored. The provenance-based workflow enables metadata records for the results to be produced automatically from the input dataset metadata.

  12. An Adaptable Seismic Data Format for Modern Scientific Workflows

    NASA Astrophysics Data System (ADS)

    Smith, J. A.; Bozdag, E.; Krischer, L.; Lefebvre, M.; Lei, W.; Podhorszki, N.; Tromp, J.

    2013-12-01

    Data storage, exchange, and access play a critical role in modern seismology. Current seismic data formats, such as SEED, SAC, and SEG-Y, were designed with specific applications in mind and are frequently a major bottleneck in implementing efficient workflows. We propose a new modern parallel format that can be adapted for a variety of seismic workflows. The Adaptable Seismic Data Format (ASDF) features high-performance parallel read and write support and the ability to store an arbitrary number of traces of varying sizes. Provenance information is stored inside the file so that users know the origin of the data as well as the precise operations that have been applied to the waveforms. The design of the new format is based on several real-world use cases, including earthquake seismology and seismic interferometry. The metadata is based on the proven XML schemas StationXML and QuakeML. Existing time-series analysis toolkits are easily interfaced with this new format so that seismologists can use robust, previously developed software packages, such as ObsPy and the SAC library. ADIOS, netCDF4, and HDF5 can be used as the underlying container format. At Princeton University, we have chosen to use ADIOS as the container format because it has shown superior scalability for certain applications, such as dealing with big data on HPC systems. In the context of high-performance computing, we have integrated ASDF into the global adjoint tomography workflow on Oak Ridge National Laboratory's supercomputer Titan.
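    The container idea can be sketched with h5py; the group layout and attribute names below are invented for illustration and do not follow the actual ASDF specification (for which see the pyasdf library).

      # Invented container layout: variable-length waveforms plus
      # provenance attributes travel together in one HDF5 file.
      import numpy as np
      import h5py

      with h5py.File("sketch.h5", "w") as f:
          wf = f.create_group("Waveforms/IU.ANMO")
          trace = wf.create_dataset("BHZ.2013-01-01", data=np.random.randn(36000))
          trace.attrs["sampling_rate_hz"] = 20.0
          trace.attrs["starttime"] = "2013-01-01T00:00:00"
          # Provenance stays inside the file with the data it describes.
          trace.attrs["provenance"] = "detrend -> taper(0.05) -> bandpass(0.01,0.1)"

      with h5py.File("sketch.h5", "r") as f:
          trace = f["Waveforms/IU.ANMO/BHZ.2013-01-01"]
          print(trace.shape, trace.attrs["provenance"])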

  13. WRF4SG: A Scientific Gateway for climate experiment workflows

    NASA Astrophysics Data System (ADS)

    Blanco, Carlos; Cofino, Antonio S.; Fernandez-Quiruelas, Valvanuz

    2013-04-01

    The Weather Research and Forecasting model (WRF) is a community-driven and public-domain model widely used by the weather and climate communities. As opposed to other application-oriented models, WRF provides a flexible and computationally efficient framework which allows solving a variety of problems for different time-scales, from weather forecast to climate change projection. Furthermore, WRF is also widely used as a research tool in modeling physics, dynamics, and data assimilation by the research community. Climate experiment workflows based on WRF are nowadays among the most cutting-edge applications. These workflows are complex due to both the large storage requirements and the huge number of simulations executed. In order to manage this, we have developed a scientific gateway (SG) called WRF for Scientific Gateway (WRF4SG), based on the WS-PGRADE/gUSE and WRF4G frameworks, to help meet WRF users' needs (see [1] and [2]). WRF4SG provides services for different use cases that describe the interactions between WRF users and the WRF4SG interface in order to show how to run a climate experiment. As WS-PGRADE/gUSE uses portlets (see [1]) to interact with users, its portlets support these use cases. A typical experiment to be carried out by a WRF user consists of a high-resolution regional re-forecast. These re-forecasts are common experiments used as input data for wind power energy and natural hazard applications (wind and precipitation fields). In the cases below, the user is able to access different resources, such as the Grid, because WRF needs a huge amount of computing resources in order to generate useful simulations: * Resource configuration and user authentication: The first step is to authenticate on the user's Grid resources by virtual organization. After login, the user is able to select which virtual organization is going to be used by the experiment. * Data assimilation: In order to assimilate the data sources

  14. On the support of scientific workflows over Pub/Sub brokers.

    PubMed

    Morales, Augusto; Robles, Tomas; Alcarria, Ramon; Cedeño, Edwin

    2013-08-20

    The execution of scientific workflows is gaining importance as more computing resources are available in the form of grid environments. The Publish/Subscribe paradigm offers well-proven solutions for sustaining distributed scenarios while maintaining the high level of task decoupling required by scientific workflows. In this paper, we propose a new model for supporting scientific workflows that improves the dissemination of control events. The proposed solution is based on the mapping of workflow tasks to the underlying Pub/Sub event layer, and the definition of interfaces and procedures for execution on brokers. In this paper we also analyze the strengths and weaknesses of current solutions that are based on existing message exchange models for scientific workflows. Finally, we explain how our model improves the information dissemination, event filtering, task decoupling and the monitoring of scientific workflows.
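    A minimal in-memory sketch of the proposed mapping follows, assuming invented topic names; a real deployment would use an actual Pub/Sub broker rather than a Python dictionary.

      # Each workflow task subscribes to the topic its predecessor
      # publishes, so the broker carries both data and control events.
      from collections import defaultdict

      subscribers = defaultdict(list)

      def subscribe(topic, handler):
          subscribers[topic].append(handler)

      def publish(topic, payload):
          for handler in subscribers[topic]:
              handler(payload)

      # Task chain: raw -> cleaned -> result, fully decoupled via topics.
      subscribe("raw",     lambda d: publish("cleaned", d.strip().lower()))
      subscribe("cleaned", lambda d: publish("result", f"analyzed<{d}>"))
      subscribe("result",  print)  # a monitor is just one more subscriber

      publish("raw", "  Sensor Reading 17  ")  # prints analyzed<sensor reading 17>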

  15. On the Support of Scientific Workflows over Pub/Sub Brokers

    PubMed Central

    Morales, Augusto; Robles, Tomas; Alcarria, Ramon; Cedeño, Edwin

    2013-01-01

    The execution of scientific workflows is gaining importance as more computing resources are available in the form of grid environments. The Publish/Subscribe paradigm offers well-proven solutions for sustaining distributed scenarios while maintaining the high level of task decoupling required by scientific workflows. In this paper, we propose a new model for supporting scientific workflows that improves the dissemination of control events. The proposed solution is based on the mapping of workflow tasks to the underlying Pub/Sub event layer, and the definition of interfaces and procedures for execution on brokers. In this paper we also analyze the strengths and weaknesses of current solutions that are based on existing message exchange models for scientific workflows. Finally, we explain how our model improves the information dissemination, event filtering, task decoupling and the monitoring of scientific workflows. PMID:23966191

  16. Looking beneath the Edges and Nodes: Ranking and Mining Scientific Workflows

    ERIC Educational Resources Information Center

    Dong, Xiao

    2010-01-01

    Workflow technology has emerged as an eminent way to support scientific computing nowadays. Supported by mature technological infrastructures such as web services and high performance computing infrastructure, workflow technology has been well adopted by the scientific community as it offers an effective framework to prototype, modify and manage…

  17. Conceptual-level workflow modeling of scientific experiments using NMR as a case study

    PubMed Central

    Verdi, Kacy K; Ellis, Heidi JC; Gryk, Michael R

    2007-01-01

    Background Scientific workflows improve the process of scientific experiments by making computations explicit, underscoring data flow, and emphasizing the participation of humans in the process when intuition and human reasoning are required. Workflows for experiments also highlight transitions among experimental phases, allowing intermediate results to be verified and supporting the proper handling of semantic mismatches and different file formats among the various tools used in the scientific process. Thus, scientific workflows are important for the modeling and subsequent capture of bioinformatics-related data. While much research has been conducted on the implementation of scientific workflows, the initial process of actually designing and generating the workflow at the conceptual level has received little consideration. Results We propose a structured process to capture scientific workflows at the conceptual level that allows workflows to be documented efficiently, results in concise models of the workflow and more-correct workflow implementations, and provides insight into the scientific process itself. The approach uses three modeling techniques to model the structural, data flow, and control flow aspects of the workflow. The domain of biomolecular structure determination using Nuclear Magnetic Resonance spectroscopy is used to demonstrate the process. Specifically, we show the application of the approach to capture the workflow for the process of conducting biomolecular analysis using Nuclear Magnetic Resonance (NMR) spectroscopy. Conclusion Using the approach, we were able to accurately document, in a short amount of time, numerous steps in the process of conducting an experiment using NMR spectroscopy. The resulting models are correct and precise, as outside validation of the models identified only minor omissions in the models. In addition, the models provide an accurate visual description of the control flow for conducting biomolecular analysis using

  18. Conceptual-level workflow modeling of scientific experiments using NMR as a case study.

    PubMed

    Verdi, Kacy K; Ellis, Heidi Jc; Gryk, Michael R

    2007-01-30

    Scientific workflows improve the process of scientific experiments by making computations explicit, underscoring data flow, and emphasizing the participation of humans in the process when intuition and human reasoning are required. Workflows for experiments also highlight transitions among experimental phases, allowing intermediate results to be verified and supporting the proper handling of semantic mismatches and different file formats among the various tools used in the scientific process. Thus, scientific workflows are important for the modeling and subsequent capture of bioinformatics-related data. While much research has been conducted on the implementation of scientific workflows, the initial process of actually designing and generating the workflow at the conceptual level has received little consideration. We propose a structured process to capture scientific workflows at the conceptual level that allows workflows to be documented efficiently, results in concise models of the workflow and more-correct workflow implementations, and provides insight into the scientific process itself. The approach uses three modeling techniques to model the structural, data flow, and control flow aspects of the workflow. The domain of biomolecular structure determination using Nuclear Magnetic Resonance spectroscopy is used to demonstrate the process. Specifically, we show the application of the approach to capture the workflow for the process of conducting biomolecular analysis using Nuclear Magnetic Resonance (NMR) spectroscopy. Using the approach, we were able to accurately document, in a short amount of time, numerous steps in the process of conducting an experiment using NMR spectroscopy. The resulting models are correct and precise, as outside validation of the models identified only minor omissions. In addition, the models provide an accurate visual description of the control flow for conducting biomolecular analysis using NMR spectroscopy.

  19. Data Intensive Scientific Workflows on a Federated Cloud: CRADA Final Report

    SciTech Connect

    Garzoglio, Gabriele

    2015-10-31

    The Fermilab Scientific Computing Division and the KISTI Global Science Experimental Data Hub Center have built a prototypical large-scale infrastructure to handle scientific workflows of stakeholders to run on multiple cloud resources. The demonstrations have been in the areas of (a) Data-Intensive Scientific Workflows on Federated Clouds, (b) Interoperability and Federation of Cloud Resources, and (c) Virtual Infrastructure Automation to enable On-Demand Services.

  1. Scheduling Multilevel Deadline-Constrained Scientific Workflows on Clouds Based on Cost Optimization

    DOE PAGES

    Malawski, Maciej; Figiela, Kamil; Bubak, Marian; ...

    2015-01-01

    This paper presents a cost optimization model for scheduling scientific workflows on IaaS clouds such as Amazon EC2 or RackSpace. We assume multiple IaaS clouds with heterogeneous virtual machine instances, with a limited number of instances per cloud and hourly billing. Input and output data are stored on a cloud object store such as Amazon S3. Applications are scientific workflows modeled as DAGs as in the Pegasus Workflow Management System. We assume that tasks in the workflows are grouped into levels of identical tasks. Our model is specified using mathematical programming languages (AMPL and CMPL) and allows us to minimize the cost of workflow execution under deadline constraints. We present results obtained using our model and the benchmark workflows representing real scientific applications in a variety of domains. The data used for evaluation come from the synthetic workflows and from general purpose cloud benchmarks, as well as from the data measured in our own experiments with Montage, an astronomical application, executed on Amazon EC2 cloud. We indicate how this model can be used for scenarios that require resource planning for scientific workflows and their ensembles.
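
    A minimal sketch of the kind of cost/deadline trade-off this model captures, written in Python rather than AMPL/CMPL: each level of identical tasks is assigned the cheapest instance type and VM count that meets its share of the deadline. The instance table, the even split of the deadline across levels, and per-started-hour billing are illustrative assumptions, not the paper's actual model.

      import math

      INSTANCES = [  # (name, cost per hour in USD, relative speed) -- invented values
          ("m1.small", 0.06, 1.0),
          ("m1.large", 0.24, 4.0),
          ("c3.xlarge", 0.42, 8.0),
      ]

      def cheapest_assignment(task_count, hours_per_task, deadline_hours, max_vms=32):
          """Cheapest (cost, instance, vm_count, makespan) that finishes a level in time."""
          best = None
          for name, cost, speed in INSTANCES:
              per_task = hours_per_task / speed
              for vms in range(1, max_vms + 1):
                  waves = math.ceil(task_count / vms)        # sequential batches on the VM pool
                  makespan = waves * per_task
                  if makespan > deadline_hours:
                      continue
                  total = vms * math.ceil(makespan) * cost   # per-started-hour billing per VM
                  if best is None or total < best[0]:
                      best = (total, name, vms, makespan)
          return best

      # Three levels of identical tasks sharing a 12-hour deadline evenly.
      levels = [(40, 0.5), (10, 2.0), (100, 0.1)]   # (task count, hours per task)
      for task_count, hours in levels:
          print(cheapest_assignment(task_count, hours, 12.0 / len(levels)))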

  2. Widening the adoption of workflows to include human and human-machine scientific processes

    NASA Astrophysics Data System (ADS)

    Salayandia, L.; Pinheiro da Silva, P.; Gates, A. Q.

    2010-12-01

    Scientific workflows capture knowledge in the form of technical recipes to access and manipulate data that help scientists manage and reuse established expertise to conduct their work. Libraries of scientific workflows are being created in particular fields, e.g., Bioinformatics; combined with cyber-infrastructure environments that provide on-demand access to data and tools, these libraries result in powerful workbenches for scientists of those communities. The focus in these particular fields, however, has been more on automating than on documenting scientific processes. As a result, technical barriers have impeded a wider adoption of scientific workflows by scientific communities that do not rely as heavily on cyber-infrastructure and computing environments. Semantic Abstract Workflows (SAWs) are introduced to widen the applicability of workflows as a tool to document scientific recipes or processes. SAWs intend to capture a scientist's perspective on how she or he would collect, filter, curate, and manipulate data to create the artifacts that are relevant to her/his work. In contrast, scientific workflows describe the process from the point of view of how technical methods and tools are used to conduct the work. By focusing on a higher level of abstraction that is closer to a scientist's understanding, SAWs effectively capture the controlled vocabularies that reflect a particular scientific community, as well as the types of datasets and methods used in a particular domain. From there on, SAWs provide the flexibility to adapt to different environments to carry out the recipes or processes. These environments range from manual fieldwork to highly technical cyber-infrastructure environments, such as those already supported by scientific workflows. Two cases, one from Environmental Science and another from Geophysics, are presented as illustrative examples.

  3. Towards a scientific workflow methodology for primary care database studies.

    PubMed

    Curcin, Vasa; Bottle, Alex; Molokhia, Mariam; Millett, Christopher; Majeed, Azeem

    2010-08-01

    We describe the challenges of conducting studies based on mining large-scale primary care databases, namely data integration, data set definition, result reproducibility and reusability. These correspond to higher-level informatics challenges of automation, provenance capture and component integration. We provide a high-level view of the informatics infrastructure that addresses these challenges through a generic workflow-based e-Science middleware, and describe our experiences using the system to investigate differences in the health status of patients with diabetes before and after the national introduction of the UK GP contract in 2004.

  4. A robust post-processing workflow for datasets with motion artifacts in diffusion kurtosis imaging.

    PubMed

    Li, Xianjun; Yang, Jian; Gao, Jie; Luo, Xue; Zhou, Zhenyu; Hu, Yajie; Wu, Ed X; Wan, Mingxi

    2014-01-01

    The aim of this study was to develop a robust post-processing workflow for motion-corrupted datasets in diffusion kurtosis imaging (DKI). The proposed workflow consisted of brain extraction, rigid registration, distortion correction, artifact rejection, spatial smoothing and tensor estimation. Rigid registration was utilized to correct misalignments. Motion artifacts were rejected by using the local Pearson correlation coefficient (LPCC). The performance of LPCC in characterizing relative differences between artifacts and artifact-free images was compared with that of the conventional correlation coefficient in 10 randomly selected DKI datasets. The influence of rejecting artifacts, along with their gradient-direction and b-value information, on parameter estimation was investigated using the mean square error (MSE). The variance of the noise was used as the criterion for MSEs. The clinical practicality of the proposed workflow was evaluated by the image quality and measurements in regions of interest on 36 DKI datasets, including 18 artifact-free (18 pediatric subjects) and 18 motion-corrupted datasets (15 pediatric subjects and 3 essential tremor patients). The relative difference between artifacts and artifact-free images calculated by LPCC was larger than that calculated by the conventional correlation coefficient (p<0.05), indicating that LPCC was more sensitive in detecting motion artifacts. MSEs of all derived parameters from the data reserved after artifact rejection were smaller than the variance of the noise, suggesting that the influence of the rejected artifacts on the precision of the derived parameters was less than that of noise. The proposed workflow significantly improved the image quality and reduced the measurement biases on motion-corrupted datasets (p<0.05). The proposed post-processing workflow was reliable for improving the image quality and the measurement precision of the derived parameters on motion-corrupted DKI datasets. The workflow provided an effective post
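
    The rejection step hinges on a Pearson correlation computed over local image blocks. A toy 2-D version is sketched below; the block size, the 0.5 rejection threshold, and the synthetic "motion" are illustrative assumptions, not values from the study.

      import numpy as np

      def lpcc(img, ref, block=8):
          """Mean Pearson correlation over non-overlapping blocks of two images."""
          scores = []
          for i in range(0, img.shape[0] - block + 1, block):
              for j in range(0, img.shape[1] - block + 1, block):
                  a = img[i:i+block, j:j+block].ravel()
                  b = ref[i:i+block, j:j+block].ravel()
                  if a.std() > 0 and b.std() > 0:
                      scores.append(np.corrcoef(a, b)[0, 1])
          return float(np.mean(scores)) if scores else 0.0

      rng = np.random.default_rng(0)
      ref = rng.normal(size=(64, 64))
      clean = ref + 0.1 * rng.normal(size=ref.shape)
      corrupted = np.roll(clean, 6, axis=0)         # crude stand-in for motion
      for name, vol in [("clean", clean), ("corrupted", corrupted)]:
          score = lpcc(vol, ref)
          print(name, round(score, 3), "reject" if score < 0.5 else "keep")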

  5. Services + Components = Data Intensive Scientific Workflow Applications with MeDICi

    SciTech Connect

    Gorton, Ian; Chase, Jared M.; Wynne, Adam S.; Almquist, Justin P.; Chappell, Alan R.

    2009-06-01

    Scientific applications are often structured as workflows that execute a series of distributed software modules to analyze large data sets. Such workflows are typically constructed using general-purpose scripting languages to coordinate the execution of the various modules and to exchange data sets between them. While such scripts provide a cost-effective approach for simple workflows, as the workflow structure becomes complex and evolves, the scripts quickly become complex and difficult to modify. This makes them a major barrier to easily and quickly deploying new algorithms and exploiting new, scalable hardware platforms. In this paper, we describe the MeDICi Workflow technology that is specifically designed to reduce the complexity of workflow application development, and to efficiently handle data intensive workflow applications. MeDICi integrates standard component-based and service-based technologies, and employs an efficient integration mechanism to ensure large data sets can be efficiently processed. We illustrate the use of MeDICi with a climate data processing example that we have built, and describe some of the new features

  6. Automating adjoint wave-equation travel-time tomography using scientific workflow

    NASA Astrophysics Data System (ADS)

    Zhang, Xiaofeng; Chen, Po; Pullammanappallil, Satish

    2013-10-01

    Recent advances in commodity high-performance computing technology have dramatically reduced the computational cost for solving the seismic wave equation in complex earth structure models. As a consequence, wave-equation-based seismic tomography techniques are being actively developed and gradually adopted in routine subsurface seismic imaging practices. Wave-equation travel-time tomography is a seismic tomography technique that inverts cross-correlation travel-time misfits using full-wave Fréchet kernels computed by solving the wave equation. This technique can be implemented very efficiently using the adjoint method, in which the misfits are back-propagated from the receivers (i.e., seismometers) to produce the adjoint wave-field and the interaction between the adjoint wave-field and the forward wave-field from the seismic source gives the gradient of the objective function. Once the gradient is available, a gradient-based optimization algorithm can then be adopted to produce an optimal earth structure model that minimizes the objective function. This methodology is conceptually straightforward, but its implementation in practical situations is highly complex, error-prone and computationally demanding. In this study, we demonstrate the feasibility of automating wave-equation travel-time tomography based on the adjoint method using Kepler, an open-source software package for designing, managing and executing scientific workflows. The workflow technology allows us to abstract away much of the complexity involved in the implementation in a manner that is both robust and scalable. Our automated adjoint wave-equation travel-time tomography package has been successfully applied on a real active-source seismic dataset.
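
    The loop such a workflow automates is conceptually simple: forward-solve, measure the travel-time misfit, form a gradient, update the model, repeat. The toy 1-D straight-ray "solver" below stands in for the wave-equation and adjoint simulations, which in the actual system run as HPC jobs orchestrated by Kepler; the numbers are invented.

      def forward(slowness, distances):
          return [d * slowness for d in distances]     # travel time = distance * slowness

      def misfit_and_gradient(slowness, distances, observed):
          predicted = forward(slowness, distances)
          residuals = [p - o for p, o in zip(predicted, observed)]
          misfit = 0.5 * sum(r * r for r in residuals)
          grad = sum(r * d for r, d in zip(residuals, distances))  # d(misfit)/d(slowness)
          return misfit, grad

      distances = [10.0, 25.0, 40.0]
      observed = forward(0.5, distances)    # synthetic "data" from a true slowness of 0.5
      model, step = 0.2, 1e-4
      for _ in range(50):
          _, grad = misfit_and_gradient(model, distances, observed)
          model -= step * grad              # gradient-descent model update
      print(round(model, 3))                # converges toward 0.5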

  7. An Integrated Framework for Parameter-based Optimization of Scientific Workflows

    PubMed Central

    Kumar, Vijay S.; Sadayappan, P.; Mehta, Gaurang; Vahi, Karan; Deelman, Ewa; Ratnakar, Varun; Kim, Jihie; Gil, Yolanda; Hall, Mary; Kurc, Tahsin; Saltz, Joel

    2011-01-01

    Data analysis processes in scientific applications can be expressed as coarse-grain workflows of complex data processing operations with data flow dependencies between them. Performance optimization of these workflows can be viewed as a search for a set of optimal values in a multi-dimensional parameter space. While some performance parameters such as grouping of workflow components and their mapping to machines do not affect the accuracy of the output, others may dictate trading the output quality of individual components (and of the whole workflow) for performance. This paper describes an integrated framework which is capable of supporting performance optimizations along multiple dimensions of the parameter space. Using two real-world applications in the spatial data analysis domain, we present an experimental evaluation of the proposed framework. PMID:22068617

  8. Parameterized Specification, Configuration and Execution of Data-Intensive Scientific Workflows

    PubMed Central

    Kumar, Vijay S.; Kurc, Tahsin; Ratnakar, Varun; Kim, Jihie; Mehta, Gaurang; Vahi, Karan; Nelson, Yoonju Lee; Sadayappan, P.; Deelman, Ewa; Gil, Yolanda; Hall, Mary; Saltz, Joel

    2012-01-01

    Data analysis processes in scientific applications can be expressed as coarse-grain workflows of complex data processing operations with data flow dependencies between them. Performance optimization of these workflows can be viewed as a search for a set of optimal values in a multidimensional parameter space consisting of input performance parameters to the applications that are known to affect their execution times. While some performance parameters such as grouping of workflow components and their mapping to machines do not affect the accuracy of the analysis, others may dictate trading the output quality of individual components (and of the whole workflow) for performance. This paper describes an integrated framework which is capable of supporting performance optimizations along multiple such parameters. Using two real-world applications in the spatial, multidimensional data analysis domain, we present an experimental evaluation of the proposed framework. PMID:22623878

  9. Quality Metadata Management for Geospatial Scientific Workflows: from Retrieving to Assessing with Online Tools

    NASA Astrophysics Data System (ADS)

    Leibovici, D. G.; Pourabdollah, A.; Jackson, M.

    2011-12-01

    Experts and decision-makers use or develop models to monitor global and local changes of the environment. Their activities require the combination of data and processing services in a flow of operations and spatial data computations: a geospatial scientific workflow. The seamless ability to generate, re-use and modify a geospatial scientific workflow is an important requirement, but the quality of the outcomes is equally important [1]. Metadata information attached to the data and processes, and particularly their quality, is essential to assess the reliability of the scientific model that represents a workflow [2]. Tools for managing the qualitative and quantitative metadata measures of quality associated with a workflow are therefore required by modellers. To ensure interoperability, ISO and OGC standards [3] are to be adopted, allowing one, for example, to define metadata profiles and to retrieve them via web service interfaces. However, these standards need a few extensions when applied to workflows, particularly in the context of geoprocess metadata. We propose to fill this gap (i) first through the provision of a metadata profile for the quality of processes, and (ii) through providing a framework, based on XPDL [4], to manage the quality information. Web Processing Services are used to implement a range of metadata analyses on the workflow in order to evaluate and present quality information at different levels of the workflow. This generates the metadata quality, stored in the XPDL file. The focus is (a) on the visual representations of the quality, summarizing the retrieved quality information either from the standardized metadata profiles of the components or from non-standard quality information, e.g., Web 2.0 information, and (b) on the estimated qualities of the outputs derived from meta-propagation of uncertainties (a principle that we have introduced [5]). An a priori validation of the future decision-making supported by the
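
    As a minimal illustration of propagating quality metadata along a workflow, the sketch below combines per-step relative uncertainties in quadrature. The combination rule, the step names and the values are assumptions made only for illustration; the paper's framework stores and evaluates such metadata via XPDL and Web Processing Services.

      import math

      steps = [                      # (workflow step, relative uncertainty of its output)
          ("interpolation", 0.05),
          ("reprojection", 0.02),
          ("overlay", 0.04),
      ]

      def propagate(uncertainties):
          """Combine independent relative uncertainties in quadrature."""
          return math.sqrt(sum(u * u for u in uncertainties))

      total = propagate([u for _, u in steps])
      for name, u in steps:
          print(f"{name:15s} +/- {u:.0%}")
      print(f"{'workflow output':15s} +/- {total:.1%}")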

  10. The Live Access Server Scientific Product Generation Through Workflow Orchestration

    NASA Astrophysics Data System (ADS)

    Hankin, S.; Calahan, J.; Li, J.; Manke, A.; O'Brien, K.; Schweitzer, R.

    2006-12-01

    The Live Access Server (LAS) is a well-established Web-application for display and analysis of geo-science data sets. The software, which can be downloaded and installed by anyone, gives data providers an easy way to establish services for their on-line data holdings, so their users can make plots; create and download data sub-sets; compare (difference) fields; and perform simple analyses. Now at version 7.0, LAS has been in operation since 1994. The current "Armstrong" release of LAS V7 consists of three components in a tiered architecture: user interface, workflow orchestration and Web Services. The LAS user interface (UI) communicates with the LAS Product Server via an XML protocol embedded in an HTTP "get" URL. Libraries (APIs) have been developed in Java, JavaScript and perl that can readily generate this URL. As a result of this flexibility it is common to find LAS user interfaces of radically different character, tailored to the nature of specific datasets or the mindset of specific users. When a request is received by the LAS Product Server (LPS -- the workflow orchestration component), business logic converts this request into a series of Web Service requests invoked via SOAP. These "back- end" Web services perform data access and generate products (visualizations, data subsets, analyses, etc.). LPS then packages these outputs into final products (typically HTML pages) via Jakarta Velocity templates for delivery to the end user. "Fine grained" data access is performed by back-end services that may utilize JDBC for data base access; the OPeNDAP "DAPPER" protocol; or (in principle) the OGC WFS protocol. Back-end visualization services are commonly legacy science applications wrapped in Java or Python (or perl) classes and deployed as Web Services accessible via SOAP. Ferret is the default visualization application used by LAS, though other applications such as Matlab, CDAT, and GrADS can also be used. Other back-end services may include generation of Google

  11. Exploring Two Approaches for an End-to-End Scientific Analysis Workflow

    NASA Astrophysics Data System (ADS)

    Dodelson, Scott; Kent, Steve; Kowalkowski, Jim; Paterno, Marc; Sehrish, Saba

    2015-12-01

    The scientific discovery process can be advanced by the integration of independently-developed programs run on disparate computing facilities into coherent workflows usable by scientists who are not experts in computing. For such advancement, we need a system which scientists can use to formulate analysis workflows, to integrate new components to these workflows, and to execute different components on resources that are best suited to run those components. In addition, we need to monitor the status of the workflow as components get scheduled and executed, and to access the intermediate and final output for visual exploration and analysis. Finally, it is important for scientists to be able to share their workflows with collaborators. We have explored two approaches for such an analysis framework for the Large Synoptic Survey Telescope (LSST) Dark Energy Science Collaboration (DESC); the first one is based on the use and extension of Galaxy, a web-based portal for biomedical research, and the second one is based on a programming language, Python. In this paper, we present a brief description of the two approaches, describe the kinds of extensions to the Galaxy system we have found necessary in order to support the wide variety of scientific analysis in the cosmology community, and discuss how similar efforts might be of benefit to the HEP community.

  12. Exploring Two Approaches for an End-to-End Scientific Analysis Workflow

    SciTech Connect

    Dodelson, Scott; Kent, Steve; Kowalkowski, Jim; Paterno, Marc; Sehrish, Saba

    2015-12-23

    The advance of the scientific discovery process is accomplished by the integration of independently-developed programs run on disparate computing facilities into coherent workflows usable by scientists who are not experts in computing. For such advancement, we need a system which scientists can use to formulate analysis workflows, to integrate new components to these workflows, and to execute different components on resources that are best suited to run those components. In addition, we need to monitor the status of the workflow as components get scheduled and executed, and to access the intermediate and final output for visual exploration and analysis. Finally, it is important for scientists to be able to share their workflows with collaborators. Moreover we have explored two approaches for such an analysis framework for the Large Synoptic Survey Telescope (LSST) Dark Energy Science Collaboration (DESC), the first one is based on the use and extension of Galaxy, a web-based portal for biomedical research, and the second one is based on a programming language, Python. In our paper, we present a brief description of the two approaches, describe the kinds of extensions to the Galaxy system we have found necessary in order to support the wide variety of scientific analysis in the cosmology community, and discuss how similar efforts might be of benefit to the HEP community.

  13. Exploring Two Approaches for an End-to-End Scientific Analysis Workflow

    DOE PAGES

    Dodelson, Scott; Kent, Steve; Kowalkowski, Jim; ...

    2015-12-23

    The advance of the scientific discovery process is accomplished by the integration of independently-developed programs run on disparate computing facilities into coherent workflows usable by scientists who are not experts in computing. For such advancement, we need a system which scientists can use to formulate analysis workflows, to integrate new components to these workflows, and to execute different components on resources that are best suited to run those components. In addition, we need to monitor the status of the workflow as components get scheduled and executed, and to access the intermediate and final output for visual exploration and analysis. Finally, it is important for scientists to be able to share their workflows with collaborators. Moreover we have explored two approaches for such an analysis framework for the Large Synoptic Survey Telescope (LSST) Dark Energy Science Collaboration (DESC), the first one is based on the use and extension of Galaxy, a web-based portal for biomedical research, and the second one is based on a programming language, Python. In our paper, we present a brief description of the two approaches, describe the kinds of extensions to the Galaxy system we have found necessary in order to support the wide variety of scientific analysis in the cosmology community, and discuss how similar efforts might be of benefit to the HEP community.

  14. Enhancing the Scientific Data Delivery, Workflow and Consumption

    NASA Astrophysics Data System (ADS)

    Shrestha, S. R.; Rosencrans, M.; Collow, T. W.; Ali, K.; Zimble, D. A.; Rose, B.

    2015-12-01

    To improve scientific data and product access, usability and interoperability, NOAA offices like the Climate Prediction Center (CPC) are exploring various geospatial solutions to serve their users. As NOAA scientists develop new solutions that drive the research and implementation to improve services, it is imperative that those research outcomes (data and products) can be consumed by customers and easily integrated into customer decision processes. As such, progress is being made to leverage an interoperable data platform wherein systems can integrate with each other to support the synthesis of Climate and Weather data. In this talk, we will share an ongoing use case at CPC, demonstrating how Esri technology is being implemented to improve scientific data access, manipulation, analysis, visualization and use.

  15. Integrating visualization and interaction research to improve scientific workflows.

    PubMed

    Keefe, Daniel F

    2010-01-01

    Scientific-visualization research is, nearly by necessity, interdisciplinary. In addition to their collaborators in application domains (for example, cell biology), researchers regularly build on close ties with disciplines related to visualization, such as graphics, human-computer interaction, and cognitive science. One of these ties is the connection between visualization and interaction research. This isn't a new direction for scientific visualization (see the "Early Connections" sidebar). However, momentum recently seems to be increasing toward integrating visualization research (for example, effective visual presentation of data) with interaction research (for example, innovative interactive techniques that facilitate manipulating and exploring data). We see evidence of this trend in several places, including the visualization literature and conferences.

  16. A Comparison of Using Taverna and BPEL in Building Scientific Workflows: the case of caGrid.

    PubMed

    Tan, Wei; Missier, Paolo; Foster, Ian; Madduri, Ravi; Goble, Carole

    2010-06-25

    With the emergence of "service oriented science," the need arises to orchestrate multiple services to facilitate scientific investigation-that is, to create "science workflows." We present here our findings in providing a workflow solution for the caGrid service-based grid infrastructure. We choose BPEL and Taverna as candidates, and compare their usability in the lifecycle of a scientific workflow, including workflow composition, execution, and result analysis. Our experience shows that BPEL as an imperative language offers a comprehensive set of modeling primitives for workflows of all flavors; while Taverna offers a dataflow model and a more compact set of primitives that facilitates dataflow modeling and pipelined execution. We hope that this comparison study not only helps researchers select a language or tool that meets their specific needs, but also offers some insight on how a workflow language and tool can fulfill the requirement of the scientific community.
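
    The stylistic difference the comparison describes can be miniaturized in a few lines of Python (standing in for the two languages themselves): generator stages overlap like Taverna's pipelined dataflow, while a list of explicit ordered steps behaves like a BPEL sequence. The stage names are invented.

      def fetch():                       # stage 1: produce items lazily
          for record in ["seq1", "seq2", "seq3"]:
              yield record

      def annotate(records):             # stage 2: consumes while stage 1 still produces
          for r in records:
              yield r + ":annotated"

      # Dataflow/pipelined composition -- stages overlap, like Taverna processors.
      for result in annotate(fetch()):
          print(result)

      # Imperative composition -- explicit ordered steps, like a BPEL sequence.
      records = list(fetch())                            # step 1 completes fully...
      annotated = [r + ":annotated" for r in records]    # ...before step 2 starts
      print(annotated)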

  17. A Comparison of Using Taverna and BPEL in Building Scientific Workflows: the case of caGrid

    PubMed Central

    Tan, Wei; Missier, Paolo; Foster, Ian; Madduri, Ravi; Goble, Carole

    2009-01-01

    With the emergence of “service oriented science,” the need arises to orchestrate multiple services to facilitate scientific investigation—that is, to create “science workflows.” We present here our findings in providing a workflow solution for the caGrid service-based grid infrastructure. We choose BPEL and Taverna as candidates, and compare their usability in the lifecycle of a scientific workflow, including workflow composition, execution, and result analysis. Our experience shows that BPEL as an imperative language offers a comprehensive set of modeling primitives for workflows of all flavors; while Taverna offers a dataflow model and a more compact set of primitives that facilitates dataflow modeling and pipelined execution. We hope that this comparison study not only helps researchers select a language or tool that meets their specific needs, but also offers some insight on how a workflow language and tool can fulfill the requirement of the scientific community. PMID:20625534

  18. A virtual data language and system for scientific workflow management in data grid environments

    NASA Astrophysics Data System (ADS)

    Zhao, Yong

    With advances in scientific instrumentation and simulation, scientific data is growing fast in both size and analysis complexity. So-called Data Grids aim to provide high performance, distributed data analysis infrastructure for data-intensive sciences, where scientists distributed worldwide need to extract information from large collections of data, and to share both data products and the resources needed to produce and store them. However, the description, composition, and execution of even logically simple scientific workflows are often complicated by the need to deal with "messy" issues like heterogeneous storage formats and ad-hoc file system structures. We show how these difficulties can be overcome via a typed workflow notation called virtual data language, within which issues of physical representation are cleanly separated from logical typing, and by the implementation of this notation within the context of a powerful virtual data system that supports distributed execution. The resulting language and system are capable of expressing complex workflows in a simple compact form, enacting those workflows in distributed environments, monitoring and recording the execution processes, and tracing the derivation history of data products. We describe the motivation, design, implementation, and evaluation of the virtual data language and system, and the application of the virtual data paradigm in various science disciplines, including astronomy, cognitive neuroscience.

  19. Integration and Commissioning of a Prototype Federated Cloud for Scientific Workflows

    SciTech Connect

    Garzoglio, Gabriele

    2013-01-01

    The Fermilab Grid and Cloud Computing Department and the KISTI Global Science Experimental Data Hub Center propose a joint project. The goals are to enable scientific workflows of stakeholders to run on multiple cloud resources by use of (a) Virtual Infrastructure Automation and Provisioning, (b) Interoperability and Federation of Cloud Resources, and (c) High-Throughput Fabric Virtualization. This is a matching fund project in which Fermilab and KISTI will contribute equal resources.

  20. Chang'E-3 data pre-processing system based on scientific workflow

    NASA Astrophysics Data System (ADS)

    Tan, Xu; Liu, Jianjun; Wang, Yuanyuan; Yan, Wei; Zhang, Xiaoxia; Li, Chunlai

    2016-04-01

    The Chang'E-3 (CE3) mission has obtained a huge amount of lunar scientific data. Data pre-processing is an important segment of the CE3 ground research and application system. With a dramatic increase in the demand for data research and application, the Chang'E-3 data pre-processing system (CEDPS), based on scientific workflow, is proposed for the purpose of making scientists more flexible and productive by automating data-driven processing. The system should allow the planning, conduct and control of the data processing procedure with the following possibilities: • describe a data processing task, including: 1) define input/output data, 2) define the data relationships, 3) define the sequence of tasks, 4) define the communication between tasks, 5) define mathematical formulas, 6) define the relationship between tasks and data; • automatic processing of tasks. Accordingly, describing a task is the key to whether the system is flexible. We design a workflow designer, a visual environment for capturing processes as workflows. The three-level model for the workflow designer is discussed: 1) the data relationships are established through a product tree; 2) the process model is constructed based on a directed acyclic graph (DAG); in particular, a set of workflow constructs, including Sequence, Loop, Merge and Fork, are composable with one another; 3) to reduce the modeling complexity of the mathematical formulas using a DAG, semantic modeling based on MathML is adopted. On top of that, we present how the CE3 data are processed with CEDPS.
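
    A minimal sketch of the DAG execution at the core of such a process model, with invented task names: dependencies are declared per task and run in topological order (Python's standard-library graphlib provides the ordering). CEDPS layers its Sequence/Loop/Merge/Fork constructs and MathML-based formula modeling on top of a scheme like this.

      from graphlib import TopologicalSorter   # Python 3.9+

      dag = {                                  # task -> set of prerequisite tasks
          "read_raw": set(),
          "read_calibration": set(),
          "radiometric_correction": {"read_raw"},
          "geometric_correction": {"radiometric_correction"},
          "product_generation": {"geometric_correction", "read_calibration"},
      }

      for task in TopologicalSorter(dag).static_order():
          print("running", task)               # replace with the real task invocation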

  1. Making Sense of Complexity with FRE, a Scientific Workflow System for Climate Modeling (Invited)

    NASA Astrophysics Data System (ADS)

    Langenhorst, A. R.; Balaji, V.; Yakovlev, A.

    2010-12-01

    A workflow is a description of a sequence of activities that is both precise and comprehensive. Capturing the workflow of climate experiments provides a record which can be queried or compared, and allows reproducibility of the experiments - sometimes even to the bit level of the model output. This reproducibility helps to verify the integrity of the output data, and enables easy perturbation experiments. GFDL's Flexible Modeling System Runtime Environment (FRE) is a production-level software project which defines and implements building blocks of the workflow as command line tools. The scientific, numerical and technical input needed to complete the workflow of an experiment is recorded in an experiment description file in XML format. Several key features add convenience and automation to the FRE workflow: ● Experiment inheritance makes it possible to define a new experiment with only a reference to the parent experiment and the parameters to override. ● Testing is a basic element of the FRE workflow: experiments define short test runs which are verified before the main experiment is run, and a set of standard experiments are verified with new code releases. ● FRE is flexible enough to support everything from short runs with mere megabytes of data to high-resolution experiments that run on thousands of processors for months, producing terabytes of output data. Experiments run in segments of model time; after each segment, the state is saved and the model can be checkpointed at that level. Segment length is defined by the user, but the number of segments per system job is calculated to fit optimally within the batch scheduler requirements. FRE provides job control across multiple segments, and tools to monitor and alter the state of long-running experiments. ● Experiments are entered into a Curator Database, which stores query-able metadata about the experiment and the experiment's output. ● FRE includes a set of standardized post-processing functions as well as the ability
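
    Experiment inheritance of the kind FRE provides can be sketched as a chain of parameter overrides: a child experiment names its parent and records only what differs. The dictionaries below stand in for FRE's XML experiment descriptions, and the parameter names are invented.

      EXPERIMENTS = {
          "control_run": {"parent": None, "params": {"dt": 1800, "co2": 280, "segments": 12}},
          "co2_doubling": {"parent": "control_run", "params": {"co2": 560}},
      }

      def resolve(name):
          """Walk the inheritance chain, applying child overrides over parent values."""
          exp = EXPERIMENTS[name]
          base = resolve(exp["parent"])["params"] if exp["parent"] else {}
          return {"name": name, "params": {**base, **exp["params"]}}

      print(resolve("co2_doubling"))
      # {'name': 'co2_doubling', 'params': {'dt': 1800, 'co2': 560, 'segments': 12}}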

  2. Kepler WebView: A Lightweight, Portable Framework for Constructing Real-time Web Interfaces of Scientific Workflows.

    PubMed

    Crawl, Daniel; Singh, Alok; Altintas, Ilkay

    2016-01-01

    Modern web technologies facilitate the creation of high-quality data visualizations, and rich, interactive components across a wide variety of devices. Scientific workflow systems can greatly benefit from these technologies by giving scientists a better understanding of their data or model leading to new insights. While several projects have enabled web access to scientific workflow systems, they are primarily organized as a large portal server encapsulating the workflow engine. In this vision paper, we propose the design for Kepler WebView, a lightweight framework that integrates web technologies with the Kepler Scientific Workflow System. By embedding a web server in the Kepler process, Kepler WebView enables a wide variety of usage scenarios that would be difficult or impossible using the portal model.
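
    The embedded-server idea can be sketched in a few lines (in Python rather than Kepler's Java): the workflow process itself serves its status over HTTP, so no separate portal server is required. The port, the JSON payload and the field names are illustrative assumptions.

      import json, threading, urllib.request
      from http.server import BaseHTTPRequestHandler, HTTPServer

      STATUS = {"workflow": "demo", "actor": "ReadData", "progress": 0.42}

      class StatusHandler(BaseHTTPRequestHandler):
          def do_GET(self):
              body = json.dumps(STATUS).encode()     # report live workflow state
              self.send_response(200)
              self.send_header("Content-Type", "application/json")
              self.end_headers()
              self.wfile.write(body)

      server = HTTPServer(("localhost", 8185), StatusHandler)
      threading.Thread(target=server.serve_forever, daemon=True).start()
      # A browser (or this client) can now watch the running workflow:
      print(urllib.request.urlopen("http://localhost:8185").read().decode())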

  3. Kepler WebView: A Lightweight, Portable Framework for Constructing Real-time Web Interfaces of Scientific Workflows

    PubMed Central

    Crawl, Daniel; Singh, Alok; Altintas, Ilkay

    2017-01-01

    Modern web technologies facilitate the creation of high-quality data visualizations, and rich, interactive components across a wide variety of devices. Scientific workflow systems can greatly benefit from these technologies by giving scientists a better understanding of their data or model leading to new insights. While several projects have enabled web access to scientific workflow systems, they are primarily organized as a large portal server encapsulating the workflow engine. In this vision paper, we propose the design for Kepler WebView, a lightweight framework that integrates web technologies with the Kepler Scientific Workflow System. By embedding a web server in the Kepler process, Kepler WebView enables a wide variety of usage scenarios that would be difficult or impossible using the portal model. PMID:28232853

  4. A Scientific Workflow Platform for Generic and Scalable Object Recognition on Medical Images

    NASA Astrophysics Data System (ADS)

    Möller, Manuel; Tuot, Christopher; Sintek, Michael

    In the research project THESEUS MEDICO we aim at a system combining medical image information with semantic background knowledge from ontologies to give clinicians fully cross-modal access to biomedical image repositories. Therefore joint efforts have to be made in more than one dimension: Object detection processes have to be specified in which an abstraction is performed starting from low-level image features across landmark detection utilizing abstract domain knowledge up to high-level object recognition. We propose a system based on a client-server extension of the scientific workflow platform Kepler that assists the collaboration of medical experts and computer scientists during development and parameter learning.

  5. An open source workflow for 3D printouts of scientific data volumes

    NASA Astrophysics Data System (ADS)

    Loewe, P.; Klump, J. F.; Wickert, J.; Ludwig, M.; Frigeri, A.

    2013-12-01

    As the amount of scientific data continues to grow, researchers need new tools to help them visualize complex data. Immersive data-visualisations are helpful, yet fail to provide tactile feedback and sensory feedback on spatial orientation, as provided by tangible objects. This gap in sensory feedback from virtual objects has led to the development of tangible representations of geospatial information to solve real world problems. Examples are animated globes [1], interactive environments like tangible GIS [2], and on demand 3D prints. The production of a tangible representation of a scientific data set is one step in a line of scientific thinking, leading from the physical world into scientific reasoning and back: The process starts with a physical observation, or from a data stream generated by an environmental sensor. This data stream is turned into a geo-referenced data set. This data is turned into a volume representation which is converted into command sequences for the printing device, leading to the creation of a 3D printout. As a last, but crucial step, this new object has to be documented and linked to the associated metadata, and curated in long term repositories to preserve its scientific meaning and context. The workflow to produce tangible 3D data-prints from science data at the German Research Centre for Geosciences (GFZ) was implemented as software based on the Free and Open Source Geoinformatics tools GRASS GIS and Paraview. The workflow was successfully validated in various application scenarios at GFZ using a RapMan printer to create 3D specimens of elevation models, geological underground models, ice penetrating radar soundings for planetology, and space time stacks for Tsunami model quality assessment. While these first pilot applications have demonstrated the feasibility of the overall approach [3], current research focuses on the provision of the workflow as Software as a Service (SAAS), thematic generalisation of information content and
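
    The last computational step of such a workflow, turning a gridded elevation model into printable triangles, can be sketched compactly: the writer below emits ASCII STL from a made-up 3x3 height grid. The production workflow uses GRASS GIS and Paraview; this is only the shape of the data transformation, under assumptions of our own.

      heights = [[0.0, 1.0, 0.5],
                 [0.2, 1.5, 0.7],
                 [0.1, 0.8, 0.3]]

      def facet(p1, p2, p3):
          """One ASCII STL facet; many slicers recompute normals, so 0 0 0 is fine."""
          lines = ["  facet normal 0 0 0", "    outer loop"]
          lines += [f"      vertex {x} {y} {z}" for x, y, z in (p1, p2, p3)]
          lines += ["    endloop", "  endfacet"]
          return "\n".join(lines)

      with open("surface.stl", "w") as f:
          f.write("solid surface\n")
          for i in range(len(heights) - 1):
              for j in range(len(heights[0]) - 1):
                  a = (i, j, heights[i][j])
                  b = (i + 1, j, heights[i + 1][j])
                  c = (i, j + 1, heights[i][j + 1])
                  d = (i + 1, j + 1, heights[i + 1][j + 1])
                  f.write(facet(a, b, c) + "\n")    # two triangles per grid cell
                  f.write(facet(b, d, c) + "\n")
          f.write("endsolid surface\n")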

  6. What Not To Do: Anti-patterns for Developing Scientific Workflow Software Components

    NASA Astrophysics Data System (ADS)

    Futrelle, J.; Maffei, A. R.; Sosik, H. M.; Gallager, S. M.; York, A.

    2013-12-01

    Scientific workflows promise to enable efficient scaling-up of researcher code to handle large datasets and workloads, as well as documentation of scientific processing via standardized provenance records, etc. Workflow systems and related frameworks for coordinating the execution of otherwise separate components are limited, however, in their ability to overcome software engineering design problems commonly encountered in pre-existing components, such as scripts developed externally by scientists in their laboratories. In practice, this often means that components must be rewritten or replaced in a time-consuming, expensive process. In the course of an extensive workflow development project involving large-scale oceanographic image processing, we have begun to identify and codify 'anti-patterns'--problematic design characteristics of software--that make components fit poorly into complex automated workflows. We have gone on to develop and document low-effort solutions and best practices that efficiently address the anti-patterns we have identified. The issues, solutions, and best practices can be used to evaluate and improve existing code, as well as guiding the development of new components. For example, we have identified a common anti-pattern we call 'batch-itis' in which a script fails and then cannot perform more work, even if that work is not precluded by the failure. The solution we have identified--removing unnecessary looping over independent units of work--is often easier to code than the anti-pattern, as it eliminates the need for complex control flow logic in the component. Other anti-patterns we have identified are similarly easy to identify and often easy to fix. We have drawn upon experience working with three science teams at Woods Hole Oceanographic Institution, each of which has designed novel imaging instruments and associated image analysis code. By developing use cases and prototypes within these teams, we have undertaken formal evaluations of
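
    The "batch-itis" fix reads well in code: instead of one run that dies on the first bad input, each independent unit of work is isolated so a failure cannot block the rest. process_image and its failure mode are hypothetical stand-ins.

      def process_image(path):
          if "bad" in path:                  # stand-in for a real failure mode
              raise ValueError(f"cannot decode {path}")
          return path + ".features"

      inputs = ["img_001.png", "img_bad.png", "img_003.png"]

      # Anti-pattern: one failure aborts all remaining, independent work.
      #   results = [process_image(p) for p in inputs]   # raises at img_bad.png

      # Fix: isolate each unit of work; record failures and keep going.
      results, failures = [], []
      for p in inputs:
          try:
              results.append(process_image(p))
          except Exception as exc:
              failures.append((p, str(exc)))
      print(results)    # ['img_001.png.features', 'img_003.png.features']
      print(failures)   # [('img_bad.png', 'cannot decode img_bad.png')]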

  7. Coupling of a continuum ice sheet model and a discrete element calving model using a scientific workflow system

    NASA Astrophysics Data System (ADS)

    Memon, Shahbaz; Vallot, Dorothée; Zwinger, Thomas; Neukirchen, Helmut

    2017-04-01

    Scientific communities generate complex simulations through the orchestration of semi-structured analysis pipelines, which involve the execution of large workflows on multiple, distributed and heterogeneous computing and data resources. Modeling the ice dynamics of glaciers requires workflows consisting of many non-trivial, computationally expensive processing tasks which are coupled to each other. From this domain, we present an e-Science use case, a workflow, which requires the execution of a continuum ice flow model and a discrete element based calving model in an iterative manner. Apart from the execution, this workflow also contains data format conversion tasks that support the execution of ice flow and calving by means of transition through sequential, nested and iterative steps. Thus, the management and monitoring of all the processing tasks, including data management and transfer, of the workflow model becomes more complex. From the implementation perspective, this workflow model was initially developed as a set of scripts using static data input and output references. In the course of application usage, as more scripts or modifications were introduced per user requirements, the debugging and validation of results became more cumbersome to achieve. To address these problems, we identified a need for a high-level scientific workflow tool through which all the above mentioned processes can be achieved in an efficient and usable manner. We decided to make use of the e-Science middleware UNICORE (Uniform Interface to Computing Resources), which allows seamless and automated access to different heterogeneous and distributed resources and is supported by a scientific workflow engine. Based on this, we developed a high-level scientific workflow model for coupling of massively parallel High-Performance Computing (HPC) jobs: a continuum ice sheet model (Elmer/Ice) and a discrete element calving and crevassing model (HiDEM). In our talk we present how the use of a high
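
    The shape of the coupling loop the workflow manages is sketched below: a continuum ice-flow step feeds a discrete calving step through a format conversion, repeatedly. The three functions are toy stand-ins for Elmer/Ice, HiDEM and the converters, which in the real system run as UNICORE-managed HPC jobs; all numbers are invented.

      def elmer_ice_step(geometry, dt):
          return {"front": geometry["front"] + 50.0 * dt}      # ice front advances

      def to_particles(continuum_state):                       # format conversion task
          return {"front": continuum_state["front"]}

      def hidem_calving(particle_state):
          calved = 50.0 if particle_state["front"] > 1000.0 else 0.0
          return {"front": particle_state["front"] - calved}   # calving retreats the front

      geometry, dt = {"front": 950.0}, 1.0
      for step in range(5):
          continuum = elmer_ice_step(geometry, dt)
          geometry = hidem_calving(to_particles(continuum))
          print(f"step {step}: ice front at {geometry['front']:.0f} m")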

  8. A Six‐Stage Workflow for Robust Application of Systems Pharmacology

    PubMed Central

    Gadkar, K; Kirouac, DC; Mager, DE; van der Graaf, PH

    2016-01-01

    Quantitative and systems pharmacology (QSP) is increasingly being applied in pharmaceutical research and development. One factor critical to the ultimate success of QSP is the establishment of commonly accepted language, technical criteria, and workflows. We propose an integrated workflow that bridges conceptual objectives with underlying technical detail to support the execution, communication, and evaluation of QSP projects. PMID:27299936

  9. Cloud Bursting with GlideinWMS: Means to satisfy ever increasing computing needs for Scientific Workflows

    NASA Astrophysics Data System (ADS)

    Mhashilkar, Parag; Tiradani, Anthony; Holzman, Burt; Larson, Krista; Sfiligoi, Igor; Rynge, Mats

    2014-06-01

    Scientific communities have been at the forefront of adopting new technologies and methodologies in computing. Scientific computing has influenced how science is done today, achieving breakthroughs that were impossible several decades ago. For the past decade several such communities in the Open Science Grid (OSG) and the European Grid Infrastructure (EGI) have been using GlideinWMS to run complex application workflows and effectively share computational resources over the grid. GlideinWMS is a pilot-based workload management system (WMS) that creates, on demand, a dynamically sized overlay HTCondor batch system on grid resources. At present, the computational resources shared over the grid are just adequate to sustain the computing needs. We envision that the complexity of the science driven by "Big Data" will further push the need for computational resources. To fulfill their increasing demands and/or to run specialized workflows, some of the big communities like CMS are investigating the use of cloud computing as Infrastructure-As-A-Service (IAAS) with GlideinWMS as a potential alternative to fill the void. Similarly, communities with no previous access to computing resources can use GlideinWMS to set up a batch system on the cloud infrastructure. To enable this, the architecture of GlideinWMS has been extended to support interfacing GlideinWMS with different scientific and commercial cloud providers like HLT, FutureGrid, FermiCloud and Amazon EC2. In this paper, we describe a solution for cloud bursting with GlideinWMS. The paper describes the approach, architectural changes and lessons learned while enabling support for cloud infrastructures in GlideinWMS.

  10. Cloud Bursting with GlideinWMS: Means to satisfy ever increasing computing needs for Scientific Workflows

    SciTech Connect

    Mhashilkar, Parag; Tiradani, Anthony; Holzman, Burt; Larson, Krista; Sfiligoi, Igor; Rynge, Mats

    2014-01-01

    Scientific communities have been at the forefront of adopting new technologies and methodologies in computing. Scientific computing has influenced how science is done today, achieving breakthroughs that were impossible several decades ago. For the past decade several such communities in the Open Science Grid (OSG) and the European Grid Infrastructure (EGI) have been using GlideinWMS to run complex application workflows and effectively share computational resources over the grid. GlideinWMS is a pilot-based workload management system (WMS) that creates, on demand, a dynamically sized overlay HTCondor batch system on grid resources. At present, the computational resources shared over the grid are just adequate to sustain the computing needs. We envision that the complexity of the science driven by 'Big Data' will further push the need for computational resources. To fulfill their increasing demands and/or to run specialized workflows, some of the big communities like CMS are investigating the use of cloud computing as Infrastructure-As-A-Service (IAAS) with GlideinWMS as a potential alternative to fill the void. Similarly, communities with no previous access to computing resources can use GlideinWMS to set up a batch system on the cloud infrastructure. To enable this, the architecture of GlideinWMS has been extended to support interfacing GlideinWMS with different scientific and commercial cloud providers like HLT, FutureGrid, FermiCloud and Amazon EC2. In this paper, we describe a solution for cloud bursting with GlideinWMS. The paper describes the approach, architectural changes and lessons learned while enabling support for cloud infrastructures in GlideinWMS.

  11. The Virtual Geophysics Laboratory (VGL): Scientific Workflows Operating Across Organizations and Across Infrastructures

    NASA Astrophysics Data System (ADS)

    Cox, S. J.; Wyborn, L. A.; Fraser, R.; Rankine, T.; Woodcock, R.; Vote, J.; Evans, B.

    2012-12-01

    The Virtual Geophysics Laboratory (VGL) is a web portal that provides geoscientists with an integrated online environment that: seamlessly accesses geophysical and geoscience data services from the AuScope national geoscience information infrastructure; loosely couples these data to a variety of geoscience software tools; and provides large scale processing facilities via cloud computing. VGL is a collaboration between CSIRO, Geoscience Australia, National Computational Infrastructure, Monash University, Australian National University and the University of Queensland. The VGL provides a distributed system whereby a user can enter an online virtual laboratory to seamlessly connect to OGC web services for geoscience data. The data is supplied in open standards formats using international standards like GeoSciML. A VGL user uses a web mapping interface to discover and filter the data sources using spatial and attribute filters to define a subset. Once the data is selected the user is not required to download the data. VGL collates the service query information for later in the processing workflow where it will be staged directly to the computing facilities. The combination of deferring data download and access to Cloud computing enables VGL users to access their data at higher resolutions and to undertake larger scale inversions, more complex models and simulations than their own local computing facilities might allow. Inside the Virtual Geophysics Laboratory, the user has access to a library of existing models, complete with exemplar workflows for specific scientific problems based on those models. For example, the user can load a geological model published by Geoscience Australia, apply a basic deformation workflow provided by a CSIRO scientist, and have it run in a scientific code from Monash. Finally the user can publish these results to share with a colleague or cite in a paper. This opens new opportunities for access and collaboration as all the resources (models

  12. Nationwide Buildings Energy Research enabled through an integrated Data Intensive Scientific Workflow and Advanced Analysis Environment

    SciTech Connect

    Kleese van Dam, Kerstin; Lansing, Carina S.; Elsethagen, Todd O.; Hathaway, John E.; Guillen, Zoe C.; Dirks, James A.; Skorski, Daniel C.; Stephan, Eric G.; Gorrissen, Willy J.; Gorton, Ian; Liu, Yan

    2014-01-28

    Modern workflow systems enable scientists to run ensemble simulations at unprecedented scales and levels of complexity, allowing them to study system sizes previously impossible to achieve, due to the inherent resource requirements needed for the modeling work. However, as a result of these new capabilities the science teams suddenly also face unprecedented data volumes that they are unable to analyze with their existing tools and methodologies in a timely fashion. In this paper we describe the ongoing development work to create an integrated data intensive scientific workflow and analysis environment that offers researchers the ability to easily create and execute complex simulation studies and provides them with different scalable methods to analyze the resulting data volumes. The integration of simulation and analysis environments is hereby not only a question of ease of use, but supports fundamental functions in the correlated analysis of simulation input, execution details and derived results for multi-variant, complex studies. To this end the team extended and integrated the existing capabilities of the Velo data management and analysis infrastructure, the MeDICi data intensive workflow system and RHIPE, the R for Hadoop version of the well-known statistics package, as well as developing a new visual analytics interface for result exploitation by multi-domain users. The capabilities of the new environment are demonstrated on a use case that focusses on the Pacific Northwest National Laboratory (PNNL) building energy team, showing how they were able to take their previously local-scale simulations to a nationwide level by utilizing data intensive computing techniques not only for their modeling work, but also for the subsequent analysis of their modeling results. As part of the PNNL research initiative PRIMA (Platform for Regional Integrated Modeling and Analysis) the team performed an initial 3 year study of building energy demands for the US Eastern

  13. LiSIs: An Online Scientific Workflow System for Virtual Screening.

    PubMed

    Kannas, Christos C; Kalvari, Ioanna; Lambrinidis, George; Neophytou, Christiana M; Savva, Christiana G; Kirmitzoglou, Ioannis; Antoniou, Zinonas; Achilleos, Kleo G; Scherf, David; Pitta, Chara A; Nicolaou, Christos A; Mikros, Emanuel; Promponas, Vasilis J; Gerhauser, Clarissa; Mehta, Rajendra G; Constantinou, Andreas I; Pattichis, Constantinos S

    2015-01-01

    Modern methods of drug discovery and development in recent years make wide use of computational algorithms. These methods utilise Virtual Screening (VS), which is the computational counterpart of experimental screening. In this manner the in silico models and tools initially replace the wet lab methods, saving time and resources. This paper presents the overall design and implementation of a web based scientific workflow system for virtual screening called the Life Sciences Informatics (LiSIs) platform. The LiSIs platform consists of the following layers: the input layer, covering data file input; the pre-processing layer, covering the descriptor calculation and docking preparation components; the processing layer, covering attribute filtering, compound similarity, substructure matching, docking prediction, predictive modelling and molecular clustering; the post-processing layer, covering the output reformatting and binary file merging components; and the output layer, covering the storage component. The potential of the LiSIs platform has been demonstrated through two case studies designed to illustrate the preparation of tools for the identification of promising chemical structures. The first case study involved the development of a Quantitative Structure Activity Relationship (QSAR) model on a literature dataset while the second case study implemented a docking-based virtual screening experiment. Our results show that VS workflows utilizing docking, predictive models and other in silico tools as implemented in the LiSIs platform can identify compounds in line with expert expectations. We anticipate that the deployment of LiSIs, as currently implemented and available for use, can enable drug discovery researchers to more easily use state of the art computational techniques in their search for promising chemical compounds. The LiSIs platform is freely accessible (i) under the GRANATUM platform at: http://www.granatum.org and (ii) directly at: http://lisis.cs.ucy.ac.cy.
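
    The layered organization described above amounts to function composition: data flows input -> pre-processing -> processing -> post-processing -> output. A toy rendition follows, with invented placeholder components rather than LiSIs' own:

      def read_smiles(lines):                 # input layer: parse raw records
          return [s.strip() for s in lines]

      def calc_descriptors(mols):             # pre-processing: toy "descriptor" = length
          return [(m, len(m)) for m in mols]

      def filter_by_descriptor(pairs):        # processing: keep small molecules only
          return [p for p in pairs if p[1] <= 6]

      def format_report(pairs):               # post-processing: reformat the output
          return [f"{m}\t{d}" for m, d in pairs]

      LAYERS = [read_smiles, calc_descriptors, filter_by_descriptor, format_report]

      data = ["CCO ", "c1ccccc1", " CC(=O)O"]
      for layer in LAYERS:                    # the platform wires layers the same way
          data = layer(data)
      print("\n".join(data))                  # -> "CCO\t3"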

  14. A web accessible scientific workflow system for vadose zone performance monitoring: design and implementation examples

    NASA Astrophysics Data System (ADS)

    Mattson, E.; Versteeg, R.; Ankeny, M.; Stormberg, G.

    2005-12-01

    Long term performance monitoring has been identified by DOE, DOD and EPA as one of the most challenging and costly elements of contaminated site remedial efforts. Such monitoring should provide timely and actionable information relevant to a multitude of stakeholder needs. This information should be obtained in a manner which is auditable, cost effective and transparent. Over the last several years INL staff has designed and implemented a web accessible scientific workflow system for environmental monitoring. This workflow environment integrates distributed, automated data acquisition from diverse sensors (geophysical, geochemical and hydrological) with server side data management and information visualization through flexible browser based data access tools. Component technologies include a rich browser-based client (using dynamic javascript and html/css) for data selection, a back-end server which uses PHP for data processing, user management, and result delivery, and third party applications which are invoked by the back-end using webservices. This system has been implemented and is operational for several sites, including the Ruby Gulch Waste Rock Repository (a capped mine waste rock dump on the Gilt Edge Mine Superfund Site), the INL Vadose Zone Research Park and an alternative cover landfill. Implementations for other vadose zone sites are currently in progress. These systems allow for autonomous performance monitoring through automated data analysis and report generation. This performance monitoring has allowed users to obtain insights into system dynamics, regulatory compliance and residence times of water. Our system uses modular components for data selection and graphing and WSDL compliant webservices for external functions such as statistical analyses and model invocations. Thus, implementing this system for novel sites and extending functionality (e.g. adding novel models) is relatively straightforward. As system access requires a standard web browser

  15. A Practitioner Friendly and Scientifically Robust Training Evaluation Approach

    ERIC Educational Resources Information Center

    Griffin, Richard

    2012-01-01

    Purpose: This article seeks to review the current state of workplace learning evaluation, to set out the rationale for evaluation along with the barriers that practitioners face when seeking to assess the effectiveness of training and development. Finally, it aims to propose a scientifically robust and practitioner friendly approach to evaluation.…

  17. The TimeStudio Project: An open source scientific workflow system for the behavioral and brain sciences.

    PubMed

    Nyström, Pär; Falck-Ytter, Terje; Gredebäck, Gustaf

    2016-06-01

    This article describes a new open source scientific workflow system, the TimeStudio Project, dedicated to the behavioral and brain sciences. The program is written in MATLAB and features a graphical user interface for the dynamic pipelining of computer algorithms developed as TimeStudio plugins. TimeStudio includes both a set of general plugins (for reading data files, modifying data structures, visualizing data structures, etc.) and a set of plugins specifically developed for the analysis of event-related eyetracking data as a proof of concept. It is possible to create custom plugins to integrate new or existing MATLAB code anywhere in a workflow, making TimeStudio a flexible workbench for organizing and performing a wide range of analyses. The system also features an integrated sharing and archiving tool for TimeStudio workflows, which can be used to share workflows both during the data analysis phase and after scientific publication. TimeStudio thus facilitates the reproduction and replication of scientific studies, increases the transparency of analyses, and reduces individual researchers' analysis workload. The project website (http://timestudioproject.com) contains the latest releases of TimeStudio, together with documentation and user forums.
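
    TimeStudio itself implements this in MATLAB; the following Python sketch, with invented plugin names, only illustrates the general idea of dynamically pipelining plugins that each transform a shared data structure:

        def load_data(state):                     # invented plugin: read a data file
            state["trials"] = [0.31, 0.27, 0.35]  # stand-in for eyetracking samples
            return state

        def summarize(state):                     # invented plugin: descriptive stats
            trials = state["trials"]
            state["mean"] = sum(trials) / len(trials)
            return state

        workflow = [load_data, summarize]         # the dynamically pipelined plugins

        state = {}
        for plugin in workflow:                   # the engine applies each in order
            state = plugin(state)
        print(state["mean"])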

  18. Real-Time Field Data Acquisition and Remote Sensor Reconfiguration Using Scientific Workflows

    NASA Astrophysics Data System (ADS)

    Silva, F.; Mehta, G.; Vahi, K.; Deelman, E.

    2010-12-01

    Despite many technological advances, field data acquisition still consists of several manual and laborious steps. Once sensors and data loggers are deployed in the field, scientists often have to periodically return to their study sites in order to collect their data. Even when field deployments have a way to communicate and transmit data back to the laboratory (e.g. by using a satellite or a cellular modem), data analysis still requires several repetitive steps. Because data often needs to be processed and inspected manually, there is usually a significant time delay between data collection and analysis. As a result, sensor failures that could be detected almost in real time go unnoticed for weeks or months. Finally, sensor reconfiguration in response to interesting events in the field is still done manually, making rapid response nearly impossible and causing important data to be missed. By working closely with scientists from different application domains, we identified several tasks that, if automated, could greatly improve the way field data is collected, processed, and distributed. Our goals are to enable real-time data collection and validation, automate sensor reconfiguration in response to events of interest in the field, and allow scientists to easily automate their data processing. We began our design by employing the Sensor Processing and Acquisition Network (SPAN) architecture. SPAN uses an embedded processor in the field to coordinate sensor data acquisition from analog and digital sensors by interfacing with different types of devices and data loggers. SPAN is also able to interact with various types of communication devices in order to provide real-time communication to and from field sites. We use the Pegasus Workflow Management System (Pegasus WMS) to coordinate data collection and control sensors and deployments in the field. Because scientific workflows can be used to automate multi-step, repetitive tasks, scientists can create simple workflows to
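
    The validation-and-reconfiguration loop can be sketched as below; the thresholds and the two stand-in functions are invented, and in the actual system these steps are driven through SPAN and Pegasus WMS:

        def flag_sensor_failure(sensor):
            print(f"possible failure on {sensor}: flagged for inspection")  # stand-in

        def reconfigure(sensor, rate_hz):
            print(f"reconfiguring {sensor} to sample at {rate_hz} Hz")      # stand-in

        PLAUSIBLE_RANGE = (-40.0, 60.0)  # plausible air temperature, deg C (invented)
        EVENT_THRESHOLD = 35.0           # invented trigger for an event of interest

        def process_sample(sensor, value):
            if not PLAUSIBLE_RANGE[0] <= value <= PLAUSIBLE_RANGE[1]:
                flag_sensor_failure(sensor)        # near-real-time fault detection
            elif value >= EVENT_THRESHOLD:
                reconfigure(sensor, rate_hz=10.0)  # rapid response, no manual step

        process_sample("thermistor-3", 37.2)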

  19. A framework for integration of scientific applications into the OpenTopography workflow

    NASA Astrophysics Data System (ADS)

    Nandigam, V.; Crosby, C.; Baru, C.

    2012-12-01

    The NSF-funded OpenTopography facility provides online access to Earth science-oriented high-resolution LIDAR topography data, online processing tools, and derivative products. The underlying cyberinfrastructure employs a multi-tier service-oriented architecture comprising an infrastructure tier, a processing services tier, and an application tier. The infrastructure tier consists of storage and compute resources as well as supporting databases. The services tier consists of the set of processing routines, each deployed as a Web service. The application tier provides client interfaces to the system (e.g. the portal). We propose a "pluggable" infrastructure design that will allow new scientific algorithms and processing routines, developed and maintained by the community, to be integrated into the OpenTopography system so that the wider earth science community can benefit from their availability. All core components in OpenTopography are available as Web services using a customized open-source Opal toolkit. The Opal toolkit provides mechanisms to manage and track job submissions, with the help of a back-end database, and allows monitoring of job and system status by providing charting tools. All core components in OpenTopography have been developed, maintained and wrapped as Web services using Opal by OpenTopography developers. However, as the scientific community develops new processing and analysis approaches, this integration approach does not scale efficiently. Most of the new scientific applications will have their own active development teams performing regular updates, maintenance and other improvements. It would be optimal to have each application co-located with its developers, who can continue to actively work on it while still making it accessible within the OpenTopography workflow for processing capabilities. We will utilize a software framework for remote integration of these scientific applications into the OpenTopography system. This will be accomplished by
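
    The submit-and-poll pattern that such service-wrapped processing routines support can be sketched as below; the URL scheme and response fields are invented for illustration and are not the actual Opal interface:

        import time
        import requests

        def submit_and_wait(service_url, params, poll_s=30):
            # hypothetical URL scheme; Opal's real interface differs
            job = requests.post(f"{service_url}/submit", data=params, timeout=60).json()
            while True:
                status = requests.get(f"{service_url}/status/{job['id']}",
                                      timeout=60).json()
                if status["state"] in ("DONE", "FAILED"):   # invented state names
                    return status
                time.sleep(poll_s)  # job tracking itself is handled server-side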

  20. ScyFlow: An Environment for the Visual Specification and Execution of Scientific Workflows

    NASA Technical Reports Server (NTRS)

    McCann, Karen M.; Yarrow, Maurice; DeVivo, Adrian; Mehrotra, Piyush

    2004-01-01

    With the advent of grid technologies, scientists and engineers are building more and more complex applications to utilize distributed grid resources. The core grid services provide a path for accessing and utilizing these resources in a secure and seamless fashion. However, what the scientists need is an environment that will allow them to specify their application runs at a high organizational level, and then support efficient execution across any given set or sets of resources. We have been designing and implementing ScyFlow, a dual-interface architecture (both GUI and API) that addresses this problem. The scientist/user specifies the application tasks along with the necessary control and data flow, and monitors and manages the execution of the resulting workflow across the distributed resources. In this paper, we utilize two scenarios to provide the details of the two modules of the project, the visual editor and the runtime workflow engine.

  1. An open source approach to enable the reproducibility of scientific workflows in the ocean sciences

    NASA Astrophysics Data System (ADS)

    Di Stefano, M.; Fox, P. A.; West, P.; Hare, J. A.; Maffei, A. R.

    2013-12-01

    Every scientist should be able to rerun data analyses conducted by his or her team and regenerate the figures in a paper. However, all too often the correct version of a script goes missing, or the original raw data is filtered by hand and the filtering process is undocumented, or there is a lack of collaboration and communication among scientists working in a team. Here we present three different use cases in ocean sciences in which end-to-end workflows are tracked. The main tool deployed to address these use cases is a web application (IPython Notebook) that provides the ability to work on very diverse and heterogeneous data and information sources, providing an effective way to share and track changes to the source code used to generate data products and associated metadata, as well as to track the overall workflow provenance to allow versioned reproducibility of a data product. The use cases selected for this work are: 1) A partial reproduction of the Ecosystem Status Report (ESR) for the Northeast U.S. Continental Shelf Large Marine Ecosystem. Our goal with this use case is to enable not just the traceability but also the reproducibility of this biannual report, keeping track of all the processes behind the generation and validation of time-series and spatial data and information products. An end-to-end workflow with code versioning is developed so that indicators in the report may be traced back to the source datasets. 2) Real-time generation of web pages to visualize one of the environmental indicators from the Ecosystem Advisory for the Northeast Shelf Large Marine Ecosystem web site. 3) Data and visualization integration for ocean climate forecasting. In this use case, we focus on a workflow to describe how to provide access to online data sources in the NetCDF format and other model data, and make use of multicore processing to generate video animation from time series of gridded data. For each use case we show how complete workflows
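
    The third use case (multicore rendering of animation frames from gridded NetCDF time series) can be sketched as follows; the file and variable names are hypothetical placeholders:

        from multiprocessing import Pool
        from netCDF4 import Dataset
        import matplotlib
        matplotlib.use("Agg")                      # render frames without a display
        import matplotlib.pyplot as plt

        def render_frame(t):
            # each worker opens the (hypothetical) file itself; handles don't pickle
            with Dataset("sst_model_output.nc") as nc:
                field = nc.variables["sst"][t, :, :]   # hypothetical variable name
            plt.imshow(field, origin="lower")
            plt.title(f"time step {t}")
            plt.savefig(f"frame_{t:04d}.png")
            plt.close()

        if __name__ == "__main__":
            with Pool() as pool:                   # multicore frame generation
                pool.map(render_frame, range(120))
            # frames can then be assembled into a video, e.g. with ffmpeg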

  2. Facilitating Scientific Research through Workflows and Provenance on the DataONE Cyberinfrastructure (Invited)

    NASA Astrophysics Data System (ADS)

    Ludaescher, B.; Cuevas-Vicenttín, V.; Missier, P.; Dey, S.; Kianmajd, P.; Wei, Y.; Koop, D.; Chirigati, F.; Altintas, I.; Belhajjame, K.; Bowers, S.

    2013-12-01

    Provenance data has numerous applications in science. Two key ones are 1) replication: facilitate the repeatable derivation of results and 2) discovery: enable the location of data based on processing history and derivation relationships. The following scenario illustrates a typical use of provenance data. Alice, a climate scientist, has developed a VisTrails workflow to prepare Gross Primary Productivity (GPP) data. After verifying that the workflow generates data in the desired form, she uses the ReproZip tool to create a reproducible package that will enable other scientists to re-run the workflow without having to install and configure the particular libraries she is using. In addition, she exports the provenance information of the workflow execution and customizes it through a tool such as the ProvExplorer, in order to eliminate the information she regards as superfluous. She then creates and shares a DataONE data package containing the data she prepared, the ReproZip package, the customized provenance, and additional science/system metadata. Both the customized provenance and metadata are indexed by the DataONE Cyberinfrastructure (CI) for discovery purposes. Bob, another climate scientist, is looking for a benchmark GPP data to validate the Terrestrial Biosphere Model (TBM) he has developed. Searching the DataONE repository he finds Alice's data package. He retrieves its ReproZip package, customizes it (e.g. changing the spatial resolution), and re-runs it to generate the benchmark data in the form he desires. The newly generated data is then used as input for his own model evaluation workflow. His workflow generates residual maps and a Taylor diagram that enable him to evaluate the similarity between the results of his model and the benchmark data. At this point, Bob can also make use of the tools Alice used to publish his results as another discoverable and reproducible data package. In order to support these capabilities, we propose to extend the Data
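
    As a rough illustration of the provenance records involved, the following sketch uses the Python prov package to encode Alice's scenario as W3C PROV statements; the names in the ex: namespace are invented:

        from prov.model import ProvDocument

        doc = ProvDocument()
        doc.add_namespace("ex", "http://example.org/")

        raw = doc.entity("ex:modis_raw")             # input dataset
        gpp = doc.entity("ex:gpp_benchmark")         # Alice's derived product
        prep = doc.activity("ex:prepare_gpp_workflow")
        alice = doc.agent("ex:alice")

        doc.used(prep, raw)
        doc.wasGeneratedBy(gpp, prep)
        doc.wasAssociatedWith(prep, alice)
        doc.wasDerivedFrom(gpp, raw)

        print(doc.serialize(indent=2))               # PROV-JSON by default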

  3. Data Provenance Hybridization Supporting Extreme-Scale Scientific Workflow Applications

    SciTech Connect

    Elsethagen, Todd O.; Stephan, Eric G.; Raju, Bibi; Schram, Malachi; Macduff, Matt C.; Kerbyson, Darren J.; Kleese-Van Dam, Kerstin; Singh, Alok; Altintas, Ilkay

    2016-11-21

    As high performance computing (HPC) infrastructures continue to grow in capability and complexity, so do the applications that they serve. HPC and distributed-area computing (DAC) (e.g. grid and cloud) users are looking increasingly toward workflow solutions to orchestrate their complex application coupling, pre- and post-processing needs. To gain insight and a more quantitative understanding of a workflow's performance, our method includes not only the capture of traditional provenance information, but also the capture and integration of system environment metrics, helping to give context and explanation for a workflow's execution. In this paper, we describe IPPD's provenance management solution (ProvEn) and its hybrid data store combining both of these data provenance perspectives.
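
    A minimal sketch of the hybrid idea, assuming an invented record layout rather than the ProvEn schema: each task run is logged together with system-environment metrics (here gathered with psutil):

        import json
        import platform
        import time
        import psutil   # third-party; pip install psutil

        def run_with_context(task_name, fn, *args):
            start = time.time()
            result = fn(*args)
            record = {
                "task": task_name,
                "wallclock_s": time.time() - start,
                "host": platform.node(),
                "cpu_percent": psutil.cpu_percent(interval=None),
                "mem_used_mb": psutil.virtual_memory().used // 2**20,
            }
            with open("provenance_log.jsonl", "a") as fh:   # hypothetical store
                fh.write(json.dumps(record) + "\n")
            return result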

  4. A Classroom-Based Distributed Workflow Initiative for the Early Involvement of Undergraduate Students in Scientific Research

    NASA Astrophysics Data System (ADS)

    Friedrich, Jon M.

    2013-05-01

    Engaging freshman and sophomore students in meaningful scientific research is challenging because of their developing skill set and their necessary time commitments to regular classwork. A project called the Chondrule Analysis Project was initiated to engage first- and second-year students in an initial research experience and also accomplish several scientific objectives. Students take part in a classroom-based, distributed workflow project that aims to produce high-quality data on the physical dimensions of chondrules, mm-sized spherules contained in primitive meteorites called chondrites. Such data are needed to test astrophysical models for processes acting in the early solar system. Student investigators process X-ray microtomography data with resources contained on portable USB flash drives distributed to them. Students are exposed to data collection, data quality evaluation, interpretation, and presentation of their results. Herein, an introduction to the scientific objectives is given along with an evolutionary history of the project. A description of the current implementation of the course is presented, and future directions are discussed. Anonymous student evaluations of the course are used to demonstrate the educational and engaging nature of the project. Finally, we reflect on the possible benefits of such a project for first- and second-year students within STEM disciplines.

  5. GeNNet: an integrated platform for unifying scientific workflows and graph databases for transcriptome data analysis

    PubMed Central

    Gadelha, Luiz; Ribeiro-Alves, Marcelo; Porto, Fábio

    2017-01-01

    There are many steps in analyzing transcriptome data, from the acquisition of raw data to the selection of a subset of representative genes that explain a scientific hypothesis. The data produced can be represented as networks of interactions among genes, and these may additionally be integrated with other biological databases, such as protein-protein interactions, transcription factors and gene annotation. However, the results of these analyses remain fragmented, imposing difficulties either on later inspection of results or on meta-analysis incorporating new related data. Integrating databases and tools into scientific workflows, orchestrating their execution, and managing the resulting data and its respective metadata are challenging tasks. Additionally, considerable effort is required to run in silico experiments and to structure and compose the information as needed for analysis. Different programs may need to be applied and different files are produced during the experiment cycle. In this context, the availability of a platform supporting experiment execution is paramount. We present GeNNet, an integrated transcriptome analysis platform that unifies scientific workflows with graph databases for selecting relevant genes according to the evaluated biological systems. It includes GeNNet-Wf, a scientific workflow that pre-loads biological data, pre-processes raw microarray data and conducts a series of analyses including normalization, differential expression inference, clustering and gene set enrichment analysis. A user-friendly web interface, GeNNet-Web, allows for setting parameters, executing, and visualizing the results of GeNNet-Wf executions. To demonstrate the features of GeNNet, we performed case studies with data retrieved from GEO, particularly using a single-factor experiment in different analysis scenarios. As a result, we obtained differentially expressed genes for which biological functions were analyzed. The results
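
    As a rough illustration of coupling a workflow step to a graph database, the sketch below persists a co-expression edge in Neo4j with the official Python driver; the connection details and node/relationship properties are invented and do not reflect GeNNet's actual schema:

        from neo4j import GraphDatabase

        driver = GraphDatabase.driver("bolt://localhost:7687",
                                      auth=("neo4j", "password"))  # hypothetical server

        def store_edge(tx, gene_a, gene_b, corr):
            tx.run("MERGE (a:Gene {symbol: $a}) "
                   "MERGE (b:Gene {symbol: $b}) "
                   "MERGE (a)-[r:COEXPRESSED]->(b) SET r.corr = $corr",
                   a=gene_a, b=gene_b, corr=corr)

        with driver.session() as session:
            session.execute_write(store_edge, "TP53", "MDM2", 0.87)
        driver.close()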

  6. GeNNet: an integrated platform for unifying scientific workflows and graph databases for transcriptome data analysis.

    PubMed

    Costa, Raquel L; Gadelha, Luiz; Ribeiro-Alves, Marcelo; Porto, Fábio

    2017-01-01

    There are many steps in analyzing transcriptome data, from the acquisition of raw data to the selection of a subset of representative genes that explain a scientific hypothesis. The data produced can be represented as networks of interactions among genes, and these may additionally be integrated with other biological databases, such as protein-protein interactions, transcription factors and gene annotation. However, the results of these analyses remain fragmented, imposing difficulties either on later inspection of results or on meta-analysis incorporating new related data. Integrating databases and tools into scientific workflows, orchestrating their execution, and managing the resulting data and its respective metadata are challenging tasks. Additionally, considerable effort is required to run in silico experiments and to structure and compose the information as needed for analysis. Different programs may need to be applied and different files are produced during the experiment cycle. In this context, the availability of a platform supporting experiment execution is paramount. We present GeNNet, an integrated transcriptome analysis platform that unifies scientific workflows with graph databases for selecting relevant genes according to the evaluated biological systems. It includes GeNNet-Wf, a scientific workflow that pre-loads biological data, pre-processes raw microarray data and conducts a series of analyses including normalization, differential expression inference, clustering and gene set enrichment analysis. A user-friendly web interface, GeNNet-Web, allows for setting parameters, executing, and visualizing the results of GeNNet-Wf executions. To demonstrate the features of GeNNet, we performed case studies with data retrieved from GEO, particularly using a single-factor experiment in different analysis scenarios. As a result, we obtained differentially expressed genes for which biological functions were analyzed. The results

  7. Construction of antimicrobial peptide-drug combination networks from scientific literature based on a semi-automated curation workflow.

    PubMed

    Jorge, Paula; Pérez-Pérez, Martín; Pérez Rodríguez, Gael; Fdez-Riverola, Florentino; Pereira, Maria Olívia; Lourenço, Anália

    2016-01-01

    Considerable research efforts are being invested in the development of novel antimicrobial therapies effective against the growing number of multi-drug-resistant pathogens. Notably, the combination of different agents is increasingly explored as a means to exploit and improve individual agent actions while minimizing microorganism resistance. Although there are several databases on antimicrobial agents, the scientific literature is the primary source of information on experimental antimicrobial combination testing. This work presents a semi-automated database curation workflow that supports the mining of scientific literature and enables the reconstruction of recently documented antimicrobial combinations. Currently, the database contains data on antimicrobial combinations that have been experimentally tested against Pseudomonas aeruginosa, Staphylococcus aureus, Escherichia coli, Listeria monocytogenes and Candida albicans, which are prominent pathogenic organisms and are well known for their wide and growing resistance to conventional antimicrobials. Researchers are able to explore the experimental results for a single organism or across organisms. Likewise, researchers may look into indirect network associations and identify new potential combinations to be tested. The database is available without charge. Database URL: http://sing.ei.uvigo.es/antimicrobialCombination/
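
    A toy sketch of the kind of support a semi-automated curation step can provide: flag sentences that co-mention two known agents as candidate combination evidence for a human curator to confirm or reject (the agent list and sentence are placeholders):

        import re

        AGENTS = {"colistin", "vancomycin", "nisin", "ciprofloxacin"}

        def candidate_pairs(sentence):
            found = {a for a in AGENTS if re.search(rf"\b{a}\b", sentence, re.I)}
            # a human curator reviews each suggested pair before database entry
            return {(x, y) for x in found for y in found if x < y}

        print(candidate_pairs("Nisin combined with colistin reduced biofilms."))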

  8. Construction of antimicrobial peptide-drug combination networks from scientific literature based on a semi-automated curation workflow

    PubMed Central

    Jorge, Paula; Pérez-Pérez, Martín; Pérez Rodríguez, Gael; Fdez-Riverola, Florentino; Pereira, Maria Olívia; Lourenço, Anália

    2016-01-01

    Considerable research efforts are being invested in the development of novel antimicrobial therapies effective against the growing number of multi-drug-resistant pathogens. Notably, the combination of different agents is increasingly explored as a means to exploit and improve individual agent actions while minimizing microorganism resistance. Although there are several databases on antimicrobial agents, the scientific literature is the primary source of information on experimental antimicrobial combination testing. This work presents a semi-automated database curation workflow that supports the mining of scientific literature and enables the reconstruction of recently documented antimicrobial combinations. Currently, the database contains data on antimicrobial combinations that have been experimentally tested against Pseudomonas aeruginosa, Staphylococcus aureus, Escherichia coli, Listeria monocytogenes and Candida albicans, which are prominent pathogenic organisms and are well known for their wide and growing resistance to conventional antimicrobials. Researchers are able to explore the experimental results for a single organism or across organisms. Likewise, researchers may look into indirect network associations and identify new potential combinations to be tested. The database is available without charge. Database URL: http://sing.ei.uvigo.es/antimicrobialCombination/ PMID:28025336

  9. Jupyter meets Earth: Creating Comprehensible and Reproducible Scientific Workflows with Jupyter Notebooks and Google Earth Engine

    NASA Astrophysics Data System (ADS)

    Erickson, T.

    2016-12-01

    Deriving actionable information from Earth observation data obtained from sensors or models can be quite complicated, and sharing those insights with others in a form that they can understand, reproduce, and improve upon is equally difficult. Journal articles, even if digital, commonly present just a summary of an analysis that cannot be understood in depth or reproduced without major effort on the part of the reader. Here we show a method of improving scientific literacy by pairing a recently developed scientific presentation technology (Jupyter Notebooks) with a petabyte-scale platform for accessing and analyzing Earth observation and model data (Google Earth Engine). Jupyter Notebooks are interactive web documents that mix live code with annotations such as rich-text markup, equations, images, videos, hyperlinks and dynamic output. Notebooks were first introduced as part of the IPython project in 2011, and have since gained wide acceptance in the scientific programming community, initially among Python programmers but later across a wide range of scientific programming languages. While Jupyter Notebooks have been widely adopted for general data analysis, data visualization, and machine learning, to date there have been relatively few examples of using Jupyter Notebooks to analyze geospatial datasets. Google Earth Engine is a cloud-based platform for analyzing geospatial data, such as satellite remote sensing imagery and/or Earth system model output. Through its Python API, Earth Engine makes petabytes of Earth observation data accessible, and provides hundreds of algorithmic building blocks that can be chained together to produce high-level algorithms and outputs in real time. We anticipate that this technology pairing will facilitate a better way of creating, documenting, and sharing complex analyses that derive information on our Earth that can be used to promote broader understanding of the complex issues that it faces. http://jupyter.org https://earthengine.google.com
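
    A notebook cell in this pairing might look like the following minimal Earth Engine Python API example, which assumes the earthengine-api package and an authenticated Earth Engine account; the dataset, location and dates are arbitrary:

        import ee

        ee.Initialize()  # assumes prior ee.Authenticate() / stored credentials
        point = ee.Geometry.Point(-122.26, 37.87)
        scenes = (ee.ImageCollection("LANDSAT/LC08/C02/T1_L2")
                  .filterBounds(point)
                  .filterDate("2016-01-01", "2016-12-31"))
        print("Scenes found:", scenes.size().getInfo())  # count computed server-side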

  10. The application of cloud computing to scientific workflows: a study of cost and performance.

    PubMed

    Berriman, G Bruce; Deelman, Ewa; Juve, Gideon; Rynge, Mats; Vöckler, Jens-S

    2013-01-28

    The current model of transferring data from data centres to desktops for analysis will soon be rendered impractical by the accelerating growth in the volume of science datasets. Processing will instead often take place on high-performance servers co-located with data. Evaluations of how new technologies such as cloud computing would support such a new distributed computing model are urgently needed. Cloud computing is a new way of purchasing computing and storage resources on demand through virtualization technologies. We report here the results of investigations of the applicability of commercial cloud computing to scientific computing, with an emphasis on astronomy, including investigations of what types of applications can be run cheaply and efficiently on the cloud, and an example of an application well suited to the cloud: processing a large dataset to create a new science product.

  11. A web accessible scientific workflow system for transparent and reproducible generation of information on subsurface processes from autonomously sensed data

    NASA Astrophysics Data System (ADS)

    Versteeg, R.; Richardson, A.; Thomas, S.; Lu, B.; Neto, J.; Wheeler, M.; Rowe, T.; Parashar, M.; Ankeny, M.

    2005-12-01

    Information on subsurface processes is required for a broad range of applications, including site remediation, groundwater management, fossil fuel production and CO2 sequestration. Data on these processes are obtained from diverse sensor networks, which include physical, hydrological and chemical sensors as well as semi-permanent geophysical sensors (mainly seismic and resistivity). Currently, processing is done by specialists through the use of commercial and research software packages such as numerical inverse and forward models, statistical data analysis software, and visualization and data presentation packages. Information is presented to stakeholders as tables, images and reports. The processing steps, data and assumptions used for information generation are mostly opaque to end users. As data migrate between applications, the steps taken in each application (e.g. in data reduction) are often only partly documented, resulting in irreproducible results. Under this approach, interactively tuning data processing in a systematic way (e.g. changing model parameters, visualization parameters or the data used) or using data processing as a discovery tool is de facto impossible. We implemented a web-accessible scientific workflow system for subsurface performance monitoring. This system integrates distributed, automated data acquisition from autonomous sensor networks with server-side data management and information visualization through flexible browser-based data access tools. Web services are used for communication with the sensor networks and interaction with applications. This system was originally developed for a monitoring network at the Gilt Edge Mine Superfund site, but has now been implemented for a range of sensor networks of differing complexity. The workflow framework allows for rapid and easy integration, in a modular, transparent and reproducible manner, of a multitude of existing applications for data analysis and processing. By embedding applications in web service

  12. Scientific workflow and support for high resolution global climate modeling at the Oak Ridge Leadership Computing Facility

    NASA Astrophysics Data System (ADS)

    Anantharaj, V.; Mayer, B.; Wang, F.; Hack, J.; McKenna, D.; Hartman-Baker, R.

    2012-04-01

    The Oak Ridge Leadership Computing Facility (OLCF) facilitates the execution of computational experiments that require tens of millions of CPU hours (typically using thousands of processors simultaneously) while generating hundreds of terabytes of data. A set of ultra-high-resolution climate experiments in progress, using the Community Earth System Model (CESM), will produce over 35,000 files, ranging in size from 21 MB to 110 GB each. The execution of the experiments will require nearly 70 million CPU hours on the Jaguar and Titan supercomputers at OLCF. The total volume of the output from these climate modeling experiments will be in excess of 300 TB. This model output must then be archived, analyzed, distributed to the project partners in a timely manner, and also made available more broadly. Meeting this challenge would require efficient movement of the data, staging the simulation output to a large and fast file system that provides high-volume access to other computational systems used to analyze the data and synthesize results. This file system also needs to be accessible via high-speed networks to an archival system that can provide long-term reliable storage. Ideally this archival system is itself directly available to other systems that can be used to host services making the data and analysis available to the participants in the distributed research project and to the broader climate community. The various resources available at the OLCF now support this workflow. The available systems include the new Jaguar Cray XK6 2.63-petaflops (estimated) supercomputer, the 10 PB Spider center-wide parallel file system, the Lens/EVEREST analysis and visualization system, the HPSS archival storage system, the Earth System Grid (ESG), and the ORNL Climate Data Server (CDS). The ESG features federated services, search & discovery, extensive data handling capabilities, deep storage access, and Live Access Server (LAS) integration. The scientific workflow enabled on

  13. Agile parallel bioinformatics workflow management using Pwrake.

    PubMed

    Mishima, Hiroyuki; Sasaki, Kensaku; Tanaka, Masahiro; Tatebe, Osamu; Yoshiura, Koh-Ichiro

    2011-09-08

    In bioinformatics projects, scientific workflow systems are widely used to manage computational procedures. Full-featured workflow systems have been proposed to fulfil the demand for workflow management. However, such systems tend to be over-weighted for actual bioinformatics practices. We realize that quick deployment of cutting-edge software implementing advanced algorithms and data formats, and continuous adaptation to changes in computational resources and the environment are often prioritized in scientific workflow management. These features have a greater affinity with the agile software development method through iterative development phases after trial and error. Here, we show the application of a scientific workflow system Pwrake to bioinformatics workflows. Pwrake is a parallel workflow extension of Ruby's standard build tool Rake, the flexibility of which has been demonstrated in the astronomy domain. Therefore, we hypothesize that Pwrake also has advantages in actual bioinformatics workflows. We implemented the Pwrake workflows to process next generation sequencing data using the Genomic Analysis Toolkit (GATK) and Dindel. GATK and Dindel workflows are typical examples of sequential and parallel workflows, respectively. We found that in practice, actual scientific workflow development iterates over two phases, the workflow definition phase and the parameter adjustment phase. We introduced separate workflow definitions to help focus on each of the two developmental phases, as well as helper methods to simplify the descriptions. This approach increased iterative development efficiency. Moreover, we implemented combined workflows to demonstrate modularity of the GATK and Dindel workflows. Pwrake enables agile management of scientific workflows in the bioinformatics domain. The internal domain specific language design built on Ruby gives the flexibility of rakefiles for writing scientific workflows. Furthermore, readability and maintainability of rakefiles

  14. Agile parallel bioinformatics workflow management using Pwrake

    PubMed Central

    2011-01-01

    Background: In bioinformatics projects, scientific workflow systems are widely used to manage computational procedures. Full-featured workflow systems have been proposed to fulfil the demand for workflow management. However, such systems tend to be over-weighted for actual bioinformatics practices. We realize that quick deployment of cutting-edge software implementing advanced algorithms and data formats, and continuous adaptation to changes in computational resources and the environment are often prioritized in scientific workflow management. These features have a greater affinity with the agile software development method through iterative development phases after trial and error. Here, we show the application of a scientific workflow system Pwrake to bioinformatics workflows. Pwrake is a parallel workflow extension of Ruby's standard build tool Rake, the flexibility of which has been demonstrated in the astronomy domain. Therefore, we hypothesize that Pwrake also has advantages in actual bioinformatics workflows. Findings: We implemented the Pwrake workflows to process next generation sequencing data using the Genomic Analysis Toolkit (GATK) and Dindel. GATK and Dindel workflows are typical examples of sequential and parallel workflows, respectively. We found that in practice, actual scientific workflow development iterates over two phases, the workflow definition phase and the parameter adjustment phase. We introduced separate workflow definitions to help focus on each of the two developmental phases, as well as helper methods to simplify the descriptions. This approach increased iterative development efficiency. Moreover, we implemented combined workflows to demonstrate modularity of the GATK and Dindel workflows. Conclusions: Pwrake enables agile management of scientific workflows in the bioinformatics domain. The internal domain specific language design built on Ruby gives the flexibility of rakefiles for writing scientific workflows. Furthermore, readability

  15. A Scientific Workflow Used as a Computational Tool to Assess the Response of the Californian San Joaquin River to Flow Restoration Efforts

    NASA Astrophysics Data System (ADS)

    Villamizar, S. R.; Gil, Y.; Szekely, P.; Ratnakar, V.; Gupta, S.; Muslea, M.; Silva, F.; Harmon, T.

    2011-12-01

    The San Joaquin River (SJR) restoration effort began in October 2009 with the onset of federally mandated continuous flow. A key objective of the effort is to restore and maintain fish populations in the main stem of the San Joaquin River, from below the Friant Dam to the confluence of the Merced River. In addition to the renewed flows, the restoration effort has brought about several upgraded and new water quality monitoring stations equipped with dissolved oxygen (DO) and temperature sensors. As the SJR response to the restoration efforts will be dictated by a complex combination of hydrodynamic and biogeochemical processes, we propose monitoring whole-stream metabolism as an integrative ecological indicator. Here, we develop and test a near-real-time scientific workflow to facilitate the observation of the spatio-temporal distribution of whole-stream metabolism estimates using available monitoring station flow and water quality data. The scientific objective is to identify correlations between whole-stream metabolism estimates and the seasonally variable flow and flow disturbances (e.g., flood-control releases), which are the primary driver of stream ecosystems. Accomplishing this requires overcoming technical challenges in both data collection and data analysis, because (1) the information required for this multi-site, long-term study originates from different sources, implying different associated properties (data integrity, sampling intervals, units), and (2) the variability of the interim flows requires adaptive model selection within the framework of the metabolism calculations. These challenges are addressed by using a scientific workflow in which semantic metadata is generated as the data is prepared and then subsequently used to select and configure models, effectively customizing them to the current data. Data preparation involves the extraction, cleaning, normalization and integration of the data coming from sensors and third
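
    The metadata-driven model selection could look roughly like the sketch below; the metadata fields, rules and model names are invented for illustration:

        def select_model(meta):
            # invented rule: pulsed interim flows call for a different estimator
            if meta["flow_regime"] == "pulse":
                return {"model": "two_station", "window_h": 6}
            return {"model": "one_station", "window_h": 24}

        print(select_model({"flow_regime": "pulse", "sampling_interval_min": 15}))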

  16. Toward a robust workflow for deep crustal imaging by FWI of OBS data: The eastern Nankai Trough revisited

    NASA Astrophysics Data System (ADS)

    Górszczyk, Andrzej; Operto, Stéphane; Malinowski, Michał

    2017-06-01

    Crustal-scale imaging by the full-waveform inversion (FWI) of long-offset seismic data is inherently difficult because the large number of wavelengths propagating through the crust makes the inversion prone to cycle skipping. Therefore, efficient crustal-scale FWI requires an accurate starting model and a stable workflow minimizing the nonlinearity of the inversion. Here we attempt to reprocess a challenging 2-D ocean-bottom seismometer (OBS) data set from the eastern Nankai Trough. The starting model is built by first-arrival traveltime tomography (FAT), which is FWI-assisted to track cycle skipping. We iteratively refine the picked traveltimes and then reiterate the FAT until the traveltime residuals remain below the cycle-skipping limit. Subsequently, we apply Laplace-Fourier FWI, in which progressive relaxation of time damping is nested within frequency continuation to hierarchically inject more data into the inversion. These two multiscale levels are complemented by a layer-stripping approach implemented through offset continuation. The reliability of the FWI velocity model is assessed by means of source wavelet estimation, synthetic seismogram modeling, ray tracing modeling, dynamic warping, and checkerboard tests. Although the viscoacoustic approximation is used for wave modeling, the synthetic seismograms reproduce most of the complexity of the data with a high traveltime accuracy. The revised FWI scheme produces a high-resolution velocity model of the entire crust that can be jointly interpreted with migrated images derived from multichannel seismic data. This study opens a new perspective on the design of OBS crustal-scale experiments amenable to FWI; however, a further assessment of the optimal OBS spacing is required for reliable FWI.
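
    Schematically, the nested continuation strategy can be written as three loops; select_offsets() and fwi_update() are placeholders for the offset-windowing step and one Laplace-Fourier FWI iteration, and the numerical schedules are invented:

        def select_offsets(records, max_offset_km):
            # offset continuation: keep only traces within the current window
            return [r for r in records if r["offset_km"] <= max_offset_km]

        def fwi_update(model, records, freq_hz, damping_s):
            return model  # placeholder for one Laplace-Fourier FWI iteration

        def multiscale_fwi(model, records):
            for freq_hz in (1.5, 2.5, 3.5, 5.0):        # frequency continuation
                for damping_s in (1.0, 2.0, 4.0):       # progressively relax damping
                    for max_off in (30.0, 60.0, 100.0): # layer stripping via offsets
                        subset = select_offsets(records, max_off)
                        model = fwi_update(model, subset, freq_hz, damping_s)
            return model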

  17. Workflow for the integration of a realistic 3D geomodel in process simulations using different cell types and advanced scientific visualization: Variations on a synthetic salt diapir

    NASA Astrophysics Data System (ADS)

    Görz, Ines; Herbst, Martin; Börner, Jana H.; Zehner, Björn

    2017-03-01

    The purpose of this study is to use one complex geological 3D model for numerical simulations of various physical processes in process-specific simulation software. To do this, the 3D model has to be discretized according to different cell types, depending on the requirements of the simulation method. We used a salt structure with a diapir and its deformed host rock to produce two 3D models describing the boundary surfaces of the structure: one very simplified model consisting of cuboid surfaces and a realistic model consisting of irregular boundary surfaces. We provide a workflow for how to generate hexahedral, tetrahedral and spherical volume representations of these two geometries. We utilized the volume representations to simulate temperature, displacement and transient electromagnetic fields. We can show that the simulation results closely reflect the input geometry and that it is worth the effort to produce geometric models that are as realistic as possible. Additionally, we provide a workflow for simultaneous visualization and analysis of the simulation results. Scientific visualization is an important tool for deriving knowledge from complex investigations.

  18. The pipeline system for Octave and Matlab (PSOM): a lightweight scripting framework and execution engine for scientific workflows

    PubMed Central

    Bellec, Pierre; Lavoie-Courchesne, Sébastien; Dickinson, Phil; Lerch, Jason P.; Zijdenbos, Alex P.; Evans, Alan C.

    2012-01-01

    The analysis of neuroimaging databases typically involves a large number of inter-connected steps called a pipeline. The pipeline system for Octave and Matlab (PSOM) is a flexible framework for the implementation of pipelines in the form of Octave or Matlab scripts. PSOM does not introduce new language constructs to specify the steps and structure of the workflow. All steps of analysis are instead described by a regular Matlab data structure, documenting their associated command and options, as well as their input, output, and cleaned-up files. The PSOM execution engine provides a number of automated services: (1) it executes jobs in parallel on a local computing facility as long as the dependencies between jobs allow for it and sufficient resources are available; (2) it generates a comprehensive record of the pipeline stages and the history of execution, which is detailed enough to fully reproduce the analysis; (3) if an analysis is started multiple times, it executes only the parts of the pipeline that need to be reprocessed. PSOM is distributed under an open-source MIT license and can be used without restriction for academic or commercial projects. The package has no external dependencies besides Matlab or Octave, is straightforward to install and supports a variety of operating systems (Linux, Windows, Mac). We ran several benchmark experiments on a public database including 200 subjects, using a pipeline for the preprocessing of functional magnetic resonance images (fMRI). The benchmark results showed that PSOM is a powerful solution for the analysis of large databases using local or distributed computing resources. PMID:22493575
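
    A rough Python analogue of the PSOM job description (PSOM itself uses Matlab/Octave structs): each job names its command and files, and the engine runs whatever jobs have their inputs available:

        # each job is a plain data structure naming its command and files
        pipeline = {
            "motion_correct": {"command": "mc raw.nii mc.nii",
                               "files_in": ["raw.nii"], "files_out": ["mc.nii"]},
            "smooth":         {"command": "smooth mc.nii sm.nii",
                               "files_in": ["mc.nii"], "files_out": ["sm.nii"]},
        }

        available, done = {"raw.nii"}, set()
        while len(done) < len(pipeline):        # assumes dependencies are satisfiable
            for name, job in pipeline.items():
                if name in done or not set(job["files_in"]) <= available:
                    continue
                print("running:", job["command"])   # a real engine would execute it
                available |= set(job["files_out"])  # outputs unlock dependent jobs
                done.add(name)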

  19. The pipeline system for Octave and Matlab (PSOM): a lightweight scripting framework and execution engine for scientific workflows.

    PubMed

    Bellec, Pierre; Lavoie-Courchesne, Sébastien; Dickinson, Phil; Lerch, Jason P; Zijdenbos, Alex P; Evans, Alan C

    2012-01-01

    The analysis of neuroimaging databases typically involves a large number of inter-connected steps called a pipeline. The pipeline system for Octave and Matlab (PSOM) is a flexible framework for the implementation of pipelines in the form of Octave or Matlab scripts. PSOM does not introduce new language constructs to specify the steps and structure of the workflow. All steps of analysis are instead described by a regular Matlab data structure, documenting their associated command and options, as well as their input, output, and cleaned-up files. The PSOM execution engine provides a number of automated services: (1) it executes jobs in parallel on a local computing facility as long as the dependencies between jobs allow for it and sufficient resources are available; (2) it generates a comprehensive record of the pipeline stages and the history of execution, which is detailed enough to fully reproduce the analysis; (3) if an analysis is started multiple times, it executes only the parts of the pipeline that need to be reprocessed. PSOM is distributed under an open-source MIT license and can be used without restriction for academic or commercial projects. The package has no external dependencies besides Matlab or Octave, is straightforward to install and supports a variety of operating systems (Linux, Windows, Mac). We ran several benchmark experiments on a public database including 200 subjects, using a pipeline for the preprocessing of functional magnetic resonance images (fMRI). The benchmark results showed that PSOM is a powerful solution for the analysis of large databases using local or distributed computing resources.

  20. A Classroom-Based Distributed Workflow Initiative for the Early Involvement of Undergraduate Students in Scientific Research

    ERIC Educational Resources Information Center

    Friedrich, Jon M.

    2014-01-01

    Engaging freshman and sophomore students in meaningful scientific research is challenging because of their developing skill set and their necessary time commitments to regular classwork. A project called the Chondrule Analysis Project was initiated to engage first- and second-year students in an initial research experience and also accomplish…

  2. Robustness

    NASA Technical Reports Server (NTRS)

    Ryan, R.

    1993-01-01

    Robustness is a buzzword common to all newly proposed space system designs as well as many new commercial products. The image that one conjures up when the word appears is a 'Paul Bunyan' (lumberjack) design: strong and hearty, healthy with margins in all aspects of the design. In actuality, robustness is much broader in scope than margins, including such factors as simplicity, redundancy, desensitization to parameter variations, control of parameter variations (environment fluctuations), and operational approaches. These must be traded with concepts, materials, and fabrication approaches against the criteria of performance, cost, and reliability. This includes manufacturing, assembly, processing, checkout, and operations. The design engineer or project chief is faced with finding ways and means to inculcate robustness into an operational design; first, however, he or she must understand the definition and goals of robustness. This paper deals with these issues as well as the requirement for robustness.

  3. Scientist-Centered Workflow Abstractions via Generic Actors, Workflow Templates, and Context-Awareness for Groundwater Modeling and Analysis

    SciTech Connect

    Chin, George; Sivaramakrishnan, Chandrika; Critchlow, Terence J.; Schuchardt, Karen L.; Ngu, Anne Hee Hiong

    2011-07-04

    A drawback of existing scientific workflow systems is the lack of support for domain scientists in designing and executing their own scientific workflows. Many domain scientists avoid developing and using workflows because the basic objects of workflows are too low-level and because high-level tools and mechanisms to aid in workflow construction and use are largely unavailable. In our research, we are prototyping higher-level abstractions and tools to better support scientists in their workflow activities. Specifically, we are developing generic actors that provide abstract interfaces to specific functionality, workflow templates that encapsulate workflow and data patterns that can be reused and adapted by scientists, and context-awareness mechanisms to gather contextual information from the workflow environment on behalf of the scientist. To evaluate these scientist-centered abstractions on real problems, we apply them to construct and execute scientific workflows in the specific domain area of groundwater modeling and analysis.
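
    The two abstractions can be sketched as follows; the class and function names are invented for illustration and are not the prototype's API:

        from abc import ABC, abstractmethod

        class DataFetcher(ABC):                    # generic actor: abstract interface
            @abstractmethod
            def fetch(self, site_id: str) -> list: ...

        class WellSensorFetcher(DataFetcher):      # one concrete specialization
            def fetch(self, site_id):
                return [12.1, 11.8, 12.4]          # stand-in for a real sensor query

        def mean_level_template(fetcher: DataFetcher, site_id: str) -> float:
            # workflow template: the fetcher "hole" is bound when the template is used
            levels = fetcher.fetch(site_id)
            return sum(levels) / len(levels)

        print(mean_level_template(WellSensorFetcher(), "well-42"))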

  4. Multi-objective approach for energy-aware workflow scheduling in cloud computing environments.

    PubMed

    Yassa, Sonia; Chelouah, Rachid; Kadima, Hubert; Granado, Bertrand

    2013-01-01

    We address the problem of scheduling workflow applications on heterogeneous computing systems like cloud computing infrastructures. In general, cloud workflow scheduling is a complex optimization problem which requires considering different criteria so as to meet a large number of QoS (Quality of Service) requirements. Traditional research in workflow scheduling mainly focuses on optimization constrained by time or cost, without paying attention to energy consumption. The main contribution of this study is to propose a new approach for multi-objective workflow scheduling in clouds, and to present a hybrid PSO algorithm to optimize the scheduling performance. Our method is based on the Dynamic Voltage and Frequency Scaling (DVFS) technique to minimize energy consumption. This technique allows processors to operate at different voltage supply levels by sacrificing clock frequency; this voltage scaling involves a compromise between the quality of schedules and energy consumption. Simulation results on synthetic and real-world scientific applications highlight the robust performance of the proposed approach.
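
    The DVFS trade-off at the heart of the approach can be illustrated with the standard dynamic-energy model E = C * V^2 * f * t: lowering the voltage/frequency pair cuts energy quadratically in V while lengthening execution. The constants below are arbitrary:

        C = 1e-8                                 # effective capacitance (arbitrary)
        LEVELS = [(1.0, 2.0e9), (0.8, 1.4e9)]    # invented (volts, Hz) supply levels

        def evaluate(cycles, volts, hz):
            t = cycles / hz                      # execution time in seconds
            energy = C * volts**2 * hz * t       # dynamic energy, E = C*V^2*f*t
            return t, energy

        for volts, hz in LEVELS:
            t, e = evaluate(4e9, volts, hz)
            print(f"{volts:.1f} V @ {hz/1e9:.1f} GHz: {t:.2f} s, {e:.1f} J")

    A multi-objective fitness then weighs the longer makespan against the energy saving, which is exactly the compromise the hybrid PSO explores.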

  5. Creating Bioinformatic Workflows within the BioExtract Server

    USDA-ARS?s Scientific Manuscript database

    Computational workflows in bioinformatics are becoming increasingly important in the achievement of scientific advances. These workflows generally require access to multiple, distributed data sources and analytic tools. The requisite data sources may include large public data repositories, community...

  6. Multi-level meta-workflows: new concept for regularly occurring tasks in quantum chemistry.

    PubMed

    Arshad, Junaid; Hoffmann, Alexander; Gesing, Sandra; Grunzke, Richard; Krüger, Jens; Kiss, Tamas; Herres-Pawlis, Sonja; Terstyanszky, Gabor

    2016-01-01

    In Quantum Chemistry, many tasks reoccur frequently, e.g. geometry optimizations, benchmarking series, etc. Here, workflows can help to reduce the time spent on manual job definition and output extraction. These workflows are executed on computing infrastructures and may require large computing and data resources. Scientific workflows hide these infrastructures and the resources needed to run them. It requires significant effort and specific expertise to design, implement and test these workflows. Many of these workflows are complex, monolithic entities that can be used for particular scientific experiments; hence, their modification is not straightforward, and sharing them is almost impossible. To address these issues, we propose developing atomic workflows and embedding them in meta-workflows. Atomic workflows deliver a well-defined, research-domain-specific function. Publishing workflows in repositories enables workflow sharing inside and/or among scientific communities. We formally specify atomic and meta-workflows in order to define data structures to be used in repositories for uploading and sharing them. Additionally, we present a formal description focused on the orchestration of atomic workflows into meta-workflows. We investigated the operations that represent basic functionalities in Quantum Chemistry, developed the relevant atomic workflows and combined them into meta-workflows. Having these workflows, we defined the structure of the Quantum Chemistry workflow library and uploaded these workflows to the SHIWA Workflow Repository. Graphical Abstract: meta-workflows and embedded workflows in the template representation.
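
    The composition idea can be sketched as follows; the atomic workflows here are trivial stand-ins for real quantum chemistry jobs, and all names are invented:

        def optimize_geometry(mol):          # atomic workflow (trivial stand-in)
            return f"optimized({mol})"

        def single_point_energy(structure):  # atomic workflow (trivial stand-in)
            return -76.4                     # invented energy value

        def benchmark_series(molecules):     # meta-workflow orchestrating the atoms
            return {m: single_point_energy(optimize_geometry(m)) for m in molecules}

        print(benchmark_series(["H2O", "NH3"]))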

  7. Semantic Workflows and Provenance

    NASA Astrophysics Data System (ADS)

    Gil, Y.

    2011-12-01

    While sharing and disseminating data is widely practiced across scientific communities, we have yet to recognize the importance of sharing and disseminating the analytic processes that lead to published data. Data retrieved from shared repositories and archives is often hard to interpret because we lack documentation about those processes: what models were used, what assumptions were made, what calibrations were carried out, etc. This process documentation is also key to aggregating data in a meaningful way, whether aggregating shared third-party data or aggregating shared data with local sensor data collected by individual investigators. We suggest that augmenting published data with process documentation would greatly enhance our ability to find, reuse, interpret, and aggregate data, and would therefore have a significant impact on the utility of data repositories and archives. We will show that semantic workflows and provenance provide key technologies for capturing process documentation. Semantic workflows describe the kinds of data transformation and analysis steps used to create new data products, and can include useful constraints about why specific models were selected or parameters chosen. Provenance records can be used to publish workflow descriptions in standard formats that can be reused to enable verification and reproducibility of data products.

  8. Implementing bioinformatic workflows within the bioextract server

    USDA-ARS?s Scientific Manuscript database

    Computational workflows in bioinformatics are becoming increasingly important in the achievement of scientific advances. These workflows typically require the integrated use of multiple, distributed data sources and analytic tools. The BioExtract Server (http://bioextract.org) is a distributed servi...

  9. VO-compliant workflows and science gateways

    NASA Astrophysics Data System (ADS)

    Castelli, G.; Taffoni, G.; Sciacca, E.; Becciani, U.; Costa, A.; Krokos, M.; Pasian, F.; Vuerli, C.

    2015-06-01

    Workflow and science gateway technologies have been adopted by scientific communities as valuable tools to carry out complex experiments. They offer the possibility of performing computations for data analysis and simulations while hiding the details of the complex infrastructures underneath. There are many workflow management systems covering a large variety of generic services coordinating the execution of workflows. In this paper we describe our experiences in creating workflow-oriented science gateways based on gUSE/WS-PGRADE technology, and in particular we discuss the efforts devoted to developing a VO-compliant web environment.

  10. A framework for streamlining research workflow in neuroscience and psychology

    PubMed Central

    Kubilius, Jonas

    2014-01-01

    Successful accumulation of knowledge is critically dependent on the ability to verify and replicate every part of scientific conduct. However, such principles are difficult to enact when researchers continue to rely on ad hoc workflows and poorly maintained code bases. In this paper I examine the needs of the neuroscience and psychology community and introduce psychopy_ext, a unifying framework that seamlessly integrates popular experiment building, analysis and manuscript preparation tools by choosing reasonable defaults and implementing relatively rigid patterns of workflow. This structure allows for the automation of multiple tasks, such as generating user interfaces, unit testing, control analyses of stimuli, single-command access to descriptive statistics, and publication-quality plotting. Taken together, psychopy_ext opens an exciting possibility for faster, more robust code development and collaboration for researchers. PMID:24478691

  11. A framework for streamlining research workflow in neuroscience and psychology.

    PubMed

    Kubilius, Jonas

    2013-01-01

    Successful accumulation of knowledge is critically dependent on the ability to verify and replicate every part of scientific conduct. However, such principles are difficult to enact when researchers continue to rely on ad hoc workflows and poorly maintained code bases. In this paper I examine the needs of the neuroscience and psychology community and introduce psychopy_ext, a unifying framework that seamlessly integrates popular experiment building, analysis and manuscript preparation tools by choosing reasonable defaults and implementing relatively rigid patterns of workflow. This structure allows for the automation of multiple tasks, such as generating user interfaces, unit testing, control analyses of stimuli, single-command access to descriptive statistics, and publication-quality plotting. Taken together, psychopy_ext opens an exciting possibility for faster, more robust code development and collaboration for researchers.

  12. Lattice QCD workflows

    SciTech Connect

    Piccoli, Luciano; Kowalkowski, James B.; Simone, James N.; Sun, Xian-He; Jin, Hui; Holmgren, Donald J.; Seenu, Nirmal; Singh, Amitoj G.

    2008-12-01

    This paper discusses the application of existing workflow management systems to a real-world science application (LQCD). Typical workflows and the execution environment used in production are described. Requirements for the LQCD production system are discussed. The workflow management systems Askalon and Swift were tested by implementing the LQCD workflows and evaluated against the requirements. We report our findings and future work.

  13. SHIWA Services for Workflow Creation and Sharing in Hydrometeorology

    NASA Astrophysics Data System (ADS)

    Terstyanszky, Gabor; Kiss, Tamas; Kacsuk, Peter; Sipos, Gergely

    2014-05-01

    Researchers want to run scientific experiments on Distributed Computing Infrastructures (DCI) to access large pools of resources and services. Running these experiments requires specific expertise that they may not have. Workflows can hide resources and services behind a virtualisation layer, providing a user interface that researchers can use. There are many scientific workflow systems, but they are not interoperable. Learning a workflow system and creating workflows may require significant effort. Considering this effort, it is not reasonable to expect that researchers will learn new workflow systems if they want to run workflows developed in other workflow systems. Overcoming this requires workflow interoperability solutions that allow workflow sharing. The FP7 'Sharing Interoperable Workflow for Large-Scale Scientific Simulation on Available DCIs' (SHIWA) project developed the Coarse-Grained Interoperability (CGI) concept. It enables recycling and sharing workflows of different workflow systems and executing them on different DCIs. SHIWA developed the SHIWA Simulation Platform (SSP) to implement the CGI concept, integrating three major components: the SHIWA Science Gateway, the workflow engines supported by the CGI concept, and the DCI resources where workflows are executed. The science gateway contains a portal, a submission service, a workflow repository and a proxy server to support the whole workflow life-cycle. The SHIWA Portal allows workflow creation, configuration, execution and monitoring through a Graphical User Interface, using the WS-PGRADE workflow system as the host workflow system. The SHIWA Repository stores the formal description of workflows and workflow engines plus the executables and data needed to execute them. It offers a wide range of browse and search operations. To support non-native workflow execution, the SHIWA Submission Service imports the workflow and workflow engine from the SHIWA Repository. This service either invokes locally or remotely

  14. RABIX: AN OPEN-SOURCE WORKFLOW EXECUTOR SUPPORTING RECOMPUTABILITY AND INTEROPERABILITY OF WORKFLOW DESCRIPTIONS

    PubMed Central

    Ivkovic, Sinisa; Simonovic, Janko; Tijanic, Nebojsa; Davis-Dusenbery, Brandi; Kural, Deniz

    2016-01-01

    As biomedical data has become increasingly easy to generate in large quantities, the methods used to analyze it have proliferated rapidly. Reproducible and reusable methods are required to learn from large volumes of data reliably. To address this issue, numerous groups have developed workflow specifications or execution engines, which provide a framework with which to perform a sequence of analyses. One such specification is the Common Workflow Language, an emerging standard which provides a robust and flexible framework for describing data analysis tools and workflows. In addition, reproducibility can be furthered by executors or workflow engines which interpret the specification and enable additional features, such as error logging, file organization, optimizations to computation and job scheduling, and allow for easy computing on large volumes of data. To this end, we have developed the Rabix Executor, an open-source workflow engine for the purposes of improving reproducibility through reusability and interoperability of workflow descriptions. PMID:27896971

  15. RABIX: AN OPEN-SOURCE WORKFLOW EXECUTOR SUPPORTING RECOMPUTABILITY AND INTEROPERABILITY OF WORKFLOW DESCRIPTIONS.

    PubMed

    Kaushik, Gaurav; Ivkovic, Sinisa; Simonovic, Janko; Tijanic, Nebojsa; Davis-Dusenbery, Brandi; Kural, Deniz

    2016-01-01

    As biomedical data has become increasingly easy to generate in large quantities, the methods used to analyze it have proliferated rapidly. Reproducible and reusable methods are required to learn from large volumes of data reliably. To address this issue, numerous groups have developed workflow specifications or execution engines, which provide a framework with which to perform a sequence of analyses. One such specification is the Common Workflow Language, an emerging standard which provides a robust and flexible framework for describing data analysis tools and workflows. In addition, reproducibility can be furthered by executors or workflow engines which interpret the specification and enable additional features, such as error logging, file organization, optimizations to computation and job scheduling, and allow for easy computing on large volumes of data. To this end, we have developed the Rabix Executor, an open-source workflow engine for the purposes of improving reproducibility through reusability and interoperability of workflow descriptions.
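
    Since both records above center on the Common Workflow Language, a minimal example helps make the specification concrete. The sketch below writes a CWL tool description as JSON (CWL accepts JSON as well as YAML) and runs it with the reference runner cwltool; a CWL-aware executor such as Rabix consumes the same portable description, so the runner choice is interchangeable. File names and the input file are illustrative.

    ```python
    # Write a minimal CWL CommandLineTool plus a job file, then execute it
    # with cwltool (assumed to be installed). Any CWL executor can run the
    # same description, which is the interoperability point of the standard.
    import json
    import subprocess

    tool = {
        "cwlVersion": "v1.0",
        "class": "CommandLineTool",
        "baseCommand": "wc",          # count lines/words/bytes of the input
        "inputs": {
            "infile": {"type": "File", "inputBinding": {"position": 1}}
        },
        "stdout": "counts.txt",
        "outputs": {"counts": {"type": "stdout"}},
    }
    job = {"infile": {"class": "File", "path": "README.md"}}  # example input

    with open("wc_tool.cwl", "w") as f:
        json.dump(tool, f)
    with open("job.json", "w") as f:
        json.dump(job, f)

    subprocess.run(["cwltool", "wc_tool.cwl", "job.json"], check=True)
    ```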

  16. CROSS: an efficient workflow for reaction-driven rescaffolding and side-chain optimization using robust chemical reactions and available reagents.

    PubMed

    Evers, Andreas; Hessler, Gerhard; Wang, Li-hsing; Werrel, Simon; Monecke, Peter; Matter, Hans

    2013-06-13

    A novel procedure (CROSS: Computational Rescaffolding and Optimization using Synthetic Schemes) for in silico rescaffolding and side-chain optimization is reported with explicit consideration of the route of synthesis and the availability of compatible chemical reagents. We have defined a set of retrosynthetic disconnections representing robust reactions amenable to combinatorial chemistry. This rule set is used to generate virtual fragment databases from available reagents. Each reactive center is annotated with its compatibility with regard to the chemical reactions. The rule set is then applied to a new molecule to obtain separate query subunits for rescaffolding by 3D similarity searching in specific reagent-derived fragment databases. Thus, only fragments compatible with the chemistry and shape of the corresponding query moiety are investigated further. The identified fragment hits directly indicate (1) available chemical reagents that can replace the query moiety in the starting molecule and (2) the route for the synthesis of the proposed molecules.

  17. Workflow automation architecture standard

    SciTech Connect

    Moshofsky, R.P.; Rohen, W.T.

    1994-11-14

    This document presents an architectural standard for application of workflow automation technology. The standard includes a functional architecture, process for developing an automated workflow system for a work group, functional and collateral specifications for workflow automation, and results of a proof of concept prototype.

  18. Metaworkflows and Workflow Interoperability for Heliophysics

    NASA Astrophysics Data System (ADS)

    Pierantoni, Gabriele; Carley, Eoin P.

    2014-06-01

    Heliophysics is a relatively new branch of physics that investigates the relationship between the Sun and the other bodies of the solar system. To investigate such relationships, heliophysicists can rely on various tools developed by the community. Some of these tools are on-line catalogues that list events (such as Coronal Mass Ejections, CMEs) and their characteristics as they were observed on the surface of the Sun or on the other bodies of the Solar System. Other tools offer on-line data analysis and access to images and data catalogues. During their research, heliophysicists often perform investigations that need to coordinate several of these services and to repeat these complex operations until the phenomena under investigation are fully analyzed. Heliophysicists combine the results of these services; this kind of service orchestration is well suited to workflows. This approach has been investigated in the HELIO project. The HELIO project developed an infrastructure for a Virtual Observatory for Heliophysics and implemented service orchestration using TAVERNA workflows. HELIO developed a set of workflows that proved to be useful but lacked flexibility and re-usability. The TAVERNA workflows also needed to be executed directly in the TAVERNA workbench, and this forced all users to learn how to use the workbench. Within the SCI-BUS and ER-FLOW projects, we have started an effort to re-think and re-design the heliophysics workflows with the aim of fostering re-usability and ease of use. We base our approach on two key concepts: that of meta-workflows and that of workflow interoperability. We have divided the produced workflows into three different layers. The first layer is Basic Workflows, developed both in the TAVERNA and WS-PGRADE languages. They are building blocks that users compose to address their scientific challenges. They implement well-defined Use Cases that usually involve only one service. The second layer is Science Workflows, usually developed in TAVERNA. They

  19. Implementing bioinformatic workflows within the bioextract server.

    PubMed

    Lushbough, Carol M; Bergman, Michael K; Lawrence, Carolyn J; Jennewein, Doug; Brendel, Volker

    2008-01-01

    Computational workflows in bioinformatics are becoming increasingly important in the achievement of scientific advances. These workflows typically require the integrated use of multiple, distributed data sources and analytic tools. The BioExtract Server (http://bioextract.org) is a distributed service designed to provide researchers with the web-based ability to query multiple data sources, save results as searchable data sets, and execute analytic tools. As the researcher works with the system, their tasks are saved in the background. At any time these steps can be saved as a workflow that can then be executed again and/or modified later.

  20. Scientific Data Management (SDM) Center for Enabling Technologies. Final Report, 2007-2012

    SciTech Connect

    Ludascher, Bertram; Altintas, Ilkay

    2013-09-06

    Our contributions to advancing the State of the Art in scientific workflows have focused on the following areas: Workflow development; Generic workflow components and templates; Provenance collection and analysis; and, Workflow reliability and fault tolerance.

  1. Dynamic reusable workflows for ocean science

    USGS Publications Warehouse

    Signell, Richard; Fernandez, Filipe; Wilcox, Kyle

    2016-01-01

    Digital catalogs of ocean data have been available for decades, but advances in standardized services and software for catalog search and data access make it now possible to create catalog-driven workflows that automate — end-to-end — data search, analysis and visualization of data from multiple distributed sources. Further, these workflows may be shared, reused and adapted with ease. Here we describe a workflow developed within the US Integrated Ocean Observing System (IOOS) which automates the skill-assessment of water temperature forecasts from multiple ocean forecast models, allowing improved forecast products to be delivered for an open water swim event. A series of Jupyter Notebooks are used to capture and document the end-to-end workflow using a collection of Python tools that facilitate working with standardized catalog and data services. The workflow first searches a catalog of metadata using the Open Geospatial Consortium (OGC) Catalog Service for the Web (CSW), then accesses data service endpoints found in the metadata records using the OGC Sensor Observation Service (SOS) for in situ sensor data and OPeNDAP services for remotely-sensed and model data. Skill metrics are computed and time series comparisons of forecast model and observed data are displayed interactively, leveraging the capabilities of modern web browsers. The resulting workflow not only solves a challenging specific problem, but highlights the benefits of dynamic, reusable workflows in general. These workflows adapt as new data enters the data system, facilitate reproducible science, provide templates from which new scientific workflows can be developed, and encourage data providers to use standardized services. As applied to the ocean swim event, the workflow exposed problems with two of the ocean forecast products which led to improved regional forecasts once errors were corrected. While the example is specific, the approach is general, and we hope to see increased use of dynamic
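
    The catalog-search step of such a workflow can be sketched with the OWSLib Python package, which speaks OGC CSW. The endpoint URL below is an assumed placeholder for the IOOS catalog, and the query term is illustrative; the point is that each metadata record carries the service endpoints (SOS, OPeNDAP) that downstream analysis steps read data from.

    ```python
    # Search an OGC Catalog Service for the Web (CSW) endpoint for records
    # matching a term, then list the data-access endpoints in each record.
    from owslib.csw import CatalogueServiceWeb
    from owslib.fes import PropertyIsLike

    csw = CatalogueServiceWeb("https://data.ioos.us/csw")  # assumed endpoint

    # Find metadata records mentioning sea water temperature.
    query = PropertyIsLike("apiso:AnyText", "%sea_water_temperature%")
    csw.getrecords2(constraints=[query], maxrecords=5)

    for record in csw.records.values():
        print(record.title)
        # Each reference is a service endpoint the workflow can pull from.
        for ref in record.references:
            print("  ", ref["scheme"], ref["url"])
    ```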

  2. Flexible workflow sharing and execution services for e-scientists

    NASA Astrophysics Data System (ADS)

    Kacsuk, Péter; Terstyanszky, Gábor; Kiss, Tamas; Sipos, Gergely

    2013-04-01

    The sequence of computational and data manipulation steps required to perform a specific scientific analysis is called a workflow. Workflows that orchestrate data- and/or compute-intensive applications on Distributed Computing Infrastructures (DCIs) have recently become standard tools in e-science. At the same time, the broad and fragmented landscape of workflows and DCIs slows down the uptake of workflow-based work. The development, sharing, integration and execution of workflows remain a challenge for many scientists. The FP7 "Sharing Interoperable Workflow for Large-Scale Scientific Simulation on Available DCIs" (SHIWA) project significantly improved the situation with a simulation platform that connects different workflow systems, different workflow languages, different DCIs and workflows into a single, interoperable unit. The SHIWA Simulation Platform is a service package, already used by various scientific communities, and used as a tool by the recently started ER-flow FP7 project to expand the use of workflows among European scientists. The presentation will introduce the SHIWA Simulation Platform and the services that ER-flow provides, based on the platform, to space and earth science researchers. The SHIWA Simulation Platform includes: 1. SHIWA Repository: a database where workflows and metadata about workflows can be stored. The database is a central repository for discovering and sharing workflows within and among communities. 2. SHIWA Portal: a web portal that is integrated with the SHIWA Repository and includes a workflow executor engine that can orchestrate various types of workflows on various grid and cloud platforms. 3. SHIWA Desktop: a desktop environment that provides access capabilities similar to the SHIWA Portal; however, it runs on the user's desktop/laptop instead of a portal server. 4. Workflow engines: the ASKALON, Galaxy, GWES, Kepler, LONI Pipeline, MOTEUR, Pegasus, P-GRADE, ProActive, Triana, Taverna and WS-PGRADE workflow engines are already

  3. Developing a Workflow to Identify Inconsistencies in Volunteered Geographic Information: A Phenological Case Study

    PubMed Central

    Rosemartin, Alyssa; Gerst, Katharine L.; Weltzin, Jake F.

    2015-01-01

    assessment for volunteered geographic information. Initiatives that leverage volunteered geographic information can adapt this workflow to improve the quality of their datasets and the robustness of their scientific analyses. PMID:26485157

  4. Developing a workflow to identify inconsistencies in volunteered geographic information: a phenological case study

    USGS Publications Warehouse

    Mehdipoor, Hamed; Zurita-Milla, Raul; Rosemartin, Alyssa; Gerst, Katharine L.; Weltzin, Jake F.

    2015-01-01

    assessment for volunteered geographic information. Initiatives that leverage volunteered geographic information can adapt this workflow to improve the quality of their datasets and the robustness of their scientific analyses.

  5. Developing a Workflow to Identify Inconsistencies in Volunteered Geographic Information: A Phenological Case Study.

    PubMed

    Mehdipoor, Hamed; Zurita-Milla, Raul; Rosemartin, Alyssa; Gerst, Katharine L; Weltzin, Jake F

    2015-01-01

    assessment for volunteered geographic information. Initiatives that leverage volunteered geographic information can adapt this workflow to improve the quality of their datasets and the robustness of their scientific analyses.

  6. BReW: Blackbox Resource Selection for e-Science Workflows

    SciTech Connect

    Simmhan, Yogesh; Soroush, Emad; Van Ingen, Catharine; Agarwal, Deb; Ramakrishnan, Lavanya

    2010-10-04

    Workflows are commonly used to model data-intensive scientific analysis. As computational resource needs increase for eScience, emerging platforms like clouds present additional resource choices for scientists and policy makers. We introduce BReW, a tool that enables users to make rapid, high-level platform selections for their workflows using limited workflow knowledge. This helps them make informed decisions on whether to port a workflow to a new platform. Our analysis of synthetic and real eScience workflows shows that using just the total runtime length, maximum task fanout, and total data used and produced by the workflow, BReW can provide platform predictions comparable to whitebox models with detailed workflow knowledge.
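
    A toy version of this blackbox idea, not the BReW tool itself: fit a linear model over a few historical workflows using only the three coarse features the abstract names, then predict the runtime of a new workflow on the target platform. All numbers below are made up for illustration.

    ```python
    # Blackbox runtime prediction from three coarse workflow features.
    import numpy as np

    # Each row: [runtime_on_source_s, max_fanout, total_bytes]; assumed
    # history of workflows already ported to the target platform.
    X = np.array([[3600, 8, 2e9],
                  [7200, 16, 8e9],
                  [1800, 4, 1e9],
                  [5400, 32, 4e9]])
    y = np.array([4100, 8900, 2000, 7300])  # observed target runtimes (s)

    # Fit a linear model with an intercept via least squares.
    A = np.hstack([X, np.ones((len(X), 1))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)

    def predict_runtime(runtime_s, max_fanout, total_bytes):
        """Predict target-platform runtime for an unseen workflow."""
        return float(np.array([runtime_s, max_fanout, total_bytes, 1.0]) @ coef)

    print(f"{predict_runtime(4000, 12, 3e9):.0f} s")
    ```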

  7. Deployment of precise and robust sensors on board ISS-for scientific experiments and for operation of the station.

    PubMed

    Stenzel, Christian

    2016-09-01

    The International Space Station (ISS) is the largest technical vehicle ever built by mankind. It provides a living area for six astronauts and also represents a laboratory in which scientific experiments are conducted in an extraordinary environment. The deployed sensor technology contributes significantly to the operational and scientific success of the station. The sensors on board the ISS can thereby be classified into two categories which differ significantly in their key features: (1) sensors related to crew and station health, and (2) sensors to provide specific measurements in research facilities. The operation of the station requires robust, long-term stable and reliable sensors, since they assure the survival of the astronauts and the integrity of the station. Recently, a wireless sensor network for measuring environmental parameters like temperature, pressure, and humidity was established, and its function could be successfully verified over several months. Such a network enhances the operational reliability and stability for monitoring these critical parameters compared to single sensors. The sensors which are implemented in the research facilities have to fulfil other objectives. The high performance of the scientific experiments conducted in the different research facilities on board demands the perfect embedding of the sensor in the respective instrumental setup, which forms the complete measurement chain. It is shown that the performance of the single sensor alone does not determine the success of the measurement task; moreover, the synergy between different sensors and actuators, as well as appropriate sample taking followed by appropriate sample preparation, plays an essential role. Application in a space environment adds further challenges to the sensor technology, for example the necessity for miniaturisation, automation, reliability, and long-term operation. An alternative is the repetitive calibration of the sensors. This approach

  8. Benchmarking ETL Workflows

    NASA Astrophysics Data System (ADS)

    Simitsis, Alkis; Vassiliadis, Panos; Dayal, Umeshwar; Karagiannis, Anastasios; Tziovara, Vasiliki

    Extraction-Transform-Load (ETL) processes comprise complex data workflows, which are responsible for the maintenance of a Data Warehouse. A plethora of ETL tools is currently available, constituting a multi-million dollar market. Each ETL tool uses its own technique for the design and implementation of an ETL workflow, making the task of assessing ETL tools extremely difficult. In this paper, we identify common characteristics of ETL workflows in an effort to propose a unified evaluation method for ETL. We also identify the main points of interest in designing, implementing, and maintaining ETL workflows. Finally, we propose a principled organization of test suites based on the TPC-H schema for the problem of experimenting with ETL workflows.
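
    The notion of a principled test suite can be sketched as parameterized scenarios that vary workflow shape and data volume while recording timings. The toy extract-transform-load chain below is illustrative only, not the paper's TPC-H-based suite.

    ```python
    # Time a toy ETL workflow across a grid of scenario parameters.
    import time
    import itertools

    def run_etl(n_rows, n_transforms):
        rows = ({"id": i, "v": float(i)} for i in range(n_rows))   # extract
        for _ in range(n_transforms):                              # transform
            rows = ({**r, "v": r["v"] * 1.01} for r in rows)
        return sum(1 for _ in rows)                                # load

    scenarios = itertools.product([10_000, 100_000], [1, 5, 10])
    for n_rows, n_transforms in scenarios:
        t0 = time.perf_counter()
        loaded = run_etl(n_rows, n_transforms)
        dt = time.perf_counter() - t0
        print(f"rows={n_rows:>7} transforms={n_transforms:>2} "
              f"loaded={loaded} time={dt:.3f}s")
    ```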

  9. Enabling Structured Exploration of Workflow Performance Variability in Extreme-Scale Environments

    SciTech Connect

    Kleese van Dam, Kerstin; Stephan, Eric G.; Raju, Bibi; Altintas, Ilkay; Elsethagen, Todd O.; Krishnamoorthy, Sriram

    2015-11-15

    Workflows are taking an increasingly important role in orchestrating complex scientific processes in extreme-scale and highly heterogeneous environments. However, to date we cannot reliably predict, understand, and optimize workflow performance. Sources of performance variability, and in particular the interdependencies of workflow design, execution environment and system architecture, are not well understood. While there is a rich portfolio of tools for performance analysis, modeling and prediction for single applications in homogeneous computing environments, these are not applicable to workflows, due to the number and heterogeneity of the involved workflow and system components and their strong interdependencies. In this paper, we investigate workflow performance goals and identify factors that could have a relevant impact. Based on our analysis, we propose a new workflow performance provenance ontology, the Open Provenance Model-based WorkFlow Performance Provenance, or OPM-WFPP, that will enable the empirical study of workflow performance characteristics and variability, including complex source attribution.

  10. Climate Data Analytics Workflow Management

    NASA Astrophysics Data System (ADS)

    Zhang, J.; Lee, S.; Pan, L.; Mattmann, C. A.; Lee, T. J.

    2016-12-01

    In this project we aim to pave a novel path toward a sustainable building block for Earth science big data analytics and knowledge sharing. By closely studying how Earth scientists conduct data analytics research in their daily work, we have developed a provenance model to record their activities, and a technology to automatically generate workflows for scientists from that provenance. On top of this, we have built a prototype of a data-centric provenance repository, and established a PDSW (People, Data, Service, Workflow) knowledge network to support workflow recommendation. To ensure the scalability and performance of the expected recommendation system, we have leveraged Apache OODT technology. The community-approved, metrics-based performance evaluation web service will allow a user to select a metric from a list of several community-approved metrics and to evaluate model performance using that metric as well as the reference dataset. This service will facilitate the use of reference datasets that are generated in support of model-data intercomparison projects such as Obs4MIPs and Ana4MIPs. The data-centric repository infrastructure will allow us to capture richer provenance to further facilitate knowledge sharing and scientific collaboration in the Earth science community. This project is part of the Apache incubator CMDA project.

  11. Towards Composing Data Aware Systems Biology Workflows on Cloud Platforms: A MeDICi-based Approach

    SciTech Connect

    Gorton, Ian; Liu, Yan; Yin, Jian; Kulkarni, Anand V.; Wynne, Adam S.

    2011-09-08

    Cloud computing is being increasingly adopted for deploying systems biology scientific workflows. Scientists developing these workflows use a wide variety of fragmented and competing data sets and computational tools of all scales to support their research. To this end, the synergy of client-side workflow tools with cloud platforms is a promising approach to sharing and reusing data and workflows. In such systems, the location of data and computation is an essential consideration in terms of quality of service for composing a scientific workflow across remote cloud platforms. In this paper, we describe a cloud-based workflow for genome annotation processing that is underpinned by MeDICi, a middleware designed for data-intensive scientific applications. The workflow implementation incorporates an execution layer for exploiting data locality that routes the workflow requests to the processing steps that are colocated with the data. We demonstrate our approach by composing two workflows with the MeDICi pipelines.

  12. Workflows in a secure environment

    SciTech Connect

    Klasky, Scott A; Podhorszki, Norbert

    2008-01-01

    Petascale simulations on the largest supercomputers in the US require advanced data management techniques in order to optimize the application scientist's time, and to optimize the time spent on the supercomputers. Researchers in such problems are starting to require workflow automation during their simulations in order to monitor the simulations, and in order to automate many of the complex analyses which must take place on the data that is generated from these simulations. Scientific workflows are being used to monitor simulations running on these supercomputers by applying a series of complex analyses, and finally producing images and movies from the variables produced in the simulation, or from the derived quantities produced by the analysis. The typical scenario is where the large calculation runs on the supercomputer, and the auxiliary diagnostics/monitors are run on resources which are either on the local area network of the supercomputer, or over the wide area network. The supercomputers at one of the largest centers are highly secure, and the only method to log into the center is interactive authentication using One Time Passwords (OTP) that are generated by a security device and expire in half a minute. Therefore, grid certificates are not a current option on these machines in the Department of Energy at Oak Ridge National Laboratory. In this paper we describe how we have extended the Kepler scientific workflow management system to be able to run operations on these supercomputers, how workflows themselves can be executed as batch jobs, and finally, how external data-transfer operations can be utilized when they need to perform authentication on their own as well.

  13. Resilient workflows for computational mechanics platforms

    NASA Astrophysics Data System (ADS)

    Nguyên, Toàn; Trifan, Laurentiu; Désidéri, Jean-Antoine

    2010-06-01

    Workflow management systems have recently been the focus of much interest and of much research and deployment for scientific applications worldwide [26, 27]. Their ability to abstract applications by wrapping application codes has also underscored the usefulness of such systems for multidiscipline applications [23, 24]. When complex applications need to provide seamless interfaces hiding the technicalities of the computing infrastructures, their high-level modeling, monitoring and execution functionalities help give production teams seamless and effective facilities [25, 31, 33]. Software integration infrastructures based on programming paradigms such as Python, Matlab and Scilab have also provided evidence of the usefulness of such approaches for the tight coupling of multidiscipline application codes [22, 24]. Also, high-performance computing based on multi-core multi-cluster infrastructures opens new opportunities for more accurate, more extensive and more robust multi-discipline simulations for the decades to come [28]. This supports the goal of full flight dynamics simulation for 3D aircraft models within the next decade, opening the way to virtual flight-tests and certification of aircraft in the future [23, 24, 29].

  14. An Interoperable Grid Workflow Management System

    NASA Astrophysics Data System (ADS)

    Mirto, Maria; Passante, Marco; Epicoco, Italo; Aloisio, Giovanni

    A WorkFlow Management System (WFMS) is a fundamental component enabling the integration of data, applications and a wide set of project resources. Although a number of scientific WFMSs support this task, many analysis pipelines require large-scale Grid computing infrastructures to cope with their high compute and storage requirements. Such scientific workflows complicate the management of resources, especially in cases where they are offered by several resource providers, managed by different Grid middleware, since resource access must be synchronised in advance to allow reliable workflow execution. Different types of Grid middleware such as gLite, Unicore and Globus are used around the world and may cause interoperability issues if applications involve two or more of them. In this paper we describe the ProGenGrid Workflow Management System, whose main goal is to provide interoperability among these different Grid middleware when executing workflows. It allows the composition of batch, parameter sweep and MPI-based jobs. The ProGenGrid engine implements the logic to execute such jobs by using an OGF-compliant standard language, JSDL, which has been extended for this purpose. Currently, we are testing our system on some bioinformatics case studies in the International Laboratory of Bioinformatics (LIBI) Project (www.libi.it).
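
    A minimal JSDL job description can be generated with the Python standard library, as sketched below. The namespaces follow the OGF JSDL 1.0 schema; the executable and arguments are illustrative, and an engine like ProGenGrid would extend such a document with its own elements.

    ```python
    # Build a minimal OGF JSDL 1.0 job description with xml.etree.
    import xml.etree.ElementTree as ET

    JSDL = "http://schemas.ggf.org/jsdl/2005/11/jsdl"
    POSIX = "http://schemas.ggf.org/jsdl/2005/11/jsdl-posix"
    ET.register_namespace("jsdl", JSDL)
    ET.register_namespace("posix", POSIX)

    job = ET.Element(f"{{{JSDL}}}JobDefinition")
    desc = ET.SubElement(job, f"{{{JSDL}}}JobDescription")
    app = ET.SubElement(desc, f"{{{JSDL}}}Application")
    posix = ET.SubElement(app, f"{{{POSIX}}}POSIXApplication")
    # Example executable and arguments (a BLAST run, purely illustrative).
    ET.SubElement(posix, f"{{{POSIX}}}Executable").text = "/usr/bin/blastp"
    ET.SubElement(posix, f"{{{POSIX}}}Argument").text = "-query"
    ET.SubElement(posix, f"{{{POSIX}}}Argument").text = "input.fasta"

    print(ET.tostring(job, encoding="unicode"))
    ```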

  15. Multi-Objective Approach for Energy-Aware Workflow Scheduling in Cloud Computing Environments

    PubMed Central

    Kadima, Hubert; Granado, Bertrand

    2013-01-01

    We address the problem of scheduling workflow applications on heterogeneous computing systems like cloud computing infrastructures. In general, cloud workflow scheduling is a complex optimization problem which requires considering different criteria so as to meet a large number of QoS (Quality of Service) requirements. Traditional research in workflow scheduling mainly focuses on optimization constrained by time or cost, without paying attention to energy consumption. The main contribution of this study is to propose a new approach for multi-objective workflow scheduling in clouds, and to present a hybrid PSO algorithm to optimize the scheduling performance. Our method is based on the Dynamic Voltage and Frequency Scaling (DVFS) technique to minimize energy consumption. This technique allows processors to operate at different voltage supply levels by sacrificing clock frequency; using multiple voltages involves a compromise between schedule quality and energy consumption. Simulation results on synthetic and real-world scientific applications highlight the robust performance of the proposed approach. PMID:24319361
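
    The DVFS trade-off at the heart of this approach is easy to sketch: dynamic power scales roughly with V²f, so lowering a task's voltage/frequency level stretches its runtime but reduces its energy, and a multi-objective scheduler such as the paper's hybrid PSO searches over such assignments. The levels and task times below are illustrative, not the paper's experimental setup.

    ```python
    # Evaluate the (makespan, energy) objectives for one DVFS assignment.
    levels = [  # (voltage_V, relative_frequency)
        (1.2, 1.0), (1.0, 0.8), (0.8, 0.6),
    ]

    def evaluate(task_base_times, level_assignment):
        """Objectives for tasks run serially on one processor."""
        makespan, energy = 0.0, 0.0
        for base_t, lvl in zip(task_base_times, level_assignment):
            v, f = levels[lvl]
            t = base_t / f              # slower clock stretches the task
            makespan += t
            energy += (v ** 2) * f * t  # dynamic power ~ C * V^2 * f, C = 1
        return makespan, energy

    tasks = [10.0, 4.0, 7.0]
    for assign in [(0, 0, 0), (2, 2, 2), (0, 2, 1)]:
        m, e = evaluate(tasks, assign)
        print(assign, f"makespan={m:.1f}s energy={e:.1f}J")
    ```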

  16. Digital work-flow

    PubMed Central

    MARSANGO, V.; BOLLERO, R.; D’OVIDIO, N.; MIRANDA, M.; BOLLERO, P.; BARLATTANI, A.

    2014-01-01

    SUMMARY Objective. The project presents a clinical case in which the digital work-flow procedure was applied for a prosthetic rehabilitation on natural teeth and implants. Materials. The digital work-flow uses the patient's photographs for aesthetic planning, digital smile technology for simulation of the final restoration, and real-time scanning to register the two arches. The scans are then sent to the laboratory, which proceeds with CAD-CAM production. Results. The digital work-flow makes it easy to communicate with the laboratory and with patients, gives better clinical results, and proved to be a less invasive method for the patient. Conclusion. Intra-oral scanners, digital smile design, preview using digital wax-up, and CAD-CAM production are new, predictable opportunities for the prosthetic team. This work-flow, compared with traditional methods, is faster, more precise and more predictable. PMID:25694797

  17. Time Analysis for Probabilistic Workflows

    SciTech Connect

    Czejdo, Bogdan; Ferragut, Erik M

    2012-01-01

    There are many theoretical and practical results in the area of workflow modeling, especially when more formal workflows are used. In this paper we focus on probabilistic workflows. We show algorithms for time computations in probabilistic workflows. With activity times modeled more precisely, we can improve work cooperation and the analysis of cooperation, including simulation and visualization.
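
    While the paper gives exact algorithms, the quantity being computed can be illustrated with a Monte Carlo sketch: activity durations are random, branches are taken with given probabilities, and we estimate the completion-time distribution. The distributions and probabilities below are made up for illustration.

    ```python
    # Estimate the completion-time distribution of a probabilistic workflow.
    import random
    import statistics

    def sample_completion_time():
        t = random.gauss(5.0, 1.0)          # activity A
        if random.random() < 0.7:           # 70%: fast branch B
            t += random.gauss(3.0, 0.5)
        else:                               # 30%: slow branch C with rework
            t += random.gauss(8.0, 2.0)
        t += random.expovariate(1 / 2.0)    # final activity D
        return t

    samples = [sample_completion_time() for _ in range(100_000)]
    print(f"mean={statistics.mean(samples):.2f} "
          f"p95={sorted(samples)[int(0.95 * len(samples))]:.2f}")
    ```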

  18. CaGrid Workflow Toolkit: A taverna based workflow tool for cancer grid

    PubMed Central

    2010-01-01

    Background In the biological and medical domains, the use of web services has made data and computation functionality accessible in a unified manner, which has helped automate data pipelines that were previously operated manually. Workflow technology is widely used in the orchestration of multiple services to facilitate in-silico research. Cancer Biomedical Informatics Grid (caBIG) is an information network enabling the sharing of cancer research related resources, and caGrid is its underlying service-based computation infrastructure. caBIG requires that services are composed and orchestrated in a given sequence to realize data pipelines, which are often called scientific workflows. Results caGrid selected Taverna as its workflow execution system of choice due to its integration with web service technology, support for a wide range of web services, and plug-in architecture catering for easy integration of third-party extensions. The caGrid Workflow Toolkit (or the toolkit for short), an extension to the Taverna workflow system, is designed and implemented to ease building and running caGrid workflows. It provides users with support for various phases in using workflows: service discovery, composition and orchestration, data access, and secure service invocation, which have been identified by the caGrid community as challenging in a multi-institutional and cross-discipline domain. Conclusions By extending the Taverna Workbench, the caGrid Workflow Toolkit provides a comprehensive solution to compose and coordinate services in caGrid, which would otherwise remain isolated and disconnected from each other. Using it, users can access more than 140 services and are offered a rich set of features including discovery of data and analytical services, query and transfer of data, security protections for service invocations, state management in service interactions, and sharing of workflows, experiences and best practices. The proposed solution is general enough to be

  19. CaGrid Workflow Toolkit: a Taverna based workflow tool for cancer grid.

    PubMed

    Tan, Wei; Madduri, Ravi; Nenadic, Alexandra; Soiland-Reyes, Stian; Sulakhe, Dinanath; Foster, Ian; Goble, Carole A

    2010-11-02

    In the biological and medical domains, the use of web services has made data and computation functionality accessible in a unified manner, which has helped automate data pipelines that were previously operated manually. Workflow technology is widely used in the orchestration of multiple services to facilitate in-silico research. Cancer Biomedical Informatics Grid (caBIG) is an information network enabling the sharing of cancer research related resources, and caGrid is its underlying service-based computation infrastructure. caBIG requires that services are composed and orchestrated in a given sequence to realize data pipelines, which are often called scientific workflows. caGrid selected Taverna as its workflow execution system of choice due to its integration with web service technology, support for a wide range of web services, and plug-in architecture catering for easy integration of third-party extensions. The caGrid Workflow Toolkit (or the toolkit for short), an extension to the Taverna workflow system, is designed and implemented to ease building and running caGrid workflows. It provides users with support for various phases in using workflows: service discovery, composition and orchestration, data access, and secure service invocation, which have been identified by the caGrid community as challenging in a multi-institutional and cross-discipline domain. By extending the Taverna Workbench, the caGrid Workflow Toolkit provides a comprehensive solution to compose and coordinate services in caGrid, which would otherwise remain isolated and disconnected from each other. Using it, users can access more than 140 services and are offered a rich set of features including discovery of data and analytical services, query and transfer of data, security protections for service invocations, state management in service interactions, and sharing of workflows, experiences and best practices. The proposed solution is general enough to be applicable and reusable within other

  20. Workflows for Full Waveform Inversions

    NASA Astrophysics Data System (ADS)

    Boehm, Christian; Krischer, Lion; Afanasiev, Michael; van Driel, Martin; May, Dave A.; Rietmann, Max; Fichtner, Andreas

    2017-04-01

    Despite many theoretical advances and the increasing availability of high-performance computing clusters, full seismic waveform inversions still face considerable challenges regarding data and workflow management. While the community has access to solvers which can harness modern heterogeneous computing architectures, the computational bottleneck has fallen to these often manpower-bounded issues that need to be overcome to facilitate further progress. Modern inversions involve huge amounts of data and require a tight integration between numerical PDE solvers, data acquisition and processing systems, nonlinear optimization libraries, and job orchestration frameworks. To this end we created a set of libraries and applications revolving around Salvus (http://salvus.io), a novel software package designed to solve large-scale full waveform inverse problems. This presentation focuses on solving passive source seismic full waveform inversions from local to global scales with Salvus. We discuss (i) design choices for the aforementioned components required for full waveform modeling and inversion, (ii) their implementation in the Salvus framework, and (iii) how it is all tied together by a usable workflow system. We combine state-of-the-art algorithms ranging from high-order finite-element solutions of the wave equation to quasi-Newton optimization algorithms using trust-region methods that can handle inexact derivatives. All is steered by an automated interactive graph-based workflow framework capable of orchestrating all necessary pieces. This naturally facilitates the creation of new Earth models and hopefully sparks new scientific insights. Additionally, and even more importantly, it enhances reproducibility and reliability of the final results.

  1. GO2OGS 1.0: a versatile workflow to integrate complex geological information with fault data into numerical simulation models

    NASA Astrophysics Data System (ADS)

    Fischer, T.; Naumov, D.; Sattler, S.; Kolditz, O.; Walther, M.

    2015-11-01

    We offer a versatile workflow to convert geological models built with the Paradigm™ GOCAD© (Geological Object Computer Aided Design) software into the open-source VTU (Visualization Toolkit unstructured grid) format for usage in numerical simulation models. Tackling relevant scientific questions or engineering tasks often involves multidisciplinary approaches. Conversion workflows are needed as a way of communication between the diverse tools of the various disciplines. Our approach offers an open-source, platform-independent, robust, and comprehensible method that is potentially useful for a multitude of environmental studies. With two application examples in the Thuringian Syncline, we show how a heterogeneous geological GOCAD model including multiple layers and faults can be used for numerical groundwater flow modeling, in our case employing the OpenGeoSys open-source numerical toolbox for groundwater flow simulations. The presented workflow offers the chance to incorporate increasingly detailed data, utilizing the growing availability of computational power to simulate numerical models.
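
    The shape of such a conversion can be sketched in Python with the meshio package: parse VRTX/TRGL records from a GOCAD TSurf file (real TSurf files carry more record types and property data than handled here) and write the triangulation as VTU. This is an illustration of the idea only, not the GO2OGS tool, and the input file name is assumed.

    ```python
    # Convert a (simplified) GOCAD TSurf triangulation to VTU via meshio.
    import numpy as np
    import meshio

    points, triangles = [], []
    with open("surface.ts") as f:            # assumed input file
        for line in f:
            parts = line.split()
            if parts and parts[0] in ("VRTX", "PVRTX"):
                points.append([float(x) for x in parts[2:5]])
            elif parts and parts[0] == "TRGL":
                # GOCAD vertex ids are 1-based; VTK wants 0-based.
                triangles.append([int(i) - 1 for i in parts[1:4]])

    mesh = meshio.Mesh(
        points=np.array(points),
        cells=[("triangle", np.array(triangles))],
    )
    meshio.write("surface.vtu", mesh)        # ready for the simulation side
    ```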

  2. Workflow management systems in radiology

    NASA Astrophysics Data System (ADS)

    Wendler, Thomas; Meetz, Kirsten; Schmidt, Joachim

    1998-07-01

    In a situation of shrinking health care budgets, increasing cost pressure and growing demands to increase the efficiency and the quality of medical services, health care enterprises are forced to optimize or completely re-design their processes. Although information technology is agreed to potentially contribute to cost reduction and efficiency improvement, the real success factors are the re-definition and automation of processes: Business Process Re-engineering and Workflow Management. In this paper we discuss architectures for the use of workflow management systems in radiology. We propose to move forward from information systems in radiology (RIS, PACS) to Radiology Management Systems, in which workflow functionality (process definitions and process automation) is implemented through autonomous workflow management systems (WfMS). In a workflow-oriented architecture, an autonomous workflow enactment service communicates with workflow client applications via standardized interfaces. We discuss the need for and the benefits of such an approach, emphasize the separation of workflow management systems and application systems, and examine the consequences that arise for the architecture of workflow-oriented information systems. This includes an appropriate workflow terminology, and the definition of standard interfaces for workflow-aware application systems. Workflow studies in various institutions have shown that most of the processes in radiology are well structured and suited for a workflow management approach. Numerous commercially available Workflow Management Systems (WfMS) were investigated, and some of them, which are process-oriented and application independent, appear suitable for use in radiology.

  3. Automated data reduction workflows for astronomy. The ESO Reflex environment

    NASA Astrophysics Data System (ADS)

    Freudling, W.; Romaniello, M.; Bramich, D. M.; Ballester, P.; Forchi, V.; García-Dabló, C. E.; Moehler, S.; Neeser, M. J.

    2013-11-01

    Context. Data from complex modern astronomical instruments often consist of a large number of different science and calibration files, and their reduction requires a variety of software tools. The execution chain of the tools represents a complex workflow that needs to be tuned and supervised, often by individual researchers who are not necessarily experts in any specific instrument. Aims: The efficiency of data reduction can be improved by using automatic workflows to organise data and execute a sequence of data reduction steps. To realize such efficiency gains, we designed a system that allows intuitive representation, execution and modification of the data reduction workflow, and has facilities for inspection and interaction with the data. Methods: The European Southern Observatory (ESO) has developed Reflex, an environment to automate data reduction workflows. Reflex is implemented as a package of customized components for the Kepler workflow engine. Kepler provides the graphical user interface to create an executable flowchart-like representation of the data reduction process. Key features of Reflex are a rule-based data organiser, infrastructure to re-use results, thorough book-keeping, data progeny tracking, interactive user interfaces, and a novel concept to exploit information created during data organisation for the workflow execution. Results: Automated workflows can greatly increase the efficiency of astronomical data reduction. In Reflex, workflows can be run non-interactively as a first step. Subsequent optimization can then be carried out while transparently re-using all unchanged intermediate products. We found that such workflows enable the reduction of complex data by non-expert users and minimize mistakes due to book-keeping errors. Conclusions: Reflex includes novel concepts to increase the efficiency of astronomical data processing. While Reflex is a specific implementation of astronomical scientific workflows within the Kepler workflow
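
    The rule-based data-organiser concept can be sketched with astropy: classify raw FITS files by header keywords so each category can feed the appropriate reduction step. The keyword names, values and categories below are illustrative; Reflex's actual organiser applies instrument-specific classification rules.

    ```python
    # Classify FITS files into calibration/science categories by header rules.
    from pathlib import Path
    from astropy.io import fits

    RULES = [  # (header key, expected value or None for "any"), category
        ("OBJECT", "BIAS", "bias"),
        ("OBJECT", "FLAT", "flat"),
        ("OBJECT", None, "science"),  # anything else with OBJECT set
    ]

    def classify(path):
        header = fits.getheader(path)
        for key, expected, category in RULES:
            value = header.get(key)
            if value is not None and (expected is None or value == expected):
                return category
        return "unclassified"

    by_category = {}
    for path in Path("raw").glob("*.fits"):   # assumed raw-data directory
        by_category.setdefault(classify(path), []).append(path.name)
    print(by_category)
    ```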

  4. DIaaS: Data-Intensive workflows as a service - Enabling easy composition and deployment of data-intensive workflows on Virtual Research Environments

    NASA Astrophysics Data System (ADS)

    Filgueira, R.; Ferreira da Silva, R.; Deelman, E.; Atkinson, M.

    2016-12-01

    We present the Data-Intensive workflows as a Service (DIaaS) model for enabling easy data-intensive workflow composition and deployment on clouds using containers. The backbone of the DIaaS model is Asterism, an integrated solution for running data-intensive stream-based applications on heterogeneous systems, which combines the benefits of the dispel4py and Pegasus workflow systems. The stream-based executions of an Asterism workflow are managed by dispel4py, while the data movement between different e-Infrastructures and the coordination of the application execution are automatically managed by Pegasus. DIaaS combines the Asterism framework with Docker containers to provide an integrated, complete, easy-to-use, portable approach to running data-intensive workflows on distributed platforms. Three containers make up the DIaaS model: a Pegasus node, an MPI cluster, and an Apache Storm cluster. Container images are described as Dockerfiles (available online at http://github.com/dispel4py/pegasus_dispel4py), linked to Docker Hub for continuous integration (automated image builds) and for image storing and sharing. In this model, all the software required to run scientific applications (workflow systems and execution engines) is packed into the containers, which significantly reduces the effort (and possible human errors) required by scientists or VRE administrators to build such systems. The most common use of DIaaS will be to act as a backend of VREs or Scientific Gateways to run data-intensive applications, deploying cloud resources upon request. We have demonstrated the feasibility of DIaaS using the data-intensive seismic ambient noise cross-correlation application (Figure 1). The application preprocesses (Phase1) and cross-correlates (Phase2) traces from several seismic stations. The application is submitted via Pegasus (Container1), and Phase1 and Phase2 are executed in the MPI (Container2) and Storm (Container3) clusters respectively. Although both phases could be executed
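
    The container-packaging idea can be sketched with the Docker SDK for Python: each workflow stage runs from its own image and exchanges files through a mounted data directory, so the software stack travels with the workflow. The image name and paths below are illustrative, not the actual DIaaS containers.

    ```python
    # Run one containerized workflow stage with a shared data volume.
    import docker

    client = docker.from_env()

    logs = client.containers.run(
        image="python:3.11-slim",                       # illustrative image
        command=["python", "-c", "print('preprocessing done')"],
        volumes={"/data/run42": {"bind": "/data", "mode": "rw"}},
        remove=True,                                    # clean up after exit
    )
    print(logs.decode())
    ```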

  5. A Drupal-Based Collaborative Framework for Science Workflows

    NASA Astrophysics Data System (ADS)

    Pinheiro da Silva, P.; Gandara, A.

    2010-12-01

    Cyber-infrastructure is built by using technical infrastructure to support the organizational practices and social norms of scientific teams working together or depending on each other to conduct scientific research. Such cyber-infrastructure enables the sharing of information and data so that scientists can leverage knowledge and expertise through automation. Scientific workflow systems have been used to build automated scientific systems used by scientists to conduct scientific research and, as a result, create artifacts in support of scientific discoveries. These complex systems are often developed by teams of scientists who are located in different places, e.g., scientists working in distinct buildings, and sometimes in different time zones, e.g., scientists working in distinct national laboratories. The sharing of these specifications is currently supported by the use of version control systems such as CVS or Subversion. Discussions about the design, improvement, and testing of these specifications, however, often happen elsewhere, e.g., through the exchange of email messages and IM chats. Carrying on a discussion about these specifications is challenging because comments and specifications are not necessarily connected. For instance, the person reading a comment about a given workflow specification may not be able to see the workflow; and even if the person can see the workflow, they may not know to which part of the workflow a given comment applies. In this paper, we discuss the design, implementation and use of CI-Server, a Drupal-based infrastructure, to support the collaboration of both local and distributed teams of scientists using scientific workflows. CI-Server has three primary goals: to enable information sharing by providing tools that scientists can use within their scientific research to process data, publish and share artifacts; to build community by providing tools that support discussions between

  6. The MPO system for automatic workflow documentation

    DOE PAGES

    Abla, G.; Coviello, E. N.; Flanagan, S. M.; ...

    2016-04-18

    Data from large-scale experiments and extreme-scale computing is expensive to produce and may be used for critical applications. However, it is not the mere existence of data that is important, but our ability to make use of it. Experience has shown that when metadata is better organized and more complete, the underlying data becomes more useful. Traditionally, capturing the steps of scientific workflows and metadata was the role of the lab notebook, but the digital era has resulted instead in the fragmentation of data, processing, and annotation. Here, this article presents the Metadata, Provenance, and Ontology (MPO) System, the software that can automate the documentation of scientific workflows and associated information. Based on recorded metadata, it provides explicit information about the relationships among the elements of workflows in notebook form augmented with directed acyclic graphs. A set of web-based graphical navigation tools and Application Programming Interface (API) have been created for searching and browsing, as well as programmatically accessing the workflows and data. We describe the MPO concepts and its software architecture. We also report the current status of the software as well as the initial deployment experience.

  7. The MPO system for automatic workflow documentation

    SciTech Connect

    Abla, G.; Coviello, E. N.; Flanagan, S. M.; Greenwald, M.; Lee, X.; Romosan, A.; Schissel, D. P.; Shoshani, A.; Stillerman, J.; Wright, J.; Wu, K. J.

    2016-04-18

    Data from large-scale experiments and extreme-scale computing is expensive to produce and may be used for critical applications. However, it is not the mere existence of data that is important, but our ability to make use of it. Experience has shown that when metadata is better organized and more complete, the underlying data becomes more useful. Traditionally, capturing the steps of scientific workflows and metadata was the role of the lab notebook, but the digital era has resulted instead in the fragmentation of data, processing, and annotation. Here, this article presents the Metadata, Provenance, and Ontology (MPO) System, the software that can automate the documentation of scientific workflows and associated information. Based on recorded metadata, it provides explicit information about the relationships among the elements of workflows in notebook form augmented with directed acyclic graphs. A set of web-based graphical navigation tools and Application Programming Interface (API) have been created for searching and browsing, as well as programmatically accessing the workflows and data. We describe the MPO concepts and its software architecture. We also report the current status of the software as well as the initial deployment experience.
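
    The automatic-documentation pattern described in the two MPO records above can be sketched as workflow steps posting provenance records to a metadata service as they run; the notebook view and the directed acyclic graph are then built from those records. The REST endpoints and payload fields below are hypothetical, not the MPO system's actual API.

    ```python
    # Hypothetical provenance-recording calls made by a running workflow.
    import requests

    BASE = "https://mpo.example.org/api"      # assumed service URL

    # Register the workflow, then record an activity and its output.
    wf = requests.post(f"{BASE}/workflow",
                       json={"name": "shot-analysis"}).json()

    step1 = requests.post(f"{BASE}/activity", json={
        "workflow": wf["uid"], "name": "filter-signal", "inputs": ["raw.h5"],
    }).json()

    requests.post(f"{BASE}/dataobject", json={
        "workflow": wf["uid"], "produced_by": step1["uid"],
        "uri": "filtered.h5",
    })
    # Each call adds a node/edge to the workflow's provenance DAG.
    ```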

  8. Building a robust 21st century chemical testing program at the U.S. Environmental Protection Agency: recommendations for strengthening scientific engagement.

    PubMed

    McPartland, Jennifer; Dantzker, Heather C; Portier, Christopher J

    2015-01-01

    Biological pathway-based chemical testing approaches are central to the National Research Council's vision for 21st century toxicity testing. Approaches such as high-throughput in vitro screening offer the potential to evaluate thousands of chemicals faster and cheaper than ever before and to reduce testing on laboratory animals. Collaborative scientific engagement is important in addressing scientific issues arising in new federal chemical testing programs and for achieving stakeholder support of their use. We present two recommendations specifically focused on increasing scientific engagement in the U.S. Environmental Protection Agency (EPA) ToxCast™ initiative. Through these recommendations we seek to bolster the scientific foundation of federal chemical testing efforts such as ToxCast™ and the public health decisions that rely upon them. Environmental Defense Fund works across disciplines and with diverse groups to improve the science underlying environmental health decisions. We propose that the U.S. EPA can strengthen the scientific foundation of its new chemical testing efforts and increase support for them in the scientific research community by a) expanding and diversifying scientific input into the development and application of new chemical testing methods through collaborative workshops, and b) seeking out mutually beneficial research partnerships. Our recommendations provide concrete actions for the U.S. EPA to increase and diversify engagement with the scientific research community in its ToxCast™ initiative. We believe that such engagement will help ensure that new chemical testing data are scientifically robust and that the U.S. EPA gains the support and acceptance needed to sustain new testing efforts to protect public health.

  9. Kronos: a workflow assembler for genome analytics and informatics.

    PubMed

    Taghiyar, M Jafar; Rosner, Jamie; Grewal, Diljot; Grande, Bruno M; Aniba, Radhouane; Grewal, Jasleen; Boutros, Paul C; Morin, Ryan D; Bashashati, Ali; Shah, Sohrab P

    2017-07-01

    The field of next-generation sequencing informatics has matured to a point where algorithmic advances in sequence alignment and individual feature detection methods have stabilized. Practical and robust implementation of complex analytical workflows (where such tools are structured into "best practices" for automated analysis of next-generation sequencing datasets) still requires significant programming investment and expertise. We present Kronos, a software platform for facilitating the development and execution of modular, auditable, and distributable bioinformatics workflows. Kronos obviates the need for explicit coding of workflows by compiling a text configuration file into executable Python applications. Making analysis modules would still require programming. The framework of each workflow includes a run manager to execute the encoded workflows locally (or on a cluster or cloud), parallelize tasks, and log all runtime events. The resulting workflows are highly modular and configurable by construction, facilitating flexible and extensible meta-applications that can be modified easily through configuration file editing. The workflows are fully encoded for ease of distribution and can be instantiated on external systems, a step toward reproducible research and comparative analyses. We introduce a framework for building Kronos components that function as shareable, modular nodes in Kronos workflows. The Kronos platform provides a standard framework for developers to implement custom tools, reuse existing tools, and contribute to the community at large. Kronos is shipped with both Docker and Amazon Web Services Machine Images. It is free, open source, and available through the Python Package Index and at https://github.com/jtaghiyar/kronos.
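
    The configuration-compiled-to-application idea can be sketched in a few lines: a registry of task functions plus an ordered pipeline taken from a configuration. Kronos compiles an actual text configuration file into a full Python application; the dict below merely stands in for such a file, and the task names are illustrative.

    ```python
    # A toy configuration-driven pipeline in the spirit of workflow assembly.
    TASKS = {}

    def task(name):
        def register(fn):
            TASKS[name] = fn
            return fn
        return register

    @task("align")
    def align(data):
        return data + ["aligned"]

    @task("call_variants")
    def call_variants(data):
        return data + ["variants"]

    config = {"pipeline": ["align", "call_variants"]}  # stands in for a file

    def run(config, data):
        for step in config["pipeline"]:   # fixed order from the configuration
            data = TASKS[step](data)
        return data

    print(run(config, ["sample.fastq"]))
    ```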

  10. Insightful Workflow For Grid Computing

    SciTech Connect

    Dr. Charles Earl

    2008-10-09

    We developed a workflow adaptation and scheduling system for Grid workflows. The system currently interfaces with and uses the Karajan workflow system. We developed machine learning agents that provide the planner/scheduler with the information needed to make decisions about when and how to replan. Kubrick restructures workflows at runtime, making it unique among workflow scheduling systems. The existing Kubrick system provides a platform on which to integrate additional quality-of-service constraints and in which to explore the use of an ensemble of scheduling and planning algorithms. This will be the principal thrust of our Phase II work.

  11. KNIME-CDK: Workflow-driven cheminformatics.

    PubMed

    Beisken, Stephan; Meinl, Thorsten; Wiswedel, Bernd; de Figueiredo, Luis F; Berthold, Michael; Steinbeck, Christoph

    2013-08-22

    Cheminformaticians have to routinely process and analyse libraries of small molecules. Among other things, that includes the standardization of molecules, calculation of various descriptors, visualisation of molecular structures, and downstream analysis. For this purpose, scientific workflow platforms such as the Konstanz Information Miner can be used if provided with the right plug-in. A workflow-based cheminformatics tool provides the advantage of ease of use and interoperability between complementary cheminformatics packages within the same framework, hence facilitating the analysis process. KNIME-CDK comprises functions for molecule conversion to/from common formats, generation of signatures, fingerprints, and molecular properties. It is based on the Chemistry Development Kit and uses the Chemical Markup Language for persistence. A comparison with the cheminformatics plug-in RDKit shows that KNIME-CDK supports a similar range of chemical classes and adds new functionality to the framework. We describe the design and integration of the plug-in, and demonstrate the usage of the nodes on ChEBI, a library of small molecules of biological interest. KNIME-CDK is an open-source plug-in for the Konstanz Information Miner, a free workflow platform. KNIME-CDK is built on top of the open-source Chemistry Development Kit and allows for efficient cross-vendor structural cheminformatics. Its ease of use and modularity enable researchers to automate routine tasks and data analysis, bringing complementary cheminformatics functionality to the workflow environment.

  12. KNIME-CDK: Workflow-driven cheminformatics

    PubMed Central

    2013-01-01

    Background Cheminformaticians have to routinely process and analyse libraries of small molecules. Among other things, that includes the standardization of molecules, calculation of various descriptors, visualisation of molecular structures, and downstream analysis. For this purpose, scientific workflow platforms such as the Konstanz Information Miner can be used if provided with the right plug-in. A workflow-based cheminformatics tool provides the advantage of ease-of-use and interoperability between complementary cheminformatics packages within the same framework, hence facilitating the analysis process. Results KNIME-CDK comprises functions for molecule conversion to/from common formats, generation of signatures, fingerprints, and molecular properties. It is based on the Chemistry Development Toolkit and uses the Chemical Markup Language for persistence. A comparison with the cheminformatics plug-in RDKit shows that KNIME-CDK supports a similar range of chemical classes and adds new functionality to the framework. We describe the design and integration of the plug-in, and demonstrate the usage of the nodes on ChEBI, a library of small molecules of biological interest. Conclusions KNIME-CDK is an open-source plug-in for the Konstanz Information Miner, a free workflow platform. KNIME-CDK is built on top of the open-source Chemistry Development Toolkit and allows for efficient cross-vendor structural cheminformatics. Its ease-of-use and modularity enables researchers to automate routine tasks and data analysis, bringing complementary cheminformatics functionality to the workflow environment. PMID:24103053

  13. Make Your Workflows Smarter

    NASA Technical Reports Server (NTRS)

    Jones, Corey; Kapatos, Dennis; Skradski, Cory

    2012-01-01

    Do you have workflows with many manual tasks that slow down your business? Or do you scale back workflows because there are simply too many manual tasks? Basic workflow robots can automate some common tasks, but not everything. This presentation will show how advanced robots called "expression robots" can be set up to perform everything from simple tasks (moving, creating folders, renaming, changing or creating an attribute, and revising) to more complex tasks (creating a PDF, or even launching a session of Creo Parametric and performing a specific modeling task). Expression robots are able to utilize the Java API and Info*Engine to do almost anything you can imagine! Best of all, these tools are supported by PTC and will work with later releases of Windchill. Limited knowledge of Java, Info*Engine, and XML is required. The attendee will learn what tasks expression robots are capable of performing, what is involved in setting up an expression robot, and will gain a basic understanding of simple Info*Engine tasks.

  14. Provenance-Powered Automatic Workflow Generation and Composition

    NASA Astrophysics Data System (ADS)

    Zhang, J.; Lee, S.; Pan, L.; Lee, T. J.

    2015-12-01

    In recent years, scientists have learned how to codify tools into reusable software modules that can be chained into multi-step executable workflows. Existing scientific workflow tools, created by computer scientists, require domain scientists to meticulously design their multi-step experiments before analyzing data. However, this is often at odds with a domain scientist's daily routine of conducting research and exploration. We aim to resolve this tension. Imagine this: an Earth scientist starts her day applying NASA Jet Propulsion Laboratory (JPL) published climate data processing algorithms over ARGO deep ocean temperature and AMSRE sea surface temperature datasets. Throughout the day, she tunes the algorithm parameters to study various aspects of the data. Suddenly, she notices some interesting results. She then turns to a computer scientist and asks, "can you reproduce my results?" By tracking and reverse engineering her activities, the computer scientist creates a workflow. The Earth scientist can now rerun the workflow to validate her findings, modify the workflow to discover further variations, or publish the workflow to share the knowledge. In this way, we aim to revolutionize computer-supported Earth science. We have developed a prototype system to realize the aforementioned vision in the context of service-oriented science. We have studied how Earth scientists conduct service-oriented data analytics research in their daily work, developed a provenance model to record their activities, and developed a technology to automatically generate workflows from recorded user behavior, supporting the adaptation and reuse of these workflows for replicating and improving scientific studies. A data-centric repository infrastructure is established to capture richer provenance and further facilitate collaboration in the science community. We have also established a Petri-net-based verification instrument for provenance-based automatic workflow generation and recommendation.

  15. Analysis of Enterprise Workflow Solutions

    NASA Astrophysics Data System (ADS)

    Chen, Cui-E.; Wang, Shulin; Chen, Ying; Meng, Yang; Ma, Hua

    Since the 90’s, workflow technology has been widely applied in various industries, such as office automation (OA), manufacturing, telecommunications services, banking, securities, insurance and other financial services, research institutes, and education services, to improve business process automation and integration capabilities. In this paper, based on workflow theory, the authors propose a policy-based workflow approach to support dynamic workflow patterns. By extending the functionality of Shark, they implemented a workflow engine component, OAShark, which supports retrieval/rollback functions. The related classes were programmed, and the technology was applied to the OA system of an enterprise project. The realization of this enterprise workflow solution greatly improved the efficiency of office automation.

  16. Reflex: Graphical workflow engine for data reduction

    NASA Astrophysics Data System (ADS)

    ESO Reflex development Team

    2014-01-01

    Reflex provides an easy and flexible way to reduce VLT/VLTI science data using the ESO pipelines. It allows graphically specifying the sequence in which the data reduction steps are executed, including conditional stops, loops and conditional branches. It eases inspection of the intermediate and final data products and allows repetition of selected processing steps to optimize the data reduction. The data organization necessary to reduce the data is built into the system and is fully automatic; advanced users can plug their own modules and steps into the data reduction sequence. Reflex supports the development of data reduction workflows based on the ESO Common Pipeline Library. Reflex is based on the concept of a scientific workflow, whereby the data reduction cascade is rendered graphically and data seamlessly flow from one processing step to the next. It is distributed with a number of complete test datasets so users can immediately start experimenting and familiarize themselves with the system.

  17. Domain-Specific Languages for Composing Signature Discovery Workflows

    SciTech Connect

    Jacob, Ferosh; Gray, Jeff; Wynne, Adam S.; Liu, Yan; Baker, Nathan A.

    2012-10-23

    Domain-agnostic signature discovery entails investigation across multiple scientific disciplines. The breadth and cross-disciplinary nature of this work requires that existing executables be integrated with new capabilities into workflows, representing a wide range of user tasks. An algorithm may be written in multiple programming languages for various hardware platforms, and so workflow composition requires integrating executables from any number of remote hosts. This raises an engineering issue on how to generate web service wrappers for these heterogeneous executables and to compose them into a scientific workflow environment (e.g., Taverna). In this paper, we introduce two simple Domain-Specific Languages (DSLs) to automate these processes. Our Service Description Language (SDL) describes key elements of a signature discovery service and automatically generates its implementation code. The Workflow Description Language (WDL) describes the pipeline of services and generates deployable artifacts for the Taverna workflow management system. We demonstrate our approach with a real-world workflow composed of services wrapping remote executables.
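
    The SDL and WDL grammars themselves are not reproduced in the abstract; the sketch below shows, with a hypothetical dictionary-based description standing in for real SDL, the general code-generation pattern the paper describes: a declarative record of an executable is expanded into wrapper code that can then be exposed as a service.

        # Hypothetical sketch of the code-generation pattern behind a
        # service description language: a declarative record of an
        # executable is expanded into Python wrapper code. The fields and
        # template are illustrative, not the paper's SDL syntax.
        SERVICE = {
            "name": "blast_search",
            "executable": "/usr/local/bin/blastn",
            "inputs": ["query_fasta", "database"],
        }

        TEMPLATE = '''def {name}({args}):
            """Auto-generated wrapper for {executable}."""
            import subprocess
            return subprocess.run([{executable!r}, {args}], capture_output=True)
        '''

        def generate_wrapper(desc):
            return TEMPLATE.format(name=desc["name"],
                                   executable=desc["executable"],
                                   args=", ".join(desc["inputs"]))

        print(generate_wrapper(SERVICE))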

  18. The evolution of peer review as a basis for scientific publication: directional selection towards a robust discipline?

    PubMed

    Ferreira, Catarina; Bastille-Rousseau, Guillaume; Bennett, Amanda M; Ellington, E Hance; Terwissen, Christine; Austin, Cayla; Borlestean, Adrian; Boudreau, Melanie R; Chan, Kevin; Forsythe, Adrian; Hossie, Thomas J; Landolt, Kristen; Longhi, Jessica; Otis, Josée-Anne; Peers, Michael J L; Rae, Jason; Seguin, Jacob; Watt, Cristen; Wehtje, Morgan; Murray, Dennis L

    2016-08-01

    Peer review is pivotal to science and academia, as it represents a widely accepted strategy for ensuring quality control in scientific research. Yet, the peer-review system is poorly adapted to recent changes in the discipline and current societal needs. We provide historical context for the cultural lag that governs peer review that has eventually led to the system's current structural weaknesses (voluntary review, unstandardized review criteria, decentralized process). We argue that some current attempts to upgrade or otherwise modify the peer-review system are merely sticking-plaster solutions to these fundamental flaws, and therefore are unlikely to resolve them in the long term. We claim that for peer review to be relevant, effective, and contemporary with today's publishing demands across scientific disciplines, its main components need to be redesigned. We propose directional changes that are likely to improve the quality, rigour, and timeliness of peer review, and thereby ensure that this critical process serves the community it was created for. © 2015 Cambridge Philosophical Society.

  19. Using Kepler for Tool Integration in Microarray Analysis Workflows.

    PubMed

    Gan, Zhuohui; Stowe, Jennifer C; Altintas, Ilkay; McCulloch, Andrew D; Zambon, Alexander C

    Increasing numbers of genomic technologies are leading to massive amounts of genomic data, all of which require complex analysis. More and more bioinformatics analysis tools are being developed by scientists to simplify these analyses. However, different pipelines have been developed using different software environments, which makes integration of these diverse bioinformatics tools difficult. Kepler provides an open source environment to integrate these disparate packages. Using Kepler, we integrated several external tools, including Bioconductor packages, AltAnalyze (a Python-based open source tool), and an R-based comparison tool, to build an automated workflow to meta-analyze both online and local microarray data. The automated workflow connects the integrated tools seamlessly, delivers data flow between the tools smoothly, and hence improves the efficiency and accuracy of complex data analyses. Our workflow exemplifies the usage of Kepler as a scientific workflow platform for bioinformatics pipelines.

  20. Deploying and sharing U-Compare workflows as web services

    PubMed Central

    2013-01-01

    Background U-Compare is a text mining platform that allows the construction, evaluation and comparison of text mining workflows. U-Compare contains a large library of components that are tuned to the biomedical domain. Users can rapidly develop biomedical text mining workflows by mixing and matching U-Compare’s components. Workflows developed using U-Compare can be exported and sent to other users who, in turn, can import and re-use them. However, the resulting workflows are standalone applications, i.e., software tools that run and are accessible only via a local machine, and that can only be run with the U-Compare platform. Results We address the above issues by extending U-Compare to convert standalone workflows into web services automatically, via a two-click process. The resulting web services can be registered on a central server and made publicly available. Alternatively, users can make web services available on their own servers, after installing the web application framework, which is part of the extension to U-Compare. We have performed a user-oriented evaluation of the proposed extension, by asking users who have tested the enhanced functionality of U-Compare to complete questionnaires that assess its functionality, reliability, usability, efficiency and maintainability. The results obtained reveal that the new functionality is well received by users. Conclusions The web services produced by U-Compare are built on top of open standards, i.e., REST and SOAP protocols, and therefore, they are decoupled from the underlying platform. Exported workflows can be integrated with any application that supports these open standards. We demonstrate how the newly extended U-Compare enhances the cross-platform interoperability of workflows, by seamlessly importing a number of text mining workflow web services exported from U-Compare into Taverna, i.e., a generic scientific workflow construction platform. PMID:23419017

  1. Deploying and sharing U-Compare workflows as web services.

    PubMed

    Kontonatsios, Georgios; Korkontzelos, Ioannis; Kolluru, Balakrishna; Thompson, Paul; Ananiadou, Sophia

    2013-02-18

    U-Compare is a text mining platform that allows the construction, evaluation and comparison of text mining workflows. U-Compare contains a large library of components that are tuned to the biomedical domain. Users can rapidly develop biomedical text mining workflows by mixing and matching U-Compare's components. Workflows developed using U-Compare can be exported and sent to other users who, in turn, can import and re-use them. However, the resulting workflows are standalone applications, i.e., software tools that run and are accessible only via a local machine, and that can only be run with the U-Compare platform. We address the above issues by extending U-Compare to convert standalone workflows into web services automatically, via a two-click process. The resulting web services can be registered on a central server and made publicly available. Alternatively, users can make web services available on their own servers, after installing the web application framework, which is part of the extension to U-Compare. We have performed a user-oriented evaluation of the proposed extension, by asking users who have tested the enhanced functionality of U-Compare to complete questionnaires that assess its functionality, reliability, usability, efficiency and maintainability. The results obtained reveal that the new functionality is well received by users. The web services produced by U-Compare are built on top of open standards, i.e., REST and SOAP protocols, and therefore, they are decoupled from the underlying platform. Exported workflows can be integrated with any application that supports these open standards. We demonstrate how the newly extended U-Compare enhances the cross-platform interoperability of workflows, by seamlessly importing a number of text mining workflow web services exported from U-Compare into Taverna, i.e., a generic scientific workflow construction platform.
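
    The two-click export mechanism is internal to U-Compare; as a generic, stdlib-only Python sketch of what "exposing a text-mining step as a REST web service" means, consider the following (the endpoint path and the toy annotator are invented for illustration):

        # Generic illustration of wrapping a text-mining step as a REST
        # service (the real U-Compare export mechanism is internal to that
        # platform). Endpoint path and toy annotator are invented.
        import json
        from http.server import BaseHTTPRequestHandler, HTTPServer

        def annotate(text):
            """Toy 'workflow': tag capitalized tokens as candidate entities."""
            return [tok for tok in text.split() if tok[:1].isupper()]

        class Handler(BaseHTTPRequestHandler):
            def do_POST(self):
                if self.path != "/annotate":
                    self.send_error(404)
                    return
                length = int(self.headers.get("Content-Length", 0))
                text = self.rfile.read(length).decode("utf-8")
                body = json.dumps({"entities": annotate(text)}).encode("utf-8")
                self.send_response(200)
                self.send_header("Content-Type", "application/json")
                self.end_headers()
                self.wfile.write(body)

        if __name__ == "__main__":
            HTTPServer(("localhost", 8080), Handler).serve_forever()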

  2. Ferret Workflow Anomaly Detection System

    DTIC Science & Technology

    2005-02-28

    The Ferret workflow anomaly detection system project 2003-2004 has provided validation and anomaly detection in accredited workflows in secure...completed to accomplish a goal. Anomaly detection is the determination that a condition departs from the expected. The baseline behavior from which the

  3. A Kepler Workflow Tool for Reproducible AMBER GPU Molecular Dynamics.

    PubMed

    Purawat, Shweta; Ieong, Pek U; Malmstrom, Robert D; Chan, Garrett J; Yeung, Alan K; Walker, Ross C; Altintas, Ilkay; Amaro, Rommie E

    2017-06-20

    With the drive toward high throughput molecular dynamics (MD) simulations involving ever-greater numbers of simulation replicates run for longer, biologically relevant timescales (microseconds), the need for improved computational methods that facilitate fully automated MD workflows gains more importance. Here we report the development of an automated workflow tool to perform AMBER GPU MD simulations. Our workflow tool capitalizes on the capabilities of the Kepler platform to deliver a flexible, intuitive, and user-friendly environment and the AMBER GPU code for a robust and high-performance simulation engine. Additionally, the workflow tool reduces user input time by automating repetitive processes and facilitates access to GPU clusters, whose high-performance processing power makes simulations of large numerical scale possible. The presented workflow tool facilitates the management and deployment of large sets of MD simulations on heterogeneous computing resources. The workflow tool also performs systematic analysis on the simulation outputs and enhances simulation reproducibility, execution scalability, and MD method development including benchmarking and validation. Copyright © 2017 Biophysical Society. Published by Elsevier Inc. All rights reserved.

  4. Workflows for microarray data processing in the Kepler environment

    PubMed Central

    2012-01-01

    Background Microarray data analysis has been the subject of extensive and ongoing pipeline development due to its complexity, the availability of several options at each analysis step, and the development of new analysis demands, including integration with new data sources. Bioinformatics pipelines are usually custom built for different applications, making them typically difficult to modify, extend and repurpose. Scientific workflow systems are intended to address these issues by providing general-purpose frameworks in which to develop and execute such pipelines. The Kepler workflow environment is a well-established system under continual development that is employed in several areas of scientific research. Kepler provides a flexible graphical interface, featuring clear display of parameter values, for design and modification of workflows. It has capabilities for developing novel computational components in the R, Python, and Java programming languages, all of which are widely used for bioinformatics algorithm development, along with capabilities for invoking external applications and using web services. Results We developed a series of fully functional bioinformatics pipelines addressing common tasks in microarray processing in the Kepler workflow environment. These pipelines consist of a set of tools for GFF file processing of NimbleGen chromatin immunoprecipitation on microarray (ChIP-chip) datasets and more comprehensive workflows for Affymetrix gene expression microarray bioinformatics and basic primer design for PCR experiments, which are often used to validate microarray results. Although functional in themselves, these workflows can be easily customized, extended, or repurposed to match the needs of specific projects and are designed to be a toolkit and starting point for specific applications. These workflows illustrate a workflow programming paradigm focusing on local resources (programs and data) and therefore are close to traditional shell scripting or

  5. Workflows for microarray data processing in the Kepler environment.

    PubMed

    Stropp, Thomas; McPhillips, Timothy; Ludäscher, Bertram; Bieda, Mark

    2012-05-17

    Microarray data analysis has been the subject of extensive and ongoing pipeline development due to its complexity, the availability of several options at each analysis step, and the development of new analysis demands, including integration with new data sources. Bioinformatics pipelines are usually custom built for different applications, making them typically difficult to modify, extend and repurpose. Scientific workflow systems are intended to address these issues by providing general-purpose frameworks in which to develop and execute such pipelines. The Kepler workflow environment is a well-established system under continual development that is employed in several areas of scientific research. Kepler provides a flexible graphical interface, featuring clear display of parameter values, for design and modification of workflows. It has capabilities for developing novel computational components in the R, Python, and Java programming languages, all of which are widely used for bioinformatics algorithm development, along with capabilities for invoking external applications and using web services. We developed a series of fully functional bioinformatics pipelines addressing common tasks in microarray processing in the Kepler workflow environment. These pipelines consist of a set of tools for GFF file processing of NimbleGen chromatin immunoprecipitation on microarray (ChIP-chip) datasets and more comprehensive workflows for Affymetrix gene expression microarray bioinformatics and basic primer design for PCR experiments, which are often used to validate microarray results. Although functional in themselves, these workflows can be easily customized, extended, or repurposed to match the needs of specific projects and are designed to be a toolkit and starting point for specific applications. These workflows illustrate a workflow programming paradigm focusing on local resources (programs and data) and therefore are close to traditional shell scripting or R
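
    The local-resources paradigm the authors mention (close to traditional shell scripting) can be sketched generically: chain external programs through files, stopping on the first failure. The tool and file names below are placeholders, not the paper's actual pipelines.

        # Generic sketch of a local-resource workflow (close to shell
        # scripting): chain external programs through files and stop on
        # the first failure. Tool and file names are placeholders.
        import subprocess

        STEPS = [
            ["sort", "probes.gff", "-o", "probes.sorted.gff"],
            ["gzip", "-kf", "probes.sorted.gff"],
        ]

        for cmd in STEPS:
            print("running:", " ".join(cmd))
            subprocess.run(cmd, check=True)  # raise and stop on any failure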

  6. Phase Segmentation Methods for an Automatic Surgical Workflow Analysis

    PubMed Central

    Sakurai, Ryuhei; Yamazoe, Hirotake

    2017-01-01

    In this paper, we present robust methods for automatically segmenting phases in a specified surgical workflow by using latent Dirichlet allocation (LDA) and hidden Markov model (HMM) approaches. More specifically, our goal is to output an appropriate phase label for each given time point of a surgical workflow in an operating room. The fundamental idea behind our work lies in constructing an HMM based on observed values obtained via an LDA topic model covering optical flow motion features of general working contexts, including medical staff, equipment, and materials. We capture these working contexts using multiple synchronized cameras recording the surgical workflow. Further, we validate the robustness of our methods by conducting experiments involving up to 12 phases of surgical workflows, with the average length of each surgical workflow being 12.8 minutes. The maximum average accuracy achieved after applying leave-one-out cross-validation was 84.4%, which we found to be a very promising result. PMID:28408921
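
    The learned LDA and HMM parameters are of course specific to the paper; the sketch below shows only the decoding step such an approach rests on: Viterbi inference of the most likely phase sequence from per-frame phase log-likelihoods. All matrices here are toy values.

        # Toy sketch of the HMM decoding step in LDA+HMM phase
        # segmentation: Viterbi inference of the most likely phase
        # sequence. All matrices are illustrative, not learned parameters.
        import numpy as np

        def viterbi(log_init, log_trans, log_emit):
            """log_emit: (T, K) per-frame log-likelihoods of K phases."""
            T, K = log_emit.shape
            score = log_init + log_emit[0]
            back = np.zeros((T, K), dtype=int)
            for t in range(1, T):
                cand = score[:, None] + log_trans        # (K, K) transition scores
                back[t] = cand.argmax(axis=0)
                score = cand.max(axis=0) + log_emit[t]
            path = [int(score.argmax())]
            for t in range(T - 1, 0, -1):                # trace back best path
                path.append(int(back[t][path[-1]]))
            return path[::-1]

        rng = np.random.default_rng(0)
        K, T = 3, 8
        log_init = np.log(np.full(K, 1.0 / K))
        trans = np.full((K, K), 0.1) + np.eye(K) * 0.7   # "sticky" phases
        log_trans = np.log(trans / trans.sum(axis=1, keepdims=True))
        log_emit = np.log(rng.dirichlet(np.ones(K), size=T))
        print(viterbi(log_init, log_trans, log_emit))    # most likely phase per frame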

  7. Web-video-mining-supported workflow modeling for laparoscopic surgeries.

    PubMed

    Liu, Rui; Zhang, Xiaoli; Zhang, Hao

    2016-11-01

    As quality assurance is of strong concern in advanced surgeries, intelligent surgical systems are expected to have knowledge such as the surgical workflow model (SWM) to support intuitive cooperation with surgeons. Generating a robust and reliable SWM requires a large amount of training data. However, training data collected by physically recording surgical operations is often limited, and data collection is time-consuming and labor-intensive, severely limiting the knowledge scalability of surgical systems. The objective of this research is to solve the knowledge scalability problem in surgical workflow modeling in a low-cost, labor-efficient way. A novel web-video-mining-supported surgical workflow modeling (webSWM) method is developed. A novel video quality analysis method based on topic analysis and sentiment analysis techniques is developed to select high-quality videos from abundant and noisy web videos. A statistical learning method is then used to build the workflow model based on the selected videos. To test the effectiveness of the webSWM method, 250 web videos were mined to generate a surgical workflow for robotic cholecystectomy. The generated workflow was evaluated on 4 web-retrieved videos and 4 operating-room-recorded videos, respectively. The evaluation results (video selection consistency n-index ≥0.60; surgical workflow matching degree ≥0.84) proved the effectiveness of the webSWM method in generating robust and reliable SWM knowledge by mining web videos. With the webSWM method, abundant web videos were selected and a reliable SWM was modeled in a short time with low labor cost. The satisfactory performance in mining web videos and learning surgery-related knowledge shows that the webSWM method is promising for scaling knowledge for intelligent surgical systems. Copyright © 2016 Elsevier B.V. All rights reserved.

  8. Designing a Roadmap for Workflow Cyberinfrastructure in the Geosciences: From Big Data to the Long Tail

    NASA Astrophysics Data System (ADS)

    Gil, Y.; Deelman, E.; Demir, I.; Duffy, C.; Marru, S.; Pierce, M. E.; Wiener, G.

    2012-12-01

    Scientific activities can be seen as collections of interdependent steps represented as workflows. Gathering and analyzing data, coordinating computational experiments, and publishing results and data products are organized activities traditionally captured in research notebooks. Today we have the ability to digitally codify much of these activities, particularly for computational experiments, using workflow technologies. Workflows may be used to execute enormous computations, to combine distributed data and computing resources in novel ways, and to guide scientists through complex processes. When combined with metadata and provenance-capturing capabilities, workflows allow reproducibility of results, increased efficiency, and enhanced publications. The challenge before us is to make these tools ubiquitously available, enhanced, and adopted for the geosciences. The EarthCube Workflows Community Group was created as part of the NSF EarthCube initiative. Its goal is to constitute a broad community within the geosciences that will identify both short-term problems and long-term challenges for scientific workflows. Aspects of this goal include better education and outreach, better understanding of the different types of workflows, better collaboration between workflow software developers and geoscientists, the identification of gaps, and a vision for geoscience grand challenges that no workflow technology can currently address. The EarthCube Workflows Community Group has established an open process of collecting community input to create a roadmap for workflows in geosciences. The group seeks contributions and feedback on the current draft roadmap from scientists and end users, particularly those that have had minimal exposure to cyberinfrastructure capabilities. The roadmap is accessible in the Workflows area of the EarthCube site (http://earthcube.ning.com/group/workflow). This roadmap is considered a living document that will be extended based on community feedback.

  9. Kronos: a workflow assembler for genome analytics and informatics

    PubMed Central

    Taghiyar, M. Jafar; Rosner, Jamie; Grewal, Diljot; Grande, Bruno M.; Aniba, Radhouane; Grewal, Jasleen; Boutros, Paul C.; Morin, Ryan D.

    2017-01-01

    Abstract Background: The field of next-generation sequencing informatics has matured to a point where algorithmic advances in sequence alignment and individual feature detection methods have stabilized. Practical and robust implementation of complex analytical workflows (where such tools are structured into “best practices” for automated analysis of next-generation sequencing datasets) still requires significant programming investment and expertise. Results: We present Kronos, a software platform for facilitating the development and execution of modular, auditable, and distributable bioinformatics workflows. Kronos obviates the need for explicit coding of workflows by compiling a text configuration file into executable Python applications. Creating new analysis modules, however, still requires programming. The framework of each workflow includes a run manager to execute the encoded workflows locally (or on a cluster or cloud), parallelize tasks, and log all runtime events. The resulting workflows are highly modular and configurable by construction, facilitating flexible and extensible meta-applications that can be modified easily through configuration file editing. The workflows are fully encoded for ease of distribution and can be instantiated on external systems, a step toward reproducible research and comparative analyses. We introduce a framework for building Kronos components that function as shareable, modular nodes in Kronos workflows. Conclusions: The Kronos platform provides a standard framework for developers to implement custom tools, reuse existing tools, and contribute to the community at large. Kronos is shipped with both Docker and Amazon Web Services Machine Images. It is free, open source, and available through the Python Package Index and at https://github.com/jtaghiyar/kronos. PMID:28655203

  10. LQCD workflow execution framework: Models, provenance and fault-tolerance

    NASA Astrophysics Data System (ADS)

    Piccoli, Luciano; Dubey, Abhishek; Simone, James N.; Kowalkowlski, James B.

    2010-04-01

    Large computing clusters used for scientific processing suffer from systemic failures when operated over long continuous periods for executing workflows. Diagnosing job problems and faults leading to eventual failures in this complex environment is difficult, specifically when the success of an entire workflow might be affected by a single job failure. In this paper, we introduce a model-based, hierarchical, reliable execution framework that encompasses workflow specification, data provenance, execution tracking and online monitoring of each workflow task, also referred to as participants. The sequence of participants is described in an abstract parameterized view, which is translated into a concrete data-dependency-based sequence of participants with defined arguments. As participants belonging to a workflow are mapped onto machines and executed, periodic and on-demand monitoring of vital health parameters on allocated nodes is enabled according to pre-specified rules. These rules specify conditions that must be true pre-execution, during execution and post-execution. Monitoring information for each participant is propagated upwards through the reflex and healing architecture, which consists of a hierarchical network of decentralized fault management entities, called reflex engines. They are instantiated as state machines or timed automata that change state and initiate reflexive mitigation action(s) upon the occurrence of certain faults. We describe how this cluster reliability framework is combined with the workflow execution framework using formal rules and actions specified within a structure of first-order predicate logic, enabling a dynamic management design that reduces manual administrative workload and increases cluster productivity.
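
    The paper's rules are expressed in first-order predicate logic and its reflex engines as state machines or timed automata; the following deliberately simplified Python sketch (thresholds, metric names, and the mitigation action are invented) illustrates only the basic reflex-engine idea of rule-triggered mitigation.

        # Deliberately simplified illustration of a "reflex engine": a
        # small state machine that checks health rules and triggers a
        # mitigation action on violation. Metrics and actions are invented.
        class ReflexEngine:
            def __init__(self, rules):
                self.rules = rules      # list of (predicate, action) pairs
                self.state = "HEALTHY"

            def observe(self, metrics):
                for predicate, action in self.rules:
                    if predicate(metrics):
                        self.state = "MITIGATING"
                        action(metrics)
                        return
                self.state = "HEALTHY"

        def restart_participant(metrics):
            print("fault detected, restarting participant:", metrics)

        engine = ReflexEngine(rules=[
            (lambda m: m["free_disk_gb"] < 1.0, restart_participant),
            (lambda m: m["heartbeat_age_s"] > 60, restart_participant),
        ])

        engine.observe({"free_disk_gb": 0.5, "heartbeat_age_s": 5})
        print(engine.state)             # MITIGATING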

  11. Essential Grid Workflow Monitoring Elements

    SciTech Connect

    Gunter, Daniel K.; Jackson, Keith R.; Konerding, David E.; Lee,Jason R.; Tierney, Brian L.

    2005-07-01

    Troubleshooting Grid workflows is difficult. A typical workflow involves a large number of components (networks, middleware, hosts, etc.) that can fail. Even when monitoring data from all these components is accessible, it is hard to tell whether failures and anomalies in these components are related to a given workflow. For the Grid to be truly usable, much of this uncertainty must be eliminated. We propose two new Grid monitoring elements, Grid workflow identifiers and consistent component lifecycle events, that will make Grid troubleshooting easier, and thus make Grids more usable, by simplifying the correlation of Grid monitoring data with a particular Grid workflow.
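
    The proposal's core idea, stamping every monitoring event with a global workflow identifier plus component lifecycle events so that logs from many components can be correlated afterwards, can be sketched as structured logging; the field names below are hypothetical, not the paper's schema.

        # Sketch of correlating monitoring data by workflow: every event
        # carries a global workflow identifier and a lifecycle event name.
        # Field names are hypothetical, not the paper's schema.
        import json
        import time
        import uuid

        WORKFLOW_ID = str(uuid.uuid4())   # one identifier for the whole workflow

        def log_event(component, event, **extra):
            record = {"ts": time.time(), "workflow_id": WORKFLOW_ID,
                      "component": component, "event": event, **extra}
            print(json.dumps(record))     # stand-in for a real monitoring sink

        log_event("transfer_service", "start", host="gridftp01")
        log_event("transfer_service", "end", status="ok")
        log_event("compute_job", "start", host="node42")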

  12. On Nondeterministic Workflow Executions

    NASA Astrophysics Data System (ADS)

    Potapova, Alexandra; Su, Jianwen

    The ability to compose existing services to form new functionality is one of the most promising ideas enabled by SOA and the framework of (web) services. A composition or a workflow often involves services distributed over a network and possibly many organizations and administrative domains. Nondeterminism can occur in a composition in at least two ways. The first form is the result of modeling abstraction that hides detailed information and thus makes the "computation" appear nondeterministic. The second form is closely related to "operational optimization": for example, one may invoke multiple services for the same task, and whichever completes first produces the result and preempts all the other services. In this paper, we focus on the latter and measure the complexity of service execution as the amount of resources and controlling mechanisms needed for executing nondeterministic service compositions. We formalize the model and the complexity problem, and develop technical results for this problem in the general setting as well as in special cases.
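
    The second, "operational" form of nondeterminism, racing several redundant services and letting the first completion preempt the rest, looks roughly like this in Python's asyncio (the service stubs and their latencies are invented):

        # Illustration of the "operational optimization" form of
        # nondeterminism: redundant services race, the first completion
        # cancels the rest. Service stubs and latencies are invented.
        import asyncio

        async def service(name, delay):
            await asyncio.sleep(delay)    # stand-in for a remote invocation
            return f"result from {name}"

        async def first_wins():
            tasks = [asyncio.create_task(service("mirror_a", 0.3)),
                     asyncio.create_task(service("mirror_b", 0.1)),
                     asyncio.create_task(service("mirror_c", 0.2))]
            done, pending = await asyncio.wait(
                tasks, return_when=asyncio.FIRST_COMPLETED)
            for task in pending:          # preempt the slower services
                task.cancel()
            return done.pop().result()

        print(asyncio.run(first_wins()))  # "result from mirror_b"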

  13. Domain-Specific Languages For Developing and Deploying Signature Discovery Workflows

    SciTech Connect

    Jacob, Ferosh; Wynne, Adam S.; Liu, Yan; Gray, Jeff

    2013-12-02

    Domain-agnostic Signature Discovery entails scientific investigation across multiple domains through the re-use of existing algorithms into workflows. The existing algorithms may be written in any programming language for various hardware architectures (e.g., desktops, commodity clusters, and specialized parallel hardware platforms). This raises an engineering issue in generating Web services for heterogeneous algorithms so that they can be composed into a scientific workflow environment (e.g., Taverna). In this paper, we present our software tool that defines two simple Domain-Specific Languages (DSLs) to automate these processes: SDL and WDL. Our Service Description Language (SDL) describes key elements of a signature discovery algorithm and generates the service code. The Workflow Description Language (WDL) describes the pipeline of services and generates deployable artifacts for the Taverna workflow management system. We demonstrate our tool with a landscape classification example that is represented by BLAST workflows composed of services that wrap original scripts.

  14. Standards for business analytics and departmental workflow.

    PubMed

    Erickson, Bradley J; Meenan, Christopher; Langer, Steve

    2013-02-01

    Efficient workflow is essential for a successful business. However, there is relatively little literature on analytical tools and standards for defining workflow and measuring workflow efficiency. Here, we describe an effort to define a workflow lexicon for medical imaging departments, including the rationale, the process, and the resulting lexicon.

  15. Managing and Communicating Operational Workflow

    PubMed Central

    Weinberg, Stuart T.; Danciu, Ioana; Unertl, Kim M.

    2016-01-01

    Summary Background Healthcare team members in emergency department contexts have used electronic whiteboard solutions to help manage operational workflow for many years. Ambulatory clinic settings have highly complex operational workflow, but are still limited in electronic assistance to communicate and coordinate work activities. Objective To describe and discuss the design, implementation, use, and ongoing evolution of a coordination and collaboration tool supporting ambulatory clinic operational workflow at Vanderbilt University Medical Center (VUMC). Methods The outpatient whiteboard tool was initially designed to support healthcare work related to an electronic chemotherapy order-entry application. After a highly successful initial implementation in an oncology context, a high demand emerged across the organization for the outpatient whiteboard implementation. Over the past 10 years, developers have followed an iterative user-centered design process to evolve the tool. Results The electronic outpatient whiteboard system supports 194 separate whiteboards and is accessed by over 2800 distinct users on a typical day. Clinics can configure their whiteboards to support unique workflow elements. Since initial release, features such as immunization clinical decision support have been integrated into the system, based on requests from end users. Conclusions The success of the electronic outpatient whiteboard demonstrates the usefulness of an operational workflow tool within the ambulatory clinic setting. Operational workflow tools can play a significant role in supporting coordination, collaboration, and teamwork in ambulatory healthcare settings. PMID:27081407

  16. Time-Bound Analytic Tasks on Large Data Sets Through Dynamic Configuration of Workflows

    DTIC Science & Technology

    2013-11-01

    Distributed and Data Intensive Scientific Applications.” Proceedings of the 28th International Conference on Software Engineering (ICSE06), pp. 721-730. ... “Execution of Data-Intensive Scientific Workflows.” Cluster Computing Journal, 13(3), 2010. [18] Langford, J. Vowpal Wabbit. https://github.com/JohnLangford ... transformation, and ultimately to data distribution [Woollard et al 2008]. Experts can create workflows that represent complex multi-step analytic

  17. EPiK-a Workflow for Electron Tomography in Kepler*

    PubMed Central

    Wang, Jianwu; Crawl, Daniel; Phan, Sébastien; Lawrence, Albert; Ellisman, Mark

    2015-01-01

    Scientific workflows integrate data and computing interfaces as configurable, semi-automatic graphs to solve a scientific problem. Kepler is such a software system for designing, executing, reusing, evolving, archiving and sharing scientific workflows. Electron tomography (ET) enables high-resolution views of complex cellular structures, such as cytoskeletons, organelles, viruses and chromosomes. Imaging investigations produce large datasets. For instance, in electron tomography, a 16-fold image tilt series is about 65 gigabytes, with each projection image comprising 4096 by 4096 pixels. When serial sections or montage techniques are used for large-field ET, the dataset is even larger. For higher-resolution images with multiple tilt series, the data size may be in the terabyte range. The demands of mass data processing and complex algorithms require the integration of diverse codes into flexible software structures. This paper describes a workflow for Electron Tomography Programs in Kepler (EPiK). The EPiK workflow embeds the tracking process of IMOD and realizes the main algorithms, including filtered backprojection (FBP) from TxBR and iterative reconstruction methods. We have tested the three-dimensional (3D) reconstruction process using EPiK on ET data. EPiK can be a potential toolkit for biology researchers, with the advantages of logical viewing, easy handling, convenient sharing and future extensibility. PMID:25621086

  18. EPiK-a Workflow for Electron Tomography in Kepler.

    PubMed

    Chen, Ruijuan; Wan, Xiaohua; Altintas, Ilkay; Wang, Jianwu; Crawl, Daniel; Phan, Sébastien; Lawrence, Albert; Ellisman, Mark

    Scientific workflows integrate data and computing interfaces as configurable, semi-automatic graphs to solve a scientific problem. Kepler is such a software system for designing, executing, reusing, evolving, archiving and sharing scientific workflows. Electron tomography (ET) enables high-resolution views of complex cellular structures, such as cytoskeletons, organelles, viruses and chromosomes. Imaging investigations produce large datasets. For instance, in electron tomography, a 16-fold image tilt series is about 65 gigabytes, with each projection image comprising 4096 by 4096 pixels. When serial sections or montage techniques are used for large-field ET, the dataset is even larger. For higher-resolution images with multiple tilt series, the data size may be in the terabyte range. The demands of mass data processing and complex algorithms require the integration of diverse codes into flexible software structures. This paper describes a workflow for Electron Tomography Programs in Kepler (EPiK). The EPiK workflow embeds the tracking process of IMOD and realizes the main algorithms, including filtered backprojection (FBP) from TxBR and iterative reconstruction methods. We have tested the three-dimensional (3D) reconstruction process using EPiK on ET data. EPiK can be a potential toolkit for biology researchers, with the advantages of logical viewing, easy handling, convenient sharing and future extensibility.

  19. Drug discovery FAQs: workflows for answering multidomain drug discovery questions.

    PubMed

    Chichester, Christine; Digles, Daniela; Siebes, Ronald; Loizou, Antonis; Groth, Paul; Harland, Lee

    2015-04-01

    Modern data-driven drug discovery requires integrated resources to support decision-making and enable new discoveries. The Open PHACTS Discovery Platform (http://dev.openphacts.org) was built to address this requirement by focusing on drug discovery questions that are of high priority to the pharmaceutical industry. Although complex, most of these frequently asked questions (FAQs) revolve around the combination of data concerning compounds, targets, pathways and diseases. Computational drug discovery using workflow tools and the integrated resources of Open PHACTS can deliver answers to most of these questions. Here, we report on a selection of workflows used for solving these use cases and discuss some of the research challenges. The workflows are accessible online from myExperiment (http://www.myexperiment.org) and are available for reuse by the scientific community.

  20. Astronomical Data Reduction Workflows with Reflex

    NASA Astrophysics Data System (ADS)

    Ballester, P.; Bramich, D.; Forchi, V.; Freudling, W.; Garcia-Dabó, C. E.; klein Gebbinck, M.; Modigliani, A.; Moehler, S.; Romaniello, M.

    2014-05-01

    Reflex (http://www.eso.org/reflex) is an environment that provides an easy and flexible way to reduce VLT/VLTI science data using the ESO pipelines. Its top-level functionalities are: (1) Reflex allows users to graphically specify the sequence in which the data reduction steps are executed, including conditional stops, loops and conditional branches; (2) Reflex makes it easy to inspect the intermediate and final data products and to repeat selected processing steps to optimize the data reduction; (3) the data organization necessary to reduce the data is built into the system and is fully automatic; (4) advanced users can plug their own Python or IDL modules and steps into the data reduction sequence; and (5) Reflex supports the development of data reduction workflows based on the ESO Common Pipeline Library. Reflex is based on the concept of a scientific workflow, whereby the data reduction cascade is rendered graphically and data seamlessly flow from one processing step to the next. It is distributed with a number of complete test datasets so that users can immediately start experimenting and familiarize themselves with the system (http://www.eso.org/pipelines). In this demo, we present the latest version of Reflex and its applications for astronomical data reduction processes.

  1. It's All About the Data: Workflow Systems and Weather

    NASA Astrophysics Data System (ADS)

    Plale, B.

    2009-05-01

    Digital data is fueling new advances in the computational sciences, particularly geospatial research as environmental sensing grows more practical through reduced technology costs, broader network coverage, and better instruments. e-Science research (i.e., cyberinfrastructure research) has responded to data intensive computing with tools, systems, and frameworks that support computationally oriented activities such as modeling, analysis, and data mining. Workflow systems support execution of sequences of tasks on behalf of a scientist. These systems, such as Taverna, Apache ODE, and Kepler, when built as part of a larger cyberinfrastructure framework, give the scientist tools to construct task graphs of execution sequences, often through a visual interface for connecting task boxes together with arcs representing control flow or data flow. Unlike business processing workflows, scientific workflows expose a high degree of detail and control during configuration and execution. Data-driven science imposes unique needs on workflow frameworks. Our research is focused on two issues. The first is the support for workflow-driven analysis over all kinds of data sets, including real time streaming data and locally owned and hosted data. The second is the essential role metadata/provenance collection plays in data driven science, for discovery, determining quality, for science reproducibility, and for long-term preservation. The research has been conducted over the last 6 years in the context of cyberinfrastructure for mesoscale weather research carried out as part of the Linked Environments for Atmospheric Discovery (LEAD) project. LEAD has pioneered new approaches for integrating complex weather data, assimilation, modeling, mining, and cyberinfrastructure systems. Workflow systems have the potential to generate huge volumes of data. Without some form of automated metadata capture, either metadata description becomes largely a manual task that is difficult if not impossible

  2. Scalable Analysis of Distributed Workflow Traces

    SciTech Connect

    Gunter, Daniel K.; Tierney, Brian L.; Bailey, Stephen J.

    2005-06-01

    Large-scale workflows are becoming increasingly important in both the scientific research and business domains. Science and commerce have both experienced an explosion in the sheer amount of data that must be analyzed. An important tool for analyzing these huge datasets is a compute cluster of hundreds or thousands of machines. However, debugging and tuning clusters requires specialized tools. Current cluster performance tools are more oriented towards tightly coupled parallel applications. We describe how the NetLogger Toolkit methodology is more appropriate for this class of cluster computing, and describe our new automatic workflow anomaly detection component. We also describe how this methodology is being used in the Nearby Supernova Factory (SN factory) project at Lawrence Berkeley National Laboratory.

  3. Introducing students to digital geological mapping: A workflow based on cheap hardware and free software

    NASA Astrophysics Data System (ADS)

    Vrabec, Marko; Dolžan, Erazem

    2016-04-01

    The undergraduate field course in Geological Mapping at the University of Ljubljana involves 20-40 students per year, which precludes the use of specialized rugged digital field equipment as the costs would be way beyond the capabilities of the Department. A different mapping area is selected each year with the aim to provide typical conditions that a professional geologist might encounter when doing fieldwork in Slovenia, which includes rugged relief, dense tree cover, and moderately-well- to poorly-exposed bedrock due to vegetation and urbanization. It is therefore mandatory that the digital tools and workflows are combined with classical methods of fieldwork, since, for example, full-time precise GNSS positioning is not viable under such circumstances. Additionally, due to the prevailing combination of complex geological structure with generally poor exposure, students cannot be expected to produce line (vector) maps of geological contacts on the go, so there is no need for such functionality in hardware and software that we use in the field. Our workflow therefore still relies on paper base maps, but is strongly complemented with digital tools to provide robust positioning, track recording, and acquisition of various point-based data. Primary field hardware are students' Android-based smartphones and optionally tablets. For our purposes, the built-in GNSS chips provide adequate positioning precision most of the time, particularly if they are GLONASS-capable. We use Oruxmaps, a powerful free offline map viewer for the Android platform, which facilitates the use of custom-made geopositioned maps. For digital base maps, which we prepare in free Windows QGIS software, we use scanned topographic maps provided by the National Geodetic Authority, but also other maps such as aerial imagery, processed Digital Elevation Models, scans of existing geological maps, etc. Point data, like important outcrop locations or structural measurements, are entered into Oruxmaps as

  4. Classification of bioinformatics workflows using weighted versions of partitioning and hierarchical clustering algorithms.

    PubMed

    Lord, Etienne; Diallo, Abdoulaye Baniré; Makarenkov, Vladimir

    2015-03-03

    Workflows, or computational pipelines, consisting of collections of multiple linked tasks are becoming more and more popular in many scientific fields, including computational biology. For example, simulation studies, which are now a must for statistical validation of new bioinformatics methods and software, are frequently carried out using the available workflow platforms. Workflows are typically organized to minimize the total execution time and to maximize the efficiency of the included operations. Clustering algorithms can be applied either for regrouping similar workflows for their simultaneous execution on a server, or for dispatching some lengthy workflows to different servers, or for classifying the available workflows with a view to performing a specific keyword search. In this study, we consider four different workflow encoding and clustering schemes which are representative of bioinformatics projects. Some of them allow for clustering workflows with similar topological features, while the others regroup workflows according to their specific attributes (e.g. associated keywords) or execution time. The four types of workflow encoding examined in this study were compared using the weighted versions of k-means and k-medoids partitioning algorithms. The Calinski-Harabasz, Silhouette and logSS clustering indices were considered. Hierarchical classification methods, including the UPGMA, Neighbor Joining, Fitch and Kitsch algorithms, were also applied to classify bioinformatics workflows. Moreover, a novel pairwise measure of clustering solution stability, which can be computed in situations where a series of independent program runs is carried out, was introduced. Our findings based on the analysis of 220 real-life bioinformatics workflows suggest that the weighted clustering models based on keyword information or task execution times provide the most appropriate clustering solutions. Using datasets generated by the Armadillo and Taverna scientific workflow
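
    As a small illustration of one ingredient of the study, weighted k-means scored with the Silhouette index, here is a scikit-learn sketch on synthetic feature vectors; the vectors and weights merely stand in for the paper's workflow encodings (e.g., task execution times used as weights).

        # Minimal sketch of weighted k-means scored with the Silhouette
        # index. Synthetic vectors and weights stand in for the paper's
        # workflow encodings (e.g., execution times as weights).
        import numpy as np
        from sklearn.cluster import KMeans
        from sklearn.metrics import silhouette_score

        rng = np.random.default_rng(0)
        X = np.vstack([rng.normal(0, 1, (50, 4)),     # two synthetic groups
                       rng.normal(5, 1, (50, 4))])
        weights = rng.uniform(0.5, 2.0, size=len(X))  # e.g., execution times

        km = KMeans(n_clusters=2, n_init=10, random_state=0)
        labels = km.fit_predict(X, sample_weight=weights)
        print("silhouette:", silhouette_score(X, labels))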

  5. The Equivalency between Logic Petri Workflow Nets and Workflow Nets

    PubMed Central

    Wang, Jing; Yu, ShuXia; Du, YuYue

    2015-01-01

    Logic Petri nets (LPNs) can describe and analyze batch processing functions and passing-value indeterminacy in cooperative systems. Logic Petri workflow nets (LPWNs) are proposed based on LPNs in this paper. Process mining is regarded as an important bridge between the modeling and analysis of data mining and business processes. Workflow nets (WF-nets) are an extension of Petri nets (PNs) and have been used successfully in process mining. Some shortcomings cannot be avoided in process mining, such as duplicate tasks, invisible tasks, and the noise of logs. An electronic-commerce online shop is modeled in this paper to prove the equivalence between LPWNs and WF-nets, and the advantages of LPWNs are presented. PMID:25821845

  6. The equivalency between logic Petri workflow nets and workflow nets.

    PubMed

    Wang, Jing; Yu, ShuXia; Du, YuYue

    2015-01-01

    Logic Petri nets (LPNs) can describe and analyze batch processing functions and passing-value indeterminacy in cooperative systems. Logic Petri workflow nets (LPWNs) are proposed based on LPNs in this paper. Process mining is regarded as an important bridge between the modeling and analysis of data mining and business processes. Workflow nets (WF-nets) are an extension of Petri nets (PNs) and have been used successfully in process mining. Some shortcomings cannot be avoided in process mining, such as duplicate tasks, invisible tasks, and the noise of logs. An electronic-commerce online shop is modeled in this paper to prove the equivalence between LPWNs and WF-nets, and the advantages of LPWNs are presented.

  7. Experimental evaluation of a flexible I/O architecture for accelerating workflow engines in ultrascale environments

    SciTech Connect

    Duro, Francisco Rodrigo; Blas, Javier Garcia; Isaila, Florin; Carretero, Jesus; Wozniak, Justin M.; Ross, Rob

    2016-10-06

    The increasing volume of scientific data and the limited scalability and performance of storage systems are currently presenting a significant limitation for the productivity of the scientific workflows running on both high-performance computing (HPC) and cloud platforms. Clearly needed is better integration of storage systems and workflow engines to address this problem. This paper presents and evaluates a novel solution that leverages codesign principles for integrating Hercules—an in-memory data store—with a workflow management system. We consider four main aspects: workflow representation, task scheduling, task placement, and task termination. As a result, the experimental evaluation on both cloud and HPC systems demonstrates significant performance and scalability improvements over existing state-of-the-art approaches.

  8. Experimental evaluation of a flexible I/O architecture for accelerating workflow engines in ultrascale environments

    DOE PAGES

    Duro, Francisco Rodrigo; Blas, Javier Garcia; Isaila, Florin; ...

    2016-10-06

    The increasing volume of scientific data and the limited scalability and performance of storage systems are currently presenting a significant limitation for the productivity of the scientific workflows running on both high-performance computing (HPC) and cloud platforms. Clearly needed is better integration of storage systems and workflow engines to address this problem. This paper presents and evaluates a novel solution that leverages codesign principles for integrating Hercules—an in-memory data store—with a workflow management system. We consider four main aspects: workflow representation, task scheduling, task placement, and task termination. As a result, the experimental evaluation on both cloud and HPC systems demonstrates significant performance and scalability improvements over existing state-of-the-art approaches.

  9. Workflow-based approaches to neuroimaging analysis.

    PubMed

    Fissell, Kate

    2007-01-01

    Analysis of functional and structural magnetic resonance imaging (MRI) brain images requires a complex sequence of data processing steps to proceed from raw image data to the final statistical tests. Neuroimaging researchers have begun to apply workflow-based computing techniques to automate data analysis tasks. This chapter discusses eight major components of workflow management systems (WFMSs): the workflow description language, editor, task modules, data access, verification, client, engine, and provenance, and their implementation in the Fiswidgets neuroimaging workflow system. Neuroinformatics challenges involved in applying workflow techniques in the domain of neuroimaging are discussed.

  10. Talkoot Portals: Discover, Tag, Share, and Reuse Collaborative Science Workflows

    NASA Astrophysics Data System (ADS)

    Wilson, B. D.; Ramachandran, R.; Lynnes, C.

    2009-05-01

    A small but growing number of scientists are beginning to harness Web 2.0 technologies, such as wikis, blogs, and social tagging, as a transformative way of doing science. These technologies provide researchers easy mechanisms to critique, suggest and share ideas, data and algorithms. At the same time, large suites of algorithms for science analysis are being made available as remotely-invokable Web Services, which can be chained together to create analysis workflows. This provides the research community an unprecedented opportunity to collaborate by sharing their workflows with one another, reproducing and analyzing research results, and leveraging colleagues' expertise to expedite the process of scientific discovery. However, wikis and similar technologies are limited to text, static images and hyperlinks, providing little support for collaborative data analysis. A team of information technology and Earth science researchers from multiple institutions have come together to improve community collaboration in science analysis by developing a customizable "software appliance" to build collaborative portals for Earth Science services and analysis workflows. The critical requirement is that researchers (not just information technologists) be able to build collaborative sites around service workflows within a few hours. We envision online communities coming together, much like Finnish "talkoot" (a barn raising), to build a shared research space. Talkoot extends a freely available, open source content management framework with a series of modules specific to Earth Science for registering, creating, managing, discovering, tagging and sharing Earth Science web services and workflows for science data processing, analysis and visualization. Users will be able to author a "science story" in shareable web notebooks, including plots or animations, backed up by an executable workflow that directly reproduces the science analysis. New services and workflows of interest will be

  11. Workflow viewpoints: Analysis of nursing workflow documentation in the electronic health record.

    PubMed

    Whittenburg, Luann

    2010-01-01

    This article amplifies the emphasis on organizational workflow reignited by the Institute of Medicine reports on healthcare quality. The analysis of nursing workflow is central to understanding the power of technology to modify the fundamental constructs of nursing practice. The aim is to understand the evolution of nursing workflow and the concept of workflow from the management and computer science perspectives used in electronic health records and computerized provider order entry. The understanding of the workflow models within health information disciplines may improve the model of nursing workflow underlying the implementation of electronic health record systems. The article follows the Walker and Avant evolutionary method of concept analysis.

  12. AstroTaverna-Building workflows with Virtual Observatory services

    NASA Astrophysics Data System (ADS)

    Ruiz, J. E.; Garrido, J.; Santander-Vela, J. D.; Sánchez-Expósito, S.; Verdes-Montenegro, L.

    2014-11-01

    Despite the long tradition of publishing digital datasets in Astronomy, and the existence of a rich network of services providing astronomical datasets in standardized interoperable formats through the Virtual Observatory (VO), there has been little use of scientific workflow technologies in this field. In this paper we present AstroTaverna, a plugin that we have developed for the Taverna Workbench scientific workflow management system. It integrates existing VO web services as first-class building blocks in Taverna workflows, allowing the digital capture of otherwise lost procedural steps manually performed in e.g. GUI tools, providing reproducibility and re-use. It improves the readability of digital VO recipes with a comprehensive view of the entire automated execution process, complementing the scarce narratives produced in the classic documentation practices, transforming them into living tutorials for an efficient use of the VO infrastructure. The plugin also adds astronomical data manipulation and transformation tools based on the STIL Tool Set and the integration of Aladin VO software, as well as interactive connectivity with SAMP-compliant astronomy tools.

  13. PANORAMA: An approach to performance modeling and diagnosis of extreme-scale workflows

    DOE PAGES

    Deelman, Ewa; Carothers, Christopher; Mandal, Anirban; ...

    2015-07-14

    Here we report that computational science is well established as the third pillar of scientific discovery and is on par with experimentation and theory. However, as we move closer toward the ability to execute exascale calculations and process the ensuing extreme-scale amounts of data produced by both experiments and computations alike, the complexity of managing the compute and data analysis tasks has grown beyond the capabilities of domain scientists. Therefore, workflow management systems are absolutely necessary to ensure current and future scientific discoveries. A key research question for these workflow management systems concerns the performance optimization of complex calculation and data analysis tasks. The central contribution of this article is a description of the PANORAMA approach for modeling and diagnosing the run-time performance of complex scientific workflows. This approach integrates extreme-scale systems testbed experimentation, structured analytical modeling, and parallel systems simulation into a comprehensive workflow framework called Pegasus for understanding and improving the overall performance of complex scientific workflows.

  15. Designing a road map for geoscience workflows

    NASA Astrophysics Data System (ADS)

    Duffy, Christopher; Gil, Yolanda; Deelman, Ewa; Marru, Suresh; Pierce, Marlon; Demir, Ibrahim; Wiener, Gerry

    2012-06-01

    Advances in geoscience research and discovery are fundamentally tied to data and computation, but formal strategies for managing the diversity of models and data resources in the Earth sciences have not yet been resolved or fully appreciated. The U.S. National Science Foundation (NSF) EarthCube initiative (http://earthcube.ning.com), which aims to support community-guided cyberinfrastructure to integrate data and information across the geosciences, recently funded four community development activities: Geoscience Workflows; Semantics and Ontologies; Data Discovery, Mining, and Integration; and Governance. The Geoscience Workflows working group, with broad participation from the geosciences, cyberinfrastructure, and other relevant communities, is formulating a workflows road map (http://sites.google.com/site/earthcubeworkflow/). The Geoscience Workflows team coordinates with each of the other community development groups given their direct relevance to workflows. Semantics and ontologies are mechanisms for describing workflows and the data they process.

  16. Combining ontologies and workflows to design formal protocols for biological laboratories

    PubMed Central

    2010-01-01

    Background Laboratory protocols in life sciences tend to be written in natural language, with negative consequences for repeatability, distribution and automation of scientific experiments. Formalization of knowledge is becoming popular in science. In the case of laboratory protocols, two levels of formalization are needed: one for the entities and individual operations involved in protocols, and another for the procedures, which can be manually or automatically executed. This study aims to combine ontologies and workflows for protocol formalization. Results A laboratory domain specific ontology and the COW (Combining Ontologies with Workflows) software tool were developed to formalize workflows built on ontologies. A method was specifically set up to support the design of structured protocols for biological laboratory experiments. The workflows were enhanced with ontological concepts taken from the developed domain specific ontology. The experimental protocols represented as workflows are saved in two linked files using two standard interchange languages (i.e. XPDL for workflows and OWL for ontologies). A distribution package of COW, including installation procedure, ontology and workflow examples, is freely available from http://www.bmr-genomics.it/farm/cow. Conclusions Using COW, a laboratory protocol may be directly defined by wet-lab scientists without writing code, which keeps the resulting protocol's specifications clear and easy to read and maintain. PMID:20416048

  17. Construction of biological networks from unstructured information based on a semi-automated curation workflow.

    PubMed

    Szostak, Justyna; Ansari, Sam; Madan, Sumit; Fluck, Juliane; Talikka, Marja; Iskandar, Anita; De Leon, Hector; Hofmann-Apitius, Martin; Peitsch, Manuel C; Hoeng, Julia

    2015-06-17

    Capture and representation of scientific knowledge in a structured format are essential to improve the understanding of biological mechanisms involved in complex diseases. Biological knowledge and knowledge about standardized terminologies are difficult to capture from literature in a usable form. A semi-automated knowledge extraction workflow is presented that was developed to allow users to extract causal and correlative relationships from scientific literature and to transcribe them into the computable and human readable Biological Expression Language (BEL). The workflow combines state-of-the-art linguistic tools for recognition of various entities and extraction of knowledge from literature sources. Unlike most other approaches, the workflow outputs the results to a curation interface for manual curation and converts them into BEL documents that can be compiled to form biological networks. We developed a new semi-automated knowledge extraction workflow that was designed to capture and organize scientific knowledge and reduce the required curation skills and effort for this task. The workflow was used to build a network that represents the cellular and molecular mechanisms implicated in atherosclerotic plaque destabilization in an apolipoprotein-E-deficient (ApoE(-/-)) mouse model. The network was generated using knowledge extracted from the primary literature. The resultant atherosclerotic plaque destabilization network contains 304 nodes and 743 edges supported by 33 PubMed referenced articles. A comparison between the semi-automated and conventional curation processes showed similar results, but significantly reduced curation effort for the semi-automated process. Creating structured knowledge from unstructured text is an important step for the mechanistic interpretation and reusability of knowledge. Our new semi-automated knowledge extraction workflow reduced the curation skills and effort required to capture and organize scientific knowledge. The

  18. Big data analytics workflow management for eScience

    NASA Astrophysics Data System (ADS)

    Fiore, Sandro; D'Anca, Alessandro; Palazzo, Cosimo; Elia, Donatello; Mariello, Andrea; Nassisi, Paola; Aloisio, Giovanni

    2015-04-01

    In many domains such as climate and astrophysics, scientific data is often n-dimensional and requires tools that support specialized data types and primitives if it is to be properly stored, accessed, analysed and visualized. Currently, scientific data analytics relies on domain-specific software and libraries providing a huge set of operators and functionalities. However, most of these software packages fail at large scale since they: (i) are desktop based, rely on local computing capabilities and need the data locally; (ii) cannot benefit from available multicore/parallel machines since they are based on sequential codes; (iii) do not provide declarative languages to express scientific data analysis tasks; and (iv) do not provide newer or more scalable storage models to better support the data multidimensionality. Additionally, most of them: (v) are domain-specific, which also means they support a limited set of data formats, and (vi) do not provide workflow support to enable the construction, execution and monitoring of more complex "experiments". The Ophidia project aims at facing most of the challenges highlighted above by providing a big data analytics framework for eScience. Ophidia provides several parallel operators to manipulate large datasets. Some relevant examples include: (i) data sub-setting (slicing and dicing), (ii) data aggregation, (iii) array-based primitives (the same operator applies to all the implemented UDF extensions), (iv) data cube duplication, (v) data cube pivoting, (vi) NetCDF import and export. Metadata operators are available too. Additionally, the Ophidia framework provides array-based primitives to perform data sub-setting, data aggregation (i.e. max, min, avg), array concatenation, algebraic expressions and predicate evaluation on large arrays of scientific data. Bit-oriented plugins have also been implemented to manage binary data cubes. Defining processing chains and workflows with tens, hundreds of data analytics operators is the
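
    The operator classes this abstract enumerates (sub-setting, aggregation, predicate evaluation over an n-dimensional cube) can be illustrated with a minimal NumPy sketch. This is not Ophidia's actual API; the cube shape and threshold below are invented for illustration.

        import numpy as np

        # Hypothetical 3-D data cube (time x lat x lon); frameworks like
        # Ophidia operate on similar n-dimensional arrays, typically
        # imported from NetCDF files.
        cube = np.random.rand(365, 180, 360)

        # "Slicing and dicing": subset a time window and a spatial box.
        subset = cube[0:90, 30:60, 100:160]

        # Aggregation primitives (max, avg) collapse the time axis.
        t_max = subset.max(axis=0)
        t_avg = subset.mean(axis=0)

        # Predicate evaluation on large arrays: flag cells above a threshold.
        mask = t_avg > 0.5
        print(f"{mask.sum()} of {mask.size} cells exceed the threshold")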

  19. Data Processing Workflows to Support Reproducible Data-driven Research in Hydrology

    NASA Astrophysics Data System (ADS)

    Goodall, J. L.; Essawy, B.; Xu, H.; Rajasekar, A.; Moore, R. W.

    2015-12-01

    Geoscience analyses often require the use of existing data sets that are large, heterogeneous, and maintained by different organizations. A particular challenge in creating reproducible analyses using these data sets is automating the workflows required to transform raw datasets into model-specific input files and finally into publication-ready visualizations. Data grids, such as the Integrated Rule-Oriented Data System (iRODS), are architectures that allow scientists to access and share large data sets that are geographically distributed on the Internet, but appear to the scientist as a single file management system. The DataNet Federation Consortium (DFC) project is built on iRODS and aims to demonstrate data and computational interoperability across scientific communities. This paper leverages iRODS and the DFC to demonstrate how hydrological modeling procedures can be encapsulated as workflows using the iRODS concept of Workflow Structured Objects (WSO). An example use case is presented for automating hydrologic model post-processing routines that demonstrates how WSOs can be created and used within the DFC to automate the creation of data visualizations from large model output collections. By co-locating the workflow used to create the visualization with the data collection, the use case demonstrates how data grid technology aids in reuse, reproducibility, and sharing of workflows within scientific communities.

  20. Facilitating hydrological data analysis workflows in R: the RHydro package

    NASA Astrophysics Data System (ADS)

    Buytaert, Wouter; Moulds, Simon; Skoien, Jon; Pebesma, Edzer; Reusser, Dominik

    2015-04-01

    The advent of new technologies such as web-services and big data analytics holds great promise for hydrological data analysis and simulation. Driven by the need for better water management tools, it allows for the construction of much more complex workflows, that integrate more and potentially more heterogeneous data sources with longer tool chains of algorithms and models. With the scientific challenge of designing the most adequate processing workflow comes the technical challenge of implementing the workflow with a minimal risk for errors. A wide variety of new workbench technologies and other data handling systems are being developed. At the same time, the functionality of available data processing languages such as R and Python is increasing at an accelerating pace. Because of the large diversity of scientific questions and simulation needs in hydrology, it is unlikely that one single optimal method for constructing hydrological data analysis workflows will emerge. Nevertheless, languages such as R and Python are quickly gaining popularity because they combine a wide array of functionality with high flexibility and versatility. The object-oriented nature of high-level data processing languages makes them particularly suited for the handling of complex and potentially large datasets. In this paper, we explore how handling and processing of hydrological data in R can be facilitated further by designing and implementing a set of relevant classes and methods in the experimental R package RHydro. We build upon existing efforts such as the sp and raster packages for spatial data and the spacetime package for spatiotemporal data to define classes for hydrological data (HydroST). In order to handle simulation data from hydrological models conveniently, a HM class is defined. Relevant methods are implemented to allow for an optimal integration of the HM class with existing model fitting and simulation functionality in R. Lastly, we discuss some of the design challenges

  1. Workflow simulation and its system development

    NASA Astrophysics Data System (ADS)

    Li, Renwang; Zhu, Zefei; Wang, Xianmei; Liu, Lei; Jiang, Xuefeng

    2005-12-01

    Workflow technique is a research hotspot in the field of advanced manufacturing technology. However, up to now workflow simulation has lacked the necessary evaluation of rationality and validity. Therefore, a principle of workflow simulation was set forth and a workflow simulation mechanism proposed, divided into a presentation layer, a business logic layer and a database layer. Then, taking the process of handling business orders as an example, and taking time, quality, cost and service as key factors, a feasible method was developed. Its simulation results over 30 days were listed and analyzed. At last, an amended process for handling business orders is brought forward.

  2. Pegasus Workflow Management System: Helping Applications From Earth and Space

    NASA Astrophysics Data System (ADS)

    Mehta, G.; Deelman, E.; Vahi, K.; Silva, F.

    2010-12-01

    Pegasus WMS is a Workflow Management System that can manage large-scale scientific workflows across Grid, local and Cloud resources simultaneously. Pegasus WMS provides a means for representing the workflow of an application in an abstract XML form, agnostic of the resources available to run it and the location of data and executables. It then compiles these workflows into concrete plans by querying catalogs and farming computations across local and distributed computing resources, as well as emerging commercial and community cloud environments, in an easy and reliable manner. Pegasus WMS optimizes the execution as well as data movement by leveraging existing Grid and cloud technologies via a flexible pluggable interface, and provides advanced features like reusing existing data, automatic cleanup of generated data, and recursive workflows with deferred planning. It also captures all the provenance of the workflow from the planning stage to the execution of the generated data, helping scientists to accurately measure performance metrics of their workflow as well as data reproducibility issues. Pegasus WMS was initially developed as part of the GriPhyN project to support large-scale high-energy physics and astrophysics experiments. Direct funding from the NSF enabled support for a wide variety of applications from diverse domains including earthquake simulation, bacterial RNA studies, helioseismology and ocean modeling. Earthquake Simulation: Pegasus WMS was recently used in a large-scale production run in 2009 by the Southern California Earthquake Center to run 192 million loosely coupled tasks and about 2000 tightly coupled MPI-style tasks on national cyberinfrastructure for generating a probabilistic seismic hazard map of the Southern California region. SCEC ran 223 workflows over a period of eight weeks, using on average 4,420 cores, with a peak of 14,540 cores. A total of 192 million files were produced totaling about 165TB, out of which 11TB of data was saved.
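
    The abstract-to-concrete planning idea described above — a resource-agnostic DAG whose tasks name logical files, bound to physical locations only at planning time — can be sketched in a few lines. This is a generic illustration, not Pegasus's actual DAX API; the site catalog, task fields and run_* command naming are invented.

        from dataclasses import dataclass, field

        @dataclass
        class Task:
            name: str
            inputs: list = field(default_factory=list)    # logical file names
            outputs: list = field(default_factory=list)

        # Hypothetical site catalog: logical names resolve differently per site.
        site_catalog = {"local": "/scratch/data", "cloud": "s3://bucket/data"}

        def plan(task: Task, site: str) -> str:
            """Compile an abstract task into a concrete command for one site."""
            prefix = site_catalog[site]
            ins = " ".join(f"{prefix}/{f}" for f in task.inputs)
            outs = " ".join(f"{prefix}/{f}" for f in task.outputs)
            return f"run_{task.name} {ins} -o {outs}"

        extract = Task("extract", inputs=["raw.dat"], outputs=["clean.dat"])
        print(plan(extract, "cloud"))   # same abstract DAG, cloud-specific paths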

  3. Support for Taverna workflows in the VPH-Share cloud platform.

    PubMed

    Kasztelnik, Marek; Coto, Ernesto; Bubak, Marian; Malawski, Maciej; Nowakowski, Piotr; Arenas, Juan; Saglimbeni, Alfredo; Testi, Debora; Frangi, Alejandro F

    2017-07-01

    To address the increasing need for collaborative endeavours within the Virtual Physiological Human (VPH) community, the VPH-Share collaborative cloud platform allows researchers to expose and share sequences of complex biomedical processing tasks in the form of computational workflows. The Taverna Workflow System is a very popular tool for orchestrating complex biomedical & bioinformatics processing tasks in the VPH community. This paper describes the VPH-Share components that support the building and execution of Taverna workflows, and explains how they interact with other VPH-Share components to improve the capabilities of the VPH-Share platform. Taverna workflow support is delivered by the Atmosphere cloud management platform and the VPH-Share Taverna plugin. These components are explained in detail, along with the two main procedures that were developed to enable this seamless integration: workflow composition and execution. 1) Seamless integration of VPH-Share with other components and systems. 2) Extended range of different tools for workflows. 3) Successful integration of scientific workflows from other VPH projects. 4) Execution speed improvement for medical applications. The presented workflow integration provides VPH-Share users with a wide range of different possibilities to compose and execute workflows, such as desktop or online composition, online batch execution, multithreading, remote execution, etc. The specific advantages of each supported tool are presented, as are the roles of Atmosphere and the VPH-Share plugin within the VPH-Share project. The combination of the VPH-Share plugin and Atmosphere engenders the VPH-Share infrastructure with far more flexible, powerful and usable capabilities for the VPH-Share community. As both components can continue to evolve and improve independently, we acknowledge that further improvements are still to be developed and will be described. Copyright © 2017 Elsevier B.V. All rights reserved.

  4. The medical simulation markup language - simplifying the biomechanical modeling workflow.

    PubMed

    Suwelack, Stefan; Stoll, Markus; Schalck, Sebastian; Schoch, Nicolai; Dillmann, Rüdiger; Bendl, Rolf; Heuveline, Vincent; Speidel, Stefanie

    2014-01-01

    Modeling and simulation of the human body by means of continuum mechanics has become an important tool in diagnostics, computer-assisted interventions and training. This modeling approach seeks to construct patient-specific biomechanical models from tomographic data. Usually many different tools such as segmentation and meshing algorithms are involved in this workflow. In this paper we present a generalized and flexible description for biomechanical models. The unique feature of the new modeling language is that it not only describes the final biomechanical simulation, but also the workflow how the biomechanical model is constructed from tomographic data. In this way, the MSML can act as a middleware between all tools used in the modeling pipeline. The MSML thus greatly facilitates the prototyping of medical simulation workflows for clinical and research purposes. In this paper, we not only detail the XML-based modeling scheme, but also present a concrete implementation. Different examples highlight the flexibility, robustness and ease-of-use of the approach.

  5. Workflow Modeling Using Stochastic Activity Networks

    NASA Astrophysics Data System (ADS)

    Javadi Mottaghi, Fatemeh; Abdollahi Azgomi, Mohammad

    The essence of workflow systems is workflow patterns. The aim is to use an existing powerful formal modeling language with workflow systems. Stochastic activity networks (SANs) are a powerful extension of Petri nets. Having the SAN model of a system, one can verify the functional aspects and evaluate the operational measures, both on a same model. SANs have already been used in a wide range of applications. As a new application area, we have used SANs for modeling workflow systems. The results show that the most important workflow patterns can be modeled in SANs. In addition, the resulting SAN models of workflow systems can be used for model checking and/or performance evaluation purposes using the existing tools. In this paper, we will present the results of this work. For this purpose, we will present the SAN submodels corresponding to the most important workflow patterns. Then, the proposed SAN submodels are used in a case study for workflow modeling, which will also be presented in this paper. Finally, we will present the results of the evaluation of the model using the Möbius modeling tool.

  6. Validation of the Applied Biosystems RapidFinder Shiga Toxin-Producing E. coli (STEC) Detection Workflow.

    PubMed

    Cloke, Jonathan; Matheny, Sharon; Swimley, Michelle; Tebbs, Robert; Burrell, Angelia; Flannery, Jonathan; Bastin, Benjamin; Bird, Patrick; Benzinger, M Joseph; Crowley, Erin; Agin, James; Goins, David; Salfinger, Yvonne; Brodsky, Michael; Fernandez, Maria Cristina

    2016-11-01

    The Applied Biosystems™ RapidFinder™ STEC Detection Workflow (Thermo Fisher Scientific) is a complete protocol for the rapid qualitative detection of Escherichia coli (E. coli) O157:H7 and the "Big 6" non-O157 Shiga-like toxin-producing E. coli (STEC) serotypes (defined as serogroups: O26, O45, O103, O111, O121, and O145). The RapidFinder STEC Detection Workflow makes use of either the automated preparation of PCR-ready DNA using the Applied Biosystems PrepSEQ™ Nucleic Acid Extraction Kit in conjunction with the Applied Biosystems MagMAX™ Express 96-well magnetic particle processor or the Applied Biosystems PrepSEQ Rapid Spin kit for manual preparation of PCR-ready DNA. Two separate assays comprise the RapidFinder STEC Detection Workflow, the Applied Biosystems RapidFinder STEC Screening Assay and the Applied Biosystems RapidFinder STEC Confirmation Assay. The RapidFinder STEC Screening Assay includes primers and probes to detect the presence of stx1 (Shiga toxin 1), stx2 (Shiga toxin 2), eae (intimin), and E. coli O157 gene targets. The RapidFinder STEC Confirmation Assay includes primers and probes for the "Big 6" non-O157 STEC and E. coli O157:H7. The use of these two assays in tandem allows a user to detect accurately the presence of the "Big 6" STECs and E. coli O157:H7. The performance of the RapidFinder STEC Detection Workflow was evaluated in a method comparison study, in inclusivity and exclusivity studies, and in a robustness evaluation. The assays were compared to the U.S. Department of Agriculture (USDA), Food Safety and Inspection Service (FSIS) Microbiology Laboratory Guidebook (MLG) 5.09: Detection, Isolation and Identification of Escherichia coli O157:H7 from Meat Products and Carcass and Environmental Sponges for raw ground beef (73% lean) and USDA/FSIS-MLG 5B.05: Detection, Isolation and Identification of Escherichia coli non-O157:H7 from Meat Products and Carcass and Environmental Sponges for raw beef trim. No statistically significant

  7. a Standardized Approach to Topographic Data Processing and Workflow Management

    NASA Astrophysics Data System (ADS)

    Wheaton, J. M.; Bailey, P.; Glenn, N. F.; Hensleigh, J.; Hudak, A. T.; Shrestha, R.; Spaete, L.

    2013-12-01

    An ever-increasing list of options exists for collecting high resolution topographic data, including airborne LIDAR, terrestrial laser scanners, bathymetric SONAR and structure-from-motion. An equally rich, arguably overwhelming, variety of tools exists with which to organize, quality control, filter, analyze and summarize these data. However, scientists are often left to cobble together their analysis as a series of ad hoc steps, often using custom scripts and one-time processes that are poorly documented and rarely shared with the community. Even when literature-cited software tools are used, the input and output parameters differ from tool to tool. These parameters are rarely archived and the steps performed are lost, making the analysis virtually impossible to replicate precisely. What is missing is a coherent, robust framework for combining reliable, well-documented topographic data-processing steps into a workflow that can be repeated and even shared with others. We have taken several popular topographic data processing tools - including point cloud filtering and decimation as well as DEM differencing - and defined a common protocol for passing inputs and outputs between them. This presentation describes a free, public online portal that enables scientists to create custom workflows for processing topographic data using a number of popular topographic processing tools. Users provide the inputs required for each tool and in what sequence they want to combine them. This information is then stored for future reuse (and optionally sharing with others) before the user then downloads a single package that contains all the input and output specifications together with the software tools themselves. The user then launches the included batch file that executes the workflow on their local computer against their topographic data. This ZCloudTools architecture helps standardize, automate and archive topographic data processing. It also represents a forum for discovering and
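
    The archived-workflow pattern described here — each step records its tool and parameters so the exact sequence can be saved, shared, and replayed locally — is easy to sketch. The tool names and parameters below are invented for illustration and are not the actual ZCloudTools vocabulary.

        import json

        # A workflow is an ordered list of steps, each naming a tool and its
        # parameters, so the whole analysis is archivable and repeatable.
        workflow = [
            {"tool": "filter_point_cloud", "params": {"min_z": 0.0}},
            {"tool": "decimate",           "params": {"fraction": 0.1}},
            {"tool": "dem_difference",     "params": {"cell_size": 1.0}},
        ]

        # Persist the full specification for reuse and sharing.
        with open("workflow.json", "w") as f:
            json.dump(workflow, f, indent=2)

        def run_step(step, data):
            # A real runner would invoke the named tool on the data here.
            print(f"running {step['tool']} with {step['params']}")
            return data

        data = "points.las"
        for step in json.load(open("workflow.json")):
            data = run_step(step, data)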

  8. Multi-perspective workflow modeling for online surgical situation models.

    PubMed

    Franke, Stefan; Meixensberger, Jürgen; Neumuth, Thomas

    2015-04-01

    Surgical workflow management is expected to enable situation-aware adaptation and intelligent systems behavior in an integrated operating room (OR). The overall aim is to unburden the surgeon and OR staff from both manual maintenance and information seeking tasks. A major step toward intelligent systems behavior is a stable classification of the surgical situation from multiple perspectives based on performed low-level tasks. The present work proposes a method for the classification of surgical situations based on multi-perspective workflow modeling. A model network that interconnects different types of surgical process models is described. Various aspects of a surgical situation description were considered: low-level tasks, high-level tasks, patient status, and the use of medical devices. A study with sixty neurosurgical interventions was conducted to evaluate the performance of our approach and its robustness against incomplete workflow recognition input. A correct classification rate of over 90% was measured for high-level tasks and patient status. The device usage models for navigation and neurophysiology classified over 95% of the situations correctly, whereas the ultrasound usage was more difficult to predict. Overall, the classification rate decreased with an increasing level of input distortion. Autonomous adaptation of medical devices and intelligent systems behavior do not currently depend solely on low-level tasks. Instead, they require a more general type of understanding of the surgical condition. The integration of various surgical process models in a network provided a comprehensive representation of the interventions and allowed for the generation of extensive situation descriptions. Multi-perspective surgical workflow modeling and online situation models will be a significant pre-requisite for reliable and intelligent systems behavior. Hence, they will contribute to a cooperative OR environment. Copyright © 2015 Elsevier Inc. All rights reserved.

  9. A Formal Framework for Workflow Analysis

    NASA Astrophysics Data System (ADS)

    Cravo, Glória

    2010-09-01

    In this paper we provide a new formal framework to model and analyse workflows. A workflow is the formal definition of a business process that consists in the execution of tasks in order to achieve a certain objective. In our work we describe a workflow as a graph whose vertices represent tasks and whose arcs are associated with workflow transitions. Each task has an associated input/output logic operator. This logic operator can be the logical AND (•), the OR (⊗), or the XOR, i.e., exclusive-or (⊕). Moreover, we introduce algebraic concepts in order to describe completely the structure of workflows. We also introduce the concept of logical termination. Finally, we provide a necessary and sufficient condition for this property to hold.
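
    The graph model with per-task logic operators can be made concrete with a small sketch. This is a minimal illustration of the idea, not the paper's formalism; the firing rule and the example graph are assumptions.

        # A task fires depending on its input logic operator: AND needs all
        # incoming arcs active, XOR exactly one, OR at least one.
        def fires(operator: str, active: int, total: int) -> bool:
            if operator == "AND":
                return active == total
            if operator == "OR":
                return active >= 1
            if operator == "XOR":
                return active == 1
            raise ValueError(operator)

        # Workflow graph: task -> (input operator, predecessor tasks).
        workflow = {
            "start": ("AND", []),
            "a":     ("AND", ["start"]),
            "b":     ("AND", ["start"]),
            "merge": ("XOR", ["a", "b"]),   # exactly one branch may complete
        }

        completed = {"start", "a"}
        op, preds = workflow["merge"]
        active = sum(p in completed for p in preds)
        print("merge enabled:", fires(op, active, len(preds)))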

  10. Scientific Data Management (SDM) Center for Enabling Technologies. 2007-2012

    SciTech Connect

    Ludascher, Bertram; Altintas, Ilkay

    2013-09-06

    Over the past five years, our activities have both established Kepler as a viable scientific workflow environment and demonstrated its value across multiple science applications. We have published numerous peer-reviewed papers on the technologies highlighted in this short paper and have given Kepler tutorials at SC06, SC07, SC08, and SciDAC 2007. Our outreach activities have allowed scientists to learn best practices and better utilize Kepler to address their individual workflow problems. Our contributions to advancing the state-of-the-art in scientific workflows have focused on the following areas; progress in each is described in subsequent sections. Workflow development: the development of a deeper understanding of scientific workflows "in the wild" and of the requirements for support tools that allow easy construction of complex scientific workflows. Generic workflow components and templates: the development of generic actors (i.e., workflow components and processes) which can be broadly applied to scientific problems. Provenance collection and analysis: the design of a flexible provenance collection and analysis infrastructure within the workflow environment. Workflow reliability and fault tolerance: the improvement of the reliability and fault-tolerance of workflow environments.

  11. A Community-Driven Workflow Recommendations and Reuse Infrastructure

    NASA Astrophysics Data System (ADS)

    Zhang, J.; Votava, P.; Lee, T. J.; Lee, C.; Xiao, S.; Nemani, R. R.; Foster, I.

    2013-12-01

    Aiming to connect the Earth science community to accelerate the rate of discovery, NASA Earth Exchange (NEX) has established an online repository and platform so that researchers can publish and share their tools and models with colleagues. In recent years, workflow has become a popular technique at NEX for Earth scientists to define executable multi-step procedures for data processing and analysis. The ability to discover and reuse knowledge (sharable workflows) is critical to the future advancement of science. However, as reported in our earlier study, the reusability of scientific artifacts is at present very low. Scientists often do not feel confident in using other researchers' tools and utilities. One major reason is that researchers are often unaware of the existence of others' data preprocessing processes. Meanwhile, researchers often do not have time to fully document the processes and expose them to others in a standard way. These issues cannot be overcome by the existing workflow search technologies used in NEX and other data projects. Therefore, this project aims to develop a proactive recommendation technology based on collective NEX user behaviors. In this way, we aim to promote and encourage process and workflow reuse within NEX. Particularly, we focus on leveraging peer scientists' best practices to support the recommendation of artifacts developed by others. Our underlying theoretical foundation is rooted in social cognitive theory, which declares that people learn by watching what others do. Our fundamental hypothesis is that sharable artifacts have network properties, much like humans in social networks. More generally, reusable artifacts form various types of social relationships (ties), and may be viewed as forming what organizational sociologists who use network analysis to study human interactions call a 'knowledge network.' In particular, we will tackle two research questions: R1: What hidden knowledge may be extracted from

  12. A workflow learning model to improve geovisual analytics utility.

    PubMed

    Roth, Robert E; Maceachren, Alan M; McCabe, Craig A

    2009-01-01

    the concept of scientific workflows. Second, we implemented an interface in the G-EX Portal Learn Module to demonstrate the workflow learning model. The workflow interface allows users to drag learning artifacts uploaded to the G-EX Portal onto a central whiteboard and then annotate the workflow using text and drawing tools. Once completed, users can visit the assembled workflow to get an idea of the kind, number, and scale of analysis steps, view individual learning artifacts associated with each node in the workflow, and ask questions about the overall workflow or individual learning artifacts through the associated forums. An example learning workflow in the domain of epidemiology is provided to demonstrate the effectiveness of the approach. RESULTS/CONCLUSIONS: In the context of geovisual analytics, GIScientists are not only responsible for developing software to facilitate visually-mediated reasoning about large and complex spatiotemporal information, but also for ensuring that this software works. The workflow learning model discussed in this paper and demonstrated in the G-EX Portal Learn Module is one approach to improving the utility of geovisual analytics software. While development of the G-EX Portal Learn Module is ongoing, we expect to release the G-EX Portal Learn Module by Summer 2009.

  14. Implementation Recommendations for MOSAIC: A Workflow Architecture for Analytic Enrichment. Analysis and Recommendations for the Implementation of a Cohesive Method for Orchestrating Analytics in a Distributed Model

    DTIC Science & Technology

    2011-02-01

    [Extraction residue; recoverable table-of-contents fragments:] LONI or Ptolemy/Kepler (Scientific Workflow Projects) as Executive; Ptolemy GUI Example Workflow. The architectural technologies examined here that can fill this role are UIMA, OpenPipeline, Mule, and Ptolemy; the applicability and

  15. An iterative expanding and shrinking process for processor allocation in mixed-parallel workflow scheduling.

    PubMed

    Huang, Kuo-Chan; Wu, Wei-Ya; Wang, Feng-Jian; Liu, Hsiao-Ching; Hung, Chun-Hao

    2016-01-01

    Parallel computation has been widely applied in a variety of large-scale scientific and engineering applications. Many studies indicate that exploiting both task and data parallelism, i.e. mixed-parallel workflows, to solve large computational problems can achieve better efficiency than either pure task parallelism or pure data parallelism. Scheduling traditional workflows of pure task parallelism on parallel systems has long been known to be an NP-complete problem. Mixed-parallel workflow scheduling has to deal with an additional challenging issue of processor allocation. In this paper, we explore the processor allocation issue in scheduling mixed-parallel workflows of moldable tasks, called M-tasks, and propose an Iterative Allocation Expanding and Shrinking (IAES) approach. Compared to previous approaches, our IAES has two distinguishing features. The first is allocating more processors to the tasks on allocated critical paths for effectively reducing the makespan of workflow execution. The second is allowing the processor allocation of an M-task to shrink during the iterative procedure, resulting in a more flexible and effective process for finding better allocations. The proposed IAES approach has been evaluated with a series of simulation experiments and compared to several well-known previous methods, including CPR, CPA, MCPA, and MCPA2. The experimental results indicate that our IAES approach outperforms those previous methods significantly in most situations, especially when nodes of the same layer in a workflow might have unequal workloads.
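
    The expand-and-shrink idea can be sketched for a single layer of concurrent moldable tasks: processors migrate toward the task bounding the makespan, and another task's allocation may shrink to free capacity. This is a hedged toy model, not the published IAES algorithm; the speedup model and task workloads are invented.

        def runtime(work: float, procs: int) -> float:
            return work / procs ** 0.9        # assumed sublinear speedup model

        tasks = {"a": 100.0, "b": 60.0, "c": 80.0}   # work of concurrent M-tasks
        alloc = {t: 1 for t in tasks}
        POOL = 8                                      # total processors

        def makespan(a):
            return max(runtime(tasks[t], a[t]) for t in tasks)

        for _ in range(32):
            # Expand: give one more processor to the current critical task.
            critical = max(tasks, key=lambda t: runtime(tasks[t], alloc[t]))
            trial = dict(alloc)
            trial[critical] += 1
            if sum(trial.values()) > POOL:
                # Shrink: reclaim a processor from the task whose runtime
                # stays lowest after losing one.
                donor = min((t for t in tasks if t != critical and alloc[t] > 1),
                            key=lambda t: runtime(tasks[t], alloc[t] - 1),
                            default=None)
                if donor is None:
                    break
                trial[donor] -= 1
            if makespan(trial) < makespan(alloc):
                alloc = trial
            else:
                break

        print(alloc, round(makespan(alloc), 2))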

  16. SwinDeW-C: A Peer-to-Peer Based Cloud Workflow System

    NASA Astrophysics Data System (ADS)

    Liu, Xiao; Yuan, Dong; Zhang, Gaofeng; Chen, Jinjun; Yang, Yun

    Workflow systems are designed to support the process automation of large-scale business and scientific applications. In recent years, many workflow systems have been deployed on high performance computing infrastructures such as cluster, peer-to-peer (p2p), and grid computing (Moore, 2004; Wang, Jie, & Chen, 2009; Yang, Liu, Chen, Lignier, & Jin, 2007). One of the driving forces is the increasing demand of large-scale instance and data/computation intensive workflow applications (large-scale workflow applications for short) which are common in both eBusiness and eScience application areas. Typical examples (detailed in Section 13.2.1) include the transaction-intensive nation-wide insurance claim application process and the data- and computation-intensive pulsar searching process in astrophysics. Generally speaking, instance-intensive applications are those processes which need to be executed a large number of times sequentially within a very short period or concurrently with a large number of instances (Liu, Chen, Yang, & Jin, 2008; Liu et al., 2010; Yang et al., 2008). Therefore, large-scale workflow applications normally require the support of high performance computing infrastructures (e.g. advanced CPU units, large memory space and high speed network), especially when workflow activities are data and computation intensive themselves. In the real world, to accommodate such requests, expensive computing infrastructures including supercomputers and data servers are bought, installed, integrated and maintained at huge cost by system users

  17. Taverna Workflows in the Virtual Observatory

    NASA Astrophysics Data System (ADS)

    Benson, K.; Cecconi, B.

    2015-12-01

    Taverna workflows are used in the Virtual Observatory. Planetary and solar applications developed over the last decade generate data at a previously unimaginable scale. One of these programmes, which builds on the strengths of IDIS of Europlanet FP7, is the Virtual European Solar and Planetary Access (VESPA). With VESPA more data will be distributed and the connectivity of tools and infrastructure will improve. VESPA enables growth of the user and provider community. However, the challenge of connectivity persists throughout application data services. VESPA calls are formed in part by tools and interaction services. One such tool and interaction service is the Taverna workflow management system. Workflows address the challenges of data interconnectivity by establishing pipelines to services offered by other data streaming services. Workflows offer the capability to cross domains and overcome interoperability issues. Furthermore, Taverna offers sharing of workflows; the academic community 'myExperiment', a social site for scientists, supports search and opens access to pre-existing workflows. This presentation focuses on cross-domain workflows, including use of the infrastructure set up with the Helio, EUROPLANET and VAMDC projects. A hands-on demonstration and an opportunity to join the community discussion will make the presentation more interactive.

  18. Structuring Clinical Workflows for Diabetes Care

    PubMed Central

    Lasierra, N.; Oberbichler, S.; Toma, I.; Fensel, A.; Hoerbst, A.

    2014-01-01

    Summary Background Electronic health records (EHRs) play an important role in the treatment of chronic diseases such as diabetes mellitus. Although the interoperability and selected functionality of EHRs are already addressed by a number of standards and best practices, such as IHE or HL7, the majority of these systems are still monolithic from a user-functionality perspective. The purpose of the OntoHealth project is to foster a functionally flexible, standards-based use of EHRs to support clinical routine task execution by means of workflow patterns and to shift the present EHR usage to a more comprehensive integration concerning complete clinical workflows. Objectives The goal of this paper is, first, to introduce the basic architecture of the proposed OntoHealth project and, second, to present selected functional needs and a functional categorization regarding workflow-based interactions with EHRs in the domain of diabetes. Methods A systematic literature review regarding attributes of workflows in the domain of diabetes was conducted. Eligible references were gathered and analyzed using a qualitative content analysis. Subsequently, a functional workflow categorization was derived from diabetes-specific raw data together with existing general workflow patterns. Results This paper presents the design of the architecture as well as a categorization model which makes it possible to describe the components or building blocks within clinical workflows. The results of our study lead us to identify basic building blocks, named as actions, decisions, and data elements, which allow the composition of clinical workflows within five identified contexts. Conclusions The categorization model allows for a description of the components or building blocks of clinical workflows from a functional view. PMID:25024765
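
    The categorization model's building blocks — actions, decisions, and data elements composed into clinical workflows — can be sketched as simple types. The class names and the diabetes example below are illustrative assumptions, not the OntoHealth schema.

        from dataclasses import dataclass
        from typing import Callable

        @dataclass
        class DataElement:          # a value read from or written to the EHR
            name: str
            value: float

        @dataclass
        class Action:               # a routine task executed against the EHR
            name: str

        @dataclass
        class Decision:             # a branch point over data elements
            name: str
            predicate: Callable[[DataElement], bool]

        glucose = DataElement("fasting glucose (mmol/L)", 8.2)
        check = Decision("hyperglycemia?", lambda d: d.value > 7.0)
        follow_up = Action("schedule diabetes consultation")

        # Composing the blocks yields a tiny workflow fragment.
        if check.predicate(glucose):
            print(follow_up.name)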

  19. VisIVO: A Web-Based, Workflow-Enabled Gateway for Astrophysical Visualization

    NASA Astrophysics Data System (ADS)

    Costa, A.; Bandieramonte, M.; Becciani, U.; Krokos, M.; Massimino, P.; Petta, C.; Pistagna, C.; Riggi, S.; Sciacca, E.; Vitello, F.

    2013-10-01

    We present a web-based and workflow-enabled framework called VisIVO Gateway that allows integration of large-scale multidimensional datasets together with applications for visualization and exploration on Distributed Computing Infrastructures (DCIs). Our framework is implemented through a workflow-enabled portal wrapped around WS-PGRADE which is the grid User Support Environment (gUSE) portal. We provide customized interfaces for creating, invoking, monitoring and also modifying scientific workflows. All technical complexities, e.g. related to visualization algorithms and DCI configurations, are conveniently hidden from view. A number of workflows are enabled by default, e.g. implementing local or remote uploading and creation of scientific movies. Scientific movies are useful not only to scientists for presenting their research results, but also to museums and science centers for engaging visitors with complex scientific concepts. Our gateway can be accessed via standard www interfaces but also through a newly developed iOS mobile application offering novel ways for sharing analysis and exploration experiences with large-scale datasets in collaborative environments.

  20. Nanocuration workflows: Establishing best practices for identifying, inputting, and sharing data to inform decisions on nanomaterials

    PubMed Central

    Powers, Christina M; Mills, Karmann A; Morris, Stephanie A; Klaessig, Fred; Gaheen, Sharon; Lewinski, Nastassja

    2015-01-01

    Summary There is a critical opportunity in the field of nanoscience to compare and integrate information across diverse fields of study through informatics (i.e., nanoinformatics). This paper is one in a series of articles on the data curation process in nanoinformatics (nanocuration). Other articles in this series discuss key aspects of nanocuration (temporal metadata, data completeness, database integration), while the focus of this article is on the nanocuration workflow, or the process of identifying, inputting, and reviewing nanomaterial data in a data repository. In particular, the article discusses: 1) the rationale and importance of a defined workflow in nanocuration, 2) the influence of organizational goals or purpose on the workflow, 3) established workflow practices in other fields, 4) current workflow practices in nanocuration, 5) key challenges for workflows in emerging fields like nanomaterials, 6) examples to make these challenges more tangible, and 7) recommendations to address the identified challenges. Throughout the article, there is an emphasis on illustrating key concepts and current practices in the field. Data on current practices in the field are from a group of stakeholders active in nanocuration. In general, the development of workflows for nanocuration is nascent, with few individuals formally trained in data curation or utilizing available nanocuration resources (e.g., ISA-TAB-Nano). Additional emphasis on the potential benefits of cultivating nanomaterial data via nanocuration processes (e.g., capability to analyze data from across research groups) and providing nanocuration resources (e.g., training) will likely prove crucial for the wider application of nanocuration workflows in the scientific community. PMID:26425437

  1. Workflow Optimization in Vertebrobasilar Occlusion

    SciTech Connect

    Kamper, Lars; Meyn, Hannes; Nordmeyer, Simone; Kempkes, Udo; Piroth, Werner

    2012-06-15

    Objective: In vertebrobasilar occlusion, rapid recanalization is the only substantial means to improve the prognosis. We introduced a standard operating procedure (SOP) for interventional therapy to analyze the effects on interdisciplinary time management. Methods: Intrahospital time periods between hospital admission and neuroradiological intervention were retrospectively analyzed, together with the patients' outcome, before (n = 18) and after (n = 20) implementation of the SOP. Results: After implementation of the SOP, we observed a statistically significant improvement in postinterventional patient neurological status (p = 0.017). In addition, we found a decrease of 5 h 33 min in the mean time period from hospital admission to neuroradiological intervention. The recanalization rate increased from 72.2% to 80% after implementation of the SOP. Conclusion: Our results underscore the relevance of SOP implementation and analysis of time management for clinical workflow optimization. Both may raise awareness of the need for efficient interdisciplinary time management. This could be an explanation for the decreased time periods and improved postinterventional patient status after SOP implementation.

  2. The Diabetic Retinopathy Screening Workflow

    PubMed Central

    Bolster, Nigel M.; Giardini, Mario E.; Bastawrous, Andrew

    2015-01-01

    Complications of diabetes mellitus, namely diabetic retinopathy and diabetic maculopathy, are the leading cause of blindness in working aged people. Sufferers can avoid blindness if identified early via retinal imaging. Systematic screening of the diabetic population has been shown to greatly reduce the prevalence and incidence of blindness within the population. Many national screening programs have digital fundus photography as their basis. In the past 5 years several techniques and adapters have been developed that allow digital fundus photography to be performed using smartphones. We review recent progress in smartphone-based fundus imaging and discuss its potential for integration into national systematic diabetic retinopathy screening programs. Some systems have produced promising initial results with respect to their agreement with reference standards. However further multisite trialling of such systems’ use within implementable screening workflows is required if an evidence base strong enough to affect policy change is to be established. If this were to occur national diabetic retinopathy screening would, for the first time, become possible in low- and middle-income settings where cost and availability of trained eye care personnel are currently key barriers to implementation. As diabetes prevalence and incidence is increasing sharply in these settings, the impact on global blindness could be profound. PMID:26596630

  3. Security aspects in teleradiology workflow

    NASA Astrophysics Data System (ADS)

    Soegner, Peter I.; Helweg, Gernot; Holzer, Heimo; zur Nedden, Dieter

    2000-05-01

    The medicolegal necessity of privacy, security and confidentiality was the aim of the attempt to develop a secure teleradiology workflow between the telepartners -- the radiologist and the referring physician. To avoid a lack of data protection and data security, we introduced biometric fingerprint scanners in combination with smart cards to identify the teleradiology partners and communicated over an encrypted TCP/IP satellite link between Innsbruck and Reutte. We used an asymmetric cryptography method to guarantee authentication, integrity of the data packages and confidentiality of the medical data. It was necessary to use a biometric feature to avoid a case of mistaken identity of persons who wanted access to the system. Only an invariable electronic identification allowed legal liability for the final report, and only a secure data connection allowed the exchange of sensitive medical data between different partners of health care networks. In our study we selected the user-friendly combination of a smart card and a biometric fingerprint technique, called the Skymed™ Double Guard Secure Keyboard (Agfa-Gevaert), to confirm identities and log into the imaging workstations and the electronic patient record. We examined the interoperability of the software used with the existing platforms. Only the WIN-XX operating systems could be protected at the time of our study.

  4. Integrative workflows for metagenomic analysis

    PubMed Central

    Ladoukakis, Efthymios; Kolisis, Fragiskos N.; Chatziioannou, Aristotelis A.

    2014-01-01

    The rapid evolution of all sequencing technologies, described by the term Next Generation Sequencing (NGS), has revolutionized metagenomic analysis. They constitute a combination of high-throughput analytical protocols, coupled to delicate measuring techniques, in order to potentially discover, properly assemble and map allelic sequences to the correct genomes, achieving particularly high yields for only a fraction of the cost of traditional processes (i.e., Sanger). From a bioinformatic perspective, this boils down to many GB of data being generated from each single sequencing experiment, rendering the management, or even the storage, critical bottlenecks with respect to the overall analytical endeavor. The enormous complexity is aggravated further by the versatility of the processing steps available, represented by the numerous bioinformatic tools that are essential, for each analytical task, in order to fully unveil the genetic content of a metagenomic dataset. These disparate tasks range from simple, nonetheless non-trivial, quality control of raw data to exceptionally complex protein annotation procedures, requiring a high level of expertise for their proper application or the neat implementation of the whole workflow. Furthermore, a bioinformatic analysis of such scale requires grand computational resources, imposing, as the sole realistic solution, the utilization of cloud computing infrastructures. In this review article we discuss different integrative bioinformatic solutions available, which address the aforementioned issues, by performing a critical assessment of the available automated pipelines for data management, quality control, and annotation of metagenomic data, embracing various major sequencing technologies and applications. PMID:25478562

  5. The complete digital workflow in fixed prosthodontics: a systematic review.

    PubMed

    Joda, Tim; Zarone, Fernando; Ferrari, Marco

    2017-09-19

    Risk of bias was assessed for random sequence generation, allocation concealment, blinding, completeness of outcome data, selective reporting, and other bias using the Cochrane Collaboration tool. A judgment of high risk of bias was assigned if one or more key domains had a high or unclear risk of bias. An official registration of the systematic review was not performed. The systematic search identified 67 titles; 32 abstracts were screened, and three full texts were subsequently included for data extraction. The analysed RCTs were heterogeneous and lacked follow-up. One study demonstrated the feasibility of a fully digital production process for dental crowns; however, marginal precision was lower for lithium disilicate (LS2) restorations (113.8 μm) than for conventional metal-ceramic (92.4 μm) and zirconium dioxide (ZrO2) crowns (68.5 μm) (p < 0.05). Another study showed that leucite-reinforced glass ceramic crowns were esthetically favoured by the patients (8/2 crowns) and clinicians (7/3 crowns) (p < 0.05). The third study investigated implant crowns: the complete digital workflow was more than twofold faster (75.3 min) than the mixed analog-digital workflow (156.6 min) (p < 0.05). No RCTs could be found investigating multi-unit fixed dental prostheses (FDP). The number of RCTs testing complete digital workflows in fixed prosthodontics is low, and scientifically proven recommendations for clinical routine cannot be given at this time. Research with high-quality trials appears to be slower than the industrial progress of available digital applications. Future research with well-designed RCTs, including follow-up observation, is urgently needed in the field of complete digital processing.

  6. Seamless online science workflow development and collaboration using IDL and the ENVI Services Engine

    NASA Astrophysics Data System (ADS)

    Harris, A. T.; Ramachandran, R.; Maskey, M.

    2013-12-01

    The Exelis-developed IDL and ENVI software are ubiquitous tools in Earth science research environments. The IDL Workbench is used by the Earth science community for programming custom data analysis and visualization modules. ENVI is a software solution for processing and analyzing geospatial imagery that combines support for multiple Earth observation scientific data types (optical, thermal, multi-spectral, hyperspectral, SAR, LiDAR) with advanced image processing and analysis algorithms. The ENVI & IDL Services Engine (ESE) is an Earth science data processing engine that allows researchers to use open standards to rapidly create, publish and deploy advanced Earth science data analytics within any existing enterprise infrastructure. Although powerful in many ways, these tools lack collaborative features out of the box. Thus, as part of the NASA-funded project Collaborative Workbench to Accelerate Science Algorithm Development, researchers at the University of Alabama in Huntsville and Exelis have developed plugins that allow seamless research collaboration from within the IDL Workbench. Such additional features are possible because the IDL Workbench is built on the Eclipse Rich Client Platform (RCP), and RCP applications allow custom plugins to be dropped in for extended functionality. Specific functionalities of the plugins include creating complex workflows based on IDL application source code, submitting workflows to be executed by ESE in the cloud, and sharing and cloning of workflows among collaborators. All these functionalities are available to scientists without leaving the IDL Workbench. Because ESE can interoperate with any middleware, scientific programmers can readily string together IDL processing tasks (or tasks written in other languages like C++, Java or Python) to create complex workflows for deployment within their current enterprise architecture (e.g. ArcGIS Server, GeoServer, Apache ODE or SciFlo from JPL). Using the collaborative IDL

  7. COSMOS: Python library for massively parallel workflows.

    PubMed

    Gafni, Erik; Luquette, Lovelace J; Lancaster, Alex K; Hawkins, Jared B; Jung, Jae-Yoon; Souilmi, Yassine; Wall, Dennis P; Tonellato, Peter J

    2014-10-15

    Efficient workflows to shepherd clinically generated genomic data through the multiple stages of a next-generation sequencing pipeline are of critical importance in translational biomedical science. Here we present COSMOS, a Python library for workflow management that allows formal description of pipelines and partitioning of jobs. In addition, it includes a user interface for tracking the progress of jobs, abstraction of the queuing system and fine-grained control over the workflow. Workflows can be created on traditional computing clusters as well as cloud-based services. Source code is available for academic non-commercial research purposes. Links to code and documentation are provided at http://lpm.hms.harvard.edu and http://wall-lab.stanford.edu. dpwall@stanford.edu or peter_tonellato@hms.harvard.edu. Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press.
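
    As a rough illustration of the pattern such a library supports (formal pipeline description plus dependency-aware execution), here is a minimal, self-contained Python sketch; the Task class, the example commands and the scheduling loop are illustrative assumptions, not the actual COSMOS API.

        # Minimal workflow-manager sketch: tasks declare their parents, and
        # the scheduler yields them in an order that respects dependencies.
        # Illustrative only -- this is not the COSMOS API.
        from collections import deque

        class Task:
            def __init__(self, name, cmd, parents=()):
                self.name, self.cmd, self.parents = name, cmd, list(parents)

        def topological_order(tasks):
            by_name = {t.name: t for t in tasks}
            indegree = {t.name: len(t.parents) for t in tasks}
            children = {t.name: [] for t in tasks}
            for t in tasks:
                for p in t.parents:
                    children[p.name].append(t.name)
            ready = deque(n for n, d in indegree.items() if d == 0)
            while ready:
                name = ready.popleft()
                yield by_name[name]
                for child in children[name]:
                    indegree[child] -= 1
                    if indegree[child] == 0:
                        ready.append(child)

        # A three-step sequencing pipeline expressed as a dependency graph.
        align = Task("align", "bwa mem ref.fa reads.fq > out.sam")
        sort_bam = Task("sort", "samtools sort -o out.bam out.sam", parents=[align])
        stats = Task("stats", "samtools flagstat out.bam", parents=[sort_bam])
        for task in topological_order([align, sort_bam, stats]):
            print("submit:", task.name, "->", task.cmd)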

  8. Integrating Automated Workflows, Human Intelligence and Collaboration

    PubMed Central

    Mirel, Barbara; Eichinger, Felix; Nair, Viji; Kretzler, Matthias

    2009-01-01

    Many methods and tools have evolved for microarray analysis, such as single-probe evaluation, promoter module modeling and pathway analysis. Little is known, however, about optimizing this flow of analysis for the flexible reasoning biomedical researchers need when hypothesizing about disease mechanisms. In developing and implementing a workflow, we found that workflows are not complete or valuable unless automation is well integrated with human intelligence. We present our workflow for the translational problem of classifying new sub-types of renal diseases. Using our workflow as an example, we explain opportunities and limitations in achieving this necessary integration and propose approaches to guide such integration for the next great frontier: facilitating exploratory analysis of candidate genes. PMID:21347175

  9. Temporal similarity measures for querying clinical workflows.

    PubMed

    Combi, Carlo; Gozzi, Matteo; Oliboni, Barbara; Juarez, Jose M; Marin, Roque

    2009-05-01

    In this paper, we extend a preliminary proposal and discuss, in a deeper and more formal way, an approach to evaluating temporal similarity between clinical workflow cases (i.e., executions of clinical processes). More precisely, we focus on (i) the representation of clinical processes using a temporal conceptual workflow model; (ii) the definition of ad hoc temporal constraint networks to formally represent clinical workflow cases; (iii) the definition of temporal similarity for clinical workflow cases based on the comparison of temporal constraint networks; and (iv) the management of the similarity of clinical processes related to the Italian guideline for stroke prevention and management (SPREAD). Clinical processes are composed of clinical activities performed by given actors in a given order, satisfying given temporal constraints. Clinical processes can therefore be seen as organizational processes and modeled by workflow schemata. When a workflow schema represents a clinical process, its cases represent different instances derived from dealing with different patients in different situations. Among all the cases related to a workflow schema, each clinical case can differ in its structure and in its temporal aspects. Clinical cases can be stored in clinical databases, and information retrieval can be performed by evaluating the similarity between workflow cases. We first describe a possible approach to the conceptual modeling of a clinical process, using a temporally extended workflow model. Then, we define how a workflow case can be represented as a set of activities, and show how to express them through temporal constraint networks. Once we have built the temporal constraint networks related to the cases to compare, we propose a similarity function able to evaluate the differences between the considered cases with respect to the order and duration of corresponding activities, and with respect to the presence/absence of some
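
    To make the comparison concrete, the following is a deliberately simplified Python sketch of temporal case similarity: each case is reduced to activities with start times and durations, whereas the paper compares full temporal constraint networks. The activity names, time units and weighting are invented for illustration.

        # Simplified temporal similarity between two workflow cases, each
        # given as {activity: (start, duration)} in hours. A stand-in for
        # the paper's temporal-constraint-network comparison.
        def case_similarity(case_a, case_b):
            all_acts = set(case_a) | set(case_b)
            if not all_acts:
                return 0.0
            score = 0.0
            for act in set(case_a) & set(case_b):
                sa, da = case_a[act]
                sb, db = case_b[act]
                d_start = abs(sa - sb) / (1.0 + max(sa, sb))  # ordering difference
                d_dur = abs(da - db) / (1.0 + max(da, db))    # duration difference
                score += 1.0 - 0.5 * (d_start + d_dur)
            # Activities present in only one case contribute nothing,
            # so they implicitly lower the normalized score.
            return score / len(all_acts)

        case_1 = {"ct_scan": (0, 1), "thrombolysis": (2, 1), "monitoring": (3, 24)}
        case_2 = {"ct_scan": (0, 2), "monitoring": (4, 20)}
        print(round(case_similarity(case_1, case_2), 3))  # ~0.551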

  10. AutoDrug: fully automated macromolecular crystallography workflows for fragment-based drug discovery

    PubMed Central

    Tsai, Yingssu; McPhillips, Scott E.; González, Ana; McPhillips, Timothy M.; Zinn, Daniel; Cohen, Aina E.; Feese, Michael D.; Bushnell, David; Tiefenbrunn, Theresa; Stout, C. David; Ludaescher, Bertram; Hedman, Britt; Hodgson, Keith O.; Soltis, S. Michael

    2013-01-01

    AutoDrug is software based upon the scientific workflow paradigm that integrates the Stanford Synchrotron Radiation Lightsource macromolecular crystallography beamlines and third-party processing software to automate the crystallography steps of the fragment-based drug-discovery process. AutoDrug screens a cassette of fragment-soaked crystals, selects crystals for data collection based on screening results and user-specified criteria and determines optimal data-collection strategies. It then collects and processes diffraction data, performs molecular replacement using provided models and detects electron density that is likely to arise from bound fragments. All processes are fully automated, i.e. are performed without user interaction or supervision. Samples can be screened in groups corresponding to particular proteins, crystal forms and/or soaking conditions. A single AutoDrug run is only limited by the capacity of the sample-storage dewar at the beamline: currently 288 samples. AutoDrug was developed in conjunction with RestFlow, a new scientific workflow-automation framework. RestFlow simplifies the design of AutoDrug by managing the flow of data and the organization of results and by orchestrating the execution of computational pipeline steps. It also simplifies the execution and interaction of third-party programs and the beamline-control system. Modeling AutoDrug as a scientific workflow enables multiple variants that meet the requirements of different user groups to be developed and supported. A workflow tailored to mimic the crystallography stages comprising the drug-discovery pipeline of CoCrystal Discovery Inc. has been deployed and successfully demonstrated. This workflow was run once on the same 96 samples that the group had examined manually and the workflow cycled successfully through all of the samples, collected data from the same samples that were selected manually and located the same peaks of unmodeled density in the resulting difference

  11. Integrated workflows for spiking neuronal network simulations.

    PubMed

    Antolík, Ján; Davison, Andrew P

    2013-01-01

    The increasing availability of computational resources is enabling more detailed, realistic modeling in computational neuroscience, resulting in a shift toward more heterogeneous models of neuronal circuits, and employment of complex experimental protocols. This poses a challenge for existing tool chains, as the set of tools involved in a typical modeler's workflow is expanding concomitantly, with growing complexity in the metadata flowing between them. For many parts of the workflow, a range of tools is available; however, numerous areas lack dedicated tools, while integration of existing tools is limited. This forces modelers to either handle the workflow manually, leading to errors, or to write substantial amounts of code to automate parts of the workflow, in both cases reducing their productivity. To address these issues, we have developed Mozaik: a workflow system for spiking neuronal network simulations written in Python. Mozaik integrates model, experiment and stimulation specification, simulation execution, data storage, data analysis and visualization into a single automated workflow, ensuring that all relevant metadata are available to all workflow components. It is based on several existing tools, including PyNN, Neo, and Matplotlib. It offers a declarative way to specify models and recording configurations using hierarchically organized configuration files. Mozaik automatically records all data together with all relevant metadata about the experimental context, allowing automation of the analysis and visualization stages. Mozaik has a modular architecture, and the existing modules are designed to be extensible with minimal programming effort. Mozaik increases the productivity of running virtual experiments on highly structured neuronal networks by automating the entire experimental cycle, while increasing the reliability of modeling studies by relieving the user from manual handling of the flow of metadata between the individual workflow stages.

  12. How Workflow Documentation Facilitates Curation Planning

    NASA Astrophysics Data System (ADS)

    Wickett, K.; Thomer, A. K.; Baker, K. S.; DiLauro, T.; Asangba, A. E.

    2013-12-01

    The description of the specific processes and artifacts that led to the creation of a data product provide a detailed picture of data provenance in the form of a workflow. The Site-Based Data Curation project, hosted by the Center for Informatics Research in Science and Scholarship at the University of Illinois, has been investigating how workflows can be used in developing curation processes and policies that move curation "upstream" in the research process. The team has documented an individual workflow for geobiology data collected during a single field trip to Yellowstone National Park. This specific workflow suggests a generalized three-part process for field data collection that comprises three distinct elements: a Planning Stage, a Fieldwork Stage, and a Processing and Analysis Stage. Beyond supplying an account of data provenance, the workflow has allowed the team to identify 1) points of intervention for curation processes and 2) data products that are likely candidates for sharing or deposit. Although these objects may be viewed by individual researchers as 'intermediate' data products, discussions with geobiology researchers have suggested that with appropriate packaging and description they may serve as valuable observational data for other researchers. Curation interventions may include the introduction of regularized data formats during the planning process, data description procedures, the identification and use of established controlled vocabularies, and data quality and validation procedures. We propose a poster that shows the individual workflow and our generalization into a three-stage process. We plan to discuss with attendees how well the three-stage view applies to other types of field-based research, likely points of intervention, and what kinds of interventions are appropriate and feasible in the example workflow.

  13. Multilevel Workflow System in the ATLAS Experiment

    NASA Astrophysics Data System (ADS)

    Borodin, M.; De, K.; Garcia Navarro, J.; Golubkov, D.; Klimentov, A.; Maeno, T.; Vaniachine, A.; ATLAS Collaboration

    2015-05-01

    The ATLAS experiment is scaling up Big Data processing for the next LHC run using a multilevel workflow system composed of many layers. In Big Data processing ATLAS deals with datasets, not individual files. Similarly, a task (comprising many jobs) has become a unit of the ATLAS workflow in distributed computing, with about 0.8M tasks processed per year. In order to manage the diversity of LHC physics (exceeding 35K physics samples per year), the individual data processing tasks are organized into workflows. For example, the Monte Carlo workflow is composed of many steps: generate or configure hard processes, hadronize signal and minimum-bias (pileup) events, simulate energy deposition in the ATLAS detector, digitize the electronics response, simulate triggers, reconstruct data, convert the reconstructed data into ROOT ntuples for physics analysis, etc. Outputs are merged and/or filtered as necessary to optimize the chain. The bi-level workflow manager - ProdSys2 - generates the actual workflow tasks, and their jobs are executed across more than a hundred distributed computing sites by PanDA - the ATLAS job-level workload management system. On the outer level, the Database Engine for Tasks (DEfT) empowers production managers with templated workflow definitions. On the next level, the Job Execution and Definition Interface (JEDI) is integrated with PanDA to provide dynamic job definition tailored to the sites' capabilities. We report on scaling up the production system to accommodate a growing number of requirements from the main ATLAS areas: Trigger, Physics and Data Preparation.
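
    As a sketch of how a templated workflow definition might expand into concrete jobs (the DEfT/JEDI split described above), consider the following Python fragment; the step names, transform scripts and job-splitting rule are illustrative assumptions, not the actual ProdSys2 schema.

        # Hypothetical templated Monte Carlo chain: one template, many jobs.
        MC_TEMPLATE = [
            {"step": "generate",    "transform": "Gen_tf.py"},
            {"step": "simulate",    "transform": "Sim_tf.py"},
            {"step": "digitize",    "transform": "Digi_tf.py"},
            {"step": "reconstruct", "transform": "Reco_tf.py"},
        ]

        def expand_template(template, sample, events_per_job, total_events):
            """Turn a templated workflow into concrete job definitions."""
            n_jobs = -(-total_events // events_per_job)  # ceiling division
            jobs = []
            for spec in template:
                for i in range(n_jobs):
                    jobs.append({
                        "task": f"{sample}.{spec['step']}",
                        "transform": spec["transform"],
                        "job_index": i,
                        "events": min(events_per_job,
                                      total_events - i * events_per_job),
                    })
            return jobs

        jobs = expand_template(MC_TEMPLATE, "ttbar_13TeV", 1000, 2500)
        print(len(jobs), jobs[0])  # 12 jobs across 4 steps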

  14. Editing and publishing of a medical journal. Success of an unconventional workflow.

    PubMed

    Antony, Sajjeev X; Al-Hussaini, Ala'Aldin

    2004-01-01

    Regional journals often face constraints that threaten their growth, calling for novel coping strategies. This paper outlines the problems and challenges in editing and publishing the SQU Journal for Scientific Research: Medical Sciences, the only peer-reviewed medical journal in the Sultanate of Oman. These included the absence of secretarial support and the consequent need to reduce paperwork, the fact that most papers required substantial editing even after peer review, and the lack of a single workflow for creating documents for the press and the Internet. These challenges were successfully met by creating an unconventional all-electronic workflow that catered to both the print and the online versions. The paper describes this workflow and offers suggestions for journals wishing to streamline theirs.

  15. Progress in digital color workflow understanding in the International Color Consortium (ICC) Workflow WG

    NASA Astrophysics Data System (ADS)

    McCarthy, Ann

    2006-01-01

    The ICC Workflow WG serves as the bridge between ICC color management technologies and use of those technologies in real world color production applications. ICC color management is applicable to and is used in a wide range of color systems, from highly specialized digital cinema color special effects to high volume publications printing to home photography. The ICC Workflow WG works to align ICC technologies so that the color management needs of these diverse use case systems are addressed in an open, platform independent manner. This report provides a high level summary of the ICC Workflow WG objectives and work to date, focusing on the ways in which workflow can impact image quality and color systems performance. The 'ICC Workflow Primitives' and 'ICC Workflow Patterns and Dimensions' workflow models are covered in some detail. Consider the questions, "How much of dissatisfaction with color management today is the result of 'the wrong color transformation at the wrong time' and 'I can't get to the right conversion at the right point in my work process'?" Put another way, consider how image quality through a workflow can be negatively affected when the coordination and control level of the color management system is not sufficient.

  16. A Novel Spectral Library Workflow to Enhance Protein Identifications

    PubMed Central

    Li, Haomin; Zong, Nobel C.; Liang, Xiangbo; Kim, Allen; Choi, Jeong Ho; Deng, Ning; Zelaya, Ivette; Lam, Maggie; Duan, Huilong; Ping, Peipei

    2013-01-01

    The innovations in mass spectrometry-based investigations in proteome biology enable systematic characterization of molecular details in pathophysiological phenotypes. However, the process of placing large-scale raw proteomic datasets into a biological context requires high-throughput data acquisition and processing. A spectral library search engine makes use of previously annotated experimental spectra as references for subsequent spectral analyses. This workflow delivers many advantages, including elevated analytical efficiency and specificity as well as reduced demands on computational capacity. In this study, we created a spectral matching engine to address challenges commonly associated with a library search workflow. In particular, we introduce an improved sliding dot product algorithm that is robust to systematic drifts of mass measurement in spectra. Furthermore, a noise management protocol distinguishes spectral correlation attributable to noise from that attributable to peptide fragments. It enables elevated separation between target spectral matches and false matches, thereby suppressing the possibility of propagating inaccurate peptide annotations from library spectra to query spectra. Moreover, preservation of the original spectra also accommodates user contributions that further enhance the quality of the library. Collectively, this search engine supports reproducible data analyses using curated references, thereby broadening the accessibility of proteomics resources to biomedical investigators. PMID:23391412
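
    The shift-tolerant matching idea lends itself to a compact sketch: compute the dot product of binned spectra at several small bin offsets and keep the maximum. The following Python fragment is a minimal illustration of that idea, not the paper's exact algorithm, and the toy spectra are invented.

        import numpy as np

        def sliding_dot(query, library, max_shift=2):
            """Max normalized dot product over small bin shifts, tolerating
            a systematic mass-measurement drift between spectra."""
            q = query / (np.linalg.norm(query) or 1.0)
            ref = library / (np.linalg.norm(library) or 1.0)
            best = 0.0
            for shift in range(-max_shift, max_shift + 1):
                shifted = np.roll(ref, shift)
                if shift > 0:          # zero out bins wrapped by the roll
                    shifted[:shift] = 0.0
                elif shift < 0:
                    shifted[shift:] = 0.0
                best = max(best, float(np.dot(q, shifted)))
            return best

        query = np.array([0.0, 5.0, 1.0, 0.0, 3.0, 0.0])
        library = np.array([5.0, 1.0, 0.0, 3.0, 0.0, 0.0])  # same peaks, drifted
        print(round(sliding_dot(query, library), 3))  # 1.0, found at shift +1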

  17. A novel spectral library workflow to enhance protein identifications.

    PubMed

    Li, Haomin; Zong, Nobel C; Liang, Xiangbo; Kim, Allen K; Choi, Jeong Ho; Deng, Ning; Zelaya, Ivette; Lam, Maggie; Duan, Huilong; Ping, Peipei

    2013-04-09

    The innovations in mass spectrometry-based investigations in proteome biology enable systematic characterization of molecular details in pathophysiological phenotypes. However, the process of placing large-scale raw proteomic datasets into a biological context requires high-throughput data acquisition and processing. A spectral library search engine makes use of previously annotated experimental spectra as references for subsequent spectral analyses. This workflow delivers many advantages, including elevated analytical efficiency and specificity as well as reduced demands on computational capacity. In this study, we created a spectral matching engine to address challenges commonly associated with a library search workflow. In particular, we introduce an improved sliding dot product algorithm that is robust to systematic drifts of mass measurement in spectra. Furthermore, a noise management protocol distinguishes spectral correlation attributable to noise from that attributable to peptide fragments. It enables elevated separation between target spectral matches and false matches, thereby suppressing the possibility of propagating inaccurate peptide annotations from library spectra to query spectra. Moreover, preservation of the original spectra also accommodates user contributions that further enhance the quality of the library. Collectively, this search engine supports reproducible data analyses using curated references, thereby broadening the accessibility of proteomics resources to biomedical investigators. This article is part of a Special Issue entitled: From protein structures to clinical applications.

  18. Creating OGC Web Processing Service workflows using a web-based editor

    NASA Astrophysics Data System (ADS)

    de Jesus, J.; Walker, P.; Grant, M.

    2012-04-01

    The OGC WPS (Web Processing Service) specification defines how geospatial algorithms may be accessed in an SOA (Service Oriented Architecture). Service providers can encode both simple and sophisticated algorithms as WPS processes and publish them as web services. These services are not only useful individually but may be built into complex processing chains (workflows) that can solve complex data analysis and/or scientific problems. The NETMAR project has extended the WPS framework to provide transparent integration between it and the commonly used WSDL (Web Service Description Language), which describes web services, along with its default SOAP (Simple Object Access Protocol) binding. The extensions allow WPS services to be orchestrated using commonly used tools (in this case Taverna Workbench, but BPEL-based systems would also be an option). We have also developed a web GUI service editor, based on HTML5 and the WireIt! Javascript API, that allows users to create these workflows using only a web browser. The editor is coded entirely in Javascript and performs all the XSLT transformations needed to produce a Taverna-compatible (T2FLOW) workflow description, which can be exported and run on a local Taverna Workbench or uploaded to a web-based orchestration server and run there. Here we present the NETMAR web GUI service chain editor and discuss the problems associated with the development of a web GUI for scientific workflow editing; content transformation into the Taverna orchestration language (T2FLOW/SCUFL); final orchestration in the Taverna engine; and how to deal with the large volumes of data being transferred between different WPS services (possibly running on different servers) during workflow orchestration. We will also demonstrate the use of the web GUI for creating a simple workflow from published web processing services, showing how simple services may be chained together to produce outputs that would previously have required a GIS (Geographic
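
    For readers unfamiliar with WPS, the editor's generated workflows ultimately resolve to standard Execute requests against each service. A minimal Python sketch of one such call, using the WPS 1.0.0 key-value-pair encoding, follows; the endpoint URL, process identifier and inputs are hypothetical placeholders, and a chained workflow would feed one process's output reference into the DataInputs of the next.

        import requests

        WPS_URL = "https://example.org/wps"  # placeholder endpoint

        params = {
            "service": "WPS",
            "version": "1.0.0",
            "request": "Execute",
            "identifier": "reproject",  # hypothetical process name
            # Multiple inputs are ';'-separated in the KVP encoding.
            "datainputs": "crs=EPSG:4326;input=https://example.org/data.nc",
        }
        response = requests.get(WPS_URL, params=params, timeout=60)
        response.raise_for_status()
        print(response.text[:200])  # ExecuteResponse XML with result references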

  19. Impact of digital radiography on clinical workflow.

    PubMed

    May, G A; Deer, D D; Dackiewicz, D

    2000-05-01

    It is commonly accepted that digital radiography (DR) improves workflow and patient throughput compared with traditional film radiography or computed radiography (CR). DR eliminates the film development step and the time to acquire the image from a CR reader. In addition, the wide dynamic range of DR is such that the technologist can perform the quality-control (QC) step directly at the modality in a few seconds, rather than having to transport the newly acquired image to a centralized QC station for review. Furthermore, additional workflow efficiencies can be achieved with DR by employing tight radiology information system (RIS) integration. In the DR imaging environment, this provides for patient demographic information to be automatically downloaded from the RIS to populate the DR Digital Imaging and Communications in Medicine (DICOM) image header. To learn more about this workflow efficiency improvement, we performed a comparative study of workflow steps under three different conditions: traditional film/screen x-ray, DR without RIS integration (ie, manual entry of patient demographics), and DR with RIS integration. This study was performed at the Cleveland Clinic Foundation (Cleveland, OH) using a newly acquired amorphous silicon flat-panel DR system from Canon Medical Systems (Irvine, CA). Our data show that DR without RIS results in substantial workflow savings over traditional film/screen practice. There is an additional 30% reduction in total examination time using DR with RIS integration.

  20. Integrating configuration workflows with project management system

    NASA Astrophysics Data System (ADS)

    Nilsen, Dimitri; Weber, Pavel

    2014-06-01

    The complexity of the heterogeneous computing resources, services and recurring infrastructure changes at the GridKa WLCG Tier-1 computing center requires a structured approach to configuration management and to optimizing the interplay between the functional components of the whole system. A set of tools deployed at GridKa, including Puppet, Redmine, Foreman, SVN and Icinga, provides an administrative environment that makes it possible to define and develop configuration workflows, reduce the administrative effort and improve the sustainable operation of the whole computing center. In this presentation we discuss the configuration scenarios implemented at GridKa, which we use for host installation, service deployment, change management procedures, service retirement, etc. The integration of Puppet with a project management tool like Redmine provides the opportunity to track problem issues, organize tasks and automate these workflows: the interaction between Puppet and Redmine results in automatic updates of the issues related to the executed workflow performed by the different system components. The extensive configuration workflows require collaboration and interaction between different departments, such as network, security and production, at GridKa. Redmine plugins developed at GridKa and integrated in its administrative environment provide an effective way of collaborating within the GridKa team. We present a structural overview of the software components, their connections and communication protocols, and show a few working examples of the workflows and their automation.
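
    The Puppet-to-Redmine interaction described above can be pictured with a short Python sketch: after a configuration workflow step completes, a note is appended to the tracking issue through Redmine's REST API. The server URL, issue number and API key are placeholders; only the generic Redmine issue-update endpoint is assumed.

        import requests

        REDMINE_URL = "https://redmine.example.org"  # placeholder server
        API_KEY = "replace-with-a-real-key"

        def log_workflow_step(issue_id, message):
            """Append a journal note to a Redmine issue via its REST API."""
            resp = requests.put(
                f"{REDMINE_URL}/issues/{issue_id}.json",
                json={"issue": {"notes": message}},
                headers={"X-Redmine-API-Key": API_KEY},
                timeout=30,
            )
            resp.raise_for_status()

        log_workflow_step(4242, "puppet run finished: host reinstalled")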

  1. Integrating text mining into the MGI biocuration workflow

    PubMed Central

    Dowell, K.G.; McAndrews-Hill, M.S.; Hill, D.P.; Drabkin, H.J.; Blake, J.A.

    2009-01-01

    A major challenge for functional and comparative genomics resource development is the extraction of data from the biomedical literature. Although text mining for biological data is an active research field, few applications have been integrated into production literature curation systems such as those of the model organism databases (MODs). Not only are most available biological natural language (bioNLP) and information retrieval and extraction solutions difficult to adapt to existing MOD curation workflows, but many also have high error rates or are unable to process documents available in those formats preferred by scientific journals. In September 2008, Mouse Genome Informatics (MGI) at The Jackson Laboratory initiated a search for dictionary-based text mining tools that we could integrate into our biocuration workflow. MGI has rigorous document triage and annotation procedures designed to identify appropriate articles about mouse genetics and genome biology. We currently screen ∼1000 journal articles a month for Gene Ontology terms, gene mapping, gene expression, phenotype data and other key biological information. Although we do not foresee that curation tasks will ever be fully automated, we are eager to implement named entity recognition (NER) tools for gene tagging that can help streamline our curation workflow and simplify gene indexing tasks within the MGI system. Gene indexing is an MGI-specific curation function that involves identifying which mouse genes are being studied in an article, then associating the appropriate gene symbols with the article reference number in the MGI database. Here, we discuss our search process, performance metrics and success criteria, and how we identified a short list of potential text mining tools for further evaluation. We provide an overview of our pilot projects with NCBO's Open Biomedical Annotator and Fraunhofer SCAI's ProMiner. In doing so, we prove the potential for the further incorporation of semi

  2. Integrating text mining into the MGI biocuration workflow.

    PubMed

    Dowell, K G; McAndrews-Hill, M S; Hill, D P; Drabkin, H J; Blake, J A

    2009-01-01

    A major challenge for functional and comparative genomics resource development is the extraction of data from the biomedical literature. Although text mining for biological data is an active research field, few applications have been integrated into production literature curation systems such as those of the model organism databases (MODs). Not only are most available biological natural language (bioNLP) and information retrieval and extraction solutions difficult to adapt to existing MOD curation workflows, but many also have high error rates or are unable to process documents available in those formats preferred by scientific journals. In September 2008, Mouse Genome Informatics (MGI) at The Jackson Laboratory initiated a search for dictionary-based text mining tools that we could integrate into our biocuration workflow. MGI has rigorous document triage and annotation procedures designed to identify appropriate articles about mouse genetics and genome biology. We currently screen approximately 1000 journal articles a month for Gene Ontology terms, gene mapping, gene expression, phenotype data and other key biological information. Although we do not foresee that curation tasks will ever be fully automated, we are eager to implement named entity recognition (NER) tools for gene tagging that can help streamline our curation workflow and simplify gene indexing tasks within the MGI system. Gene indexing is an MGI-specific curation function that involves identifying which mouse genes are being studied in an article, then associating the appropriate gene symbols with the article reference number in the MGI database. Here, we discuss our search process, performance metrics and success criteria, and how we identified a short list of potential text mining tools for further evaluation. We provide an overview of our pilot projects with NCBO's Open Biomedical Annotator and Fraunhofer SCAI's ProMiner. In doing so, we prove the potential for the further incorporation of semi
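
    The dictionary-based tagging at the heart of such tools can be illustrated in a few lines of Python; this sketch is not ProMiner or the NCBO Annotator, and the synonym table is a made-up fragment, but it shows the basic mechanism of mapping literature mentions to canonical gene symbols with offsets for indexing.

        import re

        # Toy synonym dictionary mapping mentions to canonical symbols.
        GENE_DICT = {
            "pax6": "Pax6",
            "small eye": "Pax6",   # historical synonym, for illustration
            "trp53": "Trp53",
            "p53": "Trp53",
        }

        # Longest synonyms first so multi-word names win over substrings.
        pattern = re.compile(
            r"\b(" + "|".join(re.escape(s) for s in
                              sorted(GENE_DICT, key=len, reverse=True)) + r")\b",
            re.IGNORECASE,
        )

        def tag_genes(text):
            return [(m.group(0), GENE_DICT[m.group(0).lower()], m.start())
                    for m in pattern.finditer(text)]

        sentence = "We show that p53 interacts with Pax6 in the developing lens."
        print(tag_genes(sentence))
        # [('p53', 'Trp53', 13), ('Pax6', 'Pax6', 32)]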

  3. Bridging the OA Data Processing and Quality Control Workflow Gap

    NASA Astrophysics Data System (ADS)

    Burger, E. F.; O'Brien, K.; Smith, K. M.; Schweitzer, R.; Manke, A. B.; Jiang, L.

    2016-02-01

    To effectively use data collected by the ocean acidification community for analysis and synthesis product generation, it is desirable that the data are quality controlled, documented, and accessible through the applications scientists prefer to use. The processing requirements and increases in data volume demand a significant effort from OAP collaborators, as second-level data processing and quality control are time-consuming. Federal and NOAA data directives now require our scientific data to be documented, publicly available and archived in two years or less, further adding to the scientists' data management burden. Time spent on these data processing activities reduces the resources available to scientists to perform their research. This data-workflow gap between initial (level one) data processing and the submission of contextually quality-controlled data to the National Data Centers has not been addressed for a significant amount of OA data. We propose tools and processes that will streamline OA data processing and contextual quality control. Our vision relies on a combination of extending existing tools and developing new ones that allow users to span this data-workflow gap, streamlining the processing, quality control, and archive submission of biogeochemical OA data and metadata. Workflows established by this software will reduce the data management burden for scientists while also creating quality-controlled data in interoperable, standards-based formats that promote easier use of the high-value data. The time saved by this streamlined data processing will also allow scientists to meet their obligations for data submission to the National Data Centers. This talk will present this vision and highlight the existing applications and tools that, if extended, can meet the requirements at a much reduced development cost.
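
    One small piece of such contextual quality control can be sketched directly in Python: automated range checks that assign WOCE-style quality flags (2 = good, 3 = questionable, 4 = bad) to each measurement. The bounds and the widening factor below are illustrative, not official OAP criteria.

        GOOD, QUESTIONABLE, BAD = 2, 3, 4  # WOCE-style quality flags

        def qc_flag(value, low, high, widen=0.1):
            """Flag a value against a plausible range, with a tolerance
            band around it for 'questionable' rather than 'bad'."""
            if low <= value <= high:
                return GOOD
            if low * (1 - widen) <= value <= high * (1 + widen):
                return QUESTIONABLE
            return BAD

        for ph in (8.05, 8.45, 9.9):  # surface seawater pH examples
            print(ph, qc_flag(ph, low=7.6, high=8.4))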

  4. Open innovation: Towards sharing of data, models and workflows.

    PubMed

    Conrado, Daniela J; Karlsson, Mats O; Romero, Klaus; Sarr, Céline; Wilkins, Justin J

    2017-07-04

    Sharing of resources across organisations to support open innovation is an old idea that is being taken up by the scientific community at increasing speed, particularly with respect to public sharing. The ability to address new questions, or to provide more precise answers to old questions, through merged information is among the attractive features of sharing. Increased efficiency through reuse, and increased reliability of scientific findings through enhanced transparency, are expected outcomes of sharing. In the field of pharmacometrics, efforts to publicly share data, models and workflows have recently started. Sharing of individual-level longitudinal data for modelling requires solving legal, ethical and proprietary issues similar to many other fields, but there are also pharmacometric-specific aspects regarding data formats, exchange standards, and database properties. Several organisations (CDISC, C-Path, IMI, ISoP) are working to solve these issues and propose standards. There are also a number of initiatives aimed at collecting disease-specific databases - Alzheimer's Disease (ADNI, CAMD), malaria (WWARN), oncology (PDS), Parkinson's Disease (PPMI), tuberculosis (CPTR, TB-PACTS, ReSeqTB) - suitable for drug-disease modelling. Organized sharing of pharmacometric executable model code and associated information has in the past been sparse, but a model repository (the DDMoRe Model Repository) intended for this purpose has recently been launched. In addition, several other services can facilitate model sharing more generally. Pharmacometric workflows have matured over the last decades, and initiatives to more fully capture those applied in analyses are ongoing. In order to maximize both the impact of pharmacometrics and the knowledge extracted from clinical data, the scientific community needs to take ownership of, and create opportunities for, open innovation. Copyright © 2017 Elsevier B.V. All rights reserved.

  5. Building interoperable health information systems using agent and workflow technologies.

    PubMed

    Koufi, Vassiliki; Malamateniou, Flora; Vassilacopoulos, George

    2009-01-01

    Healthcare is an increasingly collaborative enterprise involving many individuals and organizations that coordinate their efforts toward promoting quality and efficient delivery of healthcare through the use of interoperable healthcare information systems. This paper presents a mediator-based approach for achieving data and service interoperability among disparate and geographically dispersed healthcare information systems. The proposed system architecture enables decoupling of the client applications and the server-side implementations while it ensures security in all transactions. It is a distributed system architecture based on the agent-oriented paradigm for communication and life cycle management while interactions are described according to the workflow metaphor. Thus robustness, high flexibility and fault tolerance are provided in an environment as dynamic and heterogeneous as healthcare.

  6. Impact of CGNS on CFD Workflow

    NASA Technical Reports Server (NTRS)

    Poinot, M.; Rumsey, C. L.; Mani, M.

    2004-01-01

    CFD tools are an integral part of industrial and research processes, for which the amount of data is increasing at a high rate. These data are used in a multi-disciplinary fluid dynamics environment, including structural, thermal, chemical or even electrical topics. We show that the data specification is an important challenge that must be tackled to achieve an efficient workflow for use in this environment. We compare the process with other software techniques, such as network or database type, where past experiences showed how difficult it was to bridge the gap between completely general specifications and dedicated specific applications. We show two aspects of the use of CFD General Notation System (CGNS) that impact CFD workflow: as a data specification framework and as a data storage means. Then, we give examples of projects involving CFD workflows where the use of the CGNS standard leads to a useful method either for data specification, exchange, or storage.

  7. Optimizing CyberShake Seismic Hazard Workflows for Large HPC Resources

    NASA Astrophysics Data System (ADS)

    Callaghan, S.; Maechling, P. J.; Juve, G.; Vahi, K.; Deelman, E.; Jordan, T. H.

    2014-12-01

    The CyberShake computational platform is a well-integrated collection of scientific software and middleware that calculates 3D simulation-based probabilistic seismic hazard curves and hazard maps for the Los Angeles region. Currently each CyberShake model comprises about 235 million synthetic seismograms from about 415,000 rupture variations computed at 286 sites. CyberShake integrates large-scale parallel and high-throughput serial seismological research codes into a processing framework in which early stages produce files used as inputs by later stages. Scientific workflow tools are used to manage the jobs, data, and metadata. The Southern California Earthquake Center (SCEC) developed the CyberShake platform using USC High Performance Computing and Communications systems and open-science NSF resources. CyberShake calculations were migrated to the NSF Track 1 system NCSA Blue Waters when it became operational in 2013, via an interdisciplinary team approach including domain scientists, computer scientists, and middleware developers. Due to the excellent performance of Blue Waters and CyberShake software optimizations, we reduced the makespan (a measure of wallclock time-to-solution) of a CyberShake study from 1467 to 342 hours. We will describe the technical enhancements behind this improvement, including judicious introduction of new GPU software, improved scientific software components, increased workflow-based automation, and Blue Waters-specific workflow optimizations. Our CyberShake performance improvements highlight the benefits of scientific workflow tools. The CyberShake workflow software stack includes the Pegasus Workflow Management System (Pegasus-WMS, which includes Condor DAGMan), HTCondor, and Globus GRAM, with Pegasus-mpi-cluster managing the high-throughput tasks on the HPC resources. The workflow tools handle data management, automatically transferring about 13 TB back to SCEC storage. We will present performance metrics from the most recent Cyber

  8. A Semi-Automated Workflow Solution for Data Set Publication

    DOE PAGES

    Vannan, Suresh; Beaty, Tammy W.; Cook, Robert B.; ...

    2016-03-08

    In order to address the need for published data, considerable effort has gone into formalizing the process of data publication. From funding agencies to publishers, data publication has rapidly become a requirement. Digital Object Identifiers (DOI) and data citations have enhanced the integration and availability of data. The challenge facing data publishers now is to deal with the increased number of publishable data products and most importantly the difficulties of publishing diverse data products into an online archive. The Oak Ridge National Laboratory Distributed Active Archive Center (ORNL DAAC), a NASA-funded data center, faces these challenges as it deals with data products created by individual investigators. This paper summarizes the challenges of curating data and provides a summary of a workflow solution that ORNL DAAC researcher and technical staffs have created to deal with publication of the diverse data products. Finally, the workflow solution presented here is generic and can be applied to data from any scientific domain and data located at any data center.

  9. A Semi-Automated Workflow Solution for Data Set Publication

    SciTech Connect

    Vannan, Suresh; Beaty, Tammy W.; Cook, Robert B.; Wright, Daine M.; Devarakonda, Ranjeet; Wei, Yaxing; Hook, Les A.; McMurry, Benjamin F.

    2016-03-08

    In order to address the need for published data, considerable effort has gone into formalizing the process of data publication. From funding agencies to publishers, data publication has rapidly become a requirement. Digital Object Identifiers (DOI) and data citations have enhanced the integration and availability of data. The challenge facing data publishers now is to deal with the increased number of publishable data products and most importantly the difficulties of publishing diverse data products into an online archive. The Oak Ridge National Laboratory Distributed Active Archive Center (ORNL DAAC), a NASA-funded data center, faces these challenges as it deals with data products created by individual investigators. This paper summarizes the challenges of curating data and provides a summary of a workflow solution that ORNL DAAC researcher and technical staffs have created to deal with publication of the diverse data products. Finally, the workflow solution presented here is generic and can be applied to data from any scientific domain and data located at any data center.

  10. Scientific rigor through videogames.

    PubMed

    Treuille, Adrien; Das, Rhiju

    2014-11-01

    Hypothesis-driven experimentation - the scientific method - can be subverted by fraud, irreproducibility, and lack of rigorous predictive tests. A robust solution to these problems may be the 'massive open laboratory' model, recently embodied in the internet-scale videogame EteRNA. Deploying similar platforms throughout biology could enforce the scientific method more broadly.

  11. Edge kinetic-MHD code coupling and monitoring with Kepler workflow

    NASA Astrophysics Data System (ADS)

    Cummings, Julian; Klasky, Scott; Barreto, Roselyne; Podhorszki, Norbert; Park, Gunyoung; Chang, C. S.; Sugiyama, Linda; Snyder, Phil

    2007-11-01

    Simulations of edge pressure pedestal buildup and ELM crash in a typical DIII-D H-mode discharge are performed using Kepler, an open-source scientific workflow system that manages complex applications. A Kepler workflow conducts an edge plasma simulation that loosely couples the kinetic code XGC0 with an ideal MHD linear stability analysis code ELITE and a two-fluid MHD initial value code M3D. XGC0 simulation data are processed by the workflow into simple graphs that may be selectively displayed via the Dashboard, a monitoring tool that allows real-time data tracking within a standard Web browser. Kepler runs ELITE to assess plasma profiles from XGC0 for linear ELM instability. If unstable, Kepler launches M3D to simulate the nonlinear ELM crash. Periodic outputs of plasma fluid quantities are automatically imaged and may be displayed on the Dashboard. Finally, Kepler archives all simulation output, processed images, and provenance tracking data. Preparation, execution, and monitoring of this coupled-code simulation using the Kepler scientific workflow system are described.
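
    Stripped of the Kepler machinery, the coupling logic the workflow automates looks roughly like the Python sketch below; the executable names, command-line flags and file names are placeholders, and Kepler adds the monitoring, Dashboard imaging and archiving on top.

        import subprocess

        def run(cmd):
            subprocess.run(cmd, check=True)

        def profiles_unstable(elite_output="elite_result.txt"):
            with open(elite_output) as f:
                return "UNSTABLE" in f.read()

        for step in range(100):
            run(["xgc0", "--step", str(step)])            # kinetic pedestal buildup
            run(["elite", "--profiles", "xgc0_out.dat"])  # linear stability check
            if profiles_unstable():
                run(["m3d", "--restart", "xgc0_out.dat"])  # nonlinear ELM crash
                break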

  12. Cognitive Learning, Monitoring and Assistance of Industrial Workflows Using Egocentric Sensor Networks

    PubMed Central

    Bleser, Gabriele; Damen, Dima; Behera, Ardhendu; Hendeby, Gustaf; Mura, Katharina; Miezal, Markus; Gee, Andrew; Petersen, Nils; Maçães, Gustavo; Domingues, Hugo; Gorecky, Dominic; Almeida, Luis; Mayol-Cuevas, Walterio; Calway, Andrew; Cohn, Anthony G.; Hogg, David C.; Stricker, Didier

    2015-01-01

    Today, the workflows that are involved in industrial assembly and production activities are becoming increasingly complex. To efficiently and safely perform these workflows is demanding on the workers, in particular when it comes to infrequent or repetitive tasks. This burden on the workers can be eased by introducing smart assistance systems. This article presents a scalable concept and an integrated system demonstrator designed for this purpose. The basic idea is to learn workflows from observing multiple expert operators and then transfer the learnt workflow models to novice users. Being entirely learning-based, the proposed system can be applied to various tasks and domains. The above idea has been realized in a prototype, which combines components pushing the state of the art of hardware and software designed with interoperability in mind. The emphasis of this article is on the algorithms developed for the prototype: 1) fusion of inertial and visual sensor information from an on-body sensor network (BSN) to robustly track the user’s pose in magnetically polluted environments; 2) learning-based computer vision algorithms to map the workspace, localize the sensor with respect to the workspace and capture objects, even as they are carried; 3) domain-independent and robust workflow recovery and monitoring algorithms based on spatiotemporal pairwise relations deduced from object and user movement with respect to the scene; and 4) context-sensitive augmented reality (AR) user feedback using a head-mounted display (HMD). A distinguishing key feature of the developed algorithms is that they all operate solely on data from the on-body sensor network and that no external instrumentation is needed. The feasibility of the chosen approach for the complete action-perception-feedback loop is demonstrated on three increasingly complex datasets representing manual industrial tasks. These limited size datasets indicate and highlight the potential of the chosen technology as a

  13. Cognitive Learning, Monitoring and Assistance of Industrial Workflows Using Egocentric Sensor Networks.

    PubMed

    Bleser, Gabriele; Damen, Dima; Behera, Ardhendu; Hendeby, Gustaf; Mura, Katharina; Miezal, Markus; Gee, Andrew; Petersen, Nils; Maçães, Gustavo; Domingues, Hugo; Gorecky, Dominic; Almeida, Luis; Mayol-Cuevas, Walterio; Calway, Andrew; Cohn, Anthony G; Hogg, David C; Stricker, Didier

    2015-01-01

    Today, the workflows that are involved in industrial assembly and production activities are becoming increasingly complex. To efficiently and safely perform these workflows is demanding on the workers, in particular when it comes to infrequent or repetitive tasks. This burden on the workers can be eased by introducing smart assistance systems. This article presents a scalable concept and an integrated system demonstrator designed for this purpose. The basic idea is to learn workflows from observing multiple expert operators and then transfer the learnt workflow models to novice users. Being entirely learning-based, the proposed system can be applied to various tasks and domains. The above idea has been realized in a prototype, which combines components pushing the state of the art of hardware and software designed with interoperability in mind. The emphasis of this article is on the algorithms developed for the prototype: 1) fusion of inertial and visual sensor information from an on-body sensor network (BSN) to robustly track the user's pose in magnetically polluted environments; 2) learning-based computer vision algorithms to map the workspace, localize the sensor with respect to the workspace and capture objects, even as they are carried; 3) domain-independent and robust workflow recovery and monitoring algorithms based on spatiotemporal pairwise relations deduced from object and user movement with respect to the scene; and 4) context-sensitive augmented reality (AR) user feedback using a head-mounted display (HMD). A distinguishing key feature of the developed algorithms is that they all operate solely on data from the on-body sensor network and that no external instrumentation is needed. The feasibility of the chosen approach for the complete action-perception-feedback loop is demonstrated on three increasingly complex datasets representing manual industrial tasks. These limited size datasets indicate and highlight the potential of the chosen technology as a

  14. Reproducibility of computational workflows is automated using continuous analysis.

    PubMed

    Beaulieu-Jones, Brett K; Greene, Casey S

    2017-03-13

    Replication, validation and extension of experiments are crucial for scientific progress. Computational experiments are scriptable and should be easy to reproduce. However, computational analyses are designed and run in a specific computing environment, which may be difficult or impossible to match using written instructions. We report the development of continuous analysis, a workflow that enables reproducible computational analyses. Continuous analysis combines Docker, a container technology akin to virtual machines, with continuous integration, a software development technique, to automatically rerun a computational analysis whenever updates or improvements are made to source code or data. This enables researchers to reproduce results without contacting the study authors. Continuous analysis allows reviewers, editors or readers to verify reproducibility without manually downloading and rerunning code and can provide an audit trail for analyses of data that cannot be shared.
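
    The core rerun step of continuous analysis can be sketched in a few lines of Python: on every change to code or data, a CI job rebuilds the pinned container environment and reruns the analysis so results are regenerated without author involvement. The image tag, mount point and script name below are placeholders for a repository's own choices.

        import os
        import subprocess

        def rerun_analysis(tag="myanalysis:latest"):
            # Rebuild the containerized environment from the Dockerfile...
            subprocess.run(["docker", "build", "-t", tag, "."], check=True)
            # ...then rerun the analysis, exporting regenerated outputs.
            subprocess.run(
                ["docker", "run", "--rm",
                 "-v", os.getcwd() + "/results:/work/results",
                 tag, "python", "run_analysis.py"],
                check=True,
            )

        if __name__ == "__main__":
            rerun_analysis()  # a CI service would invoke this on each push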

  15. Text mining for the biocuration workflow.

    PubMed

    Hirschman, Lynette; Burns, Gully A P C; Krallinger, Martin; Arighi, Cecilia; Cohen, K Bretonnel; Valencia, Alfonso; Wu, Cathy H; Chatr-Aryamontri, Andrew; Dowell, Karen G; Huala, Eva; Lourenço, Anália; Nash, Robert; Veuthey, Anne-Lise; Wiegers, Thomas; Winter, Andrew G

    2012-01-01

    Molecular biology has become heavily dependent on biological knowledge encoded in expert curated biological databases. As the volume of biological literature increases, biocurators need help in keeping up with the literature; (semi-) automated aids for biocuration would seem to be an ideal application for natural language processing and text mining. However, to date, there have been few documented successes for improving biocuration throughput using text mining. Our initial investigations took place for the workshop on 'Text Mining for the BioCuration Workflow' at the third International Biocuration Conference (Berlin, 2009). We interviewed biocurators to obtain workflows from eight biological databases. This initial study revealed high-level commonalities, including (i) selection of documents for curation; (ii) indexing of documents with biologically relevant entities (e.g. genes); and (iii) detailed curation of specific relations (e.g. interactions); however, the detailed workflows also showed many variabilities. Following the workshop, we conducted a survey of biocurators. The survey identified biocurator priorities, including the handling of full text indexed with biological entities and support for the identification and prioritization of documents for curation. It also indicated that two-thirds of the biocuration teams had experimented with text mining and almost half were using text mining at that time. Analysis of our interviews and survey provide a set of requirements for the integration of text mining into the biocuration workflow. These can guide the identification of common needs across curated databases and encourage joint experimentation involving biocurators, text mining developers and the larger biomedical research community.

  16. Workflow Automation: A Collective Case Study

    ERIC Educational Resources Information Center

    Harlan, Jennifer

    2013-01-01

    Knowledge management has proven to be a sustainable competitive advantage for many organizations. Knowledge management systems are abundant, with multiple functionalities. The literature reinforces the use of workflow automation with knowledge management systems to benefit organizations; however, it was not known if process automation yielded…

  17. A Workflow to Investigate Exposure and Pharmacokinetic ...

    EPA Pesticide Factsheets

    Background: Adverse outcome pathways (AOPs) link adverse effects in individuals or populations to a molecular initiating event (MIE) that can be quantified using in vitro methods. Practical application of AOPs in chemical-specific risk assessment requires incorporation of knowledge on exposure, along with absorption, distribution, metabolism, and excretion (ADME) properties of chemicals. Objectives: We developed a conceptual workflow to examine exposure and ADME properties in relation to an MIE. The utility of this workflow was evaluated using a previously established AOP, acetylcholinesterase (AChE) inhibition. Methods: Thirty chemicals found to inhibit human AChE in the ToxCast™ assay were examined with respect to their exposure, absorption potential, and ability to cross the blood–brain barrier (BBB). Structures of active chemicals were compared against structures of 1,029 inactive chemicals to detect possible parent compounds that might have active metabolites. Results: Application of the workflow screened out 10 “low-priority” chemicals among the 30 active chemicals. Fifty-two of the 1,029 inactive chemicals met a similarity threshold of ≥ 75% with their nearest active neighbors. Of these 52 compounds, 30 were excluded due to poor absorption or distribution. The remaining 22 compounds may inhibit AChE in vivo either directly or as a result of metabolic activation. Conclusions: The incorporation of exposure and ADME properties into the conceptual workflow e
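
    The screening logic of the Results section can be reconstructed as a small Python sketch: keep actives with plausible absorption and distribution, then flag inactives whose structural similarity to a prioritized active meets the 75% threshold. The fingerprints are mocked as feature sets and the chemicals are invented; a real implementation would use proper chemical fingerprints.

        def tanimoto(fp_a, fp_b):
            """Tanimoto similarity between two binary-feature sets."""
            return len(fp_a & fp_b) / len(fp_a | fp_b)

        actives = {
            "chem_A": {"fp": {1, 2, 3, 4}, "absorbed": True,  "crosses_bbb": True},
            "chem_B": {"fp": {2, 5, 8},    "absorbed": False, "crosses_bbb": True},
        }
        inactives = {"chem_X": {1, 2, 3, 4, 9}, "chem_Y": {6, 7}}

        # Step 1: deprioritize actives unlikely to reach the target in vivo.
        prioritized = {name: props for name, props in actives.items()
                       if props["absorbed"] and props["crosses_bbb"]}

        # Step 2: flag inactives similar enough to a prioritized active to
        # be candidate parents of active metabolites.
        flagged = [name for name, fp in inactives.items()
                   if any(tanimoto(fp, p["fp"]) >= 0.75
                          for p in prioritized.values())]
        print(list(prioritized), flagged)  # ['chem_A'] ['chem_X']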

  18. Conventions and workflows for using Situs

    SciTech Connect

    Wriggers, Willy

    2012-04-01

    Recent developments of the Situs software suite for multi-scale modeling are reviewed. Typical workflows and conventions encountered during processing of biophysical data from electron microscopy, tomography or small-angle X-ray scattering are described. Situs is a modular program package for the multi-scale modeling of atomic resolution structures and low-resolution biophysical data from electron microscopy, tomography or small-angle X-ray scattering. This article provides an overview of recent developments in the Situs package, with an emphasis on workflows and conventions that are important for practical applications. The modular design of the programs facilitates scripting in the bash shell that allows specific programs to be combined in creative ways that go beyond the original intent of the developers. Several scripting-enabled functionalities, such as flexible transformations of data type, the use of symmetry constraints or the creation of two-dimensional projection images, are described. The processing of low-resolution biophysical maps in such workflows follows not only first principles but often relies on implicit conventions. Situs conventions related to map formats, resolution, correlation functions and feature detection are reviewed and summarized. The compatibility of the Situs workflow with CCP4 conventions and programs is discussed.

  19. Building Digital Audio Preservation Infrastructure and Workflows

    ERIC Educational Resources Information Center

    Young, Anjanette; Olivieri, Blynne; Eckler, Karl; Gerontakos, Theodore

    2010-01-01

    In 2009 the University of Washington (UW) Libraries special collections received funding for the digital preservation of its audio indigenous language holdings. The university libraries, where the authors work in various capacities, had begun digitizing image and text collections in 1997. Because of this, at the onset of the project, workflows (a…

  20. KDE Bioscience: platform for bioinformatics analysis workflows.

    PubMed

    Lu, Qiang; Hao, Pei; Curcin, Vasa; He, Weizhong; Li, Yuan-Yuan; Luo, Qing-Ming; Guo, Yi-Ke; Li, Yi-Xue

    2006-08-01

    Bioinformatics is a dynamic research area in which a large number of algorithms and programs have been developed rapidly and independently, without much consideration so far of the need for standardization. The lack of such common standards, combined with unfriendly interfaces, makes it difficult for biologists to learn how to use these tools and to translate data formats from one tool to another. Consequently, the construction of an integrative bioinformatics platform to facilitate biologists' research is an urgent and challenging task. KDE Bioscience is a Java-based software platform that collects a variety of bioinformatics tools and provides a workflow mechanism to integrate them. Nucleotide and protein sequences from local flat files, web sites, and relational databases can be entered, annotated, and aligned. Several home-made or third-party viewers are built in to provide visualization of annotations or alignments. KDE Bioscience can also be deployed in client-server mode, where simultaneous execution of the same workflow is supported for multiple users. Moreover, workflows can be published as web pages that can be executed from a web browser. The power of KDE Bioscience comes from the integrated algorithms and data sources. With its generic workflow mechanism, other novel calculations and simulations can be integrated to augment the current sequence analysis functions. Because of this flexible and extensible architecture, KDE Bioscience makes an ideal integrated informatics environment for future bioinformatics or systems biology research.

  1. A Workflow to Investigate Exposure and Pharmacokinetic ...

    EPA Pesticide Factsheets

    Background: Adverse outcome pathways (AOPs) link adverse effects in individuals or populations to a molecular initiating event (MIE) that can be quantified using in vitro methods. Practical application of AOPs in chemical-specific risk assessment requires incorporation of knowledge on exposure, along with absorption, distribution, metabolism, and excretion (ADME) properties of chemicals. Objectives: We developed a conceptual workflow to examine exposure and ADME properties in relation to an MIE. The utility of this workflow was evaluated using a previously established AOP, acetylcholinesterase (AChE) inhibition. Methods: Thirty chemicals found to inhibit human AChE in the ToxCast™ assay were examined with respect to their exposure, absorption potential, and ability to cross the blood–brain barrier (BBB). Structures of active chemicals were compared against structures of 1,029 inactive chemicals to detect possible parent compounds that might have active metabolites. Results: Application of the workflow screened 10 “low-priority” chemicals of 30 active chemicals. Fifty-two of the 1,029 inactive chemicals exhibited a similarity threshold of ≥ 75% with their nearest active neighbors. Of these 52 compounds, 30 were excluded due to poor absorption or distribution. The remaining 22 compounds may inhibit AChE in vivo either directly or as a result of metabolic activation. Conclusions: The incorporation of exposure and ADME properties into the conceptual workflow e

  2. Scalable Knowledge Discovery Through Grid Workflows

    DTIC Science & Technology

    2009-04-01

    22-26, 2007. 16. Gil, Yolanda, Ewa Deelman, Mark Ellisman, Thomas Fahringer, Geoffrey Fox, Dennis Gannon, Carole Goble, Miron Livny, Luc Moreau...P., Goble, C.A.: Workflow discovery: the problem, a case study from e-science and a graph-based solution. In: ICWS, IEEE Computer Society (2006) 312

  3. Workflow Automation: A Collective Case Study

    ERIC Educational Resources Information Center

    Harlan, Jennifer

    2013-01-01

    Knowledge management has proven to be a sustainable competitive advantage for many organizations. Knowledge management systems are abundant, with multiple functionalities. The literature reinforces the use of workflow automation with knowledge management systems to benefit organizations; however, it was not known if process automation yielded…

  4. Planning bioinformatics workflows using an expert system.

    PubMed

    Chen, Xiaoling; Chang, Jeffrey T

    2017-04-15

    Bioinformatic analyses are becoming formidably more complex due to the increasing number of steps required to process the data, as well as the proliferation of methods that can be used in each step. To alleviate this difficulty, pipelines are commonly employed. However, pipelines are typically implemented to automate a specific analysis, and thus are difficult to use for exploratory analyses requiring systematic changes to the software or parameters used. To automate the development of pipelines, we have investigated expert systems. We created the Bioinformatics ExperT SYstem (BETSY) that includes a knowledge base where the capabilities of bioinformatics software are explicitly and formally encoded. BETSY is a backwards-chaining rule-based expert system composed of a data model that can capture the richness of biological data, and an inference engine that reasons on the knowledge base to produce workflows. Currently, the knowledge base is populated with rules to analyze microarray and next generation sequencing data. We evaluated BETSY and found that it could generate workflows that reproduce and go beyond previously published bioinformatics results. Finally, a meta-investigation of the workflows generated from the knowledge base produced a quantitative measure of the technical burden imposed by each step of bioinformatics analyses, revealing the large number of steps devoted to the pre-processing of data. In sum, an expert system approach can facilitate exploratory bioinformatic analysis by automating the development of workflows, a task that requires significant domain expertise. https://github.com/jefftc/changlab. jeffrey.t.chang@uth.tmc.edu.
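
    A backwards-chaining planner of the kind the BETSY abstract describes can be sketched as follows. The rules below are invented stand-ins for the knowledge base: each maps input data types to an output data type, and the planner recurses from the goal back to the data on hand. This is a sketch of the general technique, assuming an acyclic rule base, not BETSY's actual data model or inference engine.

      # Minimal backward-chaining planner: rules map input data types to an
      # output type; the planner works backwards from a goal. Illustrative only.

      RULES = [
          {"tool": "align_reads",   "inputs": ["fastq", "genome_index"], "output": "bam"},
          {"tool": "index_genome",  "inputs": ["genome_fasta"],          "output": "genome_index"},
          {"tool": "call_variants", "inputs": ["bam", "genome_fasta"],   "output": "vcf"},
      ]

      def plan(goal, available, rules=RULES):
          """Return an ordered list of tool invocations producing `goal` from
          the `available` data types, or None if no plan exists. Assumes the
          rule base contains no cycles."""
          if goal in available:
              return []
          for rule in rules:
              if rule["output"] != goal:
                  continue
              steps, ok = [], True
              for needed in rule["inputs"]:
                  sub = plan(needed, available, rules)
                  if sub is None:
                      ok = False
                      break
                  steps.extend(sub)
              if ok:
                  return steps + [rule["tool"]]
          return None

      print(plan("vcf", {"fastq", "genome_fasta"}))
      # ['index_genome', 'align_reads', 'call_variants']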

  5. Building Digital Audio Preservation Infrastructure and Workflows

    ERIC Educational Resources Information Center

    Young, Anjanette; Olivieri, Blynne; Eckler, Karl; Gerontakos, Theodore

    2010-01-01

    In 2009 the University of Washington (UW) Libraries special collections received funding for the digital preservation of its audio indigenous language holdings. The university libraries, where the authors work in various capacities, had begun digitizing image and text collections in 1997. Because of this, at the onset of the project, workflows (a…

  6. RESTFul based heterogeneous Geoprocessing workflow interoperation for Sensor Web Service

    NASA Astrophysics Data System (ADS)

    Yang, Chao; Chen, Nengcheng; Di, Liping

    2012-10-01

    Advanced sensors on board satellites offer detailed Earth observations. A workflow is one approach for designing, implementing and constructing a flexible and live link between these sensors' resources and users. It can coordinate, organize and aggregate the distributed sensor Web services to meet the requirement of a complex Earth observation scenario. A RESTful-based workflow interoperation method is proposed to integrate heterogeneous workflows into an interoperable unit. The Atom protocols are applied to describe and manage workflow resources. The XML Process Definition Language (XPDL) and Business Process Execution Language (BPEL) workflow standards are applied to structure a workflow that accesses sensor information and one that processes it separately. Then, a scenario for nitrogen dioxide (NO2) from a volcanic eruption is used to investigate the feasibility of the proposed method. The RESTful-based workflow interoperation system can describe, publish, discover, access and coordinate heterogeneous Geoprocessing workflows.
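
    As a rough illustration of treating a workflow as an addressable resource, the sketch below builds an Atom entry for a hypothetical workflow using only the Python standard library. The element names follow the Atom syndication format; the workflow identifier, category, and link are invented.

      # Sketch of describing a Geoprocessing workflow resource as an Atom
      # entry, in the spirit of the method above. Metadata values are made up.
      import xml.etree.ElementTree as ET

      ATOM = "http://www.w3.org/2005/Atom"

      def workflow_entry(wf_id, title, language, href):
          """Build an Atom <entry> describing one workflow resource."""
          entry = ET.Element(f"{{{ATOM}}}entry")
          ET.SubElement(entry, f"{{{ATOM}}}id").text = wf_id
          ET.SubElement(entry, f"{{{ATOM}}}title").text = title
          ET.SubElement(entry, f"{{{ATOM}}}category", term=language)  # e.g. XPDL or BPEL
          ET.SubElement(entry, f"{{{ATOM}}}link", rel="alternate", href=href)
          return entry

      entry = workflow_entry(
          "urn:example:wf:no2-retrieval",
          "NO2 retrieval workflow",
          "BPEL",
          "http://example.org/workflows/no2",
      )
      print(ET.tostring(entry, encoding="unicode"))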

  7. AutoDrug: fully automated macromolecular crystallography workflows for fragment-based drug discovery

    SciTech Connect

    Tsai, Yingssu; McPhillips, Scott E.; González, Ana; McPhillips, Timothy M.; Zinn, Daniel; Cohen, Aina E.; Feese, Michael D.; Bushnell, David; Tiefenbrunn, Theresa; Stout, C. David; Ludaescher, Bertram; Hedman, Britt; Hodgson, Keith O.; Soltis, S. Michael

    2013-05-01

    New software has been developed for automating the experimental and data-processing stages of fragment-based drug discovery at a macromolecular crystallography beamline. A new workflow-automation framework orchestrates beamline-control and data-analysis software while organizing results from multiple samples. AutoDrug is software based upon the scientific workflow paradigm that integrates the Stanford Synchrotron Radiation Lightsource macromolecular crystallography beamlines and third-party processing software to automate the crystallography steps of the fragment-based drug-discovery process. AutoDrug screens a cassette of fragment-soaked crystals, selects crystals for data collection based on screening results and user-specified criteria and determines optimal data-collection strategies. It then collects and processes diffraction data, performs molecular replacement using provided models and detects electron density that is likely to arise from bound fragments. All processes are fully automated, i.e. are performed without user interaction or supervision. Samples can be screened in groups corresponding to particular proteins, crystal forms and/or soaking conditions. A single AutoDrug run is only limited by the capacity of the sample-storage dewar at the beamline: currently 288 samples. AutoDrug was developed in conjunction with RestFlow, a new scientific workflow-automation framework. RestFlow simplifies the design of AutoDrug by managing the flow of data and the organization of results and by orchestrating the execution of computational pipeline steps. It also simplifies the execution and interaction of third-party programs and the beamline-control system. Modeling AutoDrug as a scientific workflow enables multiple variants that meet the requirements of different user groups to be developed and supported. A workflow tailored to mimic the crystallography stages comprising the drug-discovery pipeline of CoCrystal Discovery Inc. has been deployed and successfully

  8. SegMine workflows for semantic microarray data analysis in Orange4WS

    PubMed Central

    2011-01-01

    Background In experimental data analysis, bioinformatics researchers increasingly rely on tools that enable the composition and reuse of scientific workflows. The utility of current bioinformatics workflow environments can be significantly increased by offering advanced data mining services as workflow components. Such services can support, for instance, knowledge discovery from diverse distributed data and knowledge sources (such as GO, KEGG, PubMed, and experimental databases). Specifically, cutting-edge data analysis approaches, such as semantic data mining, link discovery, and visualization, have not yet been made available to researchers investigating complex biological datasets. Results We present a new methodology, SegMine, for semantic analysis of microarray data by exploiting general biological knowledge, and a new workflow environment, Orange4WS, with integrated support for web services in which the SegMine methodology is implemented. The SegMine methodology consists of two main steps. First, the semantic subgroup discovery algorithm is used to construct elaborate rules that identify enriched gene sets. Then, a link discovery service is used for the creation and visualization of new biological hypotheses. The utility of SegMine, implemented as a set of workflows in Orange4WS, is demonstrated in two microarray data analysis applications. In the analysis of senescence in human stem cells, the use of SegMine resulted in three novel research hypotheses that could improve understanding of the underlying mechanisms of senescence and identification of candidate marker genes. Conclusions Compared to the available data analysis systems, SegMine offers improved hypothesis generation and data interpretation for bioinformatics in an easy-to-use integrated workflow environment. PMID:22029475

  9. SegMine workflows for semantic microarray data analysis in Orange4WS.

    PubMed

    Podpečan, Vid; Lavrač, Nada; Mozetič, Igor; Novak, Petra Kralj; Trajkovski, Igor; Langohr, Laura; Kulovesi, Kimmo; Toivonen, Hannu; Petek, Marko; Motaln, Helena; Gruden, Kristina

    2011-10-26

    In experimental data analysis, bioinformatics researchers increasingly rely on tools that enable the composition and reuse of scientific workflows. The utility of current bioinformatics workflow environments can be significantly increased by offering advanced data mining services as workflow components. Such services can support, for instance, knowledge discovery from diverse distributed data and knowledge sources (such as GO, KEGG, PubMed, and experimental databases). Specifically, cutting-edge data analysis approaches, such as semantic data mining, link discovery, and visualization, have not yet been made available to researchers investigating complex biological datasets. We present a new methodology, SegMine, for semantic analysis of microarray data by exploiting general biological knowledge, and a new workflow environment, Orange4WS, with integrated support for web services in which the SegMine methodology is implemented. The SegMine methodology consists of two main steps. First, the semantic subgroup discovery algorithm is used to construct elaborate rules that identify enriched gene sets. Then, a link discovery service is used for the creation and visualization of new biological hypotheses. The utility of SegMine, implemented as a set of workflows in Orange4WS, is demonstrated in two microarray data analysis applications. In the analysis of senescence in human stem cells, the use of SegMine resulted in three novel research hypotheses that could improve understanding of the underlying mechanisms of senescence and identification of candidate marker genes. Compared to the available data analysis systems, SegMine offers improved hypothesis generation and data interpretation for bioinformatics in an easy-to-use integrated workflow environment.
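
    The enriched-gene-set step can be illustrated with a plain hypergeometric test, a common building block for gene set enrichment. This is not SegMine's semantic subgroup discovery algorithm; it is a minimal sketch assuming scipy is available, with made-up counts.

      # Illustration of testing one gene set for enrichment. Requires scipy;
      # the counts below are invented.
      from scipy.stats import hypergeom

      def enrichment_p(n_genome, n_annotated, n_selected, n_overlap):
          """P(overlap >= n_overlap) when drawing n_selected genes from a
          genome of n_genome genes, n_annotated of which carry the term."""
          return hypergeom.sf(n_overlap - 1, n_genome, n_annotated, n_selected)

      # 20,000 genes; 150 annotated with a GO term; 300 differentially
      # expressed genes, 12 of which carry the term.
      p = enrichment_p(20000, 150, 300, 12)
      print(f"enrichment p-value: {p:.2e}")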

  10. Agent-Based Workflow Systems in Electronic Distance Education.

    ERIC Educational Resources Information Center

    Dlodlo, Nomusa; Dlodlo, Joseph B.; Masiye, Bighton S.

    Current workflow systems largely assume a closed network where all the software is available on a homogenous platform and all participants are locally linked together at the same time. The field of Electronic Distance Education (EDE) on the other hand, requires the next-generation workflow that will integrate workflows from a distributed…

  11. Quantitative Regression Models for the Prediction of Chemical Properties by an Efficient Workflow.

    PubMed

    Yin, Yongmin; Xu, Congying; Gu, Shikai; Li, Weihua; Liu, Guixia; Tang, Yun

    2015-10-01

    Rapid safety assessment is increasingly needed for the growing number of chemicals, both in chemical industries and for regulators around the world. Traditional experimental methods can no longer meet this demand. With the development of information technology and the growth of experimental data, in silico modeling has become a practical and rapid alternative for the assessment of chemical properties, especially for the toxicity prediction of organic chemicals. In this study, a quantitative regression workflow was built in KNIME to predict chemical properties. With this regression workflow, quantitative values of chemical properties can be obtained, unlike binary- or multi-classification models that can only give qualitative results. To illustrate the usage of the workflow, two predictive models were constructed based on datasets of Tetrahymena pyriformis toxicity and aqueous solubility. The cross-validated q² (qcv²) from 5-fold cross-validation and the external-validation q² (qtest²) for both types of models were greater than 0.7, which implies that the models are robust and reliable and that the workflow is convenient and efficient for predicting various chemical properties. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
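
    The reported validation statistic can be reproduced generically: the sketch below computes a 5-fold cross-validated q² with scikit-learn on synthetic data, standing in for the KNIME workflow and its actual datasets.

      # Sketch of the 5-fold cross-validated q^2 statistic reported above,
      # using scikit-learn as a generic stand-in for the KNIME workflow.
      # q^2 = 1 - PRESS / TSS over the out-of-fold predictions.
      import numpy as np
      from sklearn.datasets import make_regression
      from sklearn.ensemble import RandomForestRegressor
      from sklearn.model_selection import cross_val_predict

      X, y = make_regression(n_samples=200, n_features=20, noise=5.0, random_state=0)

      model = RandomForestRegressor(n_estimators=200, random_state=0)
      y_cv = cross_val_predict(model, X, y, cv=5)      # out-of-fold predictions

      press = np.sum((y - y_cv) ** 2)                  # predictive residual sum of squares
      tss = np.sum((y - y.mean()) ** 2)                # total sum of squares
      q2_cv = 1.0 - press / tss
      print(f"q^2 (5-fold CV): {q2_cv:.3f}")           # models above ~0.7 were deemed robust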

  12. Grid workflow validation using ontology-based tacit knowledge: A case study for quantitative remote sensing applications

    NASA Astrophysics Data System (ADS)

    Liu, Jia; Liu, Longli; Xue, Yong; Dong, Jing; Hu, Yingcui; Hill, Richard; Guang, Jie; Li, Chi

    2017-01-01

    Workflow for remote sensing quantitative retrieval is the "bridge" between Grid services and Grid-enabled applications of remote sensing quantitative retrieval. Workflow hides the low-level implementation details of the Grid and hence enables users to focus on higher levels of application. The workflow for remote sensing quantitative retrieval plays an important role in remote sensing Grid and Cloud computing services, which can support the modelling, construction and implementation of large-scale complicated applications of remote sensing science. The validation of workflow is important in order to support the large-scale sophisticated scientific computation processes with enhanced performance and to minimize potential waste of time and resources. To research the semantic correctness of user-defined workflows, in this paper, we propose a workflow validation method based on tacit knowledge research in the remote sensing domain. We first discuss the remote sensing model and metadata. Through detailed analysis, we then discuss the method of extracting the domain tacit knowledge and expressing the knowledge with ontology. Additionally, we construct the domain ontology with Protégé. Through our experimental study, we verify the validity of this method in two ways, namely data source consistency error validation and parameter matching error validation.
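
    The two error classes named above can be mimicked with a small dictionary standing in for the domain ontology. Everything here, model names, band requirements, and data types, is hypothetical; the point is only the shape of the consistency and parameter-matching checks.

      # Toy version of the two validation checks named above. A dictionary
      # stands in for the Protege domain ontology; all names are invented.

      ONTOLOGY = {
          "aerosol_retrieval": {"needs_band": "visible", "produces": "AOD"},
          "aod_correction":    {"consumes": "AOD", "produces": "corrected_AOD"},
      }

      def check_data_source(model, source):
          """Data source consistency: the source must carry the band the model needs."""
          need = ONTOLOGY[model].get("needs_band")
          if need and source["band"] != need:
              return [f"{model}: needs {need} band, data source provides {source['band']}"]
          return []

      def check_parameter_matching(upstream, downstream):
          """Parameter matching: downstream input type must equal upstream output type."""
          produced = ONTOLOGY[upstream]["produces"]
          expected = ONTOLOGY[downstream].get("consumes")
          if expected and expected != produced:
              return [f"{downstream}: expects {expected}, {upstream} produces {produced}"]
          return []

      errors = (check_data_source("aerosol_retrieval", {"band": "thermal"})
                + check_parameter_matching("aerosol_retrieval", "aod_correction"))
      print(errors or "workflow is semantically valid")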

  13. Data Integration Tool: From Permafrost Data Translation Research Tool to A Robust Research Application

    NASA Astrophysics Data System (ADS)

    Wilcox, H.; Schaefer, K. M.; Jafarov, E. E.; Strawhacker, C.; Pulsifer, P. L.; Thurmes, N.

    2016-12-01

    The United States National Science Foundation funded PermaData project, led by the National Snow and Ice Data Center (NSIDC) with a team from the Global Terrestrial Network for Permafrost (GTN-P), aimed to improve permafrost data access and discovery. We developed a Data Integration Tool (DIT) to significantly reduce the manual processing time needed to translate inconsistent, scattered historical permafrost data into files ready to ingest directly into the GTN-P. We leverage this data to support science research and policy decisions. DIT is a workflow manager that divides data preparation and analysis into a series of steps or operations called widgets. Each widget does a specific operation, such as read, multiply by a constant, sort, plot, and write data. DIT allows the user to select and order the widgets as desired to meet their specific needs. Originally it was written to capture a scientist's personal, iterative data manipulation and quality control process of visually and programmatically iterating through inconsistent input data, examining it to find problems, adding operations to address the problems, and rerunning until the data could be translated into the GTN-P standard format. Iterative development of this tool led first to a Fortran/Python hybrid and then, with consideration of users, licensing, version control, packaging, and workflow, to a publicly available, robust, usable application. Transitioning to Python allowed the use of open source frameworks for the workflow core and integration with a JavaScript graphical workflow interface. DIT is targeted to automatically handle 90% of the data processing for field scientists, modelers, and non-discipline scientists. It is available as an open source tool in GitHub, packaged for a subset of Mac, Windows, and UNIX systems as a desktop application with a graphical workflow manager. DIT was used to completely translate one dataset (133 sites) that was successfully added to GTN-P, nearly translate three datasets
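
    The widget idea reduces to composing small operations in a user-chosen order. Below is a minimal sketch using the widget examples from the abstract (multiply by a constant, sort); the implementation is illustrative, not DIT's actual code.

      # Minimal sketch of a widget pipeline: each widget is one small
      # operation and the user chooses the order. Data values are invented.

      def multiply(values, constant):
          return [v * constant for v in values]

      def sort_values(values):
          return sorted(values)

      def run_pipeline(values, widgets):
          """Apply each (function, kwargs) widget to the data in order."""
          for func, kwargs in widgets:
              values = func(values, **kwargs)
          return values

      raw = [3.2, -1.0, 7.5]                      # stand-in for data read from a file
      pipeline = [(multiply, {"constant": 10}), (sort_values, {})]
      print(run_pipeline(raw, pipeline))          # [-10.0, 32.0, 75.0]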

  14. Engineering robust intelligent robots

    NASA Astrophysics Data System (ADS)

    Hall, E. L.; Ali, S. M. Alhaj; Ghaffari, M.; Liao, X.; Cao, M.

    2010-01-01

    The purpose of this paper is to discuss the challenge of engineering robust intelligent robots. Robust intelligent robots may be considered as ones that work not only in one environment but in all types of situations and conditions. Our past work has described sensors for intelligent robots that permit adaptation to changes in the environment. We have also described the combination of these sensors with a "creative controller" that permits adaptive critic, neural network learning, and a dynamic database that permits task selection and criteria adjustment. However, the emphasis of this paper is on engineering solutions which are designed for robust operations and worst-case situations such as day/night cameras or rain and snow solutions. This ideal model may be compared to various approaches that have been implemented on "production vehicles and equipment" using Ethernet, CAN Bus and JAUS architectures and to modern, embedded, mobile computing architectures. Many prototype intelligent robots have been developed and demonstrated in terms of scientific feasibility but few have reached the stage of a robust engineering solution. Continual innovation and improvement are still required. The significance of this comparison is that it provides some insights that may be useful in designing future robots for various manufacturing, medical, and defense applications where robust and reliable performance is essential.

  15. geoKepler Workflow Module for Computationally Scalable and Reproducible Geoprocessing and Modeling

    NASA Astrophysics Data System (ADS)

    Cowart, C.; Block, J.; Crawl, D.; Graham, J.; Gupta, A.; Nguyen, M.; de Callafon, R.; Smarr, L.; Altintas, I.

    2015-12-01

    The NSF-funded WIFIRE project has developed an open-source, online geospatial workflow platform for unifying geoprocessing tools and models for fire and other geospatially dependent modeling applications. It is a product of WIFIRE's objective to build an end-to-end cyberinfrastructure for real-time and data-driven simulation, prediction and visualization of wildfire behavior. geoKepler includes a set of reusable GIS components, or actors, for the Kepler Scientific Workflow System (https://kepler-project.org). Actors exist for reading and writing GIS data in formats such as Shapefile, GeoJSON, KML, and using OGC web services such as WFS. The actors also allow for calling geoprocessing tools in other packages such as GDAL and GRASS. Kepler integrates functions from multiple platforms and file formats into one framework, thus enabling optimal GIS interoperability, model coupling, and scalability. Products of the GIS actors can be fed directly to models such as FARSITE and WRF. Kepler's ability to schedule and scale processes using Hadoop and Spark also makes geoprocessing ultimately extensible and computationally scalable. The reusable workflows in geoKepler can be made to run automatically when alerted by real-time environmental conditions. Here, we show breakthroughs in the speed of creating complex data for hazard assessments with this platform. We also demonstrate geoKepler workflows that use Data Assimilation to ingest real-time weather data into wildfire simulations, and for data mining techniques to gain insight into environmental conditions affecting fire behavior. Existing machine learning tools and libraries such as R and MLlib are being leveraged for this purpose in Kepler, as well as Kepler's Distributed Data Parallel (DDP) capability to provide a framework for scalable processing. geoKepler workflows can be executed via an iPython notebook as a part of a Jupyter hub at UC San Diego for sharing and reporting of the scientific analysis and results from

  16. Talkoot Portals: Discover, Tag, Share, and Reuse Collaborative Science Workflows (Invited)

    NASA Astrophysics Data System (ADS)

    Wilson, B. D.; Ramachandran, R.; Lynnes, C.

    2009-12-01

    A small but growing number of scientists are beginning to harness Web 2.0 technologies, such as wikis, blogs, and social tagging, as a transformative way of doing science. These technologies provide researchers easy mechanisms to critique, suggest and share ideas, data and algorithms. At the same time, large suites of algorithms for science analysis are being made available as remotely-invokable Web Services, which can be chained together to create analysis workflows. This provides the research community an unprecedented opportunity to collaborate by sharing their workflows with one another, reproducing and analyzing research results, and leveraging colleagues’ expertise to expedite the process of scientific discovery. However, wikis and similar technologies are limited to text, static images and hyperlinks, providing little support for collaborative data analysis. A team of information technology and Earth science researchers from multiple institutions have come together to improve community collaboration in science analysis by developing a customizable “software appliance” to build collaborative portals for Earth Science services and analysis workflows. The critical requirement is that researchers (not just information technologists) be able to build collaborative sites around service workflows within a few hours. We envision online communities coming together, much like Finnish “talkoot” (a barn raising), to build a shared research space. Talkoot extends a freely available, open source content management framework with a series of modules specific to Earth Science for registering, creating, managing, discovering, tagging and sharing Earth Science web services and workflows for science data processing, analysis and visualization. Users will be able to author a “science story” in shareable web notebooks, including plots or animations, backed up by an executable workflow that directly reproduces the science analysis. New services and workflows of

  17. Inferring Clinical Workflow Efficiency via Electronic Medical Record Utilization

    PubMed Central

    Chen, You; Xie, Wei; Gunter, Carl A; Liebovitz, David; Mehrotra, Sanjay; Zhang, He; Malin, Bradley

    2015-01-01

    Complexity in clinical workflows can lead to inefficiency in making diagnoses, ineffectiveness of treatment plans and uninformed management of healthcare organizations (HCOs). Traditional strategies to manage workflow complexity are based on measuring the gaps between workflows defined by HCO administrators and the actual processes followed by staff in the clinic. However, existing methods tend to neglect the influences of EMR systems on the utilization of workflows, which could be leveraged to optimize workflows facilitated through the EMR. In this paper, we introduce a framework to infer clinical workflows through the utilization of an EMR and show how such workflows roughly partition into four types according to their efficiency. Our framework infers workflows at several levels of granularity through data mining technologies. We study four months of EMR event logs from a large medical center, including 16,569 inpatient stays, and illustrate that approximately 95% of workflows are efficient and that 80% of patients are on such workflows. At the same time, we show that the remaining 5% of workflows may be inefficient due to a variety of factors, such as complex patients. PMID:26958173

  18. Inferring Clinical Workflow Efficiency via Electronic Medical Record Utilization.

    PubMed

    Chen, You; Xie, Wei; Gunter, Carl A; Liebovitz, David; Mehrotra, Sanjay; Zhang, He; Malin, Bradley

    Complexity in clinical workflows can lead to inefficiency in making diagnoses, ineffectiveness of treatment plans and uninformed management of healthcare organizations (HCOs). Traditional strategies to manage workflow complexity are based on measuring the gaps between workflows defined by HCO administrators and the actual processes followed by staff in the clinic. However, existing methods tend to neglect the influences of EMR systems on the utilization of workflows, which could be leveraged to optimize workflows facilitated through the EMR. In this paper, we introduce a framework to infer clinical workflows through the utilization of an EMR and show how such workflows roughly partition into four types according to their efficiency. Our framework infers workflows at several levels of granularity through data mining technologies. We study four months of EMR event logs from a large medical center, including 16,569 inpatient stays, and illustrate that approximately 95% of workflows are efficient and that 80% of patients are on such workflows. At the same time, we show that the remaining 5% of workflows may be inefficient due to a variety of factors, such as complex patients.
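
    One ingredient of such inference can be sketched directly: group EMR event logs by stay into ordered action sequences ("workflow variants") and rank variants by the share of stays that follow them. The log records below are invented for illustration; the paper's full framework goes well beyond this.

      # Group invented EMR event logs by inpatient stay into ordered action
      # sequences and rank the resulting workflow variants by frequency.
      from collections import Counter, defaultdict

      log = [  # (stay_id, timestamp, action)
          (1, 1, "admit"), (1, 2, "order_labs"), (1, 3, "discharge"),
          (2, 1, "admit"), (2, 2, "order_labs"), (2, 3, "discharge"),
          (3, 1, "admit"), (3, 2, "imaging"), (3, 3, "order_labs"), (3, 4, "discharge"),
      ]

      traces = defaultdict(list)
      for stay, ts, action in sorted(log, key=lambda r: (r[0], r[1])):
          traces[stay].append(action)

      variants = Counter(tuple(t) for t in traces.values())
      for variant, count in variants.most_common():
          share = count / len(traces)
          print(f"{share:5.0%}  {' -> '.join(variant)}")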

  19. Grid workflow job execution service 'Pilot'

    NASA Astrophysics Data System (ADS)

    Shamardin, Lev; Kryukov, Alexander; Demichev, Andrey; Ilyin, Vyacheslav

    2011-12-01

    'Pilot' is a grid job execution service for workflow jobs. The main goal for the service is to automate computations with multiple stages since they can be expressed as simple workflows. Each job is a directed acyclic graph of tasks and each task is an execution of something on a grid resource (or 'computing element'). Tasks may be submitted to any WS-GRAM (Globus Toolkit 4) service. The target resources for the tasks execution are selected by the Pilot service from the set of available resources which match the specific requirements from the task and/or job definition. Some simple conditional execution logic is also provided. The 'Pilot' service is built on the REST concepts and provides a simple API through authenticated HTTPS. This service is deployed and used in production in a Russian national grid project GridNNN.
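
    The core notion of a workflow job as a directed acyclic graph can be sketched with the Python standard library: run each task once its dependencies have finished. The task names are placeholders, and this is not the Pilot service's API.

      # Sketch of a workflow job as a DAG of tasks, executed in dependency
      # order. Requires Python 3.9+ for graphlib; task names are invented.
      from graphlib import TopologicalSorter

      # task -> set of tasks it depends on
      job = {
          "fetch_input": set(),
          "stage_a": {"fetch_input"},
          "stage_b": {"fetch_input"},
          "merge": {"stage_a", "stage_b"},
      }

      def submit(task):
          print(f"submitting {task} to a matching computing element")

      for task in TopologicalSorter(job).static_order():
          submit(task)   # a real service would dispatch ready tasks in parallel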

  20. Clinic Workflow Simulations using Secondary EHR Data

    PubMed Central

    Hribar, Michelle R.; Biermann, David; Read-Brown, Sarah; Reznick, Leah; Lombardi, Lorinna; Parikh, Mansi; Chamberlain, Winston; Yackel, Thomas R.; Chiang, Michael F.

    2016-01-01

    Clinicians today face increased patient loads, decreased reimbursements and potential negative productivity impacts of using electronic health records (EHR), but have little guidance on how to improve clinic efficiency. Discrete event simulation models are powerful tools for evaluating clinical workflow and improving efficiency, particularly when they are built from secondary EHR timing data. The purpose of this study is to demonstrate that these simulation models can be used for resource allocation decision making as well as for evaluating novel scheduling strategies in outpatient ophthalmology clinics. Key findings from this study are that: 1) secondary use of EHR timestamp data in simulation models represents clinic workflow, 2) simulations provide insight into the best allocation of resources in a clinic, 3) simulations provide critical information for schedule creation and decision making by clinic managers, and 4) simulation models built from EHR data are potentially generalizable. PMID:28269861
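
    The flavor of such a simulation can be shown in miniature: a single provider serves patients in arrival order, using (arrival, exam-duration) pairs of the kind derivable from EHR timestamps. The timing data below are invented; a real study would use a full discrete event simulation with empirical distributions.

      # Tiny discrete-event flavor of a clinic simulation: one provider,
      # first-come-first-served. The (arrival_min, exam_min) pairs are made up.

      visits = [(0, 20), (10, 15), (12, 30), (60, 10)]

      provider_free_at = 0
      total_wait = 0
      for arrival, exam in visits:
          start = max(arrival, provider_free_at)   # wait if the provider is busy
          total_wait += start - arrival
          provider_free_at = start + exam

      print(f"mean wait: {total_wait / len(visits):.1f} min")   # 9.5 min here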

  1. Advances in proteomic workflows for systems biology

    PubMed Central

    Malmström, Johan; Lee, Hookeun; Aebersold, Ruedi

    2007-01-01

    Summary and recent advances Mass spectrometry, specifically the analysis of complex peptide mixtures by liquid chromatography and tandem mass spectrometry (shotgun proteomics) has been at the center of proteomics research for the last decade. To overcome some of the fundamental limitations of the approach, including its limited sensitivity and high degree of redundancy, new proteomics workflows are being developed. Among these, targeting methods in which specific peptides are selectively isolated, identified and quantified are particularly promising. Here we summarize recent incremental advances in shotgun proteomics methods and outline emerging targeted workflows. The development of the target driven approaches with their ability to detect and quantify identical, non-redundant sets of proteins in multiple repeat analyses will be critically important for the application of proteomics to biomarker discovery and validation, and to systems biology research. PMID:17698335

  2. NeuroManager: a workflow analysis based simulation management engine for computational neuroscience.

    PubMed

    Stockton, David B; Santamaria, Fidel

    2015-01-01

    We developed NeuroManager, an object-oriented simulation management software engine for computational neuroscience. NeuroManager automates the workflow of simulation job submissions when using heterogeneous computational resources, simulators, and simulation tasks. The object-oriented approach (1) provides flexibility to adapt to a variety of neuroscience simulators, (2) simplifies the use of heterogeneous computational resources, from desktops to super computer clusters, and (3) improves tracking of simulator/simulation evolution. We implemented NeuroManager in MATLAB, a widely used engineering and scientific language, for its signal and image processing tools, prevalence in electrophysiology analysis, and increasing use in college Biology education. To design and develop NeuroManager we analyzed the workflow of simulation submission for a variety of simulators, operating systems, and computational resources, including the handling of input parameters, data, models, results, and analyses. This resulted in 22 stages of simulation submission workflow. The software incorporates progress notification, automatic organization, labeling, and time-stamping of data and results, and integrated access to MATLAB's analysis and visualization tools. NeuroManager provides users with the tools to automate daily tasks, and assists principal investigators in tracking and recreating the evolution of research projects performed by multiple people. Overall, NeuroManager provides the infrastructure needed to improve workflow, manage multiple simultaneous simulations, and maintain provenance of the potentially large amounts of data produced during the course of a research project.

  3. NeuroManager: a workflow analysis based simulation management engine for computational neuroscience

    PubMed Central

    Stockton, David B.; Santamaria, Fidel

    2015-01-01

    We developed NeuroManager, an object-oriented simulation management software engine for computational neuroscience. NeuroManager automates the workflow of simulation job submissions when using heterogeneous computational resources, simulators, and simulation tasks. The object-oriented approach (1) provides flexibility to adapt to a variety of neuroscience simulators, (2) simplifies the use of heterogeneous computational resources, from desktops to super computer clusters, and (3) improves tracking of simulator/simulation evolution. We implemented NeuroManager in MATLAB, a widely used engineering and scientific language, for its signal and image processing tools, prevalence in electrophysiology analysis, and increasing use in college Biology education. To design and develop NeuroManager we analyzed the workflow of simulation submission for a variety of simulators, operating systems, and computational resources, including the handling of input parameters, data, models, results, and analyses. This resulted in 22 stages of simulation submission workflow. The software incorporates progress notification, automatic organization, labeling, and time-stamping of data and results, and integrated access to MATLAB's analysis and visualization tools. NeuroManager provides users with the tools to automate daily tasks, and assists principal investigators in tracking and recreating the evolution of research projects performed by multiple people. Overall, NeuroManager provides the infrastructure needed to improve workflow, manage multiple simultaneous simulations, and maintain provenance of the potentially large amounts of data produced during the course of a research project. PMID:26528175

  4. IDD Archival Hardware Architecture and Workflow

    SciTech Connect

    Mendonsa, D; Nekoogar, F; Martz, H

    2008-10-09

    This document describes the functionality of every component in the DHS/IDD archival and storage hardware system shown in Fig. 1. It describes, step by step, the process by which image data are received at LLNL, processed, and made available to authorized personnel and collaborators. Throughout this document, references are made to one of two figures: Fig. 1, describing the elements of the architecture, and Fig. 2, describing the workflow and how the project utilizes the available hardware.

  5. Workflow and cost analysis on MODULAR ANALYTICS.

    PubMed

    Stolz, Herbert; Dossler, Bettina; Keller, Franz; Steigerwald, Udo

    2003-01-01

    Four stand-alone analyzers in a centralized laboratory were replaced by two modular analytical systems processing 45 methods of the general chemistry and specific protein segment. This consolidation led to a reduction of the daily workflow and operational costs. The cost saving with 1.3 million reported results per year was 53,000 Euro, which can be assessed as an important contribution to cost reduction in the health care system.

  6. Some Systematic Issues for RGB Workflow

    NASA Astrophysics Data System (ADS)

    Matsuki, Makoto

    With the recent increase of RGB data flowing from digital cameras into press printing workflows, color and image reproduction problems have become very noticeable. In order to identify these issues and their solutions, we held a joint workshop of four image-science-related societies, "Color reproduction issues from digital camera to hardcopy—its ideal and real—". This special issue consists of papers based on presentations at that workshop; this paper describes its intent and background.

  7. Quantifying nursing workflow in medication administration.

    PubMed

    Keohane, Carol A; Bane, Anne D; Featherstone, Erica; Hayes, Judy; Woolf, Seth; Hurley, Ann; Bates, David W; Gandhi, Tejal K; Poon, Eric G

    2008-01-01

    New medication administration systems are showing promise in improving patient safety at the point of care, but adoption of these systems requires significant changes in nursing workflow. To prepare for these changes, the authors report on a time-motion study that measured the proportion of time that nurses spend on various patient care activities, focusing on medication administration-related activities. Implications of their findings are discussed.

  8. Computing Workflows for Biologists: A Roadmap.

    PubMed

    Shade, Ashley; Teal, Tracy K

    2015-01-01

    Extremely large datasets have become routine in biology. However, performing a computational analysis of a large dataset can be overwhelming, especially for novices. Here, we present a step-by-step guide to computing workflows with the biologist end-user in mind. Starting from a foundation of sound data management practices, we make specific recommendations on how to approach and perform computational analyses of large datasets, with a view to enabling sound, reproducible biological research.

  9. Computing Workflows for Biologists: A Roadmap

    PubMed Central

    Shade, Ashley; Teal, Tracy K.

    2015-01-01

    Extremely large datasets have become routine in biology. However, performing a computational analysis of a large dataset can be overwhelming, especially for novices. Here, we present a step-by-step guide to computing workflows with the biologist end-user in mind. Starting from a foundation of sound data management practices, we make specific recommendations on how to approach and perform computational analyses of large datasets, with a view to enabling sound, reproducible biological research. PMID:26600012

  10. Schedule-Aware Workflow Management Systems

    NASA Astrophysics Data System (ADS)

    Mans, Ronny S.; Russell, Nick C.; van der Aalst, Wil M. P.; Moleman, Arnold J.; Bakker, Piet J. M.

    Contemporary workflow management systems offer work-items to users through specific work-lists. Users select the work-items they will perform without having a specific schedule in mind. However, in many environments work needs to be scheduled and performed at particular times. For example, in hospitals many work-items are linked to appointments, e.g., a doctor cannot perform surgery without reserving an operating theater and making sure that the patient is present. One of the problems when applying workflow technology in such domains is the lack of calendar-based scheduling support. In this paper, we present an approach that supports the seamless integration of unscheduled (flow) and scheduled (schedule) tasks. Using CPN Tools we have developed a specification and simulation model for schedule-aware workflow management systems. Based on this a system has been realized that uses YAWL, Microsoft Exchange Server 2007, Outlook, and a dedicated scheduling service. The approach is illustrated using a real-life case study at the AMC hospital in the Netherlands. In addition, we elaborate on the experiences obtained when developing and implementing a system of this scale using formal techniques.

  11. Conventions and workflows for using Situs

    PubMed Central

    Wriggers, Willy

    2012-01-01

    Situs is a modular program package for the multi-scale modeling of atomic resolution structures and low-resolution biophysical data from electron microscopy, tomography or small-angle X-ray scattering. This article provides an overview of recent developments in the Situs package, with an emphasis on workflows and conventions that are important for practical applications. The modular design of the programs facilitates scripting in the bash shell that allows specific programs to be combined in creative ways that go beyond the original intent of the developers. Several scripting-enabled functionalities, such as flexible transformations of data type, the use of symmetry constraints or the creation of two-dimensional projection images, are described. The processing of low-resolution biophysical maps in such workflows follows not only first principles but often relies on implicit conventions. Situs conventions related to map formats, resolution, correlation functions and feature detection are reviewed and summarized. The compatibility of the Situs workflow with CCP4 conventions and programs is discussed. PMID:22505255

  12. From chart tracking to workflow management.

    PubMed Central

    Srinivasan, P.; Vignes, G.; Venable, C.; Hazelwood, A.; Cade, T.

    1994-01-01

    The current interest in system-wide integration appears to be based on the assumption that an organization, by digitizing information and accepting a common standard for the exchange of such information, will improve the accessibility of this information and automatically experience benefits resulting from its more productive use. We do not dispute this reasoning, but assert that an organization's capacity for effective change is proportional to the understanding of the current structure among its personnel. Our workflow manager is based on the use of a Parameterized Petri Net (PPN) model which can be configured to represent an arbitrarily detailed picture of an organization. The PPN model can be animated to observe the model organization in action, and the results of the animation analyzed. This simulation is a dynamic ongoing process which changes with the system and allows members of the organization to pose "what if" questions as a means of exploring opportunities for change. We present the "workflow management system" as the natural successor to the tracking program, incorporating modeling, scheduling, reactive planning, performance evaluation, and simulation. This workflow management system is more than adequate for meeting the needs of a paper chart tracking system, and, as the patient record is computerized, will serve as a planning and evaluation tool in converting the paper-based health information system into a computer-based system. PMID:7950051
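
    The firing rule at the heart of a Petri net model is compact enough to sketch. The places and transitions below are invented chart-tracking steps, not the paper's Parameterized Petri Net; the sketch only shows how a transition consumes and produces tokens.

      # Minimal Petri-net firing rule: a transition fires when every input
      # place holds a token, consuming and producing tokens. Invented example.

      marking = {"chart_requested": 1, "clerk_available": 1, "chart_pulled": 0}

      transitions = {
          "pull_chart": {
              "inputs": ["chart_requested", "clerk_available"],
              "outputs": ["chart_pulled", "clerk_available"],
          },
      }

      def enabled(name):
          return all(marking[p] > 0 for p in transitions[name]["inputs"])

      def fire(name):
          assert enabled(name), f"{name} is not enabled"
          for p in transitions[name]["inputs"]:
              marking[p] -= 1
          for p in transitions[name]["outputs"]:
              marking[p] += 1

      fire("pull_chart")
      print(marking)  # {'chart_requested': 0, 'clerk_available': 1, 'chart_pulled': 1}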

  13. Tiered approach into practice: scientific validation for chromatography-based assays in early development - a recommendation from the European Bioanalysis Forum.

    PubMed

    Timmerman, Philip; White, Stephen; Dougall, Stuart Mc; Kall, Morten A; Smeraglia, John; Fjording, Marianne Scheel; Knutsson, Magnus

    2015-09-10

    The principles of the tiered approach have been part of the bioanalytical toolbox for some years. Nevertheless, and in spite of many valuable discussions in industry, they remain difficult to apply in a harmonized way across the broad array of studies in early drug development where these alternative approaches to regulated validation would make sense. The European Bioanalysis Forum has identified the need to propose practical workflows for five categories of studies using chromatography-based assays, where scientific validation allows additional freedom while safeguarding scientific rigor and robust documentation: quantification of metabolites in plasma in relation to ICH M3(R2), urine analysis, tissue homogenate analysis, and preclinical and clinical studies in the early stages of drug development. The recommendation would introduce a common language and harmonized best practice for these study categories and could help refocus efforts towards optimized scientific and resource investments for bioanalysis in early drug development.

  14. Create, run, share, publish, and reference your LC-MS, FIA-MS, GC-MS, and NMR data analysis workflows with the Workflow4Metabolomics 3.0 Galaxy online infrastructure for metabolomics.

    PubMed

    Guitton, Yann; Tremblay-Franco, Marie; Le Corguillé, Gildas; Martin, Jean-François; Pétéra, Mélanie; Roger-Mele, Pierrick; Delabrière, Alexis; Goulitquer, Sophie; Monsoor, Misharl; Duperier, Christophe; Canlet, Cécile; Servien, Rémi; Tardivel, Patrick; Caron, Christophe; Giacomoni, Franck; Thévenot, Etienne A

    2017-07-12

    Metabolomics is a key approach in modern functional genomics and systems biology. Due to the complexity of metabolomics data, the variety of experimental designs, and the multiplicity of bioinformatics tools, providing experimenters with a simple and efficient resource to conduct comprehensive and rigorous analysis of their data is of utmost importance. In 2014, we launched the Workflow4Metabolomics (W4M; http://workflow4metabolomics.org) online infrastructure for metabolomics built on the Galaxy environment, which offers user-friendly features to build and run data analysis workflows including preprocessing, statistical analysis, and annotation steps. Here we present the new W4M 3.0 release, which contains twice as many tools as the first version, and provides two features which are, to our knowledge, unique among online resources. First, data from the four major metabolomics technologies (i.e., LC-MS, FIA-MS, GC-MS, and NMR) can be analyzed on a single platform. By using three studies in human physiology, alga evolution, and animal toxicology, we demonstrate how the 40 available tools can be easily combined to address biological issues. Second, the full analysis (including the workflow, the parameter values, the input data and output results) can be referenced with a permanent digital object identifier (DOI). Publication of data analyses is of major importance for robust and reproducible science. Furthermore, the publicly shared workflows are of high-value for e-learning and training. The Workflow4Metabolomics 3.0 e-infrastructure thus not only offers a unique online environment for analysis of data from the main metabolomics technologies, but it is also the first reference repository for metabolomics workflows. Copyright © 2017 Elsevier Ltd. All rights reserved.

  15. Robust Nanoparticles

    DTIC Science & Technology

    2015-01-21

    advanced the chemistry of functional nanoparticles and used these particles in advanced materials assembly for the fabrication of nanoparticle...polymer ligands, and the robustness resulting from ligand cross-linking post-assembly. The project developed a facile evaporative assembly method...used these particles in advanced materials assembly for the fabrication of nanoparticle-based mesostructures. These hybrid materials possess extremely

  16. Workflow-Oriented Cyberinfrastructure for Sensor Data Analytics

    NASA Astrophysics Data System (ADS)

    Orcutt, J. A.; Rajasekar, A.; Moore, R. W.; Vernon, F.

    2015-12-01

    Sensor streams comprise an increasingly large part of Earth Science data. Analytics based on sensor data require an easy way to perform operations such as acquisition, conversion to physical units, metadata linking, sensor fusion, analysis and visualization on distributed sensor streams. Furthermore, embedding real-time sensor data into scientific workflows is of growing interest. We have implemented a scalable networked architecture that can be used to dynamically access packets of data in a stream from multiple sensors, and perform synthesis and analysis across a distributed network. Our system is based on the integrated Rule Oriented Data System (irods.org), which accesses sensor data from the Antelope Real Time Data System (brtt.com), and provides virtualized access to collections of data streams. We integrate real-time data streaming from different sources, collected for different purposes, on different time and spatial scales, and sensed by different methods. iRODS, noted for its policy-oriented data management, brings to sensor processing features and facilities such as single sign-on, third-party access control lists (ACLs), location transparency, logical resource naming, and server-side modeling capabilities while reducing the burden on sensor network operators. Rich integrated metadata support also makes it straightforward to discover data streams of interest and maintain data provenance. The workflow support in iRODS readily integrates sensor processing into any analytical pipeline. The system is developed as part of the NSF-funded Datanet Federation Consortium (datafed.org). APIs for selecting, opening, reaping and closing sensor streams are provided, along with other helper functions to associate metadata and convert sensor packets into NetCDF and JSON formats. Near real-time sensor data including seismic sensors, environmental sensors, LIDAR and video streams are available through this interface. A system for archiving sensor data and metadata in Net
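
    The select/open/reap/close pattern the abstract mentions can be outlined with placeholder classes. All names and return values below are hypothetical illustrations of the usage pattern, not the actual iRODS or Datanet Federation Consortium API.

      # Shape of a select/open/reap/close stream client. Every name and
      # return value here is a hypothetical placeholder, not a real API.
      class SensorStreamClient:
          def select(self, pattern):
              """Return IDs of streams whose metadata matches `pattern`."""
              return ["seismic/stationA"]          # placeholder result

          def open(self, stream_id):
              print(f"opening {stream_id}")
              return object()                      # placeholder stream handle

          def reap(self, handle, max_packets=10):
              """Fetch up to max_packets new packets from the stream."""
              return [{"t": 0.0, "value": 1.2}]    # placeholder packets

          def close(self, handle):
              print("closed")

      client = SensorStreamClient()
      for sid in client.select("seismic/*"):
          h = client.open(sid)
          for packet in client.reap(h):
              print(packet)                        # convert, fuse, or archive here
          client.close(h)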

  17. An efficient field and laboratory workflow for plant phylotranscriptomic projects

    PubMed Central

    Yang, Ya; Moore, Michael J.; Brockington, Samuel F.; Timoneda, Alfonso; Feng, Tao; Marx, Hannah E.; Walker, Joseph F.; Smith, Stephen A.

    2017-01-01

    Premise of the study: We describe a field and laboratory workflow developed for plant phylotranscriptomic projects that involves cryogenic tissue collection in the field, RNA extraction and quality control, and library preparation. We also make recommendations for sample curation. Methods and Results: A total of 216 frozen tissue samples of Caryophyllales and other angiosperm taxa were collected from the field or botanical gardens. RNA was extracted, stranded mRNA libraries were prepared, and libraries were sequenced on Illumina HiSeq platforms. These included difficult mucilaginous tissues such as those of Cactaceae and Droseraceae. Conclusions: Our workflow is not only cost effective (ca. $270 per sample, as of August 2016, from tissue to reads) and time efficient (less than 50 h for 10–12 samples including all laboratory work and sample curation), but also has proven robust for extraction of difficult samples such as tissues containing high levels of secondary compounds. PMID:28337391

  18. Tavaxy: Integrating Taverna and Galaxy workflows with cloud computing support

    PubMed Central

    2012-01-01

    Background Over the past decade the workflow system paradigm has evolved as an efficient and user-friendly approach for developing complex bioinformatics applications. Two popular workflow systems that have gained acceptance by the bioinformatics community are Taverna and Galaxy. Each system has a large user-base and supports an ever-growing repository of application workflows. However, workflows developed for one system cannot be imported and executed easily on the other. The lack of interoperability is due to differences in the models of computation, workflow languages, and architectures of both systems. This lack of interoperability limits sharing of workflows between the user communities and leads to duplication of development efforts. Results In this paper, we present Tavaxy, a stand-alone system for creating and executing workflows based on using an extensible set of re-usable workflow patterns. Tavaxy offers a set of new features that simplify and enhance the development of sequence analysis applications: It allows the integration of existing Taverna and Galaxy workflows in a single environment, and supports the use of cloud computing capabilities. The integration of existing Taverna and Galaxy workflows is supported seamlessly at both run-time and design-time levels, based on the concepts of hierarchical workflows and workflow patterns. The use of cloud computing in Tavaxy is flexible, where the users can either instantiate the whole system on the cloud, or delegate the execution of certain sub-workflows to the cloud infrastructure. Conclusions Tavaxy reduces the workflow development cycle by introducing the use of workflow patterns to simplify workflow creation. It enables the re-use and integration of existing (sub-) workflows from Taverna and Galaxy, and allows the creation of hybrid workflows. Its additional features exploit recent advances in high performance cloud computing to cope with the increasing data size and complexity of analysis. The system

  19. Tavaxy: integrating Taverna and Galaxy workflows with cloud computing support.

    PubMed

    Abouelhoda, Mohamed; Issa, Shadi Alaa; Ghanem, Moustafa

    2012-05-04

    Over the past decade the workflow system paradigm has evolved as an efficient and user-friendly approach for developing complex bioinformatics applications. Two popular workflow systems that have gained acceptance by the bioinformatics community are Taverna and Galaxy. Each system has a large user-base and supports an ever-growing repository of application workflows. However, workflows developed for one system cannot be imported and executed easily on the other. The lack of interoperability is due to differences in the models of computation, workflow languages, and architectures of both systems. This lack of interoperability limits sharing of workflows between the user communities and leads to duplication of development efforts. In this paper, we present Tavaxy, a stand-alone system for creating and executing workflows based on using an extensible set of re-usable workflow patterns. Tavaxy offers a set of new features that simplify and enhance the development of sequence analysis applications: It allows the integration of existing Taverna and Galaxy workflows in a single environment, and supports the use of cloud computing capabilities. The integration of existing Taverna and Galaxy workflows is supported seamlessly at both run-time and design-time levels, based on the concepts of hierarchical workflows and workflow patterns. The use of cloud computing in Tavaxy is flexible, where the users can either instantiate the whole system on the cloud, or delegate the execution of certain sub-workflows to the cloud infrastructure. Tavaxy reduces the workflow development cycle by introducing the use of workflow patterns to simplify workflow creation. It enables the re-use and integration of existing (sub-) workflows from Taverna and Galaxy, and allows the creation of hybrid workflows. Its additional features exploit recent advances in high performance cloud computing to cope with the increasing data size and complexity of analysis.The system can be accessed either through a
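
    The workflow-pattern idea can be sketched with two combinators, sequence and parallel split, composed hierarchically. Tasks here are plain Python functions; nothing below is Tavaxy's actual implementation.

      # Sketch of workflow patterns as combinators: sequence chains tasks,
      # parallel_split runs branches concurrently and merges the results.
      from concurrent.futures import ThreadPoolExecutor

      def sequence(*tasks):
          """Run tasks one after another, feeding each the previous result."""
          def run(data):
              for task in tasks:
                  data = task(data)
              return data
          return run

      def parallel_split(*branches):
          """Run branches concurrently on the same input; merge results in a list."""
          def run(data):
              with ThreadPoolExecutor() as pool:
                  return [f.result() for f in [pool.submit(b, data) for b in branches]]
          return run

      clean = lambda seq: seq.upper()
      count_gc = lambda seq: seq.count("G") + seq.count("C")

      # A sequence whose second step is itself a pattern: hierarchical workflows.
      workflow = sequence(clean, parallel_split(count_gc, len))
      print(workflow("acgtgg"))   # [4, 6]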

  20. Wildfire: distributed, Grid-enabled workflow construction and execution.

    PubMed

    Tang, Francis; Chua, Ching Lian; Ho, Liang-Yoong; Lim, Yun Ping; Issac, Praveen; Krishnan, Arun

    2005-03-24

    We observe two trends in bioinformatics: (i) analyses are increasing in complexity, often requiring several applications to be run as a workflow; and (ii) multiple CPU clusters and Grids are available to more scientists. The traditional solution to the problem of running workflows across multiple CPUs required programming, often in a scripting language such as perl. Programming places such solutions beyond the reach of many bioinformatics consumers. We present Wildfire, a graphical user interface for constructing and running workflows. Wildfire borrows user interface features from Jemboss and adds a drag-and-drop interface allowing the user to compose EMBOSS (and other) programs into workflows. For execution, Wildfire uses GEL, the underlying workflow execution engine, which can exploit available parallelism on multiple CPU machines including Beowulf-class clusters and Grids. Wildfire simplifies the tasks of constructing and executing bioinformatics workflows.
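
    The dependency-driven parallelism described above can be sketched generically: repeatedly launch every step whose prerequisites have finished. This is not Wildfire's or GEL's actual API; the DAG and task names below are hypothetical.

```python
import concurrent.futures

# Toy workflow DAG: each task lists the tasks it depends on.
# Task names are illustrative, not part of Wildfire or GEL.
DAG = {
    "fetch": [],
    "align": ["fetch"],
    "annotate": ["fetch"],
    "report": ["align", "annotate"],
}

def run(task):
    print(f"running {task}")
    return task

def execute(dag):
    done = set()
    with concurrent.futures.ThreadPoolExecutor() as pool:
        while len(done) < len(dag):
            # Tasks whose dependencies are all finished are independent
            # of one another and can run in parallel.
            ready = [t for t, deps in dag.items()
                     if t not in done and all(d in done for d in deps)]
            if not ready:
                raise ValueError("cycle detected in workflow DAG")
            for future in [pool.submit(run, t) for t in ready]:
                done.add(future.result())

execute(DAG)  # "align" and "annotate" may run concurrently
```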

  1. Wildfire: distributed, Grid-enabled workflow construction and execution

    PubMed Central

    Tang, Francis; Chua, Ching Lian; Ho, Liang-Yoong; Lim, Yun Ping; Issac, Praveen; Krishnan, Arun

    2005-01-01

    Background We observe two trends in bioinformatics: (i) analyses are increasing in complexity, often requiring several applications to be run as a workflow; and (ii) multiple CPU clusters and Grids are available to more scientists. The traditional solution to the problem of running workflows across multiple CPUs required programming, often in a scripting language such as perl. Programming places such solutions beyond the reach of many bioinformatics consumers. Results We present Wildfire, a graphical user interface for constructing and running workflows. Wildfire borrows user interface features from Jemboss and adds a drag-and-drop interface allowing the user to compose EMBOSS (and other) programs into workflows. For execution, Wildfire uses GEL, the underlying workflow execution engine, which can exploit available parallelism on multiple CPU machines including Beowulf-class clusters and Grids. Conclusion Wildfire simplifies the tasks of constructing and executing bioinformatics workflows. PMID:15788106

  2. Text mining for the biocuration workflow

    PubMed Central

    Hirschman, Lynette; Burns, Gully A. P. C; Krallinger, Martin; Arighi, Cecilia; Cohen, K. Bretonnel; Valencia, Alfonso; Wu, Cathy H.; Chatr-Aryamontri, Andrew; Dowell, Karen G.; Huala, Eva; Lourenço, Anália; Nash, Robert; Veuthey, Anne-Lise; Wiegers, Thomas; Winter, Andrew G.

    2012-01-01

    Molecular biology has become heavily dependent on biological knowledge encoded in expert curated biological databases. As the volume of biological literature increases, biocurators need help in keeping up with the literature; (semi-) automated aids for biocuration would seem to be an ideal application for natural language processing and text mining. However, to date, there have been few documented successes for improving biocuration throughput using text mining. Our initial investigations took place for the workshop on ‘Text Mining for the BioCuration Workflow’ at the third International Biocuration Conference (Berlin, 2009). We interviewed biocurators to obtain workflows from eight biological databases. This initial study revealed high-level commonalities, including (i) selection of documents for curation; (ii) indexing of documents with biologically relevant entities (e.g. genes); and (iii) detailed curation of specific relations (e.g. interactions); however, the detailed workflows also showed many variabilities. Following the workshop, we conducted a survey of biocurators. The survey identified biocurator priorities, including the handling of full text indexed with biological entities and support for the identification and prioritization of documents for curation. It also indicated that two-thirds of the biocuration teams had experimented with text mining and almost half were using text mining at that time. Analysis of our interviews and survey provide a set of requirements for the integration of text mining into the biocuration workflow. These can guide the identification of common needs across curated databases and encourage joint experimentation involving biocurators, text mining developers and the larger biomedical research community. PMID:22513129

  3. Assessment of workflow redesign in community pharmacy.

    PubMed

    Angelo, Lauren B; Ferreri, Stefanie P

    2005-01-01

    To assess the effect that workflow enhancements have on dispensing responsibilities and pharmacist-patient interaction in the community pharmacy setting. Pre-post comparison. Pre-assessment data were obtained from a multisite observational study. Pharmacy within a regional pharmacy chain. 3 pharmacists and 110 patients. The pharmacy was physically remodeled to enable workflow changes, including defining dispensing responsibilities with an emphasis on patient counseling, providing an additional 6 feet of counter space, upgrading technology, installing a third computer, implementing tools to augment the filling process, and requesting that cashiers rephrase the offer to counsel to encourage patient acceptance. Patients and pharmacists were surveyed about their experiences and beliefs, and pharmacy activities were observed directly. Patient counseling and prescription dispensing activities. The number of pharmacists who perceived that they had adequate time to counsel patients increased as a result of the intervention (0 of 3 responding pharmacists before the intervention, compared with 2 of 2 afterward). Patient satisfaction scores both before and after the intervention were predominantly favorable and did not differ significantly. The most relevant change in dispensing activities was pharmacist involvement with data entry into the computer, which decreased from 61% to 10%. Oral counseling offers to patients increased significantly, from 5% to 85%, but counseling rates remained low throughout the study and were not measurably affected by workload. Workflow redesign has positively affected the dispensing activities at the study site. Technicians took more responsibility for dispensing tasks. Given the drastic increase in counseling offers but lack of effect on counseling rates, patient behavior and expectations with regard to counseling likely need to change to further improve dynamics in the community pharmacy.

  4. Automatic Provenance Recording for Scientific Data using Trident

    NASA Astrophysics Data System (ADS)

    Simmhan, Y.; Barga, R.; van Ingen, C.

    2008-12-01

    Provenance is increasingly recognized as being critical to the understanding and reuse of scientific datasets. Given the rapid generation of scientific data from sensors and computational model results, it is not practical to manually record provenance for data, and automated techniques for provenance capture are essential. Scientific workflows provide a framework for representing computational models and complex transformations of scientific data, and present a means for tracking the operations performed to derive a dataset. The Trident Scientific Workbench is a workflow system that natively incorporates provenance capture of data derived as part of the workflow execution. The applications used as part of a Trident workflow can execute on a remote computational cluster, such as a supercomputing center or in the Cloud, or on the local desktop of the researcher, and provenance on data derived by the applications is seamlessly captured. Scientists also have the option to annotate the provenance metadata using domain-specific tags such as GCMD keywords. The provenance records thus captured can be exported in the Open Provenance Model* XML format that is emerging as a provenance standard in the eScience community, or visualized as a graph of data and applications. The Trident workflow system and the provenance recorded by it have been successfully applied in the Neptune oceanography project and are presently being tested in the Pan-STARRS astronomy project. *http://twiki.ipaw.info/bin/view/Challenge/OPM
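
    The abstract does not expose Trident's internals, but capturing provenance as a side effect of execution can be sketched with a decorator that records each activity's inputs, output, and timing. The track decorator, the in-memory store, and the calibrate step are illustrative assumptions, not Trident's API.

```python
import functools
import time

PROVENANCE = []  # stand-in for a real provenance store

def track(func):
    """Record inputs, output and timing of a workflow activity."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        started = time.time()
        result = func(*args, **kwargs)
        PROVENANCE.append({
            "activity": func.__name__,
            "inputs": {"args": args, "kwargs": kwargs},
            "output": result,
            "started": started,
            "ended": time.time(),
        })
        return result
    return wrapper

@track
def calibrate(reading, offset=0.5):  # hypothetical workflow step
    return reading + offset

calibrate(21.3)
print(PROVENANCE)
```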

  5. Toward Exascale Seismic Imaging: Taming Workflow and I/O Issues

    NASA Astrophysics Data System (ADS)

    Lefebvre, M. P.; Bozdag, E.; Lei, W.; Rusmanugroho, H.; Smith, J. A.; Tromp, J.; Yuan, Y.

    2013-12-01

    Providing a better understanding of the physics and chemistry of Earth's interior through numerical simulations has always required tremendous computational resources. Post-petascale supercomputers are now available to solve complex scientific problems that were thought unreachable a few decades ago. They also bring a cohort of concerns on how to obtain optimum performance. Several issues are currently being investigated by the HPC community. To name a few, we can list energy consumption, fault resilience, scalability of the current parallel paradigms, large workflow management, I/O performance and feature extraction with large datasets. For this presentation, we focus on the last three issues. In the context of seismic imaging, in particular for simulations based on adjoint methods, workflows are well defined. They consist of a few collective steps (e.g., mesh generation or model updates) and of a large number of independent steps (e.g., forward and adjoint simulations of each seismic event, pre- and postprocessing of seismic traces). The greater goal is to reduce the time to solution, that is, obtaining a more precise representation of the subsurface as fast as possible. This brings us to consider both the workflow in its entirety and the parts composing it. The usual approach is to speed up the purely computational parts by code tuning in order to reach higher FLOPS and better memory usage. This still remains an important concern, but larger scale experiments show that the imaging workflow suffers from a severe I/O bottleneck. This limitation occurs both for purely computational data and seismic time series. The latter are dealt with by the introduction of a new Adaptable Seismic Data Format (ASDF). In both cases, a parallel I/O library, ORNL's ADIOS, is used to drastically lessen the weight of disk access. Moreover, parallel visualization tools, such as VisIt, are able to take advantage of the metadata included in our ADIOS outputs to extract features and

  6. JMS: An Open Source Workflow Management System and Web-Based Cluster Front-End for High Performance Computing.

    PubMed

    Brown, David K; Penkler, David L; Musyoka, Thommas M; Bishop, Özlem Tastan

    2015-01-01

    Complex computational pipelines are becoming a staple of modern scientific research. Often these pipelines are resource intensive and require days of computing time. In such cases, it makes sense to run them over high performance computing (HPC) clusters where they can take advantage of the aggregated resources of many powerful computers. In addition to this, researchers often want to integrate their workflows into their own web servers. In these cases, software is needed to manage the submission of jobs from the web interface to the cluster and then return the results once the job has finished executing. We have developed the Job Management System (JMS), a workflow management system and web interface for high performance computing (HPC). JMS provides users with a user-friendly web interface for creating complex workflows with multiple stages. It integrates this workflow functionality with the resource manager, a tool that is used to control and manage batch jobs on HPC clusters. As such, JMS combines workflow management functionality with cluster administration functionality. In addition, JMS provides developer tools including a code editor and the ability to version tools and scripts. JMS can be used by researchers from any field to build and run complex computational pipelines and provides functionality to include these pipelines in external interfaces. JMS is currently being used to house a number of bioinformatics pipelines at the Research Unit in Bioinformatics (RUBi) at Rhodes University. JMS is an open-source project and is freely available at https://github.com/RUBi-ZA/JMS.
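
    JMS drives an existing resource manager rather than reimplementing one. A minimal sketch of that submission pattern, assuming a PBS/Torque-style cluster with a qsub command on the path (the directive and script below are illustrative, not JMS's own API):

```python
import subprocess
import tempfile
import textwrap

def submit(commands, name="demo-job"):
    """Write a batch script and hand it to the resource manager."""
    script = textwrap.dedent(f"""\
        #!/bin/bash
        #PBS -N {name}
        {commands}
    """)
    with tempfile.NamedTemporaryFile("w", suffix=".sh", delete=False) as f:
        f.write(script)
        path = f.name
    result = subprocess.run(["qsub", path], capture_output=True,
                            text=True, check=True)
    return result.stdout.strip()  # job id echoed by the scheduler

# submit("echo hello")  # requires a cluster head node with qsub installed
```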

  7. JMS: An Open Source Workflow Management System and Web-Based Cluster Front-End for High Performance Computing

    PubMed Central

    Brown, David K.; Penkler, David L.; Musyoka, Thommas M.; Bishop, Özlem Tastan

    2015-01-01

    Complex computational pipelines are becoming a staple of modern scientific research. Often these pipelines are resource intensive and require days of computing time. In such cases, it makes sense to run them over high performance computing (HPC) clusters where they can take advantage of the aggregated resources of many powerful computers. In addition to this, researchers often want to integrate their workflows into their own web servers. In these cases, software is needed to manage the submission of jobs from the web interface to the cluster and then return the results once the job has finished executing. We have developed the Job Management System (JMS), a workflow management system and web interface for high performance computing (HPC). JMS provides users with a user-friendly web interface for creating complex workflows with multiple stages. It integrates this workflow functionality with the resource manager, a tool that is used to control and manage batch jobs on HPC clusters. As such, JMS combines workflow management functionality with cluster administration functionality. In addition, JMS provides developer tools including a code editor and the ability to version tools and scripts. JMS can be used by researchers from any field to build and run complex computational pipelines and provides functionality to include these pipelines in external interfaces. JMS is currently being used to house a number of bioinformatics pipelines at the Research Unit in Bioinformatics (RUBi) at Rhodes University. JMS is an open-source project and is freely available at https://github.com/RUBi-ZA/JMS. PMID:26280450

  8. Smart tools manage digital imagery access and workflow

    NASA Astrophysics Data System (ADS)

    Buzi, Miriam; LaFramboise, William A.

    2000-05-01

    Lockheed Martin's Intelligent Library System (ILS)™ imagery management solution was originally developed for users and distributors of Earth imagery emanating from commercial remote sensing satellites or aircraft. The product is a total hardware and software solution comprised of two main components: the SmartArchiver™ digital asset management system and the SmartAnalyst™ imagery exploration tools. While investigating the latest technologies and developing the Intelligent Library System (ILS)™ as a state-of-the-art system, we realized SmartArchiver systems offered robust functionality not available elsewhere for handling large medical imagery files. The SmartArchiver system's features answer the following needs of medical imagery handling: smooth handling of large individual imagery files; easy access to specific imagery or types of imagery; cost-effective storage of historical data and protection of imagery over time; ability to grow an archive to thousands of terabytes; distribution from a central archive to multiple viewing sites; varying resolution requirements at the viewing stations; strict multi-level security adherence; and automated workflow management. In this paper we detail the features of the system and how they apply to medical imagery management. We also describe how a medical application can be served by the SmartArchiver asset management system.

  9. Automated workflow for large-scale selected reaction monitoring experiments.

    PubMed

    Malmström, Lars; Malmström, Johan; Selevsek, Nathalie; Rosenberger, George; Aebersold, Ruedi

    2012-03-02

    Targeted proteomics allows researchers to study proteins of interest without being drowned in data from other, less interesting proteins or from redundant or uninformative peptides. While the technique is mostly used for smaller, focused studies, there are several reasons to conduct larger targeted experiments. Automated, highly robust software becomes more important in such experiments. In addition, larger experiments are carried out over longer periods of time, requiring strategies to handle the sometimes large shift in retention time often observed. We present a complete proof-of-principle software stack that automates most aspects of selected reaction monitoring workflows, a targeted proteomics technology. The software allows experiments to be easily designed and carried out. The steps automated are the generation of assays, generation of mass spectrometry driver files and methods files, and the import and analysis of the data. All data are normalized to a common retention time scale, the data are then scored using a novel score model, and the error is subsequently estimated. We also show that selected reaction monitoring can be used for label-free quantification. All data generated are stored in a relational database, and the growing resource further facilitates the design of new experiments. We apply the technology to a large-scale experiment studying how Streptococcus pyogenes remodels its proteome under stimulation of human plasma.
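
    The score model is the paper's own contribution, but the retention-time normalization step follows a familiar pattern that can be sketched: fit a mapping from each run's observed retention times of shared reference peptides onto the common scale. All numbers below are made up for illustration.

```python
import numpy as np

# Observed retention times (min) of shared reference peptides in one run,
# and their agreed coordinates on the common scale (values invented).
observed = np.array([10.2, 18.9, 27.5, 41.0])
common = np.array([10.0, 19.0, 28.0, 41.5])

# Least-squares linear mapping: observed run time -> common scale.
slope, intercept = np.polyfit(observed, common, 1)

def to_common_scale(rt_minutes):
    return slope * rt_minutes + intercept

print(to_common_scale(33.1))  # any peak in this run can now be aligned
```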

  10. Workflow management for a cosmology collaboratory

    SciTech Connect

    Loken, Stewart C.; McParland, Charles

    2001-07-20

    The Nearby Supernova Factory Project will provide a unique opportunity to bring together simulation and observation to address crucial problems in particle and nuclear physics. Its goal is to significantly enhance our understanding of the nuclear processes in supernovae and to improve our ability to use both Type Ia and Type II supernovae as reference light sources (standard candles) in precision measurements of cosmological parameters. Over the past several years, astronomers and astrophysicists have been conducting in-depth sky searches with the goal of identifying supernovae in their earliest evolutionary stages and, during the 4 to 8 weeks of their most ''explosive'' activity, measure their changing magnitude and spectra. The search program currently under development at LBNL is an earth-based observation program utilizing observational instruments at Haleakala and Mauna Kea, Hawaii and Mt. Palomar, California. This new program provides a demanding testbed for the integration of computational, data management and collaboratory technologies. A critical element of this effort is the use of emerging workflow management tools to permit collaborating scientists to manage data processing and storage and to integrate advanced supernova simulation into the real-time control of the experiments. This paper describes the workflow management framework for the project, discusses security and resource allocation requirements and reviews emerging tools to support this important aspect of collaborative work.

  11. Engineering design optimization using services and workflows.

    PubMed

    Crick, Tom; Dunning, Peter; Kim, Hyunsun; Padget, Julian

    2009-07-13

    Multi-disciplinary design optimization (MDO) is the process whereby the often conflicting requirements that the different disciplines bring to the engineering design process converge upon a description that represents an acceptable compromise in the design space. We present a simple demonstrator of a flexible workflow framework for engineering design optimization using an e-Science tool. This paper provides a concise introduction to MDO, complemented by a summary of the related tools and techniques developed under the umbrella of the UK e-Science programme that we have explored in support of the engineering process. The main contributions of this paper are: (i) a description of the optimization workflow that has been developed in the Taverna workbench, (ii) a demonstrator of a structural optimization process with a range of tool options using common benchmark problems, (iii) some reflections on the experience of software engineering meeting mechanical engineering, and (iv) an indicative discussion on the feasibility of a 'plug-and-play' engineering environment for analysis and design.

  12. Modeling Complex Workflow in Molecular Diagnostics

    PubMed Central

    Gomah, Mohamed E.; Turley, James P.; Lu, Huimin; Jones, Dan

    2010-01-01

    One of the hurdles to achieving personalized medicine has been implementing the laboratory processes for performing and reporting complex molecular tests. The rapidly changing test rosters and complex analysis platforms in molecular diagnostics have meant that many clinical laboratories still use labor-intensive manual processing and testing without the level of automation seen in high-volume chemistry and hematology testing. We provide here a discussion of design requirements and the results of implementation of a suite of lab management tools that incorporate the many elements required for use of molecular diagnostics in personalized medicine, particularly in cancer. These applications provide the functionality required for sample accessioning and tracking, material generation, and testing that are particular to the evolving needs of individualized molecular diagnostics. On implementation, the applications described here resulted in improvements in the turn-around time for reporting of more complex molecular test sets, and significant changes in the workflow. Therefore, careful mapping of workflow can permit design of software applications that simplify even the complex demands of specialized molecular testing. By incorporating design features for order review, software tools can permit a more personalized approach to sample handling and test selection without compromising efficiency. PMID:20007844

  13. Delta: Data Reduction for Integrated Application Workflows.

    SciTech Connect

    Lofstead, Gerald Fredrick; Jean-Baptiste, Gregory; Oldfield, Ron A.

    2015-06-01

    Integrated Application Workflows (IAWs) run multiple simulation workflow components concurrently on an HPC resource connecting these components using compute area resources and compensating for any performance or data processing rate mismatches. These IAWs require high frequency and high volume data transfers between compute nodes and staging area nodes during the lifetime of a large parallel computation. The available network bandwidth between the two areas may not be enough to efficiently support the data movement. As the processing power available to compute resources increases, the requirements for this data transfer will become more difficult to satisfy and perhaps will not be satisfiable at all since network capabilities are not expanding at a comparable rate. Furthermore, energy consumption in HPC environments is expected to grow by an order of magnitude as exascale systems become a reality. The energy cost of moving large amounts of data frequently will contribute to this issue. It is necessary to reduce the volume of data without reducing the quality of data when it is being processed and analyzed. Delta resolves the issue by addressing the lifetime data transfer operations. Delta removes subsequent identical copies of already transmitted data during transfers and restores those copies once the data has reached the destination. Delta is able to identify duplicated information and determine the most space efficient way to represent it. Initial tests show about 50% reduction in data movement while maintaining the same data quality and transmission frequency.
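
    Delta's exact chunking and encoding are not given in this summary, but its core operation, sending each distinct block once and replacing later copies with cheap references that are resolved at the destination, can be sketched with content hashing. The chunk contents are illustrative.

```python
import hashlib

def deduplicate(chunks):
    """Keep one payload per distinct chunk; later copies become references."""
    store, refs = {}, []
    for chunk in chunks:
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in store:
            store[digest] = chunk  # payload actually transmitted
        refs.append(digest)        # reference stream, cheap to send
    return store, refs

def restore(store, refs):
    """Rebuild the original chunk sequence at the destination."""
    return [store[d] for d in refs]

chunks = [b"header", b"payload-A", b"payload-A", b"header"]
store, refs = deduplicate(chunks)
assert restore(store, refs) == chunks
print(f"{len(store)} unique chunks transmitted for {len(refs)} transfers")
```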

  14. Workflow-Based Software Development Environment

    NASA Technical Reports Server (NTRS)

    Izygon, Michel E.

    2013-01-01

    The Software Developer's Assistant (SDA) helps software teams more efficiently and accurately conduct or execute software processes associated with NASA mission-critical software. SDA is a process enactment platform that guides software teams through project-specific standards, processes, and procedures. Software projects are decomposed into all of their required process steps or tasks, and each task is assigned to project personnel. SDA orchestrates the performance of work required to complete all process tasks in the correct sequence. The software then notifies team members when they may begin work on their assigned tasks and provides the tools, instructions, reference materials, and supportive artifacts that allow users to compliantly perform the work. A combination of technology components captures and enacts any software process used to support the software lifecycle. It creates an adaptive workflow environment that can be modified as needed. SDA achieves software process automation through a Business Process Management (BPM) approach to managing the software lifecycle for mission-critical projects. It contains five main parts: TieFlow (workflow engine), Business Rules (rules to alter process flow), Common Repository (storage for project artifacts, versions, history, schedules, etc.), SOA (interface to allow internal, GFE, or COTS tools integration), and the Web Portal Interface (collaborative web environment

  15. The Prosthetic Workflow in the Digital Era

    PubMed Central

    De Franco, Michele; Bosetti, Giovanni

    2016-01-01

    The purpose of this retrospective study was to clinically evaluate the benefits of adopting a full digital workflow for the implementation of fixed prosthetic restorations on natural teeth. To evaluate the effectiveness of these protocols, treatment plans were drawn up for 15 patients requiring rehabilitation of one or more natural teeth. All the dental impressions were taken using a Planmeca PlanScan® (Planmeca OY, Helsinki, Finland) intraoral scanner, which provided digital casts on which the restorations were digitally designed using Exocad® (Exocad GmbH, Germany, 2010) software and fabricated by CAM processing on 5-axis milling machines. A total of 28 single crowns were made from monolithic zirconia, 12 vestibular veneers from lithium disilicate, and 4 three-quarter vestibular veneers with palatal extension. While the restorations were applied, the authors could clinically appreciate the excellent match between the digitally produced prosthetic design and the cemented prostheses, which never required any occlusal or proximal adjustment. Out of all the restorations applied, only one exhibited premature failure and was replaced with no other complications or need for further scanning. From the clinical experience gained using a full digital workflow, the authors can confirm that these work processes enable the fabrication of clinically reliable restorations, with all the benefits that digital methods bring to the dentist, the dental laboratory, and the patient. PMID:27829834

  16. Deriving DICOM surgical extensions from surgical workflows

    NASA Astrophysics Data System (ADS)

    Burgert, O.; Neumuth, T.; Gessat, M.; Jacobs, S.; Lemke, H. U.

    2007-03-01

    The generation, storage, transfer, and representation of image data in radiology are standardized by DICOM. To cover the needs of image-guided surgery, or computer-assisted surgery in general, one needs to handle patient information besides image data. A large number of objects must be defined in DICOM to address the needs of surgery. We propose an analysis process based on Surgical Workflows that helps to identify these objects together with the use cases and requirements motivating their specification. As the first result, we confirmed the need for the specification of representation and transfer of geometric models. The analysis of Surgical Workflows has shown that geometric models are widely used to represent planned procedure steps, surgical tools, anatomical structures, or prostheses in the context of surgical planning, image-guided surgery, augmented reality, and simulation. Until now, the models have been stored and transferred in several file formats, bare of contextual information. The standardization of data types including contextual information and specifications for the handling of geometric models allows a broader usage of such models. This paper explains the specification process leading to Geometry Mesh Service Object Pair classes. This process can be a template for the definition of further DICOM classes.

  17. Scientific Data Management Center Scientific Data Integration

    SciTech Connect

    Critchlow, T J; Liu, L; Pu, C; Gupta, A; Ludaescher, B; Altintas, I; Vouk, M; Bitzer, D; Singh, M; Rosnick, D

    2003-01-31

    The Internet is becoming the preferred method for disseminating scientific data from a variety of disciplines. This has resulted in information overload on the part of the scientists, who are unable to query all of the relevant sources, even if they knew where to find them, what they contained, how to interact with them, and how to interpret the results. Thus instead of benefiting from this information rich environment, scientists become experts on a small number of sources and use those sources almost exclusively. Enabling information based scientific advances, in domains such as functional genomics, requires fully utilizing all available information. We are developing an end-to-end solution using leading-edge automatic wrapper generation, mediated query, and agent technology that will allow scientists to interact with more information sources than currently possible. Furthermore, by taking a workflow-based approach to this problem, we allow them to easily adjust the dataflow between the various sources to address their specific research needs.

  18. Biowep: a workflow enactment portal for bioinformatics applications.

    PubMed

    Romano, Paolo; Bartocci, Ezio; Bertolini, Guglielmo; De Paoli, Flavio; Marra, Domenico; Mauri, Giancarlo; Merelli, Emanuela; Milanesi, Luciano

    2007-03-08

    The huge amount of biological information, its distribution over the Internet, and the heterogeneity of available software tools make the adoption of new data integration and analysis network tools a necessity in bioinformatics. ICT standards and tools, like Web Services and Workflow Management Systems (WMS), can support the creation and deployment of such systems. Many Web Services are already available and some WMS have been proposed. They assume that researchers know which bioinformatics resources can be reached through a programmatic interface and that they are skilled in programming and building workflows. Therefore, they are not viable for the majority of researchers, who lack such skills. A portal enabling these researchers to profit from new technologies is still missing. We designed biowep, a web based client application that allows for the selection and execution of a set of predefined workflows. The system is available on-line. Biowep architecture includes a Workflow Manager, a User Interface and a Workflow Executor. The task of the Workflow Manager is the creation and annotation of workflows. These can be created by using either the Taverna Workbench or BioWMS. Enactment of workflows is carried out by FreeFluo for Taverna workflows and by BioAgent/Hermes, a mobile agent-based middleware, for BioWMS ones. Main workflows' processing steps are annotated on the basis of their input and output, elaboration type and application domain by using a classification of bioinformatics data and tasks. The interface supports user authentication and profiling. Workflows can be selected on the basis of users' profiles and can be searched through their annotations. Results can be saved. We developed a web system that supports the selection and execution of predefined workflows, thus simplifying access for all researchers. The implementation of Web Services allowing specialized software to interact with an exhaustive set of biomedical databases and analysis software and the creation of

  19. Biowep: a workflow enactment portal for bioinformatics applications

    PubMed Central

    Romano, Paolo; Bartocci, Ezio; Bertolini, Guglielmo; De Paoli, Flavio; Marra, Domenico; Mauri, Giancarlo; Merelli, Emanuela; Milanesi, Luciano

    2007-01-01

    Background The huge amount of biological information, its distribution over the Internet, and the heterogeneity of available software tools make the adoption of new data integration and analysis network tools a necessity in bioinformatics. ICT standards and tools, like Web Services and Workflow Management Systems (WMS), can support the creation and deployment of such systems. Many Web Services are already available and some WMS have been proposed. They assume that researchers know which bioinformatics resources can be reached through a programmatic interface and that they are skilled in programming and building workflows. Therefore, they are not viable for the majority of researchers, who lack such skills. A portal enabling these researchers to profit from new technologies is still missing. Results We designed biowep, a web based client application that allows for the selection and execution of a set of predefined workflows. The system is available on-line. Biowep architecture includes a Workflow Manager, a User Interface and a Workflow Executor. The task of the Workflow Manager is the creation and annotation of workflows. These can be created by using either the Taverna Workbench or BioWMS. Enactment of workflows is carried out by FreeFluo for Taverna workflows and by BioAgent/Hermes, a mobile agent-based middleware, for BioWMS ones. Main workflows' processing steps are annotated on the basis of their input and output, elaboration type and application domain by using a classification of bioinformatics data and tasks. The interface supports user authentication and profiling. Workflows can be selected on the basis of users' profiles and can be searched through their annotations. Results can be saved. Conclusion We developed a web system that supports the selection and execution of predefined workflows, thus simplifying access for all researchers. The implementation of Web Services allowing specialized software to interact with an exhaustive set of biomedical databases and analysis

  20. Mixed Methods Approach for Measuring the Impact of Video Telehealth on Outpatient Clinic Triage Nurse Workflow

    PubMed Central

    Cady, Rhonda G.; Finkelstein, Stanley M.

    2015-01-01

    Nurse-delivered telephone triage is a common component of outpatient clinic settings. Adding new communication technology to clinic triage has the potential to not only transform the triage process, but also alter triage workflow. Evaluating the impact of new technology on an existing workflow is paramount to maximizing efficiency of the delivery system. This study investigated triage nurse workflow before and after the implementation of video telehealth using a sequential mixed methods protocol that combined ethnography and time-motion study to provide a robust analysis of the implementation environment. Outpatient clinic triage using video telehealth required significantly more time than telephone triage, indicating a reduction in nurse efficiency. Despite the increased time needed to conduct video telehealth, nurses consistently rated it useful in providing triage. Interpretive analysis of the qualitative and quantitative data suggests the increased depth and breadth of data available during video triage alters the assessment triage nurses provide physicians. This in turn could impact the time physicians spend formulating a diagnosis and treatment plan. While the immediate impact of video telehealth is a reduction in triage nurse efficiency, what is unknown is the impact of video telehealth on physician and overall clinic efficiency. Future studies should address this area. PMID:24080753

  1. Coupling between a multi-physics workflow engine and an optimization framework

    NASA Astrophysics Data System (ADS)

    Di Gallo, L.; Reux, C.; Imbeaux, F.; Artaud, J.-F.; Owsiak, M.; Saoutic, B.; Aiello, G.; Bernardi, P.; Ciraolo, G.; Bucalossi, J.; Duchateau, J.-L.; Fausser, C.; Galassi, D.; Hertout, P.; Jaboulay, J.-C.; Li-Puma, A.; Zani, L.

    2016-03-01

    A generic coupling method between a multi-physics workflow engine and an optimization framework is presented in this paper. The coupling architecture has been developed in order to preserve the integrity of the two frameworks. The objective is to provide the possibility to replace a framework, a workflow or an optimizer by another one without changing the whole coupling procedure or modifying the main content in each framework. The coupling is achieved by using a socket-based communication library for exchanging data between the two frameworks. Among a number of algorithms provided by optimization frameworks, Genetic Algorithms (GAs) have demonstrated their efficiency on single and multiple criteria optimization. In addition to their robustness, GAs can handle invalid data which may appear during the optimization; consequently, GAs work in the most general cases. A parallelized framework has been developed to reduce the time spent on optimizations and the evaluation of large samples. A test has shown a good scaling efficiency of this parallelized framework. This coupling method has been applied to the case of SYCOMORE (SYstem COde for MOdeling tokamak REactor), a system code developed in the form of a modular workflow for designing magnetic fusion reactors. The coupling of SYCOMORE with the optimization platform URANIE enables design optimization along various figures of merit and constraints.
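
    URANIE supplies production-grade optimizers; the genetic algorithm itself reduces to a loop of selection, crossover, and mutation, sketched below for a single-parameter toy problem. The fitness function stands in for a far more expensive SYCOMORE evaluation and is not the paper's figure of merit.

```python
import random

def fitness(x):
    return (x - 3.2) ** 2  # toy figure of merit to minimize

def evolve(pop_size=20, generations=40, sigma=0.3):
    population = [random.uniform(-10, 10) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness)
        parents = population[: pop_size // 2]   # selection: keep best half
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            child = (a + b) / 2                 # crossover: averaging
            child += random.gauss(0, sigma)     # mutation: Gaussian noise
            children.append(child)
        population = parents + children
    return min(population, key=fitness)

print(evolve())  # converges near 3.2
```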

  2. PyDBS: an automated image processing workflow for deep brain stimulation surgery.

    PubMed

    D'Albis, Tiziano; Haegelen, Claire; Essert, Caroline; Fernández-Vidal, Sara; Lalys, Florent; Jannin, Pierre

    2015-02-01

    Deep brain stimulation (DBS) is a surgical procedure for treating motor-related neurological disorders. DBS clinical efficacy hinges on precise surgical planning and accurate electrode placement, which in turn call upon several image processing and visualization tasks, such as image registration, image segmentation, image fusion, and 3D visualization. These tasks are often performed by a heterogeneous set of software tools, which adopt differing formats and geometrical conventions and require patient-specific parameterization or interactive tuning. To overcome these issues, we introduce in this article PyDBS, a fully integrated and automated image processing workflow for DBS surgery. PyDBS consists of three image processing pipelines and three visualization modules assisting clinicians through the entire DBS surgical workflow, from the preoperative planning of electrode trajectories to the postoperative assessment of electrode placement. The system's robustness, speed, and accuracy were assessed by means of a retrospective validation, based on 92 clinical cases. The complete PyDBS workflow achieved satisfactory results in 92 % of tested cases, with a median processing time of 28 min per patient. The results obtained are compatible with the adoption of PyDBS in clinical practice.

  3. Development of HEATHER for cochlear implant stimulation using a new modeling workflow.

    PubMed

    Tran, Phillip; Sue, Andrian; Wong, Paul; Li, Qing; Carter, Paul

    2015-02-01

    The current conduction pathways resulting from monopolar stimulation of the cochlear implant were studied by developing a human electroanatomical total head reconstruction (namely, HEATHER). HEATHER was created from serially sectioned images of the female Visible Human Project dataset to encompass a total of 12 different tissues, and included computer-aided design geometries of the cochlear implant. Since existing methods were unable to generate the required complexity for HEATHER, a new modeling workflow was proposed. The results of the finite-element analysis agree with the literature, showing that the injected current exits the cochlea via the modiolus (14%), the basal end of the cochlea (22%), and through the cochlear walls (64%). It was also found that, once leaving the cochlea, the current travels to the implant body via the cranial cavity or scalp. The modeling workflow proved to be robust and flexible, allowing for meshes to be generated with substantial user control. Furthermore, the workflow could easily be employed to create realistic anatomical models of the human head for different bioelectric applications, such as deep brain stimulation, electroencephalography, and other biophysical phenomena.

  4. The View from a Few Hundred Feet: A New Transparent and Integrated Workflow for UAV-collected Data

    NASA Astrophysics Data System (ADS)

    Peterson, F. S.; Barbieri, L.; Wyngaard, J.

    2015-12-01

    Unmanned Aerial Vehicles (UAVs) allow scientists and civilians to monitor earth and atmospheric conditions in remote locations. To keep up with the rapid evolution of UAV technology, data workflows must also be flexible, integrated, and introspective. Here, we present our data workflow for a project to assess the feasibility of detecting threshold levels of methane, carbon dioxide, and other aerosols by mounting consumer-grade gas analysis sensors on UAVs. Particularly, we highlight our use of Project Jupyter, a set of open-source software tools and documentation designed for developing "collaborative narratives" around scientific workflows. By embracing the GitHub-backed, multi-language systems available in Project Jupyter, we enable interaction and exploratory computation while simultaneously embracing distributed version control. Additionally, the transparency of this method builds trust with civilians and decision-makers and leverages collaboration and communication to resolve problems. The goal of this presentation is to provide a generic data workflow for scientific inquiries involving UAVs and to invite the participation of the AGU community in its improvement and curation.

  5. Accelerating Science Impact through Big Data Workflow Management and Supercomputing

    NASA Astrophysics Data System (ADS)

    De, K.; Klimentov, A.; Maeno, T.; Mashinistov, R.; Nilsson, P.; Oleynik, D.; Panitkin, S.; Ryabinkin, E.; Wenaus, T.

    2016-02-01

    The Large Hadron Collider (LHC), operating at the international CERN Laboratory in Geneva, Switzerland, is leading Big Data driven scientific explorations. ATLAS, one of the largest collaborations ever assembled in the history of science, is at the forefront of research at the LHC. To address an unprecedented multi-petabyte data processing challenge, the ATLAS experiment is relying on a heterogeneous distributed computational infrastructure. To manage the workflow for all data processing on hundreds of data centers, the PanDA (Production and Distributed Analysis) Workload Management System is used. An ambitious program to expand PanDA to all available computing resources, including opportunistic use of commercial and academic clouds and Leadership Computing Facilities (LCF), is being realized within the BigPanDA and megaPanDA projects. These projects are now exploring how PanDA might be used for managing computing jobs that run on supercomputers including OLCF's Titan and NRC-KI HPC2. The main idea is to reuse, as much as possible, existing components of the PanDA system that are already deployed on the LHC Grid for analysis of physics data. The next generation of PanDA will allow many data-intensive sciences employing a variety of computing platforms to benefit from ATLAS experience and proven tools in highly scalable processing.

  6. Enabling Efficient Climate Science Workflows in High Performance Computing Environments

    NASA Astrophysics Data System (ADS)

    Krishnan, H.; Byna, S.; Wehner, M. F.; Gu, J.; O'Brien, T. A.; Loring, B.; Stone, D. A.; Collins, W.; Prabhat, M.; Liu, Y.; Johnson, J. N.; Paciorek, C. J.

    2015-12-01

    A typical climate science workflow often involves a combination of acquisition of data, modeling, simulation, analysis, visualization, publishing, and storage of results. Each of these tasks provide a myriad of challenges when running on a high performance computing environment such as Hopper or Edison at NERSC. Hurdles such as data transfer and management, job scheduling, parallel analysis routines, and publication require a lot of forethought and planning to ensure that proper quality control mechanisms are in place. These steps require effectively utilizing a combination of well tested and newly developed functionality to move data, perform analysis, apply statistical routines, and finally, serve results and tools to the greater scientific community. As part of the CAlibrated and Systematic Characterization, Attribution and Detection of Extremes (CASCADE) project we highlight a stack of tools our team utilizes and has developed to ensure that large scale simulation and analysis work are commonplace and provide operations that assist in everything from generation/procurement of data (HTAR/Globus) to automating publication of results to portals like the Earth Systems Grid Federation (ESGF), all while executing everything in between in a scalable environment in a task parallel way (MPI). We highlight the use and benefit of these tools by showing several climate science analysis use cases they have been applied to.
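
    The task-parallel fan-out mentioned at the end (independent analyses spread over MPI ranks) can be sketched with mpi4py; the year range and per-year statistic are placeholders, not CASCADE code.

```python
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

# Hypothetical pool of simulation years, dealt out round-robin to ranks.
years = list(range(1950, 2006))
my_years = years[rank::size]

def extremes_statistic(year):
    return year % 7  # placeholder for a real per-year analysis

local = [extremes_statistic(y) for y in my_years]
gathered = comm.gather(local, root=0)
if rank == 0:
    results = [r for chunk in gathered for r in chunk]
    print(f"analyzed {len(results)} years across {size} ranks")
```

    Run under an MPI launcher, e.g. mpiexec -n 8 python analyze.py (the script name is hypothetical).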

  7. Automated Finite State Workflow for Distributed Data Production

    NASA Astrophysics Data System (ADS)

    Hajdu, L.; Didenko, L.; Lauret, J.; Amol, J.; Betts, W.; Jang, H. J.; Noh, S. Y.

    2016-10-01

    In statistically hungry science domains, data deluges can be both a blessing and a curse. They allow the narrowing of statistical errors from known measurements, and open the door to new scientific opportunities as research programs mature. They are also a testament to the efficiency of experimental operations. However, growing data samples may need to be processed with little or no opportunity for huge increases in computing capacity. A standard strategy has thus been to share resources across multiple experiments at a given facility. Another has been to use middleware that “glues” resources across the world so they are able to locally run the experimental software stack (either natively or virtually). We describe a framework STAR has successfully used to reconstruct a ~400 TB dataset consisting of over 100,000 jobs submitted to a remote site in Korea from STAR's Tier 0 facility at the Brookhaven National Laboratory. The framework automates the full workflow, taking raw data files from tape and writing Physics-ready output back to tape without operator or remote site intervention. Through hardening we have demonstrated 97(±2)% efficiency, over a period of 7 months of operation. The high efficiency is attributed to finite state checking with retries to encourage resilience in the system over capricious and fallible infrastructure.
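
    The "finite state checking with retries" idea can be sketched as a per-job state machine in which every transition is retried a bounded number of times before the job is declared failed. The states and the simulated failure rate below are invented for illustration; STAR's production framework is considerably richer.

```python
import random

STATES = ["staged", "submitted", "running", "done"]

def advance(state):
    """Attempt one life-cycle transition; may fail transiently (simulated)."""
    if random.random() < 0.2:
        raise RuntimeError("transient infrastructure failure")
    return STATES[STATES.index(state) + 1]

def run_job(max_retries=3):
    state = "staged"
    while state != "done":
        for _ in range(max_retries):
            try:
                state = advance(state)
                break
            except RuntimeError:
                continue  # retry the same transition
        else:
            return "failed"  # retries exhausted
    return state

print(run_job())
```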

  8. Linked Data and SDI: The case on Web geoprocessing workflows

    NASA Astrophysics Data System (ADS)

    Yue, Peng; Guo, Xia; Zhang, Mingda; Jiang, Liangcun; Zhai, Xi

    2016-04-01

    Linked Data transforms traditional ways of structuring, publishing, discovering, accessing, and integrating data. The advantages of Linked Data, including the common data model, standardized data access mechanism, and link-based data discovery, allow effective sharing and discovery of geospatial resources in Spatial Data Infrastructures (SDI). Web geoprocessing workflows have been widely used in SDI to support distributed geoprocessing. Geospatial data and services are discovered from SDI and chained as geoprocessing workflows. Workflow results can be published as new resources in SDI. The whole process could be improved by the Linked Data approach, so that sensors, observations, data, services, workflows, and provenance can be linked and published into the Web of Data. This paper explores the integration of Linked Data and Web geoprocessing workflows by discovering geospatial resources in the Web of Data to build geoprocessing workflows. It adopts the Linked Data approach to publish geospatial data including in-situ observations and satellite images, as well as geospatial Web services. The workflow results, including data products and processing steps, are also exposed as Linked Data in turn for tracing provenance. The results not only support semantic discovery and integration of heterogeneous geospatial resources, but also provide transparency in data sharing and processing. The approach is implemented as extensions to an existing geoprocessing workflow tool, GeoJModelBuilder, to illustrate its applicability.
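
    A minimal sketch of publishing a workflow product with its provenance link as Linked Data, using rdflib and the W3C PROV-O vocabulary; the example.org namespace and resource names are hypothetical, and this is not GeoJModelBuilder's API.

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, RDFS

PROV = Namespace("http://www.w3.org/ns/prov#")  # W3C PROV-O vocabulary
EX = Namespace("http://example.org/geo/")       # hypothetical namespace

g = Graph()
product = EX["ndvi-map-2016-04"]
workflow = EX["ndvi-workflow"]

g.add((product, RDF.type, PROV.Entity))
g.add((workflow, RDF.type, PROV.Activity))
g.add((product, PROV.wasGeneratedBy, workflow))  # provenance link
g.add((product, RDFS.label, Literal("NDVI composite over the study area")))

print(g.serialize(format="turtle"))
```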

  9. Modelling and analysis of workflow for lean supply chains

    NASA Astrophysics Data System (ADS)

    Ma, Jinping; Wang, Kanliang; Xu, Lida

    2011-11-01

    Cross-organisational workflow systems are a component of enterprise information systems which support collaborative business processes among organisations in a supply chain. Currently, the majority of workflow systems are developed from the perspective of information modelling, without considering the actual requirements of supply chain management. In this article, we focus on the modelling and analysis of cross-organisational workflow systems in the context of the lean supply chain (LSC) using Petri nets. First, the article describes the assumed conditions of a cross-organisation workflow net according to the idea of the LSC and then discusses the standardisation of collaborating business processes between organisations in the context of the LSC. Second, the concept of labelled time Petri nets (LTPNs) is defined by combining labelled Petri nets with time Petri nets, and the concept of labelled time workflow nets (LTWNs) is also defined based on LTPNs. Cross-organisational labelled time workflow nets (CLTWNs) are then defined based on LTWNs. Third, the article proposes the notion of OR-silent CLTWNs and an approach to verifying the soundness of LTWNs and CLTWNs. Finally, this article illustrates how to use the proposed method by a simple example. The purpose of this research is to establish a formal method for the modelling and analysis of workflow systems for LSCs. This study initiates a new perspective of research on cross-organisational workflow management and promotes operation management of LSCs in real-world settings.
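
    Stripped of labels and time, the underlying Petri-net machinery is compact: a marking assigns tokens to places, and a transition fires by consuming tokens from its input places and producing tokens on its output places. The two-step order flow below is invented for illustration and is not one of the paper's nets.

```python
# Marking: tokens currently held by each place (all names invented).
marking = {"order_received": 1, "order_checked": 0, "order_shipped": 0}

# Each transition: (tokens consumed per input place,
#                   tokens produced per output place).
transitions = {
    "check": ({"order_received": 1}, {"order_checked": 1}),
    "ship":  ({"order_checked": 1}, {"order_shipped": 1}),
}

def enabled(name):
    inputs, _ = transitions[name]
    return all(marking[p] >= n for p, n in inputs.items())

def fire(name):
    assert enabled(name), f"transition {name} is not enabled"
    inputs, outputs = transitions[name]
    for p, n in inputs.items():
        marking[p] -= n
    for p, n in outputs.items():
        marking[p] += n

fire("check")
fire("ship")
print(marking)  # {'order_received': 0, 'order_checked': 0, 'order_shipped': 1}
```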

  10. Genomic variant annotation workflow for clinical applications.

    PubMed

    Thurnherr, Thomas; Singer, Franziska; Stekhoven, Daniel J; Beerenwinkel, Niko

    2016-01-01

    Annotation and interpretation of DNA aberrations identified through next-generation sequencing is becoming an increasingly important task. Even more so in the context of data analysis pipelines for medical applications, where genomic aberrations are associated with phenotypic and clinical features. Here we describe a workflow to identify potential gene targets in aberrated genes or pathways and their corresponding drugs. To this end, we provide the R/Bioconductor package rDGIdb, an R wrapper to query the drug-gene interaction database (DGIdb). DGIdb accumulates drug-gene interaction data from 15 different resources and allows filtering on different levels. The rDGIdb package makes these resources and tools available to R users. Moreover, rDGIdb queries can be automated through incorporation of the rDGIdb package into NGS sequencing pipelines.
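
    rDGIdb itself is an R package; for readers outside R, the same lookup can be sketched in Python against DGIdb's REST interface. The endpoint and JSON field names below are assumptions based on DGIdb's v2 API and may have changed since publication.

```python
import requests

# Assumed DGIdb v2 REST endpoint; verify against current DGIdb docs.
URL = "https://dgidb.org/api/v2/interactions.json"

def drug_interactions(genes):
    resp = requests.get(URL, params={"genes": ",".join(genes)}, timeout=30)
    resp.raise_for_status()
    for term in resp.json().get("matchedTerms", []):
        for interaction in term.get("interactions", []):
            yield term.get("geneName"), interaction.get("drugName")

for gene, drug in drug_interactions(["BRAF", "KRAS"]):
    print(gene, drug)
```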

  11. UAV Photogrammetric Workflows: A Best Practice Guideline

    NASA Astrophysics Data System (ADS)

    Federman, A.; Santana Quintero, M.; Kretz, S.; Gregg, J.; Lengies, M.; Ouimet, C.; Laliberte, J.

    2017-08-01

    The increasing commercialization of unmanned aerial vehicles (UAVs) has opened the possibility of performing low-cost aerial image acquisition for the documentation of cultural heritage sites through UAV photogrammetry. The flying of UAVs in Canada is regulated through Transport Canada and requires a Special Flight Operations Certificate (SFOC) in order to fly. Various image acquisition techniques have been explored in this review, as well as the software used to register the data. A general workflow procedure has been formulated based on the literature reviewed. A case study example of using UAV photogrammetry at Prince of Wales Fort is discussed, specifically in relation to the data acquisition and processing. Some gaps in the literature reviewed highlight the need for streamlining the SFOC application process and incorporating UAVs into cultural heritage documentation courses.

  12. Swabs to genomes: a comprehensive workflow

    PubMed Central

    Jospin, Guillaume; Darling, Aaron E.; Coil, David A.

    2015-01-01

    The sequencing, assembly, and basic analysis of microbial genomes, once a painstaking and expensive undertaking, has become much easier for research labs with access to standard molecular biology and computational tools. However, there are a confusing variety of options available for DNA library preparation and sequencing, and inexperience with bioinformatics can pose a significant barrier to entry for many who may be interested in microbial genomics. The objective of the present study was to design, test, troubleshoot, and publish a simple, comprehensive workflow from the collection of an environmental sample (a swab) to a published microbial genome; empowering even a lab or classroom with limited resources and bioinformatics experience to perform it. PMID:26020012

  13. Genomic variant annotation workflow for clinical applications

    PubMed Central

    Thurnherr, Thomas; Singer, Franziska; Stekhoven, Daniel J.; Beerenwinkel, Niko

    2016-01-01

    Annotation and interpretation of DNA aberrations identified through next-generation sequencing is becoming an increasingly important task. Even more so in the context of data analysis pipelines for medical applications, where genomic aberrations are associated with phenotypic and clinical features. Here we describe a workflow to identify potential gene targets in aberrated genes or pathways and their corresponding drugs. To this end, we provide the R/Bioconductor package rDGIdb, an R wrapper to query the drug-gene interaction database (DGIdb). DGIdb accumulates drug-gene interaction data from 15 different resources and allows filtering on different levels. The rDGIdb package makes these resources and tools available to R users. Moreover, rDGIdb queries can be automated through incorporation of the rDGIdb package into NGS sequencing pipelines. PMID:27990260

  14. The BioExtract Server: a web-based bioinformatic workflow platform.

    PubMed

    Lushbough, Carol M; Jennewein, Douglas M; Brendel, Volker P

    2011-07-01

    The BioExtract Server (bioextract.org) is an open, web-based system designed to aid researchers in the analysis of genomic data by providing a platform for the creation of bioinformatic workflows. Scientific workflows are created within the system by recording tasks performed by the user. These tasks may include querying multiple, distributed data sources, saving query results as searchable data extracts, and executing local and web-accessible analytic tools. The series of recorded tasks can then be saved as a reproducible, sharable workflow available for subsequent execution with the original or modified inputs and parameter settings. Integrated data resources include interfaces to the National Center for Biotechnology Information (NCBI) nucleotide and protein databases, the European Molecular Biology Laboratory (EMBL-Bank) non-redundant nucleotide database, the Universal Protein Resource (UniProt), and the UniProt Reference Clusters (UniRef) database. The system offers access to numerous preinstalled, curated analytic tools and also provides researchers with the option of selecting computational tools from a large list of web services including the European Molecular Biology Open Software Suite (EMBOSS), BioMoby, and the Kyoto Encyclopedia of Genes and Genomes (KEGG). The system further allows users to integrate local command line tools residing on their own computers through a client-side Java applet.

  15. SYRMEP Tomo Project: a graphical user interface for customizing CT reconstruction workflows.

    PubMed

    Brun, Francesco; Massimi, Lorenzo; Fratini, Michela; Dreossi, Diego; Billé, Fulvio; Accardo, Agostino; Pugliese, Roberto; Cedola, Alessia

    2017-01-01

    When considering the acquisition of experimental synchrotron radiation (SR) X-ray CT data, the reconstruction workflow cannot be limited to the essential computational steps of flat fielding and filtered back projection (FBP). More refined image processing is often required, usually to compensate for artifacts and enhance the quality of the reconstructed images. In principle, it would be desirable to optimize the reconstruction workflow at the facility during the experiment (beamtime). However, several practical factors affect the image reconstruction part of the experiment, and users are likely to conclude the beamtime with sub-optimal reconstructed images. Through an example application, this article presents SYRMEP Tomo Project (STP), an open-source software tool conceived to let users design custom CT reconstruction workflows. STP has been designed for post-beamtime (off-line) use and for new reconstructions of past archived data at the user's home institution, where simple computing resources are available. Releases of the software can be downloaded at the Elettra Scientific Computing group GitHub repository https://github.com/ElettraSciComp/STP-Gui.
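
    The two essential steps named here, flat fielding and filtered back projection, can be sketched with numpy and scikit-image on a synthetic sinogram. The flat and dark fields are stand-in arrays, and this is a generic illustration, not STP's code.

```python
import numpy as np
from skimage.data import shepp_logan_phantom
from skimage.transform import radon, iradon

# Synthetic sinogram standing in for raw synchrotron projections.
image = shepp_logan_phantom()
theta = np.linspace(0.0, 180.0, 180, endpoint=False)
raw = radon(image, theta=theta)

# Flat fielding: normalize out beam profile and dark current.
# Real workflows use measured flat/dark images; these are stand-ins.
flat = np.full_like(raw, raw.max())
dark = np.zeros_like(raw)
normalized = (raw - dark) / (flat - dark)

# Filtered back projection recovers the slice (up to a scale factor).
reconstruction = iradon(normalized, theta=theta)
print(reconstruction.shape)
```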

  16. The BioExtract Server: a web-based bioinformatic workflow platform

    PubMed Central

    Lushbough, Carol M.; Jennewein, Douglas M.; Brendel, Volker P.

    2011-01-01

    The BioExtract Server (bioextract.org) is an open, web-based system designed to aid researchers in the analysis of genomic data by providing a platform for the creation of bioinformatic workflows. Scientific workflows are created within the system by recording tasks performed by the user. These tasks may include querying multiple, distributed data sources, saving query results as searchable data extracts, and executing local and web-accessible analytic tools. The series of recorded tasks can then be saved as a reproducible, sharable workflow available for subsequent execution with the original or modified inputs and parameter settings. Integrated data resources include interfaces to the National Center for Biotechnology Information (NCBI) nucleotide and protein databases, the European Molecular Biology Laboratory (EMBL-Bank) non-redundant nucleotide database, the Universal Protein Resource (UniProt), and the UniProt Reference Clusters (UniRef) database. The system offers access to numerous preinstalled, curated analytic tools and also provides researchers with the option of selecting computational tools from a large list of web services including the European Molecular Biology Open Software Suite (EMBOSS), BioMoby, and the Kyoto Encyclopedia of Genes and Genomes (KEGG). The system further allows users to integrate local command line tools residing on their own computers through a client-side Java applet. PMID:21546552

  17. Software workflow for the automatic tagging of medieval manuscript images (SWATI)

    NASA Astrophysics Data System (ADS)

    Chandna, Swati; Tonne, Danah; Jejkal, Thomas; Stotzka, Rainer; Krause, Celia; Vanscheidt, Philipp; Busch, Hannah; Prabhune, Ajinkya

    2015-01-01

    Digital methods, tools and algorithms are gaining in importance for the analysis of digitized manuscript collections in the arts and humanities. One example is the BMBF-funded research project "eCodicology", which aims to design, evaluate and optimize algorithms for the automatic identification of macro- and micro-structural layout features of medieval manuscripts. The main goal of this research project is to provide better insights into high-dimensional datasets of medieval manuscripts for humanities scholars. The heterogeneous nature and size of the humanities data and the need to create a database of automatically extracted reproducible features for better statistical and visual analysis are the main challenges in designing a workflow for the arts and humanities. This paper presents a concept of a workflow for the automatic tagging of medieval manuscripts. As a starting point, the workflow uses medieval manuscripts digitized within the scope of the project "Virtual Scriptorium St. Matthias". Firstly, these digitized manuscripts are ingested into a data repository. Secondly, specific algorithms are adapted or designed for the identification of macro- and micro-structural layout elements such as page size, writing space, and number of lines. Lastly, a statistical analysis and scientific evaluation of the manuscript groups are performed. The workflow is designed generically to process large amounts of data automatically with any desired algorithm for feature extraction. As a result, a database of objectified and reproducible features is created which helps to analyze and visualize hidden relationships of around 170,000 pages. The workflow shows the potential of automatic image analysis by enabling the processing of a single page in less than a minute. Furthermore, the accuracy tests of the workflow on a small set of manuscripts with respect to features such as page size and text areas show that automatic and manual analysis are comparable. The usage of a computer

  18. Traversing the many paths of workflow research: developing a conceptual framework of workflow terminology through a systematic literature review

    PubMed Central

    Novak, Laurie L; Johnson, Kevin B; Lorenzi, Nancy M

    2010-01-01

    The objective of this review was to describe methods used to study and model workflow. The authors included studies set in a variety of industries using qualitative, quantitative and mixed methods. Of the 6221 matching abstracts, 127 articles were included in the final corpus. The authors collected data from each article on researcher perspective, study type, methods type, specific methods, approaches to evaluating quality of results, definition of workflow and dependent variables. Ethnographic observation and interviews were the most frequently used methods. Long study durations revealed the large time commitment required for descriptive workflow research. The most frequently discussed technique for evaluating quality of study results was triangulation. The definition of the term “workflow” and choice of methods for studying workflow varied widely across research areas and researcher perspectives. The authors developed a conceptual framework of workflow-related terminology for use in future research and present this model for use by other researchers. PMID:20442143

  19. Improving access to space weather data via workflows and web services

    NASA Astrophysics Data System (ADS)

    Sundaravel, Anu Swapna

    The Space Physics Interactive Data Resource (SPIDR) is a web-based interactive tool developed by NOAA's National Geophysical Data Center to provide access to historical space physics datasets. These data sets are widely used by physicists for space weather modeling and predictions. Built on a distributed network of databases and application servers, SPIDR offers services in two ways: via a web page interface and via a web service interface. SPIDR exposes several SOAP-based web services that client applications use to connect to a number of data sources for data download and processing. To date, using these web services has been difficult, adding unnecessary complexity to client applications and inconvenience for the scientists who want to use these datasets. This study focuses on improving SPIDR's web interface to better support data access, integration and display. This is accomplished in two ways: (1) examining the needs of scientists to better understand what web services they require to better access and process these datasets and (2) developing a client application to support SPIDR's SOAP-based services using the Kepler scientific workflow system. To this end, we identified, designed and developed several web services for filtering the existing datasets and created several Kepler workflows to automate routine tasks associated with these datasets. These workflows are a part of the custom NGDC build of the Kepler tool. Scientists are already familiar with Kepler due to its extensive use in this domain. As a result, this approach provides them with tools that are less daunting than raw web services and ultimately more useful and customizable. We evaluated our work by interviewing various scientists who make use of SPIDR and having them use the developed Kepler workflows while recording their feedback and suggestions. Our work has improved SPIDR such that new web services are now available and scientists have access to a desktop
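
    To give a feel for what raw SOAP access involves (and why wrapping such calls in Kepler workflows is friendlier), the sketch below uses the Python zeep library against a placeholder WSDL; the URL, operation name, and parameters are hypothetical, not SPIDR's actual interface.

      # Hypothetical SOAP call of the kind SPIDR exposes (the WSDL URL,
      # operation, and parameters are placeholders, not the real API).
      from zeep import Client

      client = Client("http://example.gov/spidr/DataService?wsdl")  # hypothetical WSDL
      result = client.service.getData(station="BOU", element="kp",
                                      dateFrom="2001-01-01", dateTo="2001-01-31")
      print(result)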

  20. Next-Generation Sequencing Workflow for NSCLC Critical Samples Using a Targeted Sequencing Approach by Ion Torrent PGM™ Platform

    PubMed Central

    Vanni, Irene; Coco, Simona; Truini, Anna; Rusmini, Marta; Dal Bello, Maria Giovanna; Alama, Angela; Banelli, Barbara; Mora, Marco; Rijavec, Erika; Barletta, Giulia; Genova, Carlo; Biello, Federica; Maggioni, Claudia; Grossi, Francesco

    2015-01-01

    Next-generation sequencing (NGS) is a cost-effective technology capable of screening several genes simultaneously; however, its application in a clinical context requires an established workflow to acquire reliable sequencing results. Here, we report an optimized NGS workflow analyzing 22 lung cancer-related genes to sequence critical samples such as DNA from formalin-fixed paraffin-embedded (FFPE) blocks and circulating free DNA (cfDNA). Snap frozen and matched FFPE gDNA from 12 non-small cell lung cancer (NSCLC) patients, whose gDNA fragmentation status was previously evaluated using a multiplex PCR-based quality control, were successfully sequenced with Ion Torrent PGM™. The robust bioinformatic pipeline allowed us to correctly call both Single Nucleotide Variants (SNVs) and indels with a detection limit of 5%, achieving 100% specificity and 96% sensitivity. This workflow was also validated in 13 FFPE NSCLC biopsies. Furthermore, a specific protocol for low input gDNA capable of producing good sequencing data with high coverage, high uniformity, and a low error rate was also optimized. In conclusion, we demonstrate the feasibility of obtaining gDNA from FFPE samples suitable for NGS by performing appropriate quality controls. The optimized workflow, capable of screening low input gDNA, highlights NGS as a potential tool in the detection, disease monitoring, and treatment of NSCLC. PMID:26633390

  1. Workflow Lexicons in Healthcare: Validation of the SWIM Lexicon.

    PubMed

    Meenan, Chris; Erickson, Bradley; Knight, Nancy; Fossett, Jewel; Olsen, Elizabeth; Mohod, Prerna; Chen, Joseph; Langer, Steve G

    2017-01-03

    For clinical departments seeking to successfully navigate the challenges of modern health reform, obtaining access to operational and clinical data to establish and sustain goals for improving quality is essential. More broadly, health delivery organizations are also seeking to understand performance across multiple facilities and often across multiple electronic medical record (EMR) systems. Interpreting operational data across multiple vendor systems can be challenging, as various manufacturers may describe the same departmental workflow steps in different ways, and terminology sometimes varies even within a single vendor's installed customer base. In 2012, The Society for Imaging Informatics in Medicine (SIIM) recognized the need for better quality and performance data standards and formed SIIM's Workflow Initiative for Medicine (SWIM), an initiative designed to consistently describe workflow steps in radiology departments as well as to define operational quality metrics. The SWIM lexicon was published as a working model to describe operational workflow steps and quality measures. We measured the prevalence of the SWIM lexicon workflow steps in both academic and community radiology environments using real-world patient observations and correlated that information with automatically captured workflow steps from our clinical information systems. Our goal was to measure the frequency of occurrence of workflow steps identified by the SWIM lexicon in a real-world clinical setting, as well as to assess how accurately departmental information systems captured patient flow through our health facility.

  2. A Tool Supporting Collaborative Data Analytics Workflow Design and Management

    NASA Astrophysics Data System (ADS)

    Zhang, J.; Bao, Q.; Lee, T. J.

    2016-12-01

    Collaborative experiment design could significantly enhance the sharing and adoption of the data analytics algorithms and models that have emerged in Earth science. Existing data-oriented workflow tools, however, are not suitable for collaborative workflow design: among other gaps, they do not support real-time co-design, they cannot track how a workflow evolves over time as multiple Earth scientists contribute changing designs, and they do not capture and retrieve collaboration knowledge about workflow design (the discussions that lead to a design). To address these challenges, we have designed and developed a technique supporting collaborative data-oriented workflow composition and management as a key component toward supporting big data collaboration through the Internet. Reproducibility and scalability are two major targets demanding fundamental infrastructural support. One outcome of the project is a software tool that supports an elastic number of groups of Earth scientists collaboratively designing and composing data analytics workflows through the Internet. Instead of reinventing the wheel, we have extended an existing workflow tool, VisTrails, into an online collaborative environment as a proof of concept.

  3. Robust automated knowledge capture.

    SciTech Connect

    Stevens-Adams, Susan Marie; Abbott, Robert G.; Forsythe, James Chris; Trumbo, Michael Christopher Stefan; Haass, Michael Joseph; Hendrickson, Stacey M. Langfitt

    2011-10-01

    This report summarizes research conducted through the Sandia National Laboratories Robust Automated Knowledge Capture Laboratory Directed Research and Development project. The objective of this project was to advance scientific understanding of the influence of individual cognitive attributes on decision making. The project has developed a quantitative model known as RumRunner that has proven effective in predicting the propensity of an individual to shift strategies on the basis of task and experience related parameters. Three separate studies are described which have validated the basic RumRunner model. This work provides a basis for better understanding human decision making in high-consequence national security applications, and in particular, the individual characteristics that underlie adaptive thinking.

  4. The BioDICE Taverna plugin for clustering and visualization of biological data: a workflow for molecular compounds exploration

    PubMed Central

    2014-01-01

    Background In many experimental pipelines, clustering of multidimensional biological datasets is used to detect hidden structures in unlabelled input data. Taverna is a popular workflow management system that is used to design and execute scientific workflows and aid in silico experimentation. The availability of fast unsupervised methods for clustering and visualization in the Taverna platform is important to support a data-driven scientific discovery in complex and explorative bioinformatics applications. Results This work presents a Taverna plugin, the Biological Data Interactive Clustering Explorer (BioDICE), that performs clustering of high-dimensional biological data and provides a nonlinear, topology preserving projection for the visualization of the input data and their similarities. The core algorithm in the BioDICE plugin is Fast Learning Self Organizing Map (FLSOM), which is an improved variant of the Self Organizing Map (SOM) algorithm. The plugin generates an interactive 2D map that allows the visual exploration of multidimensional data and the identification of groups of similar objects. The effectiveness of the plugin is demonstrated on a case study related to chemical compounds. Conclusions The number and variety of available tools, together with its extensibility, have made Taverna a popular choice for the development of scientific data workflows. This work presents a novel plugin, BioDICE, which adds a data-driven knowledge discovery component to Taverna. BioDICE provides an effective and powerful clustering tool, which can be adopted for the explorative analysis of biological datasets.
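
    As a rough analogue of the plugin's core step, the sketch below trains a standard SOM with the third-party minisom package and projects samples onto a 2D map; BioDICE's FLSOM variant and its Taverna integration are not reproduced here, and the data is random.

      # Standard SOM clustering/projection with minisom (an illustrative
      # stand-in for BioDICE's FLSOM; input data here is random).
      import numpy as np
      from minisom import MiniSom

      data = np.random.rand(100, 8)                  # 100 samples, 8 features
      som = MiniSom(10, 10, input_len=8, sigma=1.0, learning_rate=0.5)
      som.random_weights_init(data)
      som.train_random(data, num_iteration=1000)     # unsupervised training

      # The best-matching unit gives each sample a position on the 2D map.
      positions = np.array([som.winner(x) for x in data])
      print(positions[:5])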

  5. Distributed Workflow Service Composition Based on CTR Technology

    NASA Astrophysics Data System (ADS)

    Feng, Zhilin; Ye, Yanming

    Recently, WS-BPEL has gradually become the basis of a standard for web service description and composition. However, WS-BPEL cannot efficiently describe distributed workflow services because it lacks sufficient expressive power and formal semantics. This paper presents a novel method for modeling distributed workflow service composition with Concurrent TRansaction logic (CTR). The syntactic structures of WS-BPEL and CTR are analyzed, and new rules for mapping WS-BPEL into CTR are given. A case study is put forward to show that the proposed method is appropriate for modeling workflow business services under distributed environments.

  6. Patient safety perspectives: the impact of CPOE on nursing workflow.

    PubMed

    Househ, Mowafa; Ahmad, Anwar; Alshaikh, Anwar; Alsuweed, Fatimah

    2013-01-01

    The purpose of this review is to explore the impact of Computerized Physician Order Entry (CPOE) systems on patient safety from a nursing perspective. The paper discusses the importance of safety culture within nursing, nursing perceptions of CPOE, and the impact of CPOE on nursing workflow. The findings indicate that the implementation of CPOE negatively impacts nursing workflow when CPOE systems are inadequately designed. Future work is necessary to explore the impact of CPOE on nursing workflow and the direct impact on patient safety.

  7. Scientific Data Management Center for Enabling Technologies

    SciTech Connect

    Vouk, Mladen A.

    2013-01-15

    Managing scientific data has been identified by the scientific community as one of the most important emerging needs because of the sheer volume and increasing complexity of data being collected. Effectively generating, managing, and analyzing this information requires a comprehensive, end-to-end approach to data management that encompasses all of the stages from the initial data acquisition to the final analysis of the data. Fortunately, the data management problems encountered by most scientific domains are common enough to be addressed through shared technology solutions. Based on community input, we have identified three significant requirements. First, more efficient access to storage systems is needed. In particular, parallel file system and I/O system improvements are needed to write and read large volumes of data without slowing a simulation, analysis, or visualization engine. These processes are complicated by the fact that scientific data are structured differently for specific application domains, and are stored in specialized file formats. Second, scientists require technologies to facilitate better understanding of their data, in particular the ability to effectively perform complex data analysis and searches over extremely large data sets. Specialized feature discovery and statistical analysis techniques are needed before the data can be understood or visualized. Furthermore, interactive analysis requires techniques for efficiently selecting subsets of the data. Finally, generating the data, collecting and storing the results, keeping track of data provenance, data post-processing, and analysis of results is a tedious, fragmented process. Tools for automation of this process in a robust, tractable, and recoverable fashion are required to enhance scientific exploration. The SDM center was established under the SciDAC program to address these issues. The SciDAC-1 Scientific Data Management (SDM) Center succeeded in bringing an initial set of advanced

  8. ADAPTIVE ROBUST VARIABLE SELECTION

    PubMed Central

    Fan, Jianqing; Fan, Yingying; Barut, Emre

    2014-01-01

    Heavy-tailed high-dimensional data are commonly encountered in various scientific fields and pose great challenges to modern statistical analysis. A natural procedure to address this problem is to use penalized quantile regression with weighted L1-penalty, called weighted robust Lasso (WR-Lasso), in which weights are introduced to ameliorate the bias problem induced by the L1-penalty. In the ultra-high dimensional setting, where the dimensionality can grow exponentially with the sample size, we investigate the model selection oracle property and establish the asymptotic normality of the WR-Lasso. We show that only mild conditions on the model error distribution are needed. Our theoretical results also reveal that adaptive choice of the weight vector is essential for the WR-Lasso to enjoy these nice asymptotic properties. To make the WR-Lasso practically feasible, we propose a two-step procedure, called adaptive robust Lasso (AR-Lasso), in which the weight vector in the second step is constructed based on the L1-penalized quantile regression estimate from the first step. This two-step procedure is justified theoretically to possess the oracle property and the asymptotic normality. Numerical studies demonstrate the favorable finite-sample performance of the AR-Lasso. PMID:25580039
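
    In standard notation (reconstructed here from the abstract's description; the symbols are illustrative rather than the paper's exact notation), the WR-Lasso solves a weighted L1-penalized quantile regression:

      % WR-Lasso objective: quantile check loss plus weighted L1 penalty
      \hat{\beta} = \arg\min_{\beta \in \mathbb{R}^p}
        \sum_{i=1}^{n} \rho_\tau\!\left(y_i - \mathbf{x}_i^{\top}\beta\right)
        + \lambda \sum_{j=1}^{p} d_j \, |\beta_j|,
      \qquad
      \rho_\tau(u) = u \bigl(\tau - \mathbf{1}\{u < 0\}\bigr)

    where rho_tau is the quantile check loss and d = (d_1, ..., d_p) is the weight vector; the two-step AR-Lasso constructs d from a first-step L1-penalized quantile regression estimate.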

  9. A Scientific Software Product Line for the Bioinformatics domain.

    PubMed

    Costa, Gabriella Castro B; Braga, Regina; David, José Maria N; Campos, Fernanda

    2015-08-01

    Most specialized users (scientists) that use bioinformatics applications do not have suitable training on software development. Software Product Line (SPL) employs the concept of reuse, considering that it is defined as a set of systems that are developed from a common set of base artifacts. In some contexts, such as in bioinformatics applications, it is advantageous to develop a collection of related software products using the SPL approach. If software products are similar enough, it becomes possible to predict their commonalities and differences and then reuse the common features to support the development of new applications in the bioinformatics area. This paper presents the PL-Science approach, which considers the context of SPL and ontology in order to assist scientists to define a scientific experiment and to specify a workflow that encompasses the bioinformatics applications of a given experiment. This paper also focuses on the use of ontologies to enable the use of Software Product Line in biological domains. In the context of this paper, Scientific Software Product Line (SSPL) differs from Software Product Line in that SSPL uses an abstract scientific workflow model. This workflow is defined according to a scientific domain, and using this abstract workflow model the products (scientific applications/algorithms) are instantiated. Through the use of ontology as a knowledge representation model, we can provide domain restrictions as well as add semantic aspects in order to facilitate the selection and organization of bioinformatics workflows in a Scientific Software Product Line. The use of ontologies enables not only the expression of formal restrictions but also inferences on these restrictions, considering that a scientific domain needs a formal specification. This paper presents the development of the PL-Science approach, encompassing a methodology and an infrastructure, and also presents an approach evaluation. This evaluation

  10. Text mining meets workflow: linking U-Compare with Taverna

    PubMed Central

    Kano, Yoshinobu; Dobson, Paul; Nakanishi, Mio; Tsujii, Jun'ichi; Ananiadou, Sophia

    2010-01-01

    Summary: Text mining from the biomedical literature is of increasing importance, yet it is not easy for the bioinformatics community to create and run text mining workflows due to the lack of accessibility and interoperability of the text mining resources. The U-Compare system provides a wide range of bio text mining resources in a highly interoperable workflow environment where workflows can very easily be created, executed, evaluated and visualized without coding. We have linked U-Compare to Taverna, a generic workflow system, to expose text mining functionality to the bioinformatics community. Availability: http://u-compare.org/taverna.html, http://u-compare.org Contact: kano@is.s.u-tokyo.ac.jp Supplementary information: Supplementary data are available at Bioinformatics online. PMID:20709690

  11. Resource Tracking and Workflow System - part of the CORE system

    SciTech Connect

    2009-10-02

    Resource management and workflow capability applied to engineering design situational awareness, providing the ability to make assignments and track progress through the construction and maintenance life cycle of an engineered structure.

  12. Prototype of Kepler Processing Workflows For Microscopy And Neuroinformatics

    PubMed Central

    Astakhov, V.; Bandrowski, A.; Gupta, A.; Kulungowski, A.W.; Grethe, J.S.; Bouwer, J.; Molina, T.; Rowley, V.; Penticoff, S.; Terada, M.; Wong, W.; Hakozaki, H.; Kwon, O.; Martone, M.E.; Ellisman, M.

    2016-01-01

    We report on progress of employing the Kepler workflow engine to prototype “end-to-end” application integration workflows that concern data coming from microscopes deployed at the National Center for Microscopy Imaging Research (NCMIR). This system is built upon the mature code base of the Cell Centered Database (CCDB) and integrated rule-oriented data system (IRODS) for distributed storage. It provides integration with external projects such as the Whole Brain Catalog (WBC) and Neuroscience Information Framework (NIF), which benefit from NCMIR data. We also report on specific workflows which spawn from main workflows and perform data fusion and orchestration of Web services specific for the NIF project. This “Brain data flow” presents a user with categorized information about sources that have information on various brain regions. PMID:28479932

  13. RseqFlow: workflows for RNA-Seq data analysis

    PubMed Central

    Wang, Ying; Mehta, Gaurang; Mayani, Rajiv; Lu, Jingxi; Souaiaia, Tade; Chen, Yangho; Clark, Andrew; Yoon, Hee Jae; Wan, Lin; Evgrafov, Oleg V.; Knowles, James A.; Deelman, Ewa; Chen, Ting

    2011-01-01

    Summary: We have developed an RNA-Seq analysis workflow for single-ended Illumina reads, termed RseqFlow. This workflow includes a set of analytic functions, such as quality control for sequencing data, signal tracks of mapped reads, calculation of expression levels, identification of differentially expressed genes and calling of coding SNPs. This workflow is formalized and managed by the Pegasus Workflow Management System, which maps the analysis modules onto available computational resources, automatically executes the steps in the appropriate order and supervises the whole running process. RseqFlow is available as a Virtual Machine with all the necessary software, which eliminates any complex configuration and installation steps. Availability and implementation: http://genomics.isi.edu/rnaseq Contact: wangying@xmu.edu.cn; knowles@med.usc.edu; deelman@isi.edu; tingchen@usc.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:21795323
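
    The "appropriate order" that a workflow manager such as Pegasus enforces is a topological order of the module dependency graph; a minimal Python illustration follows, with hypothetical RseqFlow-like step names.

      # Topological ordering of pipeline steps, the core scheduling idea
      # behind a workflow management system (step names are hypothetical).
      from graphlib import TopologicalSorter

      deps = {
          "quality_control": [],
          "map_reads": ["quality_control"],
          "expression_levels": ["map_reads"],
          "differential_expression": ["expression_levels"],
          "snp_calling": ["map_reads"],
      }
      # Each step is listed only after all of its prerequisites.
      print(list(TopologicalSorter(deps).static_order()))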

  14. Prototype of Kepler Processing Workflows For Microscopy And Neuroinformatics.

    PubMed

    Astakhov, V; Bandrowski, A; Gupta, A; Kulungowski, A W; Grethe, J S; Bouwer, J; Molina, T; Rowley, V; Penticoff, S; Terada, M; Wong, W; Hakozaki, H; Kwon, O; Martone, M E; Ellisman, M

    2012-01-01

    We report on progress of employing the Kepler workflow engine to prototype "end-to-end" application integration workflows that concern data coming from microscopes deployed at the National Center for Microscopy Imaging Research (NCMIR). This system is built upon the mature code base of the Cell Centered Database (CCDB) and integrated rule-oriented data system (IRODS) for distributed storage. It provides integration with external projects such as the Whole Brain Catalog (WBC) and Neuroscience Information Framework (NIF), which benefit from NCMIR data. We also report on specific workflows which spawn from main workflows and perform data fusion and orchestration of Web services specific for the NIF project. This "Brain data flow" presents a user with categorized information about sources that have information on various brain regions.

  15. Common Workflow Service: Standards Based Solution for Managing Operational Processes

    NASA Astrophysics Data System (ADS)

    Tinio, A. W.; Hollins, G. A.

    2017-06-01

    The Common Workflow Service is a collaborative and standards-based solution for managing mission operations processes using techniques from the Business Process Management (BPM) discipline. This presentation describes the CWS and its benefits.

  16. Optimization of tomographic reconstruction workflows on geographically distributed resources

    PubMed Central

    Bicer, Tekin; Gürsoy, Doǧa; Kettimuthu, Rajkumar; De Carlo, Francesco; Foster, Ian T.

    2016-01-01

    New technological advancements in synchrotron light sources enable data acquisitions at unprecedented levels. This emergent trend affects not only the size of the generated data but also the need for larger computational resources. Although beamline scientists and users have access to local computational resources, these are typically limited and can result in extended execution times. Applications that are based on iterative processing as in tomographic reconstruction methods require high-performance compute clusters for timely analysis of data. Here, time-sensitive analysis and processing of Advanced Photon Source data on geographically distributed resources are focused on. Two main challenges are considered: (i) modeling of the performance of tomographic reconstruction workflows and (ii) transparent execution of these workflows on distributed resources. For the former, three main stages are considered: (i) data transfer between storage and computational resources, (ii) wait/queue time of reconstruction jobs at compute resources, and (iii) computation of reconstruction tasks. These performance models allow evaluation and estimation of the execution time of any given iterative tomographic reconstruction workflow that runs on geographically distributed resources. For the latter challenge, a workflow management system is built, which can automate the execution of workflows and minimize the user interaction with the underlying infrastructure. The system utilizes Globus to perform secure and efficient data transfer operations. The proposed models and the workflow management system are evaluated by using three high-performance computing and two storage resources, all of which are geographically distributed. Workflows were created with different computational requirements using two compute-intensive tomographic reconstruction algorithms. Experimental evaluation shows that the proposed models and system can be used for selecting the optimum resources, which in turn can
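
    The three-stage performance model can be caricatured in a few lines; the linear forms and constants below are illustrative assumptions, not the paper's fitted models.

      # Toy three-stage model: transfer + queue wait + parallel compute.
      def estimate_time(data_gb, bandwidth_gbps, queue_wait_s,
                        n_tasks, secs_per_task, n_workers):
          transfer_s = data_gb * 8.0 / bandwidth_gbps        # storage -> compute
          compute_s = (n_tasks / n_workers) * secs_per_task  # ideal scaling
          return transfer_s + queue_wait_s + compute_s

      # Compare two hypothetical sites for the same reconstruction workload.
      for site, bw, wait, workers in [("site_a", 10, 600, 256),
                                      ("site_b", 40, 3600, 1024)]:
          t = estimate_time(500, bw, wait, 2048, 30, workers)
          print(site, f"{t / 60:.1f} min")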

  17. Worklist handling in workflow-enabled radiological application systems

    NASA Astrophysics Data System (ADS)

    Wendler, Thomas; Meetz, Kirsten; Schmidt, Joachim; von Berg, Jens

    2000-05-01

    For the next generation of integrated information systems for health care applications, more emphasis has to be put on systems which, by design, support the reduction of cost, the increase in efficiency and the improvement of the quality of services. A substantial contribution to this will be the modeling, optimization, automation and enactment of processes in health care institutions. One of the perceived key success factors for the system integration of processes will be the application of workflow management, with workflow management systems as key technology components. In this paper we address workflow management in radiology. We focus on an important aspect of workflow management, the generation and handling of worklists, which provide workflow participants automatically with work items that reflect tasks to be performed. The display of worklists and the functions associated with work items are the visible part for the end-users of an information system using a workflow management approach. Appropriate worklist design and implementation will influence the user-friendliness of a system and will largely determine work efficiency. Technically, in current imaging department information system environments (modality-PACS-RIS installations), a data-driven approach has been taken: worklists -- if present at all -- are generated from filtered views on application databases. In a future workflow-based approach, worklists will be generated by autonomous workflow services based on explicit process models and organizational models. This process-oriented approach will provide us with an integral view of entire health care processes or sub-processes. The paper describes the basic mechanisms of this approach and summarizes its benefits.

  18. Optimization of tomographic reconstruction workflows on geographically distributed resources

    DOE PAGES

    Bicer, Tekin; Gursoy, Doga; Kettimuthu, Rajkumar; ...

    2016-01-01

    New technological advancements in synchrotron light sources enable data acquisitions at unprecedented levels. This emergent trend affects not only the size of the generated data but also the need for larger computational resources. Although beamline scientists and users have access to local computational resources, these are typically limited and can result in extended execution times. Applications that are based on iterative processing as in tomographic reconstruction methods require high-performance compute clusters for timely analysis of data. Here, time-sensitive analysis and processing of Advanced Photon Source data on geographically distributed resources are focused on. Two main challenges are considered: (i) modeling of the performance of tomographic reconstruction workflows and (ii) transparent execution of these workflows on distributed resources. For the former, three main stages are considered: (i) data transfer between storage and computational resources, (ii) wait/queue time of reconstruction jobs at compute resources, and (iii) computation of reconstruction tasks. These performance models allow evaluation and estimation of the execution time of any given iterative tomographic reconstruction workflow that runs on geographically distributed resources. For the latter challenge, a workflow management system is built, which can automate the execution of workflows and minimize the user interaction with the underlying infrastructure. The system utilizes Globus to perform secure and efficient data transfer operations. The proposed models and the workflow management system are evaluated by using three high-performance computing and two storage resources, all of which are geographically distributed. Workflows were created with different computational requirements using two compute-intensive tomographic reconstruction algorithms. Experimental evaluation shows that the proposed models and system can be used for selecting the optimum resources, which in

  19. Taverna: a tool for building and running workflows of services

    PubMed Central

    Hull, Duncan; Wolstencroft, Katy; Stevens, Robert; Goble, Carole; Pocock, Mathew R.; Li, Peter; Oinn, Tom

    2006-01-01

    Taverna is an application that eases the use and integration of the growing number of molecular biology tools and databases available on the web, especially web services. It allows bioinformaticians to construct workflows or pipelines of services to perform a range of different analyses, such as sequence analysis and genome annotation. These high-level workflows can integrate many different resources into a single analysis. Taverna is available freely under the terms of the GNU Lesser General Public License (LGPL). PMID:16845108

  20. A scheduling framework applied to digital publishing workflows

    NASA Astrophysics Data System (ADS)

    Lozano, Wilson; Rivera, Wilson

    2006-02-01

    This paper presents the advances in developing a dynamic scheduling technique suitable for automating digital publishing workflows. Traditionally, scheduling in digital publishing has been limited to timing criteria. The proposed scheduling strategy takes into account contingency and priority fluctuations. The new scheduling algorithm, referred to as QB-MUF, gives high priority to jobs with low probability of failing according to artifact recognition and workflow modeling criteria. The experimental results show the suitability and efficiency of the scheduling strategy.
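
    The priority rule can be illustrated with a minimal min-heap keyed on estimated failure probability; the job records and probabilities below are hypothetical.

      # Sketch of the ordering idea behind QB-MUF: dispatch jobs with the
      # lowest estimated failure probability first (job data hypothetical).
      import heapq

      jobs = [
          {"id": "job-a", "p_fail": 0.30},
          {"id": "job-b", "p_fail": 0.05},
          {"id": "job-c", "p_fail": 0.12},
      ]

      queue = [(job["p_fail"], job["id"]) for job in jobs]
      heapq.heapify(queue)   # min-heap: lowest failure probability first
      while queue:
          p_fail, job_id = heapq.heappop(queue)
          print(f"dispatch {job_id} (estimated failure probability {p_fail:.2f})")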

  1. New workflows and resources in the VLab cyberinfrastructure

    NASA Astrophysics Data System (ADS)

    da Silveira, P. R.; Nunez Valdez, M.; Hsu, H.; Wentzcovitch, R.

    2011-12-01

    We describe the development of, and show examples of, new resources and workflows in the VLab cyberinfrastructure. Among the new resources are high-temperature elasticity results for the several phases we have computed so far. Available results may be reproduced online, similarly to the thermodynamic properties that have been available since 2008. New workflows include high-temperature elasticity and the determination of the self-consistent Hubbard U by linear response theory.

  2. Scientific Misconduct.

    PubMed

    Gross, Charles

    2016-01-01

    Scientific misconduct has been defined as fabrication, falsification, and plagiarism. Scientific misconduct has occurred throughout the history of science. The US government began to take systematic interest in such misconduct in the 1980s. Since then, a number of studies have examined how frequently individual scientists have observed scientific misconduct or were involved in it. Although the studies vary considerably in their methodology and in the nature and size of their samples, in most studies at least 10% of the scientists sampled reported having observed scientific misconduct. In addition to studies of the incidence of scientific misconduct, this review considers the recent increase in paper retractions, the role of social media in scientific ethics, several instructional examples of egregious scientific misconduct, and potential methods to reduce research misconduct.

  3. An Approach to Evaluate Scientist Support in Abstract Workflows and Provenance Traces

    SciTech Connect

    Salayandia, Leonardo; Gates, Ann Q.; Pinheiro da Silva, Paulo

    2012-11-02

    Abstract workflows are useful to bridge the gap between scientists and technologists towards using computer systems to carry out scientific processes. Provenance traces provide evidence required to validate results and support their reuse. Assuming both technologies are based on formal semantics, a knowledge-based system that consistently merges both technologies is useful for scientists that produce data to document their data collecting and transformation processes; it is also useful for scientists that reuse data to assess scientific processes and resulting datasets produced by others. While evaluation of each technology is necessary for a given application, this work discusses their combined evaluation. The claim is that both technologies should complement each other and align consistently to a scientist’s perspective in order to be effective for science. Evaluation criteria are proposed based on lessons learned and exemplified for discussion.

  4. VLAB: Web services, portlets, and workflows for enabling cyber-infrastructure in computational mineral physics

    NASA Astrophysics Data System (ADS)

    Bollig, Evan F.; Jensen, Paul A.; Lyness, Martin D.; Nacar, Mehmet A.; da Silveira, Pedro R. C.; Kigelman, Dan; Erlebacher, Gordon; Pierce, Marlon; Yuen, David A.; da Silva, Cesar R. S.

    2007-08-01

    Virtual organizations are rapidly changing the way scientific communities perform research. Web-based portals, environments designed for collaboration and sharing data, have now become the nexus of distributed high performance computing. Within this paper, we address the infrastructure of the Virtual Laboratory for Earth and Planetary Materials (VLab), an organization dedicated to using quantum calculations to solve problems in mineral physics. VLab provides a front-end portal, accessible from a browser, for scientists to submit large-scale simulations and interactively analyze their results. The cyber-infrastructure of VLab concentrates on scientific workflows, portal development, responsive user-interfaces and automatic generation of web services, all necessary to ensure a maximum degree of flexibility and ease of use for both the expert scientist and the layperson.

  5. Web API for biology with a workflow navigation system.

    PubMed

    Kwon, Yeondae; Shigemoto, Yasumasa; Kuwana, Yoshikazu; Sugawara, Hideaki

    2009-07-01

    DNA Data Bank of Japan (DDBJ) provides Web-based systems for biological analysis, called Web APIs for biology (WABI). So far, we have developed over 20 SOAP services and several workflows that consist of a series of method invocations. In this article, we present newly developed services of WABI, that is, REST-based Web services, additional workflows and a workflow navigation system. Each Web service and workflow can be used as a complete service or a building block for programmers to construct more complex information processing systems. The workflow navigation system aims to help non-programming biologists perform analysis tasks by providing next applicable services on Web browsers according to the output of a previously selected service. With this function, users can apply multiple services consecutively only by following links without any programming or manual copy-and-paste operations on Web browsers. The listed services are determined automatically by the system referring to the dictionaries of service categories, the input/output types of services and HTML tags. WABI and the workflow navigation system are freely accessible at http://www.xml.nig.ac.jp/index.html and http://cyclamen.ddbj.nig.ac.jp/, respectively.

  6. CyberShake: Running Seismic Hazard Workflows on Distributed HPC Resources

    NASA Astrophysics Data System (ADS)

    Callaghan, S.; Maechling, P. J.; Graves, R. W.; Gill, D.; Olsen, K. B.; Milner, K. R.; Yu, J.; Jordan, T. H.

    2013-12-01

    As part of its program of earthquake system science research, the Southern California Earthquake Center (SCEC) has developed a simulation platform, CyberShake, to perform physics-based probabilistic seismic hazard analysis (PSHA) using 3D deterministic wave propagation simulations. CyberShake performs PSHA by simulating a tensor-valued wavefield of Strain Green Tensors, and then using seismic reciprocity to calculate synthetic seismograms for about 415,000 events per site of interest. These seismograms are processed to compute ground motion intensity measures, which are then combined with probabilities from an earthquake rupture forecast to produce a site-specific hazard curve. Seismic hazard curves for hundreds of sites in a region can be used to calculate a seismic hazard map, representing the seismic hazard for a region. We present a recently completed PSHA study in which we calculated four CyberShake seismic hazard maps for the Southern California area to compare how CyberShake hazard results are affected by different SGT computational codes (AWP-ODC and AWP-RWG) and different community velocity models (Community Velocity Model - SCEC (CVM-S4) v11.11 and Community Velocity Model - Harvard (CVM-H) v11.9). We present our approach to running workflow applications on distributed HPC resources, including systems without support for remote job submission. We show how our approach extends the benefits of scientific workflows, such as job and data management, to large-scale applications on Track 1 and Leadership class open-science HPC resources. We used our distributed workflow approach to perform CyberShake Study 13.4 on two new NSF open-science HPC computing resources, Blue Waters and Stampede, executing over 470 million tasks to calculate physics-based hazard curves for 286 locations in the Southern California region. For each location, we calculated seismic hazard curves with two different community velocity models and two different SGT codes, resulting in over

  7. Reproducible Large-Scale Neuroimaging Studies with the OpenMOLE Workflow Management System.

    PubMed

    Passerat-Palmbach, Jonathan; Reuillon, Romain; Leclaire, Mathieu; Makropoulos, Antonios; Robinson, Emma C; Parisot, Sarah; Rueckert, Daniel

    2017-01-01

    OpenMOLE is a scientific workflow engine with a strong emphasis on workload distribution. Workflows are designed using a high level Domain Specific Language (DSL) built on top of Scala. It exposes natural parallelism constructs to easily delegate the workload resulting from a workflow to a wide range of distributed computing environments. OpenMOLE hides the complexity of designing complex experiments thanks to its DSL. Users can embed their own applications and scale their pipelines from a small prototype running on their desktop computer to a large-scale study harnessing distributed computing infrastructures, simply by changing a single line in the pipeline definition. The construction of the pipeline itself is decoupled from the execution context. The high-level DSL abstracts the underlying execution environment, contrary to classic shell-script based pipelines. These two aspects allow pipelines to be shared and studies to be replicated across different computing environments. Workflows can be run as traditional batch pipelines or coupled with OpenMOLE's advanced exploration methods in order to study the behavior of an application, or perform automatic parameter tuning. In this work, we briefly present the strong assets of OpenMOLE and detail recent improvements targeting re-executability of workflows across various Linux platforms. We have tightly coupled OpenMOLE with CARE, a standalone containerization solution that allows re-executing on a Linux host any application that has been packaged on another Linux host previously. The solution is evaluated against a Python-based pipeline involving packages such as scikit-learn as well as binary dependencies. All were packaged and re-executed successfully on various HPC environments, with identical numerical results (here prediction scores) obtained on each environment. Our results show that the pair formed by OpenMOLE and CARE is a reliable solution to generate reproducible results and re-executable pipelines. A

  8. Reproducible Large-Scale Neuroimaging Studies with the OpenMOLE Workflow Management System

    PubMed Central

    Passerat-Palmbach, Jonathan; Reuillon, Romain; Leclaire, Mathieu; Makropoulos, Antonios; Robinson, Emma C.; Parisot, Sarah; Rueckert, Daniel

    2017-01-01

    OpenMOLE is a scientific workflow engine with a strong emphasis on workload distribution. Workflows are designed using a high level Domain Specific Language (DSL) built on top of Scala. It exposes natural parallelism constructs to easily delegate the workload resulting from a workflow to a wide range of distributed computing environments. OpenMOLE hides the complexity of designing complex experiments thanks to its DSL. Users can embed their own applications and scale their pipelines from a small prototype running on their desktop computer to a large-scale study harnessing distributed computing infrastructures, simply by changing a single line in the pipeline definition. The construction of the pipeline itself is decoupled from the execution context. The high-level DSL abstracts the underlying execution environment, contrary to classic shell-script based pipelines. These two aspects allow pipelines to be shared and studies to be replicated across different computing environments. Workflows can be run as traditional batch pipelines or coupled with OpenMOLE's advanced exploration methods in order to study the behavior of an application, or perform automatic parameter tuning. In this work, we briefly present the strong assets of OpenMOLE and detail recent improvements targeting re-executability of workflows across various Linux platforms. We have tightly coupled OpenMOLE with CARE, a standalone containerization solution that allows re-executing on a Linux host any application that has been packaged on another Linux host previously. The solution is evaluated against a Python-based pipeline involving packages such as scikit-learn as well as binary dependencies. All were packaged and re-executed successfully on various HPC environments, with identical numerical results (here prediction scores) obtained on each environment. Our results show that the pair formed by OpenMOLE and CARE is a reliable solution to generate reproducible results and re-executable pipelines. A

  9. Implementation of Cyberinfrastructure and Data Management Workflow for a Large-Scale Sensor Network

    NASA Astrophysics Data System (ADS)

    Jones, A. S.; Horsburgh, J. S.

    2014-12-01

    Monitoring with in situ environmental sensors and other forms of field-based observation presents many challenges for data management, particularly for large-scale networks consisting of multiple sites, sensors, and personnel. The availability and utility of these data in addressing scientific questions relies on effective cyberinfrastructure that facilitates transformation of raw sensor data into functional data products. It also depends on the ability of researchers to share and access the data in useable formats. In addition to addressing the challenges presented by the quantity of data, monitoring networks need practices to ensure high data quality, including procedures and tools for post processing. Data quality is further enhanced if practitioners are able to track equipment, deployments, calibrations, and other events related to site maintenance and associate these details with observational data. In this presentation we will describe the overall workflow that we have developed for research groups and sites conducting long term monitoring using in situ sensors. Features of the workflow include: software tools to automate the transfer of data from field sites to databases, a Python-based program for data quality control post-processing, a web-based application for online discovery and visualization of data, and a data model and web interface for managing physical infrastructure. By automating the data management workflow, the time from collection to analysis is reduced and sharing and publication is facilitated. The incorporation of metadata standards and descriptions and the use of open-source tools enhances the sustainability and reusability of the data. We will describe the workflow and tools that we have developed in the context of the iUTAH (innovative Urban Transitions and Aridregion Hydrosustainability) monitoring network. The iUTAH network consists of aquatic and climate sensors deployed in three watersheds to monitor Gradients Along Mountain to Urban
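
    A hedged sketch of the kind of range-check and no-data flagging such a QC post-processing program performs is shown below; the column names, plausibility bounds, and no-data code are hypothetical.

      # Toy sensor QC pass: flag no-data codes and out-of-range values
      # (column names, bounds, and the -9999 code are hypothetical).
      import pandas as pd

      df = pd.DataFrame({
          "timestamp": pd.date_range("2014-07-01", periods=6, freq="30min"),
          "water_temp_C": [14.2, 14.5, 55.0, 14.9, -9999.0, 15.3],
      })

      df["qc_flag"] = "ok"
      df.loc[~df["water_temp_C"].between(0, 35), "qc_flag"] = "out_of_range"
      df.loc[df["water_temp_C"] == -9999.0, "qc_flag"] = "missing"  # no-data code
      print(df)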

  10. Redefining the sonography workflow through the application of a departmental computerized workflow management system.

    PubMed

    Li, Ming-Feng; Tsai, Jerry Ch; Chen, Wei-Juhn; Lin, Huey-Shyan; Pan, Huay-Ben; Yang, Tsung-Lung

    2013-03-01

    The purpose of this study is to demonstrate and evaluate the effective integration of a computerized workflow management system (WMS) into the sonography workflow in order to reduce patient exam waiting time, the number of waiting patients, and sonographer stress level, and to improve patient satisfaction. A computerized WMS was built with seamless integration of an automated patient sorting algorithm, a real-time monitoring system, exam schedule fine-tuning, a tele-imaging support system, and a digital signage system broadcasting patient education programs. The computerized WMS was designed to facilitate problem-solving through continuous customization and flexible adjustment capability. Its effects on operations, staff stress, and patient satisfaction were studied. After implementation of the computerized WMS, there is a significant decrease in patient exam waiting time and sonographer stress level, a significant increase in patient satisfaction regarding exam waiting time and the number of examined patients, and a marked decrease in the number of waiting patients at different time points in a day. Through multidisciplinary teamwork, the computerized WMS provides a simple and effective approach that can overcome problems associated with exam backlogs, increase patient satisfaction, and decrease staff workload stress under limited resources, eventually creating a win-win situation for both the patients and radiology personnel. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.

  11. The Taverna workflow suite: designing and executing workflows of Web Services on the desktop, web or in the cloud.

    PubMed

    Wolstencroft, Katherine; Haines, Robert; Fellows, Donal; Williams, Alan; Withers, David; Owen, Stuart; Soiland-Reyes, Stian; Dunlop, Ian; Nenadic, Aleksandra; Fisher, Paul; Bhagat, Jiten; Belhajjame, Khalid; Bacall, Finn; Hardisty, Alex; Nieva de la Hidalga, Abraham; Balcazar Vargas, Maria P; Sufi, Shoaib; Goble, Carole

    2013-07-01

    The Taverna workflow tool suite (http://www.taverna.org.uk) is designed to combine distributed Web Services and/or local tools into complex analysis pipelines. These pipelines can be executed on local desktop machines or through larger infrastructure (such as supercomputers, Grids or cloud environments), using the Taverna Server. In bioinformatics, Taverna workflows are typically used in the areas of high-throughput omics analyses (for example, proteomics or transcriptomics), or for evidence gathering methods involving text mining or data mining. Through Taverna, scientists have access to several thousand different tools and resources that are freely available from a large range of life science institutions. Once constructed, the workflows are reusable, executable bioinformatics protocols that can be shared, reused and repurposed. A repository of public workflows is available at http://www.myexperiment.org. This article provides an update to the Taverna tool suite, highlighting new features and developments in the workbench and the Taverna Server.

  12. The CESM Workflow Re-Engineering Project

    NASA Astrophysics Data System (ADS)

    Strand, G.

    2015-12-01

    The Community Earth System Model (CESM) Workflow Re-Engineering Project is a collaborative project between the CESM Software Engineering Group (CSEG) and the NCAR Computation and Information Systems Lab (CISL) Application Scalability and Performance (ASAP) Group to revamp how CESM saves its output. The CMIP3 and particularly CMIP5 experiences in submitting CESM data to those intercomparison projects revealed that the output format of the CESM is not well-suited for the data requirements common to model intercomparison projects. CESM, for efficiency reasons, creates output files containing all fields for each model time sampling, but MIPs require individual files for each field comprising all model time samples. This transposition of model output can be very time-consuming; depending on the volume of data written by the specific simulation, the time to re-orient the data can be comparable to the time required for the simulation to complete. Previous strategies included using serial tools to perform this transposition, but these are now far too inefficient to deal with the many terabytes of output a single simulation can generate. A new set of Python tools, using data parallelism, have been written to enable this re-orientation, and have achieved markedly improved I/O performance. The perspective of a data manager/data producer in the use of these new tools is presented, and likely future work on their development and use will be shown. These tools are a critical part of the NCAR CESM submission to the upcoming CMIP6, with the intention that a much more timely and efficient submission of the expected petabytes of data will be accomplished in the given time frame.
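
    Conceptually, the transposition converts files that each hold all fields for one time sample into files that each hold one field for all time samples. The xarray sketch below shows the serial idea only (the file patterns are hypothetical); the CESM tools perform the same re-orientation with data parallelism.

      # Serial sketch of time-slice -> time-series transposition with
      # xarray (file names hypothetical; the CESM tools parallelize this).
      import xarray as xr

      # Each input file holds all fields for one time sample.
      ds = xr.open_mfdataset("cesm.h0.*.nc", combine="by_coords")

      # Write one file per field, each spanning all time samples (MIP layout).
      for name in ds.data_vars:
          ds[name].to_dataset(name=name).to_netcdf(f"{name}.timeseries.nc")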

  13. Ontology-Driven Discovery of Scientific Computational Entities

    ERIC Educational Resources Information Center

    Brazier, Pearl W.

    2010-01-01

    Many geoscientists use modern computational resources, such as software applications, Web services, scientific workflows and datasets that are readily available on the Internet, to support their research and many common tasks. These resources are often shared via human contact and sometimes stored in data portals; however, they are not necessarily…

  14. The myth of standardized workflow in primary care

    PubMed Central

    Beasley, John W; Karsh, Ben-Tzion; Stone, Jamie A; Smith, Paul D; Wetterneck, Tosha B

    2016-01-01

    Objective Primary care efficiency and quality are essential for the nation’s health. The demands on primary care physicians (PCPs) are increasing as healthcare becomes more complex. A more complete understanding of PCP workflow variation is needed to guide future healthcare redesigns. Methods This analysis evaluates workflow variation in terms of the sequence of tasks performed during patient visits. Two patient visits from 10 PCPs from 10 different United States Midwestern primary care clinics were analyzed to determine physician workflow. Tasks and the progressive sequence of those tasks were observed, documented, and coded by task category using a PCP task list. Variations in the sequence and prevalence of tasks at each stage of the primary care visit were assessed considering the physician, the patient, the visit’s progression, and the presence of an electronic health record (EHR) at the clinic. Results PCP workflow during patient visits varies significantly, even for an individual physician, with no single or even common workflow pattern being present. The prevalence of specific tasks shifts significantly as primary care visits progress to their conclusion but, notably, PCPs collect patient information throughout the visit. Discussion PCP workflows were unpredictable during face-to-face patient visits. Workflow emerges as the result of a “dance” between physician and patient as their separate agendas are addressed, a side effect of patient-centered practice. Conclusions Future healthcare redesigns should support a wide variety of task sequences to deliver high-quality primary care. The development of tools such as electronic health records must be based on the realities of primary care visits if they are to successfully support a PCP’s mental and physical work, resulting in effective, safe, and efficient primary care. PMID:26335987

  15. Improving adherence to the Epic Beacon ambulatory workflow.

    PubMed

    Chackunkal, Ellen; Dhanapal Vogel, Vishnuprabha; Grycki, Meredith; Kostoff, Diana

    2017-06-01

    Computerized physician order entry has been shown to significantly improve chemotherapy safety by reducing the number of prescribing errors. Epic's Beacon Oncology Information System of computerized physician order entry and electronic medication administration was implemented in Henry Ford Health System's ambulatory oncology infusion centers on 9 November 2013. Since that time, compliance to the infusion workflow had not been assessed. The objective of this study was to optimize the current workflow and improve the compliance to this workflow in the ambulatory oncology setting. This study was a retrospective, quasi-experimental study which analyzed the composite workflow compliance rate of patient encounters from 9 to 23 November 2014. Based on this analysis, an intervention was identified and implemented in February 2015 to improve workflow compliance. The primary endpoint was to compare the composite compliance rate to the Beacon workflow before and after a pharmacy-initiated intervention. The intervention, which was education of infusion center staff, was initiated by ambulatory-based oncology pharmacists and implemented by a multi-disciplinary team of pharmacists and nurses. The composite compliance rate was then reassessed for patient encounters from 2 to 13 March 2015 in order to analyze the effects of the determined intervention on compliance. The initial analysis in November 2014 revealed a composite compliance rate of 38%, and data analysis after the intervention revealed a statistically significant increase in the composite compliance rate to 83% (p < 0.001). This study supports that a pharmacist-initiated educational intervention can improve compliance to an ambulatory oncology infusion workflow.

  16. Exploring Dental Providers’ Workflow in an Electronic Dental Record Environment

    PubMed Central

    Schwei, Kelsey M; Cooper, Ryan; Mahnke, Andrea N.; Ye, Zhan

    2016-01-01

    Summary Background A workflow is defined as a predefined set of work steps and partial ordering of these steps in any environment to achieve the expected outcome. Few studies have investigated the workflow of providers in a dental office. It is important to understand the interaction of dental providers with the existing technologies at point of care to assess breakdown in the workflow which could contribute to better technology designs. Objective The study objective was to assess electronic dental record (EDR) workflows using time and motion methodology in order to identify breakdowns and opportunities for process improvement. Methods A time and motion methodology was used to study the human-computer interaction and workflow of dental providers with an EDR in four dental centers at a large healthcare organization. A data collection tool was developed to capture the workflow of dental providers and staff while they interacted with an EDR during initial, planned, and emergency patient visits, and at the front desk. Qualitative and quantitative analysis was conducted on the observational data. Results Breakdowns in workflow were identified while posting charges, viewing radiographs, e-prescribing, and interacting with patient scheduler. EDR interaction time was significantly different between dentists and dental assistants (6:20 min vs. 10:57 min, p = 0.013) and between dentists and dental hygienists (6:20 min vs. 9:36 min, p = 0.003). Conclusions On average, a dentist spent far less time than dental assistants and dental hygienists in data recording within the EDR. PMID:27437058

  17. The myth of standardized workflow in primary care.

    PubMed

    Holman, G Talley; Beasley, John W; Karsh, Ben-Tzion; Stone, Jamie A; Smith, Paul D; Wetterneck, Tosha B

    2016-01-01

    Primary care efficiency and quality are essential for the nation's health. The demands on primary care physicians (PCPs) are increasing as healthcare becomes more complex. A more complete understanding of PCP workflow variation is needed to guide future healthcare redesigns. This analysis evaluates workflow variation in terms of the sequence of tasks performed during patient visits. Two patient visits from 10 PCPs from 10 different United States Midwestern primary care clinics were analyzed to determine physician workflow. Tasks and the progressive sequence of those tasks were observed, documented, and coded by task category using a PCP task list. Variations in the sequence and prevalence of tasks at each stage of the primary care visit were assessed considering the physician, the patient, the visit's progression, and the presence of an electronic health record (EHR) at the clinic. PCP workflow during patient visits varies significantly, even for an individual physician, with no single or even common workflow pattern being present. The prevalence of specific tasks shifts significantly as primary care visits progress to their conclusion but, notably, PCPs collect patient information throughout the visit. PCP workflows were unpredictable during face-to-face patient visits. Workflow emerges as the result of a "dance" between physician and patient as their separate agendas are addressed, a side effect of patient-centered practice. Future healthcare redesigns should support a wide variety of task sequences to deliver high-quality primary care. The development of tools such as electronic health records must be based on the realities of primary care visits if they are to successfully support a PCP's mental and physical work, resulting in effective, safe, and efficient primary care. © The Author 2015. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  18. From Peer-Reviewed to Peer-Reproduced in Scholarly Publishing: The Complementary Roles of Data Models and Workflows in Bioinformatics

    PubMed Central

    Zhao, Jun; Avila-Garcia, Maria Susana; Roos, Marco; Thompson, Mark; van der Horst, Eelke; Kaliyaperumal, Rajaram; Luo, Ruibang; Lee, Tin-Lap; Lam, Tak-wah; Edmunds, Scott C.; Sansone, Susanna-Assunta

    2015-01-01

    Motivation Reproducing the results from a scientific paper can be challenging due to the absence of data and the computational tools required for their analysis. In addition, details relating to the procedures used to obtain the published results can be difficult to discern due to the use of natural language when reporting how experiments have been performed. The Investigation/Study/Assay (ISA), Nanopublications (NP), and Research Objects (RO) models are conceptual data modelling frameworks that can structure such information from scientific papers. Computational workflow platforms can also be used to reproduce analyses of data in a principled manner. We assessed the extent to which ISA, NP, and RO models, together with the Galaxy workflow system, can capture the experimental processes and reproduce the findings of a previously published paper reporting on the development of SOAPdenovo2, a de novo genome assembler. Results Executable workflows were developed using Galaxy, which reproduced results that were consistent with the published findings. A structured representation of the information in the SOAPdenovo2 paper was produced by combining the use of ISA, NP, and RO models. By structuring the information in the published paper using these data and scientific workflow modelling frameworks, it was possible to explicitly declare elements of experimental design, variables, and findings. The models served as guides in the curation of scientific information and this led to the identification of inconsistencies in the original published paper, thereby allowing its authors to publish corrections in the form of an erratum. Availability SOAPdenovo2 scripts, data, and results are available through the GigaScience Database: http://dx.doi.org/10.5524/100044; the workflows are available from GigaGalaxy: http://galaxy.cbiit.cuhk.edu.hk; and the representations using the ISA, NP, and RO models are available through the SOAPdenovo2 case study website http://isa-tools.github.io/soapdenovo2/.

  19. From Peer-Reviewed to Peer-Reproduced in Scholarly Publishing: The Complementary Roles of Data Models and Workflows in Bioinformatics.

    PubMed

    González-Beltrán, Alejandra; Li, Peter; Zhao, Jun; Avila-Garcia, Maria Susana; Roos, Marco; Thompson, Mark; van der Horst, Eelke; Kaliyaperumal, Rajaram; Luo, Ruibang; Lee, Tin-Lap; Lam, Tak-Wah; Edmunds, Scott C; Sansone, Susanna-Assunta; Rocca-Serra, Philippe

    2015-01-01

    Reproducing the results from a scientific paper can be challenging due to the absence of data and the computational tools required for their analysis. In addition, details relating to the procedures used to obtain the published results can be difficult to discern due to the use of natural language when reporting how experiments have been performed. The Investigation/Study/Assay (ISA), Nanopublications (NP), and Research Objects (RO) models are conceptual data modelling frameworks that can structure such information from scientific papers. Computational workflow platforms can also be used to reproduce analyses of data in a principled manner. We assessed the extent to which ISA, NP, and RO models, together with the Galaxy workflow system, can capture the experimental processes and reproduce the findings of a previously published paper reporting on the development of SOAPdenovo2, a de novo genome assembler. Executable workflows were developed using Galaxy, which reproduced results that were consistent with the published findings. A structured representation of the information in the SOAPdenovo2 paper was produced by combining the use of ISA, NP, and RO models. By structuring the information in the published paper using these data and scientific workflow modelling frameworks, it was possible to explicitly declare elements of experimental design, variables, and findings. The models served as guides in the curation of scientific information and this led to the identification of inconsistencies in the original published paper, thereby allowing its authors to publish corrections in the form of an erratum. SOAPdenovo2 scripts, data, and results are available through the GigaScience Database: http://dx.doi.org/10.5524/100044; the workflows are available from GigaGalaxy: http://galaxy.cbiit.cuhk.edu.hk; and the representations using the ISA, NP, and RO models are available through the SOAPdenovo2 case study website http://isa-tools.github.io/soapdenovo2/.

  1. A workflow to preserve genome-quality tissue samples from plants in botanical gardens and arboreta1

    PubMed Central

    Gostel, Morgan R.; Kelloff, Carol; Wallick, Kyle; Funk, Vicki A.

    2016-01-01

    Premise of the study: Internationally, gardens hold diverse living collections that can be preserved for genomic research. Workflows have been developed for genomic tissue sampling in other taxa (e.g., vertebrates), but are inadequate for plants. We outline a workflow for tissue sampling intended for two audiences: botanists interested in genomics research and garden staff who plan to voucher living collections. Methods and Results: Standard herbarium methods are used to collect vouchers, label information and images are entered into a publicly accessible database, and leaf tissue is preserved in silica and liquid nitrogen. A five-step approach for genomic tissue sampling is presented for sampling from living collections according to current best practices. Conclusions: Collecting genome-quality samples from gardens is an economical and rapid way to make available for scientific research tissue from the diversity of plants on Earth. The Global Genome Initiative will facilitate and lead this endeavor through international partnerships. PMID:27672517

  2. Integration of Earth System Models and Workflow Management under iRODS for the Northeast Regional Earth System Modeling Project

    NASA Astrophysics Data System (ADS)

    Lengyel, F.; Yang, P.; Rosenzweig, B.; Vorosmarty, C. J.

    2012-12-01

    The Northeast Regional Earth System Model (NE-RESM, NSF Award #1049181) integrates weather research and forecasting models, terrestrial and aquatic ecosystem models, a water balance/transport model, and mesoscale and energy systems input-output economic models developed by an interdisciplinary research team from academia and government with expertise in physics, biogeochemistry, engineering, energy, economics, and policy. NE-RESM is intended to forecast the implications of planning decisions on the region's environment, ecosystem services, energy systems and economy through the 21st century. Integration of model components and the development of cyberinfrastructure for interacting with the system is facilitated with the integrated Rule Oriented Data System (iRODS), a distributed data grid that provides archival storage with metadata facilities and a rule-based workflow engine for automating and auditing scientific workflows.

  3. HoloVir: A Workflow for Investigating the Diversity and Function of Viruses in Invertebrate Holobionts

    PubMed Central

    Laffy, Patrick W.; Wood-Charlson, Elisha M.; Turaev, Dmitrij; Weynberg, Karen D.; Botté, Emmanuelle S.; van Oppen, Madeleine J. H.; Webster, Nicole S.; Rattei, Thomas

    2016-01-01

    Abundant bioinformatics resources are available for the study of complex microbial metagenomes; however, their utility in viral metagenomics is limited. HoloVir is a robust and flexible data analysis pipeline that provides an optimized and validated workflow for taxonomic and functional characterization of viral metagenomes derived from invertebrate holobionts. Simulated viral metagenomes comprising varying levels of viral diversity and abundance were used to determine the optimal assembly and gene prediction strategy, and multiple sequence assembly methods and gene prediction tools were tested in order to optimize our analysis workflow. HoloVir performs pairwise comparisons of single read and predicted gene datasets against the viral RefSeq database to assign taxonomy and additional comparison to phage-specific and cellular markers is undertaken to support the taxonomic assignments and identify potential cellular contamination. Broad functional classification of the predicted genes is provided by assignment of COG microbial functional category classifications using EggNOG and higher resolution functional analysis is achieved by searching for enrichment of specific Swiss-Prot keywords within the viral metagenome. Application of HoloVir to viral metagenomes from the coral Pocillopora damicornis and the sponge Rhopaloeides odorabile demonstrated that HoloVir provides a valuable tool to characterize holobiont viral communities across species, environments, or experiments. PMID:27375564

  4. Low Latency Workflow Scheduling and an Application of Hyperspectral Brightness Temperatures

    NASA Astrophysics Data System (ADS)

    Nguyen, P. T.; Chapman, D. R.; Halem, M.

    2012-12-01

    New system analytics for Big Data computing holds the promise of major scientific breakthroughs and discoveries from the exploration and mining of the massive data sets becoming available to the science community. However, such data intensive scientific applications face severe challenges in accessing, managing and analyzing petabytes of data. While the Hadoop MapReduce environment has been successfully applied to data intensive problems arising in business, there are still many scientific problem domains where limitations in the functionality of MapReduce systems prevent its wide adoption by those communities. This is mainly because MapReduce does not readily support the unique science discipline needs such as special science data formats, graphic and computational data analysis tools, maintaining high degrees of computational accuracies, and interfacing with application's existing components across heterogeneous computing processors. We address some of these limitations by exploiting the MapReduce programming model for satellite data intensive scientific problems and address scalability, reliability, scheduling, and data management issues when dealing with climate data records and their complex observational challenges. In addition, we will present techniques to support the unique Earth science discipline needs such as dealing with special science data formats (HDF and NetCDF). We have developed a Hadoop task scheduling algorithm that improves latency by 2x for a scientific workflow including the gridding of the EOS AIRS hyperspectral Brightness Temperatures (BT). This workflow processing algorithm has been tested on the Multicore Computing Center's private Hadoop-based Intel Nehalem cluster, as well as in a virtual mode under the open source Eucalyptus cloud. The 55TB AIRS hyperspectral L1b Brightness Temperature record has been gridded at the resolution of 0.5x1.0 degrees, and we have computed a 0.9 annual anti-correlation to the El Niño Southern Oscillation in
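
    The gridding step decomposes naturally into MapReduce: the map phase bins each observation into a 0.5 deg x 1.0 deg cell, and the reduce phase averages the brightness temperatures that land in each cell. The pure-Python sketch below shows only that decomposition; the record format is hypothetical, and the actual implementation described above runs on Hadoop against HDF granules.

        # Sketch of the map/reduce decomposition for gridding brightness temperatures
        # onto a 0.5 deg (lat) x 1.0 deg (lon) grid. Input records are hypothetical
        # (lat, lon, bt) tuples; a real job would read HDF granules in the map phase.
        from collections import defaultdict

        def mapper(record):
            lat, lon, bt = record
            cell = (int(lat // 0.5), int(lon // 1.0))   # grid-cell key
            yield cell, bt

        def reducer(cell, values):
            return cell, sum(values) / len(values)      # mean BT per cell

        def run_mapreduce(records):
            groups = defaultdict(list)                  # shuffle: group map output by key
            for record in records:
                for key, value in mapper(record):
                    groups[key].append(value)
            return dict(reducer(k, v) for k, v in groups.items())

        obs = [(10.2, 101.3, 255.1), (10.4, 101.7, 251.9), (-33.0, 18.5, 270.4)]
        print(run_mapreduce(obs))                       # two cells; first holds the mean of two obs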

  5. Workflow4Metabolomics: a collaborative research infrastructure for computational metabolomics.

    PubMed

    Giacomoni, Franck; Le Corguillé, Gildas; Monsoor, Misharl; Landi, Marion; Pericard, Pierre; Pétéra, Mélanie; Duperier, Christophe; Tremblay-Franco, Marie; Martin, Jean-François; Jacob, Daniel; Goulitquer, Sophie; Thévenot, Etienne A; Caron, Christophe

    2015-05-01

    The complex, rapidly evolving field of computational metabolomics calls for collaborative infrastructures where the large volume of new algorithms for data pre-processing, statistical analysis and annotation can be readily integrated whatever the language, evaluated on reference datasets and chained to build ad hoc workflows for users. We have developed Workflow4Metabolomics (W4M), the first fully open-source and collaborative online platform for computational metabolomics. W4M is a virtual research environment built upon the Galaxy web-based platform technology. It enables ergonomic integration, exchange and running of individual modules and workflows. Alternatively, the whole W4M framework and computational tools can be downloaded as a virtual machine for local installation. The http://workflow4metabolomics.org homepage enables users to open a private account and access the infrastructure. W4M is developed and maintained by the French Bioinformatics Institute (IFB) and the French Metabolomics and Fluxomics Infrastructure (MetaboHUB). contact@workflow4metabolomics.org. © The Author 2014. Published by Oxford University Press.

  6. Development of the workflow kine systems for support on KAIZEN.

    PubMed

    Mizuno, Yuki; Ito, Toshihiko; Yoshikawa, Toru; Yomogida, Satoshi; Morio, Koji; Sakai, Kazuhiro

    2012-01-01

    In this paper, we introduce a new workflow line system consisting of location and image recording, which enables the acquisition of workflow information and its analysis and display. From the results of a workflow line investigation, we considered the anticipated effects of, and remaining problems for, KAIZEN. Workflow line information included location information and action content information. These technologies suggest viewpoints to help improvement, for example, the exclusion of useless movement, the redesign of layout, and the review of work procedures. In a manufacturing factory, it was clear that there was much movement away from the standard operation place and accumulated residence time; concretely, this investigation showed that a more efficient layout could be suggested by the system. In the case of a hospital, similarly, the investigation pointed out problems of layout and setup operations, based on the effective movement patterns of experts. The system can adapt to routine as well as non-routine work. Through the development of this system, which can fit and adapt to industrial diversification, more effective "visual management" (visualization of work) is expected in the future.

  7. [Integration of the radiotherapy irradiation planning in the digital workflow].

    PubMed

    Röhner, F; Schmucker, M; Henne, K; Momm, F; Bruggmoser, G; Grosu, A-L; Frommhold, H; Heinemann, F E

    2013-02-01

    At the Clinic of Radiotherapy at the University Hospital Freiburg, all relevant workflows are paperless. After implementing the Operating Schedule System (OSS) as a framework, all processes are being implemented into the departmental system MOSAIQ. Designing a digital workflow for radiotherapy irradiation planning is a major challenge: it requires interdisciplinary expertise, and therefore the interfaces between the professions also have to be interdisciplinary. For every single step of radiotherapy irradiation planning, distinct responsibilities have to be defined and documented. All aspects of digital storage, backup and long-term availability of data were considered and have already been realized during the OSS project. After an analysis of the complete workflow and the statutory requirements, a detailed project plan was designed. In an interdisciplinary workgroup, problems were discussed and a detailed flowchart was developed. The new functionalities were implemented in a testing environment by the Clinical and Administrative IT Department (CAI). After extensive tests they were integrated into the new modular department system. The Clinic of Radiotherapy succeeded in realizing a completely digital workflow for radiotherapy irradiation planning. During the testing phase, our digital workflow was examined and afterwards was approved by the responsible authority.

  8. Relationship between e-prescriptions and community pharmacy workflow.

    PubMed

    Odukoya, Olufunmilola K; Chui, Michelle A

    2012-01-01

    To understand how community pharmacists use electronic prescribing (e-prescribing) technology and to describe the workflow challenges pharmacy personnel encounter as a result of using e-prescribing technology. Cross-sectional qualitative study. Seven community pharmacies in Wisconsin from December 2010 to March 2011. 16 pharmacists and 14 pharmacy technicians (in three chain and four independent pharmacies). Think-aloud protocols and pharmacy group interviews. Pharmacy staff descriptions of their use of e-prescribing technology and challenges encountered in their daily workflow related to this technology. Two contributing factors were perceived to influence e-prescribing workflow: issues stemming from prescribing or transmitting software and issues from within the pharmacy. Pharmacies experienced both delayed and inaccurate e-prescriptions from physician offices. An overwhelming number of e-prescriptions with inaccurate or unclear information resulted in serious time delays for patients as pharmacists contacted physicians to clarify wrong information. In addition, lack of formal training and the disconnect between pharmacy procedures for verifying prescription accuracy and presentation of e-prescription information on the computer screen influenced the speed of processing an e-prescription. E-prescriptions processing can hinder pharmacy workflow. As the number of e-prescriptions transmitted to pharmacies increases because of legislative mandates, it is essential that the technology supporting e-prescriptions (both on the prescriber and pharmacy operating systems) be redesigned to facilitate pharmacy workflow processes and to prevent unintended increase in medication errors, user frustration, and stress.

  9. CamBAfx: Workflow Design, Implementation and Application for Neuroimaging

    PubMed Central

    Ooi, Cinly; Bullmore, Edward T.; Wink, Alle-Meije; Sendur, Levent; Barnes, Anna; Achard, Sophie; Aspden, John; Abbott, Sanja; Yue, Shigang; Kitzbichler, Manfred; Meunier, David; Maxim, Voichita; Salvador, Raymond; Henty, Julian; Tait, Roger; Subramaniam, Naresh; Suckling, John

    2009-01-01

    CamBAfx is a workflow application designed for both researchers who use workflows to process data (consumers) and those who design them (designers). It provides a front-end (user interface) optimized for data processing designed in a way familiar to consumers. The back-end uses a pipeline model to represent workflows since this is a common and useful metaphor used by designers and is easy to manipulate compared to other representations like programming scripts. As an Eclipse Rich Client Platform application, CamBAfx's pipelines and functions can be bundled with the software or downloaded post-installation. The user interface contains all the workflow facilities expected by consumers. Using the Eclipse Extension Mechanism designers are encouraged to customize CamBAfx for their own pipelines. CamBAfx wraps a workflow facility around neuroinformatics software without modification. CamBAfx's design, licensing and Eclipse Branding Mechanism allow it to be used as the user interface for other software, facilitating exchange of innovative computational tools between originating labs. PMID:19826470

  10. Optimizing high performance computing workflow for protein functional annotation.

    PubMed

    Stanberry, Larissa; Rekepalli, Bhanu; Liu, Yuan; Giblock, Paul; Higdon, Roger; Montague, Elizabeth; Broomall, William; Kolker, Natali; Kolker, Eugene

    2014-09-10

    Functional annotation of newly sequenced genomes is one of the major challenges in modern biology. With modern sequencing technologies, the protein sequence universe is rapidly expanding. Newly sequenced bacterial genomes alone contain over 7.5 million proteins. The rate of data generation has far surpassed that of protein annotation. The volume of protein data makes manual curation infeasible, whereas a high compute cost limits the utility of existing automated approaches. In this work, we present an improved and optimized automated workflow to enable large-scale protein annotation. The workflow uses high performance computing architectures and a low complexity classification algorithm to assign proteins into existing clusters of orthologous groups of proteins. On the basis of the Position-Specific Iterative Basic Local Alignment Search Tool, the algorithm ensures at least 80% specificity and sensitivity of the resulting classifications. The workflow utilizes highly scalable parallel applications for classification and sequence alignment. Using Extreme Science and Engineering Discovery Environment supercomputers, the workflow processed 1,200,000 newly sequenced bacterial proteins. With the rapid expansion of the protein sequence universe, the proposed workflow will enable scientists to annotate big genome data.
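
    The classification step of such a workflow reduces to assigning each query protein the COG of its best PSI-BLAST hit, subject to thresholds tuned for the stated specificity and sensitivity. A sketch of that step follows; the psiblast invocation uses standard BLAST+ options, but the thresholds and the subject-identifier convention are illustrative, not the paper's exact parameters.

        # Sketch: assign proteins to COGs from tabular PSI-BLAST output
        # (-outfmt 6: qseqid sseqid pident length ... evalue bitscore).
        # Thresholds below are illustrative, not the workflow's exact settings.
        import subprocess

        def run_psiblast(query_fasta, cog_db, out_tsv):
            subprocess.run(
                ["psiblast", "-query", query_fasta, "-db", cog_db,
                 "-num_iterations", "3", "-evalue", "1e-5",
                 "-outfmt", "6", "-out", out_tsv],
                check=True)

        def assign_cogs(out_tsv, min_pident=30.0):
            best = {}                                # query id -> (bitscore, COG id)
            with open(out_tsv) as fh:
                for line in fh:
                    q, s, pident, *rest = line.rstrip("\n").split("\t")
                    if float(pident) < min_pident:
                        continue
                    score = float(rest[-1])          # bitscore is the last column
                    cog = s.split("|")[0]            # hypothetical subject-id convention
                    if q not in best or score > best[q][0]:
                        best[q] = (score, cog)
            return {q: cog for q, (score, cog) in best.items()}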

  11. Optimizing high performance computing workflow for protein functional annotation

    PubMed Central

    Stanberry, Larissa; Rekepalli, Bhanu; Liu, Yuan; Giblock, Paul; Higdon, Roger; Montague, Elizabeth; Broomall, William; Kolker, Natali; Kolker, Eugene

    2014-01-01

    Functional annotation of newly sequenced genomes is one of the major challenges in modern biology. With modern sequencing technologies, the protein sequence universe is rapidly expanding. Newly sequenced bacterial genomes alone contain over 7.5 million proteins. The rate of data generation has far surpassed that of protein annotation. The volume of protein data makes manual curation infeasible, whereas a high compute cost limits the utility of existing automated approaches. In this work, we present an improved and optimized automated workflow to enable large-scale protein annotation. The workflow uses high performance computing architectures and a low complexity classification algorithm to assign proteins into existing clusters of orthologous groups of proteins. On the basis of the Position-Specific Iterative Basic Local Alignment Search Tool, the algorithm ensures at least 80% specificity and sensitivity of the resulting classifications. The workflow utilizes highly scalable parallel applications for classification and sequence alignment. Using Extreme Science and Engineering Discovery Environment supercomputers, the workflow processed 1,200,000 newly sequenced bacterial proteins. With the rapid expansion of the protein sequence universe, the proposed workflow will enable scientists to annotate big genome data. PMID:25313296

  12. Automation of Global Adjoint Tomography Based on ASDF and Workflow Management Tools

    NASA Astrophysics Data System (ADS)

    Lei, W.; Ruan, Y.; Bozdag, E.; Smith, J. A.; Modrak, R. T.; Krischer, L.; Chen, Y.; Lefebvre, M. P.; Tromp, J.

    2016-12-01

    Global adjoint tomography is computationally expensive, requiring thousands of wavefield simulations and massive data processing. Through a collaboration with the Oak Ridge National Laboratory computing group and an allocation on the 'Titan' GPU-accelerated supercomputer, we have begun to assimilate waveform data from more than 4,000 earthquakes, from 1995 to 2015, in our inversions. However, since conventional file formats and signal processing tools were not designed for parallel processing of massive data volumes, use of such tools in high-resolution global inversions leads to major bottlenecks. To overcome such problems and allow for continued scientific progress, we designed the Adaptive Seismic Data Format (ASDF) and developed a set of processing tools based on ASDF, covering signal processing (pytomo3d), time window selection (pyflex), and adjoint sources (pyadjoint). These new tools greatly enhance the reproducibility and accountability of our research while taking full advantage of parallel computing, showing superior scaling on modern computational platforms. The entire inversion workflow, intrinsically complex and sensitive to human errors, is carefully handled and automated by modern workflow management tools, preventing data contamination and saving a huge amount of time. Our starting model GLAD-M15 (Bozdag et al., 2016), an elastic model with transversely isotropic upper mantle, is based on 253 earthquakes and 15 nonlinear conjugate gradient iterations. We have now completed source inversions for more than 1,000 earthquakes and have started structural inversions using a quasi-Newton optimization algorithm. We will discuss the challenges of large-scale workflows on HPC systems, the solutions offered by our new adjoint tomography tools, and the initial tomographic results obtained using the new expanded dataset.
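
    To give a flavor of the time-window selection step: a window enters the measurement set only if observed and synthetic traces agree well enough. The sketch below uses a zero-lag normalized cross-correlation and an amplitude-ratio test as a simplified stand-in; it is not pyflex's actual algorithm, and the thresholds are illustrative.

        # Simplified stand-in for waveform window acceptance (not pyflex's algorithm):
        # accept a window when observed and synthetic traces correlate strongly and
        # their amplitudes are comparable. Thresholds are illustrative.
        import numpy as np

        def accept_window(obs, syn, cc_min=0.7, amp_ratio_max=2.0):
            obs = obs - obs.mean()
            syn = syn - syn.mean()
            denom = np.sqrt((obs**2).sum() * (syn**2).sum())
            if denom == 0.0:
                return False
            cc = float((obs * syn).sum() / denom)          # zero-lag normalized CC
            amp = np.abs(obs).max() / max(np.abs(syn).max(), 1e-12)
            return cc >= cc_min and 1.0 / amp_ratio_max <= amp <= amp_ratio_max

        t = np.linspace(0, 10, 1001)
        obs = np.sin(2 * np.pi * 0.5 * t)
        syn = 0.9 * np.sin(2 * np.pi * 0.5 * t)
        print(accept_window(obs, syn))                     # True: good agreement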

  13. Cost-Minimizing Scheduling of Workflows on a Cloud of Memory Managed Multicore Machines

    NASA Astrophysics Data System (ADS)

    Grounds, Nicolas G.; Antonio, John K.; Muehring, Jeff

    Workflows are modeled as hierarchically structured directed acyclic graphs in which vertices represent computational tasks, referred to as requests, and edges represent precedent constraints among requests. Associated with each workflow is a deadline that defines the time by which all computations of a workflow should be complete. Workflows are submitted by numerous clients to a scheduler that assigns workflow requests to a cloud of memory managed multicore machines for execution. A cost function is assumed to be associated with each workflow, which maps values of relative workflow tardiness to corresponding cost function values. A novel cost-minimizing scheduling framework is introduced to schedule requests of workflows so as to minimize the sum of cost function values for all workflows. The utility of the proposed scheduler is compared to another previously known scheduling policy.
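
    Given per-workflow cost functions over relative tardiness, one natural scheduling rule is to serve the ready request whose workflow's cost would grow fastest if it were delayed. The sketch below illustrates that idea with a hypothetical quadratic cost function; it is an illustration of the general approach, not the paper's exact framework.

        # Sketch: pick the next ready request by marginal cost of delay.
        # Cost functions map relative tardiness to cost; the quadratic form
        # and all parameters are hypothetical.
        def cost(workflow, finish_time):
            tardiness = max(0.0, finish_time - workflow["deadline"])
            return workflow["weight"] * (tardiness / workflow["deadline"]) ** 2

        def next_request(ready, now, dt=1.0):
            # Choose the request whose workflow's cost rises fastest if delayed by dt.
            def marginal(req):
                wf = req["workflow"]
                eta = now + req["runtime"]
                return cost(wf, eta + dt) - cost(wf, eta)
            return max(ready, key=marginal)

        wf_a = {"deadline": 10.0, "weight": 1.0}
        wf_b = {"deadline": 6.0, "weight": 1.0}
        ready = [{"name": "A1", "workflow": wf_a, "runtime": 4.0},
                 {"name": "B1", "workflow": wf_b, "runtime": 4.0}]
        print(next_request(ready, now=3.0)["name"])   # "B1": the tighter deadline wins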

  14. Scientific Utopia: An agenda for improving scientific communication (Invited)

    NASA Astrophysics Data System (ADS)

    Nosek, B.

    2013-12-01

    The scientist's primary incentive is publication. In the present culture, open practices do not increase chances of publication, and they often require additional work. Practicing the abstract scientific values of openness and reproducibility thus requires behaviors in addition to those relevant for the primary, concrete rewards. When in conflict, concrete rewards are likely to dominate over abstract ones. As a consequence, the reward structure for scientists does not encourage openness and reproducibility. This can be changed by nudging incentives to align scientific practices with scientific values. Science will benefit by creating and connecting technologies that nudge incentives while supporting and improving the scientific workflow. For example, it should be as easy to search the research literature for my topic as it is to search the Internet to find hilarious videos of cats falling off of furniture. I will introduce the Center for Open Science (http://centerforopenscience.org/) and efforts to improve openness and reproducibility such as http://openscienceframework.org/. There will be no cats.

  15. The workflow of single-cell expression profiling using quantitative real-time PCR

    PubMed Central

    Ståhlberg, Anders; Kubista, Mikael

    2014-01-01

    Biological material is heterogeneous and when exposed to stimuli the various cells present respond differently. Much of the complexity can be eliminated by disintegrating the sample, studying the cells one by one. Single-cell profiling reveals responses that go unnoticed when classical samples are studied. New cell types and cell subtypes may be found and relevant pathways and expression networks can be identified. The most powerful technique for single-cell expression profiling is currently quantitative reverse transcription real-time PCR (RT-qPCR). A robust RT-qPCR workflow for highly sensitive and specific measurements in high-throughput and a reasonable degree of multiplexing has been developed for targeting mRNAs, but also microRNAs, non-coding RNAs and most recently also proteins. We review the current state of the art of single-cell expression profiling and present also the improvements and developments expected in the next 5 years. PMID:24649819

  16. Preparation of Proteins and Peptides for Mass Spectrometry Analysis in a Bottom-Up Proteomics Workflow

    PubMed Central

    Gundry, Rebekah L.; White, Melanie Y.; Murray, Christopher I.; Kane, Lesley A.; Fu, Qin; Stanley, Brian A.; Van Eyk, Jennifer E.

    2010-01-01

    This unit outlines the steps required to prepare a sample for MS analysis following protein separation or enrichment by gel electrophoresis, liquid chromatography, and affinity capture within the context of a bottom-up proteomics workflow in which the protein is first broken up into peptides, either by chemical or enzymatic digestion, prior to MS analysis. Also included are protocols for enrichment at the peptide level, including phosphopeptide enrichment and reversed-phase chromatography for sample purification immediately prior to MS analysis. Finally, there is a discussion regarding the types of MS technologies commonly used to analyze proteomics samples, as well as important parameters that should be considered when analyzing the MS data to ensure stringent and robust protein identifications and characterization. PMID:19816929

  17. Building an efficient curation workflow for the Arabidopsis literature corpus

    PubMed Central

    Li, Donghui; Berardini, Tanya Z.; Muller, Robert J.; Huala, Eva

    2012-01-01

    TAIR (The Arabidopsis Information Resource) is the model organism database (MOD) for Arabidopsis thaliana, a model plant with a literature corpus of about 39 000 articles in PubMed, with over 4300 new articles added in 2011. We have developed a literature curation workflow incorporating both automated and manual elements to cope with this flood of new research articles. The current workflow can be divided into two phases: article selection and curation. Structured controlled vocabularies, such as the Gene Ontology and Plant Ontology are used to capture free text information in the literature as succinct ontology-based annotations suitable for the application of computational analysis methods. We also describe our curation platform and the use of text mining tools in our workflow. Database URL: www.arabidopsis.org PMID:23221298

  18. Nexus: a modular workflow management system for quantum simulation codes

    SciTech Connect

    Krogel, Jaron T.

    2015-08-24

    The management of simulation workflows is a significant task for the individual computational researcher. Automation of the required tasks involved in simulation work can decrease the overall time to solution and reduce sources of human error. A new simulation workflow management system, Nexus, is presented to address these issues. Nexus is capable of automated job management on workstations and resources at several major supercomputing centers. Its modular design allows many quantum simulation codes to be supported within the same framework. Current support includes quantum Monte Carlo calculations with QMCPACK, density functional theory calculations with Quantum Espresso or VASP, and quantum chemical calculations with GAMESS. Users can compose workflows through a transparent, text-based interface, resembling the input file of a typical simulation code. A usage example is provided to illustrate the process.
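
    The "transparent, text-based interface" described above suggests declarative workflow composition. The sketch below shows what such composition can look like in Python; the Task abstraction and function names are hypothetical and are not Nexus's actual API.

        # Hypothetical sketch of input-file-style workflow composition in the
        # spirit of the description above; names are illustrative, NOT Nexus's API.
        from dataclasses import dataclass, field

        @dataclass
        class Task:
            name: str
            code: str                      # e.g. "quantum_espresso", "qmcpack"
            inputs: dict
            depends_on: list = field(default_factory=list)

        def run(tasks):
            done = set()
            while len(done) < len(tasks):  # dependency-ordered execution (assumes acyclic)
                for t in tasks:
                    if t.name not in done and all(d in done for d in t.depends_on):
                        print(f"submitting {t.code} job '{t.name}' with {t.inputs}")
                        done.add(t.name)

        scf = Task("scf", "quantum_espresso", {"ecutwfc": 75, "kgrid": (4, 4, 4)})
        qmc = Task("vmc", "qmcpack", {"walkers": 256}, depends_on=["scf"])
        run([scf, qmc])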

  19. Nexus: a modular workflow management system for quantum simulation codes

    DOE PAGES

    Krogel, Jaron T.

    2015-08-24

    The management of simulation workflows is a significant task for the individual computational researcher. Automation of the required tasks involved in simulation work can decrease the overall time to solution and reduce sources of human error. A new simulation workflow management system, Nexus, is presented to address these issues. Nexus is capable of automated job management on workstations and resources at several major supercomputing centers. Its modular design allows many quantum simulation codes to be supported within the same framework. Current support includes quantum Monte Carlo calculations with QMCPACK, density functional theory calculations with Quantum Espresso or VASP, and quantum chemical calculations with GAMESS. Users can compose workflows through a transparent, text-based interface, resembling the input file of a typical simulation code. A usage example is provided to illustrate the process.

  20. Building an efficient curation workflow for the Arabidopsis literature corpus.

    PubMed

    Li, Donghui; Berardini, Tanya Z; Muller, Robert J; Huala, Eva

    2012-01-01

    TAIR (The Arabidopsis Information Resource) is the model organism database (MOD) for Arabidopsis thaliana, a model plant with a literature corpus of about 39 000 articles in PubMed, with over 4300 new articles added in 2011. We have developed a literature curation workflow incorporating both automated and manual elements to cope with this flood of new research articles. The current workflow can be divided into two phases: article selection and curation. Structured controlled vocabularies, such as the Gene Ontology and Plant Ontology are used to capture free text information in the literature as succinct ontology-based annotations suitable for the application of computational analysis methods. We also describe our curation platform and the use of text mining tools in our workflow. Database URL: www.arabidopsis.org

  1. Analog to digital workflow improvement: a quantitative study.

    PubMed

    Wideman, Catherine; Gallet, Jacqueline

    2006-01-01

    This study tracked a radiology department's conversion from utilization of a Kodak Amber analog system to a Kodak DirectView DR 5100 digital system. Through the use of ProModel Optimization Suite, a workflow simulation software package, significant quantitative information was derived from workflow process data measured before and after the change to a digital system. Once the digital room was fully operational and the radiology staff comfortable with the new system, average patient examination time was reduced from 9.24 to 5.28 min, indicating that a higher patient throughput could be achieved. Compared to the analog system, chest examination time for modality-specific activities was reduced by 43%. The percentage of repeat examinations also decreased, from 9.5% with the analog system to 8% with the digital system. The study indicated that it is possible to quantitatively study clinical workflow and productivity by using commercially available software.
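
    The throughput implication of such a change can be explored with a small discrete-event simulation. The sketch below uses the open-source simpy package as a stand-in for ProModel; the arrival rate and single-room assumption are illustrative, not the study's model.

        # Discrete-event sketch (simpy as an open-source stand-in for ProModel):
        # compare patient throughput for 9.24 min vs 5.28 min exam times.
        # The arrival rate and single-room assumption are illustrative.
        import random
        import simpy

        def arrivals(env, room, exam_time, served):
            while True:
                yield env.timeout(random.expovariate(1 / 7.0))  # ~one arrival per 7 min
                env.process(visit(env, room, exam_time, served))

        def visit(env, room, exam_time, served):
            with room.request() as req:                         # wait for the exam room
                yield req
                yield env.timeout(exam_time)
                served.append(env.now)

        for exam_time in (9.24, 5.28):
            random.seed(42)
            env = simpy.Environment()
            room = simpy.Resource(env, capacity=1)
            served = []
            env.process(arrivals(env, room, exam_time, served))
            env.run(until=8 * 60)                               # one 8-hour day
            print(f"exam={exam_time} min -> {len(served)} patients/day")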

  2. Assessment of the Nurse Medication Administration Workflow Process

    PubMed Central

    Snyder, Rita; Vidal, José M.; Sharif, Omor; Cai, Bo; Parsons, Bridgette; Bennett, Kevin

    2016-01-01

    This paper presents findings of an observational study of the Registered Nurse (RN) Medication Administration Process (MAP) conducted on two comparable medical units in a large urban tertiary care medical center in Columbia, South Carolina. A total of 305 individual MAP observations were recorded over a 6-week period with an average of 5 MAP observations per RN participant for both clinical units. A key MAP variation was identified in terms of unbundled versus bundled MAP performance. In the unbundled workflow, an RN engages in the MAP by performing only MAP tasks during a care episode. In the bundled workflow, an RN completes medication administration along with other patient care responsibilities during the care episode. Using a discrete-event simulation model, this paper addresses the difference between unbundled and bundled workflow and their effects on simulated redesign interventions.

  3. OWL: A Condor Based Workflow Management System for JWST

    NASA Astrophysics Data System (ADS)

    Pierfederici, F.; Swam, M.; Greene, G.; Kyprianou, M.; Gaffney, N.

    2012-09-01

    The Open Workflow Layer (OWL) is an open source Workflow Management System (WMS) developed at the Space Telescope Science Institute. OWL is being designed for the James Webb Space Telescope (JWST) science data processing using the Hubble Space Telescope (HST) as a test bed. It is however very general and could be applied to many other missions and data processing applications. OWL is a thin Python layer that provides advanced workflow management, GUIs and a data-centric view on top of Condor, a widely used open source batch scheduling system. As such, OWL can transparently take advantage of the many features offered by Condor without having to re-implement them from scratch.

  4. NOW: A Workflow Language for Orchestration in Nomadic Networks

    NASA Astrophysics Data System (ADS)

    Philips, Eline; van der Straeten, Ragnhild; Jonckers, Viviane

    Existing workflow languages for nomadic or mobile ad hoc networks do not offer adequate support for dealing with the volatile connections inherent to these environments. Services residing on mobile devices are exposed to (temporary) network failures, which should be considered the rule rather than the exception. This paper proposes a nomadic workflow language built on top of an ambient-oriented programming language which supports dynamic service discovery and communication primitives resilient to network failures. Our proposed language provides high level workflow abstractions for control flow and supports rich network and service failure detection and handling through compensating actions. Moreover, we introduce a powerful variable binding mechanism which enables dynamic data flow between services in a nomadic environment. By adding this extra layer of abstraction on top of an ambient-oriented programming language, the application programmer is offered a flexible way to develop applications for nomadic networks.
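
    The compensation pattern the language supports (detect a service failure, then undo completed steps) can be sketched independently of its ambient-oriented syntax. The Python stand-in below is illustrative only and is not NOW code.

        # Python stand-in for the compensation pattern described above (NOW itself
        # is an ambient-oriented language; this is not its syntax).
        def with_compensation(steps):
            """Run (action, compensate) pairs; on failure, undo completed steps."""
            completed = []
            try:
                for action, compensate in steps:
                    action()
                    completed.append(compensate)
            except ConnectionError as err:              # stand-in for a network failure
                print(f"failure: {err}; compensating")
                for compensate in reversed(completed):  # undo in reverse order
                    compensate()

        def reserve_printer():  print("printer reserved")
        def release_printer():  print("printer released")
        def send_job():         raise ConnectionError("peer moved out of range")

        with_compensation([(reserve_printer, release_printer),
                           (send_job, lambda: None)])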

  5. Nexus: A modular workflow management system for quantum simulation codes

    NASA Astrophysics Data System (ADS)

    Krogel, Jaron T.

    2016-01-01

    The management of simulation workflows represents a significant task for the individual computational researcher. Automation of the required tasks involved in simulation work can decrease the overall time to solution and reduce sources of human error. A new simulation workflow management system, Nexus, is presented to address these issues. Nexus is capable of automated job management on workstations and resources at several major supercomputing centers. Its modular design allows many quantum simulation codes to be supported within the same framework. Current support includes quantum Monte Carlo calculations with QMCPACK, density functional theory calculations with Quantum Espresso or VASP, and quantum chemical calculations with GAMESS. Users can compose workflows through a transparent, text-based interface, resembling the input file of a typical simulation code. A usage example is provided to illustrate the process.

  6. Evaluating plant immunity using mass spectrometry-based metabolomics workflows

    PubMed Central

    Heuberger, Adam L.; Robison, Faith M.; Lyons, Sarah Marie A.; Broeckling, Corey D.; Prenni, Jessica E.

    2014-01-01

    Metabolic processes in plants are key components of physiological and biochemical disease resistance. Metabolomics, the analysis of a broad range of small molecule compounds in a biological system, has been used to provide a systems-wide overview of plant metabolism associated with defense responses. Plant immunity has been examined using multiple metabolomics workflows that vary in methods of detection, annotation, and interpretation, and the choice of workflow can significantly impact the conclusions inferred from a metabolomics investigation. The broad range of metabolites involved in plant defense often requires multiple chemical detection platforms and implementation of a non-targeted approach. A review of the current literature reveals a wide range of workflows that are currently used in plant metabolomics, and new methods for analyzing and reporting mass spectrometry (MS) data can improve the ability to translate investigative findings among different plant-pathogen systems. PMID:25009545

  7. Implementation of a health information exchange into community pharmacy workflow.

    PubMed

    Hohmeier, Kenneth C; Spivey, Christina A; Boldin, Samantha; Moore, Tara B; Chisholm-Burns, Marie

    To explore the feasibility and report preliminary outcomes of the integration of a health information exchange (HIE) into community pharmacy workflow and clinical service delivery. Independent pharmacy in eastern Tennessee. The pharmacy offers medication reconciliation services via HIE access, as well as other clinical pharmacy services. The average number of prescriptions filled weekly is 1900, and staffing included 3.5 full-time-equivalent (FTE) pharmacists and 7 FTE technicians. HIE integration within the workflow of the pharmacy was used to enhance existing patient care services, such as medication distribution, drug use review, medication therapy management, and immunizations, as well as to implement a novel transitional care service. A mixed-methods design was used to explore HIE workflow. Data collection included a pharmacist and pharmacy technician perceptions survey, mapping steps involved in HIE use in workflow via a think-aloud protocol, and quantitatively reporting the number and type of discordant medications found on medication reconciliation. In total, 25 patients qualified for the medication reconciliation intervention and data collection. All 25 patients (100%) had at least 1 discordant medication. HIE access was used for 60% of patients. Community pharmacists were confident in their abilities to perform medication reconciliation and were able to perform the medication reconciliation with the use of the HIE within their workflow, albeit with some reported barriers. The average time spent per patient for HIE-facilitated transitional care was 21 minutes. Integration and utilization of an HIE within the workflow for the purposes of patient care service delivery in the community pharmacy is feasible, but not without limitations. Such HIE utilization and extended access to the patient's clinical picture may represent a scalable method to enhance currently delivered pharmacist services. Copyright © 2017 American Pharmacists Association®.

  8. A workflow for the 3D visualization of meteorological data

    NASA Astrophysics Data System (ADS)

    Helbig, Carolin; Rink, Karsten

    2014-05-01

    In the future, climate change will strongly influence our environment and living conditions. To predict possible changes, climate models that include basic and process conditions have been developed and big data sets are produced as a result of simulations. The combination of various variables of climate models with spatial data from different sources helps to identify correlations and to study key processes. For our case study we use results of the weather research and forecasting (WRF) model of two regions at different scales that include various landscapes in Northern Central Europe and Baden-Württemberg. We visualize these simulation results in combination with observation data and geographic data, such as river networks, to evaluate processes and analyze if the model represents the atmospheric system sufficiently. For this purpose, a continuous workflow that leads from the integration of heterogeneous raw data to visualization using open source software (e.g. OpenGeoSys Data Explorer, ParaView) is developed. These visualizations can be displayed on a desktop computer or in an interactive virtual reality environment. We established a concept that includes recommended 3D representations and a color scheme for the variables of the data based on existing guidelines and established traditions in the specific domain. To examine changes over time in observation and simulation data, we added the temporal dimension to the visualization. In a first step of the analysis, the visualizations are used to get an overview of the data and detect areas of interest such as regions of convection or wind turbulences. Then, subsets of data sets are extracted and the included variables can be examined in detail. An evaluation by experts from the domains of visualization and atmospheric sciences establishes whether they are self-explanatory and clearly arranged. These easy-to-understand visualizations of complex data sets are the basis for scientific communication. In addition, they have
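
    The ParaView end of such a workflow is scriptable through its Python interface. The sketch below assumes ParaView's paraview.simple module with a suitable NetCDF reader available; the file and variable names are hypothetical.

        # Minimal pvpython sketch of the visualization end of such a workflow.
        # File and variable names are hypothetical; assumes a NetCDF reader.
        from paraview.simple import (OpenDataFile, Show, ColorBy, Render,
                                     SaveScreenshot, GetActiveViewOrCreate)

        wrf = OpenDataFile("wrf_output.nc")          # simulation results
        rivers = OpenDataFile("river_network.vtp")   # geographic context data

        view = GetActiveViewOrCreate("RenderView")
        wrf_display = Show(wrf, view)
        ColorBy(wrf_display, ("POINTS", "T2"))       # color by 2-m temperature field
        Show(rivers, view)

        Render()
        SaveScreenshot("overview.png", view)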

  9. Flexible End2End Workflow Automation of Hit-Discovery Research.

    PubMed

    Holzmüller-Laue, Silke; Göde, Bernd; Thurow, Kerstin

    2014-08-01

    The article considers a new approach to more complex laboratory automation at the workflow layer. The authors propose the automation of end2end workflows. The combination of all relevant subprocesses, whether automated or manually performed and regardless of the organizational unit in which they occur, results in end2end processes that include all result dependencies. The end2end approach focuses not only on the classical experiments in synthesis or screening but also on auxiliary processes such as the production and storage of chemicals, cell culturing, and maintenance as well as preparatory activities and analyses of experiments. Furthermore, the connection of control flow and data flow in the same process model reduces the effort of data transfer between the involved systems, including the necessary data transformations. This end2end laboratory automation can be realized effectively with the modern methods of business process management (BPM). This approach is based on the new standardization of the process-modeling notation Business Process Model and Notation 2.0. In drug discovery, several scientific disciplines act together with manifold modern methods, technologies, and a wide range of automated instruments for the discovery and design of target-based drugs. The article discusses the novel BPM-based automation concept with an implemented example of a high-throughput screening of previously synthesized compound libraries. © 2014 Society for Laboratory Automation and Screening.
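
    The central idea, carrying control flow and data flow in one process model, can be illustrated with a toy process in which each step declares its outputs into a shared context. The sketch below is a schematic stand-in, not the authors' BPMN implementation; the steps and data keys are hypothetical.

        # Tiny process model combining control flow (step order) and data flow
        # (declared inputs/outputs) in the spirit of the end2end approach.
        def synthesize(data):       return {"compound_ids": ["c1", "c2"]}
        def store(data):            return {"plate": "P-007", **data}
        def screen(data):           return {"hits": [c for c in data["compound_ids"] if c == "c2"]}

        process = [  # control flow: ordered steps; data flow: one shared context dict
            ("synthesis", synthesize),
            ("compound storage", store),
            ("high-throughput screening", screen),
        ]

        context = {}
        for name, step in process:
            context.update(step(context))   # outputs feed the next step's inputs
            print(f"{name}: {context}")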

  10. A data management and publication workflow for a large-scale, heterogeneous sensor network.

    PubMed

    Jones, Amber Spackman; Horsburgh, Jeffery S; Reeder, Stephanie L; Ramírez, Maurier; Caraballo, Juan

    2015-06-01

    It is common for hydrology researchers to collect data using in situ sensors at high frequencies, for extended durations, and with spatial distributions that produce data volumes requiring infrastructure for data storage, management, and sharing. The availability and utility of these data in addressing scientific questions related to water availability, water quality, and natural disasters relies on effective cyberinfrastructure that facilitates transformation of raw sensor data into usable data products. It also depends on the ability of researchers to share and access the data in useable formats. In this paper, we describe a data management and publication workflow and software tools for research groups and sites conducting long-term monitoring using in situ sensors. Functionality includes the ability to track monitoring equipment inventory and events related to field maintenance. Linking this information to the observational data is imperative in ensuring the quality of sensor-based data products. We present these tools in the context of a case study for the innovative Urban Transitions and Aridregion Hydrosustainability (iUTAH) sensor network. The iUTAH monitoring network includes sensors at aquatic and terrestrial sites for continuous monitoring of common meteorological variables, snow accumulation and melt, soil moisture, surface water flow, and surface water quality. We present the overall workflow we have developed for effectively transferring data from field monitoring sites to ultimate end-users and describe the software tools we have deployed for storing, managing, and sharing the sensor data. These tools are all open source and available for others to use.
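
    Turning raw sensor streams into usable data products hinges on automated quality control. The pandas sketch below flags out-of-range values and spikes; the thresholds and the example series are illustrative, not the iUTAH network's actual QC rules.

        # Sketch of a QC pass turning raw sensor readings into a flagged product.
        # Thresholds are illustrative, not the monitoring network's actual rules.
        import pandas as pd

        def qc_series(s, lo, hi, spike=5.0):
            flags = pd.Series("ok", index=s.index)
            flags[(s < lo) | (s > hi)] = "out_of_range"
            jump = s.diff().abs()
            flags[jump > spike] = "spike"               # step change beyond plausibility
            return flags

        raw = pd.Series([12.1, 12.3, 30.0, 12.4, -9999.0],
                        index=pd.date_range("2015-06-01", periods=5, freq="15min"))
        print(pd.DataFrame({"value": raw, "flag": qc_series(raw, lo=-5.0, hi=40.0)}))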

  11. Automation of Bioinformatics Workflows using CloVR, a Cloud Virtual Resource

    PubMed Central

    Vangala, Mahesh

    2013-01-01

    Exponential growth of biological data, mainly due to revolutionary developments in NGS technologies in past couple of years, created a multitude of challenges in downstream data analysis using bioinformatics approaches. To handle such tsunami of data, bioinformatics analysis must be carried out in an automated and parallel fashion. A successful analysis often requires more than a few computational steps and bootstrapping these individual steps (scripts) into components and the components into pipelines certainly makes bioinformatics a reproducible and manageable segment of scientific research. CloVR (http://clovr.org) is one such flexible framework that facilitates the abstraction of bioinformatics workflows into executable pipelines. CloVR comes packaged with various built-in bioinformatics pipelines that can make use of multicore processing power when run on servers and/or cloud. CloVR is amenable to build custom pipelines based on individual laboratory requirements. CloVR is available as a single executable virtual image file that comes bundled with pre-installed and pre-configured bioinformatics tools and packages and thus circumvents the cumbersome installation difficulties. CloVR is highly portable and can be run on traditional desktop/laptop computers, central servers and cloud compute farms. In conclusion, CloVR provides built-in automated analysis pipelines for microbial genomics with a scope to develop and integrate custom-workflows that make use of parallel processing power when run on compute clusters, there by addressing the bioinformatics challenges with NGS data.

  12. A comprehensive workflow of mass spectrometry-based untargeted metabolomics in cancer metabolic biomarker discovery using human plasma and urine.

    PubMed

    Zou, Wei; She, Jianwen; Tolstikov, Vladimir V

    2013-09-11

    Currently available biomarkers lack sensitivity and/or specificity for early detection of cancer. To address this challenge, a robust and complete workflow for metabolic profiling and data mining is described in detail. Three independent and complementary analytical techniques for metabolic profiling are applied: hydrophilic interaction liquid chromatography (HILIC-LC), reversed-phase liquid chromatography (RP-LC), and gas chromatography (GC). All three techniques are coupled to a mass spectrometer (MS) in the full scan acquisition mode, and both unsupervised and supervised methods are used for data mining. Univariate and multivariate feature selection are used to determine subsets of potentially discriminative predictors. These predictors are further identified by obtaining accurate masses and isotopic ratios using selected ion monitoring (SIM) and data-dependent MS/MS and/or accurate mass MSn ion tree scans utilizing high resolution MS. A list combining all of the identified potential biomarkers generated from different platforms and algorithms is used for pathway analysis. Such a workflow combining comprehensive metabolic profiling and advanced data mining techniques may provide a powerful approach for metabolic pathway analysis and biomarker discovery in cancer research. Two case studies with previously published data are adapted and included to elucidate the application of the workflow.
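
    The univariate feature-selection step can be illustrated with a per-feature Welch t-test followed by Benjamini-Hochberg correction, as sketched below on synthetic data; this shows the general approach, not the authors' exact pipeline.

        # Sketch of univariate feature selection: Welch t-test per metabolite
        # feature with Benjamini-Hochberg FDR control. Data shapes are illustrative.
        import numpy as np
        from scipy import stats
        from statsmodels.stats.multitest import multipletests

        rng = np.random.default_rng(0)
        cases = rng.normal(0.0, 1.0, size=(30, 200))     # 30 samples x 200 features
        controls = rng.normal(0.0, 1.0, size=(30, 200))
        cases[:, :5] += 1.5                              # spike in 5 true differences

        pvals = stats.ttest_ind(cases, controls, equal_var=False).pvalue
        reject, qvals, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")
        print("candidate biomarkers:", np.flatnonzero(reject))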

  13. A Comprehensive Workflow of Mass Spectrometry-Based Untargeted Metabolomics in Cancer Metabolic Biomarker Discovery Using Human Plasma and Urine

    PubMed Central

    Zou, Wei; She, Jianwen; Tolstikov, Vladimir V.

    2013-01-01

    Currently available biomarkers lack sensitivity and/or specificity for early detection of cancer. To address this challenge, a robust and complete workflow for metabolic profiling and data mining is described in detail. Three independent and complementary analytical techniques for metabolic profiling are applied: hydrophilic interaction liquid chromatography (HILIC–LC), reversed-phase liquid chromatography (RP–LC), and gas chromatography (GC). All three techniques are coupled to a mass spectrometer (MS) in the full scan acquisition mode, and both unsupervised and supervised methods are used for data mining. Univariate and multivariate feature selection are used to determine subsets of potentially discriminative predictors. These predictors are further identified by obtaining accurate masses and isotopic ratios using selected ion monitoring (SIM) and data-dependent MS/MS and/or accurate mass MSn ion tree scans utilizing high resolution MS. A list combining all of the identified potential biomarkers generated from different platforms and algorithms is used for pathway analysis. Such a workflow, combining comprehensive metabolic profiling and advanced data mining techniques, may provide a powerful approach for metabolic pathway analysis and biomarker discovery in cancer research. Two case studies with previously published data are adapted and included in the context to elucidate the application of the workflow. PMID:24958150

  14. Workflow for Genome-Wide Determination of Pre-mRNA Splicing Efficiency from Yeast RNA-seq Data

    PubMed Central

    Folk, Petr

    2016-01-01

    Pre-mRNA splicing represents an important regulatory layer of eukaryotic gene expression. In the simple budding yeast Saccharomyces cerevisiae, about one-third of all mRNA molecules undergo splicing, and splicing efficiency is tightly regulated, for example, during meiotic differentiation. S. cerevisiae features a streamlined, evolutionarily highly conserved splicing machinery and serves as a favourite model for studies of various aspects of splicing. RNA-seq represents a robust, versatile, and affordable technique for transcriptome interrogation, which can also be used to study splicing efficiency. However, convenient bioinformatics tools for the analysis of splicing efficiency from yeast RNA-seq data are lacking. We present a complete workflow for the calculation of genome-wide splicing efficiency in S. cerevisiae using strand-specific RNA-seq data. Our pipeline takes sequencing reads in the FASTQ format and provides splicing efficiency values for the 5′ and 3′ splice junctions of each intron. The pipeline is based on up-to-date open-source software tools and requires very limited input from the user. We provide all relevant scripts in a ready-to-use form. We demonstrate the functionality of the workflow using RNA-seq datasets from three spliceosome mutants. The workflow should prove useful for studies of yeast splicing mutants or of regulated splicing, for example, under specific growth conditions. PMID:28050562
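
    The central quantity in such a pipeline can be stated compactly: at each splice junction, efficiency is the fraction of junction-informative reads supporting the spliced form. A minimal sketch with hypothetical counts follows; the published pipeline derives counts from strand-specific alignments, and its exact formula may differ.

      def splicing_efficiency(spliced_reads: int, unspliced_reads: int) -> float:
          """Fraction of junction-informative reads supporting the spliced form:
          exon-exon junction reads vs. exon-intron boundary reads."""
          total = spliced_reads + unspliced_reads
          if total == 0:
              raise ValueError("no informative reads at this junction")
          return spliced_reads / total

      # 5' and 3' splice sites of an intron are scored separately in the workflow
      print(splicing_efficiency(spliced_reads=85, unspliced_reads=15))  # 0.85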

  15. Workflow for Genome-Wide Determination of Pre-mRNA Splicing Efficiency from Yeast RNA-seq Data.

    PubMed

    Převorovský, Martin; Hálová, Martina; Abrhámová, Kateřina; Libus, Jiří; Folk, Petr

    2016-01-01

    Pre-mRNA splicing represents an important regulatory layer of eukaryotic gene expression. In the simple budding yeast Saccharomyces cerevisiae, about one-third of all mRNA molecules undergo splicing, and splicing efficiency is tightly regulated, for example, during meiotic differentiation. S. cerevisiae features a streamlined, evolutionarily highly conserved splicing machinery and serves as a favourite model for studies of various aspects of splicing. RNA-seq represents a robust, versatile, and affordable technique for transcriptome interrogation, which can also be used to study splicing efficiency. However, convenient bioinformatics tools for the analysis of splicing efficiency from yeast RNA-seq data are lacking. We present a complete workflow for the calculation of genome-wide splicing efficiency in S. cerevisiae using strand-specific RNA-seq data. Our pipeline takes sequencing reads in the FASTQ format and provides splicing efficiency values for the 5' and 3' splice junctions of each intron. The pipeline is based on up-to-date open-source software tools and requires very limited input from the user. We provide all relevant scripts in a ready-to-use form. We demonstrate the functionality of the workflow using RNA-seq datasets from three spliceosome mutants. The workflow should prove useful for studies of yeast splicing mutants or of regulated splicing, for example, under specific growth conditions.

  16. Data processing workflows from low-cost digital survey to various applications: three case studies of Chinese historic architecture

    NASA Astrophysics Data System (ADS)

    Sun, Z.; Cao, Y. K.

    2015-08-01

    The paper focuses on the versatility of data processing workflows ranging from BIM-based survey to structural analysis and reverse modeling. In China today, a large number of historic buildings are in need of restoration, reinforcement and renovation, but architects are not prepared for the shift from the booming AEC industry to architectural preservation. As surveyors working with architects in such projects, we have to develop an efficient, low-cost digital survey workflow that is robust to various types of architecture, and to process the captured data for architects. Although laser scanning yields high accuracy in architectural heritage documentation and the workflow is quite straightforward, its cost and portability hinder it from being used in projects where budget and efficiency are of prime concern. We therefore integrate Structure from Motion techniques with UAV and total station in data acquisition. The captured data is processed for various purposes, illustrated with three case studies: the first is an as-built BIM for a historic building based on point clouds registered to Ground Control Points; the second concerns structural analysis of a damaged bridge using Finite Element Analysis software; the last relates to parametric automated feature extraction from captured point clouds for reverse modeling and fabrication.

  17. Flexible Early Warning Systems with Workflows and Decision Tables

    NASA Astrophysics Data System (ADS)

    Riedel, F.; Chaves, F.; Zeiner, H.

    2012-04-01

    An essential part of early warning systems and systems for crisis management is decision support systems that facilitate communication and collaboration. Often, official policies specify how different organizations collaborate and what information is communicated to whom. For early warning systems it is crucial that information is exchanged dynamically in a timely manner and that all participants get exactly the information they need to fulfil their role in the crisis management process. Information technology obviously lends itself to automating parts of the process. We have found, however, that in current operational systems the information logistics processes are hard-coded, even though they are subject to change. In addition, systems are tailored to the policies and requirements of a certain organization, and changes can require major software refactoring. We seek to develop a system that can be deployed and adapted to multiple organizations with different dynamic runtime policies. A major requirement for such a system is that changes can be applied locally without affecting larger parts of the system. In addition to flexibility regarding changes in policies and processes, the system needs to be able to evolve; when new information sources become available, it should be possible to integrate and use these in the decision process. In general, this kind of flexibility comes with a significant increase in complexity, which implies that only IT professionals can maintain a system that can be reconfigured and adapted; end-users are unable to utilise the provided flexibility. In the business world similar problems arise, and previous work suggested using business process management systems (BPMS) or workflow management systems (WfMS) to guide and automate early warning processes or crisis management plans. However, the usability and flexibility of current WfMS are limited, because current notations and user interfaces are still not suitable for end-users, and workflows
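
    The decision-table idea itself is easy to make concrete: rules are data, so they can be edited without touching the engine code. A minimal Python sketch, with invented hazards, thresholds, and actions that are not taken from the paper:

      # Each rule pairs a condition on the incoming event with an action.
      # Rules are ordered; the first match wins, and the last rule is a default.
      DECISION_TABLE = [
          (lambda e: e["hazard"] == "flood" and e["level"] >= 3, "notify_civil_protection"),
          (lambda e: e["hazard"] == "flood" and e["level"] >= 1, "notify_local_authority"),
          (lambda e: True, "log_only"),
      ]

      def decide(event: dict) -> str:
          """Return the action of the first rule whose condition matches."""
          for condition, action in DECISION_TABLE:
              if condition(event):
                  return action
          raise RuntimeError("decision table has no default rule")

      print(decide({"hazard": "flood", "level": 2}))  # notify_local_authority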

  18. Optimization of tomographic reconstruction workflows on geographically distributed resources

    SciTech Connect

    Bicer, Tekin; Gursoy, Doga; Kettimuthu, Rajkumar; De Carlo, Francesco; Foster, Ian T.

    2016-01-01

    New technological advancements in synchrotron light sources enable data acquisitions at unprecedented levels. This emergent trend affects not only the size of the generated data but also the need for larger computational resources. Although beamline scientists and users have access to local computational resources, these are typically limited and can result in extended execution times. Applications that are based on iterative processing, as in tomographic reconstruction methods, require high-performance compute clusters for timely analysis of data. The focus here is on time-sensitive analysis and processing of Advanced Photon Source data on geographically distributed resources. Two main challenges are considered: (i) modeling of the performance of tomographic reconstruction workflows and (ii) transparent execution of these workflows on distributed resources. For the former, three main stages are considered: (i) data transfer between storage and computational resources, (ii) wait/queue time of reconstruction jobs at compute resources, and (iii) computation of reconstruction tasks. These performance models allow evaluation and estimation of the execution time of any given iterative tomographic reconstruction workflow that runs on geographically distributed resources. For the latter challenge, a workflow management system is built, which can automate the execution of workflows and minimize the user interaction with the underlying infrastructure. The system utilizes Globus to perform secure and efficient data transfer operations. The proposed models and the workflow management system are evaluated by using three high-performance computing and two storage resources, all of which are geographically distributed. Workflows were created with different computational requirements using two compute-intensive tomographic reconstruction algorithms. Experimental evaluation shows that the proposed models and system can be used for selecting the optimum
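
    The three-stage model suggests a simple additive estimate of wall-clock time. A hedged sketch, with placeholder throughput and queue figures rather than measurements from the paper:

      import math

      def estimate_execution_time(data_gb, transfer_gbps, queue_wait_s,
                                  n_tasks, task_time_s, n_cores):
          """Estimated wall-clock seconds for one reconstruction workflow run."""
          transfer = data_gb * 8 / transfer_gbps                 # (i) storage -> compute
          queue = queue_wait_s                                   # (ii) scheduler wait
          compute = math.ceil(n_tasks / n_cores) * task_time_s   # (iii) reconstruction
          return transfer + queue + compute

      # 500 GB over a 10 Gb/s link, 2 min queue, 2048 slices on 512 cores
      print(estimate_execution_time(500, 10, 120, 2048, 30, 512))  # 640.0 s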

  19. Contextual cloud-based service oriented architecture for clinical workflow.

    PubMed

    Moreno-Conde, Jesús; Moreno-Conde, Alberto; Núñez-Benjumea, Francisco J; Parra-Calderón, Carlos

    2015-01-01

    Multiple papers have highlighted that acceptance of systems within the healthcare domain depends on their integration with the clinical workflow. This paper analyses how clinical context management could be deployed in order to promote the adoption of advanced cloud services within the clinical workflow. This deployment can be integrated with the specifications promoted by the eHealth European Interoperability Framework. The paper proposes a cloud-based service-oriented architecture that implements a context management system aligned with the HL7 standard known as CCOW.

  20. Linking Geobiology Fieldwork and Data Curation Through Workflow Documentation

    NASA Astrophysics Data System (ADS)

    Thomer, A.; Baker, K. S.; Jett, J. G.; Gordon, S.; Palmer, C. L.

    2014-12-01

    Describing the specific processes and artifacts that lead to the creation of data products provides a detailed picture of data provenance in the form of a high-level workflow. The resulting diagram identifies: (1) "points of intervention" at which curation processes can be moved upstream, and (2) data products that may be important for sharing and preservation. The Site-Based Data Curation project, an Institute of Museum and Library Services-funded project hosted by the Center for Informatics Research in Science and Scholarship at the University of Illinois, previously inferred a geobiologist's planning, field and laboratory workflows through close study of the data products produced during a single field trip to Yellowstone National Park (Wickett et al., 2013). We have since built on this work by documenting post hoc curation processes, and integrating them with the existing workflow. By holistically considering both data collection and curation, we are able to identify concrete steps that scientists can take to begin curating data in the field. This field-to-repository workflow represents a first step toward a more comprehensive and nuanced model of the research data lifecycle. Using our initial three-phase workflow, we identify key data products to prioritize for curation, and the points at which data curation best practices integrate with research processes with minimal interruption. We then document the processes that make key data products sharable and ready for preservation. We append the resulting curatorial phases to the field data collection workflow: Data Staging, Data Standardizing and Data Packaging. These refinements demonstrate: (1) the interdependence of research and curatorial phases; (2) the links between specific research products, research phases and curatorial processes; (3) the interdependence of laboratory-specific standards and community-wide best practices. We propose a poster that shows the six-phase workflow described above. We plan to discuss

  1. Workflow modeling in the graphic arts and printing industry

    NASA Astrophysics Data System (ADS)

    Tuijn, Chris

    2003-12-01

    Over the last few years, a lot of effort has been spent on the standardization of the workflow in the graphic arts and printing industry. The main reasons for this standardization are two-fold: first of all, the need to represent all aspects of products, processes and resources in a uniform, digital framework and, secondly, the need to have different systems communicate with each other without having to implement dedicated drivers or protocols. For many years, a number of organizations in the IT sector have been busy developing models and languages on the topic of workflow modeling. In addition to the more formal methods (such as, e.g., extended finite state machines, Petri Nets, Markov Chains etc.) introduced a number of decades ago, more pragmatic methods have been proposed quite recently. We think in particular of the activities of the Workflow Management Coalition that resulted in an XML-based Process Definition Language. Although one might be tempted to use the already established standards in the graphic environment, one should be well aware of the complexity and uniqueness of the graphic arts workflow. In this paper, we will show that it is quite hard, though not impossible, to model the graphic arts workflow using the already established workflow systems. After a brief summary of the graphic arts workflow requirements, we will show why the traditional models are less suitable to use. It will turn out that one of the main reasons for the incompatibility is that the graphic arts workflow is primarily resource-driven; this means that the activation of processes depends on the status of different incoming resources. The fact that processes can start running with partial availability of the input resources is a further complication that asks for additional knowledge at the process level. In the second part of this paper, we will discuss in more detail the different software components that are available in any graphic enterprise. In the last part, we will

  2. Refactoring Problem of Acyclic Extended Free-Choice Workflow Nets to Acyclic Well-Structured Workflow Nets

    NASA Astrophysics Data System (ADS)

    Yamaguchi, Shingo

    A workflow net (WF-net for short) is a Petri net which represents a workflow. There are two important subclasses of WF-nets: extended free-choice (EFC for short) and well-structured (WS for short). It is known that most actual workflows can be modeled as EFC WF-nets. Acyclic WS is a subclass of acyclic EFC but has more analysis methods. An acyclic EFC WF-net may be transformed to an acyclic WS WF-net without changing the external behavior of the net. We name such a transformation Acyclic EFC WF-net refactoring. We give a formal definition of the acyclic EFC WF-net refactoring problem. We also give a necessary condition and a sufficient condition for solving the problem. Those conditions can be checked in polynomial time. These results enhance the analysis power of acyclic EFC WF-nets.
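
    The extended free-choice condition itself is simple to check directly: any two transitions whose input places overlap must have identical input-place sets. A small Python sketch over an invented net (not an example from the paper):

      def is_extended_free_choice(preset: dict) -> bool:
          """preset maps each transition to the set of its input places.
          EFC requires: overlapping presets must be equal."""
          transitions = list(preset)
          for i, t1 in enumerate(transitions):
              for t2 in transitions[i + 1:]:
                  if preset[t1] & preset[t2] and preset[t1] != preset[t2]:
                      return False
          return True

      # t1 and t2 share place p1 but have different presets: not EFC
      print(is_extended_free_choice({"t1": {"p1"}, "t2": {"p1", "p2"}}))        # False
      print(is_extended_free_choice({"t1": {"p1", "p2"}, "t2": {"p1", "p2"}}))  # True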

  3. Assembling Large, Multi-Sensor Climate Datasets Using the SciFlo Grid Workflow System

    NASA Astrophysics Data System (ADS)

    Wilson, B. D.; Manipon, G.; Xing, Z.; Fetzer, E.

    2008-12-01

    NASA's Earth Observing System (EOS) is the world's most ambitious facility for studying global climate change. The mandate now is to combine measurements from the instruments on the A-Train platforms (AIRS, AMSR-E, MODIS, MISR, MLS, and CloudSat) and other Earth probes to enable large-scale studies of climate change over periods of years to decades. However, moving from predominantly single-instrument studies to a multi-sensor, measurement-based model for long-duration analysis of important climate variables presents serious challenges for large-scale data mining and data fusion. For example, one might want to compare temperature and water vapor retrievals from one instrument (AIRS) to another instrument (MODIS), and to a model (ECMWF), stratify the comparisons using a classification of the cloud scenes from CloudSat, and repeat the entire analysis over years of AIRS data. To perform such an analysis, one must discover & access multiple datasets from remote sites, find the space/time matchups between instrument swaths and model grids, understand the quality flags and uncertainties for retrieved physical variables, and assemble merged datasets for further scientific and statistical analysis. To meet these large-scale challenges, we are utilizing a Grid computing and dataflow framework, named SciFlo, in which we are deploying a set of versatile and reusable operators for data query, access, subsetting, co-registration, mining, fusion, and advanced statistical analysis. SciFlo is a semantically-enabled ("smart") Grid Workflow system that ties together a peer-to-peer network of computers into an efficient engine for distributed computation. The SciFlo workflow engine enables scientists to do multi-instrument Earth Science by assembling remotely-invokable Web Services (SOAP or http GET URLs), native executables, command-line scripts, and Python codes into a distributed computing flow. A scientist visually authors the graph of operation in the VizFlow GUI, or uses a
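
    The space/time matchup step can be sketched in a few lines: snap each retrieval to the nearest grid cell and time slot, then accept the pair only within distance and time tolerances. Coordinates, tolerances, and the coarse flat-earth distance approximation below are illustrative only, not SciFlo's implementation.

      import numpy as np

      def matchup(obs_lat, obs_lon, obs_time,
                  grid_lats, grid_lons, grid_times,
                  max_km=50.0, max_hours=1.0):
          """Return (lat_idx, lon_idx, time_idx) of the matching grid cell, or None."""
          i = int(np.argmin(np.abs(grid_lats - obs_lat)))
          j = int(np.argmin(np.abs(grid_lons - obs_lon)))
          k = int(np.argmin(np.abs(grid_times - obs_time)))
          km_per_deg = 111.0  # coarse approximation near the equator
          dist_km = km_per_deg * np.hypot(grid_lats[i] - obs_lat, grid_lons[j] - obs_lon)
          if dist_km <= max_km and abs(grid_times[k] - obs_time) <= max_hours:
              return i, j, k
          return None

      # 1-degree model grid, 3-hourly time slots (hours of day)
      grid = (np.arange(-90, 91, 1.0), np.arange(-180, 181, 1.0), np.arange(0, 24, 3.0))
      print(matchup(34.2, -118.4, 10.0, *grid))  # (124, 62, 3)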

  4. Assembling Large, Multi-Sensor Climate Datasets Using the SciFlo Grid Workflow System

    NASA Astrophysics Data System (ADS)

    Wilson, B.; Manipon, G.; Xing, Z.; Fetzer, E.

    2009-04-01

    NASA's Earth Observing System (EOS) is an ambitious facility for studying global climate change. The mandate now is to combine measurements from the instruments on the "A-Train" platforms (AIRS, AMSR-E, MODIS, MISR, MLS, and CloudSat) and other Earth probes to enable large-scale studies of climate change over periods of years to decades. However, moving from predominantly single-instrument studies to a multi-sensor, measurement-based model for long-duration analysis of important climate variables presents serious challenges for large-scale data mining and data fusion. For example, one might want to compare temperature and water vapor retrievals from one instrument (AIRS) to another instrument (MODIS), and to a model (ECMWF), stratify the comparisons using a classification of the "cloud scenes" from CloudSat, and repeat the entire analysis over years of AIRS data. To perform such an analysis, one must discover & access multiple datasets from remote sites, find the space/time "matchups" between instrument swaths and model grids, understand the quality flags and uncertainties for retrieved physical variables, assemble merged datasets, and compute fused products for further scientific and statistical analysis. To meet these large-scale challenges, we are utilizing a Grid computing and dataflow framework, named SciFlo, in which we are deploying a set of versatile and reusable operators for data query, access, subsetting, co-registration, mining, fusion, and advanced statistical analysis. SciFlo is a semantically-enabled ("smart") Grid Workflow system that ties together a peer-to-peer network of computers into an efficient engine for distributed computation. The SciFlo workflow engine enables scientists to do multi-instrument Earth Science by assembling remotely-invokable Web Services (SOAP or http GET URLs), native executables, command-line scripts, and Python codes into a distributed computing flow. A scientist visually authors the graph of operation in the Viz

  5. Synthesis of robust controllers

    NASA Technical Reports Server (NTRS)

    Marrison, Chris

    1993-01-01

    At the 1990 American Controls Conference a benchmark problem was issued as a challenge for designing robust compensators. Many compensators were presented in response to the problem. In previous work Stochastic Robustness Analysis (SRA) was used to compare these compensators. In this work SRA metrics are used as guides to synthesize robust compensators, using the benchmark problem as an example.

  6. Understanding latent structures of clinical information logistics: A bottom-up approach for model building and validating the workflow composite score.

    PubMed

    Esdar, Moritz; Hübner, Ursula; Liebe, Jan-David; Hüsers, Jens; Thye, Johannes

    2017-01-01

    Clinical information logistics is a construct that aims to describe and explain various phenomena of information provision to drive clinical processes. It can be measured by the workflow composite score, an aggregated indicator of the degree of IT support in clinical processes. This study primarily aimed to investigate the yet unknown empirical patterns constituting this construct. The second goal was to derive a data-driven weighting scheme for the constituents of the workflow composite score and to contrast this scheme with a literature based, top-down procedure. This approach should finally test the validity and robustness of the workflow composite score. Based on secondary data from 183 German hospitals, a tiered factor analytic approach (confirmatory and subsequent exploratory factor analysis) was pursued. A weighting scheme, which was based on factor loadings obtained in the analyses, was put into practice. We were able to identify five statistically significant factors of clinical information logistics that accounted for 63% of the overall variance. These factors were "flow of data and information", "mobility", "clinical decision support and patient safety", "electronic patient record" and "integration and distribution". The system of weights derived from the factor loadings resulted in values for the workflow composite score that differed only slightly from the score values that had been previously published based on a top-down approach. Our findings give insight into the internal composition of clinical information logistics both in terms of factors and weights. They also allowed us to propose a coherent model of clinical information logistics from a technical perspective that joins empirical findings with theoretical knowledge. Despite the new scheme of weights applied to the calculation of the workflow composite score, the score behaved robustly, which is yet another hint of its validity and therefore its usefulness. Copyright © 2016 Elsevier Ireland
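
    The loading-based weighting scheme reduces to a weighted mean of indicator values. A minimal Python sketch with invented indicators and loadings; the study's actual factors, indicators, and weights differ.

      import numpy as np

      # Hypothetical per-hospital indicator values (0..1) and factor loadings
      indicators = {"flow_of_data": 0.8, "mobility": 0.4,
                    "decision_support": 0.6, "epr": 0.9, "integration": 0.7}
      loadings = {"flow_of_data": 0.82, "mobility": 0.55,
                  "decision_support": 0.74, "epr": 0.88, "integration": 0.69}

      def composite_score(values: dict, weights: dict) -> float:
          """Loading-weighted mean of indicator values, normalized to [0, 1]."""
          keys = list(values)
          w = np.array([weights[k] for k in keys])
          x = np.array([values[k] for k in keys])
          return float(np.dot(w, x) / w.sum())

      print(round(composite_score(indicators, loadings), 3))  # 0.705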

  7. PREDON Scientific Data Preservation 2014

    NASA Astrophysics Data System (ADS)

    Diaconu, C.; Kraml, S.; Surace, C.; Chateigner, D.; Libourel, T.; Laurent, A.; Lin, Y.; Schaming, M.; Benbernou, S.; Lebbah, M.; Boucon, D.; Cérin, C.; Azzag, H.; Mouron, P.; Nief, J.-Y.; Coutin, S.; Beckmann, V.

    Scientific data collected with modern sensors or dedicated detectors very often exceed the perimeter of the initial scientific design. These data are obtained more and more frequently with large material and human effort. A large class of scientific experiments is in fact unique because of its scale, with very little chance of being repeated or superseded by new experiments in the same domain: for instance, high energy physics and astrophysics experiments involve multi-annual developments, and a simple duplication of effort in order to reproduce old data is simply not affordable. Other scientific experiments are unique by nature (earth science, medical sciences, etc.), since the collected data are "time-stamped" and thereby non-reproducible by new experiments or observations. In addition, scientific data collection has increased dramatically in recent years, contributing to the so-called "data deluge" and inviting common reflection in the context of "big data" investigations. The new knowledge obtained using these data should be preserved long term, such that access and re-use are made possible and lead to an enhancement of the initial investment. Data observatories, based on open access policies and coupled with multi-disciplinary techniques for indexing and mining, may lead to truly new paradigms in science. It is therefore of utmost importance to pursue a coherent and vigorous approach to preserving scientific data for the long term. The preservation nevertheless remains a challenge due to the complexity of the data structure, the fragility of custom-made software environments, and the lack of rigorous approaches to workflows and algorithms. To address this challenge, the PREDON project was initiated in France in 2012 within the MASTODONS program: a Big Data scientific challenge, initiated and supported by the Interdisciplinary Mission of the National Centre for Scientific Research (CNRS). PREDON is a study group formed by

  8. Standardized workflows for increasing efficiency and productivity in discovery stage bioanalysis.

    PubMed

    Bateman, Kevin P; Cohen, Lucinda; Emary, Bart; Pucci, Vincenzo

    2013-07-01

    Merck consolidated discovery stage bioanalytical functions into the Department of Pharmacokinetics, Pharmacodynamics & Drug Metabolism in 2007. Since then procedures and equipment used to provide important quantitative data to project teams have been harmonized and in many cases standardized. This approach has enabled movement of work across the network of laboratories and has resulted in a lean, flexible and efficient organization. The overall goal was to reduce time and resources spent on routine activities while creating time to perform research in new areas and technologies to support future scientific needs. The current state of discovery bioanalysis at Merck is discussed, including hardware and software platforms, workflow procedures and performance metrics. Examples of improved processes will be discussed for compound tuning, LC method development, analytical acceptance criteria, automated sample preparation, sample analysis platforms, data processing and data reporting.

  9. Metadata Management on the SCEC PetaSHA Project: Helping Users Describe, Discover, Understand, and Use Simulation Data in a Large-Scale Scientific Collaboration

    NASA Astrophysics Data System (ADS)

    Okaya, D.; Deelman, E.; Maechling, P.; Wong-Barnum, M.; Jordan, T. H.; Meyers, D.

    2007-12-01

    Large scientific collaborations, such as the SCEC Petascale Cyberfacility for Physics-based Seismic Hazard Analysis (PetaSHA) Project, involve interactions between many scientists who exchange ideas and research results. These groups must organize, manage, and make accessible their community materials of observational data, derivative (research) results, computational products, and community software. The integration of scientific workflows as a paradigm to solve complex computations provides advantages of efficiency, reliability, repeatability, choices, and ease of use. The underlying resource needed for a scientific workflow to function and create discoverable and exchangeable products is the construction, tracking, and preservation of metadata. In the scientific workflow environment there is a two-tier structure of metadata. Workflow-level metadata and provenance describe operational steps, identity of resources, execution status, and product locations and names. Domain-level metadata essentially define the scientific meaning of data, codes and products. To a large degree the metadata at these two levels are separate. However, between these two levels is a subset of metadata produced at one level but is needed by the other. This crossover metadata suggests that some commonality in metadata handling is needed. SCEC researchers are collaborating with computer scientists at SDSC, the USC Information Sciences Institute, and Carnegie Mellon Univ. in order to perform earthquake science using high-performance computational resources. A primary objective of the "PetaSHA" collaboration is to perform physics-based estimations of strong ground motion associated with real and hypothetical earthquakes located within Southern California. Construction of 3D earth models, earthquake representations, and numerical simulation of seismic waves are key components of these estimations. Scientific workflows are used to orchestrate the sequences of scientific tasks and to access

  10. A query suggestion workflow for life science IR-systems.

    PubMed

    Esch, Maria; Chen, Jinbo; Weise, Stephan; Hassani-Pak, Keywan; Scholz, Uwe; Lange, Matthias

    2014-06-13

    Information Retrieval (IR) plays a central role in the exploration and interpretation of integrated biological datasets that represent the heterogeneous ecosystem of the life sciences. Here, keyword-based query systems are popular user interfaces. In turn, to a large extent, the query phrases used determine the quality of the search results and the effort a scientist has to invest in query refinement. In this context, computer-aided query expansion and suggestion is one of the most challenging tasks for life science information systems. Existing query front-ends support aspects like spelling correction, query refinement or query expansion. However, the majority of front-ends make only limited use of enhanced IR algorithms to implement comprehensive, computer-aided query refinement workflows. In this work, we present the design of a multi-stage query suggestion workflow and its implementation in the life science IR system LAILAPS. The presented workflow includes enhanced tokenisation, word breaking, spelling correction, query expansion and query suggestion ranking. A spelling correction benchmark with 5,401 queries and manually selected use cases for query expansion demonstrate the performance of the implemented workflow and its advantages over state-of-the-art systems.
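
    The spelling-correction stage of such a workflow can be approximated with a similarity search over a domain vocabulary. A hedged sketch using Python's standard library; the vocabulary is a placeholder for a life-science term index, and LAILAPS' actual ranking is more elaborate.

      from difflib import get_close_matches

      # Placeholder vocabulary; a real system would use its indexed term dictionary
      VOCABULARY = ["arabidopsis", "thaliana", "phenotype", "genotype",
                    "chromosome", "transcription"]

      def suggest(token, n=3):
          """Rank candidate corrections by string similarity to the typed token."""
          return get_close_matches(token.lower(), VOCABULARY, n=n, cutoff=0.6)

      print(suggest("arabidposis"))  # ['arabidopsis']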

  11. Images crossing borders: image and workflow sharing on multiple levels.

    PubMed

    Ross, Peeter; Pohjonen, Hanna

    2011-04-01

    Digitalisation of medical data makes it possible to share images and workflows between related parties. In addition to linear data flow where healthcare professionals or patients are the information carriers, a new type of matrix of many-to-many connections is emerging. Implementation of shared workflow brings challenges of interoperability and legal clarity. Sharing images or workflows can be implemented on different levels with different challenges: inside the organisation, between organisations, across country borders, or between healthcare institutions and citizens. Interoperability issues vary according to the level of sharing and are either technical or semantic, including language. Legal uncertainty increases when crossing national borders. Teleradiology is regulated by multiple European Union (EU) directives and legal documents, which makes interpretation of the legal system complex. To achieve wider use of eHealth and teleradiology several strategic documents were published recently by the EU. Despite EU activities, responsibility for organising, providing and funding healthcare systems remains with the Member States. Therefore, the implementation of new solutions requires strong co-operation between radiologists, societies of radiology, healthcare administrators, politicians and relevant EU authorities. The aim of this article is to describe different dimensions of image and workflow sharing and to analyse legal acts concerning teleradiology in the EU.

  12. Content and Workflow Management for Library Websites: Case Studies

    ERIC Educational Resources Information Center

    Yu, Holly, Ed.

    2005-01-01

    Using database-driven web pages or web content management (WCM) systems to manage increasingly diverse web content and to streamline workflows is a commonly practiced solution recognized in libraries today. However, limited library web content management models and funding constraints prevent many libraries from purchasing commercially available…

  13. A standard-enabled workflow for synthetic biology.

    PubMed

    Myers, Chris J; Beal, Jacob; Gorochowski, Thomas E; Kuwahara, Hiroyuki; Madsen, Curtis; McLaughlin, James Alastair; Mısırlı, Göksel; Nguyen, Tramy; Oberortner, Ernst; Samineni, Meher; Wipat, Anil; Zhang, Michael; Zundel, Zach

    2017-06-15

    A synthetic biology workflow is composed of data repositories that provide information about genetic parts, sequence-level design tools to compose these parts into circuits, visualization tools to depict these designs, genetic design tools to select parts to create systems, and modeling and simulation tools to evaluate alternative design choices. Data standards enable the ready exchange of information within such a workflow, allowing repositories and tools from a diversity of sources to be connected. The present paper describes one such workflow that utilizes, among others, the Synthetic Biology Open Language (SBOL) to describe genetic designs, the Systems Biology Markup Language to model these designs, and SBOL Visual to visualize these designs. We describe how a standard-enabled workflow can be used to produce multiple types of design information, with repositories and software tools exchanging information using a variety of data standards. Recently, the ACS Synthetic Biology journal has recommended the use of SBOL in its publications. © 2017 The Author(s); published by Portland Press Limited on behalf of the Biochemical Society.

  14. Flight and Operational Medicine Clinic (FOMC) Workflow Analysis

    DTIC Science & Technology

    2014-03-14

    Fly Preventive Health Assessment (Fly PHA)...vulnerability to illness, improving injury prevention, and improving return-to-duty after injury. The improved EHR and workflow allows for better case...Class (IFC), Fly Preventive Health Assessment (Fly-PHA), Aeromedical Waiver Profile, Duty Limiting Restrictions, Occupational Health Medical

  15. Server-side workflow execution using data grid technology for reproducible analyses of data-intensive hydrologic systems

    NASA Astrophysics Data System (ADS)

    Essawy, Bakinam T.; Goodall, Jonathan L.; Xu, Hao; Rajasekar, Arcot; Myers, James D.; Kugler, Tracy A.; Billah, Mirza M.; Whitton, Mary C.; Moore, Reagan W.

    2016-04-01

    Many geoscience disciplines utilize complex computational models for advancing understanding and sustainable management of Earth systems. Executing such models and their associated data preprocessing and postprocessing routines can be challenging for a number of reasons including (1) accessing and preprocessing the large volume and variety of data required by the model, (2) postprocessing large data collections generated by the model, and (3) orchestrating data processing tools, each with unique software dependencies, into workflows that can be easily reproduced and reused. To address these challenges, the work reported in this paper leverages the Workflow Structured Object functionality of the Integrated Rule-Oriented Data System and demonstrates how it can be used to access distributed data, encapsulate hydrologic data processing as workflows, and federate with other community-driven cyberinfrastructure systems. The approach is demonstrated for a study investigating the impact of drought on populations in the Carolinas region of the United States. The analysis leverages computational modeling along with data from the Terra Populus project and data management and publication services provided by the Sustainable Environment-Actionable Data project. The work is part of a larger effort under the DataNet Federation Consortium project that aims to demonstrate data and computational interoperability across cyberinfrastructure developed independently by scientific communities.

  16. Replication and Robustness in Developmental Research

    ERIC Educational Resources Information Center

    Duncan, Greg J.; Engel, Mimi; Claessens, Amy; Dowsett, Chantelle J.

    2014-01-01

    Replications and robustness checks are key elements of the scientific method and a staple in many disciplines. However, leading journals in developmental psychology rarely include explicit replications of prior research conducted by different investigators, and few require authors to establish in their articles or online appendices that their key…

  17. Dynameomics: design of a computational lab workflow and scientific data repository for protein simulations.

    PubMed

    Simms, Andrew M; Toofanny, Rudesh D; Kehl, Catherine; Benson, Noah C; Daggett, Valerie

    2008-06-01

    Dynameomics is a project to investigate and catalog the native-state dynamics and thermal unfolding pathways of representatives of all protein folds using solvated molecular dynamics simulations, as described in the preceding paper. Here we introduce the design of the molecular dynamics data warehouse, a scalable, reliable repository that houses simulation data that vastly simplifies management and access. In the succeeding paper, we describe the development of a complementary multidimensional database. A single protein unfolding or native-state simulation can take weeks to months to complete, and produces gigabytes of coordinate and analysis data. Mining information from over 3000 completed simulations is complicated and time-consuming. Even the simplest queries involve writing intricate programs that must be built from low-level file system access primitives and include significant logic to correctly locate and parse data of interest. As a result, programs to answer questions that require data from hundreds of simulations are very difficult to write. Thus, organization and access to simulation data have been major obstacles to the discovery of new knowledge in the Dynameomics project. This repository is used internally and is the foundation of the Dynameomics portal site http://www.dynameomics.org. By organizing simulation data into a scalable, manageable and accessible form, we can begin to address substantial questions that move us closer to solving biomedical and bioengineering problems.

  18. Emergency Medicine Resident Physicians’ Perceptions of Electronic Documentation and Workflow

    PubMed Central

    Neri, P.M.; Redden, L.; Poole, S.; Pozner, C.N.; Horsky, J.; Raja, A.S.; Poon, E.; Schiff, G.

    2015-01-01

    Summary Objective To understand emergency department (ED) physicians’ use of electronic documentation in order to identify usability and workflow considerations for the design of future ED information system (EDIS) physician documentation modules. Methods We invited emergency medicine resident physicians to participate in a mixed methods study using task analysis and qualitative interviews. Participants completed a simulated, standardized patient encounter in a medical simulation center while documenting in the test environment of a currently used EDIS. We recorded the time on task, type and sequence of tasks performed by the participants (including tasks performed in parallel). We then conducted semi-structured interviews with each participant. We analyzed these qualitative data using the constant comparative method to generate themes. Results Eight resident physicians participated. The simulation session averaged 17 minutes and participants spent 11 minutes on average on tasks that included electronic documentation. Participants performed tasks in parallel, such as history taking and electronic documentation. Five of the 8 participants performed a similar workflow sequence during the first part of the session while the remaining three used different workflows. Three themes characterize electronic documentation: (1) physicians report that location and timing of documentation varies based on patient acuity and workload, (2) physicians report a need for features that support improved efficiency; and (3) physicians like viewing available patient data but struggle with integration of the EDIS with other information sources. Conclusion We confirmed that physicians spend much of their time on documentation (65%) during an ED patient visit. Further, we found that resident physicians did not all use the same workflow and approach even when presented with an identical standardized patient scenario. Future EHR design should consider these varied workflows while trying to

  1. a Workflow-Oriented Approach to Propagation Models in Heliophysics

    NASA Astrophysics Data System (ADS)

    Pierantoni, Gabriele; Carley, Eoin P.; Byrne, Jason P.; Perez-Suarez, David; Gallagher, Peter T.

    2014-07-01

    The Sun is responsible for the eruption of billions of tons of plasma and the generation of near light-speed particles that propagate throughout the solar system and beyond. If directed towards Earth, these events can be damaging to our technological infrastructure. Hence there is an effort to understand the cause of the eruptive events and how they propagate from Sun to Earth. However, the physics governing their propagation is not well understood, so there is a need to develop a theoretical description of their propagation, known as a Propagation Model, in order to predict when they may impact Earth. It is often difficult to define a single propagation model that correctly describes the physics of solar eruptive events, and even more difficult to implement models capable of catering for all these complexities and to validate them using real observational data. In this paper, we envisage that workflows offer both a theoretical and practical framework for a novel approach to propagation models. We define a mathematical framework that aims at encompassing the different modalities with which workflows can be used, and provide a set of generic building blocks written in the TAVERNA workflow language that users can use to build their own propagation models. Finally, we test both the theoretical model and the composite building blocks of the workflow with a real Science Use Case that was discussed during the 4th CDAW (Coordinated Data Analysis Workshop) event held by the HELIO project. We show that generic workflow building blocks can be used to construct a propagation model that successfully describes the transit of solar eruptive events toward Earth and predicts a correct Earth-impact time.
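
    The simplest propagation model underlying such workflows is ballistic: a disturbance crosses one astronomical unit at constant speed. A worked example follows; the speeds are illustrative, and real models add drag and coupling to the ambient solar wind.

      AU_KM = 1.496e8  # mean Sun-Earth distance in kilometres

      def transit_time_hours(speed_km_s):
          """Hours for a disturbance travelling at constant speed to reach Earth."""
          return AU_KM / speed_km_s / 3600.0

      for v in (400, 800, 1600):  # slow solar wind, fast CME, extreme CME (km/s)
          print(f"{v:5d} km/s -> {transit_time_hours(v):6.1f} h")  # ~103.9, 51.9, 26.0 h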

  2. Relationship between E-Prescriptions and Community Pharmacy Workflow

    PubMed Central

    Odukoya, Olufunmilola K.; Chui, Michelle A.

    2013-01-01

    Objectives To understand how community pharmacists use electronic prescribing (e-prescribing) technology; and to describe the workflow challenges pharmacy personnel encounter as a result of using e-prescribing technology. Design Cross-sectional qualitative study. Setting Seven community pharmacies in Wisconsin from December 2010 to March 2011. Participants 16 pharmacists and 14 pharmacy technicians (in three chain and four independent pharmacies). Interventions Think-aloud protocol and pharmacy group interviews. Main outcome measures Pharmacy staff description of their use of e-prescribing technology and challenges encountered in their daily workflow related to this technology. Results Two contributing factors were perceived to influence e-prescribing workflow: issues stemming from prescribing or transmitting software, and issues from within the pharmacy. Pharmacies experienced both delays in receiving, and inaccurate e-prescriptions from physician offices. Receiving an overwhelming number of e-prescriptions with inaccurate or unclear information resulted in significant time delays for patients as pharmacists contacted physicians to clarify wrong information. In addition, pharmacy personnel reported that lack of formal training and the disconnect between the way pharmacists verify accuracy and conduct drug utilization review and the presentation of e-prescription information on the computer screen significantly influenced the speed of processing an e-prescription. Conclusion E-prescription processing can hinder pharmacy workflow. As the number of e-prescriptions transmitted to pharmacies increases due to legislative mandates, it is essential that the technology that supports e-prescriptions (both on the prescriber and pharmacy operating systems) be redesigned to facilitate pharmacy workflow processes and to prevent unintended consequences, such as increased medication errors, user frustration, and stress. PMID:23229979

  3. Scientific millenarianism

    SciTech Connect

    Weinberg, A.M.

    1997-12-01

    Today, for the first time, scientific concerns are seriously being addressed that span future times--hundreds, even thousands, or more years in the future. One is witnessing what the author calls scientific millenarianism. Are such concerns for the distant future exercises in futility, or are they real issues that, to the everlasting gratitude of future generations, this generation has identified, warned about and even suggested how to cope with in the distant future? Can the four potential catastrophes--bolide impact, CO{sub 2} warming, radioactive wastes and thermonuclear war--be avoided by technical fixes, institutional responses, religion, or by doing nothing? These are the questions addressed in this paper.

  4. Semantic Document Library: A Virtual Research Environment for Documents, Data and Workflows Sharing

    NASA Astrophysics Data System (ADS)

    Kotwani, K.; Liu, Y.; Myers, J.; Futrelle, J.

    2008-12-01

    The Semantic Document Library (SDL) was driven by use cases from the environmental observatory communities and is designed to provide conventional document repository features of uploading, downloading, editing and versioning of documents as well as value-adding features of tagging, querying, sharing, annotating, ranking, provenance, social networking and geo-spatial mapping services. It allows users to organize a catalogue of watershed observation data, model output, workflows, as well as publications and documents related to the same watershed study through the tagging capability. Users can tag all relevant materials using the same watershed name and easily find all of them later using this tag. The underpinning semantic content repository can store materials from other cyberenvironments such as workflow or simulation tools, and SDL provides an effective interface to query and organize materials from various sources. Advanced features of the SDL allow users to visualize the provenance of the materials, such as the source and how the output data were derived. Other novel features include visualizing all geo-referenced materials on a geospatial map. SDL, as a component of a cyberenvironment portal (the NCSA Cybercollaboratory), has the goal of efficient management of information and relationships between published artifacts (validated models, vetted data, workflows, annotations, best practices, reviews and papers) produced from raw research artifacts (data, notes, plans etc.) through agents (people, sensors etc.). The tremendous scientific potential of artifacts is achieved through mechanisms of sharing, reuse and collaboration - empowering scientists to spread their knowledge and protocols and to benefit from the knowledge of others. SDL successfully implements web 2.0 technologies and design patterns along with a semantic content management approach that enables use of multiple ontologies and dynamic evolution (e.g. folksonomies) of terminology. Scientific documents involved with

  5. An automated and reproducible workflow for running and analyzing neural simulations using Lancet and IPython Notebook.

    PubMed

    Stevens, Jean-Luc R; Elver, Marco; Bednar, James A

    2013-01-01

    Lancet is a new, simulator-independent Python utility for succinctly specifying, launching, and collating results from large batches of interrelated computationally demanding program runs. This paper demonstrates how to combine Lancet with IPython Notebook to provide a flexible, lightweight, and agile workflow for fully reproducible scientific research. This informal and pragmatic approach uses IPython Notebook to capture the steps in a scientific computation as it is gradually automated and made ready for publication, without mandating the use of any separate application that can constrain scientific exploration and innovation. The resulting notebook concisely records each step involved in even very complex computational processes that led to a particular figure or numerical result, allowing the complete chain of events to be replicated automatically. Lancet was originally designed to help solve problems in computational neuroscience, such as analyzing the sensitivity of a complex simulation to various parameters, or collecting the results from multiple runs with different random starting points. However, because it is never possible to know in advance what tools might be required in future tasks, Lancet has been designed to be completely general, supporting any type of program as long as it can be launched as a process and can return output in the form of files. For instance, Lancet is also heavily used by one of the authors in a separate research group for launching batches of microprocessor simulations. This general design will allow Lancet to continue supporting a given research project even as the underlying approaches and tools change.
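
    The kind of batch specification Lancet automates can be sketched generically: declare a parameter space, launch one process per combination, and collect the output files. This is a hedged, generic Python sketch, not Lancet's actual API; the launched command is a placeholder.

      import itertools, pathlib, subprocess

      # Hypothetical parameter space for a batch of simulation runs
      param_space = {"learning_rate": [0.01, 0.1], "seed": [1, 2, 3]}

      def launch_all(outdir="results"):
          """Run one process per parameter combination and save each stdout."""
          pathlib.Path(outdir).mkdir(exist_ok=True)
          keys = list(param_space)
          for values in itertools.product(*param_space.values()):
              args = dict(zip(keys, values))
              out = pathlib.Path(outdir) / ("run_" + "_".join(map(str, values)) + ".txt")
              # Placeholder command; a real study would invoke its simulator here
              cmd = ["python", "-c", f"print({args!r})"]
              out.write_bytes(subprocess.run(cmd, capture_output=True, check=True).stdout)

      launch_all()  # produces results/run_0.01_1.txt, ..., one file per run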

  6. An automated and reproducible workflow for running and analyzing neural simulations using Lancet and IPython Notebook

    PubMed Central

    Stevens, Jean-Luc R.; Elver, Marco; Bednar, James A.

    2013-01-01

    Lancet is a new, simulator-independent Python utility for succinctly specifying, launching, and collating results from large batches of interrelated computationally demanding program runs. This paper demonstrates how to combine Lancet with IPython Notebook to provide a flexible, lightweight, and agile workflow for fully reproducible scientific research. This informal and pragmatic approach uses IPython Notebook to capture the steps in a scientific computation as it is gradually automated and made ready for publication, without mandating the use of any separate application that can constrain scientific exploration and innovation. The resulting notebook concisely records each step involved in even very complex computational processes that led to a particular figure or numerical result, allowing the complete chain of events to be replicated automatically. Lancet was originally designed to help solve problems in computational neuroscience, such as analyzing the sensitivity of a complex simulation to various parameters, or collecting the results from multiple runs with different random starting points. However, because it is never possible to know in advance what tools might be required in future tasks, Lancet has been designed to be completely general, supporting any type of program as long as it can be launched as a process and can return output in the form of files. For instance, Lancet is also heavily used by one of the authors in a separate research group for launching batches of microprocessor simulations. This general design will allow Lancet to continue supporting a given research project even as the underlying approaches and tools change. PMID:24416014

  7. WARP (workflow for automated and rapid production): a framework for end-to-end automated digital print workflows

    NASA Astrophysics Data System (ADS)

    Joshi, Parag

    2006-02-01

    The publishing industry is experiencing a major paradigm shift with the advent of digital publishing technologies. A large number of components in the publishing and print production workflow are transformed in this shift. However, the process as a whole requires a great deal of human intervention for decision making and for resolving exceptions during job execution. Furthermore, a majority of the best-of-breed applications for publishing and print production are intrinsically designed and developed to be driven by humans. Thus, the human-intensive nature of the current prepress process accounts for a very significant amount of the overhead costs in fulfillment of jobs on press. It is a challenge to automate the functionality of applications built on the model of human-driven execution. Another challenge is to orchestrate the various components in the publishing and print production pipeline such that they work in a seamless manner, enabling the system to detect potential failures automatically and take corrective actions proactively. Thus, there is a great need for a coherent and unifying workflow architecture that streamlines the process and automates it as a whole in order to create an end-to-end digital automated print production workflow that does not involve any human intervention. This paper describes an architecture and building blocks that lay the foundation for a plurality of automated print production workflows.

  8. Image Classification Workflow Using Machine Learning Methods

    NASA Astrophysics Data System (ADS)

    Christoffersen, M. S.; Roser, M.; Valadez-Vergara, R.; Fernández-Vega, J. A.; Pierce, S. A.; Arora, R.

    2016-12-01

    Recent increases in the availability and quality of remote sensing datasets have fueled an increasing number of scientifically significant discoveries based on land use classification and land use change analysis. However, much of the software made to work with remote sensing data products, specifically multispectral images, is commercial and often prohibitively expensive. The free to use solutions that are currently available come bundled up as small parts of much larger programs that are very susceptible to bugs and difficult to install and configure. What is needed is a compact, easy to use set of tools to perform land use analysis on multispectral images. To address this need, we have developed software using the Python programming language with the sole function of land use classification and land use change analysis. We chose Python to develop our software because it is relatively readable, has a large body of relevant third party libraries such as GDAL and Spectral Python, and is free to install and use on Windows, Linux, and Macintosh operating systems. In order to test our classification software, we performed a K-means unsupervised classification, Gaussian Maximum Likelihood supervised classification, and a Mahalanobis Distance based supervised classification. The images used for testing were three Landsat rasters of Austin, Texas with a spatial resolution of 60 meters for the years of 1984 and 1999, and 30 meters for the year 2015. The testing dataset was easily downloaded using the Earth Explorer application produced by the USGS. The software should be able to perform classification based on any set of multispectral rasters with little to no modification. Our software makes the ease of land use classification using commercial software available without an expensive license.
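
    The unsupervised branch of such a workflow is compact in Python: pixels of a multispectral raster become feature vectors of band values, and K-means assigns each pixel a cluster. In the sketch below, the random array stands in for a Landsat scene that would be read with GDAL; band count and cluster count are illustrative.

      import numpy as np
      from sklearn.cluster import KMeans

      rng = np.random.default_rng(42)
      bands, rows, cols = 6, 100, 100
      image = rng.random((bands, rows, cols))   # stand-in for gdal.Open(...).ReadAsArray()

      pixels = image.reshape(bands, -1).T       # (n_pixels, n_bands) feature vectors
      labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(pixels)
      classified = labels.reshape(rows, cols)   # land-use class per pixel

      print(np.bincount(labels))                # pixel count per class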

  9. Scientific Satellites

    DTIC Science & Technology

    1967-01-01

    1919 paper (ref. 9), in which he suggested a Moon rocket. Rocketry was on a par with extrasensory perception in those days. …this way, images of sky can be taken at different wavelengths. The perceptive reader will note that the two zodiacal-light experiments described…

  10. Scientific Documentation.

    ERIC Educational Resources Information Center

    Pieper, Gail W.

    1980-01-01

    Describes how scientific documentation is taught in three 50-minute sessions in a technical writing course. Tells how session one distinguishes between in-text notes, footnotes, and reference entries; session two discusses the author-year system of citing references; and session three is concerned with the author-number system of reference…

  11. A Quantitative Proteomic Workflow for Characterization of Frozen Clinical Biopsies: Laser Capture Microdissection Coupled with Label-Free Mass Spectrometry

    PubMed Central

    Shapiro, John P.; Biswas, Sabyasachi; Merchant, Anand S.; Satoskar, Anjali; Taslim, Cenny; Lin, Shili; Rovin, Brad H.; Sen, Chandan K.; Roy, Sashwati; Freitas, Michael A.

    2013-01-01

    This paper describes a simple, highly efficient and robust proteomic workflow for routine liquid-chromatography tandem mass spectrometry analysis of Laser Microdissection Pressure Catapulting (LMPC) isolates. Highly efficient protein recovery was achieved by optimization of a “one-pot” protein extraction and digestion allowing for reproducible proteomic analysis on as few as 500 LMPC isolated cells. The method was combined with label-free spectral count quantitation to characterize proteomic differences from 3,000–10,000 LMPC isolated cells. Significance analysis of spectral count data was accomplished using the edgeR tag-count R package combined with hierarchical cluster analysis. To illustrate the capability of this robust workflow, two examples are presented: 1) analysis of keratinocytes from human punch biopsies of normal skin and a chronic diabetic wound and 2) comparison of glomeruli from needle biopsies of patients with kidney disease. Differentially expressed proteins were validated by use of immunohistochemistry. These examples illustrate that tissue proteomics carried out on limited clinical material can obtain informative proteomic signatures for disease pathogenesis and demonstrate the suitability of this approach for biomarker discovery. PMID:23022584
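
    The significance analysis in this record uses edgeR, which is an R package; as a rough Python analogue, the sketch below reproduces only the hierarchical-clustering half of the analysis with SciPy, on a small fabricated spectral-count matrix. The counts, cluster count, and linkage settings are assumptions for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Fabricated spectral-count matrix: rows are proteins, columns are samples.
counts = np.array([[12, 14,  2,  1],    # protein A: abundant in samples 1-2
                   [10, 11,  3,  2],    # protein B: profile similar to A
                   [ 1,  0,  9, 12]])   # protein C: abundant in samples 3-4

# Log-transform with a pseudocount to tame the dynamic range of count data.
log_counts = np.log2(counts + 1)

# Average-linkage hierarchical clustering of proteins by expression profile.
tree = linkage(log_counts, method="average", metric="euclidean")
labels = fcluster(tree, t=2, criterion="maxclust")
print(labels)    # e.g. [1 1 2]: proteins A and B cluster together
```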

  12. A quantitative proteomic workflow for characterization of frozen clinical biopsies: laser capture microdissection coupled with label-free mass spectrometry.

    PubMed

    Shapiro, John P; Biswas, Sabyasachi; Merchant, Anand S; Satoskar, Anjali; Taslim, Cenny; Lin, Shili; Rovin, Brad H; Sen, Chandan K; Roy, Sashwati; Freitas, Michael A

    2012-12-21

    This paper describes a simple, highly efficient and robust proteomic workflow for routine liquid-chromatography tandem mass spectrometry analysis of Laser Microdissection Pressure Catapulting (LMPC) isolates. Highly efficient protein recovery was achieved by optimization of a "one-pot" protein extraction and digestion allowing for reproducible proteomic analysis on as few as 500 LMPC isolated cells. The method was combined with label-free spectral count quantitation to characterize proteomic differences from 3000-10,000 LMPC isolated cells. Significance analysis of spectral count data was accomplished using the edgeR tag-count R package combined with hierarchical cluster analysis. To illustrate the capability of this robust workflow, two examples are presented: 1) analysis of keratinocytes from human punch biopsies of normal skin and a chronic diabetic wound and 2) comparison of glomeruli from needle biopsies of patients with kidney disease. Differentially expressed proteins were validated by use of immunohistochemistry. These examples illustrate that tissue proteomics carried out on limited clinical material can obtain informative proteomic signatures for disease pathogenesis and demonstrate the suitability of this approach for biomarker discovery.

  13. Workflow development for targeted lipidomic quantification using parallel reaction monitoring on a quadrupole-time of flight mass spectrometry.

    PubMed

    Zhou, Juntuo; Liu, Chunlei; Si, Dandan; Jia, Bing; Zhong, Lijun; Yin, Yuxin

    2017-06-15

    Advances in high-resolution mass spectrometers with faster scanning capabilities and higher sensitivities have expanded these instruments' functionality beyond traditional data-dependent acquisition in targeted metabolomics. Apart from the traditional multiple reaction monitoring strategy, the parallel reaction monitoring (PRM) strategy is also used for targeted metabolomics quantification. The high resolution and mass accuracy of full-scan (MS1) and tandem mass spectrometry (MS/MS) scans provide sufficient selectivity by monitoring all MS/MS fragment ions for each target precursor, while simultaneously providing flexibility in assay construction and post-acquisition data analysis. In this study, using an orthogonal quadrupole-time of flight liquid chromatography-mass spectrometry system (QTOF LC-MS), we investigated the applicability of a large-scale targeted lipidomic assay using scheduled PRM. This method monitored 222 lipids belonging to 15 lipid species in serum. Robustness, reproducibility, and quantitative performance were assessed using chemical standards and serum samples. Finally, we demonstrated the application of this PRM-based targeted lipidomic workflow to systemic lupus erythematosus, a severe autoimmune disease. Results showed that 63 lipids belonging to 11 lipid species were significantly changed. In summary, for the first time, a robust targeted lipidomic workflow was established using a PRM acquisition strategy on a Q-TOF platform, providing another powerful tool for targeted metabolomic analysis. Copyright © 2017 Elsevier B.V. All rights reserved.
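
    As a hypothetical sketch of the scheduling idea behind a scheduled-PRM assay, the fragment below monitors each lipid target only inside a retention-time window, which keeps the number of concurrent precursors (and hence cycle time) manageable. The target list, m/z values, and window width are invented for illustration, not the paper's assay parameters.

```python
from dataclasses import dataclass

@dataclass
class PrmTarget:
    name: str
    precursor_mz: float
    rt_min: float            # expected retention time (minutes)

WINDOW = 1.0                 # +/- minutes around the expected RT (assumption)

targets = [
    PrmTarget("PC 34:1", 760.5851, 14.2),
    PrmTarget("PE 36:2", 744.5538, 13.8),
    PrmTarget("TG 52:3", 876.8015, 21.5),
]

def active_targets(rt_now: float):
    """Precursors the instrument should isolate at this point in the run."""
    return [t for t in targets if abs(rt_now - t.rt_min) <= WINDOW]

print([t.name for t in active_targets(14.0)])   # -> ['PC 34:1', 'PE 36:2']
```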

  14. Workflow Dynamics and the Imaging Value Chain: Quantifying the Effect of Designating a Nonimage-Interpretive Task Workflow.

    PubMed

    Lee, Matthew H; Schemmel, Andrew J; Pooler, B Dustin; Hanley, Taylor; Kennedy, Tabassum A; Field, Aaron S; Wiegmann, Douglas; Yu, John-Paul J

    To assess the impact of separating non-image-interpretive task (NIT) and image-interpretive task (IIT) workflows in an academic neuroradiology practice, a prospective, randomized, observational investigation of a centralized academic neuroradiology reading room was performed. The primary reading room fellow was observed over a one-month period using a time-and-motion methodology, recording the frequency and duration of tasks performed. Tasks were categorized into separate IIT and NIT workflows. Observation of the primary fellow was repeated following the implementation of a consult assistant (CA) responsible for NITs, and pre- and post-intervention data were compared. Following separation of the IIT and NIT workflows, time spent on IITs by the primary fellow increased from 53.8% to 73.2%, while time on NITs decreased from 20.4% to 4.4%. The mean duration of image interpretation nearly doubled, from 05:44 to 11:01 (p = 0.002). Decreases in specific NITs, including phone calls/paging (2.86/hr versus 0.80/hr), in-room consultations (1.36/hr versus 0.80/hr), and protocoling (0.99/hr versus 0.10/hr), were observed. The CA experienced 29.4 task-switching events (TSEs) per hour; the CA's rates of specific NITs were 6.41/hr for phone calls/paging, 3.60/hr for in-room consultations, and 3.83/hr for protocoling. Separating responsibilities into NIT and IIT workflows substantially increased image interpretation time and decreased TSEs for the primary fellow. Consolidation of NITs into a separate workflow may allow for more efficient task completion. Copyright © 2017 Elsevier Inc. All rights reserved.
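
    A small sketch of the time-and-motion bookkeeping behind per-hour task rates like those reported above: count task events in an observation log and divide by observed hours. The event log below is fabricated for illustration.

```python
from collections import Counter

# Fabricated observation log of task events recorded during a session.
events = ["phone", "consult", "protocol", "phone", "interpret",
          "interpret", "phone", "protocol"]
observed_hours = 2.0                    # length of the observation period

# Events per hour for each task category.
rates = {task: n / observed_hours for task, n in Counter(events).items()}
print(rates)    # e.g. {'phone': 1.5, 'consult': 0.5, 'protocol': 1.0, ...}
```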

  15. Integrated petrophysical and reservoir characterization workflow to enhance permeability and water saturation prediction

    NASA Astrophysics Data System (ADS)

    Al-Amri, Meshal; Mahmoud, Mohamed; Elkatatny, Salaheldin; Al-Yousef, Hasan; Al-Ghamdi, Tariq

    2017-07-01

    Accurate estimation of permeability is essential in reservoir characterization and in determining fluid flow in porous media, and it greatly assists in optimizing the production of a field. Permeability prediction techniques such as porosity-permeability transforms and, more recently, artificial intelligence and neural networks are encouraging but still show only a moderate to good match to core data. This may be because they are limited to homogeneous media, while knowledge of geology and heterogeneity is only indirectly incorporated or absent. Geological information from core description, such as lithofacies (which includes diagenetic information), shows a link to permeability when categorized into rock types exposed to similar depositional environments. The objective of this paper is to develop a robust combined workflow integrating geology, petrophysics, and wireline logs in an extremely heterogeneous carbonate reservoir to accurately predict permeability. Permeability prediction is carried out using a pattern recognition algorithm called multi-resolution graph-based clustering (MRGC). We benchmark the prediction results against hard data from core and well test analysis, and we show how much the permeability prediction improves when geology is integrated into the analysis. Finally, we use the predicted permeability as an input parameter in the J-function and correct for uncertainties in the saturation calculated from wireline logs using the classical Archie equation. A high level of confidence in hydrocarbon volume estimation is reached when robust permeability and saturation-height functions are estimated in the presence of important geological details that are petrophysically meaningful.
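
    The J-function step mentioned in this record is the standard Leverett normalization of capillary pressure by permeability and porosity, from which a saturation-height function can be built. A minimal sketch follows, assuming field units (Pc in psi, k in mD, sigma in dyne/cm) with the usual 0.2166 unit-conversion factor; the example inputs are invented.

```python
import math

def leverett_j(pc_psi: float, k_md: float, phi: float,
               sigma: float = 26.0, cos_theta: float = 1.0) -> float:
    """Dimensionless Leverett J-function:
    J = 0.2166 * Pc * sqrt(k / phi) / (sigma * cos(theta))."""
    return 0.2166 * pc_psi * math.sqrt(k_md / phi) / (sigma * cos_theta)

# Example: predicted permeability 120 mD, porosity 0.18, Pc of 15 psi.
print(round(leverett_j(15.0, 120.0, 0.18), 3))
```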

  16. Robust Adaptive Control

    NASA Technical Reports Server (NTRS)

    Narendra, K. S.; Annaswamy, A. M.

    1985-01-01

    Several concepts and results in robust adaptive control are discussed, organized in three parts. The first part surveys existing algorithms, reviewing different formulations of the problem and the theoretical solutions that have been suggested. The second part contains new results related to the role of persistent excitation in robust adaptive systems and the use of hybrid control to improve robustness. The third part suggests promising new areas for future research that combine different approaches currently known.

  17. Scientific Claims versus Scientific Knowledge.

    ERIC Educational Resources Information Center

    Ramsey, John

    1991-01-01

    Provides activities that help students to understand the importance of the scientific method. The activities include the science of fusion and cold fusion; a group activity that analyzes and interprets the events surrounding cold fusion; and an application research project concerning a current science issue. (ZWH)

  19. Scientific Misconduct

    NASA Astrophysics Data System (ADS)

    Moore, John W.

    2002-12-01

    These cases provide a good basis for discussions of scientific ethics, particularly with respect to the responsibilities of colleagues in collaborative projects. With increasing numbers of students working in cooperative or collaborative groups, there may be opportunities for more than just discussion—similar issues of responsibility apply to the members of such groups. Further, this is an area where, “no clear, widely accepted standards of behavior exist” (1). Thus there is an opportunity to point out to students that scientific ethics, like science itself, is incomplete and needs constant attention to issues that result from new paradigms such as collaborative research. Finally, each of us can resolve to pay more attention to the contributions we and our colleagues make to collaborative projects, applying to our own work no less critical an eye than we would cast on the work of those we don’t know at all.

  20. SU-E-J-78: Adaptive Planning Workflow in a Pencil Beam Scanning Proton Therapy Center

    SciTech Connect

    Blakey, M; Price, S; Robison, B; Niek, S; Moe, S; Renegar, J; Mark, A; Spenser, W

    2015-06-15

    Purpose: The susceptibility of proton therapy to changes in patient setup and anatomy necessitates an adaptive planning process. With the right planning tools and clinical workflow, an adaptive plan can be created in a timely manner without adding significant workload to the treatment planning staff. Methods: In our center, a weekly QA CT is performed on most patients to assess setup, anatomy change, and tumor response. The QA CT is fused to the treatment planning CT, the contours are transferred via deformable registration, and the plan dose is recalculated on the QA CT. A physicist assesses the dose distribution, and an adaptive plan is requested based on tumor coverage or OAR dose changes. After the physician confirms or alters the deformed contours, a dosimetrist develops an adaptive plan using our TPS adaptation module. The plan is assessed for robustness and is then reviewed by the physician. Patient QA is performed within three days following the first adapted treatment. Results: Of the patients who received QA CTs, 19% required at least one adaptive plan (18.5% H&N, 18.5% brain, 11.1% breast, 14.8% chest wall, 14.8% lung, 18.5% pelvis and 3.8% abdomen). Of these patients, 14% went on a break, while the remainder were treated with the previous plan during the re-planning process. Adaptive plans were performed based on tumor shrinkage, anatomy change or positioning uncertainties for 37.9%, 44.8%, and 17.3% of the patients, respectively. On average, 3 full days are required between the QA CT and the first adapted plan treatment. Conclusion: Adaptive planning is a crucial component of proton therapy and should be applied to any site when the QA CT shows significant deviation from the plan. With an efficient workflow, an adaptive plan can be applied without delaying patient treatment or burdening the dosimetry and medical physics team.

  1. Developing integrated workflows for the digitisation of herbarium specimens using a modular and scalable approach.

    PubMed

    Haston, Elspeth; Cubey, Robert; Pullan, Martin; Atkins, Hannah; Harris, David J

    2012-01-01

    Digitisation programmes in many institutes frequently involve disparate and irregular funding, diverse selection criteria and scope, with different members of staff managing and operating the processes. These factors have influenced the decision at the Royal Botanic Garden Edinburgh to develop an integrated workflow for the digitisation of herbarium specimens which is modular and scalable, enabling a single overall workflow to be used for all digitisation projects. This integrated workflow comprises three principal elements: a specimen workflow, a data workflow and an image workflow. The specimen workflow is strongly linked to curatorial processes which will impact on the prioritisation, selection and preparation of the specimens. The importance of including a conservation element within the digitisation workflow is highlighted. The data workflow includes the concept of three main categories of collection data: label data, curatorial data and supplementary data. It is shown that each category of data has its own properties which influence the timing of data capture within the workflow. Software has been developed for the rapid capture of curatorial data, and optical character recognition (OCR) software is being used to increase the efficiency of capturing label data and supplementary data. The large number and size of the images have necessitated the inclusion of automated systems within the image workflow.
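
    As a minimal sketch of the OCR step for label data, the fragment below runs pytesseract over a cropped label image. The choice of pytesseract and the file name are assumptions for illustration; the record does not name the OCR engine used at Edinburgh.

```python
from PIL import Image
import pytesseract

# Hypothetical crop of a specimen label from a digitised herbarium sheet.
label = Image.open("herbarium_sheet_label.png")

# Raw OCR pass; output is later checked and parsed into label-data fields
# (collector, locality, date) by curatorial staff or downstream scripts.
text = pytesseract.image_to_string(label)
print(text)
```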

  2. Jflow: a workflow management system for web applications.

    PubMed

    Mariette, Jérôme; Escudié, Frédéric; Bardou, Philippe; Nabihoudine, Ibouniyamine; Noirot, Céline; Trotard, Marie-Stéphane; Gaspin, Christine; Klopp, Christophe

    2016-02-01

    Biologists produce large data sets and need rich and simple web portals in which they can upload and analyze their files. Providing such tools requires masking the complexity induced by the underlying High Performance Computing (HPC) environment. The connection between interface and computing infrastructure is usually specific to each portal. With Jflow, we introduce a Workflow Management System (WMS), composed of jQuery plug-ins which can easily be embedded in any web application and a Python library providing all the features required to set up, run and monitor workflows. Jflow is available under the GNU General Public License (GPL) at http://bioinfo.genotoul.fr/jflow. The package comes with full documentation, a quick start and a running test portal. Jerome.Mariette@toulouse.inra.fr. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  3. A computational workflow for designing silicon donor qubits

    NASA Astrophysics Data System (ADS)

    Humble, Travis S.; Ericson, M. Nance; Jakowski, Jacek; Huang, Jingsong; Britton, Charles; Curtis, Franklin G.; Dumitrescu, Eugene F.; Mohiyaddin, Fahd A.; Sumpter, Bobby G.

    2016-10-01

    Developing devices that can reliably and accurately demonstrate the principles of superposition and entanglement is an on-going challenge for the quantum computing community. Modeling and simulation offer attractive means of testing early device designs and establishing expectations for operational performance. However, the complex integrated material systems required by quantum device designs are not captured by any single existing computational modeling method. We examine the development and analysis of a multi-staged computational workflow that can be used to design and characterize silicon donor qubit systems with modeling and simulation. Our approach integrates quantum chemistry calculations with electrostatic field solvers to perform detailed simulations of a phosphorus dopant in silicon. We show how atomistic details can be synthesized into an operational model for the logical gates that define quantum computation in this particular technology. The resulting computational workflow realizes a design tool for silicon donor qubits that can help verify and validate current and near-term experimental devices.
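
    A schematic sketch of the multi-staged idea described here: each stage consumes the previous stage's output, so atomistic detail flows up into a device-level gate model. The stage names, data, and numbers below are invented placeholders; the actual workflow couples real quantum-chemistry and electrostatic-field codes.

```python
def quantum_chemistry_stage(dopant: str) -> dict:
    # Stand-in for an ab initio calculation of donor electronic structure.
    return {"dopant": dopant, "hyperfine_mhz": 117.5}

def field_solver_stage(qc: dict, gate_voltage: float) -> dict:
    # Stand-in for an electrostatic solver perturbing the donor states.
    return {**qc, "stark_shift_mhz": 0.002 * gate_voltage}

def gate_model_stage(dev: dict) -> dict:
    # Synthesize the staged results into an operational single-qubit model.
    return {"qubit_freq_mhz": dev["hyperfine_mhz"] + dev["stark_shift_mhz"]}

model = gate_model_stage(field_solver_stage(quantum_chemistry_stage("P"), 50.0))
print(model)
```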

  4. Accelerating in silico research with workflows: a lesson in Simplicity.

    PubMed

    Walsh, Paul; Carroll, John; Sleator, Roy D

    2013-12-01

    Bioinformatics is the application of computer science and related disciplines to the field of molecular biology. While there are currently several web based and desktop tools available for biologists to perform routine bioinformatics tasks, these tools often require users to manually and repeatedly co-ordinate multiple applications before reaching a result. In an effort to reduce time and error, workflow tools have been developed to automate these tasks. However, many of these tools require expert knowledge of the techniques and supporting databases which more often than not lies outside the scope of most biologists. Herein, we describe the development of sequence information management platform (Simplicity), a workflow-based bioinformatics management tool, which allows non-bioinformaticians to rapidly annotate large amounts of DNA and protein sequence data. © 2013 Published by Elsevier Ltd.

  5. Patient recruitment workflow with and without a patient recruitment system.

    PubMed

    Trinczek, Benjamin; Schulte, Britta; Breil, Bernhard; Dugas, Martin

    2013-01-01

    In clinical trials (CTs), the process of patient recruitment (PR) is one of the main risk factors, as almost half of all trial delays are caused by problems in PR. To our knowledge, no publication in this field describes the process of PR. Therefore, weak spots and potential benefits cannot be identified. By interviewing six domain experts and modeling the workflow in a standardized way, we describe the actors, tasks and tools within PR. We compare the current workflow with Patient Recruitment System (PRS)-supported PR. The identification of eligible participants is the most complex part, but adding a PRS simplifies it by automating repetitive tasks and taking work off the Investigators' hands. This work contributes to a common understanding of the PR process.

  6. A computational workflow for designing silicon donor qubits

    SciTech Connect

    Humble, Travis S.; Ericson, M. Nance; Jakowski, Jacek; Huang, Jingsong; Britton, Charles; Curtis, Franklin G.; Dumitrescu, Eugene F.; Mohiyaddin, Fahd A.; Sumpter, Bobby G.

    2016-09-19

    Developing devices that can reliably and accurately demonstrate the principles of superposition and entanglement is an on-going challenge for the quantum computing community. Modeling and simulation offer attractive means of testing early device designs and establishing expectations for operational performance. However, the complex integrated material systems required by quantum device designs are not captured by any single existing computational modeling method. We examine the development and analysis of a multi-staged computational workflow that can be used to design and characterize silicon donor qubit systems with modeling and simulation. Our approach integrates quantum chemistry calculations with electrostatic field solvers to perform detailed simulations of a phosphorus dopant in silicon. We show how atomistic details can be synthesized into an operational model for the logical gates that define quantum computation in this particular technology. In conclusion, the resulting computational workflow realizes a design tool for silicon donor qubits that can help verify and validate current and near-term experimental devices.

  7. CONNJUR Workflow Builder: A software integration environment for spectral reconstruction

    PubMed Central

    Fenwick, Matthew; Weatherby, Gerard; Vyas, Jay; Sesanker, Colbert; Martyn, Timothy O.; Ellis, Heidi J.C.; Gryk, Michael R.

    2015-01-01

    CONNJUR Workflow Builder (WB) is an open-source software integration environment that leverages existing spectral reconstruction tools to create a synergistic, coherent platform for converting biomolecular NMR data from the time domain to the frequency domain. WB provides data integration of primary data and