Science.gov

Sample records for robust scientific workflows

  1. Structured Composition of Dataflow and Control-Flow for Reusable and Robust Scientific Workflows

    SciTech Connect

    Bowers, S; Ludaescher, B; Ngu, A; Critchlow, T

    2005-09-07

    Data-centric scientific workflows are often modeled as dataflow process networks. The simplicity of the dataflow framework facilitates workflow design, analysis, and optimization. However, some workflow tasks are particularly "control-flow intensive", e.g., procedures to make workflows more fault-tolerant and adaptive in an unreliable, distributed computing environment. Modeling complex control-flow directly within a dataflow framework often leads to overly complicated workflows that are hard to comprehend, reuse, schedule, and maintain. In this paper, we develop a framework that allows a structured embedding of control-flow intensive subtasks within dataflow process networks. In this way, we can seamlessly handle complex control-flows without sacrificing the benefits of dataflow. We build upon a flexible actor-oriented modeling and design approach and extend it with (actor) frames and (workflow) templates. A frame is a placeholder for an (existing or planned) collection of components with similar function and signature. A template partially specifies the behavior of a subworkflow by leaving "holes" (i.e., frames) in the subworkflow definition. Taken together, these abstraction mechanisms facilitate the separation and structured re-combination of control-flow and dataflow in scientific workflow applications. We illustrate our approach with a real-world scientific workflow from the astrophysics domain. This data-intensive workflow requires remote execution and file transfer in a semi-reliable environment. For such workflows, we propose a 3-layered architecture: The top-level, typically a dataflow process network, includes Generic Data Transfer (GDT) frames and Generic remote eXecution (GX) frames. At the second level, the user can specialize the behavior of these generic components by embedding a suitable template (here: transducer templates for control-flow intensive tasks).
At the third level, frames inside the transducer template are specialized by embedding
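
    The frame and template idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation and does not use Kepler's API: the names Transfer, retry_template, and make_flaky_copy are hypothetical. The template fixes the control flow (a bounded retry loop) and leaves one hole, the frame, to be filled by any component with the assumed signature.

```python
from typing import Callable

# Frame: any component with this signature can fill the hole.
# The (source, target) -> bool signature is an assumption of this sketch.
Transfer = Callable[[str, str], bool]


def retry_template(transfer: Transfer, max_attempts: int) -> Transfer:
    """Template: fixed retry control flow around an unspecified transfer component."""
    def run(source: str, target: str) -> bool:
        for _ in range(max_attempts):
            if transfer(source, target):
                return True
        return False
    return run


def make_flaky_copy(failures: int) -> Transfer:
    """Stand-in component that fails a fixed number of times, then succeeds."""
    remaining = [failures]

    def copy(source: str, target: str) -> bool:
        if remaining[0] > 0:
            remaining[0] -= 1
            return False
        return True
    return copy


# Specializing the template by embedding a concrete component in its frame.
robust_copy = retry_template(make_flaky_copy(failures=2), max_attempts=3)
gives_up = retry_template(make_flaky_copy(failures=5), max_attempts=3)
```

    Because the specialized result has the same signature as the frame, it can itself be embedded wherever a transfer component is expected, which keeps the top-level dataflow unchanged.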

  2. Scientific Workflows in Astronomy

    NASA Astrophysics Data System (ADS)

    Schaaff, A.; Verdes-Montenegro, L.; Ruiz, J. E.; Santander-Vela, J.

    2012-09-01

    We will soon be facing a new generation of facilities and archives dealing with huge amounts of data (ALMA, LSST, Pan-Starrs, LOFAR, SKA pathfinders,…) where scientific workflows will play an important role in the working methodology of astronomers. While the traditional pipelines tend to produce exploitable products, scientific workflows are aimed at producing scientific insight. Virtual Observatory standards provide the tools to design reproducible scientific workflows. A detailed analysis of the state of the art of workflows involves languages, design tools, execution engines, use cases, etc. A major topic is also the preservation of the workflows and the capability to replay a workflow several years after its design and implementation. Discussions on these topics have recently been held in IVOA forums and are part of the work that is being done in the Wf4Ever project. The purpose of the BoF was to present to the community the work in progress at the IVOA, collect ideas and identify needs not yet addressed.

  3. Scientific Workflows in the Cloud

    NASA Astrophysics Data System (ADS)

    Juve, Gideon; Deelman, Ewa

    The development of cloud computing has generated significant interest in the scientific computing community. In this chapter we consider the impact of cloud computing on scientific workflow applications. We examine the benefits and drawbacks of cloud computing for workflows, and argue that the primary benefit of cloud computing is not the economic model it promotes, but rather the technologies it employs and how they enable new features for workflow applications. We describe how clouds can be configured to execute workflow tasks and present a case study that examines the performance and cost of three typical workflow applications on Amazon EC2. Finally, we identify several areas in which existing clouds can be improved and discuss the future of workflows in the cloud.

  4. Managing and Documenting Legacy Scientific Workflows.

    PubMed

    Acuña, Ruben; Chomilier, Jacques; Lacroix, Zoé

    2015-01-01

    Scientific legacy workflows are often developed over many years, poorly documented and implemented with scripting languages. In the context of our cross-disciplinary projects we face the problem of maintaining such scientific workflows. This paper presents the Workflow Instrumentation for Structure Extraction (WISE) method used to process several ad-hoc legacy workflows written in Python and automatically produce their workflow structural skeleton. Unlike many existing methods, WISE does not assume input workflows to be preprocessed in a known workflow formalism. It is also able to identify and analyze calls to external tools. We present the method and report its results on several scientific workflows. PMID:26673793
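
    As a rough illustration of what extracting a structural skeleton from a Python script can involve, the sketch below uses the standard ast module to list, per function, the calls it makes and to flag calls that launch external tools. This is not the WISE method itself, whose instrumentation details are not given in the abstract; the sample script and the LAUNCHERS set are assumptions made for the example.

```python
import ast

# Hypothetical legacy script, used only as input for this sketch.
LEGACY_SCRIPT = '''
import subprocess
import os

def align(path):
    subprocess.call(["blastp", "-query", path])

def main():
    align("seq.fasta")
    os.system("gzip results.txt")
'''

# Functions treated as external-tool launchers (an assumption of this sketch).
LAUNCHERS = {"subprocess.call", "subprocess.run", "subprocess.Popen", "os.system"}


def dotted_name(node):
    """Return 'a.b' for a Name/Attribute chain, or None for anything else."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute):
        base = dotted_name(node.value)
        return f"{base}.{node.attr}" if base else None
    return None


def skeleton(source):
    """Map each top-level function to the (kind, name) calls found inside it."""
    tree = ast.parse(source)
    result = {}
    for func in tree.body:
        if not isinstance(func, ast.FunctionDef):
            continue
        calls = []
        for node in ast.walk(func):
            if isinstance(node, ast.Call):
                name = dotted_name(node.func)
                if name:
                    kind = "external" if name in LAUNCHERS else "internal"
                    calls.append((kind, name))
        result[func.name] = calls
    return result


structure = skeleton(LEGACY_SCRIPT)
```

    A static pass like this only sees calls that are spelled out in the source; calls built dynamically at runtime would need instrumentation of a running script instead.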

  5. Managing and Documenting Legacy Scientific Workflows.

    PubMed

    Acuña, Ruben; Chomilier, Jacques; Lacroix, Zoé

    2015-10-06

    Scientific legacy workflows are often developed over many years, poorly documented and implemented with scripting languages. In the context of our cross-disciplinary projects we face the problem of maintaining such scientific workflows. This paper presents the Workflow Instrumentation for Structure Extraction (WISE) method used to process several ad-hoc legacy workflows written in Python and automatically produce their workflow structural skeleton. Unlike many existing methods, WISE does not assume input workflows to be preprocessed in a known workflow formalism. It is also able to identify and analyze calls to external tools. We present the method and report its results on several scientific workflows.

  6. Working with Workflows: Highlights from 5 years Building Scientific Workflows

    SciTech Connect

    Critchlow, Terence J.; Altintas, Ilkay; Chin, George; Crawl, Daniel; Iyer, H.; Khan, Ayla; Klasky, S.; Koehler, Sven; Ludaescher, Bertram T.; Mouallem, Pierre; Nagappan, Mie; Podhorszki, Norbert; Shoshani, Arie; Silva, C.; Tchoua, Roselynne; Vouk, M.

    2011-07-30

    In 2006, the SciDAC Scientific Data Management (SDM) Center proposed to continue its work deploying leading edge data management and analysis capabilities to scientific applications. One of three thrust areas within the proposed center was focused on Scientific Process Automation (SPA) using workflow technology. As a founding member of the Kepler consortium [LAB+09], the SDM Center team was well positioned to begin deploying workflows immediately. We were also keenly aware of some of the deficiencies in Kepler when applied to high performance computing workflows, which allowed us to focus our research and development efforts on critical new capabilities which were ultimately integrated into the Kepler open source distribution, benefiting the entire community. Significant work was required to ensure Kepler was capable of supporting large-scale production runs for SciDAC applications. Our work on generic actors and templates has improved the portability of workflows across machines and provided a higher level of abstraction for workflow developers. Fault tolerance and provenance tracking were obvious areas for improvement within Kepler given the longevity and complexity of our target workflows. To monitor workflow execution, we developed and deployed a web-based dashboard. We then generalized this interface and released it so it could be deployed at other locations. Outreach has always been a primary focus of our work and we had many successful deployments across a number of scientific domains while continually publishing and presenting our work. This short paper describes our most significant accomplishments over the past 5 years. Additional information about the SDM Center can be found in the companion paper: The Scientific Data Management Center: Available Technologies and Highlights.

  7. Supercomputing and Scientific Workflows Gaps and Requirements

    SciTech Connect

    Critchlow, Terence J.; Chin, George

    2011-09-08

    Over the past decade, workflows have been successfully applied to a number of scientific domains. Workflow engines are now commonly used across scientific disciplines to automate mundane tasks, collect provenance, and orchestrate complex processes. However, workflows have not yet made significant strides managing fine-grain, concurrent tasks directly on supercomputing platforms. As scientific computing becomes an increasingly important discovery method and high performance computing environments become more complex, addressing this gap becomes critical. Using a simple use case as motivation, this paper describes the current barriers to using workflow engines in a supercomputing environment and outlines the new capabilities that must be provided if workflows are to be successfully applied in this context.

  8. Automation of Network-Based Scientific Workflows

    SciTech Connect

    Altintas, I.; Barreto, R.; Blondin, J. M.; Cheng, Z.; Critchlow, T.; Khan, A.; Klasky, Scott A; Ligon, J.; Ludaescher, B.; Mouallem, P. A.; Parker, S.; Podhorszki, Norbert; Shoshani, A.; Silva, C.; Vouk, M. A.

    2007-01-01

    Comprehensive, end-to-end, data and workflow management solutions are needed to handle the increasing complexity of processes and data volumes associated with modern distributed scientific problem solving, such as ultra-scale simulations and high-throughput experiments. The key to the solution is an integrated network-based framework that is functional, dependable, fault-tolerant, and supports data and process provenance. Such a framework needs to make development and use of application workflows dramatically easier so that scientists' efforts can shift away from data management and utility software development to scientific research and discovery. An integrated view of these activities is provided by the notion of scientific workflows - a series of structured activities and computations that arise in scientific problem-solving. An information technology framework that supports scientific workflows is the Ptolemy II based environment called Kepler. This paper discusses the issues associated with practical automation of scientific processes and workflows and illustrates this with workflows developed using the Kepler framework and tools.

  9. Accelerating the scientific exploration process with scientific workflows

    NASA Astrophysics Data System (ADS)

    Altintas, Ilkay; Barney, Oscar; Cheng, Zhengang; Critchlow, Terence; Ludaescher, Bertram; Parker, Steve; Shoshani, Arie; Vouk, Mladen

    2006-09-01

    Although an increasing amount of middleware has emerged in the last few years to achieve remote data access, distributed job execution, and data management, orchestrating these technologies with minimal overhead still remains a difficult task for scientists. Scientific workflow systems improve this situation by creating interfaces to a variety of technologies and automating the execution and monitoring of the workflows. Workflow systems provide domain-independent customizable interfaces and tools that combine different tools and technologies along with efficient methods for using them. As simulations and experiments move into the petascale regime, the orchestration of long running data and compute intensive tasks is becoming a major requirement for the successful steering and completion of scientific investigations. A scientific workflow is the process of combining data and processes into a configurable, structured set of steps that implement semi-automated computational solutions of a scientific problem. Kepler is a cross-project collaboration, co-founded by the SciDAC Scientific Data Management (SDM) Center, whose purpose is to develop a domain-independent scientific workflow system. It provides a workflow environment in which scientists design and execute scientific workflows by specifying the desired sequence of computational actions and the appropriate data flow, including required data transformations, between these steps. Currently deployed workflows range from local analytical pipelines to distributed, high-performance and high-throughput applications, which can be both data- and compute-intensive. The scientific workflow approach offers a number of advantages over traditional scripting-based approaches, including ease of configuration, improved reusability and maintenance of workflows and components (called actors), automated provenance management, "smart" re-running of different versions of workflow instances, on-the-fly updateable parameters

  10. Kepler Scientific Workflow Design and Execution with Contexts

    SciTech Connect

    Ngu, Anne Hee Hiong; Jamnagarwala, Arwa; Chin, George; Sivaramakrishnan, Chandrika; Critchlow, Terence J.

    2011-09-01

    A context-aware scientific workflow is a typical scientific workflow that is enhanced with context binding and awareness mechanisms. Context facilitates further configuration of the scientific workflow at runtime such that it is tuned to its environment during execution and responds intelligently based on such awareness without customized coding of the workflow. In this paper, we present a context annotation framework, which supports rapid development of context-aware scientific workflows. Context annotation enables a diverse type of actor in Kepler that may bind with different sensed environmental information as part of the actor’s regular data. Context-aware actors simplify the construction of scientific workflows that require intricate knowledge in initializing and configuring a large number of parameters to cover all different execution conditions. This paper presents the motivation, system design, implementation, and usage of context annotation in relation to the Kepler scientific workflow system.

  11. Scientific Workflows Composition and Deployment on SOA Frameworks

    SciTech Connect

    Liu, Yan; Gorton, Ian; Wynne, Adam S.; Kulkarni, Anand V.

    2011-12-12

    Scientific workflows normally consist of multiple applications acquiring and transforming data, running data intensive analyses and visualizing the results for scientific discovery. To compose and deploy such scientific workflows, an SOA platform can provide integration of third-party components, services, and tools. In this paper, we present our application of Service-Oriented Architecture (SOA) to compose and deploy systems biology workflows. In developing this application, our solution uses MeDICi, a middleware framework built on SOA platforms, as an integration layer. We discuss our experience and lessons learnt about this solution that are generally applicable to scientific workflows in other domains.

  12. Scientific workflows as productivity tools for drug discovery.

    PubMed

    Shon, John; Ohkawa, Hitomi; Hammer, Juergen

    2008-05-01

    Large pharmaceutical companies annually invest tens to hundreds of millions of US dollars in research informatics to support their early drug discovery processes. Traditionally, most of these investments are designed to increase the efficiency of drug discovery. The introduction of do-it-yourself scientific workflow platforms has enabled research informatics organizations to shift their efforts toward scientific innovation, ultimately resulting in a possible increase in return on their investments. Unlike the handling of most scientific data and application integration approaches, researchers apply scientific workflows to in silico experimentation and exploration, leading to scientific discoveries that lie beyond automation and integration. This review highlights some key requirements for scientific workflow environments in the pharmaceutical industry that are necessary for increasing research productivity. Examples of the application of scientific workflows in research and a summary of recent platform advances are also provided.

  13. Comparison of Resource Platform Selection Approaches for Scientific Workflows

    SciTech Connect

    Simmhan, Yogesh; Ramakrishnan, Lavanya

    2010-03-05

    Cloud computing is increasingly considered as an additional computational resource platform for scientific workflows. The cloud offers an opportunity to scale out applications from desktops and local cluster resources. At the same time, it can eliminate the challenges of restricted software environments and queue delays in shared high performance computing environments. Choosing from these diverse resource platforms for a workflow execution poses a challenge for many scientists. Scientists are often faced with deciding resource platform selection trade-offs with limited information on the actual workflows. While many workflow planning methods have explored task scheduling onto different resources, these methods often require fine-scale characterization of the workflow that is onerous for a scientist. In this position paper, we describe our early exploratory work into using blackbox characteristics to do a cost-benefit analysis of using cloud platforms. We use only very limited high-level information on the workflow length, width, and data sizes. The length and width are indicative of the workflow duration and parallelism. The data size characterizes the IO requirements. We compare the effectiveness of this approach to other resource selection models using two exemplar scientific workflows scheduled on desktops, local clusters, HPC centers, and clouds. Early results suggest that the blackbox model often makes the same resource selections as a more fine-grained whitebox model. We believe the simplicity of the blackbox model can help inform a scientist on the applicability of cloud computing resources even before porting an existing workflow.
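
    To make the blackbox idea concrete, here is a minimal sketch of a coarse estimate built only from workflow length, width, and data size. The formula, the Workflow and Platform fields, and every number below are assumptions chosen for illustration; the abstract does not give the authors' actual model.

```python
import math
from dataclasses import dataclass


@dataclass
class Workflow:
    length: int        # number of sequential stages
    width: int         # tasks that can run in parallel within a stage
    task_hours: float  # assumed average runtime of a single task
    data_gb: float     # total data moved to and from the platform


@dataclass
class Platform:
    name: str
    cores: int
    queue_wait_hours: float
    transfer_gb_per_hour: float
    cost_per_core_hour: float


def estimate(wf: Workflow, p: Platform):
    """Return (wall-clock hours, monetary cost) under the sketch's assumptions."""
    waves = math.ceil(wf.width / p.cores)  # rounds needed to run one stage
    compute_hours = wf.length * waves * wf.task_hours
    transfer_hours = wf.data_gb / p.transfer_gb_per_hour
    total_hours = p.queue_wait_hours + transfer_hours + compute_hours
    cost = wf.length * wf.width * wf.task_hours * p.cost_per_core_hour
    return total_hours, cost


# Hypothetical workflow and platforms.
wf = Workflow(length=4, width=16, task_hours=0.5, data_gb=100)
desktop = Platform("desktop", cores=4, queue_wait_hours=0.0,
                   transfer_gb_per_hour=1000.0, cost_per_core_hour=0.0)
cloud = Platform("cloud", cores=16, queue_wait_hours=0.0,
                 transfer_gb_per_hour=50.0, cost_per_core_hour=0.1)

desktop_hours, desktop_cost = estimate(wf, desktop)
cloud_hours, cloud_cost = estimate(wf, cloud)
```

    With these made-up inputs the cloud finishes sooner because the wide stages run in a single wave, while the slower transfer rate and the per-core-hour charge are what the scientist trades for that speed.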

  14. A Multi-Dimensional Classification Model for Scientific Workflow Characteristics

    SciTech Connect

    Ramakrishnan, Lavanya; Plale, Beth

    2010-04-05

    Workflows have been used to model repeatable tasks or operations in manufacturing, business process, and software. In recent years, workflows are increasingly used for orchestration of science discovery tasks that use distributed resources and web services environments through resource models such as grid and cloud computing. Workflows have disparate requirements and constraints that affect how they might be managed in distributed environments. In this paper, we present a multi-dimensional classification model illustrated by workflow examples obtained through a survey of scientists from different domains including bioinformatics and biomedical, weather and ocean modeling, astronomy detailing their data and computational requirements. The survey results and classification model contribute to the high level understanding of scientific workflows.

  15. A scientific workflow framework for (13)C metabolic flux analysis.

    PubMed

    Dalman, Tolga; Wiechert, Wolfgang; Nöh, Katharina

    2016-08-20

    Metabolic flux analysis (MFA) with (13)C labeling data is a high-precision technique to quantify intracellular reaction rates (fluxes). One of the major challenges of (13)C MFA is the interactivity of the computational workflow according to which the fluxes are determined from the input data (metabolic network model, labeling data, and physiological rates). Here, the workflow assembly is inevitably determined by the scientist who has to consider interacting biological, experimental, and computational aspects. Decision-making is context dependent and requires expertise, rendering an automated evaluation process hardly possible. Here, we present a scientific workflow framework (SWF) for creating, executing, and controlling on demand (13)C MFA workflows. (13)C MFA-specific tools and libraries, such as the high-performance simulation toolbox 13CFLUX2, are wrapped as web services and thereby integrated into a service-oriented architecture. Besides workflow steering, the SWF features transparent provenance collection and enables full flexibility for ad hoc scripting solutions. To handle compute-intensive tasks, cloud computing is supported. We demonstrate how the challenges posed by (13)C MFA workflows can be solved with our approach on the basis of two proof-of-concept use cases. PMID:26721184

  17. Scientific Workflows + Provenance = Better (Meta-)Data Management

    NASA Astrophysics Data System (ADS)

    Ludaescher, B.; Cuevas-Vicenttín, V.; Missier, P.; Dey, S.; Kianmajd, P.; Wei, Y.; Koop, D.; Chirigati, F.; Altintas, I.; Belhajjame, K.; Bowers, S.

    2013-12-01

    The origin and processing history of an artifact is known as its provenance. Data provenance is an important form of metadata that explains how a particular data product came about, e.g., how and when it was derived in a computational process, which parameter settings and input data were used, etc. Provenance information provides transparency and helps to explain and interpret data products. Other common uses and applications of provenance include quality control, data curation, result debugging, and more generally, 'reproducible science'. Scientific workflow systems (e.g. Kepler, Taverna, VisTrails, and others) provide controlled environments for developing computational pipelines with built-in provenance support. Workflow results can then be explained in terms of workflow steps, parameter settings, input data, etc. using provenance that is automatically captured by the system. Scientific workflows themselves provide a user-friendly abstraction of the computational process and are thus a form of ('prospective') provenance in their own right. The full potential of provenance information is realized when combining workflow-level information (prospective provenance) with trace-level information (retrospective provenance). To this end, the DataONE Provenance Working Group (ProvWG) has developed an extension of the W3C PROV standard, called D-PROV. Whereas PROV provides a 'least common denominator' for exchanging and integrating provenance information, D-PROV adds new 'observables' that describe workflow-level information (e.g., the functional steps in a pipeline), as well as workflow-specific trace-level information (timestamps for each workflow step executed, the inputs and outputs used, etc.). Using examples, we will demonstrate how the combination of prospective and retrospective provenance provides added value in managing scientific data.
The DataONE ProvWG is also developing tools based on D-PROV that allow scientists to get more mileage from provenance metadata
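
    The distinction between prospective and retrospective provenance can be sketched with two small record types and two queries that combine them. This is an illustration only: the Step and Execution classes, their fields, and the sample file names are hypothetical and do not reproduce the D-PROV or W3C PROV vocabularies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """Prospective provenance: what the workflow definition says should happen."""
    name: str
    inputs: tuple
    outputs: tuple


@dataclass(frozen=True)
class Execution:
    """Retrospective provenance: what one run of a step actually did."""
    step: str
    started: str
    used: tuple
    generated: tuple


def lineage(artifact, trace):
    """Return the executions that contributed to an artifact, upstream first."""
    producers = {out: ex for ex in trace for out in ex.generated}
    ordered, seen = [], set()

    def visit(item):
        ex = producers.get(item)
        if ex is None or ex in seen:
            return
        seen.add(ex)
        for upstream in ex.used:
            visit(upstream)
        ordered.append(ex)

    visit(artifact)
    return ordered


def unexecuted(plan, trace):
    """Steps declared in the workflow that have no matching execution record."""
    ran = {ex.step for ex in trace}
    return [step.name for step in plan if step.name not in ran]


# Hypothetical three-step pipeline and a trace of a partial run.
plan = [
    Step("clean", ("raw.csv",), ("clean.csv",)),
    Step("fit", ("clean.csv",), ("model.json",)),
    Step("plot", ("model.json",), ("fig.png",)),
]
trace = [
    Execution("clean", "2013-12-01T10:00", ("raw.csv",), ("clean.csv",)),
    Execution("fit", "2013-12-01T10:05", ("clean.csv",), ("model.json",)),
]

model_lineage = [ex.step for ex in lineage("model.json", trace)]
missing_steps = unexecuted(plan, trace)
```

    The trace alone answers how model.json was produced, while comparing it against the plan reveals that the plot step never ran, a question neither record type could answer by itself.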

  18. Science Gateways, Scientific Workflows and Open Community Software

    NASA Astrophysics Data System (ADS)

    Pierce, M. E.; Marru, S.

    2014-12-01

    Science gateways and scientific workflows occupy different ends of the spectrum of user-focused cyberinfrastructure. Gateways, sometimes called science portals, provide a way for enabling large numbers of users to take advantage of advanced computing resources (supercomputers, advanced storage systems, science clouds) by providing Web and desktop interfaces and supporting services. Scientific workflows, at the other end of the spectrum, support advanced usage of cyberinfrastructure that enables "power users" to undertake computational experiments that are not easily done through the usual mechanisms (managing simulations across multiple sites, for example). Despite these different target communities, gateways and workflows share many similarities and can potentially be accommodated by the same software system. For example, pipelines to process InSAR imagery sets or to datamine GPS time series data are workflows. The results and the ability to make downstream products may be made available through a gateway, and power users may want to provide their own custom pipelines. In this abstract, we discuss our efforts to build an open source software system, Apache Airavata, that can accommodate both gateway and workflow use cases. Our approach is general, and we have applied the software to problems in a number of scientific domains. In this talk, we discuss our applications to usage scenarios specific to earth science, focusing on earthquake physics examples drawn from the QuakSim.org and GeoGateway.org efforts. We also examine the role of the Apache Software Foundation's open community model as a way to build up common community codes that do not depend upon a single "owner" to sustain. Pushing beyond open source software, we also see the need to provide gateways and workflow systems as cloud services. These services centralize operations, provide well-defined programming interfaces, scale elastically, and have global-scale fault tolerance.
We discuss our work providing

  19. Enabling scientific workflows in virtual reality

    USGS Publications Warehouse

    Kreylos, O.; Bawden, G.; Bernardin, T.; Billen, M.I.; Cowgill, E.S.; Gold, R.D.; Hamann, B.; Jadamec, M.; Kellogg, L.H.; Staadt, O.G.; Sumner, D.Y.

    2006-01-01

    To advance research and improve the scientific return on data collection and interpretation efforts in the geosciences, we have developed methods of interactive visualization, with a special focus on immersive virtual reality (VR) environments. Earth sciences employ a strongly visual approach to the measurement and analysis of geologic data due to the spatial and temporal scales over which such data ranges. As observations and simulations increase in size and complexity, the Earth sciences are challenged to manage and interpret increasing amounts of data. Reaping the full intellectual benefits of immersive VR requires us to tailor exploratory approaches to scientific problems. These applications build on the visualization method's strengths, using both 3D perception and interaction with data and models, to take advantage of the skills and training of the geological scientists exploring their data in the VR environment. This interactive approach has enabled us to develop a suite of tools that are adaptable to a range of problems in the geosciences and beyond. Copyright © 2008 by the Association for Computing Machinery, Inc.

  20. The Symbiotic Relationship between Scientific Workflow and Provenance (Invited)

    NASA Astrophysics Data System (ADS)

    Stephan, E.

    2010-12-01

    The purpose of this presentation is to describe the symbiotic nature of scientific workflows and provenance. We will also discuss the current trends and real world challenges facing these two distinct research areas. Although motivated differently, the needs of the international science communities are the glue that binds this relationship together. Understanding and articulating the science drivers to these communities is paramount as these technologies evolve and mature. Originally conceived for managing business processes, workflows are now becoming invaluable assets in both computational and experimental sciences. These reconfigurable, automated systems provide essential technology to perform complex analyses by coupling together geographically distributed disparate data sources and applications. As a result, workflows are capable of higher throughput in a shorter amount of time than performing the steps manually. Today many different workflow products exist; these could include Kepler and Taverna or similar products like MeDICi, developed at PNNL, that are standardized on the Business Process Execution Language (BPEL). Provenance, originating from the French term Provenir “to come from”, is used to describe the curation process of artwork as art is passed from owner to owner. The concept of provenance was adopted by digital libraries as a means to track the lineage of documents while standards such as the Dublin Core began to emerge. In recent years the systems science community has increasingly expressed the need to expand the concept of provenance to formally articulate the history of scientific data. Communities such as the International Provenance and Annotation Workshop (IPAW) have formalized a provenance data model, the Open Provenance Model, and the W3C is hosting a provenance incubator group featuring the Proof Markup Language. Although both workflows and provenance have risen from different communities and operate independently, their mutual

  1. Web-Accessible Scientific Workflow System for Performance Monitoring

    SciTech Connect

    Roelof Versteeg; Trevor Rowe

    2006-03-01

    We describe the design and implementation of a web-accessible scientific workflow system for environmental monitoring. This workflow environment integrates distributed, automated data acquisition with server-side data management and information visualization through flexible browser-based data access tools. Component technologies include a rich browser-based client (using dynamic JavaScript and HTML/CSS) for data selection, a back-end server which uses PHP for data processing, user management, and result delivery, and third-party applications which are invoked by the back-end using web services. This environment allows for reproducible, transparent result generation by a diverse user base. It has been implemented for several monitoring systems with different degrees of complexity.

  2. Building Scientific Workflows for the Geosciences with Open Community Software

    NASA Astrophysics Data System (ADS)

    Pierce, M. E.; Marru, S.; Weerawarana, S. M.

    2012-12-01

    We describe the design and development of the Apache Airavata scientific workflow software and its application to problems in geosciences. Airavata is based on Service Oriented Architecture principles and is developed as general purpose software for managing large-scale science applications on supercomputing resources such as the NSF's XSEDE. Based on the NSF-funded EarthCube Workflow Working Group activities, we discuss the application of this software relative to specific requirements (such as data stream data processing, event triggering, dealing with large data sets, and advanced distributed execution patterns involved in data mining). We also consider the role of governance in EarthCube software development and present the development of Airavata software through the Apache Software Foundation's community development model. We discuss the potential impacts on software accountability and sustainability using this model.

  3. Scientific Workflows and the Sensor Web for Virtual Environmental Observatories

    NASA Astrophysics Data System (ADS)

    Simonis, I.; Vahed, A.

    2008-12-01

    interfaces. All data sets and sensor communication follow well-defined abstract models and corresponding encodings, mostly developed by the OGC Sensor Web Enablement initiative. Scientific progress is currently accelerated by an emerging new concept called scientific workflows, which organize and manage complex distributed computations. A scientific workflow represents and records the highly complex processes that a domain scientist typically would follow in exploration, discovery and, ultimately, transformation of raw data to publishable results. The challenge is now to integrate the benefits of scientific workflows with those provided by the Sensor Web in order to leverage all resources for scientific exploration, problem solving, and knowledge generation. Scientific workflows for the Sensor Web represent the next evolutionary step towards efficient, powerful, and flexible earth observation frameworks and platforms. Those platforms support the entire process from capturing data, sharing and integrating, to requesting additional observations. Multiple sites and organizations will participate on single platforms, and scientists from different countries and organizations will interact and contribute to large-scale research projects. Simultaneously, the data and information overload becomes manageable, as multiple layers of abstraction will free scientists from dealing with underlying data, processing, or storage peculiarities. The vision is of automated investigation and discovery mechanisms that allow scientists to pose queries to the system, which in turn identifies potentially related resources, schedules processing tasks, and assembles all parts into workflows that may satisfy the query.

  4. An Adaptable Seismic Data Format for Modern Scientific Workflows

    NASA Astrophysics Data System (ADS)

    Smith, J. A.; Bozdag, E.; Krischer, L.; Lefebvre, M.; Lei, W.; Podhorszki, N.; Tromp, J.

    2013-12-01

    Data storage, exchange, and access play a critical role in modern seismology. Current seismic data formats, such as SEED, SAC, and SEG-Y, were designed with specific applications in mind and are frequently a major bottleneck in implementing efficient workflows. We propose a new modern parallel format that can be adapted for a variety of seismic workflows. The Adaptable Seismic Data Format (ASDF) features high-performance parallel read and write support and the ability to store an arbitrary number of traces of varying sizes. Provenance information is stored inside the file so that users know the origin of the data as well as the precise operations that have been applied to the waveforms. The design of the new format is based on several real-world use cases, including earthquake seismology and seismic interferometry. The metadata is based on the proven XML schemas StationXML and QuakeML. Existing time-series analysis tool-kits are easily interfaced with this new format so that seismologists can use robust, previously developed software packages, such as ObsPy and the SAC library. ADIOS, netCDF4, and HDF5 can be used as the underlying container format. At Princeton University, we have chosen to use ADIOS as the container format because it has shown superior scalability for certain applications, such as dealing with big data on HPC systems. In the context of high-performance computing, we have implemented ASDF into the global adjoint tomography workflow on Oak Ridge National Laboratory's supercomputer Titan.

  5. Facilitating Stewardship of scientific data through standards based workflows

    NASA Astrophysics Data System (ADS)

    Bastrakova, I.; Kemp, C.; Potter, A. K.

    2013-12-01

    scientific data acquisition and analysis requirements and effective interoperable data management and delivery. This includes participating in national and international dialogue on development of standards, embedding data management activities in business processes, and developing scientific staff as effective data stewards. A similar approach is applied to the geophysical data. By ensuring that the geophysical datasets at GA strictly follow metadata and industry standards, we are able to implement a provenance-based workflow in which the data is easily discoverable, geophysical processing can be applied to it, and results can be stored. The provenance-based workflow enables metadata records for the results to be produced automatically from the input dataset metadata.

  6. WRF4SG: A Scientific Gateway for climate experiment workflows

    NASA Astrophysics Data System (ADS)

    Blanco, Carlos; Cofino, Antonio S.; Fernandez-Quiruelas, Valvanuz

    2013-04-01

    The Weather Research and Forecasting model (WRF) is a community-driven, public-domain model widely used by the weather and climate communities. In contrast to other application-oriented models, WRF provides a flexible and computationally efficient framework for solving a variety of problems at different time scales, from weather forecasting to climate change projection. WRF is also widely used by the research community as a tool for modeling physics, dynamics, and data assimilation. Climate experiment workflows based on WRF are now among the most cutting-edge applications. These workflows are complex because of both the large storage required and the huge number of simulations executed. To manage this, we have developed a scientific gateway (SG) called WRF for Scientific Gateway (WRF4SG), based on the WS-PGRADE/gUSE and WRF4G frameworks, to help meet WRF users' needs (see [1] and [2]). WRF4SG provides services for different use cases that describe the interactions between WRF users and the WRF4SG interface in order to show how to run a climate experiment. As WS-PGRADE/gUSE uses portlets (see [1]) to interact with users, its portlets will support these use cases. A typical experiment carried out by a WRF user consists of a high-resolution regional re-forecast. These re-forecasts are common experiments used as input data for wind power energy and natural hazards (wind and precipitation fields). In the cases below, the user is able to access different resources, such as the Grid, because WRF needs a huge amount of computing resources to generate useful simulations:
    * Resource configuration and user authentication: The first step is to authenticate on the user's Grid resources via virtual organizations. After login, the user is able to select which virtual organization is going to be used by the experiment.
    * Data assimilation: In order to assimilate the data sources

  7. On the support of scientific workflows over Pub/Sub brokers.

    PubMed

    Morales, Augusto; Robles, Tomas; Alcarria, Ramon; Cedeño, Edwin

    2013-08-20

    The execution of scientific workflows is gaining importance as more computing resources are available in the form of grid environments. The Publish/Subscribe paradigm offers well-proven solutions for sustaining distributed scenarios while maintaining the high level of task decoupling required by scientific workflows. In this paper, we propose a new model for supporting scientific workflows that improves the dissemination of control events. The proposed solution is based on the mapping of workflow tasks to the underlying Pub/Sub event layer, and the definition of interfaces and procedures for execution on brokers. In this paper we also analyze the strengths and weaknesses of current solutions that are based on existing message exchange models for scientific workflows. Finally, we explain how our model improves the information dissemination, event filtering, task decoupling and the monitoring of scientific workflows.

  8. On the Support of Scientific Workflows over Pub/Sub Brokers

    PubMed Central

    Morales, Augusto; Robles, Tomas; Alcarria, Ramon; Cedeño, Edwin

    2013-01-01

    The execution of scientific workflows is gaining importance as more computing resources are available in the form of grid environments. The Publish/Subscribe paradigm offers well-proven solutions for sustaining distributed scenarios while maintaining the high level of task decoupling required by scientific workflows. In this paper, we propose a new model for supporting scientific workflows that improves the dissemination of control events. The proposed solution is based on the mapping of workflow tasks to the underlying Pub/Sub event layer, and the definition of interfaces and procedures for execution on brokers. In this paper we also analyze the strengths and weaknesses of current solutions that are based on existing message exchange models for scientific workflows. Finally, we explain how our model improves the information dissemination, event filtering, task decoupling and the monitoring of scientific workflows. PMID:23966191

  9. Looking beneath the Edges and Nodes: Ranking and Mining Scientific Workflows

    ERIC Educational Resources Information Center

    Dong, Xiao

    2010-01-01

    Workflow technology has emerged as an eminent way to support scientific computing nowadays. Supported by mature technological infrastructures such as web services and high performance computing infrastructure, workflow technology has been well adopted by scientific community as it offers an effective framework to prototype, modify and manage…

  10. Scheduling Multilevel Deadline-Constrained Scientific Workflows on Clouds Based on Cost Optimization

    DOE PAGESBeta

    Malawski, Maciej; Figiela, Kamil; Bubak, Marian; Deelman, Ewa; Nabrzyski, Jarek

    2015-01-01

    This paper presents a cost optimization model for scheduling scientific workflows on IaaS clouds such as Amazon EC2 or RackSpace. We assume multiple IaaS clouds with heterogeneous virtual machine instances, with a limited number of instances per cloud and hourly billing. Input and output data are stored on a cloud object store such as Amazon S3. Applications are scientific workflows modeled as DAGs as in the Pegasus Workflow Management System. We assume that tasks in the workflows are grouped into levels of identical tasks. Our model is specified using mathematical programming languages (AMPL and CMPL) and allows us to minimize the cost of workflow execution under deadline constraints. We present results obtained using our model and the benchmark workflows representing real scientific applications in a variety of domains. The data used for evaluation come from the synthetic workflows and from general purpose cloud benchmarks, as well as from the data measured in our own experiments with Montage, an astronomical application, executed on Amazon EC2 cloud. We indicate how this model can be used for scenarios that require resource planning for scientific workflows and their ensembles.

  11. Automating adjoint wave-equation travel-time tomography using scientific workflow

    NASA Astrophysics Data System (ADS)

    Zhang, Xiaofeng; Chen, Po; Pullammanappallil, Satish

    2013-10-01

    Recent advances in commodity high-performance computing technology have dramatically reduced the computational cost for solving the seismic wave equation in complex earth structure models. As a consequence, wave-equation-based seismic tomography techniques are being actively developed and gradually adopted in routine subsurface seismic imaging practices. Wave-equation travel-time tomography is a seismic tomography technique that inverts cross-correlation travel-time misfits using full-wave Fréchet kernels computed by solving the wave equation. This technique can be implemented very efficiently using the adjoint method, in which the misfits are back-propagated from the receivers (i.e., seismometers) to produce the adjoint wave-field and the interaction between the adjoint wave-field and the forward wave-field from the seismic source gives the gradient of the objective function. Once the gradient is available, a gradient-based optimization algorithm can then be adopted to produce an optimal earth structure model that minimizes the objective function. This methodology is conceptually straightforward, but its implementation in practical situations is highly complex, error-prone and computationally demanding. In this study, we demonstrate the feasibility of automating wave-equation travel-time tomography based on the adjoint method using Kepler, an open-source software package for designing, managing and executing scientific workflows. The workflow technology allows us to abstract away much of the complexity involved in the implementation in a manner that is both robust and scalable. Our automated adjoint wave-equation travel-time tomography package has been successfully applied on a real active-source seismic dataset.

  12. Quality Metadata Management for Geospatial Scientific Workflows: from Retrieving to Assessing with Online Tools

    NASA Astrophysics Data System (ADS)

    Leibovici, D. G.; Pourabdollah, A.; Jackson, M.

    2011-12-01

    Experts and decision-makers use or develop models to monitor global and local changes of the environment. Their activities require the combination of data and processing services in a flow of operations and spatial data computations: a geospatial scientific workflow. The seamless ability to generate, re-use and modify a geospatial scientific workflow is an important requirement, but the quality of outcomes is equally important [1]. Metadata information attached to the data and processes, and particularly their quality, is essential to assess the reliability of the scientific model that represents a workflow [2]. Modellers therefore require management tools that deal with qualitative and quantitative metadata measures of the quality associated with a workflow. To ensure interoperability, ISO and OGC standards [3] are to be adopted, allowing one, for example, to define metadata profiles and to retrieve them via web service interfaces. However, these standards need a few extensions when looking at workflows, particularly in the context of geoprocess metadata. We propose to fill this gap (i) first through the provision of a metadata profile for the quality of processes, and (ii) through providing a framework, based on XPDL [4], to manage the quality information. Web Processing Services are used to implement a range of metadata analyses on the workflow in order to evaluate and present quality information at different levels of the workflow. This generates the metadata quality, stored in the XPDL file. The focus is (a) on the visual representations of the quality, summarizing the retrieved quality information either from the standardized metadata profiles of the components or from non-standard quality information, e.g., Web 2.0 information, and (b) on the estimated qualities of the outputs derived from meta-propagation of uncertainties (a principle that we have introduced [5]). An a priori validation of the future decision-making supported by the

  13. The Live Access Server Scientific Product Generation Through Workflow Orchestration

    NASA Astrophysics Data System (ADS)

    Hankin, S.; Calahan, J.; Li, J.; Manke, A.; O'Brien, K.; Schweitzer, R.

    2006-12-01

    The Live Access Server (LAS) is a well-established Web application for display and analysis of geo-science data sets. The software, which can be downloaded and installed by anyone, gives data providers an easy way to establish services for their on-line data holdings, so their users can make plots; create and download data sub-sets; compare (difference) fields; and perform simple analyses. Now at version 7.0, LAS has been in operation since 1994. The current "Armstrong" release of LAS V7 consists of three components in a tiered architecture: user interface, workflow orchestration and Web Services. The LAS user interface (UI) communicates with the LAS Product Server via an XML protocol embedded in an HTTP "get" URL. Libraries (APIs) have been developed in Java, JavaScript and Perl that can readily generate this URL. As a result of this flexibility, it is common to find LAS user interfaces of radically different character, tailored to the nature of specific datasets or the mindset of specific users. When a request is received by the LAS Product Server (LPS, the workflow orchestration component), business logic converts this request into a series of Web Service requests invoked via SOAP. These "back-end" Web services perform data access and generate products (visualizations, data subsets, analyses, etc.). LPS then packages these outputs into final products (typically HTML pages) via Jakarta Velocity templates for delivery to the end user. "Fine grained" data access is performed by back-end services that may utilize JDBC for database access; the OPeNDAP "DAPPER" protocol; or (in principle) the OGC WFS protocol. Back-end visualization services are commonly legacy science applications wrapped in Java or Python (or Perl) classes and deployed as Web Services accessible via SOAP. Ferret is the default visualization application used by LAS, though other applications such as Matlab, CDAT, and GrADS can also be used. Other back-end services may include generation of Google

  14. Exploring Two Approaches for an End-to-End Scientific Analysis Workflow

    SciTech Connect

    Dodelson, Scott; Kent, Steve; Kowalkowski, Jim; Paterno, Marc; Sehrish, Saba

    2015-01-01

    The advance of the scientific discovery process is accomplished by the integration of independently-developed programs run on disparate computing facilities into coherent workflows usable by scientists who are not experts in computing. For such advancement, we need a system which scientists can use to formulate analysis workflows, to integrate new components to these workflows, and to execute different components on resources that are best suited to run those components. In addition, we need to monitor the status of the workflow as components get scheduled and executed, and to access the intermediate and final output for visual exploration and analysis. Finally, it is important for scientists to be able to share their workflows with collaborators. Moreover we have explored two approaches for such an analysis framework for the Large Synoptic Survey Telescope (LSST) Dark Energy Science Collaboration (DESC), the first one is based on the use and extension of Galaxy, a web-based portal for biomedical research, and the second one is based on a programming language, Python. In our paper, we present a brief description of the two approaches, describe the kinds of extensions to the Galaxy system we have found necessary in order to support the wide variety of scientific analysis in the cosmology community, and discuss how similar efforts might be of benefit to the HEP community.

  15. Exploring Two Approaches for an End-to-End Scientific Analysis Workflow

    DOE PAGESBeta

    Dodelson, Scott; Kent, Steve; Kowalkowski, Jim; Paterno, Marc; Sehrish, Saba

    2015-01-01

    The advance of the scientific discovery process is accomplished by the integration of independently-developed programs run on disparate computing facilities into coherent workflows usable by scientists who are not experts in computing. For such advancement, we need a system which scientists can use to formulate analysis workflows, to integrate new components to these workflows, and to execute different components on resources that are best suited to run those components. In addition, we need to monitor the status of the workflow as components get scheduled and executed, and to access the intermediate and final output for visual exploration and analysis. Finally, it is important for scientists to be able to share their workflows with collaborators. Moreover we have explored two approaches for such an analysis framework for the Large Synoptic Survey Telescope (LSST) Dark Energy Science Collaboration (DESC), the first one is based on the use and extension of Galaxy, a web-based portal for biomedical research, and the second one is based on a programming language, Python. In our paper, we present a brief description of the two approaches, describe the kinds of extensions to the Galaxy system we have found necessary in order to support the wide variety of scientific analysis in the cosmology community, and discuss how similar efforts might be of benefit to the HEP community.

  16. Exploring Two Approaches for an End-to-End Scientific Analysis Workflow

    NASA Astrophysics Data System (ADS)

    Dodelson, Scott; Kent, Steve; Kowalkowski, Jim; Paterno, Marc; Sehrish, Saba

    2015-12-01

    The scientific discovery process can be advanced by the integration of independently-developed programs run on disparate computing facilities into coherent workflows usable by scientists who are not experts in computing. For such advancement, we need a system which scientists can use to formulate analysis workflows, to integrate new components to these workflows, and to execute different components on resources that are best suited to run those components. In addition, we need to monitor the status of the workflow as components get scheduled and executed, and to access the intermediate and final output for visual exploration and analysis. Finally, it is important for scientists to be able to share their workflows with collaborators. We have explored two approaches for such an analysis framework for the Large Synoptic Survey Telescope (LSST) Dark Energy Science Collaboration (DESC); the first one is based on the use and extension of Galaxy, a web-based portal for biomedical research, and the second one is based on a programming language, Python. In this paper, we present a brief description of the two approaches, describe the kinds of extensions to the Galaxy system we have found necessary in order to support the wide variety of scientific analysis in the cosmology community, and discuss how similar efforts might be of benefit to the HEP community.

  17. A Comparison of Using Taverna and BPEL in Building Scientific Workflows: the case of caGrid

    PubMed Central

    Tan, Wei; Missier, Paolo; Foster, Ian; Madduri, Ravi; Goble, Carole

    2009-01-01

    With the emergence of “service oriented science,” the need arises to orchestrate multiple services to facilitate scientific investigation—that is, to create “science workflows.” We present here our findings in providing a workflow solution for the caGrid service-based grid infrastructure. We choose BPEL and Taverna as candidates, and compare their usability in the lifecycle of a scientific workflow, including workflow composition, execution, and result analysis. Our experience shows that BPEL as an imperative language offers a comprehensive set of modeling primitives for workflows of all flavors; while Taverna offers a dataflow model and a more compact set of primitives that facilitates dataflow modeling and pipelined execution. We hope that this comparison study not only helps researchers select a language or tool that meets their specific needs, but also offers some insight on how a workflow language and tool can fulfill the requirement of the scientific community. PMID:20625534

  18. Enhancing the Scientific Data Delivery, Workflow and Consumption

    NASA Astrophysics Data System (ADS)

    Shrestha, S. R.; Rosencrans, M.; Collow, T. W.; Ali, K.; Zimble, D. A.; Rose, B.

    2015-12-01

    To improve access to, and the usability and interoperability of, scientific data and products, NOAA offices such as the Climate Prediction Center (CPC) are exploring various geospatial solutions to serve their users. As NOAA scientists develop new solutions that drive the research and implementation to improve services, it is imperative that those research outcomes (data and products) can be consumed by customers and easily integrated into customer decision processes. As such, progress is being made to leverage an interoperable data platform wherein systems can integrate with each other to support the synthesis of climate and weather data. In this talk, we will share an ongoing use case at CPC, demonstrating how Esri technology is being implemented to improve scientific data access, manipulation, analysis, visualization and use.

  19. Scientific performance estimation of robustness and threat

    NASA Astrophysics Data System (ADS)

    Hoffman, John R.; Sorensen, Eric; Stelzig, Chad A.; Mahler, Ronald P. S.; El-Fallah, Adel I.; Alford, Mark G.

    2002-07-01

    For the last three years at this conference we have been describing the implementation of a unified, scientific approach to performance estimation for various aspects of data fusion: multitarget detection, tracking, and identification algorithms; sensor management algorithms; and adaptive data fusion algorithms. The proposed approach is based on finite-set statistics (FISST), a generalization of conventional statistics to multisource, multitarget problems. Finite-set statistics makes it possible to directly extend Shannon-type information metrics to multisource, multitarget problems in such a way that information can be defined and measured even though any given end-user may have conflicting or even subjective definitions of what informative means. In this presentation, we will show how to extend our previous results to two new problems. First, that of evaluating the robustness of multisensor, multitarget algorithms. Second, that of evaluating the performance of multisource-multitarget threat assessment algorithms.

  20. Chang'E-3 data pre-processing system based on scientific workflow

    NASA Astrophysics Data System (ADS)

    Tan, Xu; Liu, Jianjun; Wang, Yuanyuan; Yan, Wei; Zhang, Xiaoxia; Li, Chunlai

    2016-04-01

    The Chang'E-3 (CE3) mission has obtained a huge amount of lunar scientific data. Data pre-processing is an important segment of the CE3 ground research and application system. With a dramatic increase in the demand for data research and application, a Chang'E-3 data pre-processing system (CEDPS) based on scientific workflow is proposed, with the aim of making scientists more flexible and productive by automating data-driven processing. The system should allow the planning, conduct and control of the data processing procedure with the following capabilities: • describing a data processing task, which includes: 1) defining input and output data, 2) defining the data relationship, 3) defining the sequence of tasks, 4) defining the communication between tasks, 5) defining mathematical formulas, 6) defining the relationship between task and data; • automatic processing of tasks. Accordingly, how a task is described is the key to whether the system is flexible. We design a workflow designer, which is a visual environment for capturing processes as workflows, and discuss its three-level model: 1) The data relationship is established through a product tree. 2) The process model is constructed based on a directed acyclic graph (DAG). In particular, a set of process workflow constructs, including Sequence, Loop, Merge, and Fork, can be composed with one another. 3) To reduce the complexity of modeling the mathematical formulas using a DAG, semantic modeling based on MathML is adopted. In addition, we will present how the CE3 data were processed with CEDPS.

  1. A Run-time System for Efficient Execution of Scientific Workflows on Distributed Environments*

    PubMed Central

    Teodoro, George; Tavares, Tulio; Ferreira, Renato; Kurc, Tahsin; Meira, Wagner; Guedes, Dorgival; Pan, Tony; Saltz, Joel

    2012-01-01

    Scientific workflow systems have been introduced in response to the demand of researchers from several domains of science who need to process and analyze increasingly larger datasets. The design of these systems is largely based on the observation that data analysis applications can be composed as pipelines or networks of computations on data. In this work, we present a runtime support system that is designed to facilitate this type of computation in distributed computing environments. Our system is optimized for data-intensive workflows, in which efficient management and retrieval of data, coordination of data processing and data movement, and check-pointing of intermediate results are critical and challenging issues. Experimental evaluation of our system shows that linear speedups can be achieved for sophisticated applications, which are implemented as a network of multiple data processing components. PMID:22582009

  2. A Six‐Stage Workflow for Robust Application of Systems Pharmacology

    PubMed Central

    Gadkar, K; Kirouac, DC; Mager, DE; van der Graaf, PH

    2016-01-01

    Quantitative and systems pharmacology (QSP) is increasingly being applied in pharmaceutical research and development. One factor critical to the ultimate success of QSP is the establishment of commonly accepted language, technical criteria, and workflows. We propose an integrated workflow that bridges conceptual objectives with underlying technical detail to support the execution, communication, and evaluation of QSP projects. PMID:27299936

  3. An open source workflow for 3D printouts of scientific data volumes

    NASA Astrophysics Data System (ADS)

    Loewe, P.; Klump, J. F.; Wickert, J.; Ludwig, M.; Frigeri, A.

    2013-12-01

    As the amount of scientific data continues to grow, researchers need new tools to help them visualize complex data. Immersive data-visualisations are helpful, yet fail to provide tactile feedback and sensory feedback on spatial orientation, as provided from tangible objects. The gap in sensory feedback from virtual objects leads to the development of tangible representations of geospatial information to solve real world problems. Examples are animated globes [1], interactive environments like tangible GIS [2], and on demand 3D prints. The production of a tangible representation of a scientific data set is one step in a line of scientific thinking, leading from the physical world into scientific reasoning and back: The process starts with a physical observation, or from a data stream generated by an environmental sensor. This data stream is turned into a geo-referenced data set. This data is turned into a volume representation which is converted into command sequences for the printing device, leading to the creation of a 3D printout. As a last, but crucial step, this new object has to be documented and linked to the associated metadata, and curated in long term repositories to preserve its scientific meaning and context. The workflow to produce tangible 3D data-prints from science data at the German Research Centre for Geosciences (GFZ) was implemented as software based on the Free and Open Source Geoinformatics tools GRASS GIS and ParaView. The workflow was successfully validated in various application scenarios at GFZ using a RapMan printer to create 3D specimens of elevation models, geological underground models, ice penetrating radar soundings for planetology, and space time stacks for Tsunami model quality assessment. While these first pilot applications have demonstrated the feasibility of the overall approach [3], current research focuses on the provision of the workflow as Software as a Service (SAAS), thematic generalisation of information content and

  4. Cloud Bursting with GlideinWMS: Means to satisfy ever increasing computing needs for Scientific Workflows

    SciTech Connect

    Mhashilkar, Parag; Tiradani, Anthony; Holzman, Burt; Larson, Krista; Sfiligoi, Igor; Rynge, Mats

    2014-01-01

    Scientific communities have been at the forefront of adopting new technologies and methodologies in computing. Scientific computing has influenced how science is done today, achieving breakthroughs that were impossible to achieve several decades ago. For the past decade several such communities in the Open Science Grid (OSG) and the European Grid Infrastructure (EGI) have been using GlideinWMS to run complex application workflows to effectively share computational resources over the grid. GlideinWMS is a pilot-based workload management system (WMS) that creates, on demand, a dynamically sized overlay HTCondor batch system on grid resources. At present, the computational resources shared over the grid are just adequate to sustain the computing needs. We envision that the complexity of the science driven by 'Big Data' will further push the need for computational resources. To fulfill their increasing demands and/or to run specialized workflows, some of the big communities like CMS are investigating the use of cloud computing as Infrastructure-As-A-Service (IAAS) with GlideinWMS as a potential alternative to fill the void. Similarly, communities with no previous access to computing resources can use GlideinWMS to set up a batch system on the cloud infrastructure. To enable this, the architecture of GlideinWMS has been extended to enable support for interfacing GlideinWMS with different scientific and commercial cloud providers like HLT, FutureGrid, FermiCloud and Amazon EC2. In this paper, we describe a solution for cloud bursting with GlideinWMS. The paper describes the approach, architectural changes and lessons learned while enabling support for cloud infrastructures in GlideinWMS.

  5. Cloud Bursting with GlideinWMS: Means to satisfy ever increasing computing needs for Scientific Workflows

    NASA Astrophysics Data System (ADS)

    Mhashilkar, Parag; Tiradani, Anthony; Holzman, Burt; Larson, Krista; Sfiligoi, Igor; Rynge, Mats

    2014-06-01

    Scientific communities have been at the forefront of adopting new technologies and methodologies in computing. Scientific computing has influenced how science is done today, achieving breakthroughs that were impossible to achieve several decades ago. For the past decade several such communities in the Open Science Grid (OSG) and the European Grid Infrastructure (EGI) have been using GlideinWMS to run complex application workflows to effectively share computational resources over the grid. GlideinWMS is a pilot-based workload management system (WMS) that creates, on demand, a dynamically sized overlay HTCondor batch system on grid resources. At present, the computational resources shared over the grid are just adequate to sustain the computing needs. We envision that the complexity of the science driven by "Big Data" will further push the need for computational resources. To fulfill their increasing demands and/or to run specialized workflows, some of the big communities like CMS are investigating the use of cloud computing as Infrastructure-as-a-Service (IaaS) with GlideinWMS as a potential alternative to fill the void. Similarly, communities with no previous access to computing resources can use GlideinWMS to set up a batch system on the cloud infrastructure. To enable this, the architecture of GlideinWMS has been extended to support interfacing GlideinWMS with different scientific and commercial cloud providers like HLT, FutureGrid, FermiCloud and Amazon EC2. In this paper, we describe a solution for cloud bursting with GlideinWMS. The paper describes the approach, architectural changes and lessons learned while enabling support for cloud infrastructures in GlideinWMS.

  6. The Virtual Geophysics Laboratory (VGL): Scientific Workflows Operating Across Organizations and Across Infrastructures

    NASA Astrophysics Data System (ADS)

    Cox, S. J.; Wyborn, L. A.; Fraser, R.; Rankine, T.; Woodcock, R.; Vote, J.; Evans, B.

    2012-12-01

    The Virtual Geophysics Laboratory (VGL) is a web portal that provides geoscientists with an integrated online environment that: seamlessly accesses geophysical and geoscience data services from the AuScope national geoscience information infrastructure; loosely couples these data to a variety of geoscience software tools; and provides large scale processing facilities via cloud computing. VGL is a collaboration between CSIRO, Geoscience Australia, National Computational Infrastructure, Monash University, Australian National University and the University of Queensland. The VGL provides a distributed system whereby a user can enter an online virtual laboratory to seamlessly connect to OGC web services for geoscience data. The data is supplied in open standards formats using international standards like GeoSciML. A VGL user uses a web mapping interface to discover and filter the data sources using spatial and attribute filters to define a subset. Once the data is selected, the user is not required to download the data. VGL collates the service query information for later in the processing workflow, where it will be staged directly to the computing facilities. The combination of deferring data download and access to Cloud computing enables VGL users to access their data at higher resolutions and to undertake larger scale inversions, more complex models and simulations than their own local computing facilities might allow. Inside the Virtual Geophysics Laboratory, the user has access to a library of existing models, complete with exemplar workflows for specific scientific problems based on those models. For example, the user can load a geological model published by Geoscience Australia, apply a basic deformation workflow provided by a CSIRO scientist, and have it run in a scientific code from Monash. Finally, the user can publish these results to share with a colleague or cite in a paper.
This opens new opportunities for access and collaboration as all the resources (models

  7. Nationwide Buildings Energy Research enabled through an integrated Data Intensive Scientific Workflow and Advanced Analysis Environment

    SciTech Connect

    Kleese van Dam, Kerstin; Lansing, Carina S.; Elsethagen, Todd O.; Hathaway, John E.; Guillen, Zoe C.; Dirks, James A.; Skorski, Daniel C.; Stephan, Eric G.; Gorrissen, Willy J.; Gorton, Ian; Liu, Yan

    2014-01-28

    Modern workflow systems enable scientists to run ensemble simulations at unprecedented scales and levels of complexity, allowing them to study system sizes previously impossible to achieve, due to the inherent resource requirements needed for the modeling work. However, as a result of these new capabilities, the science teams suddenly also face unprecedented data volumes that they are unable to analyze with their existing tools and methodologies in a timely fashion. In this paper we describe the ongoing development work to create an integrated data intensive scientific workflow and analysis environment that offers researchers the ability to easily create and execute complex simulation studies and provides them with different scalable methods to analyze the resulting data volumes. The integration of simulation and analysis environments is not only a question of ease of use, but also supports fundamental functions in the correlated analysis of simulation input, execution details and derived results for multi-variant, complex studies. To this end the team extended and integrated the existing capabilities of the Velo data management and analysis infrastructure, the MeDICi data intensive workflow system, and RHIPE (the R for Hadoop version of the well-known statistics package), and developed a new visual analytics interface for result exploitation by multi-domain users. The capabilities of the new environment are demonstrated on a use case that focusses on the Pacific Northwest National Laboratory (PNNL) building energy team, showing how they were able to take their previously local scale simulations to a nationwide level by utilizing data intensive computing techniques not only for their modeling work, but also for the subsequent analysis of their modeling results.
As part of the PNNL research initiative PRIMA (Platform for Regional Integrated Modeling and Analysis) the team performed an initial 3 year study of building energy demands for the US Eastern

  8. LiSIs: An Online Scientific Workflow System for Virtual Screening.

    PubMed

    Kannas, Christos C; Kalvari, Ioanna; Lambrinidis, George; Neophytou, Christiana M; Savva, Christiana G; Kirmitzoglou, Ioannis; Antoniou, Zinonas; Achilleos, Kleo G; Scherf, David; Pitta, Chara A; Nicolaou, Christos A; Mikros, Emanuel; Promponas, Vasilis J; Gerhauser, Clarissa; Mehta, Rajendra G; Constantinou, Andreas I; Pattichis, Constantinos S

    2015-01-01

    Modern methods of drug discovery and development in recent years make wide use of computational algorithms. These methods utilise Virtual Screening (VS), which is the computational counterpart of experimental screening. In this manner the in silico models and tools initially replace the wet lab methods, saving time and resources. This paper presents the overall design and implementation of a web based scientific workflow system for virtual screening called the Life Sciences Informatics (LiSIs) platform. The LiSIs platform consists of the following layers: the input layer covering the data file input; the pre-processing layer covering the descriptors calculation and the docking preparation components; the processing layer covering the attribute filtering, compound similarity, substructure matching, docking prediction, predictive modelling and molecular clustering; the post-processing layer covering the output reformatting and binary file merging components; and the output layer covering the storage component. The potential of the LiSIs platform has been demonstrated through two case studies designed to illustrate the preparation of tools for the identification of promising chemical structures. The first case study involved the development of a Quantitative Structure Activity Relationship (QSAR) model on a literature dataset while the second case study implemented a docking-based virtual screening experiment. Our results show that VS workflows utilizing docking, predictive models and other in silico tools as implemented in the LiSIs platform can identify compounds in line with expert expectations. We anticipate that the deployment of LiSIs, as currently implemented and available for use, can enable drug discovery researchers to more easily use state of the art computational techniques in their search for promising chemical compounds. The LiSIs platform is freely accessible (i) under the GRANATUM platform at: http://www.granatum.org and (ii) directly at: http

  9. The TimeStudio Project: An open source scientific workflow system for the behavioral and brain sciences.

    PubMed

    Nyström, Pär; Falck-Ytter, Terje; Gredebäck, Gustaf

    2016-06-01

    This article describes a new open source scientific workflow system, the TimeStudio Project, dedicated to the behavioral and brain sciences. The program is written in MATLAB and features a graphical user interface for the dynamic pipelining of computer algorithms developed as TimeStudio plugins. TimeStudio includes both a set of general plugins (for reading data files, modifying data structures, visualizing data structures, etc.) and a set of plugins specifically developed for the analysis of event-related eyetracking data as a proof of concept. It is possible to create custom plugins to integrate new or existing MATLAB code anywhere in a workflow, making TimeStudio a flexible workbench for organizing and performing a wide range of analyses. The system also features an integrated sharing and archiving tool for TimeStudio workflows, which can be used to share workflows both during the data analysis phase and after scientific publication. TimeStudio thus facilitates the reproduction and replication of scientific studies, increases the transparency of analyses, and reduces individual researchers' analysis workload. The project website ( http://timestudioproject.com ) contains the latest releases of TimeStudio, together with documentation and user forums.

  10. The TimeStudio Project: An open source scientific workflow system for the behavioral and brain sciences.

    PubMed

    Nyström, Pär; Falck-Ytter, Terje; Gredebäck, Gustaf

    2016-06-01

    This article describes a new open source scientific workflow system, the TimeStudio Project, dedicated to the behavioral and brain sciences. The program is written in MATLAB and features a graphical user interface for the dynamic pipelining of computer algorithms developed as TimeStudio plugins. TimeStudio includes both a set of general plugins (for reading data files, modifying data structures, visualizing data structures, etc.) and a set of plugins specifically developed for the analysis of event-related eyetracking data as a proof of concept. It is possible to create custom plugins to integrate new or existing MATLAB code anywhere in a workflow, making TimeStudio a flexible workbench for organizing and performing a wide range of analyses. The system also features an integrated sharing and archiving tool for TimeStudio workflows, which can be used to share workflows both during the data analysis phase and after scientific publication. TimeStudio thus facilitates the reproduction and replication of scientific studies, increases the transparency of analyses, and reduces individual researchers' analysis workload. The project website ( http://timestudioproject.com ) contains the latest releases of TimeStudio, together with documentation and user forums. PMID:26170051

  11. A Practitioner Friendly and Scientifically Robust Training Evaluation Approach

    ERIC Educational Resources Information Center

    Griffin, Richard

    2012-01-01

    Purpose: This article seeks to review the current state of workplace learning evaluation, to set out the rationale for evaluation along with the barriers that practitioners face when seeking to assess the effectiveness of training and development. Finally, it aims to propose a scientifically robust and practitioner friendly approach to evaluation.…

  12. A framework for integration of scientific applications into the OpenTopography workflow

    NASA Astrophysics Data System (ADS)

    Nandigam, V.; Crosby, C.; Baru, C.

    2012-12-01

    The NSF-funded OpenTopography facility provides online access to Earth science-oriented high-resolution LIDAR topography data, online processing tools, and derivative products. The underlying cyberinfrastructure employs a multi-tier service oriented architecture that is comprised of an infrastructure tier, a processing services tier, and an application tier. The infrastructure tier consists of storage and compute resources as well as supporting databases. The services tier consists of the set of processing routines, each deployed as a Web service. The applications tier provides client interfaces to the system (e.g., the Portal). We propose a "pluggable" infrastructure design that will allow new scientific algorithms and processing routines developed and maintained by the community to be integrated into the OpenTopography system so that the wider earth science community can benefit from their availability. All core components in OpenTopography are available as Web services using a customized open-source Opal toolkit. The Opal toolkit provides mechanisms to manage and track job submissions, with the help of a back-end database. It allows monitoring of job and system status by providing charting tools. All core components in OpenTopography have been developed, maintained and wrapped as Web services using Opal by OpenTopography developers. However, as the scientific community develops new processing and analysis approaches, this integration approach does not scale efficiently. Most of the new scientific applications will have their own active development teams performing regular updates, maintenance and other improvements. It would be optimal to have the application co-located where its developers can continue to actively work on it while still making it accessible within the OpenTopography workflow for processing capabilities. We will utilize a software framework for remote integration of these scientific applications into the OpenTopography system.
This will be accomplished by

  13. ScyFlow: An Environment for the Visual Specification and Execution of Scientific Workflows

    NASA Technical Reports Server (NTRS)

    McCann, Karen M.; Yarrow, Maurice; DeVivo, Adrian; Mehrotra, Piyush

    2004-01-01

    With the advent of grid technologies, scientists and engineers are building more and more complex applications to utilize distributed grid resources. The core grid services provide a path for accessing and utilizing these resources in a secure and seamless fashion. However, what the scientists need is an environment that will allow them to specify their application runs at a high organizational level, and then support efficient execution across any given set or sets of resources. We have been designing and implementing ScyFlow, a dual-interface architecture (both GUI and API) that addresses this problem. The scientist/user specifies the application tasks along with the necessary control and data flow, and monitors and manages the execution of the resulting workflow across the distributed resources. In this paper, we utilize two scenarios to provide the details of the two modules of the project, the visual editor and the runtime workflow engine.

  14. Facilitating Scientific Research through Workflows and Provenance on the DataONE Cyberinfrastructure (Invited)

    NASA Astrophysics Data System (ADS)

    Ludaescher, B.; Cuevas-Vicenttín, V.; Missier, P.; Dey, S.; Kianmajd, P.; Wei, Y.; Koop, D.; Chirigati, F.; Altintas, I.; Belhajjame, K.; Bowers, S.

    2013-12-01

    Provenance data has numerous applications in science. Two key ones are 1) replication: facilitate the repeatable derivation of results and 2) discovery: enable the location of data based on processing history and derivation relationships. The following scenario illustrates a typical use of provenance data. Alice, a climate scientist, has developed a VisTrails workflow to prepare Gross Primary Productivity (GPP) data. After verifying that the workflow generates data in the desired form, she uses the ReproZip tool to create a reproducible package that will enable other scientists to re-run the workflow without having to install and configure the particular libraries she is using. In addition, she exports the provenance information of the workflow execution and customizes it through a tool such as the ProvExplorer, in order to eliminate the information she regards as superfluous. She then creates and shares a DataONE data package containing the data she prepared, the ReproZip package, the customized provenance, and additional science/system metadata. Both the customized provenance and metadata are indexed by the DataONE Cyberinfrastructure (CI) for discovery purposes. Bob, another climate scientist, is looking for benchmark GPP data to validate the Terrestrial Biosphere Model (TBM) he has developed. Searching the DataONE repository, he finds Alice's data package. He retrieves its ReproZip package, customizes it (e.g. changing the spatial resolution), and re-runs it to generate the benchmark data in the form he desires. The newly generated data is then used as input for his own model evaluation workflow. His workflow generates residual maps and a Taylor diagram that enable him to evaluate the similarity between the results of his model and the benchmark data. At this point, Bob can also make use of the tools Alice used to publish his results as another discoverable and reproducible data package. In order to support these capabilities, we propose to extend the Data

  15. A Classroom-Based Distributed Workflow Initiative for the Early Involvement of Undergraduate Students in Scientific Research

    NASA Astrophysics Data System (ADS)

    Friedrich, Jon M.

    2013-05-01

    Engaging freshman and sophomore students in meaningful scientific research is challenging because of their developing skill set and their necessary time commitments to regular classwork. A project called the Chondrule Analysis Project was initiated to engage first- and second-year students in an initial research experience and also accomplish several scientific objectives. Students take part in a classroom-based, distributed workflow project that aims to produce high-quality data on the physical dimensions of chondrules, mm-sized spherules contained in primitive meteorites called chondrites. Such data are needed to test astrophysical models for processes acting in the early solar system. Student investigators process X-ray microtomography data with resources contained on portable USB flash drives distributed to them. Students are exposed to data collection, data quality evaluation, interpretation, and presentation of their results. Herein, an introduction to the scientific objectives is given along with an evolutionary history of the project. A description of the current implementation of the course is presented, and future directions are discussed. Anonymous student evaluations of the course are used to demonstrate the educational and engaging nature of the project. Finally, we reflect on the possible benefits of such a project for first- and second-year students within STEM disciplines.

  16. Processes in scientific workflows for information seeking related to physical sample materials

    NASA Astrophysics Data System (ADS)

    Ramdeen, S.

    2014-12-01

    The majority of State Geological Surveys have repositories containing cores, cuttings, fossils or other physical sample material. State surveys maintain these collections to support their own research as well as the research conducted by external users from other organizations. This includes organizations such as government agencies (state and federal), academia, industry and the public. The preliminary results presented in this paper will look at the research processes of these external users. In particular: how they discover, access and use digital surrogates, which they use to evaluate and access physical items in these collections. Data such as physical samples are materials that cannot be completely replaced with digital surrogates. Digital surrogates may be represented as metadata, which enable discovery and ultimately access to these samples. These surrogates may be found in records, databases, publications, etc. But surrogates do not completely prevent the need for access to the physical item as they cannot be subjected to chemical testing and/or other similar analysis. The goal of this research is to document the various processes external users perform in order to access physical materials. Data for this study will be collected by conducting interviews with these external users. During the interviews, participants will be asked to describe the workflow that led them to interact with state survey repositories, and what steps they took afterward. High level processes/categories of behavior will be identified. These processes will be used in the development of an information seeking behavior model. This model may be used to facilitate the development of management tools and other aspects of cyberinfrastructure related to physical samples.

  17. The application of cloud computing to scientific workflows: a study of cost and performance.

    PubMed

    Berriman, G Bruce; Deelman, Ewa; Juve, Gideon; Rynge, Mats; Vöckler, Jens-S

    2013-01-28

    The current model of transferring data from data centres to desktops for analysis will soon be rendered impractical by the accelerating growth in the volume of science datasets. Processing will instead often take place on high-performance servers co-located with data. Evaluations of how new technologies such as cloud computing would support such a new distributed computing model are urgently needed. Cloud computing is a new way of purchasing computing and storage resources on demand through virtualization technologies. We report here the results of investigations of the applicability of commercial cloud computing to scientific computing, with an emphasis on astronomy, including investigations of what types of applications can be run cheaply and efficiently on the cloud, and an example of an application well suited to the cloud: processing a large dataset to create a new science product.

  18. Scientific workflow and support for high resolution global climate modeling at the Oak Ridge Leadership Computing Facility

    NASA Astrophysics Data System (ADS)

    Anantharaj, V.; Mayer, B.; Wang, F.; Hack, J.; McKenna, D.; Hartman-Baker, R.

    2012-04-01

    The Oak Ridge Leadership Computing Facility (OLCF) facilitates the execution of computational experiments that require tens of millions of CPU hours (typically using thousands of processors simultaneously) while generating hundreds of terabytes of data. A set of ultra high resolution climate experiments in progress, using the Community Earth System Model (CESM), will produce over 35,000 files, ranging in size from 21 MB to 110 GB each. The execution of the experiments will require nearly 70 million CPU hours on the Jaguar and Titan supercomputers at OLCF. The total volume of the output from these climate modeling experiments will be in excess of 300 TB. This model output must then be archived, analyzed, distributed to the project partners in a timely manner, and also made available more broadly. Meeting this challenge would require efficient movement of the data, staging the simulation output to a large and fast file system that provides high volume access to other computational systems used to analyze the data and synthesize results. This file system also needs to be accessible via high speed networks to an archival system that can provide long term reliable storage. Ideally this archival system is itself directly available to other systems that can be used to host services making the data and analysis available to the participants in the distributed research project and to the broader climate community. The various resources available at the OLCF now support this workflow. The available systems include the new Jaguar Cray XK6 2.63 petaflops (estimated) supercomputer, the 10 PB Spider center-wide parallel file system, the Lens/EVEREST analysis and visualization system, the HPSS archival storage system, the Earth System Grid (ESG), and the ORNL Climate Data Server (CDS). The ESG features federated services, search & discovery, extensive data handling capabilities, deep storage access, and Live Access Server (LAS) integration. The scientific workflow enabled on

  19. Agile parallel bioinformatics workflow management using Pwrake

    PubMed Central

    2011-01-01

    Background: In bioinformatics projects, scientific workflow systems are widely used to manage computational procedures. Full-featured workflow systems have been proposed to fulfil the demand for workflow management. However, such systems tend to be over-weighted for actual bioinformatics practices. We realize that quick deployment of cutting-edge software implementing advanced algorithms and data formats, and continuous adaptation to changes in computational resources and the environment are often prioritized in scientific workflow management. These features have a greater affinity with the agile software development method through iterative development phases after trial and error. Here, we show the application of a scientific workflow system Pwrake to bioinformatics workflows. Pwrake is a parallel workflow extension of Ruby's standard build tool Rake, the flexibility of which has been demonstrated in the astronomy domain. Therefore, we hypothesize that Pwrake also has advantages in actual bioinformatics workflows. Findings: We implemented the Pwrake workflows to process next generation sequencing data using the Genome Analysis Toolkit (GATK) and Dindel. GATK and Dindel workflows are typical examples of sequential and parallel workflows, respectively. We found that in practice, actual scientific workflow development iterates over two phases, the workflow definition phase and the parameter adjustment phase. We introduced separate workflow definitions to help focus on each of the two developmental phases, as well as helper methods to simplify the descriptions. This approach increased iterative development efficiency. Moreover, we implemented combined workflows to demonstrate modularity of the GATK and Dindel workflows. Conclusions: Pwrake enables agile management of scientific workflows in the bioinformatics domain. The internal domain specific language design built on Ruby gives the flexibility of rakefiles for writing scientific workflows.
Furthermore, readability

  20. Correctness issues in workflow management

    NASA Astrophysics Data System (ADS)

    Kamath, Mohan; Ramamritham, Krithi

    1996-12-01

    Workflow management is a technique to integrate and automate the execution of steps that comprise a complex process, e.g., a business process. Workflow management systems (WFMSs) primarily evolved from industry to cater to the growing demand for office automation tools among businesses. Coincidentally, database researchers developed several extended transaction models to handle similar applications. Although the goals of both the communities were the same, the issues they focused on were different. The workflow community primarily focused on modelling aspects to accurately capture the data and control flow requirements between the steps that comprise a workflow, while the database community focused on correctness aspects to ensure data consistency of sub-transactions that comprise a transaction. However, we now see a confluence of some of the ideas, with additional features being gradually offered by WFMSs. This paper provides an overview of correctness in workflow management. Correctness is an important aspect of WFMSs and a proper understanding of the available concepts and techniques by WFMS developers and workflow designers will help in building workflows that are flexible enough to capture the requirements of real world applications and robust enough to provide the necessary correctness and reliability properties. We first enumerate the correctness issues that have to be considered to ensure data consistency. Then we survey techniques that have been proposed or are being used in WFMSs for ensuring correctness of workflows. These techniques emerge from the areas of workflow management, extended transaction models, multidatabases and transactional workflows. Finally, we present some open issues related to correctness of workflows in the presence of concurrency and failures.

  1. The pipeline system for Octave and Matlab (PSOM): a lightweight scripting framework and execution engine for scientific workflows.

    PubMed

    Bellec, Pierre; Lavoie-Courchesne, Sébastien; Dickinson, Phil; Lerch, Jason P; Zijdenbos, Alex P; Evans, Alan C

    2012-01-01

    The analysis of neuroimaging databases typically involves a large number of inter-connected steps called a pipeline. The pipeline system for Octave and Matlab (PSOM) is a flexible framework for the implementation of pipelines in the form of Octave or Matlab scripts. PSOM does not introduce new language constructs to specify the steps and structure of the workflow. All steps of analysis are instead described by a regular Matlab data structure, documenting their associated command and options, as well as their input, output, and cleaned-up files. The PSOM execution engine provides a number of automated services: (1) it executes jobs in parallel on a local computing facility as long as the dependencies between jobs allow for it and sufficient resources are available; (2) it generates a comprehensive record of the pipeline stages and the history of execution, which is detailed enough to fully reproduce the analysis; (3) if an analysis is started multiple times, it executes only the parts of the pipeline that need to be reprocessed. PSOM is distributed under an open-source MIT license and can be used without restriction for academic or commercial projects. The package has no external dependencies besides Matlab or Octave, is straightforward to install and supports a variety of operating systems (Linux, Windows, Mac). We ran several benchmark experiments on a public database including 200 subjects, using a pipeline for the preprocessing of functional magnetic resonance images (fMRI). The benchmark results showed that PSOM is a powerful solution for the analysis of large databases using local or distributed computing resources.
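
    The idea of describing each step as plain data, and rerunning only what needs reprocessing, can be illustrated with a minimal Python sketch. This is not PSOM code and does not use the PSOM API: the field names (command, files_in, files_out), the function names, and the example jobs are all hypothetical, and the staleness rule (a job is rerun if its description changed or if a job it depends on is rerun) is an assumption about what such an engine might check.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def dependencies(pipeline):
    """Map each job name to the set of jobs that produce one of its inputs."""
    producer = {
        path: name for name, job in pipeline.items() for path in job["files_out"]
    }
    return {
        name: {producer[path] for path in job["files_in"] if path in producer}
        for name, job in pipeline.items()
    }


def jobs_to_run(pipeline, previous):
    """Return, in dependency order, the jobs that must be (re)processed.

    `previous` is the pipeline description recorded by the last run
    (an empty dict on the first run).
    """
    deps = dependencies(pipeline)
    stale = []
    # static_order() yields every job after the jobs it depends on.
    for name in TopologicalSorter(deps).static_order():
        changed = previous.get(name) != pipeline[name]
        if changed or any(dep in stale for dep in deps[name]):
            stale.append(name)
    return stale


# Hypothetical three-step pipeline; file and command names are made up.
pipeline = {
    "motion": {"command": "correct_motion",
               "files_in": ["raw.nii"], "files_out": ["motion.nii"]},
    "smooth": {"command": "smooth fwhm=6",
               "files_in": ["motion.nii"], "files_out": ["smooth.nii"]},
    "mask": {"command": "make_mask",
             "files_in": ["raw.nii"], "files_out": ["mask.nii"]},
}

if __name__ == "__main__":
    print(jobs_to_run(pipeline, previous={}))  # first run: every job is listed
```

    Because dependencies are inferred from file names alone, jobs that share no files (here "motion" and "mask") have no ordering constraint between them and could be dispatched in parallel.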

  2. The pipeline system for Octave and Matlab (PSOM): a lightweight scripting framework and execution engine for scientific workflows.

    PubMed

    Bellec, Pierre; Lavoie-Courchesne, Sébastien; Dickinson, Phil; Lerch, Jason P; Zijdenbos, Alex P; Evans, Alan C

    2012-01-01

    The analysis of neuroimaging databases typically involves a large number of inter-connected steps called a pipeline. The pipeline system for Octave and Matlab (PSOM) is a flexible framework for the implementation of pipelines in the form of Octave or Matlab scripts. PSOM does not introduce new language constructs to specify the steps and structure of the workflow. All steps of analysis are instead described by a regular Matlab data structure, documenting their associated command and options, as well as their input, output, and cleaned-up files. The PSOM execution engine provides a number of automated services: (1) it executes jobs in parallel on a local computing facility as long as the dependencies between jobs allow for it and sufficient resources are available; (2) it generates a comprehensive record of the pipeline stages and the history of execution, which is detailed enough to fully reproduce the analysis; (3) if an analysis is started multiple times, it executes only the parts of the pipeline that need to be reprocessed. PSOM is distributed under an open-source MIT license and can be used without restriction for academic or commercial projects. The package has no external dependencies besides Matlab or Octave, is straightforward to install and supports a variety of operating systems (Linux, Windows, Mac). We ran several benchmark experiments on a public database including 200 subjects, using a pipeline for the preprocessing of functional magnetic resonance images (fMRI). The benchmark results showed that PSOM is a powerful solution for the analysis of large databases using local or distributed computing resources. PMID:22493575

  3. Robustness

    NASA Technical Reports Server (NTRS)

    Ryan, R.

    1993-01-01

    Robustness is a buzzword common to all newly proposed space systems designs as well as many new commercial products. The image that one conjures up when the word appears is a 'Paul Bunyan' (lumberjack) design, strong and hearty; healthy with margins in all aspects of the design. In actuality, robustness is much broader in scope than margins, including such factors as simplicity, redundancy, desensitization to parameter variations, control of parameter variations (environmental fluctuation), and operational approaches. These must be traded with concepts, materials, and fabrication approaches against the criteria of performance, cost, and reliability. This includes manufacturing, assembly, processing, checkout, and operations. The design engineer or project chief is faced with finding ways and means to inculcate robustness into an operational design. First, however, the engineer must understand the definition and goals of robustness. This paper will deal with these issues as well as the need for a requirement for robustness.

  4. A Classroom-Based Distributed Workflow Initiative for the Early Involvement of Undergraduate Students in Scientific Research

    ERIC Educational Resources Information Center

    Friedrich, Jon M.

    2014-01-01

    Engaging freshman and sophomore students in meaningful scientific research is challenging because of their developing skill set and their necessary time commitments to regular classwork. A project called the Chondrule Analysis Project was initiated to engage first- and second-year students in an initial research experience and also accomplish…

  5. Scientist-Centered Workflow Abstractions via Generic Actors, Workflow Templates, and Context-Awareness for Groundwater Modeling and Analysis

    SciTech Connect

    Chin, George; Sivaramakrishnan, Chandrika; Critchlow, Terence J.; Schuchardt, Karen L.; Ngu, Anne Hee Hiong

    2011-07-04

    A drawback of existing scientific workflow systems is the lack of support to domain scientists in designing and executing their own scientific workflows. Many domain scientists avoid developing and using workflows because the basic objects of workflows are too low-level and high-level tools and mechanisms to aid in workflow construction and use are largely unavailable. In our research, we are prototyping higher-level abstractions and tools to better support scientists in their workflow activities. Specifically, we are developing generic actors that provide abstract interfaces to specific functionality, workflow templates that encapsulate workflow and data patterns that can be reused and adapted by scientists, and context-awareness mechanisms to gather contextual information from the workflow environment on behalf of the scientist. To evaluate these scientist-centered abstractions on real problems, we apply them to construct and execute scientific workflows in the specific domain area of groundwater modeling and analysis.

  6. Multi-objective approach for energy-aware workflow scheduling in cloud computing environments.

    PubMed

    Yassa, Sonia; Chelouah, Rachid; Kadima, Hubert; Granado, Bertrand

    2013-01-01

    We address the problem of scheduling workflow applications on heterogeneous computing systems like cloud computing infrastructures. In general, the cloud workflow scheduling is a complex optimization problem which requires considering different criteria so as to meet a large number of QoS (Quality of Service) requirements. Traditional research in workflow scheduling mainly focuses on the optimization constrained by time or cost without paying attention to energy consumption. The main contribution of this study is to propose a new approach for multi-objective workflow scheduling in clouds, and present the hybrid PSO algorithm to optimize the scheduling performance. Our method is based on the Dynamic Voltage and Frequency Scaling (DVFS) technique to minimize energy consumption. This technique allows processors to operate in different voltage supply levels by sacrificing clock frequencies. This multiple voltage involves a compromise between the quality of schedules and energy. Simulation results on synthetic and real-world scientific applications highlight the robust performance of the proposed approach.
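
    The trade-off described above, lower supply voltage and clock frequency in exchange for lower energy, can be sketched with the common CMOS approximation for dynamic power, P = C * V^2 * f. This is a generic illustration, not the energy model or the hybrid PSO algorithm of the paper: the capacitance constant, the voltage/frequency levels, and both function names are assumptions made for the example.

```python
def task_time_and_energy(cycles, voltage, frequency_hz, capacitance=1e-9):
    """Return (runtime in s, dynamic energy in J) for one task at one operating point.

    Uses the approximation P = C * V^2 * f; `capacitance` is an assumed constant.
    """
    runtime = cycles / frequency_hz
    power = capacitance * voltage ** 2 * frequency_hz
    return runtime, power * runtime


# Hypothetical (voltage in V, frequency in Hz) operating points, fastest first.
LEVELS = [(1.2, 2.0e9), (1.0, 1.5e9), (0.8, 1.0e9)]


def cheapest_level_within(cycles, deadline_s, levels=LEVELS):
    """Pick the lowest-energy level whose runtime meets the deadline, or None.

    Returns a tuple (energy_j, voltage, frequency_hz).
    """
    feasible = []
    for voltage, frequency_hz in levels:
        runtime, energy = task_time_and_energy(cycles, voltage, frequency_hz)
        if runtime <= deadline_s:
            feasible.append((energy, voltage, frequency_hz))
    return min(feasible) if feasible else None
```

    Under this model the energy of a task depends on the square of the voltage, while its runtime grows as the frequency drops, so a scheduler can only scale down tasks that have slack before their deadline. That is the compromise between schedule quality and energy that the abstract refers to.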

  7. Implementing bioinformatic workflows within the bioextract server

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Computational workflows in bioinformatics are becoming increasingly important in the achievement of scientific advances. These workflows typically require the integrated use of multiple, distributed data sources and analytic tools. The BioExtract Server (http://bioextract.org) is a distributed servi...

  8. VO-compliant workflows and science gateways

    NASA Astrophysics Data System (ADS)

    Castelli, G.; Taffoni, G.; Sciacca, E.; Becciani, U.; Costa, A.; Krokos, M.; Pasian, F.; Vuerli, C.

    2015-06-01

    Workflow and science gateway technologies have been adopted by scientific communities as a valuable tool to carry out complex experiments. They offer the possibility to perform computations for data analysis and simulations, while hiding details of the complex infrastructures underneath. There are many workflow management systems covering a large variety of generic services coordinating execution of workflows. In this paper we describe our experiences in creating workflow-oriented science gateways based on gUSE/WS-PGRADE technology and in particular we discuss the efforts devoted to developing a VO-compliant web environment.

  9. A framework for streamlining research workflow in neuroscience and psychology

    PubMed Central

    Kubilius, Jonas

    2014-01-01

    Successful accumulation of knowledge is critically dependent on the ability to verify and replicate every part of scientific conduct. However, such principles are difficult to enact when researchers continue to resort to ad-hoc workflows and poorly maintained code bases. In this paper I examine the needs of the neuroscience and psychology community, and introduce psychopy_ext, a unifying framework that seamlessly integrates popular experiment building, analysis and manuscript preparation tools by choosing reasonable defaults and implementing relatively rigid patterns of workflow. This structure allows for automation of multiple tasks, such as generated user interfaces, unit testing, control analyses of stimuli, single-command access to descriptive statistics, and publication quality plotting. Taken together, psychopy_ext opens an exciting possibility for faster, more robust code development and collaboration for researchers. PMID:24478691

  10. SHIWA Services for Workflow Creation and Sharing in Hydrometeorolog

    NASA Astrophysics Data System (ADS)

    Terstyanszky, Gabor; Kiss, Tamas; Kacsuk, Peter; Sipos, Gergely

    2014-05-01

    Researchers want to run scientific experiments on Distributed Computing Infrastructures (DCI) to access large pools of resources and services. Running these experiments requires specific expertise that they may not have. Workflows can hide resources and services as a virtualisation layer providing a user interface that researchers can use. There are many scientific workflow systems but they are not interoperable. Learning a workflow system and creating workflows may require significant effort. Considering this effort, it is not reasonable to expect that researchers will learn new workflow systems if they want to run workflows developed in other workflow systems. Overcoming this requires creating workflow interoperability solutions to allow workflow sharing. The FP7 'Sharing Interoperable Workflow for Large-Scale Scientific Simulation on Available DCIs' (SHIWA) project developed the Coarse-Grained Interoperability concept (CGI). It enables recycling and sharing workflows of different workflow systems and executing them on different DCIs. SHIWA developed the SHIWA Simulation Platform (SSP) to implement the CGI concept integrating three major components: the SHIWA Science Gateway, the workflow engines supported by the CGI concept and DCI resources where workflows are executed. The science gateway contains a portal, a submission service, a workflow repository and a proxy server to support the whole workflow life-cycle. The SHIWA Portal allows workflow creation, configuration, execution and monitoring through a Graphical User Interface using the WS-PGRADE workflow system as the host workflow system. The SHIWA Repository stores the formal description of workflows and workflow engines plus executables and data needed to execute them. It offers a wide range of browse and search operations. To support non-native workflow execution, the SHIWA Submission Service imports the workflow and workflow engine from the SHIWA Repository. This service either invokes locally or remotely

  11. Lattice QCD workflows

    SciTech Connect

    Piccoli, Luciano; Kowalkowski, James B.; Simone, James N.; Sun, Xian-He; Jin, Hui; Holmgren, Donald J.; Seenu, Nirmal; Singh, Amitoj G.; /Fermilab

    2008-12-01

    This paper discusses the application of existing workflow management systems to a real world science application (LQCD). Typical workflows and execution environment used in production are described. Requirements for the LQCD production system are discussed. The workflow management systems Askalon and Swift were tested by implementing the LQCD workflows and evaluated against the requirements. We report our findings and future work.

  12. Workflow automation architecture standard

    SciTech Connect

    Moshofsky, R.P.; Rohen, W.T.

    1994-11-14

    This document presents an architectural standard for application of workflow automation technology. The standard includes a functional architecture, process for developing an automated workflow system for a work group, functional and collateral specifications for workflow automation, and results of a proof of concept prototype.

  13. Metaworkflows and Workflow Interoperability for Heliophysics

    NASA Astrophysics Data System (ADS)

    Pierantoni, Gabriele; Carley, Eoin P.

    2014-06-01

    Heliophysics is a relatively new branch of physics that investigates the relationship between the Sun and the other bodies of the solar system. To investigate such relationships, heliophysicists can rely on various tools developed by the community. Some of these tools are on-line catalogues that list events (such as Coronal Mass Ejections, CMEs) and their characteristics as they were observed on the surface of the Sun or on the other bodies of the Solar System. Other tools offer on-line data analysis and access to images and data catalogues. During their research, heliophysicists often perform investigations that need to coordinate several of these services and to repeat these complex operations until the phenomena under investigation are fully analyzed. Heliophysicists combine the results of these services; this service orchestration is best suited for workflows. This approach has been investigated in the HELIO project. The HELIO project developed an infrastructure for a Virtual Observatory for Heliophysics and implemented service orchestration using TAVERNA workflows. HELIO developed a set of workflows that proved to be useful but lacked flexibility and re-usability. The TAVERNA workflows also needed to be executed directly in the TAVERNA workbench, and this forced all users to learn how to use the workbench. Within the SCI-BUS and ER-FLOW projects, we have started an effort to re-think and re-design the heliophysics workflows with the aim of fostering re-usability and ease of use. We base our approach on two key concepts, that of meta-workflows and that of workflow interoperability. We have divided the produced workflows into three different layers. The first layer is Basic Workflows, developed both in the TAVERNA and WS-PGRADE languages. They are building blocks that users compose to address their scientific challenges. They implement well-defined Use Cases that usually involve only one service. The second layer is Science Workflows usually developed in TAVERNA. They

  14. Workflow in Astronomy : the VO France Workflow Working Group experience

    NASA Astrophysics Data System (ADS)

    Schaaff, A.; Petit, F. L.; Prugniel, P.; Slezak, E.; Surace, C.

    2008-08-01

    The French Action Spécifique Observatoires Virtuels created the Workflow Working Group in 2005. Its aim is to explore the use of the Workflow paradigm in the astronomical domain. The first consensus was the definition of a Workflow as a sequence of tasks realized in a controlled context (at various levels: intelligence in the choice of the algorithms, flow control, etc.), based on use case studies, in an architecture which takes into account VO standards. The current roadmap is to provide scientific use cases in several domains (image, spectrum, simulation, data mining, etc.) and to improve them mainly with existing VO tools. Another important point is to develop collaborations with the IT community (links to EGEE, ...). Use cases are useful to compare the pertinence of the possible workflow models and to understand how to implement them as efficiently as possible with the existing tools (e.g., AstroGrid, AÏDA, WebCom-G, etc.). The execution (local machine, cluster, grid) through this kind of tool and the use of VO functionalities (Web Services, Grid, VOSpace, etc.) become almost transparent.

  15. Automatically detecting workflows in PubChem.

    PubMed

    Calhoun, Bradley T; Browning, Michael R; Chen, Brian R; Bittker, Joshua A; Swamidass, S Joshua

    2012-09-01

    Public databases that store the data from small-molecule screens are a rich and untapped resource of chemical and biological information. However, screening databases are unorganized, which makes interpreting their data difficult. We propose a method of inferring workflow graphs--which encode the relationships between assays in screening projects--directly from screening data and using these workflows to organize each project's data. On the basis of four heuristics regarding the organization of screening projects, we designed an algorithm that extracts a project's workflow graph from screening data. Where possible, the algorithm is evaluated by comparing each project's inferred workflow to its documentation. In the majority of cases, there are no discrepancies between the two. Most errors can be traced to points in the project where screeners chose additional molecules to test based on structural similarity to promising molecules, a case our algorithm is not yet capable of handling. Nonetheless, these workflows accurately organize most of the data and also provide a method of visualizing a screening project. This method is robust enough to build a workflow-oriented front-end to PubChem and is currently being used regularly by both our lab and our collaborators. A Python implementation of the algorithm is available online, and a searchable database of all PubChem workflows is available at http://swami.wustl.edu/flow.

  16. Dynamic reusable workflows for ocean science

    USGS Publications Warehouse

    Signell, Richard; Fernandez, Filipe; Wilcox, Kyle

    2016-01-01

    Digital catalogs of ocean data have been available for decades, but advances in standardized services and software for catalog search and data access make it now possible to create catalog-driven workflows that automate — end-to-end — data search, analysis and visualization of data from multiple distributed sources. Further, these workflows may be shared, reused and adapted with ease. Here we describe a workflow developed within the US Integrated Ocean Observing System (IOOS) which automates the skill-assessment of water temperature forecasts from multiple ocean forecast models, allowing improved forecast products to be delivered for an open water swim event. A series of Jupyter Notebooks are used to capture and document the end-to-end workflow using a collection of Python tools that facilitate working with standardized catalog and data services. The workflow first searches a catalog of metadata using the Open Geospatial Consortium (OGC) Catalog Service for the Web (CSW), then accesses data service endpoints found in the metadata records using the OGC Sensor Observation Service (SOS) for in situ sensor data and OPeNDAP services for remotely-sensed and model data. Skill metrics are computed and time series comparisons of forecast model and observed data are displayed interactively, leveraging the capabilities of modern web browsers. The resulting workflow not only solves a challenging specific problem, but highlights the benefits of dynamic, reusable workflows in general. These workflows adapt as new data enters the data system, facilitate reproducible science, provide templates from which new scientific workflows can be developed, and encourage data providers to use standardized services. As applied to the ocean swim event, the workflow exposed problems with two of the ocean forecast products which led to improved regional forecasts once errors were corrected. While the example is specific, the approach is general, and we hope to see increased use of dynamic

  17. Dynamic Reusable Workflows for Ocean Science

    USGS Publications Warehouse

    Signell, Richard; Fernandez, Filipe; Wilcox, Kyle

    2016-01-01

    Digital catalogs of ocean data have been available for decades, but advances in standardized services and software for catalog search and data access make it now possible to create catalog-driven workflows that automate — end-to-end — data search, analysis and visualization of data from multiple distributed sources. Further, these workflows may be shared, reused and adapted with ease. Here we describe a workflow developed within the US Integrated Ocean Observing System (IOOS) which automates the skill-assessment of water temperature forecasts from multiple ocean forecast models, allowing improved forecast products to be delivered for an open water swim event. A series of Jupyter Notebooks are used to capture and document the end-to-end workflow using a collection of Python tools that facilitate working with standardized catalog and data services. The workflow first searches a catalog of metadata using the Open Geospatial Consortium (OGC) Catalog Service for the Web (CSW), then accesses data service endpoints found in the metadata records using the OGC Sensor Observation Service (SOS) for in situ sensor data and OPeNDAP services for remotely-sensed and model data. Skill metrics are computed and time series comparisons of forecast model and observed data are displayed interactively, leveraging the capabilities of modern web browsers. The resulting workflow not only solves a challenging specific problem, but highlights the benefits of dynamic, reusable workflows in general. These workflows adapt as new data enters the data system, facilitate reproducible science, provide templates from which new scientific workflows can be developed, and encourage data providers to use standardized services. As applied to the ocean swim event, the workflow exposed problems with two of the ocean forecast products which led to improved regional forecasts once errors were corrected. While the example is specific, the approach is general, and we hope to see increased use of dynamic

  18. Flexible workflow sharing and execution services for e-scientists

    NASA Astrophysics Data System (ADS)

    Kacsuk, Péter; Terstyanszky, Gábor; Kiss, Tamas; Sipos, Gergely

    2013-04-01

    The sequence of computational and data manipulation steps required to perform a specific scientific analysis is called a workflow. Workflows that orchestrate data and/or compute intensive applications on Distributed Computing Infrastructures (DCIs) recently became standard tools in e-science. At the same time the broad and fragmented landscape of workflows and DCIs slows down the uptake of workflow-based work. The development, sharing, integration and execution of workflows is still a challenge for many scientists. The FP7 "Sharing Interoperable Workflow for Large-Scale Scientific Simulation on Available DCIs" (SHIWA) project significantly improved the situation, with a simulation platform that connects different workflow systems, different workflow languages, different DCIs and workflows into a single, interoperable unit. The SHIWA Simulation Platform is a service package, already used by various scientific communities, and used as a tool by the recently started ER-flow FP7 project to expand the use of workflows among European scientists. The presentation will introduce the SHIWA Simulation Platform and the services that ER-flow provides based on the platform to space and earth science researchers. The SHIWA Simulation Platform includes: 1. SHIWA Repository: A database where workflows and meta-data about workflows can be stored. The database is a central repository to discover and share workflows within and among communities. 2. SHIWA Portal: A web portal that is integrated with the SHIWA Repository and includes a workflow executor engine that can orchestrate various types of workflows on various grid and cloud platforms. 3. SHIWA Desktop: A desktop environment that provides access capabilities similar to those of the SHIWA Portal; however, it runs on the users' desktops/laptops instead of a portal server. 4. Workflow engines: the ASKALON, Galaxy, GWES, Kepler, LONI Pipeline, MOTEUR, Pegasus, P-GRADE, ProActive, Triana, Taverna and WS-PGRADE workflow engines are already

  19. Scientific Data Management (SDM) Center for Enabling Technologies. Final Report, 2007-2012

    SciTech Connect

    Ludascher, Bertram; Altintas, Ilkay

    2013-09-06

    Our contributions to advancing the State of the Art in scientific workflows have focused on the following areas: Workflow development; Generic workflow components and templates; Provenance collection and analysis; and, Workflow reliability and fault tolerance.

  20. Developing a workflow to identify inconsistencies in volunteered geographic information: a phenological case study

    USGS Publications Warehouse

    Mehdipoor, Hamed; Zurita-Milla, Raul; Rosemartin, Alyssa; Gerst, Katharine L.; Weltzin, Jake F.

    2015-01-01

    assessment for volunteered geographic information. Initiatives that leverage volunteered geographic information can adapt this workflow to improve the quality of their datasets and the robustness of their scientific analyses.

  1. BReW: Blackbox Resource Selection for e-Science Workflows

    SciTech Connect

    Simmhan, Yogesh; Soroush, Emad; Van Ingen, Catharine; Agarwal, Deb; Ramakrishnan, Lavanya

    2010-10-04

    Workflows are commonly used to model data intensive scientific analysis. As computational resource needs increase for eScience, emerging platforms like clouds present additional resource choices for scientists and policy makers. We introduce BReW, a tool that enables users to make rapid, high-level platform selection for their workflows using limited workflow knowledge. This helps make informed decisions on whether to port a workflow to a new platform. Our analysis of synthetic and real eScience workflows shows that using just total runtime length, maximum task fanout, and total data used and produced by the workflow, BReW can provide platform predictions comparable to whitebox models with detailed workflow knowledge.
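
    The three coarse inputs named in this abstract (total runtime length, maximum task fanout, and total data used and produced) can be computed from a simple workflow description, as in the Python sketch below. This is not BReW's implementation. The data layout, the field names, and the interpretation of each feature (runtime as the sum over tasks, fanout as the number of direct successors) are assumptions made for illustration.

```python
def blackbox_features(workflow):
    """Summarize a workflow with three coarse features.

    `workflow` maps task name -> dict with `runtime_s`, `children` (direct
    successors), `data_in_mb`, and `data_out_mb`. All field names are hypothetical.
    """
    tasks = workflow.values()
    return {
        # Assumption: "total runtime length" is taken as the sum of task runtimes.
        "total_runtime_s": sum(t["runtime_s"] for t in tasks),
        # Assumption: fanout of a task is its number of direct successors.
        "max_fanout": max((len(t["children"]) for t in tasks), default=0),
        "total_data_mb": sum(t["data_in_mb"] + t["data_out_mb"] for t in tasks),
    }


# Hypothetical split / three workers / merge workflow.
example = {
    "split": {"runtime_s": 10, "children": ["a", "b", "c"], "data_in_mb": 300, "data_out_mb": 300},
    "a": {"runtime_s": 60, "children": ["merge"], "data_in_mb": 100, "data_out_mb": 10},
    "b": {"runtime_s": 60, "children": ["merge"], "data_in_mb": 100, "data_out_mb": 10},
    "c": {"runtime_s": 60, "children": ["merge"], "data_in_mb": 100, "data_out_mb": 10},
    "merge": {"runtime_s": 20, "children": [], "data_in_mb": 30, "data_out_mb": 5},
}

features = blackbox_features(example)
```

    A summary like this needs no knowledge of individual task behaviour beyond aggregate numbers, which is the sense in which the abstract contrasts a blackbox approach with whitebox models.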

  2. Benchmarking ETL Workflows

    NASA Astrophysics Data System (ADS)

    Simitsis, Alkis; Vassiliadis, Panos; Dayal, Umeshwar; Karagiannis, Anastasios; Tziovara, Vasiliki

    Extraction-Transform-Load (ETL) processes comprise complex data workflows, which are responsible for the maintenance of a Data Warehouse. A plethora of ETL tools is currently available constituting a multi-million dollar market. Each ETL tool uses its own technique for the design and implementation of an ETL workflow, making the task of assessing ETL tools extremely difficult. In this paper, we identify common characteristics of ETL workflows in an effort of proposing a unified evaluation method for ETL. We also identify the main points of interest in designing, implementing, and maintaining ETL workflows. Finally, we propose a principled organization of test suites based on the TPC-H schema for the problem of experimenting with ETL workflows.

  3. Towards Composing Data Aware Systems Biology Workflows on Cloud Platforms: A MeDICi-based Approach

    SciTech Connect

    Gorton, Ian; Liu, Yan; Yin, Jian; Kulkarni, Anand V.; Wynne, Adam S.

    2011-09-08

    Cloud computing is being increasingly adopted for deploying systems biology scientific workflows. Scientists developing these workflows use a wide variety of fragmented and competing data sets and computational tools of all scales to support their research. To this end, the synergy of client side workflow tools with cloud platforms is a promising approach to share and reuse data and workflows. In such systems, the location of data and computation is an essential consideration in terms of quality of service for composing a scientific workflow across remote cloud platforms. In this paper, we describe a cloud-based workflow for genome annotation processing that is underpinned by MeDICi - a middleware designed for data intensive scientific applications. The workflow implementation incorporates an execution layer for exploiting data locality that routes the workflow requests to the processing steps that are colocated with the data. We demonstrate our approach by composing two workflows with the MeDICi pipelines.

  4. Workflows in a secure environment

    SciTech Connect

    Klasky, Scott A; Podhorszki, Norbert

    2008-01-01

    Petascale simulations on the largest supercomputers in the US require advanced data management techniques in order to optimize the application scientist's time, and to optimize the time spent on the supercomputers. Researchers in such problems are starting to require workflow automation during their simulations in order to monitor the simulations, and in order to automate many of the complex analyses which must take place from the data that is generated from these simulations. Scientific workflows are being used to monitor simulations running on these supercomputers by applying a series of complex analyses, and finally producing images and movies from the variables produced in the simulation, or from the derived quantities produced by the analysis. The typical scenario is where the large calculation runs on the supercomputer, and the auxiliary diagnostics/monitors are run on resources, which are either on the local area network of the supercomputer, or over the wide area network. The supercomputers at one of the largest centers are highly secure, and the only method to log into the center is interactive authentication by using One Time Passwords (OTP) that are generated by a security device and expire in half a minute. Therefore, grid certificates are not a current option on these machines in the Department of Energy at Oak Ridge National Laboratory. In this paper we describe how we have extended the Kepler scientific workflow management system to be able to run operations on these supercomputers, how workflows themselves can be executed as batch jobs, and finally, how external data-transfer operations can be utilized when they need to perform authentication on their own as well.

  5. Deployment of precise and robust sensors on board ISS-for scientific experiments and for operation of the station.

    PubMed

    Stenzel, Christian

    2016-09-01

    The International Space Station (ISS) is the largest technical vehicle ever built by mankind. It provides a living area for six astronauts and also represents a laboratory in which scientific experiments are conducted in an extraordinary environment. The deployed sensor technology contributes significantly to the operational and scientific success of the station. The sensors on board the ISS can be thereby classified into two categories which differ significantly in their key features: (1) sensors related to crew and station health, and (2) sensors to provide specific measurements in research facilities. The operation of the station requires robust, long-term stable and reliable sensors, since they assure the survival of the astronauts and the intactness of the station. Recently, a wireless sensor network for measuring environmental parameters like temperature, pressure, and humidity was established and its function could be successfully verified over several months. Such a network enhances the operational reliability and stability for monitoring these critical parameters compared to single sensors. The sensors which are implemented into the research facilities have to fulfil other objectives. The high performance of the scientific experiments that are conducted in different research facilities on-board demands the perfect embedding of the sensor in the respective instrumental setup which forms the complete measurement chain. It is shown that the performance of the single sensor alone does not determine the success of the measurement task; moreover, the synergy between different sensors and actuators as well as appropriate sample taking, followed by an appropriate sample preparation play an essential role. The application in a space environment adds additional challenges to the sensor technology, for example the necessity for miniaturisation, automation, reliability, and long-term operation. An alternative is the repetitive calibration of the sensors. 
This approach

  6. Deployment of precise and robust sensors on board ISS-for scientific experiments and for operation of the station.

    PubMed

    Stenzel, Christian

    2016-09-01

    The International Space Station (ISS) is the largest technical vehicle ever built by mankind. It provides a living area for six astronauts and also represents a laboratory in which scientific experiments are conducted in an extraordinary environment. The deployed sensor technology contributes significantly to the operational and scientific success of the station. The sensors on board the ISS can be thereby classified into two categories which differ significantly in their key features: (1) sensors related to crew and station health, and (2) sensors to provide specific measurements in research facilities. The operation of the station requires robust, long-term stable and reliable sensors, since they assure the survival of the astronauts and the intactness of the station. Recently, a wireless sensor network for measuring environmental parameters like temperature, pressure, and humidity was established and its function could be successfully verified over several months. Such a network enhances the operational reliability and stability for monitoring these critical parameters compared to single sensors. The sensors which are implemented into the research facilities have to fulfil other objectives. The high performance of the scientific experiments that are conducted in different research facilities on-board demands the perfect embedding of the sensor in the respective instrumental setup which forms the complete measurement chain. It is shown that the performance of the single sensor alone does not determine the success of the measurement task; moreover, the synergy between different sensors and actuators as well as appropriate sample taking, followed by an appropriate sample preparation play an essential role. The application in a space environment adds additional challenges to the sensor technology, for example the necessity for miniaturisation, automation, reliability, and long-term operation. An alternative is the repetitive calibration of the sensors. 
This approach

  7. Multi-Objective Approach for Energy-Aware Workflow Scheduling in Cloud Computing Environments

    PubMed Central

    Kadima, Hubert; Granado, Bertrand

    2013-01-01

    We address the problem of scheduling workflow applications on heterogeneous computing systems like cloud computing infrastructures. In general, the cloud workflow scheduling is a complex optimization problem which requires considering different criteria so as to meet a large number of QoS (Quality of Service) requirements. Traditional research in workflow scheduling mainly focuses on the optimization constrained by time or cost without paying attention to energy consumption. The main contribution of this study is to propose a new approach for multi-objective workflow scheduling in clouds, and present the hybrid PSO algorithm to optimize the scheduling performance. Our method is based on the Dynamic Voltage and Frequency Scaling (DVFS) technique to minimize energy consumption. This technique allows processors to operate in different voltage supply levels by sacrificing clock frequencies. This multiple voltage involves a compromise between the quality of schedules and energy. Simulation results on synthetic and real-world scientific applications highlight the robust performance of the proposed approach. PMID:24319361
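
    The trade-off that DVFS exposes to a scheduler can be sketched with the common CMOS approximation that dynamic power scales as C·V²·f. This is a minimal illustration and not the authors' hybrid PSO method; the capacitance constant, the two operating points, and the function names are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatingPoint:
    """One voltage/frequency pair a processor can run at (illustrative values)."""
    voltage: float    # supply voltage in volts
    frequency: float  # clock frequency in GHz


def run_time(gigacycles: float, op: OperatingPoint) -> float:
    """Seconds needed to execute `gigacycles` * 1e9 cycles at this frequency."""
    return gigacycles / op.frequency


def dynamic_energy(gigacycles: float, op: OperatingPoint,
                   capacitance: float = 1.0) -> float:
    """Energy in arbitrary units, assuming power = C * V^2 * f."""
    power = capacitance * op.voltage ** 2 * op.frequency
    return power * run_time(gigacycles, op)


# Hypothetical operating points: lowering the voltage forces a lower clock.
HIGH = OperatingPoint(voltage=1.2, frequency=2.0)
LOW = OperatingPoint(voltage=0.9, frequency=1.0)

if __name__ == "__main__":
    for label, op in (("high", HIGH), ("low", LOW)):
        print(label, run_time(10, op), dynamic_energy(10, op))
```

    Under this model the low-voltage point uses less energy for the same task but takes longer, which is the schedule-quality versus energy compromise the abstract refers to.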

  8. Multi-objective approach for energy-aware workflow scheduling in cloud computing environments.

    PubMed

    Yassa, Sonia; Chelouah, Rachid; Kadima, Hubert; Granado, Bertrand

    2013-01-01

    We address the problem of scheduling workflow applications on heterogeneous computing systems like cloud computing infrastructures. In general, the cloud workflow scheduling is a complex optimization problem which requires considering different criteria so as to meet a large number of QoS (Quality of Service) requirements. Traditional research in workflow scheduling mainly focuses on the optimization constrained by time or cost without paying attention to energy consumption. The main contribution of this study is to propose a new approach for multi-objective workflow scheduling in clouds, and present the hybrid PSO algorithm to optimize the scheduling performance. Our method is based on the Dynamic Voltage and Frequency Scaling (DVFS) technique to minimize energy consumption. This technique allows processors to operate in different voltage supply levels by sacrificing clock frequencies. This multiple voltage involves a compromise between the quality of schedules and energy. Simulation results on synthetic and real-world scientific applications highlight the robust performance of the proposed approach. PMID:24319361

  9. An Interoperable Grid Workflow Management System

    NASA Astrophysics Data System (ADS)

    Mirto, Maria; Passante, Marco; Epicoco, Italo; Aloisio, Giovanni

    A WorkFlow Management System (WFMS) is a fundamental component for integrating data, applications and a wide set of project resources. Although a number of scientific WFMSs support this task, many analysis pipelines require large-scale Grid computing infrastructures to cope with their high compute and storage requirements. Such scientific workflows complicate the management of resources, especially in cases where they are offered by several resource providers and managed by different Grid middleware, since resource access must be synchronised in advance to allow reliable workflow execution. Different types of Grid middleware such as gLite, Unicore and Globus are used around the world and may cause interoperability issues if applications involve two or more of them. In this paper we describe the ProGenGrid Workflow Management System, whose main goal is to provide interoperability among these different Grid middleware when executing workflows. It allows the composition of batch, parameter sweep and MPI-based jobs. The ProGenGrid engine implements the logic to execute such jobs by using an OGF-compliant standard language, JSDL, that has been extended for this purpose. Currently, we are testing our system on some bioinformatics case studies in the International Laboratory of Bioinformatics (LIBI) Project (www.libi.it).

  10. Digital work-flow

    PubMed Central

    MARSANGO, V.; BOLLERO, R.; D’OVIDIO, N.; MIRANDA, M.; BOLLERO, P.; BARLATTANI, A.

    2014-01-01

    SUMMARY Objective. The project presents a clinical case in which the digital work-flow procedure was applied for a prosthetic rehabilitation on natural teeth and implants. Materials. Digital work-flow uses patient photographs for aesthetic planning, digital smile technology for the simulation of the final restoration and real-time scanning to register the two arches. The scans are then sent to the laboratory, which proceeds with CAD-CAM production. Results. Digital work-flow makes it easier to communicate with the laboratory and with patients, gives better clinical results and proved to be a less invasive method for the patient. Conclusion. Intra-oral scanner, digital smile design, preview using digital wax-up, and CAD-CAM production are new predictable opportunities for the prosthetic team. This work-flow, compared with traditional methods, is faster, more precise and predictable. PMID:25694797

  11. Reporting workflow modeling

    NASA Astrophysics Data System (ADS)

    Noumeir, Rita

    2004-04-01

    Radiology diagnostic reporting is a process that results in a diagnostic report being made available outside the radiology department. The report captures the radiologist's interpretations and impressions. It is an element of the patient healthcare record and represents important clinical information to assist in healthcare decisions. The reporting process is initiated by the existence of images or other radiology evidence to be interpreted. The work of individuals is controlled by systems that manage workflow. These systems may introduce delays or constraints on how and when tasks are performed. In order to design and implement efficient information systems that manage the reporting workflow, accurate workflow modeling is needed. Workflow modeling consists of describing what is done by whom and in what sequence, that is, the roles, tasks and sequences of tasks. The workflow model is very important and has major consequences. An inaccurate model introduces inefficiencies and frustrations, and may result in a useless information system. In this paper, we will model several common reporting workflows by describing the roles, tasks and information flows involved.

  12. GO2OGS 1.0: a versatile workflow to integrate complex geological information with fault data into numerical simulation models

    NASA Astrophysics Data System (ADS)

    Fischer, T.; Naumov, D.; Sattler, S.; Kolditz, O.; Walther, M.

    2015-11-01

    We offer a versatile workflow to convert geological models built with the Paradigm™ GOCAD© (Geological Object Computer Aided Design) software into the open-source VTU (Visualization Toolkit unstructured grid) format for usage in numerical simulation models. Tackling relevant scientific questions or engineering tasks often involves multidisciplinary approaches. Conversion workflows are needed as a way of communication between the diverse tools of the various disciplines. Our approach offers an open-source, platform-independent, robust, and comprehensible method that is potentially useful for a multitude of environmental studies. With two application examples in the Thuringian Syncline, we show how a heterogeneous geological GOCAD model including multiple layers and faults can be used for numerical groundwater flow modeling, in our case employing the OpenGeoSys open-source numerical toolbox for groundwater flow simulations. The presented workflow offers the chance to incorporate increasingly detailed data, utilizing the growing availability of computational power to simulate numerical models.

  13. Workflow management systems in radiology

    NASA Astrophysics Data System (ADS)

    Wendler, Thomas; Meetz, Kirsten; Schmidt, Joachim

    1998-07-01

    In a situation of shrinking health care budgets, increasing cost pressure and growing demands to increase the efficiency and the quality of medical services, health care enterprises are forced to optimize or completely re-design their processes. Although information technology is agreed to potentially contribute to cost reduction and efficiency improvement, the real success factors are the re-definition and automation of processes: Business Process Re-engineering and Workflow Management. In this paper we discuss architectures for the use of workflow management systems in radiology. We propose to move forward from information systems in radiology (RIS, PACS) to Radiology Management Systems, in which workflow functionality (process definitions and process automation) is implemented through autonomous workflow management systems (WfMS). In a workflow-oriented architecture, an autonomous workflow enactment service communicates with workflow client applications via standardized interfaces. In this paper, we discuss the need for and the benefits of such an approach. The separation of workflow management system and application systems is emphasized, as are the consequences that arise for the architecture of workflow-oriented information systems. This includes an appropriate workflow terminology, and the definition of standard interfaces for workflow-aware application systems. Workflow studies in various institutions have shown that most of the processes in radiology are well structured and suited for a workflow management approach. Numerous commercially available Workflow Management Systems (WfMS) were investigated, and some of them, which are process-oriented and application independent, appear suitable for use in radiology.

  14. Building a Robust 21st Century Chemical Testing Program at the U.S. Environmental Protection Agency: Recommendations for Strengthening Scientific Engagement

    PubMed Central

    Dantzker, Heather C.; Portier, Christopher J.

    2014-01-01

    Background: Biological pathway-based chemical testing approaches are central to the National Research Council’s vision for 21st century toxicity testing. Approaches such as high-throughput in vitro screening offer the potential to evaluate thousands of chemicals faster and cheaper than ever before and to reduce testing on laboratory animals. Collaborative scientific engagement is important in addressing scientific issues arising in new federal chemical testing programs and for achieving stakeholder support of their use. Objectives: We present two recommendations specifically focused on increasing scientific engagement in the U.S. Environmental Protection Agency (EPA) ToxCast™ initiative. Through these recommendations we seek to bolster the scientific foundation of federal chemical testing efforts such as ToxCast™ and the public health decisions that rely upon them. Discussion: Environmental Defense Fund works across disciplines and with diverse groups to improve the science underlying environmental health decisions. We propose that the U.S. EPA can strengthen the scientific foundation of its new chemical testing efforts and increase support for them in the scientific research community by a) expanding and diversifying scientific input into the development and application of new chemical testing methods through collaborative workshops, and b) seeking out mutually beneficial research partnerships. Conclusions: Our recommendations provide concrete actions for the U.S. EPA to increase and diversify engagement with the scientific research community in its ToxCast™ initiative. We believe that such engagement will help ensure that new chemical testing data are scientifically robust and that the U.S. EPA gains the support and acceptance needed to sustain new testing efforts to protect public health. Citation: McPartland J, Dantzker HC, Portier CJ. 2015. Building a robust 21st century chemical testing program at the U.S. Environmental Protection Agency

  15. A Drupal-Based Collaborative Framework for Science Workflows

    NASA Astrophysics Data System (ADS)

    Pinheiro da Silva, P.; Gandara, A.

    2010-12-01

    Cyber-infrastructure is built by utilizing technical infrastructure to support organizational practices and social norms, providing support for scientific teams working together or dependent on each other to conduct scientific research. Such cyber-infrastructure enables the sharing of information and data so that scientists can leverage knowledge and expertise through automation. Scientific workflow systems have been used to build automated scientific systems used by scientists to conduct scientific research and, as a result, create artifacts in support of scientific discoveries. These complex systems are often developed by teams of scientists who are located in different places, e.g., scientists working in distinct buildings, and sometimes in different time zones, e.g., scientists working in distinct national laboratories. The sharing of these specifications is currently supported by the use of version control systems such as CVS or Subversion. Discussions about the design, improvement, and testing of these specifications, however, often happen elsewhere, e.g., through the exchange of email messages and IM chatting. Carrying on a discussion about these specifications is challenging because comments and specifications are not necessarily connected. For instance, the person reading a comment about a given workflow specification may not be able to see the workflow, and even if the person can see the workflow, the person may not know to which part of the workflow a given comment applies. In this paper, we discuss the design, implementation and use of CI-Server, a Drupal-based infrastructure, to support the collaboration of both local and distributed teams of scientists using scientific workflows. 
CI-Server has three primary goals: to enable information sharing by providing tools that scientists can use within their scientific research to process data, publish and share artifacts; to build community by providing tools that support discussions between

  16. Phonon Gas Model (PGM) workflow in the VLab Science Gateway

    NASA Astrophysics Data System (ADS)

    da Silveira, P.; Zhang, D.; Wentzcovitch, R. M.

    2013-12-01

    This contribution describes a scientific workflow for first principles computations of free energy of crystalline solids using the phonon gas model (PGM). This model was recently implemented as a hybrid method combining molecular dynamics and phonon normal mode analysis to extract temperature dependent phonon frequencies and lifetimes beyond perturbation theory. This is a demanding high-throughput workflow and is currently being implemented in VLab Cyberinfrastructure [da Silveira et al., 2008], which has recently been integrated with XSEDE. First we review the underlying PGM, its practical implementation, and calculation requirements. We then describe the workflow management and its general method for handling actions. We illustrate the PGM application with a calculation of MgSiO3-perovskite's anharmonic phonons. We conclude with an outlook on workflows to compute other materials' properties that will use the PGM workflow. Research supported by NSF award EAR-1019853.

  17. Insightful Workflow For Grid Computing

    SciTech Connect

    Dr. Charles Earl

    2008-10-09

    We developed a workflow adaptation and scheduling system for Grid workflows. The system currently interfaces with and uses the Karajan workflow system. We developed machine learning agents that provide the planner/scheduler with information needed to make decisions about when and how to replan. The Kubrick system restructures workflows at runtime, making it unique among workflow scheduling systems. The existing Kubrick system provides a platform on which to integrate additional quality of service constraints and in which to explore the use of an ensemble of scheduling and planning algorithms. This will be the principal thrust of our Phase II work.

  18. Make Your Workflows Smarter

    NASA Technical Reports Server (NTRS)

    Jones, Corey; Kapatos, Dennis; Skradski, Cory

    2012-01-01

    Do you have workflows with many manual tasks that slow down your business? Or, do you scale back workflows because there are simply too many manual tasks? Basic workflow robots can automate some common tasks, but not everything. This presentation will show how advanced robots called "expression robots" can be set up to perform everything from simple tasks such as moving, creating folders, renaming, changing or creating an attribute, and revising, to more complex tasks like creating a PDF, or even launching a session of Creo Parametric and performing a specific modeling task. Expression robots are able to utilize the Java API and Info*Engine to do almost anything you can imagine! Best of all, these tools are supported by PTC and will work with later releases of Windchill. Limited knowledge of Java, Info*Engine, and XML is required. The attendee will learn what tasks expression robots are capable of performing. The attendee will learn what is involved in setting up an expression robot. The attendee will gain a basic understanding of simple Info*Engine tasks

  19. Provenance-Powered Automatic Workflow Generation and Composition

    NASA Astrophysics Data System (ADS)

    Zhang, J.; Lee, S.; Pan, L.; Lee, T. J.

    2015-12-01

    In recent years, scientists have learned how to codify tools into reusable software modules that can be chained into multi-step executable workflows. Existing scientific workflow tools, created by computer scientists, require domain scientists to meticulously design their multi-step experiments before analyzing data. However, this often runs contrary to a domain scientist's daily routine of conducting research and exploration. We hope to resolve this conflict. Imagine this: An Earth scientist starts her day applying NASA Jet Propulsion Laboratory (JPL) published climate data processing algorithms over ARGO deep ocean temperature and AMSRE sea surface temperature datasets. Throughout the day, she tunes the algorithm parameters to study various aspects of the data. Suddenly, she notices some interesting results. She then turns to a computer scientist and asks, "can you reproduce my results?" By tracking and reverse engineering her activities, the computer scientist creates a workflow. The Earth scientist can now rerun the workflow to validate her findings, modify the workflow to discover further variations, or publish the workflow to share the knowledge. In this way, we aim to revolutionize computer-supported Earth science. We have developed a prototype system to realize the aforementioned vision, in the context of service-oriented science. We have studied how Earth scientists conduct service-oriented data analytics research in their daily work, developed a provenance model to record their activities, and developed a technology to automatically generate workflows from user behavior and to support the adaptation and reuse of these workflows for replicating or improving scientific studies. A data-centric repository infrastructure is established to capture richer provenance to further facilitate collaboration in the science community. 
We have also established a Petri nets-based verification instrument for provenance-based automatic workflow generation and recommendation.

  20. Reflex: Graphical workflow engine for data reduction

    NASA Astrophysics Data System (ADS)

    ESO Reflex development Team

    2014-01-01

    Reflex provides an easy and flexible way to reduce VLT/VLTI science data using the ESO pipelines. It allows graphically specifying the sequence in which the data reduction steps are executed, including conditional stops, loops and conditional branches. It eases inspection of the intermediate and final data products and allows repetition of selected processing steps to optimize the data reduction. The data organization necessary to reduce the data is built into the system and is fully automatic; advanced users can plug their own modules and steps into the data reduction sequence. Reflex supports the development of data reduction workflows based on the ESO Common Pipeline Library. Reflex is based on the concept of a scientific workflow, whereby the data reduction cascade is rendered graphically and data seamlessly flow from one processing step to the next. It is distributed with a number of complete test datasets so users can immediately start experimenting and familiarize themselves with the system.

  1. Domain-Specific Languages for Composing Signature Discovery Workflows

    SciTech Connect

    Jacob, Ferosh; Gray, Jeff; Wynne, Adam S.; Liu, Yan; Baker, Nathan A.

    2012-10-23

    Domain-agnostic signature discovery entails investigation across multiple scientific disciplines. The breadth and cross-disciplinary nature of this work requires that existing executables be integrated with new capabilities into workflows, representing a wide range of user tasks. An algorithm may be written in multiple programming languages for various hardware platforms, and so workflow composition requires integrating executables from any number of remote hosts. This raises an engineering issue on how to generate web service wrappers for these heterogeneous executables and to compose them into a scientific workflow environment (e.g., Taverna). In this paper, we introduce two simple Domain-Specific Languages (DSLs) to automate these processes. Our Service Description Language (SDL) describes key elements of a signature discovery service and automatically generates its implementation code. The Workflow Description Language (WDL) describes the pipeline of services and generates deployable artifacts for the Taverna workflow management system. We demonstrate our approach with a real-world workflow composed of services wrapping remote executables.

  2. Using Kepler for Tool Integration in Microarray Analysis Workflows

    PubMed Central

    Gan, Zhuohui; Stowe, Jennifer C.; Altintas, Ilkay; McCulloch, Andrew D.; Zambon, Alexander C.

    2015-01-01

    Increasing numbers of genomic technologies are leading to massive amounts of genomic data, all of which require complex analysis. More and more bioinformatics analysis tools are being developed by scientists to simplify these analyses. However, different pipelines have been developed using different software environments. This makes integration of these diverse bioinformatics tools difficult. Kepler provides an open source environment to integrate these disparate packages. Using Kepler, we integrated several external tools including Bioconductor packages, AltAnalyze, a Python-based open source tool, and an R-based comparison tool to build an automated workflow to meta-analyze both online and local microarray data. The automated workflow connects the integrated tools seamlessly, delivers data flow between the tools smoothly, and hence improves efficiency and accuracy of complex data analyses. Our workflow exemplifies the usage of Kepler as a scientific workflow platform for bioinformatics pipelines. PMID:26605000

  3. Deploying and sharing U-Compare workflows as web services

    PubMed Central

    2013-01-01

    Background U-Compare is a text mining platform that allows the construction, evaluation and comparison of text mining workflows. U-Compare contains a large library of components that are tuned to the biomedical domain. Users can rapidly develop biomedical text mining workflows by mixing and matching U-Compare’s components. Workflows developed using U-Compare can be exported and sent to other users who, in turn, can import and re-use them. However, the resulting workflows are standalone applications, i.e., software tools that run and are accessible only via a local machine, and that can only be run with the U-Compare platform. Results We address the above issues by extending U-Compare to convert standalone workflows into web services automatically, via a two-click process. The resulting web services can be registered on a central server and made publicly available. Alternatively, users can make web services available on their own servers, after installing the web application framework, which is part of the extension to U-Compare. We have performed a user-oriented evaluation of the proposed extension, by asking users who have tested the enhanced functionality of U-Compare to complete questionnaires that assess its functionality, reliability, usability, efficiency and maintainability. The results obtained reveal that the new functionality is well received by users. Conclusions The web services produced by U-Compare are built on top of open standards, i.e., REST and SOAP protocols, and therefore, they are decoupled from the underlying platform. Exported workflows can be integrated with any application that supports these open standards. We demonstrate how the newly extended U-Compare enhances the cross-platform interoperability of workflows, by seamlessly importing a number of text mining workflow web services exported from U-Compare into Taverna, i.e., a generic scientific workflow construction platform. PMID:23419017

  4. Workflows for microarray data processing in the Kepler environment

    PubMed Central

    2012-01-01

    Background Microarray data analysis has been the subject of extensive and ongoing pipeline development due to its complexity, the availability of several options at each analysis step, and the development of new analysis demands, including integration with new data sources. Bioinformatics pipelines are usually custom built for different applications, making them typically difficult to modify, extend and repurpose. Scientific workflow systems are intended to address these issues by providing general-purpose frameworks in which to develop and execute such pipelines. The Kepler workflow environment is a well-established system under continual development that is employed in several areas of scientific research. Kepler provides a flexible graphical interface, featuring clear display of parameter values, for design and modification of workflows. It has capabilities for developing novel computational components in the R, Python, and Java programming languages, all of which are widely used for bioinformatics algorithm development, along with capabilities for invoking external applications and using web services. Results We developed a series of fully functional bioinformatics pipelines addressing common tasks in microarray processing in the Kepler workflow environment. These pipelines consist of a set of tools for GFF file processing of NimbleGen chromatin immunoprecipitation on microarray (ChIP-chip) datasets and more comprehensive workflows for Affymetrix gene expression microarray bioinformatics and basic primer design for PCR experiments, which are often used to validate microarray results. Although functional in themselves, these workflows can be easily customized, extended, or repurposed to match the needs of specific projects and are designed to be a toolkit and starting point for specific applications. 
These workflows illustrate a workflow programming paradigm focusing on local resources (programs and data) and therefore are close to traditional shell scripting or

  5. Metadata Standards and Workflow Systems

    NASA Astrophysics Data System (ADS)

    Habermann, T.

    2012-12-01

    All modern workflow systems include mechanisms for recording inputs, outputs and processes. These descriptions can include details required to reproduce the workflows exactly and, in some cases, can include virtual images of the hardware and operating system. There are several on-going and emerging standards for representing these detailed workflows including the Open Provenance Model (OPM) and the W3C PROV. At the same time, ISO metadata standards include a simple provenance or lineage model that includes many important elements of workflows. The ISO model could play a critical role in sharing and discovering workflow information for collections and perhaps in recording some details in granules. In order for this goal to be reached, connections between the detailed standards and ISO must be understood and conventions for using them must be developed.

  6. LQCD workflow execution framework: Models, provenance and fault-tolerance

    NASA Astrophysics Data System (ADS)

    Piccoli, Luciano; Dubey, Abhishek; Simone, James N.; Kowalkowski, James B.

    2010-04-01

    Large computing clusters used for scientific processing suffer from systemic failures when operated over long continuous periods for executing workflows. Diagnosing job problems and faults leading to eventual failures in this complex environment is difficult, specifically when the success of an entire workflow might be affected by a single job failure. In this paper, we introduce a model-based, hierarchical, reliable execution framework that encompasses workflow specification, data provenance, execution tracking and online monitoring of each workflow task, also referred to as participants. The sequence of participants is described in an abstract parameterized view, which is translated into a concrete data dependency based sequence of participants with defined arguments. As participants belonging to a workflow are mapped onto machines and executed, periodic and on-demand monitoring of vital health parameters on allocated nodes is enabled according to pre-specified rules. These rules specify conditions that must be true pre-execution, during execution and post-execution. Monitoring information for each participant is propagated upwards through the reflex and healing architecture, which consists of a hierarchical network of decentralized fault management entities, called reflex engines. They are instantiated as state machines or timed automata that change state and initiate reflexive mitigation action(s) upon occurrence of certain faults. We describe how this cluster reliability framework is combined with the workflow execution framework using formal rules and actions specified within a structure of first order predicate logic that enables a dynamic management design that reduces manual administrative workload and increases cluster productivity.
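
    The idea of a reflex engine as a state machine that reacts to a fault and reports it up a hierarchy can be sketched as follows. This is an illustrative toy, not the framework described in the paper; the class name, the rule format, the two states, and the `disk_free_gb` reading are all assumptions.

```python
from enum import Enum


class State(Enum):
    NOMINAL = "nominal"
    MITIGATING = "mitigating"


class ReflexEngine:
    """Toy fault manager: checks rules against a reading and notifies its parent."""

    def __init__(self, name, rules=(), parent=None):
        self.name = name
        # Each rule is (fault_name, predicate(reading) -> bool, action_name).
        self.rules = list(rules)
        self.parent = parent
        self.state = State.NOMINAL
        self.reports = []  # faults reported by engines lower in the hierarchy

    def observe(self, reading):
        """Evaluate rules in order; return the first mitigation action, or None."""
        for fault_name, is_violated, action in self.rules:
            if is_violated(reading):
                self.state = State.MITIGATING
                if self.parent is not None:
                    self.parent.notify(self.name, fault_name)
                return action
        self.state = State.NOMINAL
        return None

    def notify(self, source, fault_name):
        """Record a fault from a lower-level engine and pass it further up."""
        self.reports.append((source, fault_name))
        if self.parent is not None:
            self.parent.notify(source, fault_name)


# Hypothetical two-level hierarchy: one cluster-level engine, one node-level engine.
cluster = ReflexEngine("cluster")
node = ReflexEngine(
    "node01",
    rules=[("low_disk", lambda r: r["disk_free_gb"] < 5, "reschedule_job")],
    parent=cluster,
)
```

    Calling `node.observe({"disk_free_gb": 2})` selects the `reschedule_job` action locally and leaves a record of the fault in `cluster.reports`, mirroring the upward propagation of monitoring information described above.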

  7. Essential Grid Workflow Monitoring Elements

    SciTech Connect

    Gunter, Daniel K.; Jackson, Keith R.; Konerding, David E.; Lee, Jason R.; Tierney, Brian L.

    2005-07-01

    Troubleshooting Grid workflows is difficult. A typical workflow involves a large number of components (networks, middleware, hosts, etc.) that can fail. Even when monitoring data from all these components is accessible, it is hard to tell whether failures and anomalies in these components are related to a given workflow. For the Grid to be truly usable, much of this uncertainty must be eliminated. We propose two new Grid monitoring elements, Grid workflow identifiers and consistent component lifecycle events, that will make Grid troubleshooting easier, and thus make Grids more usable, by simplifying the correlation of Grid monitoring data with a particular Grid workflow.
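
    How a shared workflow identifier and paired lifecycle events simplify correlation can be sketched in a few lines. The event layout (`workflow_id`, `component`, `event`, `ts`) and the helper names are assumptions for illustration and do not reproduce the format proposed in the paper.

```python
from collections import defaultdict


def correlate(events):
    """Group monitoring events by workflow identifier, ordered by timestamp."""
    by_workflow = defaultdict(list)
    for event in events:
        workflow_id = event.get("workflow_id")
        if workflow_id is not None:  # events without an identifier cannot be attributed
            by_workflow[workflow_id].append(event)
    for workflow_events in by_workflow.values():
        workflow_events.sort(key=lambda e: e["ts"])
    return dict(by_workflow)


def unfinished_components(workflow_events):
    """Components that logged a 'start' lifecycle event but no matching 'end'."""
    started, ended = set(), set()
    for event in workflow_events:
        if event["event"] == "start":
            started.add(event["component"])
        elif event["event"] == "end":
            ended.add(event["component"])
    return started - ended


# Hypothetical monitoring records from different Grid components.
EVENTS = [
    {"workflow_id": "wf-1", "component": "transfer", "event": "start", "ts": 1},
    {"workflow_id": "wf-2", "component": "transfer", "event": "start", "ts": 2},
    {"workflow_id": "wf-1", "component": "transfer", "event": "end", "ts": 3},
    {"workflow_id": "wf-1", "component": "compute", "event": "start", "ts": 4},
    {"component": "network", "event": "anomaly", "ts": 5},
]
```

    With these records, `unfinished_components(correlate(EVENTS)["wf-1"])` points at the `compute` step as the place to start troubleshooting, while the untagged network anomaly cannot be tied to either workflow.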

  8. On Nondeterministic Workflow Executions

    NASA Astrophysics Data System (ADS)

    Potapova, Alexandra; Su, Jianwen

    The ability to compose existing services to form new functionality is one of the most promising ideas enabled by SOA and the framework of (web) services. A composition or a workflow often involves services distributed over a network and possibly many organizations and administrative domains. Nondeterminism could occur in a composition in at least two ways. The first form is the result of modeling abstraction that hides detailed information and thus makes the "computation" appear nondeterministic. The second form is closely related to "operational optimization", e.g., one may try to invoke multiple services for a task; whichever completes first produces the result and preempts all other services. In this paper, we focus on the latter and measure the complexity of service execution as the amount of resources and controlling mechanisms needed for executing nondeterministic service compositions. We formalize the model and complexity problem and develop technical results for this problem in the general setting as well as special cases.
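
    The second form of nondeterminism, where several services are invoked for one task and the first to finish preempts the rest, can be sketched with Python's `asyncio`. The simulated services, their delays, and the function names are assumptions for illustration; the paper's formal model is not reproduced here.

```python
import asyncio


async def call_service(name: str, delay: float) -> str:
    """Stand-in for a remote service call; `delay` simulates its latency."""
    await asyncio.sleep(delay)
    return name


async def first_completed(coroutines):
    """Run all invocations concurrently, return the first result, cancel the rest."""
    tasks = [asyncio.create_task(c) for c in coroutines]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()  # preempt the slower services
    # Wait for the cancellations to take effect before returning.
    await asyncio.gather(*pending, return_exceptions=True)
    # If the winning call raised an exception, .result() re-raises it here.
    return next(iter(done)).result()


async def main() -> str:
    return await first_completed([
        call_service("mirror-a", 0.20),
        call_service("mirror-b", 0.01),
    ])


if __name__ == "__main__":
    print(asyncio.run(main()))
```

    The resources consumed by the preempted calls are the cost this optimization pays for lower latency, which is the kind of overhead the abstract proposes to measure.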

  9. Domain-Specific Languages For Developing and Deploying Signature Discovery Workflows

    SciTech Connect

    Jacob, Ferosh; Wynne, Adam S.; Liu, Yan; Gray, Jeff

    2013-12-02

    Domain-agnostic Signature Discovery entails scientific investigation across multiple domains through the re-use of existing algorithms into workflows. The existing algorithms may be written in any programming language for various hardware architectures (e.g., desktops, commodity clusters, and specialized parallel hardware platforms). This raises an engineering issue in generating Web services for heterogeneous algorithms so that they can be composed into a scientific workflow environment (e.g., Taverna). In this paper, we present our software tool that defines two simple Domain-Specific Languages (DSLs) to automate these processes: SDL and WDL. Our Service Description Language (SDL) describes key elements of a signature discovery algorithm and generates the service code. The Workflow Description Language (WDL) describes the pipeline of services and generates deployable artifacts for the Taverna workflow management system. We demonstrate our tool with a landscape classification example that is represented by BLAST workflows composed of services that wrap original scripts.

  10. Scientific Process Automation Improves Data Interaction

    SciTech Connect

    Critchlow, Terence J.

    2009-09-30

    This is an article written for the September 09 Scientific Computing magazine about the work of the Scientific Process Automation team of The U.S. Department of Energy (DOE) Scientific Discovery through Advanced Computing (SciDAC) program. The SPA team is focused on developing and deploying automated workflows for a variety of computational science domains. Scientific workflows are the formalization of a scientific process that is frequently and repetitively performed.

  11. EPiK-a Workflow for Electron Tomography in Kepler*

    PubMed Central

    Wang, Jianwu; Crawl, Daniel; Phan, Sébastien; Lawrence, Albert; Ellisman, Mark

    2015-01-01

    Scientific workflows integrate data and computing interfaces as configurable, semi-automatic graphs to solve a scientific problem. Kepler is such a software system for designing, executing, reusing, evolving, archiving and sharing scientific workflows. Electron tomography (ET) enables high-resolution views of complex cellular structures, such as cytoskeletons, organelles, viruses and chromosomes. Imaging investigations produce large datasets. For instance, in Electron Tomography, the size of a 16 fold image tilt series is about 65 Gigabytes with each projection image including 4096 by 4096 pixels. When we use serial sections or montage technique for large field ET, the dataset will be even larger. For higher resolution images with multiple tilt series, the data size may be in terabyte range. Demands of mass data processing and complex algorithms require the integration of diverse codes into flexible software structures. This paper describes a workflow for Electron Tomography Programs in Kepler (EPiK). This EPiK workflow embeds the tracking process of IMOD, and realizes the main algorithms including filtered backprojection (FBP) from TxBR and iterative reconstruction methods. We have tested the three dimensional (3D) reconstruction process using EPiK on ET data. EPiK can be a potential toolkit for biology researchers with the advantage of logical viewing, easy handling, convenient sharing and future extensibility. PMID:25621086

  12. Drug discovery FAQs: workflows for answering multidomain drug discovery questions.

    PubMed

    Chichester, Christine; Digles, Daniela; Siebes, Ronald; Loizou, Antonis; Groth, Paul; Harland, Lee

    2015-04-01

    Modern data-driven drug discovery requires integrated resources to support decision-making and enable new discoveries. The Open PHACTS Discovery Platform (http://dev.openphacts.org) was built to address this requirement by focusing on drug discovery questions that are of high priority to the pharmaceutical industry. Although complex, most of these frequently asked questions (FAQs) revolve around the combination of data concerning compounds, targets, pathways and diseases. Computational drug discovery using workflow tools and the integrated resources of Open PHACTS can deliver answers to most of these questions. Here, we report on a selection of workflows used for solving these use cases and discuss some of the research challenges. The workflows are accessible online from myExperiment (http://www.myexperiment.org) and are available for reuse by the scientific community.

  13. It's All About the Data: Workflow Systems and Weather

    NASA Astrophysics Data System (ADS)

    Plale, B.

    2009-05-01

    Digital data is fueling new advances in the computational sciences, particularly geospatial research as environmental sensing grows more practical through reduced technology costs, broader network coverage, and better instruments. e-Science research (i.e., cyberinfrastructure research) has responded to data intensive computing with tools, systems, and frameworks that support computationally oriented activities such as modeling, analysis, and data mining. Workflow systems support execution of sequences of tasks on behalf of a scientist. These systems, such as Taverna, Apache ODE, and Kepler, when built as part of a larger cyberinfrastructure framework, give the scientist tools to construct task graphs of execution sequences, often through a visual interface for connecting task boxes together with arcs representing control flow or data flow. Unlike business processing workflows, scientific workflows expose a high degree of detail and control during configuration and execution. Data-driven science imposes unique needs on workflow frameworks. Our research is focused on two issues. The first is the support for workflow-driven analysis over all kinds of data sets, including real time streaming data and locally owned and hosted data. The second is the essential role metadata/provenance collection plays in data driven science, for discovery, determining quality, for science reproducibility, and for long-term preservation. The research has been conducted over the last 6 years in the context of cyberinfrastructure for mesoscale weather research carried out as part of the Linked Environments for Atmospheric Discovery (LEAD) project. LEAD has pioneered new approaches for integrating complex weather data, assimilation, modeling, mining, and cyberinfrastructure systems. Workflow systems have the potential to generate huge volumes of data. 
Without some form of automated metadata capture, either metadata description becomes largely a manual task that is difficult if not impossible

  14. Astronomical Data Reduction Workflows with Reflex

    NASA Astrophysics Data System (ADS)

    Ballester, P.; Bramich, D.; Forchi, V.; Freudling, W.; Garcia-Dabó, C. E.; klein Gebbinck, M.; Modigliani, A.; Moehler, S.; Romaniello, M.

    2014-05-01

    Reflex (http://www.eso.org/reflex) is an environment that provides an easy and flexible way to reduce VLT/VLTI science data using the ESO. Its top-level functionalities are: (1) Reflex allows users to graphically specify the sequence in which the data reduction steps are executed, including conditional stops, loops and conditional branches, (2) Reflex makes it easy to inspect the intermediate and final data products and to repeat selected processing steps to optimize the data reduction, (3) the data organization necessary to reduce the data is built into the system and is fully automatic, (4) advanced users can plug in their own Python or IDL modules and steps into the data reduction sequence, and (5) Reflex supports the development of data reduction workflows based on the ESO Common Pipeline Library. Reflex is based on the concept of a scientific workflow, whereby the data reduction cascade is rendered graphically and data seamlessly flow from one processing step to the next. It is distributed with a number of complete test datasets so that users can immediately start experimenting and familiarize themselves with the system (http://www.eso.org/pipelines). In this demo, we present the latest version of Reflex and its applications for astronomical data reduction processes.

  15. Scalable Analysis of Distributed Workflow Traces

    SciTech Connect

    Gunter, Daniel K.; Tierney, Brian L.; Bailey, Stephen J.

    2005-06-01

    Large-scale workflows are becoming increasingly important in both the scientific research and business domains. Science and commerce have both experienced an explosion in the sheer amount of data that must be analyzed. An important tool for analyzing these huge datasets is a compute cluster of hundreds or thousands of machines. However, debugging and tuning clusters requires specialized tools. Current cluster performance tools are more oriented towards tightly coupled parallel applications. We describe how the NetLogger Toolkit methodology is more appropriate for this class of cluster computing, and describe our new automatic workflow anomaly detection component. We also describe how this methodology is being used in the Nearby Supernova Factory (SN factory) project at Lawrence Berkeley National Laboratory.

  16. Introducing students to digital geological mapping: A workflow based on cheap hardware and free software

    NASA Astrophysics Data System (ADS)

    Vrabec, Marko; Dolžan, Erazem

    2016-04-01

    The undergraduate field course in Geological Mapping at the University of Ljubljana involves 20-40 students per year, which precludes the use of specialized rugged digital field equipment as the costs would be way beyond the capabilities of the Department. A different mapping area is selected each year with the aim to provide typical conditions that a professional geologist might encounter when doing fieldwork in Slovenia, which includes rugged relief, dense tree cover, and moderately-well- to poorly-exposed bedrock due to vegetation and urbanization. It is therefore mandatory that the digital tools and workflows are combined with classical methods of fieldwork, since, for example, full-time precise GNSS positioning is not viable under such circumstances. Additionally, due to the prevailing combination of complex geological structure with generally poor exposure, students cannot be expected to produce line (vector) maps of geological contacts on the go, so there is no need for such functionality in hardware and software that we use in the field. Our workflow therefore still relies on paper base maps, but is strongly complemented with digital tools to provide robust positioning, track recording, and acquisition of various point-based data. Primary field hardware are students' Android-based smartphones and optionally tablets. For our purposes, the built-in GNSS chips provide adequate positioning precision most of the time, particularly if they are GLONASS-capable. We use Oruxmaps, a powerful free offline map viewer for the Android platform, which facilitates the use of custom-made geopositioned maps. For digital base maps, which we prepare in free Windows QGIS software, we use scanned topographic maps provided by the National Geodetic Authority, but also other maps such as aerial imagery, processed Digital Elevation Models, scans of existing geological maps, etc. Point data, like important outcrop locations or structural measurements, are entered into Oruxmaps as

  17. The equivalency between logic Petri workflow nets and workflow nets.

    PubMed

    Wang, Jing; Yu, ShuXia; Du, YuYue

    2015-01-01

    Logic Petri nets (LPNs) can describe and analyze batch processing functions and passing value indeterminacy in cooperative systems. Logic Petri workflow nets (LPWNs) are proposed based on LPNs in this paper. Process mining is regarded as an important bridge between modeling and analysis of data mining and business process. Workflow nets (WF-nets) are an extension of Petri nets (PNs) and have been successfully applied to process mining. Some shortcomings cannot be avoided in process mining, such as duplicate tasks, invisible tasks, and the noise of logs. The online shop in electronic commerce in this paper is modeled to prove the equivalence between LPWNs and WF-nets, and advantages of LPWNs are presented. PMID:25821845

  18. The Equivalency between Logic Petri Workflow Nets and Workflow Nets

    PubMed Central

    Wang, Jing; Yu, ShuXia; Du, YuYue

    2015-01-01

    Logic Petri nets (LPNs) can describe and analyze batch processing functions and passing value indeterminacy in cooperative systems. Logic Petri workflow nets (LPWNs) are proposed based on LPNs in this paper. Process mining is regarded as an important bridge between modeling and analysis of data mining and business process. Workflow nets (WF-nets) are an extension of Petri nets (PNs) and have been successfully applied to process mining. Some shortcomings cannot be avoided in process mining, such as duplicate tasks, invisible tasks, and the noise of logs. The online shop in electronic commerce in this paper is modeled to prove the equivalence between LPWNs and WF-nets, and advantages of LPWNs are presented. PMID:25821845
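
    For readers unfamiliar with workflow nets, the sketch below plays the ordinary Petri-net "token game" on a tiny net with one source place and one sink place. It illustrates plain WF-net firing only, not the logic expressions that distinguish LPWNs; the online-shop place and transition names are hypothetical and are not taken from the paper's model.

```python
def enabled(marking, transition):
    """A transition is enabled when every input place holds at least one token."""
    inputs, _ = transition
    return all(marking.get(p, 0) >= 1 for p in inputs)


def fire(marking, transition):
    """Consume one token per input place and produce one per output place."""
    if not enabled(marking, transition):
        raise ValueError("transition is not enabled")
    inputs, outputs = transition
    new = dict(marking)
    for p in inputs:
        new[p] -= 1
    for p in outputs:
        new[p] = new.get(p, 0) + 1
    # Drop empty places so markings compare cleanly.
    return {p: n for p, n in new.items() if n > 0}


# Hypothetical online-shop net: source place "i", sink place "o".
# Each transition maps to (input places, output places).
net = {
    "place_order": (["i"], ["ordered"]),
    "pay":         (["ordered"], ["paid"]),
    "ship":        (["paid"], ["o"]),
}

marking = {"i": 1}
for name in ["place_order", "pay", "ship"]:
    marking = fire(marking, net[name])

print(marking)
```

    A case starts with a single token in the source place and is complete when the only remaining token sits in the sink place, which is the situation reached at the end of this run.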

  19. Talkoot Portals: Discover, Tag, Share, and Reuse Collaborative Science Workflows

    NASA Astrophysics Data System (ADS)

    Wilson, B. D.; Ramachandran, R.; Lynnes, C.

    2009-05-01

    A small but growing number of scientists are beginning to harness Web 2.0 technologies, such as wikis, blogs, and social tagging, as a transformative way of doing science. These technologies provide researchers easy mechanisms to critique, suggest and share ideas, data and algorithms. At the same time, large suites of algorithms for science analysis are being made available as remotely-invokable Web Services, which can be chained together to create analysis workflows. This provides the research community an unprecedented opportunity to collaborate by sharing their workflows with one another, reproducing and analyzing research results, and leveraging colleagues' expertise to expedite the process of scientific discovery. However, wikis and similar technologies are limited to text, static images and hyperlinks, providing little support for collaborative data analysis. A team of information technology and Earth science researchers from multiple institutions have come together to improve community collaboration in science analysis by developing a customizable "software appliance" to build collaborative portals for Earth Science services and analysis workflows. The critical requirement is that researchers (not just information technologists) be able to build collaborative sites around service workflows within a few hours. We envision online communities coming together, much like Finnish "talkoot" (a barn raising), to build a shared research space. Talkoot extends a freely available, open source content management framework with a series of modules specific to Earth Science for registering, creating, managing, discovering, tagging and sharing Earth Science web services and workflows for science data processing, analysis and visualization. Users will be able to author a "science story" in shareable web notebooks, including plots or animations, backed up by an executable workflow that directly reproduces the science analysis. 
New services and workflows of interest will be

  20. Leveraging the Power of High Performance Computing for Next Generation Sequencing Data Analysis: Tricks and Twists from a High Throughput Exome Workflow

    PubMed Central

    Wonczak, Stephan; Thiele, Holger; Nieroda, Lech; Jabbari, Kamel; Borowski, Stefan; Sinha, Vishal; Gunia, Wilfried; Lang, Ulrich; Achter, Viktor; Nürnberg, Peter

    2015-01-01

    Next generation sequencing (NGS) has been a great success and is now a standard method of research in the life sciences. With this technology, dozens of whole genomes or hundreds of exomes can be sequenced in rather short time, producing huge amounts of data. Complex bioinformatics analyses are required to turn these data into scientific findings. In order to run these analyses fast, automated workflows implemented on high performance computers are state of the art. While providing sufficient compute power and storage to meet the NGS data challenge, high performance computing (HPC) systems require special care when utilized for high throughput processing. This is especially true if the HPC system is shared by different users. Here, stability, robustness and maintainability are as important for automated workflows as speed and throughput. To achieve all of these aims, dedicated solutions have to be developed. In this paper, we present the tricks and twists that we utilized in the implementation of our exome data processing workflow. It may serve as a guideline for other high throughput data analysis projects using a similar infrastructure. The code implementing our solutions is provided in the supporting information files. PMID:25942438

  1. Leveraging the power of high performance computing for next generation sequencing data analysis: tricks and twists from a high throughput exome workflow.

    PubMed

    Kawalia, Amit; Motameny, Susanne; Wonczak, Stephan; Thiele, Holger; Nieroda, Lech; Jabbari, Kamel; Borowski, Stefan; Sinha, Vishal; Gunia, Wilfried; Lang, Ulrich; Achter, Viktor; Nürnberg, Peter

    2015-01-01

    Next generation sequencing (NGS) has been a great success and is now a standard method of research in the life sciences. With this technology, dozens of whole genomes or hundreds of exomes can be sequenced in rather short time, producing huge amounts of data. Complex bioinformatics analyses are required to turn these data into scientific findings. In order to run these analyses fast, automated workflows implemented on high performance computers are state of the art. While providing sufficient compute power and storage to meet the NGS data challenge, high performance computing (HPC) systems require special care when utilized for high throughput processing. This is especially true if the HPC system is shared by different users. Here, stability, robustness and maintainability are as important for automated workflows as speed and throughput. To achieve all of these aims, dedicated solutions have to be developed. In this paper, we present the tricks and twists that we utilized in the implementation of our exome data processing workflow. It may serve as a guideline for other high throughput data analysis projects using a similar infrastructure. The code implementing our solutions is provided in the supporting information files.

  2. Leveraging the power of high performance computing for next generation sequencing data analysis: tricks and twists from a high throughput exome workflow.

    PubMed

    Kawalia, Amit; Motameny, Susanne; Wonczak, Stephan; Thiele, Holger; Nieroda, Lech; Jabbari, Kamel; Borowski, Stefan; Sinha, Vishal; Gunia, Wilfried; Lang, Ulrich; Achter, Viktor; Nürnberg, Peter

    2015-01-01

    Next generation sequencing (NGS) has been a great success and is now a standard method of research in the life sciences. With this technology, dozens of whole genomes or hundreds of exomes can be sequenced in rather short time, producing huge amounts of data. Complex bioinformatics analyses are required to turn these data into scientific findings. In order to run these analyses fast, automated workflows implemented on high performance computers are state of the art. While providing sufficient compute power and storage to meet the NGS data challenge, high performance computing (HPC) systems require special care when utilized for high throughput processing. This is especially true if the HPC system is shared by different users. Here, stability, robustness and maintainability are as important for automated workflows as speed and throughput. To achieve all of these aims, dedicated solutions have to be developed. In this paper, we present the tricks and twists that we utilized in the implementation of our exome data processing workflow. It may serve as a guideline for other high throughput data analysis projects using a similar infrastructure. The code implementing our solutions is provided in the supporting information files. PMID:25942438

  3. Streamlining workflow using existing technology.

    PubMed

    Corkery, Terry S

    2007-01-01

    Processing rehabilitation admissions and case management records in a three-person office in a major academic medical center had become cumbersome and redundant due to multiple information management approaches and requirements from various sources. Simple questionnaires and brief, casual meetings with pertinent personnel defined what was working well and what was problematic and helped establish a foundation for change management. Analysis of the existing paper system revealed more than 300 data items used more than once throughout the departmental processes. A simple timing trial, based on selected segments of a workflow diagram, revealed the potential to save 3 to 3½ hours per case by revising a departmental database, decreasing work redundancy, and creating an electronic case file. Because the work environment utilized Microsoft Office and Access databases, a plan was developed to utilize these resources to streamline the workflow and eliminate duplication of effort in the admission/case management documentation processes.

  4. AstroTaverna-Building workflows with Virtual Observatory services

    NASA Astrophysics Data System (ADS)

    Ruiz, J. E.; Garrido, J.; Santander-Vela, J. D.; Sánchez-Expósito, S.; Verdes-Montenegro, L.

    2014-11-01

    Despite the long tradition of publishing digital datasets in Astronomy, and the existence of a rich network of services providing astronomical datasets in standardized interoperable formats through the Virtual Observatory (VO), there has been little use of scientific workflow technologies in this field. In this paper we present AstroTaverna, a plugin that we have developed for the Taverna Workbench scientific workflow management system. It integrates existing VO web services as first-class building blocks in Taverna workflows, allowing the digital capture of otherwise lost procedural steps manually performed in e.g. GUI tools, providing reproducibility and re-use. It improves the readability of digital VO recipes with a comprehensive view of the entire automated execution process, complementing the scarce narratives produced in the classic documentation practices, transforming them into living tutorials for an efficient use of the VO infrastructure. The plugin also adds astronomical data manipulation and transformation tools based on the STIL Tool Set and the integration of Aladin VO software, as well as interactive connectivity with SAMP-compliant astronomy tools.

  5. Combining ontologies and workflows to design formal protocols for biological laboratories

    PubMed Central

    2010-01-01

    Background: Laboratory protocols in life sciences tend to be written in natural language, with negative consequences on repeatability, distribution and automation of scientific experiments. Formalization of knowledge is becoming popular in science. In the case of laboratory protocols two levels of formalization are needed: one for the entities and individual operations involved in protocols and another one for the procedures, which can be manually or automatically executed. This study aims to combine ontologies and workflows for protocol formalization. Results: A laboratory domain specific ontology and the COW (Combining Ontologies with Workflows) software tool were developed to formalize workflows built on ontologies. A method was specifically set up to support the design of structured protocols for biological laboratory experiments. The workflows were enhanced with ontological concepts taken from the developed domain specific ontology. The experimental protocols represented as workflows are saved in two linked files using two standard interchange languages (i.e. XPDL for workflows and OWL for ontologies). A distribution package of COW including installation procedure, ontology and workflow examples, is freely available from http://www.bmr-genomics.it/farm/cow. Conclusions: Using COW, a laboratory protocol may be directly defined by wet-lab scientists without writing code, which will keep the resulting protocol's specifications clear and easy to read and maintain. PMID:20416048

  6. Designing a road map for geoscience workflows

    NASA Astrophysics Data System (ADS)

    Duffy, Christopher; Gil, Yolanda; Deelman, Ewa; Marru, Suresh; Pierce, Marlon; Demir, Ibrahim; Wiener, Gerry

    2012-06-01

    Advances in geoscience research and discovery are fundamentally tied to data and computation, but formal strategies for managing the diversity of models and data resources in the Earth sciences have not yet been resolved or fully appreciated. The U.S. National Science Foundation (NSF) EarthCube initiative (http://earthcube.ning.com), which aims to support community-guided cyberinfrastructure to integrate data and information across the geosciences, recently funded four community development activities: Geoscience Workflows; Semantics and Ontologies; Data Discovery, Mining, and Integration; and Governance. The Geoscience Workflows working group, with broad participation from the geosciences, cyberinfrastructure, and other relevant communities, is formulating a workflows road map (http://sites.google.com/site/earthcubeworkflow/). The Geoscience Workflows team coordinates with each of the other community development groups given their direct relevance to workflows. Semantics and ontologies are mechanisms for describing workflows and the data they process.

  7. Controllability in Temporal Conceptual Workflow Schemata

    NASA Astrophysics Data System (ADS)

    Combi, Carlo; Posenato, Roberto

    Workflow technology has emerged as one of the leading technologies in modelling, redesigning, and executing business processes. Currently available workflow management systems (WfMS) and research prototypes offer very limited support for the definition, detection, and management of temporal constraints over business processes. In this paper, we propose a new advanced workflow conceptual model for expressing time constraints in business processes and, in particular, we introduce and discuss the concept of controllability for workflow schemata and its evaluation at process design time. Controllability refers to the capability of executing a workflow for any possible duration of tasks. Since in several situations durations of tasks cannot be decided by WfMSs, even though the minimum and the maximum durations for each task are known, checking controllability is stronger than verifying the consistency of the workflow temporal constraints.
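
    The difference between consistency and controllability can be seen in a deliberately simplified setting: a purely sequential workflow with one overall deadline and no delays that the system can adjust. The sketch below assumes exactly that setting, which is far narrower than the conceptual model in the paper; the task durations and deadlines are hypothetical.

```python
def is_consistent(tasks, deadline):
    """Some choice of durations meets the deadline (the best case fits)."""
    return sum(lo for lo, _ in tasks) <= deadline


def is_controllable(tasks, deadline):
    """Every possible choice of durations meets the deadline (the worst case fits)."""
    return sum(hi for _, hi in tasks) <= deadline


# Hypothetical (min, max) durations, in hours, for three sequential tasks.
tasks = [(1, 3), (2, 4), (1, 2)]

print(is_consistent(tasks, deadline=6), is_controllable(tasks, deadline=6))
print(is_consistent(tasks, deadline=9), is_controllable(tasks, deadline=9))
```

    With a 6-hour deadline the schema is consistent (the minimum durations total 4 hours) but not controllable (the maximum durations total 9 hours), so a run can still miss the deadline; with a 9-hour deadline it is both.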

  8. Data Processing Workflows to Support Reproducible Data-driven Research in Hydrology

    NASA Astrophysics Data System (ADS)

    Goodall, J. L.; Essawy, B.; Xu, H.; Rajasekar, A.; Moore, R. W.

    2015-12-01

    Geoscience analyses often require the use of existing data sets that are large, heterogeneous, and maintained by different organizations. A particular challenge in creating reproducible analyses using these data sets is automating the workflows required to transform raw datasets into model specific input files and finally into publication ready visualizations. Data grids, such as the Integrated Rule-Oriented Data System (iRODS), are architectures that allow scientists to access and share large data sets that are geographically distributed on the Internet, but appear to the scientist as a single file management system. The DataNet Federation Consortium (DFC) project is built on iRODS and aims to demonstrate data and computational interoperability across scientific communities. This paper leverages iRODS and the DFC to demonstrate how hydrological modeling workflows can be encapsulated as workflows using the iRODS concept of Workflow Structured Objects (WSO). An example use case is presented for automating hydrologic model post-processing routines that demonstrates how WSOs can be created and used within the DFC to automate the creation of data visualizations from large model output collections. By co-locating the workflow used to create the visualization with the data collection, the use case demonstrates how data grid technology aids in reuse, reproducibility, and sharing of workflows within scientific communities.

  9. Workflow simulation and its system development

    NASA Astrophysics Data System (ADS)

    Li, Renwang; Zhu, Zefei; Wang, Xianmei; Liu, Lei; Jiang, Xuefeng

    2005-12-01

    Workflow technology is a research hotspot in the field of advanced manufacturing technology. However, workflow simulation still lacks the necessary evaluation of rationality and validity. Therefore, a principle of workflow simulation is set forth and a workflow simulation mechanism is proposed, divided into a presentation layer, a business logic layer and a database layer. Then, taking the process of handling business orders as an example, and taking time, quality, cost and service as key factors, a feasible method is developed. Its simulation results over 30 days are listed and analyzed. Finally, an amended process for handling business orders is put forward.

  10. Facilitating hydrological data analysis workflows in R: the RHydro package

    NASA Astrophysics Data System (ADS)

    Buytaert, Wouter; Moulds, Simon; Skoien, Jon; Pebesma, Edzer; Reusser, Dominik

    2015-04-01

    The advent of new technologies such as web-services and big data analytics holds great promise for hydrological data analysis and simulation. Driven by the need for better water management tools, it allows for the construction of much more complex workflows, that integrate more and potentially more heterogeneous data sources with longer tool chains of algorithms and models. With the scientific challenge of designing the most adequate processing workflow comes the technical challenge of implementing the workflow with a minimal risk for errors. A wide variety of new workbench technologies and other data handling systems are being developed. At the same time, the functionality of available data processing languages such as R and Python is increasing at an accelerating pace. Because of the large diversity of scientific questions and simulation needs in hydrology, it is unlikely that one single optimal method for constructing hydrological data analysis workflows will emerge. Nevertheless, languages such as R and Python are quickly gaining popularity because they combine a wide array of functionality with high flexibility and versatility. The object-oriented nature of high-level data processing languages makes them particularly suited for the handling of complex and potentially large datasets. In this paper, we explore how handling and processing of hydrological data in R can be facilitated further by designing and implementing a set of relevant classes and methods in the experimental R package RHydro. We build upon existing efforts such as the sp and raster packages for spatial data and the spacetime package for spatiotemporal data to define classes for hydrological data (HydroST). In order to handle simulation data from hydrological models conveniently, a HM class is defined. Relevant methods are implemented to allow for an optimal integration of the HM class with existing model fitting and simulation functionality in R. 
Lastly, we discuss some of the design challenges

  11. Big data analytics workflow management for eScience

    NASA Astrophysics Data System (ADS)

    Fiore, Sandro; D'Anca, Alessandro; Palazzo, Cosimo; Elia, Donatello; Mariello, Andrea; Nassisi, Paola; Aloisio, Giovanni

    2015-04-01

    In many domains such as climate and astrophysics, scientific data is often n-dimensional and requires tools that support specialized data types and primitives if it is to be properly stored, accessed, analysed and visualized. Currently, scientific data analytics relies on domain-specific software and libraries providing a huge set of operators and functionalities. However, most of these software fail at large scale since they: (i) are desktop based, rely on local computing capabilities and need the data locally; (ii) cannot benefit from available multicore/parallel machines since they are based on sequential codes; (iii) do not provide declarative languages to express scientific data analysis tasks, and (iv) do not provide newer or more scalable storage models to better support the data multidimensionality. Additionally, most of them: (v) are domain-specific, which also means they support a limited set of data formats, and (vi) do not provide a workflow support, to enable the construction, execution and monitoring of more complex "experiments". The Ophidia project aims at facing most of the challenges highlighted above by providing a big data analytics framework for eScience. Ophidia provides several parallel operators to manipulate large datasets. Some relevant examples include: (i) data sub-setting (slicing and dicing), (ii) data aggregation, (iii) array-based primitives (the same operator applies to all the implemented UDF extensions), (iv) data cube duplication, (v) data cube pivoting, (vi) NetCDF-import and export. Metadata operators are available too. Additionally, the Ophidia framework provides array-based primitives to perform data sub-setting, data aggregation (i.e. max, min, avg), array concatenation, algebraic expressions and predicate evaluation on large arrays of scientific data. Bit-oriented plugins have also been implemented to manage binary data cubes. Defining processing chains and workflows with tens, hundreds of data analytics operators is the

  12. Pegasus Workflow Management System: Helping Applications From Earth and Space

    NASA Astrophysics Data System (ADS)

    Mehta, G.; Deelman, E.; Vahi, K.; Silva, F.

    2010-12-01

    Pegasus WMS is a Workflow Management System that can manage large-scale scientific workflows across Grid, local and Cloud resources simultaneously. Pegasus WMS provides a means for representing the workflow of an application in an abstract XML form, agnostic of the resources available to run it and the location of data and executables. It then compiles these workflows into concrete plans by querying catalogs and farming computations across local and distributed computing resources, as well as emerging commercial and community cloud environments in an easy and reliable manner. Pegasus WMS optimizes the execution as well as data movement by leveraging existing Grid and cloud technologies via a flexible pluggable interface and provides advanced features like reusing existing data, automatic cleanup of generated data, and recursive workflows with deferred planning. It also captures all the provenance of the workflow from the planning stage to the execution of the generated data, helping scientists to accurately measure performance metrics of their workflow as well as data reproducibility issues. Pegasus WMS was initially developed as part of the GriPhyN project to support large-scale high-energy physics and astrophysics experiments. Direct funding from the NSF enabled support for a wide variety of applications from diverse domains including earthquake simulation, bacterial RNA studies, helioseismology and ocean modeling. Earthquake Simulation: Pegasus WMS was recently used in a large scale production run in 2009 by the Southern California Earthquake Centre to run 192 million loosely coupled tasks and about 2000 tightly coupled MPI style tasks on National Cyber infrastructure for generating a probabilistic seismic hazard map of the Southern California region. SCEC ran 223 workflows over a period of eight weeks, using on average 4,420 cores, with a peak of 14,540 cores. A total of 192 million files were produced totaling about 165TB out of which 11TB of data was saved

  13. A Standardized Approach to Topographic Data Processing and Workflow Management

    NASA Astrophysics Data System (ADS)

    Wheaton, J. M.; Bailey, P.; Glenn, N. F.; Hensleigh, J.; Hudak, A. T.; Shrestha, R.; Spaete, L.

    2013-12-01

    An ever-increasing list of options exist for collecting high resolution topographic data, including airborne LIDAR, terrestrial laser scanners, bathymetric SONAR and structure-from-motion. An equally rich, arguably overwhelming, variety of tools exists with which to organize, quality control, filter, analyze and summarize these data. However, scientists are often left to cobble together their analysis as a series of ad hoc steps, often using custom scripts and one-time processes that are poorly documented and rarely shared with the community. Even when literature-cited software tools are used, the input and output parameters differ from tool to tool. These parameters are rarely archived and the steps performed lost, making the analysis virtually impossible to replicate precisely. What is missing is a coherent, robust, framework for combining reliable, well-documented topographic data-processing steps into a workflow that can be repeated and even shared with others. We have taken several popular topographic data processing tools - including point cloud filtering and decimation as well as DEM differencing - and defined a common protocol for passing inputs and outputs between them. This presentation describes a free, public online portal that enables scientists to create custom workflows for processing topographic data using a number of popular topographic processing tools. Users provide the inputs required for each tool and in what sequence they want to combine them. This information is then stored for future reuse (and optionally sharing with others) before the user then downloads a single package that contains all the input and output specifications together with the software tools themselves. The user then launches the included batch file that executes the workflow on their local computer against their topographic data. This ZCloudTools architecture helps standardize, automate and archive topographic data processing. 
It also represents a forum for discovering and

  14. A Formal Framework for Workflow Analysis

    NASA Astrophysics Data System (ADS)

    Cravo, Glória

    2010-09-01

    In this paper we provide a new formal framework to model and analyse workflows. A workflow is the formal definition of a business process that consists of the execution of tasks in order to achieve a certain objective. In our work we describe a workflow as a graph whose vertices represent tasks and whose arcs are associated with workflow transitions. Each task has an associated input/output logic operator. This logic operator can be the logical AND (•), the OR (⊗), or the XOR, i.e. exclusive-or (⊕). Moreover, we introduce algebraic concepts in order to describe the structure of workflows completely. We also introduce the concept of logical termination. Finally, we provide a necessary and sufficient condition for this property to hold.
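
    The graph model described in this abstract can be illustrated with a minimal Python sketch. It is not the paper's formalism: the names Logic, combine, Workflow and enabled are hypothetical, only input logic is modelled, and XOR is read here as "exactly one incoming transition is active", which is an assumption. The abstract does not define logical termination, so the sketch does not attempt it.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Logic(Enum):
    """Logic operators that can be attached to a task."""
    AND = "and"
    OR = "or"
    XOR = "xor"


def combine(op: Logic, values: list[bool]) -> bool:
    """Combine the truth values of a task's incoming transitions."""
    if op is Logic.AND:
        return all(values)
    if op is Logic.OR:
        return any(values)
    # Assumption: XOR means exactly one incoming transition is active.
    return sum(values) == 1


@dataclass
class Workflow:
    # task name -> input logic operator (hypothetical representation)
    input_logic: dict[str, Logic] = field(default_factory=dict)
    # arcs (transitions) as (source task, target task) pairs
    arcs: list[tuple[str, str]] = field(default_factory=list)

    def enabled(self, task: str, active_arcs: set[tuple[str, str]]) -> bool:
        """Return True if the task's input condition holds for the active arcs."""
        incoming = [arc for arc in self.arcs if arc[1] == task]
        if not incoming:  # a start task has no input condition
            return True
        states = [arc in active_arcs for arc in incoming]
        return combine(self.input_logic[task], states)


# Two tasks "a" and "b" feed a task "join" that carries an XOR input operator.
wf = Workflow(
    input_logic={"join": Logic.XOR},
    arcs=[("a", "join"), ("b", "join")],
)
```

    With this example, wf.enabled("join", {("a", "join")}) holds, while activating both incoming arcs violates the XOR condition.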

  15. A Community-Driven Workflow Recommendations and Reuse Infrastructure

    NASA Astrophysics Data System (ADS)

    Zhang, J.; Votava, P.; Lee, T. J.; Lee, C.; Xiao, S.; Nemani, R. R.; Foster, I.

    2013-12-01

    Aiming to connect the Earth science community to accelerate the rate of discovery, NASA Earth Exchange (NEX) has established an online repository and platform, so that researchers can publish and share their tools and models with colleagues. In recent years, workflow has become a popular technique at NEX for Earth scientists to define executable multi-step procedures for data processing and analysis. The ability to discover and reuse knowledge (sharable workflows or workflow) is critical to the future advancement of science. However, as reported in our earlier study, the reusability of scientific artifacts at current time is very low. Scientists often do not feel confident in using other researchers' tools and utilities. One major reason is that researchers are often unaware of the existence of others' data preprocessing processes. Meanwhile, researchers often do not have time to fully document the processes and expose them to others in a standard way. These issues cannot be overcome by the existing workflow search technologies used in NEX and other data projects. Therefore, this project aims to develop a proactive recommendation technology based on collective NEX user behaviors. In this way, we aim to promote and encourage process and workflow reuse within NEX. Particularly, we focus on leveraging peer scientists' best practices to support the recommendation of artifacts developed by others. Our underlying theoretical foundation is rooted in the social cognitive theory, which declares people learn by watching what others do. Our fundamental hypothesis is that sharable artifacts have network properties, much like humans in social networks. More generally, reusable artifacts form various types of social relationships (ties), and may be viewed as forming what organizational sociologists who use network analysis to study human interactions call a 'knowledge network.' 
In particular, we will tackle two research questions: R1: What hidden knowledge may be extracted from

  16. A workflow learning model to improve geovisual analytics utility.

    PubMed

    Roth, Robert E; Maceachren, Alan M; McCabe, Craig A

    2009-01-01

    the concept of scientific workflows. Second, we implemented an interface in the G-EX Portal Learn Module to demonstrate the workflow learning model. The workflow interface allows users to drag learning artifacts uploaded to the G-EX Portal onto a central whiteboard and then annotate the workflow using text and drawing tools. Once completed, users can visit the assembled workflow to get an idea of the kind, number, and scale of analysis steps, view individual learning artifacts associated with each node in the workflow, and ask questions about the overall workflow or individual learning artifacts through the associated forums. An example learning workflow in the domain of epidemiology is provided to demonstrate the effectiveness of the approach. RESULTS/CONCLUSIONS: In the context of geovisual analytics, GIScientists are not only responsible for developing software to facilitate visually-mediated reasoning about large and complex spatiotemporal information, but also for ensuring that this software works. The workflow learning model discussed in this paper and demonstrated in the G-EX Portal Learn Module is one approach to improving the utility of geovisual analytics software. While development of the G-EX Portal Learn Module is ongoing, we expect to release the G-EX Portal Learn Module by Summer 2009.

  17. An iterative expanding and shrinking process for processor allocation in mixed-parallel workflow scheduling.

    PubMed

    Huang, Kuo-Chan; Wu, Wei-Ya; Wang, Feng-Jian; Liu, Hsiao-Ching; Hung, Chun-Hao

    2016-01-01

    Parallel computation has been widely applied in a variety of large-scale scientific and engineering applications. Many studies indicate that exploiting both task and data parallelisms, i.e. mixed-parallel workflows, to solve large computational problems can get better efficacy compared with either pure task parallelism or pure data parallelism. Scheduling traditional workflows of pure task parallelism on parallel systems has long been known to be an NP-complete problem. Mixed-parallel workflow scheduling has to deal with an additional challenging issue of processor allocation. In this paper, we explore the processor allocation issue in scheduling mixed-parallel workflows of moldable tasks, called M-task, and propose an Iterative Allocation Expanding and Shrinking (IAES) approach. Compared to previous approaches, our IAES has two distinguishing features. The first is allocating more processors to the tasks on allocated critical paths for effectively reducing the makespan of workflow execution. The second is allowing the processor allocation of an M-task to shrink during the iterative procedure, resulting in a more flexible and effective process for finding better allocation. The proposed IAES approach has been evaluated with a series of simulation experiments and compared to several well-known previous methods, including CPR, CPA, MCPA, and MCPA2. The experimental results indicate that our IAES approach outperforms those previous methods significantly in most situations, especially when nodes of the same layer in a workflow might have unequal workloads. PMID:27504236
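
    The role of the critical path in processor allocation can be sketched as follows. This is not the IAES algorithm, whose allocation and shrinking rules are not given in the abstract. It is a minimal illustration that assumes ideal speedup (run time = work / processors), ignores the total processor budget across concurrent tasks, and uses hypothetical names (critical_path, expand_once) and made-up workloads.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def critical_path(work, alloc, preds):
    """Return (makespan, path) for a DAG of moldable tasks.

    work[t]  : sequential work of task t (hypothetical units)
    alloc[t] : number of processors currently allocated to t
    preds[t] : set of predecessor tasks of t
    Assumes ideal speedup: run time = work / processors.
    """
    finish, best_pred = {}, {}
    for t in TopologicalSorter(preds).static_order():
        start, best_pred[t] = 0.0, None
        for p in preds.get(t, ()):
            if finish[p] > start:  # latest-finishing predecessor sets the start
                start, best_pred[t] = finish[p], p
        finish[t] = start + work[t] / alloc[t]
    node = max(finish, key=finish.get)
    makespan, path = finish[node], []
    while node is not None:  # walk back along the critical predecessors
        path.append(node)
        node = best_pred[node]
    return makespan, path[::-1]


def expand_once(work, alloc, preds, max_procs):
    """Give one more processor to every task on the current critical path."""
    _, path = critical_path(work, alloc, preds)
    new_alloc = dict(alloc)
    for t in path:
        new_alloc[t] = min(max_procs, alloc[t] + 1)
    return new_alloc


# Hypothetical diamond-shaped workflow: A -> (B, C) -> D
work = {"A": 8, "B": 4, "C": 12, "D": 6}
preds = {"A": set(), "B": {"A"}, "C": {"A"}, "D": {"B", "C"}}
alloc = {t: 1 for t in work}
```

    In this example the critical path is A, C, D with a makespan of 26 time units; one expansion step gives each of those tasks a second processor and reduces the makespan to 13.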

  19. Taverna Workflows in the Virtual Observatory

    NASA Astrophysics Data System (ADS)

    Benson, K.; Cecconi, B.

    2015-12-01

    Planetary and Solar applications developed over the last decade generate data at a previously unimaginable scale. One of these programmes, which builds on the strengths of IDIS of Europlanet FP7, is the Virtual European Solar and Planetary Access (VESPA). With VESPA more data will be distributed and the connectivity of tools and infrastructure will improve. VESPA enables growth of the user and provider community. However, the challenge of connectivity persists throughout application data services. VESPA calls are formed in part by tools and interaction services. One such tool and interaction service is the Taverna workflow management system. Workflows make it possible to address the challenges of data interconnectivity by establishing pipelines to services offered by other data streaming services. Workflows offer the capability to cross domains and overcome interoperability issues. Furthermore, Taverna offers sharing of workflows; the academic community 'myExperiment', a social site for scientists, supports search and opens access to pre-existing workflows. This presentation focuses on cross-domain workflows, including use of the infrastructure set up with the Helio, EUROPLANET and VAMDC projects. A hands-on demonstration and an opportunity to join the community discussion will make the presentation more interactive.

  20. Scientific Data Management (SDM) Center for Enabling Technologies. 2007-2012

    SciTech Connect

    Ludascher, Bertram; Altintas, Ilkay

    2013-09-06

    Over the past five years, our activities have both established Kepler as a viable scientific workflow environment and demonstrated its value across multiple science applications. We have published numerous peer-reviewed papers on the technologies highlighted in this short paper and have given Kepler tutorials at SC06, SC07, SC08, and SciDAC 2007. Our outreach activities have allowed scientists to learn best practices and better utilize Kepler to address their individual workflow problems. Our contributions to advancing the state-of-the-art in scientific workflows have focused on the following areas. Progress in each of these areas is described in subsequent sections. Workflow development. The development of a deeper understanding of scientific workflows "in the wild" and of the requirements for support tools that allow easy construction of complex scientific workflows; Generic workflow components and templates. The development of generic actors (i.e. workflow components and processes) which can be broadly applied to scientific problems; Provenance collection and analysis. The design of a flexible provenance collection and analysis infrastructure within the workflow environment; and, Workflow reliability and fault tolerance. The improvement of the reliability and fault-tolerance of workflow environments.

  1. Structuring Clinical Workflows for Diabetes Care

    PubMed Central

    Lasierra, N.; Oberbichler, S.; Toma, I.; Fensel, A.; Hoerbst, A.

    2014-01-01

    Background: Electronic health records (EHRs) play an important role in the treatment of chronic diseases such as diabetes mellitus. Although the interoperability and selected functionality of EHRs are already addressed by a number of standards and best practices, such as IHE or HL7, the majority of these systems are still monolithic from a user-functionality perspective. The purpose of the OntoHealth project is to foster a functionally flexible, standards-based use of EHRs to support clinical routine task execution by means of workflow patterns and to shift the present EHR usage to a more comprehensive integration concerning complete clinical workflows. Objectives: The goal of this paper is, first, to introduce the basic architecture of the proposed OntoHealth project and, second, to present selected functional needs and a functional categorization regarding workflow-based interactions with EHRs in the domain of diabetes. Methods: A systematic literature review regarding attributes of workflows in the domain of diabetes was conducted. Eligible references were gathered and analyzed using a qualitative content analysis. Subsequently, a functional workflow categorization was derived from diabetes-specific raw data together with existing general workflow patterns. Results: This paper presents the design of the architecture as well as a categorization model which makes it possible to describe the components or building blocks within clinical workflows. The results of our study lead us to identify basic building blocks, named as actions, decisions, and data elements, which allow the composition of clinical workflows within five identified contexts. Conclusions: The categorization model allows for a description of the components or building blocks of clinical workflows from a functional view. PMID:25024765

  2. Nanocuration workflows: Establishing best practices for identifying, inputting, and sharing data to inform decisions on nanomaterials.

    PubMed

    Powers, Christina M; Mills, Karmann A; Morris, Stephanie A; Klaessig, Fred; Gaheen, Sharon; Lewinski, Nastassja; Ogilvie Hendren, Christine

    2015-01-01

    There is a critical opportunity in the field of nanoscience to compare and integrate information across diverse fields of study through informatics (i.e., nanoinformatics). This paper is one in a series of articles on the data curation process in nanoinformatics (nanocuration). Other articles in this series discuss key aspects of nanocuration (temporal metadata, data completeness, database integration), while the focus of this article is on the nanocuration workflow, or the process of identifying, inputting, and reviewing nanomaterial data in a data repository. In particular, the article discusses: 1) the rationale and importance of a defined workflow in nanocuration, 2) the influence of organizational goals or purpose on the workflow, 3) established workflow practices in other fields, 4) current workflow practices in nanocuration, 5) key challenges for workflows in emerging fields like nanomaterials, 6) examples to make these challenges more tangible, and 7) recommendations to address the identified challenges. Throughout the article, there is an emphasis on illustrating key concepts and current practices in the field. Data on current practices in the field are from a group of stakeholders active in nanocuration. In general, the development of workflows for nanocuration is nascent, with few individuals formally trained in data curation or utilizing available nanocuration resources (e.g., ISA-TAB-Nano). Additional emphasis on the potential benefits of cultivating nanomaterial data via nanocuration processes (e.g., capability to analyze data from across research groups) and providing nanocuration resources (e.g., training) will likely prove crucial for the wider application of nanocuration workflows in the scientific community. PMID:26425437

  3. Nanocuration workflows: Establishing best practices for identifying, inputting, and sharing data to inform decisions on nanomaterials

    PubMed Central

    Powers, Christina M; Mills, Karmann A; Morris, Stephanie A; Klaessig, Fred; Gaheen, Sharon; Lewinski, Nastassja

    2015-01-01

    There is a critical opportunity in the field of nanoscience to compare and integrate information across diverse fields of study through informatics (i.e., nanoinformatics). This paper is one in a series of articles on the data curation process in nanoinformatics (nanocuration). Other articles in this series discuss key aspects of nanocuration (temporal metadata, data completeness, database integration), while the focus of this article is on the nanocuration workflow, or the process of identifying, inputting, and reviewing nanomaterial data in a data repository. In particular, the article discusses: 1) the rationale and importance of a defined workflow in nanocuration, 2) the influence of organizational goals or purpose on the workflow, 3) established workflow practices in other fields, 4) current workflow practices in nanocuration, 5) key challenges for workflows in emerging fields like nanomaterials, 6) examples to make these challenges more tangible, and 7) recommendations to address the identified challenges. Throughout the article, there is an emphasis on illustrating key concepts and current practices in the field. Data on current practices in the field are from a group of stakeholders active in nanocuration. In general, the development of workflows for nanocuration is nascent, with few individuals formally trained in data curation or utilizing available nanocuration resources (e.g., ISA-TAB-Nano). Additional emphasis on the potential benefits of cultivating nanomaterial data via nanocuration processes (e.g., capability to analyze data from across research groups) and providing nanocuration resources (e.g., training) will likely prove crucial for the wider application of nanocuration workflows in the scientific community. PMID:26425437

  4. VisIVO: A Web-Based, Workflow-Enabled Gateway for Astrophysical Visualization

    NASA Astrophysics Data System (ADS)

    Costa, A.; Bandieramonte, M.; Becciani, U.; Krokos, M.; Massimino, P.; Petta, C.; Pistagna, C.; Riggi, S.; Sciacca, E.; Vitello, F.

    2013-10-01

    We present a web-based and workflow-enabled framework called VisIVO Gateway that allows integration of large-scale multidimensional datasets together with applications for visualization and exploration on Distributed Computing Infrastructures (DCIs). Our framework is implemented through a workflow-enabled portal wrapped around WS-PGRADE which is the grid User Support Environment (gUSE) portal. We provide customized interfaces for creating, invoking, monitoring and also modifying scientific workflows. All technical complexities, e.g. related to visualization algorithms and DCI configurations, are conveniently hidden from view. A number of workflows are enabled by default, e.g. implementing local or remote uploading and creation of scientific movies. Scientific movies are useful not only to scientists for presenting their research results, but also to museums and science centers for engaging visitors with complex scientific concepts. Our gateway can be accessed via standard www interfaces but also through a newly developed iOS mobile application offering novel ways for sharing analysis and exploration experiences with large-scale datasets in collaborative environments.

  5. Security aspects in teleradiology workflow

    NASA Astrophysics Data System (ADS)

    Soegner, Peter I.; Helweg, Gernot; Holzer, Heimo; zur Nedden, Dieter

    2000-05-01

    The medicolegal necessity of privacy, security and confidentiality motivated our attempt to develop a secure teleradiology workflow between the telepartners, i.e. the radiologist and the referring physician. To avoid gaps in data protection and data security, we introduced biometric fingerprint scanners in combination with smart cards to identify the teleradiology partners and communicated over an encrypted TCP/IP satellite link between Innsbruck and Reutte. We used an asymmetric cryptography method to guarantee authentication, integrity of the data packages and confidentiality of the medical data. It was necessary to use a biometric feature to avoid mistaken identity of persons who wanted access to the system. Only an invariable electronic identification allowed legal liability for the final report, and only a secure data connection allowed the exchange of sensitive medical data between different partners of Health Care Networks. In our study we selected the user-friendly combination of a smart card and a biometric fingerprint technique, called Skymed™ Double Guard Secure Keyboard (Agfa-Gevaert), to confirm identities and log into the imaging workstations and the electronic patient record. We examined the interoperability of the software used with the existing platforms. Only the WIN-XX operating systems could be protected at the time of our study.
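
    How an asymmetric method can provide authentication and integrity for a data package can be illustrated with a toy signature scheme. This is not the system described in the abstract, whose algorithm is not named. It is textbook RSA without padding, with a modulus far too small for real use, and the function names (digest, sign, verify) are hypothetical. Confidentiality is not shown. A real deployment would rely on a vetted cryptographic library.

```python
import hashlib

# Toy key pair built from two Mersenne primes (illustration only, NOT secure).
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q                              # public modulus
E = 65537                              # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))      # private exponent (Python 3.8+)


def digest(data: bytes) -> int:
    """Hash the data package and reduce the hash into the range of the modulus."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N


def sign(data: bytes) -> int:
    """Sender side: sign the digest with the private exponent."""
    return pow(digest(data), D, N)


def verify(data: bytes, signature: int) -> bool:
    """Receiver side: check the signature using only the public key (N, E)."""
    return pow(signature, E, N) == digest(data)


# Hypothetical report text signed by the sender and checked by the receiver.
report = b"final radiology report"
signature = sign(report)
```

    Here verify(report, signature) succeeds only for the unmodified report, which is the integrity property; because only the holder of the private exponent can produce a valid signature, it also authenticates the sender.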

  6. Workflow Optimization in Vertebrobasilar Occlusion

    SciTech Connect

    Kamper, Lars; Meyn, Hannes; Nordmeyer, Simone; Kempkes, Udo; Piroth, Werner

    2012-06-15

    Objective: In vertebrobasilar occlusion, rapid recanalization is the only substantial means to improve the prognosis. We introduced a standard operating procedure (SOP) for interventional therapy to analyze the effects on interdisciplinary time management. Methods: Intrahospital time periods between hospital admission and neuroradiological intervention were retrospectively analyzed, together with the patients' outcome, before (n = 18) and after (n = 20) implementation of the SOP. Results: After implementation of the SOP, we observed statistically significant improvement of postinterventional patient neurological status (p = 0.017). In addition, we found a decrease of 5:33 h for the mean time period from hospital admission until neuroradiological intervention. The recanalization rate increased from 72.2% to 80% after implementation of the SOP. Conclusion: Our results underscore the relevance of SOP implementation and analysis of time management for clinical workflow optimization. Both may trigger awareness for the need of efficient interdisciplinary time management. This could be an explanation for the decreased time periods and improved postinterventional patient status after SOP implementation.

  7. Seamless online science workflow development and collaboration using IDL and the ENVI Services Engine

    NASA Astrophysics Data System (ADS)

    Harris, A. T.; Ramachandran, R.; Maskey, M.

    2013-12-01

    The Exelis-developed IDL and ENVI software are ubiquitous tools in Earth science research environments. The IDL Workbench is used by the Earth science community for programming custom data analysis and visualization modules. ENVI is a software solution for processing and analyzing geospatial imagery that combines support for multiple Earth observation scientific data types (optical, thermal, multi-spectral, hyperspectral, SAR, LiDAR) with advanced image processing and analysis algorithms. The ENVI & IDL Services Engine (ESE) is an Earth science data processing engine that allows researchers to use open standards to rapidly create, publish and deploy advanced Earth science data analytics within any existing enterprise infrastructure. Although powerful in many ways, the tools lack collaborative features out-of-box. Thus, as part of the NASA funded project, Collaborative Workbench to Accelerate Science Algorithm Development, researchers at the University of Alabama in Huntsville and Exelis have developed plugins that allow seamless research collaboration from within IDL workbench. Such additional features within IDL workbench are possible because IDL workbench is built using the Eclipse Rich Client Platform (RCP). RCP applications allow custom plugins to be dropped in for extended functionalities. Specific functionalities of the plugins include creating complex workflows based on IDL application source code, submitting workflows to be executed by ESE in the cloud, and sharing and cloning of workflows among collaborators. All these functionalities are available to scientists without leaving their IDL workbench. Because ESE can interoperate with any middleware, scientific programmers can readily string together IDL processing tasks (or tasks written in other languages like C++, Java or Python) to create complex workflows for deployment within their current enterprise architecture (e.g. ArcGIS Server, GeoServer, Apache ODE or SciFlo from JPL). 
Using the collaborative IDL

  8. AutoDrug: fully automated macromolecular crystallography workflows for fragment-based drug discovery

    PubMed Central

    Tsai, Yingssu; McPhillips, Scott E.; González, Ana; McPhillips, Timothy M.; Zinn, Daniel; Cohen, Aina E.; Feese, Michael D.; Bushnell, David; Tiefenbrunn, Theresa; Stout, C. David; Ludaescher, Bertram; Hedman, Britt; Hodgson, Keith O.; Soltis, S. Michael

    2013-01-01

    AutoDrug is software based upon the scientific workflow paradigm that integrates the Stanford Synchrotron Radiation Lightsource macromolecular crystallography beamlines and third-party processing software to automate the crystallography steps of the fragment-based drug-discovery process. AutoDrug screens a cassette of fragment-soaked crystals, selects crystals for data collection based on screening results and user-specified criteria and determines optimal data-collection strategies. It then collects and processes diffraction data, performs molecular replacement using provided models and detects electron density that is likely to arise from bound fragments. All processes are fully automated, i.e. are performed without user interaction or supervision. Samples can be screened in groups corresponding to particular proteins, crystal forms and/or soaking conditions. A single AutoDrug run is only limited by the capacity of the sample-storage dewar at the beamline: currently 288 samples. AutoDrug was developed in conjunction with RestFlow, a new scientific workflow-automation framework. RestFlow simplifies the design of AutoDrug by managing the flow of data and the organization of results and by orchestrating the execution of computational pipeline steps. It also simplifies the execution and interaction of third-party programs and the beamline-control system. Modeling AutoDrug as a scientific workflow enables multiple variants that meet the requirements of different user groups to be developed and supported. A workflow tailored to mimic the crystallography stages comprising the drug-discovery pipeline of CoCrystal Discovery Inc. has been deployed and successfully demonstrated. This workflow was run once on the same 96 samples that the group had examined manually and the workflow cycled successfully through all of the samples, collected data from the same samples that were selected manually and located the same peaks of unmodeled density in the resulting difference

  9. Integrated workflows for spiking neuronal network simulations

    PubMed Central

    Antolík, Ján; Davison, Andrew P.

    2013-01-01

    The increasing availability of computational resources is enabling more detailed, realistic modeling in computational neuroscience, resulting in a shift toward more heterogeneous models of neuronal circuits, and employment of complex experimental protocols. This poses a challenge for existing tool chains, as the set of tools involved in a typical modeler's workflow is expanding concomitantly, with growing complexity in the metadata flowing between them. For many parts of the workflow, a range of tools is available; however, numerous areas lack dedicated tools, while integration of existing tools is limited. This forces modelers to either handle the workflow manually, leading to errors, or to write substantial amounts of code to automate parts of the workflow, in both cases reducing their productivity. To address these issues, we have developed Mozaik: a workflow system for spiking neuronal network simulations written in Python. Mozaik integrates model, experiment and stimulation specification, simulation execution, data storage, data analysis and visualization into a single automated workflow, ensuring that all relevant metadata are available to all workflow components. It is based on several existing tools, including PyNN, Neo, and Matplotlib. It offers a declarative way to specify models and recording configurations using hierarchically organized configuration files. Mozaik automatically records all data together with all relevant metadata about the experimental context, allowing automation of the analysis and visualization stages. Mozaik has a modular architecture, and the existing modules are designed to be extensible with minimal programming effort. Mozaik increases the productivity of running virtual experiments on highly structured neuronal networks by automating the entire experimental cycle, while increasing the reliability of modeling studies by relieving the user from manual handling of the flow of metadata between the individual workflow stages.

  10. Integrated workflows for spiking neuronal network simulations.

    PubMed

    Antolík, Ján; Davison, Andrew P

    2013-01-01

    The increasing availability of computational resources is enabling more detailed, realistic modeling in computational neuroscience, resulting in a shift toward more heterogeneous models of neuronal circuits, and employment of complex experimental protocols. This poses a challenge for existing tool chains, as the set of tools involved in a typical modeler's workflow is expanding concomitantly, with growing complexity in the metadata flowing between them. For many parts of the workflow, a range of tools is available; however, numerous areas lack dedicated tools, while integration of existing tools is limited. This forces modelers to either handle the workflow manually, leading to errors, or to write substantial amounts of code to automate parts of the workflow, in both cases reducing their productivity. To address these issues, we have developed Mozaik: a workflow system for spiking neuronal network simulations written in Python. Mozaik integrates model, experiment and stimulation specification, simulation execution, data storage, data analysis and visualization into a single automated workflow, ensuring that all relevant metadata are available to all workflow components. It is based on several existing tools, including PyNN, Neo, and Matplotlib. It offers a declarative way to specify models and recording configurations using hierarchically organized configuration files. Mozaik automatically records all data together with all relevant metadata about the experimental context, allowing automation of the analysis and visualization stages. Mozaik has a modular architecture, and the existing modules are designed to be extensible with minimal programming effort. Mozaik increases the productivity of running virtual experiments on highly structured neuronal networks by automating the entire experimental cycle, while increasing the reliability of modeling studies by relieving the user from manual handling of the flow of metadata between the individual workflow stages.

  11. Integrating advanced visualization technology into the planetary Geoscience workflow

    NASA Astrophysics Data System (ADS)

    Huffman, John; Forsberg, Andrew; Loomis, Andrew; Head, James; Dickson, James; Fassett, Caleb

    2011-09-01

    Recent advances in computer visualization have allowed us to develop new tools for analyzing the data gathered during planetary missions, which is important, since these data sets have grown exponentially in recent years to tens of terabytes in size. As part of the Advanced Visualization in Solar System Exploration and Research (ADVISER) project, we utilize several advanced visualization techniques created specifically with planetary image data in mind. The Geoviewer application allows real-time active stereo display of images, which in aggregate have billions of pixels. The ADVISER desktop application platform allows fast three-dimensional visualization of planetary images overlain on digital terrain models. Both applications include tools for easy data ingest and real-time analysis in a programmatic manner. Incorporation of these tools into our everyday scientific workflow has proved important for scientific analysis, discussion, and publication, and enabled effective and exciting educational activities for students from high school through graduate school.

  12. Additions to the Human Plasma Proteome via a Tandem MARS Depletion iTRAQ-Based Workflow.

    PubMed

    Cao, Zhiyun; Yende, Sachin; Kellum, John A; Robinson, Renã A S

    2013-01-01

    Robust platforms for determining differentially expressed proteins in biomarker and discovery studies using human plasma are of great interest. While increased depth in proteome coverage is desirable, it is associated with costs of experimental time due to necessary sample fractionation. We evaluated a robust quantitative proteomics workflow for its ability (1) to provide increased depth in plasma proteome coverage and (2) to give statistical insight useful for establishing differentially expressed plasma proteins. The workflow involves dual-stage immunodepletion on a multiple affinity removal system (MARS) column, iTRAQ tagging, offline strong-cation exchange chromatography, and liquid chromatography tandem mass spectrometry (LC-MS/MS). Independent workflow experiments were performed in triplicate on four plasma samples tagged with iTRAQ 4-plex reagents. After stringent criteria were applied to database searched results, 689 proteins with at least two spectral counts (SC) were identified. Depth in proteome coverage was assessed by comparison to the 2010 Human Plasma Proteome Reference Database in which our studies reveal 399 additional proteins which have not been previously reported. Additionally, we report on the technical variation of this quantitative workflow which ranges from ±11 to 30%.

  13. Additions to the Human Plasma Proteome via a Tandem MARS Depletion iTRAQ-Based Workflow

    PubMed Central

    Cao, Zhiyun; Yende, Sachin; Kellum, John A.; Robinson, Renã A. S.

    2013-01-01

    Robust platforms for determining differentially expressed proteins in biomarker and discovery studies using human plasma are of great interest. While increased depth in proteome coverage is desirable, it is associated with costs of experimental time due to necessary sample fractionation. We evaluated a robust quantitative proteomics workflow for its ability (1) to provide increased depth in plasma proteome coverage and (2) to give statistical insight useful for establishing differentially expressed plasma proteins. The workflow involves dual-stage immunodepletion on a multiple affinity removal system (MARS) column, iTRAQ tagging, offline strong-cation exchange chromatography, and liquid chromatography tandem mass spectrometry (LC-MS/MS). Independent workflow experiments were performed in triplicate on four plasma samples tagged with iTRAQ 4-plex reagents. After stringent criteria were applied to database searched results, 689 proteins with at least two spectral counts (SC) were identified. Depth in proteome coverage was assessed by comparison to the 2010 Human Plasma Proteome Reference Database in which our studies reveal 399 additional proteins which have not been previously reported. Additionally, we report on the technical variation of this quantitative workflow which ranges from ±11 to 30%. PMID:23509626

  14. How Workflow Documentation Facilitates Curation Planning

    NASA Astrophysics Data System (ADS)

    Wickett, K.; Thomer, A. K.; Baker, K. S.; DiLauro, T.; Asangba, A. E.

    2013-12-01

    The description of the specific processes and artifacts that led to the creation of a data product provides a detailed picture of data provenance in the form of a workflow. The Site-Based Data Curation project, hosted by the Center for Informatics Research in Science and Scholarship at the University of Illinois, has been investigating how workflows can be used in developing curation processes and policies that move curation "upstream" in the research process. The team has documented an individual workflow for geobiology data collected during a single field trip to Yellowstone National Park. This specific workflow suggests a generalized three-part process for field data collection that comprises three distinct elements: a Planning Stage, a Fieldwork Stage, and a Processing and Analysis Stage. Beyond supplying an account of data provenance, the workflow has allowed the team to identify 1) points of intervention for curation processes and 2) data products that are likely candidates for sharing or deposit. Although these objects may be viewed by individual researchers as 'intermediate' data products, discussions with geobiology researchers have suggested that with appropriate packaging and description they may serve as valuable observational data for other researchers. Curation interventions may include the introduction of regularized data formats during the planning process, data description procedures, the identification and use of established controlled vocabularies, and data quality and validation procedures. We propose a poster that shows the individual workflow and our generalization into a three-stage process. We plan to discuss with attendees how well the three-stage view applies to other types of field-based research, likely points of intervention, and what kinds of interventions are appropriate and feasible in the example workflow.

  15. Multilevel Workflow System in the ATLAS Experiment

    NASA Astrophysics Data System (ADS)

    Borodin, M.; De, K.; Garcia Navarro, J.; Golubkov, D.; Klimentov, A.; Maeno, T.; Vaniachine, A.; ATLAS Collaboration

    2015-05-01

    The ATLAS experiment is scaling up Big Data processing for the next LHC run using a multilevel workflow system comprised of many layers. In Big Data processing ATLAS deals with datasets, not individual files. Similarly, a task (comprised of many jobs) has become a unit of the ATLAS workflow in distributed computing, with about 0.8M tasks processed per year. In order to manage the diversity of LHC physics (exceeding 35K physics samples per year), the individual data processing tasks are organized into workflows. For example, the Monte Carlo workflow is composed of many steps: generate or configure hard-processes, hadronize signal and minimum-bias (pileup) events, simulate energy deposition in the ATLAS detector, digitize electronics response, simulate triggers, reconstruct data, convert the reconstructed data into ROOT ntuples for physics analysis, etc. Outputs are merged and/or filtered as necessary to optimize the chain. The bi-level workflow manager - ProdSys2 - generates actual workflow tasks and their jobs are executed across more than a hundred distributed computing sites by PanDA - the ATLAS job-level workload management system. On the outer level, the Database Engine for Tasks (DEfT) empowers production managers with templated workflow definitions. On the next level, the Job Execution and Definition Interface (JEDI) is integrated with PanDA to provide dynamic job definition tailored to the sites' capabilities. We report on scaling up the production system to accommodate a growing number of requirements from main ATLAS areas: Trigger, Physics and Data Preparation.

  16. A Novel Spectral Library Workflow to Enhance Protein Identifications

    PubMed Central

    Li, Haomin; Zong, Nobel C.; Liang, Xiangbo; Kim, Allen; Choi, Jeong Ho; Deng, Ning; Zelaya, Ivette; Lam, Maggie; Duan, Huilong; Ping, Peipei

    2013-01-01

    The innovations in mass spectrometry-based investigations in proteome biology enable systematic characterization of molecular details in pathophysiological phenotypes. However, the process of delineating large-scale raw proteomic datasets into a biological context requires high-throughput data acquisition and processing. A spectral library search engine makes use of previously annotated experimental spectra as references for subsequent spectral analyses. This workflow delivers many advantages, including elevated analytical efficiency and specificity as well as reduced demands in computational capacity. In this study, we created a spectral matching engine to address challenges commonly associated with a library search workflow. Particularly, an improved sliding dot product algorithm, that is robust to systematic drifts of mass measurement in spectra, is introduced. Furthermore, a noise management protocol distinguishes spectra correlation attributed from noise and peptide fragments. It enables elevated separation between target spectral matches and false matches, thereby suppressing the possibility of propagating inaccurate peptide annotations from library spectra to query spectra. Moreover, preservation of original spectra also accommodates user contributions to further enhance the quality of the library. Collectively, this search engine supports reproducible data analyses using curated references, thereby broadening the accessibility of proteomics resources to biomedical investigators. PMID:23391412
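
    The sliding dot product idea mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' engine: the binning scheme, the function names (bin_spectrum, sliding_dot_product) and the parameters (bin_width, n_bins, max_shift) are all assumptions made for illustration. The sketch bins each spectrum into an intensity vector, then scores the query against the library spectrum at several small bin offsets and keeps the best normalized dot product, so that a uniform mass drift does not destroy the match.

```python
import math

def bin_spectrum(peaks, bin_width=1.0, n_bins=2000):
    """Convert (m/z, intensity) peaks into a fixed-length intensity vector.

    bin_width and n_bins are illustrative defaults, not values from the paper.
    """
    vec = [0.0] * n_bins
    for mz, intensity in peaks:
        idx = int(mz / bin_width)
        if 0 <= idx < n_bins:
            vec[idx] += intensity
    return vec

def sliding_dot_product(query, library, max_shift=2):
    """Return (best_score, best_shift) over integer bin shifts of the query.

    The score is the cosine-normalized dot product; a shift of s compares
    query bin i with library bin i + s.
    """
    norm = math.sqrt(sum(q * q for q in query)) * math.sqrt(sum(l * l for l in library))
    if norm == 0:
        return 0.0, 0
    best_score, best_shift = -1.0, 0
    n = len(library)
    for shift in range(-max_shift, max_shift + 1):
        score = 0.0
        for i, q in enumerate(query):
            j = i + shift
            if q and 0 <= j < n:
                score += q * library[j]
        score /= norm
        if score > best_score:
            best_score, best_shift = score, shift
    return best_score, best_shift

# Hypothetical spectra: the query is the library spectrum drifted by +1 m/z.
library_vec = bin_spectrum([(100.0, 10.0), (200.0, 5.0), (300.0, 1.0)])
query_vec = bin_spectrum([(101.0, 10.0), (201.0, 5.0), (301.0, 1.0)])
score, shift = sliding_dot_product(query_vec, library_vec)
```

    With an unshifted comparison these two spectra would share no bins; allowing the query to slide recovers the match at a shift of one bin. A production engine would likely use finer bins and weight intensities, which this sketch leaves out.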

  17. A novel spectral library workflow to enhance protein identifications.

    PubMed

    Li, Haomin; Zong, Nobel C; Liang, Xiangbo; Kim, Allen K; Choi, Jeong Ho; Deng, Ning; Zelaya, Ivette; Lam, Maggie; Duan, Huilong; Ping, Peipei

    2013-04-01

    The innovations in mass spectrometry-based investigations in proteome biology enable systematic characterization of molecular details in pathophysiological phenotypes. However, the process of delineating large-scale raw proteomic datasets into a biological context requires high-throughput data acquisition and processing. A spectral library search engine makes use of previously annotated experimental spectra as references for subsequent spectral analyses. This workflow delivers many advantages, including elevated analytical efficiency and specificity as well as reduced demands in computational capacity. In this study, we created a spectral matching engine to address challenges commonly associated with a library search workflow. Particularly, an improved sliding dot product algorithm, that is robust to systematic drifts of mass measurement in spectra, is introduced. Furthermore, a noise management protocol distinguishes spectra correlation attributed from noise and peptide fragments. It enables elevated separation between target spectral matches and false matches, thereby suppressing the possibility of propagating inaccurate peptide annotations from library spectra to query spectra. Moreover, preservation of original spectra also accommodates user contributions to further enhance the quality of the library. Collectively, this search engine supports reproducible data analyses using curated references, thereby broadening the accessibility of proteomics resources to biomedical investigators. This article is part of a Special Issue entitled: From protein structures to clinical applications. PMID:23391412

  18. Agalma: an automated phylogenomics workflow

    PubMed Central

    2013-01-01

    ://bitbucket.org/caseywdunn/agalma. Conclusions Agalma allows complex phylogenomic analyses to be implemented and described unambiguously as a series of high-level commands. This will enable phylogenomic studies to be readily reproduced, modified, and extended. Agalma also facilitates methods development by providing a complete modular workflow, bundled with test data, that will allow further optimization of each step in the context of a full phylogenomic analysis. PMID:24252138

  19. Creating OGC Web Processing Service workflows using a web-based editor

    NASA Astrophysics Data System (ADS)

    de Jesus, J.; Walker, P.; Grant, M.

    2012-04-01

    The OGC WPS (Web Processing Service) specifies how geospatial algorithms may be accessed in an SOA (Service Oriented Architecture). Service providers can encode both simple and sophisticated algorithms as WPS processes and publish them as web services. These services are not only useful individually but may be built into complex processing chains (workflows) that can solve complex data analysis and/or scientific problems. The NETMAR project has extended the Web Processing Service (WPS) framework to provide transparent integration between it and the commonly used WSDL (Web Service Description Language) that describes the web services and its default SOAP (Simple Object Access Protocol) binding. The extensions allow WPS services to be orchestrated using commonly used tools (in this case Taverna Workbench, but BPEL based systems would also be an option). We have also developed a WebGUI service editor, based on HTML5 and the WireIt! Javascript API, that allows users to create these workflows using only a web browser. The editor is coded entirely in Javascript and performs all XSLT transformations needed to produce a Taverna compatible (T2FLOW) workflow description which can be exported and run on a local Taverna Workbench or uploaded to a web-based orchestration server and run there. Here we present the NETMAR WebGUI service chain editor and discuss the problems associated with the development of a WebGUI for scientific workflow editing; content transformation into the Taverna orchestration language (T2FLOW/SCUFL); final orchestration in the Taverna engine and how to deal with the large volumes of data being transferred between different WPS services (possibly running on different servers) during workflow orchestration. We will also demonstrate using the WebGUI for creating a simple workflow making use of published web processing services, showing how simple services may be chained together to produce outputs that would previously have required a GIS (Geographic

  20. Integrating configuration workflows with project management system

    NASA Astrophysics Data System (ADS)

    Nilsen, Dimitri; Weber, Pavel

    2014-06-01

    The complexity of the heterogeneous computing resources, services and recurring infrastructure changes at the GridKa WLCG Tier-1 computing center require a structured approach to configuration management and optimization of interplay between functional components of the whole system. A set of tools deployed at GridKa, including Puppet, Redmine, Foreman, SVN and Icinga, provides the administrative environment giving the possibility to define and develop configuration workflows, reduce the administrative effort and improve sustainable operation of the whole computing center. In this presentation we discuss the developed configuration scenarios implemented at GridKa, which we use for host installation, service deployment, change management procedures, service retirement etc. The integration of Puppet with a project management tool like Redmine provides us with the opportunity to track problem issues, organize tasks and automate these workflows. The interaction between Puppet and Redmine results in automatic updates of the issues related to the executed workflow performed by different system components. The extensive configuration workflows require collaboration and interaction between different departments like network, security, production etc. at GridKa. Redmine plugins developed at GridKa and integrated in its administrative environment provide an effective way of collaboration within the GridKa team. We present the structural overview of the software components, their connections, communication protocols and show a few working examples of the workflows and their automation.

  1. Integrating text mining into the MGI biocuration workflow.

    PubMed

    Dowell, K G; McAndrews-Hill, M S; Hill, D P; Drabkin, H J; Blake, J A

    2009-01-01

    A major challenge for functional and comparative genomics resource development is the extraction of data from the biomedical literature. Although text mining for biological data is an active research field, few applications have been integrated into production literature curation systems such as those of the model organism databases (MODs). Not only are most available biological natural language (bioNLP) and information retrieval and extraction solutions difficult to adapt to existing MOD curation workflows, but many also have high error rates or are unable to process documents available in those formats preferred by scientific journals. In September 2008, Mouse Genome Informatics (MGI) at The Jackson Laboratory initiated a search for dictionary-based text mining tools that we could integrate into our biocuration workflow. MGI has rigorous document triage and annotation procedures designed to identify appropriate articles about mouse genetics and genome biology. We currently screen approximately 1000 journal articles a month for Gene Ontology terms, gene mapping, gene expression, phenotype data and other key biological information. Although we do not foresee that curation tasks will ever be fully automated, we are eager to implement named entity recognition (NER) tools for gene tagging that can help streamline our curation workflow and simplify gene indexing tasks within the MGI system. Gene indexing is an MGI-specific curation function that involves identifying which mouse genes are being studied in an article, then associating the appropriate gene symbols with the article reference number in the MGI database. Here, we discuss our search process, performance metrics and success criteria, and how we identified a short list of potential text mining tools for further evaluation. We provide an overview of our pilot projects with NCBO's Open Biomedical Annotator and Fraunhofer SCAI's ProMiner. In doing so, we prove the potential for the further incorporation of semi

  2. Integrating text mining into the MGI biocuration workflow.

    PubMed

    Dowell, K G; McAndrews-Hill, M S; Hill, D P; Drabkin, H J; Blake, J A

    2009-01-01

    A major challenge for functional and comparative genomics resource development is the extraction of data from the biomedical literature. Although text mining for biological data is an active research field, few applications have been integrated into production literature curation systems such as those of the model organism databases (MODs). Not only are most available biological natural language (bioNLP) and information retrieval and extraction solutions difficult to adapt to existing MOD curation workflows, but many also have high error rates or are unable to process documents available in those formats preferred by scientific journals. In September 2008, Mouse Genome Informatics (MGI) at The Jackson Laboratory initiated a search for dictionary-based text mining tools that we could integrate into our biocuration workflow. MGI has rigorous document triage and annotation procedures designed to identify appropriate articles about mouse genetics and genome biology. We currently screen approximately 1000 journal articles a month for Gene Ontology terms, gene mapping, gene expression, phenotype data and other key biological information. Although we do not foresee that curation tasks will ever be fully automated, we are eager to implement named entity recognition (NER) tools for gene tagging that can help streamline our curation workflow and simplify gene indexing tasks within the MGI system. Gene indexing is an MGI-specific curation function that involves identifying which mouse genes are being studied in an article, then associating the appropriate gene symbols with the article reference number in the MGI database. Here, we discuss our search process, performance metrics and success criteria, and how we identified a short list of potential text mining tools for further evaluation. We provide an overview of our pilot projects with NCBO's Open Biomedical Annotator and Fraunhofer SCAI's ProMiner. In doing so, we prove the potential for the further incorporation of semi

  3. Optimizing CyberShake Seismic Hazard Workflows for Large HPC Resources

    NASA Astrophysics Data System (ADS)

    Callaghan, S.; Maechling, P. J.; Juve, G.; Vahi, K.; Deelman, E.; Jordan, T. H.

    2014-12-01

    The CyberShake computational platform is a well-integrated collection of scientific software and middleware that calculates 3D simulation-based probabilistic seismic hazard curves and hazard maps for the Los Angeles region. Currently each CyberShake model comprises about 235 million synthetic seismograms from about 415,000 rupture variations computed at 286 sites. CyberShake integrates large-scale parallel and high-throughput serial seismological research codes into a processing framework in which early stages produce files used as inputs by later stages. Scientific workflow tools are used to manage the jobs, data, and metadata. The Southern California Earthquake Center (SCEC) developed the CyberShake platform using USC High Performance Computing and Communications systems and open-science NSF resources. CyberShake calculations were migrated to the NSF Track 1 system NCSA Blue Waters when it became operational in 2013, via an interdisciplinary team approach including domain scientists, computer scientists, and middleware developers. Due to the excellent performance of Blue Waters and CyberShake software optimizations, we reduced the makespan (a measure of wallclock time-to-solution) of a CyberShake study from 1467 to 342 hours. We will describe the technical enhancements behind this improvement, including judicious introduction of new GPU software, improved scientific software components, increased workflow-based automation, and Blue Waters-specific workflow optimizations. Our CyberShake performance improvements highlight the benefits of scientific workflow tools. The CyberShake workflow software stack includes the Pegasus Workflow Management System (Pegasus-WMS, which includes Condor DAGMan), HTCondor, and Globus GRAM, with Pegasus-mpi-cluster managing the high-throughput tasks on the HPC resources. The workflow tools handle data management, automatically transferring about 13 TB back to SCEC storage. We will present performance metrics from the most recent Cyber

  4. A Semi-Automated Workflow Solution for Data Set Publication

    DOE PAGESBeta

    Vannan, Suresh; Beaty, Tammy W.; Cook, Robert B.; Wright, Daine M.; Devarakonda, Ranjeet; Wei, Yaxing; Hook, Les A.; McMurry, Benjamin F.

    2016-03-08

    In order to address the need for published data, considerable effort has gone into formalizing the process of data publication. From funding agencies to publishers, data publication has rapidly become a requirement. Digital Object Identifiers (DOI) and data citations have enhanced the integration and availability of data. The challenge facing data publishers now is to deal with the increased number of publishable data products and most importantly the difficulties of publishing diverse data products into an online archive. The Oak Ridge National Laboratory Distributed Active Archive Center (ORNL DAAC), a NASA-funded data center, faces these challenges as it deals with data products created by individual investigators. This paper summarizes the challenges of curating data and provides a summary of a workflow solution that ORNL DAAC researcher and technical staffs have created to deal with publication of the diverse data products. Finally, the workflow solution presented here is generic and can be applied to data from any scientific domain and data located at any data center.

  5. Cognitive Learning, Monitoring and Assistance of Industrial Workflows Using Egocentric Sensor Networks.

    PubMed

    Bleser, Gabriele; Damen, Dima; Behera, Ardhendu; Hendeby, Gustaf; Mura, Katharina; Miezal, Markus; Gee, Andrew; Petersen, Nils; Maçães, Gustavo; Domingues, Hugo; Gorecky, Dominic; Almeida, Luis; Mayol-Cuevas, Walterio; Calway, Andrew; Cohn, Anthony G; Hogg, David C; Stricker, Didier

    2015-01-01

    Today, the workflows that are involved in industrial assembly and production activities are becoming increasingly complex. To efficiently and safely perform these workflows is demanding on the workers, in particular when it comes to infrequent or repetitive tasks. This burden on the workers can be eased by introducing smart assistance systems. This article presents a scalable concept and an integrated system demonstrator designed for this purpose. The basic idea is to learn workflows from observing multiple expert operators and then transfer the learnt workflow models to novice users. Being entirely learning-based, the proposed system can be applied to various tasks and domains. The above idea has been realized in a prototype, which combines components pushing the state of the art of hardware and software designed with interoperability in mind. The emphasis of this article is on the algorithms developed for the prototype: 1) fusion of inertial and visual sensor information from an on-body sensor network (BSN) to robustly track the user's pose in magnetically polluted environments; 2) learning-based computer vision algorithms to map the workspace, localize the sensor with respect to the workspace and capture objects, even as they are carried; 3) domain-independent and robust workflow recovery and monitoring algorithms based on spatiotemporal pairwise relations deduced from object and user movement with respect to the scene; and 4) context-sensitive augmented reality (AR) user feedback using a head-mounted display (HMD). A distinguishing key feature of the developed algorithms is that they all operate solely on data from the on-body sensor network and that no external instrumentation is needed. The feasibility of the chosen approach for the complete action-perception-feedback loop is demonstrated on three increasingly complex datasets representing manual industrial tasks. These limited size datasets indicate and highlight the potential of the chosen technology as a

  6. Cognitive Learning, Monitoring and Assistance of Industrial Workflows Using Egocentric Sensor Networks

    PubMed Central

    Bleser, Gabriele; Damen, Dima; Behera, Ardhendu; Hendeby, Gustaf; Mura, Katharina; Miezal, Markus; Gee, Andrew; Petersen, Nils; Maçães, Gustavo; Domingues, Hugo; Gorecky, Dominic; Almeida, Luis; Mayol-Cuevas, Walterio; Calway, Andrew; Cohn, Anthony G.; Hogg, David C.; Stricker, Didier

    2015-01-01

    Today, the workflows that are involved in industrial assembly and production activities are becoming increasingly complex. To efficiently and safely perform these workflows is demanding on the workers, in particular when it comes to infrequent or repetitive tasks. This burden on the workers can be eased by introducing smart assistance systems. This article presents a scalable concept and an integrated system demonstrator designed for this purpose. The basic idea is to learn workflows from observing multiple expert operators and then transfer the learnt workflow models to novice users. Being entirely learning-based, the proposed system can be applied to various tasks and domains. The above idea has been realized in a prototype, which combines components pushing the state of the art of hardware and software designed with interoperability in mind. The emphasis of this article is on the algorithms developed for the prototype: 1) fusion of inertial and visual sensor information from an on-body sensor network (BSN) to robustly track the user’s pose in magnetically polluted environments; 2) learning-based computer vision algorithms to map the workspace, localize the sensor with respect to the workspace and capture objects, even as they are carried; 3) domain-independent and robust workflow recovery and monitoring algorithms based on spatiotemporal pairwise relations deduced from object and user movement with respect to the scene; and 4) context-sensitive augmented reality (AR) user feedback using a head-mounted display (HMD). A distinguishing key feature of the developed algorithms is that they all operate solely on data from the on-body sensor network and that no external instrumentation is needed. The feasibility of the chosen approach for the complete action-perception-feedback loop is demonstrated on three increasingly complex datasets representing manual industrial tasks. These limited size datasets indicate and highlight the potential of the chosen technology as a

  7. Cognitive Learning, Monitoring and Assistance of Industrial Workflows Using Egocentric Sensor Networks.

    PubMed

    Bleser, Gabriele; Damen, Dima; Behera, Ardhendu; Hendeby, Gustaf; Mura, Katharina; Miezal, Markus; Gee, Andrew; Petersen, Nils; Maçães, Gustavo; Domingues, Hugo; Gorecky, Dominic; Almeida, Luis; Mayol-Cuevas, Walterio; Calway, Andrew; Cohn, Anthony G; Hogg, David C; Stricker, Didier

    2015-01-01

    Today, the workflows that are involved in industrial assembly and production activities are becoming increasingly complex. To efficiently and safely perform these workflows is demanding on the workers, in particular when it comes to infrequent or repetitive tasks. This burden on the workers can be eased by introducing smart assistance systems. This article presents a scalable concept and an integrated system demonstrator designed for this purpose. The basic idea is to learn workflows from observing multiple expert operators and then transfer the learnt workflow models to novice users. Being entirely learning-based, the proposed system can be applied to various tasks and domains. The above idea has been realized in a prototype, which combines components pushing the state of the art of hardware and software designed with interoperability in mind. The emphasis of this article is on the algorithms developed for the prototype: 1) fusion of inertial and visual sensor information from an on-body sensor network (BSN) to robustly track the user's pose in magnetically polluted environments; 2) learning-based computer vision algorithms to map the workspace, localize the sensor with respect to the workspace and capture objects, even as they are carried; 3) domain-independent and robust workflow recovery and monitoring algorithms based on spatiotemporal pairwise relations deduced from object and user movement with respect to the scene; and 4) context-sensitive augmented reality (AR) user feedback using a head-mounted display (HMD). A distinguishing key feature of the developed algorithms is that they all operate solely on data from the on-body sensor network and that no external instrumentation is needed. The feasibility of the chosen approach for the complete action-perception-feedback loop is demonstrated on three increasingly complex datasets representing manual industrial tasks. 
These limited size datasets indicate and highlight the potential of the chosen technology as a

  8. Achieving Coordination through Dynamic Construction of Open Workflows

    NASA Astrophysics Data System (ADS)

    Thomas, Louis; Wilson, Justin; Roman, Gruia-Catalin; Gill, Christopher

    Workflow middleware executes tasks orchestrated by rules defined in a carefully handcrafted static graph. Workflow management systems have proved effective for service-oriented business automation in stable, wired infrastructures. We introduce a radically new paradigm for workflow construction and execution called open workflow to support goal-directed coordination among physically mobile people and devices that form a transient community over an ad hoc wireless network. The quintessential feature of the open workflow paradigm is dynamic construction of custom, context-specific workflows in response to unpredictable and evolving circumstances by exploiting the knowledge and services available within a given spatiotemporal context. This paper introduces the open workflow approach, surveys open research challenges in this promising new field, and presents algorithmic, architectural, and evaluation results for the first practical realization of an open workflow management system.

  9. RESTFul based heterogeneous Geoprocessing workflow interoperation for Sensor Web Service

    NASA Astrophysics Data System (ADS)

    Yang, Chao; Chen, Nengcheng; Di, Liping

    2012-10-01

    Advanced sensors on board satellites offer detailed Earth observations. A workflow is one approach for designing, implementing and constructing a flexible and live link between these sensors' resources and users. It can coordinate, organize and aggregate the distributed sensor Web services to meet the requirement of a complex Earth observation scenario. A RESTFul based workflow interoperation method is proposed to integrate heterogeneous workflows into an interoperable unit. The Atom protocols are applied to describe and manage workflow resources. The XML Process Definition Language (XPDL) and Business Process Execution Language (BPEL) workflow standards are applied to structure a workflow that accesses sensor information and one that processes it separately. Then, a scenario for nitrogen dioxide (NO2) from a volcanic eruption is used to investigate the feasibility of the proposed method. The RESTFul based workflows interoperation system can describe, publish, discover, access and coordinate heterogeneous Geoprocessing workflows.

  10. Workflow Automation: A Collective Case Study

    ERIC Educational Resources Information Center

    Harlan, Jennifer

    2013-01-01

    Knowledge management has proven to be a sustainable competitive advantage for many organizations. Knowledge management systems are abundant, with multiple functionalities. The literature reinforces the use of workflow automation with knowledge management systems to benefit organizations; however, it was not known if process automation yielded…

  11. Conventions and workflows for using Situs

    SciTech Connect

    Wriggers, Willy

    2012-04-01

    Recent developments of the Situs software suite for multi-scale modeling are reviewed. Typical workflows and conventions encountered during processing of biophysical data from electron microscopy, tomography or small-angle X-ray scattering are described. Situs is a modular program package for the multi-scale modeling of atomic resolution structures and low-resolution biophysical data from electron microscopy, tomography or small-angle X-ray scattering. This article provides an overview of recent developments in the Situs package, with an emphasis on workflows and conventions that are important for practical applications. The modular design of the programs facilitates scripting in the bash shell that allows specific programs to be combined in creative ways that go beyond the original intent of the developers. Several scripting-enabled functionalities, such as flexible transformations of data type, the use of symmetry constraints or the creation of two-dimensional projection images, are described. The processing of low-resolution biophysical maps in such workflows follows not only first principles but often relies on implicit conventions. Situs conventions related to map formats, resolution, correlation functions and feature detection are reviewed and summarized. The compatibility of the Situs workflow with CCP4 conventions and programs is discussed.

  12. Building Digital Audio Preservation Infrastructure and Workflows

    ERIC Educational Resources Information Center

    Young, Anjanette; Olivieri, Blynne; Eckler, Karl; Gerontakos, Theodore

    2010-01-01

    In 2009 the University of Washington (UW) Libraries special collections received funding for the digital preservation of its audio indigenous language holdings. The university libraries, where the authors work in various capacities, had begun digitizing image and text collections in 1997. Because of this, at the onset of the project, workflows (a…

  13. Text mining for the biocuration workflow.

    PubMed

    Hirschman, Lynette; Burns, Gully A P C; Krallinger, Martin; Arighi, Cecilia; Cohen, K Bretonnel; Valencia, Alfonso; Wu, Cathy H; Chatr-Aryamontri, Andrew; Dowell, Karen G; Huala, Eva; Lourenço, Anália; Nash, Robert; Veuthey, Anne-Lise; Wiegers, Thomas; Winter, Andrew G

    2012-01-01

    Molecular biology has become heavily dependent on biological knowledge encoded in expert curated biological databases. As the volume of biological literature increases, biocurators need help in keeping up with the literature; (semi-) automated aids for biocuration would seem to be an ideal application for natural language processing and text mining. However, to date, there have been few documented successes for improving biocuration throughput using text mining. Our initial investigations took place for the workshop on 'Text Mining for the BioCuration Workflow' at the third International Biocuration Conference (Berlin, 2009). We interviewed biocurators to obtain workflows from eight biological databases. This initial study revealed high-level commonalities, including (i) selection of documents for curation; (ii) indexing of documents with biologically relevant entities (e.g. genes); and (iii) detailed curation of specific relations (e.g. interactions); however, the detailed workflows also showed many variabilities. Following the workshop, we conducted a survey of biocurators. The survey identified biocurator priorities, including the handling of full text indexed with biological entities and support for the identification and prioritization of documents for curation. It also indicated that two-thirds of the biocuration teams had experimented with text mining and almost half were using text mining at that time. Analysis of our interviews and survey provides a set of requirements for the integration of text mining into the biocuration workflow. These can guide the identification of common needs across curated databases and encourage joint experimentation involving biocurators, text mining developers and the larger biomedical research community.

  14. Arbor: Comparative Analysis Workflows for the Tree of Life

    PubMed Central

    Harmon, Luke J.; Baumes, Jeffrey; Hughes, Charles; Soberon, Jorge; Specht, Chelsea D; Turner, Wesley; Lisle, Curtis; Thacker, Robert W.

    2013-01-01

    We describe our efforts to develop a software package, Arbor, that will enable scientific research in all aspects of comparative biology. This software will enable developmental biologists, geneticists, ecologists, geographers, paleobiologists, educators, and students to analyze diverse types of comparative data at multiple phylogenetic and spatiotemporal scales using an intuitive visual interface. Arbor’s user-defined workflows will be exported and shared so that entire analyses can be quickly replicated with new or updated data. Arbor will also be designed to easily and seamlessly expand to include novel analytical tools as they are developed. Here we describe the core components of Arbor, as well as provide details of one proposed test case to illustrate the software’s key functionality. PMID:23811960

  15. AutoDrug: fully automated macromolecular crystallography workflows for fragment-based drug discovery

    SciTech Connect

    Tsai, Yingssu; McPhillips, Scott E.; González, Ana; McPhillips, Timothy M.; Zinn, Daniel; Cohen, Aina E.; Feese, Michael D.; Bushnell, David; Tiefenbrunn, Theresa; Stout, C. David; Ludaescher, Bertram; Hedman, Britt; Hodgson, Keith O.; Soltis, S. Michael

    2013-05-01

    New software has been developed for automating the experimental and data-processing stages of fragment-based drug discovery at a macromolecular crystallography beamline. A new workflow-automation framework orchestrates beamline-control and data-analysis software while organizing results from multiple samples. AutoDrug is software based upon the scientific workflow paradigm that integrates the Stanford Synchrotron Radiation Lightsource macromolecular crystallography beamlines and third-party processing software to automate the crystallography steps of the fragment-based drug-discovery process. AutoDrug screens a cassette of fragment-soaked crystals, selects crystals for data collection based on screening results and user-specified criteria and determines optimal data-collection strategies. It then collects and processes diffraction data, performs molecular replacement using provided models and detects electron density that is likely to arise from bound fragments. All processes are fully automated, i.e. are performed without user interaction or supervision. Samples can be screened in groups corresponding to particular proteins, crystal forms and/or soaking conditions. A single AutoDrug run is only limited by the capacity of the sample-storage dewar at the beamline: currently 288 samples. AutoDrug was developed in conjunction with RestFlow, a new scientific workflow-automation framework. RestFlow simplifies the design of AutoDrug by managing the flow of data and the organization of results and by orchestrating the execution of computational pipeline steps. It also simplifies the execution and interaction of third-party programs and the beamline-control system. Modeling AutoDrug as a scientific workflow enables multiple variants that meet the requirements of different user groups to be developed and supported. A workflow tailored to mimic the crystallography stages comprising the drug-discovery pipeline of CoCrystal Discovery Inc. has been deployed and successfully

  16. Quantitative Regression Models for the Prediction of Chemical Properties by an Efficient Workflow.

    PubMed

    Yin, Yongmin; Xu, Congying; Gu, Shikai; Li, Weihua; Liu, Guixia; Tang, Yun

    2015-10-01

    Rapid safety assessment is increasingly needed for the growing number of chemicals, both in chemical industries and by regulators around the world. Traditional experimental methods can no longer meet the current demand. With the development of information technology and the growth of experimental data, in silico modeling has become a practical and rapid alternative for the assessment of chemical properties, especially for the toxicity prediction of organic chemicals. In this study, a quantitative regression workflow was built with KNIME to predict chemical properties. With this regression workflow, quantitative values of chemical properties can be obtained, which is different from binary-classification or multi-classification models that can only give qualitative results. To illustrate the usage of the workflow, two predictive models were constructed based on datasets of Tetrahymena pyriformis toxicity and aqueous solubility. The q²(cv) and q²(test) of 5-fold cross validation and external validation for both types of models were greater than 0.7, which implies that our models are robust and reliable, and the workflow is very convenient and efficient in prediction of various chemical properties. PMID:27490968
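
As a rough illustration of the cross-validated q² statistic mentioned in this abstract, the sketch below computes q² = 1 - PRESS / SS_total over k folds. It is not the authors' KNIME workflow: the one-variable least-squares model, the function names `fit_line` and `q2_cv`, the fold assignment, and the use of the overall mean in SS_total are all assumptions made for the example, and other definitions of q² exist.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def q2_cv(xs, ys, k=5, seed=0):
    """Cross-validated q^2 = 1 - PRESS / SS_total (assumed definition)."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint held-out sets
    press = 0.0
    for fold in folds:
        held = set(fold)
        train = [i for i in idx if i not in held]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        # Accumulate squared errors on samples the model did not see.
        press += sum((ys[i] - (a + b * xs[i])) ** 2 for i in fold)
    mean_y = sum(ys) / len(ys)
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - press / ss_tot


# Hypothetical data: a linear trend with a small alternating disturbance.
xs = [float(x) for x in range(20)]
ys = [2 * x + 1 + (1 if int(x) % 2 else -1) for x in xs]
score = q2_cv(xs, ys)
```

A value of `score` close to 1 means the held-out predictions explain most of the variance; the abstract uses 0.7 as its reported threshold.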

  17. Talkoot Portals: Discover, Tag, Share, and Reuse Collaborative Science Workflows (Invited)

    NASA Astrophysics Data System (ADS)

    Wilson, B. D.; Ramachandran, R.; Lynnes, C.

    2009-12-01

    A small but growing number of scientists are beginning to harness Web 2.0 technologies, such as wikis, blogs, and social tagging, as a transformative way of doing science. These technologies provide researchers easy mechanisms to critique, suggest and share ideas, data and algorithms. At the same time, large suites of algorithms for science analysis are being made available as remotely-invokable Web Services, which can be chained together to create analysis workflows. This provides the research community an unprecedented opportunity to collaborate by sharing their workflows with one another, reproducing and analyzing research results, and leveraging colleagues’ expertise to expedite the process of scientific discovery. However, wikis and similar technologies are limited to text, static images and hyperlinks, providing little support for collaborative data analysis. A team of information technology and Earth science researchers from multiple institutions have come together to improve community collaboration in science analysis by developing a customizable “software appliance” to build collaborative portals for Earth Science services and analysis workflows. The critical requirement is that researchers (not just information technologists) be able to build collaborative sites around service workflows within a few hours. We envision online communities coming together, much like Finnish “talkoot” (a barn raising), to build a shared research space. Talkoot extends a freely available, open source content management framework with a series of modules specific to Earth Science for registering, creating, managing, discovering, tagging and sharing Earth Science web services and workflows for science data processing, analysis and visualization. Users will be able to author a “science story” in shareable web notebooks, including plots or animations, backed up by an executable workflow that directly reproduces the science analysis. New services and workflows of

  18. geoKepler Workflow Module for Computationally Scalable and Reproducible Geoprocessing and Modeling

    NASA Astrophysics Data System (ADS)

    Cowart, C.; Block, J.; Crawl, D.; Graham, J.; Gupta, A.; Nguyen, M.; de Callafon, R.; Smarr, L.; Altintas, I.

    2015-12-01

    The NSF-funded WIFIRE project has developed an open-source, online geospatial workflow platform for unifying geoprocessing tools and models for fire and other geospatially dependent modeling applications. It is a product of WIFIRE's objective to build an end-to-end cyberinfrastructure for real-time and data-driven simulation, prediction and visualization of wildfire behavior. geoKepler includes a set of reusable GIS components, or actors, for the Kepler Scientific Workflow System (https://kepler-project.org). Actors exist for reading and writing GIS data in formats such as Shapefile, GeoJSON, KML, and using OGC web services such as WFS. The actors also allow for calling geoprocessing tools in other packages such as GDAL and GRASS. Kepler integrates functions from multiple platforms and file formats into one framework, thus enabling optimal GIS interoperability, model coupling, and scalability. Products of the GIS actors can be fed directly to models such as FARSITE and WRF. Kepler's ability to schedule and scale processes using Hadoop and Spark also makes geoprocessing ultimately extensible and computationally scalable. The reusable workflows in geoKepler can be made to run automatically when alerted by real-time environmental conditions. Here, we show breakthroughs in the speed of creating complex data for hazard assessments with this platform. We also demonstrate geoKepler workflows that use Data Assimilation to ingest real-time weather data into wildfire simulations, and for data mining techniques to gain insight into environmental conditions affecting fire behavior. Existing machine learning tools and libraries such as R and MLlib are being leveraged for this purpose in Kepler, as well as Kepler's Distributed Data Parallel (DDP) capability to provide a framework for scalable processing. 
geoKepler workflows can be executed via an iPython notebook as a part of a Jupyter hub at UC San Diego for sharing and reporting of the scientific analysis and results from

  19. An ever-changing systemic environment for migrating workflows

    NASA Astrophysics Data System (ADS)

    Assimakopoulos, Nikitas A.

    2000-05-01

    In this paper we present the concept of the systemic and dynamic environment for migrating workflows, and the considerations related to the implementation of this concept. Migrating workflows are a computational metaphor for the way most people conduct their daily business: they visit a place, use a service (perhaps after some negotiation), and move on to the next place. A migrating workflow behaves similarly: it transfers its code (specification) and its execution state to a site, negotiates a service to be executed on its behalf, receives the results, and moves on. Dialog between the workflow and individual sites may influence the workflow's migration. Thus the actual workflow instance is defined during run-time, as an effect of merging the static workflow specification and the local site rules and policies.
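
The visit, negotiate, execute, move-on cycle described in this abstract can be sketched in a few lines. This is only an illustrative toy, not the paper's system: the function `run_migrating_workflow`, the dictionary layout of a site (`services`, `route`), and the reduction of negotiation to a simple lookup are all hypothetical.

```python
def run_migrating_workflow(spec, sites, start):
    """Carry a static spec plus execution state from site to site.

    spec  : ordered list of service names (the static specification)
    sites : site name -> {"services": {name: callable(state)},
                          "route": {name: next site name}}  (local rules)
    start : name of the first site to visit
    """
    state = {"results": [], "visited": []}
    current = start
    for service in spec:
        site = sites[current]
        state["visited"].append(current)
        # "Negotiation" is reduced here to asking whether the site offers it.
        offer = site["services"].get(service)
        if offer is None:
            raise LookupError(f"site {current!r} does not offer {service!r}")
        state["results"].append(offer(state))
        # The site's local policy decides where the workflow migrates next,
        # so the concrete path only emerges at run time.
        current = site["route"].get(service, current)
    return state


# Hypothetical two-site example.
sites = {
    "A": {"services": {"fetch": lambda s: 3}, "route": {"fetch": "B"}},
    "B": {"services": {"double": lambda s: s["results"][-1] * 2}, "route": {}},
}
final_state = run_migrating_workflow(["fetch", "double"], sites, "A")
```

The specification names only the services; which sites are visited comes from each site's `route` table, which mirrors the abstract's point that the workflow instance is defined at run time.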

  20. Applying direct observation to model workflow and assess adoption.

    PubMed

    Unertl, Kim M; Weinger, Matthew B; Johnson, Kevin B

    2006-01-01

    Lack of understanding about workflow can impair health IT system adoption. Observational techniques can provide valuable information about clinical workflow. A pilot study using direct observation was conducted in an outpatient chronic disease clinic. The goals of the study were to assess workflow and information flow and to develop a general model of workflow and information behavior. Over 55 hours of direct observation showed that the pilot site utilized many of the features of the informatics systems available to them, but also employed multiple non-electronic artifacts and workarounds. Gaps existed between clinic workflow and informatics tool workflow, as well as between institutional expectations of informatics tool use and actual use. Concurrent use of both paper-based and electronic systems resulted in duplication of effort and inefficiencies. A relatively short period of direct observation revealed important information about workflow and informatics tool adoption.

  1. NeuroManager: a workflow analysis based simulation management engine for computational neuroscience.

    PubMed

    Stockton, David B; Santamaria, Fidel

    2015-01-01

    We developed NeuroManager, an object-oriented simulation management software engine for computational neuroscience. NeuroManager automates the workflow of simulation job submissions when using heterogeneous computational resources, simulators, and simulation tasks. The object-oriented approach (1) provides flexibility to adapt to a variety of neuroscience simulators, (2) simplifies the use of heterogeneous computational resources, from desktops to super computer clusters, and (3) improves tracking of simulator/simulation evolution. We implemented NeuroManager in MATLAB, a widely used engineering and scientific language, for its signal and image processing tools, prevalence in electrophysiology analysis, and increasing use in college Biology education. To design and develop NeuroManager we analyzed the workflow of simulation submission for a variety of simulators, operating systems, and computational resources, including the handling of input parameters, data, models, results, and analyses. This resulted in 22 stages of simulation submission workflow. The software incorporates progress notification, automatic organization, labeling, and time-stamping of data and results, and integrated access to MATLAB's analysis and visualization tools. NeuroManager provides users with the tools to automate daily tasks, and assists principal investigators in tracking and recreating the evolution of research projects performed by multiple people. Overall, NeuroManager provides the infrastructure needed to improve workflow, manage multiple simultaneous simulations, and maintain provenance of the potentially large amounts of data produced during the course of a research project. PMID:26528175

  2. NeuroManager: a workflow analysis based simulation management engine for computational neuroscience

    PubMed Central

    Stockton, David B.; Santamaria, Fidel

    2015-01-01

    We developed NeuroManager, an object-oriented simulation management software engine for computational neuroscience. NeuroManager automates the workflow of simulation job submissions when using heterogeneous computational resources, simulators, and simulation tasks. The object-oriented approach (1) provides flexibility to adapt to a variety of neuroscience simulators, (2) simplifies the use of heterogeneous computational resources, from desktops to super computer clusters, and (3) improves tracking of simulator/simulation evolution. We implemented NeuroManager in MATLAB, a widely used engineering and scientific language, for its signal and image processing tools, prevalence in electrophysiology analysis, and increasing use in college Biology education. To design and develop NeuroManager we analyzed the workflow of simulation submission for a variety of simulators, operating systems, and computational resources, including the handling of input parameters, data, models, results, and analyses. This resulted in 22 stages of simulation submission workflow. The software incorporates progress notification, automatic organization, labeling, and time-stamping of data and results, and integrated access to MATLAB's analysis and visualization tools. NeuroManager provides users with the tools to automate daily tasks, and assists principal investigators in tracking and recreating the evolution of research projects performed by multiple people. Overall, NeuroManager provides the infrastructure needed to improve workflow, manage multiple simultaneous simulations, and maintain provenance of the potentially large amounts of data produced during the course of a research project. PMID:26528175

  3. Quantifying nursing workflow in medication administration.

    PubMed

    Keohane, Carol A; Bane, Anne D; Featherstone, Erica; Hayes, Judy; Woolf, Seth; Hurley, Ann; Bates, David W; Gandhi, Tejal K; Poon, Eric G

    2008-01-01

    New medication administration systems are showing promise in improving patient safety at the point of care, but adoption of these systems requires significant changes in nursing workflow. To prepare for these changes, the authors report on a time-motion study that measured the proportion of time that nurses spend on various patient care activities, focusing on medication administration-related activities. Implications of their findings are discussed.

  4. Computing Workflows for Biologists: A Roadmap

    PubMed Central

    Shade, Ashley; Teal, Tracy K.

    2015-01-01

    Extremely large datasets have become routine in biology. However, performing a computational analysis of a large dataset can be overwhelming, especially for novices. Here, we present a step-by-step guide to computing workflows with the biologist end-user in mind. Starting from a foundation of sound data management practices, we make specific recommendations on how to approach and perform computational analyses of large datasets, with a view to enabling sound, reproducible biological research. PMID:26600012

  5. Computing Workflows for Biologists: A Roadmap.

    PubMed

    Shade, Ashley; Teal, Tracy K

    2015-01-01

    Extremely large datasets have become routine in biology. However, performing a computational analysis of a large dataset can be overwhelming, especially for novices. Here, we present a step-by-step guide to computing workflows with the biologist end-user in mind. Starting from a foundation of sound data management practices, we make specific recommendations on how to approach and perform computational analyses of large datasets, with a view to enabling sound, reproducible biological research. PMID:26600012

  6. IDD Archival Hardware Architecture and Workflow

    SciTech Connect

    Mendonsa, D; Nekoogar, F; Martz, H

    2008-10-09

    This document describes the functionality of every component in the DHS/IDD archival and storage hardware system shown in Fig. 1. The document describes the step-by-step process of image data being received at LLNL, then being processed and made available to authorized personnel and collaborators. Throughout this document, references will be made to one of two figures: Fig. 1, describing the elements of the architecture, and Fig. 2, describing the workflow and how the project utilizes the available hardware.

  7. Computing Workflows for Biologists: A Roadmap.

    PubMed

    Shade, Ashley; Teal, Tracy K

    2015-01-01

    Extremely large datasets have become routine in biology. However, performing a computational analysis of a large dataset can be overwhelming, especially for novices. Here, we present a step-by-step guide to computing workflows with the biologist end-user in mind. Starting from a foundation of sound data management practices, we make specific recommendations on how to approach and perform computational analyses of large datasets, with a view to enabling sound, reproducible biological research.

  8. Schedule-Aware Workflow Management Systems

    NASA Astrophysics Data System (ADS)

    Mans, Ronny S.; Russell, Nick C.; van der Aalst, Wil M. P.; Moleman, Arnold J.; Bakker, Piet J. M.

    Contemporary workflow management systems offer work-items to users through specific work-lists. Users select the work-items they will perform without having a specific schedule in mind. However, in many environments work needs to be scheduled and performed at particular times. For example, in hospitals many work-items are linked to appointments, e.g., a doctor cannot perform surgery without reserving an operating theater and making sure that the patient is present. One of the problems when applying workflow technology in such domains is the lack of calendar-based scheduling support. In this paper, we present an approach that supports the seamless integration of unscheduled (flow) and scheduled (schedule) tasks. Using CPN Tools we have developed a specification and simulation model for schedule-aware workflow management systems. Based on this a system has been realized that uses YAWL, Microsoft Exchange Server 2007, Outlook, and a dedicated scheduling service. The approach is illustrated using a real-life case study at the AMC hospital in the Netherlands. In addition, we elaborate on the experiences obtained when developing and implementing a system of this scale using formal techniques.

  9. Scientific rigor through videogames.

    PubMed

    Treuille, Adrien; Das, Rhiju

    2014-11-01

    Hypothesis-driven experimentation - the scientific method - can be subverted by fraud, irreproducibility, and lack of rigorous predictive tests. A robust solution to these problems may be the 'massive open laboratory' model, recently embodied in the internet-scale videogame EteRNA. Deploying similar platforms throughout biology could enforce the scientific method more broadly.

  10. Workflow-Oriented Cyberinfrastructure for Sensor Data Analytics

    NASA Astrophysics Data System (ADS)

    Orcutt, J. A.; Rajasekar, A.; Moore, R. W.; Vernon, F.

    2015-12-01

    Sensor streams comprise an increasingly large part of Earth Science data. Analytics based on sensor data require an easy way to perform operations such as acquisition, conversion to physical units, metadata linking, sensor fusion, analysis and visualization on distributed sensor streams. Furthermore, embedding real-time sensor data into scientific workflows is of growing interest. We have implemented a scalable networked architecture that can be used to dynamically access packets of data in a stream from multiple sensors, and perform synthesis and analysis across a distributed network. Our system is based on the integrated Rule Oriented Data System (irods.org), which accesses sensor data from the Antelope Real Time Data System (brtt.com), and provides virtualized access to collections of data streams. We integrate real-time data streaming from different sources, collected for different purposes, on different time and spatial scales, and sensed by different methods. iRODS, noted for its policy-oriented data management, brings to sensor processing features and facilities such as single sign-on, third party access control lists (ACLs), location transparency, logical resource naming, and server-side modeling capabilities while reducing the burden on sensor network operators. Rich integrated metadata support also makes it straightforward to discover data streams of interest and maintain data provenance. The workflow support in iRODS readily integrates sensor processing into any analytical pipeline. The system is developed as part of the NSF-funded Datanet Federation Consortium (datafed.org). APIs for selecting, opening, reaping and closing sensor streams are provided, along with other helper functions to associate metadata and convert sensor packets into NetCDF and JSON formats. Near real-time sensor data including seismic sensors, environmental sensors, LIDAR and video streams are available through this interface. 
A system for archiving sensor data and metadata in Net

  11. Robust Regression.

    PubMed

    Huang, Dong; Cabral, Ricardo; De la Torre, Fernando

    2016-02-01

    Discriminative methods (e.g., kernel regression, SVM) have been extensively used to solve problems such as object recognition, image alignment and pose estimation from images. These methods typically map image features (X) to continuous (e.g., pose) or discrete (e.g., object category) values. A major drawback of existing discriminative methods is that samples are directly projected onto a subspace and hence fail to account for outliers common in realistic training sets due to occlusion, specular reflections or noise. It is important to notice that existing discriminative approaches assume the input variables X to be noise free. Thus, discriminative methods experience significant performance degradation when gross outliers are present. Despite its obvious importance, the problem of robust discriminative learning has been relatively unexplored in computer vision. This paper develops the theory of robust regression (RR) and presents an effective convex approach that uses recent advances on rank minimization. The framework applies to a variety of problems in computer vision including robust linear discriminant analysis, regression with missing data, and multi-label classification. Several synthetic and real examples with applications to head pose estimation from images, image and video classification and facial attribute classification with missing data are used to illustrate the benefits of RR. PMID:26761740

  12. Toward a tool for scheduling application workflows onto distributed grid systems

    NASA Astrophysics Data System (ADS)

    Mandal, Anirban

    In this dissertation, we present a design and implementation of a tool for automatic mapping and scheduling of large scientific application workflows onto distributed, heterogeneous Grid environments. The thesis of this work is that plan-ahead, application-independent scheduling of workflow applications based on performance models can reduce the turnaround time for Grid execution of the application, reducing burden of Grid application development. We applied the scheduling strategies successfully to Grid applications from the domains of bio-imaging and astronomy and demonstrated the effectiveness and efficiency of the scheduling approaches. We also proposed and evaluated a novel scheduling heuristic based on a middle-out traversal of the application workflow. A study showed that jobs have to wait in batch queues for a considerable amount of time before they begin execution. Schedulers must consider batch queue waiting times when scheduling Grid applications onto resources with batch queue front ends. Hence, we developed a smart scheduler that considers estimates of batch queue wait times when it constructs schedules for Grid applications. We compared the proposed scheduling techniques with existing dynamic scheduling strategies. An experimental evaluation of this scheduler on data-intensive workflows shows that its approach of planning schedules in advance improves over previous online scheduling approaches. We studied the scalability of the proposed scheduling approaches. To deal with the scale of future Grids consisting of hundreds of thousands of resources, we designed and implemented a novel cluster-level scheduling algorithm, which scales linearly on the number of abstract resource classes. An experimental evaluation using workflows from two applications shows that the cluster-level scheduler achieves good scalability without sacrificing the quality of schedule.
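
    The idea of planning ahead while accounting for batch queue waits can be illustrated with a minimal sketch. It is a plain greedy earliest-finish-time list scheduler, not the dissertation's middle-out heuristic or its performance models; the `Resource` fields, the `plan_schedule` function, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    name: str
    queue_wait: float  # assumed estimate of batch queue wait, in seconds
    speed: float       # assumed relative speed (work units per second)


def plan_schedule(tasks, resources):
    """Greedy plan-ahead scheduling.

    tasks maps task name -> (work, [dependencies]) and must be listed in
    topological order. Each task goes to the resource with the earliest
    estimated finish time, where the estimate includes the queue wait.
    """
    finish = {}                                   # task -> estimated finish time
    free_at = {r.name: 0.0 for r in resources}    # resource -> next free time
    plan = {}
    for name, (work, deps) in tasks.items():
        ready = max((finish[d] for d in deps), default=0.0)
        best_end, best_res = None, None
        for r in resources:
            start = max(ready, free_at[r.name]) + r.queue_wait
            end = start + work / r.speed
            if best_end is None or end < best_end:
                best_end, best_res = end, r.name
        finish[name] = best_end
        free_at[best_res] = best_end
        plan[name] = best_res
    return plan, max(finish.values())


# A fast cluster with a long queue versus a slow but idle machine.
resources = [Resource("fast", queue_wait=1000.0, speed=10.0),
             Resource("slow", queue_wait=0.0, speed=1.0)]
tasks = {"a": (100.0, []), "b": (100.0, ["a"])}
plan, makespan = plan_schedule(tasks, resources)
```

    In this toy case both tasks land on the slow machine, because the estimated queue wait on the fast cluster outweighs its higher speed.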

  13. SoftWare Automated Workflow Technology

    2009-05-27

    SWAWT is a workflow management system designed to streamline team-oriented software development activities. Combining widely used tools (make, Subversion, CVS, RPM, XML, etc.), SWAWT creates an open environment that actually bridges software development phases with project management tasks. The design and implementation of SWAWT is based on roles, conventions, and procedures that will work with any software life cycle process (Waterfall, XP, etc.). This practical approach integrates, automates, and even eliminates many activities associated with development, testing, configuration management, packaging, and delivery of software.

  14. SoftWare Automated Workflow Technology

    SciTech Connect

    Darren Curtis, Chance Younkin

    2009-05-27

    SWAWT is a workflow management system designed to streamline team-oriented software development activities. Combining widely used tools (make, Subversion, CVS, RPM, XML, etc.), SWAWT creates an open environment that actually bridges software development phases with project management tasks. The design and implementation of SWAWT is based on roles, conventions, and procedures that will work with any software life cycle process (Waterfall, XP, etc.). This practical approach integrates, automates, and even eliminates many activities associated with development, testing, configuration management, packaging, and delivery of software.

  15. Toward Exascale Seismic Imaging: Taming Workflow and I/O Issues

    NASA Astrophysics Data System (ADS)

    Lefebvre, M. P.; Bozdag, E.; Lei, W.; Rusmanugroho, H.; Smith, J. A.; Tromp, J.; Yuan, Y.

    2013-12-01

    Providing a better understanding of the physics and chemistry of Earth's interior through numerical simulations has always required tremendous computational resources. Post-petascale supercomputers are now available to solve complex scientific problems that were thought unreachable a few decades ago. They also bring a cohort of concerns on how to obtain optimum performance. Several issues are currently being investigated by the HPC community. To name a few, we can list energy consumption, fault resilience, scalability of the current parallel paradigms, large workflow management, I/O performance and feature extraction with large datasets. For this presentation, we focus on the last three issues. In the context of seismic imaging, in particular for simulations based on adjoint methods, workflows are well defined. They consist of a few collective steps (e.g., mesh generation or model updates) and of a large number of independent steps (e.g., forward and adjoint simulations of each seismic event, pre- and postprocessing of seismic traces). The greater goal is to reduce the time to solution, that is, obtaining a more precise representation of the subsurface as fast as possible. This brings us to consider both the workflow in its entirety and the parts composing it. The usual approach is to speed up the purely computational parts by code tuning in order to reach higher FLOPS and better memory usage. This still remains an important concern, but larger scale experiments show that the imaging workflow suffers from a severe I/O bottleneck. This limitation occurs both for purely computational data and seismic time series. The latter are dealt with by the introduction of a new Adaptable Seismic Data Format (ASDF). In both cases, a parallel I/O library, ORNL's ADIOS, is used to drastically lessen the weight of disk access. Moreover, parallel visualization tools, such as VisIt, are able to take advantage of the metadata included in our ADIOS outputs to extract features and

  16. JMS: An Open Source Workflow Management System and Web-Based Cluster Front-End for High Performance Computing

    PubMed Central

    Brown, David K.; Penkler, David L.; Musyoka, Thommas M.; Bishop, Özlem Tastan

    2015-01-01

    Complex computational pipelines are becoming a staple of modern scientific research. Often these pipelines are resource intensive and require days of computing time. In such cases, it makes sense to run them over high performance computing (HPC) clusters where they can take advantage of the aggregated resources of many powerful computers. In addition to this, researchers often want to integrate their workflows into their own web servers. In these cases, software is needed to manage the submission of jobs from the web interface to the cluster and then return the results once the job has finished executing. We have developed the Job Management System (JMS), a workflow management system and web interface for high performance computing (HPC). JMS provides users with a user-friendly web interface for creating complex workflows with multiple stages. It integrates this workflow functionality with the resource manager, a tool that is used to control and manage batch jobs on HPC clusters. As such, JMS combines workflow management functionality with cluster administration functionality. In addition, JMS provides developer tools including a code editor and the ability to version tools and scripts. JMS can be used by researchers from any field to build and run complex computational pipelines and provides functionality to include these pipelines in external interfaces. JMS is currently being used to house a number of bioinformatics pipelines at the Research Unit in Bioinformatics (RUBi) at Rhodes University. JMS is an open-source project and is freely available at https://github.com/RUBi-ZA/JMS. PMID:26280450

  17. JMS: An Open Source Workflow Management System and Web-Based Cluster Front-End for High Performance Computing.

    PubMed

    Brown, David K; Penkler, David L; Musyoka, Thommas M; Bishop, Özlem Tastan

    2015-01-01

    Complex computational pipelines are becoming a staple of modern scientific research. Often these pipelines are resource intensive and require days of computing time. In such cases, it makes sense to run them over high performance computing (HPC) clusters where they can take advantage of the aggregated resources of many powerful computers. In addition to this, researchers often want to integrate their workflows into their own web servers. In these cases, software is needed to manage the submission of jobs from the web interface to the cluster and then return the results once the job has finished executing. We have developed the Job Management System (JMS), a workflow management system and web interface for high performance computing (HPC). JMS provides users with a user-friendly web interface for creating complex workflows with multiple stages. It integrates this workflow functionality with the resource manager, a tool that is used to control and manage batch jobs on HPC clusters. As such, JMS combines workflow management functionality with cluster administration functionality. In addition, JMS provides developer tools including a code editor and the ability to version tools and scripts. JMS can be used by researchers from any field to build and run complex computational pipelines and provides functionality to include these pipelines in external interfaces. JMS is currently being used to house a number of bioinformatics pipelines at the Research Unit in Bioinformatics (RUBi) at Rhodes University. JMS is an open-source project and is freely available at https://github.com/RUBi-ZA/JMS.

  18. JMS: An Open Source Workflow Management System and Web-Based Cluster Front-End for High Performance Computing.

    PubMed

    Brown, David K; Penkler, David L; Musyoka, Thommas M; Bishop, Özlem Tastan

    2015-01-01

    Complex computational pipelines are becoming a staple of modern scientific research. Often these pipelines are resource intensive and require days of computing time. In such cases, it makes sense to run them over high performance computing (HPC) clusters where they can take advantage of the aggregated resources of many powerful computers. In addition to this, researchers often want to integrate their workflows into their own web servers. In these cases, software is needed to manage the submission of jobs from the web interface to the cluster and then return the results once the job has finished executing. We have developed the Job Management System (JMS), a workflow management system and web interface for high performance computing (HPC). JMS provides users with a user-friendly web interface for creating complex workflows with multiple stages. It integrates this workflow functionality with the resource manager, a tool that is used to control and manage batch jobs on HPC clusters. As such, JMS combines workflow management functionality with cluster administration functionality. In addition, JMS provides developer tools including a code editor and the ability to version tools and scripts. JMS can be used by researchers from any field to build and run complex computational pipelines and provides functionality to include these pipelines in external interfaces. JMS is currently being used to house a number of bioinformatics pipelines at the Research Unit in Bioinformatics (RUBi) at Rhodes University. JMS is an open-source project and is freely available at https://github.com/RUBi-ZA/JMS. PMID:26280450

  19. Solutions for complex, multi data type and multi tool analysis: principles and applications of using workflow and pipelining methods.

    PubMed

    Munro, Robin E J; Guo, Yike

    2009-01-01

    Analytical workflow technology, sometimes also called data pipelining, is the fundamental component that provides the scalable analytical middleware that can be used to enable the rapid building and deployment of an analytical application. Analytical workflows enable researchers, analysts and informaticians to integrate and access data and tools from structured and non-structured data sources so that analytics can bridge different silos of information; compose multiple analytical methods and data transformations without coding; rapidly develop applications and solutions by visually constructing analytical workflows that are easy to revise should the requirements change; access domain-specific extensions for specific projects or areas, for example, text extraction, visualisation, reporting, genetics, cheminformatics, bioinformatics and patient-based analytics; automatically deploy workflows directly into web portals and as web services to be part of a service-oriented architecture (SOA). By performing workflow building, using a middleware layer for data integration, it is a relatively simple exercise to visually design an analytical process for data analysis and then publish this as a service to a web browser. All this is encapsulated into what can be referred to as an 'Embedded Analytics' methodology which will be described here with examples covering different scientifically focused data analysis problems.

  20. Building asynchronous geospatial processing workflows with web services

    NASA Astrophysics Data System (ADS)

    Zhao, Peisheng; Di, Liping; Yu, Genong

    2012-02-01

    Geoscience research and applications often involve a geospatial processing workflow. This workflow includes a sequence of operations that use a variety of tools to collect, translate, and analyze distributed heterogeneous geospatial data. Asynchronous mechanisms, by which clients initiate a request and then resume their processing without waiting for a response, are very useful for complicated workflows that take a long time to run. Geospatial contents and capabilities are increasingly becoming available online as interoperable Web services. This online availability significantly enhances the ability to use Web service chains to build distributed geospatial processing workflows. This paper focuses on how to orchestrate Web services for implementing asynchronous geospatial processing workflows. The theoretical bases for asynchronous Web services and workflows, including asynchrony patterns and message transmission, are examined to explore different asynchronous approaches to and architecture of workflow code for the support of asynchronous behavior. A sample geospatial processing workflow, issued by the Open Geospatial Consortium (OGC) Web Service, Phase 6 (OWS-6), is provided to illustrate the implementation of asynchronous geospatial processing workflows and the challenges in using Web Services Business Process Execution Language (WS-BPEL) to develop them.
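
    The asynchronous pattern described above, in which a client initiates a request and resumes its own processing without waiting for the response, can be sketched in a few lines. This sketch uses Python's `asyncio` rather than WS-BPEL or actual OGC Web services; `slow_service` and the operation name "reproject" are hypothetical stand-ins for a long-running geospatial service.

```python
import asyncio


async def slow_service(name, delay):
    """Stand-in for a long-running remote processing service (hypothetical)."""
    await asyncio.sleep(delay)
    return f"{name}: done"


async def client():
    log = []
    # Initiate the request; the call returns immediately with a handle.
    task = asyncio.create_task(slow_service("reproject", 0.05))
    log.append("request sent")
    # The client resumes its own work instead of blocking on the response.
    log.append("client keeps working")
    # Later, the client collects the response when it actually needs it.
    log.append(await task)
    return log


def run_demo():
    return asyncio.run(client())
```

    Calling `run_demo()` returns the log entries in the order the client produced them, with the service result arriving last.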

  1. Predictive QSAR modeling workflow, model applicability domains, and virtual screening.

    PubMed

    Tropsha, Alexander; Golbraikh, Alexander

    2007-01-01

    Quantitative Structure Activity Relationship (QSAR) modeling has been traditionally applied as an evaluative approach, i.e., with the focus on developing retrospective and explanatory models of existing data. Model extrapolation was considered only in a hypothetical sense, in terms of potential modifications of known biologically active chemicals that could improve compounds' activity. This critical review re-examines the strategy and the output of the modern QSAR modeling approaches. We provide examples and arguments suggesting that current methodologies may afford robust and validated models capable of accurate prediction of compound properties for molecules not included in the training sets. We discuss a data-analytical modeling workflow developed in our laboratory that incorporates modules for combinatorial QSAR model development (i.e., using all possible binary combinations of available descriptor sets and statistical data modeling techniques), rigorous model validation, and virtual screening of available chemical databases to identify novel biologically active compounds. Our approach places particular emphasis on model validation as well as the need to define model applicability domains in the chemistry space. We present examples of studies where the application of rigorously validated QSAR models to virtual screening identified computational hits that were confirmed by subsequent experimental investigations. The emerging focus of QSAR modeling on target property forecasting brings it forward as a predictive, as opposed to evaluative, modeling approach.
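
    The combinatorial step, pairing every descriptor set with every modeling technique and keeping only the pairs that pass validation, can be sketched as follows. The descriptor-set and method names, the `score` callable, and the threshold are hypothetical placeholders; a real workflow would fit and externally validate an actual model for each pair.

```python
from itertools import product


def combinatorial_models(descriptor_sets, methods, score, threshold):
    """Evaluate every (descriptor set, method) pair and keep validated ones.

    descriptor_sets and methods map a label to whatever object `score`
    understands; `score` returns a validation statistic for one pair.
    """
    accepted = []
    for (d_name, d), (m_name, m) in product(descriptor_sets.items(),
                                            methods.items()):
        s = score(d, m)
        if s >= threshold:          # discard pairs that fail validation
            accepted.append((d_name, m_name, s))
    # Best-scoring combinations first.
    return sorted(accepted, key=lambda item: -item[2])


# Toy stand-ins: numbers play the role of descriptor matrices and learners.
descriptor_sets = {"topological": 1, "fragment": 2}
methods = {"kNN": 10, "SVM": 20}
ranked = combinatorial_models(descriptor_sets, methods,
                              score=lambda d, m: d * m / 40,
                              threshold=0.5)
```

    With two descriptor sets and two methods, four pairs are evaluated and the one pair scoring below the threshold is dropped.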

  2. Modeling Complex Workflow in Molecular Diagnostics

    PubMed Central

    Gomah, Mohamed E.; Turley, James P.; Lu, Huimin; Jones, Dan

    2010-01-01

    One of the hurdles to achieving personalized medicine has been implementing the laboratory processes for performing and reporting complex molecular tests. The rapidly changing test rosters and complex analysis platforms in molecular diagnostics have meant that many clinical laboratories still use labor-intensive manual processing and testing without the level of automation seen in high-volume chemistry and hematology testing. We provide here a discussion of design requirements and the results of implementation of a suite of lab management tools that incorporate the many elements required for use of molecular diagnostics in personalized medicine, particularly in cancer. These applications provide the functionality required for sample accessioning and tracking, material generation, and testing that are particular to the evolving needs of individualized molecular diagnostics. On implementation, the applications described here resulted in improvements in the turn-around time for reporting of more complex molecular test sets, and significant changes in the workflow. Therefore, careful mapping of workflow can permit design of software applications that simplify even the complex demands of specialized molecular testing. By incorporating design features for order review, software tools can permit a more personalized approach to sample handling and test selection without compromising efficiency. PMID:20007844

  3. Workflow management for a cosmology collaboratory

    SciTech Connect

    Loken, Stewart C.; McParland, Charles

    2001-07-20

    The Nearby Supernova Factory Project will provide a unique opportunity to bring together simulation and observation to address crucial problems in particle and nuclear physics. Its goal is to significantly enhance our understanding of the nuclear processes in supernovae and to improve our ability to use both Type Ia and Type II supernovae as reference light sources (standard candles) in precision measurements of cosmological parameters. Over the past several years, astronomers and astrophysicists have been conducting in-depth sky searches with the goal of identifying supernovae in their earliest evolutionary stages and, during the 4 to 8 weeks of their most ''explosive'' activity, measure their changing magnitude and spectra. The search program currently under development at LBNL is an earth-based observation program utilizing observational instruments at Haleakala and Mauna Kea, Hawaii and Mt. Palomar, California. This new program provides a demanding testbed for the integration of computational, data management and collaboratory technologies. A critical element of this effort is the use of emerging workflow management tools to permit collaborating scientists to manage data processing and storage and to integrate advanced supernova simulation into the real-time control of the experiments. This paper describes the workflow management framework for the project, discusses security and resource allocation requirements and reviews emerging tools to support this important aspect of collaborative work.

  4. Workflow-Based Software Development Environment

    NASA Technical Reports Server (NTRS)

    Izygon, Michel E.

    2013-01-01

    The Software Developer's Assistant (SDA) helps software teams more efficiently and accurately conduct or execute software processes associated with NASA mission-critical software. SDA is a process enactment platform that guides software teams through project-specific standards, processes, and procedures. Software projects are decomposed into all of their required process steps or tasks, and each task is assigned to project personnel. SDA orchestrates the performance of work required to complete all process tasks in the correct sequence. The software then notifies team members when they may begin work on their assigned tasks and provides the tools, instructions, reference materials, and supportive artifacts that allow users to compliantly perform the work. A combination of technology components captures and enacts any software process used to support the software lifecycle. It creates an adaptive workflow environment that can be modified as needed. SDA achieves software process automation through a Business Process Management (BPM) approach to managing the software lifecycle for mission-critical projects. It contains five main parts: TieFlow (workflow engine), Business Rules (rules to alter process flow), Common Repository (storage for project artifacts, versions, history, schedules, etc.), SOA (interface to allow internal, GFE, or COTS tools integration), and the Web Portal Interface (collaborative web environment

  5. Workflow management for a cosmology collaboratory

    NASA Astrophysics Data System (ADS)

    Loken, S. C.; McParland, C.

    2001-07-01

    The Nearby Supernova Factory Project will provide a unique opportunity to bring together simulation and observation to address crucial problems in particle and nuclear physics. Its goal is to significantly enhance our understanding of the nuclear processes in supernovae and to improve our ability to use both Type Ia and Type II supernovae as reference light sources (standard candles) in precision measurements of cosmological parameters. Over the past several years, astronomers and astrophysicists have been conducting in-depth sky searches with the goal of identifying supernovae in their earliest evolutionary stages and, during the four to eight weeks of their most 'explosive' activity, measure their changing magnitude and spectra. The search program currently under development at LBNL (Lawrence Berkeley National Lab) is an earth-based observation program utilizing observational instruments at Haleakala and Mauna Kea, Hawaii and Mt. Palomar, California. This new program provides a demanding testbed for the integration of computational, data management and collaboratory technologies. A critical element of this effort is the use of emerging workflow management tools to permit collaborating scientists to manage data processing and storage and to integrate advanced supernova simulation into the real-time control of the experiments. This paper describes the workflow management framework for the project, discusses security and resource allocation requirements and reviews emerging tools to support this important aspect of collaborative work.

  6. Delta: Data Reduction for Integrated Application Workflows.

    SciTech Connect

    Lofstead, Gerald Fredrick; Jean-Baptiste, Gregory; Oldfield, Ron A.

    2015-06-01

    Integrated Application Workflows (IAWs) run multiple simulation workflow components concurrently on an HPC resource connecting these components using compute area resources and compensating for any performance or data processing rate mismatches. These IAWs require high frequency and high volume data transfers between compute nodes and staging area nodes during the lifetime of a large parallel computation. The available network bandwidth between the two areas may not be enough to efficiently support the data movement. As the processing power available to compute resources increases, the requirements for this data transfer will become more difficult to satisfy and perhaps will not be satisfiable at all since network capabilities are not expanding at a comparable rate. Furthermore, energy consumption in HPC environments is expected to grow by an order of magnitude as exascale systems become a reality. The energy cost of moving large amounts of data frequently will contribute to this issue. It is necessary to reduce the volume of data without reducing the quality of data when it is being processed and analyzed. Delta resolves the issue by addressing the lifetime data transfer operations. Delta removes subsequent identical copies of already transmitted data during transfers and restores those copies once the data has reached the destination. Delta is able to identify duplicated information and determine the most space efficient way to represent it. Initial tests show about 50% reduction in data movement while maintaining the same data quality and transmission frequency.
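
    Removing repeated copies on the sending side and restoring them at the destination can be sketched with content hashing. The abstract does not say how Delta detects duplicates, so the SHA-256 digests, the `encode`/`decode` functions, and the message format below are assumptions made purely for illustration.

```python
import hashlib


def encode(chunks, seen):
    """Sender side: replace chunks that were already sent with a short reference.

    `seen` is the set of digests transmitted so far; it persists across calls.
    """
    messages = []
    for chunk in chunks:
        digest = hashlib.sha256(chunk).digest()
        if digest in seen:
            messages.append(("ref", digest))   # duplicate: send digest only
        else:
            seen.add(digest)
            messages.append(("raw", chunk))    # first occurrence: send data
    return messages


def decode(messages, store):
    """Receiver side: restore the original chunk sequence.

    `store` maps digests to chunks received so far; it persists across calls.
    """
    chunks = []
    for kind, payload in messages:
        if kind == "raw":
            store[hashlib.sha256(payload).digest()] = payload
            chunks.append(payload)
        else:
            chunks.append(store[payload])      # look up the earlier copy
    return chunks


data = [b"aaaa", b"bbbb", b"aaaa"]
messages = encode(data, seen=set())
restored = decode(messages, store={})
```

    The third chunk travels as a reference instead of raw bytes, and the receiver still reconstructs the exact original sequence.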

  7. Mixed Methods Approach for Measuring the Impact of Video Telehealth on Outpatient Clinic Triage Nurse Workflow

    PubMed Central

    Cady, Rhonda G.; Finkelstein, Stanley M.

    2015-01-01

    Nurse-delivered telephone triage is a common component of outpatient clinic settings. Adding new communication technology to clinic triage has the potential to not only transform the triage process, but also alter triage workflow. Evaluating the impact of new technology on an existing workflow is paramount to maximizing efficiency of the delivery system. This study investigated triage nurse workflow before and after the implementation of video telehealth using a sequential mixed methods protocol that combined ethnography and time-motion study to provide a robust analysis of the implementation environment. Outpatient clinic triage using video telehealth required significantly more time than telephone triage, indicating a reduction in nurse efficiency. Despite the increased time needed to conduct video telehealth, nurses consistently rated it useful in providing triage. Interpretive analysis of the qualitative and quantitative data suggests the increased depth and breadth of data available during video triage alters the assessment triage nurses provide physicians. This in turn could impact the time physicians spend formulating a diagnosis and treatment plan. While the immediate impact of video telehealth is a reduction in triage nurse efficiency, what is unknown is the impact of video telehealth on physician and overall clinic efficiency. Future studies should address this area. PMID:24080753

  8. Coupling between a multi-physics workflow engine and an optimization framework

    NASA Astrophysics Data System (ADS)

    Di Gallo, L.; Reux, C.; Imbeaux, F.; Artaud, J.-F.; Owsiak, M.; Saoutic, B.; Aiello, G.; Bernardi, P.; Ciraolo, G.; Bucalossi, J.; Duchateau, J.-L.; Fausser, C.; Galassi, D.; Hertout, P.; Jaboulay, J.-C.; Li-Puma, A.; Zani, L.

    2016-03-01

    A generic coupling method between a multi-physics workflow engine and an optimization framework is presented in this paper. The coupling architecture has been developed in order to preserve the integrity of the two frameworks. The objective is to provide the possibility to replace a framework, a workflow or an optimizer by another one without changing the whole coupling procedure or modifying the main content in each framework. The coupling is achieved by using a socket-based communication library for exchanging data between the two frameworks. Among a number of algorithms provided by optimization frameworks, Genetic Algorithms (GAs) have demonstrated their efficiency on single and multiple criteria optimization. Additionally to their robustness, GAs can handle non-valid data which may appear during the optimization. Consequently GAs work on most general cases. A parallelized framework has been developed to reduce the time spent for optimizations and evaluation of large samples. A test has shown a good scaling efficiency of this parallelized framework. This coupling method has been applied to the case of SYCOMORE (SYstem COde for MOdeling tokamak REactor) which is a system code developed in form of a modular workflow for designing magnetic fusion reactors. The coupling of SYCOMORE with the optimization platform URANIE enables design optimization along various figures of merit and constraints.

  9. The View from a Few Hundred Feet : A New Transparent and Integrated Workflow for UAV-collected Data

    NASA Astrophysics Data System (ADS)

    Peterson, F. S.; Barbieri, L.; Wyngaard, J.

    2015-12-01

    Unmanned Aerial Vehicles (UAVs) allow scientists and civilians to monitor earth and atmospheric conditions in remote locations. To keep up with the rapid evolution of UAV technology, data workflows must also be flexible, integrated, and introspective. Here, we present our data workflow for a project to assess the feasibility of detecting threshold levels of methane, carbon dioxide, and other aerosols by mounting consumer-grade gas analysis sensors on UAVs. Particularly, we highlight our use of Project Jupyter, a set of open-source software tools and documentation designed for developing "collaborative narratives" around scientific workflows. By embracing the GitHub-backed, multi-language systems available in Project Jupyter, we enable interaction and exploratory computation while simultaneously embracing distributed version control. Additionally, the transparency of this method builds trust with civilians and decision-makers and leverages collaboration and communication to resolve problems. The goal of this presentation is to provide a generic data workflow for scientific inquiries involving UAVs and to invite the participation of the AGU community in its improvement and curation.

  10. Modelling and analysis of workflow for lean supply chains

    NASA Astrophysics Data System (ADS)

    Ma, Jinping; Wang, Kanliang; Xu, Lida

    2011-11-01

    Cross-organisational workflow systems are a component of enterprise information systems which support collaborative business processes among organisations in a supply chain. Currently, the majority of workflow systems are developed from an information-modelling perspective without considering the actual requirements of supply chain management. In this article, we focus on the modelling and analysis of cross-organisational workflow systems in the context of lean supply chain (LSC) using Petri nets. First, the article describes the assumed conditions of a cross-organisational workflow net according to the idea of LSC and then discusses the standardisation of collaborating business processes between organisations in the context of LSC. Second, the concept of labelled time Petri nets (LTPNs) is defined by combining labelled Petri nets with time Petri nets, and the concept of labelled time workflow nets (LTWNs) is also defined based on LTPNs. Cross-organisational labelled time workflow nets (CLTWNs) are then defined based on LTWNs. Third, the article proposes the notion of OR-silent CLTWNs and an approach to verifying the soundness of LTWNs and CLTWNs. Finally, this article illustrates how to use the proposed method with a simple example. The purpose of this research is to establish a formal method of modelling and analysis of workflow systems for LSC. This study initiates a new perspective of research on cross-organisational workflow management and promotes operation management of LSC in real world settings.
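
    The Petri net machinery underlying these definitions rests on one basic rule: a transition is enabled when every input place holds enough tokens, and firing it moves tokens from input places to output places. The sketch below shows only that rule for an ordinary place/transition net; labels, timing, and the soundness check of LTWNs and CLTWNs are not modelled, and the place and transition names are hypothetical.

```python
def enabled(marking, pre):
    """A transition is enabled if each input place holds enough tokens."""
    return all(marking.get(place, 0) >= n for place, n in pre.items())


def fire(marking, pre, post):
    """Fire a transition: consume tokens from `pre`, produce tokens in `post`."""
    if not enabled(marking, pre):
        raise ValueError("transition not enabled")
    new_marking = dict(marking)
    for place, n in pre.items():
        new_marking[place] = new_marking.get(place, 0) - n
    for place, n in post.items():
        new_marking[place] = new_marking.get(place, 0) + n
    return new_marking


# A two-step workflow net: source place "i", intermediate "p", sink "o".
receive_order = ({"i": 1}, {"p": 1})   # (pre, post) of the first transition
ship_goods = ({"p": 1}, {"o": 1})      # (pre, post) of the second transition

m0 = {"i": 1}
m1 = fire(m0, *receive_order)
m2 = fire(m1, *ship_goods)
```

    After both transitions fire, the only token left sits in the sink place, which is the final state a sound workflow net is expected to reach.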

  11. A Collaborative Workflow for the Digitization of Unique Materials

    ERIC Educational Resources Information Center

    Gueguen, Gretchen; Hanlon, Ann M.

    2009-01-01

    This paper examines the experience of one institution, the University of Maryland Libraries, as it made organizational efforts to harness existing workflows and to capture digitization done in the course of responding to patron requests. By examining the way this organization adjusted its existing workflows to put in place more systematic methods…

  12. Automatic Provenance Recording for Scientific Data using Trident

    NASA Astrophysics Data System (ADS)

    Simmhan, Y.; Barga, R.; van Ingen, C.

    2008-12-01

    Provenance is increasingly recognized as being critical to the understanding and reuse of scientific datasets. Given the rapid generation of scientific data from sensors and computational model results, it is not practical to manually record provenance for data, and automated techniques for provenance capture are essential. Scientific workflows provide a framework for representing computational models and complex transformations of scientific data, and present a means for tracking the operations performed to derive a dataset. The Trident Scientific Workbench is a workflow system that natively incorporates provenance capture of data derived as part of the workflow execution. The applications used as part of a Trident workflow can execute on a remote computational cluster, such as a supercomputing center or in the Cloud, or on the local desktop of the researcher, and provenance on data derived by the applications is seamlessly captured. Scientists also have the option to annotate the provenance metadata using domain specific tags such as, for example, GCMD keywords. The provenance records thus captured can be exported in the Open Provenance Model* XML format that is emerging as a provenance standard in the eScience community or visualized as a graph of data and applications. The Trident workflow system and the provenance recorded by it have been successfully applied in the Neptune oceanography project and are presently being tested in the Pan-STARRS astronomy project. *http://twiki.ipaw.info/bin/view/Challenge/OPM
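
    The idea of capturing provenance automatically as each workflow step runs can be illustrated with a small Python sketch. It does not use or imitate Trident's own interfaces: the `record_provenance` decorator, the in-memory `PROVENANCE_LOG`, the record fields, and the two example steps are all hypothetical, and a real system would persist the records and export them in a standard format.

```python
import functools
import hashlib
import json

PROVENANCE_LOG = []  # hypothetical in-memory store for provenance records


def _digest(value):
    """Short content hash used to identify a data item in the records."""
    payload = json.dumps(value, sort_keys=True, default=repr).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


def record_provenance(func):
    """Wrap a workflow step so that each call logs what it used and generated."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        PROVENANCE_LOG.append({
            "activity": func.__name__,
            "used": [_digest(a) for a in args]
            + [_digest(v) for v in kwargs.values()],
            "generated": _digest(result),
        })
        return result

    return wrapper


@record_provenance
def calibrate(readings, offset):
    """Illustrative step: apply a constant calibration offset."""
    return [r + offset for r in readings]


@record_provenance
def daily_mean(readings):
    """Illustrative step: reduce calibrated readings to a single mean."""
    return sum(readings) / len(readings)


calibrated = calibrate([10.0, 12.0, 14.0], 0.5)
mean = daily_mean(calibrated)
```

    Because the hash of the data generated by `calibrate` matches the hash of the data used by `daily_mean`, the two records can be linked into a derivation graph without the scientist recording anything by hand.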

  13. Enabling Efficient Climate Science Workflows in High Performance Computing Environments

    NASA Astrophysics Data System (ADS)

    Krishnan, H.; Byna, S.; Wehner, M. F.; Gu, J.; O'Brien, T. A.; Loring, B.; Stone, D. A.; Collins, W.; Prabhat, M.; Liu, Y.; Johnson, J. N.; Paciorek, C. J.

    2015-12-01

    A typical climate science workflow often involves a combination of acquisition of data, modeling, simulation, analysis, visualization, publishing, and storage of results. Each of these tasks presents a myriad of challenges when running on a high performance computing environment such as Hopper or Edison at NERSC. Hurdles such as data transfer and management, job scheduling, parallel analysis routines, and publication require a lot of forethought and planning to ensure that proper quality control mechanisms are in place. These steps require effectively utilizing a combination of well-tested and newly developed functionality to move data, perform analysis, apply statistical routines, and finally, serve results and tools to the greater scientific community. As part of the CAlibrated and Systematic Characterization, Attribution and Detection of Extremes (CASCADE) project we highlight a stack of tools our team utilizes and has developed to ensure that large scale simulation and analysis work are commonplace and provide operations that assist in everything from generation/procurement of data (HTAR/Globus) to automating publication of results to portals like the Earth Systems Grid Federation (ESGF), all while executing everything in between in a scalable environment in a task-parallel way (MPI). We highlight the use and benefit of these tools by showing several climate science analysis use cases they have been applied to.

  14. Accelerating Science Impact through Big Data Workflow Management and Supercomputing

    NASA Astrophysics Data System (ADS)

    De, K.; Klimentov, A.; Maeno, T.; Mashinistov, R.; Nilsson, P.; Oleynik, D.; Panitkin, S.; Ryabinkin, E.; Wenaus, T.

    2016-02-01

    The Large Hadron Collider (LHC), operating at the international CERN Laboratory in Geneva, Switzerland, is leading Big Data driven scientific explorations. ATLAS, one of the largest collaborations ever assembled in the history of science, is at the forefront of research at the LHC. To address an unprecedented multi-petabyte data processing challenge, the ATLAS experiment is relying on a heterogeneous distributed computational infrastructure. To manage the workflow for all data processing on hundreds of data centers, the PanDA (Production and Distributed Analysis) Workload Management System is used. An ambitious program to expand PanDA to all available computing resources, including opportunistic use of commercial and academic clouds and Leadership Computing Facilities (LCF), is being realized within the BigPanDA and megaPanDA projects. These projects are now exploring how PanDA might be used for managing computing jobs that run on supercomputers including OLCF's Titan and NRC-KI HPC2. The main idea is to reuse, as much as possible, existing components of the PanDA system that are already deployed on the LHC Grid for analysis of physics data. The next generation of PanDA will allow many data-intensive sciences employing a variety of computing platforms to benefit from ATLAS experience and proven tools in highly scalable processing.

  15. Swabs to genomes: a comprehensive workflow

    PubMed Central

    Jospin, Guillaume; Darling, Aaron E.; Coil, David A.

    2015-01-01

    The sequencing, assembly, and basic analysis of microbial genomes, once a painstaking and expensive undertaking, has become much easier for research labs with access to standard molecular biology and computational tools. However, there is a confusing variety of options available for DNA library preparation and sequencing, and inexperience with bioinformatics can pose a significant barrier to entry for many who may be interested in microbial genomics. The objective of the present study was to design, test, troubleshoot, and publish a simple, comprehensive workflow from the collection of an environmental sample (a swab) to a published microbial genome, empowering even a lab or classroom with limited resources and bioinformatics experience to perform it. PMID:26020012

  16. An integrated workflow for DNA methylation analysis.

    PubMed

    Li, Pingchuan; Demirci, Feray; Mahalingam, Gayathri; Demirci, Caghan; Nakano, Mayumi; Meyers, Blake C

    2013-05-20

    The analysis of cytosine methylation provides a new way to assess and describe epigenetic regulation at a whole-genome level in many eukaryotes. DNA methylation has a demonstrated role in genome stability and protection, regulation of gene expression and many other aspects of genome function and maintenance. BS-seq is a relatively unbiased method for profiling DNA methylation, with a resolution capable of measuring methylation at individual cytosines. Here we describe, as an example, a workflow to handle DNA methylation analysis, from BS-seq library preparation to data visualization. We describe some applications for the analysis and interpretation of these data. Our laboratory provides public access to plant DNA methylation data via visualization tools available at our "Next-Gen Sequence" websites (http://mpss.udel.edu), along with small RNA, RNA-seq and other data types. PMID:23706300
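
    Measuring methylation at an individual cytosine can be made concrete with a short calculation. In bisulfite sequencing, unmethylated cytosines are read as T after conversion while methylated cytosines are still read as C, so a common per-site estimate is C / (C + T). The sketch below assumes the read counts have already been tallied per site; the function name, the `min_coverage` threshold, and the site positions are hypothetical, and this is not the pipeline described in the abstract.

```python
def methylation_level(c_reads, t_reads, min_coverage=4):
    """Estimate the methylation level at one cytosine as C / (C + T).

    Returns None when coverage is below min_coverage (an assumed cut-off),
    because the ratio is unreliable with very few reads.
    """
    coverage = c_reads + t_reads
    if coverage < min_coverage:
        return None
    return c_reads / coverage


# Hypothetical per-site counts: position -> (reads showing C, reads showing T).
site_counts = {
    101: (9, 1),   # mostly C: largely methylated
    157: (0, 12),  # all T: unmethylated
    203: (1, 1),   # too few reads to call
}
levels = {
    position: methylation_level(c, t)
    for position, (c, t) in site_counts.items()
}
```

    Sites with too little coverage are reported as `None` rather than as 0, so that "no evidence" is not confused with "unmethylated" in later visualization.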

  17. Software workflow for the automatic tagging of medieval manuscript images (SWATI)

    NASA Astrophysics Data System (ADS)

    Chandna, Swati; Tonne, Danah; Jejkal, Thomas; Stotzka, Rainer; Krause, Celia; Vanscheidt, Philipp; Busch, Hannah; Prabhune, Ajinkya

    2015-01-01

    Digital methods, tools and algorithms are gaining in importance for the analysis of digitized manuscript collections in the arts and humanities. One example is the BMBF-funded research project "eCodicology" which aims to design, evaluate and optimize algorithms for the automatic identification of macro- and micro-structural layout features of medieval manuscripts. The main goal of this research project is to provide better insights into high-dimensional datasets of medieval manuscripts for humanities scholars. The heterogeneous nature and size of the humanities data and the need to create a database of automatically extracted reproducible features for better statistical and visual analysis are the main challenges in designing a workflow for the arts and humanities. This paper presents a concept of a workflow for the automatic tagging of medieval manuscripts. As a starting point, the workflow uses medieval manuscripts digitized within the scope of the project "Virtual Scriptorium St. Matthias". Firstly, these digitized manuscripts are ingested into a data repository. Secondly, specific algorithms are adapted or designed for the identification of macro- and micro-structural layout elements like page size, writing space, number of lines etc. And lastly, a statistical analysis and scientific evaluation of the manuscript groups are performed. The workflow is designed generically to process large amounts of data automatically with any desired algorithm for feature extraction. As a result, a database of objectified and reproducible features is created which helps to analyze and visualize hidden relationships of around 170,000 pages. The workflow shows the potential of automatic image analysis by enabling the processing of a single page in less than a minute. Furthermore, the accuracy tests of the workflow on a small set of manuscripts with respect to features like page size and text areas show that automatic and manual analysis are comparable. The usage of a computer

  18. Traversing the many paths of workflow research: developing a conceptual framework of workflow terminology through a systematic literature review

    PubMed Central

    Novak, Laurie L; Johnson, Kevin B; Lorenzi, Nancy M

    2010-01-01

    The objective of this review was to describe methods used to study and model workflow. The authors included studies set in a variety of industries using qualitative, quantitative and mixed methods. Of the 6221 matching abstracts, 127 articles were included in the final corpus. The authors collected data from each article on researcher perspective, study type, methods type, specific methods, approaches to evaluating quality of results, definition of workflow and dependent variables. Ethnographic observation and interviews were the most frequently used methods. Long study durations revealed the large time commitment required for descriptive workflow research. The most frequently discussed technique for evaluating quality of study results was triangulation. The definition of the term “workflow” and choice of methods for studying workflow varied widely across research areas and researcher perspectives. The authors developed a conceptual framework of workflow-related terminology for use in future research and present this model for use by other researchers. PMID:20442143

  19. Comprehensive Profiling of Glycosphingolipid Glycans Using a Novel Broad Specificity Endoglycoceramidase in a High-Throughput Workflow.

    PubMed

    Albrecht, Simone; Vainauskas, Saulius; Stöckmann, Henning; McManus, Ciara; Taron, Christopher H; Rudd, Pauline M

    2016-05-01

    The biological function of glycosphingolipids (GSLs) is largely determined by their glycan headgroup moiety. This has placed a renewed emphasis on detailed GSL headgroup structural analysis. Comprehensive profiling of GSL headgroups in biological samples requires the use of endoglycoceramidases with broad substrate specificity and a robust workflow that enables their high-throughput analysis. We present here the first high-throughput glyco-analytical platform for GSL headgroup profiling. The workflow features enzymatic release of GSL glycans with a novel broad-specificity endoglycoceramidase I (EGCase I) from Rhodococcus triatomea, selective glycan capture on hydrazide beads on a robotics platform, 2AB-fluorescent glycan labeling, and analysis by UPLC-HILIC-FLD. R. triatomea EGCase I displayed a wider specificity than known EGCases and was able to efficiently hydrolyze gangliosides, globosides, (n)Lc-type GSLs, and cerebrosides. Our workflow was validated on purified GSL standard lipids and was applied to the characterization of GSLs extracted from several mammalian cell lines and human serum. This study should facilitate the analytical workflow in functional glycomics studies and biomarker discovery. PMID:27033327

  20. Improving access to space weather data via workflows and web services

    NASA Astrophysics Data System (ADS)

    Sundaravel, Anu Swapna

    The Space Physics Interactive Data Resource (SPIDR) is a web-based interactive tool developed by NOAA's National Geophysical Data Center to provide access to historical space physics datasets. These data sets are widely used by physicists for space weather modeling and predictions. Built on a distributed network of databases and application servers, SPIDR offers services in two ways: via a web page interface and via a web service interface. SPIDR exposes several SOAP-based web services that client applications implement to connect to a number of data sources for data download and processing. At present, the usage of the web services has been difficult, adding unnecessary complexity to client applications and inconvenience to the scientists who want to use these datasets. This study focuses on improving SPIDR's web interface to better support data access, integration and display. This is accomplished in two ways: (1) examining the needs of scientists to better understand what web services they require to better access and process these datasets and (2) developing a client application to support SPIDR's SOAP-based services using the Kepler scientific workflow system. To this end, we identified, designed and developed several web services for filtering the existing datasets and created several Kepler workflows to automate routine tasks associated with these datasets. These workflows are a part of the custom NGDC build of the Kepler tool. Scientists are already familiar with Kepler due to its extensive use in this domain. As a result, this approach provides them with tools that are less daunting than raw web services and ultimately more useful and customizable. We evaluated our work by interviewing various scientists who make use of SPIDR and having them use the developed Kepler workflows while recording their feedback and suggestions. Our work has improved SPIDR such that new web services are now available and scientists have access to a desktop

  1. Opening new gateways to workflows for life scientists.

    PubMed

    Karasavvas, Konstantinos; Wolstencroft, Katy; Mina, Eleni; Cruickshank, Don; Williams, Alan; De Roure, David; Goble, Carole; Roos, Marco

    2012-01-01

    The combination of highly complex biology problems and varying IT skills among life scientists poses a unique challenge in designing bioinformatics programs. The set of tools and initiatives described in this work shows new ways of making life science workflows more accessible to the community. Our aim is to help bioinformaticians help biologists. We present how to make Taverna workflows available from within Galaxy, both widely used bioinformatics platforms. Calling Galaxy tools from Taverna is also discussed. In addition, we describe a web application that allows a user to run arbitrary Taverna workflows by only using a web browser.

  2. Distributed Workflow Service Composition Based on CTR Technology

    NASA Astrophysics Data System (ADS)

    Feng, Zhilin; Ye, Yanming

    Recently, WS-BPEL has gradually become the basis of a standard for web service description and composition. However, WS-BPEL cannot efficiently describe distributed workflow services because it lacks special expressive power and formal semantics. This paper presents a novel method for modeling distributed workflow service composition with Concurrent TRansaction logic (CTR). The syntactic structures of WS-BPEL and CTR are analyzed, and new rules for mapping WS-BPEL into CTR are given. A case study is put forward to show that the proposed method is appropriate for modeling workflow business services under distributed environments.

  3. Robust automated knowledge capture.

    SciTech Connect

    Stevens-Adams, Susan Marie; Abbott, Robert G.; Forsythe, James Chris; Trumbo, Michael Christopher Stefan; Haass, Michael Joseph; Hendrickson, Stacey M. Langfitt

    2011-10-01

    This report summarizes research conducted through the Sandia National Laboratories Robust Automated Knowledge Capture Laboratory Directed Research and Development project. The objective of this project was to advance scientific understanding of the influence of individual cognitive attributes on decision making. The project has developed a quantitative model known as RumRunner that has proven effective in predicting the propensity of an individual to shift strategies on the basis of task- and experience-related parameters. Three separate studies are described which have validated the basic RumRunner model. This work provides a basis for better understanding human decision making in high-consequence national security applications, and in particular, the individual characteristics that underlie adaptive thinking.

  4. Context-aware workflow management of mobile health applications.

    PubMed

    Salden, Alfons; Poortinga, Remco

    2006-01-01

    We propose a medical application management architecture that allows medical (IT) experts to readily design, develop and deploy context-aware mobile health (m-health) applications or services. In particular, we elaborate on how our application workflow management architecture enables chaining, coordinating, composing, and adapting context-sensitive medical application components such that critical Quality of Service (QoS) and Quality of Context (QoC) requirements typical for m-health applications or services can be met. This functional architectural support requires learning modules for distilling application-critical selection of attention and anticipation models. These models will help medical experts construct and adjust on-the-fly m-health application workflows and workflow strategies. We illustrate our context-aware workflow management paradigm for an m-health data delivery problem, in which optimal communication network configurations have to be determined.

  5. Resource Tracking and Workflow System - part of the CORE system

    2009-10-02

    Resource management and workflow capability applied to engineering design situational awareness, providing the ability to make assignments and track progress through the construction and maintenance life cycle of an engineered structure.

  6. Optimization of tomographic reconstruction workflows on geographically distributed resources.

    PubMed

    Bicer, Tekin; Gürsoy, Doğa; Kettimuthu, Rajkumar; De Carlo, Francesco; Foster, Ian T

    2016-07-01

    New technological advancements in synchrotron light sources enable data acquisitions at unprecedented levels. This emergent trend affects not only the size of the generated data but also the need for larger computational resources. Although beamline scientists and users have access to local computational resources, these are typically limited and can result in extended execution times. Applications that are based on iterative processing as in tomographic reconstruction methods require high-performance compute clusters for timely analysis of data. Here, the focus is on time-sensitive analysis and processing of Advanced Photon Source data on geographically distributed resources. Two main challenges are considered: (i) modeling of the performance of tomographic reconstruction workflows and (ii) transparent execution of these workflows on distributed resources. For the former, three main stages are considered: (i) data transfer between storage and computational resources, (ii) wait/queue time of reconstruction jobs at compute resources, and (iii) computation of reconstruction tasks. These performance models allow evaluation and estimation of the execution time of any given iterative tomographic reconstruction workflow that runs on geographically distributed resources. For the latter challenge, a workflow management system is built, which can automate the execution of workflows and minimize the user interaction with the underlying infrastructure. The system utilizes Globus to perform secure and efficient data transfer operations. The proposed models and the workflow management system are evaluated by using three high-performance computing and two storage resources, all of which are geographically distributed. Workflows were created with different computational requirements using two compute-intensive tomographic reconstruction algorithms. Experimental evaluation shows that the proposed models and system can be used for selecting the optimum resources, which in turn can
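
    The three-stage decomposition described above (transfer, queue wait, computation) lends itself to a simple additive estimate. The Python sketch below is only an illustration of that idea under strong assumptions: the `Resource` fields, the linear transfer and compute terms, and every number are hypothetical and are not the performance models or measurements from the paper.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    name: str
    bandwidth_gbps: float  # assumed sustained transfer rate, gigabits per second
    queue_wait_s: float    # assumed expected queue wait, seconds
    units_per_s: float     # assumed reconstruction work units processed per second


def estimate_time(resource, data_gb, work_units):
    """Estimated end-to-end time: transfer + queue wait + computation (seconds)."""
    transfer_s = data_gb * 8 / resource.bandwidth_gbps  # gigabytes -> gigabits
    compute_s = work_units / resource.units_per_s
    return transfer_s + resource.queue_wait_s + compute_s


# Hypothetical candidates: a small local machine and a larger remote cluster.
candidates = [
    Resource("local_workstation", bandwidth_gbps=10.0, queue_wait_s=0.0,
             units_per_s=100.0),
    Resource("remote_cluster", bandwidth_gbps=2.0, queue_wait_s=600.0,
             units_per_s=2000.0),
]

data_gb, work_units = 100, 1_000_000
best = min(candidates, key=lambda r: estimate_time(r, data_gb, work_units))
```

    With these made-up numbers the remote cluster wins despite slower transfer and a ten-minute queue, because its compute term is so much smaller; changing the dataset size or queue wait can reverse the choice.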

  8. Implementation of Cyberinfrastructure and Data Management Workflow for a Large-Scale Sensor Network

    NASA Astrophysics Data System (ADS)

    Jones, A. S.; Horsburgh, J. S.

    2014-12-01

    Monitoring with in situ environmental sensors and other forms of field-based observation presents many challenges for data management, particularly for large-scale networks consisting of multiple sites, sensors, and personnel. The availability and utility of these data in addressing scientific questions rely on effective cyberinfrastructure that facilitates transformation of raw sensor data into functional data products. They also depend on the ability of researchers to share and access the data in useable formats. In addition to addressing the challenges presented by the quantity of data, monitoring networks need practices to ensure high data quality, including procedures and tools for post-processing. Data quality is further enhanced if practitioners are able to track equipment, deployments, calibrations, and other events related to site maintenance and associate these details with observational data. In this presentation we will describe the overall workflow that we have developed for research groups and sites conducting long-term monitoring using in situ sensors. Features of the workflow include: software tools to automate the transfer of data from field sites to databases, a Python-based program for data quality control post-processing, a web-based application for online discovery and visualization of data, and a data model and web interface for managing physical infrastructure. By automating the data management workflow, the time from collection to analysis is reduced and sharing and publication are facilitated. The incorporation of metadata standards and descriptions and the use of open-source tools enhance the sustainability and reusability of the data. We will describe the workflow and tools that we have developed in the context of the iUTAH (innovative Urban Transitions and Aridregion Hydrosustainability) monitoring network. The iUTAH network consists of aquatic and climate sensors deployed in three watersheds to monitor Gradients Along Mountain to Urban
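
    As an illustration of what a quality control post-processing step for sensor data might involve, here is a minimal Python sketch. It is not the program described in the abstract: the `quality_control` function, the `-9999.0` missing-data sentinel, the flag names, and the temperature limits are all assumptions chosen for the example.

```python
NO_DATA = -9999.0  # assumed sentinel that a datalogger writes for missing readings


def quality_control(readings, low, high):
    """Flag each reading and blank out values that fail a check.

    Returns (cleaned, flags): cleaned holds the value or None, and flags holds
    "ok", "missing", or "out_of_range" for every input reading.
    """
    cleaned, flags = [], []
    for value in readings:
        if value == NO_DATA:
            flag = "missing"
        elif not (low <= value <= high):
            flag = "out_of_range"
        else:
            flag = "ok"
        flags.append(flag)
        cleaned.append(value if flag == "ok" else None)
    return cleaned, flags


# Hypothetical water temperature series in degrees Celsius.
water_temp_c = [11.8, 12.1, -9999.0, 85.0, 12.4]
cleaned, flags = quality_control(water_temp_c, low=-5.0, high=40.0)
```

    Keeping the flags alongside the cleaned values preserves a record of why each reading was rejected, which matters when the data are later shared or published.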

  9. CyberShake: Running Seismic Hazard Workflows on Distributed HPC Resources

    NASA Astrophysics Data System (ADS)

    Callaghan, S.; Maechling, P. J.; Graves, R. W.; Gill, D.; Olsen, K. B.; Milner, K. R.; Yu, J.; Jordan, T. H.

    2013-12-01

    As part of its program of earthquake system science research, the Southern California Earthquake Center (SCEC) has developed a simulation platform, CyberShake, to perform physics-based probabilistic seismic hazard analysis (PSHA) using 3D deterministic wave propagation simulations. CyberShake performs PSHA by simulating a tensor-valued wavefield of Strain Green Tensors, and then using seismic reciprocity to calculate synthetic seismograms for about 415,000 events per site of interest. These seismograms are processed to compute ground motion intensity measures, which are then combined with probabilities from an earthquake rupture forecast to produce a site-specific hazard curve. Seismic hazard curves for hundreds of sites in a region can be used to calculate a seismic hazard map, representing the seismic hazard for a region. We present a recently completed PSHA study in which we calculated four CyberShake seismic hazard maps for the Southern California area to compare how CyberShake hazard results are affected by different SGT computational codes (AWP-ODC and AWP-RWG) and different community velocity models (Community Velocity Model - SCEC (CVM-S4) v11.11 and Community Velocity Model - Harvard (CVM-H) v11.9). We present our approach to running workflow applications on distributed HPC resources, including systems without support for remote job submission. We show how our approach extends the benefits of scientific workflows, such as job and data management, to large-scale applications on Track 1 and Leadership class open-science HPC resources. We used our distributed workflow approach to perform CyberShake Study 13.4 on two new NSF open-science HPC computing resources, Blue Waters and Stampede, executing over 470 million tasks to calculate physics-based hazard curves for 286 locations in the Southern California region. For each location, we calculated seismic hazard curves with two different community velocity models and two different SGT codes, resulting in over
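
    How intensity measures and rupture probabilities combine into a hazard curve can be sketched as follows. This is a simplified illustration, not CyberShake's code: it assumes ruptures occur independently, estimates the chance of exceeding a level given a rupture as the fraction of that rupture's simulated intensity values above the level, and uses invented probabilities and intensity values. The function name `hazard_curve` is hypothetical.

```python
def hazard_curve(ruptures, levels):
    """Probability of exceeding each intensity level at one site.

    ruptures: list of (probability of the rupture, simulated intensity values).
    Assumes independent ruptures, so
    P(exceed x) = 1 - product over ruptures of (1 - p_rup * P(IM > x | rup)).
    """
    curve = []
    for level in levels:
        p_no_exceedance = 1.0
        for probability, intensities in ruptures:
            exceed_fraction = (
                sum(1 for im in intensities if im > level) / len(intensities)
            )
            p_no_exceedance *= 1.0 - probability * exceed_fraction
        curve.append(1.0 - p_no_exceedance)
    return curve


# Invented example: two ruptures with a handful of simulated intensity values.
ruptures = [
    (0.01, [0.1, 0.3, 0.5, 0.7]),
    (0.002, [0.4, 0.8]),
]
levels = [0.2, 0.6]
curve = hazard_curve(ruptures, levels)
```

    The resulting curve decreases as the intensity level rises, since fewer simulated ground motions exceed a higher threshold.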

  10. The Taverna workflow suite: designing and executing workflows of Web Services on the desktop, web or in the cloud.

    PubMed

    Wolstencroft, Katherine; Haines, Robert; Fellows, Donal; Williams, Alan; Withers, David; Owen, Stuart; Soiland-Reyes, Stian; Dunlop, Ian; Nenadic, Aleksandra; Fisher, Paul; Bhagat, Jiten; Belhajjame, Khalid; Bacall, Finn; Hardisty, Alex; Nieva de la Hidalga, Abraham; Balcazar Vargas, Maria P; Sufi, Shoaib; Goble, Carole

    2013-07-01

    The Taverna workflow tool suite (http://www.taverna.org.uk) is designed to combine distributed Web Services and/or local tools into complex analysis pipelines. These pipelines can be executed on local desktop machines or through larger infrastructure (such as supercomputers, Grids or cloud environments), using the Taverna Server. In bioinformatics, Taverna workflows are typically used in the areas of high-throughput omics analyses (for example, proteomics or transcriptomics), or for evidence gathering methods involving text mining or data mining. Through Taverna, scientists have access to several thousand different tools and resources that are freely available from a large range of life science institutions. Once constructed, the workflows are reusable, executable bioinformatics protocols that can be shared, reused and repurposed. A repository of public workflows is available at http://www.myexperiment.org. This article provides an update to the Taverna tool suite, highlighting new features and developments in the workbench and the Taverna Server. PMID:23640334

  12. An Approach to Evaluate Scientist Support in Abstract Workflows and Provenance Traces

    SciTech Connect

    Salayandia, Leonardo; Gates, Ann Q.; Pinheiro da Silva, Paulo

    2012-11-02

    Abstract workflows are useful to bridge the gap between scientists and technologists towards using computer systems to carry out scientific processes. Provenance traces provide evidence required to validate results and support their reuse. Assuming both technologies are based on formal semantics, a knowledge-based system that consistently merges both technologies is useful for scientists that produce data to document their data collecting and transformation processes; it is also useful for scientists that reuse data to assess scientific processes and resulting datasets produced by others. While evaluation of each technology is necessary for a given application, this work discusses their combined evaluation. The claim is that both technologies should complement each other and align consistently to a scientist’s perspective in order to be effective for science. Evaluation criteria are proposed based on lessons learned and exemplified for discussion.

  13. Scientific Communication.

    ERIC Educational Resources Information Center

    Abelson, Philip H.

    1980-01-01

    The value of communication in the preservation of scientific knowledge is described as it relates to the specialized scientific journals. The discipline of peer review is described as the major factor in keeping the scientific enterprise relatively honest. (SA)

  14. The CESM Workflow Re-Engineering Project

    NASA Astrophysics Data System (ADS)

    Strand, G.

    2015-12-01

    The Community Earth System Model (CESM) Workflow Re-Engineering Project is a collaborative project between the CESM Software Engineering Group (CSEG) and the NCAR Computation and Information Systems Lab (CISL) Application Scalability and Performance (ASAP) Group to revamp how CESM saves its output. The CMIP3 and particularly CMIP5 experiences in submitting CESM data to those intercomparison projects revealed that the output format of the CESM is not well-suited for the data requirements common to model intercomparison projects. CESM, for efficiency reasons, creates output files containing all fields for each model time sampling, but MIPs require individual files for each field comprising all model time samples. This transposition of model output can be very time-consuming; depending on the volume of data written by the specific simulation, the time to re-orient the data can be comparable to the time required for the simulation to complete. Previous strategies included using serial tools to perform this transposition, but they are now far too inefficient to deal with the many terabytes of output a single simulation can generate. A new set of Python tools, using data parallelism, has been written to enable this re-orientation and has achieved markedly improved I/O performance. The perspective of a data manager/data producer in the use of these new tools is presented, and likely future work on their development and use will be shown. These tools are a critical part of the NCAR CESM submission to the upcoming CMIP6, with the intention that a much more timely and efficient submission of the expected petabytes of data will be accomplished in the given time frame.
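
    The transposition described in this abstract, from one record per time sample (holding all fields) to one series per field (holding all time samples), can be illustrated with a minimal Python sketch. It is not the project's actual tooling: the field names `TS` and `PRECT`, the in-memory dictionaries standing in for output files, and the use of a thread pool as a stand-in for data parallelism are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def extract_series(field, time_slices):
    """Collect the values of one field across all time samples."""
    return field, [time_slice[field] for time_slice in time_slices]


def slices_to_series(time_slices, max_workers=4):
    """Transpose time-slice records into per-field time series.

    Each field is independent of the others, so the fields are
    handed to a pool of workers (hypothetical stand-in for the
    data-parallel approach mentioned in the abstract).
    """
    if not time_slices:
        return {}
    fields = list(time_slices[0])
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(lambda f: extract_series(f, time_slices), fields)
        return dict(results)


# Toy "time-slice" output: one record per time sample, all fields inside.
time_slices = [
    {"TS": 287.1, "PRECT": 2.4},
    {"TS": 287.3, "PRECT": 2.1},
    {"TS": 287.0, "PRECT": 2.9},
]

# "Time-series" layout: one entry per field, all time samples inside.
series = slices_to_series(time_slices)
```

    Because no field depends on any other, the work divides cleanly by field, which is what makes a data-parallel treatment natural for this kind of re-orientation.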

  15. From Peer-Reviewed to Peer-Reproduced in Scholarly Publishing: The Complementary Roles of Data Models and Workflows in Bioinformatics

    PubMed Central

    Zhao, Jun; Avila-Garcia, Maria Susana; Roos, Marco; Thompson, Mark; van der Horst, Eelke; Kaliyaperumal, Rajaram; Luo, Ruibang; Lee, Tin-Lap; Lam, Tak-wah; Edmunds, Scott C.; Sansone, Susanna-Assunta

    2015-01-01

    Motivation Reproducing the results from a scientific paper can be challenging due to the absence of data and the computational tools required for their analysis. In addition, details relating to the procedures used to obtain the published results can be difficult to discern due to the use of natural language when reporting how experiments have been performed. The Investigation/Study/Assay (ISA), Nanopublications (NP), and Research Objects (RO) models are conceptual data modelling frameworks that can structure such information from scientific papers. Computational workflow platforms can also be used to reproduce analyses of data in a principled manner. We assessed the extent by which ISA, NP, and RO models, together with the Galaxy workflow system, can capture the experimental processes and reproduce the findings of a previously published paper reporting on the development of SOAPdenovo2, a de novo genome assembler. Results Executable workflows were developed using Galaxy, which reproduced results that were consistent with the published findings. A structured representation of the information in the SOAPdenovo2 paper was produced by combining the use of ISA, NP, and RO models. By structuring the information in the published paper using these data and scientific workflow modelling frameworks, it was possible to explicitly declare elements of experimental design, variables, and findings. The models served as guides in the curation of scientific information and this led to the identification of inconsistencies in the original published paper, thereby allowing its authors to publish corrections in the form of an errata. Availability SOAPdenovo2 scripts, data, and results are available through the GigaScience Database: http://dx.doi.org/10.5524/100044; the workflows are available from GigaGalaxy: http://galaxy.cbiit.cuhk.edu.hk; and the representations using the ISA, NP, and RO models are available through the SOAPdenovo2 case study website http

  16. Scientific Data Management Center for Enabling Technologies

    SciTech Connect

    Vouk, Mladen A.

    2013-01-15

    Managing scientific data has been identified by the scientific community as one of the most important emerging needs because of the sheer volume and increasing complexity of data being collected. Effectively generating, managing, and analyzing this information requires a comprehensive, end-to-end approach to data management that encompasses all of the stages from the initial data acquisition to the final analysis of the data. Fortunately, the data management problems encountered by most scientific domains are common enough to be addressed through shared technology solutions. Based on community input, we have identified three significant requirements. First, more efficient access to storage systems is needed. In particular, parallel file system and I/O system improvements are needed to write and read large volumes of data without slowing a simulation, analysis, or visualization engine. These processes are complicated by the fact that scientific data are structured differently for specific application domains, and are stored in specialized file formats. Second, scientists require technologies to facilitate better understanding of their data, in particular the ability to effectively perform complex data analysis and searches over extremely large data sets. Specialized feature discovery and statistical analysis techniques are needed before the data can be understood or visualized. Furthermore, interactive analysis requires techniques for efficiently selecting subsets of the data. Finally, generating the data, collecting and storing the results, keeping track of data provenance, data post-processing, and analysis of results is a tedious, fragmented process. Tools for automation of this process in a robust, tractable, and recoverable fashion are required to enhance scientific exploration. The SDM center was established under the SciDAC program to address these issues. The SciDAC-1 Scientific Data Management (SDM) Center succeeded in bringing an initial set of advanced

  17. HoloVir: A Workflow for Investigating the Diversity and Function of Viruses in Invertebrate Holobionts.

    PubMed

    Laffy, Patrick W; Wood-Charlson, Elisha M; Turaev, Dmitrij; Weynberg, Karen D; Botté, Emmanuelle S; van Oppen, Madeleine J H; Webster, Nicole S; Rattei, Thomas

    2016-01-01

    Abundant bioinformatics resources are available for the study of complex microbial metagenomes; however, their utility in viral metagenomics is limited. HoloVir is a robust and flexible data analysis pipeline that provides an optimized and validated workflow for taxonomic and functional characterization of viral metagenomes derived from invertebrate holobionts. Simulated viral metagenomes comprising varying levels of viral diversity and abundance were used to determine the optimal assembly and gene prediction strategy, and multiple sequence assembly methods and gene prediction tools were tested in order to optimize our analysis workflow. HoloVir performs pairwise comparisons of single read and predicted gene datasets against the viral RefSeq database to assign taxonomy, and additional comparison to phage-specific and cellular markers is undertaken to support the taxonomic assignments and identify potential cellular contamination. Broad functional classification of the predicted genes is provided by assignment of COG microbial functional category classifications using EggNOG, and higher resolution functional analysis is achieved by searching for enrichment of specific Swiss-Prot keywords within the viral metagenome. Application of HoloVir to viral metagenomes from the coral Pocillopora damicornis and the sponge Rhopaloeides odorabile demonstrated that HoloVir provides a valuable tool to characterize holobiont viral communities across species, environments, or experiments. PMID:27375564

  18. HoloVir: A Workflow for Investigating the Diversity and Function of Viruses in Invertebrate Holobionts

    PubMed Central

    Laffy, Patrick W.; Wood-Charlson, Elisha M.; Turaev, Dmitrij; Weynberg, Karen D.; Botté, Emmanuelle S.; van Oppen, Madeleine J. H.; Webster, Nicole S.; Rattei, Thomas

    2016-01-01

    Abundant bioinformatics resources are available for the study of complex microbial metagenomes; however, their utility in viral metagenomics is limited. HoloVir is a robust and flexible data analysis pipeline that provides an optimized and validated workflow for taxonomic and functional characterization of viral metagenomes derived from invertebrate holobionts. Simulated viral metagenomes comprising varying levels of viral diversity and abundance were used to determine the optimal assembly and gene prediction strategy, and multiple sequence assembly methods and gene prediction tools were tested in order to optimize our analysis workflow. HoloVir performs pairwise comparisons of single read and predicted gene datasets against the viral RefSeq database to assign taxonomy, and additional comparison to phage-specific and cellular markers is undertaken to support the taxonomic assignments and identify potential cellular contamination. Broad functional classification of the predicted genes is provided by assignment of COG microbial functional category classifications using EggNOG, and higher resolution functional analysis is achieved by searching for enrichment of specific Swiss-Prot keywords within the viral metagenome. Application of HoloVir to viral metagenomes from the coral Pocillopora damicornis and the sponge Rhopaloeides odorabile demonstrated that HoloVir provides a valuable tool to characterize holobiont viral communities across species, environments, or experiments. PMID:27375564

  19. A workflow to preserve genome-quality tissue samples from plants in botanical gardens and arboreta

    PubMed Central

    Gostel, Morgan R.; Kelloff, Carol; Wallick, Kyle; Funk, Vicki A.

    2016-01-01

    Premise of the study: Internationally, gardens hold diverse living collections that can be preserved for genomic research. Workflows have been developed for genomic tissue sampling in other taxa (e.g., vertebrates), but are inadequate for plants. We outline a workflow for tissue sampling intended for two audiences: botanists interested in genomics research and garden staff who plan to voucher living collections. Methods and Results: Standard herbarium methods are used to collect vouchers, label information and images are entered into a publicly accessible database, and leaf tissue is preserved in silica and liquid nitrogen. A five-step approach for genomic tissue sampling is presented for sampling from living collections according to current best practices. Conclusions: Collecting genome-quality samples from gardens is an economical and rapid way to make available for scientific research tissue from the diversity of plants on Earth. The Global Genome Initiative will facilitate and lead this endeavor through international partnerships. PMID:27672517

  20. Integration of Earth System Models and Workflow Management under iRODS for the Northeast Regional Earth System Modeling Project

    NASA Astrophysics Data System (ADS)

    Lengyel, F.; Yang, P.; Rosenzweig, B.; Vorosmarty, C. J.

    2012-12-01

    The Northeast Regional Earth System Model (NE-RESM, NSF Award #1049181) integrates weather research and forecasting models, terrestrial and aquatic ecosystem models, a water balance/transport model, and mesoscale and energy systems input-output economic models developed by an interdisciplinary research team from academia and government with expertise in physics, biogeochemistry, engineering, energy, economics, and policy. NE-RESM is intended to forecast the implications of planning decisions on the region's environment, ecosystem services, energy systems and economy through the 21st century. Integration of model components and the development of cyberinfrastructure for interacting with the system are facilitated by the integrated Rule Oriented Data System (iRODS), a distributed data grid that provides archival storage with metadata facilities and a rule-based workflow engine for automating and auditing scientific workflows.

  1. A workflow to preserve genome-quality tissue samples from plants in botanical gardens and arboreta

    PubMed Central

    Gostel, Morgan R.; Kelloff, Carol; Wallick, Kyle; Funk, Vicki A.

    2016-01-01

    Premise of the study: Internationally, gardens hold diverse living collections that can be preserved for genomic research. Workflows have been developed for genomic tissue sampling in other taxa (e.g., vertebrates), but are inadequate for plants. We outline a workflow for tissue sampling intended for two audiences: botanists interested in genomics research and garden staff who plan to voucher living collections. Methods and Results: Standard herbarium methods are used to collect vouchers, label information and images are entered into a publicly accessible database, and leaf tissue is preserved in silica and liquid nitrogen. A five-step approach for genomic tissue sampling is presented for sampling from living collections according to current best practices. Conclusions: Collecting genome-quality samples from gardens is an economical and rapid way to make available for scientific research tissue from the diversity of plants on Earth. The Global Genome Initiative will facilitate and lead this endeavor through international partnerships.

  2. Workflow4Metabolomics: a collaborative research infrastructure for computational metabolomics

    PubMed Central

    Giacomoni, Franck; Le Corguillé, Gildas; Monsoor, Misharl; Landi, Marion; Pericard, Pierre; Pétéra, Mélanie; Duperier, Christophe; Tremblay-Franco, Marie; Martin, Jean-François; Jacob, Daniel; Goulitquer, Sophie; Thévenot, Etienne A.; Caron, Christophe

    2015-01-01

    Summary: The complex, rapidly evolving field of computational metabolomics calls for collaborative infrastructures where the large volume of new algorithms for data pre-processing, statistical analysis and annotation can be readily integrated whatever the language, evaluated on reference datasets and chained to build ad hoc workflows for users. We have developed Workflow4Metabolomics (W4M), the first fully open-source and collaborative online platform for computational metabolomics. W4M is a virtual research environment built upon the Galaxy web-based platform technology. It enables ergonomic integration, exchange and running of individual modules and workflows. Alternatively, the whole W4M framework and computational tools can be downloaded as a virtual machine for local installation. Availability and implementation: http://workflow4metabolomics.org homepage enables users to open a private account and access the infrastructure. W4M is developed and maintained by the French Bioinformatics Institute (IFB) and the French Metabolomics and Fluxomics Infrastructure (MetaboHUB). Contact: contact@workflow4metabolomics.org PMID:25527831

  3. Optimizing high performance computing workflow for protein functional annotation.

    PubMed

    Stanberry, Larissa; Rekepalli, Bhanu; Liu, Yuan; Giblock, Paul; Higdon, Roger; Montague, Elizabeth; Broomall, William; Kolker, Natali; Kolker, Eugene

    2014-09-10

    Functional annotation of newly sequenced genomes is one of the major challenges in modern biology. With modern sequencing technologies, the protein sequence universe is rapidly expanding. Newly sequenced bacterial genomes alone contain over 7.5 million proteins. The rate of data generation has far surpassed that of protein annotation. The volume of protein data makes manual curation infeasible, whereas a high compute cost limits the utility of existing automated approaches. In this work, we present an improved and optimized automated workflow to enable large-scale protein annotation. The workflow uses high performance computing architectures and a low complexity classification algorithm to assign proteins into existing clusters of orthologous groups of proteins. On the basis of the Position-Specific Iterative Basic Local Alignment Search Tool, the algorithm ensures at least 80% specificity and sensitivity of the resulting classifications. The workflow utilizes highly scalable parallel applications for classification and sequence alignment. Using Extreme Science and Engineering Discovery Environment supercomputers, the workflow processed 1,200,000 newly sequenced bacterial proteins. With the rapid expansion of the protein sequence universe, the proposed workflow will enable scientists to annotate big genome data. PMID:25313296

  4. Experiences with workflows for automating data-intensive bioinformatics.

    PubMed

    Spjuth, Ola; Bongcam-Rudloff, Erik; Hernández, Guillermo Carrasco; Forer, Lukas; Giovacchini, Mario; Guimera, Roman Valls; Kallio, Aleksi; Korpelainen, Eija; Kańduła, Maciej M; Krachunov, Milko; Kreil, David P; Kulev, Ognyan; Łabaj, Paweł P; Lampa, Samuel; Pireddu, Luca; Schönherr, Sebastian; Siretskiy, Alexey; Vassilev, Dimitar

    2015-01-01

    High-throughput technologies, such as next-generation sequencing, have turned molecular biology into a data-intensive discipline, requiring bioinformaticians to use high-performance computing resources and carry out data management and analysis tasks on a large scale. Workflow systems can be useful to simplify construction of analysis pipelines that automate tasks, support reproducibility and provide measures for fault-tolerance. However, workflow systems can incur significant development and administration overhead, so bioinformatics pipelines are often still built without them. We present the experiences with workflows and workflow systems within the bioinformatics community participating in a series of hackathons and workshops of the EU COST action SeqAhead. The organizations are working on similar problems, but we have addressed them with different strategies and solutions. This fragmentation of efforts is inefficient and leads to redundant and incompatible solutions. Based on our experiences, we define a set of recommendations for future systems to enable efficient yet simple bioinformatics workflow construction and execution. PMID:26282399

  5. Optimizing high performance computing workflow for protein functional annotation.

    PubMed

    Stanberry, Larissa; Rekepalli, Bhanu; Liu, Yuan; Giblock, Paul; Higdon, Roger; Montague, Elizabeth; Broomall, William; Kolker, Natali; Kolker, Eugene

    2014-09-10

    Functional annotation of newly sequenced genomes is one of the major challenges in modern biology. With modern sequencing technologies, the protein sequence universe is rapidly expanding. Newly sequenced bacterial genomes alone contain over 7.5 million proteins. The rate of data generation has far surpassed that of protein annotation. The volume of protein data makes manual curation infeasible, whereas a high compute cost limits the utility of existing automated approaches. In this work, we present an improved and optimized automated workflow to enable large-scale protein annotation. The workflow uses high performance computing architectures and a low complexity classification algorithm to assign proteins into existing clusters of orthologous groups of proteins. On the basis of the Position-Specific Iterative Basic Local Alignment Search Tool, the algorithm ensures at least 80% specificity and sensitivity of the resulting classifications. The workflow utilizes highly scalable parallel applications for classification and sequence alignment. Using Extreme Science and Engineering Discovery Environment supercomputers, the workflow processed 1,200,000 newly sequenced bacterial proteins. With the rapid expansion of the protein sequence universe, the proposed workflow will enable scientists to annotate big genome data.

  6. CamBAfx: Workflow Design, Implementation and Application for Neuroimaging.

    PubMed

    Ooi, Cinly; Bullmore, Edward T; Wink, Alle-Meije; Sendur, Levent; Barnes, Anna; Achard, Sophie; Aspden, John; Abbott, Sanja; Yue, Shigang; Kitzbichler, Manfred; Meunier, David; Maxim, Voichita; Salvador, Raymond; Henty, Julian; Tait, Roger; Subramaniam, Naresh; Suckling, John

    2009-01-01

    CamBAfx is a workflow application designed for both researchers who use workflows to process data (consumers) and those who design them (designers). It provides a front-end (user interface) optimized for data processing designed in a way familiar to consumers. The back-end uses a pipeline model to represent workflows since this is a common and useful metaphor used by designers and is easy to manipulate compared to other representations like programming scripts. As an Eclipse Rich Client Platform application, CamBAfx's pipelines and functions can be bundled with the software or downloaded post-installation. The user interface contains all the workflow facilities expected by consumers. Using the Eclipse Extension Mechanism designers are encouraged to customize CamBAfx for their own pipelines. CamBAfx wraps a workflow facility around neuroinformatics software without modification. CamBAfx's design, licensing and Eclipse Branding Mechanism allow it to be used as the user interface for other software, facilitating exchange of innovative computational tools between originating labs.

  7. Development of the workflow line systems for support on KAIZEN.

    PubMed

    Mizuno, Yuki; Ito, Toshihiko; Yoshikawa, Toru; Yomogida, Satoshi; Morio, Koji; Sakai, Kazuhiro

    2012-01-01

    In this paper, we introduce a new workflow line system consisting of location and image recording, which enables the acquisition of workflow information and the display of its analysis. From the results of the workflow line investigation, we considered the anticipated effects and the problems for KAIZEN. Workflow line information included location information and information on action contents. These technologies suggest viewpoints that help improvement, for example, elimination of useless movement, redesign of the layout, and review of work procedures. In the manufacturing factory, it was clear that there was much movement away from the standard operation place and much accumulated residence time. As a concrete result of this investigation, the system suggested a more efficient layout. Similarly, in the case of the hospital, the system pointed out that the workflow had problems of layout and setup operations, based on the effective movement patterns of experts. The system can be applied to routine as well as non-routine work. By developing this system, which can fit and adapt to industrial diversification, more effective "visual management" (visualization of work) is expected in the future.

  8. CamBAfx: Workflow Design, Implementation and Application for Neuroimaging

    PubMed Central

    Ooi, Cinly; Bullmore, Edward T.; Wink, Alle-Meije; Sendur, Levent; Barnes, Anna; Achard, Sophie; Aspden, John; Abbott, Sanja; Yue, Shigang; Kitzbichler, Manfred; Meunier, David; Maxim, Voichita; Salvador, Raymond; Henty, Julian; Tait, Roger; Subramaniam, Naresh; Suckling, John

    2009-01-01

    CamBAfx is a workflow application designed for both researchers who use workflows to process data (consumers) and those who design them (designers). It provides a front-end (user interface) optimized for data processing designed in a way familiar to consumers. The back-end uses a pipeline model to represent workflows since this is a common and useful metaphor used by designers and is easy to manipulate compared to other representations like programming scripts. As an Eclipse Rich Client Platform application, CamBAfx's pipelines and functions can be bundled with the software or downloaded post-installation. The user interface contains all the workflow facilities expected by consumers. Using the Eclipse Extension Mechanism designers are encouraged to customize CamBAfx for their own pipelines. CamBAfx wraps a workflow facility around neuroinformatics software without modification. CamBAfx's design, licensing and Eclipse Branding Mechanism allow it to be used as the user interface for other software, facilitating exchange of innovative computational tools between originating labs. PMID:19826470

  9. Low Latency Workflow Scheduling and an Application of Hyperspectral Brightness Temperatures

    NASA Astrophysics Data System (ADS)

    Nguyen, P. T.; Chapman, D. R.; Halem, M.

    2012-12-01

    New system analytics for Big Data computing holds the promise of major scientific breakthroughs and discoveries from the exploration and mining of the massive data sets becoming available to the science community. However, such data intensive scientific applications face severe challenges in accessing, managing and analyzing petabytes of data. While the Hadoop MapReduce environment has been successfully applied to data intensive problems arising in business, there are still many scientific problem domains where limitations in the functionality of MapReduce systems prevent its wide adoption by those communities. This is mainly because MapReduce does not readily support the unique science discipline needs such as special science data formats, graphic and computational data analysis tools, maintaining high degrees of computational accuracies, and interfacing with application's existing components across heterogeneous computing processors. We address some of these limitations by exploiting the MapReduce programming model for satellite data intensive scientific problems and address scalability, reliability, scheduling, and data management issues when dealing with climate data records and their complex observational challenges. In addition, we will present techniques to support the unique Earth science discipline needs such as dealing with special science data formats (HDF and NetCDF). We have developed a Hadoop task scheduling algorithm that improves latency by 2x for a scientific workflow including the gridding of the EOS AIRS hyperspectral Brightness Temperatures (BT). This workflow processing algorithm has been tested at the Multicore Computing Center private Hadoop based Intel Nehalem cluster, as well as in a virtual mode under the Open Source Eucalyptus cloud. The 55TB AIRS hyperspectral L1b Brightness Temperature record has been gridded at the resolution of 0.5x1.0 degrees, and we have computed a 0.9 annual anti-correlation to the El Nino Southern oscillation in

  10. The workflow of single-cell expression profiling using quantitative real-time PCR

    PubMed Central

    Ståhlberg, Anders; Kubista, Mikael

    2014-01-01

    Biological material is heterogeneous and when exposed to stimuli the various cells present respond differently. Much of the complexity can be eliminated by disintegrating the sample, studying the cells one by one. Single-cell profiling reveals responses that go unnoticed when classical samples are studied. New cell types and cell subtypes may be found and relevant pathways and expression networks can be identified. The most powerful technique for single-cell expression profiling is currently quantitative reverse transcription real-time PCR (RT-qPCR). A robust RT-qPCR workflow for highly sensitive and specific measurements in high-throughput and a reasonable degree of multiplexing has been developed for targeting mRNAs, but also microRNAs, non-coding RNAs and most recently also proteins. We review the current state of the art of single-cell expression profiling and present also the improvements and developments expected in the next 5 years. PMID:24649819

  11. ESO Reflex: A Graphical Workflow Engine for Astronomical Data Reduction

    NASA Astrophysics Data System (ADS)

    Hook, Richard; Romaniello, Martino; Ullgrén, Marko; Maisala, Sami; Solin, Otto; Oittinen, Tero; Savolainen, Villa; Järveläinen, Pekka; Tyynelä, Jani; Péron, Michèle; Izzo, Carlo; Ballester, Pascal; Gabasch, Armin

    2008-03-01

    ESO Reflex is a software tool that provides a novel approach to astronomical data reduction. The reduction sequence is rendered and controlled as a graphical workflow. Users can follow and interact with the processing in an intuitive manner, without the need for complex scripting. The graphical interface also allows the modification of existing workflows and the creation of new ones. ESO Reflex can invoke standard ESO data reduction recipes in a flexible way. Python scripts, IDL procedures and shell commands can also be easily brought into workflows and a variety of visualisation and display options, including custom product inspection and validation steps, are available. ESO Reflex was developed in the context of the Sampo project, a three-year effort led by ESO and conducted by a software development team from Finland as an in-kind contribution to joining ESO. It is planned that the software will be released to the community in late 2008.

  12. OWL: A Condor Based Workflow Management System for JWST

    NASA Astrophysics Data System (ADS)

    Pierfederici, F.; Swam, M.; Greene, G.; Kyprianou, M.; Gaffney, N.

    2012-09-01

    The Open Workflow Layer (OWL) is an open source Workflow Management System (WMS) developed at the Space Telescope Science Institute. OWL is being designed for the James Webb Space Telescope (JWST) science data processing using the Hubble Space Telescope (HST) as a test bed. It is however very general and could be applied to many other missions and data processing applications. OWL is a thin Python layer that provides advanced workflow management, GUIs and a data-centric view on top of Condor, a widely used open source batch scheduling system. As such, OWL can transparently take advantage of the many features offered by Condor without having to re-implement them from scratch.

  13. Building an efficient curation workflow for the Arabidopsis literature corpus.

    PubMed

    Li, Donghui; Berardini, Tanya Z; Muller, Robert J; Huala, Eva

    2012-01-01

    TAIR (The Arabidopsis Information Resource) is the model organism database (MOD) for Arabidopsis thaliana, a model plant with a literature corpus of about 39 000 articles in PubMed, with over 4300 new articles added in 2011. We have developed a literature curation workflow incorporating both automated and manual elements to cope with this flood of new research articles. The current workflow can be divided into two phases: article selection and curation. Structured controlled vocabularies, such as the Gene Ontology and Plant Ontology are used to capture free text information in the literature as succinct ontology-based annotations suitable for the application of computational analysis methods. We also describe our curation platform and the use of text mining tools in our workflow. Database URL: www.arabidopsis.org PMID:23221298

  14. A Workflow for Studying Specialized Metabolism in Nonmodel Eukaryotic Organisms.

    PubMed

    Torrens-Spence, M P; Fallon, T R; Weng, J K

    2016-01-01

    Eukaryotes contain a diverse tapestry of specialized metabolites, many of which are of significant pharmaceutical and industrial importance to humans. Nevertheless, exploration of specialized metabolic pathways underlying specific chemical traits in nonmodel eukaryotic organisms has been technically challenging and has historically lagged behind that of the bacterial systems. Recent advances in genomics, metabolomics, phylogenomics, and synthetic biology now enable a new workflow for interrogating unknown specialized metabolic systems in nonmodel eukaryotic hosts with greater efficiency and mechanistic depth. This chapter delineates such a workflow by providing a collection of state-of-the-art approaches and tools, ranging from multiomics-guided candidate gene identification to in vitro and in vivo functional and structural characterization of specialized metabolic enzymes. As already demonstrated by several recent studies, this new workflow opens up a gateway into the largely untapped world of natural product biochemistry in eukaryotes. PMID:27480683

  15. Nexus: a modular workflow management system for quantum simulation codes

    DOE PAGESBeta

    Krogel, Jaron T.

    2015-08-24

    The management of simulation workflows is a significant task for the individual computational researcher. Automation of the required tasks involved in simulation work can decrease the overall time to solution and reduce sources of human error. A new simulation workflow management system, Nexus, is presented to address these issues. Nexus is capable of automated job management on workstations and resources at several major supercomputing centers. Its modular design allows many quantum simulation codes to be supported within the same framework. Current support includes quantum Monte Carlo calculations with QMCPACK, density functional theory calculations with Quantum Espresso or VASP, and quantum chemical calculations with GAMESS. Users can compose workflows through a transparent, text-based interface, resembling the input file of a typical simulation code. A usage example is provided to illustrate the process.

  16. Task Delegation Based Access Control Models for Workflow Systems

    NASA Astrophysics Data System (ADS)

    Gaaloul, Khaled; Charoy, François

    e-Government organisations are facilitated and conducted using workflow management systems. Role-based access control (RBAC) is recognised as an efficient access control model for large organisations. The application of RBAC in workflow systems cannot, however, grant permissions to users dynamically while business processes are being executed. We currently observe a move away from predefined strict workflow modelling towards approaches supporting flexibility on the organisational level. One specific approach is that of task delegation. Task delegation is a mechanism that supports organisational flexibility, and ensures delegation of authority in access control systems. In this paper, we propose a Task-oriented Access Control (TAC) model based on RBAC to address these requirements. We aim to reason about task from organisational perspectives and resources perspectives to analyse and specify authorisation constraints. Moreover, we present a fine grained access control protocol to support delegation based on the TAC model.
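
    The idea of layering task-scoped delegation on top of role-based permissions can be illustrated with a minimal sketch. This is not the TAC model or protocol from the paper; the class name `AccessControl`, its methods, and the example roles and permissions are all hypothetical and chosen only for illustration. It assumes a delegation is valid only if the delegator holds the permission through a role, and that a delegated permission applies to a single task instance until revoked.

```python
from dataclasses import dataclass, field


@dataclass
class AccessControl:
    """Hypothetical RBAC store extended with per-task delegations."""
    role_permissions: dict[str, set[str]]   # role -> permissions
    user_roles: dict[str, set[str]]         # user -> roles
    # (delegatee, task_id) -> permissions granted for that task only
    delegations: dict[tuple[str, str], set[str]] = field(default_factory=dict)

    def role_allows(self, user: str, permission: str) -> bool:
        """Static RBAC check: does any of the user's roles carry the permission?"""
        return any(
            permission in self.role_permissions.get(role, set())
            for role in self.user_roles.get(user, set())
        )

    def delegate(self, delegator: str, delegatee: str,
                 task_id: str, permission: str) -> None:
        """Grant a permission for one task; the delegator must hold it by role."""
        if not self.role_allows(delegator, permission):
            raise PermissionError(f"{delegator} cannot delegate {permission}")
        self.delegations.setdefault((delegatee, task_id), set()).add(permission)

    def revoke(self, delegatee: str, task_id: str) -> None:
        """Remove every permission delegated to the user for this task."""
        self.delegations.pop((delegatee, task_id), None)

    def is_allowed(self, user: str, task_id: str, permission: str) -> bool:
        """Allow if granted by role, or delegated for this specific task."""
        if self.role_allows(user, permission):
            return True
        return permission in self.delegations.get((user, task_id), set())


ac = AccessControl(
    role_permissions={"officer": {"approve_request"}},
    user_roles={"alice": {"officer"}, "bob": {"clerk"}},
)
ac.delegate("alice", "bob", "task-1", "approve_request")
```

    In this sketch the static role check and the dynamic, task-scoped grant are kept in separate tables, so revoking a delegation never touches the role assignments.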

  17. Nexus: a modular workflow management system for quantum simulation codes

    SciTech Connect

    Krogel, Jaron T.

    2015-08-24

    The management of simulation workflows is a significant task for the individual computational researcher. Automation of the required tasks involved in simulation work can decrease the overall time to solution and reduce sources of human error. A new simulation workflow management system, Nexus, is presented to address these issues. Nexus is capable of automated job management on workstations and resources at several major supercomputing centers. Its modular design allows many quantum simulation codes to be supported within the same framework. Current support includes quantum Monte Carlo calculations with QMCPACK, density functional theory calculations with Quantum Espresso or VASP, and quantum chemical calculations with GAMESS. Users can compose workflows through a transparent, text-based interface, resembling the input file of a typical simulation code. A usage example is provided to illustrate the process.

  18. Nexus: A modular workflow management system for quantum simulation codes

    NASA Astrophysics Data System (ADS)

    Krogel, Jaron T.

    2016-01-01

    The management of simulation workflows represents a significant task for the individual computational researcher. Automation of the required tasks involved in simulation work can decrease the overall time to solution and reduce sources of human error. A new simulation workflow management system, Nexus, is presented to address these issues. Nexus is capable of automated job management on workstations and resources at several major supercomputing centers. Its modular design allows many quantum simulation codes to be supported within the same framework. Current support includes quantum Monte Carlo calculations with QMCPACK, density functional theory calculations with Quantum Espresso or VASP, and quantum chemical calculations with GAMESS. Users can compose workflows through a transparent, text-based interface, resembling the input file of a typical simulation code. A usage example is provided to illustrate the process.

  19. A Comprehensive Workflow of Mass Spectrometry-Based Untargeted Metabolomics in Cancer Metabolic Biomarker Discovery Using Human Plasma and Urine

    PubMed Central

    Zou, Wei; She, Jianwen; Tolstikov, Vladimir V.

    2013-01-01

    Currently available biomarkers lack sensitivity and/or specificity for early detection of cancer. To address this challenge, a robust and complete workflow for metabolic profiling and data mining is described in detail. Three independent and complementary analytical techniques for metabolic profiling are applied: hydrophilic interaction liquid chromatography (HILIC–LC), reversed-phase liquid chromatography (RP–LC), and gas chromatography (GC). All three techniques are coupled to a mass spectrometer (MS) in the full scan acquisition mode, and both unsupervised and supervised methods are used for data mining. Univariate and multivariate feature selection are used to determine subsets of potentially discriminative predictors. These predictors are further identified by obtaining accurate masses and isotopic ratios using selected ion monitoring (SIM) and data-dependent MS/MS and/or accurate mass MSn ion tree scans utilizing high resolution MS. A list combining all of the identified potential biomarkers generated from different platforms and algorithms is used for pathway analysis. Such a workflow combining comprehensive metabolic profiling and advanced data mining techniques may provide a powerful approach for metabolic pathway analysis and biomarker discovery in cancer research. Two case studies with previously published data are adapted and included to illustrate the application of the workflow. PMID:24958150

  20. Data processing workflows from low-cost digital survey to various applications: three case studies of Chinese historic architecture

    NASA Astrophysics Data System (ADS)

    Sun, Z.; Cao, Y. K.

    2015-08-01

    The paper focuses on the versatility of data processing workflows ranging from BIM-based survey to structural analysis and reverse modeling. In China nowadays, a large number of historic buildings are in need of restoration, reinforcement and renovation. But the architects are not prepared for the conversion from the booming AEC industry to architectural preservation. As surveyors working with architects in such projects, we have to develop an efficient, low-cost digital survey workflow that is robust to various types of architecture, and to process the captured data for architects. Although laser scanning yields high accuracy in architectural heritage documentation and the workflow is quite straightforward, its cost and limited portability hinder its use in projects where budget and efficiency are of prime concern. We integrate Structure from Motion techniques with UAV and total station in data acquisition. The captured data is processed for various purposes illustrated with three case studies: the first one is as-built BIM for a historic building based on registered point clouds according to Ground Control Points; the second one concerns structural analysis for a damaged bridge using Finite Element Analysis software; the last one relates to parametric automated feature extraction from captured point clouds for reverse modeling and fabrication.

  1. Ontology-Driven Discovery of Scientific Computational Entities

    ERIC Educational Resources Information Center

    Brazier, Pearl W.

    2010-01-01

    Many geoscientists use modern computational resources, such as software applications, Web services, scientific workflows and datasets that are readily available on the Internet, to support their research and many common tasks. These resources are often shared via human contact and sometimes stored in data portals; however, they are not necessarily…

  2. From shared data to sharing workflow: merging PACS and teleradiology.

    PubMed

    Benjamin, Menashe; Aradi, Yinon; Shreiber, Reuven

    2010-01-01

    Due to a host of technological, interface, operational and workflow limitations, teleradiology and PACS/RIS were historically developed as separate systems serving different purposes. PACS/RIS handled local radiology storage and workflow management while teleradiology addressed remote access to images. Today advanced PACS/RIS support complete site radiology workflow for attending physicians, whether on-site or remote. In parallel, teleradiology has evolved into a service providing remote, off-hours coverage for emergency radiology and, to a lesser extent, subspecialty reading to subscribing sites and radiology groups. When attending radiologists use teleradiology for remote access to a site, they may share all relevant patient data and participate in the site's workflow like their on-site peers. The operation gets cumbersome and time consuming when these radiologists serve multiple sites, each requiring a different remote access, or when the sites do not employ the same PACS/RIS/Reporting Systems and do not share the same ownership. The least efficient operation is of teleradiology companies engaged in reading for multiple facilities. As these services typically employ non-local radiologists, they are allowed to share some of the available patient data necessary to provide an emergency report but, by and large, they do not share the workflow of the sites they serve. Radiology stakeholders usually prefer to have their own radiologists perform all radiology tasks including interpretation of off-hour examinations. It is possible with current technology to create a system that combines the benefits of local radiology services to multiple sites with the advantages offered by adding subspecialty and off-hours emergency services through teleradiology. Such a system increases efficiency for the radiology groups by enabling all users, regardless of location, to work "local" and fully participate in the workflow of every site. We refer to such a system as SuperPACS. PMID

  3. Fast robust correlation.

    PubMed

    Fitch, Alistair J; Kadyrov, Alexander; Christmas, William J; Kittler, Josef

    2005-08-01

    A new, fast, statistically robust, exhaustive, translational image-matching technique is presented: fast robust correlation. Existing methods are either slow or non-robust, or rely on optimization. Fast robust correlation works by expressing a robust matching surface as a series of correlations. Speed is obtained by computing correlations in the frequency domain. Computational cost is analyzed and the method is shown to be fast. Speed is comparable to conventional correlation and, for large images, thousands of times faster than direct robust matching. Three experiments demonstrate the advantage of the technique over standard correlation.
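
    The speed-up from computing correlations in the frequency domain can be illustrated with a minimal NumPy sketch. It shows only plain (non-robust) translational correlation via the correlation theorem, which is the building block the abstract refers to; it is not the authors' robust method, which expresses a robust matching surface as a series of such correlations. The function names `correlate_fft` and `best_shift` are hypothetical, and circular (wrap-around) boundaries are assumed.

```python
import numpy as np


def correlate_fft(image, template):
    """Circular cross-correlation of `template` against `image` via FFT.

    Entry (dy, dx) of the result is the sum over the template of
    image[y + dy, x + dx] * template[y, x], with wrap-around indexing.
    """
    h, w = image.shape
    th, tw = template.shape
    # Zero-pad the template to the image size so both spectra align.
    padded = np.zeros((h, w), dtype=float)
    padded[:th, :tw] = template
    f_image = np.fft.rfft2(image)
    f_template = np.fft.rfft2(padded)
    # Correlation theorem: corr = IFFT(F(image) * conj(F(template))).
    return np.fft.irfft2(f_image * np.conj(f_template), s=(h, w))


def best_shift(image, template):
    """Return the (row, column) translation with the highest correlation."""
    surface = correlate_fft(image, template)
    dy, dx = np.unravel_index(np.argmax(surface), surface.shape)
    return int(dy), int(dx)


# Usage: place a random patch in an otherwise empty image and recover its offset.
rng = np.random.default_rng(0)
template = rng.random((4, 4))
image = np.zeros((16, 16))
image[5:9, 7:11] = template
shift = best_shift(image, template)
```

    Evaluating every translation directly costs on the order of the image size times the template size, whereas the FFT route costs on the order of N log N for an image of N pixels, which is the source of the speed the abstract describes.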

  4. Linking Geobiology Fieldwork and Data Curation Through Workflow Documentation

    NASA Astrophysics Data System (ADS)

    Thomer, A.; Baker, K. S.; Jett, J. G.; Gordon, S.; Palmer, C. L.

    2014-12-01

    Describing the specific processes and artifacts that lead to the creation of data products provides a detailed picture of data provenance in the form of a high-level workflow. The resulting diagram identifies: 1. "points of intervention" at which curation processes can be moved upstream, and 2. data products that may be important for sharing and preservation. The Site-Based Data Curation project, an Institute of Museum and Library Services-funded project hosted by the Center for Informatics Research in Science and Scholarship at the University of Illinois, previously inferred a geobiologist's planning, field and laboratory workflows through close study of the data products produced during a single field trip to Yellowstone National Park (Wickett et al., 2013). We have since built on this work by documenting post hoc curation processes, and integrating them with the existing workflow. By holistically considering both data collection and curation, we are able to identify concrete steps that scientists can take to begin curating data in the field. This field-to-repository workflow represents a first step toward a more comprehensive and nuanced model of the research data lifecycle. Using our initial three-phase workflow, we identify key data products to prioritize for curation, and the points at which data curation best practices integrate with research processes with minimal interruption. We then document the processes that make key data products sharable and ready for preservation. We append the resulting curatorial phases to the field data collection workflow: Data Staging, Data Standardizing and Data Packaging. These refinements demonstrate: 1) the interdependence of research and curatorial phases; 2) the links between specific research products, research phases and curatorial processes; 3) the interdependence of laboratory-specific standards and community-wide best practices. We propose a poster that shows the six-phase workflow described above. We plan to discuss

  5. Flexible Early Warning Systems with Workflows and Decision Tables

    NASA Astrophysics Data System (ADS)

    Riedel, F.; Chaves, F.; Zeiner, H.

    2012-04-01

    An essential part of early warning systems and systems for crisis management are decision support systems that facilitate communication and collaboration. Often official policies specify how different organizations collaborate and what information is communicated to whom. For early warning systems it is crucial that information is exchanged dynamically in a timely manner and all participants get exactly the information they need to fulfil their role in the crisis management process. Information technology obviously lends itself to automate parts of the process. We have experienced however that in current operational systems the information logistics processes are hard-coded, even though they are subject to change. In addition, systems are tailored to the policies and requirements of a certain organization and changes can require major software refactoring. We seek to develop a system that can be deployed and adapted to multiple organizations with different dynamic runtime policies. A major requirement for such a system is that changes can be applied locally without affecting larger parts of the system. In addition to the flexibility regarding changes in policies and processes, the system needs to be able to evolve; when new information sources become available, it should be possible to integrate and use these in the decision process. In general, this kind of flexibility comes with a significant increase in complexity. This implies that only IT professionals can maintain a system that can be reconfigured and adapted; end-users are unable to utilise the provided flexibility. In the business world similar problems arise and previous work suggested using business process management systems (BPMS) or workflow management systems (WfMS) to guide and automate early warning processes or crisis management plans. However, the usability and flexibility of current WfMS are limited, because current notations and user interfaces are still not suitable for end-users, and workflows

  6. Workflow modeling in the graphic arts and printing industry

    NASA Astrophysics Data System (ADS)

    Tuijn, Chris

    2003-12-01

    Over the last few years, a lot of effort has been spent on the standardization of the workflow in the graphic arts and printing industry. The main reasons for this standardization are two-fold: first of all, the need to represent all aspects of products, processes and resources in a uniform, digital framework and, secondly, the need to have different systems communicate with each other without having to implement dedicated drivers or protocols. For many years, a number of organizations in the IT sector have been busy developing models and languages on the topic of workflow modeling. In addition to the more formal methods (e.g., extended finite state machines, Petri nets, Markov chains) introduced a number of decades ago, more pragmatic methods have been proposed quite recently. We think in particular of the activities of the Workflow Management Coalition that resulted in an XML-based Process Definition Language. Although one might be tempted to use the already established standards in the graphic environment, one should be well aware of the complexity and uniqueness of the graphic arts workflow. In this paper, we will show that it is quite hard, though not impossible, to model the graphic arts workflow using the already established workflow systems. After a brief summary of the graphic arts workflow requirements, we will show why the traditional models are less suitable to use. It will turn out that one of the main reasons for the incompatibility is that the graphic arts workflow is primarily resource driven; this means that the activation of processes depends on the status of different incoming resources. The fact that processes can start running with a partial availability of the input resources is a further complication that calls for additional knowledge at the process level. In the second part of this paper, we will discuss in more detail the different software components that are available in any graphic enterprise. In the last part, we will

  7. Contextual cloud-based service oriented architecture for clinical workflow.

    PubMed

    Moreno-Conde, Jesús; Moreno-Conde, Alberto; Núñez-Benjumea, Francisco J; Parra-Calderón, Carlos

    2015-01-01

    With regard to the acceptance of systems within the healthcare domain, multiple papers have highlighted the importance of integrating tools with the clinical workflow. This paper analyses how clinical context management could be deployed in order to promote the adoption of advanced cloud services within the clinical workflow. This deployment can be integrated with the specifications promoted by the eHealth European Interoperability Framework. The paper proposes a cloud-based service-oriented architecture that implements a context management system aligned with the HL7 standard known as CCOW.

  8. Flexible End2End Workflow Automation of Hit-Discovery Research.

    PubMed

    Holzmüller-Laue, Silke; Göde, Bernd; Thurow, Kerstin

    2014-08-01

    The article considers a new approach to more complex laboratory automation at the workflow layer. The authors propose the automation of end2end workflows. The combination of all relevant subprocesses, whether automated or manually performed and independent of the organizational unit in which they are carried out, results in end2end processes that include all result dependencies. The end2end approach focuses not only on the classical experiments in synthesis or screening, but also on auxiliary processes such as the production and storage of chemicals, cell culturing, and maintenance as well as preparatory activities and analyses of experiments. Furthermore, the connection of control flow and data flow in the same process model reduces the effort of data transfer between the involved systems, including the necessary data transformations. This end2end laboratory automation can be realized effectively with the modern methods of business process management (BPM). This approach is based on a new standardization of the process-modeling notation Business Process Model and Notation 2.0. In drug discovery, several scientific disciplines act together with manifold modern methods, technologies, and a wide range of automated instruments for the discovery and design of target-based drugs. The article discusses the novel BPM-based automation concept with an implemented example of a high-throughput screening of previously synthesized compound libraries.

  9. Gocad2OGS: Workflow to Integrate Geo-structural Information into Numerical Simulation Models

    NASA Astrophysics Data System (ADS)

    Fischer, Thomas; Walther, Marc; Naumov, Dmitri; Sattler, Sabine; Kolditz, Olaf

    2015-04-01

    The investigation of fluid circulation in the Thuringian syncline is one of the INFLUINS project's targets. A 3D geo-structural model including 12 stratigraphic layers and 54 fault zones is created by geologists in the first step using the Gocad software. Within the INFLUINS project a ground-water flow simulation is used to check existing hypotheses and to gain new ideas of the underground fluid flow behaviour. We used the scientific, platform independent, open source software OpenGeoSys that implements the finite element method to solve the governing equations describing fluid flow in porous media. The geo-structural Gocad model is not suitable for the FEM numerical analysis. Therefore it is converted into an unstructured grid satisfying all mesh quality criteria required for the ground-water flow simulation. The resulting grid is stored in an open data format given by the Visualization Toolkit (vtk). In this work we present a workflow to convert geological structural models, created using the Gocad software, into a simulation model that is easy to use in numerical simulation software. We tested our workflow with the 3D geo-structural model of the Thuringian syncline and were able to set up and evaluate a hydrogeological simulation model successfully.

  10. A workflow for the 3D visualization of meteorological data

    NASA Astrophysics Data System (ADS)

    Helbig, Carolin; Rink, Karsten

    2014-05-01

    In the future, climate change will strongly influence our environment and living conditions. To predict possible changes, climate models that include basic and process conditions have been developed and big data sets are produced as a result of simulations. The combination of various variables of climate models with spatial data from different sources helps to identify correlations and to study key processes. For our case study we use results of the weather research and forecasting (WRF) model of two regions at different scales that include various landscapes in Northern Central Europe and Baden-Württemberg. We visualize these simulation results in combination with observation data and geographic data, such as river networks, to evaluate processes and analyze whether the model represents the atmospheric system sufficiently. For this purpose, a continuous workflow that leads from the integration of heterogeneous raw data to visualization using open source software (e.g. OpenGeoSys Data Explorer, ParaView) is developed. These visualizations can be displayed on a desktop computer or in an interactive virtual reality environment. We established a concept that includes recommended 3D representations and a color scheme for the variables of the data based on existing guidelines and established traditions in the specific domain. To examine changes over time in observation and simulation data, we added the temporal dimension to the visualization. In a first step of the analysis, the visualizations are used to get an overview of the data and detect areas of interest such as regions of convection or wind turbulence. Then, subsets of data sets are extracted and the included variables can be examined in detail. An evaluation by experts from the domains of visualization and atmospheric sciences establishes whether the visualizations are self-explanatory and clearly arranged. These easy-to-understand visualizations of complex data sets are the basis for scientific communication. In addition, they have

  11. Robustness: confronting lessons from physics and biology.

    PubMed

    Lesne, Annick

    2008-11-01

    The term robustness is encountered in very different scientific fields, from engineering and control theory to dynamical systems to biology. The main question addressed herein is whether the notion of robustness and its correlates (stability, resilience, self-organisation) developed in physics are relevant to biology, or whether specific extensions and novel frameworks are required to account for the robustness properties of living systems. To clarify this issue, the different meanings covered by this unique term are discussed; it is argued that they crucially depend on the kind of perturbations that a robust system should by definition withstand. Possible mechanisms underlying robust behaviours are examined, either encountered in all natural systems (symmetries, conservation laws, dynamic stability) or specific to biological systems (feedbacks and regulatory networks). Special attention is devoted to the (sometimes counterintuitive) interrelations between robustness and noise. A distinction between dynamic selection and natural selection in the establishment of a robust behaviour is underlined. It is finally argued that nested notions of robustness, relevant to different time scales and different levels of organisation, allow one to reconcile the seemingly contradictory requirements for robustness and adaptability in living systems. PMID:18823391

  12. Approximate truncation robust computed tomography—ATRACT

    NASA Astrophysics Data System (ADS)

    Dennerlein, Frank; Maier, Andreas

    2013-09-01

    We present an approximate truncation robust algorithm to compute tomographic images (ATRACT). This algorithm targets the reconstruction of volumetric images from cone-beam projections in scenarios where these projections are highly truncated in each dimension. It thus facilitates reconstructions of small subvolumes of interest, without involving prior knowledge about the object. Our method is readily applicable to medical C-arm imaging, where it may contribute to new clinical workflows together with a considerable reduction of x-ray dose. We give a detailed derivation of ATRACT that starts from the conventional Feldkamp filtered-backprojection algorithm and that involves, as one component, a novel formula for the inversion of the two-dimensional Radon transform. Discretization and numerical implementation are discussed, and reconstruction results from both simulated projections and first clinical data sets are presented.

  13. Assembling Large, Multi-Sensor Climate Datasets Using the SciFlo Grid Workflow System

    NASA Astrophysics Data System (ADS)

    Wilson, B. D.; Manipon, G.; Xing, Z.; Fetzer, E.

    2008-12-01

    NASA's Earth Observing System (EOS) is the world's most ambitious facility for studying global climate change. The mandate now is to combine measurements from the instruments on the A-Train platforms (AIRS, AMSR-E, MODIS, MISR, MLS, and CloudSat) and other Earth probes to enable large-scale studies of climate change over periods of years to decades. However, moving from predominantly single-instrument studies to a multi-sensor, measurement-based model for long-duration analysis of important climate variables presents serious challenges for large-scale data mining and data fusion. For example, one might want to compare temperature and water vapor retrievals from one instrument (AIRS) to another instrument (MODIS), and to a model (ECMWF), stratify the comparisons using a classification of the cloud scenes from CloudSat, and repeat the entire analysis over years of AIRS data. To perform such an analysis, one must discover and access multiple datasets from remote sites, find the space/time matchups between instrument swaths and model grids, understand the quality flags and uncertainties for retrieved physical variables, and assemble merged datasets for further scientific and statistical analysis. To meet these large-scale challenges, we are utilizing a Grid computing and dataflow framework, named SciFlo, in which we are deploying a set of versatile and reusable operators for data query, access, subsetting, co-registration, mining, fusion, and advanced statistical analysis. SciFlo is a semantically-enabled ("smart") Grid Workflow system that ties together a peer-to-peer network of computers into an efficient engine for distributed computation. The SciFlo workflow engine enables scientists to do multi-instrument Earth Science by assembling remotely-invokable Web Services (SOAP or HTTP GET URLs), native executables, command-line scripts, and Python codes into a distributed computing flow. A scientist visually authors the graph of operation in the VizFlow GUI, or uses a

  14. Assembling Large, Multi-Sensor Climate Datasets Using the SciFlo Grid Workflow System

    NASA Astrophysics Data System (ADS)

    Wilson, B.; Manipon, G.; Xing, Z.; Fetzer, E.

    2009-04-01

    NASA's Earth Observing System (EOS) is an ambitious facility for studying global climate change. The mandate now is to combine measurements from the instruments on the "A-Train" platforms (AIRS, AMSR-E, MODIS, MISR, MLS, and CloudSat) and other Earth probes to enable large-scale studies of climate change over periods of years to decades. However, moving from predominantly single-instrument studies to a multi-sensor, measurement-based model for long-duration analysis of important climate variables presents serious challenges for large-scale data mining and data fusion. For example, one might want to compare temperature and water vapor retrievals from one instrument (AIRS) to another instrument (MODIS), and to a model (ECMWF), stratify the comparisons using a classification of the "cloud scenes" from CloudSat, and repeat the entire analysis over years of AIRS data. To perform such an analysis, one must discover and access multiple datasets from remote sites, find the space/time "matchups" between instrument swaths and model grids, understand the quality flags and uncertainties for retrieved physical variables, assemble merged datasets, and compute fused products for further scientific and statistical analysis. To meet these large-scale challenges, we are utilizing a Grid computing and dataflow framework, named SciFlo, in which we are deploying a set of versatile and reusable operators for data query, access, subsetting, co-registration, mining, fusion, and advanced statistical analysis. SciFlo is a semantically-enabled ("smart") Grid Workflow system that ties together a peer-to-peer network of computers into an efficient engine for distributed computation. The SciFlo workflow engine enables scientists to do multi-instrument Earth Science by assembling remotely-invokable Web Services (SOAP or HTTP GET URLs), native executables, command-line scripts, and Python codes into a distributed computing flow. A scientist visually authors the graph of operation in the Viz

  16. Images crossing borders: image and workflow sharing on multiple levels.

    PubMed

    Ross, Peeter; Pohjonen, Hanna

    2011-04-01

    Digitalisation of medical data makes it possible to share images and workflows between related parties. In addition to linear data flow where healthcare professionals or patients are the information carriers, a new type of matrix of many-to-many connections is emerging. Implementation of shared workflow brings challenges of interoperability and legal clarity. Sharing images or workflows can be implemented on different levels with different challenges: inside the organisation, between organisations, across country borders, or between healthcare institutions and citizens. Interoperability issues vary according to the level of sharing and are either technical or semantic, including language. Legal uncertainty increases when crossing national borders. Teleradiology is regulated by multiple European Union (EU) directives and legal documents, which makes interpretation of the legal system complex. To achieve wider use of eHealth and teleradiology several strategic documents were published recently by the EU. Despite EU activities, responsibility for organising, providing and funding healthcare systems remains with the Member States. Therefore, the implementation of new solutions requires strong co-operation between radiologists, societies of radiology, healthcare administrators, politicians and relevant EU authorities. The aim of this article is to describe different dimensions of image and workflow sharing and to analyse legal acts concerning teleradiology in the EU. PMID:22347943

  17. Content and Workflow Management for Library Websites: Case Studies

    ERIC Educational Resources Information Center

    Yu, Holly, Ed.

    2005-01-01

    Using database-driven web pages or web content management (WCM) systems to manage increasingly diverse web content and to streamline workflows is a commonly practiced solution recognized in libraries today. However, limited library web content management models and funding constraints prevent many libraries from purchasing commercially available…

  18. Electronic Health Record-Driven Workflow for Diagnostic Radiologists.

    PubMed

    Geeslin, Matthew G; Gaskin, Cree M

    2016-01-01

    In most settings, radiologists maintain a high-throughput practice in which efficiency is crucial. The conversion from film-based to digital study interpretation and data storage launched the era of PACS-driven workflow, leading to significant gains in speed. The advent of electronic health records improved radiologists' access to patient data; however, many still find this aspect of workflow to be relatively cumbersome. Nevertheless, the ability to guide a diagnostic interpretation with clinical information, beyond that provided in the examination indication, can add significantly to the specificity of a radiologist's interpretation. Responsibilities of the radiologist include, but are not limited to, protocoling examinations, interpreting studies, chart review, peer review, writing notes, placing orders, and communicating with referring providers. Most of the aforementioned activities are not PACS-centric and require a login to one or more additional applications. Consolidation of these tasks for completion through a single interface can simplify workflow, save time, and potentially reduce the incidence of errors. Here, the authors describe diagnostic radiology workflow that leverages the electronic health record to significantly add to a radiologist's ability to be part of the health care team, provide relevant interpretations, and improve efficiency and quality. PMID:26603098

  19. Server-side workflow execution using data grid technology for reproducible analyses of data-intensive hydrologic systems

    NASA Astrophysics Data System (ADS)

    Essawy, Bakinam T.; Goodall, Jonathan L.; Xu, Hao; Rajasekar, Arcot; Myers, James D.; Kugler, Tracy A.; Billah, Mirza M.; Whitton, Mary C.; Moore, Reagan W.

    2016-04-01

    Many geoscience disciplines utilize complex computational models for advancing understanding and sustainable management of Earth systems. Executing such models and their associated data preprocessing and postprocessing routines can be challenging for a number of reasons including (1) accessing and preprocessing the large volume and variety of data required by the model, (2) postprocessing large data collections generated by the model, and (3) orchestrating data processing tools, each with unique software dependencies, into workflows that can be easily reproduced and reused. To address these challenges, the work reported in this paper leverages the Workflow Structured Object functionality of the Integrated Rule-Oriented Data System and demonstrates how it can be used to access distributed data, encapsulate hydrologic data processing as workflows, and federate with other community-driven cyberinfrastructure systems. The approach is demonstrated for a study investigating the impact of drought on populations in the Carolinas region of the United States. The analysis leverages computational modeling along with data from the Terra Populus project and data management and publication services provided by the Sustainable Environment-Actionable Data project. The work is part of a larger effort under the DataNet Federation Consortium project that aims to demonstrate data and computational interoperability across cyberinfrastructure developed independently by scientific communities.

  20. Scientific Misconduct.

    PubMed

    Gross, Charles

    2016-01-01

    Scientific misconduct has been defined as fabrication, falsification, and plagiarism. Scientific misconduct has occurred throughout the history of science. The US government began to take systematic interest in such misconduct in the 1980s. Since then, a number of studies have examined how frequently individual scientists have observed scientific misconduct or were involved in it. Although the studies vary considerably in their methodology and in the nature and size of their samples, in most studies at least 10% of the scientists sampled reported having observed scientific misconduct. In addition to studies of the incidence of scientific misconduct, this review considers the recent increase in paper retractions, the role of social media in scientific ethics, several instructional examples of egregious scientific misconduct, and potential methods to reduce research misconduct. PMID:26273897

  2. Emergency Medicine Resident Physicians’ Perceptions of Electronic Documentation and Workflow

    PubMed Central

    Neri, P.M.; Redden, L.; Poole, S.; Pozner, C.N.; Horsky, J.; Raja, A.S.; Poon, E.; Schiff, G.

    2015-01-01

    Summary. Objective: To understand emergency department (ED) physicians’ use of electronic documentation in order to identify usability and workflow considerations for the design of future ED information system (EDIS) physician documentation modules. Methods: We invited emergency medicine resident physicians to participate in a mixed methods study using task analysis and qualitative interviews. Participants completed a simulated, standardized patient encounter in a medical simulation center while documenting in the test environment of a currently used EDIS. We recorded the time on task, type and sequence of tasks performed by the participants (including tasks performed in parallel). We then conducted semi-structured interviews with each participant. We analyzed these qualitative data using the constant comparative method to generate themes. Results: Eight resident physicians participated. The simulation session averaged 17 minutes and participants spent 11 minutes on average on tasks that included electronic documentation. Participants performed tasks in parallel, such as history taking and electronic documentation. Five of the 8 participants performed a similar workflow sequence during the first part of the session while the remaining three used different workflows. Three themes characterize electronic documentation: (1) physicians report that location and timing of documentation varies based on patient acuity and workload, (2) physicians report a need for features that support improved efficiency, and (3) physicians like viewing available patient data but struggle with integration of the EDIS with other information sources. Conclusion: We confirmed that physicians spend much of their time on documentation (65%) during an ED patient visit. Further, we found that resident physicians did not all use the same workflow and approach even when presented with an identical standardized patient scenario. Future EHR design should consider these varied workflows while trying to

  3. Research Electronic Data Capture (REDCap) - A metadata-driven methodology and workflow process for providing translational research informatics support

    PubMed Central

    Harris, Paul A.; Taylor, Robert; Thielke, Robert; Payne, Jonathon; Gonzalez, Nathaniel; Conde, Jose G.

    2009-01-01

    REDCap is a novel workflow methodology and software solution designed for rapid development and deployment of electronic data capture tools to support clinical and translational research. We present: 1) a brief description of the REDCap metadata-driven software toolset; 2) detail concerning the capture and use of study-related metadata from scientific research teams; 3) measures of impact for REDCap; 4) details concerning a consortium network of domestic and international institutions collaborating on the project; and 5) strengths and limitations of the REDCap system. REDCap is currently supporting 286 translational research projects in a growing collaborative network including 27 active partner institutions. PMID:18929686

  4. Workflow analysis and evidence-based medicine: towards integration of knowledge-based functions in hospital information systems.

    PubMed Central

    Mueller, M. L.; Ganslandt, T.; Frankewitsch, T.; Krieglstein, C. F.; Senninger, N.; Prokosch, H. U.

    1999-01-01

    The large extent and complexity of scientific evidence described in the concept of evidence-based medicine often overwhelms clinicians who want to apply the best external evidence. Hospital Information Systems usually do not provide knowledge-based functions to support context-sensitive linking to external information sources. Knowledge-based components need specific data, which must be entered manually and should be well adapted to the clinical environment to be accepted by clinicians. This paper describes a workflow-based approach to understand and visualize clinical reality as a preliminary to designing software applications, and possible starting points for further software development. PMID:10566375

  5. A Novel Automated High-Content Analysis Workflow Capturing Cell Population Dynamics from Induced Pluripotent Stem Cell Live Imaging Data

    PubMed Central

    Kerz, Maximilian; Folarin, Amos; Meleckyte, Ruta; Watt, Fiona M.; Dobson, Richard J.; Danovi, Davide

    2016-01-01

    Most image analysis pipelines rely on multiple channels per image with subcellular reference points for cell segmentation. Single-channel phase-contrast images are often problematic, especially for cells with unfavorable morphology, such as induced pluripotent stem cells (iPSCs). Live imaging poses a further challenge, because of the introduction of the dimension of time. Evaluations cannot be easily integrated with other biological data sets including analysis of endpoint images. Here, we present a workflow that incorporates a novel CellProfiler-based image analysis pipeline enabling segmentation of single-channel images with a robust R-based software solution to reduce the dimension of time to a single data point. These two packages combined allow robust segmentation of iPSCs solely on phase-contrast single-channel images and enable live imaging data to be easily integrated to endpoint data sets while retaining the dynamics of cellular responses. The described workflow facilitates characterization of the response of live-imaged iPSCs to external stimuli and definition of cell line–specific, phenotypic signatures. We present an efficient tool set for automated high-content analysis suitable for cells with challenging morphology. This approach has potentially widespread applications for human pluripotent stem cells and other cell types. PMID:27256155

  6. A Novel Automated High-Content Analysis Workflow Capturing Cell Population Dynamics from Induced Pluripotent Stem Cell Live Imaging Data.

    PubMed

    Kerz, Maximilian; Folarin, Amos; Meleckyte, Ruta; Watt, Fiona M; Dobson, Richard J; Danovi, Davide

    2016-10-01

    Most image analysis pipelines rely on multiple channels per image with subcellular reference points for cell segmentation. Single-channel phase-contrast images are often problematic, especially for cells with unfavorable morphology, such as induced pluripotent stem cells (iPSCs). Live imaging poses a further challenge, because of the introduction of the dimension of time. Evaluations cannot be easily integrated with other biological data sets including analysis of endpoint images. Here, we present a workflow that incorporates a novel CellProfiler-based image analysis pipeline enabling segmentation of single-channel images with a robust R-based software solution to reduce the dimension of time to a single data point. These two packages combined allow robust segmentation of iPSCs solely on phase-contrast single-channel images and enable live imaging data to be easily integrated to endpoint data sets while retaining the dynamics of cellular responses. The described workflow facilitates characterization of the response of live-imaged iPSCs to external stimuli and definition of cell line-specific, phenotypic signatures. We present an efficient tool set for automated high-content analysis suitable for cells with challenging morphology. This approach has potentially widespread applications for human pluripotent stem cells and other cell types.

  7. WARP (workflow for automated and rapid production): a framework for end-to-end automated digital print workflows

    NASA Astrophysics Data System (ADS)

    Joshi, Parag

    2006-02-01

    The publishing industry is experiencing a major paradigm shift with the advent of digital publishing technologies. A large number of components in the publishing and print production workflow are transformed in this shift. However, the process as a whole requires a great deal of human intervention for decision making and for resolving exceptions during job execution. Furthermore, a majority of the best-of-breed applications for publishing and print production are intrinsically designed and developed to be driven by humans. Thus, the human-intensive nature of the current prepress process accounts for a very significant amount of the overhead costs in fulfillment of jobs on press. It is a challenge to automate the functionality of applications built with the model of human-driven execution. Another challenge is to orchestrate various components in the publishing and print production pipeline such that they work in a seamless manner to enable the system to perform automatic detection of potential failures and take corrective actions in a proactive manner. Thus, there is a great need for a coherent and unifying workflow architecture that streamlines the process and automates it as a whole in order to create an end-to-end digital automated print production workflow that does not involve any human intervention. This paper describes an architecture and building blocks that lay the foundation for a plurality of automated print production workflows.

  8. Replication and Robustness in Developmental Research

    ERIC Educational Resources Information Center

    Duncan, Greg J.; Engel, Mimi; Claessens, Amy; Dowsett, Chantelle J.

    2014-01-01

    Replications and robustness checks are key elements of the scientific method and a staple in many disciplines. However, leading journals in developmental psychology rarely include explicit replications of prior research conducted by different investigators, and few require authors to establish in their articles or online appendices that their key…

  9. An automated and reproducible workflow for running and analyzing neural simulations using Lancet and IPython Notebook

    PubMed Central

    Stevens, Jean-Luc R.; Elver, Marco; Bednar, James A.

    2013-01-01

    Lancet is a new, simulator-independent Python utility for succinctly specifying, launching, and collating results from large batches of interrelated computationally demanding program runs. This paper demonstrates how to combine Lancet with IPython Notebook to provide a flexible, lightweight, and agile workflow for fully reproducible scientific research. This informal and pragmatic approach uses IPython Notebook to capture the steps in a scientific computation as it is gradually automated and made ready for publication, without mandating the use of any separate application that can constrain scientific exploration and innovation. The resulting notebook concisely records each step involved in even very complex computational processes that led to a particular figure or numerical result, allowing the complete chain of events to be replicated automatically. Lancet was originally designed to help solve problems in computational neuroscience, such as analyzing the sensitivity of a complex simulation to various parameters, or collecting the results from multiple runs with different random starting points. However, because it is never possible to know in advance what tools might be required in future tasks, Lancet has been designed to be completely general, supporting any type of program as long as it can be launched as a process and can return output in the form of files. For instance, Lancet is also heavily used by one of the authors in a separate research group for launching batches of microprocessor simulations. This general design will allow Lancet to continue supporting a given research project even as the underlying approaches and tools change. PMID:24416014
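
    The batch-launching idea described in this abstract can be illustrated with a minimal sketch. It does not use Lancet's own API; it relies only on the Python standard library, and the names `sweep` and `run_batch`, the parameters `gain` and `seed`, and the stand-in child process are all hypothetical. The sketch also collates standard output rather than output files, purely to keep the example short.

```python
import itertools
import subprocess
import sys


def sweep(param_grid):
    """Yield one dict per combination in the Cartesian product of the grid."""
    names = sorted(param_grid)
    for values in itertools.product(*(param_grid[n] for n in names)):
        yield dict(zip(names, values))


def run_batch(param_grid):
    """Launch one process per parameter combination and collate its stdout."""
    results = []
    for params in sweep(param_grid):
        # Stand-in "simulation" (an assumption): a child Python process
        # that prints the product of the two hypothetical parameters.
        code = "print({gain} * {seed})".format(**params)
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True, text=True, check=True,
        )
        results.append((params, proc.stdout.strip()))
    return results


if __name__ == "__main__":
    for params, output in run_batch({"gain": [1, 2], "seed": [10, 20]}):
        print(params, "->", output)
```

    Keeping the parameter specification (`sweep`) separate from the launcher (`run_batch`) mirrors the abstract's point that any program can be supported as long as it can be started as a process.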

  10. An automated and reproducible workflow for running and analyzing neural simulations using Lancet and IPython Notebook.

    PubMed

    Stevens, Jean-Luc R; Elver, Marco; Bednar, James A

    2013-01-01

    Lancet is a new, simulator-independent Python utility for succinctly specifying, launching, and collating results from large batches of interrelated computationally demanding program runs. This paper demonstrates how to combine Lancet with IPython Notebook to provide a flexible, lightweight, and agile workflow for fully reproducible scientific research. This informal and pragmatic approach uses IPython Notebook to capture the steps in a scientific computation as it is gradually automated and made ready for publication, without mandating the use of any separate application that can constrain scientific exploration and innovation. The resulting notebook concisely records each step involved in even very complex computational processes that led to a particular figure or numerical result, allowing the complete chain of events to be replicated automatically. Lancet was originally designed to help solve problems in computational neuroscience, such as analyzing the sensitivity of a complex simulation to various parameters, or collecting the results from multiple runs with different random starting points. However, because it is never possible to know in advance what tools might be required in future tasks, Lancet has been designed to be completely general, supporting any type of program as long as it can be launched as a process and can return output in the form of files. For instance, Lancet is also heavily used by one of the authors in a separate research group for launching batches of microprocessor simulations. This general design will allow Lancet to continue supporting a given research project even as the underlying approaches and tools change. PMID:24416014

  11. Metadata Management on the SCEC PetaSHA Project: Helping Users Describe, Discover, Understand, and Use Simulation Data in a Large-Scale Scientific Collaboration

    NASA Astrophysics Data System (ADS)

    Okaya, D.; Deelman, E.; Maechling, P.; Wong-Barnum, M.; Jordan, T. H.; Meyers, D.

    2007-12-01

    Large scientific collaborations, such as the SCEC Petascale Cyberfacility for Physics-based Seismic Hazard Analysis (PetaSHA) Project, involve interactions between many scientists who exchange ideas and research results. These groups must organize, manage, and make accessible their community materials of observational data, derivative (research) results, computational products, and community software. The integration of scientific workflows as a paradigm to solve complex computations provides advantages of efficiency, reliability, repeatability, choices, and ease of use. The underlying resource needed for a scientific workflow to function and create discoverable and exchangeable products is the construction, tracking, and preservation of metadata. In the scientific workflow environment there is a two-tier structure of metadata. Workflow-level metadata and provenance describe operational steps, identity of resources, execution status, and product locations and names. Domain-level metadata essentially define the scientific meaning of data, codes and products. To a large degree the metadata at these two levels are separate. However, between these two levels is a subset of metadata that is produced at one level but needed by the other. This crossover metadata suggests that some commonality in metadata handling is needed. SCEC researchers are collaborating with computer scientists at SDSC, the USC Information Sciences Institute, and Carnegie Mellon Univ. in order to perform earthquake science using high-performance computational resources. A primary objective of the "PetaSHA" collaboration is to perform physics-based estimations of strong ground motion associated with real and hypothetical earthquakes located within Southern California. Construction of 3D earth models, earthquake representations, and numerical simulation of seismic waves are key components of these estimations. Scientific workflows are used to orchestrate the sequences of scientific tasks and to access

  12. Improved workflows for high throughput library preparation using the transposome-based nextera system

    PubMed Central

    2013-01-01

    Background: The Nextera protocol, which utilises a transposome-based approach to create libraries for Illumina sequencing, requires pure DNA template, an accurate assessment of input concentration and a column clean-up that limits its applicability for high-throughput sample preparation. We addressed the identified limitations to develop a robust workflow that supports both rapid and high-throughput projects while also reducing reagent costs. Results: We show that an initial bead-based normalisation step can remove the need for quantification and improves sample purity. A 75% cost reduction was achieved with a low-volume modified protocol which was tested over genomes with different GC content to demonstrate its robustness. Finally, we developed a custom set of index tags and primers which increase the number of samples that can simultaneously be sequenced on a single lane of an Illumina instrument. Conclusions: We addressed the bottlenecks of Nextera library construction to produce a modified protocol which harnesses the full power of the Nextera kit and allows the reproducible construction of libraries on a high-throughput scale, reducing the associated cost of the kit. PMID:24256843

  13. Scientific Utopia: An agenda for improving scientific communication (Invited)

    NASA Astrophysics Data System (ADS)

    Nosek, B.

    2013-12-01

    The scientist's primary incentive is publication. In the present culture, open practices do not increase chances of publication, and they often require additional work. Practicing the abstract scientific values of openness and reproducibility thus requires behaviors in addition to those relevant for the primary, concrete rewards. When in conflict, concrete rewards are likely to dominate over abstract ones. As a consequence, the reward structure for scientists does not encourage openness and reproducibility. This can be changed by nudging incentives to align scientific practices with scientific values. Science will benefit by creating and connecting technologies that nudge incentives while supporting and improving the scientific workflow. For example, it should be as easy to search the research literature for my topic as it is to search the Internet to find hilarious videos of cats falling off of furniture. I will introduce the Center for Open Science (http://centerforopenscience.org/) and efforts to improve openness and reproducibility such as http://openscienceframework.org/. There will be no cats.

  14. An automated proteomic data analysis workflow for mass spectrometry

    PubMed Central

    2009-01-01

    Background: Mass spectrometry-based protein identification methods are fundamental to proteomics. Biological experiments are usually performed in replicates and proteomic analyses generate huge datasets which need to be integrated and quantitatively analyzed. The Sequest™ search algorithm is a commonly used algorithm for identifying peptides and proteins from two dimensional liquid chromatography electrospray ionization tandem mass spectrometry (2-D LC ESI MS2) data. A number of proteomic pipelines that facilitate high throughput 'post data acquisition analysis' are described in the literature. However, these pipelines need to be updated to accommodate the rapidly evolving data analysis methods. Here, we describe a proteomic data analysis pipeline that specifically addresses two main issues pertinent to protein identification and differential expression analysis: 1) estimation of the probability of peptide and protein identifications and 2) non-parametric statistics for protein differential expression analysis. Our proteomic analysis workflow analyzes replicate datasets from a single experimental paradigm to generate a list of identified proteins with their probabilities and significant changes in protein expression using parametric and non-parametric statistics. Results: The input for our workflow is Bioworks™ 3.2 Sequest (or a later version, including cluster) output in XML format. We use a decoy database approach to assign probability to peptide identifications. The user has the option to select "quality thresholds" on peptide identifications based on the P value. We also estimate probability for protein identification. Proteins identified with peptides at a user-specified threshold value from biological experiments are grouped as either control or treatment for further analysis in ProtQuant. ProtQuant utilizes a parametric (ANOVA) method for calculating differences in protein expression based on the quantitative measure ΣXcorr. Alternatively Prot

  15. SU-E-J-78: Adaptive Planning Workflow in a Pencil Beam Scanning Proton Therapy Center

    SciTech Connect

    Blakey, M; Price, S; Robison, B; Niek, S; Moe, S; Renegar, J; Mark, A; Spenser, W

    2015-06-15

    Purpose: The susceptibility of proton therapy to changes in patient setup and anatomy necessitates an adaptive planning process. With the right planning tools and clinical workflow, an adaptive plan can be created in a timely manner without adding significant workload to the treatment planning staff. Methods: In our center, a weekly QA CT is performed on most patients to assess setup, anatomy change, and tumor response. The QA CT is fused to the treatment planning CT, the contours are transferred via deformable registration, and the plan dose is recalculated on the QA CT. A physicist assesses the dose distribution, and an adaptive plan is requested based on tumor coverage or OAR dose changes. After the physician confirms or alters the deformed contours, a dosimetrist develops an adaptive plan using our TPS adaptation module. The plan is assessed for robustness and is then reviewed by the physician. Patient QA is performed within three days following the first adapted treatment. Results: Of the patients who received QA CTs, 19% required at least one adaptive plan (18.5% H&N, 18.5% brain, 11.1% breast, 14.8% chestwall, 14.8% lung, 18.5% pelvis and 3.8% abdomen). Of these patients, 14% went on a break, while the remainder was treated with the previous plan during the re-planning process. Adaptive plans were performed based on tumor shrinkage, anatomy change or positioning uncertainties for 37.9%, 44.8%, and 17.3% of the patients, respectively. On average, 3 full days are required between the QA CT and the first adapted plan treatment. Conclusion: Adaptive planning is a crucial component of proton therapy and should be applied to any site when the QA CT shows significant deviation from the plan. With an efficient workflow, an adaptive plan can be applied without delaying patient treatment or burdening the dosimetry and medical physics team.

  16. Developing integrated workflows for the digitisation of herbarium specimens using a modular and scalable approach

    PubMed Central

    Haston, Elspeth; Cubey, Robert; Pullan, Martin; Atkins, Hannah; Harris, David J

    2012-01-01

    Digitisation programmes in many institutes frequently involve disparate and irregular funding, diverse selection criteria and scope, with different members of staff managing and operating the processes. These factors have influenced the decision at the Royal Botanic Garden Edinburgh to develop an integrated workflow for the digitisation of herbarium specimens which is modular and scalable to enable a single overall workflow to be used for all digitisation projects. This integrated workflow comprises three principal elements: a specimen workflow, a data workflow and an image workflow. The specimen workflow is strongly linked to curatorial processes which will impact on the prioritisation, selection and preparation of the specimens. The importance of including a conservation element within the digitisation workflow is highlighted. The data workflow includes the concept of three main categories of collection data: label data, curatorial data and supplementary data. It is shown that each category of data has its own properties which influence the timing of data capture within the workflow. Development of software has been carried out for the rapid capture of curatorial data, and optical character recognition (OCR) software is being used to increase the efficiency of capturing label data and supplementary data. The large number and size of the images have necessitated the inclusion of automated systems within the image workflow. PMID:22859881

  17. When Workflow Management Systems and Logging Systems Meet: Analyzing Large-Scale Execution Traces

    SciTech Connect

    Gunter, Daniel

    2008-07-31

    This poster shows the benefits of integrating a workflow management system with logging and log mining capabilities. By combining two existing, mature technologies, Pegasus-WMS and Netlogger, we are able to efficiently process execution logs of earthquake science workflows consisting of hundreds of thousands to one million tasks. In particular, we show results of processing logs of CyberShake, a workflow application running on the TeraGrid. Client-side tools allow scientists to quickly gather statistics about a workflow run and find out which tasks executed, where they were executed, what their runtime was, etc. These statistics can be used to understand the performance characteristics of a workflow and help tune the execution parameters of the workflow management system. This poster shows the scalability of the system by presenting results of uploading task execution records into the system and of querying the system for overall workflow performance information.
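
    As a rough illustration of the kind of client-side statistics described above (which tasks ran, where, and for how long), the following sketch aggregates task execution records by host. The record fields (task, host, start, end) and the sample values are hypothetical; they are not the Netlogger or Pegasus-WMS log format.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical task execution records. The field names are assumptions
# made for this sketch, not the Netlogger or Pegasus-WMS schema.
RECORDS = [
    {"task": "t1", "host": "node-a", "start": 0.0, "end": 12.5},
    {"task": "t2", "host": "node-a", "start": 1.0, "end": 4.0},
    {"task": "t3", "host": "node-b", "start": 2.0, "end": 22.0},
]


def summarize(records):
    """Return per-host task count, mean runtime and maximum runtime."""
    runtimes = defaultdict(list)
    for rec in records:
        runtimes[rec["host"]].append(rec["end"] - rec["start"])
    return {
        host: {
            "tasks": len(values),
            "mean_runtime": mean(values),
            "max_runtime": max(values),
        }
        for host, values in runtimes.items()
    }


if __name__ == "__main__":
    for host, stats in sorted(summarize(RECORDS).items()):
        print(host, stats)
```

    Grouping by host is only one possible choice; the same pattern could group by task type or by site to look for slow stages of a run.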

  18. Workflow Modelling and Analysis Based on the Construction of Task Models

    PubMed Central

    Cravo, Glória

    2015-01-01

    In this paper, we describe the structure of a workflow as a graph whose vertices represent tasks and whose arcs are associated with workflow transitions. To each task an input/output logic operator is associated. Furthermore, we associate a Boolean term with each transition present in the workflow. We further identify the structure of workflows and describe their dynamism through the construction of new task models. This construction is simple and intuitive, since it is based on the analysis of all tasks present in the workflow, which allows us to describe the dynamism of the workflow very easily. Our approach therefore has the advantage of being very intuitive, which is an important highlight of our work. We also introduce the concept of logical termination of workflows and provide conditions under which this property is valid. Finally, we provide a counter-example which shows that a conjecture presented in a previous article is false. PMID:25705713
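
    The graph view described above can be sketched in a few lines. This is a simplified illustration, not the paper's formal task model: the operator set (AND, OR, XOR), the Task class and the is_enabled helper are assumptions introduced here, and only the input side of a task is modelled.

```python
from dataclasses import dataclass, field

# Logic operators applied to the Boolean values of a task's incoming
# transitions. The choice of AND/OR/XOR is an assumption of this sketch.
OPERATORS = {
    "AND": all,
    "OR": any,
    "XOR": lambda values: sum(values) == 1,
}


@dataclass
class Task:
    name: str
    input_op: str = "AND"                         # operator over incoming transitions
    incoming: list = field(default_factory=list)  # names of incoming transitions


def is_enabled(task, transition_values):
    """Return True when the task's input operator holds over its incoming transitions."""
    values = [transition_values[name] for name in task.incoming]
    return OPERATORS[task.input_op](values)


if __name__ == "__main__":
    state = {"a1": True, "a2": False}
    print(is_enabled(Task("merge", "XOR", ["a1", "a2"]), state))
    print(is_enabled(Task("sync", "AND", ["a1", "a2"]), state))
```

    With transition a1 true and a2 false, the XOR task is enabled and the AND task is not, which is the kind of case analysis a task model makes explicit.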

  19. Workflow modelling and analysis based on the construction of task models.

    PubMed

    Cravo, Glória

    2015-01-01

    In this paper, we describe the structure of a workflow as a graph whose vertices represent tasks and whose arcs are associated with workflow transitions. To each task an input/output logic operator is associated. Furthermore, we associate a Boolean term with each transition present in the workflow. We further identify the structure of workflows and describe their dynamism through the construction of new task models. This construction is simple and intuitive, since it is based on the analysis of all tasks present in the workflow, which allows us to describe the dynamism of the workflow very easily. Our approach therefore has the advantage of being very intuitive, which is an important highlight of our work. We also introduce the concept of logical termination of workflows and provide conditions under which this property is valid. Finally, we provide a counter-example which shows that a conjecture presented in a previous article is false. PMID:25705713

  20. Computer imaging and workflow systems in the business office.

    PubMed

    Adams, W T; Veale, F H; Helmick, P M

    1999-05-01

    Computer imaging and workflow technology automates many business processes that currently are performed using paper processes. Documents are scanned into the imaging system and placed in electronic patient account folders. Authorized users throughout the organization, including preadmission, verification, admission, billing, cash posting, customer service, and financial counseling staff, have online access to the information they need when they need it. Such streamlining of business functions can increase collections and customer satisfaction while reducing labor, supply, and storage costs. Because the costs of a comprehensive computer imaging and workflow system can be considerable, healthcare organizations should consider implementing parts of such systems that can be cost-justified or include implementation as part of a larger strategic technology initiative.

  1. CONNJUR Workflow Builder: a software integration environment for spectral reconstruction.

    PubMed

    Fenwick, Matthew; Weatherby, Gerard; Vyas, Jay; Sesanker, Colbert; Martyn, Timothy O; Ellis, Heidi J C; Gryk, Michael R

    2015-07-01

    CONNJUR Workflow Builder (WB) is an open-source software integration environment that leverages existing spectral reconstruction tools to create a synergistic, coherent platform for converting biomolecular NMR data from the time domain to the frequency domain. WB provides data integration of primary data and metadata using a relational database, and includes a library of pre-built workflows for processing time domain data. WB simplifies maximum entropy reconstruction, facilitating the processing of non-uniformly sampled time domain data. As will be shown in the paper, the unique features of WB provide it with novel abilities to enhance the quality, accuracy, and fidelity of the spectral reconstruction process. WB also provides features which promote collaboration, education, parameterization, and non-uniform data sets along with processing integrated with the Rowland NMR Toolkit (RNMRTK) and NMRPipe software packages. WB is available free of charge in perpetuity, dual-licensed under the MIT and GPL open source licenses.

  2. CONNJUR Workflow Builder: A software integration environment for spectral reconstruction

    PubMed Central

    Fenwick, Matthew; Weatherby, Gerard; Vyas, Jay; Sesanker, Colbert; Martyn, Timothy O.; Ellis, Heidi J.C.; Gryk, Michael R.

    2015-01-01

    CONNJUR Workflow Builder (WB) is an open-source software integration environment that leverages existing spectral reconstruction tools to create a synergistic, coherent platform for converting biomolecular NMR data from the time domain to the frequency domain. WB provides data integration of primary data and metadata using a relational database, and includes a library of pre-built workflows for processing time domain data. WB simplifies maximum entropy reconstruction, facilitating the processing of non-uniformly sampled time domain data. As will be shown in the paper, the unique features of WB provide it with novel abilities to enhance the quality, accuracy, and fidelity of the spectral reconstruction process. WB also provides features which promote collaboration, education, parameterization, and non-uniform data sets along with processing integrated with the Rowland NMR Toolkit (RNMRTK) and NMRPipe software packages. WB is available free of charge in perpetuity, dual-licensed under the MIT and GPL open source licenses. PMID:26066803

  3. A computational workflow for designing silicon donor qubits

    NASA Astrophysics Data System (ADS)

    Humble, Travis S.; Ericson, M. Nance; Jakowski, Jacek; Huang, Jingsong; Britton, Charles; Curtis, Franklin G.; Dumitrescu, Eugene F.; Mohiyaddin, Fahd A.; Sumpter, Bobby G.

    2016-10-01

    Developing devices that can reliably and accurately demonstrate the principles of superposition and entanglement is an on-going challenge for the quantum computing community. Modeling and simulation offer attractive means of testing early device designs and establishing expectations for operational performance. However, the complex integrated material systems required by quantum device designs are not captured by any single existing computational modeling method. We examine the development and analysis of a multi-staged computational workflow that can be used to design and characterize silicon donor qubit systems with modeling and simulation. Our approach integrates quantum chemistry calculations with electrostatic field solvers to perform detailed simulations of a phosphorus dopant in silicon. We show how atomistic details can be synthesized into an operational model for the logical gates that define quantum computation in this particular technology. The resulting computational workflow realizes a design tool for silicon donor qubits that can help verify and validate current and near-term experimental devices.

  4. Enabling Smart Workflows over Heterogeneous ID-Sensing Technologies

    PubMed Central

    Giner, Pau; Cetina, Carlos; Lacuesta, Raquel; Palacios, Guillermo

    2012-01-01

    Sensing technologies in mobile devices play a key role in reducing the gap between the physical and the digital world. The use of automatic identification capabilities can improve user participation in business processes where physical elements are involved (Smart Workflows). However, identifying all objects in the user surroundings does not automatically translate into meaningful services to the user. This work introduces Parkour, an architecture that allows the development of services that match the goals of each of the participants in a smart workflow. Parkour is based on a pluggable architecture that can be extended to provide support for new tasks and technologies. In order to facilitate the development of these plug-ins, tools that automate the development process are also provided. Several Parkour-based systems have been developed in order to validate the applicability of the proposal. PMID:23202193

  5. ESO Reflex: a graphical workflow engine for data reduction

    NASA Astrophysics Data System (ADS)

    Hook, Richard; Ullgrén, Marko; Romaniello, Martino; Maisala, Sami; Oittinen, Tero; Solin, Otto; Savolainen, Ville; Järveläinen, Pekka; Tyynelä, Jani; Péron, Michèle; Ballester, Pascal; Gabasch, Armin; Izzo, Carlo

    ESO Reflex is a prototype software tool that provides a novel approach to astronomical data reduction by integrating a modern graphical workflow system (Taverna) with existing legacy data reduction algorithms. Most of the raw data produced by instruments at the ESO Very Large Telescope (VLT) in Chile are reduced using recipes. These are compiled C applications following an ESO standard and utilising routines provided by the Common Pipeline Library (CPL). Currently these are run in batch mode as part of the data flow system to generate the input to the ESO/VLT quality control process and are also exported for use offline. ESO Reflex can invoke CPL-based recipes in a flexible way through a general purpose graphical interface. ESO Reflex is based on the Taverna system that was originally developed within the UK life-sciences community. Workflows have been created so far for three VLT/VLTI instruments, and the GUI allows the user to make changes to these or create workflows of their own. Python scripts or IDL procedures can be easily brought into workflows and a variety of visualisation and display options, including custom product inspection and validation steps, are available. Taverna is intended for use with web services and experiments using ESO Reflex to access Virtual Observatory web services have been successfully performed. ESO Reflex is the main product developed by Sampo, a project led by ESO and conducted by a software development team from Finland as an in-kind contribution to joining ESO. The goal was to look into the needs of the ESO community in the area of data reduction environments and to create pilot software products that illustrate critical steps along the road to a new system. Sampo concluded early in 2008. This contribution will describe ESO Reflex and show several examples of its use both locally and using Virtual Observatory remote web services. ESO Reflex is expected to be released to the community in early 2009.

  6. Analysis of Whole Transcriptome Sequencing Data: Workflow and Software

    PubMed Central

    Yang, In Seok

    2015-01-01

    RNA is a polymeric molecule implicated in various biological processes, such as the coding, decoding, regulation, and expression of genes. Numerous studies have examined RNA features using whole transcriptome sequencing (RNA-seq) approaches. RNA-seq is a powerful technique for characterizing and quantifying the transcriptome and accelerates the development of bioinformatics software. In this review, we introduce routine RNA-seq workflow together with related software, focusing particularly on transcriptome reconstruction and expression quantification. PMID:26865842
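
    To make the expression quantification step concrete, the sketch below computes transcripts per million (TPM), one widely used normalization of read counts. The abstract does not say which measures the review covers, so TPM is chosen here only as an example; the gene names, counts and lengths are hypothetical.

```python
def tpm(counts, lengths_bp):
    """Convert raw read counts to transcripts per million (TPM).

    counts:     mapping of gene name -> number of reads assigned to the gene
    lengths_bp: mapping of gene name -> transcript length in base pairs
    """
    # Reads per kilobase: corrects for longer transcripts attracting more reads.
    rates = {gene: counts[gene] / (lengths_bp[gene] / 1000.0) for gene in counts}
    total = sum(rates.values())
    if total == 0:
        raise ValueError("no reads assigned to any gene")
    # Rescale so that the values sum to one million.
    return {gene: rate / total * 1e6 for gene, rate in rates.items()}


if __name__ == "__main__":
    # Hypothetical counts and lengths, for illustration only.
    print(tpm({"geneA": 100, "geneB": 300}, {"geneA": 1000, "geneB": 3000}))
```

    In this example geneB has three times the reads of geneA but is also three times as long, so both receive the same TPM value.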

  7. A framework for workflow-based clinical research billing disambiguation.

    PubMed

    Payne, Philip R O; Borlawsky, Tara; Kamal, Jyoti; Saltz, Joel H

    2007-10-11

    Medicare received authorization in 2000 to reimburse for routine costs incurred in association with patients participating in clinical research. However, we hypothesize that the inability to accurately differentiate standard from investigational care has resulted in under-coding of potentially reimbursable clinical events. To address this problem, we have initiated the development of a methodology for constructing computational clinical workflow models that can be employed to aid in the disambiguation of routine versus research costs.

  8. Evidence-Based Workflows for Thyroid and Parathyroid Surgery

    PubMed Central

    Meltzer, Charles; Budayr, Amer; Chavez, Annette; Dlott, Richard; Greif, William; Gurushanthaiah, Deepak; Klonecke, Andrew; Lando, Matthew; Leary, Joyce; Nayak, Sundeep; Niederkohr, Ryan; Park, Judith; Savitz, Alison; Schwartz, Henry

    2016-01-01

    A need exists to reduce care variations by standardizing the practice of thyroid and parathyroid surgery. During the course of a year, a task force developed algorithms representing decision points and workflows based on American Thyroid Association guidelines and three internal studies of surgical practices in the Northern and Southern California Regions of Kaiser Permanente conducted in collaboration with Health Information Technology Transformation & Analytics (HITTA). PMID:27479948

  9. Evidence-Based Workflows for Thyroid and Parathyroid Surgery.

    PubMed

    Meltzer, Charles; Budayr, Amer; Chavez, Annette; Dlott, Richard; Greif, William; Gurushanthaiah, Deepak; Klonecke, Andrew; Lando, Matthew; Leary, Joyce; Nayak, Sundeep; Niederkohr, Ryan; Park, Judith; Savitz, Alison; Schwartz, Henry

    2016-01-01

    A need exists to reduce care variations by standardizing the practice of thyroid and parathyroid surgery. During the course of a year, a task force developed algorithms representing decision points and workflows based on American Thyroid Association guidelines and three internal studies of surgical practices in the Northern and Southern California Regions of Kaiser Permanente conducted in collaboration with Health Information Technology Transformation & Analytics (HITTA). PMID:27479948

  10. AnalyzeThis: An Analysis Workflow-Aware Storage System

    SciTech Connect

    Sim, Hyogi; Kim, Youngjae; Vazhkudai, Sudharshan S; Tiwari, Devesh; Anwar, Ali; Butt, Ali R; Ramakrishnan, Lavanya

    2015-01-01

    The need for novel data analysis is urgent in the face of a data deluge from modern applications. Traditional approaches to data analysis incur significant data movement costs, moving data back and forth between the storage system and the processor. Emerging Active Flash devices enable processing on the flash, where the data already resides. An array of such Active Flash devices allows us to revisit how analysis workflows interact with storage systems. By seamlessly blending together the flash storage and data analysis, we create an analysis workflow-aware storage system, AnalyzeThis. Our guiding principle is that analysis-awareness be deeply ingrained in each and every layer of the storage, elevating data analyses as first-class citizens, and transforming AnalyzeThis into a potent analytics-aware appliance. We implement the AnalyzeThis storage system atop an emulation platform of the Active Flash array. Our results indicate that AnalyzeThis is viable, expediting workflow execution and minimizing data movement.

  11. IT-benchmarking of clinical workflows: concept, implementation, and evaluation.

    PubMed

    Thye, Johannes; Straede, Matthias-Christopher; Liebe, Jan-David; Hübner, Ursula

    2014-01-01

    Due to the emerging evidence of health IT as opportunity and risk for clinical workflows, health IT must undergo a continuous measurement of its efficacy and efficiency. IT-benchmarks are a proven means for providing this information. The aim of this study was to enhance the methodology of an existing benchmarking procedure by including, in particular, new indicators of clinical workflows and by proposing new types of visualisation. Drawing on the concept of information logistics, we propose four workflow descriptors that were applied to four clinical processes. General and specific indicators were derived from these descriptors and processes. 199 chief information officers (CIOs) took part in the benchmarking. These hospitals were assigned to reference groups of a similar size and ownership from a total of 259 hospitals. Stepwise and comprehensive feedback was given to the CIOs. Most participants who evaluated the benchmark rated the procedure as very good, good, or rather good (98.4%). Benchmark information was used by CIOs for getting a general overview, advancing IT, preparing negotiations with board members, and arguing for a new IT project.

  12. a Workflow for UAV's Integration Into a Geodesign Platform

    NASA Astrophysics Data System (ADS)

    Anca, P.; Calugaru, A.; Alixandroae, I.; Nazarie, R.

    2016-06-01

    This paper presents a workflow for the development of various Geodesign scenarios. The subject is important in the context of identifying patterns and designing solutions for a Smart City with optimized public transportation, efficient buildings, efficient utilities, recreational facilities, and so on. The workflow describes the procedures, starting with acquiring data in the field and continuing through data processing, orthophoto generation, DTM generation, integration into a GIS platform, and analysis to better support Geodesign. Esri's City Engine is used mostly for its 3D modeling capabilities, which enable the user to obtain realistic 3D models. The workflow uses as inputs information extracted from images acquired using UAV technologies, namely eBee, existing 2D GIS geodatabases, and a set of CGA rules. The method we then use, called procedural modeling, applies rules to extrude buildings, the street network, parcel zoning and side details, based on the initial attributes from the geodatabase. The resulting products are various scenarios for redesigning and for analyzing new exploitation sites. Finally, these scenarios can be published as interactive web scenes for internal, group or public consultation. In this way, problems such as the impact of new constructions being built, the re-arranging of green spaces or changes to public transportation routes are revealed through impact and visibility analysis or shadowing analysis and are brought to citizens' attention. This leads to better decisions.

  13. Research on a dynamic workflow access control model

    NASA Astrophysics Data System (ADS)

    Liu, Yiliang; Deng, Jinxia

    2007-12-01

    In recent years, access control technology has been widely researched in workflow systems. Two typical technologies are the RBAC (Role-Based Access Control) and TBAC (Task-Based Access Control) models, which have been used successfully, to a certain extent, for role authorization and assignment. However, as a system's structure grows more complicated, these two technologies cannot be used to minimize privileges and separate duties, and they are inapplicable when users frequently request changes to the workflow's process. To avoid these weaknesses in application, a variable-flow dynamic role_task_view fine-grained access control model (DRTVBAC for short) is constructed on the basis of existing models. In applying this model, an algorithm is constructed to meet users' application requirements and security needs under the fine-grained principle of least privilege and the principle of dynamic separation of duties. The DRTVBAC model has been implemented in an actual system. The implementation shows that dynamic management of the tasks associated with a role makes role assignment more flexible in granting and revoking authority; that the model satisfies the principle of least privilege when a role activates the permission for a specific task; that it separates authority from the process of completing duties in the workflow; that it prevents disclosure of sensitive information through a concise, dynamic view interface; and that it satisfies the requirement of frequently changing task flows.
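
    The two principles named above, least privilege tied to the active task and dynamic separation of duties, can be sketched as a single permission check. This is not the DRTVBAC algorithm, which the abstract does not specify; the role table, the conflict set and the may_execute function are hypothetical and only illustrate the general idea.

```python
from dataclasses import dataclass, field

# Hypothetical policy data: which tasks each role may perform, and which
# pairs of tasks must not be performed by the same user in one instance.
ROLE_TASKS = {"clerk": {"submit"}, "manager": {"submit", "approve"}}
CONFLICTS = {frozenset({"submit", "approve"})}


@dataclass
class WorkflowInstance:
    active_task: str
    history: dict = field(default_factory=dict)  # task name -> user who executed it


def may_execute(user, role, task, instance):
    """Decide whether `user`, acting in `role`, may execute `task` right now."""
    # Least privilege: a permission is usable only while its task is active.
    if task != instance.active_task:
        return False
    # Role check: the role must be authorized for the task at all.
    if task not in ROLE_TASKS.get(role, set()):
        return False
    # Dynamic separation of duties: reject a user who already executed
    # a conflicting task in this workflow instance.
    for done_task, done_user in instance.history.items():
        if done_user == user and frozenset({done_task, task}) in CONFLICTS:
            return False
    return True


if __name__ == "__main__":
    inst = WorkflowInstance(active_task="approve", history={"submit": "alice"})
    print(may_execute("alice", "manager", "approve", inst))
    print(may_execute("bob", "manager", "approve", inst))
```

    Here alice is refused because she submitted the request that is now awaiting approval, while bob, holding the same role, is allowed.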

  14. Workflow in interventional radiology: nerve blocks and facet blocks

    NASA Astrophysics Data System (ADS)

    Siddoway, Donald; Ingeholm, Mary Lou; Burgert, Oliver; Neumuth, Thomas; Watson, Vance; Cleary, Kevin

    2006-03-01

    Workflow analysis has the potential to dramatically improve the efficiency and clinical outcomes of medical procedures. In this study, we recorded the workflow for nerve block and facet block procedures in the interventional radiology suite at Georgetown University Hospital in Washington, DC, USA. We employed a custom client/server software architecture developed by the Innovation Center for Computer Assisted Surgery (ICCAS) at the University of Leipzig, Germany. This software runs in an internet browser and allows the user to record the actions taken by the physician during a procedure. The data recorded during the procedure are stored as an XML document, which can then be further processed. We have successfully gathered data on a number of cases using a tablet PC, and these preliminary results show the feasibility of using this software in an interventional radiology setting. We are currently accruing additional cases, and when more data have been collected we will analyze the workflow of these procedures to look for inefficiencies and potential improvements.
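
    As a minimal sketch of recording procedure actions as XML and processing them afterwards, the example below uses Python's standard xml.etree.ElementTree module. The element and attribute names (procedure, action, actor, start, stop) and the sample actions are assumptions for illustration; the abstract does not describe the ICCAS schema.

```python
import xml.etree.ElementTree as ET


def record_actions(procedure, actions):
    """Serialize (actor, action, start_s, stop_s) tuples into an XML string.

    The element and attribute names are hypothetical, not the ICCAS format.
    """
    root = ET.Element("procedure", name=procedure)
    for actor, action, start_s, stop_s in actions:
        ET.SubElement(
            root, "action",
            actor=actor, name=action, start=str(start_s), stop=str(stop_s),
        )
    return ET.tostring(root, encoding="unicode")


def total_duration(xml_text):
    """Sum the durations, in seconds, of all recorded actions."""
    root = ET.fromstring(xml_text)
    return sum(
        float(a.get("stop")) - float(a.get("start")) for a in root.iter("action")
    )


if __name__ == "__main__":
    doc = record_actions(
        "nerve block",
        [("physician", "position needle", 0, 90), ("physician", "inject", 90, 120)],
    )
    print(doc)
    print(total_duration(doc))
```

    Once the actions are stored this way, later analysis steps such as finding the longest action or comparing cases reduce to queries over the parsed tree.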

  15. Improved compliance by BPM-driven workflow automation.

    PubMed

    Holzmüller-Laue, Silke; Göde, Bernd; Fleischer, Heidi; Thurow, Kerstin

    2014-12-01

    Using methods and technologies of business process management (BPM) for the laboratory automation has important benefits (i.e., the agility of high-level automation processes, rapid interdisciplinary prototyping and implementation of laboratory tasks and procedures, and efficient real-time process documentation). A principal goal of the model-driven development is the improved transparency of processes and the alignment of process diagrams and technical code. First experiences of using the business process model and notation (BPMN) show that easy-to-read graphical process models can achieve and provide standardization of laboratory workflows. The model-based development allows one to change processes quickly and an easy adaption to changing requirements. The process models are able to host work procedures and their scheduling in compliance with predefined guidelines and policies. Finally, the process-controlled documentation of complex workflow results addresses modern laboratory needs of quality assurance. BPMN 2.0 as an automation language to control every kind of activity or subprocess is directed to complete workflows in end-to-end relationships. BPMN is applicable as a system-independent and cross-disciplinary graphical language to document all methods in laboratories (i.e., screening procedures or analytical processes). That means, with the BPM standard, a communication method of sharing process knowledge of laboratories is also available.

  16. Automated quality control in a file-based broadcasting workflow

    NASA Astrophysics Data System (ADS)

    Zhang, Lina

    2014-04-01

    Benefiting from the development of information and internet technologies, television broadcasting is transforming from inefficient tape-based production and distribution to integrated file-based workflows. However, no matter how many changes have taken place, successful broadcasting still depends on the ability to deliver a consistent, high-quality signal to audiences. After the transition from tape to file, traditional methods of manual quality control (QC) become inadequate, subjective, and inefficient. Based on China Central Television's fully file-based workflow at its new site, this paper introduces an automated quality control test system for accurate detection of hidden problems in media content. It discusses the system framework and workflow control when automated QC is added. It puts forward a QC criterion and presents QC software that follows this criterion. It also reports experiments on QC speed using parallel processing and distributed computing. The performance of the test system shows that adopting automated QC can make production effective and efficient and help the station achieve a competitive advantage in the media market.

  17. Mechanisms for Robust Cognition

    ERIC Educational Resources Information Center

    Walsh, Matthew M.; Gluck, Kevin A.

    2015-01-01

    To function well in an unpredictable environment using unreliable components, a system must have a high degree of robustness. Robustness is fundamental to biological systems and is an objective in the design of engineered systems such as airplane engines and buildings. Cognitive systems, like biological and engineered systems, exist within…

  18. PREDON Scientific Data Preservation 2014

    NASA Astrophysics Data System (ADS)

    Diaconu, C.; Kraml, S.; Surace, C.; Chateigner, D.; Libourel, T.; Laurent, A.; Lin, Y.; Schaming, M.; Benbernou, S.; Lebbah, M.; Boucon, D.; Cérin, C.; Azzag, H.; Mouron, P.; Nief, J.-Y.; Coutin, S.; Beckmann, V.

    Scientific data collected with modern sensors or dedicated detectors very often exceed the perimeter of the initial scientific design. These data are obtained more and more frequently with large material and human efforts. A large class of scientific experiments are in fact unique because of their large scale, with very small chances of being repeated and superseded by new experiments in the same domain: for instance, high energy physics and astrophysics experiments involve multi-annual developments, and a simple duplication of efforts in order to reproduce old data is simply not affordable. Other scientific experiments are unique by nature (earth science, medical sciences, etc.), since the collected data are "time-stamped" and thereby non-reproducible by new experiments or observations. In addition, scientific data collection has increased dramatically in recent years, contributing to the so-called "data deluge" and inviting common reflection in the context of "big data" investigations. The new knowledge obtained using these data should be preserved long term so that access and re-use are made possible and lead to an enhancement of the initial investment. Data observatories, based on open access policies and coupled with multi-disciplinary techniques for indexing and mining, may lead to truly new paradigms in science. It is therefore of utmost importance to pursue a coherent and vigorous approach to preserving scientific data over the long term. Preservation nevertheless remains a challenge due to the complexity of the data structure, the fragility of the custom-made software environments, and the lack of rigorous approaches in workflows and algorithms. To address this challenge, the PREDON project was initiated in France in 2012 within the MASTODONS program: a Big Data scientific challenge, initiated and supported by the Interdisciplinary Mission of the National Centre for Scientific Research (CNRS). 
PREDON is a study group formed by

  19. Argo: enabling the development of bespoke workflows and services for disease annotation

    PubMed Central

    Batista-Navarro, Riza; Carter, Jacob; Ananiadou, Sophia

    2016-01-01

    Argo (http://argo.nactem.ac.uk) is a generic text mining workbench that can cater to a variety of use cases, including the semi-automatic annotation of literature. It enables its technical users to build their own customised text mining solutions by providing a wide array of interoperable and configurable elementary components that can be seamlessly integrated into processing workflows. With Argo's graphical annotation interface, domain experts can then make use of the workflows' automatically generated output to curate information of interest. With the continuously rising need to understand the aetiology of diseases as well as the demand for their informed diagnosis and personalised treatment, the curation of disease-relevant information from medical and clinical documents has become an indispensable scientific activity. In the Fifth BioCreative Challenge Evaluation Workshop (BioCreative V), there was substantial interest in the mining of literature for disease-relevant information. Apart from a panel discussion focussed on disease annotations, the chemical-disease relations (CDR) track was also organised to foster the sharing and advancement of disease annotation tools and resources. This article presents the application of Argo’s capabilities to the literature-based annotation of diseases. As part of our participation in BioCreative V’s User Interactive Track (IAT), we demonstrated and evaluated Argo’s suitability to the semi-automatic curation of chronic obstructive pulmonary disease (COPD) phenotypes. Furthermore, the workbench facilitated the development of some of the CDR track’s top-performing web services for normalising disease mentions against the Medical Subject Headings (MeSH) database. In this work, we highlight Argo’s support for developing various types of bespoke workflows ranging from ones which enabled us to easily incorporate information from various databases, to those which train and apply machine learning-based concept recognition

  20. Argo: enabling the development of bespoke workflows and services for disease annotation.

    PubMed

    Batista-Navarro, Riza; Carter, Jacob; Ananiadou, Sophia

    2016-01-01

    Argo (http://argo.nactem.ac.uk) is a generic text mining workbench that can cater to a variety of use cases, including the semi-automatic annotation of literature. It enables its technical users to build their own customised text mining solutions by providing a wide array of interoperable and configurable elementary components that can be seamlessly integrated into processing workflows. With Argo's graphical annotation interface, domain experts can then make use of the workflows' automatically generated output to curate information of interest. With the continuously rising need to understand the aetiology of diseases as well as the demand for their informed diagnosis and personalised treatment, the curation of disease-relevant information from medical and clinical documents has become an indispensable scientific activity. In the Fifth BioCreative Challenge Evaluation Workshop (BioCreative V), there was substantial interest in the mining of literature for disease-relevant information. Apart from a panel discussion focussed on disease annotations, the chemical-disease relations (CDR) track was also organised to foster the sharing and advancement of disease annotation tools and resources. This article presents the application of Argo's capabilities to the literature-based annotation of diseases. As part of our participation in BioCreative V's User Interactive Track (IAT), we demonstrated and evaluated Argo's suitability to the semi-automatic curation of chronic obstructive pulmonary disease (COPD) phenotypes. Furthermore, the workbench facilitated the development of some of the CDR track's top-performing web services for normalising disease mentions against the Medical Subject Headings (MeSH) database. In this work, we highlight Argo's support for developing various types of bespoke workflows ranging from ones which enabled us to easily incorporate information from various databases, to those which train and apply machine learning-based concept recognition models

  1. Concept and application of a computational vaccinology workflow

    PubMed Central

    2010-01-01

    Background The last years have seen a renaissance of the vaccine area, driven by clinical needs in infectious diseases but also chronic diseases such as cancer and autoimmune disorders. Equally important are technological improvements involving nano-scale delivery platforms as well as third generation adjuvants. In parallel immunoinformatics routines have reached essential maturity for supporting central aspects in vaccinology going beyond prediction of antigenic determinants. On this basis computational vaccinology has emerged as a discipline aimed at ab-initio rational vaccine design. Here we present a computational workflow for implementing computational vaccinology covering aspects from vaccine target identification to functional characterization and epitope selection supported by a Systems Biology assessment of central aspects in host-pathogen interaction. We exemplify the procedures for Epstein Barr Virus (EBV), a clinically relevant pathogen causing chronic infection and suspected of triggering malignancies and autoimmune disorders. Results We introduce pBone/pView as a computational workflow supporting design and execution of immunoinformatics workflow modules, additionally involving aspects of results visualization, knowledge sharing and re-use. Specific elements of the workflow involve identification of vaccine targets in the realm of a Systems Biology assessment of host-pathogen interaction for identifying functionally relevant targets, as well as various methodologies for delineating B- and T-cell epitopes with particular emphasis on broad coverage of viral isolates as well as MHC alleles. Applying the workflow on EBV specifically proposes sequences from the viral proteins LMP2, EBNA2 and BALF4 as vaccine targets holding specific B- and T-cell epitopes promising broad strain and allele coverage. 
    Conclusion Based on advancements in the experimental assessment of genomes, transcriptomes and proteomes for both pathogen and (human) host, the fundaments for

  2. Replication and robustness in developmental research.

    PubMed

    Duncan, Greg J; Engel, Mimi; Claessens, Amy; Dowsett, Chantelle J

    2014-11-01

    Replications and robustness checks are key elements of the scientific method and a staple in many disciplines. However, leading journals in developmental psychology rarely include explicit replications of prior research conducted by different investigators, and few require authors to establish in their articles or online appendices that their key results are robust across estimation methods, data sets, and demographic subgroups. This article makes the case for prioritizing both explicit replications and, especially, within-study robustness checks in developmental psychology. It provides evidence on variation in effect sizes in developmental studies and documents strikingly different replication and robustness-checking practices in a sample of journals in developmental psychology and a sister behavioral science, applied economics. Our goal is not to show that any one behavioral science has a monopoly on best practices, but rather to show how journals from a related discipline address vital concerns of replication and generalizability shared by all social and behavioral sciences. We provide recommendations for promoting graduate training in replication and robustness-checking methods and for editorial policies that encourage these practices. Although some of our recommendations may shift the form and substance of developmental research articles, we argue that they would generate considerable scientific benefits for the field. (PsycINFO Database Record (c) 2014 APA, all rights reserved).

  3. Workflow in Clinical Trial Sites & Its Association with Near Miss Events for Data Quality: Ethnographic, Workflow & Systems Simulation

    PubMed Central

    Araujo de Carvalho, Elias Cesar; Batilana, Adelia Portero; Claudino, Wederson; Lima Reis, Luiz Fernando; Schmerling, Rafael A.; Shah, Jatin; Pietrobon, Ricardo

    2012-01-01

    Background With the exponential expansion of clinical trials conducted in BRIC (Brazil, Russia, India, and China) and VISTA (Vietnam, Indonesia, South Africa, Turkey, and Argentina) countries, corresponding gains in cost and enrolment efficiency quickly outpace the consonant metrics in traditional countries in North America and European Union. However, questions still remain regarding the quality of data being collected in these countries. We used ethnographic, mapping and computer simulation studies to identify/address areas of threat to near miss events for data quality in two cancer trial sites in Brazil. Methodology/Principal Findings Two sites in São Paulo and Rio de Janeiro were evaluated using ethnographic observations of workflow during subject enrolment and data collection. Emerging themes related to threats to near miss events for data quality were derived from observations. They were then transformed into workflows using UML-AD and modeled using System Dynamics. 139 tasks were observed and mapped through the ethnographic study. The UML-AD detected four major activities in the workflow: evaluation of potential research subjects prior to signature of informed consent, visit to obtain subject's informed consent, regular data collection sessions following study protocol and closure of study protocol for a given project. Field observations pointed to three major emerging themes: (a) lack of standardized process for data registration at source document, (b) multiplicity of data repositories and (c) scarcity of decision support systems at the point of research intervention. Simulation with policy model demonstrates a reduction of the rework problem. Conclusions/Significance Patterns of threats to data quality at the two sites were similar to the threats reported in the literature for American sites. The clinical trial site managers need to reorganize staff workflow by using information technology more efficiently, establish new standard procedures and manage

  4. A data management and publication workflow for a large-scale, heterogeneous sensor network.

    PubMed

    Jones, Amber Spackman; Horsburgh, Jeffery S; Reeder, Stephanie L; Ramírez, Maurier; Caraballo, Juan

    2015-06-01

    It is common for hydrology researchers to collect data using in situ sensors at high frequencies, for extended durations, and with spatial distributions that produce data volumes requiring infrastructure for data storage, management, and sharing. The availability and utility of these data in addressing scientific questions related to water availability, water quality, and natural disasters rely on effective cyberinfrastructure that facilitates transformation of raw sensor data into usable data products. They also depend on the ability of researchers to share and access the data in usable formats. In this paper, we describe a data management and publication workflow and software tools for research groups and sites conducting long-term monitoring using in situ sensors. Functionality includes the ability to track monitoring equipment inventory and events related to field maintenance. Linking this information to the observational data is imperative in ensuring the quality of sensor-based data products. We present these tools in the context of a case study for the innovative Urban Transitions and Aridregion Hydrosustainability (iUTAH) sensor network. The iUTAH monitoring network includes sensors at aquatic and terrestrial sites for continuous monitoring of common meteorological variables, snow accumulation and melt, soil moisture, surface water flow, and surface water quality. We present the overall workflow we have developed for effectively transferring data from field monitoring sites to ultimate end-users and describe the software tools we have deployed for storing, managing, and sharing the sensor data. These tools are all open source and available for others to use. PMID:25968554

  5. Scientific Misconduct.

    ERIC Educational Resources Information Center

    Goodstein, David

    2002-01-01

    Explores scientific fraud, asserting that while few scientists actually falsify results, the field has become so competitive that many are misbehaving in other ways; an example would be unreasonable criticism by anonymous peer reviewers. (EV)

  6. Modeling Workflow for the DOE Atmospheric Radiation Measurement Program's LES ARM Symbiotic Simulation and Observation (LASSO) Workflow

    NASA Astrophysics Data System (ADS)

    Gustafson, W. I., Jr.; Vogelmann, A. M.; Xiao, H.; Cheng, X.; Endo, S.; Li, Z.; Toto, T.

    2015-12-01

    The Department of Energy Atmospheric Radiation Measurement (ARM) Program is expanding its products to include routine large-eddy simulation (LES) modeling to complement its extensive suite of climate-relevant observations, with the name of the new venture being the "LES ARM Symbiotic Simulation and Observation (LASSO) Workflow". Decisions are currently being made regarding how best to configure both the specific model to be used and the overall workflow that will be established. The initial focus of the routine modeling will be shallow convection at the ARM megasite in Oklahoma with a vision toward expanding the modeling to include other meteorological conditions once the routine modeling has been established. This presentation outlines the modeling portion of the workflow that includes generation of multiple forcing datasets and ensemble LES runs. The goal of the ensembles is to gauge the uncertainty of the forcings from event-to-event and to help derive a best estimate representation of the atmosphere over the megasite. This will then be used to construct "data cubes" that combine observations with the model output. A companion presentation by Vogelmann et al. presents the data cube concept that optimizes usage of observations with the LES.

  7. BEAM: A computational workflow system for managing and modeling material characterization data in HPC environments

    SciTech Connect

    Lingerfelt, Eric J; Endeve, Eirik; Ovchinnikov, Oleg S; Borreguero Calvo, Jose M; Park, Byung H; Archibald, Richard K; Symons, Christopher T; Kalinin, Sergei V; Messer, Bronson; Shankar, Mallikarjun; Jesse, Stephen

    2016-01-01

    Improvements in scientific instrumentation allow imaging at mesoscopic to atomic length scales, many spectroscopic modes, and now with the rise of multimodal acquisition systems and the associated processing capability the era of multidimensional, informationally dense data sets has arrived. Technical issues in these combinatorial scientific fields are exacerbated by computational challenges best summarized as a necessity for drastic improvement in the capability to transfer, store, and analyze large volumes of data. The Bellerophon Environment for Analysis of Materials (BEAM) platform provides material scientists the capability to directly leverage the integrated computational and analytical power of High Performance Computing (HPC) to perform scalable data analysis and simulation via an intuitive, cross-platform client user interface. This framework delivers authenticated, push-button execution of complex user workflows that deploy data analysis algorithms and computational simulations utilizing the converged compute-and-data infrastructure at Oak Ridge National Laboratory's (ORNL) Compute and Data Environment for Science (CADES) and HPC environments like Titan at the Oak Ridge Leadership Computing Facility (OLCF). In this work we address the underlying HPC needs for characterization in the material science community, elaborate how BEAM's design and infrastructure tackle those needs, and present a small sub-set of user cases where scientists utilized BEAM across a broad range of analytical techniques and analysis modes.

  8. Considering Time in Orthophotography Production: from a General Workflow to a Shortened Workflow for a Faster Disaster Response

    NASA Astrophysics Data System (ADS)

    Lucas, G.

    2015-08-01

    This article deals with production time for orthophoto imagery acquired with a medium-size digital frame camera. The workflow examination covers two main parts: data acquisition and post-processing. The objectives of the research are fourfold: 1/ gathering time references for the most important steps of orthophoto production (the literature turned out to be lacking on this topic); these figures are used later for total production time estimation; 2/ identifying levers for reducing orthophoto production time; 3/ building a simplified production workflow for emergency response, one that is faster and less demanding in accuracy, and comparing it to a classical workflow; 4/ providing methodical elements for estimating the production time of a custom project. In the data acquisition part, a comprehensive review lists and describes all the factors that may affect acquisition efficiency. Using a simulation with different variables (average line length, time of the turns, flight speed), their effect on acquisition efficiency is quantitatively examined. Regarding post-processing, the time reference figures were collected from the processing of a 1000-frame case study with 15 cm GSD covering a rectangular area of 447 km²; the time required to achieve each step during production was recorded. When several technical options are possible, each one is tested and its time documented so that all alternatives are available. Based on a technical choice for the workflow and using the compiled time references of the elementary steps, a total time is calculated for the post-processing of the 1000 frames. Two scenarios are compared with regard to time and accuracy. The first one follows the "normal" practices, comprising triangulation, orthorectification and advanced mosaicking methods (feature detection, seam line editing and seam applicator); the second is simplified and makes compromises over positional accuracy (using direct geo-referencing) and seamline preparation in order to achieve

  9. Evaluation of User Interface and Workflow Design of a Bedside Nursing Clinical Decision Support System

    PubMed Central

    Yuan, Michael Juntao; Finley, George Mike; Mills, Christy; Johnson, Ron Kim

    2013-01-01

    Background Clinical decision support systems (CDSS) are important tools to improve health care outcomes and reduce preventable medical adverse events. However, the effectiveness and success of CDSS depend on their implementation context and usability in complex health care settings. As a result, usability design and validation, especially in real world clinical settings, are crucial aspects of successful CDSS implementations. Objective Our objective was to develop a novel CDSS to help frontline nurses better manage critical symptom changes in hospitalized patients, hence reducing preventable failure to rescue cases. A robust user interface and implementation strategy that fit into existing workflows was key for the success of the CDSS. Methods Guided by a formal usability evaluation framework, UFuRT (user, function, representation, and task analysis), we developed a high-level specification of the product that captures key usability requirements and is flexible to implement. We interviewed users of the proposed CDSS to identify requirements, listed functions, and operations the system must perform. We then designed visual and workflow representations of the product to perform the operations. The user interface and workflow design were evaluated via heuristic and end user performance evaluation. The heuristic evaluation was done after the first prototype, and its results were incorporated into the product before the end user evaluation was conducted. First, we recruited 4 evaluators with strong domain expertise to study the initial prototype. Heuristic violations were coded and rated for severity. Second, after development of the system, we assembled a panel of nurses, consisting of 3 licensed vocational nurses and 7 registered nurses, to evaluate the user interface and workflow via simulated use cases. We recorded whether each session was successfully completed and its completion time. Each nurse was asked to use the National Aeronautics and Space Administration

  10. Robust Methods in QSAR

    NASA Astrophysics Data System (ADS)

    Walczak, Beata; Daszykowski, Michał; Stanimirova, Ivana

    Considerable progress has been made in recent years in the development of robust methods as an efficient tool for processing data contaminated with outlying objects. Outliers in QSAR studies are usually the result of an improper calculation of some molecular descriptors and/or experimental error in determining the property to be modelled. They greatly influence any least squares model, and therefore the conclusions about the biological activity of a potential component based on such a model are misleading. With the use of robust approaches, one can solve this problem by building a robust model that describes the majority of the data well. On the other hand, the proper identification of outliers may pinpoint a new direction of drug development. The assessment of outliers can be done exclusively with robust methods, and these methods are described in this chapter.

  11. Inter-Observer Reliability Assessments in Time Motion Studies: The Foundation for Meaningful Clinical Workflow Analysis

    PubMed Central

    Lopetegui, Marcelo A.; Bai, Shasha; Yen, Po-Yin; Lai, Albert; Embi, Peter; Payne, Philip R.O.

    2013-01-01

    Understanding clinical workflow is critical for researchers and healthcare decision makers. Current workflow studies tend to oversimplify and underrepresent the complexity of clinical workflow. Continuous observation time motion studies (TMS) could enhance clinical workflow studies by providing rich quantitative data required for in-depth workflow analyses. However, methodological inconsistencies have been reported in continuous observation TMS, potentially reducing the validity of TMS’ data and limiting their contribution to the general state of knowledge. We believe that a cornerstone in standardizing TMS is to ensure the reliability of the human observers. In this manuscript we review the approaches for inter-observer reliability assessment (IORA) in a representative sample of TMS focusing on clinical workflow. We found that IORA is an uncommon practice, inconsistently reported, and often uses methods that provide partial and overestimated measures of agreement. Since a comprehensive approach to IORA is yet to be proposed and validated, we provide initial recommendations for IORA reporting in continuous observation TMS. PMID:24551381

  12. Observing health professionals' workflow patterns for diabetes care - First steps towards an ontology for EHR services.

    PubMed

    Schweitzer, M; Lasierra, N; Hoerbst, A

    2015-01-01

    Increasing flexibility from the user's perspective and enabling workflow-based interaction facilitates easy, user-friendly utilization of EHRs in healthcare professionals' daily work. To offer such versatile EHR functionality, our approach is based on the execution of clinical workflows by means of a composition of semantic web services. The backbone of such an architecture is an ontology that makes it possible to represent clinical workflows and facilitates the selection of suitable services. In this paper we present the methods and results of observations of routine diabetes consultations, which were conducted in order to identify those workflows and the relations among the included tasks. These workflows were first modeled in BPMN and then generalized. As a following step in our study, interviews will be conducted with clinical personnel to validate the modeled workflows.

  13. Definition of astronomical data analysis workflows on a service-oriented Grid architecture using Business Process Execution Language

    NASA Astrophysics Data System (ADS)

    Manna, V.; Cascone, E.; Capasso, G.; Tortone, G.

    2006-07-01

    Many international Grid projects have been launched recently in order to face the enormous amount of astronomical data which is produced and/or will be produced in the next years by the new generation of large telescopes and CCD mosaic detectors. Grid is the effective and natural solution for CCD mosaic data processing but, due to the complex composition of astronomical applications, a Grid workflow management system is required in order to simplify the description of nested task execution, monitoring and data handling. The recently released Business Process Execution Language for Web Services (BPEL4WS) specification is positioned to become the standard for Web services composition. It allows you to create complex processes by creating and wiring together different activities that can, for example, perform Web services invocations, manipulate data, throw faults, or terminate a process. The evolution towards a service-oriented architecture, supported by emerging standards, is an activity that has attracted much attention. This issue is being tackled within the EU-funded EGEE project (Enabling Grids for E-science in Europe) whose primary goals are the provision of robust middleware components and the creation of reliable and dependable Grid infrastructure to support e-Science applications, using an architecture that offers a Web Service interface. In this paper we present a proposal for the execution of astronomical data analysis workflows on a service-oriented Grid architecture using the new standard language BPEL4WS.

  14. Accelerating Medical Research using the Swift Workflow System

    PubMed Central

    STEF-PRAUN, Tiberiu; CLIFFORD, Benjamin; FOSTER, Ian; HASSON, Uri; HATEGAN, Mihael; SMALL, Steven L.; WILDE, Michael; ZHAO, Yong

    2009-01-01

    Both medical research and clinical practice are starting to involve large quantities of data and to require large-scale computation, as a result of the digitization of many areas of medicine. For example, in brain research – the domain that we consider here – a single research study may require the repeated processing, using computationally demanding and complex applications, of thousands of files corresponding to hundreds of functional MRI studies. Execution efficiency demands the use of parallel or distributed computing, but few medical researchers have the time or expertise to write the necessary parallel programs. The Swift system addresses these concerns. A simple scripting language, SwiftScript, provides for the concise high-level specification of workflows that invoke various application programs on potentially large quantities of data. The Swift engine provides for the efficient execution of these workflows on sequential computers, parallel computers, and/or distributed grids that federate the computing resources of many sites. Last but not least, the Swift provenance catalog keeps track of all actions performed, addressing vital bookkeeping functions that so often cause difficulties in large computations. To illustrate the use of Swift for medical research, we describe its use for the analysis of functional MRI data as part of a research project examining the neurological mechanisms of recovery from aphasia after stroke. We show how SwiftScript is used to encode an application workflow, and present performance results that demonstrate our ability to achieve significant speedups on both a local parallel computing cluster and multiple parallel clusters at distributed sites. PMID:17476063

  15. MBAT: A scalable informatics system for unifying digital atlasing workflows

    PubMed Central

    2010-01-01

    Background Digital atlases provide a common semantic and spatial coordinate system that can be leveraged to compare, contrast, and correlate data from disparate sources. As the quality and amount of biological data continue to advance and grow, searching, referencing, and comparing this data with a researcher's own data is essential. However, the integration process is cumbersome and time-consuming due to misaligned data, implicitly defined associations, and incompatible data sources. This work addresses these challenges by providing a unified and adaptable environment to accelerate the workflow to gather, align, and analyze the data. Results The MouseBIRN Atlasing Toolkit (MBAT) project was developed as a cross-platform, free open-source application that unifies and accelerates the digital atlas workflow. A tiered, plug-in architecture was designed for the neuroinformatics and genomics goals of the project to provide a modular and extensible design. MBAT provides the ability to use a single query to search and retrieve data from multiple data sources, align image data using the user's preferred registration method, composite data from multiple sources in a common space, and link relevant informatics information to the current view of the data or atlas. The workspaces leverage tool plug-ins to extend and allow future extensions of the basic workspace functionality. A wide variety of tool plug-ins were developed that integrate pre-existing as well as newly created technology into each workspace. Novel atlasing features were also developed, such as supporting multiple label sets, dynamic selection and grouping of labels, and synchronized, context-driven display of ontological data. Conclusions MBAT empowers researchers to discover correlations among disparate data by providing a unified environment for bringing together distributed reference resources, a user's image data, and biological atlases into the same spatial or semantic context. Through its extensible

  16. A practical data processing workflow for multi-OMICS projects.

    PubMed

    Kohl, Michael; Megger, Dominik A; Trippler, Martin; Meckel, Hagen; Ahrens, Maike; Bracht, Thilo; Weber, Frank; Hoffmann, Andreas-Claudius; Baba, Hideo A; Sitek, Barbara; Schlaak, Jörg F; Meyer, Helmut E; Stephan, Christian; Eisenacher, Martin

    2014-01-01

    Multi-OMICS approaches aim at the integration of quantitative data obtained for different biological molecules in order to understand their interrelation and the functioning of larger systems. This paper deals with several data integration and data processing issues that frequently occur within this context. To this end, the data processing workflow within the PROFILE project is presented, a multi-OMICS project that aims at the identification of novel biomarkers and the development of new therapeutic targets for seven important liver diseases. Furthermore, a software tool called CrossPlatformCommander is sketched, which facilitates several steps of the proposed workflow in a semi-automatic manner. Application of the software is presented for the detection of novel biomarkers, their ranking and annotation with existing knowledge using the example of corresponding Transcriptomics and Proteomics data sets obtained from patients suffering from hepatocellular carcinoma. Additionally, a linear regression analysis of Transcriptomics vs. Proteomics data is presented and its performance assessed. It was shown that, for capturing profound relations between Transcriptomics and Proteomics data, a simple linear regression analysis is not sufficient and implementation and evaluation of alternative statistical approaches are needed. Additionally, the integration of multivariate variable selection and classification approaches is intended for further development of the software. Although this paper focuses only on the combination of data obtained from quantitative Proteomics and Transcriptomics experiments, several approaches and data integration steps are also applicable for other OMICS technologies. Keeping specific restrictions in mind, the suggested workflow (or at least parts of it) may be used as a template for similar projects that make use of different high throughput techniques. This article is part of a Special Issue entitled: Computational Proteomics in the Post

  17. Leveraging an existing data warehouse to annotate workflow models for operations research and optimization.

    PubMed

    Borlawsky, Tara; LaFountain, Jeanne; Petty, Lynda; Saltz, Joel H; Payne, Philip R O

    2008-11-06

    Workflow analysis is frequently performed in the context of operations research and process optimization. In order to develop a data-driven workflow model that can be employed to assess opportunities to improve the efficiency of perioperative care teams at The Ohio State University Medical Center (OSUMC), we have developed a method for integrating standard workflow modeling formalisms, such as UML activity diagrams, with data-centric annotations derived from our existing data warehouse.

  18. Standardizing Clinical Trials Workflow Representation in UML for International Site Comparison

    PubMed Central

    de Carvalho, Elias Cesar Araujo; Jayanti, Madhav Kishore; Batilana, Adelia Portero; Kozan, Andreia M. O.; Rodrigues, Maria J.; Shah, Jatin; Loures, Marco R.; Patil, Sunita; Payne, Philip; Pietrobon, Ricardo

    2010-01-01

    Background With the globalization of clinical trials, a growing emphasis has been placed on the standardization of the workflow in order to ensure the reproducibility and reliability of the overall trial. Despite the importance of workflow evaluation, to our knowledge no previous studies have attempted to adapt existing modeling languages to standardize the representation of clinical trials. Unified Modeling Language (UML) is a computational language that can be used to model operational workflow, and a UML profile can be developed to standardize UML models within a given domain. This paper's objective is to develop a UML profile to extend the UML Activity Diagram schema into the clinical trials domain, defining a standard representation for clinical trial workflow diagrams in UML. Methods Two Brazilian clinical trial sites in rheumatology and oncology were examined to model their workflow and collect time-motion data. UML modeling was conducted in Eclipse, and a UML profile was developed to incorporate information used in discrete event simulation software. Results Ethnographic observation revealed bottlenecks in workflow: these included tasks requiring full commitment of CRCs, transferring notes from paper to computers, deviations from standard operating procedures, and conflicts between different IT systems. Time-motion analysis revealed that nurses' activities took up the most time in the workflow and contained a high frequency of shorter duration activities. Administrative assistants performed more activities near the beginning and end of the workflow. Overall, clinical trial tasks had a greater frequency than clinic routines or other general activities. Conclusions This paper describes a method for modeling clinical trial workflow in UML and standardizing these workflow diagrams through a UML profile. In the increasingly global environment of clinical trials, the standardization of workflow modeling is a necessary precursor to conducting a comparative

  19. Scientific millenarianism

    SciTech Connect

    Weinberg, A.M.

    1997-12-01

    Today, for the first time, scientific concerns are seriously being addressed that span future times--hundreds, even thousands, or more years in the future. One is witnessing what the author calls scientific millenarianism. Are such concerns for the distant future exercises in futility, or are they real issues that, to the everlasting gratitude of future generations, this generation has identified, warned about and even suggested how to cope with in the distant future? Can the four potential catastrophes--bolide impact, CO2 warming, radioactive wastes and thermonuclear war--be avoided by technical fixes, institutional responses, religion, or by doing nothing? These are the questions addressed in this paper.

  20. SMITH: a LIMS for handling next-generation sequencing workflows

    PubMed Central

    2014-01-01

    workflows are available through an API provided by the workflow management system. The parameters and input data are passed to the workflow engine that performs de-multiplexing, quality control, alignments, etc. Conclusions SMITH standardizes, automates, and speeds up sequencing workflows. Annotation of data with key-value pairs facilitates meta-analysis. PMID:25471934

  1. Reference and PDF-manager software: complexities, support and workflow.

    PubMed

    Mead, Thomas L; Berryman, Donna R

    2010-10-01

    In the past, librarians taught reference management by training library users to use established software programs such as RefWorks or EndNote. In today's environment, there is a proliferation of Web-based programs that are being used by library clientele that offer a new twist on the well-known reference management programs. Basically, these new programs are PDF-manager software (e.g., Mendeley or Papers). Librarians are faced with new questions, issues, and concerns, given the new workflows and pathways that these PDF-manager programs present. This article takes a look at some of those.

  2. CMS Data Processing Workflows during an Extended Cosmic Ray Run

    SciTech Connect

    Not Available

    2009-11-01

    The CMS Collaboration conducted a month-long data taking exercise, the Cosmic Run At Four Tesla, during October-November 2008, with the goal of commissioning the experiment for extended operation. With all installed detector systems participating, CMS recorded 270 million cosmic ray events with the solenoid at a magnetic field strength of 3.8 T. This paper describes the data flow from the detector through the various online and offline computing systems, as well as the workflows used for recording the data, for aligning and calibrating the detector, and for analysis of the data.

  3. Robust scientific evidence demonstrates benefits of artificial sweeteners

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Artificial sweeteners (AS) have not been found to have a negative impact on health in humans. They have been recommended as a safe alternative for individuals who are seeking to lose or maintain weight. However, unnecessary alarm has been raised regarding the potential health risks of AS. This is of...

  4. Design and implementation of a secure workflow system based on PKI/PMI

    NASA Astrophysics Data System (ADS)

    Yan, Kai; Jiang, Chao-hui

    2013-03-01

    Traditional workflow systems have several weaknesses in privilege management: low privilege-management efficiency, an overburdened administrator, and the lack of a trusted authority. After an in-depth study of the security requirements of workflow systems, a secure workflow model based on PKI/PMI is proposed. This model achieves static and dynamic authorization by verifying the user's identity through a public key certificate (PKC) and validating the user's privilege information through an attribute certificate (AC) in the workflow system. Practice shows that this system can meet the security requirements of a WfMS. Moreover, it not only improves system security but also ensures the integrity, confidentiality, availability, and non-repudiation of the data in the system.

  5. An Auto-management Thesis Program WebMIS Based on Workflow

    NASA Astrophysics Data System (ADS)

    Chang, Li; Jie, Shi; Weibo, Zhong

    An auto-management WebMIS based on workflow for a bachelor thesis program is presented in this paper. A workflow-dispatching module is designed and implemented using MySQL and J2EE according to the working principle of a workflow engine. The module can automatically dispatch the workflow according to the system date, the login information, and the work status of the user. The WebMIS moves thesis management from manual work to computer-based work, which not only standardizes the thesis program but also keeps the data and documents clean and consistent.

  6. Speaking Scientific

    ERIC Educational Resources Information Center

    Mason, Peter

    1971-01-01

    Suggests changes for science curricula which will improve "the understanding...of the scientific language in which the ideas of science and technology are expressed," including increasing the students' facility with numbers, and in the future, an interdisciplinary course demonstrating the approach of physical, biological and behavioral scientists,…

  7. DNA barcode-based delineation of putative species: efficient start for taxonomic workflows

    PubMed Central

    Kekkonen, Mari; Hebert, Paul D N

    2014-01-01

    The analysis of DNA barcode sequences with varying techniques for cluster recognition provides an efficient approach for recognizing putative species (operational taxonomic units, OTUs). This approach accelerates and improves taxonomic workflows by exposing cryptic species and decreasing the risk of synonymy. This study tested the congruence of OTUs resulting from the application of three analytical methods (ABGD, BIN, GMYC) to sequence data for Australian hypertrophine moths. OTUs supported by all three approaches were viewed as robust, but 20% of the OTUs were only recognized by one or two of the methods. These OTUs were examined for three criteria to clarify their status. Monophyly and diagnostic nucleotides were both uninformative, but information on ranges was useful as sympatric sister OTUs were viewed as distinct, while allopatric OTUs were merged. This approach revealed 124 OTUs of Hypertrophinae, a more than twofold increase from the currently recognized 51 species. Because this analytical protocol is both fast and repeatable, it provides a valuable tool for establishing a basic understanding of species boundaries that can be validated with subsequent studies. PMID:24479435
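    The congruence test described above can be illustrated with a small sketch. It assumes each method (ABGD, BIN, GMYC) returns a partition of specimen IDs into clusters and treats an OTU as robust only when the identical cluster appears in all three partitions. The specimen IDs and cluster assignments are invented for illustration and are not data from the study.

```python
def robust_otus(*partitions):
    """Return the clusters (as frozensets of specimen IDs) found by every method."""
    as_sets = [{frozenset(cluster) for cluster in p} for p in partitions]
    return set.intersection(*as_sets)


def contested_otus(*partitions):
    """Return the clusters proposed by at least one method but not by all."""
    as_sets = [{frozenset(cluster) for cluster in p} for p in partitions]
    return set.union(*as_sets) - set.intersection(*as_sets)


# Hypothetical partitions of five specimens by the three methods.
abgd = [{"s1", "s2"}, {"s3"}, {"s4", "s5"}]
bin_ = [{"s1", "s2"}, {"s3"}, {"s4"}, {"s5"}]
gmyc = [{"s1", "s2"}, {"s3", "s4", "s5"}]

robust = robust_otus(abgd, bin_, gmyc)
contested = contested_otus(abgd, bin_, gmyc)
print(sorted(sorted(c) for c in robust))
print(len(contested), "clusters need further criteria")
```

    In this toy case only the cluster {s1, s2} is supported by all three methods; the remaining clusters would go on to the follow-up criteria (monophyly, diagnostic nucleotides, ranges) named in the abstract.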

  8. Ergatis: a web interface and scalable software system for bioinformatics workflows

    PubMed Central

    Orvis, Joshua; Crabtree, Jonathan; Galens, Kevin; Gussman, Aaron; Inman, Jason M.; Lee, Eduardo; Nampally, Sreenath; Riley, David; Sundaram, Jaideep P.; Felix, Victor; Whitty, Brett; Mahurkar, Anup; Wortman, Jennifer; White, Owen; Angiuoli, Samuel V.

    2010-01-01

    Motivation: The growth of sequence data has been accompanied by an increasing need to analyze data on distributed computer clusters. The use of these systems for routine analysis requires scalable and robust software for data management of large datasets. Software is also needed to simplify data management and make large-scale bioinformatics analysis accessible and reproducible to a wide class of target users. Results: We have developed a workflow management system named Ergatis that enables users to build, execute and monitor pipelines for computational analysis of genomics data. Ergatis contains preconfigured components and template pipelines for a number of common bioinformatics tasks such as prokaryotic genome annotation and genome comparisons. Outputs from many of these components can be loaded into a Chado relational database. Ergatis was designed to be accessible to a broad class of users and provides a user friendly, web-based interface. Ergatis supports high-throughput batch processing on distributed compute clusters and has been used for data management in a number of genome annotation and comparative genomics projects. Availability: Ergatis is an open-source project and is freely available at http://ergatis.sourceforge.net Contact: jorvis@users.sourceforge.net PMID:20413634

  9. An MRM-based workflow for absolute quantitation of lysine-acetylated metabolic enzymes in mouse liver.

    PubMed

    Xu, Leilei; Wang, Fang; Xu, Ying; Wang, Yi; Zhang, Cuiping; Qin, Xue; Yu, Hongxiu; Yang, Pengyuan

    2015-12-01

    As a key post-translational modification mechanism, protein acetylation plays critical roles in regulating and/or coordinating cell metabolism. Acetylation is a prevalent modification process in enzymes. Protein acetylation modification occurs in sub-stoichiometric amounts; therefore extracting biologically meaningful information from these acetylation sites requires an adaptable, sensitive, specific, and robust method for their quantification. In this work, we combine immunoassays and multiple reaction monitoring-mass spectrometry (MRM-MS) technology to develop an absolute quantification for acetylation modification. With this hybrid method, we quantified the acetylation level of metabolic enzymes, which could demonstrate the regulatory mechanisms of the studied enzymes. The development of this quantitative workflow is a pivotal step for advancing our knowledge and understanding of the regulatory effects of protein acetylation in physiology and pathophysiology.

  10. Robust control of accelerators

    SciTech Connect

    Johnson, W.J.D.; Abdallah, C.T.

    1990-01-01

    The problem of controlling the variations in the rf power system can be effectively cast as an application of modern control theory. Two components of this theory are obtaining a model and a feedback structure. The model inaccuracies influence the choice of a particular controller structure. Because of the modeling uncertainty, one has to design either a variable, adaptive controller or a fixed, robust controller to achieve the desired objective. The adaptive control scheme usually results in very complex hardware and is therefore not pursued in this research. In contrast, robust control methods lead to simpler hardware. However, robust control requires a more accurate mathematical model of the physical process than is required by adaptive control. Our research at the Los Alamos National Laboratory (LANL) and the University of New Mexico (UNM) has led to the development and implementation of a new robust rf power feedback system. In this paper, we report on our research progress. In section one, the robust control problem for the rf power system and the philosophy adopted for the beginning phase of our research are presented. In section two, the results of our proof-of-principle experiments are presented. In section three, we describe the actual controller configuration that is used in LANL FEL physics experiments. The novelty of our approach is that the control hardware is implemented directly in rf without demodulating, compensating, and then remodulating.

  11. Automation and workflow considerations for embedding Digimarc Barcodes at scale

    NASA Astrophysics Data System (ADS)

    Rodriguez, Tony; Haaga, Don; Calhoon, Sean

    2015-03-01

    The Digimarc® Barcode is a digital watermark applied to packages and variable data labels that carries GS1 standard GTIN-14 data traditionally carried by a 1-D barcode. The Digimarc Barcode can be read with smartphones and imaging-based barcode readers commonly used in grocery and retail environments. Using smartphones, consumers can engage with products, and retailers can materially increase the speed of check-out, increasing store margins and providing a better experience for shoppers. Internal testing has shown an average 53% increase in scanning throughput, enabling hundreds of millions of dollars in cost savings [1] for retailers when deployed at scale. To get to scale, the process of embedding a digital watermark must be automated and integrated within existing workflows. Creating the tools and processes to do so represents a new challenge for the watermarking community. This paper presents a description and an analysis of the workflow implemented by Digimarc to deploy the Digimarc Barcode at scale. An overview of the tools created and lessons learned during the introduction of the technology to the market is provided.

  12. Magallanes: a web services discovery and automatic workflow composition tool

    PubMed Central

    Ríos, Javier; Karlsson, Johan; Trelles, Oswaldo

    2009-01-01

    Background To aid in bioinformatics data processing and analysis, an increasing number of web-based applications are being deployed. Although this is a positive circumstance in general, the proliferation of tools makes it difficult to find the right tool, or more importantly, the right set of tools that can work together to solve real complex problems. Results Magallanes (Magellan) is a versatile, platform-independent Java library of algorithms aimed at discovering bioinformatics web services and associated data types. A second important feature of Magallanes is its ability to connect available and compatible web services into workflows that can process data sequentially to reach a desired output given a particular input. Magallanes' capabilities can be exploited either through an API or directly through a graphical user interface. The Magallanes API is freely available for academic use and, together with the Magallanes application, has been tested in MS-Windows™ XP and Unix-like operating systems. Detailed implementation information, including user manuals and tutorials, is available at . Conclusion Different implementations of the same client (web page, desktop applications, web services, etc.) have been deployed and are currently in use in real installations such as the National Institute of Bioinformatics (Spain) and the ACGT-EU project. This shows the potential utility and versatility of the software library, including the integration of novel tools in the domain, and provides strong evidence that it facilitates the automatic discovery and composition of workflows. PMID:19832968

  13. Light-Weight Parallel Python Tools for Climate Model Workflows

    NASA Astrophysics Data System (ADS)

    Mickelson, S. A.; Paul, K.; Dennis, J.; Strand, G.

    2014-12-01

    It is expected that the data required for the next Intergovernmental Panel on Climate Change (IPCC) Assessment Report (AR6) will increase by more than a factor of 10 to an expected 25 terabytes per model. Experiences from the last Coupled Model Intercomparison Project (CMIP5), which assembled the data used for the last IPCC Assessment Report (AR5), showed that the processing, archiving, and post-run diagnostic operations required on such large model output took almost as long to complete as the model runs themselves! As a result, we have been investigating and developing light-weight Python-based tools to parallelize the time-intensive post-run steps in the climate model workflow. In particular, we have developed a parallel Python tool for converting time-slice model output to time-series format, and we have more recently developed a parallel Python tool to perform fast time-averaging of time-series data, an operation needed for many diagnostic computations. These tools are designed to be light-weight and easy to install, to have very few dependencies, and to be easily inserted into the climate model workflow with negligible disruption. In this work, we present the motivation, approach, and results of the two light-weight parallel Python tools that we have developed, as well as our plans for future research and development.
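    The two post-run operations named above (time-slice to time-series conversion, then time-averaging) can be sketched in plain Python. This is a minimal illustration and not the authors' tools: the function names, the dictionary layout, and the use of a thread pool to process variables independently are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def slices_to_series(time_slices):
    """Turn a list of per-time-step records {variable: value}
    into one time-ordered list of values per variable."""
    series = {}
    for step in time_slices:
        for name, value in step.items():
            series.setdefault(name, []).append(value)
    return series


def time_average(values, window):
    """Average consecutive, non-overlapping windows of `window` time steps."""
    return [
        sum(values[i:i + window]) / window
        for i in range(0, len(values) - window + 1, window)
    ]


def average_all(series, window, workers=4):
    """Average every variable independently; each variable is one task."""
    names = list(series)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda n: time_average(series[n], window), names)
        return dict(zip(names, results))


# Hypothetical time-slice output: one record per model time step.
slices = [
    {"T": 1.0, "P": 10.0},
    {"T": 3.0, "P": 20.0},
    {"T": 5.0, "P": 30.0},
    {"T": 7.0, "P": 40.0},
]
series = slices_to_series(slices)
averages = average_all(series, window=2)
print(series)
print(averages)
```

    The point of the sketch is the decomposition: once the data are in time-series form, each variable can be averaged without reference to any other, which is what makes the step easy to parallelize.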

  14. Autonomic Management of Application Workflows on Hybrid Computing Infrastructure

    DOE PAGES Beta

    Kim, Hyunjoo; el-Khamra, Yaakoub; Rodero, Ivan; Jha, Shantenu; Parashar, Manish

    2011-01-01

    In this paper, we present a programming and runtime framework that enables the autonomic management of complex application workflows on hybrid computing infrastructures. The framework is designed to address system and application heterogeneity and dynamics to ensure that application objectives and constraints are satisfied. The need for such autonomic system and application management is becoming critical as computing infrastructures become increasingly heterogeneous, integrating different classes of resources from high-end HPC systems to commodity clusters and clouds. For example, the framework presented in this paper can be used to provision the appropriate mix of resources based on application requirements and constraints. The framework also monitors the system/application state and adapts the application and/or resources to respond to changing requirements or environment. To demonstrate the operation of the framework and to evaluate its ability, we employ a workflow used to characterize an oil reservoir executing on a hybrid infrastructure composed of TeraGrid nodes and Amazon EC2 instances of various types. Specifically, we show how different application objectives such as acceleration, conservation and resilience can be effectively achieved while satisfying deadline and budget constraints, using an appropriate mix of dynamically provisioned resources. Our evaluations also demonstrate that public clouds can be used to complement and reinforce the scheduling and usage of traditional high performance computing infrastructure.

  15. Workflow in interventional radiology: uterine fibroid embolization (UFE)

    NASA Astrophysics Data System (ADS)

    Lindisch, David; Neumuth, Thomas; Burgert, Oliver; Spies, James; Cleary, Kevin

    2008-03-01

    Workflow analysis can be used to record the steps taken during clinical interventions with the goal of identifying bottlenecks and streamlining the procedure. In this study, we recorded the workflow for uterine fibroid embolization (UFE) procedures in the interventional radiology suite at Georgetown University Hospital in Washington, DC, USA. We employed a custom client/server software architecture developed by the Innovation Center for Computer Assisted Surgery (ICCAS) at the University of Leipzig, Germany. This software runs in a Java environment and enables an observer to record the actions taken by the physician and surgical team during these interventions. The recorded data are stored as an XML document, which can then be further processed. We recorded data from 30 patients and found a mean intervention time of 01:49:46 (+/- 16:04) minutes. The critical intervention step, the embolization, had a mean time of 00:15:42 (+/- 05:49) minutes, which was only 15% of the total intervention time.
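    The reported share of the embolization step can be checked from the two mean times given above. The sketch below assumes both values are in hh:mm:ss format; the helper name is illustrative.

```python
def to_seconds(hms):
    """Convert an 'hh:mm:ss' string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


total = to_seconds("01:49:46")         # mean intervention time
embolization = to_seconds("00:15:42")  # mean embolization time
share = embolization / total

print(total, embolization, round(share * 100, 1))
```

    Under that assumption the embolization accounts for roughly 14% of the mean intervention time, consistent with the rounded figure of 15% quoted in the abstract.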

  16. Accelerated partial breast irradiation utilizing brachytherapy: patient selection and workflow.

    PubMed

    Shah, Chirag; Wobb, Jessica; Manyam, Bindu; Khan, Atif; Vicini, Frank

    2016-02-01

    Accelerated partial breast irradiation (APBI) represents an evolving technique that is a standard of care option in appropriately selected women following breast conserving surgery. While multiple techniques now exist to deliver APBI, interstitial brachytherapy represents the technique used in several randomized trials (National Institute of Oncology, GEC-ESTRO). More recently, many centers have adopted applicator-based brachytherapy to deliver APBI due to the technical complexities of interstitial brachytherapy. The purpose of this article is to review methods to evaluate and select patients for APBI, as well as to define potential workflow mechanisms that allow for the safe and effective delivery of APBI. Multiple consensus statements have been developed to guide clinicians on determining appropriate candidates for APBI. However, recent studies have demonstrated that these guidelines fail to stratify patients according to the risk of local recurrence, and updated guidelines are expected in the years to come. Critical elements of workflow to ensure safe and effective delivery of APBI include a multidisciplinary approach and evaluation, optimization of target coverage and adherence to normal tissue guideline constraints, and proper quality assurance methods. PMID:26985202

  17. A computational workflow for designing silicon donor qubits

    DOE PAGES Beta

    Humble, Travis S.; Ericson, M. Nance; Jakowski, Jacek; Huang, Jingsong; Britton, Charles; Curtis, Franklin G.; Dumitrescu, Eugene F.; Mohiyaddin, Fahd A.; Sumpter, Bobby G.

    2016-09-19

    Developing devices that can reliably and accurately demonstrate the principles of superposition and entanglement is an on-going challenge for the quantum computing community. Modeling and simulation offer attractive means of testing early device designs and establishing expectations for operational performance. However, the complex integrated material systems required by quantum device designs are not captured by any single existing computational modeling method. We examine the development and analysis of a multi-staged computational workflow that can be used to design and characterize silicon donor qubit systems with modeling and simulation. Our approach integrates quantum chemistry calculations with electrostatic field solvers to perform detailed simulations of a phosphorus dopant in silicon. We show how atomistic details can be synthesized into an operational model for the logical gates that define quantum computation in this particular technology. In conclusion, the resulting computational workflow realizes a design tool for silicon donor qubits that can help verify and validate current and near-term experimental devices.

  18. An ontological knowledge framework for adaptive medical workflow.

    PubMed

    Dang, Jiangbo; Hedayati, Amir; Hampel, Ken; Toklu, Candemir

    2008-10-01

    As emerging technologies, the semantic Web and SOA (Service-Oriented Architecture) allow a BPMS (Business Process Management System) to automate business processes that can be described as services, which in turn can be used to wrap existing enterprise applications. A BPMS provides tools and methodologies to compose Web services that can be executed as business processes and monitored by BPM (Business Process Management) consoles. An ontology is a formal declarative knowledge representation model. It provides a foundation upon which machine-understandable knowledge can be obtained and, as a result, makes machine intelligence possible. Healthcare systems can adopt these technologies to become ubiquitous, adaptive, and intelligent, and thus serve patients better. This paper presents an ontological knowledge framework that covers the healthcare domains that a hospital encompasses, from medical and administrative tasks to hospital assets, medical insurance, patient records, drugs, and regulations. Our ontology therefore makes our vision of personalized healthcare possible by capturing all necessary knowledge for a complex personalized healthcare scenario involving patient care, insurance policies, drug prescriptions, and compliance. For example, our ontology enables a workflow management system to let users, from physicians to administrative assistants, manage and even create new context-aware medical workflows and execute them on-the-fly.

  19. Scientific Claims versus Scientific Knowledge.

    ERIC Educational Resources Information Center

    Ramsey, John

    1991-01-01

    Provides activities that help students to understand the importance of the scientific method. The activities include the science of fusion and cold fusion; a group activity that analyzes and interprets the events surrounding cold fusion; and an application research project concerning a current science issue. (ZWH)

  20. Using CyberShake Workflows to Manage Big Seismic Hazard Data on Large-Scale Open-Science HPC Resources

    NASA Astrophysics Data System (ADS)

    Callaghan, S.; Maechling, P. J.; Juve, G.; Vahi, K.; Deelman, E.; Jordan, T. H.

    2015-12-01

    The CyberShake computational platform, developed by the Southern California Earthquake Center (SCEC), is an integrated collection of scientific software and middleware that performs 3D physics-based probabilistic seismic hazard analysis (PSHA) for Southern California. CyberShake integrates large-scale and high-throughput research codes to produce probabilistic seismic hazard curves for individual locations of interest and hazard maps for an entire region. A recent CyberShake calculation produced about 500,000 two-component seismograms for each of 336 locations, resulting in over 300 million synthetic seismograms in a Los Angeles-area probabilistic seismic hazard model. CyberShake calculations require a series of scientific software programs. Early computational stages produce data used as inputs by later stages, so we describe CyberShake calculations using a workflow definition language. Scientific workflow tools automate and manage the input and output data and enable remote job execution on large-scale HPC systems. To satisfy the requests of broad impact users of CyberShake data, such as seismologists, utility companies, and building code engineers, we successfully completed CyberShake Study 15.4 in April and May 2015, calculating a 1 Hz urban seismic hazard map for Los Angeles. We distributed the calculation between the NSF Track 1 system NCSA Blue Waters, the DOE Leadership-class system OLCF Titan, and USC's Center for High Performance Computing. This study ran for over 5 weeks, burning about 1.1 million node-hours and producing over half a petabyte of data. The CyberShake Study 15.4 results doubled the maximum simulated seismic frequency from 0.5 Hz to 1.0 Hz as compared to previous studies, representing a factor of 16 increase in computational complexity. We will describe how our workflow tools supported splitting the calculation across multiple systems. We will explain how we modified CyberShake software components, including GPU implementations and

  1. Enabling big geoscience data analytics with a cloud-based, MapReduce-enabled and service-oriented workflow framework.

    PubMed

    Li, Zhenlong; Yang, Chaowei; Jin, Baoxuan; Yu, Manzhu; Liu, Kai; Sun, Min; Zhan, Matthew

    2015-01-01

    Geoscience observations and model simulations are generating vast amounts of multi-dimensional data. Effectively analyzing these data is essential for geoscience studies. However, the tasks are challenging for geoscientists because processing the massive amount of data is both computing and data intensive, in that data analytics requires complex procedures and multiple tools. To tackle these challenges, a scientific workflow framework is proposed for big geoscience data analytics. In this framework, techniques are proposed that leverage cloud computing, MapReduce, and Service Oriented Architecture (SOA). Specifically, HBase is adopted for storing and managing big geoscience data across distributed computers. A MapReduce-based algorithm framework is developed to support parallel processing of geoscience data. A service-oriented workflow architecture is built to support on-demand complex data analytics in the cloud environment. A proof-of-concept prototype tests the performance of the framework. Results show that this innovative framework significantly improves the efficiency of big geoscience data analytics by reducing the data processing time as well as simplifying data analytical procedures for geoscientists.
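    The MapReduce pattern mentioned above can be sketched in a few lines of single-process Python. This is only an illustration of the map, shuffle, and reduce phases applied to gridded observations; it does not use Hadoop or HBase, and the record layout (latitude, longitude, value) and the one-degree grid-cell key are assumptions made for the example.

```python
import math
from collections import defaultdict


def map_phase(records):
    """Emit (grid cell, value) pairs; the key is the 1-degree cell of each point."""
    for lat, lon, value in records:
        yield (math.floor(lat), math.floor(lon)), value


def shuffle(pairs):
    """Group all values that share a key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce each group of values to its mean."""
    return {key: sum(values) / len(values) for key, values in groups.items()}


# Hypothetical observations: (latitude, longitude, measured value).
records = [(10.2, 20.7, 1.0), (10.9, 20.1, 3.0), (-0.5, 5.5, 4.0)]
cell_means = reduce_phase(shuffle(map_phase(records)))
print(cell_means)
```

    Because the map step handles each record independently and the reduce step handles each key independently, both phases can be distributed across machines, which is the property the abstract relies on for parallel processing.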

  3. Enabling Big Geoscience Data Analytics with a Cloud-Based, MapReduce-Enabled and Service-Oriented Workflow Framework

    PubMed Central

    Li, Zhenlong; Yang, Chaowei; Jin, Baoxuan; Yu, Manzhu; Liu, Kai; Sun, Min; Zhan, Matthew

    2015-01-01

    Geoscience observations and model simulations are generating vast amounts of multi-dimensional data. Effectively analyzing these data is essential for geoscience studies. However, the tasks are challenging for geoscientists because processing the massive amount of data is both computing and data intensive, in that data analytics requires complex procedures and multiple tools. To tackle these challenges, a scientific workflow framework is proposed for big geoscience data analytics. In this framework, techniques are proposed that leverage cloud computing, MapReduce, and Service Oriented Architecture (SOA). Specifically, HBase is adopted for storing and managing big geoscience data across distributed computers. A MapReduce-based algorithm framework is developed to support parallel processing of geoscience data. A service-oriented workflow architecture is built to support on-demand complex data analytics in the cloud environment. A proof-of-concept prototype tests the performance of the framework. Results show that this innovative framework significantly improves the efficiency of big geoscience data analytics by reducing the data processing time as well as simplifying data analytical procedures for geoscientists. PMID:25742012

  4. Robustness of spatial micronetworks

    NASA Astrophysics Data System (ADS)

    McAndrew, Thomas C.; Danforth, Christopher M.; Bagrow, James P.

    2015-04-01

    Power lines, roadways, pipelines, and other physical infrastructure are critical to modern society. These structures may be viewed as spatial networks where geographic distances play a role in the functionality and construction cost of links. Traditionally, studies of network robustness have primarily considered the connectedness of large, random networks. Yet for spatial infrastructure, physical distances must also play a role in network robustness. Understanding the robustness of small spatial networks is particularly important with the increasing interest in microgrids, i.e., small-area distributed power grids that are well suited to using renewable energy resources. We study the random failures of links in small networks where functionality depends on both spatial distance and topological connectedness. By introducing a percolation model where the failure of each link is proportional to its spatial length, we find that when failures depend on spatial distances, networks are more fragile than expected. Accounting for spatial effects in both construction and robustness is important for designing efficient microgrids and other network infrastructure.
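    The percolation model described above can be sketched as follows. The sketch makes several assumptions that are not in the abstract: each link fails independently, its failure probability is the base rate q scaled by the link's length relative to the mean link length (clipped to 1), and robustness is measured as the fraction of nodes in the largest connected component. Node positions and links are invented for illustration.

```python
import math
import random


def link_length(a, b):
    """Euclidean distance between two node positions."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def fail_links(positions, links, q, spatial, rng):
    """Return the links that survive one round of random failures.

    If `spatial` is True, the failure probability of a link is
    q * length / mean_length (an assumed normalization); otherwise it is q.
    """
    if not links:
        return []
    lengths = [link_length(positions[u], positions[v]) for u, v in links]
    mean_len = sum(lengths) / len(lengths)
    survivors = []
    for link, length in zip(links, lengths):
        p_fail = q * length / mean_len if spatial else q
        if rng.random() >= min(1.0, p_fail):
            survivors.append(link)
    return survivors


def largest_component_fraction(n_nodes, links):
    """Fraction of nodes in the largest connected component (union-find)."""
    parent = list(range(n_nodes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in links:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v
    sizes = {}
    for node in range(n_nodes):
        root = find(node)
        sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values()) / n_nodes


# Hypothetical five-node micronetwork with one long link (0, 4).
positions = [(0, 0), (1, 0), (2, 0), (3, 0), (10, 0)]
links = [(0, 1), (1, 2), (2, 3), (0, 4)]
rng = random.Random(42)
survivors = fail_links(positions, links, q=0.2, spatial=True, rng=rng)
print(largest_component_fraction(len(positions), survivors))
```

    Averaging the largest-component fraction over many random trials, once with `spatial=True` and once with `spatial=False`, would give a simple comparison of length-dependent and uniform failures on the same network.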

  5. A Label-free Selected Reaction Monitoring Workflow Identifies a Subset of Pregnancy Specific Glycoproteins as Potential Predictive Markers of Early-onset Pre-eclampsia

    PubMed Central

    Blankley, Richard T.; Fisher, Christal; Westwood, Melissa; North, Robyn; Baker, Philip N.; Walker, Michael J.; Williamson, Andrew; Whetton, Anthony D.; Lin, Wanchang; McCowan, Lesley; Roberts, Claire T.; Cooper, Garth J. S.; Unwin, Richard D.; Myers, Jenny E.

    2013-01-01

    Pre-eclampsia (PE) is a serious complication of pregnancy with potentially life threatening consequences for both mother and baby. Presently there is no test with the required performance to predict which healthy first-time mothers will go on to develop PE. The high specificity, sensitivity, and multiplexed nature of selected reaction monitoring holds great potential as a tool for the verification and validation of putative candidate biomarkers for disease states. Realization of this potential involves establishing a high throughput, cost effective, reproducible sample preparation workflow. We have developed a semi-automated HPLC-based sample preparation workflow before a label-free selected reaction monitoring approach. This workflow has been applied to the search for novel predictive biomarkers for PE. To discover novel candidate biomarkers for PE, we used isobaric tagging to identify several potential biomarker proteins in plasma obtained at 15 weeks gestation from nulliparous women who later developed PE compared with pregnant women who remained healthy. Such a study generates a number of “candidate” biomarkers that require further testing in larger patient cohorts. As proof-of-principle, two of these proteins were taken forward for verification in 100 women (58 PE, 42 controls) using label-free SRM. We obtained reproducible protein quantitation across the 100 samples and demonstrated significant changes in protein levels, even with as little as 20% change in protein concentration. The SRM data correlated with a commercial ELISA, suggesting that this is a robust workflow suitable for rapid, affordable, label-free verification of which candidate biomarkers should be taken forward for thorough investigation. A subset of pregnancy-specific glycoproteins (PSGs) had value as novel predictive markers for PE. PMID:23897580

  6. Provenance for Runtime Workflow Steering and Validation in Computational Seismology

    NASA Astrophysics Data System (ADS)

    Spinuso, A.; Krischer, L.; Krause, A.; Filgueira, R.; Magnoni, F.; Muraleedharan, V.; David, M.

    2014-12-01

    Provenance systems may be offered by modern workflow engines to collect metadata about the data transformations at runtime. If combined with effective visualisation and monitoring interfaces, these provenance recordings can speed up the validation process of an experiment, suggesting interactive or automated interventions with immediate effects on the lifecycle of a workflow run. For instance, in the field of computational seismology, if we consider research applications performing long lasting cross correlation analysis and high resolution simulations, the immediate notification of logical errors and the rapid access to intermediate results, can produce reactions which foster a more efficient progress of the research. These applications are often executed in secured and sophisticated HPC and HTC infrastructures, highlighting the need for a comprehensive framework that facilitates the extraction of fine grained provenance and the development of provenance aware components, leveraging the scalability characteristics of the adopted workflow engines, whose enactment can be mapped to different technologies (MPI, Storm clusters, etc). This work looks at the adoption of W3C-PROV concepts and data model within a user driven processing and validation framework for seismic data, supporting also computational and data management steering. Validation needs to balance automation with user intervention, considering the scientist as part of the archiving process. Therefore, the provenance data is enriched with community-specific metadata vocabularies and control messages, making an experiment reproducible and its description consistent with the community understandings. Moreover, it can contain user defined terms and annotations. The current implementation of the system is supported by the EU-Funded VERCE (http://verce.eu). It provides, as well as the provenance generation mechanisms, a prototypal browser-based user interface and a web API built on top of a NoSQL storage

  7. AGILE/GRID Science Alert Monitoring System: The Workflow and the Crab Flare Case

    NASA Astrophysics Data System (ADS)

    Bulgarelli, A.; Trifoglio, M.; Gianotti, F.; Tavani, M.; Conforti, V.; Parmiggiani, N.

    2013-10-01

    During the first five years of the AGILE mission we have observed many gamma-ray transients of Galactic and extragalactic origin. A fast reaction to unexpected transient events is a crucial part of the AGILE monitoring program, because the follow-up of astrophysical transients is a key point for this space mission. We present the workflow and the software developed by the AGILE Team to perform the automatic analysis for the detection of gamma-ray transients. In addition, an App for iPhone will be released enabling the Team to access the monitoring system through mobile phones. In 2010 September the science alert monitoring system presented in this paper recorded a transient phenomenon from the Crab Nebula, generating an automated alert sent via email and SMS two hours after the end of an AGILE satellite orbit, i.e. two hours after the Crab flare itself: for this discovery AGILE won the 2012 Bruno Rossi prize. The design of this alert system is optimized for maximum speed, and in this, as in many other cases, AGILE has demonstrated that the reaction speed of the monitoring system is crucial for the scientific return of the mission.

  8. Automated Web-Based Request Mechanism for Workflow Enhancement in an Academic Customer-Focused Biorepository

    PubMed Central

    Ryan, Benjamin J.; Brink, Amy; Holtschlag, Victoria L.

    2012-01-01

    Informatics systems, particularly those that provide capabilities for data storage, specimen tracking, retrieval, and order fulfillment, are critical to the success of biorepositories and other laboratories engaged in translational medical research. A crucial item—one easily overlooked—is an efficient way to receive and process investigator-initiated requests. A successful electronic ordering system should allow request processing in a maximally efficient manner, while also allowing streamlined tracking and mining of request data such as turnaround times and numerical categorizations (user groups, funding sources, protocols, and so on). Ideally, an electronic ordering system also facilitates the initial contact between the laboratory and customers, while still allowing for downstream communications and other steps toward scientific partnerships. We describe here the recently established Web-based ordering system for the biorepository at Washington University Medical Center, along with its benefits for workflow, tracking, and customer service. Because of the system's numerous value-added impacts, we think our experience can serve as a good model for other customer-focused biorepositories, especially those currently using manual or non-Web–based request systems. Our lessons learned also apply to the informatics developers who serve such biobanks. PMID:23386921

  9. Automated End-to-End Workflow for Precise and Geo-accurate Reconstructions using Fiducial Markers

    NASA Astrophysics Data System (ADS)

    Rumpler, M.; Daftry, S.; Tscharf, A.; Prettenthaler, R.; Hoppe, C.; Mayer, G.; Bischof, H.

    2014-08-01

    Photogrammetric computer vision systems have been well established in many scientific and commercial fields during the last decades. Recent developments in image-based 3D reconstruction systems in conjunction with the availability of affordable high quality digital consumer grade cameras have resulted in an easy way of creating visually appealing 3D models. However, many of these methods require manual steps in the processing chain and for many photogrammetric applications such as mapping, recurrent topographic surveys or architectural and archaeological 3D documentations, high accuracy in a geo-coordinate system is required which often cannot be guaranteed. Hence, in this paper we present and advocate a fully automated end-to-end workflow for precise and geoaccurate 3D reconstructions using fiducial markers. We integrate an automatic camera calibration and georeferencing method into our image-based reconstruction pipeline based on binary-coded fiducial markers as artificial, individually identifiable landmarks in the scene. Additionally, we facilitate the use of these markers in conjunction with known ground control points (GCP) in the bundle adjustment, and use an online feedback method that allows assessment of the final reconstruction quality in terms of image overlap, ground sampling distance (GSD) and completeness, and thus provides flexibility to adapt the image acquisition strategy already during image recording. An extensive set of experiments demonstrates the accuracy benefits, yielding a highly accurate and geographically aligned reconstruction with an absolute point position uncertainty of about 1.5 times the ground sampling distance.
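
    To put the reported accuracy figure in concrete terms, the following minimal sketch computes a ground sampling distance from the standard pinhole relation (pixel size times object distance divided by focal length) and scales it by the factor of about 1.5 quoted in the abstract. The camera parameters and function names are illustrative assumptions, not values or code from the paper.

```python
def ground_sampling_distance(pixel_size_m: float, distance_m: float,
                             focal_length_m: float) -> float:
    """Ground sampling distance (metres per pixel) for a pinhole camera
    looking straight at a surface at the given distance."""
    return pixel_size_m * distance_m / focal_length_m


def expected_point_uncertainty(gsd_m: float, factor: float = 1.5) -> float:
    """Absolute point position uncertainty as a multiple of the GSD.
    The default factor of 1.5 is the approximate figure from the abstract."""
    return factor * gsd_m


# Hypothetical camera: 5 micrometre pixels, 25 mm lens, 50 m from the scene.
gsd = ground_sampling_distance(pixel_size_m=5e-6, distance_m=50.0,
                               focal_length_m=0.025)
uncertainty = expected_point_uncertainty(gsd)
print(f"GSD: {gsd * 100:.2f} cm/pixel, expected uncertainty: {uncertainty * 100:.2f} cm")
```

    Under these assumed parameters the GSD is about 1 cm per pixel, so the expected absolute uncertainty would be on the order of 1.5 cm.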

  10. Design Principles as a Guide for Constraint Based and Dynamic Modeling: Towards an Integrative Workflow

    PubMed Central

    Sehr, Christiana; Kremling, Andreas; Marin-Sanguino, Alberto

    2015-01-01

    During the last 10 years, systems biology has matured from a fuzzy concept combining omics, mathematical modeling and computers into a scientific field in its own right. In spite of its incredible potential, the multilevel complexity of its objects of study makes it very difficult to establish a reliable connection between data and models. The great number of degrees of freedom often results in situations where many different models can explain/fit all available datasets. This has resulted in a shift of paradigm from the initially dominant, maybe naive, idea of inferring the system out of a number of datasets to the application of different techniques that reduce the degrees of freedom before any data set is analyzed. There is a wide variety of techniques available, each of which can contribute a piece of the puzzle and include different kinds of experimental information. But the challenge that remains is their meaningful integration. Here we show some theoretical results that enable some of the main modeling approaches to be applied sequentially in a complementary manner, and how this workflow can benefit from evolutionary reasoning to keep the complexity of the problem in check. As a proof of concept, we show how the synergies between these modeling techniques can provide insight into some well studied problems: Ammonia assimilation in bacteria and an unbranched linear pathway with end-product inhibition. PMID:26501332

  11. Implementation of Electronic Workflow Systems in Higher Education Institutions: Issues and Challenges

    NASA Astrophysics Data System (ADS)

    Cheung, K. S.

    To different extents, electronic workflow systems have been widely used in higher education institutions for administering the daily and routine operations. Whilst workflow automation is advocated for streamlining business processes, there are technical limitations as well as management constraints, especially on process review and re-engineering. During the process review, a big challenge is to make sure that the system would not only meet the business requirements but also improve the process flow. It is important to retain the legacy stature while coping with changes in workflow, and also to take into consideration the need to accommodate managerial constraints. This paper investigates the issues and challenges in implementing electronic workflow systems in higher education institutions. Different approaches to the process review, workflow design and re-design are discussed.

  12. Optimizing workflow and knowledge in healthcare through innovation.

    PubMed

    Schmitt, Karl-Jürgen

    2004-01-01

    People's desire is to stay healthy during the entire course of their life. Innovations in medicine, care and technology have always contributed significantly to meeting this desire as closely as possible. Today, healthcare systems are faced with huge additional challenges. The focus of nearly every healthcare debate is on costs. But is this debate target-oriented and does it support the struggle for further enhancing the quality of care? The implementation of IT assisted workflow and knowledge supporting tools throughout the entire healthcare process--prevention to cure--leads to care which would be much more focused on people's needs and efficiency. The information gained from monitoring and wearable devices has to be included in these tools for delivering comprehensive patient information to the point of care. Then the puzzle of the different components in healthcare linked by IT will be complete, and the care process could be continuously optimized in an efficient way.

  13. Cytoscape: the network visualization tool for GenomeSpace workflows.

    PubMed

    Demchak, Barry; Hull, Tim; Reich, Michael; Liefeld, Ted; Smoot, Michael; Ideker, Trey; Mesirov, Jill P

    2014-01-01

    Modern genomic analysis often requires workflows incorporating multiple best-of-breed tools. GenomeSpace is a web-based visual workbench that combines a selection of these tools with mechanisms that create data flows between them. One such tool is Cytoscape 3, a popular application that enables analysis and visualization of graph-oriented genomic networks. As Cytoscape runs on the desktop, and not in a web browser, integrating it into GenomeSpace required special care in creating a seamless user experience and enabling appropriate data flows. In this paper, we present the design and operation of the Cytoscape GenomeSpace app, which accomplishes this integration, thereby providing critical analysis and visualization functionality for GenomeSpace users. It has been downloaded over 850 times since the release of its first version in September, 2013. PMID:25165537

  14. Optimizing perioperative decision making: improved information for clinical workflow planning.

    PubMed

    Doebbeling, Bradley N; Burton, Matthew M; Wiebke, Eric A; Miller, Spencer; Baxter, Laurence; Miller, Donald; Alvarez, Jorge; Pekny, Joseph

    2012-01-01

    Perioperative care is complex and involves multiple interconnected subsystems. Delayed starts, prolonged cases and overtime are common. Surgical procedures account for 40-70% of hospital revenues and 30-40% of total costs. Most planning and scheduling in healthcare is done without modern planning tools, which have potential for improving access by assisting in operations planning support. We identified key planning scenarios of interest to perioperative leaders, in order to examine the feasibility of applying combinatorial optimization software to solve some of those planning issues in the operative setting. Perioperative leaders desire a broad range of tools for planning and assessing alternate solutions. Our models generated feasible solutions that varied as expected with resource and policy assumptions, and found better utilization of scarce resources. Combinatorial optimization modeling can effectively evaluate alternatives to support key decisions for planning clinical workflow and improving care efficiency and satisfaction.

  15. Mayavi2: 3D Scientific Data Visualization and Plotting

    NASA Astrophysics Data System (ADS)

    Ramachandran, Prabhu; Varoquaux, Gaël

    2012-05-01

    Mayavi is an open-source, general-purpose, 3D scientific visualization package. It seeks to provide easy and interactive tools for data visualization that fit with the scientific user's workflow. Mayavi provides several entry points: a full-blown interactive application; a Python library with both a MATLAB-like interface focused on easy scripting and a feature-rich object hierarchy; widgets associated with these objects for assembling in a domain-specific application, and plugins that work with a general purpose application-building framework.

  16. Provenance of things - describing geochemistry observation workflows using PROV-O

    NASA Astrophysics Data System (ADS)

    Cox, S. J. D.; Car, N. J.

    2015-12-01

    Geochemistry observations typically follow a complex preparation process after sample retrieval from the field. Descriptions of these processes are required to allow readers and other data users to assess the reliability of the data produced, and to ensure reproducibility. While laboratory notebooks are used for private record-keeping, and laboratory information systems (LIMS) on a facility basis, this data is not generally published, and there are no standard formats for transfer. And while there is some standardization of workflows, this is often scoped to a lab, or an instrument. New procedures and workflows are being developed continually - in fact this is a key expectation in the development of the science. Thus formalization of the description of sample preparation and observations must be both rigorous and flexible. We have been exploring the use of the W3C Provenance model (PROV) to capture complete traces, including both the real world things and the data generated. PROV has a core data model that distinguishes between entities, agents and activities involved in producing a piece of data or thing in the world. While the design of PROV was primarily conditioned by stories concerning information resources, application is not restricted to the production of digital or information assets. PROV allows a comprehensive trace of predecessor entities and transformations at any level of detail. In this paper we demonstrate the use of PROV for describing specimens managed for scientific observations. Two examples are considered: a geological sample which undergoes a typical preparation process for measurements of the concentration of a particular chemical substance, and the collection, taxonomic classification and eventual publication of an insect specimen. PROV enables the material that goes into the instrument to be linked back to the sample retrieved in the field. This complements the IGSN system, which focuses on registration of field sample identity to support the
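
    The entity/agent/activity distinction described in this abstract can be sketched in a few lines of plain Python. This is only an illustration of the idea of linking prepared material back to a field sample; the class names, the preparation steps and the `trace` helper are assumptions made here, and the sketch does not use PROV-O serialisation or any PROV library.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Entity:
    """A physical thing or a piece of data (hypothetical minimal model)."""
    name: str
    generated_by: Optional["Activity"] = None  # cf. prov:wasGeneratedBy


@dataclass
class Activity:
    """A preparation or measurement step carried out by an agent."""
    name: str
    agent: str                                         # cf. prov:wasAssociatedWith
    used: List[Entity] = field(default_factory=list)   # cf. prov:used


def trace(entity: Entity) -> List[str]:
    """Return the entity name followed by all predecessor entity names."""
    lineage = [entity.name]
    if entity.generated_by is not None:
        for source in entity.generated_by.used:
            lineage.extend(trace(source))
    return lineage


# Illustrative preparation chain; the step names are invented for this sketch.
field_sample = Entity("field sample")
crushing = Activity("crushing", agent="lab technician", used=[field_sample])
powder = Entity("powder", generated_by=crushing)
dissolution = Activity("dissolution", agent="lab technician", used=[powder])
solution = Entity("solution", generated_by=dissolution)

print(" <- ".join(trace(solution)))
```

    Walking the `generated_by` and `used` links in this way is what allows the material that enters the instrument to be connected to the sample retrieved in the field.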

  17. Incorporating Geoscience, Field Data Collection Workflows into Software Developed for Mobile Devices

    NASA Astrophysics Data System (ADS)

    Vieira, D. A.; Mookerjee, M.; Matsa, S.

    2014-12-01

    Modern geological sciences depend heavily on investigating the natural world in situ, i.e., within "the field," as well as managing data collections in the light of evolving advances in technology and cyberinfrastructure. To accelerate the rate of scientific discovery, we need to expedite data collection and management in such a way so as to not interfere with the typical geoscience, field workflow. To this end, we suggest replacing traditional analog methods of data collection, such as the standard field notebook and compass, with primary digital data collection applications. While some field data collecting apps exist for both the iOS and Android operating systems, they do not communicate with each other in an organized data collection effort. We propose the development of a mobile app that coordinates the collection of GPS, photographic, and orientation data, along with field observations. Additionally, this application should be able to pair with other devices in order to incorporate other sensor data. In this way, the app can generate a single file that includes all field data elements and can be synced to the appropriate database with ease and efficiency. We present here a prototype application that attempts to illustrate how digital collection can be integrated into a "typical" geoscience, field workflow. The purpose of our app is to get field scientists to think about specific requirements for the development of a unified field data collection application. One fundamental step in the development of such an app is the community-based, decision-making process of adopting certain data/metadata standards and conventions. In August of 2014, on a four-day field trip to Yosemite National Park and Owens Valley, we engaged a group of field-based geologists and computer/cognitive scientists to start building a community consensus on these cyberinfrastructure-related issues. Discussing the unique problems of field data recording, conventions, storage, representation

  18. Big Data Architectures for Operationalized Seismic and Subsurface Monitoring and Decision Support Workflows

    NASA Astrophysics Data System (ADS)

    Irving, D. H.; Rasheed, M.; Hillman, C.; O'Doherty, N.

    2012-12-01

    Oilfield management is moving to a more operational footing with near-realtime seismic and sensor monitoring governing drilling, fluid injection and hydrocarbon extraction workflows within safety, productivity and profitability constraints. To date, the geoscientific analytical architectures employed are configured for large volumes of data, computational power or analytical latency, and compromises in system design must be made to achieve all three aspects. These challenges are encapsulated by the phrase 'Big Data' which has been employed for over a decade in the IT industry to describe the challenges presented by data sets that are too large, volatile and diverse for existing computational architectures and paradigms. We present a data-centric architecture developed to support a geoscientific and geotechnical workflow whereby: (1) scientific insight is continuously applied to fresh data; (2) insights and derived information are incorporated into engineering and operational decisions; (3) data governance and provenance are routine within a broader data management framework. Strategic decision support systems in large infrastructure projects such as oilfields are typically relational data environments; data modelling is pervasive across analytical functions. However, subsurface data and models are typically non-relational (i.e. file-based) in the form of large volumes of seismic imaging data or rapid streams of sensor feeds and are analysed and interpreted using niche applications. The key architectural challenge is to move data and insight from a non-relational to a relational, or structured, data environment for faster and more integrated analytics. We describe how a blend of MapReduce and relational database technologies can be applied in geoscientific decision support, and the strengths and weaknesses of each in such an analytical ecosystem. In addition we discuss hybrid technologies that use aspects of both and translational technologies for moving data and analytics

  19. Robust acoustic object detection

    NASA Astrophysics Data System (ADS)

    Amit, Yali; Koloydenko, Alexey; Niyogi, Partha

    2005-10-01

    We consider a novel approach to the problem of detecting phonological objects like phonemes, syllables, or words, directly from the speech signal. We begin by defining local features in the time-frequency plane with built in robustness to intensity variations and time warping. Global templates of phonological objects correspond to the coincidence in time and frequency of patterns of the local features. These global templates are constructed by using the statistics of the local features in a principled way. The templates have clear phonetic interpretability, are easily adaptable, have built in invariances, and display considerable robustness in the face of additive noise and clutter from competing speakers. We provide a detailed evaluation of the performance of some diphone detectors and a word detector based on this approach. We also perform some phonetic classification experiments based on the edge-based features suggested here.

  20. Doubly robust survival trees.

    PubMed

    Steingrimsson, Jon Arni; Diao, Liqun; Molinaro, Annette M; Strawderman, Robert L

    2016-09-10

    Estimating a patient's mortality risk is important in making treatment decisions. Survival trees are a useful tool and employ recursive partitioning to separate patients into different risk groups. Existing 'loss based' recursive partitioning procedures that would be used in the absence of censoring have previously been extended to the setting of right censored outcomes using inverse probability censoring weighted estimators of loss functions. In this paper, we propose new 'doubly robust' extensions of these loss estimators motivated by semiparametric efficiency theory for missing data that better utilize available data. Simulations and a data analysis demonstrate strong performance of the doubly robust survival trees compared with previously used methods. Copyright © 2016 John Wiley & Sons, Ltd. PMID:27037609
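
    For readers unfamiliar with inverse probability of censoring weighting (IPCW), the existing approach that this abstract says the doubly robust estimators extend, the following minimal sketch may help. It estimates the censoring distribution with a Kaplan-Meier product and reweights a squared-error loss over the uncensored observations. The function names and the toy data are assumptions made for illustration; the doubly robust estimators proposed in the paper are not implemented here.

```python
def censoring_survival_before(times, events, t):
    """Kaplan-Meier estimate of the probability of remaining uncensored
    just before time t. Censoring (event == 0) is treated as the 'event'."""
    surv = 1.0
    censor_times = sorted({ti for ti, d in zip(times, events) if d == 0 and ti < t})
    for u in censor_times:
        at_risk = sum(1 for ti in times if ti >= u)
        censored = sum(1 for ti, d in zip(times, events) if ti == u and d == 0)
        surv *= 1.0 - censored / at_risk
    return surv


def ipcw_weights(times, events):
    """Weight 1/G(T-) for observed events, 0 for censored observations."""
    return [
        1.0 / censoring_survival_before(times, events, t) if d == 1 else 0.0
        for t, d in zip(times, events)
    ]


def ipcw_squared_error(times, events, prediction):
    """IPCW estimate of the mean squared error of a constant prediction."""
    weights = ipcw_weights(times, events)
    total = sum(w * (t - prediction) ** 2 for w, t in zip(weights, times))
    return total / len(times)


# Toy data (invented): follow-up times and event indicators (1 = death observed).
times = [2.0, 3.0, 5.0, 7.0]
events = [1, 0, 1, 1]
print(ipcw_weights(times, events))
print(ipcw_squared_error(times, events, prediction=5.0))
```

    In this toy data the single censored observation contributes nothing directly, and its weight is redistributed to the later uncensored observations; a tree-building procedure would evaluate such a weighted loss for each candidate split.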

  1. The standard-based open workflow system in GeoBrain (Invited)

    NASA Astrophysics Data System (ADS)

    Di, L.; Yu, G.; Zhao, P.; Deng, M.

    2013-12-01

    GeoBrain is an Earth science Web-service system developed and operated by the Center for Spatial Information Science and Systems, George Mason University. In GeoBrain, a standard-based open workflow system has been implemented to accommodate the automated processing of geospatial data through a set of complex geo-processing functions for advanced production generation. GeoBrain models the complex geoprocessing at two levels, the conceptual and the concrete. At the conceptual level, the workflows exist in the form of data and service types defined by ontologies. The workflows at the conceptual level are called geo-processing models and cataloged in GeoBrain as virtual product types. A conceptual workflow is instantiated into a concrete, executable workflow when a user requests a product that matches a virtual product type. Both conceptual and concrete workflows are encoded in Business Process Execution Language (BPEL). A BPEL workflow engine, called BPELPower, has been implemented to execute the workflow for the product generation. A provenance capturing service has been implemented to generate the ISO 19115-compliant complete product provenance metadata before and after the workflow execution. The generation of provenance metadata before the workflow execution allows users to examine the usability of the final product before the lengthy and expensive execution takes place. The three modes of workflow execution defined in ISO 19119, transparent, translucent, and opaque, are available in GeoBrain. A geoprocessing modeling portal has been developed to allow domain experts to develop geoprocessing models at the type level with the support of both data and service/processing ontologies. The geoprocessing models capture the knowledge of the domain experts and become the operational offering of the products after a proper peer review of models is conducted. Automated workflow composition has been tested successfully based on ontologies and artificial

  2. Robust reinforcement learning.

    PubMed

    Morimoto, Jun; Doya, Kenji

    2005-02-01

    This letter proposes a new reinforcement learning (RL) paradigm that explicitly takes into account input disturbance as well as modeling errors. The use of environmental models in RL is quite popular for both offline learning using simulations and for online action planning. However, the difference between the model and the real environment can lead to unpredictable, and often unwanted, results. Based on the theory of H(infinity) control, we consider a differential game in which a "disturbing" agent tries to make the worst possible disturbance while a "control" agent tries to make the best control input. The problem is formulated as finding a min-max solution of a value function that takes into account the amount of the reward and the norm of the disturbance. We derive online learning algorithms for estimating the value function and for calculating the worst disturbance and the best control in reference to the value function. We tested the paradigm, which we call robust reinforcement learning (RRL), on the control task of an inverted pendulum. In the linear domain, the policy and the value function learned by online algorithms coincided with those derived analytically by the linear H(infinity) control theory. For a fully nonlinear swing-up task, RRL achieved robust performance with changes in the pendulum weight and friction, while a standard reinforcement learning algorithm could not deal with these changes. We also applied RRL to the cart-pole swing-up task, and a robust swing-up policy was acquired.
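
    The min-max idea in this abstract can be illustrated with a one-step toy game: the control agent picks the action whose worst-case augmented reward is best, where the augmented reward adds a term proportional to the squared disturbance. The dynamics, the reward, the weight `eta` and the function names below are all assumptions made for this sketch; it does not reproduce the online learning algorithms or the pendulum tasks of the letter.

```python
def augmented_reward(x, u, w, eta=0.5):
    """Task reward plus a term that grows with the size of the disturbance.
    Toy dynamics: next state is x + u + w; reward penalises distance from 0."""
    next_x = x + u + w
    return -(next_x ** 2) + eta * w ** 2


def robust_action(x, controls=(-1, 0, 1), disturbances=(-1, 0, 1), eta=0.5):
    """Return (best control, its worst-case value): max over u of min over w."""
    best_u, best_value = None, float("-inf")
    for u in controls:
        # The disturbing agent picks the w that is worst for this control.
        worst = min(augmented_reward(x, u, w, eta) for w in disturbances)
        if worst > best_value:
            best_u, best_value = u, worst
    return best_u, best_value


print(robust_action(1.0))
```

    In this toy setting the disturbance term makes large disturbances costly for the disturbing agent, which is what keeps the worst case bounded; the letter works with value functions over time rather than a single step.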

  3. myExperiment: a repository and social network for the sharing of bioinformatics workflows

    PubMed Central

    Goble, Carole A.; Bhagat, Jiten; Aleksejevs, Sergejs; Cruickshank, Don; Michaelides, Danius; Newman, David; Borkum, Mark; Bechhofer, Sean; Roos, Marco; Li, Peter; De Roure, David

    2010-01-01

    myExperiment (http://www.myexperiment.org) is an online research environment that supports the social sharing of bioinformatics workflows. These workflows are procedures consisting of a series of computational tasks using web services, which may be performed on data from its retrieval, integration and analysis, to the visualization of the results. As a public repository of workflows, myExperiment allows anybody to discover those that are relevant to their research, which can then be reused and repurposed to their specific requirements. Conversely, developers can submit their workflows to myExperiment and enable them to be shared in a secure manner. Since its release in 2007, myExperiment has gained over 3500 registered users and contains more than 1000 workflows. The social aspect to the sharing of these workflows is facilitated by registered users forming virtual communities bound together by a common interest or research project. Contributors of workflows can build their reputation within these communities by receiving feedback and credit from individuals who reuse their work. Further documentation about myExperiment including its REST web service is available from http://wiki.myexperiment.org. Feedback and requests for support can be sent to bugs@myexperiment.org. PMID:20501605

  4. Workflow continuity--moving beyond business continuity in a multisite 24-7 healthcare organization.

    PubMed

    Kolowitz, Brian J; Lauro, Gonzalo Romero; Barkey, Charles; Black, Harry; Light, Karen; Deible, Christopher

    2012-12-01

    As hospitals move towards providing in-house 24 × 7 services, there is an increasing need for information systems to be available around the clock. This study investigates one organization's need for a workflow continuity solution that provides around the clock availability for information systems that do not provide highly available services. The organization investigated is a large multifacility healthcare organization that consists of 20 hospitals and more than 30 imaging centers. A case analysis approach was used to investigate the organization's efforts. The results show an overall reduction in downtimes where radiologists could not continue their normal workflow on the integrated Picture Archiving and Communications System (PACS) solution by 94 % from 2008 to 2011. The impact of unplanned downtimes was reduced by 72 % while the impact of planned downtimes was reduced by 99.66 % over the same period. Additionally more than 98 h of radiologist impact due to a PACS upgrade in 2008 was entirely eliminated in 2011 utilizing the system created by the workflow continuity approach. Workflow continuity differs from high availability and business continuity in its design process and available services. Workflow continuity only ensures that critical workflows are available when the production system is unavailable due to scheduled or unscheduled downtimes. Workflow continuity works in conjunction with business continuity and highly available system designs. The results of this investigation revealed that this approach can add significant value to organizations because impact on users is minimized if not eliminated entirely.

  5. An evolving computational platform for biological mass spectrometry: workflows, statistics and data mining with MASSyPup64

    PubMed Central

    2015-01-01

    displayed a higher robustness and accuracy for classifying sample groups in targeted Metabolomics than cluster analyses. Random Forest models provide not only predictive models, which can be deployed for new data sets, but also the variable importance. We demonstrate that the latter is especially useful for tracking down significant signals and affected pathways in untargeted Metabolomics. Thus, Random Forest modeling supports the unbiased search for relevant biological features in Metabolomics. Our results clearly manifest the importance of Data Mining methods to disclose non-obvious information in biological mass spectrometry. The application of a Workflow Management System and the integration of all required programs and data in a consistent platform makes the presented data analysis strategies reproducible for non-expert users. The simple remastering process and the Open Source licenses of MASSyPup64 (http://www.bioprocess.org/massypup/) enable the continuous improvement of the system. PMID:26618079

  6. An evolving computational platform for biological mass spectrometry: workflows, statistics and data mining with MASSyPup64.

    PubMed

    Winkler, Robert

    2015-01-01

    displayed a higher robustness and accuracy for classifying sample groups in targeted Metabolomics than cluster analyses. Random Forest models provide not only predictive models, which can be deployed for new data sets, but also the variable importance. We demonstrate that the latter is especially useful for tracking down significant signals and affected pathways in untargeted Metabolomics. Thus, Random Forest modeling supports the unbiased search for relevant biological features in Metabolomics. Our results clearly manifest the importance of Data Mining methods to disclose non-obvious information in biological mass spectrometry. The application of a Workflow Management System and the integration of all required programs and data in a consistent platform makes the presented data analysis strategies reproducible for non-expert users. The simple remastering process and the Open Source licenses of MASSyPup64 (http://www.bioprocess.org/massypup/) enable the continuous improvement of the system. PMID:26618079

  7. An evolving computational platform for biological mass spectrometry: workflows, statistics and data mining with MASSyPup64.

    PubMed

    Winkler, Robert

    2015-01-01

    displayed a higher robustness and accuracy for classifying sample groups in targeted Metabolomics than cluster analyses. Random Forest models do not only provide predictive models, which can be deployed for new data sets, but also the variable importance. We demonstrate that the later is especially useful for tracking down significant signals and affected pathways in untargeted Metabolomics. Thus, Random Forest modeling supports the unbiased search for relevant biological features in Metabolomics. Our results clearly manifest the importance of Data Mining methods to disclose non-obvious information in biological mass spectrometry . The application of a Workflow Management System and the integration of all required programs and data in a consistent platform makes the presented data analyses strategies reproducible for non-expert users. The simple remastering process and the Open Source licenses of MASSyPup64 (http://www.bioprocess.org/massypup/) enable the continuous improvement of the system.

  8. Manual Gene Ontology annotation workflow at the Mouse Genome Informatics Database.

    PubMed

    Drabkin, Harold J; Blake, Judith A

    2012-01-01

    The Mouse Genome Database, the Gene Expression Database and the Mouse Tumor Biology database are integrated components of the Mouse Genome Informatics (MGI) resource (http://www.informatics.jax.org). The MGI system presents both a consensus view and an experimental view of the knowledge concerning the genetics and genomics of the laboratory mouse. From genotype to phenotype, this information resource integrates information about genes, sequences, maps, expression analyses, alleles, strains and mutant phenotypes. Comparative mammalian data are also presented, particularly with regard to the use of the mouse as a model for the investigation of molecular and genetic components of human diseases. These data are collected from literature curation as well as downloads of large datasets (SwissProt, LocusLink, etc.). MGI is one of the founding members of the Gene Ontology (GO) and uses the GO for functional annotation of genes. Here, we discuss the workflow associated with manual GO annotation at MGI, from literature collection to display of the annotations. Peer-reviewed literature is collected mostly from a set of journals available electronically. Selected articles are entered into a master bibliography and indexed to one of eight areas of interest such as 'GO' or 'homology' or 'phenotype'. Each article is then either indexed to a gene already contained in the database or funneled through a separate nomenclature database to add genes. The master bibliography and associated indexing provide information for various curator-reports such as 'papers selected for GO that refer to genes with NO GO annotation'. Once indexed, curators who have expertise in appropriate disciplines enter pertinent information. MGI makes use of several controlled vocabularies that ensure uniform data encoding, enable robust analysis and support the construction of complex queries. These vocabularies range from pick-lists to structured vocabularies such as the GO. All data associations are supported

  9. Proteomics of mouse liver microsomes: performance of different protein separation workflows for LC-MS/MS.

    PubMed

    Zgoda, Victor G; Moshkovskii, Sergei A; Ponomarenko, Elena A; Andreewski, Timofey V; Kopylov, Arthur T; Tikhonova, Olga V; Melnik, Stanislav A; Lisitsa, Andrei V; Archakov, Alexander I

    2009-08-01

    The mouse liver microsome proteome was investigated using ion trap MS combined with three separation workflows: SDS-PAGE followed by reverse-phase LC of in-gel protein digestions (519 proteins identified); 2-D LC of protein digestion (1410 proteins); and whole protein separation on an mRP heat-stable column followed by 2-D LC of protein digestions from each fraction (3-D LC; 3703 proteins). Workflows that identified more proteins showed lower run-to-run reproducibility. The gel-based method yielded a number of predicted membrane proteins similar to that of the LC-based workflows.

  10. High-volume workflow management in the ITN/FBI system

    NASA Astrophysics Data System (ADS)

    Paulson, Thomas L.

    1997-02-01

    The Identification Tasking and Networking (ITN) Federal Bureau of Investigation system will manage the processing of more than 70,000 submissions per day. The workflow manager controls the routing of each submission through a combination of automated and manual processing steps whose exact sequence is dynamically determined by the results at each step. For most submissions, one or more of the steps involve the visual comparison of fingerprint images. The ITN workflow manager is implemented within a scalable client/server architecture. The paper describes the key aspects of the ITN workflow manager design that allow the high volume of daily processing to be accomplished successfully.

  11. Robust Systems Test Framework

    2003-01-01

    The Robust Systems Test Framework (RSTF) provides a means of specifying and running test programs on various computation platforms. RSTF provides a level of specification above standard scripting languages. During a set of runs, standard timing information is collected. The RSTF specification can also gather job-specific information, and can include ways to classify test outcomes. All results and scripts can be stored into and retrieved from an SQL database for later data analysis. RSTF also provides operations for managing the script and result files, and for compiling applications and gathering compilation information such as optimization flags.

  12. Robust quantum spatial search

    NASA Astrophysics Data System (ADS)

    Tulsi, Avatar

    2016-07-01

    Quantum spatial search has been widely studied with most of the study focusing on quantum walk algorithms. We show that quantum walk algorithms are extremely sensitive to systematic errors. We present a recursive algorithm which offers significant robustness to certain systematic errors. To search N items, our recursive algorithm can tolerate errors of size $O(1/\sqrt{\ln N})$, which is exponentially better than quantum walk algorithms, for which the tolerable error size is only $O(\ln N/\sqrt{N})$. Also, our algorithm does not need any ancilla qubit. Thus our algorithm is much easier to implement experimentally compared to quantum walk algorithms.
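
    To give a feel for how far apart the two error tolerances quoted above are, the following minimal sketch evaluates both scaling laws for a few values of N. It is an illustration only: the constants hidden by the O-notation are assumed to be 1, and the function names are hypothetical rather than taken from the paper.

```python
import math


def recursive_tolerance(n: int) -> float:
    """Scaling of the tolerable error for the recursive algorithm: 1/sqrt(ln N).

    The prefactor is assumed to be 1; the abstract only gives the asymptotic form.
    """
    return 1.0 / math.sqrt(math.log(n))


def quantum_walk_tolerance(n: int) -> float:
    """Scaling of the tolerable error for quantum walk algorithms: ln N / sqrt(N).

    The prefactor is again assumed to be 1.
    """
    return math.log(n) / math.sqrt(n)


if __name__ == "__main__":
    for n in (10**3, 10**6, 10**9):
        rec = recursive_tolerance(n)
        walk = quantum_walk_tolerance(n)
        print(f"N={n:>10}  recursive~{rec:.4f}  walk~{walk:.6f}  ratio~{rec / walk:.1f}")
```

    Under these assumptions the ratio between the two tolerances grows quickly with N, which is the qualitative point the abstract makes.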

  13. Robust Kriged Kalman Filtering

    SciTech Connect

    Baingana, Brian; Dall'Anese, Emiliano; Mateos, Gonzalo; Giannakis, Georgios B.

    2015-11-11

    Although the kriged Kalman filter (KKF) has well-documented merits for prediction of spatial-temporal processes, its performance degrades in the presence of outliers due to anomalous events or measurement equipment failures. This paper proposes a robust KKF model that explicitly accounts for the presence of measurement outliers. Exploiting outlier sparsity, a novel ℓ1-regularized estimator is put forth that jointly predicts the spatial-temporal process at unmonitored locations while identifying measurement outliers. Numerical tests are conducted on a synthetic Internet protocol (IP) network and on real transformer load data. Test results corroborate the effectiveness of the novel estimator in joint spatial prediction and outlier identification.

  14. Robust Systems Test Framework

    SciTech Connect

    Ballance, Robert A.

    2003-01-01

    The Robust Systems Test Framework (RSTF) provides a means of specifying and running test programs on various computation platforms. RSTF provides a level of specification above standard scripting languages. During a set of runs, standard timing information is collected. The RSTF specification can also gather job-specific information, and can include ways to classify test outcomes. All results and scripts can be stored into and retrieved from an SQL database for later data analysis. RSTF also provides operations for managing the script and result files, and for compiling applications and gathering compilation information such as optimization flags.

  15. Robust telescope scheduling

    NASA Technical Reports Server (NTRS)

    Swanson, Keith; Bresina, John; Drummond, Mark

    1994-01-01

    This paper presents a technique for building robust telescope schedules that tend not to break. The technique is called Just-In-Case (JIC) scheduling and it implements the common sense idea of being prepared for likely errors, just in case they should occur. The JIC algorithm analyzes a given schedule, determines where it is likely to break, reinvokes a scheduler to generate a contingent schedule for each highly probable break case, and produces a 'multiply contingent' schedule. The technique was developed for an automatic telescope scheduling problem, and the paper presents empirical results showing that Just-In-Case scheduling performs extremely well for this problem.
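
    The four steps named in the abstract (analyze the schedule, find likely breaks, re-invoke the scheduler for each, collect the contingent branches) can be sketched as follows. This is a toy illustration, not the authors' implementation: it assumes that a "break" means an observation overrunning its nominal duration, and the greedy scheduler, the `Observation` fields and the 0.3 probability threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    name: str
    duration: float  # nominal duration in hours (hypothetical field)
    overrun: float   # extra time consumed if the observation runs long
    p_break: float   # assumed probability that it runs long


def greedy_schedule(candidates, time_left):
    """Toy scheduler: take candidates in the given order while they still fit."""
    plan = []
    for obs in candidates:
        if obs.duration <= time_left:
            plan.append(obs)
            time_left -= obs.duration
    return plan


def just_in_case(candidates, night_length, threshold=0.3):
    """Build a nominal schedule plus one contingent branch per likely break."""
    nominal = greedy_schedule(candidates, night_length)
    contingencies = {}
    elapsed = 0.0
    for i, obs in enumerate(nominal):
        elapsed += obs.duration
        if obs.p_break >= threshold:
            # Re-invoke the scheduler on everything not yet executed,
            # with the time that would remain after the overrun.
            done = {o.name for o in nominal[: i + 1]}
            pending = [o for o in candidates if o.name not in done]
            time_left = night_length - elapsed - obs.overrun
            contingencies[obs.name] = greedy_schedule(pending, time_left)
    return nominal, contingencies


if __name__ == "__main__":
    requests = [
        Observation("A", 2.0, 1.0, 0.5),
        Observation("B", 3.0, 0.0, 0.1),
        Observation("C", 2.0, 0.0, 0.1),
        Observation("D", 1.0, 0.0, 0.0),
    ]
    nominal, contingencies = just_in_case(requests, night_length=7.0)
    print([o.name for o in nominal])
    print({k: [o.name for o in v] for k, v in contingencies.items()})
```

    In this toy run, the branch prepared for an overrun of "A" drops "C" and pulls in the shorter "D", so the schedule already holds an alternative if that break occurs.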

  16. Robust Photon Locking

    SciTech Connect

    Bayer, T.; Wollenhaupt, M.; Sarpe-Tudoran, C.; Baumert, T.

    2009-01-16

    We experimentally demonstrate a strong-field coherent control mechanism that combines the advantages of photon locking (PL) and rapid adiabatic passage (RAP). Unlike earlier implementations of PL and RAP by pulse sequences or chirped pulses, we use shaped pulses generated by phase modulation of the spectrum of a femtosecond laser pulse with a generalized phase discontinuity. The novel control scenario is characterized by a high degree of robustness achieved via adiabatic preparation of a state of maximum coherence. Subsequent phase control allows for efficient switching among different target states. We investigate both properties by photoelectron spectroscopy on potassium atoms interacting with the intense shaped light field.

  17. Robust control for uncertain structures

    NASA Technical Reports Server (NTRS)

    Douglas, Joel; Athans, Michael

    1991-01-01

    Viewgraphs on robust control for uncertain structures are presented. Topics covered include: robust linear quadratic regulator (RLQR) formulas; mismatched LQR design; RLQR design; interpretations of RLQR design; disturbance rejection; and performance comparisons: RLQR vs. mismatched LQR.

  18. Eddy Covariance Method: Overview of General Guidelines and Conventional Workflow

    NASA Astrophysics Data System (ADS)

    Burba, G. G.; Anderson, D. J.; Amen, J. L.

    2007-12-01

    Atmospheric flux measurements are widely used to estimate water, heat, carbon dioxide and trace gas exchange between the ecosystem and the atmosphere. The Eddy Covariance method is one of the most direct, defensible ways to measure and calculate turbulent fluxes within the atmospheric boundary layer. However, the method is mathematically complex, and requires significant care to set up and process data. These reasons may be why the method is currently used predominantly by micrometeorologists. Modern instruments and software can potentially expand the use of this method beyond micrometeorology and prove valuable for plant physiology, hydrology, biology, ecology, entomology, and other non-micrometeorological areas of research. The main challenge of the method for a non-expert is the complexity of system design, implementation, and processing of the large volume of data. In the past several years, efforts of the flux networks (e.g., FluxNet, Ameriflux, CarboEurope, Fluxnet-Canada, Asiaflux, etc.) have led to noticeable progress in unification of the terminology and general standardization of processing steps. The methodology itself, however, is difficult to unify, because various experimental sites and different purposes of studies dictate different treatments, and site-, measurement- and purpose-specific approaches. Here we present an overview of theory and typical workflow of the Eddy Covariance method in a format specifically designed to (i) familiarize a non-expert with general principles, requirements, applications, and processing steps of the conventional Eddy Covariance technique, (ii) to assist in further understanding the method through more advanced references such as textbooks, network guidelines and journal papers, (iii) to help technicians, students and new researchers in the field deployment of the Eddy Covariance method, and (iv) to assist in its use beyond micrometeorology. The overview is based, to a large degree, on the frequently asked questions
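
    At the core of the method is the covariance between fluctuations of vertical wind speed and of the measured scalar: the flux is approximated by the time average of w'c', where the primes denote deviations from the averaging-period mean. The minimal sketch below computes only this raw covariance; the function name and sample values are hypothetical, and it deliberately omits the corrections and quality checks that a real processing workflow applies.

```python
from statistics import fmean


def eddy_flux(w, c):
    """Raw eddy covariance flux: mean of w' * c' over one averaging period.

    w: vertical wind speed samples; c: scalar samples (e.g. gas density),
    taken at the same instants. Units of the result are units(w) * units(c).
    """
    if len(w) != len(c) or len(w) == 0:
        raise ValueError("w and c must be non-empty and of equal length")
    w_mean = fmean(w)
    c_mean = fmean(c)
    # Reynolds decomposition: subtract the period mean to obtain fluctuations.
    products = [(wi - w_mean) * (ci - c_mean) for wi, ci in zip(w, c)]
    return fmean(products)


if __name__ == "__main__":
    w = [0.1, -0.1, 0.2, -0.2]   # made-up wind samples
    c = [1.0, 0.0, 2.0, -1.0]    # made-up scalar samples
    print(eddy_flux(w, c))
```

    A positive value here means that upward air motions tend to carry higher scalar values than downward ones, i.e. a net upward transport over the period.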

  19. DNA qualification workflow for next generation sequencing of histopathological samples.

    PubMed

    Simbolo, Michele; Gottardi, Marisa; Corbo, Vincenzo; Fassan, Matteo; Mafficini, Andrea; Malpeli, Giorgio; Lawlor, Rita T; Scarpa, Aldo

    2013-01-01

    Histopathological samples are a treasure-trove of DNA for clinical research. However, the quality of DNA can vary depending on the source or extraction method applied. Thus a standardized and cost-effective workflow for the qualification of DNA preparations is essential to guarantee interlaboratory reproducible results. The qualification process consists of the quantification of double strand DNA (dsDNA) and the assessment of its suitability for downstream applications, such as high-throughput next-generation sequencing. We tested the two most frequently used instruments to define their role in this process: NanoDrop, based on UV spectroscopy, and Qubit 2.0, which uses fluorochromes specifically binding dsDNA. Quantitative PCR (qPCR) was used as the reference technique as it simultaneously assesses DNA concentration and suitability for PCR amplification. We used 17 genomic DNAs from 6 fresh-frozen (FF) tissues, 6 formalin-fixed paraffin-embedded (FFPE) tissues, 3 cell lines, and 2 commercial preparations. Intra- and inter-operator variability was negligible, and intra-methodology variability was minimal, while consistent inter-methodology divergences were observed. In fact, NanoDrop measured DNA concentrations higher than Qubit and its consistency with dsDNA quantification by qPCR was limited to high molecular weight DNA from FF samples and cell lines, where total DNA and dsDNA quantity virtually coincide. In partially degraded DNA from FFPE samples, only Qubit proved highly reproducible and consistent with qPCR measurements. Multiplex PCR amplifying 191 regions of 46 cancer-related genes was designated the downstream application, using 40 ng dsDNA from FFPE samples calculated by Qubit. All but one sample produced amplicon libraries suitable for next-generation sequencing. NanoDrop UV-spectrum verified contamination of the unsuccessful sample. In conclusion, as qPCR has high costs and is labor intensive, an alternative effective standard workflow for

  1. Robust omniphobic surfaces

    PubMed Central

    Tuteja, Anish; Choi, Wonjae; Mabry, Joseph M.; McKinley, Gareth H.; Cohen, Robert E.

    2008-01-01

    Superhydrophobic surfaces display water contact angles greater than 150° in conjunction with low contact angle hysteresis. Microscopic pockets of air trapped beneath the water droplets placed on these surfaces lead to a composite solid-liquid-air interface in thermodynamic equilibrium. Previous experimental and theoretical studies suggest that it may not be possible to form similar fully-equilibrated, composite interfaces with drops of liquids, such as alkanes or alcohols, that possess significantly lower surface tension than water (γlv = 72.1 mN/m). In this work we develop surfaces possessing re-entrant texture that can support strongly metastable composite solid-liquid-air interfaces, even with very low surface tension liquids such as pentane (γlv = 15.7 mN/m). Furthermore, we propose four design parameters that predict the measured contact angles for a liquid droplet on a textured surface, as well as the robustness of the composite interface, based on the properties of the solid surface and the contacting liquid. These design parameters allow us to produce two different families of re-entrant surfaces (randomly-deposited electrospun fiber mats and precisely fabricated microhoodoo surfaces) that can each support a robust composite interface with essentially any liquid. These omniphobic surfaces display contact angles greater than 150° and low contact angle hysteresis with both polar and nonpolar liquids possessing a wide range of surface tensions. PMID:19001270

  2. Development of a user customizable imaging informatics-based intelligent workflow engine system to enhance rehabilitation clinical trials

    NASA Astrophysics Data System (ADS)

    Wang, Ximing; Martinez, Clarisa; Wang, Jing; Liu, Ye; Liu, Brent

    2014-03-01

    Clinical trials usually have a demand to collect, track and analyze multimedia data according to the workflow. Currently, the clinical trial data management requirements are normally addressed with custom-built systems. Challenges occur in the workflow design within different trials. The traditional pre-defined custom-built system is usually limited to a specific clinical trial and normally requires time-consuming and resource-intensive software development. To provide a solution, we present a user customizable imaging informatics-based intelligent workflow engine system for managing stroke rehabilitation clinical trials with intelligent workflow. The intelligent workflow engine provides flexibility in building and tailoring the workflow in various stages of clinical trials. By providing a solution to tailor and automate the workflow, the system will save time and reduce errors for clinical trials. Although our system is designed for clinical trials for rehabilitation, it may be extended to other imaging based clinical trials as well.

  3. A workflow model to analyse pediatric emergency overcrowding.

    PubMed

    Zgaya, Hayfa; Ajmi, Ines; Gammoudi, Lotfi; Hammadi, Slim; Martinot, Alain; Beuscart, Régis; Renard, Jean-Marie

    2014-01-01

    The greatest source of delay in patient flow is the waiting time from the health care request, and especially the bed request, to exit from the Pediatric Emergency Department (PED) for hospital admission. It represents 70% of the time that these patients spent in the PED waiting rooms. Our objective in this study is to identify tension indicators and bottlenecks that contribute to overcrowding. Patient flow mapping through the PED was carried out over a continuous 2-year period from January 2011 to December 2012. Our method uses the collected real data, based on actual visits made to the PED of the Regional University Hospital Center (CHRU) of Lille (France), to construct an accurate and complete representation of the PED processes. The result of this representation is a workflow model of the patient journey in the PED that represents the reality of the PED of the CHRU of Lille as faithfully as possible. This model allowed us to identify sources of delay in patient flow and aspects of the PED activity that could be improved. It must be sufficiently detailed to support an analysis that identifies the dysfunctions of the PED and to propose and estimate indicators for preventing tensions. Our survey is part of the French National Research Agency project titled "Hospital: optimization, simulation and avoidance of strain" (ANR HOST).

  4. Sensor-based architecture for medical imaging workflow analysis.

    PubMed

    Silva, Luís A Bastião; Campos, Samuel; Costa, Carlos; Oliveira, José Luis

    2014-08-01

    The growing use of computer systems in medical institutions has been generating a tremendous quantity of data. While these data have a critical role in assisting physicians in the clinical practice, the information that can be extracted goes far beyond this utilization. This article proposes a platform capable of assembling multiple data sources within a medical imaging laboratory, through a network of intelligent sensors. The proposed integration framework follows a SOA hybrid architecture based on an information sensor network, capable of collecting information from several sources in medical imaging laboratories. Currently, the system supports three types of sensors: DICOM repository meta-data, network workflows and examination reports. Each sensor is responsible for converting unstructured information from data sources into a common format that will then be semantically indexed in the framework engine. The platform was deployed in the Cardiology department of a central hospital, allowing identification of processes' characteristics and users' behaviours that were unknown before the utilization of this solution. PMID:24957389

  5. Clinical workflow for personalized foot pressure ulcer prevention.

    PubMed

    Bucki, M; Luboz, V; Perrier, A; Champion, E; Diot, B; Vuillerme, N; Payan, Y

    2016-09-01

    Foot pressure ulcers are a common complication of diabetes because of patients' lack of sensitivity due to neuropathy. Deep pressure ulcers appear internally when pressures applied on the foot create high internal strains near bony structures. Monitoring tissue strains in persons with diabetes is therefore important for efficient prevention. We propose to use personalized biomechanical foot models to assess strains within the foot and to determine the risk of ulcer formation. Our workflow generates a foot model adapted to a patient's morphology by deforming an atlas model to conform it to the contours of segmented medical images of the patient's foot. Our biomechanical model is composed of rigid bodies for the bones, joined by ligaments and muscles, and a finite element mesh representing the soft tissues. Using our registration algorithm to conform three datasets, three new patient models were created. After applying a pressure load below these foot models, the Von Mises equivalent strains and "cluster volumes" (i.e. volumes of contiguous elements with strains above a given threshold) were measured within eight functionally meaningful foot regions. The results show the variability of both location and strain values among the three considered patients. This study also confirms that the anatomy of the foot has an influence on the risk of pressure ulcer. PMID:27212210
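
    The "cluster volume" measure defined above (the volume of contiguous elements whose strain exceeds a threshold) amounts to a connected-components search over the mesh. The sketch below is one possible way to compute it, not the authors' code: the dictionary-based mesh representation, the function name and the sample numbers are all assumptions made for illustration.

```python
from collections import deque


def cluster_volumes(strain, volume, neighbors, threshold):
    """Return the volumes of contiguous groups of elements above `threshold`.

    strain:    {element_id: equivalent strain}
    volume:    {element_id: element volume}
    neighbors: {element_id: iterable of adjacent element ids}
    The result is sorted from largest to smallest cluster.
    """
    hot = {e for e, s in strain.items() if s > threshold}
    seen = set()
    clusters = []
    for start in sorted(hot):
        if start in seen:
            continue
        # Breadth-first search over adjacent above-threshold elements.
        seen.add(start)
        queue = deque([start])
        total = 0.0
        while queue:
            elem = queue.popleft()
            total += volume[elem]
            for nb in neighbors.get(elem, ()):
                if nb in hot and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        clusters.append(total)
    return sorted(clusters, reverse=True)


if __name__ == "__main__":
    # Five elements in a chain 0-1-2-3-4 with made-up strains and volumes.
    strain = {0: 0.30, 1: 0.25, 2: 0.05, 3: 0.40, 4: 0.10}
    volume = {0: 1.0, 1: 1.5, 2: 1.0, 3: 0.5, 4: 1.0}
    neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
    print(cluster_volumes(strain, volume, neighbors, threshold=0.2))
```

    Elements 0 and 1 form one cluster and element 3 another, because element 2 lies below the threshold and separates them.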

  6. A workflow model to analyse pediatric emergency overcrowding.

    PubMed

    Zgaya, Hayfa; Ajmi, Ines; Gammoudi, Lotfi; Hammadi, Slim; Martinot, Alain; Beuscart, Régis; Renard, Jean-Marie

    2014-01-01

    The greatest source of delay in patient flow is the waiting time from the health care request, and especially the bed request, to exit from the Pediatric Emergency Department (PED) for hospital admission. It represents 70% of the time that these patients spent in the PED waiting rooms. Our objective in this study is to identify tension indicators and bottlenecks that contribute to overcrowding. Patient flow mapping through the PED was carried out over a continuous 2-year period from January 2011 to December 2012. Our method uses the collected real data, based on actual visits made to the PED of the Regional University Hospital Center (CHRU) of Lille (France), to construct an accurate and complete representation of the PED processes. The result of this representation is a workflow model of the patient journey in the PED that represents the reality of the PED of the CHRU of Lille as faithfully as possible. This model allowed us to identify sources of delay in patient flow and aspects of the PED activity that could be improved. It must be sufficiently detailed to support an analysis that identifies the dysfunctions of the PED and to propose and estimate indicators for preventing tensions. Our survey is part of the French National Research Agency project titled "Hospital: optimization, simulation and avoidance of strain" (ANR HOST). PMID:25160202

  7. AI and workflow automation: The prototype electronic purchase request system

    NASA Technical Reports Server (NTRS)

    Compton, Michael M.; Wolfe, Shawn R.

    1994-01-01

    Automating 'paper' workflow processes with electronic forms and email can dramatically improve the efficiency of those processes. However, applications that involve complex forms that are used for a variety of purposes or that require numerous and varied approvals often require additional software tools to ensure that (1) the electronic form is correctly and completely filled out, and (2) the form is routed to the proper individuals and organizations for approval. The prototype electronic purchase request (PEPR) system, which has been in pilot use at NASA Ames Research Center since December 1993, seamlessly links a commercial electronic forms package and a CLIPS-based knowledge system that first ensures that electronic forms are correct and complete, and then generates an 'electronic routing slip' that is used to route the form to the people who must sign it. The PEPR validation module is context-sensitive, and can apply different validation rules at each step in the approval process. The PEPR system is form-independent, and has been applied to several different types of forms. The system employs a version of CLIPS that has been extended to support AppleScript, a recently released scripting language for the Macintosh. This 'scriptability' provides both a transparent, flexible interface between the two programs and a means by which a single copy of the knowledge base can be utilized by numerous remote users.

  8. Robust reflective pupil slicing technology

    NASA Astrophysics Data System (ADS)

    Meade, Jeffrey T.; Behr, Bradford B.; Cenko, Andrew T.; Hajian, Arsen R.

    2014-07-01

    Tornado Spectral Systems (TSS) has developed the High Throughput Virtual Slit (HTVS™), robust all-reflective pupil slicing technology capable of replacing the slit in research-, commercial- and MIL-SPEC-grade spectrometer systems. In the simplest configuration, the HTVS allows optical designers to remove the lossy slit from point-source spectrometers and widen the input slit of long-slit spectrometers, greatly increasing throughput without loss of spectral resolution or cross-dispersion information. The HTVS works by transferring etendue between image plane axes but operating in the pupil domain rather than at a focal plane. While useful for other technologies, this is especially relevant for spectroscopic applications by performing the same spectral narrowing as a slit without throwing away light on the slit aperture. HTVS can be implemented in all-reflective designs and only requires a small number of reflections for significant spectral resolution enhancement; HTVS systems can be efficiently implemented in most wavelength regions. The etendue-shifting operation also provides smooth scaling with input spot/image size without requiring reconfiguration for different targets (such as different seeing disk diameters or different fiber core sizes). Like most slicing technologies, HTVS provides throughput increases of several times without resolution loss over equivalent slit-based designs. HTVS technology enables robust slit replacement in point-source spectrometer systems. By virtue of pupil-space operation, this technology has several advantages over comparable image-space slicer technology, including the ability to adapt gracefully and linearly to changing source size and better vertical packing of the flux distribution. Additionally, this technology can be implemented with large slicing factors in both fast and slow beams and can easily scale from large, room-sized spectrometers through to small, telescope-mounted devices. Finally, this same technology is directly

  9. A framework for service enterprise workflow simulation with multi-agents cooperation

    NASA Astrophysics Data System (ADS)

    Tan, Wenan; Xu, Wei; Yang, Fujun; Xu, Lida; Jiang, Chuanqun

    2013-11-01

    Dynamic process modelling for service business is the key technique for service-oriented information systems and service business management, and the workflow model of business processes is the core part of service systems. Service business workflow simulation is the prevalent approach for dynamic analysis of service business processes. The generic method for service business workflow simulation is based on discrete-event queuing theory, which lacks flexibility and scalability. In this paper, we propose a service workflow-oriented framework for the process simulation of service businesses using multi-agent cooperation to address the above issues. The social rationality of agents is introduced into the proposed framework. By adopting rationality as one social factor in decision-making strategies, flexible scheduling of activity instances has been implemented. A system prototype has been developed to validate the proposed simulation framework through a business case study.

  10. Task-driven equipment inspection system based on safe workflow model

    NASA Astrophysics Data System (ADS)

    Guo, Xinyou; Liu, Yangguang

    2010-12-01

    An equipment inspection system is one that contains a number of equipment queues served in cyclic order. To satisfy the multi-task scheduling and multi-task combination requirements of equipment inspection systems, we propose a model based on inspection workflow. On the one hand, the model organizes all kinds of equipment according to the inspection workflow, elemental work units according to inspection tasks, and combination elements according to the tasks defined by users. We propose a 3-dimensional workflow model for equipment inspection systems comprising an organization sub-model, a process sub-model and a data sub-model. On the other hand, the model is based on security authorization, which is defined by the relations between roles, tasks, pre-defined business workflows and inspection data. The system based on the proposed framework is safe and efficient. Our implementation shows that the system is easy to operate and manage in terms of basic performance.

  11. Sample Prep, Workflow Automation and Nucleic Acid Fractionation for Next Generation Sequencing

    SciTech Connect

    Roskey, Mark

    2010-06-03

    Mark Roskey of Caliper LifeSciences discusses how the company's technologies fit into the next generation sequencing workflow on June 3, 2010, at the "Sequencing, Finishing, Analysis in the Future" meeting in Santa Fe, NM.

  12. Visual compression of workflow visualizations with automated detection of macro motifs.

    PubMed

    Maguire, Eamonn; Rocca-Serra, Philippe; Sansone, Susanna-Assunta; Davies, Jim; Chen, Min

    2013-12-01

    This paper is concerned with the creation of 'macros' in workflow visualization as a support tool to increase the efficiency of data curation tasks. We propose computation of candidate macros based on their usage in large collections of workflows in data repositories. We describe an efficient algorithm for extracting macro motifs from workflow graphs. We discovered that the state transition information, used to identify macro candidates, characterizes the structural pattern of the macro and can be harnessed as part of the visual design of the corresponding macro glyph. This facilitates partial automation and consistency in glyph design applicable to a large set of macro glyphs. We tested this approach against a repository of biological data holding some 9,670 workflows and found that the algorithmically generated candidate macros are in keeping with domain expert expectations.
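
    The abstract above does not spell out the motif-extraction algorithm. As a rough illustration of the general idea only (counting recurring sub-structures across a collection of workflow graphs), the following sketch enumerates short linear chains of node labels and keeps those that occur in several workflows. The function names, the fixed chain length and the `min_support` threshold are all assumptions made for this example, not the authors' method.

```python
from collections import Counter


def linear_motifs(edges, labels, length=3):
    """Yield the label sequence of every directed simple path with `length` nodes."""
    successors = {}
    for src, dst in edges:
        successors.setdefault(src, []).append(dst)

    def walk(path):
        if len(path) == length:
            yield tuple(labels[node] for node in path)
            return
        for nxt in successors.get(path[-1], []):
            if nxt not in path:  # keep paths simple (no repeated nodes)
                yield from walk(path + [nxt])

    for node in labels:
        yield from walk([node])


def candidate_macros(workflows, length=3, min_support=2):
    """Return motifs found in at least `min_support` distinct workflows."""
    support = Counter()
    for edges, labels in workflows:
        # A set ensures each workflow counts a motif at most once.
        support.update(set(linear_motifs(edges, labels, length)))
    return {motif: n for motif, n in support.items() if n >= min_support}


# Two toy workflows (hypothetical step names).
wf_a = ([(1, 2), (2, 3), (3, 4)],
        {1: "sample", 2: "extract", 3: "label", 4: "scan"})
wf_b = ([(1, 2), (2, 3)],
        {1: "sample", 2: "extract", 3: "label"})

macros = candidate_macros([wf_a, wf_b])
print(macros)
```

    Only linear chains are counted here; branching or merging motifs would need a more general subgraph-matching step.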

  13. Workflow Concerns and Workarounds of Readers in an Urban Safety Net Teleretinal Screening Study

    PubMed Central

    Fish, Allison; George, Sheba; Terrien, Elizabeth; Eccles, Alicia; Baker, Richard; Ogunyemi, Omolola

    2011-01-01

    Telemedicine holds great promise for increased access to specialty care services for safety net clinic patients. However, the adoption of these technologies is not a seamless transition for clinicians working in resource-poor settings. Previous research has analyzed workflow issues that arise in primary care settings when adopting telehealth tools but has not examined the unique workflow challenges facing specialists who provide assessments to safety net clinics. Findings are presented from a case study that employed qualitative methodologies as part of an assessment of a teleretinal screening program in Los Angeles urban safety net clinics. The program utilizes external ophthalmologists to perform retinal readings. The case study provides insights into how difficulties that arise in reader workflow are resolved and identifies unique factors requiring consideration when highly trained specialists perform teleretinal readings. The discussion outlines important issues to address when developing telehealth workflow protocols for the safety net, specifically, and their broader applicability in telemedicine. PMID:22195095

  14. Re-engineering opportunities in clinical research using workflow analysis in community practice settings.

    PubMed

    Khan, Sharib A; Kukafka, Rita; Bigger, J Thomas; Johnson, Stephen B

    2008-11-06

    In this paper we examine frequently performed clinical research activities with the objective of identifying aspects of workflow that could be amenable to informatics-based re-engineering. This paper is part of a series of studies under the NIH Roadmap initiative, which examines workflow of clinical research in community practices. We describe three common work activities, detailing the main actors involved, the tools used and the challenges faced. These activities illustrate inefficiencies in the clinical research workflow which include: a) lack of supporting tools to perform routine work activities, b) redundancy, low reuse of data and poor interoperability between systems and c) the fragmented and distributed nature of the workflow. We identify opportunities for re-engineering at both a micro (activity) and macro level (organization).

  15. A balanced evaluation perspective: picture archiving and communication system impacts on hospital workflow.

    PubMed

    van de Wetering, Rogier; Batenburg, Ronald; Versendaal, Johan; Lederman, Reeva; Firth, Lucy

    2006-01-01

    Around the world, hospitals are faced with both budget and regulatory pressures, forcing them to re-examine the way clinical practice is carried out. Proposed technologies that provide workflow enhancements include Picture Archiving and Communications Systems (PACS); however, is PACS really effective in improving hospital workflow and the flow onto patient care, and how should this be evaluated? An acknowledged and successful approach for organizational evaluation is the Balanced Scorecard (BSC), providing the fundamental features for assessing organizations from various perspectives. In this research, the impact of PACS on the workflow of a large public hospital in Melbourne, Australia, is examined using an adapted version of the BSC. Empirically, this model was applied as an evaluation instrument through a series of in-depth interviews with PACS users. Results show that PACS did improve hospital workflow considerably and that the organizational alignment of PACS in hospitals is an important critical success factor. PMID:16763932

  16. A mixed methods approach for measuring the impact of delivery-centric interventions on clinician workflow.

    PubMed

    Cady, Rhonda G; Finkelstein, Stanley M

    2012-01-01

    Health interventions vary widely. Pharmaceuticals, medical devices and wellness promotion are defined as 'outcome-centric.' They are implemented by clinicians for the use and benefit of consumers, and intervention effectiveness is measured by a change in health outcome. Electronic health records, computerized physician order entry systems and telehealth technologies are defined as 'delivery-centric.' They are implemented by organizations for use by clinicians to manage and facilitate consumer health, and the impact of these interventions on clinician workflow has become increasingly important. The methodological framework introduced in this paper uses a two-phase sequential mixed methods design that qualitatively explores clinician workflow before and after implementation of a delivery-centric intervention, and uses this information to quantitatively measure changes to workflow activities. The mixed methods protocol provides a standardized approach for understanding and determining the impact of delivery-centric interventions on clinician workflow. PMID:23304393

  17. Evolving Robust Gene Regulatory Networks

    PubMed Central

    Noman, Nasimul; Monjo, Taku; Moscato, Pablo; Iba, Hitoshi

    2015-01-01

    Design and implementation of robust network modules is essential for construction of complex biological systems through hierarchical assembly of ‘parts’ and ‘devices’. The robustness of gene regulatory networks (GRNs) is ascribed chiefly to the underlying topology. The ability to automatically design GRN topologies that exhibit robust behavior could dramatically change current practice in synthetic biology. A recent study shows that Darwinian evolution can gradually develop higher topological robustness. Building on this, this work presents an evolutionary algorithm that simulates natural evolution in silico to identify network topologies that are robust to perturbations. We present a Monte Carlo based method for quantifying topological robustness and design a fitness approximation approach for efficient calculation of topological robustness, which is computationally very intensive. The proposed framework was verified using two classic GRN behaviors, oscillation and bistability, although the framework is generalized for evolving other types of responses. The algorithm identified robust GRN architectures, which were verified using different analyses and comparisons. Analysis of the results also sheds light on the relationship among robustness, cooperativity and complexity. This study also shows that nature has already evolved very robust architectures for its crucial systems; hence simulation of this natural process can be very valuable for designing robust biological systems. PMID:25616055
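
    To make the idea of a Monte Carlo robustness estimate concrete, here is a minimal sketch under stated assumptions. It is not the authors' method: the one-gene positive-autoregulation model, the log-uniform perturbation scheme and all names (`is_bistable`, `monte_carlo_robustness`, `fold`) are chosen purely for illustration. Robustness is taken to be the fraction of random parameter perturbations under which the target behavior (here, bistability) is retained.

```python
import random


def is_bistable(b, d, k):
    """Toy model dx/dt = b*x^2 / (k^2 + x^2) - d*x.

    Besides x = 0, steady states solve d*x^2 - b*x + d*k^2 = 0, which has
    two positive roots (the larger one stable) exactly when b > 2*d*k.
    """
    return b > 2 * d * k


def monte_carlo_robustness(behaves, nominal, fold=2.0, samples=10_000, seed=0):
    """Estimate the fraction of perturbed parameter sets that keep the behavior.

    Every parameter is multiplied by a factor drawn log-uniformly from
    [1/fold, fold]; `behaves` is called with the perturbed parameters.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        perturbed = {name: value * fold ** rng.uniform(-1.0, 1.0)
                     for name, value in nominal.items()}
        if behaves(**perturbed):
            hits += 1
    return hits / samples


# Hypothetical parameter sets: far from, and close to, the bistability boundary.
robust = monte_carlo_robustness(is_bistable, {"b": 20.0, "d": 1.0, "k": 1.0})
fragile = monte_carlo_robustness(is_bistable, {"b": 2.5, "d": 1.0, "k": 1.0})
print(robust, fragile)
```

    In an evolutionary search, a score of this kind could serve as (part of) a fitness value. Because each evaluation needs many samples, the abstract's point about approximating fitness to save computation follows naturally.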

  18. BioFlow: a web based workflow management software for design and execution of genomics pipelines

    PubMed Central

    2014-01-01

    Background: Bioinformatics data analysis is usually done sequentially by chaining together multiple tools. These chains are created by writing scripts and tracking the inputs and outputs of all stages. Writing such scripts requires programming skills. Executing multiple pipelines in parallel and keeping track of all the generated files is difficult and error prone. Checking results and task completion requires users to log in remotely to their servers and run commands to identify process status. Users would benefit from a web-based tool that allows creation and execution of pipelines remotely. The tool should also keep track of all the files generated and maintain a history of user activities. Results: A software tool for building and executing workflows is described here. The individual tools in the workflows can be any command-line executable or script. The software has an intuitive mechanism for adding new tools to be used in workflows. It contains a workflow designer where workflows can be created by visually connecting various components. Workflows are executed by job runners. The outputs and the job history are saved. The tool is web-based, and all actions can be performed remotely. Conclusions: Users without scripting knowledge can use the tool to build pipelines for executing tasks. Pipelines can be modeled as workflows that are reusable. BioFlow enables users to easily add new tools to the database. The workflows can be created and executed remotely. A number of parallel jobs can be easily controlled. Distributed execution is possible by running multiple instances of the application. Any number of tasks can be executed and the output will be stored, making it easy to correlate the outputs with the jobs executed.
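
    The kind of hand-written chaining script that the abstract describes can be sketched in a few lines. This is an illustrative example only and says nothing about how BioFlow itself is implemented: `run_pipeline` and the two stand-in "tools" (small Python one-liners launched through the current interpreter so that the example runs anywhere) are hypothetical.

```python
import subprocess
import sys


def run_pipeline(steps, data=""):
    """Run each command in order, feeding its stdout to the next step's stdin.

    Returns the final output together with a simple job history.
    """
    history = []
    for name, command in steps:
        result = subprocess.run(command, input=data, capture_output=True,
                                text=True, check=True)  # raises if a step fails
        history.append({"step": name, "returncode": result.returncode})
        data = result.stdout
    return data, history


# Stand-ins for real command-line tools.
uppercase = [sys.executable, "-c",
             "import sys; sys.stdout.write(sys.stdin.read().upper())"]
reverse = [sys.executable, "-c",
           "import sys; sys.stdout.write(sys.stdin.read()[::-1])"]

output, history = run_pipeline([("uppercase", uppercase), ("reverse", reverse)],
                               "acgt")
print(output, history)
```

    Even this small script has to track inputs, outputs and exit status by hand, which is the bookkeeping the abstract argues a web-based workflow tool should take over.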

  19. Robustness in Digital Hardware

    NASA Astrophysics Data System (ADS)

    Woods, Roger; Lightbody, Gaye

    The growth in electronics has probably been the equivalent of the Industrial Revolution in the past century in terms of how much it has transformed our daily lives. There is a great dependency on technology whether it is in the devices that control travel (e.g., in aircraft or cars), our entertainment and communication systems, or our interaction with money, which has been empowered by the onset of Internet shopping and banking. Despite this reliance, there is still a danger that at some stage devices will fail within the equipment's lifetime. The purpose of this chapter is to look at the factors causing failure and address possible measures to improve robustness in digital hardware technology and specifically chip technology, giving a long-term forecast that will not reassure the reader!

  20. Robust springback compensation

    NASA Astrophysics Data System (ADS)

    Carleer, Bart; Grimm, Peter

    2013-12-01

    Springback simulation and springback compensation are increasingly applied in productive use in die engineering. To compensate a tool successfully, accurate springback results are needed as well as an effective compensation approach. In this paper, a methodology is introduced to compensate tools effectively. The first step is the full process simulation, meaning that not only the drawing operation is simulated but also all secondary operations such as trimming and flanging. The second step is to verify whether the process is robust, meaning that it yields repeatable results. To compensate effectively, a minimum clamping concept is then defined. Once these preconditions are fulfilled, the tools can be compensated effectively.

  1. An end-to-end workflow for engineering of biological networks from high-level specifications.

    PubMed

    Beal, Jacob; Weiss, Ron; Densmore, Douglas; Adler, Aaron; Appleton, Evan; Babb, Jonathan; Bhatia, Swapnil; Davidsohn, Noah; Haddock, Traci; Loyall, Joseph; Schantz, Richard; Vasilev, Viktor; Yaman, Fusun

    2012-08-17

    We present a workflow for the design and production of biological networks from high-level program specifications. The workflow is based on a sequence of intermediate models that incrementally translate high-level specifications into DNA samples that implement them. We identify algorithms for translating between adjacent models and implement them as a set of software tools, organized into a four-stage toolchain: Specification, Compilation, Part Assignment, and Assembly. The specification stage begins with a Boolean logic computation specified in the Proto programming language. The compilation stage uses a library of network motifs and cellular platforms, also specified in Proto, to transform the program into an optimized Abstract Genetic Regulatory Network (AGRN) that implements the programmed behavior. The part assignment stage assigns DNA parts to the AGRN, drawing the parts from a database for the target cellular platform, to create a DNA sequence implementing the AGRN. Finally, the assembly stage computes an optimized assembly plan to create the DNA sequence from available part samples, yielding a protocol for producing a sample of engineered plasmids with robotics assistance. Our workflow is the first to automate the production of biological networks from a high-level program specification. Furthermore, the workflow's modular design allows the same program to be realized on different cellular platforms simply by swapping workflow configurations. We validated our workflow by specifying a small-molecule sensor-reporter program and verifying the resulting plasmids in both HEK 293 mammalian cells and in E. coli bacterial cells. PMID:23651286

  2. Quantitative ethnographic study of physician workflow and interactions with electronic health record systems

    PubMed Central

    Asan, Onur; Chiou, Erin; Montague, Enid

    2014-01-01

    This study explores the relationship between primary care physicians’ interactions with health information technology and primary care workflow. Clinical encounters were recorded with high-resolution video cameras to capture physicians’ workflow and interaction with two objects of interest, the electronic health record (EHR) system, and their patient. To analyze the data, a coding scheme was developed based on a validated list of primary care tasks to define the presence or absence of a task, the time spent on each task, and the sequence of tasks. Results revealed divergent workflows and significant differences between physicians’ EHR use surrounding common workflow tasks: gathering information, documenting information, and recommend/discuss treatment options. These differences suggest impacts of EHR use on primary care workflow, and capture types of workflows that can be used to inform future studies with larger sample sizes for more effective designs of EHR systems in primary care clinics. Future research on this topic and design strategies for effective health information technology in primary care are discussed. PMID:26279597
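
    The three quantities the coding scheme is said to capture (presence or absence of a task, time spent on each task, and the sequence of tasks) can be derived from time-stamped codes. The sketch below assumes a hypothetical record format of `(task, start_seconds, end_seconds)` tuples; the task names and timings are invented for illustration and are not data from the study.

```python
from collections import defaultdict


def summarize_encounter(events):
    """Summarize coded video events given as (task, start_seconds, end_seconds)."""
    ordered = sorted(events, key=lambda event: event[1])  # order by start time
    time_spent = defaultdict(float)
    for task, start, end in ordered:
        time_spent[task] += end - start
    return {
        "present": set(time_spent),                      # which tasks occurred
        "time_spent": dict(time_spent),                  # seconds per task
        "sequence": [task for task, _, _ in ordered],    # order of occurrence
    }


# Hypothetical codes for one encounter.
events = [
    ("documenting information", 120, 300),
    ("gathering information", 0, 120),
    ("gathering information", 300, 360),
]
summary = summarize_encounter(events)
print(summary)
```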

  3. Digitization workflows for flat sheets and packets of plants, algae, and fungi.

    PubMed

    Nelson, Gil; Sweeney, Patrick; Wallace, Lisa E; Rabeler, Richard K; Allard, Dorothy; Brown, Herrick; Carter, J Richard; Denslow, Michael W; Ellwood, Elizabeth R; Germain-Aubrey, Charlotte C; Gilbert, Ed; Gillespie, Emily; Goertzen, Leslie R; Legler, Ben; Marchant, D Blaine; Marsico, Travis D; Morris, Ashley B; Murrell, Zack; Nazaire, Mare; Neefus, Chris; Oberreiter, Shanna; Paul, Deborah; Ruhfel, Brad R; Sasek, Thomas; Shaw, Joey; Soltis, Pamela S; Watson, Kimberly; Weeks, Andrea; Mast, Austin R

    2015-09-01

    Effective workflows are essential components in the digitization of biodiversity specimen collections. To date, no comprehensive, community-vetted workflows have been published for digitizing flat sheets and packets of plants, algae, and fungi, even though latest estimates suggest that only 33% of herbarium specimens have been digitally transcribed, 54% of herbaria use a specimen database, and 24% are imaging specimens. In 2012, iDigBio, the U.S. National Science Foundation's (NSF) coordinating center and national resource for the digitization of public, nonfederal U.S. collections, launched several working groups to address this deficiency. Here, we report the development of 14 workflow modules with 7-36 tasks each. These workflows represent the combined work of approximately 35 curators, directors, and collections managers representing more than 30 herbaria, including 15 NSF-supported plant-related Thematic Collections Networks and collaboratives. The workflows are provided for download as Portable Document Format (PDF) and Microsoft Word files. Customization of these workflows for specific institutional implementation is encouraged. PMID:26421256

  4. Digitization workflows for flat sheets and packets of plants, algae, and fungi

    PubMed Central

    Nelson, Gil; Sweeney, Patrick; Wallace, Lisa E.; Rabeler, Richard K.; Allard, Dorothy; Brown, Herrick; Carter, J. Richard; Denslow, Michael W.; Ellwood, Elizabeth R.; Germain-Aubrey, Charlotte C.; Gilbert, Ed; Gillespie, Emily; Goertzen, Leslie R.; Legler, Ben; Marchant, D. Blaine; Marsico, Travis D.; Morris, Ashley B.; Murrell, Zack; Nazaire, Mare; Neefus, Chris; Oberreiter, Shanna; Paul, Deborah; Ruhfel, Brad R.; Sasek, Thomas; Shaw, Joey; Soltis, Pamela S.; Watson, Kimberly; Weeks, Andrea; Mast, Austin R.

    2015-01-01

    Effective workflows are essential components in the digitization of biodiversity specimen collections. To date, no comprehensive, community-vetted workflows have been published for digitizing flat sheets and packets of plants, algae, and fungi, even though latest estimates suggest that only 33% of herbarium specimens have been digitally transcribed, 54% of herbaria use a specimen database, and 24% are imaging specimens. In 2012, iDigBio, the U.S. National Science Foundation’s (NSF) coordinating center and national resource for the digitization of public, nonfederal U.S. collections, launched several working groups to address this deficiency. Here, we report the development of 14 workflow modules with 7–36 tasks each. These workflows represent the combined work of approximately 35 curators, directors, and collections managers representing more than 30 herbaria, including 15 NSF-supported plant-related Thematic Collections Networks and collaboratives. The workflows are provided for download as Portable Document Format (PDF) and Microsoft Word files. Customization of these workflows for specific institutional implementation is encouraged. PMID:26421256

  7. Data and Communications in Basic Energy Sciences: Creating a Pathway for Scientific Discovery

    SciTech Connect

    Nugent, Peter E.; Simonson, J. Michael

    2011-10-24

    This report is based on the Department of Energy (DOE) Workshop on “Data and Communications in Basic Energy Sciences: Creating a Pathway for Scientific Discovery” that was held at the Bethesda Marriott in Maryland on October 24-25, 2011. The workshop brought together leading researchers from the Basic Energy Sciences (BES) facilities and Advanced Scientific Computing Research (ASCR). The workshop was co-sponsored by these two Offices to identify opportunities and needs for data analysis, ownership, storage, mining, provenance and data transfer at light sources, neutron sources, microscopy centers and other facilities. Their charge was to identify current and anticipated issues in the acquisition, analysis, communication and storage of experimental data that could impact the progress of scientific discovery, ascertain what knowledge, methods and tools are needed to mitigate present and projected shortcomings and to create the foundation for information exchanges and collaboration between ASCR and BES supported researchers and facilities. The workshop was organized in the context of the impending data tsunami that will be produced by DOE’s BES facilities. Current facilities, like SLAC National Accelerator Laboratory’s Linac Coherent Light Source, can produce up to 18 terabytes (TB) per day, while upgraded detectors at Lawrence Berkeley National Laboratory’s Advanced Light Source will generate ~10TB per hour. The expectation is that these rates will increase by over an order of magnitude in the coming decade. The urgency to develop new strategies and methods in order to stay ahead of this deluge and extract the most science from these facilities was recognized by all. The four focus areas addressed in this workshop were: Workflow Management - Experiment to Science: Identifying and managing the data path from experiment to publication. Theory and Algorithms: Recognizing the need for new tools for computation at scale, supporting large data sets and realistic

  8. The BEL information extraction workflow (BELIEF): evaluation in the BioCreative V BEL and IAT track

    PubMed Central

    Madan, Sumit; Hodapp, Sven; Senger, Philipp; Ansari, Sam; Szostak, Justyna; Hoeng, Julia; Peitsch, Manuel; Fluck, Juliane

    2016-01-01

    Network-based approaches have become extremely important in systems biology to achieve a better understanding of biological mechanisms. For network representation, the Biological Expression Language (BEL) is well designed to collate findings from the scientific literature into biological network models. To facilitate encoding and biocuration of such findings in BEL, a BEL Information Extraction Workflow (BELIEF) was developed. BELIEF provides a web-based curation interface, the BELIEF Dashboard, that incorporates text mining techniques to support the biocurator in the generation of BEL networks. The underlying UIMA-based text mining pipeline (BELIEF Pipeline) uses several named entity recognition processes and relationship extraction methods to detect concepts and BEL relationships in literature. The BELIEF Dashboard allows easy curation of the automatically generated BEL statements and their context annotations. Resulting BEL statements and their context annotations can be syntactically and semantically verified to ensure consistency in the BEL network. In summary, the workflow supports experts in different stages of systems biology network building. Based on the BioCreative V BEL track evaluation, we show that the BELIEF Pipeline automatically extracts relationships with an F-score of 36.4% and fully correct statements can be obtained with an F-score of 30.8%. Participation in the BioCreative V Interactive task (IAT) track with BELIEF revealed a systems usability scale (SUS) of 67. Considering the complexity of the task for new users—learning BEL, working with a completely new interface, and performing complex curation—a score so close to the overall SUS average highlights the usability of BELIEF. Database URL: BELIEF is available at http://www.scaiview.com/belief/ PMID:27694210
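
    For readers unfamiliar with the two metrics quoted above, the following sketch shows how they are conventionally computed. It assumes that the reported F-score is the usual balanced F1 (the abstract does not say) and uses the standard System Usability Scale scoring rule. The inputs in the example are made up and are not the BELIEF evaluation data.

```python
def f_score(precision, recall, beta=1.0):
    """F-beta score; beta=1 gives the harmonic mean of precision and recall."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def sus_score(responses):
    """Standard SUS scoring for ten responses on a 1-5 scale.

    Odd-numbered items contribute (response - 1), even-numbered items
    contribute (5 - response); the sum is scaled by 2.5 to a 0-100 range.
    """
    if len(responses) != 10:
        raise ValueError("SUS needs exactly 10 responses")
    total = 0
    for item, response in enumerate(responses, start=1):
        total += (response - 1) if item % 2 else (5 - response)
    return total * 2.5


# Made-up inputs, for illustration only.
print(f_score(0.5, 0.25))
print(sus_score([4, 2, 4, 2, 4, 2, 4, 2, 4, 2]))
```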

  10. DIANA-microT web server v5.0: service integration into miRNA functional analysis workflows

    PubMed Central

    Paraskevopoulou, Maria D.; Georgakilas, Georgios; Kostoulas, Nikos; Vlachos, Ioannis S.; Vergoulis, Thanasis; Reczko, Martin; Filippidis, Christos; Dalamagas, Theodore; Hatzigeorgiou, A.G.

    2013-01-01

    MicroRNAs (miRNAs) are small endogenous RNA molecules that regulate gene expression through mRNA degradation and/or translation repression, affecting many biological processes. DIANA-microT web server (http://www.microrna.gr/webServer) is dedicated to miRNA target prediction/functional analysis, and it has been widely used by the scientific community since its initial launch in 2009. DIANA-microT v5.0, the new version of the microT server, has been significantly enhanced with an improved target prediction algorithm, DIANA-microT-CDS. It has been updated to incorporate miRBase version 18 and Ensembl version 69. The in silico-predicted miRNA–gene interactions in Homo sapiens, Mus musculus, Drosophila melanogaster and Caenorhabditis elegans exceed 11 million in total. The web server was completely redesigned to host a series of sophisticated workflows, which can be used directly from the on-line web interface, enabling users without the necessary bioinformatics infrastructure to perform advanced multi-step functional miRNA analyses. For instance, one available pipeline performs miRNA target prediction using different thresholds and meta-analysis statistics, followed by pathway enrichment analysis. DIANA-microT web server v5.0 also supports a complete integration with the Taverna Workflow Management System (WMS), using the in-house developed DIANA-Taverna Plug-in. This plug-in provides ready-to-use modules for miRNA target prediction and functional analysis, which can be used to form advanced high-throughput analysis pipelines. PMID:23680784
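
    The abstract does not state which statistic the server uses for pathway enrichment. A common choice for this kind of analysis is a one-sided hypergeometric (over-representation) test, sketched below as an assumption rather than a description of DIANA-microT. The function name and the toy gene identifiers are hypothetical.

```python
from math import comb


def enrichment_p_value(universe, pathway, targets):
    """One-sided hypergeometric p-value for pathway over-representation.

    Probability of seeing at least the observed overlap between `targets`
    and `pathway` when len(targets) genes are drawn at random from `universe`.
    """
    universe = set(universe)
    pathway = set(pathway) & universe
    targets = set(targets) & universe
    n_total, n_path, n_draw = len(universe), len(pathway), len(targets)
    overlap = len(pathway & targets)
    tail = sum(comb(n_path, k) * comb(n_total - n_path, n_draw - k)
               for k in range(overlap, min(n_path, n_draw) + 1))
    return tail / comb(n_total, n_draw)


# Toy universe of ten genes; all three predicted targets fall in the pathway.
genes = [f"g{i}" for i in range(10)]
p = enrichment_p_value(genes, pathway=["g0", "g1", "g2"],
                       targets=["g0", "g1", "g2"])
print(p)
```

    In practice the resulting p-values would also be corrected for testing many pathways at once.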

  11. Text-mining-assisted biocuration workflows in Argo

    PubMed Central

    Rak, Rafal; Batista-Navarro, Riza Theresa; Rowley, Andrew; Carter, Jacob; Ananiadou, Sophia

    2014-01-01

    Biocuration activities have been broadly categorized into the selection of relevant documents, the annotation of biological concepts of interest and identification of interactions between the concepts. Text mining has been shown to have a potential to significantly reduce the effort of biocurators in all the three activities, and various semi-automatic methodologies have been integrated into curation pipelines to support them. We investigate the suitability of Argo, a workbench for building text-mining solutions with the use of a rich graphical user interface, for the process of biocuration. Central to Argo are customizable workflows that users compose by arranging available elementary analytics to form task-specific processing units. A built-in manual annotation editor is the single most used biocuration tool of the workbench, as it allows users to create annotations directly in text, as well as modify or delete annotations created by automatic processing components. Apart from syntactic and semantic analytics, the ever-growing library of components includes several data readers and consumers that support well-established as well as emerging data interchange formats such as XMI, RDF and BioC, which facilitate the interoperability of Argo with other platforms or resources. To validate the suitability of Argo for curation activities, we participated in the BioCreative IV challenge whose purpose was to evaluate Web-based systems addressing user-defined biocuration tasks. Argo proved to have the edge over other systems in terms of flexibility of defining biocuration tasks. As expected, the versatility of the workbench inevitably lengthened the time the curators spent on learning the system before taking on the task, which may have affected the usability of Argo. The participation in the challenge gave us an opportunity to gather valuable feedback and identify areas of improvement, some of which have already been introduced. 
Database URL: http://argo.nactem.ac.uk PMID

  12. Text-mining-assisted biocuration workflows in Argo.

    PubMed

    Rak, Rafal; Batista-Navarro, Riza Theresa; Rowley, Andrew; Carter, Jacob; Ananiadou, Sophia

    2014-01-01

    Biocuration activities have been broadly categorized into the selection of relevant documents, the annotation of biological concepts of interest and identification of interactions between the concepts. Text mining has been shown to have a potential to significantly reduce the effort of biocurators in all the three activities, and various semi-automatic methodologies have been integrated into curation pipelines to support them. We investigate the suitability of Argo, a workbench for building text-mining solutions with the use of a rich graphical user interface, for the process of biocuration. Central to Argo are customizable workflows that users compose by arranging available elementary analytics to form task-specific processing units. A built-in manual annotation editor is the single most used biocuration tool of the workbench, as it allows users to create annotations directly in text, as well as modify or delete annotations created by automatic processing components. Apart from syntactic and semantic analytics, the ever-growing library of components includes several data readers and consumers that support well-established as well as emerging data interchange formats such as XMI, RDF and BioC, which facilitate the interoperability of Argo with other platforms or resources. To validate the suitability of Argo for curation activities, we participated in the BioCreative IV challenge whose purpose was to evaluate Web-based systems addressing user-defined biocuration tasks. Argo proved to have the edge over other systems in terms of flexibility of defining biocuration tasks. As expected, the versatility of the workbench inevitably lengthened the time the curators spent on learning the system before taking on the task, which may have affected the usability of Argo. The participation in the challenge gave us an opportunity to gather valuable feedback and identify areas of improvement, some of which have already been introduced. Database URL: http://argo.nactem.ac.uk.

  13. Integrated exploration workflow in the south Middle Magdalena Valley (Colombia)

    NASA Astrophysics Data System (ADS)

    Moretti, Isabelle; Charry, German Rodriguez; Morales, Marcela Mayorga; Mondragon, Juan Carlos

    2010-03-01

    Hydrocarbon (HC) exploration is currently active in the southern part of the Middle Magdalena Valley, but only moderate-size discoveries have been made to date. The majority of these discoveries are at shallow depth in the Tertiary section. The structures located in the Valley are faulted anticlines charged by lateral migration from the Cretaceous source rocks that are assumed to be present and mature eastward below the main thrusts and the Guaduas Syncline. Upper Cretaceous reservoirs have also been positively tested. To reduce the risks linked to the exploration of deeper structures below the western thrusts of the Eastern Cordillera, an integrated study was carried out. It includes the acquisition of new seismic data, the integration of all surface and subsurface data within a 3D geomodel, quality control of the structural model by restoration, and modeling of the petroleum system (presence and maturity of the Cretaceous source rocks, potential migration pathways). The various steps of this workflow will be presented, as well as the main conclusions in terms of source rock, deformation phases, and timing of the thrust emplacement versus oil maturation and migration. Our data suggest (or confirm): (1) the good potential of the Umir Fm as a source rock; (2) the early (Paleogene) deformation of the Bituima Trigo fault area; (3) the maturity gap within the Cretaceous source rock between the hangingwall and footwall of the Bituima fault, which proves an initial offset of Cretaceous burial in the range of 4.5 km between the Upper Cretaceous series westward and the Lower Cretaceous ones eastward of this fault zone; and (4) the post-Miocene weak reactivation, as dextral strike slip, of Cretaceous faults such as the San Juan de Rio Seco fault that corresponds to a change in the Cretaceous thickness and therefore in the depth of the thrust decollement.

  14. An integrated workflow for characterizing intact phosphoproteins from complex mixtures

    PubMed Central

    Wu, Si; Yang, Feng; Zhao, Rui; Tolić, Nikola; Robinson, Errol W.; Camp, David; Smith, Richard D.; Paša-Tolić, Ljiljana

    2014-01-01

    The phosphorylation of any site on a given protein can affect its activity, degradation rate, ability to dock with other proteins or bind divalent cations, and/or its localization. These effects can operate within the same protein; in fact, multisite phosphorylation is a key mechanism for achieving signal integration in cells. Hence, knowing the overall phosphorylation signature of a protein is essential for understanding the "state" of a cell. However, current technologies to monitor the phosphorylation status of proteins are inefficient at determining the relative stoichiometries of phosphorylation at multiple sites. Here we report a new capability for comprehensive liquid chromatography mass spectrometry (LC/MS) analysis of intact phosphoproteins. The technology platform is built upon an integrated bottom-up and top-down approach that is facilitated by intact protein reversed-phase (RP)LC concurrently coupled with Fourier transform ion cyclotron resonance (FTICR) MS and fraction collection. As the use of conventional RPLC systems for phosphopeptide identification has proven challenging due to the formation of metal ion complexes at various metal surfaces during LC/MS and ESI-MS analysis, we have developed a “metal-free” RPLC-ESI-MS platform for phosphoprotein characterization. This platform demonstrated a significant sensitivity enhancement for phosphorylated casein proteins enriched from a standard protein mixture and revealed the presence of over 20 casein isoforms arising from genetic variants with varying numbers of phosphorylation sites. The integrated workflow was also applied to an enriched yeast phosphoproteome to evaluate the feasibility of this strategy for characterizing complex biological systems, and revealed ~16% of the detected yeast proteins to have multiple phosphorylation isoforms. Intact protein LC/MS platform for characterization of combinatorial posttranslational modifications (PTMs), with special emphasis on multisite phosphorylation, holds

  15. Scientific approaches to science policy.

    PubMed

    Berg, Jeremy M

    2013-11-01

    The development of robust science policy depends on use of the best available data, rigorous analysis, and inclusion of a wide range of input. While director of the National Institute of General Medical Sciences (NIGMS), I took advantage of available data and emerging tools to analyze training time distribution by new NIGMS grantees, the distribution of the number of publications as a function of total annual National Institutes of Health support per investigator, and the predictive value of peer-review scores on subsequent scientific productivity. Rigorous data analysis should be used to develop new reforms and initiatives that will help build a more sustainable American biomedical research enterprise.

  16. Novel convergence-oriented approach for evaluation and optimization of workflow in single-particle two-dimensional averaging of electron microscope images.

    PubMed

    Moriya, Toshio; Mio, Kazuhiro; Sato, Chikara

    2013-01-01

    Three-dimensional (3D) protein structures facilitate the understanding of their biological functions and provide valuable information for developing medicines. Single-particle analysis (SPA) from electron microscopy (EM) is a structure determination method suitable for macromolecules. To achieve a high resolution using combinations of several SPA software packages, 'workflow' optimization and comparative evaluation by scoring results are essential. Two-dimensional (2D) averaging is a key step for 3D reconstruction. The integrated convergence-evaluation oriented system (IC-EOS) proposed here provides an effective tool for customizing 2D averaging. This assesses the behavior and characteristics of workflows and evaluates the convergence of iteration steps without human intervention. We chose five base measurements for quantifying convergence: resolution, variance, similarity, shift-distance and rotation-angle. Curve fitting to history graphs scored their stability. We call this score 'fluctuation'. The number of particle images discarded from the library and the number of classification groups were examined to see their effects on optimization levels and fluctuation of measurements, allowing the IC-EOS to select the most appropriate workflow for the target. A case study using a bacterial sodium channel and a simulation study using GroEL showed that resolution of 2D averaging improved with relatively stricter particle selection. With fewer groups, resolutions of class averages improved, but similarities between class-averages and their constituent particle images degraded. Fluctuation was useful for selecting adequate conditions, even when achieved values alone were not conclusive. The vote method, using fluctuation, was robust against noise and enabled a decision without exhaustive search trials. Thus, the IC-EOS is a step toward full automation of SPA. PMID:23625506

  17. Robust Nonlinear Neural Codes

    NASA Astrophysics Data System (ADS)

    Yang, Qianli; Pitkow, Xaq

    2015-03-01

    Most interesting natural sensory stimuli are encoded in the brain in a form that can only be decoded nonlinearly. But despite being a core function of the brain, nonlinear population codes are rarely studied and poorly understood. Interestingly, the few existing models of nonlinear codes are inconsistent with known architectural features of the brain. In particular, these codes have information content that scales with the size of the cortical population, even if that violates the data processing inequality by exceeding the amount of information entering the sensory system. Here we provide a valid theory of nonlinear population codes by generalizing recent work on information-limiting correlations in linear population codes. Although these generalized, nonlinear information-limiting correlations bound the performance of any decoder, they also make decoding more robust to suboptimal computation, allowing many suboptimal decoders to achieve nearly the same efficiency as an optimal decoder. Although these correlations are extremely difficult to measure directly, particularly for nonlinear codes, we provide a simple, practical test by which one can use choice-related activity in small populations of neurons to determine whether decoding is suboptimal or optimal and limited by correlated noise. We conclude by describing an example computation in the vestibular system where this theory applies. QY and XP were supported by a grant from the McNair foundation.

  18. The robustness of complex networks

    NASA Astrophysics Data System (ADS)

    Albert, Reka

    2002-03-01

    Many complex networks display a surprising degree of tolerance against errors. For example, organisms and ecosystems exhibit remarkable robustness to large variations in temperature, moisture, and nutrients, and communication networks continue to function despite local failures. This presentation will explore the effects of the network topology on its robust functioning. First, we will consider the topological integrity of several networks under node disruption. Then we will focus on the functional robustness of biological signaling networks, and the decisive role played by the network topology in this robustness.
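
    One common way to quantify topological integrity under node disruption is to track the size of the largest connected component as nodes are removed. The sketch below is an illustration of that idea only, not the method used in the presentation; the toy star network and the function name `largest_component_size` are assumptions introduced here.

```python
from collections import deque


def largest_component_size(adjacency, removed=frozenset()):
    """Return the size of the largest connected component, ignoring removed nodes."""
    seen = set(removed)
    best = 0
    for start in adjacency:
        if start in seen:
            continue
        # Breadth-first search over the surviving nodes reachable from start.
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            node = queue.popleft()
            size += 1
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        best = max(best, size)
    return best


# Hypothetical toy network: a hub (node 0) connected to five leaves.
star = {0: {1, 2, 3, 4, 5}, 1: {0}, 2: {0}, 3: {0}, 4: {0}, 5: {0}}

intact = largest_component_size(star)
after_leaf_failure = largest_component_size(star, removed={1})
after_hub_failure = largest_component_size(star, removed={0})
```

    In this toy case, losing a leaf barely changes the largest component, while losing the hub fragments the network completely, which is the kind of topology dependence the abstract refers to.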

  19. Ontology for Transforming Geo-Spatial Data for Discovery and Integration of Scientific Data

    NASA Astrophysics Data System (ADS)

    Nguyen, L.; Chee, T.; Minnis, P.

    2013-12-01

    Discovering and accessing geo-spatial scientific data across heterogeneous repositories and multi-discipline datasets can present challenges for scientists. We propose to build a workflow for transforming geo-spatial datasets into a semantic environment by using relationships to describe the resource using the OWL Web Ontology Language, RDF, and a proposed geo-spatial vocabulary. We will present methods for transforming traditional scientific datasets, the use of a semantic repository, and querying using SPARQL to integrate and access datasets. This unique repository will enable discovery of scientific data by geospatial bound or other criteria.

  20. Automatic run-time provenance capture for scientific dataset generation

    NASA Astrophysics Data System (ADS)

    Frew, J.; Slaughter, P.

    2008-12-01

    Provenance---the directed graph of a dataset's processing history---is difficult to capture effectively. Human-generated provenance, as narrative metadata, is labor-intensive and thus often incorrect, incomplete, or simply not recorded. Workflow systems capture some provenance implicitly in workflow specifications, but these systems are not ubiquitous or standardized, and a workflow specification may not capture all of the factors involved in a dataset's production. System audit trails capture potentially all processing activities, but not the relationships between them. We describe a system that transparently (i.e., without any modification to science codes) and automatically (i.e., without any human intervention) captures the low-level interactions (files read/written, parameters accessed, etc.) between scientific processes, and then synthesizes these relationships into a provenance graph. This system---the Earth System Science Server (ES3)---is sufficiently general that it can accommodate any combination of stand-alone programs, interpreted codes (e.g., IDL), and command-language scripts. Provenance in ES3 can be published in well-defined XML formats (including formats suitable for graphical visualization), and queried to determine the ancestors or descendants of any specific data file or process invocation. We demonstrate how ES3 can be used to capture the provenance of a large operational ocean color dataset.
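
    The ancestor and descendant queries described above can be pictured as reachability walks over a directed graph. The following is a minimal sketch under assumptions: the dictionary layout, the file names, and the helper names `ancestors` and `invert` are hypothetical and do not reflect ES3's actual data model or XML formats.

```python
def ancestors(parents, target):
    """Return every node that `target` was (directly or indirectly) derived from."""
    found = set()
    stack = [target]
    while stack:
        node = stack.pop()
        for parent in parents.get(node, ()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def invert(parents):
    """Build the child-edge map so descendants can be queried with the same walk."""
    children = {}
    for node, node_parents in parents.items():
        for parent in node_parents:
            children.setdefault(parent, set()).add(node)
    return children


# Hypothetical provenance records: each output maps to the inputs it was produced from.
parents = {
    "calibrated.dat": {"raw.dat", "calibrate.pro"},
    "chlorophyll.dat": {"calibrated.dat", "algorithm.cfg"},
    "map.png": {"chlorophyll.dat"},
}

map_ancestors = ancestors(parents, "map.png")
raw_descendants = ancestors(invert(parents), "raw.dat")
```

    Reusing one traversal on the inverted edge map keeps the two query directions symmetric; a production system would more likely store both edge directions up front.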

  1. A workflow for large-scale empirical identification of cell wall N-linked glycoproteins of tomato (Solanum lycopersicum) fruit by tandem mass spectrometry

    PubMed Central

    Thannhauser, Theodore W.; Shen, Miaoqing; Sherwood, Robert; Howe, Kevin; Fish, Tara; Yang, Yong; Chen, Wei; Zhang, Sheng

    2013-01-01

    Glycosylation is a common post-translational modification of plant proteins that impacts a large number of important biological processes. Nevertheless, the impacts of differential site occupancy and the nature of specific glycoforms are obscure. Historically, characterization of glycoproteins has been difficult due to the distinct physicochemical properties of the peptidyl and glycan moieties, the variable and dynamic nature of the glycosylation process, their heterogeneous nature, and the low relative abundance of each glycoform. In this study, we explore a new pipeline developed for large-scale empirical identification of N-linked glycoproteins of tomato fruit as part of our ongoing efforts to characterize the tomato secretome. The workflow presented involves a combination of lectin affinity, tryptic digestion, ion-pairing HILIC and precursor ion-driven data dependent MS/MS analysis with a script to facilitate the identification and characterization of occupied N-linked glycosylation sites. A total of 212 glycoproteins were identified in this study, in which 26 glycopeptides from 24 glycoproteins were successfully characterized in just one HILIC fraction. Further precursor ion discovery (PID)-based MS/MS and deglycosylation followed by high accuracy and resolution MS analysis were used to confirm the glycosylation sites and determine site occupancy rates. The workflow reported is robust and capable of producing large amounts of empirical data involving N-linked glycosylation sites and their associated glycoforms. PMID:23580464

  2. Assessing color reproduction tolerances in commercial print workflow

    NASA Astrophysics Data System (ADS)

    Beretta, Giordano B.; Hoarau, Eric; Kothari, Sunil; Lin, I.-Jong; Zeng, Jun

    2012-01-01

    Except for linear devices like CRTs, color transformations from colorimetric specifications to device coordinates are mostly obtained by measuring a set of samples, inverting the table, and looking up values in the table (including interpolation), and mapping the gamut from input to output device. The accuracy of a transformation is determined by reproducing a second set of samples and measuring the reproduction errors. Accuracy as the average predicted perceptual error is then used as a metric for quality. Accuracy and precision are important metrics in commercial print because a print service provider can charge a higher price for more accurate color, or can widen his tolerances when customers prefer cheap prints. The disadvantage of determining tolerances through averaging perceptual errors is that the colors in the sample sets are independent and this is not necessarily a good correlate of print quality as determined through psychophysics studies. Indeed, images consist of color palettes and the main quality factor is not color fidelity but color integrity. For example, if the divergence of the field of error vectors is zero, color constancy is likely to take over and humans will perceive the color reproduction as being of good quality, even if the average error is relatively large. However, if the errors are small but in random directions, the perceived image quality is poor because the relation among colors is altered. We propose a standard practice to determine tolerance based on the Farnsworth-Munsell 100-hue test (FM-100) for the second set and to evaluate the color transpositions (a metric for color integrity) instead of the color differences. The quality metric is then the FM-100 score. There are industry standards for the tolerances of color judges, and the same tolerances and classification can be used for print workflows or their components (e.g., presses, proofers, displays). 
We generalize this practice to arbitrary perceptually uniform scales tailored to

  3. Automating the Photogrammetric Workflow in a National Mapping Agency

    NASA Astrophysics Data System (ADS)

    Holland, D.; Gladstone, C.; Sargent, I.; Horgan, J.; Gardiner, A.; Freeman, M.

    2012-07-01

    The goal of automating the process of identifying changes to topographic features in aerial photography, extracting the geometry of these features and recording the changes in a database, is yet to be fully realised. At Ordnance Survey, Britain's national mapping agency, research into the automation of these processes has been underway for several years, and is now beginning to be implemented in production systems. At the start of the processing chain is the identification of change - new buildings and roads being constructed, old structures demolished, alterations to field and vegetation boundaries and changes to inland water features. Using eCognition object-based image analysis techniques, a system has been developed to detect the changes to features. This uses four-band digital imagery (red, green, blue and near infra-red), together with a digital surface model derived by image matching, to identify all the topographic features of interest to a mapping agency. Once identified, these features are compared with those in the National Geographic Database and any potential changes are highlighted. These changes will be presented to photogrammetrists in the production area, who will rapidly assess whether or not the changes are real. If the change is accepted, they will manually capture the geometry and attributes of the feature concerned. The change detection process, although not fully automatic, cuts down the amount of time required to update the database, enabling a more efficient data collection workflow. Initial results, on the detection of changes to buildings only, showed a completeness value (proportion of the real changes that were found) of 92% and a correctness value (proportion of the changes found that were real changes) of 22%, with a time saving of around 50% when compared with the equivalent manual process. The completeness value is similar to those obtained by the manual process. 
Further work on the process has added vegetation, water and many other
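
    The completeness and correctness values quoted in the abstract above follow directly from its own definitions (proportion of real changes found, and proportion of found changes that were real). The sketch below illustrates the arithmetic; the counts are hypothetical, picked only so that the results land on the same percentages, and are not the study's data.

```python
def completeness(true_positives, false_negatives):
    """Proportion of the real changes that were found."""
    return true_positives / (true_positives + false_negatives)


def correctness(true_positives, false_positives):
    """Proportion of the changes found that were real changes."""
    return true_positives / (true_positives + false_positives)


# Hypothetical counts, chosen only to reproduce percentages of the same
# magnitude as those quoted in the abstract; they are not the study's data.
real_found = 92      # real changes that were flagged
real_missed = 8      # real changes that were not flagged
false_alarms = 326   # flagged changes that turned out not to be real

completeness_pct = round(100 * completeness(real_found, real_missed))
correctness_pct = round(100 * correctness(real_found, false_alarms))
```

    With these illustrative counts, high completeness combined with low correctness means few real changes are missed, but an operator still has to reject most of the flagged candidates.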

  4. Potential of knowledge discovery using workflows implemented in the C3Grid

    NASA Astrophysics Data System (ADS)

    Engel, Thomas; Fink, Andreas; Ulbrich, Uwe; Schartner, Thomas; Dobler, Andreas; Fritzsch, Bernadette; Hiller, Wolfgang; Bräuer, Benny

    2013-04-01

    With the increasing number of climate simulations, reanalyses and observations, new infrastructures to search and analyse distributed data are necessary. In recent years, the Grid architecture became an important technology to fulfill these demands. For the German project "Collaborative Climate Community Data and Processing Grid" (C3Grid) computer scientists and meteorologists developed a system that offers its users a web interface to search and download climate data and use implemented analysis tools (called workflows) to further investigate them. In this contribution, two workflows that are implemented in the C3Grid architecture are presented: the Cyclone Tracking (CT) and Stormtrack workflow. They serve as examples of how to perform numerous investigations of midlatitude winter storms on a large amount of analysis and climate model data without insight into the data source or program code, and with only a low-to-moderate understanding of the theoretical background. CT is based on the work of Murray and Simmonds (1991) to identify and track local minima in the mean sea level pressure (MSLP) field of the selected dataset. Adjustable thresholds for the curvature of the isobars as well as the minimum lifetime of a cyclone allow the distinction of weak subtropical heat low systems and stronger midlatitude cyclones, e.g., in the Northern Atlantic. The user gets the resulting track data including statistics about the track density, average central pressure, average central curvature, cyclogenesis and cyclolysis as well as pre-built visualizations of these results. Stormtrack calculates the 2.5-6 day bandpass-filtered standard deviation of the geopotential height on a selected pressure level. Although this workflow needs much less computational effort compared to CT, it shows structures that are in good agreement with the track density of the CT workflow. To what extent changes in the mid-level tropospheric storm track are reflected in trough density and intensity

  5. Robust, Optimal Subsonic Airfoil Shapes

    NASA Technical Reports Server (NTRS)

    Rai, Man Mohan

    2014-01-01

    A method has been developed to create an airfoil robust enough to operate satisfactorily in different environments. This method determines a robust, optimal, subsonic airfoil shape, beginning with an arbitrary initial airfoil shape, and imposes the necessary constraints on the design. Also, this method is flexible and extendible to a larger class of requirements and changes in constraints imposed.

  6. Robust Understanding of Statistical Variation

    ERIC Educational Resources Information Center

    Peters, Susan A.

    2011-01-01

    This paper presents a framework that captures the complexity of reasoning about variation in ways that are indicative of robust understanding and describes reasoning as a blend of design, data-centric, and modeling perspectives. Robust understanding is indicated by integrated reasoning about variation within each perspective and across…

  7. Using Workflows to Explore and Optimise Named Entity Recognition for Chemistry

    PubMed Central

    Kolluru, BalaKrishna; Hawizy, Lezan; Murray-Rust, Peter; Tsujii, Junichi; Ananiadou, Sophia

    2011-01-01

    Chemistry text mining tools should be interoperable and adaptable regardless of system-level implementation, installation or even programming issues. We aim to abstract the functionality of these tools from the underlying implementation via reconfigurable workflows for automatically identifying chemical names. To achieve this, we refactored an established named entity recogniser (in the chemistry domain), OSCAR, and studied the impact of each component on the net performance. We developed two reconfigurable workflows from OSCAR using an interoperable text mining framework, U-Compare. These workflows can be altered using the drag-&-drop mechanism of the graphical user interface of U-Compare. These workflows also provide a platform to study the relationship between text mining components such as tokenisation and named entity recognition (using maximum entropy Markov model (MEMM) and pattern recognition based classifiers). Results indicate that, for chemistry in particular, eliminating noise generated by tokenisation techniques leads to a slightly better performance than others, in terms of named entity recognition (NER) accuracy. Poor tokenisation translates into poorer input to the classifier components, which in turn leads to an increase in Type I or Type II errors, thus lowering the overall performance. On the Sciborg corpus, the workflow based system, which uses a new tokeniser whilst retaining the same MEMM component, increases the F-score from 82.35% to 84.44%. On the PubMed corpus, it recorded an F-score of 84.84% as against 84.23% by OSCAR. PMID:21633495
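
    The F-scores reported above combine precision (lowered by Type I errors, i.e., false positives) and recall (lowered by Type II errors, i.e., false negatives). As a hedged illustration of the standard definition, and not of the paper's evaluation code, the sketch below computes the weighted harmonic mean; the precision and recall inputs are hypothetical, since the abstract reports only the resulting F-scores.

```python
def f_score(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    if precision == 0 and recall == 0:
        return 0.0
    beta_sq = beta * beta
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


# Hypothetical precision/recall values; the abstract reports only F-scores.
balanced = f_score(0.80, 0.80)
skewed = f_score(0.90, 0.60)
```

    Because the harmonic mean penalises imbalance, the skewed pair scores lower than the balanced pair even though its precision is higher.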

  8. Workflow interruptions, social stressors from supervisor(s) and attention failure in surgery personnel

    PubMed Central

    PEREIRA, Diana; MÜLLER, Patrick; ELFERING, Achim

    2015-01-01

    Workflow interruptions and social stressors among surgery personnel may cause attention failure at work that may increase rumination about work issues during leisure time. The test of these assumptions should contribute to the understanding of exhaustion in surgery personnel and patient safety. Workflow interruptions and supervisor-related social stressors were tested to predict attention failure that predicts work-related rumination during leisure time. One hundred ninety-four theatre nurses, anaesthetists and surgeons from a Swiss University hospital participated in a cross-sectional survey. The participation rate was 58%. Structural equation modelling confirmed both indirect paths from workflow interruptions and social stressors via attention failure on rumination (both p<0.05). An alternative model, assuming the reversed indirect causation—from attention failure via workflow interruptions and social stressors on rumination—could not be empirically supported. Workflow interruptions and social stressors at work are likely to trigger attention failure in surgery personnel. Work redesign and team intervention could help surgery personnel to maintain a high level of quality and patient safety and detach from work related issues to recover during leisure time. PMID:26027706

  9. Cloud-based bioinformatics workflow platform for large-scale next-generation sequencing analyses

    PubMed Central

    Liu, Bo; Madduri, Ravi K; Sotomayor, Borja; Chard, Kyle; Lacinski, Lukasz; Dave, Utpal J; Li, Jianqiang; Liu, Chunchen; Foster, Ian T

    2014-01-01

    Due to the upcoming deluge of genome data, the need for storing and processing large-scale genome data, easy access to biomedical analysis tools, and efficient data sharing and retrieval has presented significant challenges. The variability in data volume results in variable computing and storage requirements; therefore, biomedical researchers are pursuing more reliable, dynamic and convenient methods for conducting sequencing analyses. This paper proposes a Cloud-based bioinformatics workflow platform for large-scale next-generation sequencing analyses, which enables reliable and highly scalable execution of sequencing analyses workflows in a fully automated manner. Our platform extends the existing Galaxy workflow system by adding data management capabilities for transferring large quantities of data efficiently and reliably (via Globus Transfer), domain-specific analysis tools preconfigured for immediate use by researchers (via user-specific tools integration), automatic deployment on Cloud for on-demand resource allocation and pay-as-you-go pricing (via Globus Provision), a Cloud provisioning tool for auto-scaling (via HTCondor scheduler), and the support for validating the correctness of workflows (via semantic verification tools). Two bioinformatics workflow use cases as well as performance evaluation are presented to validate the feasibility of the proposed approach. PMID:24462600

  10. Cloud-based bioinformatics workflow platform for large-scale next-generation sequencing analyses.

    PubMed

    Liu, Bo; Madduri, Ravi K; Sotomayor, Borja; Chard, Kyle; Lacinski, Lukasz; Dave, Utpal J; Li, Jianqiang; Liu, Chunchen; Foster, Ian T

    2014-06-01

    Due to the upcoming deluge of genome data, the need for storing and processing large-scale genome data, easy access to biomedical analysis tools, and efficient data sharing and retrieval has presented significant challenges. The variability in data volume results in variable computing and storage requirements; therefore, biomedical researchers are pursuing more reliable, dynamic and convenient methods for conducting sequencing analyses. This paper proposes a Cloud-based bioinformatics workflow platform for large-scale next-generation sequencing analyses, which enables reliable and highly scalable execution of sequencing analyses workflows in a fully automated manner. Our platform extends the existing Galaxy workflow system by adding data management capabilities for transferring large quantities of data efficiently and reliably (via Globus Transfer), domain-specific analysis tools preconfigured for immediate use by researchers (via user-specific tools integration), automatic deployment on Cloud for on-demand resource allocation and pay-as-you-go pricing (via Globus Provision), a Cloud provisioning tool for auto-scaling (via HTCondor scheduler), and the support for validating the correctness of workflows (via semantic verification tools). Two bioinformatics workflow use cases as well as performance evaluation are presented to validate the feasibility of the proposed approach.

  11. Facial symmetry in robust anthropometrics.

    PubMed

    Kalina, Jan

    2012-05-01

    Image analysis methods commonly used in forensic anthropology do not have desirable robustness properties, which can be ensured by robust statistical methods. In this paper, the face localization in images is carried out by detecting symmetric areas in the images. Symmetry is measured between two neighboring rectangular areas in the images using a new robust correlation coefficient, which down-weights regions in the face violating the symmetry. Raw images of faces without usual preliminary transformations are considered. The robust correlation coefficient based on the least weighted squares regression yields very promising results also in the localization of such faces, which are not entirely symmetric. Standard methods of statistical machine learning are applied for comparison. The robust correlation analysis can be applicable to other problems of forensic anthropology.

  12. A Robust Biomarker

    NASA Technical Reports Server (NTRS)

    Westall, F.; Steele, A.; Toporski, J.; Walsh, M. M.; Allen, C. C.; Guidry, S.; McKay, D. S.; Gibson, E. K.; Chafetz, H. S.

    2000-01-01

    containing fossil biofilm, including the 3.5 b.y.-old carbonaceous cherts from South Africa and Australia. As a result of the unique compositional, structural and "mineralisable" properties of bacterial polymer and biofilms, we conclude that bacterial polymers and biofilms constitute a robust and reliable biomarker for life on Earth and could be a potential biomarker for extraterrestrial life.

  13. Defining a sample preparation workflow for advanced virus detection and understanding sensitivity by next-generation sequencing.

    PubMed

    Wang, Christopher J; Feng, Szi Fei; Duncan, Paul

    2014-01-01

    The application of next-generation sequencing (also known as deep sequencing or massively parallel sequencing) for adventitious agent detection is an evolving field that is steadily gaining acceptance in the biopharmaceutical industry. In order for this technology to be successfully applied, a robust method that can isolate viral nucleic acids from a variety of biological samples (such as host cell substrates, cell-free culture fluids, viral vaccine harvests, and animal-derived raw materials) must be established by demonstrating recovery of model virus spikes. In this report, we implement the sample preparation workflow developed by Feng et al. and assess the sensitivity of virus detection in a next-generation sequencing readout using the Illumina MiSeq platform. We describe a theoretical model to estimate the detection of a target virus in a cell lysate or viral vaccine harvest sample. We show that nuclease treatment can be used for samples that contain a high background of non-relevant nucleic acids (e.g., host cell DNA) in order to effectively increase the sensitivity of sequencing target viruses and reduce the complexity of data analysis. Finally, we demonstrate that at defined spike levels, nucleic acids from a panel of model viruses spiked into representative cell lysate and viral vaccine harvest samples can be confidently recovered by next-generation sequencing.

  14. A new stochastic inversion workflow for time-lapse data: hybrid starting model and double-difference inversion

    NASA Astrophysics Data System (ADS)

    Tao, Yi; Sen, Mrinal K.; Zhang, Rui; Spikes, Kyle T.

    2013-06-01

    Non-uniqueness presents challenges to seismic inverse problems, especially for time-lapse inversion where multiple inversions are needed for different vintages of seismic data. For time-lapse applications, the focus typically is to detect relatively small changes in seismic attributes at limited locations and to relate these differences to changes in the underlying physical properties. We propose a robust inversion workflow where the baseline inversion uses a starting model, which combines a high-frequency fractal component and a low-frequency component from well log data. This starting model provides an estimate of the null space based on fractal statistics of well data. To further focus on the localized changes, the inverted elastic parameters from the baseline model and the difference between two time-lapse data are summed together to produce the virtual time-lapse seismic data. This is known as double-difference inversion, which focuses primarily on the areas where time-lapse changes occur. The misfit function uses both data and model norms so that the ill-posedness of the inverse problem can be regularized. We pre-process the seismic data using a local correlation-based warping algorithm to register the time-lapse datasets. Finally, very fast simulated annealing, a nonlinear global search method, is used to minimize the misfit function. We demonstrate the effectiveness of our method with synthetic data and field data from Cranfield site used for CO2 sequestration studies.
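
    The last two steps of the abstract above (a misfit built from data and model norms, minimized by very fast simulated annealing) can be illustrated with a minimal sketch. It assumes the commonly cited VFSA temperature schedule and step distribution; the toy linear forward operator, the prior model, the regularization weight and all function names are hypothetical stand-ins, not the authors' seismic setup or settings.

```python
import math
import random


def regularized_misfit(model, data, forward, prior, weight):
    """Squared data residual plus a weighted squared distance from the prior model."""
    predicted = forward(model)
    data_term = sum((d - p) ** 2 for d, p in zip(data, predicted))
    model_term = sum((m - m0) ** 2 for m, m0 in zip(model, prior))
    return data_term + weight * model_term


def vfsa_minimize(objective, lower, upper, n_iter=2000, t0=1.0, decay=1.0, seed=0):
    """Very fast simulated annealing over a box; returns (best_model, best_value)."""
    rng = random.Random(seed)
    n_dim = len(lower)
    current = [rng.uniform(lo, hi) for lo, hi in zip(lower, upper)]
    f_current = objective(current)
    best, f_best = list(current), f_current
    for k in range(1, n_iter + 1):
        # Cooling schedule T_k = T0 * exp(-c * k**(1/N)), floored to avoid underflow.
        temp = max(t0 * math.exp(-decay * k ** (1.0 / n_dim)), 1e-12)
        candidate = []
        for x, lo, hi in zip(current, lower, upper):
            while True:
                # Temperature-dependent, heavy-tailed step in [-1, 1]; redraw if out of bounds.
                u = rng.random()
                step = math.copysign(1.0, u - 0.5) * temp * (
                    (1.0 + 1.0 / temp) ** abs(2.0 * u - 1.0) - 1.0
                )
                x_new = x + step * (hi - lo)
                if lo <= x_new <= hi:
                    break
            candidate.append(x_new)
        f_candidate = objective(candidate)
        delta = f_candidate - f_current
        # Metropolis rule: always accept improvements, sometimes accept worse models.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, f_current = candidate, f_candidate
            if f_current < f_best:
                best, f_best = list(current), f_current
    return best, f_best


def forward(m):
    """Hypothetical linear forward operator with three observations."""
    return [m[0], m[1], m[0] + m[1]]


true_model = [1.0, -2.0]
data = forward(true_model)
prior = [0.0, 0.0]


def objective(m):
    return regularized_misfit(m, data, forward, prior, weight=0.1)


best, f_best = vfsa_minimize(objective, [-5.0, -5.0], [5.0, 5.0])
print(best, f_best)
```

    The model-norm term pulls the solution toward the prior, which is one simple way a regularized misfit constrains an ill-posed problem; in the abstract's workflow the prior role is played by the hybrid starting model.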

  15. A 'waterfall' transfer-based workflow for improved quality of tissue microarray construction and processing in breast cancer research.

    PubMed

    Oberländer, M; Alkemade, H; Bünger, S; Ernst, F; Thorns, C; Braunschweig, T; Habermann, J K

    2014-07-01

    A major focus in cancer research is the identification of biomarkers for early diagnosis, therapy prediction and prognosis. Hereby, validation of target proteins on clinical samples is of high importance. Tissue microarrays (TMAs) represent an essential advancement for high-throughput analysis by assembling large numbers of tissue cores with high efficacy and comparability. However, limitations along TMA construction and processing exist. In our presented study, we had to overcome several obstacles in the construction and processing of high-density breast cancer TMAs to ensure good quality sections for further research. Exemplarily, 406 breast tissue cores from formalin-fixed and paraffin embedded samples of 245 patients were placed onto three recipient paraffin blocks. Sectioning was performed using a rotary microtome with a "waterfall" automated transfer system. Sections were stained by immunohistochemistry and immunofluorescence for nine proteins. The number and quality of cores after sectioning and staining was counted manually for each marker. In total, 97.1 % of all cores were available after sectioning, while further 96 % of the remaining cores were evaluable after staining. Thereby, normal tissue cores were more often lost compared to tumor tissue cores. Our workflow provides a robust method for manufacturing high-density breast cancer TMAs for subsequent IHC or IF staining without significant sample loss. PMID:24619867
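
    The two retention figures reported above compound, so the end-to-end yield is their product. The short calculation below is an illustration only: it assumes the percentages apply to the full set of 406 cores, whereas the study counted cores per marker, so the core count it prints is an approximation.

```python
import math


def overall_yield(stage_fractions):
    """Multiply per-stage retention fractions to get the end-to-end retention."""
    return math.prod(stage_fractions)


cores_placed = 406               # cores placed onto the recipient blocks
after_sectioning = 0.971         # fraction available after sectioning
evaluable_after_staining = 0.96  # fraction of the remaining cores evaluable after staining

total = overall_yield([after_sectioning, evaluable_after_staining])
print(f"overall retention: {total:.1%}")
print(f"approximate evaluable cores: {cores_placed * total:.0f}")
```

    Under these assumptions roughly 93 % of the cores placed remain evaluable after both steps.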

  16. A 'waterfall' transfer-based workflow for improved quality of tissue microarray construction and processing in breast cancer research.

    PubMed

    Oberländer, M; Alkemade, H; Bünger, S; Ernst, F; Thorns, C; Braunschweig, T; Habermann, J K

    2014-07-01

    A major focus in cancer research is the identification of biomarkers for early diagnosis, therapy prediction and prognosis. Hereby, validation of target proteins on clinical samples is of high importance. Tissue microarrays (TMAs) represent an essential advancement for high-throughput analysis by assembling large numbers of tissue cores with high efficacy and comparability. However, limitations along TMA construction and processing exist. In our presented study, we had to overcome several obstacles in the construction and processing of high-density breast cancer TMAs to ensure good quality sections for further research. Exemplarily, 406 breast tissue cores from formalin-fixed and paraffin embedded samples of 245 patients were placed onto three recipient paraffin blocks. Sectioning was performed using a rotary microtome with a "waterfall" automated transfer system. Sections were stained by immunohistochemistry and immunofluorescence for nine proteins. The number and quality of cores after sectioning and staining was counted manually for each marker. In total, 97.1 % of all cores were available after sectioning, while further 96 % of the remaining cores were evaluable after staining. Thereby, normal tissue cores were more often lost compared to tumor tissue cores. Our workflow provides a robust method for manufacturing high-density breast cancer TMAs for subsequent IHC or IF staining without significant sample loss.

  17. An Integrated Workflow For Secondary Use of Patient Data for Clinical Research.

    PubMed

    Bouzillé, Guillaume; Sylvestre, Emmanuelle; Campillo-Gimenez, Boris; Renault, Eric; Ledieu, Thibault; Delamarre, Denis; Cuggia, Marc

    2015-01-01

    This work proposes an integrated workflow for secondary use of medical data to serve feasibility studies, and the prescreening and monitoring of research studies. All research issues are initially addressed by the Clinical Research Office through a research portal and subsequently redirected to relevant experts in the determined field of concentration. For secondary use of data, the workflow is then based on the clinical data warehouse of the hospital. A datamart with potentially eligible research candidates is constructed. Datamarts can either produce aggregated data, de-identified data, or identified data, according to the kind of study being treated. In conclusion, integrating the secondary use of data process into a general research workflow allows visibility of information technologies and improves the accessibility of clinical data.

  18. An Integrated Workflow For Secondary Use of Patient Data for Clinical Research.

    PubMed

    Bouzillé, Guillaume; Sylvestre, Emmanuelle; Campillo-Gimenez, Boris; Renault, Eric; Ledieu, Thibault; Delamarre, Denis; Cuggia, Marc

    2015-01-01

    This work proposes an integrated workflow for secondary use of medical data to serve feasibility studies, and the prescreening and monitoring of research studies. All research issues are initially addressed by the Clinical Research Office through a research portal and subsequently redirected to relevant experts in the determined field of concentration. For secondary use of data, the workflow is then based on the clinical data warehouse of the hospital. A datamart with potentially eligible research candidates is constructed. Datamarts can either produce aggregated data, de-identified data, or identified data, according to the kind of study being treated. In conclusion, integrating the secondary use of data process into a general research workflow allows visibility of information technologies and improves the accessibility of clinical data. PMID:26262215

  19. New strategies for medical data mining, part 3: automated workflow analysis and optimization.

    PubMed

    Reiner, Bruce

    2011-02-01

    The practice of evidence-based medicine calls for the creation of "best practice" guidelines, leading to improved clinical outcomes. One of the primary factors limiting evidence-based medicine in radiology today is the relative paucity of standardized databases. The creation of standardized medical imaging databases offers the potential to enhance radiologist workflow and diagnostic accuracy through objective data-driven analytics, which can be categorized in accordance with specific variables relating to the individual examination, patient, provider, and technology being used. In addition to this "global" database analysis, "individual" radiologist workflow can be analyzed through the integration of electronic auditing tools into the PACS. The combination of these individual and global analyses can ultimately identify best practice patterns, which can be adapted to the individual attributes of end users and ultimately used in the creation of automated evidence-based medicine workflow templates.

  20. Parallel workflow tools to facilitate human brain MRI post-processing

    PubMed Central

    Cui, Zaixu; Zhao, Chenxi; Gong, Gaolang

    2015-01-01

    Multi-modal magnetic resonance imaging (MRI) techniques are widely applied in human brain studies. To obtain specific brain measures of interest from MRI datasets, a number of complex image post-processing steps are typically required. Parallel workflow tools have recently been developed, concatenating individual processing steps and enabling fully automated processing of raw MRI data to obtain the final results. These workflow tools are also designed to make optimal use of available computational resources and to support the parallel processing of different subjects or of independent processing steps for a single subject. Automated, parallel MRI post-processing tools can greatly facilitate relevant brain investigations and are being increasingly applied. In this review, we briefly summarize these parallel workflow tools and discuss relevant issues. PMID:26029043

  1. Integrating advanced materials simulation techniques into an automated data analysis workflow at the Spallation Neutron Source

    SciTech Connect

    Borreguero Calvo, Jose M; Campbell, Stuart I; Delaire, Olivier A; Doucet, Mathieu; Goswami, Monojoy; Hagen, Mark E; Lynch, Vickie E; Proffen, Thomas E; Ren, Shelly; Savici, Andrei T; Sumpter, Bobby G

    2014-01-01

    This presentation will review developments on the integration of advanced modeling and simulation techniques into the analysis step of experimental data obtained at the Spallation Neutron Source. A workflow framework for the purpose of refining molecular mechanics force-fields against quasi-elastic neutron scattering data is presented. The workflow combines software components to submit model simulations to remote high performance computers, a message broker interface for communications between the optimizer engine and the simulation production step, and tools to convolve the simulated data with the experimental resolution. A test application shows the correction to a popular fixed-charge water model in order to account for polarization effects due to the presence of solvated ions. Future enhancements to the refinement workflow are discussed. This work is funded through the DOE Center for Accelerating Materials Modeling.

  2. Parametric Workflow (BIM) for the Repair Construction of Traditional Historic Architecture in Taiwan

    NASA Astrophysics Data System (ADS)

    Ma, Y.-P.; Hsu, C. C.; Lin, M.-C.; Tsai, Z.-W.; Chen, J.-Y.

    2015-08-01

    In Taiwan, many existing traditional buildings are constructed with wooden, brick, and stone structures. This paper focuses on traditional historic architecture in Taiwan, taking traditional wooden-structure buildings as the design proposition, and applies a BIM workflow for modeling complex wooden combination geometry, integrating more traditional 2D documents, and visualizing repair construction assumptions within the 3D model representation. The goal of this article is to explore the problems currently faced in wooden historic building conservation and to introduce BIM technology for conserving, documenting, managing, and creating full engineering drawings and information to effectively support historic conservation. Although BIM is mostly oriented to current construction praxis, there have been some attempts to investigate its applicability in historic conservation projects. This article also illustrates the importance and advantages of using a BIM workflow in the repair construction process compared with a generic workflow.

  3. RSRE: RNA structural robustness evaluator.

    PubMed

    Shu, Wenjie; Bo, Xiaochen; Zheng, Zhiqiang; Wang, Shengqi

    2007-07-01

    Biological robustness, defined as the ability to maintain stable functioning in the face of various perturbations, is an important and fundamental topic in current biology, and has become a focus of numerous studies in recent years. Although structural robustness has been explored in several types of RNA molecules, the origins of robustness are still controversial. Computational analysis results are needed to make up for the lack of evidence of robustness in natural biological systems. The RNA structural robustness evaluator (RSRE) web server presented here provides a freely available online tool to quantitatively evaluate the structural robustness of RNA based on the widely accepted definition of neutrality. Several classical structure comparison methods are employed; five randomization methods are implemented to generate control sequences; sub-optimal predicted structures can be optionally utilized to mitigate the uncertainty of secondary structure prediction. With a user-friendly interface, the web application is easy to use. Intuitive illustrations are provided along with the original computational results to facilitate analysis. The RSRE will be helpful in the wide exploration of RNA structural robustness and will catalyze our understanding of RNA evolution. The RSRE web server is freely available at http://biosrv1.bmi.ac.cn/RSRE/ or http://biotech.bmi.ac.cn/RSRE/.

  4. Modeling workflow to design machine translation applications for public health practice

    PubMed Central

    Turner, Anne M.; Brownstein, Megumu K.; Cole, Kate; Karasz, Hilary; Kirchhoff, Katrin

    2014-01-01

    Objective Provide a detailed understanding of the information workflow processes related to translating health promotion materials for limited English proficiency individuals in order to inform the design of context-driven machine translation (MT) tools for public health (PH). Materials and Methods We applied a cognitive work analysis framework to investigate the translation information workflow processes of two large health departments in Washington State. Researchers conducted interviews, performed a task analysis, and validated results with PH professionals to model translation workflow and identify functional requirements for a translation system for PH. Results The study resulted in a detailed description of work related to translation of PH materials, an information workflow diagram, and a description of attitudes towards MT technology. We identified a number of themes that hold design implications for incorporating MT in PH translation practice. A PH translation tool prototype was designed based on these findings. Discussion This study underscores the importance of understanding the work context and information workflow for which systems will be designed. Based on themes and translation information workflow processes, we identified key design guidelines for incorporating MT into PH translation work. Primary amongst these is that MT should be followed by human review for translations to be of high quality and for the technology to be adopted into practice. Conclusion The time and costs of creating multilingual health promotion materials are barriers to translation. PH personnel were interested in MT's potential to improve access to low-cost translated PH materials, but expressed concerns about ensuring quality. We outline design considerations and a potential machine translation tool to best fit MT systems into PH practice. PMID:25445922

  5. Wireless remote control clinical image workflow: utilizing a PDA for offsite distribution

    NASA Astrophysics Data System (ADS)

    Liu, Brent J.; Documet, Luis; Documet, Jorge; Huang, H. K.; Muldoon, Jean

    2004-04-01

    Last year we presented at RSNA an application to perform wireless remote control of PACS image distribution utilizing a handheld device such as a Personal Digital Assistant (PDA). This paper describes the clinical experiences, including workflow scenarios, of implementing the PDA application to route exams from the clinical PACS archive server to various locations for offsite distribution of clinical PACS exams. By utilizing this remote control application, radiologists can manage image workflow distribution with a single wireless handheld device without impacting their clinical workflow on diagnostic PACS workstations. A PDA application was designed and developed to perform DICOM Query and C-Move requests by a physician from a clinical PACS Archive to a CD-burning device for automatic burning of PACS data for offsite distribution. In addition, it was also used for convenient routing of historical PACS exams to the local web server, local workstations, and teleradiology systems. The application was evaluated by radiologists as well as other clinical staff who need to distribute PACS exams to offsite referring physician's offices and offsite radiologists. An application for image workflow management utilizing wireless technology was implemented in a clinical environment and evaluated. A PDA application was successfully utilized to perform DICOM Query and C-Move requests from the clinical PACS archive to various offsite exam distribution devices. Clinical staff can utilize the PDA to manage image workflow and PACS exam distribution conveniently for offsite consultations by referring physicians and radiologists. This solution allows radiologists to expand their effectiveness in health care delivery both within the radiology department and offsite by improving their clinical workflow.

  6. SU-E-T-419: Workflow and FMEA in a New Proton Therapy (PT) Facility

    SciTech Connect

    Cheng, C; Wessels, B; Hamilton, H; Difranco, T; Mansur, D

    2014-06-01

    Purpose: Workflow is an important component in the operational planning of a new proton facility. By integrating the concept of failure mode and effect analysis (FMEA) and traditional QA requirements, a workflow for a proton therapy treatment course is set up. This workflow serves as the blueprint for the planning of computer hardware/software requirements and network flow. A slight modification of the workflow generates a process map (PM) for FMEA and the planning of a QA program in PT. Methods: A flowchart is first developed outlining the sequence of processes involved in a PT treatment course. Each process consists of a number of sub-processes to encompass a broad scope of treatment and QA procedures. For each sub-process, the personnel involved, the equipment needed and the computer hardware/software as well as network requirements are defined by a team of clinical staff, administrators and IT personnel. Results: Eleven intermediate processes with a total of 70 sub-processes involved in a PT treatment course are identified. The number of sub-processes per process varies from 2 to 12. The sub-processes within each process are used for the operational planning. For example, in the CT-Sim process, there are 12 sub-processes: three involve data entry/retrieval from a record-and-verify system, two are controlled by the CT computer, two require the department/hospital network, and the other five are setup procedures. IT then decides the number of computers needed and the software and network requirements. By removing the traditional QA procedures from the workflow, a PM is generated for FMEA to design a QA program for PT. Conclusion: Significant efforts are involved in the development of the workflow in a PT treatment course. Our hybrid model of combining FMEA and a traditional QA program serves the dual purpose of efficient operational planning and design of a QA program in PT.

  7. Pervasive robustness in biological systems.

    PubMed

    Félix, Marie-Anne; Barkoulas, Michalis

    2015-08-01

    Robustness is characterized by the invariant expression of a phenotype in the face of a genetic and/or environmental perturbation. Although phenotypic variance is a central measure in the mapping of the genotype and environment to the phenotype in quantitative evolutionary genetics, robustness is also a key feature in systems biology, resulting from nonlinearities in quantitative relationships between upstream and downstream components. In this Review, we provide a synthesis of these two lines of investigation, converging on understanding how variation propagates across biological systems. We critically assess the recent proliferation of studies identifying robustness-conferring genes in the context of the nonlinearity in biological systems. PMID:26184598

  8. Population genetics of translational robustness.

    PubMed

    Wilke, Claus O; Drummond, D Allan

    2006-05-01

    Recent work has shown that expression level is the main predictor of a gene's evolutionary rate and that more highly expressed genes evolve slower. A possible explanation for this observation is selection for proteins that fold properly despite mistranslation, in short selection for translational robustness. Translational robustness leads to the somewhat paradoxical prediction that highly expressed genes are extremely tolerant to missense substitutions but nevertheless evolve very slowly. Here, we study a simple theoretical model of translational robustness that allows us to gain analytic insight into how this paradoxical behavior arises.

  9. Robustness of airline route networks

    NASA Astrophysics Data System (ADS)

    Lordan, Oriol; Sallan, Jose M.; Escorihuela, Nuria; Gonzalez-Prieto, David

    2016-03-01

    Airlines shape their route network by defining their routes through supply and demand considerations, paying little attention to network performance indicators, such as network robustness. However, the collapse of an airline network can produce high financial costs for the airline and its entire geographical area of influence. The aim of this study is to analyze the topology and robustness of the route networks of airlines following the Low Cost Carrier (LCC) and Full Service Carrier (FSC) business models. Results show that FSC hubs are more central than LCC bases in their route network. As a result, LCC route networks are more robust than FSC networks.
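
    The link between hub centrality and robustness stated above can be illustrated with two toy graphs. This sketch is not the study's data or metric: the six-airport networks, the choice of "remove the best-connected airport" as the disruption, and the use of the largest connected component as the robustness measure are all assumptions made for illustration.

```python
from collections import deque


def largest_component_fraction(adjacency, removed):
    """Fraction of the original nodes in the largest connected component after removal."""
    remaining = set(adjacency) - set(removed)
    seen = set()
    best = 0
    for start in remaining:
        if start in seen:
            continue
        # Breadth-first search over the nodes that survive the removal.
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            node = queue.popleft()
            size += 1
            for neighbour in adjacency[node]:
                if neighbour in remaining and neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        best = max(best, size)
    return best / len(adjacency)


def most_connected(adjacency):
    """Node with the highest degree (ties resolved by insertion order)."""
    return max(adjacency, key=lambda node: len(adjacency[node]))


# Hypothetical hub-and-spoke network: every route passes through one hub.
hub_and_spoke = {
    "HUB": {"A", "B", "C", "D", "E"},
    "A": {"HUB"}, "B": {"HUB"}, "C": {"HUB"}, "D": {"HUB"}, "E": {"HUB"},
}

# Hypothetical point-to-point network: six airports connected in a ring.
point_to_point = {
    "A": {"B", "F"}, "B": {"A", "C"}, "C": {"B", "D"},
    "D": {"C", "E"}, "E": {"D", "F"}, "F": {"E", "A"},
}

hub_result = largest_component_fraction(hub_and_spoke, [most_connected(hub_and_spoke)])
ring_result = largest_component_fraction(point_to_point, [most_connected(point_to_point)])
print(hub_result, ring_result)
```

    Losing the hub leaves every remaining airport isolated, while the ring still connects five of its six airports, which is the qualitative pattern the abstract reports for FSC versus LCC networks.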

  10. Fabrication of Zirconia-Reinforced Lithium Silicate Ceramic Restorations Using a Complete Digital Workflow

    PubMed Central

    Rinke, Sven; Rödiger, Matthias; Ziebolz, Dirk; Schmidt, Anne-Kathrin

    2015-01-01

    This case report describes the fabrication of monolithic all-ceramic restorations using zirconia-reinforced lithium silicate (ZLS) ceramics. The use of a powder-free intraoral scanner, generative fabrication technology of the working model, and CAD/CAM of the restorations in the dental laboratory allows a completely digitized workflow. The newly introduced ZLS ceramics offer a unique combination of fracture strength (>420 MPa), excellent optical properties, and optimum polishing characteristics, thus making them an interesting material option for monolithic restorations in the digital workflow. PMID:26509088

  11. Fabrication of Zirconia-Reinforced Lithium Silicate Ceramic Restorations Using a Complete Digital Workflow.

    PubMed

    Rinke, Sven; Rödiger, Matthias; Ziebolz, Dirk; Schmidt, Anne-Kathrin

    2015-01-01

    This case report describes the fabrication of monolithic all-ceramic restorations using zirconia-reinforced lithium silicate (ZLS) ceramics. The use of a powder-free intraoral scanner, generative fabrication technology of the working model, and CAD/CAM of the restorations in the dental laboratory allows a completely digitized workflow. The newly introduced ZLS ceramics offer a unique combination of fracture strength (>420 MPa), excellent optical properties, and optimum polishing characteristics, thus making them an interesting material option for monolithic restorations in the digital workflow. PMID:26509088

  12. How to Take HRMS Process Management to the Next Level with Workflow Business Event System

    NASA Technical Reports Server (NTRS)

    Rajeshuni, Sarala; Yagubian, Aram; Kunamaneni, Krishna

    2006-01-01

    Oracle Workflow with the Business Event System offers a complete process management solution for enterprises to manage business processes cost-effectively. Using Workflow event messaging, event subscriptions, AQ Servlet and advanced queuing technologies, this presentation will demonstrate the step-by-step design and implementation of system solutions in order to integrate two dissimilar systems and establish communication remotely. As a case study, the presentation walks you through the process of propagating organization name changes in other applications that originated from the HRMS module without changing applications code. The solution can be applied to your particular business cases for streamlining or modifying business processes across Oracle and non-Oracle applications.

  13. A semi-automated workflow for biodiversity data retrieval, cleaning, and quality control.

    PubMed

    Mathew, Cherian; Güntsch, Anton; Obst, Matthias; Vicario, Saverio; Haines, Robert; Williams, Alan R; de Jong, Yde; Goble, Carole

    2014-01-01

    The compilation and cleaning of data needed for analyses and prediction of species distributions is a time consuming process requiring a solid understanding of data formats and service APIs provided by biodiversity informatics infrastructures. We designed and implemented a Taverna-based Data Refinement Workflow which integrates taxonomic data retrieval, data cleaning, and data selection into a consistent, standards-based, and effective system hiding the complexity of underlying service infrastructures. The workflow can be freely used both locally and through a web-portal which does not require additional software installations by users.

  14. Implementation of workflow engine technology to deliver basic clinical decision support functionality

    PubMed Central

    2011-01-01

    Background Workflow engine technology represents a new class of software with the ability to graphically model step-based knowledge. We present application of this novel technology to the domain of clinical decision support. Successful implementation of decision support within an electronic health record (EHR) remains an unsolved research challenge. Previous research efforts were mostly based on healthcare-specific representation standards and execution engines and did not reach wide adoption. We focus on two challenges in decision support systems: the ability to test decision logic on retrospective data prior to prospective deployment and the challenge of user-friendly representation of clinical logic. Results We present our implementation of a workflow engine technology that addresses the two above-described challenges in delivering clinical decision support. Our system is based on a cross-industry standard of XML (extensible markup language) process definition language (XPDL). The core components of the system are a workflow editor for modeling clinical scenarios and a workflow engine for execution of those scenarios. We demonstrate, with an open-source and publicly available workflow suite, that clinical decision support logic can be executed on retrospective data. The same flowchart-based representation can also function in a prospective mode where the system can be integrated with an EHR system and respond to real-time clinical events. We limit the scope of our implementation to decision support content generation (which can be EHR system vendor independent). We do not focus on supporting complex decision support content delivery mechanisms due to lack of standardization of EHR systems in this area. We present results of our evaluation of the flowchart-based graphical notation as well as architectural evaluation of our implementation using an established evaluation framework for clinical decision support architecture.
Conclusions We describe an implementation of

  15. Lessons from implementing a combined workflow-informatics system for diabetes management.

    PubMed

    Zai, Adrian H; Grant, Richard W; Estey, Greg; Lester, William T; Andrews, Carl T; Yee, Ronnie; Mort, Elizabeth; Chueh, Henry C

    2008-01-01

    Shortcomings surrounding the care of patients with diabetes have been attributed largely to a fragmented, disorganized, and duplicative health care system that focuses more on acute conditions and complications than on managing chronic disease. To address these shortcomings, we developed a diabetes registry population management application to change the way our staff manages patients with diabetes. Use of this new application has helped us coordinate the responsibilities for intervening and monitoring patients in the registry among different users. Our experiences using this combined workflow-informatics intervention system suggest that integrating a chronic disease registry into clinical workflow for the treatment of chronic conditions creates a useful and efficient tool for managing disease.

  16. Robust Optimization of Biological Protocols

    PubMed Central

    Flaherty, Patrick; Davis, Ronald W.

    2015-01-01

    When conducting high-throughput biological experiments, it is often necessary to develop a protocol that is both inexpensive and robust. Standard approaches are either not cost-effective or arrive at an optimized protocol that is sensitive to experimental variations. We show here a novel approach that directly minimizes the cost of the protocol while ensuring the protocol is robust to experimental variation. Our approach uses a risk-averse conditional value-at-risk criterion in a robust parameter design framework. We demonstrate this approach on a polymerase chain reaction protocol and show that our improved protocol is less expensive than the standard protocol and more robust than a protocol optimized without consideration of experimental variation. PMID:26417115
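
    The conditional value-at-risk criterion mentioned above can be illustrated on sampled costs. The sketch below only computes an empirical CVaR (the mean of the worst outcomes) and compares two protocols; the cost samples, the 0.9 level and the function name are hypothetical, and the paper's actual robust parameter design formulation is not reproduced here.

```python
import math


def empirical_cvar(costs, alpha):
    """Mean of the worst (1 - alpha) share of the cost samples (higher cost is worse)."""
    ordered = sorted(costs, reverse=True)
    # Number of tail samples; the small offset guards against floating-point round-up.
    tail_size = max(1, math.ceil((1.0 - alpha) * len(ordered) - 1e-9))
    tail = ordered[:tail_size]
    return sum(tail) / len(tail)


# Hypothetical per-run costs under experimental variation.
cheap_but_fragile = [8, 8, 8, 8, 8, 8, 8, 8, 8, 40]
slightly_dearer_but_stable = [12, 12, 12, 12, 12, 12, 12, 12, 12, 14]

for name, costs in [
    ("cheap_but_fragile", cheap_but_fragile),
    ("slightly_dearer_but_stable", slightly_dearer_but_stable),
]:
    mean_cost = sum(costs) / len(costs)
    print(name, "mean:", mean_cost, "CVaR(0.9):", empirical_cvar(costs, 0.9))
```

    The first protocol has the lower average cost, but its worst-case tail is far more expensive, so a risk-averse criterion of this kind would favour the second.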

  17. Robust controls with structured perturbations

    NASA Technical Reports Server (NTRS)

    Keel, Leehyun

    1993-01-01

    This final report summarizes the recent results obtained by the principal investigator and his coworkers on the robust stability and control of systems containing parametric uncertainty. The starting point is a generalization of Kharitonov's theorem obtained in 1989. Its extension to the multilinear case, the singling out of extremal stability subsets, and other ramifications now constitute an extensive and coherent theory of robust parametric stability, which is summarized in the results contained here.
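
    As background for this record, the classical Kharitonov theorem (not the generalization the report describes) can be sketched as follows: an interval polynomial of fixed degree is Hurwitz stable for every admissible coefficient choice if and only if four specific corner polynomials are Hurwitz stable. The function names and the example bounds below are illustrative assumptions; stability is checked with a plain Routh array.

```python
def kharitonov_polynomials(lower, upper):
    """Four Kharitonov corner polynomials, coefficients in ascending powers of s."""
    if len(lower) != len(upper):
        raise ValueError("lower and upper must have the same length")
    bounds = {"l": lower, "u": upper}
    # Bound selection repeats with period four along ascending powers.
    patterns = ("lluu", "uull", "luul", "ullu")
    return [[bounds[p[i % 4]][i] for i in range(len(lower))] for p in patterns]


def is_hurwitz(ascending):
    """True if all roots lie strictly in the left half-plane (Routh test)."""
    desc = [float(c) for c in reversed(ascending)]
    if desc[0] == 0:
        raise ValueError("leading coefficient must be non-zero")
    if desc[0] < 0:
        desc = [-c for c in desc]
    n = len(desc) - 1
    width = n // 2 + 1
    prev = (desc[0::2] + [0.0] * (width + 1))[: width + 1]
    curr = (desc[1::2] + [0.0] * (width + 1))[: width + 1]
    for _ in range(n):
        # Every first-column entry of the Routh array must be positive.
        if curr[0] <= 0:
            return False
        nxt = [prev[j + 1] - prev[0] * curr[j + 1] / curr[0]
               for j in range(len(prev) - 1)] + [0.0]
        prev, curr = curr, nxt
    return True


def robustly_stable(lower, upper):
    """Kharitonov test; assumes the leading-coefficient interval excludes zero."""
    return all(is_hurwitz(k) for k in kharitonov_polynomials(lower, upper))


# Hypothetical family: s^3 + [5,7] s^2 + [10,12] s + [5,7]
stable = robustly_stable([5, 10, 5, 1], [7, 12, 7, 1])
```

    The point of the theorem is the reduction from infinitely many polynomials to four checks; the multilinear case mentioned in the abstract does not reduce this simply and is not covered by this sketch.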

  18. Robust hashing for 3D models

    NASA Astrophysics Data System (ADS)

    Berchtold, Waldemar; Schäfer, Marcel; Rettig, Michael; Steinebach, Martin

    2014-02-01

    3D models and applications are of utmost interest in both science and industry. As their usage grows, so do their number and the challenge of identifying them correctly. Content identification is commonly done by cryptographic hashes. However, they fail as a solution in application scenarios such as computer aided design (CAD), scientific visualization or video games, because even the smallest alteration of the 3D model, e.g. conversion or compression operations, massively changes the cryptographic hash as well. Therefore, this work presents a robust hashing algorithm for 3D mesh data. The algorithm applies several different bit extraction methods. They are built to resist desired alterations of the model as well as malicious attacks intending to prevent correct allocation. The different bit extraction methods are tested against each other and, as far as possible, the hashing algorithm is compared to the state of the art. The parameters tested are robustness, security and runtime performance as well as False Acceptance Rate (FAR) and False Rejection Rate (FRR); a calculation of the hash-collision probability is also included. The introduced hashing algorithm is kept adaptive, e.g. in hash length, to serve as a proper tool for all applications in practice.
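
    The sensitivity of cryptographic hashes described in this abstract can be illustrated with a short sketch. It hashes a tiny hypothetical mesh with SHA-256 before and after a one-micron change to a single coordinate, then contrasts this with a toy fingerprint that quantizes coordinates to a grid first. The quantization scheme, the function names, and the mesh are assumptions made for illustration; this is not the bit extraction method of the paper.

```python
import hashlib
import struct


def crypto_hash(vertices):
    """SHA-256 over the exact binary coordinates of every vertex."""
    data = b"".join(struct.pack("<3d", *v) for v in vertices)
    return hashlib.sha256(data).hexdigest()


def coarse_fingerprint(vertices, step=0.1):
    """Toy robust fingerprint: snap coordinates to a grid, then hash.

    Illustrative only; values near a grid-cell boundary can still flip.
    """
    cells = [tuple(round(c / step) for c in v) for v in vertices]
    data = b"".join(struct.pack("<3q", *cell) for cell in cells)
    return hashlib.sha256(data).hexdigest()


# Hypothetical triangle and a copy with one coordinate nudged by 1e-6.
mesh = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
altered = [(0.0, 0.0, 0.0), (1.0, 0.0, 1e-6), (0.0, 1.0, 0.0)]

exact_match = crypto_hash(mesh) == crypto_hash(altered)
coarse_match = coarse_fingerprint(mesh) == coarse_fingerprint(altered)
```

    Here the exact digests differ while the coarse fingerprints agree, which is the behaviour a robust hash aims for; a practical scheme must additionally cope with remeshing, compression, and deliberate attacks.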

  19. Designing robust control laws using genetic algorithms

    NASA Technical Reports Server (NTRS)

    Marrison, Chris

    1994-01-01

    The purpose of this research is to create a method of finding practical, robust control laws. The robustness of a controller is judged by Stochastic Robustness metrics and the level of robustness is optimized by searching for design parameters that minimize a robustness cost function.

  20. SALTON SEA SCIENTIFIC DRILLING PROJECT: SCIENTIFIC PROGRAM.

    USGS Publications Warehouse

    Sass, J.H.; Elders, W.A.

    1986-01-01

    The Salton Sea Scientific Drilling Project was spudded on 24 October 1985 and reached a total depth of 10,564 ft (3.2 km) on 17 March 1986. There followed a period of logging, a flow test, and downhole scientific measurements. The scientific goals were integrated smoothly with the engineering and economic objectives of the program and the ideal of 'science driving the drill' in continental scientific drilling projects was achieved in large measure. The principal scientific goals of the project were to study the physical and chemical processes involved in an active, magmatically driven hydrothermal system. To facilitate these studies, high priority was attached to four areas of sample and data collection, namely: (1) core and cuttings, (2) formation fluids, (3) geophysical logging, and (4) downhole physical measurements, particularly temperatures and pressures.

  1. How robust is a robust policy? A comparative analysis of alternative robustness metrics for supporting robust decision analysis.

    NASA Astrophysics Data System (ADS)

    Kwakkel, Jan; Haasnoot, Marjolijn

    2015-04-01

    In response to climate and socio-economic change, in various policy domains there is increasingly a call for robust plans or policies. That is, plans or policies that perform well in a very large range of plausible futures. In the literature, a wide range of alternative robustness metrics can be found. The relative merit of these alternative conceptualizations of robustness has, however, received less attention. Evidently, different robustness metrics can result in different plans or policies being adopted. This paper investigates the consequences of several robustness metrics on decision making, illustrated here by the design of a flood risk management plan. A fictitious case, inspired by a river reach in the Netherlands, is used. The performance of this system in terms of casualties, damages, and costs for flood and damage mitigation actions is explored using a time horizon of 100 years, and accounting for uncertainties pertaining to climate change and land use change. A set of candidate policy options is specified up front. This set of options includes dike raising, dike strengthening, creating more space for the river, and flood proof building and evacuation options. The overarching aim is to design an effective flood risk mitigation strategy that is designed from the outset to be adapted over time in response to how the future actually unfolds. To this end, the plan will be based on the dynamic adaptive policy pathway approach (Haasnoot, Kwakkel et al. 2013) being used in the Dutch Delta Program. The policy problem is formulated as a multi-objective robust optimization problem (Kwakkel, Haasnoot et al. 2014). We solve the multi-objective robust optimization problem using several alternative robustness metrics, including both satisficing robustness metrics and regret based robustness metrics. Satisficing robustness metrics focus on the performance of candidate plans across a large ensemble of plausible futures. Regret based robustness metrics compare the

  2. Enabling a Scientific Cloud Marketplace: VGL (Invited)

    NASA Astrophysics Data System (ADS)

    Fraser, R.; Woodcock, R.; Wyborn, L. A.; Vote, J.; Rankine, T.; Cox, S. J.

    2013-12-01

    The Virtual Geophysics Laboratory (VGL) provides a flexible, web-based environment where researchers can browse data and use a variety of scientific software packaged into tool kits that run in the Cloud. Both data and tool kits are published by multiple researchers and registered with the VGL infrastructure, forming a data and application marketplace. The VGL provides the basic work flow of Discovery and Access to the disparate data sources and a Library for tool kits and scripting to drive the scientific codes. Computation is then performed on the Research or Commercial Clouds. Provenance information is collected throughout the work flow and can be published alongside the results, allowing for experiment comparison and sharing with other researchers. VGL's "mix and match" approach to data, computational resources and scientific codes enables a dynamic approach to scientific collaboration. VGL allows scientists to publish their specific contribution, be it data, code, compute or work flow, knowing the VGL framework will provide other components needed for a complete application. Other scientists can choose the pieces that suit them best to assemble an experiment. The coarse grain workflow of the VGL framework combined with the flexibility of the scripting library and computational toolkits allows for significant customisation and sharing amongst the community. The VGL utilises the cloud computational and storage resources from the Australian academic research cloud provided by the NeCTAR initiative and a large variety of data accessible from national and state agencies via the Spatial Information Services Stack (SISS - http://siss.auscope.org). VGL v1.2 screenshot - http://vgl.auscope.org

  3. Characterizing Strain Variation in Engineered E. coli Using a Multi-Omics-Based Workflow.

    PubMed

    Brunk, Elizabeth; George, Kevin W; Alonso-Gutierrez, Jorge; Thompson, Mitchell; Baidoo, Edward; Wang, George; Petzold, Christopher J; McCloskey, Douglas; Monk, Jonathan; Yang, Laurence; O'Brien, Edward J; Batth, Tanveer S; Martin, Hector Garcia; Feist, Adam; Adams, Paul D; Keasling, Jay D; Palsson, Bernhard O; Lee, Taek Soon

    2016-05-25

    Understanding the complex interactions that occur between heterologous and native biochemical pathways represents a major challenge in metabolic engineering and synthetic biology. We present a workflow that integrates metabolomics, proteomics, and genome-scale models of Escherichia coli metabolism to study the effects of introducing a heterologous pathway into a microbial host. This workflow incorporates complementary approaches from computational systems biology, metabolic engineering, and synthetic biology; provides molecular insight into how the host organism microenvironment changes due to pathway engineering; and demonstrates how biological mechanisms underlying strain variation can be exploited as an engineering strategy to increase product yield. As a proof of concept, we present the analysis of eight engineered strains producing three biofuels: isopentenol, limonene, and bisabolene. Application of this workflow identified the roles of candidate genes, pathways, and biochemical reactions in observed experimental phenomena and facilitated the construction of a mutant strain with improved productivity. The contributed workflow is available as an open-source tool in the form of iPython notebooks. PMID:27211860

  4. High throughput workflow for coacervate formation and characterization in shampoo systems.

    PubMed

    Kalantar, T H; Tucker, C J; Zalusky, A S; Boomgaard, T A; Wilson, B E; Ladika, M; Jordan, S L; Li, W K; Zhang, X; Goh, C G

    2007-01-01

    Cationic cellulosic polymers find wide utility as benefit agents in shampoo. Deposition of these polymers onto hair has been shown to mend split-ends, improve appearance and wet combing, as well as provide controlled delivery of insoluble actives. The deposition is thought to be enhanced by the formation of a polymer/surfactant complex that phase-separates from the bulk solution upon dilution. A standard characterization method has been developed to characterize the coacervate formation upon dilution, but the test is time and material prohibitive. We have developed a semi-automated high throughput workflow to characterize the coacervate-forming behavior of different shampoo formulations. A procedure that allows testing of real use shampoo dilutions without first formulating a complete shampoo was identified. This procedure was adapted to a Tecan liquid handler by optimizing the parameters for liquid dispensing as well as for mixing. The high throughput workflow enabled preparation and testing of hundreds of formulations with different types and levels of cationic cellulosic polymers and surfactants, and for each formulation a haze diagram was constructed. Optimal formulations and their dilutions that give substantial coacervate formation (determined by haze measurements) were identified. Results from this high throughput workflow were shown to reproduce standard haze and bench-top turbidity measurements, and this workflow has the advantages of using less material and allowing more variables to be tested with significant time savings.

  7. A data-independent acquisition workflow for qualitative screening of new psychoactive substances in biological samples.

    PubMed

    Kinyua, Juliet; Negreira, Noelia; Ibáñez, María; Bijlsma, Lubertus; Hernández, Félix; Covaci, Adrian; van Nuijs, Alexander L N

    2015-11-01

    Identification of new psychoactive substances (NPS) is challenging. Developing targeted methods for their analysis can be difficult and costly due to their impermanence on the drug scene. Accurate-mass mass spectrometry (AMMS) using a quadrupole time-of-flight (QTOF) analyzer can be useful for wide-scope screening since it provides sensitive, full-spectrum MS data. Our article presents a qualitative screening workflow based on data-independent acquisition mode (all-ions MS/MS) on liquid chromatography (LC) coupled to QTOFMS for the detection and identification of NPS in biological matrices. The workflow combines and structures fundamentals of target and suspect screening data processing techniques in a structured algorithm. This allows the detection and tentative identification of NPS and their metabolites. We have applied the workflow to two actual case studies involving drug intoxications where we detected and confirmed the parent compounds ketamine, 25B-NBOMe, 25C-NBOMe, and several predicted phase I and II metabolites not previously reported in urine and serum samples. The screening workflow demonstrates the added value for the detection and identification of NPS in biological matrices. PMID:26396082

  8. Unrealized potential and residual consequences of electronic prescribing on pharmacy workflow in the outpatient pharmacy

    PubMed Central

    Nanji, Karen C; Rothschild, Jeffrey M; Boehne, Jennifer J; Keohane, Carol A; Ash, Joan S; Poon, Eric G

    2014-01-01

    Introduction: Electronic prescribing systems have often been promoted as a tool for reducing medication errors and adverse drug events. Recent evidence has revealed that adoption of electronic prescribing systems can lead to unintended consequences such as the introduction of new errors. The purpose of this study is to identify and characterize the unrealized potential and residual consequences of electronic prescribing on pharmacy workflow in an outpatient pharmacy. Methods: A multidisciplinary team conducted direct observations of workflow in an independent pharmacy and semi-structured interviews with pharmacy staff members about their perceptions of the unrealized potential and residual consequences of electronic prescribing systems. We used qualitative methods to iteratively analyze text data using a grounded theory approach, and derive a list of major themes and subthemes related to the unrealized potential and residual consequences of electronic prescribing. Results: We identified the following five themes: communication, workflow disruption, cost, technology, and opportunity for new errors. These contained 26 unique subthemes representing different facets of our observations and the pharmacy staff's perceptions of the unrealized potential and residual consequences of electronic prescribing. Discussion: We offer targeted solutions to improve electronic prescribing systems by addressing the unrealized potential and residual consequences that we identified. These recommendations may be applied not only to improve staff perceptions of electronic prescribing systems but also to improve the design and/or selection of these systems in order to optimize communication and workflow within pharmacies while minimizing both cost and the potential for the introduction of new errors. PMID:24154836

  9. Asking for Permission: A Survey of Copyright Workflows for Institutional Repositories

    ERIC Educational Resources Information Center

    Hanlon, Ann; Ramirez, Marisa

    2011-01-01

    An online survey of institutional repository (IR) managers identified copyright clearance trends in staffing and workflows. The majority of respondents followed a mediated deposit model, and reported that library personnel, instead of authors, engaged in copyright clearance activities for IRs. The most common "information gaps" pertained to the…

  10. From Benchtop to Desktop: Important Considerations when Designing Amplicon Sequencing Workflows

    PubMed Central

    Murray, Dáithí C.; Coghlan, Megan L.; Bunce, Michael

    2015-01-01

    Amplicon sequencing has been the method of choice in many high-throughput DNA sequencing (HTS) applications. To date there has been a heavy focus on the means by which to analyse the burgeoning amount of data afforded by HTS. In contrast, there has been a distinct lack of attention paid to considerations surrounding the importance of sample preparation and the fidelity of library generation. No amount of high-end bioinformatics can compensate for poorly prepared samples and it is therefore imperative that careful attention is given to sample preparation and library generation within workflows, especially those involving multiple PCR steps. This paper redresses this imbalance by focusing on aspects pertaining to the benchtop within typical amplicon workflows: sample screening, the target region, and library generation. Empirical data is provided to illustrate the scope of the problem. Lastly, the impact of various data analysis parameters is also investigated in the context of how the data was initially generated. It is hoped this paper may serve to highlight the importance of pre-analysis workflows in achieving meaningful, future-proof data that can be analysed appropriately. As amplicon sequencing gains traction in a variety of diagnostic applications from forensics to environmental DNA (eDNA), it is paramount that workflows and analytics are both fit for purpose. PMID:25902146

  11. Interacting with the National Database for Autism Research (NDAR) via the LONI Pipeline workflow environment.

    PubMed

    Torgerson, Carinna M; Quinn, Catherine; Dinov, Ivo; Liu, Zhizhong; Petrosyan, Petros; Pelphrey, Kevin; Haselgrove, Christian; Kennedy, David N; Toga, Arthur W; Van Horn, John Darrell

    2015-03-01

    Under the umbrella of the National Database for Clinical Trials (NDCT) related to mental illnesses, the National Database for Autism Research (NDAR) seeks to gather, curate, and make openly available neuroimaging data from NIH-funded studies of autism spectrum disorder (ASD). NDAR has recently made its database accessible through the LONI Pipeline workflow design and execution environment to enable large-scale analyses of cortical architecture and function via local, cluster, or "cloud"-based computing resources. This presents a unique opportunity to overcome many of the customary limitations to fostering biomedical neuroimaging as a science of discovery. Providing open access to primary neuroimaging data, workflow methods, and high-performance computing will increase uniformity in data collection protocols, encourage greater reliability of published data and replication of results, and broaden the range of researchers able to perform larger studies than ever before. To illustrate the use of NDAR and LONI Pipeline for several common neuroimaging processing steps and analyses, this paper presents example workflows useful for ASD neuroimaging researchers seeking to begin using this valuable combination of online data and computational resources. We discuss the utility of such database and workflow processing interactivity as a motivation for the sharing of additional primary data in ASD research and elsewhere. PMID:25666423

  12. From benchtop to desktop: important considerations when designing amplicon sequencing workflows.

    PubMed

    Murray, Dáithí C; Coghlan, Megan L; Bunce, Michael

    2015-01-01

    Amplicon sequencing has been the method of choice in many high-throughput DNA sequencing (HTS) applications. To date there has been a heavy focus on the means by which to analyse the burgeoning amount of data afforded by HTS. In contrast, there has been a distinct lack of attention paid to considerations surrounding the importance of sample preparation and the fidelity of library generation. No amount of high-end bioinformatics can compensate for poorly prepared samples and it is therefore imperative that careful attention is given to sample preparation and library generation within workflows, especially those involving multiple PCR steps. This paper redresses this imbalance by focusing on aspects pertaining to the benchtop within typical amplicon workflows: sample screening, the target region, and library generation. Empirical data is provided to illustrate the scope of the problem. Lastly, the impact of various data analysis parameters is also investigated in the context of how the data was initially generated. It is hoped this paper may serve to highlight the importance of pre-analysis workflows in achieving meaningful, future-proof data that can be analysed appropriately. As amplicon sequencing gains traction in a variety of diagnostic applications from forensics to environmental DNA (eDNA) it is paramoun