Sample records for workflow composition tool

  1. CASAS: A tool for composing automatically and semantically astrophysical services

    NASA Astrophysics Data System (ADS)

    Louge, T.; Karray, M. H.; Archimède, B.; Knödlseder, J.

    2017-07-01

    Multiple astronomical datasets are available through the Internet and the astrophysical Distributed Computing Infrastructure (DCI) called the Virtual Observatory (VO). Several scientific workflow technologies exist for retrieving and combining data from these sources. However, the selection of relevant services, the automation of workflow composition, and the lack of user-friendly platforms remain concerns. This paper presents CASAS, a tool for semantic web service composition in astrophysics. CASAS provides semantics-based, automatic composition of astrophysical web services into workflows; it widens the choice of services and eases the use of heterogeneous services. Semantic web service composition relies on ontologies to elaborate the composition; this work is based on the Astrophysical Services ONtology (ASON). ASON's structure is largely inherited from the capabilities of VO services. Nevertheless, our approach is not limited to the VO and brings VO and non-VO services together without the need for premade recipes. CASAS is available for use through a simple web interface.

  2. A Tool Supporting Collaborative Data Analytics Workflow Design and Management

    NASA Astrophysics Data System (ADS)

    Zhang, J.; Bao, Q.; Lee, T. J.

    2016-12-01

    Collaborative experiment design could significantly enhance the sharing and adoption of the data analytics algorithms and models that have emerged in Earth science. Existing data-oriented workflow tools, however, are not well suited to supporting collaborative design of such workflows: among other gaps, they do not support real-time co-design; they cannot track how a workflow evolves over time based on changing designs contributed by multiple Earth scientists; and they do not capture and retrieve the collaboration knowledge behind a workflow design (the discussions that lead to a design). To address these challenges, we have designed and developed a technique supporting collaborative data-oriented workflow composition and management, as a key component toward supporting big data collaboration through the Internet. Reproducibility and scalability are two major targets demanding fundamental infrastructural support. One outcome of the project is a software tool that supports an elastic number of groups of Earth scientists collaboratively designing and composing data analytics workflows through the Internet. Instead of reinventing the wheel, we have extended an existing workflow tool, VisTrails, into an online collaborative environment as a proof of concept.

  3. Support for Taverna workflows in the VPH-Share cloud platform.

    PubMed

    Kasztelnik, Marek; Coto, Ernesto; Bubak, Marian; Malawski, Maciej; Nowakowski, Piotr; Arenas, Juan; Saglimbeni, Alfredo; Testi, Debora; Frangi, Alejandro F

    2017-07-01

    To address the increasing need for collaborative endeavours within the Virtual Physiological Human (VPH) community, the VPH-Share collaborative cloud platform allows researchers to expose and share sequences of complex biomedical processing tasks in the form of computational workflows. The Taverna Workflow System is a very popular tool for orchestrating complex biomedical and bioinformatics processing tasks in the VPH community. This paper describes the VPH-Share components that support the building and execution of Taverna workflows, and explains how they interact with other VPH-Share components to improve the capabilities of the VPH-Share platform. Taverna workflow support is delivered by the Atmosphere cloud management platform and the VPH-Share Taverna plugin. These components are explained in detail, along with the two main procedures that were developed to enable this seamless integration: workflow composition and execution. The main outcomes are: 1) seamless integration of VPH-Share with other components and systems; 2) an extended range of tools for workflows; 3) successful integration of scientific workflows from other VPH projects; and 4) execution speed improvements for medical applications. The presented workflow integration provides VPH-Share users with a wide range of possibilities to compose and execute workflows, such as desktop or online composition, online batch execution, multithreading, remote execution, etc. The specific advantages of each supported tool are presented, as are the roles of Atmosphere and the VPH-Share plugin within the VPH-Share project. The combination of the VPH-Share plugin and Atmosphere endows the VPH-Share infrastructure with far more flexible, powerful and usable capabilities for the VPH-Share community. As both components can continue to evolve and improve independently, further improvements are still to be developed and will be described. Copyright © 2017 Elsevier B.V. All rights reserved.

  4. wft4galaxy: a workflow testing tool for galaxy.

    PubMed

    Piras, Marco Enrico; Pireddu, Luca; Zanetti, Gianluigi

    2017-12-01

    Workflow managers for scientific analysis provide a high-level programming platform facilitating standardization, automation, collaboration and access to sophisticated computing resources. The Galaxy workflow manager provides a prime example of this type of platform. As compositions of simpler tools, workflows effectively comprise specialized computer programs implementing often very complex analysis procedures. To date, no simple way to automatically test Galaxy workflows and ensure their correctness has appeared in the literature. With wft4galaxy we offer a tool that brings automated testing to Galaxy workflows, making it feasible to apply continuous integration to their development and ensuring that defects are detected promptly. wft4galaxy can be easily installed as a regular Python program or launched directly as a Docker container, the latter reducing installation effort to a minimum. Available at https://github.com/phnmnl/wft4galaxy under the Academic Free License v3.0. Contact: marcoenrico.piras@crs4.it. © The Author 2017. Published by Oxford University Press.
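
    A conceptual sketch of what automated workflow testing involves, i.e., running a workflow and comparing its outputs against stored expectations; this is not the wft4galaxy API, and all names, paths and file contents below are illustrative.

```python
# Conceptual sketch of automated workflow-output testing: run a workflow
# (stubbed here), then compare each produced file against a stored expected
# file by checksum. This is NOT the wft4galaxy API; all names are illustrative.
import hashlib
import tempfile
from pathlib import Path

def digest(path: Path) -> str:
    """SHA-256 of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def run_workflow(workdir: Path) -> dict:
    """Stand-in for a Galaxy workflow invocation; returns {output_name: path}."""
    out = workdir / "counts.tsv"
    out.write_text("geneA\t10\ngeneB\t3\n")   # pretend the workflow produced this
    return {"counts": out}

def check(actual: dict, expected: dict) -> list:
    """Names of outputs that are missing or differ from the expected files."""
    return [name for name, exp in expected.items()
            if name not in actual or digest(actual[name]) != digest(exp)]

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)
        expected_file = tmp / "expected_counts.tsv"
        expected_file.write_text("geneA\t10\ngeneB\t3\n")   # test fixture
        failures = check(run_workflow(tmp), {"counts": expected_file})
        print("FAILED:", failures) if failures else print("OK")
```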

  5. Kwf-Grid workflow management system for Earth science applications

    NASA Astrophysics Data System (ADS)

    Tran, V.; Hluchy, L.

    2009-04-01

    In this paper, we present a workflow management tool for Earth science applications in EGEE. The tool was originally developed within the K-wf Grid project for the GT4 middleware and has many advanced features, such as semi-automatic workflow composition, a user-friendly GUI for managing workflows, and knowledge management. In EGEE, we are porting the tool to the gLite middleware for Earth science applications. The K-wf Grid workflow management system was developed within the "Knowledge-based Workflow System for Grid Applications" project under the 6th Framework Programme. The system is intended to: semi-automatically compose a workflow of Grid services; execute the composed workflow application in a Grid computing environment; monitor the performance of the Grid infrastructure and the Grid applications; analyze the resulting monitoring information; capture the knowledge contained in that information by means of intelligent agents; and finally reuse the joint knowledge gathered from all participating users in a collaborative way in order to efficiently construct workflows for new Grid applications. The K-wf Grid workflow engines can support different types of jobs (e.g., GRAM jobs, web services) in a workflow. A new class of gLite job has been added to the system, allowing it to manage and execute gLite jobs on the EGEE infrastructure. The GUI has been adapted to the requirements of EGEE users, and a new credential management servlet has been added to the portal. Porting the K-wf Grid workflow management system to gLite allows EGEE users to use the system and benefit from its advanced features. The system has been initially tested and evaluated with applications from the ES clusters.

  6. An ontology-based framework for bioinformatics workflows.

    PubMed

    Digiampietri, Luciano A; Perez-Alcazar, Jose de J; Medeiros, Claudia Bauzer

    2007-01-01

    The proliferation of bioinformatics activities brings new challenges - how to understand and organise these resources, how to exchange and reuse successful experimental procedures, and how to provide interoperability among data and tools. This paper describes an effort in these directions. It is based on combining research on ontology management, AI and scientific workflows to design, reuse and annotate bioinformatics experiments. The resulting framework supports automatic or interactive composition of tasks based on AI planning techniques, and takes advantage of ontologies to support the specification and annotation of bioinformatics workflows. We validate our proposal with a prototype running on real data.
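
    A toy illustration of composition driven by input/output types, in the spirit of the AI planning techniques mentioned above; the task catalogue and the type names are invented for the example and are not taken from the paper.

```python
# Toy forward-chaining planner: chain tasks whose declared input types are
# already available until the requested output type is produced.
# Task catalogue and type names are illustrative assumptions.
TASKS = {
    "align_sequences":  {"in": {"fasta"},      "out": {"alignment"}},
    "build_tree":       {"in": {"alignment"},  "out": {"phylo_tree"}},
    "annotate_tree":    {"in": {"phylo_tree"}, "out": {"annotated_tree"}},
}

def compose(available: set, goal: str) -> list:
    """Greedily select tasks until the goal data type is available."""
    plan, available = [], set(available)
    while goal not in available:
        runnable = [name for name, t in TASKS.items()
                    if t["in"] <= available and not t["out"] <= available]
        if not runnable:
            raise ValueError("no composition found for goal %r" % goal)
        task = runnable[0]
        plan.append(task)
        available |= TASKS[task]["out"]
    return plan

print(compose({"fasta"}, "annotated_tree"))
# -> ['align_sequences', 'build_tree', 'annotate_tree']
```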

  7. A cognitive task analysis of a visual analytic workflow: Exploring molecular interaction networks in systems biology.

    PubMed

    Mirel, Barbara; Eichinger, Felix; Keller, Benjamin J; Kretzler, Matthias

    2011-03-21

    Bioinformatics visualization tools are often not robust enough to support biomedical specialists’ complex exploratory analyses. Tools need to accommodate the workflows that scientists actually perform for specific translational research questions. To understand and model one of these workflows, we conducted a case-based, cognitive task analysis of a biomedical specialist’s exploratory workflow for the question: What functional interactions among gene products of high throughput expression data suggest previously unknown mechanisms of a disease? From our cognitive task analysis four complementary representations of the targeted workflow were developed. They include: usage scenarios, flow diagrams, a cognitive task taxonomy, and a mapping between cognitive tasks and user-centered visualization requirements. The representations capture the flows of cognitive tasks that led a biomedical specialist to inferences critical to hypothesizing. We created representations at levels of detail that could strategically guide visualization development, and we confirmed this by making a trial prototype based on user requirements for a small portion of the workflow. Our results imply that visualizations should make available to scientific users “bundles of features” consonant with the compositional cognitive tasks purposefully enacted at specific points in the workflow. We also highlight certain aspects of visualizations that: (a) need more built-in flexibility; (b) are critical for negotiating meaning; and (c) are necessary for essential metacognitive support.

  8. Provenance-Powered Automatic Workflow Generation and Composition

    NASA Astrophysics Data System (ADS)

    Zhang, J.; Lee, S.; Pan, L.; Lee, T. J.

    2015-12-01

    In recent years, scientists have learned how to codify tools into reusable software modules that can be chained into multi-step executable workflows. Existing scientific workflow tools, created by computer scientists, require domain scientists to meticulously design their multi-step experiments before analyzing data. However, this is oftentimes contradictory to a domain scientist's daily routine of conducting research and exploration. We hope to resolve this dispute. Imagine this: An Earth scientist starts her day applying NASA Jet Propulsion Laboratory (JPL) published climate data processing algorithms over ARGO deep ocean temperature and AMSRE sea surface temperature datasets. Throughout the day, she tunes the algorithm parameters to study various aspects of the data. Suddenly, she notices some interesting results. She then turns to a computer scientist and asks, "can you reproduce my results?" By tracking and reverse engineering her activities, the computer scientist creates a workflow. The Earth scientist can now rerun the workflow to validate her findings, modify the workflow to discover further variations, or publish the workflow to share the knowledge. In this way, we aim to revolutionize computer-supported Earth science. We have developed a prototyping system to realize the aforementioned vision, in the context of service-oriented science. We have studied how Earth scientists conduct service-oriented data analytics research in their daily work, developed a provenance model to record their activities, and developed a technology to automatically generate workflows from user behavior, supporting the adaptation and reuse of these workflows for replicating and improving scientific studies. A data-centric repository infrastructure is established to capture richer provenance and further facilitate collaboration in the science community. We have also established a Petri nets-based verification instrument for provenance-based automatic workflow generation and recommendation.
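
    A minimal sketch of the underlying idea, reconstructing workflow edges from a provenance log by matching data dependencies; the log format and step names are assumptions for illustration, not the paper's provenance model.

```python
# Reconstruct a workflow (as producer -> consumer edges) from a provenance log
# of service calls, linking steps whose inputs reference outputs of earlier
# steps. Log format, fields and step names are illustrative assumptions.
provenance_log = [
    {"step": "read_argo",  "inputs": [],                       "outputs": ["argo_temp"]},
    {"step": "read_amsre", "inputs": [],                       "outputs": ["sst"]},
    {"step": "regrid",     "inputs": ["argo_temp"],            "outputs": ["argo_gridded"]},
    {"step": "correlate",  "inputs": ["argo_gridded", "sst"],  "outputs": ["corr_map"]},
]

def to_workflow(log):
    """Build (producer, consumer) edges from data-dependency matches."""
    producer = {out: rec["step"] for rec in log for out in rec["outputs"]}
    return [(producer[i], rec["step"]) for rec in log for i in rec["inputs"]
            if i in producer]

for src, dst in to_workflow(provenance_log):
    print(f"{src} -> {dst}")
```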

  9. QoS measurement of workflow-based web service compositions using Colored Petri net.

    PubMed

    Nematzadeh, Hossein; Motameni, Homayun; Mohamad, Radziah; Nematzadeh, Zahra

    2014-01-01

    Workflow-based web service compositions (WB-WSCs) constitute one of the main composition categories in service-oriented architecture (SOA). Eflow, the polymorphic process model (PPM), and the business process execution language (BPEL) are the main techniques in the WB-WSC category. With the maturing of web services, measuring the quality of composite web services developed with different techniques has become one of the most important challenges in today's web environments. A business should try to provide good quality, with respect to customers' requirements, in a composed web service. Thus, it is important to measure quality of service (QoS), which refers to non-functional parameters, so that the quality degree of a given web service composition can be determined. This paper develops a deterministic analytical method for dependability and performance measurement using Colored Petri nets (CPN) with explicit routing constructs and the application of probability theory. A computer tool called WSET was also developed for modeling and supporting QoS measurement through simulation.
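
    As background for the kind of QoS aggregation such measurement involves, the sketch below applies common textbook rules (response times add in sequence, the slowest branch dominates a parallel split, availabilities multiply); both the rules and the numbers are illustrative and are not taken from the paper's Colored Petri net model.

```python
# Aggregate response time and availability for simple sequential / parallel
# structures, using common textbook aggregation rules (not the paper's CPN model).
def sequence(services):
    """Sequential composition: times add, availabilities multiply."""
    time = sum(s["time"] for s in services)
    avail = 1.0
    for s in services:
        avail *= s["avail"]
    return {"time": time, "avail": avail}

def parallel(services):
    """Parallel (AND-split/join): slowest branch dominates, all must succeed."""
    return {"time": max(s["time"] for s in services),
            "avail": sequence(services)["avail"]}

book  = {"time": 0.4, "avail": 0.99}    # illustrative services and values
pay   = {"time": 0.6, "avail": 0.995}
email = {"time": 0.2, "avail": 0.999}

composite = sequence([book, parallel([pay, email])])
print(composite)   # time = 1.0 s, availability ~ 0.984
```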

  10. High-Throughput Industrial Coatings Research at The Dow Chemical Company.

    PubMed

    Kuo, Tzu-Chi; Malvadkar, Niranjan A; Drumright, Ray; Cesaretti, Richard; Bishop, Matthew T

    2016-09-12

    At The Dow Chemical Company, high-throughput research is an active area for developing new industrial coatings products. Using the principles of automation (i.e., using robotic instruments), parallel processing (i.e., prepare, process, and evaluate samples in parallel), and miniaturization (i.e., reduce sample size), high-throughput tools for synthesizing, formulating, and applying coating compositions have been developed at Dow. In addition, high-throughput workflows for measuring various coating properties, such as cure speed, hardness development, scratch resistance, impact toughness, resin compatibility, pot-life, and surface defects, among others, have also been developed in-house. These workflows correlate well with the traditional coatings tests, but they do not necessarily mimic those tests. The use of such high-throughput workflows in combination with smart experimental designs allows accelerated discovery and commercialization.

  11. Using Kepler for Tool Integration in Microarray Analysis Workflows.

    PubMed

    Gan, Zhuohui; Stowe, Jennifer C; Altintas, Ilkay; McCulloch, Andrew D; Zambon, Alexander C

    Increasing numbers of genomic technologies are leading to massive amounts of genomic data, all of which require complex analysis. More and more bioinformatics analysis tools are being developed by scientists to simplify these analyses. However, different pipelines have been developed using different software environments, which makes integration of these diverse bioinformatics tools difficult. Kepler provides an open source environment to integrate these disparate packages. Using Kepler, we integrated several external tools, including Bioconductor packages, AltAnalyze (a Python-based open source tool), and an R-based comparison tool, to build an automated workflow to meta-analyze both online and local microarray data. The automated workflow connects the integrated tools seamlessly, delivers data flow between the tools smoothly, and hence improves the efficiency and accuracy of complex data analyses. Our workflow exemplifies the usage of Kepler as a scientific workflow platform for bioinformatics pipelines.

  12. A tutorial of diverse genome analysis tools found in the CoGe web-platform using Plasmodium spp. as a model

    PubMed Central

    Castillo, Andreina I; Nelson, Andrew D L; Haug-Baltzell, Asher K; Lyons, Eric

    2018-01-01

    Abstract Integrated platforms for storage, management, analysis and sharing of large quantities of omics data have become fundamental to comparative genomics. CoGe (https://genomevolution.org/coge/) is an online platform designed to manage and study genomic data, enabling both data- and hypothesis-driven comparative genomics. CoGe’s tools and resources can be used to organize and analyse both publicly available and private genomic data from any species. Here, we demonstrate the capabilities of CoGe through three example workflows using 17 Plasmodium genomes as a model. Plasmodium genomes present unique challenges for comparative genomics due to their rapidly evolving and highly variable genomic AT/GC content. These example workflows are intended to serve as templates to help guide researchers who would like to use CoGe to examine diverse aspects of genome evolution. In the first workflow, trends in genome composition and amino acid usage are explored. In the second, changes in genome structure and the distribution of synonymous (Ks) and non-synonymous (Kn) substitution values are evaluated across species with different levels of evolutionary relatedness. In the third workflow, microsyntenic analyses of multigene families’ genomic organization are conducted using two Plasmodium-specific gene families—serine repeat antigen, and cytoadherence-linked asexual gene—as models. In general, these example workflows show how to achieve quick, reproducible and shareable results using the CoGe platform. We were able to replicate previously published results, as well as leverage CoGe’s tools and resources to gain additional insight into various aspects of Plasmodium genome evolution. Our results highlight the usefulness of the CoGe platform, particularly in understanding complex features of genome evolution. Database URL: https://genomevolution.org/coge/
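
    The first example workflow examines genome composition (AT/GC content). A small, self-contained sketch of that kind of calculation in plain Python follows; the sequences are made-up examples, and CoGe performs the equivalent analysis within its own platform.

```python
# Per-sequence GC content from FASTA-formatted text. The sequences are short
# made-up examples; CoGe computes equivalent statistics inside the platform.
from io import StringIO

FASTA = StringIO(""">chr1_example
ATATATGCGCATATATATAT
>chr2_example
GGCCGGCCATATGGCCGGCC
""")

def read_fasta(handle):
    """Yield (identifier, sequence) pairs from a FASTA handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        elif line:
            chunks.append(line.upper())
    if name is not None:
        yield name, "".join(chunks)

for name, seq in read_fasta(FASTA):
    gc = (seq.count("G") + seq.count("C")) / len(seq)
    print(f"{name}\tlength={len(seq)}\tGC={gc:.2f}")
```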

  13. A Model of Workflow Composition for Emergency Management

    NASA Astrophysics Data System (ADS)

    Xin, Chen; Bin-ge, Cui; Feng, Zhang; Xue-hui, Xu; Shan-shan, Fu

    Commonly used workflow technology is not flexible enough to deal with concurrent emergency situations. This paper proposes a novel model for defining emergency plans, in which workflow segments appear as a constituent part. A formal abstraction, which contains four operations, is defined to compose workflow segments under constraint rules. A software system for constructing and composing business process resources has been implemented and integrated into the Emergency Plan Management Application System.
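
    The abstract does not name the four composition operations; purely as an illustration of composing workflow segments under a constraint rule, a sketch with assumed sequential and parallel operators follows.

```python
# Illustrative composition of workflow segments with a constraint check.
# The operation names and the constraint are assumptions, not the paper's model.
class Segment:
    def __init__(self, name, steps):
        self.name, self.steps = name, list(steps)

def seq(a: Segment, b: Segment) -> Segment:
    """Sequential composition: run all steps of a, then all steps of b."""
    return Segment(f"({a.name};{b.name})", a.steps + b.steps)

def par(a: Segment, b: Segment) -> Segment:
    """Parallel composition: steps merged, interleaving left abstract."""
    return Segment(f"({a.name}||{b.name})", a.steps + b.steps)

def satisfies(segment: Segment, max_steps: int) -> bool:
    """Example constraint rule: the composed plan must stay within a step budget."""
    return len(segment.steps) <= max_steps

alert     = Segment("alert",     ["notify_agencies"])
rescue    = Segment("rescue",    ["dispatch_teams", "set_up_triage"])
logistics = Segment("logistics", ["open_routes"])

plan = seq(alert, par(rescue, logistics))
print(plan.name, plan.steps, satisfies(plan, max_steps=5))
```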

  14. Improving adherence to the Epic Beacon ambulatory workflow.

    PubMed

    Chackunkal, Ellen; Dhanapal Vogel, Vishnuprabha; Grycki, Meredith; Kostoff, Diana

    2017-06-01

    Computerized physician order entry has been shown to significantly improve chemotherapy safety by reducing the number of prescribing errors. Epic's Beacon Oncology Information System of computerized physician order entry and electronic medication administration was implemented in Henry Ford Health System's ambulatory oncology infusion centers on 9 November 2013. Since that time, compliance with the infusion workflow had not been assessed. The objective of this study was to optimize the current workflow and improve compliance with this workflow in the ambulatory oncology setting. This study was a retrospective, quasi-experimental study which analyzed the composite workflow compliance rate of patient encounters from 9 to 23 November 2014. Based on this analysis, an intervention was identified and implemented in February 2015 to improve workflow compliance. The primary endpoint was to compare the composite compliance rate with the Beacon workflow before and after a pharmacy-initiated intervention. The intervention, which was education of infusion center staff, was initiated by ambulatory-based oncology pharmacists and implemented by a multi-disciplinary team of pharmacists and nurses. The composite compliance rate was then reassessed for patient encounters from 2 to 13 March 2015 in order to analyze the effects of the intervention on compliance. The initial analysis in November 2014 revealed a composite compliance rate of 38%, and data analysis after the intervention revealed a statistically significant increase in the composite compliance rate to 83% (p < 0.001). This study supports that a pharmacist-initiated educational intervention can improve compliance with an ambulatory oncology infusion workflow.
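
    For readers who want to reproduce this kind of before/after comparison, a sketch of a chi-square test on pre- vs post-intervention compliance follows; the encounter counts are hypothetical placeholders, and only the 38% and 83% rates come from the abstract.

```python
# Compare pre- vs post-intervention composite compliance rates with a
# chi-square test. Encounter counts are hypothetical placeholders; only the
# 38% and 83% rates come from the abstract.
from scipy.stats import chi2_contingency

n_pre, n_post = 200, 200                      # hypothetical sample sizes
pre_compliant  = round(0.38 * n_pre)          # 38% compliance (abstract)
post_compliant = round(0.83 * n_post)         # 83% compliance (abstract)

table = [[pre_compliant,  n_pre  - pre_compliant],
         [post_compliant, n_post - post_compliant]]

chi2, p, dof, _ = chi2_contingency(table)
print(f"chi2={chi2:.1f}, p={p:.2g}")          # p << 0.001 for these counts
```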

  15. A standard-enabled workflow for synthetic biology.

    PubMed

    Myers, Chris J; Beal, Jacob; Gorochowski, Thomas E; Kuwahara, Hiroyuki; Madsen, Curtis; McLaughlin, James Alastair; Mısırlı, Göksel; Nguyen, Tramy; Oberortner, Ernst; Samineni, Meher; Wipat, Anil; Zhang, Michael; Zundel, Zach

    2017-06-15

    A synthetic biology workflow is composed of data repositories that provide information about genetic parts, sequence-level design tools to compose these parts into circuits, visualization tools to depict these designs, genetic design tools to select parts to create systems, and modeling and simulation tools to evaluate alternative design choices. Data standards enable the ready exchange of information within such a workflow, allowing repositories and tools from a diversity of sources to be connected. The present paper describes one such workflow that utilizes, among others, the Synthetic Biology Open Language (SBOL) to describe genetic designs, the Systems Biology Markup Language to model these designs, and SBOL Visual to visualize these designs. We describe how such a standard-enabled workflow can be used to produce different types of design information, with multiple repositories and software tools exchanging information using a variety of data standards. Recently, the ACS Synthetic Biology journal has recommended the use of SBOL in its publications. © 2017 The Author(s); published by Portland Press Limited on behalf of the Biochemical Society.

  16. Tigres Workflow Library: Supporting Scientific Pipelines on HPC Systems

    DOE PAGES

    Hendrix, Valerie; Fox, James; Ghoshal, Devarshi; ...

    2016-07-21

    The growth in scientific data volumes has resulted in the need for new tools that enable users to operate on and analyze data on large-scale resources. In the last decade, a number of scientific workflow tools have emerged. These tools often target distributed environments, and often need expert help to compose and execute the workflows. Data-intensive workflows are often ad hoc; they involve an iterative development process that includes users composing and testing their workflows on desktops, and scaling up to larger systems. In this paper, we present the design and implementation of Tigres, a workflow library that supports the iterative workflow development cycle of data-intensive workflows. Tigres provides an application programming interface to a set of programming templates (i.e., sequence, parallel, split, merge) that can be used to compose and execute computational and data pipelines. We discuss the results of our evaluation of scientific and synthetic workflows, showing that Tigres performs with minimal template overheads (a mean of 13 seconds over all experiments). We also discuss various factors (e.g., I/O performance, execution mechanisms) that affect the performance of scientific workflows on HPC systems.
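
    Tigres is described as providing templates (sequence, parallel, split, merge) for composing pipelines. The sketch below only mimics that idea with plain Python and the standard library; it is not the Tigres API, and the task functions are made up.

```python
# Plain-Python stand-ins for sequence/parallel-style templates.
# This mimics the idea of composable templates; it is NOT the Tigres API.
from concurrent.futures import ProcessPoolExecutor

def sequence(data, *tasks):
    """Run tasks one after another, feeding each output to the next."""
    for task in tasks:
        data = task(data)
    return data

def parallel(items, task, workers=4):
    """Apply one task to many inputs concurrently (a split/merge pattern)."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(task, items))

def clean(text):
    return text.strip().lower()

def tokenize(text):
    return text.split()

if __name__ == "__main__":
    lines = ["  Fast Fourier Transform ", "  HPC Workflows  "]
    cleaned = parallel(lines, clean)                 # split/merge over inputs
    print(sequence(" ".join(cleaned), tokenize))     # sequential pipeline
```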

  17. Tigres Workflow Library: Supporting Scientific Pipelines on HPC Systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hendrix, Valerie; Fox, James; Ghoshal, Devarshi

    The growth in scientific data volumes has resulted in the need for new tools that enable users to operate on and analyze data on large-scale resources. In the last decade, a number of scientific workflow tools have emerged. These tools often target distributed environments, and often need expert help to compose and execute the workflows. Data-intensive workflows are often ad hoc; they involve an iterative development process that includes users composing and testing their workflows on desktops, and scaling up to larger systems. In this paper, we present the design and implementation of Tigres, a workflow library that supports the iterative workflow development cycle of data-intensive workflows. Tigres provides an application programming interface to a set of programming templates (i.e., sequence, parallel, split, merge) that can be used to compose and execute computational and data pipelines. We discuss the results of our evaluation of scientific and synthetic workflows, showing that Tigres performs with minimal template overheads (a mean of 13 seconds over all experiments). We also discuss various factors (e.g., I/O performance, execution mechanisms) that affect the performance of scientific workflows on HPC systems.

  18. Business intelligence tools for radiology: creating a prototype model using open-source tools.

    PubMed

    Prevedello, Luciano M; Andriole, Katherine P; Hanson, Richard; Kelly, Pauline; Khorasani, Ramin

    2010-04-01

    Digital radiology departments could benefit from the ability to integrate and visualize data (e.g., information reflecting complex workflow states) from all of their imaging and information management systems in one composite presentation view. Leveraging data warehousing tools developed in the business world may be one way to achieve this capability. Taken as a whole, the concept of managing the information available in such a data repository is known as Business Intelligence (BI). This paper describes the concepts used in Business Intelligence, their importance to modern radiology, and the steps used in the creation of a prototype model of a data warehouse for BI using open-source tools.

  19. Integrated workflows for spiking neuronal network simulations

    PubMed Central

    Antolík, Ján; Davison, Andrew P.

    2013-01-01

    The increasing availability of computational resources is enabling more detailed, realistic modeling in computational neuroscience, resulting in a shift toward more heterogeneous models of neuronal circuits, and employment of complex experimental protocols. This poses a challenge for existing tool chains, as the set of tools involved in a typical modeler's workflow is expanding concomitantly, with growing complexity in the metadata flowing between them. For many parts of the workflow, a range of tools is available; however, numerous areas lack dedicated tools, while integration of existing tools is limited. This forces modelers to either handle the workflow manually, leading to errors, or to write substantial amounts of code to automate parts of the workflow, in both cases reducing their productivity. To address these issues, we have developed Mozaik: a workflow system for spiking neuronal network simulations written in Python. Mozaik integrates model, experiment and stimulation specification, simulation execution, data storage, data analysis and visualization into a single automated workflow, ensuring that all relevant metadata are available to all workflow components. It is based on several existing tools, including PyNN, Neo, and Matplotlib. It offers a declarative way to specify models and recording configurations using hierarchically organized configuration files. Mozaik automatically records all data together with all relevant metadata about the experimental context, allowing automation of the analysis and visualization stages. Mozaik has a modular architecture, and the existing modules are designed to be extensible with minimal programming effort. Mozaik increases the productivity of running virtual experiments on highly structured neuronal networks by automating the entire experimental cycle, while increasing the reliability of modeling studies by relieving the user from manual handling of the flow of metadata between the individual workflow stages. PMID:24368902
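
    Mozaik is described as using hierarchically organized configuration files. As a rough illustration of that layering idea (not Mozaik's actual parameter system), a base configuration can be recursively overlaid with experiment-specific overrides:

```python
# Merge a base configuration with experiment-specific overrides, the way
# hierarchically organized config files are typically layered. This is a
# generic sketch, not Mozaik's actual parameter system.
import copy

def deep_merge(base: dict, override: dict) -> dict:
    """Recursively overlay 'override' onto 'base' without mutating either."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

base = {"model": {"n_neurons": 1000, "connectivity": 0.1},
        "recording": {"variables": ["spikes"]}}
experiment = {"model": {"connectivity": 0.2},
              "recording": {"variables": ["spikes", "v_m"]}}

print(deep_merge(base, experiment))
```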

  20. Integrated workflows for spiking neuronal network simulations.

    PubMed

    Antolík, Ján; Davison, Andrew P

    2013-01-01

    The increasing availability of computational resources is enabling more detailed, realistic modeling in computational neuroscience, resulting in a shift toward more heterogeneous models of neuronal circuits, and employment of complex experimental protocols. This poses a challenge for existing tool chains, as the set of tools involved in a typical modeler's workflow is expanding concomitantly, with growing complexity in the metadata flowing between them. For many parts of the workflow, a range of tools is available; however, numerous areas lack dedicated tools, while integration of existing tools is limited. This forces modelers to either handle the workflow manually, leading to errors, or to write substantial amounts of code to automate parts of the workflow, in both cases reducing their productivity. To address these issues, we have developed Mozaik: a workflow system for spiking neuronal network simulations written in Python. Mozaik integrates model, experiment and stimulation specification, simulation execution, data storage, data analysis and visualization into a single automated workflow, ensuring that all relevant metadata are available to all workflow components. It is based on several existing tools, including PyNN, Neo, and Matplotlib. It offers a declarative way to specify models and recording configurations using hierarchically organized configuration files. Mozaik automatically records all data together with all relevant metadata about the experimental context, allowing automation of the analysis and visualization stages. Mozaik has a modular architecture, and the existing modules are designed to be extensible with minimal programming effort. Mozaik increases the productivity of running virtual experiments on highly structured neuronal networks by automating the entire experimental cycle, while increasing the reliability of modeling studies by relieving the user from manual handling of the flow of metadata between the individual workflow stages.

  1. Scientist-Centered Workflow Abstractions via Generic Actors, Workflow Templates, and Context-Awareness for Groundwater Modeling and Analysis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chin, George; Sivaramakrishnan, Chandrika; Critchlow, Terence J.

    2011-07-04

    A drawback of existing scientific workflow systems is the lack of support to domain scientists in designing and executing their own scientific workflows. Many domain scientists avoid developing and using workflows because the basic objects of workflows are too low-level and high-level tools and mechanisms to aid in workflow construction and use are largely unavailable. In our research, we are prototyping higher-level abstractions and tools to better support scientists in their workflow activities. Specifically, we are developing generic actors that provide abstract interfaces to specific functionality, workflow templates that encapsulate workflow and data patterns that can be reused and adapted by scientists, and context-awareness mechanisms to gather contextual information from the workflow environment on behalf of the scientist. To evaluate these scientist-centered abstractions on real problems, we apply them to construct and execute scientific workflows in the specific domain area of groundwater modeling and analysis.

  2. Parallel workflow tools to facilitate human brain MRI post-processing

    PubMed Central

    Cui, Zaixu; Zhao, Chenxi; Gong, Gaolang

    2015-01-01

    Multi-modal magnetic resonance imaging (MRI) techniques are widely applied in human brain studies. To obtain specific brain measures of interest from MRI datasets, a number of complex image post-processing steps are typically required. Parallel workflow tools have recently been developed, concatenating individual processing steps and enabling fully automated processing of raw MRI data to obtain the final results. These workflow tools are also designed to make optimal use of available computational resources and to support the parallel processing of different subjects or of independent processing steps for a single subject. Automated, parallel MRI post-processing tools can greatly facilitate relevant brain investigations and are being increasingly applied. In this review, we briefly summarize these parallel workflow tools and discuss relevant issues. PMID:26029043
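
    Such tools parallelize independent per-subject processing steps. A minimal sketch of that pattern with the Python standard library follows; the processing function and subject IDs are placeholders, not part of any specific MRI toolkit.

```python
# Run an MRI post-processing step for many subjects in parallel.
# 'preprocess' is a stand-in for a real pipeline stage; subject IDs are made up.
from multiprocessing import Pool

def preprocess(subject_id: str) -> str:
    """Placeholder for a per-subject post-processing step (e.g. skull stripping)."""
    return f"{subject_id}: done"

subjects = [f"sub-{i:03d}" for i in range(1, 9)]

if __name__ == "__main__":
    with Pool(processes=4) as pool:
        for result in pool.map(preprocess, subjects):
            print(result)
```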

  3. A practical workflow for making anatomical atlases for biological research.

    PubMed

    Wan, Yong; Lewis, A Kelsey; Colasanto, Mary; van Langeveld, Mark; Kardon, Gabrielle; Hansen, Charles

    2012-01-01

    The anatomical atlas has been at the intersection of science and art for centuries. These atlases are essential to biological research, but high-quality atlases are often scarce. Recent advances in imaging technology have made high-quality 3D atlases possible. However, until now there has been a lack of practical workflows using standard tools to generate atlases from images of biological samples. With certain adaptations, CG artists' workflow and tools, traditionally used in the film industry, are practical for building high-quality biological atlases. Researchers have developed a workflow for generating a 3D anatomical atlas using accessible artists' tools. They used this workflow to build a mouse limb atlas for studying the musculoskeletal system's development. This research aims to raise the awareness of using artists' tools in scientific research and promote interdisciplinary collaborations between artists and scientists. This video (http://youtu.be/g61C-nia9ms) demonstrates a workflow for creating an anatomical atlas.

  4. Managing and Communicating Operational Workflow: Designing and Implementing an Electronic Outpatient Whiteboard.

    PubMed

    Steitz, Bryan D; Weinberg, Stuart T; Danciu, Ioana; Unertl, Kim M

    2016-01-01

    Healthcare team members in emergency department contexts have used electronic whiteboard solutions to help manage operational workflow for many years. Ambulatory clinic settings have highly complex operational workflow, but are still limited in electronic assistance to communicate and coordinate work activities. This paper describes and discusses the design, implementation, use, and ongoing evolution of a coordination and collaboration tool supporting ambulatory clinic operational workflow at Vanderbilt University Medical Center (VUMC). The outpatient whiteboard tool was initially designed to support healthcare work related to an electronic chemotherapy order-entry application. After a highly successful initial implementation in an oncology context, a high demand emerged across the organization for the outpatient whiteboard implementation. Over the past 10 years, developers have followed an iterative user-centered design process to evolve the tool. The electronic outpatient whiteboard system supports 194 separate whiteboards and is accessed by over 2800 distinct users on a typical day. Clinics can configure their whiteboards to support unique workflow elements. Since initial release, features such as immunization clinical decision support have been integrated into the system, based on requests from end users. The success of the electronic outpatient whiteboard demonstrates the usefulness of an operational workflow tool within the ambulatory clinic setting. Operational workflow tools can play a significant role in supporting coordination, collaboration, and teamwork in ambulatory healthcare settings.

  5. Flexible Workflow Software enables the Management of an Increased Volume and Heterogeneity of Sensors, and evolves with the Expansion of Complex Ocean Observatory Infrastructures.

    NASA Astrophysics Data System (ADS)

    Tomlin, M. C.; Jenkyns, R.

    2015-12-01

    Ocean Networks Canada (ONC) collects data from observatories in the northeast Pacific, Salish Sea, Arctic Ocean, Atlantic Ocean, and land-based sites in British Columbia. Data are streamed, collected autonomously, or transmitted via satellite from a variety of instruments. The Software Engineering group at ONC develops and maintains Oceans 2.0, an in-house software system that acquires and archives data from sensors, and makes data available to scientists, the public, government and non-government agencies. The Oceans 2.0 workflow tool was developed by ONC to manage a large volume of tasks and processes required for instrument installation, recovery and maintenance activities. Since 2013, the workflow tool has supported 70 expeditions and grown to include 30 different workflow processes for the increasing complexity of infrastructures at ONC. The workflow tool strives to keep pace with an increasing heterogeneity of sensors, connections and environments by supporting versioning of existing workflows, and allowing the creation of new processes and tasks. Despite challenges in training and gaining mutual support from multidisciplinary teams, the workflow tool has become invaluable in project management in an innovative setting. It provides a collective place to contribute to ONC's diverse projects and expeditions and encourages more repeatable processes, while promoting interactions between the multidisciplinary teams who manage various aspects of instrument development and the data they produce. The workflow tool inspires documentation of terminologies and procedures, and effectively links to other tools at ONC such as JIRA, Alfresco and Wiki. Motivated by growing sensor schemes, modes of collecting data, archiving, and data distribution at ONC, the workflow tool ensures that infrastructure is managed completely from instrument purchase to data distribution. It integrates all areas of expertise and helps fulfill ONC's mandate to offer quality data to users.

  6. Cloud-based bioinformatics workflow platform for large-scale next-generation sequencing analyses

    PubMed Central

    Liu, Bo; Madduri, Ravi K; Sotomayor, Borja; Chard, Kyle; Lacinski, Lukasz; Dave, Utpal J; Li, Jianqiang; Liu, Chunchen; Foster, Ian T

    2014-01-01

    With the upcoming deluge of genome data, the need for storing and processing large-scale genome data, easy access to biomedical analysis tools, and efficient data sharing and retrieval presents significant challenges. The variability in data volume results in variable computing and storage requirements; therefore, biomedical researchers are pursuing more reliable, dynamic and convenient methods for conducting sequencing analyses. This paper proposes a Cloud-based bioinformatics workflow platform for large-scale next-generation sequencing analyses, which enables reliable and highly scalable execution of sequencing analysis workflows in a fully automated manner. Our platform extends the existing Galaxy workflow system by adding data management capabilities for transferring large quantities of data efficiently and reliably (via Globus Transfer), domain-specific analysis tools preconfigured for immediate use by researchers (via user-specific tool integration), automatic deployment on the Cloud for on-demand resource allocation and pay-as-you-go pricing (via Globus Provision), a Cloud provisioning tool for auto-scaling (via the HTCondor scheduler), and support for validating the correctness of workflows (via semantic verification tools). Two bioinformatics workflow use cases as well as a performance evaluation are presented to validate the feasibility of the proposed approach. PMID:24462600

  7. Cloud-based bioinformatics workflow platform for large-scale next-generation sequencing analyses.

    PubMed

    Liu, Bo; Madduri, Ravi K; Sotomayor, Borja; Chard, Kyle; Lacinski, Lukasz; Dave, Utpal J; Li, Jianqiang; Liu, Chunchen; Foster, Ian T

    2014-06-01

    With the upcoming deluge of genome data, the need for storing and processing large-scale genome data, easy access to biomedical analysis tools, and efficient data sharing and retrieval presents significant challenges. The variability in data volume results in variable computing and storage requirements; therefore, biomedical researchers are pursuing more reliable, dynamic and convenient methods for conducting sequencing analyses. This paper proposes a Cloud-based bioinformatics workflow platform for large-scale next-generation sequencing analyses, which enables reliable and highly scalable execution of sequencing analysis workflows in a fully automated manner. Our platform extends the existing Galaxy workflow system by adding data management capabilities for transferring large quantities of data efficiently and reliably (via Globus Transfer), domain-specific analysis tools preconfigured for immediate use by researchers (via user-specific tool integration), automatic deployment on the Cloud for on-demand resource allocation and pay-as-you-go pricing (via Globus Provision), a Cloud provisioning tool for auto-scaling (via the HTCondor scheduler), and support for validating the correctness of workflows (via semantic verification tools). Two bioinformatics workflow use cases as well as a performance evaluation are presented to validate the feasibility of the proposed approach. Copyright © 2014 Elsevier Inc. All rights reserved.

  8. Managing and Communicating Operational Workflow

    PubMed Central

    Weinberg, Stuart T.; Danciu, Ioana; Unertl, Kim M.

    2016-01-01

    Summary Background Healthcare team members in emergency department contexts have used electronic whiteboard solutions to help manage operational workflow for many years. Ambulatory clinic settings have highly complex operational workflow, but are still limited in electronic assistance to communicate and coordinate work activities. Objective To describe and discuss the design, implementation, use, and ongoing evolution of a coordination and collaboration tool supporting ambulatory clinic operational workflow at Vanderbilt University Medical Center (VUMC). Methods The outpatient whiteboard tool was initially designed to support healthcare work related to an electronic chemotherapy order-entry application. After a highly successful initial implementation in an oncology context, a high demand emerged across the organization for the outpatient whiteboard implementation. Over the past 10 years, developers have followed an iterative user-centered design process to evolve the tool. Results The electronic outpatient whiteboard system supports 194 separate whiteboards and is accessed by over 2800 distinct users on a typical day. Clinics can configure their whiteboards to support unique workflow elements. Since initial release, features such as immunization clinical decision support have been integrated into the system, based on requests from end users. Conclusions The success of the electronic outpatient whiteboard demonstrates the usefulness of an operational workflow tool within the ambulatory clinic setting. Operational workflow tools can play a significant role in supporting coordination, collaboration, and teamwork in ambulatory healthcare settings. PMID:27081407

  9. Galaxy tools and workflows for sequence analysis with applications in molecular plant pathology.

    PubMed

    Cock, Peter J A; Grüning, Björn A; Paszkiewicz, Konrad; Pritchard, Leighton

    2013-01-01

    The Galaxy Project offers the popular web browser-based platform Galaxy for running bioinformatics tools and constructing simple workflows. Here, we present a broad collection of additional Galaxy tools for large scale analysis of gene and protein sequences. The motivating research theme is the identification of specific genes of interest in a range of non-model organisms, and our central example is the identification and prediction of "effector" proteins produced by plant pathogens in order to manipulate their host plant. This functional annotation of a pathogen's predicted capacity for virulence is a key step in translating sequence data into potential applications in plant pathology. This collection includes novel tools, and widely-used third-party tools such as NCBI BLAST+ wrapped for use within Galaxy. Individual bioinformatics software tools are typically available separately as standalone packages, or in online browser-based form. The Galaxy framework enables the user to combine these and other tools to automate organism scale analyses as workflows, without demanding familiarity with command line tools and scripting. Workflows created using Galaxy can be saved and are reusable, so may be distributed within and between research groups, facilitating the construction of a set of standardised, reusable bioinformatic protocols. The Galaxy tools and workflows described in this manuscript are open source and freely available from the Galaxy Tool Shed (http://usegalaxy.org/toolshed or http://toolshed.g2.bx.psu.edu).
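
    The Galaxy wrappers described here invoke the NCBI BLAST+ command-line tools; the snippet below runs one such search directly with subprocess, outside Galaxy. The file and database names are placeholders, and BLAST+ must be installed separately and available on the PATH.

```python
# Run a single NCBI BLAST+ search (blastn) outside Galaxy, roughly the step a
# Galaxy BLAST+ wrapper automates. File names are placeholders and BLAST+ must
# be installed and on PATH.
import subprocess

cmd = [
    "blastn",
    "-query", "candidate_effectors.fasta",   # placeholder query file
    "-db", "host_genome_db",                 # placeholder BLAST database
    "-evalue", "1e-5",
    "-outfmt", "6",                          # tabular output
    "-out", "hits.tsv",
]
subprocess.run(cmd, check=True)
print("BLAST search written to hits.tsv")
```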

  10. Extension of specification language for soundness and completeness of service workflow

    NASA Astrophysics Data System (ADS)

    Viriyasitavat, Wattana; Xu, Li Da; Bi, Zhuming; Sapsomboon, Assadaporn

    2018-05-01

    A service workflow is an aggregation of distributed services that fulfills specific functionalities. With an ever-increasing number of available services, methodologies for selecting services against given requirements have become a main research subject in multiple disciplines. A number of researchers have contributed formal specification languages and methods for model checking; however, existing methods have difficulty tackling the complexity of workflow composition. In this paper, we propose to formalize the specification language to reduce the complexity of workflow composition. To this end, we extend a specification language with formal logic, so that effective theorems can be derived for the verification of syntax, semantics, and inference rules in workflow composition. The logic-based approach automates compliance checking effectively. The Service Workflow Specification (SWSpec) has been extended and formulated, and the soundness, completeness, and consistency of SWSpec applications have been verified; note that a logic-based SWSpec is mandatory for the development of model checking. The application of the proposed SWSpec is demonstrated by examples addressing soundness, completeness, and consistency.

  11. Galaxy tools and workflows for sequence analysis with applications in molecular plant pathology

    PubMed Central

    Grüning, Björn A.; Paszkiewicz, Konrad; Pritchard, Leighton

    2013-01-01

    The Galaxy Project offers the popular web browser-based platform Galaxy for running bioinformatics tools and constructing simple workflows. Here, we present a broad collection of additional Galaxy tools for large scale analysis of gene and protein sequences. The motivating research theme is the identification of specific genes of interest in a range of non-model organisms, and our central example is the identification and prediction of “effector” proteins produced by plant pathogens in order to manipulate their host plant. This functional annotation of a pathogen’s predicted capacity for virulence is a key step in translating sequence data into potential applications in plant pathology. This collection includes novel tools, and widely-used third-party tools such as NCBI BLAST+ wrapped for use within Galaxy. Individual bioinformatics software tools are typically available separately as standalone packages, or in online browser-based form. The Galaxy framework enables the user to combine these and other tools to automate organism scale analyses as workflows, without demanding familiarity with command line tools and scripting. Workflows created using Galaxy can be saved and are reusable, so may be distributed within and between research groups, facilitating the construction of a set of standardised, reusable bioinformatic protocols. The Galaxy tools and workflows described in this manuscript are open source and freely available from the Galaxy Tool Shed (http://usegalaxy.org/toolshed or http://toolshed.g2.bx.psu.edu). PMID:24109552

  12. Biowep: a workflow enactment portal for bioinformatics applications.

    PubMed

    Romano, Paolo; Bartocci, Ezio; Bertolini, Guglielmo; De Paoli, Flavio; Marra, Domenico; Mauri, Giancarlo; Merelli, Emanuela; Milanesi, Luciano

    2007-03-08

    The huge amount of biological information, its distribution over the Internet and the heterogeneity of available software tools make the adoption of new data integration and analysis network tools a necessity in bioinformatics. ICT standards and tools, like Web Services and Workflow Management Systems (WMS), can support the creation and deployment of such systems. Many Web Services are already available and some WMS have been proposed. They assume that researchers know which bioinformatics resources can be reached through a programmatic interface and that they are skilled in programming and building workflows. Therefore, they are not viable for the majority of unskilled researchers. A portal enabling these researchers to profit from new technologies is still missing. We designed biowep, a web-based client application that allows for the selection and execution of a set of predefined workflows. The system is available on-line. Biowep's architecture includes a Workflow Manager, a User Interface and a Workflow Executor. The task of the Workflow Manager is the creation and annotation of workflows. These can be created by using either the Taverna Workbench or BioWMS. Enactment of workflows is carried out by FreeFluo for Taverna workflows and by BioAgent/Hermes, a mobile agent-based middleware, for BioWMS ones. The main workflow processing steps are annotated on the basis of their input and output, elaboration type and application domain, using a classification of bioinformatics data and tasks. The interface supports user authentication and profiling. Workflows can be selected on the basis of users' profiles and can be searched through their annotations. Results can be saved. We developed a web system that supports the selection and execution of predefined workflows, thus simplifying access for all researchers. The implementation of Web Services allowing specialized software to interact with an exhaustive set of biomedical databases and analysis software, together with the creation of effective workflows, can significantly improve the automation of in-silico analysis. Biowep is available to interested researchers as a reference portal. They are invited to submit their workflows to the workflow repository. Biowep is being further developed in the sphere of the Laboratory of Interdisciplinary Technologies in Bioinformatics - LITBIO.

  13. Biowep: a workflow enactment portal for bioinformatics applications

    PubMed Central

    Romano, Paolo; Bartocci, Ezio; Bertolini, Guglielmo; De Paoli, Flavio; Marra, Domenico; Mauri, Giancarlo; Merelli, Emanuela; Milanesi, Luciano

    2007-01-01

    Background The huge amount of biological information, its distribution over the Internet and the heterogeneity of available software tools make the adoption of new data integration and analysis network tools a necessity in bioinformatics. ICT standards and tools, like Web Services and Workflow Management Systems (WMS), can support the creation and deployment of such systems. Many Web Services are already available and some WMS have been proposed. They assume that researchers know which bioinformatics resources can be reached through a programmatic interface and that they are skilled in programming and building workflows. Therefore, they are not viable for the majority of unskilled researchers. A portal enabling these researchers to profit from new technologies is still missing. Results We designed biowep, a web-based client application that allows for the selection and execution of a set of predefined workflows. The system is available on-line. Biowep's architecture includes a Workflow Manager, a User Interface and a Workflow Executor. The task of the Workflow Manager is the creation and annotation of workflows. These can be created by using either the Taverna Workbench or BioWMS. Enactment of workflows is carried out by FreeFluo for Taverna workflows and by BioAgent/Hermes, a mobile agent-based middleware, for BioWMS ones. The main workflow processing steps are annotated on the basis of their input and output, elaboration type and application domain, using a classification of bioinformatics data and tasks. The interface supports user authentication and profiling. Workflows can be selected on the basis of users' profiles and can be searched through their annotations. Results can be saved. Conclusion We developed a web system that supports the selection and execution of predefined workflows, thus simplifying access for all researchers. The implementation of Web Services allowing specialized software to interact with an exhaustive set of biomedical databases and analysis software, together with the creation of effective workflows, can significantly improve the automation of in-silico analysis. Biowep is available to interested researchers as a reference portal. They are invited to submit their workflows to the workflow repository. Biowep is being further developed in the sphere of the Laboratory of Interdisciplinary Technologies in Bioinformatics – LITBIO. PMID:17430563

  14. Understanding latent structures of clinical information logistics: A bottom-up approach for model building and validating the workflow composite score.

    PubMed

    Esdar, Moritz; Hübner, Ursula; Liebe, Jan-David; Hüsers, Jens; Thye, Johannes

    2017-01-01

    Clinical information logistics is a construct that aims to describe and explain various phenomena of information provision to drive clinical processes. It can be measured by the workflow composite score, an aggregated indicator of the degree of IT support in clinical processes. This study primarily aimed to investigate the yet unknown empirical patterns constituting this construct. The second goal was to derive a data-driven weighting scheme for the constituents of the workflow composite score and to contrast this scheme with a literature based, top-down procedure. This approach should finally test the validity and robustness of the workflow composite score. Based on secondary data from 183 German hospitals, a tiered factor analytic approach (confirmatory and subsequent exploratory factor analysis) was pursued. A weighting scheme, which was based on factor loadings obtained in the analyses, was put into practice. We were able to identify five statistically significant factors of clinical information logistics that accounted for 63% of the overall variance. These factors were "flow of data and information", "mobility", "clinical decision support and patient safety", "electronic patient record" and "integration and distribution". The system of weights derived from the factor loadings resulted in values for the workflow composite score that differed only slightly from the score values that had been previously published based on a top-down approach. Our findings give insight into the internal composition of clinical information logistics both in terms of factors and weights. They also allowed us to propose a coherent model of clinical information logistics from a technical perspective that joins empirical findings with theoretical knowledge. Despite the new scheme of weights applied to the calculation of the workflow composite score, the score behaved robustly, which is yet another hint of its validity and therefore its usefulness. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
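
    As an illustration of how a composite score can be weighted by factor loadings, the following sketch normalizes loadings into weights and aggregates item values; the item names and numbers are invented for the example and are not the study's data.

```python
# Compute a weighted composite score from per-item values and factor loadings,
# normalising the loadings so the weights sum to one. Item names and numbers
# are illustrative, not the study's data.
items = {                      # item: (observed value 0..1, factor loading)
    "data_flow":         (0.8, 0.72),
    "mobility":          (0.4, 0.55),
    "decision_support":  (0.6, 0.64),
    "electronic_record": (0.9, 0.70),
    "integration":       (0.5, 0.61),
}

total_loading = sum(loading for _, loading in items.values())
score = sum(value * (loading / total_loading) for value, loading in items.values())
print(f"workflow composite score = {score:.3f}")
```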

  15. Web service module for access to g-Lite

    NASA Astrophysics Data System (ADS)

    Goranova, R.; Goranov, G.

    2012-10-01

    G-Lite is a lightweight grid middleware for grid computing installed on all clusters of the European Grid Infrastructure (EGI). The middleware is partially service-oriented and does not provide well-defined Web services for job management. The existing Web services in the environment cannot be directly used by grid users for building service compositions in the EGI. In this article we present a module of well-defined Web services for job management in the EGI. We describe the architecture of the module and the design of the developed Web services. The presented Web services are composable and can participate in service compositions (workflows). An example of usage of the module with tools for service compositions in g-Lite is shown.

  16. Performing statistical analyses on quantitative data in Taverna workflows: an example using R and maxdBrowse to identify differentially-expressed genes from microarray data.

    PubMed

    Li, Peter; Castrillo, Juan I; Velarde, Giles; Wassink, Ingo; Soiland-Reyes, Stian; Owen, Stuart; Withers, David; Oinn, Tom; Pocock, Matthew R; Goble, Carole A; Oliver, Stephen G; Kell, Douglas B

    2008-08-07

    There has been a dramatic increase in the amount of quantitative data derived from the measurement of changes at different levels of biological complexity during the post-genomic era. However, there are a number of issues associated with the use of computational tools employed for the analysis of such data. For example, computational tools such as R and MATLAB require prior knowledge of their programming languages in order to implement statistical analyses on data. Combining two or more tools in an analysis may also be problematic since data may have to be manually copied and pasted between separate user interfaces for each tool. Furthermore, this transfer of data may require a reconciliation step in order for there to be interoperability between computational tools. Developments in the Taverna workflow system have enabled pipelines to be constructed and enacted for generic and ad hoc analyses of quantitative data. Here, we present an example of such a workflow involving the statistical identification of differentially-expressed genes from microarray data followed by the annotation of their relationships to cellular processes. This workflow makes use of customised maxdBrowse web services, a system that allows Taverna to query and retrieve gene expression data from the maxdLoad2 microarray database. These data are then analysed by R to identify differentially-expressed genes using the Taverna RShell processor which has been developed for invoking this tool when it has been deployed as a service using the RServe library. In addition, the workflow uses Beanshell scripts to reconcile mismatches of data between services as well as to implement a form of user interaction for selecting subsets of microarray data for analysis as part of the workflow execution. A new plugin system in the Taverna software architecture is demonstrated by the use of renderers for displaying PDF files and CSV formatted data within the Taverna workbench. Taverna can be used by data analysis experts as a generic tool for composing ad hoc analyses of quantitative data by combining the use of scripts written in the R programming language with tools exposed as services in workflows. When these workflows are shared with colleagues and the wider scientific community, they provide an approach for other scientists wanting to use tools such as R without having to learn the corresponding programming language to analyse their own data.
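    The statistical step in this workflow is implemented in R and invoked through Taverna's RShell processor; the sketch below is not that R code but a minimal, self-contained Python illustration (assuming NumPy and SciPy are available) of the same kind of per-gene test, a Welch t-test followed by Benjamini-Hochberg correction, applied to made-up expression matrices.

```python
# Illustrative sketch only: the paper performs this step in R via Taverna's RShell
# processor; here the same kind of per-gene test is shown with NumPy/SciPy.
# The expression matrices and the 0.05 FDR threshold are assumptions for the example.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
control = rng.normal(8.0, 1.0, size=(500, 4))   # 500 genes x 4 control arrays
treated = rng.normal(8.0, 1.0, size=(500, 4))   # 500 genes x 4 treated arrays
treated[:25] += 2.0                             # spike in 25 "differentially expressed" genes

# Per-gene Welch t-test across the two conditions.
_, pvals = stats.ttest_ind(control, treated, axis=1, equal_var=False)

# Benjamini-Hochberg FDR correction.
order = np.argsort(pvals)
ranked = pvals[order] * len(pvals) / (np.arange(len(pvals)) + 1)
qvals = np.minimum.accumulate(ranked[::-1])[::-1]
significant = np.zeros(len(pvals), dtype=bool)
significant[order] = qvals < 0.05

print(f"{significant.sum()} genes flagged as differentially expressed")
```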

  17. Performing statistical analyses on quantitative data in Taverna workflows: An example using R and maxdBrowse to identify differentially-expressed genes from microarray data

    PubMed Central

    Li, Peter; Castrillo, Juan I; Velarde, Giles; Wassink, Ingo; Soiland-Reyes, Stian; Owen, Stuart; Withers, David; Oinn, Tom; Pocock, Matthew R; Goble, Carole A; Oliver, Stephen G; Kell, Douglas B

    2008-01-01

    Background There has been a dramatic increase in the amount of quantitative data derived from the measurement of changes at different levels of biological complexity during the post-genomic era. However, there are a number of issues associated with the use of computational tools employed for the analysis of such data. For example, computational tools such as R and MATLAB require prior knowledge of their programming languages in order to implement statistical analyses on data. Combining two or more tools in an analysis may also be problematic since data may have to be manually copied and pasted between separate user interfaces for each tool. Furthermore, this transfer of data may require a reconciliation step in order for there to be interoperability between computational tools. Results Developments in the Taverna workflow system have enabled pipelines to be constructed and enacted for generic and ad hoc analyses of quantitative data. Here, we present an example of such a workflow involving the statistical identification of differentially-expressed genes from microarray data followed by the annotation of their relationships to cellular processes. This workflow makes use of customised maxdBrowse web services, a system that allows Taverna to query and retrieve gene expression data from the maxdLoad2 microarray database. These data are then analysed by R to identify differentially-expressed genes using the Taverna RShell processor which has been developed for invoking this tool when it has been deployed as a service using the RServe library. In addition, the workflow uses Beanshell scripts to reconcile mismatches of data between services as well as to implement a form of user interaction for selecting subsets of microarray data for analysis as part of the workflow execution. A new plugin system in the Taverna software architecture is demonstrated by the use of renderers for displaying PDF files and CSV formatted data within the Taverna workbench. Conclusion Taverna can be used by data analysis experts as a generic tool for composing ad hoc analyses of quantitative data by combining the use of scripts written in the R programming language with tools exposed as services in workflows. When these workflows are shared with colleagues and the wider scientific community, they provide an approach for other scientists wanting to use tools such as R without having to learn the corresponding programming language to analyse their own data. PMID:18687127

  18. Implementing bioinformatic workflows within the bioextract server

    USDA-ARS?s Scientific Manuscript database

    Computational workflows in bioinformatics are becoming increasingly important in the achievement of scientific advances. These workflows typically require the integrated use of multiple, distributed data sources and analytic tools. The BioExtract Server (http://bioextract.org) is a distributed servi...

  19. GeneSeqToFamily: a Galaxy workflow to find gene families based on the Ensembl Compara GeneTrees pipeline.

    PubMed

    Thanki, Anil S; Soranzo, Nicola; Haerty, Wilfried; Davey, Robert P

    2018-03-01

    Gene duplication is a major factor contributing to evolutionary novelty, and the contraction or expansion of gene families has often been associated with morphological, physiological, and environmental adaptations. The study of homologous genes helps us to understand the evolution of gene families. It plays a vital role in finding ancestral gene duplication events as well as identifying genes that have diverged from a common ancestor under positive selection. There are various tools available, such as MSOAR, OrthoMCL, and HomoloGene, to identify gene families and visualize syntenic information between species, providing an overview of the evolution of syntenic regions at the family level. Unfortunately, none of them provide information about structural changes within genes, such as the conservation of ancestral exon boundaries among multiple genomes. The Ensembl GeneTrees computational pipeline generates gene trees based on coding sequences, provides details about exon conservation, and is used in the Ensembl Compara project to discover gene families. A certain amount of expertise is required to configure and run the Ensembl Compara GeneTrees pipeline via the command line. Therefore, we converted this pipeline into a Galaxy workflow, called GeneSeqToFamily, and provided additional functionality. This workflow uses existing tools from the Galaxy ToolShed, as well as additional wrappers and tools that are required to run the workflow. GeneSeqToFamily represents the Ensembl GeneTrees pipeline as a set of interconnected Galaxy tools, so they can be run interactively within Galaxy's user-friendly workflow environment while still providing the flexibility to tailor the analysis by changing configurations and tools if necessary. Additional tools allow users to subsequently visualize the gene families produced by the workflow, using the Aequatus.js interactive tool, which has been developed as part of the Aequatus software project.

  20. chemalot and chemalot_knime: Command line programs as workflow tools for drug discovery.

    PubMed

    Lee, Man-Ling; Aliagas, Ignacio; Feng, Jianwen A; Gabriel, Thomas; O'Donnell, T J; Sellers, Benjamin D; Wiswedel, Bernd; Gobbi, Alberto

    2017-06-12

    Analyzing files containing chemical information is at the core of cheminformatics. Each analysis may require a unique workflow. This paper describes the chemalot and chemalot_knime open source packages. Chemalot is a set of command line programs with a wide range of functionalities for cheminformatics. The chemalot_knime package allows command line programs that read and write SD files from stdin and to stdout to be wrapped into KNIME nodes. The combination of chemalot and chemalot_knime not only facilitates the compilation and maintenance of sequences of command line programs but also allows KNIME workflows to take advantage of the compute power of a Linux cluster. Use of the command line programs is demonstrated in three different workflow examples: (1) a workflow to create a data file with project-relevant data for structure-activity or property analysis and other types of investigation, (2) the creation of a quantitative structure-property-relationship model using the command line programs via KNIME nodes, and (3) the analysis of strain energy in small molecule ligand conformations from the Protein Data Bank. The chemalot and chemalot_knime packages provide lightweight and powerful tools for many tasks in cheminformatics. They are easily integrated with other open source and commercial command line tools and can be combined to build new and even more powerful tools. The chemalot_knime package facilitates the generation and maintenance of user-defined command line workflows, taking advantage of the graphical design capabilities in KNIME. Graphical abstract: Example KNIME workflow with chemalot nodes and the corresponding command line pipe.

  1. Integrating the Allen Brain Institute Cell Types Database into Automated Neuroscience Workflow.

    PubMed

    Stockton, David B; Santamaria, Fidel

    2017-10-01

    We developed software tools to download, extract features, and organize the Cell Types Database from the Allen Brain Institute (ABI) in order to integrate its whole cell patch clamp characterization data into the automated modeling/data analysis cycle. To expand the potential user base we employed both Python and MATLAB. The basic set of tools downloads selected raw data and extracts cell, sweep, and spike features, using ABI's feature extraction code. To facilitate data manipulation we added a tool to build a local specialized database of raw data plus extracted features. Finally, to maximize automation, we extended our NeuroManager workflow automation suite to include these tools plus a separate investigation database. The extended suite allows the user to integrate ABI experimental and modeling data into an automated workflow deployed on heterogeneous computer infrastructures, from local servers, to high performance computing environments, to the cloud. Since our approach is focused on workflow procedures our tools can be modified to interact with the increasing number of neuroscience databases being developed to cover all scales and properties of the nervous system.
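    The NeuroManager tools described here are not reproduced below; as a hedged illustration of the kind of download-and-extract step the abstract refers to, the following sketch uses the Allen Institute's publicly distributed AllenSDK Python package (assumed to be installed; the cache path is arbitrary) to fetch one cell's patch clamp recording.

```python
# Not the paper's NeuroManager tools: a minimal sketch of fetching Cell Types
# Database electrophysiology data with the Allen Institute's AllenSDK package
# (assumed installed, e.g. `pip install allensdk`; the cache path is illustrative).
from allensdk.core.cell_types_cache import CellTypesCache

ctc = CellTypesCache(manifest_file="cell_types/manifest.json")

# List available cells and pick one specimen for download.
cells = ctc.get_cells()
specimen_id = cells[0]["id"]

# Download (and locally cache) the whole-cell patch clamp recording in NWB format.
data_set = ctc.get_ephys_data(specimen_id)
sweep_numbers = data_set.get_sweep_numbers()
sweep = data_set.get_sweep(sweep_numbers[0])   # dict with stimulus/response traces
print(specimen_id, len(sweep_numbers), sweep["sampling_rate"])
```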

  2. MetaboTools: A comprehensive toolbox for analysis of genome-scale metabolic models

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Aurich, Maike K.; Fleming, Ronan M. T.; Thiele, Ines

    Metabolomic data sets provide a direct read-out of cellular phenotypes and are increasingly generated to study biological questions. Previous work, by us and others, revealed the potential of analyzing extracellular metabolomic data in the context of the metabolic model using constraint-based modeling. With the MetaboTools, we make our methods available to the broader scientific community. The MetaboTools consist of a protocol, a toolbox, and tutorials of two use cases. The protocol describes, in a step-wise manner, the workflow of data integration and computational analysis. The MetaboTools comprise the Matlab code required to complete the workflow described in the protocol. Tutorials explain the computational steps for integration of two different data sets and demonstrate a comprehensive set of methods for the computational analysis of metabolic models and stratification thereof into different phenotypes. The presented workflow supports integrative analysis of multiple omics data sets. Importantly, all analysis tools can be applied to metabolic models without performing the entire workflow. Taken together, the MetaboTools constitute a comprehensive guide to the intra-model analysis of extracellular metabolomic data from microbial, plant, or human cells. In conclusion, this computational modeling resource offers a broad set of computational analysis tools for a wide biomedical and non-biomedical research community.

  3. Taverna: a tool for building and running workflows of services

    PubMed Central

    Hull, Duncan; Wolstencroft, Katy; Stevens, Robert; Goble, Carole; Pocock, Mathew R.; Li, Peter; Oinn, Tom

    2006-01-01

    Taverna is an application that eases the use and integration of the growing number of molecular biology tools and databases available on the web, especially web services. It allows bioinformaticians to construct workflows or pipelines of services to perform a range of different analyses, such as sequence analysis and genome annotation. These high-level workflows can integrate many different resources into a single analysis. Taverna is available freely under the terms of the GNU Lesser General Public License (LGPL) from . PMID:16845108

  4. Metaworkflows and Workflow Interoperability for Heliophysics

    NASA Astrophysics Data System (ADS)

    Pierantoni, Gabriele; Carley, Eoin P.

    2014-06-01

    Heliophysics is a relatively new branch of physics that investigates the relationship between the Sun and the other bodies of the solar system. To investigate such relationships, heliophysicists can rely on various tools developed by the community. Some of these tools are on-line catalogues that list events (such as Coronal Mass Ejections, CMEs) and their characteristics as they were observed on the surface of the Sun or on the other bodies of the Solar System. Other tools offer on-line data analysis and access to images and data catalogues. During their research, heliophysicists often perform investigations that need to coordinate several of these services and to repeat these complex operations until the phenomena under investigation are fully analyzed. Heliophysicists combine the results of these services; this service orchestration is best suited for workflows. This approach has been investigated in the HELIO project. The HELIO project developed an infrastructure for a Virtual Observatory for Heliophysics and implemented service orchestration using TAVERNA workflows. HELIO developed a set of workflows that proved to be useful but lacked flexibility and re-usability. The TAVERNA workflows also needed to be executed directly in the TAVERNA workbench, and this forced all users to learn how to use the workbench. Within the SCI-BUS and ER-FLOW projects, we have started an effort to re-think and re-design the heliophysics workflows with the aim of fostering re-usability and ease of use. We base our approach on two key concepts: that of meta-workflows and that of workflow interoperability. We have divided the produced workflows into three different layers. The first layer is Basic Workflows, developed in both the TAVERNA and WS-PGRADE languages. They are building blocks that users compose to address their scientific challenges. They implement well-defined Use Cases that usually involve only one service. The second layer is Science Workflows, usually developed in TAVERNA. They implement Science Cases (the definition of a scientific challenge) by composing different Basic Workflows. The third and last layer, Iterative Science Workflows, is developed in WS-PGRADE. It executes sub-workflows (either Basic or Science Workflows) as parameter sweep jobs to investigate Science Cases on large multiple data sets. So far, this approach has proven fruitful for three Science Cases, of which one has been completed and two are still being tested.

  5. Create, run, share, publish, and reference your LC-MS, FIA-MS, GC-MS, and NMR data analysis workflows with the Workflow4Metabolomics 3.0 Galaxy online infrastructure for metabolomics.

    PubMed

    Guitton, Yann; Tremblay-Franco, Marie; Le Corguillé, Gildas; Martin, Jean-François; Pétéra, Mélanie; Roger-Mele, Pierrick; Delabrière, Alexis; Goulitquer, Sophie; Monsoor, Misharl; Duperier, Christophe; Canlet, Cécile; Servien, Rémi; Tardivel, Patrick; Caron, Christophe; Giacomoni, Franck; Thévenot, Etienne A

    2017-12-01

    Metabolomics is a key approach in modern functional genomics and systems biology. Due to the complexity of metabolomics data, the variety of experimental designs, and the multiplicity of bioinformatics tools, providing experimenters with a simple and efficient resource to conduct comprehensive and rigorous analysis of their data is of utmost importance. In 2014, we launched the Workflow4Metabolomics (W4M; http://workflow4metabolomics.org) online infrastructure for metabolomics built on the Galaxy environment, which offers user-friendly features to build and run data analysis workflows including preprocessing, statistical analysis, and annotation steps. Here we present the new W4M 3.0 release, which contains twice as many tools as the first version, and provides two features which are, to our knowledge, unique among online resources. First, data from the four major metabolomics technologies (i.e., LC-MS, FIA-MS, GC-MS, and NMR) can be analyzed on a single platform. By using three studies in human physiology, alga evolution, and animal toxicology, we demonstrate how the 40 available tools can be easily combined to address biological issues. Second, the full analysis (including the workflow, the parameter values, the input data and output results) can be referenced with a permanent digital object identifier (DOI). Publication of data analyses is of major importance for robust and reproducible science. Furthermore, the publicly shared workflows are of high-value for e-learning and training. The Workflow4Metabolomics 3.0 e-infrastructure thus not only offers a unique online environment for analysis of data from the main metabolomics technologies, but it is also the first reference repository for metabolomics workflows. Copyright © 2017 Elsevier Ltd. All rights reserved.

  6. MetaboTools: A comprehensive toolbox for analysis of genome-scale metabolic models

    DOE PAGES

    Aurich, Maike K.; Fleming, Ronan M. T.; Thiele, Ines

    2016-08-03

    Metabolomic data sets provide a direct read-out of cellular phenotypes and are increasingly generated to study biological questions. Previous work, by us and others, revealed the potential of analyzing extracellular metabolomic data in the context of the metabolic model using constraint-based modeling. With the MetaboTools, we make our methods available to the broader scientific community. The MetaboTools consist of a protocol, a toolbox, and tutorials of two use cases. The protocol describes, in a step-wise manner, the workflow of data integration and computational analysis. The MetaboTools comprise the Matlab code required to complete the workflow described in the protocol. Tutorials explain the computational steps for integration of two different data sets and demonstrate a comprehensive set of methods for the computational analysis of metabolic models and stratification thereof into different phenotypes. The presented workflow supports integrative analysis of multiple omics data sets. Importantly, all analysis tools can be applied to metabolic models without performing the entire workflow. Taken together, the MetaboTools constitute a comprehensive guide to the intra-model analysis of extracellular metabolomic data from microbial, plant, or human cells. In conclusion, this computational modeling resource offers a broad set of computational analysis tools for a wide biomedical and non-biomedical research community.

  7. A software tool to analyze clinical workflows from direct observations.

    PubMed

    Schweitzer, Marco; Lasierra, Nelia; Hoerbst, Alexander

    2015-01-01

    Observational data of clinical processes need to be managed in a convenient way, so that process information is reliable, valid and viable for further analysis. However, existing tools for allocating observations fail in systematic data collection of specific workflow recordings. We present a software tool which was developed to facilitate the analysis of clinical process observations. The tool was successfully used in the project OntoHealth, to build, store and analyze observations of diabetes routine consultations.

  8. From the desktop to the grid: scalable bioinformatics via workflow conversion.

    PubMed

    de la Garza, Luis; Veit, Johannes; Szolek, Andras; Röttig, Marc; Aiche, Stephan; Gesing, Sandra; Reinert, Knut; Kohlbacher, Oliver

    2016-03-12

    Reproducibility is one of the tenets of the scientific method. Scientific experiments often comprise complex data flows, selection of adequate parameters, and analysis and visualization of intermediate and end results. Breaking down the complexity of such experiments into the joint collaboration of small, repeatable, well-defined tasks, each with well-defined inputs, parameters, and outputs, offers immediate benefits, such as identifying bottlenecks and pinpointing sections that could benefit from parallelization. Workflows rest upon the notion of splitting complex work into the joint effort of several manageable tasks. There are several engines that give users the ability to design and execute workflows. Each engine was created to address certain problems of a specific community, therefore each one has its advantages and shortcomings. Furthermore, not all features of all workflow engines are royalty-free, an aspect that could potentially drive away members of the scientific community. We have developed a set of tools that enables the scientific community to benefit from workflow interoperability. We developed a platform-free structured representation of the parameters, inputs, and outputs of command-line tools in so-called Common Tool Descriptor documents. We have also overcome the shortcomings and combined the features of two royalty-free workflow engines with a substantial user community: the Konstanz Information Miner, an engine which we see as a formidable workflow editor, and the Grid and User Support Environment, a web-based framework able to interact with several high-performance computing resources. We have thus created a free and highly accessible way to design workflows on a desktop computer and execute them on high-performance computing resources. Our work will not only reduce the time spent on designing scientific workflows, but also make executing workflows on remote high-performance computing resources more accessible to technically inexperienced users. We strongly believe that our efforts not only decrease the turnaround time to obtain scientific results but also have a positive impact on reproducibility, thus elevating the quality of obtained scientific results.
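    The actual Common Tool Descriptor format is an XML schema, which is not reproduced here; purely as a schematic analogue, the following Python dataclass sketch illustrates the idea of a platform-free description of a command-line tool's parameters, inputs and outputs. All field and tool names are hypothetical.

```python
# Schematic illustration only -- NOT the actual Common Tool Descriptor (CTD) XML
# schema. It merely shows the idea of a platform-free description of a command-line
# tool's parameters, inputs and outputs; all field and tool names are hypothetical.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Parameter:
    name: str            # e.g. "threshold"
    type: str            # e.g. "double", "input-file", "output-file"
    description: str = ""
    default: str = ""

@dataclass
class ToolDescriptor:
    name: str                      # executable name
    version: str
    category: str                  # grouping used by a workflow editor
    parameters: List[Parameter] = field(default_factory=list)

    def to_cli(self) -> List[str]:
        """Render the descriptor as an argument list for the underlying tool."""
        args = [self.name]
        for p in self.parameters:
            if p.default:
                args += [f"--{p.name}", p.default]
        return args

# Hypothetical descriptor for a signal-processing step.
peak_picker = ToolDescriptor(
    name="PeakPicker", version="1.0", category="SignalProcessing",
    parameters=[Parameter("in", "input-file"), Parameter("out", "output-file"),
                Parameter("threshold", "double", default="0.05")])
print(peak_picker.to_cli())
```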

  9. Oqtans: the RNA-seq workbench in the cloud for complete and reproducible quantitative transcriptome analysis.

    PubMed

    Sreedharan, Vipin T; Schultheiss, Sebastian J; Jean, Géraldine; Kahles, André; Bohnert, Regina; Drewe, Philipp; Mudrakarta, Pramod; Görnitz, Nico; Zeller, Georg; Rätsch, Gunnar

    2014-05-01

    We present Oqtans, an open-source workbench for quantitative transcriptome analysis that is integrated into Galaxy. Its distinguishing features include customizable computational workflows and a modular pipeline architecture that facilitates comparative assessment of tool and data quality. Oqtans integrates an assortment of machine learning-powered tools into Galaxy, which show superior or equal performance to state-of-the-art tools. Implemented tools comprise a complete transcriptome analysis workflow: short-read alignment, transcript identification/quantification and differential expression analysis. Oqtans and Galaxy facilitate persistent storage, data exchange and documentation of intermediate results and analysis workflows. We illustrate how Oqtans aids the interpretation of data from different experiments in easy-to-understand use cases. Users can easily create their own workflows and extend Oqtans by integrating specific tools. Oqtans is available as (i) a cloud machine image with a demo instance at cloud.oqtans.org, (ii) a public Galaxy instance at galaxy.cbio.mskcc.org, (iii) a git repository containing all installed software (oqtans.org/git); most of which is also available from (iv) the Galaxy Toolshed and (v) a share string to use along with Galaxy CloudMan.

  10. Tools for automated acoustic monitoring within the R package monitoR

    USGS Publications Warehouse

    Katz, Jonathan; Hafner, Sasha D.; Donovan, Therese

    2016-01-01

    The R package monitoR contains tools for managing an acoustic-monitoring program, including survey metadata, template creation and manipulation, automated detection, and results management. These tools are scalable for use with small projects as well as larger long-term projects and those with expansive spatial extents. Here, we describe the typical workflow when using the tools in monitoR. This workflow utilizes a generic sequence of functions, with the option of either binary point matching or spectrogram cross-correlation detectors.
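    monitoR itself is an R package, so the following is only a conceptual Python sketch (not monitoR code) of what a spectrogram cross-correlation detector does: slide a call template over a survey spectrogram and report positions where the correlation score exceeds a cutoff. The arrays and the threshold are made up.

```python
# Conceptual sketch of spectrogram cross-correlation detection (monitoR itself is an
# R package; this is NOT its code). Spectrogram, template and threshold are made up.
import numpy as np
from scipy.signal import correlate2d

rng = np.random.default_rng(1)
spectrogram = rng.random((64, 1000))      # freq bins x time frames of a survey recording
template = rng.random((64, 20))           # spectrogram of the target call
spectrogram[:, 400:420] += 3 * template   # embed one loud occurrence of the call

# Mean-subtracted cross-correlation along the time axis ("valid" keeps full-band overlap),
# then scaled so the best match equals 1.0.
scores = correlate2d(spectrogram - spectrogram.mean(), template - template.mean(),
                     mode="valid").ravel()
scores /= scores.max()

threshold = 0.8                           # detection cutoff, chosen arbitrarily here
detections = np.flatnonzero(scores >= threshold)
print("candidate call start frames:", detections)
```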

  11. Performance Analysis Tool for HPC and Big Data Applications on Scientific Clusters

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yoo, Wucherl; Koo, Michelle; Cao, Yu

    Big data is prevalent in HPC computing. Many HPC projects rely on complex workflows to analyze terabytes or petabytes of data. These workflows often require running over thousands of CPU cores and performing simultaneous data accesses, data movements, and computation. It is challenging to analyze the performance involving terabytes or petabytes of workflow data or measurement data of the executions, from complex workflows over a large number of nodes and multiple parallel task executions. To help identify performance bottlenecks or debug the performance issues in large-scale scientific applications and scientific clusters, we have developed a performance analysis framework, using state-of-the-art open-source big data processing tools. Our tool can ingest system logs and application performance measurements to extract key performance features, and apply the most sophisticated statistical tools and data mining methods on the performance data. It utilizes an efficient data processing engine to allow users to interactively analyze a large amount of different types of logs and measurements. To illustrate the functionality of the big data analysis framework, we conduct case studies on the workflows from an astronomy project known as the Palomar Transient Factory (PTF) and the job logs from the genome analysis scientific cluster. Our study processed many terabytes of system logs and application performance measurements collected on the HPC systems at NERSC. The implementation of our tool is generic enough to be used for analyzing the performance of other HPC systems and Big Data workflows.

  12. SPARTA: Simple Program for Automated reference-based bacterial RNA-seq Transcriptome Analysis.

    PubMed

    Johnson, Benjamin K; Scholz, Matthew B; Teal, Tracy K; Abramovitch, Robert B

    2016-02-04

    Many tools exist in the analysis of bacterial RNA sequencing (RNA-seq) transcriptional profiling experiments to identify differentially expressed genes between experimental conditions. Generally, the workflow includes quality control of reads, mapping to a reference, counting transcript abundance, and statistical tests for differentially expressed genes. In spite of the numerous tools developed for each component of an RNA-seq analysis workflow, easy-to-use bacterially oriented workflow applications to combine multiple tools and automate the process are lacking. With many tools to choose from for each step, the task of identifying a specific tool, adapting the input/output options to the specific use-case, and integrating the tools into a coherent analysis pipeline is not a trivial endeavor, particularly for microbiologists with limited bioinformatics experience. To make bacterial RNA-seq data analysis more accessible, we developed a Simple Program for Automated reference-based bacterial RNA-seq Transcriptome Analysis (SPARTA). SPARTA is a reference-based bacterial RNA-seq analysis workflow application for single-end Illumina reads. SPARTA is turnkey software that simplifies the process of analyzing RNA-seq data sets, making bacterial RNA-seq analysis a routine process that can be undertaken on a personal computer or in the classroom. The easy-to-install, complete workflow processes whole transcriptome shotgun sequencing data files by trimming reads and removing adapters, mapping reads to a reference, counting gene features, calculating differential gene expression, and, importantly, checking for potential batch effects within the data set. SPARTA outputs quality analysis reports, gene feature counts and differential gene expression tables and scatterplots. SPARTA provides an easy-to-use bacterial RNA-seq transcriptional profiling workflow to identify differentially expressed genes between experimental conditions. This software will enable microbiologists with limited bioinformatics experience to analyze their data and integrate next generation sequencing (NGS) technologies into the classroom. The SPARTA software and tutorial are available at sparta.readthedocs.org.
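    SPARTA's own implementation is not shown here; the following Python skeleton merely illustrates how a turnkey workflow can chain the stages named in the abstract (trimming, mapping, counting, differential expression, batch-effect check), with placeholder echo commands standing in for the real tools.

```python
# Skeleton only -- not SPARTA's implementation. It illustrates how a turnkey
# RNA-seq workflow chains the stages named in the abstract; every command below
# is a placeholder (echo) and would be replaced by a real trimmer/aligner/counter.
import subprocess
from pathlib import Path

STAGES = [
    ("trim",  ["echo", "trim adapters from {reads}"]),           # e.g. a read trimmer
    ("map",   ["echo", "map {reads} to {reference}"]),           # e.g. a short-read aligner
    ("count", ["echo", "count features against {annotation}"]),  # e.g. a feature counter
    ("de",    ["echo", "test differential expression"]),         # DE statistics
    ("batch", ["echo", "check for batch effects"]),              # QC step
]

def run_workflow(reads: str, reference: str, annotation: str, outdir: str = "workflow_out"):
    Path(outdir).mkdir(exist_ok=True)
    context = {"reads": reads, "reference": reference, "annotation": annotation}
    for name, cmd in STAGES:
        rendered = [part.format(**context) for part in cmd]
        log = Path(outdir) / f"{name}.log"
        with open(log, "w") as fh:
            subprocess.run(rendered, stdout=fh, check=True)  # stop on the first failure

run_workflow("sample.fastq", "genome.fasta", "genes.gff")
```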

  13. Widening the adoption of workflows to include human and human-machine scientific processes

    NASA Astrophysics Data System (ADS)

    Salayandia, L.; Pinheiro da Silva, P.; Gates, A. Q.

    2010-12-01

    Scientific workflows capture knowledge in the form of technical recipes to access and manipulate data that help scientists manage and reuse established expertise to conduct their work. Libraries of scientific workflows are being created in particular fields, e.g., Bioinformatics, where, combined with cyber-infrastructure environments that provide on-demand access to data and tools, they result in powerful workbenches for scientists of those communities. The focus in these particular fields, however, has been more on automating rather than documenting scientific processes. As a result, technical barriers have impeded a wider adoption of scientific workflows by scientific communities that do not rely as heavily on cyber-infrastructure and computing environments. Semantic Abstract Workflows (SAWs) are introduced to widen the applicability of workflows as a tool to document scientific recipes or processes. SAWs intend to capture a scientist's perspective on how she or he would collect, filter, curate, and manipulate data to create the artifacts that are relevant to her/his work. In contrast, scientific workflows describe the process from the point of view of how technical methods and tools are used to conduct the work. By focusing on a higher level of abstraction that is closer to a scientist's understanding, SAWs effectively capture the controlled vocabularies that reflect a particular scientific community, as well as the types of datasets and methods used in a particular domain. From there on, SAWs provide the flexibility to adapt to different environments in which to carry out the recipes or processes. These environments range from manual fieldwork to highly technical cyber-infrastructure environments, such as those already supported by scientific workflows. Two cases, one from Environmental Science and another from Geophysics, are presented as illustrative examples.

  14. A Standardized Approach to Topographic Data Processing and Workflow Management

    NASA Astrophysics Data System (ADS)

    Wheaton, J. M.; Bailey, P.; Glenn, N. F.; Hensleigh, J.; Hudak, A. T.; Shrestha, R.; Spaete, L.

    2013-12-01

    An ever-increasing list of options exist for collecting high resolution topographic data, including airborne LIDAR, terrestrial laser scanners, bathymetric SONAR and structure-from-motion. An equally rich, arguably overwhelming, variety of tools exists with which to organize, quality control, filter, analyze and summarize these data. However, scientists are often left to cobble together their analysis as a series of ad hoc steps, often using custom scripts and one-time processes that are poorly documented and rarely shared with the community. Even when literature-cited software tools are used, the input and output parameters differ from tool to tool. These parameters are rarely archived and the steps performed lost, making the analysis virtually impossible to replicate precisely. What is missing is a coherent, robust, framework for combining reliable, well-documented topographic data-processing steps into a workflow that can be repeated and even shared with others. We have taken several popular topographic data processing tools - including point cloud filtering and decimation as well as DEM differencing - and defined a common protocol for passing inputs and outputs between them. This presentation describes a free, public online portal that enables scientists to create custom workflows for processing topographic data using a number of popular topographic processing tools. Users provide the inputs required for each tool and in what sequence they want to combine them. This information is then stored for future reuse (and optionally sharing with others) before the user then downloads a single package that contains all the input and output specifications together with the software tools themselves. The user then launches the included batch file that executes the workflow on their local computer against their topographic data. This ZCloudTools architecture helps standardize, automate and archive topographic data processing. It also represents a forum for discovering and sharing effective topographic processing workflows.

  15. A Computational Workflow for the Automated Generation of Models of Genetic Designs.

    PubMed

    Misirli, Göksel; Nguyen, Tramy; McLaughlin, James Alastair; Vaidyanathan, Prashant; Jones, Timothy S; Densmore, Douglas; Myers, Chris; Wipat, Anil

    2018-06-05

    Computational models are essential to engineer predictable biological systems and to scale up this process for complex systems. Computational modeling often requires expert knowledge and data to build models. Clearly, manual creation of models is not scalable for large designs. Despite several automated model construction approaches, computational methodologies to bridge knowledge in design repositories and the process of creating computational models have still not been established. This paper describes a workflow for automatic generation of computational models of genetic circuits from data stored in design repositories using existing standards. This workflow leverages the software tool SBOLDesigner to build structural models that are then enriched by the Virtual Parts Repository API using Systems Biology Open Language (SBOL) data fetched from the SynBioHub design repository. The iBioSim software tool is then utilized to convert this SBOL description into a computational model encoded using the Systems Biology Markup Language (SBML). Finally, this SBML model can be simulated using a variety of methods. This workflow provides synthetic biologists with easy to use tools to create predictable biological systems, hiding away the complexity of building computational models. This approach can further be incorporated into other computational workflows for design automation.
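    The SBML model in this workflow is produced by iBioSim from the SBOL description; as a hedged illustration of what the target encoding looks like (not the workflow's actual output), the sketch below builds a minimal SBML Level 3 model with the python-libsbml bindings, assumed to be installed. The identifiers are invented.

```python
# Minimal illustration of the SBML target encoding (not the iBioSim-generated model
# from the paper). Requires the python-libsbml package; identifiers are made up.
import libsbml

document = libsbml.SBMLDocument(3, 1)          # SBML Level 3 Version 1
model = document.createModel()
model.setId("toggle_switch_sketch")

comp = model.createCompartment()
comp.setId("cell")
comp.setConstant(True)
comp.setSize(1.0)

for species_id in ("LacI", "TetR"):            # two repressor proteins of a toy design
    s = model.createSpecies()
    s.setId(species_id)
    s.setCompartment("cell")
    s.setInitialAmount(0.0)
    s.setHasOnlySubstanceUnits(True)
    s.setBoundaryCondition(False)
    s.setConstant(False)

print(libsbml.writeSBMLToString(document))
```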

  16. Pathology economic model tool: a novel approach to workflow and budget cost analysis in an anatomic pathology laboratory.

    PubMed

    Muirhead, David; Aoun, Patricia; Powell, Michael; Juncker, Flemming; Mollerup, Jens

    2010-08-01

    The need for higher efficiency, maximum quality, and faster turnaround time is a continuous focus for anatomic pathology laboratories and drives changes in work scheduling, instrumentation, and management control systems. To determine the costs of generating routine, special, and immunohistochemical microscopic slides in a large, academic anatomic pathology laboratory using a top-down approach. The Pathology Economic Model Tool was used to analyze workflow processes at The Nebraska Medical Center's anatomic pathology laboratory. Data from the analysis were used to generate complete cost estimates, which included not only materials, consumables, and instrumentation but also specific labor and overhead components for each of the laboratory's subareas. The cost data generated by the Pathology Economic Model Tool were compared with the cost estimates generated using relative value units. Despite the use of automated systems for different processes, the workflow in the laboratory was found to be relatively labor intensive. The effect of labor and overhead on per-slide costs was significantly underestimated by traditional relative-value unit calculations when compared with the Pathology Economic Model Tool. Specific workflow defects with significant contributions to the cost per slide were identified. The cost of providing routine, special, and immunohistochemical slides may be significantly underestimated by traditional methods that rely on relative value units. Furthermore, a comprehensive analysis may identify specific workflow processes requiring improvement.

  17. A web service for service composition to aid geospatial modelers

    NASA Astrophysics Data System (ADS)

    Bigagli, L.; Santoro, M.; Roncella, R.; Mazzetti, P.

    2012-04-01

    The identification of appropriate mechanisms for process reuse, chaining and composition is considered a key enabler for the effective uptake of a global Earth Observation infrastructure, currently pursued by the international geospatial research community. In the Earth and Space Sciences, such a facility could primarily enable integrated and interoperable modeling, for which several approaches have been proposed and developed over the last years. In fact, GEOSS is specifically tasked with the development of the so-called "Model Web". At increasing levels of abstraction and generalization, the initial stove-pipe software tools have evolved to community-wide modeling frameworks, to Component-Based Architecture solutions, and, more recently, have started to embrace Service-Oriented Architecture technologies, such as the OGC WPS specification and the WS-* stack of W3C standards for service composition. However, so far, the level of abstraction seems too low for implementing the Model Web vision, and far too complex technological aspects must still be addressed by both providers and users, resulting in limited usability and, eventually, difficult uptake. In line with the recent ICT trend of resource virtualization, it has been suggested that users in need of a particular processing capability, required by a given modeling workflow, may benefit from outsourcing the composition activities to an external first-class service, according to the Composition as a Service (CaaS) approach. A CaaS system provides the necessary interoperability service framework for adaptation, reuse and complementation of existing processing resources (including models and geospatial services in general) in the form of executable workflows. This work introduces the architecture of a CaaS system, as a distributed information system for creating, validating, editing, storing, publishing, and executing geospatial workflows. This way, users can be freed from the need for a composition infrastructure, alleviated from the technicalities of workflow definitions (type matching, identification of external service endpoints, binding issues, etc.), and able to focus on their intended application. Moreover, the user may submit an incomplete workflow definition and leverage CaaS recommendations (which may derive from an aggregated knowledge base of user feedback, underpinned by Web 2.0 technologies) to execute it. This is of particular interest for multidisciplinary scientific contexts, where different communities may benefit from each other's knowledge through model chaining. Indeed, the CaaS approach is presented as an attempt to combine the recent advances in service-oriented computing with collaborative research principles, and social network information in general. Arguably, it may be considered a fundamental capability of the Model Web. The CaaS concept is being investigated in several application scenarios identified in the FP7 UncertWeb and EuroGEOSS projects. Key aspects of the described CaaS solution are: it provides a standard WPS interface for invoking Business Processes and allows on-the-fly recursive composition of Business Processes into other Composite Processes; it is designed according to the extended SOA (broker-based) and System-of-Systems approach, to support the reuse and integration of existing resources, in compliance with the GEOSS Model Web architecture.
The research leading to these results has received funding from the European Community's Seventh Framework Programme (FP7/2007-2013) under Grant Agreement n° 248488.

  18. The medical simulation markup language - simplifying the biomechanical modeling workflow.

    PubMed

    Suwelack, Stefan; Stoll, Markus; Schalck, Sebastian; Schoch, Nicolai; Dillmann, Rüdiger; Bendl, Rolf; Heuveline, Vincent; Speidel, Stefanie

    2014-01-01

    Modeling and simulation of the human body by means of continuum mechanics has become an important tool in diagnostics, computer-assisted interventions and training. This modeling approach seeks to construct patient-specific biomechanical models from tomographic data. Usually, many different tools, such as segmentation and meshing algorithms, are involved in this workflow. In this paper we present a generalized and flexible description for biomechanical models. The unique feature of the new modeling language is that it not only describes the final biomechanical simulation, but also the workflow by which the biomechanical model is constructed from tomographic data. In this way, the MSML can act as middleware between all tools used in the modeling pipeline. The MSML thus greatly facilitates the prototyping of medical simulation workflows for clinical and research purposes. In this paper, we not only detail the XML-based modeling scheme, but also present a concrete implementation. Different examples highlight the flexibility, robustness and ease-of-use of the approach.

  19. Comprehensive, powerful, efficient, intuitive: a new software framework for clinical imaging applications

    NASA Astrophysics Data System (ADS)

    Augustine, Kurt E.; Holmes, David R., III; Hanson, Dennis P.; Robb, Richard A.

    2006-03-01

    One of the greatest challenges for a software engineer is to create a complex application that is comprehensive enough to be useful to a diverse set of users, yet focused enough for individual tasks to be carried out efficiently with minimal training. This "powerful yet simple" paradox is particularly prevalent in advanced medical imaging applications. Recent research in the Biomedical Imaging Resource (BIR) at Mayo Clinic has been directed toward development of an imaging application framework that provides powerful image visualization/analysis tools in an intuitive, easy-to-use interface. It is based on two concepts very familiar to physicians - Cases and Workflows. Each case is associated with a unique patient and a specific set of routine clinical tasks, or a workflow. Each workflow is comprised of an ordered set of general-purpose modules which can be re-used for each unique workflow. Clinicians help describe and design the workflows, and then are provided with an intuitive interface to both patient data and analysis tools. Since most of the individual steps are common to many different workflows, the use of general-purpose modules reduces development time and results in applications that are consistent, stable, and robust. While the development of individual modules may reflect years of research by imaging scientists, new customized workflows based on the new modules can be developed extremely fast. If a powerful, comprehensive application is difficult to learn and complicated to use, it will be unacceptable to most clinicians. Clinical image analysis tools must be intuitive and effective or they simply will not be used.

  20. Identifying impact of software dependencies on replicability of biomedical workflows.

    PubMed

    Miksa, Tomasz; Rauber, Andreas; Mina, Eleni

    2016-12-01

    Complex data-driven experiments form the basis of biomedical research. Recent findings warn that the context in which the software is run, that is, the infrastructure and third-party dependencies, can have a crucial impact on the final results delivered by a computational experiment. This implies that, in order to replicate the same result, not only must the same data be used, but the analysis must also be run on an equivalent software stack. In this paper we present the VFramework, which enables assessing the replicability of workflows. It identifies whether any differences in software dependencies exist between two executions of the same workflow and whether they have an impact on the produced results. We also conduct a case study in which we investigate the impact of software dependencies on the replicability of Taverna workflows used in biomedical research on Huntington's disease. We re-execute the analysed workflows in environments differing in operating system distribution and configuration. The results show that the VFramework can be used to identify the impact of software dependencies on the replicability of biomedical workflows. Furthermore, we observe that despite the fact that the workflows are executed in a controlled environment, they still depend on specific tools installed in the environment. The context model used by the VFramework addresses the deficiencies of provenance traces and also documents such tools. Based on our findings we define guidelines for workflow owners that enable them to improve the replicability of their workflows. Copyright © 2016 Elsevier Inc. All rights reserved.

  1. XML schemas for common bioinformatic data types and their application in workflow systems

    PubMed Central

    Seibel, Philipp N; Krüger, Jan; Hartmeier, Sven; Schwarzer, Knut; Löwenthal, Kai; Mersch, Henning; Dandekar, Thomas; Giegerich, Robert

    2006-01-01

    Background Today, there is a growing need in bioinformatics to combine available software tools into chains, thus building complex applications from existing single-task tools. To create such workflows, the tools involved have to be able to work with each other's data – therefore, a common set of well-defined data formats is needed. Unfortunately, current bioinformatic tools use a great variety of heterogeneous formats. Results Acknowledging the need for common formats, the Helmholtz Open BioInformatics Technology network (HOBIT) identified several basic data types used in bioinformatics and developed appropriate format descriptions, formally defined by XML schemas, and incorporated them in a Java library (BioDOM). These schemas currently cover sequence, sequence alignment, RNA secondary structure and RNA secondary structure alignment formats in a form that is independent of any specific program, thus enabling seamless interoperation of different tools. All XML formats are available at , the BioDOM library can be obtained at . Conclusion The HOBIT XML schemas and the BioDOM library simplify adding XML support to newly created and existing bioinformatic tools, enabling these tools to interoperate seamlessly in workflow scenarios. PMID:17087823

  2. Introducing W.A.T.E.R.S.: a workflow for the alignment, taxonomy, and ecology of ribosomal sequences.

    PubMed

    Hartman, Amber L; Riddle, Sean; McPhillips, Timothy; Ludäscher, Bertram; Eisen, Jonathan A

    2010-06-12

    For more than two decades microbiologists have used a highly conserved microbial gene as a phylogenetic marker for bacteria and archaea. The small-subunit ribosomal RNA gene, also known as 16S rRNA, is encoded by ribosomal DNA, 16S rDNA, and has provided a powerful comparative tool to microbial ecologists. Over time, the microbial ecology field has matured from small-scale studies in a select number of environments to massive collections of sequence data that are paired with dozens of corresponding collection variables. As the complexity of data and tool sets has grown, the need for flexible automation and maintenance of the core processes of 16S rDNA sequence analysis has increased correspondingly. We present WATERS, an integrated approach for 16S rDNA analysis that bundles a suite of publicly available 16S rDNA analysis software tools into a single software package. The "toolkit" includes sequence alignment, chimera removal, OTU determination, taxonomy assignment, phylogenetic tree construction as well as a host of ecological analysis and visualization tools. WATERS employs a flexible, collection-oriented 'workflow' approach using the open-source Kepler system as a platform. By packaging available software tools into a single automated workflow, WATERS simplifies 16S rDNA analyses, especially for those without specialized bioinformatics or programming expertise. In addition, WATERS, like some of the newer comprehensive rRNA analysis tools, allows researchers to minimize the time dedicated to carrying out tedious informatics steps and to focus their attention instead on the biological interpretation of the results. One advantage of WATERS over other comprehensive tools is that the use of the Kepler workflow system facilitates result interpretation and reproducibility via a data provenance sub-system. Furthermore, new "actors" can be added to the workflow as desired and we see WATERS as an initial seed for a sizeable and growing repository of interoperable, easy-to-combine tools for asking increasingly complex microbial ecology questions.

  3. Sentinel-2 ArcGIS Tool for Environmental Monitoring

    NASA Astrophysics Data System (ADS)

    Plesoianu, Alin; Cosmin Sandric, Ionut; Anca, Paula; Vasile, Alexandru; Calugaru, Andreea; Vasile, Cristian; Zavate, Lucian

    2017-04-01

    This paper addresses one of the biggest challenges regarding Sentinel-2 data: the need for an efficient tool to access and process the large collection of images that is available. Consequently, developing a tool for the automation of Sentinel-2 data analysis is the most immediate need. We developed a series of tools for the automation of Sentinel-2 data download and processing for vegetation health monitoring. The tools automatically perform the following operations: downloading image tiles from ESA's Scientific Hub or other vendors (Amazon), pre-processing the images to extract the 10-m bands, creating image composites, applying a series of vegetation indices (NDVI, OSAVI, etc.), and performing change detection analyses on different temporal data sets. All of these tools run dynamically in the ArcGIS Platform, without the need to create intermediate datasets (rasters, layers), as the images are processed on-the-fly in order to avoid data duplication. Finally, they allow complete integration with the ArcGIS environment and workflows.
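    As a language-neutral illustration of the vegetation-index step mentioned above (independent of the ArcGIS tools described in the abstract), the following NumPy sketch computes NDVI from Sentinel-2's 10-m red (B04) and near-infrared (B08) bands; the toy reflectance arrays stand in for rasters that a real workflow would read from the downloaded tiles.

```python
# Minimal sketch of the NDVI step mentioned in the abstract, independent of ArcGIS.
# The two arrays stand in for Sentinel-2 10 m rasters (B04 = red, B08 = near-infrared)
# that a real workflow would read from the downloaded tiles.
import numpy as np

red = np.array([[0.10, 0.08], [0.30, 0.25]])   # B04 surface reflectance (toy values)
nir = np.array([[0.45, 0.50], [0.32, 0.27]])   # B08 surface reflectance (toy values)

# NDVI = (NIR - Red) / (NIR + Red); guard against division by zero on nodata pixels.
ndvi = np.where((nir + red) > 0, (nir - red) / (nir + red), np.nan)
print(ndvi)   # values near +1 indicate dense, healthy vegetation
```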

  4. MBAT: a scalable informatics system for unifying digital atlasing workflows.

    PubMed

    Lee, Daren; Ruffins, Seth; Ng, Queenie; Sane, Nikhil; Anderson, Steve; Toga, Arthur

    2010-12-22

    Digital atlases provide a common semantic and spatial coordinate system that can be leveraged to compare, contrast, and correlate data from disparate sources. As the quality and amount of biological data continues to advance and grow, searching, referencing, and comparing this data with a researcher's own data is essential. However, the integration process is cumbersome and time-consuming due to misaligned data, implicitly defined associations, and incompatible data sources. This work addresses these challenges by providing a unified and adaptable environment to accelerate the workflow to gather, align, and analyze the data. The MouseBIRN Atlasing Toolkit (MBAT) project was developed as a cross-platform, free open-source application that unifies and accelerates the digital atlas workflow. A tiered, plug-in architecture was designed for the neuroinformatics and genomics goals of the project to provide a modular and extensible design. MBAT provides the ability to use a single query to search and retrieve data from multiple data sources, align image data using the user's preferred registration method, composite data from multiple sources in a common space, and link relevant informatics information to the current view of the data or atlas. The workspaces leverage tool plug-ins to extend and allow future extensions of the basic workspace functionality. A wide variety of tool plug-ins were developed that integrate pre-existing as well as newly created technology into each workspace. Novel atlasing features were also developed, such as supporting multiple label sets, dynamic selection and grouping of labels, and synchronized, context-driven display of ontological data. MBAT empowers researchers to discover correlations among disparate data by providing a unified environment for bringing together distributed reference resources, a user's image data, and biological atlases into the same spatial or semantic context. Through its extensible tiered plug-in architecture, MBAT allows researchers to customize all platform components to quickly achieve personalized workflows.

  5. SigWin-detector: a Grid-enabled workflow for discovering enriched windows of genomic features related to DNA sequences.

    PubMed

    Inda, Márcia A; van Batenburg, Marinus F; Roos, Marco; Belloum, Adam S Z; Vasunin, Dmitry; Wibisono, Adianto; van Kampen, Antoine H C; Breit, Timo M

    2008-08-08

    Chromosome location is often used as a scaffold to organize genomic information in both the living cell and molecular biological research. Thus, ever-increasing amounts of data about genomic features are stored in public databases and can be readily visualized by genome browsers. To perform in silico experimentation conveniently with this genomics data, biologists need tools to process and compare datasets routinely and explore the obtained results interactively. The complexity of such experimentation requires these tools to be based on an e-Science approach, hence generic, modular, and reusable. A virtual laboratory environment with workflows, workflow management systems, and Grid computation are therefore essential. Here we apply an e-Science approach to develop SigWin-detector, a workflow-based tool that can detect significantly enriched windows of (genomic) features in a (DNA) sequence in a fast and reproducible way. For proof-of-principle, we utilize a biological use case to detect regions of increased and decreased gene expression (RIDGEs and anti-RIDGEs) in human transcriptome maps. We improved the original method for RIDGE detection by replacing the costly step of estimation by random sampling with a faster analytical formula for computing the distribution of the null hypothesis being tested and by developing a new algorithm for computing moving medians. SigWin-detector was developed using the WS-VLAM workflow management system and consists of several reusable modules that are linked together in a basic workflow. The configuration of this basic workflow can be adapted to satisfy the requirements of the specific in silico experiment. As we show with the results from analyses in the biological use case on RIDGEs, SigWin-detector is an efficient and reusable Grid-based tool for discovering windows enriched for features of a particular type in any sequence of values. Thus, SigWin-detector provides the proof-of-principle for the modular e-Science based concept of integrative bioinformatics experimentation.
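    The sketch below is not the SigWin-detector implementation; it only illustrates the underlying idea under simplified assumptions: compute a moving median over a sequence of values ordered along the genome and flag windows whose median exceeds a cutoff. A naive moving median and a single-permutation cutoff stand in for the paper's optimized moving-median algorithm and analytical null distribution.

```python
# Sketch of the idea behind SigWin-detector, not its implementation: flag windows of a
# value sequence whose moving median exceeds a cutoff. The naive O(n*w) median and the
# permutation-based cutoff stand in for the paper's optimized algorithm and analytical
# null distribution.
import numpy as np

def moving_median(values, window):
    values = np.asarray(values, dtype=float)
    return np.array([np.median(values[i:i + window])
                     for i in range(len(values) - window + 1)])

rng = np.random.default_rng(2)
expression = rng.normal(0.0, 1.0, 500)   # expression levels ordered by genome position
expression[200:260] += 1.5               # an artificial RIDGE-like domain

window = 39
medians = moving_median(expression, window)
cutoff = np.quantile(moving_median(rng.permutation(expression), window), 0.999)

enriched_starts = np.flatnonzero(medians > cutoff)
print(f"{len(enriched_starts)} enriched windows, first start index:",
      enriched_starts[0] if len(enriched_starts) else None)
```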

  6. PATHA: Performance Analysis Tool for HPC Applications

    DOE PAGES

    Yoo, Wucherl; Koo, Michelle; Cao, Yi; ...

    2016-02-18

    Large science projects rely on complex workflows to analyze terabytes or petabytes of data. These jobs often run on thousands of CPU cores while simultaneously performing data accesses, data movements, and computation. It is difficult to identify bottlenecks or to debug the performance issues in these large workflows. In order to address these challenges, we have developed the Performance Analysis Tool for HPC Applications (PATHA) using state-of-the-art open-source big data processing tools. Our framework can ingest system logs to extract key performance measures, and apply sophisticated statistical tools and data mining methods to the performance data. Furthermore, it utilizes an efficient data processing engine to allow users to interactively analyze large amounts of different types of logs and measurements. To illustrate the functionality of PATHA, we conduct a case study on the workflows from an astronomy project known as the Palomar Transient Factory (PTF). This study processed 1.6 TB of system logs collected on the NERSC supercomputer Edison. Using PATHA, we were able to identify performance bottlenecks, which reside in three tasks of the PTF workflow and depend on the density of celestial objects.
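
    The kind of log-derived bottleneck hunting described above can be pictured with a toy pandas example: group task records by name and flag tasks whose worst-case duration far exceeds their typical one. The column names and values are illustrative; PATHA itself analyzes large log collections with a distributed data processing engine rather than an in-memory DataFrame.

```python
# Toy sketch of spotting outlier task durations in parsed workflow logs.
# Illustrative only; not the PATHA framework itself.
import pandas as pd

records = pd.DataFrame({
    "task": ["ingest", "ingest", "photometry", "photometry", "photometry"],
    "duration_s": [12.0, 13.5, 45.0, 620.0, 47.5],   # one suspicious outlier
})

summary = records.groupby("task")["duration_s"].agg(["median", "max"])
summary["max_over_median"] = summary["max"] / summary["median"]
print(summary.sort_values("max_over_median", ascending=False))
```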

  7. CMS Configuration Editor: GUI based application for user analysis job

    NASA Astrophysics Data System (ADS)

    de Cosa, A.

    2011-12-01

    We present the user interface and the software architecture of the Configuration Editor for the CMS experiment. The analysis workflow is organized in a modular way within the CMS framework, which flexibly organizes user analysis code. The Python scripting language is adopted to define the job configuration that drives the analysis workflow. Developing analysis jobs and managing the configuration of the many required modules can be a challenging task for users, especially newcomers. For this reason a graphical tool has been conceived to edit and inspect configuration files. A set of common analysis tools defined in the CMS Physics Analysis Toolkit (PAT) can be steered and configured using the Config Editor. A user-defined analysis workflow can be produced starting from a standard configuration file, applying and configuring PAT tools according to the specific user requirements. CMS users can adopt the Config Editor to create their analyses while visualizing the effects of their actions in real time. They can inspect the structure of their configuration, the modules included in the workflow, the dependencies among the modules, and the data flow, and they can see the values to which parameters are set and change them as required by their analysis task. Integrating common tools into the GUI required adopting an object-oriented structure in the Python definition of the PAT tools and defining a layer of abstraction from which all PAT tools inherit.

  8. Workflow based framework for life science informatics.

    PubMed

    Tiwari, Abhishek; Sekhar, Arvind K T

    2007-10-01

    Workflow technology is a generic mechanism for integrating diverse types of available resources (databases, servers, software applications and different services), facilitating knowledge exchange among traditionally divergent fields such as molecular biology, clinical research, computational science, physics, chemistry and statistics. Researchers can easily incorporate and access diverse, distributed tools and data to develop their own research protocols for scientific analysis. Applications of workflow technology have been reported in areas such as drug discovery, genomics, large-scale gene expression analysis, proteomics, and systems biology. In this article, we discuss existing workflow systems and trends in the application of workflow-based systems.

  9. Galaxy-M: a Galaxy workflow for processing and analyzing direct infusion and liquid chromatography mass spectrometry-based metabolomics data.

    PubMed

    Davidson, Robert L; Weber, Ralf J M; Liu, Haoyu; Sharma-Oates, Archana; Viant, Mark R

    2016-01-01

    Metabolomics is increasingly recognized as an invaluable tool in the biological, medical and environmental sciences yet lags behind the methodological maturity of other omics fields. To achieve its full potential, including the integration of multiple omics modalities, the accessibility, standardization and reproducibility of computational metabolomics tools must be improved significantly. Here we present our end-to-end mass spectrometry metabolomics workflow in the widely used platform, Galaxy. Named Galaxy-M, our workflow has been developed for both direct infusion mass spectrometry (DIMS) and liquid chromatography mass spectrometry (LC-MS) metabolomics. The range of tools presented spans from processing of raw data, e.g. peak picking and alignment, through data cleansing, e.g. missing value imputation, to preparation for statistical analysis, e.g. normalization and scaling, and principal components analysis (PCA) with associated statistical evaluation. We demonstrate the ease of using these Galaxy workflows via the analysis of DIMS and LC-MS datasets, and provide PCA scores and associated statistics to help other users to ensure that they can accurately repeat the processing and analysis of these two datasets. Galaxy and data are all provided pre-installed in a virtual machine (VM) that can be downloaded from the GigaDB repository. Additionally, source code, executables and installation instructions are available from GitHub. The Galaxy platform has enabled us to produce an easily accessible and reproducible computational metabolomics workflow. More tools could be added by the community to expand its functionality. We recommend that Galaxy-M workflow files are included within the supplementary information of publications, enabling metabolomics studies to achieve greater reproducibility.
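
    The preprocessing chain described above (missing-value imputation, then scaling, then PCA) can be sketched in a few lines of Python with scikit-learn. This only illustrates those generic steps, not the Galaxy-M tools themselves; the toy intensity matrix and parameter choices are made up.

```python
# Hedged sketch of an imputation -> scaling -> PCA chain using scikit-learn,
# illustrating the generic steps mentioned above (not the Galaxy-M tools).
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

def preprocess_and_pca(intensity_matrix, n_components=2):
    """Impute missing peak intensities, autoscale, and project onto principal components."""
    imputed = SimpleImputer(strategy="median").fit_transform(intensity_matrix)
    scaled = StandardScaler().fit_transform(imputed)   # unit-variance scaling
    return PCA(n_components=n_components).fit_transform(scaled)

if __name__ == "__main__":
    X = np.array([[1.0, np.nan, 3.0],
                  [2.0, 2.5, np.nan],
                  [1.5, 2.0, 3.5],
                  [2.2, 1.8, 3.1]])
    print(preprocess_and_pca(X))
```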

  10. Successful Completion of FY18/Q1 ASC L2 Milestone 6355: Electrical Analysis Calibration Workflow Capability Demonstration.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Copps, Kevin D.

    The Sandia Analysis Workbench (SAW) project has developed and deployed a production capability for SIERRA computational mechanics analysis workflows. However, the electrical analysis workflow capability requirements have only been demonstrated in early prototype states, with no real capability deployed for analysts’ use. This milestone aims to improve the electrical analysis workflow capability (via SAW and related tools) and deploy it for ongoing use. We propose to focus on a QASPR electrical analysis calibration workflow use case. We will include a number of new capabilities (versus today’s SAW), such as: 1) support for the XYCE code workflow component, 2) data management coupled to the electrical workflow, 3) human-in-the-loop workflow capability, and 4) electrical analysis workflow capability deployed on the restricted (and possibly classified) network at Sandia. While far from the complete set of capabilities required for electrical analysis workflow over the long term, this is a substantial first step toward full production support for the electrical analysts.

  11. Usability Testing of a National Substance Use Screening Tool Embedded in Electronic Health Records.

    PubMed

    Press, Anne; DeStio, Catherine; McCullagh, Lauren; Kapoor, Sandeep; Morley, Jeanne; Conigliaro, Joseph

    2016-07-08

    Screening, brief intervention, and referral to treatment (SBIRT) is currently being implemented into health systems nationally via paper and electronic methods. The purpose of this study was to evaluate the integration of an electronic SBIRT tool into an existing paper-based SBIRT clinical workflow in a patient-centered medical home. Usability testing was conducted in an academic ambulatory clinic. Two rounds of usability testing were done with medical office assistants (MOAs) using a paper and electronic version of the SBIRT tool, with two and four participants, respectively. Qualitative and quantitative data were analyzed to determine the impact of both tools on clinical workflow. A second round of usability testing was done with the revised electronic version and compared with the first version. A personal workflow barrier cited in the first round of testing was that the electronic health record (EHR) tool was disruptive to patients' visits. In Round 2 of testing, MOAs reported favoring the electronic version due to improved layout and the inclusion of an alert system embedded in the EHR. For example, using the system usability scale (SUS), MOAs reported a grade "1" for the statement, "I would like to use this system frequently" during the first round of testing but a "5" during the second round of analysis. The importance of testing the usability of the various mediums of tools used in health care screening is highlighted by the findings of this study. In the first round of testing, the electronic tool was reported as less user friendly, difficult to navigate, and time consuming. Many issues faced in the first generation of the tool were improved in the second generation after usability was evaluated. This study demonstrates how usability testing of an electronic SBIRT tool can help to identify challenges that can impact clinical workflow. However, a limitation of this study was the small sample size of MOAs that participated. The results may have been biased toward Northwell Health workers' perceptions of the SBIRT tool and their specific clinical workflow.

  12. Observing health professionals' workflow patterns for diabetes care - First steps towards an ontology for EHR services.

    PubMed

    Schweitzer, M; Lasierra, N; Hoerbst, A

    2015-01-01

    Increasing flexibility from a user perspective and enabling workflow-based interaction facilitate easy, user-friendly utilization of EHRs in healthcare professionals' daily work. To offer such versatile EHR functionality, our approach is based on the execution of clinical workflows by means of a composition of semantic web services. The backbone of such an architecture is an ontology which enables the representation of clinical workflows and facilitates the selection of suitable services. In this paper we present the methods and results of observations of routine diabetes consultations, which were conducted in order to identify those workflows and the relations among the included tasks. The observed workflows were first modeled in BPMN and then generalized. As a following step in our study, interviews will be conducted with clinical personnel to validate the modeled workflows.

  13. MassCascade: Visual Programming for LC-MS Data Processing in Metabolomics.

    PubMed

    Beisken, Stephan; Earll, Mark; Portwood, David; Seymour, Mark; Steinbeck, Christoph

    2014-04-01

    Liquid chromatography coupled to mass spectrometry (LC-MS) is commonly applied to investigate the small molecule complement of organisms. Several software tools are typically joined in custom pipelines to semi-automatically process and analyse the resulting data. General workflow environments like the Konstanz Information Miner (KNIME) offer the potential of an all-in-one solution to process LC-MS data by allowing easy integration of different tools and scripts. We describe MassCascade and its workflow plug-in for processing LC-MS data. The Java library integrates frequently used algorithms in a modular fashion, thus enabling it to serve as back-end for graphical front-ends. The functions available in MassCascade have been encapsulated in a plug-in for the workflow environment KNIME, allowing combined use with e.g. statistical workflow nodes from other providers and making the tool intuitive to use without knowledge of programming. The design of the software guarantees a high level of modularity where processing functions can be quickly replaced or concatenated. MassCascade is an open-source library for LC-MS data processing in metabolomics. It embraces the concept of visual programming through its KNIME plug-in, simplifying the process of building complex workflows. The library was validated using open data.

  14. Modeling workflow to design machine translation applications for public health practice

    PubMed Central

    Turner, Anne M.; Brownstein, Megumu K.; Cole, Kate; Karasz, Hilary; Kirchhoff, Katrin

    2014-01-01

    Objective Provide a detailed understanding of the information workflow processes related to translating health promotion materials for limited English proficiency individuals in order to inform the design of context-driven machine translation (MT) tools for public health (PH). Materials and Methods We applied a cognitive work analysis framework to investigate the translation information workflow processes of two large health departments in Washington State. Researchers conducted interviews, performed a task analysis, and validated results with PH professionals to model translation workflow and identify functional requirements for a translation system for PH. Results The study resulted in a detailed description of work related to translation of PH materials, an information workflow diagram, and a description of attitudes towards MT technology. We identified a number of themes that hold design implications for incorporating MT in PH translation practice. A PH translation tool prototype was designed based on these findings. Discussion This study underscores the importance of understanding the work context and information workflow for which systems will be designed. Based on themes and translation information workflow processes, we identified key design guidelines for incorporating MT into PH translation work. Primary amongst these is that MT should be followed by human review for translations to be of high quality and for the technology to be adopted into practice. Conclusion The time and costs of creating multilingual health promotion materials are barriers to translation. PH personnel were interested in MT's potential to improve access to low-cost translated PH materials, but expressed concerns about ensuring quality. We outline design considerations and a potential machine translation tool to best fit MT systems into PH practice. PMID:25445922

  15. CyREST: Turbocharging Cytoscape Access for External Tools via a RESTful API.

    PubMed

    Ono, Keiichiro; Muetze, Tanja; Kolishovski, Georgi; Shannon, Paul; Demchak, Barry

    2015-01-01

    As bioinformatic workflows become increasingly complex and involve multiple specialized tools, so does the difficulty of reliably reproducing those workflows. Cytoscape is a critical workflow component for executing network visualization, analysis, and publishing tasks, but it can be operated only manually via a point-and-click user interface. Consequently, Cytoscape-oriented tasks are laborious and often error prone, especially with multistep protocols involving many networks. In this paper, we present the new cyREST Cytoscape app and accompanying harmonization libraries. Together, they improve workflow reproducibility and researcher productivity by enabling popular languages (e.g., Python and R, JavaScript, and C#) and tools (e.g., IPython/Jupyter Notebook and RStudio) to directly define and query networks, and perform network analysis, layouts and renderings. We describe cyREST's API and overall construction, and present Python- and R-based examples that illustrate how Cytoscape can be integrated into large scale data analysis pipelines. cyREST is available in the Cytoscape app store (http://apps.cytoscape.org) where it has been downloaded over 1900 times since its release in late 2014.
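
    Assuming Cytoscape is running locally with cyREST listening on its default port (1234), the Python sketch below queries the REST API for the currently loaded networks. The endpoint path follows the cyREST v1 convention; check it against the cyREST documentation for your installed Cytoscape/cyREST version.

```python
# Minimal sketch of driving Cytoscape from Python via cyREST, assuming a
# local Cytoscape instance with cyREST on its default port. Verify the
# endpoint against your cyREST version's API documentation.
import requests

BASE = "http://localhost:1234/v1"

def list_network_ids():
    """Return the SUIDs of all networks currently loaded in Cytoscape."""
    response = requests.get(f"{BASE}/networks")
    response.raise_for_status()
    return response.json()

if __name__ == "__main__":
    print("Loaded networks:", list_network_ids())
```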

  16. CyREST: Turbocharging Cytoscape Access for External Tools via a RESTful API

    PubMed Central

    Ono, Keiichiro; Muetze, Tanja; Kolishovski, Georgi; Shannon, Paul; Demchak, Barry

    2015-01-01

    As bioinformatic workflows become increasingly complex and involve multiple specialized tools, so does the difficulty of reliably reproducing those workflows. Cytoscape is a critical workflow component for executing network visualization, analysis, and publishing tasks, but it can be operated only manually via a point-and-click user interface. Consequently, Cytoscape-oriented tasks are laborious and often error prone, especially with multistep protocols involving many networks. In this paper, we present the new cyREST Cytoscape app and accompanying harmonization libraries. Together, they improve workflow reproducibility and researcher productivity by enabling popular languages (e.g., Python and R, JavaScript, and C#) and tools (e.g., IPython/Jupyter Notebook and RStudio) to directly define and query networks, and perform network analysis, layouts and renderings. We describe cyREST’s API and overall construction, and present Python- and R-based examples that illustrate how Cytoscape can be integrated into large scale data analysis pipelines. cyREST is available in the Cytoscape app store (http://apps.cytoscape.org) where it has been downloaded over 1900 times since its release in late 2014. PMID:26672762

  17. Flexible workflow sharing and execution services for e-scientists

    NASA Astrophysics Data System (ADS)

    Kacsuk, Péter; Terstyanszky, Gábor; Kiss, Tamas; Sipos, Gergely

    2013-04-01

    The sequence of computational and data manipulation steps required to perform a specific scientific analysis is called a workflow. Workflows that orchestrate data and/or compute intensive applications on Distributed Computing Infrastructures (DCIs) have recently become standard tools in e-science. At the same time, the broad and fragmented landscape of workflows and DCIs slows down the uptake of workflow-based work. The development, sharing, integration and execution of workflows is still a challenge for many scientists. The FP7 "Sharing Interoperable Workflow for Large-Scale Scientific Simulation on Available DCIs" (SHIWA) project significantly improved the situation with a simulation platform that connects different workflow systems, different workflow languages, different DCIs and workflows into a single, interoperable unit. The SHIWA Simulation Platform is a service package, already used by various scientific communities, and used as a tool by the recently started ER-flow FP7 project to expand the use of workflows among European scientists. The presentation will introduce the SHIWA Simulation Platform and the services that ER-flow provides, based on the platform, to space and earth science researchers. The SHIWA Simulation Platform includes: 1. SHIWA Repository: a database where workflows and meta-data about workflows can be stored. The database is a central repository to discover and share workflows within and among communities. 2. SHIWA Portal: a web portal that is integrated with the SHIWA Repository and includes a workflow executor engine that can orchestrate various types of workflows on various grid and cloud platforms. 3. SHIWA Desktop: a desktop environment that provides access capabilities similar to those of the SHIWA Portal; however, it runs on the users' desktops/laptops instead of a portal server. 4. Workflow engines: the ASKALON, Galaxy, GWES, Kepler, LONI Pipeline, MOTEUR, Pegasus, P-GRADE, ProActive, Triana, Taverna and WS-PGRADE workflow engines are already integrated with the execution engine of the SHIWA Portal. Other engines can be added when required. Through the SHIWA Portal one can define and run simulations on the SHIWA Virtual Organisation, an e-infrastructure that gathers computing and data resources from various DCIs, including the European Grid Infrastructure. Via third-party workflow engines, the Portal supports the most widely used academic workflow engines, and it can be extended with other engines on demand. Such extensions translate between workflow languages and facilitate the nesting of workflows into larger workflows even when those are written in different languages and require different interpreters for execution. Through the workflow repository and the portal, individual scientists and scientific collaborations can share and offer workflows for reuse and execution. Given the integrated nature of the SHIWA Simulation Platform, shared workflows can be executed online without installing any special client environment or downloading workflows. The FP7 "Building a European Research Community through Interoperable Workflows and Data" (ER-flow) project disseminates the achievements of the SHIWA project and uses these achievements to build workflow user communities across Europe. ER-flow provides application support to research communities within and beyond the project consortium to develop, share and run workflows with the SHIWA Simulation Platform.

  18. Bioconductor Workflow for Microbiome Data Analysis: from raw reads to community analyses

    PubMed Central

    Callahan, Ben J.; Sankaran, Kris; Fukuyama, Julia A.; McMurdie, Paul J.; Holmes, Susan P.

    2016-01-01

    High-throughput sequencing of PCR-amplified taxonomic markers (like the 16S rRNA gene) has enabled a new level of analysis of complex bacterial communities known as microbiomes. Many tools exist to quantify and compare abundance levels or OTU composition of communities in different conditions. The sequencing reads have to be denoised and assigned to the closest taxa from a reference database. Common approaches use a notion of 97% similarity and normalize the data by subsampling to equalize library sizes. In this paper, we show that statistical models allow more accurate abundance estimates. By providing a complete workflow in R, we enable the user to do sophisticated downstream statistical analyses, whether parametric or nonparametric. We provide examples of using the R packages dada2, phyloseq, DESeq2, ggplot2 and vegan to filter, visualize and test microbiome data. We also provide examples of supervised analyses using random forests and nonparametric testing using community networks and the ggnetwork package. PMID:27508062

  19. Vel-IO 3D: A tool for 3D velocity model construction, optimization and time-depth conversion in 3D geological modeling workflow

    NASA Astrophysics Data System (ADS)

    Maesano, Francesco E.; D'Ambrogi, Chiara

    2017-02-01

    We present Vel-IO 3D, a tool for 3D velocity model creation and time-depth conversion, as part of a workflow for 3D model building. The workflow addresses the management of large subsurface datasets, mainly seismic lines and well logs, and the construction of a 3D velocity model able to describe the variation of the velocity parameters related to strong facies and thickness variability and to high structural complexity. Although it is applicable in many geological contexts (e.g. foreland basins, large intermountain basins), it is particularly suitable in wide flat regions, where subsurface structures have no surface expression. The Vel-IO 3D tool is composed of three scripts, written in Python 2.7.11, that automate i) the 3D instantaneous velocity model building, ii) the velocity model optimization, and iii) the time-depth conversion. They determine a 3D geological model that is consistent with the primary geological constraints (e.g. depth of the markers on wells). The proposed workflow and the Vel-IO 3D tool have been tested, during the EU-funded Project GeoMol, by the construction of the 3D geological model of a flat region, 5700 km2 in area, located in the central part of the Po Plain. The final 3D model showed the efficiency of the workflow and the Vel-IO 3D tool in the management of large amounts of data in both the time and depth domains. A four-layer layer-cake velocity model has been applied to a succession several thousand metres thick (5000-13,000 m), with 15 horizons from the Triassic up to the Pleistocene, complicated by Mesozoic extensional tectonics and by buried thrusts related to the Southern Alps and Northern Apennines.
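
    As a worked illustration of the kind of time-depth computation such a tool automates, the sketch below converts a reflector's two-way time to depth under a linear instantaneous-velocity law v(z) = v0 + k*z. The velocity parameters are illustrative, and this is not the Vel-IO 3D code, which additionally optimizes the model against well markers.

```python
# Minimal sketch of a time-depth conversion for one horizon under a linear
# instantaneous-velocity law v(z) = v0 + k*z. Illustrative values only; a
# production tool would calibrate v0 and k against well data per layer.
import math

def depth_from_twt(twt_s, v0, k):
    """Depth (m) of a reflector from its two-way time (s), given v0 (m/s) and k (1/s)."""
    one_way = twt_s / 2.0
    if abs(k) < 1e-12:                      # constant-velocity limit
        return v0 * one_way
    return (v0 / k) * (math.exp(k * one_way) - 1.0)

if __name__ == "__main__":
    print(depth_from_twt(twt_s=1.8, v0=1800.0, k=0.5))
```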

  20. Integrating Visualizations into Modeling NEST Simulations

    PubMed Central

    Nowke, Christian; Zielasko, Daniel; Weyers, Benjamin; Peyser, Alexander; Hentschel, Bernd; Kuhlen, Torsten W.

    2015-01-01

    Modeling large-scale spiking neural networks showing realistic biological behavior in their dynamics is a complex and tedious task. Since these networks consist of millions of interconnected neurons, their simulation produces an immense amount of data. In recent years it has become possible to simulate even larger networks. However, solutions to assist researchers in understanding the simulation's complex emergent behavior by means of visualization are still lacking. While developing tools to partially fill this gap, we encountered the challenge of integrating these tools easily into the neuroscientists' daily workflow. To understand what makes this so challenging, we looked into the workflows of our collaborators and analyzed how they use visualizations to solve their daily problems. We identified two major issues: first, the analysis process can rapidly change focus, which requires switching the visualization tool that assists in the current problem domain. Second, because of the heterogeneous data that result from simulations, researchers want to relate data sources in order to investigate them effectively. Since a monolithic application that processes and visualizes all data modalities and reflects all combinations of possible workflows in a holistic way is most likely impossible to develop and maintain, a software architecture that offers specialized visualization tools which run simultaneously and can be linked together to reflect the current workflow is a more feasible approach. To this end, we have developed a software architecture that allows neuroscientists to integrate visualization tools more closely into their modeling tasks. In addition, it forms the basis for semantic linking of different visualizations to reflect the current workflow. In this paper, we present this architecture and substantiate the usefulness of our approach with common use cases we encountered in our collaborative work. PMID:26733860

  1. XML schemas for common bioinformatic data types and their application in workflow systems.

    PubMed

    Seibel, Philipp N; Krüger, Jan; Hartmeier, Sven; Schwarzer, Knut; Löwenthal, Kai; Mersch, Henning; Dandekar, Thomas; Giegerich, Robert

    2006-11-06

    Today, there is a growing need in bioinformatics to combine available software tools into chains, thus building complex applications from existing single-task tools. To create such workflows, the tools involved have to be able to work with each other's data--therefore, a common set of well-defined data formats is needed. Unfortunately, current bioinformatic tools use a great variety of heterogeneous formats. Acknowledging the need for common formats, the Helmholtz Open BioInformatics Technology network (HOBIT) identified several basic data types used in bioinformatics and developed appropriate format descriptions, formally defined by XML schemas, and incorporated them in a Java library (BioDOM). These schemas currently cover sequence, sequence alignment, RNA secondary structure and RNA secondary structure alignment formats in a form that is independent of any specific program, thus enabling seamless interoperation of different tools. All XML formats are available at http://bioschemas.sourceforge.net, the BioDOM library can be obtained at http://biodom.sourceforge.net. The HOBIT XML schemas and the BioDOM library simplify adding XML support to newly created and existing bioinformatic tools, enabling these tools to interoperate seamlessly in workflow scenarios.
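
    The role such schemas play can be sketched with a small validation example in Python using lxml. The schema and document below are toy stand-ins, not the actual HOBIT schemas or the BioDOM library (which is a Java library).

```python
# Hedged sketch of validating a record against an XML Schema with lxml,
# illustrating how schema-defined formats enable tool interoperation.
# The schema and document are toy stand-ins, not the HOBIT definitions.
from lxml import etree

SCHEMA_XSD = b"""<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="sequence">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="id" type="xs:string"/>
        <xs:element name="residues" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

DOCUMENT = b"<sequence><id>seq1</id><residues>ACGT</residues></sequence>"

schema = etree.XMLSchema(etree.fromstring(SCHEMA_XSD))
doc = etree.fromstring(DOCUMENT)
print("valid:", schema.validate(doc))   # True when the document matches the schema
```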

  2. Executing SADI services in Galaxy.

    PubMed

    Aranguren, Mikel Egaña; González, Alejandro Rodríguez; Wilkinson, Mark D

    2014-01-01

    In recent years Galaxy has become a popular workflow management system in bioinformatics, due to its ease of installation, use and extension. The availability of Semantic Web-oriented tools in Galaxy, however, is limited. This is also the case for Semantic Web Services such as those provided by the SADI project, i.e. services that consume and produce RDF. Here we present SADI-Galaxy, a tool generator that deploys selected SADI services as typical Galaxy tools: through SADI-Galaxy, any SADI-compliant service becomes a Galaxy tool that can participate in other outstanding features of Galaxy such as data storage, history, workflow creation, and publication. Galaxy can also be used to execute and combine SADI services as it does with other Galaxy tools. Finally, we have semi-automated the packing and unpacking of data into RDF such that other Galaxy tools can easily be combined with SADI services, plugging the rich SADI Semantic Web Service environment into the popular Galaxy ecosystem. SADI-Galaxy bridges the gap between Galaxy, an easy-to-use but "static" workflow system with a wide user base, and SADI, a sophisticated, semantic, discovery-based framework for Web Services, thus benefiting both user communities.
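
    The "packing of data into RDF" mentioned above can be illustrated with a minimal rdflib snippet in Python; the vocabulary and resource URIs below are invented for the example and are not part of SADI or SADI-Galaxy.

```python
# Minimal sketch of packing a tabular value into RDF with rdflib, the kind
# of conversion the abstract says is semi-automated. URIs are made up.
from rdflib import Graph, Literal, Namespace, URIRef

EX = Namespace("http://example.org/vocab#")

g = Graph()
gene = URIRef("http://example.org/gene/BRCA2")
g.add((gene, EX.expressionLevel, Literal(7.3)))   # one subject-predicate-object triple

print(g.serialize(format="turtle"))
```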

  3. The BioExtract Server: a web-based bioinformatic workflow platform

    PubMed Central

    Lushbough, Carol M.; Jennewein, Douglas M.; Brendel, Volker P.

    2011-01-01

    The BioExtract Server (bioextract.org) is an open, web-based system designed to aid researchers in the analysis of genomic data by providing a platform for the creation of bioinformatic workflows. Scientific workflows are created within the system by recording tasks performed by the user. These tasks may include querying multiple, distributed data sources, saving query results as searchable data extracts, and executing local and web-accessible analytic tools. The series of recorded tasks can then be saved as a reproducible, sharable workflow available for subsequent execution with the original or modified inputs and parameter settings. Integrated data resources include interfaces to the National Center for Biotechnology Information (NCBI) nucleotide and protein databases, the European Molecular Biology Laboratory (EMBL-Bank) non-redundant nucleotide database, the Universal Protein Resource (UniProt), and the UniProt Reference Clusters (UniRef) database. The system offers access to numerous preinstalled, curated analytic tools and also provides researchers with the option of selecting computational tools from a large list of web services including the European Molecular Biology Open Software Suite (EMBOSS), BioMoby, and the Kyoto Encyclopedia of Genes and Genomes (KEGG). The system further allows users to integrate local command line tools residing on their own computers through a client-side Java applet. PMID:21546552

  4. Building asynchronous geospatial processing workflows with web services

    NASA Astrophysics Data System (ADS)

    Zhao, Peisheng; Di, Liping; Yu, Genong

    2012-02-01

    Geoscience research and applications often involve a geospatial processing workflow. This workflow includes a sequence of operations that use a variety of tools to collect, translate, and analyze distributed heterogeneous geospatial data. Asynchronous mechanisms, by which clients initiate a request and then resume their processing without waiting for a response, are very useful for complicated workflows that take a long time to run. Geospatial contents and capabilities are increasingly becoming available online as interoperable Web services. This online availability significantly enhances the ability to use Web service chains to build distributed geospatial processing workflows. This paper focuses on how to orchestrate Web services for implementing asynchronous geospatial processing workflows. The theoretical bases for asynchronous Web services and workflows, including asynchrony patterns and message transmission, are examined to explore different asynchronous approaches and workflow code architectures that support asynchronous behavior. A sample geospatial processing workflow, issued by the Open Geospatial Consortium (OGC) Web Service, Phase 6 (OWS-6), is provided to illustrate the implementation of asynchronous geospatial processing workflows and the challenges in using Web Services Business Process Execution Language (WS-BPEL) to develop them.
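
    The submit-then-continue behavior discussed above can be sketched with Python's asyncio, standing in for a client that submits a long-running geoprocessing request and polls for its completion. A real implementation would issue WPS or WS-BPEL calls over HTTP rather than await an in-process coroutine, and the job names and timings here are illustrative.

```python
# Minimal sketch of the submit-then-poll asynchrony pattern, simulated
# in-process with asyncio; a real client would call remote Web services.
import asyncio

async def geoprocessing_service(job_id: str, seconds: float) -> str:
    """Stand-in for a long-running geospatial process."""
    await asyncio.sleep(seconds)
    return f"result of {job_id}"

async def submit_and_poll():
    # Submit the request, get back a handle immediately, keep working,
    # and poll until the job reports completion.
    job = asyncio.create_task(geoprocessing_service("reproject-landsat", 1.0))
    while not job.done():
        print("job still running, doing other work...")
        await asyncio.sleep(0.3)
    print(job.result())

asyncio.run(submit_and_poll())
```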

  5. JMS: An Open Source Workflow Management System and Web-Based Cluster Front-End for High Performance Computing.

    PubMed

    Brown, David K; Penkler, David L; Musyoka, Thommas M; Bishop, Özlem Tastan

    2015-01-01

    Complex computational pipelines are becoming a staple of modern scientific research. Often these pipelines are resource intensive and require days of computing time. In such cases, it makes sense to run them over high performance computing (HPC) clusters where they can take advantage of the aggregated resources of many powerful computers. In addition to this, researchers often want to integrate their workflows into their own web servers. In these cases, software is needed to manage the submission of jobs from the web interface to the cluster and then return the results once the job has finished executing. We have developed the Job Management System (JMS), a workflow management system and web interface for high performance computing (HPC). JMS provides users with a user-friendly web interface for creating complex workflows with multiple stages. It integrates this workflow functionality with the resource manager, a tool that is used to control and manage batch jobs on HPC clusters. As such, JMS combines workflow management functionality with cluster administration functionality. In addition, JMS provides developer tools including a code editor and the ability to version tools and scripts. JMS can be used by researchers from any field to build and run complex computational pipelines and provides functionality to include these pipelines in external interfaces. JMS is currently being used to house a number of bioinformatics pipelines at the Research Unit in Bioinformatics (RUBi) at Rhodes University. JMS is an open-source project and is freely available at https://github.com/RUBi-ZA/JMS.

  6. JMS: An Open Source Workflow Management System and Web-Based Cluster Front-End for High Performance Computing

    PubMed Central

    Brown, David K.; Penkler, David L.; Musyoka, Thommas M.; Bishop, Özlem Tastan

    2015-01-01

    Complex computational pipelines are becoming a staple of modern scientific research. Often these pipelines are resource intensive and require days of computing time. In such cases, it makes sense to run them over high performance computing (HPC) clusters where they can take advantage of the aggregated resources of many powerful computers. In addition to this, researchers often want to integrate their workflows into their own web servers. In these cases, software is needed to manage the submission of jobs from the web interface to the cluster and then return the results once the job has finished executing. We have developed the Job Management System (JMS), a workflow management system and web interface for high performance computing (HPC). JMS provides users with a user-friendly web interface for creating complex workflows with multiple stages. It integrates this workflow functionality with the resource manager, a tool that is used to control and manage batch jobs on HPC clusters. As such, JMS combines workflow management functionality with cluster administration functionality. In addition, JMS provides developer tools including a code editor and the ability to version tools and scripts. JMS can be used by researchers from any field to build and run complex computational pipelines and provides functionality to include these pipelines in external interfaces. JMS is currently being used to house a number of bioinformatics pipelines at the Research Unit in Bioinformatics (RUBi) at Rhodes University. JMS is an open-source project and is freely available at https://github.com/RUBi-ZA/JMS. PMID:26280450

  7. Agile parallel bioinformatics workflow management using Pwrake.

    PubMed

    Mishima, Hiroyuki; Sasaki, Kensaku; Tanaka, Masahiro; Tatebe, Osamu; Yoshiura, Koh-Ichiro

    2011-09-08

    In bioinformatics projects, scientific workflow systems are widely used to manage computational procedures. Full-featured workflow systems have been proposed to fulfil the demand for workflow management. However, such systems tend to be over-weighted for actual bioinformatics practices. We realize that quick deployment of cutting-edge software implementing advanced algorithms and data formats, and continuous adaptation to changes in computational resources and the environment are often prioritized in scientific workflow management. These features have a greater affinity with the agile software development method through iterative development phases after trial and error. Here, we show the application of the scientific workflow system Pwrake to bioinformatics workflows. Pwrake is a parallel workflow extension of Ruby's standard build tool Rake, the flexibility of which has been demonstrated in the astronomy domain. Therefore, we hypothesize that Pwrake also has advantages in actual bioinformatics workflows. We implemented Pwrake workflows to process next generation sequencing data using the Genomic Analysis Toolkit (GATK) and Dindel. GATK and Dindel workflows are typical examples of sequential and parallel workflows, respectively. We found that in practice, actual scientific workflow development iterates over two phases, the workflow definition phase and the parameter adjustment phase. We introduced separate workflow definitions to help focus on each of the two developmental phases, as well as helper methods to simplify the descriptions. This approach increased iterative development efficiency. Moreover, we implemented combined workflows to demonstrate the modularity of the GATK and Dindel workflows. Pwrake enables agile management of scientific workflows in the bioinformatics domain. The internal domain specific language design built on Ruby gives the flexibility of rakefiles for writing scientific workflows. Furthermore, readability and maintainability of rakefiles may facilitate sharing workflows among the scientific community. Workflows for GATK and Dindel are available at http://github.com/misshie/Workflows.

  8. Agile parallel bioinformatics workflow management using Pwrake

    PubMed Central

    2011-01-01

    Background In bioinformatics projects, scientific workflow systems are widely used to manage computational procedures. Full-featured workflow systems have been proposed to fulfil the demand for workflow management. However, such systems tend to be over-weighted for actual bioinformatics practices. We realize that quick deployment of cutting-edge software implementing advanced algorithms and data formats, and continuous adaptation to changes in computational resources and the environment are often prioritized in scientific workflow management. These features have a greater affinity with the agile software development method through iterative development phases after trial and error. Here, we show the application of a scientific workflow system Pwrake to bioinformatics workflows. Pwrake is a parallel workflow extension of Ruby's standard build tool Rake, the flexibility of which has been demonstrated in the astronomy domain. Therefore, we hypothesize that Pwrake also has advantages in actual bioinformatics workflows. Findings We implemented the Pwrake workflows to process next generation sequencing data using the Genomic Analysis Toolkit (GATK) and Dindel. GATK and Dindel workflows are typical examples of sequential and parallel workflows, respectively. We found that in practice, actual scientific workflow development iterates over two phases, the workflow definition phase and the parameter adjustment phase. We introduced separate workflow definitions to help focus on each of the two developmental phases, as well as helper methods to simplify the descriptions. This approach increased iterative development efficiency. Moreover, we implemented combined workflows to demonstrate modularity of the GATK and Dindel workflows. Conclusions Pwrake enables agile management of scientific workflows in the bioinformatics domain. The internal domain specific language design built on Ruby gives the flexibility of rakefiles for writing scientific workflows. Furthermore, readability and maintainability of rakefiles may facilitate sharing workflows among the scientific community. Workflows for GATK and Dindel are available at http://github.com/misshie/Workflows. PMID:21899774

  9. NeuroManager: a workflow analysis based simulation management engine for computational neuroscience

    PubMed Central

    Stockton, David B.; Santamaria, Fidel

    2015-01-01

    We developed NeuroManager, an object-oriented simulation management software engine for computational neuroscience. NeuroManager automates the workflow of simulation job submissions when using heterogeneous computational resources, simulators, and simulation tasks. The object-oriented approach (1) provides flexibility to adapt to a variety of neuroscience simulators, (2) simplifies the use of heterogeneous computational resources, from desktops to super computer clusters, and (3) improves tracking of simulator/simulation evolution. We implemented NeuroManager in MATLAB, a widely used engineering and scientific language, for its signal and image processing tools, prevalence in electrophysiology analysis, and increasing use in college Biology education. To design and develop NeuroManager we analyzed the workflow of simulation submission for a variety of simulators, operating systems, and computational resources, including the handling of input parameters, data, models, results, and analyses. This resulted in 22 stages of simulation submission workflow. The software incorporates progress notification, automatic organization, labeling, and time-stamping of data and results, and integrated access to MATLAB's analysis and visualization tools. NeuroManager provides users with the tools to automate daily tasks, and assists principal investigators in tracking and recreating the evolution of research projects performed by multiple people. Overall, NeuroManager provides the infrastructure needed to improve workflow, manage multiple simultaneous simulations, and maintain provenance of the potentially large amounts of data produced during the course of a research project. PMID:26528175

  10. NeuroManager: a workflow analysis based simulation management engine for computational neuroscience.

    PubMed

    Stockton, David B; Santamaria, Fidel

    2015-01-01

    We developed NeuroManager, an object-oriented simulation management software engine for computational neuroscience. NeuroManager automates the workflow of simulation job submissions when using heterogeneous computational resources, simulators, and simulation tasks. The object-oriented approach (1) provides flexibility to adapt to a variety of neuroscience simulators, (2) simplifies the use of heterogeneous computational resources, from desktops to super computer clusters, and (3) improves tracking of simulator/simulation evolution. We implemented NeuroManager in MATLAB, a widely used engineering and scientific language, for its signal and image processing tools, prevalence in electrophysiology analysis, and increasing use in college Biology education. To design and develop NeuroManager we analyzed the workflow of simulation submission for a variety of simulators, operating systems, and computational resources, including the handling of input parameters, data, models, results, and analyses. This resulted in 22 stages of simulation submission workflow. The software incorporates progress notification, automatic organization, labeling, and time-stamping of data and results, and integrated access to MATLAB's analysis and visualization tools. NeuroManager provides users with the tools to automate daily tasks, and assists principal investigators in tracking and recreating the evolution of research projects performed by multiple people. Overall, NeuroManager provides the infrastructure needed to improve workflow, manage multiple simultaneous simulations, and maintain provenance of the potentially large amounts of data produced during the course of a research project.

  11. Scientific Data Management (SDM) Center for Enabling Technologies. 2007-2012

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ludascher, Bertram; Altintas, Ilkay

    Over the past five years, our activities have both established Kepler as a viable scientific workflow environment and demonstrated its value across multiple science applications. We have published numerous peer-reviewed papers on the technologies highlighted in this short paper and have given Kepler tutorials at SC06, SC07, SC08, and SciDAC 2007. Our outreach activities have allowed scientists to learn best practices and better utilize Kepler to address their individual workflow problems. Our contributions to advancing the state-of-the-art in scientific workflows have focused on the following areas. Progress in each of these areas is described in subsequent sections. Workflow development: the development of a deeper understanding of scientific workflows "in the wild" and of the requirements for support tools that allow easy construction of complex scientific workflows. Generic workflow components and templates: the development of generic actors (i.e. workflow components and processes) which can be broadly applied to scientific problems. Provenance collection and analysis: the design of a flexible provenance collection and analysis infrastructure within the workflow environment. Workflow reliability and fault tolerance: the improvement of the reliability and fault-tolerance of workflow environments.

  12. Quality Metadata Management for Geospatial Scientific Workflows: from Retrieving to Assessing with Online Tools

    NASA Astrophysics Data System (ADS)

    Leibovici, D. G.; Pourabdollah, A.; Jackson, M.

    2011-12-01

    Experts and decision-makers use or develop models to monitor global and local changes of the environment. Their activities require the combination of data and processing services in a flow of operations and spatial data computations: a geospatial scientific workflow. The seamless ability to generate, re-use and modify a geospatial scientific workflow is an important requirement, but the quality of the outcomes is equally important [1]. Metadata information attached to the data and processes, and particularly their quality, is essential to assess the reliability of the scientific model that represents a workflow [2]. Management tools that deal with qualitative and quantitative metadata measures of the quality associated with a workflow are therefore required by modellers. To ensure interoperability, ISO and OGC standards [3] are to be adopted, allowing, for example, metadata profiles to be defined and retrieved via web service interfaces. However, these standards need a few extensions when looking at workflows, particularly in the context of geoprocess metadata. We propose to fill this gap (i) through the provision of a metadata profile for the quality of processes, and (ii) through a framework, based on XPDL [4], to manage the quality information. Web Processing Services are used to implement a range of metadata analyses on the workflow in order to evaluate and present quality information at different levels of the workflow. This generates the metadata quality, stored in the XPDL file. The focus is (a) on visual representations of the quality, summarizing the retrieved quality information either from the standardized metadata profiles of the components or from non-standard quality information, e.g. Web 2.0 information, and (b) on the estimated qualities of the outputs derived from meta-propagation of uncertainties (a principle that we have introduced [5]). An a priori validation of the future decision-making supported by the outputs of the workflow once run is then provided using the meta-propagated qualities, obtained without running the workflow [6], together with a visualization pointing out, on the workflow graph itself, the need to improve the workflow with better data or better processes. [1] Leibovici, DG, Hobona, G, Stock, K, Jackson, M (2009) Qualifying geospatial workflow models for adaptive controlled validity and accuracy. In: IEEE 17th GeoInformatics, 1-5. [2] Leibovici, DG, Pourabdollah, A (2010a) Workflow Uncertainty using a Metamodel Framework and Metadata for Data and Processes. OGC TC/PC Meetings, September 2010, Toulouse, France. [3] OGC (2011) www.opengeospatial.org. [4] XPDL (2008) Workflow Process Definition Interface - XML Process Definition Language. Workflow Management Coalition, Document WfMC-TC-1025, 2008. [5] Leibovici, DG, Pourabdollah, A, Jackson, M (2011) Meta-propagation of Uncertainties for Scientific Workflow Management in Interoperable Spatial Data Infrastructures. In: Proceedings of the European Geosciences Union (EGU2011), April 2011, Austria. [6] Pourabdollah, A, Leibovici, DG, Jackson, M (2011) MetaPunT: an Open Source tool for Meta-Propagation of uncerTainties in Geospatial Processing. In: Proceedings of OSGIS2011, June 2011, Nottingham, UK.
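
    As a toy numeric analogy for how uncertainty can propagate along a two-step workflow, the sketch below applies first-order (linearized) error propagation. The meta-propagation described above operates on quality metadata attached to workflow components rather than on data values themselves, so this is only an illustration of the underlying idea; the partial derivatives and sigmas are made up.

```python
# Toy first-order uncertainty propagation through two chained steps.
# Illustrative analogy only; not the MetaPunT/meta-propagation framework.
import math

def propagate(partials_and_sigmas):
    """Combine (df/dx_i, sigma_i) pairs into an output standard deviation."""
    return math.sqrt(sum((p * s) ** 2 for p, s in partials_and_sigmas))

# Step 1: y = 2*x with sigma_x = 0.1  ->  sigma_y = 0.2
sigma_y = propagate([(2.0, 0.1)])
# Step 2: z = y + w with sigma_w = 0.05
sigma_z = propagate([(1.0, sigma_y), (1.0, 0.05)])
print(sigma_y, sigma_z)
```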

  13. RABIX: AN OPEN-SOURCE WORKFLOW EXECUTOR SUPPORTING RECOMPUTABILITY AND INTEROPERABILITY OF WORKFLOW DESCRIPTIONS

    PubMed Central

    Ivkovic, Sinisa; Simonovic, Janko; Tijanic, Nebojsa; Davis-Dusenbery, Brandi; Kural, Deniz

    2016-01-01

    As biomedical data has become increasingly easy to generate in large quantities, the methods used to analyze it have proliferated rapidly. Reproducible and reusable methods are required to learn from large volumes of data reliably. To address this issue, numerous groups have developed workflow specifications or execution engines, which provide a framework with which to perform a sequence of analyses. One such specification is the Common Workflow Language, an emerging standard which provides a robust and flexible framework for describing data analysis tools and workflows. In addition, reproducibility can be furthered by executors or workflow engines which interpret the specification and enable additional features, such as error logging, file organization, optimizations to computation and job scheduling, and allow for easy computing on large volumes of data. To this end, we have developed the Rabix Executor, an open-source workflow engine for the purposes of improving reproducibility through reusability and interoperability of workflow descriptions. PMID:27896971

  14. RABIX: AN OPEN-SOURCE WORKFLOW EXECUTOR SUPPORTING RECOMPUTABILITY AND INTEROPERABILITY OF WORKFLOW DESCRIPTIONS.

    PubMed

    Kaushik, Gaurav; Ivkovic, Sinisa; Simonovic, Janko; Tijanic, Nebojsa; Davis-Dusenbery, Brandi; Kural, Deniz

    2017-01-01

    As biomedical data has become increasingly easy to generate in large quantities, the methods used to analyze it have proliferated rapidly. Reproducible and reusable methods are required to learn from large volumes of data reliably. To address this issue, numerous groups have developed workflow specifications or execution engines, which provide a framework with which to perform a sequence of analyses. One such specification is the Common Workflow Language, an emerging standard which provides a robust and flexible framework for describing data analysis tools and workflows. In addition, reproducibility can be furthered by executors or workflow engines which interpret the specification and enable additional features, such as error logging, file organization, optimizations to computation and job scheduling, and allow for easy computing on large volumes of data. To this end, we have developed the Rabix Executor, an open-source workflow engine for the purposes of improving reproducibility through reusability and interoperability of workflow descriptions.

  15. The future of scientific workflows

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Deelman, Ewa; Peterka, Tom; Altintas, Ilkay

    Today’s computational, experimental, and observational sciences rely on computations that involve many related tasks. The success of a scientific mission often hinges on the computer automation of these workflows. In April 2015, the US Department of Energy (DOE) invited a diverse group of domain and computer scientists from national laboratories supported by the Office of Science, the National Nuclear Security Administration, from industry, and from academia to review the workflow requirements of DOE’s science and national security missions, to assess the current state of the art in science workflows, to understand the impact of emerging extreme-scale computing systems on those workflows, and to develop requirements for automated workflow management in future and existing environments. This article is a summary of the opinions of over 50 leading researchers attending this workshop. We highlight use cases, computing systems, and workflow needs, and conclude by summarizing the remaining challenges this community sees that inhibit large-scale scientific workflows from becoming a mainstream tool for extreme-scale science.

  16. Workflows for microarray data processing in the Kepler environment.

    PubMed

    Stropp, Thomas; McPhillips, Timothy; Ludäscher, Bertram; Bieda, Mark

    2012-05-17

    Microarray data analysis has been the subject of extensive and ongoing pipeline development due to its complexity, the availability of several options at each analysis step, and the development of new analysis demands, including integration with new data sources. Bioinformatics pipelines are usually custom built for different applications, making them typically difficult to modify, extend and repurpose. Scientific workflow systems are intended to address these issues by providing general-purpose frameworks in which to develop and execute such pipelines. The Kepler workflow environment is a well-established system under continual development that is employed in several areas of scientific research. Kepler provides a flexible graphical interface, featuring clear display of parameter values, for design and modification of workflows. It has capabilities for developing novel computational components in the R, Python, and Java programming languages, all of which are widely used for bioinformatics algorithm development, along with capabilities for invoking external applications and using web services. We developed a series of fully functional bioinformatics pipelines addressing common tasks in microarray processing in the Kepler workflow environment. These pipelines consist of a set of tools for GFF file processing of NimbleGen chromatin immunoprecipitation on microarray (ChIP-chip) datasets and more comprehensive workflows for Affymetrix gene expression microarray bioinformatics and basic primer design for PCR experiments, which are often used to validate microarray results. Although functional in themselves, these workflows can be easily customized, extended, or repurposed to match the needs of specific projects and are designed to be a toolkit and starting point for specific applications. These workflows illustrate a workflow programming paradigm focusing on local resources (programs and data) and therefore are close to traditional shell scripting or R/BioConductor scripting approaches to pipeline design. Finally, we suggest that microarray data processing task workflows may provide a basis for future example-based comparison of different workflow systems. We provide a set of tools and complete workflows for microarray data analysis in the Kepler environment, which has the advantages of offering graphical, clear display of conceptual steps and parameters and the ability to easily integrate other resources such as remote data and web services.

  17. Dispel4py: An Open-Source Python library for Data-Intensive Seismology

    NASA Astrophysics Data System (ADS)

    Filgueira, Rosa; Krause, Amrey; Spinuso, Alessandro; Klampanos, Iraklis; Danecek, Peter; Atkinson, Malcolm

    2015-04-01

    Scientific workflows are a necessary tool for many scientific communities as they enable easy composition and execution of applications on computing resources, while scientists can focus on their research without being distracted by computation management. Nowadays, scientific communities (e.g., seismology) have access to a large variety of computing resources, and their computational problems are best addressed using parallel computing technology. However, successful use of these technologies requires a lot of additional machinery whose use is not straightforward for non-experts: different parallel frameworks (MPI, Storm, multiprocessing, etc.) must be used depending on the computing resources (local machines, grids, clouds, clusters) where applications are run. This implies that, to achieve the best application performance, users usually have to change their codes depending on the features of the platform selected to run them. This work presents dispel4py, a new open-source Python library for describing abstract stream-based workflows for distributed data-intensive applications. Special care has been taken to provide dispel4py with the ability to map abstract workflows to different platforms dynamically at run-time. Currently dispel4py has four mappings: Apache Storm, MPI, multi-threading and sequential. The main goal of dispel4py is to provide an easy-to-use tool to develop and test workflows on local resources by using the sequential mode with a small dataset. Later, once a workflow is ready for long runs, it can be automatically executed on different parallel resources; dispel4py takes care of the underlying mappings by performing an efficient parallelisation. Processing Elements (PEs) represent the basic computational activities of any dispel4py workflow, such as a seismological algorithm or a data transformation process. To create a dispel4py workflow, users only have to write a few lines of Python code describing their PEs and how they are connected; Python is widely supported on many platforms and is popular in many scientific domains, such as the geosciences. Once a dispel4py workflow is written, a user only has to select which mapping they would like to use, and everything else (parallelisation, distribution of data) is carried out by dispel4py without any cost to the user. Among all dispel4py features we would like to highlight the following: * The PEs are connected by streams and not by writing to and reading from intermediate files, avoiding many IO operations. * The PEs can be stored in a registry, so different users can recombine PEs into many different workflows. * dispel4py has been enriched with a provenance mechanism to support runtime provenance analysis. We have adopted the W3C-PROV data model, which is accessible via a prototype browser-based user interface and a web API. It supports users with the visualisation of graphical products and offers combined operations to access and download the data, which may be selectively stored at runtime into dedicated data archives. dispel4py has already been used by seismologists in the VERCE project to develop different seismic workflows. One of them is the Seismic Ambient Noise Cross-Correlation workflow, which preprocesses and cross-correlates traces from several stations. First, this workflow was tested on a local machine using a small number of stations as input data. Later, it was executed on different parallel platforms (the SuperMUC cluster and the Terracorrelator machine), automatically scaling up by using the MPI and multiprocessing mappings and up to 1000 stations as input data. The results show that dispel4py achieves scalable performance with both mappings tested on different parallel platforms.
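
    For readers unfamiliar with the stream-connected PE style described above, the short Python sketch below illustrates the idea of composing processing elements into a pipeline of streams. It is a conceptual illustration only, not the dispel4py API; the class names and the connect() helper are invented for this example.

```python
# Conceptual sketch of stream-connected processing elements (PEs); this is NOT the
# dispel4py API, just an illustration of the composition style described above.
class PE:
    """A processing element: consumes an input stream, yields an output stream."""
    def process(self, stream):
        raise NotImplementedError

class ReadTraces(PE):
    def __init__(self, traces):
        self.traces = traces
    def process(self, _stream):
        yield from self.traces                # source PE: emits raw traces

class Detrend(PE):
    def process(self, stream):
        for trace in stream:
            mean = sum(trace) / len(trace)
            yield [x - mean for x in trace]   # simple preprocessing step

class CrossCorrelate(PE):
    def process(self, stream):
        previous = None
        for trace in stream:
            if previous is not None:
                yield sum(a * b for a, b in zip(previous, trace))
            previous = trace

def connect(*pes):
    """Compose PEs into a pipeline; data flows through streams, not files."""
    def run(initial=None):
        stream = initial
        for pe in pes:
            stream = pe.process(stream)
        return stream
    return run

workflow = connect(ReadTraces([[1, 2, 3], [2, 3, 4], [0, 1, 2]]), Detrend(), CrossCorrelate())
print(list(workflow()))
```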

  18. A recommended workflow methodology in the creation of an educational and training application incorporating a digital reconstruction of the cerebral ventricular system and cerebrospinal fluid circulation to aid anatomical understanding.

    PubMed

    Manson, Amy; Poyade, Matthieu; Rea, Paul

    2015-10-19

    The use of computer-aided learning in education can be advantageous, especially when interactive three-dimensional (3D) models are used to aid learning of complex 3D structures. The anatomy of the ventricular system of the brain is difficult to fully understand as it is seldom seen in 3D, as is the flow of cerebrospinal fluid (CSF). This article outlines a workflow for the creation of an interactive training tool for the cerebral ventricular system, an educationally challenging area of anatomy. This outline is based on the use of widely available computer software packages. Using MR images of the cerebral ventricular system and several widely available commercial and free software packages, the techniques of 3D modelling, texturing, sculpting, image editing and animation were combined into a workflow for the creation of an interactive educational and training tool. This was focussed on cerebral ventricular system anatomy and the flow of cerebrospinal fluid. We have successfully created a robust methodology by using key software packages in the creation of an interactive education and training tool. This has resulted in an application being developed which details the anatomy of the ventricular system, and flow of cerebrospinal fluid, using an anatomically accurate 3D model. In addition, our established workflow pattern presented here also shows how tutorials, animations and self-assessment tools can be embedded into the training application. The established workflow presented here for generating educational and training material demonstrating cerebral ventricular anatomy and the flow of cerebrospinal fluid has enormous potential to be adopted into student training in this field. With the digital age advancing rapidly, this has the potential to be used as an innovative tool alongside other methodologies for the training of future healthcare practitioners and scientists. This workflow could be used in the creation of other tools, which could be developed for use not only on desktop and laptop computers but also smartphones, tablets and fully immersive stereoscopic environments. It could also form the basis on which to build surgical simulations enhanced with haptic interaction.

  19. AstroGrid: Taverna in the Virtual Observatory.

    NASA Astrophysics Data System (ADS)

    Benson, K. M.; Walton, N. A.

    This paper reports on AstroGrid's implementation of the Taverna workbench, a tool for designing and executing workflows of tasks in the Virtual Observatory. The workflow approach helps astronomers perform complex task sequences with little technical effort. The visual approach to workflow construction streamlines highly complex analyses over public and private data while using computational resources as minimal as a desktop computer. Some integration issues and future work are discussed in this article.

  20. MetaNET--a web-accessible interactive platform for biological metabolic network analysis.

    PubMed

    Narang, Pankaj; Khan, Shawez; Hemrom, Anmol Jaywant; Lynn, Andrew Michael

    2014-01-01

    Metabolic reactions have been extensively studied and compiled over the last century. These have provided a theoretical base to implement models, simulations of which are used to identify drug targets and optimize metabolic throughput at a systemic level. While tools for the perturbation of metabolic networks are available, their applications are limited and restricted as they require varied dependencies and often a commercial platform for full functionality. We have developed MetaNET, an open-source, user-friendly, platform-independent and web-accessible resource consisting of several pre-defined workflows for metabolic network analysis. MetaNET is a web-accessible platform that incorporates a range of functions which can be combined to produce different simulations related to metabolic networks. These include (i) optimization of an objective function for the wild-type strain and gene/catalyst/reaction knock-out/knock-down analysis using flux balance analysis, (ii) flux variability analysis, (iii) chemical species participation, (iv) cycle and extreme path identification, and (v) choke point reaction analysis to facilitate the identification of potential drug targets. The platform is built using custom scripts along with the open-source Galaxy workflow system and the Systems Biology Research Tool as components. Pre-defined workflows are available for common processes, and an exhaustive list of over 50 functions is provided for user-defined workflows. MetaNET, available at http://metanet.osdd.net, provides a user-friendly rich interface allowing the analysis of genome-scale metabolic networks under various genetic and environmental conditions. The framework permits the storage of previous results, the ability to repeat analyses and share results with other users over the internet, as well as to run different tools simultaneously using pre-defined workflows and user-created custom workflows.
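
    As an illustration of the flux balance analysis listed under (i), the sketch below solves a toy three-reaction network with SciPy's linear programming solver. It is not MetaNET code; the stoichiometric matrix, bounds and objective are invented, and the "highs" method assumes a reasonably recent SciPy.

```python
# Minimal flux balance analysis sketch (not MetaNET code): maximize an objective flux
# subject to steady-state mass balance S v = 0 and flux bounds.
import numpy as np
from scipy.optimize import linprog

# Toy stoichiometric matrix S (metabolites x reactions) for a hypothetical network.
S = np.array([
    [1, -1,  0],   # metabolite A: produced by R1, consumed by R2
    [0,  1, -1],   # metabolite B: produced by R2, consumed by R3 (the "biomass" reaction)
])
bounds = [(0, 10), (0, 10), (0, 10)]   # flux bounds for R1..R3
c = np.array([0, 0, -1])               # maximize v3 by minimizing -v3

res = linprog(c, A_eq=S, b_eq=np.zeros(S.shape[0]), bounds=bounds, method="highs")
print("optimal biomass flux:", -res.fun, "flux distribution:", res.x)
```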

  1. RetroPath2.0: A retrosynthesis workflow for metabolic engineers.

    PubMed

    Delépine, Baudoin; Duigou, Thomas; Carbonell, Pablo; Faulon, Jean-Loup

    2018-01-01

    Synthetic biology applied to industrial biotechnology is transforming the way we produce chemicals. However, despite advances in the scale and scope of metabolic engineering, the research and development process still remains costly. In order to expand the chemical repertoire for the production of next-generation compounds, a major engineering biology effort is required in the development of novel design tools that target chemical diversity through rapid and predictable protocols. Addressing that goal involves retrosynthesis approaches that explore the chemical biosynthetic space. However, the complexity associated with the large combinatorial retrosynthesis design space has often been recognized as the main challenge hindering the approach. Here, we provide RetroPath2.0, an automated open-source workflow for retrosynthesis based on generalized reaction rules that perform the retrosynthesis search from chassis to target through an efficient and well-controlled protocol. Its ease of use and the versatility of its applications make this tool a valuable addition to the biological engineer's bench. We show through several examples the application of the workflow to biotechnologically relevant problems, including the identification of alternative biosynthetic routes through enzyme promiscuity and the development of biosensors. In this way, we demonstrate the ability of the workflow to streamline retrosynthesis pathway design and its major role in reshaping the design, build, test and learn pipeline by driving the process toward the objective of optimizing bioproduction. The RetroPath2.0 workflow is built using tools developed by the bioinformatics and cheminformatics communities; because it is open source, we anticipate that community contributions will further expand its features. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.
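
    The core idea of rule-based retrosynthesis, applying generalized reaction rules in the retro direction to propose precursors for a target, can be sketched with RDKit reaction SMARTS. This is an illustrative toy only, not the RetroPath2.0 rule set or its KNIME workflow; the amide-disconnection rule and the target molecule are chosen for simplicity.

```python
# Hedged sketch of rule-based retrosynthesis with RDKit reaction SMARTS (illustrative only).
from rdkit import Chem
from rdkit.Chem import AllChem

# Toy retro-rule: disconnect an amide into a carboxylic acid and an amine.
retro_rule = AllChem.ReactionFromSmarts("[C:1](=[O:2])[N:3]>>[C:1](=[O:2])[OH].[N:3]")

target = Chem.MolFromSmiles("CC(=O)Nc1ccccc1")    # acetanilide as a stand-in target
for precursors in retro_rule.RunReactants((target,)):
    for mol in precursors:
        Chem.SanitizeMol(mol)                     # clean up the generated fragments
    print(" + ".join(Chem.MolToSmiles(mol) for mol in precursors))
```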

  2. E-Science technologies in a workflow for personalized medicine using cancer screening as a case study.

    PubMed

    Spjuth, Ola; Karlsson, Andreas; Clements, Mark; Humphreys, Keith; Ivansson, Emma; Dowling, Jim; Eklund, Martin; Jauhiainen, Alexandra; Czene, Kamila; Grönberg, Henrik; Sparén, Pär; Wiklund, Fredrik; Cheddad, Abbas; Pálsdóttir, Þorgerður; Rantalainen, Mattias; Abrahamsson, Linda; Laure, Erwin; Litton, Jan-Eric; Palmgren, Juni

    2017-09-01

    We provide an e-Science perspective on the workflow from risk factor discovery and classification of disease to evaluation of personalized intervention programs. As case studies, we use personalized prostate and breast cancer screenings. We describe an e-Science initiative in Sweden, e-Science for Cancer Prevention and Control (eCPC), which supports biomarker discovery and offers decision support for personalized intervention strategies. The generic eCPC contribution is a workflow with 4 nodes applied iteratively, and the concept of e-Science signifies systematic use of tools from the mathematical, statistical, data, and computer sciences. The eCPC workflow is illustrated through 2 case studies. For prostate cancer, an in-house personalized screening tool, the Stockholm-3 model (S3M), is presented as an alternative to prostate-specific antigen testing alone. S3M is evaluated in a trial setting and plans for rollout in the population are discussed. For breast cancer, new biomarkers based on breast density and molecular profiles are developed and the US multicenter Women Informed to Screen Depending on Measures (WISDOM) trial is referred to for evaluation. While current eCPC data management uses a traditional data warehouse model, we discuss eCPC-developed features of a coherent data integration platform. E-Science tools are a key part of an evidence-based process for personalized medicine. This paper provides a structured workflow from data and models to evaluation of new personalized intervention strategies. The importance of multidisciplinary collaboration is emphasized. Importantly, the generic concepts of the suggested eCPC workflow are transferrable to other disease domains, although each disease will require tailored solutions. © The Author 2017. Published by Oxford University Press on behalf of the American Medical Informatics Association.

  3. geoKepler Workflow Module for Computationally Scalable and Reproducible Geoprocessing and Modeling

    NASA Astrophysics Data System (ADS)

    Cowart, C.; Block, J.; Crawl, D.; Graham, J.; Gupta, A.; Nguyen, M.; de Callafon, R.; Smarr, L.; Altintas, I.

    2015-12-01

    The NSF-funded WIFIRE project has developed an open-source, online geospatial workflow platform for unifying geoprocessing tools and models for fire and other geospatially dependent modeling applications. It is a product of WIFIRE's objective to build an end-to-end cyberinfrastructure for real-time and data-driven simulation, prediction and visualization of wildfire behavior. geoKepler includes a set of reusable GIS components, or actors, for the Kepler Scientific Workflow System (https://kepler-project.org). Actors exist for reading and writing GIS data in formats such as Shapefile, GeoJSON, KML, and using OGC web services such as WFS. The actors also allow for calling geoprocessing tools in other packages such as GDAL and GRASS. Kepler integrates functions from multiple platforms and file formats into one framework, thus enabling optimal GIS interoperability, model coupling, and scalability. Products of the GIS actors can be fed directly to models such as FARSITE and WRF. Kepler's ability to schedule and scale processes using Hadoop and Spark also makes geoprocessing ultimately extensible and computationally scalable. The reusable workflows in geoKepler can be made to run automatically when alerted by real-time environmental conditions. Here, we show breakthroughs in the speed of creating complex data for hazard assessments with this platform. We also demonstrate geoKepler workflows that use data assimilation to ingest real-time weather data into wildfire simulations, and that apply data mining techniques to gain insight into environmental conditions affecting fire behavior. Existing machine learning tools and libraries such as R and MLlib are being leveraged for this purpose in Kepler, as well as Kepler's Distributed Data Parallel (DDP) capability to provide a framework for scalable processing. geoKepler workflows can be executed via an iPython notebook as a part of a Jupyter hub at UC San Diego for sharing and reporting of the scientific analysis and results from various runs of geoKepler workflows. The communication between iPython and Kepler workflow executions is established through an iPython magic function for Kepler that we have implemented. In summary, geoKepler is an ecosystem that makes geospatial processing and analysis of any kind programmable, reusable, scalable and sharable.
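
    The kind of format conversion a GIS actor wraps can be sketched in a few lines of Python; this is not Kepler or geoKepler code, and it assumes geopandas is available and that a hypothetical Shapefile exists on disk.

```python
# Illustration of the kind of GIS format conversion a geoKepler actor wraps
# (not Kepler/geoKepler code); assumes geopandas is installed.
import geopandas as gpd

perimeter = gpd.read_file("fire_perimeter.shp")          # hypothetical Shapefile input
perimeter = perimeter.to_crs(epsg=4326)                  # reproject to WGS84 lat/lon
perimeter.to_file("fire_perimeter.geojson", driver="GeoJSON")
print(perimeter[["geometry"]].head())
```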

  4. Optimizing CyberShake Seismic Hazard Workflows for Large HPC Resources

    NASA Astrophysics Data System (ADS)

    Callaghan, S.; Maechling, P. J.; Juve, G.; Vahi, K.; Deelman, E.; Jordan, T. H.

    2014-12-01

    The CyberShake computational platform is a well-integrated collection of scientific software and middleware that calculates 3D simulation-based probabilistic seismic hazard curves and hazard maps for the Los Angeles region. Currently each CyberShake model comprises about 235 million synthetic seismograms from about 415,000 rupture variations computed at 286 sites. CyberShake integrates large-scale parallel and high-throughput serial seismological research codes into a processing framework in which early stages produce files used as inputs by later stages. Scientific workflow tools are used to manage the jobs, data, and metadata. The Southern California Earthquake Center (SCEC) developed the CyberShake platform using USC High Performance Computing and Communications systems and open-science NSF resources. CyberShake calculations were migrated to the NSF Track 1 system NCSA Blue Waters when it became operational in 2013, via an interdisciplinary team approach including domain scientists, computer scientists, and middleware developers. Due to the excellent performance of Blue Waters and CyberShake software optimizations, we reduced the makespan (a measure of wallclock time-to-solution) of a CyberShake study from 1467 to 342 hours. We will describe the technical enhancements behind this improvement, including judicious introduction of new GPU software, improved scientific software components, increased workflow-based automation, and Blue Waters-specific workflow optimizations. Our CyberShake performance improvements highlight the benefits of scientific workflow tools. The CyberShake workflow software stack includes the Pegasus Workflow Management System (Pegasus-WMS, which includes Condor DAGMan), HTCondor, and Globus GRAM, with Pegasus-mpi-cluster managing the high-throughput tasks on the HPC resources. The workflow tools handle data management, automatically transferring about 13 TB back to SCEC storage. We will present performance metrics from the most recent CyberShake study, executed on Blue Waters. We will compare the performance of CPU and GPU versions of our large-scale parallel wave propagation code, AWP-ODC-SGT. Finally, we will discuss how these enhancements have enabled SCEC to move forward with plans to increase the CyberShake simulation frequency to 1.0 Hz.

  5. FluxCTTX: A LIMS-based tool for management and analysis of cytotoxicity assays data

    PubMed Central

    2015-01-01

    Background Cytotoxicity assays have been used by researchers to screen for cytotoxicity in compound libraries. Researchers can either look for cytotoxic compounds or screen "hits" from initial high-throughput drug screens for unwanted cytotoxic effects before investing in their development as a pharmaceutical. These assays may be used as an alternative to animal experimentation and are becoming increasingly important in modern laboratories. However, the execution of these assays at large scale and across different laboratories requires, among other things, the management of protocols, reagents and cell lines used, as well as of the data produced, which can be a challenge. The management of all this information is greatly improved by the utilization of computational tools to save time and guarantee quality. However, a tool that performs this task designed specifically for cytotoxicity assays is not yet available. Results In this work, we have used a workflow-based LIMS -- the Flux system -- and the Together Workflow Editor as a framework to develop FluxCTTX, a tool for the management of data from cytotoxicity assays performed at different laboratories. The core of this work is a workflow that represents all stages of the assay and has been developed and uploaded into Flux. This workflow models the activities of cytotoxicity assays performed as described in the OECD 129 Guidance Document. Conclusions FluxCTTX presents a solution for the management of the data produced by cytotoxicity assays performed in interlaboratory comparisons. Its adoption will help guarantee the quality of activities in the process of cytotoxicity testing and enforce the use of Good Laboratory Practices (GLP). Furthermore, the workflow developed is complete and can be adapted to other contexts and different tests for the management of other types of data. PMID:26696462

  6. A Comprehensive Automated 3D Approach for Building Extraction, Reconstruction, and Regularization from Airborne Laser Scanning Point Clouds

    PubMed Central

    Dorninger, Peter; Pfeifer, Norbert

    2008-01-01

    Three-dimensional city models are necessary for supporting numerous management applications. For the determination of city models for visualization purposes, several standardized workflows exist. They are based either on photogrammetry, on LiDAR, or on a combination of both data acquisition techniques. However, the automated determination of reliable and highly accurate city models is still a challenging task, requiring a workflow comprising several processing steps. The most relevant are building detection, building outline generation, building modeling, and finally, building quality analysis. Commercial software tools for building modeling generally require a high degree of human interaction, and most automated approaches described in the literature address the steps of such a workflow individually. In this article, we propose a comprehensive approach for the automated determination of 3D city models from airborne-acquired point cloud data. It is based on the assumption that individual buildings can be modeled properly by a composition of a set of planar faces. Hence, it is based on a reliable 3D segmentation algorithm, detecting planar faces in a point cloud. This segmentation is of crucial importance for the outline detection and for the modeling approach. We describe the theoretical background, the segmentation algorithm, the outline detection, and the modeling approach, and we present and discuss several real projects. PMID:27873931
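
    The planar-face segmentation underpinning this approach can be illustrated with a minimal RANSAC plane fit; the sketch below is a conceptual example in NumPy, not the authors' segmentation algorithm, and the threshold and iteration count are arbitrary.

```python
# Conceptual RANSAC plane fit for an (N, 3) point cloud (not the published segmentation code).
import numpy as np

def ransac_plane(points, n_iters=500, threshold=0.05, rng=None):
    """Return a boolean mask of inliers for the single dominant plane."""
    rng = np.random.default_rng(rng)
    best_inliers = np.zeros(len(points), dtype=bool)
    for _ in range(n_iters):
        p0, p1, p2 = points[rng.choice(len(points), 3, replace=False)]
        normal = np.cross(p1 - p0, p2 - p0)
        norm = np.linalg.norm(normal)
        if norm < 1e-9:                       # degenerate (collinear) sample, skip
            continue
        normal /= norm
        dist = np.abs((points - p0) @ normal) # point-to-plane distances
        inliers = dist < threshold
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return best_inliers

# Hypothetical usage on a laser-scanning tile:
# roof_mask = ransac_plane(xyz)   # xyz: (N, 3) array of point coordinates
```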

  7. The PEcAn Project: Accessible Tools for On-demand Ecosystem Modeling

    NASA Astrophysics Data System (ADS)

    Cowdery, E.; Kooper, R.; LeBauer, D.; Desai, A. R.; Mantooth, J.; Dietze, M.

    2014-12-01

    Ecosystem models play a critical role in understanding the terrestrial biosphere and forecasting changes in the carbon cycle; however, current forecasts have considerable uncertainty. The amount of data being collected and produced is increasing on a daily basis as we enter the "big data" era, but only a fraction of this data is being used to constrain models. Until we can improve the problems of model accessibility and model-data communication, none of these resources can be used to their full potential. The Predictive Ecosystem Analyzer (PEcAn) is an ecoinformatics toolbox and a set of workflows that wrap around an ecosystem model and manage the flow of information in and out of regional-scale terrestrial biosphere models (TBMs). Here we present new modules developed in PEcAn to manage the processing of meteorological data, one of the primary driver dependencies for ecosystem models. The module downloads, reads, extracts, and converts meteorological observations to the Unidata Climate Forecast (CF) NetCDF community standard, a convention used for most climate forecast and weather models. The module also automates the conversion from NetCDF to model-specific formats, including basic merging, gap-filling, and downscaling procedures. PEcAn currently supports tower-based micrometeorological observations at Ameriflux and FluxNET sites, site-level CSV-formatted data, and regional and global reanalysis products such as the North American Regional Reanalysis and CRU-NCEP. The workflow is easily extensible to additional products and processing algorithms. These meteorological workflows have been coupled with the PEcAn web interface and now allow anyone to run multiple ecosystem models for any location on the Earth by simply clicking on an intuitive Google-map based interface. This will allow users to more readily compare models to observations at those sites, leading to better calibration and validation. Current work is extending these workflows to also process field, remotely-sensed, and historical observations of vegetation composition and structure. The processing of heterogeneous meteorological and vegetation data within PEcAn is made possible using the Brown Dog cyberinfrastructure tools for unstructured data.
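
    The conversion of meteorological observations into CF-style NetCDF described above can be sketched with xarray; this is not the PEcAn module itself, and the variable, attributes and synthetic data below are placeholders chosen for illustration.

```python
# Sketch of writing a meteorological variable to CF-style NetCDF (not PEcAn's module);
# assumes xarray, pandas and numpy; the values and file name are hypothetical.
import numpy as np
import pandas as pd
import xarray as xr

times = pd.date_range("2014-07-01", periods=48, freq="30min")
air_temp = 290.0 + 5.0 * np.sin(np.linspace(0, 4 * np.pi, times.size))  # fake tower data, Kelvin

ds = xr.Dataset(
    {"air_temperature": ("time", air_temp, {"units": "K",
                                            "standard_name": "air_temperature"})},
    coords={"time": times},
)
ds.attrs["Conventions"] = "CF-1.6"
ds.to_netcdf("tower_met_CF.nc")   # CF-styled file ready for a model-specific converter
```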

  8. Disseminating Metaproteomic Informatics Capabilities and Knowledge Using the Galaxy-P Framework

    PubMed Central

    Easterly, Caleb; Gruening, Bjoern; Johnson, James; Kolmeder, Carolin A.; Kumar, Praveen; May, Damon; Mehta, Subina; Mesuere, Bart; Brown, Zachary; Elias, Joshua E.; Hervey, W. Judson; McGowan, Thomas; Muth, Thilo; Rudney, Joel; Griffin, Timothy J.

    2018-01-01

    The impact of microbial communities, also known as the microbiome, on human health and the environment is receiving increased attention. Studying translated gene products (proteins) and comparing metaproteomic profiles may elucidate how microbiomes respond to specific environmental stimuli, and interact with host organisms. Characterizing proteins expressed by a complex microbiome and interpreting their functional signature requires sophisticated informatics tools and workflows tailored to metaproteomics. Additionally, there is a need to disseminate these informatics resources to researchers undertaking metaproteomic studies, who could use them to make new and important discoveries in microbiome research. The Galaxy for proteomics platform (Galaxy-P) offers an open source, web-based bioinformatics platform for disseminating metaproteomics software and workflows. Within this platform, we have developed easily-accessible and documented metaproteomic software tools and workflows aimed at training researchers in their operation and disseminating the tools for more widespread use. The modular workflows encompass the core requirements of metaproteomic informatics: (a) database generation; (b) peptide spectral matching; (c) taxonomic analysis and (d) functional analysis. Much of the software available via the Galaxy-P platform was selected, packaged and deployed through an online metaproteomics “Contribution Fest“ undertaken by a unique consortium of expert software developers and users from the metaproteomics research community, who have co-authored this manuscript. These resources are documented on GitHub and freely available through the Galaxy Toolshed, as well as a publicly accessible metaproteomics gateway Galaxy instance. These documented workflows are well suited for the training of novice metaproteomics researchers, through online resources such as the Galaxy Training Network, as well as hands-on training workshops. Here, we describe the metaproteomics tools available within these Galaxy-based resources, as well as the process by which they were selected and implemented in our community-based work. We hope this description will increase access to and utilization of metaproteomics tools, as well as offer a framework for continued community-based development and dissemination of cutting edge metaproteomics software. PMID:29385081
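
    Workflows published on a Galaxy instance such as Galaxy-P can also be driven programmatically; the hedged sketch below uses the BioBlend client library, which is not mentioned in the abstract, and the URL, API key and identifiers are placeholders.

```python
# Hedged sketch of driving a Galaxy workflow programmatically with BioBlend
# (BioBlend is an assumption here; URL, API key and IDs are placeholders).
from bioblend.galaxy import GalaxyInstance

gi = GalaxyInstance(url="https://usegalaxy.example.org", key="YOUR_API_KEY")
for wf in gi.workflows.get_workflows():
    print(wf["id"], wf["name"])           # list workflows visible to this account

# Invoke one workflow, mapping a dataset already in a history to its first input step:
# invocation = gi.workflows.invoke_workflow(
#     workflow_id="WORKFLOW_ID",
#     inputs={"0": {"src": "hda", "id": "DATASET_ID"}},
#     history_id="HISTORY_ID",
# )
```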

  9. Disseminating Metaproteomic Informatics Capabilities and Knowledge Using the Galaxy-P Framework.

    PubMed

    Blank, Clemens; Easterly, Caleb; Gruening, Bjoern; Johnson, James; Kolmeder, Carolin A; Kumar, Praveen; May, Damon; Mehta, Subina; Mesuere, Bart; Brown, Zachary; Elias, Joshua E; Hervey, W Judson; McGowan, Thomas; Muth, Thilo; Nunn, Brook; Rudney, Joel; Tanca, Alessandro; Griffin, Timothy J; Jagtap, Pratik D

    2018-01-31

    The impact of microbial communities, also known as the microbiome, on human health and the environment is receiving increased attention. Studying translated gene products (proteins) and comparing metaproteomic profiles may elucidate how microbiomes respond to specific environmental stimuli, and interact with host organisms. Characterizing proteins expressed by a complex microbiome and interpreting their functional signature requires sophisticated informatics tools and workflows tailored to metaproteomics. Additionally, there is a need to disseminate these informatics resources to researchers undertaking metaproteomic studies, who could use them to make new and important discoveries in microbiome research. The Galaxy for proteomics platform (Galaxy-P) offers an open source, web-based bioinformatics platform for disseminating metaproteomics software and workflows. Within this platform, we have developed easily-accessible and documented metaproteomic software tools and workflows aimed at training researchers in their operation and disseminating the tools for more widespread use. The modular workflows encompass the core requirements of metaproteomic informatics: (a) database generation; (b) peptide spectral matching; (c) taxonomic analysis and (d) functional analysis. Much of the software available via the Galaxy-P platform was selected, packaged and deployed through an online metaproteomics "Contribution Fest" undertaken by a unique consortium of expert software developers and users from the metaproteomics research community, who have co-authored this manuscript. These resources are documented on GitHub and freely available through the Galaxy Toolshed, as well as a publicly accessible metaproteomics gateway Galaxy instance. These documented workflows are well suited for the training of novice metaproteomics researchers, through online resources such as the Galaxy Training Network, as well as hands-on training workshops. Here, we describe the metaproteomics tools available within these Galaxy-based resources, as well as the process by which they were selected and implemented in our community-based work. We hope this description will increase access to and utilization of metaproteomics tools, as well as offer a framework for continued community-based development and dissemination of cutting edge metaproteomics software.

  10. SemanticSCo: A platform to support the semantic composition of services for gene expression analysis.

    PubMed

    Guardia, Gabriela D A; Ferreira Pires, Luís; da Silva, Eduardo G; de Farias, Cléver R G

    2017-02-01

    Gene expression studies often require the combined use of a number of analysis tools. However, manual integration of analysis tools can be cumbersome and error-prone. To support a higher level of automation in the integration process, efforts have been made in the biomedical domain towards the development of semantic web services and supporting composition environments. Yet, most environments consider only the execution of simple service behaviours and require users to focus on technical details of the composition process. We propose a novel approach to the semantic composition of gene expression analysis services that addresses the shortcomings of the existing solutions. Our approach includes an architecture designed to support the service composition process for gene expression analysis, and a flexible strategy for the (semi) automatic composition of semantic web services. Finally, we implement a supporting platform called SemanticSCo to realize the proposed composition approach and demonstrate its functionality by successfully reproducing a microarray study documented in the literature. The SemanticSCo platform provides support for the composition of RESTful web services semantically annotated using SAWSDL. Our platform also supports the definition of constraints/conditions regarding the order in which service operations should be invoked, thus enabling the definition of complex service behaviours. Our proposed solution for semantic web service composition takes into account the requirements of different stakeholders and addresses all phases of the service composition process. It also provides support for the definition of analysis workflows at a high level of abstraction, thus enabling users to focus on biological research issues rather than on the technical details of the composition process. The SemanticSCo source code is available at https://github.com/usplssb/SemanticSCo. Copyright © 2017 Elsevier Inc. All rights reserved.

  11. 3D workflow for HDR image capture of projection systems and objects for CAVE virtual environments authoring with wireless touch-sensitive devices

    NASA Astrophysics Data System (ADS)

    Prusten, Mark J.; McIntyre, Michelle; Landis, Marvin

    2006-02-01

    A 3D workflow pipeline is presented for High Dynamic Range (HDR) image capture of projected scenes or objects for presentation in CAVE virtual environments. The methods of HDR digital photography of environments vs. objects are reviewed. Samples of both types of virtual authoring, the actual CAVE environment and a sculpture, are shown. A series of software tools are incorporated into a pipeline called CAVEPIPE, allowing high-resolution objects and scenes to be composited together in natural illumination environments [1] and presented in our CAVE virtual reality environment. We also present a way to enhance the user interface for CAVE environments. The traditional methods of controlling navigation through virtual environments include gloves, HUDs and 3D mouse devices. By integrating a wireless network that includes both WiFi (IEEE 802.11b/g) and Bluetooth (IEEE 802.15.1) protocols, the non-graphical input control device can be eliminated. Wireless devices can therefore be added, including PDAs, smartphones, Tablet PCs, portable gaming consoles, and Pocket PCs.
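
    The HDR assembly step of such a pipeline can be sketched with OpenCV's exposure-fusion utilities; this is not the CAVEPIPE tooling, and the filenames and exposure times below are hypothetical bracketed captures.

```python
# Minimal sketch of HDR assembly from bracketed exposures (not the CAVEPIPE tooling);
# assumes OpenCV (cv2) and a set of aligned LDR captures with known exposure times.
import cv2
import numpy as np

files = ["exp_short.jpg", "exp_mid.jpg", "exp_long.jpg"]      # hypothetical filenames
times = np.array([1/500.0, 1/60.0, 1/8.0], dtype=np.float32)  # exposure times in seconds

images = [cv2.imread(f) for f in files]
calibrate = cv2.createCalibrateDebevec()
response = calibrate.process(images, times)                   # recover camera response curve
merge = cv2.createMergeDebevec()
hdr = merge.process(images, times, response)                  # 32-bit float radiance map
cv2.imwrite("scene.hdr", hdr)                                 # Radiance .hdr for later compositing
```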

  12. Five task clusters that enable efficient and effective digitization of biological collections

    PubMed Central

    Nelson, Gil; Paul, Deborah; Riccardi, Gregory; Mast, Austin R.

    2012-01-01

    This paper describes and illustrates five major clusters of related tasks (herein referred to as task clusters) that are common to efficient and effective practices in the digitization of biological specimen data and media. Examples of these clusters come from the observation of diverse digitization processes. The staff of iDigBio (The U.S. National Science Foundation’s National Resource for Advancing Digitization of Biological Collections) visited active biological and paleontological collections digitization programs for the purpose of documenting and assessing current digitization practices and tools. These observations identified five task clusters that comprise the digitization process leading up to data publication: (1) pre-digitization curation and staging, (2) specimen image capture, (3) specimen image processing, (4) electronic data capture, and (5) georeferencing locality descriptions. While not all institutions are completing each of these task clusters for each specimen, these clusters describe a composite picture of digitization of biological and paleontological specimens across the programs that were observed. We describe these clusters, three workflow patterns that dominate the implementation of these clusters, and offer a set of workflow recommendations for digitization programs. PMID:22859876

  13. Data Integration Tool: From Permafrost Data Translation Research Tool to A Robust Research Application

    NASA Astrophysics Data System (ADS)

    Wilcox, H.; Schaefer, K. M.; Jafarov, E. E.; Strawhacker, C.; Pulsifer, P. L.; Thurmes, N.

    2016-12-01

    The United States National Science Foundation-funded PermaData project, led by the National Snow and Ice Data Center (NSIDC) with a team from the Global Terrestrial Network for Permafrost (GTN-P), aimed to improve permafrost data access and discovery. We developed the Data Integration Tool (DIT) to significantly reduce the manual processing time needed to translate inconsistent, scattered historical permafrost data into files ready to ingest directly into the GTN-P. We leverage these data to support science research and policy decisions. DIT is a workflow manager that divides data preparation and analysis into a series of steps or operations called widgets. Each widget does a specific operation, such as read, multiply by a constant, sort, plot, and write data. DIT allows the user to select and order the widgets as desired to meet their specific needs. Originally it was written to capture a scientist's personal, iterative, data manipulation and quality control process of visually and programmatically iterating through inconsistent input data, examining it to find problems, adding operations to address the problems, and rerunning until the data could be translated into the GTN-P standard format. Iterative development of this tool led to a Fortran/Python hybrid and then, with consideration of users, licensing, version control, packaging, and workflow, to a publicly available, robust, usable application. Transitioning to Python allowed the use of open source frameworks for the workflow core and integration with a JavaScript graphical workflow interface. DIT is targeted to automatically handle 90% of the data processing for field scientists, modelers, and non-discipline scientists. It is available as an open source tool in GitHub packaged for a subset of Mac, Windows, and UNIX systems as a desktop application with a graphical workflow manager. DIT was used to completely translate one dataset (133 sites) that was successfully added to GTN-P, nearly translate three datasets (270 sites), and is scheduled to translate 10 more datasets (~1000 sites) from the legacy inactive site data holdings of the Frozen Ground Data Center (FGDC). Iterative development has provided the permafrost and wider scientific community with an extendable tool designed specifically for the iterative process of translating unruly data.
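
    The widget pattern described above, small operations selected and ordered by the user, can be sketched in plain Python; this is a conceptual illustration, not DIT's actual code, and the column names in the commented usage are hypothetical.

```python
# Conceptual sketch of the widget-pipeline pattern described above (not DIT's actual code):
# each "widget" is a callable applied in user-selected order to a tabular dataset.
import csv

def read_csv(path):
    with open(path, newline="") as fh:
        return [dict(row) for row in csv.DictReader(fh)]

def scale(column, factor):
    def widget(rows):
        for row in rows:
            row[column] = float(row[column]) * factor
        return rows
    return widget

def sort_by(column):
    return lambda rows: sorted(rows, key=lambda r: float(r[column]))

def run_pipeline(rows, widgets):
    for widget in widgets:          # apply widgets in the order the user selected
        rows = widget(rows)
    return rows

# Hypothetical usage: convert depths from cm to m, then sort by temperature.
# rows = run_pipeline(read_csv("borehole.csv"), [scale("depth", 0.01), sort_by("temperature")])
```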

  14. Workflow technology: the new frontier. How to overcome the barriers and join the future.

    PubMed

    Shefter, Susan M

    2006-01-01

    Hospitals are catching up to the business world in the introduction of technology systems that support professional practice and workflow. The field of case management is highly complex and interrelates with diverse groups in diverse locations. The last few years have seen the introduction of Workflow Technology Tools, which can improve the quality and efficiency of discharge planning by the case manager. Despite the availability of these wonderful new programs, many case managers are hesitant to adopt the new technology and workflow. For a myriad of reasons, a computer-based workflow system can seem like a brick wall. This article discusses, from a practitioner's point of view, how professionals can gain confidence and skill to get around the brick wall and join the future.

  15. Scientific Workflow Management in Proteomics

    PubMed Central

    de Bruin, Jeroen S.; Deelder, André M.; Palmblad, Magnus

    2012-01-01

    Data processing in proteomics can be a challenging endeavor, requiring extensive knowledge of many different software packages, all with different algorithms, data format requirements, and user interfaces. In this article we describe the integration of a number of existing programs and tools in Taverna Workbench, a scientific workflow manager currently being developed in the bioinformatics community. We demonstrate how a workflow manager provides a single, visually clear and intuitive interface to complex data analysis tasks in proteomics, from raw mass spectrometry data to protein identifications and beyond. PMID:22411703

  16. Coupling of a continuum ice sheet model and a discrete element calving model using a scientific workflow system

    NASA Astrophysics Data System (ADS)

    Memon, Shahbaz; Vallot, Dorothée; Zwinger, Thomas; Neukirchen, Helmut

    2017-04-01

    Scientific communities generate complex simulations through the orchestration of semi-structured analysis pipelines, which involves the execution of large workflows on multiple, distributed and heterogeneous computing and data resources. Modeling the ice dynamics of glaciers requires workflows consisting of many non-trivial, computationally expensive processing tasks which are coupled to each other. From this domain, we present an e-Science use case, a workflow, which requires the execution of a continuum ice flow model and a discrete element based calving model in an iterative manner. Apart from the model executions, this workflow also contains data format conversion tasks that link the ice flow and calving models through sequential, nested and iterative steps. Thus, the management and monitoring of all processing tasks, including data management and transfer, becomes more complex. From the implementation perspective, this workflow model was initially developed as a set of scripts using static data input and output references. In the course of application usage, as more scripts or modifications were introduced to meet user requirements, the debugging and validation of results became more cumbersome. To address these problems, we identified the need for a high-level scientific workflow tool through which all the above-mentioned processes can be handled in an efficient and usable manner. We decided to make use of the e-Science middleware UNICORE (Uniform Interface to Computing Resources), which allows seamless and automated access to different heterogeneous and distributed resources and is supported by a scientific workflow engine. Based on this, we developed a high-level scientific workflow model for the coupling of massively parallel High-Performance Computing (HPC) jobs: a continuum ice sheet model (Elmer/Ice) and a discrete element calving and crevassing model (HiDEM). In our talk we present how the use of high-level scientific workflow middleware makes the reproducibility of results more convenient and also provides a reusable and portable workflow template that can be deployed across different computing infrastructures. Acknowledgements This work was kindly supported by NordForsk as part of the Nordic Center of Excellence (NCoE) eSTICC (eScience Tools for Investigating Climate Change at High Northern Latitudes) and the Top-level Research Initiative NCoE SVALI (Stability and Variation of Arctic Land Ice).
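
    The control flow of the coupled workflow can be sketched as a simple loop; the stand-in functions below replace the real Elmer/Ice and HiDEM HPC jobs and the UNICORE-managed conversions, so that the iteration structure itself is runnable and clear.

```python
# Schematic of the iterative ice-flow/calving coupling described above. The real steps are
# HPC jobs (Elmer/Ice, HiDEM) managed by UNICORE; here they are stand-in stub functions.
def run_ice_flow(geometry):
    # stand-in for submitting the Elmer/Ice continuum model as an HPC job
    return {"front_position": geometry["front_position"] + 50.0}

def mesh_to_particles(flow_state):
    # stand-in for the mesh -> particle format conversion task
    return dict(flow_state)

def run_calving(particles):
    # stand-in for the HiDEM discrete element calving/crevassing model
    return {"front_position": particles["front_position"] - 80.0}

def particles_to_mesh(calving_state):
    # stand-in for the particle -> mesh conversion that closes the loop
    return dict(calving_state)

geometry = {"front_position": 0.0}
for cycle in range(3):                        # sequential, nested, iterative coupling
    geometry = particles_to_mesh(run_calving(mesh_to_particles(run_ice_flow(geometry))))
    print(f"cycle {cycle}: glacier front at {geometry['front_position']:.1f} m")
```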

  17. Using bio.tools to generate and annotate workbench tool descriptions

    PubMed Central

    Hillion, Kenzo-Hugo; Kuzmin, Ivan; Khodak, Anton; Rasche, Eric; Crusoe, Michael; Peterson, Hedi; Ison, Jon; Ménager, Hervé

    2017-01-01

    Workbench and workflow systems such as Galaxy, Taverna, Chipster, or Common Workflow Language (CWL)-based frameworks, facilitate the access to bioinformatics tools in a user-friendly, scalable and reproducible way. Still, the integration of tools in such environments remains a cumbersome, time consuming and error-prone process. A major consequence is the incomplete or outdated description of tools that are often missing important information, including parameters and metadata such as publication or links to documentation. ToolDog (Tool DescriptiOn Generator) facilitates the integration of tools - which have been registered in the ELIXIR tools registry (https://bio.tools) - into workbench environments by generating tool description templates. ToolDog includes two modules. The first module analyses the source code of the bioinformatics software with language-specific plugins, and generates a skeleton for a Galaxy XML or CWL tool description. The second module is dedicated to the enrichment of the generated tool description, using metadata provided by bio.tools. This last module can also be used on its own to complete or correct existing tool descriptions with missing metadata. PMID:29333231
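
    The kind of Galaxy tool-description skeleton ToolDog produces can be sketched with the standard library's ElementTree; this is illustrative only, the tool id, command and parameters are invented placeholders, and only a few common elements of the Galaxy tool XML format are shown.

```python
# Hedged sketch of a Galaxy tool-description skeleton (illustrative only; the tool id,
# command and parameters are placeholders, not ToolDog output).
from xml.etree import ElementTree as ET

tool = ET.Element("tool", id="example_tool", name="Example tool", version="0.1.0")
ET.SubElement(tool, "description").text = "Skeleton generated from registry metadata"
ET.SubElement(tool, "command").text = "example_tool --input '$input' --output '$output'"
inputs = ET.SubElement(tool, "inputs")
ET.SubElement(inputs, "param", name="input", type="data", format="fasta", label="Input sequences")
outputs = ET.SubElement(tool, "outputs")
ET.SubElement(outputs, "data", name="output", format="tabular")
ET.SubElement(tool, "help").text = "Help text that could be enriched from bio.tools metadata."

print(ET.tostring(tool, encoding="unicode"))
```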

  18. BioVeL: a virtual laboratory for data analysis and modelling in biodiversity science and ecology.

    PubMed

    Hardisty, Alex R; Bacall, Finn; Beard, Niall; Balcázar-Vargas, Maria-Paula; Balech, Bachir; Barcza, Zoltán; Bourlat, Sarah J; De Giovanni, Renato; de Jong, Yde; De Leo, Francesca; Dobor, Laura; Donvito, Giacinto; Fellows, Donal; Guerra, Antonio Fernandez; Ferreira, Nuno; Fetyukova, Yuliya; Fosso, Bruno; Giddy, Jonathan; Goble, Carole; Güntsch, Anton; Haines, Robert; Ernst, Vera Hernández; Hettling, Hannes; Hidy, Dóra; Horváth, Ferenc; Ittzés, Dóra; Ittzés, Péter; Jones, Andrew; Kottmann, Renzo; Kulawik, Robert; Leidenberger, Sonja; Lyytikäinen-Saarenmaa, Päivi; Mathew, Cherian; Morrison, Norman; Nenadic, Aleksandra; de la Hidalga, Abraham Nieva; Obst, Matthias; Oostermeijer, Gerard; Paymal, Elisabeth; Pesole, Graziano; Pinto, Salvatore; Poigné, Axel; Fernandez, Francisco Quevedo; Santamaria, Monica; Saarenmaa, Hannu; Sipos, Gergely; Sylla, Karl-Heinz; Tähtinen, Marko; Vicario, Saverio; Vos, Rutger Aldo; Williams, Alan R; Yilmaz, Pelin

    2016-10-20

    Making forecasts about biodiversity and giving support to policy relies increasingly on large collections of data held electronically, and on substantial computational capability and capacity to analyse, model, simulate and predict using such data. However, the physically distributed nature of data resources and of expertise in advanced analytical tools creates many challenges for the modern scientist. Across the wider biological sciences, presenting such capabilities on the Internet (as "Web services") and using scientific workflow systems to compose them for particular tasks is a practical way to carry out robust "in silico" science. However, use of this approach in biodiversity science and ecology has thus far been quite limited. BioVeL is a virtual laboratory for data analysis and modelling in biodiversity science and ecology, freely accessible via the Internet. BioVeL includes functions for accessing and analysing data through curated Web services; for performing complex in silico analysis through exposure of R programs, workflows, and batch processing functions; for on-line collaboration through sharing of workflows and workflow runs; for experiment documentation through reproducibility and repeatability; and for computational support via seamless connections to supporting computing infrastructures. We developed and improved more than 60 Web services with significant potential in many different kinds of data analysis and modelling tasks. We composed reusable workflows using these Web services, also incorporating R programs. Deploying these tools into an easy-to-use and accessible 'virtual laboratory', free via the Internet, we applied the workflows in several diverse case studies. We opened the virtual laboratory for public use and through a programme of external engagement we actively encouraged scientists and third party application and tool developers to try out the services and contribute to the activity. Our work shows we can deliver an operational, scalable and flexible Internet-based virtual laboratory to meet new demands for data processing and analysis in biodiversity science and ecology. In particular, we have successfully integrated existing and popular tools and practices from different scientific disciplines to be used in biodiversity and ecological research.

  19. Implementation of Cyberinfrastructure and Data Management Workflow for a Large-Scale Sensor Network

    NASA Astrophysics Data System (ADS)

    Jones, A. S.; Horsburgh, J. S.

    2014-12-01

    Monitoring with in situ environmental sensors and other forms of field-based observation presents many challenges for data management, particularly for large-scale networks consisting of multiple sites, sensors, and personnel. The availability and utility of these data in addressing scientific questions rely on effective cyberinfrastructure that facilitates transformation of raw sensor data into functional data products. They also depend on the ability of researchers to share and access the data in usable formats. In addition to addressing the challenges presented by the quantity of data, monitoring networks need practices to ensure high data quality, including procedures and tools for post-processing. Data quality is further enhanced if practitioners are able to track equipment, deployments, calibrations, and other events related to site maintenance and associate these details with observational data. In this presentation we will describe the overall workflow that we have developed for research groups and sites conducting long-term monitoring using in situ sensors. Features of the workflow include: software tools to automate the transfer of data from field sites to databases, a Python-based program for data quality control post-processing, a web-based application for online discovery and visualization of data, and a data model and web interface for managing physical infrastructure. By automating the data management workflow, the time from collection to analysis is reduced and sharing and publication are facilitated. The incorporation of metadata standards and descriptions and the use of open-source tools enhance the sustainability and reusability of the data. We will describe the workflow and tools that we have developed in the context of the iUTAH (innovative Urban Transitions and Aridregion Hydrosustainability) monitoring network. The iUTAH network consists of aquatic and climate sensors deployed in three watersheds to monitor Gradients Along Mountain to Urban Transitions (GAMUT). The variety of environmental sensors and the multi-watershed, multi-institutional nature of the network necessitate a well-planned and efficient workflow for acquiring, managing, and sharing sensor data, which should be useful for similar large-scale and long-term networks.
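
    A small example of the kind of quality-control post-processing step mentioned above is sketched below with pandas; it is not the iUTAH codebase, and the thresholds, window length and file in the commented usage are hypothetical.

```python
# Illustrative sketch of a sensor QC post-processing step (not the iUTAH codebase):
# flag out-of-range and flat-lined values in a pandas time series.
import pandas as pd

def qc_flags(series, lower, upper, flatline_window=6):
    flags = pd.Series("ok", index=series.index)
    flags[(series < lower) | (series > upper)] = "out_of_range"
    # A run of identical readings over the window often indicates a stuck sensor.
    flat = series.rolling(flatline_window).std() == 0
    flags[flat] = "flatline"
    return flags

# Hypothetical usage with 15-minute water temperature data:
# temps = pd.read_csv("gamut_site.csv", index_col="timestamp", parse_dates=True)["water_temp"]
# print(qc_flags(temps, lower=-0.5, upper=35.0).value_counts())
```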

  20. Enhancing population pharmacokinetic modeling efficiency and quality using an integrated workflow.

    PubMed

    Schmidt, Henning; Radivojevic, Andrijana

    2014-08-01

    Population pharmacokinetic (popPK) analyses are at the core of Pharmacometrics and need to be performed regularly. Although these analyses are relatively standard, a large variability can be observed in both the time (efficiency) and the way they are performed (quality). The main reasons for this variability include the level of experience of the modeler, personal preferences, and tools. This paper aims to examine how the process of popPK model building can be supported in order to increase its efficiency and quality. The presented approach to the conduct of popPK analyses is centered around three key components: (1) identification of the most common and important popPK model features, (2) required information content and formatting of the data for modeling, and (3) methodology, workflow and workflow-supporting tools. This approach has been used in several popPK modeling projects and a documented example is provided in the supplementary material. Efficiency of model building is improved by avoiding repetitive coding and other labor-intensive tasks and by putting the emphasis on a fit-for-purpose model. Quality is improved by ensuring that the workflow and tools are in alignment with a popPK modeling guidance which is established within an organization. The main conclusion of this paper is that workflow-based approaches to popPK modeling are feasible and have significant potential to ameliorate its various aspects. However, the implementation of such an approach in a pharmacometric organization requires openness towards innovation and change - the key ingredient for the evolution of integrative and quantitative drug development in the pharmaceutical industry.

  1. Biocuration workflows and text mining: overview of the BioCreative 2012 Workshop Track II.

    PubMed

    Lu, Zhiyong; Hirschman, Lynette

    2012-01-01

    Manual curation of data from the biomedical literature is a rate-limiting factor for many expert curated databases. Despite the continuing advances in biomedical text mining and the pressing needs of biocurators for better tools, few existing text-mining tools have been successfully integrated into production literature curation systems such as those used by the expert curated databases. To close this gap and better understand all aspects of literature curation, we invited submissions of written descriptions of curation workflows from expert curated databases for the BioCreative 2012 Workshop Track II. We received seven qualified contributions, primarily from model organism databases. Based on these descriptions, we identified commonalities and differences across the workflows, the common ontologies and controlled vocabularies used and the current and desired uses of text mining for biocuration. Compared to a survey done in 2009, our 2012 results show that many more databases are now using text mining in parts of their curation workflows. In addition, the workshop participants identified text-mining aids for finding gene names and symbols (gene indexing), prioritization of documents for curation (document triage) and ontology concept assignment as those most desired by the biocurators. DATABASE URL: http://www.biocreative.org/tasks/bc-workshop-2012/workflow/.

  2. Using Workflows to Explore and Optimise Named Entity Recognition for Chemistry

    PubMed Central

    Kolluru, BalaKrishna; Hawizy, Lezan; Murray-Rust, Peter; Tsujii, Junichi; Ananiadou, Sophia

    2011-01-01

    Chemistry text mining tools should be interoperable and adaptable regardless of system-level implementation, installation or even programming issues. We aim to abstract the functionality of these tools from the underlying implementation via reconfigurable workflows for automatically identifying chemical names. To achieve this, we refactored an established named entity recogniser in the chemistry domain, OSCAR, and studied the impact of each component on the net performance. We developed two reconfigurable workflows from OSCAR using an interoperable text mining framework, U-Compare. These workflows can be altered using the drag-&-drop mechanism of the graphical user interface of U-Compare. These workflows also provide a platform to study the relationship between text mining components such as tokenisation and named entity recognition (using maximum entropy Markov model (MEMM) and pattern recognition based classifiers). Results indicate that, for chemistry in particular, eliminating noise generated by tokenisation techniques leads to slightly better performance than the alternatives, in terms of named entity recognition (NER) accuracy. Poor tokenisation translates into poorer input to the classifier components, which in turn leads to an increase in Type I or Type II errors, thus lowering the overall performance. On the Sciborg corpus, the workflow-based system, which uses a new tokeniser whilst retaining the same MEMM component, increases the F-score from 82.35% to 84.44%. On the PubMed corpus, it recorded an F-score of 84.84% as against 84.23% by OSCAR. PMID:21633495

  3. Using workflows to explore and optimise named entity recognition for chemistry.

    PubMed

    Kolluru, Balakrishna; Hawizy, Lezan; Murray-Rust, Peter; Tsujii, Junichi; Ananiadou, Sophia

    2011-01-01

    Chemistry text mining tools should be interoperable and adaptable regardless of system-level implementation, installation or even programming issues. We aim to abstract the functionality of these tools from the underlying implementation via reconfigurable workflows for automatically identifying chemical names. To achieve this, we refactored an established named entity recogniser in the chemistry domain, OSCAR, and studied the impact of each component on the net performance. We developed two reconfigurable workflows from OSCAR using an interoperable text mining framework, U-Compare. These workflows can be altered using the drag-&-drop mechanism of the graphical user interface of U-Compare. These workflows also provide a platform to study the relationship between text mining components such as tokenisation and named entity recognition (using maximum entropy Markov model (MEMM) and pattern recognition based classifiers). Results indicate that, for chemistry in particular, eliminating noise generated by tokenisation techniques leads to slightly better performance than the alternatives, in terms of named entity recognition (NER) accuracy. Poor tokenisation translates into poorer input to the classifier components, which in turn leads to an increase in Type I or Type II errors, thus lowering the overall performance. On the Sciborg corpus, the workflow-based system, which uses a new tokeniser whilst retaining the same MEMM component, increases the F-score from 82.35% to 84.44%. On the PubMed corpus, it recorded an F-score of 84.84% as against 84.23% by OSCAR.

  4. Sustaining an Online, Shared Community Resource for Models, Robust Open source Software Tools and Data for Volcanology - the Vhub Experience

    NASA Astrophysics Data System (ADS)

    Patra, A. K.; Valentine, G. A.; Bursik, M. I.; Connor, C.; Connor, L.; Jones, M.; Simakov, N.; Aghakhani, H.; Jones-Ivey, R.; Kosar, T.; Zhang, B.

    2015-12-01

    Over the last 5 years we have created the community collaboratory Vhub.org [Palma et al., J. App. Volc. 3:2, doi:10.1186/2191-5040-3-2] as a place to find volcanology-related resources, a venue for users to disseminate tools, teaching resources and data, and an online platform to support collaborative efforts. As the community (current active users > 6000 from an estimated community of comparable size) embeds the tools in the collaboratory into educational and research workflows, it became imperative to: a) redesign tools into robust, open-source, reusable software for online and offline usage/enhancement; b) share large datasets with remote collaborators and other users seamlessly and securely; c) support complex workflows for uncertainty analysis, validation and verification, and data assimilation with large data. The focus on tool development/redevelopment has been twofold: firstly, to use best practices in software engineering and new hardware like multi-core and graphics processing units; secondly, to enhance capabilities to support inverse modeling, uncertainty quantification using large ensembles and design of experiments, calibration, and validation. Among the software engineering practices we follow are open-source development (facilitating community contributions), modularity and reusability. Our initial targets are four popular tools on Vhub - TITAN2D, TEPHRA2, PUFF and LAVA. Use of tools like these requires many observation-driven datasets, e.g. digital elevation models of topography, satellite imagery, and field observations on deposits. These data are often maintained in private repositories that are privately shared by "sneaker-net". As a partial solution to this, we tested mechanisms using iRODS software for online sharing of private data with public metadata and access limits. Finally, we adopted workflow engines (e.g. Pegasus) to support the complex data and computing workflows needed for uses such as uncertainty quantification for hazard analysis using physical models.

  5. New ArcGIS tools developed for stream network extraction and basin delineations using Python and java script

    NASA Astrophysics Data System (ADS)

    Omran, Adel; Dietrich, Schröder; Abouelmagd, Abdou; Michael, Märker

    2016-09-01

    Damage caused by flash flood hazards is an increasing phenomenon, especially in arid and semi-arid areas. Thus, the need to evaluate these areas based on their flash flood risk using maps and hydrological models is also becoming more important. For ungauged watersheds, a tentative analysis can be carried out based on the geomorphometric characteristics of the terrain. To process regions with larger watersheds, where perhaps hundreds of watersheds have to be delineated, processed and classified, the overall process needs to be automated. GIS packages such as ESRI's ArcGIS offer a number of sophisticated tools that support such analyses. Yet there are still gaps and pitfalls that need to be considered if the tools are combined into a geoprocessing model to automate the complete assessment workflow. These gaps include issues such as i) assigning stream order according to Strahler theory, ii) calculating the threshold value for the stream network extraction, and iii) determining the pour points for each of the nodes of the Strahler-ordered stream network. In this study, a completely automated workflow based on ArcGIS ModelBuilder using standard tools is introduced and discussed. Some additional tools have been implemented to complete the overall workflow. These tools have been programmed using Python and Java in the context of ArcObjects. The workflow has been applied to digital data from the southwestern Sinai Peninsula, Egypt. An optimum threshold value has been selected to optimize the drainage configuration by statistically comparing all of the stream configurations extracted from the DEM with the available reference data from topographic maps. The code succeeded in estimating the correct ranking of specific stream orders automatically, without additional manual steps. As a result, the code has proven to save time and effort; hence it is considered a very useful tool for processing large catchment basins.
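
    The Strahler ordering referred to under i) follows a simple recursive rule: headwater segments have order 1, and a confluence of two segments of equal order produces the next order. A minimal sketch (not the ArcObjects tools) is shown below on a toy drainage tree.

```python
# Conceptual sketch of Strahler ordering on a drainage tree (not the ArcObjects tools).
def strahler(segment, upstream):
    """upstream maps a segment id to the list of segments draining into it."""
    children = upstream.get(segment, [])
    if not children:
        return 1                               # headwater segment
    orders = [strahler(c, upstream) for c in children]
    top = max(orders)
    # Two (or more) tributaries of equal top order raise the order by one.
    return top + 1 if orders.count(top) >= 2 else top

# Hypothetical network: segments 1 and 2 join into 3; 3 and 4 join into the outlet 5.
upstream = {3: [1, 2], 5: [3, 4]}
print(strahler(5, upstream))   # -> 2 (an order-2 stream joined by an order-1 tributary stays 2)
```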

  6. How credit scores can make a difference for your revenue cycle.

    PubMed

    Jackson, Garett

    2008-02-01

    Questions an organization should consider before implementing credit scoring include: What action will be taken once a score is obtained? Should the score impact the workflow? Are the appropriate tools available to alter the workflow? How will accounts receivable staff be redirected or allocated? What is the risk tolerance for scoring mistakes?

  7. Analyzing data flows of WLCG jobs at batch job level

    NASA Astrophysics Data System (ADS)

    Kuehn, Eileen; Fischer, Max; Giffels, Manuel; Jung, Christopher; Petzold, Andreas

    2015-05-01

    With the introduction of federated data access to the workflows of WLCG, it is becoming increasingly important for data centers to understand specific data flows regarding storage element accesses, firewall configurations, as well as the scheduling of batch jobs themselves. As existing batch system monitoring and related system monitoring tools do not support measurements at the batch job level, a new tool has been developed and put into operation at the GridKa Tier 1 center for monitoring continuous data streams and characteristics of WLCG jobs and pilots. Long-term measurements and data collection are in progress. These measurements have already proven useful for analyzing misbehavior and various other issues. We therefore aim for an automated, real-time approach to anomaly detection. As a prerequisite, prototypes for standard workflows have to be examined. Based on measurements over several months, different features of HEP jobs are evaluated regarding their effectiveness for data-mining approaches that identify these common workflows. The paper introduces the measurement approach and statistics, as well as the general concept and first results in classifying different HEP job workflows derived from the measurements at GridKa.

  8. Connecting proteins with drug-like compounds: Open source drug discovery workflows with BindingDB and KNIME

    PubMed Central

    Berthold, Michael R.; Hedrick, Michael P.; Gilson, Michael K.

    2015-01-01

    Today’s large, public databases of protein–small molecule interaction data are creating important new opportunities for data mining and integration. At the same time, new graphical user interface-based workflow tools offer facile alternatives to custom scripting for informatics and data analysis. Here, we illustrate how the large protein-ligand database BindingDB may be incorporated into KNIME workflows as a step toward the integration of pharmacological data with broader biomolecular analyses. Thus, we describe a collection of KNIME workflows that access BindingDB data via RESTful webservices and, for more intensive queries, via a local distillation of the full BindingDB dataset. We focus in particular on the KNIME implementation of knowledge-based tools to generate informed hypotheses regarding protein targets of bioactive compounds, based on notions of chemical similarity. A number of variants of this basic approach are tested for seven existing drugs with relatively ill-defined therapeutic targets, leading to replication of some previously confirmed results and discovery of new, high-quality hits. Implications for future development are discussed. Database URL: www.bindingdb.org PMID:26384374
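
    The KNIME workflows described here rest on chemical-similarity searching. As a hedged, stand-alone analogue (not the authors' KNIME nodes and not BindingDB's actual web-service interface), the sketch below uses RDKit to rank a small set of annotated reference ligands by Tanimoto similarity to a query compound, which is the core step behind similarity-based target hypotheses. The SMILES strings and target labels are illustrative.

        # Hedged sketch: similarity-based target hypothesis generation with RDKit.
        # Reference ligands and target annotations are illustrative placeholders;
        # a real workflow would pull them from BindingDB extracts instead.
        from rdkit import Chem, DataStructs
        from rdkit.Chem import AllChem

        reference_ligands = [
            ("CC(=O)Oc1ccccc1C(=O)O", "hypothetical target A"),       # aspirin-like
            ("CN1CCC[C@H]1c1cccnc1", "hypothetical target B"),        # nicotine-like
            ("CC(C)Cc1ccc(cc1)C(C)C(=O)O", "hypothetical target C"),  # ibuprofen-like
        ]
        query_smiles = "CC(C)Cc1ccc(cc1)C(C)C(=O)O"

        def fingerprint(smiles):
            mol = Chem.MolFromSmiles(smiles)
            if mol is None:
                raise ValueError(f"Could not parse SMILES: {smiles}")
            return AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=2048)

        query_fp = fingerprint(query_smiles)
        hits = []
        for smiles, target in reference_ligands:
            sim = DataStructs.TanimotoSimilarity(query_fp, fingerprint(smiles))
            hits.append((sim, target, smiles))

        # Rank targets by the most similar annotated ligand; a similarity cutoff
        # would turn the ranking into a target hypothesis list.
        for sim, target, smiles in sorted(hits, reverse=True):
            print(f"{sim:.2f}  {target}  {smiles}")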

  9. Scientific workflows as productivity tools for drug discovery.

    PubMed

    Shon, John; Ohkawa, Hitomi; Hammer, Juergen

    2008-05-01

    Large pharmaceutical companies annually invest tens to hundreds of millions of US dollars in research informatics to support their early drug discovery processes. Traditionally, most of these investments are designed to increase the efficiency of drug discovery. The introduction of do-it-yourself scientific workflow platforms has enabled research informatics organizations to shift their efforts toward scientific innovation, ultimately resulting in a possible increase in return on their investments. Unlike the handling of most scientific data and application integration approaches, researchers apply scientific workflows to in silico experimentation and exploration, leading to scientific discoveries that lie beyond automation and integration. This review highlights some key requirements for scientific workflow environments in the pharmaceutical industry that are necessary for increasing research productivity. Examples of the application of scientific workflows in research and a summary of recent platform advances are also provided.

  10. A workflow learning model to improve geovisual analytics utility

    PubMed Central

    Roth, Robert E; MacEachren, Alan M; McCabe, Craig A

    2011-01-01

    Introduction This paper describes the design and implementation of the G-EX Portal Learn Module, a web-based, geocollaborative application for organizing and distributing digital learning artifacts. G-EX falls into the broader context of geovisual analytics, a new research area with the goal of supporting visually-mediated reasoning about large, multivariate, spatiotemporal information. Because this information is unprecedented in amount and complexity, GIScientists are tasked with the development of new tools and techniques to make sense of it. Our research addresses the challenge of implementing these geovisual analytics tools and techniques in a useful manner. Objectives The objective of this paper is to develop and implement a method for improving the utility of geovisual analytics software. The success of software is measured by its usability (i.e., how easy the software is to use?) and utility (i.e., how useful the software is). The usability and utility of software can be improved by refining the software, increasing user knowledge about the software, or both. It is difficult to achieve transparent usability (i.e., software that is immediately usable without training) of geovisual analytics software because of the inherent complexity of the included tools and techniques. In these situations, improving user knowledge about the software through the provision of learning artifacts is as important, if not more so, than iterative refinement of the software itself. Therefore, our approach to improving utility is focused on educating the user. Methodology The research reported here was completed in two steps. First, we developed a model for learning about geovisual analytics software. Many existing digital learning models assist only with use of the software to complete a specific task and provide limited assistance with its actual application. To move beyond task-oriented learning about software use, we propose a process-oriented approach to learning based on the concept of scientific workflows. Second, we implemented an interface in the G-EX Portal Learn Module to demonstrate the workflow learning model. The workflow interface allows users to drag learning artifacts uploaded to the G-EX Portal onto a central whiteboard and then annotate the workflow using text and drawing tools. Once completed, users can visit the assembled workflow to get an idea of the kind, number, and scale of analysis steps, view individual learning artifacts associated with each node in the workflow, and ask questions about the overall workflow or individual learning artifacts through the associated forums. An example learning workflow in the domain of epidemiology is provided to demonstrate the effectiveness of the approach. Results/Conclusions In the context of geovisual analytics, GIScientists are not only responsible for developing software to facilitate visually-mediated reasoning about large and complex spatiotemporal information, but also for ensuring that this software works. The workflow learning model discussed in this paper and demonstrated in the G-EX Portal Learn Module is one approach to improving the utility of geovisual analytics software. While development of the G-EX Portal Learn Module is ongoing, we expect to release the G-EX Portal Learn Module by Summer 2009. PMID:21983545

  11. A workflow learning model to improve geovisual analytics utility.

    PubMed

    Roth, Robert E; Maceachren, Alan M; McCabe, Craig A

    2009-01-01

    INTRODUCTION: This paper describes the design and implementation of the G-EX Portal Learn Module, a web-based, geocollaborative application for organizing and distributing digital learning artifacts. G-EX falls into the broader context of geovisual analytics, a new research area with the goal of supporting visually-mediated reasoning about large, multivariate, spatiotemporal information. Because this information is unprecedented in amount and complexity, GIScientists are tasked with the development of new tools and techniques to make sense of it. Our research addresses the challenge of implementing these geovisual analytics tools and techniques in a useful manner. OBJECTIVES: The objective of this paper is to develop and implement a method for improving the utility of geovisual analytics software. The success of software is measured by its usability (i.e., how easy the software is to use?) and utility (i.e., how useful the software is). The usability and utility of software can be improved by refining the software, increasing user knowledge about the software, or both. It is difficult to achieve transparent usability (i.e., software that is immediately usable without training) of geovisual analytics software because of the inherent complexity of the included tools and techniques. In these situations, improving user knowledge about the software through the provision of learning artifacts is as important, if not more so, than iterative refinement of the software itself. Therefore, our approach to improving utility is focused on educating the user. METHODOLOGY: The research reported here was completed in two steps. First, we developed a model for learning about geovisual analytics software. Many existing digital learning models assist only with use of the software to complete a specific task and provide limited assistance with its actual application. To move beyond task-oriented learning about software use, we propose a process-oriented approach to learning based on the concept of scientific workflows. Second, we implemented an interface in the G-EX Portal Learn Module to demonstrate the workflow learning model. The workflow interface allows users to drag learning artifacts uploaded to the G-EX Portal onto a central whiteboard and then annotate the workflow using text and drawing tools. Once completed, users can visit the assembled workflow to get an idea of the kind, number, and scale of analysis steps, view individual learning artifacts associated with each node in the workflow, and ask questions about the overall workflow or individual learning artifacts through the associated forums. An example learning workflow in the domain of epidemiology is provided to demonstrate the effectiveness of the approach. RESULTS/CONCLUSIONS: In the context of geovisual analytics, GIScientists are not only responsible for developing software to facilitate visually-mediated reasoning about large and complex spatiotemporal information, but also for ensuring that this software works. The workflow learning model discussed in this paper and demonstrated in the G-EX Portal Learn Module is one approach to improving the utility of geovisual analytics software. While development of the G-EX Portal Learn Module is ongoing, we expect to release the G-EX Portal Learn Module by Summer 2009.

  12. A web-based rapid assessment tool for production publishing solutions

    NASA Astrophysics Data System (ADS)

    Sun, Tong

    2010-02-01

    Solution assessment is a critical first step in understanding and measuring the business-process efficiency enabled by an integrated solution package. However, assessing the effectiveness of any solution is usually a very expensive and time-consuming task that requires substantial domain knowledge, collecting and understanding the specific customer's operational context, defining validation scenarios, and estimating the expected performance and operational cost. This paper presents an intelligent web-based tool that can rapidly assess any given solution package for production publishing workflows via a simulation engine and create a report of estimated performance metrics (e.g., throughput, turnaround time, resource utilization) and operational cost. By integrating a digital publishing workflow ontology and an activity-based costing model with a Petri-net-based workflow simulation engine, this web-based tool allows users to quickly evaluate potential digital publishing solutions side by side within their desired operational contexts, and provides a low-cost, rapid assessment for organizations before they commit to any purchase. The tool also benefits solution providers by shortening sales cycles, establishing trustworthy customer relationships, and supplementing professional assessment services with a proven quantitative simulation and estimation technology.

  13. The Perfect Neuroimaging-Genetics-Computation Storm: Collision of Petabytes of Data, Millions of Hardware Devices and Thousands of Software Tools

    PubMed Central

    Dinov, Ivo D.; Petrosyan, Petros; Liu, Zhizhong; Eggert, Paul; Zamanyan, Alen; Torri, Federica; Macciardi, Fabio; Hobel, Sam; Moon, Seok Woo; Sung, Young Hee; Jiang, Zhiguo; Labus, Jennifer; Kurth, Florian; Ashe-McNalley, Cody; Mayer, Emeran; Vespa, Paul M.; Van Horn, John D.; Toga, Arthur W.

    2013-01-01

    The volume, diversity and velocity of biomedical data are exponentially increasing providing petabytes of new neuroimaging and genetics data every year. At the same time, tens-of-thousands of computational algorithms are developed and reported in the literature along with thousands of software tools and services. Users demand intuitive, quick and platform-agnostic access to data, software tools, and infrastructure from millions of hardware devices. This explosion of information, scientific techniques, computational models, and technological advances leads to enormous challenges in data analysis, evidence-based biomedical inference and reproducibility of findings. The Pipeline workflow environment provides a crowd-based distributed solution for consistent management of these heterogeneous resources. The Pipeline allows multiple (local) clients and (remote) servers to connect, exchange protocols, control the execution, monitor the states of different tools or hardware, and share complete protocols as portable XML workflows. In this paper, we demonstrate several advanced computational neuroimaging and genetics case-studies, and end-to-end pipeline solutions. These are implemented as graphical workflow protocols in the context of analyzing imaging (sMRI, fMRI, DTI), phenotypic (demographic, clinical), and genetic (SNP) data. PMID:23975276

  14. Workflow4Metabolomics: a collaborative research infrastructure for computational metabolomics

    PubMed Central

    Giacomoni, Franck; Le Corguillé, Gildas; Monsoor, Misharl; Landi, Marion; Pericard, Pierre; Pétéra, Mélanie; Duperier, Christophe; Tremblay-Franco, Marie; Martin, Jean-François; Jacob, Daniel; Goulitquer, Sophie; Thévenot, Etienne A.; Caron, Christophe

    2015-01-01

    Summary: The complex, rapidly evolving field of computational metabolomics calls for collaborative infrastructures where the large volume of new algorithms for data pre-processing, statistical analysis and annotation can be readily integrated whatever the language, evaluated on reference datasets and chained to build ad hoc workflows for users. We have developed Workflow4Metabolomics (W4M), the first fully open-source and collaborative online platform for computational metabolomics. W4M is a virtual research environment built upon the Galaxy web-based platform technology. It enables ergonomic integration, exchange and running of individual modules and workflows. Alternatively, the whole W4M framework and computational tools can be downloaded as a virtual machine for local installation. Availability and implementation: http://workflow4metabolomics.org homepage enables users to open a private account and access the infrastructure. W4M is developed and maintained by the French Bioinformatics Institute (IFB) and the French Metabolomics and Fluxomics Infrastructure (MetaboHUB). Contact: contact@workflow4metabolomics.org PMID:25527831

  15. Workflow4Metabolomics: a collaborative research infrastructure for computational metabolomics.

    PubMed

    Giacomoni, Franck; Le Corguillé, Gildas; Monsoor, Misharl; Landi, Marion; Pericard, Pierre; Pétéra, Mélanie; Duperier, Christophe; Tremblay-Franco, Marie; Martin, Jean-François; Jacob, Daniel; Goulitquer, Sophie; Thévenot, Etienne A; Caron, Christophe

    2015-05-01

    The complex, rapidly evolving field of computational metabolomics calls for collaborative infrastructures where the large volume of new algorithms for data pre-processing, statistical analysis and annotation can be readily integrated whatever the language, evaluated on reference datasets and chained to build ad hoc workflows for users. We have developed Workflow4Metabolomics (W4M), the first fully open-source and collaborative online platform for computational metabolomics. W4M is a virtual research environment built upon the Galaxy web-based platform technology. It enables ergonomic integration, exchange and running of individual modules and workflows. Alternatively, the whole W4M framework and computational tools can be downloaded as a virtual machine for local installation. http://workflow4metabolomics.org homepage enables users to open a private account and access the infrastructure. W4M is developed and maintained by the French Bioinformatics Institute (IFB) and the French Metabolomics and Fluxomics Infrastructure (MetaboHUB). contact@workflow4metabolomics.org. © The Author 2014. Published by Oxford University Press.

  16. CamBAfx: Workflow Design, Implementation and Application for Neuroimaging

    PubMed Central

    Ooi, Cinly; Bullmore, Edward T.; Wink, Alle-Meije; Sendur, Levent; Barnes, Anna; Achard, Sophie; Aspden, John; Abbott, Sanja; Yue, Shigang; Kitzbichler, Manfred; Meunier, David; Maxim, Voichita; Salvador, Raymond; Henty, Julian; Tait, Roger; Subramaniam, Naresh; Suckling, John

    2009-01-01

    CamBAfx is a workflow application designed for both researchers who use workflows to process data (consumers) and those who design them (designers). It provides a front-end (user interface) optimized for data processing designed in a way familiar to consumers. The back-end uses a pipeline model to represent workflows since this is a common and useful metaphor used by designers and is easy to manipulate compared to other representations like programming scripts. As an Eclipse Rich Client Platform application, CamBAfx's pipelines and functions can be bundled with the software or downloaded post-installation. The user interface contains all the workflow facilities expected by consumers. Using the Eclipse Extension Mechanism designers are encouraged to customize CamBAfx for their own pipelines. CamBAfx wraps a workflow facility around neuroinformatics software without modification. CamBAfx's design, licensing and Eclipse Branding Mechanism allow it to be used as the user interface for other software, facilitating exchange of innovative computational tools between originating labs. PMID:19826470

  17. Contextual cloud-based service oriented architecture for clinical workflow.

    PubMed

    Moreno-Conde, Jesús; Moreno-Conde, Alberto; Núñez-Benjumea, Francisco J; Parra-Calderón, Carlos

    2015-01-01

    Multiple papers have highlighted the importance of integrating tools with the clinical workflow in order to gain acceptance of systems within the healthcare domain. This paper analyses how clinical context management could be deployed in order to promote the adoption of advanced cloud services within the clinical workflow. This deployment will be able to be integrated with the specifications promoted by the eHealth European Interoperability Framework. The paper proposes a cloud-based service-oriented architecture that will implement a context management system aligned with the HL7 standard known as CCOW.

  18. Git Replacement for the

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Robinson, P.

    2014-09-23

    GRAPE is a tool for managing software project workflows for the Git version control system. It provides a suite of tools to simplify and configure branch based development, integration with a project's testing suite, and integration with the Atlassian Stash repository hosting tool.

  19. Cytoscape: the network visualization tool for GenomeSpace workflows.

    PubMed

    Demchak, Barry; Hull, Tim; Reich, Michael; Liefeld, Ted; Smoot, Michael; Ideker, Trey; Mesirov, Jill P

    2014-01-01

    Modern genomic analysis often requires workflows incorporating multiple best-of-breed tools. GenomeSpace is a web-based visual workbench that combines a selection of these tools with mechanisms that create data flows between them. One such tool is Cytoscape 3, a popular application that enables analysis and visualization of graph-oriented genomic networks. As Cytoscape runs on the desktop, and not in a web browser, integrating it into GenomeSpace required special care in creating a seamless user experience and enabling appropriate data flows. In this paper, we present the design and operation of the Cytoscape GenomeSpace app, which accomplishes this integration, thereby providing critical analysis and visualization functionality for GenomeSpace users. It has been downloaded over 850 times since the release of its first version in September, 2013.

  20. Cytoscape: the network visualization tool for GenomeSpace workflows

    PubMed Central

    Demchak, Barry; Hull, Tim; Reich, Michael; Liefeld, Ted; Smoot, Michael; Ideker, Trey; Mesirov, Jill P.

    2014-01-01

    Modern genomic analysis often requires workflows incorporating multiple best-of-breed tools. GenomeSpace is a web-based visual workbench that combines a selection of these tools with mechanisms that create data flows between them. One such tool is Cytoscape 3, a popular application that enables analysis and visualization of graph-oriented genomic networks. As Cytoscape runs on the desktop, and not in a web browser, integrating it into GenomeSpace required special care in creating a seamless user experience and enabling appropriate data flows. In this paper, we present the design and operation of the Cytoscape GenomeSpace app, which accomplishes this integration, thereby providing critical analysis and visualization functionality for GenomeSpace users. It has been downloaded over 850 times since the release of its first version in September, 2013. PMID:25165537

  1. Validation of Methods to Assess the Immunoglobulin Gene Repertoire in Tissues Obtained from Mice on the International Space Station.

    PubMed

    Rettig, Trisha A; Ward, Claire; Pecaut, Michael J; Chapes, Stephen K

    2017-07-01

    Spaceflight is known to affect immune cell populations. In particular, splenic B cell numbers decrease during spaceflight and in ground-based physiological models. Although antibody isotype changes have been assessed during and after spaceflight, an extensive characterization of the impact of spaceflight on antibody composition has not been conducted in mice. Next Generation Sequencing and bioinformatic tools are now available to assess antibody repertoires. We can now identify immunoglobulin gene-segment usage, junctional regions, and modifications that contribute to specificity and diversity. Because of limitations on the International Space Station, alternate sample collection and storage methods must be employed. Our group compared Illumina MiSeq sequencing data from multiple sample preparation methods in normal C57BL/6J mice to validate that sample preparation and storage would not bias the outcome of antibody repertoire characterization. In this report, we also compared the effects of sequencing techniques and of a bioinformatic workflow on the data output when assessing IgH and Igκ variable gene usage. This included assessments of our bioinformatic workflow on Illumina HiSeq and MiSeq datasets; the workflow is specifically designed to reduce bias, capture the most information from Ig sequences, and produce a dataset that provides other data-mining options. We validated our workflow by comparing our normal-mouse MiSeq data to existing murine antibody repertoire studies, establishing its suitability for future antibody repertoire studies.

  2. From Provenance Standards and Tools to Queries and Actionable Provenance

    NASA Astrophysics Data System (ADS)

    Ludaescher, B.

    2017-12-01

    The W3C PROV standard provides a minimal core for sharing retrospective provenance information for scientific workflows and scripts. PROV extensions such as DataONE's ProvONE model are necessary for linking runtime observables in retrospective provenance records with conceptual-level prospective provenance information, i.e., workflow (or dataflow) graphs. Runtime provenance recorders, such as DataONE's RunManager for R, or noWorkflow for Python capture retrospective provenance automatically. YesWorkflow (YW) is a toolkit that allows researchers to declare high-level prospective provenance models of scripts via simple inline comments (YW-annotations), revealing the computational modules and dataflow dependencies in the script. By combining and linking both forms of provenance, important queries and use cases can be supported that neither provenance model can afford on its own. We present existing and emerging provenance tools developed for the DataONE and SKOPE (Synthesizing Knowledge of Past Environments) projects. We show how the different tools can be used individually and in combination to model, capture, share, query, and visualize provenance information. We also present challenges and opportunities for making provenance information more immediately actionable for the researchers who create it in the first place. We argue that such a shift towards "provenance-for-self" is necessary to accelerate the creation, sharing, and use of provenance in support of transparent, reproducible computational and data science.
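
    As a hedged illustration of the YW-annotation idea described above, the short Python script below marks its computational blocks and dataflow with inline comments in the @begin/@in/@out/@end style that YesWorkflow parses. The block and file names are hypothetical, and the exact tag set supported by a given YesWorkflow release should be checked against its documentation.

        # Hedged sketch: a script carrying YesWorkflow-style prospective-provenance
        # annotations as ordinary comments. Names and files are hypothetical.
        # @begin clean_and_summarize @in raw_csv @out summary_txt
        import csv
        import statistics

        # @begin load_observations @in raw_csv @out observations
        observations = []
        with open("raw.csv") as fh:                 # raw_csv (hypothetical input)
            for row in csv.DictReader(fh):
                observations.append(float(row["value"]))
        # @end load_observations

        # @begin summarize @in observations @out summary
        summary = {
            "n": len(observations),
            "mean": statistics.mean(observations),
            "stdev": statistics.pstdev(observations),
        }
        # @end summarize

        # @begin report @in summary @out summary_txt
        with open("summary.txt", "w") as fh:        # summary_txt (hypothetical output)
            for key, value in summary.items():
                fh.write(f"{key}: {value}\n")
        # @end report
        # @end clean_and_summarize

    A runtime recorder such as noWorkflow would then capture the corresponding retrospective provenance, which can be linked back to these declared blocks.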

  3. Developing a Taxonomy of Characteristics and Features of Collaboration Tools for Teams in Distributed Environments

    DTIC Science & Technology

    2007-09-01

    [Extraction residue from a table of collaboration tools in the original report. Recoverable entries include Bricolage (http://www.bricolage.cc), whose workflow features provide customizable control over editorial content, and the Nuxeo Collaborative Portal, whose workspace supports adding, editing, and deleting content through a web interface; a BlackBerry product URL (http://www.blackberry.com/products/blackberry/index.shtml) from the same table also survives.]

  4. New Tools For Understanding Microbial Diversity Using High-throughput Sequence Data

    NASA Astrophysics Data System (ADS)

    Knight, R.; Hamady, M.; Liu, Z.; Lozupone, C.

    2007-12-01

    High-throughput sequencing techniques such as 454 are straining the limits of tools traditionally used to build trees, choose OTUs, and perform other essential sequencing tasks. We have developed a workflow for phylogenetic analysis of large-scale sequence data sets that combines existing tools, such as the Arb phylogeny package and the NAST multiple sequence alignment tool, with new methods for choosing and clustering OTUs and for performing phylogenetic community analysis with UniFrac. This talk discusses the cyberinfrastructure we are developing to support the human microbiome project, and the application of these workflows to analyze very large data sets that contrast the gut microbiota with a range of physical environments. These tools will ultimately help to define core and peripheral microbiomes in a range of environments, and will allow us to understand the physical and biotic factors that contribute most to differences in microbial diversity.

  5. KDE Bioscience: platform for bioinformatics analysis workflows.

    PubMed

    Lu, Qiang; Hao, Pei; Curcin, Vasa; He, Weizhong; Li, Yuan-Yuan; Luo, Qing-Ming; Guo, Yi-Ke; Li, Yi-Xue

    2006-08-01

    Bioinformatics is a dynamic research area in which a large number of algorithms and programs have been developed rapidly and independently without much consideration so far of the need for standardization. The lack of such common standards combined with unfriendly interfaces make it difficult for biologists to learn how to use these tools and to translate the data formats from one to another. Consequently, the construction of an integrative bioinformatics platform to facilitate biologists' research is an urgent and challenging task. KDE Bioscience is a java-based software platform that collects a variety of bioinformatics tools and provides a workflow mechanism to integrate them. Nucleotide and protein sequences from local flat files, web sites, and relational databases can be entered, annotated, and aligned. Several home-made or 3rd-party viewers are built-in to provide visualization of annotations or alignments. KDE Bioscience can also be deployed in client-server mode where simultaneous execution of the same workflow is supported for multiple users. Moreover, workflows can be published as web pages that can be executed from a web browser. The power of KDE Bioscience comes from the integrated algorithms and data sources. With its generic workflow mechanism other novel calculations and simulations can be integrated to augment the current sequence analysis functions. Because of this flexible and extensible architecture, KDE Bioscience makes an ideal integrated informatics environment for future bioinformatics or systems biology research.

  6. Process Integration and Optimization of ICME Carbon Fiber Composites for Vehicle Lightweighting: A Preliminary Development

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Xu, Hongyi; Li, Yang; Zeng, Danielle

    Process integration and optimization is the key enabler of the Integrated Computational Materials Engineering (ICME) of carbon fiber composites. In this paper, automated workflows are developed for two types of composites: Sheet Molding Compounds (SMC) short fiber composites, and multi-layer unidirectional (UD) composites. For SMC, the proposed workflow integrates material processing simulation, microstructure representation volume element (RVE) models, material property prediction and structure preformation simulation to enable multiscale, multidisciplinary analysis and design. Processing parameters, microstructure parameters and vehicle subframe geometry parameters are defined as the design variables; the stiffness and weight of the structure are defined as the responses. For multi-layer UD structure, this work focuses on the discussion of different design representation methods and their impacts on the optimization performance. Challenges in ICME process integration and optimization are also summarized and highlighted. Two case studies are conducted to demonstrate the integrated process and its application in optimization.

  7. Teach-Discover-Treat (TDT): Collaborative Computational Drug Discovery for Neglected Diseases

    PubMed Central

    Jansen, Johanna M.; Cornell, Wendy; Tseng, Y. Jane; Amaro, Rommie E.

    2012-01-01

    Teach – Discover – Treat (TDT) is an initiative to promote the development and sharing of computational tools solicited through a competition with the aim to impact education and collaborative drug discovery for neglected diseases. Collaboration, multidisciplinary integration, and innovation are essential for successful drug discovery. This requires a workforce that is trained in state-of-the-art workflows and equipped with the ability to collaborate on platforms that are accessible and free. The TDT competition solicits high quality computational workflows for neglected disease targets, using freely available, open access tools. PMID:23085175

  8. Attribute classification for generating GPR facies models

    NASA Astrophysics Data System (ADS)

    Tronicke, Jens; Allroggen, Niklas

    2017-04-01

    Ground-penetrating radar (GPR) is an established geophysical tool for exploring near-surface sedimentary environments. It has been used successfully, for example, to reconstruct past depositional environments, to investigate sedimentary processes, to aid hydrogeological investigations, and to assist in hydrocarbon reservoir analog studies. Interpreting such 2D/3D GPR data usually relies on concepts known as GPR facies analysis, in which GPR facies are defined as units composed of characteristic reflection patterns (in terms of reflection amplitude, continuity, geometry, and internal configuration). The resulting facies models are then interpreted in terms of depositional processes, sedimentary environments, litho-, and hydrofacies. Typically, such GPR facies analyses are implemented as a manual workflow, which is laborious and rather inefficient, especially for 3D data sets. In addition, such a subjective strategy bears the potential for inconsistency because the outcome depends on the expertise and experience of the interpreter. In this presentation, we investigate the feasibility of delineating GPR facies in an objective and largely automated manner. Our proposed workflow relies on a three-step procedure. First, we calculate a variety of geometrical and physical attributes from processed 2D and 3D GPR data sets. Then, we analyze and evaluate this attribute data base (e.g., using statistical tools such as principal component analysis) to reduce its dimensionality and to avoid redundant information. Finally, we integrate the reduced data base using tools such as composite imaging, cluster analysis, and neural networks. Using field examples acquired across different depositional environments, we demonstrate that the resulting 2D/3D facies models ease and improve the interpretation of GPR data. We conclude that our interpretation strategy allows GPR facies models to be generated in a consistent and largely automated manner and might be helpful in a variety of near-surface applications.
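
    A hedged sketch of the second and third steps of the proposed strategy (dimensionality reduction of the attribute base followed by unsupervised clustering into facies) is given below using scikit-learn. The attribute array, the number of retained components, and the number of facies are hypothetical placeholders rather than values from the field examples.

        # Hedged sketch: reduce a GPR attribute data base to a few components and
        # cluster the result into facies classes. The attribute data are synthetic.
        import numpy as np
        from sklearn.preprocessing import StandardScaler
        from sklearn.decomposition import PCA
        from sklearn.cluster import KMeans

        # Hypothetical attribute table: n_samples trace positions x n_attributes
        # (e.g. instantaneous amplitude, coherence, dip, texture measures).
        rng = np.random.default_rng(0)
        n_samples, n_attributes = 5000, 12
        attributes = rng.normal(size=(n_samples, n_attributes))

        # 1) Standardize so attributes with large numeric ranges do not dominate.
        scaled = StandardScaler().fit_transform(attributes)

        # 2) Reduce redundancy; keep enough components for ~90% explained variance.
        pca = PCA(n_components=0.90)
        scores = pca.fit_transform(scaled)

        # 3) Cluster the reduced attribute space into a chosen number of facies.
        n_facies = 4          # placeholder; in practice chosen by the interpreter
        facies = KMeans(n_clusters=n_facies, n_init=10, random_state=0).fit_predict(scores)

        print(f"kept {pca.n_components_} components,",
              f"facies counts: {np.bincount(facies)}")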

  9. Conceptual-level workflow modeling of scientific experiments using NMR as a case study

    PubMed Central

    Verdi, Kacy K; Ellis, Heidi JC; Gryk, Michael R

    2007-01-01

    Background Scientific workflows improve the process of scientific experiments by making computations explicit, underscoring data flow, and emphasizing the participation of humans in the process when intuition and human reasoning are required. Workflows for experiments also highlight transitions among experimental phases, allowing intermediate results to be verified and supporting the proper handling of semantic mismatches and different file formats among the various tools used in the scientific process. Thus, scientific workflows are important for the modeling and subsequent capture of bioinformatics-related data. While much research has been conducted on the implementation of scientific workflows, the initial process of actually designing and generating the workflow at the conceptual level has received little consideration. Results We propose a structured process to capture scientific workflows at the conceptual level that allows workflows to be documented efficiently, results in concise models of the workflow and more-correct workflow implementations, and provides insight into the scientific process itself. The approach uses three modeling techniques to model the structural, data flow, and control flow aspects of the workflow. The domain of biomolecular structure determination using Nuclear Magnetic Resonance spectroscopy is used to demonstrate the process. Specifically, we show the application of the approach to capture the workflow for the process of conducting biomolecular analysis using Nuclear Magnetic Resonance (NMR) spectroscopy. Conclusion Using the approach, we were able to accurately document, in a short amount of time, numerous steps in the process of conducting an experiment using NMR spectroscopy. The resulting models are correct and precise, as outside validation of the models identified only minor omissions in the models. In addition, the models provide an accurate visual description of the control flow for conducting biomolecular analysis using NMR spectroscopy experiment. PMID:17263870

  10. Conceptual-level workflow modeling of scientific experiments using NMR as a case study.

    PubMed

    Verdi, Kacy K; Ellis, Heidi Jc; Gryk, Michael R

    2007-01-30

    Scientific workflows improve the process of scientific experiments by making computations explicit, underscoring data flow, and emphasizing the participation of humans in the process when intuition and human reasoning are required. Workflows for experiments also highlight transitions among experimental phases, allowing intermediate results to be verified and supporting the proper handling of semantic mismatches and different file formats among the various tools used in the scientific process. Thus, scientific workflows are important for the modeling and subsequent capture of bioinformatics-related data. While much research has been conducted on the implementation of scientific workflows, the initial process of actually designing and generating the workflow at the conceptual level has received little consideration. We propose a structured process to capture scientific workflows at the conceptual level that allows workflows to be documented efficiently, results in concise models of the workflow and more-correct workflow implementations, and provides insight into the scientific process itself. The approach uses three modeling techniques to model the structural, data flow, and control flow aspects of the workflow. The domain of biomolecular structure determination using Nuclear Magnetic Resonance spectroscopy is used to demonstrate the process. Specifically, we show the application of the approach to capture the workflow for the process of conducting biomolecular analysis using Nuclear Magnetic Resonance (NMR) spectroscopy. Using the approach, we were able to accurately document, in a short amount of time, numerous steps in the process of conducting an experiment using NMR spectroscopy. The resulting models are correct and precise, as outside validation of the models identified only minor omissions in the models. In addition, the models provide an accurate visual description of the control flow for conducting biomolecular analysis using NMR spectroscopy experiment.

  11. A scientific workflow framework for (13)C metabolic flux analysis.

    PubMed

    Dalman, Tolga; Wiechert, Wolfgang; Nöh, Katharina

    2016-08-20

    Metabolic flux analysis (MFA) with (13)C labeling data is a high-precision technique to quantify intracellular reaction rates (fluxes). One of the major challenges of (13)C MFA is the interactivity of the computational workflow according to which the fluxes are determined from the input data (metabolic network model, labeling data, and physiological rates). Here, the workflow assembly is inevitably determined by the scientist who has to consider interacting biological, experimental, and computational aspects. Decision-making is context dependent and requires expertise, rendering an automated evaluation process hardly possible. Here, we present a scientific workflow framework (SWF) for creating, executing, and controlling on demand (13)C MFA workflows. (13)C MFA-specific tools and libraries, such as the high-performance simulation toolbox 13CFLUX2, are wrapped as web services and thereby integrated into a service-oriented architecture. Besides workflow steering, the SWF features transparent provenance collection and enables full flexibility for ad hoc scripting solutions. To handle compute-intensive tasks, cloud computing is supported. We demonstrate how the challenges posed by (13)C MFA workflows can be solved with our approach on the basis of two proof-of-concept use cases. Copyright © 2015 Elsevier B.V. All rights reserved.

  12. VIPER: Visualization Pipeline for RNA-seq, a Snakemake workflow for efficient and complete RNA-seq analysis.

    PubMed

    Cornwell, MacIntosh; Vangala, Mahesh; Taing, Len; Herbert, Zachary; Köster, Johannes; Li, Bo; Sun, Hanfei; Li, Taiwen; Zhang, Jian; Qiu, Xintao; Pun, Matthew; Jeselsohn, Rinath; Brown, Myles; Liu, X Shirley; Long, Henry W

    2018-04-12

    RNA sequencing has become a ubiquitous technology used throughout life sciences as an effective method of measuring RNA abundance quantitatively in tissues and cells. The increase in use of RNA-seq technology has led to the continuous development of new tools for every step of analysis from alignment to downstream pathway analysis. However, effectively using these analysis tools in a scalable and reproducible way can be challenging, especially for non-experts. Using the workflow management system Snakemake we have developed a user friendly, fast, efficient, and comprehensive pipeline for RNA-seq analysis. VIPER (Visualization Pipeline for RNA-seq analysis) is an analysis workflow that combines some of the most popular tools to take RNA-seq analysis from raw sequencing data, through alignment and quality control, into downstream differential expression and pathway analysis. VIPER has been created in a modular fashion to allow for the rapid incorporation of new tools to expand the capabilities. This capacity has already been exploited to include very recently developed tools that explore immune infiltrate and T-cell CDR (Complementarity-Determining Regions) reconstruction abilities. The pipeline has been conveniently packaged such that minimal computational skills are required to download and install the dozens of software packages that VIPER uses. VIPER is a comprehensive solution that performs most standard RNA-seq analyses quickly and effectively with a built-in capacity for customization and expansion.
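
    As a hedged illustration of the Snakemake mechanics that a pipeline like VIPER builds on (not VIPER's actual rules, tools, or file layout), the Snakefile fragment below chains three rules from raw FASTQ files to a merged count matrix. Sample names and the shell commands (my_aligner, my_counter) are placeholders standing in for real aligner and quantification calls.

        # Hedged Snakefile sketch: generic RNA-seq rule chaining, not VIPER's rules.
        # Sample names and shell commands are placeholders.
        SAMPLES = ["sampleA", "sampleB"]            # hypothetical samples

        # Final target: a merged count matrix built from all samples.
        rule all:
            input:
                "results/counts_matrix.tsv"

        # Placeholder alignment step; a real pipeline would call STAR, HISAT2, etc.
        rule align:
            input:
                "fastq/{sample}.fastq.gz"
            output:
                "aligned/{sample}.bam"
            shell:
                "my_aligner --reads {input} --out {output}"

        # Placeholder per-sample quantification step.
        rule quantify:
            input:
                "aligned/{sample}.bam"
            output:
                "counts/{sample}.tsv"
            shell:
                "my_counter {input} > {output}"

        # Combine per-sample counts; 'paste' is a crude stand-in for a proper merge.
        rule merge_counts:
            input:
                expand("counts/{sample}.tsv", sample=SAMPLES)
            output:
                "results/counts_matrix.tsv"
            shell:
                "paste {input} > {output}"

    Snakemake infers the job graph from the input/output declarations, which is what gives such pipelines their scalability and reproducibility.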

  13. Tools for monitoring system suitability in LC MS/MS centric proteomic experiments.

    PubMed

    Bereman, Michael S

    2015-03-01

    With advances in liquid chromatography coupled to tandem mass spectrometry technologies combined with the continued goals of biomarker discovery, clinical applications of established biomarkers, and integrating large multiomic datasets (i.e. "big data"), there remains an urgent need for robust tools to assess instrument performance (i.e. system suitability) in proteomic workflows. To this end, several freely available tools have been introduced that monitor a number of peptide identification (ID) and/or peptide ID free metrics. Peptide ID metrics include numbers of proteins, peptides, or peptide spectral matches identified from a complex mixture. Peptide ID free metrics include retention time reproducibility, full width half maximum, ion injection times, and integrated peptide intensities. The main driving force in the development of these tools is to monitor both intra- and interexperiment performance variability and to identify sources of variation. The purpose of this review is to summarize and evaluate these tools based on versatility, automation, vendor neutrality, metrics monitored, and visualization capabilities. In addition, the implementation of a robust system suitability workflow is discussed in terms of metrics, type of standard, and frequency of evaluation along with the obstacles to overcome prior to incorporating a more proactive approach to overall quality control in liquid chromatography coupled to tandem mass spectrometry based proteomic workflows. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
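
    As a hedged, generic sketch of the kind of peptide-ID-free system-suitability monitoring these tools perform (not an implementation of any specific tool reviewed), the snippet below reads a hypothetical per-run metrics table and flags QC runs whose metrics fall outside a 3-sigma band established from earlier runs.

        # Hedged sketch: flag QC-standard runs whose metrics drift outside 3-sigma
        # control limits. The table layout and metric names are hypothetical.
        import pandas as pd

        # Hypothetical table: one row per QC run with ID-free metrics.
        runs = pd.DataFrame({
            "run":             [f"qc_{i:03d}" for i in range(10)],
            "median_rt_shift": [0.1, 0.2, 0.1, 0.0, 0.2, 0.1, 0.3, 0.2, 1.5, 0.1],
            "median_fwhm_s":   [12, 13, 12, 12, 14, 13, 12, 13, 22, 12],
            "tic_sum":         [1.0e9, 1.1e9, 0.9e9, 1.0e9, 1.2e9, 1.0e9,
                                1.1e9, 0.95e9, 0.3e9, 1.05e9],
        })

        baseline = runs.iloc[:8]                   # earlier runs define the limits
        metrics = ["median_rt_shift", "median_fwhm_s", "tic_sum"]
        center = baseline[metrics].mean()
        spread = baseline[metrics].std(ddof=1)

        outside = (runs[metrics] - center).abs() > 3 * spread
        flagged = runs.loc[outside.any(axis=1), ["run"] + metrics]
        print("Runs outside control limits:")
        print(flagged.to_string(index=False))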

  14. Metabolomic spectral libraries for data-independent SWATH liquid chromatography mass spectrometry acquisition.

    PubMed

    Bruderer, Tobias; Varesio, Emmanuel; Hidasi, Anita O; Duchoslav, Eva; Burton, Lyle; Bonner, Ron; Hopfgartner, Gérard

    2018-03-01

    High-quality mass spectral libraries have become crucial in mass spectrometry-based metabolomics. Here, we investigate a workflow to generate accurate-mass discrete and composite spectral libraries for metabolite identification and for SWATH mass spectrometry data processing. Discrete collision energy (5-100 eV) accurate mass spectra were collected for 532 metabolites from the human metabolome database (HMDB) by flow injection analysis and compiled into composite spectra over a large collision energy range (e.g., 10-70 eV). Full scan response factors were also calculated. Software tools based on accurate mass and predictive fragmentation were specially developed and found to be essential for construction and quality control of the spectral library. First, elemental compositions constrained by the elemental composition of the precursor ion were calculated for all fragments. Secondly, all possible fragments were generated from the compound structure and were filtered based on their elemental compositions. From the discrete spectra, it was possible to analyze the specific fragments formed at each collision energy, and it was found that a relatively large collision energy range (10-70 eV) gives informative MS/MS spectra for library searches. From the composite spectra, it was possible to characterize specific neutral losses as radical losses using in silico fragmentation. Radical losses (generating radical cations) were found to be more prominent than expected. Of the 532 metabolites, 489 provided a signal in positive mode [M+H]+ and 483 in negative mode [M-H]-. MS/MS spectra were obtained for 399 compounds in positive mode and for 462 in negative mode; 329 metabolites generated suitable spectra in both modes. Using the spectral library, LC retention times, and response factors to analyze data-independent LC-SWATH-MS data allowed the identification of 39 (positive mode) and 72 (negative mode) metabolites in a plasma pool sample (92 metabolites in total), of which 81 were previously reported in HMDB to be found in plasma. Graphical abstract: Library generation workflow for LC-SWATH MS, using collision energy spread, accurate mass, and fragment annotation.
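
    One of the quality-control steps described, constraining candidate fragment formulas by the elemental composition of the precursor ion, can be sketched in a few lines of Python. The formulas below are illustrative, and the sketch ignores charge state, adducts, and the radical losses that the full tools handle.

        # Hedged sketch: keep only fragment formulas whose element counts do not
        # exceed those of the precursor. Formulas are illustrative; charge state,
        # adducts, and radical losses are ignored here.
        import re

        def parse_formula(formula):
            """'C9H8O4' -> {'C': 9, 'H': 8, 'O': 4}"""
            counts = {}
            for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
                counts[element] = counts.get(element, 0) + int(number or 1)
            return counts

        def fits_precursor(fragment, precursor):
            frag, prec = parse_formula(fragment), parse_formula(precursor)
            return all(prec.get(el, 0) >= n for el, n in frag.items())

        precursor = "C9H8O4"                  # illustrative precursor composition
        candidates = ["C7H5O3", "C9H9O4", "C2H3NO", "C8H7O2"]
        allowed = [f for f in candidates if fits_precursor(f, precursor)]
        print(allowed)                        # ['C7H5O3', 'C8H7O2']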

  15. SECIMTools: a suite of metabolomics data analysis tools.

    PubMed

    Kirpich, Alexander S; Ibarra, Miguel; Moskalenko, Oleksandr; Fear, Justin M; Gerken, Joseph; Mi, Xinlei; Ashrafi, Ali; Morse, Alison M; McIntyre, Lauren M

    2018-04-20

    Metabolomics has the promise to transform the area of personalized medicine with the rapid development of high throughput technology for untargeted analysis of metabolites. Open access, easy to use, analytic tools that are broadly accessible to the biological community need to be developed. While technology used in metabolomics varies, most metabolomics studies have a set of features identified. Galaxy is an open access platform that enables scientists at all levels to interact with big data. Galaxy promotes reproducibility by saving histories and enabling the sharing workflows among scientists. SECIMTools (SouthEast Center for Integrated Metabolomics) is a set of Python applications that are available both as standalone tools and wrapped for use in Galaxy. The suite includes a comprehensive set of quality control metrics (retention time window evaluation and various peak evaluation tools), visualization techniques (hierarchical cluster heatmap, principal component analysis, modular modularity clustering), basic statistical analysis methods (partial least squares - discriminant analysis, analysis of variance, t-test, Kruskal-Wallis non-parametric test), advanced classification methods (random forest, support vector machines), and advanced variable selection tools (least absolute shrinkage and selection operator LASSO and Elastic Net). SECIMTools leverages the Galaxy platform and enables integrated workflows for metabolomics data analysis made from building blocks designed for easy use and interpretability. Standard data formats and a set of utilities allow arbitrary linkages between tools to encourage novel workflow designs. The Galaxy framework enables future data integration for metabolomics studies with other omics data.

  16. Process Integration and Optimization of ICME Carbon Fiber Composites for Vehicle Lightweighting: A Preliminary Development

    DOE PAGES

    Xu, Hongyi; Li, Yang; Zeng, Danielle

    2017-01-02

    Process integration and optimization is the key enabler of the Integrated Computational Materials Engineering (ICME) of carbon fiber composites. In this paper, automated workflows are developed for two types of composites: Sheet Molding Compounds (SMC) short fiber composites, and multi-layer unidirectional (UD) composites. For SMC, the proposed workflow integrates material processing simulation, microstructure representation volume element (RVE) models, material property prediction and structure preformation simulation to enable multiscale, multidisciplinary analysis and design. Processing parameters, microstructure parameters and vehicle subframe geometry parameters are defined as the design variables; the stiffness and weight of the structure are defined as the responses. For multi-layer UD structure, this work focuses on the discussion of different design representation methods and their impacts on the optimization performance. Challenges in ICME process integration and optimization are also summarized and highlighted. Two case studies are conducted to demonstrate the integrated process and its application in optimization.
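
    A hedged sketch of the optimization loop that such an integrated ICME workflow drives is shown below: a constrained search over hypothetical processing, microstructure, and geometry variables against surrogate stiffness and weight responses. The response functions are made-up stand-ins for the simulation chain described in the paper.

        # Hedged sketch: minimize structural weight subject to a stiffness constraint,
        # with made-up surrogate models standing in for the ICME simulation chain.
        import numpy as np
        from scipy.optimize import minimize

        # Design variables (all hypothetical): fiber volume fraction, panel thickness
        # in mm, and a rib-spacing geometry parameter in mm.
        x0 = np.array([0.3, 2.5, 80.0])
        bounds = [(0.2, 0.6), (1.5, 5.0), (50.0, 150.0)]

        def weight(x):
            vf, t, spacing = x
            # Made-up surrogate: heavier with thickness and fiber content,
            # lighter with wider rib spacing.
            return 2.0 * t * (1.0 + 0.8 * vf) + 300.0 / spacing

        def stiffness(x):
            vf, t, spacing = x
            # Made-up surrogate: stiffer with fiber content and thickness.
            return 40.0 * vf * t**2 - 0.05 * spacing

        # Require a minimum stiffness (placeholder target of 60 units).
        constraints = [{"type": "ineq", "fun": lambda x: stiffness(x) - 60.0}]
        result = minimize(weight, x0, bounds=bounds, constraints=constraints,
                          method="SLSQP")
        print(result.x, weight(result.x), stiffness(result.x))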

  17. DIaaS: Data-Intensive workflows as a service - Enabling easy composition and deployment of data-intensive workflows on Virtual Research Environments

    NASA Astrophysics Data System (ADS)

    Filgueira, R.; Ferreira da Silva, R.; Deelman, E.; Atkinson, M.

    2016-12-01

    We present the Data-Intensive workflows as a Service (DIaaS) model for enabling easy data-intensive workflow composition and deployment on clouds using containers. DIaaS model backbone is Asterism, an integrated solution for running data-intensive stream-based applications on heterogeneous systems, which combines the benefits of dispel4py with Pegasus workflow systems. The stream-based executions of an Asterism workflow are managed by dispel4py, while the data movement between different e-Infrastructures, and the coordination of the application execution are automatically managed by Pegasus. DIaaS combines Asterism framework with Docker containers to provide an integrated, complete, easy-to-use, portable approach to run data-intensive workflows on distributed platforms. Three containers integrate the DIaaS model: a Pegasus node, and an MPI and an Apache Storm clusters. Container images are described as Dockerfiles (available online at http://github.com/dispel4py/pegasus_dispel4py), linked to Docker Hub for providing continuous integration (automated image builds), and image storing and sharing. In this model, all required software (workflow systems and execution engines) for running scientific applications are packed into the containers, which significantly reduces the effort (and possible human errors) required by scientists or VRE administrators to build such systems. The most common use of DIaaS will be to act as a backend of VREs or Scientific Gateways to run data-intensive applications, deploying cloud resources upon request. We have demonstrated the feasibility of DIaaS using the data-intensive seismic ambient noise cross-correlation application (Figure 1). The application preprocesses (Phase1) and cross-correlates (Phase2) traces from several seismic stations. The application is submitted via Pegasus (Container1), and Phase1 and Phase2 are executed in the MPI (Container2) and Storm (Container3) clusters respectively. Although both phases could be executed within the same environment, this setup demonstrates the flexibility of DIaaS to run applications across e-Infrastructures. In summary, DIaaS delivers specialized software to execute data-intensive applications in a scalable, efficient, and robust manner reducing the engineering time and computational cost.

  18. Structuring clinical workflows for diabetes care: an overview of the OntoHealth approach.

    PubMed

    Schweitzer, M; Lasierra, N; Oberbichler, S; Toma, I; Fensel, A; Hoerbst, A

    2014-01-01

    Electronic health records (EHRs) play an important role in the treatment of chronic diseases such as diabetes mellitus. Although the interoperability and selected functionality of EHRs are already addressed by a number of standards and best practices, such as IHE or HL7, the majority of these systems are still monolithic from a user-functionality perspective. The purpose of the OntoHealth project is to foster a functionally flexible, standards-based use of EHRs to support clinical routine task execution by means of workflow patterns and to shift the present EHR usage to a more comprehensive integration concerning complete clinical workflows. The goal of this paper is, first, to introduce the basic architecture of the proposed OntoHealth project and, second, to present selected functional needs and a functional categorization regarding workflow-based interactions with EHRs in the domain of diabetes. A systematic literature review regarding attributes of workflows in the domain of diabetes was conducted. Eligible references were gathered and analyzed using a qualitative content analysis. Subsequently, a functional workflow categorization was derived from diabetes-specific raw data together with existing general workflow patterns. This paper presents the design of the architecture as well as a categorization model which makes it possible to describe the components or building blocks within clinical workflows. The results of our study lead us to identify basic building blocks, named as actions, decisions, and data elements, which allow the composition of clinical workflows within five identified contexts. The categorization model allows for a description of the components or building blocks of clinical workflows from a functional view.
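
    As a hedged sketch of what the identified building blocks could look like in code (the class and field names below are illustrative, not the OntoHealth project's actual model), a small clinical-workflow fragment can be composed from action, decision, and data-element objects:

        # Hedged sketch: illustrative data structures for workflow building blocks
        # (actions, decisions, data elements). Names are hypothetical.
        from dataclasses import dataclass, field
        from typing import Callable, List

        @dataclass
        class DataElement:
            name: str
            value: float

        @dataclass
        class Action:
            name: str
            reads: List[str] = field(default_factory=list)   # data elements used

        @dataclass
        class Decision:
            name: str
            predicate: Callable[[dict], bool]
            if_true: Action
            if_false: Action

        def run(decision, context):
            """Execute one decision block against the current data elements."""
            branch = decision.if_true if decision.predicate(context) else decision.if_false
            print(f"{decision.name}: executing '{branch.name}'")
            return branch

        # Compose a tiny diabetes-care fragment from the building blocks.
        hba1c = DataElement("HbA1c_percent", 8.4)
        context = {hba1c.name: hba1c.value}
        adjust = Action("adjust_therapy", reads=[hba1c.name])
        monitor = Action("continue_monitoring", reads=[hba1c.name])
        check = Decision("HbA1c_above_target",
                         predicate=lambda ctx: ctx["HbA1c_percent"] > 7.0,
                         if_true=adjust, if_false=monitor)
        run(check, context)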

  19. Structuring Clinical Workflows for Diabetes Care

    PubMed Central

    Lasierra, N.; Oberbichler, S.; Toma, I.; Fensel, A.; Hoerbst, A.

    2014-01-01

    Summary Background Electronic health records (EHRs) play an important role in the treatment of chronic diseases such as diabetes mellitus. Although the interoperability and selected functionality of EHRs are already addressed by a number of standards and best practices, such as IHE or HL7, the majority of these systems are still monolithic from a user-functionality perspective. The purpose of the OntoHealth project is to foster a functionally flexible, standards-based use of EHRs to support clinical routine task execution by means of workflow patterns and to shift the present EHR usage to a more comprehensive integration concerning complete clinical workflows. Objectives The goal of this paper is, first, to introduce the basic architecture of the proposed OntoHealth project and, second, to present selected functional needs and a functional categorization regarding workflow-based interactions with EHRs in the domain of diabetes. Methods A systematic literature review regarding attributes of workflows in the domain of diabetes was conducted. Eligible references were gathered and analyzed using a qualitative content analysis. Subsequently, a functional workflow categorization was derived from diabetes-specific raw data together with existing general workflow patterns. Results This paper presents the design of the architecture as well as a categorization model which makes it possible to describe the components or building blocks within clinical workflows. The results of our study lead us to identify basic building blocks, named as actions, decisions, and data elements, which allow the composition of clinical workflows within five identified contexts. Conclusions The categorization model allows for a description of the components or building blocks of clinical workflows from a functional view. PMID:25024765

  20. A Community-Driven Workflow Recommendations and Reuse Infrastructure

    NASA Astrophysics Data System (ADS)

    Zhang, J.; Votava, P.; Lee, T. J.; Lee, C.; Xiao, S.; Nemani, R. R.; Foster, I.

    2013-12-01

    Aiming to connect the Earth science community to accelerate the rate of discovery, NASA Earth Exchange (NEX) has established an online repository and platform, so that researchers can publish and share their tools and models with colleagues. In recent years, workflow has become a popular technique at NEX for Earth scientists to define executable multi-step procedures for data processing and analysis. The ability to discover and reuse knowledge (sharable workflows or workflow) is critical to the future advancement of science. However, as reported in our earlier study, the reusability of scientific artifacts at current time is very low. Scientists often do not feel confident in using other researchers' tools and utilities. One major reason is that researchers are often unaware of the existence of others' data preprocessing processes. Meanwhile, researchers often do not have time to fully document the processes and expose them to others in a standard way. These issues cannot be overcome by the existing workflow search technologies used in NEX and other data projects. Therefore, this project aims to develop a proactive recommendation technology based on collective NEX user behaviors. In this way, we aim to promote and encourage process and workflow reuse within NEX. Particularly, we focus on leveraging peer scientists' best practices to support the recommendation of artifacts developed by others. Our underlying theoretical foundation is rooted in the social cognitive theory, which declares people learn by watching what others do. Our fundamental hypothesis is that sharable artifacts have network properties, much like humans in social networks. More generally, reusable artifacts form various types of social relationships (ties), and may be viewed as forming what organizational sociologists who use network analysis to study human interactions call a 'knowledge network.' In particular, we will tackle two research questions: R1: What hidden knowledge may be extracted from usage history to help Earth scientists better understand existing artifacts and how to use them in a proper manner? R2: Informed by insights derived from their computing contexts, how could such hidden knowledge be used to facilitate artifact reuse by Earth scientists? Our study of the two research questions will provide answers to three technical questions aiming to assist NEX users during workflow development: 1) How to determine what topics interest the researcher? 2) How to find appropriate artifacts? and 3) How to advise the researcher in artifact reuse? In this paper, we report our on-going efforts of leveraging social networking theory and analysis techniques to provide dynamic advice on artifact reuse to NEX users based on their surrounding contexts. As a proof of concept, we have designed and developed a plug-in to the VisTrails workflow design tool. When users develop workflows using VisTrails, our plug-in will proactively recommend most relevant sub-workflows to the users.
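
    A hedged, minimal sketch of the usage-history idea behind such recommendations is shown below: from hypothetical records of which artifacts past workflows reused together, it builds co-occurrence counts and suggests artifacts that most often accompany the ones already in a user's draft workflow. This is a toy stand-in, not the NEX/VisTrails plug-in's actual algorithm.

        # Hedged sketch: recommend artifacts by co-occurrence in past workflows.
        # The usage history and artifact names are hypothetical.
        from collections import Counter
        from itertools import combinations

        # Each past workflow is represented by the set of shared artifacts it reused.
        history = [
            {"ndvi_preprocess", "cloud_mask", "trend_fit"},
            {"ndvi_preprocess", "cloud_mask", "anomaly_map"},
            {"cloud_mask", "trend_fit"},
            {"ndvi_preprocess", "trend_fit"},
        ]

        # Count how often each pair of artifacts appears in the same workflow.
        pair_counts = Counter()
        for workflow in history:
            for a, b in combinations(sorted(workflow), 2):
                pair_counts[(a, b)] += 1

        def recommend(current, k=3):
            """Score candidate artifacts by co-occurrence with the current draft."""
            scores = Counter()
            for (a, b), n in pair_counts.items():
                if a in current and b not in current:
                    scores[b] += n
                elif b in current and a not in current:
                    scores[a] += n
            return scores.most_common(k)

        print(recommend({"ndvi_preprocess"}))
        # e.g. [('cloud_mask', 2), ('trend_fit', 2), ('anomaly_map', 1)]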

  1. A comprehensive quality control workflow for paired tumor-normal NGS experiments.

    PubMed

    Schroeder, Christopher M; Hilke, Franz J; Löffler, Markus W; Bitzer, Michael; Lenz, Florian; Sturm, Marc

    2017-06-01

    Quality control (QC) is an important part of all NGS data analysis stages. Many available tools calculate QC metrics from different analysis steps of single sample experiments (raw reads, mapped reads and variant lists). Multi-sample experiments, such as sequencing of tumor-normal pairs, require additional QC metrics to ensure validity of results. These multi-sample QC metrics still lack standardization. We therefore suggest a new workflow for QC of DNA sequencing of tumor-normal pairs. With this workflow, well-known single-sample QC metrics and additional metrics specific for tumor-normal pairs can be calculated. The segmentation into different tools offers a high flexibility and allows reuse for other purposes. All tools produce qcML, a generic XML format for QC of -omics experiments. qcML uses quality metrics defined in an ontology, which was adapted for NGS. All QC tools are implemented in C++ and run under both Linux and Windows. Plotting requires Python 2.7 and matplotlib. The software is available under the 'GNU General Public License version 2' as part of the ngs-bits project: https://github.com/imgag/ngs-bits. christopher.schroeder@med.uni-tuebingen.de. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
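
    To illustrate the plotting step described above, the following minimal Python sketch reads quality metrics from a simplified qcML-like XML file and renders them with matplotlib. The element and attribute names and the input file name are illustrative assumptions, not the official qcML schema or the ngs-bits tools.

      # Minimal sketch: read quality parameters from a simplified qcML-like XML
      # file and plot them as a bar chart. Element/attribute names are assumed.
      import xml.etree.ElementTree as ET
      import matplotlib.pyplot as plt

      def read_quality_parameters(path):
          """Return {name: value} from <qualityParameter name=".." value=".."/> elements."""
          root = ET.parse(path).getroot()
          metrics = {}
          for param in root.iter("qualityParameter"):
              name, value = param.get("name"), param.get("value")
              if name is None or value is None:
                  continue
              try:
                  metrics[name] = float(value)
              except ValueError:
                  pass  # skip non-numeric metrics such as free-text comments
          return metrics

      if __name__ == "__main__":
          metrics = read_quality_parameters("tumor_normal_stats.qcML")  # hypothetical file
          plt.bar(range(len(metrics)), list(metrics.values()))
          plt.xticks(range(len(metrics)), list(metrics.keys()), rotation=45, ha="right")
          plt.ylabel("metric value")
          plt.tight_layout()
          plt.savefig("qc_overview.png")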

  2. Optimizing high performance computing workflow for protein functional annotation.

    PubMed

    Stanberry, Larissa; Rekepalli, Bhanu; Liu, Yuan; Giblock, Paul; Higdon, Roger; Montague, Elizabeth; Broomall, William; Kolker, Natali; Kolker, Eugene

    2014-09-10

    Functional annotation of newly sequenced genomes is one of the major challenges in modern biology. With modern sequencing technologies, the protein sequence universe is rapidly expanding. Newly sequenced bacterial genomes alone contain over 7.5 million proteins. The rate of data generation has far surpassed that of protein annotation. The volume of protein data makes manual curation infeasible, whereas a high compute cost limits the utility of existing automated approaches. In this work, we present an improved and optimized automated workflow to enable large-scale protein annotation. The workflow uses high performance computing architectures and a low complexity classification algorithm to assign proteins into existing clusters of orthologous groups of proteins. On the basis of the Position-Specific Iterative Basic Local Alignment Search Tool, the algorithm ensures at least 80% specificity and sensitivity of the resulting classifications. The workflow utilizes highly scalable parallel applications for classification and sequence alignment. Using Extreme Science and Engineering Discovery Environment supercomputers, the workflow processed 1,200,000 newly sequenced bacterial proteins. With the rapid expansion of the protein sequence universe, the proposed workflow will enable scientists to annotate big genome data.
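
    The classification idea described above can be sketched in a few lines of Python: run PSI-BLAST (from NCBI BLAST+) for a query protein against a profile database of orthologous groups and keep the best-scoring hit. The database name, thresholds, and serial loop are assumptions for illustration; the published workflow relies on highly parallel implementations rather than this sketch.

      # Minimal sketch: assign a query protein to the orthologous group whose
      # profile gives the best PSI-BLAST hit. Requires NCBI BLAST+ on the PATH.
      import subprocess

      def best_cog_hit(query_fasta, cog_db="cog_profiles", evalue=1e-5):
          """Run PSI-BLAST and return (subject_id, bit_score) of the best hit, or None."""
          out = subprocess.run(
              ["psiblast", "-query", query_fasta, "-db", cog_db,
               "-num_iterations", "3", "-evalue", str(evalue),
               "-outfmt", "6 sseqid bitscore"],
              capture_output=True, text=True, check=True).stdout
          hits = [line.split("\t") for line in out.splitlines() if line.strip()]
          if not hits:
              return None  # unclassified protein
          return max(((sid, float(score)) for sid, score in hits), key=lambda h: h[1])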

  3. Optimizing high performance computing workflow for protein functional annotation

    PubMed Central

    Stanberry, Larissa; Rekepalli, Bhanu; Liu, Yuan; Giblock, Paul; Higdon, Roger; Montague, Elizabeth; Broomall, William; Kolker, Natali; Kolker, Eugene

    2014-01-01

    Functional annotation of newly sequenced genomes is one of the major challenges in modern biology. With modern sequencing technologies, the protein sequence universe is rapidly expanding. Newly sequenced bacterial genomes alone contain over 7.5 million proteins. The rate of data generation has far surpassed that of protein annotation. The volume of protein data makes manual curation infeasible, whereas a high compute cost limits the utility of existing automated approaches. In this work, we present an improved and optimized automated workflow to enable large-scale protein annotation. The workflow uses high performance computing architectures and a low complexity classification algorithm to assign proteins into existing clusters of orthologous groups of proteins. On the basis of the Position-Specific Iterative Basic Local Alignment Search Tool, the algorithm ensures at least 80% specificity and sensitivity of the resulting classifications. The workflow utilizes highly scalable parallel applications for classification and sequence alignment. Using Extreme Science and Engineering Discovery Environment supercomputers, the workflow processed 1,200,000 newly sequenced bacterial proteins. With the rapid expansion of the protein sequence universe, the proposed workflow will enable scientists to annotate big genome data. PMID:25313296

  4. Building an efficient curation workflow for the Arabidopsis literature corpus

    PubMed Central

    Li, Donghui; Berardini, Tanya Z.; Muller, Robert J.; Huala, Eva

    2012-01-01

    TAIR (The Arabidopsis Information Resource) is the model organism database (MOD) for Arabidopsis thaliana, a model plant with a literature corpus of about 39 000 articles in PubMed, with over 4300 new articles added in 2011. We have developed a literature curation workflow incorporating both automated and manual elements to cope with this flood of new research articles. The current workflow can be divided into two phases: article selection and curation. Structured controlled vocabularies, such as the Gene Ontology and Plant Ontology, are used to capture free text information in the literature as succinct ontology-based annotations suitable for the application of computational analysis methods. We also describe our curation platform and the use of text mining tools in our workflow. Database URL: www.arabidopsis.org PMID:23221298

  5. ballaxy: web services for structural bioinformatics.

    PubMed

    Hildebrandt, Anna Katharina; Stöckel, Daniel; Fischer, Nina M; de la Garza, Luis; Krüger, Jens; Nickels, Stefan; Röttig, Marc; Schärfe, Charlotta; Schumann, Marcel; Thiel, Philipp; Lenhof, Hans-Peter; Kohlbacher, Oliver; Hildebrandt, Andreas

    2015-01-01

    Web-based workflow systems have gained considerable momentum in sequence-oriented bioinformatics. In structural bioinformatics, however, such systems are still relatively rare; while commercial stand-alone workflow applications are common in the pharmaceutical industry, academic researchers often still rely on command-line scripting to glue individual tools together. In this work, we address the problem of building a web-based system for workflows in structural bioinformatics. For the underlying molecular modelling engine, we opted for the BALL framework because of its extensive and well-tested functionality in the field of structural bioinformatics. The large number of molecular data structures and algorithms implemented in BALL allows for elegant and sophisticated development of new approaches in the field. We hence connected the versatile BALL library and its visualization and editing front end BALLView with the Galaxy workflow framework. The result, which we call ballaxy, enables the user to simply and intuitively create sophisticated pipelines for applications in structure-based computational biology, integrated into a standard tool for molecular modelling. ballaxy consists of three parts: some minor modifications to the Galaxy system, a collection of tools, and an integration into the BALL framework and the BALLView application for molecular modelling. Modifications to Galaxy will be submitted to the Galaxy project, and the BALL and BALLView integrations will be included in the next major BALL release. After acceptance of the modifications into the Galaxy project, we will publish all ballaxy tools via the Galaxy toolshed. In the meantime, all three components are available from http://www.ball-project.org/ballaxy. Also, docker images for ballaxy are available at https://registry.hub.docker.com/u/anhi/ballaxy/dockerfile/. ballaxy is licensed under the terms of the GPL. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  6. CoSEC: Connecting Living With a Star Research

    NASA Astrophysics Data System (ADS)

    Hurlburt, N.; Freeland, S.; Bose, P.; Zimdars, A.; Slater, G.

    2006-12-01

    The Collaborative Sun-Earth Connector (CoSEC) provides the means for heliophysics researchers to compose the data sources and processing services published by their peers into processing workflows that reliably generate publication-worthy data. It includes: composition of computational and data services into easy-to-read workflows with data quality and version traceability; straightforward translation of existing services into workflow components, and advertisement of those components to other members of the CoSEC community; annotation of published services with functional attributes to enable discovery of capabilities required by particular workflows and identify peer subgroups in the CoSEC community; and annotation of published services with nonfunctional attributes to enable selection on the basis of quality of service (QoS). We present an overview and demonstration of the CoSEC system and discuss applications, lessons learned, and future developments.

  7. MyGeoHub: A Collaborative Geospatial Research and Education Platform

    NASA Astrophysics Data System (ADS)

    Kalyanam, R.; Zhao, L.; Biehl, L. L.; Song, C. X.; Merwade, V.; Villoria, N.

    2017-12-01

    Scientific research is increasingly collaborative and globally distributed; research groups now rely on web-based scientific tools and data management systems to simplify their day-to-day collaborative workflows. However, such tools often lack seamless interfaces, requiring researchers to contend with manual data transfers, annotation and sharing. MyGeoHub is a web platform that supports out-of-the-box, seamless workflows involving data ingestion, metadata extraction, analysis, sharing and publication. MyGeoHub is built on the HUBzero cyberinfrastructure platform and adds general-purpose software building blocks (GABBs) for geospatial data management, visualization and analysis. A data management building block, iData, processes geospatial files, extracting metadata for keyword and map-based search while enabling quick previews. iData is pervasive, allowing access through a web interface, scientific tools on MyGeoHub or even mobile field devices via a data service API. GABBs includes a Python map library as well as map widgets that, in a few lines of code, generate complete geospatial visualization web interfaces for scientific tools. GABBs also includes powerful tools that can be used with no programming effort. The GeoBuilder tool provides an intuitive wizard for importing multi-variable, geo-located time series data (typical of sensor readings, GPS trackers) to build visualizations supporting data filtering and plotting. MyGeoHub has been used in tutorials at scientific conferences and educational activities for K-12 students. MyGeoHub is also constantly evolving; the recent addition of Jupyter and R Shiny notebook environments enables reproducible, richly interactive geospatial analyses and applications ranging from simple pre-processing to published tools. MyGeoHub is not a monolithic geospatial science gateway; instead, it supports diverse needs ranging from just a feature-rich data management system to complex scientific tools and workflows.

  8. A Drupal-Based Collaborative Framework for Science Workflows

    NASA Astrophysics Data System (ADS)

    Pinheiro da Silva, P.; Gandara, A.

    2010-12-01

    Cyber-infrastructure is built from utilizing technical infrastructure to support organizational practices and social norms to provide support for scientific teams working together or dependent on each other to conduct scientific research. Such cyber-infrastructure enables the sharing of information and data so that scientists can leverage knowledge and expertise through automation. Scientific workflow systems have been used to build automated scientific systems used by scientists to conduct scientific research and, as a result, create artifacts in support of scientific discoveries. These complex systems are often developed by teams of scientists who are located in different places, e.g., scientists working in distinct buildings, and sometimes in different time zones, e.g., scientists working in distinct national laboratories. The sharing of these specifications is currently supported by the use of version control systems such as CVS or Subversion. Discussions about the design, improvement, and testing of these specifications, however, often happen elsewhere, e.g., through the exchange of email messages and IM chatting. Carrying on a discussion about these specifications is challenging because comments and specifications are not necessarily connected. For instance, the person reading a comment about a given workflow specification may not be able to see the workflow, and even if the person can see the workflow, the person may not know specifically to which part of the workflow a given comment applies. In this paper, we discuss the design, implementation and use of CI-Server, a Drupal-based infrastructure, to support the collaboration of both local and distributed teams of scientists using scientific workflows. CI-Server has three primary goals: to enable information sharing by providing tools that scientists can use within their scientific research to process data, publish and share artifacts; to build community by providing tools that support discussions between scientists about artifacts used or created through scientific processes; and to leverage the knowledge collected within the artifacts and scientific collaborations to support scientific discoveries.

  9. Integrated Automatic Workflow for Phylogenetic Tree Analysis Using Public Access and Local Web Services.

    PubMed

    Damkliang, Kasikrit; Tandayya, Pichaya; Sangket, Unitsa; Pasomsub, Ekawat

    2016-11-28

    At present, coding sequences (CDS) continue to be discovered, and ever larger CDS datasets are being released. Approaches and related tools have also been developed and upgraded concurrently, especially for phylogenetic tree analysis. This paper proposes an integrated automatic Taverna workflow for phylogenetic tree inference analysis using public access web services at the European Bioinformatics Institute (EMBL-EBI) and the Swiss Institute of Bioinformatics (SIB), and our own deployed local web services. The workflow input is a set of CDS in the Fasta format. The workflow supports 1,000 to 20,000 bootstrap replicates. The workflow performs tree inference using the Parsimony (PARS), Distance Matrix - Neighbor Joining (DIST-NJ), and Maximum Likelihood (ML) algorithms of the EMBOSS PHYLIPNEW package, based on our proposed Multiple Sequence Alignment (MSA) similarity score. The local web services are implemented and deployed in two forms, using Soaplab2 and Apache Axis2. SOAP and Java Web Service (JWS) endpoints provide WSDL interfaces to the Taverna Workbench, a workflow manager. The workflow has been validated, its performance has been measured, and its results have been verified. Our workflow's execution time is less than ten minutes for inferring a tree with 10,000 bootstrap replicates. This paper proposes a new integrated automatic workflow which will be beneficial to bioinformaticians with an intermediate level of knowledge and experience. All local services have been deployed at our portal http://bioservices.sci.psu.ac.th.
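
    The bootstrapping step mentioned above can be illustrated with a small, self-contained Python sketch that resamples alignment columns with replacement; in the actual workflow this step is delegated to EMBOSS PHYLIPNEW web services rather than performed locally, so the function below is only an illustration of the idea.

      # Minimal sketch of bootstrap resampling: draw alignment columns with
      # replacement to build replicate alignments for tree inference.
      import random

      def bootstrap_alignment(alignment, n_replicates=1000, seed=42):
          """alignment: dict {sequence_id: aligned_sequence}, all of equal length."""
          rng = random.Random(seed)
          length = len(next(iter(alignment.values())))
          replicates = []
          for _ in range(n_replicates):
              cols = [rng.randrange(length) for _ in range(length)]
              replicates.append({sid: "".join(seq[c] for c in cols)
                                 for sid, seq in alignment.items()})
          return replicates

      # toy example with three short aligned sequences
      example = {"seqA": "ATGC-TGA", "seqB": "ATGCATGA", "seqC": "TTGC-TGA"}
      reps = bootstrap_alignment(example, n_replicates=3)
      print(reps[0])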

  10. Integrated Automatic Workflow for Phylogenetic Tree Analysis Using Public Access and Local Web Services.

    PubMed

    Damkliang, Kasikrit; Tandayya, Pichaya; Sangket, Unitsa; Pasomsub, Ekawat

    2016-03-01

    At present, coding sequences (CDS) continue to be discovered, and ever larger CDS datasets are being released. Approaches and related tools have also been developed and upgraded concurrently, especially for phylogenetic tree analysis. This paper proposes an integrated automatic Taverna workflow for phylogenetic tree inference analysis using public access web services at the European Bioinformatics Institute (EMBL-EBI) and the Swiss Institute of Bioinformatics (SIB), and our own deployed local web services. The workflow input is a set of CDS in the Fasta format. The workflow supports 1,000 to 20,000 bootstrap replicates. The workflow performs tree inference using the Parsimony (PARS), Distance Matrix - Neighbor Joining (DIST-NJ), and Maximum Likelihood (ML) algorithms of the EMBOSS PHYLIPNEW package, based on our proposed Multiple Sequence Alignment (MSA) similarity score. The local web services are implemented and deployed in two forms, using Soaplab2 and Apache Axis2. SOAP and Java Web Service (JWS) endpoints provide WSDL interfaces to the Taverna Workbench, a workflow manager. The workflow has been validated, its performance has been measured, and its results have been verified. Our workflow's execution time is less than ten minutes for inferring a tree with 10,000 bootstrap replicates. This paper proposes a new integrated automatic workflow which will be beneficial to bioinformaticians with an intermediate level of knowledge and experience. All local services have been deployed at our portal http://bioservices.sci.psu.ac.th.

  11. Automation of Educational Tasks for Academic Radiology.

    PubMed

    Lamar, David L; Richardson, Michael L; Carlson, Blake

    2016-07-01

    The process of education involves a variety of repetitious tasks. We believe that appropriate computer tools can automate many of these chores, and allow both educators and their students to devote a lot more of their time to actual teaching and learning. This paper details tools that we have used to automate a broad range of academic radiology-specific tasks on Mac OS X, iOS, and Windows platforms. Some of the tools we describe here require little expertise or time to use; others require some basic knowledge of computer programming. We used TextExpander (Mac, iOS) and AutoHotKey (Win) for automated generation of text files, such as resident performance reviews and radiology interpretations. Custom statistical calculations were performed using TextExpander and the Python programming language. A workflow for automated note-taking was developed using Evernote (Mac, iOS, Win) and Hazel (Mac). Automated resident procedure logging was accomplished using Editorial (iOS) and Python. We created three variants of a teaching session logger using Drafts (iOS) and Pythonista (iOS). Editorial and Drafts were used to create flashcards for knowledge review. We developed a mobile reference management system for iOS using Editorial. We used the Workflow app (iOS) to automatically generate a text message reminder for daily conferences. Finally, we developed two separate automated workflows, one with Evernote (Mac, iOS, Win) and one with Python (Mac, Win), that generate simple automated teaching file collections. We have beta-tested these workflows, techniques, and scripts on several of our fellow radiologists. All of them expressed enthusiasm for these tools and were able to use one or more of them to automate their own educational activities. Appropriate computer tools can automate many educational tasks, and thereby allow both educators and their students to devote a lot more of their time to actual teaching and learning. Copyright © 2016 The Association of University Radiologists. Published by Elsevier Inc. All rights reserved.
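
    As a flavour of the kind of scripted automation described above, the following Python sketch files saved case images into a dated teaching-file collection. The folder names are placeholders, and the authors' actual scripts (which also target Evernote) are not reproduced here.

      # Minimal sketch: sweep a folder of saved images into a dated
      # "teaching file" collection. Folder names are hypothetical.
      import shutil
      from pathlib import Path
      from datetime import date

      INBOX = Path.home() / "TeachingInbox"    # where interesting cases are dropped
      ARCHIVE = Path.home() / "TeachingFiles"  # permanent, dated collection

      def file_teaching_cases():
          if not INBOX.exists():
              return
          target = ARCHIVE / date.today().isoformat()
          target.mkdir(parents=True, exist_ok=True)
          for item in INBOX.glob("*.png"):
              shutil.move(str(item), target / item.name)
              print(f"filed {item.name} -> {target}")

      if __name__ == "__main__":
          file_teaching_cases()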

  12. Characterizing Strain Variation in Engineered E. coli Using a Multi-Omics-Based Workflow

    DOE PAGES

    Brunk, Elizabeth; George, Kevin W.; Alonso-Gutierrez, Jorge; ...

    2016-05-19

    Understanding the complex interactions that occur between heterologous and native biochemical pathways represents a major challenge in metabolic engineering and synthetic biology. We present a workflow that integrates metabolomics, proteomics, and genome-scale models of Escherichia coli metabolism to study the effects of introducing a heterologous pathway into a microbial host. This workflow incorporates complementary approaches from computational systems biology, metabolic engineering, and synthetic biology; provides molecular insight into how the host organism microenvironment changes due to pathway engineering; and demonstrates how biological mechanisms underlying strain variation can be exploited as an engineering strategy to increase product yield. As a proof of concept, we present the analysis of eight engineered strains producing three biofuels: isopentenol, limonene, and bisabolene. Application of this workflow identified the roles of candidate genes, pathways, and biochemical reactions in observed experimental phenomena and facilitated the construction of a mutant strain with improved productivity. The contributed workflow is available as an open-source tool in the form of iPython notebooks.

  13. Barriers to critical thinking: workflow interruptions and task switching among nurses.

    PubMed

    Cornell, Paul; Riordan, Monica; Townsend-Gervis, Mary; Mobley, Robin

    2011-10-01

    Nurses are increasingly called upon to engage in critical thinking. However, current workflow inhibits this goal with frequent task switching and unpredictable demands. To assess the workflow's cognitive impact, nurses were observed at 2 hospitals with different patient loads and acuity levels. Workflow on a medical/surgical and pediatric oncology unit was observed, recording tasks, tools, collaborators, and locations. Nineteen nurses were observed for a total of 85.2 hours. Tasks were short, with mean durations of 62.4 and 81.6 seconds on the 2 units. More than 50% of the recorded tasks were less than 30 seconds in length. An analysis of task sequence revealed few patterns and little pairwise repetition. Performance on specific tasks differed between the 2 units, but the character of the workflow was highly similar. The nonrepetitive flow and high amount of switching indicate that nurses experience a heavy cognitive load with little uninterrupted time. This implies that nurses rarely have the conditions necessary for critical thinking.

  14. Advantages and Disadvantages of 1-Incision, 2-Incision, 3-Incision, and 4-Incision Laparoscopic Cholecystectomy: A Workflow Comparison Study.

    PubMed

    Bartnicka, Joanna; Zietkiewicz, Agnieszka A; Kowalski, Grzegorz J

    2016-08-01

    A comparison of 1-port, 2-port, 3-port, and 4-port laparoscopic cholecystectomy techniques from the point of view of workflow criteria was made to both identify specific workflow components that can cause surgical disturbances and indicate good and bad practices. As a case study, laparoscopic cholecystectomies, including manual tasks and interactions within teamwork members, were video-recorded and analyzed on the basis of specially encoded workflow information. The parameters for comparison were defined as follows: surgery time, tool and hand activeness, operator's passive work, collisions, and operator interventions. It was found that 1-port cholecystectomy is the worst technique because of nonergonomic body position, technical complexity, organizational anomalies, and operational dynamism. The differences between laparoscopic techniques are closely linked to the costs of the medical procedures. Hence, knowledge about the surgical workflow can be used for both planning surgical procedures and balancing the expenses associated with surgery.

  15. An integrated workflow for analysis of ChIP-chip data.

    PubMed

    Weigelt, Karin; Moehle, Christoph; Stempfl, Thomas; Weber, Bernhard; Langmann, Thomas

    2008-08-01

    Although ChIP-chip is a powerful tool for genome-wide discovery of transcription factor target genes, the steps involving raw data analysis, identification of promoters, and correlation with binding sites are still laborious processes. Therefore, we report an integrated workflow for the analysis of promoter tiling arrays with the Genomatix ChipInspector system. We compare this tool with open-source software packages to identify PU.1 regulated genes in mouse macrophages. Our results suggest that ChipInspector data analysis, comparative genomics for binding site prediction, and pathway/network modeling significantly facilitate and enhance whole-genome promoter profiling to reveal in vivo sites of transcription factor-DNA interactions.

  16. The Development of a High-Throughput/Combinatorial Workflow for the Study of Porous Polymer Networks

    DTIC Science & Technology

    2012-04-05

    ... in polymer network cross-link density, poragen composition, poragen level, and cure temperature. A total of 216 unique compositions were prepared. Changes in opacity of the blends as they cured ... allowed for the identification of compositional variables and process variables that enabled the production of porous networks. Keywords: high ...

  17. I'll take that to go: Big data bags and minimal identifiers for exchange of large, complex datasets

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chard, Kyle; D'Arcy, Mike; Heavner, Benjamin D.

    Big data workflows often require the assembly and exchange of complex, multi-element datasets. For example, in biomedical applications, the input to an analytic pipeline can be a dataset consisting of thousands of images and genome sequences assembled from diverse repositories, requiring a description of the contents of the dataset in a concise and unambiguous form. Typical approaches to creating datasets for big data workflows assume that all data reside in a single location, requiring costly data marshaling and permitting errors of omission and commission because dataset members are not explicitly specified. We address these issues by proposing simple methods and tools for assembling, sharing, and analyzing large and complex datasets that scientists can easily integrate into their daily workflows. These tools combine a simple and robust method for describing data collections (BDBags), data descriptions (Research Objects), and simple persistent identifiers (Minids) to create a powerful ecosystem of tools and services for big data analysis and sharing. We present these tools and use biomedical case studies to illustrate their use for the rapid assembly, sharing, and analysis of large datasets.
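
    The packaging idea can be sketched with the generic bagit-python library, which implements the BagIt layer that BDBags build on; the Research Object manifest and Minid identifier layers described above are omitted here, and the directory name and metadata are placeholders.

      # Minimal sketch: turn a directory of files into a checksummed BagIt bag.
      # The project's BDBag tooling adds Research Objects and Minids on top.
      from pathlib import Path
      import bagit

      # create an example payload directory so the sketch is self-contained
      payload = Path("my_dataset")
      payload.mkdir(exist_ok=True)
      (payload / "readme.txt").write_text("example payload file\n")

      # convert the directory in place into a bag with simple descriptive metadata
      bag = bagit.make_bag(str(payload), {"Contact-Name": "Example Researcher"})
      bag.validate()  # verifies manifests and payload checksums
      print("bag is valid")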

  18. Big data analytics in immunology: a knowledge-based approach.

    PubMed

    Zhang, Guang Lan; Sun, Jing; Chitkushev, Lou; Brusic, Vladimir

    2014-01-01

    With the vast amount of immunological data available, immunology research is entering the big data era. These data vary in granularity, quality, and complexity and are stored in various formats, including publications, technical reports, and databases. The challenge is to make the transition from data to actionable knowledge and wisdom and bridge the knowledge gap and application gap. We report a knowledge-based approach based on a framework called KB-builder that facilitates data mining by enabling fast development and deployment of web-accessible immunological data knowledge warehouses. Immunological knowledge discovery relies heavily on both the availability of accurate, up-to-date, and well-organized data and the proper analytics tools. We propose the use of knowledge-based approaches by developing knowledgebases combining well-annotated data with specialized analytical tools and integrating them into analytical workflows. A set of well-defined workflow types with rich summarization and visualization capacity facilitates the transformation from data to critical information and knowledge. By using KB-builder, we streamlined the normally time-consuming process of database development. The knowledgebases built using KB-builder will speed up rational vaccine design by providing accurate and well-annotated data coupled with tailored computational analysis tools and workflows.

  19. A simple tool for stereological assessment of digital images: the STEPanizer.

    PubMed

    Tschanz, S A; Burri, P H; Weibel, E R

    2011-07-01

    STEPanizer is an easy-to-use computer-based software tool for the stereological assessment of digitally captured images from all kinds of microscopical (LM, TEM, LSM) and macroscopical (radiology, tomography) imaging modalities. The program design focuses on providing the user with a defined workflow adapted to most basic stereological tasks. The software is compact, that is, user-friendly without being bulky. STEPanizer comprises the creation of test systems, the appropriate display of digital images with superimposed test systems, a scaling facility, a counting module and an export function for the transfer of results to spreadsheet programs. Here we describe the major workflow of the tool, illustrating its application with two examples from transmission electron microscopy and light microscopy, respectively. © 2011 The Authors Journal of Microscopy © 2011 Royal Microscopical Society.
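
    The kind of stereological estimate such test systems support can be illustrated with the classical point-counting rule, where a volume fraction is estimated as the ratio of test points hitting the structure of interest to test points hitting the reference space; the counts in this sketch are invented for illustration and are not taken from the paper.

      # Minimal sketch of the point-counting estimate behind stereological
      # test systems: volume fraction = hit points / reference points.
      def volume_fraction(points_on_structure, points_on_reference):
          """Estimate Vv(structure, reference) from test-point counts."""
          return points_on_structure / float(points_on_reference)

      # e.g. 137 of 600 test points fall on the structure in a section
      vv = volume_fraction(137, 600)   # about 0.228
      print("estimated volume fraction: %.3f" % vv)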

  20. The standard-based open workflow system in GeoBrain (Invited)

    NASA Astrophysics Data System (ADS)

    Di, L.; Yu, G.; Zhao, P.; Deng, M.

    2013-12-01

    GeoBrain is an Earth science Web-service system developed and operated by the Center for Spatial Information Science and Systems, George Mason University. In GeoBrain, a standard-based open workflow system has been implemented to accommodate the automated processing of geospatial data through a set of complex geo-processing functions for advanced product generation. GeoBrain models the complex geoprocessing at two levels: conceptual and concrete. At the conceptual level, the workflows exist in the form of data and service types defined by ontologies. The workflows at the conceptual level are called geo-processing models and cataloged in GeoBrain as virtual product types. A conceptual workflow is instantiated into a concrete, executable workflow when a user requests a product that matches a virtual product type. Both conceptual and concrete workflows are encoded in Business Process Execution Language (BPEL). A BPEL workflow engine, called BPELPower, has been implemented to execute the workflow for the product generation. A provenance capturing service has been implemented to generate the ISO 19115-compliant complete product provenance metadata before and after the workflow execution. The generation of provenance metadata before the workflow execution allows users to examine the usability of the final product before the lengthy and expensive execution takes place. The three modes of workflow execution defined in ISO 19119 (transparent, translucent, and opaque) are available in GeoBrain. A geoprocessing modeling portal has been developed to allow domain experts to develop geoprocessing models at the type level with the support of both data and service/processing ontologies. The geoprocessing models capture the knowledge of the domain experts and become the operational offerings for the products after a proper peer review of the models is conducted. Automated workflow composition has been demonstrated successfully based on ontologies and artificial intelligence technology. The GeoBrain workflow system has been used in multiple Earth science applications, including the monitoring of global agricultural drought, the assessment of flood damage, the derivation of national crop condition and progress information, and the detection of nuclear proliferation facilities and events.

  1. Designing Health Information Technology Tools to Prevent Gaps in Public Health Insurance.

    PubMed

    Hall, Jennifer D; Harding, Rose L; DeVoe, Jennifer E; Gold, Rachel; Angier, Heather; Sumic, Aleksandra; Nelson, Christine A; Likumahuwa-Ackman, Sonja; Cohen, Deborah J

    2017-06-23

    Changes in health insurance policies have increased coverage opportunities, but enrollees are required to reapply annually for benefits, which, if not managed appropriately, can lead to insurance gaps. Electronic health records (EHRs) can automate processes for assisting patients with health insurance enrollment and re-enrollment. We describe community health centers' (CHC) workflow, documentation, and tracking needs for assisting families with insurance application processes, and the health information technology (IT) tool components that were developed to meet those needs. We conducted a qualitative study using semi-structured interviews and observation of clinic operations and insurance application assistance processes. Data were analyzed using a grounded theory approach. We diagramed workflows and shared information with a team of developers who built the EHR-based tools. Four steps to the insurance assistance workflow were common among CHCs: 1) Identifying patients for public health insurance application assistance; 2) Completing and submitting the public health insurance application when clinic staff met with patients to collect requisite information and helped them apply for benefits; 3) Tracking public health insurance approval to monitor for decisions; and 4) Assisting with annual health insurance reapplication. We developed EHR-based tools to support clinical staff with each of these steps. CHCs are uniquely positioned to help patients and families with public health insurance applications. CHCs have invested in staff to assist patients with insurance applications and help prevent coverage gaps. To best assist patients and to foster efficiency, EHR-based insurance tools need comprehensive, timely, and accurate health insurance information.

  2. Realising the Uncertainty Enabled Model Web

    NASA Astrophysics Data System (ADS)

    Cornford, D.; Bastin, L.; Pebesma, E. J.; Williams, M.; Stasch, C.; Jones, R.; Gerharz, L.

    2012-12-01

    The FP7 funded UncertWeb project aims to create the "uncertainty enabled model web". The central concept here is that geospatial models and data resources are exposed via standard web service interfaces, such as the Open Geospatial Consortium (OGC) suite of encodings and interface standards, allowing the creation of complex workflows combining both data and models. The focus of UncertWeb is on the issue of managing uncertainty in such workflows, and providing the standards, architecture, tools and software support necessary to realise the "uncertainty enabled model web". In this paper we summarise the developments in the first two years of UncertWeb, illustrating several key points with examples taken from the use case requirements that motivate the project. Firstly we address the issue of encoding specifications. We explain the usage of UncertML 2.0, a flexible encoding for representing uncertainty based on a probabilistic approach. This is designed to be used within existing standards such as Observations and Measurements (O&M) and data quality elements of ISO19115 / 19139 (geographic information metadata and encoding specifications) as well as more broadly outside the OGC domain. We show profiles of O&M that have been developed within UncertWeb and how UncertML 2.0 is used within these. We also show encodings based on NetCDF and discuss possible future directions for encodings in JSON. We then discuss the issues of workflow construction, considering discovery of resources (both data and models). We discuss why a brokering approach to service composition is necessary in a world where the web service interfaces remain relatively heterogeneous, including many non-OGC approaches, in particular the more mainstream SOAP and WSDL approaches. We discuss the trade-offs between delegating uncertainty management functions to the service interfaces themselves and integrating the functions in the workflow management system. We describe two utility services to address conversion between uncertainty types, and between the spatial / temporal support of service inputs / outputs. Finally we describe the tools being generated within the UncertWeb project, considering three main aspects: i) Elicitation of uncertainties on model inputs. We are developing tools to enable domain experts to provide judgements about input uncertainties from UncertWeb model components (e.g. parameters in meteorological models) which allow panels of experts to engage in the process and reach a consensus view on the current knowledge / beliefs about that parameter or variable. We are developing systems for continuous and categorical variables as well as stationary spatial fields. ii) Visualisation of the resulting uncertain outputs from the end of the workflow, but also at intermediate steps. At this point we have prototype implementations driven by the requirements from the use cases that motivate UncertWeb. iii) Sensitivity and uncertainty analysis on model outputs. Here we show the design of the overall system we are developing, including the deployment of an emulator framework to allow computationally efficient approaches. We conclude with a summary of the open issues and remaining challenges we are facing in UncertWeb, and provide a brief overview of how we plan to tackle these.

  3. A workflow to process 3D+time microscopy images of developing organisms and reconstruct their cell lineage

    PubMed Central

    Faure, Emmanuel; Savy, Thierry; Rizzi, Barbara; Melani, Camilo; Stašová, Olga; Fabrèges, Dimitri; Špir, Róbert; Hammons, Mark; Čúnderlík, Róbert; Recher, Gaëlle; Lombardot, Benoît; Duloquin, Louise; Colin, Ingrid; Kollár, Jozef; Desnoulez, Sophie; Affaticati, Pierre; Maury, Benoît; Boyreau, Adeline; Nief, Jean-Yves; Calvat, Pascal; Vernier, Philippe; Frain, Monique; Lutfalla, Georges; Kergosien, Yannick; Suret, Pierre; Remešíková, Mariana; Doursat, René; Sarti, Alessandro; Mikula, Karol; Peyriéras, Nadine; Bourgine, Paul

    2016-01-01

    The quantitative and systematic analysis of embryonic cell dynamics from in vivo 3D+time image data sets is a major challenge at the forefront of developmental biology. Despite recent breakthroughs in the microscopy imaging of living systems, producing an accurate cell lineage tree for any developing organism remains a difficult task. We present here the BioEmergences workflow integrating all reconstruction steps from image acquisition and processing to the interactive visualization of reconstructed data. Original mathematical methods and algorithms underlie image filtering, nucleus centre detection, nucleus and membrane segmentation, and cell tracking. They are demonstrated on zebrafish, ascidian and sea urchin embryos with stained nuclei and membranes. Subsequent validation and annotations are carried out using Mov-IT, a custom-made graphical interface. Compared with eight other software tools, our workflow achieved the best lineage score. Delivered in standalone or web service mode, BioEmergences and Mov-IT offer a unique set of tools for in silico experimental embryology. PMID:26912388
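
    One reconstruction step, nucleus centre detection, can be illustrated with a short scikit-image sketch on a synthetic 2D image (Gaussian smoothing followed by local-maximum detection). The BioEmergences workflow uses its own dedicated algorithms on 3D+time data, so this is only an approximation of the idea.

      # Minimal sketch of nucleus centre detection on a synthetic 2D image:
      # smooth the image, then find bright local maxima as candidate centres.
      import numpy as np
      from skimage import filters, feature

      # synthetic "stained nuclei": a noisy background with three bright spots
      rng = np.random.default_rng(0)
      img = rng.normal(0.1, 0.02, (200, 200))
      yy, xx = np.mgrid[0:200, 0:200]
      for y, x in [(50, 60), (120, 40), (150, 160)]:
          img += 0.8 * np.exp(-((yy - y) ** 2 + (xx - x) ** 2) / (2 * 5.0 ** 2))

      smoothed = filters.gaussian(img, sigma=3)  # denoise before peak detection
      centres = feature.peak_local_max(smoothed, min_distance=15, threshold_rel=0.5)
      print(centres)  # approximate nucleus centre coordinates (row, col)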

  4. Automated metadata--final project report

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Schissel, David

    This report summarizes the work of the Automated Metadata, Provenance Cataloging, and Navigable Interfaces: Ensuring the Usefulness of Extreme-Scale Data Project (MPO Project) funded by the United States Department of Energy (DOE), Offices of Advanced Scientific Computing Research and Fusion Energy Sciences. Initially funded for three years starting in 2012, it was extended for 6 months with additional funding. The project was a collaboration between scientists at General Atomics, Lawrence Berkeley National Laboratory (LBNL), and Massachusetts Institute of Technology (MIT). The group leveraged existing computer science technology where possible, and extended or created new capabilities where required. The MPO project was able to successfully create a suite of software tools that can be used by a scientific community to automatically document their scientific workflows. These tools were integrated into workflows for fusion energy and climate research, illustrating the general applicability of the project’s toolkit. Feedback was very positive on the project’s toolkit and the value of such automatic workflow documentation to the scientific endeavor.

  5. Metavisitor, a Suite of Galaxy Tools for Simple and Rapid Detection and Discovery of Viruses in Deep Sequence Data

    PubMed Central

    Vernick, Kenneth D.

    2017-01-01

    Metavisitor is a software package that allows biologists and clinicians without specialized bioinformatics expertise to detect and assemble viral genomes from deep sequence datasets. The package is composed of a set of modular bioinformatic tools and workflows that are implemented in the Galaxy framework. Using the graphical Galaxy workflow editor, users with minimal computational skills can use existing Metavisitor workflows or adapt them to suit specific needs by adding or modifying analysis modules. Metavisitor works with DNA, RNA or small RNA sequencing data over a range of read lengths and can use a combination of de novo and guided approaches to assemble genomes from sequencing reads. We show that the software has the potential for quick diagnosis as well as discovery of viruses from a vast array of organisms. Importantly, we provide here executable Metavisitor use cases, which increase the accessibility and transparency of the software, ultimately enabling biologists or clinicians to focus on biological or medical questions. PMID:28045932

  6. Seamless online science workflow development and collaboration using IDL and the ENVI Services Engine

    NASA Astrophysics Data System (ADS)

    Harris, A. T.; Ramachandran, R.; Maskey, M.

    2013-12-01

    The Exelis-developed IDL and ENVI software are ubiquitous tools in Earth science research environments. The IDL Workbench is used by the Earth science community for programming custom data analysis and visualization modules. ENVI is a software solution for processing and analyzing geospatial imagery that combines support for multiple Earth observation scientific data types (optical, thermal, multi-spectral, hyperspectral, SAR, LiDAR) with advanced image processing and analysis algorithms. The ENVI & IDL Services Engine (ESE) is an Earth science data processing engine that allows researchers to use open standards to rapidly create, publish and deploy advanced Earth science data analytics within any existing enterprise infrastructure. Although powerful in many ways, the tools lack collaborative features out-of-box. Thus, as part of the NASA funded project, Collaborative Workbench to Accelerate Science Algorithm Development, researchers at the University of Alabama in Huntsville and Exelis have developed plugins that allow seamless research collaboration from within IDL workbench. Such additional features within IDL workbench are possible because IDL workbench is built using the Eclipse Rich Client Platform (RCP). RCP applications allow custom plugins to be dropped in for extended functionalities. Specific functionalities of the plugins include creating complex workflows based on IDL application source code, submitting workflows to be executed by ESE in the cloud, and sharing and cloning of workflows among collaborators. All these functionalities are available to scientists without leaving their IDL workbench. Because ESE can interoperate with any middleware, scientific programmers can readily string together IDL processing tasks (or tasks written in other languages like C++, Java or Python) to create complex workflows for deployment within their current enterprise architecture (e.g. ArcGIS Server, GeoServer, Apache ODE or SciFlo from JPL). Using the collaborative IDL Workbench, coupled with ESE for execution in the cloud, asynchronous workflows could be executed in batch mode on large data in the cloud. We envision that a scientist will initially develop a scientific workflow locally on a small set of data. Once tested, the scientist will deploy the workflow to the cloud for execution. Depending on the results, the scientist may share the workflow and results, allowing them to be stored in a community catalog and instantly loaded into the IDL Workbench of other scientists. Thereupon, scientists can clone and modify or execute the workflow with different input parameters. The Collaborative Workbench will provide a platform for collaboration in the cloud, helping Earth scientists solve big-data problems in the Earth and planetary sciences.

  7. Scientific Workflows + Provenance = Better (Meta-)Data Management

    NASA Astrophysics Data System (ADS)

    Ludaescher, B.; Cuevas-Vicenttín, V.; Missier, P.; Dey, S.; Kianmajd, P.; Wei, Y.; Koop, D.; Chirigati, F.; Altintas, I.; Belhajjame, K.; Bowers, S.

    2013-12-01

    The origin and processing history of an artifact is known as its provenance. Data provenance is an important form of metadata that explains how a particular data product came about, e.g., how and when it was derived in a computational process, which parameter settings and input data were used, etc. Provenance information provides transparency and helps to explain and interpret data products. Other common uses and applications of provenance include quality control, data curation, result debugging, and more generally, 'reproducible science'. Scientific workflow systems (e.g. Kepler, Taverna, VisTrails, and others) provide controlled environments for developing computational pipelines with built-in provenance support. Workflow results can then be explained in terms of workflow steps, parameter settings, input data, etc. using provenance that is automatically captured by the system. Scientific workflows themselves provide a user-friendly abstraction of the computational process and are thus a form of ('prospective') provenance in their own right. The full potential of provenance information is realized when combining workflow-level information (prospective provenance) with trace-level information (retrospective provenance). To this end, the DataONE Provenance Working Group (ProvWG) has developed an extension of the W3C PROV standard, called D-PROV. Whereas PROV provides a 'least common denominator' for exchanging and integrating provenance information, D-PROV adds new 'observables' that describe workflow-level information (e.g., the functional steps in a pipeline), as well as workflow-specific trace-level information (timestamps for each workflow step executed, the inputs and outputs used, etc.). Using examples, we will demonstrate how the combination of prospective and retrospective provenance provides added value in managing scientific data. The DataONE ProvWG is also developing tools based on D-PROV that allow scientists to get more mileage from provenance metadata. DataONE is a federation of member nodes that store data and metadata for discovery and access. By enriching metadata with provenance information, search and reuse of data are enhanced, and the 'social life' of data (being the product of many workflow runs, different people, etc.) is revealed. We are currently prototyping a provenance repository (PBase) to demonstrate what can be achieved with advanced provenance queries. The ProvExplorer and ProPub tools support advanced ad-hoc querying and visualization of provenance as well as customized provenance publications (e.g., to address privacy issues, or to focus provenance to relevant details). In a parallel line of work, we are exploring ways to add provenance support to widely-used scripting platforms (e.g. R and Python) and then expose that information via D-PROV.
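
    A minimal sketch of recording retrospective provenance with the core W3C PROV model is shown below, using the Python prov package; the workflow-level D-PROV extensions discussed above are not part of this example, and the namespace, entity, and activity names are invented for illustration.

      # Minimal sketch: one activity reads an input file and generates an
      # output file, attributed to one agent, recorded with core W3C PROV.
      from prov.model import ProvDocument

      doc = ProvDocument()
      doc.add_namespace("ex", "http://example.org/")   # placeholder namespace

      data_in = doc.entity("ex:input.csv")
      data_out = doc.entity("ex:result.csv")
      step = doc.activity("ex:run_analysis_step")
      author = doc.agent("ex:some_scientist")

      doc.used(step, data_in)                # the step consumed the input
      doc.wasGeneratedBy(data_out, step)     # and produced the output
      doc.wasAssociatedWith(step, author)    # under this agent's control
      doc.wasDerivedFrom(data_out, data_in)  # derivation link between the files

      print(doc.get_provn())                 # human-readable PROV-N serialization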

  8. A strategy for systemic toxicity assessment based on non-animal approaches: The Cosmetics Europe Long Range Science Strategy programme.

    PubMed

    Desprez, Bertrand; Dent, Matt; Keller, Detlef; Klaric, Martina; Ouédraogo, Gladys; Cubberley, Richard; Duplan, Hélène; Eilstein, Joan; Ellison, Corie; Grégoire, Sébastien; Hewitt, Nicola J; Jacques-Jamin, Carine; Lange, Daniela; Roe, Amy; Rothe, Helga; Blaauboer, Bas J; Schepky, Andreas; Mahony, Catherine

    2018-08-01

    When performing safety assessment of chemicals, the evaluation of their systemic toxicity based only on non-animal approaches is a challenging objective. The Safety Evaluation Ultimately Replacing Animal Test programme (SEURAT-1) addressed this question from 2011 to 2015 and showed that further research and development of adequate tools in toxicokinetics and toxicodynamics are required for performing non-animal safety assessments. It also showed how to implement tools like thresholds of toxicological concern (TTCs) and read-across in this context. This paper shows a tiered scientific workflow and how each tier addresses the four steps of the risk assessment paradigm. Cosmetics Europe established its Long Range Science Strategy (LRSS) programme, running from 2016 to 2020, based on the outcomes of SEURAT-1 to implement this workflow. Dedicated specific projects address each step of this workflow, which is introduced here. It tackles the question of evaluating the internal dose when systemic exposure happens. The applicability of the workflow will be shown through a series of case studies, which will be published separately. Although the LRSS puts the emphasis on the safety assessment of cosmetics-relevant chemicals, it remains applicable to any type of chemical. Copyright © 2018. Published by Elsevier Ltd.

  9. Safety and feasibility of STAT RAD: Improvement of a novel rapid tomotherapy-based radiation therapy workflow by failure mode and effects analysis.

    PubMed

    Jones, Ryan T; Handsfield, Lydia; Read, Paul W; Wilson, David D; Van Ausdal, Ray; Schlesinger, David J; Siebers, Jeffrey V; Chen, Quan

    2015-01-01

    The clinical challenge of radiation therapy (RT) for painful bone metastases requires clinicians to consider both treatment efficacy and patient prognosis when selecting a radiation therapy regimen. The traditional RT workflow requires several weeks for common palliative RT schedules of 30 Gy in 10 fractions or 20 Gy in 5 fractions. At our institution, we have created a new RT workflow termed "STAT RAD" that allows clinicians to perform computed tomographic (CT) simulation, planning, and highly conformal single fraction treatment delivery within 2 hours. In this study, we evaluate the safety and feasibility of the STAT RAD workflow. A failure mode and effects analysis (FMEA) was performed on the STAT RAD workflow, including development of a process map, identification of potential failure modes, description of the cause and effect, temporal occurrence, and team member involvement in each failure mode, and examination of existing safety controls. A risk probability number (RPN) was calculated for each failure mode. As necessary, workflow adjustments were then made to safeguard against failure modes with significant RPN values. After workflow alterations, RPNs were recomputed. A total of 72 potential failure modes were identified in the pre-FMEA STAT RAD workflow, of which 22 met the RPN threshold for clinical significance. Workflow adjustments included the addition of a team member checklist, changing simulation from megavoltage CT to kilovoltage CT, alteration of patient-specific quality assurance testing, and allocating increased time for critical workflow steps. After these modifications, only 1 failure mode maintained RPN significance: patient motion after alignment or during treatment. Performing the FMEA for the STAT RAD workflow before clinical implementation has significantly strengthened the safety and feasibility of STAT RAD. The FMEA proved a valuable evaluation tool, identifying potential problem areas so that we could create a safer workflow. Copyright © 2015 American Society for Radiation Oncology. Published by Elsevier Inc. All rights reserved.
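
    The RPN bookkeeping at the heart of an FMEA can be sketched in a few lines: each failure mode is scored for severity, occurrence, and detectability (conventionally on 1-10 scales) and the RPN is their product. The failure modes, scores, and threshold below are illustrative assumptions, not the values used in this study.

      # Minimal sketch of FMEA risk scoring: RPN = severity * occurrence * detectability.
      failure_modes = {
          "patient motion during treatment": (9, 4, 5),  # illustrative scores
          "wrong CT dataset loaded":         (8, 2, 3),
      }

      def rpn(severity, occurrence, detectability):
          return severity * occurrence * detectability

      threshold = 100  # example cut-off for "clinically significant"
      for name, scores in failure_modes.items():
          value = rpn(*scores)
          flag = "REVIEW" if value >= threshold else "ok"
          print(f"{name}: RPN={value} ({flag})")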

  10. Deploying and sharing U-Compare workflows as web services.

    PubMed

    Kontonatsios, Georgios; Korkontzelos, Ioannis; Kolluru, Balakrishna; Thompson, Paul; Ananiadou, Sophia

    2013-02-18

    U-Compare is a text mining platform that allows the construction, evaluation and comparison of text mining workflows. U-Compare contains a large library of components that are tuned to the biomedical domain. Users can rapidly develop biomedical text mining workflows by mixing and matching U-Compare's components. Workflows developed using U-Compare can be exported and sent to other users who, in turn, can import and re-use them. However, the resulting workflows are standalone applications, i.e., software tools that run and are accessible only via a local machine, and that can only be run with the U-Compare platform. We address the above issues by extending U-Compare to convert standalone workflows into web services automatically, via a two-click process. The resulting web services can be registered on a central server and made publicly available. Alternatively, users can make web services available on their own servers, after installing the web application framework, which is part of the extension to U-Compare. We have performed a user-oriented evaluation of the proposed extension, by asking users who have tested the enhanced functionality of U-Compare to complete questionnaires that assess its functionality, reliability, usability, efficiency and maintainability. The results obtained reveal that the new functionality is well received by users. The web services produced by U-Compare are built on top of open standards, i.e., REST and SOAP protocols, and therefore, they are decoupled from the underlying platform. Exported workflows can be integrated with any application that supports these open standards. We demonstrate how the newly extended U-Compare enhances the cross-platform interoperability of workflows, by seamlessly importing a number of text mining workflow web services exported from U-Compare into Taverna, i.e., a generic scientific workflow construction platform.

  11. Deploying and sharing U-Compare workflows as web services

    PubMed Central

    2013-01-01

    Background U-Compare is a text mining platform that allows the construction, evaluation and comparison of text mining workflows. U-Compare contains a large library of components that are tuned to the biomedical domain. Users can rapidly develop biomedical text mining workflows by mixing and matching U-Compare’s components. Workflows developed using U-Compare can be exported and sent to other users who, in turn, can import and re-use them. However, the resulting workflows are standalone applications, i.e., software tools that run and are accessible only via a local machine, and that can only be run with the U-Compare platform. Results We address the above issues by extending U-Compare to convert standalone workflows into web services automatically, via a two-click process. The resulting web services can be registered on a central server and made publicly available. Alternatively, users can make web services available on their own servers, after installing the web application framework, which is part of the extension to U-Compare. We have performed a user-oriented evaluation of the proposed extension, by asking users who have tested the enhanced functionality of U-Compare to complete questionnaires that assess its functionality, reliability, usability, efficiency and maintainability. The results obtained reveal that the new functionality is well received by users. Conclusions The web services produced by U-Compare are built on top of open standards, i.e., REST and SOAP protocols, and therefore, they are decoupled from the underlying platform. Exported workflows can be integrated with any application that supports these open standards. We demonstrate how the newly extended U-Compare enhances the cross-platform interoperability of workflows, by seamlessly importing a number of text mining workflow web services exported from U-Compare into Taverna, i.e., a generic scientific workflow construction platform. PMID:23419017

  12. Workflows in bioinformatics: meta-analysis and prototype implementation of a workflow generator.

    PubMed

    Garcia Castro, Alexander; Thoraval, Samuel; Garcia, Leyla J; Ragan, Mark A

    2005-04-07

    Computational methods for problem solving need to interleave information access and algorithm execution in a problem-specific workflow. The structures of these workflows are defined by a scaffold of syntactic, semantic and algebraic objects capable of representing them. Despite the proliferation of GUIs (Graphic User Interfaces) in bioinformatics, only some of them provide workflow capabilities; surprisingly, no meta-analysis of workflow operators and components in bioinformatics has been reported. We present a set of syntactic components and algebraic operators capable of representing analytical workflows in bioinformatics. Iteration, recursion, the use of conditional statements, and management of suspend/resume tasks have traditionally been implemented on an ad hoc basis and hard-coded; by having these operators properly defined it is possible to use and parameterize them as generic re-usable components. To illustrate how these operations can be orchestrated, we present GPIPE, a prototype graphic pipeline generator for PISE that allows the definition of a pipeline, parameterization of its component methods, and storage of metadata in XML formats. This implementation goes beyond the macro capacities currently in PISE. As the entire analysis protocol is defined in XML, a complete bioinformatic experiment (linked sets of methods, parameters and results) can be reproduced or shared among users. http://if-web1.imb.uq.edu.au/Pise/5.a/gpipe.html (interactive), ftp://ftp.pasteur.fr/pub/GenSoft/unix/misc/Pise/ (download). From our meta-analysis we have identified syntactic structures and algebraic operators common to many workflows in bioinformatics. The workflow components and algebraic operators can be assimilated into re-usable software components. GPIPE, a prototype implementation of this framework, provides a GUI builder to facilitate the generation of workflows and integration of heterogeneous analytical tools.
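
    The notion of generic, re-usable workflow operators can be sketched with plain higher-order functions for sequencing, conditionals, and iteration; GPIPE itself is a GUI/XML pipeline generator for PISE, so the Python below only illustrates the underlying idea with invented toy steps.

      # Minimal sketch: workflow operators as re-usable higher-order components.
      def sequence(*steps):
          """Run steps one after another, feeding each output to the next."""
          def run(data):
              for step in steps:
                  data = step(data)
              return data
          return run

      def conditional(predicate, if_true, if_false):
          """Branch the workflow on a predicate over the intermediate data."""
          return lambda data: if_true(data) if predicate(data) else if_false(data)

      def iterate(step, times):
          """Apply the same step repeatedly (a bounded loop operator)."""
          def run(data):
              for _ in range(times):
                  data = step(data)
              return data
          return run

      # toy pipeline: strip, repeatedly collapse double spaces, then branch on case
      pipeline = sequence(str.strip,
                          iterate(lambda s: s.replace("  ", " "), 3),
                          conditional(lambda s: s.isupper(), str.lower, str.title))
      print(pipeline("  HELLO   WORKFLOW  WORLD  "))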

  13. Learning from data to design functional materials without inversion symmetry

    PubMed Central

    Balachandran, Prasanna V.; Young, Joshua; Lookman, Turab; Rondinelli, James M.

    2017-01-01

    Accelerating the search for functional materials is a challenging problem. Here we develop an informatics-guided ab initio approach to accelerate the design and discovery of noncentrosymmetric materials. The workflow integrates group theory, informatics and density-functional theory to uncover design guidelines for predicting noncentrosymmetric compounds, which we apply to layered Ruddlesden-Popper oxides. Group theory identifies how configurations of oxygen octahedral rotation patterns, ordered cation arrangements and their interplay break inversion symmetry, while informatics tools learn from available data to select candidate compositions that fulfil the group-theoretical postulates. Our key outcome is the identification of 242 compositions after screening ∼3,200 that show potential for noncentrosymmetric structures, a 25-fold increase in the projected number of known noncentrosymmetric Ruddlesden-Popper oxides. We validate our predictions for 19 compounds using phonon calculations, among which 17 have noncentrosymmetric ground states including two potential multiferroics. Our approach enables rational design of materials with targeted crystal symmetries and functionalities. PMID:28211456

  14. Forecasting production in Liquid Rich Shale plays

    NASA Astrophysics Data System (ADS)

    Nikfarman, Hanieh

    Production from Liquid Rich Shale (LRS) reservoirs is taking center stage in the exploration and production of unconventional reservoirs. Production from the low and ultra-low permeability LRS plays is possible only through multi-fractured horizontal wells (MFHWs). There is no existing workflow that is applicable to forecasting multi-phase production from MFHWs in LRS plays. This project presents a practical and rigorous workflow for forecasting multiphase production from MFHWs in LRS reservoirs. There has been much effort in developing workflows and methodology for forecasting in tight/shale plays in recent years. The existing workflows, however, are applicable only to single-phase flow, and are primarily used in shale gas plays. These methodologies do not apply to the multi-phase flow that is inevitable in LRS plays. To account for the complexities of multiphase flow in MFHWs, the only available technique is dynamic modeling in compositional numerical simulators. These are time consuming and not practical when it comes to forecasting production and estimating reserves for a large number of producers. A workflow was developed and validated by compositional numerical simulation. The workflow honors the physics of flow and is sufficiently accurate, yet practical, so that an analyst can readily apply it to forecast production and estimate reserves for a large number of producers in a short period of time. To simplify the complex multiphase flow in MFHWs, the workflow divides production into an initial period, where large production and pressure declines are expected, and a subsequent period, where production decline may converge into a common trend for a number of producers across an area of interest in the field. The initial period assumes that production is dominated by single-phase flow of oil and uses the tri-linear flow model of Erdal Ozkan to estimate the production history. Readily available commercial software can simulate flow and forecast production in this period. In the subsequent period, dimensionless rate and dimensionless time functions are introduced that help identify the transition from the initial period into the subsequent period. The production trends in terms of the dimensionless parameters converge for a range of rock permeability and stimulation intensity. This helps forecast production beyond the transition to the end of the well's life. This workflow is applicable to a single fluid system.

  15. Exploring Dental Providers’ Workflow in an Electronic Dental Record Environment

    PubMed Central

    Schwei, Kelsey M; Cooper, Ryan; Mahnke, Andrea N.; Ye, Zhan

    2016-01-01

    Summary Background A workflow is defined as a predefined set of work steps and partial ordering of these steps in any environment to achieve the expected outcome. Few studies have investigated the workflow of providers in a dental office. It is important to understand the interaction of dental providers with the existing technologies at the point of care in order to assess breakdowns in the workflow, which could inform better technology designs. Objective The study objective was to assess electronic dental record (EDR) workflows using time and motion methodology in order to identify breakdowns and opportunities for process improvement. Methods A time and motion methodology was used to study the human-computer interaction and workflow of dental providers with an EDR in four dental centers at a large healthcare organization. A data collection tool was developed to capture the workflow of dental providers and staff while they interacted with an EDR during initial, planned, and emergency patient visits, and at the front desk. Qualitative and quantitative analyses were conducted on the observational data. Results Breakdowns in workflow were identified while posting charges, viewing radiographs, e-prescribing, and interacting with the patient scheduler. EDR interaction time was significantly different between dentists and dental assistants (6:20 min vs. 10:57 min, p = 0.013) and between dentists and dental hygienists (6:20 min vs. 9:36 min, p = 0.003). Conclusions On average, a dentist spent far less time than dental assistants and dental hygienists in data recording within the EDR. PMID:27437058

  16. Auspice: Automatic Service Planning in Cloud/Grid Environments

    NASA Astrophysics Data System (ADS)

    Chiu, David; Agrawal, Gagan

    Recent scientific advances have fostered a mounting number of services and data sets available for utilization. These resources, though scattered across disparate locations, are often loosely coupled both semantically and operationally. This loosely coupled relationship implies the possibility of linking together operations and data sets to answer queries. This task, generally known as automatic service composition, therefore abstracts the process of complex scientific workflow planning from the user. We have been exploring a metadata-driven approach toward automatic service workflow composition, among other enabling mechanisms, in our system, Auspice: Automatic Service Planning in Cloud/Grid Environments. In this paper, we present a complete overview of our system's unique features and outlooks for future deployment as the Cloud computing paradigm becomes increasingly prominent in enabling scientific computing.

  17. BGDMdocker: a Docker workflow for data mining and visualization of bacterial pan-genomes and biosynthetic gene clusters.

    PubMed

    Cheng, Gong; Lu, Quan; Ma, Ling; Zhang, Guocai; Xu, Liang; Zhou, Zongshan

    2017-01-01

    Recently, Docker technology has received increasing attention throughout the bioinformatics community. However, its implementation has not yet been mastered by most biologists; accordingly, its application in biological research has been limited. In order to popularize this technology in the field of bioinformatics and to promote the use of publicly available bioinformatics tools, such as Dockerfiles and Images from communities, government sources, and private owners in the Docker Hub Registry and other Docker-based resources, we introduce here a complete and accurate bioinformatics workflow based on Docker. The present workflow enables analysis and visualization of pan-genomes and biosynthetic gene clusters of bacteria. This provides a new solution for bioinformatics mining of big data from various publicly available biological databases. The present step-by-step guide creates an integrative workflow through a Dockerfile to allow researchers to build their own Image and run Container easily.
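    For readers unfamiliar with driving Docker programmatically, the following sketch shows the build-and-run pattern the record describes, using the Docker SDK for Python. The directory, image tag, command, and volume paths are hypothetical placeholders rather than the actual BGDMdocker recipe, and a running local Docker daemon is assumed.

    ```python
    # A minimal sketch, assuming the 'docker' Python SDK (pip install docker) and a
    # local Docker daemon. Paths, tags, and the container command are hypothetical.
    import docker

    client = docker.from_env()

    # Build an image from a directory containing a Dockerfile.
    image, build_logs = client.images.build(path="./bgdm-workflow", tag="pangenome-workflow:latest")
    for chunk in build_logs:
        if "stream" in chunk:
            print(chunk["stream"], end="")

    # Run the analysis container, mounting a host data directory read-only.
    output = client.containers.run(
        "pangenome-workflow:latest",
        command="run_analysis --input /data/genomes",          # hypothetical entry point
        volumes={"/host/genomes": {"bind": "/data/genomes", "mode": "ro"}},
        remove=True,
    )
    print(output.decode())
    ```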

  18. BGDMdocker: a Docker workflow for data mining and visualization of bacterial pan-genomes and biosynthetic gene clusters

    PubMed Central

    Cheng, Gong; Zhang, Guocai; Xu, Liang

    2017-01-01

    Recently, Docker technology has received increasing attention throughout the bioinformatics community. However, its implementation has not yet been mastered by most biologists; accordingly, its application in biological research has been limited. In order to popularize this technology in the field of bioinformatics and to promote the use of publicly available bioinformatics tools, such as Dockerfiles and Images from communities, government sources, and private owners in the Docker Hub Registry and other Docker-based resources, we introduce here a complete and accurate bioinformatics workflow based on Docker. The present workflow enables analysis and visualization of pan-genomes and biosynthetic gene clusters of bacteria. This provides a new solution for bioinformatics mining of big data from various publicly available biological databases. The present step-by-step guide creates an integrative workflow through a Dockerfile to allow researchers to build their own Image and run Container easily. PMID:29204317

  19. Web-Accessible Scientific Workflow System for Performance Monitoring

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Roelof Versteeg; Roelof Versteeg; Trevor Rowe

    2006-03-01

    We describe the design and implementation of a web accessible scientific workflow system for environmental monitoring. This workflow environment integrates distributed, automated data acquisition with server side data management and information visualization through flexible browser based data access tools. Component technologies include a rich browser-based client (using dynamic Javascript and HTML/CSS) for data selection, a back-end server which uses PHP for data processing, user management, and result delivery, and third party applications which are invoked by the back-end using webservices. This environment allows for reproducible, transparent result generation by a diverse user base. It has been implemented for several monitoring systems with different degrees of complexity.


  20. Lessons from implementing a combined workflow-informatics system for diabetes management.

    PubMed

    Zai, Adrian H; Grant, Richard W; Estey, Greg; Lester, William T; Andrews, Carl T; Yee, Ronnie; Mort, Elizabeth; Chueh, Henry C

    2008-01-01

    Shortcomings surrounding the care of patients with diabetes have been attributed largely to a fragmented, disorganized, and duplicative health care system that focuses more on acute conditions and complications than on managing chronic disease. To address these shortcomings, we developed a diabetes registry population management application to change the way our staff manages patients with diabetes. Use of this new application has helped us coordinate the responsibilities for intervening and monitoring patients in the registry among different users. Our experiences using this combined workflow-informatics intervention system suggest that integrating a chronic disease registry into clinical workflow for the treatment of chronic conditions creates a useful and efficient tool for managing disease.

  1. Dynamic reusable workflows for ocean science

    USGS Publications Warehouse

    Signell, Richard; Fernandez, Filipe; Wilcox, Kyle

    2016-01-01

    Digital catalogs of ocean data have been available for decades, but advances in standardized services and software for catalog search and data access make it now possible to create catalog-driven workflows that automate — end-to-end — data search, analysis and visualization of data from multiple distributed sources. Further, these workflows may be shared, reused and adapted with ease. Here we describe a workflow developed within the US Integrated Ocean Observing System (IOOS) which automates the skill-assessment of water temperature forecasts from multiple ocean forecast models, allowing improved forecast products to be delivered for an open water swim event. A series of Jupyter Notebooks are used to capture and document the end-to-end workflow using a collection of Python tools that facilitate working with standardized catalog and data services. The workflow first searches a catalog of metadata using the Open Geospatial Consortium (OGC) Catalog Service for the Web (CSW), then accesses data service endpoints found in the metadata records using the OGC Sensor Observation Service (SOS) for in situ sensor data and OPeNDAP services for remotely-sensed and model data. Skill metrics are computed and time series comparisons of forecast model and observed data are displayed interactively, leveraging the capabilities of modern web browsers. The resulting workflow not only solves a challenging specific problem, but highlights the benefits of dynamic, reusable workflows in general. These workflows adapt as new data enters the data system, facilitate reproducible science, provide templates from which new scientific workflows can be developed, and encourage data providers to use standardized services. As applied to the ocean swim event, the workflow exposed problems with two of the ocean forecast products which led to improved regional forecasts once errors were corrected. While the example is specific, the approach is general, and we hope to see increased use of dynamic notebooks across the geoscience domains.
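    As a small illustration of the catalog-search step described above, the sketch below queries a CSW endpoint with OWSLib and lists the service references found in the matching records. The endpoint URL and the search phrase are placeholders, not the specific IOOS catalog or query used in the paper.

    ```python
    # A minimal sketch of catalog-driven data discovery, assuming the OWSLib package
    # (pip install OWSLib). The CSW endpoint and search term are placeholders.
    from owslib.csw import CatalogueServiceWeb
    from owslib.fes import PropertyIsLike

    CSW_ENDPOINT = "https://example.org/csw"  # placeholder catalog service

    csw = CatalogueServiceWeb(CSW_ENDPOINT, timeout=60)

    # Free-text filter for water temperature forecast records.
    query = PropertyIsLike("csw:AnyText", literal="%sea_water_temperature%")
    csw.getrecords2(constraints=[query], maxrecords=10, esn="full")

    # Each record's references typically include the SOS/OPeNDAP endpoints to access next.
    for rec_id, rec in csw.records.items():
        print(rec.title)
        for ref in rec.references:
            print("  ", ref.get("scheme"), ref.get("url"))
    ```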

  2. Digital transformation in home care. A case study.

    PubMed

    Bennis, Sandy; Costanzo, Diane; Flynn, Ann Marie; Reidy, Agatha; Tronni, Catherine

    2007-01-01

    Simply implementing software and technology does not assure that an organization's targeted clinical and financial goals will be realized. No longer is it possible to roll out a new system--by solely providing end user training and overlaying it on top of already inefficient workflows and outdated roles--and know with certainty that targets will be met. At Virtua Health's Home Care, based in south New Jersey, implementation of their electronic system initially followed this more traditional approach. Unable to completely attain their earlier identified return on investment, they enlisted the help of a new role within their health system, that of the nurse informaticist. Knowledgeable in complex clinical processes and not bound by the technology at hand, the informaticist analyzed physical workflow, digital workflow, roles and physical layout. Leveraging specific tools such as change acceleration, workouts and LEAN, the informaticist was able to redesign workflow and support new levels of functionality. This article provides a view from the "finish line", recounting how this role worked with home care to assimilate information delivery into more efficient processes and align resources to support the new workflow, ultimately achieving real tangible returns.

  3. The View from a Few Hundred Feet : A New Transparent and Integrated Workflow for UAV-collected Data

    NASA Astrophysics Data System (ADS)

    Peterson, F. S.; Barbieri, L.; Wyngaard, J.

    2015-12-01

    Unmanned Aerial Vehicles (UAVs) allow scientists and civilians to monitor earth and atmospheric conditions in remote locations. To keep up with the rapid evolution of UAV technology, data workflows must also be flexible, integrated, and introspective. Here, we present our data workflow for a project to assess the feasibility of detecting threshold levels of methane, carbon-dioxide, and other aerosols by mounting consumer-grade gas analysis sensors on UAV's. Particularly, we highlight our use of Project Jupyter, a set of open-source software tools and documentation designed for developing "collaborative narratives" around scientific workflows. By embracing the GitHub-backed, multi-language systems available in Project Jupyter, we enable interaction and exploratory computation while simultaneously embracing distributed version control. Additionally, the transparency of this method builds trust with civilians and decision-makers and leverages collaboration and communication to resolve problems. The goal of this presentation is to provide a generic data workflow for scientific inquiries involving UAVs and to invite the participation of the AGU community in its improvement and curation.

  4. Global Adjoint Tomography: Next-Generation Models

    NASA Astrophysics Data System (ADS)

    Bozdag, Ebru; Lefebvre, Matthieu; Lei, Wenjie; Orsvuran, Ridvan; Peter, Daniel; Ruan, Youyi; Smith, James; Komatitsch, Dimitri; Tromp, Jeroen

    2017-04-01

    The first-generation global adjoint tomography model GLAD-M15 (Bozdag et al. 2016) is the result of 15 conjugate-gradient iterations based on GPU-accelerated spectral-element simulations of 3D wave propagation and Fréchet kernels. For simplicity, GLAD-M15 was constructed as an elastic model with transverse isotropy confined to the upper mantle. However, Earth's mantle and crust show significant evidence of anisotropy as a result of their composition and deformation. There may be different sources of seismic anisotropy affecting both body and surface waves. As a first attempt, we initially tackle surface-wave anisotropy and proceed with iterations using the same 253-earthquake data set used in GLAD-M15, with an emphasis on the upper mantle. Furthermore, we explore new misfits, such as double-difference measurements (Yuan et al. 2016), to better deal with the possible artifacts of the uneven distribution of seismic stations globally and to minimize source uncertainties in structural inversions. We will present our observations with the initial results of azimuthally anisotropic inversions and also discuss the next-generation global models with various parametrizations. Meanwhile our goal is to use all available seismic data in imaging. This, however, requires a solid framework to perform iterative adjoint tomography workflows with big data on supercomputers. We will talk about developments in the adjoint tomography workflow, from the need to define new seismic and computational data formats (e.g., ASDF by Krischer et al. 2016, ADIOS by Liu et al. 2011) to the development of new pre- and post-processing tools, together with experimenting with workflow management tools such as Pegasus (Deelman et al. 2015). All our simulations are performed on Oak Ridge National Laboratory's Cray XK7 "Titan" system. Our ultimate aim is to get ready to harness ORNL's next-generation supercomputer "Summit", an IBM system with Power9 CPUs and NVIDIA Volta GPU accelerators, to be ready by 2018, which will enable us to reduce the shortest period in our global simulations from 17 s to 9 s; exascale systems will reduce this further to just a few seconds.

  5. Making Sense of Complexity with FRE, a Scientific Workflow System for Climate Modeling (Invited)

    NASA Astrophysics Data System (ADS)

    Langenhorst, A. R.; Balaji, V.; Yakovlev, A.

    2010-12-01

    A workflow is a description of a sequence of activities that is both precise and comprehensive. Capturing the workflow of climate experiments provides a record which can be queried or compared, and allows reproducibility of the experiments - sometimes even to the bit level of the model output. This reproducibility helps to verify the integrity of the output data, and enables easy perturbation experiments. GFDL's Flexible Modeling System Runtime Environment (FRE) is a production-level software project which defines and implements building blocks of the workflow as command line tools. The scientific, numerical and technical input needed to complete the workflow of an experiment is recorded in an experiment description file in XML format. Several key features add convenience and automation to the FRE workflow: ● Experiment inheritance makes it possible to define a new experiment with only a reference to the parent experiment and the parameters to override. ● Testing is a basic element of the FRE workflow: experiments define short test runs which are verified before the main experiment is run, and a set of standard experiments is verified with new code releases. ● FRE is flexible enough to support everything from short runs with mere megabytes of data to high-resolution experiments that run on thousands of processors for months, producing terabytes of output data. Experiments run in segments of model time; after each segment, the state is saved and the model can be checkpointed at that level. Segment length is defined by the user, but the number of segments per system job is calculated to fit optimally within the batch scheduler requirements. FRE provides job control across multiple segments, and tools to monitor and alter the state of long-running experiments. ● Experiments are entered into a Curator Database, which stores query-able metadata about the experiment and the experiment's output. ● FRE includes a set of standardized post-processing functions as well as the ability to incorporate user-level functions. FRE post-processing can take us all the way to the preparation of graphical output for a scientific audience, and publication of data on a public portal. ● Recent FRE development includes incorporating a distributed workflow to support remote computing.
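    Experiment inheritance is the feature most easily illustrated with a few lines of code. The sketch below is not FRE code; it merely shows, with hypothetical experiment names and parameters, how a child experiment defined only by a parent reference plus overrides can be resolved into a complete parameter set.

    ```python
    # Illustrative sketch of experiment inheritance: a child experiment lists only its
    # parent and the parameters it overrides. Names and parameters are hypothetical.
    EXPERIMENTS = {
        "CM_control": {
            "parent": None,
            "params": {"resolution": "c96", "segment_length": "1 year", "ocean_model": "MOM"},
        },
        "CM_high_res": {
            "parent": "CM_control",
            "params": {"resolution": "c384"},   # only the overridden parameter is listed
        },
    }

    def resolve(name: str) -> dict:
        """Walk up the inheritance chain and merge parameters, child values winning."""
        exp = EXPERIMENTS[name]
        base = resolve(exp["parent"]) if exp["parent"] else {}
        return {**base, **exp["params"]}

    print(resolve("CM_high_res"))
    # {'resolution': 'c384', 'segment_length': '1 year', 'ocean_model': 'MOM'}
    ```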

  6. Updates in metabolomics tools and resources: 2014-2015.

    PubMed

    Misra, Biswapriya B; van der Hooft, Justin J J

    2016-01-01

    Data processing and interpretation represent the most challenging and time-consuming steps in high-throughput metabolomic experiments, regardless of the analytical platforms (MS or NMR spectroscopy based) used for data acquisition. Improved machinery in metabolomics generates increasingly complex datasets that create the need for more and better processing and analysis software and in silico approaches to understand the resulting data. However, a comprehensive source of information describing the utility of the most recently developed and released metabolomics resources--in the form of tools, software, and databases--is currently lacking. Thus, here we provide an overview of freely available, open-source tools, algorithms, and frameworks to make both upcoming and established metabolomics researchers aware of the recent developments, in an attempt to advance and facilitate data processing workflows in their metabolomics research. The major topics include tools and resources for data processing, data annotation, and data visualization in MS- and NMR-based metabolomics. Most of the tools described in this review are dedicated to untargeted metabolomics workflows; however, some more specialist tools are described as well. All tools and resources described, including their analytical and computational platform dependencies, are summarized in an overview table. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  7. Report Central: quality reporting tool in an electronic health record.

    PubMed

    Jung, Eunice; Li, Qi; Mangalampalli, Anil; Greim, Julie; Eskin, Michael S; Housman, Dan; Isikoff, Jeremy; Abend, Aaron H; Middleton, Blackford; Einbinder, Jonathan S

    2006-01-01

    Quality reporting tools, integrated with ambulatory electronic health records, can help clinicians and administrators understand performance, manage populations, and improve quality. Report Central is a secure web report delivery tool built on Crystal Reports XI™ and ASP.NET technologies. Pilot evaluation of Report Central indicates that clinicians prefer a quality reporting tool that is integrated with our home-grown EHR to support clinical workflow.

  8. Towards Exascale Seismic Imaging and Inversion

    NASA Astrophysics Data System (ADS)

    Tromp, J.; Bozdag, E.; Lefebvre, M. P.; Smith, J. A.; Lei, W.; Ruan, Y.

    2015-12-01

    Post-petascale supercomputers are now available to solve complex scientific problems that were thought unreachable a few decades ago. They also bring a cohort of concerns tied to obtaining optimum performance. Several issues are currently being investigated by the HPC community. These include energy consumption, fault resilience, scalability of the current parallel paradigms, workflow management, I/O performance and feature extraction with large datasets. In this presentation, we focus on the last three issues. In the context of seismic imaging and inversion, in particular for simulations based on adjoint methods, workflows are well defined. They consist of a few collective steps (e.g., mesh generation or model updates) and of a large number of independent steps (e.g., forward and adjoint simulations of each seismic event, pre- and postprocessing of seismic traces). The greater goal is to reduce the time to solution, that is, obtaining a more precise representation of the subsurface as fast as possible. This brings us to consider both the workflow in its entirety and the parts comprising it. The usual approach is to speed up the purely computational parts based on code optimization in order to reach higher FLOPS and better memory management. This still remains an important concern, but larger scale experiments show that the imaging workflow suffers from severe I/O bottlenecks. Such limitations occur both for purely computational data and seismic time series. The latter are dealt with by the introduction of a new Adaptable Seismic Data Format (ASDF). Parallel I/O libraries, namely HDF5 and ADIOS, are used to drastically reduce the cost of disk access. Parallel visualization tools, such as VisIt, are able to take advantage of ADIOS metadata to extract features and display massive datasets. Because large parts of the workflow are embarrassingly parallel, we are investigating the possibility of automating the imaging process with the integration of scientific workflow management tools, specifically Pegasus.

  9. Potential of knowledge discovery using workflows implemented in the C3Grid

    NASA Astrophysics Data System (ADS)

    Engel, Thomas; Fink, Andreas; Ulbrich, Uwe; Schartner, Thomas; Dobler, Andreas; Fritzsch, Bernadette; Hiller, Wolfgang; Bräuer, Benny

    2013-04-01

    With the increasing number of climate simulations, reanalyses and observations, new infrastructures to search and analyse distributed data are necessary. In recent years, the Grid architecture became an important technology to fulfill these demands. For the German project "Collaborative Climate Community Data and Processing Grid" (C3Grid), computer scientists and meteorologists developed a system that offers its users a web interface to search and download climate data and to use implemented analysis tools (called workflows) to further investigate them. In this contribution, two workflows that are implemented in the C3Grid architecture are presented: the Cyclone Tracking (CT) and Stormtrack workflows. They serve as an example of how to perform numerous investigations of midlatitude winter storms on a large amount of analysis and climate model data without insight into the data source or program code, and with only a low-to-moderate understanding of the theoretical background. CT is based on the work of Murray and Simmonds (1991) to identify and track local minima in the mean sea level pressure (MSLP) field of the selected dataset. Adjustable thresholds for the curvature of the isobars as well as the minimum lifetime of a cyclone allow the distinction of weak subtropical heat low systems and stronger midlatitude cyclones, e.g. in the Northern Atlantic. The user gets the resulting track data, including statistics about the track density, average central pressure, average central curvature, cyclogenesis and cyclolysis, as well as pre-built visualizations of these results. Stormtrack calculates the 2.5-6 day bandpass-filtered standard deviation of the geopotential height on a selected pressure level. Although this workflow needs much less computational effort compared to CT, it shows structures that are in good agreement with the track density of the CT workflow. This allows investigating to what extent changes in the mid-level tropospheric storm track are reflected in changes in the density and intensity of surface cyclones. A specific feature of C3Grid is the flexible Workflow Scheduling Service (WSS), which also allows for automated nightly analysis runs of CT, Stormtrack, etc. with different input parameter sets. The statistical results of these workflows can be accumulated afterwards by a scheduled final analysis step, thereby providing a tool for data-intensive analytics on the massive amounts of climate model data accessible through C3Grid. First tests with these automated analysis workflows show promising results for speeding up the investigation of high-volume modeling data. This example is relevant to the thorough analysis of future changes in storminess in Europe and is just one example of the potential of knowledge discovery using automated workflows implemented in the C3Grid architecture.
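    The Stormtrack quantity lends itself to a compact illustration. The sketch below, which is not the C3Grid implementation, applies a 2.5-6 day Butterworth bandpass filter to a synthetic daily geopotential-height series at a single grid point and reports the standard deviation of the filtered signal.

    ```python
    # Illustrative sketch: 2.5-6 day bandpass-filtered standard deviation of a daily
    # geopotential-height time series at one (synthetic) grid point, using SciPy.
    import numpy as np
    from scipy.signal import butter, filtfilt

    def stormtrack_std(z500: np.ndarray, dt_days: float = 1.0) -> float:
        """Bandpass filter (2.5-6 day periods) a daily time series and return its std."""
        nyquist = 0.5 / dt_days                      # cycles per day
        low, high = (1.0 / 6.0) / nyquist, (1.0 / 2.5) / nyquist
        b, a = butter(3, [low, high], btype="bandpass")
        filtered = filtfilt(b, a, z500)
        return float(np.std(filtered))

    # Synthetic 360-day series: slow seasonal cycle plus a 4-day synoptic wave and noise.
    t = np.arange(360.0)
    z500 = 5500 + 100 * np.sin(2 * np.pi * t / 180) + 30 * np.sin(2 * np.pi * t / 4) \
           + np.random.default_rng(0).normal(0, 10, t.size)
    print(f"Bandpass-filtered standard deviation: {stormtrack_std(z500):.1f} m")
    ```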

  10. Improving diabetes population management efficiency with an informatics solution.

    PubMed

    Zai, Adrian; Grant, Richard; Andrews, Carl; Yee, Ronnie; Chueh, Henry

    2007-10-11

    Despite intensive resource use for diabetes management in the U.S., our care continues to fall short of evidence-based goals, partly due to system inefficiencies. Diabetes registries are increasingly being utilized as a critical tool for population level disease management by providing real-time data. Since the successful adoption of a diabetes registry depends on how well it integrates with disease management workflows, we optimized our current diabetes management workflow and designed our registry application around it.

  11. BioInfra.Prot: A comprehensive proteomics workflow including data standardization, protein inference, expression analysis and data publication.

    PubMed

    Turewicz, Michael; Kohl, Michael; Ahrens, Maike; Mayer, Gerhard; Uszkoreit, Julian; Naboulsi, Wael; Bracht, Thilo; Megger, Dominik A; Sitek, Barbara; Marcus, Katrin; Eisenacher, Martin

    2017-11-10

    The analysis of high-throughput mass spectrometry-based proteomics data must address the specific challenges of this technology. To this end, the comprehensive proteomics workflow offered by the de.NBI service center BioInfra.Prot provides indispensable components for the computational and statistical analysis of this kind of data. These components include tools and methods for spectrum identification and protein inference, protein quantification, expression analysis as well as data standardization and data publication. All of the workflow's methods that address these tasks are state of the art or cutting edge. As has been shown in previous publications, each of these methods is adequate to solve its specific task and gives competitive results. However, the methods included in the workflow are continuously reviewed, updated and improved to adapt to new scientific developments. All of these components and methods are available as stand-alone BioInfra.Prot services or as a complete workflow. Since BioInfra.Prot provides manifold fast communication channels for getting access to all components of the workflow (e.g., via the BioInfra.Prot ticket system: bioinfraprot@rub.de), users can easily benefit from this service and get support from experts. Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.

  12. Critical care physician cognitive task analysis: an exploratory study

    PubMed Central

    Fackler, James C; Watts, Charles; Grome, Anna; Miller, Thomas; Crandall, Beth; Pronovost, Peter

    2009-01-01

    Introduction For better or worse, the imposition of work-hour limitations on house-staff has imperiled continuity and/or improved decision-making. Regardless, the workflow of every physician team in every academic medical centre has been irrevocably altered. We explored the use of cognitive task analysis (CTA) techniques, most commonly used in other high-stress and time-sensitive environments, to analyse key cognitive activities in critical care medicine. The study objective was to assess the usefulness of CTA as an analytical tool in order that physician cognitive tasks may be understood and redistributed within the work-hour limited medical decision-making teams. Methods After approval from each Institutional Review Board, two intensive care units (ICUs) within major university teaching hospitals served as data collection sites for CTA observations and interviews of critical care providers. Results Five broad categories of cognitive activities were identified: pattern recognition; uncertainty management; strategic vs. tactical thinking; team coordination and maintenance of common ground; and creation and transfer of meaning through stories. Conclusions CTA within the framework of Naturalistic Decision Making is a useful tool to understand the critical care process of decision-making and communication. The separation of strategic and tactical thinking has implications for workflow redesign. Given the global push for work-hour limitations, such workflow redesign is occurring. Further work with CTA techniques will provide important insights toward rational, rather than random, workflow changes. PMID:19265517

  13. Critical care physician cognitive task analysis: an exploratory study.

    PubMed

    Fackler, James C; Watts, Charles; Grome, Anna; Miller, Thomas; Crandall, Beth; Pronovost, Peter

    2009-01-01

    For better or worse, the imposition of work-hour limitations on house-staff has imperiled continuity and/or improved decision-making. Regardless, the workflow of every physician team in every academic medical centre has been irrevocably altered. We explored the use of cognitive task analysis (CTA) techniques, most commonly used in other high-stress and time-sensitive environments, to analyse key cognitive activities in critical care medicine. The study objective was to assess the usefulness of CTA as an analytical tool in order that physician cognitive tasks may be understood and redistributed within the work-hour limited medical decision-making teams. After approval from each Institutional Review Board, two intensive care units (ICUs) within major university teaching hospitals served as data collection sites for CTA observations and interviews of critical care providers. Five broad categories of cognitive activities were identified: pattern recognition; uncertainty management; strategic vs. tactical thinking; team coordination and maintenance of common ground; and creation and transfer of meaning through stories. CTA within the framework of Naturalistic Decision Making is a useful tool to understand the critical care process of decision-making and communication. The separation of strategic and tactical thinking has implications for workflow redesign. Given the global push for work-hour limitations, such workflow redesign is occurring. Further work with CTA techniques will provide important insights toward rational, rather than random, workflow changes.

  14. Data partitioning enables the use of standard SOAP Web Services in genome-scale workflows.

    PubMed

    Sztromwasser, Pawel; Puntervoll, Pål; Petersen, Kjell

    2011-07-26

    Biological databases and computational biology tools are provided by research groups around the world, and made accessible on the Web. Combining these resources is a common practice in bioinformatics, but integration of heterogeneous and often distributed tools and datasets can be challenging. To date, this challenge has been commonly addressed in a pragmatic way, by tedious and error-prone scripting. Recently, however, a more reliable technique has been identified and proposed as the platform that would tie together bioinformatics resources, namely Web Services. In the last decade, Web Services have spread widely in bioinformatics and earned the status of a recommended technology. However, in the era of high-throughput experimentation, a major concern regarding Web Services is their ability to handle large-scale data traffic. We propose a stream-like communication pattern for standard SOAP Web Services that enables efficient flow of large data traffic between a workflow orchestrator and Web Services. We evaluated the data-partitioning strategy by comparing it with typical communication patterns on an example pipeline for genomic sequence annotation. The results show that data partitioning lowers the resource demands of services and increases their throughput, which in consequence makes it possible to execute in silico experiments at genome scale using standard SOAP Web Services and workflows. As a proof of principle we annotated an RNA-seq dataset using a plain BPEL workflow engine.
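    The core of the data-partitioning strategy is simply to break a genome-scale input into service-sized chunks and stream results back as each chunk is processed. The sketch below illustrates that pattern with a placeholder function standing in for the actual SOAP call (which in practice would go through a SOAP client library against the service's WSDL); all names are hypothetical.

    ```python
    # Illustrative sketch of data partitioning for large-scale service invocation.
    # annotate_chunk() is a placeholder for a real SOAP Web Service call.
    from typing import Iterable, Iterator, List

    def partition(records: Iterable[str], chunk_size: int) -> Iterator[List[str]]:
        """Yield successive fixed-size partitions of the input records."""
        chunk: List[str] = []
        for record in records:
            chunk.append(record)
            if len(chunk) == chunk_size:
                yield chunk
                chunk = []
        if chunk:
            yield chunk

    def annotate_chunk(chunk: List[str]) -> List[str]:
        """Placeholder for one SOAP service invocation on a single partition."""
        return [f"{seq}:annotated" for seq in chunk]    # hypothetical annotation result

    def annotate_genome(sequences: Iterable[str], chunk_size: int = 100) -> Iterator[str]:
        """Stream annotations by invoking the service once per partition."""
        for chunk in partition(sequences, chunk_size):
            yield from annotate_chunk(chunk)

    print(list(annotate_genome([f"seq{i}" for i in range(5)], chunk_size=2)))
    ```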

  15. Initial steps towards a production platform for DNA sequence analysis on the grid.

    PubMed

    Luyf, Angela C M; van Schaik, Barbera D C; de Vries, Michel; Baas, Frank; van Kampen, Antoine H C; Olabarriaga, Silvia D

    2010-12-14

    Bioinformatics is confronted with a new data explosion due to the availability of high-throughput DNA sequencers. Data storage and analysis become a problem on local servers, and therefore it is necessary to switch to other IT infrastructures. Grid and workflow technology can help to handle the data more efficiently, as well as facilitate collaborations. However, interfaces to grids are often unfriendly to novice users. In this study we reused a platform that was developed in the VL-e project for the analysis of medical images. Data transfer, workflow execution and job monitoring are operated from one graphical interface. We developed workflows for two sequence alignment tools (BLAST and BLAT) as a proof of concept. The analysis time was significantly reduced. All workflows and executables are available for the members of the Dutch Life Science Grid and the VL-e Medical virtual organizations. All components are open source and can be transported to other grid infrastructures. The availability of in-house expertise and tools facilitates the usage of grid resources by new users. Our first results indicate that this is a practical, powerful and scalable solution to address the capacity and collaboration issues raised by the deployment of next generation sequencers. We currently adopt this methodology on a daily basis for DNA sequencing and other applications. More information and source code are available via http://www.bioinformaticslaboratory.nl/

  16. LFQProfiler and RNP(xl): Open-Source Tools for Label-Free Quantification and Protein-RNA Cross-Linking Integrated into Proteome Discoverer.

    PubMed

    Veit, Johannes; Sachsenberg, Timo; Chernev, Aleksandar; Aicheler, Fabian; Urlaub, Henning; Kohlbacher, Oliver

    2016-09-02

    Modern mass spectrometry setups used in today's proteomics studies generate vast amounts of raw data, calling for highly efficient data processing and analysis tools. Software for analyzing these data is either monolithic (easy to use, but sometimes too rigid) or workflow-driven (easy to customize, but sometimes complex). Thermo Proteome Discoverer (PD) is a powerful software platform for workflow-driven data analysis in proteomics which, in our eyes, achieves a good trade-off between flexibility and usability. Here, we present two open-source plugins for PD providing additional functionality: LFQProfiler for label-free quantification of peptides and proteins, and RNP(xl) for UV-induced peptide-RNA cross-linking data analysis. LFQProfiler interacts with existing PD nodes for peptide identification and validation and takes care of the entire quantitative part of the workflow. We show that it performs at least on par with other state-of-the-art software solutions for label-free quantification in a recently published benchmark (Ramus, C.; J. Proteomics 2016, 132, 51-62). The second workflow, RNP(xl), represents the first software solution to date for the identification of peptide-RNA cross-links, including automatic localization of the cross-links at amino acid resolution and localization scoring. It comes with a customized integrated cross-link fragment spectrum viewer for convenient manual inspection and validation of the results.

  17. A data management and publication workflow for a large-scale, heterogeneous sensor network.

    PubMed

    Jones, Amber Spackman; Horsburgh, Jeffery S; Reeder, Stephanie L; Ramírez, Maurier; Caraballo, Juan

    2015-06-01

    It is common for hydrology researchers to collect data using in situ sensors at high frequencies, for extended durations, and with spatial distributions that produce data volumes requiring infrastructure for data storage, management, and sharing. The availability and utility of these data in addressing scientific questions related to water availability, water quality, and natural disasters relies on effective cyberinfrastructure that facilitates transformation of raw sensor data into usable data products. It also depends on the ability of researchers to share and access the data in useable formats. In this paper, we describe a data management and publication workflow and software tools for research groups and sites conducting long-term monitoring using in situ sensors. Functionality includes the ability to track monitoring equipment inventory and events related to field maintenance. Linking this information to the observational data is imperative in ensuring the quality of sensor-based data products. We present these tools in the context of a case study for the innovative Urban Transitions and Aridregion Hydrosustainability (iUTAH) sensor network. The iUTAH monitoring network includes sensors at aquatic and terrestrial sites for continuous monitoring of common meteorological variables, snow accumulation and melt, soil moisture, surface water flow, and surface water quality. We present the overall workflow we have developed for effectively transferring data from field monitoring sites to ultimate end-users and describe the software tools we have deployed for storing, managing, and sharing the sensor data. These tools are all open source and available for others to use.

  18. A Brokering Solution for Business Process Execution

    NASA Astrophysics Data System (ADS)

    Santoro, M.; Bigagli, L.; Roncella, R.; Mazzetti, P.; Nativi, S.

    2012-12-01

    Predicting the climate change impact on biodiversity and ecosystems, advancing our knowledge of environmental phenomena interconnection, assessing the validity of simulations and other key challenges of Earth Sciences require intensive use of environmental modeling. The complexity of the Earth system requires the use of more than one model (often from different disciplines) to represent complex processes. The identification of appropriate mechanisms for reuse, chaining and composition of environmental models is considered a key enabler for an effective uptake of a global Earth Observation infrastructure, currently pursued by the international geospatial research community. The Group on Earth Observation (GEO) Model Web initiative aims to increase the present accessibility and interoperability of environmental models, allowing their flexible composition into complex Business Processes (BPs). A few basic principles underlie the Model Web concept (Nativi et al. 2012): 1. Open access 2. Minimal entry barriers 3. Service-driven approach 4. Scalability. In this work we propose an architectural solution aiming to contribute to the Model Web vision. This solution applies the Brokering approach to facilitate complex multidisciplinary interoperability. The Brokering approach is currently adopted in the new GEOSS Common Infrastructure (GCI), as presented at the last GEO Plenary meeting in Istanbul, November 2011. According to the Brokering principles, the designed system is flexible enough to support the use of multiple BP design (visual) tools, heterogeneous Web interfaces for model execution (e.g. OGC WPS, WSDL, etc.), and different Workflow engines. We designed and prototyped a component called BP Broker that is able to: (i) read an abstract BP, (ii) "compile" the abstract BP into an executable one (eBP) - in this phase the BP Broker might also provide recommendations for incomplete BPs and parameter mismatch resolution - and (iii) finally execute the eBP using a Workflow engine. The present implementation makes use of the BPMN 2.0 notation for BP design and the jBPM workflow engine for eBP execution; however, the strong decoupling which characterizes the design of the BP Broker easily allows other technologies to be supported. The main benefits of the proposed approach are: (i) no need for a composition infrastructure, (ii) alleviation from the technicalities of workflow definitions, (iii) support of incomplete BPs, and (iv) the reuse of existing BPs as atomic processes. The BP Broker was designed and prototyped in the EC-funded projects EuroGEOSS (http://www.eurogeoss.eu) and UncertWeb (http://www.uncertweb.org); the latter project also provided the use scenarios that were used to test the framework: the eHabitat scenario (calculation of habitat similarity likelihood) and the FERA scenario (impact of climate change on land use and crop yield). Three more scenarios are presently under development. The research leading to these results has received funding from the European Community's Seventh Framework Programme (FP7/2007-2013) under Grant Agreements n. 248488 and n. 226487. Reference: Nativi, S., Mazzetti, P., & Geller, G. (2012). "Environmental model access and interoperability: The GEO Model Web initiative." Environmental Modelling & Software, 1-15.

  19. Decaf: Decoupled Dataflows for In Situ High-Performance Workflows

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dreher, M.; Peterka, T.

    Decaf is a dataflow system for the parallel communication of coupled tasks in an HPC workflow. The dataflow can perform arbitrary data transformations ranging from simply forwarding data to complex data redistribution. Decaf does this by allowing the user to allocate resources and execute custom code in the dataflow. All communication through the dataflow is efficient parallel message passing over MPI. The runtime for calling tasks is entirely message-driven; Decaf executes a task when all messages for the task have been received. Such a message-driven runtime allows cyclic task dependencies in the workflow graph, for example, to enact computational steering based on the result of downstream tasks. Decaf includes a simple Python API for describing the workflow graph. This allows Decaf to stand alone as a complete workflow system, but Decaf can also be used as the dataflow layer by one or more other workflow systems to form a heterogeneous task-based computing environment. In one experiment, we couple a molecular dynamics code with a visualization tool using the FlowVR and Damaris workflow systems and Decaf for the dataflow. In another experiment, we test the coupling of a cosmology code with Voronoi tessellation and density estimation codes using MPI for the simulation, the DIY programming model for the two analysis codes, and Decaf for the dataflow. Such workflows consisting of heterogeneous software infrastructures exist because components are developed separately with different programming models and runtimes, and this is the first time that such heterogeneous coupling of diverse components was demonstrated in situ on HPC systems.
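    To give a flavour of what describing a coupled-task workflow graph in Python can look like, the sketch below defines a producer task, a consumer task, and the dataflow link between them as plain data structures. This is an illustration only and is not Decaf's actual Python API; the task names, executables, and rank counts are hypothetical.

    ```python
    # Illustrative sketch (not Decaf's API): a workflow graph as nodes plus dataflow
    # links, with hypothetical resources assigned to tasks and to the dataflow itself.
    workflow = {
        "nodes": {
            "simulation":    {"ranks": 128, "executable": "md_sim"},      # hypothetical
            "visualization": {"ranks": 8,   "executable": "vis_tool"},    # hypothetical
        },
        "links": [
            {
                "source": "simulation",
                "target": "visualization",
                "dataflow_ranks": 4,             # resources allocated to the dataflow
                "transform": "redistribute",     # e.g., forward or redistribute data
            }
        ],
    }

    def dependencies(graph: dict, node: str) -> list:
        """Return upstream tasks whose messages must arrive before 'node' can execute."""
        return [link["source"] for link in graph["links"] if link["target"] == node]

    print(dependencies(workflow, "visualization"))   # ['simulation']
    ```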

  20. Experiences and lessons learned from creating a generalized workflow for data publication of field campaign datasets

    NASA Astrophysics Data System (ADS)

    Santhana Vannan, S. K.; Ramachandran, R.; Deb, D.; Beaty, T.; Wright, D.

    2017-12-01

    This paper summarizes the workflow challenges of curating and publishing data produced from disparate data sources and provides a generalized workflow solution to efficiently archive data generated by researchers. The Oak Ridge National Laboratory Distributed Active Archive Center (ORNL DAAC) for biogeochemical dynamics and the Global Hydrology Resource Center (GHRC) DAAC have been collaborating on the development of a generalized workflow solution to efficiently manage the data publication process. The generalized workflow presented here is built on lessons learned from implementations of the workflow system. Data publication consists of the following steps: (1) accepting the data package from the data providers and ensuring the full integrity of the data files; (2) identifying and addressing data quality issues; (3) assembling standardized, detailed metadata and documentation, including file-level details, processing methodology, and characteristics of data files; (4) setting up data access mechanisms; (5) setting up the data in data tools and services for improved data dissemination and user experience; (6) registering the dataset in online search and discovery catalogues; and (7) preserving the data location through Digital Object Identifiers (DOIs). We will describe the steps taken to automate this process and realize efficiencies. The goals of the workflow system are to reduce the time taken to publish a dataset, to increase the quality of documentation and metadata, and to track individual datasets through the data curation process. Utilities developed to achieve these goals will be described. We will also share the metrics-driven value of the workflow system and discuss future steps towards the creation of a common software framework.

  1. Questioned document workflow for handwriting with automated tools

    NASA Astrophysics Data System (ADS)

    Das, Krishnanand; Srihari, Sargur N.; Srinivasan, Harish

    2012-01-01

    During the last few years many document recognition methods have been developed to determine whether a handwriting specimen can be attributed to a known writer. However, in practice, the workflow of the document examiner continues to be manual-intensive. Before a systematic, computational approach can be developed, an articulation of the steps involved in handwriting comparison is needed. We describe the workflow of handwritten questioned document examination, as described in a standards manual, and the steps where existing automation tools can be used. A well-known ransom note case is considered as an example, where one encounters testing for multiple writers of the same document, determining whether the writing is disguised, known writing that is formal while the questioned writing is informal, etc. The findings for the particular ransom note case using the tools are given. Also, observations are made for developing a more fully automated approach to handwriting examination.

  2. Integrating advanced visualization technology into the planetary Geoscience workflow

    NASA Astrophysics Data System (ADS)

    Huffman, John; Forsberg, Andrew; Loomis, Andrew; Head, James; Dickson, James; Fassett, Caleb

    2011-09-01

    Recent advances in computer visualization have allowed us to develop new tools for analyzing the data gathered during planetary missions, which is important, since these data sets have grown exponentially in recent years to tens of terabytes in size. As part of the Advanced Visualization in Solar System Exploration and Research (ADVISER) project, we utilize several advanced visualization techniques created specifically with planetary image data in mind. The Geoviewer application allows real-time active stereo display of images, which in aggregate have billions of pixels. The ADVISER desktop application platform allows fast three-dimensional visualization of planetary images overlain on digital terrain models. Both applications include tools for easy data ingest and real-time analysis in a programmatic manner. Incorporation of these tools into our everyday scientific workflow has proved important for scientific analysis, discussion, and publication, and enabled effective and exciting educational activities for students from high school through graduate school.

  3. Supporting metabolomics with adaptable software: design architectures for the end-user.

    PubMed

    Sarpe, Vladimir; Schriemer, David C

    2017-02-01

    Large and disparate sets of LC-MS data are generated by modern metabolomics profiling initiatives, and while useful software tools are available to annotate and quantify compounds, the field requires continued software development in order to sustain methodological innovation. Advances in software development practices allow for a new paradigm in tool development for metabolomics, where increasingly the end-user can develop or redeploy utilities ranging from simple algorithms to complex workflows. Resources that provide an organized framework for development are described and illustrated with LC-MS processing packages that have leveraged their design tools. Full access to these resources depends in part on coding experience, but the emergence of workflow builders and pluggable frameworks strongly reduces the skill level required. Developers in the metabolomics community are encouraged to use these resources and design content for uptake and reuse. Copyright © 2016 Elsevier Ltd. All rights reserved.

  4. Automation and workflow considerations for embedding Digimarc Barcodes at scale

    NASA Astrophysics Data System (ADS)

    Rodriguez, Tony; Haaga, Don; Calhoon, Sean

    2015-03-01

    The Digimarc® Barcode is a digital watermark applied to packages and variable data labels that carries GS1 standard GTIN-14 data traditionally carried by a 1-D barcode. The Digimarc Barcode can be read with smartphones and imaging-based barcode readers commonly used in grocery and retail environments. Using smartphones, consumers can engage with products and retailers can materially increase the speed of check-out, increasing store margins and providing a better experience for shoppers. Internal testing has shown an average of 53% increase in scanning throughput, enabling 100's of millions of dollars in cost savings [1] for retailers when deployed at scale. To get to scale, the process of embedding a digital watermark must be automated and integrated within existing workflows. Creating the tools and processes to do so represents a new challenge for the watermarking community. This paper presents a description and an analysis of the workflow implemented by Digimarc to deploy the Digimarc Barcode at scale. An overview of the tools created and lessons learned during the introduction of technology to the market are provided.

  5. Profiling of Histone Post-Translational Modifications in Mouse Brain with High-Resolution Top-Down Mass Spectrometry.

    PubMed

    Zhou, Mowei; Paša-Tolić, Ljiljana; Stenoien, David L

    2017-02-03

    As histones play central roles in most chromosomal functions including regulation of DNA replication, DNA damage repair, and gene transcription, both their basic biology and their roles in disease development have been the subject of intense study. Because multiple post-translational modifications (PTMs) along the entire protein sequence are potential regulators of histones, a top-down approach, where intact proteins are analyzed, is ultimately required for complete characterization of proteoforms. However, significant challenges remain for top-down histone analysis primarily because of deficiencies in separation/resolving power and effective identification algorithms. Here we used state-of-the-art mass spectrometry and a bioinformatics workflow for targeted data analysis and visualization. The workflow uses ProMex for intact mass deconvolution, MSPathFinder as a search engine, and LcMsSpectator as a data visualization tool. When complemented with the open-modification tool TopPIC, this workflow enabled identification of novel histone PTMs including tyrosine bromination on histone H4 and H2A, H3 glutathionylation, and mapping of conventional PTMs along the entire protein for many histone subunits.

  6. Installation and Testing of ITER Integrated Modeling and Analysis Suite (IMAS) on DIII-D

    NASA Astrophysics Data System (ADS)

    Lao, L.; Kostuk, M.; Meneghini, O.; Smith, S.; Staebler, G.; Kalling, R.; Pinches, S.

    2017-10-01

    A critical objective of the ITER Integrated Modeling Program is the development of IMAS to support ITER plasma operation and research activities. An IMAS framework has been established based on the earlier work carried out within the EU. It consists of a physics data model and a workflow engine. The data model is capable of representing both simulation and experimental data and is applicable to ITER and other devices. IMAS has been successfully installed on a local DIII-D server using a flexible installer capable of managing the core data access tools (Access Layer and Data Dictionary) and optionally the Kepler workflow engine and coupling tools. A general adaptor for OMFIT (a workflow engine) is being built for adaptation of any analysis code to IMAS using a new IMAS universal access layer (UAL) interface developed from an existing OMFIT EU Integrated Tokamak Modeling UAL. Ongoing work includes development of a general adaptor for EFIT and TGLF based on this new UAL that can be readily extended for other physics codes within OMFIT. Work supported by US DOE under DE-FC02-04ER54698.

  7. An Update on Improvements to NiCE Support for PROTEUS

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bennett, Andrew; McCaskey, Alexander J.; Billings, Jay Jay

    2015-09-01

    The Department of Energy Office of Nuclear Energy's Nuclear Energy Advanced Modeling and Simulation (NEAMS) program has supported the development of the NEAMS Integrated Computational Environment (NiCE), a modeling and simulation workflow environment that provides services and plugins to facilitate tasks such as code execution, model input construction, visualization, and data analysis. This report details the development of workflows for the reactor core neutronics application, PROTEUS. This advanced neutronics application (primarily developed at Argonne National Laboratory) aims to improve nuclear reactor design and analysis by providing an extensible and massively parallel, finite-element solver for current and advanced reactor fuel neutronics modeling. The integration of PROTEUS-specific tools into NiCE is intended to make the advanced capabilities that PROTEUS provides more accessible to the nuclear energy research and development community. This report will detail the work done to improve existing PROTEUS workflow support in NiCE. We will demonstrate and discuss these improvements, including the development of flexible IO services, an improved interface for input generation, and the addition of advanced Fortran development tools natively in the platform.

  8. Usability testing of Avoiding Diabetes Thru Action Plan Targeting (ADAPT) decision support for integrating care-based counseling of pre-diabetes in an electronic health record.

    PubMed

    Chrimes, Dillon; Kitos, Nicole R; Kushniruk, Andre; Mann, Devin M

    2014-09-01

    Usability testing can be used to evaluate human-computer interaction (HCI) and communication in shared decision making (SDM) for patient-provider behavioral change and behavioral contracting. Traditional evaluations of usability using scripted or mock patient scenarios with think-aloud protocol analysis provide a way to identify HCI issues. In this paper we describe the application of these methods in the evaluation of the Avoiding Diabetes Thru Action Plan Targeting (ADAPT) tool, and test the usability of the tool to support the ADAPT framework for integrated care counseling of pre-diabetes. The think-aloud protocol analysis typically does not provide an assessment of how patient-provider interactions are affected in "live" clinical workflow or whether a tool is successful. Therefore, "near-live" clinical simulations involving applied simulation methods were used to complement the think-aloud results. This complementary usability technique was used to test the end-user HCI and tool performance by more closely mimicking the clinical workflow and capturing interaction sequences along with assessing the functionality of computer module prototypes on clinician workflow. We expected this method to further complement and provide different usability findings as compared to think-aloud analysis. Together, this mixed method evaluation provided comprehensive and realistic feedback for iterative refinement of the ADAPT system prior to implementation. The study employed two phases of testing of a new interactive ADAPT tool that embedded an evidence-based shared goal setting component into primary care workflow for dealing with pre-diabetes counseling within a commercial physician office electronic health record (EHR). Phase I applied usability testing that involved "think-aloud" protocol analysis of eight primary care providers interacting with several scripted clinical scenarios. Phase II used "near-live" clinical simulations of five providers interacting with standardized trained patient actors enacting the clinical scenario of counseling for pre-diabetes, each of whom had a pedometer that recorded the number of steps taken over a week. In both phases, all sessions were audio-taped and motion screen-capture software was activated for onscreen recordings. Transcripts were coded using iterative qualitative content analysis methods. In Phase I, the impact of the components and layout of ADAPT on users' Navigation, Understandability, and Workflow was associated with the largest volume of negative comments (i.e. approximately 80% of end-user commentary), while Usability and Content of ADAPT were representative of more positive than negative user commentary. The heuristic category of Usability had a positive-to-negative comment ratio of 2.1, reflecting positive perception of the usability of the tool, its functionality, and overall co-productive utilization of ADAPT. However, there were mixed perceptions about content (i.e., how the information was displayed, organized and described in the tool). In Phase II, the duration of patient encounters was approximately 10 min with all of the Patient Instructions (prescriptions) and behavioral contracting being activated at the end of each visit. Upon activation, providers accepted the pathway prescribed by the tool 100% of the time and completed all the fields in the tool in the simulation cases. Only 14% of encounter time was spent using the functionality of the ADAPT tool in terms of keystrokes and entering relevant data. The rest of the time was spent on communication and dialog to populate the patient instructions. In all cases, the interaction sequence of reviewing and discussing exercise and diet of the patient was linked to the functionality of the ADAPT tool in terms of monitoring, response-efficacy, self-efficacy, and negotiation in the patient-provider dialog. There was a change from one-way dialog to two-way dialog and negotiation that ended in a behavioral contract. This change demonstrated the tool's sequence, which supported recording current exercise and diet followed by a diet and exercise goal setting procedure to reduce the risk of diabetes onset. This study demonstrated that "think-aloud" protocol analysis with "near-live" clinical simulations provided a successful usability evaluation of a new primary care pre-diabetes shared goal setting tool. Each phase of the study provided complementary observations on problems with the new onscreen tool and was used to show the influence of the ADAPT framework on the usability, workflow integration, and communication between the patient and provider. The think-aloud tests with the provider showed the tool can be used according to the ADAPT framework (exercise-to-diet behavior change and tool utilization), while the clinical simulations revealed the ADAPT framework to realistically support patient-provider communication to obtain a behavioral change contract. SDM interactions and mechanisms affecting protocol-based care can be more completely captured by combining "near-live" clinical simulations with traditional "think-aloud analysis", which augments clinician utilization. More analysis is required to verify whether the rich communication actions found in Phase II complement clinical workflows. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.

  9. Use of a collaborative tool to simplify the outsourcing of preclinical safety studies: an insight into the AstraZeneca-Charles River Laboratories strategic relationship.

    PubMed

    Martin, Frederic D C; Benjamin, Amanda; MacLean, Ruth; Hollinshead, David M; Landqvist, Claire

    2017-12-01

    In 2012, AstraZeneca entered into a strategic relationship with Charles River Laboratories whereby preclinical safety packages comprising safety pharmacology, toxicology, formulation analysis, in vivo ADME, bioanalysis and pharmacokinetics studies are outsourced. New processes were put in place to ensure seamless workflows with the aim of accelerating the delivery of new medicines to patients. Here, we describe in more detail the AstraZeneca preclinical safety outsourcing model and the way in which a collaborative tool has helped to translate the processes in AstraZeneca and Charles River Laboratories into simpler integrated workflows that are efficient and visible across the two companies. Copyright © 2017 Elsevier Ltd. All rights reserved.

  10. Rapid resistome mapping using nanopore sequencing

    PubMed Central

    Imamovic, Lejla; Hashim Ellabaan, Mostafa M.; van Schaik, Willem; Koza, Anna

    2017-01-01

    The emergence of antibiotic resistance in human pathogens has become a major threat to modern medicine. The outcome of antibiotic treatment can be affected by the composition of the gut. Accordingly, knowledge of the gut resistome composition could enable more effective and individualized treatment of bacterial infections. Yet, rapid workflows for resistome characterization are lacking. To address this challenge we developed the poreFUME workflow, which applies functional metagenomic selections and nanopore sequencing to resistome mapping. We demonstrate the approach by functionally characterizing the gut resistome of an ICU (intensive care unit) patient. The accuracy of the poreFUME pipeline exceeds 97%, which is sufficient for the annotation of antibiotic resistance genes. The poreFUME pipeline provides a promising approach for efficient resistome profiling that could inform antibiotic treatment decisions in the future. PMID:28062856

  11. It's All About the Data: Workflow Systems and Weather

    NASA Astrophysics Data System (ADS)

    Plale, B.

    2009-05-01

    Digital data is fueling new advances in the computational sciences, particularly geospatial research as environmental sensing grows more practical through reduced technology costs, broader network coverage, and better instruments. e-Science research (i.e., cyberinfrastructure research) has responded to data intensive computing with tools, systems, and frameworks that support computationally oriented activities such as modeling, analysis, and data mining. Workflow systems support execution of sequences of tasks on behalf of a scientist. These systems, such as Taverna, Apache ODE, and Kepler, when built as part of a larger cyberinfrastructure framework, give the scientist tools to construct task graphs of execution sequences, often through a visual interface for connecting task boxes together with arcs representing control flow or data flow. Unlike business processing workflows, scientific workflows expose a high degree of detail and control during configuration and execution. Data-driven science imposes unique needs on workflow frameworks. Our research is focused on two issues. The first is the support for workflow-driven analysis over all kinds of data sets, including real time streaming data and locally owned and hosted data. The second is the essential role metadata/provenance collection plays in data driven science, for discovery, determining quality, for science reproducibility, and for long-term preservation. The research has been conducted over the last 6 years in the context of cyberinfrastructure for mesoscale weather research carried out as part of the Linked Environments for Atmospheric Discovery (LEAD) project. LEAD has pioneered new approaches for integrating complex weather data, assimilation, modeling, mining, and cyberinfrastructure systems. Workflow systems have the potential to generate huge volumes of data. Without some form of automated metadata capture, either metadata description becomes largely a manual task that is difficult if not impossible under high-volume conditions, or the searchability and manageability of the resulting data products is disappointingly low. The provenance of a data product is a record of its lineage, or trace of the execution history that resulted in the product. The provenance of a forecast model result, e.g., captures information about the executable version of the model, configuration parameters, input data products, execution environment, and owner. Provenance enables data to be properly attributed and captures critical parameters about the model run so the quality of the result can be ascertained. Proper provenance is essential to providing reproducible scientific computing results. Workflow languages used in science discovery are complete programming languages, and in theory can support any logic expressible by a programming language. The execution environments supporting the workflow engines, on the other hand, are subject to constraints on physical resources, and hence in practice the workflow task graphs used in science utilize relatively few of the cataloged workflow patterns. It is important to note that these workflows are executed on demand, and are executed once. Into this context is introduced the need for science discovery that is responsive to real time information. 
If we can use simple programming models and abstractions to make scientific discovery involving real-time data accessible to specialists who share and utilize data across scientific domains, we bring science one step closer to solving the largest of human problems.
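    Editorial aside, not part of the cited abstract: a minimal Python sketch of the kind of provenance record described above for a forecast model result (executable version, configuration parameters, input data products, execution environment, owner). All field names and example values are illustrative assumptions, not part of the LEAD system.

      import json
      import getpass
      import platform
      from datetime import datetime, timezone

      def build_provenance_record(model_version, config, input_products):
          """Assemble a minimal provenance record for one workflow run.

          The fields mirror those named in the abstract: executable version,
          configuration parameters, input data products, execution
          environment, and owner. Field names here are illustrative only.
          """
          return {
              "model_version": model_version,
              "configuration": config,
              "input_products": input_products,
              "execution_environment": {
                  "hostname": platform.node(),
                  "python": platform.python_version(),
                  "os": platform.platform(),
              },
              "owner": getpass.getuser(),
              "timestamp": datetime.now(timezone.utc).isoformat(),
          }

      if __name__ == "__main__":
          record = build_provenance_record(
              model_version="wrf-3.9.1",                        # hypothetical
              config={"grid_spacing_km": 4, "forecast_hours": 36},
              input_products=["radar_mosaic_20090501T12Z.nc"],  # hypothetical
          )
          print(json.dumps(record, indent=2))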

  12. Solutions for Mining Distributed Scientific Data

    NASA Astrophysics Data System (ADS)

    Lynnes, C.; Pham, L.; Graves, S.; Ramachandran, R.; Maskey, M.; Keiser, K.

    2007-12-01

    Researchers at the University of Alabama in Huntsville (UAH) and the Goddard Earth Sciences Data and Information Services Center (GES DISC) are working on approaches and methodologies facilitating the analysis of large amounts of distributed scientific data. Despite the existence of full-featured analysis tools, such as the Algorithm Development and Mining (ADaM) toolkit from UAH, and data repositories, such as the GES DISC, that provide online access to large amounts of data, there remain obstacles to getting the analysis tools and the data together in a workable environment. Does one bring the data to the tools or deploy the tools close to the data? The large size of many current Earth science datasets incurs significant overhead in network transfer for analysis workflows, even with the advanced networking capabilities that are available between many educational and government facilities. The UAH and GES DISC team are developing a capability to define analysis workflows using distributed services and online data resources. We are developing two solutions for this problem that address different analysis scenarios. The first is a Data Center Deployment of the analysis services for large data selections, orchestrated by a remotely defined analysis workflow. The second is a Data Mining Center approach of providing a cohesive analysis solution for smaller subsets of data. The two approaches can be complementary and thus provide flexibility for researchers to exploit the best solution for their data requirements. The Data Center Deployment of the analysis services has been implemented by deploying ADaM web services at the GES DISC so they can access the data directly, without the need of network transfers. Using the Mining Workflow Composer, a user can define an analysis workflow that is then submitted through a Web Services interface to the GES DISC for execution by a processing engine. The workflow definition is composed, maintained and executed at a distributed location, but most of the actual services comprising the workflow are available local to the GES DISC data repository. Additional refinements will ultimately provide a package that is easily implemented and configured at additional data centers for analysis of additional science data sets. Enhancements to the ADaM toolkit allow the staging of distributed data wherever the services are deployed, to support a Data Mining Center that can provide additional computational resources, large storage of output, easier addition and updates to available services, and access to data from multiple repositories. The Data Mining Center case provides researchers more flexibility to quickly try different workflow configurations and refine the process, using smaller amounts of data that may likely be transferred from distributed online repositories. This environment is sufficient for some analyses, but can also be used as an initial sandbox to test and refine a solution before staging the execution at a Data Center Deployment. Detection of airborne dust both over water and land in MODIS imagery using mining services for both solutions will be presented. The dust detection is just one possible example of the mining and analysis capabilities the proposed mining services solutions will provide to the science community. More information about the available services and the current status of this project is available at http://www.itsc.uah.edu/mws/

  13. How Can We Better Detect Unauthorized GMOs in Food and Feed Chains?

    PubMed

    Fraiture, Marie-Alice; Herman, Philippe; De Loose, Marc; Debode, Frédéric; Roosens, Nancy H

    2017-06-01

    Current GMO detection systems have limited abilities to detect unauthorized genetically modified organisms (GMOs). Here, we propose a new workflow, based on next-generation sequencing (NGS) technology, to overcome this problem. In providing information about DNA sequences, this high-throughput workflow can distinguish authorized and unauthorized GMOs by strengthening the tools commonly used by enforcement laboratories with the help of NGS technology. In addition, thanks to its massive sequencing capacity, this workflow could be used to monitor GMOs present in the food and feed chain. In view of its potential implementation by enforcement laboratories, we discuss this innovative approach, its current limitations, and its sustainability of use over time. Copyright © 2017 Elsevier Ltd. All rights reserved.

  14. Big Data Challenges in Global Seismic 'Adjoint Tomography' (Invited)

    NASA Astrophysics Data System (ADS)

    Tromp, J.; Bozdag, E.; Krischer, L.; Lefebvre, M.; Lei, W.; Smith, J.

    2013-12-01

    The challenge of imaging Earth's interior on a global scale is closely linked to the challenge of handling large data sets. The related iterative workflow involves five distinct phases, namely, 1) data gathering and culling, 2) synthetic seismogram calculations, 3) pre-processing (time-series analysis and time-window selection), 4) data assimilation and adjoint calculations, 5) post-processing (pre-conditioning, regularization, model update). In order to implement this workflow on modern high-performance computing systems, a new seismic data format is being developed. The Adaptable Seismic Data Format (ASDF) is designed to replace currently used data formats with a more flexible format that allows for fast parallel I/O. The metadata is divided into abstract categories, such as "source" and "receiver", along with provenance information for complete reproducibility. The structure of ASDF is designed keeping in mind three distinct applications: earthquake seismology, seismic interferometry, and exploration seismology. Existing time-series analysis tool kits, such as SAC and ObsPy, can be easily interfaced with ASDF so that seismologists can use robust, previously developed software packages. ASDF accommodates an automated, efficient workflow for global adjoint tomography. Manually managing the large number of simulations associated with the workflow can rapidly become a burden, especially with increasing numbers of earthquakes and stations. Therefore, it is of importance to investigate the possibility of automating the entire workflow. Scientific Workflow Management Software (SWfMS) allows users to execute workflows almost routinely. SWfMS provides additional advantages. In particular, it is possible to group independent simulations in a single job to fit the available computational resources. They also give a basic level of fault resilience as the workflow can be resumed at the correct state preceding a failure. Some of the best candidates for our particular workflow are Kepler and Swift, and the latter appears to be the most serious candidate for a large-scale workflow on a single supercomputer, remaining sufficiently simple to accommodate further modifications and improvements.
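    Editorial aside, not the ASDF format or its API: a hedged Python sketch of grouping waveform metadata into the abstract categories named above ("source", "receiver") together with a provenance record for reproducibility. All names and values are hypothetical.

      from dataclasses import dataclass, field

      @dataclass
      class SourceInfo:
          event_id: str
          latitude: float
          longitude: float
          depth_km: float

      @dataclass
      class ReceiverInfo:
          network: str
          station: str
          latitude: float
          longitude: float

      @dataclass
      class WaveformRecord:
          source: SourceInfo
          receiver: ReceiverInfo
          samples: list = field(default_factory=list)
          provenance: dict = field(default_factory=dict)  # processing history for reproducibility

      # Hypothetical record grouping metadata the way the abstract describes.
      rec = WaveformRecord(
          source=SourceInfo("event_001", 35.0, -118.0, 10.0),
          receiver=ReceiverInfo("IU", "ANMO", 34.95, -106.46),
          provenance={"filter": "bandpass 17-250 s", "software": "example-preproc 0.1"},
      )
      print(rec.source.event_id, rec.receiver.station, rec.provenance["filter"])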

  15. Report Central: Quality Reporting Tool in an Electronic Health Record

    PubMed Central

    Jung, Eunice; Li, Qi; Mangalampalli, Anil; Greim, Julie; Eskin, Michael S.; Housman, Dan; Isikoff, Jeremy; Abend, Aaron H.; Middleton, Blackford; Einbinder, Jonathan S.

    2006-01-01

    Quality reporting tools, integrated with ambulatory electronic health records, can help clinicians and administrators understand performance, manage populations, and improve quality. Report Central is a secure web report delivery tool built on Crystal Reports XI™ and ASP.NET technologies. Pilot evaluation of Report Central indicates that clinicians prefer a quality reporting tool that is integrated with our home-grown EHR to support clinical workflow. PMID:17238590

  16. LAMBADA and InflateGRO2: efficient membrane alignment and insertion of membrane proteins for molecular dynamics simulations.

    PubMed

    Schmidt, Thomas H; Kandt, Christian

    2012-10-22

    At the beginning of each molecular dynamics membrane simulation stands the generation of a suitable starting structure which includes the working steps of aligning membrane and protein and seamlessly accommodating the protein in the membrane. Here we introduce two efficient and complementary methods based on pre-equilibrated membrane patches, automating these steps. Using a voxel-based cast of the coarse-grained protein, LAMBADA computes a hydrophilicity profile-derived scoring function based on which the optimal rotation and translation operations are determined to align protein and membrane. Employing an entirely geometrical approach, LAMBADA is independent from any precalculated data and aligns even large membrane proteins within minutes on a regular workstation. LAMBADA is the first tool performing the entire alignment process automatically while providing the user with the explicit 3D coordinates of the aligned protein and membrane. The second tool is an extension of the InflateGRO method addressing the shortcomings of its predecessor in a fully automated workflow. Determining the exact number of overlapping lipids based on the area occupied by the protein and restricting expansion, compression and energy minimization steps to a subset of relevant lipids through automatically calculated and system-optimized operation parameters, InflateGRO2 yields optimal lipid packing and reduces lipid vacuum exposure to a minimum preserving as much of the equilibrated membrane structure as possible. Applicable to atomistic and coarse grain structures in MARTINI format, InflateGRO2 offers high accuracy, fast performance, and increased application flexibility permitting the easy preparation of systems exhibiting heterogeneous lipid composition as well as embedding proteins into multiple membranes. Both tools can be used separately, in combination with other methods, or in tandem permitting a fully automated workflow while retaining a maximum level of usage control and flexibility. To assess the performance of both methods, we carried out test runs using 22 membrane proteins of different size and transmembrane structure.
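    Editorial aside, not LAMBADA's code or exact algorithm: a much-simplified numpy sketch of the underlying idea of scoring candidate membrane placements with a hydropathy-based criterion and keeping the best translation. The residue subset, slab geometry, and toy coordinates are assumptions made for illustration.

      import numpy as np

      # Kyte-Doolittle hydropathy for a few residue types (illustrative subset).
      HYDROPATHY = {"LEU": 3.8, "ILE": 4.5, "VAL": 4.2, "PHE": 2.8,
                    "LYS": -3.9, "ASP": -3.5, "SER": -0.8, "GLY": -0.4}

      def alignment_score(z_coords, residue_names, slab_min, slab_max):
          """Sum hydropathy of residues whose z coordinate falls inside the membrane slab."""
          inside = (z_coords >= slab_min) & (z_coords <= slab_max)
          return sum(HYDROPATHY[name] for name, ok in zip(residue_names, inside) if ok)

      def best_z_offset(z_coords, residue_names, slab_min, slab_max, offsets):
          """Scan candidate z translations and return the one with the highest score."""
          scores = [alignment_score(z_coords + dz, residue_names, slab_min, slab_max)
                    for dz in offsets]
          return offsets[int(np.argmax(scores))], max(scores)

      # Toy protein: a hydrophobic belt around z ~ 1.5 nm and a polar cap above it.
      rng = np.random.default_rng(0)
      z = np.concatenate([rng.normal(1.5, 0.3, 40), rng.normal(4.0, 0.3, 20)])
      names = ["LEU"] * 40 + ["LYS"] * 20
      dz, score = best_z_offset(z, names, slab_min=-2.0, slab_max=2.0,
                                offsets=np.linspace(-3.0, 3.0, 61))
      print(f"best z offset: {dz:.2f} nm, score: {score:.1f}")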

  17. Performance and Architecture Lab Modeling Tool

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    2014-06-19

    Analytical application performance models are critical for diagnosing performance-limiting resources, optimizing systems, and designing machines. Creating models, however, is difficult. Furthermore, models are frequently expressed in forms that are hard to distribute and validate. The Performance and Architecture Lab Modeling tool, or Palm, is a modeling tool designed to make application modeling easier. Palm provides a source code modeling annotation language. Not only does the modeling language divide the modeling task into subproblems, it formally links an application's source code with its model. This link is important because a model's purpose is to capture application behavior. Furthermore, this link makes it possible to define rules for generating models according to source code organization. Palm generates hierarchical models according to well-defined rules. Given an application, a set of annotations, and a representative execution environment, Palm will generate the same model. A generated model is an executable program whose constituent parts directly correspond to the modeled application. Palm generates models by combining top-down (human-provided) semantic insight with bottom-up static and dynamic analysis. A model's hierarchy is defined by static and dynamic source code structure. Because Palm coordinates models and source code, Palm's models are 'first-class' and reproducible. Palm automates common modeling tasks. For instance, Palm incorporates measurements to focus attention, represent constant behavior, and validate models. Palm's workflow is as follows. The workflow's input is source code annotated with Palm modeling annotations. The most important annotation models an instance of a block of code. Given annotated source code, the Palm Compiler produces executables and the Palm Monitor collects a representative performance profile. The Palm Generator synthesizes a model based on the static and dynamic mapping of annotations to program behavior. The model -- an executable program -- is a hierarchical composition of annotation functions, synthesized functions, statistics for runtime values, and performance measurements.
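    Editorial aside, not Palm output: a hand-written Python stand-in for the kind of artifact the abstract describes, an executable model whose functions mirror annotated code blocks and compose hierarchically. All cost formulas and constants are invented for illustration.

      def t_read_input(n_bytes, bandwidth=500e6):
          """Model of an annotated I/O block: time = bytes / bandwidth."""
          return n_bytes / bandwidth

      def t_compute_step(n_cells, flops_per_cell=120, peak_flops=50e9):
          """Model of an annotated compute kernel."""
          return n_cells * flops_per_cell / peak_flops

      def t_timestep(n_cells):
          """Hierarchical composition: one timestep = halo exchange + compute."""
          t_halo = 2e-5 + 8 * n_cells ** (2 / 3) / 5e9   # latency + surface traffic
          return t_halo + t_compute_step(n_cells)

      def t_application(n_cells, n_steps, input_bytes):
          """Top of the hierarchy: whole-run prediction in seconds."""
          return t_read_input(input_bytes) + n_steps * t_timestep(n_cells)

      print(f"predicted runtime: {t_application(1_000_000, 200, 2e9):.2f} s")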

  18. Implementation of Systematic Review Tools in IRIS

    EPA Pesticide Factsheets

    Currently, the number of chemicals present in the environment exceeds the ability of public health scientists to efficiently screen the available data in order to produce well-informed human health risk assessments in a timely manner. For this reason, the US EPA's Integrated Risk Information System (IRIS) program has started implementing new software tools into the hazard characterization workflow. These automated tools aid in multiple phases of the systematic review process, including scoping and problem formulation, literature search, and identification and screening of available published studies. The increased availability of these tools lays the foundation for automating or semi-automating multiple phases of the systematic review process. Some of these software tools include modules to facilitate a structured approach to study quality evaluation of human and animal data, although approaches are generally lacking for assessing complex mechanistic information; in particular, for "omics"-based evidence, tools are only starting to become available to evaluate these types of studies. We will highlight how new software programs, online tools, and approaches for assessing study quality can be better integrated to allow for a more efficient and transparent workflow of the risk assessment process, as well as identify tool gaps that would benefit future risk assessments. Disclaimer: The views expressed here are those of the authors and do not necessarily represent the views or policies of the US EPA.

  19. An evolving computational platform for biological mass spectrometry: workflows, statistics and data mining with MASSyPup64.

    PubMed

    Winkler, Robert

    2015-01-01

    In biological mass spectrometry, crude instrumental data need to be converted into meaningful theoretical models. Several data processing and data evaluation steps are required to come to the final results. These operations are often difficult to reproduce because they are tied to specific computing platforms. This effect, known as 'workflow decay', can be diminished by using a standardized informatic infrastructure. Thus, we compiled an integrated platform, which contains ready-to-use tools and workflows for mass spectrometry data analysis. Apart from general unit operations, such as peak picking and identification of proteins and metabolites, we put a strong emphasis on the statistical validation of results and Data Mining. MASSyPup64 includes, e.g., the OpenMS/TOPPAS framework, the Trans-Proteomic-Pipeline programs, the ProteoWizard tools, X!Tandem, Comet and SpiderMass. The statistical computing language R is installed with packages for MS data analyses, such as XCMS/metaXCMS and MetabR. The R package Rattle provides user-friendly access to multiple Data Mining methods. Further, we added the non-conventional spreadsheet program teapot for editing large data sets and a command line tool for transposing large matrices. Individual programs, console commands and modules can be integrated using the Workflow Management System (WMS) Taverna. We explain the useful combination of the tools by practical examples: (1) a workflow for protein identification and validation, with subsequent Association Analysis of peptides, (2) cluster analysis and Data Mining in targeted Metabolomics, and (3) raw data processing, Data Mining and identification of metabolites in untargeted Metabolomics. Association Analyses reveal relationships between variables across different sample sets. We present its application for finding co-occurring peptides, which can be used for targeted proteomics, the discovery of alternative biomarkers and protein-protein interactions. Data Mining-derived models displayed higher robustness and accuracy for classifying sample groups in targeted Metabolomics than cluster analyses. Random Forest models provide not only predictive models, which can be deployed for new data sets, but also the variable importance. We demonstrate that the latter is especially useful for tracking down significant signals and affected pathways in untargeted Metabolomics. Thus, Random Forest modeling supports the unbiased search for relevant biological features in Metabolomics. Our results clearly manifest the importance of Data Mining methods to disclose non-obvious information in biological mass spectrometry. The application of a Workflow Management System and the integration of all required programs and data in a consistent platform make the presented data analysis strategies reproducible for non-expert users. The simple remastering process and the Open Source licenses of MASSyPup64 (http://www.bioprocess.org/massypup/) enable the continuous improvement of the system.

  20. An end-to-end workflow for engineering of biological networks from high-level specifications.

    PubMed

    Beal, Jacob; Weiss, Ron; Densmore, Douglas; Adler, Aaron; Appleton, Evan; Babb, Jonathan; Bhatia, Swapnil; Davidsohn, Noah; Haddock, Traci; Loyall, Joseph; Schantz, Richard; Vasilev, Viktor; Yaman, Fusun

    2012-08-17

    We present a workflow for the design and production of biological networks from high-level program specifications. The workflow is based on a sequence of intermediate models that incrementally translate high-level specifications into DNA samples that implement them. We identify algorithms for translating between adjacent models and implement them as a set of software tools, organized into a four-stage toolchain: Specification, Compilation, Part Assignment, and Assembly. The specification stage begins with a Boolean logic computation specified in the Proto programming language. The compilation stage uses a library of network motifs and cellular platforms, also specified in Proto, to transform the program into an optimized Abstract Genetic Regulatory Network (AGRN) that implements the programmed behavior. The part assignment stage assigns DNA parts to the AGRN, drawing the parts from a database for the target cellular platform, to create a DNA sequence implementing the AGRN. Finally, the assembly stage computes an optimized assembly plan to create the DNA sequence from available part samples, yielding a protocol for producing a sample of engineered plasmids with robotics assistance. Our workflow is the first to automate the production of biological networks from a high-level program specification. Furthermore, the workflow's modular design allows the same program to be realized on different cellular platforms simply by swapping workflow configurations. We validated our workflow by specifying a small-molecule sensor-reporter program and verifying the resulting plasmids in both HEK 293 mammalian cells and in E. coli bacterial cells.
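    Editorial aside, not the authors' toolchain: a skeletal Python sketch of a four-stage pipeline (Specification, Compilation, Part Assignment, Assembly) as named in the abstract. The stage bodies, part names, and database are placeholders, not the actual compiler or assembly algorithms.

      def compile_specification(spec: str) -> list:
          """Placeholder: turn a high-level logic spec into abstract regulatory elements."""
          return ["inducible_promoter", "rbs", "reporter"]

      def assign_parts(abstract_elements: list, part_database: dict) -> list:
          """Map each abstract element to a concrete DNA part from the database."""
          return [part_database[e] for e in abstract_elements]

      def plan_assembly(parts: list) -> list:
          """Produce an ordered, step-by-step assembly protocol."""
          return [f"step {i + 1}: assemble {p}" for i, p in enumerate(parts)]

      def run_toolchain(spec, part_database):
          return plan_assembly(assign_parts(compile_specification(spec), part_database))

      protocol = run_toolchain(
          "(aTc AND IPTG) -> GFP",
          {"inducible_promoter": "pTet", "rbs": "B0034", "reporter": "gfp"},
      )
      print("\n".join(protocol))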

  1. Ab initio chemical safety assessment: A workflow based on exposure considerations and non-animal methods.

    PubMed

    Berggren, Elisabet; White, Andrew; Ouedraogo, Gladys; Paini, Alicia; Richarz, Andrea-Nicole; Bois, Frederic Y; Exner, Thomas; Leite, Sofia; Grunsven, Leo A van; Worth, Andrew; Mahony, Catherine

    2017-11-01

    We describe and illustrate a workflow for chemical safety assessment that completely avoids animal testing. The workflow, which was developed within the SEURAT-1 initiative, is designed to be applicable to cosmetic ingredients as well as to other types of chemicals, e.g. active ingredients in plant protection products, biocides or pharmaceuticals. The aim of this work was to develop a workflow to assess chemical safety without relying on any animal testing, but instead constructing a hypothesis based on existing data, in silico modelling, biokinetic considerations and then by targeted non-animal testing. For illustrative purposes, we consider a hypothetical new ingredient x as a new component in a body lotion formulation. The workflow is divided into tiers in which points of departure are established through in vitro testing and in silico prediction, as the basis for estimating a safe external dose in a repeated use scenario. The workflow includes a series of possible exit (decision) points, with increasing levels of confidence, based on the sequential application of the Threshold of Toxicological Concern (TTC) approach, read-across, followed by an "ab initio" assessment, in which chemical safety is determined entirely by new in vitro testing and in vitro to in vivo extrapolation by means of mathematical modelling. We believe that this workflow could be applied as a tool to inform targeted and toxicologically relevant in vitro testing, where necessary, and to gain confidence in safety decision making without the need for animal testing.
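    Editorial aside, not the SEURAT-1 workflow itself: a Python sketch of the tiered exit-point logic described above (TTC screen, then read-across, then ab initio assessment). Thresholds and field names are invented placeholders.

      def assess(substance):
          # Tier 1: Threshold of Toxicological Concern (TTC) screen.
          if substance["estimated_exposure_mg_per_kg_day"] < substance["ttc_threshold"]:
              return "exit: exposure below TTC, no further testing needed"

          # Tier 2: read-across from a data-rich analogue, if a suitable one exists.
          if substance.get("analogue_with_data"):
              return f"exit: read-across from {substance['analogue_with_data']}"

          # Tier 3: ab initio assessment driven entirely by new in vitro testing
          # and in vitro to in vivo extrapolation (IVIVE) modelling.
          return "proceed to ab initio assessment (targeted in vitro testing + IVIVE)"

      ingredient_x = {
          "estimated_exposure_mg_per_kg_day": 0.05,   # hypothetical body-lotion use scenario
          "ttc_threshold": 0.0025,                    # hypothetical threshold
          "analogue_with_data": None,
      }
      print(assess(ingredient_x))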

  2. Usability testing of Avoiding Diabetes Thru Action Plan Targeting (ADAPT) decision support for integrating care-based counseling of pre-diabetes in an electronic health record

    PubMed Central

    Chrimes, Dillon; Kushniruk, Andre; Kitos, Nicole R.

    2014-01-01

    Purpose Usability testing can be used to evaluate human computer interaction (HCI) and communication in shared decision making (SDM) for patient-provider behavioral change and behavioral contracting. Traditional evaluations of usability using scripted or mock patient scenarios with think-aloud protocol analysis provide a way to identify HCI issues. In this paper we describe the application of these methods in the evaluation of the Avoiding Diabetes Thru Action Plan Targeting (ADAPT) tool, and test the usability of the tool to support the ADAPT framework for integrated care counseling of pre-diabetes. The think-aloud protocol analysis typically does not provide an assessment of how patient-provider interactions are affected in “live” clinical workflow or whether a tool is successful. Therefore, “near-live” clinical simulations involving applied simulation methods were used to complement the think-aloud results. This complementary usability technique was used to test the end-user HCI and tool performance by more closely mimicking the clinical workflow and capturing interaction sequences along with assessing the functionality of computer module prototypes on clinician workflow. We expected this method to further complement and provide different usability findings as compared to think-aloud analysis. Together, this mixed method evaluation provided comprehensive and realistic feedback for iterative refinement of the ADAPT system prior to implementation. Methods The study employed two phases of testing of a new interactive ADAPT tool that embedded an evidence-based shared goal setting component into primary care workflow for dealing with pre-diabetes counseling within a commercial physician office electronic health record (EHR). Phase I applied usability testing that involved “think-aloud” protocol analysis of 8 primary care providers interacting with several scripted clinical scenarios. Phase II used “near-live” clinical simulations of 5 providers interacting with standardized trained patient actors enacting the clinical scenario of counseling for pre-diabetes, each of whom had a pedometer that recorded the number of steps taken over a week. In both phases, all sessions were audio-taped and motion screen-capture software was activated for onscreen recordings. Transcripts were coded using iterative qualitative content analysis methods. Results In Phase I, the impact of the components and layout of ADAPT on users’ Navigation, Understandability, and Workflow was associated with the largest volume of negative comments (i.e. approximately 80% of end-user commentary), while Usability and Content of ADAPT were representative of more positive than negative user commentary. The heuristic category of Usability had a positive-to-negative comment ratio of 2.1, reflecting positive perception of the usability of the tool, its functionality, and overall co-productive utilization of ADAPT. However, there were mixed perceptions about content (i.e., how the information was displayed, organized and described in the tool). In Phase II, the duration of patient encounters was approximately 10 minutes with all of the Patient Instructions (prescriptions) and behavioral contracting being activated at the end of each visit. Upon activation, providers accepted the pathway prescribed by the tool 100% of the time and completed all the fields in the tool in the simulation cases. Only 14% of encounter time was spent using the functionality of the ADAPT tool in terms of keystrokes and entering relevant data. The rest of the time was spent on communication and dialogue to populate the patient instructions. In all cases, the interaction sequence of reviewing and discussing exercise and diet of the patient was linked to the functionality of the ADAPT tool in terms of monitoring, response-efficacy, self-efficacy, and negotiation in the patient-provider dialogue. There was a change from one-way dialogue to two-way dialogue and negotiation that ended in a behavioral contract. This change demonstrated the tool’s sequence, which supported recording current exercise and diet followed by a diet and exercise goal setting procedure to reduce the risk of diabetes onset. Conclusions This study demonstrated that “think-aloud” protocol analysis with “near-live” clinical simulations provided a successful usability evaluation of a new primary care pre-diabetes shared goal setting tool. Each phase of the study provided complementary observations on problems with the new onscreen tool and was used to show the influence of the ADAPT framework on the usability, workflow integration, and communication between the patient and provider. The think-aloud tests with the provider showed the tool can be used according to the ADAPT framework (exercise-to-diet behavior change and tool utilization), while the clinical simulations revealed the ADAPT framework to realistically support patient-provider communication to obtain a behavioral change contract. SDM interactions and mechanisms affecting protocol-based care can be more completely captured by combining “near-live” clinical simulations with traditional “think-aloud analysis”, which augments clinician utilization. More analysis is required to verify whether the rich communication actions found in Phase II complement clinical workflows. PMID:24981988

  3. Comprehensive and quantitative profiling of lipid species in human milk, cow milk and a phospholipid-enriched milk formula by GC and MS/MSALL.

    PubMed

    Sokol, Elena; Ulven, Trond; Færgeman, Nils J; Ejsing, Christer S

    2015-06-01

    Here we present a workflow for in-depth analysis of milk lipids that combines gas chromatography (GC) for fatty acid (FA) profiling and a shotgun lipidomics routine termed MS/MS ALL for structural characterization of molecular lipid species. To evaluate the performance of the workflow we performed a comparative lipid analysis of human milk, cow milk, and Lacprodan® PL-20, a phospholipid-enriched milk protein concentrate for infant formula. The GC analysis showed that human milk and Lacprodan have a similar FA profile with higher levels of unsaturated FAs as compared to cow milk. In-depth lipidomic analysis by MS/MS ALL revealed that each type of milk sample comprised distinct composition of molecular lipid species. Lipid class composition showed that the human and cow milk contain a higher proportion of triacylglycerols (TAGs) as compared to Lacprodan. Notably, the MS/MS ALL analysis demonstrated that the similar FA profile of human milk and Lacprodan determined by GC analysis is attributed to the composition of individual TAG species in human milk and glycerophospholipid species in Lacprodan. Moreover, the analysis of TAG molecules in Lacprodan and cow milk showed a high proportion of short-chain FAs that could not be monitored by GC analysis. The results presented here show that complementary GC and MS/MS ALL analysis is a powerful approach for characterization of molecular lipid species in milk and milk products. Practical applications: Milk lipid analysis is routinely performed using gas chromatography. This method reports the total fatty acid composition of all milk lipids, but provides no structural or quantitative information about individual lipid molecules in milk or milk products. Here we present a workflow that integrates gas chromatography for fatty acid profiling and a shotgun lipidomics routine termed MS/MS ALL for structural analysis and quantification of molecular lipid species. We demonstrate the efficacy of this complementary workflow by a comparative analysis of molecular lipid species in human milk, cow milk, and a milk-based supplement used for infant formula.

  4. Comprehensive and quantitative profiling of lipid species in human milk, cow milk and a phospholipid-enriched milk formula by GC and MS/MSALL

    PubMed Central

    Sokol, Elena; Ulven, Trond; Færgeman, Nils J; Ejsing, Christer S

    2015-01-01

    Here we present a workflow for in-depth analysis of milk lipids that combines gas chromatography (GC) for fatty acid (FA) profiling and a shotgun lipidomics routine termed MS/MSALL for structural characterization of molecular lipid species. To evaluate the performance of the workflow we performed a comparative lipid analysis of human milk, cow milk, and Lacprodan® PL-20, a phospholipid-enriched milk protein concentrate for infant formula. The GC analysis showed that human milk and Lacprodan have a similar FA profile with higher levels of unsaturated FAs as compared to cow milk. In-depth lipidomic analysis by MS/MSALL revealed that each type of milk sample comprised distinct composition of molecular lipid species. Lipid class composition showed that the human and cow milk contain a higher proportion of triacylglycerols (TAGs) as compared to Lacprodan. Notably, the MS/MSALL analysis demonstrated that the similar FA profile of human milk and Lacprodan determined by GC analysis is attributed to the composition of individual TAG species in human milk and glycerophospholipid species in Lacprodan. Moreover, the analysis of TAG molecules in Lacprodan and cow milk showed a high proportion of short-chain FAs that could not be monitored by GC analysis. The results presented here show that complementary GC and MS/MSALL analysis is a powerful approach for characterization of molecular lipid species in milk and milk products. Practical applications : Milk lipid analysis is routinely performed using gas chromatography. This method reports the total fatty acid composition of all milk lipids, but provides no structural or quantitative information about individual lipid molecules in milk or milk products. Here we present a workflow that integrates gas chromatography for fatty acid profiling and a shotgun lipidomics routine termed MS/MSALL for structural analysis and quantification of molecular lipid species. We demonstrate the efficacy of this complementary workflow by a comparative analysis of molecular lipid species in human milk, cow milk, and a milk-based supplement used for infant formula. PMID:26089741

  5. FAST: FAST Analysis of Sequences Toolbox

    PubMed Central

    Lawrence, Travis J.; Kauffman, Kyle T.; Amrine, Katherine C. H.; Carper, Dana L.; Lee, Raymond S.; Becich, Peter J.; Canales, Claudia J.; Ardell, David H.

    2015-01-01

    FAST (FAST Analysis of Sequences Toolbox) provides simple, powerful open source command-line tools to filter, transform, annotate and analyze biological sequence data. Modeled after the GNU (GNU's Not Unix) Textutils such as grep, cut, and tr, FAST tools such as fasgrep, fascut, and fastr make it easy to rapidly prototype expressive bioinformatic workflows in a compact and generic command vocabulary. Compact combinatorial encoding of data workflows with FAST commands can simplify the documentation and reproducibility of bioinformatic protocols, supporting better transparency in biological data science. Interface self-consistency and conformity with conventions of GNU, Matlab, Perl, BioPerl, R, and GenBank help make FAST easy and rewarding to learn. FAST automates numerical, taxonomic, and text-based sorting, selection and transformation of sequence records and alignment sites based on content, index ranges, descriptive tags, annotated features, and in-line calculated analytics, including composition and codon usage. Automated content- and feature-based extraction of sites and support for molecular population genetic statistics make FAST useful for molecular evolutionary analysis. FAST is portable, easy to install and secure thanks to the relative maturity of its Perl and BioPerl foundations, with stable releases posted to CPAN. Development as well as a publicly accessible Cookbook and Wiki are available on the FAST GitHub repository at https://github.com/tlawrence3/FAST. The default data exchange format in FAST is Multi-FastA (specifically, a restriction of BioPerl FastA format). Sanger and Illumina 1.8+ FastQ formatted files are also supported. FAST makes it easier for non-programmer biologists to interactively investigate and control biological data at the speed of thought. PMID:26042145

  6. Understanding and Visualizing Multitasking and Task Switching Activities: A Time Motion Study to Capture Nursing Workflow

    PubMed Central

    Yen, Po-Yin; Kelley, Marjorie; Lopetegui, Marcelo; Rosado, Amber L.; Migliore, Elaina M.; Chipps, Esther M.; Buck, Jacalyn

    2016-01-01

    A fundamental understanding of multitasking within nursing workflow is important in today’s dynamic and complex healthcare environment. We conducted a time motion study to understand nursing workflow, specifically multitasking and task switching activities. We used TimeCaT, a comprehensive electronic time capture tool, to capture observational data. We established inter-observer reliability prior to data collection. We completed 56 hours of observation of 10 registered nurses. We found, on average, nurses had 124 communications and 208 hands-on tasks per 4-hour block of time. They multitasked (having communication and hands-on tasks simultaneously) 131 times, representing 39.48% of all times; the total multitasking duration ranged from 14.6 minutes to 109 minutes, with an average of 44.98 minutes (18.63%). We also reviewed workflow visualization to uncover the multitasking events. Our study design and methods provide a practical and reliable approach to conducting and analyzing time motion studies from both quantitative and qualitative perspectives. PMID:28269924
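    Editorial aside, not TimeCaT code: a small Python sketch of how multitasking episodes and total overlap time can be computed from time-stamped communication and hands-on task intervals. The intervals below are hypothetical.

      def overlaps(a, b):
          """Return the overlap (start, end) of two [start, end) intervals, or None."""
          start, end = max(a[0], b[0]), min(a[1], b[1])
          return (start, end) if start < end else None

      def multitasking_summary(communications, tasks):
          """Count overlapping episodes and sum their durations."""
          episodes = []
          for c in communications:
              for t in tasks:
                  o = overlaps(c, t)
                  if o:
                      episodes.append(o)
          total = sum(end - start for start, end in episodes)
          return len(episodes), total

      # Times in minutes from the start of a hypothetical observation block.
      communications = [(0, 3), (10, 12), (30, 34)]
      tasks = [(2, 6), (11, 15), (40, 45)]
      count, minutes = multitasking_summary(communications, tasks)
      print(f"{count} multitasking episodes, {minutes} min overlapping")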

  7. Make Your Workflows Smarter

    NASA Technical Reports Server (NTRS)

    Jones, Corey; Kapatos, Dennis; Skradski, Cory

    2012-01-01

    Do you have workflows with many manual tasks that slow down your business? Or, do you scale back workflows because there are simply too many manual tasks? Basic workflow robots can automate some common tasks, but not everything. This presentation will show how advanced robots called "expression robots" can be set up to perform everything from simple tasks such as moving, creating folders, renaming, changing or creating an attribute, and revising, to more complex tasks like creating a PDF, or even launching a session of Creo Parametric and performing a specific modeling task. Expression robots are able to utilize the Java API and Info*Engine to do almost anything you can imagine! Best of all, these tools are supported by PTC and will work with later releases of Windchill. Limited knowledge of Java, Info*Engine, and XML is required. The attendee will learn what tasks expression robots are capable of performing. The attendee will learn what is involved in setting up an expression robot. The attendee will gain a basic understanding of simple Info*Engine tasks.

  8. Understanding and Visualizing Multitasking and Task Switching Activities: A Time Motion Study to Capture Nursing Workflow.

    PubMed

    Yen, Po-Yin; Kelley, Marjorie; Lopetegui, Marcelo; Rosado, Amber L; Migliore, Elaina M; Chipps, Esther M; Buck, Jacalyn

    2016-01-01

    A fundamental understanding of multitasking within nursing workflow is important in today's dynamic and complex healthcare environment. We conducted a time motion study to understand nursing workflow, specifically multitasking and task switching activities. We used TimeCaT, a comprehensive electronic time capture tool, to capture observational data. We established inter-observer reliability prior to data collection. We completed 56 hours of observation of 10 registered nurses. We found, on average, nurses had 124 communications and 208 hands-on tasks per 4-hour block of time. They multitasked (having communication and hands-on tasks simultaneously) 131 times, representing 39.48% of all times; the total multitasking duration ranged from 14.6 minutes to 109 minutes, with an average of 44.98 minutes (18.63%). We also reviewed workflow visualization to uncover the multitasking events. Our study design and methods provide a practical and reliable approach to conducting and analyzing time motion studies from both quantitative and qualitative perspectives.

  9. GO2OGS 1.0: a versatile workflow to integrate complex geological information with fault data into numerical simulation models

    NASA Astrophysics Data System (ADS)

    Fischer, T.; Naumov, D.; Sattler, S.; Kolditz, O.; Walther, M.

    2015-11-01

    We offer a versatile workflow to convert geological models built with the Paradigm™ GOCAD© (Geological Object Computer Aided Design) software into the open-source VTU (Visualization Toolkit unstructured grid) format for usage in numerical simulation models. Tackling relevant scientific questions or engineering tasks often involves multidisciplinary approaches. Conversion workflows are needed as a way of communication between the diverse tools of the various disciplines. Our approach offers an open-source, platform-independent, robust, and comprehensible method that is potentially useful for a multitude of environmental studies. With two application examples in the Thuringian Syncline, we show how a heterogeneous geological GOCAD model including multiple layers and faults can be used for numerical groundwater flow modeling, in our case employing the OpenGeoSys open-source numerical toolbox for groundwater flow simulations. The presented workflow offers the chance to incorporate increasingly detailed data, utilizing the growing availability of computational power to simulate numerical models.
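    Editorial aside, not part of the GO2OGS toolchain: as a generic illustration of producing a VTU unstructured grid from Python, one might use the open-source meshio library (assumed to be installed). The mesh and data array below are toy values, not a converted GOCAD model.

      import numpy as np
      import meshio  # generic mesh I/O library; not part of the described workflow

      # A single tetrahedron with one scalar per point, written as .vtu
      # (VTK unstructured grid), the target format named in the abstract.
      points = np.array([
          [0.0, 0.0, 0.0],
          [1.0, 0.0, 0.0],
          [0.0, 1.0, 0.0],
          [0.0, 0.0, 1.0],
      ])
      cells = [("tetra", np.array([[0, 1, 2, 3]]))]
      mesh = meshio.Mesh(points, cells, point_data={"example_scalar": np.array([1.0, 1.0, 1.0, 1.0])})
      meshio.write("example_layer.vtu", mesh)
      print("wrote example_layer.vtu with", len(points), "points")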

  10. The MPO system for automatic workflow documentation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Abla, G.; Coviello, E. N.; Flanagan, S. M.

    Data from large-scale experiments and extreme-scale computing is expensive to produce and may be used for critical applications. However, it is not the mere existence of data that is important, but our ability to make use of it. Experience has shown that when metadata is better organized and more complete, the underlying data becomes more useful. Traditionally, capturing the steps of scientific workflows and metadata was the role of the lab notebook, but the digital era has resulted instead in the fragmentation of data, processing, and annotation. Here, this article presents the Metadata, Provenance, and Ontology (MPO) System, the software that can automate the documentation of scientific workflows and associated information. Based on recorded metadata, it provides explicit information about the relationships among the elements of workflows in notebook form augmented with directed acyclic graphs. A set of web-based graphical navigation tools and Application Programming Interface (API) have been created for searching and browsing, as well as programmatically accessing the workflows and data. We describe the MPO concepts and its software architecture. We also report the current status of the software as well as the initial deployment experience.
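    Editorial aside, not the MPO software: a generic Python sketch, using the networkx library, of recording a workflow as a directed acyclic graph of data objects and activities, the notebook-plus-DAG view the abstract describes. Node names and parameters are hypothetical.

      import networkx as nx  # generic graph library; not the MPO system itself

      # Nodes are data objects or processing activities; edges point from inputs
      # to the things derived from them.
      g = nx.DiGraph()
      g.add_node("shot_1234.h5", kind="data")            # hypothetical names throughout
      g.add_node("filter_v2", kind="activity", params={"cutoff_hz": 50})
      g.add_node("filtered_1234.h5", kind="data")
      g.add_node("fit_model", kind="activity")
      g.add_node("fit_result.json", kind="data")
      g.add_edges_from([
          ("shot_1234.h5", "filter_v2"),
          ("filter_v2", "filtered_1234.h5"),
          ("filtered_1234.h5", "fit_model"),
          ("fit_model", "fit_result.json"),
      ])

      assert nx.is_directed_acyclic_graph(g)
      print("execution order:", list(nx.topological_sort(g)))
      print("provenance of fit_result.json:", sorted(nx.ancestors(g, "fit_result.json")))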

  11. The TimeStudio Project: An open source scientific workflow system for the behavioral and brain sciences.

    PubMed

    Nyström, Pär; Falck-Ytter, Terje; Gredebäck, Gustaf

    2016-06-01

    This article describes a new open source scientific workflow system, the TimeStudio Project, dedicated to the behavioral and brain sciences. The program is written in MATLAB and features a graphical user interface for the dynamic pipelining of computer algorithms developed as TimeStudio plugins. TimeStudio includes both a set of general plugins (for reading data files, modifying data structures, visualizing data structures, etc.) and a set of plugins specifically developed for the analysis of event-related eyetracking data as a proof of concept. It is possible to create custom plugins to integrate new or existing MATLAB code anywhere in a workflow, making TimeStudio a flexible workbench for organizing and performing a wide range of analyses. The system also features an integrated sharing and archiving tool for TimeStudio workflows, which can be used to share workflows both during the data analysis phase and after scientific publication. TimeStudio thus facilitates the reproduction and replication of scientific studies, increases the transparency of analyses, and reduces individual researchers' analysis workload. The project website ( http://timestudioproject.com ) contains the latest releases of TimeStudio, together with documentation and user forums.

  12. The MPO system for automatic workflow documentation

    DOE PAGES

    Abla, G.; Coviello, E. N.; Flanagan, S. M.; ...

    2016-04-18

    Data from large-scale experiments and extreme-scale computing is expensive to produce and may be used for critical applications. However, it is not the mere existence of data that is important, but our ability to make use of it. Experience has shown that when metadata is better organized and more complete, the underlying data becomes more useful. Traditionally, capturing the steps of scientific workflows and metadata was the role of the lab notebook, but the digital era has resulted instead in the fragmentation of data, processing, and annotation. Here, this article presents the Metadata, Provenance, and Ontology (MPO) System, the software that can automate the documentation of scientific workflows and associated information. Based on recorded metadata, it provides explicit information about the relationships among the elements of workflows in notebook form augmented with directed acyclic graphs. A set of web-based graphical navigation tools and Application Programming Interface (API) have been created for searching and browsing, as well as programmatically accessing the workflows and data. We describe the MPO concepts and its software architecture. We also report the current status of the software as well as the initial deployment experience.

  13. Automation of lidar-based hydrologic feature extraction workflows using GIS

    NASA Astrophysics Data System (ADS)

    Borlongan, Noel Jerome B.; de la Cruz, Roel M.; Olfindo, Nestor T.; Perez, Anjillyn Mae C.

    2016-10-01

    With the advent of LiDAR technology, higher resolution datasets have become available for use in different remote sensing and GIS applications. One significant application of LiDAR datasets in the Philippines is in resource feature extraction. Feature extraction using LiDAR datasets requires complex and repetitive workflows that can take researchers a lot of time to execute and supervise manually. The Development of the Philippine Hydrologic Dataset for Watersheds from LiDAR Surveys (PHD), a project under the Nationwide Detailed Resources Assessment Using LiDAR (Phil-LiDAR 2) program, created a set of scripts, the PHD Toolkit, to automate the processes and workflows necessary for hydrologic feature extraction, specifically Streams and Drainages, Irrigation Network, and Inland Wetlands, using LiDAR datasets. These scripts are written in Python and can be added to the ArcGIS® environment as a toolbox. The toolkit is currently used as an aid for researchers in hydrologic feature extraction by simplifying the workflows, eliminating human errors when providing the inputs, and providing quick and easy-to-use tools for repetitive tasks. This paper discusses the actual implementation of the different workflows developed by Phil-LiDAR 2 Project 4 for Streams, Irrigation Network and Inland Wetlands extraction.

  14. A verification strategy for web services composition using enhanced stacked automata model.

    PubMed

    Nagamouttou, Danapaquiame; Egambaram, Ilavarasan; Krishnan, Muthumanickam; Narasingam, Poonkuzhali

    2015-01-01

    Currently, Service-Oriented Architecture (SOA) is becoming the most popular software architecture for contemporary enterprise applications, and one crucial technique for its implementation is web services. An individual service offered by a service provider may represent only limited business functionality; however, by composing individual services from different service providers, a composite service describing the entire business process of an enterprise can be built. Several standards have been defined to address the web service composition problem, notably the Business Process Execution Language (BPEL). BPEL provides an XML-based specification language for defining and implementing business process workflows for web services. A key problem with most realistic approaches to service composition is the verification of the composed web services: formal verification methods are needed to ensure the correctness of composed services. Only a few studies in the literature address the verification of web services, and these target deterministic systems. Moreover, the existing models do not address verification properties such as dead transitions, deadlock, reachability and safety. In this paper, a new model to verify composed web services using an Enhanced Stacked Automata Model (ESAM) is proposed. The correctness properties of the non-deterministic system are evaluated with respect to dead transitions, deadlock, safety, liveness and reachability. Web services are first composed using the Business Process Execution Language for Web Services (BPEL4WS), converted into an ESAM (a combination of Muller Automata (MA) and Push Down Automata (PDA)), and then transformed into Promela, the input language of the Simple ProMeLa Interpreter (SPIN) tool. The model is verified using the SPIN tool, and the results show better performance in detecting dead transitions and deadlocks than the existing models.
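
    As an illustration only (this is a plain state-space search in Python, not the ESAM/SPIN tool chain itself), the properties named above can be pictured on a toy finite transition system for a composed service:

        # Illustrative only: reachability, deadlock and dead-transition checks
        # on a small labelled transition system for a hypothetical composed service.
        from collections import deque

        transitions = {
            "start":   [("invoke", "waiting")],
            "waiting": [("reply", "done"), ("timeout", "error")],
            "error":   [("retry", "waiting")],
            "done":    [],          # terminal (accepting) state
        }

        def reachable(initial):
            seen, queue, fired = {initial}, deque([initial]), set()
            while queue:
                state = queue.popleft()
                for label, target in transitions[state]:
                    fired.add((state, label, target))
                    if target not in seen:
                        seen.add(target)
                        queue.append(target)
            return seen, fired

        states, fired = reachable("start")
        deadlocks = [s for s in states if not transitions[s] and s != "done"]
        all_transitions = {(s, l, t) for s, outs in transitions.items() for l, t in outs}
        dead_transitions = all_transitions - fired

        print("reachable states:", states)
        print("deadlocked states:", deadlocks)
        print("dead transitions:", dead_transitions)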

  15. Neuroimaging Study Designs, Computational Analyses and Data Provenance Using the LONI Pipeline

    PubMed Central

    Dinov, Ivo; Lozev, Kamen; Petrosyan, Petros; Liu, Zhizhong; Eggert, Paul; Pierce, Jonathan; Zamanyan, Alen; Chakrapani, Shruthi; Van Horn, John; Parker, D. Stott; Magsipoc, Rico; Leung, Kelvin; Gutman, Boris; Woods, Roger; Toga, Arthur

    2010-01-01

    Modern computational neuroscience employs diverse software tools and multidisciplinary expertise to analyze heterogeneous brain data. The classical problems of gathering meaningful data, fitting specific models, and discovering appropriate analysis and visualization tools give way to a new class of computational challenges—management of large and incongruous data, integration and interoperability of computational resources, and data provenance. We designed, implemented and validated a new paradigm for addressing these challenges in the neuroimaging field. Our solution is based on the LONI Pipeline environment [3], [4], a graphical workflow environment for constructing and executing complex data processing protocols. We developed study-design, database and visual language programming functionalities within the LONI Pipeline that enable the construction of complete, elaborate and robust graphical workflows for analyzing neuroimaging and other data. These workflows facilitate open sharing and communication of data and metadata, concrete processing protocols, result validation, and study replication among different investigators and research groups. The LONI Pipeline features include distributed grid-enabled infrastructure, virtualized execution environment, efficient integration, data provenance, validation and distribution of new computational tools, automated data format conversion, and an intuitive graphical user interface. We demonstrate the new LONI Pipeline features using large scale neuroimaging studies based on data from the International Consortium for Brain Mapping [5] and the Alzheimer's Disease Neuroimaging Initiative [6]. User guides, forums, instructions and downloads of the LONI Pipeline environment are available at http://pipeline.loni.ucla.edu. PMID:20927408

  16. MONA – Interactive manipulation of molecule collections

    PubMed Central

    2013-01-01

    Working with small-molecule datasets is a routine task for cheminformaticians and chemists. The analysis and comparison of vendor catalogues and the compilation of promising candidates as starting points for screening campaigns are but a few very common applications. The workflows applied for this purpose usually consist of multiple basic cheminformatics tasks such as checking for duplicates or filtering by physico-chemical properties. Pipelining tools allow such workflows to be created and changed with little effort, but usually do not support interventions once the pipeline has been started. In many contexts, however, the best-suited workflow is not known in advance, making it necessary to take the results of previous steps into consideration before proceeding. To support intuition-driven processing of compound collections, we developed MONA, an interactive tool that has been designed to prepare and visualize large small-molecule datasets. Using an SQL database, common cheminformatics tasks such as analysis and filtering can be performed interactively with various methods for visual support. Great care was taken to create a simple, intuitive user interface which can be used instantly without any setup steps. MONA combines the interactivity of molecule database systems with the simplicity of pipelining tools, thus enabling the case-to-case application of chemistry expert knowledge. The current version is available free of charge for academic use and can be downloaded at http://www.zbh.uni-hamburg.de/mona. PMID:23985157
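
    A toy illustration of the database-backed filtering idea (this is not MONA's schema, just a minimal SQLite sketch of interactively filtering a small-molecule table by physico-chemical properties):

        # Hypothetical molecule table filtered by property thresholds in SQLite.
        import sqlite3

        con = sqlite3.connect(":memory:")
        con.execute("""CREATE TABLE molecules
                       (name TEXT, smiles TEXT, mol_weight REAL, logp REAL)""")
        con.executemany("INSERT INTO molecules VALUES (?, ?, ?, ?)", [
            ("aspirin",   "CC(=O)OC1=CC=CC=C1C(=O)O", 180.2, 1.2),
            ("caffeine",  "CN1C=NC2=C1C(=O)N(C)C(=O)N2C", 194.2, -0.1),
            ("ibuprofen", "CC(C)CC1=CC=C(C=C1)C(C)C(=O)O", 206.3, 3.5),
        ])

        # an interactively chosen property filter, e.g. weight and lipophilicity cut-offs
        rows = con.execute(
            "SELECT name FROM molecules WHERE mol_weight < 200 AND logp < 3")
        print([name for (name,) in rows])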

  17. Opportunistic Computing with Lobster: Lessons Learned from Scaling up to 25k Non-Dedicated Cores

    NASA Astrophysics Data System (ADS)

    Wolf, Matthias; Woodard, Anna; Li, Wenzhao; Hurtado Anampa, Kenyi; Yannakopoulos, Anna; Tovar, Benjamin; Donnelly, Patrick; Brenner, Paul; Lannon, Kevin; Hildreth, Mike; Thain, Douglas

    2017-10-01

    We previously described Lobster, a workflow management tool for exploiting volatile opportunistic computing resources for computation in HEP. We discuss the various challenges that have been encountered while scaling up the simultaneous CPU core utilization and the software improvements required to overcome these challenges. Task categorization: Workflows can now be divided into categories based on their required system resources. This allows the batch queueing system to optimize assignment of tasks to nodes with the appropriate capabilities. Within each category, limits can be specified for the number of running jobs to regulate the utilization of communication bandwidth. System resource specifications for a task category can now be modified while a project is running, avoiding the need to restart the project if resource requirements differ from the initial estimates. Lobster now implements time limits on each task category to voluntarily terminate tasks. This allows partially completed work to be recovered. Workflow dependency specification: One workflow often requires data from other workflows as input. Rather than waiting for earlier workflows to be completed before beginning later ones, Lobster now allows dependent tasks to begin as soon as sufficient input data has accumulated. Resource monitoring: Lobster utilizes a new capability in Work Queue to monitor the system resources each task requires in order to identify bottlenecks and optimally assign tasks. The capability of the Lobster opportunistic workflow management system for HEP computation has been significantly increased. We have demonstrated efficient utilization of 25 000 non-dedicated cores and achieved a data input rate of 30 Gb/s and an output rate of 500 GB/h. This has required new capabilities in task categorization, workflow dependency specification, and resource monitoring.
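
    A hypothetical sketch of the task-category idea described above (not Lobster's actual configuration syntax): per-category resource limits that can be adjusted while a project is running.

        from dataclasses import dataclass

        @dataclass
        class TaskCategory:
            name: str
            cores: int
            memory_mb: int
            disk_mb: int
            max_running: int          # cap on concurrently running tasks
            runtime_limit_s: int      # voluntary termination so partial work is saved

        categories = {
            "simulation": TaskCategory("simulation", cores=4, memory_mb=8000,
                                       disk_mb=20000, max_running=500,
                                       runtime_limit_s=6 * 3600),
            "merge":      TaskCategory("merge", cores=1, memory_mb=2000,
                                       disk_mb=10000, max_running=50,
                                       runtime_limit_s=3600),
        }

        # adjust a category mid-project if the initial estimate was too low
        categories["simulation"].memory_mb = 12000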

  18. Medication Management: The Macrocognitive Workflow of Older Adults With Heart Failure

    PubMed Central

    2016-01-01

    Background Older adults with chronic disease struggle to manage complex medication regimens. Health information technology has the potential to improve medication management, but only if it is based on a thorough understanding of the complexity of medication management workflow as it occurs in natural settings. Prior research reveals that patient work related to medication management is complex, cognitive, and collaborative. Macrocognitive processes are theorized as how people individually and collaboratively think in complex, adaptive, and messy nonlaboratory settings supported by artifacts. Objective The objective of this research was to describe and analyze the work of medication management by older adults with heart failure, using a macrocognitive workflow framework. Methods We interviewed and observed 61 older patients along with 30 informal caregivers about self-care practices including medication management. Descriptive qualitative content analysis methods were used to develop categories, subcategories, and themes about macrocognitive processes used in medication management workflow. Results We identified 5 high-level macrocognitive processes affecting medication management—sensemaking, planning, coordination, monitoring, and decision making—and 15 subprocesses. Data revealed workflow as occurring in a highly collaborative, fragile system of interacting people, artifacts, time, and space. Process breakdowns were common and patients had little support for macrocognitive workflow from current tools. Conclusions Macrocognitive processes affected medication management performance. Describing and analyzing this performance produced recommendations for technology supporting collaboration and sensemaking, decision making and problem detection, and planning and implementation. PMID:27733331

  19. Medication Management: The Macrocognitive Workflow of Older Adults With Heart Failure.

    PubMed

    Mickelson, Robin S; Unertl, Kim M; Holden, Richard J

    2016-10-12

    Older adults with chronic disease struggle to manage complex medication regimens. Health information technology has the potential to improve medication management, but only if it is based on a thorough understanding of the complexity of medication management workflow as it occurs in natural settings. Prior research reveals that patient work related to medication management is complex, cognitive, and collaborative. Macrocognitive processes are theorized as how people individually and collaboratively think in complex, adaptive, and messy nonlaboratory settings supported by artifacts. The objective of this research was to describe and analyze the work of medication management by older adults with heart failure, using a macrocognitive workflow framework. We interviewed and observed 61 older patients along with 30 informal caregivers about self-care practices including medication management. Descriptive qualitative content analysis methods were used to develop categories, subcategories, and themes about macrocognitive processes used in medication management workflow. We identified 5 high-level macrocognitive processes affecting medication management (sensemaking, planning, coordination, monitoring, and decision making) and 15 subprocesses. Data revealed workflow as occurring in a highly collaborative, fragile system of interacting people, artifacts, time, and space. Process breakdowns were common and patients had little support for macrocognitive workflow from current tools. Macrocognitive processes affected medication management performance. Describing and analyzing this performance produced recommendations for technology supporting collaboration and sensemaking, decision making and problem detection, and planning and implementation.

  20. Primary care physicians' perspectives on computer-based health risk assessment tools for chronic diseases: a mixed methods study.

    PubMed

    Voruganti, Teja R; O'Brien, Mary Ann; Straus, Sharon E; McLaughlin, John R; Grunfeld, Eva

    2015-09-24

    Health risk assessment tools compute an individual's risk of developing a disease. Routine use of such tools by primary care physicians (PCPs) is potentially useful in chronic disease prevention. We sought to understand physicians' awareness and perceptions of the usefulness, usability and feasibility of performing assessments with computer-based risk assessment tools in primary care settings. Focus groups and usability testing with a computer-based risk assessment tool were conducted with PCPs from both university-affiliated and community-based practices. Analysis was derived from grounded theory methodology. PCPs (n = 30) were aware of several risk assessment tools although only select tools were used routinely. The decision to use a tool depended on how use impacted practice workflow and whether the tool had credibility. Participants felt that embedding tools in the electronic medical record (EMR) system might allow health information from the medical record to auto-populate the tool. User comprehension of risk could also be improved with computer-based interfaces that present risk in different formats. In this study, PCPs chose to use certain tools more regularly because of usability and credibility. Despite differences in the particular tools each clinical practice used, there was general appreciation for the usefulness of tools for different clinical situations. Participants characterised particular features of an ideal tool, feeling strongly that embedding risk assessment tools in the EMR would maximise accessibility and use of the tool for chronic disease management. However, appropriate practice workflow integration and features that facilitate patient understanding at point-of-care are also essential.

  1. xGDBvm: A Web GUI-Driven Workflow for Annotating Eukaryotic Genomes in the Cloud

    PubMed Central

    Merchant, Nirav

    2016-01-01

    Genome-wide annotation of gene structure requires the integration of numerous computational steps. Currently, annotation is arguably best accomplished through collaboration of bioinformatics and domain experts, with broad community involvement. However, such a collaborative approach is not scalable at today’s pace of sequence generation. To address this problem, we developed the xGDBvm software, which uses an intuitive graphical user interface to access a number of common genome analysis and gene structure tools, preconfigured in a self-contained virtual machine image. Once their virtual machine instance is deployed through iPlant’s Atmosphere cloud services, users access the xGDBvm workflow via a unified Web interface to manage inputs, set program parameters, configure links to high-performance computing (HPC) resources, view and manage output, apply analysis and editing tools, or access contextual help. The xGDBvm workflow will mask the genome, compute spliced alignments from transcript and/or protein inputs (locally or on a remote HPC cluster), predict gene structures and gene structure quality, and display output in a public or private genome browser complete with accessory tools. Problematic gene predictions are flagged and can be reannotated using the integrated yrGATE annotation tool. xGDBvm can also be configured to append or replace existing data or load precomputed data. Multiple genomes can be annotated and displayed, and outputs can be archived for sharing or backup. xGDBvm can be adapted to a variety of use cases including de novo genome annotation, reannotation, comparison of different annotations, and training or teaching. PMID:27020957

  2. Wireless remote control of clinical image workflow: using a PDA for off-site distribution and disaster recovery.

    PubMed

    Documet, Jorge; Liu, Brent J; Documet, Luis; Huang, H K

    2006-07-01

    This paper describes a picture archiving and communication system (PACS) tool based on Web technology that remotely manages medical images between a PACS archive and remote destinations. Successfully implemented in a clinical environment and also demonstrated for the past 3 years at the conferences of various organizations, including the Radiological Society of North America, this tool provides a very practical and simple way to manage a PACS, including off-site image distribution and disaster recovery. The application is robust and flexible and can be used on a standard PC workstation or a Tablet PC, but more important, it can be used with a personal digital assistant (PDA). With a PDA, the Web application becomes a powerful wireless and mobile image management tool. The application's quick and easy-to-use features allow users to perform Digital Imaging and Communications in Medicine (DICOM) queries and retrievals with a single interface, without having to worry about the underlying configuration of DICOM nodes. In addition, this frees up dedicated PACS workstations to perform their specialized roles within the PACS workflow. This tool has been used at Saint John's Health Center in Santa Monica, California, for 2 years. The average number of queries per month is 2,021, with 816 C-MOVE retrieve requests. Clinical staff members can use PDAs to manage image workflow and PACS examination distribution conveniently for off-site consultations by referring physicians and radiologists and for disaster recovery. This solution also improves radiologists' effectiveness and efficiency in health care delivery both within radiology departments and for off-site clinical coverage.

  3. xGDBvm: A Web GUI-Driven Workflow for Annotating Eukaryotic Genomes in the Cloud.

    PubMed

    Duvick, Jon; Standage, Daniel S; Merchant, Nirav; Brendel, Volker P

    2016-04-01

    Genome-wide annotation of gene structure requires the integration of numerous computational steps. Currently, annotation is arguably best accomplished through collaboration of bioinformatics and domain experts, with broad community involvement. However, such a collaborative approach is not scalable at today's pace of sequence generation. To address this problem, we developed the xGDBvm software, which uses an intuitive graphical user interface to access a number of common genome analysis and gene structure tools, preconfigured in a self-contained virtual machine image. Once their virtual machine instance is deployed through iPlant's Atmosphere cloud services, users access the xGDBvm workflow via a unified Web interface to manage inputs, set program parameters, configure links to high-performance computing (HPC) resources, view and manage output, apply analysis and editing tools, or access contextual help. The xGDBvm workflow will mask the genome, compute spliced alignments from transcript and/or protein inputs (locally or on a remote HPC cluster), predict gene structures and gene structure quality, and display output in a public or private genome browser complete with accessory tools. Problematic gene predictions are flagged and can be reannotated using the integrated yrGATE annotation tool. xGDBvm can also be configured to append or replace existing data or load precomputed data. Multiple genomes can be annotated and displayed, and outputs can be archived for sharing or backup. xGDBvm can be adapted to a variety of use cases including de novo genome annotation, reannotation, comparison of different annotations, and training or teaching. © 2016 American Society of Plant Biologists. All rights reserved.

  4. REDLetr: Workflow and tools to support the migration of legacy clinical data capture systems to REDCap.

    PubMed

    Dunn, William D; Cobb, Jake; Levey, Allan I; Gutman, David A

    2016-09-01

    A memory clinic at an academic medical center has relied on several ad hoc data capture systems, including Microsoft Access and Excel, for cognitive assessments over the last several years. However, these solutions are challenging to maintain and limit the potential for hypothesis-driven or longitudinal research. REDCap, a secure web application based on PHP and MySQL, is a practical solution for improving data capture and organization. Here, we present a workflow and toolset to facilitate legacy data migration and real-time clinical research data collection into REDCap, as well as the challenges encountered. Legacy data consisted of neuropsychological tests stored in over 4000 Excel workbooks. Functions for data extraction, norm scoring, conversion to REDCap-compatible formats, access to the REDCap API, and clinical report generation were developed and executed in Python. Over 400 unique data points for each workbook were migrated and integrated into our REDCap database. Moving forward, our REDCap-based system replaces the Excel-based data collection method and eases integration into the standard clinical research workflow and the Electronic Health Record. In the age of growing data, efficient organization and storage of clinical and research data are critical for advancing research and providing efficient patient care. We believe that the workflow and tools described in this work to promote legacy data integration and real-time data collection into REDCap ultimately facilitate these goals. Published by Elsevier Ireland Ltd.
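
    A sketch of pushing extracted records into REDCap through its web API, in the spirit of the workflow described above. The URL, token and field names below are placeholders, not the authors' actual code; the request parameters follow REDCap's documented record-import convention.

        import json
        import requests

        REDCAP_URL = "https://redcap.example.edu/api/"   # hypothetical endpoint
        API_TOKEN = "REPLACE_WITH_PROJECT_TOKEN"

        # records extracted from a legacy Excel workbook (illustrative fields)
        records = [{"record_id": "1001", "moca_total": "24", "test_date": "2016-01-15"}]

        payload = {
            "token": API_TOKEN,
            "content": "record",
            "format": "json",
            "type": "flat",
            "data": json.dumps(records),
        }
        response = requests.post(REDCAP_URL, data=payload)
        print(response.status_code, response.text)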

  5. Integration of SimSET photon history generator in GATE for efficient Monte Carlo simulations of pinhole SPECT.

    PubMed

    Chen, Chia-Lin; Wang, Yuchuan; Lee, Jason J S; Tsui, Benjamin M W

    2008-07-01

    The authors developed and validated an efficient Monte Carlo simulation (MCS) workflow to facilitate small animal pinhole SPECT imaging research. This workflow seamlessly integrates two existing MCS tools: simulation system for emission tomography (SimSET) and GEANT4 application for emission tomography (GATE). Specifically, we retained the strength of GATE in describing complex collimator/detector configurations to meet the anticipated needs for studying advanced pinhole collimation (e.g., multipinhole) geometry, while inserting the fast SimSET photon history generator (PHG) to circumvent the relatively slow GEANT4 MCS code used by GATE in simulating photon interactions inside voxelized phantoms. For validation, data generated from this new SimSET-GATE workflow were compared with those from GATE-only simulations as well as experimental measurements obtained using a commercial small animal pinhole SPECT system. Our results showed excellent agreement (e.g., in system point response functions and energy spectra) between SimSET-GATE and GATE-only simulations, and, more importantly, a significant computational speedup (up to approximately 10-fold) provided by the new workflow. Satisfactory agreement between MCS results and experimental data were also observed. In conclusion, the authors have successfully integrated SimSET photon history generator in GATE for fast and realistic pinhole SPECT simulations, which can facilitate research in, for example, the development and application of quantitative pinhole and multipinhole SPECT for small animal imaging. This integrated simulation tool can also be adapted for studying other preclinical and clinical SPECT techniques.

  6. Grid-based platform for training in Earth Observation

    NASA Astrophysics Data System (ADS)

    Petcu, Dana; Zaharie, Daniela; Panica, Silviu; Frincu, Marc; Neagul, Marian; Gorgan, Dorian; Stefanut, Teodor

    2010-05-01

    The GiSHEO platform [1], which provides on-demand services for training and higher education in Earth Observation, has been developed within an ESA-funded project under the PECS programme to meet the need for powerful educational resources in the remote sensing field. It is intended as a Grid-based platform whose potential for experimentation and extensibility are the key benefits compared with a desktop software solution. Near-real-time applications requiring multiple simultaneous, short-response-time, data-intensive tasks, as in the case of a short training event, have proved to be ideal for this platform. The platform is based on Globus Toolkit 4 facilities for security and process management, and on the clusters of the four academic institutions involved in the project. Authorization uses a VOMS service. The main public services are the following: the EO processing services (represented through special WSRF-type services); the workflow service exposing a particular workflow engine; the data indexing and discovery service for accessing the data management mechanisms; and the processing services, a collection allowing easy access to the processing platform. The WSRF-type services for basic satellite image processing reuse free image processing tools, OpenCV and GDAL. New algorithms and workflows were developed to tackle challenging problems such as detecting the underground remains of old fortifications, walls or houses. More details can be found in [2].

    Composed services can be specified through workflows and are easy to deploy. The workflow engine, OSyRIS (Orchestration System using a Rule based Inference Solution), is based on DROOLS, and a new rule-based workflow language, SILK (SImple Language for worKflow), has been built. Workflows in SILK can be created with or without a visual design tool. The basics of SILK are tasks and the relations (rules) between them. It is similar to the SCUFL language but does not rely on XML, in order to allow the introduction of more workflow-specific features. Moreover, an event-condition-action (ECA) approach allows greater flexibility when expressing data and task dependencies, as well as the creation of adaptive workflows that can react to changes in the configuration of the Grid or in the workflow itself (a minimal illustration of the ECA idea is sketched after this record). Changes inside the Grid are handled by creating specific rules which allow resource selection based on various task scheduling criteria. Modifications of the workflow are usually accomplished either by inserting or retracting rules at runtime or by changing the executor of a task when a better one is found; the former changes the workflow's structure, while the latter does not necessarily change the resource but rather the algorithm used to solve the task. More details can be found in [3].

    Another important platform component is the data indexing and storage service, GDIS, which provides features for storing data, indexing it using a specialized RDBMS, finding data by various criteria, querying external services, and keeping track of temporary data generated by other components. The data storage component of GDIS stores data using available storage backends such as local disk file systems (ext3), local cluster storage (GFS) or distributed file systems (HDFS). A front-end GridFTP service interacts with the storage domains on behalf of the clients in a uniform way and also enforces the security restrictions, related to data access, provided by other specialized services. Data indexing is performed by PostGIS. An advanced and flexible interface for searching the project's geographical repository is built around a custom query language (LLQL - Lisp Like Query Language) designed to provide fine-grained access to the data in the repository and to query external services (e.g. for exploiting the connection with the GENESI-DR catalog). More details can be found in [4].

    The Workload Management System (WMS) provides two types of resource managers: the first is based on Condor HTC and uses Condor as a job manager for task dispatching and working nodes (for development purposes), while the second uses GT4 GRAM (for production purposes). The WMS main component, the Grid Task Dispatcher (GTD), is responsible for the interaction with other internal services, such as the composition engine, in order to facilitate access to the processing platform. Its main responsibilities are to receive tasks from the workflow engine or directly from the user interface, to use a task description language (the ClassAd meta-language in the case of Condor HTC) for job units, to submit and check the status of jobs inside the workload management system, and to retrieve job logs for debugging purposes. More details can be found in [4].

    A particular component of the platform is eGLE, the eLearning environment. It provides the functionality necessary to create the visual appearance of lessons through visual containers such as tools, patterns and templates. Teachers use the platform to test already created lessons as well as to develop new lesson resources, such as new images and workflows describing graph-based processing. Students execute the lessons or describe and experiment with new workflows or different data. The eGLE database includes several workflow-based lesson descriptions, teaching materials and lesson resources, and selected satellite and spatial data. More details can be found in [5].

    A first training event using the platform was organized in September 2009 during the 11th SYNASC symposium (links to the demos, testing interface and exercises are available on the project site [1]). The eGLE component was presented at the 4th GPC conference in May 2009, and the functionality of the platform will be demonstrated in April 2010 at the 5th EGEE User Forum. References: [1] GiSHEO consortium, Project site, http://gisheo.info.uvt.ro. [2] D. Petcu, D. Zaharie, M. Neagul, S. Panica, M. Frincu, D. Gorgan, T. Stefanut, V. Bacu, Remote Sensed Image Processing on Grids for Training in Earth Observation. In Image Processing, V. Kordic (ed.), In-Tech, January 2010. [3] M. Neagul, S. Panica, D. Petcu, D. Zaharie, D. Gorgan, Web and Grid Services for Training in Earth Observation, IDAACS 2009, IEEE Computer Press, 241-246. [4] M. Frincu, S. Panica, M. Neagul, D. Petcu, GiSHEO: On Demand Grid Service Based Platform for EO Data Processing, HiperGrid 2009, Politehnica Press, 415-422. [5] D. Gorgan, T. Stefanut, V. Bacu, Grid Based Training Environment for Earth Observation, GPC 2009, LNCS 5529, 98-109.
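
    The sketch below is not the SILK/OSyRIS syntax; it is a minimal event-condition-action loop in Python, shown only to illustrate how a rule-based workflow engine can react to events such as a resource disappearing or a task finishing.

        rules = [
            {   # re-schedule tasks whose executor disappeared from the grid
                "event": "resource_down",
                "condition": lambda ev, ctx: ev["resource"] in ctx["assignments"].values(),
                "action": lambda ev, ctx: ctx["pending"].extend(
                    t for t, r in ctx["assignments"].items() if r == ev["resource"]),
            },
            {   # start dependent tasks as soon as their input task has finished
                "event": "task_finished",
                "condition": lambda ev, ctx: ev["task"] in ctx["dependencies"],
                "action": lambda ev, ctx: ctx["pending"].extend(
                    ctx["dependencies"][ev["task"]]),
            },
        ]

        def handle(event, context):
            for rule in rules:
                if rule["event"] == event["type"] and rule["condition"](event, context):
                    rule["action"](event, context)

        context = {
            "assignments": {"ndvi": "cluster-A"},
            "dependencies": {"ndvi": ["classification"]},
            "pending": [],
        }
        handle({"type": "task_finished", "task": "ndvi"}, context)
        print(context["pending"])   # ['classification']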

  7. A framework for streamlining research workflow in neuroscience and psychology

    PubMed Central

    Kubilius, Jonas

    2014-01-01

    Successful accumulation of knowledge is critically dependent on the ability to verify and replicate every part of scientific conduct. However, such principles are difficult to enact when researchers continue to resort to ad hoc workflows and poorly maintained code bases. In this paper I examine the needs of the neuroscience and psychology community, and introduce psychopy_ext, a unifying framework that seamlessly integrates popular experiment-building, analysis and manuscript-preparation tools by choosing reasonable defaults and implementing relatively rigid workflow patterns. This structure allows for the automation of multiple tasks, such as generation of user interfaces, unit testing, control analyses of stimuli, single-command access to descriptive statistics, and publication-quality plotting. Taken together, psychopy_ext opens an exciting possibility for faster, more robust code development and collaboration for researchers. PMID:24478691

  8. EDAM: an ontology of bioinformatics operations, types of data and identifiers, topics and formats

    PubMed Central

    Ison, Jon; Kalaš, Matúš; Jonassen, Inge; Bolser, Dan; Uludag, Mahmut; McWilliam, Hamish; Malone, James; Lopez, Rodrigo; Pettifer, Steve; Rice, Peter

    2013-01-01

    Motivation: Advancing the search, publication and integration of bioinformatics tools and resources demands consistent machine-understandable descriptions. A comprehensive ontology allowing such descriptions is therefore required. Results: EDAM is an ontology of bioinformatics operations (tool or workflow functions), types of data and identifiers, application domains and data formats. EDAM supports semantic annotation of diverse entities such as Web services, databases, programmatic libraries, standalone tools, interactive applications, data schemas, datasets and publications within bioinformatics. EDAM applies to organizing and finding suitable tools and data and to automating their integration into complex applications or workflows. It includes over 2200 defined concepts and has successfully been used for annotations and implementations. Availability: The latest stable version of EDAM is available in OWL format from http://edamontology.org/EDAM.owl and in OBO format from http://edamontology.org/EDAM.obo. It can be viewed online at the NCBO BioPortal and the EBI Ontology Lookup Service. For documentation and license please refer to http://edamontology.org. This article describes version 1.2 available at http://edamontology.org/EDAM_1.2.owl. Contact: jison@ebi.ac.uk PMID:23479348
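
    A small usage sketch (assuming the rdflib package and network access; this is generic RDF handling, not an official EDAM client): loading the published OWL file referenced above and listing a few concept labels.

        from rdflib import Graph
        from rdflib.namespace import RDFS

        g = Graph()
        # URL taken from the record above; format is RDF/XML
        g.parse("http://edamontology.org/EDAM_1.2.owl", format="xml")

        labels = sorted(str(label) for _, label in g.subject_objects(RDFS.label))
        print(len(labels), "labelled concepts, e.g.:", labels[:5])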

  9. A Portable Regional Weather and Climate Downscaling System Using GEOS-5, LIS-6, WRF, and the NASA Workflow Tool

    NASA Astrophysics Data System (ADS)

    Kemp, E. M.; Putman, W. M.; Gurganus, J.; Burns, R. W.; Damon, M. R.; McConaughy, G. R.; Seablom, M. S.; Wojcik, G. S.

    2009-12-01

    We present a regional downscaling system (RDS) suitable for high-resolution weather and climate simulations in multiple supercomputing environments. The RDS is built on the NASA Workflow Tool, a software framework for configuring, running, and managing computer models on multiple platforms with a graphical user interface. The Workflow Tool is used to run the NASA Goddard Earth Observing System Model Version 5 (GEOS-5), a global atmospheric-ocean model for weather and climate simulations down to 1/4 degree resolution; the NASA Land Information System Version 6 (LIS-6), a land surface modeling system that can simulate soil temperature and moisture profiles; and the Weather Research and Forecasting (WRF) community model, a limited-area atmospheric model for weather and climate simulations down to 1-km resolution. The Workflow Tool allows users to customize model settings to user needs; saves and organizes simulation experiments; distributes model runs across different computer clusters (e.g., the DISCOVER cluster at Goddard Space Flight Center, the Cray CX-1 Desktop Supercomputer, etc.); and handles all file transfers and network communications (e.g., scp connections). Together, the RDS is intended to aid researchers by making simulations as easy as possible to generate on the computer resources available. Initial conditions for LIS-6 and GEOS-5 are provided by Modern Era Retrospective-Analysis for Research and Applications (MERRA) reanalysis data stored on DISCOVER. The LIS-6 is first run for 2-4 years forced by MERRA atmospheric analyses, generating initial conditions for the WRF soil physics. GEOS-5 is then initialized from MERRA data and run for the period of interest. Large-scale atmospheric data, sea-surface temperatures, and sea ice coverage from GEOS-5 are used as boundary conditions for WRF, which is run for the same period of interest. Multiply nested grids are used for both LIS-6 and WRF, with the innermost grid run at a resolution sufficient for typical local weather features (terrain, convection, etc.). All model runs, restarts, and file transfers are coordinated by the Workflow Tool. Two use cases are being pursued. First, the RDS generates regional climate simulations down to 4-km for the Chesapeake Bay region, with WRF output provided as input to more specialized models (e.g., ocean/lake, hydrological, marine biology, and air pollution). This will allow assessment of climate impact on local interests (e.g., changes in Bay water levels and temperatures, inundation, fish kills, etc.). Second, the RDS generates high-resolution hurricane simulations in the tropical North Atlantic. This use case will support Observing System Simulation Experiments (OSSEs) of dynamically-targeted lidar observations as part of the NASA Sensor Web Simulator project. Sample results will be presented at the AGU Fall Meeting.

  10. A virtual data language and system for scientific workflow management in data grid environments

    NASA Astrophysics Data System (ADS)

    Zhao, Yong

    With advances in scientific instrumentation and simulation, scientific data is growing fast in both size and analysis complexity. So-called Data Grids aim to provide high performance, distributed data analysis infrastructure for data-intensive sciences, where scientists distributed worldwide need to extract information from large collections of data, and to share both data products and the resources needed to produce and store them. However, the description, composition, and execution of even logically simple scientific workflows are often complicated by the need to deal with "messy" issues like heterogeneous storage formats and ad-hoc file system structures. We show how these difficulties can be overcome via a typed workflow notation called virtual data language, within which issues of physical representation are cleanly separated from logical typing, and by the implementation of this notation within the context of a powerful virtual data system that supports distributed execution. The resulting language and system are capable of expressing complex workflows in a simple compact form, enacting those workflows in distributed environments, monitoring and recording the execution processes, and tracing the derivation history of data products. We describe the motivation, design, implementation, and evaluation of the virtual data language and system, and the application of the virtual data paradigm in various science disciplines, including astronomy, cognitive neuroscience.

  11. Modeling Complex Workflow in Molecular Diagnostics

    PubMed Central

    Gomah, Mohamed E.; Turley, James P.; Lu, Huimin; Jones, Dan

    2010-01-01

    One of the hurdles to achieving personalized medicine has been implementing the laboratory processes for performing and reporting complex molecular tests. The rapidly changing test rosters and complex analysis platforms in molecular diagnostics have meant that many clinical laboratories still use labor-intensive manual processing and testing without the level of automation seen in high-volume chemistry and hematology testing. We provide here a discussion of design requirements and the results of implementation of a suite of lab management tools that incorporate the many elements required for use of molecular diagnostics in personalized medicine, particularly in cancer. These applications provide the functionality required for sample accessioning and tracking, material generation, and testing that are particular to the evolving needs of individualized molecular diagnostics. On implementation, the applications described here resulted in improvements in the turn-around time for reporting of more complex molecular test sets, and significant changes in the workflow. Therefore, careful mapping of workflow can permit design of software applications that simplify even the complex demands of specialized molecular testing. By incorporating design features for order review, software tools can permit a more personalized approach to sample handling and test selection without compromising efficiency. PMID:20007844

  12. Workflow-Based Software Development Environment

    NASA Technical Reports Server (NTRS)

    Izygon, Michel E.

    2013-01-01

    The Software Developer's Assistant (SDA) helps software teams more efficiently and accurately conduct or execute software processes associated with NASA mission-critical software. SDA is a process enactment platform that guides software teams through project-specific standards, processes, and procedures. Software projects are decomposed into all of their required process steps or tasks, and each task is assigned to project personnel. SDA orchestrates the performance of work required to complete all process tasks in the correct sequence. The software then notifies team members when they may begin work on their assigned tasks and provides the tools, instructions, reference materials, and supportive artifacts that allow users to compliantly perform the work. A combination of technology components captures and enacts any software process used to support the software lifecycle. It creates an adaptive workflow environment that can be modified as needed. SDA achieves software process automation through a Business Process Management (BPM) approach to managing the software lifecycle for mission-critical projects. It contains five main parts: TieFlow (workflow engine), Business Rules (rules to alter process flow), Common Repository (storage for project artifacts, versions, history, schedules, etc.), SOA (interface to allow internal, GFE, or COTS tools integration), and the Web Portal Interface (collaborative web environment).

  13. The CARMEN software as a service infrastructure.

    PubMed

    Weeks, Michael; Jessop, Mark; Fletcher, Martyn; Hodge, Victoria; Jackson, Tom; Austin, Jim

    2013-01-28

    The CARMEN platform allows neuroscientists to share data, metadata, services and workflows, and to execute these services and workflows remotely via a Web portal. This paper describes how we implemented a service-based infrastructure into the CARMEN Virtual Laboratory. A Software as a Service framework was developed to allow generic new and legacy code to be deployed as services on a heterogeneous execution framework. Users can submit analysis code typically written in Matlab, Python, C/C++ and R as non-interactive standalone command-line applications and wrap them as services in a form suitable for deployment on the platform. The CARMEN Service Builder tool enables neuroscientists to quickly wrap their analysis software for deployment to the CARMEN platform, as a service without knowledge of the service framework or the CARMEN system. A metadata schema describes each service in terms of both system and user requirements. The search functionality allows services to be quickly discovered from the many services available. Within the platform, services may be combined into more complicated analyses using the workflow tool. CARMEN and the service infrastructure are targeted towards the neuroscience community; however, it is a generic platform, and can be targeted towards any discipline.
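
    The following is not the CARMEN Service Builder itself, only a generic Python illustration of the wrapping idea described above: exposing a non-interactive command-line analysis as a callable that takes an input file and returns an output file.

        import subprocess
        import tempfile
        from pathlib import Path

        def run_analysis(executable, input_path, extra_args=()):
            """Run a standalone command-line tool and return its output file."""
            out_dir = Path(tempfile.mkdtemp(prefix="analysis_job_"))
            out_path = out_dir / "result.dat"
            cmd = [executable, str(input_path), str(out_path), *extra_args]
            completed = subprocess.run(cmd, capture_output=True, text=True)
            if completed.returncode != 0:
                raise RuntimeError(f"analysis failed: {completed.stderr}")
            return out_path

        # e.g. run_analysis("spike_sorter", "recording_01.mat", ["--threshold", "4.5"])
        # (tool name and arguments are hypothetical)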

  14. Chang'E-3 data pre-processing system based on scientific workflow

    NASA Astrophysics Data System (ADS)

    tan, xu; liu, jianjun; wang, yuanyuan; yan, wei; zhang, xiaoxia; li, chunlai

    2016-04-01

    The Chang'E-3 (CE3) mission has obtained a huge amount of lunar scientific data. Data pre-processing is an important segment of the CE3 ground research and application system. With a dramatic increase in the demand for data research and application, the Chang'E-3 data pre-processing system (CEDPS), based on scientific workflow, is proposed with the purpose of making scientists more flexible and productive by automating data-driven processing. The system should allow the planning, conduct and control of the data processing procedure with the following capabilities: describing a data processing task, including 1) defining input and output data, 2) defining the data relationships, 3) defining the sequence of tasks, 4) defining the communication between tasks, 5) defining mathematical formulas, and 6) defining the relationship between tasks and data; and automatic processing of tasks. Accordingly, describing a task is the key to the system's flexibility. We designed a workflow designer, a visual environment for capturing processes as workflows, and discuss its three-level model: 1) the data relationships are established through a product tree; 2) the process model is constructed based on a directed acyclic graph (DAG), where a set of workflow constructs, including Sequence, Loop, Merge and Fork, can be composed with one another; 3) to reduce the modeling complexity of mathematical formulas in the DAG, semantic modeling based on MathML is adopted. On top of that, we present how the CE3 data are processed with CEDPS.
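
    Illustrative only (not the CEDPS implementation): a pre-processing pipeline expressed as a DAG of task dependencies, with an execution order derived from the standard-library topological sorter.

        # Requires Python 3.9+ for graphlib in the standard library.
        from graphlib import TopologicalSorter

        # hypothetical task names; each task maps to the set of tasks it depends on
        pipeline = {
            "read_raw":        set(),
            "radiometric_cal": {"read_raw"},
            "geometric_cal":   {"read_raw"},
            "mosaic":          {"radiometric_cal", "geometric_cal"},
            "archive_product": {"mosaic"},
        }

        order = list(TopologicalSorter(pipeline).static_order())
        print(order)   # a valid execution sequence respecting the dependencies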

  15. jORCA: easily integrating bioinformatics Web Services.

    PubMed

    Martín-Requena, Victoria; Ríos, Javier; García, Maximiliano; Ramírez, Sergio; Trelles, Oswaldo

    2010-02-15

    Web services technology is becoming the option of choice for deploying bioinformatics tools that are universally available. One of the major strengths of this approach is that it supports machine-to-machine interoperability over a network. However, a weakness of this approach is that various Web Services differ in their definition and invocation protocols, as well as their communication and data formats, and this presents a barrier to service interoperability. jORCA is a desktop client aimed at facilitating seamless integration of Web Services. It does so by making a uniform representation of the different web resources, supporting scalable service discovery, and automatic composition of workflows. Usability is at the top of the jORCA agenda; thus, it is a highly customizable and extensible application that accommodates a broad range of user skills, featuring double-click invocation of services in conjunction with advanced execution control, on-the-fly data standardization, extensibility of viewer plug-ins, drag-and-drop editing capabilities, plus a file-based browsing style and organization of favourite tools. The integration of bioinformatics Web Services is thereby made easier, supporting a wider range of users.

  16. 3D correlative light and electron microscopy of cultured cells using serial blockface scanning electron microscopy

    PubMed Central

    Lerner, Thomas R.; Burden, Jemima J.; Nkwe, David O.; Pelchen-Matthews, Annegret; Domart, Marie-Charlotte; Durgan, Joanne; Weston, Anne; Jones, Martin L.; Peddie, Christopher J.; Carzaniga, Raffaella; Florey, Oliver; Marsh, Mark; Gutierrez, Maximiliano G.

    2017-01-01

    The processes of life take place in multiple dimensions, but imaging these processes in even three dimensions is challenging. Here, we describe a workflow for 3D correlative light and electron microscopy (CLEM) of cell monolayers using fluorescence microscopy to identify and follow biological events, combined with serial blockface scanning electron microscopy to analyse the underlying ultrastructure. The workflow encompasses all steps from cell culture to sample processing, imaging strategy, and 3D image processing and analysis. We demonstrate successful application of the workflow to three studies, each aiming to better understand complex and dynamic biological processes, including bacterial and viral infections of cultured cells and formation of entotic cell-in-cell structures commonly observed in tumours. Our workflow revealed new insight into the replicative niche of Mycobacterium tuberculosis in primary human lymphatic endothelial cells, HIV-1 in human monocyte-derived macrophages, and the composition of the entotic vacuole. The broad application of this 3D CLEM technique will make it a useful addition to the correlative imaging toolbox for biomedical research. PMID:27445312

  17. Benchmarking quantitative label-free LC-MS data processing workflows using a complex spiked proteomic standard dataset.

    PubMed

    Ramus, Claire; Hovasse, Agnès; Marcellin, Marlène; Hesse, Anne-Marie; Mouton-Barbosa, Emmanuelle; Bouyssié, David; Vaca, Sebastian; Carapito, Christine; Chaoui, Karima; Bruley, Christophe; Garin, Jérôme; Cianférani, Sarah; Ferro, Myriam; Van Dorssaeler, Alain; Burlet-Schiltz, Odile; Schaeffer, Christine; Couté, Yohann; Gonzalez de Peredo, Anne

    2016-01-30

    Proteomic workflows based on nanoLC-MS/MS data-dependent-acquisition analysis have progressed tremendously in recent years. High-resolution and fast sequencing instruments have enabled the use of label-free quantitative methods, based either on spectral counting or on MS signal analysis, which appear as an attractive way to analyze differential protein expression in complex biological samples. However, the computational processing of the data for label-free quantification still remains a challenge. Here, we used a proteomic standard composed of an equimolar mixture of 48 human proteins (Sigma UPS1) spiked at different concentrations into a background of yeast cell lysate to benchmark several label-free quantitative workflows, involving different software packages developed in recent years. This experimental design allowed us to finely assess their performance in terms of sensitivity and false discovery rate, by measuring the numbers of true and false positives (UPS1 and yeast background proteins found as differential, respectively). The spiked standard dataset has been deposited to the ProteomeXchange repository with the identifier PXD001819 and can be used to benchmark other label-free workflows, adjust software parameter settings, improve algorithms for extraction of the quantitative metrics from raw MS data, or evaluate downstream statistical methods. Bioinformatic pipelines for label-free quantitative analysis must be objectively evaluated in their ability to detect variant proteins with good sensitivity and low false discovery rate in large-scale proteomic studies. This can be done through the use of complex spiked samples, for which the "ground truth" of variant proteins is known, allowing a statistical evaluation of the performance of the data processing workflow. We provide here such a controlled standard dataset and used it to evaluate the performances of several label-free bioinformatics tools (including MaxQuant, Skyline, MFPaQ, IRMa-hEIDI and Scaffold) in different workflows, for detection of variant proteins with different absolute expression levels and fold change values. The dataset presented here can be useful for tuning software tool parameters, and also testing new algorithms for label-free quantitative analysis, or for evaluation of downstream statistical methods. Copyright © 2015 Elsevier B.V. All rights reserved.
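
    A generic illustration of the evaluation logic described above: given the proteins a workflow calls "differential", count true positives (spiked UPS1 proteins) and false positives (yeast background) and derive sensitivity and the false discovery proportion. The protein identifiers below are made up for the example.

        spiked_ups1 = {"P00915ups", "P02768ups", "P62988ups", "P01133ups"}

        # proteins reported as differential by some label-free workflow
        called_differential = {"P00915ups", "P02768ups", "YGR192C", "P62988ups"}

        tp = len(called_differential & spiked_ups1)   # true positives
        fp = len(called_differential - spiked_ups1)   # false positives

        sensitivity = tp / len(spiked_ups1)
        fdr = fp / max(len(called_differential), 1)
        print(f"TP={tp}  FP={fp}  sensitivity={sensitivity:.2f}  FDR={fdr:.2f}")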

  18. Leveraging workflow control patterns in the domain of clinical practice guidelines.

    PubMed

    Kaiser, Katharina; Marcos, Mar

    2016-02-10

    Clinical practice guidelines (CPGs) include recommendations describing appropriate care for the management of patients with a specific clinical condition. A number of representation languages have been developed to support executable CPGs, with associated authoring/editing tools. Even with tool assistance, authoring of CPG models is a labor-intensive task. We aim at facilitating the early stages of the CPG modeling task. In this context, we propose to support the authoring of CPG models based on a set of suitable procedural patterns described in an implementation-independent notation that can then be semi-automatically transformed into one of the alternative executable CPG languages. We have started with the workflow control patterns which have been identified in the fields of workflow systems and business process management. We have analyzed the suitability of these patterns by means of a qualitative analysis of CPG texts. Following our analysis we have implemented a selection of workflow patterns in the Asbru and PROforma CPG languages. As the implementation-independent notation for the description of patterns we have chosen BPMN 2.0. Finally, we have developed XSLT transformations to convert the BPMN 2.0 version of the patterns into the Asbru and PROforma languages. We showed that although a significant number of workflow control patterns are suitable for describing CPG procedural knowledge, not all of them are applicable in the context of CPGs due to their focus on single-patient care. Moreover, CPGs may require additional patterns not included in the set of workflow control patterns. We also showed that nearly all the CPG-suitable patterns can be conveniently implemented in the Asbru and PROforma languages. Finally, we demonstrated that individual patterns can be semi-automatically transformed from a process specification in BPMN 2.0 to executable implementations in these languages. We propose a pattern- and transformation-based approach for the development of CPG models. Such an approach can form the basis of a valid framework for the authoring of CPG models. The identification of adequate patterns and the implementation of transformations to convert patterns from a process specification into different executable implementations are the first necessary steps for our approach.
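
    A schematic example of applying an XSLT transformation to a small BPMN 2.0 fragment (assuming the lxml package; the stylesheet and target format below are toys, not the authors' BPMN-to-Asbru/PROforma transformations):

        from lxml import etree

        bpmn = etree.fromstring("""
        <process xmlns="http://www.omg.org/spec/BPMN/20100524/MODEL" id="sequence">
          <task id="t1" name="Assess blood pressure"/>
          <task id="t2" name="Prescribe medication"/>
        </process>""")

        # toy stylesheet mapping a BPMN sequence of tasks onto ordered plan steps
        xslt = etree.fromstring("""
        <xsl:stylesheet version="1.0"
            xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
            xmlns:bpmn="http://www.omg.org/spec/BPMN/20100524/MODEL">
          <xsl:template match="/bpmn:process">
            <plan name="{@id}">
              <xsl:for-each select="bpmn:task">
                <step order="{position()}"><xsl:value-of select="@name"/></step>
              </xsl:for-each>
            </plan>
          </xsl:template>
        </xsl:stylesheet>""")

        transform = etree.XSLT(xslt)
        print(etree.tostring(transform(bpmn), pretty_print=True).decode())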

  19. From Peer-Reviewed to Peer-Reproduced in Scholarly Publishing: The Complementary Roles of Data Models and Workflows in Bioinformatics

    PubMed Central

    Zhao, Jun; Avila-Garcia, Maria Susana; Roos, Marco; Thompson, Mark; van der Horst, Eelke; Kaliyaperumal, Rajaram; Luo, Ruibang; Lee, Tin-Lap; Lam, Tak-wah; Edmunds, Scott C.; Sansone, Susanna-Assunta

    2015-01-01

    Motivation Reproducing the results from a scientific paper can be challenging due to the absence of data and the computational tools required for their analysis. In addition, details relating to the procedures used to obtain the published results can be difficult to discern due to the use of natural language when reporting how experiments have been performed. The Investigation/Study/Assay (ISA), Nanopublications (NP), and Research Objects (RO) models are conceptual data modelling frameworks that can structure such information from scientific papers. Computational workflow platforms can also be used to reproduce analyses of data in a principled manner. We assessed the extent to which ISA, NP, and RO models, together with the Galaxy workflow system, can capture the experimental processes and reproduce the findings of a previously published paper reporting on the development of SOAPdenovo2, a de novo genome assembler. Results Executable workflows were developed using Galaxy, which reproduced results that were consistent with the published findings. A structured representation of the information in the SOAPdenovo2 paper was produced by combining the use of ISA, NP, and RO models. By structuring the information in the published paper using these data and scientific workflow modelling frameworks, it was possible to explicitly declare elements of experimental design, variables, and findings. The models served as guides in the curation of scientific information and this led to the identification of inconsistencies in the original published paper, thereby allowing its authors to publish corrections in the form of an errata. Availability SOAPdenovo2 scripts, data, and results are available through the GigaScience Database: http://dx.doi.org/10.5524/100044; the workflows are available from GigaGalaxy: http://galaxy.cbiit.cuhk.edu.hk; and the representations using the ISA, NP, and RO models are available through the SOAPdenovo2 case study website http://isa-tools.github.io/soapdenovo2/. Contact: philippe.rocca-serra@oerc.ox.ac.uk and susanna-assunta.sansone@oerc.ox.ac.uk. PMID:26154165

  20. The 2nd DBCLS BioHackathon: interoperable bioinformatics Web services for integrated applications

    PubMed Central

    2011-01-01

    Background The interaction between biological researchers and the bioinformatics tools they use is still hampered by incomplete interoperability between such tools. To ensure interoperability initiatives are effectively deployed, end-user applications need to be aware of, and support, best practices and standards. Here, we report on an initiative in which software developers and genome biologists came together to explore and raise awareness of these issues: BioHackathon 2009. Results Developers in attendance came from diverse backgrounds, with experts in Web services, workflow tools, text mining and visualization. Genome biologists provided expertise and exemplar data from the domains of sequence and pathway analysis and glyco-informatics. One goal of the meeting was to evaluate the ability to address real world use cases in these domains using the tools that the developers represented. This resulted in i) a workflow to annotate 100,000 sequences from an invertebrate species; ii) an integrated system for analysis of the transcription factor binding sites (TFBSs) enriched based on differential gene expression data obtained from a microarray experiment; iii) a workflow to enumerate putative physical protein interactions among enzymes in a metabolic pathway using protein structure data; iv) a workflow to analyze glyco-gene-related diseases by searching for human homologs of glyco-genes in other species, such as fruit flies, and retrieving their phenotype-annotated SNPs. Conclusions Beyond deriving prototype solutions for each use-case, a second major purpose of the BioHackathon was to highlight areas of insufficiency. We discuss the issues raised by our exploration of the problem/solution space, concluding that there are still problems with the way Web services are modeled and annotated, including: i) the absence of several useful data or analysis functions in the Web service "space"; ii) the lack of documentation of methods; iii) lack of compliance with the SOAP/WSDL specification among and between various programming-language libraries; and iv) incompatibility between various bioinformatics data formats. Although it was still difficult to solve real world problems posed to the developers by the biological researchers in attendance because of these problems, we note the promise of addressing these issues within a semantic framework. PMID:21806842

  1. Towards seamless workflows in agile data science

    NASA Astrophysics Data System (ADS)

    Klump, J. F.; Robertson, J.

    2017-12-01

    Agile workflows are a response to projects with requirements that may change over time. They prioritise rapid and flexible responses to change, preferring to adapt to changes in requirements rather than predict them before a project starts. This suits the needs of research very well because research is inherently agile in its methodology. The adoption of agile methods has made collaborative data analysis much easier in a research environment fragmented across institutional data stores, HPC, personal and lab computers and more recently cloud environments. Agile workflows use tools that share a common worldview: in an agile environment, there may be more than one valid version of data, code or environment in play at any given time. All of these versions need references and identifiers. For example, a team of developers following the git-flow conventions (github.com/nvie/gitflow) may have several active branches, one for each strand of development. These workflows allow rapid and parallel iteration while maintaining identifiers pointing to individual snapshots of data and code and allowing rapid switching between strands. In contrast, the current focus of versioning in research data management is geared towards managing data for reproducibility and long-term preservation of the record of science. While both are important goals in the persistent curation domain of the institutional research data infrastructure, current tools emphasise planning over adaptation and can introduce unwanted rigidity by insisting on a single valid version or point of truth. In the collaborative curation domain of a research project, things are more fluid. However, there is no equivalent to the "versioning iso-surface" of the git protocol for the management and versioning of research data. At CSIRO we are developing concepts and tools for the agile management of software code and research data for virtual research environments, based on our experiences of actual data analytics projects in the geosciences. We use code management that allows researchers to interact with the code through tools like Jupyter Notebooks while data are held in an object store. Our aim is an architecture allowing seamless integration of code development, data management, and data processing in virtual research environments.
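    Editor's note: the record above argues that every analysis run should reference identifiable snapshots of both code and data. The Python sketch below illustrates one simple way to do that, pairing the current git commit hash with a content hash of a data file; the manifest format and file paths are assumptions for illustration, not CSIRO's actual tooling.

```python
# Minimal sketch: record which code snapshot and which data snapshot a run used.
import hashlib
import json
import subprocess
from datetime import datetime, timezone


def current_commit() -> str:
    """Return the git commit hash of the code currently checked out."""
    return subprocess.check_output(["git", "rev-parse", "HEAD"], text=True).strip()


def data_fingerprint(path: str) -> str:
    """Content-address a data file so a specific snapshot can be referenced."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_run(data_path: str, manifest: str = "run_manifest.json") -> dict:
    """Write a small manifest tying the run to code and data identifiers."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "code_commit": current_commit(),
        "data_sha256": data_fingerprint(data_path),
        "data_path": data_path,
    }
    with open(manifest, "w") as fh:
        json.dump(entry, fh, indent=2)
    return entry
```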

  2. The 2nd DBCLS BioHackathon: interoperable bioinformatics Web services for integrated applications.

    PubMed

    Katayama, Toshiaki; Wilkinson, Mark D; Vos, Rutger; Kawashima, Takeshi; Kawashima, Shuichi; Nakao, Mitsuteru; Yamamoto, Yasunori; Chun, Hong-Woo; Yamaguchi, Atsuko; Kawano, Shin; Aerts, Jan; Aoki-Kinoshita, Kiyoko F; Arakawa, Kazuharu; Aranda, Bruno; Bonnal, Raoul Jp; Fernández, José M; Fujisawa, Takatomo; Gordon, Paul Mk; Goto, Naohisa; Haider, Syed; Harris, Todd; Hatakeyama, Takashi; Ho, Isaac; Itoh, Masumi; Kasprzyk, Arek; Kido, Nobuhiro; Kim, Young-Joo; Kinjo, Akira R; Konishi, Fumikazu; Kovarskaya, Yulia; von Kuster, Greg; Labarga, Alberto; Limviphuvadh, Vachiranee; McCarthy, Luke; Nakamura, Yasukazu; Nam, Yunsun; Nishida, Kozo; Nishimura, Kunihiro; Nishizawa, Tatsuya; Ogishima, Soichi; Oinn, Tom; Okamoto, Shinobu; Okuda, Shujiro; Ono, Keiichiro; Oshita, Kazuki; Park, Keun-Joon; Putnam, Nicholas; Senger, Martin; Severin, Jessica; Shigemoto, Yasumasa; Sugawara, Hideaki; Taylor, James; Trelles, Oswaldo; Yamasaki, Chisato; Yamashita, Riu; Satoh, Noriyuki; Takagi, Toshihisa

    2011-08-02

    The interaction between biological researchers and the bioinformatics tools they use is still hampered by incomplete interoperability between such tools. To ensure interoperability initiatives are effectively deployed, end-user applications need to be aware of, and support, best practices and standards. Here, we report on an initiative in which software developers and genome biologists came together to explore and raise awareness of these issues: BioHackathon 2009. Developers in attendance came from diverse backgrounds, with experts in Web services, workflow tools, text mining and visualization. Genome biologists provided expertise and exemplar data from the domains of sequence and pathway analysis and glyco-informatics. One goal of the meeting was to evaluate the ability to address real world use cases in these domains using the tools that the developers represented. This resulted in i) a workflow to annotate 100,000 sequences from an invertebrate species; ii) an integrated system for analysis of the transcription factor binding sites (TFBSs) enriched based on differential gene expression data obtained from a microarray experiment; iii) a workflow to enumerate putative physical protein interactions among enzymes in a metabolic pathway using protein structure data; iv) a workflow to analyze glyco-gene-related diseases by searching for human homologs of glyco-genes in other species, such as fruit flies, and retrieving their phenotype-annotated SNPs. Beyond deriving prototype solutions for each use-case, a second major purpose of the BioHackathon was to highlight areas of insufficiency. We discuss the issues raised by our exploration of the problem/solution space, concluding that there are still problems with the way Web services are modeled and annotated, including: i) the absence of several useful data or analysis functions in the Web service "space"; ii) the lack of documentation of methods; iii) lack of compliance with the SOAP/WSDL specification among and between various programming-language libraries; and iv) incompatibility between various bioinformatics data formats. Although it was still difficult to solve real world problems posed to the developers by the biological researchers in attendance because of these problems, we note the promise of addressing these issues within a semantic framework.

  3. ResearchMatch: A National Registry to Recruit Volunteers for Clinical Research

    PubMed Central

    Harris, Paul A.; Scott, Kirstin W; Lebo, Laurie; Hassan, NikNik; Lighter, Chad; Pulley, Jill

    2013-01-01

    The authors designed ResearchMatch, a disease-neutral, web-based recruitment registry to help match individuals who wish to participate in clinical research studies with researchers actively searching for volunteers throughout the United States. In this article, they describe ResearchMatch’s stakeholders, workflow model, technical infrastructure, and, for the registry’s first 19 months of operation, utilization metrics. Having launched volunteer registration tools in November 2009 and researcher registration tools in March 2010, ResearchMatch had, as of June 2011, registered 15,871 volunteer participants from all 50 states. The registry was created as a collaborative project for institutions in the Clinical and Translational Science Awards (CTSA) consortium. Also as of June 2011, a total of 751 researchers from 61 participating CTSA institutions had registered to use the tool to recruit participants into 540 active studies and trials. ResearchMatch has proven successful in connecting volunteers with researchers, and the authors are currently evaluating regulatory and workflow options to open access to researchers at non-CTSA institutions. PMID:22104055

  4. Pervasive access to images and data--the use of computing grids and mobile/wireless devices across healthcare enterprises.

    PubMed

    Pohjonen, Hanna; Ross, Peeter; Blickman, Johan G; Kamman, Richard

    2007-01-01

    Emerging technologies are transforming the workflows in healthcare enterprises. Computing grids and handheld mobile/wireless devices are providing clinicians with enterprise-wide access to all patient data and analysis tools on a pervasive basis. In this paper, emerging technologies are presented that provide computing grids and streaming-based access to image and data management functions, and system architectures that enable pervasive computing on a cost-effective basis. Finally, the implications of such technologies are investigated regarding the positive impacts on clinical workflows.

  5. A high throughput geocomputing system for remote sensing quantitative retrieval and a case study

    NASA Astrophysics Data System (ADS)

    Xue, Yong; Chen, Ziqiang; Xu, Hui; Ai, Jianwen; Jiang, Shuzheng; Li, Yingjie; Wang, Ying; Guang, Jie; Mei, Linlu; Jiao, Xijuan; He, Xingwei; Hou, Tingting

    2011-12-01

    The quality and accuracy of remote sensing instruments have improved significantly; however, rapid processing of large-scale remote sensing data has become the bottleneck for remote sensing quantitative retrieval applications. Remote sensing quantitative retrieval is a data-intensive computation application and is one of the research issues of high-throughput computation. The remote sensing quantitative retrieval Grid workflow is a high-level core component of the remote sensing Grid, which is used to support the modeling, reconstruction and implementation of large-scale complex applications of remote sensing science. In this paper, we study middleware components of the remote sensing Grid - the dynamic Grid workflow based on the remote sensing quantitative retrieval application on a Grid platform. We designed a novel architecture for the remote sensing Grid workflow. According to this architecture, we constructed the Remote Sensing Information Service Grid Node (RSSN) with Condor. We developed a graphical user interface (GUI) tool to compose remote sensing processing Grid workflows, and took aerosol optical depth (AOD) retrieval as an example. The case study showed that significant improvement in system performance could be achieved with this implementation. The results also give a perspective on the potential of applying Grid workflow practices to remote sensing quantitative retrieval problems using commodity-class PCs.

  6. An architecture model for multiple disease management information systems.

    PubMed

    Chen, Lichin; Yu, Hui-Chu; Li, Hao-Chun; Wang, Yi-Van; Chen, Huang-Jen; Wang, I-Ching; Wang, Chiou-Shiang; Peng, Hui-Yu; Hsu, Yu-Ling; Chen, Chi-Huang; Chuang, Lee-Ming; Lee, Hung-Chang; Chung, Yufang; Lai, Feipei

    2013-04-01

    Disease management is a program which attempts to overcome the fragmentation of the healthcare system and improve the quality of care. Many studies have proven the effectiveness of disease management. However, case managers spend the majority of their time on documentation and on coordinating the members of the care team. They need a tool that supports their daily practice and optimizes the inefficient workflow. Several discussions have indicated that information technology plays an important role in the era of disease management. Although applications have been developed, it is inefficient to develop an information system for each disease management program individually. The aim of this research is to support the work of disease management, reform the inefficient workflow, and propose an architecture model that enhances the reusability and reduces the development time of information systems. The proposed architecture model was successfully implemented in two disease management information systems, and the result was evaluated through reusability analysis, time-consumed analysis, pre- and post-implementation workflow analysis, and a user questionnaire survey. The reusability of the proposed model was high, less than half of the development time was consumed, and the workflow was improved. Overall user feedback was positive, and the perceived supportiveness during daily workflow was high. The system empowers case managers with better information and leads to better decision making.

  7. Supporting the annotation of chronic obstructive pulmonary disease (COPD) phenotypes with text mining workflows.

    PubMed

    Fu, Xiao; Batista-Navarro, Riza; Rak, Rafal; Ananiadou, Sophia

    2015-01-01

    Chronic obstructive pulmonary disease (COPD) is a life-threatening lung disorder whose recent prevalence has led to an increasing burden on public healthcare. Phenotypic information in electronic clinical records is essential in providing suitable personalised treatment to patients with COPD. However, as phenotypes are often "hidden" within free text in clinical records, clinicians could benefit from text mining systems that facilitate their prompt recognition. This paper reports on a semi-automatic methodology for producing a corpus that can ultimately support the development of text mining tools that, in turn, will expedite the process of identifying groups of COPD patients. A corpus of 30 full-text papers was formed based on selection criteria informed by the expertise of COPD specialists. We developed an annotation scheme that is aimed at producing fine-grained, expressive and computable COPD annotations without burdening our curators with a highly complicated task. This was implemented in the Argo platform by means of a semi-automatic annotation workflow that integrates several text mining tools, including a graphical user interface for marking up documents. When evaluated using gold standard (i.e., manually validated) annotations, the semi-automatic workflow was shown to obtain a micro-averaged F-score of 45.70% (with relaxed matching). Utilising the gold standard data to train new concept recognisers, we demonstrated that our corpus, although still a work in progress, can foster the development of significantly better performing COPD phenotype extractors. We describe in this work the means by which we aim to eventually support the process of COPD phenotype curation, i.e., by the application of various text mining tools integrated into an annotation workflow. Although the corpus being described is still under development, our results thus far are encouraging and show great potential in stimulating the development of further automatic COPD phenotype extractors.
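    Editor's note: the record above reports a micro-averaged F-score of 45.70%. The short Python example below shows how such a micro-averaged score is typically computed, by pooling true positives, false positives and false negatives across annotation categories before taking precision and recall; the per-category counts are invented for illustration.

```python
# Micro-averaged F1: pool counts over all categories, then compute P, R, F1.
def micro_f1(counts):
    """counts: iterable of (tp, fp, fn) tuples, one per annotation category."""
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    fn = sum(c[2] for c in counts)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical per-category (tp, fp, fn) counts for three COPD phenotype types.
print(f"micro-averaged F1 = {micro_f1([(120, 80, 90), (45, 60, 70), (30, 40, 55)]):.4f}")
```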

  8. Integrating text mining into the MGI biocuration workflow

    PubMed Central

    Dowell, K.G.; McAndrews-Hill, M.S.; Hill, D.P.; Drabkin, H.J.; Blake, J.A.

    2009-01-01

    A major challenge for functional and comparative genomics resource development is the extraction of data from the biomedical literature. Although text mining for biological data is an active research field, few applications have been integrated into production literature curation systems such as those of the model organism databases (MODs). Not only are most available biological natural language (bioNLP) and information retrieval and extraction solutions difficult to adapt to existing MOD curation workflows, but many also have high error rates or are unable to process documents available in those formats preferred by scientific journals. In September 2008, Mouse Genome Informatics (MGI) at The Jackson Laboratory initiated a search for dictionary-based text mining tools that we could integrate into our biocuration workflow. MGI has rigorous document triage and annotation procedures designed to identify appropriate articles about mouse genetics and genome biology. We currently screen ∼1000 journal articles a month for Gene Ontology terms, gene mapping, gene expression, phenotype data and other key biological information. Although we do not foresee that curation tasks will ever be fully automated, we are eager to implement named entity recognition (NER) tools for gene tagging that can help streamline our curation workflow and simplify gene indexing tasks within the MGI system. Gene indexing is an MGI-specific curation function that involves identifying which mouse genes are being studied in an article, then associating the appropriate gene symbols with the article reference number in the MGI database. Here, we discuss our search process, performance metrics and success criteria, and how we identified a short list of potential text mining tools for further evaluation. We provide an overview of our pilot projects with NCBO's Open Biomedical Annotator and Fraunhofer SCAI's ProMiner. In doing so, we prove the potential for the further incorporation of semi-automated processes into the curation of the biomedical literature. PMID:20157492
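    Editor's note: the record above concerns dictionary-based gene tagging for gene indexing. The toy Python sketch below illustrates the general idea only; the gene symbols, the text, and the exact-match strategy are simplified stand-ins, not the behaviour of ProMiner or the Open Biomedical Annotator.

```python
# Toy dictionary-based gene tagger: report which known symbols appear in text.
import re

GENE_DICTIONARY = {"Pax6", "Trp53", "Brca1"}  # hypothetical mouse gene symbols


def tag_genes(text: str) -> set:
    """Return the dictionary entries mentioned in a passage of text."""
    tokens = set(re.findall(r"[A-Za-z0-9]+", text))
    return {gene for gene in GENE_DICTIONARY if gene in tokens}


abstract = "Conditional deletion of Trp53 and Brca1 in mammary epithelium ..."
print(tag_genes(abstract))  # e.g. {'Trp53', 'Brca1'}
```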

  9. Integrating text mining into the MGI biocuration workflow.

    PubMed

    Dowell, K G; McAndrews-Hill, M S; Hill, D P; Drabkin, H J; Blake, J A

    2009-01-01

    A major challenge for functional and comparative genomics resource development is the extraction of data from the biomedical literature. Although text mining for biological data is an active research field, few applications have been integrated into production literature curation systems such as those of the model organism databases (MODs). Not only are most available biological natural language (bioNLP) and information retrieval and extraction solutions difficult to adapt to existing MOD curation workflows, but many also have high error rates or are unable to process documents available in those formats preferred by scientific journals. In September 2008, Mouse Genome Informatics (MGI) at The Jackson Laboratory initiated a search for dictionary-based text mining tools that we could integrate into our biocuration workflow. MGI has rigorous document triage and annotation procedures designed to identify appropriate articles about mouse genetics and genome biology. We currently screen approximately 1000 journal articles a month for Gene Ontology terms, gene mapping, gene expression, phenotype data and other key biological information. Although we do not foresee that curation tasks will ever be fully automated, we are eager to implement named entity recognition (NER) tools for gene tagging that can help streamline our curation workflow and simplify gene indexing tasks within the MGI system. Gene indexing is an MGI-specific curation function that involves identifying which mouse genes are being studied in an article, then associating the appropriate gene symbols with the article reference number in the MGI database. Here, we discuss our search process, performance metrics and success criteria, and how we identified a short list of potential text mining tools for further evaluation. We provide an overview of our pilot projects with NCBO's Open Biomedical Annotator and Fraunhofer SCAI's ProMiner. In doing so, we prove the potential for the further incorporation of semi-automated processes into the curation of the biomedical literature.

  10. Towards better digital pathology workflows: programming libraries for high-speed sharpness assessment of Whole Slide Images.

    PubMed

    Ameisen, David; Deroulers, Christophe; Perrier, Valérie; Bouhidel, Fatiha; Battistella, Maxime; Legrès, Luc; Janin, Anne; Bertheau, Philippe; Yunès, Jean-Baptiste

    2014-01-01

    Since microscopic slides can now be automatically digitized and integrated in the clinical workflow, quality assessment of Whole Slide Images (WSI) has become a crucial issue. We present a no-reference quality assessment method that has been thoroughly tested since 2010 and is under implementation in multiple sites, both public university-hospitals and private entities. It is part of the FlexMIm R&D project which aims to improve the global workflow of digital pathology. For these uses, we have developed two programming libraries, in Java and Python, which can be integrated in various types of WSI acquisition systems, viewers and image analysis tools. Development and testing have been carried out on a MacBook Pro i7 and on a bi-Xeon 2.7GHz server. Libraries implementing the blur assessment method have been developed in Java, Python, PHP5 and MySQL5. For web applications, JavaScript, Ajax, JSON and Sockets were also used, as well as the Google Maps API. Aperio SVS files were converted into the Google Maps format using VIPS and Openslide libraries. We designed the Java library as a Service Provider Interface (SPI), extendable by third parties. Analysis is computed in real-time (3 billion pixels per minute). Tests were made on 5000 single images, 200 NDPI WSI, 100 Aperio SVS WSI converted to the Google Maps format. Applications based on our method and libraries can be used upstream, as calibration and quality control tool for the WSI acquisition systems, or as tools to reacquire tiles while the WSI is being scanned. They can also be used downstream to reacquire the complete slides that are below the quality threshold for surgical pathology analysis. WSI may also be displayed in a smarter way by sending and displaying the regions of highest quality before other regions. Such quality assessment scores could be integrated as WSI's metadata shared in clinical, research or teaching contexts, for a more efficient medical informatics workflow.
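    Editor's note: the record above describes tile-wise, no-reference sharpness scoring of whole slide images. The sketch below shows a common sharpness proxy (variance of the Laplacian) applied per tile in Python; it illustrates the general idea only and is not the authors' published FlexMIm method. The synthetic image is invented.

```python
# Tile-wise no-reference sharpness scoring with the variance-of-Laplacian proxy.
import numpy as np
from scipy.ndimage import laplace


def tile_sharpness(image: np.ndarray, tile: int = 256):
    """Yield (row, col, score) for each tile of a 2-D grayscale image."""
    for r in range(0, image.shape[0], tile):
        for c in range(0, image.shape[1], tile):
            patch = image[r:r + tile, c:c + tile].astype(float)
            yield r, c, laplace(patch).var()


# Synthetic example: a sharp random texture next to a flat (blur-like) area.
img = np.zeros((512, 512))
img[:, :256] = np.random.rand(512, 256)
scores = {(r, c): s for r, c, s in tile_sharpness(img)}
print(sorted(scores.items(), key=lambda kv: kv[1])[:2])  # flattest tiles score lowest
```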

  11. Enabling Smart Workflows over Heterogeneous ID-Sensing Technologies

    PubMed Central

    Giner, Pau; Cetina, Carlos; Lacuesta, Raquel; Palacios, Guillermo

    2012-01-01

    Sensing technologies in mobile devices play a key role in reducing the gap between the physical and the digital world. The use of automatic identification capabilities can improve user participation in business processes where physical elements are involved (Smart Workflows). However, identifying all objects in the user surroundings does not automatically translate into meaningful services to the user. This work introduces Parkour, an architecture that allows the development of services that match the goals of each of the participants in a smart workflow. Parkour is based on a pluggable architecture that can be extended to provide support for new tasks and technologies. In order to facilitate the development of these plug-ins, tools that automate the development process are also provided. Several Parkour-based systems have been developed in order to validate the applicability of the proposal. PMID:23202193

  12. A patient workflow management system built on guidelines.

    PubMed Central

    Dazzi, L.; Fassino, C.; Saracco, R.; Quaglini, S.; Stefanelli, M.

    1997-01-01

    To provide high-quality, shared, and distributed medical care, clinical and organizational issues need to be integrated. This work describes a methodology for developing a Patient Workflow Management System, based on a detailed model of both the medical work process and the organizational structure. We assume that the medical work process is represented through clinical practice guidelines, and that an ontological description of the organization is available. Thus, we developed tools 1) to acquire the medical knowledge contained in a guideline, 2) to translate the derived formalized guideline into a computational formalism, precisely a Petri Net, and 3) to maintain different representation levels. The high-level representation guarantees that the Patient Workflow follows the guideline prescriptions, while the low-level representation takes into account the specific characteristics of the organization and allows allocating resources for managing a specific patient in daily practice. PMID:9357606
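    Editor's note: the record above translates formalized guidelines into Petri Nets. The minimal Python sketch below shows the execution semantics that makes Petri Nets useful here, namely that a transition fires only when all of its input places hold tokens; the places and transitions are invented clinical steps, not the authors' model.

```python
# Minimal Petri net: transitions fire only when every input place holds a token.
class PetriNet:
    def __init__(self, marking):
        self.marking = dict(marking)   # place -> token count
        self.transitions = {}          # name -> (input places, output places)

    def add_transition(self, name, inputs, outputs):
        self.transitions[name] = (inputs, outputs)

    def enabled(self, name):
        inputs, _ = self.transitions[name]
        return all(self.marking.get(p, 0) > 0 for p in inputs)

    def fire(self, name):
        if not self.enabled(name):
            raise RuntimeError(f"transition {name!r} is not enabled")
        inputs, outputs = self.transitions[name]
        for p in inputs:
            self.marking[p] -= 1
        for p in outputs:
            self.marking[p] = self.marking.get(p, 0) + 1


# Hypothetical guideline fragment expressed as places and transitions.
net = PetriNet({"patient_admitted": 1})
net.add_transition("order_lab_test", ["patient_admitted"], ["awaiting_results"])
net.add_transition("review_results", ["awaiting_results"], ["treatment_planned"])
net.fire("order_lab_test")
net.fire("review_results")
print(net.marking)  # {'patient_admitted': 0, 'awaiting_results': 0, 'treatment_planned': 1}
```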

  13. A service oriented approach for guidelines-based clinical decision support using BPMN.

    PubMed

    Rodriguez-Loya, Salvador; Aziz, Ayesha; Chatwin, Chris

    2014-01-01

    Evidence-based medical practice requires that clinical guidelines be documented in such a way that they represent a clinical workflow in its most accessible form. In order to optimize clinical processes and improve clinical outcomes, we propose a Service Oriented Architecture (SOA) based approach for implementing clinical guidelines that can be accessed from an Electronic Health Record (EHR) application through a Web Services-enabled communication mechanism with the Enterprise Service Bus. We have used Business Process Modelling Notation (BPMN) for modelling and presenting the clinical pathway in the form of a workflow. The aim of this study is to produce spontaneous alerts in the healthcare workflow for the diagnosis of Chronic Obstructive Pulmonary Disease (COPD). BPMN as a tool to automate clinical guidelines has not previously been employed for providing Clinical Decision Support (CDS).

  14. Rapid resistome mapping using nanopore sequencing.

    PubMed

    van der Helm, Eric; Imamovic, Lejla; Hashim Ellabaan, Mostafa M; van Schaik, Willem; Koza, Anna; Sommer, Morten O A

    2017-05-05

    The emergence of antibiotic resistance in human pathogens has become a major threat to modern medicine. The outcome of antibiotic treatment can be affected by the composition of the gut microbiota. Accordingly, knowledge of the gut resistome composition could enable more effective and individualized treatment of bacterial infections. Yet, rapid workflows for resistome characterization are lacking. To address this challenge we developed the poreFUME workflow, which applies functional metagenomic selections and nanopore sequencing to resistome mapping. We demonstrate the approach by functionally characterizing the gut resistome of an ICU (intensive care unit) patient. The accuracy of the poreFUME pipeline, at >97%, is sufficient for the annotation of antibiotic resistance genes. The poreFUME pipeline provides a promising approach for efficient resistome profiling that could inform antibiotic treatment decisions in the future. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  15. Describing and Modeling Workflow and Information Flow in Chronic Disease Care

    PubMed Central

    Unertl, Kim M.; Weinger, Matthew B.; Johnson, Kevin B.; Lorenzi, Nancy M.

    2009-01-01

    Objectives The goal of the study was to develop an in-depth understanding of work practices, workflow, and information flow in chronic disease care, to facilitate development of context-appropriate informatics tools. Design The study was conducted over a 10-month period in three ambulatory clinics providing chronic disease care. The authors iteratively collected data using direct observation and semi-structured interviews. Measurements The authors observed all aspects of care in three different chronic disease clinics for over 150 hours, including 157 patient-provider interactions. Observation focused on interactions among people, processes, and technology. Observation data were analyzed through an open coding approach. The authors then developed models of workflow and information flow using Hierarchical Task Analysis and Soft Systems Methodology. The authors also conducted nine semi-structured interviews to confirm and refine the models. Results The study had three primary outcomes: models of workflow for each clinic, models of information flow for each clinic, and an in-depth description of work practices and the role of health information technology (HIT) in the clinics. The authors identified gaps between the existing HIT functionality and the needs of chronic disease providers. Conclusions In response to the analysis of workflow and information flow, the authors developed ten guidelines for design of HIT to support chronic disease care, including recommendations to pursue modular approaches to design that would support disease-specific needs. The study demonstrates the importance of evaluating workflow and information flow in HIT design and implementation. PMID:19717802

  16. A Mixed-Methods Research Framework for Healthcare Process Improvement.

    PubMed

    Bastian, Nathaniel D; Munoz, David; Ventura, Marta

    2016-01-01

    The healthcare system in the United States is spiraling out of control due to ever-increasing costs without significant improvements in quality, access to care, satisfaction, and efficiency. Efficient workflow is paramount to improving healthcare value while maintaining the utmost standards of patient care and provider satisfaction in high stress environments. This article provides healthcare managers and quality engineers with a practical healthcare process improvement framework to assess, measure and improve clinical workflow processes. The proposed mixed-methods research framework integrates qualitative and quantitative tools to foster the improvement of processes and workflow in a systematic way. The framework consists of three distinct phases: 1) stakeholder analysis, 2a) survey design, 2b) time-motion study, and 3) process improvement. The proposed framework is applied to the pediatric intensive care unit of the Penn State Hershey Children's Hospital. The implementation of this methodology led to identification and categorization of different workflow tasks and activities into both value-added and non-value added in an effort to provide more valuable and higher quality patient care. Based upon the lessons learned from the case study, the three-phase methodology provides a better, broader, leaner, and holistic assessment of clinical workflow. The proposed framework can be implemented in various healthcare settings to support continuous improvement efforts in which complexity is a daily element that impacts workflow. We proffer a general methodology for process improvement in a healthcare setting, providing decision makers and stakeholders with a useful framework to help their organizations improve efficiency. Published by Elsevier Inc.

  17. Purdue ionomics information management system. An integrated functional genomics platform.

    PubMed

    Baxter, Ivan; Ouzzani, Mourad; Orcun, Seza; Kennedy, Brad; Jandhyala, Shrinivas S; Salt, David E

    2007-02-01

    The advent of high-throughput phenotyping technologies has created a deluge of information that is difficult to deal with without the appropriate data management tools. These data management tools should integrate defined workflow controls for genomic-scale data acquisition and validation, data storage and retrieval, and data analysis, indexed around the genomic information of the organism of interest. To maximize the impact of these large datasets, it is critical that they are rapidly disseminated to the broader research community, allowing open access for data mining and discovery. We describe here a system that incorporates such functionalities developed around the Purdue University high-throughput ionomics phenotyping platform. The Purdue Ionomics Information Management System (PiiMS) provides integrated workflow control, data storage, and analysis to facilitate high-throughput data acquisition, along with integrated tools for data search, retrieval, and visualization for hypothesis development. PiiMS is deployed as a World Wide Web-enabled system, allowing for integration of distributed workflow processes and open access to raw data for analysis by numerous laboratories. PiiMS currently contains data on shoot concentrations of P, Ca, K, Mg, Cu, Fe, Zn, Mn, Co, Ni, B, Se, Mo, Na, As, and Cd in over 60,000 shoot tissue samples of Arabidopsis (Arabidopsis thaliana), including ethyl methanesulfonate, fast-neutron and defined T-DNA mutants, and natural accession and populations of recombinant inbred lines from over 800 separate experiments, representing over 1,000,000 fully quantitative elemental concentrations. PiiMS is accessible at www.purdue.edu/dp/ionomics.

  18. Mobile task management tool that improves workflow of an acute general surgical service.

    PubMed

    Foo, Elizabeth; McDonald, Rod; Savage, Earle; Floyd, Richard; Butler, Anthony; Rumball-Smith, Alistair; Connor, Saxon

    2015-10-01

    Understanding and being able to measure constraints within a health system is crucial if outcomes are to be improved. Current systems lack the ability to capture decision making with regard to tasks performed within a patient journey. The aim of this study was to assess the impact of a mobile task management tool on clinical workflow within an acute general surgical service by analysing data capture and usability of the application tool. The Cortex iOS application was developed to digitize patient flow and provide real-time visibility over clinical decision making and task performance. Study outcomes measured were workflow data capture for patient and staff events. Usability was assessed using an electronic survey. There were 449 unique patient journeys tracked with a total of 3072 patient events recorded. The results repository was accessed 7792 times. The participants reported that the application sped up decision making, reduced redundancy of work and improved team communication. The mode of the estimated time the application saved participants was 5-9 min/h of work. Of the 14 respondents, nine discarded their analogue methods of tracking tasks by the end of the study period. The introduction of a mobile task management system improved the working efficiency of junior clinical staff. The application allowed capture of data not previously available to hospital systems. In the future, such data will contribute to the accurate mapping of patient journeys through the health system. © 2015 Royal Australasian College of Surgeons.

  19. Workflow and web application for annotating NCBI BioProject transcriptome data

    PubMed Central

    Vera Alvarez, Roberto; Medeiros Vidal, Newton; Garzón-Martínez, Gina A.; Barrero, Luz S.; Landsman, David

    2017-01-01

    Abstract The volume of transcriptome data is growing exponentially due to rapid improvement of experimental technologies. In response, large central resources such as those of the National Center for Biotechnology Information (NCBI) are continually adapting their computational infrastructure to accommodate this large influx of data. New and specialized databases, such as Transcriptome Shotgun Assembly Sequence Database (TSA) and Sequence Read Archive (SRA), have been created to aid the development and expansion of centralized repositories. Although the central resource databases are under continual development, they do not include automatic pipelines to increase annotation of newly deposited data. Therefore, third-party applications are required to achieve that aim. Here, we present an automatic workflow and web application for the annotation of transcriptome data. The workflow creates secondary data such as sequencing reads and BLAST alignments, which are available through the web application. They are based on freely available bioinformatics tools and scripts developed in-house. The interactive web application provides a search engine and several browser utilities. Graphical views of transcript alignments are available through SeqViewer, an embedded tool developed by NCBI for viewing biological sequence data. The web application is tightly integrated with other NCBI web applications and tools to extend the functionality of data processing and interconnectivity. We present a case study for the species Physalis peruviana with data generated from BioProject ID 67621. Database URL: http://www.ncbi.nlm.nih.gov/projects/physalis/ PMID:28605765
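    Editor's note: the record above describes generating BLAST alignments as secondary data within an annotation workflow. The Python sketch below shows how such a step is commonly wrapped; the input files and database name are placeholders, while the flags shown (-query, -db, -outfmt 6, -max_target_seqs, -out) are standard BLAST+ options. This is illustrative and not the authors' actual pipeline code.

```python
# Illustrative wrapper for a BLAST+ alignment step producing tabular output.
import subprocess


def run_blastn(query_fasta: str, database: str, out_tsv: str) -> None:
    """Align query sequences against a nucleotide database with blastn."""
    subprocess.run(
        [
            "blastn",
            "-query", query_fasta,
            "-db", database,
            "-outfmt", "6",          # tabular output: qseqid sseqid pident ...
            "-max_target_seqs", "5",
            "-out", out_tsv,
        ],
        check=True,
    )


# run_blastn("transcripts.fa", "nt", "transcripts_vs_nt.tsv")  # hypothetical inputs
```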

  20. SYRMEP Tomo Project: a graphical user interface for customizing CT reconstruction workflows.

    PubMed

    Brun, Francesco; Massimi, Lorenzo; Fratini, Michela; Dreossi, Diego; Billé, Fulvio; Accardo, Agostino; Pugliese, Roberto; Cedola, Alessia

    2017-01-01

    When considering the acquisition of experimental synchrotron radiation (SR) X-ray CT data, the reconstruction workflow cannot be limited to the essential computational steps of flat fielding and filtered back projection (FBP). More refined image processing is often required, usually to compensate for artifacts and enhance the quality of the reconstructed images. In principle, it would be desirable to optimize the reconstruction workflow at the facility during the experiment (beamtime). However, several practical factors affect the image reconstruction part of the experiment, and users are likely to conclude the beamtime with sub-optimal reconstructed images. Through an example application, this article presents SYRMEP Tomo Project (STP), an open-source software tool conceived to let users design custom CT reconstruction workflows. STP has been designed for post-beamtime (off-line) use and for new reconstructions of past archived data at the user's home institution, where simple computing resources are available. Releases of the software can be downloaded from the Elettra Scientific Computing group GitHub repository https://github.com/ElettraSciComp/STP-Gui.
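    Editor's note: the record above names flat fielding and filtered back projection as the essential steps of the reconstruction workflow. The Python sketch below illustrates both with NumPy and scikit-image on synthetic data; the array shapes and values are invented, and STP itself wraps many more optional processing steps around this core.

```python
# Flat-field normalization followed by filtered back projection (FBP).
import numpy as np
from skimage.transform import iradon


def flat_field(projection, flat, dark, eps=1e-6):
    """Normalize a raw projection with flat (open-beam) and dark images."""
    return (projection - dark) / np.clip(flat - dark, eps, None)


# Hypothetical raw data: one projection plus flat and dark reference images.
proj = np.random.rand(256, 256)
corrected = flat_field(proj, flat=np.ones((256, 256)), dark=np.zeros((256, 256)))

# Hypothetical sinogram: rows are detector pixels, columns are projection angles.
angles = np.linspace(0.0, 180.0, 181, endpoint=False)
sinogram = np.random.rand(256, angles.size)

# FBP reconstructs a slice from the corrected sinogram.
reconstruction = iradon(sinogram, theta=angles)
print(corrected.shape, reconstruction.shape)
```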

  1. Schedule-Aware Workflow Management Systems

    NASA Astrophysics Data System (ADS)

    Mans, Ronny S.; Russell, Nick C.; van der Aalst, Wil M. P.; Moleman, Arnold J.; Bakker, Piet J. M.

    Contemporary workflow management systems offer work-items to users through specific work-lists. Users select the work-items they will perform without having a specific schedule in mind. However, in many environments work needs to be scheduled and performed at particular times. For example, in hospitals many work-items are linked to appointments, e.g., a doctor cannot perform surgery without reserving an operating theater and making sure that the patient is present. One of the problems when applying workflow technology in such domains is the lack of calendar-based scheduling support. In this paper, we present an approach that supports the seamless integration of unscheduled (flow) and scheduled (schedule) tasks. Using CPN Tools we have developed a specification and simulation model for schedule-aware workflow management systems. Based on this a system has been realized that uses YAWL, Microsoft Exchange Server 2007, Outlook, and a dedicated scheduling service. The approach is illustrated using a real-life case study at the AMC hospital in the Netherlands. In addition, we elaborate on the experiences obtained when developing and implementing a system of this scale using formal techniques.

  2. LACO-Wiki: A land cover validation tool and a new, innovative teaching resource for remote sensing and the geosciences

    NASA Astrophysics Data System (ADS)

    See, Linda; Perger, Christoph; Dresel, Christopher; Hofer, Martin; Weichselbaum, Juergen; Mondel, Thomas; Steffen, Fritz

    2016-04-01

    The validation of land cover products is an important step in the workflow of generating a land cover map from remotely sensed imagery. Many students of remote sensing are given exercises on classifying a land cover map, followed by the validation process. Many algorithms exist for classification, embedded within proprietary image processing software or, increasingly, as open source tools. However, there is little standardization for land cover validation, and no set of open tools is available for implementing this process. The LACO-Wiki tool was developed as a way of filling this gap, bringing together standardized land cover validation methods and workflows into a single portal. This includes the storage and management of land cover maps and validation data; step-by-step instructions to guide users through the validation process; sound sampling designs; an easy-to-use environment for validation sample interpretation; and the generation of accuracy reports based on the validation process. The tool was developed for a range of users including producers of land cover maps, researchers, teachers and students. The use of such a tool could be embedded within the curriculum of remote sensing courses at a university level, but it is simple enough for use by students aged 13-18. A beta version of the tool is available for testing at: http://www.laco-wiki.net.
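    Editor's note: the record above mentions generating accuracy reports from interpreted validation samples. The Python sketch below shows the standard calculations behind such a report (confusion matrix, overall accuracy, user's and producer's accuracy per class); the class labels and counts are invented and this is not LACO-Wiki's actual report code.

```python
# Accuracy report from a land cover validation confusion matrix.
import numpy as np

classes = ["forest", "cropland", "urban"]
# Rows = map (classified) class, columns = reference (validated) class.
confusion = np.array([
    [48,  5,  2],
    [ 6, 40,  4],
    [ 1,  3, 41],
])

overall_accuracy = np.trace(confusion) / confusion.sum()
users_accuracy = np.diag(confusion) / confusion.sum(axis=1)      # commission view
producers_accuracy = np.diag(confusion) / confusion.sum(axis=0)  # omission view

print(f"overall accuracy: {overall_accuracy:.2%}")
for name, ua, pa in zip(classes, users_accuracy, producers_accuracy):
    print(f"{name:>8}: user's {ua:.2%}  producer's {pa:.2%}")
```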

  3. Virtual Sensor Web Architecture

    NASA Astrophysics Data System (ADS)

    Bose, P.; Zimdars, A.; Hurlburt, N.; Doug, S.

    2006-12-01

    NASA envisions the development of smart sensor webs: intelligent and integrated observation networks that harness distributed sensing assets, their associated continuous and complex data sets, and predictive observation processing mechanisms for timely, collaborative hazard mitigation and enhanced science productivity and reliability. This paper presents the Virtual Sensor Web Infrastructure for Collaborative Science (VSICS) Architecture for sustained coordination of (numerical and distributed) model-based processing, closed-loop resource allocation, and observation planning. VSICS's key ideas include i) rich descriptions of sensors as services based on semantic markup languages like OWL and SensorML; ii) service-oriented workflow composition and repair for simple and ensemble models; iii) event-driven workflow execution based on event-based and distributed workflow management mechanisms; and iv) development of autonomous model interaction management capabilities providing closed-loop control of collection resources driven by competing targeted observation needs. We present results from initial work on collaborative science processing involving distributed services (the COSEC framework) that is being extended to create VSICS.

  4. Policy Driven Development: Flexible Policy Insertion for Large Scale Systems.

    PubMed

    Demchak, Barry; Krüger, Ingolf

    2012-07-01

    The success of a software system depends critically on how well it reflects and adapts to stakeholder requirements. Traditional development methods often frustrate stakeholders by creating long latencies between requirement articulation and system deployment, especially in large scale systems. One source of latency is the maintenance of policy decisions encoded directly into system workflows at development time, including those involving access control and feature set selection. We created the Policy Driven Development (PDD) methodology to address these development latencies by enabling the flexible injection of decision points into existing workflows at runtime, thus enabling policy composition that integrates requirements furnished by multiple, oblivious stakeholder groups. Using PDD, we designed and implemented a production cyberinfrastructure that demonstrates policy and workflow injection that quickly implements stakeholder requirements, including features not contemplated in the original system design. PDD provides a path to quickly and cost-effectively evolve such applications over a long lifetime.
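    Editor's note: the record above is about injecting policy decision points into existing workflows at runtime. The Python sketch below shows one simple way such injection can work (a registry of policy callables consulted before a workflow step runs); the registry, step names, and access-control rule are illustrative assumptions, not the PDD framework's actual API.

```python
# Minimal sketch of runtime-injected policy decision points around workflow steps.
policies = {}  # step name -> list of policy callables


def add_policy(step_name, policy):
    """Register a policy callable to run before the named workflow step."""
    policies.setdefault(step_name, []).append(policy)


def with_policies(step_name):
    """Decorator wrapping a workflow step so registered policies run first."""
    def wrap(step):
        def guarded(context, *args, **kwargs):
            for policy in policies.get(step_name, []):
                policy(context)  # a policy may raise to veto the step
            return step(context, *args, **kwargs)
        return guarded
    return wrap


@with_policies("release_results")
def release_results(context, payload):
    return f"released {payload} to {context['user']}"


def clinician_only(ctx):
    """Hypothetical access-control policy injected after deployment."""
    if ctx.get("role") != "clinician":
        raise PermissionError(f"{ctx.get('user')} may not release results")


add_policy("release_results", clinician_only)
print(release_results({"user": "alice", "role": "clinician"}, "report-42"))
```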

  5. A proof of concept for epidemiological research using structured reporting with pulmonary embolism as a use case.

    PubMed

    Daniel, Pinto Dos Santos; Sonja, Scheibl; Gordon, Arnhold; Aline, Maehringer-Kunz; Christoph, Düber; Peter, Mildenberger; Roman, Kloeckner

    2018-05-10

    This paper studies the possibilities of an integrated IT-based workflow for epidemiological research in pulmonary embolism using freely available tools and structured reporting. We included a total of 521 consecutive cases that had been referred to the radiology department for computed tomography pulmonary angiography (CTPA) with suspected pulmonary embolism (PE). Free-text reports were transformed into structured reports using a freely available IHE-MRRT-compliant reporting platform. D-dimer values were retrieved from the hospital's laboratory results system. All information was stored in the platform's database and visualized using freely available tools. For further analysis, we directly accessed the platform's database with an advanced analytics tool (RapidMiner). We were able to develop an integrated workflow for epidemiological statistics from reports obtained in clinical routine. The report data allowed for automated calculation of epidemiological parameters. The prevalence of pulmonary embolism was 27.6%. The mean age in patients with and without pulmonary embolism did not differ (62.8 years and 62.0 years, respectively, p=0.987). As expected, there was a significant difference in mean D-dimer values (10.13 mg/L FEU and 3.12 mg/L FEU, respectively, p<0.001). Structured reporting can make data obtained from clinical routine more accessible. Designing practical workflows is feasible using freely available tools and allows for the calculation of epidemiological statistics on a near real-time basis. Therefore, radiologists should push for the implementation of structured reporting in clinical routine. Advances in knowledge: Theoretical benefits of structured reporting have long been discussed, but practical implementation demonstrating those benefits has been lacking. Here we present a first experience providing proof that structured reporting will make data from clinical routine more accessible.

  6. Applying the Karma Provenance tool to NASA's AMSR-E Data Production Stream

    NASA Astrophysics Data System (ADS)

    Ramachandran, R.; Conover, H.; Regner, K.; Movva, S.; Goodman, H. M.; Pale, B.; Purohit, P.; Sun, Y.

    2010-12-01

    Current procedures for capturing and disseminating provenance, or data product lineage, are limited in both what is captured and how it is disseminated to the science community. For example, the Advanced Microwave Scanning Radiometer for the Earth Observing System (AMSR-E) Science Investigator-led Processing System (SIPS) generates Level 2 and Level 3 data products for a variety of geophysical parameters. Data provenance and quality information for these data sets is either very general (e.g., user guides, a list of anomalous data receipt and processing conditions over the life of the missions) or difficult to access or interpret (e.g., quality flags embedded in the data, production history files not easily available to users). Karma is a provenance collection and representation tool designed and developed for data-driven workflows such as the production streams used to produce EOS standard products. Karma records uniform and usable provenance metadata independent of the processing system while minimizing both the modification burden on the processing system and the overall performance overhead. Karma collects both process and data provenance. The process provenance contains information about the workflow execution and the associated algorithm invocations. The data provenance captures metadata about the derivation history of the data product, including algorithms used and input data sources transformed to generate it. As part of an ongoing NASA-funded project, Karma is being integrated into the AMSR-E SIPS data production streams. Metadata gathered by the tool will be presented to the data consumers as provenance graphs, which are useful in validating the workflows and determining the quality of the data product. This presentation will discuss design and implementation issues faced while incorporating a provenance tool into a structured data production flow. Prototype results will also be presented in this talk.

  7. JTSA: an open source framework for time series abstractions.

    PubMed

    Sacchi, Lucia; Capozzi, Davide; Bellazzi, Riccardo; Larizza, Cristiana

    2015-10-01

    The evaluation of the clinical status of a patient is frequently based on the temporal evolution of some parameters, making the detection of temporal patterns a priority in data analysis. Temporal abstraction (TA) is a methodology widely used in medical reasoning for summarizing and abstracting longitudinal data. This paper describes JTSA (Java Time Series Abstractor), a framework including a library of algorithms for time series preprocessing and abstraction and an engine to execute a workflow for temporal data processing. The JTSA framework is grounded on a comprehensive ontology that models temporal data processing both from the data storage and the abstraction computation perspective. The JTSA framework is designed to allow users to build their own analysis workflows by combining different algorithms. Thanks to the modular structure of a workflow, simple to highly complex patterns can be detected. The JTSA framework has been developed in Java 1.7 and is distributed under GPL as a jar file. JTSA provides: a collection of algorithms to perform temporal abstraction and preprocessing of time series, a framework for defining and executing data analysis workflows based on these algorithms, and a GUI for workflow prototyping and testing. The whole JTSA project relies on a formal model of the data types and of the algorithms included in the library. This model is the basis for the design and implementation of the software application. Taking into account this formalized structure, the user can easily extend the JTSA framework by adding new algorithms. Results are shown in the context of the EU project MOSAIC, where relevant patterns are extracted from data related to the long-term monitoring of diabetic patients. That JTSA is a versatile tool adaptable to different needs is demonstrated by its possible uses, both as a standalone tool for data summarization and as a module to be embedded into other architectures to select specific phenotypes based on TAs in a large dataset. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
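    Editor's note: the record above concerns temporal abstraction of longitudinal clinical data. The toy Python example below illustrates the core idea of a "state" abstraction, mapping raw values to qualitative states and merging consecutive samples in the same state into intervals; the glucose thresholds and data are invented, and this is not JTSA's own algorithm library.

```python
# Toy "state" temporal abstraction: values -> qualitative states -> intervals.
def abstract_states(samples, low=70, high=180):
    """samples: list of (timestamp, value); returns (state, start, end) intervals."""
    def state(v):
        return "low" if v < low else "high" if v > high else "normal"

    intervals = []
    for t, v in samples:
        s = state(v)
        if intervals and intervals[-1][0] == s:
            intervals[-1] = (s, intervals[-1][1], t)   # extend current interval
        else:
            intervals.append((s, t, t))                # open a new interval
    return intervals


# Hypothetical glucose readings (time, mg/dL).
glucose = [(0, 95), (1, 110), (2, 190), (3, 210), (4, 130), (5, 65)]
print(abstract_states(glucose))
# [('normal', 0, 1), ('high', 2, 3), ('normal', 4, 4), ('low', 5, 5)]
```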

  8. Developing a Collection of Composable Data Translation Software Units to Improve Efficiency and Reproducibility in Ecohydrologic Modeling Workflows

    NASA Astrophysics Data System (ADS)

    Olschanowsky, C.; Flores, A. N.; FitzGerald, K.; Masarik, M. T.; Rudisill, W. J.; Aguayo, M.

    2017-12-01

    Dynamic models of the spatiotemporal evolution of water, energy, and nutrient cycling are important tools to assess impacts of climate and other environmental changes on ecohydrologic systems. These models require spatiotemporally varying environmental forcings like precipitation, temperature, humidity, windspeed, and solar radiation. These input data originate from a variety of sources, including global and regional weather and climate models, global and regional reanalysis products, and geostatistically interpolated surface observations. Data translation measures, often subsetting in space and/or time and transforming and converting variable units, represent a seemingly mundane but critical step in application workflows. Translation steps can introduce errors, misrepresent data, slow execution, and interrupt data provenance. We leverage a workflow that subsets a large regional dataset derived from the Weather Research and Forecasting (WRF) model and prepares inputs to the Parflow integrated hydrologic model to demonstrate the impact of translation tool software quality on scientific workflow results and performance. We propose that such workflows will benefit from a community-approved collection of data transformation components. The components should be self-contained composable units of code. This design pattern enables automated parallelization and software verification, improving performance and reliability. Ensuring that individual translation components are self-contained and target minute tasks increases reliability. The small code size of each component enables effective unit and regression testing. The components can be automatically composed for efficient execution. An efficient data translation framework should be written to minimize data movement. Composing components within a single streaming process reduces data movement. Each component will typically have a low arithmetic intensity, meaning that it requires about the same number of bytes to be read as the number of computations it performs. When several components' executions are coordinated, the overall arithmetic intensity increases, leading to increased efficiency.
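    Editor's note: the record above advocates small, self-contained, composable translation units chained in a single streaming pass. The Python sketch below illustrates that design pattern with two toy units (spatial subsetting and a unit conversion) composed into one callable; the variable names, slice bounds, and conversion are illustrative assumptions, not the actual WRF-to-Parflow toolchain.

```python
# Composable translation units chained into a single in-memory pass.
from functools import reduce
import numpy as np


def subset_region(field, rows=slice(10, 50), cols=slice(20, 80)):
    """Spatial subsetting: keep only the model domain of interest."""
    return field[rows, cols]


def kelvin_to_celsius(field):
    """Unit conversion for a temperature field."""
    return field - 273.15


def compose(*steps):
    """Chain translation units left to right into one callable."""
    return lambda data: reduce(lambda acc, step: step(acc), steps, data)


translate = compose(subset_region, kelvin_to_celsius)
forcing = np.full((100, 100), 300.0)   # hypothetical 2-D temperature slice in K
out = translate(forcing)
print(out.shape, out.mean())           # (40, 60) 26.85
```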

  9. Closha: bioinformatics workflow system for the analysis of massive sequencing data.

    PubMed

    Ko, GunHwan; Kim, Pan-Gyu; Yoon, Jongcheol; Han, Gukhee; Park, Seong-Jin; Song, Wangho; Lee, Byungwook

    2018-02-19

    While next-generation sequencing (NGS) costs have fallen in recent years, the cost and complexity of computation remain substantial obstacles to the use of NGS in bio-medical care and genomic research. The rapidly increasing amounts of data available from the new high-throughput methods have made data processing infeasible without automated pipelines. The integration of data and analytic resources into workflow systems provides a solution to the problem by simplifying the task of data analysis. To address this challenge, we developed a cloud-based workflow management system, Closha, to provide fast and cost-effective analysis of massive genomic data. We implemented complex workflows making optimal use of high-performance computing clusters. Closha allows users to create multi-step analyses using drag-and-drop functionality and to modify the parameters of pipeline tools. Users can also import Galaxy pipelines into Closha. Closha is a hybrid system that enables users to run both traditional analysis programs and MapReduce-based big data analysis programs simultaneously in a single pipeline. Thus, the execution of analytics algorithms can be parallelized, speeding up the whole process. We also developed a high-speed data transmission solution, KoDS, to transmit a large amount of data at a fast rate. KoDS has a file transfer speed of up to 10 times that of normal FTP and HTTP. The computer hardware for Closha comprises 660 CPU cores and 800 TB of disk storage, enabling 500 jobs to run at the same time. Closha is a scalable, cost-effective, and publicly available web service for large-scale genomic data analysis. Closha supports the reliable and highly scalable execution of sequencing analysis workflows in a fully automated manner. Closha provides a user-friendly interface that helps genomic scientists derive accurate results from NGS platform data. The Closha cloud server is freely available at http://closha.kobic.re.kr/ .

  10. Facilitating hydrological data analysis workflows in R: the RHydro package

    NASA Astrophysics Data System (ADS)

    Buytaert, Wouter; Moulds, Simon; Skoien, Jon; Pebesma, Edzer; Reusser, Dominik

    2015-04-01

    The advent of new technologies such as web-services and big data analytics holds great promise for hydrological data analysis and simulation. Driven by the need for better water management tools, it allows for the construction of much more complex workflows, that integrate more and potentially more heterogeneous data sources with longer tool chains of algorithms and models. With the scientific challenge of designing the most adequate processing workflow comes the technical challenge of implementing the workflow with a minimal risk for errors. A wide variety of new workbench technologies and other data handling systems are being developed. At the same time, the functionality of available data processing languages such as R and Python is increasing at an accelerating pace. Because of the large diversity of scientific questions and simulation needs in hydrology, it is unlikely that one single optimal method for constructing hydrological data analysis workflows will emerge. Nevertheless, languages such as R and Python are quickly gaining popularity because they combine a wide array of functionality with high flexibility and versatility. The object-oriented nature of high-level data processing languages makes them particularly suited for the handling of complex and potentially large datasets. In this paper, we explore how handling and processing of hydrological data in R can be facilitated further by designing and implementing a set of relevant classes and methods in the experimental R package RHydro. We build upon existing efforts such as the sp and raster packages for spatial data and the spacetime package for spatiotemporal data to define classes for hydrological data (HydroST). In order to handle simulation data from hydrological models conveniently, a HM class is defined. Relevant methods are implemented to allow for an optimal integration of the HM class with existing model fitting and simulation functionality in R. Lastly, we discuss some of the design challenges of the RHydro package, including integration with big data technologies, web technologies, and emerging data models in hydrology.

  11. A knowledge-based decision support system in bioinformatics: an application to protein complex extraction

    PubMed Central

    2013-01-01

    Background We introduce a Knowledge-based Decision Support System (KDSS) to address the Protein Complex Extraction issue. Using a Knowledge Base (KB) encoding expertise about the proposed scenario, our KDSS is able to suggest both strategies and tools according to the features of the input dataset. Our system provides a navigable workflow for the current experiment, and furthermore it offers support in the configuration and running of every processing component of that workflow. This last feature makes our system a crossover between classical DSS and Workflow Management Systems. Results We briefly present the KDSS' architecture and the basic concepts used in the design of the knowledge base and the reasoning component. The system is then tested using a subset of the Saccharomyces cerevisiae protein-protein interaction dataset. We used this subset because it has been well studied in the literature by several research groups in the field of complex extraction: in this way we could easily compare the results obtained through our KDSS with theirs. Our system suggests both a preprocessing and a clustering strategy, and for each of them it proposes and eventually runs suitable algorithms. Our system's final results are then composed of a workflow of tasks, which can be reused for other experiments, and the specific numerical results for that particular trial. Conclusions The proposed approach, using the KDSS' knowledge base, provides a novel workflow that gives the best results with regard to the other workflows produced by the system. This workflow and its numeric results have been compared with other approaches to PPI network analysis found in the literature, offering similar results. PMID:23368995

  12. Creating a comprehensive customer service program to help convey critical and acute results of radiology studies.

    PubMed

    Towbin, Alexander J; Hall, Seth; Moskovitz, Jay; Johnson, Neil D; Donnelly, Lane F

    2011-01-01

    Communication of acute or critical results between the radiology department and referring clinicians has been a deficiency of many radiology departments. The failure to perform or document these communications can lead to poor patient care, patient safety issues, medical-legal issues, and complaints from referring clinicians. To mitigate these factors, a communication and documentation tool was created and incorporated into our departmental customer service program. This article will describe the implementation of a comprehensive customer service program in a hospital-based radiology department. A comprehensive customer service program was created in the radiology department. Customer service representatives were hired to answer the telephone calls to the radiology reading rooms and to help convey radiology results. The radiologists, referring clinicians, and customer service representatives were then linked via a novel workflow management system. This workflow management system provided tools to help facilitate the communication needs of each group. The number of studies with results conveyed was recorded from the implementation of the workflow management system. Between the implementation of the workflow management system on August 1, 2005, and June 1, 2009, 116,844 radiology results were conveyed to the referring clinicians and documented in the system. This accounts for more than 14% of the 828,516 radiology cases performed in this time frame. We have been successful in creating a comprehensive customer service program to convey and document communication of radiology results. This program has been widely used by the ordering clinicians as well as radiologists since its inception.

  13. MAAMD: a workflow to standardize meta-analyses and comparison of affymetrix microarray data

    PubMed Central

    2014-01-01

    Background Mandatory deposit of raw microarray data files for public access, prior to study publication, provides significant opportunities to conduct new bioinformatics analyses within and across multiple datasets. Analysis of raw microarray data files (e.g., Affymetrix CEL files) can be time consuming and complex, and requires fundamental computational and bioinformatics skills. The development of analytical workflows to automate these tasks simplifies the processing of, improves the efficiency of, and serves to standardize multiple and sequential analyses. Once installed, workflows facilitate the tedious steps required to run rapid intra- and inter-dataset comparisons. Results We developed a workflow to facilitate and standardize Meta-Analysis of Affymetrix Microarray Data analysis (MAAMD) in Kepler. Two freely available stand-alone software tools, R and AltAnalyze, were embedded in MAAMD. The inputs of MAAMD are user-editable csv files, which contain sample information and parameters describing the locations of input files and required tools. MAAMD was tested by analyzing 4 different GEO datasets from mice and Drosophila. MAAMD automates data downloading, data organization, data quality control assessment, differential gene expression analysis, clustering analysis, pathway visualization, gene-set enrichment analysis, and cross-species orthologous-gene comparisons. MAAMD was used to identify gene orthologues responding to hypoxia or hyperoxia in both mice and Drosophila. The entire set of analyses for the 4 datasets (34 total microarrays) finished in approximately one hour. Conclusions MAAMD saves time, minimizes the required computer skills, and offers a standardized procedure for users to analyze microarray datasets and make new intra- and inter-dataset comparisons. PMID:24621103
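
    The CSV-driven design described above can be pictured with a minimal, hypothetical sketch; the file name, column names and step list below are illustrative assumptions and are not MAAMD's actual inputs or steps.

```python
# Illustrative only: drive a fixed sequence of analysis steps from a user-editable
# sample sheet. File name, columns and step names are hypothetical, not MAAMD's.
import csv

STEPS = ["download", "organize", "quality_control", "differential_expression", "enrichment"]

def run_step(step, sample):
    # Placeholder: a real workflow would call out to R / AltAnalyze here.
    print(f"[{step}] dataset={sample['dataset_id']} species={sample['species']}")

def run_workflow(sample_sheet):
    with open(sample_sheet, newline="") as handle:
        for sample in csv.DictReader(handle):
            for step in STEPS:
                run_step(step, sample)

if __name__ == "__main__":
    run_workflow("samples.csv")  # hypothetical sheet with columns: dataset_id, species, ...
```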

  14. Towards a Unified Architecture for Data-Intensive Seismology in VERCE

    NASA Astrophysics Data System (ADS)

    Klampanos, I.; Spinuso, A.; Trani, L.; Krause, A.; Garcia, C. R.; Atkinson, M.

    2013-12-01

    Modern seismology involves managing, storing and processing large datasets, typically geographically distributed across organisations. Performing computational experiments using these data generates more data, which in turn have to be managed, further analysed and frequently be made available within or outside the scientific community. As part of the EU-funded project VERCE (http://verce.eu), we research and develop a number of use-cases and interfacing technologies to satisfy the data-intensive requirements of modern seismology. Our solution seeks to support: (1) familiar programming environments to develop and execute experiments, in particular via Python/ObsPy, (2) a unified view of heterogeneous computing resources, public or private, through the adoption of workflows, (3) monitoring the experiments and validating the data products at varying granularities, via a comprehensive provenance system, (4) reproducibility of experiments and consistency in collaboration, via a shared registry of processing units and contextual metadata (computing resources, data, etc.). Here, we provide a brief account of these components and their roles in the proposed architecture. Our design integrates heterogeneous distributed systems, while allowing researchers to retain current practices and control data handling and execution via higher-level abstractions. At the core of our solution lies the workflow language Dispel. While Dispel can be used to express workflows at fine detail, it may also be used as part of meta- or job-submission workflows. User interaction can be provided through a visual editor or through custom applications on top of parameterisable workflows, which is the approach VERCE follows. According to our design, the scientist may use versions of Dispel/workflow processing elements offered by the VERCE library or override them by introducing custom scientific code, using ObsPy. This approach has the advantage that, while the scientist uses a familiar tool, the resulting workflow can be executed on a number of underlying stream-processing engines, such as STORM or OGSA-DAI, transparently. While making efficient use of arbitrarily distributed resources and large datasets is a priority, such processing requires adequate provenance tracking and monitoring. Hiding computation and orchestration details via a workflow system allows us to embed provenance harvesting where appropriate without impeding the user's regular working patterns. Our provenance model is based on the W3C PROV standard and can provide information of varying granularity regarding execution, systems and data consumption/production. A video demonstrating a prototype provenance exploration tool can be found at http://bit.ly/15t0Fz0. Keeping experimental methodology and results open and accessible, as well as encouraging reproducibility and collaboration, is of central importance to modern science. As our users are expected to be based at different geographical locations, to have access to different computing resources and to employ customised scientific codes, the use of a shared registry of workflow components, implementations, data and computing resources is critical.
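
    The notion of chaining reusable processing elements while harvesting provenance for each invocation can be sketched generically; this is plain Python, not Dispel, and the element names and provenance fields are invented for illustration.

```python
# Generic sketch of composable stream-processing elements with per-call provenance
# records; element names and record fields are invented, not the VERCE library.
from datetime import datetime, timezone

provenance = []

def element(name):
    """Wrap a per-item function so every invocation is logged for provenance."""
    def wrap(func):
        def run(stream):
            for item in stream:
                result = func(item)
                provenance.append({"element": name, "output": result,
                                   "time": datetime.now(timezone.utc).isoformat()})
                yield result
        return run
    return wrap

@element("demean")
def demean(trace):
    mean = sum(trace) / len(trace)
    return [v - mean for v in trace]

@element("peak_amplitude")
def peak_amplitude(trace):
    return max(abs(v) for v in trace)

traces = [[1.0, 2.0, 3.0], [10.0, 0.0, -10.0]]
print(list(peak_amplitude(demean(traces))))   # [1.0, 10.0]
print(len(provenance), "provenance records")  # one record per element invocation
```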

  15. Methylation Integration (Mint) | Informatics Technology for Cancer Research (ITCR)

    Cancer.gov

    A comprehensive software pipeline and set of Galaxy tools/workflows for integrative analysis of genome-wide DNA methylation and hydroxymethylation data. Input data can come from bisulfite sequencing and/or pull-down methods.

  16. Differential gene expression in the siphonophore Nanomia bijuga (Cnidaria) assessed with multiple next-generation sequencing workflows.

    PubMed

    Siebert, Stefan; Robinson, Mark D; Tintori, Sophia C; Goetz, Freya; Helm, Rebecca R; Smith, Stephen A; Shaner, Nathan; Haddock, Steven H D; Dunn, Casey W

    2011-01-01

    We investigated differential gene expression between functionally specialized feeding polyps and swimming medusae in the siphonophore Nanomia bijuga (Cnidaria) with a hybrid long-read/short-read sequencing strategy. We assembled a set of partial gene reference sequences from long-read data (Roche 454), and generated short-read sequences from replicated tissue samples that were mapped to the references to quantify expression. We collected and compared expression data with three short-read expression workflows that differ in sample preparation, sequencing technology, and mapping tools. These workflows were Illumina mRNA-Seq, which generates sequence reads from random locations along each transcript, and two tag-based approaches, SOLiD SAGE and Helicos DGE, which generate reads from particular tag sites. Differences in expression results across workflows were mostly due to the differential impact of missing data in the partial reference sequences. When all 454-derived gene reference sequences were considered, Illumina mRNA-Seq detected more than twice as many differentially expressed (DE) reference sequences as the tag-based workflows. This discrepancy was largely due to missing tag sites in the partial reference that led to false negatives in the tag-based workflows. When only the subset of reference sequences that unambiguously have tag sites was considered, we found broad congruence across workflows, and they all identified a similar set of DE sequences. Our results are promising in several regards for gene expression studies in non-model organisms. First, we demonstrate that a hybrid long-read/short-read sequencing strategy is an effective way to collect gene expression data when an annotated genome sequence is not available. Second, our replicated sampling indicates that expression profiles are highly consistent across field-collected animals in this case. Third, the impacts of partial reference sequences on the ability to detect DE can be mitigated through workflow choice and deeper reference sequencing.

  17. Differential Gene Expression in the Siphonophore Nanomia bijuga (Cnidaria) Assessed with Multiple Next-Generation Sequencing Workflows

    PubMed Central

    Siebert, Stefan; Robinson, Mark D.; Tintori, Sophia C.; Goetz, Freya; Helm, Rebecca R.; Smith, Stephen A.; Shaner, Nathan; Haddock, Steven H. D.; Dunn, Casey W.

    2011-01-01

    We investigated differential gene expression between functionally specialized feeding polyps and swimming medusae in the siphonophore Nanomia bijuga (Cnidaria) with a hybrid long-read/short-read sequencing strategy. We assembled a set of partial gene reference sequences from long-read data (Roche 454), and generated short-read sequences from replicated tissue samples that were mapped to the references to quantify expression. We collected and compared expression data with three short-read expression workflows that differ in sample preparation, sequencing technology, and mapping tools. These workflows were Illumina mRNA-Seq, which generates sequence reads from random locations along each transcript, and two tag-based approaches, SOLiD SAGE and Helicos DGE, which generate reads from particular tag sites. Differences in expression results across workflows were mostly due to the differential impact of missing data in the partial reference sequences. When all 454-derived gene reference sequences were considered, Illumina mRNA-Seq detected more than twice as many differentially expressed (DE) reference sequences as the tag-based workflows. This discrepancy was largely due to missing tag sites in the partial reference that led to false negatives in the tag-based workflows. When only the subset of reference sequences that unambiguously have tag sites was considered, we found broad congruence across workflows, and they all identified a similar set of DE sequences. Our results are promising in several regards for gene expression studies in non-model organisms. First, we demonstrate that a hybrid long-read/short-read sequencing strategy is an effective way to collect gene expression data when an annotated genome sequence is not available. Second, our replicated sampling indicates that expression profiles are highly consistent across field-collected animals in this case. Third, the impacts of partial reference sequences on the ability to detect DE can be mitigated through workflow choice and deeper reference sequencing. PMID:21829563

  18. An Automated Ab Initio Framework for Identifying New Ferroelectrics

    NASA Astrophysics Data System (ADS)

    Smidt, Tess; Reyes-Lillo, Sebastian E.; Jain, Anubhav; Neaton, Jeffrey B.

    Ferroelectric materials have a wide range of technological applications, including non-volatile RAM and optoelectronics. In this work, we present an automated first-principles search for ferroelectrics. We integrate density functional theory, crystal structure databases, symmetry tools, workflow software, and a custom analysis toolkit to build a library of known and proposed ferroelectrics. We screen thousands of candidates using symmetry relations between nonpolar and polar structure pairs. We use two search strategies: (1) polar-nonpolar pairs with the same composition and (2) polar-nonpolar structure type pairs. Results are automatically parsed, stored in a database, and accessible via a web interface showing distortion animations and plots of polarization and total energy as a function of distortion. We benchmark our results against experimental data, present new ferroelectric candidates found through our search, and discuss future work on expanding this search methodology to other material classes such as antiferroelectrics and multiferroics.

  19. The Importance of Experimental Design, Quality Assurance, and Control in Plant Metabolomics Experiments.

    PubMed

    Martins, Marina C M; Caldana, Camila; Wolf, Lucia Daniela; de Abreu, Luis Guilherme Furlan

    2018-01-01

    The output of metabolomics relies to a great extent upon the methods and instrumentation to identify, quantify, and access spatial information on as many metabolites as possible. However, the most modern machines and sophisticated tools for data analysis cannot compensate for inappropriate harvesting and/or sample preparation procedures that modify metabolic composition and can lead to erroneous interpretation of results. In addition, plant metabolism has a remarkable degree of complexity, and the number of identified compounds easily surpasses the number of samples in metabolomics analyses, increasing false discovery risk. These aspects pose a large challenge when carrying out plant metabolomics experiments. In this chapter, we address the importance of a proper experimental design taking into consideration preventable complications and unavoidable factors to achieve success in metabolomics analysis. We also focus on quality control and standardized procedures during the metabolomics workflow.

  20. moocRP: Enabling Open Learning Analytics with an Open Source Platform for Data Distribution, Analysis, and Visualization

    ERIC Educational Resources Information Center

    Pardos, Zachary A.; Whyte, Anthony; Kao, Kevin

    2016-01-01

    In this paper, we address issues of transparency, modularity, and privacy with the introduction of an open source, web-based data repository and analysis tool tailored to the Massive Open Online Course community. The tool integrates data request/authorization and distribution workflow features as well as provides a simple analytics module upload…

  1. HEPDOOP: High-Energy Physics Analysis using Hadoop

    NASA Astrophysics Data System (ADS)

    Bhimji, W.; Bristow, T.; Washbrook, A.

    2014-06-01

    We perform an LHC data analysis workflow using tools and data formats that are commonly used in the "Big Data" community outside High Energy Physics (HEP). These include Apache Avro for serialisation to binary files, Pig and Hadoop for mass data processing, and the Python library scikit-learn for multivariate analysis. A comparison is made with the same analysis performed with current HEP tools in ROOT.
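
    A minimal sketch of the multivariate step mentioned above, using scikit-learn on synthetic event features; the feature distributions and classifier choice are illustrative assumptions and not those of the HEPDOOP analysis.

```python
# Train a simple signal-vs-background classifier on synthetic event features with
# scikit-learn; the data and model choice are invented stand-ins for illustration.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(42)
n = 5000
background = rng.normal(0.0, 1.0, size=(n, 4))
signal = rng.normal(0.5, 1.0, size=(n, 4))      # shifted means mimic a signal
X = np.vstack([background, signal])
y = np.concatenate([np.zeros(n), np.ones(n)])

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
clf = GradientBoostingClassifier().fit(X_train, y_train)
print("ROC AUC:", roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1]))
```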

  2. ASaiM: a Galaxy-based framework to analyze microbiota data.

    PubMed

    Batut, Bérénice; Gravouil, Kévin; Defois, Clémence; Hiltemann, Saskia; Brugère, Jean-François; Peyretaillade, Eric; Peyret, Pierre

    2018-05-22

    New generations of sequencing platforms coupled with numerous bioinformatics tools have led to rapid technological progress in metagenomics and metatranscriptomics for investigating complex microorganism communities. Nevertheless, a combination of different bioinformatics tools remains necessary to draw conclusions from microbiota studies. Modular and user-friendly tools would greatly improve such studies. We therefore developed ASaiM, an open-source, Galaxy-based framework dedicated to microbiota data analyses. ASaiM provides an extensive collection of tools to assemble, extract, explore and visualize microbiota information from raw metataxonomic, metagenomic or metatranscriptomic sequences. To guide the analyses, several customizable workflows are included and are supported by tutorials and Galaxy interactive tours, which guide users through the analyses step by step. ASaiM is implemented as a Galaxy Docker flavour. It is scalable to thousands of datasets, but can also be used on a normal PC. The associated source code is available under the Apache 2 license at https://github.com/ASaiM/framework and documentation can be found online (http://asaim.readthedocs.io). Based on the Galaxy framework, ASaiM offers a sophisticated environment with a variety of tools, workflows, documentation and training to scientists working on complex microorganism communities. It makes analysis and exploration of microbiota data easy, quick, transparent, reproducible and shareable.

  3. A PDA study management tool (SMT) utilizing wireless broadband and full DICOM viewing capability

    NASA Astrophysics Data System (ADS)

    Documet, Jorge; Liu, Brent; Zhou, Zheng; Huang, H. K.; Documet, Luis

    2007-03-01

    During the last 4 years, the IPI (Image Processing and Informatics) Laboratory has been developing a web-based Study Management Tool (SMT) application that allows radiologists, film librarians and PACS-related (Picture Archiving and Communication System) users to dynamically and remotely perform Query/Retrieve operations in a PACS network. Using a regular PDA (Personal Digital Assistant), users can remotely query a PACS archive and distribute any study to an existing DICOM (Digital Imaging and Communications in Medicine) node. This application, which has proven convenient for managing the study workflow [1, 2], has been extended to include a DICOM viewing capability on the PDA. With this new feature, users can take a quick look at DICOM images, gaining mobility and convenience at the same time. In addition, we are extending this application to metropolitan-area wireless broadband networks. This feature requires smart phones that can work as a PDA and have access to broadband wireless services. With the extended application to wireless broadband technology and the preview of DICOM images, the Study Management Tool becomes an even more powerful tool for clinical workflow management.

  4. OPPL-Galaxy, a Galaxy tool for enhancing ontology exploitation as part of bioinformatics workflows

    PubMed Central

    2013-01-01

    Background Biomedical ontologies are key elements for building up the Life Sciences Semantic Web. Reusing and building biomedical ontologies requires flexible and versatile tools to manipulate them efficiently, in particular for enriching their axiomatic content. The Ontology Pre Processor Language (OPPL) is an OWL-based language for automating the changes to be performed in an ontology. OPPL augments the ontologists’ toolbox by providing a more efficient, and less error-prone, mechanism for enriching a biomedical ontology than that obtained by a manual treatment. Results We present OPPL-Galaxy, a wrapper for using OPPL within Galaxy. The functionality delivered by OPPL (i.e. automated ontology manipulation) can be combined with the tools and workflows devised within the Galaxy framework, resulting in an enhancement of OPPL. Use cases are provided in order to demonstrate OPPL-Galaxy’s capability for enriching, modifying and querying biomedical ontologies. Conclusions Coupling OPPL-Galaxy with other bioinformatics tools of the Galaxy framework results in a system that is more than the sum of its parts. OPPL-Galaxy opens a new dimension of analyses and exploitation of biomedical ontologies, including automated reasoning, paving the way towards advanced biological data analyses. PMID:23286517

  5. Build and Execute Environment

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Guan, Qiang

    At exascale, the challenge becomes to develop applications that run at scale and use exascale platforms reliably, efficiently, and flexibly. Workflows become much more complex because they must seamlessly integrate simulation and data analytics. They must include down-sampling, post-processing, feature extraction, and visualization. Power and data transfer limitations require these analysis tasks to be run in-situ or in-transit. We expect successful workflows will comprise multiple linked simulations along with tens of analysis routines. Users will have limited development time at scale and, therefore, must have rich tools to develop, debug, test, and deploy applications. At this scale, successful workflows will compose linked computations from an assortment of reliable, well-defined computation elements, ones that can come and go as required, based on the needs of the workflow over time. We propose a novel framework that utilizes both virtual machines (VMs) and software containers to create a workflow system that establishes a uniform build and execution environment (BEE) beyond the capabilities of current systems. In this environment, applications will run reliably and repeatably across heterogeneous hardware and software. Containers, both commercial (Docker and Rocket) and open-source (LXC and LXD), define a runtime that isolates all software dependencies from the machine operating system. Workflows may contain multiple containers that run different operating systems, different software, and even different versions of the same software. We will run containers in open-source virtual machines (KVM) and emulators (QEMU) so that workflows run on any machine entirely in user-space. On this platform of containers and virtual machines, we will deliver workflow software that provides services, including repeatable execution, provenance, checkpointing, and future proofing. We will capture provenance about how containers were launched and how they interact to annotate workflows for repeatable and partial re-execution. We will coordinate the physical snapshots of virtual machines with parallel programming constructs, such as barriers, to automate checkpoint and restart. We will also integrate with HPC-specific container runtimes to gain access to accelerators and other specialized hardware to preserve native performance. Containers will link development to continuous integration. When application developers check code in, it will automatically be tested on a suite of different software and hardware architectures.

  6. Acceptance of a Mobile Application Supporting Nurses Workflow at Patient Bedside: Results from a Pilot Study.

    PubMed

    Ehrler, Frederic; Ducloux, Pascal; Wu, Danny T Y; Lovis, Christian; Blondon, Katherine

    2018-01-01

    Supporting caregivers' workflow with mobile applications (apps) is a growing trend. At the bedside, apps can provide new ways to support the documentation process rather than using a desktop computer in a nursing office. Although these applications show potential, few existing reports have studied the real impact of such solutions. At the University Hospitals of Geneva, we developed BEDside Mobility, a mobile application supporting nurses' daily workflow. In a pilot study, the app was trialed in two wards for a period of one month. We collected data on actual usage of the app and asked the users to complete a tailored technology acceptance model questionnaire at the end of the study period. Results show that participation remained stable over time, with participants using the tool for almost 29 minutes per day on average. The technology acceptance questionnaires revealed high usability of the app and good promotion by the institution, although users did not perceive any increase in productivity. Overall, intention to use diverged between promoters and antagonists. Furthermore, some participants considered the tool an addition to their workload. This evaluation underlines the importance of helping all end users perceive the benefits of a new intervention, since coworkers strongly influence each other.

  7. DNAseq Workflow in a Diagnostic Context and an Example of a User Friendly Implementation.

    PubMed

    Wolf, Beat; Kuonen, Pierre; Dandekar, Thomas; Atlan, David

    2015-01-01

    Over recent years, next-generation sequencing (NGS) technologies have evolved from costly tools used by very few into a much more accessible and economically viable technology. Through this recently gained popularity, their use cases have expanded from research environments into clinical settings. But the technical know-how and infrastructure required to analyze the data remain an obstacle to wider adoption of this technology, especially in smaller laboratories. We present GensearchNGS, a commercial DNAseq software suite distributed by Phenosystems SA. The focus of GensearchNGS is the optimal usage of already existing infrastructure, while keeping its use simple. This is achieved through the integration of existing tools in a comprehensive software environment, as well as custom algorithms developed with the restrictions of limited infrastructures in mind. This includes the possibility of connecting multiple computers to speed up computationally intensive parts of the analysis, such as sequence alignments. We present a typical DNAseq workflow for NGS data analysis and the approach GensearchNGS takes to implement it. The presented workflow goes from raw data quality control to the final variant report. This includes features such as gene panels and the integration of online databases, like Ensembl for annotations or Cafe Variome for variant sharing.

  8. Development of a tethered personal health record framework for early end-of-life discussions.

    PubMed

    Bose-Brill, Seuli; Kretovics, Matthew; Ballenger, Taylor; Modan, Gabriella; Lai, Albert; Belanger, Lindsay; Koesters, Stephen; Pressler-Vydra, Taylor; Wills, Celia

    2016-06-01

    End-of-life planning, known as advance care planning (ACP), is associated with numerous positive outcomes, such as improved patient satisfaction with care and improved patient quality of life in terminal illness. However, patient-provider ACP conversations are rarely performed or documented due to a number of barriers, including the time required, perceived lack of skill, and a limited number of resources. Use of tethered personal health records (PHRs) may help streamline ACP conversations and documentation in outpatient workflows. Our objective was to develop an ACP-PHR framework for use in a primary care, outpatient setting. Qualitative content analysis of focus groups and cognitive interviews (participatory design). A novel PHR-ACP tool was developed and tested using data and feedback collected from 4 patient focus groups (n = 13), 1 provider focus group (n = 4), and cognitive interviews (n = 22). Patient focus groups helped develop a focused, 4-question PHR communication tool. Cognitive interviews revealed that, while patients felt framework content and workflow were generally intuitive, minor changes to content and workflow would optimize the framework. A focused framework for electronic ACP communication using a patient portal tethered to the PHR was developed. This framework may provide an efficient way to have ACP conversations in busy outpatient settings.

  9. Process Mining for Individualized Behavior Modeling Using Wireless Tracking in Nursing Homes

    PubMed Central

    Fernández-Llatas, Carlos; Benedi, José-Miguel; García-Gómez, Juan M.; Traver, Vicente

    2013-01-01

    The analysis of human behavior patterns is increasingly used in several research fields. The individualized modeling of behavior using classical techniques requires too much time and resources to be effective. A possible solution would be the use of pattern recognition techniques to automatically infer models that allow experts to understand individual behavior. However, traditional pattern recognition algorithms infer models that are not readily understood by human experts. This limits the capacity to benefit from the inferred models. Process mining technologies can infer models as workflows, specifically designed to be understood by experts, enabling them to detect specific behavior patterns in users. In this paper, the eMotiva process mining algorithms are presented. These algorithms filter, infer and visualize workflows. The workflows are inferred from the samples produced by an indoor location system that stores the location of a resident in a nursing home. The visualization tool is able to compare and highlight behavior patterns in order to facilitate expert understanding of human behavior. This tool was tested with nine real users who were monitored over a 25-week period. The results suggest that the behavior of users is continuously evolving and changing, and that this change can be measured, allowing for behavioral change detection. PMID:24225907
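
    The core idea of inferring a workflow-like model from location samples can be conveyed with a toy sketch; this is not the eMotiva algorithm, and the locations and timestamps below are invented.

```python
# Toy illustration: derive a weighted transition graph from a time-ordered log of
# (timestamp, location) events, the kind of structure an expert could inspect.
from collections import Counter
from itertools import pairwise  # Python 3.10+

def transition_counts(events):
    """events: list of (timestamp, location) tuples for one resident, time-ordered."""
    locations = [loc for _, loc in events]
    return Counter(pairwise(locations))

log = [("08:00", "bedroom"), ("08:30", "dining room"), ("09:10", "lounge"),
       ("12:00", "dining room"), ("13:00", "lounge"), ("19:00", "bedroom")]
for (src, dst), count in transition_counts(log).most_common():
    print(f"{src} -> {dst}: {count}")
```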

  10. HoloVir: A Workflow for Investigating the Diversity and Function of Viruses in Invertebrate Holobionts

    PubMed Central

    Laffy, Patrick W.; Wood-Charlson, Elisha M.; Turaev, Dmitrij; Weynberg, Karen D.; Botté, Emmanuelle S.; van Oppen, Madeleine J. H.; Webster, Nicole S.; Rattei, Thomas

    2016-01-01

    Abundant bioinformatics resources are available for the study of complex microbial metagenomes; however, their utility in viral metagenomics is limited. HoloVir is a robust and flexible data analysis pipeline that provides an optimized and validated workflow for taxonomic and functional characterization of viral metagenomes derived from invertebrate holobionts. Simulated viral metagenomes comprising varying levels of viral diversity and abundance were used to determine the optimal assembly and gene prediction strategy, and multiple sequence assembly methods and gene prediction tools were tested in order to optimize our analysis workflow. HoloVir performs pairwise comparisons of single read and predicted gene datasets against the viral RefSeq database to assign taxonomy, and additional comparison to phage-specific and cellular markers is undertaken to support the taxonomic assignments and identify potential cellular contamination. Broad functional classification of the predicted genes is provided by assignment of COG microbial functional category classifications using EggNOG, and higher-resolution functional analysis is achieved by searching for enrichment of specific Swiss-Prot keywords within the viral metagenome. Application of HoloVir to viral metagenomes from the coral Pocillopora damicornis and the sponge Rhopaloeides odorabile demonstrated that HoloVir provides a valuable tool to characterize holobiont viral communities across species, environments, or experiments. PMID:27375564

  11. Technology in the OR: AORN Members' Perceptions of the Effects on Workflow Efficiency and Quality Patient Care.

    PubMed

    Sipes, Carolyn; Baker, Joy Don

    2015-09-01

    This collaborative study sought to describe technology used by AORN members at work, inclusive of radio-frequency identification or barcode scanning (RFID), data collection tools (DATA), workflow or dashboard management tools (DASHBOARD), and environmental services/room decontamination technologies (ENVIRON), and to identify the perceived effects of each technology on workflow efficiency (WFE) and quality patient care (QPC). The 462 respondents to the AORN Technology in the OR survey reported use of technology (USE) in all categories. Eleven of 17 RFID items had a strong positive correlation between the designated USE item and the perceived effect on WFE and QPC. Five of the most-used technology items were found in the DATA category. Two of the five related to Intraoperative Nursing Documentation and the use of the Perioperative Nursing Data Set. The other three related to Imaging Integration for Radiology Equipment, Video Camera Systems, and Fiber-optic Systems. All three elements explored in the DASHBOARD category (ie, Patient Update, OR Case, OR Efficiency) demonstrated approximately 50% or greater perceived effectiveness in WFE and QPC. There was a low reported use of ENVIRON technologies, resulting in limited WFE and QPC data for this category. Copyright © 2015 AORN, Inc. Published by Elsevier Inc. All rights reserved.

  12. Knowledge Annotations in Scientific Workflows: An Implementation in Kepler

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gandara, Aida G.; Chin, George; Pinheiro Da Silva, Paulo

    2011-07-20

    Scientific research products are the result of long-term collaborations between teams. Scientific workflows are capable of helping scientists in many ways, including the collection of information as to how research was conducted; e.g., scientific workflow tools often collect and manage information about datasets used and data transformations. However, knowledge about why data was collected is rarely documented in scientific workflows. In this paper we describe a prototype system built to support the collection of scientific expertise that influences scientific analysis. Through evaluating a scientific research effort underway at Pacific Northwest National Laboratory, we identified features that would most benefit PNNL scientists in documenting how and why they conduct their research, making this information available to the entire team. The prototype system was built by enhancing the Kepler Scientific Workflow System to create knowledge-annotated scientific workflows and to publish them as semantic annotations.

  13. Home care and technology: a case study.

    PubMed

    Stroulia, Eleni; Nikolaidis, Ioanis; Liu, Lili; King, Sharla; Lessard, Lysanne

    2012-01-01

    Health care aides (HCAs) are the backbone of the home care system and provide a range of services to people who, for various reasons related to chronic conditions and aging, are not able to take care of themselves independently. The demand for HCA services will increase and the current HCA supply will likely not keep up with this increasing demand without fundamental changes in the current environment. Information and communication technology (ICT) can address some of the workflow challenges HCAs face. In this project, we conducted an ethnographic study to document and analyse HCAs' workflows and team interactions. Based on our findings, we designed an ICT tool suite, integrating easily available existing and newly developed (by our team) technologies to address these issues. Finally, we simulated the deployment of our technologies, to assess the potential impact of these technological solutions on the workflow and productivity of HCAs, their healthcare teams and client care.

  14. Enhanced reproducibility of SADI web service workflows with Galaxy and Docker.

    PubMed

    Aranguren, Mikel Egaña; Wilkinson, Mark D

    2015-01-01

    Semantic Web technologies have been widely applied in the life sciences, for example by data providers such as OpenLifeData and through web services frameworks such as SADI. The recently reported OpenLifeData2SADI project offers access to the vast OpenLifeData data store through SADI services. This article describes how to merge data retrieved from OpenLifeData2SADI with other SADI services using the Galaxy bioinformatics analysis platform, thus making this semantic data more amenable to complex analyses. This is demonstrated using a working example, which is made distributable and reproducible through a Docker image that includes SADI tools, along with the data and workflows that constitute the demonstration. The combination of Galaxy and Docker offers a solution for faithfully reproducing and sharing complex data retrieval and analysis workflows based on the SADI Semantic web service design patterns.

  15. CONNJUR Workflow Builder: A software integration environment for spectral reconstruction

    PubMed Central

    Fenwick, Matthew; Weatherby, Gerard; Vyas, Jay; Sesanker, Colbert; Martyn, Timothy O.; Ellis, Heidi J.C.; Gryk, Michael R.

    2015-01-01

    CONNJUR Workflow Builder (WB) is an open-source software integration environment that leverages existing spectral reconstruction tools to create a synergistic, coherent platform for converting biomolecular NMR data from the time domain to the frequency domain. WB provides data integration of primary data and metadata using a relational database, and includes a library of pre-built workflows for processing time domain data. WB simplifies maximum entropy reconstruction, facilitating the processing of non-uniformly sampled time domain data. As will be shown in the paper, the unique features of WB provide it with novel abilities to enhance the quality, accuracy, and fidelity of the spectral reconstruction process. WB also provides features that support collaboration, education, parameterization, and non-uniformly sampled data sets, along with processing integrated with the Rowland NMR Toolkit (RNMRTK) and NMRPipe software packages. WB is available free of charge in perpetuity, dual-licensed under the MIT and GPL open source licenses. PMID:26066803

  16. Clinic Workflow Simulations using Secondary EHR Data

    PubMed Central

    Hribar, Michelle R.; Biermann, David; Read-Brown, Sarah; Reznick, Leah; Lombardi, Lorinna; Parikh, Mansi; Chamberlain, Winston; Yackel, Thomas R.; Chiang, Michael F.

    2016-01-01

    Clinicians today face increased patient loads, decreased reimbursements and potential negative productivity impacts of using electronic health records (EHR), but have little guidance on how to improve clinic efficiency. Discrete event simulation models are powerful tools for evaluating clinical workflow and improving efficiency, particularly when they are built from secondary EHR timing data. The purpose of this study is to demonstrate that these simulation models can be used for resource allocation decision making as well as for evaluating novel scheduling strategies in outpatient ophthalmology clinics. Key findings from this study are that: 1) secondary use of EHR timestamp data in simulation models represents clinic workflow, 2) simulations provide insight into the best allocation of resources in a clinic, 3) simulations provide critical information for schedule creation and decision making by clinic managers, and 4) simulation models built from EHR data are potentially generalizable. PMID:28269861
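
    The flavour of such a discrete-event model can be conveyed with a bare-bones sketch of a single-provider clinic; the interarrival and exam-time distributions below are invented, whereas the study derives its timings from EHR timestamps.

```python
# Bare-bones single-provider clinic model: patients arrive, wait if the provider is
# busy, and are seen in order; reports the mean wait over one simulated session.
import random

def simulate(n_patients=20, mean_interarrival=10.0, mean_exam=12.0, seed=1):
    random.seed(seed)
    clock = 0.0
    provider_free = 0.0
    waits = []
    for _ in range(n_patients):
        clock += random.expovariate(1.0 / mean_interarrival)   # next arrival time
        start = max(clock, provider_free)                       # wait if provider busy
        waits.append(start - clock)
        provider_free = start + random.expovariate(1.0 / mean_exam)
    return sum(waits) / len(waits)

print(f"mean wait: {simulate():.1f} minutes")
```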

  17. Jflow: a workflow management system for web applications.

    PubMed

    Mariette, Jérôme; Escudié, Frédéric; Bardou, Philippe; Nabihoudine, Ibouniyamine; Noirot, Céline; Trotard, Marie-Stéphane; Gaspin, Christine; Klopp, Christophe

    2016-02-01

    Biologists produce large data sets and need rich and simple web portals in which they can upload and analyze their files. Providing such tools requires masking the complexity induced by the required High Performance Computing (HPC) environment. The connection between interface and computing infrastructure is usually specific to each portal. With Jflow, we introduce a Workflow Management System (WMS) composed of jQuery plug-ins, which can easily be embedded in any web application, and a Python library providing all required features to set up, run and monitor workflows. Jflow is available under the GNU General Public License (GPL) at http://bioinfo.genotoul.fr/jflow. The package comes with full documentation, a quick start guide and a running test portal. Jerome.Mariette@toulouse.inra.fr. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  18. Customized workflow development and data modularization concepts for RNA-Sequencing and metatranscriptome experiments.

    PubMed

    Lott, Steffen C; Wolfien, Markus; Riege, Konstantin; Bagnacani, Andrea; Wolkenhauer, Olaf; Hoffmann, Steve; Hess, Wolfgang R

    2017-11-10

    RNA-Sequencing (RNA-Seq) has become a widely used approach to study quantitative and qualitative aspects of transcriptome data. The variety of RNA-Seq protocols, experimental study designs and the characteristic properties of the organisms under investigation greatly affect downstream and comparative analyses. In this review, we aim to explain the impact of structured pre-selection, classification and integration of best-performing tools within modularized data analysis workflows and ready-to-use computing infrastructures on experimental data analyses. We highlight example workflows and use cases for prokaryotic, eukaryotic and mixed dual RNA-Seq (meta-transcriptomics) experiments. In addition, we summarize the expertise of the laboratories participating in the project consortium "Structured Analysis and Integration of RNA-Seq experiments" (de.STAIR) and its integration with the Galaxy workbench of the RNA Bioinformatics Center (RBC). Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.

  19. Theoretical predictor for candidate structure assignment from IMS data of biomolecule-related conformational space.

    PubMed

    Schenk, Emily R; Nau, Frederic; Fernandez-Lima, Francisco

    2015-06-01

    The ability to correlate experimental ion mobility data with candidate structures from theoretical modeling provides a powerful analytical and structural tool for the characterization of biomolecules. In the present paper, a theoretical workflow is described to generate and assign candidate structures for experimental trapped ion mobility and H/D exchange (HDX-TIMS-MS) data following molecular dynamics simulations and statistical filtering. The applicability of the theoretical predictor is illustrated for a peptide and protein example with multiple conformations and kinetic intermediates. The described methodology yields a low computational cost and a simple workflow by incorporating statistical filtering and molecular dynamics simulations. The workflow can be adapted to different IMS scenarios and CCS calculators for a more accurate description of the IMS experimental conditions. For the case of the HDX-TIMS-MS experiments, molecular dynamics in the "TIMS box" accounts for a better sampling of the molecular intermediates and local energy minima.

  20. Genetic analysis of circulating tumor cells in pancreatic cancer patients: A pilot study.

    PubMed

    Görner, Karin; Bachmann, Jeannine; Holzhauer, Claudia; Kirchner, Roland; Raba, Katharina; Fischer, Johannes C; Martignoni, Marc E; Schiemann, Matthias; Alunni-Fabbroni, Marianna

    2015-07-01

    Pancreatic cancer is one of the most aggressive malignant tumors, mainly due to aggressive metastatic spreading. In recent years, circulating tumor cells (CTCs) have become associated with tumor metastasis, yet little is known about their expression profiles. The aim of this study was to develop a complete workflow for isolating circulating tumor cells from patients with pancreatic cancer and characterizing them genetically. We show that the proposed workflow offers technical sensitivity and specificity high enough to detect and isolate single tumor cells. Moreover, our approach makes it feasible to genetically characterize single CTCs. Our work describes a complete workflow to detect, count and genetically analyze individual CTCs isolated from blood samples. This method has a central impact on the early detection of metastasis development. The combination of cell quantification and genetic analysis provides clinicians with a powerful tool not available so far. Copyright © 2015. Published by Elsevier Inc.

  1. CONNJUR Workflow Builder: a software integration environment for spectral reconstruction.

    PubMed

    Fenwick, Matthew; Weatherby, Gerard; Vyas, Jay; Sesanker, Colbert; Martyn, Timothy O; Ellis, Heidi J C; Gryk, Michael R

    2015-07-01

    CONNJUR Workflow Builder (WB) is an open-source software integration environment that leverages existing spectral reconstruction tools to create a synergistic, coherent platform for converting biomolecular NMR data from the time domain to the frequency domain. WB provides data integration of primary data and metadata using a relational database, and includes a library of pre-built workflows for processing time domain data. WB simplifies maximum entropy reconstruction, facilitating the processing of non-uniformly sampled time domain data. As will be shown in the paper, the unique features of WB provide it with novel abilities to enhance the quality, accuracy, and fidelity of the spectral reconstruction process. WB also provides features that support collaboration, education, parameterization, and non-uniformly sampled data sets, along with processing integrated with the Rowland NMR Toolkit (RNMRTK) and NMRPipe software packages. WB is available free of charge in perpetuity, dual-licensed under the MIT and GPL open source licenses.

  2. IceProd 2 Usage Experience

    NASA Astrophysics Data System (ADS)

    Delventhal, D.; Schultz, D.; Diaz Velez, J. C.

    2017-10-01

    IceProd is a data processing and management framework developed by the IceCube Neutrino Observatory for processing of Monte Carlo simulations, detector data, and data driven analysis. It runs as a separate layer on top of grid and batch systems. This is accomplished by a set of daemons which process job workflow, maintaining configuration and status information on the job before, during, and after processing. IceProd can also manage complex workflow DAGs across distributed computing grids in order to optimize usage of resources. IceProd has recently been rewritten to increase its scaling capabilities, handle user analysis workflows together with simulation production, and facilitate the integration with 3rd party scheduling tools. IceProd 2, the second generation of IceProd, has been running in production for several months now. We share our experience setting up the system and things we’ve learned along the way.
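
    The DAG bookkeeping such a framework performs can be sketched in a few lines of generic Python; this is not IceProd's API, and the job names and dependencies are hypothetical.

```python
# Resolve hypothetical job dependencies into an execution order that respects the DAG.
from graphlib import TopologicalSorter  # Python 3.9+

dag = {
    "generate":  [],
    "propagate": ["generate"],
    "detector":  ["propagate"],
    "filter":    ["detector"],
    "analysis":  ["filter"],
}

order = list(TopologicalSorter(dag).static_order())
print(order)  # jobs listed so that every job appears after its prerequisites
```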

  3. A ChIP-Seq Data Analysis Pipeline Based on Bioconductor Packages.

    PubMed

    Park, Seung-Jin; Kim, Jong-Hwan; Yoon, Byung-Ha; Kim, Seon-Young

    2017-03-01

    Nowadays, huge volumes of chromatin immunoprecipitation-sequencing (ChIP-Seq) data are generated to increase the knowledge on DNA-protein interactions in the cell, and accordingly, many tools have been developed for ChIP-Seq analysis. Here, we provide an example of a streamlined workflow for ChIP-Seq data analysis composed of only four packages in Bioconductor: dada2, QuasR, mosaics, and ChIPseeker. 'dada2' performs trimming of the high-throughput sequencing data. 'QuasR' and 'mosaics' perform quality control and mapping of the input reads to the reference genome and peak calling, respectively. Finally, 'ChIPseeker' performs annotation and visualization of the called peaks. This workflow runs well independently of operating systems (e.g., Windows, Mac, or Linux) and processes the input fastq files into various results in one run. R code is available at github: https://github.com/ddhb/Workflow_of_Chipseq.git.

  4. A ChIP-Seq Data Analysis Pipeline Based on Bioconductor Packages

    PubMed Central

    Park, Seung-Jin; Kim, Jong-Hwan; Yoon, Byung-Ha; Kim, Seon-Young

    2017-01-01

    Nowadays, huge volumes of chromatin immunoprecipitation-sequencing (ChIP-Seq) data are generated to increase the knowledge on DNA-protein interactions in the cell, and accordingly, many tools have been developed for ChIP-Seq analysis. Here, we provide an example of a streamlined workflow for ChIP-Seq data analysis composed of only four packages in Bioconductor: dada2, QuasR, mosaics, and ChIPseeker. ‘dada2’ performs trimming of the high-throughput sequencing data. ‘QuasR’ and ‘mosaics’ perform quality control and mapping of the input reads to the reference genome and peak calling, respectively. Finally, ‘ChIPseeker’ performs annotation and visualization of the called peaks. This workflow runs well independently of operating systems (e.g., Windows, Mac, or Linux) and processes the input fastq files into various results in one run. R code is available at github: https://github.com/ddhb/Workflow_of_Chipseq.git. PMID:28416945

  5. Adaptive Workflows for Diabetes Management: Self-Management Assistant and Remote Treatment for Diabetes.

    PubMed

    Contreras, Iván; Kiefer, Stephan; Vehi, Josep

    2017-01-01

    Diabetes self-management is a crucial element for all people with diabetes and those at risk of developing the disease. Diabetic patients should be empowered to increase their self-management skills in order to prevent or delay the complications of diabetes. This work presents the proposal and first development stages of a smartphone application focused on the empowerment of patients with diabetes. The concept of this interventional tool is based on personalizing the user experience from an adaptive and dynamic perspective. The segmentation of the population and the dynamic treatment of user profiles across the different experience levels are the main challenges of the implementation. The self-management assistant and remote treatment for diabetes aims to develop a platform that integrates a series of innovative models and tools, rigorously tested and supported by the diabetes research literature, together with the use of a proven engine to manage workflows for healthcare.

  6. Optimizing Perioperative Decision Making: Improved Information for Clinical Workflow Planning

    PubMed Central

    Doebbeling, Bradley N.; Burton, Matthew M.; Wiebke, Eric A.; Miller, Spencer; Baxter, Laurence; Miller, Donald; Alvarez, Jorge; Pekny, Joseph

    2012-01-01

    Perioperative care is complex and involves multiple interconnected subsystems. Delayed starts, prolonged cases and overtime are common. Surgical procedures account for 40–70% of hospital revenues and 30–40% of total costs. Most planning and scheduling in healthcare is done without modern planning tools, which have potential for improving access by assisting in operations planning support. We identified key planning scenarios of interest to perioperative leaders in order to examine the feasibility of applying combinatorial optimization software to some of those planning issues in the operative setting. Perioperative leaders desire a broad range of tools for planning and assessing alternate solutions. Our models generated feasible solutions that varied as expected based on resource and policy assumptions, and found better utilization of scarce resources. Combinatorial optimization modeling can effectively evaluate alternatives to support key decisions for planning clinical workflow and improving care efficiency and satisfaction. PMID:23304284
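
    One of the planning questions raised above, assigning cases to operating rooms, can be illustrated with a simple greedy heuristic; this is a toy sketch with invented case durations, not the combinatorial optimization model used in the study.

```python
# Greedy longest-case-first assignment of surgical cases (durations in minutes) to
# operating rooms, aiming to keep the busiest room's finish time low.
import heapq

def assign_cases(durations, n_rooms):
    rooms = [(0, r) for r in range(n_rooms)]          # (minutes booked, room id)
    heapq.heapify(rooms)
    schedule = {r: [] for r in range(n_rooms)}
    for case, minutes in sorted(durations.items(), key=lambda kv: -kv[1]):
        booked, room = heapq.heappop(rooms)           # least-loaded room so far
        schedule[room].append(case)
        heapq.heappush(rooms, (booked + minutes, room))
    return schedule, max(booked for booked, _ in rooms)

cases = {"hernia": 90, "appendectomy": 60, "bypass": 240, "cataract": 30, "knee": 120}
schedule, makespan = assign_cases(cases, n_rooms=2)
print(schedule, "makespan:", makespan)
```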

  7. A case study for cloud based high throughput analysis of NGS data using the globus genomics system

    DOE PAGES

    Bhuvaneshwar, Krithika; Sulakhe, Dinanath; Gauba, Robinder; ...

    2015-01-01

    Next generation sequencing (NGS) technologies produce massive amounts of data requiring a powerful computational infrastructure, high quality bioinformatics software, and skilled personnel to operate the tools. We present a case study of a practical solution to this data management and analysis challenge that simplifies terabyte scale data handling and provides advanced tools for NGS data analysis. These capabilities are implemented using the “Globus Genomics” system, which is an enhanced Galaxy workflow system made available as a service that offers users the capability to process and transfer data easily, reliably and quickly to address end-to-end NGS analysis requirements. The Globus Genomics system is built on Amazon's cloud computing infrastructure. The system takes advantage of elastic scaling of compute resources to run multiple workflows in parallel and it also helps meet the scale-out analysis needs of modern translational genomics research.

  8. Systematic Redaction for Neuroimage Data

    PubMed Central

    Matlock, Matt; Schimke, Nakeisha; Kong, Liang; Macke, Stephen; Hale, John

    2013-01-01

    In neuroscience, collaboration and data sharing are undermined by concerns over the management of protected health information (PHI) and personal identifying information (PII) in neuroimage datasets. The HIPAA Privacy Rule mandates measures for the preservation of subject privacy in neuroimaging studies. Unfortunately for the researcher, the management of information privacy is a burdensome task. Wide scale data sharing of neuroimages is challenging for three primary reasons: (i) A dearth of tools to systematically expunge PHI/PII from neuroimage data sets, (ii) a facility for tracking patient identities in redacted datasets has not been produced, and (iii) a sanitization workflow remains conspicuously absent. This article describes the XNAT Redaction Toolkit—an integrated redaction workflow which extends a popular neuroimage data management toolkit to remove PHI/PII from neuroimages. Quickshear defacing is also presented as a complementary technique for deidentifying the image data itself. Together, these tools improve subject privacy through systematic removal of PII/PHI. PMID:24179597
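
    The header-level part of such redaction can be sketched schematically; this is not the XNAT Redaction Toolkit, and the field names and key-store design below are illustrative assumptions.

```python
# Strip identifying fields from an image header and keep a separate key linking the
# study code back to the subject, so redacted data can still be tracked internally.
import uuid

PHI_FIELDS = {"PatientName", "PatientBirthDate", "PatientAddress", "ReferringPhysicianName"}

def redact(header, key_store):
    """Return a de-identified copy of a metadata dict; record the linkage separately."""
    code = key_store.setdefault(header.get("PatientID"), f"SUBJ-{uuid.uuid4().hex[:8]}")
    clean = {k: v for k, v in header.items() if k not in PHI_FIELDS}
    clean["PatientID"] = code
    return clean

key_store = {}
header = {"PatientID": "12345", "PatientName": "Doe^Jane", "StudyDate": "20130101",
          "PatientBirthDate": "19800101", "Modality": "MR"}
print(redact(header, key_store))
```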

  9. Optimizing perioperative decision making: improved information for clinical workflow planning.

    PubMed

    Doebbeling, Bradley N; Burton, Matthew M; Wiebke, Eric A; Miller, Spencer; Baxter, Laurence; Miller, Donald; Alvarez, Jorge; Pekny, Joseph

    2012-01-01

    Perioperative care is complex and involves multiple interconnected subsystems. Delayed starts, prolonged cases and overtime are common. Surgical procedures account for 40-70% of hospital revenues and 30-40% of total costs. Most planning and scheduling in healthcare is done without modern planning tools, which have potential for improving access by assisting in operations planning support. We identified key planning scenarios of interest to perioperative leaders in order to examine the feasibility of applying combinatorial optimization software to some of those planning issues in the operative setting. Perioperative leaders desire a broad range of tools for planning and assessing alternate solutions. Our models generated feasible solutions that varied as expected based on resource and policy assumptions, and found better utilization of scarce resources. Combinatorial optimization modeling can effectively evaluate alternatives to support key decisions for planning clinical workflow and improving care efficiency and satisfaction.

  10. A case study for cloud based high throughput analysis of NGS data using the globus genomics system

    PubMed Central

    Bhuvaneshwar, Krithika; Sulakhe, Dinanath; Gauba, Robinder; Rodriguez, Alex; Madduri, Ravi; Dave, Utpal; Lacinski, Lukasz; Foster, Ian; Gusev, Yuriy; Madhavan, Subha

    2014-01-01

    Next generation sequencing (NGS) technologies produce massive amounts of data requiring a powerful computational infrastructure, high quality bioinformatics software, and skilled personnel to operate the tools. We present a case study of a practical solution to this data management and analysis challenge that simplifies terabyte scale data handling and provides advanced tools for NGS data analysis. These capabilities are implemented using the “Globus Genomics” system, which is an enhanced Galaxy workflow system made available as a service that offers users the capability to process and transfer data easily, reliably and quickly to address end-to-end NGS analysis requirements. The Globus Genomics system is built on Amazon's cloud computing infrastructure. The system takes advantage of elastic scaling of compute resources to run multiple workflows in parallel and it also helps meet the scale-out analysis needs of modern translational genomics research. PMID:26925205

  11. MetaMeta: integrating metagenome analysis tools to improve taxonomic profiling.

    PubMed

    Piro, Vitor C; Matschkowski, Marcel; Renard, Bernhard Y

    2017-08-14

    Many metagenome analysis tools are presently available to classify sequences and profile environmental samples. In particular, taxonomic profiling and binning methods are commonly used for such tasks. Tools in these two categories make use of several techniques, e.g., read mapping, k-mer alignment, and composition analysis. Variations in the construction of the corresponding reference sequence databases are also common. In addition, different tools provide good results on different datasets and configurations. All this variation creates a complicated scenario for researchers deciding which methods to use. Installation, configuration and execution can also be difficult, especially when dealing with multiple datasets and tools. We propose MetaMeta: a pipeline to execute and integrate results from metagenome analysis tools. MetaMeta provides an easy workflow to run multiple tools with multiple samples, producing a single enhanced output profile for each sample. MetaMeta includes database generation, pre-processing, execution, and integration steps, allowing easy execution and parallelization. The integration relies on the co-occurrence of organisms across different methods as the main feature to improve community profiling while accounting for differences in their databases. In a controlled case with simulated and real data, we show that the integrated profiles of MetaMeta outperform the best single profile. Using the same input data, it provides more sensitive and reliable results, with the presence of each organism supported by several methods. MetaMeta uses Snakemake and has six pre-configured tools, all available from the BioConda channel for easy installation (conda install -c bioconda metameta). The MetaMeta pipeline is open-source and can be downloaded at: https://gitlab.com/rki_bioinformatics .
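
    The co-occurrence integration idea can be sketched with a toy merge of per-tool profiles; the tool names and abundances are invented, and this is not MetaMeta's actual integration step.

```python
# Keep taxa reported by at least `min_tools` tools, average their reported relative
# abundances, and re-normalise the merged profile.
from collections import defaultdict

def merge_profiles(profiles, min_tools=2):
    """profiles: mapping of tool name -> {taxon: relative abundance}."""
    support = defaultdict(list)
    for tool, profile in profiles.items():
        for taxon, abundance in profile.items():
            support[taxon].append(abundance)
    merged = {t: sum(a) / len(a) for t, a in support.items() if len(a) >= min_tools}
    total = sum(merged.values()) or 1.0
    return {t: a / total for t, a in merged.items()}

profiles = {
    "tool_a": {"E. coli": 0.6, "B. subtilis": 0.4},
    "tool_b": {"E. coli": 0.5, "B. subtilis": 0.3, "spurious sp.": 0.2},
    "tool_c": {"E. coli": 0.7, "B. subtilis": 0.3},
}
print(merge_profiles(profiles))  # the singleton "spurious sp." is dropped
```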

  12. Workflow Management for Complex HEP Analyses

    NASA Astrophysics Data System (ADS)

    Erdmann, M.; Fischer, R.; Rieger, M.; von Cube, R. F.

    2017-10-01

    We present the novel Analysis Workflow Management (AWM) that provides users with the tools and competences of professional large scale workflow systems, e.g. Apache’s Airavata [1]. The approach represents a paradigm shift from executing parts of the analysis to defining the analysis. Within AWM an analysis consists of steps. For example, a step defines the execution of a certain executable for multiple files of an input data collection. Each call to the executable for one of those input files can be submitted to the desired run location, which could be the local computer or a remote batch system. An integrated software manager enables automated user installation of dependencies in the working directory at the run location. Each execution of a step item creates one report for bookkeeping purposes containing error codes and output data or file references. Required files, e.g. those created by previous steps, are retrieved automatically. Since data storage and run locations are exchangeable from the step's perspective, computing resources can be used opportunistically. A visualization of the workflow as a graph of the steps in the web browser provides a high-level view of the analysis. The workflow system is developed and tested alongside a ttbb cross section measurement where, for instance, the event selection is represented by one step and a Bayesian statistical inference is performed by another. The clear interfaces and dependencies between steps enable a make-like execution of the whole analysis.
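
    The make-like, step-and-dependency model described here can be sketched generically (this is not AWM's API): each step declares the steps it requires and the file it produces, and steps are executed in dependency order, skipping work whose output already exists.

      # Generic make-like step execution sketch (illustrative only, not the AWM API).
      from pathlib import Path

      class Step:
          def __init__(self, name, requires, output, run):
              self.name = name          # step identifier
              self.requires = requires  # names of steps this step depends on
              self.output = output      # file produced by the step (used for bookkeeping)
              self.run = run            # callable performing the work

      def execute(steps):
          """Run steps in dependency order, skipping steps whose output already exists."""
          by_name = {s.name: s for s in steps}
          done = set()
          def resolve(step):
              if step.name in done:
                  return
              for dep in step.requires:
                  resolve(by_name[dep])
              if not Path(step.output).exists():
                  step.run()
              done.add(step.name)
          for s in steps:
              resolve(s)

      steps = [
          Step("select", [], "selected.txt",
               lambda: Path("selected.txt").write_text("selected events\n")),
          Step("infer", ["select"], "posterior.txt",
               lambda: Path("posterior.txt").write_text("posterior summary\n")),
      ]
      execute(steps)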

  13. The road map towards providing a robust Raman spectroscopy-based cancer diagnostic platform and integration into clinic

    NASA Astrophysics Data System (ADS)

    Lau, Katherine; Isabelle, Martin; Lloyd, Gavin R.; Old, Oliver; Shepherd, Neil; Bell, Ian M.; Dorney, Jennifer; Lewis, Aaran; Gaifulina, Riana; Rodriguez-Justo, Manuel; Kendall, Catherine; Stone, Nicolas; Thomas, Geraint; Reece, David

    2016-03-01

    Despite its demonstrated potential as an accurate cancer diagnostic tool, Raman spectroscopy (RS) is yet to be adopted by the clinic for histopathology reviews. The Stratified Medicine through Advanced Raman Technologies (SMART) consortium has begun to address some of the hurdles to its adoption for cancer diagnosis. These hurdles include awareness and acceptance of the technology, practicality of integration into the histopathology workflow, data reproducibility and availability of transferrable models. Within the consortium we are jointly developing optimised protocols for tissue sample preparation, data collection and analysis. These protocols will be supported by provision of suitable hardware and software tools to allow statistically sound classification models to be built and transferred for use on different systems. In addition, we are building a validated gastrointestinal (GI) cancers model, which can be trialled as part of the histopathology workflow at hospitals, and a classification tool. At the end of the project, we aim to deliver a robust Raman-based diagnostic platform to enable clinical researchers to stage cancer, define tumour margins, build cancer diagnostic models and discover novel disease biomarkers.

  14. KNIME for reproducible cross-domain analysis of life science data.

    PubMed

    Fillbrunn, Alexander; Dietz, Christian; Pfeuffer, Julianus; Rahn, René; Landrum, Gregory A; Berthold, Michael R

    2017-11-10

    Experiments in the life sciences often involve tools from a variety of domains such as mass spectrometry, next generation sequencing, or image processing. Passing the data between those tools often involves complex scripts for controlling data flow, data transformation, and statistical analysis. Such scripts are not only prone to be platform dependent, they also tend to grow as the experiment progresses and are seldom well documented, a fact that hinders the reproducibility of the experiment. Workflow systems such as KNIME Analytics Platform aim to solve these problems by providing a platform for connecting tools graphically and guaranteeing the same results on different operating systems. As open-source software, KNIME allows scientists and programmers to provide their own extensions to the scientific community. In this review paper we present selected extensions from the life sciences that simplify data exploration, analysis, and visualization and are interoperable due to KNIME's unified data model. Additionally, we name other workflow systems that are commonly used in the life sciences and highlight their similarities and differences to KNIME. Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.

  15. Workflow and web application for annotating NCBI BioProject transcriptome data.

    PubMed

    Vera Alvarez, Roberto; Medeiros Vidal, Newton; Garzón-Martínez, Gina A; Barrero, Luz S; Landsman, David; Mariño-Ramírez, Leonardo

    2017-01-01

    The volume of transcriptome data is growing exponentially due to rapid improvement of experimental technologies. In response, large central resources such as those of the National Center for Biotechnology Information (NCBI) are continually adapting their computational infrastructure to accommodate this large influx of data. New and specialized databases, such as the Transcriptome Shotgun Assembly Sequence Database (TSA) and the Sequence Read Archive (SRA), have been created to aid the development and expansion of centralized repositories. Although the central resource databases are under continual development, they do not include automatic pipelines to increase the annotation of newly deposited data. Therefore, third-party applications are required to achieve that aim. Here, we present an automatic workflow and web application for the annotation of transcriptome data. The workflow creates secondary data such as sequencing reads and BLAST alignments, which are available through the web application. Both are based on freely available bioinformatics tools and scripts developed in-house. The interactive web application provides a search engine and several browser utilities. Graphical views of transcript alignments are available through SeqViewer, an embedded tool developed by NCBI for viewing biological sequence data. The web application is tightly integrated with other NCBI web applications and tools to extend the functionality of data processing and interconnectivity. We present a case study for the species Physalis peruviana with data generated from BioProject ID 67621. URL: http://www.ncbi.nlm.nih.gov/projects/physalis/. Published by Oxford University Press 2017. This work is written by US Government employees and is in the public domain in the US.

  16. Purdue Ionomics Information Management System. An Integrated Functional Genomics Platform1[C][W][OA

    PubMed Central

    Baxter, Ivan; Ouzzani, Mourad; Orcun, Seza; Kennedy, Brad; Jandhyala, Shrinivas S.; Salt, David E.

    2007-01-01

    The advent of high-throughput phenotyping technologies has created a deluge of information that is difficult to deal with without the appropriate data management tools. These data management tools should integrate defined workflow controls for genomic-scale data acquisition and validation, data storage and retrieval, and data analysis, indexed around the genomic information of the organism of interest. To maximize the impact of these large datasets, it is critical that they are rapidly disseminated to the broader research community, allowing open access for data mining and discovery. We describe here a system that incorporates such functionalities, developed around the Purdue University high-throughput ionomics phenotyping platform. The Purdue Ionomics Information Management System (PiiMS) provides integrated workflow control, data storage, and analysis to facilitate high-throughput data acquisition, along with integrated tools for data search, retrieval, and visualization for hypothesis development. PiiMS is deployed as a World Wide Web-enabled system, allowing for integration of distributed workflow processes and open access to raw data for analysis by numerous laboratories. PiiMS currently contains data on shoot concentrations of P, Ca, K, Mg, Cu, Fe, Zn, Mn, Co, Ni, B, Se, Mo, Na, As, and Cd in over 60,000 shoot tissue samples of Arabidopsis (Arabidopsis thaliana), including ethyl methanesulfonate, fast-neutron and defined T-DNA mutants, and natural accessions and populations of recombinant inbred lines from over 800 separate experiments, representing over 1,000,000 fully quantitative elemental concentrations. PiiMS is accessible at www.purdue.edu/dp/ionomics. PMID:17189337

  17. NASA Tech Briefs, February 2014

    NASA Technical Reports Server (NTRS)

    2014-01-01

    Topics include: JWST Integrated Simulation and Test (JIST) Core; Software for Non-Contact Measurement of an Individual's Heart Rate Using a Common Camera; Rapid Infrared Pixel Grating Response Testbed; Temperature Measurement and Stabilization in a Birefringent Whispering Gallery Resonator; JWST IV and V Simulation and Test (JIST) Solid State Recorder (SSR) Simulator; Development of a Precision Thermal Doubler for Deep Space; Improving Friction Stir Welds Using Laser Peening; Methodology of Evaluating Margins of Safety in Critical Brazed Joints; Interactive Inventory Monitoring; Sensor for Spatial Detection of Single-Event Effects in Semiconductor-Based Electronics; Reworked CCGA-624 Interconnect Package Reliability for Extreme Thermal Environments; Current-Controlled Output Driver for Directly Coupled Loads; Bulk Metallic Glasses and Matrix Composites as Spacecraft Shielding; Touch Temperature Coating for Electrical Equipment on Spacecraft; Li-Ion Electrolytes Containing Flame-Retardant Additives; Autonomous Robotic Manipulation (ARM); CARVE Log; Platform Perspective Toolkit; Convex Hull-Based Plume and Anomaly Detection; Pre-Filtration of GOSAT Data Using Only Level 1 Data and an Intelligent Filter to Remove Low Clouds; Affordability Comparison Tool - ACT; "Ascent - Commemorating Shuttle" for iPad; Cassini Mission App; Light-Weight Workflow Engine: A Server for Executing Generic Workflows; Model for System Engineering of the CheMin Instrument; Timeline Central Concepts; Parallel Particle Filter Toolkit; Particle Filter Simulation and Analysis Enabling Non-Traditional Navigation; Quasi-Terminator Orbits for Mapping Small Primitive Bodies; The Subgrid-Scale Scalar Variance Under Supercritical Pressure Conditions; Sliding Gait for ATHLETE Mobility; and Automated Generation of Adaptive Filter Using a Genetic Algorithm and Cyclic Rule Reduction.

  18. Genetic Design Automation: engineering fantasy or scientific renewal?

    PubMed Central

    Lux, Matthew W.; Bramlett, Brian W.; Ball, David A.; Peccoud, Jean

    2013-01-01

    Synthetic biology aims to make genetic systems more amenable to engineering, which has naturally led to the development of Computer-Aided Design (CAD) tools. Experimentalists still primarily rely on project-specific ad-hoc workflows instead of domain-specific tools, suggesting that CAD tools are lagging behind the front line of the field. Here, we discuss the scientific hurdles that have limited the productivity gains anticipated from existing tools. We argue that the real value of efforts to develop CAD tools is the formalization of genetic design rules that determine the complex relationships between genotype and phenotype. PMID:22001068

  19. Using telephony data to facilitate discovery of clinical workflows.

    PubMed

    Rucker, Donald W

    2017-04-19

    Discovery of clinical workflows to target for redesign using methods such as Lean and Six Sigma is difficult. VoIP telephone call pattern analysis may complement direct observation and EMR-based tools in understanding clinical workflows at the enterprise level by allowing visualization of institutional telecommunications activity. The objective was to build an analytic framework mapping repetitive and high-volume telephone call patterns in a large medical center to their associated clinical units, using an enterprise unified communications server log file, and to support visualization of specific call patterns as graphical networks. Consecutive call detail records from the medical center's unified communications server were parsed to cross-correlate telephone call patterns and map associated phone numbers to a cost center dictionary. Hashed data structures were built to allow construction of edge and node files representing high-volume call patterns for display with an open source graph network tool. Summary statistics for an analysis of exactly one week's call detail records at a large academic medical center showed that 912,386 calls were placed with a total duration of 23,186 hours. Approximately half of all calling/called number pairs had an average call duration under 60 seconds, and of these the average call duration was 27 seconds. Cross-correlation of phone calls identified by clinical cost center can be used to generate graphical displays of clinical enterprise communications. Many calls are short; the compact data transfers within short calls may serve as automation or redesign targets. The large absolute amount of time medical center employees spend in VoIP telecommunications suggests that analysis of telephone call patterns may offer additional insights into core clinical workflows.
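
    The edge-and-node construction step can be sketched in a few lines: call detail records are aggregated into weighted unit-to-unit edges and written out as node and edge files for an open-source graph tool such as Gephi. The CSV columns and the cost-centre dictionary below are illustrative assumptions, not the study's actual log format.

      # Sketch: aggregate call detail records into node/edge lists for graph display.
      # The input columns and the cost-centre dictionary are hypothetical.
      import csv
      from collections import Counter

      cost_center = {"5551": "Emergency Dept", "5552": "Radiology", "5553": "Pharmacy"}

      edges = Counter()
      with open("call_detail_records.csv", newline="") as f:
          for row in csv.DictReader(f):  # expects columns: caller, callee, duration_s
              src = cost_center.get(row["caller"][:4], "Other")
              dst = cost_center.get(row["callee"][:4], "Other")
              if src != dst:
                  edges[(src, dst)] += 1

      with open("nodes.csv", "w", newline="") as f:
          w = csv.writer(f)
          w.writerow(["Id", "Label"])
          for unit in sorted({u for pair in edges for u in pair}):
              w.writerow([unit, unit])

      with open("edges.csv", "w", newline="") as f:
          w = csv.writer(f)
          w.writerow(["Source", "Target", "Weight"])
          for (src, dst), weight in edges.most_common():
              w.writerow([src, dst, weight])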

  20. Developing a Workflow Composite Score to Measure Clinical Information Logistics. A Top-down Approach.

    PubMed

    Liebe, J D; Hübner, U; Straede, M C; Thye, J

    2015-01-01

    Availability and usage of individual IT applications have been studied intensively in the past years. Recently, IT support of clinical processes has been attracting increasing attention. The underlying construct that describes the IT support of clinical workflows is clinical information logistics. This construct needs to be better understood, operationalised and measured. It is therefore the aim of this study to propose and develop a workflow composite score (WCS) for measuring clinical information logistics and to examine its quality based on reliability and validity analyses. We largely followed the procedural model of MacKenzie and colleagues (2011): defining and conceptualising the construct domain, developing the measurement instrument, assessing content validity, pretesting the instrument, specifying the model, capturing the data, computing the WCS, and testing reliability and validity. Clinical information logistics was decomposed into the descriptors data and information, function, integration and distribution, which embraced a framework validated by an analysis of the international literature. This framework was refined by selecting representative clinical processes. We chose ward rounds, pre- and post-surgery processes and discharge as sample processes that served as concrete instances for the measurements. They are sufficiently complex, represent core clinical processes and involve different professions, departments and settings. The score was computed on the basis of data from 183 hospitals of different size, ownership, location and teaching status. Testing the reliability and validity yielded encouraging results: reliability was high with r(split-half) = 0.89; the WCS discriminated between groups, correlated significantly and moderately with two EHR models, and received good evaluations from a sample of chief information officers (n = 67). These findings support further utilisation of the WCS. As the WCS does not assume ideal workflows as a gold standard but measures IT support of clinical workflows according to validated descriptors, high portability of the WCS to hospitals in other countries is very likely. The WCS will contribute to a better understanding of the construct clinical information logistics.
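
    As a purely illustrative sketch of the kind of computation involved (the paper's actual operationalisation of the WCS is not reproduced here), the snippet below averages descriptor-level item scores into a per-hospital composite and estimates split-half reliability with the Spearman-Brown correction on synthetic data.

      # Illustrative composite-score sketch on synthetic 0/1 item scores.
      import numpy as np

      rng = np.random.default_rng(0)
      n_hospitals, n_items = 183, 8
      # A common "hospital quality" factor so the two halves correlate (placeholder data).
      quality = rng.normal(size=(n_hospitals, 1))
      items = (quality + 0.8 * rng.normal(size=(n_hospitals, n_items)) > 0).astype(float)

      wcs = items.mean(axis=1)  # simple unweighted composite per hospital

      # Split-half reliability: correlate odd vs. even item halves, then apply the
      # Spearman-Brown correction.
      half_a = items[:, ::2].mean(axis=1)
      half_b = items[:, 1::2].mean(axis=1)
      r_half = np.corrcoef(half_a, half_b)[0, 1]
      r_sb = 2 * r_half / (1 + r_half)
      print(f"WCS range: {wcs.min():.2f}-{wcs.max():.2f}, split-half reliability: {r_sb:.2f}")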

  1. Objectifying user critique. A means of continuous quality assurance for physician discharge letter composition.

    PubMed

    Oschem, M; Mahler, V; Prokosch, H U

    2011-01-01

    The aim of this study is to objectify user critique, rendering it usable for quality assurance. Based on formative and summative evaluation results we strive to promote software improvements; in our case, the physician discharge letter composition process at the Department of Dermatology, University Hospital Erlangen, Germany. We developed a novel six-step approach to objectify user critique: 1) acquisition of user critique using subjectivist methods, 2) creation of a workflow model, 3) definition of hypotheses and indicators, 4) measurement of indicators, 5) analysis of results, 6) optimization of the system regarding both subjectivist and objectivist evaluation results. In particular, we derived indicators and workflows directly from user critique/narratives. The identified indicators were mapped onto workflow activities, creating a link between user critique and the evaluated system. Users criticized a new discharge letter system as "too slow" and "too labor-intensive" in comparison with the previously used system. In a stepwise approach we collected subjective user critique, derived a comprehensive process model including deviations, and deduced a set of five indicators for objectivist evaluation: processing time, system-related waiting time, number of mouse clicks, number of keyboard inputs, and throughput time. About 3,500 measurements were performed to compare the workflow steps of both systems across 20 discharge letters. Although the difference in mean total processing time between both systems was statistically insignificant (2011.7 s vs. 1971.5 s; p = 0.457), we detected a significant difference in waiting times (101.8 s vs. 37.2 s; p < 0.001) and in the number of user interactions (77 vs. 69; p < 0.001) in favor of the old system, thus objectifying the user critique. Our six-step approach enables objectification of user critique, resulting in objective values for continuous quality assurance. To our knowledge no previous study in medical informatics has mapped user critique onto workflow steps. Subjectivist analysis prompted us to use the indicator system-related waiting time in the objectivist study, which has rarely been done before. We consider the combination of subjectivist and objectivist methods a key point of our approach. Future work will concentrate on automated measurement of indicators.
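
    A comparison of timing indicators between two systems can be sketched with a standard two-sample test; the values below are randomly generated placeholders centred on the reported means, not the study's measurements, and the study's own statistical procedure may differ.

      # Sketch of comparing a timing indicator between two systems (placeholder data).
      import numpy as np
      from scipy import stats

      rng = np.random.default_rng(1)
      waiting_old = rng.normal(loc=37.2, scale=10.0, size=20)   # seconds, hypothetical spread
      waiting_new = rng.normal(loc=101.8, scale=25.0, size=20)

      t, p = stats.ttest_ind(waiting_new, waiting_old, equal_var=False)  # Welch's t-test
      print(f"mean new: {waiting_new.mean():.1f}s, mean old: {waiting_old.mean():.1f}s, p = {p:.4f}")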

  2. DOE Office of Scientific and Technical Information (OSTI.GOV)

    McCaskey, Alexander J.

    There is a lack of state-of-the-art HPC simulation tools for simulating general quantum computing. Furthermore, no existing software tools integrate current quantum computers into classical HPC workflows. This product, the Quantum Virtual Machine (QVM), addresses this problem by providing an extensible framework for pluggable virtual, or physical, quantum processing units (QPUs). It enables the execution of low-level quantum assembly codes and returns the results of such executions.

  3. Integrated Sensitivity Analysis Workflow

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Friedman-Hill, Ernest J.; Hoffman, Edward L.; Gibson, Marcus J.

    2014-08-01

    Sensitivity analysis is a crucial element of rigorous engineering analysis, but performing such an analysis on a complex model is difficult and time consuming. The mission of the DART Workbench team at Sandia National Laboratories is to lower the barriers to adoption of advanced analysis tools through software integration. The integrated environment guides the engineer in the use of these integrated tools and greatly reduces the cycle time for engineering analysis.

  4. PyDBS: an automated image processing workflow for deep brain stimulation surgery.

    PubMed

    D'Albis, Tiziano; Haegelen, Claire; Essert, Caroline; Fernández-Vidal, Sara; Lalys, Florent; Jannin, Pierre

    2015-02-01

    Deep brain stimulation (DBS) is a surgical procedure for treating motor-related neurological disorders. DBS clinical efficacy hinges on precise surgical planning and accurate electrode placement, which in turn rely on several image processing and visualization tasks, such as image registration, image segmentation, image fusion, and 3D visualization. These tasks are often performed by a heterogeneous set of software tools, which adopt differing formats and geometrical conventions and require patient-specific parameterization or interactive tuning. To overcome these issues, we introduce in this article PyDBS, a fully integrated and automated image processing workflow for DBS surgery. PyDBS consists of three image processing pipelines and three visualization modules assisting clinicians through the entire DBS surgical workflow, from the preoperative planning of electrode trajectories to the postoperative assessment of electrode placement. The system's robustness, speed, and accuracy were assessed by means of a retrospective validation based on 92 clinical cases. The complete PyDBS workflow achieved satisfactory results in 92% of tested cases, with a median processing time of 28 minutes per patient. The results obtained are compatible with the adoption of PyDBS in clinical practice.

  5. An NGS Workflow Blueprint for DNA Sequencing Data and Its Application in Individualized Molecular Oncology

    PubMed Central

    Li, Jian; Batcha, Aarif Mohamed Nazeer; Grüning, Björn; Mansmann, Ulrich R.

    2015-01-01

    Next-generation sequencing (NGS) technologies, which have advanced rapidly in the past few years, possess the potential to classify diseases, decipher the molecular code of related cell processes, identify targets for decision-making on targeted therapy or prevention strategies, and predict clinical treatment response. Thus, NGS is on its way to revolutionizing oncology. With the help of NGS, we can draw a finer map of the genetic basis of diseases and can improve our understanding of diagnostic and prognostic applications and therapeutic methods. Despite these advantages and its potential, NGS faces several critical challenges, including reduction of sequencing cost, enhancement of sequencing quality, improvement of technical simplicity and reliability, and development of a semiautomated and integrated analysis workflow. In order to address these challenges, we conducted a literature review and summarized a four-stage NGS workflow, providing a systematic review of NGS-based analysis, explaining the strengths and weaknesses of diverse NGS software tools, and elucidating their potential connection to individualized medicine. By presenting this four-stage NGS workflow, we aim to provide a minimal structural layout required for NGS data storage and reproducibility. PMID:27081306

  6. Formalizing an integrative, multidisciplinary cancer therapy discovery workflow

    PubMed Central

    McGuire, Mary F.; Enderling, Heiko; Wallace, Dorothy I.; Batra, Jaspreet; Jordan, Marie; Kumar, Sushil; Panetta, John C.; Pasquier, Eddy

    2014-01-01

    Although many clinicians and researchers work to understand cancer, there has been limited success in effectively combining forces and collaborating across time, distance, data and budget constraints. Here we present a workflow template for multidisciplinary cancer therapy that was developed during the 2nd Annual Workshop on Cancer Systems Biology sponsored by Tufts University, Boston, MA in July 2012. The template was applied to the development of a metronomic therapy backbone for neuroblastoma. Three primary groups were identified: clinicians, biologists, and scientists (mathematicians, computer scientists, physicists and engineers). The workflow described their integrative interactions; parallel or sequential processes; data sources and computational tools at different stages; as well as the iterative nature of therapeutic development from clinical observations to in vitro, in vivo, and clinical trials. We found that theoreticians in dialog with experimentalists could develop calibrated and parameterized predictive models that inform and formalize sets of testable hypotheses, thus speeding up discovery and validation while reducing laboratory resources and costs. The developed template outlines an interdisciplinary collaboration workflow designed to systematically investigate the mechanistic underpinnings of a new therapy and validate that therapy to advance development and clinical acceptance. PMID:23955390

  7. Managing the CMS Data and Monte Carlo Processing during LHC Run 2

    NASA Astrophysics Data System (ADS)

    Wissing, C.; CMS Collaboration

    2017-10-01

    In order to cope with the challenges expected during LHC Run 2, CMS introduced a number of enhancements into its main software packages and the tools used for centrally managed processing. In this presentation we highlight the improvements that allow CMS to deal with the increased trigger output rate, the increased pileup and the evolution in computing technology. The overall system aims at high flexibility and largely automated procedures. The tight coupling of workflow classes to types of sites has been drastically relaxed. Reliable and high-performing networking between most of the computing sites and the successful deployment of a data federation allow the execution of workflows using remote data access. This required the development of a largely automated system to assign workflows and to handle the necessary pre-staging of data. Another step towards flexibility has been the introduction of one large global HTCondor pool for all types of processing workflows and analysis jobs. Besides classical Grid resources, some opportunistic resources as well as Cloud resources have been integrated into that pool, which gives access to more than 200k CPU cores.

  8. Development of a data independent acquisition mass spectrometry workflow to enable glycopeptide analysis without predefined glycan compositional knowledge.

    PubMed

    Lin, Chi-Hung; Krisp, Christoph; Packer, Nicolle H; Molloy, Mark P

    2018-02-10

    Glycoproteomics investigates glycan moieties in a site-specific manner to reveal the functional roles of protein glycosylation. Identification of glycopeptides from data-dependent acquisition (DDA) relies on high quality MS/MS spectra of glycopeptide precursors and often requires manual validation to ensure confident assignments. In this study, we investigated pseudo-MRM (MRM-HR) and data-independent acquisition (DIA) as alternative acquisition strategies for glycopeptide analysis. These approaches allow data acquisition over the full MS/MS scan range, allowing data re-analysis post-acquisition without data re-acquisition. The advantage of MRM-HR over DDA for N-glycopeptide detection was demonstrated by targeted analysis of bovine fetuin, where all three N-glycosylation sites were detected, which was not the case with DDA. To overcome the duty cycle limitation of MRM-HR acquisition needed for analysis of complex samples such as plasma, we trialed DIA. This allowed development of a targeted DIA method to identify N-glycopeptides without pre-defined knowledge of the glycan composition, thus providing the potential to identify N-glycopeptides with unexpected structures. This workflow was demonstrated by detection of 59 N-glycosylation sites from 41 glycoproteins in a HILIC-enriched human plasma tryptic digest. Twenty-one glycoforms of IgG1 glycopeptides were identified, including two truncated structures that are rarely reported. We developed a data-independent mass spectrometry workflow to identify specific glycopeptides from complex biological mixtures. The novelty is that this approach does not require the glycan composition to be pre-defined, thereby allowing glycopeptides carrying unexpected glycans to be identified. This is demonstrated through the analysis of immunoglobulins in human plasma, where we detected two IgG1 glycoforms that are rarely observed. Copyright © 2017 Elsevier B.V. All rights reserved.

  9. Design Automation in Synthetic Biology.

    PubMed

    Appleton, Evan; Madsen, Curtis; Roehner, Nicholas; Densmore, Douglas

    2017-04-03

    Design automation refers to a category of software tools for designing systems that work together in a workflow for designing, building, testing, and analyzing systems with a target behavior. In synthetic biology, these tools are called bio-design automation (BDA) tools. In this review, we discuss the BDA tool areas (specify, design, build, test, and learn) and introduce the existing software tools designed to solve problems in these areas. We then detail the functionality of some of these tools and show how they can be used together to create the desired behavior of two types of modern synthetic genetic regulatory networks. Copyright © 2017 Cold Spring Harbor Laboratory Press; all rights reserved.

  10. Development of a High-Throughput Ion-Exchange Resin Characterization Workflow.

    PubMed

    Liu, Chun; Dermody, Daniel; Harris, Keith; Boomgaard, Thomas; Sweeney, Jeff; Gisch, Daryl; Goltz, Bob

    2017-06-12

    A novel high-throughput (HTR) ion-exchange (IEX) resin workflow has been developed for characterizing the ion exchange equilibria of commercial and experimental IEX resins across a range of applications in which the water environment differs from site to site. Because of its much higher throughput, design of experiments (DOE) methodology can easily be applied to study the effects of multiple factors on resin performance. Two case studies are presented to illustrate the efficacy of the combined HTR workflow and DOE method. In case study one, a series of anion exchange resins was screened for selective removal of NO3- and NO2- in water environments consisting of multiple other anions, varied pH, and ionic strength. A response surface model (RSM) was developed to statistically correlate resin performance with water composition and predict the best resin candidate. In case study two, the same HTR workflow and DOE method were applied to screen different cation exchange resins for the selective removal of Mg2+, Ca2+, and Ba2+ from high total dissolved salt (TDS) water. A master DOE model including all of the cation exchange resins was created to predict divalent cation removal by different IEX resins under specific conditions, from which the best resin candidates could be identified. The successful adoption of the HTR workflow and DOE method for studying the ion exchange behavior of IEX resins can significantly reduce the resources and time needed to address industry and application needs.
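
    The response-surface step can be illustrated with a quadratic polynomial fit over a few water-chemistry factors; the factor names, ranges, and response values below are synthetic placeholders, not the study's data.

      # Sketch: fit a quadratic response surface for resin performance vs. water composition.
      import numpy as np
      from sklearn.preprocessing import PolynomialFeatures
      from sklearn.linear_model import LinearRegression
      from sklearn.pipeline import make_pipeline

      rng = np.random.default_rng(2)
      # Hypothetical factors: pH, ionic strength (M), competing anion concentration (mM)
      X = rng.uniform([5.0, 0.01, 0.1], [9.0, 0.5, 10.0], size=(40, 3))
      # Synthetic "nitrate removal" response with curvature and noise
      y = 80 - 3 * (X[:, 0] - 7) ** 2 - 20 * X[:, 1] - 1.5 * X[:, 2] + rng.normal(0, 2, 40)

      rsm = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
      rsm.fit(X, y)

      # Predict performance for a candidate water composition (pH 7.2, 0.05 M, 2 mM).
      print("Predicted removal (%):", round(float(rsm.predict([[7.2, 0.05, 2.0]])[0]), 1))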

  11. Core lipid, surface lipid and apolipoprotein composition analysis of lipoprotein particles as a function of particle size in one workflow integrating asymmetric flow field-flow fractionation and liquid chromatography-tandem mass spectrometry

    PubMed Central

    Jones, Jeffery I.; Gardner, Michael S.; Schieltz, David M.; Parks, Bryan A.; Toth, Christopher A.; Rees, Jon C.; Andrews, Michael L.; Carter, Kayla; Lehtikoski, Antony K.; McWilliams, Lisa G.; Williamson, Yulanda M.; Bierbaum, Kevin P.; Pirkle, James L.; Barr, John R.

    2018-01-01

    Lipoproteins are complex molecular assemblies that are key participants in the intricate cascade of extracellular lipid metabolism, with important consequences in the formation of atherosclerotic lesions and the development of cardiovascular disease. Multiplexed mass spectrometry (MS) techniques have substantially improved the ability to characterize the composition of lipoproteins. However, these advanced MS techniques are limited by traditional pre-analytical fractionation techniques that compromise the structural integrity of lipoprotein particles during separation from serum or plasma. In this work, we applied a highly effective and gentle hydrodynamic size-based fractionation technique, asymmetric flow field-flow fractionation (AF4), and integrated it into a comprehensive tandem mass spectrometry based workflow that was used for the measurement of apolipoproteins (apos A-I, A-II, A-IV, B, C-I, C-II, C-III and E), free cholesterol (FC), cholesterol esters (CE), triglycerides (TG), and phospholipids (PL) (phosphatidylcholine (PC), sphingomyelin (SM), phosphatidylethanolamine (PE), phosphatidylinositol (PI) and lysophosphatidylcholine (LPC)). Hydrodynamic size in each of 40 size fractions separated by AF4 was measured by dynamic light scattering. Measuring all major lipids and apolipoproteins in each size fraction and in the whole serum, using a total of 0.1 ml, allowed the volumetric calculation of lipoprotein particle numbers and the expression of composition as molar analyte per particle number ratios. Measurements in 110 serum samples showed substantive differences between size fractions of HDL and LDL. Lipoprotein composition within size fractions was expressed in molar ratios of analytes (A-I/A-II, C-II/C-I, C-II/C-III, E/C-III, FC/PL, SM/PL, PE/PL, and PI/PL), showing differences across sample categories with combinations of normal and high levels of Total-C and/or Total-TG. The agreement with previous studies indirectly validates the AF4-LC-MS/MS approach and demonstrates the potential of this workflow for characterization of lipoprotein composition in clinical studies using small volumes of archived frozen samples. PMID:29634782

  12. Diagnostic tool for red blood cell membrane disorders: Assessment of a new generation ektacytometer☆

    PubMed Central

    Da Costa, Lydie; Suner, Ludovic; Galimand, Julie; Bonnel, Amandine; Pascreau, Tiffany; Couque, Nathalie; Fenneteau, Odile; Mohandas, Narla

    2016-01-01

    Inherited red blood cell (RBC) membrane disorders, such as hereditary spherocytosis, elliptocytosis and hereditary ovalocytosis, result from mutations in genes encoding various RBC membrane and skeletal proteins. The RBC membrane, a composite structure composed of a lipid bilayer linked to a spectrin/actin-based membrane skeleton, confers upon the RBC unique features of deformability and mechanical stability. Disease severity is primarily dependent on the extent of membrane surface area loss. RBC membrane disorders can be readily diagnosed by various laboratory approaches that include RBC cytology, flow cytometry, ektacytometry, electrophoresis of RBC membrane proteins and genetics. The reference technique for diagnosis of RBC membrane disorders is osmotic gradient ektacytometry. However, despite this recognition, it is rarely used as a routine diagnostic tool for RBC membrane disorders because of its limited availability. This may soon change, as a new generation of ektacytometer has recently been engineered. In this review, we describe the workflow for samples shipped to our Hematology laboratory for RBC membrane disorder analysis and the data obtained for a large cohort of French patients presenting with RBC membrane disorders, using a newly available version of the ektacytometer. PMID:26603718

  13. Determining the Composition and Stability of Protein Complexes Using an Integrated Label-Free and Stable Isotope Labeling Strategy

    PubMed Central

    Greco, Todd M.; Guise, Amanda J.; Cristea, Ileana M.

    2016-01-01

    In biological systems, proteins catalyze the fundamental reactions that underlie all cellular functions, including metabolic processes and cell survival and death pathways. These biochemical reactions are rarely accomplished alone. Rather, they involve a concerted effect from many proteins that may operate in a directed signaling pathway and/or may physically associate in a complex to achieve a specific enzymatic activity. Therefore, defining the composition and regulation of protein complexes is critical for understanding cellular functions. In this chapter, we describe an approach that uses quantitative mass spectrometry (MS) to assess the specificity and the relative stability of protein interactions. Isolation of protein complexes from mammalian cells is performed by rapid immunoaffinity purification, and followed by in-solution digestion and high-resolution mass spectrometry analysis. We employ complementary quantitative MS workflows to assess the specificity of protein interactions using label-free MS and statistical analysis, and the relative stability of the interactions using a metabolic labeling technique. For each candidate protein interaction, scores from the two workflows can be correlated to minimize nonspecific background and profile protein complex composition and relative stability. PMID:26867737

  14. Planetary Geologic Mapping Python Toolbox: A Suite of Tools to Support Mapping Workflows

    NASA Astrophysics Data System (ADS)

    Hunter, M. A.; Skinner, J. A.; Hare, T. M.; Fortezzo, C. M.

    2017-06-01

    The collective focus of the Planetary Geologic Mapping Python Toolbox is to provide researchers with additional means to migrate legacy GIS data, assess the quality of data and analysis results, and simplify common mapping tasks.

  15. 9th Annual Systems Engineering Conference: Volume 4 Thursday

    DTIC Science & Technology

    2006-10-26

    Connectivity, Speed, Volume • Enterprise application integration • Workflow integration or multi-media • Federated search capability • Link analysis and...categorization, federated search & automated discovery of information • Collaborative tools to quickly share relevant information • Built on commercial

  16. SHERPA: an image segmentation and outline feature extraction tool for diatoms and other objects

    PubMed Central

    2014-01-01

    Background: Light microscopic analysis of diatom frustules is widely used both in basic and applied research, notably taxonomy, morphometrics, water quality monitoring and paleo-environmental studies. In these applications, usually large numbers of frustules need to be identified and/or measured. Although there is a need for automation in these applications, and image processing and analysis methods supporting these tasks have previously been developed, they did not become widespread in diatom analysis. While methodological reports for a wide variety of methods for image segmentation, diatom identification and feature extraction are available, no single implementation combining a subset of these into a readily applicable workflow accessible to diatomists exists. Results: The newly developed tool SHERPA offers a versatile image processing workflow focused on the identification and measurement of object outlines, handling all steps from image segmentation over object identification to feature extraction, and providing interactive functions for reviewing and revising results. Special attention was given to ease of use, applicability to a broad range of data and problems, and supporting high throughput analyses with minimal manual intervention. Conclusions: Tested with several diatom datasets from different sources and of various compositions, SHERPA proved its ability to successfully analyze large amounts of diatom micrographs depicting a broad range of species. SHERPA is unique in combining the following features: application of multiple segmentation methods and selection of the one giving the best result for each individual object; identification of shapes of interest based on outline matching against a template library; quality scoring and ranking of resulting outlines supporting quick quality checking; extraction of a wide range of outline shape descriptors widely used in diatom studies and elsewhere; minimizing the need for, but enabling, manual quality control and corrections. Although primarily developed for analyzing images of diatom valves originating from automated microscopy, SHERPA can also be useful for other object detection, segmentation and outline-based identification problems. PMID:24964954

  17. SHERPA: an image segmentation and outline feature extraction tool for diatoms and other objects.

    PubMed

    Kloster, Michael; Kauer, Gerhard; Beszteri, Bánk

    2014-06-25

    Light microscopic analysis of diatom frustules is widely used both in basic and applied research, notably taxonomy, morphometrics, water quality monitoring and paleo-environmental studies. In these applications, usually large numbers of frustules need to be identified and/or measured. Although there is a need for automation in these applications, and image processing and analysis methods supporting these tasks have previously been developed, they did not become widespread in diatom analysis. While methodological reports for a wide variety of methods for image segmentation, diatom identification and feature extraction are available, no single implementation combining a subset of these into a readily applicable workflow accessible to diatomists exists. The newly developed tool SHERPA offers a versatile image processing workflow focused on the identification and measurement of object outlines, handling all steps from image segmentation over object identification to feature extraction, and providing interactive functions for reviewing and revising results. Special attention was given to ease of use, applicability to a broad range of data and problems, and supporting high throughput analyses with minimal manual intervention. Tested with several diatom datasets from different sources and of various compositions, SHERPA proved its ability to successfully analyze large amounts of diatom micrographs depicting a broad range of species. SHERPA is unique in combining the following features: application of multiple segmentation methods and selection of the one giving the best result for each individual object; identification of shapes of interest based on outline matching against a template library; quality scoring and ranking of resulting outlines supporting quick quality checking; extraction of a wide range of outline shape descriptors widely used in diatom studies and elsewhere; minimizing the need for, but enabling manual quality control and corrections. Although primarily developed for analyzing images of diatom valves originating from automated microscopy, SHERPA can also be useful for other object detection, segmentation and outline-based identification problems.

  18. compMS2Miner: An Automatable Metabolite Identification, Visualization, and Data-Sharing R Package for High-Resolution LC-MS Data Sets.

    PubMed

    Edmands, William M B; Petrick, Lauren; Barupal, Dinesh K; Scalbert, Augustin; Wilson, Mark J; Wickliffe, Jeffrey K; Rappaport, Stephen M

    2017-04-04

    A long-standing challenge of untargeted metabolomic profiling by ultrahigh-performance liquid chromatography-high-resolution mass spectrometry (UHPLC-HRMS) is the efficient transition from unknown mass spectral features to confident metabolite annotations. The compMS2Miner (Comprehensive MS2 Miner) package was developed in the R language to facilitate rapid, comprehensive feature annotation using peak-picker output and MS2 data files as inputs. The number of MS2 spectra that can be collected during a metabolomic profiling experiment far exceeds what painstaking manual interpretation can cover; therefore, a degree of software workflow autonomy is required for broad-scale metabolite annotation. compMS2Miner integrates many useful tools in a single workflow for metabolite annotation and also provides a means to overview the MS2 data with a web application GUI, compMS2Explorer (Comprehensive MS2 Explorer), that also facilitates data sharing and transparency. The automatable compMS2Miner workflow consists of the following steps: (i) matching unknown MS1 features to precursor MS2 scans, (ii) filtration of spectral noise (dynamic noise filter), (iii) generation of composite mass spectra by summation of multiple similar spectrum signals and removal of redundant/contaminant spectra, (iv) interpretation of possible fragment ion substructures using an internal database, (v) annotation of unknowns against chemical and spectral databases, with prediction of mammalian biotransformation metabolites, wrapper functions for in silico fragmentation software, nearest-neighbor chemical similarity scoring, random forest based retention time prediction, text-mining based false positive removal/true positive ranking, chemical taxonomic prediction and differential evolution based global annotation score optimization, and (vi) network graph visualizations, data curation, and sharing via the compMS2Explorer application. Metabolite identities and comments can also be recorded using an interactive table within compMS2Explorer. The utility of the package is illustrated with a data set of blood serum samples from 7 diet-induced obese (DIO) and 7 nonobese (NO) C57BL/6J mice, which were also treated with an antibiotic (streptomycin) to knock down the gut microbiota. The results of fully autonomous and objective usage of compMS2Miner are presented here. All automatically annotated spectra output by the workflow are provided in the Supporting Information and can alternatively be explored as publicly available compMS2Explorer applications for both positive and negative modes ( https://wmbedmands.shinyapps.io/compMS2_mouseSera_POS and https://wmbedmands.shinyapps.io/compMS2_mouseSera_NEG ). The workflow provided rapid annotation of a diversity of endogenous and gut-microbially derived metabolites affected by both diet and antibiotic treatment, which conformed to previously published reports. Composite spectra (n = 173) were autonomously matched to entries of the MassBank of North America (MoNA) spectral repository. These experimental and virtual (lipidBlast) spectra corresponded to 29 common endogenous compound classes (e.g., 51 lysophosphatidylcholine spectra) and were then used to assess the ranking capability of 7 individual scoring metrics. Averaging the 7 individual scoring metrics provided the most effective ranking (a weighted average rank of 3) for the MoNA-matched spectra, in spite of the potential risk of false positive annotations arising from automation. Minor structural differences, such as relative carbon-carbon double bond positions, were found in several cases to affect the correct rank of the MoNA-annotated metabolite. The latest release and an example workflow are available in the package vignette ( https://github.com/WMBEdmands/compMS2Miner ), and a version of the published application is available on the shinyapps.io site ( https://wmbedmands.shinyapps.io/compMS2Example ).
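
    Ranking candidate annotations by an average of several scoring metrics can be illustrated generically; the spectra, candidate names, and scores below are synthetic and unrelated to compMS2Miner's actual metrics.

      # Sketch: rank candidate annotations per spectrum by the mean of several scoring
      # metrics (synthetic values; not compMS2Miner output).
      import pandas as pd

      candidates = pd.DataFrame({
          "spectrum":  ["s1", "s1", "s1", "s2", "s2"],
          "candidate": ["LPC(18:0)", "LPC(18:1)", "PC(16:0/2:0)", "creatinine", "glycine"],
          "spectral_sim": [0.91, 0.85, 0.40, 0.75, 0.20],
          "rt_pred":      [0.80, 0.70, 0.55, 0.60, 0.30],
          "substructure": [0.95, 0.90, 0.35, 0.70, 0.25],
      })

      metrics = ["spectral_sim", "rt_pred", "substructure"]
      candidates["mean_score"] = candidates[metrics].mean(axis=1)
      candidates["rank"] = candidates.groupby("spectrum")["mean_score"] \
                                     .rank(ascending=False, method="first")
      print(candidates.sort_values(["spectrum", "rank"]))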

  19. Towards better digital pathology workflows: programming libraries for high-speed sharpness assessment of Whole Slide Images

    PubMed Central

    2014-01-01

    Background: Since microscopic slides can now be automatically digitized and integrated into the clinical workflow, quality assessment of Whole Slide Images (WSI) has become a crucial issue. We present a no-reference quality assessment method that has been thoroughly tested since 2010 and is under implementation in multiple sites, both public university hospitals and private entities. It is part of the FlexMIm R&D project, which aims to improve the global workflow of digital pathology. For these uses, we have developed two programming libraries, in Java and Python, which can be integrated in various types of WSI acquisition systems, viewers and image analysis tools. Methods: Development and testing were carried out on a MacBook Pro i7 and on a bi-Xeon 2.7 GHz server. Libraries implementing the blur assessment method have been developed in Java, Python, PHP5 and MySQL5. For web applications, JavaScript, Ajax, JSON and Sockets were also used, as well as the Google Maps API. Aperio SVS files were converted into the Google Maps format using the VIPS and OpenSlide libraries. Results: We designed the Java library as a Service Provider Interface (SPI), extendable by third parties. Analysis is computed in real time (3 billion pixels per minute). Tests were made on 5000 single images, 200 NDPI WSI, and 100 Aperio SVS WSI converted to the Google Maps format. Conclusions: Applications based on our method and libraries can be used upstream, as calibration and quality control tools for WSI acquisition systems, or as tools to reacquire tiles while the WSI is being scanned. They can also be used downstream to reacquire complete slides that are below the quality threshold for surgical pathology analysis. WSI may also be displayed in a smarter way by sending and displaying the regions of highest quality before other regions. Such quality assessment scores could be integrated as WSI metadata shared in clinical, research or teaching contexts, for a more efficient medical informatics workflow. PMID:25565494
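
    A widely used no-reference sharpness proxy, shown here purely for illustration with OpenCV, is the variance of the Laplacian computed per tile; this is a generic example, not the blur metric implemented in the FlexMIm libraries, and the file name and threshold are placeholders.

      # Generic per-tile sharpness scoring sketch using variance of the Laplacian (OpenCV).
      import cv2

      def tile_sharpness(image_path, tile=512):
          """Return (position, sharpness score) for each full tile of a grayscale image."""
          img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
          scores = []
          for y in range(0, img.shape[0] - tile + 1, tile):
              for x in range(0, img.shape[1] - tile + 1, tile):
                  patch = img[y:y + tile, x:x + tile]
                  scores.append(((x, y), cv2.Laplacian(patch, cv2.CV_64F).var()))
          return scores

      scores = tile_sharpness("slide_region.png")          # placeholder file name
      blurry = [pos for pos, s in scores if s < 50.0]      # threshold is an arbitrary example
      print(f"{len(blurry)} of {len(scores)} tiles flagged below the sharpness threshold")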

  20. Modernizing Earth and Space Science Modeling Workflows in the Big Data Era

    NASA Astrophysics Data System (ADS)

    Kinter, J. L.; Feigelson, E.; Walker, R. J.; Tino, C.

    2017-12-01

    Modeling is a major aspect of Earth and space science research. The development of numerical models of the Earth system, planetary systems or astrophysical systems is essential to linking theory with observations. Optimal use of observations that are quite expensive to obtain and maintain typically requires data assimilation that involves numerical models. In the Earth sciences, models of the physical climate system are typically used for data assimilation, climate projection, and interdisciplinary research, spanning applications from analysis of multi-sensor data sets to decision-making in climate-sensitive sectors, with applications to ecosystems, hazards, and various biogeochemical processes. In space physics, most models are built from first principles, require considerable expertise to run, and are frequently modified significantly for each case study. The volume and variety of model output data from modeling Earth and space systems are rapidly increasing and have reached a scale where human interaction with the data is prohibitively inefficient. A major barrier to progress is that modeling workflows are not treated by practitioners as a design problem. Existing workflows have been created by a slow accretion of software, typically based on undocumented, inflexible scripts haphazardly modified by a succession of scientists and students not trained in modern software engineering methods. As a result, existing modeling workflows suffer from an inability to onboard new datasets into models, an inability to keep pace with accelerating data production rates, and irreproducibility, among other problems. These factors are creating an untenable situation for those conducting and supporting Earth system and space science. Improving modeling workflows requires investments in hardware, software and human resources. This paper describes the critical-path issues that must be targeted to accelerate modeling workflows, including script modularization, parallelization, and automation in the near term, and longer-term investments in virtualized environments for improved scalability, tolerance for lossy data compression, novel data-centric memory and storage technologies, and tools for peer reviewing, preserving and sharing workflows, as well as fundamental statistical and machine learning algorithms.
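
    One of the near-term steps named above, parallelizing independent post-processing of model output, can be sketched with a standard process pool; the directory, file pattern, and per-file work below are placeholders.

      # Sketch: parallelize an embarrassingly parallel post-processing step over model
      # output files with a process pool (file names and per-file work are placeholders).
      from multiprocessing import Pool
      from pathlib import Path

      def process_output(path):
          """Stand-in for a per-file diagnostic (e.g., computing a regional mean)."""
          size = Path(path).stat().st_size
          return path, size

      if __name__ == "__main__":
          files = sorted(str(p) for p in Path("model_output").glob("*.nc"))
          with Pool(processes=4) as pool:
              for path, size in pool.map(process_output, files):
                  print(f"{path}: {size} bytes processed")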

  1. Introducing students to digital geological mapping: A workflow based on cheap hardware and free software

    NASA Astrophysics Data System (ADS)

    Vrabec, Marko; Dolžan, Erazem

    2016-04-01

    The undergraduate field course in Geological Mapping at the University of Ljubljana involves 20-40 students per year, which precludes the use of specialized rugged digital field equipment, as the costs would be far beyond the means of the Department. A different mapping area is selected each year with the aim of providing typical conditions that a professional geologist might encounter when doing fieldwork in Slovenia, which includes rugged relief, dense tree cover, and moderately-well- to poorly-exposed bedrock due to vegetation and urbanization. It is therefore mandatory that the digital tools and workflows are combined with classical methods of fieldwork, since, for example, full-time precise GNSS positioning is not viable under such circumstances. Additionally, due to the prevailing combination of complex geological structure with generally poor exposure, students cannot be expected to produce line (vector) maps of geological contacts on the go, so there is no need for such functionality in the hardware and software we use in the field. Our workflow therefore still relies on paper base maps, but is strongly complemented with digital tools to provide robust positioning, track recording, and acquisition of various point-based data. The primary field hardware is students' Android-based smartphones and, optionally, tablets. For our purposes, the built-in GNSS chips provide adequate positioning precision most of the time, particularly if they are GLONASS-capable. We use Oruxmaps, a powerful free offline map viewer for the Android platform, which facilitates the use of custom-made geopositioned maps. For digital base maps, which we prepare in the free QGIS software on Windows, we use scanned topographic maps provided by the National Geodetic Authority, but also other maps such as aerial imagery, processed Digital Elevation Models, scans of existing geological maps, etc. Point data, like important outcrop locations or structural measurements, are entered into Oruxmaps as waypoints. Students are also encouraged to directly measure structural data with specialized Android apps such as the MVE FieldMove Clino. Digital field data are exported from Oruxmaps to Windows computers primarily in the ubiquitous GPX data format and then integrated in the QGIS environment. Recorded GPX tracks are also used with the free Geosetter Windows software to geoposition and tag any digital photographs taken in the field. With minimal expenses, our workflow provides the students with basic familiarity and experience in using digital field tools and methods. The workflow is also practical enough for the prevailing field conditions of Slovenia that the faculty staff use it in geological mapping for scientific research and consultancy work.
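
    The waypoint hand-off from Oruxmaps to QGIS can be sketched with the Python standard library alone: waypoints are read from a GPX export and written to a CSV that QGIS loads as a delimited-text layer. File names are placeholders; the GPX 1.1 namespace is standard.

      # Sketch: extract waypoints from a GPX export into a CSV loadable by QGIS.
      import csv
      import xml.etree.ElementTree as ET

      NS = {"gpx": "http://www.topografix.com/GPX/1/1"}

      root = ET.parse("field_waypoints.gpx").getroot()
      with open("waypoints.csv", "w", newline="") as out:
          writer = csv.writer(out)
          writer.writerow(["name", "lat", "lon", "elevation"])
          for wpt in root.findall("gpx:wpt", NS):
              name = wpt.findtext("gpx:name", default="", namespaces=NS)
              ele = wpt.findtext("gpx:ele", default="", namespaces=NS)
              writer.writerow([name, wpt.get("lat"), wpt.get("lon"), ele])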

  2. The implementation of e-learning tools to enhance undergraduate bioinformatics teaching and learning: a case study in the National University of Singapore

    PubMed Central

    2009-01-01

    Background: The rapid advancement of computer and information technology in recent years has resulted in the rise of e-learning technologies to enhance and complement traditional classroom teaching in many fields, including bioinformatics. This paper records the experience of implementing e-learning technology to support problem-based learning (PBL) in the teaching of two undergraduate bioinformatics classes at the National University of Singapore. Results: Survey results further established the efficiency and suitability of e-learning tools to supplement PBL in bioinformatics education. 63.16% of year three bioinformatics students showed a positive response regarding the usefulness of the Learning Activity Management System (LAMS) e-learning tool in guiding the learning and discussion process involved in PBL and in enhancing the learning experience by breaking down PBL activities into a sequential workflow. On the other hand, 89.81% of year two bioinformatics students indicated that their revision process was positively impacted by the use of LAMS for guiding the learning process, while 60.19% agreed that the breakdown of activities into a sequential step-by-step workflow by LAMS enhances the learning experience. Conclusion: We show that e-learning tools are useful for supplementing PBL in bioinformatics education. The results suggest that it is feasible to develop and adopt e-learning tools to supplement a variety of instructional strategies in the future. PMID:19958511

  3. The IATH ELAN Text-Sync Tool: A Simple System for Mobilizing ELAN Transcripts On- or Off-Line

    ERIC Educational Resources Information Center

    Dobrin, Lise M.; Ross, Douglas

    2017-01-01

    In this article we present the IATH ELAN Text-Sync Tool (ETST; see http://community.village.virginia.edu/etst), a series of scripts and workflow for playing ELAN files and associated audiovisual media in a web browser either on- or off-line. ELAN has become an indispensable part of documentary linguists' toolkit, but it is less than ideal for…

  4. Information Management Workflow and Tools Enabling Multiscale Modeling Within ICME Paradigm

    NASA Technical Reports Server (NTRS)

    Arnold, Steven M.; Bednarcyk, Brett A.; Austin, Nic; Terentjev, Igor; Cebon, Dave; Marsden, Will

    2016-01-01

    With the increased emphasis on reducing the cost and time to market of new materials, the need for analytical tools that enable the virtual design and optimization of materials throughout their processing - internal structure - property - performance envelope, along with the capturing and storing of the associated material and model information across its lifecycle, has become critical. This need is also fueled by the demands for higher efficiency in material testing; consistency, quality and traceability of data; product design; engineering analysis; as well as control of access to proprietary or sensitive information. Fortunately, material information management systems and physics-based multiscale modeling methods have kept pace with the growing user demands. Herein, recent efforts to establish workflow for and demonstrate a unique set of web application tools for linking NASA GRC's Integrated Computational Materials Engineering (ICME) Granta MI database schema and NASA GRC's Integrated multiscale Micromechanics Analysis Code (ImMAC) software toolset are presented. The goal is to enable seamless coupling between both test data and simulation data, which is captured and tracked automatically within Granta MI®, with full model pedigree information. These tools, and this type of linkage, are foundational to realizing the full potential of ICME, in which materials processing, microstructure, properties, and performance are coupled to enable application-driven design and optimization of materials and structures.

  5. SMITH: a LIMS for handling next-generation sequencing workflows

    PubMed Central

    2014-01-01

    Background Life-science laboratories make increasing use of Next Generation Sequencing (NGS) for studying bio-macromolecules and their interactions. Array-based methods for measuring gene expression or protein-DNA interactions are being replaced by RNA-Seq and ChIP-Seq. Sequencing is generally performed by specialized facilities that have to keep track of sequencing requests, trace samples, ensure quality and make data available according to predefined privileges. An integrated tool helps to troubleshoot problems, to maintain a high quality standard, to reduce time and costs. Commercial and non-commercial tools called LIMS (Laboratory Information Management Systems) are available for this purpose. However, they often come at prohibitive cost and/or lack the flexibility and scalability needed to adjust seamlessly to the frequently changing protocols employed. In order to manage the flow of sequencing data produced at the Genomic Unit of the Italian Institute of Technology (IIT), we developed SMITH (Sequencing Machine Information Tracking and Handling). Methods SMITH is a web application with a MySQL server at the backend. Wet-lab scientists of the Centre for Genomic Science and database experts from the Politecnico of Milan in the context of a Genomic Data Model Project developed SMITH. The database schema stores all the information of an NGS experiment, including the descriptions of all protocols and algorithms used in the process. Notably, an attribute-value table allows associating an unconstrained textual description to each sample and all the data produced afterwards. This method permits the creation of metadata that can be used to search the database for specific files as well as for statistical analyses. Results SMITH runs automatically and limits direct human interaction mainly to administrative tasks. SMITH data-delivery procedures were standardized making it easier for biologists and analysts to navigate the data. Automation also helps save time. The workflows are available through an API provided by the workflow management system. The parameters and input data are passed to the workflow engine that performs de-multiplexing, quality control, alignments, etc. Conclusions SMITH standardizes, automates, and speeds up sequencing workflows. Annotation of data with key-value pairs facilitates meta-analysis. PMID:25471934
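
    The attribute-value approach described above is easy to illustrate. The sketch below is not SMITH's schema (SMITH is backed by MySQL); it uses Python's built-in sqlite3 module and invented sample names purely to show how free-text key-value metadata supports later searches and meta-analyses.

      import sqlite3

      # Minimal attribute-value (EAV) metadata table in the spirit of the design
      # described above; sqlite3 keeps the sketch self-contained.
      con = sqlite3.connect(":memory:")
      con.execute("CREATE TABLE sample (id INTEGER PRIMARY KEY, name TEXT)")
      con.execute("""CREATE TABLE sample_attribute (
                         sample_id INTEGER REFERENCES sample(id),
                         key TEXT, value TEXT)""")

      sid = con.execute("INSERT INTO sample (name) VALUES ('RNASeq_run_42')").lastrowid
      con.executemany(
          "INSERT INTO sample_attribute (sample_id, key, value) VALUES (?, ?, ?)",
          [(sid, "organism", "M. musculus"),
           (sid, "protocol", "TruSeq stranded mRNA"),
           (sid, "read_length", "75")])

      # Unconstrained keys make ad hoc searches straightforward.
      rows = con.execute(
          "SELECT s.name, a.value FROM sample s "
          "JOIN sample_attribute a ON a.sample_id = s.id "
          "WHERE a.key = 'organism'").fetchall()
      print(rows)  # -> [('RNASeq_run_42', 'M. musculus')]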

  6. nmsBuilder: Freeware to create subject-specific musculoskeletal models for OpenSim.

    PubMed

    Valente, Giordano; Crimi, Gianluigi; Vanella, Nicola; Schileo, Enrico; Taddei, Fulvia

    2017-12-01

    Musculoskeletal modeling and simulations of movement have been increasingly used in orthopedic and neurological scenarios, with increased attention to subject-specific applications. In general, musculoskeletal modeling applications have been facilitated by the development of dedicated software tools; however, subject-specific studies have been limited also by time-consuming modeling workflows and high skilled expertise required. In addition, no reference tools exist to standardize the process of musculoskeletal model creation and make it more efficient. Here we present a freely available software application, nmsBuilder 2.0, to create musculoskeletal models in the file format of OpenSim, a widely-used open-source platform for musculoskeletal modeling and simulation. nmsBuilder 2.0 is the result of a major refactoring of a previous implementation that moved a first step toward an efficient workflow for subject-specific model creation. nmsBuilder includes a graphical user interface that provides access to all functionalities, based on a framework for computer-aided medicine written in C++. The operations implemented can be used in a workflow to create OpenSim musculoskeletal models from 3D surfaces. A first step includes data processing to create supporting objects necessary to create models, e.g. surfaces, anatomical landmarks, reference systems; and a second step includes the creation of OpenSim objects, e.g. bodies, joints, muscles, and the corresponding model. We present a case study using nmsBuilder 2.0: the creation of an MRI-based musculoskeletal model of the lower limb. The model included four rigid bodies, five degrees of freedom and 43 musculotendon actuators, and was created from 3D surfaces of the segmented images of a healthy subject through the modeling workflow implemented in the software application. We have presented nmsBuilder 2.0 for the creation of musculoskeletal OpenSim models from image-based data, and made it freely available via nmsbuilder.org. This application provides an efficient workflow for model creation and helps standardize the process. We hope this would help promote personalized applications in musculoskeletal biomechanics, including larger sample size studies, and might also represent a basis for future developments for specific applications. Copyright © 2017 Elsevier B.V. All rights reserved.

  7. SMITH: a LIMS for handling next-generation sequencing workflows.

    PubMed

    Venco, Francesco; Vaskin, Yuriy; Ceol, Arnaud; Muller, Heiko

    2014-01-01

    Life-science laboratories make increasing use of Next Generation Sequencing (NGS) for studying bio-macromolecules and their interactions. Array-based methods for measuring gene expression or protein-DNA interactions are being replaced by RNA-Seq and ChIP-Seq. Sequencing is generally performed by specialized facilities that have to keep track of sequencing requests, trace samples, ensure quality and make data available according to predefined privileges. An integrated tool helps to troubleshoot problems, to maintain a high quality standard, to reduce time and costs. Commercial and non-commercial tools called LIMS (Laboratory Information Management Systems) are available for this purpose. However, they often come at prohibitive cost and/or lack the flexibility and scalability needed to adjust seamlessly to the frequently changing protocols employed. In order to manage the flow of sequencing data produced at the Genomic Unit of the Italian Institute of Technology (IIT), we developed SMITH (Sequencing Machine Information Tracking and Handling). SMITH is a web application with a MySQL server at the backend. Wet-lab scientists of the Centre for Genomic Science and database experts from the Politecnico of Milan in the context of a Genomic Data Model Project developed SMITH. The database schema stores all the information of an NGS experiment, including the descriptions of all protocols and algorithms used in the process. Notably, an attribute-value table allows associating an unconstrained textual description to each sample and all the data produced afterwards. This method permits the creation of metadata that can be used to search the database for specific files as well as for statistical analyses. SMITH runs automatically and limits direct human interaction mainly to administrative tasks. SMITH data-delivery procedures were standardized making it easier for biologists and analysts to navigate the data. Automation also helps save time. The workflows are available through an API provided by the workflow management system. The parameters and input data are passed to the workflow engine that performs de-multiplexing, quality control, alignments, etc. SMITH standardizes, automates, and speeds up sequencing workflows. Annotation of data with key-value pairs facilitates meta-analysis.

  8. Big Data Smart Socket (BDSS): a system that abstracts data transfer habits from end users.

    PubMed

    Watts, Nicholas A; Feltus, Frank A

    2017-02-15

    The ability to centralize and store data for long periods on an end user's computational resources is increasingly difficult for many scientific disciplines. For example, genomics data is increasingly large and distributed, and the data needs to be moved into workflow execution sites ranging from lab workstations to the cloud. However, the typical user is not always informed on emerging network technology or the most efficient methods to move and share data. Thus, the user defaults to using inefficient methods for transfer across the commercial internet. To accelerate large data transfer, we created a tool called the Big Data Smart Socket (BDSS) that abstracts data transfer methodology from the user. The user provides BDSS with a manifest of datasets stored in a remote storage repository. BDSS then queries a metadata repository for curated data transfer mechanisms and optimal path to move each of the files in the manifest to the site of workflow execution. BDSS functions as a standalone tool or can be directly integrated into a computational workflow such as provided by the Galaxy Project. To demonstrate applicability, we use BDSS within a biological context, although it is applicable to any scientific domain. BDSS is available under version 2 of the GNU General Public License at https://github.com/feltus/BDSS . ffeltus@clemson.edu. © The Author 2016. Published by Oxford University Press.
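
    The core idea (look up a curated transfer mechanism for each entry in a manifest instead of defaulting to plain HTTP) can be sketched in a few lines. This is an illustration of the concept only, not BDSS code or its API; the URLs, registry contents and tool names are invented.

      # Illustrative sketch: given a manifest of dataset URLs and a small registry of
      # curated transfer mechanisms, pick the preferred mechanism per file and fall
      # back to a generic HTTP client. In BDSS this registry is a metadata repository.
      MANIFEST = [
          "gridftp://data.example.org/genomes/sample_001.fastq.gz",
          "https://mirror.example.edu/genomes/sample_002.fastq.gz",
      ]

      MECHANISMS = {  # hypothetical registry entries
          "gridftp://": {"tool": "globus-url-copy", "priority": 1},
          "https://": {"tool": "curl", "priority": 2},
      }

      def choose_mechanism(url):
          matches = [(m["priority"], m["tool"])
                     for prefix, m in MECHANISMS.items() if url.startswith(prefix)]
          return min(matches)[1] if matches else "curl"

      for url in MANIFEST:
          print(f"{url} -> transfer with {choose_mechanism(url)}")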

  9. Big Data Smart Socket (BDSS): a system that abstracts data transfer habits from end users

    PubMed Central

    Watts, Nicholas A.

    2017-01-01

    Motivation: The ability to centralize and store data for long periods on an end user’s computational resources is increasingly difficult for many scientific disciplines. For example, genomics data is increasingly large and distributed, and the data needs to be moved into workflow execution sites ranging from lab workstations to the cloud. However, the typical user is not always informed on emerging network technology or the most efficient methods to move and share data. Thus, the user defaults to using inefficient methods for transfer across the commercial internet. Results: To accelerate large data transfer, we created a tool called the Big Data Smart Socket (BDSS) that abstracts data transfer methodology from the user. The user provides BDSS with a manifest of datasets stored in a remote storage repository. BDSS then queries a metadata repository for curated data transfer mechanisms and optimal path to move each of the files in the manifest to the site of workflow execution. BDSS functions as a standalone tool or can be directly integrated into a computational workflow such as provided by the Galaxy Project. To demonstrate applicability, we use BDSS within a biological context, although it is applicable to any scientific domain. Availability and Implementation: BDSS is available under version 2 of the GNU General Public License at https://github.com/feltus/BDSS. Contact: ffeltus@clemson.edu PMID:27797780

  10. Profiling of Histone Post-Translational Modifications in Mouse Brain with High-Resolution Top-Down Mass Spectrometry

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhou, Mowei; Paša-Tolić, Ljiljana; Stenoien, David L.

    Histones play central roles in most chromosomal functions and both their basic biology and roles in disease have been the subject of intense study. Since multiple PTMs along the entire protein sequence are potential regulators of histones, a top-down approach, where intact proteins are analyzed, is ultimately required for complete characterization of proteoforms. However, significant challenges remain for top-down histone analysis primarily because of deficiencies in separation/resolving power and effective identification algorithms. Here, we used state-of-the-art mass spectrometry and a bioinformatics workflow for targeted data analysis and visualization. The workflow uses ProMex for intact mass deconvolution, MSPathFinder as the search engine, and LcMsSpectator as a data visualization tool. ProMex sums across retention time to maximize sensitivity and accuracy for low abundance species in MS1 deconvolution. MSPathFinder searches the MS2 data against protein sequence databases with user-defined modifications. LcMsSpectator presents the results from ProMex and MSPathFinder in a format that allows quick manual evaluation of critical attributes for high-confidence identifications. When complemented with the open-modification tool TopPIC, this workflow enabled identification of novel histone PTMs including tyrosine bromination on histone H4 and H2A, H3 glutathionylation, and mapping of conventional PTMs along the entire protein for many histone subunits.

  11. Using ESO Reflex with Web Services

    NASA Astrophysics Data System (ADS)

    Järveläinen, P.; Savolainen, V.; Oittinen, T.; Maisala, S.; Ullgrén, M.; Hook, R.

    2008-08-01

    ESO Reflex is a prototype graphical workflow system, based on Taverna, and primarily intended to be a flexible way of running ESO data reduction recipes along with other legacy applications and user-written tools. ESO Reflex can also readily use the Taverna Web Services features that are based on the Apache Axis SOAP implementation. Taverna is a general purpose Web Service client, and requires no programming to use such services. However, Taverna also has some restrictions: for example, no numerical types such as integers. In addition, the preferred binding style is document/literal wrapped, but most astronomical services publish the Axis default WSDL using RPC/encoded style. Despite these minor limitations we have created a simple but very promising test VO workflow using the Sesame name resolver service at CDS Strasbourg, the Hubble SIAP server at the Multi-Mission Archive at Space Telescope (MAST) and the WESIX image cataloging and catalogue cross-referencing service at the University of Pittsburgh. ESO Reflex can also pass files and URIs via the PLASTIC protocol to visualisation tools and has its own viewer for VOTables. We picked these three Web Services to try to set up a realistic and useful ESO Reflex workflow. They also demonstrate ESO Reflex's ability to use many kinds of Web Services, because each of them requires a different interface. We describe each of these services in turn and comment on how it was used.
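
    For readers working in Python rather than Taverna, the name-resolution step of this workflow can be reproduced with astropy, whose SkyCoord.from_name helper queries the same CDS Sesame service. This is only an analogous sketch, not part of ESO Reflex, and it requires network access.

      from astropy.coordinates import SkyCoord

      # Resolve an object name to ICRS coordinates via the CDS Sesame resolver,
      # mirroring the first step of the test workflow described above.
      target = SkyCoord.from_name("M 51")
      print(f"M 51 -> RA = {target.ra.deg:.5f} deg, Dec = {target.dec.deg:.5f} deg")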

  12. Light-weight Parallel Python Tools for Earth System Modeling Workflows

    NASA Astrophysics Data System (ADS)

    Mickelson, S. A.; Paul, K.; Xu, H.; Dennis, J.; Brown, D. I.

    2015-12-01

    With the growth in computing power over the last 30 years, earth system modeling codes have become increasingly data-intensive. As an example, it is expected that the data required for the next Intergovernmental Panel on Climate Change (IPCC) Assessment Report (AR6) will increase by more than 10x to an expected 25PB per climate model. Faced with this daunting challenge, developers of the Community Earth System Model (CESM) have chosen to change the format of their data for long-term storage from time-slice to time-series, in order to reduce the required download bandwidth needed for later analysis and post-processing by climate scientists. Hence, efficient tools are required to (1) perform the transformation of the data from time-slice to time-series format and to (2) compute climatology statistics, needed for many diagnostic computations, on the resulting time-series data. To address the first of these two challenges, we have developed a parallel Python tool for converting time-slice model output to time-series format. To address the second of these challenges, we have developed a parallel Python tool to perform fast time-averaging of time-series data. These tools are designed to be light-weight, be easy to install, have very few dependencies, and can be easily inserted into the Earth system modeling workflow with negligible disruption. In this work, we present the motivation, approach, and testing results of these two light-weight parallel Python tools, as well as our plans for future research and development.
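
    The time-slice to time-series transformation can be pictured with a deliberately simplified serial sketch: gather one variable from a sequence of per-timestep history files into a single time-series file. The actual tools parallelize this over variables and preserve full metadata; the netCDF4 package, the file names, and the variable name below are assumptions.

      import glob
      from netCDF4 import Dataset  # assumed dependency (pip install netCDF4)

      VAR = "TS"                                      # hypothetical variable name
      slices = sorted(glob.glob("history.????-??.nc"))

      with Dataset(slices[0]) as first:
          ny = len(first.dimensions["lat"])
          nx = len(first.dimensions["lon"])

      with Dataset(f"{VAR}.timeseries.nc", "w") as out:
          out.createDimension("time", None)           # unlimited record dimension
          out.createDimension("lat", ny)
          out.createDimension("lon", nx)
          vout = out.createVariable(VAR, "f4", ("time", "lat", "lon"))
          for t, path in enumerate(slices):
              with Dataset(path) as ds:               # one time step per slice file
                  vout[t, :, :] = ds.variables[VAR][0, :, :]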

  13. The UEA sRNA Workbench (version 4.4): a comprehensive suite of tools for analyzing miRNAs and sRNAs.

    PubMed

    Stocks, Matthew B; Mohorianu, Irina; Beckers, Matthew; Paicu, Claudia; Moxon, Simon; Thody, Joshua; Dalmay, Tamas; Moulton, Vincent

    2018-05-02

    RNA interference, a highly conserved regulatory mechanism, is mediated via small RNAs. Recent technical advances enabled the analysis of larger, complex datasets and the investigation of microRNAs and the less known small interfering RNAs. However, the size and intricacy of current data require a comprehensive set of tools able to discriminate the patterns from the low-level, noise-like variation; numerous and varied suggestions from the community represent an invaluable source of ideas for future tools, so the ability of the community to contribute to this software is essential. We present a new version of the UEA sRNA Workbench, reconfigured to allow an easy insertion of new tools/workflows. In its released form, it comprises a suite of tools in a user-friendly environment, with enhanced capabilities for comprehensive processing of sRNA-seq data, e.g. tools for accurate prediction of sRNA loci (CoLIde) and miRNA loci (miRCat2), as well as workflows that guide users through common first steps in sRNA-seq analyses, such as quality checking of the input data, normalization of abundances, or detection of differential expression. The UEA sRNA Workbench is available at: http://srna-workbench.cmp.uea.ac.uk The source code is available at: https://github.com/sRNAworkbenchuea/UEA_sRNA_Workbench. v.moulton@uea.ac.uk.

  14. An Internet supported workflow for the publication process in UMVF (French Virtual Medical University).

    PubMed

    Renard, Jean-Marie; Bourde, Annabel; Cuggia, Marc; Garcelon, Nicolas; Souf, Nathalie; Darmoni, Stephan; Beuscart, Régis; Brunetaud, Jean-Marc

    2007-01-01

    The " Université Médicale Virtuelle Francophone" (UMVF) is a federation of French medical schools. Its main goal is to share the production and use of pedagogic medical resources generated by academic medical teachers. We developed an Open-Source application based upon a workflow system, which provides an improved publication process for the UMVF. For teachers, the tool permits easy and efficient upload of new educational resources. For web masters it provides a mechanism to easily locate and validate the resources. For librarian it provide a way to improve the efficiency of indexation. For all, the utility provides a workflow system to control the publication process. On the students side, the application improves the value of the UMVF repository by facilitating the publication of new resources and by providing an easy way to find a detailed description of a resource and to check any resource from the UMVF to ascertain its quality and integrity, even if the resource is an old deprecated version. The server tier of the application is used to implement the main workflow functionalities and is deployed on certified UMVF servers using the PHP language, an LDAP directory and an SQL database. The client tier of the application provides both the workflow and the search and check functionalities. A unique signature for each resource, was needed to provide security functionality and is implemented using a Digest algorithm. The testing performed by Rennes and Lille verified the functionality and conformity with our specifications.

  15. Unrealized potential and residual consequences of electronic prescribing on pharmacy workflow in the outpatient pharmacy.

    PubMed

    Nanji, Karen C; Rothschild, Jeffrey M; Boehne, Jennifer J; Keohane, Carol A; Ash, Joan S; Poon, Eric G

    2014-01-01

    Electronic prescribing systems have often been promoted as a tool for reducing medication errors and adverse drug events. Recent evidence has revealed that adoption of electronic prescribing systems can lead to unintended consequences such as the introduction of new errors. The purpose of this study is to identify and characterize the unrealized potential and residual consequences of electronic prescribing on pharmacy workflow in an outpatient pharmacy. A multidisciplinary team conducted direct observations of workflow in an independent pharmacy and semi-structured interviews with pharmacy staff members about their perceptions of the unrealized potential and residual consequences of electronic prescribing systems. We used qualitative methods to iteratively analyze text data using a grounded theory approach, and derive a list of major themes and subthemes related to the unrealized potential and residual consequences of electronic prescribing. We identified the following five themes: Communication, workflow disruption, cost, technology, and opportunity for new errors. These contained 26 unique subthemes representing different facets of our observations and the pharmacy staff's perceptions of the unrealized potential and residual consequences of electronic prescribing. We offer targeted solutions to improve electronic prescribing systems by addressing the unrealized potential and residual consequences that we identified. These recommendations may be applied not only to improve staff perceptions of electronic prescribing systems but also to improve the design and/or selection of these systems in order to optimize communication and workflow within pharmacies while minimizing both cost and the potential for the introduction of new errors.

  16. Automatic DNA Diagnosis for 1D Gel Electrophoresis Images using Bio-image Processing Technique.

    PubMed

    Intarapanich, Apichart; Kaewkamnerd, Saowaluck; Shaw, Philip J; Ukosakit, Kittipat; Tragoonrung, Somvong; Tongsima, Sissades

    2015-01-01

    DNA gel electrophoresis is a molecular biology technique for separating different sizes of DNA fragments. Applications of DNA gel electrophoresis include DNA fingerprinting (genetic diagnosis), size estimation of DNA, and DNA separation for Southern blotting. Accurate interpretation of DNA banding patterns from electrophoretic images can be laborious and error prone when a large number of bands are interrogated manually. Although many bio-imaging techniques have been proposed, none of them can fully automate the typing of DNA owing to the complexities of migration patterns typically obtained. We developed an image-processing tool that automatically calls genotypes from DNA gel electrophoresis images. The image processing workflow comprises three main steps: 1) lane segmentation, 2) extraction of DNA bands and 3) band genotyping classification. The tool was originally intended to facilitate large-scale genotyping analysis of sugarcane cultivars. We tested the proposed tool on 10 gel images (433 cultivars) obtained from polyacrylamide gel electrophoresis (PAGE) of PCR amplicons for detecting intron length polymorphisms (ILP) on one locus of the sugarcanes. These gel images demonstrated many challenges in automated lane/band segmentation in image processing including lane distortion, band deformity, high degree of noise in the background, and bands that are very close together (doublets). Using the proposed bio-imaging workflow, lanes and DNA bands contained within are properly segmented, even for adjacent bands with aberrant migration that cannot be separated by conventional techniques. The software, called GELect, automatically performs genotype calling on each lane by comparing with an all-banding reference, which was created by clustering the existing bands into the non-redundant set of reference bands. The automated genotype calling results were verified by independent manual typing by molecular biologists. This work presents an automated genotyping tool from DNA gel electrophoresis images, called GELect, which was written in Java and made available through the imageJ framework. With a novel automated image processing workflow, the tool can accurately segment lanes from a gel matrix, intelligently extract distorted and even doublet bands that are difficult to identify by existing image processing tools. Consequently, genotyping from DNA gel electrophoresis can be performed automatically allowing users to efficiently conduct large scale DNA fingerprinting via DNA gel electrophoresis. The software is freely available from http://www.biotec.or.th/gi/tools/gelect.
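
    The first stage of such a workflow, lane segmentation, can be illustrated in a deliberately simplified form: on a grayscale gel image, lane centres show up as peaks in the column-wise mean intensity profile. The sketch below (using numpy and scipy on a synthetic image) is illustrative only and is not GELect's algorithm, which additionally copes with lane distortion, doublets and noisy backgrounds.

      import numpy as np
      from scipy.signal import find_peaks

      def segment_lanes(gel, min_spacing=30, min_prominence=5.0):
          """Locate candidate lane centres as peaks of the column-wise profile.
          `gel` is a 2-D grayscale array with bright bands on a dark background."""
          profile = gel.mean(axis=0)
          peaks, _ = find_peaks(profile, distance=min_spacing, prominence=min_prominence)
          return peaks

      # Toy example: a synthetic 200 x 300 "gel" with three bright lanes.
      rng = np.random.default_rng(0)
      gel = rng.normal(10, 2, size=(200, 300))
      for centre in (60, 150, 240):
          gel[:, centre - 5:centre + 5] += 50
      print(segment_lanes(gel))   # -> three peaks, one near each of 60, 150, 240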

  17. Service Oriented Architecture for Coast Guard Command and Control

    DTIC Science & Technology

    2007-03-01

    Acronyms defined in the report include BPEL4WS (Business Process Execution Language for Web Services), BPMN (Business Process Modeling Notation) and CASP (Computer Aided Search Planning). Business Process Modeling Notation (BPMN) provides a standardized graphical notation for drawing business processes in a workflow. Software tools…

  18. Genetic design automation: engineering fantasy or scientific renewal?

    PubMed

    Lux, Matthew W; Bramlett, Brian W; Ball, David A; Peccoud, Jean

    2012-02-01

    The aim of synthetic biology is to make genetic systems more amenable to engineering, which has naturally led to the development of computer-aided design (CAD) tools. Experimentalists still primarily rely on project-specific ad hoc workflows instead of domain-specific tools, which suggests that CAD tools are lagging behind the front line of the field. Here, we discuss the scientific hurdles that have limited the productivity gains anticipated from existing tools. We argue that the real value of efforts to develop CAD tools is the formalization of genetic design rules that determine the complex relationships between genotype and phenotype. Copyright © 2011 Elsevier Ltd. All rights reserved.

  19. Epiviz: a view inside the design of an integrated visual analysis software for genomics

    PubMed Central

    2015-01-01

    Background Computational and visual data analysis for genomics has traditionally involved a combination of tools and resources, of which the most ubiquitous consist of genome browsers, focused mainly on integrative visualization of large numbers of big datasets, and computational environments, focused on data modeling of a small number of moderately sized datasets. Workflows that involve the integration and exploration of multiple heterogeneous data sources, small and large, public and user specific have been poorly addressed by these tools. In our previous work, we introduced Epiviz, which bridges the gap between the two types of tools, simplifying these workflows. Results In this paper we expand on the design decisions behind Epiviz, and introduce a series of new advanced features that further support the type of interactive exploratory workflow we have targeted. We discuss three ways in which Epiviz advances the field of genomic data analysis: 1) it brings code to interactive visualizations at various different levels; 2) takes the first steps in the direction of collaborative data analysis by incorporating user plugins from source control providers, as well as by allowing analysis states to be shared among the scientific community; 3) combines established analysis features that have never before been available simultaneously in a genome browser. In our discussion section, we present security implications of the current design, as well as a series of limitations and future research steps. Conclusions Since many of the design choices of Epiviz are novel in genomics data analysis, this paper serves both as a document of our own approaches with lessons learned, as well as a start point for future efforts in the same direction for the genomics community. PMID:26328750

  20. Data Integration Tool: Permafrost Data Debugging

    NASA Astrophysics Data System (ADS)

    Wilcox, H.; Schaefer, K. M.; Jafarov, E. E.; Pulsifer, P. L.; Strawhacker, C.; Yarmey, L.; Basak, R.

    2017-12-01

    We developed a Data Integration Tool (DIT) to significantly speed up the time of manual processing needed to translate inconsistent, scattered historical permafrost data into files ready to ingest directly into the Global Terrestrial Network-Permafrost (GTN-P). The United States National Science Foundation funded this project through the National Snow and Ice Data Center (NSIDC) with the GTN-P to improve permafrost data access and discovery. We leverage this data to support science research and policy decisions. DIT is a workflow manager that divides data preparation and analysis into a series of steps or operations called widgets (https://github.com/PermaData/DIT). Each widget does a specific operation, such as read, multiply by a constant, sort, plot, and write data. DIT allows the user to select and order the widgets as desired to meet their specific needs, incrementally interact with and evolve the widget workflows, and save those workflows for reproducibility. Taking ideas from visual programming found in the art and design domain, debugging and iterative design principles from software engineering, and the scientific data processing and analysis power of Fortran and Python, it was written for interactive, iterative data manipulation, quality control, processing, and analysis of inconsistent data in an easily installable application. DIT was used to completely translate one dataset (133 sites) that was successfully added to GTN-P, to nearly complete the translation of three datasets (270 sites), and is scheduled to translate 10 more datasets (~1000 sites) from the legacy inactive site data holdings of the Frozen Ground Data Center (FGDC). Iterative development has provided the permafrost and wider scientific community with an extendable tool designed specifically for the iterative process of translating unruly data.
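
    The widget idea (each widget performing exactly one operation, with a workflow being an ordered chain of widgets) can be sketched as a simple composable pipeline. This shows the general pattern only, with invented values, and is not DIT's implementation.

      # Each widget is a small callable doing one operation; a workflow is an
      # ordered list of widgets applied in sequence.
      def source(values):                  # stand-in for a "read" widget
          return lambda _: list(values)

      def multiply_by(constant):           # e.g. a unit-conversion step
          return lambda data: [v * constant for v in data]

      def sort_ascending():
          return lambda data: sorted(data)

      def run_workflow(widgets, data=None):
          for widget in widgets:
              data = widget(data)
          return data

      # Hypothetical ground-temperature values.
      workflow = [source([-5.2, 0.0, 3.7, -1.1]), multiply_by(10), sort_ascending()]
      print(run_workflow(workflow))        # -> [-52.0, -11.0, 0.0, 37.0]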

  1. In-depth evaluation of software tools for data-independent acquisition based label-free quantification.

    PubMed

    Kuharev, Jörg; Navarro, Pedro; Distler, Ute; Jahn, Olaf; Tenzer, Stefan

    2015-09-01

    Label-free quantification (LFQ) based on data-independent acquisition workflows currently experiences increasing popularity. Several software tools have been recently published or are commercially available. The present study focuses on the evaluation of three different software packages (Progenesis, synapter, and ISOQuant) supporting ion mobility enhanced data-independent acquisition data. In order to benchmark the LFQ performance of the different tools, we generated two hybrid proteome samples of defined quantitative composition containing tryptically digested proteomes of three different species (mouse, yeast, Escherichia coli). This model dataset simulates complex biological samples containing large numbers of both unregulated (background) proteins as well as up- and downregulated proteins with exactly known ratios between samples. We determined the number and dynamic range of quantifiable proteins and analyzed the influence of applied algorithms (retention time alignment, clustering, normalization, etc.) on quantification results. Analysis of technical reproducibility revealed median coefficients of variation of reported protein abundances below 5% for MS(E) data for Progenesis and ISOQuant. Regarding accuracy of LFQ, evaluation with synapter and ISOQuant yielded superior results compared to Progenesis. In addition, we discuss reporting formats and user friendliness of the software packages. The data generated in this study have been deposited to the ProteomeXchange Consortium with identifier PXD001240 (http://proteomecentral.proteomexchange.org/dataset/PXD001240). © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  2. Redesign of a computerized clinical reminder for colorectal cancer screening: a human-computer interaction evaluation

    PubMed Central

    2011-01-01

    Background Based on barriers to the use of computerized clinical decision support (CDS) learned in an earlier field study, we prototyped design enhancements to the Veterans Health Administration's (VHA's) colorectal cancer (CRC) screening clinical reminder to compare against the VHA's current CRC reminder. Methods In a controlled simulation experiment, 12 primary care providers (PCPs) used prototypes of the current and redesigned CRC screening reminder in a within-subject comparison. Quantitative measurements were based on a usability survey, workload assessment instrument, and workflow integration survey. We also collected qualitative data on both designs. Results Design enhancements to the VHA's existing CRC screening clinical reminder positively impacted aspects of usability and workflow integration but not workload. The qualitative analysis revealed broad support across participants for the design enhancements with specific suggestions for improving the reminder further. Conclusions This study demonstrates the value of a human-computer interaction evaluation in informing the redesign of information tools to foster uptake, integration into workflow, and use in clinical practice. PMID:22126324

  3. Clinical, information and business process modeling to promote development of safe and flexible software.

    PubMed

    Liaw, Siaw-Teng; Deveny, Elizabeth; Morrison, Iain; Lewis, Bryn

    2006-09-01

    Using a factorial vignette survey and modeling methodology, we developed clinical and information models - incorporating evidence base, key concepts, relevant terms, decision-making and workflow needed to practice safely and effectively - to guide the development of an integrated rule-based knowledge module to support prescribing decisions in asthma. We identified workflows, decision-making factors, factor use, and clinician information requirements. The Unified Modeling Language (UML) and public domain software and knowledge engineering tools (e.g. Protégé) were used, with the Australian GP Data Model as the starting point for expressing information needs. A Web Services service-oriented architecture approach was adopted within which to express functional needs, and clinical processes and workflows were expressed in the Business Process Execution Language (BPEL). This formal analysis and modeling methodology to define and capture the process and logic of prescribing best practice in a reference implementation is fundamental to tackling deficiencies in prescribing decision support software.

  4. A computational workflow for designing silicon donor qubits

    DOE PAGES

    Humble, Travis S.; Ericson, M. Nance; Jakowski, Jacek; ...

    2016-09-19

    Developing devices that can reliably and accurately demonstrate the principles of superposition and entanglement is an on-going challenge for the quantum computing community. Modeling and simulation offer attractive means of testing early device designs and establishing expectations for operational performance. However, the complex integrated material systems required by quantum device designs are not captured by any single existing computational modeling method. We examine the development and analysis of a multi-staged computational workflow that can be used to design and characterize silicon donor qubit systems with modeling and simulation. Our approach integrates quantum chemistry calculations with electrostatic field solvers to perform detailed simulations of a phosphorus dopant in silicon. We show how atomistic details can be synthesized into an operational model for the logical gates that define quantum computation in this particular technology. In conclusion, the resulting computational workflow realizes a design tool for silicon donor qubits that can help verify and validate current and near-term experimental devices.

  5. A python framework for environmental model uncertainty analysis

    USGS Publications Warehouse

    White, Jeremy; Fienen, Michael N.; Doherty, John E.

    2016-01-01

    We have developed pyEMU, a Python framework for Environmental Modeling Uncertainty analyses: an open-source tool that is non-intrusive, easy-to-use, computationally efficient, and scalable to highly-parameterized inverse problems. The framework implements several types of linear (first-order, second-moment (FOSM)) and non-linear uncertainty analyses. The FOSM-based analyses can also be completed prior to parameter estimation to help inform important modeling decisions, such as parameterization and objective function formulation. Complete workflows for several types of FOSM-based and non-linear analyses are documented in example notebooks implemented using Jupyter that are available in the online pyEMU repository. Example workflows include basic parameter and forecast analyses, data worth analyses, and error-variance analyses, as well as usage of parameter ensemble generation and management capabilities. These workflows document the necessary steps and provide insights into the results, with the goal of educating users not only in how to apply pyEMU, but also in the underlying theory of applied uncertainty quantification.
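
    A typical FOSM session with pyEMU follows the pattern sketched below. The file names are hypothetical, and a Jacobian matrix and PEST control file from a prior PEST/PEST++ run are assumed to exist; consult the pyEMU notebooks for complete, tested workflows.

      import pyemu

      # Schur-complement (FOSM) analysis: prior vs. posterior uncertainty for
      # parameters and forecasts, given an existing Jacobian and control file.
      sc = pyemu.Schur(jco="model.jcb", pst="model.pst")

      par_summary = sc.get_parameter_summary()    # parameter uncertainty reduction
      print(par_summary.head())

      fc_summary = sc.get_forecast_summary()      # forecast (prediction) uncertainty
      print(fc_summary)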

  6. Applying Content Management to Automated Provenance Capture

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Schuchardt, Karen L.; Gibson, Tara D.; Stephan, Eric G.

    2008-04-10

    Workflows and data pipelines are becoming increasingly valuable in both computational and experimental sciences. These automated systems are capable of generating significantly more data within the same amount of time than their manual counterparts. Automatically capturing and recording data provenance and annotation as part of these workflows is critical for data management, verification, and dissemination. Our goal in addressing the provenance challenge was to develop an end-to-end system that demonstrates real-time capture, persistent content management, and ad-hoc searches of both provenance and metadata using open source software and standard protocols. We describe our prototype, which extends the Kepler workflow tools for the execution environment, the Scientific Annotation Middleware (SAM) content management software for data services, and an existing HTTP-based query protocol. Our implementation offers several unique capabilities, and through the use of standards, is able to provide access to the provenance record to a variety of commonly available client tools.

  7. SimpleITK Image-Analysis Notebooks: a Collaborative Environment for Education and Reproducible Research.

    PubMed

    Yaniv, Ziv; Lowekamp, Bradley C; Johnson, Hans J; Beare, Richard

    2018-06-01

    Modern scientific endeavors increasingly require team collaborations to construct and interpret complex computational workflows. This work describes an image-analysis environment that supports the use of computational tools that facilitate reproducible research and support scientists with varying levels of software development skills. The Jupyter notebook web application is the basis of an environment that enables flexible, well-documented, and reproducible workflows via literate programming. Image-analysis software development is made accessible to scientists with varying levels of programming experience via the use of the SimpleITK toolkit, a simplified interface to the Insight Segmentation and Registration Toolkit. Additional features of the development environment include user friendly data sharing using online data repositories and a testing framework that facilitates code maintenance. SimpleITK provides a large number of examples illustrating educational and research-oriented image analysis workflows for free download from GitHub under an Apache 2.0 license: github.com/InsightSoftwareConsortium/SimpleITK-Notebooks .
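
    A notebook cell in the spirit of this collection might look like the short sketch below: read an image, smooth it, and derive a simple Otsu segmentation with SimpleITK. The input file name is a placeholder and the pipeline is illustrative rather than taken from the published notebooks.

      import SimpleITK as sitk

      image = sitk.ReadImage("ct_volume.nii.gz", sitk.sitkFloat32)  # hypothetical file

      # Light edge-preserving smoothing before thresholding.
      smoothed = sitk.CurvatureFlow(image1=image, timeStep=0.125, numberOfIterations=5)

      otsu = sitk.OtsuThresholdImageFilter()
      otsu.SetInsideValue(0)
      otsu.SetOutsideValue(1)
      mask = otsu.Execute(smoothed)

      sitk.WriteImage(mask, "ct_volume_otsu_mask.nii.gz")
      print("Foreground voxels:", sitk.GetArrayFromImage(mask).sum())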

  8. Cyberinfrastructure for End-to-End Environmental Explorations

    NASA Astrophysics Data System (ADS)

    Merwade, V.; Kumar, S.; Song, C.; Zhao, L.; Govindaraju, R.; Niyogi, D.

    2007-12-01

    The design and implementation of a cyberinfrastructure for End-to-End Environmental Exploration (C4E4) is presented. The C4E4 framework addresses the need for an integrated data/computation platform for studying broad environmental impacts by combining heterogeneous data resources with state-of-the-art modeling and visualization tools. With Purdue being a TeraGrid Resource Provider, C4E4 builds on top of the Purdue TeraGrid data management system and Grid resources, and integrates them through a service-oriented workflow system. It allows researchers to construct environmental workflows for data discovery, access, transformation, modeling, and visualization. Using the C4E4 framework, we have implemented an end-to-end SWAT simulation and analysis workflow that connects our TeraGrid data and computation resources. It enables researchers to conduct comprehensive studies on the impact of land management practices in the St. Joseph watershed using data from various sources in hydrologic, atmospheric, agricultural, and other related disciplines.

  9. Making the most of cloud storage - a toolkit for exploitation by WLCG experiments

    NASA Astrophysics Data System (ADS)

    Alvarez Ayllon, Alejandro; Arsuaga Rios, Maria; Bitzes, Georgios; Furano, Fabrizio; Keeble, Oliver; Manzi, Andrea

    2017-10-01

    Understanding how cloud storage can be effectively used, either standalone or in support of its associated compute, is now an important consideration for WLCG. We report on a suite of extensions to familiar tools targeted at enabling the integration of cloud object stores into traditional grid infrastructures and workflows. Notable updates include support for a number of object store flavours in FTS3, Davix and gfal2, including mitigations for lack of vector reads; the extension of Dynafed to operate as a bridge between grid and cloud domains; protocol translation in FTS3; the implementation of extensions to DPM (also implemented by the dCache project) to allow 3rd party transfers over HTTP. The result is a toolkit which facilitates data movement and access between grid and cloud infrastructures, broadening the range of workflows suitable for cloud. We report on deployment scenarios and prototype experience, explaining how, for example, an Amazon S3 or Azure allocation can be exploited by grid workflows.

  10. A microseismic workflow for managing induced seismicity risk at CO2 storage projects

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Matzel, E.; Morency, C.; Pyle, M.

    2015-10-27

    It is well established that fluid injection has the potential to induce earthquakes—from microseismicity to large, damaging events—by altering state-of-stress conditions in the subsurface. While induced seismicity has not been a major operational issue for carbon storage projects to date, a seismicity hazard exists and must be carefully addressed. Two essential components of effective seismic risk management are (1) sensitive microseismic monitoring and (2) robust data interpretation tools. This report describes a novel workflow, based on advanced processing algorithms applied to microseismic data, to help improve management of seismic risk. This workflow has three main goals: (1) to improve the resolution and reliability of passive seismic monitoring, (2) to extract additional, valuable information from continuous waveform data that is often ignored in standard processing, and (3) to minimize the turn-around time between data collection, interpretation, and decision-making. These three objectives can allow for a better-informed and rapid response to changing subsurface conditions.
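
    The report's algorithms are not reproduced here, but the first goal (sensitive detection of events in continuous microseismic records) is commonly prototyped with a short-term/long-term average (STA/LTA) trigger. The ObsPy sketch below is a generic illustration of that standard step, not the workflow's own code; it uses ObsPy's bundled example trace, and the filter band and thresholds are arbitrary.

      from obspy import read
      from obspy.signal.trigger import classic_sta_lta, trigger_onset

      stream = read()                      # ObsPy example data; replace with real miniSEED
      trace = stream[0]
      trace.filter("bandpass", freqmin=5.0, freqmax=40.0)

      df = trace.stats.sampling_rate
      cft = classic_sta_lta(trace.data, int(0.5 * df), int(10 * df))  # 0.5 s STA, 10 s LTA
      onsets = trigger_onset(cft, 3.5, 0.5)                           # on/off thresholds

      for on_sample, off_sample in onsets:
          print(f"Candidate event: {on_sample / df:.2f}-{off_sample / df:.2f} s into trace")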

  11. Use of Electronic Health Record Tools to Facilitate and Audit Infliximab Prescribing.

    PubMed

    Sharpless, Bethany R; Del Rosario, Fernando; Molle-Rios, Zarela; Hilmas, Elora

    2018-01-01

    The objective of this project was to assess a pediatric institution's use of infliximab and develop and evaluate electronic health record tools to improve safety and efficiency of infliximab ordering through auditing and improved communication. Best use of infliximab was defined through a literature review, analysis of baseline use of infliximab at our institution, and distribution and analysis of a national survey. Auditing and order communication were optimized through implementation of mandatory indications in the infliximab orderable and creation of an interactive flowsheet that collects discrete and free-text data. The value of the implemented electronic health record tools was assessed at the conclusion of the project. Baseline analysis determined that 93.8% of orders were dosed appropriately according to the findings of a literature review. After implementation of the flowsheet and indications, the time to perform an audit of use was reduced from 60 minutes to 5 minutes per month. Four months post implementation, data were entered by 60% of the pediatric gastroenterologists at our institution on 15.3% of all encounters for infliximab. Users were surveyed on the value of the tools, with 100% planning to continue using the workflow, and 82% stating the tools frequently improve the efficiency and safety of infliximab prescribing. Creation of a standard workflow by using an interactive flowsheet has improved auditing ability and facilitated the communication of important order information surrounding infliximab. Providers and pharmacists feel these tools improve the safety and efficiency of infliximab ordering, and auditing data reveal that the tools are being used.

  12. Impact of electronic medical record integration of a handoff tool on sign-out in a newborn intensive care unit

    PubMed Central

    Palma, JP; Sharek, PJ; Longhurst, CA

    2016-01-01

    Objective To evaluate the impact of integrating a handoff tool into the electronic medical record (EMR) on sign-out accuracy, satisfaction and workflow in a neonatal intensive care unit (NICU). Study Design Prospective surveys of neonatal care providers in an academic children’s hospital 1 month before and 6 months following EMR integration of a standalone Microsoft Access neonatal handoff tool. Result Providers perceived sign-out information to be somewhat or very accurate at a rate of 78% with the standalone handoff tool and 91% with the EMR-integrated tool (P < 0.01). Before integration of neonatal sign-out into the EMR, 35% of providers were satisfied with the process of updating sign-out information and 71% were satisfied with the printed sign-out document; following EMR integration, 92% of providers were satisfied with the process of updating sign-out information (P < 0.01) and 98% were satisfied with the printed sign-out document (P < 0.01). Neonatal care providers reported spending a median of 11 to 15 min/day updating the standalone sign-out and 16 to 20 min/day updating the EMR-integrated sign-out (P = 0.026). The median percentage of total sign-out preparation time dedicated to transcribing information from the EMR was 25 to 49% before and <25% after EMR integration of the handoff tool (P < 0.01). Conclusion Integration of a NICU-specific handoff tool into an EMR resulted in improvements in perceived sign-out accuracy, provider satisfaction and at least one aspect of workflow. PMID:21273990

  13. Desktop Publishing in the University: Current Progress, Future Visions.

    ERIC Educational Resources Information Center

    Smith, Thomas W.

    1989-01-01

    Discussion of the workflow involved in desktop publishing focuses on experiences at the College of Engineering at the University of Wisconsin at Madison. Highlights include cost savings and productivity gains in page layout and composition; editing, translation, and revision issues; printing and distribution; and benefits to the reader. (LRW)

  14. Building a Generic Virtual Research Environment Framework for Multiple Earth and Space Science Domains and a Diversity of Users.

    NASA Astrophysics Data System (ADS)

    Wyborn, L. A.; Fraser, R.; Evans, B. J. K.; Friedrich, C.; Klump, J. F.; Lescinsky, D. T.

    2017-12-01

    Virtual Research Environments (VREs) are now part of academic infrastructures. Online research workflows can be orchestrated whereby data are accessed from multiple external repositories, with processing taking place on public or private clouds and centralised supercomputers, using a mixture of user code and well-established community software and libraries. VREs enable distributed members of research teams to actively work together to share data, models, tools, software, workflows, best practices, infrastructures, etc. These environments and their components are increasingly able to support the needs of undergraduate teaching. External to the research sector, they can also be reused by citizen scientists, and be repurposed for industry users to help accelerate the diffusion and hence enable the translation of research innovations. The Virtual Geophysics Laboratory (VGL) in Australia was started in 2012, built using a collaboration between CSIRO, the National Computational Infrastructure (NCI) and Geoscience Australia, with support funding from the Australian Government Department of Education. VGL comprises three main modules that provide an interface to enable users to first select their required data; to choose a tool to process that data; and then access compute infrastructure for execution. VGL was initially built to give a specific set of researchers in government agencies access to specific data sets and a limited number of tools. Over the years it has evolved into a multi-purpose Earth science platform with access to an increased variety of data (e.g., Natural Hazards, Geochemistry), a broader range of software packages, and an increasing diversity of compute infrastructures. This expansion has been possible because of the approach to loosely couple data, tools and compute resources via interfaces that are built on international standards and accessed as network-enabled services wherever possible. Although originally built for researchers who were less concerned with general usability, an increasing emphasis on User Interfaces (UIs) and stability will lead to increased uptake in the education and industry sectors. Simultaneously, improvements are being added to facilitate access to data and tools by experienced researchers who want direct access to both data and flexible workflows.

  15. Developing a Motivational Strategy.

    ERIC Educational Resources Information Center

    Janson, Robert

    1979-01-01

    Describes the use of job enrichment techniques as tools for increased productivity and organizational change. The author's motivational work design model changes not only the job design but also structural elements such as physical layout, workflow, and organizational relationships. Behavior change is more important than job enrichment. (MF)

  16. An Accessible Proteogenomics Informatics Resource for Cancer Researchers.

    PubMed

    Chambers, Matthew C; Jagtap, Pratik D; Johnson, James E; McGowan, Thomas; Kumar, Praveen; Onsongo, Getiria; Guerrero, Candace R; Barsnes, Harald; Vaudel, Marc; Martens, Lennart; Grüning, Björn; Cooke, Ira R; Heydarian, Mohammad; Reddy, Karen L; Griffin, Timothy J

    2017-11-01

    Proteogenomics has emerged as a valuable approach in cancer research, which integrates genomic and transcriptomic data with mass spectrometry-based proteomics data to directly identify expressed, variant protein sequences that may have functional roles in cancer. This approach is computationally intensive, requiring integration of disparate software tools into sophisticated workflows, challenging its adoption by nonexpert, bench scientists. To address this need, we have developed an extensible, Galaxy-based resource aimed at providing more researchers access to, and training in, proteogenomic informatics. Our resource brings together software from several leading research groups to address two foundational aspects of proteogenomics: (i) generation of customized, annotated protein sequence databases from RNA-Seq data; and (ii) accurate matching of tandem mass spectrometry data to putative variants, followed by filtering to confirm their novelty. Directions for accessing software tools and workflows, along with instructional documentation, can be found at z.umn.edu/canresgithub. Cancer Res; 77(21); e43-46. ©2017 AACR . ©2017 American Association for Cancer Research.
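
    Although the resource is normally driven through Galaxy's web interface, the same workflows can be invoked programmatically. The sketch below uses the BioBlend client library; the server URL, API key, workflow name, file name and input mapping are all placeholders, not details taken from the paper.

      from bioblend.galaxy import GalaxyInstance

      gi = GalaxyInstance(url="https://usegalaxy.example.org", key="YOUR_API_KEY")

      history = gi.histories.create_history(name="proteogenomics-run")
      upload = gi.tools.upload_file("tumor_rnaseq.fastq.gz", history["id"])

      # Find a previously imported workflow by name and launch it on the upload.
      workflow = gi.workflows.get_workflows(name="RNA-Seq to protein database")[0]
      inputs = {"0": {"src": "hda", "id": upload["outputs"][0]["id"]}}
      invocation = gi.workflows.invoke_workflow(workflow["id"], inputs=inputs,
                                                history_id=history["id"])
      print("Invocation state:", invocation["state"])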

  17. BASTet: an analysis and storage library for the OpenMSI project

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bowen, Benjamin; Ruebel, Oliver; Fischer, Curt Fischer R.

    BASTet is an advanced software library written in Python. BASTet serves as the analysis and storage library for the OpenMSI project. BASTet is an integrated framework for: i) storage of spectral imaging data, ii) storage of derived analysis data, iii) provenance of analyses, iv) integration and execution of analyses via complex workflows. BASTet implements the API for the HDF5 storage format used by OpenMSI. Analyses that are developed using BASTet benefit from direct integration with the storage format, automatic tracking of provenance, and direct integration with command-line and workflow execution tools. BASTet also defines interfaces to enable developers to directly integrate their analysis with OpenMSI's web-based viewing infrastructure without having to know OpenMSI. BASTet also provides numerous helper classes and tools to assist with the conversion of data files, ease parallel implementation of analysis algorithms, ease interaction with web-based functions, and provide description methods for data reduction. BASTet also includes detailed developer documentation, user tutorials, iPython notebooks, and other supporting documents.

  18. Performances of the PIPER scalable child human body model in accident reconstruction

    PubMed Central

    Giordano, Chiara; Kleiven, Svein

    2017-01-01

    Human body models (HBMs) have the potential to provide significant insights into the pediatric response to impact. This study describes a scalable/posable approach to perform child accident reconstructions using the Position and Personalize Advanced Human Body Models for Injury Prediction (PIPER) scalable child HBM of different ages and in different positions obtained by the PIPER tool. Overall, the PIPER scalable child HBM managed reasonably well to predict the injury severity and location of the children involved in real-life crash scenarios documented in the medical records. The developed methodology and workflow is essential for future work to determine child injury tolerances based on the full Child Advanced Safety Project for European Roads (CASPER) accident reconstruction database. With the workflow presented in this study, the open-source PIPER scalable HBM combined with the PIPER tool is also foreseen to have implications for improved safety designs for a better protection of children in traffic accidents. PMID:29135997

  19. The Image Gently pediatric digital radiography safety checklist: tools for improving pediatric radiography.

    PubMed

    John, Susan D; Moore, Quentin T; Herrmann, Tracy; Don, Steven; Powers, Kevin; Smith, Susan N; Morrison, Greg; Charkot, Ellen; Mills, Thalia T; Rutz, Lois; Goske, Marilyn J

    2013-10-01

    Transition from film-screen to digital radiography requires changes in radiographic technique and workflow processes to ensure that the minimum radiation exposure is used while maintaining diagnostic image quality. Checklists have been demonstrated to be useful tools for decreasing errors and improving safety in several areas, including commercial aviation and surgical procedures. The Image Gently campaign, through a competitive grant from the FDA, developed a checklist for technologists to use during the performance of digital radiography in pediatric patients. The checklist outlines the critical steps in digital radiography workflow, with an emphasis on steps that affect radiation exposure and image quality. The checklist and its accompanying implementation manual and practice quality improvement project are open source and downloadable at www.imagegently.org. The authors describe the process of developing and testing the checklist and offer suggestions for using the checklist to minimize radiation exposure to children during radiography. Copyright © 2013 American College of Radiology. All rights reserved.

  20. WHAM!: a web-based visualization suite for user-defined analysis of metagenomic shotgun sequencing data.

    PubMed

    Devlin, Joseph C; Battaglia, Thomas; Blaser, Martin J; Ruggles, Kelly V

    2018-06-25

    Exploration of large data sets, such as shotgun metagenomic sequence or expression data, by biomedical experts and medical professionals remains a major bottleneck in the scientific discovery process. Although tools for this purpose exist for 16S ribosomal RNA sequencing analysis, there is a growing but still insufficient number of user-friendly interactive visualization workflows for easy data exploration and figure generation. The development of such platforms is necessary to accelerate and streamline microbiome laboratory research. We developed the Workflow Hub for Automated Metagenomic Exploration (WHAM!) as a web-based interactive tool capable of user-directed data visualization and statistical analysis of annotated shotgun metagenomic and metatranscriptomic data sets. WHAM! includes exploratory and hypothesis-based gene and taxa search modules for visualizing differences in microbial taxa and gene family expression across experimental groups, and for creating publication-quality figures without the need for a command-line interface or in-house bioinformatics. WHAM! is an interactive and customizable tool for downstream metagenomic and metatranscriptomic analysis, providing a user-friendly interface that allows easy data exploration by microbiome and ecological experts to facilitate discovery in multi-dimensional and large-scale data sets.
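
    As a purely illustrative sketch of the kind of per-taxon group comparison such a tool automates (this is not WHAM! code; the taxa, groups, and test choice are assumptions made for the example), a few lines of pandas and SciPy suffice for a toy abundance table:

      # Toy per-taxon comparison between experimental groups (illustrative only).
      import pandas as pd
      from scipy.stats import mannwhitneyu

      abundance = pd.DataFrame(
          {"Bacteroides": [0.30, 0.28, 0.10, 0.08],
           "Firmicutes":  [0.20, 0.22, 0.40, 0.45]},
          index=["case1", "case2", "ctrl1", "ctrl2"])
      groups = pd.Series(["case", "case", "control", "control"], index=abundance.index)

      for taxon in abundance.columns:
          case = abundance.loc[groups == "case", taxon]
          ctrl = abundance.loc[groups == "control", taxon]
          stat, p = mannwhitneyu(case, ctrl, alternative="two-sided")
          print(f"{taxon}: U={stat:.1f}, p={p:.3f}")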

  1. A Web application for the management of clinical workflow in image-guided and adaptive proton therapy for prostate cancer treatments.

    PubMed

    Yeung, Daniel; Boes, Peter; Ho, Meng Wei; Li, Zuofeng

    2015-05-08

    Image-guided radiotherapy (IGRT), based on radiopaque markers placed in the prostate gland, was used for proton therapy of prostate patients. Orthogonal X-rays and the IBA Digital Image Positioning System (DIPS) were used for setup correction prior to treatment and were repeated after treatment delivery. Following a rationale for margin estimates similar to that of van Herk,(1) the daily post-treatment DIPS data were analyzed to determine if an adaptive radiotherapy plan was necessary. A Web application using ASP.NET MVC5, Entity Framework, and an SQL database was designed to automate this process. The designed features included state-of-the-art Web technologies, a domain model closely matching the workflow, a database supporting concurrency and data mining, access to the DIPS database, secured user access and roles management, and graphing and analysis tools. The Model-View-Controller (MVC) paradigm allowed clean domain logic, unit testing, and extensibility. Client-side technologies, such as jQuery, jQuery Plug-ins, and Ajax, were adopted to achieve a rich user environment and fast response. Data models included patients, staff, treatment fields and records, correction vectors, DIPS images, and association logics. Data entry, analysis, workflow logics, and notifications were implemented. The system effectively modeled the clinical workflow and IGRT process.
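
    The margin rationale mentioned above can be pictured with a hedged sketch (the shift values and action level below are illustrative, and the paper's actual margin recipe and thresholds are not reproduced here): daily post-treatment shifts are reduced to systematic and random components and combined in a van Herk-style margin (2.5*Sigma + 0.7*sigma) that is compared against a planning threshold.

      # Hedged sketch of a van Herk-style margin check on daily shifts (one axis,
      # one patient); values and the action level are illustrative only.
      import numpy as np

      daily_shifts_mm = np.array([1.2, 0.8, -0.5, 1.9, 1.1, 0.4, 2.3])

      systematic = abs(daily_shifts_mm.mean())   # systematic component (mean shift)
      random_sd = daily_shifts_mm.std(ddof=1)    # random component (day-to-day SD)

      margin_mm = 2.5 * systematic + 0.7 * random_sd
      ACTION_LEVEL_MM = 5.0                      # hypothetical threshold
      if margin_mm > ACTION_LEVEL_MM:
          print(f"margin {margin_mm:.1f} mm exceeds {ACTION_LEVEL_MM} mm: consider adaptive replan")
      else:
          print(f"margin {margin_mm:.1f} mm within tolerance")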

  2. From chart tracking to workflow management.

    PubMed Central

    Srinivasan, P.; Vignes, G.; Venable, C.; Hazelwood, A.; Cade, T.

    1994-01-01

    The current interest in system-wide integration appears to be based on the assumption that an organization, by digitizing information and accepting a common standard for the exchange of such information, will improve the accessibility of this information and automatically experience benefits resulting from its more productive use. We do not dispute this reasoning, but assert that an organization's capacity for effective change is proportional to the understanding of the current structure among its personnel. Our workflow manager is based on the use of a Parameterized Petri Net (PPN) model which can be configured to represent an arbitrarily detailed picture of an organization. The PPN model can be animated to observe the model organization in action, and the results of the animation analyzed. This simulation is a dynamic, ongoing process which changes with the system and allows members of the organization to pose "what if" questions as a means of exploring opportunities for change. We present the "workflow management system" as the natural successor to the tracking program, incorporating modeling, scheduling, reactive planning, performance evaluation, and simulation. This workflow management system is more than adequate for meeting the needs of a paper chart tracking system, and, as the patient record is computerized, will serve as a planning and evaluation tool in converting the paper-based health information system into a computer-based system. PMID:7950051
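
    To make the Petri net mechanics concrete, here is a minimal sketch (with assumed places and transitions, not the authors' PPN model): a transition fires only when all of its input places hold tokens, which is what lets the model be "animated" to watch a chart move through an organization.

      # Minimal token-game sketch of a Petri net (illustrative structure only).
      places = {"chart_at_records": 1, "chart_requested": 1, "chart_at_clinic": 0}
      transitions = {
          "deliver_chart": {"inputs": ["chart_at_records", "chart_requested"],
                            "outputs": ["chart_at_clinic"]},
      }

      def fire(name):
          t = transitions[name]
          if all(places[p] > 0 for p in t["inputs"]):   # enabled?
              for p in t["inputs"]:
                  places[p] -= 1
              for p in t["outputs"]:
                  places[p] += 1
              return True
          return False

      fire("deliver_chart")
      print(places)  # the chart token has moved to the clinic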

  3. gProcess and ESIP Platforms for Satellite Imagery Processing over the Grid

    NASA Astrophysics Data System (ADS)

    Bacu, Victor; Gorgan, Dorian; Rodila, Denisa; Pop, Florin; Neagu, Gabriel; Petcu, Dana

    2010-05-01

    The Environment oriented Satellite Data Processing Platform (ESIP) is developed through the SEE-GRID-SCI (SEE-GRID eInfrastructure for regional eScience) project, co-funded by the European Commission through FP7 [1]. The gProcess platform [2] is a set of tools and services supporting the development and execution over the Grid of workflow-based processing, particularly satellite imagery processing. ESIP [3], [4] is built on top of the gProcess platform by adding a set of satellite image processing software modules and meteorological algorithms. Satellite images can reveal and supply important information on earth surface parameters, climate data, pollution levels and weather conditions that can be used in different research areas. Generally, the processing algorithms for satellite images can be decomposed into a set of modules that form a graph representation of the processing workflow. Two types of workflows can be defined in the gProcess platform: the abstract workflow (PDG - Process Description Graph), in which the user defines the algorithm conceptually, and the instantiated workflow (iPDG - instantiated PDG), which is the mapping of the PDG pattern onto particular satellite image and meteorological data [5]. The gProcess platform allows the definition of complex workflows by combining data resources, operators, services and sub-graphs. The gProcess platform is developed for the gLite middleware that is available in the EGEE and SEE-GRID infrastructures [6]. gProcess exposes its functionality through web services [7]. The Editor Web Service retrieves information on available resources that are used to develop complex workflows (available operators, sub-graphs, services, supported resources, etc.). The Manager Web Service deals with resource management (uploading new resources such as workflows, operators, services, data, etc.) and in addition retrieves information on workflows. The Executor Web Service manages the execution of the instantiated workflows on the Grid infrastructure; this web service also monitors the execution and generates statistical data that are important for evaluating performance and optimizing execution. The Viewer Web Service allows access to input and output data. To prove and validate the utility of the gProcess and ESIP platforms, the GreenView and GreenLand applications were developed. The GreenView-related functionality includes the refinement of some meteorological data such as temperature, and the calibration of the satellite images based on field measurements. The GreenLand application performs the classification of satellite images by using a set of vegetation indices. The gProcess and ESIP platforms are also used in the GiSHEO project [8] to support the processing of Earth Observation data over the Grid in eGLE (the GiSHEO eLearning Environment). Performance assessment experiments revealed that workflow-based execution can improve the execution time of a satellite image processing algorithm [9]. Executing every workflow node on a different machine is not always a good solution: some nodes take longer than others and slow down the overall execution, so it is important to balance the workflow nodes correctly. Based on an optimization strategy, the workflow nodes can be grouped horizontally, vertically or in a hybrid approach (a toy sketch of such grouping is given after this record's references).
In this way, those operators will be executed on one machine and also the data transfer between workflow nodes will be lower. The dynamic nature of the Grid infrastructure makes it more exposed to the occurrence of failures. These failures can occur at worker node, services availability, storage element, etc. Currently gProcess has support for some basic error prevention and error management solutions. In future, some more advanced error prevention and management solutions will be integrated in the gProcess platform. References [1] SEE-GRID-SCI Project, http://www.see-grid-sci.eu/ [2] Bacu V., Stefanut T., Rodila D., Gorgan D., Process Description Graph Composition by gProcess Platform. HiPerGRID - 3rd International Workshop on High Performance Grid Middleware, 28 May, Bucharest. Proceedings of CSCS-17 Conference, Vol.2., ISSN 2066-4451, pp. 423-430, (2009). [3] ESIP Platform, http://wiki.egee-see.org/index.php/JRA1_Commonalities [4] Gorgan D., Bacu V., Rodila D., Pop Fl., Petcu D., Experiments on ESIP - Environment oriented Satellite Data Processing Platform. SEE-GRID-SCI User Forum, 9-10 Dec 2009, Bogazici University, Istanbul, Turkey, ISBN: 978-975-403-510-0, pp. 157-166 (2009). [5] Radu, A., Bacu, V., Gorgan, D., Diagrammatic Description of Satellite Image Processing Workflow. Workshop on Grid Computing Applications Development (GridCAD) at the SYNASC Symposium, 28 September 2007, Timisoara, IEEE Computer Press, ISBN 0-7695-3078-8, 2007, pp. 341-348 (2007). [6] Gorgan D., Bacu V., Stefanut T., Rodila D., Mihon D., Grid based Satellite Image Processing Platform for Earth Observation Applications Development. IDAACS'2009 - IEEE Fifth International Workshop on "Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications", 21-23 September, Cosenza, Italy, IEEE Published in Computer Press, 247-252 (2009). [7] Rodila D., Bacu V., Gorgan D., Integration of Satellite Image Operators as Workflows in the gProcess Application. Proceedings of ICCP2009 - IEEE 5th International Conference on Intelligent Computer Communication and Processing, 27-29 Aug, 2009 Cluj-Napoca. ISBN: 978-1-4244-5007-7, pp. 355-358 (2009). [8] GiSHEO consortium, Project site, http://gisheo.info.uvt.ro [9] Bacu V., Gorgan D., Graph Based Evaluation of Satellite Imagery Processing over Grid. ISPDC 2008 - 7th International Symposium on Parallel and Distributed Computing, July 1-5, 2008, Krakow, Poland. IEEE Computer Society 2008, ISBN: 978-0-7695-3472-5, pp. 147-154.
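
    A minimal sketch under assumptions (this is not the gProcess API; node names and the grouping heuristic are invented for illustration): an abstract PDG is a small DAG of operators, instantiation binds it to a concrete satellite image, and nodes at the same depth can be grouped so they run on one machine and exchange less data.

      # Toy PDG/iPDG and a naive "horizontal" grouping by depth (illustrative only).
      pdg = {  # node -> upstream nodes
          "calibrate": [],
          "ndvi": ["calibrate"],
          "evi": ["calibrate"],
          "classify": ["ndvi", "evi"],
      }

      def instantiate(pdg, image_path):
          # iPDG: the same graph with each node bound to concrete input data.
          return {node: {"deps": deps, "input": image_path} for node, deps in pdg.items()}

      def horizontal_groups(pdg):
          depth = {}
          def d(n):
              if n not in depth:
                  depth[n] = 0 if not pdg[n] else 1 + max(d(p) for p in pdg[n])
              return depth[n]
          groups = {}
          for n in pdg:
              groups.setdefault(d(n), []).append(n)
          return groups

      ipdg = instantiate(pdg, "sample_scene.tif")
      print(horizontal_groups(pdg))  # {0: ['calibrate'], 1: ['ndvi', 'evi'], 2: ['classify']}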

  4. Lessons from Implementing a Combined Workflow–Informatics System for Diabetes Management

    PubMed Central

    Zai, Adrian H.; Grant, Richard W.; Estey, Greg; Lester, William T.; Andrews, Carl T.; Yee, Ronnie; Mort, Elizabeth; Chueh, Henry C.

    2008-01-01

    Shortcomings surrounding the care of patients with diabetes have been attributed largely to a fragmented, disorganized, and duplicative health care system that focuses more on acute conditions and complications than on managing chronic disease. To address these shortcomings, we developed a diabetes registry population management application to change the way our staff manages patients with diabetes. Use of this new application has helped us coordinate the responsibilities for intervening and monitoring patients in the registry among different users. Our experiences using this combined workflow-informatics intervention system suggest that integrating a chronic disease registry into clinical workflow for the treatment of chronic conditions creates a useful and efficient tool for managing disease. PMID:18436907

  5. Single-Cell mRNA-Seq Using the Fluidigm C1 System and Integrated Fluidics Circuits.

    PubMed

    Gong, Haibiao; Do, Devin; Ramakrishnan, Ramesh

    2018-01-01

    Single-cell mRNA-seq is a valuable tool to dissect expression profiles and to understand the regulatory network of genes. Microfluidics is well suited for single-cell analysis owing to the small volume of the reaction chambers and the ease of automation. Here we describe the workflow of single-cell mRNA-seq using the C1 IFC, which can isolate and process up to 96 cells. Both the on-chip procedure (lysis, reverse transcription, and preamplification PCR) and the off-chip sequencing library preparation protocols are described. The workflow generates full-length mRNA information, which is more valuable than the 3' end counting method for many applications.

  6. Case Report: Activity Diagrams for Integrating Electronic Prescribing Tools into Clinical Workflow

    PubMed Central

    Johnson, Kevin B.; FitzHenry, Fern

    2006-01-01

    To facilitate the future implementation of an electronic prescribing system, this case study modeled prescription management processes in various primary care settings. The Vanderbilt e-prescribing design team conducted initial interviews with clinic managers, physicians and nurses, and then represented the sequences of steps carried out to complete prescriptions in activity diagrams. The diagrams covered outpatient prescribing for patients during a clinic visit and between clinic visits. Practice size, practice setting, and practice specialty type influenced the prescribing processes used. The model developed may be useful to others engaged in building or tailoring an e-prescribing system to meet the specific workflows of various clinic settings. PMID:16622168

  7. Assessing Ongoing Electronic Resource Purchases: Linking Tools to Synchronize Staff Workflows

    ERIC Educational Resources Information Center

    Carroll, Jeffrey D.; Major, Colleen; O'Neal, Nada; Tofanelli, John

    2012-01-01

    Ongoing electronic resource purchases represent a substantial proportion of collections budgets. Recognizing the necessity of systematic ongoing assessment with full selector engagement, Columbia University Libraries appointed an Electronic Resources Assessment Working Group to promote the inclusion of such resources within our current culture of…

  8. Improving hydrologic disaster forecasting and response for transportation by assimilating and fusing NASA and other data sets : final report.

    DOT National Transportation Integrated Search

    2017-04-15

    In this 3-year project, the research team developed the Hydrologic Disaster Forecast and Response (HDFR) system, a set of integrated software tools for end users that streamlines hydrologic prediction workflows involving automated retrieval of hetero...

  9. ASCEM Data Browser (ASCEMDB) v0.8

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    ROMOSAN, ALEXANDRU

    A data management tool designed for the Advanced Simulation Capability for Environmental Management (ASCEM) framework. Distinguishing features of this gateway include: (1) handling of complex geometry data, (2) an advanced selection mechanism, (3) state-of-the-art rendering of spatiotemporal data records, and (4) seamless integration with a distributed workflow engine.

  10. NASA Tech Briefs, November 2013

    NASA Technical Reports Server (NTRS)

    2013-01-01

    Topics include: Cryogenic Liquid Sample Acquisition System for Remote Space Applications; 5 Spatial Statistical Data Fusion (SSDF); GPS Estimates of Integrated Precipitable Water Aid Weather Forecasters; Integrating a Microwave Radiometer into Radar Hardware for Simultaneous Data Collection Between the Instruments; Rapid Detection of Herpes Viruses for Clinical Applications; High-Speed Data Recorder for Space, Geodesy, and Other High-Speed Recording Applications; Datacasting V3.0; An All-Solid-State, Room-Temperature, Heterodyne Receiver for Atmospheric Spectroscopy at 1.2 THz; Stacked Transformer for Driver Gain and Receive Signal Splitting; Wireless Integrated Microelectronic Vacuum Sensor System; Fabrication Method for LOBSTER-Eye Optics in <110> Silicon; Compact Focal Plane Assembly for Planetary Science; Fabrication Methods for Adaptive Deformable Mirrors; Visiting Vehicle Ground Trajectory Tool; Workflow-Based Software Development Environment; Mobile Thread Task Manager; A Kinematic Calibration Process for Flight Robotic Arms; Magnetostrictive Alternator; Bulk Metallic Glasses and Composites for Optical and Compliant Mechanisms; Detection of Only Viable Bacterial Spores Using a Live/Dead Indicator in Mixed Populations; and Intravenous Fluid Generation System.

  11. Dynamic Voltage Frequency Scaling Simulator for Real Workflows Energy-Aware Management in Green Cloud Computing

    PubMed Central

    Cotes-Ruiz, Iván Tomás; Prado, Rocío P.; García-Galán, Sebastián; Muñoz-Expósito, José Enrique; Ruiz-Reyes, Nicolás

    2017-01-01

    Nowadays, the growing computational capabilities of Cloud systems rely on the reduction of the power consumed by their data centers to make them sustainable and economically profitable. The efficient management of computing resources is at the heart of any energy-aware data center, and of special relevance is the adaptation of its performance to the workload. Intensive computing applications in diverse areas of science generate complex workloads called workflows, whose successful management in terms of energy saving is still in its infancy. WorkflowSim is currently one of the most advanced simulators for research on workflow processing, offering advanced features such as task clustering and failure policies. In this work, a power-aware extension of WorkflowSim is presented. This new tool integrates a power model based on a computing-plus-communication design to allow the optimization of new energy-saving management strategies considering computing, reconfiguration and network costs as well as quality of service, and it incorporates the preeminent strategy for on-host energy saving: Dynamic Voltage Frequency Scaling (DVFS). The simulator is designed to be consistent in different real scenarios and to include a wide repertoire of DVFS governors. Results showing the validity of the simulator in terms of resource utilization, frequency and voltage scaling, power, energy and time saving are presented. Also, results achieved by the intra-host DVFS strategy with different governors are compared to those obtained when a recent and successful DVFS-based inter-host scheduling strategy is overlapped with the intra-host DVFS technique. PMID:28085932
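
    The DVFS trade-off underlying the simulator can be summarized with a hedged, back-of-the-envelope sketch (this is not WorkflowSim code; the capacitance constant and voltage/frequency pairs are illustrative): dynamic power scales roughly as C*V^2*f, so a lower voltage/frequency state trades a longer runtime for lower energy.

      # Toy DVFS trade-off: energy = (C * V^2 * f) * (cycles / f); values illustrative.
      def energy_and_runtime(cycles, freq_hz, volt, capacitance=1e-9):
          power_w = capacitance * volt ** 2 * freq_hz   # simplified dynamic power
          runtime_s = cycles / freq_hz
          return power_w * runtime_s, runtime_s

      task_cycles = 2e9
      for freq_ghz, volt in [(2.0, 1.2), (1.0, 0.9)]:   # two hypothetical governor states
          e, t = energy_and_runtime(task_cycles, freq_ghz * 1e9, volt)
          print(f"{freq_ghz} GHz @ {volt} V: {t:.1f} s, {e:.2f} J")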

  12. Dynamic Voltage Frequency Scaling Simulator for Real Workflows Energy-Aware Management in Green Cloud Computing.

    PubMed

    Cotes-Ruiz, Iván Tomás; Prado, Rocío P; García-Galán, Sebastián; Muñoz-Expósito, José Enrique; Ruiz-Reyes, Nicolás

    2017-01-01

    Nowadays, the growing computational capabilities of Cloud systems rely on the reduction of the power consumed by their data centers to make them sustainable and economically profitable. The efficient management of computing resources is at the heart of any energy-aware data center, and of special relevance is the adaptation of its performance to the workload. Intensive computing applications in diverse areas of science generate complex workloads called workflows, whose successful management in terms of energy saving is still in its infancy. WorkflowSim is currently one of the most advanced simulators for research on workflow processing, offering advanced features such as task clustering and failure policies. In this work, a power-aware extension of WorkflowSim is presented. This new tool integrates a power model based on a computing-plus-communication design to allow the optimization of new energy-saving management strategies considering computing, reconfiguration and network costs as well as quality of service, and it incorporates the preeminent strategy for on-host energy saving: Dynamic Voltage Frequency Scaling (DVFS). The simulator is designed to be consistent in different real scenarios and to include a wide repertoire of DVFS governors. Results showing the validity of the simulator in terms of resource utilization, frequency and voltage scaling, power, energy and time saving are presented. Also, results achieved by the intra-host DVFS strategy with different governors are compared to those obtained when a recent and successful DVFS-based inter-host scheduling strategy is overlapped with the intra-host DVFS technique.

  13. Monitoring of computing resource use of active software releases at ATLAS

    NASA Astrophysics Data System (ADS)

    Limosani, Antonio; ATLAS Collaboration

    2017-10-01

    The LHC is the world’s most powerful particle accelerator, colliding protons at a centre-of-mass energy of 13 TeV. As the energy and frequency of collisions has grown in the search for new physics, so too has the demand for computing resources needed for event reconstruction. We will report on the evolution of resource usage in terms of CPU and RAM in key ATLAS offline reconstruction workflows at the Tier-0 at CERN and on the WLCG. Monitoring of workflows is achieved using the ATLAS PerfMon package, which is the standard ATLAS performance monitoring system running inside Athena jobs. Systematic daily monitoring has recently been expanded to include all workflows from Monte Carlo generation through to end-user physics analysis, beyond that of event reconstruction. Moreover, the move to a multiprocessor mode in production jobs has facilitated the use of tools, such as “MemoryMonitor”, to measure the memory shared across processors in jobs. Resource consumption is broken down into software domains and displayed in plots generated using Python visualization libraries and collected into pre-formatted auto-generated Web pages, which allow the ATLAS developer community to track the performance of their algorithms. This information is, however, preferentially filtered to domain leaders and developers through the use of JIRA and via reports given at ATLAS software meetings. Finally, we take a glimpse of the future by reporting on the expected CPU and RAM usage in benchmark workflows associated with the High Luminosity LHC and anticipate the ways performance monitoring will evolve to understand and benchmark future workflows.

  14. Information Engineering and Workflow Design in a Clinical Decision Support System for Colorectal Cancer Screening in Iran.

    PubMed

    Maserat, Elham; Seied Farajollah, Seiede Sedigheh; Safdari, Reza; Ghazisaeedi, Marjan; Aghdaei, Hamid Asadzadeh; Zali, Mohammad Reza

    2015-01-01

    Colorectal cancer is a major cause of morbidity and mortality throughout the world. Colorectal cancer screening is an optimal way to reduce morbidity and mortality, and a clinical decision support system (CDSS) plays an important role in predicting the success of screening processes. A CDSS is a computer-based information system that improves the delivery of preventive care services. The aim of this article was to detail the engineering of the information requirements and workflow design of a CDSS for a colorectal cancer screening program. In the first stage, a screening minimum data set was determined; practices in developed and developing countries were analyzed to identify this data set, and information deficiencies and gaps were then determined with a checklist. The second stage was a qualitative survey with a semi-structured interview as the study tool, covering the perspectives of 15 users and stakeholders on the workflow of the CDSS. Finally, the workflow of the decision support system for the screening program was designed based on standard clinical practice guidelines and these perspectives. The screening minimum data set of the national colorectal cancer screening program was defined in five sections, including colonoscopy, surgery, pathology, genetics and pedigree data sets. Deficiencies and information gaps were analyzed, a work process standard for screening was designed, and the workflow of the decision support system and the data entry stage were determined. A CDSS facilitates complex decision making for screening and has key roles in designing optimal interactions between colonoscopy, pathology and laboratory departments. Workflow analysis is also useful to identify data reconciliation strategies to address documentation gaps. Following the recommendations of the CDSS should improve the quality of colorectal cancer screening.

  15. REDLetr: Workflow and tools to support the migration of legacy clinical data capture systems to REDCap

    PubMed Central

    Dunn, William D; Cobb, Jake; Levey, Allan I; Gutman, David A

    2017-01-01

    Objective A memory clinic at an academic medical center has relied on several ad hoc data capture systems, including Microsoft Access and Excel, for cognitive assessments over the last several years. However, these solutions are challenging to maintain and limit the potential of hypothesis-driven or longitudinal research. REDCap, a secure web application based on PHP and MySQL, is a practical solution for improving data capture and organization. Here, we present a workflow and toolset to facilitate legacy data migration and real-time clinical research data collection into REDCap, as well as the challenges encountered. Materials and Methods Legacy data consisted of neuropsychological tests stored in over 4,000 Excel workbooks. Functions for data extraction, norm scoring, converting to REDCap-compatible formats, accessing the REDCap API, and clinical report generation were developed and executed in Python. Results Over 400 unique data points for each workbook were migrated and integrated into our REDCap database. Moving forward, our REDCap-based system replaces the Excel-based data collection method and eases integration with the Electronic Health Record. Conclusion In the age of growing data, efficient organization and storage of clinical and research data are critical for advancing research and providing efficient patient care. We believe that the tools and workflow described in this work, which promote legacy data integration as well as real-time data collection into REDCap, ultimately facilitate these goals. PMID:27396629
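
    A hedged sketch of the migration step described above (the URL, API token, cell locations and field names are placeholders, not the authors' project configuration; the parameter names follow REDCap's record-import API as we understand it): read a score out of a legacy Excel workbook with openpyxl and push it through the REDCap API with requests.

      # Illustrative legacy-to-REDCap push; all identifiers below are placeholders.
      import json
      import openpyxl
      import requests

      wb = openpyxl.load_workbook("legacy_neuropsych.xlsx", data_only=True)
      ws = wb.active
      record = {
          "record_id": str(ws["B1"].value),   # assumed cell layout
          "moca_total": ws["B10"].value,      # hypothetical REDCap field name
      }

      resp = requests.post(
          "https://redcap.example.edu/api/",
          data={
              "token": "REDCAP_API_TOKEN",    # placeholder token
              "content": "record",
              "action": "import",
              "format": "json",
              "type": "flat",
              "data": json.dumps([record]),
          },
      )
      print(resp.status_code, resp.text)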

  16. Lowering the Barriers to Integrative Aquatic Ecosystem Science: Semantic Provenance, Open Linked Data, and Workflows

    NASA Astrophysics Data System (ADS)

    Harmon, T.; Hofmann, A. F.; Utz, R.; Deelman, E.; Hanson, P. C.; Szekely, P.; Villamizar, S. R.; Knoblock, C.; Guo, Q.; Crichton, D. J.; McCann, M. P.; Gil, Y.

    2011-12-01

    Environmental cyber-observatory (ECO) planning and implementation has been ongoing for more than a decade now, and several major efforts have recently come online or will soon. Some investigators in the relevant research communities will use ECO data, traditionally by developing their own client-side services to acquire data and then manually creating custom tools to integrate and analyze it. However, a significant portion of the aquatic ecosystem science community will need more custom services to manage locally collected data. The latter group represents enormous intellectual capacity when one envisions thousands of ecosystem scientists supplementing ECO baseline data by sharing their own locally intensive observational efforts. This poster summarizes the outcomes of the June 2011 Workshop for Aquatic Ecosystem Sustainability (WAES), which focused on the needs of aquatic ecosystem research on inland waters and oceans. Here we advocate new approaches to support scientists to model, integrate, and analyze data based on: 1) a new breed of software tools in which semantic provenance is automatically created and used by the system, 2) the use of open standards based on RDF and Linked Data principles to facilitate sharing of data and provenance annotations, and 3) the use of workflows to represent explicitly all data preparation, integration, and processing steps in a way that is automatically repeatable. Aquatic ecosystem workflow exemplars are provided and discussed in terms of their potential to broaden data sharing, analysis and synthesis, thereby increasing the impact of aquatic ecosystem research.
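
    As a minimal, illustrative sketch of the open-standards approach listed above (not the workshop's tooling; the entity names are invented), workflow provenance can be expressed as RDF with the W3C PROV-O vocabulary using rdflib:

      # Record that a derived product was generated by an aggregation step (illustrative).
      from rdflib import Graph, Namespace
      from rdflib.namespace import RDF

      PROV = Namespace("http://www.w3.org/ns/prov#")
      EX = Namespace("http://example.org/aquatic/")

      g = Graph()
      g.bind("prov", PROV)
      g.add((EX.daily_mean_temp, RDF.type, PROV.Entity))
      g.add((EX.raw_sensor_series, RDF.type, PROV.Entity))
      g.add((EX.aggregate_step, RDF.type, PROV.Activity))
      g.add((EX.daily_mean_temp, PROV.wasGeneratedBy, EX.aggregate_step))
      g.add((EX.aggregate_step, PROV.used, EX.raw_sensor_series))

      print(g.serialize(format="turtle"))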

  17. Dawn: A Simulation Model for Evaluating Costs and Tradeoffs of Big Data Science Architectures

    NASA Astrophysics Data System (ADS)

    Cinquini, L.; Crichton, D. J.; Braverman, A. J.; Kyo, L.; Fuchs, T.; Turmon, M.

    2014-12-01

    In many scientific disciplines, scientists and data managers are bracing for an upcoming deluge of big data volumes, which will increase the size of current data archives by a factor of 10-100 times. For example, the next Climate Model Inter-comparison Project (CMIP6) will generate a global archive of model output of approximately 10-20 Peta-bytes, while the upcoming next generation of NASA decadal Earth Observing instruments are expected to collect tens of Giga-bytes/day. In radio-astronomy, the Square Kilometre Array (SKA) will collect data in the Exa-bytes/day range, of which (after reduction and processing) around 1.5 Exa-bytes/year will be stored. The effective and timely processing of these enormous data streams will require the design of new data reduction and processing algorithms, new system architectures, and new techniques for evaluating computation uncertainty. Yet at present no general software tool or framework exists that will allow system architects to model their expected data processing workflow, and determine the network, computational and storage resources needed to prepare their data for scientific analysis. In order to fill this gap, at NASA/JPL we have been developing a preliminary model named DAWN (Distributed Analytics, Workflows and Numerics) for simulating arbitrary complex workflows composed of any number of data processing and movement tasks. The model can be configured with a representation of the problem at hand (the data volumes, the processing algorithms, the available computing and network resources), and is able to evaluate tradeoffs between different possible workflows based on several estimators: overall elapsed time, separate computation and transfer times, resulting uncertainty, and others. So far, we have been applying DAWN to analyze architectural solutions for 4 different use cases from distinct science disciplines: climate science, astronomy, hydrology and a generic cloud computing use case. This talk will present preliminary results and discuss how DAWN can be evolved into a powerful tool for designing system architectures for data intensive science.
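
    The kind of tradeoff estimate described above can be illustrated with a toy cost model (this is not DAWN; the stage names, data volumes, bandwidth and throughput figures are invented): each stage's elapsed time is approximated as data-transfer time plus compute time, so alternative workflow placements can be compared.

      # Toy workflow cost model: elapsed time = transfer + compute (illustrative numbers).
      def stage_time(volume_gb, ops_tflop, bandwidth_gbps=10.0, throughput_tflops=5.0):
          transfer_s = (volume_gb * 8) / bandwidth_gbps
          compute_s = ops_tflop / throughput_tflops
          return transfer_s + compute_s

      stages = [           # (name, data volume in GB, work in TFLOP)
          ("reduce", 500, 20.0),
          ("regrid", 120, 8.0),
          ("analyze", 30, 50.0),
      ]
      total_s = sum(stage_time(v, ops) for _, v, ops in stages)
      print(f"estimated elapsed time: {total_s / 60:.1f} minutes")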

  18. Flexible Early Warning Systems with Workflows and Decision Tables

    NASA Astrophysics Data System (ADS)

    Riedel, F.; Chaves, F.; Zeiner, H.

    2012-04-01

    An essential part of early warning systems and systems for crisis management are decision support systems that facilitate communication and collaboration. Often official policies specify how different organizations collaborate and what information is communicated to whom. For early warning systems it is crucial that information is exchanged dynamically in a timely manner and all participants get exactly the information they need to fulfil their role in the crisis management process. Information technology obviously lends itself to automate parts of the process. We have experienced however that in current operational systems the information logistics processes are hard-coded, even though they are subject to change. In addition, systems are tailored to the policies and requirements of a certain organization and changes can require major software refactoring. We seek to develop a system that can be deployed and adapted to multiple organizations with different dynamic runtime policies. A major requirement for such a system is that changes can be applied locally without affecting larger parts of the system. In addition to the flexibility regarding changes in policies and processes, the system needs to be able to evolve; when new information sources become available, it should be possible to integrate and use these in the decision process. In general, this kind of flexibility comes with a significant increase in complexity. This implies that only IT professionals can maintain a system that can be reconfigured and adapted; end-users are unable to utilise the provided flexibility. In the business world similar problems arise and previous work suggested using business process management systems (BPMS) or workflow management systems (WfMS) to guide and automate early warning processes or crisis management plans. However, the usability and flexibility of current WfMS are limited, because current notations and user interfaces are still not suitable for end-users, and workflows are usually only suited for rigid processes. We show how improvements can be achieved by using decision tables and rule-based adaptive workflows. Decision tables have been shown to be an intuitive tool that can be used by domain experts to express rule sets that can be interpreted automatically at runtime. Adaptive workflows use a rule-based approach to increase the flexibility of workflows by providing mechanisms to adapt workflows based on context changes, human intervention and availability of services. The combination of workflows, decision tables and rule-based adaption creates a framework that opens up new possibilities for flexible and adaptable workflows, especially, for use in early warning and crisis management systems.
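
    A decision table of this kind can be interpreted at runtime from a handful of rows, as in this hedged sketch (the conditions, thresholds and recipient names are invented): the first matching row determines who is notified, and domain experts can edit the rows without touching the routing code.

      # Illustrative runtime interpretation of a decision table for alert routing.
      decision_table = [   # (condition on the event, recipients) -- first match wins
          (lambda e: e["water_level_cm"] > 300, ["civil_protection", "mayor"]),
          (lambda e: e["water_level_cm"] > 200, ["field_teams"]),
          (lambda e: True,                      ["monitoring_log"]),
      ]

      def route(event):
          for condition, recipients in decision_table:
              if condition(event):
                  return recipients
          return []

      print(route({"water_level_cm": 250}))  # ['field_teams']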

  19. LATUX: An Iterative Workflow for Designing, Validating, and Deploying Learning Analytics Visualizations

    ERIC Educational Resources Information Center

    Martinez-Maldonado, Roberto; Pardo, Abelardo; Mirriahi, Negin; Yacef, Kalina; Kay, Judy; Clayphan, Andrew

    2015-01-01

    Designing, validating, and deploying learning analytics tools for instructors or students is a challenge that requires techniques and methods from different disciplines, such as software engineering, human-computer interaction, computer graphics, educational design, and psychology. Whilst each has established its own design methodologies, we now…

  20. Curriculum Online Review System: Proposing Curriculum with Collaboration

    ERIC Educational Resources Information Center

    Rhinehart, Marilyn; Barlow, Rhonda; Shafer, Stu; Hassur, Debby

    2009-01-01

    The Curriculum Online Review System (CORS) at Johnson County Community College (JCCC) uses SharePoint as a Web platform for the JCCC Curriculum Proposals Process. The CORS application manages proposals throughout the approval process using collaboration tools and workflows to notify all stakeholders. This innovative new program has changed the way…

  1. A big data approach for climate change indicators processing in the CLIP-C project

    NASA Astrophysics Data System (ADS)

    D'Anca, Alessandro; Conte, Laura; Palazzo, Cosimo; Fiore, Sandro; Aloisio, Giovanni

    2016-04-01

    Defining and implementing processing chains with multiple (e.g. tens or hundreds of) data analytics operators can be a real challenge in many practical scientific use cases such as climate change indicators. This is usually done via scripts (e.g. bash) on the client side and requires climate scientists to take care of, implement and replicate workflow-like control logic aspects (which may be error-prone, too) in their scripts, along with the expected application-level part. Moreover, the large amount of data and the strong I/O demand pose additional challenges related to performance. In this regard, production-level tools for climate data analysis are mostly sequential and there is a lack of big data analytics solutions implementing fine-grain data parallelism or adopting stronger parallel I/O strategies, data locality, workflow optimization, etc. High-level solutions leveraging workflow-enabled big data analytics frameworks for eScience could help scientists define and implement the workflows related to their experiments by exploiting a more declarative, efficient and powerful approach. This talk will start by introducing the main needs and challenges regarding big data analytics workflow management for eScience and will then provide insights into the implementation of real use cases related to climate change indicators on large datasets produced in the context of the CLIP-C project - an EU FP7 project aiming at providing access to climate information of direct relevance to a wide variety of users, from scientists to policy makers and private sector decision makers. All the proposed use cases have been implemented exploiting the Ophidia big data analytics framework. The software stack includes an internal workflow management system, which coordinates, orchestrates, and optimises the execution of multiple scientific data analytics and visualization tasks. Real-time monitoring of workflow execution is also supported through a graphical user interface. In order to address the challenges of the use cases, the implemented data analytics workflows include parallel data analysis, metadata management, virtual file system tasks, map generation, rolling of datasets, and import/export of datasets in NetCDF format. The use cases have been implemented on an 8-node HPC cluster (16 cores/node) of the Athena cluster available at the CMCC Supercomputing Centre. Benchmark results will also be presented during the talk.

  2. Integrate Data into Scientific Workflows for Terrestrial Biosphere Model Evaluation through Brokers

    NASA Astrophysics Data System (ADS)

    Wei, Y.; Cook, R. B.; Du, F.; Dasgupta, A.; Poco, J.; Huntzinger, D. N.; Schwalm, C. R.; Boldrini, E.; Santoro, M.; Pearlman, J.; Pearlman, F.; Nativi, S.; Khalsa, S.

    2013-12-01

    Terrestrial biosphere models (TBMs) have become integral tools for extrapolating local observations and process-level understanding of land-atmosphere carbon exchange to larger regions. Model-model and model-observation intercomparisons are critical to understand the uncertainties within model outputs, to improve model skill, and to improve our understanding of land-atmosphere carbon exchange. The DataONE Exploration, Visualization, and Analysis (EVA) working group is evaluating TBMs using scientific workflows in UV-CDAT/VisTrails. This workflow-based approach promotes collaboration and improved tracking of evaluation provenance. But challenges still remain. The multi-scale and multi-discipline nature of TBMs makes it necessary to include diverse and distributed data resources in model evaluation. These include, among others, remote sensing data from NASA, flux tower observations from various organizations including DOE, and inventory data from the US Forest Service. A key challenge is to make heterogeneous data from different organizations and disciplines discoverable and readily integrated for use in scientific workflows. This presentation introduces the brokering approach taken by the DataONE EVA to fill the gap between TBM evaluation scientific workflows and cross-organization and cross-discipline data resources. The DataONE EVA started the development of an Integrated Model Intercomparison Framework (IMIF) that leverages standards-based discovery and access brokers to dynamically discover, access, and transform (e.g. subsetting and resampling) diverse data products from DataONE, Earth System Grid (ESG), and other data repositories into a format that can be readily used by scientific workflows in UV-CDAT/VisTrails. The discovery and access brokers serve as independent middleware that bridges existing data repositories and TBM evaluation scientific workflows but introduces little overhead to either component. In the initial work, an OpenSearch-based discovery broker is leveraged to provide a consistent mechanism for data discovery. Standards-based data services, including the Open Geospatial Consortium (OGC) Web Coverage Service (WCS) and THREDDS, are leveraged to provide on-demand data access and transformations through the data access broker. To ease the adoption of broker services, a package of broker client VisTrails modules has been developed that can be easily plugged into scientific workflows. The initial IMIF has been successfully tested in selected model evaluation scenarios involved in the NASA-funded Multi-scale Synthesis and Terrestrial Model Intercomparison Project (MsTMIP).

  3. Biological standards for the Knowledge-Based BioEconomy: What is at stake.

    PubMed

    de Lorenzo, Víctor; Schmidt, Markus

    2018-01-25

    The contribution of life sciences to the Knowledge-Based Bioeconomy (KBBE) asks for the transition of contemporary, gene-based biotechnology from being a trial-and-error endeavour to becoming an authentic branch of engineering. One requisite to this end is the need for standards to measure and represent accurately biological functions, along with languages for data description and exchange. However, the inherent complexity of biological systems and the lack of quantitative tradition in the field have largely curbed this enterprise. Fortunately, the onset of systems and synthetic biology has emphasized the need for standards not only to manage omics data, but also to increase reproducibility and provide the means of engineering living systems in earnest. Some domains of biotechnology can be easily standardized (e.g. physical composition of DNA sequences, tools for genome editing, languages to encode workflows), while others might be standardized with some dedicated research (e.g. biological metrology, operative systems for bio-programming cells) and finally others will require a considerable effort, e.g. defining the rules that allow functional composition of biological activities. Despite difficulties, these are worthy attempts, as the history of technology shows that those who set/adopt standards gain a competitive advantage over those who do not. Copyright © 2017 Elsevier B.V. All rights reserved.

  4. Evaluation of user interface and workflow design of a bedside nursing clinical decision support system.

    PubMed

    Yuan, Michael Juntao; Finley, George Mike; Long, Ju; Mills, Christy; Johnson, Ron Kim

    2013-01-31

    Clinical decision support systems (CDSS) are important tools to improve health care outcomes and reduce preventable medical adverse events. However, the effectiveness and success of CDSS depend on their implementation context and usability in complex health care settings. As a result, usability design and validation, especially in real world clinical settings, are crucial aspects of successful CDSS implementations. Our objective was to develop a novel CDSS to help frontline nurses better manage critical symptom changes in hospitalized patients, hence reducing preventable failure to rescue cases. A robust user interface and implementation strategy that fit into existing workflows was key for the success of the CDSS. Guided by a formal usability evaluation framework, UFuRT (user, function, representation, and task analysis), we developed a high-level specification of the product that captures key usability requirements and is flexible to implement. We interviewed users of the proposed CDSS to identify requirements, listed functions, and operations the system must perform. We then designed visual and workflow representations of the product to perform the operations. The user interface and workflow design were evaluated via heuristic and end user performance evaluation. The heuristic evaluation was done after the first prototype, and its results were incorporated into the product before the end user evaluation was conducted. First, we recruited 4 evaluators with strong domain expertise to study the initial prototype. Heuristic violations were coded and rated for severity. Second, after development of the system, we assembled a panel of nurses, consisting of 3 licensed vocational nurses and 7 registered nurses, to evaluate the user interface and workflow via simulated use cases. We recorded whether each session was successfully completed and its completion time. Each nurse was asked to use the National Aeronautics and Space Administration (NASA) Task Load Index to self-evaluate the amount of cognitive and physical burden associated with using the device. A total of 83 heuristic violations were identified in the studies. The distribution of the heuristic violations and their average severity are reported. The nurse evaluators successfully completed all 30 sessions of the performance evaluations. All nurses were able to use the device after a single training session. On average, the nurses took 111 seconds (SD 30 seconds) to complete the simulated task. The NASA Task Load Index results indicated that the work overhead on the nurses was low. In fact, most of the burden measures were consistent with zero. The only potentially significant burden was temporal demand, which was consistent with the primary use case of the tool. The evaluation has shown that our design was functional and met the requirements demanded by the nurses' tight schedules and heavy workloads. The user interface embedded in the tool provided compelling utility to the nurse with minimal distraction.

  5. Diagnostic tool for red blood cell membrane disorders: Assessment of a new generation ektacytometer.

    PubMed

    Da Costa, Lydie; Suner, Ludovic; Galimand, Julie; Bonnel, Amandine; Pascreau, Tiffany; Couque, Nathalie; Fenneteau, Odile; Mohandas, Narla

    2016-01-01

    Inherited red blood cell (RBC) membrane disorders, such as hereditary spherocytosis, elliptocytosis and hereditary ovalocytosis, result from mutations in genes encoding various RBC membrane and skeletal proteins. The RBC membrane, a composite structure composed of a lipid bilayer linked to a spectrin/actin-based membrane skeleton, confers upon the RBC unique features of deformability and mechanical stability. The disease severity is primarily dependent on the extent of membrane surface area loss. RBC membrane disorders can be readily diagnosed by various laboratory approaches that include RBC cytology, flow cytometry, ektacytometry, electrophoresis of RBC membrane proteins and genetics. The reference technique for diagnosis of RBC membrane disorders is osmotic gradient ektacytometry. However, in spite of its recognition as the reference technique, it is rarely used as a routine diagnostic tool for RBC membrane disorders due to its limited availability. This may soon change, as a new generation of ektacytometers has recently been engineered. In this review, we describe the workflow of the samples shipped to our Hematology laboratory for RBC membrane disorder analysis and the data obtained for a large cohort of French patients presenting with RBC membrane disorders, using a newly available version of the ektacytometer. Copyright © 2015 Elsevier Inc. All rights reserved.

  6. Workflow for high-content, individual cell quantification of fluorescent markers from universal microscope data, supported by open source software.

    PubMed

    Stockwell, Simon R; Mittnacht, Sibylle

    2014-12-16

    Advances in understanding the control mechanisms governing the behavior of cells in adherent mammalian tissue culture models are becoming increasingly dependent on modes of single-cell analysis. Methods which deliver composite data reflecting the mean values of biomarkers from cell populations risk losing subpopulation dynamics that reflect the heterogeneity of the studied biological system. In keeping with this, traditional approaches are being replaced by, or supported with, more sophisticated forms of cellular assay developed to allow assessment by high-content microscopy. These assays potentially generate large numbers of images of fluorescent biomarkers, which, enabled by accompanying proprietary software packages, allow for multi-parametric measurements per cell. However, the relatively high capital costs and overspecialization of many of these devices have prevented their accessibility to many investigators. Described here is a universally applicable workflow for the quantification of multiple fluorescent marker intensities from specific subcellular regions of individual cells, suitable for use with images from most fluorescent microscopes. Key to this workflow is the implementation of the freely available CellProfiler software(1) to distinguish individual cells in these images, segment them into defined subcellular regions and deliver fluorescence marker intensity values specific to these regions. The extraction of individual cell intensity values from image data is the central purpose of this workflow and will be illustrated with the analysis of control data from a siRNA screen for G1 checkpoint regulators in adherent human cells. However, the workflow presented here can be applied to analysis of data from other means of cell perturbation (e.g., compound screens) and other forms of fluorescence-based cellular markers and thus should be useful for a wide range of laboratories.
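
    The core per-cell measurement step can be sketched with scikit-image instead of the CellProfiler pipeline the paper uses (a stand-in, not the published workflow; the random images and Otsu threshold are only for illustration): label the nuclei, then report a mean marker intensity for each labelled cell.

      # Stand-in for the per-cell intensity step, using scikit-image on toy images.
      import numpy as np
      from skimage import filters, measure

      rng = np.random.default_rng(0)
      dapi = rng.random((64, 64))      # stand-in nuclear channel
      marker = rng.random((64, 64))    # stand-in fluorescent marker channel

      mask = dapi > filters.threshold_otsu(dapi)      # crude segmentation
      labels = measure.label(mask)
      regions = measure.regionprops(labels, intensity_image=marker)
      for region in regions[:5]:                      # first few "cells"
          print(f"cell {region.label}: mean marker intensity {region.mean_intensity:.3f}")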

  7. New on-line separation workflow of microbial metabolites via hyphenation of analytical and preparative comprehensive two-dimensional liquid chromatography.

    PubMed

    Yan, Xia; Wang, Li-Juan; Wu, Zhen; Wu, Yun-Long; Liu, Xiu-Xiu; Chang, Fang-Rong; Fang, Mei-Juan; Qiu, Ying-Kun

    2016-10-15

    Microbial metabolites represent an important source of bioactive natural products, but they often exhibit diverse chemical structures or complicated chemical compositions with low contents of active ingredients. Traditional separation methods rely mainly on the off-line combination of open-column chromatography and preparative high performance liquid chromatography (HPLC). However, the multi-step and prolonged separation procedure might lead to exposure to oxygen and structural transformation of metabolites. In the present work, a new two-dimensional separation workflow was developed for fast isolation and analysis of microbial metabolites from Chaetomium globosum SNSHI-5, a cytotoxic fungus derived from an extreme environment. The advantage of this analytical comprehensive two-dimensional liquid chromatography (2D-LC) lies in its ability to analyze the composition of the metabolites and to optimize the separation conditions for the preparative 2D-LC. Furthermore, gram-scale preparative 2D-LC separation of the crude fungus extract could be performed on a medium-pressure liquid chromatograph×preparative high-performance liquid chromatography system under the optimized conditions. Interestingly, 12 cytochalasan derivatives, including two new compounds named cytoglobosin Ab (3) and isochaetoglobosin Db (8), were successfully obtained with high purity in a short period of time. The structures of the isolated metabolites were comprehensively characterized by HR-ESI-MS and NMR. To be highlighted, this is the first report on the combination of analytical and preparative 2D-LC for the separation of microbial metabolites. The new workflow exhibited apparent advantages in separation efficiency and sample treatment capacity compared with conventional methods. Copyright © 2016 Elsevier B.V. All rights reserved.

  8. Our path to better science in less time using open data science tools.

    PubMed

    Lowndes, Julia S Stewart; Best, Benjamin D; Scarborough, Courtney; Afflerbach, Jamie C; Frazier, Melanie R; O'Hara, Casey C; Jiang, Ning; Halpern, Benjamin S

    2017-05-23

    Reproducibility has long been a tenet of science but has been challenging to achieve-we learned this the hard way when our old approaches proved inadequate to efficiently reproduce our own work. Here we describe how several free software tools have fundamentally upgraded our approach to collaborative research, making our entire workflow more transparent and streamlined. By describing specific tools and how we incrementally began using them for the Ocean Health Index project, we hope to encourage others in the scientific community to do the same-so we can all produce better science in less time.

  9. A UIMA wrapper for the NCBO annotator.

    PubMed

    Roeder, Christophe; Jonquet, Clement; Shah, Nigam H; Baumgartner, William A; Verspoor, Karin; Hunter, Lawrence

    2010-07-15

    The Unstructured Information Management Architecture (UIMA) framework and web services are emerging as useful tools for integrating biomedical text mining tools. This note describes our work, which wraps the National Center for Biomedical Ontology (NCBO) Annotator, an ontology-based annotation service, to make it available as a component in UIMA workflows. This wrapper is freely available on the web at http://bionlp-uima.sourceforge.net/ as part of the UIMA tools distribution from the Center for Computational Pharmacology (CCP) at the University of Colorado School of Medicine. It has been implemented in Java for support on Mac OS X, Linux and MS Windows.

  10. METAPHOR: Probability density estimation for machine learning based photometric redshifts

    NASA Astrophysics Data System (ADS)

    Amaro, V.; Cavuoti, S.; Brescia, M.; Vellucci, C.; Tortora, C.; Longo, G.

    2017-06-01

    We present METAPHOR (Machine-learning Estimation Tool for Accurate PHOtometric Redshifts), a method able to provide a reliable PDF for photometric galaxy redshifts estimated through empirical techniques. METAPHOR is a modular workflow, mainly based on the MLPQNA neural network as the internal engine to derive photometric galaxy redshifts, but offering the possibility to easily replace MLPQNA with any other method to predict photo-z's and their PDFs. We present here the results of a validation test of the workflow on galaxies from SDSS-DR9, showing also the universality of the method by replacing MLPQNA with KNN and Random Forest models. The validation test also includes a comparison with the PDFs derived from a traditional SED template fitting method (Le Phare).
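
    One generic way to attach a PDF to an empirical photo-z estimate, sketched here under assumptions (this is not necessarily METAPHOR's exact recipe; the toy training set, error model and the KNN stand-in engine are invented for illustration), is to perturb the photometry within its errors, re-run the regressor, and histogram the predictions.

      # Toy photo-z PDF by photometry perturbation; the engine is a pluggable regressor.
      import numpy as np
      from sklearn.neighbors import KNeighborsRegressor

      rng = np.random.default_rng(1)
      mags_train = rng.uniform(18, 24, size=(500, 5))                     # toy magnitudes
      z_train = 0.1 * mags_train.mean(axis=1) - 1.5 + rng.normal(0, 0.02, 500)
      model = KNeighborsRegressor(n_neighbors=10).fit(mags_train, z_train)

      galaxy = rng.uniform(18, 24, size=5)
      perturbed = galaxy + rng.normal(0, 0.05, size=(200, 5))             # 0.05 mag errors
      z_samples = model.predict(perturbed)
      pdf, bin_edges = np.histogram(z_samples, bins=20, density=True)
      print(f"photo-z = {z_samples.mean():.3f} +/- {z_samples.std():.3f}")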

  11. Digital pathology in clinical use: where are we now and what is holding us back?

    PubMed

    Griffin, Jon; Treanor, Darren

    2017-01-01

    Whole slide imaging is being used increasingly in research applications and in frozen section, consultation and external quality assurance practice. Digital pathology, when integrated with other digital tools such as barcoding, specimen tracking and digital dictation, can be integrated into the histopathology workflow, from specimen accession to report sign-out. These elements can bring about improvements in the safety, quality and efficiency of a histopathology department. The present paper reviews the evidence for these benefits. We then discuss the challenges of implementing a fully digital pathology workflow, including the regulatory environment, validation of whole slide imaging and the evidence for the design of a digital pathology workstation. © 2016 John Wiley & Sons Ltd.

  12. Basics of Confocal Microscopy and the Complexity of Diagnosing Skin Tumors: New Imaging Tools in Clinical Practice, Diagnostic Workflows, Cost-Estimate, and New Trends.

    PubMed

    Que, Syril Keena T; Grant-Kels, Jane M; Longo, Caterina; Pellacani, Giovanni

    2016-10-01

    The use of reflectance confocal microscopy (RCM) and other noninvasive imaging devices can potentially streamline clinical care, leading to more precise and efficient management of skin cancer. This article explores the potential role of RCM in cutaneous oncology, as an adjunct to more established techniques of detecting and monitoring for skin cancer, such as dermoscopy and total body photography. Discussed are current barriers to the adoption of RCM, diagnostic workflows and standards of care in the United States and Europe, and medicolegal issues. The potential role of RCM and other similar technological innovations in the enhancement of dermatologic care is evaluated. Copyright © 2016 Elsevier Inc. All rights reserved.

  13. A Web application for the management of clinical workflow in image‐guided and adaptive proton therapy for prostate cancer treatments

    PubMed Central

    Boes, Peter; Ho, Meng Wei; Li, Zuofeng

    2015-01-01

    Image‐guided radiotherapy (IGRT), based on radiopaque markers placed in the prostate gland, was used for proton therapy of prostate patients. Orthogonal X‐rays and the IBA Digital Image Positioning System (DIPS) were used for setup correction prior to treatment and were repeated after treatment delivery. Following a rationale for margin estimates similar to that of van Herk,(1) the daily post‐treatment DIPS data were analyzed to determine if an adaptive radiotherapy plan was necessary. A Web application using ASP.NET MVC5, Entity Framework, and an SQL database was designed to automate this process. The designed features included state‐of‐the‐art Web technologies, a domain model closely matching the workflow, a database supporting concurrency and data mining, access to the DIPS database, secured user access and roles management, and graphing and analysis tools. The Model‐View‐Controller (MVC) paradigm allowed clean domain logic, unit testing, and extensibility. Client‐side technologies, such as jQuery, jQuery Plug‐ins, and Ajax, were adopted to achieve a rich user environment and fast response. Data models included patients, staff, treatment fields and records, correction vectors, DIPS images, and association logics. Data entry, analysis, workflow logics, and notifications were implemented. The system effectively modeled the clinical workflow and IGRT process. PACS number: 87 PMID:26103504

  14. A Proof of Concept to Bridge the Gap between Mass Spectrometry Imaging, Protein Identification and Relative Quantitation: MSI~LC-MS/MS-LF.

    PubMed

    Théron, Laëtitia; Centeno, Delphine; Coudy-Gandilhon, Cécile; Pujos-Guillot, Estelle; Astruc, Thierry; Rémond, Didier; Barthelemy, Jean-Claude; Roche, Frédéric; Feasson, Léonard; Hébraud, Michel; Béchet, Daniel; Chambon, Christophe

    2016-10-26

    Mass spectrometry imaging (MSI) is a powerful tool to visualize the spatial distribution of molecules on a tissue section. The main limitation of MALDI-MSI of proteins is the lack of direct identification. Therefore, this study focuses on a MSI~LC-MS/MS-LF workflow to link the results from MALDI-MSI with potential peak identification and label-free quantitation, using only one tissue section. At first, we studied the impact of matrix deposition and laser ablation on protein extraction from the tissue section. Then, we did a back-correlation of the m/z of the proteins detected by MALDI-MSI to those identified by label-free quantitation. This allowed us to compare the label-free quantitation of proteins obtained in LC-MS/MS with the peak intensities observed in MALDI-MSI. We managed to link identification to nine peaks observed by MALDI-MSI. The results showed that the MSI~LC-MS/MS-LF workflow (i) allowed us to study a representative muscle proteome compared to a classical bottom-up workflow; and (ii) was sparsely impacted by matrix deposition and laser ablation. This workflow, performed as a proof-of-concept, suggests that a single tissue section can be used to perform MALDI-MSI and protein extraction, identification, and relative quantitation.
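
    At its simplest, the back-correlation step described above amounts to matching the m/z values of MALDI-MSI peaks against the masses of proteins identified by LC-MS/MS within a mass tolerance. The sketch below shows that matching logic in a minimal form; the [M+H]+ assumption, the tolerance, and the example masses are illustrative and are not taken from the study.

      def back_correlate(msi_peak_mz, identified_masses, tol_ppm=200.0):
          """Link MALDI-MSI peaks to LC-MS/MS identifications by mass match.
          identified_masses maps protein name -> mass (Da)."""
          links = []
          for mz in msi_peak_mz:
              for name, mass in identified_masses.items():
                  theo = mass + 1.00728                 # crude [M+H]+ assumption
                  if abs(mz - theo) / theo * 1e6 <= tol_ppm:
                      links.append((mz, name))
          return links

      # Hypothetical values for illustration only
      ids = {"myoglobin": 16951.0, "small muscle protein": 8450.3}
      print(back_correlate([16952.5, 8451.1], ids))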

  15. A Proof of Concept to Bridge the Gap between Mass Spectrometry Imaging, Protein Identification and Relative Quantitation: MSI~LC-MS/MS-LF

    PubMed Central

    Théron, Laëtitia; Centeno, Delphine; Coudy-Gandilhon, Cécile; Pujos-Guillot, Estelle; Astruc, Thierry; Rémond, Didier; Barthelemy, Jean-Claude; Roche, Frédéric; Feasson, Léonard; Hébraud, Michel; Béchet, Daniel; Chambon, Christophe

    2016-01-01

    Mass spectrometry imaging (MSI) is a powerful tool to visualize the spatial distribution of molecules on a tissue section. The main limitation of MALDI-MSI of proteins is the lack of direct identification. Therefore, this study focuses on a MSI~LC-MS/MS-LF workflow to link the results from MALDI-MSI with potential peak identification and label-free quantitation, using only one tissue section. At first, we studied the impact of matrix deposition and laser ablation on protein extraction from the tissue section. Then, we did a back-correlation of the m/z of the proteins detected by MALDI-MSI to those identified by label-free quantitation. This allowed us to compare the label-free quantitation of proteins obtained in LC-MS/MS with the peak intensities observed in MALDI-MSI. We managed to link identification to nine peaks observed by MALDI-MSI. The results showed that the MSI~LC-MS/MS-LF workflow (i) allowed us to study a representative muscle proteome compared to a classical bottom-up workflow; and (ii) was sparsely impacted by matrix deposition and laser ablation. This workflow, performed as a proof-of-concept, suggests that a single tissue section can be used to perform MALDI-MSI and protein extraction, identification, and relative quantitation. PMID:28248242

  16. PlantRNA_Sniffer: A SVM-Based Workflow to Predict Long Intergenic Non-Coding RNAs in Plants.

    PubMed

    Vieira, Lucas Maciel; Grativol, Clicia; Thiebaut, Flavia; Carvalho, Thais G; Hardoim, Pablo R; Hemerly, Adriana; Lifschitz, Sergio; Ferreira, Paulo Cavalcanti Gomes; Walter, Maria Emilia M T

    2017-03-04

    Non-coding RNAs (ncRNAs) constitute an important set of transcripts produced in the cells of organisms. Among them, there is a large amount of a particular class of long ncRNAs that are difficult to predict, the so-called long intergenic ncRNAs (lincRNAs), which might play essential roles in gene regulation and other cellular processes. Despite the importance of these lincRNAs, there is still a lack of biological knowledge and, currently, the few available computational methods are so specific that they cannot be successfully applied to species other than those for which they were originally designed. Prediction of lncRNAs has been performed with machine learning techniques; in particular, supervised learning methods have been explored for lincRNA prediction in the recent literature. As far as we know, there are no methods or workflows specifically designed to predict lincRNAs in plants. In this context, this work proposes a workflow to predict lincRNAs in plants that combines established bioinformatics tools with machine learning techniques, here a support vector machine (SVM). We discuss two case studies that allowed us to identify novel lincRNAs, in sugarcane (Saccharum spp.) and in maize (Zea mays). From the results, we could also identify differentially expressed lincRNAs in sugarcane and maize plants exposed to pathogenic and beneficial microorganisms.
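
    At its core, the prediction step is a binary classification of transcripts with an SVM trained on sequence-derived features. The sketch below shows a heavily reduced version of that idea using scikit-learn and toy features (length, GC content, ATG count); PlantRNA_Sniffer's actual feature set, training data, and upstream filtering are more extensive, and the sequences and labels here are purely illustrative.

      from sklearn.pipeline import make_pipeline
      from sklearn.preprocessing import StandardScaler
      from sklearn.svm import SVC

      def toy_features(seq):
          """Toy sequence features; the real workflow uses a richer feature set."""
          seq = seq.upper()
          gc = (seq.count("G") + seq.count("C")) / max(len(seq), 1)
          return [len(seq), gc, seq.count("ATG")]

      # Toy training set: 1 = lincRNA, 0 = protein-coding (labels are illustrative)
      train_seqs = ["ATGGCCATTGTAATGGGCCGC", "GCGCATATATATGCGCATAT"]
      X = [toy_features(s) for s in train_seqs]
      y = [0, 1]

      clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
      clf.fit(X, y)
      print(clf.predict([toy_features("ATATATATGCGCATATATAT")]))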

  17. PlantRNA_Sniffer: A SVM-Based Workflow to Predict Long Intergenic Non-Coding RNAs in Plants

    PubMed Central

    Vieira, Lucas Maciel; Grativol, Clicia; Thiebaut, Flavia; Carvalho, Thais G.; Hardoim, Pablo R.; Hemerly, Adriana; Lifschitz, Sergio; Ferreira, Paulo Cavalcanti Gomes; Walter, Maria Emilia M. T.

    2017-01-01

    Non-coding RNAs (ncRNAs) constitute an important set of transcripts produced in the cells of organisms. Among them, there is a large amount of a particular class of long ncRNAs that are difficult to predict, the so-called long intergenic ncRNAs (lincRNAs), which might play essential roles in gene regulation and other cellular processes. Despite the importance of these lincRNAs, there is still a lack of biological knowledge and, currently, the few available computational methods are so specific that they cannot be successfully applied to species other than those for which they were originally designed. Prediction of lncRNAs has been performed with machine learning techniques; in particular, supervised learning methods have been explored for lincRNA prediction in the recent literature. As far as we know, there are no methods or workflows specifically designed to predict lincRNAs in plants. In this context, this work proposes a workflow to predict lincRNAs in plants that combines established bioinformatics tools with machine learning techniques, here a support vector machine (SVM). We discuss two case studies that allowed us to identify novel lincRNAs, in sugarcane (Saccharum spp.) and in maize (Zea mays). From the results, we could also identify differentially expressed lincRNAs in sugarcane and maize plants exposed to pathogenic and beneficial microorganisms. PMID:29657283

  18. [Measures to prevent patient identification errors in blood collection/physiological function testing utilizing a laboratory information system].

    PubMed

    Shimazu, Chisato; Hoshino, Satoshi; Furukawa, Taiji

    2013-08-01

    We constructed an integrated personal identification workflow chart using both bar code reading and an all in-one laboratory information system. The information system not only handles test data but also the information needed for patient guidance in the laboratory department. The reception terminals at the entrance, displays for patient guidance and patient identification tools at blood-sampling booths are all controlled by the information system. The number of patient identification errors was greatly reduced by the system. However, identification errors have not been abolished in the ultrasound department. After re-evaluation of the patient identification process in this department, we recognized that the major reason for the errors came from excessive identification workflow. Ordinarily, an ultrasound test requires patient identification 3 times, because 3 different systems are required during the entire test process, i.e. ultrasound modality system, laboratory information system and a system for producing reports. We are trying to connect the 3 different systems to develop a one-time identification workflow, but it is not a simple task and has not been completed yet. Utilization of the laboratory information system is effective, but is not yet perfect for patient identification. The most fundamental procedure for patient identification is to ask a person's name even today. Everyday checks in the ordinary workflow and everyone's participation in safety-management activity are important for the prevention of patient identification errors.

  19. KNIME4NGS: a comprehensive toolbox for next generation sequencing analysis.

    PubMed

    Hastreiter, Maximilian; Jeske, Tim; Hoser, Jonathan; Kluge, Michael; Ahomaa, Kaarin; Friedl, Marie-Sophie; Kopetzky, Sebastian J; Quell, Jan-Dominik; Mewes, H Werner; Küffner, Robert

    2017-05-15

    Analysis of Next Generation Sequencing (NGS) data requires the processing of large datasets by chaining various tools with complex input and output formats. In order to automate data analysis, we propose to standardize NGS tasks into modular workflows. This simplifies reliable handling and processing of NGS data, and corresponding solutions become substantially more reproducible and easier to maintain. Here, we present a documented, Linux-based toolbox of 42 processing modules that are combined to construct workflows facilitating a variety of tasks such as DNAseq and RNAseq analysis. We also describe important technical extensions. The high throughput executor (HTE) helps to increase the reliability and to reduce manual interventions when processing complex datasets. We also provide a dedicated binary manager that assists users in obtaining the modules' executables and keeping them up to date. As the basis for this actively developed toolbox we use the workflow management software KNIME. See http://ibisngs.github.io/knime4ngs for nodes and user manual (GPLv3 license). robert.kueffner@helmholtz-muenchen.de. Supplementary data are available at Bioinformatics online.
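
    The key idea, chaining standardized processing modules into reproducible workflows, can be illustrated independently of KNIME itself. The sketch below composes placeholder modules that pass a dictionary of file paths from step to step; it is not the KNIME4NGS API, just a minimal picture of the modular-workflow pattern, and the module names and file extensions are invented for the example.

      def chain(*modules):
          """Compose modules; each maps the accumulated outputs dict to new outputs."""
          def run(inputs):
              data = dict(inputs)
              for module in modules:
                  data.update(module(data))
              return data
          return run

      def align(data):          # placeholder for a read-alignment module
          return {"bam": data["fastq"].replace(".fastq", ".bam")}

      def call_variants(data):  # placeholder for a variant-calling module
          return {"vcf": data["bam"].replace(".bam", ".vcf")}

      dnaseq_workflow = chain(align, call_variants)
      print(dnaseq_workflow({"fastq": "sample1.fastq"}))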

  20. Enabling Big Geoscience Data Analytics with a Cloud-Based, MapReduce-Enabled and Service-Oriented Workflow Framework

    PubMed Central

    Li, Zhenlong; Yang, Chaowei; Jin, Baoxuan; Yu, Manzhu; Liu, Kai; Sun, Min; Zhan, Matthew

    2015-01-01

    Geoscience observations and model simulations are generating vast amounts of multi-dimensional data. Effectively analyzing these data is essential for geoscience studies. However, the tasks are challenging for geoscientists because processing the massive amount of data is both computing and data intensive, in that data analytics requires complex procedures and multiple tools. To tackle these challenges, a scientific workflow framework is proposed for big geoscience data analytics. In this framework, techniques are proposed that leverage cloud computing, MapReduce, and Service Oriented Architecture (SOA). Specifically, HBase is adopted for storing and managing big geoscience data across distributed computers. A MapReduce-based algorithm framework is developed to support parallel processing of geoscience data. In addition, a service-oriented workflow architecture is built for supporting on-demand complex data analytics in the cloud environment. A proof-of-concept prototype tests the performance of the framework. Results show that this innovative framework significantly improves the efficiency of big geoscience data analytics by reducing the data processing time as well as simplifying data analytical procedures for geoscientists. PMID:25742012
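
    The MapReduce element of such a framework boils down to emitting key-value pairs per data chunk and aggregating them per key. The sketch below mimics that pattern in plain Python for a per-grid-cell mean; in the actual framework the same logic would run in parallel over HBase-hosted data, which is not reproduced here, and the grid cells and values are made up.

      from collections import defaultdict

      def map_phase(records):
          """Emit (grid_cell, value) pairs from one chunk of observations."""
          for cell, value in records:
              yield cell, value

      def reduce_phase(pairs):
          """Aggregate values per grid cell; here the reduction is a mean."""
          sums, counts = defaultdict(float), defaultdict(int)
          for cell, value in pairs:
              sums[cell] += value
              counts[cell] += 1
          return {cell: sums[cell] / counts[cell] for cell in sums}

      chunk = [((10, 20), 287.1), ((10, 20), 288.3), ((11, 20), 290.0)]
      print(reduce_phase(map_phase(chunk)))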

  1. A plug-and-play pathway refactoring workflow for natural product research in Escherichia coli and Saccharomyces cerevisiae.

    PubMed

    Ren, Hengqian; Hu, Pingfan; Zhao, Huimin

    2017-08-01

    Pathway refactoring serves as an invaluable synthetic biology tool for natural product discovery, characterization, and engineering. However, the complicated and laborious molecular biology techniques largely hinder its application in natural product research, especially in a high-throughput manner. Here we report a plug-and-play pathway refactoring workflow for high-throughput, flexible pathway construction and expression in both Escherichia coli and Saccharomyces cerevisiae. Biosynthetic genes were first cloned into pre-assembled helper plasmids with promoters and terminators, resulting in a series of expression cassettes. These expression cassettes were further assembled using a Golden Gate reaction to generate fully refactored pathways. The inclusion of spacer plasmids in this system would not only increase the flexibility for refactoring pathways with different numbers of genes, but also facilitate gene deletion and replacement. As proof of concept, a total of 96 pathways for combinatorial carotenoid biosynthesis were built successfully. This workflow should be generally applicable to different classes of natural products produced by various organisms. Biotechnol. Bioeng. 2017;114: 1847-1854. © 2017 Wiley Periodicals, Inc.
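
    Combinatorial refactoring of this kind is essentially a Cartesian product of part choices per gene. The sketch below enumerates hypothetical promoter-gene cassette combinations to show where a library of refactored pathways comes from; the part and variant names are invented, and the enumeration is not the paper's actual 96-pathway carotenoid design.

      from itertools import product

      promoters = ["pA", "pB"]                               # hypothetical promoter parts
      gene_variants = {"crtE": ["crtE_v1", "crtE_v2"],
                       "crtB": ["crtB_v1"],
                       "crtI": ["crtI_v1", "crtI_v2"]}       # hypothetical gene variants

      # One expression cassette = (promoter, gene variant); one pathway = one cassette per gene
      cassette_choices = [
          [(p, v) for p, v in product(promoters, variants)]
          for variants in gene_variants.values()
      ]
      pathways = list(product(*cassette_choices))
      print(len(pathways), "candidate pathways; first:", pathways[0])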

  2. Metagenomics workflow analysis of endophytic bacteria from oil palm fruits

    NASA Astrophysics Data System (ADS)

    Tanjung, Z. A.; Aditama, R.; Sudania, W. M.; Utomo, C.; Liwang, T.

    2017-05-01

    Next-Generation Sequencing (NGS) has become a powerful sequencing tool for microbial studies and, in particular, has driven the establishment of the field of metagenomics. This study describes a workflow for analyzing the metagenomics data of a Sequence Read Archive (SRA) file under accession ERP004286, deposited by the University of Sao Paulo. The data were generated by direct sequencing on a 454 pyrosequencing platform and originate from endophytic bacteria of oil palm fruits, which were cultured using an oil-palm-enriched medium. The workflow used SortMeRNA to separate ribosomal read sequences, Newbler (GS Assembler and GS Mapper) to assemble reads and map them against genome references, the BLAST package to identify and annotate contig sequences, and QualiMap for statistical analysis. Eight bacterial species were identified in this study. Enterobacter cloacae was the most abundant species, followed by Citrobacter koseri, Serratia marcescens, Lactococcus lactis subsp. lactis, Klebsiella pneumoniae, Citrobacter amalonaticus, Achromobacter xylosoxidans, and Pseudomonas sp., respectively. All of these species have been reported as endophytic bacteria in various plant species, and each has potential as a plant growth-promoting bacterium or for other applications in agricultural industries.
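
    After assembly, species assignment in a workflow like this typically reduces to taking the best BLAST hit per contig and tallying hits per taxon. The sketch below parses standard tabular BLAST output (-outfmt 6, with the bit score in the last column) and counts best hits; the accession-to-species lookup is assumed to be supplied by the user, and nothing here reproduces the study's exact parameters.

      from collections import Counter

      def tally_best_hits(blast_tab_path, accession_to_species):
          """Count best-hit species per query contig from BLAST -outfmt 6 output."""
          best = {}
          with open(blast_tab_path) as handle:
              for line in handle:
                  fields = line.rstrip("\n").split("\t")
                  query, subject, bitscore = fields[0], fields[1], float(fields[-1])
                  if query not in best or bitscore > best[query][1]:
                      best[query] = (subject, bitscore)
          return Counter(accession_to_species.get(subj, subj)
                         for subj, _ in best.values())

      # Example (hypothetical files): tally_best_hits("contigs_vs_nt.tsv", {"NC_014121.1": "Enterobacter cloacae"})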

  3. LipidFinder: A computational workflow for discovery of lipids identifies eicosanoid-phosphoinositides in platelets

    PubMed Central

    O’Connor, Anne; Brasher, Christopher J.; Slatter, David A.; Meckelmann, Sven W.; Hawksworth, Jade I.; Allen, Stuart M.; O’Donnell, Valerie B.

    2017-01-01

    Accurate and high-quality curation of lipidomic datasets generated from plasma, cells, or tissues is becoming essential for cell biology investigations and biomarker discovery for personalized medicine. However, a major challenge lies in removing artifacts otherwise mistakenly interpreted as real lipids from large mass spectrometry files (>60 K features), while retaining genuine ions in the dataset. This requires powerful informatics tools; however, available workflows have not been tailored specifically for lipidomics, particularly discovery research. We designed LipidFinder, an open-source Python workflow. An algorithm is included that optimizes analysis based on users’ own data, and outputs are screened against online databases and categorized into LIPID MAPS classes. LipidFinder outperformed three widely used metabolomics packages using data from human platelets. We show a family of three 12-hydroxyeicosatetraenoic acid phosphoinositides (16:0/, 18:1/, 18:0/12-HETE-PI) generated by thrombin-activated platelets, indicating crosstalk between eicosanoid and phosphoinositide pathways in human cells. The software is available on GitHub (https://github.com/cjbrasher/LipidFinder), with full userguides. PMID:28405621
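
    Artifact removal of the kind described above is largely a matter of filtering an m/z feature table against intensity thresholds and solvent/blank signals. The sketch below shows a drastically reduced version of such a filter with pandas; the column names, thresholds, and logic are illustrative assumptions and do not reproduce LipidFinder's actual algorithm.

      import pandas as pd

      def basic_artifact_filter(features, min_intensity=1e4, blank_ratio=10.0):
          """Keep features whose mean sample intensity is high enough and well above
          the blank; expects columns ['mz', 'rt', 'sample_mean', 'blank_mean']."""
          keep = (features["sample_mean"] >= min_intensity) & (
              features["sample_mean"] >= blank_ratio * features["blank_mean"].clip(lower=1.0)
          )
          return features.loc[keep].sort_values(["mz", "rt"]).reset_index(drop=True)

      # Tiny illustrative table
      table = pd.DataFrame({"mz": [760.58, 303.23], "rt": [12.1, 4.2],
                            "sample_mean": [5e5, 2e3], "blank_mean": [1e3, 1.5e3]})
      print(basic_artifact_filter(table))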

  4. Enabling big geoscience data analytics with a cloud-based, MapReduce-enabled and service-oriented workflow framework.

    PubMed

    Li, Zhenlong; Yang, Chaowei; Jin, Baoxuan; Yu, Manzhu; Liu, Kai; Sun, Min; Zhan, Matthew

    2015-01-01

    Geoscience observations and model simulations are generating vast amounts of multi-dimensional data. Effectively analyzing these data are essential for geoscience studies. However, the tasks are challenging for geoscientists because processing the massive amount of data is both computing and data intensive in that data analytics requires complex procedures and multiple tools. To tackle these challenges, a scientific workflow framework is proposed for big geoscience data analytics. In this framework techniques are proposed by leveraging cloud computing, MapReduce, and Service Oriented Architecture (SOA). Specifically, HBase is adopted for storing and managing big geoscience data across distributed computers. MapReduce-based algorithm framework is developed to support parallel processing of geoscience data. And service-oriented workflow architecture is built for supporting on-demand complex data analytics in the cloud environment. A proof-of-concept prototype tests the performance of the framework. Results show that this innovative framework significantly improves the efficiency of big geoscience data analytics by reducing the data processing time as well as simplifying data analytical procedures for geoscientists.

  5. Representing clinical guidelines in UML: a comparative study.

    PubMed

    Hederman, Lucy; Smutek, Daniel; Wade, Vincent; Knape, Thomas

    2002-01-01

    Clinical guidelines can be represented using models, such as GLIF, specifically designed for healthcare guidelines. This paper demonstrates that they can also be modelled using a mainstream business modelling language such as UML. The paper presents a guideline in GLIF and as UML activity diagrams, and then presents a mapping of GLIF primitives to UML. The potential benefits of using a mainstream modelling language are outlined. These include availability of advanced modelling tools, transfer between modelling tools, and automation via business workflow technology.

  6. Increasingly mobile: How new technologies can enhance qualitative research

    PubMed Central

    Moylan, Carrie Ann; Derr, Amelia Seraphia; Lindhorst, Taryn

    2015-01-01

    Advances in technology, such as the growth of smart phones, tablet computing, and improved access to the internet have resulted in many new tools and applications designed to increase efficiency and improve workflow. Some of these tools will assist scholars using qualitative methods with their research processes. We describe emerging technologies for use in data collection, analysis, and dissemination that each offer enhancements to existing research processes. Suggestions for keeping pace with the ever-evolving technological landscape are also offered. PMID:25798072

  7. Enabling Efficient Climate Science Workflows in High Performance Computing Environments

    NASA Astrophysics Data System (ADS)

    Krishnan, H.; Byna, S.; Wehner, M. F.; Gu, J.; O'Brien, T. A.; Loring, B.; Stone, D. A.; Collins, W.; Prabhat, M.; Liu, Y.; Johnson, J. N.; Paciorek, C. J.

    2015-12-01

    A typical climate science workflow often involves a combination of acquisition of data, modeling, simulation, analysis, visualization, publishing, and storage of results. Each of these tasks provides a myriad of challenges when running in a high performance computing environment such as Hopper or Edison at NERSC. Hurdles such as data transfer and management, job scheduling, parallel analysis routines, and publication require a lot of forethought and planning to ensure that proper quality control mechanisms are in place. These steps require effectively utilizing a combination of well-tested and newly developed functionality to move data, perform analysis, apply statistical routines, and finally, serve results and tools to the greater scientific community. As part of the CAlibrated and Systematic Characterization, Attribution and Detection of Extremes (CASCADE) project we highlight a stack of tools our team utilizes and has developed to ensure that large-scale simulation and analysis work is commonplace. These tools provide operations that assist in everything from generation/procurement of data (HTAR/Globus) to automating publication of results to portals like the Earth Systems Grid Federation (ESGF), all while executing everything in between in a scalable environment in a task-parallel way (MPI). We highlight the use and benefit of these tools by showing several climate science analysis use cases to which they have been applied.
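
    The task-parallel MPI execution mentioned above can be as simple as a static round-robin split of independent analysis tasks across ranks followed by a gather. The sketch below uses mpi4py to show that pattern; analyze() is a placeholder, and nothing here is specific to the CASCADE tool stack.

      from mpi4py import MPI

      def analyze(task):
          """Placeholder per-task analysis (e.g., one file or one model realization)."""
          return task, task ** 2

      comm = MPI.COMM_WORLD
      rank, size = comm.Get_rank(), comm.Get_size()

      tasks = list(range(100))
      my_results = [analyze(t) for t in tasks[rank::size]]   # round-robin task split
      gathered = comm.gather(my_results, root=0)

      if rank == 0:
          flat = [item for chunk in gathered for item in chunk]
          print(len(flat), "tasks completed across", size, "ranks")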

  8. ESC-Track: A computer workflow for 4-D segmentation, tracking, lineage tracing and dynamic context analysis of ESCs.

    PubMed

    Fernández-de-Manúel, Laura; Díaz-Díaz, Covadonga; Jiménez-Carretero, Daniel; Torres, Miguel; Montoya, María C

    2017-05-01

    Embryonic stem cells (ESCs) can be established as permanent cell lines, and their potential to differentiate into adult tissues has led to widespread use for studying the mechanisms and dynamics of stem cell differentiation and exploring strategies for tissue repair. Imaging live ESCs during development is now feasible due to advances in optical imaging and engineering of genetically encoded fluorescent reporters; however, a major limitation is the low spatio-temporal resolution of long-term 3-D imaging required for generational and neighboring reconstructions. Here, we present the ESC-Track (ESC-T) workflow, which includes an automated cell and nuclear segmentation and tracking tool for 4-D (3-D + time) confocal image data sets as well as a manual editing tool for visual inspection and error correction. ESC-T automatically identifies cell divisions and membrane contacts for lineage tree and neighborhood reconstruction and computes quantitative features from individual cell entities, enabling analysis of fluorescence signal dynamics and tracking of cell morphology and motion. We use ESC-T to examine Myc intensity fluctuations in the context of mouse ESC (mESC) lineage and neighborhood relationships. ESC-T is a powerful tool for evaluation of the genealogical and microenvironmental cues that maintain ESC fitness.
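
    A core ingredient of any such tracking tool is linking segmented nuclei between consecutive time points. The sketch below shows a greedy nearest-centroid linker as a toy stand-in for that step; ESC-Track's actual segmentation, tracking, and lineage reconstruction are considerably more sophisticated, and the coordinates and distance threshold here are invented.

      import numpy as np
      from scipy.spatial.distance import cdist

      def link_frames(centroids_t, centroids_t1, max_dist=15.0):
          """Greedily link nuclei in frame t to nuclei in frame t+1 by 3-D distance."""
          d = cdist(centroids_t, centroids_t1)
          links = []
          for i in np.argsort(d.min(axis=1)):          # closest pairs claimed first
              j = int(np.argmin(d[i]))
              if d[i, j] <= max_dist:
                  links.append((int(i), j))
                  d[:, j] = np.inf                     # each target linked at most once
          return links

      frame_t  = np.array([[0.0, 0.0, 0.0], [20.0, 5.0, 3.0]])
      frame_t1 = np.array([[1.0, 1.0, 0.0], [22.0, 4.0, 2.0]])
      print(link_frames(frame_t, frame_t1))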

  9. Towards PCC for Concurrent and Distributed Systems (Work in Progress)

    NASA Technical Reports Server (NTRS)

    Henriksen, Anders S.; Filinski, Andrzej

    2009-01-01

    We outline some conceptual challenges in extending the PCC paradigm to a concurrent and distributed setting, and sketch a generalized notion of module correctness based on viewing communication contracts as economic games. The model supports compositional reasoning about modular systems and is meant to apply not only to certification of executable code, but also of organizational workflows.

  10. Reproducible Bioconductor workflows using browser-based interactive notebooks and containers.

    PubMed

    Almugbel, Reem; Hung, Ling-Hong; Hu, Jiaming; Almutairy, Abeer; Ortogero, Nicole; Tamta, Yashaswi; Yeung, Ka Yee

    2018-01-01

    Bioinformatics publications typically include complex software workflows that are difficult to describe in a manuscript. We describe and demonstrate the use of interactive software notebooks to document and distribute bioinformatics research. We provide a user-friendly tool, BiocImageBuilder, that allows users to easily distribute their bioinformatics protocols through interactive notebooks uploaded to either a GitHub repository or a private server. We present four different interactive Jupyter notebooks using R and Bioconductor workflows to infer differential gene expression, analyze cross-platform datasets, process RNA-seq data and KinomeScan data. These interactive notebooks are available on GitHub. The analytical results can be viewed in a browser. Most importantly, the software contents can be executed and modified. This is accomplished using Binder, which runs the notebook inside software containers, thus avoiding the need to install any software and ensuring reproducibility. All the notebooks were produced using custom files generated by BiocImageBuilder. BiocImageBuilder facilitates the publication of workflows with a point-and-click user interface. We demonstrate that interactive notebooks can be used to disseminate a wide range of bioinformatics analyses. The use of software containers to mirror the original software environment ensures reproducibility of results. Parameters and code can be dynamically modified, allowing for robust verification of published results and encouraging rapid adoption of new methods. Given the increasing complexity of bioinformatics workflows, we anticipate that these interactive software notebooks will become as necessary for documenting software methods as traditional laboratory notebooks have been for documenting bench protocols, and as ubiquitous. © The Author 2017. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com

  11. Model-based Vestibular Afferent Stimulation: Modular Workflow for Analyzing Stimulation Scenarios in Patient Specific and Statistical Vestibular Anatomy.

    PubMed

    Handler, Michael; Schier, Peter P; Fritscher, Karl D; Raudaschl, Patrik; Johnson Chacko, Lejo; Glueckert, Rudolf; Saba, Rami; Schubert, Rainer; Baumgarten, Daniel; Baumgartner, Christian

    2017-01-01

    Our sense of balance and spatial orientation strongly depends on the correct functionality of our vestibular system. Vestibular dysfunction can lead to blurred vision and impaired balance and spatial orientation, causing a significant decrease in quality of life. Recent studies have shown that vestibular implants offer a possible treatment for patients with vestibular dysfunction. The close proximity of the vestibular nerve bundles, the facial nerve and the cochlear nerve poses a major challenge to targeted stimulation of the vestibular system. Modeling the electrical stimulation of the vestibular system allows for an efficient analysis of stimulation scenarios prior to time- and cost-intensive in vivo experiments. Current models are based on animal data or CAD models of human anatomy. In this work, a (semi-)automatic modular workflow is presented that transforms segmented anatomy data of human vestibular specimens, step by step, into an electrical model that is subsequently analyzed. The steps of this workflow include (i) the transformation of labeled datasets into a tetrahedral mesh, (ii) nerve fiber anisotropy and fiber computation as a basis for neuron models, (iii) inclusion of arbitrary electrode designs, (iv) simulation of quasistationary potential distributions, and (v) analysis of the effect of stimulus waveforms on the stimulation outcome. Results obtained by applying the workflow to human datasets and to the average shape of a statistical model showed high qualitative agreement and a quantitatively comparable range, respectively, relative to data from the literature. Based on our workflow, a detailed analysis of intra- and extra-labyrinthine electrode configurations with various stimulation waveforms and electrode designs can be performed on patient-specific anatomy, making this framework a valuable tool for current optimization questions concerning vestibular implants in humans.
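
    As a point of reference for step (iv), the simplest quasistationary model of an electrode is a monopolar point current source in an unbounded homogeneous medium, where the potential falls off as V = I/(4*pi*sigma*r). The sketch below evaluates that closed form; it is only a sanity-check baseline with illustrative parameter values, not the finite-element workflow described above.

      import numpy as np

      def point_source_potential(points_m, electrode_m, current_a, sigma_s_per_m):
          """Potential V = I / (4*pi*sigma*r) of a point source in a homogeneous medium."""
          r = np.linalg.norm(points_m - electrode_m, axis=1)
          return current_a / (4 * np.pi * sigma_s_per_m * np.maximum(r, 1e-9))

      # 100 uA source at the origin, conductivity ~1.4 S/m (illustrative values)
      pts = np.array([[0.001, 0.0, 0.0], [0.002, 0.0, 0.0]])
      print(point_source_potential(pts, np.zeros(3), 100e-6, 1.4))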

  12. Effects of a cost-effective surgical workflow on cosmesis and patient's satisfaction in open thyroid surgery.

    PubMed

    Billmann, Franck; Bokor-Billmann, Therezia; Voigt, Joachim; Kiffner, Erhard

    2013-01-01

    In thyroid surgery, minimally invasive procedures are thought to improve cosmesis and patient satisfaction. However, studies using standardized tools are scarce, and results are controversial. Moreover, minimally invasive techniques raise the question of material costs in a context of health spending cuts. The aim of the present study is to test a cost-effective surgical workflow to improve cosmesis in conventional open thyroid surgery. Our study ran between January 2009 and November 2010 and was based on a prospectively maintained thyroid surgery register. Patients operated on for benign thyroid diseases were included. Since January 2010, a standardized surgical workflow was used in addition to the reference open procedure to improve the outcome. Two groups were created: (1) the G1 group (patients operated on with the reference technique) and (2) the G2 group (patients operated on with our workflow in addition to the reference technique). Patients were investigated for postoperative outcomes, self-evaluated body image, cosmetic and self-confidence scores. 820 patients were included in the present study. The overall body image and cosmetic scores were significantly better in the G2 group (P < 0.05). No significant difference was noted in terms of surgical outcomes, scar length, and self-confidence. Our surgical workflow, used in conjunction with the reference technique, is safe and yields significantly better results in terms of body image and cosmesis than the reference technique alone. We therefore recommend its implementation in order to improve outcomes in a cost-effective way. The limitations of the present study should be kept in mind in the elaboration of future studies. Copyright © 2012 Surgical Associates Ltd. Published by Elsevier Ltd. All rights reserved.

  13. A formal approach to the analysis of clinical computer-interpretable guideline modeling languages.

    PubMed

    Grando, M Adela; Glasspool, David; Fox, John

    2012-01-01

    To develop proof strategies to formally study the expressiveness of workflow-based languages, and to investigate their applicability to clinical computer-interpretable guideline (CIG) modeling languages. We propose two strategies for studying the expressiveness of workflow-based languages based on a standard set of workflow patterns expressed as Petri nets (PNs) and notions of congruence and bisimilarity from process calculus. Proof that a PN-based pattern P can be expressed in a language L can be carried out semi-automatically. Proof that a language L cannot provide the behavior specified by a PN P requires proof by exhaustion based on analysis of cases and cannot be performed automatically. The proof strategies are generic but we exemplify their use with a particular CIG modeling language, PROforma. To illustrate the method we evaluate the expressiveness of PROforma against three standard workflow patterns and compare our results with a previous similar but informal comparison. We show that the two proof strategies are effective in evaluating a CIG modeling language against standard workflow patterns. We find that using the proposed formal techniques we obtain different results to a comparable previously published but less formal study. We discuss the utility of these analyses as the basis for principled extensions to CIG modeling languages. Additionally we explain how the same proof strategies can be reused to prove the satisfaction of patterns expressed in the declarative language CIGDec. The proof strategies we propose are useful tools for analysing the expressiveness of CIG modeling languages. This study provides good evidence of the benefits of applying formal methods of proof over semi-formal ones. Copyright © 2011 Elsevier B.V. All rights reserved.
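
    The workflow patterns referred to above are expressed as place/transition nets, whose behavior is fully determined by a simple firing rule: a transition is enabled when every input place holds enough tokens, and firing it moves tokens from input to output places. The sketch below encodes that rule and plays out a toy "sequence" pattern; it is a generic Petri net interpreter for illustration, not part of the proof machinery in the paper.

      def enabled(marking, pre):
          """A transition is enabled if every input place carries enough tokens."""
          return all(marking.get(place, 0) >= n for place, n in pre.items())

      def fire(marking, pre, post):
          """Fire a transition: consume 'pre' tokens, produce 'post' tokens."""
          if not enabled(marking, pre):
              raise ValueError("transition not enabled")
          m = dict(marking)
          for place, n in pre.items():
              m[place] -= n
          for place, n in post.items():
              m[place] = m.get(place, 0) + n
          return m

      # Toy 'sequence' workflow pattern: task A, then task B
      m0 = {"start": 1}
      m1 = fire(m0, {"start": 1}, {"A_done": 1})
      m2 = fire(m1, {"A_done": 1}, {"B_done": 1})
      print(m2)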

  14. Next-Generation Climate Modeling Science Challenges for Simulation, Workflow and Analysis Systems

    NASA Astrophysics Data System (ADS)

    Koch, D. M.; Anantharaj, V. G.; Bader, D. C.; Krishnan, H.; Leung, L. R.; Ringler, T.; Taylor, M.; Wehner, M. F.; Williams, D. N.

    2016-12-01

    We will present two examples of current and future high-resolution climate-modeling research that are challenging existing simulation run-time I/O, model-data movement, storage and publishing, and analysis. In each case, we will consider lessons learned as current workflow systems are broken by these large-data science challenges, as well as strategies to repair or rebuild the systems. First we consider the science and workflow challenges to be posed by the CMIP6 multi-model HighResMIP, involving around a dozen modeling groups performing quarter-degree simulations, in 3-member ensembles for 100 years, with high-frequency (1-6 hourly) diagnostics, which is expected to generate over 4PB of data. An example of science derived from these experiments will be to study how resolution affects the ability of models to capture extreme-events such as hurricanes or atmospheric rivers. Expected methods to transfer (using parallel Globus) and analyze (using parallel "TECA" software tools) HighResMIP data for such feature-tracking by the DOE CASCADE project will be presented. A second example will be from the Accelerated Climate Modeling for Energy (ACME) project, which is currently addressing challenges involving multiple century-scale coupled high resolution (quarter-degree) climate simulations on DOE Leadership Class computers. ACME is anticipating production of over 5PB of data during the next 2 years of simulations, in order to investigate the drivers of water cycle changes, sea-level-rise, and carbon cycle evolution. The ACME workflow, from simulation to data transfer, storage, analysis and publication will be presented. Current and planned methods to accelerate the workflow, including implementing run-time diagnostics, and implementing server-side analysis to avoid moving large datasets will be presented.

  15. Important nonurgent imaging findings: use of a hybrid digital and administrative support tool for facilitating clinician communication.

    PubMed

    Johnson, Evan; Sanger, Joseph; Rosenkrantz, Andrew B

    2015-01-01

    A departmental tool that provides a digital/administrative solution for communication of important imaging findings was evaluated. The tool allows the radiologist to click a button to mark an examination for ordering-physician follow-up, with subsequent fax and confirmation. The tool's log was reviewed. Of 466 entries, 99.4% were successfully faxed with phone confirmation. The most common reasons for usage were lung nodule/mass (29.2%) and osseous fracture (12.4%). Subsequent clinical action was documented in 41.0% of entries. Our data show the reliability of the tool in assisting the communication of findings, as well as providing documentation of notification, with minimal workflow disruption. Copyright © 2015 Elsevier Inc. All rights reserved.

  16. An overview of the biocreative 2012 workshop track III: Interactive text mining task

    USDA-ARS?s Scientific Manuscript database

    An important question is how to make use of text mining to enhance the biocuration workflow. A number of groups have developed tools for text mining from a computer science/linguistics perspective and there are many initiatives to curate some aspect of biology from the literature. In some cases the ...

  17. Deducing Reaction Mechanism: A Guide for Students, Researchers, and Instructors

    ERIC Educational Resources Information Center

    Meek, Simon J.; Pitman, Catherine L.; Miller, Alexander J. M.

    2016-01-01

    An introductory guide to deducing the mechanism of chemical reactions is presented. Following a typical workflow for probing reaction mechanism, the guide introduces a wide range of kinetic and mechanistic tools. In addition to serving as a broad introduction to mechanistic analysis for students and researchers, the guide has also been used by…

  18. Point Cloud-Based Automatic Assessment of 3D Computer Animation Courseworks

    ERIC Educational Resources Information Center

    Paravati, Gianluca; Lamberti, Fabrizio; Gatteschi, Valentina; Demartini, Claudio; Montuschi, Paolo

    2017-01-01

    Computer-supported assessment tools can bring significant benefits to both students and teachers. When integrated in traditional education workflows, they may help to reduce the time required to perform the evaluation and consolidate the perception of fairness of the overall process. When integrated within on-line intelligent tutoring systems,…

  19. 75 FR 70268 - Submission for OMB Review; Comment Request; NIH NCI Central Institutional Review Board (CIRB...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-11-17

    ... number. Proposed Collection: Title: NIH NCI Central Institutional Review Board (CIRB). Type of... is absent of conflicts of interest with the protocols under review. Tools utilized to accomplish this... information on workflow and processes of CIRB operations as well as a non-disclosure agreement. A conflict of...

  20. Usability Testing and Workflow Analysis of the TRADOC Data Visualization Tool

    DTIC Science & Technology

    2012-09-01

    Eye-tracking software captured measures such as blink data, saccades, and cognitive load estimated from pupil contraction; eye-tracking was only one component of the data evaluated. The remaining usability findings came from participant feedback on the tool's visualizations (for example, that line charts were hard to read and that projecting charts directly onto the regions increased on-screen clutter).

  1. A Virtual Environment for Process Management. A Step by Step Implementation

    ERIC Educational Resources Information Center

    Mayer, Sergio Valenzuela

    2003-01-01

    This paper presents a virtual organizational environment conceived through the integration of three computer programs: a manufacturing simulation package, business process automation (workflow) software, and business intelligence (Balanced Scorecard) software. It was created as a supporting tool for teaching IE; its purpose is to give…

  2. Enabling the democratization of the genomics revolution with a fully integrated web-based bioinformatics platform

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Li, Po-E; Lo, Chien -Chi; Anderson, Joseph J.

    Continued advancements in sequencing technologies have fueled the development of new sequencing applications and promise to flood current databases with raw data. A number of factors prevent the seamless and easy use of these data, including the breadth of project goals, the wide array of tools that individually perform fractions of any given analysis, the large number of associated software/hardware dependencies, and the detailed expertise required to perform these analyses. To address these issues, we have developed an intuitive web-based environment with a wide assortment of integrated and cutting-edge bioinformatics tools in pre-configured workflows. These workflows, coupled with the ease of use of the environment, provide even novice next-generation sequencing users with the ability to perform many complex analyses with only a few mouse clicks and, within the context of the same environment, to visualize and further interrogate their results. As a result, this bioinformatics platform is an initial attempt at Empowering the Development of Genomics Expertise (EDGE) in a wide range of applications for microbial research.

  3. A zero-footprint 3D visualization system utilizing mobile display technology for timely evaluation of stroke patients

    NASA Astrophysics Data System (ADS)

    Park, Young Woo; Guo, Bing; Mogensen, Monique; Wang, Kevin; Law, Meng; Liu, Brent

    2010-03-01

    When a patient is accepted in the emergency room with suspected stroke, time is of the utmost importance. The infarcted brain area suffers irreparable damage as soon as three hours after the onset of stroke symptoms. A CT scan is one of the standard first-line imaging investigations and is crucial to identify and properly triage stroke cases. Ensuring that an expert radiologist is available in the emergency environment to diagnose the stroke patient in a timely manner only adds to the challenges within the clinical workflow. Therefore, a truly zero-footprint web-based system with powerful advanced visualization tools for volumetric imaging, including 2D, MIP/MPR, and 3D display, can greatly facilitate this dynamic clinical workflow for stroke patients. Together with mobile technology, the proper visualization tools can be delivered at the point of decision anywhere and anytime. We will present a small pilot project evaluating the use of mobile technologies, using devices such as the iPhone, for assessing stroke patients. The results of the evaluation, as well as any challenges in setting up the system, will also be discussed.

  4. Methodology to assess clinical liver safety data.

    PubMed

    Merz, Michael; Lee, Kwan R; Kullak-Ublick, Gerd A; Brueckner, Andreas; Watkins, Paul B

    2014-11-01

    Analysis of liver safety data has to be multivariate by nature and needs to take into account the time dependency of observations. Current standard tools for liver safety assessment, such as summary tables, individual data listings, and narratives, address these requirements only to a limited extent. Using graphics in the context of a systematic workflow including predefined graph templates is a valuable addition to standard instruments, helping to ensure completeness of evaluation and supporting both hypothesis generation and testing. Employing graphical workflows interactively allows analysis in a team-based setting and facilitates identification of the most suitable graphics for publishing and regulatory reporting. Another important tool is statistical outlier detection, accounting for the fact that for the assessment of drug-induced liver injury, identification and thorough evaluation of extreme values has much more relevance than measures of central tendency in the data. Taken together, systematic graphical data exploration and statistical outlier detection may have the potential to significantly improve assessment and interpretation of clinical liver safety data. A workshop was convened to discuss best practices for the assessment of drug-induced liver injury (DILI) in clinical trials.

  5. Binding-Site Compatible Fragment Growing Applied to the Design of β2-Adrenergic Receptor Ligands.

    PubMed

    Chevillard, Florent; Rimmer, Helena; Betti, Cecilia; Pardon, Els; Ballet, Steven; van Hilten, Niek; Steyaert, Jan; Diederich, Wibke E; Kolb, Peter

    2018-02-08

    Fragment-based drug discovery is intimately linked to fragment extension approaches that can be accelerated using software for de novo design. Although computers allow for the facile generation of millions of suggestions, synthetic feasibility is, however, often neglected. In this study we computationally extended, chemically synthesized, and experimentally assayed new ligands for the β2-adrenergic receptor (β2AR) by growing fragment-sized ligands. In order to address the synthetic tractability issue, our in silico workflow aims at derivatized products based on robust organic reactions. The study started from the predicted binding modes of five fragments. We suggested a total of eight diverse extensions that were easily synthesized, and further assays showed that four products had an improved affinity (up to 40-fold) compared to their respective initial fragment. The described workflow, which we call "growing via merging" and for which the key tools are available online, can improve early fragment-based drug discovery projects, making it a useful creative tool for medicinal chemists during structure-activity relationship (SAR) studies.

  6. Using the iPlant collaborative discovery environment.

    PubMed

    Oliver, Shannon L; Lenards, Andrew J; Barthelson, Roger A; Merchant, Nirav; McKay, Sheldon J

    2013-06-01

    The iPlant Collaborative is an academic consortium whose mission is to develop an informatics and social infrastructure to address the "grand challenges" in plant biology. Its cyberinfrastructure supports the computational needs of the research community and facilitates solving major challenges in plant science. The Discovery Environment provides a powerful and rich graphical interface to the iPlant Collaborative cyberinfrastructure by creating an accessible virtual workbench that enables all levels of expertise, ranging from students to traditional biology researchers and computational experts, to explore, analyze, and share their data. By providing access to iPlant's robust data-management system and high-performance computing resources, the Discovery Environment also creates a unified space in which researchers can access scalable tools. Researchers can use available Applications (Apps) to execute analyses on their data, as well as customize or integrate their own tools to better meet the specific needs of their research. These Apps can also be used in workflows that automate more complicated analyses. This module describes how to use the main features of the Discovery Environment, using bioinformatics workflows for high-throughput sequence data as examples. © 2013 by John Wiley & Sons, Inc.

  7. Enabling the democratization of the genomics revolution with a fully integrated web-based bioinformatics platform

    PubMed Central

    Li, Po-E; Lo, Chien-Chi; Anderson, Joseph J.; Davenport, Karen W.; Bishop-Lilly, Kimberly A.; Xu, Yan; Ahmed, Sanaa; Feng, Shihai; Mokashi, Vishwesh P.; Chain, Patrick S.G.

    2017-01-01

    Continued advancements in sequencing technologies have fueled the development of new sequencing applications and promise to flood current databases with raw data. A number of factors prevent the seamless and easy use of these data, including the breadth of project goals, the wide array of tools that individually perform fractions of any given analysis, the large number of associated software/hardware dependencies, and the detailed expertise required to perform these analyses. To address these issues, we have developed an intuitive web-based environment with a wide assortment of integrated and cutting-edge bioinformatics tools in pre-configured workflows. These workflows, coupled with the ease of use of the environment, provide even novice next-generation sequencing users with the ability to perform many complex analyses with only a few mouse clicks and, within the context of the same environment, to visualize and further interrogate their results. This bioinformatics platform is an initial attempt at Empowering the Development of Genomics Expertise (EDGE) in a wide range of applications for microbial research. PMID:27899609

  8. VisRseq: R-based visual framework for analysis of sequencing data

    PubMed Central

    2015-01-01

    Background: Several tools have been developed to enable biologists to perform initial browsing and exploration of sequencing data. However the computational tool set for further analyses often requires significant computational expertise to use and many of the biologists with the knowledge needed to interpret these data must rely on programming experts. Results: We present VisRseq, a framework for analysis of sequencing datasets that provides a computationally rich and accessible framework for integrative and interactive analyses without requiring programming expertise. We achieve this aim by providing R apps, which offer a semi-auto generated and unified graphical user interface for computational packages in R and repositories such as Bioconductor. To address the interactivity limitation inherent in R libraries, our framework includes several native apps that provide exploration and brushing operations as well as an integrated genome browser. The apps can be chained together to create more powerful analysis workflows. Conclusions: To validate the usability of VisRseq for analysis of sequencing data, we present two case studies performed by our collaborators and report their workflow and insights. PMID:26328469

  9. VisRseq: R-based visual framework for analysis of sequencing data.

    PubMed

    Younesy, Hamid; Möller, Torsten; Lorincz, Matthew C; Karimi, Mohammad M; Jones, Steven J M

    2015-01-01

    Several tools have been developed to enable biologists to perform initial browsing and exploration of sequencing data. However the computational tool set for further analyses often requires significant computational expertise to use and many of the biologists with the knowledge needed to interpret these data must rely on programming experts. We present VisRseq, a framework for analysis of sequencing datasets that provides a computationally rich and accessible framework for integrative and interactive analyses without requiring programming expertise. We achieve this aim by providing R apps, which offer a semi-auto generated and unified graphical user interface for computational packages in R and repositories such as Bioconductor. To address the interactivity limitation inherent in R libraries, our framework includes several native apps that provide exploration and brushing operations as well as an integrated genome browser. The apps can be chained together to create more powerful analysis workflows. To validate the usability of VisRseq for analysis of sequencing data, we present two case studies performed by our collaborators and report their workflow and insights.

  10. Enabling the democratization of the genomics revolution with a fully integrated web-based bioinformatics platform

    DOE PAGES

    Li, Po-E; Lo, Chien -Chi; Anderson, Joseph J.; ...

    2016-11-24

    Continued advancements in sequencing technologies have fueled the development of new sequencing applications and promise to flood current databases with raw data. A number of factors prevent the seamless and easy use of these data, including the breadth of project goals, the wide array of tools that individually perform fractions of any given analysis, the large number of associated software/hardware dependencies, and the detailed expertise required to perform these analyses. To address these issues, we have developed an intuitive web-based environment with a wide assortment of integrated and cutting-edge bioinformatics tools in pre-configured workflows. These workflows, coupled with the ease of use of the environment, provide even novice next-generation sequencing users with the ability to perform many complex analyses with only a few mouse clicks and, within the context of the same environment, to visualize and further interrogate their results. As a result, this bioinformatics platform is an initial attempt at Empowering the Development of Genomics Expertise (EDGE) in a wide range of applications for microbial research.

  11. π Scope: python based scientific workbench with visualization tool for MDSplus data

    NASA Astrophysics Data System (ADS)

    Shiraiwa, S.

    2014-10-01

    πScope is a Python-based scientific data analysis and visualization tool constructed on wxPython and Matplotlib. Although it is designed to be a generic tool, the primary motivation for developing the new software is 1) to provide an updated tool to browse MDSplus data, with functionalities beyond dwscope and jScope, and 2) to provide a universal foundation for constructing interface tools to perform computer simulation and modeling for Alcator C-Mod. It provides many features to visualize MDSplus data during tokamak experiments, including overplotting different signals and discharges, various plot types (line, contour, image, etc.), in-panel data analysis using Python scripts, and publication-quality graphics generation. Additionally, the logic to produce multi-panel plots is designed to be backward compatible with dwscope, enabling smooth migration for dwscope users. πScope uses multi-threading to reduce data transfer latency, and its object-oriented design makes it easy to modify and expand, while its open-source nature allows portability. A built-in tree data browser allows a user to approach the data structure both from a GUI and a script, enabling relatively complex data analysis workflows to be built quickly. As an example, an IDL-based interface to perform GENRAY/CQL3D simulations was ported to πScope, allowing LHCD simulations to be run between shots using C-Mod experimental profiles. This workflow is being used to generate a large database to develop a LHCD actuator model for the plasma control system. Supported by USDoE Award DE-FC02-99ER54512.

  12. Cloud-Based Tools to Support High-Resolution Modeling (Invited)

    NASA Astrophysics Data System (ADS)

    Jones, N.; Nelson, J.; Swain, N.; Christensen, S.

    2013-12-01

    The majority of watershed models developed to support decision-making by water management agencies are simple, lumped-parameter models. Maturity in research codes and advances in the computational power from multi-core processors on desktop machines, commercial cloud-computing resources, and supercomputers with thousands of cores have created new opportunities for employing more accurate, high-resolution distributed models for routine use in decision support. The barriers to using such models on a more routine basis include the massive amounts of spatial data that must be processed for each new scenario and the lack of efficient visualization tools. In this presentation we will review a current NSF-funded project called CI-WATER that is intended to overcome many of these roadblocks associated with high-resolution modeling. We are developing a suite of tools that will make it possible to deploy customized web-based apps for running custom scenarios for high-resolution models with minimal effort. These tools are based on a software stack that includes 52 North, MapServer, PostGIS, HT Condor, CKAN, and Python. This open source stack provides a simple scripting environment for quickly configuring new custom applications for running high-resolution models as geoprocessing workflows. The HT Condor component facilitates simple access to local distributed computers or commercial cloud resources when necessary for stochastic simulations. The CKAN framework provides a powerful suite of tools for hosting such workflows in a web-based environment that includes visualization tools and storage of model simulations in a database for archiving, querying, and sharing of model results. Prototype applications including land use change, snow melt, and burned area analysis will be presented. This material is based upon work supported by the National Science Foundation under Grant No. 1135482

  13. Plastid: nucleotide-resolution analysis of next-generation sequencing and genomics data.

    PubMed

    Dunn, Joshua G; Weissman, Jonathan S

    2016-11-22

    Next-generation sequencing (NGS) informs many biological questions with unprecedented depth and nucleotide resolution. These assays have created a need for analytical tools that enable users to manipulate data nucleotide-by-nucleotide robustly and easily. Furthermore, because many NGS assays encode information jointly within multiple properties of read alignments - for example, in ribosome profiling, the locations of ribosomes are jointly encoded in alignment coordinates and length - analytical tools are often required to extract the biological meaning from the alignments before analysis. Many assay-specific pipelines exist for this purpose, but there remains a need for user-friendly, generalized, nucleotide-resolution tools that are not limited to specific experimental regimes or analytical workflows. Plastid is a Python library designed specifically for nucleotide-resolution analysis of genomics and NGS data. As such, Plastid is designed to extract assay-specific information from read alignments while retaining generality and extensibility to novel NGS assays. Plastid represents NGS and other biological data as arrays of values associated with genomic or transcriptomic positions, and contains configurable tools to convert data from a variety of sources to such arrays. Plastid also includes numerous tools to manipulate even discontinuous genomic features, such as spliced transcripts, with nucleotide precision. Plastid automatically handles conversion between genomic and feature-centric coordinates, accounting for splicing and strand, freeing users of burdensome accounting. Finally, Plastid's data models use consistent and familiar biological idioms, enabling even beginners to develop sophisticated analytical workflows with minimal effort. Plastid is a versatile toolkit that has been used to analyze data from multiple NGS assays, including RNA-seq, ribosome profiling, and DMS-seq. It forms the genomic engine of our ORF annotation tool, ORF-RATER, and is readily adapted to novel NGS assays. Examples, tutorials, and extensive documentation can be found at https://plastid.readthedocs.io .
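
    The central data model, arrays of values indexed by genomic position, can be pictured with a few lines of NumPy. The sketch below builds a toy 5'-end coverage vector from read alignments, a common convention in ribosome profiling; it deliberately does not use Plastid's own classes, which additionally handle splicing, strand-aware coordinate mapping, and discontinuous features.

      import numpy as np

      def five_prime_coverage(chrom_length, alignments):
          """Toy position-wise count vector; each alignment is (start, end, strand)
          in 0-based half-open coordinates, counted at its 5' end."""
          cov = np.zeros(chrom_length, dtype=np.int32)
          for start, end, strand in alignments:
              five_prime = start if strand == "+" else end - 1
              cov[five_prime] += 1
          return cov

      print(five_prime_coverage(50, [(10, 40, "+"), (12, 42, "-"), (10, 38, "+")]))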

  14. Reproducible research in palaeomagnetism

    NASA Astrophysics Data System (ADS)

    Lurcock, Pontus; Florindo, Fabio

    2015-04-01

    The reproducibility of research findings is attracting increasing attention across all scientific disciplines. In palaeomagnetism as elsewhere, computer-based analysis techniques are becoming more commonplace, complex, and diverse. Analyses can often be difficult to reproduce from scratch, both for the original researchers and for others seeking to build on the work. We present a palaeomagnetic plotting and analysis program designed to make reproducibility easier. Part of the problem is the divide between interactive and scripted (batch) analysis programs. An interactive desktop program with a graphical interface is a powerful tool for exploring data and iteratively refining analyses, but usually cannot operate without human interaction. This makes it impossible to re-run an analysis automatically, or to integrate it into a larger automated scientific workflow - for example, a script to generate figures and tables for a paper. In some cases the parameters of the analysis process itself are not saved explicitly, making it hard to repeat or improve the analysis even with human interaction. Conversely, non-interactive batch tools can be controlled by pre-written scripts and configuration files, allowing an analysis to be 'replayed' automatically from the raw data. However, this advantage comes at the expense of exploratory capability: iteratively improving an analysis entails a time-consuming cycle of editing scripts, running them, and viewing the output. Batch tools also tend to require more computer expertise from their users. PuffinPlot is a palaeomagnetic plotting and analysis program which aims to bridge this gap. First released in 2012, it offers both an interactive, user-friendly desktop interface and a batch scripting interface, both making use of the same core library of palaeomagnetic functions. We present new improvements to the program that help to integrate the interactive and batch approaches, allowing an analysis to be interactively explored and refined, then saved as a self-contained configuration which can be re-run without human interaction. PuffinPlot can thus be used as a component of a larger scientific workflow, integrated with workflow management tools such as Kepler, without compromising its capabilities as an exploratory tool. Since both PuffinPlot and the platform it runs on (Java) are Free/Open Source software, even the most fundamental components of an analysis can be verified and reproduced.
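
    A sketch of how a batch-capable tool of this kind could sit inside a larger scripted workflow is shown below: a Python step re-runs a saved, self-contained analysis non-interactively and returns the path to its tabular output. The jar invocation, the -process and -output flags, and the output file name are hypothetical placeholders rather than PuffinPlot's documented command-line interface.

      # Hypothetical sketch of wrapping a batch re-run of a saved palaeomagnetic
      # analysis as one step of a scripted workflow. The jar path and the
      # "-process"/"-output" flags are placeholders, not PuffinPlot's documented CLI.
      import subprocess
      from pathlib import Path

      def rerun_analysis(saved_analysis: Path, out_dir: Path) -> Path:
          """Re-run a self-contained saved analysis without human interaction."""
          out_dir.mkdir(parents=True, exist_ok=True)
          subprocess.run(
              ["java", "-jar", "PuffinPlot.jar",
               "-process", str(saved_analysis),      # hypothetical flag
               "-output", str(out_dir)],             # hypothetical flag
              check=True,
          )
          return out_dir / "results.csv"             # assumed output file name

      if __name__ == "__main__":
          table = rerun_analysis(Path("site_A.ppl"), Path("figures_and_tables"))
          print("Batch analysis wrote", table)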

  15. Designing Collaborative Developmental Standards by Refactoring of the Earth Science Models, Libraries, Workflows and Frameworks.

    NASA Astrophysics Data System (ADS)

    Mirvis, E.; Iredell, M.

    2015-12-01

    The operational (OPS) NOAA National Centers for Environmental Prediction (NCEP) suite traditionally consists of a large set of multi-scale HPC models, workflows, scripts, tools, and utilities, which depend heavily on a variety of additional components. Namely, the suite relies on a unique collection of more than 20 in-house shared libraries (NCEPLIBS), specific versions of third-party libraries (such as netCDF, HDF, ESMF, jasper, and XML libraries), and an HPC workflow tool, all within a dedicated (sometimes even vendor-customized) homogeneous HPC system environment. This domain- and site-specific setup, combined with NCEP's product-driven, large-scale, real-time data operations, complicates NCEP collaborative development tremendously by reducing the chances of replicating the OPS environment anywhere else. The mission of NOAA/NCEP's Environmental Modeling Center (EMC) is to develop and improve numerical weather, climate, hydrological, and ocean prediction through partnership with the research community. Recognizing these difficulties, EMC has recently taken an innovative approach to improving the flexibility of the HPC environment by building the elements of, and a foundation for, an NCEP OPS functionally equivalent environment (FEE), which can also be used to ease external interface constructs. Aiming to reduce the turnaround time of community code enhancements via the Research-to-Operations (R2O) cycle, EMC has developed and deployed several project subset standards that have already paved the road to NCEP OPS implementation standards. In this presentation we will discuss the EMC FEE for O2R requirements and approaches to collaborative standardization, including the NCEPLIBS FEE and model code version control paired with the models' derived customized HPC modules and FEE footprints. We will share NCEP/EMC's experience and potential in refactoring EMC development processes and legacy codes, and in securing model source code quality standards, using a combination of the Eclipse IDE integrated with reverse-engineering tools/APIs. We will also report on collaborative efforts in the restructuring of the NOAA Environmental Modeling System (NEMS), the multi-model coupling framework, and in transitioning the FEE verification methodology.

  16. ClimatePipes: User-Friendly Data Access, Manipulation, Analysis & Visualization of Community Climate Models

    NASA Astrophysics Data System (ADS)

    Chaudhary, A.; DeMarle, D.; Burnett, B.; Harris, C.; Silva, W.; Osmari, D.; Geveci, B.; Silva, C.; Doutriaux, C.; Williams, D. N.

    2013-12-01

    The impact of climate change will resonate through a broad range of fields including public health, infrastructure, water resources, and many others. Long-term coordinated planning, funding, and action are required for climate change adaptation and mitigation. Unfortunately, widespread use of climate data (simulated and observed) in non-climate-science communities is impeded by factors such as large data size, lack of adequate metadata, poor documentation, and lack of sufficient computational and visualization resources. We present ClimatePipes to address many of these challenges by creating an open source platform that provides state-of-the-art, user-friendly data access, analysis, and visualization for climate and other relevant geospatial datasets, making climate data available to non-researchers, decision-makers, and other stakeholders. The overarching goals of ClimatePipes are to enable users to explore real-world questions related to climate change, to provide tools for data access, analysis, and visualization, and to facilitate collaboration by enabling users to share datasets, workflows, and visualizations. ClimatePipes uses a web-based application platform because of its widespread support on mainstream operating systems, ease of use, and inherent support for collaboration. The front end uses HTML5 (WebGL, Canvas2D, CSS3) to deliver state-of-the-art visualization and a best-in-class user experience. The back end is built around Python using the Visualization Toolkit (VTK, http://vtk.org), Climate Data Analysis Tools (CDAT, http://uv-cdat.llnl.gov), and other climate and geospatial data processing tools such as GDAL and PROJ4. The ClimatePipes web interface queries and accesses data from remote sources (such as ESGF), for example overlaying a climate data layer from ESGF on a map layer from OpenStreetMap. The ClimatePipes workflow editor provides flexibility and fine-grained control, using the VisTrails (http://www.vistrails.org) workflow engine as its back end.
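
    The GDAL/PROJ4 layer mentioned above is what makes it possible to serve climate layers in the projection used by web maps. The sketch below, assuming placeholder file names, uses GDAL's Python bindings to reproject a climate raster into Web Mercator (EPSG:3857) so that a front end can overlay it on OpenStreetMap tiles.

      # Minimal sketch: reproject a climate raster into Web Mercator (EPSG:3857) so
      # a web front end can overlay it on OpenStreetMap tiles. File names are
      # placeholders; gdal.Warp is the standard GDAL Python binding for this step.
      from osgeo import gdal

      gdal.UseExceptions()  # raise Python exceptions instead of silent failures

      src = "tas_Amon_model_rcp85_200601-210012.tif"   # placeholder climate layer
      dst = "tas_webmercator.tif"

      gdal.Warp(
          dst,
          src,
          dstSRS="EPSG:3857",           # OpenStreetMap / web map projection
          resampleAlg="bilinear",       # smooth continuous fields such as temperature
      )
      print("Wrote", dst)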

  17. WE-H-BRC-04: Implement Lean Methodology to Make Our Current Process of CT Simulation to Treatment More Efficient

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Boddu, S; Morrow, A; Krishnamurthy, N

    Purpose: Our goal is to implement lean methodology to make our current process from CT simulation to treatment more efficient. Methods: In this study, we implemented lean methodology and tools and employed flowcharts in Excel for process mapping. We formed a group of physicians, physicists, dosimetrists, therapists, and a clinical physics assistant and huddled bi-weekly to map current value streams. We performed GEMBA walks and observed current processes from the scheduling of patient CT simulations to treatment plan approval. From this, the entire workflow was categorized into processes, sub-processes, and tasks. For each process we gathered data on touch time, first-time quality, undesirable effects (UDEs), and wait times from relevant members of each task. UDEs were binned by frequency of occurrence. We huddled to map the future state and to find solutions to high-frequency UDEs. We implemented visual controls and hard stops, and documented issues found during chart checks prior to treatment plan approval. Results: We identified approximately 64 UDEs in our current workflow that could cause delays, cause re-work, compromise the quality and safety of patient treatments, or cause wait times of 1-6 days. While some UDEs are unavoidable, such as re-planning due to patient weight loss, eliminating avoidable UDEs is our goal. In 2015, we found 399 issues with patient treatment plans, of which 261, 95, and 43 were of low, medium, and high severity, respectively. We also mapped patient-specific QA processes for IMRT/RapidArc and SRS/SBRT, involving 10 and 18 steps, respectively. From these, 13 UDEs were found and 5 were addressed, resolving 20% of the issues. Conclusion: We have successfully implemented lean methodology and tools. We are further mapping treatment-site-specific workflows to identify bottlenecks, potential breakdowns, and personnel allocation, and will employ tools such as failure mode and effects analysis to mitigate risk factors and make this process efficient.

  18. Multi-level meta-workflows: new concept for regularly occurring tasks in quantum chemistry.

    PubMed

    Arshad, Junaid; Hoffmann, Alexander; Gesing, Sandra; Grunzke, Richard; Krüger, Jens; Kiss, Tamas; Herres-Pawlis, Sonja; Terstyanszky, Gabor

    2016-01-01

    In Quantum Chemistry, many tasks recur frequently, e.g. geometry optimizations, benchmarking series, etc. Here, workflows can help to reduce the time spent on manual job definition and output extraction. These workflows are executed on computing infrastructures and may require large computing and data resources. Scientific workflows hide these infrastructures and the resources needed to run them. It requires significant effort and specific expertise to design, implement, and test these workflows. Many of these workflows are complex and monolithic entities that can be used for particular scientific experiments. Hence, their modification is not straightforward, which makes them almost impossible to share. To address these issues we propose developing atomic workflows and embedding them in meta-workflows. Atomic workflows deliver a well-defined, research-domain-specific function. Publishing workflows in repositories enables workflow sharing inside and/or among scientific communities. We formally specify atomic and meta-workflows in order to define the data structures to be used in repositories for uploading and sharing them. Additionally, we present a formal description focused on the orchestration of atomic workflows into meta-workflows. We investigated the operations that represent basic functionalities in Quantum Chemistry, developed the relevant atomic workflows, and combined them into meta-workflows. Having these workflows, we defined the structure of the Quantum Chemistry workflow library and uploaded the workflows to the SHIWA Workflow Repository. Graphical abstract: meta-workflows and embedded workflows in the template representation.
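
    To illustrate the atomic-workflow/meta-workflow distinction in executable form (a conceptual sketch, not the paper's formal specification), the snippet below models each atomic workflow as a named, reusable unit and a meta-workflow as an ordered composition of such units; the quantum-chemistry steps and values are stand-ins.

      # Conceptual sketch (not the paper's formal specification) of atomic workflows
      # as reusable, named units and a meta-workflow as an ordered composition of
      # them. The quantum-chemistry steps are stand-ins for real atomic workflows.
      from dataclasses import dataclass, field
      from typing import Callable, Dict, List

      @dataclass
      class AtomicWorkflow:
          name: str
          run: Callable[[Dict], Dict]        # consumes and produces a data dictionary

      @dataclass
      class MetaWorkflow:
          name: str
          steps: List[AtomicWorkflow] = field(default_factory=list)

          def run(self, data: Dict) -> Dict:
              # Orchestrate the embedded atomic workflows in sequence.
              for step in self.steps:
                  data = step.run(data)
              return data

      # Stand-in atomic workflows for recurring quantum-chemistry tasks.
      geometry_opt = AtomicWorkflow("geometry_optimization",
                                    lambda d: {**d, "geometry": "optimized"})
      single_point = AtomicWorkflow("single_point_energy",
                                    lambda d: {**d, "energy": -76.4})  # made-up value

      benchmark = MetaWorkflow("benchmark_series", [geometry_opt, single_point])
      print(benchmark.run({"molecule": "H2O"}))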

  19. SWS: accessing SRS sites contents through Web Services.

    PubMed

    Romano, Paolo; Marra, Domenico

    2008-03-26

    Web Services and Workflow Management Systems can support the creation and deployment of network systems able to automate data analysis and retrieval processes in biomedical research. Web Services have been implemented at bioinformatics centres and workflow systems have been proposed for biological data analysis. New databanks are often developed with these technologies in mind, but many existing databases do not allow programmatic access. Only a fraction of available databanks can thus be queried through programmatic interfaces. SRS is a well-known indexing and search engine for biomedical databanks offering public access to many databanks and analysis tools. Unfortunately, these data are not easily and efficiently accessible through Web Services. We have developed 'SRS by WS' (SWS), a tool that makes the information available in SRS sites accessible through Web Services. Information on known sites is maintained in a database, srsdb. SWS consists of a suite of Web Services that can query both srsdb, for information on sites and databases, and the SRS sites themselves. SWS returns results in a text-only format and can be accessed through a WSDL-compliant client. SWS enables interoperability between workflow systems and SRS implementations, also managing access to alternative sites in order to cope with network and maintenance problems and selecting the most up-to-date among the available systems. The development and implementation of Web Services that allow programmatic access to an exhaustive set of biomedical databases can significantly improve the automation of in-silico analysis. SWS supports this activity by making the biological databanks managed in public SRS sites available through a programmatic interface.
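
    Because SWS is exposed through WSDL-compliant Web Services, it can be driven from any SOAP-capable client. The sketch below uses the Python zeep library; the WSDL URL, operation name, and arguments are placeholders rather than the actual SWS interface, which is described in the paper.

      # Minimal sketch of calling a WSDL-described service such as SWS from Python
      # using the zeep SOAP library. The WSDL URL and the operation name/arguments
      # are placeholders, not the actual SWS interface.
      from zeep import Client

      WSDL_URL = "http://example.org/sws/services?wsdl"   # placeholder endpoint

      client = Client(WSDL_URL)

      # Inspect the operations the service advertises before calling anything.
      client.wsdl.dump()

      # Hypothetical query: fetch entries from a databank indexed at an SRS site.
      result = client.service.query(databank="UNIPROT", query="keratin")
      print(result)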

  20. A user-friendly workflow for analysis of Illumina gene expression bead array data available at the arrayanalysis.org portal.

    PubMed

    Eijssen, Lars M T; Goelela, Varshna S; Kelder, Thomas; Adriaens, Michiel E; Evelo, Chris T; Radonjic, Marijana

    2015-06-30

    Illumina whole-genome expression bead arrays are a widely used platform for transcriptomics. Most of the tools available for the analysis of the resulting data are not easily usable by less experienced users. ArrayAnalysis.org provides researchers with an easy-to-use and comprehensive interface to the functionality of R and Bioconductor packages for microarray data analysis. As a modular open source project, it allows developers to contribute modules that provide support for additional types of data or extend workflows. To enable data analysis of Illumina bead arrays for a broad user community, we have developed a module for ArrayAnalysis.org that provides a free and user-friendly web interface for quality control and pre-processing of these arrays. This module can be used together with existing modules for statistical and pathway analysis to provide a full workflow for Illumina gene expression data analysis. The module accepts data exported from Illumina's GenomeStudio, and provides the user with quality control plots and normalized data. The outputs are directly linked to the existing statistics module of ArrayAnalysis.org, but can also be downloaded for further downstream analysis in third-party tools. The Illumina bead array analysis module is available at http://www.arrayanalysis.org . A user guide, a tutorial demonstrating the analysis of an example dataset, and R scripts are available. The module can be used as a starting point for the statistical evaluation and pathway analysis provided on the website or to generate processed input data for a broad range of applications in life sciences research.
