Booth, Richard G; Sinclair, Barbara; Strudwick, Gillian; Brennan, Laura; Morgan, Lisa; Collings, Stephanie; Johnston, Jessica; Loggie, Brittany; Tong, James; Singh, Chantal
The purpose of this quality improvement project was to better understand how to teach medication administration underpinned by an electronic medication administration record (eMAR) system used in simulated, prelicensure nursing education. Methods included a workflow and integration analysis and a detailed process mapping of both an oral and a sublingual medication administration. Procedural and curriculum development considerations related to medication administration using eMAR technology are presented for nurse educators.
Teaching Workflow Analysis and Lean Thinking via Simulation: A Formative Evaluation
Campbell, Robert James; Gantt, Laura; Congdon, Tamara
2009-01-01
This article presents the rationale for the design and development of a video simulation used to teach lean thinking and workflow analysis to health services and health information management students enrolled in a course on the management of health information. The discussion includes a description of the design process, a brief history of the use of simulation in healthcare, and an explanation of how video simulation can be used to generate experiential learning environments. Based on the results of a survey given to 75 students as part of a formative evaluation, the video simulation was judged effective because it allowed students to visualize a real-world process (concrete experience), contemplate the scenes depicted in the video along with the concepts presented in class in a risk-free environment (reflection), develop hypotheses about why problems occurred in the workflow process (abstract conceptualization), and develop solutions to redesign a selected process (active experimentation). PMID:19412533
ERIC Educational Resources Information Center
King, Melanie; Loddington, Steve; Manuel, Sue; Oppenheim, Charles
2008-01-01
The last couple of years have brought a rise in the number of institutional repositories throughout the world and within UK Higher Education institutions, with the majority of these repositories being devoted to research output. Repositories containing teaching and learning material are less common and the workflows and business processes…
Critical care physician cognitive task analysis: an exploratory study
Fackler, James C; Watts, Charles; Grome, Anna; Miller, Thomas; Crandall, Beth; Pronovost, Peter
2009-01-01
Introduction For better or worse, the imposition of work-hour limitations on house-staff has imperiled continuity and/or improved decision-making. Regardless, the workflow of every physician team in every academic medical centre has been irrevocably altered. We explored the use of cognitive task analysis (CTA) techniques, most commonly used in other high-stress and time-sensitive environments, to analyse key cognitive activities in critical care medicine. The study objective was to assess the usefulness of CTA as an analytical tool in order that physician cognitive tasks may be understood and redistributed within the work-hour limited medical decision-making teams. Methods After approval from each Institutional Review Board, two intensive care units (ICUs) within major university teaching hospitals served as data collection sites for CTA observations and interviews of critical care providers. Results Five broad categories of cognitive activities were identified: pattern recognition; uncertainty management; strategic vs. tactical thinking; team coordination and maintenance of common ground; and creation and transfer of meaning through stories. Conclusions CTA within the framework of Naturalistic Decision Making is a useful tool to understand the critical care process of decision-making and communication. The separation of strategic and tactical thinking has implications for workflow redesign. Given the global push for work-hour limitations, such workflow redesign is occurring. Further work with CTA techniques will provide important insights toward rational, rather than random, workflow changes. PMID:19265517
Critical care physician cognitive task analysis: an exploratory study.
Fackler, James C; Watts, Charles; Grome, Anna; Miller, Thomas; Crandall, Beth; Pronovost, Peter
2009-01-01
For better or worse, the imposition of work-hour limitations on house-staff has imperiled continuity and/or improved decision-making. Regardless, the workflow of every physician team in every academic medical centre has been irrevocably altered. We explored the use of cognitive task analysis (CTA) techniques, most commonly used in other high-stress and time-sensitive environments, to analyse key cognitive activities in critical care medicine. The study objective was to assess the usefulness of CTA as an analytical tool in order that physician cognitive tasks may be understood and redistributed within the work-hour limited medical decision-making teams. After approval from each Institutional Review Board, two intensive care units (ICUs) within major university teaching hospitals served as data collection sites for CTA observations and interviews of critical care providers. Five broad categories of cognitive activities were identified: pattern recognition; uncertainty management; strategic vs. tactical thinking; team coordination and maintenance of common ground; and creation and transfer of meaning through stories. CTA within the framework of Naturalistic Decision Making is a useful tool to understand the critical care process of decision-making and communication. The separation of strategic and tactical thinking has implications for workflow redesign. Given the global push for work-hour limitations, such workflow redesign is occurring. Further work with CTA techniques will provide important insights toward rational, rather than random, workflow changes.
Teach-Discover-Treat (TDT): Collaborative Computational Drug Discovery for Neglected Diseases
Jansen, Johanna M.; Cornell, Wendy; Tseng, Y. Jane; Amaro, Rommie E.
2012-01-01
Teach – Discover – Treat (TDT) is an initiative to promote the development and sharing of computational tools solicited through a competition with the aim to impact education and collaborative drug discovery for neglected diseases. Collaboration, multidisciplinary integration, and innovation are essential for successful drug discovery. This requires a workforce that is trained in state-of-the-art workflows and equipped with the ability to collaborate on platforms that are accessible and free. The TDT competition solicits high quality computational workflows for neglected disease targets, using freely available, open access tools. PMID:23085175
DOE Office of Scientific and Technical Information (OSTI.GOV)
Copps, Kevin D.
The Sandia Analysis Workbench (SAW) project has developed and deployed a production capability for SIERRA computational mechanics analysis workflows. However, the electrical analysis workflow capability requirements have only been demonstrated in early prototype states, with no real capability deployed for analysts’ use. This milestone aims to improve the electrical analysis workflow capability (via SAW and related tools) and deploy it for ongoing use. We propose to focus on a QASPR electrical analysis calibration workflow use case. We will include a number of new capabilities (versus today’s SAW), such as: 1) support for the XYCE code workflow component, 2) data managementmore » coupled to electrical workflow, 3) human-in-theloop workflow capability, and 4) electrical analysis workflow capability deployed on the restricted (and possibly classified) network at Sandia. While far from the complete set of capabilities required for electrical analysis workflow over the long term, this is a substantial first step toward full production support for the electrical analysts.« less
Automation of Educational Tasks for Academic Radiology.
Lamar, David L; Richardson, Michael L; Carlson, Blake
2016-07-01
The process of education involves a variety of repetitious tasks. We believe that appropriate computer tools can automate many of these chores, and allow both educators and their students to devote a lot more of their time to actual teaching and learning. This paper details tools that we have used to automate a broad range of academic radiology-specific tasks on Mac OS X, iOS, and Windows platforms. Some of the tools we describe here require little expertise or time to use; others require some basic knowledge of computer programming. We used TextExpander (Mac, iOS) and AutoHotKey (Win) for automated generation of text files, such as resident performance reviews and radiology interpretations. Custom statistical calculations were performed using TextExpander and the Python programming language. A workflow for automated note-taking was developed using Evernote (Mac, iOS, Win) and Hazel (Mac). Automated resident procedure logging was accomplished using Editorial (iOS) and Python. We created three variants of a teaching session logger using Drafts (iOS) and Pythonista (iOS). Editorial and Drafts were used to create flashcards for knowledge review. We developed a mobile reference management system for iOS using Editorial. We used the Workflow app (iOS) to automatically generate a text message reminder for daily conferences. Finally, we developed two separate automated workflows-one with Evernote (Mac, iOS, Win) and one with Python (Mac, Win)-that generate simple automated teaching file collections. We have beta-tested these workflows, techniques, and scripts on several of our fellow radiologists. All of them expressed enthusiasm for these tools and were able to use one or more of them to automate their own educational activities. Appropriate computer tools can automate many educational tasks, and thereby allow both educators and their students to devote a lot more of their time to actual teaching and learning. Copyright © 2016 The Association of University Radiologists. Published by Elsevier Inc. All rights reserved.
Space: the final frontier in the learning of science?
NASA Astrophysics Data System (ADS)
Milne, Catherine
2014-03-01
In Space, relations, and the learning of science, Wolff-Michael Roth and Pei-Ling Hsu use ethnomethodology to explore high school interns learning shopwork and shoptalk in a research lab that is located in a world class facility for water quality analysis. Using interaction analysis they identify how spaces, like a research laboratory, can be structured as smart spaces to create a workflow (learning flow) so that shoptalk and shopwork can projectively organize the actions of interns even in new and unfamiliar settings. Using these findings they explore implications for the design of curriculum and learning spaces more broadly. The Forum papers of Erica Blatt and Cassie Quigley complement this analysis. Blatt expands the discussion on space as an active component of learning with an examination of teaching settings, beyond laboratory spaces, as active participants of education. Quigley examines smart spaces as authentic learning spaces while acknowledging how internship experiences all empirical elements of authentic learning including open-ended inquiry and empowerment. In this paper I synthesize these ideas and propose that a narrative structure might better support workflow, student agency and democratic decision making.
Teaching Information Security with Workflow Technology--A Case Study Approach
ERIC Educational Resources Information Center
He, Wu; Kshirsagar, Ashish; Nwala, Alexander; Li, Yaohang
2014-01-01
In recent years, there has been a significant increase in the demand from professionals in different areas for improving the curricula regarding information security. The use of authentic case studies in teaching information security offers the potential to effectively engage students in active learning. In this paper, the authors introduce the…
Ameisen, David; Deroulers, Christophe; Perrier, Valérie; Bouhidel, Fatiha; Battistella, Maxime; Legrès, Luc; Janin, Anne; Bertheau, Philippe; Yunès, Jean-Baptiste
2014-01-01
Since microscopic slides can now be automatically digitized and integrated in the clinical workflow, quality assessment of Whole Slide Images (WSI) has become a crucial issue. We present a no-reference quality assessment method that has been thoroughly tested since 2010 and is under implementation in multiple sites, both public university-hospitals and private entities. It is part of the FlexMIm R&D project which aims to improve the global workflow of digital pathology. For these uses, we have developed two programming libraries, in Java and Python, which can be integrated in various types of WSI acquisition systems, viewers and image analysis tools. Development and testing have been carried out on a MacBook Pro i7 and on a bi-Xeon 2.7GHz server. Libraries implementing the blur assessment method have been developed in Java, Python, PHP5 and MySQL5. For web applications, JavaScript, Ajax, JSON and Sockets were also used, as well as the Google Maps API. Aperio SVS files were converted into the Google Maps format using VIPS and Openslide libraries. We designed the Java library as a Service Provider Interface (SPI), extendable by third parties. Analysis is computed in real-time (3 billion pixels per minute). Tests were made on 5000 single images, 200 NDPI WSI, 100 Aperio SVS WSI converted to the Google Maps format. Applications based on our method and libraries can be used upstream, as calibration and quality control tool for the WSI acquisition systems, or as tools to reacquire tiles while the WSI is being scanned. They can also be used downstream to reacquire the complete slides that are below the quality threshold for surgical pathology analysis. WSI may also be displayed in a smarter way by sending and displaying the regions of highest quality before other regions. Such quality assessment scores could be integrated as WSI's metadata shared in clinical, research or teaching contexts, for a more efficient medical informatics workflow.
Teaching with Stereoscopic Video: Opportunities and Challenges
NASA Astrophysics Data System (ADS)
Variano, Evan
2017-11-01
I will present my work on creating stereoscopic videos for fluid pedagogy. I discuss a variety of workflows for content creation and a variety of platforms for content delivery. I review the qualitative lessons learned when teaching with this material, and discuss outlook for the future. This work was partially supported by the NSF award ENG-1604026 and the UC Berkeley Student Technology Fund.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chin, George; Sivaramakrishnan, Chandrika; Critchlow, Terence J.
2011-07-04
A drawback of existing scientific workflow systems is the lack of support to domain scientists in designing and executing their own scientific workflows. Many domain scientists avoid developing and using workflows because the basic objects of workflows are too low-level and high-level tools and mechanisms to aid in workflow construction and use are largely unavailable. In our research, we are prototyping higher-level abstractions and tools to better support scientists in their workflow activities. Specifically, we are developing generic actors that provide abstract interfaces to specific functionality, workflow templates that encapsulate workflow and data patterns that can be reused and adaptedmore » by scientists, and context-awareness mechanisms to gather contextual information from the workflow environment on behalf of the scientist. To evaluate these scientist-centered abstractions on real problems, we apply them to construct and execute scientific workflows in the specific domain area of groundwater modeling and analysis.« less
Wu, Danny T Y; Smart, Nikolas; Ciemins, Elizabeth L; Lanham, Holly J; Lindberg, Curt; Zheng, Kai
2017-01-01
To develop a workflow-supported clinical documentation system, it is a critical first step to understand clinical workflow. While Time and Motion studies has been regarded as the gold standard of workflow analysis, this method can be resource consuming and its data may be biased due to the cognitive limitation of human observers. In this study, we aimed to evaluate the feasibility and validity of using EHR audit trail logs to analyze clinical workflow. Specifically, we compared three known workflow changes from our previous study with the corresponding EHR audit trail logs of the study participants. The results showed that EHR audit trail logs can be a valid source for clinical workflow analysis, and can provide an objective view of clinicians' behaviors, multi-dimensional comparisons, and a highly extensible analysis framework.
Bioinformatics workflows and web services in systems biology made easy for experimentalists.
Jimenez, Rafael C; Corpas, Manuel
2013-01-01
Workflows are useful to perform data analysis and integration in systems biology. Workflow management systems can help users create workflows without any previous knowledge in programming and web services. However the computational skills required to build such workflows are usually above the level most biological experimentalists are comfortable with. In this chapter we introduce workflow management systems that reuse existing workflows instead of creating them, making it easier for experimentalists to perform computational tasks.
NASA Astrophysics Data System (ADS)
Patra, A. K.; Valentine, G. A.; Bursik, M. I.; Connor, C.; Connor, L.; Jones, M.; Simakov, N.; Aghakhani, H.; Jones-Ivey, R.; Kosar, T.; Zhang, B.
2015-12-01
Over the last 5 years we have created a community collaboratory Vhub.org [Palma et al, J. App. Volc. 3:2 doi:10.1186/2191-5040-3-2] as a place to find volcanology-related resources, and a venue for users to disseminate tools, teaching resources, data, and an online platform to support collaborative efforts. As the community (current active users > 6000 from an estimated community of comparable size) embeds the tools in the collaboratory into educational and research workflows it became imperative to: a) redesign tools into robust, open source reusable software for online and offline usage/enhancement; b) share large datasets with remote collaborators and other users seamlessly with security; c) support complex workflows for uncertainty analysis, validation and verification and data assimilation with large data. The focus on tool development/redevelopment has been twofold - firstly to use best practices in software engineering and new hardware like multi-core and graphic processing units. Secondly we wish to enhance capabilities to support inverse modeling, uncertainty quantification using large ensembles and design of experiments, calibration, validation. Among software engineering practices we practice are open source facilitating community contributions, modularity and reusability. Our initial targets are four popular tools on Vhub - TITAN2D, TEPHRA2, PUFF and LAVA. Use of tools like these requires many observation driven data sets e.g. digital elevation models of topography, satellite imagery, field observations on deposits etc. These data are often maintained in private repositories that are privately shared by "sneaker-net". As a partial solution to this we tested mechanisms using irods software for online sharing of private data with public metadata and access limits. Finally, we adapted use of workflow engines (e.g. Pegasus) to support the complex data and computing workflows needed for usage like uncertainty quantification for hazard analysis using physical models.
Scientific Data Management (SDM) Center for Enabling Technologies. 2007-2012
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ludascher, Bertram; Altintas, Ilkay
Over the past five years, our activities have both established Kepler as a viable scientific workflow environment and demonstrated its value across multiple science applications. We have published numerous peer-reviewed papers on the technologies highlighted in this short paper and have given Kepler tutorials at SC06,SC07,SC08,and SciDAC 2007. Our outreach activities have allowed scientists to learn best practices and better utilize Kepler to address their individual workflow problems. Our contributions to advancing the state-of-the-art in scientific workflows have focused on the following areas. Progress in each of these areas is described in subsequent sections. Workflow development. The development of amore » deeper understanding of scientific workflows "in the wild" and of the requirements for support tools that allow easy construction of complex scientific workflows; Generic workflow components and templates. The development of generic actors (i.e.workflow components and processes) which can be broadly applied to scientific problems; Provenance collection and analysis. The design of a flexible provenance collection and analysis infrastructure within the workflow environment; and, Workflow reliability and fault tolerance. The improvement of the reliability and fault-tolerance of workflow environments.« less
Scientific Data Management (SDM) Center for Enabling Technologies. Final Report, 2007-2012
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ludascher, Bertram; Altintas, Ilkay
Our contributions to advancing the State of the Art in scientific workflows have focused on the following areas: Workflow development; Generic workflow components and templates; Provenance collection and analysis; and, Workflow reliability and fault tolerance.
Guitton, Yann; Tremblay-Franco, Marie; Le Corguillé, Gildas; Martin, Jean-François; Pétéra, Mélanie; Roger-Mele, Pierrick; Delabrière, Alexis; Goulitquer, Sophie; Monsoor, Misharl; Duperier, Christophe; Canlet, Cécile; Servien, Rémi; Tardivel, Patrick; Caron, Christophe; Giacomoni, Franck; Thévenot, Etienne A
2017-12-01
Metabolomics is a key approach in modern functional genomics and systems biology. Due to the complexity of metabolomics data, the variety of experimental designs, and the multiplicity of bioinformatics tools, providing experimenters with a simple and efficient resource to conduct comprehensive and rigorous analysis of their data is of utmost importance. In 2014, we launched the Workflow4Metabolomics (W4M; http://workflow4metabolomics.org) online infrastructure for metabolomics built on the Galaxy environment, which offers user-friendly features to build and run data analysis workflows including preprocessing, statistical analysis, and annotation steps. Here we present the new W4M 3.0 release, which contains twice as many tools as the first version, and provides two features which are, to our knowledge, unique among online resources. First, data from the four major metabolomics technologies (i.e., LC-MS, FIA-MS, GC-MS, and NMR) can be analyzed on a single platform. By using three studies in human physiology, alga evolution, and animal toxicology, we demonstrate how the 40 available tools can be easily combined to address biological issues. Second, the full analysis (including the workflow, the parameter values, the input data and output results) can be referenced with a permanent digital object identifier (DOI). Publication of data analyses is of major importance for robust and reproducible science. Furthermore, the publicly shared workflows are of high-value for e-learning and training. The Workflow4Metabolomics 3.0 e-infrastructure thus not only offers a unique online environment for analysis of data from the main metabolomics technologies, but it is also the first reference repository for metabolomics workflows. Copyright © 2017 Elsevier Ltd. All rights reserved.
xGDBvm: A Web GUI-Driven Workflow for Annotating Eukaryotic Genomes in the Cloud[OPEN
Merchant, Nirav
2016-01-01
Genome-wide annotation of gene structure requires the integration of numerous computational steps. Currently, annotation is arguably best accomplished through collaboration of bioinformatics and domain experts, with broad community involvement. However, such a collaborative approach is not scalable at today’s pace of sequence generation. To address this problem, we developed the xGDBvm software, which uses an intuitive graphical user interface to access a number of common genome analysis and gene structure tools, preconfigured in a self-contained virtual machine image. Once their virtual machine instance is deployed through iPlant’s Atmosphere cloud services, users access the xGDBvm workflow via a unified Web interface to manage inputs, set program parameters, configure links to high-performance computing (HPC) resources, view and manage output, apply analysis and editing tools, or access contextual help. The xGDBvm workflow will mask the genome, compute spliced alignments from transcript and/or protein inputs (locally or on a remote HPC cluster), predict gene structures and gene structure quality, and display output in a public or private genome browser complete with accessory tools. Problematic gene predictions are flagged and can be reannotated using the integrated yrGATE annotation tool. xGDBvm can also be configured to append or replace existing data or load precomputed data. Multiple genomes can be annotated and displayed, and outputs can be archived for sharing or backup. xGDBvm can be adapted to a variety of use cases including de novo genome annotation, reannotation, comparison of different annotations, and training or teaching. PMID:27020957
xGDBvm: A Web GUI-Driven Workflow for Annotating Eukaryotic Genomes in the Cloud.
Duvick, Jon; Standage, Daniel S; Merchant, Nirav; Brendel, Volker P
2016-04-01
Genome-wide annotation of gene structure requires the integration of numerous computational steps. Currently, annotation is arguably best accomplished through collaboration of bioinformatics and domain experts, with broad community involvement. However, such a collaborative approach is not scalable at today's pace of sequence generation. To address this problem, we developed the xGDBvm software, which uses an intuitive graphical user interface to access a number of common genome analysis and gene structure tools, preconfigured in a self-contained virtual machine image. Once their virtual machine instance is deployed through iPlant's Atmosphere cloud services, users access the xGDBvm workflow via a unified Web interface to manage inputs, set program parameters, configure links to high-performance computing (HPC) resources, view and manage output, apply analysis and editing tools, or access contextual help. The xGDBvm workflow will mask the genome, compute spliced alignments from transcript and/or protein inputs (locally or on a remote HPC cluster), predict gene structures and gene structure quality, and display output in a public or private genome browser complete with accessory tools. Problematic gene predictions are flagged and can be reannotated using the integrated yrGATE annotation tool. xGDBvm can also be configured to append or replace existing data or load precomputed data. Multiple genomes can be annotated and displayed, and outputs can be archived for sharing or backup. xGDBvm can be adapted to a variety of use cases including de novo genome annotation, reannotation, comparison of different annotations, and training or teaching. © 2016 American Society of Plant Biologists. All rights reserved.
Workflow based framework for life science informatics.
Tiwari, Abhishek; Sekhar, Arvind K T
2007-10-01
Workflow technology is a generic mechanism to integrate diverse types of available resources (databases, servers, software applications and different services) which facilitate knowledge exchange within traditionally divergent fields such as molecular biology, clinical research, computational science, physics, chemistry and statistics. Researchers can easily incorporate and access diverse, distributed tools and data to develop their own research protocols for scientific analysis. Application of workflow technology has been reported in areas like drug discovery, genomics, large-scale gene expression analysis, proteomics, and system biology. In this article, we have discussed the existing workflow systems and the trends in applications of workflow based systems.
2014-01-01
Background Since microscopic slides can now be automatically digitized and integrated in the clinical workflow, quality assessment of Whole Slide Images (WSI) has become a crucial issue. We present a no-reference quality assessment method that has been thoroughly tested since 2010 and is under implementation in multiple sites, both public university-hospitals and private entities. It is part of the FlexMIm R&D project which aims to improve the global workflow of digital pathology. For these uses, we have developed two programming libraries, in Java and Python, which can be integrated in various types of WSI acquisition systems, viewers and image analysis tools. Methods Development and testing have been carried out on a MacBook Pro i7 and on a bi-Xeon 2.7GHz server. Libraries implementing the blur assessment method have been developed in Java, Python, PHP5 and MySQL5. For web applications, JavaScript, Ajax, JSON and Sockets were also used, as well as the Google Maps API. Aperio SVS files were converted into the Google Maps format using VIPS and Openslide libraries. Results We designed the Java library as a Service Provider Interface (SPI), extendable by third parties. Analysis is computed in real-time (3 billion pixels per minute). Tests were made on 5000 single images, 200 NDPI WSI, 100 Aperio SVS WSI converted to the Google Maps format. Conclusions Applications based on our method and libraries can be used upstream, as calibration and quality control tool for the WSI acquisition systems, or as tools to reacquire tiles while the WSI is being scanned. They can also be used downstream to reacquire the complete slides that are below the quality threshold for surgical pathology analysis. WSI may also be displayed in a smarter way by sending and displaying the regions of highest quality before other regions. Such quality assessment scores could be integrated as WSI's metadata shared in clinical, research or teaching contexts, for a more efficient medical informatics workflow. PMID:25565494
Biowep: a workflow enactment portal for bioinformatics applications.
Romano, Paolo; Bartocci, Ezio; Bertolini, Guglielmo; De Paoli, Flavio; Marra, Domenico; Mauri, Giancarlo; Merelli, Emanuela; Milanesi, Luciano
2007-03-08
The huge amount of biological information, its distribution over the Internet and the heterogeneity of available software tools makes the adoption of new data integration and analysis network tools a necessity in bioinformatics. ICT standards and tools, like Web Services and Workflow Management Systems (WMS), can support the creation and deployment of such systems. Many Web Services are already available and some WMS have been proposed. They assume that researchers know which bioinformatics resources can be reached through a programmatic interface and that they are skilled in programming and building workflows. Therefore, they are not viable to the majority of unskilled researchers. A portal enabling these to take profit from new technologies is still missing. We designed biowep, a web based client application that allows for the selection and execution of a set of predefined workflows. The system is available on-line. Biowep architecture includes a Workflow Manager, a User Interface and a Workflow Executor. The task of the Workflow Manager is the creation and annotation of workflows. These can be created by using either the Taverna Workbench or BioWMS. Enactment of workflows is carried out by FreeFluo for Taverna workflows and by BioAgent/Hermes, a mobile agent-based middleware, for BioWMS ones. Main workflows' processing steps are annotated on the basis of their input and output, elaboration type and application domain by using a classification of bioinformatics data and tasks. The interface supports users authentication and profiling. Workflows can be selected on the basis of users' profiles and can be searched through their annotations. Results can be saved. We developed a web system that support the selection and execution of predefined workflows, thus simplifying access for all researchers. The implementation of Web Services allowing specialized software to interact with an exhaustive set of biomedical databases and analysis software and the creation of effective workflows can significantly improve automation of in-silico analysis. Biowep is available for interested researchers as a reference portal. They are invited to submit their workflows to the workflow repository. Biowep is further being developed in the sphere of the Laboratory of Interdisciplinary Technologies in Bioinformatics - LITBIO.
Biowep: a workflow enactment portal for bioinformatics applications
Romano, Paolo; Bartocci, Ezio; Bertolini, Guglielmo; De Paoli, Flavio; Marra, Domenico; Mauri, Giancarlo; Merelli, Emanuela; Milanesi, Luciano
2007-01-01
Background The huge amount of biological information, its distribution over the Internet and the heterogeneity of available software tools makes the adoption of new data integration and analysis network tools a necessity in bioinformatics. ICT standards and tools, like Web Services and Workflow Management Systems (WMS), can support the creation and deployment of such systems. Many Web Services are already available and some WMS have been proposed. They assume that researchers know which bioinformatics resources can be reached through a programmatic interface and that they are skilled in programming and building workflows. Therefore, they are not viable to the majority of unskilled researchers. A portal enabling these to take profit from new technologies is still missing. Results We designed biowep, a web based client application that allows for the selection and execution of a set of predefined workflows. The system is available on-line. Biowep architecture includes a Workflow Manager, a User Interface and a Workflow Executor. The task of the Workflow Manager is the creation and annotation of workflows. These can be created by using either the Taverna Workbench or BioWMS. Enactment of workflows is carried out by FreeFluo for Taverna workflows and by BioAgent/Hermes, a mobile agent-based middleware, for BioWMS ones. Main workflows' processing steps are annotated on the basis of their input and output, elaboration type and application domain by using a classification of bioinformatics data and tasks. The interface supports users authentication and profiling. Workflows can be selected on the basis of users' profiles and can be searched through their annotations. Results can be saved. Conclusion We developed a web system that support the selection and execution of predefined workflows, thus simplifying access for all researchers. The implementation of Web Services allowing specialized software to interact with an exhaustive set of biomedical databases and analysis software and the creation of effective workflows can significantly improve automation of in-silico analysis. Biowep is available for interested researchers as a reference portal. They are invited to submit their workflows to the workflow repository. Biowep is further being developed in the sphere of the Laboratory of Interdisciplinary Technologies in Bioinformatics – LITBIO. PMID:17430563
Tavaxy: integrating Taverna and Galaxy workflows with cloud computing support.
Abouelhoda, Mohamed; Issa, Shadi Alaa; Ghanem, Moustafa
2012-05-04
Over the past decade the workflow system paradigm has evolved as an efficient and user-friendly approach for developing complex bioinformatics applications. Two popular workflow systems that have gained acceptance by the bioinformatics community are Taverna and Galaxy. Each system has a large user-base and supports an ever-growing repository of application workflows. However, workflows developed for one system cannot be imported and executed easily on the other. The lack of interoperability is due to differences in the models of computation, workflow languages, and architectures of both systems. This lack of interoperability limits sharing of workflows between the user communities and leads to duplication of development efforts. In this paper, we present Tavaxy, a stand-alone system for creating and executing workflows based on using an extensible set of re-usable workflow patterns. Tavaxy offers a set of new features that simplify and enhance the development of sequence analysis applications: It allows the integration of existing Taverna and Galaxy workflows in a single environment, and supports the use of cloud computing capabilities. The integration of existing Taverna and Galaxy workflows is supported seamlessly at both run-time and design-time levels, based on the concepts of hierarchical workflows and workflow patterns. The use of cloud computing in Tavaxy is flexible, where the users can either instantiate the whole system on the cloud, or delegate the execution of certain sub-workflows to the cloud infrastructure. Tavaxy reduces the workflow development cycle by introducing the use of workflow patterns to simplify workflow creation. It enables the re-use and integration of existing (sub-) workflows from Taverna and Galaxy, and allows the creation of hybrid workflows. Its additional features exploit recent advances in high performance cloud computing to cope with the increasing data size and complexity of analysis.The system can be accessed either through a cloud-enabled web-interface or downloaded and installed to run within the user's local environment. All resources related to Tavaxy are available at http://www.tavaxy.org.
Talkoot Portals: Discover, Tag, Share, and Reuse Collaborative Science Workflows
NASA Astrophysics Data System (ADS)
Wilson, B. D.; Ramachandran, R.; Lynnes, C.
2009-05-01
A small but growing number of scientists are beginning to harness Web 2.0 technologies, such as wikis, blogs, and social tagging, as a transformative way of doing science. These technologies provide researchers easy mechanisms to critique, suggest and share ideas, data and algorithms. At the same time, large suites of algorithms for science analysis are being made available as remotely-invokable Web Services, which can be chained together to create analysis workflows. This provides the research community an unprecedented opportunity to collaborate by sharing their workflows with one another, reproducing and analyzing research results, and leveraging colleagues' expertise to expedite the process of scientific discovery. However, wikis and similar technologies are limited to text, static images and hyperlinks, providing little support for collaborative data analysis. A team of information technology and Earth science researchers from multiple institutions have come together to improve community collaboration in science analysis by developing a customizable "software appliance" to build collaborative portals for Earth Science services and analysis workflows. The critical requirement is that researchers (not just information technologists) be able to build collaborative sites around service workflows within a few hours. We envision online communities coming together, much like Finnish "talkoot" (a barn raising), to build a shared research space. Talkoot extends a freely available, open source content management framework with a series of modules specific to Earth Science for registering, creating, managing, discovering, tagging and sharing Earth Science web services and workflows for science data processing, analysis and visualization. Users will be able to author a "science story" in shareable web notebooks, including plots or animations, backed up by an executable workflow that directly reproduces the science analysis. New services and workflows of interest will be discoverable using tag search, and advertised using "service casts" and "interest casts" (Atom feeds). Multiple science workflow systems will be plugged into the system, with initial support for UAH's Mining Workflow Composer and the open-source Active BPEL engine, and JPL's SciFlo engine and the VizFlow visual programming interface. With the ability to share and execute analysis workflows, Talkoot portals can be used to do collaborative science in addition to communicate ideas and results. It will be useful for different science domains, mission teams, research projects and organizations. Thus, it will help to solve the "sociological" problem of bringing together disparate groups of researchers, and the technical problem of advertising, discovering, developing, documenting, and maintaining inter-agency science workflows. The presentation will discuss the goals of and barriers to Science 2.0, the social web technologies employed in the Talkoot software appliance (e.g. CMS, social tagging, personal presence, advertising by feeds, etc.), illustrate the resulting collaborative capabilities, and show early prototypes of the web interfaces (e.g. embedded workflows).
Workflows in bioinformatics: meta-analysis and prototype implementation of a workflow generator.
Garcia Castro, Alexander; Thoraval, Samuel; Garcia, Leyla J; Ragan, Mark A
2005-04-07
Computational methods for problem solving need to interleave information access and algorithm execution in a problem-specific workflow. The structures of these workflows are defined by a scaffold of syntactic, semantic and algebraic objects capable of representing them. Despite the proliferation of GUIs (Graphic User Interfaces) in bioinformatics, only some of them provide workflow capabilities; surprisingly, no meta-analysis of workflow operators and components in bioinformatics has been reported. We present a set of syntactic components and algebraic operators capable of representing analytical workflows in bioinformatics. Iteration, recursion, the use of conditional statements, and management of suspend/resume tasks have traditionally been implemented on an ad hoc basis and hard-coded; by having these operators properly defined it is possible to use and parameterize them as generic re-usable components. To illustrate how these operations can be orchestrated, we present GPIPE, a prototype graphic pipeline generator for PISE that allows the definition of a pipeline, parameterization of its component methods, and storage of metadata in XML formats. This implementation goes beyond the macro capacities currently in PISE. As the entire analysis protocol is defined in XML, a complete bioinformatic experiment (linked sets of methods, parameters and results) can be reproduced or shared among users. http://if-web1.imb.uq.edu.au/Pise/5.a/gpipe.html (interactive), ftp://ftp.pasteur.fr/pub/GenSoft/unix/misc/Pise/ (download). From our meta-analysis we have identified syntactic structures and algebraic operators common to many workflows in bioinformatics. The workflow components and algebraic operators can be assimilated into re-usable software components. GPIPE, a prototype implementation of this framework, provides a GUI builder to facilitate the generation of workflows and integration of heterogeneous analytical tools.
Fang, Daniel Z; Patil, Teja; Belitskaya-Levy, Ilana; Yeung, Marianne; Posley, Keith; Allaudeen, Nazima
2017-11-17
Efficient and effective communication between providers is critical to quality patient care within a hospital system. Hands free communication devices (HFCD) allow instantaneous, closed-loop communication between physicians and other members of a multidisciplinary team, providing a communication advantage over traditional pager systems. HFCD have been shown to decrease emergency room interruptions, improve nursing communication, improve speed of information flow, and eliminate health care waste. We evaluated the integration of an HFCD with an existing alphanumeric paging system on an acute inpatient medicine service. We conducted a prospective, observational, survey-based study over twenty-four weeks in an academic tertiary care center with attending physicians and residents. Our intervention involved the implementation of an HFCD alongside the existing paging system. Fifty-six pre and post surveys evaluated the perception of improvement in communication and the integration of the HFCD into existing workflow. We saw significant improvements in the ability of an HFCD to help physicians communicate thoughts clearly, communicate thoughts effectively, reach team members, reach ancillary staff, and stay informed about patients. Physicians also reported better workflow integration during admissions, rounds, discharge, and teaching sessions. Qualitative data from post surveys demonstrated that the greatest strengths of the HFCD included the ability to reach colleagues and staff quickly, provide instant access to individuals of the care team, and improve overall communication. Integration of an instantaneous, hands free, closed loop communication system alongside the existing pager system can provide improvements in the perceptions of communication and workflow integration in an academic medicine service. Future studies are needed to correlate these subjective findings with objective measures of quality and safety.
Modelling and analysis of workflow for lean supply chains
NASA Astrophysics Data System (ADS)
Ma, Jinping; Wang, Kanliang; Xu, Lida
2011-11-01
Cross-organisational workflow systems are a component of enterprise information systems which support collaborative business process among organisations in supply chain. Currently, the majority of workflow systems is developed in perspectives of information modelling without considering actual requirements of supply chain management. In this article, we focus on the modelling and analysis of the cross-organisational workflow systems in the context of lean supply chain (LSC) using Petri nets. First, the article describes the assumed conditions of cross-organisation workflow net according to the idea of LSC and then discusses the standardisation of collaborating business process between organisations in the context of LSC. Second, the concept of labelled time Petri nets (LTPNs) is defined through combining labelled Petri nets with time Petri nets, and the concept of labelled time workflow nets (LTWNs) is also defined based on LTPNs. Cross-organisational labelled time workflow nets (CLTWNs) is then defined based on LTWNs. Third, the article proposes the notion of OR-silent CLTWNS and a verifying approach to the soundness of LTWNs and CLTWNs. Finally, this article illustrates how to use the proposed method by a simple example. The purpose of this research is to establish a formal method of modelling and analysis of workflow systems for LSC. This study initiates a new perspective of research on cross-organisational workflow management and promotes operation management of LSC in real world settings.
Workflows for microarray data processing in the Kepler environment.
Stropp, Thomas; McPhillips, Timothy; Ludäscher, Bertram; Bieda, Mark
2012-05-17
Microarray data analysis has been the subject of extensive and ongoing pipeline development due to its complexity, the availability of several options at each analysis step, and the development of new analysis demands, including integration with new data sources. Bioinformatics pipelines are usually custom built for different applications, making them typically difficult to modify, extend and repurpose. Scientific workflow systems are intended to address these issues by providing general-purpose frameworks in which to develop and execute such pipelines. The Kepler workflow environment is a well-established system under continual development that is employed in several areas of scientific research. Kepler provides a flexible graphical interface, featuring clear display of parameter values, for design and modification of workflows. It has capabilities for developing novel computational components in the R, Python, and Java programming languages, all of which are widely used for bioinformatics algorithm development, along with capabilities for invoking external applications and using web services. We developed a series of fully functional bioinformatics pipelines addressing common tasks in microarray processing in the Kepler workflow environment. These pipelines consist of a set of tools for GFF file processing of NimbleGen chromatin immunoprecipitation on microarray (ChIP-chip) datasets and more comprehensive workflows for Affymetrix gene expression microarray bioinformatics and basic primer design for PCR experiments, which are often used to validate microarray results. Although functional in themselves, these workflows can be easily customized, extended, or repurposed to match the needs of specific projects and are designed to be a toolkit and starting point for specific applications. These workflows illustrate a workflow programming paradigm focusing on local resources (programs and data) and therefore are close to traditional shell scripting or R/BioConductor scripting approaches to pipeline design. Finally, we suggest that microarray data processing task workflows may provide a basis for future example-based comparison of different workflow systems. We provide a set of tools and complete workflows for microarray data analysis in the Kepler environment, which has the advantages of offering graphical, clear display of conceptual steps and parameters and the ability to easily integrate other resources such as remote data and web services.
Talkoot Portals: Discover, Tag, Share, and Reuse Collaborative Science Workflows (Invited)
NASA Astrophysics Data System (ADS)
Wilson, B. D.; Ramachandran, R.; Lynnes, C.
2009-12-01
A small but growing number of scientists are beginning to harness Web 2.0 technologies, such as wikis, blogs, and social tagging, as a transformative way of doing science. These technologies provide researchers easy mechanisms to critique, suggest and share ideas, data and algorithms. At the same time, large suites of algorithms for science analysis are being made available as remotely-invokable Web Services, which can be chained together to create analysis workflows. This provides the research community an unprecedented opportunity to collaborate by sharing their workflows with one another, reproducing and analyzing research results, and leveraging colleagues’ expertise to expedite the process of scientific discovery. However, wikis and similar technologies are limited to text, static images and hyperlinks, providing little support for collaborative data analysis. A team of information technology and Earth science researchers from multiple institutions have come together to improve community collaboration in science analysis by developing a customizable “software appliance” to build collaborative portals for Earth Science services and analysis workflows. The critical requirement is that researchers (not just information technologists) be able to build collaborative sites around service workflows within a few hours. We envision online communities coming together, much like Finnish “talkoot” (a barn raising), to build a shared research space. Talkoot extends a freely available, open source content management framework with a series of modules specific to Earth Science for registering, creating, managing, discovering, tagging and sharing Earth Science web services and workflows for science data processing, analysis and visualization. Users will be able to author a “science story” in shareable web notebooks, including plots or animations, backed up by an executable workflow that directly reproduces the science analysis. New services and workflows of interest will be discoverable using tag search, and advertised using “service casts” and “interest casts” (Atom feeds). Multiple science workflow systems will be plugged into the system, with initial support for UAH’s Mining Workflow Composer and the open-source Active BPEL engine, and JPL’s SciFlo engine and the VizFlow visual programming interface. With the ability to share and execute analysis workflows, Talkoot portals can be used to do collaborative science in addition to communicate ideas and results. It will be useful for different science domains, mission teams, research projects and organizations. Thus, it will help to solve the “sociological” problem of bringing together disparate groups of researchers, and the technical problem of advertising, discovering, developing, documenting, and maintaining inter-agency science workflows. The presentation will discuss the goals of and barriers to Science 2.0, the social web technologies employed in the Talkoot software appliance (e.g. CMS, social tagging, personal presence, advertising by feeds, etc.), illustrate the resulting collaborative capabilities, and show early prototypes of the web interfaces (e.g. embedded workflows).
PGen: large-scale genomic variations analysis workflow and browser in SoyKB.
Liu, Yang; Khan, Saad M; Wang, Juexin; Rynge, Mats; Zhang, Yuanxun; Zeng, Shuai; Chen, Shiyuan; Maldonado Dos Santos, Joao V; Valliyodan, Babu; Calyam, Prasad P; Merchant, Nirav; Nguyen, Henry T; Xu, Dong; Joshi, Trupti
2016-10-06
With the advances in next-generation sequencing (NGS) technology and significant reductions in sequencing costs, it is now possible to sequence large collections of germplasm in crops for detecting genome-scale genetic variations and to apply the knowledge towards improvements in traits. To efficiently facilitate large-scale NGS resequencing data analysis of genomic variations, we have developed "PGen", an integrated and optimized workflow using the Extreme Science and Engineering Discovery Environment (XSEDE) high-performance computing (HPC) virtual system, iPlant cloud data storage resources and Pegasus workflow management system (Pegasus-WMS). The workflow allows users to identify single nucleotide polymorphisms (SNPs) and insertion-deletions (indels), perform SNP annotations and conduct copy number variation analyses on multiple resequencing datasets in a user-friendly and seamless way. We have developed both a Linux version in GitHub ( https://github.com/pegasus-isi/PGen-GenomicVariations-Workflow ) and a web-based implementation of the PGen workflow integrated within the Soybean Knowledge Base (SoyKB), ( http://soykb.org/Pegasus/index.php ). Using PGen, we identified 10,218,140 single-nucleotide polymorphisms (SNPs) and 1,398,982 indels from analysis of 106 soybean lines sequenced at 15X coverage. 297,245 non-synonymous SNPs and 3330 copy number variation (CNV) regions were identified from this analysis. SNPs identified using PGen from additional soybean resequencing projects adding to 500+ soybean germplasm lines in total have been integrated. These SNPs are being utilized for trait improvement using genotype to phenotype prediction approaches developed in-house. In order to browse and access NGS data easily, we have also developed an NGS resequencing data browser ( http://soykb.org/NGS_Resequence/NGS_index.php ) within SoyKB to provide easy access to SNP and downstream analysis results for soybean researchers. PGen workflow has been optimized for the most efficient analysis of soybean data using thorough testing and validation. This research serves as an example of best practices for development of genomics data analysis workflows by integrating remote HPC resources and efficient data management with ease of use for biological users. PGen workflow can also be easily customized for analysis of data in other species.
Tavaxy: Integrating Taverna and Galaxy workflows with cloud computing support
2012-01-01
Background Over the past decade the workflow system paradigm has evolved as an efficient and user-friendly approach for developing complex bioinformatics applications. Two popular workflow systems that have gained acceptance by the bioinformatics community are Taverna and Galaxy. Each system has a large user-base and supports an ever-growing repository of application workflows. However, workflows developed for one system cannot be imported and executed easily on the other. The lack of interoperability is due to differences in the models of computation, workflow languages, and architectures of both systems. This lack of interoperability limits sharing of workflows between the user communities and leads to duplication of development efforts. Results In this paper, we present Tavaxy, a stand-alone system for creating and executing workflows based on using an extensible set of re-usable workflow patterns. Tavaxy offers a set of new features that simplify and enhance the development of sequence analysis applications: It allows the integration of existing Taverna and Galaxy workflows in a single environment, and supports the use of cloud computing capabilities. The integration of existing Taverna and Galaxy workflows is supported seamlessly at both run-time and design-time levels, based on the concepts of hierarchical workflows and workflow patterns. The use of cloud computing in Tavaxy is flexible, where the users can either instantiate the whole system on the cloud, or delegate the execution of certain sub-workflows to the cloud infrastructure. Conclusions Tavaxy reduces the workflow development cycle by introducing the use of workflow patterns to simplify workflow creation. It enables the re-use and integration of existing (sub-) workflows from Taverna and Galaxy, and allows the creation of hybrid workflows. Its additional features exploit recent advances in high performance cloud computing to cope with the increasing data size and complexity of analysis. The system can be accessed either through a cloud-enabled web-interface or downloaded and installed to run within the user's local environment. All resources related to Tavaxy are available at http://www.tavaxy.org. PMID:22559942
Using Kepler for Tool Integration in Microarray Analysis Workflows.
Gan, Zhuohui; Stowe, Jennifer C; Altintas, Ilkay; McCulloch, Andrew D; Zambon, Alexander C
Increasing numbers of genomic technologies are leading to massive amounts of genomic data, all of which requires complex analysis. More and more bioinformatics analysis tools are being developed by scientist to simplify these analyses. However, different pipelines have been developed using different software environments. This makes integrations of these diverse bioinformatics tools difficult. Kepler provides an open source environment to integrate these disparate packages. Using Kepler, we integrated several external tools including Bioconductor packages, AltAnalyze, a python-based open source tool, and R-based comparison tool to build an automated workflow to meta-analyze both online and local microarray data. The automated workflow connects the integrated tools seamlessly, delivers data flow between the tools smoothly, and hence improves efficiency and accuracy of complex data analyses. Our workflow exemplifies the usage of Kepler as a scientific workflow platform for bioinformatics pipelines.
Damkliang, Kasikrit; Tandayya, Pichaya; Sangket, Unitsa; Pasomsub, Ekawat
2016-11-28
At the present, coding sequence (CDS) has been discovered and larger CDS is being revealed frequently. Approaches and related tools have also been developed and upgraded concurrently, especially for phylogenetic tree analysis. This paper proposes an integrated automatic Taverna workflow for the phylogenetic tree inferring analysis using public access web services at European Bioinformatics Institute (EMBL-EBI) and Swiss Institute of Bioinformatics (SIB), and our own deployed local web services. The workflow input is a set of CDS in the Fasta format. The workflow supports 1,000 to 20,000 numbers in bootstrapping replication. The workflow performs the tree inferring such as Parsimony (PARS), Distance Matrix - Neighbor Joining (DIST-NJ), and Maximum Likelihood (ML) algorithms of EMBOSS PHYLIPNEW package based on our proposed Multiple Sequence Alignment (MSA) similarity score. The local web services are implemented and deployed into two types using the Soaplab2 and Apache Axis2 deployment. There are SOAP and Java Web Service (JWS) providing WSDL endpoints to Taverna Workbench, a workflow manager. The workflow has been validated, the performance has been measured, and its results have been verified. Our workflow's execution time is less than ten minutes for inferring a tree with 10,000 replicates of the bootstrapping numbers. This paper proposes a new integrated automatic workflow which will be beneficial to the bioinformaticians with an intermediate level of knowledge and experiences. All local services have been deployed at our portal http://bioservices.sci.psu.ac.th.
Damkliang, Kasikrit; Tandayya, Pichaya; Sangket, Unitsa; Pasomsub, Ekawat
2016-03-01
At the present, coding sequence (CDS) has been discovered and larger CDS is being revealed frequently. Approaches and related tools have also been developed and upgraded concurrently, especially for phylogenetic tree analysis. This paper proposes an integrated automatic Taverna workflow for the phylogenetic tree inferring analysis using public access web services at European Bioinformatics Institute (EMBL-EBI) and Swiss Institute of Bioinformatics (SIB), and our own deployed local web services. The workflow input is a set of CDS in the Fasta format. The workflow supports 1,000 to 20,000 numbers in bootstrapping replication. The workflow performs the tree inferring such as Parsimony (PARS), Distance Matrix - Neighbor Joining (DIST-NJ), and Maximum Likelihood (ML) algorithms of EMBOSS PHYLIPNEW package based on our proposed Multiple Sequence Alignment (MSA) similarity score. The local web services are implemented and deployed into two types using the Soaplab2 and Apache Axis2 deployment. There are SOAP and Java Web Service (JWS) providing WSDL endpoints to Taverna Workbench, a workflow manager. The workflow has been validated, the performance has been measured, and its results have been verified. Our workflow's execution time is less than ten minutes for inferring a tree with 10,000 replicates of the bootstrapping numbers. This paper proposes a new integrated automatic workflow which will be beneficial to the bioinformaticians with an intermediate level of knowledge and experiences. The all local services have been deployed at our portal http://bioservices.sci.psu.ac.th.
AtomPy: an open atomic-data curation environment
NASA Astrophysics Data System (ADS)
Bautista, Manuel; Mendoza, Claudio; Boswell, Josiah S; Ajoku, Chukwuemeka
2014-06-01
We present a cloud-computing environment for atomic data curation, networking among atomic data providers and users, teaching-and-learning, and interfacing with spectral modeling software. The system is based on Google-Drive Sheets, Pandas (Python Data Analysis Library) DataFrames, and IPython Notebooks for open community-driven curation of atomic data for scientific and technological applications. The atomic model for each ionic species is contained in a multi-sheet Google-Drive workbook, where the atomic parameters from all known public sources are progressively stored. Metadata (provenance, community discussion, etc.) accompanying every entry in the database are stored through Notebooks. Education tools on the physics of atomic processes as well as their relevance to plasma and spectral modeling are based on IPython Notebooks that integrate written material, images, videos, and active computer-tool workflows. Data processing workflows and collaborative software developments are encouraged and managed through the GitHub social network. Relevant issues this platform intends to address are: (i) data quality by allowing open access to both data producers and users in order to attain completeness, accuracy, consistency, provenance and currentness; (ii) comparisons of different datasets to facilitate accuracy assessment; (iii) downloading to local data structures (i.e. Pandas DataFrames) for further manipulation and analysis by prospective users; and (iv) data preservation by avoiding the discard of outdated sets.
Standardizing clinical trials workflow representation in UML for international site comparison.
de Carvalho, Elias Cesar Araujo; Jayanti, Madhav Kishore; Batilana, Adelia Portero; Kozan, Andreia M O; Rodrigues, Maria J; Shah, Jatin; Loures, Marco R; Patil, Sunita; Payne, Philip; Pietrobon, Ricardo
2010-11-09
With the globalization of clinical trials, a growing emphasis has been placed on the standardization of the workflow in order to ensure the reproducibility and reliability of the overall trial. Despite the importance of workflow evaluation, to our knowledge no previous studies have attempted to adapt existing modeling languages to standardize the representation of clinical trials. Unified Modeling Language (UML) is a computational language that can be used to model operational workflow, and a UML profile can be developed to standardize UML models within a given domain. This paper's objective is to develop a UML profile to extend the UML Activity Diagram schema into the clinical trials domain, defining a standard representation for clinical trial workflow diagrams in UML. Two Brazilian clinical trial sites in rheumatology and oncology were examined to model their workflow and collect time-motion data. UML modeling was conducted in Eclipse, and a UML profile was developed to incorporate information used in discrete event simulation software. Ethnographic observation revealed bottlenecks in workflow: these included tasks requiring full commitment of CRCs, transferring notes from paper to computers, deviations from standard operating procedures, and conflicts between different IT systems. Time-motion analysis revealed that nurses' activities took up the most time in the workflow and contained a high frequency of shorter duration activities. Administrative assistants performed more activities near the beginning and end of the workflow. Overall, clinical trial tasks had a greater frequency than clinic routines or other general activities. This paper describes a method for modeling clinical trial workflow in UML and standardizing these workflow diagrams through a UML profile. In the increasingly global environment of clinical trials, the standardization of workflow modeling is a necessary precursor to conducting a comparative analysis of international clinical trials workflows.
Standardizing Clinical Trials Workflow Representation in UML for International Site Comparison
de Carvalho, Elias Cesar Araujo; Jayanti, Madhav Kishore; Batilana, Adelia Portero; Kozan, Andreia M. O.; Rodrigues, Maria J.; Shah, Jatin; Loures, Marco R.; Patil, Sunita; Payne, Philip; Pietrobon, Ricardo
2010-01-01
Background With the globalization of clinical trials, a growing emphasis has been placed on the standardization of the workflow in order to ensure the reproducibility and reliability of the overall trial. Despite the importance of workflow evaluation, to our knowledge no previous studies have attempted to adapt existing modeling languages to standardize the representation of clinical trials. Unified Modeling Language (UML) is a computational language that can be used to model operational workflow, and a UML profile can be developed to standardize UML models within a given domain. This paper's objective is to develop a UML profile to extend the UML Activity Diagram schema into the clinical trials domain, defining a standard representation for clinical trial workflow diagrams in UML. Methods Two Brazilian clinical trial sites in rheumatology and oncology were examined to model their workflow and collect time-motion data. UML modeling was conducted in Eclipse, and a UML profile was developed to incorporate information used in discrete event simulation software. Results Ethnographic observation revealed bottlenecks in workflow: these included tasks requiring full commitment of CRCs, transferring notes from paper to computers, deviations from standard operating procedures, and conflicts between different IT systems. Time-motion analysis revealed that nurses' activities took up the most time in the workflow and contained a high frequency of shorter duration activities. Administrative assistants performed more activities near the beginning and end of the workflow. Overall, clinical trial tasks had a greater frequency than clinic routines or other general activities. Conclusions This paper describes a method for modeling clinical trial workflow in UML and standardizing these workflow diagrams through a UML profile. In the increasingly global environment of clinical trials, the standardization of workflow modeling is a necessary precursor to conducting a comparative analysis of international clinical trials workflows. PMID:21085484
NASA Astrophysics Data System (ADS)
Peer, Regina; Peer, Siegfried; Sander, Heike; Marsolek, Ingo; Koller, Wolfgang; Pappert, Dirk; Hierholzer, Johannes
2002-05-01
If new technology is introduced into medical practice it must prove to make a difference. However traditional approaches of outcome analysis failed to show a direct benefit of PACS on patient care and economical benefits are still in debate. A participatory process analysis was performed to compare workflow in a film based hospital and a PACS environment. This included direct observation of work processes, interview of involved staff, structural analysis and discussion of observations with staff members. After definition of common structures strong and weak workflow steps were evaluated. With a common workflow structure in both hospitals, benefits of PACS were revealed in workflow steps related to image reporting with simultaneous image access for ICU-physicians and radiologists, archiving of images as well as image and report distribution. However PACS alone is not able to cover the complete process of 'radiography for intensive care' from ordering of an image till provision of the final product equals image + report. Interference of electronic workflow with analogue process steps such as paper based ordering reduces the potential benefits of PACS. In this regard workflow modeling proved to be very helpful for the evaluation of complex work processes linking radiology and the ICU.
Exploring Two Approaches for an End-to-End Scientific Analysis Workflow
NASA Astrophysics Data System (ADS)
Dodelson, Scott; Kent, Steve; Kowalkowski, Jim; Paterno, Marc; Sehrish, Saba
2015-12-01
The scientific discovery process can be advanced by the integration of independently-developed programs run on disparate computing facilities into coherent workflows usable by scientists who are not experts in computing. For such advancement, we need a system which scientists can use to formulate analysis workflows, to integrate new components to these workflows, and to execute different components on resources that are best suited to run those components. In addition, we need to monitor the status of the workflow as components get scheduled and executed, and to access the intermediate and final output for visual exploration and analysis. Finally, it is important for scientists to be able to share their workflows with collaborators. We have explored two approaches for such an analysis framework for the Large Synoptic Survey Telescope (LSST) Dark Energy Science Collaboration (DESC); the first one is based on the use and extension of Galaxy, a web-based portal for biomedical research, and the second one is based on a programming language, Python. In this paper, we present a brief description of the two approaches, describe the kinds of extensions to the Galaxy system we have found necessary in order to support the wide variety of scientific analysis in the cosmology community, and discuss how similar efforts might be of benefit to the HEP community.
Agile parallel bioinformatics workflow management using Pwrake.
Mishima, Hiroyuki; Sasaki, Kensaku; Tanaka, Masahiro; Tatebe, Osamu; Yoshiura, Koh-Ichiro
2011-09-08
In bioinformatics projects, scientific workflow systems are widely used to manage computational procedures. Full-featured workflow systems have been proposed to fulfil the demand for workflow management. However, such systems tend to be over-weighted for actual bioinformatics practices. We realize that quick deployment of cutting-edge software implementing advanced algorithms and data formats, and continuous adaptation to changes in computational resources and the environment are often prioritized in scientific workflow management. These features have a greater affinity with the agile software development method through iterative development phases after trial and error.Here, we show the application of a scientific workflow system Pwrake to bioinformatics workflows. Pwrake is a parallel workflow extension of Ruby's standard build tool Rake, the flexibility of which has been demonstrated in the astronomy domain. Therefore, we hypothesize that Pwrake also has advantages in actual bioinformatics workflows. We implemented the Pwrake workflows to process next generation sequencing data using the Genomic Analysis Toolkit (GATK) and Dindel. GATK and Dindel workflows are typical examples of sequential and parallel workflows, respectively. We found that in practice, actual scientific workflow development iterates over two phases, the workflow definition phase and the parameter adjustment phase. We introduced separate workflow definitions to help focus on each of the two developmental phases, as well as helper methods to simplify the descriptions. This approach increased iterative development efficiency. Moreover, we implemented combined workflows to demonstrate modularity of the GATK and Dindel workflows. Pwrake enables agile management of scientific workflows in the bioinformatics domain. The internal domain specific language design built on Ruby gives the flexibility of rakefiles for writing scientific workflows. Furthermore, readability and maintainability of rakefiles may facilitate sharing workflows among the scientific community. Workflows for GATK and Dindel are available at http://github.com/misshie/Workflows.
Agile parallel bioinformatics workflow management using Pwrake
2011-01-01
Background In bioinformatics projects, scientific workflow systems are widely used to manage computational procedures. Full-featured workflow systems have been proposed to fulfil the demand for workflow management. However, such systems tend to be over-weighted for actual bioinformatics practices. We realize that quick deployment of cutting-edge software implementing advanced algorithms and data formats, and continuous adaptation to changes in computational resources and the environment are often prioritized in scientific workflow management. These features have a greater affinity with the agile software development method through iterative development phases after trial and error. Here, we show the application of a scientific workflow system Pwrake to bioinformatics workflows. Pwrake is a parallel workflow extension of Ruby's standard build tool Rake, the flexibility of which has been demonstrated in the astronomy domain. Therefore, we hypothesize that Pwrake also has advantages in actual bioinformatics workflows. Findings We implemented the Pwrake workflows to process next generation sequencing data using the Genomic Analysis Toolkit (GATK) and Dindel. GATK and Dindel workflows are typical examples of sequential and parallel workflows, respectively. We found that in practice, actual scientific workflow development iterates over two phases, the workflow definition phase and the parameter adjustment phase. We introduced separate workflow definitions to help focus on each of the two developmental phases, as well as helper methods to simplify the descriptions. This approach increased iterative development efficiency. Moreover, we implemented combined workflows to demonstrate modularity of the GATK and Dindel workflows. Conclusions Pwrake enables agile management of scientific workflows in the bioinformatics domain. The internal domain specific language design built on Ruby gives the flexibility of rakefiles for writing scientific workflows. Furthermore, readability and maintainability of rakefiles may facilitate sharing workflows among the scientific community. Workflows for GATK and Dindel are available at http://github.com/misshie/Workflows. PMID:21899774
Conceptual-level workflow modeling of scientific experiments using NMR as a case study
Verdi, Kacy K; Ellis, Heidi JC; Gryk, Michael R
2007-01-01
Background Scientific workflows improve the process of scientific experiments by making computations explicit, underscoring data flow, and emphasizing the participation of humans in the process when intuition and human reasoning are required. Workflows for experiments also highlight transitions among experimental phases, allowing intermediate results to be verified and supporting the proper handling of semantic mismatches and different file formats among the various tools used in the scientific process. Thus, scientific workflows are important for the modeling and subsequent capture of bioinformatics-related data. While much research has been conducted on the implementation of scientific workflows, the initial process of actually designing and generating the workflow at the conceptual level has received little consideration. Results We propose a structured process to capture scientific workflows at the conceptual level that allows workflows to be documented efficiently, results in concise models of the workflow and more-correct workflow implementations, and provides insight into the scientific process itself. The approach uses three modeling techniques to model the structural, data flow, and control flow aspects of the workflow. The domain of biomolecular structure determination using Nuclear Magnetic Resonance spectroscopy is used to demonstrate the process. Specifically, we show the application of the approach to capture the workflow for the process of conducting biomolecular analysis using Nuclear Magnetic Resonance (NMR) spectroscopy. Conclusion Using the approach, we were able to accurately document, in a short amount of time, numerous steps in the process of conducting an experiment using NMR spectroscopy. The resulting models are correct and precise, as outside validation of the models identified only minor omissions in the models. In addition, the models provide an accurate visual description of the control flow for conducting biomolecular analysis using NMR spectroscopy experiment. PMID:17263870
Conceptual-level workflow modeling of scientific experiments using NMR as a case study.
Verdi, Kacy K; Ellis, Heidi Jc; Gryk, Michael R
2007-01-30
Scientific workflows improve the process of scientific experiments by making computations explicit, underscoring data flow, and emphasizing the participation of humans in the process when intuition and human reasoning are required. Workflows for experiments also highlight transitions among experimental phases, allowing intermediate results to be verified and supporting the proper handling of semantic mismatches and different file formats among the various tools used in the scientific process. Thus, scientific workflows are important for the modeling and subsequent capture of bioinformatics-related data. While much research has been conducted on the implementation of scientific workflows, the initial process of actually designing and generating the workflow at the conceptual level has received little consideration. We propose a structured process to capture scientific workflows at the conceptual level that allows workflows to be documented efficiently, results in concise models of the workflow and more-correct workflow implementations, and provides insight into the scientific process itself. The approach uses three modeling techniques to model the structural, data flow, and control flow aspects of the workflow. The domain of biomolecular structure determination using Nuclear Magnetic Resonance spectroscopy is used to demonstrate the process. Specifically, we show the application of the approach to capture the workflow for the process of conducting biomolecular analysis using Nuclear Magnetic Resonance (NMR) spectroscopy. Using the approach, we were able to accurately document, in a short amount of time, numerous steps in the process of conducting an experiment using NMR spectroscopy. The resulting models are correct and precise, as outside validation of the models identified only minor omissions in the models. In addition, the models provide an accurate visual description of the control flow for conducting biomolecular analysis using NMR spectroscopy experiment.
Liebe, J D; Hübner, U; Straede, M C; Thye, J
2015-01-01
Availability and usage of individual IT applications have been studied intensively in the past years. Recently, IT support of clinical processes is attaining increasing attention. The underlying construct that describes the IT support of clinical workflows is clinical information logistics. This construct needs to be better understood, operationalised and measured. It is therefore the aim of this study to propose and develop a workflow composite score (WCS) for measuring clinical information logistics and to examine its quality based on reliability and validity analyses. We largely followed the procedural model of MacKenzie and colleagues (2011) for defining and conceptualising the construct domain, for developing the measurement instrument, assessing the content validity, pretesting the instrument, specifying the model, capturing the data and computing the WCS and testing the reliability and validity. Clinical information logistics was decomposed into the descriptors data and information, function, integration and distribution, which embraced the framework validated by an analysis of the international literature. This framework was refined selecting representative clinical processes. We chose ward rounds, pre- and post-surgery processes and discharge as sample processes that served as concrete instances for the measurements. They are sufficiently complex, represent core clinical processes and involve different professions, departments and settings. The score was computed on the basis of data from 183 hospitals of different size, ownership, location and teaching status. Testing the reliability and validity yielded encouraging results: the reliability was high with r(split-half) = 0.89, the WCS discriminated between groups; the WCS correlated significantly and moderately with two EHR models and the WCS received good evaluation results by a sample of chief information officers (n = 67). These findings suggest the further utilisation of the WCS. As the WCS does not assume ideal workflows as a gold standard but measures IT support of clinical workflows according to validated descriptors a high portability of the WCS to other hospitals in other countries is very likely. The WCS will contribute to a better understanding of the construct clinical information logistics.
Resident training in a teaching hospital: How do attendings teach in the real operative environment?
Glarner, Carly E; Law, Katherine E; Zelenski, Amy B; McDonald, Robert J; Greenberg, Jacob A; Foley, Eugene F; Wiegmann, Douglas A; Greenberg, Caprice C
2017-07-01
The study aim was to explore the nature of intraoperative education and its interaction with the environment where surgical education occurs. Video and audio recording captured teaching interactions between colorectal surgeons and general surgery residents during laparoscopic segmental colectomies. Cases and collected data were analyzed for teaching behaviors and workflow disruptions. Flow disruptions (FDs) are considered deviations from natural case progression. Across 10 cases (20.4 operative hours), attendings spent 11.2 hours (54.7%) teaching, using directing (M = 250.1), and confirming (M = 236.1) most. FDs occurred 410 times, accounting for 4.4 hours of case time (21.57%). Teaching occurred with FD events for 2.4 hours (22.2%), whereas 77.8% of teaching happened outside FD occurrence. Teaching methods shifted from active to passive during FD events to compensate for patient safety. Understanding how FDs impact operative learning will inform faculty development in managing interruptions and improve its integration into resident education. Copyright © 2016. Published by Elsevier Inc.
Improving adherence to the Epic Beacon ambulatory workflow.
Chackunkal, Ellen; Dhanapal Vogel, Vishnuprabha; Grycki, Meredith; Kostoff, Diana
2017-06-01
Computerized physician order entry has been shown to significantly improve chemotherapy safety by reducing the number of prescribing errors. Epic's Beacon Oncology Information System of computerized physician order entry and electronic medication administration was implemented in Henry Ford Health System's ambulatory oncology infusion centers on 9 November 2013. Since that time, compliance to the infusion workflow had not been assessed. The objective of this study was to optimize the current workflow and improve the compliance to this workflow in the ambulatory oncology setting. This study was a retrospective, quasi-experimental study which analyzed the composite workflow compliance rate of patient encounters from 9 to 23 November 2014. Based on this analysis, an intervention was identified and implemented in February 2015 to improve workflow compliance. The primary endpoint was to compare the composite compliance rate to the Beacon workflow before and after a pharmacy-initiated intervention. The intervention, which was education of infusion center staff, was initiated by ambulatory-based, oncology pharmacists and implemented by a multi-disciplinary team of pharmacists and nurses. The composite compliance rate was then reassessed for patient encounters from 2 to 13 March 2015 in order to analyze the effects of the determined intervention on compliance. The initial analysis in November 2014 revealed a composite compliance rate of 38%, and data analysis after the intervention revealed a statistically significant increase in the composite compliance rate to 83% ( p < 0.001). This study supports a pharmacist-initiated educational intervention can improve compliance to an ambulatory, oncology infusion workflow.
An Integrated Framework for Parameter-based Optimization of Scientific Workflows.
Kumar, Vijay S; Sadayappan, P; Mehta, Gaurang; Vahi, Karan; Deelman, Ewa; Ratnakar, Varun; Kim, Jihie; Gil, Yolanda; Hall, Mary; Kurc, Tahsin; Saltz, Joel
2009-01-01
Data analysis processes in scientific applications can be expressed as coarse-grain workflows of complex data processing operations with data flow dependencies between them. Performance optimization of these workflows can be viewed as a search for a set of optimal values in a multi-dimensional parameter space. While some performance parameters such as grouping of workflow components and their mapping to machines do not a ect the accuracy of the output, others may dictate trading the output quality of individual components (and of the whole workflow) for performance. This paper describes an integrated framework which is capable of supporting performance optimizations along multiple dimensions of the parameter space. Using two real-world applications in the spatial data analysis domain, we present an experimental evaluation of the proposed framework.
Exploring Two Approaches for an End-to-End Scientific Analysis Workflow
Dodelson, Scott; Kent, Steve; Kowalkowski, Jim; ...
2015-12-23
The advance of the scientific discovery process is accomplished by the integration of independently-developed programs run on disparate computing facilities into coherent workflows usable by scientists who are not experts in computing. For such advancement, we need a system which scientists can use to formulate analysis workflows, to integrate new components to these workflows, and to execute different components on resources that are best suited to run those components. In addition, we need to monitor the status of the workflow as components get scheduled and executed, and to access the intermediate and final output for visual exploration and analysis. Finally,more » it is important for scientists to be able to share their workflows with collaborators. Moreover we have explored two approaches for such an analysis framework for the Large Synoptic Survey Telescope (LSST) Dark Energy Science Collaboration (DESC), the first one is based on the use and extension of Galaxy, a web-based portal for biomedical research, and the second one is based on a programming language, Python. In our paper, we present a brief description of the two approaches, describe the kinds of extensions to the Galaxy system we have found necessary in order to support the wide variety of scientific analysis in the cosmology community, and discuss how similar efforts might be of benefit to the HEP community.« less
Turewicz, Michael; Kohl, Michael; Ahrens, Maike; Mayer, Gerhard; Uszkoreit, Julian; Naboulsi, Wael; Bracht, Thilo; Megger, Dominik A; Sitek, Barbara; Marcus, Katrin; Eisenacher, Martin
2017-11-10
The analysis of high-throughput mass spectrometry-based proteomics data must address the specific challenges of this technology. To this end, the comprehensive proteomics workflow offered by the de.NBI service center BioInfra.Prot provides indispensable components for the computational and statistical analysis of this kind of data. These components include tools and methods for spectrum identification and protein inference, protein quantification, expression analysis as well as data standardization and data publication. All particular methods of the workflow which address these tasks are state-of-the-art or cutting edge. As has been shown in previous publications, each of these methods is adequate to solve its specific task and gives competitive results. However, the methods included in the workflow are continuously reviewed, updated and improved to adapt to new scientific developments. All of these particular components and methods are available as stand-alone BioInfra.Prot services or as a complete workflow. Since BioInfra.Prot provides manifold fast communication channels to get access to all components of the workflow (e.g., via the BioInfra.Prot ticket system: bioinfraprot@rub.de) users can easily benefit from this service and get support by experts. Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.
A Virtual Environment for Process Management. A Step by Step Implementation
ERIC Educational Resources Information Center
Mayer, Sergio Valenzuela
2003-01-01
In this paper it is presented a virtual organizational environment, conceived with the integration of three computer programs: a manufacturing simulation package, an automation of businesses processes (workflows), and business intelligence (Balanced Scorecard) software. It was created as a supporting tool for teaching IE, its purpose is to give…
wft4galaxy: a workflow testing tool for galaxy.
Piras, Marco Enrico; Pireddu, Luca; Zanetti, Gianluigi
2017-12-01
Workflow managers for scientific analysis provide a high-level programming platform facilitating standardization, automation, collaboration and access to sophisticated computing resources. The Galaxy workflow manager provides a prime example of this type of platform. As compositions of simpler tools, workflows effectively comprise specialized computer programs implementing often very complex analysis procedures. To date, no simple way to automatically test Galaxy workflows and ensure their correctness has appeared in the literature. With wft4galaxy we offer a tool to bring automated testing to Galaxy workflows, making it feasible to bring continuous integration to their development and ensuring that defects are detected promptly. wft4galaxy can be easily installed as a regular Python program or launched directly as a Docker container-the latter reducing installation effort to a minimum. Available at https://github.com/phnmnl/wft4galaxy under the Academic Free License v3.0. marcoenrico.piras@crs4.it. © The Author 2017. Published by Oxford University Press.
PANORAMA: An approach to performance modeling and diagnosis of extreme-scale workflows
DOE Office of Scientific and Technical Information (OSTI.GOV)
Deelman, Ewa; Carothers, Christopher; Mandal, Anirban
Here we report that computational science is well established as the third pillar of scientific discovery and is on par with experimentation and theory. However, as we move closer toward the ability to execute exascale calculations and process the ensuing extreme-scale amounts of data produced by both experiments and computations alike, the complexity of managing the compute and data analysis tasks has grown beyond the capabilities of domain scientists. Therefore, workflow management systems are absolutely necessary to ensure current and future scientific discoveries. A key research question for these workflow management systems concerns the performance optimization of complex calculation andmore » data analysis tasks. The central contribution of this article is a description of the PANORAMA approach for modeling and diagnosing the run-time performance of complex scientific workflows. This approach integrates extreme-scale systems testbed experimentation, structured analytical modeling, and parallel systems simulation into a comprehensive workflow framework called Pegasus for understanding and improving the overall performance of complex scientific workflows.« less
PANORAMA: An approach to performance modeling and diagnosis of extreme-scale workflows
Deelman, Ewa; Carothers, Christopher; Mandal, Anirban; ...
2015-07-14
Here we report that computational science is well established as the third pillar of scientific discovery and is on par with experimentation and theory. However, as we move closer toward the ability to execute exascale calculations and process the ensuing extreme-scale amounts of data produced by both experiments and computations alike, the complexity of managing the compute and data analysis tasks has grown beyond the capabilities of domain scientists. Therefore, workflow management systems are absolutely necessary to ensure current and future scientific discoveries. A key research question for these workflow management systems concerns the performance optimization of complex calculation andmore » data analysis tasks. The central contribution of this article is a description of the PANORAMA approach for modeling and diagnosing the run-time performance of complex scientific workflows. This approach integrates extreme-scale systems testbed experimentation, structured analytical modeling, and parallel systems simulation into a comprehensive workflow framework called Pegasus for understanding and improving the overall performance of complex scientific workflows.« less
Mirel, Barbara; Eichinger, Felix; Keller, Benjamin J; Kretzler, Matthias
2011-03-21
Bioinformatics visualization tools are often not robust enough to support biomedical specialists’ complex exploratory analyses. Tools need to accommodate the workflows that scientists actually perform for specific translational research questions. To understand and model one of these workflows, we conducted a case-based, cognitive task analysis of a biomedical specialist’s exploratory workflow for the question: What functional interactions among gene products of high throughput expression data suggest previously unknown mechanisms of a disease? From our cognitive task analysis four complementary representations of the targeted workflow were developed. They include: usage scenarios, flow diagrams, a cognitive task taxonomy, and a mapping between cognitive tasks and user-centered visualization requirements. The representations capture the flows of cognitive tasks that led a biomedical specialist to inferences critical to hypothesizing. We created representations at levels of detail that could strategically guide visualization development, and we confirmed this by making a trial prototype based on user requirements for a small portion of the workflow. Our results imply that visualizations should make available to scientific users “bundles of features†consonant with the compositional cognitive tasks purposefully enacted at specific points in the workflow. We also highlight certain aspects of visualizations that: (a) need more built-in flexibility; (b) are critical for negotiating meaning; and (c) are necessary for essential metacognitive support.
Quantitative workflow based on NN for weighting criteria in landfill suitability mapping
NASA Astrophysics Data System (ADS)
Abujayyab, Sohaib K. M.; Ahamad, Mohd Sanusi S.; Yahya, Ahmad Shukri; Ahmad, Siti Zubaidah; Alkhasawneh, Mutasem Sh.; Aziz, Hamidi Abdul
2017-10-01
Our study aims to introduce a new quantitative workflow that integrates neural networks (NNs) and multi criteria decision analysis (MCDA). Existing MCDA workflows reveal a number of drawbacks, because of the reliance on human knowledge in the weighting stage. Thus, new workflow presented to form suitability maps at the regional scale for solid waste planning based on NNs. A feed-forward neural network employed in the workflow. A total of 34 criteria were pre-processed to establish the input dataset for NN modelling. The final learned network used to acquire the weights of the criteria. Accuracies of 95.2% and 93.2% achieved for the training dataset and testing dataset, respectively. The workflow was found to be capable of reducing human interference to generate highly reliable maps. The proposed workflow reveals the applicability of NN in generating landfill suitability maps and the feasibility of integrating them with existing MCDA workflows.
A scientific workflow framework for (13)C metabolic flux analysis.
Dalman, Tolga; Wiechert, Wolfgang; Nöh, Katharina
2016-08-20
Metabolic flux analysis (MFA) with (13)C labeling data is a high-precision technique to quantify intracellular reaction rates (fluxes). One of the major challenges of (13)C MFA is the interactivity of the computational workflow according to which the fluxes are determined from the input data (metabolic network model, labeling data, and physiological rates). Here, the workflow assembly is inevitably determined by the scientist who has to consider interacting biological, experimental, and computational aspects. Decision-making is context dependent and requires expertise, rendering an automated evaluation process hardly possible. Here, we present a scientific workflow framework (SWF) for creating, executing, and controlling on demand (13)C MFA workflows. (13)C MFA-specific tools and libraries, such as the high-performance simulation toolbox 13CFLUX2, are wrapped as web services and thereby integrated into a service-oriented architecture. Besides workflow steering, the SWF features transparent provenance collection and enables full flexibility for ad hoc scripting solutions. To handle compute-intensive tasks, cloud computing is supported. We demonstrate how the challenges posed by (13)C MFA workflows can be solved with our approach on the basis of two proof-of-concept use cases. Copyright © 2015 Elsevier B.V. All rights reserved.
MetaNET--a web-accessible interactive platform for biological metabolic network analysis.
Narang, Pankaj; Khan, Shawez; Hemrom, Anmol Jaywant; Lynn, Andrew Michael
2014-01-01
Metabolic reactions have been extensively studied and compiled over the last century. These have provided a theoretical base to implement models, simulations of which are used to identify drug targets and optimize metabolic throughput at a systemic level. While tools for the perturbation of metabolic networks are available, their applications are limited and restricted as they require varied dependencies and often a commercial platform for full functionality. We have developed MetaNET, an open source user-friendly platform-independent and web-accessible resource consisting of several pre-defined workflows for metabolic network analysis. MetaNET is a web-accessible platform that incorporates a range of functions which can be combined to produce different simulations related to metabolic networks. These include (i) optimization of an objective function for wild type strain, gene/catalyst/reaction knock-out/knock-down analysis using flux balance analysis. (ii) flux variability analysis (iii) chemical species participation (iv) cycles and extreme paths identification and (v) choke point reaction analysis to facilitate identification of potential drug targets. The platform is built using custom scripts along with the open-source Galaxy workflow and Systems Biology Research Tool as components. Pre-defined workflows are available for common processes, and an exhaustive list of over 50 functions are provided for user defined workflows. MetaNET, available at http://metanet.osdd.net , provides a user-friendly rich interface allowing the analysis of genome-scale metabolic networks under various genetic and environmental conditions. The framework permits the storage of previous results, the ability to repeat analysis and share results with other users over the internet as well as run different tools simultaneously using pre-defined workflows, and user-created custom workflows.
2009-01-01
Background The rapid advancement of computer and information technology in recent years has resulted in the rise of e-learning technologies to enhance and complement traditional classroom teaching in many fields, including bioinformatics. This paper records the experience of implementing e-learning technology to support problem-based learning (PBL) in the teaching of two undergraduate bioinformatics classes in the National University of Singapore. Results Survey results further established the efficiency and suitability of e-learning tools to supplement PBL in bioinformatics education. 63.16% of year three bioinformatics students showed a positive response regarding the usefulness of the Learning Activity Management System (LAMS) e-learning tool in guiding the learning and discussion process involved in PBL and in enhancing the learning experience by breaking down PBL activities into a sequential workflow. On the other hand, 89.81% of year two bioinformatics students indicated that their revision process was positively impacted with the use of LAMS for guiding the learning process, while 60.19% agreed that the breakdown of activities into a sequential step-by-step workflow by LAMS enhances the learning experience Conclusion We show that e-learning tools are useful for supplementing PBL in bioinformatics education. The results suggest that it is feasible to develop and adopt e-learning tools to supplement a variety of instructional strategies in the future. PMID:19958511
[Digital teaching archive. Concept, implementation, and experiences in a university setting].
Trumm, C; Dugas, M; Wirth, S; Treitl, M; Lucke, A; Küttner, B; Pander, E; Clevert, D-A; Glaser, C; Reiser, M
2005-08-01
Film-based teaching files require a substantial investment in human, logistic, and financial resources. The combination of computer and network technology facilitates the workflow integration of distributing radiologic teaching cases within an institution (intranet) or via the World Wide Web (Internet). A digital teaching file (DTF) should include the following basic functions: image import from different sources and of different formats, editing of imported images, uniform case classification, quality control (peer review), a controlled access of different user groups (in-house and external), and an efficient retrieval strategy. The portable network graphics image format (PNG) is especially suitable for DTFs because of several features: pixel support, 2D-interlacing, gamma correction, and lossless compression. The American College of Radiology (ACR) "Index for Radiological Diagnoses" is hierarchically organized and thus an ideal classification system for a DTF. Computer-based training (CBT) in radiology is described in numerous publications, from supplementing traditional learning methods to certified education via the Internet. Attractiveness of a CBT application can be increased by integration of graphical and interactive elements but makes workflow integration of daily case input more difficult. Our DTF was built with established Internet instruments and integrated into a heterogeneous PACS/RIS environment. It facilitates a quick transfer (DICOM_Send) of selected images at the time of interpretation to the DTF and access to the DTF application at any time anywhere within the university hospital intranet employing a standard web browser. A DTF is a small but important building block in an institutional strategy of knowledge management.
Jones, Ryan T; Handsfield, Lydia; Read, Paul W; Wilson, David D; Van Ausdal, Ray; Schlesinger, David J; Siebers, Jeffrey V; Chen, Quan
2015-01-01
The clinical challenge of radiation therapy (RT) for painful bone metastases requires clinicians to consider both treatment efficacy and patient prognosis when selecting a radiation therapy regimen. The traditional RT workflow requires several weeks for common palliative RT schedules of 30 Gy in 10 fractions or 20 Gy in 5 fractions. At our institution, we have created a new RT workflow termed "STAT RAD" that allows clinicians to perform computed tomographic (CT) simulation, planning, and highly conformal single fraction treatment delivery within 2 hours. In this study, we evaluate the safety and feasibility of the STAT RAD workflow. A failure mode and effects analysis (FMEA) was performed on the STAT RAD workflow, including development of a process map, identification of potential failure modes, description of the cause and effect, temporal occurrence, and team member involvement in each failure mode, and examination of existing safety controls. A risk probability number (RPN) was calculated for each failure mode. As necessary, workflow adjustments were then made to safeguard failure modes of significant RPN values. After workflow alterations, RPN numbers were again recomputed. A total of 72 potential failure modes were identified in the pre-FMEA STAT RAD workflow, of which 22 met the RPN threshold for clinical significance. Workflow adjustments included the addition of a team member checklist, changing simulation from megavoltage CT to kilovoltage CT, alteration of patient-specific quality assurance testing, and allocating increased time for critical workflow steps. After these modifications, only 1 failure mode maintained RPN significance; patient motion after alignment or during treatment. Performing the FMEA for the STAT RAD workflow before clinical implementation has significantly strengthened the safety and feasibility of STAT RAD. The FMEA proved a valuable evaluation tool, identifying potential problem areas so that we could create a safer workflow. Copyright © 2015 American Society for Radiation Oncology. Published by Elsevier Inc. All rights reserved.
Embracing the Archives: How NPR Librarians Turned Their Collection into a Workflow Tool
ERIC Educational Resources Information Center
Sin, Lauren; Daugert, Katie
2013-01-01
Several years ago, National Public Radio (NPR) librarians began developing a new content management system (CMS). It was intended to offer desktop access for all NPR-produced content, including transcripts, audio, and metadata. Fast-forward to 2011, and their shiny, new database, Artemis, was ready for debut. Their next challenge: to teach a staff…
Taverna: a tool for building and running workflows of services
Hull, Duncan; Wolstencroft, Katy; Stevens, Robert; Goble, Carole; Pocock, Mathew R.; Li, Peter; Oinn, Tom
2006-01-01
Taverna is an application that eases the use and integration of the growing number of molecular biology tools and databases available on the web, especially web services. It allows bioinformaticians to construct workflows or pipelines of services to perform a range of different analyses, such as sequence analysis and genome annotation. These high-level workflows can integrate many different resources into a single analysis. Taverna is available freely under the terms of the GNU Lesser General Public License (LGPL) from . PMID:16845108
Web-video-mining-supported workflow modeling for laparoscopic surgeries.
Liu, Rui; Zhang, Xiaoli; Zhang, Hao
2016-11-01
As quality assurance is of strong concern in advanced surgeries, intelligent surgical systems are expected to have knowledge such as the knowledge of the surgical workflow model (SWM) to support their intuitive cooperation with surgeons. For generating a robust and reliable SWM, a large amount of training data is required. However, training data collected by physically recording surgery operations is often limited and data collection is time-consuming and labor-intensive, severely influencing knowledge scalability of the surgical systems. The objective of this research is to solve the knowledge scalability problem in surgical workflow modeling with a low cost and labor efficient way. A novel web-video-mining-supported surgical workflow modeling (webSWM) method is developed. A novel video quality analysis method based on topic analysis and sentiment analysis techniques is developed to select high-quality videos from abundant and noisy web videos. A statistical learning method is then used to build the workflow model based on the selected videos. To test the effectiveness of the webSWM method, 250 web videos were mined to generate a surgical workflow for the robotic cholecystectomy surgery. The generated workflow was evaluated by 4 web-retrieved videos and 4 operation-room-recorded videos, respectively. The evaluation results (video selection consistency n-index ≥0.60; surgical workflow matching degree ≥0.84) proved the effectiveness of the webSWM method in generating robust and reliable SWM knowledge by mining web videos. With the webSWM method, abundant web videos were selected and a reliable SWM was modeled in a short time with low labor cost. Satisfied performances in mining web videos and learning surgery-related knowledge show that the webSWM method is promising in scaling knowledge for intelligent surgical systems. Copyright © 2016 Elsevier B.V. All rights reserved.
Ivkovic, Sinisa; Simonovic, Janko; Tijanic, Nebojsa; Davis-Dusenbery, Brandi; Kural, Deniz
2016-01-01
As biomedical data has become increasingly easy to generate in large quantities, the methods used to analyze it have proliferated rapidly. Reproducible and reusable methods are required to learn from large volumes of data reliably. To address this issue, numerous groups have developed workflow specifications or execution engines, which provide a framework with which to perform a sequence of analyses. One such specification is the Common Workflow Language, an emerging standard which provides a robust and flexible framework for describing data analysis tools and workflows. In addition, reproducibility can be furthered by executors or workflow engines which interpret the specification and enable additional features, such as error logging, file organization, optimizations1 to computation and job scheduling, and allow for easy computing on large volumes of data. To this end, we have developed the Rabix Executor a , an open-source workflow engine for the purposes of improving reproducibility through reusability and interoperability of workflow descriptions. PMID:27896971
Kaushik, Gaurav; Ivkovic, Sinisa; Simonovic, Janko; Tijanic, Nebojsa; Davis-Dusenbery, Brandi; Kural, Deniz
2017-01-01
As biomedical data has become increasingly easy to generate in large quantities, the methods used to analyze it have proliferated rapidly. Reproducible and reusable methods are required to learn from large volumes of data reliably. To address this issue, numerous groups have developed workflow specifications or execution engines, which provide a framework with which to perform a sequence of analyses. One such specification is the Common Workflow Language, an emerging standard which provides a robust and flexible framework for describing data analysis tools and workflows. In addition, reproducibility can be furthered by executors or workflow engines which interpret the specification and enable additional features, such as error logging, file organization, optim1izations to computation and job scheduling, and allow for easy computing on large volumes of data. To this end, we have developed the Rabix Executor, an open-source workflow engine for the purposes of improving reproducibility through reusability and interoperability of workflow descriptions.
Workflow Management for Complex HEP Analyses
NASA Astrophysics Data System (ADS)
Erdmann, M.; Fischer, R.; Rieger, M.; von Cube, R. F.
2017-10-01
We present the novel Analysis Workflow Management (AWM) that provides users with the tools and competences of professional large scale workflow systems, e.g. Apache’s Airavata[1]. The approach presents a paradigm shift from executing parts of the analysis to defining the analysis. Within AWM an analysis consists of steps. For example, a step defines to run a certain executable for multiple files of an input data collection. Each call to the executable for one of those input files can be submitted to the desired run location, which could be the local computer or a remote batch system. An integrated software manager enables automated user installation of dependencies in the working directory at the run location. Each execution of a step item creates one report for bookkeeping purposes containing error codes and output data or file references. Required files, e.g. created by previous steps, are retrieved automatically. Since data storage and run locations are exchangeable from the steps perspective, computing resources can be used opportunistically. A visualization of the workflow as a graph of the steps in the web browser provides a high-level view on the analysis. The workflow system is developed and tested alongside of a ttbb cross section measurement where, for instance, the event selection is represented by one step and a Bayesian statistical inference is performed by another. The clear interface and dependencies between steps enables a make-like execution of the whole analysis.
Potential of knowledge discovery using workflows implemented in the C3Grid
NASA Astrophysics Data System (ADS)
Engel, Thomas; Fink, Andreas; Ulbrich, Uwe; Schartner, Thomas; Dobler, Andreas; Fritzsch, Bernadette; Hiller, Wolfgang; Bräuer, Benny
2013-04-01
With the increasing number of climate simulations, reanalyses and observations, new infrastructures to search and analyse distributed data are necessary. In recent years, the Grid architecture became an important technology to fulfill these demands. For the German project "Collaborative Climate Community Data and Processing Grid" (C3Grid) computer scientists and meteorologists developed a system that offers its users a webinterface to search and download climate data and use implemented analysis tools (called workflows) to further investigate them. In this contribution, two workflows that are implemented in the C3Grid architecture are presented: the Cyclone Tracking (CT) and Stormtrack workflow. They shall serve as an example on how to perform numerous investigations on midlatitude winterstorms on a large amount of analysis and climate model data without having an insight into the data source, program code and a low-to-moderate understanding of the theortical background. CT is based on the work of Murray and Simmonds (1991) to identify and track local minima in the mean sea level pressure (MSLP) field of the selected dataset. Adjustable thresholds for the curvature of the isobars as well as the minimum lifetime of a cyclone allow the distinction of weak subtropical heat low systems and stronger midlatitude cyclones e.g. in the Northern Atlantic. The user gets the resulting track data including statistics about the track density, average central pressure, average central curvature, cyclogenesis and cyclolysis as well as pre-built visualizations of these results. Stormtrack calculates the 2.5-6 day bandpassfiltered standard deviation of the geopotential height on a selected pressure level. Although this workflow needs much less computational effort compared to CT it shows structures that are in good agreement with the track density of the CT workflow. To what extent changes in the mid-level tropospheric storm track are reflected in trough density and intensity alteration of surface cyclones. A specific feature of C3Grid is the flexible Workflow Scheduling Service (WSS) which also allows for automated nightly analysis runs of CT, Stormtrack, etc. with different input parameter sets. The statistical results of these workflows can be accumulated afterwards by a scheduled final analysis step, thereby providing a tool for data intensive analytics for the massive amounts of climate model data accessible through C3Grid. First tests with these automated analysis workflows show promising results to speed up the investigation of high volume modeling data. This example is relevant to the thorough analysis of future changes in storminess in Europe and is just one example of the potential of knowledge discovery using automated workflows implemented in the C3Grid architecture.
SPARTA: Simple Program for Automated reference-based bacterial RNA-seq Transcriptome Analysis.
Johnson, Benjamin K; Scholz, Matthew B; Teal, Tracy K; Abramovitch, Robert B
2016-02-04
Many tools exist in the analysis of bacterial RNA sequencing (RNA-seq) transcriptional profiling experiments to identify differentially expressed genes between experimental conditions. Generally, the workflow includes quality control of reads, mapping to a reference, counting transcript abundance, and statistical tests for differentially expressed genes. In spite of the numerous tools developed for each component of an RNA-seq analysis workflow, easy-to-use bacterially oriented workflow applications to combine multiple tools and automate the process are lacking. With many tools to choose from for each step, the task of identifying a specific tool, adapting the input/output options to the specific use-case, and integrating the tools into a coherent analysis pipeline is not a trivial endeavor, particularly for microbiologists with limited bioinformatics experience. To make bacterial RNA-seq data analysis more accessible, we developed a Simple Program for Automated reference-based bacterial RNA-seq Transcriptome Analysis (SPARTA). SPARTA is a reference-based bacterial RNA-seq analysis workflow application for single-end Illumina reads. SPARTA is turnkey software that simplifies the process of analyzing RNA-seq data sets, making bacterial RNA-seq analysis a routine process that can be undertaken on a personal computer or in the classroom. The easy-to-install, complete workflow processes whole transcriptome shotgun sequencing data files by trimming reads and removing adapters, mapping reads to a reference, counting gene features, calculating differential gene expression, and, importantly, checking for potential batch effects within the data set. SPARTA outputs quality analysis reports, gene feature counts and differential gene expression tables and scatterplots. SPARTA provides an easy-to-use bacterial RNA-seq transcriptional profiling workflow to identify differentially expressed genes between experimental conditions. This software will enable microbiologists with limited bioinformatics experience to analyze their data and integrate next generation sequencing (NGS) technologies into the classroom. The SPARTA software and tutorial are available at sparta.readthedocs.org.
Case-oriented computer-based-training in radiology: concept, implementation and evaluation
Dugas, Martin; Trumm, Christoph; Stäbler, Axel; Pander, Ernst; Hundt, Walter; Scheidler, Jurgen; Brüning, Roland; Helmberger, Thomas; Waggershauser, Tobias; Matzko, Matthias; Reiser, Maximillian
2001-01-01
Background Providing high-quality clinical cases is important for teaching radiology. We developed, implemented and evaluated a program for a university hospital to support this task. Methods The system was built with Intranet technology and connected to the Picture Archiving and Communications System (PACS). It contains cases for every user group from students to attendants and is structured according to the ACR-code (American College of Radiology) [2]. Each department member was given an individual account, could gather his teaching cases and put the completed cases into the common database. Results During 18 months 583 cases containing 4136 images involving all radiological techniques were compiled and 350 cases put into the common case repository. Workflow integration as well as individual interest influenced the personal efforts to participate but an increasing number of cases and minor modifications of the program improved user acceptance continuously. 101 students went through an evaluation which showed a high level of acceptance and a special interest in elaborate documentation. Conclusion Electronic access to reference cases for all department members anytime anywhere is feasible. Critical success factors are workflow integration, reliability, efficient retrieval strategies and incentives for case authoring. PMID:11686856
chemalot and chemalot_knime: Command line programs as workflow tools for drug discovery.
Lee, Man-Ling; Aliagas, Ignacio; Feng, Jianwen A; Gabriel, Thomas; O'Donnell, T J; Sellers, Benjamin D; Wiswedel, Bernd; Gobbi, Alberto
2017-06-12
Analyzing files containing chemical information is at the core of cheminformatics. Each analysis may require a unique workflow. This paper describes the chemalot and chemalot_knime open source packages. Chemalot is a set of command line programs with a wide range of functionalities for cheminformatics. The chemalot_knime package allows command line programs that read and write SD files from stdin and to stdout to be wrapped into KNIME nodes. The combination of chemalot and chemalot_knime not only facilitates the compilation and maintenance of sequences of command line programs but also allows KNIME workflows to take advantage of the compute power of a LINUX cluster. Use of the command line programs is demonstrated in three different workflow examples: (1) A workflow to create a data file with project-relevant data for structure-activity or property analysis and other type of investigations, (2) The creation of a quantitative structure-property-relationship model using the command line programs via KNIME nodes, and (3) The analysis of strain energy in small molecule ligand conformations from the Protein Data Bank database. The chemalot and chemalot_knime packages provide lightweight and powerful tools for many tasks in cheminformatics. They are easily integrated with other open source and commercial command line tools and can be combined to build new and even more powerful tools. The chemalot_knime package facilitates the generation and maintenance of user-defined command line workflows, taking advantage of the graphical design capabilities in KNIME. Graphical abstract Example KNIME workflow with chemalot nodes and the corresponding command line pipe.
NASA Astrophysics Data System (ADS)
Augustine, Kurt E.; Holmes, David R., III; Hanson, Dennis P.; Robb, Richard A.
2006-03-01
One of the greatest challenges for a software engineer is to create a complex application that is comprehensive enough to be useful to a diverse set of users, yet focused enough for individual tasks to be carried out efficiently with minimal training. This "powerful yet simple" paradox is particularly prevalent in advanced medical imaging applications. Recent research in the Biomedical Imaging Resource (BIR) at Mayo Clinic has been directed toward development of an imaging application framework that provides powerful image visualization/analysis tools in an intuitive, easy-to-use interface. It is based on two concepts very familiar to physicians - Cases and Workflows. Each case is associated with a unique patient and a specific set of routine clinical tasks, or a workflow. Each workflow is comprised of an ordered set of general-purpose modules which can be re-used for each unique workflow. Clinicians help describe and design the workflows, and then are provided with an intuitive interface to both patient data and analysis tools. Since most of the individual steps are common to many different workflows, the use of general-purpose modules reduces development time and results in applications that are consistent, stable, and robust. While the development of individual modules may reflect years of research by imaging scientists, new customized workflows based on the new modules can be developed extremely fast. If a powerful, comprehensive application is difficult to learn and complicated to use, it will be unacceptable to most clinicians. Clinical image analysis tools must be intuitive and effective or they simply will not be used.
Integrated workflows for spiking neuronal network simulations
Antolík, Ján; Davison, Andrew P.
2013-01-01
The increasing availability of computational resources is enabling more detailed, realistic modeling in computational neuroscience, resulting in a shift toward more heterogeneous models of neuronal circuits, and employment of complex experimental protocols. This poses a challenge for existing tool chains, as the set of tools involved in a typical modeler's workflow is expanding concomitantly, with growing complexity in the metadata flowing between them. For many parts of the workflow, a range of tools is available; however, numerous areas lack dedicated tools, while integration of existing tools is limited. This forces modelers to either handle the workflow manually, leading to errors, or to write substantial amounts of code to automate parts of the workflow, in both cases reducing their productivity. To address these issues, we have developed Mozaik: a workflow system for spiking neuronal network simulations written in Python. Mozaik integrates model, experiment and stimulation specification, simulation execution, data storage, data analysis and visualization into a single automated workflow, ensuring that all relevant metadata are available to all workflow components. It is based on several existing tools, including PyNN, Neo, and Matplotlib. It offers a declarative way to specify models and recording configurations using hierarchically organized configuration files. Mozaik automatically records all data together with all relevant metadata about the experimental context, allowing automation of the analysis and visualization stages. Mozaik has a modular architecture, and the existing modules are designed to be extensible with minimal programming effort. Mozaik increases the productivity of running virtual experiments on highly structured neuronal networks by automating the entire experimental cycle, while increasing the reliability of modeling studies by relieving the user from manual handling of the flow of metadata between the individual workflow stages. PMID:24368902
Integrated workflows for spiking neuronal network simulations.
Antolík, Ján; Davison, Andrew P
2013-01-01
The increasing availability of computational resources is enabling more detailed, realistic modeling in computational neuroscience, resulting in a shift toward more heterogeneous models of neuronal circuits, and employment of complex experimental protocols. This poses a challenge for existing tool chains, as the set of tools involved in a typical modeler's workflow is expanding concomitantly, with growing complexity in the metadata flowing between them. For many parts of the workflow, a range of tools is available; however, numerous areas lack dedicated tools, while integration of existing tools is limited. This forces modelers to either handle the workflow manually, leading to errors, or to write substantial amounts of code to automate parts of the workflow, in both cases reducing their productivity. To address these issues, we have developed Mozaik: a workflow system for spiking neuronal network simulations written in Python. Mozaik integrates model, experiment and stimulation specification, simulation execution, data storage, data analysis and visualization into a single automated workflow, ensuring that all relevant metadata are available to all workflow components. It is based on several existing tools, including PyNN, Neo, and Matplotlib. It offers a declarative way to specify models and recording configurations using hierarchically organized configuration files. Mozaik automatically records all data together with all relevant metadata about the experimental context, allowing automation of the analysis and visualization stages. Mozaik has a modular architecture, and the existing modules are designed to be extensible with minimal programming effort. Mozaik increases the productivity of running virtual experiments on highly structured neuronal networks by automating the entire experimental cycle, while increasing the reliability of modeling studies by relieving the user from manual handling of the flow of metadata between the individual workflow stages.
Nyström, Pär; Falck-Ytter, Terje; Gredebäck, Gustaf
2016-06-01
This article describes a new open source scientific workflow system, the TimeStudio Project, dedicated to the behavioral and brain sciences. The program is written in MATLAB and features a graphical user interface for the dynamic pipelining of computer algorithms developed as TimeStudio plugins. TimeStudio includes both a set of general plugins (for reading data files, modifying data structures, visualizing data structures, etc.) and a set of plugins specifically developed for the analysis of event-related eyetracking data as a proof of concept. It is possible to create custom plugins to integrate new or existing MATLAB code anywhere in a workflow, making TimeStudio a flexible workbench for organizing and performing a wide range of analyses. The system also features an integrated sharing and archiving tool for TimeStudio workflows, which can be used to share workflows both during the data analysis phase and after scientific publication. TimeStudio thus facilitates the reproduction and replication of scientific studies, increases the transparency of analyses, and reduces individual researchers' analysis workload. The project website ( http://timestudioproject.com ) contains the latest releases of TimeStudio, together with documentation and user forums.
A Web-Hosted R Workflow to Simplify and Automate the Analysis of 16S NGS Data
Next-Generation Sequencing (NGS) produces large data sets that include tens-of-thousands of sequence reads per sample. For analysis of bacterial diversity, 16S NGS sequences are typically analyzed in a workflow that containing best-of-breed bioinformatics packages that may levera...
Feliubadaló, Lídia; Lopez-Doriga, Adriana; Castellsagué, Ester; del Valle, Jesús; Menéndez, Mireia; Tornero, Eva; Montes, Eva; Cuesta, Raquel; Gómez, Carolina; Campos, Olga; Pineda, Marta; González, Sara; Moreno, Victor; Brunet, Joan; Blanco, Ignacio; Serra, Eduard; Capellá, Gabriel; Lázaro, Conxi
2013-01-01
Next-generation sequencing (NGS) is changing genetic diagnosis due to its huge sequencing capacity and cost-effectiveness. The aim of this study was to develop an NGS-based workflow for routine diagnostics for hereditary breast and ovarian cancer syndrome (HBOCS), to improve genetic testing for BRCA1 and BRCA2. A NGS-based workflow was designed using BRCA MASTR kit amplicon libraries followed by GS Junior pyrosequencing. Data analysis combined Variant Identification Pipeline freely available software and ad hoc R scripts, including a cascade of filters to generate coverage and variant calling reports. A BRCA homopolymer assay was performed in parallel. A research scheme was designed in two parts. A Training Set of 28 DNA samples containing 23 unique pathogenic mutations and 213 other variants (33 unique) was used. The workflow was validated in a set of 14 samples from HBOCS families in parallel with the current diagnostic workflow (Validation Set). The NGS-based workflow developed permitted the identification of all pathogenic mutations and genetic variants, including those located in or close to homopolymers. The use of NGS for detecting copy-number alterations was also investigated. The workflow meets the sensitivity and specificity requirements for the genetic diagnosis of HBOCS and improves on the cost-effectiveness of current approaches. PMID:23249957
Big Data Challenges in Global Seismic 'Adjoint Tomography' (Invited)
NASA Astrophysics Data System (ADS)
Tromp, J.; Bozdag, E.; Krischer, L.; Lefebvre, M.; Lei, W.; Smith, J.
2013-12-01
The challenge of imaging Earth's interior on a global scale is closely linked to the challenge of handling large data sets. The related iterative workflow involves five distinct phases, namely, 1) data gathering and culling, 2) synthetic seismogram calculations, 3) pre-processing (time-series analysis and time-window selection), 4) data assimilation and adjoint calculations, 5) post-processing (pre-conditioning, regularization, model update). In order to implement this workflow on modern high-performance computing systems, a new seismic data format is being developed. The Adaptable Seismic Data Format (ASDF) is designed to replace currently used data formats with a more flexible format that allows for fast parallel I/O. The metadata is divided into abstract categories, such as "source" and "receiver", along with provenance information for complete reproducibility. The structure of ASDF is designed keeping in mind three distinct applications: earthquake seismology, seismic interferometry, and exploration seismology. Existing time-series analysis tool kits, such as SAC and ObsPy, can be easily interfaced with ASDF so that seismologists can use robust, previously developed software packages. ASDF accommodates an automated, efficient workflow for global adjoint tomography. Manually managing the large number of simulations associated with the workflow can rapidly become a burden, especially with increasing numbers of earthquakes and stations. Therefore, it is of importance to investigate the possibility of automating the entire workflow. Scientific Workflow Management Software (SWfMS) allows users to execute workflows almost routinely. SWfMS provides additional advantages. In particular, it is possible to group independent simulations in a single job to fit the available computational resources. They also give a basic level of fault resilience as the workflow can be resumed at the correct state preceding a failure. Some of the best candidates for our particular workflow are Kepler and Swift, and the latter appears to be the most serious candidate for a large-scale workflow on a single supercomputer, remaining sufficiently simple to accommodate further modifications and improvements.
REPRODUCIBLE RESEARCH WORKFLOW IN R FOR THE ANALYSIS OF PERSONALIZED HUMAN MICROBIOME DATA.
Callahan, Benjamin; Proctor, Diana; Relman, David; Fukuyama, Julia; Holmes, Susan
2016-01-01
This article presents a reproducible research workflow for amplicon-based microbiome studies in personalized medicine created using Bioconductor packages and the knitr markdown interface.We show that sometimes a multiplicity of choices and lack of consistent documentation at each stage of the sequential processing pipeline used for the analysis of microbiome data can lead to spurious results. We propose its replacement with reproducible and documented analysis using R packages dada2, knitr, and phyloseq. This workflow implements both key stages of amplicon analysis: the initial filtering and denoising steps needed to construct taxonomic feature tables from error-containing sequencing reads (dada2), and the exploratory and inferential analysis of those feature tables and associated sample metadata (phyloseq). This workow facilitates reproducible interrogation of the full set of choices required in microbiome studies. We present several examples in which we leverage existing packages for analysis in a way that allows easy sharing and modification by others, and give pointers to articles that depend on this reproducible workflow for the study of longitudinal and spatial series analyses of the vaginal microbiome in pregnancy and the oral microbiome in humans with healthy dentition and intra-oral tissues.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Aurich, Maike K.; Fleming, Ronan M. T.; Thiele, Ines
Metabolomic data sets provide a direct read-out of cellular phenotypes and are increasingly generated to study biological questions. Previous work, by us and others, revealed the potential of analyzing extracellular metabolomic data in the context of the metabolic model using constraint-based modeling. With the MetaboTools, we make our methods available to the broader scientific community. The MetaboTools consist of a protocol, a toolbox, and tutorials of two use cases. The protocol describes, in a step-wise manner, the workflow of data integration, and computational analysis. The MetaboTools comprise the Matlab code required to complete the workflow described in the protocol. Tutorialsmore » explain the computational steps for integration of two different data sets and demonstrate a comprehensive set of methods for the computational analysis of metabolic models and stratification thereof into different phenotypes. The presented workflow supports integrative analysis of multiple omics data sets. Importantly, all analysis tools can be applied to metabolic models without performing the entire workflow. Taken together, the MetaboTools constitute a comprehensive guide to the intra-model analysis of extracellular metabolomic data from microbial, plant, or human cells. In conclusion, this computational modeling resource offers a broad set of computational analysis tools for a wide biomedical and non-biomedical research community.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lynch, Vickie E.; Borreguero, Jose M.; Bhowmik, Debsindhu
Graphical abstract: - Highlights: • An automated workflow to optimize force-field parameters. • Used the workflow to optimize force-field parameter for a system containing nanodiamond and tRNA. • The mechanism relies on molecular dynamics simulation and neutron scattering experimental data. • The workflow can be generalized to any other experimental and simulation techniques. - Abstract: Large-scale simulations and data analysis are often required to explain neutron scattering experiments to establish a connection between the fundamental physics at the nanoscale and data probed by neutrons. However, to perform simulations at experimental conditions it is critical to use correct force-field (FF) parametersmore » which are unfortunately not available for most complex experimental systems. In this work, we have developed a workflow optimization technique to provide optimized FF parameters by comparing molecular dynamics (MD) to neutron scattering data. We describe the workflow in detail by using an example system consisting of tRNA and hydrophilic nanodiamonds in a deuterated water (D{sub 2}O) environment. Quasi-elastic neutron scattering (QENS) data show a faster motion of the tRNA in the presence of nanodiamond than without the ND. To compare the QENS and MD results quantitatively, a proper choice of FF parameters is necessary. We use an efficient workflow to optimize the FF parameters between the hydrophilic nanodiamond and water by comparing to the QENS data. Our results show that we can obtain accurate FF parameters by using this technique. The workflow can be generalized to other types of neutron data for FF optimization, such as vibrational spectroscopy and spin echo.« less
Flexible workflow sharing and execution services for e-scientists
NASA Astrophysics Data System (ADS)
Kacsuk, Péter; Terstyanszky, Gábor; Kiss, Tamas; Sipos, Gergely
2013-04-01
The sequence of computational and data manipulation steps required to perform a specific scientific analysis is called a workflow. Workflows that orchestrate data and/or compute intensive applications on Distributed Computing Infrastructures (DCIs) recently became standard tools in e-science. At the same time the broad and fragmented landscape of workflows and DCIs slows down the uptake of workflow-based work. The development, sharing, integration and execution of workflows is still a challenge for many scientists. The FP7 "Sharing Interoperable Workflow for Large-Scale Scientific Simulation on Available DCIs" (SHIWA) project significantly improved the situation, with a simulation platform that connects different workflow systems, different workflow languages, different DCIs and workflows into a single, interoperable unit. The SHIWA Simulation Platform is a service package, already used by various scientific communities, and used as a tool by the recently started ER-flow FP7 project to expand the use of workflows among European scientists. The presentation will introduce the SHIWA Simulation Platform and the services that ER-flow provides based on the platform to space and earth science researchers. The SHIWA Simulation Platform includes: 1. SHIWA Repository: A database where workflows and meta-data about workflows can be stored. The database is a central repository to discover and share workflows within and among communities . 2. SHIWA Portal: A web portal that is integrated with the SHIWA Repository and includes a workflow executor engine that can orchestrate various types of workflows on various grid and cloud platforms. 3. SHIWA Desktop: A desktop environment that provides similar access capabilities than the SHIWA Portal, however it runs on the users' desktops/laptops instead of a portal server. 4. Workflow engines: the ASKALON, Galaxy, GWES, Kepler, LONI Pipeline, MOTEUR, Pegasus, P-GRADE, ProActive, Triana, Taverna and WS-PGRADE workflow engines are already integrated with the execution engine of the SHIWA Portal. Other engines can be added when required. Through the SHIWA Portal one can define and run simulations on the SHIWA Virtual Organisation, an e-infrastructure that gathers computing and data resources from various DCIs, including the European Grid Infrastructure. The Portal via third party workflow engines provides support for the most widely used academic workflow engines and it can be extended with other engines on demand. Such extensions translate between workflow languages and facilitate the nesting of workflows into larger workflows even when those are written in different languages and require different interpreters for execution. Through the workflow repository and the portal lonely scientists and scientific collaborations can share and offer workflows for reuse and execution. Given the integrated nature of the SHIWA Simulation Platform the shared workflows can be executed online, without installing any special client environment and downloading workflows. The FP7 "Building a European Research Community through Interoperable Workflows and Data" (ER-flow) project disseminates the achievements of the SHIWA project and use these achievements to build workflow user communities across Europe. ER-flow provides application supports to research communities within and beyond the project consortium to develop, share and run workflows with the SHIWA Simulation Platform.
Mirel, Barbara; Luo, Airong; Harris, Marcelline
2015-05-01
Collaborative research has many challenges. One under-researched challenge is how to align collaborators' research practices and evolving analytical reasoning with technologies and configurations of technologies that best support them. The goal of such alignment is to enhance collaborative problem solving capabilities in research. Toward this end, we draw on our own research and a synthesis of the literature to characterize the workflow of collaborating scientists in systems-level renal disease research. We describe the various phases of a hypothetical workflow among diverse collaborators within and across laboratories, extending from their primary analysis through secondary analysis. For each phase, we highlight required technology supports, and. At time, complementary organizational supports. This survey of supports matching collaborators' analysis practices and needs in research projects to technological support is preliminary, aimed ultimately at developing a research capability framework that can help scientists and technologists mutually understand workflows and technologies that can help enable and enhance them. Copyright © 2015 Elsevier Inc. All rights reserved.
myExperiment: a repository and social network for the sharing of bioinformatics workflows
Goble, Carole A.; Bhagat, Jiten; Aleksejevs, Sergejs; Cruickshank, Don; Michaelides, Danius; Newman, David; Borkum, Mark; Bechhofer, Sean; Roos, Marco; Li, Peter; De Roure, David
2010-01-01
myExperiment (http://www.myexperiment.org) is an online research environment that supports the social sharing of bioinformatics workflows. These workflows are procedures consisting of a series of computational tasks using web services, which may be performed on data from its retrieval, integration and analysis, to the visualization of the results. As a public repository of workflows, myExperiment allows anybody to discover those that are relevant to their research, which can then be reused and repurposed to their specific requirements. Conversely, developers can submit their workflows to myExperiment and enable them to be shared in a secure manner. Since its release in 2007, myExperiment currently has over 3500 registered users and contains more than 1000 workflows. The social aspect to the sharing of these workflows is facilitated by registered users forming virtual communities bound together by a common interest or research project. Contributors of workflows can build their reputation within these communities by receiving feedback and credit from individuals who reuse their work. Further documentation about myExperiment including its REST web service is available from http://wiki.myexperiment.org. Feedback and requests for support can be sent to bugs@myexperiment.org. PMID:20501605
MetaboTools: A comprehensive toolbox for analysis of genome-scale metabolic models
Aurich, Maike K.; Fleming, Ronan M. T.; Thiele, Ines
2016-08-03
Metabolomic data sets provide a direct read-out of cellular phenotypes and are increasingly generated to study biological questions. Previous work, by us and others, revealed the potential of analyzing extracellular metabolomic data in the context of the metabolic model using constraint-based modeling. With the MetaboTools, we make our methods available to the broader scientific community. The MetaboTools consist of a protocol, a toolbox, and tutorials of two use cases. The protocol describes, in a step-wise manner, the workflow of data integration, and computational analysis. The MetaboTools comprise the Matlab code required to complete the workflow described in the protocol. Tutorialsmore » explain the computational steps for integration of two different data sets and demonstrate a comprehensive set of methods for the computational analysis of metabolic models and stratification thereof into different phenotypes. The presented workflow supports integrative analysis of multiple omics data sets. Importantly, all analysis tools can be applied to metabolic models without performing the entire workflow. Taken together, the MetaboTools constitute a comprehensive guide to the intra-model analysis of extracellular metabolomic data from microbial, plant, or human cells. In conclusion, this computational modeling resource offers a broad set of computational analysis tools for a wide biomedical and non-biomedical research community.« less
Visualization and Analysis for Near-Real-Time Decision Making in Distributed Workflows
Pugmire, David; Kress, James; Choi, Jong; ...
2016-08-04
Data driven science is becoming increasingly more common, complex, and is placing tremendous stresses on visualization and analysis frameworks. Data sources producing 10GB per second (and more) are becoming increasingly commonplace in both simulation, sensor and experimental sciences. These data sources, which are often distributed around the world, must be analyzed by teams of scientists that are also distributed. Enabling scientists to view, query and interact with such large volumes of data in near-real-time requires a rich fusion of visualization and analysis techniques, middleware and workflow systems. Here, this paper discusses initial research into visualization and analysis of distributed datamore » workflows that enables scientists to make near-real-time decisions of large volumes of time varying data.« less
Sreedharan, Vipin T; Schultheiss, Sebastian J; Jean, Géraldine; Kahles, André; Bohnert, Regina; Drewe, Philipp; Mudrakarta, Pramod; Görnitz, Nico; Zeller, Georg; Rätsch, Gunnar
2014-05-01
We present Oqtans, an open-source workbench for quantitative transcriptome analysis, that is integrated in Galaxy. Its distinguishing features include customizable computational workflows and a modular pipeline architecture that facilitates comparative assessment of tool and data quality. Oqtans integrates an assortment of machine learning-powered tools into Galaxy, which show superior or equal performance to state-of-the-art tools. Implemented tools comprise a complete transcriptome analysis workflow: short-read alignment, transcript identification/quantification and differential expression analysis. Oqtans and Galaxy facilitate persistent storage, data exchange and documentation of intermediate results and analysis workflows. We illustrate how Oqtans aids the interpretation of data from different experiments in easy to understand use cases. Users can easily create their own workflows and extend Oqtans by integrating specific tools. Oqtans is available as (i) a cloud machine image with a demo instance at cloud.oqtans.org, (ii) a public Galaxy instance at galaxy.cbio.mskcc.org, (iii) a git repository containing all installed software (oqtans.org/git); most of which is also available from (iv) the Galaxy Toolshed and (v) a share string to use along with Galaxy CloudMan.
Davidson, Robert L; Weber, Ralf J M; Liu, Haoyu; Sharma-Oates, Archana; Viant, Mark R
2016-01-01
Metabolomics is increasingly recognized as an invaluable tool in the biological, medical and environmental sciences yet lags behind the methodological maturity of other omics fields. To achieve its full potential, including the integration of multiple omics modalities, the accessibility, standardization and reproducibility of computational metabolomics tools must be improved significantly. Here we present our end-to-end mass spectrometry metabolomics workflow in the widely used platform, Galaxy. Named Galaxy-M, our workflow has been developed for both direct infusion mass spectrometry (DIMS) and liquid chromatography mass spectrometry (LC-MS) metabolomics. The range of tools presented spans from processing of raw data, e.g. peak picking and alignment, through data cleansing, e.g. missing value imputation, to preparation for statistical analysis, e.g. normalization and scaling, and principal components analysis (PCA) with associated statistical evaluation. We demonstrate the ease of using these Galaxy workflows via the analysis of DIMS and LC-MS datasets, and provide PCA scores and associated statistics to help other users to ensure that they can accurately repeat the processing and analysis of these two datasets. Galaxy and data are all provided pre-installed in a virtual machine (VM) that can be downloaded from the GigaDB repository. Additionally, source code, executables and installation instructions are available from GitHub. The Galaxy platform has enabled us to produce an easily accessible and reproducible computational metabolomics workflow. More tools could be added by the community to expand its functionality. We recommend that Galaxy-M workflow files are included within the supplementary information of publications, enabling metabolomics studies to achieve greater reproducibility.
An architecture model for multiple disease management information systems.
Chen, Lichin; Yu, Hui-Chu; Li, Hao-Chun; Wang, Yi-Van; Chen, Huang-Jen; Wang, I-Ching; Wang, Chiou-Shiang; Peng, Hui-Yu; Hsu, Yu-Ling; Chen, Chi-Huang; Chuang, Lee-Ming; Lee, Hung-Chang; Chung, Yufang; Lai, Feipei
2013-04-01
Disease management is a program which attempts to overcome the fragmentation of healthcare system and improve the quality of care. Many studies have proven the effectiveness of disease management. However, the case managers were spending the majority of time in documentation, coordinating the members of the care team. They need a tool to support them with daily practice and optimizing the inefficient workflow. Several discussions have indicated that information technology plays an important role in the era of disease management. Whereas applications have been developed, it is inefficient to develop information system for each disease management program individually. The aim of this research is to support the work of disease management, reform the inefficient workflow, and propose an architecture model that enhance on the reusability and time saving of information system development. The proposed architecture model had been successfully implemented into two disease management information system, and the result was evaluated through reusability analysis, time consumed analysis, pre- and post-implement workflow analysis, and user questionnaire survey. The reusability of the proposed model was high, less than half of the time was consumed, and the workflow had been improved. The overall user aspect is positive. The supportiveness during daily workflow is high. The system empowers the case managers with better information and leads to better decision making.
A Workflow to Improve the Alignment of Prostate Imaging with Whole-mount Histopathology.
Yamamoto, Hidekazu; Nir, Dror; Vyas, Lona; Chang, Richard T; Popert, Rick; Cahill, Declan; Challacombe, Ben; Dasgupta, Prokar; Chandra, Ashish
2014-08-01
Evaluation of prostate imaging tests against whole-mount histology specimens requires accurate alignment between radiologic and histologic data sets. Misalignment results in false-positive and -negative zones as assessed by imaging. We describe a workflow for three-dimensional alignment of prostate imaging data against whole-mount prostatectomy reference specimens and assess its performance against a standard workflow. Ethical approval was granted. Patients underwent motorized transrectal ultrasound (Prostate Histoscanning) to generate a three-dimensional image of the prostate before radical prostatectomy. The test workflow incorporated steps for axial alignment between imaging and histology, size adjustments following formalin fixation, and use of custom-made parallel cutters and digital caliper instruments. The control workflow comprised freehand cutting and assumed homogeneous block thicknesses at the same relative angles between pathology and imaging sections. Thirty radical prostatectomy specimens were histologically and radiologically processed, either by an alignment-optimized workflow (n = 20) or a control workflow (n = 10). The optimized workflow generated tissue blocks of heterogeneous thicknesses but with no significant drifting in the cutting plane. The control workflow resulted in significantly nonparallel blocks, accurately matching only one out of four histology blocks to their respective imaging data. The image-to-histology alignment accuracy was 20% greater in the optimized workflow (P < .0001), with higher sensitivity (85% vs. 69%) and specificity (94% vs. 73%) for margin prediction in a 5 × 5-mm grid analysis. A significantly better alignment was observed in the optimized workflow. Evaluation of prostate imaging biomarkers using whole-mount histology references should include a test-to-reference spatial alignment workflow. Copyright © 2014 AUR. Published by Elsevier Inc. All rights reserved.
Optimization of tomographic reconstruction workflows on geographically distributed resources
Bicer, Tekin; Gursoy, Doga; Kettimuthu, Rajkumar; ...
2016-01-01
New technological advancements in synchrotron light sources enable data acquisitions at unprecedented levels. This emergent trend affects not only the size of the generated data but also the need for larger computational resources. Although beamline scientists and users have access to local computational resources, these are typically limited and can result in extended execution times. Applications that are based on iterative processing as in tomographic reconstruction methods require high-performance compute clusters for timely analysis of data. Here, time-sensitive analysis and processing of Advanced Photon Source data on geographically distributed resources are focused on. Two main challenges are considered: (i) modelingmore » of the performance of tomographic reconstruction workflows and (ii) transparent execution of these workflows on distributed resources. For the former, three main stages are considered: (i) data transfer between storage and computational resources, (i) wait/queue time of reconstruction jobs at compute resources, and (iii) computation of reconstruction tasks. These performance models allow evaluation and estimation of the execution time of any given iterative tomographic reconstruction workflow that runs on geographically distributed resources. For the latter challenge, a workflow management system is built, which can automate the execution of workflows and minimize the user interaction with the underlying infrastructure. The system utilizes Globus to perform secure and efficient data transfer operations. The proposed models and the workflow management system are evaluated by using three high-performance computing and two storage resources, all of which are geographically distributed. Workflows were created with different computational requirements using two compute-intensive tomographic reconstruction algorithms. Experimental evaluation shows that the proposed models and system can be used for selecting the optimum resources, which in turn can provide up to 3.13× speedup (on experimented resources). Furthermore, the error rates of the models range between 2.1 and 23.3% (considering workflow execution times), where the accuracy of the model estimations increases with higher computational demands in reconstruction tasks.« less
Optimization of tomographic reconstruction workflows on geographically distributed resources
Bicer, Tekin; Gürsoy, Doǧa; Kettimuthu, Rajkumar; De Carlo, Francesco; Foster, Ian T.
2016-01-01
New technological advancements in synchrotron light sources enable data acquisitions at unprecedented levels. This emergent trend affects not only the size of the generated data but also the need for larger computational resources. Although beamline scientists and users have access to local computational resources, these are typically limited and can result in extended execution times. Applications that are based on iterative processing as in tomographic reconstruction methods require high-performance compute clusters for timely analysis of data. Here, time-sensitive analysis and processing of Advanced Photon Source data on geographically distributed resources are focused on. Two main challenges are considered: (i) modeling of the performance of tomographic reconstruction workflows and (ii) transparent execution of these workflows on distributed resources. For the former, three main stages are considered: (i) data transfer between storage and computational resources, (i) wait/queue time of reconstruction jobs at compute resources, and (iii) computation of reconstruction tasks. These performance models allow evaluation and estimation of the execution time of any given iterative tomographic reconstruction workflow that runs on geographically distributed resources. For the latter challenge, a workflow management system is built, which can automate the execution of workflows and minimize the user interaction with the underlying infrastructure. The system utilizes Globus to perform secure and efficient data transfer operations. The proposed models and the workflow management system are evaluated by using three high-performance computing and two storage resources, all of which are geographically distributed. Workflows were created with different computational requirements using two compute-intensive tomographic reconstruction algorithms. Experimental evaluation shows that the proposed models and system can be used for selecting the optimum resources, which in turn can provide up to 3.13× speedup (on experimented resources). Moreover, the error rates of the models range between 2.1 and 23.3% (considering workflow execution times), where the accuracy of the model estimations increases with higher computational demands in reconstruction tasks. PMID:27359149
Optimization of tomographic reconstruction workflows on geographically distributed resources
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bicer, Tekin; Gursoy, Doga; Kettimuthu, Rajkumar
New technological advancements in synchrotron light sources enable data acquisitions at unprecedented levels. This emergent trend affects not only the size of the generated data but also the need for larger computational resources. Although beamline scientists and users have access to local computational resources, these are typically limited and can result in extended execution times. Applications that are based on iterative processing as in tomographic reconstruction methods require high-performance compute clusters for timely analysis of data. Here, time-sensitive analysis and processing of Advanced Photon Source data on geographically distributed resources are focused on. Two main challenges are considered: (i) modelingmore » of the performance of tomographic reconstruction workflows and (ii) transparent execution of these workflows on distributed resources. For the former, three main stages are considered: (i) data transfer between storage and computational resources, (i) wait/queue time of reconstruction jobs at compute resources, and (iii) computation of reconstruction tasks. These performance models allow evaluation and estimation of the execution time of any given iterative tomographic reconstruction workflow that runs on geographically distributed resources. For the latter challenge, a workflow management system is built, which can automate the execution of workflows and minimize the user interaction with the underlying infrastructure. The system utilizes Globus to perform secure and efficient data transfer operations. The proposed models and the workflow management system are evaluated by using three high-performance computing and two storage resources, all of which are geographically distributed. Workflows were created with different computational requirements using two compute-intensive tomographic reconstruction algorithms. Experimental evaluation shows that the proposed models and system can be used for selecting the optimum resources, which in turn can provide up to 3.13× speedup (on experimented resources). Furthermore, the error rates of the models range between 2.1 and 23.3% (considering workflow execution times), where the accuracy of the model estimations increases with higher computational demands in reconstruction tasks.« less
Observing System Simulation Experiment (OSSE) for the HyspIRI Spectrometer Mission
NASA Technical Reports Server (NTRS)
Turmon, Michael J.; Block, Gary L.; Green, Robert O.; Hua, Hook; Jacob, Joseph C.; Sobel, Harold R.; Springer, Paul L.; Zhang, Qingyuan
2010-01-01
The OSSE software provides an integrated end-to-end environment to simulate an Earth observing system by iteratively running a distributed modeling workflow based on the HyspIRI Mission, including atmospheric radiative transfer, surface albedo effects, detection, and retrieval for agile exploration of the mission design space. The software enables an Observing System Simulation Experiment (OSSE) and can be used for design trade space exploration of science return for proposed instruments by modeling the whole ground truth, sensing, and retrieval chain and to assess retrieval accuracy for a particular instrument and algorithm design. The OSSE in fra struc ture is extensible to future National Research Council (NRC) Decadal Survey concept missions where integrated modeling can improve the fidelity of coupled science and engineering analyses for systematic analysis and science return studies. This software has a distributed architecture that gives it a distinct advantage over other similar efforts. The workflow modeling components are typically legacy computer programs implemented in a variety of programming languages, including MATLAB, Excel, and FORTRAN. Integration of these diverse components is difficult and time-consuming. In order to hide this complexity, each modeling component is wrapped as a Web Service, and each component is able to pass analysis parameterizations, such as reflectance or radiance spectra, on to the next component downstream in the service workflow chain. In this way, the interface to each modeling component becomes uniform and the entire end-to-end workflow can be run using any existing or custom workflow processing engine. The architecture lets users extend workflows as new modeling components become available, chain together the components using any existing or custom workflow processing engine, and distribute them across any Internet-accessible Web Service endpoints. The workflow components can be hosted on any Internet-accessible machine. This has the advantages that the computations can be distributed to make best use of the available computing resources, and each workflow component can be hosted and maintained by their respective domain experts.
AstroGrid: Taverna in the Virtual Observatory .
NASA Astrophysics Data System (ADS)
Benson, K. M.; Walton, N. A.
This paper reports on the implementation of the Taverna workbench by AstroGrid, a tool for designing and executing workflows of tasks in the Virtual Observatory. The workflow approach helps astronomers perform complex task sequences with little technical effort. Visual approach to workflow construction streamlines highly complex analysis over public and private data and uses computational resources as minimal as a desktop computer. Some integration issues and future work are discussed in this article.
NASA Astrophysics Data System (ADS)
Ferreira da Silva, R.; Filgueira, R.; Deelman, E.; Atkinson, M.
2016-12-01
We present Asterism, an open source data-intensive framework, which combines the Pegasus and dispel4py workflow systems. Asterism aims to simplify the effort required to develop data-intensive applications that run across multiple heterogeneous resources, without users having to: re-formulate their methods according to different enactment systems; manage the data distribution across systems; parallelize their methods; co-place and schedule their methods with computing resources; and store and transfer large/small volumes of data. Asterism's key element is to leverage the strengths of each workflow system: dispel4py allows developing scientific applications locally and then automatically parallelize and scale them on a wide range of HPC infrastructures with no changes to the application's code; Pegasus orchestrates the distributed execution of applications while providing portability, automated data management, recovery, debugging, and monitoring, without users needing to worry about the particulars of the target execution systems. Asterism leverages the level of abstractions provided by each workflow system to describe hybrid workflows where no information about the underlying infrastructure is required beforehand. The feasibility of Asterism has been evaluated using the seismic ambient noise cross-correlation application, a common data-intensive analysis pattern used by many seismologists. The application preprocesses (Phase1) and cross-correlates (Phase2) traces from several seismic stations. The Asterism workflow is implemented as a Pegasus workflow composed of two tasks (Phase1 and Phase2), where each phase represents a dispel4py workflow. Pegasus tasks describe the in/output data at a logical level, the data dependency between tasks, and the e-Infrastructures and the execution engine to run each dispel4py workflow. We have instantiated the workflow using data from 1000 stations from the IRIS services, and run it across two heterogeneous resources described as Docker containers: MPI (Container2) and Storm (Container3) clusters (Figure 1). Each dispel4py workflow is mapped to a particular execution engine, and data transfers between resources are automatically handled by Pegasus. Asterism is freely available online at http://github.com/dispel4py/pegasus_dispel4py.
Workflow4Metabolomics: a collaborative research infrastructure for computational metabolomics
Giacomoni, Franck; Le Corguillé, Gildas; Monsoor, Misharl; Landi, Marion; Pericard, Pierre; Pétéra, Mélanie; Duperier, Christophe; Tremblay-Franco, Marie; Martin, Jean-François; Jacob, Daniel; Goulitquer, Sophie; Thévenot, Etienne A.; Caron, Christophe
2015-01-01
Summary: The complex, rapidly evolving field of computational metabolomics calls for collaborative infrastructures where the large volume of new algorithms for data pre-processing, statistical analysis and annotation can be readily integrated whatever the language, evaluated on reference datasets and chained to build ad hoc workflows for users. We have developed Workflow4Metabolomics (W4M), the first fully open-source and collaborative online platform for computational metabolomics. W4M is a virtual research environment built upon the Galaxy web-based platform technology. It enables ergonomic integration, exchange and running of individual modules and workflows. Alternatively, the whole W4M framework and computational tools can be downloaded as a virtual machine for local installation. Availability and implementation: http://workflow4metabolomics.org homepage enables users to open a private account and access the infrastructure. W4M is developed and maintained by the French Bioinformatics Institute (IFB) and the French Metabolomics and Fluxomics Infrastructure (MetaboHUB). Contact: contact@workflow4metabolomics.org PMID:25527831
Workflow4Metabolomics: a collaborative research infrastructure for computational metabolomics.
Giacomoni, Franck; Le Corguillé, Gildas; Monsoor, Misharl; Landi, Marion; Pericard, Pierre; Pétéra, Mélanie; Duperier, Christophe; Tremblay-Franco, Marie; Martin, Jean-François; Jacob, Daniel; Goulitquer, Sophie; Thévenot, Etienne A; Caron, Christophe
2015-05-01
The complex, rapidly evolving field of computational metabolomics calls for collaborative infrastructures where the large volume of new algorithms for data pre-processing, statistical analysis and annotation can be readily integrated whatever the language, evaluated on reference datasets and chained to build ad hoc workflows for users. We have developed Workflow4Metabolomics (W4M), the first fully open-source and collaborative online platform for computational metabolomics. W4M is a virtual research environment built upon the Galaxy web-based platform technology. It enables ergonomic integration, exchange and running of individual modules and workflows. Alternatively, the whole W4M framework and computational tools can be downloaded as a virtual machine for local installation. http://workflow4metabolomics.org homepage enables users to open a private account and access the infrastructure. W4M is developed and maintained by the French Bioinformatics Institute (IFB) and the French Metabolomics and Fluxomics Infrastructure (MetaboHUB). contact@workflow4metabolomics.org. © The Author 2014. Published by Oxford University Press.
Performance Analysis Tool for HPC and Big Data Applications on Scientific Clusters
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yoo, Wucherl; Koo, Michelle; Cao, Yu
Big data is prevalent in HPC computing. Many HPC projects rely on complex workflows to analyze terabytes or petabytes of data. These workflows often require running over thousands of CPU cores and performing simultaneous data accesses, data movements, and computation. It is challenging to analyze the performance involving terabytes or petabytes of workflow data or measurement data of the executions, from complex workflows over a large number of nodes and multiple parallel task executions. To help identify performance bottlenecks or debug the performance issues in large-scale scientific applications and scientific clusters, we have developed a performance analysis framework, using state-ofthe-more » art open-source big data processing tools. Our tool can ingest system logs and application performance measurements to extract key performance features, and apply the most sophisticated statistical tools and data mining methods on the performance data. It utilizes an efficient data processing engine to allow users to interactively analyze a large amount of different types of logs and measurements. To illustrate the functionality of the big data analysis framework, we conduct case studies on the workflows from an astronomy project known as the Palomar Transient Factory (PTF) and the job logs from the genome analysis scientific cluster. Our study processed many terabytes of system logs and application performance measurements collected on the HPC systems at NERSC. The implementation of our tool is generic enough to be used for analyzing the performance of other HPC systems and Big Data workows.« less
2016-06-01
unlimited. v List of Tables Table 1 Single-lap-joint experimental parameters ..............................................7 Table 2 Survey ...Joints: Experimental and Workflow Protocols by Robert E Jensen, Daniel C DeSchepper, and David P Flanagan Approved for...TR-7696 ● JUNE 2016 US Army Research Laboratory Multivariate Analysis of High Through-Put Adhesively Bonded Single Lap Joints: Experimental
CMS Configuration Editor: GUI based application for user analysis job
NASA Astrophysics Data System (ADS)
de Cosa, A.
2011-12-01
We present the user interface and the software architecture of the Configuration Editor for the CMS experiment. The analysis workflow is organized in a modular way integrated within the CMS framework that organizes in a flexible way user analysis code. The Python scripting language is adopted to define the job configuration that drives the analysis workflow. It could be a challenging task for users, especially for newcomers, to develop analysis jobs managing the configuration of many required modules. For this reason a graphical tool has been conceived in order to edit and inspect configuration files. A set of common analysis tools defined in the CMS Physics Analysis Toolkit (PAT) can be steered and configured using the Config Editor. A user-defined analysis workflow can be produced starting from a standard configuration file, applying and configuring PAT tools according to the specific user requirements. CMS users can adopt this tool, the Config Editor, to create their analysis visualizing in real time which are the effects of their actions. They can visualize the structure of their configuration, look at the modules included in the workflow, inspect the dependences existing among the modules and check the data flow. They can visualize at which values parameters are set and change them according to what is required by their analysis task. The integration of common tools in the GUI needed to adopt an object-oriented structure in the Python definition of the PAT tools and the definition of a layer of abstraction from which all PAT tools inherit.
Using lean methodology to improve productivity in a hospital oncology pharmacy.
Sullivan, Peter; Soefje, Scott; Reinhart, David; McGeary, Catherine; Cabie, Eric D
2014-09-01
Quality improvements achieved by a hospital pharmacy through the use of lean methodology to guide i.v. compounding workflow changes are described. The outpatient oncology pharmacy of Yale-New Haven Hospital conducted a quality-improvement initiative to identify and implement workflow changes to support a major expansion of chemotherapy services. Applying concepts of lean methodology (i.e., elimination of non-value-added steps and waste in the production process), the pharmacy team performed a failure mode and effects analysis, workflow mapping, and impact analysis; staff pharmacists and pharmacy technicians identified 38 opportunities to decrease waste and increase efficiency. Three workflow processes (order verification, compounding, and delivery) accounted for 24 of 38 recommendations and were targeted for lean process improvements. The workflow was decreased to 14 steps, eliminating 6 non-value-added steps, and pharmacy staff resources and schedules were realigned with the streamlined workflow. The time required for pharmacist verification of patient-specific oncology orders was decreased by 33%; the time required for product verification was decreased by 52%. The average medication delivery time was decreased by 47%. The results of baseline and postimplementation time trials indicated a decrease in overall turnaround time to about 70 minutes, compared with a baseline time of about 90 minutes. The use of lean methodology to identify non-value-added steps in oncology order processing and the implementation of staff-recommended workflow changes resulted in an overall reduction in the turnaround time per dose. Copyright © 2014 by the American Society of Health-System Pharmacists, Inc. All rights reserved.
NASA Astrophysics Data System (ADS)
Suftin, I.; Read, J. S.; Walker, J.
2013-12-01
Scientists prefer not having to be tied down to a specific machine or operating system in order to analyze local and remote data sets or publish work. Increasingly, analysis has been migrating to decentralized web services and data sets, using web clients to provide the analysis interface. While simplifying workflow access, analysis, and publishing of data, the move does bring with it its own unique set of issues. Web clients used for analysis typically offer workflows geared towards a single user, with steps and results that are often difficult to recreate and share with others. Furthermore, workflow results often may not be easily used as input for further analysis. Older browsers further complicate things by having no way to maintain larger chunks of information, often offloading the job of storage to the back-end server or trying to squeeze it into a cookie. It has been difficult to provide a concept of "session storage" or "workflow sharing" without a complex orchestration of the back-end for storage depending on either a centralized file system or database. With the advent of HTML5, browsers gained the ability to store more information through the use of the Web Storage API (a browser-cookie holds a maximum of 4 kilobytes). Web Storage gives us the ability to store megabytes of arbitrary data in-browser either with an expiration date or just for a session. This allows scientists to create, update, persist and share their workflow without depending on the backend to store session information, providing the flexibility for new web-based workflows to emerge. In the DSASWeb portal ( http://cida.usgs.gov/DSASweb/ ), using these techniques, the representation of every step in the analyst's workflow is stored as plain-text serialized JSON, which we can generate as a text file and provide to the analyst as an upload. This file may then be shared with others and loaded back into the application, restoring the application to the state it was in when the session file was generated. A user may then view results produced during that session or go back and alter input parameters, creating new results and producing new, unique sessions which they can then again share. This technique not only provides independence for the user to manage their session as they like, but also allows much greater freedom for the application provider to scale out without having to worry about carrying over user information or maintaining it in a central location.
Explore Care Pathways of Colorectal Cancer Patients with Social Network Analysis.
Huo, Tianyao; George, Thomas J; Guo, Yi; He, Zhe; Prosperi, Mattia; Modave, François; Bian, Jiang
2017-01-01
Patients with colorectal cancer (CRC) often face treatment delays and the exact reasons have not been well studied. This study is to explore clinical workflow patterns for CRC patients using electronic health records (EHR). In particular, we modeled the clinical workflow (provider-provider interactions) of a CRC patient's workup period as a social network, and identified clusters of workflow patterns based on network characteristics. Understanding of these patterns will help guide healthcare policy-making and practice.
Task-technology fit of video telehealth for nurses in an outpatient clinic setting.
Cady, Rhonda G; Finkelstein, Stanley M
2014-07-01
Incorporating telehealth into outpatient care delivery supports management of consumer health between clinic visits. Task-technology fit is a framework for understanding how technology helps and/or hinders a person during work processes. Evaluating the task-technology fit of video telehealth for personnel working in a pediatric outpatient clinic and providing care between clinic visits ensures the information provided matches the information needed to support work processes. The workflow of advanced practice registered nurse (APRN) care coordination provided via telephone and video telehealth was described and measured using a mixed-methods workflow analysis protocol that incorporated cognitive ethnography and time-motion study. Qualitative and quantitative results were merged and analyzed within the task-technology fit framework to determine the workflow fit of video telehealth for APRN care coordination. Incorporating video telehealth into APRN care coordination workflow provided visual information unavailable during telephone interactions. Despite additional tasks and interactions needed to obtain the visual information, APRN workflow efficiency, as measured by time, was not significantly changed. Analyzed within the task-technology fit framework, the increased visual information afforded by video telehealth supported the assessment and diagnostic information needs of the APRN. Telehealth must provide the right information to the right clinician at the right time. Evaluating task-technology fit using a mixed-methods protocol ensured rigorous analysis of fit within work processes and identified workflows that benefit most from the technology.
CyREST: Turbocharging Cytoscape Access for External Tools via a RESTful API.
Ono, Keiichiro; Muetze, Tanja; Kolishovski, Georgi; Shannon, Paul; Demchak, Barry
2015-01-01
As bioinformatic workflows become increasingly complex and involve multiple specialized tools, so does the difficulty of reliably reproducing those workflows. Cytoscape is a critical workflow component for executing network visualization, analysis, and publishing tasks, but it can be operated only manually via a point-and-click user interface. Consequently, Cytoscape-oriented tasks are laborious and often error prone, especially with multistep protocols involving many networks. In this paper, we present the new cyREST Cytoscape app and accompanying harmonization libraries. Together, they improve workflow reproducibility and researcher productivity by enabling popular languages (e.g., Python and R, JavaScript, and C#) and tools (e.g., IPython/Jupyter Notebook and RStudio) to directly define and query networks, and perform network analysis, layouts and renderings. We describe cyREST's API and overall construction, and present Python- and R-based examples that illustrate how Cytoscape can be integrated into large scale data analysis pipelines. cyREST is available in the Cytoscape app store (http://apps.cytoscape.org) where it has been downloaded over 1900 times since its release in late 2014.
Efficient Workflows for Curation of Heterogeneous Data Supporting Modeling of U-Nb Alloy Aging
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ward, Logan Timothy; Hackenberg, Robert Errol
These are slides from a presentation summarizing a graduate research associate's summer project. The following topics are covered in these slides: data challenges in materials, aging in U-Nb Alloys, Building an Aging Model, Different Phase Trans. in U-Nb, the Challenge, Storing Materials Data, Example Data Source, Organizing Data: What is a Schema?, What does a "XML Schema" look like?, Our Data Schema: Nice and Simple, Storing Data: Materials Data Curation System (MDCS), Problem with MDCS: Slow Data Entry, Getting Literature into MDCS, Staging Data in Excel Document, Final Result: MDCS Records, Analyzing Image Data, Process for Making TTT Diagram, Bottleneckmore » Number 1: Image Analysis, Fitting a TTP Boundary, Fitting a TTP Curve: Comparable Results, How Does it Compare to Our Data?, Image Analysis Workflow, Curating Hardness Records, Hardness Data: Two Key Decisions, Before Peak Age? - Automation, Interactive Viz, Which Transformation?, Microstructure-Informed Model, Tracking the Entire Process, General Problem with Property Models, Pinyon: Toolkit for Managing Model Creation, Tracking Individual Decisions, Jupyter: Docs and Code in One File, Hardness Analysis Workflow, Workflow for Aging Models, and conclusions.« less
Scalable and cost-effective NGS genotyping in the cloud.
Souilmi, Yassine; Lancaster, Alex K; Jung, Jae-Yoon; Rizzo, Ettore; Hawkins, Jared B; Powles, Ryan; Amzazi, Saaïd; Ghazal, Hassan; Tonellato, Peter J; Wall, Dennis P
2015-10-15
While next-generation sequencing (NGS) costs have plummeted in recent years, cost and complexity of computation remain substantial barriers to the use of NGS in routine clinical care. The clinical potential of NGS will not be realized until robust and routine whole genome sequencing data can be accurately rendered to medically actionable reports within a time window of hours and at scales of economy in the 10's of dollars. We take a step towards addressing this challenge, by using COSMOS, a cloud-enabled workflow management system, to develop GenomeKey, an NGS whole genome analysis workflow. COSMOS implements complex workflows making optimal use of high-performance compute clusters. Here we show that the Amazon Web Service (AWS) implementation of GenomeKey via COSMOS provides a fast, scalable, and cost-effective analysis of both public benchmarking and large-scale heterogeneous clinical NGS datasets. Our systematic benchmarking reveals important new insights and considerations to produce clinical turn-around of whole genome analysis optimization and workflow management including strategic batching of individual genomes and efficient cluster resource configuration.
CyREST: Turbocharging Cytoscape Access for External Tools via a RESTful API
Ono, Keiichiro; Muetze, Tanja; Kolishovski, Georgi; Shannon, Paul; Demchak, Barry
2015-01-01
As bioinformatic workflows become increasingly complex and involve multiple specialized tools, so does the difficulty of reliably reproducing those workflows. Cytoscape is a critical workflow component for executing network visualization, analysis, and publishing tasks, but it can be operated only manually via a point-and-click user interface. Consequently, Cytoscape-oriented tasks are laborious and often error prone, especially with multistep protocols involving many networks. In this paper, we present the new cyREST Cytoscape app and accompanying harmonization libraries. Together, they improve workflow reproducibility and researcher productivity by enabling popular languages (e.g., Python and R, JavaScript, and C#) and tools (e.g., IPython/Jupyter Notebook and RStudio) to directly define and query networks, and perform network analysis, layouts and renderings. We describe cyREST’s API and overall construction, and present Python- and R-based examples that illustrate how Cytoscape can be integrated into large scale data analysis pipelines. cyREST is available in the Cytoscape app store (http://apps.cytoscape.org) where it has been downloaded over 1900 times since its release in late 2014. PMID:26672762
Structuring clinical workflows for diabetes care: an overview of the OntoHealth approach.
Schweitzer, M; Lasierra, N; Oberbichler, S; Toma, I; Fensel, A; Hoerbst, A
2014-01-01
Electronic health records (EHRs) play an important role in the treatment of chronic diseases such as diabetes mellitus. Although the interoperability and selected functionality of EHRs are already addressed by a number of standards and best practices, such as IHE or HL7, the majority of these systems are still monolithic from a user-functionality perspective. The purpose of the OntoHealth project is to foster a functionally flexible, standards-based use of EHRs to support clinical routine task execution by means of workflow patterns and to shift the present EHR usage to a more comprehensive integration concerning complete clinical workflows. The goal of this paper is, first, to introduce the basic architecture of the proposed OntoHealth project and, second, to present selected functional needs and a functional categorization regarding workflow-based interactions with EHRs in the domain of diabetes. A systematic literature review regarding attributes of workflows in the domain of diabetes was conducted. Eligible references were gathered and analyzed using a qualitative content analysis. Subsequently, a functional workflow categorization was derived from diabetes-specific raw data together with existing general workflow patterns. This paper presents the design of the architecture as well as a categorization model which makes it possible to describe the components or building blocks within clinical workflows. The results of our study lead us to identify basic building blocks, named as actions, decisions, and data elements, which allow the composition of clinical workflows within five identified contexts. The categorization model allows for a description of the components or building blocks of clinical workflows from a functional view.
Structuring Clinical Workflows for Diabetes Care
Lasierra, N.; Oberbichler, S.; Toma, I.; Fensel, A.; Hoerbst, A.
2014-01-01
Summary Background Electronic health records (EHRs) play an important role in the treatment of chronic diseases such as diabetes mellitus. Although the interoperability and selected functionality of EHRs are already addressed by a number of standards and best practices, such as IHE or HL7, the majority of these systems are still monolithic from a user-functionality perspective. The purpose of the OntoHealth project is to foster a functionally flexible, standards-based use of EHRs to support clinical routine task execution by means of workflow patterns and to shift the present EHR usage to a more comprehensive integration concerning complete clinical workflows. Objectives The goal of this paper is, first, to introduce the basic architecture of the proposed OntoHealth project and, second, to present selected functional needs and a functional categorization regarding workflow-based interactions with EHRs in the domain of diabetes. Methods A systematic literature review regarding attributes of workflows in the domain of diabetes was conducted. Eligible references were gathered and analyzed using a qualitative content analysis. Subsequently, a functional workflow categorization was derived from diabetes-specific raw data together with existing general workflow patterns. Results This paper presents the design of the architecture as well as a categorization model which makes it possible to describe the components or building blocks within clinical workflows. The results of our study lead us to identify basic building blocks, named as actions, decisions, and data elements, which allow the composition of clinical workflows within five identified contexts. Conclusions The categorization model allows for a description of the components or building blocks of clinical workflows from a functional view. PMID:25024765
Muirhead, David; Aoun, Patricia; Powell, Michael; Juncker, Flemming; Mollerup, Jens
2010-08-01
The need for higher efficiency, maximum quality, and faster turnaround time is a continuous focus for anatomic pathology laboratories and drives changes in work scheduling, instrumentation, and management control systems. To determine the costs of generating routine, special, and immunohistochemical microscopic slides in a large, academic anatomic pathology laboratory using a top-down approach. The Pathology Economic Model Tool was used to analyze workflow processes at The Nebraska Medical Center's anatomic pathology laboratory. Data from the analysis were used to generate complete cost estimates, which included not only materials, consumables, and instrumentation but also specific labor and overhead components for each of the laboratory's subareas. The cost data generated by the Pathology Economic Model Tool were compared with the cost estimates generated using relative value units. Despite the use of automated systems for different processes, the workflow in the laboratory was found to be relatively labor intensive. The effect of labor and overhead on per-slide costs was significantly underestimated by traditional relative-value unit calculations when compared with the Pathology Economic Model Tool. Specific workflow defects with significant contributions to the cost per slide were identified. The cost of providing routine, special, and immunohistochemical slides may be significantly underestimated by traditional methods that rely on relative value units. Furthermore, a comprehensive analysis may identify specific workflow processes requiring improvement.
NeuroManager: a workflow analysis based simulation management engine for computational neuroscience
Stockton, David B.; Santamaria, Fidel
2015-01-01
We developed NeuroManager, an object-oriented simulation management software engine for computational neuroscience. NeuroManager automates the workflow of simulation job submissions when using heterogeneous computational resources, simulators, and simulation tasks. The object-oriented approach (1) provides flexibility to adapt to a variety of neuroscience simulators, (2) simplifies the use of heterogeneous computational resources, from desktops to super computer clusters, and (3) improves tracking of simulator/simulation evolution. We implemented NeuroManager in MATLAB, a widely used engineering and scientific language, for its signal and image processing tools, prevalence in electrophysiology analysis, and increasing use in college Biology education. To design and develop NeuroManager we analyzed the workflow of simulation submission for a variety of simulators, operating systems, and computational resources, including the handling of input parameters, data, models, results, and analyses. This resulted in 22 stages of simulation submission workflow. The software incorporates progress notification, automatic organization, labeling, and time-stamping of data and results, and integrated access to MATLAB's analysis and visualization tools. NeuroManager provides users with the tools to automate daily tasks, and assists principal investigators in tracking and recreating the evolution of research projects performed by multiple people. Overall, NeuroManager provides the infrastructure needed to improve workflow, manage multiple simultaneous simulations, and maintain provenance of the potentially large amounts of data produced during the course of a research project. PMID:26528175
Application of Six Sigma methodology to a diagnostic imaging process.
Taner, Mehmet Tolga; Sezen, Bulent; Atwat, Kamal M
2012-01-01
This paper aims to apply the Six Sigma methodology to improve workflow by eliminating the causes of failure in the medical imaging department of a private Turkish hospital. Implementation of the design, measure, analyse, improve and control (DMAIC) improvement cycle, workflow chart, fishbone diagrams and Pareto charts were employed, together with rigorous data collection in the department. The identification of root causes of repeat sessions and delays was followed by failure, mode and effect analysis, hazard analysis and decision tree analysis. The most frequent causes of failure were malfunction of the RIS/PACS system and improper positioning of patients. Subsequent to extensive training of professionals, the sigma level was increased from 3.5 to 4.2. The data were collected over only four months. Six Sigma's data measurement and process improvement methodology is the impetus for health care organisations to rethink their workflow and reduce malpractice. It involves measuring, recording and reporting data on a regular basis. This enables the administration to monitor workflow continuously. The improvements in the workflow under study, made by determining the failures and potential risks associated with radiologic care, will have a positive impact on society in terms of patient safety. Having eliminated repeat examinations, the risk of being exposed to more radiation was also minimised. This paper supports the need to apply Six Sigma and present an evaluation of the process in an imaging department.
NeuroManager: a workflow analysis based simulation management engine for computational neuroscience.
Stockton, David B; Santamaria, Fidel
2015-01-01
We developed NeuroManager, an object-oriented simulation management software engine for computational neuroscience. NeuroManager automates the workflow of simulation job submissions when using heterogeneous computational resources, simulators, and simulation tasks. The object-oriented approach (1) provides flexibility to adapt to a variety of neuroscience simulators, (2) simplifies the use of heterogeneous computational resources, from desktops to super computer clusters, and (3) improves tracking of simulator/simulation evolution. We implemented NeuroManager in MATLAB, a widely used engineering and scientific language, for its signal and image processing tools, prevalence in electrophysiology analysis, and increasing use in college Biology education. To design and develop NeuroManager we analyzed the workflow of simulation submission for a variety of simulators, operating systems, and computational resources, including the handling of input parameters, data, models, results, and analyses. This resulted in 22 stages of simulation submission workflow. The software incorporates progress notification, automatic organization, labeling, and time-stamping of data and results, and integrated access to MATLAB's analysis and visualization tools. NeuroManager provides users with the tools to automate daily tasks, and assists principal investigators in tracking and recreating the evolution of research projects performed by multiple people. Overall, NeuroManager provides the infrastructure needed to improve workflow, manage multiple simultaneous simulations, and maintain provenance of the potentially large amounts of data produced during the course of a research project.
NASA Astrophysics Data System (ADS)
Whang, Tom; Ratib, Osman M.; Umamoto, Kathleen; Grant, Edward G.; McCoy, Michael J.
2002-05-01
The goal of this study is to determine the financial value and workflow improvements achievable by replacing traditional transcription services with a speech recognition system in a large, university hospital setting. Workflow metrics were measured at two hospitals, one of which exclusively uses a transcription service (UCLA Medical Center), and the other which exclusively uses speech recognition (West Los Angeles VA Hospital). Workflow metrics include time spent per report (the sum of time spent interpreting, dictating, reviewing, and editing), transcription turnaround, and total report turnaround. Compared to traditional transcription, speech recognition resulted in radiologists spending 13-32% more time per report, but it also resulted in reduction of report turnaround time by 22-62% and reduction of marginal cost per report by 94%. The model developed here helps justify the introduction of a speech recognition system by showing that the benefits of reduced operating costs and decreased turnaround time outweigh the cost of increased time spent per report. Whether the ultimate goal is to achieve a financial objective or to improve operational efficiency, it is important to conduct a thorough analysis of workflow before implementation.
NASA Astrophysics Data System (ADS)
Liu, Jia; Liu, Longli; Xue, Yong; Dong, Jing; Hu, Yingcui; Hill, Richard; Guang, Jie; Li, Chi
2017-01-01
Workflow for remote sensing quantitative retrieval is the ;bridge; between Grid services and Grid-enabled application of remote sensing quantitative retrieval. Workflow averts low-level implementation details of the Grid and hence enables users to focus on higher levels of application. The workflow for remote sensing quantitative retrieval plays an important role in remote sensing Grid and Cloud computing services, which can support the modelling, construction and implementation of large-scale complicated applications of remote sensing science. The validation of workflow is important in order to support the large-scale sophisticated scientific computation processes with enhanced performance and to minimize potential waste of time and resources. To research the semantic correctness of user-defined workflows, in this paper, we propose a workflow validation method based on tacit knowledge research in the remote sensing domain. We first discuss the remote sensing model and metadata. Through detailed analysis, we then discuss the method of extracting the domain tacit knowledge and expressing the knowledge with ontology. Additionally, we construct the domain ontology with Protégé. Through our experimental study, we verify the validity of this method in two ways, namely data source consistency error validation and parameters matching error validation.
The equivalency between logic Petri workflow nets and workflow nets.
Wang, Jing; Yu, ShuXia; Du, YuYue
2015-01-01
Logic Petri nets (LPNs) can describe and analyze batch processing functions and passing value indeterminacy in cooperative systems. Logic Petri workflow nets (LPWNs) are proposed based on LPNs in this paper. Process mining is regarded as an important bridge between modeling and analysis of data mining and business process. Workflow nets (WF-nets) are the extension to Petri nets (PNs), and have successfully been used to process mining. Some shortcomings cannot be avoided in process mining, such as duplicate tasks, invisible tasks, and the noise of logs. The online shop in electronic commerce in this paper is modeled to prove the equivalence between LPWNs and WF-nets, and advantages of LPWNs are presented.
The Equivalency between Logic Petri Workflow Nets and Workflow Nets
Wang, Jing; Yu, ShuXia; Du, YuYue
2015-01-01
Logic Petri nets (LPNs) can describe and analyze batch processing functions and passing value indeterminacy in cooperative systems. Logic Petri workflow nets (LPWNs) are proposed based on LPNs in this paper. Process mining is regarded as an important bridge between modeling and analysis of data mining and business process. Workflow nets (WF-nets) are the extension to Petri nets (PNs), and have successfully been used to process mining. Some shortcomings cannot be avoided in process mining, such as duplicate tasks, invisible tasks, and the noise of logs. The online shop in electronic commerce in this paper is modeled to prove the equivalence between LPWNs and WF-nets, and advantages of LPWNs are presented. PMID:25821845
Overcoming Barriers to Technology Adoption in Small Manufacturing Enterprises (SMEs)
2003-06-01
automates quote-generation, order - processing workflow management, perform- ance analysis, and accounting functions. Ultimately, it will enable Magdic...that Magdic imple- ment an MES instead. The MES, in addition to solving the problem of document manage- ment, would automate quote-generation, order ... processing , workflow management, perform- ance analysis, and accounting functions. To help Magdic personnel learn about the MES, TIDE personnel provided
Scientific Workflow Management in Proteomics
de Bruin, Jeroen S.; Deelder, André M.; Palmblad, Magnus
2012-01-01
Data processing in proteomics can be a challenging endeavor, requiring extensive knowledge of many different software packages, all with different algorithms, data format requirements, and user interfaces. In this article we describe the integration of a number of existing programs and tools in Taverna Workbench, a scientific workflow manager currently being developed in the bioinformatics community. We demonstrate how a workflow manager provides a single, visually clear and intuitive interface to complex data analysis tasks in proteomics, from raw mass spectrometry data to protein identifications and beyond. PMID:22411703
Solutions for Mining Distributed Scientific Data
NASA Astrophysics Data System (ADS)
Lynnes, C.; Pham, L.; Graves, S.; Ramachandran, R.; Maskey, M.; Keiser, K.
2007-12-01
Researchers at the University of Alabama in Huntsville (UAH) and the Goddard Earth Sciences Data and Information Services Center (GES DISC) are working on approaches and methodologies facilitating the analysis of large amounts of distributed scientific data. Despite the existence of full-featured analysis tools, such as the Algorithm Development and Mining (ADaM) toolkit from UAH, and data repositories, such as the GES DISC, that provide online access to large amounts of data, there remain obstacles to getting the analysis tools and the data together in a workable environment. Does one bring the data to the tools or deploy the tools close to the data? The large size of many current Earth science datasets incurs significant overhead in network transfer for analysis workflows, even with the advanced networking capabilities that are available between many educational and government facilities. The UAH and GES DISC team are developing a capability to define analysis workflows using distributed services and online data resources. We are developing two solutions for this problem that address different analysis scenarios. The first is a Data Center Deployment of the analysis services for large data selections, orchestrated by a remotely defined analysis workflow. The second is a Data Mining Center approach of providing a cohesive analysis solution for smaller subsets of data. The two approaches can be complementary and thus provide flexibility for researchers to exploit the best solution for their data requirements. The Data Center Deployment of the analysis services has been implemented by deploying ADaM web services at the GES DISC so they can access the data directly, without the need of network transfers. Using the Mining Workflow Composer, a user can define an analysis workflow that is then submitted through a Web Services interface to the GES DISC for execution by a processing engine. The workflow definition is composed, maintained and executed at a distributed location, but most of the actual services comprising the workflow are available local to the GES DISC data repository. Additional refinements will ultimately provide a package that is easily implemented and configured at additional data centers for analysis of additional science data sets. Enhancements to the ADaM toolkit allow the staging of distributed data wherever the services are deployed, to support a Data Mining Center that can provide additional computational resources, large storage of output, easier addition and updates to available services, and access to data from multiple repositories. The Data Mining Center case provides researchers more flexibility to quickly try different workflow configurations and refine the process, using smaller amounts of data that may likely be transferred from distributed online repositories. This environment is sufficient for some analyses, but can also be used as an initial sandbox to test and refine a solution before staging the execution at a Data Center Deployment. Detection of airborne dust both over water and land in MODIS imagery using mining services for both solutions will be presented. The dust detection is just one possible example of the mining and analysis capabilities the proposed mining services solutions will provide to the science community. More information about the available services and the current status of this project is available at http://www.itsc.uah.edu/mws/
Rafiei, Atefeh; Sleno, Lekha
2015-01-15
Data analysis is a key step in mass spectrometry based untargeted metabolomics, starting with the generation of generic peak lists from raw liquid chromatography/mass spectrometry (LC/MS) data. Due to the use of various algorithms by different workflows, the results of different peak-picking strategies often differ widely. Raw LC/HRMS data from two types of biological samples (bile and urine), as well as a standard mixture of 84 metabolites, were processed with four peak-picking softwares: Peakview®, Markerview™, MetabolitePilot™ and XCMS Online. The overlaps between the results of each peak-generating method were then investigated. To gauge the relevance of peak lists, a database search using the METLIN online database was performed to determine which features had accurate masses matching known metabolites as well as a secondary filtering based on MS/MS spectral matching. In this study, only a small proportion of all peaks (less than 10%) were common to all four software programs. Comparison of database searching results showed peaks found uniquely by one workflow have less chance of being found in the METLIN metabolomics database and are even less likely to be confirmed by MS/MS. It was shown that the performance of peak-generating workflows has a direct impact on untargeted metabolomics results. As it was demonstrated that the peaks found in more than one peak detection workflow have higher potential to be identified by accurate mass as well as MS/MS spectrum matching, it is suggested to use the overlap of different peak-picking workflows as preliminary peak lists for more rugged statistical analysis in global metabolomics investigations. Copyright © 2014 John Wiley & Sons, Ltd.
A framework for service enterprise workflow simulation with multi-agents cooperation
NASA Astrophysics Data System (ADS)
Tan, Wenan; Xu, Wei; Yang, Fujun; Xu, Lida; Jiang, Chuanqun
2013-11-01
Process dynamic modelling for service business is the key technique for Service-Oriented information systems and service business management, and the workflow model of business processes is the core part of service systems. Service business workflow simulation is the prevalent approach to be used for analysis of service business process dynamically. Generic method for service business workflow simulation is based on the discrete event queuing theory, which is lack of flexibility and scalability. In this paper, we propose a service workflow-oriented framework for the process simulation of service businesses using multi-agent cooperation to address the above issues. Social rationality of agent is introduced into the proposed framework. Adopting rationality as one social factor for decision-making strategies, a flexible scheduling for activity instances has been implemented. A system prototype has been developed to validate the proposed simulation framework through a business case study.
Loehfelm, Thomas W; Prater, Adam B; Debebe, Tequam; Sekhar, Aarti K
2017-02-01
We digitized the radiography teaching file at Black Lion Hospital (Addis Ababa, Ethiopia) during a recent trip, using a standard digital camera and a fluorescent light box. Our goal was to photograph every radiograph in the existing library while optimizing the final image size to the maximum resolution of a high quality tablet computer, preserving the contrast resolution of the radiographs, and minimizing total library file size. A secondary important goal was to minimize the cost and time required to take and process the images. Three workers were able to efficiently remove the radiographs from their storage folders, hang them on the light box, operate the camera, catalog the image, and repack the radiographs back to the storage folder. Zoom, focal length, and film speed were fixed, while aperture and shutter speed were manually adjusted for each image, allowing for efficiency and flexibility in image acquisition. Keeping zoom and focal length fixed, which kept the view box at the same relative position in all of the images acquired during a single photography session, allowed unused space to be batch-cropped, saving considerable time in post-processing, at the expense of final image resolution. We present an analysis of the trade-offs in workflow efficiency and final image quality, and demonstrate that a few people with minimal equipment can efficiently digitize a teaching file library.
Metaworkflows and Workflow Interoperability for Heliophysics
NASA Astrophysics Data System (ADS)
Pierantoni, Gabriele; Carley, Eoin P.
2014-06-01
Heliophysics is a relatively new branch of physics that investigates the relationship between the Sun and the other bodies of the solar system. To investigate such relationships, heliophysicists can rely on various tools developed by the community. Some of these tools are on-line catalogues that list events (such as Coronal Mass Ejections, CMEs) and their characteristics as they were observed on the surface of the Sun or on the other bodies of the Solar System. Other tools offer on-line data analysis and access to images and data catalogues. During their research, heliophysicists often perform investigations that need to coordinate several of these services and to repeat these complex operations until the phenomena under investigation are fully analyzed. Heliophysicists combine the results of these services; this service orchestration is best suited for workflows. This approach has been investigated in the HELIO project. The HELIO project developed an infrastructure for a Virtual Observatory for Heliophysics and implemented service orchestration using TAVERNA workflows. HELIO developed a set of workflows that proved to be useful but lacked flexibility and re-usability. The TAVERNA workflows also needed to be executed directly in TAVERNA workbench, and this forced all users to learn how to use the workbench. Within the SCI-BUS and ER-FLOW projects, we have started an effort to re-think and re-design the heliophysics workflows with the aim of fostering re-usability and ease of use. We base our approach on two key concepts, that of meta-workflows and that of workflow interoperability. We have divided the produced workflows in three different layers. The first layer is Basic Workflows, developed both in the TAVERNA and WS-PGRADE languages. They are building blocks that users compose to address their scientific challenges. They implement well-defined Use Cases that usually involve only one service. The second layer is Science Workflows usually developed in TAVERNA. They- implement Science Cases (the definition of a scientific challenge) by composing different Basic Workflows. The third and last layer,Iterative Science Workflows, is developed in WSPGRADE. It executes sub-workflows (either Basic or Science Workflows) as parameter sweep jobs to investigate Science Cases on large multiple data sets. So far, this approach has proven fruitful for three Science Cases of which one has been completed and two are still being tested.
Wang, Yang; Feng, Ruibing; He, Chengwei; Su, Huanxing; Ma, Huan; Wan, Jian-Bo
2018-08-05
The narrow linear range and the limited scan time of the given ion make the quantification of the features challenging in liquid chromatography-mass spectrometry (LC-MS)-based untargeted metabolomics with the full-scan mode. And metabolite identification is another bottleneck of untargeted analysis owing to the difficulty of acquiring MS/MS information of most metabolites detected. In this study, an integrated workflow was proposed using the newly established multiple ion monitoring mode with time-staggered ion lists (tsMIM) and target-directed data-dependent acquisition with time-staggered ion lists (tsDDA) to improve data acquisition and metabolite identification in UHPLC/Q-TOF MS-based untargeted metabolomics. Compared to the conventional untargeted metabolomics, the proprosed workflow exhibited the better repeatability before and after data normalization. After selecting features with the significant change by statistical analysis, MS/MS information of all these features can be obtained by tsDDA analysis to facilitate metabolite identification. Using time-staggered ion lists, the workflow is more sensitive in data acquisition, especially for the low-abundant features. Moreover, the metabolites with low abundance tend to be wrongly integrated and triggered by full scan-based untargeted analysis with MS E acquisition mode, which can be greatly improved by the proposed workflow. The integrated workflow was also successfully applied to discover serum biosignatures for the genetic modification of fat-1 in mice, which indicated its practicability and great potential in future metabolomics studies. Copyright © 2018 Elsevier B.V. All rights reserved.
Modeling workflow to design machine translation applications for public health practice
Turner, Anne M.; Brownstein, Megumu K.; Cole, Kate; Karasz, Hilary; Kirchhoff, Katrin
2014-01-01
Objective Provide a detailed understanding of the information workflow processes related to translating health promotion materials for limited English proficiency individuals in order to inform the design of context-driven machine translation (MT) tools for public health (PH). Materials and Methods We applied a cognitive work analysis framework to investigate the translation information workflow processes of two large health departments in Washington State. Researchers conducted interviews, performed a task analysis, and validated results with PH professionals to model translation workflow and identify functional requirements for a translation system for PH. Results The study resulted in a detailed description of work related to translation of PH materials, an information workflow diagram, and a description of attitudes towards MT technology. We identified a number of themes that hold design implications for incorporating MT in PH translation practice. A PH translation tool prototype was designed based on these findings. Discussion This study underscores the importance of understanding the work context and information workflow for which systems will be designed. Based on themes and translation information workflow processes, we identified key design guidelines for incorporating MT into PH translation work. Primary amongst these is that MT should be followed by human review for translations to be of high quality and for the technology to be adopted into practice. Counclusion The time and costs of creating multilingual health promotion materials are barriers to translation. PH personnel were interested in MT's potential to improve access to low-cost translated PH materials, but expressed concerns about ensuring quality. We outline design considerations and a potential machine translation tool to best fit MT systems into PH practice. PMID:25445922
DOE Office of Scientific and Technical Information (OSTI.GOV)
Guan, Qiang
At exascale, the challenge becomes to develop applications that run at scale and use exascale platforms reliably, efficiently, and flexibly. Workflows become much more complex because they must seamlessly integrate simulation and data analytics. They must include down-sampling, post-processing, feature extraction, and visualization. Power and data transfer limitations require these analysis tasks to be run in-situ or in-transit. We expect successful workflows will comprise multiple linked simulations along with tens of analysis routines. Users will have limited development time at scale and, therefore, must have rich tools to develop, debug, test, and deploy applications. At this scale, successful workflows willmore » compose linked computations from an assortment of reliable, well-defined computation elements, ones that can come and go as required, based on the needs of the workflow over time. We propose a novel framework that utilizes both virtual machines (VMs) and software containers to create a workflow system that establishes a uniform build and execution environment (BEE) beyond the capabilities of current systems. In this environment, applications will run reliably and repeatably across heterogeneous hardware and software. Containers, both commercial (Docker and Rocket) and open-source (LXC and LXD), define a runtime that isolates all software dependencies from the machine operating system. Workflows may contain multiple containers that run different operating systems, different software, and even different versions of the same software. We will run containers in open-source virtual machines (KVM) and emulators (QEMU) so that workflows run on any machine entirely in user-space. On this platform of containers and virtual machines, we will deliver workflow software that provides services, including repeatable execution, provenance, checkpointing, and future proofing. We will capture provenance about how containers were launched and how they interact to annotate workflows for repeatable and partial re-execution. We will coordinate the physical snapshots of virtual machines with parallel programming constructs, such as barriers, to automate checkpoint and restart. We will also integrate with HPC-specific container runtimes to gain access to accelerators and other specialized hardware to preserve native performance. Containers will link development to continuous integration. When application developers check code in, it will automatically be tested on a suite of different software and hardware architectures.« less
Describing and Modeling Workflow and Information Flow in Chronic Disease Care
Unertl, Kim M.; Weinger, Matthew B.; Johnson, Kevin B.; Lorenzi, Nancy M.
2009-01-01
Objectives The goal of the study was to develop an in-depth understanding of work practices, workflow, and information flow in chronic disease care, to facilitate development of context-appropriate informatics tools. Design The study was conducted over a 10-month period in three ambulatory clinics providing chronic disease care. The authors iteratively collected data using direct observation and semi-structured interviews. Measurements The authors observed all aspects of care in three different chronic disease clinics for over 150 hours, including 157 patient-provider interactions. Observation focused on interactions among people, processes, and technology. Observation data were analyzed through an open coding approach. The authors then developed models of workflow and information flow using Hierarchical Task Analysis and Soft Systems Methodology. The authors also conducted nine semi-structured interviews to confirm and refine the models. Results The study had three primary outcomes: models of workflow for each clinic, models of information flow for each clinic, and an in-depth description of work practices and the role of health information technology (HIT) in the clinics. The authors identified gaps between the existing HIT functionality and the needs of chronic disease providers. Conclusions In response to the analysis of workflow and information flow, the authors developed ten guidelines for design of HIT to support chronic disease care, including recommendations to pursue modular approaches to design that would support disease-specific needs. The study demonstrates the importance of evaluating workflow and information flow in HIT design and implementation. PMID:19717802
Task–Technology Fit of Video Telehealth for Nurses in an Outpatient Clinic Setting
Finkelstein, Stanley M.
2014-01-01
Abstract Background: Incorporating telehealth into outpatient care delivery supports management of consumer health between clinic visits. Task–technology fit is a framework for understanding how technology helps and/or hinders a person during work processes. Evaluating the task–technology fit of video telehealth for personnel working in a pediatric outpatient clinic and providing care between clinic visits ensures the information provided matches the information needed to support work processes. Materials and Methods: The workflow of advanced practice registered nurse (APRN) care coordination provided via telephone and video telehealth was described and measured using a mixed-methods workflow analysis protocol that incorporated cognitive ethnography and time–motion study. Qualitative and quantitative results were merged and analyzed within the task–technology fit framework to determine the workflow fit of video telehealth for APRN care coordination. Results: Incorporating video telehealth into APRN care coordination workflow provided visual information unavailable during telephone interactions. Despite additional tasks and interactions needed to obtain the visual information, APRN workflow efficiency, as measured by time, was not significantly changed. Analyzed within the task–technology fit framework, the increased visual information afforded by video telehealth supported the assessment and diagnostic information needs of the APRN. Conclusions: Telehealth must provide the right information to the right clinician at the right time. Evaluating task–technology fit using a mixed-methods protocol ensured rigorous analysis of fit within work processes and identified workflows that benefit most from the technology. PMID:24841219
Closha: bioinformatics workflow system for the analysis of massive sequencing data.
Ko, GunHwan; Kim, Pan-Gyu; Yoon, Jongcheol; Han, Gukhee; Park, Seong-Jin; Song, Wangho; Lee, Byungwook
2018-02-19
While next-generation sequencing (NGS) costs have fallen in recent years, the cost and complexity of computation remain substantial obstacles to the use of NGS in bio-medical care and genomic research. The rapidly increasing amounts of data available from the new high-throughput methods have made data processing infeasible without automated pipelines. The integration of data and analytic resources into workflow systems provides a solution to the problem by simplifying the task of data analysis. To address this challenge, we developed a cloud-based workflow management system, Closha, to provide fast and cost-effective analysis of massive genomic data. We implemented complex workflows making optimal use of high-performance computing clusters. Closha allows users to create multi-step analyses using drag and drop functionality and to modify the parameters of pipeline tools. Users can also import the Galaxy pipelines into Closha. Closha is a hybrid system that enables users to use both analysis programs providing traditional tools and MapReduce-based big data analysis programs simultaneously in a single pipeline. Thus, the execution of analytics algorithms can be parallelized, speeding up the whole process. We also developed a high-speed data transmission solution, KoDS, to transmit a large amount of data at a fast rate. KoDS has a file transfer speed of up to 10 times that of normal FTP and HTTP. The computer hardware for Closha is 660 CPU cores and 800 TB of disk storage, enabling 500 jobs to run at the same time. Closha is a scalable, cost-effective, and publicly available web service for large-scale genomic data analysis. Closha supports the reliable and highly scalable execution of sequencing analysis workflows in a fully automated manner. Closha provides a user-friendly interface to all genomic scientists to try to derive accurate results from NGS platform data. The Closha cloud server is freely available for use from http://closha.kobic.re.kr/ .
Li, Peter; Castrillo, Juan I; Velarde, Giles; Wassink, Ingo; Soiland-Reyes, Stian; Owen, Stuart; Withers, David; Oinn, Tom; Pocock, Matthew R; Goble, Carole A; Oliver, Stephen G; Kell, Douglas B
2008-08-07
There has been a dramatic increase in the amount of quantitative data derived from the measurement of changes at different levels of biological complexity during the post-genomic era. However, there are a number of issues associated with the use of computational tools employed for the analysis of such data. For example, computational tools such as R and MATLAB require prior knowledge of their programming languages in order to implement statistical analyses on data. Combining two or more tools in an analysis may also be problematic since data may have to be manually copied and pasted between separate user interfaces for each tool. Furthermore, this transfer of data may require a reconciliation step in order for there to be interoperability between computational tools. Developments in the Taverna workflow system have enabled pipelines to be constructed and enacted for generic and ad hoc analyses of quantitative data. Here, we present an example of such a workflow involving the statistical identification of differentially-expressed genes from microarray data followed by the annotation of their relationships to cellular processes. This workflow makes use of customised maxdBrowse web services, a system that allows Taverna to query and retrieve gene expression data from the maxdLoad2 microarray database. These data are then analysed by R to identify differentially-expressed genes using the Taverna RShell processor which has been developed for invoking this tool when it has been deployed as a service using the RServe library. In addition, the workflow uses Beanshell scripts to reconcile mismatches of data between services as well as to implement a form of user interaction for selecting subsets of microarray data for analysis as part of the workflow execution. A new plugin system in the Taverna software architecture is demonstrated by the use of renderers for displaying PDF files and CSV formatted data within the Taverna workbench. Taverna can be used by data analysis experts as a generic tool for composing ad hoc analyses of quantitative data by combining the use of scripts written in the R programming language with tools exposed as services in workflows. When these workflows are shared with colleagues and the wider scientific community, they provide an approach for other scientists wanting to use tools such as R without having to learn the corresponding programming language to analyse their own data.
Li, Peter; Castrillo, Juan I; Velarde, Giles; Wassink, Ingo; Soiland-Reyes, Stian; Owen, Stuart; Withers, David; Oinn, Tom; Pocock, Matthew R; Goble, Carole A; Oliver, Stephen G; Kell, Douglas B
2008-01-01
Background There has been a dramatic increase in the amount of quantitative data derived from the measurement of changes at different levels of biological complexity during the post-genomic era. However, there are a number of issues associated with the use of computational tools employed for the analysis of such data. For example, computational tools such as R and MATLAB require prior knowledge of their programming languages in order to implement statistical analyses on data. Combining two or more tools in an analysis may also be problematic since data may have to be manually copied and pasted between separate user interfaces for each tool. Furthermore, this transfer of data may require a reconciliation step in order for there to be interoperability between computational tools. Results Developments in the Taverna workflow system have enabled pipelines to be constructed and enacted for generic and ad hoc analyses of quantitative data. Here, we present an example of such a workflow involving the statistical identification of differentially-expressed genes from microarray data followed by the annotation of their relationships to cellular processes. This workflow makes use of customised maxdBrowse web services, a system that allows Taverna to query and retrieve gene expression data from the maxdLoad2 microarray database. These data are then analysed by R to identify differentially-expressed genes using the Taverna RShell processor which has been developed for invoking this tool when it has been deployed as a service using the RServe library. In addition, the workflow uses Beanshell scripts to reconcile mismatches of data between services as well as to implement a form of user interaction for selecting subsets of microarray data for analysis as part of the workflow execution. A new plugin system in the Taverna software architecture is demonstrated by the use of renderers for displaying PDF files and CSV formatted data within the Taverna workbench. Conclusion Taverna can be used by data analysis experts as a generic tool for composing ad hoc analyses of quantitative data by combining the use of scripts written in the R programming language with tools exposed as services in workflows. When these workflows are shared with colleagues and the wider scientific community, they provide an approach for other scientists wanting to use tools such as R without having to learn the corresponding programming language to analyse their own data. PMID:18687127
Automatic Image Processing Workflow for the Keck/NIRC2 Vortex Coronagraph
NASA Astrophysics Data System (ADS)
Xuan, Wenhao; Cook, Therese; Ngo, Henry; Zawol, Zoe; Ruane, Garreth; Mawet, Dimitri
2018-01-01
The Keck/NIRC2 camera, equipped with the vortex coronagraph, is an instrument targeted at the high contrast imaging of extrasolar planets. To uncover a faint planet signal from the overwhelming starlight, we utilize the Vortex Image Processing (VIP) library, which carries out principal component analysis to model and remove the stellar point spread function. To bridge the gap between data acquisition and data reduction, we implement a workflow that 1) downloads, sorts, and processes data with VIP, 2) stores the analysis products into a database, and 3) displays the reduced images, contrast curves, and auxiliary information on a web interface. Both angular differential imaging and reference star differential imaging are implemented in the analysis module. A real-time version of the workflow runs during observations, allowing observers to make educated decisions about time distribution on different targets, hence optimizing science yield. The post-night version performs a standardized reduction after the observation, building up a valuable database that not only helps uncover new discoveries, but also enables a statistical study of the instrument itself. We present the workflow, and an examination of the contrast performance of the NIRC2 vortex with respect to factors including target star properties and observing conditions.
Dynamic reusable workflows for ocean science
Signell, Richard; Fernandez, Filipe; Wilcox, Kyle
2016-01-01
Digital catalogs of ocean data have been available for decades, but advances in standardized services and software for catalog search and data access make it now possible to create catalog-driven workflows that automate — end-to-end — data search, analysis and visualization of data from multiple distributed sources. Further, these workflows may be shared, reused and adapted with ease. Here we describe a workflow developed within the US Integrated Ocean Observing System (IOOS) which automates the skill-assessment of water temperature forecasts from multiple ocean forecast models, allowing improved forecast products to be delivered for an open water swim event. A series of Jupyter Notebooks are used to capture and document the end-to-end workflow using a collection of Python tools that facilitate working with standardized catalog and data services. The workflow first searches a catalog of metadata using the Open Geospatial Consortium (OGC) Catalog Service for the Web (CSW), then accesses data service endpoints found in the metadata records using the OGC Sensor Observation Service (SOS) for in situ sensor data and OPeNDAP services for remotely-sensed and model data. Skill metrics are computed and time series comparisons of forecast model and observed data are displayed interactively, leveraging the capabilities of modern web browsers. The resulting workflow not only solves a challenging specific problem, but highlights the benefits of dynamic, reusable workflows in general. These workflows adapt as new data enters the data system, facilitate reproducible science, provide templates from which new scientific workflows can be developed, and encourage data providers to use standardized services. As applied to the ocean swim event, the workflow exposed problems with two of the ocean forecast products which led to improved regional forecasts once errors were corrected. While the example is specific, the approach is general, and we hope to see increased use of dynamic notebooks across the geoscience domains.
BioVeL: a virtual laboratory for data analysis and modelling in biodiversity science and ecology.
Hardisty, Alex R; Bacall, Finn; Beard, Niall; Balcázar-Vargas, Maria-Paula; Balech, Bachir; Barcza, Zoltán; Bourlat, Sarah J; De Giovanni, Renato; de Jong, Yde; De Leo, Francesca; Dobor, Laura; Donvito, Giacinto; Fellows, Donal; Guerra, Antonio Fernandez; Ferreira, Nuno; Fetyukova, Yuliya; Fosso, Bruno; Giddy, Jonathan; Goble, Carole; Güntsch, Anton; Haines, Robert; Ernst, Vera Hernández; Hettling, Hannes; Hidy, Dóra; Horváth, Ferenc; Ittzés, Dóra; Ittzés, Péter; Jones, Andrew; Kottmann, Renzo; Kulawik, Robert; Leidenberger, Sonja; Lyytikäinen-Saarenmaa, Päivi; Mathew, Cherian; Morrison, Norman; Nenadic, Aleksandra; de la Hidalga, Abraham Nieva; Obst, Matthias; Oostermeijer, Gerard; Paymal, Elisabeth; Pesole, Graziano; Pinto, Salvatore; Poigné, Axel; Fernandez, Francisco Quevedo; Santamaria, Monica; Saarenmaa, Hannu; Sipos, Gergely; Sylla, Karl-Heinz; Tähtinen, Marko; Vicario, Saverio; Vos, Rutger Aldo; Williams, Alan R; Yilmaz, Pelin
2016-10-20
Making forecasts about biodiversity and giving support to policy relies increasingly on large collections of data held electronically, and on substantial computational capability and capacity to analyse, model, simulate and predict using such data. However, the physically distributed nature of data resources and of expertise in advanced analytical tools creates many challenges for the modern scientist. Across the wider biological sciences, presenting such capabilities on the Internet (as "Web services") and using scientific workflow systems to compose them for particular tasks is a practical way to carry out robust "in silico" science. However, use of this approach in biodiversity science and ecology has thus far been quite limited. BioVeL is a virtual laboratory for data analysis and modelling in biodiversity science and ecology, freely accessible via the Internet. BioVeL includes functions for accessing and analysing data through curated Web services; for performing complex in silico analysis through exposure of R programs, workflows, and batch processing functions; for on-line collaboration through sharing of workflows and workflow runs; for experiment documentation through reproducibility and repeatability; and for computational support via seamless connections to supporting computing infrastructures. We developed and improved more than 60 Web services with significant potential in many different kinds of data analysis and modelling tasks. We composed reusable workflows using these Web services, also incorporating R programs. Deploying these tools into an easy-to-use and accessible 'virtual laboratory', free via the Internet, we applied the workflows in several diverse case studies. We opened the virtual laboratory for public use and through a programme of external engagement we actively encouraged scientists and third party application and tool developers to try out the services and contribute to the activity. Our work shows we can deliver an operational, scalable and flexible Internet-based virtual laboratory to meet new demands for data processing and analysis in biodiversity science and ecology. In particular, we have successfully integrated existing and popular tools and practices from different scientific disciplines to be used in biodiversity and ecological research.
A virtual data language and system for scientific workflow management in data grid environments
NASA Astrophysics Data System (ADS)
Zhao, Yong
With advances in scientific instrumentation and simulation, scientific data is growing fast in both size and analysis complexity. So-called Data Grids aim to provide high performance, distributed data analysis infrastructure for data- intensive sciences, where scientists distributed worldwide need to extract information from large collections of data, and to share both data products and the resources needed to produce and store them. However, the description, composition, and execution of even logically simple scientific workflows are often complicated by the need to deal with "messy" issues like heterogeneous storage formats and ad-hoc file system structures. We show how these difficulties can be overcome via a typed workflow notation called virtual data language, within which issues of physical representation are cleanly separated from logical typing, and by the implementation of this notation within the context of a powerful virtual data system that supports distributed execution. The resulting language and system are capable of expressing complex workflows in a simple compact form, enacting those workflows in distributed environments, monitoring and recording the execution processes, and tracing the derivation history of data products. We describe the motivation, design, implementation, and evaluation of the virtual data language and system, and the application of the virtual data paradigm in various science disciplines, including astronomy, cognitive neuroscience.
Beck, Marcus W.; Vondracek, Bruce C.; Hatch, Lorin K.; Vinje, Jason
2013-01-01
Lake resources can be negatively affected by environmental stressors originating from multiple sources and different spatial scales. Shoreline development, in particular, can negatively affect lake resources through decline in habitat quality, physical disturbance, and impacts on fisheries. The development of remote sensing techniques that efficiently characterize shoreline development in a regional context could greatly improve management approaches for protecting and restoring lake resources. The goal of this study was to develop an approach using high-resolution aerial photographs to quantify and assess docks as indicators of shoreline development. First, we describe a dock analysis workflow that can be used to quantify the spatial extent of docks using aerial images. Our approach incorporates pixel-based classifiers with object-based techniques to effectively analyze high-resolution digital imagery. Second, we apply the analysis workflow to quantify docks for 4261 lakes managed by the Minnesota Department of Natural Resources. Overall accuracy of the analysis results was 98.4% (87.7% based on ) after manual post-processing. The analysis workflow was also 74% more efficient than the time required for manual digitization of docks. These analyses have immediate relevance for resource planning in Minnesota, whereas the dock analysis workflow could be used to quantify shoreline development in other regions with comparable imagery. These data can also be used to better understand the effects of shoreline development on aquatic resources and to evaluate the effects of shoreline development relative to other stressors.
Yadage and Packtivity - analysis preservation using parametrized workflows
NASA Astrophysics Data System (ADS)
Cranmer, Kyle; Heinrich, Lukas
2017-10-01
Preserving data analyses produced by the collaborations at LHC in a parametrized fashion is crucial in order to maintain reproducibility and re-usability. We argue for a declarative description in terms of individual processing steps - “packtivities” - linked through a dynamic directed acyclic graph (DAG) and present an initial set of JSON schemas for such a description and an implementation - “yadage” - capable of executing workflows of analysis preserved via Linux containers.
Exploring Dental Providers’ Workflow in an Electronic Dental Record Environment
Schwei, Kelsey M; Cooper, Ryan; Mahnke, Andrea N.; Ye, Zhan
2016-01-01
Summary Background A workflow is defined as a predefined set of work steps and partial ordering of these steps in any environment to achieve the expected outcome. Few studies have investigated the workflow of providers in a dental office. It is important to understand the interaction of dental providers with the existing technologies at point of care to assess breakdown in the workflow which could contribute to better technology designs. Objective The study objective was to assess electronic dental record (EDR) workflows using time and motion methodology in order to identify breakdowns and opportunities for process improvement. Methods A time and motion methodology was used to study the human-computer interaction and workflow of dental providers with an EDR in four dental centers at a large healthcare organization. A data collection tool was developed to capture the workflow of dental providers and staff while they interacted with an EDR during initial, planned, and emergency patient visits, and at the front desk. Qualitative and quantitative analysis was conducted on the observational data. Results Breakdowns in workflow were identified while posting charges, viewing radiographs, e-prescribing, and interacting with patient scheduler. EDR interaction time was significantly different between dentists and dental assistants (6:20 min vs. 10:57 min, p = 0.013) and between dentists and dental hygienists (6:20 min vs. 9:36 min, p = 0.003). Conclusions On average, a dentist spent far less time than dental assistants and dental hygienists in data recording within the EDR. PMID:27437058
Workflow continuity--moving beyond business continuity in a multisite 24-7 healthcare organization.
Kolowitz, Brian J; Lauro, Gonzalo Romero; Barkey, Charles; Black, Harry; Light, Karen; Deible, Christopher
2012-12-01
As hospitals move towards providing in-house 24 × 7 services, there is an increasing need for information systems to be available around the clock. This study investigates one organization's need for a workflow continuity solution that provides around the clock availability for information systems that do not provide highly available services. The organization investigated is a large multifacility healthcare organization that consists of 20 hospitals and more than 30 imaging centers. A case analysis approach was used to investigate the organization's efforts. The results show an overall reduction in downtimes where radiologists could not continue their normal workflow on the integrated Picture Archiving and Communications System (PACS) solution by 94 % from 2008 to 2011. The impact of unplanned downtimes was reduced by 72 % while the impact of planned downtimes was reduced by 99.66 % over the same period. Additionally more than 98 h of radiologist impact due to a PACS upgrade in 2008 was entirely eliminated in 2011 utilizing the system created by the workflow continuity approach. Workflow continuity differs from high availability and business continuity in its design process and available services. Workflow continuity only ensures that critical workflows are available when the production system is unavailable due to scheduled or unscheduled downtimes. Workflow continuity works in conjunction with business continuity and highly available system designs. The results of this investigation revealed that this approach can add significant value to organizations because impact on users is minimized if not eliminated entirely.
A ChIP-Seq Data Analysis Pipeline Based on Bioconductor Packages.
Park, Seung-Jin; Kim, Jong-Hwan; Yoon, Byung-Ha; Kim, Seon-Young
2017-03-01
Nowadays, huge volumes of chromatin immunoprecipitation-sequencing (ChIP-Seq) data are generated to increase the knowledge on DNA-protein interactions in the cell, and accordingly, many tools have been developed for ChIP-Seq analysis. Here, we provide an example of a streamlined workflow for ChIP-Seq data analysis composed of only four packages in Bioconductor: dada2, QuasR, mosaics, and ChIPseeker. 'dada2' performs trimming of the high-throughput sequencing data. 'QuasR' and 'mosaics' perform quality control and mapping of the input reads to the reference genome and peak calling, respectively. Finally, 'ChIPseeker' performs annotation and visualization of the called peaks. This workflow runs well independently of operating systems (e.g., Windows, Mac, or Linux) and processes the input fastq files into various results in one run. R code is available at github: https://github.com/ddhb/Workflow_of_Chipseq.git.
A ChIP-Seq Data Analysis Pipeline Based on Bioconductor Packages
Park, Seung-Jin; Kim, Jong-Hwan; Yoon, Byung-Ha; Kim, Seon-Young
2017-01-01
Nowadays, huge volumes of chromatin immunoprecipitation-sequencing (ChIP-Seq) data are generated to increase the knowledge on DNA-protein interactions in the cell, and accordingly, many tools have been developed for ChIP-Seq analysis. Here, we provide an example of a streamlined workflow for ChIP-Seq data analysis composed of only four packages in Bioconductor: dada2, QuasR, mosaics, and ChIPseeker. ‘dada2’ performs trimming of the high-throughput sequencing data. ‘QuasR’ and ‘mosaics’ perform quality control and mapping of the input reads to the reference genome and peak calling, respectively. Finally, ‘ChIPseeker’ performs annotation and visualization of the called peaks. This workflow runs well independently of operating systems (e.g., Windows, Mac, or Linux) and processes the input fastq files into various results in one run. R code is available at github: https://github.com/ddhb/Workflow_of_Chipseq.git. PMID:28416945
qPortal: A platform for data-driven biomedical research.
Mohr, Christopher; Friedrich, Andreas; Wojnar, David; Kenar, Erhan; Polatkan, Aydin Can; Codrea, Marius Cosmin; Czemmel, Stefan; Kohlbacher, Oliver; Nahnsen, Sven
2018-01-01
Modern biomedical research aims at drawing biological conclusions from large, highly complex biological datasets. It has become common practice to make extensive use of high-throughput technologies that produce big amounts of heterogeneous data. In addition to the ever-improving accuracy, methods are getting faster and cheaper, resulting in a steadily increasing need for scalable data management and easily accessible means of analysis. We present qPortal, a platform providing users with an intuitive way to manage and analyze quantitative biological data. The backend leverages a variety of concepts and technologies, such as relational databases, data stores, data models and means of data transfer, as well as front-end solutions to give users access to data management and easy-to-use analysis options. Users are empowered to conduct their experiments from the experimental design to the visualization of their results through the platform. Here, we illustrate the feature-rich portal by simulating a biomedical study based on publically available data. We demonstrate the software's strength in supporting the entire project life cycle. The software supports the project design and registration, empowers users to do all-digital project management and finally provides means to perform analysis. We compare our approach to Galaxy, one of the most widely used scientific workflow and analysis platforms in computational biology. Application of both systems to a small case study shows the differences between a data-driven approach (qPortal) and a workflow-driven approach (Galaxy). qPortal, a one-stop-shop solution for biomedical projects offers up-to-date analysis pipelines, quality control workflows, and visualization tools. Through intensive user interactions, appropriate data models have been developed. These models build the foundation of our biological data management system and provide possibilities to annotate data, query metadata for statistics and future re-analysis on high-performance computing systems via coupling of workflow management systems. Integration of project and data management as well as workflow resources in one place present clear advantages over existing solutions.
Next-Generation Climate Modeling Science Challenges for Simulation, Workflow and Analysis Systems
NASA Astrophysics Data System (ADS)
Koch, D. M.; Anantharaj, V. G.; Bader, D. C.; Krishnan, H.; Leung, L. R.; Ringler, T.; Taylor, M.; Wehner, M. F.; Williams, D. N.
2016-12-01
We will present two examples of current and future high-resolution climate-modeling research that are challenging existing simulation run-time I/O, model-data movement, storage and publishing, and analysis. In each case, we will consider lessons learned as current workflow systems are broken by these large-data science challenges, as well as strategies to repair or rebuild the systems. First we consider the science and workflow challenges to be posed by the CMIP6 multi-model HighResMIP, involving around a dozen modeling groups performing quarter-degree simulations, in 3-member ensembles for 100 years, with high-frequency (1-6 hourly) diagnostics, which is expected to generate over 4PB of data. An example of science derived from these experiments will be to study how resolution affects the ability of models to capture extreme-events such as hurricanes or atmospheric rivers. Expected methods to transfer (using parallel Globus) and analyze (using parallel "TECA" software tools) HighResMIP data for such feature-tracking by the DOE CASCADE project will be presented. A second example will be from the Accelerated Climate Modeling for Energy (ACME) project, which is currently addressing challenges involving multiple century-scale coupled high resolution (quarter-degree) climate simulations on DOE Leadership Class computers. ACME is anticipating production of over 5PB of data during the next 2 years of simulations, in order to investigate the drivers of water cycle changes, sea-level-rise, and carbon cycle evolution. The ACME workflow, from simulation to data transfer, storage, analysis and publication will be presented. Current and planned methods to accelerate the workflow, including implementing run-time diagnostics, and implementing server-side analysis to avoid moving large datasets will be presented.
Koperwhats, Martha A; Chang, Wei-Chih; Xiao, Jianguo
2002-01-01
Digital imaging technology promises efficient, economical, and fast service for patient care, but the challenges are great in the transition from film to a filmless (digital) environment. This change has a significant impact on the film library's personnel (film librarians) who play a leading roles in storage, classification, and retrieval of images. The objectives of this project were to study film library errors and the usability of a physical computerized system that could not be changed, while developing an intervention to reduce errors and test the usability of the intervention. Cognitive and human factors analysis were used to evaluate human-computer interaction. A workflow analysis was performed to understand the film and digital imaging processes. User and task analyses were applied to account for all behaviors involved in interaction with the system. A heuristic evaluation was used to probe the usability issues in the picture archiving and communication systems (PACS) modules. Simplified paper-based instructions were designed to familiarize the film librarians with the digital system. A usability survey evaluated the effectiveness of the instruction. The user and task analyses indicated that different users faced challenges based on their computer literacy, education, roles, and frequency of use of diagnostic imaging. The workflow analysis showed that the approaches to using the digital library differ among the various departments. The heuristic evaluation of the PACS modules showed the human-computer interface to have usability issues that prevented easy operation. Simplified instructions were designed for operation of the modules. Usability surveys conducted before and after revision of the instructions showed that performance improved. Cognitive and human factor analysis can help film librarians and other users adapt to the filmless system. Use of cognitive science tools will aid in successful transition of the film library from a film environment to a digital environment.
Yaniv, Ziv; Lowekamp, Bradley C; Johnson, Hans J; Beare, Richard
2018-06-01
Modern scientific endeavors increasingly require team collaborations to construct and interpret complex computational workflows. This work describes an image-analysis environment that supports the use of computational tools that facilitate reproducible research and support scientists with varying levels of software development skills. The Jupyter notebook web application is the basis of an environment that enables flexible, well-documented, and reproducible workflows via literate programming. Image-analysis software development is made accessible to scientists with varying levels of programming experience via the use of the SimpleITK toolkit, a simplified interface to the Insight Segmentation and Registration Toolkit. Additional features of the development environment include user friendly data sharing using online data repositories and a testing framework that facilitates code maintenance. SimpleITK provides a large number of examples illustrating educational and research-oriented image analysis workflows for free download from GitHub under an Apache 2.0 license: github.com/InsightSoftwareConsortium/SimpleITK-Notebooks .
Enabling Real-time Water Decision Support Services Using Model as a Service
NASA Astrophysics Data System (ADS)
Zhao, T.; Minsker, B. S.; Lee, J. S.; Salas, F. R.; Maidment, D. R.; David, C. H.
2014-12-01
Through application of computational methods and an integrated information system, data and river modeling services can help researchers and decision makers more rapidly understand river conditions under alternative scenarios. To enable this capability, workflows (i.e., analysis and model steps) are created and published as Web services delivered through an internet browser, including model inputs, a published workflow service, and visualized outputs. The RAPID model, which is a river routing model developed at University of Texas Austin for parallel computation of river discharge, has been implemented as a workflow and published as a Web application. This allows non-technical users to remotely execute the model and visualize results as a service through a simple Web interface. The model service and Web application has been prototyped in the San Antonio and Guadalupe River Basin in Texas, with input from university and agency partners. In the future, optimization model workflows will be developed to link with the RAPID model workflow to provide real-time water allocation decision support services.
Characterizing Strain Variation in Engineered E. coli Using a Multi-Omics-Based Workflow
Brunk, Elizabeth; George, Kevin W.; Alonso-Gutierrez, Jorge; ...
2016-05-19
Understanding the complex interactions that occur between heterologous and native biochemical pathways represents a major challenge in metabolic engineering and synthetic biology. We present a workflow that integrates metabolomics, proteomics, and genome-scale models of Escherichia coli metabolism to study the effects of introducing a heterologous pathway into a microbial host. This workflow incorporates complementary approaches from computational systems biology, metabolic engineering, and synthetic biology; provides molecular insight into how the host organism microenvironment changes due to pathway engineering; and demonstrates how biological mechanisms underlying strain variation can be exploited as an engineering strategy to increase product yield. As a proofmore » of concept, we present the analysis of eight engineered strains producing three biofuels: isopentenol, limonene, and bisabolene. Application of this workflow identified the roles of candidate genes, pathways, and biochemical reactions in observed experimental phenomena and facilitated the construction of a mutant strain with improved productivity. The contributed workflow is available as an open-source tool in the form of iPython notebooks.« less
Barriers to critical thinking: workflow interruptions and task switching among nurses.
Cornell, Paul; Riordan, Monica; Townsend-Gervis, Mary; Mobley, Robin
2011-10-01
Nurses are increasingly called upon to engage in critical thinking. However, current workflow inhibits this goal with frequent task switching and unpredictable demands. To assess workflow's cognitive impact, nurses were observed at 2 hospitals with different patient loads and acuity levels. Workflow on a medical/surgical and pediatric oncology unit was observed, recording tasks, tools, collaborators, and locations. Nineteen nurses were observed for a total of 85.2 hours. Tasks were short with a mean duration of 62.4 and 81.6 seconds on the 2 units. More than 50% of the recorded tasks were less than 30 seconds in length. An analysis of task sequence revealed few patterns and little pairwise repetition. Performance on specific tasks differed between the 2 units, but the character of the workflow was highly similar. The nonrepetitive flow and high amount of switching indicate nurses experience a heavy cognitive load with little uninterrupted time. This implies that nurses rarely have the conditions necessary for critical thinking.
Building an efficient curation workflow for the Arabidopsis literature corpus
Li, Donghui; Berardini, Tanya Z.; Muller, Robert J.; Huala, Eva
2012-01-01
TAIR (The Arabidopsis Information Resource) is the model organism database (MOD) for Arabidopsis thaliana, a model plant with a literature corpus of about 39 000 articles in PubMed, with over 4300 new articles added in 2011. We have developed a literature curation workflow incorporating both automated and manual elements to cope with this flood of new research articles. The current workflow can be divided into two phases: article selection and curation. Structured controlled vocabularies, such as the Gene Ontology and Plant Ontology are used to capture free text information in the literature as succinct ontology-based annotations suitable for the application of computational analysis methods. We also describe our curation platform and the use of text mining tools in our workflow. Database URL: www.arabidopsis.org PMID:23221298
Development of the workflow kine systems for support on KAIZEN.
Mizuno, Yuki; Ito, Toshihiko; Yoshikawa, Toru; Yomogida, Satoshi; Morio, Koji; Sakai, Kazuhiro
2012-01-01
In this paper, we introduce the new workflow line system consisted of the location and image recording, which led to the acquisition of workflow information and the analysis display. From the results of workflow line investigation, we considered the anticipated effects and the problems on KAIZEN. Workflow line information included the location information and action contents information. These technologies suggest the viewpoints to help improvement, for example, exclusion of useless movement, the redesign of layout and the review of work procedure. Manufacturing factory, it was clear that there was much movement from the standard operation place and accumulation residence time. The following was shown as a result of this investigation, to be concrete, the efficient layout was suggested by this system. In the case of the hospital, similarly, it is pointed out that the workflow has the problem of layout and setup operations based on the effective movement pattern of the experts. This system could adapt to routine work, including as well as non-routine work. By the development of this system which can fit and adapt to industrial diversification, more effective "visual management" (visualization of work) is expected in the future.
[Integration of the radiotherapy irradiation planning in the digital workflow].
Röhner, F; Schmucker, M; Henne, K; Momm, F; Bruggmoser, G; Grosu, A-L; Frommhold, H; Heinemann, F E
2013-02-01
At the Clinic of Radiotherapy at the University Hospital Freiburg, all relevant workflow is paperless. After implementing the Operating Schedule System (OSS) as a framework, all processes are being implemented into the departmental system MOSAIQ. Designing a digital workflow for radiotherapy irradiation planning is a large challenge, it requires interdisciplinary expertise and therefore the interfaces between the professions also have to be interdisciplinary. For every single step of radiotherapy irradiation planning, distinct responsibilities have to be defined and documented. All aspects of digital storage, backup and long-term availability of data were considered and have already been realized during the OSS project. After an analysis of the complete workflow and the statutory requirements, a detailed project plan was designed. In an interdisciplinary workgroup, problems were discussed and a detailed flowchart was developed. The new functionalities were implemented in a testing environment by the Clinical and Administrative IT Department (CAI). After extensive tests they were integrated into the new modular department system. The Clinic of Radiotherapy succeeded in realizing a completely digital workflow for radiotherapy irradiation planning. During the testing phase, our digital workflow was examined and afterwards was approved by the responsible authority.
Wilk, S; Siegl, L; Siegl, K; Hohenstein, C
2018-04-01
In an analysis of a critical incident reporting system (CIRS) in out-of-hospital emergency medicine, it was demonstrated that in 30% of cases deficient communication led to a threat to patients; however, the analysis did not show what exactly the most dangerous work processes are. Current research shows the impact of poor communication on patient safety. An out-of-hospital workflow analysis collects data about key work processes and risk areas. The analysis points out confounding factors for a sufficient communication. Almost 70% of critical incidents are based on human factors. Factors, such as communication and teamwork have an impact but fatigue, noise levels and illness also have a major influence. (I) CIRS database analysis The workflow analysis was based on 247 CIRS cases. This was completed by participant observation and interviews with emergency doctors and paramedics. The 247 CIRS cases displayed 282 communication incidents, which are categorized into 6 subcategories of miscommunication. One CIRS case can be classified into different categories if more communication incidents were validated by the reviewers and four experienced emergency physicians sorted these cases into six subcategories. (II) Workflow analysis The workflow analysis was carried out between 2015 and 2016 in Jena and Berlin, Germany. The focal point of research was to find accumulation of communication risks in different parts of prehospital patient care. During 30 h driving with emergency ambulances, the author interviewed 12 members of the emergency medical service of which 5 were emergency physicians and 7 paramedics. A total of 11 internal medicine cases and one automobile accident were monitored. After patient care the author asked in a 15-min interview if miscommunication or communication incidents occurred. (I) CIRS analysis Between 2005 and 2015, 845 reports were reported to the database. The experts identified 247 incident reports with communication failure. All communication aspects were analyzed and classified. We identified 282 communication incidents. (II) Workflow analysis The analysis showed three phases of prehospital patient care: 1. incoming emergency call and dispatch of ambulance service, 2. prehospital treatment, 3. transportation to a hospital. Overall, the number of incidences is increasing as a consequence of parallel workflows. Category 1 was particularly significant and predominantly, paramedics criticized that emergency physicians did not acknowledge their advice (n = 73 vs. n = 9). Category 3 with n = 63, category 4 with n = 20 and category 2 with n = 13 were the major reasons for incidents. A better interface communication helps to coordinate patient transfer and is an option for optimizing resources. Frequent training in communication is an option to avoid incidents.
RNA-Seq workflow: gene-level exploratory analysis and differential expression
Love, Michael I.; Anders, Simon; Kim, Vladislav; Huber, Wolfgang
2015-01-01
Here we walk through an end-to-end gene-level RNA-Seq differential expression workflow using Bioconductor packages. We will start from the FASTQ files, show how these were aligned to the reference genome, and prepare a count matrix which tallies the number of RNA-seq reads/fragments within each gene for each sample. We will perform exploratory data analysis (EDA) for quality assessment and to explore the relationship between samples, perform differential gene expression analysis, and visually explore the results. PMID:26674615
Soto, Mauricio; Capurro, Daniel; Catalán, Silvia
2015-01-01
Electronic health records (EHRs) present an opportunity for quality improvement in health organitations, particularly at the primary health level. However, EHR implementation impacts clinical workflows, and physicians frequently prefer to document in a non-structured way, which ultimately hinders the ability to measure quality indicators. We present an assessment of data completeness-a key data quality indicator-during the first 12 months after the implementation of an EHR at a teaching outpatient center in Santiago, Chile.
A software tool to analyze clinical workflows from direct observations.
Schweitzer, Marco; Lasierra, Nelia; Hoerbst, Alexander
2015-01-01
Observational data of clinical processes need to be managed in a convenient way, so that process information is reliable, valid and viable for further analysis. However, existing tools for allocating observations fail in systematic data collection of specific workflow recordings. We present a software tool which was developed to facilitate the analysis of clinical process observations. The tool was successfully used in the project OntoHealth, to build, store and analyze observations of diabetes routine consultations.
Initial steps towards a production platform for DNA sequence analysis on the grid.
Luyf, Angela C M; van Schaik, Barbera D C; de Vries, Michel; Baas, Frank; van Kampen, Antoine H C; Olabarriaga, Silvia D
2010-12-14
Bioinformatics is confronted with a new data explosion due to the availability of high throughput DNA sequencers. Data storage and analysis becomes a problem on local servers, and therefore it is needed to switch to other IT infrastructures. Grid and workflow technology can help to handle the data more efficiently, as well as facilitate collaborations. However, interfaces to grids are often unfriendly to novice users. In this study we reused a platform that was developed in the VL-e project for the analysis of medical images. Data transfer, workflow execution and job monitoring are operated from one graphical interface. We developed workflows for two sequence alignment tools (BLAST and BLAT) as a proof of concept. The analysis time was significantly reduced. All workflows and executables are available for the members of the Dutch Life Science Grid and the VL-e Medical virtual organizations All components are open source and can be transported to other grid infrastructures. The availability of in-house expertise and tools facilitates the usage of grid resources by new users. Our first results indicate that this is a practical, powerful and scalable solution to address the capacity and collaboration issues raised by the deployment of next generation sequencers. We currently adopt this methodology on a daily basis for DNA sequencing and other applications. More information and source code is available via http://www.bioinformaticslaboratory.nl/
A Community-Driven Workflow Recommendations and Reuse Infrastructure
NASA Astrophysics Data System (ADS)
Zhang, J.; Votava, P.; Lee, T. J.; Lee, C.; Xiao, S.; Nemani, R. R.; Foster, I.
2013-12-01
Aiming to connect the Earth science community to accelerate the rate of discovery, NASA Earth Exchange (NEX) has established an online repository and platform, so that researchers can publish and share their tools and models with colleagues. In recent years, workflow has become a popular technique at NEX for Earth scientists to define executable multi-step procedures for data processing and analysis. The ability to discover and reuse knowledge (sharable workflows or workflow) is critical to the future advancement of science. However, as reported in our earlier study, the reusability of scientific artifacts at current time is very low. Scientists often do not feel confident in using other researchers' tools and utilities. One major reason is that researchers are often unaware of the existence of others' data preprocessing processes. Meanwhile, researchers often do not have time to fully document the processes and expose them to others in a standard way. These issues cannot be overcome by the existing workflow search technologies used in NEX and other data projects. Therefore, this project aims to develop a proactive recommendation technology based on collective NEX user behaviors. In this way, we aim to promote and encourage process and workflow reuse within NEX. Particularly, we focus on leveraging peer scientists' best practices to support the recommendation of artifacts developed by others. Our underlying theoretical foundation is rooted in the social cognitive theory, which declares people learn by watching what others do. Our fundamental hypothesis is that sharable artifacts have network properties, much like humans in social networks. More generally, reusable artifacts form various types of social relationships (ties), and may be viewed as forming what organizational sociologists who use network analysis to study human interactions call a 'knowledge network.' In particular, we will tackle two research questions: R1: What hidden knowledge may be extracted from usage history to help Earth scientists better understand existing artifacts and how to use them in a proper manner? R2: Informed by insights derived from their computing contexts, how could such hidden knowledge be used to facilitate artifact reuse by Earth scientists? Our study of the two research questions will provide answers to three technical questions aiming to assist NEX users during workflow development: 1) How to determine what topics interest the researcher? 2) How to find appropriate artifacts? and 3) How to advise the researcher in artifact reuse? In this paper, we report our on-going efforts of leveraging social networking theory and analysis techniques to provide dynamic advice on artifact reuse to NEX users based on their surrounding contexts. As a proof of concept, we have designed and developed a plug-in to the VisTrails workflow design tool. When users develop workflows using VisTrails, our plug-in will proactively recommend most relevant sub-workflows to the users.
NASA Astrophysics Data System (ADS)
Harris, A. T.; Ramachandran, R.; Maskey, M.
2013-12-01
The Exelis-developed IDL and ENVI software are ubiquitous tools in Earth science research environments. The IDL Workbench is used by the Earth science community for programming custom data analysis and visualization modules. ENVI is a software solution for processing and analyzing geospatial imagery that combines support for multiple Earth observation scientific data types (optical, thermal, multi-spectral, hyperspectral, SAR, LiDAR) with advanced image processing and analysis algorithms. The ENVI & IDL Services Engine (ESE) is an Earth science data processing engine that allows researchers to use open standards to rapidly create, publish and deploy advanced Earth science data analytics within any existing enterprise infrastructure. Although powerful in many ways, the tools lack collaborative features out-of-box. Thus, as part of the NASA funded project, Collaborative Workbench to Accelerate Science Algorithm Development, researchers at the University of Alabama in Huntsville and Exelis have developed plugins that allow seamless research collaboration from within IDL workbench. Such additional features within IDL workbench are possible because IDL workbench is built using the Eclipse Rich Client Platform (RCP). RCP applications allow custom plugins to be dropped in for extended functionalities. Specific functionalities of the plugins include creating complex workflows based on IDL application source code, submitting workflows to be executed by ESE in the cloud, and sharing and cloning of workflows among collaborators. All these functionalities are available to scientists without leaving their IDL workbench. Because ESE can interoperate with any middleware, scientific programmers can readily string together IDL processing tasks (or tasks written in other languages like C++, Java or Python) to create complex workflows for deployment within their current enterprise architecture (e.g. ArcGIS Server, GeoServer, Apache ODE or SciFlo from JPL). Using the collaborative IDL Workbench, coupled with ESE for execution in the cloud, asynchronous workflows could be executed in batch mode on large data in the cloud. We envision that a scientist will initially develop a scientific workflow locally on a small set of data. Once tested, the scientist will deploy the workflow to the cloud for execution. Depending on the results, the scientist may share the workflow and results, allowing them to be stored in a community catalog and instantly loaded into the IDL Workbench of other scientists. Thereupon, scientists can clone and modify or execute the workflow with different input parameters. The Collaborative Workbench will provide a platform for collaboration in the cloud, helping Earth scientists solve big-data problems in the Earth and planetary sciences.
NASA Astrophysics Data System (ADS)
Okaya, D.; Deelman, E.; Maechling, P.; Wong-Barnum, M.; Jordan, T. H.; Meyers, D.
2007-12-01
Large scientific collaborations, such as the SCEC Petascale Cyberfacility for Physics-based Seismic Hazard Analysis (PetaSHA) Project, involve interactions between many scientists who exchange ideas and research results. These groups must organize, manage, and make accessible their community materials of observational data, derivative (research) results, computational products, and community software. The integration of scientific workflows as a paradigm to solve complex computations provides advantages of efficiency, reliability, repeatability, choices, and ease of use. The underlying resource needed for a scientific workflow to function and create discoverable and exchangeable products is the construction, tracking, and preservation of metadata. In the scientific workflow environment there is a two-tier structure of metadata. Workflow-level metadata and provenance describe operational steps, identity of resources, execution status, and product locations and names. Domain-level metadata essentially define the scientific meaning of data, codes and products. To a large degree the metadata at these two levels are separate. However, between these two levels is a subset of metadata produced at one level but is needed by the other. This crossover metadata suggests that some commonality in metadata handling is needed. SCEC researchers are collaborating with computer scientists at SDSC, the USC Information Sciences Institute, and Carnegie Mellon Univ. in order to perform earthquake science using high-performance computational resources. A primary objective of the "PetaSHA" collaboration is to perform physics-based estimations of strong ground motion associated with real and hypothetical earthquakes located within Southern California. Construction of 3D earth models, earthquake representations, and numerical simulation of seismic waves are key components of these estimations. Scientific workflows are used to orchestrate the sequences of scientific tasks and to access distributed computational facilities such as the NSF TeraGrid. Different types of metadata are produced and captured within the scientific workflows. One workflow within PetaSHA ("Earthworks") performs a linear sequence of tasks with workflow and seismological metadata preserved. Downstream scientific codes ingest these metadata produced by upstream codes. The seismological metadata uses attribute-value pairing in plain text; an identified need is to use more advanced handling methods. Another workflow system within PetaSHA ("Cybershake") involves several complex workflows in order to perform statistical analysis of ground shaking due to thousands of hypothetical but plausible earthquakes. Metadata management has been challenging due to its construction around a number of legacy scientific codes. We describe difficulties arising in the scientific workflow due to the lack of this metadata and suggest corrective steps, which in some cases include the cultural shift of domain science programmers coding for metadata.
A Bioinformatics Workflow for Variant Peptide Detection in Shotgun Proteomics*
Li, Jing; Su, Zengliu; Ma, Ze-Qiang; Slebos, Robbert J. C.; Halvey, Patrick; Tabb, David L.; Liebler, Daniel C.; Pao, William; Zhang, Bing
2011-01-01
Shotgun proteomics data analysis usually relies on database search. However, commonly used protein sequence databases do not contain information on protein variants and thus prevent variant peptides and proteins from been identified. Including known coding variations into protein sequence databases could help alleviate this problem. Based on our recently published human Cancer Proteome Variation Database, we have created a protein sequence database that comprehensively annotates thousands of cancer-related coding variants collected in the Cancer Proteome Variation Database as well as noncancer-specific ones from the Single Nucleotide Polymorphism Database (dbSNP). Using this database, we then developed a data analysis workflow for variant peptide identification in shotgun proteomics. The high risk of false positive variant identifications was addressed by a modified false discovery rate estimation method. Analysis of colorectal cancer cell lines SW480, RKO, and HCT-116 revealed a total of 81 peptides that contain either noncancer-specific or cancer-related variations. Twenty-three out of 26 variants randomly selected from the 81 were confirmed by genomic sequencing. We further applied the workflow on data sets from three individual colorectal tumor specimens. A total of 204 distinct variant peptides were detected, and five carried known cancer-related mutations. Each individual showed a specific pattern of cancer-related mutations, suggesting potential use of this type of information for personalized medicine. Compatibility of the workflow has been tested with four popular database search engines including Sequest, Mascot, X!Tandem, and MyriMatch. In summary, we have developed a workflow that effectively uses existing genomic data to enable variant peptide detection in proteomics. PMID:21389108
From the desktop to the grid: scalable bioinformatics via workflow conversion.
de la Garza, Luis; Veit, Johannes; Szolek, Andras; Röttig, Marc; Aiche, Stephan; Gesing, Sandra; Reinert, Knut; Kohlbacher, Oliver
2016-03-12
Reproducibility is one of the tenets of the scientific method. Scientific experiments often comprise complex data flows, selection of adequate parameters, and analysis and visualization of intermediate and end results. Breaking down the complexity of such experiments into the joint collaboration of small, repeatable, well defined tasks, each with well defined inputs, parameters, and outputs, offers the immediate benefit of identifying bottlenecks, pinpoint sections which could benefit from parallelization, among others. Workflows rest upon the notion of splitting complex work into the joint effort of several manageable tasks. There are several engines that give users the ability to design and execute workflows. Each engine was created to address certain problems of a specific community, therefore each one has its advantages and shortcomings. Furthermore, not all features of all workflow engines are royalty-free -an aspect that could potentially drive away members of the scientific community. We have developed a set of tools that enables the scientific community to benefit from workflow interoperability. We developed a platform-free structured representation of parameters, inputs, outputs of command-line tools in so-called Common Tool Descriptor documents. We have also overcome the shortcomings and combined the features of two royalty-free workflow engines with a substantial user community: the Konstanz Information Miner, an engine which we see as a formidable workflow editor, and the Grid and User Support Environment, a web-based framework able to interact with several high-performance computing resources. We have thus created a free and highly accessible way to design workflows on a desktop computer and execute them on high-performance computing resources. Our work will not only reduce time spent on designing scientific workflows, but also make executing workflows on remote high-performance computing resources more accessible to technically inexperienced users. We strongly believe that our efforts not only decrease the turnaround time to obtain scientific results but also have a positive impact on reproducibility, thus elevating the quality of obtained scientific results.
geoKepler Workflow Module for Computationally Scalable and Reproducible Geoprocessing and Modeling
NASA Astrophysics Data System (ADS)
Cowart, C.; Block, J.; Crawl, D.; Graham, J.; Gupta, A.; Nguyen, M.; de Callafon, R.; Smarr, L.; Altintas, I.
2015-12-01
The NSF-funded WIFIRE project has developed an open-source, online geospatial workflow platform for unifying geoprocessing tools and models for for fire and other geospatially dependent modeling applications. It is a product of WIFIRE's objective to build an end-to-end cyberinfrastructure for real-time and data-driven simulation, prediction and visualization of wildfire behavior. geoKepler includes a set of reusable GIS components, or actors, for the Kepler Scientific Workflow System (https://kepler-project.org). Actors exist for reading and writing GIS data in formats such as Shapefile, GeoJSON, KML, and using OGC web services such as WFS. The actors also allow for calling geoprocessing tools in other packages such as GDAL and GRASS. Kepler integrates functions from multiple platforms and file formats into one framework, thus enabling optimal GIS interoperability, model coupling, and scalability. Products of the GIS actors can be fed directly to models such as FARSITE and WRF. Kepler's ability to schedule and scale processes using Hadoop and Spark also makes geoprocessing ultimately extensible and computationally scalable. The reusable workflows in geoKepler can be made to run automatically when alerted by real-time environmental conditions. Here, we show breakthroughs in the speed of creating complex data for hazard assessments with this platform. We also demonstrate geoKepler workflows that use Data Assimilation to ingest real-time weather data into wildfire simulations, and for data mining techniques to gain insight into environmental conditions affecting fire behavior. Existing machine learning tools and libraries such as R and MLlib are being leveraged for this purpose in Kepler, as well as Kepler's Distributed Data Parallel (DDP) capability to provide a framework for scalable processing. geoKepler workflows can be executed via an iPython notebook as a part of a Jupyter hub at UC San Diego for sharing and reporting of the scientific analysis and results from various runs of geoKepler workflows. The communication between iPython and Kepler workflow executions is established through an iPython magic function for Kepler that we have implemented. In summary, geoKepler is an ecosystem that makes geospatial processing and analysis of any kind programmable, reusable, scalable and sharable.
Addressing informatics challenges in Translational Research with workflow technology.
Beaulah, Simon A; Correll, Mick A; Munro, Robin E J; Sheldon, Jonathan G
2008-09-01
Interest in Translational Research has been growing rapidly in recent years. In this collision of different data, technologies and cultures lie tremendous opportunities for the advancement of science and business for organisations that are able to integrate, analyse and deliver this information effectively to users. Workflow-based integration and analysis systems are becoming recognised as a fast and flexible way to build applications that are tailored to scientific areas, yet are built on a common platform. Workflow systems are allowing organisations to meet the key informatics challenges in Translational Research and improve disease understanding and patient care.
NASA Astrophysics Data System (ADS)
Sun, Z.; Cao, Y. K.
2015-08-01
The paper focuses on the versatility of data processing workflows ranging from BIM-based survey to structural analysis and reverse modeling. In China nowadays, a large number of historic architecture are in need of restoration, reinforcement and renovation. But the architects are not prepared for the conversion from the booming AEC industry to architectural preservation. As surveyors working with architects in such projects, we have to develop efficient low-cost digital survey workflow robust to various types of architecture, and to process the captured data for architects. Although laser scanning yields high accuracy in architectural heritage documentation and the workflow is quite straightforward, the cost and portability hinder it from being used in projects where budget and efficiency are of prime concern. We integrate Structure from Motion techniques with UAV and total station in data acquisition. The captured data is processed for various purposes illustrated with three case studies: the first one is as-built BIM for a historic building based on registered point clouds according to Ground Control Points; The second one concerns structural analysis for a damaged bridge using Finite Element Analysis software; The last one relates to parametric automated feature extraction from captured point clouds for reverse modeling and fabrication.
Joda, Tim; Brägger, Urs
2015-01-01
To compare time-efficiency in the production of implant crowns using a digital workflow versus the conventional pathway. This prospective clinical study used a crossover design that included 20 study participants receiving single-tooth replacements in posterior sites. Each patient received a customized titanium abutment plus a computer-aided design/computer-assisted manufacture (CAD/CAM) zirconia suprastructure (for those in the test group, using digital workflow) and a standardized titanium abutment plus a porcelain-fused-to-metal crown (for those in the control group, using a conventional pathway). The start of the implant prosthetic treatment was established as the baseline. Time-efficiency analysis was defined as the primary outcome, and was measured for every single clinical and laboratory work step in minutes. Statistical analysis was calculated with the Wilcoxon rank sum test. All crowns could be provided within two clinical appointments, independent of the manufacturing process. The mean total production time, as the sum of clinical plus laboratory work steps, was significantly different. The mean ± standard deviation (SD) time was 185.4 ± 17.9 minutes for the digital workflow process and 223.0 ± 26.2 minutes for the conventional pathway (P = .0001). Therefore, digital processing for overall treatment was 16% faster. Detailed analysis for the clinical treatment revealed a significantly reduced mean ± SD chair time of 27.3 ± 3.4 minutes for the test group compared with 33.2 ± 4.9 minutes for the control group (P = .0001). Similar results were found for the mean laboratory work time, with a significant decrease of 158.1 ± 17.2 minutes for the test group vs 189.8 ± 25.3 minutes for the control group (P = .0001). Only a few studies have investigated efficiency parameters of digital workflows compared with conventional pathways in implant dental medicine. This investigation shows that the digital workflow seems to be more time-efficient than the established conventional production pathway for fixed implant-supported crowns. Both clinical chair time and laboratory manufacturing steps could be effectively shortened with the digital process of intraoral scanning plus CAD/CAM technology.
Ramus, Claire; Hovasse, Agnès; Marcellin, Marlène; Hesse, Anne-Marie; Mouton-Barbosa, Emmanuelle; Bouyssié, David; Vaca, Sebastian; Carapito, Christine; Chaoui, Karima; Bruley, Christophe; Garin, Jérôme; Cianférani, Sarah; Ferro, Myriam; Van Dorssaeler, Alain; Burlet-Schiltz, Odile; Schaeffer, Christine; Couté, Yohann; Gonzalez de Peredo, Anne
2016-01-30
Proteomic workflows based on nanoLC-MS/MS data-dependent-acquisition analysis have progressed tremendously in recent years. High-resolution and fast sequencing instruments have enabled the use of label-free quantitative methods, based either on spectral counting or on MS signal analysis, which appear as an attractive way to analyze differential protein expression in complex biological samples. However, the computational processing of the data for label-free quantification still remains a challenge. Here, we used a proteomic standard composed of an equimolar mixture of 48 human proteins (Sigma UPS1) spiked at different concentrations into a background of yeast cell lysate to benchmark several label-free quantitative workflows, involving different software packages developed in recent years. This experimental design allowed to finely assess their performances in terms of sensitivity and false discovery rate, by measuring the number of true and false-positive (respectively UPS1 or yeast background proteins found as differential). The spiked standard dataset has been deposited to the ProteomeXchange repository with the identifier PXD001819 and can be used to benchmark other label-free workflows, adjust software parameter settings, improve algorithms for extraction of the quantitative metrics from raw MS data, or evaluate downstream statistical methods. Bioinformatic pipelines for label-free quantitative analysis must be objectively evaluated in their ability to detect variant proteins with good sensitivity and low false discovery rate in large-scale proteomic studies. This can be done through the use of complex spiked samples, for which the "ground truth" of variant proteins is known, allowing a statistical evaluation of the performances of the data processing workflow. We provide here such a controlled standard dataset and used it to evaluate the performances of several label-free bioinformatics tools (including MaxQuant, Skyline, MFPaQ, IRMa-hEIDI and Scaffold) in different workflows, for detection of variant proteins with different absolute expression levels and fold change values. The dataset presented here can be useful for tuning software tool parameters, and also testing new algorithms for label-free quantitative analysis, or for evaluation of downstream statistical methods. Copyright © 2015 Elsevier B.V. All rights reserved.
KDE Bioscience: platform for bioinformatics analysis workflows.
Lu, Qiang; Hao, Pei; Curcin, Vasa; He, Weizhong; Li, Yuan-Yuan; Luo, Qing-Ming; Guo, Yi-Ke; Li, Yi-Xue
2006-08-01
Bioinformatics is a dynamic research area in which a large number of algorithms and programs have been developed rapidly and independently without much consideration so far of the need for standardization. The lack of such common standards combined with unfriendly interfaces make it difficult for biologists to learn how to use these tools and to translate the data formats from one to another. Consequently, the construction of an integrative bioinformatics platform to facilitate biologists' research is an urgent and challenging task. KDE Bioscience is a java-based software platform that collects a variety of bioinformatics tools and provides a workflow mechanism to integrate them. Nucleotide and protein sequences from local flat files, web sites, and relational databases can be entered, annotated, and aligned. Several home-made or 3rd-party viewers are built-in to provide visualization of annotations or alignments. KDE Bioscience can also be deployed in client-server mode where simultaneous execution of the same workflow is supported for multiple users. Moreover, workflows can be published as web pages that can be executed from a web browser. The power of KDE Bioscience comes from the integrated algorithms and data sources. With its generic workflow mechanism other novel calculations and simulations can be integrated to augment the current sequence analysis functions. Because of this flexible and extensible architecture, KDE Bioscience makes an ideal integrated informatics environment for future bioinformatics or systems biology research.
pyNSMC: A Python Module for Null-Space Monte Carlo Uncertainty Analysis
NASA Astrophysics Data System (ADS)
White, J.; Brakefield, L. K.
2015-12-01
The null-space monte carlo technique is a non-linear uncertainty analyses technique that is well-suited to high-dimensional inverse problems. While the technique is powerful, the existing workflow for completing null-space monte carlo is cumbersome, requiring the use of multiple commandline utilities, several sets of intermediate files and even a text editor. pyNSMC is an open-source python module that automates the workflow of null-space monte carlo uncertainty analyses. The module is fully compatible with the PEST and PEST++ software suites and leverages existing functionality of pyEMU, a python framework for linear-based uncertainty analyses. pyNSMC greatly simplifies the existing workflow for null-space monte carlo by taking advantage of object oriented design facilities in python. The core of pyNSMC is the ensemble class, which draws and stores realized random vectors and also provides functionality for exporting and visualizing results. By relieving users of the tedium associated with file handling and command line utility execution, pyNSMC instead focuses the user on the important steps and assumptions of null-space monte carlo analysis. Furthermore, pyNSMC facilitates learning through flow charts and results visualization, which are available at many points in the algorithm. The ease-of-use of the pyNSMC workflow is compared to the existing workflow for null-space monte carlo for a synthetic groundwater model with hundreds of estimable parameters.
Galaxy tools and workflows for sequence analysis with applications in molecular plant pathology.
Cock, Peter J A; Grüning, Björn A; Paszkiewicz, Konrad; Pritchard, Leighton
2013-01-01
The Galaxy Project offers the popular web browser-based platform Galaxy for running bioinformatics tools and constructing simple workflows. Here, we present a broad collection of additional Galaxy tools for large scale analysis of gene and protein sequences. The motivating research theme is the identification of specific genes of interest in a range of non-model organisms, and our central example is the identification and prediction of "effector" proteins produced by plant pathogens in order to manipulate their host plant. This functional annotation of a pathogen's predicted capacity for virulence is a key step in translating sequence data into potential applications in plant pathology. This collection includes novel tools, and widely-used third-party tools such as NCBI BLAST+ wrapped for use within Galaxy. Individual bioinformatics software tools are typically available separately as standalone packages, or in online browser-based form. The Galaxy framework enables the user to combine these and other tools to automate organism scale analyses as workflows, without demanding familiarity with command line tools and scripting. Workflows created using Galaxy can be saved and are reusable, so may be distributed within and between research groups, facilitating the construction of a set of standardised, reusable bioinformatic protocols. The Galaxy tools and workflows described in this manuscript are open source and freely available from the Galaxy Tool Shed (http://usegalaxy.org/toolshed or http://toolshed.g2.bx.psu.edu).
Challenges to EHR implementation in electronic- versus paper-based office practices.
Zandieh, Stephanie O; Yoon-Flannery, Kahyun; Kuperman, Gilad J; Langsam, Daniel J; Hyman, Daniel; Kaushal, Rainu
2008-06-01
Challenges in implementing electronic health records (EHRs) have received some attention, but less is known about the process of transitioning from legacy EHRs to newer systems. To determine how ambulatory leaders differentiate implementation approaches between practices that are currently paper-based and those with a legacy EHR system (EHR-based). Qualitative study. Eleven practice managers and 12 medical directors all part of an academic ambulatory care network of a large teaching hospital in New York City in January to May of 2006. Qualitative approach comparing and contrasting perceived benefits and challenges in implementing an ambulatory EHR between practice leaders from paper- and EHR-based practices. Content analysis was performed using grounded theory and ATLAS.ti 5.0. We found that paper-based leaders prioritized the following: sufficient workstations and printers, a physician information technology (IT) champion at the practice, workflow education to ensure a successful transition to a paperless medical practice, and a high existing comfort level of practitioners and support staff with IT. In contrast, EHR-based leaders prioritized: improved technical training and ongoing technical support, sufficient protection of patient privacy, and open recognition of physician resistance, especially for those who were loyal to a legacy EHR. Unlike paper-based practices, EHR-based leadership believed that comfort level with IT and adjustments to workflow changes would not be difficult challenges to overcome. Leadership at paper- and EHR-based practices in 1 academic network has different priorities for implementing a new EHR. Ambulatory practices upgrading their legacy EHR have unique challenges.
Repeat rates in digital chest radiography and strategies for improvement.
Fintelmann, Florian; Pulli, Benjamin; Abedi-Tari, Faezeh; Trombley, Maureen; Shore, Mary-Theresa; Shepard, Jo-Anne; Rosenthal, Daniel I
2012-05-01
To determine the repeat rate (RR) of chest radiographs acquired with portable computed radiography (CR) and installed direct radiography (DR) and to develop and assess strategies designed to decrease the RR. The RR and reasons for repeated digital chest radiographs were documented over the course of 16 months while a task force of thoracic radiologists, technologist supervisors, technologists, and information technology specialists continued to examine the workflow for underlying causes. Interventions decreasing the RR were designed and implemented. The initial RR of digital chest radiographs was 3.6% (138/3818) for portable CR and 13.3% (476/3575) for installed DR systems. By combining RR measurement with workflow analysis, targets for technical and teaching interventions were identified. The interventions decreased the RR to 1.8% (81/4476) for portable CR and to 8.2% (306/3748) for installed DR. We found the RR of direct digital chest radiography to be significantly higher than that of computed chest radiography. We believe this is due to the ease with which repeat images can be obtained and discarded, and it suggests the need for ongoing surveillance of RR. We were able to demonstrate that strategies to lower the RR, which had been developed in the era of film-based imaging, can be adapted to the digital environment. On the basis of our findings, we encourage radiologists to assess their own departmental RRs for direct digital chest radiography and to consider similar interventions if necessary to achieve acceptable RRs for this modality.
Decaf: Decoupled Dataflows for In Situ High-Performance Workflows
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dreher, M.; Peterka, T.
Decaf is a dataflow system for the parallel communication of coupled tasks in an HPC workflow. The dataflow can perform arbitrary data transformations ranging from simply forwarding data to complex data redistribution. Decaf does this by allowing the user to allocate resources and execute custom code in the dataflow. All communication through the dataflow is efficient parallel message passing over MPI. The runtime for calling tasks is entirely message-driven; Decaf executes a task when all messages for the task have been received. Such a messagedriven runtime allows cyclic task dependencies in the workflow graph, for example, to enact computational steeringmore » based on the result of downstream tasks. Decaf includes a simple Python API for describing the workflow graph. This allows Decaf to stand alone as a complete workflow system, but Decaf can also be used as the dataflow layer by one or more other workflow systems to form a heterogeneous task-based computing environment. In one experiment, we couple a molecular dynamics code with a visualization tool using the FlowVR and Damaris workflow systems and Decaf for the dataflow. In another experiment, we test the coupling of a cosmology code with Voronoi tessellation and density estimation codes using MPI for the simulation, the DIY programming model for the two analysis codes, and Decaf for the dataflow. Such workflows consisting of heterogeneous software infrastructures exist because components are developed separately with different programming models and runtimes, and this is the first time that such heterogeneous coupling of diverse components was demonstrated in situ on HPC systems.« less
Structured recording of intraoperative surgical workflows
NASA Astrophysics Data System (ADS)
Neumuth, T.; Durstewitz, N.; Fischer, M.; Strauss, G.; Dietz, A.; Meixensberger, J.; Jannin, P.; Cleary, K.; Lemke, H. U.; Burgert, O.
2006-03-01
Surgical Workflows are used for the methodical and scientific analysis of surgical interventions. The approach described here is a step towards developing surgical assist systems based on Surgical Workflows and integrated control systems for the operating room of the future. This paper describes concepts and technologies for the acquisition of Surgical Workflows by monitoring surgical interventions and their presentation. Establishing systems which support the Surgical Workflow in operating rooms requires a multi-staged development process beginning with the description of these workflows. A formalized description of surgical interventions is needed to create a Surgical Workflow. This description can be used to analyze and evaluate surgical interventions in detail. We discuss the subdivision of surgical interventions into work steps regarding different levels of granularity and propose a recording scheme for the acquisition of manual surgical work steps from running interventions. To support the recording process during the intervention, we introduce a new software architecture. Core of the architecture is our Surgical Workflow editor that is intended to deal with the manifold, complex and concurrent relations during an intervention. Furthermore, a method for an automatic generation of graphs is shown which is able to display the recorded surgical work steps of the interventions. Finally we conclude with considerations about extensions of our recording scheme to close the gap to S-PACS systems. The approach was used to record 83 surgical interventions from 6 intervention types from 3 different surgical disciplines: ENT surgery, neurosurgery and interventional radiology. The interventions were recorded at the University Hospital Leipzig, Germany and at the Georgetown University Hospital, Washington, D.C., USA.
SU-E-T-419: Workflow and FMEA in a New Proton Therapy (PT) Facility
DOE Office of Scientific and Technical Information (OSTI.GOV)
Cheng, C; Wessels, B; Hamilton, H
2014-06-01
Purpose: Workflow is an important component in the operational planning of a new proton facility. By integrating the concept of failure mode and effect analysis (FMEA) and traditional QA requirements, a workflow for a proton therapy treatment course is set up. This workflow serves as the blue print for the planning of computer hardware/software requirements and network flow. A slight modification of the workflow generates a process map(PM) for FMEA and the planning of QA program in PT. Methods: A flowchart is first developed outlining the sequence of processes involved in a PT treatment course. Each process consists of amore » number of sub-processes to encompass a broad scope of treatment and QA procedures. For each subprocess, the personnel involved, the equipment needed and the computer hardware/software as well as network requirements are defined by a team of clinical staff, administrators and IT personnel. Results: Eleven intermediate processes with a total of 70 sub-processes involved in a PT treatment course are identified. The number of sub-processes varies, ranging from 2-12. The sub-processes within each process are used for the operational planning. For example, in the CT-Sim process, there are 12 sub-processes: three involve data entry/retrieval from a record-and-verify system, two controlled by the CT computer, two require department/hospital network, and the other five are setup procedures. IT then decides the number of computers needed and the software and network requirement. By removing the traditional QA procedures from the workflow, a PM is generated for FMEA analysis to design a QA program for PT. Conclusion: Significant efforts are involved in the development of the workflow in a PT treatment course. Our hybrid model of combining FMEA and traditional QA program serves a duo purpose of efficient operational planning and designing of a QA program in PT.« less
Hartman, Amber L; Riddle, Sean; McPhillips, Timothy; Ludäscher, Bertram; Eisen, Jonathan A
2010-06-12
For more than two decades microbiologists have used a highly conserved microbial gene as a phylogenetic marker for bacteria and archaea. The small-subunit ribosomal RNA gene, also known as 16 S rRNA, is encoded by ribosomal DNA, 16 S rDNA, and has provided a powerful comparative tool to microbial ecologists. Over time, the microbial ecology field has matured from small-scale studies in a select number of environments to massive collections of sequence data that are paired with dozens of corresponding collection variables. As the complexity of data and tool sets have grown, the need for flexible automation and maintenance of the core processes of 16 S rDNA sequence analysis has increased correspondingly. We present WATERS, an integrated approach for 16 S rDNA analysis that bundles a suite of publicly available 16 S rDNA analysis software tools into a single software package. The "toolkit" includes sequence alignment, chimera removal, OTU determination, taxonomy assignment, phylogentic tree construction as well as a host of ecological analysis and visualization tools. WATERS employs a flexible, collection-oriented 'workflow' approach using the open-source Kepler system as a platform. By packaging available software tools into a single automated workflow, WATERS simplifies 16 S rDNA analyses, especially for those without specialized bioinformatics, programming expertise. In addition, WATERS, like some of the newer comprehensive rRNA analysis tools, allows researchers to minimize the time dedicated to carrying out tedious informatics steps and to focus their attention instead on the biological interpretation of the results. One advantage of WATERS over other comprehensive tools is that the use of the Kepler workflow system facilitates result interpretation and reproducibility via a data provenance sub-system. Furthermore, new "actors" can be added to the workflow as desired and we see WATERS as an initial seed for a sizeable and growing repository of interoperable, easy-to-combine tools for asking increasingly complex microbial ecology questions.
MAAMD: a workflow to standardize meta-analyses and comparison of affymetrix microarray data
2014-01-01
Background Mandatory deposit of raw microarray data files for public access, prior to study publication, provides significant opportunities to conduct new bioinformatics analyses within and across multiple datasets. Analysis of raw microarray data files (e.g. Affymetrix CEL files) can be time consuming, complex, and requires fundamental computational and bioinformatics skills. The development of analytical workflows to automate these tasks simplifies the processing of, improves the efficiency of, and serves to standardize multiple and sequential analyses. Once installed, workflows facilitate the tedious steps required to run rapid intra- and inter-dataset comparisons. Results We developed a workflow to facilitate and standardize Meta-Analysis of Affymetrix Microarray Data analysis (MAAMD) in Kepler. Two freely available stand-alone software tools, R and AltAnalyze were embedded in MAAMD. The inputs of MAAMD are user-editable csv files, which contain sample information and parameters describing the locations of input files and required tools. MAAMD was tested by analyzing 4 different GEO datasets from mice and drosophila. MAAMD automates data downloading, data organization, data quality control assesment, differential gene expression analysis, clustering analysis, pathway visualization, gene-set enrichment analysis, and cross-species orthologous-gene comparisons. MAAMD was utilized to identify gene orthologues responding to hypoxia or hyperoxia in both mice and drosophila. The entire set of analyses for 4 datasets (34 total microarrays) finished in ~ one hour. Conclusions MAAMD saves time, minimizes the required computer skills, and offers a standardized procedure for users to analyze microarray datasets and make new intra- and inter-dataset comparisons. PMID:24621103
2008-07-01
Mass Spectrometry Data in Workflows for the Discovery of Biomarkets in Breast Cancer PRINCIPAL INVESTIGATOR: Vladimir Fokin, Ph.D... Biomarkets in Breast Cancer 5b. GRANT NUMBER W81XWH-07-1-0447 5c. PROGRAM ELEMENT NUMBER 6. AUTHOR(S) 5d. PROJECT NUMBER Vladimir Fokin, Ph.D
NASA Astrophysics Data System (ADS)
Memon, Shahbaz; Vallot, Dorothée; Zwinger, Thomas; Neukirchen, Helmut
2017-04-01
Scientific communities generate complex simulations through orchestration of semi-structured analysis pipelines which involves execution of large workflows on multiple, distributed and heterogeneous computing and data resources. Modeling ice dynamics of glaciers requires workflows consisting of many non-trivial, computationally expensive processing tasks which are coupled to each other. From this domain, we present an e-Science use case, a workflow, which requires the execution of a continuum ice flow model and a discrete element based calving model in an iterative manner. Apart from the execution, this workflow also contains data format conversion tasks that support the execution of ice flow and calving by means of transition through sequential, nested and iterative steps. Thus, the management and monitoring of all the processing tasks including data management and transfer of the workflow model becomes more complex. From the implementation perspective, this workflow model was initially developed on a set of scripts using static data input and output references. In the course of application usage when more scripts or modifications introduced as per user requirements, the debugging and validation of results were more cumbersome to achieve. To address these problems, we identified a need to have a high-level scientific workflow tool through which all the above mentioned processes can be achieved in an efficient and usable manner. We decided to make use of the e-Science middleware UNICORE (Uniform Interface to Computing Resources) that allows seamless and automated access to different heterogenous and distributed resources which is supported by a scientific workflow engine. Based on this, we developed a high-level scientific workflow model for coupling of massively parallel High-Performance Computing (HPC) jobs: a continuum ice sheet model (Elmer/Ice) and a discrete element calving and crevassing model (HiDEM). In our talk we present how the use of a high-level scientific workflow middleware enables reproducibility of results more convenient and also provides a reusable and portable workflow template that can be deployed across different computing infrastructures. Acknowledgements This work was kindly supported by NordForsk as part of the Nordic Center of Excellence (NCoE) eSTICC (eScience Tools for Investigating Climate Change at High Northern Latitudes) and the Top-level Research Initiative NCoE SVALI (Stability and Variation of Arctic Land Ice).
An integrated workflow for analysis of ChIP-chip data.
Weigelt, Karin; Moehle, Christoph; Stempfl, Thomas; Weber, Bernhard; Langmann, Thomas
2008-08-01
Although ChIP-chip is a powerful tool for genome-wide discovery of transcription factor target genes, the steps involving raw data analysis, identification of promoters, and correlation with binding sites are still laborious processes. Therefore, we report an integrated workflow for the analysis of promoter tiling arrays with the Genomatix ChipInspector system. We compare this tool with open-source software packages to identify PU.1 regulated genes in mouse macrophages. Our results suggest that ChipInspector data analysis, comparative genomics for binding site prediction, and pathway/network modeling significantly facilitate and enhance whole-genome promoter profiling to reveal in vivo sites of transcription factor-DNA interactions.
Lee, Matthew H; Schemmel, Andrew J; Pooler, B Dustin; Hanley, Taylor; Kennedy, Tabassum; Field, Aaron; Wiegmann, Douglas; Yu, John-Paul J
2017-04-01
The study aimed to assess perceptions of reading room workflow and the impact separating image-interpretive and nonimage-interpretive task workflows can have on radiologist perceptions of workplace disruptions, workload, and overall satisfaction. A 14-question survey instrument was developed to measure radiologist perceptions of workplace interruptions, satisfaction, and workload prior to and following implementation of separate image-interpretive and nonimage-interpretive reading room workflows. The results were collected over 2 weeks preceding the intervention and 2 weeks following the end of the intervention. The results were anonymized and analyzed using univariate analysis. A total of 18 people responded to the preintervention survey: 6 neuroradiology fellows and 12 attending neuroradiologists. Fifteen people who were then present for the 1-month intervention period responded to the postintervention survey. Perceptions of workplace disruptions, image interpretation, quality of trainee education, ability to perform nonimage-interpretive tasks, and quality of consultations (P < 0.0001) all improved following the intervention. Mental effort and workload also improved across all assessment domains, as did satisfaction with quality of image interpretation and consultative work. Implementation of parallel dedicated image-interpretive and nonimage-interpretive workflows may improve markers of radiologist perceptions of workplace satisfaction. Copyright © 2017 The Association of University Radiologists. Published by Elsevier Inc. All rights reserved.
You and me and the computer makes three: variations in exam room use of the electronic health record
Saleem, Jason J; Flanagan, Mindy E; Russ, Alissa L; McMullen, Carmit K; Elli, Leora; Russell, Scott A; Bennett, Katelyn J; Matthias, Marianne S; Rehman, Shakaib U; Schwartz, Mark D; Frankel, Richard M
2014-01-01
Challenges persist on how to effectively integrate the electronic health record (EHR) into patient visits and clinical workflow, while maintaining patient-centered care. Our goal was to identify variations in, barriers to, and facilitators of the use of the US Department of Veterans Affairs (VA) EHR in ambulatory care workflow in order better to understand how to integrate the EHR into clinical work. We observed and interviewed 20 ambulatory care providers across three geographically distinct VA medical centers. Analysis revealed several variations in, associated barriers to, and facilitators of EHR use corresponding to different units of analysis: computer interface, team coordination/workflow, and organizational. We discuss our findings in the context of different units of analysis and connect variations in EHR use to various barriers and facilitators. Findings from this study may help inform the design of the next generation of EHRs for the VA and other healthcare systems. PMID:24001517
Kinyua, Juliet; Negreira, Noelia; Ibáñez, María; Bijlsma, Lubertus; Hernández, Félix; Covaci, Adrian; van Nuijs, Alexander L N
2015-11-01
Identification of new psychoactive substances (NPS) is challenging. Developing targeted methods for their analysis can be difficult and costly due to their impermanence on the drug scene. Accurate-mass mass spectrometry (AMMS) using a quadrupole time-of-flight (QTOF) analyzer can be useful for wide-scope screening since it provides sensitive, full-spectrum MS data. Our article presents a qualitative screening workflow based on data-independent acquisition mode (all-ions MS/MS) on liquid chromatography (LC) coupled to QTOFMS for the detection and identification of NPS in biological matrices. The workflow combines and structures fundamentals of target and suspect screening data processing techniques in a structured algorithm. This allows the detection and tentative identification of NPS and their metabolites. We have applied the workflow to two actual case studies involving drug intoxications where we detected and confirmed the parent compounds ketamine, 25B-NBOMe, 25C-NBOMe, and several predicted phase I and II metabolites not previously reported in urine and serum samples. The screening workflow demonstrates the added value for the detection and identification of NPS in biological matrices.
The View from a Few Hundred Feet : A New Transparent and Integrated Workflow for UAV-collected Data
NASA Astrophysics Data System (ADS)
Peterson, F. S.; Barbieri, L.; Wyngaard, J.
2015-12-01
Unmanned Aerial Vehicles (UAVs) allow scientists and civilians to monitor earth and atmospheric conditions in remote locations. To keep up with the rapid evolution of UAV technology, data workflows must also be flexible, integrated, and introspective. Here, we present our data workflow for a project to assess the feasibility of detecting threshold levels of methane, carbon-dioxide, and other aerosols by mounting consumer-grade gas analysis sensors on UAV's. Particularly, we highlight our use of Project Jupyter, a set of open-source software tools and documentation designed for developing "collaborative narratives" around scientific workflows. By embracing the GitHub-backed, multi-language systems available in Project Jupyter, we enable interaction and exploratory computation while simultaneously embracing distributed version control. Additionally, the transparency of this method builds trust with civilians and decision-makers and leverages collaboration and communication to resolve problems. The goal of this presentation is to provide a generic data workflow for scientific inquiries involving UAVs and to invite the participation of the AGU community in its improvement and curation.
KNIME4NGS: a comprehensive toolbox for next generation sequencing analysis.
Hastreiter, Maximilian; Jeske, Tim; Hoser, Jonathan; Kluge, Michael; Ahomaa, Kaarin; Friedl, Marie-Sophie; Kopetzky, Sebastian J; Quell, Jan-Dominik; Mewes, H Werner; Küffner, Robert
2017-05-15
Analysis of Next Generation Sequencing (NGS) data requires the processing of large datasets by chaining various tools with complex input and output formats. In order to automate data analysis, we propose to standardize NGS tasks into modular workflows. This simplifies reliable handling and processing of NGS data, and corresponding solutions become substantially more reproducible and easier to maintain. Here, we present a documented, linux-based, toolbox of 42 processing modules that are combined to construct workflows facilitating a variety of tasks such as DNAseq and RNAseq analysis. We also describe important technical extensions. The high throughput executor (HTE) helps to increase the reliability and to reduce manual interventions when processing complex datasets. We also provide a dedicated binary manager that assists users in obtaining the modules' executables and keeping them up to date. As basis for this actively developed toolbox we use the workflow management software KNIME. See http://ibisngs.github.io/knime4ngs for nodes and user manual (GPLv3 license). robert.kueffner@helmholtz-muenchen.de. Supplementary data are available at Bioinformatics online.
Handler, Michael; Schier, Peter P; Fritscher, Karl D; Raudaschl, Patrik; Johnson Chacko, Lejo; Glueckert, Rudolf; Saba, Rami; Schubert, Rainer; Baumgarten, Daniel; Baumgartner, Christian
2017-01-01
Our sense of balance and spatial orientation strongly depends on the correct functionality of our vestibular system. Vestibular dysfunction can lead to blurred vision and impaired balance and spatial orientation, causing a significant decrease in quality of life. Recent studies have shown that vestibular implants offer a possible treatment for patients with vestibular dysfunction. The close proximity of the vestibular nerve bundles, the facial nerve and the cochlear nerve poses a major challenge to targeted stimulation of the vestibular system. Modeling the electrical stimulation of the vestibular system allows for an efficient analysis of stimulation scenarios previous to time and cost intensive in vivo experiments. Current models are based on animal data or CAD models of human anatomy. In this work, a (semi-)automatic modular workflow is presented for the stepwise transformation of segmented vestibular anatomy data of human vestibular specimens to an electrical model and subsequently analyzed. The steps of this workflow include (i) the transformation of labeled datasets to a tetrahedra mesh, (ii) nerve fiber anisotropy and fiber computation as a basis for neuron models, (iii) inclusion of arbitrary electrode designs, (iv) simulation of quasistationary potential distributions, and (v) analysis of stimulus waveforms on the stimulation outcome. Results obtained by the workflow based on human datasets and the average shape of a statistical model revealed a high qualitative agreement and a quantitatively comparable range compared to data from literature, respectively. Based on our workflow, a detailed analysis of intra- and extra-labyrinthine electrode configurations with various stimulation waveforms and electrode designs can be performed on patient specific anatomy, making this framework a valuable tool for current optimization questions concerning vestibular implants in humans.
Stockwell, Simon R; Mittnacht, Sibylle
2014-12-16
Advances in understanding the control mechanisms governing the behavior of cells in adherent mammalian tissue culture models are becoming increasingly dependent on modes of single-cell analysis. Methods which deliver composite data reflecting the mean values of biomarkers from cell populations risk losing subpopulation dynamics that reflect the heterogeneity of the studied biological system. In keeping with this, traditional approaches are being replaced by, or supported with, more sophisticated forms of cellular assay developed to allow assessment by high-content microscopy. These assays potentially generate large numbers of images of fluorescent biomarkers, which enabled by accompanying proprietary software packages, allows for multi-parametric measurements per cell. However, the relatively high capital costs and overspecialization of many of these devices have prevented their accessibility to many investigators. Described here is a universally applicable workflow for the quantification of multiple fluorescent marker intensities from specific subcellular regions of individual cells suitable for use with images from most fluorescent microscopes. Key to this workflow is the implementation of the freely available Cell Profiler software(1) to distinguish individual cells in these images, segment them into defined subcellular regions and deliver fluorescence marker intensity values specific to these regions. The extraction of individual cell intensity values from image data is the central purpose of this workflow and will be illustrated with the analysis of control data from a siRNA screen for G1 checkpoint regulators in adherent human cells. However, the workflow presented here can be applied to analysis of data from other means of cell perturbation (e.g., compound screens) and other forms of fluorescence based cellular markers and thus should be useful for a wide range of laboratories.
Predict-first experimental analysis using automated and integrated magnetohydrodynamic modeling
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lyons, B. C.; Paz-Soldan, C.; Meneghini, O.
An integrated-modeling workflow has been developed in this paper for the purpose of performing predict-first analysis of transient-stability experiments. Starting from an existing equilibrium reconstruction from a past experiment, the workflow couples together the EFIT Grad-Shafranov solver [L. Lao et al., Fusion Sci. Technol. 48, 968 (2005)], the EPED model for the pedestal structure [P. B. Snyder et al., Phys. Plasmas 16, 056118 (2009)], and the NEO drift-kinetic-equation solver [E. A. Belli and J. Candy, Plasma Phys. Controlled Fusion 54, 015015 (2012)] (for bootstrap current calculations) in order to generate equilibria with self-consistent pedestal structures as the plasma shape andmore » various scalar parameters (e.g., normalized β, pedestal density, and edge safety factor [q 95]) are changed. These equilibria are then analyzed using automated M3D-C1 extended-magnetohydrodynamic modeling [S. C. Jardin et al., Comput. Sci. Discovery 5, 014002 (2012)] to compute the plasma response to three-dimensional magnetic perturbations. This workflow was created in conjunction with a DIII-D experiment examining the effect of triangularity on the 3D plasma response. Several versions of the workflow were developed, and the initial ones were used to help guide experimental planning (e.g., determining the plasma current necessary to maintain the constant edge safety factor in various shapes). Subsequent validation with the experimental results was then used to revise the workflow, ultimately resulting in the complete model presented here. We show that quantitative agreement was achieved between the M3D-C1 plasma response calculated for equilibria generated by the final workflow and equilibria reconstructed from experimental data. A comparison of results from earlier workflows is used to show the importance of properly matching certain experimental parameters in the generated equilibria, including the normalized β, pedestal density, and q 95. On the other hand, the details of the pedestal current did not significantly impact the plasma response in these equilibria. A comparison to the experimentally measured plasma response shows mixed agreement, indicating that while the equilibria are predicted well, additional analysis tools may be needed. In conclusion, we note the implications that these results have for the success of future predict-first studies, particularly the need for scans of uncertain parameters and for close collaboration between experimentalists and theorists.« less
Predict-first experimental analysis using automated and integrated magnetohydrodynamic modeling
Lyons, B. C.; Paz-Soldan, C.; Meneghini, O.; ...
2018-05-07
An integrated-modeling workflow has been developed in this paper for the purpose of performing predict-first analysis of transient-stability experiments. Starting from an existing equilibrium reconstruction from a past experiment, the workflow couples together the EFIT Grad-Shafranov solver [L. Lao et al., Fusion Sci. Technol. 48, 968 (2005)], the EPED model for the pedestal structure [P. B. Snyder et al., Phys. Plasmas 16, 056118 (2009)], and the NEO drift-kinetic-equation solver [E. A. Belli and J. Candy, Plasma Phys. Controlled Fusion 54, 015015 (2012)] (for bootstrap current calculations) in order to generate equilibria with self-consistent pedestal structures as the plasma shape andmore » various scalar parameters (e.g., normalized β, pedestal density, and edge safety factor [q 95]) are changed. These equilibria are then analyzed using automated M3D-C1 extended-magnetohydrodynamic modeling [S. C. Jardin et al., Comput. Sci. Discovery 5, 014002 (2012)] to compute the plasma response to three-dimensional magnetic perturbations. This workflow was created in conjunction with a DIII-D experiment examining the effect of triangularity on the 3D plasma response. Several versions of the workflow were developed, and the initial ones were used to help guide experimental planning (e.g., determining the plasma current necessary to maintain the constant edge safety factor in various shapes). Subsequent validation with the experimental results was then used to revise the workflow, ultimately resulting in the complete model presented here. We show that quantitative agreement was achieved between the M3D-C1 plasma response calculated for equilibria generated by the final workflow and equilibria reconstructed from experimental data. A comparison of results from earlier workflows is used to show the importance of properly matching certain experimental parameters in the generated equilibria, including the normalized β, pedestal density, and q 95. On the other hand, the details of the pedestal current did not significantly impact the plasma response in these equilibria. A comparison to the experimentally measured plasma response shows mixed agreement, indicating that while the equilibria are predicted well, additional analysis tools may be needed. In conclusion, we note the implications that these results have for the success of future predict-first studies, particularly the need for scans of uncertain parameters and for close collaboration between experimentalists and theorists.« less
Predict-first experimental analysis using automated and integrated magnetohydrodynamic modeling
NASA Astrophysics Data System (ADS)
Lyons, B. C.; Paz-Soldan, C.; Meneghini, O.; Lao, L. L.; Weisberg, D. B.; Belli, E. A.; Evans, T. E.; Ferraro, N. M.; Snyder, P. B.
2018-05-01
An integrated-modeling workflow has been developed for the purpose of performing predict-first analysis of transient-stability experiments. Starting from an existing equilibrium reconstruction from a past experiment, the workflow couples together the EFIT Grad-Shafranov solver [L. Lao et al., Fusion Sci. Technol. 48, 968 (2005)], the EPED model for the pedestal structure [P. B. Snyder et al., Phys. Plasmas 16, 056118 (2009)], and the NEO drift-kinetic-equation solver [E. A. Belli and J. Candy, Plasma Phys. Controlled Fusion 54, 015015 (2012)] (for bootstrap current calculations) in order to generate equilibria with self-consistent pedestal structures as the plasma shape and various scalar parameters (e.g., normalized β, pedestal density, and edge safety factor [q95]) are changed. These equilibria are then analyzed using automated M3D-C1 extended-magnetohydrodynamic modeling [S. C. Jardin et al., Comput. Sci. Discovery 5, 014002 (2012)] to compute the plasma response to three-dimensional magnetic perturbations. This workflow was created in conjunction with a DIII-D experiment examining the effect of triangularity on the 3D plasma response. Several versions of the workflow were developed, and the initial ones were used to help guide experimental planning (e.g., determining the plasma current necessary to maintain the constant edge safety factor in various shapes). Subsequent validation with the experimental results was then used to revise the workflow, ultimately resulting in the complete model presented here. We show that quantitative agreement was achieved between the M3D-C1 plasma response calculated for equilibria generated by the final workflow and equilibria reconstructed from experimental data. A comparison of results from earlier workflows is used to show the importance of properly matching certain experimental parameters in the generated equilibria, including the normalized β, pedestal density, and q95. On the other hand, the details of the pedestal current did not significantly impact the plasma response in these equilibria. A comparison to the experimentally measured plasma response shows mixed agreement, indicating that while the equilibria are predicted well, additional analysis tools may be needed. Finally, we note the implications that these results have for the success of future predict-first studies, particularly the need for scans of uncertain parameters and for close collaboration between experimentalists and theorists.
Cai, Bin; Altman, Michael B; Garcia-Ramirez, Jose; LaBrash, Jason; Goddu, S Murty; Mutic, Sasa; Parikh, Parag J; Olsen, Jeffrey R; Saad, Nael; Zoberi, Jacqueline E
To develop a safe and robust workflow for yttrium-90 (Y-90) radioembolization procedures in a multidisciplinary team environment. A generalized Define-Measure-Analyze-Improve-Control (DMAIC)-based approach to process improvement was applied to a Y-90 radioembolization workflow. In the first DMAIC cycle, events with the Y-90 workflow were defined and analyzed. To improve the workflow, a web-based interactive electronic white board (EWB) system was adopted as the central communication platform and information processing hub. The EWB-based Y-90 workflow then underwent a second DMAIC cycle. Out of 245 treatments, three misses that went undetected until treatment initiation were recorded over a period of 21 months, and root-cause-analysis was performed to determine causes of each incident and opportunities for improvement. The EWB-based Y-90 process was further improved via new rules to define reliable sources of information as inputs into the planning process, as well as new check points to ensure this information was communicated correctly throughout the process flow. After implementation of the revised EWB-based Y-90 workflow, after two DMAIC-like cycles, there were zero misses out of 153 patient treatments in 1 year. The DMAIC-based approach adopted here allowed the iterative development of a robust workflow to achieve an adaptable, event-minimizing planning process despite a complex setting which requires the participation of multiple teams for Y-90 microspheres therapy. Implementation of such a workflow using the EWB or similar platform with a DMAIC-based process improvement approach could be expanded to other treatment procedures, especially those requiring multidisciplinary management. Copyright © 2016 American Brachytherapy Society. Published by Elsevier Inc. All rights reserved.
Digital vs. conventional implant prosthetic workflows: a cost/time analysis.
Joda, Tim; Brägger, Urs
2015-12-01
The aim of this prospective cohort trial was to perform a cost/time analysis for implant-supported single-unit reconstructions in the digital workflow compared to the conventional pathway. A total of 20 patients were included for rehabilitation with 2 × 20 implant crowns in a crossover study design and treated consecutively each with customized titanium abutments plus CAD/CAM-zirconia-suprastructures (test: digital) and with standardized titanium abutments plus PFM-crowns (control conventional). Starting with prosthetic treatment, analysis was estimated for clinical and laboratory work steps including measure of costs in Swiss Francs (CHF), productivity rates and cost minimization for first-line therapy. Statistical calculations were performed with Wilcoxon signed-rank test. Both protocols worked successfully for all test and control reconstructions. Direct treatment costs were significantly lower for the digital workflow 1815.35 CHF compared to the conventional pathway 2119.65 CHF [P = 0.0004]. For subprocess evaluation, total laboratory costs were calculated as 941.95 CHF for the test group and 1245.65 CHF for the control group, respectively [P = 0.003]. The clinical dental productivity rate amounted to 29.64 CHF/min (digital) and 24.37 CHF/min (conventional) [P = 0.002]. Overall, cost minimization analysis exhibited an 18% cost reduction within the digital process. The digital workflow was more efficient than the established conventional pathway for implant-supported crowns in this investigation. © 2014 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
It's All About the Data: Workflow Systems and Weather
NASA Astrophysics Data System (ADS)
Plale, B.
2009-05-01
Digital data is fueling new advances in the computational sciences, particularly geospatial research as environmental sensing grows more practical through reduced technology costs, broader network coverage, and better instruments. e-Science research (i.e., cyberinfrastructure research) has responded to data intensive computing with tools, systems, and frameworks that support computationally oriented activities such as modeling, analysis, and data mining. Workflow systems support execution of sequences of tasks on behalf of a scientist. These systems, such as Taverna, Apache ODE, and Kepler, when built as part of a larger cyberinfrastructure framework, give the scientist tools to construct task graphs of execution sequences, often through a visual interface for connecting task boxes together with arcs representing control flow or data flow. Unlike business processing workflows, scientific workflows expose a high degree of detail and control during configuration and execution. Data-driven science imposes unique needs on workflow frameworks. Our research is focused on two issues. The first is the support for workflow-driven analysis over all kinds of data sets, including real time streaming data and locally owned and hosted data. The second is the essential role metadata/provenance collection plays in data driven science, for discovery, determining quality, for science reproducibility, and for long-term preservation. The research has been conducted over the last 6 years in the context of cyberinfrastructure for mesoscale weather research carried out as part of the Linked Environments for Atmospheric Discovery (LEAD) project. LEAD has pioneered new approaches for integrating complex weather data, assimilation, modeling, mining, and cyberinfrastructure systems. Workflow systems have the potential to generate huge volumes of data. Without some form of automated metadata capture, either metadata description becomes largely a manual task that is difficult if not impossible under high-volume conditions, or the searchability and manageability of the resulting data products is disappointingly low. The provenance of a data product is a record of its lineage, or trace of the execution history that resulted in the product. The provenance of a forecast model result, e.g., captures information about the executable version of the model, configuration parameters, input data products, execution environment, and owner. Provenance enables data to be properly attributed and captures critical parameters about the model run so the quality of the result can be ascertained. Proper provenance is essential to providing reproducible scientific computing results. Workflow languages used in science discovery are complete programming languages, and in theory can support any logic expressible by a programming language. The execution environments supporting the workflow engines, on the other hand, are subject to constraints on physical resources, and hence in practice the workflow task graphs used in science utilize relatively few of the cataloged workflow patterns. It is important to note that these workflows are executed on demand, and are executed once. Into this context is introduced the need for science discovery that is responsive to real time information. If we can use simple programming models and abstractions to make scientific discovery involving real-time data accessible to specialists who share and utilize data across scientific domains, we bring science one step closer to solving the largest of human problems.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wu, Chase Qishi; Zhu, Michelle Mengxia
The advent of large-scale collaborative scientific applications has demonstrated the potential for broad scientific communities to pool globally distributed resources to produce unprecedented data acquisition, movement, and analysis. System resources including supercomputers, data repositories, computing facilities, network infrastructures, storage systems, and display devices have been increasingly deployed at national laboratories and academic institutes. These resources are typically shared by large communities of users over Internet or dedicated networks and hence exhibit an inherent dynamic nature in their availability, accessibility, capacity, and stability. Scientific applications using either experimental facilities or computation-based simulations with various physical, chemical, climatic, and biological models featuremore » diverse scientific workflows as simple as linear pipelines or as complex as a directed acyclic graphs, which must be executed and supported over wide-area networks with massively distributed resources. Application users oftentimes need to manually configure their computing tasks over networks in an ad hoc manner, hence significantly limiting the productivity of scientists and constraining the utilization of resources. The success of these large-scale distributed applications requires a highly adaptive and massively scalable workflow platform that provides automated and optimized computing and networking services. This project is to design and develop a generic Scientific Workflow Automation and Management Platform (SWAMP), which contains a web-based user interface specially tailored for a target application, a set of user libraries, and several easy-to-use computing and networking toolkits for application scientists to conveniently assemble, execute, monitor, and control complex computing workflows in heterogeneous high-performance network environments. SWAMP will enable the automation and management of the entire process of scientific workflows with the convenience of a few mouse clicks while hiding the implementation and technical details from end users. Particularly, we will consider two types of applications with distinct performance requirements: data-centric and service-centric applications. For data-centric applications, the main workflow task involves large-volume data generation, catalog, storage, and movement typically from supercomputers or experimental facilities to a team of geographically distributed users; while for service-centric applications, the main focus of workflow is on data archiving, preprocessing, filtering, synthesis, visualization, and other application-specific analysis. We will conduct a comprehensive comparison of existing workflow systems and choose the best suited one with open-source code, a flexible system structure, and a large user base as the starting point for our development. Based on the chosen system, we will develop and integrate new components including a black box design of computing modules, performance monitoring and prediction, and workflow optimization and reconfiguration, which are missing from existing workflow systems. A modular design for separating specification, execution, and monitoring aspects will be adopted to establish a common generic infrastructure suited for a wide spectrum of science applications. We will further design and develop efficient workflow mapping and scheduling algorithms to optimize the workflow performance in terms of minimum end-to-end delay, maximum frame rate, and highest reliability. We will develop and demonstrate the SWAMP system in a local environment, the grid network, and the 100Gpbs Advanced Network Initiative (ANI) testbed. The demonstration will target scientific applications in climate modeling and high energy physics and the functions to be demonstrated include workflow deployment, execution, steering, and reconfiguration. Throughout the project period, we will work closely with the science communities in the fields of climate modeling and high energy physics including Spallation Neutron Source (SNS) and Large Hadron Collider (LHC) projects to mature the system for production use.« less
Multi-level meta-workflows: new concept for regularly occurring tasks in quantum chemistry.
Arshad, Junaid; Hoffmann, Alexander; Gesing, Sandra; Grunzke, Richard; Krüger, Jens; Kiss, Tamas; Herres-Pawlis, Sonja; Terstyanszky, Gabor
2016-01-01
In Quantum Chemistry, many tasks are reoccurring frequently, e.g. geometry optimizations, benchmarking series etc. Here, workflows can help to reduce the time of manual job definition and output extraction. These workflows are executed on computing infrastructures and may require large computing and data resources. Scientific workflows hide these infrastructures and the resources needed to run them. It requires significant efforts and specific expertise to design, implement and test these workflows. Many of these workflows are complex and monolithic entities that can be used for particular scientific experiments. Hence, their modification is not straightforward and it makes almost impossible to share them. To address these issues we propose developing atomic workflows and embedding them in meta-workflows. Atomic workflows deliver a well-defined research domain specific function. Publishing workflows in repositories enables workflow sharing inside and/or among scientific communities. We formally specify atomic and meta-workflows in order to define data structures to be used in repositories for uploading and sharing them. Additionally, we present a formal description focused at orchestration of atomic workflows into meta-workflows. We investigated the operations that represent basic functionalities in Quantum Chemistry, developed the relevant atomic workflows and combined them into meta-workflows. Having these workflows we defined the structure of the Quantum Chemistry workflow library and uploaded these workflows in the SHIWA Workflow Repository.Graphical AbstractMeta-workflows and embedded workflows in the template representation.
Clustering in analytical chemistry.
Drab, Klaudia; Daszykowski, Michal
2014-01-01
Data clustering plays an important role in the exploratory analysis of analytical data, and the use of clustering methods has been acknowledged in different fields of science. In this paper, principles of data clustering are presented with a direct focus on clustering of analytical data. The role of the clustering process in the analytical workflow is underlined, and its potential impact on the analytical workflow is emphasized.
PATHA: Performance Analysis Tool for HPC Applications
Yoo, Wucherl; Koo, Michelle; Cao, Yi; ...
2016-02-18
Large science projects rely on complex workflows to analyze terabytes or petabytes of data. These jobs are often running over thousands of CPU cores and simultaneously performing data accesses, data movements, and computation. It is difficult to identify bottlenecks or to debug the performance issues in these large workflows. In order to address these challenges, we have developed Performance Analysis Tool for HPC Applications (PATHA) using the state-of-art open source big data processing tools. Our framework can ingest system logs to extract key performance measures, and apply the most sophisticated statistical tools and data mining methods on the performance data.more » Furthermore, it utilizes an efficient data processing engine to allow users to interactively analyze a large amount of different types of logs and measurements. To illustrate the functionality of PATHA, we conduct a case study on the workflows from an astronomy project known as the Palomar Transient Factory (PTF). This study processed 1.6 TB of system logs collected on the NERSC supercomputer Edison. Using PATHA, we were able to identify performance bottlenecks, which reside in three tasks of PTF workflow with the dependency on the density of celestial objects.« less
Li, Jian; Batcha, Aarif Mohamed Nazeer; Grüning, Björn; Mansmann, Ulrich R.
2015-01-01
Next-generation sequencing (NGS) technologies that have advanced rapidly in the past few years possess the potential to classify diseases, decipher the molecular code of related cell processes, identify targets for decision-making on targeted therapy or prevention strategies, and predict clinical treatment response. Thus, NGS is on its way to revolutionize oncology. With the help of NGS, we can draw a finer map for the genetic basis of diseases and can improve our understanding of diagnostic and prognostic applications and therapeutic methods. Despite these advantages and its potential, NGS is facing several critical challenges, including reduction of sequencing cost, enhancement of sequencing quality, improvement of technical simplicity and reliability, and development of semiautomated and integrated analysis workflow. In order to address these challenges, we conducted a literature research and summarized a four-stage NGS workflow for providing a systematic review on NGS-based analysis, explaining the strength and weakness of diverse NGS-based software tools, and elucidating its potential connection to individualized medicine. By presenting this four-stage NGS workflow, we try to provide a minimal structural layout required for NGS data storage and reproducibility. PMID:27081306
Cheng, Gong; Lu, Quan; Ma, Ling; Zhang, Guocai; Xu, Liang; Zhou, Zongshan
2017-01-01
Recently, Docker technology has received increasing attention throughout the bioinformatics community. However, its implementation has not yet been mastered by most biologists; accordingly, its application in biological research has been limited. In order to popularize this technology in the field of bioinformatics and to promote the use of publicly available bioinformatics tools, such as Dockerfiles and Images from communities, government sources, and private owners in the Docker Hub Registry and other Docker-based resources, we introduce here a complete and accurate bioinformatics workflow based on Docker. The present workflow enables analysis and visualization of pan-genomes and biosynthetic gene clusters of bacteria. This provides a new solution for bioinformatics mining of big data from various publicly available biological databases. The present step-by-step guide creates an integrative workflow through a Dockerfile to allow researchers to build their own Image and run Container easily.
Cheng, Gong; Zhang, Guocai; Xu, Liang
2017-01-01
Recently, Docker technology has received increasing attention throughout the bioinformatics community. However, its implementation has not yet been mastered by most biologists; accordingly, its application in biological research has been limited. In order to popularize this technology in the field of bioinformatics and to promote the use of publicly available bioinformatics tools, such as Dockerfiles and Images from communities, government sources, and private owners in the Docker Hub Registry and other Docker-based resources, we introduce here a complete and accurate bioinformatics workflow based on Docker. The present workflow enables analysis and visualization of pan-genomes and biosynthetic gene clusters of bacteria. This provides a new solution for bioinformatics mining of big data from various publicly available biological databases. The present step-by-step guide creates an integrative workflow through a Dockerfile to allow researchers to build their own Image and run Container easily. PMID:29204317
Towards unsupervised polyaromatic hydrocarbons structural assignment from SA-TIMS-FTMS data.
Benigni, Paolo; Marin, Rebecca; Fernandez-Lima, Francisco
2015-10-01
With the advent of high resolution ion mobility analyzers and their coupling to ultrahigh resolution mass spectrometers, there is a need to further develop a theoretical workflow capable of correlating experimental accurate mass and mobility measurements with tridimensional candidate structures. In the present work, a general workflow is described for unsupervised tridimensional structural assignment based on accurate mass measurements, mobility measurements, in silico 2D-3D structure generation, and theoretical mobility calculations. In particular, the potential of this workflow will be shown for the analysis of polyaromatic hydrocarbons from Coal Tar SRM 1597a using selected accumulation - trapped ion mobility spectrometry (SA-TIMS) coupled to Fourier transform-ion cyclotron resonance mass spectrometry (FT-ICR MS). The proposed workflow can be adapted to different IMS scenarios, can utilize different collisional cross-section calculators and has the potential to include MS n and IMS n measurements for faster and more accurate tridimensional structural assignment.
SHIWA Services for Workflow Creation and Sharing in Hydrometeorolog
NASA Astrophysics Data System (ADS)
Terstyanszky, Gabor; Kiss, Tamas; Kacsuk, Peter; Sipos, Gergely
2014-05-01
Researchers want to run scientific experiments on Distributed Computing Infrastructures (DCI) to access large pools of resources and services. To run these experiments requires specific expertise that they may not have. Workflows can hide resources and services as a virtualisation layer providing a user interface that researchers can use. There are many scientific workflow systems but they are not interoperable. To learn a workflow system and create workflows may require significant efforts. Considering these efforts it is not reasonable to expect that researchers will learn new workflow systems if they want to run workflows developed in other workflow systems. To overcome it requires creating workflow interoperability solutions to allow workflow sharing. The FP7 'Sharing Interoperable Workflow for Large-Scale Scientific Simulation on Available DCIs' (SHIWA) project developed the Coarse-Grained Interoperability concept (CGI). It enables recycling and sharing workflows of different workflow systems and executing them on different DCIs. SHIWA developed the SHIWA Simulation Platform (SSP) to implement the CGI concept integrating three major components: the SHIWA Science Gateway, the workflow engines supported by the CGI concept and DCI resources where workflows are executed. The science gateway contains a portal, a submission service, a workflow repository and a proxy server to support the whole workflow life-cycle. The SHIWA Portal allows workflow creation, configuration, execution and monitoring through a Graphical User Interface using the WS-PGRADE workflow system as the host workflow system. The SHIWA Repository stores the formal description of workflows and workflow engines plus executables and data needed to execute them. It offers a wide-range of browse and search operations. To support non-native workflow execution the SHIWA Submission Service imports the workflow and workflow engine from the SHIWA Repository. This service either invokes locally or remotely pre-deployed workflow engines or submits workflow engines with the workflow to local or remote resources to execute workflows. The SHIWA Proxy Server manages certificates needed to execute the workflows on different DCIs. Currently SSP supports sharing of ASKALON, Galaxy, GWES, Kepler, LONI Pipeline, MOTEUR, Pegasus, P-GRADE, ProActive, Triana, Taverna and WS-PGRADE workflows. Further workflow systems can be added to the simulation platform as required by research communities. The FP7 'Building a European Research Community through Interoperable Workflows and Data' (ER-flow) project disseminates the achievements of the SHIWA project to build workflow user communities across Europe. ER-flow provides application supports to research communities within (Astrophysics, Computational Chemistry, Heliophysics and Life Sciences) and beyond (Hydrometeorology and Seismology) to develop, share and run workflows through the simulation platform. The simulation platform supports four usage scenarios: creating and publishing workflows in the repository, searching and selecting workflows in the repository, executing non-native workflows and creating and running meta-workflows. The presentation will outline the CGI concept, the SHIWA Simulation Platform, the ER-flow usage scenarios and how the Hydrometeorology research community runs simulations on SSP.
NASA Astrophysics Data System (ADS)
Terêncio, D. P. S.; Sanches Fernandes, L. F.; Cortes, R. M. V.; Pacheco, F. A. L.
2017-07-01
This study introduces an improved rainwater harvesting (RWH) suitability model to help the implementation of agro-forestry projects (irrigation, wildfire combat) in catchments. The model combines a planning workflow to define suitability of catchments based on physical, socio-economic and ecologic variables, with an allocation workflow to constrain suitable RWH sites as function of project specific features (e.g., distance from rainfall collection to application area). The planning workflow comprises a Multi Criteria Analysis (MCA) implemented on a Geographic Information System (GIS), whereas the allocation workflow is based on a multiple-parameter ranking analysis. When compared to other similar models, improvement comes with the flexible weights of MCA and the entire allocation workflow. The method is tested in a contaminated watershed (the Ave River basin) located in Portugal. The pilot project encompasses the irrigation of a 400 ha crop land that consumes 2.69 Mm3 of water per year. The application of harvested water in the irrigation replaces the use of stream water with excessive anthropogenic nutrients that may raise nitrosamines in the food and accumulation in the food chain, with severe consequences to human health (cancer). The selected rainfall collection catchment is capable to harvest 12 Mm3·yr-1 (≈ 4.5 × the requirement) and is roughly 3 km far from the application area assuring crop irrigation by gravity flow with modest transport costs. The RWH system is an 8-meter high that can be built in earth with reduced costs.
A pattern-based analysis of clinical computer-interpretable guideline modeling languages.
Mulyar, Nataliya; van der Aalst, Wil M P; Peleg, Mor
2007-01-01
Languages used to specify computer-interpretable guidelines (CIGs) differ in their approaches to addressing particular modeling challenges. The main goals of this article are: (1) to examine the expressive power of CIG modeling languages, and (2) to define the differences, from the control-flow perspective, between process languages in workflow management systems and modeling languages used to design clinical guidelines. The pattern-based analysis was applied to guideline modeling languages Asbru, EON, GLIF, and PROforma. We focused on control-flow and left other perspectives out of consideration. We evaluated the selected CIG modeling languages and identified their degree of support of 43 control-flow patterns. We used a set of explicitly defined evaluation criteria to determine whether each pattern is supported directly, indirectly, or not at all. PROforma offers direct support for 22 of 43 patterns, Asbru 20, GLIF 17, and EON 11. All four directly support basic control-flow patterns, cancellation patterns, and some advance branching and synchronization patterns. None support multiple instances patterns. They offer varying levels of support for synchronizing merge patterns and state-based patterns. Some support a few scenarios not covered by the 43 control-flow patterns. CIG modeling languages are remarkably close to traditional workflow languages from the control-flow perspective, but cover many fewer workflow patterns. CIG languages offer some flexibility that supports modeling of complex decisions and provide ways for modeling some decisions not covered by workflow management systems. Workflow management systems may be suitable for clinical guideline applications.
Integrative analysis workflow for the structural and functional classification of C-type lectins
2011-01-01
Background It is important to understand the roles of C-type lectins in the immune system due to their ubiquity and diverse range of functions in animal cells. It has been observed that currently confirmed C-type lectins share a highly conserved domain known as the C-type carbohydrate recognition domain (CRD). Using the sequence profile of the CRD, an increasing number of putative C-type lectins have been identified. Hence, it is highly needed to develop a systematic framework that enables us to elucidate their carbohydrate (glycan) recognition function, and discover their physiological and pathological roles. Results Presented herein is an integrated workflow for characterizing the sequence and structural features of novel C-type lectins. Our workflow utilizes web-based queries and available software suites to annotate features that can be found on the C-type lectin, given its amino acid sequence. At the same time, it incorporates modeling and analysis of glycans - a major class of ligands that interact with C-type lectins. Thereafter, the results are analyzed together with context-specific knowledge to filter off unlikely predictions. This allows researchers to design their subsequent experiments to confirm the functions of the C-type lectins in a systematic manner. Conclusions The efficacy and usefulness of our proposed immunoinformatics workflow was demonstrated by applying our integrated workflow to a novel C-type lectin -CLEC17A - and we report some of its possible functions that warrants further validation through wet-lab experiments. PMID:22372988
Galaxy tools and workflows for sequence analysis with applications in molecular plant pathology
Grüning, Björn A.; Paszkiewicz, Konrad; Pritchard, Leighton
2013-01-01
The Galaxy Project offers the popular web browser-based platform Galaxy for running bioinformatics tools and constructing simple workflows. Here, we present a broad collection of additional Galaxy tools for large scale analysis of gene and protein sequences. The motivating research theme is the identification of specific genes of interest in a range of non-model organisms, and our central example is the identification and prediction of “effector” proteins produced by plant pathogens in order to manipulate their host plant. This functional annotation of a pathogen’s predicted capacity for virulence is a key step in translating sequence data into potential applications in plant pathology. This collection includes novel tools, and widely-used third-party tools such as NCBI BLAST+ wrapped for use within Galaxy. Individual bioinformatics software tools are typically available separately as standalone packages, or in online browser-based form. The Galaxy framework enables the user to combine these and other tools to automate organism scale analyses as workflows, without demanding familiarity with command line tools and scripting. Workflows created using Galaxy can be saved and are reusable, so may be distributed within and between research groups, facilitating the construction of a set of standardised, reusable bioinformatic protocols. The Galaxy tools and workflows described in this manuscript are open source and freely available from the Galaxy Tool Shed (http://usegalaxy.org/toolshed or http://toolshed.g2.bx.psu.edu). PMID:24109552
Veit, Johannes; Sachsenberg, Timo; Chernev, Aleksandar; Aicheler, Fabian; Urlaub, Henning; Kohlbacher, Oliver
2016-09-02
Modern mass spectrometry setups used in today's proteomics studies generate vast amounts of raw data, calling for highly efficient data processing and analysis tools. Software for analyzing these data is either monolithic (easy to use, but sometimes too rigid) or workflow-driven (easy to customize, but sometimes complex). Thermo Proteome Discoverer (PD) is a powerful software for workflow-driven data analysis in proteomics which, in our eyes, achieves a good trade-off between flexibility and usability. Here, we present two open-source plugins for PD providing additional functionality: LFQProfiler for label-free quantification of peptides and proteins, and RNP(xl) for UV-induced peptide-RNA cross-linking data analysis. LFQProfiler interacts with existing PD nodes for peptide identification and validation and takes care of the entire quantitative part of the workflow. We show that it performs at least on par with other state-of-the-art software solutions for label-free quantification in a recently published benchmark ( Ramus, C.; J. Proteomics 2016 , 132 , 51 - 62 ). The second workflow, RNP(xl), represents the first software solution to date for identification of peptide-RNA cross-links including automatic localization of the cross-links at amino acid resolution and localization scoring. It comes with a customized integrated cross-link fragment spectrum viewer for convenient manual inspection and validation of the results.
MetaboLyzer: A Novel Statistical Workflow for Analyzing Post-Processed LC/MS Metabolomics Data
Mak, Tytus D.; Laiakis, Evagelia C.; Goudarzi, Maryam; Fornace, Albert J.
2014-01-01
Metabolomics, the global study of small molecules in a particular system, has in the last few years risen to become a primary –omics platform for the study of metabolic processes. With the ever-increasing pool of quantitative data yielded from metabolomic research, specialized methods and tools with which to analyze and extract meaningful conclusions from these data are becoming more and more crucial. Furthermore, the depth of knowledge and expertise required to undertake a metabolomics oriented study is a daunting obstacle to investigators new to the field. As such, we have created a new statistical analysis workflow, MetaboLyzer, which aims to both simplify analysis for investigators new to metabolomics, as well as provide experienced investigators the flexibility to conduct sophisticated analysis. MetaboLyzer’s workflow is specifically tailored to the unique characteristics and idiosyncrasies of postprocessed liquid chromatography/mass spectrometry (LC/MS) based metabolomic datasets. It utilizes a wide gamut of statistical tests, procedures, and methodologies that belong to classical biostatistics, as well as several novel statistical techniques that we have developed specifically for metabolomics data. Furthermore, MetaboLyzer conducts rapid putative ion identification and putative biologically relevant analysis via incorporation of four major small molecule databases: KEGG, HMDB, Lipid Maps, and BioCyc. MetaboLyzer incorporates these aspects into a comprehensive workflow that outputs easy to understand statistically significant and potentially biologically relevant information in the form of heatmaps, volcano plots, 3D visualization plots, correlation maps, and metabolic pathway hit histograms. For demonstration purposes, a urine metabolomics data set from a previously reported radiobiology study in which samples were collected from mice exposed to gamma radiation was analyzed. MetaboLyzer was able to identify 243 statistically significant ions out of a total of 1942. Numerous putative metabolites and pathways were found to be biologically significant from the putative ion identification workflow. PMID:24266674
Enhanced reproducibility of SADI web service workflows with Galaxy and Docker.
Aranguren, Mikel Egaña; Wilkinson, Mark D
2015-01-01
Semantic Web technologies have been widely applied in the life sciences, for example by data providers such as OpenLifeData and through web services frameworks such as SADI. The recently reported OpenLifeData2SADI project offers access to the vast OpenLifeData data store through SADI services. This article describes how to merge data retrieved from OpenLifeData2SADI with other SADI services using the Galaxy bioinformatics analysis platform, thus making this semantic data more amenable to complex analyses. This is demonstrated using a working example, which is made distributable and reproducible through a Docker image that includes SADI tools, along with the data and workflows that constitute the demonstration. The combination of Galaxy and Docker offers a solution for faithfully reproducing and sharing complex data retrieval and analysis workflows based on the SADI Semantic web service design patterns.
NASA Astrophysics Data System (ADS)
Hwang, Darryl H.; Ma, Kevin; Yepes, Fernando; Nadamuni, Mridula; Nayyar, Megha; Liu, Brent; Duddalwar, Vinay; Lepore, Natasha
2015-12-01
A conventional radiology report primarily consists of a large amount of unstructured text, and lacks clear, concise, consistent and content-rich information. Hence, an area of unmet clinical need consists of developing better ways to communicate radiology findings and information specific to each patient. Here, we design a new workflow and reporting system that combines and integrates advances in engineering technology with those from the medical sciences, the Multidimensional Interactive Radiology Report and Analysis (MIRRA). Until recently, clinical standards have primarily relied on 2D images for the purpose of measurement, but with the advent of 3D processing, many of the manually measured metrics can be automated, leading to better reproducibility and less subjective measurement placement. Hence, we make use this newly available 3D processing in our workflow. Our pipeline is used here to standardize the labeling, tracking, and quantifying of metrics for renal masses.
Biological Web Service Repositories Review
Urdidiales‐Nieto, David; Navas‐Delgado, Ismael
2016-01-01
Abstract Web services play a key role in bioinformatics enabling the integration of database access and analysis of algorithms. However, Web service repositories do not usually publish information on the changes made to their registered Web services. Dynamism is directly related to the changes in the repositories (services registered or unregistered) and at service level (annotation changes). Thus, users, software clients or workflow based approaches lack enough relevant information to decide when they should review or re‐execute a Web service or workflow to get updated or improved results. The dynamism of the repository could be a measure for workflow developers to re‐check service availability and annotation changes in the services of interest to them. This paper presents a review on the most well‐known Web service repositories in the life sciences including an analysis of their dynamism. Freshness is introduced in this paper, and has been used as the measure for the dynamism of these repositories. PMID:27783459
Lott, Steffen C; Wolfien, Markus; Riege, Konstantin; Bagnacani, Andrea; Wolkenhauer, Olaf; Hoffmann, Steve; Hess, Wolfgang R
2017-11-10
RNA-Sequencing (RNA-Seq) has become a widely used approach to study quantitative and qualitative aspects of transcriptome data. The variety of RNA-Seq protocols, experimental study designs and the characteristic properties of the organisms under investigation greatly affect downstream and comparative analyses. In this review, we aim to explain the impact of structured pre-selection, classification and integration of best-performing tools within modularized data analysis workflows and ready-to-use computing infrastructures towards experimental data analyses. We highlight examples for workflows and use cases that are presented for pro-, eukaryotic and mixed dual RNA-Seq (meta-transcriptomics) experiments. In addition, we are summarizing the expertise of the laboratories participating in the project consortium "Structured Analysis and Integration of RNA-Seq experiments" (de.STAIR) and its integration with the Galaxy-workbench of the RNA Bioinformatics Center (RBC). Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.
Genetic analysis of circulating tumor cells in pancreatic cancer patients: A pilot study.
Görner, Karin; Bachmann, Jeannine; Holzhauer, Claudia; Kirchner, Roland; Raba, Katharina; Fischer, Johannes C; Martignoni, Marc E; Schiemann, Matthias; Alunni-Fabbroni, Marianna
2015-07-01
Pancreatic cancer is one of the most aggressive malignant tumors, mainly due to an aggressive metastasis spreading. In recent years, circulating tumor cells became associated to tumor metastasis. Little is known about their expression profiles. The aim of this study was to develop a complete workflow making it possible to isolate circulating tumor cells from patients with pancreatic cancer and their genetic characterization. We show that the proposed workflow offers a technical sensitivity and specificity high enough to detect and isolate single tumor cells. Moreover our approach makes feasible to genetically characterize single CTCs. Our work discloses a complete workflow to detect, count and genetically analyze individual CTCs isolated from blood samples. This method has a central impact on the early detection of metastasis development. The combination of cell quantification and genetic analysis provides the clinicians with a powerful tool not available so far. Copyright © 2015. Published by Elsevier Inc.
NASA Astrophysics Data System (ADS)
Delventhal, D.; Schultz, D.; Diaz Velez, J. C.
2017-10-01
IceProd is a data processing and management framework developed by the IceCube Neutrino Observatory for processing of Monte Carlo simulations, detector data, and data driven analysis. It runs as a separate layer on top of grid and batch systems. This is accomplished by a set of daemons which process job workflow, maintaining configuration and status information on the job before, during, and after processing. IceProd can also manage complex workflow DAGs across distributed computing grids in order to optimize usage of resources. IceProd has recently been rewritten to increase its scaling capabilities, handle user analysis workflows together with simulation production, and facilitate the integration with 3rd party scheduling tools. IceProd 2, the second generation of IceProd, has been running in production for several months now. We share our experience setting up the system and things we’ve learned along the way.
A big data approach for climate change indicators processing in the CLIP-C project
NASA Astrophysics Data System (ADS)
D'Anca, Alessandro; Conte, Laura; Palazzo, Cosimo; Fiore, Sandro; Aloisio, Giovanni
2016-04-01
Defining and implementing processing chains with multiple (e.g. tens or hundreds of) data analytics operators can be a real challenge in many practical scientific use cases such as climate change indicators. This is usually done via scripts (e.g. bash) on the client side and requires climate scientists to take care of, implement and replicate workflow-like control logic aspects (which may be error-prone too) in their scripts, along with the expected application-level part. Moreover, the big amount of data and the strong I/O demand pose additional challenges related to the performance. In this regard, production-level tools for climate data analysis are mostly sequential and there is a lack of big data analytics solutions implementing fine-grain data parallelism or adopting stronger parallel I/O strategies, data locality, workflow optimization, etc. High-level solutions leveraging on workflow-enabled big data analytics frameworks for eScience could help scientists in defining and implementing the workflows related to their experiments by exploiting a more declarative, efficient and powerful approach. This talk will start introducing the main needs and challenges regarding big data analytics workflow management for eScience and will then provide some insights about the implementation of some real use cases related to some climate change indicators on large datasets produced in the context of the CLIP-C project - a EU FP7 project aiming at providing access to climate information of direct relevance to a wide variety of users, from scientists to policy makers and private sector decision makers. All the proposed use cases have been implemented exploiting the Ophidia big data analytics framework. The software stack includes an internal workflow management system, which coordinates, orchestrates, and optimises the execution of multiple scientific data analytics and visualization tasks. Real-time workflow monitoring execution is also supported through a graphical user interface. In order to address the challenges of the use cases, the implemented data analytics workflows include parallel data analysis, metadata management, virtual file system tasks, maps generation, rolling of datasets, and import/export of datasets in NetCDF format. The use cases have been implemented on a HPC cluster of 8-nodes (16-cores/node) of the Athena Cluster available at the CMCC Supercomputing Centre. Benchmark results will be also presented during the talk.
2016-07-27
make risk-informed decisions during serious games . Statistical models of intra- game performance were developed to determine whether behaviors in...specific facets of the gameplay workflow were predictive of analytical performance and games outcomes. A study of over seventy instrumented teams revealed...more accurate game decisions. 2 Keywords: Humatics · Serious Games · Human-System Interaction · Instrumentation · Teamwork · Communication Analysis
Hoekman, Berend; Breitling, Rainer; Suits, Frank; Bischoff, Rainer; Horvatovich, Peter
2012-01-01
Data processing forms an integral part of biomarker discovery and contributes significantly to the ultimate result. To compare and evaluate various publicly available open source label-free data processing workflows, we developed msCompare, a modular framework that allows the arbitrary combination of different feature detection/quantification and alignment/matching algorithms in conjunction with a novel scoring method to evaluate their overall performance. We used msCompare to assess the performance of workflows built from modules of publicly available data processing packages such as SuperHirn, OpenMS, and MZmine and our in-house developed modules on peptide-spiked urine and trypsin-digested cerebrospinal fluid (CSF) samples. We found that the quality of results varied greatly among workflows, and interestingly, heterogeneous combinations of algorithms often performed better than the homogenous workflows. Our scoring method showed that the union of feature matrices of different workflows outperformed the original homogenous workflows in some cases. msCompare is open source software (https://trac.nbic.nl/mscompare), and we provide a web-based data processing service for our framework by integration into the Galaxy server of the Netherlands Bioinformatics Center (http://galaxy.nbic.nl/galaxy) to allow scientists to determine which combination of modules provides the most accurate processing for their particular LC-MS data sets. PMID:22318370
Medication Management: The Macrocognitive Workflow of Older Adults With Heart Failure
2016-01-01
Background Older adults with chronic disease struggle to manage complex medication regimens. Health information technology has the potential to improve medication management, but only if it is based on a thorough understanding of the complexity of medication management workflow as it occurs in natural settings. Prior research reveals that patient work related to medication management is complex, cognitive, and collaborative. Macrocognitive processes are theorized as how people individually and collaboratively think in complex, adaptive, and messy nonlaboratory settings supported by artifacts. Objective The objective of this research was to describe and analyze the work of medication management by older adults with heart failure, using a macrocognitive workflow framework. Methods We interviewed and observed 61 older patients along with 30 informal caregivers about self-care practices including medication management. Descriptive qualitative content analysis methods were used to develop categories, subcategories, and themes about macrocognitive processes used in medication management workflow. Results We identified 5 high-level macrocognitive processes affecting medication management—sensemaking, planning, coordination, monitoring, and decision making—and 15 subprocesses. Data revealed workflow as occurring in a highly collaborative, fragile system of interacting people, artifacts, time, and space. Process breakdowns were common and patients had little support for macrocognitive workflow from current tools. Conclusions Macrocognitive processes affected medication management performance. Describing and analyzing this performance produced recommendations for technology supporting collaboration and sensemaking, decision making and problem detection, and planning and implementation. PMID:27733331
Towards an intelligent hospital environment: OR of the future.
Sutherland, Jeffrey V; van den Heuvel, Willem-Jan; Ganous, Tim; Burton, Matthew M; Kumar, Animesh
2005-01-01
Patients, providers, payers, and government demand more effective and efficient healthcare services, and the healthcare industry needs innovative ways to re-invent core processes. Business process reengineering (BPR) showed adopting new hospital information systems can leverage this transformation and workflow management technologies can automate process management. Our research indicates workflow technologies in healthcare require real time patient monitoring, detection of adverse events, and adaptive responses to breakdown in normal processes. Adaptive workflow systems are rarely implemented making current workflow implementations inappropriate for healthcare. The advent of evidence based medicine, guideline based practice, and better understanding of cognitive workflow combined with novel technologies including Radio Frequency Identification (RFID), mobile/wireless technologies, internet workflow, intelligent agents, and Service Oriented Architectures (SOA) opens up new and exciting ways of automating business processes. Total situational awareness of events, timing, and location of healthcare activities can generate self-organizing change in behaviors of humans and machines. A test bed of a novel approach towards continuous process management was designed for the new Weinburg Surgery Building at the University of Maryland Medical. Early results based on clinical process mapping and analysis of patient flow bottlenecks demonstrated 100% improvement in delivery of supplies and instruments at surgery start time. This work has been directly applied to the design of the DARPA Trauma Pod research program where robotic surgery will be performed on wounded soldiers on the battlefield.
Medication Management: The Macrocognitive Workflow of Older Adults With Heart Failure.
Mickelson, Robin S; Unertl, Kim M; Holden, Richard J
2016-10-12
Older adults with chronic disease struggle to manage complex medication regimens. Health information technology has the potential to improve medication management, but only if it is based on a thorough understanding of the complexity of medication management workflow as it occurs in natural settings. Prior research reveals that patient work related to medication management is complex, cognitive, and collaborative. Macrocognitive processes are theorized as how people individually and collaboratively think in complex, adaptive, and messy nonlaboratory settings supported by artifacts. The objective of this research was to describe and analyze the work of medication management by older adults with heart failure, using a macrocognitive workflow framework. We interviewed and observed 61 older patients along with 30 informal caregivers about self-care practices including medication management. Descriptive qualitative content analysis methods were used to develop categories, subcategories, and themes about macrocognitive processes used in medication management workflow. We identified 5 high-level macrocognitive processes affecting medication management-sensemaking, planning, coordination, monitoring, and decision making-and 15 subprocesses. Data revealed workflow as occurring in a highly collaborative, fragile system of interacting people, artifacts, time, and space. Process breakdowns were common and patients had little support for macrocognitive workflow from current tools. Macrocognitive processes affected medication management performance. Describing and analyzing this performance produced recommendations for technology supporting collaboration and sensemaking, decision making and problem detection, and planning and implementation.
Cornwell, MacIntosh; Vangala, Mahesh; Taing, Len; Herbert, Zachary; Köster, Johannes; Li, Bo; Sun, Hanfei; Li, Taiwen; Zhang, Jian; Qiu, Xintao; Pun, Matthew; Jeselsohn, Rinath; Brown, Myles; Liu, X Shirley; Long, Henry W
2018-04-12
RNA sequencing has become a ubiquitous technology used throughout life sciences as an effective method of measuring RNA abundance quantitatively in tissues and cells. The increase in use of RNA-seq technology has led to the continuous development of new tools for every step of analysis from alignment to downstream pathway analysis. However, effectively using these analysis tools in a scalable and reproducible way can be challenging, especially for non-experts. Using the workflow management system Snakemake we have developed a user friendly, fast, efficient, and comprehensive pipeline for RNA-seq analysis. VIPER (Visualization Pipeline for RNA-seq analysis) is an analysis workflow that combines some of the most popular tools to take RNA-seq analysis from raw sequencing data, through alignment and quality control, into downstream differential expression and pathway analysis. VIPER has been created in a modular fashion to allow for the rapid incorporation of new tools to expand the capabilities. This capacity has already been exploited to include very recently developed tools that explore immune infiltrate and T-cell CDR (Complementarity-Determining Regions) reconstruction abilities. The pipeline has been conveniently packaged such that minimal computational skills are required to download and install the dozens of software packages that VIPER uses. VIPER is a comprehensive solution that performs most standard RNA-seq analyses quickly and effectively with a built-in capacity for customization and expansion.
An XML Representation for Crew Procedures
NASA Technical Reports Server (NTRS)
Simpson, Richard C.
2005-01-01
NASA ensures safe operation of complex systems through the use of formally-documented procedures, which encode the operational knowledge of the system as derived from system experts. Crew members use procedure documentation on the ground for training purposes and on-board space shuttle and space station to guide their activities. Investigators at JSC are developing a new representation for procedures that is content-based (as opposed to display-based). Instead of specifying how a procedure should look on the printed page, the content-based representation will identify the components of a procedure and (more importantly) how the components are related (e.g., how the activities within a procedure are sequenced; what resources need to be available for each activity). This approach will allow different sets of rules to be created for displaying procedures on a computer screen, on a hand-held personal digital assistant (PDA), verbally, or on a printed page, and will also allow intelligent reasoning processes to automatically interpret and use procedure definitions. During his NASA fellowship, Dr. Simpson examined how various industries represent procedures (also called business processes or workflows), in areas such as manufacturing, accounting, shipping, or customer service. A useful method for designing and evaluating workflow representation languages is by determining their ability to encode various workflow patterns, which depict abstract relationships between the components of a procedure removed from the context of a specific procedure or industry. Investigators have used this type of analysis to evaluate how well-suited existing workflow representation languages are for various industries based on the workflow patterns that commonly arise across industry-specific procedures. Based on this type of analysis, it is already clear that existing workflow representations capture discrete flow of control (i.e., when one activity should start and stop based on when other activities start and stop), but do not capture the flow of data, materials, resources or priorities. Existing workflow representation languages are also limited to representing sequences of discrete activities, and cannot encode procedures involving continuous flow of information or materials between activities.
Common Data Models and Efficient Reproducible Workflows for Distributed Ocean Model Skill Assessment
NASA Astrophysics Data System (ADS)
Signell, R. P.; Snowden, D. P.; Howlett, E.; Fernandes, F. A.
2014-12-01
Model skill assessment requires discovery, access, analysis, and visualization of information from both sensors and models, and traditionally has been possible only by a few experts. The US Integrated Ocean Observing System (US-IOOS) consists of 17 Federal Agencies and 11 Regional Associations that produce data from various sensors and numerical models; exactly the information required for model skill assessment. US-IOOS is seeking to develop documented skill assessment workflows that are standardized, efficient, and reproducible so that a much wider community can participate in the use and assessment of model results. Standardization requires common data models for observational and model data. US-IOOS relies on the CF Conventions for observations and structured grid data, and on the UGRID Conventions for unstructured (e.g. triangular) grid data. This allows applications to obtain only the data they require in a uniform and parsimonious way using web services: OPeNDAP for model output and OGC Sensor Observation Service (SOS) for observed data. Reproducibility is enabled with IPython Notebooks shared on GitHub (http://github.com/ioos). These capture the entire skill assessment workflow, including user input, search, access, analysis, and visualization, ensuring that workflows are self-documenting and reproducible by anyone, using free software. Python packages for common data models are Pyugrid and the British Met Office Iris package. Python packages required to run the workflows (pyugrid, pyoos, and the British Met Office Iris package) are also available on GitHub and on Binstar.org so that users can run scenarios using the free Anaconda Python distribution. Hosted services such as Wakari enable anyone to reproduce these workflows for free, without installing any software locally, using just their web browser. We are also experimenting with Wakari Enterprise, which allows multi-user access from a web browser to an IPython Server running where large quantities of model output reside, increasing the efficiency. The open development and distribution of these workflows, and the software on which they depend, is an educational resource for those new to the field and a center of focus where practitioners can contribute new software and ideas.
Brooks, Matthew J.; Rajasimha, Harsha K.; Roger, Jerome E.
2011-01-01
Purpose Next-generation sequencing (NGS) has revolutionized systems-based analysis of cellular pathways. The goals of this study are to compare NGS-derived retinal transcriptome profiling (RNA-seq) to microarray and quantitative reverse transcription polymerase chain reaction (qRT–PCR) methods and to evaluate protocols for optimal high-throughput data analysis. Methods Retinal mRNA profiles of 21-day-old wild-type (WT) and neural retina leucine zipper knockout (Nrl−/−) mice were generated by deep sequencing, in triplicate, using Illumina GAIIx. The sequence reads that passed quality filters were analyzed at the transcript isoform level with two methods: Burrows–Wheeler Aligner (BWA) followed by ANOVA (ANOVA) and TopHat followed by Cufflinks. qRT–PCR validation was performed using TaqMan and SYBR Green assays. Results Using an optimized data analysis workflow, we mapped about 30 million sequence reads per sample to the mouse genome (build mm9) and identified 16,014 transcripts in the retinas of WT and Nrl−/− mice with BWA workflow and 34,115 transcripts with TopHat workflow. RNA-seq data confirmed stable expression of 25 known housekeeping genes, and 12 of these were validated with qRT–PCR. RNA-seq data had a linear relationship with qRT–PCR for more than four orders of magnitude and a goodness of fit (R2) of 0.8798. Approximately 10% of the transcripts showed differential expression between the WT and Nrl−/− retina, with a fold change ≥1.5 and p value <0.05. Altered expression of 25 genes was confirmed with qRT–PCR, demonstrating the high degree of sensitivity of the RNA-seq method. Hierarchical clustering of differentially expressed genes uncovered several as yet uncharacterized genes that may contribute to retinal function. Data analysis with BWA and TopHat workflows revealed a significant overlap yet provided complementary insights in transcriptome profiling. Conclusions Our study represents the first detailed analysis of retinal transcriptomes, with biologic replicates, generated by RNA-seq technology. The optimized data analysis workflows reported here should provide a framework for comparative investigations of expression profiles. Our results show that NGS offers a comprehensive and more accurate quantitative and qualitative evaluation of mRNA content within a cell or tissue. We conclude that RNA-seq based transcriptome characterization would expedite genetic network analyses and permit the dissection of complex biologic functions. PMID:22162623
Planning bioinformatics workflows using an expert system.
Chen, Xiaoling; Chang, Jeffrey T
2017-04-15
Bioinformatic analyses are becoming formidably more complex due to the increasing number of steps required to process the data, as well as the proliferation of methods that can be used in each step. To alleviate this difficulty, pipelines are commonly employed. However, pipelines are typically implemented to automate a specific analysis, and thus are difficult to use for exploratory analyses requiring systematic changes to the software or parameters used. To automate the development of pipelines, we have investigated expert systems. We created the Bioinformatics ExperT SYstem (BETSY) that includes a knowledge base where the capabilities of bioinformatics software is explicitly and formally encoded. BETSY is a backwards-chaining rule-based expert system comprised of a data model that can capture the richness of biological data, and an inference engine that reasons on the knowledge base to produce workflows. Currently, the knowledge base is populated with rules to analyze microarray and next generation sequencing data. We evaluated BETSY and found that it could generate workflows that reproduce and go beyond previously published bioinformatics results. Finally, a meta-investigation of the workflows generated from the knowledge base produced a quantitative measure of the technical burden imposed by each step of bioinformatics analyses, revealing the large number of steps devoted to the pre-processing of data. In sum, an expert system approach can facilitate exploratory bioinformatic analysis by automating the development of workflows, a task that requires significant domain expertise. https://github.com/jefftc/changlab. jeffrey.t.chang@uth.tmc.edu. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
Planning bioinformatics workflows using an expert system
Chen, Xiaoling; Chang, Jeffrey T.
2017-01-01
Abstract Motivation: Bioinformatic analyses are becoming formidably more complex due to the increasing number of steps required to process the data, as well as the proliferation of methods that can be used in each step. To alleviate this difficulty, pipelines are commonly employed. However, pipelines are typically implemented to automate a specific analysis, and thus are difficult to use for exploratory analyses requiring systematic changes to the software or parameters used. Results: To automate the development of pipelines, we have investigated expert systems. We created the Bioinformatics ExperT SYstem (BETSY) that includes a knowledge base where the capabilities of bioinformatics software is explicitly and formally encoded. BETSY is a backwards-chaining rule-based expert system comprised of a data model that can capture the richness of biological data, and an inference engine that reasons on the knowledge base to produce workflows. Currently, the knowledge base is populated with rules to analyze microarray and next generation sequencing data. We evaluated BETSY and found that it could generate workflows that reproduce and go beyond previously published bioinformatics results. Finally, a meta-investigation of the workflows generated from the knowledge base produced a quantitative measure of the technical burden imposed by each step of bioinformatics analyses, revealing the large number of steps devoted to the pre-processing of data. In sum, an expert system approach can facilitate exploratory bioinformatic analysis by automating the development of workflows, a task that requires significant domain expertise. Availability and Implementation: https://github.com/jefftc/changlab Contact: jeffrey.t.chang@uth.tmc.edu PMID:28052928
Study on user interface of pathology picture archiving and communication system.
Kim, Dasueran; Kang, Peter; Yun, Jungmin; Park, Sung-Hye; Seo, Jeong-Wook; Park, Peom
2014-01-01
It is necessary to improve the pathology workflow. A workflow task analysis was performed using a pathology picture archiving and communication system (pathology PACS) in order to propose a user interface for the Pathology PACS considering user experience. An interface analysis of the Pathology PACS in Seoul National University Hospital and a task analysis of the pathology workflow were performed by observing recorded video. Based on obtained results, a user interface for the Pathology PACS was proposed. Hierarchical task analysis of Pathology PACS was classified into 17 tasks including 1) pre-operation, 2) text, 3) images, 4) medical record viewer, 5) screen transition, 6) pathology identification number input, 7) admission date input, 8) diagnosis doctor, 9) diagnosis code, 10) diagnosis, 11) pathology identification number check box, 12) presence or absence of images, 13) search, 14) clear, 15) Excel save, 16) search results, and 17) re-search. And frequently used menu items were identified and schematized. A user interface for the Pathology PACS considering user experience could be proposed as a preliminary step, and this study may contribute to the development of medical information systems based on user experience and usability.
Yeung, Daniel; Boes, Peter; Ho, Meng Wei; Li, Zuofeng
2015-05-08
Image-guided radiotherapy (IGRT), based on radiopaque markers placed in the prostate gland, was used for proton therapy of prostate patients. Orthogonal X-rays and the IBA Digital Image Positioning System (DIPS) were used for setup correction prior to treatment and were repeated after treatment delivery. Following a rationale for margin estimates similar to that of van Herk,(1) the daily post-treatment DIPS data were analyzed to determine if an adaptive radiotherapy plan was necessary. A Web application using ASP.NET MVC5, Entity Framework, and an SQL database was designed to automate this process. The designed features included state-of-the-art Web technologies, a domain model closely matching the workflow, a database-supporting concurrency and data mining, access to the DIPS database, secured user access and roles management, and graphing and analysis tools. The Model-View-Controller (MVC) paradigm allowed clean domain logic, unit testing, and extensibility. Client-side technologies, such as jQuery, jQuery Plug-ins, and Ajax, were adopted to achieve a rich user environment and fast response. Data models included patients, staff, treatment fields and records, correction vectors, DIPS images, and association logics. Data entry, analysis, workflow logics, and notifications were implemented. The system effectively modeled the clinical workflow and IGRT process.
Wonczak, Stephan; Thiele, Holger; Nieroda, Lech; Jabbari, Kamel; Borowski, Stefan; Sinha, Vishal; Gunia, Wilfried; Lang, Ulrich; Achter, Viktor; Nürnberg, Peter
2015-01-01
Next generation sequencing (NGS) has been a great success and is now a standard method of research in the life sciences. With this technology, dozens of whole genomes or hundreds of exomes can be sequenced in rather short time, producing huge amounts of data. Complex bioinformatics analyses are required to turn these data into scientific findings. In order to run these analyses fast, automated workflows implemented on high performance computers are state of the art. While providing sufficient compute power and storage to meet the NGS data challenge, high performance computing (HPC) systems require special care when utilized for high throughput processing. This is especially true if the HPC system is shared by different users. Here, stability, robustness and maintainability are as important for automated workflows as speed and throughput. To achieve all of these aims, dedicated solutions have to be developed. In this paper, we present the tricks and twists that we utilized in the implementation of our exome data processing workflow. It may serve as a guideline for other high throughput data analysis projects using a similar infrastructure. The code implementing our solutions is provided in the supporting information files. PMID:25942438
Zou, Wei; She, Jianwen; Tolstikov, Vladimir V.
2013-01-01
Current available biomarkers lack sensitivity and/or specificity for early detection of cancer. To address this challenge, a robust and complete workflow for metabolic profiling and data mining is described in details. Three independent and complementary analytical techniques for metabolic profiling are applied: hydrophilic interaction liquid chromatography (HILIC–LC), reversed-phase liquid chromatography (RP–LC), and gas chromatography (GC). All three techniques are coupled to a mass spectrometer (MS) in the full scan acquisition mode, and both unsupervised and supervised methods are used for data mining. The univariate and multivariate feature selection are used to determine subsets of potentially discriminative predictors. These predictors are further identified by obtaining accurate masses and isotopic ratios using selected ion monitoring (SIM) and data-dependent MS/MS and/or accurate mass MSn ion tree scans utilizing high resolution MS. A list combining all of the identified potential biomarkers generated from different platforms and algorithms is used for pathway analysis. Such a workflow combining comprehensive metabolic profiling and advanced data mining techniques may provide a powerful approach for metabolic pathway analysis and biomarker discovery in cancer research. Two case studies with previous published data are adapted and included in the context to elucidate the application of the workflow. PMID:24958150
Workflow in interventional radiology: nerve blocks and facet blocks
NASA Astrophysics Data System (ADS)
Siddoway, Donald; Ingeholm, Mary Lou; Burgert, Oliver; Neumuth, Thomas; Watson, Vance; Cleary, Kevin
2006-03-01
Workflow analysis has the potential to dramatically improve the efficiency and clinical outcomes of medical procedures. In this study, we recorded the workflow for nerve block and facet block procedures in the interventional radiology suite at Georgetown University Hospital in Washington, DC, USA. We employed a custom client/server software architecture developed by the Innovation Center for Computer Assisted Surgery (ICCAS) at the University of Leipzig, Germany. This software runs in an internet browser, and allows the user to record the actions taken by the physician during a procedure. The data recorded during the procedure is stored as an XML document, which can then be further processed. We have successfully gathered data on a number if cases using a tablet PC, and these preliminary results show the feasibility of using this software in an interventional radiology setting. We are currently accruing additional cases and when more data has been collected we will analyze the workflow of these procedures to look for inefficiencies and potential improvements.
Sokol, Elena; Ulven, Trond; Færgeman, Nils J; Ejsing, Christer S
2015-06-01
Here we present a workflow for in-depth analysis of milk lipids that combines gas chromatography (GC) for fatty acid (FA) profiling and a shotgun lipidomics routine termed MS/MS ALL for structural characterization of molecular lipid species. To evaluate the performance of the workflow we performed a comparative lipid analysis of human milk, cow milk, and Lacprodan® PL-20, a phospholipid-enriched milk protein concentrate for infant formula. The GC analysis showed that human milk and Lacprodan have a similar FA profile with higher levels of unsaturated FAs as compared to cow milk. In-depth lipidomic analysis by MS/MS ALL revealed that each type of milk sample comprised distinct composition of molecular lipid species. Lipid class composition showed that the human and cow milk contain a higher proportion of triacylglycerols (TAGs) as compared to Lacprodan. Notably, the MS/MS ALL analysis demonstrated that the similar FA profile of human milk and Lacprodan determined by GC analysis is attributed to the composition of individual TAG species in human milk and glycerophospholipid species in Lacprodan. Moreover, the analysis of TAG molecules in Lacprodan and cow milk showed a high proportion of short-chain FAs that could not be monitored by GC analysis. The results presented here show that complementary GC and MS/MS ALL analysis is a powerful approach for characterization of molecular lipid species in milk and milk products. : Milk lipid analysis is routinely performed using gas chromatography. This method reports the total fatty acid composition of all milk lipids, but provides no structural or quantitative information about individual lipid molecules in milk or milk products. Here we present a workflow that integrates gas chromatography for fatty acid profiling and a shotgun lipidomics routine termed MS/MS ALL for structural analysis and quantification of molecular lipid species. We demonstrate the efficacy of this complementary workflow by a comparative analysis of molecular lipid species in human milk, cow milk, and a milk-based supplement used for infant formula.
Sokol, Elena; Ulven, Trond; Færgeman, Nils J; Ejsing, Christer S
2015-01-01
Here we present a workflow for in-depth analysis of milk lipids that combines gas chromatography (GC) for fatty acid (FA) profiling and a shotgun lipidomics routine termed MS/MSALL for structural characterization of molecular lipid species. To evaluate the performance of the workflow we performed a comparative lipid analysis of human milk, cow milk, and Lacprodan® PL-20, a phospholipid-enriched milk protein concentrate for infant formula. The GC analysis showed that human milk and Lacprodan have a similar FA profile with higher levels of unsaturated FAs as compared to cow milk. In-depth lipidomic analysis by MS/MSALL revealed that each type of milk sample comprised distinct composition of molecular lipid species. Lipid class composition showed that the human and cow milk contain a higher proportion of triacylglycerols (TAGs) as compared to Lacprodan. Notably, the MS/MSALL analysis demonstrated that the similar FA profile of human milk and Lacprodan determined by GC analysis is attributed to the composition of individual TAG species in human milk and glycerophospholipid species in Lacprodan. Moreover, the analysis of TAG molecules in Lacprodan and cow milk showed a high proportion of short-chain FAs that could not be monitored by GC analysis. The results presented here show that complementary GC and MS/MSALL analysis is a powerful approach for characterization of molecular lipid species in milk and milk products. Practical applications : Milk lipid analysis is routinely performed using gas chromatography. This method reports the total fatty acid composition of all milk lipids, but provides no structural or quantitative information about individual lipid molecules in milk or milk products. Here we present a workflow that integrates gas chromatography for fatty acid profiling and a shotgun lipidomics routine termed MS/MSALL for structural analysis and quantification of molecular lipid species. We demonstrate the efficacy of this complementary workflow by a comparative analysis of molecular lipid species in human milk, cow milk, and a milk-based supplement used for infant formula. PMID:26089741
Distributed Data Integration Infrastructure
DOE Office of Scientific and Technical Information (OSTI.GOV)
Critchlow, T; Ludaescher, B; Vouk, M
The Internet is becoming the preferred method for disseminating scientific data from a variety of disciplines. This can result in information overload on the part of the scientists, who are unable to query all of the relevant sources, even if they knew where to find them, what they contained, how to interact with them, and how to interpret the results. A related issue is keeping up with current trends in information technology often taxes the end-user's expertise and time. Thus instead of benefiting from this information rich environment, scientists become experts on a small number of sources and technologies, usemore » them almost exclusively, and develop a resistance to innovations that can enhance their productivity. Enabling information based scientific advances, in domains such as functional genomics, requires fully utilizing all available information and the latest technologies. In order to address this problem we are developing a end-user centric, domain-sensitive workflow-based infrastructure, shown in Figure 1, that will allow scientists to design complex scientific workflows that reflect the data manipulation required to perform their research without an undue burden. We are taking a three-tiered approach to designing this infrastructure utilizing (1) abstract workflow definition, construction, and automatic deployment, (2) complex agent-based workflow execution and (3) automatic wrapper generation. In order to construct a workflow, the scientist defines an abstract workflow (AWF) in terminology (semantics and context) that is familiar to him/her. This AWF includes all of the data transformations, selections, and analyses required by the scientist, but does not necessarily specify particular data sources. This abstract workflow is then compiled into an executable workflow (EWF, in our case XPDL) that is then evaluated and executed by the workflow engine. This EWF contains references to specific data source and interfaces capable of performing the desired actions. In order to provide access to the largest number of resources possible, our lowest level utilizes automatic wrapper generation techniques to create information and data wrappers capable of interacting with the complex interfaces typical in scientific analysis. The remainder of this document outlines our work in these three areas, the impact our work has made, and our plans for the future.« less
2013-01-01
Background We introduce a Knowledge-based Decision Support System (KDSS) in order to face the Protein Complex Extraction issue. Using a Knowledge Base (KB) coding the expertise about the proposed scenario, our KDSS is able to suggest both strategies and tools, according to the features of input dataset. Our system provides a navigable workflow for the current experiment and furthermore it offers support in the configuration and running of every processing component of that workflow. This last feature makes our system a crossover between classical DSS and Workflow Management Systems. Results We briefly present the KDSS' architecture and basic concepts used in the design of the knowledge base and the reasoning component. The system is then tested using a subset of Saccharomyces cerevisiae Protein-Protein interaction dataset. We used this subset because it has been well studied in literature by several research groups in the field of complex extraction: in this way we could easily compare the results obtained through our KDSS with theirs. Our system suggests both a preprocessing and a clustering strategy, and for each of them it proposes and eventually runs suited algorithms. Our system's final results are then composed of a workflow of tasks, that can be reused for other experiments, and the specific numerical results for that particular trial. Conclusions The proposed approach, using the KDSS' knowledge base, provides a novel workflow that gives the best results with regard to the other workflows produced by the system. This workflow and its numeric results have been compared with other approaches about PPI network analysis found in literature, offering similar results. PMID:23368995
Algorithm sensitivity analysis and parameter tuning for tissue image segmentation pipelines
Kurç, Tahsin M.; Taveira, Luís F. R.; Melo, Alba C. M. A.; Gao, Yi; Kong, Jun; Saltz, Joel H.
2017-01-01
Abstract Motivation: Sensitivity analysis and parameter tuning are important processes in large-scale image analysis. They are very costly because the image analysis workflows are required to be executed several times to systematically correlate output variations with parameter changes or to tune parameters. An integrated solution with minimum user interaction that uses effective methodologies and high performance computing is required to scale these studies to large imaging datasets and expensive analysis workflows. Results: The experiments with two segmentation workflows show that the proposed approach can (i) quickly identify and prune parameters that are non-influential; (ii) search a small fraction (about 100 points) of the parameter search space with billions to trillions of points and improve the quality of segmentation results (Dice and Jaccard metrics) by as much as 1.42× compared to the results from the default parameters; (iii) attain good scalability on a high performance cluster with several effective optimizations. Conclusions: Our work demonstrates the feasibility of performing sensitivity analyses, parameter studies and auto-tuning with large datasets. The proposed framework can enable the quantification of error estimations and output variations in image segmentation pipelines. Availability and Implementation: Source code: https://github.com/SBU-BMI/region-templates/. Contact: teodoro@unb.br Supplementary information: Supplementary data are available at Bioinformatics online. PMID:28062445
Algorithm sensitivity analysis and parameter tuning for tissue image segmentation pipelines.
Teodoro, George; Kurç, Tahsin M; Taveira, Luís F R; Melo, Alba C M A; Gao, Yi; Kong, Jun; Saltz, Joel H
2017-04-01
Sensitivity analysis and parameter tuning are important processes in large-scale image analysis. They are very costly because the image analysis workflows are required to be executed several times to systematically correlate output variations with parameter changes or to tune parameters. An integrated solution with minimum user interaction that uses effective methodologies and high performance computing is required to scale these studies to large imaging datasets and expensive analysis workflows. The experiments with two segmentation workflows show that the proposed approach can (i) quickly identify and prune parameters that are non-influential; (ii) search a small fraction (about 100 points) of the parameter search space with billions to trillions of points and improve the quality of segmentation results (Dice and Jaccard metrics) by as much as 1.42× compared to the results from the default parameters; (iii) attain good scalability on a high performance cluster with several effective optimizations. Our work demonstrates the feasibility of performing sensitivity analyses, parameter studies and auto-tuning with large datasets. The proposed framework can enable the quantification of error estimations and output variations in image segmentation pipelines. Source code: https://github.com/SBU-BMI/region-templates/ . teodoro@unb.br. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
Missing Value Monitoring Enhances the Robustness in Proteomics Quantitation.
Matafora, Vittoria; Corno, Andrea; Ciliberto, Andrea; Bachi, Angela
2017-04-07
In global proteomic analysis, it is estimated that proteins span from millions to less than 100 copies per cell. The challenge of protein quantitation by classic shotgun proteomic techniques relies on the presence of missing values in peptides belonging to low-abundance proteins that lowers intraruns reproducibility affecting postdata statistical analysis. Here, we present a new analytical workflow MvM (missing value monitoring) able to recover quantitation of missing values generated by shotgun analysis. In particular, we used confident data-dependent acquisition (DDA) quantitation only for proteins measured in all the runs, while we filled the missing values with data-independent acquisition analysis using the library previously generated in DDA. We analyzed cell cycle regulated proteins, as they are low abundance proteins with highly dynamic expression levels. Indeed, we found that cell cycle related proteins are the major components of the missing values-rich proteome. Using the MvM workflow, we doubled the number of robustly quantified cell cycle related proteins, and we reduced the number of missing values achieving robust quantitation for proteins over ∼50 molecules per cell. MvM allows lower quantification variance among replicates for low abundance proteins with respect to DDA analysis, which demonstrates the potential of this novel workflow to measure low abundance, dynamically regulated proteins.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pugmire, David; Kress, James; Choi, Jong
Data driven science is becoming increasingly more common, complex, and is placing tremendous stresses on visualization and analysis frameworks. Data sources producing 10GB per second (and more) are becoming increasingly commonplace in both simulation, sensor and experimental sciences. These data sources, which are often distributed around the world, must be analyzed by teams of scientists that are also distributed. Enabling scientists to view, query and interact with such large volumes of data in near-real-time requires a rich fusion of visualization and analysis techniques, middleware and workflow systems. Here, this paper discusses initial research into visualization and analysis of distributed datamore » workflows that enables scientists to make near-real-time decisions of large volumes of time varying data.« less
A Mixed-Methods Research Framework for Healthcare Process Improvement.
Bastian, Nathaniel D; Munoz, David; Ventura, Marta
2016-01-01
The healthcare system in the United States is spiraling out of control due to ever-increasing costs without significant improvements in quality, access to care, satisfaction, and efficiency. Efficient workflow is paramount to improving healthcare value while maintaining the utmost standards of patient care and provider satisfaction in high stress environments. This article provides healthcare managers and quality engineers with a practical healthcare process improvement framework to assess, measure and improve clinical workflow processes. The proposed mixed-methods research framework integrates qualitative and quantitative tools to foster the improvement of processes and workflow in a systematic way. The framework consists of three distinct phases: 1) stakeholder analysis, 2a) survey design, 2b) time-motion study, and 3) process improvement. The proposed framework is applied to the pediatric intensive care unit of the Penn State Hershey Children's Hospital. The implementation of this methodology led to identification and categorization of different workflow tasks and activities into both value-added and non-value added in an effort to provide more valuable and higher quality patient care. Based upon the lessons learned from the case study, the three-phase methodology provides a better, broader, leaner, and holistic assessment of clinical workflow. The proposed framework can be implemented in various healthcare settings to support continuous improvement efforts in which complexity is a daily element that impacts workflow. We proffer a general methodology for process improvement in a healthcare setting, providing decision makers and stakeholders with a useful framework to help their organizations improve efficiency. Published by Elsevier Inc.
JTSA: an open source framework for time series abstractions.
Sacchi, Lucia; Capozzi, Davide; Bellazzi, Riccardo; Larizza, Cristiana
2015-10-01
The evaluation of the clinical status of a patient is frequently based on the temporal evolution of some parameters, making the detection of temporal patterns a priority in data analysis. Temporal abstraction (TA) is a methodology widely used in medical reasoning for summarizing and abstracting longitudinal data. This paper describes JTSA (Java Time Series Abstractor), a framework including a library of algorithms for time series preprocessing and abstraction and an engine to execute a workflow for temporal data processing. The JTSA framework is grounded on a comprehensive ontology that models temporal data processing both from the data storage and the abstraction computation perspective. The JTSA framework is designed to allow users to build their own analysis workflows by combining different algorithms. Thanks to the modular structure of a workflow, simple to highly complex patterns can be detected. The JTSA framework has been developed in Java 1.7 and is distributed under GPL as a jar file. JTSA provides: a collection of algorithms to perform temporal abstraction and preprocessing of time series, a framework for defining and executing data analysis workflows based on these algorithms, and a GUI for workflow prototyping and testing. The whole JTSA project relies on a formal model of the data types and of the algorithms included in the library. This model is the basis for the design and implementation of the software application. Taking into account this formalized structure, the user can easily extend the JTSA framework by adding new algorithms. Results are shown in the context of the EU project MOSAIC to extract relevant patterns from data coming related to the long term monitoring of diabetic patients. The proof that JTSA is a versatile tool to be adapted to different needs is given by its possible uses, both as a standalone tool for data summarization and as a module to be embedded into other architectures to select specific phenotypes based on TAs in a large dataset. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
Facilitating hydrological data analysis workflows in R: the RHydro package
NASA Astrophysics Data System (ADS)
Buytaert, Wouter; Moulds, Simon; Skoien, Jon; Pebesma, Edzer; Reusser, Dominik
2015-04-01
The advent of new technologies such as web-services and big data analytics holds great promise for hydrological data analysis and simulation. Driven by the need for better water management tools, it allows for the construction of much more complex workflows, that integrate more and potentially more heterogeneous data sources with longer tool chains of algorithms and models. With the scientific challenge of designing the most adequate processing workflow comes the technical challenge of implementing the workflow with a minimal risk for errors. A wide variety of new workbench technologies and other data handling systems are being developed. At the same time, the functionality of available data processing languages such as R and Python is increasing at an accelerating pace. Because of the large diversity of scientific questions and simulation needs in hydrology, it is unlikely that one single optimal method for constructing hydrological data analysis workflows will emerge. Nevertheless, languages such as R and Python are quickly gaining popularity because they combine a wide array of functionality with high flexibility and versatility. The object-oriented nature of high-level data processing languages makes them particularly suited for the handling of complex and potentially large datasets. In this paper, we explore how handling and processing of hydrological data in R can be facilitated further by designing and implementing a set of relevant classes and methods in the experimental R package RHydro. We build upon existing efforts such as the sp and raster packages for spatial data and the spacetime package for spatiotemporal data to define classes for hydrological data (HydroST). In order to handle simulation data from hydrological models conveniently, a HM class is defined. Relevant methods are implemented to allow for an optimal integration of the HM class with existing model fitting and simulation functionality in R. Lastly, we discuss some of the design challenges of the RHydro package, including integration with big data technologies, web technologies, and emerging data models in hydrology.
Autonomous Metabolomics for Rapid Metabolite Identification in Global Profiling
Benton, H. Paul; Ivanisevic, Julijana; Mahieu, Nathaniel G.; ...
2014-12-12
An autonomous metabolomic workflow combining mass spectrometry analysis with tandem mass spectrometry data acquisition was designed to allow for simultaneous data processing and metabolite characterization. Although previously tandem mass spectrometry data have been generated on the fly, the experiments described herein combine this technology with the bioinformatic resources of XCMS and METLIN. We can analyze large profiling datasets and simultaneously obtain structural identifications, as a result of this unique integration. Furthermore, validation of the workflow on bacterial samples allowed the profiling on the order of a thousand metabolite features with simultaneous tandem mass spectra data acquisition. The tandem mass spectrometrymore » data acquisition enabled automatic search and matching against the METLIN tandem mass spectrometry database, shortening the current workflow from days to hours. Overall, the autonomous approach to untargeted metabolomics provides an efficient means of metabolomic profiling, and will ultimately allow the more rapid integration of comparative analyses, metabolite identification, and data analysis at a systems biology level.« less
Biological Web Service Repositories Review.
Urdidiales-Nieto, David; Navas-Delgado, Ismael; Aldana-Montes, José F
2017-05-01
Web services play a key role in bioinformatics enabling the integration of database access and analysis of algorithms. However, Web service repositories do not usually publish information on the changes made to their registered Web services. Dynamism is directly related to the changes in the repositories (services registered or unregistered) and at service level (annotation changes). Thus, users, software clients or workflow based approaches lack enough relevant information to decide when they should review or re-execute a Web service or workflow to get updated or improved results. The dynamism of the repository could be a measure for workflow developers to re-check service availability and annotation changes in the services of interest to them. This paper presents a review on the most well-known Web service repositories in the life sciences including an analysis of their dynamism. Freshness is introduced in this paper, and has been used as the measure for the dynamism of these repositories. © 2017 The Authors. Published by Wiley-VCH Verlag GmbH & Co. KGaA.
Eijssen, Lars M T; Goelela, Varshna S; Kelder, Thomas; Adriaens, Michiel E; Evelo, Chris T; Radonjic, Marijana
2015-06-30
Illumina whole-genome expression bead arrays are a widely used platform for transcriptomics. Most of the tools available for the analysis of the resulting data are not easily applicable by less experienced users. ArrayAnalysis.org provides researchers with an easy-to-use and comprehensive interface to the functionality of R and Bioconductor packages for microarray data analysis. As a modular open source project, it allows developers to contribute modules that provide support for additional types of data or extend workflows. To enable data analysis of Illumina bead arrays for a broad user community, we have developed a module for ArrayAnalysis.org that provides a free and user-friendly web interface for quality control and pre-processing for these arrays. This module can be used together with existing modules for statistical and pathway analysis to provide a full workflow for Illumina gene expression data analysis. The module accepts data exported from Illumina's GenomeStudio, and provides the user with quality control plots and normalized data. The outputs are directly linked to the existing statistics module of ArrayAnalysis.org, but can also be downloaded for further downstream analysis in third-party tools. The Illumina bead arrays analysis module is available at http://www.arrayanalysis.org . A user guide, a tutorial demonstrating the analysis of an example dataset, and R scripts are available. The module can be used as a starting point for statistical evaluation and pathway analysis provided on the website or to generate processed input data for a broad range of applications in life sciences research.
DEWEY: the DICOM-enabled workflow engine system.
Erickson, Bradley J; Langer, Steve G; Blezek, Daniel J; Ryan, William J; French, Todd L
2014-06-01
Workflow is a widely used term to describe the sequence of steps to accomplish a task. The use of workflow technology in medicine and medical imaging in particular is limited. In this article, we describe the application of a workflow engine to improve workflow in a radiology department. We implemented a DICOM-enabled workflow engine system in our department. We designed it in a way to allow for scalability, reliability, and flexibility. We implemented several workflows, including one that replaced an existing manual workflow and measured the number of examinations prepared in time without and with the workflow system. The system significantly increased the number of examinations prepared in time for clinical review compared to human effort. It also met the design goals defined at its outset. Workflow engines appear to have value as ways to efficiently assure that complex workflows are completed in a timely fashion.
Castillo, Andreina I; Nelson, Andrew D L; Haug-Baltzell, Asher K; Lyons, Eric
2018-01-01
Abstract Integrated platforms for storage, management, analysis and sharing of large quantities of omics data have become fundamental to comparative genomics. CoGe (https://genomevolution.org/coge/) is an online platform designed to manage and study genomic data, enabling both data- and hypothesis-driven comparative genomics. CoGe’s tools and resources can be used to organize and analyse both publicly available and private genomic data from any species. Here, we demonstrate the capabilities of CoGe through three example workflows using 17 Plasmodium genomes as a model. Plasmodium genomes present unique challenges for comparative genomics due to their rapidly evolving and highly variable genomic AT/GC content. These example workflows are intended to serve as templates to help guide researchers who would like to use CoGe to examine diverse aspects of genome evolution. In the first workflow, trends in genome composition and amino acid usage are explored. In the second, changes in genome structure and the distribution of synonymous (Ks) and non-synonymous (Kn) substitution values are evaluated across species with different levels of evolutionary relatedness. In the third workflow, microsyntenic analyses of multigene families’ genomic organization are conducted using two Plasmodium-specific gene families—serine repeat antigen, and cytoadherence-linked asexual gene—as models. In general, these example workflows show how to achieve quick, reproducible and shareable results using the CoGe platform. We were able to replicate previously published results, as well as leverage CoGe’s tools and resources to gain additional insight into various aspects of Plasmodium genome evolution. Our results highlight the usefulness of the CoGe platform, particularly in understanding complex features of genome evolution. Database URL: https://genomevolution.org/coge/
A formal approach to the analysis of clinical computer-interpretable guideline modeling languages.
Grando, M Adela; Glasspool, David; Fox, John
2012-01-01
To develop proof strategies to formally study the expressiveness of workflow-based languages, and to investigate their applicability to clinical computer-interpretable guideline (CIG) modeling languages. We propose two strategies for studying the expressiveness of workflow-based languages based on a standard set of workflow patterns expressed as Petri nets (PNs) and notions of congruence and bisimilarity from process calculus. Proof that a PN-based pattern P can be expressed in a language L can be carried out semi-automatically. Proof that a language L cannot provide the behavior specified by a PNP requires proof by exhaustion based on analysis of cases and cannot be performed automatically. The proof strategies are generic but we exemplify their use with a particular CIG modeling language, PROforma. To illustrate the method we evaluate the expressiveness of PROforma against three standard workflow patterns and compare our results with a previous similar but informal comparison. We show that the two proof strategies are effective in evaluating a CIG modeling language against standard workflow patterns. We find that using the proposed formal techniques we obtain different results to a comparable previously published but less formal study. We discuss the utility of these analyses as the basis for principled extensions to CIG modeling languages. Additionally we explain how the same proof strategies can be reused to prove the satisfaction of patterns expressed in the declarative language CIGDec. The proof strategies we propose are useful tools for analysing the expressiveness of CIG modeling languages. This study provides good evidence of the benefits of applying formal methods of proof over semi-formal ones. Copyright © 2011 Elsevier B.V. All rights reserved.
SoS Notebook: An Interactive Multi-Language Data Analysis Environment.
Peng, Bo; Wang, Gao; Ma, Jun; Leong, Man Chong; Wakefield, Chris; Melott, James; Chiu, Yulun; Du, Di; Weinstein, John N
2018-05-22
Complex bioinformatic data analysis workflows involving multiple scripts in different languages can be difficult to consolidate, share, and reproduce. An environment that streamlines the entire processes of data collection, analysis, visualization and reporting of such multi-language analyses is currently lacking. We developed Script of Scripts (SoS) Notebook, a web-based notebook environment that allows the use of multiple scripting language in a single notebook, with data flowing freely within and across languages. SoS Notebook enables researchers to perform sophisticated bioinformatic analysis using the most suitable tools for different parts of the workflow, without the limitations of a particular language or complications of cross-language communications. SoS Notebook is hosted at http://vatlab.github.io/SoS/ and is distributed under a BSD license. bpeng@mdanderson.org.
Inferring Clinical Workflow Efficiency via Electronic Medical Record Utilization
Chen, You; Xie, Wei; Gunter, Carl A; Liebovitz, David; Mehrotra, Sanjay; Zhang, He; Malin, Bradley
2015-01-01
Complexity in clinical workflows can lead to inefficiency in making diagnoses, ineffectiveness of treatment plans and uninformed management of healthcare organizations (HCOs). Traditional strategies to manage workflow complexity are based on measuring the gaps between workflows defined by HCO administrators and the actual processes followed by staff in the clinic. However, existing methods tend to neglect the influences of EMR systems on the utilization of workflows, which could be leveraged to optimize workflows facilitated through the EMR. In this paper, we introduce a framework to infer clinical workflows through the utilization of an EMR and show how such workflows roughly partition into four types according to their efficiency. Our framework infers workflows at several levels of granularity through data mining technologies. We study four months of EMR event logs from a large medical center, including 16,569 inpatient stays, and illustrate that over approximately 95% of workflows are efficient and that 80% of patients are on such workflows. At the same time, we show that the remaining 5% of workflows may be inefficient due to a variety of factors, such as complex patients. PMID:26958173
NASA Astrophysics Data System (ADS)
Alves, Gelio; Wang, Guanghui; Ogurtsov, Aleksey Y.; Drake, Steven K.; Gucek, Marjan; Sacks, David B.; Yu, Yi-Kuo
2018-06-01
Rapid and accurate identification and classification of microorganisms is of paramount importance to public health and safety. With the advance of mass spectrometry (MS) technology, the speed of identification can be greatly improved. However, the increasing number of microbes sequenced is complicating correct microbial identification even in a simple sample due to the large number of candidates present. To properly untwine candidate microbes in samples containing one or more microbes, one needs to go beyond apparent morphology or simple "fingerprinting"; to correctly prioritize the candidate microbes, one needs to have accurate statistical significance in microbial identification. We meet these challenges by using peptide-centric representations of microbes to better separate them and by augmenting our earlier analysis method that yields accurate statistical significance. Here, we present an updated analysis workflow that uses tandem MS (MS/MS) spectra for microbial identification or classification. We have demonstrated, using 226 MS/MS publicly available data files (each containing from 2500 to nearly 100,000 MS/MS spectra) and 4000 additional MS/MS data files, that the updated workflow can correctly identify multiple microbes at the genus and often the species level for samples containing more than one microbe. We have also shown that the proposed workflow computes accurate statistical significances, i.e., E values for identified peptides and unified E values for identified microbes. Our updated analysis workflow MiCId, a freely available software for Microorganism Classification and Identification, is available for download at https://www.ncbi.nlm.nih.gov/CBBresearch/Yu/downloads.html.
Alves, Gelio; Wang, Guanghui; Ogurtsov, Aleksey Y; Drake, Steven K; Gucek, Marjan; Sacks, David B; Yu, Yi-Kuo
2018-06-05
Rapid and accurate identification and classification of microorganisms is of paramount importance to public health and safety. With the advance of mass spectrometry (MS) technology, the speed of identification can be greatly improved. However, the increasing number of microbes sequenced is complicating correct microbial identification even in a simple sample due to the large number of candidates present. To properly untwine candidate microbes in samples containing one or more microbes, one needs to go beyond apparent morphology or simple "fingerprinting"; to correctly prioritize the candidate microbes, one needs to have accurate statistical significance in microbial identification. We meet these challenges by using peptide-centric representations of microbes to better separate them and by augmenting our earlier analysis method that yields accurate statistical significance. Here, we present an updated analysis workflow that uses tandem MS (MS/MS) spectra for microbial identification or classification. We have demonstrated, using 226 MS/MS publicly available data files (each containing from 2500 to nearly 100,000 MS/MS spectra) and 4000 additional MS/MS data files, that the updated workflow can correctly identify multiple microbes at the genus and often the species level for samples containing more than one microbe. We have also shown that the proposed workflow computes accurate statistical significances, i.e., E values for identified peptides and unified E values for identified microbes. Our updated analysis workflow MiCId, a freely available software for Microorganism Classification and Identification, is available for download at https://www.ncbi.nlm.nih.gov/CBBresearch/Yu/downloads.html . Graphical Abstract ᅟ.
Coiera, Enrico
2014-01-01
Background and objective Annotations to physical workspaces such as signs and notes are ubiquitous. When densely annotated, work areas become communication spaces. This study aims to characterize the types and purpose of such annotations. Methods A qualitative observational study was undertaken in two wards and the radiology department of a 440-bed metropolitan teaching hospital. Images were purposefully sampled; 39 were analyzed after excluding inferior images. Results Annotation functions included signaling identity, location, capability, status, availability, and operation. They encoded data, rules or procedural descriptions. Most aggregated into groups that either created a workflow by referencing each other, supported a common workflow without reference to each other, or were heterogeneous, referring to many workflows. Higher-level assemblies of such groupings were also observed. Discussion Annotations make visible the gap between work done and the capability of a space to support work. Annotations are repairs of an environment, improving fitness for purpose, fixing inadequacy in design, or meeting emergent needs. Annotations thus record the missing information needed to undertake tasks, typically added post-implemented. Measuring annotation levels post-implementation could help assess the fit of technology to task. Physical and digital spaces could meet broader user needs by formally supporting user customization, ‘programming through annotation’. Augmented reality systems could also directly support annotation, addressing existing information gaps, and enhancing work with context sensitive annotation. Conclusions Communication spaces offer a model of how work unfolds. Annotations make visible local adaptation that makes technology fit for purpose post-implementation and suggest an important role for annotatable information systems and digital augmentation of the physical environment. PMID:24005797
Visualizing Cross-sectional Data in a Real-World Context
NASA Astrophysics Data System (ADS)
Van Noten, K.; Lecocq, T.
2016-12-01
If you could fly around your research results in three dimensions, wouldn't you like to do it? Visualizing research results properly during scientific presentations already does half the job of informing the public on the geographic framework of your research. Many scientists use the Google Earth™ mapping service (V7.1.2.2041) because it's a great interactive mapping tool for assigning geographic coordinates to individual data points, localizing a research area, and draping maps of results over Earth's surface for 3D visualization. However, visualizations of research results in vertical cross-sections are often not shown simultaneously with the maps in Google Earth. A few tutorials and programs to display cross-sectional data in Google Earth do exist, and the workflow is rather simple. By importing a cross-sectional figure into in the open software SketchUp Make [Trimble Navigation Limited, 2016], any spatial model can be exported to a vertical figure in Google Earth. In this presentation a clear workflow/tutorial is presented how to image cross-sections manually in Google Earth. No software skills, nor any programming codes are required. It is very easy to use, offers great possibilities for teaching and allows fast figure manipulation in Google Earth. The full workflow can be found in "Van Noten, K. 2016. Visualizing Cross-Sectional Data in a Real-World Context. EOS, Transactions AGU, 97, 16-19".The video tutorial can be found here: https://www.youtube.com/watch?v=Tr8LwFJ4RYU&Figure: Cross-sectional Research Examples Illustrated in Google Earth
Boes, Peter; Ho, Meng Wei; Li, Zuofeng
2015-01-01
Image‐guided radiotherapy (IGRT), based on radiopaque markers placed in the prostate gland, was used for proton therapy of prostate patients. Orthogonal X‐rays and the IBA Digital Image Positioning System (DIPS) were used for setup correction prior to treatment and were repeated after treatment delivery. Following a rationale for margin estimates similar to that of van Herk,(1) the daily post‐treatment DIPS data were analyzed to determine if an adaptive radiotherapy plan was necessary. A Web application using ASP.NET MVC5, Entity Framework, and an SQL database was designed to automate this process. The designed features included state‐of‐the‐art Web technologies, a domain model closely matching the workflow, a database‐supporting concurrency and data mining, access to the DIPS database, secured user access and roles management, and graphing and analysis tools. The Model‐View‐Controller (MVC) paradigm allowed clean domain logic, unit testing, and extensibility. Client‐side technologies, such as jQuery, jQuery Plug‐ins, and Ajax, were adopted to achieve a rich user environment and fast response. Data models included patients, staff, treatment fields and records, correction vectors, DIPS images, and association logics. Data entry, analysis, workflow logics, and notifications were implemented. The system effectively modeled the clinical workflow and IGRT process. PACS number: 87 PMID:26103504
a Workflow for UAV's Integration Into a Geodesign Platform
NASA Astrophysics Data System (ADS)
Anca, P.; Calugaru, A.; Alixandroae, I.; Nazarie, R.
2016-06-01
This paper presents a workflow for the development of various Geodesign scenarios. The subject is important in the context of identifying patterns and designing solutions for a Smart City with optimized public transportation, efficient buildings, efficient utilities, recreational facilities a.s.o.. The workflow describes the procedures starting with acquiring data in the field, data processing, orthophoto generation, DTM generation, integration into a GIS platform and analyzing for a better support for Geodesign. Esri's City Engine is used mostly for 3D modeling capabilities that enable the user to obtain 3D realistic models. The workflow uses as inputs information extracted from images acquired using UAVs technologies, namely eBee, existing 2D GIS geodatabases, and a set of CGA rules. The method that we used further, is called procedural modeling, and uses rules in order to extrude buildings, the street network, parcel zoning and side details, based on the initial attributes from the geodatabase. The resulted products are various scenarios for redesigning, for analyzing new exploitation sites. Finally, these scenarios can be published as interactive web scenes for internal, groups or pubic consultation. In this way, problems like the impact of new constructions being build, re-arranging green spaces or changing routes for public transportation, etc. are revealed through impact and visibility analysis or shadowing analysis and are brought to the citizen's attention. This leads to better decisions.
Backhausen, Lea L.; Herting, Megan M.; Buse, Judith; Roessner, Veit; Smolka, Michael N.; Vetter, Nora C.
2016-01-01
In structural magnetic resonance imaging motion artifacts are common, especially when not scanning healthy young adults. It has been shown that motion affects the analysis with automated image-processing techniques (e.g., FreeSurfer). This can bias results. Several developmental and adult studies have found reduced volume and thickness of gray matter due to motion artifacts. Thus, quality control is necessary in order to ensure an acceptable level of quality and to define exclusion criteria of images (i.e., determine participants with most severe artifacts). However, information about the quality control workflow and image exclusion procedure is largely lacking in the current literature and the existing rating systems differ. Here, we propose a stringent workflow of quality control steps during and after acquisition of T1-weighted images, which enables researchers dealing with populations that are typically affected by motion artifacts to enhance data quality and maximize sample sizes. As an underlying aim we established a thorough quality control rating system for T1-weighted images and applied it to the analysis of developmental clinical data using the automated processing pipeline FreeSurfer. This hands-on workflow and quality control rating system will aid researchers in minimizing motion artifacts in the final data set, and therefore enhance the quality of structural magnetic resonance imaging studies. PMID:27999528
Shukla, Nagesh; Keast, John E; Ceglarek, Darek
2014-10-01
The modelling of complex workflows is an important problem-solving technique within healthcare settings. However, currently most of the workflow models use a simplified flow chart of patient flow obtained using on-site observations, group-based debates and brainstorming sessions, together with historic patient data. This paper presents a systematic and semi-automatic methodology for knowledge acquisition with detailed process representation using sequential interviews of people in the key roles involved in the service delivery process. The proposed methodology allows the modelling of roles, interactions, actions, and decisions involved in the service delivery process. This approach is based on protocol generation and analysis techniques such as: (i) initial protocol generation based on qualitative interviews of radiology staff, (ii) extraction of key features of the service delivery process, (iii) discovering the relationships among the key features extracted, and, (iv) a graphical representation of the final structured model of the service delivery process. The methodology is demonstrated through a case study of a magnetic resonance (MR) scanning service-delivery process in the radiology department of a large hospital. A set of guidelines is also presented in this paper to visually analyze the resulting process model for identifying process vulnerabilities. A comparative analysis of different workflow models is also conducted. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
Workflow management systems in radiology
NASA Astrophysics Data System (ADS)
Wendler, Thomas; Meetz, Kirsten; Schmidt, Joachim
1998-07-01
In a situation of shrinking health care budgets, increasing cost pressure and growing demands to increase the efficiency and the quality of medical services, health care enterprises are forced to optimize or complete re-design their processes. Although information technology is agreed to potentially contribute to cost reduction and efficiency improvement, the real success factors are the re-definition and automation of processes: Business Process Re-engineering and Workflow Management. In this paper we discuss architectures for the use of workflow management systems in radiology. We propose to move forward from information systems in radiology (RIS, PACS) to Radiology Management Systems, in which workflow functionality (process definitions and process automation) is implemented through autonomous workflow management systems (WfMS). In a workflow oriented architecture, an autonomous workflow enactment service communicates with workflow client applications via standardized interfaces. In this paper, we discuss the need for and the benefits of such an approach. The separation of workflow management system and application systems is emphasized, and the consequences that arise for the architecture of workflow oriented information systems. This includes an appropriate workflow terminology, and the definition of standard interfaces for workflow aware application systems. Workflow studies in various institutions have shown that most of the processes in radiology are well structured and suited for a workflow management approach. Numerous commercially available Workflow Management Systems (WfMS) were investigated, and some of them, which are process- oriented and application independent, appear suitable for use in radiology.
Everware toolkit. Supporting reproducible science and challenge-driven education.
NASA Astrophysics Data System (ADS)
Ustyuzhanin, A.; Head, T.; Babuschkin, I.; Tiunov, A.
2017-10-01
Modern science clearly demands for a higher level of reproducibility and collaboration. To make research fully reproducible one has to take care of several aspects: research protocol description, data access, environment preservation, workflow pipeline, and analysis script preservation. Version control systems like git help with the workflow and analysis scripts part. Virtualization techniques like Docker or Vagrant can help deal with environments. Jupyter notebooks are a powerful platform for conducting research in a collaborative manner. We present project Everware that seamlessly integrates git repository management systems such as Github or Gitlab, Docker and Jupyter helping with a) sharing results of real research and b) boosts education activities. With the help of Everware one can not only share the final artifacts of research but all the depth of the research process. This been shown to be extremely helpful during organization of several data analysis hackathons and machine learning schools. Using Everware participants could start from an existing solution instead of starting from scratch. They could start contributing immediately. Everware allows its users to make use of their own computational resources to run the workflows they are interested in, which leads to higher scalability of the toolkit.
A framework for streamlining research workflow in neuroscience and psychology
Kubilius, Jonas
2014-01-01
Successful accumulation of knowledge is critically dependent on the ability to verify and replicate every part of scientific conduct. However, such principles are difficult to enact when researchers continue to resort on ad-hoc workflows and with poorly maintained code base. In this paper I examine the needs of neuroscience and psychology community, and introduce psychopy_ext, a unifying framework that seamlessly integrates popular experiment building, analysis and manuscript preparation tools by choosing reasonable defaults and implementing relatively rigid patterns of workflow. This structure allows for automation of multiple tasks, such as generated user interfaces, unit testing, control analyses of stimuli, single-command access to descriptive statistics, and publication quality plotting. Taken together, psychopy_ext opens an exciting possibility for a faster, more robust code development and collaboration for researchers. PMID:24478691
SECIMTools: a suite of metabolomics data analysis tools.
Kirpich, Alexander S; Ibarra, Miguel; Moskalenko, Oleksandr; Fear, Justin M; Gerken, Joseph; Mi, Xinlei; Ashrafi, Ali; Morse, Alison M; McIntyre, Lauren M
2018-04-20
Metabolomics has the promise to transform the area of personalized medicine with the rapid development of high throughput technology for untargeted analysis of metabolites. Open access, easy to use, analytic tools that are broadly accessible to the biological community need to be developed. While technology used in metabolomics varies, most metabolomics studies have a set of features identified. Galaxy is an open access platform that enables scientists at all levels to interact with big data. Galaxy promotes reproducibility by saving histories and enabling the sharing workflows among scientists. SECIMTools (SouthEast Center for Integrated Metabolomics) is a set of Python applications that are available both as standalone tools and wrapped for use in Galaxy. The suite includes a comprehensive set of quality control metrics (retention time window evaluation and various peak evaluation tools), visualization techniques (hierarchical cluster heatmap, principal component analysis, modular modularity clustering), basic statistical analysis methods (partial least squares - discriminant analysis, analysis of variance, t-test, Kruskal-Wallis non-parametric test), advanced classification methods (random forest, support vector machines), and advanced variable selection tools (least absolute shrinkage and selection operator LASSO and Elastic Net). SECIMTools leverages the Galaxy platform and enables integrated workflows for metabolomics data analysis made from building blocks designed for easy use and interpretability. Standard data formats and a set of utilities allow arbitrary linkages between tools to encourage novel workflow designs. The Galaxy framework enables future data integration for metabolomics studies with other omics data.
Kwf-Grid workflow management system for Earth science applications
NASA Astrophysics Data System (ADS)
Tran, V.; Hluchy, L.
2009-04-01
In this paper, we present workflow management tool for Earth science applications in EGEE. The workflow management tool was originally developed within K-wf Grid project for GT4 middleware and has many advanced features like semi-automatic workflow composition, user-friendly GUI for managing workflows, knowledge management. In EGEE, we are porting the workflow management tool to gLite middleware for Earth science applications K-wf Grid workflow management system was developed within "Knowledge-based Workflow System for Grid Applications" under the 6th Framework Programme. The workflow mangement system intended to - semi-automatically compose a workflow of Grid services, - execute the composed workflow application in a Grid computing environment, - monitor the performance of the Grid infrastructure and the Grid applications, - analyze the resulting monitoring information, - capture the knowledge that is contained in the information by means of intelligent agents, - and finally to reuse the joined knowledge gathered from all participating users in a collaborative way in order to efficiently construct workflows for new Grid applications. Kwf Grid workflow engines can support different types of jobs (e.g. GRAM job, web services) in a workflow. New class of gLite job has been added to the system, allows system to manage and execute gLite jobs in EGEE infrastructure. The GUI has been adapted to the requirements of EGEE users, new credential management servlet is added to portal. Porting K-wf Grid workflow management system to gLite would allow EGEE users to use the system and benefit from its avanced features. The system is primarly tested and evaluated with applications from ES clusters.
Integrating the Allen Brain Institute Cell Types Database into Automated Neuroscience Workflow.
Stockton, David B; Santamaria, Fidel
2017-10-01
We developed software tools to download, extract features, and organize the Cell Types Database from the Allen Brain Institute (ABI) in order to integrate its whole cell patch clamp characterization data into the automated modeling/data analysis cycle. To expand the potential user base we employed both Python and MATLAB. The basic set of tools downloads selected raw data and extracts cell, sweep, and spike features, using ABI's feature extraction code. To facilitate data manipulation we added a tool to build a local specialized database of raw data plus extracted features. Finally, to maximize automation, we extended our NeuroManager workflow automation suite to include these tools plus a separate investigation database. The extended suite allows the user to integrate ABI experimental and modeling data into an automated workflow deployed on heterogeneous computer infrastructures, from local servers, to high performance computing environments, to the cloud. Since our approach is focused on workflow procedures our tools can be modified to interact with the increasing number of neuroscience databases being developed to cover all scales and properties of the nervous system.
Jacobs, Peter L; Ridder, Lars; Ruijken, Marco; Rosing, Hilde; Jager, Nynke Gl; Beijnen, Jos H; Bas, Richard R; van Dongen, William D
2013-09-01
Comprehensive identification of human drug metabolites in first-in-man studies is crucial to avoid delays in later stages of drug development. We developed an efficient workflow for systematic identification of human metabolites in plasma or serum that combines metabolite prediction, high-resolution accurate mass LC-MS and MS vendor independent data processing. Retrospective evaluation of predictions for 14 (14)C-ADME studies published in the period 2007-January 2012 indicates that on average 90% of the major metabolites in human plasma can be identified by searching for accurate masses of predicted metabolites. Furthermore, the workflow can identify unexpected metabolites in the same processing run, by differential analysis of samples of drug-dosed subjects and (placebo-dosed, pre-dose or otherwise blank) control samples. To demonstrate the utility of the workflow we applied it to identify tamoxifen metabolites in serum of a breast cancer patient treated with tamoxifen. Previously published metabolites were confirmed in this study and additional metabolites were identified, two of which are discussed to illustrate the advantages of the workflow.
Berthold, Michael R.; Hedrick, Michael P.; Gilson, Michael K.
2015-01-01
Today’s large, public databases of protein–small molecule interaction data are creating important new opportunities for data mining and integration. At the same time, new graphical user interface-based workflow tools offer facile alternatives to custom scripting for informatics and data analysis. Here, we illustrate how the large protein-ligand database BindingDB may be incorporated into KNIME workflows as a step toward the integration of pharmacological data with broader biomolecular analyses. Thus, we describe a collection of KNIME workflows that access BindingDB data via RESTful webservices and, for more intensive queries, via a local distillation of the full BindingDB dataset. We focus in particular on the KNIME implementation of knowledge-based tools to generate informed hypotheses regarding protein targets of bioactive compounds, based on notions of chemical similarity. A number of variants of this basic approach are tested for seven existing drugs with relatively ill-defined therapeutic targets, leading to replication of some previously confirmed results and discovery of new, high-quality hits. Implications for future development are discussed. Database URL: www.bindingdb.org PMID:26384374
Development of HEATHER for cochlear implant stimulation using a new modeling workflow.
Tran, Phillip; Sue, Andrian; Wong, Paul; Li, Qing; Carter, Paul
2015-02-01
The current conduction pathways resulting from monopolar stimulation of the cochlear implant were studied by developing a human electroanatomical total head reconstruction (namely, HEATHER). HEATHER was created from serially sectioned images of the female Visible Human Project dataset to encompass a total of 12 different tissues, and included computer-aided design geometries of the cochlear implant. Since existing methods were unable to generate the required complexity for HEATHER, a new modeling workflow was proposed. The results of the finite-element analysis agree with the literature, showing that the injected current exits the cochlea via the modiolus (14%), the basal end of the cochlea (22%), and through the cochlear walls (64%). It was also found that, once leaving the cochlea, the current travels to the implant body via the cranial cavity or scalp. The modeling workflow proved to be robust and flexible, allowing for meshes to be generated with substantial user control. Furthermore, the workflow could easily be employed to create realistic anatomical models of the human head for different bioelectric applications, such as deep brain stimulation, electroencephalography, and other biophysical phenomena.
Lerner, Thomas R.; Burden, Jemima J.; Nkwe, David O.; Pelchen-Matthews, Annegret; Domart, Marie-Charlotte; Durgan, Joanne; Weston, Anne; Jones, Martin L.; Peddie, Christopher J.; Carzaniga, Raffaella; Florey, Oliver; Marsh, Mark; Gutierrez, Maximiliano G.
2017-01-01
ABSTRACT The processes of life take place in multiple dimensions, but imaging these processes in even three dimensions is challenging. Here, we describe a workflow for 3D correlative light and electron microscopy (CLEM) of cell monolayers using fluorescence microscopy to identify and follow biological events, combined with serial blockface scanning electron microscopy to analyse the underlying ultrastructure. The workflow encompasses all steps from cell culture to sample processing, imaging strategy, and 3D image processing and analysis. We demonstrate successful application of the workflow to three studies, each aiming to better understand complex and dynamic biological processes, including bacterial and viral infections of cultured cells and formation of entotic cell-in-cell structures commonly observed in tumours. Our workflow revealed new insight into the replicative niche of Mycobacterium tuberculosis in primary human lymphatic endothelial cells, HIV-1 in human monocyte-derived macrophages, and the composition of the entotic vacuole. The broad application of this 3D CLEM technique will make it a useful addition to the correlative imaging toolbox for biomedical research. PMID:27445312
NASA Astrophysics Data System (ADS)
Niederau, Jan; Wellmann, Florian; Maersch, Jannik; Urai, Janos
2017-04-01
Programming is increasingly recognised an important skill for geoscientists - however, the hurdle to jump into programming for students with little or no experience can be high. We present here teaching concepts on the basis of Jupyter notebooks that combine, in an intuitive way, formatted instruction text with code cells in a single environment. This integration allows for an exposure to programming on several levels: from a complete interactive presentation of content, where students require no or very limited programming experience, to highly complex geoscientific computations. We consider these notebooks therefore as an ideal medium to present computational content to students in the field of geosciences. We show here how we use these notebooks to develop digital documents in Python for undergrad-students, who can then learn about basic concepts in structural geology via self-assessment. Such notebooks comprise concepts such as: stress tensor, strain ellipse, or the mohr circle. Students can interactively change parameters, e.g. by using sliders and immediately see the results. They can further experiment and extend the notebook by writing their own code within the notebook. Jupyter Notebooks for teaching purposes can be provided ready-to-use via online services. That is, students do not need to install additional software on their devices in order to work with the notebooks. We also use Jupyter Notebooks for automatic grading of programming assignments in multiple lectures. An implemented workflow facilitates the generation, distribution of assignments, as well as the final grading. Compared to previous grading methods with a high percentage of repetitive manual grading, the implemented workflow proves to be much more time efficient.
A reliable computational workflow for the selection of optimal screening libraries.
Gilad, Yocheved; Nadassy, Katalin; Senderowitz, Hanoch
2015-01-01
The experimental screening of compound collections is a common starting point in many drug discovery projects. Successes of such screening campaigns critically depend on the quality of the screened library. Many libraries are currently available from different vendors yet the selection of the optimal screening library for a specific project is challenging. We have devised a novel workflow for the rational selection of project-specific screening libraries. The workflow accepts as input a set of virtual candidate libraries and applies the following steps to each library: (1) data curation; (2) assessment of ADME/T profile; (3) assessment of the number of promiscuous binders/frequent HTS hitters; (4) assessment of internal diversity; (5) assessment of similarity to known active compound(s) (optional); (6) assessment of similarity to in-house or otherwise accessible compound collections (optional). For ADME/T profiling, Lipinski's and Veber's rule-based filters were implemented and a new blood brain barrier permeation model was developed and validated (85 and 74 % success rate for training set and test set, respectively). Diversity and similarity descriptors which demonstrated best performances in terms of their ability to select either diverse or focused sets of compounds from three databases (Drug Bank, CMC and CHEMBL) were identified and used for diversity and similarity assessments. The workflow was used to analyze nine common screening libraries available from six vendors. The results of this analysis are reported for each library providing an assessment of its quality. Furthermore, a consensus approach was developed to combine the results of these analyses into a single score for selecting the optimal library under different scenarios. We have devised and tested a new workflow for the rational selection of screening libraries under different scenarios. The current workflow was implemented using the Pipeline Pilot software yet due to the usage of generic components, it can be easily adapted and reproduced by computational groups interested in rational selection of screening libraries. Furthermore, the workflow could be readily modified to include additional components. This workflow has been routinely used in our laboratory for the selection of libraries in multiple projects and consistently selects libraries which are well balanced across multiple parameters.Graphical abstract.
AutoDrug: fully automated macromolecular crystallography workflows for fragment-based drug discovery
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tsai, Yingssu; Stanford University, 333 Campus Drive, Mudd Building, Stanford, CA 94305-5080; McPhillips, Scott E.
New software has been developed for automating the experimental and data-processing stages of fragment-based drug discovery at a macromolecular crystallography beamline. A new workflow-automation framework orchestrates beamline-control and data-analysis software while organizing results from multiple samples. AutoDrug is software based upon the scientific workflow paradigm that integrates the Stanford Synchrotron Radiation Lightsource macromolecular crystallography beamlines and third-party processing software to automate the crystallography steps of the fragment-based drug-discovery process. AutoDrug screens a cassette of fragment-soaked crystals, selects crystals for data collection based on screening results and user-specified criteria and determines optimal data-collection strategies. It then collects and processes diffraction data,more » performs molecular replacement using provided models and detects electron density that is likely to arise from bound fragments. All processes are fully automated, i.e. are performed without user interaction or supervision. Samples can be screened in groups corresponding to particular proteins, crystal forms and/or soaking conditions. A single AutoDrug run is only limited by the capacity of the sample-storage dewar at the beamline: currently 288 samples. AutoDrug was developed in conjunction with RestFlow, a new scientific workflow-automation framework. RestFlow simplifies the design of AutoDrug by managing the flow of data and the organization of results and by orchestrating the execution of computational pipeline steps. It also simplifies the execution and interaction of third-party programs and the beamline-control system. Modeling AutoDrug as a scientific workflow enables multiple variants that meet the requirements of different user groups to be developed and supported. A workflow tailored to mimic the crystallography stages comprising the drug-discovery pipeline of CoCrystal Discovery Inc. has been deployed and successfully demonstrated. This workflow was run once on the same 96 samples that the group had examined manually and the workflow cycled successfully through all of the samples, collected data from the same samples that were selected manually and located the same peaks of unmodeled density in the resulting difference Fourier maps.« less
Leveraging workflow control patterns in the domain of clinical practice guidelines.
Kaiser, Katharina; Marcos, Mar
2016-02-10
Clinical practice guidelines (CPGs) include recommendations describing appropriate care for the management of patients with a specific clinical condition. A number of representation languages have been developed to support executable CPGs, with associated authoring/editing tools. Even with tool assistance, authoring of CPG models is a labor-intensive task. We aim at facilitating the early stages of CPG modeling task. In this context, we propose to support the authoring of CPG models based on a set of suitable procedural patterns described in an implementation-independent notation that can be then semi-automatically transformed into one of the alternative executable CPG languages. We have started with the workflow control patterns which have been identified in the fields of workflow systems and business process management. We have analyzed the suitability of these patterns by means of a qualitative analysis of CPG texts. Following our analysis we have implemented a selection of workflow patterns in the Asbru and PROforma CPG languages. As implementation-independent notation for the description of patterns we have chosen BPMN 2.0. Finally, we have developed XSLT transformations to convert the BPMN 2.0 version of the patterns into the Asbru and PROforma languages. We showed that although a significant number of workflow control patterns are suitable to describe CPG procedural knowledge, not all of them are applicable in the context of CPGs due to their focus on single-patient care. Moreover, CPGs may require additional patterns not included in the set of workflow control patterns. We also showed that nearly all the CPG-suitable patterns can be conveniently implemented in the Asbru and PROforma languages. Finally, we demonstrated that individual patterns can be semi-automatically transformed from a process specification in BPMN 2.0 to executable implementations in these languages. We propose a pattern and transformation-based approach for the development of CPG models. Such an approach can form the basis of a valid framework for the authoring of CPG models. The identification of adequate patterns and the implementation of transformations to convert patterns from a process specification into different executable implementations are the first necessary steps for our approach.
Anima: Modular Workflow System for Comprehensive Image Data Analysis
Rantanen, Ville; Valori, Miko; Hautaniemi, Sampsa
2014-01-01
Modern microscopes produce vast amounts of image data, and computational methods are needed to analyze and interpret these data. Furthermore, a single image analysis project may require tens or hundreds of analysis steps starting from data import and pre-processing to segmentation and statistical analysis; and ending with visualization and reporting. To manage such large-scale image data analysis projects, we present here a modular workflow system called Anima. Anima is designed for comprehensive and efficient image data analysis development, and it contains several features that are crucial in high-throughput image data analysis: programing language independence, batch processing, easily customized data processing, interoperability with other software via application programing interfaces, and advanced multivariate statistical analysis. The utility of Anima is shown with two case studies focusing on testing different algorithms developed in different imaging platforms and an automated prediction of alive/dead C. elegans worms by integrating several analysis environments. Anima is a fully open source and available with documentation at www.anduril.org/anima. PMID:25126541
Work Flow Analysis Report Action Tracking
DOE Office of Scientific and Technical Information (OSTI.GOV)
PETERMANN, M.L.
The Work Flow Analysis Report will be used to facilitate the requirements for implementing the further deployment of the Action Tracking module of Passport. The report consists of workflow integration processes for Action Tracking.
Mehdipoor, Hamed; Zurita-Milla, Raul; Rosemartin, Alyssa; Gerst, Katharine L.; Weltzin, Jake F.
2015-01-01
Recent improvements in online information communication and mobile location-aware technologies have led to the production of large volumes of volunteered geographic information. Widespread, large-scale efforts by volunteers to collect data can inform and drive scientific advances in diverse fields, including ecology and climatology. Traditional workflows to check the quality of such volunteered information can be costly and time consuming as they heavily rely on human interventions. However, identifying factors that can influence data quality, such as inconsistency, is crucial when these data are used in modeling and decision-making frameworks. Recently developed workflows use simple statistical approaches that assume that the majority of the information is consistent. However, this assumption is not generalizable, and ignores underlying geographic and environmental contextual variability that may explain apparent inconsistencies. Here we describe an automated workflow to check inconsistency based on the availability of contextual environmental information for sampling locations. The workflow consists of three steps: (1) dimensionality reduction to facilitate further analysis and interpretation of results, (2) model-based clustering to group observations according to their contextual conditions, and (3) identification of inconsistent observations within each cluster. The workflow was applied to volunteered observations of flowering in common and cloned lilac plants (Syringa vulgaris and Syringa x chinensis) in the United States for the period 1980 to 2013. About 97% of the observations for both common and cloned lilacs were flagged as consistent, indicating that volunteers provided reliable information for this case study. Relative to the original dataset, the exclusion of inconsistent observations changed the apparent rate of change in lilac bloom dates by two days per decade, indicating the importance of inconsistency checking as a key step in data quality assessment for volunteered geographic information. Initiatives that leverage volunteered geographic information can adapt this workflow to improve the quality of their datasets and the robustness of their scientific analyses.
SU-D-209-04: Raise Your Table: An Effective Way to Reduce Radiation Dose for Fluoroscopy
DOE Office of Scientific and Technical Information (OSTI.GOV)
Huo, D; Hoerner, M; Toskich, B
2016-06-15
Purpose: Patient table height plays an important role in estimating patient skin dose for interventional radiology (IR) procedures, because the patient’s skin location is dependent on the height of table. Variation in table height can lead to as much as 150% difference in skin dose for patient exams with similar air kerma meter readings. In our facility, IR procedural workflow was recently changed to require the IR physicians to confirm the patient table height before the procedure. The patient table height data was collected before and after this workflow change to validate the implementation of this practice. Methods: Table heightmore » information was analyzed for all procedures performed in three IR rooms, which were impacted by the workflow change, covering three months before and after the change (Aug 2015 to Jan 2016). In total, 442, 425, and 390 procedures were performed in these three rooms over this time period. There were no personnel or procedure assignment changes during the six-month period of time. Statistical analysis was performed for the average table height changes before and after the workflow change. Results: For the three IR rooms investigated, after the workflow change, the average table heights were increased by 1.43 cm (p=0.004084), 0.66 cm (p=0.187089), and 1.59 cm (p=0.002193), providing a corresponding estimated skin dose savings of 6.76%, 2.94% and 7.62%, respectively. After the workflow change, the average table height was increased by 0.95 cm, 0.63 cm, 0.55 cm, 1.07 cm, 1.12 cm, and 3.36 cm for the six physicians who routinely work in these three rooms. Conclusion: Consistent improvement in table height settings has been observed for all IR rooms and all physicians following a simple workflow change. This change has led to significant patient dose savings by making physicians aware of the pre-procedure table position.« less
Modernizing Earth and Space Science Modeling Workflows in the Big Data Era
NASA Astrophysics Data System (ADS)
Kinter, J. L.; Feigelson, E.; Walker, R. J.; Tino, C.
2017-12-01
Modeling is a major aspect of the Earth and space science research. The development of numerical models of the Earth system, planetary systems or astrophysical systems is essential to linking theory with observations. Optimal use of observations that are quite expensive to obtain and maintain typically requires data assimilation that involves numerical models. In the Earth sciences, models of the physical climate system are typically used for data assimilation, climate projection, and inter-disciplinary research, spanning applications from analysis of multi-sensor data sets to decision-making in climate-sensitive sectors with applications to ecosystems, hazards, and various biogeochemical processes. In space physics, most models are from first principles, require considerable expertise to run and are frequently modified significantly for each case study. The volume and variety of model output data from modeling Earth and space systems are rapidly increasing and have reached a scale where human interaction with data is prohibitively inefficient. A major barrier to progress is that modeling workflows isn't deemed by practitioners to be a design problem. Existing workflows have been created by a slow accretion of software, typically based on undocumented, inflexible scripts haphazardly modified by a succession of scientists and students not trained in modern software engineering methods. As a result, existing modeling workflows suffer from an inability to onboard new datasets into models; an inability to keep pace with accelerating data production rates; and irreproducibility, among other problems. These factors are creating an untenable situation for those conducting and supporting Earth system and space science. Improving modeling workflows requires investments in hardware, software and human resources. This paper describes the critical path issues that must be targeted to accelerate modeling workflows, including script modularization, parallelization, and automation in the near term, and longer term investments in virtualized environments for improved scalability, tolerance for lossy data compression, novel data-centric memory and storage technologies, and tools for peer reviewing, preserving and sharing workflows, as well as fundamental statistical and machine learning algorithms.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Harvey, Dustin Yewell
This document is a white paper marketing proposal for Echo™ is a data analysis platform designed for efficient, robust, and scalable creation and execution of complex workflows. Echo’s analysis management system refers to the ability to track, understand, and reproduce workflows used for arriving at results and decisions. Echo improves on traditional scripted data analysis in MATLAB, Python, R, and other languages to allow analysts to make better use of their time. Additionally, the Echo platform provides a powerful data management and curation solution allowing analysts to quickly find, access, and consume datasets. After two years of development and amore » first release in early 2016, Echo is now available for use with many data types in a wide range of application domains. Echo provides tools that allow users to focus on data analysis and decisions with confidence that results are reported accurately.« less
DIANA-microT web server v5.0: service integration into miRNA functional analysis workflows.
Paraskevopoulou, Maria D; Georgakilas, Georgios; Kostoulas, Nikos; Vlachos, Ioannis S; Vergoulis, Thanasis; Reczko, Martin; Filippidis, Christos; Dalamagas, Theodore; Hatzigeorgiou, A G
2013-07-01
MicroRNAs (miRNAs) are small endogenous RNA molecules that regulate gene expression through mRNA degradation and/or translation repression, affecting many biological processes. DIANA-microT web server (http://www.microrna.gr/webServer) is dedicated to miRNA target prediction/functional analysis, and it is being widely used from the scientific community, since its initial launch in 2009. DIANA-microT v5.0, the new version of the microT server, has been significantly enhanced with an improved target prediction algorithm, DIANA-microT-CDS. It has been updated to incorporate miRBase version 18 and Ensembl version 69. The in silico-predicted miRNA-gene interactions in Homo sapiens, Mus musculus, Drosophila melanogaster and Caenorhabditis elegans exceed 11 million in total. The web server was completely redesigned, to host a series of sophisticated workflows, which can be used directly from the on-line web interface, enabling users without the necessary bioinformatics infrastructure to perform advanced multi-step functional miRNA analyses. For instance, one available pipeline performs miRNA target prediction using different thresholds and meta-analysis statistics, followed by pathway enrichment analysis. DIANA-microT web server v5.0 also supports a complete integration with the Taverna Workflow Management System (WMS), using the in-house developed DIANA-Taverna Plug-in. This plug-in provides ready-to-use modules for miRNA target prediction and functional analysis, which can be used to form advanced high-throughput analysis pipelines.
DIANA-microT web server v5.0: service integration into miRNA functional analysis workflows
Paraskevopoulou, Maria D.; Georgakilas, Georgios; Kostoulas, Nikos; Vlachos, Ioannis S.; Vergoulis, Thanasis; Reczko, Martin; Filippidis, Christos; Dalamagas, Theodore; Hatzigeorgiou, A.G.
2013-01-01
MicroRNAs (miRNAs) are small endogenous RNA molecules that regulate gene expression through mRNA degradation and/or translation repression, affecting many biological processes. DIANA-microT web server (http://www.microrna.gr/webServer) is dedicated to miRNA target prediction/functional analysis, and it is being widely used from the scientific community, since its initial launch in 2009. DIANA-microT v5.0, the new version of the microT server, has been significantly enhanced with an improved target prediction algorithm, DIANA-microT-CDS. It has been updated to incorporate miRBase version 18 and Ensembl version 69. The in silico-predicted miRNA–gene interactions in Homo sapiens, Mus musculus, Drosophila melanogaster and Caenorhabditis elegans exceed 11 million in total. The web server was completely redesigned, to host a series of sophisticated workflows, which can be used directly from the on-line web interface, enabling users without the necessary bioinformatics infrastructure to perform advanced multi-step functional miRNA analyses. For instance, one available pipeline performs miRNA target prediction using different thresholds and meta-analysis statistics, followed by pathway enrichment analysis. DIANA-microT web server v5.0 also supports a complete integration with the Taverna Workflow Management System (WMS), using the in-house developed DIANA-Taverna Plug-in. This plug-in provides ready-to-use modules for miRNA target prediction and functional analysis, which can be used to form advanced high-throughput analysis pipelines. PMID:23680784
A case study for cloud based high throughput analysis of NGS data using the globus genomics system
Bhuvaneshwar, Krithika; Sulakhe, Dinanath; Gauba, Robinder; ...
2015-01-01
Next generation sequencing (NGS) technologies produce massive amounts of data requiring a powerful computational infrastructure, high quality bioinformatics software, and skilled personnel to operate the tools. We present a case study of a practical solution to this data management and analysis challenge that simplifies terabyte scale data handling and provides advanced tools for NGS data analysis. These capabilities are implemented using the “Globus Genomics” system, which is an enhanced Galaxy workflow system made available as a service that offers users the capability to process and transfer data easily, reliably and quickly to address end-to-end NGS analysis requirements. The Globus Genomicsmore » system is built on Amazon's cloud computing infrastructure. The system takes advantage of elastic scaling of compute resources to run multiple workflows in parallel and it also helps meet the scale-out analysis needs of modern translational genomics research.« less
A case study for cloud based high throughput analysis of NGS data using the globus genomics system
Bhuvaneshwar, Krithika; Sulakhe, Dinanath; Gauba, Robinder; Rodriguez, Alex; Madduri, Ravi; Dave, Utpal; Lacinski, Lukasz; Foster, Ian; Gusev, Yuriy; Madhavan, Subha
2014-01-01
Next generation sequencing (NGS) technologies produce massive amounts of data requiring a powerful computational infrastructure, high quality bioinformatics software, and skilled personnel to operate the tools. We present a case study of a practical solution to this data management and analysis challenge that simplifies terabyte scale data handling and provides advanced tools for NGS data analysis. These capabilities are implemented using the “Globus Genomics” system, which is an enhanced Galaxy workflow system made available as a service that offers users the capability to process and transfer data easily, reliably and quickly to address end-to-endNGS analysis requirements. The Globus Genomics system is built on Amazon 's cloud computing infrastructure. The system takes advantage of elastic scaling of compute resources to run multiple workflows in parallel and it also helps meet the scale-out analysis needs of modern translational genomics research. PMID:26925205
An image analysis system for near-infrared (NIR) fluorescence lymph imaging
NASA Astrophysics Data System (ADS)
Zhang, Jingdan; Zhou, Shaohua Kevin; Xiang, Xiaoyan; Rasmussen, John C.; Sevick-Muraca, Eva M.
2011-03-01
Quantitative analysis of lymphatic function is crucial for understanding the lymphatic system and diagnosing the associated diseases. Recently, a near-infrared (NIR) fluorescence imaging system is developed for real-time imaging lymphatic propulsion by intradermal injection of microdose of a NIR fluorophore distal to the lymphatics of interest. However, the previous analysis software3, 4 is underdeveloped, requiring extensive time and effort to analyze a NIR image sequence. In this paper, we develop a number of image processing techniques to automate the data analysis workflow, including an object tracking algorithm to stabilize the subject and remove the motion artifacts, an image representation named flow map to characterize lymphatic flow more reliably, and an automatic algorithm to compute lymph velocity and frequency of propulsion. By integrating all these techniques to a system, the analysis workflow significantly reduces the amount of required user interaction and improves the reliability of the measurement.
Metagenomics workflow analysis of endophytic bacteria from oil palm fruits
NASA Astrophysics Data System (ADS)
Tanjung, Z. A.; Aditama, R.; Sudania, W. M.; Utomo, C.; Liwang, T.
2017-05-01
Next-Generation Sequencing (NGS) has become a powerful sequencing tool for microbial study especially to lead the establishment of the field area of metagenomics. This study described a workflow to analyze metagenomics data of a Sequence Read Archive (SRA) file under accession ERP004286 deposited by University of Sao Paulo. It was a direct sequencing data generated by 454 pyrosequencing platform originated from oil palm fruits endophytic bacteria which were cultured using oil-palm enriched medium. This workflow used SortMeRNA to split ribosomal reads sequence, Newbler (GS Assembler and GS Mapper) to assemble and map reads into genome reference, BLAST package to identify and annotate contigs sequence, and QualiMap for statistical analysis. Eight bacterial species were identified in this study. Enterobacter cloacae was the most abundant species followed by Citrobacter koseri, Seratia marcescens, Latococcus lactis subsp. lactis, Klebsiella pneumoniae, Citrobacter amalonaticus, Achromobacter xylosoxidans, and Pseudomonas sp. respectively. All of these species have been reported as endophyte bacteria in various plant species and each has potential as plant growth promoting bacteria or another application in agricultural industries.
Solomon, April D; Hytinen, Madison E; McClain, Aryn M; Miller, Marilyn T; Dawson Cruz, Tracey
2018-01-01
DNA profiles have been obtained from fingerprints, but there is limited knowledge regarding DNA analysis from archived latent fingerprints-touch DNA "sandwiched" between adhesive and paper. Thus, this study sought to comparatively analyze a variety of collection and analytical methods in an effort to seek an optimized workflow for this specific sample type. Untreated and treated archived latent fingerprints were utilized to compare different biological sampling techniques, swab diluents, DNA extraction systems, DNA concentration practices, and post-amplification purification methods. Archived latent fingerprints disassembled and sampled via direct cutting, followed by DNA extracted using the QIAamp® DNA Investigator Kit, and concentration with Centri-Sep™ columns increased the odds of obtaining an STR profile. Using the recommended DNA workflow, 9 of the 10 samples provided STR profiles, which included 7-100% of the expected STR alleles and two full profiles. Thus, with carefully selected procedures, archived latent fingerprints can be a viable DNA source for criminal investigations including cold/postconviction cases. © 2017 American Academy of Forensic Sciences.
Zhou, Mowei; Paša-Tolić, Ljiljana; Stenoien, David L
2017-02-03
As histones play central roles in most chromosomal functions including regulation of DNA replication, DNA damage repair, and gene transcription, both their basic biology and their roles in disease development have been the subject of intense study. Because multiple post-translational modifications (PTMs) along the entire protein sequence are potential regulators of histones, a top-down approach, where intact proteins are analyzed, is ultimately required for complete characterization of proteoforms. However, significant challenges remain for top-down histone analysis primarily because of deficiencies in separation/resolving power and effective identification algorithms. Here we used state-of-the-art mass spectrometry and a bioinformatics workflow for targeted data analysis and visualization. The workflow uses ProMex for intact mass deconvolution, MSPathFinder as a search engine, and LcMsSpectator as a data visualization tool. When complemented with the open-modification tool TopPIC, this workflow enabled identification of novel histone PTMs including tyrosine bromination on histone H4 and H2A, H3 glutathionylation, and mapping of conventional PTMs along the entire protein for many histone subunits.
Installation and Testing of ITER Integrated Modeling and Analysis Suite (IMAS) on DIII-D
NASA Astrophysics Data System (ADS)
Lao, L.; Kostuk, M.; Meneghini, O.; Smith, S.; Staebler, G.; Kalling, R.; Pinches, S.
2017-10-01
A critical objective of the ITER Integrated Modeling Program is the development of IMAS to support ITER plasma operation and research activities. An IMAS framework has been established based on the earlier work carried out within the EU. It consists of a physics data model and a workflow engine. The data model is capable of representing both simulation and experimental data and is applicable to ITER and other devices. IMAS has been successfully installed on a local DIII-D server using a flexible installer capable of managing the core data access tools (Access Layer and Data Dictionary) and optionally the Kepler workflow engine and coupling tools. A general adaptor for OMFIT (a workflow engine) is being built for adaptation of any analysis code to IMAS using a new IMAS universal access layer (UAL) interface developed from an existing OMFIT EU Integrated Tokamak Modeling UAL. Ongoing work includes development of a general adaptor for EFIT and TGLF based on this new UAL that can be readily extended for other physics codes within OMFIT. Work supported by US DOE under DE-FC02-04ER54698.
An Update on Improvements to NiCE Support for PROTEUS
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bennett, Andrew; McCaskey, Alexander J.; Billings, Jay Jay
2015-09-01
The Department of Energy Office of Nuclear Energy's Nuclear Energy Advanced Modeling and Simulation (NEAMS) program has supported the development of the NEAMS Integrated Computational Environment (NiCE), a modeling and simulation workflow environment that provides services and plugins to facilitate tasks such as code execution, model input construction, visualization, and data analysis. This report details the development of workflows for the reactor core neutronics application, PROTEUS. This advanced neutronics application (primarily developed at Argonne National Laboratory) aims to improve nuclear reactor design and analysis by providing an extensible and massively parallel, finite-element solver for current and advanced reactor fuel neutronicsmore » modeling. The integration of PROTEUS-specific tools into NiCE is intended to make the advanced capabilities that PROTEUS provides more accessible to the nuclear energy research and development community. This report will detail the work done to improve existing PROTEUS workflow support in NiCE. We will demonstrate and discuss these improvements, including the development of flexible IO services, an improved interface for input generation, and the addition of advanced Fortran development tools natively in the platform.« less
Haston, Elspeth; Cubey, Robert; Pullan, Martin; Atkins, Hannah; Harris, David J
2012-01-01
Digitisation programmes in many institutes frequently involve disparate and irregular funding, diverse selection criteria and scope, with different members of staff managing and operating the processes. These factors have influenced the decision at the Royal Botanic Garden Edinburgh to develop an integrated workflow for the digitisation of herbarium specimens which is modular and scalable to enable a single overall workflow to be used for all digitisation projects. This integrated workflow is comprised of three principal elements: a specimen workflow, a data workflow and an image workflow.The specimen workflow is strongly linked to curatorial processes which will impact on the prioritisation, selection and preparation of the specimens. The importance of including a conservation element within the digitisation workflow is highlighted. The data workflow includes the concept of three main categories of collection data: label data, curatorial data and supplementary data. It is shown that each category of data has its own properties which influence the timing of data capture within the workflow. Development of software has been carried out for the rapid capture of curatorial data, and optical character recognition (OCR) software is being used to increase the efficiency of capturing label data and supplementary data. The large number and size of the images has necessitated the inclusion of automated systems within the image workflow.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kleese van Dam, Kerstin; Lansing, Carina S.; Elsethagen, Todd O.
2014-01-28
Modern workflow systems enable scientists to run ensemble simulations at unprecedented scales and levels of complexity, allowing them to study system sizes previously impossible to achieve, due to the inherent resource requirements needed for the modeling work. However as a result of these new capabilities the science teams suddenly also face unprecedented data volumes that they are unable to analyze with their existing tools and methodologies in a timely fashion. In this paper we will describe the ongoing development work to create an integrated data intensive scientific workflow and analysis environment that offers researchers the ability to easily create andmore » execute complex simulation studies and provides them with different scalable methods to analyze the resulting data volumes. The integration of simulation and analysis environments is hereby not only a question of ease of use, but supports fundamental functions in the correlated analysis of simulation input, execution details and derived results for multi-variant, complex studies. To this end the team extended and integrated the existing capabilities of the Velo data management and analysis infrastructure, the MeDICi data intensive workflow system and RHIPE the R for Hadoop version of the well-known statistics package, as well as developing a new visual analytics interface for the result exploitation by multi-domain users. The capabilities of the new environment are demonstrated on a use case that focusses on the Pacific Northwest National Laboratory (PNNL) building energy team, showing how they were able to take their previously local scale simulations to a nationwide level by utilizing data intensive computing techniques not only for their modeling work, but also for the subsequent analysis of their modeling results. As part of the PNNL research initiative PRIMA (Platform for Regional Integrated Modeling and Analysis) the team performed an initial 3 year study of building energy demands for the US Eastern Interconnect domain, which they are now planning to extend to predict the demand for the complete century. The initial study raised their data demands from a few GBs to 400GB for the 3year study and expected tens of TBs for the full century.« less
Rapid Assessment of Contaminants and Interferences in Mass Spectrometry Data Using Skyline
NASA Astrophysics Data System (ADS)
Rardin, Matthew J.
2018-04-01
Proper sample preparation in proteomic workflows is essential to the success of modern mass spectrometry experiments. Complex workflows often require reagents which are incompatible with MS analysis (e.g., detergents) necessitating a variety of sample cleanup procedures. Efforts to understand and mitigate sample contamination are a continual source of disruption with respect to both time and resources. To improve the ability to rapidly assess sample contamination from a diverse array of sources, I developed a molecular library in Skyline for rapid extraction of contaminant precursor signals using MS1 filtering. This contaminant template library is easily managed and can be modified for a diverse array of mass spectrometry sample preparation workflows. Utilization of this template allows rapid assessment of sample integrity and indicates potential sources of contamination. [Figure not available: see fulltext.
RESTFul based heterogeneous Geoprocessing workflow interoperation for Sensor Web Service
NASA Astrophysics Data System (ADS)
Yang, Chao; Chen, Nengcheng; Di, Liping
2012-10-01
Advanced sensors on board satellites offer detailed Earth observations. A workflow is one approach for designing, implementing and constructing a flexible and live link between these sensors' resources and users. It can coordinate, organize and aggregate the distributed sensor Web services to meet the requirement of a complex Earth observation scenario. A RESTFul based workflow interoperation method is proposed to integrate heterogeneous workflows into an interoperable unit. The Atom protocols are applied to describe and manage workflow resources. The XML Process Definition Language (XPDL) and Business Process Execution Language (BPEL) workflow standards are applied to structure a workflow that accesses sensor information and one that processes it separately. Then, a scenario for nitrogen dioxide (NO2) from a volcanic eruption is used to investigate the feasibility of the proposed method. The RESTFul based workflows interoperation system can describe, publish, discover, access and coordinate heterogeneous Geoprocessing workflows.
Cytoscape: the network visualization tool for GenomeSpace workflows.
Demchak, Barry; Hull, Tim; Reich, Michael; Liefeld, Ted; Smoot, Michael; Ideker, Trey; Mesirov, Jill P
2014-01-01
Modern genomic analysis often requires workflows incorporating multiple best-of-breed tools. GenomeSpace is a web-based visual workbench that combines a selection of these tools with mechanisms that create data flows between them. One such tool is Cytoscape 3, a popular application that enables analysis and visualization of graph-oriented genomic networks. As Cytoscape runs on the desktop, and not in a web browser, integrating it into GenomeSpace required special care in creating a seamless user experience and enabling appropriate data flows. In this paper, we present the design and operation of the Cytoscape GenomeSpace app, which accomplishes this integration, thereby providing critical analysis and visualization functionality for GenomeSpace users. It has been downloaded over 850 times since the release of its first version in September, 2013.
Cytoscape: the network visualization tool for GenomeSpace workflows
Demchak, Barry; Hull, Tim; Reich, Michael; Liefeld, Ted; Smoot, Michael; Ideker, Trey; Mesirov, Jill P.
2014-01-01
Modern genomic analysis often requires workflows incorporating multiple best-of-breed tools. GenomeSpace is a web-based visual workbench that combines a selection of these tools with mechanisms that create data flows between them. One such tool is Cytoscape 3, a popular application that enables analysis and visualization of graph-oriented genomic networks. As Cytoscape runs on the desktop, and not in a web browser, integrating it into GenomeSpace required special care in creating a seamless user experience and enabling appropriate data flows. In this paper, we present the design and operation of the Cytoscape GenomeSpace app, which accomplishes this integration, thereby providing critical analysis and visualization functionality for GenomeSpace users. It has been downloaded over 850 times since the release of its first version in September, 2013. PMID:25165537
Focus: a robust workflow for one-dimensional NMR spectral analysis.
Alonso, Arnald; Rodríguez, Miguel A; Vinaixa, Maria; Tortosa, Raül; Correig, Xavier; Julià, Antonio; Marsal, Sara
2014-01-21
One-dimensional (1)H NMR represents one of the most commonly used analytical techniques in metabolomic studies. The increase in the number of samples analyzed as well as the technical improvements involving instrumentation and spectral acquisition demand increasingly accurate and efficient high-throughput data processing workflows. We present FOCUS, an integrated and innovative methodology that provides a complete data analysis workflow for one-dimensional NMR-based metabolomics. This tool will allow users to easily obtain a NMR peak feature matrix ready for chemometric analysis as well as metabolite identification scores for each peak that greatly simplify the biological interpretation of the results. The algorithm development has been focused on solving the critical difficulties that appear at each data processing step and that can dramatically affect the quality of the results. As well as method integration, simplicity has been one of the main objectives in FOCUS development, requiring very little user input to perform accurate peak alignment, peak picking, and metabolite identification. The new spectral alignment algorithm, RUNAS, allows peak alignment with no need of a reference spectrum, and therefore, it reduces the bias introduced by other alignment approaches. Spectral alignment has been tested against previous methodologies obtaining substantial improvements in the case of moderate or highly unaligned spectra. Metabolite identification has also been significantly improved, using the positional and correlation peak patterns in contrast to a reference metabolite panel. Furthermore, the complete workflow has been tested using NMR data sets from 60 human urine samples and 120 aqueous liver extracts, reaching a successful identification of 42 metabolites from the two data sets. The open-source software implementation of this methodology is available at http://www.urr.cat/FOCUS.
Using telephony data to facilitate discovery of clinical workflows.
Rucker, Donald W
2017-04-19
Discovery of clinical workflows to target for redesign using methods such as Lean and Six Sigma is difficult. VoIP telephone call pattern analysis may complement direct observation and EMR-based tools in understanding clinical workflows at the enterprise level by allowing visualization of institutional telecommunications activity. To build an analytic framework mapping repetitive and high-volume telephone call patterns in a large medical center to their associated clinical units using an enterprise unified communications server log file and to support visualization of specific call patterns using graphical networks. Consecutive call detail records from the medical center's unified communications server were parsed to cross-correlate telephone call patterns and map associated phone numbers to a cost center dictionary. Hashed data structures were built to allow construction of edge and node files representing high volume call patterns for display with an open source graph network tool. Summary statistics for an analysis of exactly one week's call detail records at a large academic medical center showed that 912,386 calls were placed with a total duration of 23,186 hours. Approximately half of all calling called number pairs had an average call duration under 60 seconds and of these the average call duration was 27 seconds. Cross-correlation of phone calls identified by clinical cost center can be used to generate graphical displays of clinical enterprise communications. Many calls are short. The compact data transfers within short calls may serve as automation or re-design targets. The large absolute amount of time medical center employees were engaged in VoIP telecommunications suggests that analysis of telephone call patterns may offer additional insights into core clinical workflows.
Visualizing the Big (and Large) Data from an HPC Resource
NASA Astrophysics Data System (ADS)
Sisneros, R.
2015-10-01
Supercomputers are built to endure painfully large simulations and contend with resulting outputs. These are characteristics that scientists are all too willing to test the limits of in their quest for science at scale. The data generated during a scientist's workflow through an HPC center (large data) is the primary target for analysis and visualization. However, the hardware itself is also capable of generating volumes of diagnostic data (big data); this presents compelling opportunities to deploy analogous analytic techniques. In this paper we will provide a survey of some of the many ways in which visualization and analysis may be crammed into the scientific workflow as well as utilized on machine-specific data.
Kroon, Frederieke; Motti, Cherie; Talbot, Sam; Sobral, Paula; Puotinen, Marji
2018-07-01
Plastic pollution is ubiquitous throughout the marine environment, with microplastic (i.e. <5 mm) contamination a global issue of emerging concern. The lack of universally accepted methods for quantifying microplastic contamination, including consistent application of microscopy, photography, an spectroscopy and photography, may result in unrealistic contamination estimates. Here, we present and apply an analysis workflow tailored to quantifying microplastic contamination in marine waters, incorporating stereomicroscopic visual sorting, microscopic photography and attenuated total reflectance (ATR) Fourier transform infrared (FTIR) spectroscopy. The workflow outlines step-by-step processing and associated decision making, thereby reducing bias in plastic identification and improving confidence in contamination estimates. Specific processing steps include (i) the use of a commercial algorithm-based comparison of particle spectra against an extensive commercially curated spectral library, followed by spectral interpretation to establish the chemical composition, (ii) a comparison against a customised contaminant spectral library to eliminate procedural contaminants, and (iii) final assignment of particles as either natural- or anthropogenic-derived materials, based on chemical type, a compare analysis of each particle against other particle spectra, and physical characteristics of particles. Applying this workflow to 54 tow samples collected in marine waters of North-Western Australia visually identified 248 potential anthropogenic particles. Subsequent ATR-FTIR spectroscopy, chemical assignment and visual re-inspection of photographs established 144 (58%) particles to be of anthropogenic origin. Of the original 248 particles, 97 (39%) were ultimately confirmed to be plastics, with 85 of these (34%) classified as microplastics, demonstrating that over 60% of particles may be misidentified as plastics if visual identification is not complemented by spectroscopy. Combined, this tailored analysis workflow outlines a consistent and sequential process to quantify contamination by microplastics and other anthropogenic microparticles in marine waters. Importantly, its application will contribute to more realistic estimates of microplastic contamination in marine waters, informing both ecological risk assessments and experimental concentrations in effect studies. Copyright © 2018 Australian Institute of Marine Science. Published by Elsevier Ltd.. All rights reserved.
Palmblad, Magnus; Torvik, Vetle I
2017-01-01
Tropical medicine appeared as a distinct sub-discipline in the late nineteenth century, during a period of rapid European colonial expansion in Africa and Asia. After a dramatic drop after World War II, research on tropical diseases have received more attention and research funding in the twenty-first century. We used Apache Taverna to integrate Europe PMC and MapAffil web services, containing the spatiotemporal analysis workflow from a list of PubMed queries to a list of publication years and author affiliations geoparsed to latitudes and longitudes. The results could then be visualized in the Quantum Geographic Information System (QGIS). Our workflows automatically matched 253,277 affiliations to geographical coordinates for the first authors of 379,728 papers on tropical diseases in a single execution. The bibliometric analyses show how research output in tropical diseases follow major historical shifts in the twentieth century and renewed interest in and funding for tropical disease research in the twenty-first century. They show the effects of disease outbreaks, WHO eradication programs, vaccine developments, wars, refugee migrations, and peace treaties. Literature search and geoparsing web services can be combined in scientific workflows performing a complete spatiotemporal bibliometric analyses of research in tropical medicine. The workflows and datasets are freely available and can be used to reproduce or refine the analyses and test specific hypotheses or look into particular diseases or geographic regions. This work exceeds all previously published bibliometric analyses on tropical diseases in both scale and spatiotemporal range.
Post, Harm; Penning, Renske; Fitzpatrick, Martin A; Garrigues, Luc B; Wu, W; MacGillavry, Harold D; Hoogenraad, Casper C; Heck, Albert J R; Altelaar, A F Maarten
2017-02-03
Because of the low stoichiometry of protein phosphorylation, targeted enrichment prior to LC-MS/MS analysis is still essential. The trend in phosphoproteome analysis is shifting toward an increasing number of biological replicates per experiment, ideally starting from very low sample amounts, placing new demands on enrichment protocols to make them less labor-intensive, more sensitive, and less prone to variability. Here we assessed an automated enrichment protocol using Fe(III)-IMAC cartridges on an AssayMAP Bravo platform to meet these demands. The automated Fe(III)-IMAC-based enrichment workflow proved to be more effective when compared to a TiO 2 -based enrichment using the same platform and a manual Ti(IV)-IMAC-based enrichment workflow. As initial samples, a dilution series of both human HeLa cell and primary rat hippocampal neuron lysates was used, going down to 0.1 μg of peptide starting material. The optimized workflow proved to be efficient, sensitive, and reproducible, identifying, localizing, and quantifying thousands of phosphosites from just micrograms of starting material. To further test the automated workflow in genuine biological applications, we monitored EGF-induced signaling in hippocampal neurons, starting with only 200 000 primary cells, resulting in ∼50 μg of protein material. This revealed a comprehensive phosphoproteome, showing regulation of multiple members of the MAPK pathway and reduced phosphorylation status of two glutamate receptors involved in synaptic plasticity.
Preclinical Feasibility of a Technology Framework for MRI-guided Iliac Angioplasty
Rube, Martin A.; Fernandez-Gutierrez, Fabiola; Cox, Benjamin F.; Holbrook, Andrew B.; Houston, J. Graeme; White, Richard D.; McLeod, Helen; Fatahi, Mahsa; Melzer, Andreas
2015-01-01
Purpose Interventional MRI has significant potential for image guidance of iliac angioplasty and related vascular procedures. A technology framework with in-room image display, control, communication and MRI-guided intervention techniques was designed and tested for its potential to provide safe, fast and efficient MRI-guided angioplasty of the iliac arteries. Methods A 1.5T MRI scanner was adapted for interactive imaging during endovascular procedures using new or modified interventional devices such as guidewires and catheters. A perfused vascular phantom was used for testing. Pre-, intra- and post-procedural visualization and measurement of vascular morphology and flow was implemented. A detailed analysis of X-Ray fluoroscopic angiography workflow was conducted and applied. Two interventional radiologists and one physician in training performed 39 procedures. All procedures were timed and analyzed. Results MRI-guided iliac angioplasty procedures were successfully performed with progressive adaptation of techniques and workflow. The workflow, setup and protocol enabled a reduction in table time for a dedicated MRI-guided procedure to 6 min 33 s with a mean procedure time of 9 min 2 s, comparable to the mean procedure time of 8 min 42 s for the standard X-Ray guided procedure. Conclusions MRI-guided iliac vascular interventions were found to be feasible and practical using this framework and optimized workflow. In particular the real-time flow analysis was found to be helpful for pre- and post-interventional assessments. Design optimization of the catheters and in vivo experiments are required before clinical evaluation. PMID:25102933
Haston, Elspeth; Cubey, Robert; Pullan, Martin; Atkins, Hannah; Harris, David J
2012-01-01
Abstract Digitisation programmes in many institutes frequently involve disparate and irregular funding, diverse selection criteria and scope, with different members of staff managing and operating the processes. These factors have influenced the decision at the Royal Botanic Garden Edinburgh to develop an integrated workflow for the digitisation of herbarium specimens which is modular and scalable to enable a single overall workflow to be used for all digitisation projects. This integrated workflow is comprised of three principal elements: a specimen workflow, a data workflow and an image workflow. The specimen workflow is strongly linked to curatorial processes which will impact on the prioritisation, selection and preparation of the specimens. The importance of including a conservation element within the digitisation workflow is highlighted. The data workflow includes the concept of three main categories of collection data: label data, curatorial data and supplementary data. It is shown that each category of data has its own properties which influence the timing of data capture within the workflow. Development of software has been carried out for the rapid capture of curatorial data, and optical character recognition (OCR) software is being used to increase the efficiency of capturing label data and supplementary data. The large number and size of the images has necessitated the inclusion of automated systems within the image workflow. PMID:22859881
Assembling Large, Multi-Sensor Climate Datasets Using the SciFlo Grid Workflow System
NASA Astrophysics Data System (ADS)
Wilson, B. D.; Manipon, G.; Xing, Z.; Fetzer, E.
2008-12-01
NASA's Earth Observing System (EOS) is the world's most ambitious facility for studying global climate change. The mandate now is to combine measurements from the instruments on the A-Train platforms (AIRS, AMSR-E, MODIS, MISR, MLS, and CloudSat) and other Earth probes to enable large-scale studies of climate change over periods of years to decades. However, moving from predominantly single-instrument studies to a multi-sensor, measurement-based model for long-duration analysis of important climate variables presents serious challenges for large-scale data mining and data fusion. For example, one might want to compare temperature and water vapor retrievals from one instrument (AIRS) to another instrument (MODIS), and to a model (ECMWF), stratify the comparisons using a classification of the cloud scenes from CloudSat, and repeat the entire analysis over years of AIRS data. To perform such an analysis, one must discover & access multiple datasets from remote sites, find the space/time matchups between instruments swaths and model grids, understand the quality flags and uncertainties for retrieved physical variables, and assemble merged datasets for further scientific and statistical analysis. To meet these large-scale challenges, we are utilizing a Grid computing and dataflow framework, named SciFlo, in which we are deploying a set of versatile and reusable operators for data query, access, subsetting, co-registration, mining, fusion, and advanced statistical analysis. SciFlo is a semantically-enabled ("smart") Grid Workflow system that ties together a peer-to-peer network of computers into an efficient engine for distributed computation. The SciFlo workflow engine enables scientists to do multi-instrument Earth Science by assembling remotely-invokable Web Services (SOAP or http GET URLs), native executables, command-line scripts, and Python codes into a distributed computing flow. A scientist visually authors the graph of operation in the VizFlow GUI, or uses a text editor to modify the simple XML workflow documents. The SciFlo client & server engines optimize the execution of such distributed workflows and allow the user to transparently find and use datasets and operators without worrying about the actual location of the Grid resources. The engine transparently moves data to the operators, and moves operators to the data (on the dozen trusted SciFlo nodes). SciFlo also deploys a variety of Data Grid services to: query datasets in space and time, locate & retrieve on-line data granules, provide on-the-fly variable and spatial subsetting, and perform pairwise instrument matchups for A-Train datasets. These services are combined into efficient workflows to assemble the desired large-scale, merged climate datasets. SciFlo is currently being applied in several large climate studies: comparisons of aerosol optical depth between MODIS, MISR, AERONET ground network, and U. Michigan's IMPACT aerosol transport model; characterization of long-term biases in microwave and infrared instruments (AIRS, MLS) by comparisons to GPS temperature retrievals accurate to 0.1 degrees Kelvin; and construction of a decade-long, multi-sensor water vapor climatology stratified by classified cloud scene by bringing together datasets from AIRS/AMSU, AMSR-E, MLS, MODIS, and CloudSat (NASA MEASUREs grant, Fetzer PI). The presentation will discuss the SciFlo technologies, their application in these distributed workflows, and the many challenges encountered in assembling and analyzing these massive datasets.
Assembling Large, Multi-Sensor Climate Datasets Using the SciFlo Grid Workflow System
NASA Astrophysics Data System (ADS)
Wilson, B.; Manipon, G.; Xing, Z.; Fetzer, E.
2009-04-01
NASA's Earth Observing System (EOS) is an ambitious facility for studying global climate change. The mandate now is to combine measurements from the instruments on the "A-Train" platforms (AIRS, AMSR-E, MODIS, MISR, MLS, and CloudSat) and other Earth probes to enable large-scale studies of climate change over periods of years to decades. However, moving from predominantly single-instrument studies to a multi-sensor, measurement-based model for long-duration analysis of important climate variables presents serious challenges for large-scale data mining and data fusion. For example, one might want to compare temperature and water vapor retrievals from one instrument (AIRS) to another instrument (MODIS), and to a model (ECMWF), stratify the comparisons using a classification of the "cloud scenes" from CloudSat, and repeat the entire analysis over years of AIRS data. To perform such an analysis, one must discover & access multiple datasets from remote sites, find the space/time "matchups" between instruments swaths and model grids, understand the quality flags and uncertainties for retrieved physical variables, assemble merged datasets, and compute fused products for further scientific and statistical analysis. To meet these large-scale challenges, we are utilizing a Grid computing and dataflow framework, named SciFlo, in which we are deploying a set of versatile and reusable operators for data query, access, subsetting, co-registration, mining, fusion, and advanced statistical analysis. SciFlo is a semantically-enabled ("smart") Grid Workflow system that ties together a peer-to-peer network of computers into an efficient engine for distributed computation. The SciFlo workflow engine enables scientists to do multi-instrument Earth Science by assembling remotely-invokable Web Services (SOAP or http GET URLs), native executables, command-line scripts, and Python codes into a distributed computing flow. A scientist visually authors the graph of operation in the VizFlow GUI, or uses a text editor to modify the simple XML workflow documents. The SciFlo client & server engines optimize the execution of such distributed workflows and allow the user to transparently find and use datasets and operators without worrying about the actual location of the Grid resources. The engine transparently moves data to the operators, and moves operators to the data (on the dozen trusted SciFlo nodes). SciFlo also deploys a variety of Data Grid services to: query datasets in space and time, locate & retrieve on-line data granules, provide on-the-fly variable and spatial subsetting, perform pairwise instrument matchups for A-Train datasets, and compute fused products. These services are combined into efficient workflows to assemble the desired large-scale, merged climate datasets. SciFlo is currently being applied in several large climate studies: comparisons of aerosol optical depth between MODIS, MISR, AERONET ground network, and U. Michigan's IMPACT aerosol transport model; characterization of long-term biases in microwave and infrared instruments (AIRS, MLS) by comparisons to GPS temperature retrievals accurate to 0.1 degrees Kelvin; and construction of a decade-long, multi-sensor water vapor climatology stratified by classified cloud scene by bringing together datasets from AIRS/AMSU, AMSR-E, MLS, MODIS, and CloudSat (NASA MEASUREs grant, Fetzer PI). The presentation will discuss the SciFlo technologies, their application in these distributed workflows, and the many challenges encountered in assembling and analyzing these massive datasets.
Building asynchronous geospatial processing workflows with web services
NASA Astrophysics Data System (ADS)
Zhao, Peisheng; Di, Liping; Yu, Genong
2012-02-01
Geoscience research and applications often involve a geospatial processing workflow. This workflow includes a sequence of operations that use a variety of tools to collect, translate, and analyze distributed heterogeneous geospatial data. Asynchronous mechanisms, by which clients initiate a request and then resume their processing without waiting for a response, are very useful for complicated workflows that take a long time to run. Geospatial contents and capabilities are increasingly becoming available online as interoperable Web services. This online availability significantly enhances the ability to use Web service chains to build distributed geospatial processing workflows. This paper focuses on how to orchestrate Web services for implementing asynchronous geospatial processing workflows. The theoretical bases for asynchronous Web services and workflows, including asynchrony patterns and message transmission, are examined to explore different asynchronous approaches to and architecture of workflow code for the support of asynchronous behavior. A sample geospatial processing workflow, issued by the Open Geospatial Consortium (OGC) Web Service, Phase 6 (OWS-6), is provided to illustrate the implementation of asynchronous geospatial processing workflows and the challenges in using Web Services Business Process Execution Language (WS-BPEL) to develop them.
GUEST EDITOR'S INTRODUCTION: Guest Editor's introduction
NASA Astrophysics Data System (ADS)
Chrysanthis, Panos K.
1996-12-01
Computer Science Department, University of Pittsburgh, Pittsburgh, PA 15260, USA This special issue focuses on current efforts to represent and support workflows that integrate information systems and human resources within a business or manufacturing enterprise. Workflows may also be viewed as an emerging computational paradigm for effective structuring of cooperative applications involving human users and access to diverse data types not necessarily maintained by traditional database management systems. A workflow is an automated organizational process (also called business process) which consists of a set of activities or tasks that need to be executed in a particular controlled order over a combination of heterogeneous database systems and legacy systems. Within workflows, tasks are performed cooperatively by either human or computational agents in accordance with their roles in the organizational hierarchy. The challenge in facilitating the implementation of workflows lies in developing efficient workflow management systems. A workflow management system (also called workflow server, workflow engine or workflow enactment system) provides the necessary interfaces for coordination and communication among human and computational agents to execute the tasks involved in a workflow and controls the execution orderings of tasks as well as the flow of data that these tasks manipulate. That is, the workflow management system is responsible for correctly and reliably supporting the specification, execution, and monitoring of workflows. The six papers selected (out of the twenty-seven submitted for this special issue of Distributed Systems Engineering) address different aspects of these three functional components of a workflow management system. In the first paper, `Correctness issues in workflow management', Kamath and Ramamritham discuss the important issue of correctness in workflow management that constitutes a prerequisite for the use of workflows in the automation of the critical organizational/business processes. In particular, this paper examines the issues of execution atomicity and failure atomicity, differentiating between correctness requirements of system failures and logical failures, and surveys techniques that can be used to ensure data consistency in workflow management systems. While the first paper is concerned with correctness assuming transactional workflows in which selective transactional properties are associated with individual tasks or the entire workflow, the second paper, `Scheduling workflows by enforcing intertask dependencies' by Attie et al, assumes that the tasks can be either transactions or other activities involving legacy systems. This second paper describes the modelling and specification of conditions involving events and dependencies among tasks within a workflow using temporal logic and finite state automata. It also presents a scheduling algorithm that enforces all stated dependencies by executing at any given time only those events that are allowed by all the dependency automata and in an order as specified by the dependencies. In any system with decentralized control, there is a need to effectively cope with the tension that exists between autonomy and consistency requirements. In `A three-level atomicity model for decentralized workflow management systems', Ben-Shaul and Heineman focus on the specific requirement of enforcing failure atomicity in decentralized, autonomous and interacting workflow management systems. Their paper describes a model in which each workflow manager must be able to specify the sequence of tasks that comprise an atomic unit for the purposes of correctness, and the degrees of local and global atomicity for the purpose of cooperation with other workflow managers. The paper also discusses a realization of this model in which treaties and summits provide an agreement mechanism, while underlying transaction managers are responsible for maintaining failure atomicity. The fourth and fifth papers are experience papers describing a workflow management system and a large scale workflow application, respectively. Schill and Mittasch, in `Workflow management systems on top of OSF DCE and OMG CORBA', describe a decentralized workflow management system and discuss its implementation using two standardized middleware platforms, namely, OSF DCE and OMG CORBA. The system supports a new approach to workflow management, introducing several new concepts such as data type management for integrating various types of data and quality of service for various services provided by servers. A problem common to both database applications and workflows is the handling of missing and incomplete information. This is particularly pervasive in an `electronic market' with a huge number of retail outlets producing and exchanging volumes of data, the application discussed in `Information flow in the DAMA project beyond database managers: information flow managers'. Motivated by the need for a method that allows a task to proceed in a timely manner if not all data produced by other tasks are available by its deadline, Russell et al propose an architectural framework and a language that can be used to detect, approximate and, later on, to adjust missing data if necessary. The final paper, `The evolution towards flexible workflow systems' by Nutt, is complementary to the other papers and is a survey of issues and of work related to both workflow and computer supported collaborative work (CSCW) areas. In particular, the paper provides a model and a categorization of the dimensions which workflow management and CSCW systems share. Besides summarizing the recent advancements towards efficient workflow management, the papers in this special issue suggest areas open to investigation and it is our hope that they will also provide the stimulus for further research and development in the area of workflow management systems.
Brockhauser, Sandor; Svensson, Olof; Bowler, Matthew W; Nanao, Max; Gordon, Elspeth; Leal, Ricardo M F; Popov, Alexander; Gerring, Matthew; McCarthy, Andrew A; Gotz, Andy
2012-08-01
The automation of beam delivery, sample handling and data analysis, together with increasing photon flux, diminishing focal spot size and the appearance of fast-readout detectors on synchrotron beamlines, have changed the way that many macromolecular crystallography experiments are planned and executed. Screening for the best diffracting crystal, or even the best diffracting part of a selected crystal, has been enabled by the development of microfocus beams, precise goniometers and fast-readout detectors that all require rapid feedback from the initial processing of images in order to be effective. All of these advances require the coupling of data feedback to the experimental control system and depend on immediate online data-analysis results during the experiment. To facilitate this, a Data Analysis WorkBench (DAWB) for the flexible creation of complex automated protocols has been developed. Here, example workflows designed and implemented using DAWB are presented for enhanced multi-step crystal characterizations, experiments involving crystal reorientation with kappa goniometers, crystal-burning experiments for empirically determining the radiation sensitivity of a crystal system and the application of mesh scans to find the best location of a crystal to obtain the highest diffraction quality. Beamline users interact with the prepared workflows through a specific brick within the beamline-control GUI MXCuBE.
Laffy, Patrick W.; Wood-Charlson, Elisha M.; Turaev, Dmitrij; Weynberg, Karen D.; Botté, Emmanuelle S.; van Oppen, Madeleine J. H.; Webster, Nicole S.; Rattei, Thomas
2016-01-01
Abundant bioinformatics resources are available for the study of complex microbial metagenomes, however their utility in viral metagenomics is limited. HoloVir is a robust and flexible data analysis pipeline that provides an optimized and validated workflow for taxonomic and functional characterization of viral metagenomes derived from invertebrate holobionts. Simulated viral metagenomes comprising varying levels of viral diversity and abundance were used to determine the optimal assembly and gene prediction strategy, and multiple sequence assembly methods and gene prediction tools were tested in order to optimize our analysis workflow. HoloVir performs pairwise comparisons of single read and predicted gene datasets against the viral RefSeq database to assign taxonomy and additional comparison to phage-specific and cellular markers is undertaken to support the taxonomic assignments and identify potential cellular contamination. Broad functional classification of the predicted genes is provided by assignment of COG microbial functional category classifications using EggNOG and higher resolution functional analysis is achieved by searching for enrichment of specific Swiss-Prot keywords within the viral metagenome. Application of HoloVir to viral metagenomes from the coral Pocillopora damicornis and the sponge Rhopaloeides odorabile demonstrated that HoloVir provides a valuable tool to characterize holobiont viral communities across species, environments, or experiments. PMID:27375564
NASA Astrophysics Data System (ADS)
Clempner, Julio B.
2017-01-01
This paper presents a novel analytical method for soundness verification of workflow nets and reset workflow nets, using the well-known stability results of Lyapunov for Petri nets. We also prove that the soundness property is decidable for workflow nets and reset workflow nets. In addition, we provide evidence of several outcomes related with properties such as boundedness, liveness, reversibility and blocking using stability. Our approach is validated theoretically and by a numerical example related to traffic signal-control synchronisation.
Bidirectional Retroviral Integration Site PCR Methodology and Quantitative Data Analysis Workflow.
Suryawanshi, Gajendra W; Xu, Song; Xie, Yiming; Chou, Tom; Kim, Namshin; Chen, Irvin S Y; Kim, Sanggu
2017-06-14
Integration Site (IS) assays are a critical component of the study of retroviral integration sites and their biological significance. In recent retroviral gene therapy studies, IS assays, in combination with next-generation sequencing, have been used as a cell-tracking tool to characterize clonal stem cell populations sharing the same IS. For the accurate comparison of repopulating stem cell clones within and across different samples, the detection sensitivity, data reproducibility, and high-throughput capacity of the assay are among the most important assay qualities. This work provides a detailed protocol and data analysis workflow for bidirectional IS analysis. The bidirectional assay can simultaneously sequence both upstream and downstream vector-host junctions. Compared to conventional unidirectional IS sequencing approaches, the bidirectional approach significantly improves IS detection rates and the characterization of integration events at both ends of the target DNA. The data analysis pipeline described here accurately identifies and enumerates identical IS sequences through multiple steps of comparison that map IS sequences onto the reference genome and determine sequencing errors. Using an optimized assay procedure, we have recently published the detailed repopulation patterns of thousands of Hematopoietic Stem Cell (HSC) clones following transplant in rhesus macaques, demonstrating for the first time the precise time point of HSC repopulation and the functional heterogeneity of HSCs in the primate system. The following protocol describes the step-by-step experimental procedure and data analysis workflow that accurately identifies and quantifies identical IS sequences.
The Ophidia Stack: Toward Large Scale, Big Data Analytics Experiments for Climate Change
NASA Astrophysics Data System (ADS)
Fiore, S.; Williams, D. N.; D'Anca, A.; Nassisi, P.; Aloisio, G.
2015-12-01
The Ophidia project is a research effort on big data analytics facing scientific data analysis challenges in multiple domains (e.g. climate change). It provides a "datacube-oriented" framework responsible for atomically processing and manipulating scientific datasets, by providing a common way to run distributive tasks on large set of data fragments (chunks). Ophidia provides declarative, server-side, and parallel data analysis, jointly with an internal storage model able to efficiently deal with multidimensional data and a hierarchical data organization to manage large data volumes. The project relies on a strong background on high performance database management and On-Line Analytical Processing (OLAP) systems to manage large scientific datasets. The Ophidia analytics platform provides several data operators to manipulate datacubes (about 50), and array-based primitives (more than 100) to perform data analysis on large scientific data arrays. To address interoperability, Ophidia provides multiple server interfaces (e.g. OGC-WPS). From a client standpoint, a Python interface enables the exploitation of the framework into Python-based eco-systems/applications (e.g. IPython) and the straightforward adoption of a strong set of related libraries (e.g. SciPy, NumPy). The talk will highlight a key feature of the Ophidia framework stack: the "Analytics Workflow Management System" (AWfMS). The Ophidia AWfMS coordinates, orchestrates, optimises and monitors the execution of multiple scientific data analytics and visualization tasks, thus supporting "complex analytics experiments". Some real use cases related to the CMIP5 experiment will be discussed. In particular, with regard to the "Climate models intercomparison data analysis" case study proposed in the EU H2020 INDIGO-DataCloud project, workflows related to (i) anomalies, (ii) trend, and (iii) climate change signal analysis will be presented. Such workflows will be distributed across multiple sites - according to the datasets distribution - and will include intercomparison, ensemble, and outlier analysis. The two-level workflow solution envisioned in INDIGO (coarse grain for distributed tasks orchestration, and fine grain, at the level of a single data analytics cluster instance) will be presented and discussed.
Rapid analysis and exploration of fluorescence microscopy images.
Pavie, Benjamin; Rajaram, Satwik; Ouyang, Austin; Altschuler, Jason M; Steininger, Robert J; Wu, Lani F; Altschuler, Steven J
2014-03-19
Despite rapid advances in high-throughput microscopy, quantitative image-based assays still pose significant challenges. While a variety of specialized image analysis tools are available, most traditional image-analysis-based workflows have steep learning curves (for fine tuning of analysis parameters) and result in long turnaround times between imaging and analysis. In particular, cell segmentation, the process of identifying individual cells in an image, is a major bottleneck in this regard. Here we present an alternate, cell-segmentation-free workflow based on PhenoRipper, an open-source software platform designed for the rapid analysis and exploration of microscopy images. The pipeline presented here is optimized for immunofluorescence microscopy images of cell cultures and requires minimal user intervention. Within half an hour, PhenoRipper can analyze data from a typical 96-well experiment and generate image profiles. Users can then visually explore their data, perform quality control on their experiment, ensure response to perturbations and check reproducibility of replicates. This facilitates a rapid feedback cycle between analysis and experiment, which is crucial during assay optimization. This protocol is useful not just as a first pass analysis for quality control, but also may be used as an end-to-end solution, especially for screening. The workflow described here scales to large data sets such as those generated by high-throughput screens, and has been shown to group experimental conditions by phenotype accurately over a wide range of biological systems. The PhenoBrowser interface provides an intuitive framework to explore the phenotypic space and relate image properties to biological annotations. Taken together, the protocol described here will lower the barriers to adopting quantitative analysis of image based screens.
End-to-end workflow for finite element analysis of tumor treating fields in glioblastomas
NASA Astrophysics Data System (ADS)
Timmons, Joshua J.; Lok, Edwin; San, Pyay; Bui, Kevin; Wong, Eric T.
2017-11-01
Tumor Treating Fields (TTFields) therapy is an approved modality of treatment for glioblastoma. Patient anatomy-based finite element analysis (FEA) has the potential to reveal not only how these fields affect tumor control but also how to improve efficacy. While the automated tools for segmentation speed up the generation of FEA models, multi-step manual corrections are required, including removal of disconnected voxels, incorporation of unsegmented structures and the addition of 36 electrodes plus gel layers matching the TTFields transducers. Existing approaches are also not scalable for the high throughput analysis of large patient volumes. A semi-automated workflow was developed to prepare FEA models for TTFields mapping in the human brain. Magnetic resonance imaging (MRI) pre-processing, segmentation, electrode and gel placement, and post-processing were all automated. The material properties of each tissue were applied to their corresponding mask in silico using COMSOL Multiphysics (COMSOL, Burlington, MA, USA). The fidelity of the segmentations with and without post-processing was compared against the full semi-automated segmentation workflow approach using Dice coefficient analysis. The average relative differences for the electric fields generated by COMSOL were calculated in addition to observed differences in electric field-volume histograms. Furthermore, the mesh file formats in MPHTXT and NASTRAN were also compared using the differences in the electric field-volume histogram. The Dice coefficient was less for auto-segmentation without versus auto-segmentation with post-processing, indicating convergence on a manually corrected model. An existent but marginal relative difference of electric field maps from models with manual correction versus those without was identified, and a clear advantage of using the NASTRAN mesh file format was found. The software and workflow outlined in this article may be used to accelerate the investigation of TTFields in glioblastoma patients by facilitating the creation of FEA models derived from patient MRI datasets.
MC-GenomeKey: a multicloud system for the detection and annotation of genomic variants.
Elshazly, Hatem; Souilmi, Yassine; Tonellato, Peter J; Wall, Dennis P; Abouelhoda, Mohamed
2017-01-20
Next Generation Genome sequencing techniques became affordable for massive sequencing efforts devoted to clinical characterization of human diseases. However, the cost of providing cloud-based data analysis of the mounting datasets remains a concerning bottleneck for providing cost-effective clinical services. To address this computational problem, it is important to optimize the variant analysis workflow and the used analysis tools to reduce the overall computational processing time, and concomitantly reduce the processing cost. Furthermore, it is important to capitalize on the use of the recent development in the cloud computing market, which have witnessed more providers competing in terms of products and prices. In this paper, we present a new package called MC-GenomeKey (Multi-Cloud GenomeKey) that efficiently executes the variant analysis workflow for detecting and annotating mutations using cloud resources from different commercial cloud providers. Our package supports Amazon, Google, and Azure clouds, as well as, any other cloud platform based on OpenStack. Our package allows different scenarios of execution with different levels of sophistication, up to the one where a workflow can be executed using a cluster whose nodes come from different clouds. MC-GenomeKey also supports scenarios to exploit the spot instance model of Amazon in combination with the use of other cloud platforms to provide significant cost reduction. To the best of our knowledge, this is the first solution that optimizes the execution of the workflow using computational resources from different cloud providers. MC-GenomeKey provides an efficient multicloud based solution to detect and annotate mutations. The package can run in different commercial cloud platforms, which enables the user to seize the best offers. The package also provides a reliable means to make use of the low-cost spot instance model of Amazon, as it provides an efficient solution to the sudden termination of spot machines as a result of a sudden price increase. The package has a web-interface and it is available for free for academic use.
Rapid Energy Modeling Workflow Demonstration Project
2014-01-01
Conditioning Engineers BIM Building Information Model BLCC building life cycle costs BPA Building Performance Analysis CAD computer assisted...invited to enroll in the Autodesk Building Performance Analysis ( BPA ) Certificate Program under a group 30 specifically for DoD installation
Walsh, Kristin E; Chui, Michelle Anne; Kieser, Mara A; Williams, Staci M; Sutter, Susan L; Sutter, John G
2011-01-01
To explore community pharmacy technician workflow change after implementation of an automated robotic prescription-filling device. At an independent community pharmacy in rural Mayville, WI, pharmacy technicians were observed before and 3 months after installation of an automated robotic prescription-filling device. The main outcome measures were sequences and timing of technician workflow steps, workflow interruptions, automation surprises, and workarounds. Of the 77 and 80 observations made before and 3 months after robot installation, respectively, 17 different workflow sequences were observed before installation and 38 after installation. Average prescription filling time was reduced by 40 seconds per prescription with use of the robot. Workflow interruptions per observation increased from 1.49 to 1.79 (P = 0.11), and workarounds increased from 10% to 36% after robot use. Although automated prescription-filling devices can increase efficiency, workflow interruptions and workarounds may negate that efficiency. Assessing changes in workflow and sequencing of tasks that may result from the use of automation can help uncover opportunities for workflow policy and procedure redesign.
Thermal Remote Sensing with Uav-Based Workflows
NASA Astrophysics Data System (ADS)
Boesch, R.
2017-08-01
Climate change will have a significant influence on vegetation health and growth. Predictions of higher mean summer temperatures and prolonged summer draughts may pose a threat to agriculture areas and forest canopies. Rising canopy temperatures can be an indicator of plant stress because of the closure of stomata and a decrease in the transpiration rate. Thermal cameras are available for decades, but still often used for single image analysis, only in oblique view manner or with visual evaluations of video sequences. Therefore remote sensing using a thermal camera can be an important data source to understand transpiration processes. Photogrammetric workflows allow to process thermal images similar to RGB data. But low spatial resolution of thermal cameras, significant optical distortion and typically low contrast require an adapted workflow. Temperature distribution in forest canopies is typically completely unknown and less distinct than for urban or industrial areas, where metal constructions and surfaces yield high contrast and sharp edge information. The aim of this paper is to investigate the influence of interior camera orientation, tie point matching and ground control points on the resulting accuracy of bundle adjustment and dense cloud generation with a typically used photogrammetric workflow for UAVbased thermal imagery in natural environments.
Managing the CMS Data and Monte Carlo Processing during LHC Run 2
NASA Astrophysics Data System (ADS)
Wissing, C.;
2017-10-01
In order to cope with the challenges expected during the LHC Run 2 CMS put in a number of enhancements into the main software packages and the tools used for centrally managed processing. In the presentation we will highlight these improvements that allow CMS to deal with the increased trigger output rate, the increased pileup and the evolution in computing technology. The overall system aims at high flexibility, improved operational flexibility and largely automated procedures. The tight coupling of workflow classes to types of sites has been drastically relaxed. Reliable and high-performing networking between most of the computing sites and the successful deployment of a data-federation allow the execution of workflows using remote data access. That required the development of a largely automatized system to assign workflows and to handle necessary pre-staging of data. Another step towards flexibility has been the introduction of one large global HTCondor Pool for all types of processing workflows and analysis jobs. Besides classical Grid resources also some opportunistic resources as well as Cloud resources have been integrated into that Pool, which gives reach to more than 200k CPU cores.
Generic worklist handler for workflow-enabled products
NASA Astrophysics Data System (ADS)
Schmidt, Joachim; Meetz, Kirsten; Wendler, Thomas
1999-07-01
Workflow management (WfM) is an emerging field of medical information technology. It appears as a promising key technology to model, optimize and automate processes, for the sake of improved efficiency, reduced costs and improved patient care. The Application of WfM concepts requires the standardization of architectures and interfaces. A component of central interest proposed in this report is a generic work list handler: A standardized interface between a workflow enactment service and application system. Application systems with embedded work list handlers will be called 'Workflow Enabled Application Systems'. In this paper we discus functional requirements of work list handlers, as well as their integration into workflow architectures and interfaces. To lay the foundation for this specification, basic workflow terminology, the fundamentals of workflow management and - later in the paper - the available standards as defined by the Workflow Management Coalition are briefly reviewed.
Granja, Conceição; Janssen, Wouter; Johansen, Monika Alise
2018-05-01
eHealth has an enormous potential to improve healthcare cost, effectiveness, and quality of care. However, there seems to be a gap between the foreseen benefits of research and clinical reality. Our objective was to systematically review the factors influencing the outcome of eHealth interventions in terms of success and failure. We searched the PubMed database for original peer-reviewed studies on implemented eHealth tools that reported on the factors for the success or failure, or both, of the intervention. We conducted the systematic review by following the patient, intervention, comparison, and outcome framework, with 2 of the authors independently reviewing the abstract and full text of the articles. We collected data using standardized forms that reflected the categorization model used in the qualitative analysis of the outcomes reported in the included articles. Among the 903 identified articles, a total of 221 studies complied with the inclusion criteria. The studies were heterogeneous by country, type of eHealth intervention, method of implementation, and reporting perspectives. The article frequency analysis did not show a significant discrepancy between the number of reports on failure (392/844, 46.5%) and on success (452/844, 53.6%). The qualitative analysis identified 27 categories that represented the factors for success or failure of eHealth interventions. A quantitative analysis of the results revealed the category quality of healthcare (n=55) as the most mentioned as contributing to the success of eHealth interventions, and the category costs (n=42) as the most mentioned as contributing to failure. For the category with the highest unique article frequency, workflow (n=51), we conducted a full-text review. The analysis of the 23 articles that met the inclusion criteria identified 6 barriers related to workflow: workload (n=12), role definition (n=7), undermining of face-to-face communication (n=6), workflow disruption (n=6), alignment with clinical processes (n=2), and staff turnover (n=1). The reviewed literature suggested that, to increase the likelihood of success of eHealth interventions, future research must ensure a positive impact in the quality of care, with particular attention given to improved diagnosis, clinical management, and patient-centered care. There is a critical need to perform in-depth studies of the workflow(s) that the intervention will support and to perceive the clinical processes involved. ©Conceição Granja, Wouter Janssen, Monika Alise Johansen. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 01.05.2018.
Janssen, Wouter; Johansen, Monika Alise
2018-01-01
Background eHealth has an enormous potential to improve healthcare cost, effectiveness, and quality of care. However, there seems to be a gap between the foreseen benefits of research and clinical reality. Objective Our objective was to systematically review the factors influencing the outcome of eHealth interventions in terms of success and failure. Methods We searched the PubMed database for original peer-reviewed studies on implemented eHealth tools that reported on the factors for the success or failure, or both, of the intervention. We conducted the systematic review by following the patient, intervention, comparison, and outcome framework, with 2 of the authors independently reviewing the abstract and full text of the articles. We collected data using standardized forms that reflected the categorization model used in the qualitative analysis of the outcomes reported in the included articles. Results Among the 903 identified articles, a total of 221 studies complied with the inclusion criteria. The studies were heterogeneous by country, type of eHealth intervention, method of implementation, and reporting perspectives. The article frequency analysis did not show a significant discrepancy between the number of reports on failure (392/844, 46.5%) and on success (452/844, 53.6%). The qualitative analysis identified 27 categories that represented the factors for success or failure of eHealth interventions. A quantitative analysis of the results revealed the category quality of healthcare (n=55) as the most mentioned as contributing to the success of eHealth interventions, and the category costs (n=42) as the most mentioned as contributing to failure. For the category with the highest unique article frequency, workflow (n=51), we conducted a full-text review. The analysis of the 23 articles that met the inclusion criteria identified 6 barriers related to workflow: workload (n=12), role definition (n=7), undermining of face-to-face communication (n=6), workflow disruption (n=6), alignment with clinical processes (n=2), and staff turnover (n=1). Conclusions The reviewed literature suggested that, to increase the likelihood of success of eHealth interventions, future research must ensure a positive impact in the quality of care, with particular attention given to improved diagnosis, clinical management, and patient-centered care. There is a critical need to perform in-depth studies of the workflow(s) that the intervention will support and to perceive the clinical processes involved. PMID:29716883
a Standardized Approach to Topographic Data Processing and Workflow Management
NASA Astrophysics Data System (ADS)
Wheaton, J. M.; Bailey, P.; Glenn, N. F.; Hensleigh, J.; Hudak, A. T.; Shrestha, R.; Spaete, L.
2013-12-01
An ever-increasing list of options exist for collecting high resolution topographic data, including airborne LIDAR, terrestrial laser scanners, bathymetric SONAR and structure-from-motion. An equally rich, arguably overwhelming, variety of tools exists with which to organize, quality control, filter, analyze and summarize these data. However, scientists are often left to cobble together their analysis as a series of ad hoc steps, often using custom scripts and one-time processes that are poorly documented and rarely shared with the community. Even when literature-cited software tools are used, the input and output parameters differ from tool to tool. These parameters are rarely archived and the steps performed lost, making the analysis virtually impossible to replicate precisely. What is missing is a coherent, robust, framework for combining reliable, well-documented topographic data-processing steps into a workflow that can be repeated and even shared with others. We have taken several popular topographic data processing tools - including point cloud filtering and decimation as well as DEM differencing - and defined a common protocol for passing inputs and outputs between them. This presentation describes a free, public online portal that enables scientists to create custom workflows for processing topographic data using a number of popular topographic processing tools. Users provide the inputs required for each tool and in what sequence they want to combine them. This information is then stored for future reuse (and optionally sharing with others) before the user then downloads a single package that contains all the input and output specifications together with the software tools themselves. The user then launches the included batch file that executes the workflow on their local computer against their topographic data. This ZCloudTools architecture helps standardize, automate and archive topographic data processing. It also represents a forum for discovering and sharing effective topographic processing workflows.
Satagopam, Venkata; Gu, Wei; Eifes, Serge; Gawron, Piotr; Ostaszewski, Marek; Gebel, Stephan; Barbosa-Silva, Adriano; Balling, Rudi; Schneider, Reinhard
2016-01-01
Abstract Translational medicine is a domain turning results of basic life science research into new tools and methods in a clinical environment, for example, as new diagnostics or therapies. Nowadays, the process of translation is supported by large amounts of heterogeneous data ranging from medical data to a whole range of -omics data. It is not only a great opportunity but also a great challenge, as translational medicine big data is difficult to integrate and analyze, and requires the involvement of biomedical experts for the data processing. We show here that visualization and interoperable workflows, combining multiple complex steps, can address at least parts of the challenge. In this article, we present an integrated workflow for exploring, analysis, and interpretation of translational medicine data in the context of human health. Three Web services—tranSMART, a Galaxy Server, and a MINERVA platform—are combined into one big data pipeline. Native visualization capabilities enable the biomedical experts to get a comprehensive overview and control over separate steps of the workflow. The capabilities of tranSMART enable a flexible filtering of multidimensional integrated data sets to create subsets suitable for downstream processing. A Galaxy Server offers visually aided construction of analytical pipelines, with the use of existing or custom components. A MINERVA platform supports the exploration of health and disease-related mechanisms in a contextualized analytical visualization system. We demonstrate the utility of our workflow by illustrating its subsequent steps using an existing data set, for which we propose a filtering scheme, an analytical pipeline, and a corresponding visualization of analytical results. The workflow is available as a sandbox environment, where readers can work with the described setup themselves. Overall, our work shows how visualization and interfacing of big data processing services facilitate exploration, analysis, and interpretation of translational medicine data. PMID:27441714
Disseminating Metaproteomic Informatics Capabilities and Knowledge Using the Galaxy-P Framework
Easterly, Caleb; Gruening, Bjoern; Johnson, James; Kolmeder, Carolin A.; Kumar, Praveen; May, Damon; Mehta, Subina; Mesuere, Bart; Brown, Zachary; Elias, Joshua E.; Hervey, W. Judson; McGowan, Thomas; Muth, Thilo; Rudney, Joel; Griffin, Timothy J.
2018-01-01
The impact of microbial communities, also known as the microbiome, on human health and the environment is receiving increased attention. Studying translated gene products (proteins) and comparing metaproteomic profiles may elucidate how microbiomes respond to specific environmental stimuli, and interact with host organisms. Characterizing proteins expressed by a complex microbiome and interpreting their functional signature requires sophisticated informatics tools and workflows tailored to metaproteomics. Additionally, there is a need to disseminate these informatics resources to researchers undertaking metaproteomic studies, who could use them to make new and important discoveries in microbiome research. The Galaxy for proteomics platform (Galaxy-P) offers an open source, web-based bioinformatics platform for disseminating metaproteomics software and workflows. Within this platform, we have developed easily-accessible and documented metaproteomic software tools and workflows aimed at training researchers in their operation and disseminating the tools for more widespread use. The modular workflows encompass the core requirements of metaproteomic informatics: (a) database generation; (b) peptide spectral matching; (c) taxonomic analysis and (d) functional analysis. Much of the software available via the Galaxy-P platform was selected, packaged and deployed through an online metaproteomics “Contribution Fest“ undertaken by a unique consortium of expert software developers and users from the metaproteomics research community, who have co-authored this manuscript. These resources are documented on GitHub and freely available through the Galaxy Toolshed, as well as a publicly accessible metaproteomics gateway Galaxy instance. These documented workflows are well suited for the training of novice metaproteomics researchers, through online resources such as the Galaxy Training Network, as well as hands-on training workshops. Here, we describe the metaproteomics tools available within these Galaxy-based resources, as well as the process by which they were selected and implemented in our community-based work. We hope this description will increase access to and utilization of metaproteomics tools, as well as offer a framework for continued community-based development and dissemination of cutting edge metaproteomics software. PMID:29385081
Disseminating Metaproteomic Informatics Capabilities and Knowledge Using the Galaxy-P Framework.
Blank, Clemens; Easterly, Caleb; Gruening, Bjoern; Johnson, James; Kolmeder, Carolin A; Kumar, Praveen; May, Damon; Mehta, Subina; Mesuere, Bart; Brown, Zachary; Elias, Joshua E; Hervey, W Judson; McGowan, Thomas; Muth, Thilo; Nunn, Brook; Rudney, Joel; Tanca, Alessandro; Griffin, Timothy J; Jagtap, Pratik D
2018-01-31
The impact of microbial communities, also known as the microbiome, on human health and the environment is receiving increased attention. Studying translated gene products (proteins) and comparing metaproteomic profiles may elucidate how microbiomes respond to specific environmental stimuli, and interact with host organisms. Characterizing proteins expressed by a complex microbiome and interpreting their functional signature requires sophisticated informatics tools and workflows tailored to metaproteomics. Additionally, there is a need to disseminate these informatics resources to researchers undertaking metaproteomic studies, who could use them to make new and important discoveries in microbiome research. The Galaxy for proteomics platform (Galaxy-P) offers an open source, web-based bioinformatics platform for disseminating metaproteomics software and workflows. Within this platform, we have developed easily-accessible and documented metaproteomic software tools and workflows aimed at training researchers in their operation and disseminating the tools for more widespread use. The modular workflows encompass the core requirements of metaproteomic informatics: (a) database generation; (b) peptide spectral matching; (c) taxonomic analysis and (d) functional analysis. Much of the software available via the Galaxy-P platform was selected, packaged and deployed through an online metaproteomics "Contribution Fest" undertaken by a unique consortium of expert software developers and users from the metaproteomics research community, who have co-authored this manuscript. These resources are documented on GitHub and freely available through the Galaxy Toolshed, as well as a publicly accessible metaproteomics gateway Galaxy instance. These documented workflows are well suited for the training of novice metaproteomics researchers, through online resources such as the Galaxy Training Network, as well as hands-on training workshops. Here, we describe the metaproteomics tools available within these Galaxy-based resources, as well as the process by which they were selected and implemented in our community-based work. We hope this description will increase access to and utilization of metaproteomics tools, as well as offer a framework for continued community-based development and dissemination of cutting edge metaproteomics software.
Pohjonen, Hanna; Ross, Peeter; Blickman, Johan G; Kamman, Richard
2007-01-01
Emerging technologies are transforming the workflows in healthcare enterprises. Computing grids and handheld mobile/wireless devices are providing clinicians with enterprise-wide access to all patient data and analysis tools on a pervasive basis. In this paper, emerging technologies are presented that provide computing grids and streaming-based access to image and data management functions, and system architectures that enable pervasive computing on a cost-effective basis. Finally, the implications of such technologies are investigated regarding the positive impacts on clinical workflows.
A very simple, re-executable neuroimaging publication
Ghosh, Satrajit S.; Poline, Jean-Baptiste; Keator, David B.; Halchenko, Yaroslav O.; Thomas, Adam G.; Kessler, Daniel A.; Kennedy, David N.
2017-01-01
Reproducible research is a key element of the scientific process. Re-executability of neuroimaging workflows that lead to the conclusions arrived at in the literature has not yet been sufficiently addressed and adopted by the neuroimaging community. In this paper, we document a set of procedures, which include supplemental additions to a manuscript, that unambiguously define the data, workflow, execution environment and results of a neuroimaging analysis, in order to generate a verifiable re-executable publication. Re-executability provides a starting point for examination of the generalizability and reproducibility of a given finding. PMID:28781753
Workflow as a Service in the Cloud: Architecture and Scheduling Algorithms.
Wang, Jianwu; Korambath, Prakashan; Altintas, Ilkay; Davis, Jim; Crawl, Daniel
2014-01-01
With more and more workflow systems adopting cloud as their execution environment, it becomes increasingly challenging on how to efficiently manage various workflows, virtual machines (VMs) and workflow execution on VM instances. To make the system scalable and easy-to-extend, we design a Workflow as a Service (WFaaS) architecture with independent services. A core part of the architecture is how to efficiently respond continuous workflow requests from users and schedule their executions in the cloud. Based on different targets, we propose four heuristic workflow scheduling algorithms for the WFaaS architecture, and analyze the differences and best usages of the algorithms in terms of performance, cost and the price/performance ratio via experimental studies.
Tigres Workflow Library: Supporting Scientific Pipelines on HPC Systems
Hendrix, Valerie; Fox, James; Ghoshal, Devarshi; ...
2016-07-21
The growth in scientific data volumes has resulted in the need for new tools that enable users to operate on and analyze data on large-scale resources. In the last decade, a number of scientific workflow tools have emerged. These tools often target distributed environments, and often need expert help to compose and execute the workflows. Data-intensive workflows are often ad-hoc, they involve an iterative development process that includes users composing and testing their workflows on desktops, and scaling up to larger systems. In this paper, we present the design and implementation of Tigres, a workflow library that supports the iterativemore » workflow development cycle of data-intensive workflows. Tigres provides an application programming interface to a set of programming templates i.e., sequence, parallel, split, merge, that can be used to compose and execute computational and data pipelines. We discuss the results of our evaluation of scientific and synthetic workflows showing Tigres performs with minimal template overheads (mean of 13 seconds over all experiments). We also discuss various factors (e.g., I/O performance, execution mechanisms) that affect the performance of scientific workflows on HPC systems.« less
Tigres Workflow Library: Supporting Scientific Pipelines on HPC Systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hendrix, Valerie; Fox, James; Ghoshal, Devarshi
The growth in scientific data volumes has resulted in the need for new tools that enable users to operate on and analyze data on large-scale resources. In the last decade, a number of scientific workflow tools have emerged. These tools often target distributed environments, and often need expert help to compose and execute the workflows. Data-intensive workflows are often ad-hoc, they involve an iterative development process that includes users composing and testing their workflows on desktops, and scaling up to larger systems. In this paper, we present the design and implementation of Tigres, a workflow library that supports the iterativemore » workflow development cycle of data-intensive workflows. Tigres provides an application programming interface to a set of programming templates i.e., sequence, parallel, split, merge, that can be used to compose and execute computational and data pipelines. We discuss the results of our evaluation of scientific and synthetic workflows showing Tigres performs with minimal template overheads (mean of 13 seconds over all experiments). We also discuss various factors (e.g., I/O performance, execution mechanisms) that affect the performance of scientific workflows on HPC systems.« less
Large-Scale Compute-Intensive Analysis via a Combined In-situ and Co-scheduling Workflow Approach
DOE Office of Scientific and Technical Information (OSTI.GOV)
Messer, Bronson; Sewell, Christopher; Heitmann, Katrin
2015-01-01
Large-scale simulations can produce tens of terabytes of data per analysis cycle, complicating and limiting the efficiency of workflows. Traditionally, outputs are stored on the file system and analyzed in post-processing. With the rapidly increasing size and complexity of simulations, this approach faces an uncertain future. Trending techniques consist of performing the analysis in situ, utilizing the same resources as the simulation, and/or off-loading subsets of the data to a compute-intensive analysis system. We introduce an analysis framework developed for HACC, a cosmological N-body code, that uses both in situ and co-scheduling approaches for handling Petabyte-size outputs. An initial inmore » situ step is used to reduce the amount of data to be analyzed, and to separate out the data-intensive tasks handled off-line. The analysis routines are implemented using the PISTON/VTK-m framework, allowing a single implementation of an algorithm that simultaneously targets a variety of GPU, multi-core, and many-core architectures.« less
Chen, Yunshun; Lun, Aaron T L; Smyth, Gordon K
2016-01-01
In recent years, RNA sequencing (RNA-seq) has become a very widely used technology for profiling gene expression. One of the most common aims of RNA-seq profiling is to identify genes or molecular pathways that are differentially expressed (DE) between two or more biological conditions. This article demonstrates a computational workflow for the detection of DE genes and pathways from RNA-seq data by providing a complete analysis of an RNA-seq experiment profiling epithelial cell subsets in the mouse mammary gland. The workflow uses R software packages from the open-source Bioconductor project and covers all steps of the analysis pipeline, including alignment of read sequences, data exploration, differential expression analysis, visualization and pathway analysis. Read alignment and count quantification is conducted using the Rsubread package and the statistical analyses are performed using the edgeR package. The differential expression analysis uses the quasi-likelihood functionality of edgeR.
Text mining for the biocuration workflow.
Hirschman, Lynette; Burns, Gully A P C; Krallinger, Martin; Arighi, Cecilia; Cohen, K Bretonnel; Valencia, Alfonso; Wu, Cathy H; Chatr-Aryamontri, Andrew; Dowell, Karen G; Huala, Eva; Lourenço, Anália; Nash, Robert; Veuthey, Anne-Lise; Wiegers, Thomas; Winter, Andrew G
2012-01-01
Molecular biology has become heavily dependent on biological knowledge encoded in expert curated biological databases. As the volume of biological literature increases, biocurators need help in keeping up with the literature; (semi-) automated aids for biocuration would seem to be an ideal application for natural language processing and text mining. However, to date, there have been few documented successes for improving biocuration throughput using text mining. Our initial investigations took place for the workshop on 'Text Mining for the BioCuration Workflow' at the third International Biocuration Conference (Berlin, 2009). We interviewed biocurators to obtain workflows from eight biological databases. This initial study revealed high-level commonalities, including (i) selection of documents for curation; (ii) indexing of documents with biologically relevant entities (e.g. genes); and (iii) detailed curation of specific relations (e.g. interactions); however, the detailed workflows also showed many variabilities. Following the workshop, we conducted a survey of biocurators. The survey identified biocurator priorities, including the handling of full text indexed with biological entities and support for the identification and prioritization of documents for curation. It also indicated that two-thirds of the biocuration teams had experimented with text mining and almost half were using text mining at that time. Analysis of our interviews and survey provide a set of requirements for the integration of text mining into the biocuration workflow. These can guide the identification of common needs across curated databases and encourage joint experimentation involving biocurators, text mining developers and the larger biomedical research community.
Knowledge Annotations in Scientific Workflows: An Implementation in Kepler
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gandara, Aida G.; Chin, George; Pinheiro Da Silva, Paulo
2011-07-20
Abstract. Scientic research products are the result of long-term collaborations between teams. Scientic workfows are capable of helping scientists in many ways including the collection of information as to howresearch was conducted, e.g. scientic workfow tools often collect and manage information about datasets used and data transformations. However,knowledge about why data was collected is rarely documented in scientic workflows. In this paper we describe a prototype system built to support the collection of scientic expertise that infuences scientic analysis. Through evaluating a scientic research eort underway at Pacific Northwest National Laboratory, we identied features that would most benefit PNNL scientistsmore » in documenting how and why they conduct their research making this information available to the entire team. The prototype system was built by enhancing the Kepler Scientic Work-flow System to create knowledge-annotated scientic workfows and topublish them as semantic annotations.« less
Orton, Dennis J.; Doucette, Alan A.
2013-01-01
Identification of biomarkers capable of differentiating between pathophysiological states of an individual is a laudable goal in the field of proteomics. Protein biomarker discovery generally employs high throughput sample characterization by mass spectrometry (MS), being capable of identifying and quantifying thousands of proteins per sample. While MS-based technologies have rapidly matured, the identification of truly informative biomarkers remains elusive, with only a handful of clinically applicable tests stemming from proteomic workflows. This underlying lack of progress is attributed in large part to erroneous experimental design, biased sample handling, as well as improper statistical analysis of the resulting data. This review will discuss in detail the importance of experimental design and provide some insight into the overall workflow required for biomarker identification experiments. Proper balance between the degree of biological vs. technical replication is required for confident biomarker identification. PMID:28250400
recount workflow: Accessing over 70,000 human RNA-seq samples with Bioconductor
Collado-Torres, Leonardo; Nellore, Abhinav; Jaffe, Andrew E.
2017-01-01
The recount2 resource is composed of over 70,000 uniformly processed human RNA-seq samples spanning TCGA and SRA, including GTEx. The processed data can be accessed via the recount2 website and the recountBioconductor package. This workflow explains in detail how to use the recountpackage and how to integrate it with other Bioconductor packages for several analyses that can be carried out with the recount2 resource. In particular, we describe how the coverage count matrices were computed in recount2 as well as different ways of obtaining public metadata, which can facilitate downstream analyses. Step-by-step directions show how to do a gene-level differential expression analysis, visualize base-level genome coverage data, and perform an analyses at multiple feature levels. This workflow thus provides further information to understand the data in recount2 and a compendium of R code to use the data. PMID:29043067
A Two-Stage Probabilistic Approach to Manage Personal Worklist in Workflow Management Systems
NASA Astrophysics Data System (ADS)
Han, Rui; Liu, Yingbo; Wen, Lijie; Wang, Jianmin
The application of workflow scheduling in managing individual actor's personal worklist is one area that can bring great improvement to business process. However, current deterministic work cannot adapt to the dynamics and uncertainties in the management of personal worklist. For such an issue, this paper proposes a two-stage probabilistic approach which aims at assisting actors to flexibly manage their personal worklists. To be specific, the approach analyzes every activity instance's continuous probability of satisfying deadline at the first stage. Based on this stochastic analysis result, at the second stage, an innovative scheduling strategy is proposed to minimize the overall deadline violation cost for an actor's personal worklist. Simultaneously, the strategy recommends the actor a feasible worklist of activity instances which meet the required bottom line of successful execution. The effectiveness of our approach is evaluated in a real-world workflow management system and with large scale simulation experiments.
Kibinge, Nelson; Ono, Naoaki; Horie, Masafumi; Sato, Tetsuo; Sugiura, Tadao; Altaf-Ul-Amin, Md; Saito, Akira; Kanaya, Shigehiko
2016-06-01
Conventionally, workflows examining transcription regulation networks from gene expression data involve distinct analytical steps. There is a need for pipelines that unify data mining and inference deduction into a singular framework to enhance interpretation and hypotheses generation. We propose a workflow that merges network construction with gene expression data mining focusing on regulation processes in the context of transcription factor driven gene regulation. The pipeline implements pathway-based modularization of expression profiles into functional units to improve biological interpretation. The integrated workflow was implemented as a web application software (TransReguloNet) with functions that enable pathway visualization and comparison of transcription factor activity between sample conditions defined in the experimental design. The pipeline merges differential expression, network construction, pathway-based abstraction, clustering and visualization. The framework was applied in analysis of actual expression datasets related to lung, breast and prostrate cancer. Copyright © 2016 Elsevier Inc. All rights reserved.
Workflow in interventional radiology: uterine fibroid embolization (UFE)
NASA Astrophysics Data System (ADS)
Lindisch, David; Neumuth, Thomas; Burgert, Oliver; Spies, James; Cleary, Kevin
2008-03-01
Workflow analysis can be used to record the steps taken during clinical interventions with the goal of identifying bottlenecks and streamlining the procedure efficiency. In this study, we recorded the workflow for uterine fibroid embolization (UFE) procedures in the interventional radiology suite at Georgetown University Hospital in Washington, DC, USA. We employed a custom client/server software architecture developed by the Innovation Center for Computer Assisted Surgery (ICCAS) at the University of Leipzig, Germany. This software runs in a JAVA environment and enables an observer to record the actions taken by the physician and surgical team during these interventions. The data recorded is stored as an XML document, which can then be further processed. We recorded data from 30 patients and found a mean intervention time of 01:49:46 (+/- 16:04) minutes. The critical intervention step, the embolization, had a mean time of 00:15:42 (+/- 05:49) minutes, which was only 15% of the total intervention time.
SciDAC-Data, A Project to Enabling Data Driven Modeling of Exascale Computing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mubarak, M.; Ding, P.; Aliaga, L.
The SciDAC-Data project is a DOE funded initiative to analyze and exploit two decades of information and analytics that have been collected by the Fermilab Data Center on the organization, movement, and consumption of High Energy Physics data. The project will analyze the analysis patterns and data organization that have been used by the NOvA, MicroBooNE, MINERvA and other experiments, to develop realistic models of HEP analysis workflows and data processing. The SciDAC-Data project aims to provide both realistic input vectors and corresponding output data that can be used to optimize and validate simulations of HEP analysis. These simulations aremore » designed to address questions of data handling, cache optimization and workflow structures that are the prerequisites for modern HEP analysis chains to be mapped and optimized to run on the next generation of leadership class exascale computing facilities. We will address the use of the SciDAC-Data distributions acquired from Fermilab Data Center’s analysis workflows and corresponding to around 71,000 HEP jobs, as the input to detailed queuing simulations that model the expected data consumption and caching behaviors of the work running in HPC environments. In particular we describe in detail how the Sequential Access via Metadata (SAM) data handling system in combination with the dCache/Enstore based data archive facilities have been analyzed to develop the radically different models of the analysis of HEP data. We present how the simulation may be used to analyze the impact of design choices in archive facilities.« less
In situ and in-transit analysis of cosmological simulations
Friesen, Brian; Almgren, Ann; Lukic, Zarija; ...
2016-08-24
Modern cosmological simulations have reached the trillion-element scale, rendering data storage and subsequent analysis formidable tasks. To address this circumstance, we present a new MPI-parallel approach for analysis of simulation data while the simulation runs, as an alternative to the traditional workflow consisting of periodically saving large data sets to disk for subsequent ‘offline’ analysis. We demonstrate this approach in the compressible gasdynamics/N-body code Nyx, a hybrid MPI+OpenMP code based on the BoxLib framework, used for large-scale cosmological simulations. We have enabled on-the-fly workflows in two different ways: one is a straightforward approach consisting of all MPI processes periodically haltingmore » the main simulation and analyzing each component of data that they own (‘ in situ’). The other consists of partitioning processes into disjoint MPI groups, with one performing the simulation and periodically sending data to the other ‘sidecar’ group, which post-processes it while the simulation continues (‘in-transit’). The two groups execute their tasks asynchronously, stopping only to synchronize when a new set of simulation data needs to be analyzed. For both the in situ and in-transit approaches, we experiment with two different analysis suites with distinct performance behavior: one which finds dark matter halos in the simulation using merge trees to calculate the mass contained within iso-density contours, and another which calculates probability distribution functions and power spectra of various fields in the simulation. Both are common analysis tasks for cosmology, and both result in summary statistics significantly smaller than the original data set. We study the behavior of each type of analysis in each workflow in order to determine the optimal configuration for the different data analysis algorithms.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Akhiat, A.; Elekta, Sunnyvale, CA; Kanis, A.P.
Purpose: To extend a clinical Record and Verify (R&V) system to enable a safe and fast workflow for Plan-of-the-Day (PotD) adaptive treatments based on patient-specific plan libraries. Methods: Plan libraries for PotD adaptive treatments contain for each patient several pre-treatment generated treatment plans. They may be generated for various patient anatomies or CTV-PTV margins. For each fraction, a Cone Beam CT scan is acquired to support the selection of the plan that best fits the patient’s anatomy-of-the-day. To date, there are no commercial R&V systems that support PotD delivery strategies. Consequently, the clinical workflow requires many manual interventions. Moreover, multiplemore » scheduled plans have a high risk of excessive dose delivery. In this work we extended a commercial R&V system (MOSAIQ) to support PotD workflows using IQ-scripting. The PotD workflow was designed after extensive risk analysis of the manual procedure, and all identified risks were incorporated as logical checks. Results: All manual PotD activities were automated. The workflow first identifies if the patient is scheduled for PotD, then performs safety checks, and continues to treatment plan selection only if no issues were found. The user selects the plan to deliver from a list of candidate plans. After plan selection, the workflow makes the treatment fields of the selected plan available for delivery by adding them to the treatment calendar. Finally, control is returned to the R&V system to commence treatment. Additional logic was added to incorporate off-line changes such as updating the plan library. After extensive testing including treatment fraction interrupts and plan-library updates during the treatment course, the workflow is running successfully in a clinical pilot, in which 35 patients have been treated since October 2014. Conclusion: We have extended a commercial R&V system for improved safety and efficiency in library-based adaptive strategies enabling a wide-spread implementation of those strategies. This work was in part funded by a research grant of Elekta AB, Stockholm, Sweden.« less
Patterson, Emily S.; Lowry, Svetlana Z.; Ramaiah, Mala; Gibbons, Michael C.; Brick, David; Calco, Robert; Matton, Greg; Miller, Anne; Makar, Ellen; Ferrer, Jorge A.
2015-01-01
Introduction: Human factors workflow analyses in healthcare settings prior to technology implemented are recommended to improve workflow in ambulatory care settings. In this paper we describe how insights from a workflow analysis conducted by NIST were implemented in a software prototype developed for a Veteran’s Health Administration (VHA) VAi2 innovation project and associated lessons learned. Methods: We organize the original recommendations and associated stages and steps visualized in process maps from NIST and the VA’s lessons learned from implementing the recommendations in the VAi2 prototype according to four stages: 1) before the patient visit, 2) during the visit, 3) discharge, and 4) visit documentation. NIST recommendations to improve workflow in ambulatory care (outpatient) settings and process map representations were based on reflective statements collected during one-hour discussions with three physicians. The development of the VAi2 prototype was conducted initially independently from the NIST recommendations, but at a midpoint in the process development, all of the implementation elements were compared with the NIST recommendations and lessons learned were documented. Findings: Story-based displays and templates with default preliminary order sets were used to support scheduling, time-critical notifications, drafting medication orders, and supporting a diagnosis-based workflow. These templates enabled customization to the level of diagnostic uncertainty. Functionality was designed to support cooperative work across interdisciplinary team members, including shared documentation sessions with tracking of text modifications, medication lists, and patient education features. Displays were customized to the role and included access for consultants and site-defined educator teams. Discussion: Workflow, usability, and patient safety can be enhanced through clinician-centered design of electronic health records. The lessons learned from implementing NIST recommendations to improve workflow in ambulatory care using an EHR provide a first step in moving from a billing-centered perspective on how to maintain accurate, comprehensive, and up-to-date information about a group of patients to a clinician-centered perspective. These recommendations point the way towards a “patient visit management system,” which incorporates broader notions of supporting workload management, supporting flexible flow of patients and tasks, enabling accountable distributed work across members of the clinical team, and supporting dynamic tracking of steps in tasks that have longer time distributions. PMID:26290887
Integrate Data into Scientific Workflows for Terrestrial Biosphere Model Evaluation through Brokers
NASA Astrophysics Data System (ADS)
Wei, Y.; Cook, R. B.; Du, F.; Dasgupta, A.; Poco, J.; Huntzinger, D. N.; Schwalm, C. R.; Boldrini, E.; Santoro, M.; Pearlman, J.; Pearlman, F.; Nativi, S.; Khalsa, S.
2013-12-01
Terrestrial biosphere models (TBMs) have become integral tools for extrapolating local observations and process-level understanding of land-atmosphere carbon exchange to larger regions. Model-model and model-observation intercomparisons are critical to understand the uncertainties within model outputs, to improve model skill, and to improve our understanding of land-atmosphere carbon exchange. The DataONE Exploration, Visualization, and Analysis (EVA) working group is evaluating TBMs using scientific workflows in UV-CDAT/VisTrails. This workflow-based approach promotes collaboration and improved tracking of evaluation provenance. But challenges still remain. The multi-scale and multi-discipline nature of TBMs makes it necessary to include diverse and distributed data resources in model evaluation. These include, among others, remote sensing data from NASA, flux tower observations from various organizations including DOE, and inventory data from US Forest Service. A key challenge is to make heterogeneous data from different organizations and disciplines discoverable and readily integrated for use in scientific workflows. This presentation introduces the brokering approach taken by the DataONE EVA to fill the gap between TBMs' evaluation scientific workflows and cross-organization and cross-discipline data resources. The DataONE EVA started the development of an Integrated Model Intercomparison Framework (IMIF) that leverages standards-based discovery and access brokers to dynamically discover, access, and transform (e.g. subset and resampling) diverse data products from DataONE, Earth System Grid (ESG), and other data repositories into a format that can be readily used by scientific workflows in UV-CDAT/VisTrails. The discovery and access brokers serve as an independent middleware that bridge existing data repositories and TBMs evaluation scientific workflows but introduce little overhead to either component. In the initial work, an OpenSearch-based discovery broker is leveraged to provide a consistent mechanism for data discovery. Standards-based data services, including Open Geospatial Consortium (OGC) Web Coverage Service (WCS) and THREDDS are leveraged to provide on-demand data access and transformations through the data access broker. To ease the adoption of broker services, a package of broker client VisTrails modules have been developed to be easily plugged into scientific workflows. The initial IMIF has been successfully tested in selected model evaluation scenarios involved in the NASA-funded Multi-scale Synthesis and Terrestrial Model Intercomparison Project (MsTMIP).
RISA: Remote Interface for Science Analysis
NASA Astrophysics Data System (ADS)
Gabriel, C.; Ibarra, A.; de La Calle, I.; Salgado, J.; Osuna, P.; Tapiador, D.
2008-08-01
The Scientific Analysis System (SAS) is the package for interactive and pipeline data reduction of all XMM-Newton data. Freely distributed by ESA to run under many different operating systems, the SAS has been used by almost every one of the 1600 refereed scientific publications obtained so far from the mission. We are developing RISA, the Remote Interface for Science Analysis, which makes it possible to run SAS through fully configurable web service workflows, enabling observers to access and analyse data making use of all of the existing SAS functionalities, without any installation/download of software/data. The workflows run primarily but not exclusively on the ESAC Grid, which offers scalable processing resources, directly connected to the XMM-Newton Science Archive. A first project internal version of RISA was issued in May 2007, a public release is expected already within this year.
Workflow as a Service in the Cloud: Architecture and Scheduling Algorithms
Wang, Jianwu; Korambath, Prakashan; Altintas, Ilkay; Davis, Jim; Crawl, Daniel
2017-01-01
With more and more workflow systems adopting cloud as their execution environment, it becomes increasingly challenging on how to efficiently manage various workflows, virtual machines (VMs) and workflow execution on VM instances. To make the system scalable and easy-to-extend, we design a Workflow as a Service (WFaaS) architecture with independent services. A core part of the architecture is how to efficiently respond continuous workflow requests from users and schedule their executions in the cloud. Based on different targets, we propose four heuristic workflow scheduling algorithms for the WFaaS architecture, and analyze the differences and best usages of the algorithms in terms of performance, cost and the price/performance ratio via experimental studies. PMID:29399237
Walsh, Kristin E.; Chui, Michelle Anne; Kieser, Mara A.; Williams, Staci M.; Sutter, Susan L.; Sutter, John G.
2012-01-01
Objective To explore community pharmacy technician workflow change after implementation of an automated robotic prescription-filling device. Methods At an independent community pharmacy in rural Mayville, WI, pharmacy technicians were observed before and 3 months after installation of an automated robotic prescription-filling device. The main outcome measures were sequences and timing of technician workflow steps, workflow interruptions, automation surprises, and workarounds. Results Of the 77 and 80 observations made before and 3 months after robot installation, respectively, 17 different workflow sequences were observed before installation and 38 after installation. Average prescription filling time was reduced by 40 seconds per prescription with use of the robot. Workflow interruptions per observation increased from 1.49 to 1.79 (P = 0.11), and workarounds increased from 10% to 36% after robot use. Conclusion Although automated prescription-filling devices can increase efficiency, workflow interruptions and workarounds may negate that efficiency. Assessing changes in workflow and sequencing of tasks that may result from the use of automation can help uncover opportunities for workflow policy and procedure redesign. PMID:21896459
Schmitz, Ulf; Lai, Xin; Winter, Felix; Wolkenhauer, Olaf; Vera, Julio; Gupta, Shailendra K.
2014-01-01
MicroRNAs (miRNAs) are an integral part of gene regulation at the post-transcriptional level. Recently, it has been shown that pairs of miRNAs can repress the translation of a target mRNA in a cooperative manner, which leads to an enhanced effectiveness and specificity in target repression. However, it remains unclear which miRNA pairs can synergize and which genes are target of cooperative miRNA regulation. In this paper, we present a computational workflow for the prediction and analysis of cooperating miRNAs and their mutual target genes, which we refer to as RNA triplexes. The workflow integrates methods of miRNA target prediction; triplex structure analysis; molecular dynamics simulations and mathematical modeling for a reliable prediction of functional RNA triplexes and target repression efficiency. In a case study we analyzed the human genome and identified several thousand targets of cooperative gene regulation. Our results suggest that miRNA cooperativity is a frequent mechanism for an enhanced target repression by pairs of miRNAs facilitating distinctive and fine-tuned target gene expression patterns. Human RNA triplexes predicted and characterized in this study are organized in a web resource at www.sbi.uni-rostock.de/triplexrna/. PMID:24875477
I'll take that to go: Big data bags and minimal identifiers for exchange of large, complex datasets
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chard, Kyle; D'Arcy, Mike; Heavner, Benjamin D.
Big data workflows often require the assembly and exchange of complex, multi-element datasets. For example, in biomedical applications, the input to an analytic pipeline can be a dataset consisting thousands of images and genome sequences assembled from diverse repositories, requiring a description of the contents of the dataset in a concise and unambiguous form. Typical approaches to creating datasets for big data workflows assume that all data reside in a single location, requiring costly data marshaling and permitting errors of omission and commission because dataset members are not explicitly specified. We address these issues by proposing simple methods and toolsmore » for assembling, sharing, and analyzing large and complex datasets that scientists can easily integrate into their daily workflows. These tools combine a simple and robust method for describing data collections (BDBags), data descriptions (Research Objects), and simple persistent identifiers (Minids) to create a powerful ecosystem of tools and services for big data analysis and sharing. We present these tools and use biomedical case studies to illustrate their use for the rapid assembly, sharing, and analysis of large datasets.« less
The CARMEN software as a service infrastructure.
Weeks, Michael; Jessop, Mark; Fletcher, Martyn; Hodge, Victoria; Jackson, Tom; Austin, Jim
2013-01-28
The CARMEN platform allows neuroscientists to share data, metadata, services and workflows, and to execute these services and workflows remotely via a Web portal. This paper describes how we implemented a service-based infrastructure into the CARMEN Virtual Laboratory. A Software as a Service framework was developed to allow generic new and legacy code to be deployed as services on a heterogeneous execution framework. Users can submit analysis code typically written in Matlab, Python, C/C++ and R as non-interactive standalone command-line applications and wrap them as services in a form suitable for deployment on the platform. The CARMEN Service Builder tool enables neuroscientists to quickly wrap their analysis software for deployment to the CARMEN platform, as a service without knowledge of the service framework or the CARMEN system. A metadata schema describes each service in terms of both system and user requirements. The search functionality allows services to be quickly discovered from the many services available. Within the platform, services may be combined into more complicated analyses using the workflow tool. CARMEN and the service infrastructure are targeted towards the neuroscience community; however, it is a generic platform, and can be targeted towards any discipline.
Malta Barbosa, João; Tovar, Nick; A Tuesta, Pablo; Hirata, Ronaldo; Guimarães, Nuno; Romanini, José C; Moghadam, Marjan; Coelho, Paulo G; Jahangiri, Leila
2017-07-08
This work aims to present a pilot study of a non-destructive dental histo-anatomical analysis technique as well as to push the boundaries of the presently available restorative workflows for the fabrication of highly customized ceramic restorations. An extracted human maxillary central incisor was subject to a micro computed tomography scan and the acquired data was transferred into a workstation, reconstructed, segmented, evaluated and later imported into a Computer-Aided Design/Computer-Aided Manufacturing software for the fabrication of a ceramic resin-bonded prosthesis. The obtained prosthesis presented an encouraging optical behavior and was used clinically as final restoration. The digitally layered restorative replication of natural tooth morphology presents today as a clear possibility. New clinical and laboratory-fabricated, biologically inspired digital restorative protocols are to be expected in the near future. The digitally layered restorative replication of natural tooth morphology presents today as a clear possibility. This pilot study may represent a stimulus for future research and applications of digital imaging as well as digital restorative workflows in service of esthetic dentistry. © 2017 Wiley Periodicals, Inc.
NASA Astrophysics Data System (ADS)
Pan, Tianheng
2018-01-01
In recent years, the combination of workflow management system and Multi-agent technology is a hot research field. The problem of lack of flexibility in workflow management system can be improved by introducing multi-agent collaborative management. The workflow management system adopts distributed structure. It solves the problem that the traditional centralized workflow structure is fragile. In this paper, the agent of Distributed workflow management system is divided according to its function. The execution process of each type of agent is analyzed. The key technologies such as process execution and resource management are analyzed.
NASA Astrophysics Data System (ADS)
McCarthy, Ann
2006-01-01
The ICC Workflow WG serves as the bridge between ICC color management technologies and use of those technologies in real world color production applications. ICC color management is applicable to and is used in a wide range of color systems, from highly specialized digital cinema color special effects to high volume publications printing to home photography. The ICC Workflow WG works to align ICC technologies so that the color management needs of these diverse use case systems are addressed in an open, platform independent manner. This report provides a high level summary of the ICC Workflow WG objectives and work to date, focusing on the ways in which workflow can impact image quality and color systems performance. The 'ICC Workflow Primitives' and 'ICC Workflow Patterns and Dimensions' workflow models are covered in some detail. Consider the questions, "How much of dissatisfaction with color management today is the result of 'the wrong color transformation at the wrong time' and 'I can't get to the right conversion at the right point in my work process'?" Put another way, consider how image quality through a workflow can be negatively affected when the coordination and control level of the color management system is not sufficient.
Singh, Dadabhai T; Trehan, Rahul; Schmidt, Bertil; Bretschneider, Timo
2008-01-01
Preparedness for a possible global pandemic caused by viruses such as the highly pathogenic influenza A subtype H5N1 has become a global priority. In particular, it is critical to monitor the appearance of any new emerging subtypes. Comparative phyloinformatics can be used to monitor, analyze, and possibly predict the evolution of viruses. However, in order to utilize the full functionality of available analysis packages for large-scale phyloinformatics studies, a team of computer scientists, biostatisticians and virologists is needed--a requirement which cannot be fulfilled in many cases. Furthermore, the time complexities of many algorithms involved leads to prohibitive runtimes on sequential computer platforms. This has so far hindered the use of comparative phyloinformatics as a commonly applied tool in this area. In this paper the graphical-oriented workflow design system called Quascade and its efficient usage for comparative phyloinformatics are presented. In particular, we focus on how this task can be effectively performed in a distributed computing environment. As a proof of concept, the designed workflows are used for the phylogenetic analysis of neuraminidase of H5N1 isolates (micro level) and influenza viruses (macro level). The results of this paper are hence twofold. Firstly, this paper demonstrates the usefulness of a graphical user interface system to design and execute complex distributed workflows for large-scale phyloinformatics studies of virus genes. Secondly, the analysis of neuraminidase on different levels of complexity provides valuable insights of this virus's tendency for geographical based clustering in the phylogenetic tree and also shows the importance of glycan sites in its molecular evolution. The current study demonstrates the efficiency and utility of workflow systems providing a biologist friendly approach to complex biological dataset analysis using high performance computing. In particular, the utility of the platform Quascade for deploying distributed and parallelized versions of a variety of computationally intensive phylogenetic algorithms has been shown. Secondly, the analysis of the utilized H5N1 neuraminidase datasets at macro and micro levels has clearly indicated a pattern of spatial clustering of the H5N1 viral isolates based on geographical distribution rather than temporal or host range based clustering.
Epiviz: a view inside the design of an integrated visual analysis software for genomics
2015-01-01
Background Computational and visual data analysis for genomics has traditionally involved a combination of tools and resources, of which the most ubiquitous consist of genome browsers, focused mainly on integrative visualization of large numbers of big datasets, and computational environments, focused on data modeling of a small number of moderately sized datasets. Workflows that involve the integration and exploration of multiple heterogeneous data sources, small and large, public and user specific have been poorly addressed by these tools. In our previous work, we introduced Epiviz, which bridges the gap between the two types of tools, simplifying these workflows. Results In this paper we expand on the design decisions behind Epiviz, and introduce a series of new advanced features that further support the type of interactive exploratory workflow we have targeted. We discuss three ways in which Epiviz advances the field of genomic data analysis: 1) it brings code to interactive visualizations at various different levels; 2) takes the first steps in the direction of collaborative data analysis by incorporating user plugins from source control providers, as well as by allowing analysis states to be shared among the scientific community; 3) combines established analysis features that have never before been available simultaneously in a genome browser. In our discussion section, we present security implications of the current design, as well as a series of limitations and future research steps. Conclusions Since many of the design choices of Epiviz are novel in genomics data analysis, this paper serves both as a document of our own approaches with lessons learned, as well as a start point for future efforts in the same direction for the genomics community. PMID:26328750
A global sensitivity analysis approach for morphogenesis models.
Boas, Sonja E M; Navarro Jimenez, Maria I; Merks, Roeland M H; Blom, Joke G
2015-11-21
Morphogenesis is a developmental process in which cells organize into shapes and patterns. Complex, non-linear and multi-factorial models with images as output are commonly used to study morphogenesis. It is difficult to understand the relation between the uncertainty in the input and the output of such 'black-box' models, giving rise to the need for sensitivity analysis tools. In this paper, we introduce a workflow for a global sensitivity analysis approach to study the impact of single parameters and the interactions between them on the output of morphogenesis models. To demonstrate the workflow, we used a published, well-studied model of vascular morphogenesis. The parameters of this cellular Potts model (CPM) represent cell properties and behaviors that drive the mechanisms of angiogenic sprouting. The global sensitivity analysis correctly identified the dominant parameters in the model, consistent with previous studies. Additionally, the analysis provided information on the relative impact of single parameters and of interactions between them. This is very relevant because interactions of parameters impede the experimental verification of the predicted effect of single parameters. The parameter interactions, although of low impact, provided also new insights in the mechanisms of in silico sprouting. Finally, the analysis indicated that the model could be reduced by one parameter. We propose global sensitivity analysis as an alternative approach to study the mechanisms of morphogenesis. Comparison of the ranking of the impact of the model parameters to knowledge derived from experimental data and from manipulation experiments can help to falsify models and to find the operand mechanisms in morphogenesis. The workflow is applicable to all 'black-box' models, including high-throughput in vitro models in which output measures are affected by a set of experimental perturbations.
Maximum unbiased validation (MUV) data sets for virtual screening based on PubChem bioactivity data.
Rohrer, Sebastian G; Baumann, Knut
2009-02-01
Refined nearest neighbor analysis was recently introduced for the analysis of virtual screening benchmark data sets. It constitutes a technique from the field of spatial statistics and provides a mathematical framework for the nonparametric analysis of mapped point patterns. Here, refined nearest neighbor analysis is used to design benchmark data sets for virtual screening based on PubChem bioactivity data. A workflow is devised that purges data sets of compounds active against pharmaceutically relevant targets from unselective hits. Topological optimization using experimental design strategies monitored by refined nearest neighbor analysis functions is applied to generate corresponding data sets of actives and decoys that are unbiased with regard to analogue bias and artificial enrichment. These data sets provide a tool for Maximum Unbiased Validation (MUV) of virtual screening methods. The data sets and a software package implementing the MUV design workflow are freely available at http://www.pharmchem.tu-bs.de/lehre/baumann/MUV.html.
Radiology information system: a workflow-based approach.
Zhang, Jinyan; Lu, Xudong; Nie, Hongchao; Huang, Zhengxing; van der Aalst, W M P
2009-09-01
Introducing workflow management technology in healthcare seems to be prospective in dealing with the problem that the current healthcare Information Systems cannot provide sufficient support for the process management, although several challenges still exist. The purpose of this paper is to study the method of developing workflow-based information system in radiology department as a use case. First, a workflow model of typical radiology process was established. Second, based on the model, the system could be designed and implemented as a group of loosely coupled components. Each component corresponded to one task in the process and could be assembled by the workflow management system. The legacy systems could be taken as special components, which also corresponded to the tasks and were integrated through transferring non-work- flow-aware interfaces to the standard ones. Finally, a workflow dashboard was designed and implemented to provide an integral view of radiology processes. The workflow-based Radiology Information System was deployed in the radiology department of Zhejiang Chinese Medicine Hospital in China. The results showed that it could be adjusted flexibly in response to the needs of changing process, and enhance the process management in the department. It can also provide a more workflow-aware integration method, comparing with other methods such as IHE-based ones. The workflow-based approach is a new method of developing radiology information system with more flexibility, more functionalities of process management and more workflow-aware integration. The work of this paper is an initial endeavor for introducing workflow management technology in healthcare.
DOE Office of Scientific and Technical Information (OSTI.GOV)
JENNINGS, T.L.
The Work Flow analysis Report will be used to facilitate the requirements for implementing the Work Control module of Passport. The report consists of workflow integration processes for Work Management, Preventative Maintenance, Materials and Equipment
2011-02-01
Process Architecture Technology Analysis: Executive .............................................. 15 UIMA as Executive...44 A.4: Flow Code in UIMA ......................................................................................................... 46... UIMA ................................................................................................................................ 57 E.2
Cornett, Alex; Kuziemsky, Craig
2015-01-01
Implementing team based workflows can be complex because of the scope of providers involved and the extent of information exchange and communication that needs to occur. While a workflow may represent the ideal structure of communication that needs to occur, information issues and contextual factors may impact how the workflow is implemented in practice. Understanding these issues will help us better design systems to support team based workflows. In this paper we use a case study of palliative sedation therapy (PST) to model a PST workflow and then use it to identify purposes of communication, information issues and contextual factors that impact them. We then suggest how our findings could inform health information technology (HIT) design to support team based communication workflows.
FAST: A fully asynchronous and status-tracking pattern for geoprocessing services orchestration
NASA Astrophysics Data System (ADS)
Wu, Huayi; You, Lan; Gui, Zhipeng; Gao, Shuang; Li, Zhenqiang; Yu, Jingmin
2014-09-01
Geoprocessing service orchestration (GSO) provides a unified and flexible way to implement cross-application, long-lived, and multi-step geoprocessing service workflows by coordinating geoprocessing services collaboratively. Usually, geoprocessing services and geoprocessing service workflows are data and/or computing intensive. The intensity feature may make the execution process of a workflow time-consuming. Since it initials an execution request without blocking other interactions on the client side, an asynchronous mechanism is especially appropriate for GSO workflows. Many critical problems remain to be solved in existing asynchronous patterns for GSO including difficulties in improving performance, status tracking, and clarifying the workflow structure. These problems are a challenge when orchestrating performance efficiency, making statuses instantly available, and constructing clearly structured GSO workflows. A Fully Asynchronous and Status-Tracking (FAST) pattern that adopts asynchronous interactions throughout the whole communication tier of a workflow is proposed for GSO. The proposed FAST pattern includes a mechanism that actively pushes the latest status to clients instantly and economically. An independent proxy was designed to isolate the status tracking logic from the geoprocessing business logic, which assists the formation of a clear GSO workflow structure. A workflow was implemented in the FAST pattern to simulate the flooding process in the Poyang Lake region. Experimental results show that the proposed FAST pattern can efficiently tackle data/computing intensive geoprocessing tasks. The performance of all collaborative partners was improved due to the asynchronous mechanism throughout communication tier. A status-tracking mechanism helps users retrieve the latest running status of a GSO workflow in an efficient and instant way. The clear structure of the GSO workflow lowers the barriers for geospatial domain experts and model designers to compose asynchronous GSO workflows. Most importantly, it provides better support for locating and diagnosing potential exceptions.
The Cancer Analysis Virtual Machine (CAVM) project will leverage cloud technology, the UCSC Cancer Genomics Browser, and the Galaxy analysis workflow system to provide investigators with a flexible, scalable platform for hosting, visualizing and analyzing their own genomic data.
The complete digital workflow in fixed prosthodontics: a systematic review.
Joda, Tim; Zarone, Fernando; Ferrari, Marco
2017-09-19
The continuous development in dental processing ensures new opportunities in the field of fixed prosthodontics in a complete virtual environment without any physical model situations. The aim was to compare fully digitalized workflows to conventional and/or mixed analog-digital workflows for the treatment with tooth-borne or implant-supported fixed reconstructions. A PICO strategy was executed using an electronic (MEDLINE, EMBASE, Google Scholar) plus manual search up to 2016-09-16 focusing on RCTs investigating complete digital workflows in fixed prosthodontics with regard to economics or esthetics or patient-centered outcomes with or without follow-up or survival/success rate analysis as well as complication assessment of at least 1 year under function. The search strategy was assembled from MeSH-Terms and unspecific free-text words: {(("Dental Prosthesis" [MeSH]) OR ("Crowns" [MeSH]) OR ("Dental Prosthesis, Implant-Supported" [MeSH])) OR ((crown) OR (fixed dental prosthesis) OR (fixed reconstruction) OR (dental bridge) OR (implant crown) OR (implant prosthesis) OR (implant restoration) OR (implant reconstruction))} AND {("Computer-Aided Design" [MeSH]) OR ((digital workflow) OR (digital technology) OR (computerized dentistry) OR (intraoral scan) OR (digital impression) OR (scanbody) OR (virtual design) OR (digital design) OR (cad/cam) OR (rapid prototyping) OR (monolithic) OR (full-contour))} AND {("Dental Technology" [MeSH) OR ((conventional workflow) OR (lost-wax-technique) OR (porcelain-fused-to-metal) OR (PFM) OR (implant impression) OR (hand-layering) OR (veneering) OR (framework))} AND {(("Study, Feasibility" [MeSH]) OR ("Survival" [MeSH]) OR ("Success" [MeSH]) OR ("Economics" [MeSH]) OR ("Costs, Cost Analysis" [MeSH]) OR ("Esthetics, Dental" [MeSH]) OR ("Patient Satisfaction" [MeSH])) OR ((feasibility) OR (efficiency) OR (patient-centered outcome))}. Assessment of risk of bias in selected studies was done at a 'trial level' including random sequence generation, allocation concealment, blinding, completeness of outcome data, selective reporting, and other bias using the Cochrane Collaboration tool. A judgment of risk of bias was assigned if one or more key domains had a high or unclear risk of bias. An official registration of the systematic review was not performed. The systematic search identified 67 titles, 32 abstracts thereof were screened, and subsequently, three full-texts included for data extraction. Analysed RCTs were heterogeneous without follow-up. One study demonstrated that fully digitally produced dental crowns revealed the feasibility of the process itself; however, the marginal precision was lower for lithium disilicate (LS2) restorations (113.8 μm) compared to conventional metal-ceramic (92.4 μm) and zirconium dioxide (ZrO2) crowns (68.5 μm) (p < 0.05). Another study showed that leucite-reinforced glass ceramic crowns were esthetically favoured by the patients (8/2 crowns) and clinicians (7/3 crowns) (p < 0.05). The third study investigated implant crowns. The complete digital workflow was more than twofold faster (75.3 min) in comparison to the mixed analog-digital workflow (156.6 min) (p < 0.05). No RCTs could be found investigating multi-unit fixed dental prostheses (FDP). The number of RCTs testing complete digital workflows in fixed prosthodontics is low. Scientifically proven recommendations for clinical routine cannot be given at this time. Research with high-quality trials seems to be slower than the industrial progress of available digital applications. Future research with well-designed RCTs including follow-up observation is compellingly necessary in the field of complete digital processing.
The future of scientific workflows
DOE Office of Scientific and Technical Information (OSTI.GOV)
Deelman, Ewa; Peterka, Tom; Altintas, Ilkay
Today’s computational, experimental, and observational sciences rely on computations that involve many related tasks. The success of a scientific mission often hinges on the computer automation of these workflows. In April 2015, the US Department of Energy (DOE) invited a diverse group of domain and computer scientists from national laboratories supported by the Office of Science, the National Nuclear Security Administration, from industry, and from academia to review the workflow requirements of DOE’s science and national security missions, to assess the current state of the art in science workflows, to understand the impact of emerging extreme-scale computing systems on thosemore » workflows, and to develop requirements for automated workflow management in future and existing environments. This article is a summary of the opinions of over 50 leading researchers attending this workshop. We highlight use cases, computing systems, workflow needs and conclude by summarizing the remaining challenges this community sees that inhibit large-scale scientific workflows from becoming a mainstream tool for extreme-scale science.« less
NASA Astrophysics Data System (ADS)
Wang, Ximing; Martinez, Clarisa; Wang, Jing; Liu, Ye; Liu, Brent
2014-03-01
Clinical trials usually have a demand to collect, track and analyze multimedia data according to the workflow. Currently, the clinical trial data management requirements are normally addressed with custom-built systems. Challenges occur in the workflow design within different trials. The traditional pre-defined custom-built system is usually limited to a specific clinical trial and normally requires time-consuming and resource-intensive software development. To provide a solution, we present a user customizable imaging informatics-based intelligent workflow engine system for managing stroke rehabilitation clinical trials with intelligent workflow. The intelligent workflow engine provides flexibility in building and tailoring the workflow in various stages of clinical trials. By providing a solution to tailor and automate the workflow, the system will save time and reduce errors for clinical trials. Although our system is designed for clinical trials for rehabilitation, it may be extended to other imaging based clinical trials as well.
The standard-based open workflow system in GeoBrain (Invited)
NASA Astrophysics Data System (ADS)
Di, L.; Yu, G.; Zhao, P.; Deng, M.
2013-12-01
GeoBrain is an Earth science Web-service system developed and operated by the Center for Spatial Information Science and Systems, George Mason University. In GeoBrain, a standard-based open workflow system has been implemented to accommodate the automated processing of geospatial data through a set of complex geo-processing functions for advanced production generation. The GeoBrain models the complex geoprocessing at two levels, the conceptual and concrete. At the conceptual level, the workflows exist in the form of data and service types defined by ontologies. The workflows at conceptual level are called geo-processing models and cataloged in GeoBrain as virtual product types. A conceptual workflow is instantiated into a concrete, executable workflow when a user requests a product that matches a virtual product type. Both conceptual and concrete workflows are encoded in Business Process Execution Language (BPEL). A BPEL workflow engine, called BPELPower, has been implemented to execute the workflow for the product generation. A provenance capturing service has been implemented to generate the ISO 19115-compliant complete product provenance metadata before and after the workflow execution. The generation of provenance metadata before the workflow execution allows users to examine the usability of the final product before the lengthy and expensive execution takes place. The three modes of workflow executions defined in the ISO 19119, transparent, translucent, and opaque, are available in GeoBrain. A geoprocessing modeling portal has been developed to allow domain experts to develop geoprocessing models at the type level with the support of both data and service/processing ontologies. The geoprocessing models capture the knowledge of the domain experts and are become the operational offering of the products after a proper peer review of models is conducted. An automated workflow composition has been experimented successfully based on ontologies and artificial intelligence technology. The GeoBrain workflow system has been used in multiple Earth science applications, including the monitoring of global agricultural drought, the assessment of flood damage, the derivation of national crop condition and progress information, and the detection of nuclear proliferation facilities and events.
Steger, Julia; Arnhard, Kathrin; Haslacher, Sandra; Geiger, Klemens; Singer, Klaus; Schlapp, Michael; Pitterl, Florian; Oberacher, Herbert
2016-04-01
Forensic toxicology and environmental water analysis share the common interest and responsibility in ensuring comprehensive and reliable confirmation of drugs and pharmaceutical compounds in samples analyzed. Dealing with similar analytes, detection and identification techniques should be exchangeable between scientific disciplines. Herein, we demonstrate the successful adaption of a forensic toxicological screening workflow employing nontargeted LC/MS/MS under data-dependent acquisition control and subsequent database search to water analysis. The main modification involved processing of an increased sample volume with SPE (500 mL vs. 1-10 mL) to reach LODs in the low ng/L range. Tandem mass spectra acquired with a qTOF instrument were submitted to database search. The targeted data mining strategy was found to be sensitive and specific; automated search produced hardly any false results. To demonstrate the applicability of the adapted workflow to complex samples, 14 wastewater effluent samples collected on seven consecutive days at the local wastewater-treatment plant were analyzed. Of the 88,970 fragment ion mass spectra produced, 8.8% of spectra were successfully assigned to one of the 1040 reference compounds included in the database, and this enabled the identification of 51 compounds representing important illegal drugs, members of various pharmaceutical compound classes, and metabolites thereof. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
A characterization of workflow management systems for extreme-scale applications
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ferreira da Silva, Rafael; Filgueira, Rosa; Pietri, Ilia
We present that the automation of the execution of computational tasks is at the heart of improving scientific productivity. Over the last years, scientific workflows have been established as an important abstraction that captures data processing and computation of large and complex scientific applications. By allowing scientists to model and express entire data processing steps and their dependencies, workflow management systems relieve scientists from the details of an application and manage its execution on a computational infrastructure. As the resource requirements of today’s computational and data science applications that process vast amounts of data keep increasing, there is a compellingmore » case for a new generation of advances in high-performance computing, commonly termed as extreme-scale computing, which will bring forth multiple challenges for the design of workflow applications and management systems. This paper presents a novel characterization of workflow management systems using features commonly associated with extreme-scale computing applications. We classify 15 popular workflow management systems in terms of workflow execution models, heterogeneous computing environments, and data access methods. Finally, the paper also surveys workflow applications and identifies gaps for future research on the road to extreme-scale workflows and management systems.« less
A characterization of workflow management systems for extreme-scale applications
Ferreira da Silva, Rafael; Filgueira, Rosa; Pietri, Ilia; ...
2017-02-16
We present that the automation of the execution of computational tasks is at the heart of improving scientific productivity. Over the last years, scientific workflows have been established as an important abstraction that captures data processing and computation of large and complex scientific applications. By allowing scientists to model and express entire data processing steps and their dependencies, workflow management systems relieve scientists from the details of an application and manage its execution on a computational infrastructure. As the resource requirements of today’s computational and data science applications that process vast amounts of data keep increasing, there is a compellingmore » case for a new generation of advances in high-performance computing, commonly termed as extreme-scale computing, which will bring forth multiple challenges for the design of workflow applications and management systems. This paper presents a novel characterization of workflow management systems using features commonly associated with extreme-scale computing applications. We classify 15 popular workflow management systems in terms of workflow execution models, heterogeneous computing environments, and data access methods. Finally, the paper also surveys workflow applications and identifies gaps for future research on the road to extreme-scale workflows and management systems.« less
Scheduling Multilevel Deadline-Constrained Scientific Workflows on Clouds Based on Cost Optimization
Malawski, Maciej; Figiela, Kamil; Bubak, Marian; ...
2015-01-01
This paper presents a cost optimization model for scheduling scientific workflows on IaaS clouds such as Amazon EC2 or RackSpace. We assume multiple IaaS clouds with heterogeneous virtual machine instances, with limited number of instances per cloud and hourly billing. Input and output data are stored on a cloud object store such as Amazon S3. Applications are scientific workflows modeled as DAGs as in the Pegasus Workflow Management System. We assume that tasks in the workflows are grouped into levels of identical tasks. Our model is specified using mathematical programming languages (AMPL and CMPL) and allows us to minimize themore » cost of workflow execution under deadline constraints. We present results obtained using our model and the benchmark workflows representing real scientific applications in a variety of domains. The data used for evaluation come from the synthetic workflows and from general purpose cloud benchmarks, as well as from the data measured in our own experiments with Montage, an astronomical application, executed on Amazon EC2 cloud. We indicate how this model can be used for scenarios that require resource planning for scientific workflows and their ensembles.« less
Christ, Roxie; Guevar, Julien; Poyade, Matthieu; Rea, Paul M
2018-01-01
Neuroanatomy can be challenging to both teach and learn within the undergraduate veterinary medicine and surgery curriculum. Traditional techniques have been used for many years, but there has now been a progression to move towards alternative digital models and interactive 3D models to engage the learner. However, digital innovations in the curriculum have typically involved the medical curriculum rather than the veterinary curriculum. Therefore, we aimed to create a simple workflow methodology to highlight the simplicity there is in creating a mobile augmented reality application of basic canine head anatomy. Using canine CT and MRI scans and widely available software programs, we demonstrate how to create an interactive model of head anatomy. This was applied to augmented reality for a popular Android mobile device to demonstrate the user-friendly interface. Here we present the processes, challenges and resolutions for the creation of a highly accurate, data based anatomical model that could potentially be used in the veterinary curriculum. This proof of concept study provides an excellent framework for the creation of augmented reality training products for veterinary education. The lack of similar resources within this field provides the ideal platform to extend this into other areas of veterinary education and beyond.
Christ, Roxie; Guevar, Julien; Poyade, Matthieu
2018-01-01
Neuroanatomy can be challenging to both teach and learn within the undergraduate veterinary medicine and surgery curriculum. Traditional techniques have been used for many years, but there has now been a progression to move towards alternative digital models and interactive 3D models to engage the learner. However, digital innovations in the curriculum have typically involved the medical curriculum rather than the veterinary curriculum. Therefore, we aimed to create a simple workflow methodology to highlight the simplicity there is in creating a mobile augmented reality application of basic canine head anatomy. Using canine CT and MRI scans and widely available software programs, we demonstrate how to create an interactive model of head anatomy. This was applied to augmented reality for a popular Android mobile device to demonstrate the user-friendly interface. Here we present the processes, challenges and resolutions for the creation of a highly accurate, data based anatomical model that could potentially be used in the veterinary curriculum. This proof of concept study provides an excellent framework for the creation of augmented reality training products for veterinary education. The lack of similar resources within this field provides the ideal platform to extend this into other areas of veterinary education and beyond. PMID:29698413
Schweitzer, M; Lasierra, N; Hoerbst, A
2015-01-01
Increasing the flexibility from a user-perspective and enabling a workflow based interaction, facilitates an easy user-friendly utilization of EHRs for healthcare professionals' daily work. To offer such versatile EHR-functionality, our approach is based on the execution of clinical workflows by means of a composition of semantic web-services. The backbone of such architecture is an ontology which enables to represent clinical workflows and facilitates the selection of suitable services. In this paper we present the methods and results after running observations of diabetes routine consultations which were conducted in order to identify those workflows and the relation among the included tasks. Mentioned workflows were first modeled by BPMN and then generalized. As a following step in our study, interviews will be conducted with clinical personnel to validate modeled workflows.
Human Systems Integration (HSI) Associated Development Activities in Japan
2008-06-12
machine learning and data mining methods. The continuous effort ( KAIZEN ) to improve the analysis phases are illustrated in Figure 14. Although there...model Extraction of a workflow Extraction of a control rule Variation analysis and improvement Plant operation KAIZEN Fig. 14
CMS Data Processing Workflows during an Extended Cosmic Ray Run
DOE Office of Scientific and Technical Information (OSTI.GOV)
Not Available
2009-11-01
The CMS Collaboration conducted a month-long data taking exercise, the Cosmic Run At Four Tesla, during October-November 2008, with the goal of commissioning the experiment for extended operation. With all installed detector systems participating, CMS recorded 270 million cosmic ray events with the solenoid at a magnetic field strength of 3.8 T. This paper describes the data flow from the detector through the various online and offline computing systems, as well as the workflows used for recording the data, for aligning and calibrating the detector, and for analysis of the data.
Monitoring of computing resource use of active software releases at ATLAS
NASA Astrophysics Data System (ADS)
Limosani, Antonio; ATLAS Collaboration
2017-10-01
The LHC is the world’s most powerful particle accelerator, colliding protons at centre of mass energy of 13 TeV. As the energy and frequency of collisions has grown in the search for new physics, so too has demand for computing resources needed for event reconstruction. We will report on the evolution of resource usage in terms of CPU and RAM in key ATLAS offline reconstruction workflows at the TierO at CERN and on the WLCG. Monitoring of workflows is achieved using the ATLAS PerfMon package, which is the standard ATLAS performance monitoring system running inside Athena jobs. Systematic daily monitoring has recently been expanded to include all workflows beginning at Monte Carlo generation through to end-user physics analysis, beyond that of event reconstruction. Moreover, the move to a multiprocessor mode in production jobs has facilitated the use of tools, such as “MemoryMonitor”, to measure the memory shared across processors in jobs. Resource consumption is broken down into software domains and displayed in plots generated using Python visualization libraries and collected into pre-formatted auto-generated Web pages, which allow the ATLAS developer community to track the performance of their algorithms. This information is however preferentially filtered to domain leaders and developers through the use of JIRA and via reports given at ATLAS software meetings. Finally, we take a glimpse of the future by reporting on the expected CPU and RAM usage in benchmark workflows associated with the High Luminosity LHC and anticipate the ways performance monitoring will evolve to understand and benchmark future workflows.
Maserat, Elham; Seied Farajollah, Seiede Sedigheh; Safdari, Reza; Ghazisaeedi, Marjan; Aghdaei, Hamid Asadzadeh; Zali, Mohammad Reza
2015-01-01
Colorectal cancer is a major cause of morbidity and mortality throughout the world. Colorectal cancer screening is an optimal way for reducing of morbidity and mortality and a clinical decision support system (CDSS) plays an important role in predicting success of screening processes. DSS is a computer-based information system that improves the delivery of preventive care services. The aim of this article was to detail engineering of information requirements and work flow design of CDSS for a colorectal cancer screening program. In the first stage a screening minimum data set was determined. Developed and developing countries were analyzed for identifying this data set. Then information deficiencies and gaps were determined by check list. The second stage was a qualitative survey with a semi-structured interview as the study tool. A total of 15 users and stakeholders' perspectives about workflow of CDSS were studied. Finally workflow of DSS of control program was designed by standard clinical practice guidelines and perspectives. Screening minimum data set of national colorectal cancer screening program was defined in five sections, including colonoscopy data set, surgery, pathology, genetics and pedigree data set. Deficiencies and information gaps were analyzed. Then we designed a work process standard of screening. Finally workflow of DSS and entry stage were determined. A CDSS facilitates complex decision making for screening and has key roles in designing optimal interactions between colonoscopy, pathology and laboratory departments. Also workflow analysis is useful to identify data reconciliation strategies to address documentation gaps. Following recommendations of CDSS should improve quality of colorectal cancer screening.
NASA Astrophysics Data System (ADS)
Vallet, A.; Bertrand, C.; Fabbri, O.; Mudry, J.
2015-01-01
Pore water pressure build-up by recharge of underground hydrosystems is one of the main triggering factors of deep-seated landslides. In most deep-seated landslides, pore water pressure data are not available since piezometers, if any, have a very short lifespan because of slope movements. As a consequence, indirect parameters, such as the calculated recharge, are the only data which enable understanding landslide hydrodynamic behaviour. However, in landslide studies, methods and recharge-area parameters used to determine the groundwater recharge are rarely detailed. In this study, the groundwater recharge is estimated with a soil-water balance based on characterisation of evapotranspiration and parameters characterising the recharge area (soil available water capacity, runoff and vegetation coefficient). A workflow to compute daily groundwater recharge is developed. This workflow requires the records of precipitation, air temperature, relative humidity, solar radiation and wind speed within or close to the landslide area. The determination of the parameters of the recharge area is based on a spatial analysis requiring field observations and spatial data sets (digital elevation models, aerial photographs and geological maps). This study demonstrates that the performance of the correlation with landslide displacement velocity data is significantly improved using the recharge estimated with the proposed workflow. The coefficient of determination obtained with the recharge estimated with the proposed workflow is 78% higher on average than that obtained with precipitation, and is 38% higher on average than that obtained with recharge computed with a commonly used simplification in landslide studies (recharge = precipitation minus non-calibrated evapotranspiration method).
Winkler, Robert
2015-01-01
In biological mass spectrometry, crude instrumental data need to be converted into meaningful theoretical models. Several data processing and data evaluation steps are required to come to the final results. These operations are often difficult to reproduce, because of too specific computing platforms. This effect, known as 'workflow decay', can be diminished by using a standardized informatic infrastructure. Thus, we compiled an integrated platform, which contains ready-to-use tools and workflows for mass spectrometry data analysis. Apart from general unit operations, such as peak picking and identification of proteins and metabolites, we put a strong emphasis on the statistical validation of results and Data Mining. MASSyPup64 includes e.g., the OpenMS/TOPPAS framework, the Trans-Proteomic-Pipeline programs, the ProteoWizard tools, X!Tandem, Comet and SpiderMass. The statistical computing language R is installed with packages for MS data analyses, such as XCMS/metaXCMS and MetabR. The R package Rattle provides a user-friendly access to multiple Data Mining methods. Further, we added the non-conventional spreadsheet program teapot for editing large data sets and a command line tool for transposing large matrices. Individual programs, console commands and modules can be integrated using the Workflow Management System (WMS) taverna. We explain the useful combination of the tools by practical examples: (1) A workflow for protein identification and validation, with subsequent Association Analysis of peptides, (2) Cluster analysis and Data Mining in targeted Metabolomics, and (3) Raw data processing, Data Mining and identification of metabolites in untargeted Metabolomics. Association Analyses reveal relationships between variables across different sample sets. We present its application for finding co-occurring peptides, which can be used for target proteomics, the discovery of alternative biomarkers and protein-protein interactions. Data Mining derived models displayed a higher robustness and accuracy for classifying sample groups in targeted Metabolomics than cluster analyses. Random Forest models do not only provide predictive models, which can be deployed for new data sets, but also the variable importance. We demonstrate that the later is especially useful for tracking down significant signals and affected pathways in untargeted Metabolomics. Thus, Random Forest modeling supports the unbiased search for relevant biological features in Metabolomics. Our results clearly manifest the importance of Data Mining methods to disclose non-obvious information in biological mass spectrometry . The application of a Workflow Management System and the integration of all required programs and data in a consistent platform makes the presented data analyses strategies reproducible for non-expert users. The simple remastering process and the Open Source licenses of MASSyPup64 (http://www.bioprocess.org/massypup/) enable the continuous improvement of the system.
Razavi, Morteza; Leigh Anderson, N; Pope, Matthew E; Yip, Richard; Pearson, Terry W
2016-09-25
Efficient robotic workflows for trypsin digestion of human plasma and subsequent antibody-mediated peptide enrichment (the SISCAPA method) were developed with the goal of improving assay precision and throughput for multiplexed protein biomarker quantification. First, an 'addition only' tryptic digestion protocol was simplified from classical methods, eliminating the need for sample cleanup, while improving reproducibility, scalability and cost. Second, methods were developed to allow multiplexed enrichment and quantification of peptide surrogates of protein biomarkers representing a very broad range of concentrations and widely different molecular masses in human plasma. The total workflow coefficients of variation (including the 3 sequential steps of digestion, SISCAPA peptide enrichment and mass spectrometric analysis) for 5 proteotypic peptides measured in 6 replicates of each of 6 different samples repeated over 6 days averaged 3.4% within-run and 4.3% across all runs. An experiment to identify sources of variation in the workflow demonstrated that MRM measurement and tryptic digestion steps each had average CVs of ∼2.7%. Because of the high purity of the peptide analytes enriched by antibody capture, the liquid chromatography step is minimized and in some cases eliminated altogether, enabling throughput levels consistent with requirements of large biomarker and clinical studies. Copyright © 2016 Elsevier B.V. All rights reserved.
Mahieu, Nathaniel G.; Spalding, Jonathan L.; Patti, Gary J.
2016-01-01
Motivation: Current informatic techniques for processing raw chromatography/mass spectrometry data break down under several common, non-ideal conditions. Importantly, hydrophilic liquid interaction chromatography (a key separation technology for metabolomics) produces data which are especially challenging to process. We identify three critical points of failure in current informatic workflows: compound specific drift, integration region variance, and naive missing value imputation. We implement the Warpgroup algorithm to address these challenges. Results: Warpgroup adds peak subregion detection, consensus integration bound detection, and intelligent missing value imputation steps to the conventional informatic workflow. When compared with the conventional workflow, Warpgroup made major improvements to the processed data. The coefficient of variation for peaks detected in replicate injections of a complex Escherichia Coli extract were halved (a reduction of 19%). Integration regions across samples were much more robust. Additionally, many signals lost by the conventional workflow were ‘rescued’ by the Warpgroup refinement, thereby resulting in greater analyte coverage in the processed data. Availability and implementation: Warpgroup is an open source R package available on GitHub at github.com/nathaniel-mahieu/warpgroup. The package includes example data and XCMS compatibility wrappers for ease of use. Supplementary information: Supplementary data are available at Bioinformatics online. Contact: nathaniel.mahieu@wustl.edu or gjpattij@wustl.edu PMID:26424859
A workflow to investigate exposure and pharmacokinetic ...
Adverse outcome pathways (AOP) link known population outcomes to a molecular initiating event (MIE) that can be quantified using high-throughput in vitro methods. Practical application of AOPs in chemical-specific risk assessment requires consideration of exposure and absorption, distribution, metabolism, excretion (ADME) properties of chemicals. We developed a conceptual workflow to consider exposure and ADME properties in relationship to an MIE and demonstrated the utility of this workflow using a previously established AOP, acetylcholinesterase (AChE) inhibition. Thirty active chemicals found to inhibit AChE in the ToxCastTM assay were examined with respect to their exposure and absorption potentials, and their ability to cross the blood-brain barrier. Structural similarities of active compounds were compared against structures of inactive compounds to detect possible non-active parents that might have active metabolites. Fifty-two of the 1,029 inactive compounds exhibited a similarity threshold above 75% with their nearest active neighbors. Excluding compounds that may not be absorbed, 22 could be potentially toxic following metabolism. The incorporation of exposure and ADME properties into the conceptual workflow resulted in prioritization of 20 out of 30 active compounds identified in an AChE inhibition assay for further analysis, along with identification of several inactive parent compounds of active metabolites. This qualitative approach can minimize co
Business process re-engineering a cardiology department.
Bakshi, Syed Murtuza Hussain
2014-01-01
The health care sector is the world's third largest industry and is facing several problems such as excessive waiting times for patients, lack of access to information, high costs of delivery and medical errors. Health care managers seek the help of process re-engineering methods to discover the best processes and to re-engineer existing processes to optimize productivity without compromising on quality. Business process re-engineering refers to the fundamental rethinking and radical redesign of business processes to achieve dramatic improvements in critical, contemporary measures of performance, such as cost, quality and speed. The present study is carried out at a tertiary care corporate hospital with 1000-plus-bed facility. A descriptive study and case study method is used with intensive, careful and complete observation of patient flow, delays, short comings in patient movement and workflow. Data is collected through observations, informal interviews and analyzed by matrix analysis. Flowcharts were drawn for the various work activities of the cardiology department including workflow of the admission process, workflow in the ward and ICCU, workflow of the patient for catheterization laboratory procedure, and in the billing and discharge process. The problems of the existing system were studied and necessary suggestions were recommended to cardiology department module with an illustrated flowchart.
The BioExtract Server: a web-based bioinformatic workflow platform
Lushbough, Carol M.; Jennewein, Douglas M.; Brendel, Volker P.
2011-01-01
The BioExtract Server (bioextract.org) is an open, web-based system designed to aid researchers in the analysis of genomic data by providing a platform for the creation of bioinformatic workflows. Scientific workflows are created within the system by recording tasks performed by the user. These tasks may include querying multiple, distributed data sources, saving query results as searchable data extracts, and executing local and web-accessible analytic tools. The series of recorded tasks can then be saved as a reproducible, sharable workflow available for subsequent execution with the original or modified inputs and parameter settings. Integrated data resources include interfaces to the National Center for Biotechnology Information (NCBI) nucleotide and protein databases, the European Molecular Biology Laboratory (EMBL-Bank) non-redundant nucleotide database, the Universal Protein Resource (UniProt), and the UniProt Reference Clusters (UniRef) database. The system offers access to numerous preinstalled, curated analytic tools and also provides researchers with the option of selecting computational tools from a large list of web services including the European Molecular Biology Open Software Suite (EMBOSS), BioMoby, and the Kyoto Encyclopedia of Genes and Genomes (KEGG). The system further allows users to integrate local command line tools residing on their own computers through a client-side Java applet. PMID:21546552
Using R in Taverna: RShell v1.2
Wassink, Ingo; Rauwerda, Han; Neerincx, Pieter BT; Vet, Paul E van der; Breit, Timo M; Leunissen, Jack AM; Nijholt, Anton
2009-01-01
Background R is the statistical language commonly used by many life scientists in (omics) data analysis. At the same time, these complex analyses benefit from a workflow approach, such as used by the open source workflow management system Taverna. However, Taverna had limited support for R, because it supported just a few data types and only a single output. Also, there was no support for graphical output and persistent sessions. Altogether this made using R in Taverna impractical. Findings We have developed an R plugin for Taverna: RShell, which provides R functionality within workflows designed in Taverna. In order to fully support the R language, our RShell plugin directly uses the R interpreter. The RShell plugin consists of a Taverna processor for R scripts and an RShell Session Manager that communicates with the R server. We made the RShell processor highly configurable allowing the user to define multiple inputs and outputs. Also, various data types are supported, such as strings, numeric data and images. To limit data transport between multiple RShell processors, the RShell plugin also supports persistent sessions. Here, we will describe the architecture of RShell and the new features that are introduced in version 1.2, i.e.: i) Support for R up to and including R version 2.9; ii) Support for persistent sessions to limit data transfer; iii) Support for vector graphics output through PDF; iv)Syntax highlighting of the R code; v) Improved usability through fewer port types. Our new RShell processor is backwards compatible with workflows that use older versions of the RShell processor. We demonstrate the value of the RShell processor by a use-case workflow that maps oligonucleotide probes designed with DNA sequence information from Vega onto the Ensembl genome assembly. Conclusion Our RShell plugin enables Taverna users to employ R scripts within their workflows in a highly configurable way. PMID:19607662
Emergency Medicine Resident Physicians’ Perceptions of Electronic Documentation and Workflow
Neri, P.M.; Redden, L.; Poole, S.; Pozner, C.N.; Horsky, J.; Raja, A.S.; Poon, E.; Schiff, G.
2015-01-01
Summary Objective To understand emergency department (ED) physicians’ use of electronic documentation in order to identify usability and workflow considerations for the design of future ED information system (EDIS) physician documentation modules. Methods We invited emergency medicine resident physicians to participate in a mixed methods study using task analysis and qualitative interviews. Participants completed a simulated, standardized patient encounter in a medical simulation center while documenting in the test environment of a currently used EDIS. We recorded the time on task, type and sequence of tasks performed by the participants (including tasks performed in parallel). We then conducted semi-structured interviews with each participant. We analyzed these qualitative data using the constant comparative method to generate themes. Results Eight resident physicians participated. The simulation session averaged 17 minutes and participants spent 11 minutes on average on tasks that included electronic documentation. Participants performed tasks in parallel, such as history taking and electronic documentation. Five of the 8 participants performed a similar workflow sequence during the first part of the session while the remaining three used different workflows. Three themes characterize electronic documentation: (1) physicians report that location and timing of documentation varies based on patient acuity and workload, (2) physicians report a need for features that support improved efficiency; and (3) physicians like viewing available patient data but struggle with integration of the EDIS with other information sources. Conclusion We confirmed that physicians spend much of their time on documentation (65%) during an ED patient visit. Further, we found that resident physicians did not all use the same workflow and approach even when presented with an identical standardized patient scenario. Future EHR design should consider these varied workflows while trying to optimize efficiency, such as improving integration of clinical data. These findings should be tested quantitatively in a larger, representative study. PMID:25848411
Esdar, Moritz; Hübner, Ursula; Liebe, Jan-David; Hüsers, Jens; Thye, Johannes
2017-01-01
Clinical information logistics is a construct that aims to describe and explain various phenomena of information provision to drive clinical processes. It can be measured by the workflow composite score, an aggregated indicator of the degree of IT support in clinical processes. This study primarily aimed to investigate the yet unknown empirical patterns constituting this construct. The second goal was to derive a data-driven weighting scheme for the constituents of the workflow composite score and to contrast this scheme with a literature based, top-down procedure. This approach should finally test the validity and robustness of the workflow composite score. Based on secondary data from 183 German hospitals, a tiered factor analytic approach (confirmatory and subsequent exploratory factor analysis) was pursued. A weighting scheme, which was based on factor loadings obtained in the analyses, was put into practice. We were able to identify five statistically significant factors of clinical information logistics that accounted for 63% of the overall variance. These factors were "flow of data and information", "mobility", "clinical decision support and patient safety", "electronic patient record" and "integration and distribution". The system of weights derived from the factor loadings resulted in values for the workflow composite score that differed only slightly from the score values that had been previously published based on a top-down approach. Our findings give insight into the internal composition of clinical information logistics both in terms of factors and weights. They also allowed us to propose a coherent model of clinical information logistics from a technical perspective that joins empirical findings with theoretical knowledge. Despite the new scheme of weights applied to the calculation of the workflow composite score, the score behaved robustly, which is yet another hint of its validity and therefore its usefulness. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Thanki, Anil S; Soranzo, Nicola; Haerty, Wilfried; Davey, Robert P
2018-03-01
Gene duplication is a major factor contributing to evolutionary novelty, and the contraction or expansion of gene families has often been associated with morphological, physiological, and environmental adaptations. The study of homologous genes helps us to understand the evolution of gene families. It plays a vital role in finding ancestral gene duplication events as well as identifying genes that have diverged from a common ancestor under positive selection. There are various tools available, such as MSOAR, OrthoMCL, and HomoloGene, to identify gene families and visualize syntenic information between species, providing an overview of syntenic regions evolution at the family level. Unfortunately, none of them provide information about structural changes within genes, such as the conservation of ancestral exon boundaries among multiple genomes. The Ensembl GeneTrees computational pipeline generates gene trees based on coding sequences, provides details about exon conservation, and is used in the Ensembl Compara project to discover gene families. A certain amount of expertise is required to configure and run the Ensembl Compara GeneTrees pipeline via command line. Therefore, we converted this pipeline into a Galaxy workflow, called GeneSeqToFamily, and provided additional functionality. This workflow uses existing tools from the Galaxy ToolShed, as well as providing additional wrappers and tools that are required to run the workflow. GeneSeqToFamily represents the Ensembl GeneTrees pipeline as a set of interconnected Galaxy tools, so they can be run interactively within the Galaxy's user-friendly workflow environment while still providing the flexibility to tailor the analysis by changing configurations and tools if necessary. Additional tools allow users to subsequently visualize the gene families produced by the workflow, using the Aequatus.js interactive tool, which has been developed as part of the Aequatus software project.
NASA Astrophysics Data System (ADS)
Clements, Oliver; Walker, Peter
2014-05-01
The cost of working with extremely large data sets is an increasingly important issue within the Earth Observation community. From global coverage data at any resolution to small coverage data at extremely high resolution, the community has always produced big data. This will only increase as new sensors are deployed and their data made available. Over time standard workflows have emerged. These have been facilitated by the production and adoption of standard technologies. Groups such as the International Organisation for Standardisation (ISO) and the Open Geospatial Consortium (OGC) have been a driving force in this area for many years. The production of standard protocols and interfaces such as OPeNDAP, Web Coverage Service (WCS), Web Processing Service (WPS) and the newer emerging standards such as Web Coverage Processing Service (WCPS) have helped to galvanise these workflows. An example of a traditional workflow, assume a researcher wants to assess the temporal trend in chlorophyll concentration. This would involve a discovery phase, an acquisition phase, a processing phase and finally a derived product or analysis phase. Each element of this workflow has an associated temporal and monetary cost. Firstly the researcher would require a high bandwidth connection or the acquisition phase would take too long. Secondly the researcher must have their own expensive equipment for use in the processing phase. Both of these elements cost money and time. This can make the whole process prohibitive to scientists from the developing world or "citizen scientists" that do not have the processing infrastructure necessary. The use of emerging technologies can help improve both the monetary and time costs associated with these existing workflows. By utilising a WPS that is hosted at the same location as the data a user is able to apply processing to the data without needing their own processing infrastructure. This however limits the user to predefined processes that are made available by the data provider. The emerging OGC WCPS standard combined with big data analytics engines may provide a mechanism to improve this situation. The technology allows users to create their own queries using an SQL like query language and apply them over available large data archive, once again at the data providers end. This not only removes the processing cost whilst still allowing user defined processes it also reduces the bandwidth required, as only the final analysis or derived product needs to be downloaded. The maturity of the new technologies is a stage where their use should be justified by a quantitative assessment rather than simply by the fact that they are new developments. We will present a study of the time and cost requirements for a selection of existing workflows and then show how new/emerging standards and technologies can help to both reduce the cost to the user by shifting processing to the data, and reducing the required bandwidth for analysing large datasets, making analysis of big-data archives possible for a greater and more diverse audience.
Scidac-Data: Enabling Data Driven Modeling of Exascale Computing
Mubarak, Misbah; Ding, Pengfei; Aliaga, Leo; ...
2017-11-23
Here, the SciDAC-Data project is a DOE-funded initiative to analyze and exploit two decades of information and analytics that have been collected by the Fermilab data center on the organization, movement, and consumption of high energy physics (HEP) data. The project analyzes the analysis patterns and data organization that have been used by NOvA, MicroBooNE, MINERvA, CDF, D0, and other experiments to develop realistic models of HEP analysis workflows and data processing. The SciDAC-Data project aims to provide both realistic input vectors and corresponding output data that can be used to optimize and validate simulations of HEP analysis. These simulationsmore » are designed to address questions of data handling, cache optimization, and workflow structures that are the prerequisites for modern HEP analysis chains to be mapped and optimized to run on the next generation of leadership-class exascale computing facilities. We present the use of a subset of the SciDAC-Data distributions, acquired from analysis of approximately 71,000 HEP workflows run on the Fermilab data center and corresponding to over 9 million individual analysis jobs, as the input to detailed queuing simulations that model the expected data consumption and caching behaviors of the work running in high performance computing (HPC) and high throughput computing (HTC) environments. In particular we describe how the Sequential Access via Metadata (SAM) data-handling system in combination with the dCache/Enstore-based data archive facilities has been used to develop radically different models for analyzing the HEP data. We also show how the simulations may be used to assess the impact of design choices in archive facilities.« less
Scidac-Data: Enabling Data Driven Modeling of Exascale Computing
NASA Astrophysics Data System (ADS)
Mubarak, Misbah; Ding, Pengfei; Aliaga, Leo; Tsaris, Aristeidis; Norman, Andrew; Lyon, Adam; Ross, Robert
2017-10-01
The SciDAC-Data project is a DOE-funded initiative to analyze and exploit two decades of information and analytics that have been collected by the Fermilab data center on the organization, movement, and consumption of high energy physics (HEP) data. The project analyzes the analysis patterns and data organization that have been used by NOvA, MicroBooNE, MINERvA, CDF, D0, and other experiments to develop realistic models of HEP analysis workflows and data processing. The SciDAC-Data project aims to provide both realistic input vectors and corresponding output data that can be used to optimize and validate simulations of HEP analysis. These simulations are designed to address questions of data handling, cache optimization, and workflow structures that are the prerequisites for modern HEP analysis chains to be mapped and optimized to run on the next generation of leadership-class exascale computing facilities. We present the use of a subset of the SciDAC-Data distributions, acquired from analysis of approximately 71,000 HEP workflows run on the Fermilab data center and corresponding to over 9 million individual analysis jobs, as the input to detailed queuing simulations that model the expected data consumption and caching behaviors of the work running in high performance computing (HPC) and high throughput computing (HTC) environments. In particular we describe how the Sequential Access via Metadata (SAM) data-handling system in combination with the dCache/Enstore-based data archive facilities has been used to develop radically different models for analyzing the HEP data. We also show how the simulations may be used to assess the impact of design choices in archive facilities.
Lin, Chi-Hung; Krisp, Christoph; Packer, Nicolle H; Molloy, Mark P
2018-02-10
Glycoproteomics investigates glycan moieties in a site specific manner to reveal the functional roles of protein glycosylation. Identification of glycopeptides from data-dependent acquisition (DDA) relies on high quality MS/MS spectra of glycopeptide precursors and often requires manual validation to ensure confident assignments. In this study, we investigated pseudo-MRM (MRM-HR) and data-independent acquisition (DIA) as alternative acquisition strategies for glycopeptide analysis. These approaches allow data acquisition over the full MS/MS scan range allowing data re-analysis post-acquisition, without data re-acquisition. The advantage of MRM-HR over DDA for N-glycopeptide detection was demonstrated from targeted analysis of bovine fetuin where all three N-glycosylation sites were detected, which was not the case with DDA. To overcome the duty cycle limitation of MRM-HR acquisition needed for analysis of complex samples such as plasma we trialed DIA. This allowed development of a targeted DIA method to identify N-glycopeptides without pre-defined knowledge of the glycan composition, thus providing the potential to identify N-glycopeptides with unexpected structures. This workflow was demonstrated by detection of 59 N-glycosylation sites from 41 glycoproteins from a HILIC enriched human plasma tryptic digest. 21 glycoforms of IgG1 glycopeptides were identified including two truncated structures that are rarely reported. We developed a data-independent mass spectrometry workflow to identify specific glycopeptides from complex biological mixtures. The novelty is that this approach does not require glycan composition to be pre-defined, thereby allowing glycopeptides carrying unexpected glycans to be identified. This is demonstrated through the analysis of immunoglobulins in human plasma where we detected two IgG1 glycoforms that are rarely observed. Copyright © 2017 Elsevier B.V. All rights reserved.
Scidac-Data: Enabling Data Driven Modeling of Exascale Computing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mubarak, Misbah; Ding, Pengfei; Aliaga, Leo
Here, the SciDAC-Data project is a DOE-funded initiative to analyze and exploit two decades of information and analytics that have been collected by the Fermilab data center on the organization, movement, and consumption of high energy physics (HEP) data. The project analyzes the analysis patterns and data organization that have been used by NOvA, MicroBooNE, MINERvA, CDF, D0, and other experiments to develop realistic models of HEP analysis workflows and data processing. The SciDAC-Data project aims to provide both realistic input vectors and corresponding output data that can be used to optimize and validate simulations of HEP analysis. These simulationsmore » are designed to address questions of data handling, cache optimization, and workflow structures that are the prerequisites for modern HEP analysis chains to be mapped and optimized to run on the next generation of leadership-class exascale computing facilities. We present the use of a subset of the SciDAC-Data distributions, acquired from analysis of approximately 71,000 HEP workflows run on the Fermilab data center and corresponding to over 9 million individual analysis jobs, as the input to detailed queuing simulations that model the expected data consumption and caching behaviors of the work running in high performance computing (HPC) and high throughput computing (HTC) environments. In particular we describe how the Sequential Access via Metadata (SAM) data-handling system in combination with the dCache/Enstore-based data archive facilities has been used to develop radically different models for analyzing the HEP data. We also show how the simulations may be used to assess the impact of design choices in archive facilities.« less
Design and implementation of workflow engine for service-oriented architecture
NASA Astrophysics Data System (ADS)
Peng, Shuqing; Duan, Huining; Chen, Deyun
2009-04-01
As computer network is developed rapidly and in the situation of the appearance of distribution specialty in enterprise application, traditional workflow engine have some deficiencies, such as complex structure, bad stability, poor portability, little reusability and difficult maintenance. In this paper, in order to improve the stability, scalability and flexibility of workflow management system, a four-layer architecture structure of workflow engine based on SOA is put forward according to the XPDL standard of Workflow Management Coalition, the route control mechanism in control model is accomplished and the scheduling strategy of cyclic routing and acyclic routing is designed, and the workflow engine which adopts the technology such as XML, JSP, EJB and so on is implemented.
The Spinel Explorer--Interactive Visual Analysis of Spinel Group Minerals.
Luján Ganuza, María; Ferracutti, Gabriela; Gargiulo, María Florencia; Castro, Silvia Mabel; Bjerg, Ernesto; Gröller, Eduard; Matković, Krešimir
2014-12-01
Geologists usually deal with rocks that are up to several thousand million years old. They try to reconstruct the tectonic settings where these rocks were formed and the history of events that affected them through the geological time. The spinel group minerals provide useful information regarding the geological environment in which the host rocks were formed. They constitute excellent indicators of geological environments (tectonic settings) and are of invaluable help in the search for mineral deposits of economic interest. The current workflow requires the scientists to work with different applications to analyze spine data. They do use specific diagrams, but these are usually not interactive. The current workflow hinders domain experts to fully exploit the potentials of tediously and expensively collected data. In this paper, we introduce the Spinel Explorer-an interactive visual analysis application for spinel group minerals. The design of the Spinel Explorer and of the newly introduced interactions is a result of a careful study of geologists' tasks. The Spinel Explorer includes most of the diagrams commonly used for analyzing spinel group minerals, including 2D binary plots, ternary plots, and 3D Spinel prism plots. Besides specific plots, conventional information visualization views are also integrated in the Spinel Explorer. All views are interactive and linked. The Spinel Explorer supports conventional statistics commonly used in spinel minerals exploration. The statistics views and different data derivation techniques are fully integrated in the system. Besides the Spinel Explorer as newly proposed interactive exploration system, we also describe the identified analysis tasks, and propose a new workflow. We evaluate the Spinel Explorer using real-life data from two locations in Argentina: the Frontal Cordillera in Central Andes and Patagonia. We describe the new findings of the geologists which would have been much more difficult to achieve using the current workflow only. Very positive feedback from geologists confirms the usefulness of the Spinel Explorer.
Laboratory Workflow Analysis of Culture of Periprosthetic Tissues in Blood Culture Bottles.
Peel, Trisha N; Sedarski, John A; Dylla, Brenda L; Shannon, Samantha K; Amirahmadi, Fazlollaah; Hughes, John G; Cheng, Allen C; Patel, Robin
2017-09-01
Culture of periprosthetic tissue specimens in blood culture bottles is more sensitive than conventional techniques, but the impact on laboratory workflow has yet to be addressed. Herein, we examined the impact of culture of periprosthetic tissues in blood culture bottles on laboratory workflow and cost. The workflow was process mapped, decision tree models were constructed using probabilities of positive and negative cultures drawn from our published study (T. N. Peel, B. L. Dylla, J. G. Hughes, D. T. Lynch, K. E. Greenwood-Quaintance, A. C. Cheng, J. N. Mandrekar, and R. Patel, mBio 7:e01776-15, 2016, https://doi.org/10.1128/mBio.01776-15), and the processing times and resource costs from the laboratory staff time viewpoint were used to compare periprosthetic tissues culture processes using conventional techniques with culture in blood culture bottles. Sensitivity analysis was performed using various rates of positive cultures. Annualized labor savings were estimated based on salary costs from the U.S. Labor Bureau for Laboratory staff. The model demonstrated a 60.1% reduction in mean total staff time with the adoption of tissue inoculation into blood culture bottles compared to conventional techniques (mean ± standard deviation, 30.7 ± 27.6 versus 77.0 ± 35.3 h per month, respectively; P < 0.001). The estimated annualized labor cost savings of culture using blood culture bottles was $10,876.83 (±$337.16). Sensitivity analysis was performed using various rates of culture positivity (5 to 50%). Culture in blood culture bottles was cost-effective, based on the estimated labor cost savings of $2,132.71 for each percent increase in test accuracy. In conclusion, culture of periprosthetic tissue in blood culture bottles is not only more accurate than but is also cost-saving compared to conventional culture methods. Copyright © 2017 American Society for Microbiology.
A three-level atomicity model for decentralized workflow management systems
NASA Astrophysics Data System (ADS)
Ben-Shaul, Israel Z.; Heineman, George T.
1996-12-01
A workflow management system (WFMS) employs a workflow manager (WM) to execute and automate the various activities within a workflow. To protect the consistency of data, the WM encapsulates each activity with a transaction; a transaction manager (TM) then guarantees the atomicity of activities. Since workflows often group several activities together, the TM is responsible for guaranteeing the atomicity of these units. There are scalability issues, however, with centralized WFMSs. Decentralized WFMSs provide an architecture for multiple autonomous WFMSs to interoperate, thus accommodating multiple workflows and geographically-dispersed teams. When atomic units are composed of activities spread across multiple WFMSs, however, there is a conflict between global atomicity and local autonomy of each WFMS. This paper describes a decentralized atomicity model that enables workflow administrators to specify the scope of multi-site atomicity based upon the desired semantics of multi-site tasks in the decentralized WFMS. We describe an architecture that realizes our model and execution paradigm.
A Tool Supporting Collaborative Data Analytics Workflow Design and Management
NASA Astrophysics Data System (ADS)
Zhang, J.; Bao, Q.; Lee, T. J.
2016-12-01
Collaborative experiment design could significantly enhance the sharing and adoption of the data analytics algorithms and models emerged in Earth science. Existing data-oriented workflow tools, however, are not suitable to support collaborative design of such a workflow, to name a few, to support real-time co-design; to track how a workflow evolves over time based on changing designs contributed by multiple Earth scientists; and to capture and retrieve collaboration knowledge on workflow design (discussions that lead to a design). To address the aforementioned challenges, we have designed and developed a technique supporting collaborative data-oriented workflow composition and management, as a key component toward supporting big data collaboration through the Internet. Reproducibility and scalability are two major targets demanding fundamental infrastructural support. One outcome of the project os a software tool, supporting an elastic number of groups of Earth scientists to collaboratively design and compose data analytics workflows through the Internet. Instead of recreating the wheel, we have extended an existing workflow tool VisTrails into an online collaborative environment as a proof of concept.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bowen, Benjamin; Ruebel, Oliver; Fischer, Curt Fischer R.
BASTet is an advanced software library written in Python. BASTet serves as the analysis and storage library for the OpenMSI project. BASTet is an integrate framework for: i) storage of spectral imaging data, ii) storage of derived analysis data, iii) provenance of analyses, iv) integration and execution of analyses via complex workflows. BASTet implements the API for the HDF5 storage format used by OpenMSI. Analyses that are developed using BASTet benefit from direct integration with storage format, automatic tracking of provenance, and direct integration with command-line and workflow execution tools. BASTet also defines interfaces to enable developers to directly integratemore » their analysis with OpenMSI's web-based viewing infrastruture without having to know OpenMSI. BASTet also provides numerous helper classes and tools to assist with the conversion of data files, ease parallel implementation of analysis algorithms, ease interaction with web-based functions, description methods for data reduction. BASTet also includes detailed developer documentation, user tutorials, iPython notebooks, and other supporting documents.« less
Matches, Mismatches, and Methods: Multiple-View Workflows for Energy Portfolio Analysis.
Brehmer, Matthew; Ng, Jocelyn; Tate, Kevin; Munzner, Tamara
2016-01-01
The energy performance of large building portfolios is challenging to analyze and monitor, as current analysis tools are not scalable or they present derived and aggregated data at too coarse of a level. We conducted a visualization design study, beginning with a thorough work domain analysis and a characterization of data and task abstractions. We describe generalizable visual encoding design choices for time-oriented data framed in terms of matches and mismatches, as well as considerations for workflow design. Our designs address several research questions pertaining to scalability, view coordination, and the inappropriateness of line charts for derived and aggregated data due to a combination of data semantics and domain convention. We also present guidelines relating to familiarity and trust, as well as methodological considerations for visualization design studies. Our designs were adopted by our collaborators and incorporated into the design of an energy analysis software application that will be deployed to tens of thousands of energy workers in their client base.
Välikangas, Tommi; Suomi, Tomi; Elo, Laura L
2017-05-31
Label-free mass spectrometry (MS) has developed into an important tool applied in various fields of biological and life sciences. Several software exist to process the raw MS data into quantified protein abundances, including open source and commercial solutions. Each software includes a set of unique algorithms for different tasks of the MS data processing workflow. While many of these algorithms have been compared separately, a thorough and systematic evaluation of their overall performance is missing. Moreover, systematic information is lacking about the amount of missing values produced by the different proteomics software and the capabilities of different data imputation methods to account for them.In this study, we evaluated the performance of five popular quantitative label-free proteomics software workflows using four different spike-in data sets. Our extensive testing included the number of proteins quantified and the number of missing values produced by each workflow, the accuracy of detecting differential expression and logarithmic fold change and the effect of different imputation and filtering methods on the differential expression results. We found that the Progenesis software performed consistently well in the differential expression analysis and produced few missing values. The missing values produced by the other software decreased their performance, but this difference could be mitigated using proper data filtering or imputation methods. Among the imputation methods, we found that the local least squares (lls) regression imputation consistently increased the performance of the software in the differential expression analysis, and a combination of both data filtering and local least squares imputation increased performance the most in the tested data sets. © The Author 2017. Published by Oxford University Press.
A practical data processing workflow for multi-OMICS projects.
Kohl, Michael; Megger, Dominik A; Trippler, Martin; Meckel, Hagen; Ahrens, Maike; Bracht, Thilo; Weber, Frank; Hoffmann, Andreas-Claudius; Baba, Hideo A; Sitek, Barbara; Schlaak, Jörg F; Meyer, Helmut E; Stephan, Christian; Eisenacher, Martin
2014-01-01
Multi-OMICS approaches aim on the integration of quantitative data obtained for different biological molecules in order to understand their interrelation and the functioning of larger systems. This paper deals with several data integration and data processing issues that frequently occur within this context. To this end, the data processing workflow within the PROFILE project is presented, a multi-OMICS project that aims on identification of novel biomarkers and the development of new therapeutic targets for seven important liver diseases. Furthermore, a software called CrossPlatformCommander is sketched, which facilitates several steps of the proposed workflow in a semi-automatic manner. Application of the software is presented for the detection of novel biomarkers, their ranking and annotation with existing knowledge using the example of corresponding Transcriptomics and Proteomics data sets obtained from patients suffering from hepatocellular carcinoma. Additionally, a linear regression analysis of Transcriptomics vs. Proteomics data is presented and its performance assessed. It was shown, that for capturing profound relations between Transcriptomics and Proteomics data, a simple linear regression analysis is not sufficient and implementation and evaluation of alternative statistical approaches are needed. Additionally, the integration of multivariate variable selection and classification approaches is intended for further development of the software. Although this paper focuses only on the combination of data obtained from quantitative Proteomics and Transcriptomics experiments, several approaches and data integration steps are also applicable for other OMICS technologies. Keeping specific restrictions in mind the suggested workflow (or at least parts of it) may be used as a template for similar projects that make use of different high throughput techniques. This article is part of a Special Issue entitled: Computational Proteomics in the Post-Identification Era. Guest Editors: Martin Eisenacher and Christian Stephan. Copyright © 2013 Elsevier B.V. All rights reserved.
Meyer, Folker; Bagchi, Saurabh; Chaterji, Somali; Gerlach, Wolfgang; Grama, Ananth; Harrison, Travis; Paczian, Tobias; Trimble, William L; Wilke, Andreas
2017-09-26
As technologies change, MG-RAST is adapting. Newly available software is being included to improve accuracy and performance. As a computational service constantly running large volume scientific workflows, MG-RAST is the right location to perform benchmarking and implement algorithmic or platform improvements, in many cases involving trade-offs between specificity, sensitivity and run-time cost. The work in [Glass EM, Dribinsky Y, Yilmaz P, et al. ISME J 2014;8:1-3] is an example; we use existing well-studied data sets as gold standards representing different environments and different technologies to evaluate any changes to the pipeline. Currently, we use well-understood data sets in MG-RAST as platform for benchmarking. The use of artificial data sets for pipeline performance optimization has not added value, as these data sets are not presenting the same challenges as real-world data sets. In addition, the MG-RAST team welcomes suggestions for improvements of the workflow. We are currently working on versions 4.02 and 4.1, both of which contain significant input from the community and our partners that will enable double barcoding, stronger inferences supported by longer-read technologies, and will increase throughput while maintaining sensitivity by using Diamond and SortMeRNA. On the technical platform side, the MG-RAST team intends to support the Common Workflow Language as a standard to specify bioinformatics workflows, both to facilitate development and efficient high-performance implementation of the community's data analysis tasks. Published by Oxford University Press on behalf of Entomological Society of America 2017. This work is written by US Government employees and is in the public domain in the US.
Faure, Emmanuel; Savy, Thierry; Rizzi, Barbara; Melani, Camilo; Stašová, Olga; Fabrèges, Dimitri; Špir, Róbert; Hammons, Mark; Čúnderlík, Róbert; Recher, Gaëlle; Lombardot, Benoît; Duloquin, Louise; Colin, Ingrid; Kollár, Jozef; Desnoulez, Sophie; Affaticati, Pierre; Maury, Benoît; Boyreau, Adeline; Nief, Jean-Yves; Calvat, Pascal; Vernier, Philippe; Frain, Monique; Lutfalla, Georges; Kergosien, Yannick; Suret, Pierre; Remešíková, Mariana; Doursat, René; Sarti, Alessandro; Mikula, Karol; Peyriéras, Nadine; Bourgine, Paul
2016-01-01
The quantitative and systematic analysis of embryonic cell dynamics from in vivo 3D+time image data sets is a major challenge at the forefront of developmental biology. Despite recent breakthroughs in the microscopy imaging of living systems, producing an accurate cell lineage tree for any developing organism remains a difficult task. We present here the BioEmergences workflow integrating all reconstruction steps from image acquisition and processing to the interactive visualization of reconstructed data. Original mathematical methods and algorithms underlie image filtering, nucleus centre detection, nucleus and membrane segmentation, and cell tracking. They are demonstrated on zebrafish, ascidian and sea urchin embryos with stained nuclei and membranes. Subsequent validation and annotations are carried out using Mov-IT, a custom-made graphical interface. Compared with eight other software tools, our workflow achieved the best lineage score. Delivered in standalone or web service mode, BioEmergences and Mov-IT offer a unique set of tools for in silico experimental embryology. PMID:26912388
2011-01-01
Background Based on barriers to the use of computerized clinical decision support (CDS) learned in an earlier field study, we prototyped design enhancements to the Veterans Health Administration's (VHA's) colorectal cancer (CRC) screening clinical reminder to compare against the VHA's current CRC reminder. Methods In a controlled simulation experiment, 12 primary care providers (PCPs) used prototypes of the current and redesigned CRC screening reminder in a within-subject comparison. Quantitative measurements were based on a usability survey, workload assessment instrument, and workflow integration survey. We also collected qualitative data on both designs. Results Design enhancements to the VHA's existing CRC screening clinical reminder positively impacted aspects of usability and workflow integration but not workload. The qualitative analysis revealed broad support across participants for the design enhancements with specific suggestions for improving the reminder further. Conclusions This study demonstrates the value of a human-computer interaction evaluation in informing the redesign of information tools to foster uptake, integration into workflow, and use in clinical practice. PMID:22126324
Liaw, Siaw-Teng; Deveny, Elizabeth; Morrison, Iain; Lewis, Bryn
2006-09-01
Using a factorial vignette survey and modeling methodology, we developed clinical and information models - incorporating evidence base, key concepts, relevant terms, decision-making and workflow needed to practice safely and effectively - to guide the development of an integrated rule-based knowledge module to support prescribing decisions in asthma. We identified workflows, decision-making factors, factor use, and clinician information requirements. The Unified Modeling Language (UML) and public domain software and knowledge engineering tools (e.g. Protégé) were used, with the Australian GP Data Model as the starting point for expressing information needs. A Web Services service-oriented architecture approach was adopted within which to express functional needs, and clinical processes and workflows were expressed in the Business Process Execution Language (BPEL). This formal analysis and modeling methodology to define and capture the process and logic of prescribing best practice in a reference implementation is fundamental to tackling deficiencies in prescribing decision support software.
Vernick, Kenneth D.
2017-01-01
Metavisitor is a software package that allows biologists and clinicians without specialized bioinformatics expertise to detect and assemble viral genomes from deep sequence datasets. The package is composed of a set of modular bioinformatic tools and workflows that are implemented in the Galaxy framework. Using the graphical Galaxy workflow editor, users with minimal computational skills can use existing Metavisitor workflows or adapt them to suit specific needs by adding or modifying analysis modules. Metavisitor works with DNA, RNA or small RNA sequencing data over a range of read lengths and can use a combination of de novo and guided approaches to assemble genomes from sequencing reads. We show that the software has the potential for quick diagnosis as well as discovery of viruses from a vast array of organisms. Importantly, we provide here executable Metavisitor use cases, which increase the accessibility and transparency of the software, ultimately enabling biologists or clinicians to focus on biological or medical questions. PMID:28045932
A computational workflow for designing silicon donor qubits
Humble, Travis S.; Ericson, M. Nance; Jakowski, Jacek; ...
2016-09-19
Developing devices that can reliably and accurately demonstrate the principles of superposition and entanglement is an on-going challenge for the quantum computing community. Modeling and simulation offer attractive means of testing early device designs and establishing expectations for operational performance. However, the complex integrated material systems required by quantum device designs are not captured by any single existing computational modeling method. We examine the development and analysis of a multi-staged computational workflow that can be used to design and characterize silicon donor qubit systems with modeling and simulation. Our approach integrates quantum chemistry calculations with electrostatic field solvers to performmore » detailed simulations of a phosphorus dopant in silicon. We show how atomistic details can be synthesized into an operational model for the logical gates that define quantum computation in this particular technology. In conclusion, the resulting computational workflow realizes a design tool for silicon donor qubits that can help verify and validate current and near-term experimental devices.« less
A python framework for environmental model uncertainty analysis
White, Jeremy; Fienen, Michael N.; Doherty, John E.
2016-01-01
We have developed pyEMU, a python framework for Environmental Modeling Uncertainty analyses, open-source tool that is non-intrusive, easy-to-use, computationally efficient, and scalable to highly-parameterized inverse problems. The framework implements several types of linear (first-order, second-moment (FOSM)) and non-linear uncertainty analyses. The FOSM-based analyses can also be completed prior to parameter estimation to help inform important modeling decisions, such as parameterization and objective function formulation. Complete workflows for several types of FOSM-based and non-linear analyses are documented in example notebooks implemented using Jupyter that are available in the online pyEMU repository. Example workflows include basic parameter and forecast analyses, data worth analyses, and error-variance analyses, as well as usage of parameter ensemble generation and management capabilities. These workflows document the necessary steps and provides insights into the results, with the goal of educating users not only in how to apply pyEMU, but also in the underlying theory of applied uncertainty quantification.
Cyberinfrastructure for End-to-End Environmental Explorations
NASA Astrophysics Data System (ADS)
Merwade, V.; Kumar, S.; Song, C.; Zhao, L.; Govindaraju, R.; Niyogi, D.
2007-12-01
The design and implementation of a cyberinfrastructure for End-to-End Environmental Exploration (C4E4) is presented. The C4E4 framework addresses the need for an integrated data/computation platform for studying broad environmental impacts by combining heterogeneous data resources with state-of-the-art modeling and visualization tools. With Purdue being a TeraGrid Resource Provider, C4E4 builds on top of the Purdue TeraGrid data management system and Grid resources, and integrates them through a service-oriented workflow system. It allows researchers to construct environmental workflows for data discovery, access, transformation, modeling, and visualization. Using the C4E4 framework, we have implemented an end-to-end SWAT simulation and analysis workflow that connects our TeraGrid data and computation resources. It enables researchers to conduct comprehensive studies on the impact of land management practices in the St. Joseph watershed using data from various sources in hydrologic, atmospheric, agricultural, and other related disciplines.
Federal Register 2010, 2011, 2012, 2013, 2014
2011-11-21
... Defense Federal Acquisition Regulation Supplement; Updates to Wide Area WorkFlow (DFARS Case 2011-D027... Wide Area WorkFlow (WAWF) and TRICARE Encounter Data System (TEDS). WAWF, which electronically... civil emergencies, when access to Wide Area WorkFlow by those contractors is not feasible; (4) Purchases...
An Auto-management Thesis Program WebMIS Based on Workflow
NASA Astrophysics Data System (ADS)
Chang, Li; Jie, Shi; Weibo, Zhong
An auto-management WebMIS based on workflow for bachelor thesis program is given in this paper. A module used for workflow dispatching is designed and realized using MySQL and J2EE according to the work principle of workflow engine. The module can automatively dispatch the workflow according to the date of system, login information and the work status of the user. The WebMIS changes the management from handwork to computer-work which not only standardizes the thesis program but also keeps the data and documents clean and consistent.
NASA Astrophysics Data System (ADS)
Kintsakis, Athanassios M.; Psomopoulos, Fotis E.; Symeonidis, Andreas L.; Mitkas, Pericles A.
Hermes introduces a new "describe once, run anywhere" paradigm for the execution of bioinformatics workflows in hybrid cloud environments. It combines the traditional features of parallelization-enabled workflow management systems and of distributed computing platforms in a container-based approach. It offers seamless deployment, overcoming the burden of setting up and configuring the software and network requirements. Most importantly, Hermes fosters the reproducibility of scientific workflows by supporting standardization of the software execution environment, thus leading to consistent scientific workflow results and accelerating scientific output.
Integrated Sensitivity Analysis Workflow
DOE Office of Scientific and Technical Information (OSTI.GOV)
Friedman-Hill, Ernest J.; Hoffman, Edward L.; Gibson, Marcus J.
2014-08-01
Sensitivity analysis is a crucial element of rigorous engineering analysis, but performing such an analysis on a complex model is difficult and time consuming. The mission of the DART Workbench team at Sandia National Laboratories is to lower the barriers to adoption of advanced analysis tools through software integration. The integrated environment guides the engineer in the use of these integrated tools and greatly reduces the cycle time for engineering analysis.
Integrating Behavioral Health in Primary Care Using Lean Workflow Analysis: A Case Study
van Eeghen, Constance; Littenberg, Benjamin; Holman, Melissa D.; Kessler, Rodger
2016-01-01
Background Primary care offices are integrating behavioral health (BH) clinicians into their practices. Implementing such a change is complex, difficult, and time consuming. Lean workflow analysis may be an efficient, effective, and acceptable method for integration. Objective Observe BH integration into primary care and measure its impact. Design Prospective, mixed methods case study in a primary care practice. Measurements Change in treatment initiation (referrals generating BH visits within the system). Secondary measures: primary care visits resulting in BH referrals, referrals resulting in scheduled appointments, time from referral to scheduled appointment, and time from referral to first visit. Providers and staff were surveyed on the Lean method. Results Referrals increased from 23 to 37/1000 visits (P<.001). Referrals resulted in more scheduled (60% to 74%, P<.001) and arrived visits (44% to 53%, P=.025). Time from referral to first scheduled visit decreased (Hazard Ratio (HR) 1.60; 95% Confidence Interval (CI) 1.37, 1.88; P<0.001) as did time to first arrived visit (HR 1.36; 95% CI 1.14, 1.62; P=0.001). Surveys and comments were positive. Conclusions This pilot integration of BH showed significant improvements in treatment initiation and other measures. Strengths of Lean included workflow improvement, system perspective, and project success. Further evaluation is indicated. PMID:27170796
The Complete Exosome Workflow Solution: From Isolation to Characterization of RNA Cargo
Schageman, Jeoffrey; Li, Mu; Barta, Tim; Lea, Kristi; Gu, Jian; Magdaleno, Susan; Setterquist, Robert; Vlassov, Alexander V.
2013-01-01
Exosomes are small (30–150 nm) vesicles containing unique RNA and protein cargo, secreted by all cell types in culture. They are also found in abundance in body fluids including blood, saliva, and urine. At the moment, the mechanism of exosome formation, the makeup of the cargo, biological pathways, and resulting functions are incompletely understood. One of their most intriguing roles is intercellular communication—exosomes function as the messengers, delivering various effector or signaling macromolecules between specific cells. There is an exponentially growing need to dissect structure and the function of exosomes and utilize them for development of minimally invasive diagnostics and therapeutics. Critical to further our understanding of exosomes is the development of reagents, tools, and protocols for their isolation, characterization, and analysis of their RNA and protein contents. Here we describe a complete exosome workflow solution, starting from fast and efficient extraction of exosomes from cell culture media and serum to isolation of RNA followed by characterization of exosomal RNA content using qRT-PCR and next-generation sequencing techniques. Effectiveness of this workflow is exemplified by analysis of the RNA content of exosomes derived from HeLa cell culture media and human serum, using Ion Torrent PGM as a sequencing platform. PMID:24205503
A comprehensive quality control workflow for paired tumor-normal NGS experiments.
Schroeder, Christopher M; Hilke, Franz J; Löffler, Markus W; Bitzer, Michael; Lenz, Florian; Sturm, Marc
2017-06-01
Quality control (QC) is an important part of all NGS data analysis stages. Many available tools calculate QC metrics from different analysis steps of single sample experiments (raw reads, mapped reads and variant lists). Multi-sample experiments, as sequencing of tumor-normal pairs, require additional QC metrics to ensure validity of results. These multi-sample QC metrics still lack standardization. We therefore suggest a new workflow for QC of DNA sequencing of tumor-normal pairs. With this workflow well-known single-sample QC metrics and additional metrics specific for tumor-normal pairs can be calculated. The segmentation into different tools offers a high flexibility and allows reuse for other purposes. All tools produce qcML, a generic XML format for QC of -omics experiments. qcML uses quality metrics defined in an ontology, which was adapted for NGS. All QC tools are implemented in C ++ and run both under Linux and Windows. Plotting requires python 2.7 and matplotlib. The software is available under the 'GNU General Public License version 2' as part of the ngs-bits project: https://github.com/imgag/ngs-bits. christopher.schroeder@med.uni-tuebingen.de. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
Xiao, Hongxia; Brinkmann, Markus; Thalmann, Beat; Schiwy, Andreas; Große Brinkhaus, Sigrid; Achten, Christine; Eichbaum, Kathrin; Gembé, Carolin; Seiler, Thomas-Benjamin; Hollert, Henner
2017-03-21
Effect-directed analysis (EDA) is a powerful strategy to identify biologically active compounds in environmental samples. However, in current EDA studies, fractionation and handling procedures are laborious, consist of multiple evaporation steps, and thus bear the risk of contamination and decreased recoveries of the target compounds. The low resulting throughput has been one of the major bottlenecks of EDA. Here, we propose a high-throughput EDA (HT-EDA) work-flow combining reversed phase high-performance liquid chromatography fractionation of samples into 96-well microplates, followed by toxicity assessment in the micro-EROD bioassay with the wild-type rat hepatoma H4IIE cells, and chemical analysis of bioactive fractions. The approach was evaluated using single substances, binary mixtures, and extracts of sediment samples collected at the Three Gorges Reservoir, Yangtze River, China, as well as the rivers Rhine and Elbe, Germany. Selected bioactive fractions were analyzed by highly sensitive gas chromatography-atmospheric pressure laser ionization-time-of-flight-mass spectrometry. In addition, we optimized the work-flow by seeding previously adapted suspension-cultured H4IIE cells directly into the microplate used for fractionation, which makes any transfers of fractionated samples unnecessary. The proposed HT-EDA work-flow simplifies the procedure for wider application in ecotoxicology and environmental routine programs.
DNAseq Workflow in a Diagnostic Context and an Example of a User Friendly Implementation.
Wolf, Beat; Kuonen, Pierre; Dandekar, Thomas; Atlan, David
2015-01-01
Over recent years next generation sequencing (NGS) technologies evolved from costly tools used by very few, to a much more accessible and economically viable technology. Through this recently gained popularity, its use-cases expanded from research environments into clinical settings. But the technical know-how and infrastructure required to analyze the data remain an obstacle for a wider adoption of this technology, especially in smaller laboratories. We present GensearchNGS, a commercial DNAseq software suite distributed by Phenosystems SA. The focus of GensearchNGS is the optimal usage of already existing infrastructure, while keeping its use simple. This is achieved through the integration of existing tools in a comprehensive software environment, as well as custom algorithms developed with the restrictions of limited infrastructures in mind. This includes the possibility to connect multiple computers to speed up computing intensive parts of the analysis such as sequence alignments. We present a typical DNAseq workflow for NGS data analysis and the approach GensearchNGS takes to implement it. The presented workflow goes from raw data quality control to the final variant report. This includes features such as gene panels and the integration of online databases, like Ensembl for annotations or Cafe Variome for variant sharing.
The VERCE platform: Enabling Computational Seismology via Streaming Workflows and Science Gateways
NASA Astrophysics Data System (ADS)
Spinuso, Alessandro; Filgueira, Rosa; Krause, Amrey; Matser, Jonas; Casarotti, Emanuele; Magnoni, Federica; Gemund, Andre; Frobert, Laurent; Krischer, Lion; Atkinson, Malcolm
2015-04-01
The VERCE project is creating an e-Science platform to facilitate innovative data analysis and coding methods that fully exploit the wealth of data in global seismology. One of the technologies developed within the project is the Dispel4Py python library, which allows to describe abstract stream-based workflows for data-intensive applications and to execute them in a distributed environment. At runtime Dispel4Py is able to map workflow descriptions dynamically onto a number of computational resources (Apache Storm clusters, MPI powered clusters, and shared-memory multi-core machines, single-core machines), setting it apart from other workflow frameworks. Therefore, Dispel4Py enables scientists to focus on their computation instead of being distracted by details of the computing infrastructure they use. Among the workflows developed with Dispel4Py in VERCE, we mention here those for Seismic Ambient Noise Cross-Correlation and MISFIT calculation, which address two data-intensive problems that are common in computational seismology. The former, also called Passive Imaging, allows the detection of relative seismic-wave velocity variations during the time of recording, to be associated with the stress-field changes that occurred in the test area. The MISFIT instead, takes as input the synthetic seismograms generated from HPC simulations for a certain Earth model and earthquake and, after a preprocessing stage, compares them with real observations in order to foster subsequent model updates and improvement (Inversion). The VERCE Science Gateway exposes the MISFIT calculation workflow as a service, in combination with the simulation phase. Both phases can be configured, controlled and monitored by the user via a rich user interface which is integrated within the gUSE Science Gateway framework, hiding the complexity of accessing third parties data services, security mechanisms and enactment on the target resources. Thanks to a modular extension to the Dispel4Py framework, the system collects provenance data adopting the W3C-PROV data model. Provenance recordings can be explored and analysed at run time for rapid diagnostic and workflow steering, or later for further validation and comparisons across runs. We will illustrate the interactive services of the gateway and the capabilities of the produced metadata, coupled with the VERCE data management layer based on iRODS. The Cross-Correlation workflow was evaluated on SuperMUC, a supercomputing cluster at the Leibniz Supercomputing Centre in Munich, with 155,656 processor cores in 9400 compute nodes. SuperMUC is based on the Intel Xeon architecture consisting of 18 Thin Node Islands and one Fat Node Island. This work has only had access to the Thin Node Islands, which contain Sandy Bridge nodes, each having 16 cores and 32 GB of memory. In the evaluations we used 1000 stations, and we applied two types of methods (whiten and non-whiten) for pre-processing the data. The workflow was tested on a varying number of cores (16, 32, 64, 128, and 256 cores) using the MPI mapping of Dispel4Py. The results show that Dispel4Py is able to improve the performance by increasing the number of cores without changing the description of the workflow.
A workflow learning model to improve geovisual analytics utility
Roth, Robert E; MacEachren, Alan M; McCabe, Craig A
2011-01-01
Introduction This paper describes the design and implementation of the G-EX Portal Learn Module, a web-based, geocollaborative application for organizing and distributing digital learning artifacts. G-EX falls into the broader context of geovisual analytics, a new research area with the goal of supporting visually-mediated reasoning about large, multivariate, spatiotemporal information. Because this information is unprecedented in amount and complexity, GIScientists are tasked with the development of new tools and techniques to make sense of it. Our research addresses the challenge of implementing these geovisual analytics tools and techniques in a useful manner. Objectives The objective of this paper is to develop and implement a method for improving the utility of geovisual analytics software. The success of software is measured by its usability (i.e., how easy the software is to use?) and utility (i.e., how useful the software is). The usability and utility of software can be improved by refining the software, increasing user knowledge about the software, or both. It is difficult to achieve transparent usability (i.e., software that is immediately usable without training) of geovisual analytics software because of the inherent complexity of the included tools and techniques. In these situations, improving user knowledge about the software through the provision of learning artifacts is as important, if not more so, than iterative refinement of the software itself. Therefore, our approach to improving utility is focused on educating the user. Methodology The research reported here was completed in two steps. First, we developed a model for learning about geovisual analytics software. Many existing digital learning models assist only with use of the software to complete a specific task and provide limited assistance with its actual application. To move beyond task-oriented learning about software use, we propose a process-oriented approach to learning based on the concept of scientific workflows. Second, we implemented an interface in the G-EX Portal Learn Module to demonstrate the workflow learning model. The workflow interface allows users to drag learning artifacts uploaded to the G-EX Portal onto a central whiteboard and then annotate the workflow using text and drawing tools. Once completed, users can visit the assembled workflow to get an idea of the kind, number, and scale of analysis steps, view individual learning artifacts associated with each node in the workflow, and ask questions about the overall workflow or individual learning artifacts through the associated forums. An example learning workflow in the domain of epidemiology is provided to demonstrate the effectiveness of the approach. Results/Conclusions In the context of geovisual analytics, GIScientists are not only responsible for developing software to facilitate visually-mediated reasoning about large and complex spatiotemporal information, but also for ensuring that this software works. The workflow learning model discussed in this paper and demonstrated in the G-EX Portal Learn Module is one approach to improving the utility of geovisual analytics software. While development of the G-EX Portal Learn Module is ongoing, we expect to release the G-EX Portal Learn Module by Summer 2009. PMID:21983545
A workflow learning model to improve geovisual analytics utility.
Roth, Robert E; Maceachren, Alan M; McCabe, Craig A
2009-01-01
INTRODUCTION: This paper describes the design and implementation of the G-EX Portal Learn Module, a web-based, geocollaborative application for organizing and distributing digital learning artifacts. G-EX falls into the broader context of geovisual analytics, a new research area with the goal of supporting visually-mediated reasoning about large, multivariate, spatiotemporal information. Because this information is unprecedented in amount and complexity, GIScientists are tasked with the development of new tools and techniques to make sense of it. Our research addresses the challenge of implementing these geovisual analytics tools and techniques in a useful manner. OBJECTIVES: The objective of this paper is to develop and implement a method for improving the utility of geovisual analytics software. The success of software is measured by its usability (i.e., how easy the software is to use?) and utility (i.e., how useful the software is). The usability and utility of software can be improved by refining the software, increasing user knowledge about the software, or both. It is difficult to achieve transparent usability (i.e., software that is immediately usable without training) of geovisual analytics software because of the inherent complexity of the included tools and techniques. In these situations, improving user knowledge about the software through the provision of learning artifacts is as important, if not more so, than iterative refinement of the software itself. Therefore, our approach to improving utility is focused on educating the user. METHODOLOGY: The research reported here was completed in two steps. First, we developed a model for learning about geovisual analytics software. Many existing digital learning models assist only with use of the software to complete a specific task and provide limited assistance with its actual application. To move beyond task-oriented learning about software use, we propose a process-oriented approach to learning based on the concept of scientific workflows. Second, we implemented an interface in the G-EX Portal Learn Module to demonstrate the workflow learning model. The workflow interface allows users to drag learning artifacts uploaded to the G-EX Portal onto a central whiteboard and then annotate the workflow using text and drawing tools. Once completed, users can visit the assembled workflow to get an idea of the kind, number, and scale of analysis steps, view individual learning artifacts associated with each node in the workflow, and ask questions about the overall workflow or individual learning artifacts through the associated forums. An example learning workflow in the domain of epidemiology is provided to demonstrate the effectiveness of the approach. RESULTS/CONCLUSIONS: In the context of geovisual analytics, GIScientists are not only responsible for developing software to facilitate visually-mediated reasoning about large and complex spatiotemporal information, but also for ensuring that this software works. The workflow learning model discussed in this paper and demonstrated in the G-EX Portal Learn Module is one approach to improving the utility of geovisual analytics software. While development of the G-EX Portal Learn Module is ongoing, we expect to release the G-EX Portal Learn Module by Summer 2009.
Lo, Sheng-Ying; Baird, Geoffrey S; Greene, Dina N
2015-12-07
Proper utilization of resources is an important operational objective for clinical laboratories. To reduce unnecessary manual interventions on automated instruments, we conducted a workflow analysis that optimized dilution parameters and reporting of abnormally high chemistry results for the Beckman AU series of chemistry analyzers while maintaining clinically acceptable reportable ranges. Workflow analysis for the Beckman AU680/5812 and DxC800 chemistry analyzers was performed using historical data. Clinical reportable ranges for 53 chemistry analytes were evaluated. Optimized dilution parameters and upper limit of reportable ranges for the AU680/5812 instruments were derived and validated to meet these reportable ranges. The number of specimens that required manual dilutions before and after optimization was determined for both the AU680/5812 and DxC800, with the DxC800 serving as the reference instrument. Retrospective data analysis revealed that 7700 specimens required manual dilutions on the DxC over a 2-y period. Using our optimized AU-specific dilution and reporting parameters, the data-driven simulation analysis showed a 61% reduction in manual dilutions. For the specimens that required manual dilutions on the AU680/5812, we developed standardized dilution procedures to further streamline workflow. We provide a data-driven, practical outline for clinical laboratories to efficiently optimize their use of automated chemistry analyzers. The outcomes can be used to assist laboratories wishing to improve their existing procedures or to facilitate transitioning into a new line of instrumentation, regardless of the instrument model or manufacturer. Copyright © 2015 Elsevier B.V. All rights reserved.
Surinova, Silvia; Hüttenhain, Ruth; Chang, Ching-Yun; Espona, Lucia; Vitek, Olga; Aebersold, Ruedi
2013-08-01
Targeted proteomics based on selected reaction monitoring (SRM) mass spectrometry is commonly used for accurate and reproducible quantification of protein analytes in complex biological mixtures. Strictly hypothesis-driven, SRM assays quantify each targeted protein by collecting measurements on its peptide fragment ions, called transitions. To achieve sensitive and accurate quantitative results, experimental design and data analysis must consistently account for the variability of the quantified transitions. This consistency is especially important in large experiments, which increasingly require profiling up to hundreds of proteins over hundreds of samples. Here we describe a robust and automated workflow for the analysis of large quantitative SRM data sets that integrates data processing, statistical protein identification and quantification, and dissemination of the results. The integrated workflow combines three software tools: mProphet for peptide identification via probabilistic scoring; SRMstats for protein significance analysis with linear mixed-effect models; and PASSEL, a public repository for storage, retrieval and query of SRM data. The input requirements for the protocol are files with SRM traces in mzXML format, and a file with a list of transitions in a text tab-separated format. The protocol is especially suited for data with heavy isotope-labeled peptide internal standards. We demonstrate the protocol on a clinical data set in which the abundances of 35 biomarker candidates were profiled in 83 blood plasma samples of subjects with ovarian cancer or benign ovarian tumors. The time frame to realize the protocol is 1-2 weeks, depending on the number of replicates used in the experiment.
SWS: accessing SRS sites contents through Web Services.
Romano, Paolo; Marra, Domenico
2008-03-26
Web Services and Workflow Management Systems can support creation and deployment of network systems, able to automate data analysis and retrieval processes in biomedical research. Web Services have been implemented at bioinformatics centres and workflow systems have been proposed for biological data analysis. New databanks are often developed by taking into account these technologies, but many existing databases do not allow a programmatic access. Only a fraction of available databanks can thus be queried through programmatic interfaces. SRS is a well know indexing and search engine for biomedical databanks offering public access to many databanks and analysis tools. Unfortunately, these data are not easily and efficiently accessible through Web Services. We have developed 'SRS by WS' (SWS), a tool that makes information available in SRS sites accessible through Web Services. Information on known sites is maintained in a database, srsdb. SWS consists in a suite of WS that can query both srsdb, for information on sites and databases, and SRS sites. SWS returns results in a text-only format and can be accessed through a WSDL compliant client. SWS enables interoperability between workflow systems and SRS implementations, by also managing access to alternative sites, in order to cope with network and maintenance problems, and selecting the most up-to-date among available systems. Development and implementation of Web Services, allowing to make a programmatic access to an exhaustive set of biomedical databases can significantly improve automation of in-silico analysis. SWS supports this activity by making biological databanks that are managed in public SRS sites available through a programmatic interface.
Kullolli, Majlinda; Rock, Dan A; Ma, Ji
2017-02-03
Characterization of in vitro and in vivo catabolism of therapeutic proteins has increasingly become an integral part of discovery and development process for novel proteins. Unambiguous and efficient identification of catabolites can not only facilitate accurate understanding of pharmacokinetic profiles of drug candidates, but also enables follow up protein engineering to generate more catabolically stable molecules with improved properties (pharmacokinetics and pharmacodynamics). Immunoaffinity capture (IC) followed by top-down intact protein analysis using either matrix-assisted laser desorption/ionization or electrospray ionization mass spectrometry analysis have been the primary methods of choice for catabolite identification. However, the sensitivity and efficiency of these methods is not always sufficient for characterization of novel proteins from complex biomatrices such as plasma or serum. In this study a novel bottom-up targeted protein workflow was optimized for analysis of proteolytic degradation of therapeutic proteins. Selective and sensitive tagging of the alpha-amine at the N-terminus of proteins of interest was performed by immunoaffinity capture of therapeutic protein and its catabolites followed by on-bead succinimidyloxycarbonylmethyl tri-(2,4,6-trimethoxyphenyl N-terminus (TMPP-NTT) tagging. The positively charged hydrophobic TMPP tag facilitates unambiguous sequence identification of all N-terminus peptides from complex tryptic digestion samples via data dependent liquid chromatgraphy-tandem mass spectroscopy. Utility of the workflow was illustrated by definitive analysis of in vitro catabolic profile of neurotensin human Fc (NTs-huFc) protein in mouse serum. The results from this study demonstrated that the IC-TMPP-NTT workflow is a simple and efficient method for catabolite formation in therapeutic proteins.
Hessling, Bernd; Büttner, Knut; Hecker, Michael; Becher, Dörte
2013-01-01
Quantitative LC-MALDI is an underrepresented method, especially in large-scale experiments. The additional fractionation step that is needed for most MALDI-TOF-TOF instruments, the comparatively long analysis time, and the very limited number of established software tools for the data analysis render LC-MALDI a niche application for large quantitative analyses beside the widespread LC–electrospray ionization workflows. Here, we used LC-MALDI in a relative quantification analysis of Staphylococcus aureus for the first time on a proteome-wide scale. Samples were analyzed in parallel with an LTQ-Orbitrap, which allowed cross-validation with a well-established workflow. With nearly 850 proteins identified in the cytosolic fraction and quantitative data for more than 550 proteins obtained with the MASCOT Distiller software, we were able to prove that LC-MALDI is able to process highly complex samples. The good correlation of quantities determined via this method and the LTQ-Orbitrap workflow confirmed the high reliability of our LC-MALDI approach for global quantification analysis. Because the existing literature reports differences for MALDI and electrospray ionization preferences and the respective experimental work was limited by technical or methodological constraints, we systematically compared biochemical attributes of peptides identified with either instrument. This genome-wide, comprehensive study revealed biases toward certain peptide properties for both MALDI-TOF-TOF- and LTQ-Orbitrap-based approaches. These biases are based on almost 13,000 peptides and result in a general complementarity of the two approaches that should be exploited in future experiments. PMID:23788530
Hessling, Bernd; Büttner, Knut; Hecker, Michael; Becher, Dörte
2013-10-01
Quantitative LC-MALDI is an underrepresented method, especially in large-scale experiments. The additional fractionation step that is needed for most MALDI-TOF-TOF instruments, the comparatively long analysis time, and the very limited number of established software tools for the data analysis render LC-MALDI a niche application for large quantitative analyses beside the widespread LC-electrospray ionization workflows. Here, we used LC-MALDI in a relative quantification analysis of Staphylococcus aureus for the first time on a proteome-wide scale. Samples were analyzed in parallel with an LTQ-Orbitrap, which allowed cross-validation with a well-established workflow. With nearly 850 proteins identified in the cytosolic fraction and quantitative data for more than 550 proteins obtained with the MASCOT Distiller software, we were able to prove that LC-MALDI is able to process highly complex samples. The good correlation of quantities determined via this method and the LTQ-Orbitrap workflow confirmed the high reliability of our LC-MALDI approach for global quantification analysis. Because the existing literature reports differences for MALDI and electrospray ionization preferences and the respective experimental work was limited by technical or methodological constraints, we systematically compared biochemical attributes of peptides identified with either instrument. This genome-wide, comprehensive study revealed biases toward certain peptide properties for both MALDI-TOF-TOF- and LTQ-Orbitrap-based approaches. These biases are based on almost 13,000 peptides and result in a general complementarity of the two approaches that should be exploited in future experiments.
BioVLAB-MMIA: a cloud environment for microRNA and mRNA integrated analysis (MMIA) on Amazon EC2.
Lee, Hyungro; Yang, Youngik; Chae, Heejoon; Nam, Seungyoon; Choi, Donghoon; Tangchaisin, Patanachai; Herath, Chathura; Marru, Suresh; Nephew, Kenneth P; Kim, Sun
2012-09-01
MicroRNAs, by regulating the expression of hundreds of target genes, play critical roles in developmental biology and the etiology of numerous diseases, including cancer. As a vast amount of microRNA expression profile data are now publicly available, the integration of microRNA expression data sets with gene expression profiles is a key research problem in life science research. However, the ability to conduct genome-wide microRNA-mRNA (gene) integration currently requires sophisticated, high-end informatics tools, significant expertise in bioinformatics and computer science to carry out the complex integration analysis. In addition, increased computing infrastructure capabilities are essential in order to accommodate large data sets. In this study, we have extended the BioVLAB cloud workbench to develop an environment for the integrated analysis of microRNA and mRNA expression data, named BioVLAB-MMIA. The workbench facilitates computations on the Amazon EC2 and S3 resources orchestrated by the XBaya Workflow Suite. The advantages of BioVLAB-MMIA over the web-based MMIA system include: 1) readily expanded as new computational tools become available; 2) easily modifiable by re-configuring graphic icons in the workflow; 3) on-demand cloud computing resources can be used on an "as needed" basis; 4) distributed orchestration supports complex and long running workflows asynchronously. We believe that BioVLAB-MMIA will be an easy-to-use computing environment for researchers who plan to perform genome-wide microRNA-mRNA (gene) integrated analysis tasks.
Data Management and Archiving - a Long Process
NASA Astrophysics Data System (ADS)
Gebauer, Petra; Bertelmann, Roland; Hasler, Tim; Kirchner, Ingo; Klump, Jens; Mettig, Nora; Peters-Kottig, Wolfgang; Rusch, Beate; Ulbricht, Damian
2014-05-01
Implementing policies for research data management to the end of data archiving at university institutions takes a long time. Even though, especially in geosciences, most of the scientists are familiar to analyze different sorts of data, to present statistical results and to write publications sometimes based on big data records, only some of them manage their data in a standardized manner. Much more often they have learned how to measure and to generate large volumes of data than to document these measurements and to preserve them for the future. Changing staff and limited funding make this work more difficult, but it is essential in a progressively developing digital and networked world. Results from the project EWIG (Translates to: Developing workflow components for long-term archiving of research data in geosciences), funded by Deutsche Forschungsgemeinschaft, will help on these theme. Together with the project partners Deutsches GeoForschungsZentrum Potsdam and Konrad-Zuse-Zentrum für Informationstechnik Berlin a workflow to transfer continuously recorded data from a meteorological city monitoring network into a long-term archive was developed. This workflow includes quality assurance of the data as well as description of metadata and using tools to prepare data packages for long term archiving. It will be an exemplary model for other institutions working with similar data. The development of this workflow is closely intertwined with the educational curriculum at the Institut für Meteorologie. Designing modules to run quality checks for meteorological time series of data measured every minute and preparing metadata are tasks in actual bachelor theses. Students will also test the usability of the generated working environment. Based on these experiences a practical guideline for integrating research data management in curricula will be one of the results of this project, for postgraduates as well as for younger students. Especially at the beginning of the scientific career it is necessary to become familiar with all issues concerning data management. The outcomes of EWIG are intended to be generic enough to be easily adopted by other institutions. University lectures in meteorology were started to teach future scientific generations right from the start how to deal with all sorts of different data in a transparent way. The progress of the project EWIG can be followed on the web via ewig.gfz-potsdam.de
Embi, Peter J.; Yackel, Thomas R.; Logan, Judith R.; Bowen, Judith L.; Cooney, Thomas G.; Gorman, Paul N.
2004-01-01
Objective: Computerized physician documentation (CPD) has been implemented throughout the nation's Veterans Affairs Medical Centers (VAMCs) and is likely to increasingly replace handwritten documentation in other institutions. The use of this technology may affect educational and clinical activities, yet little has been reported in this regard. The authors conducted a qualitative study to determine the perceived impacts of CPD among faculty and housestaff in a VAMC. Design: A cross-sectional study was conducted using semistructured interviews with faculty (n = 10) and a group interview with residents (n = 10) at a VAMC teaching hospital. Measurements: Content analysis of field notes and taped transcripts were done by two independent reviewers using a grounded theory approach. Findings were validated using member checking and peer debriefing. Results: Four major themes were identified: (1) improved availability of documentation; (2) changes in work processes and communication; (3) alterations in document structure and content; and (4) mistakes, concerns, and decreased confidence in the data. With a few exceptions, subjects felt documentation was more available, with benefits for education and patient care. Other impacts of CPD were largely seen as detrimental to aspects of clinical practice and education, including documentation quality, workflow, professional communication, and patient care. Conclusion: CPD is perceived to have substantial positive and negative impacts on clinical and educational activities and environments. Care should be taken when designing, implementing, and using such systems to avoid or minimize any harmful impacts. More research is needed to assess the extent of the impacts identified and to determine the best strategies to effectively deal with them. PMID:15064287
HEPDOOP: High-Energy Physics Analysis using Hadoop
NASA Astrophysics Data System (ADS)
Bhimji, W.; Bristow, T.; Washbrook, A.
2014-06-01
We perform a LHC data analysis workflow using tools and data formats that are commonly used in the "Big Data" community outside High Energy Physics (HEP). These include Apache Avro for serialisation to binary files, Pig and Hadoop for mass data processing and Python Scikit-Learn for multi-variate analysis. Comparison is made with the same analysis performed with current HEP tools in ROOT.
Novak, Laurie L; Johnson, Kevin B; Lorenzi, Nancy M
2010-01-01
The objective of this review was to describe methods used to study and model workflow. The authors included studies set in a variety of industries using qualitative, quantitative and mixed methods. Of the 6221 matching abstracts, 127 articles were included in the final corpus. The authors collected data from each article on researcher perspective, study type, methods type, specific methods, approaches to evaluating quality of results, definition of workflow and dependent variables. Ethnographic observation and interviews were the most frequently used methods. Long study durations revealed the large time commitment required for descriptive workflow research. The most frequently discussed technique for evaluating quality of study results was triangulation. The definition of the term “workflow” and choice of methods for studying workflow varied widely across research areas and researcher perspectives. The authors developed a conceptual framework of workflow-related terminology for use in future research and present this model for use by other researchers. PMID:20442143
Digitization workflows for flat sheets and packets of plants, algae, and fungi1
Nelson, Gil; Sweeney, Patrick; Wallace, Lisa E.; Rabeler, Richard K.; Allard, Dorothy; Brown, Herrick; Carter, J. Richard; Denslow, Michael W.; Ellwood, Elizabeth R.; Germain-Aubrey, Charlotte C.; Gilbert, Ed; Gillespie, Emily; Goertzen, Leslie R.; Legler, Ben; Marchant, D. Blaine; Marsico, Travis D.; Morris, Ashley B.; Murrell, Zack; Nazaire, Mare; Neefus, Chris; Oberreiter, Shanna; Paul, Deborah; Ruhfel, Brad R.; Sasek, Thomas; Shaw, Joey; Soltis, Pamela S.; Watson, Kimberly; Weeks, Andrea; Mast, Austin R.
2015-01-01
Effective workflows are essential components in the digitization of biodiversity specimen collections. To date, no comprehensive, community-vetted workflows have been published for digitizing flat sheets and packets of plants, algae, and fungi, even though latest estimates suggest that only 33% of herbarium specimens have been digitally transcribed, 54% of herbaria use a specimen database, and 24% are imaging specimens. In 2012, iDigBio, the U.S. National Science Foundation’s (NSF) coordinating center and national resource for the digitization of public, nonfederal U.S. collections, launched several working groups to address this deficiency. Here, we report the development of 14 workflow modules with 7–36 tasks each. These workflows represent the combined work of approximately 35 curators, directors, and collections managers representing more than 30 herbaria, including 15 NSF-supported plant-related Thematic Collections Networks and collaboratives. The workflows are provided for download as Portable Document Format (PDF) and Microsoft Word files. Customization of these workflows for specific institutional implementation is encouraged. PMID:26421256
Zugaj, D; Chenet, A; Petit, L; Vaglio, J; Pascual, T; Piketty, C; Bourdes, V
2018-02-04
Currently, imaging technologies that can accurately assess or provide surrogate markers of the human cutaneous microvessel network are limited. Dynamic optical coherence tomography (D-OCT) allows the detection of blood flow in vivo and visualization of the skin microvasculature. However, image processing is necessary to correct images, filter artifacts, and exclude irrelevant signals. The objective of this study was to develop a novel image processing workflow to enhance the technical capabilities of D-OCT. Single-center, vehicle-controlled study including healthy volunteers aged 18-50 years. A capsaicin solution was applied topically on the subject's forearm to induce local inflammation. Measurements of capsaicin-induced increase in dermal blood flow, within the region of interest, were performed by laser Doppler imaging (LDI) (reference method) and D-OCT. Sixteen subjects were enrolled. A good correlation was shown between D-OCT and LDI, using the image processing workflow. Therefore, D-OCT offers an easy-to-use alternative to LDI, with good repeatability, new robust morphological features (dermal-epidermal junction localization), and quantification of the distribution of vessel size and changes in this distribution induced by capsaicin. The visualization of the vessel network was improved through bloc filtering and artifact removal. Moreover, the assessment of vessel size distribution allows a fine analysis of the vascular patterns. The newly developed image processing workflow enhances the technical capabilities of D-OCT for the accurate detection and characterization of microcirculation in the skin. A direct clinical application of this image processing workflow is the quantification of the effect of topical treatment on skin vascularization. © 2018 The Authors. Skin Research and Technology Published by John Wiley & Sons Ltd.
FluxCTTX: A LIMS-based tool for management and analysis of cytotoxicity assays data
2015-01-01
Background Cytotoxicity assays have been used by researchers to screen for cytotoxicity in compound libraries. Researchers can either look for cytotoxic compounds or screen "hits" from initial high-throughput drug screens for unwanted cytotoxic effects before investing in their development as a pharmaceutical. These assays may be used as an alternative to animal experimentation and are becoming increasingly important in modern laboratories. However, the execution of these assays in large scale and different laboratories requires, among other things, the management of protocols, reagents, cell lines used as well as the data produced, which can be a challenge. The management of all this information is greatly improved by the utilization of computational tools to save time and guarantee quality. However, a tool that performs this task designed specifically for cytotoxicity assays is not yet available. Results In this work, we have used a workflow based LIMS -- the Flux system -- and the Together Workflow Editor as a framework to develop FluxCTTX, a tool for management of data from cytotoxicity assays performed at different laboratories. The main work is the development of a workflow, which represents all stages of the assay and has been developed and uploaded in Flux. This workflow models the activities of cytotoxicity assays performed as described in the OECD 129 Guidance Document. Conclusions FluxCTTX presents a solution for the management of the data produced by cytotoxicity assays performed at Interlaboratory comparisons. Its adoption will contribute to guarantee the quality of activities in the process of cytotoxicity tests and enforce the use of Good Laboratory Practices (GLP). Furthermore, the workflow developed is complete and can be adapted to other contexts and different tests for management of other types of data. PMID:26696462
Integrating Visualizations into Modeling NEST Simulations
Nowke, Christian; Zielasko, Daniel; Weyers, Benjamin; Peyser, Alexander; Hentschel, Bernd; Kuhlen, Torsten W.
2015-01-01
Modeling large-scale spiking neural networks showing realistic biological behavior in their dynamics is a complex and tedious task. Since these networks consist of millions of interconnected neurons, their simulation produces an immense amount of data. In recent years it has become possible to simulate even larger networks. However, solutions to assist researchers in understanding the simulation's complex emergent behavior by means of visualization are still lacking. While developing tools to partially fill this gap, we encountered the challenge to integrate these tools easily into the neuroscientists' daily workflow. To understand what makes this so challenging, we looked into the workflows of our collaborators and analyzed how they use the visualizations to solve their daily problems. We identified two major issues: first, the analysis process can rapidly change focus which requires to switch the visualization tool that assists in the current problem domain. Second, because of the heterogeneous data that results from simulations, researchers want to relate data to investigate these effectively. Since a monolithic application model, processing and visualizing all data modalities and reflecting all combinations of possible workflows in a holistic way, is most likely impossible to develop and to maintain, a software architecture that offers specialized visualization tools that run simultaneously and can be linked together to reflect the current workflow, is a more feasible approach. To this end, we have developed a software architecture that allows neuroscientists to integrate visualization tools more closely into the modeling tasks. In addition, it forms the basis for semantic linking of different visualizations to reflect the current workflow. In this paper, we present this architecture and substantiate the usefulness of our approach by common use cases we encountered in our collaborative work. PMID:26733860
ObsPy: A Python Toolbox for Seismology - Recent Developments and Applications
NASA Astrophysics Data System (ADS)
Megies, T.; Krischer, L.; Barsch, R.; Sales de Andrade, E.; Beyreuther, M.
2014-12-01
ObsPy (http://www.obspy.org) is a community-driven, open-source project dedicated to building a bridge for seismology into the scientific Python ecosystem. It offersa) read and write support for essentially all commonly used waveform, station, and event metadata file formats with a unified interface,b) a comprehensive signal processing toolbox tuned to the needs of seismologists,c) integrated access to all large data centers, web services and databases, andd) convenient wrappers to legacy codes like libtau and evalresp.Python, currently the most popular language for teaching introductory computer science courses at top-ranked U.S. departments, is a full-blown programming language with the flexibility of an interactive scripting language. Its extensive standard library and large variety of freely available high quality scientific modules cover most needs in developing scientific processing workflows. Together with packages like NumPy, SciPy, Matplotlib, IPython, Pandas, lxml, and PyQt, ObsPy enables the construction of complete workflows in Python. These vary from reading locally stored data or requesting data from one or more different data centers through to signal analysis and data processing and on to visualizations in GUI and web applications, output of modified/derived data and the creation of publication-quality figures.ObsPy enjoys a large world-wide rate of adoption in the community. Applications successfully using it include time-dependent and rotational seismology, big data processing, event relocations, and synthetic studies about attenuation kernels and full-waveform inversions to name a few examples. All functionality is extensively documented and the ObsPy tutorial and gallery give a good impression of the wide range of possible use cases.We will present the basic features of ObsPy, new developments and applications, and a roadmap for the near future and discuss the sustainability of our open-source development model.
Adaptation of Decoy Fusion Strategy for Existing Multi-Stage Search Workflows
NASA Astrophysics Data System (ADS)
Ivanov, Mark V.; Levitsky, Lev I.; Gorshkov, Mikhail V.
2016-09-01
A number of proteomic database search engines implement multi-stage strategies aiming at increasing the sensitivity of proteome analysis. These approaches often employ a subset of the original database for the secondary stage of analysis. However, if target-decoy approach (TDA) is used for false discovery rate (FDR) estimation, the multi-stage strategies may violate the underlying assumption of TDA that false matches are distributed uniformly across the target and decoy databases. This violation occurs if the numbers of target and decoy proteins selected for the second search are not equal. Here, we propose a method of decoy database generation based on the previously reported decoy fusion strategy. This method allows unbiased TDA-based FDR estimation in multi-stage searches and can be easily integrated into existing workflows utilizing popular search engines and post-search algorithms.
Integrating Behavioral Health in Primary Care Using Lean Workflow Analysis: A Case Study.
van Eeghen, Constance; Littenberg, Benjamin; Holman, Melissa D; Kessler, Rodger
2016-01-01
Primary care offices are integrating behavioral health (BH) clinicians into their practices. Implementing such a change is complex, difficult, and time consuming. Lean workflow analysis may be an efficient, effective, and acceptable method for use during integration. The objectives of this study were to observe BH integration into primary care and to measure its impact. This was a prospective, mixed-methods case study in a primary care practice that served 8,426 patients over a 17-month period, with 652 patients referred to BH services. Secondary measures included primary care visits resulting in BH referrals, referrals resulting in scheduled appointments, time from referral to the scheduled appointment, and time from the referral to the first visit. Providers and staff were surveyed on the Lean method. Referrals increased from 23 to 37 per 1000 visits (P < .001). Referrals resulted in more scheduled (60% to 74%; P < .001) and arrived visits (44% to 53%; P = .025). Time from referral to the first scheduled visit decreased (hazard ratio, 1.60; 95% confidence interval, 1.37-1.88) as did time to first arrived visit (hazard ratio, 1.36; 95% confidence interval, 1.14-1.62). Survey responses and comments were positive. This pilot integration of BH showed significant improvements in treatment initiation and other measures. Strengths of Lean analysis included workflow improvement, system perspective, and project success. Further evaluation is indicated. © Copyright 2016 by the American Board of Family Medicine.
Online time and resource management based on surgical workflow time series analysis.
Maktabi, M; Neumuth, T
2017-02-01
Hospitals' effectiveness and efficiency can be enhanced by automating the resource and time management of the most cost-intensive unit in the hospital: the operating room (OR). The key elements required for the ideal organization of hospital staff and technical resources (such as instruments in the OR) are an exact online forecast of both the surgeon's resource usage and the remaining intervention time. This paper presents a novel online approach relying on time series analysis and the application of a linear time-variant system. We calculated the power spectral density and the spectrogram of surgical perspectives (e.g., used instrument) of interest to compare several surgical workflows. Considering only the use of the surgeon's right hand during an intervention, we were able to predict the remaining intervention time online with an error of 21 min 45 s ±9 min 59 s for lumbar discectomy. Furthermore, the performance of forecasting of technical resource usage in the next 20 min was calculated for a combination of spectral analysis and the application of a linear time-variant system (sensitivity: 74 %; specificity: 75 %) focusing on just the use of surgeon's instrument in question. The outstanding benefit of these methods is that the automated recording of surgical workflows has minimal impact during interventions since the whole set of surgical perspectives need not be recorded. The resulting predictions can help various stakeholders such as OR staff and hospital technicians. Moreover, reducing resource conflicts could well improve patient care.
Purdue ionomics information management system. An integrated functional genomics platform.
Baxter, Ivan; Ouzzani, Mourad; Orcun, Seza; Kennedy, Brad; Jandhyala, Shrinivas S; Salt, David E
2007-02-01
The advent of high-throughput phenotyping technologies has created a deluge of information that is difficult to deal with without the appropriate data management tools. These data management tools should integrate defined workflow controls for genomic-scale data acquisition and validation, data storage and retrieval, and data analysis, indexed around the genomic information of the organism of interest. To maximize the impact of these large datasets, it is critical that they are rapidly disseminated to the broader research community, allowing open access for data mining and discovery. We describe here a system that incorporates such functionalities developed around the Purdue University high-throughput ionomics phenotyping platform. The Purdue Ionomics Information Management System (PiiMS) provides integrated workflow control, data storage, and analysis to facilitate high-throughput data acquisition, along with integrated tools for data search, retrieval, and visualization for hypothesis development. PiiMS is deployed as a World Wide Web-enabled system, allowing for integration of distributed workflow processes and open access to raw data for analysis by numerous laboratories. PiiMS currently contains data on shoot concentrations of P, Ca, K, Mg, Cu, Fe, Zn, Mn, Co, Ni, B, Se, Mo, Na, As, and Cd in over 60,000 shoot tissue samples of Arabidopsis (Arabidopsis thaliana), including ethyl methanesulfonate, fast-neutron and defined T-DNA mutants, and natural accession and populations of recombinant inbred lines from over 800 separate experiments, representing over 1,000,000 fully quantitative elemental concentrations. PiiMS is accessible at www.purdue.edu/dp/ionomics.
Loroch, Stefan; Schommartz, Tim; Brune, Wolfram; Zahedi, René Peiman; Sickmann, Albert
2015-05-01
Quantitative proteomics and phosphoproteomics have become key disciplines in understanding cellular processes. Fundamental research can be done using cell culture providing researchers with virtually infinite sample amounts. In contrast, clinical, pre-clinical and biomedical research is often restricted to minute sample amounts and requires an efficient analysis with only micrograms of protein. To address this issue, we generated a highly sensitive workflow for combined LC-MS-based quantitative proteomics and phosphoproteomics by refining an ERLIC-based 2D phosphoproteomics workflow into an ERLIC-based 3D workflow covering the global proteome as well. The resulting 3D strategy was successfully used for an in-depth quantitative analysis of both, the proteome and the phosphoproteome of murine cytomegalovirus-infected mouse fibroblasts, a model system for host cell manipulation by a virus. In a 2-plex SILAC experiment with 150 μg of a tryptic digest per condition, the 3D strategy enabled the quantification of ~75% more proteins and even ~134% more peptides compared to the 2D strategy. Additionally, we could quantify ~50% more phosphoproteins by non-phosphorylated peptides, concurrently yielding insights into changes on the levels of protein expression and phosphorylation. Beside its sensitivity, our novel three-dimensional ERLIC-strategy has the potential for semi-automated sample processing rendering it a suitable future perspective for clinical, pre-clinical and biomedical research. Copyright © 2015. Published by Elsevier B.V.
SQL Collaborative Learning Framework Based on SOA
NASA Astrophysics Data System (ADS)
Armiati, S.; Awangga, RM
2018-04-01
The research is focused on designing collaborative learning-oriented framework fulfilment service in teaching SQL Oracle 10g. Framework built a foundation of academic fulfilment service performed by a layer of the working unit in collaboration with Program Studi Manajemen Informatika. In the design phase defined what form of collaboration models and information technology proposed for Program Studi Manajemen Informatika by using a framework of collaboration inspired by the stages of modelling a Service Oriented Architecture (SOA). Stages begin with analyzing subsystems, this activity is used to determine subsystem involved and reliance as well as workflow between the subsystems. After the service can be identified, the second phase is designing the component specifications, which details the components that are implemented in the service to include the data, rules, services, profiles can be configured, and variations. The third stage is to allocate service, set the service to the subsystems that have been identified, and its components. Implementation framework contributes to the teaching guides and application architecture that can be used as a landing realize an increase in service by applying information technology.
Disruption of Radiologist Workflow.
Kansagra, Akash P; Liu, Kevin; Yu, John-Paul J
2016-01-01
The effect of disruptions has been studied extensively in surgery and emergency medicine, and a number of solutions-such as preoperative checklists-have been implemented to enforce the integrity of critical safety-related workflows. Disruptions of the highly complex and cognitively demanding workflow of modern clinical radiology have only recently attracted attention as a potential safety hazard. In this article, we describe the variety of disruptions that arise in the reading room environment, review approaches that other specialties have taken to mitigate workflow disruption, and suggest possible solutions for workflow improvement in radiology. Copyright © 2015 Mosby, Inc. All rights reserved.
Dispel4py: An Open-Source Python library for Data-Intensive Seismology
NASA Astrophysics Data System (ADS)
Filgueira, Rosa; Krause, Amrey; Spinuso, Alessandro; Klampanos, Iraklis; Danecek, Peter; Atkinson, Malcolm
2015-04-01
Scientific workflows are a necessary tool for many scientific communities as they enable easy composition and execution of applications on computing resources while scientists can focus on their research without being distracted by the computation management. Nowadays, scientific communities (e.g. Seismology) have access to a large variety of computing resources and their computational problems are best addressed using parallel computing technology. However, successful use of these technologies requires a lot of additional machinery whose use is not straightforward for non-experts: different parallel frameworks (MPI, Storm, multiprocessing, etc.) must be used depending on the computing resources (local machines, grids, clouds, clusters) where applications are run. This implies that for achieving the best applications' performance, users usually have to change their codes depending on the features of the platform selected for running them. This work presents dispel4py, a new open-source Python library for describing abstract stream-based workflows for distributed data-intensive applications. Special care has been taken to provide dispel4py with the ability to map abstract workflows to different platforms dynamically at run-time. Currently dispel4py has four mappings: Apache Storm, MPI, multi-threading and sequential. The main goal of dispel4py is to provide an easy-to-use tool to develop and test workflows in local resources by using the sequential mode with a small dataset. Later, once a workflow is ready for long runs, it can be automatically executed on different parallel resources. dispel4py takes care of the underlying mappings by performing an efficient parallelisation. Processing Elements (PE) represent the basic computational activities of any dispel4Py workflow, which can be a seismologic algorithm, or a data transformation process. For creating a dispel4py workflow, users only have to write very few lines of code to describe their PEs and how they are connected by using Python, which is widely supported on many platforms and is popular in many scientific domains, such as in geosciences. Once, a dispel4py workflow is written, a user only has to select which mapping they would like to use, and everything else (parallelisation, distribution of data) is carried on by dispel4py without any cost to the user. Among all dispel4py features we would like to highlight the following: * The PEs are connected by streams and not by writing to and reading from intermediate files, avoiding many IO operations. * The PEs can be stored into a registry. Therefore, different users can recombine PEs in many different workflows. * dispel4py has been enriched with a provenance mechanism to support runtime provenance analysis. We have adopted the W3C-PROV data model, which is accessible via a prototypal browser-based user interface and a web API. It supports the users with the visualisation of graphical products and offers combined operations to access and download the data, which may be selectively stored at runtime, into dedicated data archives. dispel4py has been already used by seismologists in the VERCE project to develop different seismic workflows. One of them is the Seismic Ambient Noise Cross-Correlation workflow, which preprocesses and cross-correlates traces from several stations. First, this workflow was tested on a local machine by using a small number of stations as input data. Later, it was executed on different parallel platforms (SuperMUC cluster, and Terracorrelator machine), automatically scaling up by using MPI and multiprocessing mappings and up to 1000 stations as input data. The results show that the dispel4py achieves scalable performance in both mappings tested on different parallel platforms.
USDA-ARS?s Scientific Manuscript database
Remarkable advances in next-generation sequencing (NGS) technologies, bioinformatics algorithms, and computational technologies have significantly accelerated genomic research. However, complicated NGS data analysis still remains as a major bottleneck. RNA-seq, as one of the major area in the NGS fi...
Dietel, Manfred; Bubendorf, Lukas; Dingemans, Anne-Marie C; Dooms, Christophe; Elmberger, Göran; García, Rosa Calero; Kerr, Keith M; Lim, Eric; López-Ríos, Fernando; Thunnissen, Erik; Van Schil, Paul E; von Laffert, Maximilian
2016-01-01
Background There is currently no Europe-wide consensus on the appropriate preanalytical measures and workflow to optimise procedures for tissue-based molecular testing of non-small-cell lung cancer (NSCLC). To address this, a group of lung cancer experts (see list of authors) convened to discuss and propose standard operating procedures (SOPs) for NSCLC. Methods Based on earlier meetings and scientific expertise on lung cancer, a multidisciplinary group meeting was aligned. The aim was to include all relevant aspects concerning NSCLC diagnosis. After careful consideration, the following topics were selected and each was reviewed by the experts: surgical resection and sampling; biopsy procedures for analysis; preanalytical and other variables affecting quality of tissue; tissue conservation; testing procedures for epidermal growth factor receptor, anaplastic lymphoma kinase and ROS proto-oncogene 1, receptor tyrosine kinase (ROS1) in lung tissue and cytological specimens; as well as standardised reporting and quality control (QC). Finally, an optimal workflow was described. Results Suggested optimal procedures and workflows are discussed in detail. The broad consensus was that the complex workflow presented can only be executed effectively by an interdisciplinary approach using a well-trained team. Conclusions To optimise diagnosis and treatment of patients with NSCLC, it is essential to establish SOPs that are adaptable to the local situation. In addition, a continuous QC system and a local multidisciplinary tumour-type-oriented board are essential. PMID:26530085
NASA Astrophysics Data System (ADS)
Thomer, A.
2017-12-01
Data provenance - the record of the varied processes that went into the creation of a dataset, as well as the relationships between resulting data objects - is necessary to support the reusability, reproducibility and reliability of earth science data. In sUAS-based research, capturing provenance can be particularly challenging because of the breadth and distributed nature of the many platforms used to collect, process and analyze data. In any given project, multiple drones, controllers, computers, software systems, sensors, cameras, imaging processing algorithms and data processing workflows are used over sometimes long periods of time. These platforms and processing result in dozens - if not hundreds - of data products in varying stages of readiness-for-analysis and sharing. Provenance tracking mechanisms are needed to make the relationships between these many data products explicit, and therefore more reusable and shareable. In this talk, I discuss opportunities and challenges in tracking provenance in sUAS-based research, and identify gaps in current workflow-capture technologies. I draw on prior work conducted as part of the IMLS-funded Site-Based Data Curation project in which we developed methods of documenting in and ex silico (that is, computational and non-computation) workflows, and demonstrate this approaches applicability to research with sUASes. I conclude with a discussion of ontologies and other semantic technologies that have potential application in sUAS research.
Deploying and sharing U-Compare workflows as web services.
Kontonatsios, Georgios; Korkontzelos, Ioannis; Kolluru, Balakrishna; Thompson, Paul; Ananiadou, Sophia
2013-02-18
U-Compare is a text mining platform that allows the construction, evaluation and comparison of text mining workflows. U-Compare contains a large library of components that are tuned to the biomedical domain. Users can rapidly develop biomedical text mining workflows by mixing and matching U-Compare's components. Workflows developed using U-Compare can be exported and sent to other users who, in turn, can import and re-use them. However, the resulting workflows are standalone applications, i.e., software tools that run and are accessible only via a local machine, and that can only be run with the U-Compare platform. We address the above issues by extending U-Compare to convert standalone workflows into web services automatically, via a two-click process. The resulting web services can be registered on a central server and made publicly available. Alternatively, users can make web services available on their own servers, after installing the web application framework, which is part of the extension to U-Compare. We have performed a user-oriented evaluation of the proposed extension, by asking users who have tested the enhanced functionality of U-Compare to complete questionnaires that assess its functionality, reliability, usability, efficiency and maintainability. The results obtained reveal that the new functionality is well received by users. The web services produced by U-Compare are built on top of open standards, i.e., REST and SOAP protocols, and therefore, they are decoupled from the underlying platform. Exported workflows can be integrated with any application that supports these open standards. We demonstrate how the newly extended U-Compare enhances the cross-platform interoperability of workflows, by seamlessly importing a number of text mining workflow web services exported from U-Compare into Taverna, i.e., a generic scientific workflow construction platform.
Deploying and sharing U-Compare workflows as web services
2013-01-01
Background U-Compare is a text mining platform that allows the construction, evaluation and comparison of text mining workflows. U-Compare contains a large library of components that are tuned to the biomedical domain. Users can rapidly develop biomedical text mining workflows by mixing and matching U-Compare’s components. Workflows developed using U-Compare can be exported and sent to other users who, in turn, can import and re-use them. However, the resulting workflows are standalone applications, i.e., software tools that run and are accessible only via a local machine, and that can only be run with the U-Compare platform. Results We address the above issues by extending U-Compare to convert standalone workflows into web services automatically, via a two-click process. The resulting web services can be registered on a central server and made publicly available. Alternatively, users can make web services available on their own servers, after installing the web application framework, which is part of the extension to U-Compare. We have performed a user-oriented evaluation of the proposed extension, by asking users who have tested the enhanced functionality of U-Compare to complete questionnaires that assess its functionality, reliability, usability, efficiency and maintainability. The results obtained reveal that the new functionality is well received by users. Conclusions The web services produced by U-Compare are built on top of open standards, i.e., REST and SOAP protocols, and therefore, they are decoupled from the underlying platform. Exported workflows can be integrated with any application that supports these open standards. We demonstrate how the newly extended U-Compare enhances the cross-platform interoperability of workflows, by seamlessly importing a number of text mining workflow web services exported from U-Compare into Taverna, i.e., a generic scientific workflow construction platform. PMID:23419017
Johnson, Kevin B; Lorenzi, Nancy M
2011-01-01
Objective The goal of this study was to develop an in-depth understanding of how a health information exchange (HIE) fits into clinical workflow at multiple clinical sites. Materials and Methods The ethnographic qualitative study was conducted over a 9-month period in six emergency departments (ED) and eight ambulatory clinics in Memphis, Tennessee, USA. Data were collected using direct observation, informal interviews during observation, and formal semi-structured interviews. The authors observed for over 180 h, during which providers used the exchange 130 times. Results HIE-related workflow was modeled for each ED site and ambulatory clinic group and substantial site-to-site workflow differences were identified. Common patterns in HIE-related workflow were also identified across all sites, leading to the development of two role-based workflow models: nurse based and physician based. The workflow elements framework was applied to the two role-based patterns. An in-depth description was developed of how providers integrated HIE into existing clinical workflow, including prompts for HIE use. Discussion Workflow differed substantially among sites, but two general role-based HIE usage models were identified. Although providers used HIE to improve continuity of patient care, patient–provider trust played a significant role. Types of information retrieved related to roles, with nurses seeking to retrieve recent hospitalization data and more open-ended usage by nurse practitioners and physicians. User and role-specific customization to accommodate differences in workflow and information needs may increase the adoption and use of HIE. Conclusion Understanding end users' perspectives towards HIE technology is crucial to the long-term success of HIE. By applying qualitative methods, an in-depth understanding of HIE usage was developed. PMID:22003156
Provenance-Powered Automatic Workflow Generation and Composition
NASA Astrophysics Data System (ADS)
Zhang, J.; Lee, S.; Pan, L.; Lee, T. J.
2015-12-01
In recent years, scientists have learned how to codify tools into reusable software modules that can be chained into multi-step executable workflows. Existing scientific workflow tools, created by computer scientists, require domain scientists to meticulously design their multi-step experiments before analyzing data. However, this is oftentimes contradictory to a domain scientist's daily routine of conducting research and exploration. We hope to resolve this dispute. Imagine this: An Earth scientist starts her day applying NASA Jet Propulsion Laboratory (JPL) published climate data processing algorithms over ARGO deep ocean temperature and AMSRE sea surface temperature datasets. Throughout the day, she tunes the algorithm parameters to study various aspects of the data. Suddenly, she notices some interesting results. She then turns to a computer scientist and asks, "can you reproduce my results?" By tracking and reverse engineering her activities, the computer scientist creates a workflow. The Earth scientist can now rerun the workflow to validate her findings, modify the workflow to discover further variations, or publish the workflow to share the knowledge. In this way, we aim to revolutionize computer-supported Earth science. We have developed a prototyping system to realize the aforementioned vision, in the context of service-oriented science. We have studied how Earth scientists conduct service-oriented data analytics research in their daily work, developed a provenance model to record their activities, and developed a technology to automatically generate workflow starting from user behavior and adaptability and reuse of these workflows for replicating/improving scientific studies. A data-centric repository infrastructure is established to catch richer provenance to further facilitate collaboration in the science community. We have also established a Petri nets-based verification instrument for provenance-based automatic workflow generation and recommendation.
Support for Taverna workflows in the VPH-Share cloud platform.
Kasztelnik, Marek; Coto, Ernesto; Bubak, Marian; Malawski, Maciej; Nowakowski, Piotr; Arenas, Juan; Saglimbeni, Alfredo; Testi, Debora; Frangi, Alejandro F
2017-07-01
To address the increasing need for collaborative endeavours within the Virtual Physiological Human (VPH) community, the VPH-Share collaborative cloud platform allows researchers to expose and share sequences of complex biomedical processing tasks in the form of computational workflows. The Taverna Workflow System is a very popular tool for orchestrating complex biomedical & bioinformatics processing tasks in the VPH community. This paper describes the VPH-Share components that support the building and execution of Taverna workflows, and explains how they interact with other VPH-Share components to improve the capabilities of the VPH-Share platform. Taverna workflow support is delivered by the Atmosphere cloud management platform and the VPH-Share Taverna plugin. These components are explained in detail, along with the two main procedures that were developed to enable this seamless integration: workflow composition and execution. 1) Seamless integration of VPH-Share with other components and systems. 2) Extended range of different tools for workflows. 3) Successful integration of scientific workflows from other VPH projects. 4) Execution speed improvement for medical applications. The presented workflow integration provides VPH-Share users with a wide range of different possibilities to compose and execute workflows, such as desktop or online composition, online batch execution, multithreading, remote execution, etc. The specific advantages of each supported tool are presented, as are the roles of Atmosphere and the VPH-Share plugin within the VPH-Share project. The combination of the VPH-Share plugin and Atmosphere engenders the VPH-Share infrastructure with far more flexible, powerful and usable capabilities for the VPH-Share community. As both components can continue to evolve and improve independently, we acknowledge that further improvements are still to be developed and will be described. Copyright © 2017 Elsevier B.V. All rights reserved.
Identifying impact of software dependencies on replicability of biomedical workflows.
Miksa, Tomasz; Rauber, Andreas; Mina, Eleni
2016-12-01
Complex data driven experiments form the basis of biomedical research. Recent findings warn that the context in which the software is run, that is the infrastructure and the third party dependencies, can have a crucial impact on the final results delivered by a computational experiment. This implies that in order to replicate the same result, not only the same data must be used, but also it must be run on an equivalent software stack. In this paper we present the VFramework that enables assessing replicability of workflows. It identifies whether any differences in software dependencies among two executions of the same workflow exist and whether they have impact on the produced results. We also conduct a case study in which we investigate the impact of software dependencies on replicability of Taverna workflows used in biomedical research of Huntington's disease. We re-execute analysed workflows in environments differing in operating system distribution and configuration. The results show that the VFramework can be used to identify the impact of software dependencies on the replicability of biomedical workflows. Furthermore, we observe that despite the fact that the workflows are executed in a controlled environment, they still depend on specific tools installed in the environment. The context model used by the VFramework improves the deficiencies of provenance traces and documents also such tools. Based on our findings we define guidelines for workflow owners that enable them to improve replicability of their workflows. Copyright © 2016 Elsevier Inc. All rights reserved.
Methylation Integration (Mint) | Informatics Technology for Cancer Research (ITCR)
A comprehensive software pipeline and set of Galaxy tools/workflows for integrative analysis of genome-wide DNA methylation and hydroxymethylation data. Data types can be either bisulfite sequencing and/or pull-down methods.
A Model of Workflow Composition for Emergency Management
NASA Astrophysics Data System (ADS)
Xin, Chen; Bin-ge, Cui; Feng, Zhang; Xue-hui, Xu; Shan-shan, Fu
The common-used workflow technology is not flexible enough in dealing with concurrent emergency situations. The paper proposes a novel model for defining emergency plans, in which workflow segments appear as a constituent part. A formal abstraction, which contains four operations, is defined to compose workflow segments under constraint rule. The software system of the business process resources construction and composition is implemented and integrated into Emergency Plan Management Application System.
A Framework for Modeling Workflow Execution by an Interdisciplinary Healthcare Team.
Kezadri-Hamiaz, Mounira; Rosu, Daniela; Wilk, Szymon; Kuziemsky, Craig; Michalowski, Wojtek; Carrier, Marc
2015-01-01
The use of business workflow models in healthcare is limited because of insufficient capture of complexities associated with behavior of interdisciplinary healthcare teams that execute healthcare workflows. In this paper we present a novel framework that builds on the well-founded business workflow model formalism and related infrastructures and introduces a formal semantic layer that describes selected aspects of team dynamics and supports their real-time operationalization.
New ergonomic and functional design of digital conferencing rooms in a clinical environment
NASA Astrophysics Data System (ADS)
Ratib, Osman M.; Amato, Carlos L.; McGill, D. Ric; Liu, Brent J.; Balbona, Joseph A.; McCoy, J. Michael
2003-05-01
Clinical conferences and multidisciplinary medical rounds play a major role in patient management and decision-making relying on presentation of variety of documents: films, charts, videotapes, graphs etc. These conferences and clinical rounds are often carried out in conferencing rooms or department libraries that are usually not suitable for presentation of the data in electronic format. In most instances digital projection equipment is added to existing rooms without proper consideration to functional, ergonomic, acoustical, spatial and environmental requirements. Also, in large academic institutions, the conference rooms serve multiple purposes including as classrooms for teaching and education of students and for administrative meetings among managers and staff. In the migration toward a fully digital hospital we elected to analyze the functional requirements and optimize the ergonomic design of conferencing rooms that can accommodate clinical rounds, multidisciplinary reviews, seminars, formal lectures and department meetings. 3D computer simulation was used for better evaluation and analysis of spatial and ergonomic parameters and for gathering opinions and input from users on different design options. A critical component of the design is the understanding of the different workflow and requirements of different types of conferences and presentations that can be carried out in these conference rooms.
Short message service or disService: issues with text messaging in a complex medical environment.
Wu, Robert; Appel, Lora; Morra, Dante; Lo, Vivian; Kitto, Simon; Quan, Sherman
2014-04-01
Hospitals today are experiencing major changes in their clinical communication workflows as conventional numeric paging and face-to-face verbal conversations are being replaced by computer mediated communication systems. In this paper, we highlight the importance of understanding this transition and discuss some of the impacts that may emerge when verbal clinical conversations are replaced by short text messages. In-depth interviews (n=108) and non-participatory observation sessions (n=260h) were conducted on the General Internal Medicine wards at five academic teaching hospitals in Toronto, Canada. From our analysis of the qualitative data, we identified two major themes. De-contextualization of complex issues led to an increase in misinterpretation and an increase in back and forth messaging for clarification. Depersonalization of communication was due to less verbal conversations and face-to-face interactions and led to a negative impact on work relationships. Text-based communication in hospital settings led to the oversimplification of messages and the depersonalization of communication. It is important to recognize and understand these unintended consequences of new technology to avoid the negative impacts to patient care and work relationships. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
Design and implementation of a secure workflow system based on PKI/PMI
NASA Astrophysics Data System (ADS)
Yan, Kai; Jiang, Chao-hui
2013-03-01
As the traditional workflow system in privilege management has the following weaknesses: low privilege management efficiency, overburdened for administrator, lack of trust authority etc. A secure workflow model based on PKI/PMI is proposed after studying security requirements of the workflow systems in-depth. This model can achieve static and dynamic authorization after verifying user's ID through PKC and validating user's privilege information by using AC in workflow system. Practice shows that this system can meet the security requirements of WfMS. Moreover, it can not only improve system security, but also ensures integrity, confidentiality, availability and non-repudiation of the data in the system.
MPA Portable: A Stand-Alone Software Package for Analyzing Metaproteome Samples on the Go.
Muth, Thilo; Kohrs, Fabian; Heyer, Robert; Benndorf, Dirk; Rapp, Erdmann; Reichl, Udo; Martens, Lennart; Renard, Bernhard Y
2018-01-02
Metaproteomics, the mass spectrometry-based analysis of proteins from multispecies samples faces severe challenges concerning data analysis and results interpretation. To overcome these shortcomings, we here introduce the MetaProteomeAnalyzer (MPA) Portable software. In contrast to the original server-based MPA application, this newly developed tool no longer requires computational expertise for installation and is now independent of any relational database system. In addition, MPA Portable now supports state-of-the-art database search engines and a convenient command line interface for high-performance data processing tasks. While search engine results can easily be combined to increase the protein identification yield, an additional two-step workflow is implemented to provide sufficient analysis resolution for further postprocessing steps, such as protein grouping as well as taxonomic and functional annotation. Our new application has been developed with a focus on intuitive usability, adherence to data standards, and adaptation to Web-based workflow platforms. The open source software package can be found at https://github.com/compomics/meta-proteome-analyzer .
Realizing the potential of the CUAHSI Water Data Center to advance Earth Science
NASA Astrophysics Data System (ADS)
Hooper, R. P.; Seul, M.; Pollak, J.; Couch, A.
2015-12-01
The CUAHSI Water Data Center has developed a cloud-based system for data publication, discovery and access. Key features of this system are a semantically enabled catalog to discover data across more than 100 different services and delivery of data and metadata in a standard format. While this represents a significant technical achievement, the purpose of this system is to support data reanalysis for advancing science. A new web-based client, HydroClient, improves access to the data from previous clients. This client is envisioned as the first step in a workflow that can involve visualization and analysis using web-processing services, followed by download to local computers for further analysis. The release of the WaterML library in the R package CRAN repository is an initial attempt at linking the WDC services in a larger analysis workflow. We are seeking community input on other resources required to make the WDC services more valuable in scientific research and education.
Pathview Web: user friendly pathway visualization and data integration
Pant, Gaurav; Bhavnasi, Yeshvant K.; Blanchard, Steven G.; Brouwer, Cory
2017-01-01
Abstract Pathway analysis is widely used in omics studies. Pathway-based data integration and visualization is a critical component of the analysis. To address this need, we recently developed a novel R package called Pathview. Pathview maps, integrates and renders a large variety of biological data onto molecular pathway graphs. Here we developed the Pathview Web server, as to make pathway visualization and data integration accessible to all scientists, including those without the special computing skills or resources. Pathview Web features an intuitive graphical web interface and a user centered design. The server not only expands the core functions of Pathview, but also provides many useful features not available in the offline R package. Importantly, the server presents a comprehensive workflow for both regular and integrated pathway analysis of multiple omics data. In addition, the server also provides a RESTful API for programmatic access and conveniently integration in third-party software or workflows. Pathview Web is openly and freely accessible at https://pathview.uncc.edu/. PMID:28482075
NASA Astrophysics Data System (ADS)
Leibovici, D. G.; Pourabdollah, A.; Jackson, M.
2011-12-01
Experts and decision-makers use or develop models to monitor global and local changes of the environment. Their activities require the combination of data and processing services in a flow of operations and spatial data computations: a geospatial scientific workflow. The seamless ability to generate, re-use and modify a geospatial scientific workflow is an important requirement but the quality of outcomes is equally much important [1]. Metadata information attached to the data and processes, and particularly their quality, is essential to assess the reliability of the scientific model that represents a workflow [2]. Managing tools, dealing with qualitative and quantitative metadata measures of the quality associated with a workflow, are, therefore, required for the modellers. To ensure interoperability, ISO and OGC standards [3] are to be adopted, allowing for example one to define metadata profiles and to retrieve them via web service interfaces. However these standards need a few extensions when looking at workflows, particularly in the context of geoprocesses metadata. We propose to fill this gap (i) at first through the provision of a metadata profile for the quality of processes, and (ii) through providing a framework, based on XPDL [4], to manage the quality information. Web Processing Services are used to implement a range of metadata analyses on the workflow in order to evaluate and present quality information at different levels of the workflow. This generates the metadata quality, stored in the XPDL file. The focus is (a) on the visual representations of the quality, summarizing the retrieved quality information either from the standardized metadata profiles of the components or from non-standard quality information e.g., Web 2.0 information, and (b) on the estimated qualities of the outputs derived from meta-propagation of uncertainties (a principle that we have introduced [5]). An a priori validation of the future decision-making supported by the outputs of the workflow once run, is then provided using the meta-propagated qualities, obtained without running the workflow [6], together with the visualization pointing out the need to improve the workflow with better data or better processes on the workflow graph itself. [1] Leibovici, DG, Hobona, G Stock, K Jackson, M (2009) Qualifying geospatial workfow models for adaptive controlled validity and accuracy. In: IEEE 17th GeoInformatics, 1-5 [2] Leibovici, DG, Pourabdollah, A (2010a) Workflow Uncertainty using a Metamodel Framework and Metadata for Data and Processes. OGC TC/PC Meetings, September 2010, Toulouse, France [3] OGC (2011) www.opengeospatial.org [4] XPDL (2008) Workflow Process Definition Interface - XML Process Definition Language.Workflow Management Coalition, Document WfMC-TC-1025, 2008 [5] Leibovici, DG Pourabdollah, A Jackson, M (2011) Meta-propagation of Uncertainties for Scientific Workflow Management in Interoperable Spatial Data Infrastructures. In: Proceedings of the European Geosciences Union (EGU2011), April 2011, Austria [6] Pourabdollah, A Leibovici, DG Jackson, M (2011) MetaPunT: an Open Source tool for Meta-Propagation of uncerTainties in Geospatial Processing. In: Proceedings of OSGIS2011, June 2011, Nottingham, UK
Mishuris, Rebecca Grochow; Yoder, Jordan; Wilson, Dan; Mann, Devin
2016-07-11
Health information is increasingly being digitally stored and exchanged. The public is regularly collecting and storing health-related data on their own electronic devices and in the cloud. Diabetes prevention is an increasingly important preventive health measure, and diet and exercise are key components of this. Patients are turning to online programs to help them lose weight. Despite primary care physicians being important in patients' weight loss success, there is no exchange of information between the primary care provider (PCP) and these online weight loss programs. There is an emerging opportunity to integrate this data directly into the electronic health record (EHR), but little is known about what information to share or how to share it most effectively. This study aims to characterize the preferences of providers concerning the integration of externally generated lifestyle modification data into a primary care EHR workflow. We performed a qualitative study using two rounds of semi-structured interviews with primary care providers. We used an iterative design process involving primary care providers, health information technology software developers and health services researchers to develop the interface. Using grounded-theory thematic analysis 4 themes emerged from the interviews: 1) barriers to establishing healthy lifestyles, 2) features of a lifestyle modification program, 3) reporting of outcomes to the primary care provider, and 4) integration with primary care. These themes guided the rapid-cycle agile design process of an interface of data from an online diabetes prevention program into the primary care EHR workflow. The integration of external health-related data into the EHR must be embedded into the provider workflow in order to be useful to the provider and beneficial for the patient. Accomplishing this requires evaluation of that clinical workflow during software design. The development of this novel interface used rapid cycle iterative design, early involvement by providers, and usability testing methodology. This provides a framework for how to integrate external data into provider workflow in efficient and effective ways. There is now the potential to realize the importance of having this data available in the clinical setting for patient engagement and health outcomes.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Novak, A; Nyflot, M; Sponseller, P
2014-06-01
Purpose: Radiation treatment planning involves a complex workflow that can make safety improvement efforts challenging. This study utilizes an incident reporting system to identify detection points of near-miss errors, in order to guide our departmental safety improvement efforts. Previous studies have examined where errors arise, but not where they are detected or their patterns. Methods: 1377 incidents were analyzed from a departmental nearmiss error reporting system from 3/2012–10/2013. All incidents were prospectively reviewed weekly by a multi-disciplinary team, and assigned a near-miss severity score ranging from 0–4 reflecting potential harm (no harm to critical). A 98-step consensus workflow was usedmore » to determine origination and detection points of near-miss errors, categorized into 7 major steps (patient assessment/orders, simulation, contouring/treatment planning, pre-treatment plan checks, therapist/on-treatment review, post-treatment checks, and equipment issues). Categories were compared using ANOVA. Results: In the 7-step workflow, 23% of near-miss errors were detected within the same step in the workflow, while an additional 37% were detected by the next step in the workflow, and 23% were detected two steps downstream. Errors detected further from origination were more severe (p<.001; Figure 1). The most common source of near-miss errors was treatment planning/contouring, with 476 near misses (35%). Of those 476, only 72(15%) were found before leaving treatment planning, 213(45%) were found at physics plan checks, and 191(40%) were caught at the therapist pre-treatment chart review or on portal imaging. Errors that passed through physics plan checks and were detected by therapists were more severe than other errors originating in contouring/treatment planning (1.81 vs 1.33, p<0.001). Conclusion: Errors caught by radiation treatment therapists tend to be more severe than errors caught earlier in the workflow, highlighting the importance of safety checks in dosimetry and physics. We are utilizing our findings to improve manual and automated checklists for dosimetry and physics.« less
CyberShake: Running Seismic Hazard Workflows on Distributed HPC Resources
NASA Astrophysics Data System (ADS)
Callaghan, S.; Maechling, P. J.; Graves, R. W.; Gill, D.; Olsen, K. B.; Milner, K. R.; Yu, J.; Jordan, T. H.
2013-12-01
As part of its program of earthquake system science research, the Southern California Earthquake Center (SCEC) has developed a simulation platform, CyberShake, to perform physics-based probabilistic seismic hazard analysis (PSHA) using 3D deterministic wave propagation simulations. CyberShake performs PSHA by simulating a tensor-valued wavefield of Strain Green Tensors, and then using seismic reciprocity to calculate synthetic seismograms for about 415,000 events per site of interest. These seismograms are processed to compute ground motion intensity measures, which are then combined with probabilities from an earthquake rupture forecast to produce a site-specific hazard curve. Seismic hazard curves for hundreds of sites in a region can be used to calculate a seismic hazard map, representing the seismic hazard for a region. We present a recently completed PHSA study in which we calculated four CyberShake seismic hazard maps for the Southern California area to compare how CyberShake hazard results are affected by different SGT computational codes (AWP-ODC and AWP-RWG) and different community velocity models (Community Velocity Model - SCEC (CVM-S4) v11.11 and Community Velocity Model - Harvard (CVM-H) v11.9). We present our approach to running workflow applications on distributed HPC resources, including systems without support for remote job submission. We show how our approach extends the benefits of scientific workflows, such as job and data management, to large-scale applications on Track 1 and Leadership class open-science HPC resources. We used our distributed workflow approach to perform CyberShake Study 13.4 on two new NSF open-science HPC computing resources, Blue Waters and Stampede, executing over 470 million tasks to calculate physics-based hazard curves for 286 locations in the Southern California region. For each location, we calculated seismic hazard curves with two different community velocity models and two different SGT codes, resulting in over 1100 hazard curves. We will report on the performance of this CyberShake study, four times larger than previous studies. Additionally, we will examine the challenges we face applying these workflow techniques to additional open-science HPC systems and discuss whether our workflow solutions continue to provide value to our large-scale PSHA calculations.
Implementation of Cyberinfrastructure and Data Management Workflow for a Large-Scale Sensor Network
NASA Astrophysics Data System (ADS)
Jones, A. S.; Horsburgh, J. S.
2014-12-01
Monitoring with in situ environmental sensors and other forms of field-based observation presents many challenges for data management, particularly for large-scale networks consisting of multiple sites, sensors, and personnel. The availability and utility of these data in addressing scientific questions relies on effective cyberinfrastructure that facilitates transformation of raw sensor data into functional data products. It also depends on the ability of researchers to share and access the data in useable formats. In addition to addressing the challenges presented by the quantity of data, monitoring networks need practices to ensure high data quality, including procedures and tools for post processing. Data quality is further enhanced if practitioners are able to track equipment, deployments, calibrations, and other events related to site maintenance and associate these details with observational data. In this presentation we will describe the overall workflow that we have developed for research groups and sites conducting long term monitoring using in situ sensors. Features of the workflow include: software tools to automate the transfer of data from field sites to databases, a Python-based program for data quality control post-processing, a web-based application for online discovery and visualization of data, and a data model and web interface for managing physical infrastructure. By automating the data management workflow, the time from collection to analysis is reduced and sharing and publication is facilitated. The incorporation of metadata standards and descriptions and the use of open-source tools enhances the sustainability and reusability of the data. We will describe the workflow and tools that we have developed in the context of the iUTAH (innovative Urban Transitions and Aridregion Hydrosustainability) monitoring network. The iUTAH network consists of aquatic and climate sensors deployed in three watersheds to monitor Gradients Along Mountain to Urban Transitions (GAMUT). The variety of environmental sensors and the multi-watershed, multi-institutional nature of the network necessitate a well-planned and efficient workflow for acquiring, managing, and sharing sensor data, which should be useful for similar large-scale and long-term networks.
NASA Astrophysics Data System (ADS)
Laban, Shaban; El-Desouky, Aly
2014-05-01
To achieve a rapid, simple and reliable parallel processing of different types of tasks and big data processing on any compute cluster, a lightweight messaging-based distributed applications processing and workflow execution framework model is proposed. The framework is based on Apache ActiveMQ and Simple (or Streaming) Text Oriented Message Protocol (STOMP). ActiveMQ , a popular and powerful open source persistence messaging and integration patterns server with scheduler capabilities, acts as a message broker in the framework. STOMP provides an interoperable wire format that allows framework programs to talk and interact between each other and ActiveMQ easily. In order to efficiently use the message broker a unified message and topic naming pattern is utilized to achieve the required operation. Only three Python programs and simple library, used to unify and simplify the implementation of activeMQ and STOMP protocol, are needed to use the framework. A watchdog program is used to monitor, remove, add, start and stop any machine and/or its different tasks when necessary. For every machine a dedicated one and only one zoo keeper program is used to start different functions or tasks, stompShell program, needed for executing the user required workflow. The stompShell instances are used to execute any workflow jobs based on received message. A well-defined, simple and flexible message structure, based on JavaScript Object Notation (JSON), is used to build any complex workflow systems. Also, JSON format is used in configuration, communication between machines and programs. The framework is platform independent. Although, the framework is built using Python the actual workflow programs or jobs can be implemented by any programming language. The generic framework can be used in small national data centres for processing seismological and radionuclide data received from the International Data Centre (IDC) of the Preparatory Commission for the Comprehensive Nuclear-Test-Ban Treaty Organization (CTBTO). Also, it is possible to extend the use of the framework in monitoring the IDC pipeline. The detailed design, implementation,conclusion and future work of the proposed framework will be presented.
Zhao, Jun; Avila-Garcia, Maria Susana; Roos, Marco; Thompson, Mark; van der Horst, Eelke; Kaliyaperumal, Rajaram; Luo, Ruibang; Lee, Tin-Lap; Lam, Tak-wah; Edmunds, Scott C.; Sansone, Susanna-Assunta
2015-01-01
Motivation Reproducing the results from a scientific paper can be challenging due to the absence of data and the computational tools required for their analysis. In addition, details relating to the procedures used to obtain the published results can be difficult to discern due to the use of natural language when reporting how experiments have been performed. The Investigation/Study/Assay (ISA), Nanopublications (NP), and Research Objects (RO) models are conceptual data modelling frameworks that can structure such information from scientific papers. Computational workflow platforms can also be used to reproduce analyses of data in a principled manner. We assessed the extent by which ISA, NP, and RO models, together with the Galaxy workflow system, can capture the experimental processes and reproduce the findings of a previously published paper reporting on the development of SOAPdenovo2, a de novo genome assembler. Results Executable workflows were developed using Galaxy, which reproduced results that were consistent with the published findings. A structured representation of the information in the SOAPdenovo2 paper was produced by combining the use of ISA, NP, and RO models. By structuring the information in the published paper using these data and scientific workflow modelling frameworks, it was possible to explicitly declare elements of experimental design, variables, and findings. The models served as guides in the curation of scientific information and this led to the identification of inconsistencies in the original published paper, thereby allowing its authors to publish corrections in the form of an errata. Availability SOAPdenovo2 scripts, data, and results are available through the GigaScience Database: http://dx.doi.org/10.5524/100044; the workflows are available from GigaGalaxy: http://galaxy.cbiit.cuhk.edu.hk; and the representations using the ISA, NP, and RO models are available through the SOAPdenovo2 case study website http://isa-tools.github.io/soapdenovo2/. Contact: philippe.rocca-serra@oerc.ox.ac.uk and susanna-assunta.sansone@oerc.ox.ac.uk. PMID:26154165
NASA Astrophysics Data System (ADS)
See, Linda; Perger, Christoph; Dresel, Christopher; Hofer, Martin; Weichselbaum, Juergen; Mondel, Thomas; Steffen, Fritz
2016-04-01
The validation of land cover products is an important step in the workflow of generating a land cover map from remotely-sensed imagery. Many students of remote sensing will be given exercises on classifying a land cover map followed by the validation process. Many algorithms exist for classification, embedded within proprietary image processing software or increasingly as open source tools. However, there is little standardization for land cover validation, nor a set of open tools available for implementing this process. The LACO-Wiki tool was developed as a way of filling this gap, bringing together standardized land cover validation methods and workflows into a single portal. This includes the storage and management of land cover maps and validation data; step-by-step instructions to guide users through the validation process; sound sampling designs; an easy-to-use environment for validation sample interpretation; and the generation of accuracy reports based on the validation process. The tool was developed for a range of users including producers of land cover maps, researchers, teachers and students. The use of such a tool could be embedded within the curriculum of remote sensing courses at a university level but is simple enough for use by students aged 13-18. A beta version of the tool is available for testing at: http://www.laco-wiki.net.
Grid-based platform for training in Earth Observation
NASA Astrophysics Data System (ADS)
Petcu, Dana; Zaharie, Daniela; Panica, Silviu; Frincu, Marc; Neagul, Marian; Gorgan, Dorian; Stefanut, Teodor
2010-05-01
GiSHEO platform [1] providing on-demand services for training and high education in Earth Observation is developed, in the frame of an ESA funded project through its PECS programme, to respond to the needs of powerful education resources in remote sensing field. It intends to be a Grid-based platform of which potential for experimentation and extensibility are the key benefits compared with a desktop software solution. Near-real time applications requiring simultaneous multiple short-time-response data-intensive tasks, as in the case of a short time training event, are the ones that are proved to be ideal for this platform. The platform is based on Globus Toolkit 4 facilities for security and process management, and on the clusters of four academic institutions involved in the project. The authorization uses a VOMS service. The main public services are the followings: the EO processing services (represented through special WSRF-type services); the workflow service exposing a particular workflow engine; the data indexing and discovery service for accessing the data management mechanisms; the processing services, a collection allowing easy access to the processing platform. The WSRF-type services for basic satellite image processing are reusing free image processing tools, OpenCV and GDAL. New algorithms and workflows were develop to tackle with challenging problems like detecting the underground remains of old fortifications, walls or houses. More details can be found in [2]. Composed services can be specified through workflows and are easy to be deployed. The workflow engine, OSyRIS (Orchestration System using a Rule based Inference Solution), is based on DROOLS, and a new rule-based workflow language, SILK (SImple Language for worKflow), has been built. Workflow creation in SILK can be done with or without a visual designing tools. The basics of SILK are the tasks and relations (rules) between them. It is similar with the SCUFL language, but not relying on XML in order to allow the introduction of more workflow specific issues. Moreover, an event-condition-action (ECA) approach allows a greater flexibility when expressing data and task dependencies, as well as the creation of adaptive workflows which can react to changes in the configuration of the Grid or in the workflow itself. Changes inside the grid are handled by creating specific rules which allow resource selection based on various task scheduling criteria. Modifications of the workflow are usually accomplished either by inserting or retracting at runtime rules belonging to it or by modifying the executor of the task in case a better one is found. The former implies changes in its structure while the latter does not necessarily mean changes of the resource but more precisely changes of the algorithm used for solving the task. More details can be found in [3]. Another important platform component is the data indexing and storage service, GDIS, providing features for data storage, indexing data using a specialized RDBMS, finding data by various conditions, querying external services and keeping track of temporary data generated by other components. The data storage component part of GDIS is responsible for storing the data by using available storage backends such as local disk file systems (ext3), local cluster storage (GFS) or distributed file systems (HDFS). A front-end GridFTP service is capable of interacting with the storage domains on behalf of the clients and in a uniform way and also enforces the security restrictions provided by other specialized services and related with data access. The data indexing is performed by PostGIS. An advanced and flexible interface for searching the project's geographical repository is built around a custom query language (LLQL - Lisp Like Query Language) designed to provide fine grained access to the data in the repository and to query external services (e.g. for exploiting the connection with GENESI-DR catalog). More details can be found in [4]. The Workload Management System (WMS) provides two types of resource managers. The first one will be based on Condor HTC and use Condor as a job manager for task dispatching and working nodes (for development purposes) while the second one will use GT4 GRAM (for production purposes). The WMS main component, the Grid Task Dispatcher (GTD), is responsible for the interaction with other internal services as the composition engine in order to facilitate access to the processing platform. Its main responsibilities are to receive tasks from the workflow engine or directly from user interface, to use a task description language (the ClassAd meta language in case of Condor HTC) for job units, to submit and check the status of jobs inside the workload management system and to retrieve job logs for debugging purposes. More details can be found in [4]. A particular component of the platform is eGLE, the eLearning environment. It provides the functionalities necessary to create the visual appearance of the lessons through the usage of visual containers like tools, patterns and templates. The teacher uses the platform for testing the already created lessons, as well as for developing new lesson resources, such as new images and workflows describing graph-based processing. The students execute the lessons or describe and experiment with new workflows or different data. The eGLE database includes several workflow-based lesson descriptions, teaching materials and lesson resources, selected satellite and spatial data. More details can be found in [5]. A first training event of using the platform was organized in September 2009 during 11th SYNASC symposium (links to the demos, testing interface, and exercises are available on project site [1]). The eGLE component was presented at 4th GPC conference in May 2009. Moreover, the functionality of the platform will be presented as demo in April 2010 at 5th EGEE User forum. References: [1] GiSHEO consortium, Project site, http://gisheo.info.uvt.ro [2] D. Petcu, D. Zaharie, M. Neagul, S. Panica, M. Frincu, D. Gorgan, T. Stefanut, V. Bacu, Remote Sensed Image Processing on Grids for Training in Earth Observation. In Image Processing, V. Kordic (ed.), In-Tech, January 2010. [3] M. Neagul, S. Panica, D. Petcu, D. Zaharie, D. Gorgan, Web and Grid Services for Training in Earth Observation, IDAACS 2009, IEEE Computer Press, 241-246 [4] M. Frincu, S. Panica, M. Neagul, D. Petcu, Gisheo: On Demand Grid Service Based Platform for EO Data Processing. HiperGrid 2009, Politehnica Press, 415-422. [5] D. Gorgan, T. Stefanut, V. Bacu, Grid Based Training Environment for Earth Observation, GPC 2009, LNCS 5529, 98-109
USDA-ARS?s Scientific Manuscript database
Using five centimeter resolution images acquired with an unmanned aircraft system (UAS), we developed and evaluated an image processing workflow that included the integration of resolution-appropriate field sampling, feature selection, object-based image analysis, and processing approaches for UAS i...
Jeremy S. Fried; Larry D. Potts; Sara M. Loreno; Glenn A. Christensen; R. Jamie Barbour
2017-01-01
The Forest Inventory and Analysis (FIA)-based BioSum (Bioregional Inventory Originated Simulation Under Management) is a free policy analysis framework and workflow management software solution. It addresses complex management questions concerning forest health and vulnerability for large, multimillion acre, multiowner landscapes using FIA plot data as the initial...
Worklist handling in workflow-enabled radiological application systems
NASA Astrophysics Data System (ADS)
Wendler, Thomas; Meetz, Kirsten; Schmidt, Joachim; von Berg, Jens
2000-05-01
For the next generation integrated information systems for health care applications, more emphasis has to be put on systems which, by design, support the reduction of cost, the increase inefficiency and the improvement of the quality of services. A substantial contribution to this will be the modeling. optimization, automation and enactment of processes in health care institutions. One of the perceived key success factors for the system integration of processes will be the application of workflow management, with workflow management systems as key technology components. In this paper we address workflow management in radiology. We focus on an important aspect of workflow management, the generation and handling of worklists, which provide workflow participants automatically with work items that reflect tasks to be performed. The display of worklists and the functions associated with work items are the visible part for the end-users of an information system using a workflow management approach. Appropriate worklist design and implementation will influence user friendliness of a system and will largely influence work efficiency. Technically, in current imaging department information system environments (modality-PACS-RIS installations), a data-driven approach has been taken: Worklist -- if present at all -- are generated from filtered views on application data bases. In a future workflow-based approach, worklists will be generated by autonomous workflow services based on explicit process models and organizational models. This process-oriented approach will provide us with an integral view of entire health care processes or sub- processes. The paper describes the basic mechanisms of this approach and summarizes its benefits.
Powers, Christina M; Mills, Karmann A; Morris, Stephanie A; Klaessig, Fred; Gaheen, Sharon; Lewinski, Nastassja
2015-01-01
Summary There is a critical opportunity in the field of nanoscience to compare and integrate information across diverse fields of study through informatics (i.e., nanoinformatics). This paper is one in a series of articles on the data curation process in nanoinformatics (nanocuration). Other articles in this series discuss key aspects of nanocuration (temporal metadata, data completeness, database integration), while the focus of this article is on the nanocuration workflow, or the process of identifying, inputting, and reviewing nanomaterial data in a data repository. In particular, the article discusses: 1) the rationale and importance of a defined workflow in nanocuration, 2) the influence of organizational goals or purpose on the workflow, 3) established workflow practices in other fields, 4) current workflow practices in nanocuration, 5) key challenges for workflows in emerging fields like nanomaterials, 6) examples to make these challenges more tangible, and 7) recommendations to address the identified challenges. Throughout the article, there is an emphasis on illustrating key concepts and current practices in the field. Data on current practices in the field are from a group of stakeholders active in nanocuration. In general, the development of workflows for nanocuration is nascent, with few individuals formally trained in data curation or utilizing available nanocuration resources (e.g., ISA-TAB-Nano). Additional emphasis on the potential benefits of cultivating nanomaterial data via nanocuration processes (e.g., capability to analyze data from across research groups) and providing nanocuration resources (e.g., training) will likely prove crucial for the wider application of nanocuration workflows in the scientific community. PMID:26425437
VisRseq: R-based visual framework for analysis of sequencing data
2015-01-01
Background Several tools have been developed to enable biologists to perform initial browsing and exploration of sequencing data. However the computational tool set for further analyses often requires significant computational expertise to use and many of the biologists with the knowledge needed to interpret these data must rely on programming experts. Results We present VisRseq, a framework for analysis of sequencing datasets that provides a computationally rich and accessible framework for integrative and interactive analyses without requiring programming expertise. We achieve this aim by providing R apps, which offer a semi-auto generated and unified graphical user interface for computational packages in R and repositories such as Bioconductor. To address the interactivity limitation inherent in R libraries, our framework includes several native apps that provide exploration and brushing operations as well as an integrated genome browser. The apps can be chained together to create more powerful analysis workflows. Conclusions To validate the usability of VisRseq for analysis of sequencing data, we present two case studies performed by our collaborators and report their workflow and insights. PMID:26328469
VisRseq: R-based visual framework for analysis of sequencing data.
Younesy, Hamid; Möller, Torsten; Lorincz, Matthew C; Karimi, Mohammad M; Jones, Steven J M
2015-01-01
Several tools have been developed to enable biologists to perform initial browsing and exploration of sequencing data. However the computational tool set for further analyses often requires significant computational expertise to use and many of the biologists with the knowledge needed to interpret these data must rely on programming experts. We present VisRseq, a framework for analysis of sequencing datasets that provides a computationally rich and accessible framework for integrative and interactive analyses without requiring programming expertise. We achieve this aim by providing R apps, which offer a semi-auto generated and unified graphical user interface for computational packages in R and repositories such as Bioconductor. To address the interactivity limitation inherent in R libraries, our framework includes several native apps that provide exploration and brushing operations as well as an integrated genome browser. The apps can be chained together to create more powerful analysis workflows. To validate the usability of VisRseq for analysis of sequencing data, we present two case studies performed by our collaborators and report their workflow and insights.
Droplet microfluidics for single-cell analysis.
Brouzes, Eric
2012-01-01
This book chapter aims at providing an overview of all the aspects and procedures needed to develop a droplet-based workflow for single-cell analysis (see Fig. 10.1). The surfactant system used to stabilize droplets is a critical component of droplet microfluidics; its properties define the type of droplet-based assays and workflows that can be developed. The scope of this book chapter is limited to fluorinated surfactant systems that have proved to generate extremely stable droplets and allow to easily retrieve the encapsulated material. The formulation section discusses how the experimental parameters influence the choice of the surfactant system to use. The circuit design section presents recipes to design and integrate different droplet modules into a whole assay. The fabrication section describes the manufacturing of microfluidic chip including the surface treatment which is pivotal in droplet microfluidics. Finally, the last section reviews the experimental setup for fluorescence detection with an emphasis on cell injection and incubation.
Hamzeiy, Hamid; Cox, Jürgen
2017-02-01
Computational workflows for mass spectrometry-based shotgun proteomics and untargeted metabolomics share many steps. Despite the similarities, untargeted metabolomics is lagging behind in terms of reliable fully automated quantitative data analysis. We argue that metabolomics will strongly benefit from the adaptation of successful automated proteomics workflows to metabolomics. MaxQuant is a popular platform for proteomics data analysis and is widely considered to be superior in achieving high precursor mass accuracies through advanced nonlinear recalibration, usually leading to five to ten-fold better accuracy in complex LC-MS/MS runs. This translates to a sharp decrease in the number of peptide candidates per measured feature, thereby strongly improving the coverage of identified peptides. We argue that similar strategies can be applied to untargeted metabolomics, leading to equivalent improvements in metabolite identification. Copyright © 2016 The Author(s). Published by Elsevier Ltd.. All rights reserved.
Towards Discovery and Targeted Peptide Biomarker Detection Using nanoESI-TIMS-TOF MS
DOE Office of Scientific and Technical Information (OSTI.GOV)
Garabedian, Alyssa; Benigni, Paolo; Ramirez, Cesar E.
Abstract. In the present work, the potential of trapped ion mobility spectrometry coupled to TOF mass spectrometry (TIMS-TOF MS) for discovery and targeted monitoring of peptide biomarkers from human-in-mouse xenograft tumor tissue was evaluated. In particular, a TIMS-MS workflow was developed for the detection and quantification of peptide biomarkers using internal heavy analogs, taking advantage of the high mobility resolution (R = 150–250) prior to mass analysis. Five peptide biomarkers were separated, identified, and quantified using offline nanoESI-TIMSCID- TOF MS; the results were in good agreement with measurements using a traditional LC-ESI-MS/MS proteomics workflow. The TIMS-TOF MS analysis permitted peptidemore » biomarker detection based on accurate mobility, mass measurements, and high sequence coverage for concentrations in the 10–200 nM range, while simultaneously achieving discovery measurements« less
Sand, Olivier; Thomas-Chollier, Morgane; Vervisch, Eric; van Helden, Jacques
2008-01-01
This protocol shows how to access the Regulatory Sequence Analysis Tools (RSAT) via a programmatic interface in order to automate the analysis of multiple data sets. We describe the steps for writing a Perl client that connects to the RSAT Web services and implements a workflow to discover putative cis-acting elements in promoters of gene clusters. In the presented example, we apply this workflow to lists of transcription factor target genes resulting from ChIP-chip experiments. For each factor, the protocol predicts the binding motifs by detecting significantly overrepresented hexanucleotides in the target promoters and generates a feature map that displays the positions of putative binding sites along the promoter sequences. This protocol is addressed to bioinformaticians and biologists with programming skills (notions of Perl). Running time is approximately 6 min on the example data set.
Software Project Management and Measurement on the World-Wide-Web (WWW)
NASA Technical Reports Server (NTRS)
Callahan, John; Ramakrishnan, Sudhaka
1996-01-01
We briefly describe a system for forms-based, work-flow management that helps members of a software development team overcome geographical barriers to collaboration. Our system, called the Web Integrated Software Environment (WISE), is implemented as a World-Wide-Web service that allows for management and measurement of software development projects based on dynamic analysis of change activity in the workflow. WISE tracks issues in a software development process, provides informal communication between the users with different roles, supports to-do lists, and helps in software process improvement. WISE minimizes the time devoted to metrics collection and analysis by providing implicit delivery of messages between users based on the content of project documents. The use of a database in WISE is hidden from the users who view WISE as maintaining a personal 'to-do list' of tasks related to the many projects on which they may play different roles.
New Tools For Understanding Microbial Diversity Using High-throughput Sequence Data
NASA Astrophysics Data System (ADS)
Knight, R.; Hamady, M.; Liu, Z.; Lozupone, C.
2007-12-01
High-throughput sequencing techniques such as 454 are straining the limits of tools traditionally used to build trees, choose OTUs, and perform other essential sequencing tasks. We have developed a workflow for phylogenetic analysis of large-scale sequence data sets that combines existing tools, such as the Arb phylogeny package and the NAST multiple sequence alignment tool, with new methods for choosing and clustering OTUs and for performing phylogenetic community analysis with UniFrac. This talk discusses the cyberinfrastructure we are developing to support the human microbiome project, and the application of these workflows to analyze very large data sets that contrast the gut microbiota with a range of physical environments. These tools will ultimately help to define core and peripheral microbiomes in a range of environments, and will allow us to understand the physical and biotic factors that contribute most to differences in microbial diversity.
Integrating advanced visualization technology into the planetary Geoscience workflow
NASA Astrophysics Data System (ADS)
Huffman, John; Forsberg, Andrew; Loomis, Andrew; Head, James; Dickson, James; Fassett, Caleb
2011-09-01
Recent advances in computer visualization have allowed us to develop new tools for analyzing the data gathered during planetary missions, which is important, since these data sets have grown exponentially in recent years to tens of terabytes in size. As part of the Advanced Visualization in Solar System Exploration and Research (ADVISER) project, we utilize several advanced visualization techniques created specifically with planetary image data in mind. The Geoviewer application allows real-time active stereo display of images, which in aggregate have billions of pixels. The ADVISER desktop application platform allows fast three-dimensional visualization of planetary images overlain on digital terrain models. Both applications include tools for easy data ingest and real-time analysis in a programmatic manner. Incorporation of these tools into our everyday scientific workflow has proved important for scientific analysis, discussion, and publication, and enabled effective and exciting educational activities for students from high school through graduate school.
IceProd 2: A Next Generation Data Analysis Framework for the IceCube Neutrino Observatory
NASA Astrophysics Data System (ADS)
Schultz, D.
2015-12-01
We describe the overall structure and new features of the second generation of IceProd, a data processing and management framework. IceProd was developed by the IceCube Neutrino Observatory for processing of Monte Carlo simulations, detector data, and analysis levels. It runs as a separate layer on top of grid and batch systems. This is accomplished by a set of daemons which process job workflow, maintaining configuration and status information on the job before, during, and after processing. IceProd can also manage complex workflow DAGs across distributed computing grids in order to optimize usage of resources. IceProd is designed to be very light-weight; it runs as a python application fully in user space and can be set up easily. For the initial completion of this second version of IceProd, improvements have been made to increase security, reliability, scalability, and ease of use.
A Domain Analysis Model for eIRB Systems: Addressing the Weak Link in Clinical Research Informatics
He, Shan; Narus, Scott P.; Facelli, Julio C.; Lau, Lee Min; Botkin, Jefferey R.; Hurdle, John F.
2014-01-01
Institutional Review Boards (IRBs) are a critical component of clinical research and can become a significant bottleneck due to the dramatic increase, in both volume and complexity of clinical research. Despite the interest in developing clinical research informatics (CRI) systems and supporting data standards to increase clinical research efficiency and interoperability, informatics research in the IRB domain has not attracted much attention in the scientific community. The lack of standardized and structured application forms across different IRBs causes inefficient and inconsistent proposal reviews and cumbersome workflows. These issues are even more prominent in multi-institutional clinical research that is rapidly becoming the norm. This paper proposes and evaluates a domain analysis model for electronic IRB (eIRB) systems, paving the way for streamlined clinical research workflow via integration with other CRI systems and improved IRB application throughput via computer-assisted decision support. PMID:24929181
Sciutto, Giorgia; Oliveri, Paolo; Catelli, Emilio; Bonacini, Irene
2017-01-01
In the field of applied researches in heritage science, the use of multivariate approach is still quite limited and often chemometric results obtained are often underinterpreted. Within this scenario, the present paper is aimed at disseminating the use of suitable multivariate methodologies and proposes a procedural workflow applied on a representative group of case studies, of considerable importance for conservation purposes, as a sort of guideline on the processing and on the interpretation of this FTIR data. Initially, principal component analysis (PCA) is performed and the score values are converted into chemical maps. Successively, the brushing approach is applied, demonstrating its usefulness for a deep understanding of the relationships between the multivariate map and PC score space, as well as for the identification of the spectral bands mainly involved in the definition of each area localised within the score maps. PMID:29333162
Implementing bioinformatic workflows within the bioextract server
USDA-ARS?s Scientific Manuscript database
Computational workflows in bioinformatics are becoming increasingly important in the achievement of scientific advances. These workflows typically require the integrated use of multiple, distributed data sources and analytic tools. The BioExtract Server (http://bioextract.org) is a distributed servi...
Enhancing and Customizing Laboratory Information Systems to Improve/Enhance Pathologist Workflow.
Hartman, Douglas J
2015-06-01
Optimizing pathologist workflow can be difficult because it is affected by many variables. Surgical pathologists must complete many tasks that culminate in a final pathology report. Several software systems can be used to enhance/improve pathologist workflow. These include voice recognition software, pre-sign-out quality assurance, image utilization, and computerized provider order entry. Recent changes in the diagnostic coding and the more prominent role of centralized electronic health records represent potential areas for increased ways to enhance/improve the workflow for surgical pathologists. Additional unforeseen changes to the pathologist workflow may accompany the introduction of whole-slide imaging technology to the routine diagnostic work. Copyright © 2015 Elsevier Inc. All rights reserved.
Enhancing and Customizing Laboratory Information Systems to Improve/Enhance Pathologist Workflow.
Hartman, Douglas J
2016-03-01
Optimizing pathologist workflow can be difficult because it is affected by many variables. Surgical pathologists must complete many tasks that culminate in a final pathology report. Several software systems can be used to enhance/improve pathologist workflow. These include voice recognition software, pre-sign-out quality assurance, image utilization, and computerized provider order entry. Recent changes in the diagnostic coding and the more prominent role of centralized electronic health records represent potential areas for increased ways to enhance/improve the workflow for surgical pathologists. Additional unforeseen changes to the pathologist workflow may accompany the introduction of whole-slide imaging technology to the routine diagnostic work. Copyright © 2016 Elsevier Inc. All rights reserved.
Camporese, Alessandro
2004-06-01
The diagnosis of infectious diseases and the role of the microbiology laboratory are currently undergoing a process of change. The need for overall efficiency in providing results is now given the same importance as accuracy. This means that laboratories must be able to produce quality results in less time with the capacity to interpret the results clinically. To improve the clinical impact of microbiology results, the new challenge facing the microbiologist has become one of process management instead of pure analysis. A proper project management process designed to improve workflow, reduce analytical time, and provide the same high quality results without losing valuable time treating the patient, has become essential. Our objective was to study the impact of introducing automation and computerization into the microbiology laboratory, and the reorganization of the laboratory workflow, i.e. scheduling personnel to work shifts covering both the entire day and the entire week. In our laboratory, the introduction of automation and computerization, as well as the reorganization of personnel, thus the workflow itself, has resulted in an improvement in response time and greater efficiency in diagnostic procedures.
Automation and workflow considerations for embedding Digimarc Barcodes at scale
NASA Astrophysics Data System (ADS)
Rodriguez, Tony; Haaga, Don; Calhoon, Sean
2015-03-01
The Digimarc® Barcode is a digital watermark applied to packages and variable data labels that carries GS1 standard GTIN-14 data traditionally carried by a 1-D barcode. The Digimarc Barcode can be read with smartphones and imaging-based barcode readers commonly used in grocery and retail environments. Using smartphones, consumers can engage with products and retailers can materially increase the speed of check-out, increasing store margins and providing a better experience for shoppers. Internal testing has shown an average of 53% increase in scanning throughput, enabling 100's of millions of dollars in cost savings [1] for retailers when deployed at scale. To get to scale, the process of embedding a digital watermark must be automated and integrated within existing workflows. Creating the tools and processes to do so represents a new challenge for the watermarking community. This paper presents a description and an analysis of the workflow implemented by Digimarc to deploy the Digimarc Barcode at scale. An overview of the tools created and lessons learned during the introduction of technology to the market are provided.
3D Printing of CT Dataset: Validation of an Open Source and Consumer-Available Workflow.
Bortolotto, Chandra; Eshja, Esmeralda; Peroni, Caterina; Orlandi, Matteo A; Bizzotto, Nicola; Poggi, Paolo
2016-02-01
The broad availability of cheap three-dimensional (3D) printing equipment has raised the need for a thorough analysis on its effects on clinical accuracy. Our aim is to determine whether the accuracy of 3D printing process is affected by the use of a low-budget workflow based on open source software and consumer's commercially available 3D printers. A group of test objects was scanned with a 64-slice computed tomography (CT) in order to build their 3D copies. CT datasets were elaborated using a software chain based on three free and open source software. Objects were printed out with a commercially available 3D printer. Both the 3D copies and the test objects were measured using a digital professional caliper. Overall, the objects' mean absolute difference between test objects and 3D copies is 0.23 mm and the mean relative difference amounts to 0.55 %. Our results demonstrate that the accuracy of 3D printing process remains high despite the use of a low-budget workflow.