NASA Astrophysics Data System (ADS)
Pan, Tianheng
2018-01-01
In recent years, combining workflow management systems with multi-agent technology has become an active research field. Introducing multi-agent collaborative management can mitigate the lack of flexibility in workflow management systems. The workflow management system adopts a distributed structure, which addresses the fragility of the traditional centralized workflow structure. In this paper, the agents of a distributed workflow management system are classified by function, the execution process of each type of agent is analyzed, and key technologies such as process execution and resource management are examined.
Wireless remote control clinical image workflow: utilizing a PDA for offsite distribution
NASA Astrophysics Data System (ADS)
Liu, Brent J.; Documet, Luis; Documet, Jorge; Huang, H. K.; Muldoon, Jean
2004-04-01
Last year we presented at RSNA an application to perform wireless remote control of PACS image distribution utilizing a handheld device such as a Personal Digital Assistant (PDA). This paper describes the clinical experiences, including workflow scenarios, of implementing the PDA application to route exams from the clinical PACS archive server to various locations for offsite distribution of clinical PACS exams. By utilizing this remote control application, radiologists can manage image workflow distribution with a single wireless handheld device without impacting their clinical workflow on diagnostic PACS workstations. A PDA application was designed and developed to let a physician issue DICOM Query and C-Move requests from a clinical PACS archive to a CD-burning device, which automatically burns PACS data for offsite distribution. In addition, it was used for convenient routing of historical PACS exams to the local web server, local workstations, and teleradiology systems. The application was evaluated by radiologists as well as other clinical staff who need to distribute PACS exams to offsite referring physicians' offices and offsite radiologists. An application for image workflow management utilizing wireless technology was implemented in a clinical environment and evaluated. A PDA application was successfully utilized to perform DICOM Query and C-Move requests from the clinical PACS archive to various offsite exam distribution devices. Clinical staff can utilize the PDA to manage image workflow and PACS exam distribution conveniently for offsite consultations by referring physicians and radiologists. This solution allows radiologists to expand their effectiveness in health care delivery both within the radiology department and offsite by improving their clinical workflow.
Design and implementation of workflow engine for service-oriented architecture
NASA Astrophysics Data System (ADS)
Peng, Shuqing; Duan, Huining; Chen, Deyun
2009-04-01
With the rapid development of computer networks and the increasingly distributed nature of enterprise applications, traditional workflow engines suffer from deficiencies such as complex structure, poor stability, poor portability, low reusability and difficult maintenance. To improve the stability, scalability and flexibility of workflow management systems, this paper proposes a four-layer SOA-based workflow engine architecture following the XPDL standard of the Workflow Management Coalition, implements the route control mechanism of the control model, designs scheduling strategies for cyclic and acyclic routing, and implements the workflow engine using technologies such as XML, JSP and EJB.
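The distinction between acyclic and cyclic routing can be illustrated with a minimal sketch, assuming a process is a set of activities plus dependencies between them, loosely in the spirit of XPDL transitions. Acyclic routing reduces to running activities in a topological order; cyclic routing repeats a block while a condition holds, with an iteration cap. The functions `run_acyclic` and `run_cyclic` and the approval process are hypothetical illustrations, not the engine described in the paper.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Callable, Dict, List, Set

State = Dict[str, int]
Activity = Callable[[State], None]


def run_acyclic(dependencies: Dict[str, Set[str]],
                activities: Dict[str, Activity],
                state: State) -> List[str]:
    """Acyclic routing: run each activity only after all of its predecessors.

    `dependencies` maps an activity to the activities it waits for.
    graphlib raises CycleError if the dependencies contain a cycle.
    """
    order = list(TopologicalSorter(dependencies).static_order())
    for name in order:
        activities[name](state)
    return order


def run_cyclic(body: List[str],
               activities: Dict[str, Activity],
               state: State,
               repeat_while: Callable[[State], bool],
               max_iterations: int = 100) -> int:
    """Cyclic routing: execute `body` once, then repeat while the condition holds.

    The iteration cap guards against loops that never terminate.
    """
    iterations = 0
    while True:
        for name in body:
            activities[name](state)
        iterations += 1
        if not repeat_while(state) or iterations >= max_iterations:
            return iterations


# Hypothetical three-step approval process.
activities: Dict[str, Activity] = {
    "receive": lambda s: s.update(received=1),
    "review": lambda s: s.update(reviews=s.get("reviews", 0) + 1),
    "approve": lambda s: s.update(approved=1),
}
state: State = {}
order = run_acyclic({"receive": set(), "review": {"receive"}, "approve": {"review"}},
                    activities, state)
loops = run_cyclic(["review"], activities, state,
                   repeat_while=lambda s: s["reviews"] < 3)
```

Separating the two strategies keeps the common acyclic case simple, while the cap makes the cyclic case safe against a loop condition that never becomes false.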
Optimization of tomographic reconstruction workflows on geographically distributed resources
Bicer, Tekin; Gürsoy, Doǧa; Kettimuthu, Rajkumar; De Carlo, Francesco; Foster, Ian T.
2016-01-01
New technological advancements in synchrotron light sources enable data acquisition at unprecedented levels. This emergent trend affects not only the size of the generated data but also the need for larger computational resources. Although beamline scientists and users have access to local computational resources, these are typically limited and can result in extended execution times. Applications based on iterative processing, such as tomographic reconstruction methods, require high-performance compute clusters for timely analysis of data. This work focuses on time-sensitive analysis and processing of Advanced Photon Source data on geographically distributed resources. Two main challenges are considered: (i) modeling the performance of tomographic reconstruction workflows and (ii) transparent execution of these workflows on distributed resources. For the former, three main stages are considered: (i) data transfer between storage and computational resources, (ii) wait/queue time of reconstruction jobs at compute resources, and (iii) computation of reconstruction tasks. These performance models allow evaluation and estimation of the execution time of any given iterative tomographic reconstruction workflow that runs on geographically distributed resources. For the latter challenge, a workflow management system is built that can automate the execution of workflows and minimize user interaction with the underlying infrastructure. The system uses Globus to perform secure and efficient data transfer operations. The proposed models and the workflow management system are evaluated using three high-performance computing and two storage resources, all of which are geographically distributed. Workflows with different computational requirements were created using two compute-intensive tomographic reconstruction algorithms.
Experimental evaluation shows that the proposed models and system can be used to select the optimum resources, which in turn can provide up to 3.13× speedup (on the tested resources). Moreover, the error rates of the models range between 2.1% and 23.3% (considering workflow execution times), and the accuracy of the model estimations increases with higher computational demands in reconstruction tasks. PMID:27359149
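The three-stage decomposition above can be illustrated with a minimal sketch, assuming each stage is estimated by a simple linear model (transfer time = data size / bandwidth, compute time = work / throughput) and that the expected queue wait is given. The paper's actual models and fitted parameters are not reproduced; `Resource`, its fields, and all numbers in the example are hypothetical. Resource selection then reduces to picking the smallest estimate, and relative error is one common way (assumed here) to compare an estimate with a measured time.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Resource:
    name: str
    bandwidth_gb_per_s: float   # storage-to-compute transfer rate (assumed)
    expected_queue_s: float     # predicted wait/queue time for a job (assumed)
    throughput_gflops: float    # sustained rate for the reconstruction kernel (assumed)


def estimate_time(r: Resource, data_gb: float, work_gflop: float) -> float:
    """Estimated end-to-end time = data transfer + queue wait + computation."""
    transfer = data_gb / r.bandwidth_gb_per_s
    compute = work_gflop / r.throughput_gflops
    return transfer + r.expected_queue_s + compute


def select_resource(resources: List[Resource], data_gb: float,
                    work_gflop: float) -> Resource:
    """Pick the resource with the smallest estimated execution time."""
    return min(resources, key=lambda r: estimate_time(r, data_gb, work_gflop))


def relative_error(estimated: float, measured: float) -> float:
    """Relative model error in percent."""
    return abs(estimated - measured) / measured * 100.0


# Hypothetical resources: a small local cluster and a faster but remote HPC system.
local = Resource("local-cluster", 1.0, 0.0, 100.0)
remote = Resource("remote-hpc", 0.5, 60.0, 2000.0)
best = select_resource([local, remote], data_gb=50.0, work_gflop=100_000.0)
speedup = estimate_time(local, 50.0, 100_000.0) / estimate_time(best, 50.0, 100_000.0)
```

Even with a slower link and a queue delay, the remote system wins here because computation dominates, which mirrors the abstract's observation that estimates matter most for compute-heavy reconstructions.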
NASA Astrophysics Data System (ADS)
Tomlin, M. C.; Jenkyns, R.
2015-12-01
Ocean Networks Canada (ONC) collects data from observatories in the northeast Pacific, Salish Sea, Arctic Ocean, Atlantic Ocean, and land-based sites in British Columbia. Data are streamed, collected autonomously, or transmitted via satellite from a variety of instruments. The Software Engineering group at ONC develops and maintains Oceans 2.0, an in-house software system that acquires and archives data from sensors, and makes data available to scientists, the public, government and non-government agencies. The Oceans 2.0 workflow tool was developed by ONC to manage a large volume of tasks and processes required for instrument installation, recovery and maintenance activities. Since 2013, the workflow tool has supported 70 expeditions and grown to include 30 different workflow processes for the increasing complexity of infrastructures at ONC. The workflow tool strives to keep pace with an increasing heterogeneity of sensors, connections and environments by supporting versioning of existing workflows, and allowing the creation of new processes and tasks. Despite challenges in training and gaining mutual support from multidisciplinary teams, the workflow tool has become invaluable in project management in an innovative setting. It provides a collective place to contribute to ONC's diverse projects and expeditions and encourages more repeatable processes, while promoting interactions between the multidisciplinary teams who manage various aspects of instrument development and the data they produce. The workflow tool inspires documentation of terminologies and procedures, and effectively links to other tools at ONC such as JIRA, Alfresco and Wiki. Motivated by growing sensor schemes, modes of collecting data, archiving, and data distribution at ONC, the workflow tool ensures that infrastructure is managed completely from instrument purchase to data distribution. It integrates all areas of expertise and helps fulfill ONC's mandate to offer quality data to users.
Virtual Sensor Web Architecture
NASA Astrophysics Data System (ADS)
Bose, P.; Zimdars, A.; Hurlburt, N.; Doug, S.
2006-12-01
NASA envisions the development of smart sensor webs: intelligent and integrated observation networks that harness distributed sensing assets, their associated continuous and complex data sets, and predictive observation processing mechanisms for timely, collaborative hazard mitigation and enhanced science productivity and reliability. This paper presents the Virtual Sensor Web Infrastructure for Collaborative Science (VSICS) architecture for sustained coordination of (numerical and distributed) model-based processing, closed-loop resource allocation, and observation planning. VSICS's key ideas include (i) rich descriptions of sensors as services based on semantic markup languages such as OWL and SensorML; (ii) service-oriented workflow composition and repair for simple and ensemble models, together with event-driven workflow execution based on event-based and distributed workflow management mechanisms; and (iii) development of autonomous model interaction management capabilities providing closed-loop control of collection resources driven by competing targeted observation needs. We present results from initial work on collaborative science processing involving distributed services (the COSEC framework) that is being extended to create VSICS.
The Archive Solution for Distributed Workflow Management Agents of the CMS Experiment at LHC
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kuznetsov, Valentin; Fischer, Nils Leif; Guo, Yuyi
2018-03-19
The CMS experiment at the CERN LHC developed the Workflow Management Archive system to persistently store unstructured framework job report documents produced by distributed workflow management agents. In this paper we present its architecture, implementation, deployment, and integration with the CMS and CERN computing infrastructures, such as the central HDFS and Hadoop Spark cluster. The system leverages modern technologies such as a document-oriented database and the Hadoop ecosystem to provide the necessary flexibility to reliably process, store, and aggregate O(1M) (on the order of one million) documents on a daily basis. We describe the data transformation, the short- and long-term storage layers, the query language, and the aggregation pipeline developed to visualize various performance metrics that help CMS data operators assess the performance of the CMS computing system.
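The daily aggregation step can be pictured as a group-by over job report documents. The sketch below assumes a simplified, hypothetical document shape (`site`, `exit_code`, `cpu_time_s`, `wall_time_s`); real framework job reports are far richer, the metrics chosen here are illustrative, and the production system runs on Hadoop/Spark rather than plain Python.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional


def aggregate_by_site(docs: Iterable[dict]) -> Dict[str, dict]:
    """Group job-report documents by site and derive simple performance metrics."""
    totals: Dict[str, dict] = defaultdict(
        lambda: {"jobs": 0, "failures": 0, "cpu_s": 0.0, "wall_s": 0.0})
    for doc in docs:
        # Documents are unstructured: skip any that lack the fields we group on.
        if "site" not in doc or "exit_code" not in doc:
            continue
        t = totals[doc["site"]]
        t["jobs"] += 1
        t["failures"] += int(doc["exit_code"] != 0)
        t["cpu_s"] += doc.get("cpu_time_s", 0.0)
        t["wall_s"] += doc.get("wall_time_s", 0.0)

    metrics: Dict[str, dict] = {}
    for site, t in totals.items():
        efficiency: Optional[float] = t["cpu_s"] / t["wall_s"] if t["wall_s"] else None
        metrics[site] = {
            "jobs": t["jobs"],
            "failure_rate": t["failures"] / t["jobs"],
            # Fraction of wall-clock time spent on CPU across the site's jobs.
            "cpu_efficiency": efficiency,
        }
    return metrics


# Hypothetical documents; the last one is malformed and is skipped.
docs = [
    {"site": "T1_A", "exit_code": 0, "cpu_time_s": 80.0, "wall_time_s": 100.0},
    {"site": "T1_A", "exit_code": 1, "cpu_time_s": 20.0, "wall_time_s": 100.0},
    {"site": "T2_B", "exit_code": 0, "cpu_time_s": 50.0, "wall_time_s": 50.0},
    {"note": "malformed"},
]
report = aggregate_by_site(docs)
```

Tolerating documents with missing fields matters here because the archived job reports are unstructured.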
Workflow management in large distributed systems
NASA Astrophysics Data System (ADS)
Legrand, I.; Newman, H.; Voicu, R.; Dobre, C.; Grigoras, C.
2011-12-01
The MonALISA (Monitoring Agents using a Large Integrated Services Architecture) framework provides a distributed service system capable of controlling and optimizing large-scale, data-intensive applications. An essential part of managing large-scale, distributed data-processing facilities is a system that monitors, in near real time, the computing facilities, storage, networks, and the very large number of applications running on these systems. The monitoring information gathered for all subsystems is essential for developing the required higher-level services (the components that provide decision support and some degree of automated decisions) and for maintaining and optimizing workflow in large-scale distributed systems. These management and global optimization functions are performed by higher-level agent-based services. We present several applications of MonALISA's higher-level services, including optimized dynamic routing, control, data-transfer scheduling, distributed job scheduling, dynamic allocation of storage resources to running jobs, and automated management of remote services among a large set of grid facilities.
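Optimized dynamic routing, in its simplest form, means recomputing the cheapest path whenever monitored link metrics change. The sketch below uses Dijkstra's algorithm over link costs fed by monitoring data; it illustrates the general idea only and is not MonALISA's actual routing algorithm. The site names, costs, and helper functions are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple

Graph = Dict[str, Dict[str, float]]


def best_route(graph: Graph, src: str, dst: str) -> Tuple[float, List[str]]:
    """Dijkstra shortest path; edge weights are monitored link costs (e.g. latency)."""
    dist = {src: 0.0}
    prev: Dict[str, str] = {}
    heap = [(0.0, src)]
    visited = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue
        visited.add(node)
        if node == dst:
            break
        for nxt, cost in graph.get(node, {}).items():
            nd = d + cost
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                prev[nxt] = node
                heapq.heappush(heap, (nd, nxt))
    if dst not in dist:
        return float("inf"), []
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return dist[dst], path[::-1]


def update_link(graph: Graph, a: str, b: str, cost: float) -> None:
    """Apply a new monitored measurement to a symmetric link."""
    graph.setdefault(a, {})[b] = cost
    graph.setdefault(b, {})[a] = cost


# Hypothetical sites and made-up costs.
links: Graph = {}
update_link(links, "siteA", "siteC", 150.0)
update_link(links, "siteA", "siteB", 10.0)
update_link(links, "siteB", "siteC", 120.0)
before = best_route(links, "siteA", "siteC")   # via siteB, cost 130
update_link(links, "siteB", "siteC", 200.0)    # monitored link degrades
after = best_route(links, "siteA", "siteC")    # direct link, cost 150
```

Because routes are recomputed from the latest measurements, a degraded link automatically shifts traffic to the next-best path.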
NASA Astrophysics Data System (ADS)
Kintsakis, Athanassios M.; Psomopoulos, Fotis E.; Symeonidis, Andreas L.; Mitkas, Pericles A.
Hermes introduces a new "describe once, run anywhere" paradigm for the execution of bioinformatics workflows in hybrid cloud environments. It combines the traditional features of parallelization-enabled workflow management systems and of distributed computing platforms in a container-based approach. It offers seamless deployment, overcoming the burden of setting up and configuring the software and network requirements. Most importantly, Hermes fosters the reproducibility of scientific workflows by supporting standardization of the software execution environment, thus leading to consistent scientific workflow results and accelerating scientific output.
Detecting distant homologies on protozoans metabolic pathways using scientific workflows.
da Cruz, Sérgio Manuel Serra; Batista, Vanessa; Silva, Edno; Tosta, Frederico; Vilela, Clarissa; Cuadrat, Rafael; Tschoeke, Diogo; Dávila, Alberto M R; Campos, Maria Luiza Machado; Mattoso, Marta
2010-01-01
Bioinformatics experiments are typically composed of programs in pipelines that manipulate an enormous quantity of data. An interesting approach for managing those experiments is through workflow management systems (WfMS). In this work we discuss WfMS features to support genome homology workflows and present some relevant issues for typical genomic experiments. Our evaluation used the Kepler WfMS to manage a real genomic pipeline, named OrthoSearch, originally defined as a Perl script. We show a case study detecting distant homologies on trypanosomatid metabolic pathways. Our results reinforce the benefits of WfMS over scripting languages and point out challenges to WfMS in distributed environments.
Wong, Stephen T C; Tjandra, Donny; Wang, Huili; Shen, Weimin
2003-09-01
Few information systems today offer a flexible means to define and manage the automated part of radiology processes, which provide clinical imaging services for the entire healthcare organization. Even fewer provide a coherent architecture that can easily cope with heterogeneity and the inevitable local adaptation of applications, and that can integrate clinical and administrative information to aid better clinical, operational, and business decisions. We describe an innovative enterprise architecture for image information management systems to meet these needs. Such a system is based on the interplay of production workflow management, distributed object computing, Java and Web techniques, and in-depth domain knowledge of radiology operations. Our design adopts the "4+1" architectural view approach. In this new architecture, PACS and RIS become one, while user interaction can be automated by customized workflow processes. Clinical service applications are implemented as active components. They can reasonably be substituted by locally adapted applications and can be replicated for fault tolerance and load balancing. Furthermore, the workflow-enabled digital radiology system would provide powerful query and statistical functions for managing resources and improving productivity. This work may open a new direction in image information management. We illustrate the design with examples taken from an implemented system.
Guest Editor's introduction
NASA Astrophysics Data System (ADS)
Chrysanthis, Panos K.
1996-12-01
Computer Science Department, University of Pittsburgh, Pittsburgh, PA 15260, USA
This special issue focuses on current efforts to represent and support workflows that integrate information systems and human resources within a business or manufacturing enterprise. Workflows may also be viewed as an emerging computational paradigm for effective structuring of cooperative applications involving human users and access to diverse data types not necessarily maintained by traditional database management systems. A workflow is an automated organizational process (also called business process) which consists of a set of activities or tasks that need to be executed in a particular controlled order over a combination of heterogeneous database systems and legacy systems. Within workflows, tasks are performed cooperatively by either human or computational agents in accordance with their roles in the organizational hierarchy. The challenge in facilitating the implementation of workflows lies in developing efficient workflow management systems. A workflow management system (also called workflow server, workflow engine or workflow enactment system) provides the necessary interfaces for coordination and communication among human and computational agents to execute the tasks involved in a workflow and controls the execution orderings of tasks as well as the flow of data that these tasks manipulate. That is, the workflow management system is responsible for correctly and reliably supporting the specification, execution, and monitoring of workflows. The six papers selected (out of the twenty-seven submitted for this special issue of Distributed Systems Engineering) address different aspects of these three functional components of a workflow management system.
In the first paper, `Correctness issues in workflow management', Kamath and Ramamritham discuss the important issue of correctness in workflow management that constitutes a prerequisite for the use of workflows in the automation of the critical organizational/business processes. In particular, this paper examines the issues of execution atomicity and failure atomicity, differentiating between correctness requirements of system failures and logical failures, and surveys techniques that can be used to ensure data consistency in workflow management systems. While the first paper is concerned with correctness assuming transactional workflows in which selective transactional properties are associated with individual tasks or the entire workflow, the second paper, `Scheduling workflows by enforcing intertask dependencies' by Attie et al, assumes that the tasks can be either transactions or other activities involving legacy systems. This second paper describes the modelling and specification of conditions involving events and dependencies among tasks within a workflow using temporal logic and finite state automata. It also presents a scheduling algorithm that enforces all stated dependencies by executing at any given time only those events that are allowed by all the dependency automata and in an order as specified by the dependencies. In any system with decentralized control, there is a need to effectively cope with the tension that exists between autonomy and consistency requirements. In `A three-level atomicity model for decentralized workflow management systems', Ben-Shaul and Heineman focus on the specific requirement of enforcing failure atomicity in decentralized, autonomous and interacting workflow management systems. 
Their paper describes a model in which each workflow manager must be able to specify the sequence of tasks that comprise an atomic unit for the purposes of correctness, and the degrees of local and global atomicity for the purpose of cooperation with other workflow managers. The paper also discusses a realization of this model in which treaties and summits provide an agreement mechanism, while underlying transaction managers are responsible for maintaining failure atomicity. The fourth and fifth papers are experience papers describing a workflow management system and a large scale workflow application, respectively. Schill and Mittasch, in `Workflow management systems on top of OSF DCE and OMG CORBA', describe a decentralized workflow management system and discuss its implementation using two standardized middleware platforms, namely, OSF DCE and OMG CORBA. The system supports a new approach to workflow management, introducing several new concepts such as data type management for integrating various types of data and quality of service for various services provided by servers. A problem common to both database applications and workflows is the handling of missing and incomplete information. This is particularly pervasive in an `electronic market' with a huge number of retail outlets producing and exchanging volumes of data, the application discussed in `Information flow in the DAMA project beyond database managers: information flow managers'. Motivated by the need for a method that allows a task to proceed in a timely manner if not all data produced by other tasks are available by its deadline, Russell et al propose an architectural framework and a language that can be used to detect, approximate and, later on, to adjust missing data if necessary. 
The final paper, `The evolution towards flexible workflow systems' by Nutt, is complementary to the other papers and is a survey of issues and of work related to both workflow and computer supported collaborative work (CSCW) areas. In particular, the paper provides a model and a categorization of the dimensions which workflow management and CSCW systems share. Besides summarizing the recent advancements towards efficient workflow management, the papers in this special issue suggest areas open to investigation and it is our hope that they will also provide the stimulus for further research and development in the area of workflow management systems.
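The dependency-enforcing scheduler summarized for the second paper (Attie et al) can be sketched as follows, under simplifying assumptions: each dependency is a small deterministic automaton over task events, an event outside an automaton's alphabet is unconstrained by it, and events that are not yet allowed are delayed and retried. The event names and the single ordering dependency below are hypothetical, and the paper's temporal-logic specification and automaton construction are not reproduced.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass
class DependencyAutomaton:
    """Deterministic automaton over task events; events outside `alphabet` are ignored."""
    name: str
    alphabet: Set[str]
    transitions: Dict[Tuple[str, str], str]  # (state, event) -> next state
    state: str

    def allows(self, event: str) -> bool:
        return event not in self.alphabet or (self.state, event) in self.transitions

    def step(self, event: str) -> None:
        if event in self.alphabet:
            self.state = self.transitions[(self.state, event)]


class Scheduler:
    """Execute an event only if every dependency automaton allows it; otherwise delay it."""

    def __init__(self, automata: List[DependencyAutomaton]):
        self.automata = automata
        self.pending: List[str] = []
        self.history: List[str] = []

    def submit(self, event: str) -> None:
        self.pending.append(event)
        self._drain()

    def _drain(self) -> None:
        # Keep retrying delayed events until no further event can be accepted.
        progress = True
        while progress:
            progress = False
            for event in list(self.pending):
                if all(a.allows(event) for a in self.automata):
                    for a in self.automata:
                        a.step(event)
                    self.pending.remove(event)
                    self.history.append(event)
                    progress = True


# Hypothetical dependency: task T2 may start only after task T1 commits.
order_dep = DependencyAutomaton(
    name="T1 commits before T2 starts",
    alphabet={"commit_T1", "start_T2"},
    transitions={("init", "commit_T1"): "done", ("done", "start_T2"): "done"},
    state="init",
)
sched = Scheduler([order_dep])
sched.submit("start_T2")          # not yet allowed, so it is delayed
delayed = list(sched.history)
sched.submit("commit_T1")         # accepted, which then releases start_T2
```

Requiring unanimous agreement from all automata is what guarantees that every stated dependency holds in the executed order.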
RESTFul based heterogeneous Geoprocessing workflow interoperation for Sensor Web Service
NASA Astrophysics Data System (ADS)
Yang, Chao; Chen, Nengcheng; Di, Liping
2012-10-01
Advanced sensors on board satellites offer detailed Earth observations. A workflow is one approach to designing, implementing and constructing a flexible, live link between these sensors' resources and users. It can coordinate, organize and aggregate distributed sensor Web services to meet the requirements of a complex Earth observation scenario. A RESTful workflow interoperation method is proposed to integrate heterogeneous workflows into an interoperable unit. The Atom protocols are applied to describe and manage workflow resources. The XML Process Definition Language (XPDL) and Business Process Execution Language (BPEL) workflow standards are applied to structure, separately, a workflow that accesses sensor information and a workflow that processes it. Then, a scenario involving nitrogen dioxide (NO2) from a volcanic eruption is used to investigate the feasibility of the proposed method. The RESTful workflow interoperation system can describe, publish, discover, access and coordinate heterogeneous Geoprocessing workflows.
An access control model with high security for distributed workflow and real-time application
NASA Astrophysics Data System (ADS)
Han, Ruo-Fei; Wang, Hou-Xiang
2007-11-01
The traditional mandatory access control (MAC) policy is regarded as strict and inflexible. MAC is so restrictive that few information systems adopt it at the cost of convenience, except in particular cases with high security requirements, such as military or government applications. However, with the increasing demand for flexibility, even some access control systems in military applications have switched to role-based access control (RBAC), which is well known for its flexibility. Although RBAC can meet the demand for flexibility, it is weak in dynamic authorization and consequently does not fit well in workflow management systems. Task-role-based access control (T-RBAC) was introduced to solve this problem. It combines the advantages of RBAC and task-based access control (TBAC), which uses tasks to manage permissions dynamically. To satisfy the requirements of systems that are distributed, defined by well-structured workflow processes, and time-critical, this paper analyzes the principles of MAC and introduces them into an improved T&RBAC model based on T-RBAC. Finally, a conceptual task-role-based access control model with high security for distributed workflow and real-time applications (A_T&RBAC) is built, and its performance is briefly analyzed.
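One way to read the combination of T-RBAC with MAC is as a conjunction of checks: the task must be active, one of the user's roles must hold the permission for that task, and the mandatory security levels must permit the access. The sketch below makes those assumptions explicit, using Bell-LaPadula-style "no read up, no write down" rules for the MAC part; it is an illustration, not the A_T&RBAC model as defined in the paper, and it ignores the real-time aspect. All role, task, user and level names are hypothetical.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Dict, Set, Tuple


class Level(IntEnum):
    UNCLASSIFIED = 0
    CONFIDENTIAL = 1
    SECRET = 2
    TOP_SECRET = 3


@dataclass(frozen=True)
class User:
    name: str
    roles: frozenset
    clearance: Level


# (role, task) -> permitted operations; the assignments below are hypothetical.
TaskRolePerms = Dict[Tuple[str, str], Set[str]]


def check_access(user: User, task: str, active_tasks: Set[str], op: str,
                 obj_level: Level, perms: TaskRolePerms) -> bool:
    # Task-based part: permissions exist only while the task is active.
    if task not in active_tasks:
        return False
    # Role-based part: some role of the user must hold `op` for this task.
    if not any(op in perms.get((role, task), set()) for role in user.roles):
        return False
    # Mandatory part (Bell-LaPadula style): no read up, no write down.
    if op == "read":
        return user.clearance >= obj_level
    if op == "write":
        return user.clearance <= obj_level
    return False


perms: TaskRolePerms = {
    ("analyst", "review_report"): {"read"},
    ("officer", "approve_report"): {"read", "write"},
}
alice = User("alice", frozenset({"analyst"}), Level.SECRET)
bob = User("bob", frozenset({"officer"}), Level.SECRET)
```

Keeping the three checks independent makes it easy to see which layer denied a request: task state, role permission, or security level.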
Wang, Ximing; Liu, Brent J; Martinez, Clarisa; Zhang, Xuejun; Winstein, Carolee J
2015-01-01
Imaging-based clinical trials can benefit from a solution to efficiently collect, analyze, and distribute multimedia data at various stages within the workflow. Currently, the data management needs of these trials are typically addressed with custom-built systems. However, developing custom-built systems for versatile workflows can be resource-consuming. To address these challenges, we present a system with a workflow engine for imaging-based clinical trials. The system enables a project coordinator to build a data collection and management system specific to the study protocol workflow without programming. A Web Access to DICOM Objects (WADO) module with novel features is integrated to further facilitate imaging-related studies. The system was initially evaluated with an imaging-based rehabilitation clinical trial. The evaluation shows that the development cost can be much reduced compared with a custom-built system. By providing a way to customize a system and automate the workflow, the system can save development time and reduce errors, especially in imaging clinical trials. PMID:25870169
Web-Accessible Scientific Workflow System for Performance Monitoring
DOE Office of Scientific and Technical Information (OSTI.GOV)
Roelof Versteeg; Trevor Rowe
2006-03-01
We describe the design and implementation of a web-accessible scientific workflow system for environmental monitoring. This workflow environment integrates distributed, automated data acquisition with server-side data management and information visualization through flexible browser-based data access tools. Component technologies include a rich browser-based client (using dynamic JavaScript and HTML/CSS) for data selection, a back-end server that uses PHP for data processing, user management, and result delivery, and third-party applications that are invoked by the back end using web services. This environment allows for reproducible, transparent result generation by a diverse user base. It has been implemented for several monitoring systems with different degrees of complexity.
A virtual data language and system for scientific workflow management in data grid environments
NASA Astrophysics Data System (ADS)
Zhao, Yong
With advances in scientific instrumentation and simulation, scientific data is growing fast in both size and analysis complexity. So-called Data Grids aim to provide high-performance, distributed data analysis infrastructure for data-intensive sciences, where scientists distributed worldwide need to extract information from large collections of data, and to share both data products and the resources needed to produce and store them. However, the description, composition, and execution of even logically simple scientific workflows are often complicated by the need to deal with "messy" issues like heterogeneous storage formats and ad-hoc file system structures. We show how these difficulties can be overcome via a typed workflow notation called virtual data language, within which issues of physical representation are cleanly separated from logical typing, and by the implementation of this notation within the context of a powerful virtual data system that supports distributed execution. The resulting language and system are capable of expressing complex workflows in a simple compact form, enacting those workflows in distributed environments, monitoring and recording the execution processes, and tracing the derivation history of data products. We describe the motivation, design, implementation, and evaluation of the virtual data language and system, and the application of the virtual data paradigm in various science disciplines, including astronomy and cognitive neuroscience.
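The separation between logical typing and physical representation, together with derivation tracking, can be illustrated with a small Python sketch. This is not the virtual data language's syntax or the system's implementation; the class names (`LogicalType`, `Dataset`, `Provenance`), the one-file-per-member mapper, and the `run_procedure` helper are all hypothetical. Procedures are written against logical members, a mapper decides where each member lives on disk, and every execution leaves a derivation record that can later be walked backwards.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Mapper = Callable[[str], str]


@dataclass(frozen=True)
class LogicalType:
    """A logical data type, declared without reference to file formats or paths."""
    name: str


@dataclass
class Dataset:
    logical_type: LogicalType
    members: List[str]  # logical member names, e.g. subject or run IDs


def directory_mapper(root: str, suffix: str) -> Mapper:
    """Hypothetical mapper: one file per member under `root`."""
    return lambda member: f"{root}/{member}{suffix}"


@dataclass
class Provenance:
    """Derivation records: which procedure produced which output from which input."""
    records: List[Dict[str, str]] = field(default_factory=list)

    def lineage(self, product: str) -> List[str]:
        # Walk back through the records (assumed acyclic) to the original input.
        chain = [product]
        while True:
            parents = [r["input"] for r in self.records if r["output"] == chain[-1]]
            if not parents:
                return chain
            chain.append(parents[0])


def run_procedure(name: str, action: Callable[[str, str], None], data: Dataset,
                  out_type: LogicalType, in_map: Mapper, out_map: Mapper,
                  prov: Provenance) -> Dataset:
    """Apply `action` member by member; only here do logical names become paths."""
    for member in data.members:
        src, dst = in_map(member), out_map(member)
        action(src, dst)  # stands in for a real, possibly remote, execution
        prov.records.append({"procedure": name, "input": src, "output": dst})
    return Dataset(out_type, list(data.members))
```

Changing the storage layout only means supplying a different mapper; the procedure and the provenance query are unaffected, which is the point of keeping the two concerns apart.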
Biowep: a workflow enactment portal for bioinformatics applications.
Romano, Paolo; Bartocci, Ezio; Bertolini, Guglielmo; De Paoli, Flavio; Marra, Domenico; Mauri, Giancarlo; Merelli, Emanuela; Milanesi, Luciano
2007-03-08
The huge amount of biological information, its distribution over the Internet and the heterogeneity of available software tools make the adoption of new data integration and analysis network tools a necessity in bioinformatics. ICT standards and tools, like Web Services and Workflow Management Systems (WMS), can support the creation and deployment of such systems. Many Web Services are already available and some WMS have been proposed. They assume that researchers know which bioinformatics resources can be reached through a programmatic interface and that they are skilled in programming and building workflows. Therefore, they are not viable for the majority of unskilled researchers. A portal enabling these researchers to benefit from new technologies is still missing. We designed biowep, a web-based client application that allows for the selection and execution of a set of predefined workflows. The system is available online. The biowep architecture includes a Workflow Manager, a User Interface and a Workflow Executor. The task of the Workflow Manager is the creation and annotation of workflows. These can be created by using either the Taverna Workbench or BioWMS. Enactment of workflows is carried out by FreeFluo for Taverna workflows and by BioAgent/Hermes, a mobile agent-based middleware, for BioWMS ones. The main processing steps of workflows are annotated on the basis of their input and output, elaboration type and application domain by using a classification of bioinformatics data and tasks. The interface supports user authentication and profiling. Workflows can be selected on the basis of users' profiles and can be searched through their annotations. Results can be saved. We developed a web system that supports the selection and execution of predefined workflows, thus simplifying access for all researchers. 
The implementation of Web Services allowing specialized software to interact with an exhaustive set of biomedical databases and analysis software and the creation of effective workflows can significantly improve automation of in-silico analysis. Biowep is available for interested researchers as a reference portal. They are invited to submit their workflows to the workflow repository. Biowep is further being developed in the sphere of the Laboratory of Interdisciplinary Technologies in Bioinformatics - LITBIO.
Biowep: a workflow enactment portal for bioinformatics applications
Romano, Paolo; Bartocci, Ezio; Bertolini, Guglielmo; De Paoli, Flavio; Marra, Domenico; Mauri, Giancarlo; Merelli, Emanuela; Milanesi, Luciano
2007-01-01
Background The huge amount of biological information, its distribution over the Internet and the heterogeneity of available software tools make the adoption of new data integration and analysis network tools a necessity in bioinformatics. ICT standards and tools, like Web Services and Workflow Management Systems (WMS), can support the creation and deployment of such systems. Many Web Services are already available and some WMS have been proposed. They assume that researchers know which bioinformatics resources can be reached through a programmatic interface and that they are skilled in programming and building workflows. Therefore, they are not viable for the majority of unskilled researchers. A portal enabling these researchers to benefit from new technologies is still missing. Results We designed biowep, a web-based client application that allows for the selection and execution of a set of predefined workflows. The system is available online. The biowep architecture includes a Workflow Manager, a User Interface and a Workflow Executor. The task of the Workflow Manager is the creation and annotation of workflows. These can be created by using either the Taverna Workbench or BioWMS. Enactment of workflows is carried out by FreeFluo for Taverna workflows and by BioAgent/Hermes, a mobile agent-based middleware, for BioWMS ones. The main processing steps of workflows are annotated on the basis of their input and output, elaboration type and application domain by using a classification of bioinformatics data and tasks. The interface supports user authentication and profiling. Workflows can be selected on the basis of users' profiles and can be searched through their annotations. Results can be saved. Conclusion We developed a web system that supports the selection and execution of predefined workflows, thus simplifying access for all researchers. 
The implementation of Web Services allowing specialized software to interact with an exhaustive set of biomedical databases and analysis software and the creation of effective workflows can significantly improve automation of in-silico analysis. Biowep is available for interested researchers as a reference portal. They are invited to submit their workflows to the workflow repository. Biowep is further being developed in the sphere of the Laboratory of Interdisciplinary Technologies in Bioinformatics – LITBIO. PMID:17430563
A patient workflow management system built on guidelines.
Dazzi, L.; Fassino, C.; Saracco, R.; Quaglini, S.; Stefanelli, M.
1997-01-01
To provide high-quality, shared, and distributed medical care, clinical and organizational issues need to be integrated. This work describes a methodology for developing a Patient Workflow Management System based on a detailed model of both the medical work process and the organizational structure. We assume that the medical work process is represented through clinical practice guidelines, and that an ontological description of the organization is available. Thus, we developed tools 1) to acquire the medical knowledge contained in a guideline, 2) to translate the resulting formalized guideline into a computational formalism, namely a Petri Net, and 3) to maintain different representation levels. The high-level representation guarantees that the Patient Workflow follows the guideline prescriptions, while the low-level representation takes into account the specific characteristics of the organization and allows resources to be allocated for managing a specific patient in daily practice. PMID:9357606
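To make the Petri net target concrete, the following Python sketch implements basic place/transition semantics: a transition is enabled when every input place holds enough tokens, and firing it consumes one token per input arc and produces one per output arc. The guideline fragment in `guideline_fragment()` is invented for illustration and is not taken from the paper; the `physician_free` place only hints at how organizational resources from the low-level representation might appear as tokens.

```python
from collections import Counter
from typing import Dict, List, Tuple


class PetriNet:
    """Place/transition net: a transition is enabled when each input place holds
    at least as many tokens as it has arcs into the transition."""

    def __init__(self, marking: Dict[str, int]):
        self.marking = Counter(marking)
        self.transitions: Dict[str, Tuple[Counter, Counter]] = {}

    def add_transition(self, name: str, inputs: List[str], outputs: List[str]) -> None:
        self.transitions[name] = (Counter(inputs), Counter(outputs))

    def enabled(self) -> List[str]:
        return [name for name, (ins, _) in self.transitions.items()
                if all(self.marking[p] >= n for p, n in ins.items())]

    def fire(self, name: str) -> None:
        if name not in self.enabled():
            raise ValueError(f"transition {name!r} is not enabled")
        ins, outs = self.transitions[name]
        self.marking.subtract(ins)  # consume input tokens
        self.marking.update(outs)   # produce output tokens


def guideline_fragment() -> PetriNet:
    """Hypothetical guideline step: a visit needs a free physician, then tests are ordered."""
    net = PetriNet({"patient_admitted": 1, "physician_free": 1})
    net.add_transition("visit", ["patient_admitted", "physician_free"],
                       ["visited", "physician_free"])
    net.add_transition("order_tests", ["visited"], ["tests_pending"])
    return net
```

Because the physician token is returned when `visit` fires, the same resource remains available for later patients, while the enabled-transition check keeps each patient's path in the order the modeled guideline prescribes.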
NASA Astrophysics Data System (ADS)
Ferreira da Silva, R.; Filgueira, R.; Deelman, E.; Atkinson, M.
2016-12-01
We present Asterism, an open-source data-intensive framework that combines the Pegasus and dispel4py workflow systems. Asterism aims to simplify the effort required to develop data-intensive applications that run across multiple heterogeneous resources, without users having to: re-formulate their methods according to different enactment systems; manage the data distribution across systems; parallelize their methods; co-place and schedule their methods with computing resources; and store and transfer large/small volumes of data. Asterism's key element is to leverage the strengths of each workflow system: dispel4py allows scientific applications to be developed locally and then automatically parallelized and scaled on a wide range of HPC infrastructures with no changes to the application's code; Pegasus orchestrates the distributed execution of applications while providing portability, automated data management, recovery, debugging, and monitoring, without users needing to worry about the particulars of the target execution systems. Asterism leverages the levels of abstraction provided by each workflow system to describe hybrid workflows where no information about the underlying infrastructure is required beforehand. The feasibility of Asterism has been evaluated using the seismic ambient noise cross-correlation application, a common data-intensive analysis pattern used by many seismologists. The application preprocesses (Phase1) and cross-correlates (Phase2) traces from several seismic stations. The Asterism workflow is implemented as a Pegasus workflow composed of two tasks (Phase1 and Phase2), where each phase represents a dispel4py workflow. Pegasus tasks describe the input/output data at a logical level, the data dependency between tasks, and the e-Infrastructures and the execution engine to run each dispel4py workflow. 
We have instantiated the workflow using data from 1000 stations from the IRIS services, and run it across two heterogeneous resources described as Docker containers: MPI (Container2) and Storm (Container3) clusters (Figure 1). Each dispel4py workflow is mapped to a particular execution engine, and data transfers between resources are automatically handled by Pegasus. Asterism is freely available online at http://github.com/dispel4py/pegasus_dispel4py.
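The two phases can be pictured with a minimal, sequential Python sketch. The specific preprocessing (mean removal followed by one-bit normalization) is an assumption chosen for brevity, not a description of the Asterism code, and no Pegasus or dispel4py API is used. What the sketch does show is the dependency structure: Phase 1 runs independently per station, and Phase 2 needs Phase 1 output for every station pair.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def _sign(x: float) -> float:
    return 1.0 if x > 0 else -1.0 if x < 0 else 0.0


def preprocess(trace: List[float]) -> List[float]:
    """Phase 1 (assumed steps): remove the mean, then one-bit normalize."""
    mean = sum(trace) / len(trace)
    return [_sign(x - mean) for x in trace]


def cross_correlate(a: List[float], b: List[float], max_lag: int) -> Dict[int, float]:
    """Phase 2: c(lag) = sum over t of a[t] * b[t + lag], for -max_lag <= lag <= max_lag."""
    return {lag: sum(a[t] * b[t + lag] for t in range(len(a)) if 0 <= t + lag < len(b))
            for lag in range(-max_lag, max_lag + 1)}


def ambient_noise(traces: Dict[str, List[float]],
                  max_lag: int) -> Dict[Tuple[str, str], Dict[int, float]]:
    # Phase 1: independent per station, so an engine can parallelize it freely.
    cleaned = {station: preprocess(tr) for station, tr in traces.items()}
    # Phase 2: depends on all Phase 1 outputs; one correlation per station pair.
    return {(s1, s2): cross_correlate(cleaned[s1], cleaned[s2], max_lag)
            for s1, s2 in combinations(sorted(cleaned), 2)}
```

Under this lag convention, a peak at a positive lag means the second station's trace is delayed relative to the first.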
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wu, Chase Qishi; Zhu, Michelle Mengxia
The advent of large-scale collaborative scientific applications has demonstrated the potential for broad scientific communities to pool globally distributed resources to produce unprecedented data acquisition, movement, and analysis. System resources including supercomputers, data repositories, computing facilities, network infrastructures, storage systems, and display devices have been increasingly deployed at national laboratories and academic institutes. These resources are typically shared by large communities of users over the Internet or dedicated networks and hence exhibit an inherently dynamic nature in their availability, accessibility, capacity, and stability. Scientific applications using either experimental facilities or computation-based simulations with various physical, chemical, climatic, and biological models feature diverse scientific workflows, as simple as linear pipelines or as complex as directed acyclic graphs, which must be executed and supported over wide-area networks with massively distributed resources. Application users oftentimes need to manually configure their computing tasks over networks in an ad hoc manner, significantly limiting the productivity of scientists and constraining the utilization of resources. The success of these large-scale distributed applications requires a highly adaptive and massively scalable workflow platform that provides automated and optimized computing and networking services. This project is to design and develop a generic Scientific Workflow Automation and Management Platform (SWAMP), which contains a web-based user interface specially tailored for a target application, a set of user libraries, and several easy-to-use computing and networking toolkits for application scientists to conveniently assemble, execute, monitor, and control complex computing workflows in heterogeneous high-performance network environments. 
SWAMP will enable the automation and management of the entire process of scientific workflows with the convenience of a few mouse clicks while hiding the implementation and technical details from end users. In particular, we will consider two types of applications with distinct performance requirements: data-centric and service-centric applications. For data-centric applications, the main workflow task involves large-volume data generation, cataloging, storage, and movement, typically from supercomputers or experimental facilities to a team of geographically distributed users, while for service-centric applications, the main focus of the workflow is on data archiving, preprocessing, filtering, synthesis, visualization, and other application-specific analysis. We will conduct a comprehensive comparison of existing workflow systems and choose the best-suited one with open-source code, a flexible system structure, and a large user base as the starting point for our development. Based on the chosen system, we will develop and integrate new components, including a black-box design of computing modules, performance monitoring and prediction, and workflow optimization and reconfiguration, which are missing from existing workflow systems. A modular design separating specification, execution, and monitoring aspects will be adopted to establish a common generic infrastructure suited for a wide spectrum of science applications. We will further design and develop efficient workflow mapping and scheduling algorithms to optimize workflow performance in terms of minimum end-to-end delay, maximum frame rate, and highest reliability. We will develop and demonstrate the SWAMP system in a local environment, the grid network, and the 100 Gbps Advanced Network Initiative (ANI) testbed. The demonstration will target scientific applications in climate modeling and high energy physics, and the functions to be demonstrated include workflow deployment, execution, steering, and reconfiguration. 
Throughout the project period, we will work closely with the science communities in the fields of climate modeling and high energy physics, including the Spallation Neutron Source (SNS) and Large Hadron Collider (LHC) projects, to mature the system for production use.
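End-to-end delay, the first of the optimization criteria listed above, is commonly evaluated as the critical (longest) path through the workflow DAG once modules have been mapped to nodes. A minimal Python sketch under that assumption follows; the per-module compute times and per-edge transfer times are treated as given inputs, and the mapping and scheduling algorithms the project proposes to develop are not shown. The module names in the example are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Dict, Tuple


def end_to_end_delay(compute: Dict[str, float],
                     transfer: Dict[Tuple[str, str], float]) -> float:
    """Length of the critical path through a mapped workflow DAG.

    compute[m]       -- execution time of module m on its assigned node
    transfer[(u, v)] -- time to move u's output to v's node (0 if co-located)
    Every module named in `transfer` must also appear in `compute`.
    """
    preds = {m: set() for m in compute}
    for (u, v) in transfer:
        preds[v].add(u)
    finish: Dict[str, float] = {}
    # static_order() yields each module after all of its predecessors.
    for m in TopologicalSorter(preds).static_order():
        start = max((finish[u] + transfer[(u, m)] for u in preds[m]), default=0.0)
        finish[m] = start + compute[m]
    return max(finish.values())
```

Comparing two candidate mappings then amounts to evaluating this function with the compute and transfer times each mapping implies; a scheduler would search over mappings to minimize it.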
NASA Astrophysics Data System (ADS)
Memon, Shahbaz; Vallot, Dorothée; Zwinger, Thomas; Neukirchen, Helmut
2017-04-01
Scientific communities generate complex simulations through the orchestration of semi-structured analysis pipelines, which involve the execution of large workflows on multiple, distributed and heterogeneous computing and data resources. Modeling the ice dynamics of glaciers requires workflows consisting of many non-trivial, computationally expensive processing tasks that are coupled to each other. From this domain, we present an e-Science use case, a workflow that requires the execution of a continuum ice flow model and a discrete-element-based calving model in an iterative manner. Apart from the execution, this workflow also contains data format conversion tasks that support the execution of ice flow and calving by means of transitions through sequential, nested and iterative steps. Thus, the management and monitoring of all the processing tasks, including data management and transfer, becomes more complex. From the implementation perspective, this workflow model was initially developed as a set of scripts using static data input and output references. As more scripts or modifications were introduced in response to user requirements, debugging and validating results became more cumbersome. To address these problems, we identified the need for a high-level scientific workflow tool through which all the above-mentioned processes can be achieved in an efficient and usable manner. We decided to make use of the e-Science middleware UNICORE (Uniform Interface to Computing Resources), which allows seamless and automated access to different heterogeneous and distributed resources and is supported by a scientific workflow engine. Based on this, we developed a high-level scientific workflow model for coupling massively parallel High-Performance Computing (HPC) jobs: a continuum ice sheet model (Elmer/Ice) and a discrete element calving and crevassing model (HiDEM). 
In our talk we present how the use of high-level scientific workflow middleware makes reproducing results more convenient and also provides a reusable and portable workflow template that can be deployed across different computing infrastructures. Acknowledgements This work was kindly supported by NordForsk as part of the Nordic Center of Excellence (NCoE) eSTICC (eScience Tools for Investigating Climate Change at High Northern Latitudes) and the Top-level Research Initiative NCoE SVALI (Stability and Variation of Arctic Land Ice).
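The iterative coupling described above can be reduced to a short Python loop. The four callables are hypothetical placeholders: in the real workflow they would be HPC jobs (Elmer/Ice, HiDEM) and the format-conversion tasks between them, submitted through the workflow middleware, whose API is not used or reproduced here.

```python
from typing import Callable, Dict, List

Step = Callable[[Dict], Dict]


def coupled_run(ice_flow: Step, calving: Step,
                flow_to_calving: Step, calving_to_flow: Step,
                state: Dict, iterations: int) -> List[Dict]:
    """Alternate a continuum ice-flow model and a discrete calving model.

    All four steps are placeholders for real jobs and conversion tasks.
    The returned history records every intermediate state, which makes a
    run easier to inspect and to reproduce.
    """
    history = []
    for i in range(iterations):
        flow_out = ice_flow(state)                         # continuum model run
        calving_out = calving(flow_to_calving(flow_out))   # convert, then calve
        state = calving_to_flow(calving_out)               # convert back for next pass
        history.append({"iteration": i, "state": state})
    return history
```

Writing the loop once with explicit conversion steps, rather than as scripts with static file references, is what turns it into a reusable template: another site only needs different implementations of the four steps.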
Kwf-Grid workflow management system for Earth science applications
NASA Astrophysics Data System (ADS)
Tran, V.; Hluchy, L.
2009-04-01
In this paper, we present a workflow management tool for Earth science applications in EGEE. The tool was originally developed within the K-wf Grid project for GT4 middleware and has many advanced features, such as semi-automatic workflow composition, a user-friendly GUI for managing workflows, and knowledge management. In EGEE, we are porting the tool to gLite middleware for Earth science applications. The K-wf Grid workflow management system was developed within the "Knowledge-based Workflow System for Grid Applications" project under the 6th Framework Programme. The workflow management system was intended to: - semi-automatically compose a workflow of Grid services, - execute the composed workflow application in a Grid computing environment, - monitor the performance of the Grid infrastructure and the Grid applications, - analyze the resulting monitoring information, - capture the knowledge contained in the information by means of intelligent agents, - and finally reuse the joint knowledge gathered from all participating users in a collaborative way in order to efficiently construct workflows for new Grid applications. K-wf Grid workflow engines can support different types of jobs (e.g. GRAM jobs, web services) in a workflow. A new class of gLite job has been added to the system, allowing it to manage and execute gLite jobs in the EGEE infrastructure. The GUI has been adapted to the requirements of EGEE users, and a new credential management servlet has been added to the portal. Porting the K-wf Grid workflow management system to gLite would allow EGEE users to use the system and benefit from its advanced features. The system is primarily tested and evaluated with applications from ES clusters.
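One way to let an engine handle GRAM jobs, web-service calls, and a newly added gLite job type side by side is a common job interface plus a registry, sketched below in Python. The class names and the `register` decorator are hypothetical, and the `submit` methods only return descriptive strings; no real Globus or gLite client calls are made.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Type


class Job(ABC):
    """Common interface the engine uses for every job type."""

    def __init__(self, name: str, spec: dict):
        self.name, self.spec = name, spec

    @abstractmethod
    def submit(self) -> str:
        """Submit the job and return a handle (here just a descriptive string)."""


JOB_TYPES: Dict[str, Type[Job]] = {}


def register(kind: str):
    """Class decorator that makes a job type available to workflows by name."""
    def decorator(cls: Type[Job]) -> Type[Job]:
        JOB_TYPES[kind] = cls
        return cls
    return decorator


@register("gram")
class GramJob(Job):
    def submit(self) -> str:
        return f"GRAM submission of {self.name}"  # placeholder, no middleware call


@register("webservice")
class WebServiceJob(Job):
    def submit(self) -> str:
        return f"web-service call for {self.name}"  # placeholder


@register("glite")
class GliteJob(Job):
    def submit(self) -> str:
        return f"gLite submission of {self.name}"  # placeholder


def run_workflow(steps: List[dict]) -> List[str]:
    """Instantiate each step by its declared type and submit it in order."""
    return [JOB_TYPES[s["type"]](s["name"], s.get("spec", {})).submit() for s in steps]
```

In this design, adding a job type means registering one more subclass; `run_workflow` itself does not change.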
Workflow management systems in radiology
NASA Astrophysics Data System (ADS)
Wendler, Thomas; Meetz, Kirsten; Schmidt, Joachim
1998-07-01
In a situation of shrinking health care budgets, increasing cost pressure and growing demands to increase the efficiency and the quality of medical services, health care enterprises are forced to optimize or completely re-design their processes. Although information technology is generally agreed to have the potential to contribute to cost reduction and efficiency improvement, the real success factors are the re-definition and automation of processes: Business Process Re-engineering and Workflow Management. In this paper we discuss architectures for the use of workflow management systems in radiology. We propose to move forward from information systems in radiology (RIS, PACS) to Radiology Management Systems, in which workflow functionality (process definitions and process automation) is implemented through autonomous workflow management systems (WfMS). In a workflow-oriented architecture, an autonomous workflow enactment service communicates with workflow client applications via standardized interfaces. In this paper, we discuss the need for and the benefits of such an approach. We emphasize the separation of the workflow management system from the application systems and discuss the consequences for the architecture of workflow-oriented information systems. These include an appropriate workflow terminology and the definition of standard interfaces for workflow-aware application systems. Workflow studies in various institutions have shown that most of the processes in radiology are well structured and suited for a workflow management approach. Numerous commercially available WfMS were investigated, and some of them, which are process-oriented and application-independent, appear suitable for use in radiology.
VisTrails SAHM: visualization and workflow management for species habitat modeling
Morisette, Jeffrey T.; Jarnevich, Catherine S.; Holcombe, Tracy R.; Talbert, Colin B.; Ignizio, Drew A.; Talbert, Marian; Silva, Claudio; Koop, David; Swanson, Alan; Young, Nicholas E.
2013-01-01
The Software for Assisted Habitat Modeling (SAHM) has been created to both expedite habitat modeling and help maintain a record of the various input data, pre- and post-processing steps and modeling options incorporated in the construction of a species distribution model through the established workflow management and visualization VisTrails software. This paper provides an overview of the VisTrails:SAHM software including a link to the open source code, a table detailing the current SAHM modules, and a simple example modeling an invasive weed species in Rocky Mountain National Park, USA.
Lu, Xinyan
2016-01-01
There is a clear requirement for enhancing laboratory information management during early absorption, distribution, metabolism and excretion (ADME) screening. The application of a commercial laboratory information management system (LIMS) is limited by complexity, insufficient flexibility, high costs and extended timelines. An improved custom in-house LIMS for ADME screening was developed using Excel. All Excel templates were generated through macros and formulae, and information flow was streamlined as much as possible. This system has been successfully applied in task generation, process control and data management, with a reduction in both labor time and human error rates. An Excel-based LIMS can provide a simple, flexible and cost/time-saving solution for improving workflow efficiencies in early ADME screening.
The Ophidia Stack: Toward Large Scale, Big Data Analytics Experiments for Climate Change
NASA Astrophysics Data System (ADS)
Fiore, S.; Williams, D. N.; D'Anca, A.; Nassisi, P.; Aloisio, G.
2015-12-01
The Ophidia project is a research effort on big data analytics facing scientific data analysis challenges in multiple domains (e.g. climate change). It provides a "datacube-oriented" framework responsible for atomically processing and manipulating scientific datasets, by providing a common way to run distributive tasks on large sets of data fragments (chunks). Ophidia provides declarative, server-side, and parallel data analysis, jointly with an internal storage model able to efficiently deal with multidimensional data and a hierarchical data organization to manage large data volumes. The project relies on a strong background in high-performance database management and On-Line Analytical Processing (OLAP) systems to manage large scientific datasets. The Ophidia analytics platform provides several data operators to manipulate datacubes (about 50), and array-based primitives (more than 100) to perform data analysis on large scientific data arrays. To address interoperability, Ophidia provides multiple server interfaces (e.g. OGC-WPS). From a client standpoint, a Python interface enables the use of the framework within Python-based ecosystems and applications (e.g. IPython) and the straightforward adoption of a strong set of related libraries (e.g. SciPy, NumPy). The talk will highlight a key feature of the Ophidia framework stack: the "Analytics Workflow Management System" (AWfMS). The Ophidia AWfMS coordinates, orchestrates, optimises and monitors the execution of multiple scientific data analytics and visualization tasks, thus supporting "complex analytics experiments". Some real use cases related to the CMIP5 experiment will be discussed. In particular, with regard to the "Climate models intercomparison data analysis" case study proposed in the EU H2020 INDIGO-DataCloud project, workflows related to (i) anomalies, (ii) trend, and (iii) climate change signal analysis will be presented. 
Such workflows will be distributed across multiple sites, according to how the datasets are distributed, and will include intercomparison, ensemble, and outlier analysis. The two-level workflow solution envisioned in INDIGO (coarse grain for distributed task orchestration, and fine grain at the level of a single data analytics cluster instance) will be presented and discussed.
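The idea of running a distributive task over fragments can be sketched in plain Python for the anomaly case: each fragment returns a partial (sum, count), the partials are merged into a global mean, and a second fragment-wise pass subtracts it. This illustrates the general pattern only; it does not use or reproduce Ophidia's operators or its Python interface, and the function names are hypothetical.

```python
from typing import List, Tuple


def split_into_fragments(values: List[float], size: int) -> List[List[float]]:
    """Cut a 1-D series into consecutive fragments of at most `size` elements."""
    return [values[i:i + size] for i in range(0, len(values), size)]


def partial_sum(fragment: List[float]) -> Tuple[float, int]:
    # Runs independently on each fragment, so fragments could live on different servers.
    return sum(fragment), len(fragment)


def merge(partials: List[Tuple[float, int]]) -> float:
    """Combine per-fragment (sum, count) pairs into the global mean."""
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count


def anomalies(values: List[float], size: int) -> List[float]:
    fragments = split_into_fragments(values, size)
    mean = merge([partial_sum(f) for f in fragments])
    # Second distributive pass: subtract the global mean fragment by fragment.
    return [x - mean for f in fragments for x in f]
```

This works because the mean decomposes into per-fragment partial results; an operator such as the median does not, and would need a different strategy.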
ASCEM Data Browser (ASCEMDB) v0.8
DOE Office of Scientific and Technical Information (OSTI.GOV)
ROMOSAN, ALEXANDRU
Data management tool designed for the Advanced Simulation Capability for Environmental Management (ASCEM) framework. Distinguishing features of this gateway include: (1) handling of complex geometry data, (2) an advanced selection mechanism, (3) state-of-the-art rendering of spatiotemporal data records, and (4) seamless integration with a distributed workflow engine.
Documet, Jorge; Liu, Brent J; Documet, Luis; Huang, H K
2006-07-01
This paper describes a picture archiving and communication system (PACS) tool based on Web technology that remotely manages medical images between a PACS archive and remote destinations. Successfully implemented in a clinical environment and also demonstrated for the past 3 years at the conferences of various organizations, including the Radiological Society of North America, this tool provides a very practical and simple way to manage a PACS, including off-site image distribution and disaster recovery. The application is robust and flexible and can be used on a standard PC workstation or a Tablet PC, but more important, it can be used with a personal digital assistant (PDA). With a PDA, the Web application becomes a powerful wireless and mobile image management tool. The application's quick and easy-to-use features allow users to perform Digital Imaging and Communications in Medicine (DICOM) queries and retrievals with a single interface, without having to worry about the underlying configuration of DICOM nodes. In addition, this frees up dedicated PACS workstations to perform their specialized roles within the PACS workflow. This tool has been used at Saint John's Health Center in Santa Monica, California, for 2 years. The average number of queries per month is 2,021, with 816 C-MOVE retrieve requests. Clinical staff members can use PDAs to manage image workflow and PACS examination distribution conveniently for off-site consultations by referring physicians and radiologists and for disaster recovery. This solution also improves radiologists' effectiveness and efficiency in health care delivery both within radiology departments and for off-site clinical coverage.
Motion/imagery secure cloud enterprise architecture analysis
NASA Astrophysics Data System (ADS)
DeLay, John L.
2012-06-01
Cloud computing with storage virtualization and new service-oriented architectures brings a new perspective to the aspect of a distributed motion imagery and persistent surveillance enterprise. Our existing research focuses mainly on content management, distributed analytics, and WAN distributed cloud networking performance issues of cloud-based technologies. The potential of leveraging cloud-based technologies for hosting motion imagery, imagery and analytics workflows for DOD and security applications is relatively unexplored. This paper will examine technologies for managing, storing, processing and disseminating motion imagery and imagery within a distributed network environment. Finally, we propose areas for future research in the area of distributed cloud content management enterprises.
Agile parallel bioinformatics workflow management using Pwrake.
Mishima, Hiroyuki; Sasaki, Kensaku; Tanaka, Masahiro; Tatebe, Osamu; Yoshiura, Koh-Ichiro
2011-09-08
In bioinformatics projects, scientific workflow systems are widely used to manage computational procedures. Full-featured workflow systems have been proposed to fulfil the demand for workflow management. However, such systems tend to be too heavyweight for actual bioinformatics practice. We realize that quick deployment of cutting-edge software implementing advanced algorithms and data formats, and continuous adaptation to changes in computational resources and the environment, are often prioritized in scientific workflow management. These features have a greater affinity with the agile software development method, which proceeds through iterative development phases of trial and error. Here, we show the application of a scientific workflow system, Pwrake, to bioinformatics workflows. Pwrake is a parallel workflow extension of Ruby's standard build tool Rake, the flexibility of which has been demonstrated in the astronomy domain. Therefore, we hypothesize that Pwrake also has advantages in actual bioinformatics workflows. We implemented Pwrake workflows to process next-generation sequencing data using the Genome Analysis Toolkit (GATK) and Dindel. GATK and Dindel workflows are typical examples of sequential and parallel workflows, respectively. We found that in practice, actual scientific workflow development iterates over two phases, the workflow definition phase and the parameter adjustment phase. We introduced separate workflow definitions to help focus on each of the two developmental phases, as well as helper methods to simplify the descriptions. This approach increased iterative development efficiency. Moreover, we implemented combined workflows to demonstrate the modularity of the GATK and Dindel workflows. Pwrake enables agile management of scientific workflows in the bioinformatics domain. The internal domain-specific language design built on Ruby gives rakefiles flexibility for writing scientific workflows. 
Furthermore, readability and maintainability of rakefiles may facilitate sharing workflows among the scientific community. Workflows for GATK and Dindel are available at http://github.com/misshie/Workflows.
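Pwrake itself is written in Ruby, but the core idea of a parallel Rake, starting every task whose prerequisites are complete rather than running tasks strictly one after another, can be sketched in Python with the standard `graphlib` and `concurrent.futures` modules. This is not Pwrake's API, and the task names in the example (an alignment step, per-chromosome calls, a merge) are hypothetical; they do not reproduce the published GATK or Dindel workflows.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter
from typing import Callable, Dict, List


def run_parallel(tasks: Dict[str, Callable[[], None]],
                 prereqs: Dict[str, List[str]], workers: int = 4) -> List[str]:
    """Run Rake-style tasks, starting each as soon as all of its prerequisites finish.

    Every prerequisite must itself be a key of `tasks`. Returns completion order.
    """
    sorter = TopologicalSorter({name: prereqs.get(name, []) for name in tasks})
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    finished_order: List[str] = []
    running = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while sorter.is_active():
            for name in sorter.get_ready():
                running[pool.submit(tasks[name])] = name
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for future in done:
                future.result()  # re-raise a task failure instead of ignoring it
                name = running.pop(future)
                sorter.done(name)
                finished_order.append(name)
    return finished_order
```

With prerequisites `call_chr1` and `call_chr2` both depending on `align`, the two calls become ready together and run concurrently, and `merge` starts only after both have finished.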
Agile parallel bioinformatics workflow management using Pwrake
2011-01-01
Background In bioinformatics projects, scientific workflow systems are widely used to manage computational procedures. Full-featured workflow systems have been proposed to fulfil the demand for workflow management. However, such systems tend to be too heavyweight for actual bioinformatics practice. We realize that quick deployment of cutting-edge software implementing advanced algorithms and data formats, and continuous adaptation to changes in computational resources and the environment, are often prioritized in scientific workflow management. These features have a greater affinity with the agile software development method, which proceeds through iterative development phases of trial and error. Here, we show the application of a scientific workflow system, Pwrake, to bioinformatics workflows. Pwrake is a parallel workflow extension of Ruby's standard build tool Rake, the flexibility of which has been demonstrated in the astronomy domain. Therefore, we hypothesize that Pwrake also has advantages in actual bioinformatics workflows. Findings We implemented Pwrake workflows to process next-generation sequencing data using the Genome Analysis Toolkit (GATK) and Dindel. GATK and Dindel workflows are typical examples of sequential and parallel workflows, respectively. We found that in practice, actual scientific workflow development iterates over two phases, the workflow definition phase and the parameter adjustment phase. We introduced separate workflow definitions to help focus on each of the two developmental phases, as well as helper methods to simplify the descriptions. This approach increased iterative development efficiency. Moreover, we implemented combined workflows to demonstrate the modularity of the GATK and Dindel workflows. Conclusions Pwrake enables agile management of scientific workflows in the bioinformatics domain. The internal domain-specific language design built on Ruby gives rakefiles flexibility for writing scientific workflows. 
Furthermore, readability and maintainability of rakefiles may facilitate sharing workflows among the scientific community. Workflows for GATK and Dindel are available at http://github.com/misshie/Workflows. PMID:21899774
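Why a GATK-style workflow behaves sequentially while a Dindel-style one parallelizes can be illustrated by grouping a dependency graph into "waves" (Kahn's topological sort, level by level). Pwrake itself expresses dependencies as Ruby file tasks in a rakefile; the Python sketch below is not Pwrake's API, and the task names are hypothetical. It only assumes a workflow is a graph of named tasks with prerequisites.

```python
from collections import defaultdict


def parallel_levels(deps: dict[str, list[str]]) -> list[list[str]]:
    """Group tasks into waves; every task in a wave depends only on earlier waves.

    `deps` maps each task to its prerequisite tasks. Tasks in the same wave have
    no mutual dependencies, so a parallel runner could execute them concurrently.
    """
    # Collect every task, including prerequisites that have no entry of their own.
    tasks = set(deps)
    for prereqs in deps.values():
        tasks.update(prereqs)
    indegree = {t: 0 for t in tasks}
    dependents = defaultdict(list)
    for task, prereqs in deps.items():
        for p in prereqs:
            indegree[task] += 1
            dependents[p].append(task)

    wave = sorted(t for t in tasks if indegree[t] == 0)
    levels = []
    while wave:
        levels.append(wave)
        nxt = []
        for t in wave:
            for d in dependents[t]:
                indegree[d] -= 1
                if indegree[d] == 0:
                    nxt.append(d)
        wave = sorted(nxt)

    if sum(len(level) for level in levels) != len(tasks):
        raise ValueError("dependency cycle detected")
    return levels


# Hypothetical task names: a linear chain versus a per-chromosome fan-out.
chain = {"realigned.bam": ["sorted.bam"], "recal.bam": ["realigned.bam"], "calls.vcf": ["recal.bam"]}
fanout = {"chr1.vcf": ["sample.bam"], "chr2.vcf": ["sample.bam"], "merged.vcf": ["chr1.vcf", "chr2.vcf"]}
print(parallel_levels(chain))   # one task per wave: nothing to run in parallel
print(parallel_levels(fanout))  # chr1.vcf and chr2.vcf share a wave
```

In a chain every wave holds a single task, so extra workers do not help; in the fan-out the middle wave can be spread across workers.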
ScyFlow: An Environment for the Visual Specification and Execution of Scientific Workflows
NASA Technical Reports Server (NTRS)
McCann, Karen M.; Yarrow, Maurice; DeVivo, Adrian; Mehrotra, Piyush
2004-01-01
With the advent of grid technologies, scientists and engineers are building more and more complex applications to utilize distributed grid resources. The core grid services provide a path for accessing and utilizing these resources in a secure and seamless fashion. However, what scientists need is an environment that will allow them to specify their application runs at a high organizational level, and then support efficient execution across any given set or sets of resources. We have been designing and implementing ScyFlow, a dual-interface architecture (both GUI and API) that addresses this problem. The scientist/user specifies the application tasks along with the necessary control and data flow, and monitors and manages the execution of the resulting workflow across the distributed resources. In this paper, we use two scenarios to provide the details of the two modules of the project, the visual editor and the runtime workflow engine.
NASA Astrophysics Data System (ADS)
Delventhal, D.; Schultz, D.; Diaz Velez, J. C.
2017-10-01
IceProd is a data processing and management framework developed by the IceCube Neutrino Observatory for processing of Monte Carlo simulations, detector data, and data-driven analysis. It runs as a separate layer on top of grid and batch systems. This is accomplished by a set of daemons which process job workflows, maintaining configuration and status information on each job before, during, and after processing. IceProd can also manage complex workflow DAGs across distributed computing grids in order to optimize usage of resources. IceProd has recently been rewritten to increase its scaling capabilities, handle user analysis workflows together with simulation production, and facilitate integration with third-party scheduling tools. IceProd 2, the second generation of IceProd, has been running in production for several months now. We share our experience setting up the system and things we’ve learned along the way.
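Tracking status "before, during, and after processing" amounts to a small job state machine. The Python sketch below is a hypothetical illustration, not IceProd 2's schema: the state names and the allowed transitions (including resubmission after failure) are assumptions chosen for clarity.

```python
from enum import Enum


class JobState(Enum):
    WAITING = "waiting"        # before processing: not yet handed to a grid
    QUEUED = "queued"          # submitted to a grid or batch system
    PROCESSING = "processing"  # during processing
    COMPLETE = "complete"      # after processing: success
    FAILED = "failed"          # after processing: error, eligible for retry


# Assumed transition table; a real system may allow more (e.g. cancellation).
ALLOWED = {
    JobState.WAITING: {JobState.QUEUED},
    JobState.QUEUED: {JobState.PROCESSING, JobState.FAILED},
    JobState.PROCESSING: {JobState.COMPLETE, JobState.FAILED},
    JobState.FAILED: {JobState.WAITING},  # resubmission
    JobState.COMPLETE: set(),
}


class Job:
    def __init__(self, job_id: str):
        self.job_id = job_id
        self.state = JobState.WAITING
        self.history = [JobState.WAITING]  # full status trail kept for bookkeeping

    def transition(self, new_state: JobState) -> None:
        """Move to `new_state`, rejecting transitions the table does not allow."""
        if new_state not in ALLOWED[self.state]:
            raise ValueError(
                f"{self.job_id}: illegal transition {self.state.value} -> {new_state.value}"
            )
        self.state = new_state
        self.history.append(new_state)
```

Keeping the table separate from the `Job` class lets a daemon validate every status update it receives in one place.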
A characterization of workflow management systems for extreme-scale applications
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ferreira da Silva, Rafael; Filgueira, Rosa; Pietri, Ilia
2017-02-16
The automation of the execution of computational tasks is at the heart of improving scientific productivity. Over the last years, scientific workflows have been established as an important abstraction that captures data processing and computation of large and complex scientific applications. By allowing scientists to model and express entire data processing steps and their dependencies, workflow management systems relieve scientists from the details of an application and manage its execution on a computational infrastructure. As the resource requirements of today’s computational and data science applications that process vast amounts of data keep increasing, there is a compelling case for a new generation of advances in high-performance computing, commonly termed extreme-scale computing, which will bring forth multiple challenges for the design of workflow applications and management systems. This paper presents a novel characterization of workflow management systems using features commonly associated with extreme-scale computing applications. We classify 15 popular workflow management systems in terms of workflow execution models, heterogeneous computing environments, and data access methods. Finally, the paper also surveys workflow applications and identifies gaps for future research on the road to extreme-scale workflows and management systems.
Reproducible Large-Scale Neuroimaging Studies with the OpenMOLE Workflow Management System.
Passerat-Palmbach, Jonathan; Reuillon, Romain; Leclaire, Mathieu; Makropoulos, Antonios; Robinson, Emma C; Parisot, Sarah; Rueckert, Daniel
2017-01-01
OpenMOLE is a scientific workflow engine with a strong emphasis on workload distribution. Workflows are designed using a high-level Domain Specific Language (DSL) built on top of Scala. It exposes natural parallelism constructs to easily delegate the workload resulting from a workflow to a wide range of distributed computing environments. OpenMOLE hides the complexity of designing complex experiments thanks to its DSL. Users can embed their own applications and scale their pipelines from a small prototype running on their desktop computer to a large-scale study harnessing distributed computing infrastructures, simply by changing a single line in the pipeline definition. The construction of the pipeline itself is decoupled from the execution context. The high-level DSL abstracts the underlying execution environment, unlike classic shell-script-based pipelines. These two aspects allow pipelines to be shared and studies to be replicated across different computing environments. Workflows can be run as traditional batch pipelines or coupled with OpenMOLE's advanced exploration methods in order to study the behavior of an application or perform automatic parameter tuning. In this work, we briefly present the strong assets of OpenMOLE and detail recent improvements targeting re-executability of workflows across various Linux platforms. We have tightly coupled OpenMOLE with CARE, a standalone containerization solution that allows any application previously packaged on one Linux host to be re-executed on another Linux host. The solution is evaluated against a Python-based pipeline involving packages such as scikit-learn as well as binary dependencies. All were packaged and re-executed successfully on various HPC environments, with identical numerical results (here prediction scores) obtained on each environment. Our results show that the pair formed by OpenMOLE and CARE is a reliable solution to generate reproducible results and re-executable pipelines. 
A demonstration of the flexibility of our solution showcases three neuroimaging pipelines harnessing distributed computing environments as heterogeneous as local clusters or the European Grid Infrastructure (EGI). PMID:28381997
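The idea that a pipeline can move from a desktop prototype to distributed infrastructure "by changing a single line" rests on keeping the pipeline definition independent of where it executes. OpenMOLE does this with environments in its Scala DSL; the Python sketch below only mirrors the idea, using the standard `concurrent.futures.Executor` interface as a stand-in for an execution environment. `run_pipeline` and `score` are hypothetical names.

```python
from concurrent.futures import Executor, ThreadPoolExecutor
from typing import Callable, Iterable, List, Optional


def run_pipeline(step: Callable[[int], float], inputs: Iterable[int],
                 executor: Optional[Executor] = None) -> List[float]:
    """Apply the same pipeline step to every input; where it runs is a parameter."""
    if executor is None:
        # Local, sequential execution: the "desktop prototype".
        return [step(x) for x in inputs]
    # Any concurrent.futures Executor; Executor.map preserves input order.
    return list(executor.map(step, inputs))


def score(x: int) -> float:
    # Hypothetical stand-in for one pipeline step, e.g. evaluating a model on sample x.
    return x * x / 2


local_results = run_pipeline(score, range(6))
with ThreadPoolExecutor(max_workers=4) as pool:  # the one line that changes the environment
    pooled_results = run_pipeline(score, range(6), executor=pool)
```

A `ProcessPoolExecutor` could be swapped in the same way, provided the step is a picklable top-level function; the pipeline code itself does not change.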
Workflow Challenges of Enterprise Imaging: HIMSS-SIIM Collaborative White Paper.
Towbin, Alexander J; Roth, Christopher J; Bronkalla, Mark; Cram, Dawn
2016-10-01
With the advent of digital cameras, there has been an explosion in the number of medical specialties using images to diagnose or document disease and guide interventions. In many specialties, these images are not added to the patient's electronic medical record and are not distributed so that other providers caring for the patient can view them. As hospitals begin to develop enterprise imaging strategies, they have found that there are multiple challenges preventing the implementation of systems to manage image capture, image upload, and image management. This HIMSS-SIIM white paper will describe the key workflow challenges related to enterprise imaging and offer suggestions for potential solutions to these challenges.
Bioinformatics workflows and web services in systems biology made easy for experimentalists.
Jimenez, Rafael C; Corpas, Manuel
2013-01-01
Workflows are useful for performing data analysis and integration in systems biology. Workflow management systems can help users create workflows without any previous knowledge of programming and web services. However, the computational skills required to build such workflows are usually above the level most biological experimentalists are comfortable with. In this chapter we introduce workflow management systems that reuse existing workflows instead of creating them, making it easier for experimentalists to perform computational tasks.
Radiology information system: a workflow-based approach.
Zhang, Jinyan; Lu, Xudong; Nie, Hongchao; Huang, Zhengxing; van der Aalst, W M P
2009-09-01
Introducing workflow management technology in healthcare appears promising for dealing with the problem that current healthcare information systems cannot provide sufficient support for process management, although several challenges remain. The purpose of this paper is to study a method of developing a workflow-based information system, using the radiology department as a use case. First, a workflow model of a typical radiology process was established. Second, based on the model, the system was designed and implemented as a group of loosely coupled components. Each component corresponded to one task in the process and could be assembled by the workflow management system. Legacy systems could be treated as special components, which also corresponded to tasks and were integrated by converting their non-workflow-aware interfaces into standard ones. Finally, a workflow dashboard was designed and implemented to provide an integrated view of radiology processes. The workflow-based Radiology Information System was deployed in the radiology department of Zhejiang Chinese Medicine Hospital in China. The results showed that it could be adjusted flexibly in response to changing process needs, and that it enhanced process management in the department. It also provides a more workflow-aware integration method than other approaches, such as IHE-based ones. The workflow-based approach is a new method of developing radiology information systems with more flexibility, more process management functionality, and more workflow-aware integration. This paper is an initial endeavor toward introducing workflow management technology in healthcare.
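The integration strategy described above, where each task is a loosely coupled component and legacy systems are wrapped so they present the same standard interface, is essentially the adapter pattern. The Python sketch below is a hypothetical illustration: `WorkflowComponent`, `LegacyReportingSystem`, and the `case` dictionary fields are invented names, not the deployed system's interfaces.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class WorkflowComponent(ABC):
    """Standard interface the workflow engine calls for every task (hypothetical)."""

    @abstractmethod
    def execute(self, case: Dict) -> Dict: ...


class RegistrationComponent(WorkflowComponent):
    """A natively workflow-aware component."""

    def execute(self, case: Dict) -> Dict:
        return {**case, "registered": True}


class LegacyReportingSystem:
    """Stand-in for a non-workflow-aware legacy system with its own call style."""

    def __init__(self) -> None:
        self.archive: Dict[str, str] = {}

    def store_report(self, patient_id: str, text: str) -> int:
        self.archive[patient_id] = text
        return len(self.archive)


class LegacyReportingAdapter(WorkflowComponent):
    """Converts the legacy interface into the standard one so the engine can schedule it."""

    def __init__(self, legacy: LegacyReportingSystem) -> None:
        self.legacy = legacy

    def execute(self, case: Dict) -> Dict:
        self.legacy.store_report(case["patient_id"], case.get("report", ""))
        return {**case, "report_stored": True}


def run_process(case: Dict, components: List[WorkflowComponent]) -> Dict:
    """Assemble components in process order; the engine sees only `execute`."""
    for component in components:
        case = component.execute(case)
    return case
```

Because the engine depends only on `execute`, reordering or replacing a task changes the component list, not the engine.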
Patterson, Emily S.; Lowry, Svetlana Z.; Ramaiah, Mala; Gibbons, Michael C.; Brick, David; Calco, Robert; Matton, Greg; Miller, Anne; Makar, Ellen; Ferrer, Jorge A.
2015-01-01
Introduction: Human factors workflow analyses in healthcare settings prior to technology implementation are recommended to improve workflow in ambulatory care settings. In this paper we describe how insights from a workflow analysis conducted by NIST were implemented in a software prototype developed for a Veterans Health Administration (VHA) VAi2 innovation project, and the associated lessons learned. Methods: We organize the original recommendations and associated stages and steps visualized in process maps from NIST and the VA’s lessons learned from implementing the recommendations in the VAi2 prototype according to four stages: 1) before the patient visit, 2) during the visit, 3) discharge, and 4) visit documentation. NIST recommendations to improve workflow in ambulatory care (outpatient) settings and process map representations were based on reflective statements collected during one-hour discussions with three physicians. The development of the VAi2 prototype was initially conducted independently of the NIST recommendations, but at a midpoint in the process development, all of the implementation elements were compared with the NIST recommendations and lessons learned were documented. Findings: Story-based displays and templates with default preliminary order sets were used to support scheduling, time-critical notifications, drafting medication orders, and supporting a diagnosis-based workflow. These templates enabled customization to the level of diagnostic uncertainty. Functionality was designed to support cooperative work across interdisciplinary team members, including shared documentation sessions with tracking of text modifications, medication lists, and patient education features. Displays were customized to the role and included access for consultants and site-defined educator teams. Discussion: Workflow, usability, and patient safety can be enhanced through clinician-centered design of electronic health records. 
The lessons learned from implementing NIST recommendations to improve workflow in ambulatory care using an EHR provide a first step in moving from a billing-centered perspective on how to maintain accurate, comprehensive, and up-to-date information about a group of patients to a clinician-centered perspective. These recommendations point the way towards a “patient visit management system,” which incorporates broader notions of supporting workload management, supporting flexible flow of patients and tasks, enabling accountable distributed work across members of the clinical team, and supporting dynamic tracking of steps in tasks that have longer time distributions. PMID:26290887
NASA Astrophysics Data System (ADS)
McCarthy, Ann
2006-01-01
The ICC Workflow WG serves as the bridge between ICC color management technologies and use of those technologies in real-world color production applications. ICC color management is applicable to and is used in a wide range of color systems, from highly specialized digital cinema color special effects to high-volume publication printing to home photography. The ICC Workflow WG works to align ICC technologies so that the color management needs of these diverse use case systems are addressed in an open, platform-independent manner. This report provides a high-level summary of the ICC Workflow WG objectives and work to date, focusing on the ways in which workflow can impact image quality and color systems performance. The 'ICC Workflow Primitives' and 'ICC Workflow Patterns and Dimensions' workflow models are covered in some detail. Consider the questions, "How much of dissatisfaction with color management today is the result of 'the wrong color transformation at the wrong time' and 'I can't get to the right conversion at the right point in my work process'?" Put another way, consider how image quality through a workflow can be negatively affected when the coordination and control level of the color management system is not sufficient.
Ergatis: a web interface and scalable software system for bioinformatics workflows
Orvis, Joshua; Crabtree, Jonathan; Galens, Kevin; Gussman, Aaron; Inman, Jason M.; Lee, Eduardo; Nampally, Sreenath; Riley, David; Sundaram, Jaideep P.; Felix, Victor; Whitty, Brett; Mahurkar, Anup; Wortman, Jennifer; White, Owen; Angiuoli, Samuel V.
2010-01-01
Motivation: The growth of sequence data has been accompanied by an increasing need to analyze data on distributed computer clusters. The use of these systems for routine analysis requires scalable and robust software for data management of large datasets. Software is also needed to simplify data management and make large-scale bioinformatics analysis accessible and reproducible to a wide class of target users. Results: We have developed a workflow management system named Ergatis that enables users to build, execute and monitor pipelines for computational analysis of genomics data. Ergatis contains preconfigured components and template pipelines for a number of common bioinformatics tasks such as prokaryotic genome annotation and genome comparisons. Outputs from many of these components can be loaded into a Chado relational database. Ergatis was designed to be accessible to a broad class of users and provides a user-friendly, web-based interface. Ergatis supports high-throughput batch processing on distributed compute clusters and has been used for data management in a number of genome annotation and comparative genomics projects. Availability: Ergatis is an open-source project and is freely available at http://ergatis.sourceforge.net Contact: jorvis@users.sourceforge.net PMID:20413634
A three-level atomicity model for decentralized workflow management systems
NASA Astrophysics Data System (ADS)
Ben-Shaul, Israel Z.; Heineman, George T.
1996-12-01
A workflow management system (WFMS) employs a workflow manager (WM) to execute and automate the various activities within a workflow. To protect the consistency of data, the WM encapsulates each activity with a transaction; a transaction manager (TM) then guarantees the atomicity of activities. Since workflows often group several activities together, the TM is responsible for guaranteeing the atomicity of these units. There are scalability issues, however, with centralized WFMSs. Decentralized WFMSs provide an architecture for multiple autonomous WFMSs to interoperate, thus accommodating multiple workflows and geographically-dispersed teams. When atomic units are composed of activities spread across multiple WFMSs, however, there is a conflict between global atomicity and local autonomy of each WFMS. This paper describes a decentralized atomicity model that enables workflow administrators to specify the scope of multi-site atomicity based upon the desired semantics of multi-site tasks in the decentralized WFMS. We describe an architecture that realizes our model and execution paradigm.
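One common way to make a group of activities behave atomically when they cannot share a single transaction is compensation: if a later activity fails, the already completed ones are undone in reverse order. This is a named, swapped-in technique (often called saga-style compensation), not the three-level atomicity model the paper describes; the Python sketch below assumes each activity either commits or leaves no partial effects, as if wrapped in its own local transaction.

```python
from typing import Callable, List, Tuple

# (name, do, undo): `do` performs the activity, `undo` compensates for it.
Activity = Tuple[str, Callable[[dict], None], Callable[[dict], None]]


def run_atomic_unit(activities: List[Activity], state: dict) -> bool:
    """Run activities in order; on failure, compensate completed ones in reverse.

    Assumes a failing activity leaves no partial effects (its own local
    transaction was rolled back), so only earlier, completed activities are undone.
    """
    done: List[Activity] = []
    for activity in activities:
        _, do, _ = activity
        try:
            do(state)
        except Exception:
            for _, _, compensate in reversed(done):
                compensate(state)
            return False
        done.append(activity)
    return True
```

In a decentralized setting, the activities in the list could belong to different sites; deciding how far such a unit should extend across sites is the question the paper's model addresses.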
Generic worklist handler for workflow-enabled products
NASA Astrophysics Data System (ADS)
Schmidt, Joachim; Meetz, Kirsten; Wendler, Thomas
1999-07-01
Workflow management (WfM) is an emerging field of medical information technology. It appears to be a promising key technology for modeling, optimizing and automating processes, for the sake of improved efficiency, reduced costs and improved patient care. The application of WfM concepts requires the standardization of architectures and interfaces. A component of central interest proposed in this report is a generic worklist handler: a standardized interface between a workflow enactment service and an application system. Application systems with embedded worklist handlers will be called 'Workflow Enabled Application Systems'. In this paper we discuss the functional requirements of worklist handlers, as well as their integration into workflow architectures and interfaces. To lay the foundation for this specification, basic workflow terminology, the fundamentals of workflow management and, later in the paper, the available standards as defined by the Workflow Management Coalition are briefly reviewed.
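The worklist handler sits between the enactment service, which offers work items as activities become ready, and the application, which lets users see, claim, and complete them. The Python sketch below is a hypothetical illustration of that role; the class names, the three item states, and the role-based filtering are assumptions, not the interface specified in the paper or the Workflow Management Coalition standards.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, List, Optional, Set


@dataclass
class WorkItem:
    item_id: int
    activity: str
    role: str
    status: str = "offered"  # offered -> claimed -> completed
    assignee: Optional[str] = None


@dataclass
class WorklistHandler:
    """Hypothetical handler between an enactment service and an application system."""
    items: Dict[int, WorkItem] = field(default_factory=dict)
    _ids: count = field(default_factory=lambda: count(1))

    def offer(self, activity: str, role: str) -> WorkItem:
        # Called by the enactment service when an activity becomes ready.
        item = WorkItem(next(self._ids), activity, role)
        self.items[item.item_id] = item
        return item

    def worklist(self, user: str, roles: Set[str]) -> List[WorkItem]:
        # What the application shows: offered items for the user's roles plus their own claims.
        return [i for i in self.items.values()
                if (i.status == "offered" and i.role in roles)
                or (i.status == "claimed" and i.assignee == user)]

    def claim(self, item_id: int, user: str) -> None:
        item = self.items[item_id]
        if item.status != "offered":
            raise ValueError(f"item {item_id} is already {item.status}")
        item.status, item.assignee = "claimed", user

    def complete(self, item_id: int, user: str) -> WorkItem:
        # A real handler would report completion back to the enactment service here.
        item = self.items[item_id]
        if item.status != "claimed" or item.assignee != user:
            raise ValueError(f"{user} cannot complete item {item_id}")
        item.status = "completed"
        return item
```

Embedding a handler like this in an application is what would make it "workflow enabled" in the paper's sense: the application no longer decides what to do next, it asks the worklist.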
Towards a Unified Architecture for Data-Intensive Seismology in VERCE
NASA Astrophysics Data System (ADS)
Klampanos, I.; Spinuso, A.; Trani, L.; Krause, A.; Garcia, C. R.; Atkinson, M.
2013-12-01
Modern seismology involves managing, storing and processing large datasets, typically geographically distributed across organisations. Performing computational experiments using these data generates more data, which in turn have to be managed, further analysed and frequently be made available within or outside the scientific community. As part of the EU-funded project VERCE (http://verce.eu), we research and develop a number of use-cases, interfacing technologies to satisfy the data-intensive requirements of modern seismology. Our solution seeks to support: (1) familiar programming environments to develop and execute experiments, in particular via Python/ObsPy, (2) a unified view of heterogeneous computing resources, public or private, through the adoption of workflows, (3) monitoring the experiments and validating the data products at varying granularities, via a comprehensive provenance system, (4) reproducibility of experiments and consistency in collaboration, via a shared registry of processing units and contextual metadata (computing resources, data, etc.) Here, we provide a brief account of these components and their roles in the proposed architecture. Our design integrates heterogeneous distributed systems, while allowing researchers to retain current practices and control data handling and execution via higher-level abstractions. At the core of our solution lies the workflow language Dispel. While Dispel can be used to express workflows at fine detail, it may also be used as part of meta- or job-submission workflows. User interaction can be provided through a visual editor or through custom applications on top of parameterisable workflows, which is the approach VERCE follows. According to our design, the scientist may use versions of Dispel/workflow processing elements offered by the VERCE library or override them introducing custom scientific code, using ObsPy. 
This approach has the advantage that, while the scientist uses a familiar tool, the resulting workflow can be executed transparently on a number of underlying stream-processing engines, such as STORM or OGSA-DAI. While making efficient use of arbitrarily distributed resources and large data sets is a priority, such processing requires adequate provenance tracking and monitoring. Hiding computation and orchestration details via a workflow system allows us to embed provenance harvesting where appropriate without impeding the user's regular working patterns. Our provenance model is based on the W3C PROV standard and can provide information of varying granularity regarding execution, systems and data consumption/production. A video demonstrating a prototype provenance exploration tool can be found at http://bit.ly/15t0Fz0. Keeping experimental methodology and results open and accessible, as well as encouraging reproducibility and collaboration, is of central importance to modern science. As our users are expected to be based at different geographical locations, to have access to different computing resources and to employ customised scientific codes, the use of a shared registry of workflow components, implementations, data and computing resources is critical.
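Embedding provenance harvesting in the workflow layer means each processing element can record what it consumed and what it produced without the scientist doing anything extra. The Python sketch below borrows only two relation names from W3C PROV (`used`, `wasGeneratedBy`); it is not VERCE's implementation or a PROV serialization, and the signal-processing steps are hypothetical plain-Python stand-ins rather than ObsPy calls.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Set, Tuple


@dataclass
class ProvenanceLog:
    """Minimal PROV-flavoured log of which activity used or generated which entity."""
    used: List[Tuple[str, str]] = field(default_factory=list)              # (activity, entity)
    was_generated_by: List[Tuple[str, str]] = field(default_factory=list)  # (entity, activity)

    def run(self, activity: str, func: Callable[..., Any],
            inputs: Dict[str, Any], output: str) -> Any:
        """Run one processing element; `inputs` maps entity ids to values, passed positionally."""
        for entity in inputs:
            self.used.append((activity, entity))
        result = func(*inputs.values())
        self.was_generated_by.append((output, activity))
        return result

    def lineage(self, entity: str) -> Set[str]:
        """Every upstream entity that contributed to `entity` (assumes unique activity ids)."""
        sources: Set[str] = set()
        for ent, act in self.was_generated_by:
            if ent != entity:
                continue
            for used_act, inp in self.used:
                if used_act == act:
                    sources.add(inp)
                    sources |= self.lineage(inp)
        return sources


# Hypothetical two-step processing chain.
log = ProvenanceLog()
detrended = log.run("detrend-1", lambda t: [x - 1 for x in t], {"trace_raw": [1, -2, 3]}, "trace_detrended")
peak = log.run("peak-1", lambda t: max(abs(x) for x in t), {"trace_detrended": detrended}, "peak_amp")
print(log.lineage("peak_amp"))  # the detrended trace and the raw trace it came from
```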
A Foundation for Enterprise Imaging: HIMSS-SIIM Collaborative White Paper.
Roth, Christopher J; Lannum, Louis M; Persons, Kenneth R
2016-10-01
Care providers today routinely obtain valuable clinical multimedia with mobile devices, scope cameras, ultrasound, and many other modalities at the point of care. Image capture and storage workflows may be heterogeneous across an enterprise, and as a result, they often are not well incorporated in the electronic health record. Enterprise Imaging refers to a set of strategies, initiatives, and workflows implemented across a healthcare enterprise to consistently and optimally capture, index, manage, store, distribute, view, exchange, and analyze all clinical imaging and multimedia content to enhance the electronic health record. This paper is intended to introduce Enterprise Imaging as an important initiative to clinical and informatics leadership, and outline its key elements of governance, strategy, infrastructure, common multimedia content, acquisition workflows, enterprise image viewers, and image exchange services.
An Auto-management Thesis Program WebMIS Based on Workflow
NASA Astrophysics Data System (ADS)
Chang, Li; Jie, Shi; Weibo, Zhong
This paper presents an auto-management WebMIS, based on workflow, for bachelor thesis programs. A module for workflow dispatching is designed and implemented using MySQL and J2EE, following the working principle of a workflow engine. The module automatically dispatches the workflow according to the system date, the user's login information, and the user's work status. The WebMIS moves thesis management from manual to computerized processing, which not only standardizes the thesis program but also keeps the data and documents clean and consistent.
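Dispatching "according to the system date, the user's login information, and the user's work status" can be pictured as a small rule function. The Python sketch below is hypothetical: the stage calendar, stage names, and roles are invented for illustration, and the described system was built with MySQL and J2EE rather than in-memory Python.

```python
from datetime import date
from typing import List, Set, Tuple

# Hypothetical stage calendar: each stage of the thesis program opens on a date.
STAGES: List[Tuple[date, str]] = [
    (date(2024, 3, 1), "proposal"),
    (date(2024, 4, 15), "midterm_check"),
    (date(2024, 6, 1), "final_submission"),
]


def dispatch(today: date, role: str, completed: Set[str]) -> str:
    """Pick the task to show from the date, the login role and the user's work status."""
    if role == "advisor":
        return "review_queue"
    open_stages = [name for start, name in STAGES if start <= today]
    for stage in open_stages:
        if stage not in completed:
            return stage  # earliest open stage the student has not finished
    # Everything open is done: either wait for the next stage or the program is finished.
    return "waiting" if len(open_stages) < len(STAGES) else "finished"
```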
Implementing bioinformatic workflows within the bioextract server
USDA-ARS's Scientific Manuscript database
Computational workflows in bioinformatics are becoming increasingly important in the achievement of scientific advances. These workflows typically require the integrated use of multiple, distributed data sources and analytic tools. The BioExtract Server (http://bioextract.org) is a distributed servi...
NASA Astrophysics Data System (ADS)
Gleason, J. L.; Hillyer, T. N.; Wilkins, J.
2012-12-01
The CERES Science Team integrates data from 5 CERES instruments onboard the Terra, Aqua and NPP missions. The processing chain fuses CERES observations with data from 19 other unique sources. The addition of CERES Flight Model 5 (FM5) onboard NPP, coupled with ground processing system upgrades, further emphasizes the need for an automated job-submission utility to manage multiple processing streams concurrently. The operator-driven, legacy-processing approach relied on manually staging data from magnetic tape to limited spinning disk attached to a shared-memory architecture system. The migration of CERES production code to a distributed, cluster computing environment with approximately one petabyte of spinning disk containing all precursor input data products facilitates the development of a CERES-specific, automated workflow manager. In the cluster environment, I/O is the primary system resource in contention across jobs. Therefore, system load can be maximized with a throttling workload manager. This poster discusses a Java and Perl implementation of an automated job management tool tailored for CERES processing.
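A throttling workload manager lets many jobs be submitted while capping how many contend for I/O at the same time. The described tool was written in Java and Perl; the Python sketch below only illustrates the throttling idea with a bounded semaphore, and the slot count, worker count, and the `sleep` standing in for data staging are all assumptions.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable, List, Tuple


class ThrottledRunner:
    """Run jobs on many workers but cap how many I/O-heavy phases are active at once."""

    def __init__(self, io_slots: int) -> None:
        self._slots = threading.BoundedSemaphore(io_slots)
        self._lock = threading.Lock()
        self.active = 0
        self.peak = 0  # highest number of jobs observed in the I/O phase simultaneously

    def run_job(self, job_id: int) -> int:
        with self._slots:  # blocks while io_slots jobs are already reading/writing
            with self._lock:
                self.active += 1
                self.peak = max(self.peak, self.active)
            time.sleep(0.01)  # stand-in for staging input data from disk
            with self._lock:
                self.active -= 1
        return job_id


def run_all(job_ids: Iterable[int], workers: int = 8, io_slots: int = 3) -> Tuple[List[int], int]:
    runner = ThrottledRunner(io_slots)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(runner.run_job, job_ids))
    return results, runner.peak
```

More workers than I/O slots keeps the job queue moving while the semaphore keeps disk contention bounded; tuning `io_slots` is where the throughput trade-off would be made.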
NASA Astrophysics Data System (ADS)
Huang, T.; Alarcon, C.; Quach, N. T.
2014-12-01
Capture, curation, and analysis are the typical activities performed at any given Earth Science data center. Modern data management systems must be adaptable to heterogeneous science data formats, scalable to meet the mission's quality of service requirements, and able to manage the life cycle of any given science data product. Designing a scalable data management system doesn't happen overnight. It takes countless hours of refining, refactoring, retesting, and re-architecting. The Horizon data management and workflow framework, developed at the Jet Propulsion Laboratory, is a portable, scalable, and reusable framework for developing high-performance data management and product generation workflow systems to automate data capture, data curation, and data analysis activities. The Data Management and Archive System (DMAS) is the core data infrastructure of NASA's Physical Oceanography Distributed Active Archive Center (PO.DAAC); it handles the capture and distribution of hundreds of thousands of satellite observations each day, around the clock. DMAS is an application of the Horizon framework. The NASA Global Imagery Browse Services (GIBS) is the solution of NASA's Earth Observing System Data and Information System (EOSDIS) for making high-resolution global imagery available to the science communities. The Imagery Exchange (TIE), an application of the Horizon framework, is a core subsystem of GIBS responsible for automating data capture and imagery generation to support the EOSDIS' 12 distributed active archive centers and 17 Science Investigator-led Processing Systems (SIPS). This presentation discusses our ongoing effort in refining, refactoring, retesting, and re-architecting the Horizon framework to enable data-intensive science and its applications.
NASA Astrophysics Data System (ADS)
Lengyel, F.; Yang, P.; Rosenzweig, B.; Vorosmarty, C. J.
2012-12-01
The Northeast Regional Earth System Model (NE-RESM, NSF Award #1049181) integrates weather research and forecasting models, terrestrial and aquatic ecosystem models, a water balance/transport model, and mesoscale and energy systems input-output economic models developed by an interdisciplinary research team from academia and government with expertise in physics, biogeochemistry, engineering, energy, economics, and policy. NE-RESM is intended to forecast the implications of planning decisions on the region's environment, ecosystem services, energy systems and economy through the 21st century. Integration of model components and the development of cyberinfrastructure for interacting with the system are facilitated by the integrated Rule-Oriented Data System (iRODS), a distributed data grid that provides archival storage with metadata facilities and a rule-based workflow engine for automating and auditing scientific workflows.
High-volume workflow management in the ITN/FBI system
NASA Astrophysics Data System (ADS)
Paulson, Thomas L.
1997-02-01
The Identification Tasking and Networking (ITN) Federal Bureau of Investigation system will manage the processing of more than 70,000 submissions per day. The workflow manager controls the routing of each submission through a combination of automated and manual processing steps whose exact sequence is dynamically determined by the results at each step. For most submissions, one or more of the steps involve the visual comparison of fingerprint images. The ITN workflow manager is implemented within a scalable client/server architecture. The paper describes the key aspects of the ITN workflow manager design which allow the high volume of daily processing to be successfully accomplished.
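Routing "whose exact sequence is dynamically determined by the results at each step" can be modeled as a table keyed by (current step, outcome). The Python sketch below is a hypothetical illustration, not the ITN design: the step names, outcomes, and quality threshold are invented, and the manual visual-comparison step is reduced to a placeholder function.

```python
from typing import Callable, Dict, List, Optional, Tuple

Step = Callable[[dict], str]  # a step inspects the submission and returns an outcome label


def route(submission: dict, steps: Dict[str, Step],
          table: Dict[Tuple[str, str], str], start: str, max_hops: int = 20) -> List[str]:
    """Follow the routing table from `start`; a missing (step, outcome) entry ends the route."""
    path: List[str] = []
    current: Optional[str] = start
    while current is not None:
        if len(path) >= max_hops:
            raise RuntimeError("routing loop suspected")
        path.append(current)
        outcome = steps[current](submission)
        current = table.get((current, outcome))
    return path


# Hypothetical steps: automated checks, plus a manual visual comparison for ambiguous cases.
steps: Dict[str, Step] = {
    "quality_check": lambda s: "ok" if s["quality"] >= 50 else "poor",
    "auto_match": lambda s: s["match"],          # "unique", "candidates" or "none"
    "manual_compare": lambda s: "done",          # examiner compares images
    "reject": lambda s: "done",
}
table = {
    ("quality_check", "ok"): "auto_match",
    ("quality_check", "poor"): "reject",
    ("auto_match", "candidates"): "manual_compare",
}
```

Keeping the routing rules in data rather than in code means a new processing path is a table change, which is one plausible way to keep per-submission routing cheap at high volume.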
NASA Astrophysics Data System (ADS)
Okaya, D.; Deelman, E.; Maechling, P.; Wong-Barnum, M.; Jordan, T. H.; Meyers, D.
2007-12-01
Large scientific collaborations, such as the SCEC Petascale Cyberfacility for Physics-based Seismic Hazard Analysis (PetaSHA) Project, involve interactions between many scientists who exchange ideas and research results. These groups must organize, manage, and make accessible their community materials: observational data, derivative (research) results, computational products, and community software. Integrating scientific workflows as a paradigm for solving complex computations offers efficiency, reliability, repeatability, choice, and ease of use. The underlying resource a scientific workflow needs in order to function and to create discoverable, exchangeable products is the construction, tracking, and preservation of metadata. In the scientific workflow environment, metadata has a two-tier structure. Workflow-level metadata and provenance describe operational steps, identity of resources, execution status, and product locations and names. Domain-level metadata essentially define the scientific meaning of data, codes and products. To a large degree, the metadata at these two levels are separate. Between the two levels, however, lies a subset of metadata that is produced at one level but needed by the other. This crossover metadata suggests that some commonality in metadata handling is needed. SCEC researchers are collaborating with computer scientists at SDSC, the USC Information Sciences Institute, and Carnegie Mellon University to perform earthquake science using high-performance computational resources. A primary objective of the "PetaSHA" collaboration is to perform physics-based estimations of strong ground motion associated with real and hypothetical earthquakes located within Southern California. Construction of 3D earth models, earthquake representations, and numerical simulation of seismic waves are key components of these estimations. 
Scientific workflows are used to orchestrate sequences of scientific tasks and to access distributed computational facilities such as the NSF TeraGrid. Different types of metadata are produced and captured within these workflows. One PetaSHA workflow ("Earthworks") performs a linear sequence of tasks and preserves both workflow and seismological metadata; downstream scientific codes ingest the metadata produced by upstream codes. The seismological metadata use plain-text attribute-value pairs; an identified need is more advanced handling methods. Another PetaSHA workflow system ("CyberShake") involves several complex workflows that perform statistical analysis of ground shaking due to thousands of hypothetical but plausible earthquakes. Metadata management there has been challenging because the system is built around a number of legacy scientific codes. We describe difficulties arising in the scientific workflows from the lack of such metadata and suggest corrective steps, which in some cases include a cultural shift in which domain science programmers code for metadata.
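To make the plain-text attribute-value handoff concrete, here is a minimal Python sketch of a parser plus a check that the attributes a downstream code needs were actually written upstream, i.e. the crossover metadata described above. The `key = value` layout, the `#` comment convention, and the attribute names (`velocity_model`, `grid_spacing_m`, `nx`, `dt_s`) are assumptions for illustration; the Earthworks file format is not given here.

```python
from typing import Dict, Iterable, List

def parse_metadata(text: str) -> Dict[str, str]:
    """Parse 'key = value' lines; skip blanks and '#' comments.

    Assumes values never contain '#', since everything after it is dropped.
    """
    meta: Dict[str, str] = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        if "=" not in line:
            raise ValueError(f"line {lineno}: expected 'key = value', got {raw!r}")
        key, value = (part.strip() for part in line.split("=", 1))
        meta[key] = value
    return meta

def missing_keys(meta: Dict[str, str], required: Iterable[str]) -> List[str]:
    """Return the attributes a downstream code needs but the upstream code did not write."""
    return [k for k in required if k not in meta]

# Hypothetical output of an upstream mesh-generation step.
UPSTREAM = """
# written by a hypothetical mesh-generation step
velocity_model = CVM-S4
grid_spacing_m = 200
nx = 1000   # grid points along x
"""
```

For example, `missing_keys(parse_metadata(UPSTREAM), ["velocity_model", "grid_spacing_m", "dt_s"])` reports that `dt_s` was never provided, so a downstream solver could fail fast instead of running with a default.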
SHIWA Services for Workflow Creation and Sharing in Hydrometeorology
NASA Astrophysics Data System (ADS)
Terstyanszky, Gabor; Kiss, Tamas; Kacsuk, Peter; Sipos, Gergely
2014-05-01
Researchers want to run scientific experiments on Distributed Computing Infrastructures (DCIs) to access large pools of resources and services. Running these experiments requires specific expertise that they may not have. Workflows can act as a virtualisation layer that hides resources and services behind a user interface researchers can use. There are many scientific workflow systems, but they are not interoperable. Learning a workflow system and creating workflows may require significant effort. Given this effort, it is not reasonable to expect researchers to learn new workflow systems just to run workflows developed in other systems. Overcoming this requires workflow interoperability solutions that allow workflow sharing. The FP7 'Sharing Interoperable Workflow for Large-Scale Scientific Simulation on Available DCIs' (SHIWA) project developed the Coarse-Grained Interoperability (CGI) concept. It enables reusing and sharing workflows from different workflow systems and executing them on different DCIs. SHIWA developed the SHIWA Simulation Platform (SSP) to implement the CGI concept, integrating three major components: the SHIWA Science Gateway, the workflow engines supported by the CGI concept, and the DCI resources on which workflows are executed. The science gateway contains a portal, a submission service, a workflow repository and a proxy server to support the whole workflow life cycle. The SHIWA Portal allows workflow creation, configuration, execution and monitoring through a Graphical User Interface, using the WS-PGRADE workflow system as the host workflow system. The SHIWA Repository stores the formal description of workflows and workflow engines, plus the executables and data needed to execute them. It offers a wide range of browse and search operations. To support non-native workflow execution, the SHIWA Submission Service imports the workflow and workflow engine from the SHIWA Repository. 
This service either invokes pre-deployed workflow engines, locally or remotely, or submits the workflow engines together with the workflow to local or remote resources for execution. The SHIWA Proxy Server manages the certificates needed to execute workflows on different DCIs. Currently SSP supports sharing of ASKALON, Galaxy, GWES, Kepler, LONI Pipeline, MOTEUR, Pegasus, P-GRADE, ProActive, Triana, Taverna and WS-PGRADE workflows. Further workflow systems can be added to the simulation platform as research communities require. The FP7 'Building a European Research Community through Interoperable Workflows and Data' (ER-flow) project disseminates the achievements of the SHIWA project to build workflow user communities across Europe. ER-flow provides application support to research communities within the project (Astrophysics, Computational Chemistry, Heliophysics and Life Sciences) and beyond it (Hydrometeorology and Seismology) to develop, share and run workflows through the simulation platform. The simulation platform supports four usage scenarios: creating and publishing workflows in the repository, searching and selecting workflows in the repository, executing non-native workflows, and creating and running meta-workflows. The presentation will outline the CGI concept, the SHIWA Simulation Platform, the ER-flow usage scenarios and how the Hydrometeorology research community runs simulations on SSP.
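The coarse-grained idea is that a non-native workflow is treated as a black box run by its own engine, so the submission service only has to decide where that engine comes from. The Python sketch below models just that decision. The class names, fields, and deployment map are hypothetical and do not reflect the actual SHIWA Submission Service interface.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class WorkflowEntry:
    # Hypothetical repository record: a workflow and the engine that runs it.
    name: str
    engine: str       # e.g. "Taverna", "Kepler"
    target_dci: str   # infrastructure the workflow should run on

@dataclass
class SubmissionService:
    repository: Dict[str, WorkflowEntry]
    # Hypothetical map of engines already installed on each DCI.
    deployed_engines: Dict[str, List[str]] = field(default_factory=dict)

    def plan(self, workflow_name: str) -> str:
        """Describe how a non-native workflow would be executed."""
        entry = self.repository[workflow_name]
        installed = self.deployed_engines.get(entry.target_dci, [])
        if entry.engine in installed:
            # Engine is pre-deployed: invoke it directly with the workflow.
            return f"invoke {entry.engine} on {entry.target_dci} with {entry.name}"
        # Otherwise ship the engine together with the workflow as one job.
        return f"submit {entry.engine}+{entry.name} as a job to {entry.target_dci}"
```

Because the host system never interprets the foreign workflow's internals, adding support for another workflow system reduces to registering its engine in the repository.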
Context-aware workflow management of mobile health applications.
Salden, Alfons; Poortinga, Remco
2006-01-01
We propose a medical application management architecture that allows medical (IT) experts to readily design, develop and deploy context-aware mobile health (m-health) applications or services. In particular, we elaborate on how our application workflow management architecture enables chaining, coordinating, composing, and adapting context-sensitive medical application components so that the critical Quality of Service (QoS) and Quality of Context (QoC) requirements typical of m-health applications or services can be met. This functional architectural support requires learning modules for distilling application-critical selection of attention and anticipation models. These models will help medical experts construct and adjust m-health application workflows and workflow strategies on the fly. We illustrate our context-aware workflow management paradigm with an m-health data delivery problem in which optimal communication network configurations have to be determined.
Inda, Márcia A; van Batenburg, Marinus F; Roos, Marco; Belloum, Adam S Z; Vasunin, Dmitry; Wibisono, Adianto; van Kampen, Antoine H C; Breit, Timo M
2008-08-08
Chromosome location is often used as a scaffold to organize genomic information in both the living cell and molecular biological research. Thus, ever-increasing amounts of data about genomic features are stored in public databases and can be readily visualized by genome browsers. To perform in silico experimentation conveniently with this genomics data, biologists need tools to process and compare datasets routinely and explore the obtained results interactively. The complexity of such experimentation requires these tools to be based on an e-Science approach, hence generic, modular, and reusable. A virtual laboratory environment with workflows, workflow management systems, and Grid computation are therefore essential. Here we apply an e-Science approach to develop SigWin-detector, a workflow-based tool that can detect significantly enriched windows of (genomic) features in a (DNA) sequence in a fast and reproducible way. For proof-of-principle, we utilize a biological use case to detect regions of increased and decreased gene expression (RIDGEs and anti-RIDGEs) in human transcriptome maps. We improved the original method for RIDGE detection by replacing the costly step of estimation by random sampling with a faster analytical formula for computing the distribution of the null hypothesis being tested and by developing a new algorithm for computing moving medians. SigWin-detector was developed using the WS-VLAM workflow management system and consists of several reusable modules that are linked together in a basic workflow. The configuration of this basic workflow can be adapted to satisfy the requirements of the specific in silico experiment. As we show with the results from analyses in the biological use case on RIDGEs, SigWin-detector is an efficient and reusable Grid-based tool for discovering windows enriched for features of a particular type in any sequence of values. 
Thus, SigWin-detector provides the proof-of-principle for the modular e-Science based concept of integrative bioinformatics experimentation.
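Moving medians are central to the RIDGE detection described above. The authors' new moving-median algorithm is not spelled out here, so the Python sketch below substitutes a common generic approach: keep the current window sorted and, on each slide, delete the outgoing value and insert the incoming one with `bisect`. Each slide costs O(w) list shifting instead of the O(w log w) of re-sorting.

```python
import bisect
from typing import List, Sequence

def moving_median(values: Sequence[float], w: int) -> List[float]:
    """Median of every length-w window of `values` (len(values) - w + 1 results)."""
    if not 1 <= w <= len(values):
        raise ValueError("window size must be between 1 and len(values)")
    window = sorted(values[:w])  # sorted copy of the current window
    out: List[float] = []
    mid = w // 2
    for i in range(w, len(values) + 1):
        # Odd w: middle element; even w: mean of the two middle elements.
        out.append(window[mid] if w % 2 else (window[mid - 1] + window[mid]) / 2)
        if i == len(values):
            break
        # Slide the window: drop the oldest value, insert the newest in order.
        del window[bisect.bisect_left(window, values[i - w])]
        bisect.insort(window, values[i])
    return out
```

For instance, `moving_median([1, 3, 2, 6, 5], 3)` gives the medians of `[1, 3, 2]`, `[3, 2, 6]`, and `[2, 6, 5]`.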
Medication Management: The Macrocognitive Workflow of Older Adults With Heart Failure.
Mickelson, Robin S; Unertl, Kim M; Holden, Richard J
2016-10-12
Older adults with chronic disease struggle to manage complex medication regimens. Health information technology has the potential to improve medication management, but only if it is based on a thorough understanding of the complexity of medication management workflow as it occurs in natural settings. Prior research reveals that patient work related to medication management is complex, cognitive, and collaborative. Macrocognitive processes are theorized as how people individually and collaboratively think in complex, adaptive, and messy nonlaboratory settings supported by artifacts. The objective of this research was to describe and analyze the work of medication management by older adults with heart failure, using a macrocognitive workflow framework. We interviewed and observed 61 older patients along with 30 informal caregivers about self-care practices including medication management. Descriptive qualitative content analysis methods were used to develop categories, subcategories, and themes about macrocognitive processes used in medication management workflow. We identified 5 high-level macrocognitive processes affecting medication management (sensemaking, planning, coordination, monitoring, and decision making) and 15 subprocesses. Data revealed workflow as occurring in a highly collaborative, fragile system of interacting people, artifacts, time, and space. Process breakdowns were common and patients had little support for macrocognitive workflow from current tools. Macrocognitive processes affected medication management performance. Describing and analyzing this performance produced recommendations for technology supporting collaboration and sensemaking, decision making and problem detection, and planning and implementation.
Integrating prediction, provenance, and optimization into high energy workflows
DOE Office of Scientific and Technical Information (OSTI.GOV)
Schram, M.; Bansal, V.; Friese, R. D.
We propose a novel approach for efficient execution of workflows on distributed resources. The key components of this framework include: performance modeling to quantitatively predict workflow component behavior; optimization-based scheduling such as choosing an optimal subset of resources to meet demand and assignment of tasks to resources; distributed I/O optimizations such as prefetching; and provenance methods for collecting performance data. In preliminary results, these techniques improve throughput on a small Belle II workflow by 20%.
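As a rough illustration of how a performance model can feed scheduling, the Python sketch below pairs a hypothetical linear runtime model (work divided by resource speed) with greedy longest-processing-time-first (LPT) list scheduling. LPT is a standard heuristic swapped in here for clarity; it is not the optimization-based scheduler the authors describe, and the task sizes and resource speeds are invented.

```python
from typing import Dict, List, Tuple

def predict_runtime(work_units: float, speed: float) -> float:
    # Hypothetical linear performance model: runtime = work / resource speed.
    return work_units / speed

def lpt_schedule(
    tasks: Dict[str, float], speeds: Dict[str, float]
) -> Tuple[Dict[str, List[str]], float]:
    """Greedy longest-processing-time-first assignment.

    Tasks are considered largest first; each goes to the resource on which
    it would finish earliest under the predicted runtimes. Returns the
    assignment and the makespan (latest predicted finish time).
    """
    finish = {r: 0.0 for r in speeds}
    assignment: Dict[str, List[str]] = {r: [] for r in speeds}
    for task, work in sorted(tasks.items(), key=lambda kv: kv[1], reverse=True):
        best = min(speeds, key=lambda r: finish[r] + predict_runtime(work, speeds[r]))
        finish[best] += predict_runtime(work, speeds[best])
        assignment[best].append(task)
    return assignment, max(finish.values())
```

A resource that ends up with an empty task list was effectively left out of the chosen subset. A real system would also refine the runtime model using the provenance data it collects.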
NASA Astrophysics Data System (ADS)
Santhana Vannan, S. K.; Ramachandran, R.; Deb, D.; Beaty, T.; Wright, D.
2017-12-01
This paper summarizes the workflow challenges of curating and publishing data produced from disparate data sources and provides a generalized workflow solution for efficiently archiving data generated by researchers. The Oak Ridge National Laboratory Distributed Active Archive Center (ORNL DAAC) for biogeochemical dynamics and the Global Hydrology Resource Center (GHRC) DAAC have been collaborating on a generalized workflow solution to efficiently manage the data publication process. The generalized workflow presented here is built on lessons learned from implementations of the workflow system. Data publication consists of the following steps: accepting the data package from the data providers and ensuring the full integrity of the data files; identifying and addressing data quality issues; assembling standardized, detailed metadata and documentation, including file-level details, processing methodology, and characteristics of the data files; setting up data access mechanisms; setting up the data in data tools and services for improved data dissemination and user experience; registering the dataset in online search and discovery catalogues; and preserving the data location through Digital Object Identifiers (DOIs). We will describe the steps taken to automate this process and make it more efficient. The goals of the workflow system are to reduce the time taken to publish a dataset, to increase the quality of documentation and metadata, and to track individual datasets through the data curation process. Utilities developed to achieve these goals will be described. We will also share metrics demonstrating the value of the workflow system and discuss future steps toward a common software framework.
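The first publication step, accepting a data package and ensuring the integrity of its files, is commonly automated by comparing checksums against a manifest supplied by the provider. The Python sketch below assumes a hypothetical manifest format (relative file name mapped to a SHA-256 hex digest); the DAACs' actual utilities are not described in this abstract.

```python
import hashlib
from pathlib import Path
from typing import Dict, List

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large data files need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_package(root: Path, manifest: Dict[str, str]) -> List[str]:
    """Compare the files under `root` against a provider-supplied manifest.

    `manifest` maps relative file names to expected SHA-256 digests
    (hypothetical format). Returns a list of problems; an empty list
    means the package passed the integrity check.
    """
    problems: List[str] = []
    for name, expected in sorted(manifest.items()):
        path = root / name
        if not path.is_file():
            problems.append(f"missing: {name}")
        elif sha256_of(path) != expected:
            problems.append(f"checksum mismatch: {name}")
    # Files delivered but not listed in the manifest are flagged as well.
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        rel = path.relative_to(root).as_posix()
        if rel not in manifest:
            problems.append(f"unlisted: {rel}")
    return problems
```

Returning a list of problems rather than raising on the first one lets a curator see everything wrong with a package in a single report back to the provider.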
Inferring Clinical Workflow Efficiency via Electronic Medical Record Utilization
Chen, You; Xie, Wei; Gunter, Carl A; Liebovitz, David; Mehrotra, Sanjay; Zhang, He; Malin, Bradley
2015-01-01
Complexity in clinical workflows can lead to inefficiency in making diagnoses, ineffectiveness of treatment plans and uninformed management of healthcare organizations (HCOs). Traditional strategies for managing workflow complexity are based on measuring the gaps between workflows defined by HCO administrators and the actual processes followed by staff in the clinic. However, existing methods tend to neglect the influence of electronic medical record (EMR) systems on the utilization of workflows, which could be leveraged to optimize workflows facilitated through the EMR. In this paper, we introduce a framework to infer clinical workflows through the utilization of an EMR and show how such workflows roughly partition into four types according to their efficiency. Our framework infers workflows at several levels of granularity through data mining technologies. We study four months of EMR event logs from a large medical center, including 16,569 inpatient stays, and illustrate that approximately 95% of workflows are efficient and that 80% of patients are on such workflows. At the same time, we show that the remaining 5% of workflows may be inefficient due to a variety of factors, such as complex patients. PMID:26958173
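A common first step when inferring workflows from event logs is to count which action directly follows which within each case, the "directly-follows" relation used in process mining. The Python sketch below shows that step on a hypothetical `(stay_id, timestamp, action)` log schema; it illustrates the general technique, not the paper's framework.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical log schema: (stay_id, timestamp, action).
Event = Tuple[str, int, str]

def directly_follows(events: Iterable[Event]) -> Counter:
    """Count how often action B immediately follows action A within the same stay."""
    by_stay: Dict[str, List[Tuple[int, str]]] = defaultdict(list)
    for stay, ts, action in events:
        by_stay[stay].append((ts, action))
    counts: Counter = Counter()
    for trail in by_stay.values():
        trail.sort()  # order each stay's events by timestamp
        for (_, a), (_, b) in zip(trail, trail[1:]):
            counts[(a, b)] += 1
    return counts
```

Frequent transition chains in such counts suggest the dominant workflows, and rare or looping chains are candidates for closer inspection.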
Redesign of Library Workflows: Experimental Models for Electronic Resource Description.
ERIC Educational Resources Information Center
Calhoun, Karen
This paper explores the potential for and progress of a gradual transition from a highly centralized model for cataloging to an iterative, collaborative, and broadly distributed model for electronic resource description. The purpose is to alert library managers to some experiments underway and to help them conceptualize new methods for defining,…
Medication Management: The Macrocognitive Workflow of Older Adults With Heart Failure
2016-01-01
Background: Older adults with chronic disease struggle to manage complex medication regimens. Health information technology has the potential to improve medication management, but only if it is based on a thorough understanding of the complexity of medication management workflow as it occurs in natural settings. Prior research reveals that patient work related to medication management is complex, cognitive, and collaborative. Macrocognitive processes are theorized as how people individually and collaboratively think in complex, adaptive, and messy nonlaboratory settings supported by artifacts. Objective: The objective of this research was to describe and analyze the work of medication management by older adults with heart failure, using a macrocognitive workflow framework. Methods: We interviewed and observed 61 older patients along with 30 informal caregivers about self-care practices including medication management. Descriptive qualitative content analysis methods were used to develop categories, subcategories, and themes about macrocognitive processes used in medication management workflow. Results: We identified 5 high-level macrocognitive processes affecting medication management (sensemaking, planning, coordination, monitoring, and decision making) and 15 subprocesses. Data revealed workflow as occurring in a highly collaborative, fragile system of interacting people, artifacts, time, and space. Process breakdowns were common and patients had little support for macrocognitive workflow from current tools. Conclusions: Macrocognitive processes affected medication management performance. Describing and analyzing this performance produced recommendations for technology supporting collaboration and sensemaking, decision making and problem detection, and planning and implementation. PMID:27733331
Building asynchronous geospatial processing workflows with web services
NASA Astrophysics Data System (ADS)
Zhao, Peisheng; Di, Liping; Yu, Genong
2012-02-01
Geoscience research and applications often involve a geospatial processing workflow. This workflow includes a sequence of operations that use a variety of tools to collect, translate, and analyze distributed heterogeneous geospatial data. Asynchronous mechanisms, in which clients initiate a request and then resume their processing without waiting for a response, are very useful for complicated workflows that take a long time to run. Geospatial contents and capabilities are increasingly becoming available online as interoperable Web services. This online availability significantly enhances the ability to use Web service chains to build distributed geospatial processing workflows. This paper focuses on how to orchestrate Web services for implementing asynchronous geospatial processing workflows. The theoretical bases for asynchronous Web services and workflows, including asynchrony patterns and message transmission, are examined to explore different asynchronous approaches and workflow-code architectures that support asynchronous behavior. A sample geospatial processing workflow from the Open Geospatial Consortium (OGC) Web Services, Phase 6 (OWS-6) initiative is provided to illustrate the implementation of asynchronous geospatial processing workflows and the challenges in using the Web Services Business Process Execution Language (WS-BPEL) to develop them.
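The asynchrony pattern described above (submit a request, receive a job identifier at once, keep working, and check back later) can be sketched with Python's `asyncio`. The `AsyncProcessingService` class below is a hypothetical in-process stand-in for a long-running Web service; it does not model the OGC or WS-BPEL interfaces, and polling is only one of the patterns the paper discusses (callbacks are another).

```python
import asyncio
import itertools
from typing import Dict, Tuple

class AsyncProcessingService:
    """Toy stand-in for a long-running geospatial Web service (hypothetical)."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self._jobs: Dict[int, asyncio.Task] = {}

    async def submit(self, data: list, delay: float) -> int:
        # Accept the request and return a job id immediately.
        job_id = next(self._ids)
        self._jobs[job_id] = asyncio.create_task(self._run(data, delay))
        return job_id

    async def _run(self, data: list, delay: float) -> float:
        await asyncio.sleep(delay)    # simulate a slow operation
        return sum(data) / len(data)  # e.g. a mean over a raster tile

    def status(self, job_id: int) -> str:
        return "done" if self._jobs[job_id].done() else "running"

    def result(self, job_id: int) -> float:
        return self._jobs[job_id].result()

async def client(service: AsyncProcessingService) -> Tuple[float, int]:
    job = await service.submit([1.0, 2.0, 3.0], delay=0.05)
    other_work = 0
    # The client keeps doing its own work, polling instead of blocking.
    while service.status(job) != "done":
        other_work += 1
        await asyncio.sleep(0.01)
    return service.result(job), other_work
```

Running `asyncio.run(client(AsyncProcessingService()))` returns the job's result together with how many rounds of other work the client completed while waiting.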
A Model of Workflow Composition for Emergency Management
NASA Astrophysics Data System (ADS)
Xin, Chen; Bin-ge, Cui; Feng, Zhang; Xue-hui, Xu; Shan-shan, Fu
Commonly used workflow technology is not flexible enough to deal with concurrent emergency situations. The paper proposes a novel model for defining emergency plans in which workflow segments appear as constituent parts. A formal abstraction containing four operations is defined to compose workflow segments under constraint rules. A software system for constructing and composing business process resources has been implemented and integrated into the Emergency Plan Management Application System.
Worklist handling in workflow-enabled radiological application systems
NASA Astrophysics Data System (ADS)
Wendler, Thomas; Meetz, Kirsten; Schmidt, Joachim; von Berg, Jens
2000-05-01
For next-generation integrated information systems for health care applications, more emphasis has to be put on systems that, by design, support the reduction of cost, the increase in efficiency and the improvement of the quality of services. A substantial contribution to this will be the modeling, optimization, automation and enactment of processes in health care institutions. One of the perceived key success factors for the system integration of processes will be the application of workflow management, with workflow management systems as key technology components. In this paper we address workflow management in radiology. We focus on an important aspect of workflow management: the generation and handling of worklists, which automatically provide workflow participants with work items that reflect tasks to be performed. The display of worklists and the functions associated with work items are the part of a workflow-managed information system that end-users actually see. Appropriate worklist design and implementation will influence the user friendliness of a system and will largely determine work efficiency. Technically, current imaging department information system environments (modality-PACS-RIS installations) take a data-driven approach: worklists, if present at all, are generated from filtered views on application databases. In a future workflow-based approach, worklists will be generated by autonomous workflow services based on explicit process models and organizational models. This process-oriented approach will provide an integral view of entire health care processes or subprocesses. The paper describes the basic mechanisms of this approach and summarizes its benefits.
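The contrast between data-driven and process-driven worklists can be illustrated with a toy version of the latter: an explicit process model (ordered tasks and the role that performs each), an organizational model (which staff hold which roles), and case states combine into per-person work items. All task names, roles, and staff below are hypothetical, and a real process model would allow branching rather than a single linear sequence.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical explicit process model: ordered tasks and the role performing each.
PROCESS_MODEL: List[str] = ["schedule", "acquire_images", "report", "sign_report"]
TASK_ROLE: Dict[str, str] = {
    "schedule": "clerk",
    "acquire_images": "technologist",
    "report": "radiologist",
    "sign_report": "radiologist",
}
# Hypothetical organizational model: which staff members hold which roles.
STAFF_ROLES: Dict[str, List[str]] = {
    "ann": ["clerk"],
    "bob": ["technologist"],
    "cy": ["radiologist"],
}

@dataclass
class Case:
    exam_id: str
    completed: int = 0  # number of process steps already finished

def worklist_for(person: str, cases: List[Case]) -> List[str]:
    """Work items for `person`: the next pending task of each case that matches their role."""
    roles = set(STAFF_ROLES.get(person, []))
    items: List[str] = []
    for case in cases:
        if case.completed < len(PROCESS_MODEL):
            task = PROCESS_MODEL[case.completed]
            if TASK_ROLE[task] in roles:
                items.append(f"{case.exam_id}:{task}")
    return items
```

Changing the process or organizational model changes every worklist without touching the application databases, which is the decoupling the process-oriented approach aims for.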
NASA Astrophysics Data System (ADS)
Filgueira, R.; Ferreira da Silva, R.; Deelman, E.; Atkinson, M.
2016-12-01
We present the Data-Intensive workflows as a Service (DIaaS) model, which enables easy composition and deployment of data-intensive workflows on clouds using containers. The backbone of the DIaaS model is Asterism, an integrated solution for running data-intensive stream-based applications on heterogeneous systems that combines the benefits of the dispel4py and Pegasus workflow systems. The stream-based executions of an Asterism workflow are managed by dispel4py, while data movement between different e-Infrastructures and coordination of the application execution are managed automatically by Pegasus. DIaaS combines the Asterism framework with Docker containers to provide an integrated, complete, easy-to-use, portable approach to running data-intensive workflows on distributed platforms. The DIaaS model comprises three containers: a Pegasus node, an MPI cluster, and an Apache Storm cluster. Container images are described as Dockerfiles (available online at http://github.com/dispel4py/pegasus_dispel4py), linked to Docker Hub to provide continuous integration (automated image builds) as well as image storage and sharing. In this model, all software required to run scientific applications (workflow systems and execution engines) is packed into the containers, which significantly reduces the effort (and possible human errors) required of scientists or VRE administrators to build such systems. The most common use of DIaaS will be as a backend of VREs or Scientific Gateways to run data-intensive applications, deploying cloud resources upon request. We have demonstrated the feasibility of DIaaS using the data-intensive seismic ambient noise cross-correlation application (Figure 1). The application preprocesses (Phase1) and cross-correlates (Phase2) traces from several seismic stations. The application is submitted via Pegasus (Container1), and Phase1 and Phase2 are executed in the MPI (Container2) and Storm (Container3) clusters, respectively. 
Although both phases could be executed within the same environment, this setup demonstrates the flexibility of DIaaS to run applications across e-Infrastructures. In summary, DIaaS delivers specialized software to execute data-intensive applications in a scalable, efficient, and robust manner reducing the engineering time and computational cost.
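The two phases of the demonstration application can be illustrated in a few lines of plain Python. This is a deliberately simplified sketch: Phase 1 here only removes the mean and applies one-bit normalization, whereas real ambient noise workflows also filter, whiten, and stack. It does not use the dispel4py or Pegasus APIs.

```python
from typing import List, Sequence, Tuple

def preprocess(trace: Sequence[float]) -> List[float]:
    """Phase 1 (simplified): remove the mean, then one-bit normalize (keep only the sign)."""
    mean = sum(trace) / len(trace)
    return [1.0 if x > mean else -1.0 if x < mean else 0.0 for x in trace]

def cross_correlate(
    a: Sequence[float], b: Sequence[float], max_lag: int
) -> List[Tuple[int, float]]:
    """Phase 2: c(lag) = sum over t of a[t] * b[t + lag], for lag in [-max_lag, max_lag]."""
    out: List[Tuple[int, float]] = []
    for lag in range(-max_lag, max_lag + 1):
        s = 0.0
        for t in range(len(a)):
            if 0 <= t + lag < len(b):
                s += a[t] * b[t + lag]
        out.append((lag, s))
    return out

def best_lag(a: Sequence[float], b: Sequence[float], max_lag: int) -> int:
    """Lag at which the cross-correlation peaks (b delayed relative to a by this many samples)."""
    return max(cross_correlate(a, b, max_lag), key=lambda p: p[1])[0]
```

For two stations recording the same pulse two samples apart, the correlation peaks at a lag of 2 whether or not the traces are preprocessed first. Because every station pair is correlated independently, Phase 2 parallelizes naturally across a stream-processing cluster.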
CyberShake: Running Seismic Hazard Workflows on Distributed HPC Resources
NASA Astrophysics Data System (ADS)
Callaghan, S.; Maechling, P. J.; Graves, R. W.; Gill, D.; Olsen, K. B.; Milner, K. R.; Yu, J.; Jordan, T. H.
2013-12-01
As part of its program of earthquake system science research, the Southern California Earthquake Center (SCEC) has developed a simulation platform, CyberShake, to perform physics-based probabilistic seismic hazard analysis (PSHA) using 3D deterministic wave propagation simulations. CyberShake performs PSHA by simulating a tensor-valued wavefield of Strain Green Tensors (SGTs) and then using seismic reciprocity to calculate synthetic seismograms for about 415,000 events per site of interest. These seismograms are processed to compute ground motion intensity measures, which are then combined with probabilities from an earthquake rupture forecast to produce a site-specific hazard curve. Seismic hazard curves for hundreds of sites can be combined into a seismic hazard map representing the seismic hazard for a region. We present a recently completed PSHA study in which we calculated four CyberShake seismic hazard maps for the Southern California area to compare how CyberShake hazard results are affected by different SGT computational codes (AWP-ODC and AWP-RWG) and different community velocity models (Community Velocity Model - SCEC (CVM-S4) v11.11 and Community Velocity Model - Harvard (CVM-H) v11.9). We present our approach to running workflow applications on distributed HPC resources, including systems without support for remote job submission. We show how our approach extends the benefits of scientific workflows, such as job and data management, to large-scale applications on Track 1 and Leadership-class open-science HPC resources. We used our distributed workflow approach to perform CyberShake Study 13.4 on two new NSF open-science HPC computing resources, Blue Waters and Stampede, executing over 470 million tasks to calculate physics-based hazard curves for 286 locations in the Southern California region. 
For each location, we calculated seismic hazard curves with two different community velocity models and two different SGT codes, resulting in over 1100 hazard curves. We will report on the performance of this CyberShake study, four times larger than previous studies. Additionally, we will examine the challenges we face applying these workflow techniques to additional open-science HPC systems and discuss whether our workflow solutions continue to provide value to our large-scale PSHA calculations.
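The step that turns intensity measures and rupture probabilities into a site hazard curve can be written compactly. The Python sketch below uses a standard simplified PSHA combination. It assumes ruptures occur independently, and it estimates each rupture's conditional exceedance probability as the fraction of its equally weighted rupture variations whose intensity measure exceeds the level. This is an illustration, not CyberShake's production code, and the input values are invented.

```python
from typing import Dict, List, Sequence

def hazard_curve(
    ruptures: Dict[str, float],
    im_values: Dict[str, Sequence[float]],
    levels: Sequence[float],
) -> List[float]:
    """Probability that the intensity measure (IM) exceeds each level at one site.

    ruptures:  rupture id -> occurrence probability from the rupture forecast
    im_values: rupture id -> IMs from that rupture's synthetic seismograms
               (one per rupture variation, equally weighted)
    Assuming independent ruptures:
        P(IM > x) = 1 - prod_r [1 - P(r) * P(IM > x | r)]
    """
    curve: List[float] = []
    for x in levels:
        p_none = 1.0  # probability that no rupture produces IM > x
        for rid, p_rup in ruptures.items():
            ims = im_values[rid]
            p_exceed = sum(1 for v in ims if v > x) / len(ims)
            p_none *= 1.0 - p_rup * p_exceed
        curve.append(1.0 - p_none)
    return curve
```

Evaluating such a curve at one probability level for every site, and mapping the resulting IM values, is how per-site curves become a regional hazard map.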
Scientific Data Management (SDM) Center for Enabling Technologies. Final Report, 2007-2012
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ludascher, Bertram; Altintas, Ilkay
Our contributions to advancing the State of the Art in scientific workflows have focused on the following areas: Workflow development; Generic workflow components and templates; Provenance collection and analysis; and, Workflow reliability and fault tolerance.
Workflow technology: the new frontier. How to overcome the barriers and join the future.
Shefter, Susan M
2006-01-01
Hospitals are catching up to the business world in the introduction of technology systems that support professional practice and workflow. The field of case management is highly complex and interrelates with diverse groups in diverse locations. The last few years have seen the introduction of Workflow Technology Tools, which can improve the quality and efficiency of discharge planning by the case manager. Despite the availability of these wonderful new programs, many case managers are hesitant to adopt the new technology and workflow. For a myriad of reasons, a computer-based workflow system can seem like a brick wall. This article discusses, from a practitioner's point of view, how professionals can gain confidence and skill to get around the brick wall and join the future.
Common Workflow Service: Standards Based Solution for Managing Operational Processes
NASA Astrophysics Data System (ADS)
Tinio, A. W.; Hollins, G. A.
2017-06-01
The Common Workflow Service is a collaborative and standards-based solution for managing mission operations processes using techniques from the Business Process Management (BPM) discipline. This presentation describes the CWS and its benefits.
IceProd 2: A Next Generation Data Analysis Framework for the IceCube Neutrino Observatory
NASA Astrophysics Data System (ADS)
Schultz, D.
2015-12-01
We describe the overall structure and new features of the second generation of IceProd, a data processing and management framework. IceProd was developed by the IceCube Neutrino Observatory for processing Monte Carlo simulations, detector data, and analysis levels. It runs as a separate layer on top of grid and batch systems, using a set of daemons that process job workflows and maintain configuration and status information on each job before, during, and after processing. IceProd can also manage complex workflow DAGs (directed acyclic graphs) across distributed computing grids to optimize resource usage. IceProd is designed to be very lightweight; it runs as a Python application fully in user space and can be set up easily. For the initial completion of this second version of IceProd, improvements have been made to increase security, reliability, scalability, and ease of use.
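Managing a workflow DAG largely comes down to releasing a task only after all of its prerequisites have finished. The Python sketch below uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+) to group a hypothetical simulation-and-filtering DAG into batches whose members could run concurrently. The task names are invented, and IceProd's own scheduling logic is not shown here.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical processing DAG: task -> set of prerequisite tasks.
DAG: Dict[str, Set[str]] = {
    "generate": set(),
    "propagate": {"generate"},
    "detector_sim": {"propagate"},
    "level2_filter": {"detector_sim"},
    "level3_a": {"level2_filter"},
    "level3_b": {"level2_filter"},
    "merge": {"level3_a", "level3_b"},
}

def dispatch_batches(dag: Dict[str, Set[str]]) -> List[List[str]]:
    """Group tasks into batches whose members can run concurrently on the grid."""
    ts = TopologicalSorter(dag)
    ts.prepare()  # raises graphlib.CycleError if the graph has a cycle
    batches: List[List[str]] = []
    while ts.is_active():
        ready = sorted(ts.get_ready())  # all prerequisites of these are done
        batches.append(ready)
        ts.done(*ready)  # here we simply assume every task in the batch succeeded
    return batches
```

In a live system, `done()` would be called as individual jobs report completion rather than once per batch, so fast branches would not wait for slow ones.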
ERIC Educational Resources Information Center
Li, Wenhao
2011-01-01
Distributed workflow technology has been widely used in modern education and e-business systems. Distributed web applications have shown cross-domain and cooperative characteristics to meet the need of current distributed workflow applications. In this paper, the author proposes a dynamic and adaptive scheduling algorithm PCSA (Pre-Calculated…
An architecture model for multiple disease management information systems.
Chen, Lichin; Yu, Hui-Chu; Li, Hao-Chun; Wang, Yi-Van; Chen, Huang-Jen; Wang, I-Ching; Wang, Chiou-Shiang; Peng, Hui-Yu; Hsu, Yu-Ling; Chen, Chi-Huang; Chuang, Lee-Ming; Lee, Hung-Chang; Chung, Yufang; Lai, Feipei
2013-04-01
Disease management is a program that attempts to overcome the fragmentation of the healthcare system and improve the quality of care. Many studies have demonstrated the effectiveness of disease management. However, case managers spend the majority of their time on documentation and on coordinating the members of the care team. They need a tool that supports their daily practice and optimizes an inefficient workflow. Several discussions have indicated that information technology plays an important role in the era of disease management. Although such applications have been developed, building a separate information system for each disease management program is inefficient. The aim of this research is to support the work of disease management, reform the inefficient workflow, and propose an architecture model that improves reusability and saves time in information system development. The proposed architecture model has been successfully implemented in two disease management information systems, and the result was evaluated through reusability analysis, time consumption analysis, pre- and post-implementation workflow analysis, and a user questionnaire survey. The reusability of the proposed model was high, less than half of the development time was required, and the workflow was improved. Overall user feedback was positive, and users rated the system as highly supportive of their daily workflow. The system empowers case managers with better information and leads to better decision making.
A data management and publication workflow for a large-scale, heterogeneous sensor network.
Jones, Amber Spackman; Horsburgh, Jeffery S; Reeder, Stephanie L; Ramírez, Maurier; Caraballo, Juan
2015-06-01
It is common for hydrology researchers to collect data using in situ sensors at high frequencies, for extended durations, and with spatial distributions that produce data volumes requiring infrastructure for data storage, management, and sharing. The availability and utility of these data in addressing scientific questions related to water availability, water quality, and natural disasters rely on effective cyberinfrastructure that facilitates the transformation of raw sensor data into usable data products. They also depend on the ability of researchers to share and access the data in usable formats. In this paper, we describe a data management and publication workflow and software tools for research groups and sites conducting long-term monitoring using in situ sensors. Functionality includes the ability to track the monitoring equipment inventory and events related to field maintenance. Linking this information to the observational data is essential for ensuring the quality of sensor-based data products. We present these tools in the context of a case study for the innovative Urban Transitions and Aridregion Hydrosustainability (iUTAH) sensor network. The iUTAH monitoring network includes sensors at aquatic and terrestrial sites for continuous monitoring of common meteorological variables, snow accumulation and melt, soil moisture, surface water flow, and surface water quality. We present the overall workflow we have developed for effectively transferring data from field monitoring sites to ultimate end-users and describe the software tools we have deployed for storing, managing, and sharing the sensor data. These tools are all open source and available for others to use.
Purdue ionomics information management system. An integrated functional genomics platform.
Baxter, Ivan; Ouzzani, Mourad; Orcun, Seza; Kennedy, Brad; Jandhyala, Shrinivas S; Salt, David E
2007-02-01
The advent of high-throughput phenotyping technologies has created a deluge of information that is difficult to deal with without appropriate data management tools. These data management tools should integrate defined workflow controls for genomic-scale data acquisition and validation, data storage and retrieval, and data analysis, indexed around the genomic information of the organism of interest. To maximize the impact of these large datasets, it is critical that they are rapidly disseminated to the broader research community, allowing open access for data mining and discovery. We describe here a system that incorporates such functionalities, developed around the Purdue University high-throughput ionomics phenotyping platform. The Purdue Ionomics Information Management System (PiiMS) provides integrated workflow control, data storage, and analysis to facilitate high-throughput data acquisition, along with integrated tools for data search, retrieval, and visualization for hypothesis development. PiiMS is deployed as a World Wide Web-enabled system, allowing for integration of distributed workflow processes and open access to raw data for analysis by numerous laboratories. PiiMS currently contains data on shoot concentrations of P, Ca, K, Mg, Cu, Fe, Zn, Mn, Co, Ni, B, Se, Mo, Na, As, and Cd in over 60,000 shoot tissue samples of Arabidopsis (Arabidopsis thaliana), including ethyl methanesulfonate, fast-neutron and defined T-DNA mutants, and natural accessions and populations of recombinant inbred lines from over 800 separate experiments, representing over 1,000,000 fully quantitative elemental concentrations. PiiMS is accessible at www.purdue.edu/dp/ionomics.
Castillo, Andreina I; Nelson, Andrew D L; Haug-Baltzell, Asher K; Lyons, Eric
2018-01-01
Integrated platforms for storage, management, analysis and sharing of large quantities of omics data have become fundamental to comparative genomics. CoGe (https://genomevolution.org/coge/) is an online platform designed to manage and study genomic data, enabling both data- and hypothesis-driven comparative genomics. CoGe’s tools and resources can be used to organize and analyse both publicly available and private genomic data from any species. Here, we demonstrate the capabilities of CoGe through three example workflows using 17 Plasmodium genomes as a model. Plasmodium genomes present unique challenges for comparative genomics due to their rapidly evolving and highly variable genomic AT/GC content. These example workflows are intended to serve as templates to help guide researchers who would like to use CoGe to examine diverse aspects of genome evolution. In the first workflow, trends in genome composition and amino acid usage are explored. In the second, changes in genome structure and the distribution of synonymous (Ks) and non-synonymous (Kn) substitution values are evaluated across species with different levels of evolutionary relatedness. In the third workflow, microsyntenic analyses of multigene families’ genomic organization are conducted using two Plasmodium-specific gene families—serine repeat antigen, and cytoadherence-linked asexual gene—as models. In general, these example workflows show how to achieve quick, reproducible and shareable results using the CoGe platform. We were able to replicate previously published results, as well as leverage CoGe’s tools and resources to gain additional insight into various aspects of Plasmodium genome evolution. Our results highlight the usefulness of the CoGe platform, particularly in understanding complex features of genome evolution. Database URL: https://genomevolution.org/coge/
Design and implementation of a secure workflow system based on PKI/PMI
NASA Astrophysics Data System (ADS)
Yan, Kai; Jiang, Chao-hui
2013-03-01
Traditional workflow systems have several weaknesses in privilege management: low privilege management efficiency, an overburdened administrator, the lack of a trusted authority, and so on. After an in-depth study of the security requirements of workflow systems, a secure workflow model based on PKI/PMI is proposed. This model achieves static and dynamic authorization in the workflow system by verifying the user's identity through a public-key certificate (PKC) and validating the user's privilege information through an attribute certificate (AC). Practice shows that this system can meet the security requirements of a WfMS. Moreover, it not only improves system security but also ensures the integrity, confidentiality, availability and non-repudiation of the data in the system.
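The two-step check described above (identity from the PKC, privileges from the AC) can be sketched as follows. This is a minimal illustration, not the authors' system: it assumes the PKC has already been validated and yields a subject name, uses an HMAC key as a stand-in for the attribute authority's signature (a real PMI signs X.509 attribute certificates with the authority's private key), and the task names and role policy are hypothetical.

```python
import hashlib
import hmac
from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical stand-in: an HMAC key plays the attribute authority's signing key.
# A real PMI uses X.509 attribute certificates signed with the authority's private key.
AA_KEY = b"demo-attribute-authority-key"

# Hypothetical static policy: roles allowed to perform each workflow task.
TASK_POLICY = {
    "submit_order": {"clerk", "approver"},
    "approve_order": {"approver"},
}


@dataclass(frozen=True)
class AttributeCertificate:
    holder: str            # subject name taken from the holder's public-key certificate
    role: str              # privilege attribute granted by the attribute authority
    not_before: datetime
    not_after: datetime
    signature: bytes = b""


def _sign(holder: str, role: str, not_before: datetime, not_after: datetime) -> bytes:
    payload = f"{holder}|{role}|{not_before.isoformat()}|{not_after.isoformat()}"
    return hmac.new(AA_KEY, payload.encode(), hashlib.sha256).digest()


def issue_ac(holder: str, role: str, not_before: datetime, not_after: datetime) -> AttributeCertificate:
    """Attribute authority issues a signed certificate binding a role to a holder."""
    return AttributeCertificate(holder, role, not_before, not_after,
                                _sign(holder, role, not_before, not_after))


def authorize(authenticated_subject: str, ac: AttributeCertificate, task: str, now: datetime) -> bool:
    """Authorize a task for a subject whose identity was already verified via its PKC."""
    expected = _sign(ac.holder, ac.role, ac.not_before, ac.not_after)
    if not hmac.compare_digest(expected, ac.signature):
        return False        # forged or altered attribute certificate
    if ac.holder != authenticated_subject:
        return False        # certificate was issued to someone else
    if not ac.not_before <= now <= ac.not_after:
        return False        # dynamic check: outside the validity window
    return ac.role in TASK_POLICY.get(task, set())   # static role-to-task check
```

Separating identity (who you are) from privileges (what you may do) lets an administrator reissue or revoke short-lived attribute certificates without touching long-lived identity certificates, which is one plausible reading of the efficiency gain the abstract claims.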
Davis, Stephen Jerome; Hurtado, Josephine; Nguyen, Rosemary; Huynh, Tran; Lindon, Ivan; Hudnall, Cedric; Bork, Sara
2017-01-01
Background: USP <797> regulatory requirements have mandated that pharmacies improve aseptic techniques and the cleanliness of medication preparation areas. In addition, the Institute for Safe Medication Practices (ISMP) recommends that technology and automation be used as much as possible for preparing and verifying compounded sterile products. Objective: To determine the benefits associated with the implementation of the workflow management system, such as reducing medication preparation and delivery errors, reducing the quantity and frequency of medication errors, avoiding costs, and supporting the organization's decision to move toward positive patient identification (PPID). Methods: At Texas Children's Hospital, data were collected and analyzed from January 2014 through August 2014 in the pharmacy areas in which the workflow management system would be implemented. Data from September 2014, during the oral liquid implementation phase of the workflow management system, were excluded. Data were collected and analyzed from October 2014 through June 2015 to determine whether the implementation of the workflow management system reduced the quantity and frequency of reported medication errors. Data collected and analyzed during the study period included the quantity of doses prepared, number of incorrect medication scans, number of doses discontinued from the workflow management system queue, and the number of doses rejected. Data were collected and analyzed to identify patterns of incorrect medication scans, to determine reasons for rejected medication doses, and to determine the reduction in wasted medications. Results: During the 17-month study period, the pharmacy department dispensed 1,506,220 oral liquid and injectable medication doses. From October 2014 through June 2015, the pharmacy department dispensed 826,220 medication doses that were prepared and checked via the workflow management system. 
Of those 826,220 medication doses, there were 16 reported incorrect volume errors. The error rate after the implementation of the workflow management system averaged 8.4%, which was a 1.6% reduction. After the implementation of the workflow management system, the average number of reported oral liquid medication and injectable medication errors decreased to 0.4 and 0.2 times per week, respectively. Conclusion: The organization was able to achieve its purpose and goal of improving the provision of quality pharmacy care through optimal medication use and safety by reducing medication preparation errors. Error rates decreased and the workflow processes were streamlined, which has led to seamless operations within the pharmacy department. There has been significant cost avoidance and waste reduction and enhanced interdepartmental satisfaction due to the reduction of reported medication errors.
High-Performance Compute Infrastructure in Astronomy: 2020 Is Only Months Away
NASA Astrophysics Data System (ADS)
Berriman, B.; Deelman, E.; Juve, G.; Rynge, M.; Vöckler, J. S.
2012-09-01
By 2020, astronomy will be awash with as much as 60 PB of public data. Full scientific exploitation of such massive volumes of data will require high-performance computing on server farms co-located with the data. Development of this computing model will be a community-wide enterprise that has profound cultural and technical implications. Astronomers must be prepared to develop environment-agnostic applications that support parallel processing. The community must investigate the applicability and cost-benefit of emerging technologies such as cloud computing to astronomy, and must engage the Computer Science community to develop science-driven cyberinfrastructure such as workflow schedulers and optimizers. We report here the results of collaborations between a science center, IPAC, and a Computer Science research institute, ISI. These collaborations may be considered pathfinders in developing a high-performance compute infrastructure in astronomy. These collaborations investigated two exemplar large-scale science-driver workflow applications: 1) Calculation of an infrared atlas of the Galactic Plane at 18 different wavelengths by placing data from multiple surveys on a common plate scale and co-registering all the pixels; 2) Calculation of an atlas of periodicities present in the public Kepler data sets, which currently contain 380,000 light curves. These products have been generated with two workflow applications, written in C for performance and designed to support parallel processing on multiple environments and platforms, but with different compute resource needs: the Montage image mosaic engine is I/O-bound, and the NASA Star and Exoplanet Database periodogram code is CPU-bound. Our presentation will report cost and performance metrics and lessons-learned for continuing development. 
Applicability of Cloud Computing: Commercial cloud providers generally charge for all operations, including processing, transfer of input and output data, and storage of data, so the costs of running applications vary widely according to how they use resources. The cloud is well suited to processing CPU-bound (and memory-bound) workflows such as the periodogram code, given the relatively low cost of processing in comparison with I/O operations. I/O-bound applications such as Montage perform best on high-performance clusters with fast networks and parallel file systems. Science-driven Cyberinfrastructure: Montage has been widely used as a driver application to develop workflow management services, such as task scheduling in distributed environments, designing fault tolerance techniques for job schedulers, and developing workflow orchestration techniques. Running Parallel Applications Across Distributed Cloud Environments: Data processing will eventually take place in parallel, distributed across cyberinfrastructure environments having different architectures. We have used the Pegasus Workflow Management System (WMS) to successfully run applications across three very different environments: TeraGrid, OSG (Open Science Grid), and FutureGrid. Provisioning resources across different grids and clouds (also referred to as Sky Computing) involves establishing a distributed environment, where issues of, e.g., remote job submission, data management, and security need to be addressed. This environment also requires building virtual machine images that can run in different environments. Usually, each cloud provides basic images that can be customized with additional software and services. In most of our work, we provisioned compute resources using a custom application called Wrangler. 
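Why the charging model favours CPU-bound over I/O-bound workflows can be seen with a simple three-term cost model. The unit prices and the two workload profiles below are hypothetical, chosen only to contrast a periodogram-style run (lots of core-hours, little data) with a mosaic-style run (little compute, lots of data); they are not measurements from the study.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Prices:
    # Hypothetical unit prices for illustration; real provider rates differ.
    core_hour: float = 0.10         # $ per core-hour of processing
    transfer_gb: float = 0.10       # $ per GB transferred in or out
    storage_gb_month: float = 0.15  # $ per GB stored per month


@dataclass
class WorkflowRun:
    core_hours: float
    transferred_gb: float
    stored_gb: float
    storage_months: float = 1.0


def run_cost(run: WorkflowRun, prices: Optional[Prices] = None) -> dict:
    """Split a run's cloud bill into processing, transfer and storage charges."""
    p = prices or Prices()
    compute = run.core_hours * p.core_hour
    transfer = run.transferred_gb * p.transfer_gb
    storage = run.stored_gb * run.storage_months * p.storage_gb_month
    return {"compute": compute, "transfer": transfer, "storage": storage,
            "total": compute + transfer + storage}


# Illustrative profiles (not measured values).
cpu_bound = run_cost(WorkflowRun(core_hours=1000, transferred_gb=10, stored_gb=10))
io_bound = run_cost(WorkflowRun(core_hours=100, transferred_gb=2000, stored_gb=500))
```

With these assumed rates, the CPU-bound run pays almost entirely for processing (about $100 of a $102.50 total), while the I/O-bound run pays about $275 of $285 for moving and storing data, which is the pattern the paragraph describes.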
Pegasus WMS abstracts the architectures of the compute environments away from the end-user, and can be considered a first-generation tool suitable for scientists to run their applications on disparate environments.
PANORAMA: An approach to performance modeling and diagnosis of extreme-scale workflows
DOE Office of Scientific and Technical Information (OSTI.GOV)
Deelman, Ewa; Carothers, Christopher; Mandal, Anirban
2015-07-14
Here we report that computational science is well established as the third pillar of scientific discovery and is on par with experimentation and theory. However, as we move closer toward the ability to execute exascale calculations and process the ensuing extreme-scale amounts of data produced by both experiments and computations alike, the complexity of managing the compute and data analysis tasks has grown beyond the capabilities of domain scientists. Therefore, workflow management systems are absolutely necessary to ensure current and future scientific discoveries. A key research question for these workflow management systems concerns the performance optimization of complex calculation and data analysis tasks. The central contribution of this article is a description of the PANORAMA approach for modeling and diagnosing the run-time performance of complex scientific workflows. This approach integrates extreme-scale systems testbed experimentation, structured analytical modeling, and parallel systems simulation into a comprehensive workflow framework called Pegasus for understanding and improving the overall performance of complex scientific workflows.
Neubert, Sebastian; Göde, Bernd; Gu, Xiangyu; Stoll, Norbert; Thurow, Kerstin
2017-04-01
Modern business process management (BPM) is of increasing interest for laboratory automation. End-to-end workflow automation and improved top-level integration of information technology (IT) and automation systems are especially prominent objectives. The ISO standard Business Process Model and Notation (BPMN) 2.X provides a system-independent, interdisciplinarily accepted graphical process control notation that allows process analysis while also being executable. Transferring BPM solutions to structured laboratory automation places new demands, for example on real-time-critical process and systems integration. The article discusses the potential of laboratory execution systems (LESs) to ease the implementation of a business process management system (BPMS) in hierarchical laboratory automation. In particular, complex application scenarios, including long process chains based on, for example, several distributed automation islands and mobile laboratory robots for material transport, are difficult to handle in BPMSs. The presented approach shifts workflow control tasks into LESs specialized for the life sciences, reduces the number of different interfaces between BPMSs and subsystems, and simplifies the modeling of complex processes. Thus, the integration effort for complex laboratory workflows can be significantly reduced for strictly structured automation solutions. An example application, consisting of a mixture of manual and automated subprocesses, demonstrates the presented BPMS-LES approach.
Modelling and analysis of workflow for lean supply chains
NASA Astrophysics Data System (ADS)
Ma, Jinping; Wang, Kanliang; Xu, Lida
2011-11-01
Cross-organisational workflow systems are a component of enterprise information systems that support collaborative business processes among organisations in a supply chain. Currently, most workflow systems are developed from an information-modelling perspective without considering the actual requirements of supply chain management. In this article, we focus on the modelling and analysis of cross-organisational workflow systems in the context of the lean supply chain (LSC) using Petri nets. First, the article describes the assumed conditions of a cross-organisational workflow net according to the idea of LSC and then discusses the standardisation of collaborative business processes between organisations in the context of LSC. Second, the concept of labelled time Petri nets (LTPNs) is defined by combining labelled Petri nets with time Petri nets, and the concept of labelled time workflow nets (LTWNs) is defined based on LTPNs. Cross-organisational labelled time workflow nets (CLTWNs) are then defined based on LTWNs. Third, the article proposes the notion of OR-silent CLTWNs and an approach to verifying the soundness of LTWNs and CLTWNs. Finally, the article illustrates how to use the proposed method with a simple example. The purpose of this research is to establish a formal method for modelling and analysing workflow systems for LSC. This study opens a new perspective for research on cross-organisational workflow management and promotes the operation management of LSC in real-world settings.
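To make "soundness" concrete, the sketch below checks the classic soundness property for an ordinary, untimed workflow net by exploring its reachability graph: from the initial marking [i] the final marking [o] must always remain reachable, no reachable marking may put a token in o while leaving other tokens behind, and every transition must be able to fire. This is the standard untimed notion, not the article's LTWN/CLTWN verification approach (it ignores labels and time), and it assumes the net is bounded; the `max_states` guard is an arbitrary cutoff.

```python
from collections import deque


def _fire(marking: dict, pre: dict, post: dict) -> dict:
    """Consume tokens from input places and produce tokens in output places."""
    m = dict(marking)
    for place, n in pre.items():
        m[place] -= n
        if m[place] == 0:
            del m[place]
    for place, n in post.items():
        m[place] = m.get(place, 0) + n
    return m


def _key(marking: dict) -> frozenset:
    return frozenset(marking.items())


def is_sound(transitions: dict, source: str = "i", sink: str = "o", max_states: int = 10_000) -> bool:
    """transitions: name -> (pre, post), each a dict mapping place -> token count."""
    start = {source: 1}
    final = _key({sink: 1})
    seen = {_key(start): start}
    edges, fired = {}, set()
    queue = deque([start])
    while queue:                                   # breadth-first reachability graph
        m = queue.popleft()
        k = _key(m)
        edges[k] = []
        for name, (pre, post) in transitions.items():
            if all(m.get(p, 0) >= n for p, n in pre.items()):
                nxt = _fire(m, pre, post)
                nk = _key(nxt)
                fired.add(name)
                edges[k].append(nk)
                if nk not in seen:
                    if len(seen) >= max_states:
                        raise RuntimeError("state space too large or net unbounded")
                    seen[nk] = nxt
                    queue.append(nxt)
    # Proper completion: any marking that marks the sink must be exactly [sink].
    if any(m.get(sink, 0) > 0 and k != final for k, m in seen.items()):
        return False
    if final not in seen:
        return False
    # Option to complete: walk backwards from [sink] over the reachability graph.
    reverse = {k: [] for k in seen}
    for k, succs in edges.items():
        for nk in succs:
            reverse[nk].append(k)
    can_finish, stack = {final}, [final]
    while stack:
        for prev in reverse[stack.pop()]:
            if prev not in can_finish:
                can_finish.add(prev)
                stack.append(prev)
    if can_finish != set(seen):
        return False
    # No dead transitions: every transition fires in some reachable marking.
    return fired == set(transitions)
```

For example, an AND-split followed by an AND-join is sound, whereas a split whose second branch is never joined leaves a stray token next to o and is rejected.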
NASA Astrophysics Data System (ADS)
Wang, Ximing; Martinez, Clarisa; Wang, Jing; Liu, Ye; Liu, Brent
2014-03-01
Clinical trials usually need to collect, track, and analyze multimedia data according to a workflow. Currently, clinical trial data management requirements are normally addressed with custom-built systems. Challenges arise because workflow design differs between trials. A traditional, pre-defined custom-built system is usually limited to a specific clinical trial and normally requires time-consuming and resource-intensive software development. To address this, we present a user-customizable, imaging informatics-based intelligent workflow engine system for managing stroke rehabilitation clinical trials. The intelligent workflow engine provides flexibility in building and tailoring the workflow in the various stages of a clinical trial. By providing a way to tailor and automate the workflow, the system will save time and reduce errors in clinical trials. Although our system is designed for rehabilitation clinical trials, it may be extended to other imaging-based clinical trials as well.
NeuroManager: a workflow analysis based simulation management engine for computational neuroscience
Stockton, David B.; Santamaria, Fidel
2015-01-01
We developed NeuroManager, an object-oriented simulation management software engine for computational neuroscience. NeuroManager automates the workflow of simulation job submissions when using heterogeneous computational resources, simulators, and simulation tasks. The object-oriented approach (1) provides flexibility to adapt to a variety of neuroscience simulators, (2) simplifies the use of heterogeneous computational resources, from desktops to supercomputer clusters, and (3) improves tracking of simulator/simulation evolution. We implemented NeuroManager in MATLAB, a widely used engineering and scientific language, for its signal and image processing tools, prevalence in electrophysiology analysis, and increasing use in college biology education. To design and develop NeuroManager, we analyzed the workflow of simulation submission for a variety of simulators, operating systems, and computational resources, including the handling of input parameters, data, models, results, and analyses. This resulted in 22 stages of simulation submission workflow. The software incorporates progress notification, automatic organization, labeling, and time-stamping of data and results, and integrated access to MATLAB's analysis and visualization tools. NeuroManager provides users with the tools to automate daily tasks, and assists principal investigators in tracking and recreating the evolution of research projects performed by multiple people. Overall, NeuroManager provides the infrastructure needed to improve workflow, manage multiple simultaneous simulations, and maintain provenance of the potentially large amounts of data produced during the course of a research project. PMID:26528175
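The object-oriented idea in this abstract, one adapter per simulator and one per compute resource behind a common manager that labels and time-stamps every submission, can be sketched in Python. Everything here is hypothetical: NeuroManager itself is written in MATLAB, and `ToySimulator`, `DryRunMachine` and the `toysim` command line are illustrative stand-ins, not its API.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from datetime import datetime, timezone


class Simulator(ABC):
    """Adapter hiding one simulator's command-line conventions (hypothetical)."""
    name = "abstract"

    @abstractmethod
    def build_command(self, model: str, params: dict) -> list:
        """Translate a model and parameters into a command line."""


class Machine(ABC):
    """Adapter hiding one compute resource's submission mechanism (hypothetical)."""
    name = "abstract"

    @abstractmethod
    def submit(self, command: list) -> str:
        """Submit a command and return a job identifier."""


class ToySimulator(Simulator):
    name = "toysim"

    def build_command(self, model, params):
        # Sort parameters so identical inputs always give an identical command line.
        return ["toysim", model] + [f"{k}={v}" for k, v in sorted(params.items())]


class DryRunMachine(Machine):
    """Records commands instead of running them; stands in for a desktop or cluster."""
    name = "dry-run"

    def __init__(self):
        self.submitted = []

    def submit(self, command):
        self.submitted.append(command)
        return f"dry-{len(self.submitted)}"


@dataclass
class RunRecord:
    label: str
    simulator: str
    machine: str
    job_id: str
    params: dict
    submitted_at: str


@dataclass
class SimulationManager:
    history: list = field(default_factory=list)   # provenance of every submission

    def submit(self, sim: Simulator, machine: Machine, model: str, params: dict, label: str) -> RunRecord:
        job_id = machine.submit(sim.build_command(model, params))
        record = RunRecord(label, sim.name, machine.name, job_id, dict(params),
                           datetime.now(timezone.utc).isoformat(timespec="seconds"))
        self.history.append(record)
        return record
```

Supporting a new simulator or cluster then means adding one subclass, while the manager's labeling and provenance history stay unchanged.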
A PDA study management tool (SMT) utilizing wireless broadband and full DICOM viewing capability
NASA Astrophysics Data System (ADS)
Documet, Jorge; Liu, Brent; Zhou, Zheng; Huang, H. K.; Documet, Luis
2007-03-01
During the last 4 years, the IPI (Image Processing and Informatics) Laboratory has been developing a web-based Study Management Tool (SMT) application that allows radiologists, film librarians and PACS (Picture Archiving and Communication System) users to dynamically and remotely perform Query/Retrieve operations in a PACS network. Using a regular PDA (Personal Digital Assistant), users can remotely query a PACS archive and distribute any study to an existing DICOM (Digital Imaging and Communications in Medicine) node. This application, which has proven convenient for managing the study workflow [1, 2], has been extended to include DICOM viewing capability on the PDA. With this new feature, users can quickly preview DICOM images, gaining mobility and convenience at the same time. In addition, we are extending this application to metropolitan-area wireless broadband networks. This feature requires smartphones that can work as a PDA and have access to broadband wireless services. With the extension to wireless broadband technology and the preview of DICOM images, the Study Management Tool becomes an even more powerful tool for clinical workflow management.
Nexus: A modular workflow management system for quantum simulation codes
NASA Astrophysics Data System (ADS)
Krogel, Jaron T.
2016-01-01
The management of simulation workflows represents a significant task for the individual computational researcher. Automation of the required tasks involved in simulation work can decrease the overall time to solution and reduce sources of human error. A new simulation workflow management system, Nexus, is presented to address these issues. Nexus is capable of automated job management on workstations and resources at several major supercomputing centers. Its modular design allows many quantum simulation codes to be supported within the same framework. Current support includes quantum Monte Carlo calculations with QMCPACK, density functional theory calculations with Quantum Espresso or VASP, and quantum chemical calculations with GAMESS. Users can compose workflows through a transparent, text-based interface, resembling the input file of a typical simulation code. A usage example is provided to illustrate the process.
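A workflow of dependent simulations, such as a DFT calculation feeding several QMC runs, is essentially a dependency graph that a manager must execute in a valid order, submitting independent steps together. The sketch below uses Python's standard `graphlib` module (3.9+) to derive that order; the step names are hypothetical, and Nexus's own text-based input syntax is not reproduced here.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each step lists the steps whose outputs it needs.
workflow = {
    "scf": [],                        # ground-state DFT calculation
    "nscf_k1": ["scf"],               # two independent follow-on DFT runs
    "nscf_k2": ["scf"],
    "qmc_k1": ["nscf_k1"],            # QMC runs consuming the DFT orbitals
    "qmc_k2": ["nscf_k2"],
    "analysis": ["qmc_k1", "qmc_k2"],
}


def run_order(deps: dict) -> list:
    """Return one valid serial execution order; raises graphlib.CycleError on cycles."""
    return list(TopologicalSorter(deps).static_order())


def ready_batches(deps: dict) -> list:
    """Group steps into batches whose members could be submitted concurrently."""
    ts = TopologicalSorter(deps)
    ts.prepare()
    batches = []
    while ts.is_active():
        batch = sorted(ts.get_ready())   # every step whose dependencies are done
        batches.append(batch)
        ts.done(*batch)                  # pretend the whole batch finished
    return batches
```

Here `ready_batches(workflow)` yields `[["scf"], ["nscf_k1", "nscf_k2"], ["qmc_k1", "qmc_k2"], ["analysis"]]`; a real job manager would instead call `done()` as each queued job actually completes.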
Content and Workflow Management for Library Websites: Case Studies
ERIC Educational Resources Information Center
Yu, Holly, Ed.
2005-01-01
Using database-driven web pages or web content management (WCM) systems to manage increasingly diverse web content and to streamline workflows is a commonly practiced solution recognized in libraries today. However, limited library web content management models and funding constraints prevent many libraries from purchasing commercially available…
Pegasus Workflow Management System: Helping Applications From Earth and Space
NASA Astrophysics Data System (ADS)
Mehta, G.; Deelman, E.; Vahi, K.; Silva, F.
2010-12-01
Pegasus WMS is a Workflow Management System that can manage large-scale scientific workflows across Grid, local and Cloud resources simultaneously. Pegasus WMS provides a means for representing the workflow of an application in an abstract XML form, agnostic of the resources available to run it and the location of data and executables. It then compiles these workflows into concrete plans by querying catalogs and farming out computations across local and distributed computing resources, as well as emerging commercial and community cloud environments, in an easy and reliable manner. Pegasus WMS optimizes the execution as well as data movement by leveraging existing Grid and cloud technologies via a flexible pluggable interface, and provides advanced features like reusing existing data, automatic cleanup of generated data, and recursive workflows with deferred planning. It also captures all the provenance of the workflow, from the planning stage through execution to the generated data, helping scientists accurately measure the performance metrics of their workflows and address data reproducibility issues. Pegasus WMS was initially developed as part of the GriPhyN project to support large-scale high-energy physics and astrophysics experiments. Direct funding from the NSF enabled support for a wide variety of applications from diverse domains, including earthquake simulation, bacterial RNA studies, helioseismology and ocean modeling. Earthquake Simulation: Pegasus WMS was recently used in a large-scale production run in 2009 by the Southern California Earthquake Center (SCEC) to run 192 million loosely coupled tasks and about 2000 tightly coupled MPI-style tasks on national cyberinfrastructure for generating a probabilistic seismic hazard map of the Southern California region. SCEC ran 223 workflows over a period of eight weeks, using on average 4,420 cores, with a peak of 14,540 cores. A total of 192 million files were produced, totaling about 165 TB, of which 11 TB were saved. 
Astrophysics: The Laser Interferometer Gravitational-Wave Observatory (LIGO) uses Pegasus WMS to search for binary inspiral gravitational waves. A month of LIGO data requires many thousands of jobs, running for days on hundreds of CPUs on the LIGO Data Grid (LDG) and Open Science Grid (OSG). Ocean Temperature Forecast: Researchers at the Jet Propulsion Laboratory are exploring Pegasus WMS to run ocean forecast ensembles of the California coastal region. These models produce a number of daily forecasts for water temperature, salinity, and other measures. Helioseismology: The Solar Dynamics Observatory (SDO) is NASA's most important solar physics mission of the coming decade. Pegasus WMS is being used to analyze the data from SDO, which will be predominantly used to learn about solar magnetic activity and to probe the internal structure and dynamics of the Sun with helioseismology. Bacterial RNA studies: SIPHT is an application in bacterial genomics that predicts sRNA (small non-coding RNA)-encoding genes in bacteria. This project currently provides a web-based interface using Pegasus WMS at the backend to facilitate large-scale execution of the workflows on varied resources and provide better notifications of task/workflow completion.
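The step of compiling an abstract workflow into a concrete plan by querying catalogs, including the data-reuse feature, can be illustrated with a much-simplified planner. This is not Pegasus's code or API: the catalog layouts, site names, URLs and executable paths are hypothetical (only `mProject` and `mAdd` are real Montage module names), and real Pegasus also adds transfer, cleanup and registration jobs and prunes upstream jobs transitively.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AbstractJob:
    id: str
    transformation: str   # logical executable name, no path or site
    inputs: tuple         # logical file names
    outputs: tuple


def plan(jobs, transformation_catalog, replica_catalog, preferred_sites):
    """Map abstract jobs to concrete, site-specific jobs (simplified sketch).

    transformation_catalog: logical name -> {site: physical executable path}
    replica_catalog: logical file name -> physical URL of an existing copy
    """
    concrete = []
    for job in jobs:
        # Data reuse: skip a job if every output it would produce already exists.
        if all(f in replica_catalog for f in job.outputs):
            continue
        installed = transformation_catalog[job.transformation]
        site = next((s for s in preferred_sites if s in installed), None)
        if site is None:
            raise ValueError(f"no site provides {job.transformation}")
        # Stage in only inputs that already exist somewhere; the rest are
        # produced by upstream jobs in this same plan.
        stage_in = [(f, replica_catalog[f], site) for f in job.inputs if f in replica_catalog]
        concrete.append({"job": job.id, "site": site,
                         "executable": installed[site], "stage_in": stage_in})
    return concrete


# Hypothetical two-image mosaic in which one reprojected image already exists.
jobs = [
    AbstractJob("project1", "mProject", ("raw1.fits",), ("proj1.fits",)),
    AbstractJob("project2", "mProject", ("raw2.fits",), ("proj2.fits",)),
    AbstractJob("add", "mAdd", ("proj1.fits", "proj2.fits"), ("mosaic.fits",)),
]
tc = {"mProject": {"cloud": "/opt/montage/bin/mProject", "local": "/usr/local/bin/mProject"},
      "mAdd": {"local": "/usr/local/bin/mAdd"}}
rc = {"raw1.fits": "file:///data/raw1.fits", "raw2.fits": "file:///data/raw2.fits",
      "proj1.fits": "file:///data/proj1.fits"}
concrete_plan = plan(jobs, tc, rc, preferred_sites=["cloud", "local"])
```

In this example `project1` is dropped because its output is already registered, `project2` is placed on the preferred `cloud` site, and `add` runs `local` with only the existing `proj1.fits` staged in.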
Scientific Workflow Management in Proteomics
de Bruin, Jeroen S.; Deelder, André M.; Palmblad, Magnus
2012-01-01
Data processing in proteomics can be a challenging endeavor, requiring extensive knowledge of many different software packages, all with different algorithms, data format requirements, and user interfaces. In this article we describe the integration of a number of existing programs and tools in Taverna Workbench, a scientific workflow manager currently being developed in the bioinformatics community. We demonstrate how a workflow manager provides a single, visually clear and intuitive interface to complex data analysis tasks in proteomics, from raw mass spectrometry data to protein identifications and beyond. PMID:22411703
Workflow Automation: A Collective Case Study
ERIC Educational Resources Information Center
Harlan, Jennifer
2013-01-01
Knowledge management has proven to be a sustainable competitive advantage for many organizations. Knowledge management systems are abundant, with multiple functionalities. The literature reinforces the use of workflow automation with knowledge management systems to benefit organizations; however, it was not known if process automation yielded…
Observing System Simulation Experiment (OSSE) for the HyspIRI Spectrometer Mission
NASA Technical Reports Server (NTRS)
Turmon, Michael J.; Block, Gary L.; Green, Robert O.; Hua, Hook; Jacob, Joseph C.; Sobel, Harold R.; Springer, Paul L.; Zhang, Qingyuan
2010-01-01
The OSSE software provides an integrated end-to-end environment to simulate an Earth observing system by iteratively running a distributed modeling workflow based on the HyspIRI Mission, including atmospheric radiative transfer, surface albedo effects, detection, and retrieval for agile exploration of the mission design space. The software enables an Observing System Simulation Experiment (OSSE) and can be used for design trade space exploration of science return for proposed instruments by modeling the whole ground truth, sensing, and retrieval chain and to assess retrieval accuracy for a particular instrument and algorithm design. The OSSE infrastructure is extensible to future National Research Council (NRC) Decadal Survey concept missions where integrated modeling can improve the fidelity of coupled science and engineering analyses for systematic analysis and science return studies. This software has a distributed architecture that gives it a distinct advantage over other similar efforts. The workflow modeling components are typically legacy computer programs implemented in a variety of programming languages, including MATLAB, Excel, and FORTRAN. Integration of these diverse components is difficult and time-consuming. In order to hide this complexity, each modeling component is wrapped as a Web Service, and each component is able to pass analysis parameterizations, such as reflectance or radiance spectra, on to the next component downstream in the service workflow chain. In this way, the interface to each modeling component becomes uniform and the entire end-to-end workflow can be run using any existing or custom workflow processing engine. The architecture lets users extend workflows as new modeling components become available, chain together the components using any existing or custom workflow processing engine, and distribute them across any Internet-accessible Web Service endpoints.
The workflow components can be hosted on any Internet-accessible machine. This has the advantages that the computations can be distributed to make best use of the available computing resources, and each workflow component can be hosted and maintained by their respective domain experts.
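The uniform-interface idea described above can be illustrated in a single process. This is a minimal sketch under stated assumptions: the component names, the dictionary representation of a spectrum (wavelength in nm mapped to a value), and the flat 0.8 transmittance are hypothetical stand-ins, and plain Python functions replace the Web Service calls that the OSSE software actually uses.

```python
from typing import Callable, Dict, List

# Hypothetical representation of a spectrum: wavelength (nm) -> value.
Spectrum = Dict[float, float]
# Uniform interface: every component takes a spectrum and returns a spectrum.
Component = Callable[[Spectrum], Spectrum]


def atmosphere(reflectance: Spectrum) -> Spectrum:
    """Hypothetical stand-in for radiative transfer: flat 0.8 transmittance."""
    return {wl: r * 0.8 for wl, r in reflectance.items()}


def detector(radiance: Spectrum) -> Spectrum:
    """Hypothetical sensor model: quantize the signal to three decimals."""
    return {wl: round(v, 3) for wl, v in radiance.items()}


def run_chain(components: List[Component], ground_truth: Spectrum) -> Spectrum:
    """Pass each component's output on to the next component downstream."""
    data = ground_truth
    for component in components:
        data = component(data)
    return data


truth = {450.0: 0.10, 550.0: 0.20, 650.0: 0.30}
print(run_chain([atmosphere, detector], truth))
```

Because every component shares one signature, a different engine could drive the same components, and a retrieval step could be appended to the list without changing the existing ones.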
Data Provenance Hybridization Supporting Extreme-Scale Scientific Workflow Applications
DOE Office of Scientific and Technical Information (OSTI.GOV)
Elsethagen, Todd O.; Stephan, Eric G.; Raju, Bibi
As high performance computing (HPC) infrastructures continue to grow in capability and complexity, so do the applications that they serve. HPC and distributed-area computing (DAC) (e.g., grid and cloud) users are increasingly looking toward workflow solutions to orchestrate their complex application coupling and pre- and post-processing needs. To gain insight into, and a more quantitative understanding of, a workflow's performance, our method captures not only traditional provenance information but also system environment metrics, integrating the two to give context and explanation for a workflow's execution. In this paper, we describe IPPD's provenance management solution (ProvEn) and its hybrid data store combining both of these data provenance perspectives.
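ProvEn's internals are not described in this abstract, so the following is only a minimal sketch of the general idea of hybrid provenance, assuming a simple time-window join: each task's provenance record is linked to the system metrics sampled while it ran. The record fields, the metric name `cpu`, and the join rule are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TaskRecord:
    """Traditional provenance: what ran, when, and on which data."""
    task: str
    start: float
    end: float
    inputs: List[str]
    outputs: List[str]


@dataclass
class MetricSample:
    """System environment metrics sampled at one instant (names are hypothetical)."""
    time: float
    metrics: Dict[str, float]


def attach_metrics(tasks: List[TaskRecord],
                   samples: List[MetricSample]) -> Dict[str, Dict[str, float]]:
    """Average each metric over the samples taken inside each task's time window."""
    context: Dict[str, Dict[str, float]] = {}
    for t in tasks:
        window = [s.metrics for s in samples if t.start <= s.time <= t.end]
        totals: Dict[str, float] = {}
        for m in window:
            for name, value in m.items():
                totals[name] = totals.get(name, 0.0) + value
        # A task with no samples in its window gets an empty context.
        context[t.task] = {name: total / len(window) for name, total in totals.items()}
    return context
```

A real store would keep both views queryable rather than collapsing metrics to averages; the averaging here only shows how the two perspectives can be joined.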
DOE Office of Scientific and Technical Information (OSTI.GOV)
Duro, Francisco Rodrigo; Garcia Blas, Javier; Isaila, Florin
This paper explores novel techniques for improving the performance of many-task workflows based on the Swift scripting language. We propose new programmer options for automated distributed data placement and task scheduling. These options trigger a data placement mechanism that distributes intermediate workflow data over the servers of Hercules, a distributed key-value store that can be used to cache file system data. We demonstrate that these mechanisms can improve the aggregated throughput of many-task workflows by up to 86x, reduce contention on the shared file system, exploit data locality, and trade off locality against load balance.
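The locality versus load-balance trade-off mentioned above can be made concrete with a small placement policy. This is a hypothetical sketch, not the Hercules or Swift mechanism: the cap `max_per_server` and the least-loaded fallback are assumptions chosen for illustration.

```python
from collections import Counter
from typing import Dict, List


def place(outputs: Dict[str, str], servers: List[str],
          max_per_server: int) -> Dict[str, str]:
    """Assign each intermediate file to a storage server.

    outputs maps a file key to the server whose task produced it.
    Locality is preferred; once a server holds max_per_server files,
    further files go to the least-loaded server (ties broken by name).
    """
    load: Counter = Counter()
    placement: Dict[str, str] = {}
    for key, producer in outputs.items():
        target = producer
        if load[target] >= max_per_server:
            # Locality would overload this server, so balance instead.
            target = min(servers, key=lambda s: (load[s], s))
        placement[key] = target
        load[target] += 1
    return placement
```

Setting `max_per_server` very high yields pure locality; setting it to 1 spreads files as evenly as possible.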
Nexus: a modular workflow management system for quantum simulation codes
Krogel, Jaron T.
2015-08-24
The management of simulation workflows is a significant task for the individual computational researcher. Automating the tasks involved in simulation work can decrease the overall time to solution and reduce sources of human error. A new simulation workflow management system, Nexus, is presented to address these issues. Nexus is capable of automated job management on workstations and resources at several major supercomputing centers. Its modular design allows many quantum simulation codes to be supported within the same framework. Current support includes quantum Monte Carlo calculations with QMCPACK, density functional theory calculations with Quantum Espresso or VASP, and quantum chemical calculations with GAMESS. Users can compose workflows through a transparent, text-based interface resembling the input file of a typical simulation code. A usage example is provided to illustrate the process.
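Automated job management of this kind rests on running each simulation step only after its prerequisites finish. The sketch below uses Python's standard `graphlib` module (Python 3.9+) to group steps into waves that could be submitted together. The step names (`scf`, `nscf`, `phonon`, `qmc`) are hypothetical, and this is not Nexus's own interface.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def schedule(deps: Dict[str, Set[str]]) -> List[Set[str]]:
    """Group workflow steps into waves whose prerequisites are all complete.

    deps maps each step to the set of steps it depends on.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the dependencies form a cycle
    waves: List[Set[str]] = []
    while sorter.is_active():
        ready = set(sorter.get_ready())
        waves.append(ready)
        # A real job manager would submit these jobs and wait for them;
        # here they are simply marked as finished.
        sorter.done(*ready)
    return waves
```

For example, a self-consistent run feeding two independent follow-up calculations yields three waves, with the two follow-ups eligible to run concurrently.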
Brown, David K.; Penkler, David L.; Musyoka, Thommas M.; Bishop, Özlem Tastan
2015-01-01
Complex computational pipelines are becoming a staple of modern scientific research. Often these pipelines are resource intensive and require days of computing time. In such cases, it makes sense to run them over high performance computing (HPC) clusters where they can take advantage of the aggregated resources of many powerful computers. In addition to this, researchers often want to integrate their workflows into their own web servers. In these cases, software is needed to manage the submission of jobs from the web interface to the cluster and then return the results once the job has finished executing. We have developed the Job Management System (JMS), a workflow management system and web interface for high performance computing (HPC). JMS provides users with a user-friendly web interface for creating complex workflows with multiple stages. It integrates this workflow functionality with the resource manager, a tool that is used to control and manage batch jobs on HPC clusters. As such, JMS combines workflow management functionality with cluster administration functionality. In addition, JMS provides developer tools including a code editor and the ability to version tools and scripts. JMS can be used by researchers from any field to build and run complex computational pipelines and provides functionality to include these pipelines in external interfaces. JMS is currently being used to house a number of bioinformatics pipelines at the Research Unit in Bioinformatics (RUBi) at Rhodes University. JMS is an open-source project and is freely available at https://github.com/RUBi-ZA/JMS. PMID:26280450
A Tool Supporting Collaborative Data Analytics Workflow Design and Management
NASA Astrophysics Data System (ADS)
Zhang, J.; Bao, Q.; Lee, T. J.
2016-12-01
Collaborative experiment design could significantly enhance the sharing and adoption of the data analytics algorithms and models emerging in Earth science. Existing data-oriented workflow tools, however, are not suited to the collaborative design of such workflows: for example, they do not support real-time co-design, do not track how a workflow evolves over time as multiple Earth scientists contribute changing designs, and do not capture and retrieve the collaboration knowledge behind a workflow design (the discussions that lead to a design). To address these challenges, we have designed and developed a technique supporting collaborative data-oriented workflow composition and management, as a key component toward supporting big data collaboration through the Internet. Reproducibility and scalability are two major targets demanding fundamental infrastructural support. One outcome of the project is a software tool that supports an elastic number of groups of Earth scientists in collaboratively designing and composing data analytics workflows through the Internet. Instead of reinventing the wheel, we have extended an existing workflow tool, VisTrails, into an online collaborative environment as a proof of concept.
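Tracking how a workflow evolves under edits from several contributors can be modeled as a tree of changes, which is the general idea behind VisTrails' version tree. The sketch below is a simplified, hypothetical model (one add or remove action per version, modules identified only by name, invented module names); it is not VisTrails' actual data model or the authors' extension.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class Change:
    parent: Optional[int]  # None for a root version
    author: str
    op: str                # "add" or "remove"
    module: str


class VersionTree:
    def __init__(self) -> None:
        self.changes: Dict[int, Change] = {}
        self._next_id = 1

    def commit(self, parent: Optional[int], author: str, op: str, module: str) -> int:
        """Record one change on top of `parent`; branching is just reusing a parent."""
        if op not in ("add", "remove"):
            raise ValueError(f"unknown operation: {op}")
        if parent is not None and parent not in self.changes:
            raise KeyError(parent)
        vid = self._next_id
        self._next_id += 1
        self.changes[vid] = Change(parent, author, op, module)
        return vid

    def history(self, vid: int) -> List[int]:
        """Version ids from the root down to `vid`."""
        path: List[int] = []
        current: Optional[int] = vid
        while current is not None:
            path.append(current)
            current = self.changes[current].parent
        return path[::-1]

    def materialize(self, vid: int) -> Set[str]:
        """Rebuild the workflow at `vid` by replaying its changes in order."""
        modules: Set[str] = set()
        for v in self.history(vid):
            change = self.changes[v]
            if change.op == "add":
                modules.add(change.module)
            else:
                modules.discard(change.module)
        return modules
```

Keeping every change with its author means any design, on any branch, can be rebuilt and attributed to the scientists who contributed to it.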
Lee, Howard; Chapiro, Julius; Schernthaner, Rüdiger; Duran, Rafael; Wang, Zhijun; Gorodetski, Boris; Geschwind, Jean-François; Lin, MingDe
2015-04-01
The objective of this study was to demonstrate that an intra-arterial liver therapy clinical research database system is a more workflow-efficient and robust tool for clinical research than a spreadsheet storage system. The database system could be used to generate clinical research study populations easily with custom search and retrieval criteria. A questionnaire was designed and distributed to 21 board-certified radiologists to assess current data storage problems and clinician reception to a database management system. Based on the questionnaire findings, a customized database and user interface system were created to perform automatic calculations of clinical scores, including staging systems such as the Child-Pugh and Barcelona Clinic Liver Cancer, and to facilitate data input and output. Questionnaire participants were favorable to a database system. The interface retrieved study-relevant data accurately and effectively. The database effectively produced easy-to-read study-specific patient populations with custom-defined inclusion/exclusion criteria. The database management system is workflow-efficient and robust in retrieving, storing, and analyzing data. Copyright © 2015 AUR. Published by Elsevier Inc. All rights reserved.
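As an illustration of the kind of automatic score calculation described above, the sketch below computes the Child-Pugh score from commonly used thresholds (bilirubin in mg/dL, albumin in g/dL, INR) and assigns class A (5-6), B (7-9) or C (10-15). It is not the study's implementation. The grade labels "none", "mild" and "severe" are a simplification chosen here, and the thresholds should be checked against the reference a given database actually uses.

```python
from typing import Tuple

# Simplified grades for the two clinical criteria (labels are an assumption):
# "none" = 1 point, "mild" (slight ascites / grade I-II encephalopathy) = 2,
# "severe" (moderate-severe ascites / grade III-IV encephalopathy) = 3.
GRADE_POINTS = {"none": 1, "mild": 2, "severe": 3}


def bilirubin_points(mg_dl: float) -> int:
    return 1 if mg_dl < 2.0 else 2 if mg_dl <= 3.0 else 3


def albumin_points(g_dl: float) -> int:
    return 1 if g_dl > 3.5 else 2 if g_dl >= 2.8 else 3


def inr_points(inr: float) -> int:
    return 1 if inr < 1.7 else 2 if inr <= 2.3 else 3


def child_pugh(bilirubin: float, albumin: float, inr: float,
               ascites: str, encephalopathy: str) -> Tuple[int, str]:
    """Return the Child-Pugh score (5-15) and class (A, B or C)."""
    score = (bilirubin_points(bilirubin) + albumin_points(albumin)
             + inr_points(inr) + GRADE_POINTS[ascites]
             + GRADE_POINTS[encephalopathy])
    grade = "A" if score <= 6 else "B" if score <= 9 else "C"
    return score, grade
```

Computing the score from stored fields, rather than entering it by hand, is what removes the transcription step a spreadsheet workflow would require.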
Innovations in clinical trials informatics.
Summers, Ron; Vyas, Hiten; Dudhal, Nilesh; Doherty, Neil F; Coombs, Crispin R; Hepworth, Mark
2008-01-01
This paper will investigate innovations in information management for use in clinical trials. The application typifies a complex, adaptive, distributed and information-rich environment for which continuous innovation is necessary. Organisational innovation is highlighted as well as the technical innovations in workflow processes and their representation as an integrated set of web services. Benefits realization uncovers further innovations in the business strand of the work undertaken. Following the description of the development of this information management system, the semantic web is postulated as a possible solution to tame the complexity related to information management issues found within clinical trials support systems.
Purdue Ionomics Information Management System. An Integrated Functional Genomics Platform
Baxter, Ivan; Ouzzani, Mourad; Orcun, Seza; Kennedy, Brad; Jandhyala, Shrinivas S.; Salt, David E.
2007-01-01
The advent of high-throughput phenotyping technologies has created a deluge of information that is difficult to deal with without the appropriate data management tools. These data management tools should integrate defined workflow controls for genomic-scale data acquisition and validation, data storage and retrieval, and data analysis, indexed around the genomic information of the organism of interest. To maximize the impact of these large datasets, it is critical that they are rapidly disseminated to the broader research community, allowing open access for data mining and discovery. We describe here a system that incorporates such functionalities developed around the Purdue University high-throughput ionomics phenotyping platform. The Purdue Ionomics Information Management System (PiiMS) provides integrated workflow control, data storage, and analysis to facilitate high-throughput data acquisition, along with integrated tools for data search, retrieval, and visualization for hypothesis development. PiiMS is deployed as a World Wide Web-enabled system, allowing for integration of distributed workflow processes and open access to raw data for analysis by numerous laboratories. PiiMS currently contains data on shoot concentrations of P, Ca, K, Mg, Cu, Fe, Zn, Mn, Co, Ni, B, Se, Mo, Na, As, and Cd in over 60,000 shoot tissue samples of Arabidopsis (Arabidopsis thaliana), including ethyl methanesulfonate, fast-neutron and defined T-DNA mutants, and natural accessions and populations of recombinant inbred lines from over 800 separate experiments, representing over 1,000,000 fully quantitative elemental concentrations. PiiMS is accessible at www.purdue.edu/dp/ionomics. PMID:17189337
DOE Office of Scientific and Technical Information (OSTI.GOV)
Not Available
This report contains papers on the following topics: NREN Security Issues: Policies and Technologies; Layer Wars: Protect the Internet with Network Layer Security; Electronic Commission Management; Workflow 2000 - Electronic Document Authorization in Practice; Security Issues of a UNIX PEM Implementation; Implementing Privacy Enhanced Mail on VMS; Distributed Public Key Certificate Management; Protecting the Integrity of Privacy-enhanced Electronic Mail; Practical Authorization in Large Heterogeneous Distributed Systems; Security Issues in the Truffles File System; Issues surrounding the use of Cryptographic Algorithms and Smart Card Applications; Smart Card Augmentation of Kerberos; and An Overview of the Advanced Smart Card Access Control System. Selected papers were processed separately for inclusion in the Energy Science and Technology Database.
Distributed Trust Management for Validating SLA Choreographies
NASA Astrophysics Data System (ADS)
Haq, Irfan Ul; Alnemr, Rehab; Paschke, Adrian; Schikuta, Erich; Boley, Harold; Meinel, Christoph
For business workflow automation in a service-enriched environment such as a grid or a cloud, services scattered across heterogeneous Virtual Organizations (VOs) can be aggregated in a producer-consumer manner, building hierarchical structures of added value. In order to preserve the supply chain, the Service Level Agreements (SLAs) corresponding to the underlying choreography of services should also be incrementally aggregated. This cross-VO hierarchical SLA aggregation requires validation, for which a distributed trust system becomes a prerequisite. Elaborating on our previous work on rule-based SLA validation, we propose a hybrid distributed trust model. This new model is based on Public Key Infrastructure (PKI) and reputation-based trust systems. It helps prevent SLA violations by identifying violation-prone services at the service selection stage and actively contributes to breach management at the time of penalty enforcement.
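A minimal sketch of how PKI and reputation might jointly gate service selection follows. It assumes the certificate check has already been performed elsewhere (represented here by a boolean), and it substitutes the well-known beta-reputation estimate (r + 1) / (r + s + 2) for the authors' own reputation scheme. The threshold of 0.6 is arbitrary.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Service:
    name: str
    cert_valid: bool   # result of a PKI certificate check, assumed done elsewhere
    sla_met: int       # past SLAs fulfilled
    sla_violated: int  # past SLAs breached


def reputation(s: Service) -> float:
    """Beta-reputation expected value; 0.5 for a service with no history."""
    return (s.sla_met + 1) / (s.sla_met + s.sla_violated + 2)


def select(candidates: List[Service], threshold: float = 0.6) -> Optional[Service]:
    """Pick the most reputable service among those with a valid certificate,
    skipping violation-prone ones whose reputation falls below `threshold`."""
    eligible = [s for s in candidates
                if s.cert_valid and reputation(s) >= threshold]
    return max(eligible, key=reputation, default=None)
```

The PKI check answers "is this really the provider it claims to be?", while the reputation score answers "has it honored its SLAs?"; a service must pass both to be selected.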
Globus | Informatics Technology for Cancer Research (ITCR)
Globus software services provide secure cancer research data transfer, synchronization, and sharing in distributed environments at large scale. These services can be integrated into applications and research data gateways, leveraging Globus identity management, single sign-on, search, and authorization capabilities. Globus Genomics integrates Globus with the Galaxy genomics workflow engine and Amazon Web Services to enable cancer genomics analysis that can elastically scale compute resources with demand.
Lessons from implementing a combined workflow-informatics system for diabetes management.
Zai, Adrian H; Grant, Richard W; Estey, Greg; Lester, William T; Andrews, Carl T; Yee, Ronnie; Mort, Elizabeth; Chueh, Henry C
2008-01-01
Shortcomings surrounding the care of patients with diabetes have been attributed largely to a fragmented, disorganized, and duplicative health care system that focuses more on acute conditions and complications than on managing chronic disease. To address these shortcomings, we developed a diabetes registry population management application to change the way our staff manages patients with diabetes. Use of this new application has helped us coordinate the responsibilities for intervening and monitoring patients in the registry among different users. Our experiences using this combined workflow-informatics intervention system suggest that integrating a chronic disease registry into clinical workflow for the treatment of chronic conditions creates a useful and efficient tool for managing disease.
NASA Astrophysics Data System (ADS)
Leibovici, D. G.; Pourabdollah, A.; Jackson, M.
2011-12-01
Experts and decision-makers use or develop models to monitor global and local changes of the environment. Their activities require the combination of data and processing services in a flow of operations and spatial data computations: a geospatial scientific workflow. The seamless ability to generate, re-use and modify a geospatial scientific workflow is an important requirement, but the quality of the outcomes is equally important [1]. Metadata attached to the data and processes, and particularly their quality, is essential to assess the reliability of the scientific model that a workflow represents [2]. Modellers therefore need tools that manage the qualitative and quantitative metadata measures of quality associated with a workflow. To ensure interoperability, ISO and OGC standards [3] should be adopted, allowing, for example, metadata profiles to be defined and retrieved via web service interfaces. However, these standards need a few extensions when applied to workflows, particularly for geoprocess metadata. We propose to fill this gap (i) by providing a metadata profile for the quality of processes, and (ii) by providing a framework, based on XPDL [4], to manage the quality information. Web Processing Services are used to implement a range of metadata analyses on the workflow in order to evaluate and present quality information at different levels of the workflow. This generates quality metadata, which is stored in the XPDL file. The focus is (a) on visual representations of quality, summarizing the quality information retrieved either from the standardized metadata profiles of the components or from non-standard sources (e.g., Web 2.0 information), and (b) on the estimated qualities of the outputs derived from meta-propagation of uncertainties (a principle that we have introduced [5]).
The meta-propagated qualities, obtained without running the workflow [6], then provide an a priori validation of the decisions that the workflow's outputs will support once it is run, while the visualization on the workflow graph itself points out where better data or better processes are needed to improve the workflow.
[1] Leibovici, D. G., Hobona, G., Stock, K., Jackson, M. (2009) Qualifying geospatial workflow models for adaptive controlled validity and accuracy. In: IEEE 17th GeoInformatics, 1-5.
[2] Leibovici, D. G., Pourabdollah, A. (2010a) Workflow Uncertainty using a Metamodel Framework and Metadata for Data and Processes. OGC TC/PC Meetings, September 2010, Toulouse, France.
[3] OGC (2011) www.opengeospatial.org
[4] XPDL (2008) Workflow Process Definition Interface - XML Process Definition Language. Workflow Management Coalition, Document WfMC-TC-1025, 2008.
[5] Leibovici, D. G., Pourabdollah, A., Jackson, M. (2011) Meta-propagation of Uncertainties for Scientific Workflow Management in Interoperable Spatial Data Infrastructures. In: Proceedings of the European Geosciences Union (EGU2011), April 2011, Austria.
[6] Pourabdollah, A., Leibovici, D. G., Jackson, M. (2011) MetaPunT: an Open Source tool for Meta-Propagation of uncerTainties in Geospatial Processing. In: Proceedings of OSGIS2011, June 2011, Nottingham, UK.
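Meta-propagation estimates output quality from metadata alone. The sketch below conveys that idea under a deliberately simple assumption that is not the authors' metamodel: every dataset and process carries a relative uncertainty, errors are independent and multiplicative so they combine in quadrature, and values are pushed along the workflow graph in topological order without executing any process. Node names and numbers are hypothetical.

```python
import math
from graphlib import TopologicalSorter
from typing import Dict, Set


def propagate(inputs: Dict[str, float], processes: Dict[str, float],
              deps: Dict[str, Set[str]]) -> Dict[str, float]:
    """Estimate the relative uncertainty of every node without running any process.

    inputs:    relative uncertainty of each source dataset (from its metadata)
    processes: intrinsic relative uncertainty that each process adds
    deps:      node -> set of upstream nodes
    """
    quality = dict(inputs)
    for node in TopologicalSorter(deps).static_order():
        if node in quality:
            continue  # source dataset: value comes from its metadata
        upstream = [quality[u] for u in deps.get(node, set())]
        # First-order rule for independent multiplicative errors: add in quadrature.
        quality[node] = math.sqrt(sum(e * e for e in upstream) + processes[node] ** 2)
    return quality
```

Nodes with large estimated uncertainty flag where better input data or a better process would most improve the final outputs.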
NASA Astrophysics Data System (ADS)
Kumlander, Deniss
The globalization of company operations and the competition between software vendors demand higher quality of delivered software and lower overall cost. The same forces also introduce many problems into the software development process, since they produce distributed organizations that break the co-location rule of modern software development methodologies. Here we propose a reformulation of the ambassador position that increases its productivity, in order to bridge the communication and workflow gap by managing the entire communication process rather than concentrating purely on the communication result.
Workflow computing. Improving management and efficiency of pathology diagnostic services.
Buffone, G J; Moreau, D; Beck, J R
1996-04-01
Traditionally, information technology in health care has helped practitioners to collect, store, and present information and also to add a degree of automation to simple tasks (instrument interfaces supporting result entry, for example). Thus, commercially available information systems do little to support the need to model, execute, monitor, coordinate, and revise the various complex clinical processes required to support health-care delivery. Workflow computing, which is already implemented and improving the efficiency of operations in several nonmedical industries, can address the need to manage complex clinical processes. Workflow computing not only provides a means to define and manage the events, roles, and information integral to health-care delivery but also supports the explicit implementation of policy or rules appropriate to the process. This article explains how workflow computing may be applied to health care and the inherent advantages of the technology, and it defines workflow system requirements for use in health-care delivery with special reference to diagnostic pathology.
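The combination of events, roles and explicit policy described above can be pictured as a small state machine. In this hypothetical sketch, the case states, events and role names for a pathology specimen are invented for illustration; each transition is allowed only for the listed roles, and every accepted event is logged.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Tuple

# (current state, event) -> (next state, roles allowed to trigger it); all hypothetical.
TRANSITIONS: Dict[Tuple[str, str], Tuple[str, FrozenSet[str]]] = {
    ("received", "gross"): ("grossed", frozenset({"pathologists_assistant", "pathologist"})),
    ("grossed", "process"): ("slides_ready", frozenset({"histotechnologist"})),
    ("slides_ready", "sign_out"): ("signed_out", frozenset({"pathologist"})),
}


@dataclass
class Case:
    case_id: str
    state: str = "received"
    log: List[Tuple[str, str, str]] = field(default_factory=list)  # (event, role, new state)

    def handle(self, event: str, role: str) -> None:
        """Apply `event` if it is valid in the current state and permitted for `role`."""
        key = (self.state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"event {event!r} not allowed in state {self.state!r}")
        next_state, roles = TRANSITIONS[key]
        if role not in roles:
            raise PermissionError(f"role {role!r} may not trigger {event!r}")
        self.state = next_state
        self.log.append((event, role, next_state))
```

Keeping the policy in a table rather than in scattered code means the rules can be reviewed and revised as the clinical process changes.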
Digital Library Storage using iRODS Data Grids
NASA Astrophysics Data System (ADS)
Hedges, Mark; Blanke, Tobias; Hasan, Adil
Digital repository software provides a powerful and flexible infrastructure for managing and delivering complex digital resources and metadata. However, issues can arise in managing the very large, distributed data files that may constitute these resources. This paper describes an implementation approach that combines the Fedora digital repository software with a storage layer implemented as a data grid, using the iRODS middleware developed by DICE (Data Intensive Cyber Environments) as the successor to SRB. This approach allows us to use Fedora's flexible architecture to manage the structure of resources and to provide application-layer services to users. The grid-based storage layer provides efficient support for managing and processing the underlying distributed data objects, which may be very large (e.g., audio-visual material). The Rule Engine built into iRODS is used to integrate complex workflows at the data level that need not be visible to users (e.g., digital preservation functionality).
UBioLab: a web-LABoratory for Ubiquitous in-silico experiments.
Bartocci, E; Di Berardini, M R; Merelli, E; Vito, L
2012-03-01
The huge and dynamic amount of bioinformatics resources (e.g., data and tools) available on the Internet today represents a big challenge for biologists, as regards their management and visualization, and for bioinformaticians, as regards the possibility of rapidly creating and executing in-silico experiments involving resources and activities spread over the WWW hyperspace. Any framework aiming at integrating such resources as in a physical laboratory must tackle, and ideally handle in a transparent and uniform way, aspects concerning physical distribution, semantic heterogeneity, and the co-existence of different computational paradigms and, as a consequence, of different invocation interfaces (i.e., OGSA for Grid nodes, SOAP for Web Services, Java RMI for Java objects, etc.). The UBioLab framework has just been designed and developed as a prototype with this objective. Several architectural features, such as being fully Web-based and combining domain ontologies, Semantic Web and workflow techniques, give evidence of an effort in this direction. The integration of a semantic knowledge management system for distributed (bioinformatics) resources, a semantic-driven graphic environment for defining and monitoring ubiquitous workflows, and an intelligent agent-based technology for their distributed execution allows UBioLab to be a semantic guide for bioinformaticians and biologists, providing (i) a flexible environment for visualizing, organizing and inferring any (semantic and computational) "type" of domain knowledge (e.g., resources and activities, expressed in a declarative form), (ii) a powerful engine for defining and storing semantic-driven ubiquitous in-silico experiments on the domain hyperspace, and (iii) a transparent, automatic and distributed environment for correct experiment executions.
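Handling different invocation interfaces "in a transparent and uniform way" is commonly done with the adapter pattern. The sketch below illustrates that pattern only; it is not UBioLab's design. The two invoker classes are in-process stand-ins for a message-style service (such as SOAP) and a remote-object stub (such as Java RMI), and all names are hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable, Dict, List, Tuple


class Invoker(ABC):
    """Uniform interface hiding how a resource is actually called."""

    @abstractmethod
    def invoke(self, operation: str, payload: Dict[str, Any]) -> Any:
        ...


class EnvelopeInvoker(Invoker):
    """Stand-in for a message-style service: one envelope per call."""

    def __init__(self, endpoint: Callable[[Dict[str, Any]], Any]) -> None:
        self.endpoint = endpoint

    def invoke(self, operation: str, payload: Dict[str, Any]) -> Any:
        return self.endpoint({"operation": operation, "body": payload})


class ObjectInvoker(Invoker):
    """Stand-in for a remote-object stub: operations are methods on the stub."""

    def __init__(self, stub: Any) -> None:
        self.stub = stub

    def invoke(self, operation: str, payload: Dict[str, Any]) -> Any:
        return getattr(self.stub, operation)(**payload)


def run_workflow(steps: List[Tuple[Invoker, str]], data: Any) -> Any:
    """Feed each step's result to the next, whatever its invocation style."""
    for invoker, operation in steps:
        data = invoker.invoke(operation, {"data": data})
    return data
```

The workflow engine only ever calls `invoke`, so adding a new kind of resource means writing one more adapter rather than changing the engine.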
Visualization and Analysis for Near-Real-Time Decision Making in Distributed Workflows
Pugmire, David; Kress, James; Choi, Jong; ...
2016-08-04
Data-driven science is becoming increasingly common and complex, and is placing tremendous stresses on visualization and analysis frameworks. Data sources producing 10 GB per second (and more) are becoming increasingly commonplace in simulation, sensor, and experimental sciences. These data sources, which are often distributed around the world, must be analyzed by teams of scientists who are also distributed. Enabling scientists to view, query and interact with such large volumes of data in near-real-time requires a rich fusion of visualization and analysis techniques, middleware, and workflow systems. This paper discusses initial research into visualization and analysis of distributed data workflows that enables scientists to make near-real-time decisions about large volumes of time-varying data.
Neuroimaging Study Designs, Computational Analyses and Data Provenance Using the LONI Pipeline
Dinov, Ivo; Lozev, Kamen; Petrosyan, Petros; Liu, Zhizhong; Eggert, Paul; Pierce, Jonathan; Zamanyan, Alen; Chakrapani, Shruthi; Van Horn, John; Parker, D. Stott; Magsipoc, Rico; Leung, Kelvin; Gutman, Boris; Woods, Roger; Toga, Arthur
2010-01-01
Modern computational neuroscience employs diverse software tools and multidisciplinary expertise to analyze heterogeneous brain data. The classical problems of gathering meaningful data, fitting specific models, and discovering appropriate analysis and visualization tools give way to a new class of computational challenges—management of large and incongruous data, integration and interoperability of computational resources, and data provenance. We designed, implemented and validated a new paradigm for addressing these challenges in the neuroimaging field. Our solution is based on the LONI Pipeline environment [3], [4], a graphical workflow environment for constructing and executing complex data processing protocols. We developed study-design, database and visual language programming functionalities within the LONI Pipeline that enable the construction of complete, elaborate and robust graphical workflows for analyzing neuroimaging and other data. These workflows facilitate open sharing and communication of data and metadata, concrete processing protocols, result validation, and study replication among different investigators and research groups. The LONI Pipeline features include distributed grid-enabled infrastructure, virtualized execution environment, efficient integration, data provenance, validation and distribution of new computational tools, automated data format conversion, and an intuitive graphical user interface. We demonstrate the new LONI Pipeline features using large scale neuroimaging studies based on data from the International Consortium for Brain Mapping [5] and the Alzheimer's Disease Neuroimaging Initiative [6]. User guides, forums, instructions and downloads of the LONI Pipeline environment are available at http://pipeline.loni.ucla.edu. PMID:20927408
wft4galaxy: a workflow testing tool for galaxy.
Piras, Marco Enrico; Pireddu, Luca; Zanetti, Gianluigi
2017-12-01
Workflow managers for scientific analysis provide a high-level programming platform facilitating standardization, automation, collaboration and access to sophisticated computing resources. The Galaxy workflow manager provides a prime example of this type of platform. As compositions of simpler tools, workflows effectively comprise specialized computer programs implementing often very complex analysis procedures. To date, no simple way to automatically test Galaxy workflows and ensure their correctness has appeared in the literature. With wft4galaxy we offer a tool to bring automated testing to Galaxy workflows, making it feasible to bring continuous integration to their development and ensuring that defects are detected promptly. wft4galaxy can be easily installed as a regular Python program or launched directly as a Docker container, the latter reducing installation effort to a minimum. Available at https://github.com/phnmnl/wft4galaxy under the Academic Free License v3.0. Contact: marcoenrico.piras@crs4.it. © The Author 2017. Published by Oxford University Press.
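Automated workflow testing of this kind boils down to running a workflow on fixed inputs and comparing its outputs with expected results. The sketch below shows only that general pattern using SHA-256 checksums. It does not use wft4galaxy's API or Galaxy itself, and the callable `run` is a hypothetical stand-in for a real workflow invocation.

```python
import hashlib
from typing import Callable, Dict, List

Files = Dict[str, bytes]  # file name -> content


def sha256(content: bytes) -> str:
    return hashlib.sha256(content).hexdigest()


def check_workflow(run: Callable[[Files], Files], inputs: Files,
                   expected: Dict[str, str]) -> List[str]:
    """Run the workflow and return the names of outputs that are missing or
    whose SHA-256 checksum differs from the expected one (empty list = pass)."""
    actual = run(inputs)
    failures = [name for name, checksum in expected.items()
                if name not in actual or sha256(actual[name]) != checksum]
    return sorted(failures)
```

Storing only checksums of the expected outputs keeps test fixtures small, which suits a continuous-integration job that reruns the check on every change.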
Li, Zhenlong; Yang, Chaowei; Jin, Baoxuan; Yu, Manzhu; Liu, Kai; Sun, Min; Zhan, Matthew
2015-01-01
Geoscience observations and model simulations are generating vast amounts of multi-dimensional data. Effectively analyzing these data is essential for geoscience studies. However, the tasks are challenging for geoscientists because processing the massive amount of data is both computing- and data-intensive, and data analytics requires complex procedures and multiple tools. To tackle these challenges, a scientific workflow framework is proposed for big geoscience data analytics. The framework leverages cloud computing, MapReduce, and Service Oriented Architecture (SOA). Specifically, HBase is adopted for storing and managing big geoscience data across distributed computers, a MapReduce-based algorithm framework is developed to support parallel processing of geoscience data, and a service-oriented workflow architecture is built to support on-demand complex data analytics in the cloud environment. A proof-of-concept prototype tests the performance of the framework. Results show that this innovative framework significantly improves the efficiency of big geoscience data analytics by reducing data processing time and simplifying data analytical procedures for geoscientists. PMID:25742012
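The MapReduce pattern used in the framework can be sketched in a single process. This is a hypothetical illustration only: records are assumed to be (latitude, longitude, temperature) tuples, the map step keys each record by its 1-degree grid cell, and the reduce step averages each cell. HBase storage and the distributed runtime are not modeled.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Record = Tuple[float, float, float]  # (lat, lon, temperature): hypothetical schema
Cell = Tuple[int, int]


def map_phase(record: Record) -> Tuple[Cell, float]:
    """Emit (grid cell, value); the cell is the 1-degree box containing the point."""
    lat, lon, value = record
    return (math.floor(lat), math.floor(lon)), value


def shuffle(pairs: Iterable[Tuple[Cell, float]]) -> Dict[Cell, List[float]]:
    """Group values by key, as the runtime would do between map and reduce."""
    groups: Dict[Cell, List[float]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(values: List[float]) -> float:
    """Average all values that fell into one cell."""
    return sum(values) / len(values)


def mapreduce(records: Iterable[Record]) -> Dict[Cell, float]:
    groups = shuffle(map(map_phase, records))
    return {cell: reduce_phase(vals) for cell, vals in groups.items()}
```

Because `map_phase` looks at one record at a time and `reduce_phase` at one cell at a time, both can run in parallel across machines; that independence is what a MapReduce runtime exploits.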
NASA Technical Reports Server (NTRS)
1997-01-01
CENTRA 2000 Inc., a wholly owned subsidiary of Auto-trol Technology, obtained permission to use software originally developed at Johnson Space Center for the Space Shuttle and early Space Station projects. To support the enormous information-handling needs of those projects, a product data management, electronic document management and workflow system was designed. The original software comprised just 33 database tables and was later expanded to about 100 tables. This system, now called CENTRA 2000, is designed for quick implementation and supports the engineering process from preliminary design through release-to-production. CENTRA 2000 can also handle audit histories and provides a means to ensure that new information is distributed. The product has 30 production sites worldwide.
The iPlant Collaborative: Cyberinfrastructure for Enabling Data to Discovery for the Life Sciences.
Merchant, Nirav; Lyons, Eric; Goff, Stephen; Vaughn, Matthew; Ware, Doreen; Micklos, David; Antin, Parker
2016-01-01
The iPlant Collaborative provides life science research communities access to comprehensive, scalable, and cohesive computational infrastructure for data management; identity management; collaboration tools; and cloud, high-performance, high-throughput computing. iPlant provides training, learning material, and best practice resources to help all researchers make the best use of their data, expand their computational skill set, and effectively manage their data and computation when working as distributed teams. iPlant's platform permits researchers to easily deposit and share their data and deploy new computational tools and analysis workflows, allowing the broader community to easily use and reuse those data and computational analyses.
MyGeoHub: A Collaborative Geospatial Research and Education Platform
NASA Astrophysics Data System (ADS)
Kalyanam, R.; Zhao, L.; Biehl, L. L.; Song, C. X.; Merwade, V.; Villoria, N.
2017-12-01
Scientific research is increasingly collaborative and globally distributed; research groups now rely on web-based scientific tools and data management systems to simplify their day-to-day collaborative workflows. However, such tools often lack seamless interfaces, requiring researchers to contend with manual data transfers, annotation and sharing. MyGeoHub is a web platform that supports out-of-the-box, seamless workflows involving data ingestion, metadata extraction, analysis, sharing and publication. MyGeoHub is built on the HUBzero cyberinfrastructure platform and adds general-purpose software building blocks (GABBs) for geospatial data management, visualization and analysis. A data management building block, iData, processes geospatial files, extracting metadata for keyword and map-based search while enabling quick previews. iData is pervasive, allowing access through a web interface, scientific tools on MyGeoHub, or even mobile field devices via a data service API. GABBs includes a Python map library as well as map widgets that, in a few lines of code, generate complete geospatial visualization web interfaces for scientific tools. GABBs also includes powerful tools that can be used with no programming effort. The GeoBuilder tool provides an intuitive wizard for importing multi-variable, geo-located time series data (typical of sensor readings and GPS trackers) to build visualizations supporting data filtering and plotting. MyGeoHub has been used in tutorials at scientific conferences and in educational activities for K-12 students. MyGeoHub is also constantly evolving; the recent addition of Jupyter and R Shiny notebook environments enables reproducible, richly interactive geospatial analyses and applications ranging from simple pre-processing to published tools. MyGeoHub is not a monolithic geospatial science gateway; instead, it supports diverse needs ranging from a feature-rich data management system alone to complex scientific tools and workflows.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Xie, J; Wang, J; Peng, J
Purpose: To implement an entire-workflow quality assurance (QA) process in the radiotherapy department and to reduce radiotherapy error rates through entire-workflow management in a developing country. Methods: The entire-workflow QA process runs from patient registration to the end of the last treatment, covering every step of the radiotherapy process. The error rate found at chart check is used to evaluate the entire-workflow QA process. Two to three qualified senior medical physicists checked the documents before the first treatment fraction of every patient. Random checks of the treatment history during treatment were also performed. Treatment data from a total of around 6000 patients, before and after implementing the entire-workflow QA process, were compared from May 2014 to December 2015. Results: A systematic checklist was established. It mainly covers patient registration, treatment plan QA, information export to the OIS (Oncology Information System), treatment QA documents and QA of the treatment history. The error rate derived from the chart check decreased from 1.7% to 0.9% after implementing the entire-workflow QA process. All errors found before the first treatment fraction were corrected as soon as the oncologist re-confirmed them, and reinforced staff training followed accordingly to prevent those errors. Conclusion: The entire-workflow QA process improved the safety and quality of radiotherapy in our department, and we consider that our QA experience can be applicable to heavily loaded radiotherapy departments in developing countries.
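As a worked check on the reported chart-check error rates, the absolute and relative reductions follow directly from the two percentages. The relative figure is computed here and is not stated in the abstract; it also assumes the two periods are comparable in case mix and checking effort.

```latex
\begin{aligned}
\Delta_{\text{abs}} &= 1.7\% - 0.9\% = 0.8 \text{ percentage points} \\
\Delta_{\text{rel}} &= \frac{1.7 - 0.9}{1.7} = \frac{0.8}{1.7} \approx 0.47 \quad (\text{about } 47\%)
\end{aligned}
```

In other words, the error rate was roughly halved after the entire-workflow QA process was introduced.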
Workflow as a Service in the Cloud: Architecture and Scheduling Algorithms.
Wang, Jianwu; Korambath, Prakashan; Altintas, Ilkay; Davis, Jim; Crawl, Daniel
2014-01-01
With more and more workflow systems adopting the cloud as their execution environment, it is increasingly challenging to efficiently manage various workflows, virtual machines (VMs) and workflow execution on VM instances. To make the system scalable and easy to extend, we design a Workflow as a Service (WFaaS) architecture with independent services. A core part of the architecture is how to efficiently respond to continuous workflow requests from users and schedule their executions in the cloud. Based on different targets, we propose four heuristic workflow scheduling algorithms for the WFaaS architecture, and analyze the differences and best usages of the algorithms in terms of performance, cost and the price/performance ratio via experimental studies.
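The four WFaaS scheduling heuristics are not specified in the abstract. As a generic illustration of the kind of trade-off they target, the sketch below assumes known runtime estimates and identical VMs, and uses a simple greedy rule (send each request, in arrival order, to the VM that becomes free first); `schedule_earliest_available` is a hypothetical name, not one of the paper's algorithms.

```python
import heapq
from typing import List, Tuple


def schedule_earliest_available(runtimes: List[float], num_vms: int
                                ) -> Tuple[List[Tuple[int, float, float]], float]:
    """Greedy list scheduling of workflow requests onto identical VMs.

    Returns (assignments, makespan), where each assignment is
    (vm_id, start_time, finish_time) in request-arrival order.
    """
    # Min-heap of (time the VM becomes free, vm_id); all VMs idle at t=0.
    free_at = [(0.0, vm) for vm in range(num_vms)]
    heapq.heapify(free_at)
    assignments: List[Tuple[int, float, float]] = []
    for runtime in runtimes:
        start, vm = heapq.heappop(free_at)  # earliest-free VM
        finish = start + runtime
        assignments.append((vm, start, finish))
        heapq.heappush(free_at, (finish, vm))
    makespan = max((f for _, _, f in assignments), default=0.0)
    return assignments, makespan
```

With runtimes `[4, 2, 3, 1]`, two VMs finish at time 5 while one VM needs until time 10; halving the makespan doubles the number of VMs that must be paid for, which is the performance/cost tension the authors evaluate.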
Ergonomic design for dental offices.
Ahearn, David J; Sanders, Martha J; Turcotte, Claudia
2010-01-01
The increasing complexity of the dental office environment influences productivity and workflow for dental clinicians. Advances in technology, and with it the range of products needed to provide services, have led to sprawl in operatory setups and the potential for awkward postures for dental clinicians during the delivery of oral health services. Although ergonomics often addresses the prevention of musculoskeletal disorders for specific populations of workers, concepts of workflow and productivity are integral to improved practice in work environments. This article provides suggestions for improving workflow and productivity for dental clinicians. The article applies ergonomic principles to dental practice issues such as equipment and supply management, office design, and workflow management. Implications for improved ergonomic processes and future research are explored.
Optimization of business processes in banks through flexible workflow
NASA Astrophysics Data System (ADS)
Postolache, V.
2017-08-01
This article describes an integrated business model of a commercial bank. Examples are given of the components that make up the model: wooden models and business processes, strategic goals, organizational structure, system architecture, operational and marketing risk models, etc. Practice has shown that developing and implementing an integrated business model significantly increases the bank's operating efficiency and the efficiency of its management, and ensures stable organizational and technological development. Considering the evolution of business processes in the banking sector, their common characteristics should be analysed. From the author’s point of view, a business process is a set of activities of a commercial bank whose “input” is one or more financial and material resources and whose “output” is a banking product that has some value to the consumer. With workflow technology, managing business process efficiency becomes a matter of managing the integration of resources and the sequence of actions aimed at achieving this goal. In turn, this implies managing the interaction of jobs or functions, synchronizing assignment periods, reducing delays in the transmission of results, etc. Workflow technology is very important for managers at all levels, as they can use it to easily strengthen control over what is happening in a particular unit and in the bank as a whole. The manager is able to plan, implement rules, and interact within the framework of the company’s procedures, while the system takes on the tasks of distributing functions, controlling execution, alerting on implementation, and issuing statistical data on the effectiveness of operating procedures.
Development and active use of the integrated bank business model is one of the key success factors that contribute to the long-term and stable development of the bank, increase the efficiency of employees and business processes, and help implement the strategic objectives.
SynTrack: DNA Assembly Workflow Management (SynTrack) v2.0.1
DOE Office of Scientific and Technical Information (OSTI.GOV)
MENG, XIANWEI; SIMIRENKO, LISA
2016-12-01
SynTrack is a dynamic, workflow-driven data management system that tracks the DNA build process: Management of the hierarchical relationships of the DNA fragments; Monitoring of process tasks for the assembly of multiple DNA fragments into final constructs; Creation of vendor order forms with selectable building blocks; Organizing plate layouts and barcodes for vendor/pcr/fusion/chewback/bioassay/glycerol/master plate maps (default/condensed); Creating or updating Pre-Assembly/Assembly process workflows with selected building blocks; Generating Echo pooling instructions based on plate maps; Tracking of building blocks ordered, received and finally assembled for delivery; Bulk updating of colony or PCR amplification information, fusion PCR and chewback results; Updating QA/QC outcomes with .csv & .xlsx template files; Re-work of assembly workflows, enabled before and after sequencing validation; and Tracking of plate/well data changes and status updates, and reporting of master plate status with QC outcomes.
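Generating pooling instructions from plate maps amounts to emitting one liquid-transfer row per fragment. The sketch below assumes each construct is described by the source wells of its fragments and one destination well, and writes a CSV pick list. The column headers are an assumption modelled on common acoustic-dispenser pick lists and should be checked against the instrument's documentation; `pooling_picklist` is a hypothetical name, not part of SynTrack.

```python
import csv
import io
from typing import Dict, List, Tuple

# Assumed pick-list header; confirm the exact column names for your instrument.
HEADER = ["Source Well", "Destination Well", "Transfer Volume"]


def pooling_picklist(assemblies: Dict[str, Tuple[List[str], str]],
                     volume_nl: int) -> str:
    """Return CSV text with one transfer per fragment into the destination
    well of its assembly. `assemblies` maps a construct name to
    (fragment source wells, destination well)."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(HEADER)
    for construct in sorted(assemblies):  # stable, reproducible row order
        sources, destination = assemblies[construct]
        for source in sources:
            writer.writerow([source, destination, volume_nl])
    return out.getvalue()
```

Keeping the plate map as the single input means a re-worked assembly only needs its map updated before the pick list is regenerated.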
Workflow based framework for life science informatics.
Tiwari, Abhishek; Sekhar, Arvind K T
2007-10-01
Workflow technology is a generic mechanism to integrate diverse types of available resources (databases, servers, software applications and different services), which facilitates knowledge exchange within traditionally divergent fields such as molecular biology, clinical research, computational science, physics, chemistry and statistics. Researchers can easily incorporate and access diverse, distributed tools and data to develop their own research protocols for scientific analysis. Application of workflow technology has been reported in areas like drug discovery, genomics, large-scale gene expression analysis, proteomics, and systems biology. In this article, we discuss the existing workflow systems and the trends in applications of workflow-based systems.
Bouzguenda, Lotfi; Turki, Manel
2014-04-01
This paper shows how the combined use of agent and web services technologies can help to design an architectural style for a dynamic medical Cross-Organizational Workflow (COW) management system. Medical COW aims at supporting the collaboration between several autonomous and possibly heterogeneous medical processes distributed over different organizations (hospitals, clinics or laboratories). Dynamic medical COW refers to occasional cooperation between these health organizations, free of structural constraints, where the medical partners involved and their number are not pre-defined. More precisely, this paper proposes a new architectural style based on agent and web services technologies to deal with two key coordination issues of dynamic COW: finding medical partners and negotiating between them. It also proposes how the architecture for a dynamic medical COW management system can connect to a multi-agent system coupling the Clinical Decision Support System (CDSS) with Computerized Prescriber Order Entry (CPOE). The idea is to assist health professionals such as doctors, nurses and pharmacists with decision-making tasks, such as determining a diagnosis or analyzing patient data, without interrupting their clinical processes, so that they act coherently in caring for the patient.
Improving diabetes population management efficiency with an informatics solution.
Zai, Adrian; Grant, Richard; Andrews, Carl; Yee, Ronnie; Chueh, Henry
2007-10-11
Despite intensive resource use for diabetes management in the U.S., our care continues to fall short of evidence-based goals, partly due to system inefficiencies. Diabetes registries are increasingly being utilized as a critical tool for population level disease management by providing real-time data. Since the successful adoption of a diabetes registry depends on how well it integrates with disease management workflows, we optimized our current diabetes management workflow and designed our registry application around it.
Flexible Early Warning Systems with Workflows and Decision Tables
NASA Astrophysics Data System (ADS)
Riedel, F.; Chaves, F.; Zeiner, H.
2012-04-01
Decision support systems that facilitate communication and collaboration are an essential part of early warning and crisis management systems. Often official policies specify how different organizations collaborate and what information is communicated to whom. For early warning systems it is crucial that information is exchanged dynamically in a timely manner and that all participants get exactly the information they need to fulfil their role in the crisis management process. Information technology obviously lends itself to automating parts of the process. In our experience, however, the information logistics processes in current operational systems are hard-coded, even though they are subject to change. In addition, systems are tailored to the policies and requirements of a certain organization, and changes can require major software refactoring. We seek to develop a system that can be deployed and adapted to multiple organizations with different dynamic runtime policies. A major requirement for such a system is that changes can be applied locally without affecting larger parts of the system. In addition to the flexibility regarding changes in policies and processes, the system needs to be able to evolve; when new information sources become available, it should be possible to integrate and use these in the decision process. In general, this kind of flexibility comes with a significant increase in complexity. This implies that only IT professionals can maintain a system that can be reconfigured and adapted; end-users are unable to utilise the provided flexibility. Similar problems arise in the business world, and previous work suggested using business process management systems (BPMS) or workflow management systems (WfMS) to guide and automate early warning processes or crisis management plans.
However, the usability and flexibility of current WfMS are limited, because current notations and user interfaces are still not suitable for end-users, and workflows are usually only suited for rigid processes. We show how improvements can be achieved by using decision tables and rule-based adaptive workflows. Decision tables have been shown to be an intuitive tool that can be used by domain experts to express rule sets that can be interpreted automatically at runtime. Adaptive workflows use a rule-based approach to increase the flexibility of workflows by providing mechanisms to adapt workflows based on context changes, human intervention and availability of services. The combination of workflows, decision tables and rule-based adaptation creates a framework that opens up new possibilities for flexible and adaptable workflows, especially for use in early warning and crisis management systems.
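Decision tables are described here only in general terms. As a sketch of how a table can be interpreted automatically at runtime, the code below assumes a "first-hit" policy (the first row whose conditions all hold wins) and a made-up flood-warning table; `Rule`, `evaluate_first_hit`, `FLOOD_TABLE` and the thresholds are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional

# A condition tests the value of one named input ("fact").
Condition = Callable[[Any], bool]


@dataclass
class Rule:
    conditions: Dict[str, Condition]  # input name -> test on its value
    action: str                       # outcome, e.g. whom to notify


def evaluate_first_hit(table: List[Rule], facts: Dict[str, Any]) -> Optional[str]:
    """Return the action of the first row whose conditions all hold.

    A missing input makes its condition fail; None means no row matched.
    """
    for rule in table:
        if all(name in facts and test(facts[name])
               for name, test in rule.conditions.items()):
            return rule.action
    return None


# Hypothetical flood-warning table; thresholds are illustrative only.
FLOOD_TABLE: List[Rule] = [
    Rule({"level_m": lambda v: v >= 5.0, "trend": lambda t: t == "rising"},
         "alert civil protection and public"),
    Rule({"level_m": lambda v: v >= 5.0}, "alert civil protection"),
    Rule({"level_m": lambda v: v >= 3.0}, "inform duty officer"),
    Rule({}, "no action"),  # catch-all row: no conditions, always matches
]
```

Changing who is notified at a given water level means editing a row of the table, not the evaluation code, which is the kind of local change the abstract asks for.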
Workflow as a Service in the Cloud: Architecture and Scheduling Algorithms
Wang, Jianwu; Korambath, Prakashan; Altintas, Ilkay; Davis, Jim; Crawl, Daniel
2017-01-01
With more and more workflow systems adopting the cloud as their execution environment, it is increasingly challenging to efficiently manage various workflows, virtual machines (VMs) and workflow execution on VM instances. To make the system scalable and easy to extend, we design a Workflow as a Service (WFaaS) architecture with independent services. A core part of the architecture is how to efficiently respond to continuous workflow requests from users and schedule their executions in the cloud. Based on different targets, we propose four heuristic workflow scheduling algorithms for the WFaaS architecture, and analyze the differences and best usages of the algorithms in terms of performance, cost and the price/performance ratio via experimental studies. PMID:29399237
A Two-Stage Probabilistic Approach to Manage Personal Worklist in Workflow Management Systems
NASA Astrophysics Data System (ADS)
Han, Rui; Liu, Yingbo; Wen, Lijie; Wang, Jianmin
The application of workflow scheduling to managing an individual actor's personal worklist is one area that can bring great improvement to business processes. However, current deterministic approaches cannot adapt to the dynamics and uncertainties in the management of personal worklists. To address this issue, this paper proposes a two-stage probabilistic approach that aims to help actors flexibly manage their personal worklists. Specifically, at the first stage the approach analyzes each activity instance's continuous probability of meeting its deadline. Based on this stochastic analysis, at the second stage, an innovative scheduling strategy is proposed to minimize the overall deadline violation cost for an actor's personal worklist. Simultaneously, the strategy recommends to the actor a feasible worklist of activity instances that meet the required bottom line of successful execution. The effectiveness of our approach is evaluated in a real-world workflow management system and with large-scale simulation experiments.
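A simplified version of the two stages can be sketched as follows, assuming independent, normally distributed activity durations and a single actor who works through the list sequentially. Stage 1 computes each activity's probability of meeting its deadline for a given order; stage 2 picks the order with the lowest expected violation cost by exhaustive search, which is only practical for a handful of activities. This is a stand-in, not the authors' scheduling strategy, and all names are hypothetical.

```python
import math
from itertools import permutations
from typing import List, Sequence, Tuple

# (name, mean duration, std. dev. of duration, deadline, violation cost);
# all times are measured from "now" in the same unit.
Activity = Tuple[str, float, float, float, float]


def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def deadline_probabilities(order: Sequence[Activity]) -> List[float]:
    """Stage 1: probability that each activity finishes by its deadline
    when the list is executed in this order. The finish time of the k-th
    activity is the sum of the first k durations, which is normal with
    summed means and variances under the independence assumption."""
    probs: List[float] = []
    mean_sum, var_sum = 0.0, 0.0
    for _, mu, sigma, deadline, _ in order:
        mean_sum += mu
        var_sum += sigma ** 2
        sd = math.sqrt(var_sum)
        if sd == 0.0:  # deterministic durations
            probs.append(1.0 if mean_sum <= deadline else 0.0)
        else:
            probs.append(normal_cdf((deadline - mean_sum) / sd))
    return probs


def expected_violation_cost(order: Sequence[Activity]) -> float:
    """Sum over activities of cost * P(deadline missed)."""
    probs = deadline_probabilities(order)
    return sum(act[4] * (1.0 - p) for act, p in zip(order, probs))


def best_order(activities: List[Activity]) -> Tuple[List[str], float]:
    """Stage 2: exhaustive search for the lowest expected violation cost."""
    best = min(permutations(activities), key=expected_violation_cost)
    return [act[0] for act in best], expected_violation_cost(best)
```

For example, with a loose-deadline `("review", 2.0, 0.5, 8.0, 1.0)` and a tight, costly `("approve", 1.0, 0.2, 1.5, 5.0)`, doing `approve` first gives an expected cost of about 0.03, whereas doing `review` first makes missing the `approve` deadline almost certain.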
Contextual cloud-based service oriented architecture for clinical workflow.
Moreno-Conde, Jesús; Moreno-Conde, Alberto; Núñez-Benjumea, Francisco J; Parra-Calderón, Carlos
2015-01-01
Regarding the acceptance of systems within the healthcare domain, multiple papers have highlighted the importance of integrating tools with the clinical workflow. This paper analyses how clinical context management could be deployed in order to promote the adoption of advanced cloud services within the clinical workflow. This deployment will be able to integrate with the specifications promoted by the eHealth European Interoperability Framework. This paper proposes a cloud-based service-oriented architecture that implements a context management system aligned with the HL7 standard known as CCOW.
Overcoming Barriers to Technology Adoption in Small Manufacturing Enterprises (SMEs)
2003-06-01
automates quote-generation, order-processing, workflow management, performance analysis, and accounting functions. Ultimately, it will enable Magdic...that Magdic implement an MES instead. The MES, in addition to solving the problem of document management, would automate quote-generation, order-processing, workflow management, performance analysis, and accounting functions. To help Magdic personnel learn about the MES, TIDE personnel provided
Théron, Laëtitia; Centeno, Delphine; Coudy-Gandilhon, Cécile; Pujos-Guillot, Estelle; Astruc, Thierry; Rémond, Didier; Barthelemy, Jean-Claude; Roche, Frédéric; Feasson, Léonard; Hébraud, Michel; Béchet, Daniel; Chambon, Christophe
2016-10-26
Mass spectrometry imaging (MSI) is a powerful tool to visualize the spatial distribution of molecules on a tissue section. The main limitation of MALDI-MSI of proteins is the lack of direct identification. Therefore, this study focuses on an MSI~LC-MS/MS-LF workflow that links the results from MALDI-MSI with potential peak identification and label-free quantitation, using only one tissue section. First, we studied the impact of matrix deposition and laser ablation on protein extraction from the tissue section. Then, we back-correlated the m/z of the proteins detected by MALDI-MSI with those identified by label-free quantitation. This allowed us to compare the label-free quantitation of proteins obtained in LC-MS/MS with the peak intensities observed in MALDI-MSI. We were able to link identifications to nine peaks observed by MALDI-MSI. The results showed that the MSI~LC-MS/MS-LF workflow (i) allowed us to study a representative muscle proteome compared to a classical bottom-up workflow; and (ii) was only slightly impacted by matrix deposition and laser ablation. This workflow, performed as a proof of concept, suggests that a single tissue section can be used to perform MALDI-MSI and protein extraction, identification, and relative quantitation.
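Back-correlation can be sketched as tolerance matching between observed MALDI peaks and the masses of proteins identified by LC-MS/MS. The sketch assumes singly protonated ions ([M+H]+) and a 500 ppm window; the tolerance, the function name and the example protein names and masses are all hypothetical, and the study may have matched peaks differently (for example, allowing for modifications or several charge states).

```python
from typing import Dict, List

PROTON_MASS_DA = 1.007276  # mass added by one proton in an [M+H]+ ion


def back_correlate(peaks_mz: List[float],
                   protein_masses_da: Dict[str, float],
                   tol_ppm: float = 500.0) -> Dict[float, List[str]]:
    """For each MALDI peak, list identified proteins whose expected
    singly protonated m/z lies within tol_ppm of the observed m/z."""
    matches: Dict[float, List[str]] = {}
    for mz in peaks_mz:
        hits: List[str] = []
        for name, mass in protein_masses_da.items():
            expected_mz = mass + PROTON_MASS_DA  # assumes charge state 1
            error_ppm = abs(mz - expected_mz) / expected_mz * 1e6
            if error_ppm <= tol_ppm:
                hits.append(name)
        matches[mz] = hits
    return matches
```

Peaks that come back with an empty list remain unidentified; peaks with several hits would need additional evidence, such as matching spatial patterns, to be resolved.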
Théron, Laëtitia; Centeno, Delphine; Coudy-Gandilhon, Cécile; Pujos-Guillot, Estelle; Astruc, Thierry; Rémond, Didier; Barthelemy, Jean-Claude; Roche, Frédéric; Feasson, Léonard; Hébraud, Michel; Béchet, Daniel; Chambon, Christophe
2016-01-01
Mass spectrometry imaging (MSI) is a powerful tool to visualize the spatial distribution of molecules on a tissue section. The main limitation of MALDI-MSI of proteins is the lack of direct identification. Therefore, this study focuses on an MSI~LC-MS/MS-LF workflow that links the results from MALDI-MSI with potential peak identification and label-free quantitation, using only one tissue section. First, we studied the impact of matrix deposition and laser ablation on protein extraction from the tissue section. Then, we back-correlated the m/z of the proteins detected by MALDI-MSI with those identified by label-free quantitation. This allowed us to compare the label-free quantitation of proteins obtained in LC-MS/MS with the peak intensities observed in MALDI-MSI. We were able to link identifications to nine peaks observed by MALDI-MSI. The results showed that the MSI~LC-MS/MS-LF workflow (i) allowed us to study a representative muscle proteome compared to a classical bottom-up workflow; and (ii) was only slightly impacted by matrix deposition and laser ablation. This workflow, performed as a proof of concept, suggests that a single tissue section can be used to perform MALDI-MSI and protein extraction, identification, and relative quantitation. PMID:28248242
Architecture of next-generation information management systems for digital radiology enterprises
NASA Astrophysics Data System (ADS)
Wong, Stephen T. C.; Wang, Huili; Shen, Weimin; Schmidt, Joachim; Chen, George; Dolan, Tom
2000-05-01
Few information systems today offer a clear and flexible means to define and manage the automated part of radiology processes. None of them provides a coherent and scalable architecture that can easily cope with heterogeneity and the inevitable local adaptation of applications. Most importantly, they often lack a model that can integrate clinical and administrative information to aid better decisions in managing resources, optimizing operations, and improving productivity. Digital radiology enterprises require cost-effective solutions to deliver information to the right person in the right place at the right time. We propose a new architecture of image information management systems for digital radiology enterprises. Such a system is based on emerging technologies in workflow management, distributed object computing, and Java and Web techniques, as well as Philips' domain knowledge in radiology operations. Our design adopts the '4+1' architectural view approach. In this new architecture, PACS and RIS will become one, while user interaction can be automated by customized workflow processes. Clinical service applications are implemented as active components. They can readily be substituted by locally adapted applications and can be replicated for fault tolerance and load balancing. Furthermore, the system will provide powerful query and statistical functions for managing resources and improving productivity in real time. This work will lead to a new direction of image information management in the next millennium. We will illustrate the innovative design with implemented examples of a working prototype.
How to Take HRMS Process Management to the Next Level with Workflow Business Event System
NASA Technical Reports Server (NTRS)
Rajeshuni, Sarala; Yagubian, Aram; Kunamaneni, Krishna
2006-01-01
Oracle Workflow with the Business Event System offers a complete process management solution for enterprises to manage business processes cost-effectively. Using Workflow event messaging, event subscriptions, AQ Servlet and advanced queuing technologies, this presentation demonstrates the step-by-step design and implementation of system solutions that integrate two dissimilar systems and establish remote communication between them. As a case study, the presentation walks you through the process of propagating organization name changes that originate in the HRMS module to other applications without changing application code. The solution can be applied to your particular business cases for streamlining or modifying business processes across Oracle and non-Oracle applications.
Video fingerprinting for copy identification: from research to industry applications
NASA Astrophysics Data System (ADS)
Lu, Jian
2009-02-01
Research that began a decade ago in video copy detection has developed into a technology known as "video fingerprinting". Today, video fingerprinting is an essential and enabling tool adopted by the industry for video content identification and management in online video distribution. This paper provides a comprehensive review of video fingerprinting technology and its applications in identifying, tracking, and managing copyrighted content on the Internet. The review includes a survey on video fingerprinting algorithms and some fundamental design considerations, such as robustness, discriminability, and compactness. It also discusses fingerprint matching algorithms, including complexity analysis, and approximation and optimization for fast fingerprint matching. On the application side, it provides an overview of a number of industry-driven applications that rely on video fingerprinting. Examples are given based on real-world systems and workflows to demonstrate applications in detecting and managing copyrighted content, and in monitoring and tracking video distribution on the Internet.
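Many video fingerprints are compact binary codes computed per frame or per segment, so matching reduces to comparing codes under the Hamming distance. The brute-force sketch below assumes one integer fingerprint per frame and slides a query sequence over a reference sequence. It illustrates robustness (a few flipped bits still match) and discriminability (a threshold rejects poor matches), but omits the indexing and approximation the review discusses for fast matching; the function names are hypothetical.

```python
from typing import List, Optional, Tuple


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two binary fingerprints."""
    return bin(a ^ b).count("1")


def match_sequence(query: List[int], reference: List[int],
                   max_avg_distance: float) -> Optional[Tuple[int, float]]:
    """Slide the query over the reference and return (offset, mean Hamming
    distance) of the best alignment, or None if it exceeds the threshold."""
    if not query or len(query) > len(reference):
        return None
    best: Optional[Tuple[int, float]] = None
    for offset in range(len(reference) - len(query) + 1):
        window = reference[offset:offset + len(query)]
        avg = sum(hamming(q, r) for q, r in zip(query, window)) / len(query)
        if best is None or avg < best[1]:
            best = (offset, avg)
    if best is not None and best[1] <= max_avg_distance:
        return best
    return None
```

The cost is proportional to the product of the query and reference lengths, which is why production systems index fingerprints and use approximate search instead of a full scan.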
ERIC Educational Resources Information Center
Fuentes, Steven
2017-01-01
Usability heuristics have been established for different uses and applications as general guidelines for user interfaces. These can affect the implementation of industry solutions and play a significant role regarding cost reduction and process efficiency. The area of electronic workflow document management (EWDM) solutions, also known as…
Towards an intelligent hospital environment: OR of the future.
Sutherland, Jeffrey V; van den Heuvel, Willem-Jan; Ganous, Tim; Burton, Matthew M; Kumar, Animesh
2005-01-01
Patients, providers, payers, and government demand more effective and efficient healthcare services, and the healthcare industry needs innovative ways to re-invent core processes. Business process reengineering (BPR) showed that adopting new hospital information systems can leverage this transformation and that workflow management technologies can automate process management. Our research indicates that workflow technologies in healthcare require real-time patient monitoring, detection of adverse events, and adaptive responses to breakdowns in normal processes. Adaptive workflow systems are rarely implemented, making current workflow implementations inappropriate for healthcare. The advent of evidence-based medicine, guideline-based practice, and a better understanding of cognitive workflow, combined with novel technologies including Radio Frequency Identification (RFID), mobile/wireless technologies, internet workflow, intelligent agents, and Service Oriented Architectures (SOA), opens up new and exciting ways of automating business processes. Total situational awareness of events, timing, and location of healthcare activities can generate self-organizing change in the behaviors of humans and machines. A test bed for a novel approach to continuous process management was designed for the new Weinburg Surgery Building at the University of Maryland Medical Center. Early results based on clinical process mapping and analysis of patient flow bottlenecks demonstrated a 100% improvement in delivery of supplies and instruments at surgery start time. This work has been directly applied to the design of the DARPA Trauma Pod research program, where robotic surgery will be performed on wounded soldiers on the battlefield.
UBioLab: a web-laboratory for ubiquitous in-silico experiments.
Bartocci, Ezio; Cacciagrano, Diletta; Di Berardini, Maria Rita; Merelli, Emanuela; Vito, Leonardo
2012-07-09
The huge and dynamic amount of bioinformatic resources (e.g., data and tools) available nowadays on the Internet represents a big challenge for biologists, in terms of their management and visualization, and for bioinformaticians, in terms of rapidly creating and executing in-silico experiments involving resources and activities spread over the WWW hyperspace. Any framework aiming at integrating such resources as in a physical laboratory must tackle, and ideally handle in a transparent and uniform way, aspects concerning physical distribution, semantic heterogeneity, and the co-existence of different computational paradigms and, as a consequence, of different invocation interfaces (e.g., OGSA for Grid nodes, SOAP for Web Services, Java RMI for Java objects, etc.). The UBioLab framework has just been designed and developed as a prototype with this objective. Several architectural features, such as being fully Web-based and combining domain ontologies, Semantic Web and workflow techniques, reflect an effort in this direction. The integration of a semantic knowledge management system for distributed (bioinformatic) resources, a semantics-driven graphical environment for defining and monitoring ubiquitous workflows, and an intelligent agent-based technology for their distributed execution allows UBioLab to serve as a semantic guide for bioinformaticians and biologists, providing (i) a flexible environment for visualizing, organizing and inferring any (semantic and computational) "type" of domain knowledge (e.g., resources and activities, expressed in a declarative form), (ii) a powerful engine for defining and storing semantics-driven ubiquitous in-silico experiments on the domain hyperspace, and (iii) a transparent, automatic and distributed environment for correct experiment execution.
The iPlant Collaborative: Cyberinfrastructure for Enabling Data to Discovery for the Life Sciences
Merchant, Nirav; Lyons, Eric; Goff, Stephen; Vaughn, Matthew; Ware, Doreen; Micklos, David; Antin, Parker
2016-01-01
The iPlant Collaborative provides life science research communities access to comprehensive, scalable, and cohesive computational infrastructure for data management; identity management; collaboration tools; and cloud, high-performance, high-throughput computing. iPlant provides training, learning material, and best practice resources to help all researchers make the best use of their data, expand their computational skill set, and effectively manage their data and computation when working as distributed teams. iPlant’s platform permits researchers to easily deposit and share their data and deploy new computational tools and analysis workflows, allowing the broader community to easily use and reuse those data and computational analyses. PMID:26752627
Next Generation Workload Management System For Big Data on Heterogeneous Distributed Computing
NASA Astrophysics Data System (ADS)
Klimentov, A.; Buncic, P.; De, K.; Jha, S.; Maeno, T.; Mount, R.; Nilsson, P.; Oleynik, D.; Panitkin, S.; Petrosyan, A.; Porter, R. J.; Read, K. F.; Vaniachine, A.; Wells, J. C.; Wenaus, T.
2015-05-01
The Large Hadron Collider (LHC), operating at the international CERN Laboratory in Geneva, Switzerland, is leading Big Data driven scientific explorations. Experiments at the LHC explore the fundamental nature of matter and the basic forces that shape our universe, and were recently credited with the discovery of a Higgs boson. ATLAS and ALICE are the largest collaborations ever assembled in the sciences and are at the forefront of research at the LHC. To address an unprecedented multi-petabyte data processing challenge, both experiments rely on a heterogeneous distributed computational infrastructure. The ATLAS experiment uses the PanDA (Production and Data Analysis) Workload Management System (WMS) to manage the workflow for all data processing on hundreds of data centers. Through PanDA, ATLAS physicists see a single computing facility that enables rapid scientific breakthroughs for the experiment, even though the data centers are physically scattered all over the world. The scale is demonstrated by the following numbers: PanDA manages O(10^2) sites, O(10^5) cores, O(10^8) jobs per year, and O(10^3) users, and the ATLAS data volume is O(10^17) bytes. In 2013 we started an ambitious program to expand PanDA to all available computing resources, including opportunistic use of commercial and academic clouds and Leadership Computing Facilities (LCF). The project titled ‘Next Generation Workload Management and Analysis System for Big Data’ (BigPanDA) is funded by DOE ASCR and HEP. Extending PanDA to clouds and LCF presents new challenges in managing heterogeneity and supporting workflow. The BigPanDA project is underway to set up and tailor PanDA at the Oak Ridge Leadership Computing Facility (OLCF) and at the National Research Center "Kurchatov Institute" together with ALICE distributed computing and ORNL computing professionals. Our approach to integrating HPC platforms at the OLCF and elsewhere is to reuse, as much as possible, existing components of the PanDA system.
We will present our current accomplishments with running the PanDA WMS at OLCF and other supercomputers and demonstrate our ability to use PanDA as a portal independent of the computing facilities infrastructure for High Energy and Nuclear Physics as well as other data-intensive science applications.
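As a rough illustration of what the quoted orders of magnitude imply, the back-of-envelope arithmetic below assumes a year of about $3.15\times10^{7}$ s and, for the per-site figures, an even spread over sites. The results are order-of-magnitude estimates derived here, not numbers reported by the authors.

$$
\frac{10^{8}\ \text{jobs/year}}{3.15\times10^{7}\ \text{s/year}} \approx 3\ \text{jobs/s},\qquad
\frac{10^{5}\ \text{cores}}{10^{2}\ \text{sites}} = 10^{3}\ \text{cores/site},\qquad
\frac{10^{17}\ \text{bytes}}{10^{2}\ \text{sites}} = 10^{15}\ \text{bytes/site} \approx 1\ \text{PB/site}.
$$

In other words, the system must sustain roughly three jobs every second around the clock, which is why a single workload manager fronting many heterogeneous sites matters.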
Steitz, Bryan D; Weinberg, Stuart T; Danciu, Ioana; Unertl, Kim M
2016-01-01
Healthcare team members in emergency department contexts have used electronic whiteboard solutions to help manage operational workflow for many years. Ambulatory clinic settings have highly complex operational workflow, but still have limited electronic assistance for communicating and coordinating work activities. This paper describes and discusses the design, implementation, use, and ongoing evolution of a coordination and collaboration tool supporting ambulatory clinic operational workflow at Vanderbilt University Medical Center (VUMC). The outpatient whiteboard tool was initially designed to support healthcare work related to an electronic chemotherapy order-entry application. After a highly successful initial implementation in an oncology context, high demand for the outpatient whiteboard emerged across the organization. Over the past 10 years, developers have followed an iterative user-centered design process to evolve the tool. The electronic outpatient whiteboard system supports 194 separate whiteboards and is accessed by over 2800 distinct users on a typical day. Clinics can configure their whiteboards to support unique workflow elements. Since initial release, features such as immunization clinical decision support have been integrated into the system, based on requests from end users. The success of the electronic outpatient whiteboard demonstrates the usefulness of an operational workflow tool within the ambulatory clinic setting. Operational workflow tools can play a significant role in supporting coordination, collaboration, and teamwork in ambulatory healthcare settings.
An application of digital network technology to medical image management.
Chu, W K; Smith, C L; Wobig, R K; Hahn, F A
1997-01-01
With the advent of network technology, there is considerable interest within the medical community in managing the storage and distribution of medical images by digital means. Higher workflow efficiency leading to better patient care is one of the commonly cited outcomes [1,2]. However, due to the size of medical image files and the unique requirements in detail and resolution, medical image management poses special challenges. Storage requirements are usually large, and the resulting investment costs put digital networking projects financially out of reach for many clinical institutions. New advances in network technology and telecommunication, in conjunction with the decreasing cost of computer devices, have made digital image management achievable. In our institution, we have recently completed a pilot project to distribute medical images both within the physical confines of the clinical enterprise and outside the medical center campus. The design concept and the configuration of a comprehensive digital image network are described in this report.
Dinov, Ivo D.; Petrosyan, Petros; Liu, Zhizhong; Eggert, Paul; Zamanyan, Alen; Torri, Federica; Macciardi, Fabio; Hobel, Sam; Moon, Seok Woo; Sung, Young Hee; Jiang, Zhiguo; Labus, Jennifer; Kurth, Florian; Ashe-McNalley, Cody; Mayer, Emeran; Vespa, Paul M.; Van Horn, John D.; Toga, Arthur W.
2013-01-01
The volume, diversity and velocity of biomedical data are increasing exponentially, providing petabytes of new neuroimaging and genetics data every year. At the same time, tens of thousands of computational algorithms are developed and reported in the literature, along with thousands of software tools and services. Users demand intuitive, quick and platform-agnostic access to data, software tools, and infrastructure from millions of hardware devices. This explosion of information, scientific techniques, computational models, and technological advances leads to enormous challenges in data analysis, evidence-based biomedical inference and reproducibility of findings. The Pipeline workflow environment provides a crowd-based distributed solution for consistent management of these heterogeneous resources. The Pipeline allows multiple (local) clients and (remote) servers to connect, exchange protocols, control the execution, monitor the states of different tools or hardware, and share complete protocols as portable XML workflows. In this paper, we demonstrate several advanced computational neuroimaging and genetics case studies, and end-to-end pipeline solutions. These are implemented as graphical workflow protocols in the context of analyzing imaging (sMRI, fMRI, DTI), phenotypic (demographic, clinical), and genetic (SNP) data. PMID:23975276
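Sharing a protocol as a portable XML workflow means that any client can rebuild the module graph and derive a valid execution order. The sketch below shows that idea under stated assumptions: the `<workflow>`, `<module>` and `<connection>` elements and all module names are hypothetical and are not the Pipeline's real file format; only the standard library (`xml.etree.ElementTree`, `graphlib`, Python 3.9+) is used.

```python
import xml.etree.ElementTree as ET
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Dict, List, Set, Tuple

# Hypothetical, simplified workflow document; this is NOT the Pipeline's real XML schema.
WORKFLOW_XML = """
<workflow name="demo-protocol">
  <module id="skull_strip" tool="strip_tool"/>
  <module id="register" tool="register_tool"/>
  <module id="fit_tensors" tool="tensor_tool"/>
  <module id="report" tool="report_tool"/>
  <connection from="skull_strip" to="register"/>
  <connection from="register" to="fit_tensors"/>
  <connection from="skull_strip" to="report"/>
  <connection from="fit_tensors" to="report"/>
</workflow>
"""


def load_workflow(xml_text: str) -> Tuple[Dict[str, str], Dict[str, Set[str]]]:
    """Return (module id -> tool name, module id -> ids it depends on)."""
    root = ET.fromstring(xml_text.strip())
    modules = {m.get("id"): m.get("tool") for m in root.findall("module")}
    deps: Dict[str, Set[str]] = {mid: set() for mid in modules}
    for conn in root.findall("connection"):
        # A connection from A to B means B consumes A's output, so B depends on A.
        deps[conn.get("to")].add(conn.get("from"))
    return modules, deps


def execution_order(deps: Dict[str, Set[str]]) -> List[str]:
    """Order modules so every module runs after the modules it depends on."""
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    modules, deps = load_workflow(WORKFLOW_XML)
    for module_id in execution_order(deps):
        print(module_id, "->", modules[module_id])
```

Because the XML carries only modules and their connections, the same document can be executed by any server that knows how to run the named tools.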
ERIC Educational Resources Information Center
An, Ho
2012-01-01
In this dissertation, two interrelated problems of service-based systems (SBS) are addressed: protecting users' data confidentiality from service providers, and managing performance of multiple workflows in SBS. Current SBSs pose serious limitations to protecting users' data confidentiality. Since users' sensitive data is sent in…
Wiggers, Anne-Marieke; Vosbergen, Sandra; Kraaijenhagen, Roderik; Jaspers, Monique; Peek, Niels
2013-01-01
E-health interventions are of growing importance for the self-management of chronic conditions. This study aimed to describe the process adaptations that are needed in cardiac rehabilitation (CR) to implement a self-management system called MyCARDSS. We created a generic workflow model based on interviews and observations at three CR clinics. Subsequently, a workflow model of the ideal situation after implementation of MyCARDSS was created. We found that the implementation will increase the complexity of existing working procedures because 1) not all patients will use MyCARDSS, 2) there is a transfer of tasks and responsibilities from professionals to patients, and 3) information in MyCARDSS needs to be synchronized with the EPR system for professionals.
Integrating Remote and Social Sensing Data for a Scenario on Secure Societies in Big Data Platform
NASA Astrophysics Data System (ADS)
Albani, Sergio; Lazzarini, Michele; Koubarakis, Manolis; Taniskidou, Efi Karra; Papadakis, George; Karkaletsis, Vangelis; Giannakopoulos, George
2016-08-01
In the framework of the Horizon 2020 project BigDataEurope (Integrating Big Data, Software & Communities for Addressing Europe's Societal Challenges), a pilot for the Secure Societies Societal Challenge was designed considering the requirements of relevant stakeholders. The pilot focuses on integrating data from remote and social sensing into a Big Data platform. Information on land changes from the Copernicus Sentinel-1A sensor (Change Detection workflow) is integrated with information from selected Twitter and news agency accounts (Event Detection workflow) in order to provide the user with multiple sources of information. The Change Detection workflow implements a processing chain in a distributed parallel manner, exploiting the Big Data capabilities in place; the Event Detection workflow implements parallel and distributed monitoring of social media and news agencies, as well as suitable mechanisms to detect and geo-annotate the related events.
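One simple way to picture the integration of the two workflows is to pair each detected change area with geo-annotated events that fall inside it and occur close to it in time. The sketch below is a hypothetical illustration, not the pilot's implementation: the `ChangeArea` and `Event` fields, the bounding-box test and the three-day window are all assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Tuple


@dataclass
class ChangeArea:
    """Bounding box of a detected land change (hypothetical fields)."""
    min_lon: float
    min_lat: float
    max_lon: float
    max_lat: float
    detected: datetime


@dataclass
class Event:
    """A geo-annotated event extracted from social media or news (hypothetical fields)."""
    text: str
    lon: float
    lat: float
    time: datetime


def correlate(changes: List[ChangeArea], events: List[Event],
              window: timedelta = timedelta(days=3)) -> List[Tuple[ChangeArea, Event]]:
    """Pair each change area with events located inside it and close to it in time."""
    matches = []
    for change in changes:
        for event in events:
            inside = (change.min_lon <= event.lon <= change.max_lon
                      and change.min_lat <= event.lat <= change.max_lat)
            close_in_time = abs(event.time - change.detected) <= window
            if inside and close_in_time:
                matches.append((change, event))
    return matches
```

This nested loop is quadratic and only suits small inputs; at Big Data scale the same join would typically be done with a spatial index or a distributed join on the platform.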
Optimizing CyberShake Seismic Hazard Workflows for Large HPC Resources
NASA Astrophysics Data System (ADS)
Callaghan, S.; Maechling, P. J.; Juve, G.; Vahi, K.; Deelman, E.; Jordan, T. H.
2014-12-01
The CyberShake computational platform is a well-integrated collection of scientific software and middleware that calculates 3D simulation-based probabilistic seismic hazard curves and hazard maps for the Los Angeles region. Currently each CyberShake model comprises about 235 million synthetic seismograms from about 415,000 rupture variations computed at 286 sites. CyberShake integrates large-scale parallel and high-throughput serial seismological research codes into a processing framework in which early stages produce files used as inputs by later stages. Scientific workflow tools are used to manage the jobs, data, and metadata. The Southern California Earthquake Center (SCEC) developed the CyberShake platform using USC High Performance Computing and Communications systems and open-science NSF resources. CyberShake calculations were migrated to the NSF Track 1 system NCSA Blue Waters when it became operational in 2013, via an interdisciplinary team approach including domain scientists, computer scientists, and middleware developers. Due to the excellent performance of Blue Waters and CyberShake software optimizations, we reduced the makespan (a measure of wallclock time-to-solution) of a CyberShake study from 1467 to 342 hours. We will describe the technical enhancements behind this improvement, including judicious introduction of new GPU software, improved scientific software components, increased workflow-based automation, and Blue Waters-specific workflow optimizations. Our CyberShake performance improvements highlight the benefits of scientific workflow tools. The CyberShake workflow software stack includes the Pegasus Workflow Management System (Pegasus-WMS, which includes Condor DAGMan), HTCondor, and Globus GRAM, with Pegasus-mpi-cluster managing the high-throughput tasks on the HPC resources.
The workflow tools handle data management, automatically transferring about 13 TB back to SCEC storage. We will present performance metrics from the most recent CyberShake study, executed on Blue Waters. We will compare the performance of CPU and GPU versions of our large-scale parallel wave propagation code, AWP-ODC-SGT. Finally, we will discuss how these enhancements have enabled SCEC to move forward with plans to increase the CyberShake simulation frequency to 1.0 Hz.
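Because early stages feed later ones, a workflow's makespan is bounded below by its critical path: the longest chain of dependent tasks, assuming unlimited resources. (For reference, the reported drop from 1467 to 342 hours is roughly a 4.3× speedup.) The sketch below computes that bound for a task DAG; the stage names and runtimes are hypothetical and are not CyberShake's real job breakdown.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Dict, Set


def makespan(durations: Dict[str, float], deps: Dict[str, Set[str]]) -> float:
    """Earliest possible finish time of a task DAG, assuming unlimited resources.

    durations: task -> runtime (hours); deps: task -> tasks that must finish first.
    """
    finish: Dict[str, float] = {}
    for task in TopologicalSorter(deps).static_order():
        # A task can start once its slowest prerequisite has finished.
        start = max((finish[d] for d in deps.get(task, set())), default=0.0)
        finish[task] = start + durations[task]
    return max(finish.values(), default=0.0)


# Hypothetical stages and runtimes, shaped like "early stages feed later stages".
durations = {"mesh": 2.0, "sgt_x": 10.0, "sgt_y": 12.0, "seismograms": 6.0, "hazard_curve": 1.0}
deps = {
    "sgt_x": {"mesh"},
    "sgt_y": {"mesh"},
    "seismograms": {"sgt_x", "sgt_y"},
    "hazard_curve": {"seismograms"},
}

if __name__ == "__main__":
    print(makespan(durations, deps))
```

Here the critical path is mesh, sgt_y, seismograms, hazard_curve (21 hours); speeding up `sgt_x` alone would not change the makespan, which is why optimizations are best aimed at tasks on the critical path.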
Tigres Workflow Library: Supporting Scientific Pipelines on HPC Systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hendrix, Valerie; Fox, James; Ghoshal, Devarshi; ...
2016-07-21
The growth in scientific data volumes has resulted in the need for new tools that enable users to operate on and analyze data on large-scale resources. In the last decade, a number of scientific workflow tools have emerged. These tools often target distributed environments, and often need expert help to compose and execute the workflows. Data-intensive workflows are often ad hoc; they involve an iterative development process that includes users composing and testing their workflows on desktops, and scaling up to larger systems. In this paper, we present the design and implementation of Tigres, a workflow library that supports the iterative workflow development cycle of data-intensive workflows. Tigres provides an application programming interface to a set of programming templates (i.e., sequence, parallel, split, merge) that can be used to compose and execute computational and data pipelines. We discuss the results of our evaluation of scientific and synthetic workflows showing that Tigres performs with minimal template overheads (mean of 13 seconds over all experiments). We also discuss various factors (e.g., I/O performance, execution mechanisms) that affect the performance of scientific workflows on HPC systems.
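The four templates named above can be sketched as plain higher-order functions. This is a minimal illustration of the template idea under stated assumptions, not the Tigres API: the function names mirror the template names, but their signatures, the use of `concurrent.futures.ThreadPoolExecutor` for the parallel template, and the single-value task model are all choices made here.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, List, Sequence

Task = Callable[[Any], Any]


def sequence(tasks: Sequence[Task], value: Any) -> Any:
    """Run tasks one after another, each consuming the previous task's output."""
    for task in tasks:
        value = task(value)
    return value


def parallel(tasks: Sequence[Task], inputs: Sequence[Any]) -> List[Any]:
    """Run independent tasks concurrently; results keep the order of `tasks`."""
    if len(tasks) != len(inputs):
        raise ValueError("parallel needs exactly one input per task")
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(task, x) for task, x in zip(tasks, inputs)]
        return [f.result() for f in futures]


def split(first: Task, branches: Sequence[Task], value: Any) -> List[Any]:
    """Fan out: one task's output feeds several branches that run in parallel."""
    shared = first(value)
    return parallel(branches, [shared] * len(branches))


def merge(branches: Sequence[Task], inputs: Sequence[Any],
          combine: Callable[[List[Any]], Any]) -> Any:
    """Fan in: parallel branches feed a single combining task."""
    return combine(parallel(branches, inputs))


if __name__ == "__main__":
    print(sequence([lambda x: x + 1, lambda x: x * 2], 3))
    print(split(lambda x: x * 10, [lambda y: y + 1, lambda y: y - 1], 2))
    print(merge([lambda x: x * x, lambda x: x * x], [3, 4], sum))
```

Composing pipelines from a few fixed templates keeps the same script usable on a desktop and on a larger system, where only the executor behind `parallel` would need to change.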
Scientific Data Management (SDM) Center for Enabling Technologies. 2007-2012
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ludascher, Bertram; Altintas, Ilkay
Over the past five years, our activities have both established Kepler as a viable scientific workflow environment and demonstrated its value across multiple science applications. We have published numerous peer-reviewed papers on the technologies highlighted in this short paper and have given Kepler tutorials at SC06, SC07, SC08, and SciDAC 2007. Our outreach activities have allowed scientists to learn best practices and better utilize Kepler to address their individual workflow problems. Our contributions to advancing the state of the art in scientific workflows have focused on the following areas. Progress in each of these areas is described in subsequent sections. Workflow development: the development of a deeper understanding of scientific workflows "in the wild" and of the requirements for support tools that allow easy construction of complex scientific workflows; Generic workflow components and templates: the development of generic actors (i.e., workflow components and processes) which can be broadly applied to scientific problems; Provenance collection and analysis: the design of a flexible provenance collection and analysis infrastructure within the workflow environment; and Workflow reliability and fault tolerance: the improvement of the reliability and fault tolerance of workflow environments.
FluxCTTX: A LIMS-based tool for management and analysis of cytotoxicity assays data
2015-01-01
Background Cytotoxicity assays have been used by researchers to screen for cytotoxicity in compound libraries. Researchers can either look for cytotoxic compounds or screen "hits" from initial high-throughput drug screens for unwanted cytotoxic effects before investing in their development as a pharmaceutical. These assays may be used as an alternative to animal experimentation and are becoming increasingly important in modern laboratories. However, executing these assays at large scale and in different laboratories requires, among other things, managing the protocols, reagents and cell lines used, as well as the data produced, which can be a challenge. The management of all this information is greatly improved by using computational tools that save time and guarantee quality. However, no tool designed specifically for this task in cytotoxicity assays is yet available. Results In this work, we have used a workflow-based LIMS (the Flux system) and the Together Workflow Editor as a framework to develop FluxCTTX, a tool for managing data from cytotoxicity assays performed at different laboratories. The main work is the development of a workflow that represents all stages of the assay and has been developed and uploaded in Flux. This workflow models the activities of cytotoxicity assays performed as described in the OECD 129 Guidance Document. Conclusions FluxCTTX presents a solution for managing the data produced by cytotoxicity assays performed in interlaboratory comparisons. Its adoption will help guarantee the quality of activities in the process of cytotoxicity testing and enforce the use of Good Laboratory Practices (GLP). Furthermore, the workflow developed is complete and can be adapted to other contexts and different tests for managing other types of data. PMID:26696462
SISYPHUS: A high performance seismic inversion factory
NASA Astrophysics Data System (ADS)
Gokhberg, Alexey; Simutė, Saulė; Boehm, Christian; Fichtner, Andreas
2016-04-01
In recent years, massively parallel high-performance computers have become the standard instruments for solving forward and inverse problems in seismology. The software packages dedicated to forward and inverse waveform modelling that are specially designed for such computers (SPECFEM3D, SES3D) have become mature and widely available. These packages achieve significant computational performance and allow researchers to solve larger problems at higher resolution in a shorter time. However, a typical seismic inversion process contains various activities that are beyond common solver functionality. They include management of information on seismic events and stations, 3D models, observed and synthetic seismograms, pre-processing of the observed signals, computation of misfits and adjoint sources, minimization of misfits, and process workflow management. These activities are time consuming and seldom sufficiently automated, and therefore represent a bottleneck that can substantially offset the performance benefits provided by even the most powerful modern supercomputers. Furthermore, the typical system architecture of modern supercomputing platforms is oriented towards maximum computational performance and provides limited standard facilities for automating the supporting activities. We present a prototype solution that automates all aspects of the seismic inversion process and is tuned for modern massively parallel high-performance computing systems. We address several major aspects of the solution architecture, which include (1) design of an inversion state database for tracing all relevant aspects of the entire solution process, (2) design of an extensible workflow management framework, (3) integration with wave propagation solvers, (4) integration with optimization packages, (5) computation of misfits and adjoint sources, and (6) process monitoring.
The inversion state database has a hierarchical structure with branches for the static process setup, inversion iterations, and solver runs, each branch specifying information at the event, station and channel levels. The workflow management framework is based on an embedded scripting engine that allows the definition of various workflow scenarios in a high-level scripting language and provides access to all available inversion components, represented as standard library functions. At present the SES3D wave propagation solver is integrated in the solution; work on interfacing with SPECFEM3D is in progress. A separate framework is designed for interoperability with an optimization module; the workflow manager and the optimization process run in parallel and cooperate by exchanging messages according to a specially designed protocol. A library of high-performance modules implementing signal pre-processing, misfit and adjoint computations according to established good practices is included. Monitoring is based on information stored in the inversion state database and currently implements a command line interface; design of a graphical user interface is in progress. The software design fits well into the common massively parallel system architecture featuring a large number of computational nodes running distributed applications under control of batch-oriented resource managers. The solution prototype has been implemented on the "Piz Daint" supercomputer provided by the Swiss Supercomputing Centre (CSCS).
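The misfit and adjoint-source step can be illustrated with the simplest textbook choice, an L2 waveform difference $\chi = \tfrac{1}{2}\sum_i (s_i - d_i)^2\,\Delta t$, whose adjoint source is the residual $s - d$. This is a generic sketch, not necessarily the misfit SISYPHUS implements; the `(event, station, channel)` keys in `total_misfit` are a hypothetical nod to the hierarchy levels described above.

```python
from typing import Dict, List, Sequence, Tuple


def l2_misfit_and_adjoint(synthetic: Sequence[float], observed: Sequence[float],
                          dt: float) -> Tuple[float, List[float]]:
    """L2 waveform misfit chi = 0.5 * sum((s - d)^2) * dt and its adjoint source.

    The adjoint source returned here is the residual s - d in forward time;
    real solvers may expect it time-reversed or scaled differently.
    """
    if len(synthetic) != len(observed):
        raise ValueError("synthetic and observed traces need the same number of samples")
    residual = [s - d for s, d in zip(synthetic, observed)]
    misfit = 0.5 * dt * sum(r * r for r in residual)
    return misfit, residual


def total_misfit(traces: Dict[Tuple[str, str, str], Tuple[Sequence[float], Sequence[float]]],
                 dt: float) -> float:
    """Sum misfits over hypothetical (event, station, channel) keys."""
    return sum(l2_misfit_and_adjoint(s, d, dt)[0] for s, d in traces.values())
```

In a full inversion the residuals would be injected at the stations as adjoint sources for the backward solver run, and the summed misfit would be handed to the optimization module.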
Texas Solar Collaboration Action Plan
DOE Office of Scientific and Technical Information (OSTI.GOV)
Winland, Chris
2013-02-14
Texas Solar Collaboration Permitting and Interconnection Process Improvement Action Plan. San Antonio-specific; Investigate feasibility of using electronic signatures; Investigate feasibility of enabling other online permitting processes (e.g., commercial); Assess need for future document management and workflow/notification IT improvements; Update Information Bulletin 153 regarding City requirements and processes for PV; Educate contractors and public on CPS Energy’s new 2013 solar program processes; Continue to discuss “downtown grid” interconnection issues and identify potential solutions; Consider renaming Distributed Energy Resources (DER); and Continue to participate in collaborative actions.
NASA Astrophysics Data System (ADS)
Fiore, Sandro; Williams, Dean; Aloisio, Giovanni
2016-04-01
In many scientific domains such as climate, data is often n-dimensional and requires tools that support specialized data types and primitives to be properly stored, accessed, analysed and visualized. Moreover, new challenges arise in large-scale scenarios and ecosystems where petabytes (PB) of data can be available and data can be distributed and/or replicated (e.g., the Earth System Grid Federation (ESGF) serving the Coupled Model Intercomparison Project, Phase 5 (CMIP5) experiment, which provides access to 2.5 PB of data for the Intergovernmental Panel on Climate Change (IPCC) Fifth Assessment Report (AR5)). Most of the tools currently available for scientific data analysis in the climate domain fail at large scale since they: (1) are desktop based and need the data locally; (2) are sequential, so do not benefit from available multicore/parallel machines; (3) do not provide declarative languages to express scientific data analysis tasks; (4) are domain-specific, which ties their adoption to a specific domain; and (5) do not provide workflow support to enable the definition of complex "experiments". The Ophidia project aims to address most of these challenges by providing a big data analytics framework for eScience. Ophidia provides declarative, server-side, and parallel data analysis, jointly with an internal storage model able to efficiently deal with multidimensional data and a hierarchical data organization to manage large data volumes ("datacubes"). The project relies on a strong background of high performance database management and OLAP systems to manage large scientific data sets. It also provides native workflow management support to define processing chains and workflows with tens to hundreds of data analytics operators to build real scientific use cases.
With regard to interoperability aspects, the talk will present the contribution provided both to the RDA Working Group on Array Databases, and the Earth System Grid Federation (ESGF) Compute Working Team. Also highlighted will be the results of large scale climate model intercomparison data analysis experiments, for example: (1) defined in the context of the EU H2020 INDIGO-DataCloud project; (2) implemented in a real geographically distributed environment involving CMCC (Italy) and LLNL (US) sites; (3) exploiting Ophidia as server-side, parallel analytics engine; and (4) applied on real CMIP5 data sets available through ESGF.
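Server-side, parallel analysis of a datacube can be pictured as partitioning the cube's cells across workers, each of which reduces its cells along one dimension. The sketch below is a toy in-memory illustration, not Ophidia's storage model or operator interface: the dict layout (a time series per `(lat, lon)` cell), the round-robin partitioning and the use of threads are all assumptions made here.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import fmean  # Python 3.8+
from typing import Dict, List, Tuple

Cell = Tuple[float, float]       # hypothetical (lat, lon) coordinates of a cube cell
Cube = Dict[Cell, List[float]]   # one time series stored per cell


def reduce_time(cube: Cube, workers: int = 4) -> Dict[Cell, float]:
    """Average each cell's time series, splitting the cells across workers."""
    cells = list(cube)
    # Round-robin partition of the cells, one chunk per worker.
    chunks = [cells[i::workers] for i in range(workers)]

    def work(chunk: List[Cell]) -> Dict[Cell, float]:
        # fmean raises StatisticsError on an empty series, so cells must hold data.
        return {cell: fmean(cube[cell]) for cell in chunk}

    result: Dict[Cell, float] = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(work, chunks):
            result.update(partial)
    return result


if __name__ == "__main__":
    cube = {(0.0, 0.0): [1.0, 2.0, 3.0], (0.0, 1.0): [4.0, 4.0]}
    print(reduce_time(cube))
```

Python threads share one interpreter lock, so this only mimics the data-parallel pattern; a real analytics engine would spread the partitions over processes or cluster nodes close to where the data is stored.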
Fisher, Arielle M; Herbert, Mary I; Douglas, Gerald P
2016-02-19
The Birmingham Free Clinic (BFC) in Pittsburgh, Pennsylvania, USA is a free, walk-in clinic that serves medically uninsured populations through the use of volunteer health care providers and an on-site medication dispensary. The introduction of an electronic medical record (EMR) has improved several aspects of clinic workflow. However, pharmacists' tasks involving medication management and dispensing have become more challenging since EMR implementation due to its inability to support workflows between the medical and pharmaceutical services. To inform the design of a systematic intervention, we conducted a needs assessment study to identify workflow challenges and process inefficiencies in the dispensary. We used contextual inquiry to document the dispensary workflow and facilitate identification of critical aspects of intervention design specific to the user. Pharmacists were observed according to contextual inquiry guidelines. Graphical models were produced to aid data and process visualization. We created a list of themes describing workflow challenges and asked the pharmacists to rank them in order of significance to narrow the scope of intervention design. Three pharmacists were observed at the BFC. Observer notes were documented and analyzed to produce 13 themes outlining the primary challenges pharmacists encounter during dispensing at the BFC. The dispensary workflow is labor-intensive, redundant, and inefficient when integrated with the clinical service. Observations identified inefficiencies that may benefit from the introduction of informatics interventions, including medication labeling, insufficient process notification, triple documentation, and inventory control. We propose a system for Prescription Management and General Inventory Control (RxMAGIC). RxMAGIC is a framework designed to mitigate workflow challenges and improve the processes of medication management and inventory control.
While RxMAGIC is described in the context of the BFC dispensary, we believe it will be generalizable to pharmacies in other low-resource settings, both domestically and internationally.
Haston, Elspeth; Cubey, Robert; Pullan, Martin; Atkins, Hannah; Harris, David J
2012-01-01
Digitisation programmes in many institutes frequently involve disparate and irregular funding, diverse selection criteria and scope, with different members of staff managing and operating the processes. These factors have influenced the decision at the Royal Botanic Garden Edinburgh to develop an integrated workflow for the digitisation of herbarium specimens that is modular and scalable, enabling a single overall workflow to be used for all digitisation projects. This integrated workflow comprises three principal elements: a specimen workflow, a data workflow and an image workflow. The specimen workflow is strongly linked to curatorial processes, which will impact on the prioritisation, selection and preparation of the specimens. The importance of including a conservation element within the digitisation workflow is highlighted. The data workflow includes the concept of three main categories of collection data: label data, curatorial data and supplementary data. It is shown that each category of data has its own properties, which influence the timing of data capture within the workflow. Software has been developed for the rapid capture of curatorial data, and optical character recognition (OCR) software is being used to increase the efficiency of capturing label data and supplementary data. The large number and size of the images has necessitated the inclusion of automated systems within the image workflow.
gProcess and ESIP Platforms for Satellite Imagery Processing over the Grid
NASA Astrophysics Data System (ADS)
Bacu, Victor; Gorgan, Dorian; Rodila, Denisa; Pop, Florin; Neagu, Gabriel; Petcu, Dana
2010-05-01
The Environment oriented Satellite Data Processing Platform (ESIP) is developed through the SEE-GRID-SCI (SEE-GRID eInfrastructure for regional eScience) project, co-funded by the European Commission through FP7 [1]. The gProcess Platform [2] is a set of tools and services supporting the development and execution over the Grid of workflow-based processing, and particularly of satellite imagery processing. ESIP [3], [4] is built on top of the gProcess platform by adding a set of satellite image processing software modules and meteorological algorithms. Satellite images can reveal and supply important information on earth surface parameters, climate data, pollution levels and weather conditions that can be used in different research areas. Generally, satellite image processing algorithms can be decomposed into a set of modules that form a graph representation of the processing workflow. Two types of workflows can be defined in the gProcess platform: the abstract workflow (PDG - Process Description Graph), in which the user defines the algorithm conceptually, and the instantiated workflow (iPDG - instantiated PDG), which is the mapping of the PDG pattern onto particular satellite image and meteorological data [5]. The gProcess platform allows the definition of complex workflows by combining data resources, operators, services and sub-graphs. The gProcess platform is developed for the gLite middleware that is available in the EGEE and SEE-GRID infrastructures [6]. gProcess exposes its functionality through web services [7]. The Editor Web Service retrieves information on the available resources that are used to develop complex workflows (available operators, sub-graphs, services, supported resources, etc.). The Manager Web Service deals with resource management (uploading new resources such as workflows, operators, services, data, etc.) and also retrieves information on workflows.
The Executor Web Service manages the execution of the instantiated workflows on the Grid infrastructure. In addition, this web service monitors the execution and generates statistical data used to evaluate performance and optimize execution. The Viewer Web Service allows access to input and output data. To prove and validate the utility of the gProcess and ESIP platforms, the GreenView and GreenLand applications were developed. GreenView functionality includes the refinement of some meteorological data, such as temperature, and the calibration of satellite images based on field measurements. The GreenLand application classifies satellite images using a set of vegetation indices. The gProcess and ESIP platforms are also used in the GiSHEO project [8] to support the processing of Earth Observation data over the Grid in eGLE (GiSHEO eLearning Environment). Performance assessment experiments revealed that workflow-based execution could improve the execution time of a satellite image processing algorithm [9]. However, executing every workflow node on a different machine is not a reliable solution. Some nodes are more time-consuming than others and take longer to complete, so they slow down the workflow and increase the total execution time. It is therefore important to balance the workflow nodes correctly. Depending on the optimization strategy, workflow nodes can be grouped horizontally, vertically, or in a hybrid manner. The grouped operators are then executed on one machine, which also reduces data transfer between workflow nodes. The dynamic nature of the Grid infrastructure makes it more prone to failures, which can occur at worker nodes, in service availability, at storage elements, etc.
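Vertical grouping can be sketched as collapsing linear chains of the workflow graph, so that each chain runs on one machine and its intermediate results never cross the network. The rule below (merge a -> b when a has exactly one successor and b exactly one predecessor) is an assumed, simplified strategy chosen for illustration, not the optimization strategy used by gProcess; horizontal and hybrid grouping are not shown, and the node names are hypothetical.

```python
from collections import defaultdict


def vertical_groups(deps):
    """Merge maximal linear chains of a workflow DAG into groups.

    deps maps every node (sources included, with an empty list) to the nodes
    it depends on. An edge a -> b is collapsed when a has exactly one
    successor and b has exactly one predecessor, so the data passed along
    that edge never has to leave the machine running the group.
    """
    succs = defaultdict(list)
    for node, preds in deps.items():
        for p in preds:
            succs[p].append(node)

    def continues_chain(node):
        preds = deps[node]
        return len(preds) == 1 and preds[0] in deps and len(succs[preds[0]]) == 1

    groups = []
    for node in deps:
        if continues_chain(node):
            continue  # node is appended to its predecessor's chain instead
        chain = [node]
        while len(succs[chain[-1]]) == 1 and continues_chain(succs[chain[-1]][0]):
            chain.append(succs[chain[-1]][0])
        groups.append(chain)
    return groups


workflow = {
    "read": [],
    "ndvi": ["read"],
    "mask": ["ndvi"],
    "stats": ["ndvi"],
    "report": ["mask", "stats"],
}
print(vertical_groups(workflow))
# [['read', 'ndvi'], ['mask'], ['stats'], ['report']]
```

Here only `read -> ndvi` is merged: `ndvi` fans out to two nodes and `report` joins two, so those edges are left for a horizontal or hybrid strategy to handle.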
Currently, gProcess supports some basic error prevention and error management solutions. In the future, more advanced error prevention and management solutions will be integrated into the gProcess platform.
References
[1] SEE-GRID-SCI Project, http://www.see-grid-sci.eu/
[2] Bacu V., Stefanut T., Rodila D., Gorgan D., Process Description Graph Composition by gProcess Platform. HiPerGRID - 3rd International Workshop on High Performance Grid Middleware, 28 May, Bucharest. Proceedings of CSCS-17 Conference, Vol.2., ISSN 2066-4451, pp. 423-430, (2009).
[3] ESIP Platform, http://wiki.egee-see.org/index.php/JRA1_Commonalities
[4] Gorgan D., Bacu V., Rodila D., Pop Fl., Petcu D., Experiments on ESIP - Environment oriented Satellite Data Processing Platform. SEE-GRID-SCI User Forum, 9-10 Dec 2009, Bogazici University, Istanbul, Turkey, ISBN: 978-975-403-510-0, pp. 157-166 (2009).
[5] Radu, A., Bacu, V., Gorgan, D., Diagrammatic Description of Satellite Image Processing Workflow. Workshop on Grid Computing Applications Development (GridCAD) at the SYNASC Symposium, 28 September 2007, Timisoara, IEEE Computer Press, ISBN 0-7695-3078-8, 2007, pp. 341-348 (2007).
[6] Gorgan D., Bacu V., Stefanut T., Rodila D., Mihon D., Grid based Satellite Image Processing Platform for Earth Observation Applications Development. IDAACS'2009 - IEEE Fifth International Workshop on "Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications", 21-23 September, Cosenza, Italy, IEEE Published in Computer Press, 247-252 (2009).
[7] Rodila D., Bacu V., Gorgan D., Integration of Satellite Image Operators as Workflows in the gProcess Application. Proceedings of ICCP2009 - IEEE 5th International Conference on Intelligent Computer Communication and Processing, 27-29 Aug, 2009 Cluj-Napoca. ISBN: 978-1-4244-5007-7, pp. 355-358 (2009).
[8] GiSHEO consortium, Project site, http://gisheo.info.uvt.ro
[9] Bacu V., Gorgan D., Graph Based Evaluation of Satellite Imagery Processing over Grid. ISPDC 2008 - 7th International Symposium on Parallel and Distributed Computing, July 1-5, 2008, Krakow, Poland. IEEE Computer Society 2008, ISBN: 978-0-7695-3472-5, pp. 147-154.
Managing and Communicating Operational Workflow
Weinberg, Stuart T.; Danciu, Ioana; Unertl, Kim M.
2016-01-01
Background: Healthcare team members in emergency department contexts have used electronic whiteboard solutions to help manage operational workflow for many years. Ambulatory clinic settings have highly complex operational workflow, but still have limited electronic assistance for communicating and coordinating work activities. Objective: To describe and discuss the design, implementation, use, and ongoing evolution of a coordination and collaboration tool supporting ambulatory clinic operational workflow at Vanderbilt University Medical Center (VUMC). Methods: The outpatient whiteboard tool was initially designed to support healthcare work related to an electronic chemotherapy order-entry application. After a highly successful initial implementation in an oncology context, high demand for the outpatient whiteboard emerged across the organization. Over the past 10 years, developers have followed an iterative user-centered design process to evolve the tool. Results: The electronic outpatient whiteboard system supports 194 separate whiteboards and is accessed by over 2800 distinct users on a typical day. Clinics can configure their whiteboards to support unique workflow elements. Since the initial release, features such as immunization clinical decision support have been integrated into the system based on requests from end users. Conclusions: The success of the electronic outpatient whiteboard demonstrates the usefulness of an operational workflow tool within the ambulatory clinic setting. Operational workflow tools can play a significant role in supporting coordination, collaboration, and teamwork in ambulatory healthcare settings. PMID:27081407
Musinguzi, Henry; Lwanga, Newton; Kezimbira, Dafala; Kigozi, Edgar; Katabazi, Fred Ashaba; Wayengera, Misaki; Joloba, Moses Lutaakome; Abayomi, Emmanuel Akin; Swanepoel, Carmen; Croxton, Talishiea; Ozumba, Petronilla; Thankgod, Anazodo; van Zyl, Lizelle; Mayne, Elizabeth Sarah; Kader, Mukthar; Swartz, Garth
2017-01-01
Biorepositories in Africa need significant infrastructural support to meet the International Society for Biological and Environmental Repositories (ISBER) Best Practices for supporting population-based genomics research. ISBER recommends a biorepository information management system that can manage workflows from biospecimen receipt to distribution. The H3Africa Initiative set out to develop regional African biorepositories, and Uganda, Nigeria, and South Africa were awarded grants to develop state-of-the-art biorepositories. The biorepositories carried out an elaborate process to evaluate and choose a laboratory information management system (LIMS) with the aim of integrating the three geographically distinct sites. In this article, we review the processes, the African experience, and the lessons learned, and make recommendations for choosing a biorepository LIMS in the African context.
The future of scientific workflows
DOE Office of Scientific and Technical Information (OSTI.GOV)
Deelman, Ewa; Peterka, Tom; Altintas, Ilkay
Today’s computational, experimental, and observational sciences rely on computations that involve many related tasks. The success of a scientific mission often hinges on the computer automation of these workflows. In April 2015, the US Department of Energy (DOE) invited a diverse group of domain and computer scientists from national laboratories supported by the Office of Science and the National Nuclear Security Administration, from industry, and from academia to review the workflow requirements of DOE’s science and national security missions, to assess the current state of the art in science workflows, to understand the impact of emerging extreme-scale computing systems on those workflows, and to develop requirements for automated workflow management in future and existing environments. This article is a summary of the opinions of over 50 leading researchers attending this workshop. We highlight use cases, computing systems, and workflow needs, and conclude by summarizing the remaining challenges this community sees that inhibit large-scale scientific workflows from becoming a mainstream tool for extreme-scale science.
A novel approach to optimize workflow in grid-based teleradiology applications.
Yılmaz, Ayhan Ozan; Baykal, Nazife
2016-01-01
This study proposes an infrastructure with a reporting workflow optimization algorithm (RWOA) that interconnects facilities, reporting units, and radiologists on a single access interface, in order to increase the efficiency of the reporting process by decreasing medical report turnaround time, and to increase the quality of medical reports by determining the optimum match between inspections and radiologists in terms of subspecialty, workload, and response time. A workflow-centric network architecture with an enhanced caching, querying, and retrieval mechanism is implemented by seamlessly integrating a Grid Agent and a Grid Manager into conventional digital radiology systems. The inspection and radiologist attributes are modelled using a hierarchical ontology structure. Attribute preferences rated by radiologists and technical experts are formed into reciprocal matrices, and entity weights are calculated using the Analytic Hierarchy Process (AHP). The assignment alternatives are processed by relation-based semantic matching (RBSM) and Integer Linear Programming (ILP). The results are evaluated on both real case applications and simulated process data in terms of subspecialty, response time, and workload success rates. Results obtained using simulated data are compared with the outcomes of applying Round Robin, Shortest Queue, and Random distribution policies. The proposed algorithm is also applied to process data from a real teleradiology application in which medical reporting workflow was based on manual assignments by the chief radiologist for 6225 inspections. RBSM gives the highest subspecialty success rate, and integrating ILP with RBSM ratings, as in RWOA, provides better response time and workload distribution success rates. RWOA-based image delivery also prevents bandwidth-, storage-, or hardware-related stalls and latencies.
When compared with a real case teleradiology application where inspection assignments were performed manually, the proposed solution was found to increase the experience success rate by 13.25%, workload success rate by 63.76% and response time success rate by 120%. The total response time in the real case application data was improved by 22.39%. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
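The AHP weighting step can be sketched as turning a reciprocal pairwise-comparison matrix into normalized weights. The sketch below uses the row geometric-mean method, a common approximation to Saaty's principal-eigenvector method; the abstract does not say which variant RWOA uses, and the criteria and ratings are hypothetical.

```python
import math


def ahp_weights(matrix):
    """Approximate AHP priority weights with the row geometric-mean method.

    matrix[i][j] states how strongly criterion i is preferred over criterion j
    (Saaty's 1-9 scale); a valid pairwise matrix is reciprocal,
    i.e. matrix[j][i] == 1 / matrix[i][j].
    """
    n = len(matrix)
    for i in range(n):
        for j in range(n):
            if not math.isclose(matrix[i][j] * matrix[j][i], 1.0):
                raise ValueError(f"entries ({i},{j}) and ({j},{i}) are not reciprocal")
    geo_means = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo_means)
    return [g / total for g in geo_means]


# Hypothetical ratings for three criteria: subspecialty, workload, response time.
pairwise = [
    [1, 3, 5],
    [1 / 3, 1, 2],
    [1 / 5, 1 / 2, 1],
]
print([round(w, 3) for w in ahp_weights(pairwise)])
```

For a perfectly consistent matrix the geometric-mean and eigenvector methods give identical weights; for inconsistent ratings they can differ slightly, and a full AHP implementation would also report a consistency ratio.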
A GIS-based Model for Natural Gas Data Conversion
NASA Astrophysics Data System (ADS)
Bitik, E.; Seker, D. Z.; Denli, H. H.
2014-12-01
In Turkey, the gas utility sector has undergone major changes in terms of increased competition between gas providers, efforts to improve services, and the application of new technological solutions. This paper discusses the challenges gas companies face in switching from long gas distribution, sales, and maintenance workflows to efficient, IT-driven management of complex spatial and non-spatial information. The aim of this study is to migrate all gas data and information into a GIS environment in order to manage and operate all infrastructure investments with a Utility Management System. The complete data conversion model for the migration was designed and tested during the study. A flowchart was created to transfer the old data layers to the new geodatabase-based structure.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Agarwal, Deborah A.; Faybishenko, Boris; Freedman, Vicky L.
Science data gateways are effective in providing complex science data collections to worldwide user communities. In this paper we describe a gateway for the Advanced Simulation Capability for Environmental Management (ASCEM) framework. Built on top of established web service technologies, the ASCEM data gateway is specifically designed for environmental modeling applications. Its key distinguishing features include: (1) handling of complex spatiotemporal data, (2) offering a variety of selective data access mechanisms, (3) providing state-of-the-art plotting and visualization of spatiotemporal data records, and (4) integrating seamlessly with a distributed workflow system using a RESTful interface. ASCEM project scientists have been using this data gateway since 2011.
Schedule-Aware Workflow Management Systems
NASA Astrophysics Data System (ADS)
Mans, Ronny S.; Russell, Nick C.; van der Aalst, Wil M. P.; Moleman, Arnold J.; Bakker, Piet J. M.
Contemporary workflow management systems offer work-items to users through specific work-lists. Users select the work-items they will perform without having a specific schedule in mind. However, in many environments work needs to be scheduled and performed at particular times. For example, in hospitals many work-items are linked to appointments, e.g., a doctor cannot perform surgery without reserving an operating theater and making sure that the patient is present. One of the problems when applying workflow technology in such domains is the lack of calendar-based scheduling support. In this paper, we present an approach that supports the seamless integration of unscheduled (flow) and scheduled (schedule) tasks. Using CPN Tools, we have developed a specification and simulation model for schedule-aware workflow management systems. Based on this model, a system has been realized that uses YAWL, Microsoft Exchange Server 2007, Outlook, and a dedicated scheduling service. The approach is illustrated using a real-life case study at the AMC hospital in the Netherlands. In addition, we elaborate on the experiences obtained when developing and implementing a system of this scale using formal techniques.
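The surgery example (a theater, a doctor, and the patient must all be available at once) can be sketched as finding the earliest time slot that is free in every required calendar. This is a hypothetical illustration of calendar-based scheduling support, not the YAWL/Exchange scheduling service described above; the resource names, the 15-minute scan step, and the in-memory calendars are assumptions.

```python
from datetime import datetime, timedelta


def earliest_common_slot(busy, resources, duration, day_start, day_end,
                         step=timedelta(minutes=15)):
    """Return the earliest start at which every resource is free for `duration`.

    busy maps a resource name to its existing (start, end) reservations.
    Candidate start times are scanned in `step` increments; None means no slot.
    """
    start = day_start
    while start + duration <= day_end:
        end = start + duration
        # Two intervals overlap when each one starts before the other ends.
        clash = any(s < end and start < e
                    for r in resources for s, e in busy.get(r, []))
        if not clash:
            return start
        start += step
    return None


day = datetime(2024, 5, 6)
busy = {
    "theater_2": [(day.replace(hour=8), day.replace(hour=10))],
    "dr_jansen": [(day.replace(hour=10), day.replace(hour=11))],
    "patient_17": [],
}
slot = earliest_common_slot(busy, ["theater_2", "dr_jansen", "patient_17"],
                            timedelta(hours=1), day.replace(hour=8), day.replace(hour=17))
print(slot)  # 2024-05-06 11:00:00
```

In a schedule-aware system, such a scheduled task would only be offered on a work-list once a slot like this has been reserved, whereas unscheduled (flow) tasks can be picked up at any time.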
Automated lattice data generation
NASA Astrophysics Data System (ADS)
Ayyar, Venkitesh; Hackett, Daniel C.; Jay, William I.; Neil, Ethan T.
2018-03-01
The process of generating ensembles of gauge configurations (and measuring various observables over them) can be tedious and error-prone when done "by hand". In practice, most of this procedure can be automated with the use of a workflow manager. We discuss how this automation can be accomplished using Taxi, a minimal Python-based workflow manager built for generating lattice data. We present a case study demonstrating this technology.
qPortal: A platform for data-driven biomedical research.
Mohr, Christopher; Friedrich, Andreas; Wojnar, David; Kenar, Erhan; Polatkan, Aydin Can; Codrea, Marius Cosmin; Czemmel, Stefan; Kohlbacher, Oliver; Nahnsen, Sven
2018-01-01
Modern biomedical research aims at drawing biological conclusions from large, highly complex biological datasets. It has become common practice to make extensive use of high-throughput technologies that produce large amounts of heterogeneous data. In addition to ever-improving accuracy, methods are getting faster and cheaper, resulting in a steadily increasing need for scalable data management and easily accessible means of analysis. We present qPortal, a platform providing users with an intuitive way to manage and analyze quantitative biological data. The backend leverages a variety of concepts and technologies, such as relational databases, data stores, data models, and means of data transfer, as well as front-end solutions that give users access to data management and easy-to-use analysis options. Users are empowered to conduct their experiments through the platform, from the experimental design to the visualization of their results. Here, we illustrate the feature-rich portal by simulating a biomedical study based on publicly available data. We demonstrate the software's strength in supporting the entire project life cycle. The software supports project design and registration, empowers users to do all-digital project management, and finally provides means to perform analysis. We compare our approach to Galaxy, one of the most widely used scientific workflow and analysis platforms in computational biology. Applying both systems to a small case study shows the differences between a data-driven approach (qPortal) and a workflow-driven approach (Galaxy). qPortal, a one-stop-shop solution for biomedical projects, offers up-to-date analysis pipelines, quality control workflows, and visualization tools. Through intensive user interactions, appropriate data models have been developed.
These models form the foundation of our biological data management system and make it possible to annotate data, query metadata for statistics, and perform future re-analysis on high-performance computing systems by coupling workflow management systems. Integrating project and data management as well as workflow resources in one place presents clear advantages over existing solutions.
Big Data Challenges in Global Seismic 'Adjoint Tomography' (Invited)
NASA Astrophysics Data System (ADS)
Tromp, J.; Bozdag, E.; Krischer, L.; Lefebvre, M.; Lei, W.; Smith, J.
2013-12-01
The challenge of imaging Earth's interior on a global scale is closely linked to the challenge of handling large data sets. The related iterative workflow involves five distinct phases, namely, 1) data gathering and culling, 2) synthetic seismogram calculations, 3) pre-processing (time-series analysis and time-window selection), 4) data assimilation and adjoint calculations, 5) post-processing (pre-conditioning, regularization, model update). To implement this workflow on modern high-performance computing systems, a new seismic data format is being developed. The Adaptable Seismic Data Format (ASDF) is designed to replace currently used data formats with a more flexible format that allows for fast parallel I/O. The metadata is divided into abstract categories, such as "source" and "receiver", along with provenance information for complete reproducibility. ASDF is designed with three distinct applications in mind: earthquake seismology, seismic interferometry, and exploration seismology. Existing time-series analysis toolkits, such as SAC and ObsPy, can be easily interfaced with ASDF so that seismologists can use robust, previously developed software packages. ASDF accommodates an automated, efficient workflow for global adjoint tomography. Manually managing the large number of simulations associated with the workflow can rapidly become a burden, especially with increasing numbers of earthquakes and stations. Therefore, it is important to investigate the possibility of automating the entire workflow. Scientific Workflow Management Software (SWfMS) allows users to execute workflows almost routinely. SWfMS provides additional advantages. In particular, it is possible to group independent simulations in a single job to fit the available computational resources. SWfMS also provides a basic level of fault resilience, as the workflow can be resumed at the correct state preceding a failure.
Some of the best candidates for our particular workflow are Kepler and Swift, and the latter appears to be the most serious candidate for a large-scale workflow on a single supercomputer, remaining sufficiently simple to accommodate further modifications and improvements.
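The two SWfMS advantages named above, packing independent simulations into jobs and resuming after a failure, can be sketched in plain Python. This is not Kepler or Swift code: first-fit decreasing packing, a JSON checkpoint file, and the per-earthquake node counts are assumptions made for illustration.

```python
import json
from pathlib import Path


def pack_jobs(simulations, nodes_per_job):
    """Group independent simulations into batch jobs with first-fit decreasing.

    simulations maps a simulation name to the number of nodes it needs.
    """
    jobs = []  # each entry: [nodes_used, [simulation names]]
    for name, nodes in sorted(simulations.items(), key=lambda kv: -kv[1]):
        if nodes > nodes_per_job:
            raise ValueError(f"{name} needs more nodes than one job provides")
        for job in jobs:
            if job[0] + nodes <= nodes_per_job:
                job[0] += nodes
                job[1].append(name)
                break
        else:  # no existing job had room
            jobs.append([nodes, [name]])
    return [names for _, names in jobs]


def run_resumable(phases, state_file):
    """Run phases in order, recording each finished phase in state_file.

    After a failure, calling this again skips the phases already completed.
    """
    path = Path(state_file)
    done = json.loads(path.read_text()) if path.exists() else []
    for name, action in phases:
        if name in done:
            continue
        action()
        done.append(name)
        path.write_text(json.dumps(done))  # checkpoint after every phase
    return done


print(pack_jobs({"eq1": 4, "eq2": 4, "eq3": 2, "eq4": 6}, nodes_per_job=8))
# [['eq4', 'eq3'], ['eq1', 'eq2']]
```

The phases passed to `run_resumable` could be the five phases listed above (gathering, synthetics, pre-processing, adjoint calculations, post-processing); checkpointing only at phase boundaries is a deliberately coarse choice for this sketch.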
Haston, Elspeth; Cubey, Robert; Pullan, Martin; Atkins, Hannah; Harris, David J
2012-01-01
Digitisation programmes in many institutes frequently involve disparate and irregular funding, diverse selection criteria and scope, and different members of staff managing and operating the processes. These factors influenced the decision at the Royal Botanic Garden Edinburgh to develop an integrated workflow for the digitisation of herbarium specimens that is modular and scalable, enabling a single overall workflow to be used for all digitisation projects. This integrated workflow comprises three principal elements: a specimen workflow, a data workflow, and an image workflow. The specimen workflow is strongly linked to curatorial processes, which affect the prioritisation, selection, and preparation of the specimens. The importance of including a conservation element within the digitisation workflow is highlighted. The data workflow includes the concept of three main categories of collection data: label data, curatorial data, and supplementary data. Each category of data is shown to have its own properties, which influence the timing of data capture within the workflow. Software has been developed for the rapid capture of curatorial data, and optical character recognition (OCR) software is being used to increase the efficiency of capturing label data and supplementary data. The large number and size of the images have necessitated the inclusion of automated systems within the image workflow. PMID:22859881
A lightweight, flow-based toolkit for parallel and distributed bioinformatics pipelines
2011-01-01
Background: Bioinformatic analyses typically proceed as chains of data-processing tasks. A pipeline, or 'workflow', is a well-defined protocol, with a specific structure defined by the topology of data-flow interdependencies, and a particular functionality arising from the data transformations applied at each step. In computer science, the dataflow programming (DFP) paradigm defines software systems constructed in this manner, as networks of message-passing components. Thus, bioinformatic workflows can be naturally mapped onto DFP concepts. Results: To enable the flexible creation and execution of bioinformatics dataflows, we have written a modular framework for parallel pipelines in Python ('PaPy'). A PaPy workflow is created from reusable components connected by data-pipes into a directed acyclic graph, which together define nested higher-order map functions. The successive functional transformations of input data are evaluated on flexibly pooled compute resources, either local or remote. Input items are processed in batches of adjustable size, allowing one to tune the trade-off between parallelism and lazy evaluation (memory consumption). An add-on module ('NuBio') facilitates the creation of bioinformatics workflows by providing domain-specific data containers (e.g., for biomolecular sequences, alignments, structures) and functionality (e.g., to parse/write standard file formats). Conclusions: PaPy offers a modular framework for the creation and deployment of parallel and distributed data-processing workflows. Pipelines derive their functionality from user-written, data-coupled components, so PaPy can also be viewed as a lightweight toolkit for extensible, flow-based bioinformatics data processing. The simplicity and flexibility of distributed PaPy pipelines may help users bridge the gap between traditional desktop/workstation and grid computing.
PaPy is freely distributed as open-source Python code at http://muralab.org/PaPy, and includes extensive documentation and annotated usage examples. PMID:21352538
A lightweight, flow-based toolkit for parallel and distributed bioinformatics pipelines.
Cieślik, Marcin; Mura, Cameron
2011-02-25
Bioinformatic analyses typically proceed as chains of data-processing tasks. A pipeline, or 'workflow', is a well-defined protocol, with a specific structure defined by the topology of data-flow interdependencies, and a particular functionality arising from the data transformations applied at each step. In computer science, the dataflow programming (DFP) paradigm defines software systems constructed in this manner, as networks of message-passing components. Thus, bioinformatic workflows can be naturally mapped onto DFP concepts. To enable the flexible creation and execution of bioinformatics dataflows, we have written a modular framework for parallel pipelines in Python ('PaPy'). A PaPy workflow is created from reusable components connected by data-pipes into a directed acyclic graph, which together define nested higher-order map functions. The successive functional transformations of input data are evaluated on flexibly pooled compute resources, either local or remote. Input items are processed in batches of adjustable size, allowing one to tune the trade-off between parallelism and lazy evaluation (memory consumption). An add-on module ('NuBio') facilitates the creation of bioinformatics workflows by providing domain-specific data containers (e.g., for biomolecular sequences, alignments, structures) and functionality (e.g., to parse/write standard file formats). PaPy offers a modular framework for the creation and deployment of parallel and distributed data-processing workflows. Pipelines derive their functionality from user-written, data-coupled components, so PaPy can also be viewed as a lightweight toolkit for extensible, flow-based bioinformatics data processing. The simplicity and flexibility of distributed PaPy pipelines may help users bridge the gap between traditional desktop/workstation and grid computing.
PaPy is freely distributed as open-source Python code at http://muralab.org/PaPy, and includes extensive documentation and annotated usage examples.
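The batching trade-off described above can be sketched with the Python standard library. This does not use PaPy's own API: it composes components into a single linear pipe (not a full DAG), and it uses a thread pool because the composed closure cannot be pickled for a process pool; `gc_content` and the sequences are hypothetical.

```python
from itertools import islice
from multiprocessing.pool import ThreadPool


def batched(iterable, size):
    """Yield successive lists of at most `size` items, reading input lazily."""
    it = iter(iterable)
    while batch := list(islice(it, size)):
        yield batch


def compose(*components):
    """Chain components into one function: the output of each feeds the next."""
    def pipeline(item):
        for component in components:
            item = component(item)
        return item
    return pipeline


def run_pipeline(items, components, pool, batch_size):
    """Evaluate the pipeline batch by batch on a pool of workers.

    Only `batch_size` items are materialised at a time: larger batches give
    more parallelism, smaller ones keep memory use closer to lazy evaluation.
    """
    pipeline = compose(*components)
    for batch in batched(items, batch_size):
        yield from pool.map(pipeline, batch)


def gc_content(seq):
    """Fraction of G and C bases in an upper-case DNA sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)


with ThreadPool(processes=2) as pool:
    print(list(run_pipeline(["acgt", "ggcc", "atat"], [str.upper, gc_content], pool, batch_size=2)))
    # [0.5, 1.0, 0.0]
```

`pool.map` preserves input order, so results come back in the same order as the sequences even though items within a batch may finish out of order.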
Schlesinger, Joseph J; Burdick, Kendall; Baum, Sarah; Bellomy, Melissa; Mueller, Dorothee; MacDonald, Alistair; Chern, Alex; Chrouser, Kristin; Burger, Christie
2018-03-01
The concept of clinical workflow borrows from management and leadership principles outside of medicine. The only way to rethink clinical workflow is to understand the neuroscience principles that underlie attention and vigilance. With any implementation to improve practice, there are human factors that can promote or impede progress. Modulating the environment and working as a team to take care of patients is paramount. Clinicians must continually rethink clinical workflow, evaluate progress, and understand that other industries have something to offer. Then, novel approaches can be implemented to take the best care of patients. Copyright © 2017 Elsevier Inc. All rights reserved.
Prototype of Kepler Processing Workflows For Microscopy And Neuroinformatics
Astakhov, V.; Bandrowski, A.; Gupta, A.; Kulungowski, A.W.; Grethe, J.S.; Bouwer, J.; Molina, T.; Rowley, V.; Penticoff, S.; Terada, M.; Wong, W.; Hakozaki, H.; Kwon, O.; Martone, M.E.; Ellisman, M.
2016-01-01
We report on progress in employing the Kepler workflow engine to prototype “end-to-end” application integration workflows that concern data coming from microscopes deployed at the National Center for Microscopy Imaging Research (NCMIR). This system is built upon the mature code base of the Cell Centered Database (CCDB) and the integrated Rule-Oriented Data System (iRODS) for distributed storage. It provides integration with external projects such as the Whole Brain Catalog (WBC) and the Neuroscience Information Framework (NIF), which benefit from NCMIR data. We also report on specific workflows that spawn from the main workflows and perform data fusion and orchestration of Web services specific to the NIF project. This “Brain data flow” presents a user with categorized information about sources that have information on various brain regions. PMID:28479932
MyOcean Central Information System - Achievements and Perspectives
NASA Astrophysics Data System (ADS)
Claverie, Vincent; Loubrieu, Thomas; Jolibois, Tony; de Dianous, Rémi; Blower, Jon; Romero, Laia; Griffiths, Guy
2013-04-01
Since 2009, MyOcean (http://www.myocean.eu) has been providing an operational service for forecasts, analyses, and expertise on ocean currents, temperature, salinity, sea level, primary ecosystems, and ice coverage. Observation and forecasting data are produced by 42 Production Units (PUs). Product download and visualisation are hosted by 25 Dissemination Units (DUs). All these products and associated services are gathered in a single catalogue that hides the intricate distributed organization of PUs and DUs. Besides applying the INSPIRE directive and OGC recommendations, MyOcean has had to make technical choices and overcome technical challenges. This presentation focuses on three specific issues met by MyOcean that are relevant to many Spatial Data Infrastructures: user transaction accounting, large-volume download, and streamlining catalogue maintenance. Transaction accounting: set up powerful means of obtaining detailed knowledge of system usage in order to subsequently improve the offer of products (ocean observation, analysis, and forecast datasets) and services (view, download). This subject drives the following ones: central authentication management for the distributed web service implementations (an add-on to THREDDS Data Server for the WMS and NetCDF sub-setting services, and a specific FTP); sharing user management with co-funding projects, since in addition to MyOcean, other projects also need consolidated information about the use of the co-funded products; and providing a central facility for user management, which grants users' rights to geographically distributed services and gathers transaction accounting history from these distributed services. Large-volume download: propose a user-friendly web interface for downloading large volumes of data (several gigabytes) that is as robust as basic FTP but intuitive and independent of files and directories.
This should rely on a web service based on the draft INSPIRE specification and OGC recommendations for download, since FTP servers are not user-friendly enough (users need to know file names and directories) and web pages do not allow downloading several files at once. Catalogue maintenance: streamline the maintenance of the central catalogue. The major update for MyOcean v3 (April 2013) is the use of GeoNetwork for catalogue management. This improves the system at several levels: the editing interface is more user-friendly, and catalogue updates are managed in a workflow. This workflow allows greater flexibility for minor updates without giving up the high-level qualification requirements for the catalogue content. The distributed web services (download, view) are automatically harvested from the THREDDS Data Server. Thus, manual editing of the catalogue is reduced, the associated typos are avoided, and the quality of information is improved.
Workflow Management for Complex HEP Analyses
NASA Astrophysics Data System (ADS)
Erdmann, M.; Fischer, R.; Rieger, M.; von Cube, R. F.
2017-10-01
We present Analysis Workflow Management (AWM), a novel system that provides users with the tools and capabilities of professional large-scale workflow systems, e.g., Apache Airavata [1]. The approach represents a paradigm shift from executing parts of the analysis to defining the analysis. Within AWM, an analysis consists of steps. For example, a step may specify that a certain executable is run for multiple files of an input data collection. Each call to the executable for one of those input files can be submitted to the desired run location, which could be the local computer or a remote batch system. An integrated software manager enables automated installation of the user's dependencies in the working directory at the run location. Each execution of a step item creates one report for bookkeeping purposes, containing error codes and output data or file references. Required files, e.g., those created by previous steps, are retrieved automatically. Since data storage and run locations are exchangeable from the step's perspective, computing resources can be used opportunistically. A visualization of the workflow as a graph of steps in the web browser provides a high-level view of the analysis. The workflow system is developed and tested alongside a ttbb cross section measurement where, for instance, the event selection is represented by one step and a Bayesian statistical inference is performed by another. The clear interfaces and dependencies between steps enable a make-like execution of the whole analysis.
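Make-like execution can be sketched as a recursive build over step dependencies: a step runs only after the steps it requires, steps with existing outputs are skipped, and every executed step leaves one bookkeeping report. This is not AWM's interface; outputs are cached in memory rather than as files at a run location, and the step names and actions (including counting events as a stand-in for the statistical inference) are hypothetical.

```python
def make(target, steps, outputs, reports, _active=None):
    """Build `target` after its required steps, make-style.

    steps maps a step name to (required step names, action); the action
    receives the outputs of the required steps. Finished outputs are cached
    in `outputs`, so steps that already ran are skipped; every executed step
    appends one bookkeeping record to `reports`.
    """
    active = set() if _active is None else _active
    if target in outputs:
        return outputs[target]
    if target in active:
        raise ValueError(f"dependency cycle involving {target!r}")
    active.add(target)
    try:
        requires, action = steps[target]
        inputs = [make(r, steps, outputs, reports, active) for r in requires]
        try:
            outputs[target] = action(*inputs)
        except Exception as exc:
            reports.append({"step": target, "exit_code": 1, "error": str(exc)})
            raise
        reports.append({"step": target, "exit_code": 0})
        return outputs[target]
    finally:
        active.discard(target)


steps = {
    "ntuples": ([], lambda: [3.1, 5.2, 8.4, 12.0]),
    "selection": (["ntuples"], lambda events: [e for e in events if e > 5]),
    "inference": (["selection"], lambda selected: len(selected)),
}
outputs, reports = {}, []
print(make("inference", steps, outputs, reports))  # 3
make("inference", steps, outputs, reports)  # cached: nothing re-runs
print([r["step"] for r in reports])  # ['ntuples', 'selection', 'inference']
```

Because only missing outputs trigger execution, deleting one cached output (for example `outputs.pop("selection")`) and calling `make` again re-runs just that step.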
Dispel4py: An Open-Source Python library for Data-Intensive Seismology
NASA Astrophysics Data System (ADS)
Filgueira, Rosa; Krause, Amrey; Spinuso, Alessandro; Klampanos, Iraklis; Danecek, Peter; Atkinson, Malcolm
2015-04-01
Scientific workflows are a necessary tool for many scientific communities because they enable easy composition and execution of applications on computing resources, while scientists can focus on their research without being distracted by computation management. Nowadays, scientific communities (e.g. seismology) have access to a large variety of computing resources, and their computational problems are best addressed using parallel computing technology. However, successful use of these technologies requires a lot of additional machinery whose use is not straightforward for non-experts: different parallel frameworks (MPI, Storm, multiprocessing, etc.) must be used depending on the computing resources (local machines, grids, clouds, clusters) where applications are run. To achieve the best application performance, users therefore usually have to change their code depending on the features of the platform selected for running it. This work presents dispel4py, a new open-source Python library for describing abstract stream-based workflows for distributed data-intensive applications. Special care has been taken to give dispel4py the ability to map abstract workflows to different platforms dynamically at run time. Currently dispel4py has four mappings: Apache Storm, MPI, multi-threading and sequential. The main goal of dispel4py is to provide an easy-to-use tool to develop and test workflows on local resources, using the sequential mode with a small dataset. Later, once a workflow is ready for long runs, it can be automatically executed on different parallel resources. dispel4py takes care of the underlying mappings by performing an efficient parallelisation. Processing Elements (PEs) represent the basic computational activities of any dispel4py workflow, which can be a seismological algorithm or a data transformation process.
To create a dispel4py workflow, users only have to write a few lines of Python code describing their PEs and how they are connected; Python is widely supported on many platforms and is popular in many scientific domains, such as the geosciences. Once a dispel4py workflow is written, a user only has to select which mapping to use, and everything else (parallelisation, distribution of data) is carried out by dispel4py at no cost to the user. Among all dispel4py features, we would like to highlight the following:
* The PEs are connected by streams rather than by writing to and reading from intermediate files, which avoids many I/O operations.
* The PEs can be stored in a registry, so different users can recombine PEs in many different workflows.
* dispel4py has been enriched with a provenance mechanism to support runtime provenance analysis. We have adopted the W3C PROV data model, which is accessible via a prototype browser-based user interface and a web API. It supports users with the visualisation of graphical products and offers combined operations to access and download the data, which may be selectively stored at runtime in dedicated data archives.
dispel4py has already been used by seismologists in the VERCE project to develop different seismic workflows. One of them is the Seismic Ambient Noise Cross-Correlation workflow, which preprocesses and cross-correlates traces from several stations. This workflow was first tested on a local machine using a small number of stations as input data. Later, it was executed on different parallel platforms (the SuperMUC cluster and the Terracorrelator machine), automatically scaling up with the MPI and multiprocessing mappings and up to 1000 stations as input data. The results show that dispel4py achieves scalable performance with both mappings on different parallel platforms.
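The idea of PEs connected by streams, run under a sequential mapping, can be illustrated in plain Python. This sketch deliberately does not use the dispel4py API: each hypothetical PE is a generator that consumes one stream and yields another, and `run_sequential` chains them so that no intermediate files are written. The station names and synthetic samples are invented for illustration.

```python
from typing import Callable, Iterable, Iterator

# A hypothetical "processing element": consumes a stream, yields a stream.
PE = Callable[[Iterable], Iterator]


def read_traces(station_ids: Iterable) -> Iterator:
    """Source PE: emit a (station, samples) pair per station (synthetic data)."""
    for i, sid in enumerate(station_ids):
        yield sid, [float(i + k) for k in range(4)]


def demean(stream: Iterable) -> Iterator:
    """Transformation PE: remove the mean from each trace."""
    for sid, samples in stream:
        mean = sum(samples) / len(samples)
        yield sid, [s - mean for s in samples]


def peak_amplitude(stream: Iterable) -> Iterator:
    """Reduction PE: emit the largest absolute sample of each trace."""
    for sid, samples in stream:
        yield sid, max(abs(s) for s in samples)


def run_sequential(source: Iterable, *pes: PE) -> list:
    """Sequential 'mapping': connect PEs by streams and drain the last one."""
    stream: Iterable = source
    for pe in pes:
        stream = pe(stream)
    return list(stream)


print(run_sequential(read_traces(["ST01", "ST02"]), demean, peak_amplitude))
# [('ST01', 1.5), ('ST02', 1.5)]
```

A parallel mapping would keep the same PE definitions and change only how `run_sequential` distributes the stream items, which is the separation dispel4py is described as providing.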
A Workflow-based Intelligent Network Data Movement Advisor with End-to-end Performance Optimization
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhu, Michelle M.; Wu, Chase Q.
2013-11-07
Next-generation eScience applications often generate large amounts of simulation, experimental, or observational data that must be shared and managed by collaborative organizations. Advanced networking technologies and services have been rapidly developed and deployed to facilitate such massive data transfer. However, these technologies and services have not been fully utilized, mainly because their use typically requires significant domain knowledge and, in many cases, application users are not even aware of their existence. By leveraging the functionalities of an existing Network-Aware Data Movement Advisor (NADMA) utility, we propose a new Workflow-based Intelligent Network Data Movement Advisor (WINDMA) with end-to-end performance optimization for this DOE-funded project. The WINDMA system integrates three major components: resource discovery, data movement, and status monitoring, and supports the sharing of common data movement workflows through account and database management. The system provides a web interface and interacts with existing data/space management and discovery services such as Storage Resource Management, transport methods such as GridFTP and GlobusOnline, and network resource provisioning brokers such as ION and OSCARS. We demonstrate the efficacy of the proposed transport-support workflow system in several use cases based on its implementation and deployment in DOE wide-area networks.
From chart tracking to workflow management.
Srinivasan, P.; Vignes, G.; Venable, C.; Hazelwood, A.; Cade, T.
1994-01-01
The current interest in system-wide integration appears to be based on the assumption that an organization, by digitizing information and accepting a common standard for the exchange of such information, will improve the accessibility of this information and automatically experience benefits resulting from its more productive use. We do not dispute this reasoning, but assert that an organization's capacity for effective change is proportional to its personnel's understanding of its current structure. Our workflow manager is based on a Parameterized Petri Net (PPN) model, which can be configured to represent an arbitrarily detailed picture of an organization. The PPN model can be animated to observe the model organization in action, and the results of the animation can be analyzed. This simulation is a dynamic, ongoing process that changes with the system and allows members of the organization to pose "what if" questions as a means of exploring opportunities for change. We present the "workflow management system" as the natural successor to the tracking program, incorporating modeling, scheduling, reactive planning, performance evaluation, and simulation. This workflow management system is more than adequate for meeting the needs of a paper chart tracking system and, as the patient record is computerized, will serve as a planning and evaluation tool in converting the paper-based health information system into a computer-based system. PMID:7950051
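To make "animating" a Petri net concrete, here is a minimal place/transition net for chart tracking. It is a simplification we chose for illustration: the authors' model is a Parameterized Petri Net, whereas this sketch has no parameters, and the places and transitions (`request`, `chart_in_records`, `pull_chart`, `deliver`, and so on) are hypothetical. A transition fires when every input place holds enough tokens; firing moves tokens from input places to output places.

```python
from collections import Counter

# Hypothetical transitions: name -> (tokens consumed per place, tokens produced per place).
TRANSITIONS = {
    "pull_chart": ({"request": 1, "chart_in_records": 1}, {"chart_in_transit": 1}),
    "deliver": ({"chart_in_transit": 1}, {"chart_at_clinic": 1}),
}


def enabled(marking: Counter, name: str) -> bool:
    """A transition is enabled if every input place holds enough tokens."""
    inputs, _ = TRANSITIONS[name]
    return all(marking[place] >= n for place, n in inputs.items())


def fire(marking: Counter, name: str) -> Counter:
    """Return the new marking after firing an enabled transition."""
    if not enabled(marking, name):
        raise ValueError(f"transition {name!r} is not enabled")
    inputs, outputs = TRANSITIONS[name]
    new = Counter(marking)
    new.subtract(inputs)
    new.update(outputs)
    return +new  # unary plus drops places left with zero tokens


def animate(marking: Counter, max_steps: int = 100):
    """Fire enabled transitions (in declaration order) until none is enabled."""
    trace = []
    for _ in range(max_steps):
        ready = [t for t in TRANSITIONS if enabled(marking, t)]
        if not ready:
            break
        marking = fire(marking, ready[0])
        trace.append(ready[0])
    return trace, marking


# "What if" two clinics request charts but only one chart is in medical records?
trace, final = animate(Counter({"request": 2, "chart_in_records": 1}))
print(trace)  # ['pull_chart', 'deliver']
print(final)  # one request is left waiting: the model exposes the bottleneck
```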
Cotes-Ruiz, Iván Tomás; Prado, Rocío P.; García-Galán, Sebastián; Muñoz-Expósito, José Enrique; Ruiz-Reyes, Nicolás
2017-01-01
Nowadays, the growing computational capabilities of Cloud systems rely on reducing the power consumed by their data centers to make them sustainable and economically profitable. The efficient management of computing resources is at the heart of any energy-aware data center, and the adaptation of its performance to the workload is of special relevance. Intensive computing applications in diverse areas of science generate complex workloads called workflows, whose successful management in terms of energy saving is still in its infancy. WorkflowSim is currently one of the most advanced simulators for research on workflow processing, offering advanced features such as task clustering and failure policies. In this work, an expected power-aware extension of WorkflowSim is presented. This new tool integrates a power model based on a computing-plus-communication design to allow the optimization of new energy-saving management strategies that consider computing, reconfiguration and network costs as well as quality of service, and it incorporates the preeminent strategy for on-host energy saving: Dynamic Voltage Frequency Scaling (DVFS). The simulator is designed to be consistent across different real scenarios and to include a wide repertory of DVFS governors. Results showing the validity of the simulator in terms of resource utilization, frequency and voltage scaling, power, energy and time saving are presented. Also, results achieved by the intra-host DVFS strategy with different governors are compared to those of the data center using a recent and successful DVFS-based inter-host scheduling strategy as a mechanism overlapped with the intra-host DVFS technique. PMID:28085932
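The trade-off that DVFS exploits can be shown with the textbook CMOS power model, which we assume here: power P = P_static + C·V²·f, and a CPU-bound task of N cycles takes N/f seconds. Lowering voltage and frequency cuts dynamic power quadratically in V but stretches runtime, so static energy grows. The operating points, capacitance, static power, and the `pick_point` policy below are hypothetical; this is not one of the governors in the simulator and not its power model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatingPoint:
    freq_hz: float
    volts: float


def task_energy(cycles: float, op: OperatingPoint,
                capacitance: float, static_w: float) -> tuple:
    """Return (runtime_s, energy_j) for a CPU-bound task at one DVFS point.

    Assumes P = P_static + C * V^2 * f and runtime = cycles / f
    (no memory or I/O stalls, which real workloads do have).
    """
    runtime = cycles / op.freq_hz
    power = static_w + capacitance * op.volts ** 2 * op.freq_hz
    return runtime, power * runtime


def pick_point(cycles, points, capacitance, static_w, deadline_s):
    """Hypothetical energy-aware policy: lowest-energy point meeting the deadline."""
    feasible = [
        (task_energy(cycles, p, capacitance, static_w)[1], p)
        for p in points
        if cycles / p.freq_hz <= deadline_s
    ]
    return min(feasible, key=lambda e: e[0])[1] if feasible else None


high = OperatingPoint(2.0e9, 1.2)
low = OperatingPoint(1.0e9, 0.9)
for op in (high, low):
    print(op, task_energy(2e9, op, capacitance=1e-8, static_w=10.0))
# high: about 1 s and 38.8 J; low: about 2 s and 36.2 J
```

With a loose deadline the low point wins (slightly less energy); with a tight one only the high point is feasible, which is why a governor has to weigh time against energy rather than always scaling down.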
Scheduling Multilevel Deadline-Constrained Scientific Workflows on Clouds Based on Cost Optimization
Malawski, Maciej; Figiela, Kamil; Bubak, Marian; ...
2015-01-01
This paper presents a cost optimization model for scheduling scientific workflows on IaaS clouds such as Amazon EC2 or RackSpace. We assume multiple IaaS clouds with heterogeneous virtual machine instances, a limited number of instances per cloud, and hourly billing. Input and output data are stored on a cloud object store such as Amazon S3. Applications are scientific workflows modeled as DAGs, as in the Pegasus Workflow Management System. We assume that tasks in the workflows are grouped into levels of identical tasks. Our model is specified using mathematical programming languages (AMPL and CMPL) and allows us to minimize the cost of workflow execution under deadline constraints. We present results obtained using our model and benchmark workflows representing real scientific applications in a variety of domains. The data used for evaluation come from synthetic workflows and from general-purpose cloud benchmarks, as well as from data measured in our own experiments with Montage, an astronomical application, executed on the Amazon EC2 cloud. We indicate how this model can be used for scenarios that require resource planning for scientific workflows and their ensembles.
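For intuition about the cost structure (hourly billing, per-cloud instance limits, a deadline), the sketch below solves a single level of identical tasks by exhaustive search. It is not the authors' AMPL/CMPL mathematical program, which optimizes whole multilevel workflows across clouds; the instance types, prices, and runtimes are hypothetical, tasks are assumed to be spread evenly over the VMs, and each VM is billed for every started hour.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceType:
    name: str
    price_per_hour: float  # billed per started hour
    task_hours: float      # runtime of one task of this level on this type
    max_instances: int     # per-cloud instance limit


def cheapest_plan(n_tasks: int, types: list, deadline_hours: float):
    """Return (cost, type_name, n_vms, runtime_h) minimizing cost, or None.

    Brute force over instance type and VM count for one level of identical tasks.
    """
    best = None
    for t in types:
        for k in range(1, t.max_instances + 1):
            # Identical tasks spread over k VMs: the busiest VM runs ceil(n/k) tasks.
            runtime = math.ceil(n_tasks / k) * t.task_hours
            if runtime > deadline_hours:
                continue
            # Hourly billing: every VM pays for each started hour of the level.
            cost = k * math.ceil(runtime) * t.price_per_hour
            if best is None or cost < best[0]:
                best = (cost, t.name, k, runtime)
    return best


small = InstanceType("small", 0.1, 0.5, 10)
large = InstanceType("large", 0.4, 0.2, 5)
print(cheapest_plan(10, [small, large], deadline_hours=1.0))
# (0.5, 'small', 5, 1.0): five cheap VMs fit the one-hour billing unit exactly
```

Exhaustive search is fine at this toy scale; the point of a mathematical-programming formulation is to handle many levels, clouds, and data-transfer costs together, which this sketch omits.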
Digitization workflows for flat sheets and packets of plants, algae, and fungi
Nelson, Gil; Sweeney, Patrick; Wallace, Lisa E.; Rabeler, Richard K.; Allard, Dorothy; Brown, Herrick; Carter, J. Richard; Denslow, Michael W.; Ellwood, Elizabeth R.; Germain-Aubrey, Charlotte C.; Gilbert, Ed; Gillespie, Emily; Goertzen, Leslie R.; Legler, Ben; Marchant, D. Blaine; Marsico, Travis D.; Morris, Ashley B.; Murrell, Zack; Nazaire, Mare; Neefus, Chris; Oberreiter, Shanna; Paul, Deborah; Ruhfel, Brad R.; Sasek, Thomas; Shaw, Joey; Soltis, Pamela S.; Watson, Kimberly; Weeks, Andrea; Mast, Austin R.
2015-01-01
Effective workflows are essential components in the digitization of biodiversity specimen collections. To date, no comprehensive, community-vetted workflows have been published for digitizing flat sheets and packets of plants, algae, and fungi, even though the latest estimates suggest that only 33% of herbarium specimens have been digitally transcribed, 54% of herbaria use a specimen database, and 24% are imaging specimens. In 2012, iDigBio, the U.S. National Science Foundation’s (NSF) coordinating center and national resource for the digitization of public, nonfederal U.S. collections, launched several working groups to address this deficiency. Here, we report the development of 14 workflow modules with 7–36 tasks each. These workflows represent the combined work of approximately 35 curators, directors, and collections managers representing more than 30 herbaria, including 15 NSF-supported plant-related Thematic Collections Networks and collaboratives. The workflows are provided for download as Portable Document Format (PDF) and Microsoft Word files. Customization of these workflows for specific institutional implementation is encouraged. PMID:26421256
The QuakeSim Project: Web Services for Managing Geophysical Data and Applications
NASA Astrophysics Data System (ADS)
Pierce, Marlon E.; Fox, Geoffrey C.; Aktas, Mehmet S.; Aydin, Galip; Gadgil, Harshawardhan; Qi, Zhigang; Sayar, Ahmet
2008-04-01
We describe our distributed systems research efforts to build the “cyberinfrastructure” components that constitute a geophysical Grid, or more accurately, a Grid of Grids. Service-oriented computing principles are used to build a distributed infrastructure of Web-accessible components for accessing data and scientific applications. Our data services fall into two major categories: archival, database-backed services based on Geographical Information System (GIS) standards from the Open Geospatial Consortium, and streaming services that can be used to filter and route real-time data sources such as Global Positioning System data streams. Execution support services include application execution management services and services for transferring remote files. These data and execution service families are bound together through metadata information and workflow services for service orchestration. Users may access the system through the QuakeSim scientific Web portal, which is built using a portlet component approach.
Real-Time System for Water Modeling and Management
NASA Astrophysics Data System (ADS)
Lee, J.; Zhao, T.; David, C. H.; Minsker, B.
2012-12-01
Working closely with the Texas Commission on Environmental Quality (TCEQ) and the University of Texas at Austin (UT-Austin), we are developing a real-time system for water modeling and management using advanced cyberinfrastructure, data integration and geospatial visualization, and numerical modeling. The state of Texas suffered a severe drought in 2011 that cost the state $7.62 billion in agricultural losses (crops and livestock). Devastating situations such as this could potentially be avoided with better water modeling and management strategies that incorporate state-of-the-art simulation and digital data integration. The goal of the project is to prototype a near-real-time decision support system for river modeling and management in Texas that can serve as a national and international model to promote more sustainable and resilient water systems. The system uses National Weather Service current and predicted precipitation data as input to the Noah-MP Land Surface model, which forecasts runoff, soil moisture, evapotranspiration, and water table levels given land surface features. These results are then used by a river model called RAPID, along with an error model currently under development at UT-Austin, to forecast stream flows in the rivers. Model forecasts are visualized as a Web application for TCEQ decision makers, who issue water diversion (withdrawal) permits and any needed drought restrictions; permit holders; and reservoir operation managers. Users will be able to adjust model parameters to predict the impacts of alternative curtailment scenarios or weather forecasts. A real-time optimization system under development will help TCEQ to identify optimal curtailment strategies that minimize impacts on permit holders and protect health and safety. To develop the system, we have implemented RAPID as a remotely executed modeling service using the Cyberintegrator workflow system, with input data downloaded from the North American Land Data Assimilation System.
The Cyberintegrator workflow system provides RESTful web services for users to provide inputs, execute workflows, and retrieve outputs. Along with REST endpoints, PAW (Publishable Active Workflows) provides the web user interface toolkit for us to develop web applications with scientific workflows. The prototype web application is built on top of workflows with PAW, so that users will have a user-friendly web environment to provide input parameters, execute the model, and visualize/retrieve the results using geospatial mapping tools. In future work the optimization model will be developed and integrated into the workflow.
Workflow Management Systems for Molecular Dynamics on Leadership Computers
NASA Astrophysics Data System (ADS)
Wells, Jack; Panitkin, Sergey; Oleynik, Danila; Jha, Shantenu
Molecular Dynamics (MD) simulations play an important role in a range of disciplines, from materials science to biophysical systems, and account for a large fraction of the cycles consumed on computing resources. Increasingly, science problems require the successful execution of “many” MD simulations as opposed to a single MD simulation. There is a need to provide scalable and flexible approaches to the execution of this workload. We present preliminary results on the Titan computer at the Oak Ridge Leadership Computing Facility that demonstrate a general capability to manage workload execution agnostic of a specific MD simulation kernel or execution pattern, and in a manner that integrates disparate grid-based and supercomputing resources. Our results build upon our extensive experience of distributed workload management in the high-energy physics ATLAS project using PanDA (Production and Distributed Analysis System), coupled with recent conceptual advances in our understanding of workload management on heterogeneous resources. We will discuss how we will generalize these initial capabilities towards a more production-level service on DOE leadership resources. This research is sponsored by US DOE/ASCR and used resources of the OLCF computing facility.
Opportunistic Computing with Lobster: Lessons Learned from Scaling up to 25k Non-Dedicated Cores
NASA Astrophysics Data System (ADS)
Wolf, Matthias; Woodard, Anna; Li, Wenzhao; Hurtado Anampa, Kenyi; Yannakopoulos, Anna; Tovar, Benjamin; Donnelly, Patrick; Brenner, Paul; Lannon, Kevin; Hildreth, Mike; Thain, Douglas
2017-10-01
We previously described Lobster, a workflow management tool for exploiting volatile opportunistic computing resources for computation in HEP. We will discuss the various challenges encountered while scaling up simultaneous CPU core utilization, and the software improvements required to overcome them.
Categories: Workflows can now be divided into categories based on their required system resources. This allows the batch queueing system to optimize the assignment of tasks to nodes with the appropriate capabilities. Within each category, limits can be specified for the number of running jobs to regulate the utilization of communication bandwidth. System resource specifications for a task category can now be modified while a project is running, avoiding the need to restart the project if resource requirements differ from the initial estimates. Lobster now implements time limits on each task category to voluntarily terminate tasks. This allows partially completed work to be recovered.
Workflow dependency specification: One workflow often requires data from other workflows as input. Rather than waiting for earlier workflows to be completed before beginning later ones, Lobster now allows dependent tasks to begin as soon as sufficient input data has accumulated.
Resource monitoring: Lobster uses a new capability in Work Queue to monitor the system resources each task requires, in order to identify bottlenecks and optimally assign tasks.
The capability of the Lobster opportunistic workflow management system for HEP computation has been significantly increased. We have demonstrated efficient utilization of 25 000 non-dedicated cores and achieved a data input rate of 30 Gb/s and an output rate of 500 GB/h. This has required new capabilities in task categorization, workflow dependency specification, and resource monitoring.
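Two of the capabilities described above, per-category limits on running tasks and starting dependent tasks once enough upstream output has accumulated, can be sketched as follows. This is not Lobster or Work Queue code: `dispatch`, `dependent_tasks_ready`, the category names, and the file names are hypothetical, and categories without a configured limit are treated as unlimited.

```python
from collections import Counter


def dispatch(queue: list, running: Counter, limits: dict) -> list:
    """Pick queued tasks to start without exceeding per-category running limits.

    queue: list of (task_id, category) in submission order.
    running: tasks currently running per category.
    limits: category -> maximum simultaneously running tasks (absent = unlimited).
    """
    started, counts = [], Counter(running)
    for task_id, category in queue:
        if counts[category] < limits.get(category, float("inf")):
            counts[category] += 1
            started.append(task_id)
    return started


def dependent_tasks_ready(upstream_outputs: list, files_per_task: int,
                          already_created: int) -> int:
    """How many new downstream tasks the accumulated upstream output can feed.

    Downstream work starts as soon as a full batch of inputs exists,
    instead of waiting for the whole upstream workflow to finish.
    """
    return len(upstream_outputs) // files_per_task - already_created


queue = [("a", "processing"), ("b", "processing"), ("c", "merge"), ("d", "processing")]
print(dispatch(queue, Counter({"processing": 1}), {"processing": 2, "merge": 1}))
# ['a', 'c']: one processing slot was free, and the merge category had room
print(dependent_tasks_ready([f"out{i}.root" for i in range(7)], 3, 1))  # 1
```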
Applying Content Management to Automated Provenance Capture
DOE Office of Scientific and Technical Information (OSTI.GOV)
Schuchardt, Karen L.; Gibson, Tara D.; Stephan, Eric G.
2008-04-10
Workflows and data pipelines are becoming increasingly valuable in both computational and experimental sciences. These automated systems are capable of generating significantly more data within the same amount of time than their manual counterparts. Automatically capturing and recording data provenance and annotation as part of these workflows is critical for data management, verification, and dissemination. Our goal in addressing the provenance challenge was to develop an end-to-end system that demonstrates real-time capture, persistent content management, and ad hoc searches of both provenance and metadata using open-source software and standard protocols. We describe our prototype, which extends the Kepler workflow tools for the execution environment, the Scientific Annotation Middleware (SAM) content management software for data services, and an existing HTTP-based query protocol. Our implementation offers several unique capabilities and, through the use of standards, is able to provide access to the provenance record to a variety of commonly available client tools.
NASA Astrophysics Data System (ADS)
Rynge, M.; Juve, G.; Kinney, J.; Good, J.; Berriman, B.; Merrihew, A.; Deelman, E.
2014-05-01
In this paper, we describe how to leverage cloud resources to generate large-scale mosaics of the Galactic plane in multiple wavelengths. Our goal is to generate a 16-wavelength infrared Atlas of the Galactic Plane at a common spatial sampling of 1 arcsec, processed so that the images appear to have been measured with a single instrument. This will be achieved by using the Montage image mosaic engine to process observations from the 2MASS, GLIMPSE, MIPSGAL, MSX and WISE datasets, over a wavelength range of 1 μm to 24 μm, and by using the Pegasus Workflow Management System to manage the workload. When complete, the Atlas will be made available to the community as a data product. We are generating images that cover ±180° in Galactic longitude and ±20° in Galactic latitude, to the extent permitted by the spatial coverage of each dataset. Each image will be 5°×5° in size (including an overlap of 1° with neighboring tiles), resulting in an atlas of 1,001 images. The final size will be about 50 TB. This paper will focus on the computational challenges, solutions, and lessons learned in producing the Atlas. To manage the computation we are using the Pegasus Workflow Management System, a mature, highly fault-tolerant system, now in release 4.2.2, that has found wide applicability across many science disciplines. A scientific workflow describes the dependencies between tasks; in most cases the workflow is described as a directed acyclic graph, where the nodes are tasks and the edges denote the task dependencies. A defining property of a scientific workflow is that it manages data flow between tasks. Applied to the Galactic plane project, each 5°×5° mosaic is a Pegasus workflow. Pegasus is used to fetch the source images, execute the image mosaicking steps of Montage, and store the final outputs in a storage system. As these workflows are very I/O intensive, care has to be taken when choosing the infrastructure on which to execute them.
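The figure of 1,001 images can be reproduced with simple arithmetic under an assumption of ours that the abstract does not state: tile centers are spaced 4° apart (a 5° tile minus the 1° overlap) and both ends of each range receive a center. That gives 91 longitude positions and 11 latitude positions. Note that this layout places tiles at both -180° and +180°, which are the same meridian, so the project's actual tiling may differ.

```python
def tile_centers(lo: float, hi: float, size: float, overlap: float) -> list:
    """Tile centers from lo to hi inclusive, spaced (size - overlap) degrees apart.

    Assumed layout for illustration; not taken from the project's code.
    """
    step = size - overlap
    n = int(round((hi - lo) / step)) + 1
    return [lo + i * step for i in range(n)]


lon = tile_centers(-180, 180, 5, 1)  # Galactic longitude, 4 deg spacing
lat = tile_centers(-20, 20, 5, 1)    # Galactic latitude, 4 deg spacing
print(len(lon), len(lat), len(lon) * len(lat))  # 91 11 1001
```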
In our setup, we use dynamically provisioned compute clusters running on the Amazon Elastic Compute Cloud (EC2). All our instances use the same base image, which is configured to come up as a master node by default. The master node is a central instance from which the workflow can be managed. Additional worker instances are provisioned and configured to accept work assignments from the master node. The system allows workers to be added or removed in an ad hoc fashion and could be run in large configurations. To date we have performed 245,000 CPU hours of computing and generated 7,029 images totaling 30 TB. With the current setup, the runtime would be 340,000 CPU hours for the whole project. Using spot m2.4xlarge instances, the cost would be approximately $5,950. Using faster AWS instances, such as cc2.8xlarge, could potentially decrease the total CPU hours and further reduce the compute costs. The paper will explore these tradeoffs.
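As a sanity check on the cost estimate, CPU hours convert to instance hours by dividing by the cores per instance. The stated figures are consistent with 8 cores per m2.4xlarge instance and a spot price of about $0.14 per instance-hour; both values are our inference from the numbers, not stated in the abstract, and spot prices vary over time.

```python
def spot_cost(cpu_hours: float, cores_per_instance: int,
              price_per_instance_hour: float) -> float:
    """Estimated cost when every core of each instance is kept busy."""
    instance_hours = cpu_hours / cores_per_instance
    return instance_hours * price_per_instance_hour


# Assumed: 8 cores per m2.4xlarge, about $0.14 per spot instance-hour.
print(round(spot_cost(340_000, 8, 0.14), 2))  # 5950.0
```

The same function makes the trade-off with faster instances explicit: a type with more cores or higher per-core speed lowers the instance hours, and it pays off only if its hourly price rises less than proportionally.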
Implementation of Cyberinfrastructure and Data Management Workflow for a Large-Scale Sensor Network
NASA Astrophysics Data System (ADS)
Jones, A. S.; Horsburgh, J. S.
2014-12-01
Monitoring with in situ environmental sensors and other forms of field-based observation presents many challenges for data management, particularly for large-scale networks consisting of multiple sites, sensors, and personnel. The availability and utility of these data in addressing scientific questions rely on effective cyberinfrastructure that facilitates the transformation of raw sensor data into functional data products. They also depend on the ability of researchers to share and access the data in usable formats. In addition to addressing the challenges presented by the quantity of data, monitoring networks need practices to ensure high data quality, including procedures and tools for post-processing. Data quality is further enhanced if practitioners are able to track equipment, deployments, calibrations, and other events related to site maintenance and associate these details with observational data. In this presentation we will describe the overall workflow that we have developed for research groups and sites conducting long-term monitoring using in situ sensors. Features of the workflow include: software tools to automate the transfer of data from field sites to databases, a Python-based program for data quality control post-processing, a web-based application for online discovery and visualization of data, and a data model and web interface for managing physical infrastructure. By automating the data management workflow, the time from collection to analysis is reduced, and sharing and publication are facilitated. The incorporation of metadata standards and descriptions and the use of open-source tools enhance the sustainability and reusability of the data. We will describe the workflow and tools that we have developed in the context of the iUTAH (innovative Urban Transitions and Aridregion Hydrosustainability) monitoring network.
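Quality-control post-processing of sensor series often starts with range and rate-of-change checks. The sketch below is a generic illustration, not the iUTAH program: the thresholds, the flag names `out_of_range` and `spike`, and the sample water-temperature readings are hypothetical. Values are flagged, never altered, so the raw record is preserved.

```python
from dataclasses import dataclass, field


@dataclass
class QCResult:
    value: float
    flags: list = field(default_factory=list)


def quality_control(values, lo: float, hi: float, max_step: float) -> list:
    """Flag out-of-range readings and sudden jumps between consecutive readings.

    The reading after a spike is flagged too, because it is compared with the
    spiking value; a human reviewer decides which point is actually bad.
    """
    results, prev = [], None
    for v in values:
        flags = []
        if not (lo <= v <= hi):
            flags.append("out_of_range")
        if prev is not None and abs(v - prev) > max_step:
            flags.append("spike")
        results.append(QCResult(v, flags))
        prev = v
    return results


# Hypothetical water temperature series (deg C) with a spike and a bad reading.
for r in quality_control([10.0, 10.5, 25.0, 10.6, -5.0], lo=0.0, hi=30.0, max_step=3.0):
    print(r.value, r.flags)
```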
The iUTAH network consists of aquatic and climate sensors deployed in three watersheds to monitor Gradients Along Mountain to Urban Transitions (GAMUT). The variety of environmental sensors and the multi-watershed, multi-institutional nature of the network necessitate a well-planned and efficient workflow for acquiring, managing, and sharing sensor data, which should be useful for similar large-scale and long-term networks.
Scientific Workflows and the Sensor Web for Virtual Environmental Observatories
NASA Astrophysics Data System (ADS)
Simonis, I.; Vahed, A.
2008-12-01
Virtual observatories are maturing beyond their original domain and becoming common practice for earth observation research and policy building. The term Virtual Observatory originally came from the astronomical research community. There, virtual observatories provide universal access to the available astronomical data archives of space- and ground-based observatories. Furthermore, as those virtual observatories aim at integrating heterogeneous resources provided by a number of participating organizations, the virtual observatory acts as a coordinating entity that strives for common data analysis techniques and tools based on common standards. The Sensor Web is on its way to becoming one of the major virtual observatories outside of the astronomical research community. Like the original observatory, which consists of a number of telescopes, each observing a specific part of the wave spectrum, and a collection of astronomical instruments, the Sensor Web provides a multi-eyes perspective on the current, past, and future situation of our planet and its surrounding spheres. The current view of the Sensor Web is that of a single worldwide collaborative, coherent, consistent and consolidated sensor data collection, fusion and distribution system. The Sensor Web can perform as an extensive monitoring and sensing system that provides timely, comprehensive, continuous and multi-mode observations. This technology is key to monitoring and understanding our natural environment, including key areas such as climate change, biodiversity, or natural disasters on local, regional, and global scales. The Sensor Web concept has been well established through ongoing global research and deployment of Sensor Web middleware and standards, and it represents the foundation layer of systems like the Global Earth Observation System of Systems (GEOSS).
The Sensor Web consists of a huge variety of physical and virtual sensors as well as observational data, made available on the Internet through standardized interfaces. All data sets and sensor communication follow well-defined abstract models and corresponding encodings, mostly developed by the OGC Sensor Web Enablement initiative. Scientific progress is currently being accelerated by an emerging concept called scientific workflows, which organize and manage complex distributed computations. A scientific workflow represents and records the highly complex processes that a domain scientist typically follows in the exploration, discovery and, ultimately, transformation of raw data into publishable results. The challenge now is to integrate the benefits of scientific workflows with those provided by the Sensor Web in order to leverage all resources for scientific exploration, problem solving, and knowledge generation. Scientific workflows for the Sensor Web represent the next evolutionary step towards efficient, powerful, and flexible earth observation frameworks and platforms. Those platforms support the entire process, from capturing, sharing and integrating data to requesting additional observations. Multiple sites and organizations will participate in single platforms, and scientists from different countries and organizations will interact and contribute to large-scale research projects. At the same time, the data and information overload becomes manageable, as multiple layers of abstraction will free scientists from dealing with underlying data, processing, or storage peculiarities. The vision is one of automated investigation and discovery mechanisms that allow scientists to pose queries to the system, which in turn would identify potentially related resources, schedule processing tasks, and assemble all parts into workflows that may satisfy the query.
Chao, Tian-Jy; Kim, Younghun
2015-02-10
End-to-end interoperability and workflows from building architecture design to one or more simulations may, in one aspect, comprise establishing a BIM enablement platform architecture. A data model defines data entities and entity relationships for enabling the interoperability and workflows. A data definition language may be implemented that defines and creates a table schema of a database associated with the data model. Data management services and/or application programming interfaces may be implemented for interacting with the data model. Web services may also be provided for interacting with the data model via the Web. A user interface may be implemented that communicates with users and uses the BIM enablement platform architecture, the data model, the data definition language, the data management services and the application programming interfaces to provide users with functions for performing work related to building information management.
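The abstract describes a data model plus a data definition language that creates the database's table schema, but it does not publish the model itself. The sketch below shows what such DDL might look like for a few plausible entities, using Python's built-in `sqlite3` module; every table and column name is a hypothetical illustration, not the patent's actual schema.

```python
import sqlite3

# Hypothetical, minimal schema: the table and column names below are
# illustrative only and are not taken from the patent.
DDL = """
CREATE TABLE building (
    building_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE space (
    space_id    INTEGER PRIMARY KEY,
    building_id INTEGER NOT NULL REFERENCES building(building_id),
    floor_area  REAL                              -- square metres
);
CREATE TABLE simulation_run (
    run_id      INTEGER PRIMARY KEY,
    building_id INTEGER NOT NULL REFERENCES building(building_id),
    engine      TEXT NOT NULL,                    -- simulation tool used
    status      TEXT NOT NULL DEFAULT 'pending'
);
"""


def create_schema(conn: sqlite3.Connection) -> list[str]:
    """Execute the DDL and return the names of the tables that now exist."""
    conn.executescript(DDL)
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    print(create_schema(connection))
```

The foreign keys tie each space and each simulation run back to a building, which is the kind of entity relationship the abstract says the data model defines; data management services or web services would then sit on top of these tables.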
Identification and Management of Information Problems by Emergency Department Staff
Murphy, Alison R.; Reddy, Madhu C.
2014-01-01
Patient-care teams frequently encounter information problems during their daily activities. These information problems include wrong, outdated, conflicting, incomplete, or missing information. Information problems can negatively impact the patient-care workflow, lead to misunderstandings about patient information, and potentially lead to medical errors. Existing research focuses on understanding the cause of these information problems and the impact that they can have on the hospital’s workflow. However, there is limited research on how patient-care teams currently identify and manage information problems that they encounter during their work. Through qualitative observations and interviews in an emergency department (ED), we identified the types of information problems encountered by ED staff, and examined how they identified and managed the information problems. We also discuss the impact that these information problems can have on the patient-care teams, including the cascading effects of information problems on workflow and the ambiguous accountability for fixing information problems within collaborative teams. PMID:25954457
A Real-Time GIS Platform for High Sour Gas Leakage Simulation, Evaluation and Visualization
NASA Astrophysics Data System (ADS)
Li, M.; Liu, H.; Yang, C.
2015-07-01
The development of high-sulfur gas fields, also known as sour gas fields, faces a series of safety control and emergency management problems. High expectations are placed on GIS-based emergency response systems given the high pressure, high hydrogen sulphide content, complex terrain and high population density of the Sichuan Basin, southwest China. Most research on the simulation and evaluation of high-hydrogen-sulphide gas dispersion serves environmental impact assessment (EIA) or emergency preparedness planning. This paper introduces a real-time GIS platform for high-sulfur gas emergency response. Combining real-time data from the leak detection systems and the meteorological monitoring stations, the GIS platform provides functions for simulating, evaluating and displaying the different spatio-temporal distribution patterns of the toxic gas, together with the evaluation results. This paper first proposes the architecture of the Emergency Response/Management System; second, explains the simulation workflow of EPA's Gaussian dispersion model CALPUFF under highly complex terrain with real-time data; and third, explains the emergency workflow and the spatial analysis functions for computing the areas and population affected by the accident and the optimal evacuation routes. Finally, a well blowout scenario is used to verify the system. The study shows that a GIS platform integrating real-time data with CALPUFF models will be one of the essential operational platforms for emergency management in high-sulfur gas fields.
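The platform's dispersion step relies on CALPUFF, a non-steady-state Lagrangian Gaussian puff model whose terrain and meteorology handling goes far beyond anything shown here. For intuition about what a Gaussian dispersion model computes, the sketch below evaluates the textbook steady-state Gaussian plume equation with ground reflection. It is a deliberately simpler, swapped-in model, not CALPUFF, and it assumes the caller supplies the dispersion coefficients σy and σz for the receptor's downwind distance (in practice these come from atmospheric stability curves, omitted here).

```python
import math


def gaussian_plume(q, u, y, z, h, sigma_y, sigma_z):
    """Steady-state Gaussian plume concentration with ground reflection.

    q                -- emission rate (g/s)
    u                -- mean wind speed at release height (m/s)
    y, z             -- crosswind offset and height of the receptor (m)
    h                -- effective release height (m)
    sigma_y, sigma_z -- dispersion coefficients (m) at the receptor's
                        downwind distance, supplied by the caller
    Returns the concentration in g/m^3.
    """
    lateral = math.exp(-y**2 / (2 * sigma_y**2))
    # The second term is an image source below ground, modelling reflection.
    vertical = (math.exp(-(z - h)**2 / (2 * sigma_z**2))
                + math.exp(-(z + h)**2 / (2 * sigma_z**2)))
    return q / (2 * math.pi * u * sigma_y * sigma_z) * lateral * vertical
```

Evaluating this over a grid of receptors and comparing against a toxicity threshold is the simplest form of the "accident influencing area" computation the abstract mentions; the real system does this with CALPUFF output and real-time meteorology.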
Duro, Francisco Rodrigo; Blas, Javier Garcia; Isaila, Florin; ...
2016-10-06
The increasing volume of scientific data and the limited scalability and performance of storage systems currently present a significant limitation to the productivity of scientific workflows running on both high-performance computing (HPC) and cloud platforms. Better integration of storage systems and workflow engines is clearly needed to address this problem. This paper presents and evaluates a novel solution that leverages codesign principles for integrating Hercules, an in-memory data store, with a workflow management system. We consider four main aspects: workflow representation, task scheduling, task placement, and task termination. The experimental evaluation on both cloud and HPC systems demonstrates significant performance and scalability improvements over existing state-of-the-art approaches.
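Task placement is one of the four aspects the authors consider. A common idea when a workflow engine is co-designed with an in-memory store is to run each task where its inputs already live. The function below is a minimal, hypothetical sketch of that data-locality heuristic; it is not Hercules's API or the paper's actual placement policy, and the location and size maps are assumed inputs.

```python
def place_task(task_inputs, data_location, data_size, nodes):
    """Pick the node that already holds the most bytes of the task's inputs.

    task_inputs   -- keys of the data items the task reads
    data_location -- hypothetical map: data key -> node holding it in memory
    data_size     -- map: data key -> size in bytes
    nodes         -- candidate nodes; ties are broken by their order here
    """
    local_bytes = {node: 0 for node in nodes}
    for key in task_inputs:
        node = data_location.get(key)
        if node in local_bytes:  # ignore data held outside the candidate set
            local_bytes[node] += data_size.get(key, 0)
    # max() returns the first maximal element, so ties favour earlier nodes.
    return max(nodes, key=lambda n: local_bytes[n])
```

Weighting by bytes rather than by number of inputs favours the node that avoids the largest transfer, which is usually what matters when storage bandwidth is the bottleneck.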
A Workflow to Investigate Exposure and Pharmacokinetic ...
Background: Adverse outcome pathways (AOPs) link adverse effects in individuals or populations to a molecular initiating event (MIE) that can be quantified using in vitro methods. Practical application of AOPs in chemical-specific risk assessment requires incorporation of knowledge on exposure, along with absorption, distribution, metabolism, and excretion (ADME) properties of chemicals. Objectives: We developed a conceptual workflow to examine exposure and ADME properties in relation to an MIE. The utility of this workflow was evaluated using a previously established AOP, acetylcholinesterase (AChE) inhibition. Methods: Thirty chemicals found to inhibit human AChE in the ToxCast™ assay were examined with respect to their exposure, absorption potential, and ability to cross the blood–brain barrier (BBB). Structures of active chemicals were compared against structures of 1,029 inactive chemicals to detect possible parent compounds that might have active metabolites. Results: Application of the workflow screened out 10 “low-priority” chemicals from the 30 active chemicals. Fifty-two of the 1,029 inactive chemicals exhibited a similarity threshold of ≥ 75% with their nearest active neighbors. Of these 52 compounds, 30 were excluded due to poor absorption or distribution. The remaining 22 compounds may inhibit AChE in vivo either directly or as a result of metabolic activation. Conclusions: The incorporation of exposure and ADME properties into the conceptual workflow e
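The abstract reports a nearest-active-neighbour similarity threshold of 75% but does not name the similarity measure. A common choice for comparing chemical structures is the Tanimoto coefficient on binary fingerprints; the sketch below assumes that measure, with fingerprints given as sets of on-bit indices (generating the fingerprints themselves is out of scope here).

```python
def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto (Jaccard) similarity of two fingerprints given as on-bit sets."""
    if not a and not b:
        return 1.0  # two empty fingerprints are treated as identical
    return len(a & b) / len(a | b)


def flag_possible_parents(inactives, actives, threshold=0.75):
    """Return {inactive_id: best_similarity} for inactives whose nearest
    active neighbour meets the threshold. `actives` must be non-empty."""
    flagged = {}
    for name, fingerprint in inactives.items():
        best = max(tanimoto(fingerprint, active) for active in actives.values())
        if best >= threshold:
            flagged[name] = best
    return flagged
```

In the workflow described above, compounds flagged this way would then pass through the absorption and distribution filters before being kept as candidates that might inhibit AChE after metabolic activation.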
Grid infrastructure for automatic processing of SAR data for flood applications
NASA Astrophysics Data System (ADS)
Kussul, Natalia; Skakun, Serhiy; Shelestov, Andrii
2010-05-01
More and more geosciences applications are being moved onto Grids. Geosciences applications are complex because of their complex workflows, their use of computationally intensive environmental models, and the need to manage and integrate heterogeneous data sets, and Grids offer solutions to these problems. Many geosciences applications, especially those related to disaster management and mitigation, require geospatial services to be delivered in a timely manner. For example, information on flooded areas should be provided to the corresponding organizations (local authorities, civil protection agencies, UN agencies, etc.) within 24 h so that they can effectively allocate the resources required to mitigate the disaster. Therefore, providing infrastructure and services that enable the automatic generation of products based on the integration of heterogeneous data is a task of great importance. In this paper we present a Grid infrastructure for the automatic processing of synthetic-aperture radar (SAR) satellite images to derive flood products. In particular, we use SAR data acquired by ESA's ENVISAT satellite, and neural networks to derive the flood extent. The data are provided in operational mode from the ESA rolling archive (within an ESA Category-1 grant). We developed a portal that is based on the OpenLayers framework and provides an access point to the developed services. Through the portal the user can define a geographical region and search for the required data. Upon selection of data sets, a workflow is automatically generated and executed on the resources of the Grid infrastructure. For workflow execution and management we use the Karajan language. The SAR data processing workflow consists of the following steps: image calibration, image orthorectification, image processing with neural networks, topographic effects removal, geocoding and transformation to a lat/long projection, and visualisation.
These steps are executed by different software and can be run on different resources of the Grid system. The resulting geospatial services are available in various OGC standards such as KML and WMS. Currently, the Grid infrastructure integrates the resources of several geographically distributed organizations, in particular: the Space Research Institute NASU-NSAU (Ukraine), with deployed computational and storage nodes based on Globus Toolkit 4 (http://www.globus.org) and gLite 3 (http://glite.web.cern.ch) middleware, access to geospatial data and a Grid portal; the Institute of Cybernetics of NASU (Ukraine), with deployed computational and storage nodes (SCIT-1/2/3 clusters) based on Globus Toolkit 4 middleware and access to computational resources (approximately 500 processors); and the Center of Earth Observation and Digital Earth, Chinese Academy of Sciences (CEODE-CAS, China), with deployed computational nodes based on Globus Toolkit 4 middleware and access to geospatial data (approximately 16 processors). We are currently adding new geospatial services based on optical satellite data, namely MODIS. This work is carried out jointly with CEODE-CAS. Using the workflow patterns developed for SAR data processing, we are building new workflows for optical data processing.
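The SAR processing chain is a set of dependent steps. The authors express it in Karajan; the Python sketch below is not Karajan, only an illustration of the same dependency structure using the standard library's `graphlib.TopologicalSorter`, with step names paraphrased from the abstract.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps it depends on (names paraphrased
# from the abstract; "calibration" has no prerequisites).
SAR_WORKFLOW = {
    "orthorectification": {"calibration"},
    "neural_network_classification": {"orthorectification"},
    "topographic_effects_removal": {"neural_network_classification"},
    "geocoding_latlong": {"topographic_effects_removal"},
    "visualisation": {"geocoding_latlong"},
}


def execution_order(workflow):
    """Return the steps in an order that respects every dependency."""
    return list(TopologicalSorter(workflow).static_order())


if __name__ == "__main__":
    for step in execution_order(SAR_WORKFLOW):
        print(step)
```

This chain is linear, so there is only one valid order. For a branching workflow, calling `prepare()` and then alternating `get_ready()` and `done()` yields batches of steps whose dependencies are satisfied, which is what lets independent steps run on different Grid resources at the same time.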
NASA Astrophysics Data System (ADS)
Swetnam, T. L.; Pelletier, J. D.; Merchant, N.; Callahan, N.; Lyons, E.
2015-12-01
Earth science is making rapid advances through effective utilization of large-scale data repositories such as aerial LiDAR and access to NSF-funded cyberinfrastructures (e.g. the OpenTopography.org data portal, iPlant Collaborative, and XSEDE). Scaling analysis tasks that are traditionally developed using desktops, laptops or computing clusters so that they effectively leverage national- and regional-scale cyberinfrastructure poses unique challenges and barriers to adoption. To address some of these challenges, in Fall 2014 'Applied Cyberinfrastructure Concepts' (ISTA 420/520), a project-based learning course at the University of Arizona, focused on developing scalable models of 'Effective Energy and Mass Transfer' (EEMT, MJ m⁻² yr⁻¹) for use by the NSF Critical Zone Observatories (CZO) project. EEMT is a quantitative measure of the flux of available energy to the critical zone, and its computation involves inputs that have broad applicability (e.g. solar insolation). The course's 25 students, who had varying levels of computational skill and no prior domain background in the geosciences, collaborated with domain experts to develop the scalable workflow. The original workflow, which relied on the open-source QGIS platform on a laptop, was scaled to effectively utilize cloud environments (OpenStack), UA campus HPC systems, iRODS, and other XSEDE and OSG resources. The project utilizes public data, e.g. DEMs produced by OpenTopography.org and climate data from Daymet, which are processed using GDAL, GRASS and SAGA and the Makeflow and Work Queue task management software packages. Students were placed into collaborative groups to develop the separate aspects of the project. They were allowed to change teams, alter workflows, and design and develop novel code.
The students were able to identify all necessary dependencies, recompile source onto the target execution platforms, and demonstrate a functional workflow, which was further improved upon by one of the group leaders over Spring 2015. All of the code, documentation and workflow description are currently available on GitHub and a public data portal is in development. We present a case study of how students reacted to the challenge of a real science problem, their interactions with end-users, what went right, and what could be done better in the future.
Lessons from Implementing a Combined Workflow–Informatics System for Diabetes Management
Zai, Adrian H.; Grant, Richard W.; Estey, Greg; Lester, William T.; Andrews, Carl T.; Yee, Ronnie; Mort, Elizabeth; Chueh, Henry C.
2008-01-01
Shortcomings surrounding the care of patients with diabetes have been attributed largely to a fragmented, disorganized, and duplicative health care system that focuses more on acute conditions and complications than on managing chronic disease. To address these shortcomings, we developed a diabetes registry population management application to change the way our staff manages patients with diabetes. Use of this new application has helped us coordinate the responsibilities for intervening and monitoring patients in the registry among different users. Our experiences using this combined workflow-informatics intervention system suggest that integrating a chronic disease registry into clinical workflow for the treatment of chronic conditions creates a useful and efficient tool for managing disease. PMID:18436907
The distributed production system of the SuperB project: description and results
NASA Astrophysics Data System (ADS)
Brown, D.; Corvo, M.; Di Simone, A.; Fella, A.; Luppi, E.; Paoloni, E.; Stroili, R.; Tomassetti, L.
2011-12-01
The SuperB experiment needs large samples of Monte Carlo simulated events in order to finalize the detector design and to estimate the data analysis performance. The requirements are beyond the capabilities of a single computing farm, so a distributed production model capable of exploiting the existing worldwide HEP distributed computing infrastructure is needed. In this paper we describe the set of tools that have been developed to manage the production of the required simulated events. Event production follows three main phases: distribution of input data files to the Storage Elements (SE) of the remote sites; job submission, via the SuperB GANGA interface, to all available remote sites; and transfer of the output files to the CNAF repository. The job workflow includes procedures for consistency checking, monitoring, data handling and bookkeeping. A replication mechanism allows the job output to be stored on the local site SE. Results from the 2010 official productions are reported.
A pattern-based analysis of clinical computer-interpretable guideline modeling languages.
Mulyar, Nataliya; van der Aalst, Wil M P; Peleg, Mor
2007-01-01
Languages used to specify computer-interpretable guidelines (CIGs) differ in their approaches to addressing particular modeling challenges. The main goals of this article are: (1) to examine the expressive power of CIG modeling languages, and (2) to define the differences, from the control-flow perspective, between process languages in workflow management systems and modeling languages used to design clinical guidelines. The pattern-based analysis was applied to the guideline modeling languages Asbru, EON, GLIF, and PROforma. We focused on control-flow and left other perspectives out of consideration. We evaluated the selected CIG modeling languages and identified their degree of support for 43 control-flow patterns. We used a set of explicitly defined evaluation criteria to determine whether each pattern is supported directly, indirectly, or not at all. PROforma offers direct support for 22 of 43 patterns, Asbru 20, GLIF 17, and EON 11. All four directly support basic control-flow patterns, cancellation patterns, and some advanced branching and synchronization patterns. None support multiple instances patterns. They offer varying levels of support for synchronizing merge patterns and state-based patterns. Some support a few scenarios not covered by the 43 control-flow patterns. CIG modeling languages are remarkably close to traditional workflow languages from the control-flow perspective, but cover many fewer workflow patterns. CIG languages offer some flexibility that supports modeling of complex decisions and provide ways for modeling some decisions not covered by workflow management systems. Workflow management systems may be suitable for clinical guideline applications.
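The direct-support counts above are easier to compare as shares of the 43 control-flow patterns. The snippet below simply divides each reported count by 43 and rounds to one decimal place; it adds no data beyond the abstract's numbers.

```python
TOTAL_PATTERNS = 43

# Direct-support counts as reported in the abstract.
direct_support = {"PROforma": 22, "Asbru": 20, "GLIF": 17, "EON": 11}


def support_share(counts, total=TOTAL_PATTERNS):
    """Percentage of control-flow patterns each language supports directly."""
    return {lang: round(100 * n / total, 1) for lang, n in counts.items()}


if __name__ == "__main__":
    for lang, pct in support_share(direct_support).items():
        print(f"{lang}: {pct}%")
```

Even the best-supported language, PROforma, directly covers only about half of the patterns, which is consistent with the conclusion that CIG languages cover many fewer workflow patterns than traditional workflow languages.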
Common Data Models and Efficient Reproducible Workflows for Distributed Ocean Model Skill Assessment
NASA Astrophysics Data System (ADS)
Signell, R. P.; Snowden, D. P.; Howlett, E.; Fernandes, F. A.
2014-12-01
Model skill assessment requires discovery, access, analysis, and visualization of information from both sensors and models, and traditionally has been possible only by a few experts. The US Integrated Ocean Observing System (US-IOOS) consists of 17 Federal Agencies and 11 Regional Associations that produce data from various sensors and numerical models; exactly the information required for model skill assessment. US-IOOS is seeking to develop documented skill assessment workflows that are standardized, efficient, and reproducible so that a much wider community can participate in the use and assessment of model results. Standardization requires common data models for observational and model data. US-IOOS relies on the CF Conventions for observations and structured grid data, and on the UGRID Conventions for unstructured (e.g. triangular) grid data. This allows applications to obtain only the data they require in a uniform and parsimonious way using web services: OPeNDAP for model output and OGC Sensor Observation Service (SOS) for observed data. Reproducibility is enabled with IPython Notebooks shared on GitHub (http://github.com/ioos). These capture the entire skill assessment workflow, including user input, search, access, analysis, and visualization, ensuring that workflows are self-documenting and reproducible by anyone, using free software. Python packages for common data models are Pyugrid and the British Met Office Iris package. Python packages required to run the workflows (pyugrid, pyoos, and the British Met Office Iris package) are also available on GitHub and on Binstar.org so that users can run scenarios using the free Anaconda Python distribution. Hosted services such as Wakari enable anyone to reproduce these workflows for free, without installing any software locally, using just their web browser. 
We are also experimenting with Wakari Enterprise, which allows multi-user access from a web browser to an IPython Server running where large quantities of model output reside, increasing the efficiency. The open development and distribution of these workflows, and the software on which they depend, is an educational resource for those new to the field and a center of focus where practitioners can contribute new software and ideas.
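The abstract does not list the skill metrics used in the IOOS notebooks. As an assumed example, the sketch below computes three metrics often used when comparing ocean model output with observations: mean bias, root-mean-square error, and Willmott's index of agreement, d = 1 - Σ(m - o)² / Σ(|m - ō| + |o - ō|)². It assumes the model series has already been matched to the observation times and locations, which in the described workflow is the job of the search and access steps.

```python
import math


def skill_metrics(model, obs):
    """Bias, RMSE and Willmott skill for paired model/observation series."""
    if len(model) != len(obs) or not obs:
        raise ValueError("need two non-empty series of equal length")
    n = len(obs)
    obs_mean = sum(obs) / n
    sq_err = sum((m - o) ** 2 for m, o in zip(model, obs))
    bias = sum(m - o for m, o in zip(model, obs)) / n
    rmse = math.sqrt(sq_err / n)
    # Willmott's "potential error" term in the denominator.
    denom = sum((abs(m - obs_mean) + abs(o - obs_mean)) ** 2
                for m, o in zip(model, obs))
    skill = 1.0 - sq_err / denom if denom else 1.0
    return {"bias": bias, "rmse": rmse, "willmott_skill": skill}
```

Willmott skill ranges from 0 to 1, with 1 meaning perfect agreement, so it complements RMSE, which carries the units of the variable being assessed.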
Jaschob, Daniel; Riffle, Michael
2012-07-30
Laboratories engaged in computational biology or bioinformatics frequently need to run lengthy, multistep, and user-driven computational jobs. Each job can tie up a computer for a few minutes to several days, and many laboratories lack the expertise or resources to build and maintain a dedicated computer cluster. JobCenter is a client-server application and framework for job management and distributed job execution. The client and server components are both written in Java and are cross-platform and relatively easy to install. All communication with the server is client-driven, which allows worker nodes to run anywhere (even behind external firewalls or "in the cloud") and provides inherent load balancing. Adding a worker node to the worker pool is as simple as dropping the JobCenter client files onto any computer and performing basic configuration, which provides tremendous ease-of-use, flexibility, and limitless horizontal scalability. Each worker installation may be independently configured, including the types of jobs it is able to run. Executed jobs may be written in any language and may include multistep workflows. JobCenter is a versatile and scalable distributed job management system that allows laboratories to very efficiently distribute all computational work among available resources. JobCenter is freely available at http://code.google.com/p/jobcenter/.
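JobCenter's key design choice is that all communication is client-driven: workers ask the server for work, so idle workers take more jobs and load balancing comes for free. The sketch below imitates that pull model in a single Python process, with a thread-safe `queue.Queue` standing in for the server and threads standing in for worker nodes. It illustrates the idea only; it is not JobCenter's Java implementation or its network protocol, and `run_pull_workers` is a hypothetical name.

```python
import queue
import threading


def run_pull_workers(jobs, n_workers=3):
    """Workers repeatedly pull jobs from a shared queue until it is empty.

    Each worker asks for work only when it is idle, so faster workers simply
    take more jobs; load balancing follows from the pull model itself.
    """
    pending = queue.Queue()
    for job in jobs:
        pending.put(job)
    results = []
    lock = threading.Lock()

    def worker(worker_id):
        while True:
            try:
                job = pending.get_nowait()
            except queue.Empty:
                return  # nothing left to pull
            outcome = job()  # here a job is any zero-argument callable
            with lock:
                results.append((worker_id, outcome))

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Because the server never has to contact a worker, workers can sit behind firewalls or in the cloud, which is the property the abstract highlights.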
Monitoring data transfer latency in CMS computing operations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bonacorsi, Daniele; Diotalevi, Tommaso; Magini, Nicolo; ...
2015-12-23
During the first LHC run, the CMS experiment collected tens of petabytes of collision and simulated data, which need to be distributed among dozens of computing centres with low latency in order to make efficient use of the resources. While the desired level of throughput has been successfully achieved, it is still common to observe transfer workflows that cannot reach full completion in a timely manner due to a small fraction of stuck files which require operator intervention. For this reason, in 2012 the CMS transfer management system, PhEDEx, was instrumented with a monitoring system to measure file transfer latencies and to predict the completion time for the transfer of a data set. The operators can detect abnormal patterns in transfer latencies while the transfer is still in progress, and monitor the long-term performance of the transfer infrastructure to plan the data placement strategy. Based on the data collected for one year with the latency monitoring system, we present a study on the different factors that contribute to transfer completion time. As case studies, we analyze several typical CMS transfer workflows, such as distribution of collision event data from CERN or upload of simulated event data from the Tier-2 centres to the archival Tier-1 centres. For each workflow, we present the typical patterns of transfer latencies that have been identified with the latency monitor. We identify the areas in PhEDEx where a development effort can reduce the latency, and we show how we are able to detect stuck transfers which need operator intervention. Lastly, we propose a set of metrics to alert about stuck subscriptions and prompt for manual intervention, with the aim of improving transfer completion times.
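The authors propose metrics that alert operators to stuck subscriptions, but the abstract does not define them. The sketch below is one hypothetical rule in that spirit: flag a subscription when most of its files have arrived but no file has completed for a long time, matching the described symptom of a small fraction of stuck files. The field names and both thresholds are assumptions, not PhEDEx settings.

```python
from dataclasses import dataclass


@dataclass
class Subscription:
    name: str
    total_files: int
    done_files: int
    hours_since_last_completion: float


def is_stuck(sub, min_fraction_done=0.9, idle_hours=24.0):
    """Flag a subscription whose bulk has finished but whose tail is idle.

    Both thresholds are hypothetical defaults chosen for illustration.
    """
    if sub.done_files >= sub.total_files:
        return False  # complete (or empty): nothing to alert on
    fraction = sub.done_files / sub.total_files
    return (fraction >= min_fraction_done
            and sub.hours_since_last_completion >= idle_hours)
```

Requiring a high completed fraction separates a stuck tail from a transfer that is merely slow overall, which a throughput monitor would already show.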
Automated quality control in a file-based broadcasting workflow
NASA Astrophysics Data System (ADS)
Zhang, Lina
2014-04-01
Benefiting from the development of information and internet technologies, television broadcasting is transforming from inefficient tape-based production and distribution to integrated file-based workflows. However, no matter how many changes have taken place, successful broadcasting still depends on the ability to deliver a consistent, high-quality signal to audiences. After the transition from tape to file, traditional methods of manual quality control (QC) have become inadequate, subjective, and inefficient. Based on China Central Television's fully file-based workflow at its new site, this paper introduces an automated quality control test system for accurately detecting hidden defects in media content. It discusses the system framework and the workflow control when automated QC is added. It puts forward a QC criterion and presents QC software that follows this criterion. It also reports experiments on QC speed using parallel processing and distributed computing. The performance of the test system shows that adopting automated QC can make production effective and efficient and help the station achieve a competitive advantage in the media market.
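The paper's QC criterion is not given in the abstract. As a hypothetical illustration of one automated check and of running it in parallel, the sketch below flags black frames by mean luma, with each frame given as a list of 8-bit luma samples. The threshold of 20 is an assumption chosen just above the nominal video-range black level of 16.

```python
from concurrent.futures import ThreadPoolExecutor


def is_black(frame, luma_threshold=20):
    """A frame (list of 8-bit luma samples) is black if its mean luma is below
    the threshold; 20 is a hypothetical value just above video-range black (16)."""
    return sum(frame) / len(frame) < luma_threshold


def black_frame_indices(frames, workers=4):
    """Check frames in parallel and return the indices of black frames, in order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        flags = list(pool.map(is_black, frames))  # map preserves input order
    return [i for i, black in enumerate(flags) if black]
```

Threads keep the example short; for CPU-bound pixel analysis in pure Python, `ProcessPoolExecutor` or distribution across machines would be needed for a real speed-up.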
Streamlining the Change Management with Business Rules
NASA Technical Reports Server (NTRS)
Savela, Christopher
2015-01-01
This presentation will discuss how the author's organization is trying to streamline workflows and the change management process with business rules. In looking for ways to make things more efficient and save money, one approach is to reduce the work that workflow task approvers have to do when reviewing affected items. The presentation will share the technical details of the business rules, how to implement them, and how to speed up the development process by using the API to demonstrate the rules in action.
Knowledge Extraction and Semantic Annotation of Text from the Encyclopedia of Life
Thessen, Anne E.; Parr, Cynthia Sims
2014-01-01
Numerous digitization and ontological initiatives have focused on translating biological knowledge from narrative text to machine-readable formats. In this paper, we describe two workflows for knowledge extraction and semantic annotation of text data objects featured in an online biodiversity aggregator, the Encyclopedia of Life. One workflow tags text with DBpedia URIs based on keywords. Another workflow finds taxon names in text using GNRD for the purpose of building a species association network. Both workflows work well: the annotation workflow has an F1 Score of 0.941 and the association algorithm has an F1 Score of 0.885. Existing text annotators such as Terminizer and DBpedia Spotlight performed well, but require some optimization to be useful in the ecology and evolution domain. Important future work includes scaling up and improving accuracy through the use of distributional semantics. PMID:24594988
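The reported F1 scores are the harmonic mean of precision and recall. A minimal helper computing it from true-positive, false-positive and false-negative counts is shown below; the counts used in the tests are made up, not the paper's data.

```python
def f1_score(tp, fp, fn):
    """F1 = 2PR / (P + R), with precision P = tp/(tp+fp) and recall R = tp/(tp+fn)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0  # no correct annotations at all
    return 2 * precision * recall / (precision + recall)
```

Because it is a harmonic mean, F1 stays low unless both precision and recall are high, so a score of 0.941 implies the annotation workflow rarely both missed and mis-tagged terms.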
Using aerial images for establishing a workflow for the quantification of water management measures
NASA Astrophysics Data System (ADS)
Leuschner, Annette; Merz, Christoph; van Gasselt, Stephan; Steidl, Jörg
2017-04-01
Quantified landscape characteristics, such as morphology, land use or hydrological conditions, play an important role in hydrological investigations, as landscape parameters directly control the overall water balance. Powerful assimilation and geospatial analysis of remote sensing datasets, combined with hydrological modeling, allow landscape parameters and water balances to be quantified efficiently. This study focuses on the development of a workflow to extract hydrologically relevant data from aerial image datasets and derived products in order to allow effective parametrization of a hydrological model. Consistent and self-contained data sources are indispensable for achieving reasonable modeling results. To minimize uncertainties and inconsistencies, input parameters for modeling should, where possible, be extracted mainly from a single remote-sensing dataset. Here, aerial images were chosen because their high spatial and spectral resolution permits the extraction of various model-relevant parameters, such as morphology, land use or artificial drainage systems. The methodological repertoire for extracting environmental parameters ranges from analyses of digital terrain models and multispectral classification and segmentation of land use distribution maps to mapping of artificial drainage systems based on spectral and visual inspection. The workflow has been tested for a mesoscale catchment area that forms a characteristic hydrological system of a young moraine landscape located in the state of Brandenburg, Germany. These datasets were used as input datasets for multi-temporal hydrological modelling of water balances to detect and quantify anthropogenic and meteorological impacts. ArcSWAT, a GIS-implemented extension and graphical user input interface for the Soil and Water Assessment Tool (SWAT), was chosen.
The results of this modeling approach provide the basis for anticipating the future development of the hydrological system and, in view of system changes, for adapting water resource management decisions.
COSMOS: Python library for massively parallel workflows
Gafni, Erik; Luquette, Lovelace J.; Lancaster, Alex K.; Hawkins, Jared B.; Jung, Jae-Yoon; Souilmi, Yassine; Wall, Dennis P.; Tonellato, Peter J.
2014-10-15
Summary: Efficient workflows to shepherd clinically generated genomic data through the multiple stages of a next-generation sequencing pipeline are of critical importance in translational biomedical science. Here we present COSMOS, a Python library for workflow management that allows formal description of pipelines and partitioning of jobs. In addition, it includes a user interface for tracking the progress of jobs, abstraction of the queuing system and fine-grained control over the workflow. Workflows can be created on traditional computing clusters as well as cloud-based services. Availability and implementation: Source code is available for academic non-commercial research purposes. Links to code and documentation are provided at http://lpm.hms.harvard.edu and http://wall-lab.stanford.edu. Contact: dpwall@stanford.edu or peter_tonellato@hms.harvard.edu. Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press. PMID:24982428
Grid-based platform for training in Earth Observation
NASA Astrophysics Data System (ADS)
Petcu, Dana; Zaharie, Daniela; Panica, Silviu; Frincu, Marc; Neagul, Marian; Gorgan, Dorian; Stefanut, Teodor
2010-05-01
The GiSHEO platform [1], which provides on-demand services for training and higher education in Earth Observation, is being developed within an ESA-funded project under its PECS programme to respond to the need for powerful education resources in the remote sensing field. It is intended as a Grid-based platform whose potential for experimentation and extensibility are its key benefits compared with a desktop software solution. Near-real-time applications requiring multiple simultaneous, short-response-time, data-intensive tasks, as in the case of a short training event, have proved ideal for this platform. The platform is based on Globus Toolkit 4 facilities for security and process management, and on the clusters of four academic institutions involved in the project. Authorization uses a VOMS service. The main public services are the following: the EO processing services (represented as special WSRF-type services); the workflow service, exposing a particular workflow engine; the data indexing and discovery service, for accessing the data management mechanisms; and the processing services, a collection allowing easy access to the processing platform. The WSRF-type services for basic satellite image processing reuse free image processing tools, OpenCV and GDAL. New algorithms and workflows were developed to tackle challenging problems such as detecting the underground remains of old fortifications, walls or houses. More details can be found in [2]. Composed services can be specified through workflows and are easy to deploy. The workflow engine, OSyRIS (Orchestration System using a Rule based Inference Solution), is based on DROOLS, and a new rule-based workflow language, SILK (SImple Language for worKflow), has been built. Workflows can be created in SILK with or without visual design tools. The basics of SILK are tasks and the relations (rules) between them.
It is similar to the SCUFL language, but does not rely on XML, which allows more workflow-specific features to be introduced. Moreover, an event-condition-action (ECA) approach allows greater flexibility when expressing data and task dependencies, as well as the creation of adaptive workflows that can react to changes in the configuration of the Grid or in the workflow itself. Changes inside the Grid are handled by creating specific rules that allow resource selection based on various task scheduling criteria. Modifications of the workflow are usually accomplished either by inserting or retracting its rules at runtime, or by changing the executor of a task when a better one is found. The former implies changes to the workflow's structure, while the latter does not necessarily mean a change of resource but rather a change of the algorithm used to solve the task. More details can be found in [3]. Another important platform component is the data indexing and storage service, GDIS, which provides features for storing data, indexing data using a specialized RDBMS, finding data by various conditions, querying external services and keeping track of temporary data generated by other components. The data storage component of GDIS is responsible for storing data using the available storage backends, such as local disk file systems (ext3), local cluster storage (GFS) or distributed file systems (HDFS). A front-end GridFTP service interacts uniformly with the storage domains on behalf of the clients and also enforces the data-access security restrictions provided by other specialized services. Data indexing is performed by PostGIS. An advanced and flexible interface for searching the project's geographical repository is built around a custom query language (LLQL, Lisp Like Query Language) designed to provide fine-grained access to the data in the repository and to query external services (e.g.
for exploiting the connection with the GENESI-DR catalog). More details can be found in [4]. The Workload Management System (WMS) provides two types of resource managers. The first will be based on Condor HTC and use Condor as the job manager for task dispatching and working nodes (for development purposes), while the second will use GT4 GRAM (for production purposes). The main WMS component, the Grid Task Dispatcher (GTD), is responsible for interacting with other internal services, such as the composition engine, in order to facilitate access to the processing platform. Its main responsibilities are to receive tasks from the workflow engine or directly from the user interface, to use a task description language (the ClassAd meta language in the case of Condor HTC) for job units, to submit jobs and check their status inside the workload management system, and to retrieve job logs for debugging purposes. More details can be found in [4]. A particular component of the platform is eGLE, the eLearning environment. It provides the functionality needed to create the visual appearance of lessons through visual containers such as tools, patterns and templates. Teachers use the platform to test existing lessons and to develop new lesson resources, such as new images and workflows describing graph-based processing. Students run the lessons, or describe and experiment with new workflows or different data. The eGLE database includes several workflow-based lesson descriptions, teaching materials, lesson resources, and selected satellite and spatial data. More details can be found in [5]. The first training event using the platform was organized in September 2009 during the 11th SYNASC symposium (links to the demos, testing interface, and exercises are available on the project site [1]). The eGLE component was presented at the 4th GPC conference in May 2009.
Moreover, the functionality of the platform will be presented as a demo in April 2010 at the 5th EGEE User Forum.
References:
[1] GiSHEO consortium, Project site, http://gisheo.info.uvt.ro
[2] D. Petcu, D. Zaharie, M. Neagul, S. Panica, M. Frincu, D. Gorgan, T. Stefanut, V. Bacu, "Remote Sensed Image Processing on Grids for Training in Earth Observation," in Image Processing, V. Kordic (ed.), In-Tech, January 2010.
[3] M. Neagul, S. Panica, D. Petcu, D. Zaharie, D. Gorgan, "Web and Grid Services for Training in Earth Observation," IDAACS 2009, IEEE Computer Press, pp. 241-246.
[4] M. Frincu, S. Panica, M. Neagul, D. Petcu, "Gisheo: On Demand Grid Service Based Platform for EO Data Processing," HiperGrid 2009, Politehnica Press, pp. 415-422.
[5] D. Gorgan, T. Stefanut, V. Bacu, "Grid Based Training Environment for Earth Observation," GPC 2009, LNCS 5529, pp. 98-109.
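The event-condition-action (ECA) idea behind OSyRIS/SILK can be illustrated with a tiny rule engine. This is a minimal Python sketch under stated assumptions, not SILK syntax or the DROOLS API: `Rule`, `RuleEngine`, `run_task` and the task names (`load_image`, `ndvi`, `cloud_mask`) are hypothetical. An event triggers every rule registered for it whose condition holds on the shared state; the rule's action runs a task and raises follow-up events. Runtime `insert`/`retract` mirrors the adaptive modification of a workflow by adding or removing rules.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Rule:
    """ECA rule: on `event`, if `condition(state)` holds, run `action`."""
    event: str
    condition: Callable[[dict], bool]
    action: Callable[[dict], List[str]]  # returns follow-up events to raise


@dataclass
class RuleEngine:
    rules: List[Rule] = field(default_factory=list)
    state: dict = field(default_factory=dict)
    log: List[str] = field(default_factory=list)

    def insert(self, rule: Rule) -> None:
        """Add a rule at runtime; it applies to events processed afterwards."""
        self.rules.append(rule)

    def retract(self, rule: Rule) -> None:
        """Remove a rule at runtime (a structural change to the workflow)."""
        self.rules.remove(rule)

    def fire(self, event: str) -> None:
        """Process an event and every follow-up event it triggers, FIFO."""
        queue = [event]
        while queue:
            current = queue.pop(0)
            self.log.append(current)
            # Iterate over a snapshot so rule edits take effect from the next event.
            for rule in list(self.rules):
                if rule.event == current and rule.condition(self.state):
                    queue.extend(rule.action(self.state))


def run_task(name: str, next_event: str = "") -> Callable[[dict], List[str]]:
    """Hypothetical task action: mark `name` as done, then raise `next_event`."""
    def action(state: dict) -> List[str]:
        state.setdefault("done", []).append(name)
        return [next_event] if next_event else []
    return action


engine = RuleEngine(state={"cloud_cover": 0.1})
engine.insert(Rule("start", lambda s: True, run_task("load_image", "loaded")))
# Two competing rules on the same event: the condition selects the branch.
engine.insert(Rule("loaded", lambda s: s["cloud_cover"] < 0.3, run_task("ndvi", "ndvi_done")))
engine.insert(Rule("loaded", lambda s: s["cloud_cover"] >= 0.3, run_task("cloud_mask", "ndvi_done")))
engine.fire("start")
print(engine.state["done"])  # ['load_image', 'ndvi']
```

Expressing the dependency as a condition on state, rather than as a fixed edge, is what lets the same workflow take a different branch (here, cloud masking) when the input or the environment changes.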
ERIC Educational Resources Information Center
Schmidt, Kari
2012-01-01
In this column, the author discusses how the management of e-books has introduced, at many libraries and in varying degrees, the challenges of maintaining effective technical services workflows. Four different e-book workflows are identified and explored, and the author takes a closer look at how particular variables for each are affected, such as…
Next Generation Workload Management System For Big Data on Heterogeneous Distributed Computing
Klimentov, A.; Buncic, P.; De, K.; ...
2015-05-22
The Large Hadron Collider (LHC), operating at the international CERN Laboratory in Geneva, Switzerland, is leading Big Data-driven scientific exploration. Experiments at the LHC explore the fundamental nature of matter and the basic forces that shape our universe, and were recently credited with the discovery of a Higgs boson. ATLAS and ALICE are the largest collaborations ever assembled in the sciences and are at the forefront of research at the LHC. To address an unprecedented multi-petabyte data processing challenge, both experiments rely on a heterogeneous distributed computational infrastructure. The ATLAS experiment uses the PanDA (Production and Data Analysis) Workload Management System (WMS) to manage the workflow for all data processing at hundreds of data centers. Through PanDA, ATLAS physicists see a single computing facility that enables rapid scientific breakthroughs for the experiment, even though the data centers are physically scattered all over the world. The scale is demonstrated by the following numbers: PanDA manages O(10^2) sites, O(10^5) cores, O(10^8) jobs per year and O(10^3) users, and the ATLAS data volume is O(10^17) bytes. In 2013 we started an ambitious program to expand PanDA to all available computing resources, including opportunistic use of commercial and academic clouds and Leadership Computing Facilities (LCF). The project titled 'Next Generation Workload Management and Analysis System for Big Data' (BigPanDA) is funded by DOE ASCR and HEP. Extending PanDA to clouds and LCF presents new challenges in managing heterogeneity and supporting workflow. The BigPanDA project is underway to set up and tailor PanDA at the Oak Ridge Leadership Computing Facility (OLCF) and at the National Research Center "Kurchatov Institute" together with ALICE distributed computing and ORNL computing professionals. Our approach to integrating HPC platforms at the OLCF and elsewhere is to reuse, as much as possible, existing components of the PanDA system.
Finally, we will present our current accomplishments in running the PanDA WMS at OLCF and other supercomputers and demonstrate our ability to use PanDA as a portal independent of the computing facilities' infrastructure for High Energy and Nuclear Physics, as well as for other data-intensive science applications.
Next Generation Workload Management System For Big Data on Heterogeneous Distributed Computing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Klimentov, A.; Buncic, P.; De, K.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Duro, Francisco Rodrigo; Blas, Javier Garcia; Isaila, Florin
The increasing volume of scientific data and the limited scalability and performance of storage systems currently limit the productivity of scientific workflows running on both high-performance computing (HPC) and cloud platforms. Better integration of storage systems and workflow engines is clearly needed to address this problem. This paper presents and evaluates a novel solution that leverages co-design principles to integrate Hercules, an in-memory data store, with a workflow management system. We consider four main aspects: workflow representation, task scheduling, task placement, and task termination. The experimental evaluation on both cloud and HPC systems demonstrates significant performance and scalability improvements over existing state-of-the-art approaches.
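Of the four aspects listed, task placement is where an in-memory data store and a workflow engine most obviously meet. The Python sketch below shows one generic data-locality heuristic, offered as an assumption rather than the paper's algorithm: run each task on the node that already caches the most bytes of its inputs. The function `place_task`, the node names and the object sizes are all hypothetical.

```python
from typing import Dict, Iterable, Set


def place_task(task_inputs: Iterable[str],
               node_contents: Dict[str, Set[str]],
               sizes: Dict[str, int]) -> str:
    """Pick the node that already caches the most bytes of the task's inputs.

    Ties (including "nothing cached anywhere") go to the alphabetically first
    node, so placement is deterministic.
    """
    inputs = list(task_inputs)

    def local_bytes(node: str) -> int:
        held = node_contents[node]
        # Only objects present on this node count; absent ones would be fetched remotely.
        return sum(sizes[obj] for obj in inputs if obj in held)

    # max() keeps the first maximum it meets, and nodes are visited in sorted order.
    return max(sorted(node_contents), key=local_bytes)


nodes = {"node-a": {"mesh", "params"}, "node-b": {"velocity"}, "node-c": set()}
sizes = {"mesh": 10, "params": 1, "velocity": 40}
print(place_task(["mesh", "velocity"], nodes, sizes))  # node-b
```

Weighting by bytes rather than by object count favours moving small inputs over large ones, which is usually the cheaper transfer.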
Tools for automated acoustic monitoring within the R package monitoR
Katz, Jonathan; Hafner, Sasha D.; Donovan, Therese
2016-01-01
The R package monitoR contains tools for managing an acoustic-monitoring program, including survey metadata, template creation and manipulation, automated detection, and results management. These tools scale from small projects to larger long-term projects and those with expansive spatial extents. Here, we describe the typical workflow when using the tools in monitoR. This workflow uses a generic sequence of functions, with the option of either binary point matching or spectrogram cross-correlation detectors.
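The spectrogram cross-correlation detector can be sketched as follows. This is a generic illustration in Python using the standard library, not monitoR's R interface; the function names, the toy 2-by-6 spectrogram and the 0.8 threshold are hypothetical. The template (frequency bins by time frames) slides along the time axis of the survey spectrogram. At each offset, the Pearson correlation between the flattened template and the window underneath it is computed, and offsets scoring above a threshold are reported as detections. Binary point matching is not shown.

```python
import math
from statistics import fmean
from typing import List

Spectrogram = List[List[float]]  # rows = frequency bins, columns = time frames


def _pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation; returns 0.0 when either input is constant."""
    mx, my = fmean(xs), fmean(ys)
    dx = [x - mx for x in xs]
    dy = [y - my for y in ys]
    denom = math.sqrt(sum(d * d for d in dx) * sum(d * d for d in dy))
    return sum(a * b for a, b in zip(dx, dy)) / denom if denom else 0.0


def xcorr_scores(spec: Spectrogram, template: Spectrogram) -> List[float]:
    """Correlation of the template with every time offset of the spectrogram."""
    n_freq, width, n_time = len(template), len(template[0]), len(spec[0])
    flat_t = [v for row in template for v in row]  # row-major flattening
    scores = []
    for t0 in range(n_time - width + 1):
        window = [spec[f][t0 + k] for f in range(n_freq) for k in range(width)]
        scores.append(_pearson(window, flat_t))
    return scores


def detect(spec: Spectrogram, template: Spectrogram, threshold: float) -> List[int]:
    """Time offsets whose correlation score reaches the threshold."""
    return [i for i, s in enumerate(xcorr_scores(spec, template)) if s >= threshold]


template = [[0, 1],
            [1, 0]]
spec = [[0, 0, 0, 1, 0, 0],
        [0, 0, 1, 0, 0, 0]]
print(detect(spec, template, 0.8))  # [2]
```

Using correlation rather than raw amplitude difference makes the score insensitive to overall loudness, so a quiet and a loud call of the same shape score alike.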
Moenninghoff, Christoph; Umutlu, Lale; Kloeters, Christian; Ringelstein, Adrian; Ladd, Mark E; Sombetzki, Antje; Lauenstein, Thomas C; Forsting, Michael; Schlamann, Marc
2013-06-01
Workflow efficiency and the workload of radiological technologists (RTs) were compared in head examinations performed on two 1.5 T magnetic resonance (MR) scanners, one equipped with an automated user interface called the "day optimizing throughput" (Dot) workflow engine and one without. Thirty-four patients with known intracranial pathology were examined on a 1.5 T MR scanner with the Dot workflow engine (Siemens MAGNETOM Aera) and on a 1.5 T MR scanner with a conventional user interface (Siemens MAGNETOM Avanto), using four standardized examination protocols. The elapsed time for all necessary work steps performed by 11 RTs within the total examination time was compared for each examination on both MR scanners. The RTs evaluated the user-friendliness of both scanners with a questionnaire. Normality of distribution was checked for all continuous variables using the Shapiro-Wilk test. Normally distributed variables were analyzed with Student's paired t-test; otherwise, the Wilcoxon signed-rank test was used to compare means. Total examination time with the Dot engine was reduced from 24:53 to 20:01 (min:s) (P < .001), and the necessary RT intervention decreased by 61% (P < .001). The RTs rated the Dot engine's automated choice of MR protocols significantly better than the conventional user interface (P = .001). According to this preliminary study, the Dot workflow engine is time-saving user-assistance software that significantly decreases the RTs' effort and may help automate neuroradiological examinations for higher workflow efficiency. Copyright © 2013 AUR. Published by Elsevier Inc. All rights reserved.
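The abstract reports the examination-time reduction as clock values rather than as a percentage. Assuming the values are minutes:seconds, the small Python helper below works out the absolute and relative saving: 1493 s versus 1201 s, a difference of 292 s (4 min 52 s), or roughly 19.6%. This figure is derived arithmetic and is not stated in the study itself.

```python
def to_seconds(mmss: str) -> int:
    """Convert a 'minutes:seconds' string to seconds (assumed format)."""
    minutes, seconds = map(int, mmss.split(":"))
    return 60 * minutes + seconds


def percent_reduction(before: str, after: str) -> float:
    """Relative reduction from `before` to `after`, in percent."""
    b, a = to_seconds(before), to_seconds(after)
    return 100 * (b - a) / b


saved = to_seconds("24:53") - to_seconds("20:01")
print(saved, round(percent_reduction("24:53", "20:01"), 1))  # 292 19.6
```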
2007-09-01
Motion URL: http://www.blackberry.com/products/blackberry/index.shtml Software Name: Bricolage Company: Bricolage URL: http://www.bricolage.cc ... Workflow Customizable control over editorial content. Bricolage Bricolage Feature Description Software Company Workflow Allows development ... content for Nuxeo Collaborative Portal projects. Nuxeo Workspace Add, edit, delete content through web interface. Bricolage Bricolage
The Symbiotic Relationship between Scientific Workflow and Provenance (Invited)
NASA Astrophysics Data System (ADS)
Stephan, E.
2010-12-01
The purpose of this presentation is to describe the symbiotic nature of scientific workflows and provenance. We will also discuss current trends and real-world challenges facing these two distinct research areas. Although the communities are motivated differently, their international science needs are the glue that binds this relationship together. Understanding and articulating the science drivers to these communities is paramount as these technologies evolve and mature. Originally conceived for managing business processes, workflows are now becoming invaluable assets in both computational and experimental sciences. These reconfigurable, automated systems provide essential technology for performing complex analyses by coupling geographically distributed, disparate data sources and applications. As a result, workflows achieve higher throughput in less time than performing the steps manually. Many different workflow products exist today, such as Kepler and Taverna, as well as products like MeDICI, developed at PNNL, that are standardized on the Business Process Execution Language (BPEL). Provenance, from the French provenir, "to come from", is used to describe the custodial history of artwork as it passes from owner to owner. The concept of provenance was adopted by digital libraries as a means of tracking the lineage of documents, while standards such as Dublin Core began to emerge. In recent years the systems science community has increasingly expressed the need to expand the concept of provenance to formally articulate the history of scientific data. Communities such as the International Provenance and Annotation Workshop (IPAW) have formalized a provenance data model, the Open Provenance Model, and the W3C is hosting a provenance incubator group featuring the Proof Markup Language.
Although workflows and provenance arose from different communities and operate independently, their success is mutually linked, forming a symbiotic relationship in which advances in one area can provide tremendous benefits to the other. For example, automating provenance extraction within scientific applications is still a relatively new concept; the workflow engine provides the framework to capture application-specific operations, inputs, and resulting data. By wrapping workflow components around the applications and data sources, it provides a description of the process history and data flow. On the other hand, a lack of cooperation between workflows and provenance can inhibit the usefulness of both to science. Blindly tracking execution history without a true understanding of the kinds of questions end users may have makes the provenance indecipherable to the target users. Over the past nine years PNNL has been actively involved in provenance research supporting computational chemistry, molecular dynamics, biology, hydrology, and climate. PNNL has also been actively involved in international community efforts to develop open standards for provenance and architectures that support provenance capture, storage, and querying. This presentation will provide real-world use cases showing how provenance and workflows can be leveraged and implemented to meet different needs, and will discuss the challenges that lie ahead.
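The point that a workflow engine can capture provenance by wrapping components around applications can be sketched with a Python decorator. This is an illustration under stated assumptions, not the API of Kepler, Taverna or MeDICI; `capture_provenance`, `PROVENANCE_LOG` and the two toy steps are hypothetical. Each wrapped step records its operation name, content digests of its inputs and output, and a timestamp. Because digests are content-based, a downstream step's input digest matches the upstream step's output digest, which links the records into a lineage chain.

```python
import functools
import hashlib
import json
from datetime import datetime, timezone

PROVENANCE_LOG = []  # hypothetical in-memory provenance store


def _digest(value) -> str:
    """Short content hash of a JSON-serialisable value."""
    payload = json.dumps(value, sort_keys=True, default=str).encode()
    return hashlib.sha256(payload).hexdigest()[:12]


def capture_provenance(func):
    """Wrap a workflow step so each call appends a provenance record."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        PROVENANCE_LOG.append({
            "operation": func.__name__,
            "inputs": [_digest(a) for a in args]
                      + [f"{k}={_digest(v)}" for k, v in sorted(kwargs.items())],
            "output": _digest(result),
            "finished_at": datetime.now(timezone.utc).isoformat(),
        })
        return result
    return wrapper


@capture_provenance
def normalize(values):
    total = sum(values)
    return [v / total for v in values]


@capture_provenance
def top(values, k=1):
    return sorted(values, reverse=True)[:k]


out = top(normalize([1, 3]), k=1)
print(out)  # [0.75]
```

Recording digests rather than full values keeps the log small; it also illustrates the abstract's caveat, because a bare digest chain answers "what came from what" but not domain questions unless the records are annotated with meaning.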
A big data approach for climate change indicators processing in the CLIP-C project
NASA Astrophysics Data System (ADS)
D'Anca, Alessandro; Conte, Laura; Palazzo, Cosimo; Fiore, Sandro; Aloisio, Giovanni
2016-04-01
Defining and implementing processing chains with multiple (e.g. tens or hundreds of) data analytics operators can be a real challenge in many practical scientific use cases, such as climate change indicators. This is usually done via client-side scripts (e.g. bash) and requires climate scientists to implement and replicate workflow-like control logic (which may also be error-prone) in their scripts, alongside the expected application-level code. Moreover, the large volume of data and the heavy I/O demand pose additional performance challenges. In this regard, production-level tools for climate data analysis are mostly sequential, and there is a lack of big data analytics solutions implementing fine-grained data parallelism or adopting stronger parallel I/O strategies, data locality, workflow optimization, etc. High-level solutions leveraging workflow-enabled big data analytics frameworks for eScience could help scientists define and implement the workflows for their experiments through a more declarative, efficient and powerful approach. This talk will start by introducing the main needs and challenges of big data analytics workflow management for eScience and will then provide insights into the implementation of real use cases related to climate change indicators on large datasets produced in the context of the CLIP-C project, an EU FP7 project aiming to provide access to climate information of direct relevance to a wide variety of users, from scientists to policy makers and private-sector decision makers. All the proposed use cases have been implemented using the Ophidia big data analytics framework. The software stack includes an internal workflow management system, which coordinates, orchestrates, and optimises the execution of multiple scientific data analytics and visualization tasks.
Real-time monitoring of workflow execution is also supported through a graphical user interface. To address the challenges of the use cases, the implemented data analytics workflows include parallel data analysis, metadata management, virtual file system tasks, map generation, rolling of datasets, and import/export of datasets in NetCDF format. The use cases have been implemented on an HPC cluster of 8 nodes (16 cores/node) of the Athena Cluster available at the CMCC Supercomputing Centre. Benchmark results will also be presented during the talk.
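The contrast between hand-written control logic in client-side scripts and a declarative workflow can be sketched briefly. In the Python sketch below (Python 3.9+ for `graphlib`), the user only declares tasks and their prerequisites. A small runner then handles ordering and runs independent tasks concurrently. This is an assumption-laden illustration, not Ophidia's workflow interface; `run_workflow` and the task names are hypothetical.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter


def run_workflow(tasks, deps, max_workers=4):
    """Run a DAG of tasks.

    tasks: name -> zero-argument callable
    deps:  name -> set of prerequisite task names (all must be keys of `tasks`)
    Returns task names in completion order.
    """
    sorter = TopologicalSorter({name: deps.get(name, set()) for name in tasks})
    sorter.prepare()  # raises graphlib.CycleError if the dependencies loop
    order, running = [], {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Submit every task whose prerequisites have all finished.
            for name in sorter.get_ready():
                running[pool.submit(tasks[name])] = name
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for fut in done:
                name = running.pop(fut)
                fut.result()  # re-raise any exception from the task
                order.append(name)
                sorter.done(name)  # may unlock downstream tasks
    return order


log = []
names = ["import", "subset", "mean", "max", "export"]
tasks = {n: (lambda n=n: log.append(n)) for n in names}
deps = {"subset": {"import"}, "mean": {"subset"},
        "max": {"subset"}, "export": {"mean", "max"}}
order = run_workflow(tasks, deps)
print(order[0], order[-1])  # import export
```

Here `mean` and `max` have no dependency on each other, so the runner is free to execute them in parallel. A bash script would have had to encode that concurrency by hand.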
Zugaj, D; Chenet, A; Petit, L; Vaglio, J; Pascual, T; Piketty, C; Bourdes, V
2018-02-04
Currently, imaging technologies that can accurately assess or provide surrogate markers of the human cutaneous microvessel network are limited. Dynamic optical coherence tomography (D-OCT) allows the detection of blood flow in vivo and visualization of the skin microvasculature. However, image processing is necessary to correct images, filter artifacts, and exclude irrelevant signals. The objective of this study was to develop a novel image processing workflow to enhance the technical capabilities of D-OCT. Single-center, vehicle-controlled study including healthy volunteers aged 18-50 years. A capsaicin solution was applied topically to the subject's forearm to induce local inflammation. The capsaicin-induced increase in dermal blood flow within the region of interest was measured by laser Doppler imaging (LDI) (reference method) and by D-OCT. Sixteen subjects were enrolled. Using the image processing workflow, a good correlation was shown between D-OCT and LDI. D-OCT therefore offers an easy-to-use alternative to LDI, with good repeatability, new robust morphological features (dermal-epidermal junction localization), and quantification of the distribution of vessel sizes and of the changes in this distribution induced by capsaicin. Visualization of the vessel network was improved through block filtering and artifact removal. Moreover, the assessment of vessel size distribution allows a fine analysis of the vascular patterns. The newly developed image processing workflow enhances the technical capabilities of D-OCT for the accurate detection and characterization of microcirculation in the skin. A direct clinical application of this image processing workflow is the quantification of the effect of topical treatment on skin vascularization. © 2018 The Authors. Skin Research and Technology Published by John Wiley & Sons Ltd.
Extension of specification language for soundness and completeness of service workflow
NASA Astrophysics Data System (ADS)
Viriyasitavat, Wattana; Xu, Li Da; Bi, Zhuming; Sapsomboon, Assadaporn
2018-05-01
A Service Workflow is an aggregation of distributed services that fulfills specific functionalities. With the ever-increasing number of available services, methods for selecting services against given requirements have become major research subjects in multiple disciplines. A few researchers have contributed formal specification languages and methods for model checking; however, existing methods have difficulty handling the complexity of workflow compositions. In this paper, we propose to formalize the specification language to reduce the complexity of workflow composition. To this end, we extend a specification language with formal logic, so that effective theorems can be derived for verifying the syntax, semantics, and inference rules of workflow composition. The logic-based approach automates compliance checking effectively. The Service Workflow Specification (SWSpec) has been extended and formulated, and the soundness, completeness, and consistency of SWSpec applications have been verified; note that a logic-based SWSpec is mandatory for the development of model checking. The application of the proposed SWSpec is demonstrated with examples that address soundness, completeness, and consistency.
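To make logic-based compliance checking concrete, the following Python sketch evaluates requirements, written as small logical formulas, against the services in a workflow. It is a toy under loudly stated assumptions: the tuple formula encoding, the operators (`attr`, `le`, `not`, `and`, `or`), the `all`/`some` quantifier and the example services are all hypothetical. None of this is SWSpec's actual syntax or its inference rules.

```python
def holds(formula, service: dict) -> bool:
    """Evaluate a hypothetical requirement formula on one service's attributes."""
    op = formula[0]
    if op == "attr":   # ("attr", key, expected): attribute equals expected value
        return service.get(formula[1]) == formula[2]
    if op == "le":     # ("le", key, bound): numeric attribute <= bound
        return service.get(formula[1], float("inf")) <= formula[2]
    if op == "not":
        return not holds(formula[1], service)
    if op == "and":
        return all(holds(f, service) for f in formula[1:])
    if op == "or":
        return any(holds(f, service) for f in formula[1:])
    raise ValueError(f"unknown operator {op!r}")


def check_workflow(workflow, requirement, quantifier="all") -> bool:
    """Requirement must hold for every ('all') or at least one ('some') service."""
    results = [holds(requirement, s) for s in workflow]
    return all(results) if quantifier == "all" else any(results)


def violations(workflow, requirement):
    """Names of services that break the requirement (a counterexample list)."""
    return [s["name"] for s in workflow if not holds(requirement, s)]


workflow = [
    {"name": "auth", "provider": "A", "encrypted": True, "latency_ms": 20},
    {"name": "pay", "provider": "B", "encrypted": True, "latency_ms": 80},
    {"name": "notify", "provider": "B", "encrypted": False, "latency_ms": 5},
]
secure_and_fast = ("and", ("attr", "encrypted", True), ("le", "latency_ms", 100))
print(check_workflow(workflow, secure_and_fast), violations(workflow, secure_and_fast))
# False ['notify']
```

Returning the violating services, and not just a yes/no verdict, is what makes automated compliance checking useful when selecting replacement services.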
NASA Astrophysics Data System (ADS)
Callaghan, S.; Maechling, P. J.; Juve, G.; Vahi, K.; Deelman, E.; Jordan, T. H.
2015-12-01
The CyberShake computational platform, developed by the Southern California Earthquake Center (SCEC), is an integrated collection of scientific software and middleware that performs 3D physics-based probabilistic seismic hazard analysis (PSHA) for Southern California. CyberShake integrates large-scale and high-throughput research codes to produce probabilistic seismic hazard curves for individual locations of interest and hazard maps for an entire region. A recent CyberShake calculation produced about 500,000 two-component seismograms for each of 336 locations, resulting in over 300 million synthetic seismograms in a Los Angeles-area probabilistic seismic hazard model. CyberShake calculations require a series of scientific software programs. Early computational stages produce data used as inputs by later stages, so we describe CyberShake calculations using a workflow definition language. Scientific workflow tools automate and manage the input and output data and enable remote job execution on large-scale HPC systems. To meet the needs of broad-impact users of CyberShake data, such as seismologists, utility companies, and building code engineers, we successfully completed CyberShake Study 15.4 in April and May 2015, calculating a 1 Hz urban seismic hazard map for Los Angeles. We distributed the calculation between the NSF Track 1 system NCSA Blue Waters, the DOE Leadership-class system OLCF Titan, and USC's Center for High Performance Computing. This study ran for over 5 weeks, consuming about 1.1 million node-hours and producing over half a petabyte of data. The CyberShake Study 15.4 results doubled the maximum simulated seismic frequency from 0.5 Hz to 1.0 Hz compared with previous studies, representing a factor-of-16 increase in computational complexity. We will describe how our workflow tools supported splitting the calculation across multiple systems.
We will explain how we modified CyberShake software components, including GPU implementations and migrating from file-based communication to MPI messaging, to greatly reduce the I/O demands and node-hour requirements of CyberShake. We will also present performance metrics from CyberShake Study 15.4, and discuss challenges that producers of Big Data on open-science HPC resources face moving forward.
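Two of the CyberShake figures can be checked with short worked arithmetic. First, reading "over 300 million" as a count of single-component traces reconciles it with the 336 sites. Second, the factor of 16 follows if, as is usual for explicit 3D wave-propagation codes, grid spacing scales as 1/f in each of the three spatial dimensions and the stable time step shrinks in proportion to the grid spacing. Both readings are assumptions; the abstract does not spell them out.

```latex
N_{\text{2-comp}} \approx 336 \times 5\times10^{5} = 1.68\times10^{8},
\qquad
2\,N_{\text{2-comp}} \approx 3.36\times10^{8}\ \text{single-component traces}

\frac{C(1.0\,\text{Hz})}{C(0.5\,\text{Hz})}
  = \underbrace{\left(\tfrac{1.0}{0.5}\right)^{3}}_{\text{grid points}}
    \times \underbrace{\tfrac{1.0}{0.5}}_{\text{time steps}}
  = 2^{4} = 16
```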
NASA Astrophysics Data System (ADS)
Bastrakova, I.; Car, N.
2017-12-01
Geoscience Australia (GA) is recognised and respected as the National Repository and steward of multiple nationally significant data collections, and provides geoscience information, services and capability to the Australian Government, industry and stakeholders. Internally, this brings the challenge of managing a large volume (11 PB) of diverse and highly complex data distributed through a significant number of catalogues, applications, portals, virtual laboratories, and direct downloads from multiple locations. Externally, GA faces constant changes in Government regulations (e.g. open data and archival laws), growing stakeholder demands for high-quality and near-real-time delivery of data and products, and rapid technological advances enabling dynamic data access. The traditional approach of citing static data and products cannot satisfy the increasing demands for the results of scientific workflows, and the items within them, to be open, discoverable, trusted and reproducible. GA is therefore implementing citation of data, products, code and applications through provenance records. This approach involves capturing the provenance of many GA processes according to a standardised data model and storing it, together with metadata for the elements it references, in a searchable set of systems. This gives GA the ability to cite workflows unambiguously, as well as each item within each workflow, including inputs, outputs and many other registered components. Dynamic objects can therefore be referenced flexibly in relation to their generation process (a dataset's metadata indicates where to obtain its provenance), meaning that the relevant facts of its dynamism need not be crammed into a single citation object with a single set of attributes. This allows simple citations, similar to traditional static document citations such as journal references, to be used for complex dynamic data and other objects such as software code.
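The mechanism described above can be sketched in Python: each item's metadata points to the provenance record that generated it, and lineage is recovered by following those pointers. This is a minimal illustration, not GA's actual data model. The "used"/"generated" vocabulary loosely echoes W3C PROV terms, and whether GA's standardised model matches it is not stated here. Every identifier below (`prov:run-42`, `ds:gravity-grid`, `code:gridder-1.3`, and so on) is hypothetical.

```python
# Hypothetical provenance records: each activity lists what it used and generated.
PROVENANCE = {
    "prov:run-42": {"activity": "gravity-gridding",
                    "used": ["ds:raw-surveys", "code:gridder-1.3"],
                    "generated": ["ds:gravity-grid"]},
    "prov:run-7": {"activity": "survey-ingest",
                   "used": ["ds:field-logs"],
                   "generated": ["ds:raw-surveys"]},
}

# Hypothetical item metadata: a dataset only stores a pointer to its provenance.
METADATA = {
    "ds:gravity-grid": {"provenance": "prov:run-42"},
    "ds:raw-surveys": {"provenance": "prov:run-7"},
    "ds:field-logs": {},
    "code:gridder-1.3": {},
}


def lineage(item, seen=None):
    """All upstream items, found by following metadata -> provenance record -> 'used'."""
    seen = set() if seen is None else seen
    record_id = METADATA.get(item, {}).get("provenance")
    if record_id is None:
        return seen  # primary input: no generating activity recorded
    for upstream in PROVENANCE[record_id]["used"]:
        if upstream not in seen:
            seen.add(upstream)
            lineage(upstream, seen)
    return seen


def cite(item):
    """A simple, static-looking citation string that points at the dynamic history."""
    record_id = METADATA[item].get("provenance")
    if record_id is None:
        return item
    return f"{item} (generated by {PROVENANCE[record_id]['activity']}, provenance record {record_id})"


print(sorted(lineage("ds:gravity-grid")))
print(cite("ds:gravity-grid"))
```

The citation itself stays short and stable; everything dynamic about the dataset lives in the provenance graph that the citation points to.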
A microseismic workflow for managing induced seismicity risk as CO2 storage projects
DOE Office of Scientific and Technical Information (OSTI.GOV)
Matzel, E.; Morency, C.; Pyle, M.
2015-10-27
It is well established that fluid injection has the potential to induce earthquakes (from microseismicity to large, damaging events) by altering state-of-stress conditions in the subsurface. While induced seismicity has not been a major operational issue for carbon storage projects to date, a seismicity hazard exists and must be carefully addressed. Two essential components of effective seismic risk management are (1) sensitive microseismic monitoring and (2) robust data interpretation tools. This report describes a novel workflow, based on advanced processing algorithms applied to microseismic data, to help improve the management of seismic risk. This workflow has three main goals: (1) to improve the resolution and reliability of passive seismic monitoring, (2) to extract additional, valuable information from continuous waveform data that is often ignored in standard processing, and (3) to minimize the turnaround time between data collection, interpretation, and decision-making. Together, these objectives allow a better-informed and more rapid response to changing subsurface conditions.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pugmire, David; Kress, James; Choi, Jong
Data-driven science is becoming increasingly common and complex, and is placing tremendous stress on visualization and analysis frameworks. Data sources producing 10 GB per second (and more) are becoming commonplace in simulation, sensor and experimental sciences. These data sources, which are often distributed around the world, must be analyzed by teams of scientists who are also distributed. Enabling scientists to view, query and interact with such large volumes of data in near-real-time requires a rich fusion of visualization and analysis techniques, middleware and workflow systems. This paper discusses initial research into the visualization and analysis of distributed data workflows that enables scientists to make near-real-time decisions about large volumes of time-varying data.
Implementing CORAL: An Electronic Resource Management System
ERIC Educational Resources Information Center
Whitfield, Sharon
2011-01-01
A 2010 electronic resource management survey conducted by Maria Collins of North Carolina State University and Jill E. Grogg of University of Alabama Libraries found that the top six electronic resources management priorities included workflow management, communications management, license management, statistics management, administrative…
NASA Astrophysics Data System (ADS)
Fiore, Sandro; Płóciennik, Marcin; Doutriaux, Charles; Blanquer, Ignacio; Barbera, Roberto; Donvito, Giacinto; Williams, Dean N.; Anantharaj, Valentine; Salomoni, Davide D.; Aloisio, Giovanni
2017-04-01
In many scientific domains such as climate, data is often n-dimensional and requires tools that support specialized data types and primitives to be properly stored, accessed, analysed and visualized. Moreover, new challenges arise in large-scale scenarios and ecosystems where petabytes (PB) of data can be available and data can be distributed and/or replicated, such as the Earth System Grid Federation (ESGF) serving the Coupled Model Intercomparison Project, Phase 5 (CMIP5) experiment, which provides access to 2.5 PB of data for the Intergovernmental Panel on Climate Change (IPCC) Fifth Assessment Report (AR5). A case study on climate model intercomparison data analysis addressing several classes of multi-model experiments is being implemented in the context of the EU H2020 INDIGO-DataCloud project. Such experiments require the availability of large amounts of data (of multi-terabyte order) related to the output of several climate model simulations, as well as the exploitation of scientific data management tools for large-scale data analytics. More specifically, the talk discusses in detail a use case on precipitation trend analysis in terms of requirements, architectural design solution, and infrastructural implementation. The experiment has been tested and validated on CMIP5 datasets, in the context of a large-scale distributed testbed across the EU and US involving three ESGF sites (LLNL, ORNL, and CMCC) and one central orchestrator site (PSNC). In general, the case study addresses multi-model inter-comparison data analysis challenges on CMIP5 data made available through the IS-ENES/ESGF infrastructure.
The added value of the solution proposed in the INDIGO-DataCloud project can be summarized as follows: (i) it implements a different paradigm (from client-side to server-side); (ii) it intrinsically reduces data movement; (iii) it makes the end-user setup lightweight; (iv) it fosters re-usability (of data, final/intermediate products, workflows, sessions, etc.) since everything is managed on the server side; (v) it complements, extends and interoperates with the ESGF stack; (vi) it provides a "tool" for scientists to run multi-model experiments; and finally (vii) it can drastically reduce the time-to-solution for these experiments from weeks to hours. At the time of writing, the proposed testbed represents the first concrete implementation of a distributed multi-model experiment in the ESGF/CMIP context joining server-side and parallel processing, end-to-end workflow management and cloud computing. In contrast to the current scenario based on search and discovery, data download, and client-based data analysis, the INDIGO-DataCloud architectural solution described in this contribution addresses the scientific computing and analytics requirements through a paradigm shift based on server-side, high-performance big data frameworks together with two-level workflow management systems realized at the PaaS level via a cloud infrastructure.
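The data-movement argument for the server-side paradigm can be illustrated with a minimal sketch, assuming each data site holds a per-model series of annual precipitation totals. The function names, the site layout, and the sample values are hypothetical and are not taken from INDIGO-DataCloud; the sketch only shows that a server-side step can return one least-squares trend slope per model instead of shipping whole series to the client.

```python
from typing import Dict, List


def linear_trend(values: List[float]) -> float:
    """Ordinary least-squares slope of values against their index (units per step)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two points to fit a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    var = sum((x - mean_x) ** 2 for x in range(n))
    return cov / var


def server_side_trends(site_data: Dict[str, List[float]]) -> Dict[str, float]:
    """Hypothetical server-side step: only one float per model leaves the data site."""
    return {model: linear_trend(series) for model, series in site_data.items()}


# Hypothetical annual precipitation totals (mm) for two models held at one site.
site = {
    "model_a": [800.0, 810.0, 820.0, 830.0],
    "model_b": [900.0, 895.0, 890.0, 885.0],
}
print(server_side_trends(site))  # {'model_a': 10.0, 'model_b': -5.0}
```

With real multi-terabyte model output the same pattern holds: the reduction runs where the data lives, and the client receives a result whose size is independent of the input volume.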
A framework for service enterprise workflow simulation with multi-agents cooperation
NASA Astrophysics Data System (ADS)
Tan, Wenan; Xu, Wei; Yang, Fujun; Xu, Lida; Jiang, Chuanqun
2013-11-01
Dynamic process modelling for service businesses is the key technique for service-oriented information systems and service business management, and the workflow model of business processes is the core part of service systems. Service business workflow simulation is the prevalent approach for analysing service business processes dynamically. The generic method for service business workflow simulation is based on discrete-event queuing theory, which lacks flexibility and scalability. In this paper, we propose a service workflow-oriented framework for the process simulation of service businesses using multi-agent cooperation to address these issues. The social rationality of agents is introduced into the proposed framework. By adopting rationality as one social factor in decision-making strategies, flexible scheduling of activity instances has been implemented. A system prototype has been developed to validate the proposed simulation framework through a business case study.
Solutions for Mining Distributed Scientific Data
NASA Astrophysics Data System (ADS)
Lynnes, C.; Pham, L.; Graves, S.; Ramachandran, R.; Maskey, M.; Keiser, K.
2007-12-01
Researchers at the University of Alabama in Huntsville (UAH) and the Goddard Earth Sciences Data and Information Services Center (GES DISC) are working on approaches and methodologies facilitating the analysis of large amounts of distributed scientific data. Despite the existence of full-featured analysis tools, such as the Algorithm Development and Mining (ADaM) toolkit from UAH, and data repositories, such as the GES DISC, that provide online access to large amounts of data, there remain obstacles to getting the analysis tools and the data together in a workable environment. Does one bring the data to the tools or deploy the tools close to the data? The large size of many current Earth science datasets incurs significant overhead in network transfer for analysis workflows, even with the advanced networking capabilities that are available between many educational and government facilities. The UAH and GES DISC team are developing a capability to define analysis workflows using distributed services and online data resources. We are developing two solutions for this problem that address different analysis scenarios. The first is a Data Center Deployment of the analysis services for large data selections, orchestrated by a remotely defined analysis workflow. The second is a Data Mining Center approach of providing a cohesive analysis solution for smaller subsets of data. The two approaches can be complementary and thus provide flexibility for researchers to exploit the best solution for their data requirements. The Data Center Deployment of the analysis services has been implemented by deploying ADaM web services at the GES DISC so they can access the data directly, without the need for network transfers. Using the Mining Workflow Composer, a user can define an analysis workflow that is then submitted through a Web Services interface to the GES DISC for execution by a processing engine.
The workflow definition is composed, maintained and executed at a distributed location, but most of the actual services comprising the workflow are available local to the GES DISC data repository. Additional refinements will ultimately provide a package that is easily implemented and configured at additional data centers for analysis of additional science data sets. Enhancements to the ADaM toolkit allow the staging of distributed data wherever the services are deployed, to support a Data Mining Center that can provide additional computational resources, large storage of output, easier addition and updates to available services, and access to data from multiple repositories. The Data Mining Center case provides researchers more flexibility to quickly try different workflow configurations and refine the process, using smaller amounts of data that can feasibly be transferred from distributed online repositories. This environment is sufficient for some analyses, but can also be used as an initial sandbox to test and refine a solution before staging the execution at a Data Center Deployment. Detection of airborne dust both over water and land in MODIS imagery using mining services for both solutions will be presented. The dust detection is just one possible example of the mining and analysis capabilities the proposed mining services solutions will provide to the science community. More information about the available services and the current status of this project is available at http://www.itsc.uah.edu/mws/
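The choice between the two solutions above ("bring the data to the tools or deploy the tools close to the data") can be sketched as a simple transfer-time rule. This is a minimal illustration, not the UAH/GES DISC logic: the function names, the idealised bandwidth model (no protocol overhead), and the one-hour threshold are all hypothetical assumptions.

```python
def transfer_hours(data_gb: float, bandwidth_mbps: float) -> float:
    """Idealised transfer time: decimal gigabytes over a link in megabits per second."""
    if bandwidth_mbps <= 0:
        raise ValueError("bandwidth must be positive")
    megabits = data_gb * 8_000  # 1 GB = 8,000 megabits (decimal units)
    return megabits / bandwidth_mbps / 3_600


def choose_deployment(data_gb: float, bandwidth_mbps: float,
                      max_transfer_hours: float = 1.0) -> str:
    """Pick a strategy; the one-hour default threshold is a hypothetical policy."""
    if transfer_hours(data_gb, bandwidth_mbps) > max_transfer_hours:
        return "data_center_deployment"  # move the services to the data
    return "data_mining_center"          # stage the (smaller) data to the services


print(choose_deployment(500, 1_000))  # data_center_deployment (about 1.1 h)
print(choose_deployment(10, 1_000))   # data_mining_center (about 80 s)
```

In practice the threshold would also weigh storage at the mining center and how often a selection is reused, which this sketch ignores.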
Improving data collection, documentation, and workflow in a dementia screening study.
Read, Kevin B; LaPolla, Fred Willie Zametkin; Tolea, Magdalena I; Galvin, James E; Surkis, Alisa
2017-04-01
A clinical study team performing three multicultural dementia screening studies identified the need to improve data management practices and facilitate data sharing. A collaboration was initiated with librarians as part of the National Library of Medicine (NLM) informationist supplement program. The librarians identified areas for improvement in the studies' data collection, entry, and processing workflows. The librarians' role in this project was to meet needs expressed by the study team around improving data collection and processing workflows to increase study efficiency and ensure data quality. The librarians addressed the data collection, entry, and processing weaknesses through standardizing and renaming variables, creating an electronic data capture system using REDCap, and developing well-documented, reproducible data processing workflows. NLM informationist supplements provide librarians with valuable experience in collaborating with study teams to address their data needs. For this project, the librarians gained skills in project management, REDCap, and understanding of the challenges and specifics of a clinical research study. However, the time and effort required to provide targeted and intensive support for one study team was not scalable to the library's broader user community.
Scalable and cost-effective NGS genotyping in the cloud.
Souilmi, Yassine; Lancaster, Alex K; Jung, Jae-Yoon; Rizzo, Ettore; Hawkins, Jared B; Powles, Ryan; Amzazi, Saaïd; Ghazal, Hassan; Tonellato, Peter J; Wall, Dennis P
2015-10-15
While next-generation sequencing (NGS) costs have plummeted in recent years, the cost and complexity of computation remain substantial barriers to the use of NGS in routine clinical care. The clinical potential of NGS will not be realized until robust and routine whole genome sequencing data can be accurately rendered into medically actionable reports within a time window of hours and at a cost on the order of tens of dollars. We take a step towards addressing this challenge by using COSMOS, a cloud-enabled workflow management system, to develop GenomeKey, an NGS whole genome analysis workflow. COSMOS implements complex workflows that make optimal use of high-performance compute clusters. Here we show that the Amazon Web Services (AWS) implementation of GenomeKey via COSMOS provides a fast, scalable, and cost-effective analysis of both public benchmarking and large-scale heterogeneous clinical NGS datasets. Our systematic benchmarking reveals important new insights and considerations for optimizing whole genome analysis and workflow management toward clinical turnaround times, including strategic batching of individual genomes and efficient cluster resource configuration.
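"Strategic batching of individual genomes" can be pictured as a packing problem. The sketch below uses first-fit decreasing bin packing as a stand-in heuristic; it is an assumption for illustration, not the batching strategy GenomeKey or COSMOS actually uses, and the genome names, sizes, and batch capacity are hypothetical.

```python
from typing import Dict, List


def batch_genomes(sizes_gb: Dict[str, float], capacity_gb: float) -> List[List[str]]:
    """First-fit decreasing: place each genome (largest first) in the first batch with room."""
    batches: List[List[str]] = []
    loads: List[float] = []
    for name, size in sorted(sizes_gb.items(), key=lambda kv: kv[1], reverse=True):
        if size > capacity_gb:
            raise ValueError(f"{name} ({size} GB) exceeds batch capacity")
        for i, load in enumerate(loads):
            if load + size <= capacity_gb:
                batches[i].append(name)
                loads[i] += size
                break
        else:  # no existing batch had room: open a new one
            batches.append([name])
            loads.append(size)
    return batches


# Hypothetical input sizes (GB) and a hypothetical per-batch capacity.
sizes = {"g1": 90, "g2": 60, "g3": 50, "g4": 40, "g5": 30}
print(batch_genomes(sizes, 150))  # [['g1', 'g2'], ['g3', 'g4', 'g5']]
```

Each batch would then map onto one cluster allocation, which is where the "efficient cluster resource configuration" trade-off enters.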
Identifying impact of software dependencies on replicability of biomedical workflows.
Miksa, Tomasz; Rauber, Andreas; Mina, Eleni
2016-12-01
Complex data-driven experiments form the basis of biomedical research. Recent findings warn that the context in which the software is run, that is, the infrastructure and the third-party dependencies, can have a crucial impact on the final results delivered by a computational experiment. This implies that in order to replicate the same result, not only must the same data be used, but the experiment must also be run on an equivalent software stack. In this paper we present the VFramework, which enables assessing the replicability of workflows. It identifies whether any differences in software dependencies exist between two executions of the same workflow and whether they have an impact on the produced results. We also conduct a case study in which we investigate the impact of software dependencies on the replicability of Taverna workflows used in biomedical research on Huntington's disease. We re-execute the analysed workflows in environments differing in operating system distribution and configuration. The results show that the VFramework can be used to identify the impact of software dependencies on the replicability of biomedical workflows. Furthermore, we observe that even though the workflows are executed in a controlled environment, they still depend on specific tools installed in the environment. The context model used by the VFramework addresses the deficiencies of provenance traces by also documenting such tools. Based on our findings we define guidelines for workflow owners that enable them to improve the replicability of their workflows. Copyright © 2016 Elsevier Inc. All rights reserved.
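The first check described above (whether software dependencies differ between two executions) reduces to comparing two environment snapshots. This minimal sketch assumes each execution's environment has already been captured as a package-to-version map; the function name and the sample snapshots are hypothetical, and deciding whether a difference actually affects results would additionally require comparing the workflow outputs, which is not shown.

```python
from typing import Dict, Tuple

Versions = Dict[str, str]


def diff_dependencies(run_a: Versions, run_b: Versions) -> Dict[str, Tuple[str, str]]:
    """Return {package: (version_in_a, version_in_b)} for every package that differs.

    A package missing from one run is reported with the placeholder "<absent>".
    """
    changed = {}
    for pkg in sorted(set(run_a) | set(run_b)):
        a = run_a.get(pkg, "<absent>")
        b = run_b.get(pkg, "<absent>")
        if a != b:
            changed[pkg] = (a, b)
    return changed


# Hypothetical dependency snapshots from two executions of the same workflow.
ubuntu = {"java": "1.8.0", "perl": "5.22", "curl": "7.47"}
debian = {"java": "1.8.0", "perl": "5.24", "wget": "1.18"}
print(diff_dependencies(ubuntu, debian))
# {'curl': ('7.47', '<absent>'), 'perl': ('5.22', '5.24'), 'wget': ('<absent>', '1.18')}
```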
Flexible workflow sharing and execution services for e-scientists
NASA Astrophysics Data System (ADS)
Kacsuk, Péter; Terstyanszky, Gábor; Kiss, Tamas; Sipos, Gergely
2013-04-01
The sequence of computational and data manipulation steps required to perform a specific scientific analysis is called a workflow. Workflows that orchestrate data- and/or compute-intensive applications on Distributed Computing Infrastructures (DCIs) have recently become standard tools in e-science. At the same time, the broad and fragmented landscape of workflows and DCIs slows down the uptake of workflow-based work. The development, sharing, integration and execution of workflows remain a challenge for many scientists. The FP7 "Sharing Interoperable Workflow for Large-Scale Scientific Simulation on Available DCIs" (SHIWA) project significantly improved the situation with a simulation platform that connects different workflow systems, workflow languages, DCIs and workflows into a single, interoperable unit. The SHIWA Simulation Platform is a service package, already used by various scientific communities, and used as a tool by the recently started ER-flow FP7 project to expand the use of workflows among European scientists. The presentation will introduce the SHIWA Simulation Platform and the services that ER-flow provides, based on the platform, to space and earth science researchers. The SHIWA Simulation Platform includes:
1. SHIWA Repository: a database where workflows and metadata about workflows can be stored. The database is a central repository to discover and share workflows within and among communities.
2. SHIWA Portal: a web portal that is integrated with the SHIWA Repository and includes a workflow executor engine that can orchestrate various types of workflows on various grid and cloud platforms.
3. SHIWA Desktop: a desktop environment that provides access capabilities similar to those of the SHIWA Portal; however, it runs on the users' desktops/laptops instead of a portal server.
4. Workflow engines: the ASKALON, Galaxy, GWES, Kepler, LONI Pipeline, MOTEUR, Pegasus, P-GRADE, ProActive, Triana, Taverna and WS-PGRADE workflow engines are already integrated with the execution engine of the SHIWA Portal. Other engines can be added when required.
Through the SHIWA Portal one can define and run simulations on the SHIWA Virtual Organisation, an e-infrastructure that gathers computing and data resources from various DCIs, including the European Grid Infrastructure. The Portal, via third-party workflow engines, provides support for the most widely used academic workflow engines, and it can be extended with other engines on demand. Such extensions translate between workflow languages and facilitate the nesting of workflows into larger workflows even when those are written in different languages and require different interpreters for execution. Through the workflow repository and the portal, individual scientists and scientific collaborations can share and offer workflows for reuse and execution. Given the integrated nature of the SHIWA Simulation Platform, the shared workflows can be executed online, without installing any special client environment or downloading workflows. The FP7 "Building a European Research Community through Interoperable Workflows and Data" (ER-flow) project disseminates the achievements of the SHIWA project and uses these achievements to build workflow user communities across Europe. ER-flow provides application support to research communities within and beyond the project consortium to develop, share and run workflows with the SHIWA Simulation Platform.
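The idea of nesting workflows written in different languages, each run by its own engine, can be sketched as a registry that dispatches on workflow language. Everything here is hypothetical: the `Workflow` class, the registry, and the stand-in engines only record which engine ran which step. This is not the SHIWA Portal's interface.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Union


@dataclass
class Workflow:
    name: str
    language: str  # illustrative labels such as "taverna" or "kepler"
    steps: List[Union[str, "Workflow"]] = field(default_factory=list)


EngineFn = Callable[[str], str]


def make_engine(label: str) -> EngineFn:
    """Stand-in for a real engine: just records which engine ran which task."""
    return lambda task: f"{label}:{task}"


# Hypothetical registry mapping workflow languages to engines.
ENGINES: Dict[str, EngineFn] = {
    "taverna": make_engine("Taverna"),
    "kepler": make_engine("Kepler"),
}


def run(wf: Workflow) -> List[str]:
    """Run each step with the engine for wf.language; nested workflows recurse."""
    if wf.language not in ENGINES:
        raise KeyError(f"no engine registered for {wf.language!r}")
    engine = ENGINES[wf.language]
    log: List[str] = []
    for step in wf.steps:
        if isinstance(step, Workflow):
            log.extend(run(step))  # a sub-workflow may use a different engine
        else:
            log.append(engine(step))
    return log


inner = Workflow("fit", "kepler", ["regrid", "fit_model"])
outer = Workflow("study", "taverna", ["fetch", inner, "plot"])
print(run(outer))
# ['Taverna:fetch', 'Kepler:regrid', 'Kepler:fit_model', 'Taverna:plot']
```

Adding support for another language then amounts to registering one more engine, which mirrors the "other engines can be added when required" design.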
A UIMA wrapper for the NCBO annotator.
Roeder, Christophe; Jonquet, Clement; Shah, Nigam H; Baumgartner, William A; Verspoor, Karin; Hunter, Lawrence
2010-07-15
The Unstructured Information Management Architecture (UIMA) framework and web services are emerging as useful tools for integrating biomedical text mining tools. This note describes our work, which wraps the National Center for Biomedical Ontology (NCBO) Annotator, an ontology-based annotation service, to make it available as a component in UIMA workflows. This wrapper is freely available on the web at http://bionlp-uima.sourceforge.net/ as part of the UIMA tools distribution from the Center for Computational Pharmacology (CCP) at the University of Colorado School of Medicine. It has been implemented in Java for support on Mac OS X, Linux and MS Windows.
PGen: large-scale genomic variations analysis workflow and browser in SoyKB.
Liu, Yang; Khan, Saad M; Wang, Juexin; Rynge, Mats; Zhang, Yuanxun; Zeng, Shuai; Chen, Shiyuan; Maldonado Dos Santos, Joao V; Valliyodan, Babu; Calyam, Prasad P; Merchant, Nirav; Nguyen, Henry T; Xu, Dong; Joshi, Trupti
2016-10-06
With the advances in next-generation sequencing (NGS) technology and significant reductions in sequencing costs, it is now possible to sequence large collections of germplasm in crops for detecting genome-scale genetic variations and to apply the knowledge towards improvements in traits. To efficiently facilitate large-scale NGS resequencing data analysis of genomic variations, we have developed "PGen", an integrated and optimized workflow using the Extreme Science and Engineering Discovery Environment (XSEDE) high-performance computing (HPC) virtual system, iPlant cloud data storage resources and the Pegasus workflow management system (Pegasus-WMS). The workflow allows users to identify single nucleotide polymorphisms (SNPs) and insertion-deletions (indels), perform SNP annotations and conduct copy number variation analyses on multiple resequencing datasets in a user-friendly and seamless way. We have developed both a Linux version in GitHub ( https://github.com/pegasus-isi/PGen-GenomicVariations-Workflow ) and a web-based implementation of the PGen workflow integrated within the Soybean Knowledge Base (SoyKB), ( http://soykb.org/Pegasus/index.php ). Using PGen, we identified 10,218,140 single-nucleotide polymorphisms (SNPs) and 1,398,982 indels from analysis of 106 soybean lines sequenced at 15X coverage. In addition, 297,245 non-synonymous SNPs and 3330 copy number variation (CNV) regions were identified from this analysis. SNPs identified using PGen from additional soybean resequencing projects, adding up to 500+ soybean germplasm lines in total, have been integrated. These SNPs are being utilized for trait improvement using genotype to phenotype prediction approaches developed in-house. In order to browse and access NGS data easily, we have also developed an NGS resequencing data browser ( http://soykb.org/NGS_Resequence/NGS_index.php ) within SoyKB to provide easy access to SNP and downstream analysis results for soybean researchers.
PGen workflow has been optimized for the most efficient analysis of soybean data using thorough testing and validation. This research serves as an example of best practices for development of genomics data analysis workflows by integrating remote HPC resources and efficient data management with ease of use for biological users. PGen workflow can also be easily customized for analysis of data in other species.
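The distinction between SNPs and indels that PGen reports can be illustrated by classifying REF/ALT allele pairs. This is a deliberately simplified sketch and not PGen's code: it handles one ALT allele per record, ignores symbolic alleles, and labels equal-length multi-base changes as "MNP"; the sample calls are hypothetical.

```python
from typing import Dict, Iterable, Tuple


def classify_variant(ref: str, alt: str) -> str:
    """Simplified VCF-style classification of a single REF/ALT allele pair."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    if len(ref) != len(alt):
        return "indel"
    return "MNP"  # equal-length multi-base substitution


def count_variants(records: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    counts = {"SNP": 0, "indel": 0, "MNP": 0}
    for ref, alt in records:
        counts[classify_variant(ref, alt)] += 1
    return counts


# Hypothetical (REF, ALT) pairs; real VCF rows may list several ALT alleles.
calls = [("A", "G"), ("C", "T"), ("AT", "A"), ("G", "GCA"), ("AC", "GT")]
print(count_variants(calls))  # {'SNP': 2, 'indel': 2, 'MNP': 1}
```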
A semi-automated workflow for biodiversity data retrieval, cleaning, and quality control
Mathew, Cherian; Obst, Matthias; Vicario, Saverio; Haines, Robert; Williams, Alan R.; de Jong, Yde; Goble, Carole
2014-01-01
The compilation and cleaning of data needed for analyses and prediction of species distributions is a time-consuming process requiring a solid understanding of data formats and service APIs provided by biodiversity informatics infrastructures. We designed and implemented a Taverna-based Data Refinement Workflow which integrates taxonomic data retrieval, data cleaning, and data selection into a consistent, standards-based, and effective system hiding the complexity of underlying service infrastructures. The workflow can be freely used both locally and through a web portal which does not require additional software installations by users. PMID:25535486
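A few of the cleaning checks such a workflow performs on occurrence records can be sketched in plain code. The record fields (`species`, `lat`, `lon`), the specific rules, and the sample records are hypothetical assumptions; the actual Data Refinement Workflow relies on taxonomic services and is considerably richer.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def clean_occurrences(records: List[Record]) -> List[Record]:
    """Drop records lacking a name or valid coordinates, then remove exact duplicates."""
    seen = set()
    cleaned: List[Record] = []
    for rec in records:
        name, lat, lon = rec.get("species"), rec.get("lat"), rec.get("lon")
        if not name or lat is None or lon is None:
            continue  # incomplete record
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            continue  # coordinates outside valid ranges
        key = (name, lat, lon)
        if key in seen:
            continue  # duplicate occurrence
        seen.add(key)
        cleaned.append(rec)
    return cleaned


# Hypothetical raw occurrence records.
raw = [
    {"species": "Zostera marina", "lat": 57.7, "lon": 11.9},
    {"species": "Zostera marina", "lat": 57.7, "lon": 11.9},     # duplicate
    {"species": "Zostera marina", "lat": None, "lon": 11.9},     # missing latitude
    {"species": "Fucus vesiculosus", "lat": 95.0, "lon": 10.0},  # invalid latitude
    {"species": "Fucus vesiculosus", "lat": 59.3, "lon": 18.1},
]
print(len(clean_occurrences(raw)))  # 2
```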
Talkoot Portals: Discover, Tag, Share, and Reuse Collaborative Science Workflows
NASA Astrophysics Data System (ADS)
Wilson, B. D.; Ramachandran, R.; Lynnes, C.
2009-05-01
A small but growing number of scientists are beginning to harness Web 2.0 technologies, such as wikis, blogs, and social tagging, as a transformative way of doing science. These technologies provide researchers easy mechanisms to critique, suggest and share ideas, data and algorithms. At the same time, large suites of algorithms for science analysis are being made available as remotely-invokable Web Services, which can be chained together to create analysis workflows. This provides the research community an unprecedented opportunity to collaborate by sharing their workflows with one another, reproducing and analyzing research results, and leveraging colleagues' expertise to expedite the process of scientific discovery. However, wikis and similar technologies are limited to text, static images and hyperlinks, providing little support for collaborative data analysis. A team of information technology and Earth science researchers from multiple institutions have come together to improve community collaboration in science analysis by developing a customizable "software appliance" to build collaborative portals for Earth Science services and analysis workflows. The critical requirement is that researchers (not just information technologists) be able to build collaborative sites around service workflows within a few hours. We envision online communities coming together, much like Finnish "talkoot" (a barn raising), to build a shared research space. Talkoot extends a freely available, open source content management framework with a series of modules specific to Earth Science for registering, creating, managing, discovering, tagging and sharing Earth Science web services and workflows for science data processing, analysis and visualization. Users will be able to author a "science story" in shareable web notebooks, including plots or animations, backed up by an executable workflow that directly reproduces the science analysis. 
New services and workflows of interest will be discoverable using tag search, and advertised using "service casts" and "interest casts" (Atom feeds). Multiple science workflow systems will be plugged into the system, with initial support for UAH's Mining Workflow Composer and the open-source Active BPEL engine, and JPL's SciFlo engine and the VizFlow visual programming interface. With the ability to share and execute analysis workflows, Talkoot portals can be used to do collaborative science in addition to communicating ideas and results. It will be useful for different science domains, mission teams, research projects and organizations. Thus, it will help to solve the "sociological" problem of bringing together disparate groups of researchers, and the technical problem of advertising, discovering, developing, documenting, and maintaining inter-agency science workflows. The presentation will discuss the goals of and barriers to Science 2.0, the social web technologies employed in the Talkoot software appliance (e.g. CMS, social tagging, personal presence, advertising by feeds, etc.), illustrate the resulting collaborative capabilities, and show early prototypes of the web interfaces (e.g. embedded workflows).
Processing Approaches for DAS-Enabled Continuous Seismic Monitoring
NASA Astrophysics Data System (ADS)
Dou, S.; Wood, T.; Freifeld, B. M.; Robertson, M.; McDonald, S.; Pevzner, R.; Lindsey, N.; Gelvin, A.; Saari, S.; Morales, A.; Ekblaw, I.; Wagner, A. M.; Ulrich, C.; Daley, T. M.; Ajo Franklin, J. B.
2017-12-01
Distributed Acoustic Sensing (DAS) is creating a "field as laboratory" capability for seismic monitoring of subsurface changes. By providing unprecedented spatial and temporal sampling at a relatively low cost, DAS enables field-scale seismic monitoring to have durations and temporal resolutions that are comparable to those of laboratory experiments. Here we report on seismic processing approaches developed during data analyses of three case studies, all using DAS-enabled seismic monitoring, with applications ranging from shallow permafrost to deep reservoirs: (1) 10-hour downhole monitoring of cement curing at Otway, Australia; (2) 2-month surface monitoring of controlled permafrost thaw at Fairbanks, Alaska; (3) multi-month downhole and surface monitoring of carbon sequestration at Decatur, Illinois. We emphasize the data management and processing components relevant to DAS-based seismic monitoring, which include scalable approaches to data management, pre-processing, denoising, filtering, and wavefield decomposition. DAS has dramatically increased the data volume to the extent that terabyte-per-day data loads are now typical, straining conventional approaches to data storage and processing. To achieve more efficient use of disk space and network bandwidth, we explore improved file structures and data compression schemes. Because the noise floor of DAS measurements is higher than that of conventional sensors, optimal processing workflows involving advanced denoising, deconvolution (of the source signatures), and stacking approaches are being established to maximize the signal content of DAS data. The resulting workflow of data management and processing could accelerate the broader adoption of DAS for continuous monitoring of critical processes.
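Two quantitative points in this abstract, terabyte-per-day volumes and stacking to offset a higher noise floor, can be made concrete with simple arithmetic. The channel count, sampling rate, and sample width below are hypothetical, not values from the three case studies; the stacking formula assumes a coherent signal and uncorrelated random noise, under which SNR grows as the square root of the number of stacked traces.

```python
import math


def daily_volume_tb(channels: int, sample_rate_hz: float, bytes_per_sample: int) -> float:
    """Raw (uncompressed) DAS volume per day in decimal terabytes."""
    return channels * sample_rate_hz * bytes_per_sample * 86_400 / 1e12


def stacked_snr(single_snr: float, n_traces: int) -> float:
    """SNR after stacking n traces, assuming coherent signal and uncorrelated noise."""
    return single_snr * math.sqrt(n_traces)


# Hypothetical acquisition: 10,000 channels at 1 kHz with 4-byte samples.
print(daily_volume_tb(10_000, 1_000, 4))  # 3.456 (TB per day)
print(stacked_snr(0.5, 100))              # 5.0
```

Correlated noise, common in DAS recordings, breaks the square-root assumption, which is one reason the authors pair stacking with dedicated denoising.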
Towards Exascale Seismic Imaging and Inversion
NASA Astrophysics Data System (ADS)
Tromp, J.; Bozdag, E.; Lefebvre, M. P.; Smith, J. A.; Lei, W.; Ruan, Y.
2015-12-01
Post-petascale supercomputers are now available to solve complex scientific problems that were thought unreachable a few decades ago. They also bring a cohort of concerns tied to obtaining optimum performance. Several issues are currently being investigated by the HPC community. These include energy consumption, fault resilience, scalability of the current parallel paradigms, workflow management, I/O performance and feature extraction with large datasets. In this presentation, we focus on the last three issues. In the context of seismic imaging and inversion, in particular for simulations based on adjoint methods, workflows are well defined. They consist of a few collective steps (e.g., mesh generation or model updates) and a large number of independent steps (e.g., forward and adjoint simulations of each seismic event, pre- and postprocessing of seismic traces). The greater goal is to reduce the time to solution, that is, to obtain a more precise representation of the subsurface as fast as possible. This brings us to consider both the workflow in its entirety and the parts comprising it. The usual approach is to speed up the purely computational parts through code optimization in order to reach higher FLOPS and better memory management. This remains an important concern, but larger-scale experiments show that the imaging workflow suffers from severe I/O bottlenecks. Such limitations occur both for purely computational data and for seismic time series. The latter are dealt with by the introduction of a new Adaptable Seismic Data Format (ASDF). Parallel I/O libraries, namely HDF5 and ADIOS, are used to drastically reduce the cost of disk access. Parallel visualization tools, such as VisIt, are able to take advantage of ADIOS metadata to extract features and display massive datasets.
Because large parts of the workflow are embarrassingly parallel, we are investigating the possibility of automating the imaging process with the integration of scientific workflow management tools, specifically Pegasus.
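The workflow structure described above, many independent per-event simulations followed by a collective model update, can be sketched with a thread pool. This is a toy illustration under stated assumptions: `simulate_event` is a hypothetical placeholder returning a fake gradient, the step length is arbitrary, and in a real setting each simulation would be a separate HPC job scheduled by a workflow manager rather than a Python thread.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def simulate_event(event_id: int, model: List[float]) -> List[float]:
    """Hypothetical placeholder for one forward + adjoint simulation: a fake gradient."""
    return [0.01 * (event_id + 1) * m for m in model]


def iteration(model: List[float], events: List[int], workers: int = 4) -> List[float]:
    # Independent steps: one simulation per seismic event, run concurrently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        grads = list(pool.map(lambda e: simulate_event(e, model), events))
    # Collective step: sum the per-event gradients and update the model.
    total = [sum(g[i] for g in grads) for i in range(len(model))]
    step = 0.1  # hypothetical step length
    return [m - step * t for m, t in zip(model, total)]


print(iteration([1.0, 2.0], [0, 1, 2]))  # approximately [0.994, 1.988]
```

The embarrassingly parallel part scales with the number of events, while the collective step is a synchronization point; that split is what makes such workflows a good fit for automated workflow management.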
CMS distributed data analysis with CRAB3
NASA Astrophysics Data System (ADS)
Mascheroni, M.; Balcas, J.; Belforte, S.; Bockelman, B. P.; Hernandez, J. M.; Ciangottini, D.; Konstantinov, P. B.; Silva, J. M. D.; Ali, M. A. B. M.; Melo, A. M.; Riahi, H.; Tanasijczuk, A. J.; Yusli, M. N. B.; Wolf, M.; Woodard, A. E.; Vaandering, E.
2015-12-01
The CMS Remote Analysis Builder (CRAB) is a distributed workflow management tool which facilitates analysis tasks by isolating users from the technical details of the Grid infrastructure. Throughout LHC Run 1, CRAB was successfully employed by an average of 350 distinct users each week, executing about 200,000 jobs per day. CRAB has been significantly upgraded in order to face the new challenges posed by LHC Run 2. Components of the new system include 1) a lightweight client, 2) a central primary server which communicates with the clients through a REST interface, 3) secondary servers which manage user analysis tasks and submit jobs to the CMS resource provisioning system, and 4) a central service to asynchronously move user data from temporary storage at the execution site to the desired storage location. The new system improves the robustness, scalability and sustainability of the service. Here we provide an overview of the new system, operation, and user support, report on its current status, and identify lessons learned from the commissioning phase and production roll-out.
Drawert, Brian; Trogdon, Michael; Toor, Salman; Petzold, Linda; Hellander, Andreas
2016-01-01
Computational experiments using spatial stochastic simulations have led to important new biological insights, but they require specialized tools and a complex software stack, as well as large and scalable compute and data analysis resources due to the large computational cost associated with Monte Carlo computational workflows. The complexity of setting up and managing a large-scale distributed computation environment to support productive and reproducible modeling can be prohibitive for practitioners in systems biology. This results in a barrier to the adoption of spatial stochastic simulation tools, effectively limiting the type of biological questions addressed by quantitative modeling. In this paper, we present PyURDME, a new, user-friendly spatial modeling and simulation package, and MOLNs, a cloud computing appliance for distributed simulation of stochastic reaction-diffusion models. MOLNs is based on IPython and provides an interactive programming platform for development of sharable and reproducible distributed parallel computational experiments.
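The Monte Carlo character of these workflows (many independent stochastic realisations averaged afterwards) can be shown with Gillespie's direct method on a well-mixed birth-death model. This is a non-spatial simplification chosen for brevity; PyURDME itself simulates spatial reaction-diffusion models, and none of the names below are PyURDME or MOLNs APIs. The rate constants are hypothetical.

```python
import random


def gillespie_birth_death(k: float, gamma: float, t_end: float, seed: int) -> int:
    """Exact SSA for 0 -> X (rate k) and X -> 0 (rate gamma * x); returns X at t_end."""
    if k <= 0 or gamma < 0:
        raise ValueError("need k > 0 and gamma >= 0")
    rng = random.Random(seed)
    t, x = 0.0, 0
    while True:
        a_birth, a_death = k, gamma * x
        a_total = a_birth + a_death
        t += rng.expovariate(a_total)  # waiting time to the next reaction
        if t > t_end:
            return x
        if rng.random() * a_total < a_birth:
            x += 1
        else:
            x -= 1


def ensemble_mean(n_runs: int, **kwargs) -> float:
    """Independent realisations are trivially parallel; run serially here for simplicity."""
    return sum(gillespie_birth_death(seed=s, **kwargs) for s in range(n_runs)) / n_runs


# Prints a value near the stationary mean k / gamma = 10.
print(ensemble_mean(200, k=10.0, gamma=1.0, t_end=10.0))
```

Because each realisation depends only on its seed, the ensemble can be spread across cloud workers with no communication until the final reduction, which is the workload pattern MOLNs is built to host.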
2012-01-01
Background: Laboratories engaged in computational biology or bioinformatics frequently need to run lengthy, multistep, and user-driven computational jobs. Each job can tie up a computer for a few minutes to several days, and many laboratories lack the expertise or resources to build and maintain a dedicated computer cluster. Results: JobCenter is a client–server application and framework for job management and distributed job execution. The client and server components are both written in Java and are cross-platform and relatively easy to install. All communication with the server is client-driven, which allows worker nodes to run anywhere (even behind external firewalls or “in the cloud”) and provides inherent load balancing. Adding a worker node to the worker pool is as simple as dropping the JobCenter client files onto any computer and performing basic configuration, which provides tremendous ease-of-use, flexibility, and limitless horizontal scalability. Each worker installation may be independently configured, including the types of jobs it is able to run. Executed jobs may be written in any language and may include multistep workflows. Conclusions: JobCenter is a versatile and scalable distributed job management system that allows laboratories to very efficiently distribute all computational work among available resources. JobCenter is freely available at http://code.google.com/p/jobcenter/. PMID:22846423
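The client-driven design described above, where workers pull jobs of the types they are configured to run, can be sketched in-process with threads. The `JobServer` class, the `worker` loop, and the job types are hypothetical stand-ins, not JobCenter's Java API; a real worker would poll over the network and sleep between requests rather than exit when nothing is pending.

```python
import threading
from typing import Dict, List, Optional, Set, Tuple

Job = Tuple[str, str]  # (job_type, payload)


class JobServer:
    """Holds pending jobs; workers pull, so the server never opens connections to them."""

    def __init__(self) -> None:
        self._jobs: List[Job] = []
        self._lock = threading.Lock()

    def submit(self, job_type: str, payload: str) -> None:
        with self._lock:
            self._jobs.append((job_type, payload))

    def request_job(self, capabilities: Set[str]) -> Optional[Job]:
        """Hand out the oldest job this worker is configured to run, or None."""
        with self._lock:
            for i, (job_type, _) in enumerate(self._jobs):
                if job_type in capabilities:
                    return self._jobs.pop(i)
        return None


def worker(server: JobServer, name: str, capabilities: Set[str],
           done: Dict[str, List[str]]) -> None:
    # Client-driven loop: keep asking until nothing this worker can run remains.
    while (job := server.request_job(capabilities)) is not None:
        done.setdefault(name, []).append(job[1])


server = JobServer()
server.submit("search", "s1")
server.submit("blast", "b1")
server.submit("search", "s2")
done: Dict[str, List[str]] = {}
threads = [
    threading.Thread(target=worker, args=(server, "w1", {"search"}, done)),
    threading.Thread(target=worker, args=(server, "w2", {"blast"}, done)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(done)  # w1 ran s1 then s2; w2 ran b1 (key order may vary)
```

Because the server only answers requests, load balancing falls out naturally: faster workers simply ask more often.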
Damkliang, Kasikrit; Tandayya, Pichaya; Sangket, Unitsa; Pasomsub, Ekawat
2016-11-28
At present, coding sequences (CDS) continue to be discovered, and larger CDS datasets are being revealed frequently. Approaches and related tools have also been developed and upgraded concurrently, especially for phylogenetic tree analysis. This paper proposes an integrated automatic Taverna workflow for phylogenetic tree inference using public web services at the European Bioinformatics Institute (EMBL-EBI) and the Swiss Institute of Bioinformatics (SIB), together with our own locally deployed web services. The workflow input is a set of CDS in FASTA format. The workflow supports 1,000 to 20,000 bootstrap replicates. It infers trees with the Parsimony (PARS), Distance Matrix - Neighbor Joining (DIST-NJ), and Maximum Likelihood (ML) algorithms of the EMBOSS PHYLIPNEW package, based on our proposed Multiple Sequence Alignment (MSA) similarity score. The local web services are implemented and deployed in two forms, using Soaplab2 and Apache Axis2; these SOAP and Java Web Service (JWS) services provide WSDL endpoints to Taverna Workbench, a workflow manager. The workflow has been validated, its performance has been measured, and its results have been verified. Our workflow's execution time is less than ten minutes for inferring a tree with 10,000 bootstrap replicates. This paper proposes a new integrated automatic workflow that will benefit bioinformaticians with an intermediate level of knowledge and experience. All local services have been deployed at our portal http://bioservices.sci.psu.ac.th.
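The bootstrap replicates mentioned above are generated by resampling alignment columns with replacement (in PHYLIP this is the job of the SEQBOOT program). The sketch below shows only that resampling step on a hypothetical three-sequence alignment; it is not the workflow's own service, and each replicate would subsequently be passed to tree inference.

```python
import random
from typing import List


def bootstrap_alignment(alignment: List[str], rng: random.Random) -> List[str]:
    """One bootstrap replicate: resample alignment columns with replacement.

    The same column indices are used for every sequence, so aligned sites stay aligned.
    """
    n_cols = len(alignment[0])
    if any(len(seq) != n_cols for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    return ["".join(seq[c] for c in cols) for seq in alignment]


# Hypothetical aligned coding sequences.
aln = ["ATGCTA", "ATGCTG", "ACGTTA"]
rng = random.Random(42)
replicates = [bootstrap_alignment(aln, rng) for _ in range(1000)]
print(len(replicates), replicates[0])
```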
Damkliang, Kasikrit; Tandayya, Pichaya; Sangket, Unitsa; Pasomsub, Ekawat
2016-03-01
At present, coding sequences (CDS) are continually being discovered, and ever larger sets of CDS are revealed frequently. Approaches and related tools have been developed and upgraded concurrently, especially for phylogenetic tree analysis. This paper proposes an integrated, automatic Taverna workflow for phylogenetic tree inference that uses public-access web services at the European Bioinformatics Institute (EMBL-EBI) and the Swiss Institute of Bioinformatics (SIB), together with our own locally deployed web services. The workflow input is a set of CDS in FASTA format. The workflow supports 1,000 to 20,000 bootstrap replicates. It infers trees with the Parsimony (PARS), Distance Matrix - Neighbor Joining (DIST-NJ), and Maximum Likelihood (ML) algorithms of the EMBOSS PHYLIPNEW package, based on our proposed Multiple Sequence Alignment (MSA) similarity score. The local web services are implemented and deployed in two ways, using Soaplab2 and Apache Axis2, which provide SOAP and Java Web Service (JWS) WSDL endpoints to Taverna Workbench, a workflow manager. The workflow has been validated, its performance has been measured, and its results have been verified. Our workflow infers a tree with 10,000 bootstrap replicates in less than ten minutes. The proposed integrated, automatic workflow will benefit bioinformaticians with an intermediate level of knowledge and experience. All local services have been deployed at our portal http://bioservices.sci.psu.ac.th.
Shabo, Amnon; Peleg, Mor; Parimbelli, Enea; Quaglini, Silvana; Napolitano, Carlo
2016-12-07
Implementing a decision-support system within a healthcare organization requires integrating clinical domain knowledge with resource constraints. Computer-interpretable guidelines (CIG) are excellent instruments for addressing clinical aspects, while business process management (BPM) languages and workflow (Wf) engines manage the logistic and organizational constraints. Our objective is to orchestrate all the factors needed for the successful execution of patients' care pathways, especially when they span the continuum of care, from acute to community or home care. We considered three strategies for integrating CIGs with organizational workflows: extending the CIG language and engine, extending the BPM language and engine, or creating an interplay between the two. We used the interplay approach to implement a set of use cases arising from a CIG implementation in the domain of atrial fibrillation. To provide a more scalable and standards-based solution, we explored the use of the Integrating the Healthcare Enterprise (IHE) Cross-Enterprise Document Workflow (XDW) integration profile. We describe our proof-of-concept implementation of five use cases. We used the Personal Health Record (PHR) of the MobiGuide project to implement a loosely coupled approach between the Activiti BPM engine and the Picard CIG engine. Changes in the PHR were detected by polling. IHE profiles were used to develop workflow documents that orchestrate the cross-enterprise execution of cardioversion. Interplay between CIG and BPM engines can support the orchestration of care flows within organizational settings.
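The loose coupling through the PHR can be illustrated by a small polling sketch. Each engine never calls the other; it only reads changes from the shared record on its own schedule. The in-memory `PersonalHealthRecord` with a version counter, the `Poller` class, and the event names are hypothetical stand-ins. The actual MobiGuide PHR, Activiti, and Picard interfaces are not shown.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class PersonalHealthRecord:
    """Hypothetical shared store; each write bumps a monotonically increasing version."""
    version: int = 0
    entries: list[tuple[int, dict]] = field(default_factory=list)

    def write(self, entry: dict) -> None:
        self.version += 1
        self.entries.append((self.version, entry))

    def changes_since(self, version: int) -> list[tuple[int, dict]]:
        return [(v, e) for v, e in self.entries if v > version]


class Poller:
    """Couples one engine to the PHR: it only reads, and it remembers what it has already seen."""

    def __init__(self, phr: PersonalHealthRecord, on_change: Callable[[dict], None]) -> None:
        self.phr = phr
        self.on_change = on_change
        self.last_seen = 0

    def poll(self) -> int:
        """Deliver every entry written since the previous poll; return how many there were."""
        changes = self.phr.changes_since(self.last_seen)
        for version, entry in changes:
            self.on_change(entry)
            self.last_seen = version
        return len(changes)
```

In a deployment, `poll()` would run on a timer for each engine. The polling interval trades reaction latency against load on the PHR.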
Distributed Monitoring Infrastructure for Worldwide LHC Computing Grid
NASA Astrophysics Data System (ADS)
Andrade, P.; Babik, M.; Bhatt, K.; Chand, P.; Collados, D.; Duggal, V.; Fuente, P.; Hayashi, S.; Imamagic, E.; Joshi, P.; Kalmady, R.; Karnani, U.; Kumar, V.; Lapka, W.; Quick, R.; Tarragon, J.; Teige, S.; Triantafyllidis, C.
2012-12-01
The journey of a monitoring probe from its development phase to the moment its execution result is presented in an availability report is a complex process. It goes through multiple phases such as development, testing, integration, release, deployment, execution, data aggregation, computation, and reporting. Further, it involves people in different roles (developers, site managers, VO[1] managers, service managers, management), from different middleware providers (ARC[2], dCache[3], gLite[4], UNICORE[5] and VDT[6]), consortia (WLCG[7], EMI[11], EGI[15], OSG[13]), and operational teams (GOC[16], OMB[8], OTAG[9], CSIRT[10]). These distributed actors are seamlessly harmonized in the daily monitoring of the WLCG infrastructure. In this paper we describe the monitoring of the WLCG infrastructure from the operational perspective. We explain the complex journey of a monitoring probe from its execution on a grid node to its visualization on the MyWLCG[27] portal, where it is exposed to other clients. This monitoring workflow benefits from the interoperability established between the SAM[19] and RSV[20] frameworks. We show how these two distributed structures unite technologies and hide the complexity around them, making them easy for the community to use. Finally, we present the different supported deployment strategies, tailored not only for monitoring the entire infrastructure but also for monitoring sites and virtual organizations, and highlight the associated operational benefits.
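The aggregation and computation steps, which turn probe results into an availability figure, can be sketched as follows. This sketch rests on several assumptions, not on the WLCG algorithm itself. First, probes report Nagios-style statuses (exit codes 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). Second, each status holds until the next result. Third, availability is taken to be the up time divided by the time whose status is known, with OK and WARNING both counting as up.

```python
from __future__ import annotations

from enum import IntEnum


class Status(IntEnum):
    # Nagios plugin exit-code convention (assumed here for the probes).
    OK = 0
    WARNING = 1
    CRITICAL = 2
    UNKNOWN = 3


def availability(results: list[tuple[float, Status]], period_end: float) -> float | None:
    """Fraction of known time a service was up, from time-ordered (timestamp, status) results.

    Simplified, assumed definition: UNKNOWN time is excluded from the denominator.
    Returns None when no time in the period has a known status.
    """
    up = known = 0.0
    # Pair each result with the next one (or the end of the period) to get its duration.
    for (t, status), nxt in zip(results, results[1:] + [(period_end, None)]):
        duration = nxt[0] - t
        if status == Status.UNKNOWN:
            continue
        known += duration
        if status in (Status.OK, Status.WARNING):
            up += duration
    return up / known if known else None
```

Excluding UNKNOWN periods keeps a broken monitoring pipeline from being counted against the site.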
Wilk, Szymon; Kezadri-Hamiaz, Mounira; Rosu, Daniela; Kuziemsky, Craig; Michalowski, Wojtek; Amyot, Daniel; Carrier, Marc
2016-02-01
In healthcare organizations, clinical workflows are executed by interdisciplinary healthcare teams (IHTs) that operate in ways that are difficult to manage. Responding to a need to support such teams, we designed and developed the MET4 multi-agent system that allows IHTs to manage patients according to presentation-specific clinical workflows. In this paper, we describe a significant extension of the MET4 system that supports rich team dynamics (understood as team formation, management and task-practitioner allocation), including the selection and maintenance of the most responsible physician and more complex rules for selecting practitioners for workflow tasks. In order to develop this extension, we introduced three semantic components: (1) a revised ontology describing concepts and relations pertinent to IHTs, workflows, and managed patients, (2) a set of behavioral rules describing the team dynamics, and (3) an instance base that stores facts corresponding to instances of concepts from the ontology and to relations between these instances. The semantic components are represented in first-order logic, and they can be processed automatically using theorem-proving and model-finding techniques. We employ these techniques to find models that correspond to specific decisions controlling the dynamics of an IHT. In the paper, we present the design of the extended MET4 with a special focus on the new semantic components. We then describe its proof-of-concept implementation using the WADE multi-agent platform and the Z3 solver (theorem prover/model finder). We illustrate the main ideas discussed in the paper with a clinical scenario of an IHT managing a patient with chronic kidney disease.
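Model finding for task-practitioner allocation means searching for an assignment that satisfies every rule, or proving that none exists. The sketch below swaps a solver such as Z3 for plain exhaustive search, which is only practical for very small teams. Its two rules are hypothetical simplifications of MET4's behavioral rules: (1) a practitioner must hold the role a task requires, and (2) no practitioner may receive more than a given number of tasks.

```python
from __future__ import annotations

from itertools import product


def find_allocation(tasks: dict[str, str], practitioners: dict[str, set[str]], max_tasks: int) -> dict[str, str] | None:
    """Return a task -> practitioner mapping satisfying every rule, or None if no model exists.

    tasks: task name -> required role; practitioners: name -> roles held (hypothetical rules).
    """
    task_names = sorted(tasks)
    names = sorted(practitioners)
    # Enumerate every candidate assignment (exponential; a real model finder prunes instead).
    for choice in product(names, repeat=len(task_names)):
        allocation = dict(zip(task_names, choice))
        # Rule 1: each task goes to someone holding the required role.
        if not all(tasks[t] in practitioners[p] for t, p in allocation.items()):
            continue
        # Rule 2: workload cap per practitioner.
        if any(choice.count(p) > max_tasks for p in names):
            continue
        return allocation
    return None
```

Returning `None` corresponds to the solver reporting that the rules are unsatisfiable for the current team, which in MET4's terms would signal that the team composition must change.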
Applications of the pipeline environment for visual informatics and genomics computations
2011-01-01
Background Contemporary informatics and genomics research require efficient, flexible and robust management of large heterogeneous data, advanced computational tools, powerful visualization, reliable hardware infrastructure, interoperability of computational resources, and detailed data and analysis-protocol provenance. The Pipeline is a client-server distributed computational environment that facilitates the graphical construction, execution, monitoring, validation and dissemination of advanced data analysis protocols. Results This paper reports on the applications of the LONI Pipeline environment to address two informatics challenges - graphical management of diverse genomics tools, and the interoperability of informatics software. Specifically, this manuscript presents the concrete details of deploying general informatics suites and individual software tools to new hardware infrastructures, the design, validation and execution of new visual analysis protocols via the Pipeline graphical interface, and the integration of diverse informatics tools via the Pipeline eXtensible Markup Language syntax. We demonstrate each of these processes using several established informatics packages (e.g., miBLAST, EMBOSS, mrFAST, GWASS, MAQ, SAMtools, Bowtie) for basic local sequence alignment and search, molecular biology data analysis, and genome-wide association studies. These examples demonstrate the power of the Pipeline graphical workflow environment to enable the integration of bioinformatics resources that provide a well-defined syntax for dynamic specification of the input/output parameters and the run-time execution controls. Conclusions The LONI Pipeline environment http://pipeline.loni.ucla.edu provides a flexible graphical infrastructure for efficient biomedical computing and distributed informatics research. The interactive Pipeline resource manager enables the utilization and interoperability of diverse types of informatics resources.
The Pipeline client-server model provides computational power to a broad spectrum of informatics investigators: experienced developers and novice users, users with or without access to advanced computational resources (e.g., Grid, data), as well as basic and translational scientists. The open development, validation and dissemination of computational networks (pipeline workflows) facilitate the sharing of knowledge, tools, protocols and best practices, and enable the unbiased validation and replication of scientific findings by the entire community. PMID:21791102
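The core idea behind executing a graphical pipeline is to run each module only after the modules it depends on, passing their outputs downstream. A generic sketch of that idea, using Python's standard `graphlib`, follows. It does not use the LONI Pipeline XML format or its execution engine. The `run_pipeline` function and the module names in the usage example are hypothetical.

```python
from __future__ import annotations

from graphlib import TopologicalSorter
from typing import Callable

# A module receives {upstream module name: that module's outputs} and returns its own outputs.
Module = Callable[[dict], dict]


def run_pipeline(modules: dict[str, Module], depends_on: dict[str, set[str]]) -> dict[str, dict]:
    """Run every module once, after all of its upstream modules, feeding it their outputs."""
    sorter = TopologicalSorter()
    for name in modules:
        sorter.add(name, *depends_on.get(name, set()))
    outputs: dict[str, dict] = {}
    for name in sorter.static_order():  # raises graphlib.CycleError on circular dependencies
        upstream = {dep: outputs[dep] for dep in depends_on.get(name, set())}
        outputs[name] = modules[name](upstream)
    return outputs
```

In a usage example, an "align" module might produce a BAM file name, a "sort" module would depend on it, and a "report" module on "sort". The dependency sets alone determine the execution order, whatever order the modules are listed in.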
NASA Astrophysics Data System (ADS)
Barreiro, F. H.; Borodin, M.; De, K.; Golubkov, D.; Klimentov, A.; Maeno, T.; Mashinistov, R.; Padolski, S.; Wenaus, T.; ATLAS Collaboration
2017-10-01
The second generation of the ATLAS Production System, called ProdSys2, is a distributed workload manager that runs hundreds of thousands of jobs daily, from dozens of different ATLAS-specific workflows, across more than a hundred heterogeneous sites. It achieves high utilization by combining dynamic job definition based on many criteria, such as input and output size, memory requirements and CPU consumption, with manageable scheduling policies, and by supporting different kinds of computational resources, such as grids, clouds, supercomputers and volunteer computers. The system dynamically assigns a group of jobs (a task) to a group of geographically distributed computing resources. Dynamic assignment and resource utilization is one of the major features of the system; it did not exist in the earliest versions of the production system, where the Grid resource topology was predefined using national and/or geographical patterns. The Production System has a sophisticated job fault-recovery mechanism, which allows multi-terabyte tasks to run efficiently without human intervention. We have implemented a “train” model and open-ended production, which allow tasks to be submitted automatically as soon as a new set of data is available and allow physics groups' data processing and analysis to be chained with the experiment's central production. We present an overview of the ATLAS Production System and the features and architecture of its major components: task definition, web user interface and monitoring. We describe the important design decisions and lessons learned from operational experience during the first year of LHC Run 2. We also report the performance of the designed system and how various workflows, such as data (re)processing, Monte Carlo and physics group production, and user analysis, are scheduled and executed within one production system on heterogeneous computing resources.
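One way such a fault-recovery mechanism can avoid human intervention is to retry failed jobs automatically and adapt their resource requests between attempts. The sketch below illustrates that general pattern only. It is not ProdSys2's actual policy. The `JobSpec` fields, the outcome codes "ok" and "oom", the attempt limit, and the memory growth factor are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class JobSpec:
    name: str
    memory_mb: int
    attempt: int = 0


def run_with_recovery(job: JobSpec, execute: Callable[[JobSpec], str],
                      max_attempts: int = 3, memory_factor: float = 1.5) -> tuple[str, JobSpec]:
    """Retry a failed job automatically, raising its memory request after out-of-memory failures.

    `execute` returns "ok", "oom", or any other failure code (hypothetical codes).
    Returns the final outcome and the job as last submitted.
    """
    while True:
        job.attempt += 1
        outcome = execute(job)
        if outcome == "ok" or job.attempt >= max_attempts:
            return outcome, job
        if outcome == "oom":
            # Resubmit with a larger memory request; other failures are retried unchanged.
            job.memory_mb = int(job.memory_mb * memory_factor)
```

Capping the number of attempts keeps a systematically broken job from consuming resources indefinitely. Such a job surfaces with its last failure code for human follow-up.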
Enterprise-wide worklist management.
Locko, Roberta C; Blume, Hartwig; Goble, John C
2002-01-01
Radiologists in multi-facility health care delivery networks must serve not only their own departments but also the departments of associated clinical facilities. We describe our experience with a picture archiving and communication system (PACS) implementation that provides a dynamic view of the relevant radiological workload across multiple facilities. We implemented a distributed query system that permits management of enterprise worklists based on modality, body part, exam status, and other criteria spanning multiple compatible PACSs. Dynamic worklists with less flexibility can also be constructed across incompatible PACSs, provided those systems support specific DICOM functionality. Enterprise-wide worklists were implemented across the Generations Plus/Northern Manhattan Health Network, linking the radiology departments of three hospitals (Harlem, Lincoln, and Metropolitan) with 1465 beds and 4260 ambulatory patients per day. Enterprise-wide, dynamic worklist management improves the utilization of radiologists and enhances the quality of care across large multi-facility health care delivery organizations. Integration of other workflow-related components remains a significant challenge.
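The distributed query can be pictured as running the same filter against every facility's study list and merging the matches into one time-ordered worklist. A minimal sketch follows. The `Study` record fields and the `enterprise_worklist` function are hypothetical. In practice each per-facility list would come from a query against that PACS (for example, a DICOM C-FIND), which is not shown here.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Study:
    facility: str
    accession: str
    modality: str
    body_part: str
    status: str
    scheduled: str  # ISO 8601 timestamp in one fixed format, so string order equals time order


def enterprise_worklist(pacs_sources: dict[str, list[Study]], **criteria: str) -> list[Study]:
    """Query every facility's study list, keep studies matching all criteria, and merge by time."""
    matches = [
        study
        for studies in pacs_sources.values()
        for study in studies
        if all(getattr(study, key) == value for key, value in criteria.items())
    ]
    return sorted(matches, key=lambda s: s.scheduled)
```

A call such as `enterprise_worklist(sources, modality="CT", status="unread")` gives a radiologist one queue spanning all hospitals, instead of one queue per PACS.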
An ontological knowledge framework for adaptive medical workflow.
Dang, Jiangbo; Hedayati, Amir; Hampel, Ken; Toklu, Candemir
2008-10-01
As emerging technologies, the semantic Web and SOA (Service-Oriented Architecture) allow a BPMS (Business Process Management System) to automate business processes that can be described as services, which in turn can be used to wrap existing enterprise applications. A BPMS provides tools and methodologies to compose Web services that can be executed as business processes and monitored by BPM (Business Process Management) consoles. Ontologies are formal, declarative knowledge representation models. They provide a foundation upon which machine-understandable knowledge can be obtained, and as a result they make machine intelligence possible. Healthcare systems can adopt these technologies to become ubiquitous, adaptive, and intelligent, and thereby serve patients better. This paper presents an ontological knowledge framework that covers the healthcare domains a hospital encompasses, from medical or administrative tasks to hospital assets, medical insurance, patient records, drugs, and regulations. Our ontology thus makes our vision of personalized healthcare possible by capturing all the knowledge necessary for a complex personalized healthcare scenario involving patient care, insurance policies, drug prescriptions, and compliance. For example, our ontology enables a workflow management system to let users, from physicians to administrative assistants, manage and even create new context-aware medical workflows and execute them on the fly.
Task Management in the New ATLAS Production System
NASA Astrophysics Data System (ADS)
De, K.; Golubkov, D.; Klimentov, A.; Potekhin, M.; Vaniachine, A.; Atlas Collaboration
2014-06-01
This document describes the design of the new Production System of the ATLAS experiment at the LHC [1]. The Production System is the top-level workflow manager, which translates physicists' needs for production-level processing and analysis into actual workflows executed across more than a hundred Grid sites used globally by ATLAS. Because the production workload has increased in volume and complexity in recent years (the ATLAS production task count is above one million, with each task containing hundreds or thousands of jobs), the Production System needs to be upgraded to meet the challenging requirements of the next LHC run while minimizing operating costs. In the new design, the main subsystems are the Database Engine for Tasks (DEFT) and the Job Execution and Definition Interface (JEDI). Based on users' requests, DEFT manages interdependent groups of tasks (Meta-Tasks) and generates the corresponding data processing workflows. The JEDI component then dynamically translates the task definitions from DEFT into actual workload jobs executed in the PanDA Workload Management System [2]. We present the requirements, design parameters, basics of the object model and concrete solutions used in building the new Production System and its components.
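Translating a task definition into jobs involves partitioning the task's input into job-sized pieces. The sketch below shows one simple criterion only: greedy packing of input files so that each job's total input size stays under a limit. The actual JEDI logic weighs more factors, such as memory and CPU consumption. The function `split_task` and its parameters are hypothetical.

```python
from __future__ import annotations


def split_task(input_files: dict[str, int], max_bytes_per_job: int) -> list[list[str]]:
    """Group a task's input files (name -> size) into jobs whose total input stays under a limit.

    Files are packed greedily in the given order; a file larger than the limit gets a job of its own.
    """
    jobs: list[list[str]] = []
    current: list[str] = []
    current_size = 0
    for name, size in input_files.items():
        # Close the current job if adding this file would exceed the limit.
        if current and current_size + size > max_bytes_per_job:
            jobs.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        jobs.append(current)
    return jobs
```

Because the limit is a parameter, the same task can be re-split dynamically, for example into smaller jobs for resources with less scratch space, without changing the task definition.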
Lange, Karin; Ziegler, Ralph; Neu, Andreas; Reinehr, Thomas; Daab, Iris; Walz, Marion; Maraun, Michael; Schnell, Oliver; Kulzer, Bernhard; Reichel, Andreas; Heinemann, Lutz; Parkin, Christopher G; Haak, Thomas
2015-03-01
Use of continuous subcutaneous insulin infusion (CSII) therapy improves glycemic control, reduces hypoglycemia and increases treatment satisfaction in individuals with diabetes. Because a number of patient- and clinician-related factors can hinder the effectiveness and optimal use of CSII therapy, new approaches are needed to address these obstacles. Ceriello and colleagues recently proposed a model of care that incorporates the collaborative use of structured self-monitoring of blood glucose (SMBG) into a formal approach to personalized diabetes management across all diabetes populations. We adapted this model for CSII-treated patients in order to implement a workflow structure that enhances patient-physician communication and supports patients' diabetes self-management skills. We recognize that time constraints and current reimbursement policies pose significant challenges to healthcare providers integrating the Personalised Diabetes Management (PDM) process into clinical practice. We believe, however, that the time invested in modifying practice workflow and learning to apply the various steps of the PDM process will be offset by improved workflow and more effective patient consultations. This article describes how to implement PDM in clinical practice as a systematic, standardized process that can optimize CSII therapy.
An ontology-based framework for bioinformatics workflows.
Digiampietri, Luciano A; Perez-Alcazar, Jose de J; Medeiros, Claudia Bauzer
2007-01-01
The proliferation of bioinformatics activities brings new challenges - how to understand and organise these resources, how to exchange and reuse successful experimental procedures, and how to provide interoperability among data and tools. This paper describes an effort in these directions. It combines research on ontology management, AI and scientific workflows to design, reuse and annotate bioinformatics experiments. The resulting framework supports automatic or interactive composition of tasks based on AI planning techniques and takes advantage of ontologies to support the specification and annotation of bioinformatics workflows. We validate our proposal with a prototype running on real data.
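Automatic composition by AI planning can be illustrated as a search over the data types a workflow has available. Each service is an action whose preconditions are its input types and whose effects are its output types. The sketch below uses plain breadth-first forward search to find a shortest service chain that reaches a goal type. It is an illustration of the general technique, not the planner or ontology used in the paper. The service and type names in the usage example are hypothetical.

```python
from __future__ import annotations

from collections import deque


def compose(services: dict[str, tuple[frozenset[str], frozenset[str]]],
            available: set[str], goal: str) -> list[str] | None:
    """Breadth-first forward planning: find a shortest sequence of services that yields `goal`.

    services: name -> (required input types, produced output types).
    Returns the ordered service names, [] if the goal is already available, or None if unreachable.
    """
    start = frozenset(available)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, plan = queue.popleft()
        if goal in state:
            return plan
        for name, (inputs, outputs) in sorted(services.items()):
            # A service is applicable when its inputs exist and it would add something new.
            if inputs <= state and not outputs <= state:
                nxt = state | outputs
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, plan + [name]))
    return None
```

In the paper's setting, an ontology would let "available" and "required" types match by subsumption rather than by exact name, which this sketch does not attempt.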
NASA Astrophysics Data System (ADS)
Peer, Regina; Peer, Siegfried; Sander, Heike; Marsolek, Ingo; Koller, Wolfgang; Pappert, Dirk; Hierholzer, Johannes
2002-05-01
If new technology is introduced into medical practice, it must prove that it makes a difference. However, traditional approaches to outcome analysis have failed to show a direct benefit of PACS on patient care, and its economic benefits are still debated. A participatory process analysis was performed to compare workflow in a film-based hospital and a PACS environment. This included direct observation of work processes, interviews with involved staff, structural analysis, and discussion of observations with staff members. After common structures were defined, strong and weak workflow steps were evaluated. With a common workflow structure in both hospitals, the benefits of PACS were revealed in workflow steps related to image reporting, with simultaneous image access for ICU physicians and radiologists, the archiving of images, and image and report distribution. However, PACS alone cannot cover the complete process of 'radiography for intensive care', from the ordering of an image to the provision of the final product (image + report). Interference of the electronic workflow with analogue process steps, such as paper-based ordering, reduces the potential benefits of PACS. In this regard, workflow modeling proved very helpful for evaluating complex work processes linking radiology and the ICU.
Teaching Workflow Analysis and Lean Thinking via Simulation: A Formative Evaluation
Campbell, Robert James; Gantt, Laura; Congdon, Tamara
2009-01-01
This article presents the rationale for the design and development of a video simulation used to teach lean thinking and workflow analysis to health services and health information management students enrolled in a course on the management of health information. The discussion includes a description of the design process, a brief history of the use of simulation in healthcare, and an explanation of how video simulation can be used to generate experiential learning environments. Based on the results of a survey given to 75 students as part of a formative evaluation, the video simulation was judged effective because it allowed students to visualize a real-world process (concrete experience), contemplate the scenes depicted in the video along with the concepts presented in class in a risk-free environment (reflection), develop hypotheses about why problems occurred in the workflow process (abstract conceptualization), and develop solutions to redesign a selected process (active experimentation). PMID:19412533
Jflow: a workflow management system for web applications.
Mariette, Jérôme; Escudié, Frédéric; Bardou, Philippe; Nabihoudine, Ibouniyamine; Noirot, Céline; Trotard, Marie-Stéphane; Gaspin, Christine; Klopp, Christophe
2016-02-01
Biologists produce large data sets and need rich yet simple web portals in which they can upload and analyze their files. Providing such tools requires masking the complexity induced by the necessary High Performance Computing (HPC) environment. The connection between the interface and the computing infrastructure is usually specific to each portal. With Jflow, we introduce a Workflow Management System (WMS) composed of jQuery plug-ins, which can easily be embedded in any web application, and a Python library providing all the features required to set up, run and monitor workflows. Jflow is available under the GNU General Public License (GPL) at http://bioinfo.genotoul.fr/jflow. The package comes with full documentation, a quick start guide and a running test portal. Contact: Jerome.Mariette@toulouse.inra.fr. © The Author 2015. Published by Oxford University Press.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Garzoglio, Gabriele
The Fermilab Grid and Cloud Computing Department and the KISTI Global Science experimental Data hub Center are working on a multi-year Collaborative Research and Development Agreement. In the first year, we developed knowledge of how to provision and manage a federation of virtual machines through cloud management systems. In this second year, we expanded the work on provisioning and federation, increasing both the scale and diversity of solutions, and we started to build on-demand services on the established fabric, introducing the Platform as a Service paradigm to assist with the execution of scientific workflows. We have enabled the scientific workflows of stakeholders to run on multiple cloud resources at the scale of 1,000 concurrent machines. The demonstrations have been in the areas of (a) Virtual Infrastructure Automation and Provisioning, (b) Interoperability and Federation of Cloud Resources, and (c) On-demand Services for Scientific Workflows.
Architectures Toward Reusable Science Data Systems
NASA Astrophysics Data System (ADS)
Moses, J. F.
2014-12-01
Science Data Systems (SDS) comprise an important class of data processing systems that support product generation from remote sensors and in-situ observations. These systems enable research into new science data products, replication of experiments and verification of results. NASA has been building ground systems for satellite data processing since the first Earth observing satellites launched and is continuing development of systems to support NASA science research, NOAA's weather satellites and USGS's Earth observing satellite operations. The basic data processing workflows and scenarios continue to be valid for remote sensor observations research as well as for the complex multi-instrument operational satellite data systems being built today. System functions such as ingest, product generation and distribution need to be configured and performed in a consistent and repeatable way with an emphasis on scalability. This paper will examine the key architectural elements of several NASA satellite data processing systems currently in operation and under development that make them suitable for scaling and reuse. Examples of architectural elements that have become attractive include virtual machine environments, standard data product formats, metadata content and file naming, workflow and job management frameworks, data acquisition, search, and distribution protocols. By highlighting key elements and implementation experience, we aim to identify architectures that will outlast their original application and be readily adaptable for new applications. Concepts and principles are explored that lead to sound guidance for SDS developers and strategists.
Architectures Toward Reusable Science Data Systems
NASA Technical Reports Server (NTRS)
Moses, John
2015-01-01
Science Data Systems (SDS) comprise an important class of data processing systems that support product generation from remote sensors and in-situ observations. These systems enable research into new science data products, replication of experiments and verification of results. NASA has been building systems for satellite data processing since the first Earth observing satellites launched and is continuing development of systems to support NASA science research and NOAA's Earth observing satellite operations. The basic data processing workflows and scenarios continue to be valid for remote sensor observations research as well as for the complex multi-instrument operational satellite data systems being built today. System functions such as ingest, product generation and distribution need to be configured and performed in a consistent and repeatable way with an emphasis on scalability. This paper will examine the key architectural elements of several NASA satellite data processing systems currently in operation and under development that make them suitable for scaling and reuse. Examples of architectural elements that have become attractive include virtual machine environments, standard data product formats, metadata content and file naming, workflow and job management frameworks, data acquisition, search, and distribution protocols. By highlighting key elements and implementation experience, we expect to find architectures that will outlast their original application and be readily adaptable for new applications. Concepts and principles are explored that lead to sound guidance for SDS developers and strategists.
Towbin, Alexander J; Hall, Seth; Moskovitz, Jay; Johnson, Neil D; Donnelly, Lane F
2011-01-01
Communication of acute or critical results between the radiology department and referring clinicians has been a deficiency of many radiology departments. The failure to perform or document these communications can lead to poor patient care, patient safety issues, medical-legal issues, and complaints from referring clinicians. To mitigate these factors, a communication and documentation tool was created and incorporated into our departmental customer service program. This article describes the implementation of a comprehensive customer service program in a hospital-based radiology department. A comprehensive customer service program was created in the radiology department. Customer service representatives were hired to answer the telephone calls to the radiology reading rooms and to help convey radiology results. The radiologists, referring clinicians, and customer service representatives were then linked via a novel workflow management system. This workflow management system provided tools to help facilitate the communication needs of each group. The number of studies with results conveyed was recorded starting from the implementation of the workflow management system. Between the implementation of the workflow management system on August 1, 2005, and June 1, 2009, 116,844 radiology results were conveyed to the referring clinicians and documented in the system. This accounts for more than 14% of the 828,516 radiology cases performed in this time frame. We have been successful in creating a comprehensive customer service program to convey and document communication of radiology results. This program has been widely used by the ordering clinicians as well as radiologists since its inception.
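A quick check of the reported proportion, using only the two counts given above:

```latex
\frac{116{,}844}{828{,}516} \approx 0.141 = 14.1\%
```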
Multi-core processing and scheduling performance in CMS
NASA Astrophysics Data System (ADS)
Hernández, J. M.; Evans, D.; Foulkes, S.
2012-12-01
Commodity hardware is going many-core. We may soon be unable to satisfy per-core job memory needs with the current single-core processing model in High Energy Physics. In addition, an ever-increasing number of independent and incoherent jobs running on the same physical hardware without sharing resources might significantly affect processing performance. It will be essential to utilize the multi-core architecture effectively. CMS has incorporated support for multi-core processing in its event processing framework and workload management system. Multi-core processing jobs share common data in memory, such as the code libraries, detector geometry and conditions data, resulting in much lower memory usage than standard single-core independent jobs. Exploiting this new processing model requires a new model of computing resource allocation, departing from the standard single-core allocation per job. The experiment's job management system needs control over a larger quantum of resources, since multi-core-aware jobs require the scheduling of multiple cores simultaneously. CMS is exploring the approach of using whole nodes as the unit in the workload management system, where all cores of a node are allocated to a multi-core job. Whole-node scheduling allows optimization of data/workflow management (e.g. I/O caching, local merging), but efficient utilization of all scheduled cores is challenging. Dedicated whole-node queues have been set up at all Tier-1 centers to explore multi-core processing workflows in CMS. We present an evaluation of the performance of scheduling and executing multi-core workflows in whole-node queues, compared to the standard single-core processing workflows.
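The memory argument above can be made concrete with a simple model. Suppose each event-processing stream needs a private amount of memory, while libraries, geometry, and conditions data form a shared block. Then N independent single-core jobs each load their own copy of the shared block, whereas one N-core job loads it once. The function below encodes that assumed model. The parameter values in the usage note are illustrative, not CMS measurements.

```python
def memory_mb(cores: int, shared_mb: float, private_mb: float, multicore: bool) -> float:
    """Total memory on one node for `cores` event-processing streams (simplified model)."""
    if multicore:
        # One process: code libraries, geometry and conditions data are loaded once and shared.
        return shared_mb + cores * private_mb
    # Independent single-core jobs: every job holds its own copy of the shared data.
    return cores * (shared_mb + private_mb)
```

For example, with 8 cores, a 1000 MB shared block and 500 MB private per stream, the model gives 12000 MB for single-core jobs versus 5000 MB for one multi-core job. The saving grows linearly with the core count.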
The VERCE platform: Enabling Computational Seismology via Streaming Workflows and Science Gateways
NASA Astrophysics Data System (ADS)
Spinuso, Alessandro; Filgueira, Rosa; Krause, Amrey; Matser, Jonas; Casarotti, Emanuele; Magnoni, Federica; Gemund, Andre; Frobert, Laurent; Krischer, Lion; Atkinson, Malcolm
2015-04-01
The VERCE project is creating an e-Science platform to facilitate innovative data analysis and coding methods that fully exploit the wealth of data in global seismology. One of the technologies developed within the project is the Dispel4Py Python library, which allows users to describe abstract stream-based workflows for data-intensive applications and to execute them in a distributed environment. At runtime, Dispel4Py can map workflow descriptions dynamically onto a number of computational resources (Apache Storm clusters, MPI-powered clusters, shared-memory multi-core machines, and single-core machines), which sets it apart from other workflow frameworks. Dispel4Py therefore lets scientists focus on their computation instead of being distracted by the details of the computing infrastructure they use. Among the workflows developed with Dispel4Py in VERCE, we mention here those for Seismic Ambient Noise Cross-Correlation and MISFIT calculation, which address two data-intensive problems common in computational seismology. The former, also called Passive Imaging, allows the detection of relative seismic-wave velocity variations during the recording period, which can be associated with stress-field changes that occurred in the test area. The MISFIT calculation, instead, takes as input the synthetic seismograms generated from HPC simulations for a given Earth model and earthquake and, after a preprocessing stage, compares them with real observations in order to foster subsequent model updates and improvement (inversion). The VERCE Science Gateway exposes the MISFIT calculation workflow as a service, in combination with the simulation phase. Both phases can be configured, controlled and monitored by the user via a rich user interface integrated within the gUSE Science Gateway framework, which hides the complexity of accessing third-party data services, security mechanisms and enactment on the target resources.
Thanks to a modular extension to the Dispel4Py framework, the system collects provenance data adopting the W3C-PROV data model. Provenance recordings can be explored and analysed at run time for rapid diagnostic and workflow steering, or later for further validation and comparisons across runs. We will illustrate the interactive services of the gateway and the capabilities of the produced metadata, coupled with the VERCE data management layer based on iRODS. The Cross-Correlation workflow was evaluated on SuperMUC, a supercomputing cluster at the Leibniz Supercomputing Centre in Munich, with 155,656 processor cores in 9400 compute nodes. SuperMUC is based on the Intel Xeon architecture consisting of 18 Thin Node Islands and one Fat Node Island. This work has only had access to the Thin Node Islands, which contain Sandy Bridge nodes, each having 16 cores and 32 GB of memory. In the evaluations we used 1000 stations, and we applied two types of methods (whiten and non-whiten) for pre-processing the data. The workflow was tested on a varying number of cores (16, 32, 64, 128, and 256 cores) using the MPI mapping of Dispel4Py. The results show that Dispel4Py is able to improve the performance by increasing the number of cores without changing the description of the workflow.
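The core numerical step of the Cross-Correlation workflow can be illustrated with NumPy. The sketch below assumes that "whiten" means flattening the amplitude spectrum while keeping the phase, and uses frequency-domain cross-correlation with zero padding; `whiten` and `cross_correlate` are hypothetical names, and the actual VERCE preprocessing chain (filtering, decimation, time normalization, stacking) is not reproduced here.

```python
import numpy as np


def whiten(trace: np.ndarray, eps: float = 1e-10) -> np.ndarray:
    """Spectral whitening (assumed definition): keep each frequency's phase, set its amplitude to ~1."""
    spectrum = np.fft.rfft(trace)
    spectrum /= np.abs(spectrum) + eps  # eps avoids division by zero
    return np.fft.irfft(spectrum, n=len(trace))


def cross_correlate(a: np.ndarray, b: np.ndarray, max_lag: int):
    """Return (lags, cc) with cc at lag k equal to sum_n a[n + k] * b[n], for |k| <= max_lag.

    A positive lag at the peak means that `a` is delayed relative to `b`.
    """
    nfft = len(a) + len(b)  # zero padding prevents circular wrap-around
    fa = np.fft.rfft(a, nfft)
    fb = np.fft.rfft(b, nfft)
    cc = np.fft.irfft(fa * np.conj(fb), nfft)
    # Negative lags sit at the end of the circular result; move them to the front.
    cc = np.concatenate((cc[nfft - max_lag:], cc[: max_lag + 1]))
    lags = np.arange(-max_lag, max_lag + 1)
    return lags, cc


rng = np.random.default_rng(0)
noise = rng.standard_normal(1000)           # synthetic "station B" record
delayed = np.zeros_like(noise)
delayed[5:] = noise[:-5]                    # "station A" sees the same signal 5 samples later
lags, cc = cross_correlate(delayed, noise, max_lag=20)
print(lags[np.argmax(cc)])                  # 5
```

Comparing runs with and without `whiten` corresponds loosely to the two pre-processing variants (whiten and non-whiten) mentioned in the evaluation.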
Talkoot Portals: Discover, Tag, Share, and Reuse Collaborative Science Workflows (Invited)
NASA Astrophysics Data System (ADS)
Wilson, B. D.; Ramachandran, R.; Lynnes, C.
2009-12-01
A small but growing number of scientists are beginning to harness Web 2.0 technologies, such as wikis, blogs, and social tagging, as a transformative way of doing science. These technologies provide researchers with easy mechanisms to critique, suggest and share ideas, data and algorithms. At the same time, large suites of algorithms for science analysis are being made available as remotely invokable Web Services, which can be chained together to create analysis workflows. This provides the research community an unprecedented opportunity to collaborate by sharing their workflows with one another, reproducing and analyzing research results, and leveraging colleagues’ expertise to expedite the process of scientific discovery. However, wikis and similar technologies are limited to text, static images and hyperlinks, providing little support for collaborative data analysis. A team of information technology and Earth science researchers from multiple institutions has come together to improve community collaboration in science analysis by developing a customizable “software appliance” to build collaborative portals for Earth Science services and analysis workflows. The critical requirement is that researchers (not just information technologists) be able to build collaborative sites around service workflows within a few hours. We envision online communities coming together, much like Finnish “talkoot” (a barn raising), to build a shared research space. Talkoot extends a freely available, open-source content management framework with a series of modules specific to Earth Science for registering, creating, managing, discovering, tagging and sharing Earth Science web services and workflows for science data processing, analysis and visualization. Users will be able to author a “science story” in shareable web notebooks, including plots or animations, backed up by an executable workflow that directly reproduces the science analysis. 
New services and workflows of interest will be discoverable using tag search, and advertised using “service casts” and “interest casts” (Atom feeds). Multiple science workflow systems will be plugged into the system, with initial support for UAH’s Mining Workflow Composer and the open-source Active BPEL engine, and JPL’s SciFlo engine and the VizFlow visual programming interface. With the ability to share and execute analysis workflows, Talkoot portals can be used to do collaborative science in addition to communicating ideas and results. Talkoot will be useful for different science domains, mission teams, research projects and organizations. Thus, it will help to solve the “sociological” problem of bringing together disparate groups of researchers, and the technical problem of advertising, discovering, developing, documenting, and maintaining inter-agency science workflows. The presentation will discuss the goals of and barriers to Science 2.0, the social web technologies employed in the Talkoot software appliance (e.g. CMS, social tagging, personal presence, advertising by feeds), illustrate the resulting collaborative capabilities, and show early prototypes of the web interfaces (e.g. embedded workflows).
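The abstract describes "service casts" and "interest casts" only as Atom feeds. A minimal sketch using Python's standard library follows; it assumes, hypothetically, that a service record is a dict with `id`, `title`, `url` and `tags` keys, that each tag becomes an Atom `category`, and that an interest cast is simply a service cast filtered by one tag. None of these names come from Talkoot itself.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

ATOM_NS = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM_NS)  # serialize Atom as the default namespace


def _add(parent, tag, text=None, **attrs):
    """Append an Atom-namespaced child element and return it."""
    el = ET.SubElement(parent, f"{{{ATOM_NS}}}{tag}", attrs)
    if text is not None:
        el.text = text
    return el


def service_cast(feed_id, title, services):
    """Build an Atom feed with one entry per (hypothetical) service record."""
    now = datetime.now(timezone.utc).isoformat(timespec="seconds")
    feed = ET.Element(f"{{{ATOM_NS}}}feed")
    _add(feed, "id", feed_id)
    _add(feed, "title", title)
    _add(feed, "updated", now)
    for svc in services:
        entry = _add(feed, "entry")
        _add(entry, "id", svc["id"])
        _add(entry, "title", svc["title"])
        _add(entry, "updated", now)
        _add(entry, "link", href=svc["url"])
        for tag in svc["tags"]:  # social tags become Atom categories
            _add(entry, "category", term=tag)
    return ET.tostring(feed, encoding="unicode")


def interest_cast(feed_id, title, services, tag):
    """Assumed meaning: a service cast restricted to services carrying one tag."""
    return service_cast(feed_id, title, [s for s in services if tag in s["tags"]])


services = [
    {"id": "urn:example:svc:1", "title": "Regrid", "url": "http://example.org/regrid",
     "tags": ["regridding", "modis"]},
    {"id": "urn:example:svc:2", "title": "Plot", "url": "http://example.org/plot",
     "tags": ["visualization"]},
]
print(interest_cast("urn:example:modis", "MODIS services", services, "modis"))
```

Any standard feed reader could then subscribe to such a feed, which is what makes Atom a convenient advertising channel for new services.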
ERIC Educational Resources Information Center
Litsey, Ryan; Harris, Rea; London, Jessie
2018-01-01
Library workflows are an area where repetitive stress can potentially reduce staff efficiency. Day-to-day activities that require a repetitive motion can bring about physical and psychological fatigue. For library managers, it is important to seek ways in which this type of repetitive stress can be alleviated while having the added benefit of…
Improving data collection, documentation, and workflow in a dementia screening study
Read, Kevin B.; LaPolla, Fred Willie Zametkin; Tolea, Magdalena I.; Galvin, James E.; Surkis, Alisa
2017-01-01
Background: A clinical study team performing three multicultural dementia screening studies identified the need to improve data management practices and facilitate data sharing. A collaboration was initiated with librarians as part of the National Library of Medicine (NLM) informationist supplement program. The librarians identified areas for improvement in the studies’ data collection, entry, and processing workflows. Case Presentation: The librarians’ role in this project was to meet needs expressed by the study team around improving data collection and processing workflows to increase study efficiency and ensure data quality. The librarians addressed the data collection, entry, and processing weaknesses by standardizing and renaming variables, creating an electronic data capture system using REDCap, and developing well-documented, reproducible data processing workflows. Conclusions: NLM informationist supplements provide librarians with valuable experience in collaborating with study teams to address their data needs. For this project, the librarians gained skills in project management and REDCap, and an understanding of the challenges and specifics of a clinical research study. However, the time and effort required to provide targeted and intensive support for one study team was not scalable to the library’s broader user community. PMID:28377680
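One reproducible way to standardize and rename variables is to drive the renaming from an explicit data dictionary and to fail loudly on anything the dictionary does not cover. The sketch below is hypothetical: the column names in `RENAME_MAP` are invented, and the naming rule (lowercase letters, digits and underscores, starting with a letter) is an assumed convention chosen to be compatible with REDCap-style variable names, not the study's actual dictionary.

```python
import csv
import io
import re

# Hypothetical data dictionary: legacy column header -> standardized variable name.
RENAME_MAP = {
    "Pt ID": "participant_id",
    "DOB": "birth_date",
    "MoCA total": "moca_total",
}
# Assumed naming rule: lowercase letters, digits and underscores, starting with a letter.
VALID_NAME = re.compile(r"^[a-z][a-z0-9_]*$")


def standardize(reader: csv.DictReader, rename_map: dict) -> list:
    """Rename every column via the dictionary; raise on unmapped or non-conforming names."""
    unmapped = [c for c in reader.fieldnames if c not in rename_map]
    if unmapped:
        raise ValueError(f"columns missing from data dictionary: {unmapped}")
    bad = [n for n in rename_map.values() if not VALID_NAME.match(n)]
    if bad:
        raise ValueError(f"non-conforming variable names: {bad}")
    return [{rename_map[k]: v for k, v in row.items()} for row in reader]


raw = "Pt ID,DOB,MoCA total\n001,1950-02-01,24\n"
rows = standardize(csv.DictReader(io.StringIO(raw)), RENAME_MAP)
print(rows)
```

Keeping the mapping in one table, rather than scattering renames through scripts, is what makes the step documentable and repeatable when new data exports arrive.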
Managing the CMS Data and Monte Carlo Processing during LHC Run 2
NASA Astrophysics Data System (ADS)
Wissing, C.;
2017-10-01
In order to cope with the challenges expected during LHC Run 2, CMS introduced a number of enhancements to its main software packages and to the tools used for centrally managed processing. In the presentation we will highlight these improvements, which allow CMS to deal with the increased trigger output rate, the increased pileup and the evolution in computing technology. The overall system aims at high flexibility, improved operational flexibility and largely automated procedures. The tight coupling of workflow classes to types of sites has been drastically relaxed. Reliable, high-performing networking between most of the computing sites and the successful deployment of a data federation allow workflows to be executed using remote data access. This required the development of a largely automated system to assign workflows and to handle the necessary pre-staging of data. Another step towards flexibility has been the introduction of one large global HTCondor pool for all types of processing workflows and analysis jobs. Besides classical Grid resources, some opportunistic resources as well as Cloud resources have been integrated into that pool, which gives access to more than 200k CPU cores.
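The shift from fixed workflow-to-site coupling towards automated assignment with remote data access can be illustrated with a toy decision function. This is a hypothetical policy, not CMS's actual assignment logic: `Site`, `assign`, the site names and the preference order (local data, then remote read through the federation, then pre-staging) are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class Site:
    """Hypothetical site description used only for this sketch."""
    name: str
    free_cores: int
    datasets: Set[str] = field(default_factory=set)  # data hosted locally
    remote_read_ok: bool = True  # network good enough to read via the data federation


def assign(dataset: str, cores: int, sites: List[Site]) -> Tuple[Optional[Site], str]:
    """Pick a site and a data-access mode for one workflow (assumed policy).

    Preference order: local data > remote read > pre-staging the data first.
    Among equally suitable sites, the one with the most free cores wins.
    """
    candidates = [s for s in sites if s.free_cores >= cores]
    if not candidates:
        return None, "queue"
    for mode, ok in (
        ("local", lambda s: dataset in s.datasets),
        ("remote", lambda s: s.remote_read_ok),
        ("prestage", lambda s: True),
    ):
        matching = [s for s in candidates if ok(s)]
        if matching:
            return max(matching, key=lambda s: s.free_cores), mode
    return None, "queue"  # not reached: "prestage" accepts every candidate


sites = [
    Site("SiteA", free_cores=500, datasets={"/Run2/RAW"}),
    Site("SiteB", free_cores=2000),
    Site("SiteC", free_cores=3000, remote_read_ok=False),
]
site, mode = assign("/Run2/RAW", 1000, sites)
print(site.name, mode)  # SiteB remote
```

The design choice illustrated is that data location becomes a preference rather than a hard constraint, so idle cores elsewhere can still be used.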
Morris, Chris; Pajon, Anne; Griffiths, Susanne L.; Daniel, Ed; Savitsky, Marc; Lin, Bill; Diprose, Jonathan M.; Wilter da Silva, Alan; Pilicheva, Katya; Troshin, Peter; van Niekerk, Johannes; Isaacs, Neil; Naismith, James; Nave, Colin; Blake, Richard; Wilson, Keith S.; Stuart, David I.; Henrick, Kim; Esnouf, Robert M.
2011-01-01
The techniques used in protein production and structural biology have been developing rapidly, but techniques for recording the laboratory information produced have not kept pace. One approach is the development of laboratory information-management systems (LIMS), which typically use a relational database schema to model and store results from a laboratory workflow. The underlying philosophy and implementation of the Protein Information Management System (PiMS), a LIMS development specifically targeted at the flexible and unpredictable workflows of protein-production research laboratories of all scales, is described. PiMS is a web-based Java application that uses either Postgres or Oracle as the underlying relational database-management system. PiMS is available under a free licence to all academic laboratories either for local installation or for use as a managed service. PMID:21460443
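The relational approach that LIMS such as PiMS take can be shown in miniature. PiMS itself runs on PostgreSQL or Oracle with a far richer schema; the sketch below uses Python's built-in `sqlite3` only to illustrate the idea, and every table and column name in it is hypothetical. Recording each step as a row, rather than as a column in a fixed pipeline, lets a sample pass through steps in any order and repeat them, which suits the "flexible and unpredictable" workflows described above.

```python
import sqlite3

# Hypothetical schema (not the PiMS data model): a sample can pass through any
# sequence of steps, in any order and any number of times.
SCHEMA = """
CREATE TABLE IF NOT EXISTS sample (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE IF NOT EXISTS experiment (
    id           INTEGER PRIMARY KEY,
    sample_id    INTEGER NOT NULL REFERENCES sample(id),
    step         TEXT NOT NULL,   -- e.g. 'cloning', 'expression', 'purification'
    outcome      TEXT NOT NULL,
    performed_on TEXT NOT NULL    -- ISO-8601 date
);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    con = sqlite3.connect(path)
    con.executescript(SCHEMA)
    return con


def record(con, sample, step, outcome, date):
    """Log one workflow step for a sample, creating the sample on first use."""
    con.execute("INSERT OR IGNORE INTO sample(name) VALUES (?)", (sample,))
    (sample_id,) = con.execute("SELECT id FROM sample WHERE name = ?", (sample,)).fetchone()
    con.execute(
        "INSERT INTO experiment(sample_id, step, outcome, performed_on) VALUES (?, ?, ?, ?)",
        (sample_id, step, outcome, date),
    )


def history(con, sample):
    """Return (step, outcome) pairs for a sample in chronological order."""
    return con.execute(
        """SELECT e.step, e.outcome
           FROM experiment e JOIN sample s ON s.id = e.sample_id
           WHERE s.name = ?
           ORDER BY e.performed_on, e.id""",
        (sample,),
    ).fetchall()


con = open_db()
record(con, "construct-42", "cloning", "ok", "2011-01-10")
record(con, "construct-42", "purification", "low yield", "2011-01-12")
record(con, "construct-42", "purification", "ok", "2011-01-15")
print(history(con, "construct-42"))
```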
Dynamic reusable workflows for ocean science
Signell, Richard; Fernandez, Filipe; Wilcox, Kyle
2016-01-01
Digital catalogs of ocean data have been available for decades, but advances in standardized services and software for catalog search and data access now make it possible to create catalog-driven workflows that automate, end to end, the search, analysis and visualization of data from multiple distributed sources. Further, these workflows may be shared, reused and adapted with ease. Here we describe a workflow developed within the US Integrated Ocean Observing System (IOOS) which automates the skill assessment of water temperature forecasts from multiple ocean forecast models, allowing improved forecast products to be delivered for an open water swim event. A series of Jupyter Notebooks is used to capture and document the end-to-end workflow using a collection of Python tools that facilitate working with standardized catalog and data services. The workflow first searches a catalog of metadata using the Open Geospatial Consortium (OGC) Catalog Service for the Web (CSW), then accesses data service endpoints found in the metadata records using the OGC Sensor Observation Service (SOS) for in situ sensor data and OPeNDAP services for remotely sensed and model data. Skill metrics are computed and time series comparisons of forecast model and observed data are displayed interactively, leveraging the capabilities of modern web browsers. The resulting workflow not only solves a challenging specific problem, but highlights the benefits of dynamic, reusable workflows in general. These workflows adapt as new data enters the data system, facilitate reproducible science, provide templates from which new scientific workflows can be developed, and encourage data providers to use standardized services. As applied to the ocean swim event, the workflow exposed problems with two of the ocean forecast products, which led to improved regional forecasts once the errors were corrected. 
While the example is specific, the approach is general, and we hope to see increased use of dynamic notebooks across the geoscience domains.
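The abstract does not say which skill metrics the notebooks compute. As an assumption, the sketch below uses three common ones for a model time series already interpolated onto the observation times: mean bias, root-mean-square error, and a mean-squared-error skill score relative to a reference forecast that always predicts the observed mean. The function name `skill_metrics` is hypothetical.

```python
import math
from typing import Dict, Sequence


def skill_metrics(forecast: Sequence[float], observed: Sequence[float]) -> Dict[str, float]:
    """Bias, RMSE and an MSE-based skill score for paired forecast/observation values."""
    if len(forecast) != len(observed) or not observed:
        raise ValueError("need two non-empty series of equal length")
    n = len(observed)
    errors = [f - o for f, o in zip(forecast, observed)]
    bias = sum(errors) / n
    mse = sum(e * e for e in errors) / n
    obs_mean = sum(observed) / n
    obs_var = sum((o - obs_mean) ** 2 for o in observed) / n
    # Skill = 1 - MSE / MSE of a forecast that always predicts the observed mean;
    # 1 is perfect, 0 is no better than that reference, negative is worse.
    skill = 1.0 - mse / obs_var if obs_var > 0 else float("nan")
    return {"bias": bias, "rmse": math.sqrt(mse), "skill": skill}


metrics = skill_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0])
print(metrics)
```

Ranking several forecast models by such scores against the same observations is one simple way to spot a product that is systematically off, as happened with two of the forecast products in the swim-event case.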
Contreras, Iván; Kiefer, Stephan; Vehi, Josep
2017-01-01
Diabetes self-management is a crucial element for all people with diabetes and those at risk of developing the disease. Diabetic patients should be empowered to increase their self-management skills in order to prevent or delay the complications of diabetes. This work presents the proposal and first development stages of a smartphone application focused on empowering patients with diabetes. The concept of this interventional tool is based on personalizing the user experience from an adaptive and dynamic perspective. The segmentation of the population and the dynamic treatment of user profiles across the different experience levels is the main challenge of the implementation. The self-management assistant and remote treatment for diabetes aims to develop a platform that integrates a series of innovative models and tools, rigorously tested and supported by the research literature on diabetes, together with a proven engine for managing healthcare workflows.
Workflows in bioinformatics: meta-analysis and prototype implementation of a workflow generator.
Garcia Castro, Alexander; Thoraval, Samuel; Garcia, Leyla J; Ragan, Mark A
2005-04-07
Computational methods for problem solving need to interleave information access and algorithm execution in a problem-specific workflow. The structures of these workflows are defined by a scaffold of syntactic, semantic and algebraic objects capable of representing them. Despite the proliferation of GUIs (Graphical User Interfaces) in bioinformatics, only some of them provide workflow capabilities; surprisingly, no meta-analysis of workflow operators and components in bioinformatics has been reported. We present a set of syntactic components and algebraic operators capable of representing analytical workflows in bioinformatics. Iteration, recursion, the use of conditional statements, and management of suspend/resume tasks have traditionally been implemented on an ad hoc basis and hard-coded; by having these operators properly defined, it is possible to use and parameterize them as generic reusable components. To illustrate how these operations can be orchestrated, we present GPIPE, a prototype graphic pipeline generator for PISE that allows the definition of a pipeline, parameterization of its component methods, and storage of metadata in XML formats. This implementation goes beyond the macro capacities currently in PISE. As the entire analysis protocol is defined in XML, a complete bioinformatic experiment (linked sets of methods, parameters and results) can be reproduced or shared among users. http://if-web1.imb.uq.edu.au/Pise/5.a/gpipe.html (interactive), ftp://ftp.pasteur.fr/pub/GenSoft/unix/misc/Pise/ (download). From our meta-analysis we have identified syntactic structures and algebraic operators common to many workflows in bioinformatics. The workflow components and algebraic operators can be assimilated into reusable software components. GPIPE, a prototype implementation of this framework, provides a GUI builder to facilitate the generation of workflows and the integration of heterogeneous analytical tools.
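The claim that sequence, conditionals and iteration become reusable once they are defined as operators can be made concrete with a few higher-order functions. This is a Python illustration of the general idea, not GPIPE's operators (GPIPE describes pipelines for PISE in XML); `seq`, `cond`, `repeat_until` and the toy sequence-cleaning steps are hypothetical.

```python
from typing import Any, Callable

Step = Callable[[Any], Any]


def seq(*steps: Step) -> Step:
    """Sequential composition: feed each step's output into the next."""
    def run(x):
        for step in steps:
            x = step(x)
        return x
    return run


def cond(predicate: Callable[[Any], bool], then: Step, otherwise: Step = lambda x: x) -> Step:
    """Conditional: choose a branch at run time (default branch passes data through)."""
    return lambda x: then(x) if predicate(x) else otherwise(x)


def repeat_until(step: Step, done: Callable[[Any], bool], max_iter: int = 1000) -> Step:
    """Iteration: apply `step` until `done` holds, with a safety limit."""
    def run(x):
        for _ in range(max_iter):
            if done(x):
                return x
            x = step(x)
        if done(x):
            return x
        raise RuntimeError("iteration limit reached")
    return run


# Hypothetical sequence-cleaning pipeline built only from the operators above.
clean = seq(
    str.upper,
    cond(lambda s: "N" in s, lambda s: s.replace("N", "")),          # drop ambiguous bases
    repeat_until(lambda s: s[:-1], lambda s: not s.endswith("A")),   # trim a poly-A tail
)
print(clean("acgtnnaaa"))  # ACGT
```

Because each operator returns an ordinary step, composed pipelines can themselves be nested inside larger ones, which is the reusability property the abstract argues for.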
Support for Taverna workflows in the VPH-Share cloud platform.
Kasztelnik, Marek; Coto, Ernesto; Bubak, Marian; Malawski, Maciej; Nowakowski, Piotr; Arenas, Juan; Saglimbeni, Alfredo; Testi, Debora; Frangi, Alejandro F
2017-07-01
To address the increasing need for collaborative endeavours within the Virtual Physiological Human (VPH) community, the VPH-Share collaborative cloud platform allows researchers to expose and share sequences of complex biomedical processing tasks in the form of computational workflows. The Taverna Workflow System is a very popular tool for orchestrating complex biomedical and bioinformatics processing tasks in the VPH community. This paper describes the VPH-Share components that support the building and execution of Taverna workflows, and explains how they interact with other VPH-Share components to improve the capabilities of the VPH-Share platform. Taverna workflow support is delivered by the Atmosphere cloud management platform and the VPH-Share Taverna plugin. These components are explained in detail, along with the two main procedures that were developed to enable this seamless integration: workflow composition and execution. 1) Seamless integration of VPH-Share with other components and systems. 2) Extended range of different tools for workflows. 3) Successful integration of scientific workflows from other VPH projects. 4) Execution speed improvement for medical applications. The presented workflow integration provides VPH-Share users with a wide range of possibilities to compose and execute workflows, such as desktop or online composition, online batch execution, multithreading and remote execution. The specific advantages of each supported tool are presented, as are the roles of Atmosphere and the VPH-Share plugin within the VPH-Share project. The combination of the VPH-Share plugin and Atmosphere endows the VPH-Share infrastructure with far more flexible, powerful and usable capabilities for the VPH-Share community. As both components can continue to evolve and improve independently, we acknowledge that further improvements are still to be developed and will be described.
iRODS: A Distributed Data Management Cyberinfrastructure for Observatories
NASA Astrophysics Data System (ADS)
Rajasekar, A.; Moore, R.; Vernon, F.
2007-12-01
Large-scale and long-term preservation of both observational and synthesized data requires a system that virtualizes data management concepts. A methodology is needed that can work across long distances in space (distribution) and long periods in time (preservation). The system needs to manage data stored on multiple types of storage systems, including new systems that become available in the future. This concept is called infrastructure independence, and is typically implemented through virtualization mechanisms. Data grids are built upon concepts of data and trust virtualization. These concepts enable the management of collections of data that are distributed across multiple institutions, stored on multiple types of storage systems, and accessed by multiple types of clients. Data virtualization ensures that the name spaces used to identify files, users, and storage systems are persistent, even when files are migrated onto future technology. This is required to preserve authenticity, the link between the record and its descriptive and provenance metadata. Trust virtualization ensures that access controls remain invariant as files are moved within the data grid. This is required to track the chain of custody of records over time. The Storage Resource Broker (http://www.sdsc.edu/srb) is one such data grid used in a wide variety of applications in earth and space sciences such as ROADNet (roadnet.ucsd.edu), SEEK (seek.ecoinformatics.org), GEON (www.geongrid.org) and NOAO (www.noao.edu). Recent extensions to data grids provide one more level of virtualization: policy or management virtualization. Management virtualization ensures that execution of management policies can be automated, and that rules can be created that verify assertions about the shared collections of data. When dealing with distributed large-scale data over long periods of time, the policies used to manage the data and provide assurances about the authenticity of the data become paramount. 
The integrated Rule-Oriented Data System (iRODS) (http://irods.sdsc.edu) provides the mechanisms needed not only to describe management policies but also to track how the policies are applied and their execution results. The iRODS data grid maps management policies to rules that control the execution of remote micro-services. As an example, a rule can be created that automatically creates a replica whenever a file is added to a specific collection, or extracts its metadata automatically and registers it in a searchable catalog. For the replication operation, the persistent state information consists of the replica location, the creation date, the owner, the replica size, etc. The mechanism used by iRODS for providing policy virtualization is based on well-defined functions, called micro-services, which are chained into alternative workflows using rules. A rule engine, based on the event-condition-action paradigm, executes the rule-based workflows after an event. Rules can be deferred to a predetermined time or executed on a periodic basis. As the data management policies evolve, the iRODS system can implement new rules, new micro-services, and new state information (metadata content) needed to manage the new policies. Each sub-collection can be managed using a different set of policies. The discussion of the concepts in rule-based policy virtualization and its application to long-term and large-scale data management for observatories such as ORION and NEON will be the basis of the paper.
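The event-condition-action flow described above (an event fires, a condition on the collection is checked, and a chain of micro-services runs) can be mimicked in a few dozen lines. This is only a toy model: iRODS expresses such policies in its own rule language, and `Rule`, `DataGrid`, `replicate_to`, `register_metadata`, the event name and the resource names are all hypothetical. Deferred and periodic rules are left out.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Rule:
    event: str                                          # hypothetical event name, e.g. "post_put"
    condition: Callable[[dict], bool]                   # evaluated against the event context
    actions: List[Callable[[dict, "DataGrid"], None]]   # chained micro-services


@dataclass
class DataGrid:
    rules: List[Rule] = field(default_factory=list)
    replicas: Dict[str, List[str]] = field(default_factory=dict)  # path -> storage resources
    catalog: Dict[str, dict] = field(default_factory=dict)        # searchable metadata

    def fire(self, event: str, ctx: dict) -> None:
        """Run the action chain of every rule whose event and condition match."""
        for rule in self.rules:
            if rule.event == event and rule.condition(ctx):
                for action in rule.actions:
                    action(ctx, self)

    def put(self, path: str, resource: str, metadata: dict) -> None:
        """Store a new object, then raise the post-put event."""
        self.replicas.setdefault(path, []).append(resource)
        self.fire("post_put", {"path": path, "metadata": metadata})


def replicate_to(resource: str):
    """Micro-service factory: copy the new object to another storage resource."""
    def action(ctx: dict, grid: DataGrid) -> None:
        grid.replicas[ctx["path"]].append(resource)
    return action


def register_metadata(ctx: dict, grid: DataGrid) -> None:
    """Micro-service: register the object's metadata in the catalog."""
    grid.catalog[ctx["path"]] = dict(ctx["metadata"])


grid = DataGrid()
grid.rules.append(Rule(
    event="post_put",
    condition=lambda ctx: ctx["path"].startswith("/obs/archive/"),  # policy for one sub-collection
    actions=[replicate_to("tapeResc"), register_metadata],
))
grid.put("/obs/archive/ctd_001.nc", "diskResc", {"instrument": "CTD"})
grid.put("/obs/scratch/tmp.nc", "diskResc", {})
print(grid.replicas)
```

Because the policy lives in the rule list rather than in the `put` code, it can be changed or extended per sub-collection without touching the storage logic, which is the point of policy virtualization.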
PIMS-Universal Payload Information Management
NASA Technical Reports Server (NTRS)
Elmore, Ralph; McNair, Ann R. (Technical Monitor)
2002-01-01
As the overall manager and integrator of International Space Station (ISS) science payloads and experiments, the Payload Operations Integration Center (POIC) at Marshall Space Flight Center had a critical need for an information management system to exchange and manage ISS payload files and to coordinate ISS payload-related operational changes. The POIC's information management system has a fundamental requirement to provide secure operational access not only to users physically located at the POIC, but also collaborative access to remote experimenters and International Partners. The Payload Information Management System (PIMS) is a ground-based electronic document configuration management and workflow system that was built to meet that need. Functionally, PIMS provides the following document management capabilities:
1. File access control, storage and retrieval from a central repository vault.
2. Collection of supplemental data about files in the vault.
3. File exchange with a PMS GUI client, or any FTP connection.
4. Placement of files into an FTP-accessible dropbox for pickup by interfacing facilities, including files transmitted for spacecraft uplink.
5. Transmission of email messages notifying users of new version availability.
6. Polling of intermediate facility dropboxes for files that will automatically be processed by PIMS.
7. An API that allows other POIC applications to access PIMS information.
PIMS also provides the following Change Request processing capabilities:
1. Creating, viewing, manipulating, and querying information about Operations Change Requests (OCRs).
2. Adaptable workflow approval of OCRs, with routing through developers, facility leads, POIC leads, reviewers, and implementers. Email messages can be sent to users either to involve them in the workflow process or simply to notify them of OCR approval progress. 
All PIMS document management and OCR workflow controls are coordinated through, and routed as tasks to, individual users' "to-do" lists. A user is given a task when it is their turn to perform some action relating to the approval of the document or OCR. The user's available actions are restricted to the functions available for the assigned task. Certain actions, such as review or action implementation by non-PIMS users, can also be coordinated through automated emails.
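The to-do-list routing described above can be sketched as a small state machine: each OCR has an ordered route, only the user currently holding the task may act, and each task type permits only certain actions. This is a hypothetical model, not PIMS code; `OCRRouter`, the role names, the task types in `ALLOWED` and the "open/closed/rejected" states are assumptions, and email notification is omitted.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical task types and the only actions each task permits.
ALLOWED = {"review": {"approve", "reject"}, "implement": {"complete"}}


class OCRRouter:
    def __init__(self) -> None:
        self.todo: Dict[str, List[Tuple[str, str]]] = defaultdict(list)  # user -> [(ocr, task)]
        self.pending: Dict[str, List[Tuple[str, str]]] = {}  # ocr -> remaining [(user, task)]
        self.status: Dict[str, str] = {}

    def submit(self, ocr_id: str, route: List[Tuple[str, str]]) -> None:
        """Start routing an OCR along an ordered list of (user, task) pairs."""
        self.pending[ocr_id] = list(route)
        self.status[ocr_id] = "open"
        self._advance(ocr_id)

    def _advance(self, ocr_id: str) -> None:
        """Put the next task on the next user's to-do list, or close the OCR."""
        if self.pending[ocr_id]:
            user, task = self.pending[ocr_id].pop(0)
            self.todo[user].append((ocr_id, task))
        else:
            self.status[ocr_id] = "closed"

    def act(self, user: str, ocr_id: str, action: str) -> None:
        """Perform an action; only the task holder may act, and only with allowed actions."""
        task = next((t for o, t in self.todo[user] if o == ocr_id), None)
        if task is None:
            raise PermissionError(f"{user} has no task for {ocr_id}")
        if action not in ALLOWED[task]:
            raise PermissionError(f"'{action}' is not allowed for a {task} task")
        self.todo[user].remove((ocr_id, task))
        if action == "reject":
            self.status[ocr_id] = "rejected"
            self.pending[ocr_id].clear()
        else:
            self._advance(ocr_id)


router = OCRRouter()
router.submit("OCR-7", [("facility_lead", "review"), ("developer", "implement")])
router.act("facility_lead", "OCR-7", "approve")
print(router.todo["developer"])  # [('OCR-7', 'implement')]
```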
Processing, Cataloguing and Distribution of UAS Images in Near Real Time
NASA Astrophysics Data System (ADS)
Runkel, I.
2013-08-01
Why is there so much hype around UAS? UAS make data capture flexible, fast and easy. For many applications this is more important than a perfect photogrammetric aerial image block. To ensure that the advantage of fast data capture holds up to the end of the processing chain, all intermediate steps, such as data processing and data dissemination to the customer, need to be flexible and fast as well. GEOSYSTEMS has established the whole processing workflow as a server/client solution, which is the focus of this presentation. Depending on the image acquisition system, the image data can be downlinked during the flight to the data processing computer, or stored on a mobile device and connected to the data processing computer after the flight campaign. The image project manager reads the data from the device and georeferences the images according to the position data. The metadata is converted into an ISO-conformant format, and subsequently all georeferenced images are catalogued in the raster data management system ERDAS APOLLO. APOLLO provides the data, i.e. the images, to the customer as OGC-conformant services. Within seconds the UAV images are ready to use for GIS applications, image processing or direct interpretation via web applications, wherever you want. The whole processing chain is built in a generic manner and can be adapted to a multitude of applications. The UAV imagery can be processed and catalogued as single ortho images or as an image mosaic. Furthermore, image data from various cameras can be fused. Using WPS (Web Processing Services), image enhancement and image analysis workflows, such as change detection layers, can be calculated and provided to the image analysts. The WPS processing runs directly on the raster data management server, so the image analyst needs no data and no software on his local computer. This workflow is proven to be fast, stable and accurate. 
It is designed to support time-critical applications for security demands: the images can be checked and interpreted in near real time. For sensitive areas, it makes it possible to inform remote decision makers or interpretation experts and provide them with situational awareness, wherever they are. For monitoring and inspection tasks, it speeds up data capture and data interpretation. The fully automated workflow of data pre-processing, georeferencing, cataloguing and dissemination in near real time was developed based on the Intergraph products ERDAS IMAGINE, ERDAS APOLLO and GEOSYSTEMS METAmorph!IT. It is offered as an adaptable solution by GEOSYSTEMS GmbH.
[Application of information management system about medical equipment].
Hang, Jianjin; Zhang, Chaoqun; Wu, Xiang-Yang
2011-05-01
Based on workflow practice, an information management system for medical equipment was developed, and its functions, such as data gathering, browsing, querying and counting, are introduced. By providing dynamic and complete case management of medical equipment, the system improved the management of medical equipment.
NASA Astrophysics Data System (ADS)
Samadzadegan, F.; Saber, M.; Zahmatkesh, H.; Joze Ghazi Khanlou, H.
2013-09-01
Rapidly discovering, sharing, integrating and applying geospatial information are key issues in the domain of emergency response and disaster management. Because data and processing resources in disaster management are distributed, using a Service Oriented Architecture (SOA) to take advantage of service workflows provides efficient, flexible and reliable implementations for dealing with different hazardous situations. The implementation specification of the Web Processing Service (WPS) has guided geospatial data processing on SOA platforms to become a widely accepted solution for processing remotely sensed data on the web. This paper presents an architecture design based on OGC web services for an automated workflow for acquiring and processing remotely sensed data, detecting fires and sending notifications to the authorities. A basic architecture and its building blocks for an automated fire detection early warning system are presented, using web-based processing of remote sensing imagery from MODIS data. A composition of WPS processes is proposed as a WPS service to extract fire events from MODIS data. Subsequently, the paper highlights the role of WPS as a middleware interface in the domain of geospatial web service technology that can be used to invoke a large variety of geoprocessing operations and to chain other web services as a composition engine. The applicability of the proposed architecture is evaluated with a real-world fire event detection and notification use case. A GeoPortal client with open-source software was developed to manage data, metadata, processes, and authorities. An investigation of the feasibility and benefits of the proposed framework shows that it can be used for a wide range of geospatial applications, especially disaster management and environmental monitoring.
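The idea of exposing a chain of processes as a single composite process can be sketched without any WPS machinery. The example below is hypothetical: it does not use OWSLib or WPS XML requests, `Process`, `CompositeProcess`, `detect_hotspots` and `notify` are invented names, and the fixed brightness threshold is a placeholder, not the MODIS fire-detection algorithm.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Data = Dict[str, Any]


@dataclass
class Process:
    """A single (hypothetical) geoprocessing operation with an identifier."""
    identifier: str
    func: Callable[[Data], Data]

    def execute(self, data: Data) -> Data:
        return self.func(data)


@dataclass
class CompositeProcess:
    """A chain of processes exposed under one identifier, usable like a single process."""
    identifier: str
    steps: List[Any]  # Process or CompositeProcess

    def execute(self, data: Data) -> Data:
        result = dict(data)
        for step in self.steps:
            result.update(step.execute(result))  # outputs become inputs of later steps
        return result


def detect_hotspots(data: Data) -> Data:
    # Placeholder test: a pixel is a candidate if it exceeds a fixed threshold.
    # This is NOT the MODIS fire-detection algorithm.
    thr = data["threshold_k"]
    grid = data["brightness_k"]
    return {"hotspots": [(r, c) for r, row in enumerate(grid)
                         for c, value in enumerate(row) if value > thr]}


def notify(data: Data) -> Data:
    """Stand-in for the notification service: build one message per hotspot."""
    return {"messages": [f"possible fire at pixel {p}" for p in data["hotspots"]]}


fire_alert = CompositeProcess("FireAlert", [
    Process("DetectHotspots", detect_hotspots),
    Process("Notify", notify),
])
out = fire_alert.execute({"brightness_k": [[300, 320], [355, 290]], "threshold_k": 330})
print(out["messages"])  # ['possible fire at pixel (1, 0)']
```

Because `CompositeProcess` offers the same `execute` interface as `Process`, a composite can itself be chained into a larger composite, loosely mirroring the role of WPS as a composition engine.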
Advances in Grid Computing for the Fabric for Frontier Experiments Project at Fermilab
NASA Astrophysics Data System (ADS)
Herner, K.; Alba Hernandez, A. F.; Bhat, S.; Box, D.; Boyd, J.; Di Benedetto, V.; Ding, P.; Dykstra, D.; Fattoruso, M.; Garzoglio, G.; Kirby, M.; Kreymer, A.; Levshina, T.; Mazzacane, A.; Mengel, M.; Mhashilkar, P.; Podstavkov, V.; Retzke, K.; Sharma, N.; Teheran, J.
2017-10-01
The Fabric for Frontier Experiments (FIFE) project is a major initiative within the Fermilab Scientific Computing Division charged with leading the computing model for Fermilab experiments. Work within the FIFE project creates close collaboration between experimenters and computing professionals to serve high-energy physics experiments of differing size, scope, and physics area. The FIFE project has worked to develop common tools for job submission, certificate management, software and reference data distribution through CVMFS repositories, robust data transfer, job monitoring, and databases for project tracking. Since the project's inception, the experiments under the FIFE umbrella have significantly matured, and present an increasingly complex list of requirements to service providers. To meet these requirements, the FIFE project has been involved in transitioning the Fermilab General Purpose Grid cluster to support a partitionable slot model, expanding the resources available to experiments via the Open Science Grid, assisting with commissioning dedicated high-throughput computing resources for individual experiments, supporting the efforts of the HEP Cloud projects to provision a variety of back-end resources, including public clouds and high-performance computers, and developing rapid onboarding procedures for new experiments and collaborations. The larger demands also require enhanced job monitoring tools, which the project has developed using such tools as Elasticsearch and Grafana. [...] in helping experiments manage their large-scale production workflows. This group in turn requires a structured service to facilitate smooth management of experiment requests, which FIFE provides in the form of the Production Operations Management Service (POMS). POMS is designed to track and manage requests from the FIFE experiments to run particular workflows, and to support troubleshooting and triage in case of problems. 
Recently a new certificate management infrastructure called Distributed Computing Access with Federated Identities (DCAFI) has been put in place that has eliminated our dependence on a Fermilab-specific third-party Certificate Authority service and better accommodates FIFE collaborators without a Fermilab Kerberos account. DCAFI integrates the existing InCommon federated identity infrastructure, CILogon Basic CA, and a MyProxy service using a new general-purpose open-source tool. We will discuss the general FIFE onboarding strategy, progress in expanding the FIFE experiments' presence on the Open Science Grid, new tools for job monitoring, the POMS service, and the DCAFI project.
Providing traceability for neuroimaging analyses.
McClatchey, Richard; Branson, Andrew; Anjum, Ashiq; Bloodsworth, Peter; Habib, Irfan; Munir, Kamran; Shamdasani, Jetendr; Soomro, Kamran
2013-09-01
With the increasingly digital nature of biomedical data and as the complexity of analyses in medical research increases, the need for accurate information capture, traceability and accessibility has become crucial to medical researchers in the pursuance of their research goals. Grid- or Cloud-based technologies, often based on so-called Service Oriented Architectures (SOA), are increasingly being seen as viable solutions for managing distributed data and algorithms in the biomedical domain. For neuroscientific analyses, especially those centred on complex image analysis, traceability of processes and datasets is essential, but up to now this has not been captured in a manner that facilitates collaborative study. Few examples exist of deployed medical systems based on Grids that provide the traceability of research data needed to facilitate complex analyses, and none have been evaluated in practice. Over the past decade, we have been working with mammographers, paediatricians and neuroscientists in three generations of projects to provide the data management and provenance services now required for 21st century medical research. This paper outlines the findings of a requirements study and a resulting system architecture for the production of services to support neuroscientific studies of biomarkers for Alzheimer's disease. The paper proposes a software infrastructure and services that provide the foundation for such support. It introduces the use of the CRISTAL software to provide provenance management as one of a number of services delivered on a SOA, deployed to manage neuroimaging projects that have been studying biomarkers for Alzheimer's disease. In the neuGRID and N4U projects, a Provenance Service has been delivered that captures and reconstructs the workflow information needed to facilitate researchers in conducting neuroimaging analyses. The software enables neuroscientists to track the evolution of workflows and datasets.
It also tracks the outcomes of various analyses and provides provenance traceability throughout the lifecycle of their studies. As the Provenance Service has been designed to be generic, it can be applied across the medical domain as a reusable tool for supporting medical researchers, thus providing communities of researchers for the first time with the necessary tools to conduct widely distributed collaborative programmes of medical analysis. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Evans, J. D.; Hao, W.; Chettri, S.
2013-12-01
The cloud is proving to be a uniquely promising platform for scientific computing. Our experience with processing satellite data using Amazon Web Services highlights several opportunities for enhanced performance, flexibility, and cost effectiveness in the cloud relative to traditional computing, for example:
- Direct readout from a polar-orbiting satellite such as the Suomi National Polar-Orbiting Partnership (S-NPP) requires bursts of processing a few times a day, separated by quiet periods when the satellite is out of receiving range. In the cloud, by starting and stopping virtual machines in minutes, we can marshal significant computing resources quickly when needed but not pay for them when not needed. To take advantage of this capability, we are automating a data-driven approach to the management of cloud computing resources, in which new data availability triggers the creation of new virtual machines (of variable size and processing power) that last only until the processing workflow is complete.
- 'Spot instances' are virtual machines that run as long as one's asking price is higher than the provider's variable spot price. Spot instances can greatly reduce the cost of computing for software systems that are engineered to withstand unpredictable interruptions in service (as occurs when the spot price exceeds the asking price). We are implementing an approach to workflow management that allows data processing workflows to resume with minimal delays after temporary spot price spikes. This will allow systems to take full advantage of variably priced 'utility computing.'
- Thanks to virtual machine images, we can easily launch multiple identical machines differentiated only by 'user data' containing individualized instructions (e.g., to fetch particular datasets or to perform certain workflows or algorithms). This is particularly useful when (as is the case with S-NPP data) we need to launch many very similar machines to process an unpredictable number of data files concurrently. Our experience shows the viability and flexibility of this approach to workflow management for scientific data processing.
- Finally, cloud computing is a promising platform for distributed volunteer ('interstitial') computing, via mechanisms such as the Berkeley Open Infrastructure for Network Computing (BOINC), popularized by the SETI@Home project and others such as ClimatePrediction.net and NASA's Climate@Home. Interstitial computing faces significant challenges as commodity computing shifts from (always-on) desktop computers towards smartphones and tablets (untethered and running on scarce battery power), but cloud computing offers significant slack capacity. This capacity includes virtual machines with unused RAM or underused CPUs, virtual storage volumes allocated (and paid for) but not full, and virtual machines that are paid up for the current hour but whose work is complete. We are devising ways to facilitate the reuse of these resources (i.e., cloud-based interstitial computing) for satellite data processing and related analyses.
We will present our findings and research directions on these and related topics.
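The idea of resuming a workflow after a spot price spike can be sketched in Python under stated assumptions: each workflow step is idempotent, and progress is checkpointed to storage that outlives the instance (a local JSON file stands in for it here). The function name, checkpoint format, step names, and the `interrupt_after` test hook are hypothetical and do not describe the authors' implementation.

```python
import json
import os
import tempfile


def run_workflow(steps, checkpoint_path, interrupt_after=None):
    """Run (name, func) steps in order, checkpointing after each so a new VM can resume.

    Each func takes and returns a JSON-serializable state dict. interrupt_after is a
    hypothetical test hook that simulates the instance being reclaimed after N steps.
    """
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            checkpoint = json.load(f)  # resume where the previous instance stopped
    else:
        checkpoint = {"done": [], "state": {}}
    completed_this_run = 0
    for name, func in steps:
        if name in checkpoint["done"]:
            continue  # finished before the interruption; skip it
        if interrupt_after is not None and completed_this_run == interrupt_after:
            raise RuntimeError("spot instance reclaimed")
        checkpoint["state"] = func(checkpoint["state"])
        checkpoint["done"].append(name)
        completed_this_run += 1
        # Write to a temp file and rename, so a crash mid-write cannot corrupt the checkpoint.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(checkpoint, f)
        os.replace(tmp_path, checkpoint_path)
    return checkpoint["state"]


# Usage: a four-step pipeline (hypothetical step names) interrupted after two steps.
calls = []


def make_step(name):
    def step(state):
        calls.append(name)
        return {**state, name: True}
    return step


steps = [(n, make_step(n)) for n in ["fetch", "calibrate", "grid", "publish"]]
path = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
try:
    run_workflow(steps, path, interrupt_after=2)
except RuntimeError:
    pass  # a replacement instance would start here
final_state = run_workflow(steps, path)
```

Each completed step runs only once across both runs; a step interrupted mid-execution would be rerun, which is why the sketch assumes idempotent steps.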
ImTK: an open source multi-center information management toolkit
NASA Astrophysics Data System (ADS)
Alaoui, Adil; Ingeholm, Mary Lou; Padh, Shilpa; Dorobantu, Mihai; Desai, Mihir; Cleary, Kevin; Mun, Seong K.
2008-03-01
The Information Management Toolkit (ImTK) Consortium is an open source initiative to develop robust, freely available tools related to the information management needs of basic, clinical, and translational research. An open source framework and agile programming methodology can enable distributed software development, while an open architecture will encourage interoperability across different environments. The ISIS Center has conceptualized a prototype data sharing network that simulates a multi-center environment based on a federated data access model. This model includes the development of software tools to enable efficient exchange, sharing, management, and analysis of multimedia medical information such as clinical information, images, and bioinformatics data from multiple data sources. The envisioned ImTK data environment will include an open architecture and data model implementation that complies with existing standards such as Digital Imaging and Communications in Medicine (DICOM), Health Level 7 (HL7), and the technical framework and workflow defined by the Integrating the Healthcare Enterprise (IHE) Information Technology Infrastructure initiative, mainly the Cross-Enterprise Document Sharing (XDS) specifications.
A UIMA wrapper for the NCBO annotator
Roeder, Christophe; Jonquet, Clement; Shah, Nigam H.; Baumgartner, William A.; Verspoor, Karin; Hunter, Lawrence
2010-01-01
Summary: The Unstructured Information Management Architecture (UIMA) framework and web services are emerging as useful tools for integrating biomedical text mining tools. This note describes our work, which wraps the National Center for Biomedical Ontology (NCBO) Annotator—an ontology-based annotation service—to make it available as a component in UIMA workflows. Availability: This wrapper is freely available on the web at http://bionlp-uima.sourceforge.net/ as part of the UIMA tools distribution from the Center for Computational Pharmacology (CCP) at the University of Colorado School of Medicine. It has been implemented in Java for support on Mac OS X, Linux and MS Windows. Contact: chris.roeder@ucdenver.edu PMID:20505005
Scientific Workflows + Provenance = Better (Meta-)Data Management
NASA Astrophysics Data System (ADS)
Ludaescher, B.; Cuevas-Vicenttín, V.; Missier, P.; Dey, S.; Kianmajd, P.; Wei, Y.; Koop, D.; Chirigati, F.; Altintas, I.; Belhajjame, K.; Bowers, S.
2013-12-01
The origin and processing history of an artifact is known as its provenance. Data provenance is an important form of metadata that explains how a particular data product came about, e.g., how and when it was derived in a computational process, which parameter settings and input data were used, etc. Provenance information provides transparency and helps to explain and interpret data products. Other common uses and applications of provenance include quality control, data curation, result debugging, and more generally, 'reproducible science'. Scientific workflow systems (e.g., Kepler, Taverna, VisTrails, and others) provide controlled environments for developing computational pipelines with built-in provenance support. Workflow results can then be explained in terms of workflow steps, parameter settings, input data, etc. using provenance that is automatically captured by the system. Scientific workflows themselves provide a user-friendly abstraction of the computational process and are thus a form of ('prospective') provenance in their own right. The full potential of provenance information is realized when combining workflow-level information (prospective provenance) with trace-level information (retrospective provenance). To this end, the DataONE Provenance Working Group (ProvWG) has developed an extension of the W3C PROV standard, called D-PROV. Whereas PROV provides a 'least common denominator' for exchanging and integrating provenance information, D-PROV adds new 'observables' that describe workflow-level information (e.g., the functional steps in a pipeline), as well as workflow-specific trace-level information (e.g., timestamps for each executed workflow step, the inputs and outputs used, etc.). Using examples, we will demonstrate how the combination of prospective and retrospective provenance provides added value in managing scientific data.
The DataONE ProvWG is also developing tools based on D-PROV that allow scientists to get more mileage from provenance metadata. DataONE is a federation of member nodes that store data and metadata for discovery and access. By enriching metadata with provenance information, search and reuse of data are enhanced, and the 'social life' of data (being the product of many workflow runs, different people, etc.) is revealed. We are currently prototyping a provenance repository (PBase) to demonstrate what can be achieved with advanced provenance queries. The ProvExplorer and ProPub tools support advanced ad-hoc querying and visualization of provenance as well as customized provenance publications (e.g., to address privacy issues, or to focus provenance on relevant details). In a parallel line of work, we are exploring ways to add provenance support to widely used scripting platforms (e.g., R and Python) and then expose that information via D-PROV.
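The combination of prospective and retrospective provenance can be illustrated with a minimal Python sketch. It is not D-PROV: the two-step workflow, the dictionary layout, and the function names are hypothetical, and the `used`/`generated` fields only loosely mirror PROV's `used` and `wasGeneratedBy` relations. The workflow definition plays the role of prospective provenance, the execution trace the role of retrospective provenance, and a lineage query walks the trace back to the inputs of a data product.

```python
from datetime import datetime, timezone

# Prospective provenance: the workflow definition (hypothetical two-step pipeline).
WORKFLOW = {
    "clean": {"inputs": ["raw"], "outputs": ["cleaned"]},
    "model": {"inputs": ["cleaned", "params"], "outputs": ["result"]},
}


def run(workflow, order, data_ids):
    """Execute steps (bodies omitted) and record a retrospective trace of what each used and generated."""
    trace = []
    for step in order:
        spec = workflow[step]
        used = [data_ids[name] for name in spec["inputs"]]
        generated = []
        for name in spec["outputs"]:
            data_ids[name] = f"{name}-{len(trace)}"  # identifier for the new data product
            generated.append(data_ids[name])
        trace.append({
            "activity": step,  # links this execution back to its prospective definition
            "used": used,
            "generated": generated,
            "ended": datetime.now(timezone.utc).isoformat(),
        })
    return trace


def lineage(trace, data_id):
    """Return every upstream data item that data_id was derived from."""
    upstream, frontier = set(), [data_id]
    while frontier:
        current = frontier.pop()
        for record in trace:
            if current in record["generated"]:
                for item in record["used"]:
                    if item not in upstream:
                        upstream.add(item)
                        frontier.append(item)
    return upstream


trace = run(WORKFLOW, ["clean", "model"], {"raw": "raw.csv", "params": "params.json"})
ancestors = lineage(trace, "result-1")  # {"cleaned-0", "raw.csv", "params.json"}
```

Because each trace record names its activity, a query can move between the two views, for example from a data product to the executed step and from there to the step's declared inputs in `WORKFLOW`.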
Cloud-based bioinformatics workflow platform for large-scale next-generation sequencing analyses
Liu, Bo; Madduri, Ravi K; Sotomayor, Borja; Chard, Kyle; Lacinski, Lukasz; Dave, Utpal J; Li, Jianqiang; Liu, Chunchen; Foster, Ian T
2014-01-01
Due to the coming deluge of genome data, storing and processing large-scale genome data, providing easy access to biomedical analysis tools, and enabling efficient data sharing and retrieval present significant challenges. The variability in data volume results in variable computing and storage requirements; therefore, biomedical researchers are pursuing more reliable, dynamic, and convenient methods for conducting sequencing analyses. This paper proposes a Cloud-based bioinformatics workflow platform for large-scale next-generation sequencing analyses, which enables reliable and highly scalable execution of sequencing analysis workflows in a fully automated manner. Our platform extends the existing Galaxy workflow system by adding data management capabilities for transferring large quantities of data efficiently and reliably (via Globus Transfer), domain-specific analysis tools preconfigured for immediate use by researchers (via user-specific tools integration), automatic deployment on Cloud for on-demand resource allocation and pay-as-you-go pricing (via Globus Provision), a Cloud provisioning tool for auto-scaling (via HTCondor scheduler), and support for validating the correctness of workflows (via semantic verification tools). Two bioinformatics workflow use cases as well as a performance evaluation are presented to validate the feasibility of the proposed approach. PMID:24462600
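The auto-scaling idea can be sketched as a queue-driven sizing rule in Python. This is an assumed policy for illustration only, not how Globus Provision or the HTCondor scheduler actually decide pool size; the function names, parameter names, and default bounds are hypothetical.

```python
import math


def desired_workers(queued_jobs, running_jobs, slots_per_worker, min_workers=1, max_workers=20):
    """Worker VMs needed to hold every queued and running job, clamped to budget limits."""
    needed = math.ceil((queued_jobs + running_jobs) / slots_per_worker)
    return max(min_workers, min(max_workers, needed))


def scaling_action(current_workers, **load):
    """Positive result: launch that many VMs; negative: retire that many idle VMs."""
    return desired_workers(**load) - current_workers


# Usage: 40 jobs on 8-slot workers need 5 VMs; with 2 running, launch 3 more.
delta = scaling_action(2, queued_jobs=30, running_jobs=10, slots_per_worker=8)
```

Under pay-as-you-go pricing, the upper bound caps spending while the lower bound keeps one node warm so that small analyses start without waiting for a VM to boot.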
Development of the workflow line systems for support on KAIZEN.
Mizuno, Yuki; Ito, Toshihiko; Yoshikawa, Toru; Yomogida, Satoshi; Morio, Koji; Sakai, Kazuhiro
2012-01-01
In this paper, we introduce a new workflow line system consisting of location and image recording, which enables the acquisition of workflow information and its analysis and display. From the results of a workflow line investigation, we considered the anticipated effects and the problems for KAIZEN. Workflow line information included location information and action content information. These technologies suggest viewpoints that help improvement, for example, eliminating useless movement, redesigning the layout, and reviewing work procedures. In a manufacturing factory, it became clear that there was much movement away from the standard operation place and much accumulated residence time. Concretely, as a result of this investigation, the system suggested a more efficient layout. Similarly, in the case of a hospital, it was pointed out that the workflow has problems of layout and of setup operations, based on the effective movement patterns of experts. This system can be applied to routine as well as non-routine work. With the development of this system, which can adapt to industrial diversification, more effective "visual management" (visualization of work) is expected in the future.
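The kind of movement analysis described above can be sketched in Python, assuming the location system yields timestamped (x, y) positions in metres for one worker. The two metrics (total path length and time spent away from the standard operation place), the function name, and the radius threshold are illustrative assumptions, not the authors' system.

```python
import math


def analyze_track(samples, home, radius):
    """Summarize a worker's movement from timestamped positions.

    samples: list of (t_seconds, x_m, y_m), sorted by time.
    home: (x, y) of the standard operation place; radius: metres still counted as 'at home'.
    Returns (total path length in metres, seconds spent away from home).
    """
    distance = 0.0
    away_time = 0.0
    for (t0, x0, y0), (t1, x1, y1) in zip(samples, samples[1:]):
        distance += math.hypot(x1 - x0, y1 - y0)
        # Attribute each interval to the position recorded at its start.
        if math.hypot(x0 - home[0], y0 - home[1]) > radius:
            away_time += t1 - t0
    return distance, away_time


# Usage: a worker walks 5 m to a shelf, stays 20 s there in total, and walks back.
track = [(0, 0.0, 0.0), (10, 3.0, 4.0), (20, 3.0, 4.0), (30, 0.0, 0.0)]
distance, away = analyze_track(track, home=(0.0, 0.0), radius=1.0)  # (10.0, 20.0)
```

Long path lengths point to layout problems, while large away times point to work that pulls people off their standard station; both are the kind of "visualized" evidence the abstract describes.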
Cloud-based bioinformatics workflow platform for large-scale next-generation sequencing analyses.
Liu, Bo; Madduri, Ravi K; Sotomayor, Borja; Chard, Kyle; Lacinski, Lukasz; Dave, Utpal J; Li, Jianqiang; Liu, Chunchen; Foster, Ian T
2014-06-01
Due to the coming deluge of genome data, storing and processing large-scale genome data, providing easy access to biomedical analysis tools, and enabling efficient data sharing and retrieval present significant challenges. The variability in data volume results in variable computing and storage requirements; therefore, biomedical researchers are pursuing more reliable, dynamic, and convenient methods for conducting sequencing analyses. This paper proposes a Cloud-based bioinformatics workflow platform for large-scale next-generation sequencing analyses, which enables reliable and highly scalable execution of sequencing analysis workflows in a fully automated manner. Our platform extends the existing Galaxy workflow system by adding data management capabilities for transferring large quantities of data efficiently and reliably (via Globus Transfer), domain-specific analysis tools preconfigured for immediate use by researchers (via user-specific tools integration), automatic deployment on Cloud for on-demand resource allocation and pay-as-you-go pricing (via Globus Provision), a Cloud provisioning tool for auto-scaling (via HTCondor scheduler), and support for validating the correctness of workflows (via semantic verification tools). Two bioinformatics workflow use cases as well as a performance evaluation are presented to validate the feasibility of the proposed approach. Copyright © 2014 Elsevier Inc. All rights reserved.
Mominah, Maher; Yunus, Faisel; Househ, Mowafa S
2013-01-01
Computerized provider order entry (CPOE) is a health informatics system that helps health care providers create and manage orders for medications and other health care services. Through the automation of the ordering process, CPOE has improved the overall efficiency of hospital processes and workflow. In Saudi Arabia, CPOE has been used for years, but only a few studies have evaluated its impacts on clinical workflow. In this paper, we discuss the experience of a local hospital with the use of CPOE and its impacts on clinical workflow. Results show that there are many issues related to the implementation and use of CPOE within Saudi Arabia that must be addressed, including design, training, medication errors, alert fatigue, and system dep[…]. Recommendations for improving CPOE use within Saudi Arabia are also discussed.
A Hybrid Task Graph Scheduler for High Performance Image Processing Workflows.
Blattner, Timothy; Keyrouz, Walid; Bhattacharyya, Shuvra S; Halem, Milton; Brady, Mary
2017-12-01
Designing applications for scalability is key to improving their performance in hybrid and cluster computing. Scheduling code to utilize parallelism is difficult, particularly when dealing with data dependencies, memory management, data motion, and processor occupancy. The Hybrid Task Graph Scheduler (HTGS) is an abstract execution model, framework, and API that increases programmer productivity when implementing hybrid workflows for multi-core and multi-GPU systems. HTGS manages dependencies between tasks, represents CPU and GPU memories independently, overlaps computations with disk I/O and memory transfers, keeps multiple GPUs occupied, and uses all available compute resources. Through these abstractions, data motion and memory are explicit; this makes data locality decisions more accessible. To demonstrate the HTGS application program interface (API), we present implementations of two example algorithms: (1) a matrix multiplication that shows how easily task graphs can be used; and (2) a hybrid implementation of microscopy image stitching that reduces code size by ≈43% compared to a manually coded hybrid workflow implementation and showcases the minimal overhead of task graphs in HTGS. Both of the HTGS-based implementations show good performance. In image stitching, the HTGS implementation achieves similar performance to the hybrid workflow implementation. Matrix multiplication with HTGS achieves 1.3× and 1.8× speedup over the multi-threaded OpenBLAS library for 16k × 16k and 32k × 32k size matrices, respectively.
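The core idea of a task graph scheduler, running each task as soon as its dependencies have produced their results, can be sketched in Python with `concurrent.futures`. This is not the HTGS C++ API; the function and argument names are hypothetical, and the sketch omits HTGS features such as separate CPU/GPU memory management and overlapping computation with I/O and transfers. Because of CPython's global interpreter lock, the threads mainly help when tasks release it (I/O, native libraries).

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def run_task_graph(tasks, deps, workers=4):
    """Run each task as soon as all of its dependencies have finished.

    tasks: dict name -> callable that receives {dependency name: result}.
    deps: dict name -> list of dependency names (tasks without an entry have none).
    """
    results = {}
    remaining = {name: set(deps.get(name, ())) for name in tasks}
    running = {}  # future -> task name
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while remaining or running:
            # Submit every task whose dependencies have all produced results.
            ready = [name for name, needs in remaining.items() if needs.issubset(results)]
            for name in ready:
                inputs = {dep: results[dep] for dep in remaining.pop(name)}
                running[pool.submit(tasks[name], inputs)] = name
            if not running:
                raise ValueError("cycle or unknown dependency in task graph")
            # Block until at least one task finishes, then record its result.
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for future in done:
                results[running.pop(future)] = future.result()
    return results


# Usage: two independent "load" tasks feed a dot product, which feeds a final step.
tasks = {
    "load_a": lambda _: [1, 2, 3],
    "load_b": lambda _: [4, 5, 6],
    "dot": lambda r: sum(x * y for x, y in zip(r["load_a"], r["load_b"])),
    "double": lambda r: 2 * r["dot"],
}
deps = {"dot": ["load_a", "load_b"], "double": ["dot"]}
results = run_task_graph(tasks, deps)  # results["double"] == 64
```

The two load tasks run concurrently because neither depends on the other; the scheduler, not the programmer, works out that ordering from the declared dependencies.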
The View from a Few Hundred Feet: A New Transparent and Integrated Workflow for UAV-collected Data
NASA Astrophysics Data System (ADS)
Peterson, F. S.; Barbieri, L.; Wyngaard, J.
2015-12-01
Unmanned Aerial Vehicles (UAVs) allow scientists and civilians to monitor earth and atmospheric conditions in remote locations. To keep up with the rapid evolution of UAV technology, data workflows must also be flexible, integrated, and introspective. Here, we present our data workflow for a project to assess the feasibility of detecting threshold levels of methane, carbon dioxide, and other aerosols by mounting consumer-grade gas analysis sensors on UAVs. In particular, we highlight our use of Project Jupyter, a set of open-source software tools and documentation designed for developing "collaborative narratives" around scientific workflows. By embracing the GitHub-backed, multi-language systems available in Project Jupyter, we enable interaction and exploratory computation while simultaneously embracing distributed version control. Additionally, the transparency of this method builds trust with civilians and decision-makers and leverages collaboration and communication to resolve problems. The goal of this presentation is to provide a generic data workflow for scientific inquiries involving UAVs and to invite the participation of the AGU community in its improvement and curation.
Workflow-Oriented Cyberinfrastructure for Sensor Data Analytics
NASA Astrophysics Data System (ADS)
Orcutt, J. A.; Rajasekar, A.; Moore, R. W.; Vernon, F.
2015-12-01
Sensor streams comprise an increasingly large part of Earth Science data. Analytics based on sensor data require an easy way to perform operations such as acquisition, conversion to physical units, metadata linking, sensor fusion, analysis and visualization on distributed sensor streams. Furthermore, embedding real-time sensor data into scientific workflows is of growing interest. We have implemented a scalable networked architecture that can be used to dynamically access packets of data in a stream from multiple sensors, and perform synthesis and analysis across a distributed network. Our system is based on the integrated Rule Oriented Data System (irods.org), which accesses sensor data from the Antelope Real Time Data System (brtt.com) and provides virtualized access to collections of data streams. We integrate real-time data streaming from different sources, collected for different purposes, on different time and spatial scales, and sensed by different methods. iRODS, noted for its policy-oriented data management, brings to sensor processing features and facilities such as single sign-on, third-party access control lists (ACLs), location transparency, logical resource naming, and server-side modeling capabilities while reducing the burden on sensor network operators. Rich integrated metadata support also makes it straightforward to discover data streams of interest and maintain data provenance. The workflow support in iRODS readily integrates sensor processing into any analytical pipeline. The system is developed as part of the NSF-funded Datanet Federation Consortium (datafed.org). APIs for selecting, opening, reaping and closing sensor streams are provided, along with other helper functions to associate metadata and convert sensor packets into NetCDF and JSON formats. Near real-time sensor data, including seismic sensors, environmental sensors, LIDAR and video streams, are available through this interface.
A system for archiving sensor data and metadata in NetCDF format has been implemented and will be demonstrated at AGU.
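The conversion of sensor packets into JSON can be sketched in Python with the standard `json` module. The packet layout (station, channel, start time, sample rate, raw counts), the single linear calibration factor, and the metadata fields are assumptions for illustration; they are not the Antelope packet format or the Datanet Federation Consortium API.

```python
import json


def packet_to_json(packet, counts_to_units, metadata):
    """Convert raw counts to physical units and serialize them with linked metadata.

    packet: dict with 'station', 'channel', 'start' (epoch seconds), 'rate' (Hz), 'counts'.
    counts_to_units: physical units per count (assumed linear sensor response).
    """
    values = [count * counts_to_units for count in packet["counts"]]
    times = [packet["start"] + i / packet["rate"] for i in range(len(values))]
    record = {
        "station": packet["station"],
        "channel": packet["channel"],
        "sample_rate_hz": packet["rate"],
        "times": times,
        "values": values,
        "metadata": metadata,  # e.g. units and provenance, attached at conversion time
    }
    return json.dumps(record, sort_keys=True)


# Usage with a hypothetical three-sample packet.
packet = {"station": "STA1", "channel": "HHZ", "start": 0.0, "rate": 40.0, "counts": [100, -50, 0]}
doc = json.loads(packet_to_json(packet, 0.5, {"units": "um/s", "source": "hypothetical"}))
```

Attaching metadata at conversion time keeps units and origin with the values, which is what makes later discovery and provenance tracking straightforward.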
A Scalable, Open Source Platform for Data Processing, Archiving and Dissemination
2016-01-01
Object Oriented Data Technology (OODT) big data toolkit developed by NASA and the Workflow INstance Generation and Selection (WINGS) scientific work... to several challenge big data problems and demonstrated the utility of OODT-WINGS in addressing them. Specific demonstrated analyses address i...source software, Apache, Object Oriented Data Technology, OODT, semantic workflows, WINGS, big data, workflow management
NASA Astrophysics Data System (ADS)
Hermans, Thomas; Nguyen, Frédéric; Caers, Jef
2015-07-01
In inverse problems, investigating uncertainty in the posterior distribution of model parameters is as important as matching data. In recent years, most efforts have focused on techniques to sample the posterior distribution with reasonable computational costs. Within a Bayesian context, this posterior depends on the prior distribution. However, most studies ignore modeling the prior with realistic geological uncertainty. In this paper, we propose a workflow inspired by a Popper-Bayes philosophy, in which data should first be used to falsify models and only then be considered for matching. We propose a workflow consisting of three steps: (1) In defining the prior, we interpret multiple alternative geological scenarios from literature (architecture of facies) and site-specific data (proportions of facies). Prior spatial uncertainty is modeled using multiple-point geostatistics, where each scenario is defined using a training image. (2) We validate these prior geological scenarios by simulating electrical resistivity tomography (ERT) data on realizations of each scenario and comparing them to field ERT in a lower-dimensional space. In this second step, the idea is to probabilistically falsify scenarios with ERT, meaning that incompatible scenarios receive an updated probability of zero while compatible scenarios receive a nonzero updated belief. (3) We constrain the hydrogeological model with hydraulic head and ERT using a stochastic search method. The workflow is applied to a synthetic case study and a field case study in an alluvial aquifer. This study highlights the importance of considering and estimating prior uncertainty (without data) through a process of probabilistic falsification.
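Step (2) can be illustrated with a Python sketch under stated assumptions: simulated and field ERT data have already been reduced to 2-D points, each scenario's likelihood is approximated with a Gaussian kernel density estimate over its simulated points, and a scenario whose density at the field point falls below a chosen floor is treated as falsified (probability zero). The scenario names, bandwidth, floor, and function names are hypothetical, and the authors' exact procedure may differ.

```python
import math


def kde(point, cloud, bandwidth):
    """Isotropic Gaussian kernel density estimate of a 2-D point under a cloud of 2-D points."""
    norm = 1.0 / (2 * math.pi * bandwidth ** 2 * len(cloud))
    total = 0.0
    for x, y in cloud:
        d2 = (point[0] - x) ** 2 + (point[1] - y) ** 2
        total += math.exp(-d2 / (2 * bandwidth ** 2))
    return norm * total


def update_scenarios(priors, simulated, field, bandwidth=1.0, floor=1e-6):
    """Update scenario probabilities from field data in a reduced (2-D) space."""
    weights = {}
    for name, prior in priors.items():
        density = kde(field, simulated[name], bandwidth)
        # A negligible density counts as falsification: that scenario gets probability zero.
        weights[name] = prior * density if density > floor else 0.0
    total = sum(weights.values())
    if total == 0.0:
        raise ValueError("field data falsifies every scenario; revisit the prior")
    return {name: w / total for name, w in weights.items()}


# Usage: two hypothetical scenarios whose simulated responses cluster far apart.
priors = {"channels": 0.5, "lobes": 0.5}
simulated = {
    "channels": [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5)],
    "lobes": [(10.0, 10.0), (10.5, 10.0), (10.0, 9.5)],
}
posterior = update_scenarios(priors, simulated, field=(0.2, 0.1))
```

The floor is an absolute density, so it depends on the scale of the reduced space and would need tuning; the compatible scenarios keep nonzero beliefs proportional to prior times density.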
Lunga, Dalton D.; Yang, Hsiuhan Lexie; Reith, Andrew E.; ...
2018-02-06
Satellite imagery often covers areas of large spatial extent that encompass object classes with considerable variability. This often limits large-scale model generalization with machine learning algorithms. Notably, acquisition conditions, including dates, sensor position, lighting condition, and sensor types, often translate into class distribution shifts, introducing complex nonlinear factors and hampering the potential impact of machine learning classifiers. Here, this article investigates the challenge of exploiting satellite images using convolutional neural networks (CNN) for settlement classification where the class distribution shifts are significant. We present a large-scale human settlement mapping workflow based on multiple modules to adapt a pretrained CNN to address the negative impact of distribution shift on classification performance. To extend a locally trained classifier onto areas of large spatial extent, we introduce several submodules: first, a human-in-the-loop element for relabeling misclassified target domain samples to generate representative examples for model adaptation; second, an efficient hashing module to minimize redundancy and noisy samples from the mass-selected examples; and third, a novel relevance ranking module to minimize the dominance of source examples on the target domain. The workflow presents a novel and practical approach to achieve large-scale domain adaptation with binary classifiers that are based on CNN features. Experimental evaluations are conducted on areas of interest that encompass various image characteristics, including multisensor, multitemporal, and multiangular conditions. Domain adaptation is assessed on source–target pairs through the transfer loss and transfer ratio metrics to illustrate the utility of the workflow.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lunga, Dalton D.; Yang, Hsiuhan Lexie; Reith, Andrew E.
Satellite imagery often covers areas of large spatial extent that encompass object classes with considerable variability. This often limits large-scale model generalization with machine learning algorithms. Notably, acquisition conditions, including dates, sensor position, lighting condition, and sensor types, often translate into class distribution shifts, introducing complex nonlinear factors and hampering the potential impact of machine learning classifiers. Here, this article investigates the challenge of exploiting satellite images using convolutional neural networks (CNN) for settlement classification where the class distribution shifts are significant. We present a large-scale human settlement mapping workflow based on multiple modules to adapt a pretrained CNN to address the negative impact of distribution shift on classification performance. To extend a locally trained classifier onto areas of large spatial extent, we introduce several submodules: first, a human-in-the-loop element for relabeling misclassified target domain samples to generate representative examples for model adaptation; second, an efficient hashing module to minimize redundancy and noisy samples from the mass-selected examples; and third, a novel relevance ranking module to minimize the dominance of source examples on the target domain. The workflow presents a novel and practical approach to achieve large-scale domain adaptation with binary classifiers that are based on CNN features. Experimental evaluations are conducted on areas of interest that encompass various image characteristics, including multisensor, multitemporal, and multiangular conditions. Domain adaptation is assessed on source–target pairs through the transfer loss and transfer ratio metrics to illustrate the utility of the workflow.
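The abstract does not specify how its hashing module works. As an illustration of how hashing can remove redundant samples, the following Python sketch swaps in random-hyperplane locality-sensitive hashing, a standard technique, to drop near-duplicate CNN feature vectors. The function names, plane count, and seed are hypothetical.

```python
import random


def hyperplane_signature(vector, planes):
    """Bit signature: one bit per random hyperplane, set if the vector lies on its non-negative side."""
    return tuple(int(sum(v * p for v, p in zip(vector, plane)) >= 0) for plane in planes)


def deduplicate(vectors, n_planes=16, seed=0):
    """Return indices of one representative per hash bucket, dropping near-duplicates."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_planes)]
    kept, seen = [], set()
    for index, vector in enumerate(vectors):
        signature = hyperplane_signature(vector, planes)
        if signature not in seen:
            seen.add(signature)
            kept.append(index)
    return kept


# Usage: the second vector points the same way as the first, so it is dropped.
features = [[1.0, 0.0, 0.0], [2.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
kept = deduplicate(features, n_planes=32)
```

Vectors separated by a small angle usually fall on the same side of every plane and share a signature, so only the first of each cluster survives; more planes make the buckets finer.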
Bringing the CMS distributed computing system into scalable operations
NASA Astrophysics Data System (ADS)
Belforte, S.; Fanfani, A.; Fisk, I.; Flix, J.; Hernández, J. M.; Kress, T.; Letts, J.; Magini, N.; Miccio, V.; Sciabà, A.
2010-04-01
Establishing efficient and scalable operations of the CMS distributed computing system critically relies on the proper integration, commissioning and scale testing of the data and workload management tools, the various computing workflows and the underlying computing infrastructure, located at more than 50 computing centres worldwide and interconnected by the Worldwide LHC Computing Grid. Computing challenges periodically undertaken by CMS in the past years with increasing scale and complexity have revealed the need for a sustained effort on computing integration and commissioning activities. The Processing and Data Access (PADA) Task Force was established at the beginning of 2008 within the CMS Computing Program with the mandate of validating the infrastructure for organized processing and user analysis including the sites and the workload and data management tools, validating the distributed production system by performing functionality, reliability and scale tests, helping sites to commission, configure and optimize the networking and storage through scale testing data transfers and data processing, and improving the efficiency of accessing data across the CMS computing system from global transfers to local access. This contribution reports on the tools and procedures developed by CMS for computing commissioning and scale testing as well as the improvements accomplished towards efficient, reliable and scalable computing operations. The activities include the development and operation of load generators for job submission and data transfers with the aim of stressing the experiment and Grid data management and workload management systems, site commissioning procedures and tools to monitor and improve site availability and reliability, as well as activities targeted to the commissioning of the distributed production, user analysis and monitoring systems.
The Ocean Observatories Initiative Data Management and QA/QC: Lessons Learned and the Path Ahead
NASA Astrophysics Data System (ADS)
Vardaro, M.; Belabbassi, L.; Garzio, L. M.; Knuth, F.; Smith, M. J.; Kerfoot, J.; Crowley, M. F.
2016-02-01
The Ocean Observatories Initiative (OOI) is a multi-decadal, NSF-funded program that will provide long-term, near real-time cabled and telemetered measurements of climate variability, ocean circulation, ecosystem dynamics, air-sea exchange, seafloor processes, and plate-scale geodynamics. The OOI platforms consist of seafloor sensors, fixed moorings, and mobile assets containing over 700 operational instruments in the Atlantic and Pacific oceans. Rutgers University operates the Cyberinfrastructure (CI) component of the OOI, which acquires, processes and distributes data to scientists, researchers, educators and the public. It will also provide observatory mission command and control, data assessment and distribution, and long-term data management. The Rutgers Data Management Team consists of a data manager and four data evaluators, who are tasked with ensuring data completeness and quality, as well as interaction with OOI users to facilitate data delivery and utility. Here we will discuss the procedures developed to guide the data team workflow, the automated QC algorithms and human-in-the-loop (HITL) annotations that are used to flag suspect data (whether due to instrument failures, biofouling, or unanticipated events), system alerts and alarms, long-term data storage and CF (Climate and Forecast) standard compliance, and the lessons learned during construction and the first several months of OOI operations.
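The abstract does not list OOI's automated QC algorithms, so the following Python sketch only illustrates two common automated tests, a gross range test and a spike test, using QARTOD-style flag values (1 pass, 3 suspect, 4 fail). The thresholds and function names are hypothetical.

```python
PASS, SUSPECT, FAIL = 1, 3, 4  # QARTOD-style flag values


def gross_range_flags(values, fail_min, fail_max, suspect_min, suspect_max):
    """Fail values outside the sensor's limits; mark values outside the expected range as suspect."""
    flags = []
    for v in values:
        if v < fail_min or v > fail_max:
            flags.append(FAIL)
        elif v < suspect_min or v > suspect_max:
            flags.append(SUSPECT)
        else:
            flags.append(PASS)
    return flags


def spike_flags(values, threshold):
    """Mark a point suspect if it departs from the mean of its two neighbours by more than threshold.

    The first and last points cannot be tested and are left as PASS in this simplified sketch.
    """
    flags = [PASS] * len(values)
    for i in range(1, len(values) - 1):
        reference = (values[i - 1] + values[i + 1]) / 2
        if abs(values[i] - reference) > threshold:
            flags[i] = SUSPECT
    return flags


def combine(*flag_lists):
    """Keep the most severe flag for each sample."""
    return [max(flags) for flags in zip(*flag_lists)]


# Usage: hypothetical temperature series with one spike.
temps = [10.0, 10.0, 15.0, 10.0, 10.0]
flags = combine(gross_range_flags(temps, -2.0, 35.0, 0.0, 30.0), spike_flags(temps, 3.0))
```

Automated flags like these only mark data as suspect; a human-in-the-loop annotation step can then decide whether a flagged value reflects instrument failure, biofouling, or a real event.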
Software-Enabled Distributed Network Governance: The PopMedNet Experience.
Davies, Melanie; Erickson, Kyle; Wyner, Zachary; Malenfant, Jessica; Rosen, Rob; Brown, Jeffrey
2016-01-01
The expanded availability of electronic health information has led to increased interest in distributed health data research networks. The distributed research network model leaves data with and under the control of the data holder. Data holders, network coordinating centers, and researchers have distinct needs and challenges within this model. The concerns of network stakeholders are addressed in the design and governance models of the PopMedNet software platform. PopMedNet features include distributed querying, customizable workflows, and auditing and search capabilities. Its flexible role-based access control system enables the enforcement of varying governance policies. Four case studies describe how PopMedNet is used to enforce network governance models. Trust is an essential component of a distributed research network and must be built before data partners may be willing to participate further. The complexity of the PopMedNet system must be managed as networks grow and new data, analytic methods, and querying approaches are developed. The PopMedNet software platform supports a variety of network structures, governance models, and research activities through customizable features designed to meet the needs of network stakeholders.
NASA Astrophysics Data System (ADS)
Mattson, E.; Versteeg, R.; Ankeny, M.; Stormberg, G.
2005-12-01
Long-term performance monitoring has been identified by DOE, DOD and EPA as one of the most challenging and costly elements of contaminated site remedial efforts. Such monitoring should provide timely and actionable information relevant to a multitude of stakeholder needs. This information should be obtained in a manner that is auditable, cost-effective and transparent. Over the last several years, INL staff has designed and implemented a web-accessible scientific workflow system for environmental monitoring. This workflow environment integrates distributed, automated data acquisition from diverse sensors (geophysical, geochemical and hydrological) with server-side data management and information visualization through flexible browser-based data access tools. Component technologies include a rich browser-based client (using dynamic JavaScript and HTML/CSS) for data selection, a back-end server that uses PHP for data processing, user management and result delivery, and third-party applications that the back end invokes through web services. This system has been implemented and is operational at several sites, including the Ruby Gulch Waste Rock Repository (a capped mine waste rock dump on the Gilt Edge Mine Superfund Site), the INL Vadose Zone Research Park and an alternative cover landfill. Implementations for other vadose zone sites are currently in progress. These systems allow autonomous performance monitoring through automated data analysis and report generation. This performance monitoring has given users insight into system dynamics, regulatory compliance and water residence times. Our system uses modular components for data selection and graphing, and WSDL-compliant web services for external functions such as statistical analyses and model invocations. Thus, implementing this system for new sites and extending its functionality (e.g. adding new models) is relatively straightforward.
Because the system requires only a standard web browser and its functionality is intuitive, stakeholders with diverse degrees of technical insight can use it with little or no training.
Multi-core processing and scheduling performance in CMS
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hernandez, J. M.; Evans, D.; Foulkes, S.
2012-01-01
Commodity hardware is going many-core. We may soon be unable to satisfy per-core job memory needs with the current single-core processing model in High Energy Physics. In addition, an ever-increasing number of independent and incoherent jobs running on the same physical hardware without sharing resources might significantly degrade processing performance. It will be essential to utilize the multi-core architecture effectively. CMS has incorporated support for multi-core processing in the event processing framework and the workload management system. Multi-core processing jobs share common data in memory, such as the code libraries, detector geometry and conditions data. This results in much lower memory usage than standard single-core independent jobs. Exploiting this new processing model requires a new model of computing resource allocation, departing from the standard allocation of a single core per job. The experiment's job management system needs control over a larger quantum of resources, since multi-core-aware jobs require multiple cores to be scheduled simultaneously. CMS is exploring the approach of using whole nodes as the unit in the workload management system, where all cores of a node are allocated to a multi-core job. Whole-node scheduling allows optimization of data/workflow management (e.g. I/O caching, local merging), but efficient utilization of all scheduled cores is challenging. Dedicated whole-node queues have been set up at all Tier-1 centers to explore multi-core processing workflows in CMS. We present an evaluation of the performance of scheduling and executing multi-core workflows in whole-node queues, compared with the standard single-core processing workflows.
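The memory argument can be made concrete with a back-of-the-envelope model. Assume (hypothetically) that every job needs S GB of shareable read-only data (code libraries, detector geometry, conditions) plus P GB of private per-process memory. Then:

- N independent single-core jobs need N(S + P).
- One N-core job that loads the shared data once needs S + N·P.

The numbers below are illustrative, not CMS measurements.

```python
def single_core_memory(n_cores: int, shared_gb: float, private_gb: float) -> float:
    """N independent single-core jobs each load their own copy of the shared data."""
    return n_cores * (shared_gb + private_gb)

def multi_core_memory(n_cores: int, shared_gb: float, private_gb: float) -> float:
    """One multi-core job loads the shared data once; each core keeps private state."""
    return shared_gb + n_cores * private_gb

# Illustrative numbers only: 8 cores, 1.0 GB shareable, 0.5 GB private per core.
n, shared, private = 8, 1.0, 0.5
print(single_core_memory(n, shared, private))  # 12.0
print(multi_core_memory(n, shared, private))   # 5.0
```

The saving, (N - 1)·S, grows with the core count. This is why the shareable fraction of job memory matters on many-core nodes.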
FRIEDA: Flexible Robust Intelligent Elastic Data Management Framework
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ghoshal, Devarshi; Hendrix, Valerie; Fox, William; ...
2017-02-01
Scientific applications are increasingly using cloud resources for their data analysis workflows. However, managing data effectively and efficiently over these cloud resources is challenging for several reasons: the myriad storage choices with different performance and cost trade-offs, complex application choices, and the complexity associated with elasticity and failure rates in these environments. The different data access patterns of data-intensive scientific applications require a more flexible and robust data management solution than those currently in existence. FRIEDA is a Flexible Robust Intelligent Elastic Data Management framework that employs a range of data management strategies in cloud environments. FRIEDA can manage the storage and data lifecycle of applications in cloud environments. There are four stages in FRIEDA's data management lifecycle: (i) storage planning, (ii) provisioning and preparation, (iii) data placement, and (iv) execution. FRIEDA defines a data control plane and an execution plane. The data control plane defines the data partition and distribution strategy, whereas the execution plane manages the execution of the application using a master-worker paradigm. FRIEDA also provides different data management strategies, either partitioning the data in real time or predetermining the data partitions prior to application execution.
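The two data management strategies named at the end of this abstract can be contrasted in a small, single-process simulation:

- partitioning the data before execution, versus
- a master that hands data to workers at run time.

The round-robin plan, the per-file cost model and all names are assumptions for illustration. This is not FRIEDA's code.

```python
from collections import deque
from typing import Dict, List

def prepartition(files: List[str], n_workers: int) -> Dict[int, List[str]]:
    """Static strategy: fix each worker's share (round-robin) before execution."""
    plan: Dict[int, List[str]] = {w: [] for w in range(n_workers)}
    for i, name in enumerate(files):
        plan[i % n_workers].append(name)
    return plan

def realtime_dispatch(files: List[str], cost: Dict[str, int],
                      n_workers: int) -> Dict[int, List[str]]:
    """Dynamic strategy: the master gives the next file to whichever worker
    becomes free first (simulated with per-worker busy-until clocks)."""
    pending = deque(files)
    busy_until = [0] * n_workers
    assigned: Dict[int, List[str]] = {w: [] for w in range(n_workers)}
    while pending:
        worker = min(range(n_workers), key=lambda w: busy_until[w])
        name = pending.popleft()
        assigned[worker].append(name)
        busy_until[worker] += cost[name]
    return assigned

def makespan(plan: Dict[int, List[str]], cost: Dict[str, int]) -> int:
    """Time until the last worker finishes its share."""
    return max(sum(cost[f] for f in share) for share in plan.values())

files = ["a", "b", "c", "d"]
cost = {"a": 5, "b": 1, "c": 1, "d": 1}
static = prepartition(files, 2)              # {0: ['a', 'c'], 1: ['b', 'd']}
dynamic = realtime_dispatch(files, cost, 2)  # {0: ['a'], 1: ['b', 'c', 'd']}
print(makespan(static, cost), makespan(dynamic, cost))  # 6 5
```

Pre-partitioning avoids run-time coordination, while run-time assignment adapts to uneven per-file costs. Which is preferable depends on the workload.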
Akuna: An Open Source User Environment for Managing Subsurface Simulation Workflows
NASA Astrophysics Data System (ADS)
Freedman, V. L.; Agarwal, D.; Bensema, K.; Finsterle, S.; Gable, C. W.; Keating, E. H.; Krishnan, H.; Lansing, C.; Moeglein, W.; Pau, G. S. H.; Porter, E.; Scheibe, T. D.
2014-12-01
The U.S. Department of Energy (DOE) is investing in development of a numerical modeling toolset called ASCEM (Advanced Simulation Capability for Environmental Management) to support modeling analyses at legacy waste sites. ASCEM is an open-source, modular computing framework that incorporates new advances and tools for predicting contaminant fate and transport in natural and engineered systems. The ASCEM toolset includes both a Platform with Integrated Toolsets (called Akuna) and a High-Performance Computing multi-process simulator (called Amanzi). The focus of this presentation is on Akuna, an open-source user environment that manages subsurface simulation workflows and associated data and metadata. In this presentation, key elements of Akuna are demonstrated, including toolsets for model setup, database management, sensitivity analysis, parameter estimation, uncertainty quantification, and visualization of both model setup and simulation results. A key component of the workflow is the automated job launching and monitoring capability, which allows a user to submit and monitor simulation runs on high-performance, parallel computers. Visualization of large outputs can also be performed without moving data back to local resources. These capabilities make high-performance computing accessible to users who might not be familiar with batch queue systems and usage protocols on different supercomputers and clusters.
NASA Astrophysics Data System (ADS)
Sheldon, W.; Chamblee, J.; Cary, R. H.
2013-12-01
Environmental scientists are under increasing pressure from funding agencies and journal publishers to release quality-controlled data in a timely manner, as well as to produce comprehensive metadata for submitting data to long-term archives (e.g. DataONE, Dryad and BCO-DMO). At the same time, the volume of digital data that researchers collect and manage is increasing rapidly due to advances in high frequency electronic data collection from flux towers, instrumented moorings and sensor networks. However, few pre-built software tools are available to meet these data management needs, and those tools that do exist typically focus on part of the data management lifecycle or one class of data. The GCE Data Toolbox has proven to be both a generalized and effective software solution for environmental data management in the Long Term Ecological Research Network (LTER). This open source MATLAB software library, developed by the Georgia Coastal Ecosystems LTER program, integrates metadata capture, creation and management with data processing, quality control and analysis to support the entire data lifecycle. Raw data can be imported directly from common data logger formats (e.g. SeaBird, Campbell Scientific, YSI, Hobo), as well as delimited text files, MATLAB files and relational database queries. Basic metadata are derived from the data source itself (e.g. parsed from file headers) and by value inspection, and then augmented using editable metadata templates containing boilerplate documentation, attribute descriptors, code definitions and quality control rules. Data and metadata content, quality control rules and qualifier flags are then managed together in a robust data structure that supports database functionality and ensures data validity throughout processing. 
A growing suite of metadata-aware editing, quality control, analysis and synthesis tools is provided with the software. These tools support managing data through graphical forms and command-line functions, as well as developing automated workflows for unattended processing. Finalized data and structured metadata can be exported in a wide variety of text and MATLAB formats or uploaded to a relational database for long-term archiving and distribution. The GCE Data Toolbox can be used as a complete, lightweight solution for environmental data and metadata management. It can also be used in conjunction with other cyberinfrastructure to provide a more comprehensive solution. For example, newly acquired data can be retrieved from a Data Turbine or Campbell LoggerNet Database server for quality control and processing. The data can then be transformed to the CUAHSI Observations Data Model format and uploaded to a HydroServer for distribution through the CUAHSI Hydrologic Information System. The GCE Data Toolbox can also be leveraged in analytical workflows developed using Kepler or other systems that support MATLAB integration or tool chaining. This software can therefore be used in many ways to help researchers manage, analyze and distribute the data they collect.
Barbarito, Fulvio; Pinciroli, Francesco; Mason, John; Marceglia, Sara; Mazzola, Luca; Bonacina, Stefano
2012-08-01
Information technologies (ITs) have now entered the everyday workflow of a variety of healthcare providers, each with a certain degree of independence. This independence can make interoperability between information systems difficult, a problem that can be overcome through the implementation and adoption of standards. Here we present the case of the Lombardy Region, in Italy, which in the last 10 years has set up the Regional Social and Healthcare Information System. The system connects all the healthcare providers within the region and provides full access to clinical and health-related documents, independently of the healthcare organization that generated each document. This goal, in a region with almost 10 million citizens, was achieved through a twofold approach. First, there was a political and operative push towards the adoption of the Health Level 7 (HL7) standard within single hospitals. Second, a technological infrastructure for data sharing was provided, based on interoperability specifications recognized at the regional level for messages transmitted from healthcare providers to the central domain. The adoption of these regional interoperability specifications enabled communication among heterogeneous systems located in different hospitals in Lombardy. Integrating the Healthcare Enterprise (IHE) integration profiles, which refer to HL7 standards, are adopted within hospitals for message exchange and for the definition of integration scenarios. The IHE Patient Administration Management (PAM) profile, with its different workflows, is adopted for patient management. The Scheduled Workflow (SWF), Laboratory Testing Workflow (LTW), and Ambulatory Testing Workflow (ATW) profiles are adopted for order management. At present, the system manages 4,700,000 pharmacological e-prescriptions and 1,700,000 e-prescriptions for laboratory exams per month.
Each month it produces 490,000 laboratory medical reports, 180,000 radiology medical reports, 180,000 first aid medical reports, and 58,000 discharge summaries. Hence, although work is still in progress, the Lombardy Region healthcare system is a fully interoperable social healthcare system. Through the implementation of international health standards, it connects patients, healthcare providers, healthcare organizations, and healthcare professionals across a large and heterogeneous territory. Copyright © 2012 Elsevier Inc. All rights reserved.
Parametric Workflow (BIM) for the Repair Construction of Traditional Historic Architecture in Taiwan
NASA Astrophysics Data System (ADS)
Ma, Y.-P.; Hsu, C. C.; Lin, M.-C.; Tsai, Z.-W.; Chen, J.-Y.
2015-08-01
In Taiwan, numerous existing traditional buildings are constructed with wooden, brick, and stone structures. This paper focuses on traditional historic architecture in Taiwan and targets traditional wooden-structure buildings as the design proposition. It applies a BIM workflow to model complex combined wooden geometry, integrate it with more traditional 2D documents, and visualize repair construction assumptions within the 3D model representation. The goal of this article is to explore the current problems to be overcome in the conservation of wooden historic buildings. It also introduces BIM technology for conserving, documenting and managing these buildings, and for creating full engineering drawings and information that effectively support historic conservation. Although BIM is mostly oriented to current construction practice, there have been some attempts to investigate its applicability in historic conservation projects. This article also illustrates the importance and advantages of using a BIM workflow in the repair construction process, compared with a generic workflow.
Task Delegation Based Access Control Models for Workflow Systems
NASA Astrophysics Data System (ADS)
Gaaloul, Khaled; Charoy, François
Processes in e-Government organisations are facilitated and conducted using workflow management systems. Role-based access control (RBAC) is recognised as an efficient access control model for large organisations. However, RBAC as applied in workflow systems cannot grant permissions to users dynamically while business processes are being executed. We currently observe a move away from predefined, strict workflow modelling towards approaches supporting flexibility at the organisational level. One specific approach is task delegation, a mechanism that supports organisational flexibility and ensures delegation of authority in access control systems. In this paper, we propose a Task-oriented Access Control (TAC) model based on RBAC to address these requirements. We aim to reason about tasks from organisational and resource perspectives in order to analyse and specify authorisation constraints. Moreover, we present a fine-grained access control protocol to support delegation based on the TAC model.
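The following minimal sketch illustrates how task delegation can extend RBAC. It assumes two rules:

- a delegated permission is scoped to a single task instance, and
- a permission may only be delegated by a user who holds it through a role.

The class, method and permission names are hypothetical. This is not the TAC model or its protocol.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

@dataclass
class TaskScopedRBAC:
    role_perms: Dict[str, Set[str]]   # role -> permissions
    user_roles: Dict[str, Set[str]]   # user -> roles
    # (user, task_id) -> permissions delegated for that task instance only
    delegated: Dict[Tuple[str, str], Set[str]] = field(default_factory=dict)

    def holds_via_role(self, user: str, perm: str) -> bool:
        return any(perm in self.role_perms.get(role, set())
                   for role in self.user_roles.get(user, set()))

    def can(self, user: str, perm: str, task_id: str) -> bool:
        """Permitted through a role, or through a delegation for this task."""
        return (self.holds_via_role(user, perm)
                or perm in self.delegated.get((user, task_id), set()))

    def delegate(self, delegator: str, delegatee: str, perm: str, task_id: str) -> None:
        """Grant `perm` on one task at run time, without changing any role."""
        if not self.holds_via_role(delegator, perm):
            raise PermissionError(f"{delegator} does not hold {perm!r}")
        self.delegated.setdefault((delegatee, task_id), set()).add(perm)

acl = TaskScopedRBAC(role_perms={"manager": {"approve"}, "clerk": {"prepare"}},
                     user_roles={"alice": {"manager"}, "bob": {"clerk"}})
acl.delegate("alice", "bob", "approve", "task-42")
print(acl.can("bob", "approve", "task-42"))  # True
print(acl.can("bob", "approve", "task-43"))  # False
```

Keeping delegations separate from role assignments means the grant disappears with the task instance instead of permanently widening Bob's role.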
Yeung, Daniel; Boes, Peter; Ho, Meng Wei; Li, Zuofeng
2015-05-08
Image-guided radiotherapy (IGRT), based on radiopaque markers placed in the prostate gland, was used for proton therapy of prostate patients. Orthogonal X-rays and the IBA Digital Image Positioning System (DIPS) were used for setup correction prior to treatment and were repeated after treatment delivery. Following a rationale for margin estimates similar to that of van Herk,(1) the daily post-treatment DIPS data were analyzed to determine if an adaptive radiotherapy plan was necessary. A Web application using ASP.NET MVC5, Entity Framework, and an SQL database was designed to automate this process. The designed features included state-of-the-art Web technologies, a domain model closely matching the workflow, a database supporting concurrency and data mining, access to the DIPS database, secured user access and role management, and graphing and analysis tools. The Model-View-Controller (MVC) paradigm allowed clean domain logic, unit testing, and extensibility. Client-side technologies, such as jQuery, jQuery plug-ins, and Ajax, were adopted to achieve a rich user environment and fast response. Data models included patients, staff, treatment fields and records, correction vectors, DIPS images, and association logic. Data entry, analysis, workflow logic, and notifications were implemented. The system effectively modeled the clinical workflow and IGRT process.
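The abstract says only that the margin rationale is "similar to that of van Herk". This sketch assumes the widely cited van Herk population recipe, M = 2.5Σ + 0.7σ, where:

- Σ (systematic error) is the standard deviation of the per-patient mean errors, and
- σ (random error) is the root-mean-square of the per-patient standard deviations.

Under that assumption, daily post-treatment shifts along one axis could be reduced as shown below. The data are invented. The sketch omits the per-axis handling and other refinements a clinical analysis would need.

```python
import math
import statistics
from typing import Dict, List

def van_herk_margin(shifts_mm: Dict[str, List[float]]) -> float:
    """Population margin (mm) along one axis from per-patient daily shifts.

    Sigma (systematic): SD of the per-patient mean shifts.
    sigma (random): root-mean-square of the per-patient SDs.
    Assumed recipe: M = 2.5 * Sigma + 0.7 * sigma.
    Needs at least two patients and two fractions per patient
    (statistics.stdev requires two or more data points).
    """
    means = [statistics.mean(s) for s in shifts_mm.values()]
    sds = [statistics.stdev(s) for s in shifts_mm.values()]
    systematic = statistics.stdev(means)
    random_err = math.sqrt(sum(sd ** 2 for sd in sds) / len(sds))
    return 2.5 * systematic + 0.7 * random_err

# Invented post-treatment shifts (mm) for three patients, one axis.
shifts = {"pt1": [-2.0, 0.0, 2.0], "pt2": [0.0, 2.0, 4.0], "pt3": [2.0, 4.0, 6.0]}
print(f"{van_herk_margin(shifts):.1f} mm")  # 6.4 mm
```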
Cyberinfrastructure for End-to-End Environmental Explorations
NASA Astrophysics Data System (ADS)
Merwade, V.; Kumar, S.; Song, C.; Zhao, L.; Govindaraju, R.; Niyogi, D.
2007-12-01
The design and implementation of a cyberinfrastructure for End-to-End Environmental Exploration (C4E4) is presented. The C4E4 framework addresses the need for an integrated data/computation platform for studying broad environmental impacts by combining heterogeneous data resources with state-of-the-art modeling and visualization tools. With Purdue being a TeraGrid Resource Provider, C4E4 builds on top of the Purdue TeraGrid data management system and Grid resources, and integrates them through a service-oriented workflow system. It allows researchers to construct environmental workflows for data discovery, access, transformation, modeling, and visualization. Using the C4E4 framework, we have implemented an end-to-end SWAT simulation and analysis workflow that connects our TeraGrid data and computation resources. It enables researchers to conduct comprehensive studies on the impact of land management practices in the St. Joseph watershed using data from various sources in hydrologic, atmospheric, agricultural, and other related disciplines.
Kilian, Norbert; Henning, Tilo; Plitzner, Patrick; Müller, Andreas; Güntsch, Anton; Stöver, Ben C.; Müller, Kai F.; Berendsohn, Walter G.; Borsch, Thomas
2015-01-01
We present the model and implementation of a workflow that blazes a trail in systematic biology for the re-usability of character data (data on any kind of characters of pheno- and genotypes of organisms) and their additivity from specimen to taxon level. We take into account that any taxon characterization is based on a limited set of sampled individuals and characters. Consequently, any new individual and any new character may affect the recognition of biological entities and/or the subsequent delimitation and characterization of a taxon. Taxon concepts thus frequently change during the knowledge generation process in systematic biology. Structured character data are therefore needed not only for the knowledge generation process but also for easily adapting characterizations of taxa. We aim to facilitate the construction and reproducibility of taxon characterizations from structured character data of changing sample sets. We do this by establishing a stable and unambiguous association between each sampled individual and the data processed from it. Our workflow implementation uses the European Distributed Institute of Taxonomy Platform, a comprehensive taxonomic data management and publication environment, to: (i) establish a reproducible connection between sampled individuals and all samples derived from them; (ii) stably link sample-based character data with the metadata of the respective samples; (iii) record and store structured specimen-based character data in formats allowing data exchange; (iv) reversibly assign sample metadata and character datasets to taxa in an editable classification and display them; and (v) organize data exchange via standard exchange formats and enable the link between the character datasets and samples in research collections, ensuring high visibility and instant re-usability of the data. The workflow implemented will contribute to organizing the interface between phylogenetic analysis and revisionary taxonomic or monographic work.
Database URL: http://campanula.e-taxonomy.net/ PMID:26424081
Dawn: A Simulation Model for Evaluating Costs and Tradeoffs of Big Data Science Architectures
NASA Astrophysics Data System (ADS)
Cinquini, L.; Crichton, D. J.; Braverman, A. J.; Kyo, L.; Fuchs, T.; Turmon, M.
2014-12-01
In many scientific disciplines, scientists and data managers are bracing for an upcoming deluge of big data, which will increase the size of current data archives by a factor of 10-100. For example, the next Coupled Model Intercomparison Project (CMIP6) will generate a global archive of model output of approximately 10-20 petabytes. The next generation of NASA decadal Earth-observing instruments is expected to collect tens of gigabytes per day. In radio astronomy, the Square Kilometre Array (SKA) will collect data in the exabytes-per-day range, of which around 1.5 exabytes per year will be stored after reduction and processing. The effective and timely processing of these enormous data streams will require new data reduction and processing algorithms, new system architectures, and new techniques for evaluating computational uncertainty. Yet at present no general software tool or framework exists that allows system architects to model their expected data processing workflow. Such a tool would also determine the network, computational and storage resources needed to prepare their data for scientific analysis. To fill this gap, at NASA/JPL we have been developing a preliminary model named DAWN (Distributed Analytics, Workflows and Numerics). DAWN simulates arbitrarily complex workflows composed of any number of data processing and data movement tasks. The model can be configured with a representation of the problem at hand: the data volumes, the processing algorithms, and the available computing and network resources. It can then evaluate tradeoffs between different possible workflows based on several estimators, including overall elapsed time, separate computation and transfer times, and resulting uncertainty. So far, we have applied DAWN to analyze architectural solutions for four use cases from distinct science disciplines: climate science, astronomy, hydrology, and a generic cloud computing use case.
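DAWN's internals are not described in the abstract. The sketch below shows the kind of estimate such a model produces, under these assumptions:

- A workflow is an acyclic graph of compute tasks and transfer tasks.
- Each task's duration is size divided by rate.
- Independent tasks run with unlimited parallelism.

Overall elapsed time is then the longest dependency path, and compute and transfer totals can be reported separately. Task names, units and numbers are hypothetical, and uncertainty estimation is not modelled.

```python
from typing import Dict, List, Tuple

# name -> (kind, size, rate, dependencies). Hypothetical units: size in GB
# (or abstract work units), rate in GB/s (or units/s), so durations are seconds.
Task = Tuple[str, float, float, List[str]]

def estimate(workflow: Dict[str, Task]) -> Dict[str, float]:
    """Estimate elapsed time as the longest dependency path (assuming an acyclic
    graph and unlimited parallelism), plus total compute and transfer times."""
    finish: Dict[str, float] = {}

    def finish_time(name: str) -> float:
        if name not in finish:
            _kind, size, rate, deps = workflow[name]
            start = max((finish_time(d) for d in deps), default=0.0)
            finish[name] = start + size / rate
        return finish[name]

    totals = {"compute": 0.0, "transfer": 0.0}
    for kind, size, rate, _deps in workflow.values():
        totals[kind] += size / rate
    return {"elapsed": max(finish_time(n) for n in workflow), **totals}

workflow = {
    "download": ("transfer", 100.0, 1.0, []),
    "reduce": ("compute", 400.0, 4.0, ["download"]),
    "calibrate": ("compute", 50.0, 1.0, []),
    "archive": ("transfer", 20.0, 2.0, ["reduce", "calibrate"]),
}
print(estimate(workflow))
# {'elapsed': 210.0, 'compute': 150.0, 'transfer': 110.0}
```

You could compare such estimates for alternative graphs, for example reducing data before rather than after a transfer. That is the kind of tradeoff analysis the abstract describes.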
This talk will present preliminary results and discuss how DAWN can be evolved into a powerful tool for designing system architectures for data intensive science.
Transformation of OODT CAS to Perform Larger Tasks
NASA Technical Reports Server (NTRS)
Mattmann, Chris; Freeborn, Dana; Crichton, Daniel; Hughes, John; Ramirez, Paul; Hardman, Sean; Woollard, David; Kelly, Sean
2008-01-01
A computer program denoted OODT CAS has been transformed to enable performance of larger tasks that involve greatly increased data volumes and increasingly intensive processing of data on heterogeneous, geographically dispersed computers. Prior to the transformation, OODT CAS (also alternatively denoted, simply, 'CAS') [wherein 'OODT' signifies 'Object-Oriented Data Technology' and 'CAS' signifies 'Catalog and Archive Service'] was a proven software component used to manage scientific data from spaceflight missions. In the transformation, CAS was split into two separate components representing its canonical capabilities: file management and workflow management. In addition, CAS was augmented by addition of a resource-management component. This third component enables CAS to manage heterogeneous computing by use of diverse resources, including high-performance clusters of computers, commodity computing hardware, and grid computing infrastructures. CAS is now more easily maintainable, evolvable, and reusable. These components can be used separately or, taking advantage of synergies, can be used together. Other elements of the transformation included addition of a separate Web presentation layer that supports distribution of data products via Really Simple Syndication (RSS) feeds, and provision for full Resource Description Framework (RDF) exports of metadata.
CMS distributed data analysis with CRAB3
Mascheroni, M.; Balcas, J.; Belforte, S.; ...
2015-12-23
The CMS Remote Analysis Builder (CRAB) is a distributed workflow management tool which facilitates analysis tasks by isolating users from the technical details of the Grid infrastructure. Throughout LHC Run 1, CRAB has been successfully employed by an average of 350 distinct users each week executing about 200,000 jobs per day. CRAB has been significantly upgraded in order to face the new challenges posed by LHC Run 2. Components of the new system include 1) a lightweight client, 2) a central primary server which communicates with the clients through a REST interface, 3) secondary servers which manage user analysis tasks and submit jobs to the CMS resource provisioning system, and 4) a central service to asynchronously move user data from temporary storage in the execution site to the desired storage location. Furthermore, the new system improves the robustness, scalability and sustainability of the service. Here we provide an overview of the new system, operation, and user support, report on its current status, and identify lessons learned from the commissioning phase and production roll-out.
REMORA: a pilot in the ocean of BioMoby web-services.
Carrere, Sébastien; Gouzy, Jérôme
2006-04-01
Emerging web-services technology allows interoperability between multiple distributed architectures. Here, we present REMORA, a web server implemented according to the BioMoby web-service specifications. It provides life science researchers with an easy-to-use workflow generator and launcher, a repository of predefined workflows and a survey system. The REMORA web server is freely available at http://bioinfo.genopole-toulouse.prd.fr/remora; sources are available upon request from the authors. Contact: Jerome.Gouzy@toulouse.inra.fr
From the desktop to the grid: scalable bioinformatics via workflow conversion.
de la Garza, Luis; Veit, Johannes; Szolek, Andras; Röttig, Marc; Aiche, Stephan; Gesing, Sandra; Reinert, Knut; Kohlbacher, Oliver
2016-03-12
Reproducibility is one of the tenets of the scientific method. Scientific experiments often comprise complex data flows, selection of adequate parameters, and analysis and visualization of intermediate and end results. Such experiments can be broken down into the joint collaboration of small, repeatable, well-defined tasks, each with well-defined inputs, parameters, and outputs. This offers immediate benefits such as identifying bottlenecks and pinpointing sections that could benefit from parallelization. Workflows rest upon the notion of splitting complex work into the joint effort of several manageable tasks. Several engines give users the ability to design and execute workflows. Each engine was created to address certain problems of a specific community, so each has its advantages and shortcomings. Furthermore, not all features of all workflow engines are royalty-free, an aspect that could potentially drive away members of the scientific community. We have developed a set of tools that enables the scientific community to benefit from workflow interoperability. We developed a platform-free structured representation of the parameters, inputs, and outputs of command-line tools in so-called Common Tool Descriptor documents. We have also overcome the shortcomings and combined the features of two royalty-free workflow engines with a substantial user community. The first is the Konstanz Information Miner, an engine which we see as a formidable workflow editor. The second is the Grid and User Support Environment, a web-based framework able to interact with several high-performance computing resources. We have thus created a free and highly accessible way to design workflows on a desktop computer and execute them on high-performance computing resources. Our work will reduce the time spent designing scientific workflows. It will also make executing workflows on remote high-performance computing resources more accessible to technically inexperienced users.
We strongly believe that our efforts not only decrease the turnaround time to obtain scientific results but also have a positive impact on reproducibility, thus elevating the quality of obtained scientific results.
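A tool descriptor records a command-line tool's inputs, outputs and parameters as data. The sketch below illustrates this with a simplified stand-in that renders a command line. The real Common Tool Descriptor is an XML format whose schema is not reproduced here. The class, field names and the tool `my_tool` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class ToolDescriptor:
    """Hypothetical, simplified tool description (not the CTD XML schema)."""
    executable: str
    inputs: List[str]                                         # input-file slots
    outputs: List[str]                                        # output-file slots
    parameters: Dict[str, str] = field(default_factory=dict)  # name -> default

    def command_line(self, values: Dict[str, str]) -> List[str]:
        """Render an argument list; every file slot must be bound."""
        missing = [n for n in self.inputs + self.outputs if n not in values]
        if missing:
            raise ValueError(f"unbound file slots: {missing}")
        args = [self.executable]
        for name in self.inputs + self.outputs:
            args += [f"-{name}", values[name]]
        for name, default in self.parameters.items():
            args += [f"-{name}", values.get(name, default)]
        return args

tool = ToolDescriptor("my_tool", inputs=["in"], outputs=["out"],
                      parameters={"threshold": "0.5"})
print(tool.command_line({"in": "a.txt", "out": "b.txt"}))
# ['my_tool', '-in', 'a.txt', '-out', 'b.txt', '-threshold', '0.5']
```

Because the description is data rather than engine-specific code, different workflow engines could generate the same invocation from it. That is the interoperability idea behind a shared tool description.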
Drawert, Brian; Trogdon, Michael; Toor, Salman; Petzold, Linda; Hellander, Andreas
2017-01-01
Computational experiments using spatial stochastic simulations have led to important new biological insights, but they require specialized tools and a complex software stack, as well as large and scalable compute and data analysis resources due to the large computational cost associated with Monte Carlo computational workflows. The complexity of setting up and managing a large-scale distributed computation environment to support productive and reproducible modeling can be prohibitive for practitioners in systems biology. This results in a barrier to the adoption of spatial stochastic simulation tools, effectively limiting the type of biological questions addressed by quantitative modeling. In this paper, we present PyURDME, a new, user-friendly spatial modeling and simulation package, and MOLNs, a cloud computing appliance for distributed simulation of stochastic reaction-diffusion models. MOLNs is based on IPython and provides an interactive programming platform for development of sharable and reproducible distributed parallel computational experiments. PMID:28190948
Managing the life cycle of electronic clinical documents.
Payne, Thomas H; Graham, Gail
2006-01-01
To develop a model of the life cycle of clinical documents from inception to use in a person's medical record, including workflow requirements from clinical practice, local policy, and regulation. We propose a model for the life cycle of clinical documents as a framework for research on documentation within electronic medical record (EMR) systems. Our proposed model includes three axes: the stages of the document, the roles of those involved with the document, and the actions those involved may take on the document at each stage. The model includes rules that describe who (in what role) can perform what actions on the document, and at what stages they can perform them. Rules are derived from the needs of clinicians and from the requirements of hospital bylaws and regulators. Our model encompasses current practices for paper medical records and workflow in some EMR systems. Commercial EMR systems include methods for implementing document workflow rules. Workflow rules that are part of this model mirror functionality in the Department of Veterans Affairs (VA) EMR system, where the Authorization/Subscription Utility permits document life cycle rules to be written in an English-like fashion. Creating a model of the life cycle of clinical documents serves as a framework for discussion of document workflow, of how rules governing workflow can be implemented in EMR systems, and of future research on electronic documentation.
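The three-axis model described above (document stage, role, action) can be pictured as a set of permission triples plus a table of stage transitions. The sketch below is a hypothetical illustration only: it does not reproduce the VA Authorization/Subscription Utility or its English-like rule syntax, and the role, action, and stage names are invented.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    """One permission: a role may take an action on a document at a stage."""
    role: str
    action: str
    stage: str


# Hypothetical rule set; role, action, and stage names are illustrative only.
RULES = frozenset({
    Rule("author", "edit", "draft"),
    Rule("author", "sign", "draft"),
    Rule("attending", "cosign", "signed"),
    Rule("attending", "amend", "final"),
})

# Hypothetical stage transitions; actions not listed leave the stage unchanged.
TRANSITIONS = {
    ("draft", "sign"): "signed",
    ("signed", "cosign"): "final",
    ("final", "amend"): "amended",
}


def is_permitted(role, action, stage, rules=RULES):
    """Return True if some rule allows `role` to take `action` at `stage`."""
    return Rule(role, action, stage) in rules


def apply_action(role, action, stage):
    """Check the rules, then return the document's stage after the action."""
    if not is_permitted(role, action, stage):
        raise PermissionError(f"{role} may not {action} a {stage} document")
    return TRANSITIONS.get((stage, action), stage)


stage = apply_action("author", "sign", "draft")      # "signed"
stage = apply_action("attending", "cosign", stage)   # "final"
```

Keeping permissions (who may act) separate from transitions (what an action does to the document) mirrors the way the abstract separates roles and actions from stages.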
Development and Appraisal of Multiple Accounting Record System (Mars).
Yu, H C; Chen, M C
2016-01-01
The aim of the system is to simplify workflow, reduce recording time, and increase income for the study hospital. The project team decided to develop a multiple accounting record system that automatically generates account records from the nursing records, reduces the time and effort nurses spend reviewing procedures, and provides an additional record of material consumption. Three configuration files were identified to represent the relationship between treatments and reimbursement items. The workflow was simplified. The nurses reduced their daily recording time by an average of 10 minutes, and reimbursement points increased by 7.49%. The project streamlined the workflow and provides the institute with a better approach to financial management.
Ho, Jonhan; Aridor, Orly; Parwani, Anil V.
2012-01-01
Background: For decades anatomic pathology (AP) workflow has been a highly manual process based on the use of an optical microscope and glass slides. Recent innovations in scanning and digitizing of entire glass slides are accelerating a move toward widespread adoption and implementation of a workflow based on digital slides and their supporting information management software. To support the design of digital pathology systems and ensure their adoption into pathology practice, the needs of the main users within the AP workflow, the pathologists, should be identified. Contextual inquiry is a qualitative, user-centered, social method designed to identify and understand users’ needs and is utilized for collecting, interpreting, and aggregating detailed aspects of work. Objective: Contextual inquiry was utilized to document current AP workflow, identify processes that may benefit from the introduction of digital pathology systems, and establish design requirements for digital pathology systems that will meet pathologists’ needs. Materials and Methods: Pathologists were observed and interviewed at a large academic medical center according to contextual inquiry guidelines established by Holtzblatt et al. 1998. Notes representing user-provided data were documented during observation sessions. An affinity diagram, a hierarchical organization of the notes based on common themes in the data, was created. Five graphical models were developed to help visualize the data: sequence, flow, artifact, physical, and cultural models. Results: A total of six pathologists were observed by a team of two researchers. A total of 254 affinity notes were documented and organized using a system based on topical hierarchy, comprising 75 third-level, 24 second-level, and five main-level categories: technology, communication, synthesis/preparation, organization, and workflow. Current AP workflow was labor intensive and lacked scalability. 
A large number of processes that may possibly improve following the introduction of digital pathology systems were identified. These work processes included case management, case examination and review, and final case reporting. Furthermore, a digital slide system should integrate with the anatomic pathologic laboratory information system. Conclusions: To our knowledge, this is the first study that utilized the contextual inquiry method to document AP workflow. Findings were used to establish key requirements for the design of digital pathology systems. PMID:23243553
Reproducible Bioconductor workflows using browser-based interactive notebooks and containers.
Almugbel, Reem; Hung, Ling-Hong; Hu, Jiaming; Almutairy, Abeer; Ortogero, Nicole; Tamta, Yashaswi; Yeung, Ka Yee
2018-01-01
Bioinformatics publications typically include complex software workflows that are difficult to describe in a manuscript. We describe and demonstrate the use of interactive software notebooks to document and distribute bioinformatics research. We provide a user-friendly tool, BiocImageBuilder, that allows users to easily distribute their bioinformatics protocols through interactive notebooks uploaded to either a GitHub repository or a private server. We present four different interactive Jupyter notebooks using R and Bioconductor workflows to infer differential gene expression, analyze cross-platform datasets, process RNA-seq data and KinomeScan data. These interactive notebooks are available on GitHub. The analytical results can be viewed in a browser. Most importantly, the software contents can be executed and modified. This is accomplished using Binder, which runs the notebook inside software containers, thus avoiding the need to install any software and ensuring reproducibility. All the notebooks were produced using custom files generated by BiocImageBuilder. BiocImageBuilder facilitates the publication of workflows with a point-and-click user interface. We demonstrate that interactive notebooks can be used to disseminate a wide range of bioinformatics analyses. The use of software containers to mirror the original software environment ensures reproducibility of results. Parameters and code can be dynamically modified, allowing for robust verification of published results and encouraging rapid adoption of new methods. Given the increasing complexity of bioinformatics workflows, we anticipate that these interactive software notebooks will become as necessary for documenting software methods as traditional laboratory notebooks have been for documenting bench protocols, and as ubiquitous. © The Author 2017. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. 
78 FR 20087 - Privacy Act of 1974; Proposed New System of Records
Federal Register 2010, 2011, 2012, 2013, 2014
2013-04-03
... is comprised of two components--Enterprise Content Management (ECM) and the Account Management System (AMS). The heart of the system is the ECM component, which manages the workflows that were developed..., digital media, and/or CD-ROM. PAS is a customized module within USDA's Enterprise Content Management (ECM...
Jeremy S. Fried; Larry D. Potts; Sara M. Loreno; Glenn A. Christensen; R. Jamie Barbour
2017-01-01
The Forest Inventory and Analysis (FIA)-based BioSum (Bioregional Inventory Originated Simulation Under Management) is a free policy analysis framework and workflow management software solution. It addresses complex management questions concerning forest health and vulnerability for large, multimillion acre, multiowner landscapes using FIA plot data as the initial...
SAHM:VisTrails (Software for Assisted Habitat Modeling for VisTrails): training course
Holcombe, Tracy
2014-01-01
VisTrails is an open-source management and scientific workflow system designed to integrate the best of both scientific workflow and scientific visualization systems. Developers can extend the functionality of the VisTrails system by creating custom modules for bundled VisTrails packages. The Invasive Species Science Branch of the U.S. Geological Survey (USGS) Fort Collins Science Center (FORT) and the U.S. Department of the Interior’s North Central Climate Science Center have teamed up to develop and implement such a module—the Software for Assisted Habitat Modeling (SAHM). SAHM expedites habitat modeling and helps maintain a record of the various input data, the steps before and after processing, and the modeling options incorporated in the construction of an ecological response model. There are four main advantages to using the SAHM:VisTrails combined package for species distribution modeling: (1) formalization and tractable recording of the entire modeling process; (2) easier collaboration through a common modeling framework; (3) a user-friendly graphical interface to manage file input, model runs, and output; and (4) extensibility to incorporate future and additional modeling routines and tools. In order to meet increased interest in the SAHM:VisTrails package, the FORT offers a training course twice a year. The course includes a combination of lecture, hands-on work, and discussion. Please join us and other ecological modelers to learn the capabilities of the SAHM:VisTrails package.
Streamlining geospatial metadata in the Semantic Web
NASA Astrophysics Data System (ADS)
Fugazza, Cristiano; Pepe, Monica; Oggioni, Alessandro; Tagliolato, Paolo; Carrara, Paola
2016-04-01
In the geospatial realm, data annotation and discovery rely on a number of ad-hoc formats and protocols. These have been created to enable domain-specific use cases for which generalized search is not feasible. Metadata are at the heart of the discovery process, yet they are often neglected or encoded in formats that either are not aimed at efficient retrieval of resources or are plainly outdated. In particular, the quantum leap represented by the Linked Open Data (LOD) movement has not so far induced a consistent, interlinked baseline in the geospatial domain. In a nutshell, datasets, the scientific literature related to them, and ultimately the researchers behind these products are only loosely connected; the corresponding metadata are intelligible only to humans, duplicated on different systems, and seldom consistent. Instead, our workflow for metadata management envisages i) editing via customizable web-based forms, ii) encoding of records in any XML application profile, iii) translation into RDF (involving the semantic lift of metadata records), and finally iv) storage of the metadata as RDF and back-translation into the original XML format with added semantics-aware features. Phase iii) hinges on relating resource metadata to RDF data structures that represent keywords from code lists and controlled vocabularies, toponyms, researchers, institutes, and virtually any description one can retrieve (or directly publish) in the LOD Cloud. In the context of a distributed Spatial Data Infrastructure (SDI) built on free and open-source software, we detail phases iii) and iv) of our workflow for the semantics-aware management of geospatial metadata.
Eleven quick tips for architecting biomedical informatics workflows with cloud computing.
Cole, Brian S; Moore, Jason H
2018-03-01
Cloud computing has revolutionized the development and operations of hardware and software across diverse technological arenas, yet academic biomedical research has lagged behind despite the numerous and weighty advantages that cloud computing offers. Biomedical researchers who embrace cloud computing can reap rewards in cost reduction, decreased development and maintenance workload, increased reproducibility, ease of sharing data and software, enhanced security, horizontal and vertical scalability, high availability, a thriving technology partner ecosystem, and much more. Despite these advantages that cloud-based workflows offer, the majority of scientific software developed in academia does not utilize cloud computing and must be migrated to the cloud by the user. In this article, we present 11 quick tips for architecting biomedical informatics workflows on compute clouds, distilling knowledge gained from experience developing, operating, maintaining, and distributing software and virtualized appliances on the world's largest cloud. Researchers who follow these tips stand to benefit immediately by migrating their workflows to cloud computing and embracing the paradigm of abstraction.
Eleven quick tips for architecting biomedical informatics workflows with cloud computing
Moore, Jason H.
2018-01-01
Cloud computing has revolutionized the development and operations of hardware and software across diverse technological arenas, yet academic biomedical research has lagged behind despite the numerous and weighty advantages that cloud computing offers. Biomedical researchers who embrace cloud computing can reap rewards in cost reduction, decreased development and maintenance workload, increased reproducibility, ease of sharing data and software, enhanced security, horizontal and vertical scalability, high availability, a thriving technology partner ecosystem, and much more. Despite these advantages that cloud-based workflows offer, the majority of scientific software developed in academia does not utilize cloud computing and must be migrated to the cloud by the user. In this article, we present 11 quick tips for architecting biomedical informatics workflows on compute clouds, distilling knowledge gained from experience developing, operating, maintaining, and distributing software and virtualized appliances on the world’s largest cloud. Researchers who follow these tips stand to benefit immediately by migrating their workflows to cloud computing and embracing the paradigm of abstraction. PMID:29596416
A DICOM Based Collaborative Platform for Real-Time Medical Teleconsultation on Medical Images.
Maglogiannis, Ilias; Andrikos, Christos; Rassias, Georgios; Tsanakas, Panayiotis
2017-01-01
The paper deals with the design of a Web-based platform for real-time medical teleconsultation on medical images. The proposed platform combines the principles of heterogeneous Workflow Management Systems (WfMSs), the peer-to-peer networking architecture and the SPA (Single-Page Application) concept to facilitate collaboration among geographically distributed healthcare professionals. The presented work leverages state-of-the-art features of the web to support peer-to-peer communication using the WebRTC (Web Real Time Communication) protocol and client-side data processing for creating an integrated collaboration environment. The paper discusses the technical details of implementation and presents the operation of the platform in practice along with some initial results.
Apis - a Digital Inventory of Archaeological Heritage Based on Remote Sensing Data
NASA Astrophysics Data System (ADS)
Doneus, M.; Forwagner, U.; Liem, J.; Sevara, C.
2017-08-01
Heritage managers are in need of dynamic spatial inventories of archaeological and cultural heritage that provide them with multipurpose tools to interactively understand information about archaeological heritage within its landscape context. Specifically, linking site information with the respective non-invasive prospection data is of increasing importance as it allows for the assessment of inherent uncertainties related to the use and interpretation of remote sensing data by the educated and knowledgeable heritage manager. APIS, the archaeological prospection information system of the Aerial Archive of the University of Vienna, is specifically designed to meet these needs. It provides storage and easy access to all data concerning aerial photographs and archaeological sites through a single GIS-based application. Furthermore, APIS has been developed in an open-source environment, which allows it to be freely distributed and modified. This combination in a single open-source system facilitates an easy workflow for data management, interpretation, storage, and retrieval. APIS and a sample dataset will be released free of charge under a Creative Commons license in the near future.
NASA Astrophysics Data System (ADS)
Podger, G. M.; Cuddy, S. M.; Peeters, L.; Smith, T.; Bark, R. H.; Black, D. C.; Wallbrink, P.
2014-09-01
Water jurisdictions in Australia are required to prepare and implement water resource plans. In developing these plans the common goal is realising the best possible use of the water resources: maximising outcomes while minimising negative impacts. This requires managing the risks associated with assessing and balancing cultural, industrial, agricultural, social and environmental demands for water within a competitive and resource-limited environment. Recognising this, conformance to international risk management principles (ISO 31000:2009) has been embedded within the Murray-Darling Basin Plan. Yet, to date, there has been little strategic investment by water jurisdictions in bridging the gap between principle and practice. The ISO 31000 principles and the risk management framework that embodies them align well with an adaptive management paradigm within which to conduct water resource planning. They also provide an integrative framework for the development of workflows that link risk analysis with risk evaluation and mitigation (adaptation) scenarios, providing a transparent, repeatable and robust platform. This study, through a demonstration use case and a series of workflows, demonstrates to policy makers how these principles can be used to support the development of the next generation of water sharing plans in 2019. The workflows consider the uncertainty associated with climate and flow inputs, and model parameters on irrigation and hydropower production, meeting environmental flow objectives and recreational use of the water resource. The results provide insights into the risks associated with meeting a range of different objectives.
Using R in Taverna: RShell v1.2
Wassink, Ingo; Rauwerda, Han; Neerincx, Pieter BT; Vet, Paul E van der; Breit, Timo M; Leunissen, Jack AM; Nijholt, Anton
2009-01-01
Background R is the statistical language commonly used by many life scientists in (omics) data analysis. At the same time, these complex analyses benefit from a workflow approach, such as that used by the open source workflow management system Taverna. However, Taverna had limited support for R, because it supported just a few data types and only a single output. Also, there was no support for graphical output and persistent sessions. Altogether this made using R in Taverna impractical. Findings We have developed an R plugin for Taverna: RShell, which provides R functionality within workflows designed in Taverna. In order to fully support the R language, our RShell plugin directly uses the R interpreter. The RShell plugin consists of a Taverna processor for R scripts and an RShell Session Manager that communicates with the R server. We made the RShell processor highly configurable allowing the user to define multiple inputs and outputs. Also, various data types are supported, such as strings, numeric data and images. To limit data transport between multiple RShell processors, the RShell plugin also supports persistent sessions. Here, we will describe the architecture of RShell and the new features that are introduced in version 1.2, i.e.: i) Support for R up to and including R version 2.9; ii) Support for persistent sessions to limit data transfer; iii) Support for vector graphics output through PDF; iv) Syntax highlighting of the R code; v) Improved usability through fewer port types. Our new RShell processor is backwards compatible with workflows that use older versions of the RShell processor. We demonstrate the value of the RShell processor by a use-case workflow that maps oligonucleotide probes designed with DNA sequence information from Vega onto the Ensembl genome assembly. Conclusion Our RShell plugin enables Taverna users to employ R scripts within their workflows in a highly configurable way. PMID:19607662
de Carvalho, Elias Cesar Araujo; Batilana, Adelia Portero; Claudino, Wederson; Reis, Luiz Fernando Lima; Schmerling, Rafael A; Shah, Jatin; Pietrobon, Ricardo
2012-01-01
With the exponential expansion of clinical trials conducted in BRIC (Brazil, Russia, India, and China) and VISTA (Vietnam, Indonesia, South Africa, Turkey, and Argentina) countries, corresponding gains in cost and enrolment efficiency quickly outpace the comparable metrics in traditional countries in North America and the European Union. However, questions still remain regarding the quality of data being collected in these countries. We used ethnographic, mapping and computer simulation studies to identify/address areas of threat to near miss events for data quality in two cancer trial sites in Brazil. Two sites in São Paulo and Rio de Janeiro were evaluated using ethnographic observations of workflow during subject enrolment and data collection. Emerging themes related to threats to near miss events for data quality were derived from observations. They were then transformed into workflows using UML-AD and modeled using System Dynamics. A total of 139 tasks were observed and mapped through the ethnographic study. The UML-AD detected four major activities in the workflow: evaluation of potential research subjects prior to signature of informed consent, visit to obtain the subject's informed consent, regular data collection sessions following the study protocol, and closure of the study protocol for a given project. Field observations pointed to three major emerging themes: (a) lack of a standardized process for data registration at the source document, (b) multiplicity of data repositories and (c) scarcity of decision support systems at the point of research intervention. Simulation with the policy model demonstrates a reduction of the rework problem. Patterns of threats to data quality at the two sites were similar to the threats reported in the literature for American sites. The clinical trial site managers need to reorganize staff workflow by using information technology more efficiently, establish new standard procedures and manage professionals to reduce near miss events and save time/cost. 
Clinical trial sponsors should improve relevant support systems.
Araujo de Carvalho, Elias Cesar; Batilana, Adelia Portero; Claudino, Wederson; Lima Reis, Luiz Fernando; Schmerling, Rafael A.; Shah, Jatin; Pietrobon, Ricardo
2012-01-01
Background With the exponential expansion of clinical trials conducted in BRIC (Brazil, Russia, India, and China) and VISTA (Vietnam, Indonesia, South Africa, Turkey, and Argentina) countries, corresponding gains in cost and enrolment efficiency quickly outpace the comparable metrics in traditional countries in North America and the European Union. However, questions still remain regarding the quality of data being collected in these countries. We used ethnographic, mapping and computer simulation studies to identify/address areas of threat to near miss events for data quality in two cancer trial sites in Brazil. Methodology/Principal Findings Two sites in São Paulo and Rio de Janeiro were evaluated using ethnographic observations of workflow during subject enrolment and data collection. Emerging themes related to threats to near miss events for data quality were derived from observations. They were then transformed into workflows using UML-AD and modeled using System Dynamics. A total of 139 tasks were observed and mapped through the ethnographic study. The UML-AD detected four major activities in the workflow: evaluation of potential research subjects prior to signature of informed consent, visit to obtain the subject's informed consent, regular data collection sessions following the study protocol, and closure of the study protocol for a given project. Field observations pointed to three major emerging themes: (a) lack of a standardized process for data registration at the source document, (b) multiplicity of data repositories and (c) scarcity of decision support systems at the point of research intervention. Simulation with the policy model demonstrates a reduction of the rework problem. Conclusions/Significance Patterns of threats to data quality at the two sites were similar to the threats reported in the literature for American sites. 
The clinical trial site managers need to reorganize staff workflow by using information technology more efficiently, establish new standard procedures and manage professionals to reduce near miss events and save time/cost. Clinical trial sponsors should improve relevant support systems. PMID:22768105
Implementation of Epic Beaker Anatomic Pathology at an Academic Medical Center.
Blau, John Larry; Wilford, Joseph D; Dane, Susan K; Karandikar, Nitin J; Fuller, Emily S; Jacobsmeier, Debbie J; Jans, Melissa A; Horning, Elisabeth A; Krasowski, Matthew D; Ford, Bradley A; Becker, Kent R; Beranek, Jeanine M; Robinson, Robert A
2017-01-01
Beaker is a relatively new laboratory information system (LIS) offered by Epic Systems Corporation as part of its suite of health-care software and bundled with its electronic medical record, EpicCare. It is divided into two modules, Beaker anatomic pathology (Beaker AP) and Beaker Clinical Pathology. In this report, we describe our experience implementing Beaker AP version 2014 at an academic medical center with a go-live date of October 2015. This report covers preimplementation preparations and challenges beginning in September 2014, issues discovered soon after go-live in October 2015, and some post go-live optimizations using data from meetings, debriefings, and the project closure document. We share specific issues that we encountered during implementation, including difficulties with the proposed frozen section workflow, developing a shared specimen source dictionary, and implementation of the standard Beaker workflow in a large institution with trainees. We share specific strategies that we used to overcome these issues for a successful Beaker AP implementation. Several areas of the laboratory required adaptation of the default Beaker build parameters to meet the workflow needs of a busy academic medical center. In a few areas, our laboratory was unable to use the Beaker functionality to support our workflow, and we have continued to use paper or have altered our workflow. In spite of several difficulties that required creative solutions before go-live, the implementation has been successful based on satisfaction surveys completed by pathologists and others who use the software. However, optimization of Beaker workflows has continued to be an ongoing process after go-live to the present time. 
The Beaker AP LIS can be successfully implemented at an academic medical center but requires significant forethought, creative adaptation, and continued shared management of the ongoing product by institutional and departmental information technology staff as well as laboratory managers to meet the needs of the laboratory.
Singh, Dadabhai T; Trehan, Rahul; Schmidt, Bertil; Bretschneider, Timo
2008-01-01
Preparedness for a possible global pandemic caused by viruses such as the highly pathogenic influenza A subtype H5N1 has become a global priority. In particular, it is critical to monitor the appearance of any new emerging subtypes. Comparative phyloinformatics can be used to monitor, analyze, and possibly predict the evolution of viruses. However, in order to utilize the full functionality of available analysis packages for large-scale phyloinformatics studies, a team of computer scientists, biostatisticians and virologists is needed, a requirement which cannot be fulfilled in many cases. Furthermore, the time complexities of many of the algorithms involved lead to prohibitive runtimes on sequential computer platforms. This has so far hindered the use of comparative phyloinformatics as a commonly applied tool in this area. In this paper the graphically oriented workflow design system called Quascade and its efficient usage for comparative phyloinformatics are presented. In particular, we focus on how this task can be effectively performed in a distributed computing environment. As a proof of concept, the designed workflows are used for the phylogenetic analysis of neuraminidase of H5N1 isolates (micro level) and influenza viruses (macro level). The results of this paper are hence twofold. Firstly, this paper demonstrates the usefulness of a graphical user interface system to design and execute complex distributed workflows for large-scale phyloinformatics studies of virus genes. Secondly, the analysis of neuraminidase on different levels of complexity provides valuable insights into this virus's tendency for geographically based clustering in the phylogenetic tree and also shows the importance of glycan sites in its molecular evolution. The current study demonstrates the efficiency and utility of workflow systems providing a biologist-friendly approach to complex biological dataset analysis using high performance computing. 
In particular, the utility of the platform Quascade for deploying distributed and parallelized versions of a variety of computationally intensive phylogenetic algorithms has been shown. Secondly, the analysis of the utilized H5N1 neuraminidase datasets at macro and micro levels has clearly indicated a pattern of spatial clustering of the H5N1 viral isolates based on geographical distribution rather than temporal or host range based clustering.
Towards seamless workflows in agile data science
NASA Astrophysics Data System (ADS)
Klump, J. F.; Robertson, J.
2017-12-01
Agile workflows are a response to projects with requirements that may change over time. They prioritise rapid and flexible responses to change, preferring to adapt to changes in requirements rather than predict them before a project starts. This suits the needs of research very well because research is inherently agile in its methodology. The adoption of agile methods has made collaborative data analysis much easier in a research environment fragmented across institutional data stores, HPC, personal and lab computers and more recently cloud environments. Agile workflows use tools that share a common worldview: in an agile environment, there may be more than one valid version of data, code or environment in play at any given time. All of these versions need references and identifiers. For example, a team of developers following the git-flow conventions (github.com/nvie/gitflow) may have several active branches, one for each strand of development. These workflows allow rapid and parallel iteration while maintaining identifiers pointing to individual snapshots of data and code and allowing rapid switching between strands. In contrast, the current focus of versioning in research data management is geared towards managing data for reproducibility and long-term preservation of the record of science. While both are important goals in the persistent curation domain of the institutional research data infrastructure, current tools emphasise planning over adaptation and can introduce unwanted rigidity by insisting on a single valid version or point of truth. In the collaborative curation domain of a research project, things are more fluid. However, there is no equivalent to the "versioning iso-surface" of the git protocol for the management and versioning of research data. 
At CSIRO we are developing concepts and tools for the agile management of software code and research data for virtual research environments, based on our experiences of actual data analytics projects in the geosciences. We use code management that allows researchers to interact with the code through tools like Jupyter Notebooks while data are held in an object store. Our aim is an architecture allowing seamless integration of code development, data management, and data processing in virtual research environments.
Integration of EGA secure data access into Galaxy.
Hoogstrate, Youri; Zhang, Chao; Senf, Alexander; Bijlard, Jochem; Hiltemann, Saskia; van Enckevort, David; Repo, Susanna; Heringa, Jaap; Jenster, Guido; Fijneman, Remond J. A.; Boiten, Jan-Willem; Meijer, Gerrit A.; Stubbs, Andrew; Rambla, Jordi; Spalding, Dylan; Abeln, Sanne
2016-01-01
High-throughput molecular profiling techniques routinely generate vast amounts of data for translational medicine studies. Because these data are personally identifiable, secure access-controlled systems are needed to manage, store, transfer and distribute them. The European Genome-phenome Archive (EGA) was created to facilitate access to, and management of, the long-term archive of bio-molecular data. Each data provider is responsible for ensuring that a Data Access Committee is in place to grant access to data stored in the EGA. Data are also encrypted in transit during upload and download. ELIXIR, a European research infrastructure for life-science data, initiated a project (2016 Human Data Implementation Study) to understand and document the ELIXIR requirements for secure management of controlled-access data. As part of this project, a full ecosystem was designed to connect archived raw experimental molecular profiling data with interpreted data and the computational workflows, using the CTMM Translational Research IT (CTMM-TraIT) infrastructure http://www.ctmm-trait.nl as an example. Here we present the first outcome of this project: a framework that enables EGA data to be downloaded securely to a Galaxy server. Galaxy provides an intuitive user interface for molecular biologists and bioinformaticians to run and design data analysis workflows. More specifically, we developed a tool, ega_download_streamer, that downloads data securely from EGA into a Galaxy server, where they can then be processed further. This tool allows a user to run, within the browser, an entire analysis containing sensitive data from EGA and to make this analysis available to other researchers in a reproducible manner, as shown with a proof-of-concept study. The tool ega_download_streamer is available in the Galaxy tool shed: https://toolshed.g2.bx.psu.edu/view/yhoogstrate/ega_download_streamer. PMID:28232859
Brady, Anne-Marie; Byrne, Gobnait; Quirke, Mary Brigid; Lynch, Aine; Ennis, Shauna; Bhangu, Jaspreet; Prendergast, Meabh
2017-11-01
This study aimed to evaluate the nature and type of communication and workflow arrangements between nurses and doctors out-of-hours (OOH). Effective communication and workflow arrangements between nurses and doctors are essential to minimize risk in hospital settings, particularly in the out-of-hours period. Timely patient flow is a priority for all healthcare organizations, and the quality of communication and workflow arrangements influences patient safety. The study used a qualitative descriptive design, with data collected through focus groups and individual interviews, in a 500-bed tertiary referral acute hospital in Ireland. Participants were junior and senior non-consultant hospital doctors, staff nurses and nurse managers. Both nurses and doctors acknowledged the importance of good interdisciplinary communication and collaborative working in sustaining effective workflow and enabling a supportive working environment and patient safety. Indeed, issues of safety and missed care OOH were found to be due primarily to difficulties with communication and workflow. Medical workflow OOH often depends on cues and communication to and from nursing staff. However, communication systems, and in particular the bleep system, are considered central to communication between doctors and nurses OOH, yet they can contribute to workflow challenges and increased staff stress. Participants reported that routine work that should be completed during normal hours commonly spilled over into the OOH period, when resources were most limited, further compounding the risk to patient safety. Enhancing communication strategies between nurses and doctors has the potential to remove barriers to effective decision-making and patient flow. © The Author 2017. Published by Oxford University Press in association with the International Society for Quality in Health Care. All rights reserved.
Widening the adoption of workflows to include human and human-machine scientific processes
NASA Astrophysics Data System (ADS)
Salayandia, L.; Pinheiro da Silva, P.; Gates, A. Q.
2010-12-01
Scientific workflows capture knowledge in the form of technical recipes to access and manipulate data, helping scientists manage and reuse established expertise to conduct their work. Libraries of scientific workflows are being created in particular fields, e.g., bioinformatics. In those fields, combined with cyber-infrastructure environments that provide on-demand access to data and tools, they result in powerful workbenches for scientists. The focus in these fields, however, has been more on automating than on documenting scientific processes. As a result, technical barriers have impeded a wider adoption of scientific workflows by scientific communities that do not rely as heavily on cyber-infrastructure and computing environments. Semantic Abstract Workflows (SAWs) are introduced to widen the applicability of workflows as a tool to document scientific recipes or processes. SAWs aim to capture a scientist’s perspective on how she or he would collect, filter, curate, and manipulate data to create the artifacts that are relevant to her/his work. In contrast, scientific workflows describe the process from the point of view of how technical methods and tools are used to conduct the work. By focusing on a higher level of abstraction that is closer to a scientist’s understanding, SAWs effectively capture the controlled vocabularies that reflect a particular scientific community, as well as the types of datasets and methods used in a particular domain. From there, SAWs provide the flexibility to adapt to the different environments in which the recipes or processes are carried out. These environments range from manual fieldwork to highly technical cyber-infrastructure environments, such as those already supported by scientific workflows. Two cases, one from Environmental Science and another from Geophysics, are presented as illustrative examples.
Cai, Bin; Altman, Michael B; Garcia-Ramirez, Jose; LaBrash, Jason; Goddu, S Murty; Mutic, Sasa; Parikh, Parag J; Olsen, Jeffrey R; Saad, Nael; Zoberi, Jacqueline E
To develop a safe and robust workflow for yttrium-90 (Y-90) radioembolization procedures in a multidisciplinary team environment. A generalized Define-Measure-Analyze-Improve-Control (DMAIC)-based approach to process improvement was applied to a Y-90 radioembolization workflow. In the first DMAIC cycle, events in the Y-90 workflow were defined and analyzed. To improve the workflow, a web-based interactive electronic white board (EWB) system was adopted as the central communication platform and information-processing hub. The EWB-based Y-90 workflow then underwent a second DMAIC cycle. Over a period of 21 months, three misses that went undetected until treatment initiation were recorded out of 245 treatments. Root-cause analysis was performed to determine the cause of each incident and opportunities for improvement. The EWB-based Y-90 process was further improved through new rules defining reliable sources of information as inputs to the planning process, and through new checkpoints ensuring that this information was communicated correctly throughout the process flow. After two DMAIC-like cycles, the revised EWB-based Y-90 workflow recorded zero misses out of 153 patient treatments in 1 year. The DMAIC-based approach adopted here allowed the iterative development of a robust workflow that achieves an adaptable, event-minimizing planning process. It did so despite a complex setting that requires the participation of multiple teams for Y-90 microsphere therapy. Such a workflow, implemented using the EWB or a similar platform with a DMAIC-based process-improvement approach, could be extended to other treatment procedures, especially those requiring multidisciplinary management. Copyright © 2016 American Brachytherapy Society. Published by Elsevier Inc. All rights reserved.
BioMAJ: a flexible framework for databanks synchronization and processing.
Filangi, Olivier; Beausse, Yoann; Assi, Anthony; Legrand, Ludovic; Larré, Jean-Marc; Martin, Véronique; Collin, Olivier; Caron, Christophe; Leroy, Hugues; Allouche, David
2008-08-15
Large- and medium-scale computational molecular biology projects require accurate bioinformatics software and numerous heterogeneous biological databanks, which are distributed around the world. BioMAJ provides a flexible, robust, fully automated environment for managing such massive amounts of data. The Java application automates the data update cycle and supervises the locally mirrored data repository. We have developed workflows that handle some of the most commonly used bioinformatics databases. A set of scripts is also available for post-synchronization data treatment, such as indexing or format conversion (for NCBI BLAST, SRS, EMBOSS, GCG, etc.). BioMAJ can easily be extended with personal, home-made processing scripts. Source history can be kept via HTML reports containing statements of locally managed databanks. http://biomaj.genouest.org. BioMAJ is free, open-source software, available under the CeCILL version 2 license.
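The data update cycle described above can be sketched in a few lines: check the remote release, download only if it differs from the local mirror, run post-synchronization steps, and keep a log. This is a hypothetical model in Python. `Bank`, `update_cycle` and the stand-in download and indexing callables are illustrative names, not BioMAJ's Java classes or configuration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Bank:
    """A locally mirrored databank (hypothetical model, not BioMAJ's own classes)."""
    name: str
    local_release: Optional[str] = None
    # Post-synchronization steps, e.g. indexing or format conversion.
    post_process: List[Callable[[str, str], str]] = field(default_factory=list)


def update_cycle(bank: Bank, remote_release: str,
                 download: Callable[[str, str], str]) -> List[str]:
    """Run one update cycle and return a log of actions (in the spirit of per-bank reports)."""
    if remote_release == bank.local_release:
        return [f"{bank.name}: release {remote_release} already mirrored"]
    path = download(bank.name, remote_release)
    log = [f"{bank.name}: downloaded release {remote_release} to {path}"]
    for step in bank.post_process:
        log.append(step(bank.name, path))
    # Mark the release as current only after every post-processing step has run.
    bank.local_release = remote_release
    return log


if __name__ == "__main__":
    fake_download = lambda name, rel: f"/data/{name}/{rel}"   # stand-in for an FTP/HTTP fetch
    make_index = lambda name, path: f"{name}: indexed {path}"  # stand-in for an indexing tool
    bank = Bank("uniprot_sprot", post_process=[make_index])
    for line in update_cycle(bank, "2008_07", fake_download) + update_cycle(bank, "2008_07", fake_download):
        print(line)
```

Updating `local_release` last means a failed download or indexing step leaves the bank marked as out of date, so the next cycle retries it.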
Enriching the Web Processing Service
NASA Astrophysics Data System (ADS)
Wosniok, Christoph; Bensmann, Felix; Wössner, Roman; Kohlus, Jörn; Roosmann, Rainer; Heidmann, Carsten; Lehfeldt, Rainer
2014-05-01
The OGC Web Processing Service (WPS) provides a standard for implementing geospatial processes in service-oriented networks. Its current version, 1.0.0, defines the operations GetCapabilities, DescribeProcess and Execute, which can be used to offer custom processes based on single or multiple sub-processes. The GIS community has already developed a large range of ready-to-use, fine-grained, fundamental geospatial processes. However, modern use cases, or whole workflow processes, demand specifications for lifecycle management and service orchestration. Orchestrating smaller sub-processes is a step towards interoperability; comprehensive documentation using appropriate metadata is also required. Although different approaches have been tested, developing complex WPS applications still requires programming skills, knowledge of the software libraries in use and considerable integration effort. Our toolset RichWPS aims to provide a better overall experience through two major components. The RichWPS ModelBuilder enables the graphics-aided design of workflow processes based on existing local and distributed processes and geospatial services. Once tested by the RichWPS Server, a composition can be deployed on the RichWPS Server for production use. The ModelBuilder obtains the necessary processes and services from a directory service, the RichWPS semantic proxy. It manages the lifecycle and can visualize results and debugging information. One aim is to generate reproducible results; the workflow should be documented by metadata that can be integrated into Spatial Data Infrastructures. The RichWPS Server provides a set of interfaces to the ModelBuilder for, among other things, testing composed workflow sequences, estimating their performance and publishing them as common processes.
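The three WPS 1.0.0 operations named above can be invoked over HTTP GET using key-value-pair (KVP) encoding. The sketch below only builds request URLs. The endpoint `https://example.org/wps`, the process identifier `compareWaterLevels` and its inputs are hypothetical. WPS 1.0.0 also supports XML requests via POST, which are not shown here.

```python
from urllib.parse import urlencode

WPS_VERSION = "1.0.0"


def get_capabilities_url(endpoint: str) -> str:
    # GetCapabilities needs only the service and request parameters.
    return endpoint + "?" + urlencode({"service": "WPS", "request": "GetCapabilities"})


def describe_process_url(endpoint: str, identifiers: list) -> str:
    # Several process identifiers may be requested as a comma-separated list.
    return endpoint + "?" + urlencode({
        "service": "WPS", "version": WPS_VERSION,
        "request": "DescribeProcess", "identifier": ",".join(identifiers),
    })


def execute_url(endpoint: str, identifier: str, inputs: dict) -> str:
    # KVP-encoded literal inputs: name=value pairs separated by ';'
    # (urlencode percent-encodes the '=' and ';' inside the parameter value).
    data_inputs = ";".join(f"{name}={value}" for name, value in inputs.items())
    return endpoint + "?" + urlencode({
        "service": "WPS", "version": WPS_VERSION, "request": "Execute",
        "identifier": identifier, "DataInputs": data_inputs,
    })


if __name__ == "__main__":
    wps = "https://example.org/wps"  # hypothetical endpoint
    print(get_capabilities_url(wps))
    print(describe_process_url(wps, ["compareWaterLevels"]))
    print(execute_url(wps, "compareWaterLevels", {"station": "Helgoland", "year": 2013}))
```

A composed workflow, such as one deployed through a RichWPS-style server, is ultimately exposed to clients through these same three calls.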
The server is therefore oriented towards the upcoming WPS 2.0 standard and its ability to deploy and undeploy processes transactionally through a WPS-T interface. To handle the results of these processing workflows, a server-side extension enables the RichWPS Server and its clients to use WPS presentation directives (WPS-PD), a content-related enhancement of the standardized WPS schema. We identified essential requirements for the components of our toolset by applying two use cases. The first enables the simplified comparison of modeled and measured data, a common task in hydro-engineering for validating the accuracy of a model. An implementation of the workflow includes reading, harmonizing and comparing two datasets in NetCDF format. 2D water-level data from the German Bight can be chosen, presented and evaluated in a web client with interactive plots. The second use case is motivated by the Marine Strategy Directive (MSD) of the EU, which demands monitoring, action plans and an evaluation of the ecological situation in the marine environment. Information technologies aligned with those of INSPIRE should be used. One of the parameters monitored and evaluated for the MSD is the extent and quality of seagrass beds. With a view to other evaluation parameters, we decompose the complex process of seagrass evaluation into reusable process steps and implement those packages as configurable WPS processes.
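The first use case (reading, harmonizing and comparing modeled and measured water levels) can be sketched as follows. The sketch assumes both series have already been read from NetCDF into `{timestamp: value}` dictionaries. Harmonizing here means keeping shared timestamps and converting the measured unit. Bias and RMSE are chosen as illustrative comparison metrics; they are not necessarily the metrics RichWPS reports.

```python
import math


def compare_series(modeled: dict, measured: dict, measured_scale: float = 1.0) -> dict:
    """Harmonize two time series and report bias and RMSE of (modeled - measured).

    modeled / measured map a timestamp to a water level; measured_scale converts
    the measured unit into the modeled unit (e.g. 0.01 for cm -> m).
    """
    common = sorted(modeled.keys() & measured.keys())  # keep only shared timestamps
    if not common:
        raise ValueError("series have no timestamps in common")
    diffs = [modeled[t] - measured[t] * measured_scale for t in common]
    n = len(diffs)
    return {
        "n": n,
        "bias": sum(diffs) / n,
        "rmse": math.sqrt(sum(d * d for d in diffs) / n),
    }


if __name__ == "__main__":
    model_m = {0: 1.0, 1: 2.0, 2: 3.0}           # modeled water level in metres
    gauge_cm = {1: 190.0, 2: 310.0, 3: 400.0}    # measured water level in centimetres
    print(compare_series(model_m, gauge_cm, measured_scale=0.01))
```

Wrapped as a WPS process, each of the three steps (read, harmonize, compare) could be exposed separately and reused in other compositions.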
The Kiel data management infrastructure - arising from a generic data model
NASA Astrophysics Data System (ADS)
Fleischer, D.; Mehrtens, H.; Schirnick, C.; Springer, P.
2010-12-01
The Kiel Data Management Infrastructure (KDMI) started from a cooperation of three large-scale projects (SFB574, SFB754 and Cluster of Excellence The Future Ocean) and the Leibniz Institute of Marine Sciences (IFM-GEOMAR). The common strategy for project data management is for a single person to collect and transform data according to the requirements of the targeted data center(s). The intention of the KDMI cooperation is to avoid redundant and potentially incompatible data management efforts for scientists and data managers and to create a single sustainable infrastructure. The diversity of marine disciplines and the approximately 1000 scientists involved increased the complexity of the conceptual planning. KDMI key features focus on data provenance, which we consider to comprise the entire workflow from field sampling through lab work to data calculation and evaluation. Managing the data of each individual project participant in this way yields the data management for the entire project and ensures the reusability of (meta)data. Accordingly, scientists provide a workflow definition of the data creation procedures that produce their target variables. The central idea in the development of the KDMI presented here is the object-oriented programming concept of one object definition (the workflow) with any number of object instances (the data). Each definition is created through a graphical user interface and produces XML output, which is stored in a database using a generic data model. When a data instance is created, the KDMI translates the definition into web forms for the scientist. The generic data model then accepts all information input according to the given data provenance definition.
An important aspect of the implementation phase is the possibility of a gradual transition from daily measurement routines, which result in single spreadsheet files with well-known points of failure and limited reusability, to a central infrastructure as a single point of truth. The data provenance approach has the following positive side effects: (1) scientists themselves design the extent and timing of data and metadata prompts through their workflow definitions, while (2) the consistency and completeness (mandatory information) of metadata in the resulting XML document can be checked by XML validation. (3) Storing the entire data creation process (including raw data and processing steps) provides a multidimensional quality history accessible to all researchers, in addition to the commonly applied one-dimensional quality flag system. (4) The KDMI can be extended to other scientific disciplines by adding new workflows and domain-specific outputs with the assistance of the KDMI team. The KDMI is inspired by social networks, but instead of sharing private lives it is a platform for sharing daily scientific work, data and their provenance.
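The definition/instance idea above (one workflow definition, any number of data instances, XML output with mandatory fields) can be sketched in Python. All class, element and field names are hypothetical, not the KDMI's actual schema. The mandatory-information check is done in Python because the standard library has no XML Schema validator; the KDMI itself is described as using XML validation.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Field:
    name: str
    mandatory: bool = True


@dataclass(frozen=True)
class WorkflowDefinition:
    """One definition (the 'class') from which any number of data instances are made."""
    name: str
    fields: Tuple[Field, ...]

    def form_fields(self) -> List[str]:
        # What a generated web form would prompt the scientist for, in order.
        return [f.name for f in self.fields]

    def instantiate(self, values: Dict[str, str]) -> ET.Element:
        # Completeness check for mandatory information (stand-in for schema validation).
        missing = [f.name for f in self.fields if f.mandatory and not values.get(f.name)]
        if missing:
            raise ValueError(f"missing mandatory information: {missing}")
        root = ET.Element("dataInstance", workflow=self.name)
        for f in self.fields:
            if f.name in values:
                ET.SubElement(root, "entry", name=f.name).text = values[f.name]
        return root


if __name__ == "__main__":
    ctd = WorkflowDefinition("ctd_cast", (Field("station"), Field("depth_m"),
                                          Field("remarks", mandatory=False)))
    print(ctd.form_fields())
    print(ET.tostring(ctd.instantiate({"station": "AL12", "depth_m": "25"}), encoding="unicode"))
```

Every instance carries the name of the definition it came from, so the provenance of each value can be traced back to the workflow that produced it.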
Case Report: Activity Diagrams for Integrating Electronic Prescribing Tools into Clinical Workflow
Johnson, Kevin B.; FitzHenry, Fern
2006-01-01
To facilitate the future implementation of an electronic prescribing system, this case study modeled prescription management processes in various primary care settings. The Vanderbilt e-prescribing design team conducted initial interviews with clinic managers, physicians and nurses, and then represented the sequences of steps carried out to complete prescriptions in activity diagrams. The diagrams covered outpatient prescribing for patients during a clinic visit and between clinic visits. Practice size, practice setting, and practice specialty type influenced the prescribing processes used. The model developed may be useful to others engaged in building or tailoring an e-prescribing system to meet the specific workflows of various clinic settings. PMID:16622168
ATLAS Distributed Computing Experience and Performance During the LHC Run-2
NASA Astrophysics Data System (ADS)
Filipčič, A.;
2017-10-01
ATLAS Distributed Computing during LHC Run-1 was challenged by steadily increasing computing, storage and network requirements. In addition, the complexity of processing task workflows and their associated data management requirements led to a new paradigm in the ATLAS computing model for Run-2, accompanied by extensive evolution and redesign of the workflow and data management systems. The new systems were put into production at the end of 2014 and gained robustness and maturity during 2015 data taking. Three components form the core of the Run-2 ATLAS distributed computing engine: ProdSys2, the new request and task interface; JEDI, the dynamic job execution engine developed as an extension to PanDA; and Rucio, the new data management system. One of the big changes for Run-2 was the adoption of the Derivation Framework, which moves the chaotic, CPU- and data-intensive part of user analysis into centrally organized train production, delivering derived AOD datasets to user groups for final analysis. The effectiveness of the new model was demonstrated by delivering analysis datasets to users just one week after data taking, with the calibration loop, Tier-0 processing and train production steps completed promptly. The flexibility of the new system also makes it possible to execute part of the Tier-0 processing on the grid when Tier-0 resources experience a backlog during high data-taking periods. Rucio made possible the introduction of the data lifetime model, in which each dataset is assigned a finite lifetime (with extensions possible for frequently accessed data). Thanks to this, the storage crises experienced in Run-1 have not reappeared during Run-2. In addition, the distinction between Tier-1 and Tier-2 disk storage, now largely artificial given the quality of Tier-2 resources and their networking, has been removed through the introduction of dynamic ATLAS clouds, each grouping a storage-endpoint nucleus with its nearby execution satellite sites.
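The data lifetime model described above (a finite lifetime per dataset, extended when the data are accessed) can be sketched as a small expiry policy. This is a hypothetical illustration. The names and the rule "an access keeps the dataset alive for at least one more extension period" are assumptions, and none of it is Rucio's API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List


@dataclass
class Dataset:
    name: str
    expires_at: datetime


def register(name: str, now: datetime, lifetime: timedelta) -> Dataset:
    """Every new dataset gets a finite lifetime."""
    return Dataset(name, now + lifetime)


def record_access(ds: Dataset, now: datetime, extension: timedelta) -> None:
    """An access keeps the dataset alive for at least `extension` more time."""
    ds.expires_at = max(ds.expires_at, now + extension)


def expired(datasets: Iterable[Dataset], now: datetime) -> List[str]:
    """Names of datasets whose lifetime has run out and are eligible for deletion."""
    return [ds.name for ds in datasets if ds.expires_at <= now]


if __name__ == "__main__":
    t0 = datetime(2015, 6, 1)
    cold = register("data15.cold", t0, timedelta(days=30))
    hot = register("data15.hot", t0, timedelta(days=30))
    record_access(hot, t0 + timedelta(days=25), timedelta(days=30))  # popular data
    print(expired([cold, hot], t0 + timedelta(days=31)))
```

Because unused data expire by default, freeing disk space becomes a routine sweep rather than an emergency clean-up.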
All stable ATLAS sites are now able to store unique or primary copies of the datasets. ATLAS Distributed Computing is further evolving to speed up request processing by introducing network awareness, using machine learning and optimisation of the latencies during the execution of the full chain of tasks. The Event Service, a new workflow and job execution engine, is designed around check-pointing at the level of event processing to use opportunistic resources more efficiently. ATLAS has been extensively exploring possibilities of using computing resources extending beyond conventional grid sites in the WLCG fabric to deliver as many computing cycles as possible and thereby enhance the significance of the Monte-Carlo samples to deliver better physics results. The exploitation of opportunistic resources was at an early stage throughout 2015, at the level of 10% of the total ATLAS computing power, but in the next few years it is expected to deliver much more. In addition, demonstrating the ability to use an opportunistic resource can lead to securing ATLAS allocations on the facility, hence the importance of this work goes beyond merely the initial CPU cycles gained. In this paper, we give an overview and compare the performance, development effort, flexibility and robustness of the various approaches.
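Check-pointing at the level of event processing, as described for the Event Service above, means a preempted opportunistic job loses at most the event in flight. The sketch below is a generic illustration under stated assumptions: a JSON file of completed event indices serves as the checkpoint, and `budget` simulates a slot that is reclaimed after N events. The real Event Service design is not shown here.

```python
import json
from pathlib import Path
from typing import Callable, Optional, Sequence


def process_events(events: Sequence, checkpoint: Path, handle: Callable,
                   budget: Optional[int] = None) -> int:
    """Process events, checkpointing after each one so a preempted job can resume.

    budget simulates an opportunistic slot that is reclaimed after N events.
    Returns the number of events processed in this call.
    """
    done = set(json.loads(checkpoint.read_text())) if checkpoint.exists() else set()
    processed = 0
    for index, event in enumerate(events):
        if index in done:
            continue  # finished by an earlier, preempted attempt
        if budget is not None and processed >= budget:
            break  # slot reclaimed: stop without losing completed work
        handle(event)
        done.add(index)
        tmp = checkpoint.with_name(checkpoint.name + ".tmp")
        tmp.write_text(json.dumps(sorted(done)))
        tmp.replace(checkpoint)  # rename over the old file so it is never half-written
        processed += 1
    return processed
```

Writing the checkpoint after every event is the simplest policy. A production system would more likely batch small event ranges and upload outputs to remote storage.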
A software tool to analyze clinical workflows from direct observations.
Schweitzer, Marco; Lasierra, Nelia; Hoerbst, Alexander
2015-01-01
Observational data of clinical processes need to be managed in a convenient way, so that process information is reliable, valid and viable for further analysis. However, existing tools for recording observations do not support the systematic collection of specific workflow recordings. We present a software tool that was developed to facilitate the analysis of clinical process observations. The tool was successfully used in the OntoHealth project to build, store and analyze observations of routine diabetes consultations.
Boes, Peter; Ho, Meng Wei; Li, Zuofeng
2015-01-01
Image‐guided radiotherapy (IGRT), based on radiopaque markers placed in the prostate gland, was used for proton therapy of prostate patients. Orthogonal X‐rays and the IBA Digital Image Positioning System (DIPS) were used for setup correction prior to treatment and were repeated after treatment delivery. Following a rationale for margin estimates similar to that of van Herk,(1) the daily post‐treatment DIPS data were analyzed to determine whether an adaptive radiotherapy plan was necessary. A Web application using ASP.NET MVC5, Entity Framework, and an SQL database was designed to automate this process. The designed features included state‐of‐the‐art Web technologies, a domain model closely matching the workflow, a database supporting concurrency and data mining, access to the DIPS database, secured user access and role management, and graphing and analysis tools. The Model‐View‐Controller (MVC) paradigm allowed clean domain logic, unit testing, and extensibility. Client‐side technologies, such as jQuery, jQuery Plug‐ins, and Ajax, were adopted to achieve a rich user environment and fast response. Data models included patients, staff, treatment fields and records, correction vectors, DIPS images, and association logic. Data entry, analysis, workflow logic, and notifications were implemented. The system effectively modeled the clinical workflow and IGRT process. PACS number: 87 PMID:26103504
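The margin rationale cited above can be made concrete with the widely used van Herk recipe, M = 2.5Σ + 0.7σ. Here Σ is the systematic error (the standard deviation of the per-patient mean setup errors) and σ is the random error (the root mean square of the per-patient standard deviations). This is the standard published recipe, not necessarily the exact rule the authors' application implements. The input layout (per-patient daily displacements along one axis, in mm) is an assumption.

```python
import math
import statistics
from typing import Dict, List


def van_herk_margin(errors_by_patient: Dict[str, List[float]]) -> float:
    """CTV-to-PTV margin along one axis: M = 2.5*Sigma + 0.7*sigma.

    errors_by_patient maps a patient id to that patient's daily setup errors
    (e.g. post-treatment marker displacements in mm) along a single axis.
    Needs at least two patients, each with at least two fractions.
    """
    means = [statistics.fmean(e) for e in errors_by_patient.values()]
    sds = [statistics.stdev(e) for e in errors_by_patient.values()]
    systematic = statistics.stdev(means)                        # Sigma: SD of patient means
    random_ = math.sqrt(statistics.fmean([s * s for s in sds]))  # sigma: RMS of patient SDs
    return 2.5 * systematic + 0.7 * random_


if __name__ == "__main__":
    # Hypothetical left-right displacements (mm) for three patients.
    lr = {"p1": [1.2, 0.8, 1.5, 0.9], "p2": [-0.4, 0.1, -0.2, 0.0], "p3": [0.6, 1.1, 0.3, 0.7]}
    print(f"LR margin: {van_herk_margin(lr):.2f} mm")
```

The computation is repeated per axis. Comparing the resulting margin with the planned margin is one way to flag when adaptive replanning may be needed.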
REEF: Retainable Evaluator Execution Framework
Weimer, Markus; Chen, Yingda; Chun, Byung-Gon; Condie, Tyson; Curino, Carlo; Douglas, Chris; Lee, Yunseong; Majestro, Tony; Malkhi, Dahlia; Matusevych, Sergiy; Myers, Brandon; Narayanamurthy, Shravan; Ramakrishnan, Raghu; Rao, Sriram; Sears, Russell; Sezgin, Beysim; Wang, Julia
2015-01-01
Resource Managers like Apache YARN have emerged as a critical layer in the cloud computing system stack, but the developer abstractions for leasing cluster resources and instantiating application logic are very low-level. This flexibility comes at a high cost in terms of developer effort, as each application must repeatedly tackle the same challenges (e.g., fault-tolerance, task scheduling and coordination) and re-implement common mechanisms (e.g., caching, bulk-data transfers). This paper presents REEF, a development framework that provides a control-plane for scheduling and coordinating task-level (data-plane) work on cluster resources obtained from a Resource Manager. REEF provides mechanisms that facilitate resource re-use for data caching, and state management abstractions that greatly ease the development of elastic data processing workflows on cloud platforms that support a Resource Manager service. REEF is being used to develop several commercial offerings such as the Azure Stream Analytics service. Furthermore, we demonstrate REEF development of a distributed shell application, a machine learning algorithm, and a port of the CORFU [4] system. REEF is also currently an Apache Incubator project that has attracted contributors from several institutions. PMID:26819493
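The split between a control plane that leases and retains resources and data-plane tasks that reuse cached state can be sketched as follows. The names `Driver`, `Evaluator` and `submit` borrow REEF's vocabulary, but this is a hypothetical single-process Python illustration, not REEF's Java API. The `lease` callable stands in for a Resource Manager request.

```python
from typing import Callable, Dict, List


class Evaluator:
    """A leased container; its cache survives across the tasks it runs."""

    def __init__(self, evaluator_id: int) -> None:
        self.evaluator_id = evaluator_id
        self.cache: Dict[str, object] = {}


class Driver:
    """Control plane: leases evaluators, then schedules tasks onto retained ones."""

    def __init__(self, lease: Callable[[int], Evaluator]) -> None:
        self._lease = lease              # stand-in for a Resource Manager request
        self._idle: List[Evaluator] = []
        self.leased = 0

    def _acquire(self) -> Evaluator:
        if self._idle:
            return self._idle.pop()      # reuse a retained evaluator (warm cache)
        self.leased += 1
        return self._lease(self.leased)

    def submit(self, task: Callable[[Evaluator], object]) -> object:
        evaluator = self._acquire()
        try:
            return task(evaluator)       # data-plane work
        finally:
            self._idle.append(evaluator)  # retain instead of releasing to the RM


if __name__ == "__main__":
    def train_step(ev: Evaluator) -> int:
        if "data" not in ev.cache:      # load once, reuse on later iterations
            ev.cache["data"] = list(range(1000))
        return sum(ev.cache["data"])

    driver = Driver(Evaluator)
    print([driver.submit(train_step) for _ in range(3)], "leased:", driver.leased)
```

Retaining the evaluator between iterations is what lets an iterative algorithm skip reloading its input, which is the kind of re-use for data caching the abstract attributes to REEF.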
Using conceptual work products of health care to design health IT.
Berry, Andrew B L; Butler, Keith A; Harrington, Craig; Braxton, Melissa O; Walker, Amy J; Pete, Nikki; Johnson, Trevor; Oberle, Mark W; Haselkorn, Jodie; Paul Nichol, W; Haselkorn, Mark
2016-02-01
This paper introduces a new, model-based design method for interactive health information technology (IT) systems. This method extends workflow models with models of conceptual work products. When the health care work being modeled is substantially cognitive, tacit, and complex in nature, graphical workflow models can become too complex to be useful to designers. Conceptual models complement and simplify workflows by providing an explicit specification for the information product they must produce. We illustrate how conceptual work products can be modeled using a standard software modeling language, which allows them to provide fundamental requirements for what the workflow must accomplish and the information that a new system should provide. Developers can use these specifications to envision how health IT could enable an effective cognitive strategy as a workflow with precise information requirements. We illustrate the new method with a study conducted in an outpatient multiple sclerosis (MS) clinic. This study shows specifically how the different phases of the method can be carried out, how the method allows for iteration across phases, and how the method generated a health IT design for case management of MS that is efficient and easy to use. Copyright © 2015 Elsevier Inc. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chao, Tian-Jy; Kim, Younghun
In one aspect, end-to-end interoperability and workflows, from building architecture design to one or more simulations, may comprise establishing a BIM enablement platform architecture. A data model defines data entities and entity relationships for enabling the interoperability and workflows. A data definition language may be implemented that defines and creates a table schema of a database associated with the data model. Data management services and/or application programming interfaces may be implemented for interacting with the data model. Web services may also be provided for interacting with the data model via the Web. A user interface may be implemented that communicates with users and uses the BIM enablement platform architecture, the data model, the data definition language, data management services and application programming interfaces to provide functions to the users to perform work related to building information management.
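The layering described above can be illustrated with SQLite from Python's standard library. A data model of entities and relationships is turned into a table schema by DDL, and small data-management functions mediate access to it. The three entities (`building`, `space`, `simulation`) and all function names are hypothetical. The actual BIM enablement platform's data model is not specified here.

```python
import sqlite3
from typing import Iterable, Tuple

# DDL for a hypothetical, minimal data model:
# a building has spaces and can be the subject of simulation runs.
DDL = """
CREATE TABLE building (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE space (
    id          INTEGER PRIMARY KEY,
    building_id INTEGER NOT NULL REFERENCES building(id),
    name        TEXT NOT NULL,
    area_m2     REAL
);
CREATE TABLE simulation (
    id          INTEGER PRIMARY KEY,
    building_id INTEGER NOT NULL REFERENCES building(id),
    kind        TEXT NOT NULL,
    status      TEXT NOT NULL DEFAULT 'pending'
);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    conn.execute("PRAGMA foreign_keys = ON")  # enforce the entity relationships
    conn.executescript(DDL)


def add_building(conn: sqlite3.Connection, name: str,
                 spaces: Iterable[Tuple[str, float]]) -> int:
    """Data-management service: store a building with its spaces in one transaction."""
    with conn:
        building_id = conn.execute("INSERT INTO building (name) VALUES (?)", (name,)).lastrowid
        conn.executemany(
            "INSERT INTO space (building_id, name, area_m2) VALUES (?, ?, ?)",
            [(building_id, space, area) for space, area in spaces],
        )
    return building_id


def request_simulation(conn: sqlite3.Connection, building_id: int, kind: str) -> int:
    """Queue a simulation run; fails if the building does not exist."""
    with conn:
        return conn.execute(
            "INSERT INTO simulation (building_id, kind) VALUES (?, ?)", (building_id, kind)
        ).lastrowid


def total_area(conn: sqlite3.Connection, building_id: int) -> float:
    row = conn.execute(
        "SELECT COALESCE(SUM(area_m2), 0) FROM space WHERE building_id = ?", (building_id,)
    ).fetchone()
    return row[0]
```

A web-service layer would expose functions like these over HTTP, so that design tools and simulation engines can share one data model instead of exchanging files.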
NASA Astrophysics Data System (ADS)
Maesano, Francesco E.; D'Ambrogi, Chiara
2017-02-01
We present Vel-IO 3D, a tool for 3D velocity model creation and time-depth conversion, as part of a workflow for 3D model building. The workflow addresses the management of large subsurface datasets, mainly seismic lines and well logs. It also covers the construction of a 3D velocity model able to describe velocity variations related to strong facies and thickness variability and to high structural complexity. Although it is applicable in many geological contexts (e.g. foreland basins, large intermountain basins), it is particularly suitable for wide flat regions, where subsurface structures have no surface expression. The Vel-IO 3D tool is composed of three scripts, written in Python 2.7.11, that automate i) the 3D instantaneous velocity model building, ii) the velocity model optimization, and iii) the time-depth conversion. They determine a 3D geological model that is consistent with the primary geological constraints (e.g. depth of the markers on wells). The proposed workflow and the Vel-IO 3D tool were tested during the EU-funded Project GeoMol by building the 3D geological model of a flat region, 5700 km² in area, located in the central part of the Po Plain. The final 3D model showed the efficiency of the workflow and the Vel-IO 3D tool in managing large amounts of data in both the time and the depth domain. A four-layer layer-cake velocity model was applied to a succession several thousand metres thick (5000-13,000 m), with 15 horizons from the Triassic up to the Pleistocene. The succession is complicated by Mesozoic extensional tectonics and by buried thrusts related to the Southern Alps and the Northern Apennines.
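A layer-cake time-depth conversion with an instantaneous velocity function can be sketched as follows. The sketch assumes the common linear law V(z) = V0 + k·z, with z measured from the datum and k in s⁻¹, and one (V0, k) pair per layer. Solving dz/dt = V0 + k·z over one-way time Δt gives z_base = z_top + (z_top + V0/k)(e^{kΔt} − 1). This is an illustration in Python 3, not the Vel-IO 3D scripts, and it omits their velocity-optimization step.

```python
import math
from typing import Sequence, Tuple


def _depth_step(z_top: float, v0: float, k: float, dt_oneway: float) -> float:
    """Depth after dt_oneway seconds of one-way time in a layer with V(z) = v0 + k*z."""
    if abs(k) < 1e-12:
        return z_top + v0 * dt_oneway  # constant-velocity limit
    # dz/dt = v0 + k*z  =>  (z + v0/k) grows by a factor exp(k*dt).
    return z_top + (z_top + v0 / k) * math.expm1(k * dt_oneway)


def twt_to_depth(twt: float, tops_twt: Sequence[float],
                 layers: Sequence[Tuple[float, float]]) -> float:
    """Convert two-way time (s) to depth (m) through a layer cake.

    tops_twt[i] is the two-way time of the top of layer i (tops_twt[0] = 0 at the datum);
    layers[i] = (v0, k) for that layer; the last layer extends indefinitely.
    """
    z = 0.0
    for i, (v0, k) in enumerate(layers):
        t_top = tops_twt[i]
        t_base = tops_twt[i + 1] if i + 1 < len(tops_twt) else math.inf
        if twt <= t_base:
            return _depth_step(z, v0, k, (twt - t_top) / 2)  # halve TWT for one-way time
        z = _depth_step(z, v0, k, (t_base - t_top) / 2)
    raise ValueError("twt lies below the last defined layer")


if __name__ == "__main__":
    # Hypothetical two-layer cake: tops at 0.0 s and 0.8 s TWT.
    cake = [(1800.0, 0.6), (2600.0, 0.3)]
    for t in (0.5, 1.0, 2.0):
        print(t, "s ->", round(twt_to_depth(t, [0.0, 0.8], cake), 1), "m")
```

Calibrating V0 and k per layer against well markers is what a velocity-optimization step would add on top of this forward conversion.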
Building a Trustworthy Environmental Science Data Repository: Lessons Learned from the ORNL DAAC
NASA Astrophysics Data System (ADS)
Wei, Y.; Santhana Vannan, S. K.; Boyer, A.; Beaty, T.; Deb, D.; Hook, L.
2017-12-01
The Oak Ridge National Laboratory Distributed Active Archive Center (ORNL DAAC, https://daac.ornl.gov) for biogeochemical dynamics is one of NASA's Earth Observing System Data and Information System (EOSDIS) data centers. The mission of the ORNL DAAC is to assemble, distribute, and provide data services for a comprehensive archive of terrestrial biogeochemistry and ecological dynamics observations and models to facilitate research, education, and decision-making in support of NASA's Earth Science. Since its establishment in 1994, ORNL DAAC has continuously built itself into a trustworthy environmental science data repository. It has done so both by ensuring the quality and usability of its data holdings and by optimizing its data publication and management process. This paper describes the lessons learned from ORNL DAAC's effort toward this goal. ORNL DAAC has been proactively implementing international community standards throughout its data management life cycle, including data publication, preservation, discovery, visualization, and distribution. Data files in standard formats, detailed documentation, and metadata following standard models are prepared to improve the usability and longevity of data products. Assignment of a Digital Object Identifier (DOI) ensures the identifiability and accessibility of every data product, including the different versions and revisions in its life cycle. ORNL DAAC's data citation policy ensures that data producers receive appropriate recognition for the use of their products. Web service standards, such as OpenSearch and those of the Open Geospatial Consortium (OGC), promote the discovery, visualization, distribution, and integration of ORNL DAAC's data holdings. Recently, ORNL DAAC began efforts to optimize and standardize its data archival and data publication workflows, to improve the efficiency and transparency of its data archival and management processes.
Task-technology fit of video telehealth for nurses in an outpatient clinic setting.
Cady, Rhonda G; Finkelstein, Stanley M
2014-07-01
Incorporating telehealth into outpatient care delivery supports management of consumer health between clinic visits. Task-technology fit is a framework for understanding how technology helps and/or hinders a person during work processes. Evaluating the task-technology fit of video telehealth for personnel working in a pediatric outpatient clinic and providing care between clinic visits ensures the information provided matches the information needed to support work processes. The workflow of advanced practice registered nurse (APRN) care coordination provided via telephone and video telehealth was described and measured using a mixed-methods workflow analysis protocol that incorporated cognitive ethnography and time-motion study. Qualitative and quantitative results were merged and analyzed within the task-technology fit framework to determine the workflow fit of video telehealth for APRN care coordination. Incorporating video telehealth into APRN care coordination workflow provided visual information unavailable during telephone interactions. Despite additional tasks and interactions needed to obtain the visual information, APRN workflow efficiency, as measured by time, was not significantly changed. Analyzed within the task-technology fit framework, the increased visual information afforded by video telehealth supported the assessment and diagnostic information needs of the APRN. Telehealth must provide the right information to the right clinician at the right time. Evaluating task-technology fit using a mixed-methods protocol ensured rigorous analysis of fit within work processes and identified workflows that benefit most from the technology.
Adventures in Private Cloud: Balancing Cost and Capability at the CloudSat Data Processing Center
NASA Astrophysics Data System (ADS)
Partain, P.; Finley, S.; Fluke, J.; Haynes, J. M.; Cronk, H. Q.; Miller, S. D.
2016-12-01
Since the beginning of the CloudSat Mission in 2006, the CloudSat Data Processing Center (DPC) at the Cooperative Institute for Research in the Atmosphere (CIRA) has been ingesting data from the satellite and other A-Train sensors, producing data products, and distributing them to researchers around the world. The computing infrastructure was specifically designed to fulfill the requirements specified at the beginning of what was nominally a two-year mission. The environment consisted of servers dedicated to specific processing tasks in a rigid workflow to generate the required products. To the benefit of science and with credit to the mission engineers, CloudSat has lasted well beyond its planned lifetime and is still collecting data ten years later. Over that period, the requirements of the data processing system have greatly expanded, and opportunities for providing value-added services have presented themselves. But while demands on the system have increased, the initial design allowed for very little expansion in terms of scalability and flexibility. The design did change to include virtual machine processing nodes and distributed workflows, but infrastructure management remained a time-consuming task whenever the system had to be modified to run new tests or implement new processes. To address the scalability, flexibility, and manageability of the system, cloud computing methods and technologies are now being employed. The use of a public cloud such as Amazon Elastic Compute Cloud or Google Compute Engine was considered, but, among other issues, data transfer and storage costs become a problem, especially when demand fluctuates as a result of reprocessing and the introduction of new products and services. Instead, the existing system was converted to an on-premises private cloud using the OpenStack computing platform and Ceph software-defined storage to reap the benefits of the cloud computing paradigm.
This work details the decisions that were made, the benefits that have been realized, the difficulties that were encountered and issues that still exist.
DOE Office of Scientific and Technical Information (OSTI.GOV)
JENNINGS, T.L.
The Work Flow analysis Report will be used to facilitate the requirements for implementing the Work Control module of Passport. The report consists of workflow integration processes for Work Management, Preventative Maintenance, Materials and Equipment
HBIM Methodology as a Bridge Between Italy and Argentina
NASA Astrophysics Data System (ADS)
Moreira, A.; Quattrini, R.; Maggiolo, G.; Mammoli, R.
2018-05-01
The availability of efficient HBIM workflows could represent a very important change towards more efficient management of historical real estate. The present work shows how to obtain accurate and reliable information about heritage buildings through reality capture and 3D modelling to support restoration purposes or knowledge-based applications. Two case studies metaphorically join Italy with Argentina. The research article explains the workflows applied at the Palazzo Ferretti in Ancona and the Manzana Histórica de la Universidad Nacional del Litoral, providing a constructive comparison and blending technological and theoretical approaches. In a bottom-up process, the assessment of the two case studies validates a workflow that achieves useful and proper data enrichment of each HBIM model. Another key aspect is the Level of Development (LOD) evaluation of both models: different ranges and scales are defined in America (100-500) and in Italy (A-G); nevertheless, it is possible to obtain standard shared procedures, facilitating HBIM development and diffusion in operating workflows.
Camporese, Alessandro
2004-06-01
The diagnosis of infectious diseases and the role of the microbiology laboratory are currently undergoing a process of change. The need for overall efficiency in providing results is now given the same importance as accuracy. This means that laboratories must be able to produce quality results in less time, with the capacity to interpret the results clinically. To improve the clinical impact of microbiology results, the new challenge facing the microbiologist has become one of process management instead of pure analysis. A proper project management process, designed to improve workflow, reduce analytical time, and provide the same high-quality results without losing valuable time in treating the patient, has become essential. Our objective was to study the impact of introducing automation and computerization into the microbiology laboratory, and of reorganizing the laboratory workflow, i.e. scheduling personnel to work shifts covering both the entire day and the entire week. In our laboratory, the introduction of automation and computerization, together with the reorganization of personnel, and thus of the workflow itself, has resulted in an improvement in response time and greater efficiency in diagnostic procedures.
KNIME4NGS: a comprehensive toolbox for next generation sequencing analysis.
Hastreiter, Maximilian; Jeske, Tim; Hoser, Jonathan; Kluge, Michael; Ahomaa, Kaarin; Friedl, Marie-Sophie; Kopetzky, Sebastian J; Quell, Jan-Dominik; Mewes, H Werner; Küffner, Robert
2017-05-15
Analysis of Next Generation Sequencing (NGS) data requires the processing of large datasets by chaining various tools with complex input and output formats. In order to automate data analysis, we propose to standardize NGS tasks into modular workflows. This simplifies reliable handling and processing of NGS data, and corresponding solutions become substantially more reproducible and easier to maintain. Here, we present a documented, Linux-based toolbox of 42 processing modules that are combined to construct workflows facilitating a variety of tasks such as DNAseq and RNAseq analysis. We also describe important technical extensions. The high throughput executor (HTE) helps to increase reliability and to reduce manual interventions when processing complex datasets. We also provide a dedicated binary manager that assists users in obtaining the modules' executables and keeping them up to date. As the basis for this actively developed toolbox, we use the workflow management software KNIME. See http://ibisngs.github.io/knime4ngs for nodes and user manual (GPLv3 license). robert.kueffner@helmholtz-muenchen.de. Supplementary data are available at Bioinformatics online.
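The two ideas above, chaining modules so that each one's output feeds the next and rerunning failed steps instead of intervening by hand, can be illustrated with a minimal sketch. This is not KNIME or the HTE: the module functions and `run_with_retries` / `run_workflow` are hypothetical, and the fixed-attempt retry policy is an assumption chosen only to show the pattern.

```python
import time
from typing import Callable, Dict, List

# A module takes the current dataset description and returns an updated one.
Module = Callable[[Dict], Dict]


def run_with_retries(module: Module, data: Dict, max_attempts: int = 3,
                     delay_s: float = 0.0) -> Dict:
    """Run one module, retrying on failure up to max_attempts times."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return module(data)
        except Exception as exc:  # any failure triggers another attempt
            last_error = exc
            if attempt < max_attempts:
                time.sleep(delay_s)
    name = getattr(module, "__name__", repr(module))
    raise RuntimeError(f"{name} failed after {max_attempts} attempts") from last_error


def run_workflow(modules: List[Module], data: Dict, max_attempts: int = 3) -> Dict:
    """Chain modules so that each module's output is the next module's input."""
    for module in modules:
        data = run_with_retries(module, data, max_attempts)
    return data
```

Keeping the retry logic outside the modules means each module stays a plain function, while the executor decides how many transient failures to tolerate before a human has to look.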
Implementation of a 'lean' cytopathology service: towards routine same-day reporting.
Hewer, Ekkehard; Hammer, Caroline; Fricke-Vetsch, Daniela; Baumann, Cinzia; Perren, Aurel; Schmitt, Anja M
2018-05-01
To systematically assess the effects of a Lean management intervention in an academic cytopathology service. We monitored outcomes, including specimen turnaround times, during stepwise implementation of a lean cytopathology workflow for gynaecological and non-gynaecological cytology. The intervention resulted in a major reduction of turnaround times for both gynaecological (3rd quartile 4.1 vs. 2.3 working days) and non-gynaecological cytology (3rd quartile 1.9 vs. 1.2 working days). Introduction of fully electronic reporting had an additional effect beyond continuous staining of slides alone. The rate of non-gynaecological specimens reported the same day increased from 4.5% to 56.5% of specimens received before noon. Lean management principles provide a useful framework for the organization of a cytopathology workflow. Stepwise implementation, beginning with a simplified gynaecological cytology workflow, allowed involved staff to monitor the effects of individual changes and allowed for a smooth transition. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
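The two outcome measures reported above, the third quartile of turnaround time and the share of specimens received before noon that were reported the same day, can be computed as in the sketch below. The abstract does not state the quartile method or how working days were counted, so Python's default `statistics.quantiles` method and calendar-day comparison are assumptions, and no study data are used.

```python
from datetime import datetime
from statistics import quantiles
from typing import List


def third_quartile(turnaround_days: List[float]) -> float:
    """Q3 of turnaround times; quantiles(n=4) returns the cut points [Q1, Q2, Q3]."""
    return quantiles(turnaround_days, n=4)[2]


def same_day_rate(received: List[datetime], reported: List[datetime]) -> float:
    """Share of specimens received before noon that were reported on the same calendar day."""
    morning = [(rec, rep) for rec, rep in zip(received, reported) if rec.hour < 12]
    if not morning:
        return 0.0
    same_day = sum(1 for rec, rep in morning if rep.date() == rec.date())
    return same_day / len(morning)
```

Using the third quartile rather than the mean keeps the measure focused on the slow tail of specimens without letting a few extreme delays dominate it.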
The impact of missing sensor information on surgical workflow management.
Liebmann, Philipp; Meixensberger, Jürgen; Wiedemann, Peter; Neumuth, Thomas
2013-09-01
Sensor systems in the operating room may encounter intermittent data losses that reduce the performance of surgical workflow management systems (SWFMS). Sensor data loss could impact SWFMS-based decision support, device parameterization, and information presentation. The purpose of this study was to understand the robustness of surgical process models when sensor information is partially missing. SWFMS changes caused by wrong or missing data from the sensor system that tracks the progress of a surgical intervention were tested. The individual surgical process models (iSPMs) from 100 different cataract procedures of 3 ophthalmologic surgeons were used to select a randomized subset and create a generalized surgical process model (gSPM). A disjoint subset was selected from the iSPMs and used to simulate the surgical process against the gSPM. The loss of sensor data was simulated by removing some information from one task in the iSPM. The effect of missing sensor data was measured using several metrics: (a) successful relocation of the path in the gSPM, (b) the number of steps to find the converging point, and (c) the perspective with the highest occurrence of unsuccessful path findings. A gSPM built using 30% of the iSPMs successfully found the correct path in 90% of the cases. The most critical sensor data concerned the instrument used by the surgeon. We found that the use of a gSPM to provide input data for an SWFMS is robust and can be accurate despite missing sensor data. A surgical workflow management system can provide the surgeon with workflow guidance in the OR in most cases. Sensor systems for surgical process tracking can be evaluated based on the stability and accuracy of functional and spatial operative results.
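The idea of relocating a path in a gSPM after a sensor gap can be sketched with a much simpler model than the one used in the study. Here a gSPM is just a successor graph merged from task sequences, and a single missing task is recovered by finding the tasks that connect its two observed neighbours. The task names and the functions `build_gspm` and `relocate` are hypothetical; real iSPMs record several perspectives (such as the instrument used) rather than one label per task.

```python
from collections import defaultdict
from typing import Dict, List, Set


def build_gspm(ispms: List[List[str]]) -> Dict[str, Set[str]]:
    """Merge individual task sequences (simplified iSPMs) into one successor graph."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for sequence in ispms:
        for current, following in zip(sequence, sequence[1:]):
            graph[current].add(following)
    return dict(graph)


def relocate(graph: Dict[str, Set[str]], before: str, after: str) -> Set[str]:
    """Return the tasks that could fill a one-task gap between two observed tasks.

    An empty set means the path could not be relocated; more than one
    candidate means the gap is ambiguous.
    """
    return {task for task in graph.get(before, set())
            if after in graph.get(task, set())}
```

Returning the full candidate set, rather than a single guess, lets the caller distinguish a successful relocation (one candidate) from an ambiguous or failed one, loosely mirroring metric (a) above.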
NASA Astrophysics Data System (ADS)
Yang, Xin; He, Zhen-yu; Jiang, Xiao-bo; Lin, Mao-sheng; Zhong, Ning-shan; Hu, Jiang; Qi, Zhen-yu; Bao, Yong; Li, Qiao-qiao; Li, Bao-yue; Hu, Lian-ying; Lin, Cheng-guang; Gao, Yuan-hong; Liu, Hui; Huang, Xiao-yan; Deng, Xiao-wu; Xia, Yun-fei; Liu, Meng-zhong; Sun, Ying
2017-03-01
To meet the special demands in China and the particular needs of the radiotherapy department, a MOSAIQ Integration Platform CHN (MIP) based on the workflow of radiation therapy (RT) has been developed as a supplementary system to Elekta MOSAIQ. The MIP adopts a client-server (C/S) architecture, and its database is based on the Treatment Planning System (TPS) and MOSAIQ SQL Server 2008, running on the hospital's local network. Five network servers, as the core hardware, supply data storage and network services based on cloud services. The core software, written in C#, is developed on the Microsoft Visual Studio platform. The MIP server can offer network services, including entry, query, statistics and printing of information, for about 200 workstations simultaneously. The MIP was implemented over the past one and a half years, and several practical patient-oriented functions were developed. The MIP now covers almost the whole workflow of radiation therapy. There are 15 function modules, such as Notice, Appointment, Billing, Document Management (application/execution), System Management, and so on. By June 2016, the data recorded in the MIP were as follows: 13546 patients, 13533 plan applications, 15475 RT records, 14656 RT summaries, 567048 billing records and 506612 workload records, etc. The MIP based on the RT workflow has been successfully developed and clinically implemented with real-time performance, data security, and stable operation. It has been shown to be user-friendly and to significantly improve the efficiency of the department. It is key to facilitating information sharing and department management. More functions can be added or modified to further enhance its potential in research and clinical practice.
Tanner, C; Gans, D; White, J; Nath, R; Pohl, J
2015-01-01
The role of electronic health records (EHR) in enhancing patient safety, while substantiated in many studies, is still debated. This paper examines early EHR adopters in primary care to understand the extent to which EHR implementation is associated with the workflows, policies and practices that promote patient safety, as compared to practices with paper records. Early adopters are defined as practices that were using an EHR prior to the implementation of the Meaningful Use program. We utilized the Physician Practice Patient Safety Assessment (PPPSA) to compare primary care practices with fully implemented EHR to those utilizing paper records. The PPPSA measures the extent of adoption of patient safety practices in the following domains: medication management, handoffs and transitions, personnel qualifications and competencies, practice management and culture, and patient communication. Data from 209 primary care practices responding between 2006 and 2010 were included in the analysis: 117 practices used paper medical records and 92 used an EHR. Results showed that, within all domains, EHR settings had significantly higher rates of workflows, policies and practices that promote patient safety than paper record settings. While these results were expected in the area of medication management, EHR use was also associated with the adoption of patient safety practices in areas in which the researchers had no a priori expectation of association. Sociotechnical models of EHR use point to complex interactions between technology and other aspects of the environment related to human resources, workflow, policy, and culture, among others. This study finds that, among primary care practices in the national PPPSA database, having an EHR was strongly empirically associated with the workflow, policy, communication and cultural practices recommended for safe patient care in ambulatory settings.
Incorporating Brokers within Collaboration Environments
NASA Astrophysics Data System (ADS)
Rajasekar, A.; Moore, R.; de Torcy, A.
2013-12-01
A collaboration environment, such as the integrated Rule Oriented Data System (iRODS - http://irods.diceresearch.org), provides interoperability mechanisms for accessing storage systems, authentication systems, messaging systems, information catalogs, networks, and policy engines from a wide variety of clients. The interoperability mechanisms function as brokers, translating actions requested by clients to the protocol required by a specific technology. The iRODS data grid is used to enable collaborative research within hydrology, seismology, earth science, climate, oceanography, plant biology, astronomy, physics, and genomics disciplines. Although each domain has unique resources, data formats, semantics, and protocols, the iRODS system provides a generic framework that is capable of managing collaborative research initiatives that span multiple disciplines. Each interoperability mechanism (broker) is linked to a name space that enables unified access across the heterogeneous systems. The collaboration environment provides not only support for brokers, but also support for virtualization of name spaces for users, files, collections, storage systems, metadata, and policies. The broker enables access to data or information in a remote system using the appropriate protocol, while the collaboration environment provides a uniform naming convention for accessing and manipulating each object. Within the NSF DataNet Federation Consortium project (http://www.datafed.org), three basic types of interoperability mechanisms have been identified and applied: 1) drivers for managing manipulation at the remote resource (such as data subsetting), 2) micro-services that execute the protocol required by the remote resource, and 3) policies for controlling the execution. For example, drivers have been written for manipulating NetCDF and HDF formatted files within THREDDS servers. 
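The division of labour described above, in which brokers speak each resource's protocol while the collaboration environment supplies one uniform logical name space, can be illustrated with a small sketch. The class and method names are hypothetical and are not the iRODS API; an in-memory broker stands in for real drivers such as a THREDDS or CUAHSI connector.

```python
from typing import Dict, Protocol, Tuple


class Broker(Protocol):
    """Anything that can read an object using its own resource protocol."""

    def read(self, physical_path: str) -> bytes: ...


class InMemoryBroker:
    """Stand-in for a protocol-specific driver (hypothetical)."""

    def __init__(self, objects: Dict[str, bytes]) -> None:
        self._objects = objects

    def read(self, physical_path: str) -> bytes:
        return self._objects[physical_path]


class CollaborationEnvironment:
    """Maps logical names to (resource, physical path) and delegates to brokers."""

    def __init__(self) -> None:
        self._brokers: Dict[str, Broker] = {}
        self._catalog: Dict[str, Tuple[str, str]] = {}

    def register_resource(self, resource: str, broker: Broker) -> None:
        self._brokers[resource] = broker

    def register_object(self, logical_name: str, resource: str, physical_path: str) -> None:
        if resource not in self._brokers:
            raise KeyError(f"unknown resource: {resource}")
        self._catalog[logical_name] = (resource, physical_path)

    def read(self, logical_name: str) -> bytes:
        # Clients only use logical names; the registered broker handles the protocol.
        resource, physical_path = self._catalog[logical_name]
        return self._brokers[resource].read(physical_path)
```

Because clients never see physical paths, a data set can move to a different storage system by re-registering it, without changing any client code.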
Micro-services have been written that manage interactions with the CUAHSI data repository, the DataONE information catalog, and the GeoBrain broker. Policies have been written that manage the transfer of messages between an iRODS message queue and the Advanced Message Queuing Protocol. Examples of these brokering mechanisms will be presented. The DFC collaboration environment serves as the intermediary between community resources and compute grids, enabling reproducible data-driven research. It is possible to create an analysis workflow that retrieves data subsets from a remote server, assembles the required input files, automates the execution of the workflow, automatically tracks its provenance, and shares the input files, workflow, and output files. A collaborator can re-execute a shared workflow, compare results, change input files, and re-execute an analysis.
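Provenance tracking and re-execution, as described above, can be sketched by recording content digests of a workflow's inputs and outputs, so that a collaborator who reruns the shared workflow can check whether the results match. The record layout and the functions `run_with_provenance` and `reproduces` are hypothetical and are not the DFC's provenance schema.

```python
import hashlib
from typing import Callable, Dict, Tuple

Files = Dict[str, bytes]
Step = Callable[[Files], Files]


def digest(content: bytes) -> str:
    """Content fingerprint used to identify a file version."""
    return hashlib.sha256(content).hexdigest()


def run_with_provenance(step: Step, workflow_name: str, inputs: Files) -> Tuple[Files, dict]:
    """Execute a workflow step and return its outputs plus a provenance record."""
    outputs = step(inputs)
    record = {
        "workflow": workflow_name,
        "inputs": {name: digest(data) for name, data in inputs.items()},
        "outputs": {name: digest(data) for name, data in outputs.items()},
    }
    return outputs, record


def reproduces(step: Step, record: dict, inputs: Files) -> bool:
    """Re-execute a shared workflow and compare against the recorded digests."""
    _, new_record = run_with_provenance(step, record["workflow"], inputs)
    return new_record == record
```

Comparing digests instead of file contents keeps the shared record small, and a mismatch in the input digests immediately shows that a collaborator changed the input files.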
Leadership characteristics and business management in modern academic surgery.
Büchler, Peter; Martin, David; Knaebel, Hanns-Peter; Büchler, Markus W
2006-04-01
Management skills are necessary to successfully lead a surgical department in the future. This article focuses on practical aspects of surgical management, leadership and training. It demonstrates how the implementation of business management concepts changes workflow management and surgical training. A systematic Medline search was performed and business management publications were analysed. Neither management nor leadership skills are inborn; they are acquired. Management is about planning, controlling and putting appropriate structures in place. Leadership is about anticipating and coping with change and people, and adopting a visionary stance. More change requires more leadership. Changes in surgery occur with unprecedented speed because of a growing demand for surgical procedures with limited financial resources. Modern leadership and management theories have to be tailored to surgery. Not all of them are applicable, but some of them are essential for surgeons. In business management, common traits of successful leaders include team orientation and communication skills. The most important characteristic, however, appears to be emotional intelligence. Novel training concepts for surgeons include on-the-job training and the introduction of improved workflow management systems, e.g. central case management. The need for surgeons with advanced skills in business, finance and organisational management is evident and will require systematic and tailored training.
NASA Astrophysics Data System (ADS)
Laban, Shaban; El-Desouky, Aly
2014-05-01
To achieve rapid, simple and reliable parallel processing of different types of tasks and big data on any compute cluster, a lightweight messaging-based framework model for distributed application processing and workflow execution is proposed. The framework is based on Apache ActiveMQ and the Simple (or Streaming) Text Oriented Messaging Protocol (STOMP). ActiveMQ, a popular and powerful open-source persistent messaging and integration patterns server with scheduler capabilities, acts as the message broker in the framework. STOMP provides an interoperable wire format that allows framework programs to interact easily with each other and with ActiveMQ. To use the message broker efficiently, a unified message and topic naming pattern is used to achieve the required operations. Only three Python programs and a simple library, used to unify and simplify the use of ActiveMQ and the STOMP protocol, are needed to use the framework. A watchdog program is used to monitor, remove, add, start and stop any machine and/or its tasks when necessary. On every machine, exactly one dedicated zoo keeper program is used to start the different functions or tasks, namely the stompShell programs needed to execute the user-required workflow. The stompShell instances execute workflow jobs based on the messages they receive. A well-defined, simple and flexible message structure, based on JavaScript Object Notation (JSON), is used to build complex workflow systems. JSON is also used for configuration and for communication between machines and programs. The framework is platform independent. Although the framework is built using Python, the actual workflow programs or jobs can be implemented in any programming language.
The generic framework can be used in small national data centres for processing seismological and radionuclide data received from the International Data Centre (IDC) of the Preparatory Commission for the Comprehensive Nuclear-Test-Ban Treaty Organization (CTBTO). It is also possible to extend the framework to monitoring the IDC pipeline. The detailed design, implementation, conclusions and future work of the proposed framework will be presented.
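The JSON-over-STOMP messaging described above can be sketched by building a STOMP 1.2 SEND frame by hand: a command line, `header:value` lines, a blank line, the body, and a terminating NULL octet. The topic naming pattern (`/topic/workflow.<machine>.<task>`) and the message fields are hypothetical illustrations, not the framework's actual conventions. In practice, a STOMP client library would assemble the frame and transmit it to ActiveMQ.

```python
import json
from typing import Dict


def escape_header(value: str) -> str:
    """Apply STOMP 1.2 header escaping (backslash first, then CR, LF and colon)."""
    return (value.replace("\\", "\\\\")
                 .replace("\r", "\\r")
                 .replace("\n", "\\n")
                 .replace(":", "\\c"))


def topic_for(machine: str, task: str) -> str:
    """Hypothetical unified naming pattern: one topic per machine and task."""
    return f"/topic/workflow.{machine}.{task}"


def build_send_frame(destination: str, message: Dict) -> bytes:
    """Serialize a JSON workflow message into a STOMP SEND frame."""
    body = json.dumps(message, sort_keys=True).encode("utf-8")
    headers = {
        "destination": destination,
        "content-type": "application/json;charset=utf-8",
        "content-length": str(len(body)),
    }
    head = "SEND\n" + "".join(
        f"{key}:{escape_header(value)}\n" for key, value in headers.items()) + "\n"
    # The frame ends with a NULL octet after the body.
    return head.encode("utf-8") + body + b"\x00"
```

Sending `content-length` lets a receiver read the body without scanning for the NULL terminator, which matters if a JSON payload ever contains embedded NULL bytes.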
DietPal: A Web-Based Dietary Menu-Generating and Management System
Abdullah, Siti Norulhuda; Shahar, Suzana; Abdul-Hamid, Helmi; Khairudin, Nurkahirizan; Yusoff, Mohamed; Ghazali, Rafidah; Mohd-Yusoff, Nooraini; Shafii, Nik Shanita; Abdul-Manaf, Zaharah
2004-01-01
Background Attempts in current health care practice to make health care more accessible, effective, and efficient through the use of information technology could include the implementation of computer-based dietary menu generation. While several such systems already exist, their focus is mainly to help healthy individuals calculate their calorie intake and to help monitor the selection of menus based upon a prespecified calorie value. Although these prove to be helpful in some ways, they are not suitable for monitoring, planning, and managing patients' dietary needs and requirements. This paper presents a Web-based application that simulates the process of menu suggestion according to a standard practice employed by dietitians. Objective To model the workflow of dietitians and to develop, based on this workflow, a Web-based system for dietary menu generation and management. The system is intended to be used by dietitians or by medical professionals of health centers in rural areas where there are no designated qualified dietitians. Methods First, a user-needs study was conducted among dietitians in Malaysia. The first survey of 93 dietitians (with 52 responding) was an assessment of the information needed for dietary management and evaluation of compliance with a dietary regime. The second study consisted of ethnographic observation and semi-structured interviews with 14 dietitians in order to identify the workflow of the menu-suggestion process. We subsequently designed and developed a Web-based dietary menu generation and management system called DietPal. DietPal can automatically calculate the nutrient and calorie intake of each patient based on the dietary recall, as well as generate suitable diet and menu plans according to the patient's calorie and nutrient requirements, calculated from anthropometric measurements. The system also allows stored or predefined menus to be reused for other patients with similar health and nutrient requirements.
Results We modeled the workflow of the menu-suggestion activity currently followed by dietitians in Malaysia. Based on this workflow, a Web-based system was developed. An initial evaluation among 10 dietitians indicates that they are comfortable with the organization of the modules and information. Conclusions The system has the potential to enhance the quality of services by providing standard and healthy menu plans while increasing outreach, particularly to rural areas. Because it can potentially reduce the time dietitians spend planning suitable menus, more quality time could be spent delivering nutrition education to patients. PMID:15111270
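The two calculations DietPal is described as automating, estimating a patient's energy requirement from anthropometric measurements and totalling nutrient intake from a dietary recall, can be sketched as below. The abstract does not say which predictive equation DietPal uses. The Mifflin-St Jeor equation and the default activity factor of 1.2 are assumptions, and any food-composition values passed in are placeholders rather than a real food table.

```python
from typing import Dict, List, Tuple


def mifflin_st_jeor_bmr(weight_kg: float, height_cm: float, age_years: float, sex: str) -> float:
    """Basal metabolic rate in kcal/day (Mifflin-St Jeor equation, an assumed choice)."""
    base = 10 * weight_kg + 6.25 * height_cm - 5 * age_years
    if sex == "male":
        return base + 5
    if sex == "female":
        return base - 161
    raise ValueError("sex must be 'male' or 'female'")


def energy_requirement(weight_kg: float, height_cm: float, age_years: float,
                       sex: str, activity_factor: float = 1.2) -> float:
    """Daily energy requirement: BMR scaled by an assumed activity factor."""
    return mifflin_st_jeor_bmr(weight_kg, height_cm, age_years, sex) * activity_factor


def recall_intake(recall: List[Tuple[str, float]],
                  food_table: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Total nutrients from a dietary recall of (food, grams eaten) pairs.

    food_table maps each food to its nutrient content per 100 g.
    """
    totals: Dict[str, float] = {}
    for food, grams in recall:
        for nutrient, per_100g in food_table[food].items():
            totals[nutrient] = totals.get(nutrient, 0.0) + per_100g * grams / 100
    return totals
```

Comparing the output of `recall_intake` with `energy_requirement` gives the gap that a menu generator would then try to close.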
DietPal: a Web-based dietary menu-generating and management system.
Noah, Shahrul A; Abdullah, Siti Norulhuda; Shahar, Suzana; Abdul-Hamid, Helmi; Khairudin, Nurkahirizan; Yusoff, Mohamed; Ghazali, Rafidah; Mohd-Yusoff, Nooraini; Shafii, Nik Shanita; Abdul-Manaf, Zaharah
2004-01-30
Attempts in current health care practice to make health care more accessible, effective, and efficient through the use of information technology could include the implementation of computer-based dietary menu generation. While several such systems already exist, their focus is mainly to help healthy individuals calculate their calorie intake and to help monitor the selection of menus based upon a prespecified calorie value. Although these prove to be helpful in some ways, they are not suitable for monitoring, planning, and managing patients' dietary needs and requirements. This paper presents a Web-based application that simulates the process of menu suggestion according to a standard practice employed by dietitians. To model the workflow of dietitians and to develop, based on this workflow, a Web-based system for dietary menu generation and management. The system is intended to be used by dietitians or by medical professionals of health centers in rural areas where there are no designated qualified dietitians. First, a user-needs study was conducted among dietitians in Malaysia. The first survey of 93 dietitians (with 52 responding) was an assessment of the information needed for dietary management and evaluation of compliance with a dietary regime. The second study consisted of ethnographic observation and semi-structured interviews with 14 dietitians in order to identify the workflow of the menu-suggestion process. We subsequently designed and developed a Web-based dietary menu generation and management system called DietPal. DietPal can automatically calculate the nutrient and calorie intake of each patient based on the dietary recall, as well as generate suitable diet and menu plans according to the patient's calorie and nutrient requirements, calculated from anthropometric measurements. The system also allows stored or predefined menus to be reused for other patients with similar health and nutrient requirements.
We modeled the workflow of the menu-suggestion activity currently followed by dietitians in Malaysia. Based on this workflow, a Web-based system was developed. An initial evaluation among 10 dietitians indicates that they are comfortable with the organization of the modules and information. The system has the potential to enhance the quality of services by providing standard and healthy menu plans while increasing outreach, particularly to rural areas. Because it can potentially reduce the time dietitians spend planning suitable menus, more quality time could be spent delivering nutrition education to patients.
Applications of process improvement techniques to improve workflow in abdominal imaging.
Tamm, Eric Peter
2016-03-01
Major changes in the management and funding of healthcare are underway that will markedly change the way radiology studies will be reimbursed. The result will be the need to deliver radiology services in a highly efficient manner while maintaining quality. The science of process improvement provides a practical approach to improve the processes utilized in radiology. This article will address in a step-by-step manner how to implement process improvement techniques to improve workflow in abdominal imaging.
Vahabzadeh, Massoud; Lin, Jia-Ling; Mezghanni, Mustapha; Epstein, David H; Preston, Kenzie L
2009-01-01
A challenge in treatment research is the necessity of adhering to protocol and regulatory strictures while maintaining flexibility to meet patients' treatment needs and to accommodate variations among protocols. Another challenge is the acquisition of large amounts of data in an occasionally hectic environment, along with the provision of seamless methods for exporting, mining and querying the data. We have automated several major functions of our outpatient treatment research clinic for studies in drug abuse and dependence. Here we describe three such specialised applications: the Automated Contingency Management (ACM) system for the delivery of behavioural interventions, the transactional electronic diary (TED) system for the management of behavioural assessments, and the Protocol Workflow System (PWS) for computerised workflow automation and guidance of each participant's daily clinic activities. These modules are integrated into our larger information system to enable data sharing in real time among authorised staff. ACM and the TED have each permitted us to conduct research that was not previously possible. In addition, the time to data analysis at the end of each study is substantially shorter. With the implementation of the PWS, we have been able to manage a research clinic with an 80-patient capacity, having an annual average of 18,000 patient visits and 7300 urine collections, with a research staff of five. Finally, automated data management has considerably enhanced our ability to monitor and summarise participant safety data for research oversight. When developed in consultation with end users, automation in treatment research clinics can enable more efficient operations, better communication among staff and expansions in research methods.
Mission Assurance in a Distributed Environment
2009-06-01
• Business Process Modeling Notation (BPMN): graphical representation of business processes in a workflow
• Unified Modeling Language (UML): use standard UML diagrams (component, sequence, and activity diagrams) to model the system
Developing science gateways for drug discovery in a grid environment.
Pérez-Sánchez, Horacio; Rezaei, Vahid; Mezhuyev, Vitaliy; Man, Duhu; Peña-García, Jorge; den-Haan, Helena; Gesing, Sandra
2016-01-01
Methods for in silico screening of large databases of molecules increasingly complement and replace experimental techniques to discover novel compounds to combat diseases. As these techniques become more complex and computationally costly, we face an increasing challenge in providing the life sciences research community with a convenient tool for high-throughput virtual screening on distributed computing resources. To this end, we recently integrated the biophysics-based drug-screening program FlexScreen into a service that is applicable to large-scale parallel screening and reusable in the context of scientific workflows. Our implementation is based on Pipeline Pilot and the Simple Object Access Protocol (SOAP) and provides an easy-to-use graphical user interface for constructing complex workflows, which can be executed on distributed computing resources, thus accelerating throughput by several orders of magnitude.
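The core of high-throughput virtual screening, scoring many candidate molecules independently and keeping the best, parallelizes naturally. The sketch below shows that pattern with a thread pool. It does not call FlexScreen or Pipeline Pilot: `score` is a hypothetical stand-in for a docking run, and lower scores are assumed to be better, as with binding energies. Threads are enough when each task waits on an external program or remote service. In-process, CPU-bound scoring would instead call for a `ProcessPoolExecutor` or cluster jobs.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple


def screen(ligands: List[str], score: Callable[[str], float],
           top_k: int = 10, workers: int = 4) -> List[Tuple[float, str]]:
    """Score every ligand in parallel and return the top_k best (lowest) scores."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so scores line up with ligands.
        scores = list(pool.map(score, ligands))
    return heapq.nsmallest(top_k, zip(scores, ligands))
```

`heapq.nsmallest` avoids sorting the full result list, which helps when only a short hit list is kept from a very large library.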
Data distribution method of workflow in the cloud environment
NASA Astrophysics Data System (ADS)
Wang, Yong; Wu, Junjuan; Wang, Ying
2017-08-01
Cloud computing for workflow applications provides the high-efficiency computation and large storage capacity these applications require, but it also brings challenges to the protection of trade secrets and other private data. Because private data increase data transmission time, this paper presents a new data allocation algorithm based on the degree of data collaborative damage to improve the existing data allocation strategy. For security, the algorithm relies on the private cloud as well as the public cloud: in the initial stage, a static allocation method partitions only the non-confidential data to improve on the original placement, and in the operational phase, the data distribution scheme is adjusted dynamically as new data are generated. The experimental results show that the improved method is effective in reducing data transmission time.
Systems engineering implementation in the preliminary design phase of the Giant Magellan Telescope
NASA Astrophysics Data System (ADS)
Maiten, J.; Johns, M.; Trancho, G.; Sawyer, D.; Mady, P.
2012-09-01
Like many telescope projects today, the 24.5-meter Giant Magellan Telescope (GMT) is a truly complex system. The primary and secondary mirrors of the GMT are segmented and actuated to support two operating modes: natural seeing and adaptive optics. GMT is a general-purpose telescope supporting multiple science instruments operated in those modes. GMT is a large, diverse collaboration, and its development involves geographically distributed teams. Implementing good systems engineering processes to manage the development of systems like GMT therefore becomes imperative. Managing the flow-down of requirements from the science requirements to the component-level requirements is an inherently difficult task in itself. The interfaces must also be negotiated so that the interactions between subsystems and assemblies are well defined and controlled. This paper provides an overview of the systems engineering processes and tools implemented for the GMT project during the preliminary design phase, including requirements management, documentation and configuration control, interface development, and technical risk management. Because of the complexity of the GMT system and the distributed team, using web-accessible tools for collaboration is vital. To accomplish this, GMTO has selected three tools: Cognition Cockpit, Xerox Docushare, and Solidworks Enterprise Product Data Management (EPDM). Key to this is the use of Cockpit for managing and documenting the product tree, architecture, error budget, requirements, interfaces, and risks. Drawing management is accomplished using an EPDM vault. Docushare, a documentation and configuration management tool, is used to manage the workflow of documents and drawings for the GMT project. These tools electronically facilitate collaboration in real time, enabling the GMT team to track, trace and report on key project metrics and design parameters.
Intelligent services for discovery of complex geospatial features from remote sensing imagery
NASA Astrophysics Data System (ADS)
Yue, Peng; Di, Liping; Wei, Yaxing; Han, Weiguo
2013-09-01
Remote sensing imagery has been commonly used by intelligence analysts to discover geospatial features, including complex ones. The overwhelming volume of routine image acquisition requires automated methods or systems for feature discovery instead of manual image interpretation. Methods for extracting elementary ground features such as buildings and roads from remote sensing imagery have been studied extensively. The discovery of complex geospatial features, however, remains understudied. A complex feature, such as a Weapon of Mass Destruction (WMD) proliferation facility, is spatially composed of elementary features (e.g., buildings for hosting fuel concentration machines, cooling towers, transportation roads, and fences). Such spatial semantics, together with thematic semantics of feature types, can be used to discover complex geospatial features. This paper proposes a workflow-based approach for discovering complex geospatial features that uses geospatial semantics and services. The elementary features extracted from imagery are archived in distributed Web Feature Services (WFSs) and are discoverable from a catalogue service. Using spatial semantics among elementary features and thematic semantics among feature types, workflow-based service chains can be constructed to locate semantically related complex features in imagery. The workflows are reusable and can provide on-demand discovery of complex features in a distributed environment.
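The spatial-semantics step of such a service chain can be illustrated with a small co-occurrence query. In this hypothetical sketch, the elementary features are assumed to have already been fetched from WFS endpoints into an in-memory list, coordinates are assumed to be projected in metres, and a complex feature is defined simply as an anchor feature (e.g. a building) that has every required feature type within a given radius; the paper's actual semantic rules and ontologies are not reproduced here.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Feature:
    fid: str     # feature identifier
    ftype: str   # thematic type, e.g. "building", "cooling_tower"
    x: float     # projected easting (metres assumed)
    y: float     # projected northing (metres assumed)


def find_complex_features(features, anchor_type, required_types, radius):
    """Return ids of anchor features that have every required type nearby.

    Hypothetical spatial-semantic rule: a complex feature exists where an
    anchor (e.g. a building) has all required feature types within `radius`.
    """
    hits = []
    for anchor in features:
        if anchor.ftype != anchor_type:
            continue
        nearby_types = {
            f.ftype
            for f in features
            if f is not anchor and hypot(f.x - anchor.x, f.y - anchor.y) <= radius
        }
        if set(required_types) <= nearby_types:
            hits.append(anchor.fid)
    return hits


if __name__ == "__main__":
    features = [
        Feature("b1", "building", 0, 0),
        Feature("t1", "cooling_tower", 50, 0),
        Feature("f1", "fence", 0, 80),
        Feature("r1", "road", 30, 30),
        Feature("b2", "building", 1000, 1000),
        Feature("t2", "cooling_tower", 1040, 1000),
    ]
    print(find_complex_features(features, "building", {"cooling_tower", "fence", "road"}, 100))
```

The brute-force neighbour search is quadratic in the number of features, which is acceptable for a sketch; a real service would push the distance filter into a spatial index or the WFS query itself.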
Strategic Planning for Electronic Resources Management: A Case Study at Gustavus Adolphus College
ERIC Educational Resources Information Center
Hulseberg, Anna; Monson, Sarah
2009-01-01
Electronic resources, the tools we use to manage them, and the needs and expectations of our users are constantly evolving; at the same time, the roles, responsibilities, and workflow of the library staff who manage e-resources are also in flux. Recognizing a need to be more intentional and proactive about how we manage e-resources, the…
Big data analytics workflow management for eScience
NASA Astrophysics Data System (ADS)
Fiore, Sandro; D'Anca, Alessandro; Palazzo, Cosimo; Elia, Donatello; Mariello, Andrea; Nassisi, Paola; Aloisio, Giovanni
2015-04-01
In many domains such as climate and astrophysics, scientific data are often n-dimensional and require tools that support specialized data types and primitives if they are to be properly stored, accessed, analysed and visualized. Currently, scientific data analytics relies on domain-specific software and libraries providing a large set of operators and functionalities. However, most of these tools fail at large scale because they: (i) are desktop based, rely on local computing capabilities and need the data locally; (ii) cannot benefit from available multicore/parallel machines since they are based on sequential code; (iii) do not provide declarative languages to express scientific data analysis tasks; and (iv) do not provide newer or more scalable storage models to better support multidimensional data. Additionally, most of them: (v) are domain-specific, which also means they support a limited set of data formats; and (vi) do not provide workflow support to enable the construction, execution and monitoring of more complex "experiments". The Ophidia project aims to address most of the challenges highlighted above by providing a big data analytics framework for eScience. Ophidia provides several parallel operators to manipulate large datasets. Relevant examples include: (i) data sub-setting (slicing and dicing), (ii) data aggregation, (iii) array-based primitives (the same operator applies to all the implemented UDF extensions), (iv) data cube duplication, (v) data cube pivoting, and (vi) NetCDF import and export. Metadata operators are also available. Additionally, the Ophidia framework provides array-based primitives to perform data sub-setting, data aggregation (i.e. max, min, avg), array concatenation, algebraic expressions and predicate evaluation on large arrays of scientific data. Bit-oriented plugins have also been implemented to manage binary data cubes.
Defining processing chains and workflows with tens or hundreds of data analytics operators is the real challenge in many practical scientific use cases. This talk will specifically address the main needs, requirements and challenges of data analytics workflow management applied to large scientific datasets. Three real use cases concerning analytics workflows for sea situational awareness, fire danger prevention, climate change and biodiversity will be discussed in detail.
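The semantics of a few of the operators listed above (sub-setting, aggregation, and an array-based primitive) can be sketched on a toy cube held in plain Python dictionaries. This only illustrates what such operators compute, assuming rows keyed by explicit dimensions (lat, lon) and arrays along an implicit time dimension; it is not Ophidia's API, storage model or parallel implementation.

```python
from statistics import mean

# Toy "data cube": each key is an explicit-dimension coordinate (lat, lon);
# each value is the array along the implicit dimension (time).
Cube = dict[tuple[float, float], list[float]]

AGGREGATES = {"max": max, "min": min, "avg": mean}


def subset(cube: Cube, lat_range, t_slice: slice) -> Cube:
    """Slicing and dicing: keep rows inside lat_range and a window of time steps."""
    lo, hi = lat_range
    return {k: v[t_slice] for k, v in cube.items() if lo <= k[0] <= hi}


def reduce_time(cube: Cube, op: str) -> dict:
    """Aggregate each row's array to a single value with max, min or avg."""
    fn = AGGREGATES[op]
    return {k: fn(v) for k, v in cube.items()}


def apply_elementwise(cube: Cube, fn) -> Cube:
    """Array-based primitive: apply the same function to every array element."""
    return {k: [fn(x) for x in v] for k, v in cube.items()}


if __name__ == "__main__":
    temps = {
        (10.0, 0.0): [280.0, 282.0, 285.0, 281.0],
        (40.0, 0.0): [270.0, 271.0, 275.0, 273.0],
        (70.0, 0.0): [250.0, 252.0, 251.0, 249.0],
    }
    window = subset(temps, (0, 50), slice(1, 3))
    print(reduce_time(window, "avg"))
    print(apply_elementwise(window, lambda k: k - 273.15))
```

Each operator returns a new cube, so operators compose into the kind of processing chain the abstract describes; in Ophidia the same steps run in parallel over distributed fragments.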
SMITH: a LIMS for handling next-generation sequencing workflows
2014-01-01
Background Life-science laboratories make increasing use of Next Generation Sequencing (NGS) for studying bio-macromolecules and their interactions. Array-based methods for measuring gene expression or protein-DNA interactions are being replaced by RNA-Seq and ChIP-Seq. Sequencing is generally performed by specialized facilities that have to keep track of sequencing requests, trace samples, ensure quality and make data available according to predefined privileges. An integrated tool helps to troubleshoot problems, maintain a high quality standard, and reduce time and costs. Commercial and non-commercial tools called LIMS (Laboratory Information Management Systems) are available for this purpose. However, they often come at prohibitive cost and/or lack the flexibility and scalability needed to adjust seamlessly to the frequently changing protocols employed. In order to manage the flow of sequencing data produced at the Genomic Unit of the Italian Institute of Technology (IIT), we developed SMITH (Sequencing Machine Information Tracking and Handling). Methods SMITH is a web application with a MySQL server at the backend. SMITH was developed by wet-lab scientists of the Centre for Genomic Science and database experts from the Politecnico of Milan in the context of a Genomic Data Model Project. The database schema stores all the information of an NGS experiment, including the descriptions of all protocols and algorithms used in the process. Notably, an attribute-value table allows an unconstrained textual description to be associated with each sample and all the data produced afterwards. This method permits the creation of metadata that can be used to search the database for specific files as well as for statistical analyses. Results SMITH runs automatically and limits direct human interaction mainly to administrative tasks. SMITH data-delivery procedures were standardized, making it easier for biologists and analysts to navigate the data. Automation also saves time.
The workflows are available through an API provided by the workflow management system. The parameters and input data are passed to the workflow engine that performs de-multiplexing, quality control, alignments, etc. Conclusions SMITH standardizes, automates, and speeds up sequencing workflows. Annotation of data with key-value pairs facilitates meta-analysis. PMID:25471934
SMITH: a LIMS for handling next-generation sequencing workflows.
Venco, Francesco; Vaskin, Yuriy; Ceol, Arnaud; Muller, Heiko
2014-01-01
Life-science laboratories make increasing use of Next Generation Sequencing (NGS) for studying bio-macromolecules and their interactions. Array-based methods for measuring gene expression or protein-DNA interactions are being replaced by RNA-Seq and ChIP-Seq. Sequencing is generally performed by specialized facilities that have to keep track of sequencing requests, trace samples, ensure quality and make data available according to predefined privileges. An integrated tool helps to troubleshoot problems, maintain a high quality standard, and reduce time and costs. Commercial and non-commercial tools called LIMS (Laboratory Information Management Systems) are available for this purpose. However, they often come at prohibitive cost and/or lack the flexibility and scalability needed to adjust seamlessly to the frequently changing protocols employed. In order to manage the flow of sequencing data produced at the Genomic Unit of the Italian Institute of Technology (IIT), we developed SMITH (Sequencing Machine Information Tracking and Handling). SMITH is a web application with a MySQL server at the backend. SMITH was developed by wet-lab scientists of the Centre for Genomic Science and database experts from the Politecnico of Milan in the context of a Genomic Data Model Project. The database schema stores all the information of an NGS experiment, including the descriptions of all protocols and algorithms used in the process. Notably, an attribute-value table allows an unconstrained textual description to be associated with each sample and all the data produced afterwards. This method permits the creation of metadata that can be used to search the database for specific files as well as for statistical analyses. SMITH runs automatically and limits direct human interaction mainly to administrative tasks. SMITH data-delivery procedures were standardized, making it easier for biologists and analysts to navigate the data. Automation also saves time.
The workflows are available through an API provided by the workflow management system. The parameters and input data are passed to the workflow engine that performs de-multiplexing, quality control, alignments, etc. SMITH standardizes, automates, and speeds up sequencing workflows. Annotation of data with key-value pairs facilitates meta-analysis.
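SMITH's attribute-value table can be illustrated with an entity-attribute-value (EAV) schema. The sketch below uses Python's built-in `sqlite3` so that it runs self-contained; SMITH itself uses MySQL, and the table and column names here (`sample`, `sample_attribute`, `attr`, `value`) are hypothetical, not SMITH's actual schema.

```python
import sqlite3

SCHEMA = """
CREATE TABLE sample (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- Attribute-value table: any number of free-text annotations per sample.
CREATE TABLE sample_attribute (
    sample_id INTEGER NOT NULL REFERENCES sample(id),
    attr      TEXT NOT NULL,
    value     TEXT NOT NULL
);
"""


def open_db():
    """Create an in-memory database with the hypothetical EAV schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_sample(conn, name, **attributes):
    """Insert a sample and one attribute row per keyword argument."""
    cur = conn.execute("INSERT INTO sample (name) VALUES (?)", (name,))
    sample_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO sample_attribute (sample_id, attr, value) VALUES (?, ?, ?)",
        [(sample_id, k, str(v)) for k, v in attributes.items()],
    )
    return sample_id


def find_samples(conn, **criteria):
    """Return names of samples that carry ALL given attribute=value pairs."""
    query = "SELECT name FROM sample s WHERE 1=1"
    params = []
    for attr, value in criteria.items():
        query += (" AND EXISTS (SELECT 1 FROM sample_attribute a"
                  " WHERE a.sample_id = s.id AND a.attr = ? AND a.value = ?)")
        params.extend([attr, str(value)])
    query += " ORDER BY name"
    return [row[0] for row in conn.execute(query, params)]
```

For example, after `add_sample(conn, "S1", assay="ChIP-Seq", organism="mouse")`, a call to `find_samples(conn, assay="ChIP-Seq")` returns `["S1"]`. Each criterion becomes an `EXISTS` subquery, so new annotation keys can be searched without any schema change.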
NASA Astrophysics Data System (ADS)
Miles, B.; Band, L. E.
2012-12-01
Water sustainability has been recognized as a fundamental problem of science whose solution relies in part on high-performance computing. Stormwater management is a major concern of urban sustainability. Understanding interactions between urban landcover and stormwater nutrient pollution requires consideration of fine-scale residential stormwater management, which in turn requires high-resolution LIDAR and landcover data not provided through national spatial data infrastructure, as well as field observation at the household scale. The objectives of my research are twofold: (1) advance understanding of the relationship between residential stormwater management practices and the export of nutrient pollution from stormwater in urbanized ecosystems; and (2) improve the informatics workflows used in community ecohydrology modeling as applied to heterogeneous urbanized ecosystems. In support of these objectives, I present preliminary results from initial work to: (1) develop an ecohydrology workflow platform that automates data preparation while maintaining data provenance and model metadata to yield reproducible workflows and support model benchmarking; (2) perform field observation of existing patterns of residential rooftop impervious surface connectivity to stormwater networks; and (3) develop Regional Hydro-Ecological Simulation System (RHESSys) models for watersheds in Baltimore, MD (as part of the Baltimore Ecosystem Study (BES) NSF Long-Term Ecological Research (LTER) site) and Durham, NC (as part of the NSF Urban Long-Term Research Area (ULTRA) program); these models will be used to simulate nitrogen loading resulting from both baseline residential rooftop impervious connectivity and for disconnection scenarios (e.g. roof drainage to lawn v. engineered rain garden, upslope v. riparian). 
This research builds on work done as part of the NSF EarthCube Layered Architecture Concept Award where a RHESSys workflow is being implemented in an iRODS (integrated Rule-Oriented Data System) environment. Modeling the ecohydrology of urban ecosystems in a reliable and reproducible manner requires a flexible scientific workflow platform that allows rapid prototyping with large-scale spatial datasets and model refinement integrating expert knowledge with local datasets and household surveys.
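One way to make data preparation reproducible is to record, for every step, its parameters and content hashes of its inputs and outputs. The following is a minimal, hypothetical sketch of that idea in Python; it is not the RHESSys workflow platform or iRODS rule code, and the step functions and cell records are invented for illustration. Steps are assumed to take their data as the first positional argument and all parameters as keywords.

```python
import hashlib
import json
import time
from functools import wraps

# In-memory provenance log; a real platform would persist this with the data.
PROVENANCE = []


def digest(obj) -> str:
    """Stable content hash of JSON-serialisable data."""
    blob = json.dumps(obj, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()[:12]


def tracked(step):
    """Decorator recording step name, parameters, and input/output hashes."""
    @wraps(step)
    def wrapper(data, **params):  # parameters must be passed as keywords
        result = step(data, **params)
        PROVENANCE.append({
            "step": step.__name__,
            "params": params,
            "input": digest(data),
            "output": digest(result),
            "time": time.time(),
        })
        return result
    return wrapper


@tracked
def clip_to_watershed(cells, xmin, xmax):
    """Keep only grid cells inside a (hypothetical) watershed extent."""
    return [c for c in cells if xmin <= c["x"] <= xmax]


@tracked
def impervious_fraction(cells):
    """Share of cells flagged as impervious (e.g. rooftops, pavement)."""
    return sum(c["impervious"] for c in cells) / len(cells)
```

Because each step's input hash should match the previous step's output hash, a log like this can be checked to confirm that a chain of results was produced from the recorded inputs and parameters, which is what makes a rerun comparable for benchmarking.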
Design and Execution of make-like, distributed Analyses based on Spotify’s Pipelining Package Luigi
NASA Astrophysics Data System (ADS)
Erdmann, M.; Fischer, B.; Fischer, R.; Rieger, M.
2017-10-01
In high-energy particle physics, workflow management systems are primarily used as tailored solutions in dedicated areas such as Monte Carlo production. However, physicists performing data analyses are usually required to steer their individual workflows manually which is time-consuming and often leads to undocumented relations between particular workloads. We present a generic analysis design pattern that copes with the sophisticated demands of end-to-end HEP analyses and provides a make-like execution system. It is based on the open-source pipelining package Luigi which was developed at Spotify and enables the definition of arbitrary workloads, so-called Tasks, and the dependencies between them in a lightweight and scalable structure. Further features are multi-user support, automated dependency resolution and error handling, central scheduling, and status visualization in the web. In addition to already built-in features for remote jobs and file systems like Hadoop and HDFS, we added support for WLCG infrastructure such as LSF and CREAM job submission, as well as remote file access through the Grid File Access Library. Furthermore, we implemented automated resubmission functionality, software sandboxing, and a command line interface with auto-completion for a convenient working environment. For the implementation of a $t\bar{t}H$ cross section measurement, we created a generic Python interface that provides programmatic access to all external information such as datasets, physics processes, statistical models, and additional files and values. In summary, the setup enables the execution of the entire analysis in a parallelized and distributed fashion with a single command.
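The make-like pattern described above (tasks declare their dependencies and outputs, and a task runs only if its output is missing) can be sketched with the standard library alone. This is not Luigi's API: Luigi's `Task` class does use `requires()`, `output()` and `run()` methods, but the `build` function, the file names and the toy analysis steps below are hypothetical, and none of Luigi's scheduling, atomic writes or remote-job support is modelled.

```python
import os
import tempfile

WORKDIR = tempfile.mkdtemp()  # hypothetical output location for this sketch


class Task:
    """Minimal make-like task: declares dependencies, an output path, and run()."""

    def requires(self):
        return []  # upstream tasks

    def output(self):
        raise NotImplementedError  # path of the file this task produces

    def run(self):
        raise NotImplementedError

    def complete(self):
        return os.path.exists(self.output())


def build(task, log):
    """Run dependencies first; skip any task whose output already exists."""
    if task.complete():
        return
    for dep in task.requires():
        build(dep, log)
    task.run()
    log.append(type(task).__name__)


class SelectEvents(Task):
    def output(self):
        return os.path.join(WORKDIR, "selected.txt")

    def run(self):
        # Toy selection: keep even "event numbers".
        with open(self.output(), "w") as f:
            f.write("\n".join(str(e) for e in range(10) if e % 2 == 0))


class Histogram(Task):
    def requires(self):
        return [SelectEvents()]

    def output(self):
        return os.path.join(WORKDIR, "histogram.txt")

    def run(self):
        with open(self.requires()[0].output()) as f:
            n = len(f.read().split())
        with open(self.output(), "w") as f:
            f.write(f"entries={n}\n")
```

Calling `build(Histogram(), log)` runs `SelectEvents` and then `Histogram`; calling it again does nothing, because both outputs already exist. Deleting `histogram.txt` would rerun only the final step, which is the behaviour that replaces manual steering of individual workloads.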
Rogers, M; Zach, L; An, Y; Dalrymple, P
2012-01-01
This paper reports on work carried out to elicit information needs at a trans-disciplinary, nurse-managed health care clinic that serves a medically disadvantaged urban population. The trans-disciplinary model provides a "one-stop shop" for patients who can receive a wide range of services beyond traditional primary care. However, this model of health care presents knowledge sharing challenges because little is known about how data collected from the non-traditional services can be integrated into the traditional electronic medical record (EMR) and shared with other care providers. There is also little known about how health information technology (HIT) can be used to support the workflow in such a practice. The objective of this case study was to identify the information needs of care providers in order to inform the design of HIT to support knowledge sharing and distributed decision making. A participatory design approach is presented as a successful technique to specify requirements for HIT applications that can support a trans-disciplinary model of care. Using this design approach, the researchers identified the information needs of care providers working at the clinic and suggested HIT improvements to integrate non-traditional information into the EMR. These modifications allow knowledge sharing among care providers and support better health decisions. We have identified information needs of care providers as they are relevant to the design of health information systems. As new technology is designed and integrated into various workflows it is clear that understanding information needs is crucial to acceptance of that technology.
Systematic Redaction for Neuroimage Data
Matlock, Matt; Schimke, Nakeisha; Kong, Liang; Macke, Stephen; Hale, John
2013-01-01
In neuroscience, collaboration and data sharing are undermined by concerns over the management of protected health information (PHI) and personal identifying information (PII) in neuroimage datasets. The HIPAA Privacy Rule mandates measures for the preservation of subject privacy in neuroimaging studies. Unfortunately for the researcher, the management of information privacy is a burdensome task. Wide-scale data sharing of neuroimages is challenging for three primary reasons: (i) there is a dearth of tools to systematically expunge PHI/PII from neuroimage datasets, (ii) no facility for tracking patient identities in redacted datasets has been produced, and (iii) a sanitization workflow remains conspicuously absent. This article describes the XNAT Redaction Toolkit, an integrated redaction workflow that extends a popular neuroimage data management toolkit to remove PHI/PII from neuroimages. Quickshear defacing is also presented as a complementary technique for deidentifying the image data itself. Together, these tools improve subject privacy through the systematic removal of PHI/PII. PMID:24179597
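The identity-tracking problem in point (ii) can be illustrated by pairing header redaction with a separately held link table. In this sketch, headers are plain dictionaries keyed by DICOM attribute keywords; the list of PHI keywords is an illustrative assumption, not a complete de-identification profile or the XNAT Redaction Toolkit's actual rules, and pixel-level defacing such as Quickshear is out of scope.

```python
import secrets

# Header keywords treated as PHI/PII in this sketch. They are real DICOM
# attribute keywords, but the list is illustrative, not a complete profile.
PHI_KEYWORDS = {
    "PatientName",
    "PatientBirthDate",
    "PatientAddress",
    "ReferringPhysicianName",
    "InstitutionName",
}


class Redactor:
    def __init__(self):
        # Link table: original PatientID -> pseudonym. It is kept apart from
        # the shared data so authorised staff can still track subject identity.
        self.link_table = {}

    def pseudonym(self, patient_id):
        """Return a stable random pseudonym for a patient identifier."""
        if patient_id not in self.link_table:
            self.link_table[patient_id] = "SUBJ-" + secrets.token_hex(4)
        return self.link_table[patient_id]

    def redact(self, header):
        """Drop PHI fields and replace PatientID with its pseudonym."""
        clean = {k: v for k, v in header.items() if k not in PHI_KEYWORDS}
        if "PatientID" in clean:
            clean["PatientID"] = self.pseudonym(clean["PatientID"])
        return clean
```

The same original `PatientID` always maps to the same pseudonym, so redacted sessions from one subject stay linked, while the link table itself is never distributed with the data.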
Software Project Management and Measurement on the World-Wide-Web (WWW)
NASA Technical Reports Server (NTRS)
Callahan, John; Ramakrishnan, Sudhaka
1996-01-01
We briefly describe a system for forms-based workflow management that helps members of a software development team overcome geographical barriers to collaboration. Our system, called the Web Integrated Software Environment (WISE), is implemented as a World-Wide-Web service that supports management and measurement of software development projects based on dynamic analysis of change activity in the workflow. WISE tracks issues in a software development process, provides informal communication among users with different roles, supports to-do lists, and helps in software process improvement. WISE minimizes the time devoted to metrics collection and analysis by implicitly delivering messages between users based on the content of project documents. The use of a database in WISE is hidden from users, who view WISE as maintaining a personal 'to-do list' of tasks related to the many projects in which they may play different roles.
[Implementation of modern operating room management -- experiences at a university hospital].
Hensel, M; Wauer, H; Bloch, A; Volk, T; Kox, W J; Spies, C
2005-07-01
Because of structural changes in health care, the need for cost control is evident in all hospitals. As the operating room is one of the most cost-intensive sectors of a hospital, optimisation of workflow processes in this area is of particular interest to health care providers. While modern operating room management is already established in several clinics, others are less prepared for economic challenges. We therefore present the operating room statute of the Charité university hospital, which other hospitals may find useful in developing their own concepts. In addition, we describe experiences with the implementation of new management structures and report results obtained over the last 5 years. Whereas the total number of operative procedures increased by 15 %, operating room utilization increased more markedly in terms of both time and cases. In summary, central operating room management has proved to be an effective tool for increasing the efficiency of workflow processes in the operating room.
Distributed Data Integration Infrastructure
DOE Office of Scientific and Technical Information (OSTI.GOV)
Critchlow, T; Ludaescher, B; Vouk, M
The Internet is becoming the preferred method for disseminating scientific data from a variety of disciplines. This can result in information overload on the part of scientists, who are unable to query all of the relevant sources, even if they knew where to find them, what they contained, how to interact with them, and how to interpret the results. A related issue is that keeping up with current trends in information technology often taxes the end-user's expertise and time. Thus, instead of benefiting from this information-rich environment, scientists become experts on a small number of sources and technologies, use them almost exclusively, and develop a resistance to innovations that could enhance their productivity. Enabling information-based scientific advances, in domains such as functional genomics, requires fully utilizing all available information and the latest technologies. To address this problem, we are developing an end-user-centric, domain-sensitive, workflow-based infrastructure, shown in Figure 1, that will allow scientists to design complex scientific workflows reflecting the data manipulation their research requires without an undue burden. We are taking a three-tiered approach to designing this infrastructure, utilizing (1) abstract workflow definition, construction, and automatic deployment, (2) complex agent-based workflow execution, and (3) automatic wrapper generation. To construct a workflow, the scientist defines an abstract workflow (AWF) in terminology (semantics and context) that is familiar to him/her. This AWF includes all of the data transformations, selections, and analyses required by the scientist, but does not necessarily specify particular data sources. The AWF is then compiled into an executable workflow (EWF, in our case XPDL) that is evaluated and executed by the workflow engine.
This EWF contains references to specific data sources and interfaces capable of performing the desired actions. To provide access to the largest possible number of resources, our lowest level uses automatic wrapper generation techniques to create information and data wrappers capable of interacting with the complex interfaces typical of scientific analysis. The remainder of this document outlines our work in these three areas, the impact our work has made, and our plans for the future.
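The AWF-to-EWF compilation step can be pictured as binding each abstract operation to a concrete source and wrapper. In this hypothetical Python sketch, the registry entries, source names and output format are invented; the real system emits XPDL and handles more general workflow structures, whereas this sketch produces a linear list of step dictionaries.

```python
# Hypothetical registry mapping abstract operations (the scientist's terms)
# to concrete sources and wrapper types. All names are invented.
REGISTRY = {
    "fetch_sequences": {"source": "example-seq-db", "interface": "http-wrapper"},
    "align": {"source": "local-aligner", "interface": "cli-wrapper"},
    "annotate": {"source": "example-go-db", "interface": "sql-wrapper"},
}


def compile_workflow(abstract_steps, registry=REGISTRY):
    """Bind each abstract step to a concrete source, yielding an executable plan.

    abstract_steps: list of {"op": ..., "params": {...}} in execution order.
    Returns a linear chain in which each step runs after the previous one.
    """
    executable = []
    for i, step in enumerate(abstract_steps):
        op = step["op"]
        if op not in registry:
            # No wrapper is known for this operation; fail at compile time.
            raise KeyError(f"no wrapper registered for abstract operation {op!r}")
        binding = registry[op]
        executable.append({
            "id": f"step{i}",
            "op": op,
            "params": step.get("params", {}),
            "source": binding["source"],
            "interface": binding["interface"],
            "after": [f"step{i - 1}"] if i else [],
        })
    return executable
```

An unregistered operation fails when the workflow is compiled rather than when it executes, which is the point at which automatic wrapper generation would supply the missing binding.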
Meeker, Daniella; Jiang, Xiaoqian; Matheny, Michael E; Farcas, Claudiu; D'Arcy, Michel; Pearlman, Laura; Nookala, Lavanya; Day, Michele E; Kim, Katherine K; Kim, Hyeoneui; Boxwala, Aziz; El-Kareh, Robert; Kuo, Grace M; Resnic, Frederic S; Kesselman, Carl; Ohno-Machado, Lucila
2015-11-01
Centralized and federated models for sharing data in research networks currently exist. To perform multivariate data analysis in centralized networks, patient-level data must be transferred to a central computation resource. The authors implemented distributed multivariate models for federated networks in which patient-level data are kept at each site and data exchange policies are managed in a study-centric manner. The objective was to implement infrastructure that supports the functionality of some existing research networks (e.g., cohort discovery, workflow management, and estimation of multivariate analytic models on centralized data) while adding important new features, such as algorithms for distributed iterative multivariate models, a graphical interface for multivariate model specification, synchronous and asynchronous response to network queries, investigator-initiated studies, and study-based control of staff, protocols, and data sharing policies. Based on the requirements gathered from statisticians, administrators, and investigators from multiple institutions, the authors developed infrastructure and tools to support multisite comparative effectiveness studies using web services for multivariate statistical estimation in the SCANNER federated network. The authors implemented massively parallel (map-reduce) computation methods and a new policy management system to enable each study initiated by network participants to define the ways in which data may be processed, managed, queried, and shared. The authors illustrated the use of these systems among institutions with highly different policies and operating under different state laws. Federated research networks need not limit distributed query functionality to count queries, cohort discovery, or independently estimated analytic models.
Multivariate analyses can be efficiently and securely conducted without patient-level data transport, allowing institutions with strict local data storage requirements to participate in sophisticated analyses based on federated research networks. © The Author 2015. Published by Oxford University Press on behalf of the American Medical Informatics Association.
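The key idea, that an iterative multivariate model can be fitted without moving patient-level rows, can be sketched for logistic regression with Newton-Raphson: each site returns only its gradient and Hessian contributions, and a coordinator sums them and updates the coefficients. This is a minimal pure-Python illustration under that assumption, not the SCANNER implementation or its web-service and policy layers.

```python
import math


def site_statistics(X, y, beta):
    """Run locally at one site: gradient and X^T W X of the logistic log-likelihood.

    Only these p-vector and p-by-p summaries leave the site, never patient rows.
    """
    p = len(beta)
    grad = [0.0] * p
    info = [[0.0] * p for _ in range(p)]  # X^T W X (negative Hessian)
    for xi, yi in zip(X, y):
        eta = sum(b * x for b, x in zip(beta, xi))
        mu = 1.0 / (1.0 + math.exp(-eta))
        w = mu * (1.0 - mu)
        for j in range(p):
            grad[j] += (yi - mu) * xi[j]
            for k in range(p):
                info[j][k] += w * xi[j] * xi[k]
    return grad, info


def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][c] * x[c] for c in range(i + 1, n))) / M[i][i]
    return x


def federated_logistic(sites, p, iters=25, tol=1e-10):
    """Coordinator loop: sum site summaries and take Newton steps."""
    beta = [0.0] * p
    for _ in range(iters):
        grad = [0.0] * p
        info = [[0.0] * p for _ in range(p)]
        for X, y in sites:  # in practice: one network request per site
            g, h = site_statistics(X, y, beta)
            for j in range(p):
                grad[j] += g[j]
                for k in range(p):
                    info[j][k] += h[j][k]
        step = solve(info, grad)  # Newton step: (X^T W X)^-1 * gradient
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta
```

Because the log-likelihood is a sum over patients, the summed site summaries equal the pooled ones, so the federated fit matches a fit on pooled data up to floating-point rounding.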
The “Common Solutions” Strategy of the Experiment Support group at CERN for the LHC Experiments
NASA Astrophysics Data System (ADS)
Girone, M.; Andreeva, J.; Barreiro Megino, F. H.; Campana, S.; Cinquilli, M.; Di Girolamo, A.; Dimou, M.; Giordano, D.; Karavakis, E.; Kenyon, M. J.; Kokozkiewicz, L.; Lanciotti, E.; Litmaath, M.; Magini, N.; Negri, G.; Roiser, S.; Saiz, P.; Saiz Santos, M. D.; Schovancova, J.; Sciabà, A.; Spiga, D.; Trentadue, R.; Tuckett, D.; Valassi, A.; Van der Ster, D. C.; Shiers, J. D.
2012-12-01
After two years of LHC data taking, processing and analysis and with numerous changes in computing technology, a number of aspects of the experiments’ computing, as well as WLCG deployment and operations, need to evolve. As part of the activities of the Experiment Support group in CERN's IT department, and reinforced by effort from the EGI-InSPIRE project, we present work aimed at common solutions across all LHC experiments. Such solutions not only allow us to optimize development manpower but also offer lower long-term maintenance and support costs. The main areas cover Distributed Data Management, Data Analysis, Monitoring and the LCG Persistency Framework. Specific tools have been developed, including the HammerCloud framework, automated services for data placement, data cleaning and data integrity (such as the data popularity service for CMS, the common Victor cleaning agent for ATLAS and CMS and tools for catalogue/storage consistency), the Dashboard Monitoring framework (job monitoring, data management monitoring, File Transfer monitoring) and the Site Status Board. This talk focuses primarily on the strategic aspects of providing such common solutions and how this relates to the overall goals of long-term sustainability and the relationship to the various WLCG Technical Evolution Groups. The success of the service components has given us confidence in the process and has earned the trust of the stakeholders. We are now attempting to expand the development of common solutions into the more critical workflows. The first is a feasibility study of common analysis workflow execution elements between ATLAS and CMS. We look forward to additional common development in the future.
NASA Astrophysics Data System (ADS)
Zhang, Fan; Zhou, Zude; Liu, Quan; Xu, Wenjun
2017-02-01
Because they can function under harsh environmental conditions and serve as distributed condition information sources in a networked monitoring system, fibre Bragg grating (FBG) sensor networks have attracted considerable attention for online equipment condition monitoring. To provide an overall view of the operating condition of mechanical equipment, a networked, service-oriented condition monitoring framework based on FBG sensing is proposed, together with an intelligent matching method for supporting monitoring service management. In the novel framework, three classes of progressive service matching approaches, namely service-chain knowledge database service matching, multi-objective constrained service matching and workflow-driven human-interactive service matching, are developed and integrated with an enhanced particle swarm optimisation (PSO) algorithm as well as a workflow-driven mechanism. Moreover, the manufacturing domain ontology, FBG sensor network structure and monitoring object are considered to facilitate the automatic matching of condition monitoring services and to overcome the limitations of traditional service processing methods. The experimental results demonstrate that FBG monitoring services can be selected intelligently and that the developed condition monitoring system can be rebuilt rapidly as new equipment joins the framework. The effectiveness of the service matching method is also verified by implementing a prototype system and analysing its performance.
Accelerating Cancer Systems Biology Research through Semantic Web Technology
Wang, Zhihui; Sagotsky, Jonathan; Taylor, Thomas; Shironoshita, Patrick; Deisboeck, Thomas S.
2012-01-01
Cancer systems biology is an interdisciplinary, rapidly expanding research field in which collaborations are a critical means to advance the field. Yet the prevalent database technologies often isolate data rather than making it easily accessible. The Semantic Web has the potential to help facilitate web-based collaborative cancer research by presenting data in a manner that is self-descriptive, human and machine readable, and easily sharable. We have created a semantically linked online Digital Model Repository (DMR) for storing, managing, executing, annotating, and sharing computational cancer models. Within the DMR, distributed, multidisciplinary, and inter-organizational teams can collaborate on projects, without forfeiting intellectual property. This is achieved by the introduction of a new stakeholder to the collaboration workflow, the institutional licensing officer, part of the Technology Transfer Office. Furthermore, the DMR has achieved silver level compatibility with the National Cancer Institute’s caBIG®, so users can interact with the DMR not only through a web browser but also through a semantically annotated and secure web service. We also discuss the technology behind the DMR leveraging the Semantic Web, ontologies, and grid computing to provide secure inter-institutional collaboration on cancer modeling projects, online grid-based execution of shared models, and the collaboration workflow protecting researchers’ intellectual property. PMID:23188758
Accelerating cancer systems biology research through Semantic Web technology.
Wang, Zhihui; Sagotsky, Jonathan; Taylor, Thomas; Shironoshita, Patrick; Deisboeck, Thomas S
2013-01-01
Cancer systems biology is an interdisciplinary, rapidly expanding research field in which collaborations are a critical means to advance the field. Yet the prevalent database technologies often isolate data rather than making it easily accessible. The Semantic Web has the potential to help facilitate web-based collaborative cancer research by presenting data in a manner that is self-descriptive, human and machine readable, and easily sharable. We have created a semantically linked online Digital Model Repository (DMR) for storing, managing, executing, annotating, and sharing computational cancer models. Within the DMR, distributed, multidisciplinary, and inter-organizational teams can collaborate on projects, without forfeiting intellectual property. This is achieved by the introduction of a new stakeholder to the collaboration workflow, the institutional licensing officer, part of the Technology Transfer Office. Furthermore, the DMR has achieved silver level compatibility with the National Cancer Institute's caBIG, so users can interact with the DMR not only through a web browser but also through a semantically annotated and secure web service. We also discuss the technology behind the DMR leveraging the Semantic Web, ontologies, and grid computing to provide secure inter-institutional collaboration on cancer modeling projects, online grid-based execution of shared models, and the collaboration workflow protecting researchers' intellectual property. Copyright © 2012 Wiley Periodicals, Inc.
Designing integrated computational biology pipelines visually.
Jamil, Hasan M
2013-01-01
The long-term cost of developing and maintaining a computational pipeline that depends upon data integration and sophisticated workflow logic is too high to even contemplate "what if" or ad hoc queries. In this paper, we introduce a novel application-building interface for computational biology research, called VizBuilder, which leverages BioFlow, a recent query language for life sciences databases. Using VizBuilder, it is now possible to develop complex ad hoc computational biology applications at throwaway cost. The underlying query language supports data integration and workflow construction almost transparently and fully automatically, using a best-effort approach. Users express their application by drawing it with VizBuilder icons and connecting them in a meaningful way. Completed applications are compiled and translated into BioFlow queries for execution by the data management system LifeDB, for which VizBuilder serves as a front end. We discuss VizBuilder features and functionalities in the context of a real-life application after we briefly introduce BioFlow. The architecture and design principles of VizBuilder are also discussed. Finally, we outline future extensions of VizBuilder. To our knowledge, VizBuilder is a unique system that allows visually designing computational biology pipelines involving distributed and heterogeneous resources in an ad hoc manner.
Task–Technology Fit of Video Telehealth for Nurses in an Outpatient Clinic Setting
Finkelstein, Stanley M.
2014-01-01
Background: Incorporating telehealth into outpatient care delivery supports management of consumer health between clinic visits. Task–technology fit is a framework for understanding how technology helps and/or hinders a person during work processes. Evaluating the task–technology fit of video telehealth for personnel working in a pediatric outpatient clinic and providing care between clinic visits ensures the information provided matches the information needed to support work processes. Materials and Methods: The workflow of advanced practice registered nurse (APRN) care coordination provided via telephone and video telehealth was described and measured using a mixed-methods workflow analysis protocol that incorporated cognitive ethnography and time–motion study. Qualitative and quantitative results were merged and analyzed within the task–technology fit framework to determine the workflow fit of video telehealth for APRN care coordination. Results: Incorporating video telehealth into APRN care coordination workflow provided visual information unavailable during telephone interactions. Despite additional tasks and interactions needed to obtain the visual information, APRN workflow efficiency, as measured by time, was not significantly changed. Analyzed within the task–technology fit framework, the increased visual information afforded by video telehealth supported the assessment and diagnostic information needs of the APRN. Conclusions: Telehealth must provide the right information to the right clinician at the right time. Evaluating task–technology fit using a mixed-methods protocol ensured rigorous analysis of fit within work processes and identified workflows that benefit most from the technology. PMID:24841219
A Mixed-Methods Research Framework for Healthcare Process Improvement.
Bastian, Nathaniel D; Munoz, David; Ventura, Marta
2016-01-01
The healthcare system in the United States is spiraling out of control due to ever-increasing costs without significant improvements in quality, access to care, satisfaction, and efficiency. Efficient workflow is paramount to improving healthcare value while maintaining the utmost standards of patient care and provider satisfaction in high stress environments. This article provides healthcare managers and quality engineers with a practical healthcare process improvement framework to assess, measure and improve clinical workflow processes. The proposed mixed-methods research framework integrates qualitative and quantitative tools to foster the improvement of processes and workflow in a systematic way. The framework consists of three distinct phases: 1) stakeholder analysis, 2a) survey design, 2b) time-motion study, and 3) process improvement. The proposed framework is applied to the pediatric intensive care unit of the Penn State Hershey Children's Hospital. The implementation of this methodology led to identification and categorization of different workflow tasks and activities into both value-added and non-value added in an effort to provide more valuable and higher quality patient care. Based upon the lessons learned from the case study, the three-phase methodology provides a better, broader, leaner, and more holistic assessment of clinical workflow. The proposed framework can be implemented in various healthcare settings to support continuous improvement efforts in which complexity is a daily element that impacts workflow. We proffer a general methodology for process improvement in a healthcare setting, providing decision makers and stakeholders with a useful framework to help their organizations improve efficiency. Published by Elsevier Inc.
Beck, Marcus W.; Vondracek, Bruce C.; Hatch, Lorin K.; Vinje, Jason
2013-01-01
Lake resources can be negatively affected by environmental stressors originating from multiple sources and different spatial scales. Shoreline development, in particular, can negatively affect lake resources through decline in habitat quality, physical disturbance, and impacts on fisheries. The development of remote sensing techniques that efficiently characterize shoreline development in a regional context could greatly improve management approaches for protecting and restoring lake resources. The goal of this study was to develop an approach using high-resolution aerial photographs to quantify and assess docks as indicators of shoreline development. First, we describe a dock analysis workflow that can be used to quantify the spatial extent of docks using aerial images. Our approach combines pixel-based classifiers with object-based techniques to effectively analyze high-resolution digital imagery. Second, we apply the analysis workflow to quantify docks for 4261 lakes managed by the Minnesota Department of Natural Resources. Overall accuracy of the analysis results was 98.4% (87.7% based on Cohen's kappa) after manual post-processing. The analysis workflow was also 74% more efficient than manual digitization of docks in terms of the time required. These analyses have immediate relevance for resource planning in Minnesota, and the dock analysis workflow could be used to quantify shoreline development in other regions with comparable imagery. These data can also be used to better understand the effects of shoreline development on aquatic resources and to evaluate the effects of shoreline development relative to other stressors.
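The pairing of a pixel-based classifier with an object-based step can be illustrated with a minimal sketch. It assumes a single-band image and a hypothetical brightness threshold for the pixel step. It then groups 4-connected candidate pixels into objects and discards those below a hypothetical minimum size. The study's actual classifiers and object rules are not described in the abstract.

```python
from collections import deque


def classify_pixels(image, threshold):
    """Pixel-based step (hypothetical rule): a pixel is a dock candidate if bright enough."""
    return [[value >= threshold for value in row] for row in image]


def extract_objects(mask, min_pixels):
    """Object-based step: group 4-connected candidate pixels and drop small objects."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    objects = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill collects one connected component.
            component, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if 0 <= ny < rows and 0 <= nx < cols and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Hypothetical size rule: tiny objects are treated as noise.
            if len(component) >= min_pixels:
                objects.append(component)
    return objects


image = [
    [10, 200, 210, 10, 10],
    [10, 10, 205, 10, 220],
    [10, 10, 10, 10, 10],
]
docks = extract_objects(classify_pixels(image, threshold=150), min_pixels=2)
# One object of three pixels; the isolated bright pixel at row 1, column 4 is dropped.
```

Dropping the isolated bright pixel mirrors, in miniature, how an object-level rule can clean up pixel-level noise.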
Analysis, Mining and Visualization Service at NCSA
NASA Astrophysics Data System (ADS)
Wilhelmson, R.; Cox, D.; Welge, M.
2004-12-01
NCSA's goal is to create a balanced system that fully supports high-end computing as well as: 1) high-end data management and analysis; 2) visualization of massive, highly complex data collections; 3) large databases; 4) geographically distributed Grid computing; and 5) collaboratories, all based on a secure computational environment and driven by workflow-based services. To this end NCSA has defined a new technology path that includes the integration and provision of cyberservices in support of data analysis, mining, and visualization. NCSA has begun to develop and apply a data mining system, NCSA Data-to-Knowledge (D2K), in conjunction with both the application and research communities. NCSA D2K will enable the formation of model-based application workflows and visual programming interfaces for rapid data analysis. The Java-based D2K framework, which integrates analytical data mining methods with data management, data transformation, and information visualization tools, will be configurable from the cyberservices (web and grid services, tools, etc.) viewpoint to solve a wide range of important data mining problems. This effort will use modules, such as new classification methods for the detection of high-risk geoscience events, and existing D2K data management, machine learning, and information visualization modules. A D2K cyberservices interface will be developed to seamlessly connect client applications with remote back-end D2K servers, providing computational resources for data mining and integration with local or remote data stores. This work is being coordinated with SDSC's data and services efforts. The new NCSA Visualization embedded workflow environment (NVIEW) will be integrated with D2K functionality to tightly couple informatics and scientific visualization with the data analysis and management services.
Visualization services will access and filter disparate data sources, simplifying tasks such as fusing related data from distinct sources into a coherent visual representation. This approach enables collaboration among geographically dispersed researchers via portals and front-end clients, and the coupling with data management services enables recording associations among datasets and building annotation systems into visualization tools and portals, giving scientists a persistent, shareable, virtual lab notebook. To facilitate provision of these cyberservices to the national community, NCSA will be providing a computational environment for large-scale data assimilation, analysis, mining, and visualization. This will initially be implemented on the new 512-processor shared-memory SGI systems recently purchased by NCSA. In addition to standard batch capabilities, NCSA will provide on-demand capabilities for projects requiring rapid response for decision makers (e.g., during the development of severe weather or earthquake events). It will also be used for non-sequential interactive analysis of data sets where it is important to have access to large data volumes over space and time.
[Applications of the hospital statistics management system].
Zhai, Hong; Ren, Yong; Liu, Jing; Li, You-Zhang; Ma, Xiao-Long; Jiao, Tao-Tao
2008-01-01
The Hospital Statistics Management System is built on the Office Automation Platform of the Shandong provincial hospital system. Its workflow, role, and permission-management technologies are used to standardize and optimize the statistics management process within the total quality control of hospital statistics. The system combines the office automation platform with hospital statistics management, providing a practical example of a modern hospital statistics management model.
NASA Langley Atmospheric Science Data Center (ASDC) Experience with Aircraft Data
NASA Astrophysics Data System (ADS)
Perez, J.; Sorlie, S.; Parker, L.; Mason, K. L.; Rinsland, P.; Kusterer, J.
2011-12-01
Over the past decade the NASA Langley ASDC has archived and distributed a variety of aircraft mission data sets. These datasets posed unique archiving challenges, ranging from the rigidity of the archiving system and formats to the lack of metadata. The ASDC developed a state-of-the-art data archive and distribution system to serve the atmospheric sciences data provider and researcher communities. The system, called Archive - Next Generation (ANGe), is designed with a distributed, multi-tier, service-based, message-oriented architecture enabling new methods for searching, accessing, and customizing data. The ANGe system provides the ease and flexibility to ingest and archive aircraft data through an ad hoc workflow or to develop a new workflow to suit the providers' needs. The ASDC will describe the challenges encountered in preparing aircraft data for archiving and distribution. The ASDC is currently providing guidance to the DISCOVER-AQ (Deriving Information on Surface Conditions from Column and Vertically Resolved Observations Relevant to Air Quality) Earth Venture-1 project on developing collection, granule, and browse metadata as well as supporting the ADAM (Airborne Data For Assessing Models) site.
2013-01-01
Background We introduce a Knowledge-based Decision Support System (KDSS) to address the problem of protein complex extraction. Using a Knowledge Base (KB) that encodes expertise about the proposed scenario, our KDSS is able to suggest both strategies and tools according to the features of the input dataset. Our system provides a navigable workflow for the current experiment, and it also supports the configuration and running of every processing component of that workflow. This last feature makes our system a crossover between classical DSS and Workflow Management Systems. Results We briefly present the KDSS' architecture and the basic concepts used in the design of the knowledge base and the reasoning component. The system is then tested using a subset of the Saccharomyces cerevisiae protein-protein interaction dataset. We used this subset because it has been well studied in the literature by several research groups in the field of complex extraction, so we could easily compare the results obtained through our KDSS with theirs. Our system suggests both a preprocessing and a clustering strategy, and for each of them it proposes and eventually runs suitable algorithms. Our system's final results consist of a workflow of tasks, which can be reused for other experiments, and the specific numerical results for that particular trial. Conclusions The proposed approach, using the KDSS' knowledge base, provides a novel workflow that gives the best results compared with the other workflows produced by the system. This workflow and its numeric results have been compared with other approaches to PPI network analysis found in the literature, offering similar results. PMID:23368995
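A knowledge base that maps input-dataset features to suggested processing steps can be sketched as a small rule table. Everything here is hypothetical: the feature names (`noisy`, `weighted`), the rule contents and the step names are illustrative placeholders, not the KDSS's actual knowledge base or the tools it recommends. The sketch only shows how first-match rules per phase can assemble a reusable preprocessing-then-clustering workflow.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Features = Dict[str, object]


@dataclass
class Rule:
    """One knowledge-base entry: if the condition holds, suggest a step for a phase."""
    phase: str                           # e.g. "preprocessing" or "clustering"
    suggestion: str                      # hypothetical strategy/tool name
    condition: Callable[[Features], bool]


# Hypothetical knowledge base; the real KDSS rules are not given in the abstract.
KNOWLEDGE_BASE: List[Rule] = [
    Rule("preprocessing", "filter_low_confidence_edges", lambda f: bool(f.get("noisy", False))),
    Rule("preprocessing", "keep_all_edges", lambda f: not f.get("noisy", False)),
    Rule("clustering", "weighted_graph_clustering", lambda f: bool(f.get("weighted", False))),
    Rule("clustering", "unweighted_graph_clustering", lambda f: not f.get("weighted", False)),
]


def suggest_workflow(features: Features,
                     phases=("preprocessing", "clustering")) -> List[str]:
    """Build an ordered workflow by taking the first matching rule for each phase."""
    workflow = []
    for phase in phases:
        for rule in KNOWLEDGE_BASE:
            if rule.phase == phase and rule.condition(features):
                workflow.append(rule.suggestion)
                break
    return workflow


plan = suggest_workflow({"noisy": True, "weighted": False})
# plan == ['filter_low_confidence_edges', 'unweighted_graph_clustering']
```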
Distributed digital music archives and libraries
NASA Astrophysics Data System (ADS)
Fujinaga, Ichiro
2005-09-01
The main goal of this research program is to develop and evaluate practices, frameworks, and tools for the design and construction of worldwide distributed digital music archives and libraries. Over the last few millennia, humans have amassed an enormous amount of musical information that is scattered around the world. It is becoming abundantly clear that the optimal path for acquisition is to distribute the task of digitizing the wealth of historical and cultural heritage material that exists in analogue formats, which may include books and manuscripts related to music, music scores, photographs, videos, audio tapes, and phonograph records. In order to achieve this goal, libraries, museums, and archives throughout the world, large or small, need well-researched policies, proper guidance, and efficient tools to digitize their collections and to make them available economically. The research conducted within the program addresses unique and imminent challenges posed by the digitization and dissemination of music media. There are four major research projects in progress: development and evaluation of digitization methods for preservation of analogue recordings; optical music recognition using microfilms; design of a workflow management system with automatic metadata extraction; and formulation of interlibrary communication strategies.
Leveraging workflow control patterns in the domain of clinical practice guidelines.
Kaiser, Katharina; Marcos, Mar
2016-02-10
Clinical practice guidelines (CPGs) include recommendations describing appropriate care for the management of patients with a specific clinical condition. A number of representation languages have been developed to support executable CPGs, with associated authoring/editing tools. Even with tool assistance, authoring of CPG models is a labor-intensive task. We aim to facilitate the early stages of the CPG modeling task. In this context, we propose to support the authoring of CPG models based on a set of suitable procedural patterns described in an implementation-independent notation that can then be semi-automatically transformed into one of the alternative executable CPG languages. We have started with the workflow control patterns which have been identified in the fields of workflow systems and business process management. We have analyzed the suitability of these patterns by means of a qualitative analysis of CPG texts. Following our analysis we have implemented a selection of workflow patterns in the Asbru and PROforma CPG languages. As the implementation-independent notation for the description of patterns we have chosen BPMN 2.0. Finally, we have developed XSLT transformations to convert the BPMN 2.0 version of the patterns into the Asbru and PROforma languages. We showed that although a significant number of workflow control patterns are suitable to describe CPG procedural knowledge, not all of them are applicable in the context of CPGs, because CPGs focus on the care of a single patient. Moreover, CPGs may require additional patterns not included in the set of workflow control patterns. We also showed that nearly all the CPG-suitable patterns can be conveniently implemented in the Asbru and PROforma languages. Finally, we demonstrated that individual patterns can be semi-automatically transformed from a process specification in BPMN 2.0 to executable implementations in these languages. We propose a pattern and transformation-based approach for the development of CPG models.
Such an approach can form the basis of a valid framework for the authoring of CPG models. The identification of adequate patterns and the implementation of transformations to convert patterns from a process specification into different executable implementations are the first necessary steps for our approach.
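Before a BPMN 2.0 pattern can be transformed into Asbru or PROforma, it has to be recognised in the process model. The paper uses XSLT for the transformation; the sketch below swaps in Python's `xml.etree.ElementTree` and covers only the recognition step for one workflow control pattern, the parallel split (a `parallelGateway` with more than one outgoing `sequenceFlow`). It does not emit Asbru or PROforma, and the clinical task names in the example are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace URI of the BPMN 2.0 model schema.
BPMN_NS = {"bpmn": "http://www.omg.org/spec/BPMN/20100524/MODEL"}


def find_parallel_splits(bpmn_xml: str) -> dict:
    """Return {gateway_id: [target ids]} for parallel gateways with >1 outgoing flow."""
    root = ET.fromstring(bpmn_xml)
    # Map each source element id to the targets of its outgoing sequence flows.
    outgoing = {}
    for flow in root.iterfind(".//bpmn:sequenceFlow", BPMN_NS):
        outgoing.setdefault(flow.get("sourceRef"), []).append(flow.get("targetRef"))
    splits = {}
    for gateway in root.iterfind(".//bpmn:parallelGateway", BPMN_NS):
        targets = outgoing.get(gateway.get("id"), [])
        if len(targets) > 1:
            splits[gateway.get("id")] = targets
    return splits


# Hypothetical guideline fragment: after assessment, two activities run in parallel.
DOC = """<definitions xmlns="http://www.omg.org/spec/BPMN/20100524/MODEL">
  <process id="guideline">
    <startEvent id="start"/>
    <parallelGateway id="split"/>
    <task id="order_labs"/>
    <task id="prescribe"/>
    <sequenceFlow id="f1" sourceRef="start" targetRef="split"/>
    <sequenceFlow id="f2" sourceRef="split" targetRef="order_labs"/>
    <sequenceFlow id="f3" sourceRef="split" targetRef="prescribe"/>
  </process>
</definitions>"""

splits = find_parallel_splits(DOC)
# splits == {'split': ['order_labs', 'prescribe']}
```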
Assembling Large, Multi-Sensor Climate Datasets Using the SciFlo Grid Workflow System
NASA Astrophysics Data System (ADS)
Wilson, B. D.; Manipon, G.; Xing, Z.; Fetzer, E.
2008-12-01
NASA's Earth Observing System (EOS) is the world's most ambitious facility for studying global climate change. The mandate now is to combine measurements from the instruments on the A-Train platforms (AIRS, AMSR-E, MODIS, MISR, MLS, and CloudSat) and other Earth probes to enable large-scale studies of climate change over periods of years to decades. However, moving from predominantly single-instrument studies to a multi-sensor, measurement-based model for long-duration analysis of important climate variables presents serious challenges for large-scale data mining and data fusion. For example, one might want to compare temperature and water vapor retrievals from one instrument (AIRS) to another instrument (MODIS), and to a model (ECMWF), stratify the comparisons using a classification of the cloud scenes from CloudSat, and repeat the entire analysis over years of AIRS data. To perform such an analysis, one must discover & access multiple datasets from remote sites, find the space/time matchups between instrument swaths and model grids, understand the quality flags and uncertainties for retrieved physical variables, and assemble merged datasets for further scientific and statistical analysis. To meet these large-scale challenges, we are utilizing a Grid computing and dataflow framework, named SciFlo, in which we are deploying a set of versatile and reusable operators for data query, access, subsetting, co-registration, mining, fusion, and advanced statistical analysis. SciFlo is a semantically-enabled ("smart") Grid Workflow system that ties together a peer-to-peer network of computers into an efficient engine for distributed computation. The SciFlo workflow engine enables scientists to do multi-instrument Earth Science by assembling remotely-invokable Web Services (SOAP or http GET URLs), native executables, command-line scripts, and Python codes into a distributed computing flow.
A scientist visually authors the graph of operations in the VizFlow GUI, or uses a text editor to modify the simple XML workflow documents. The SciFlo client & server engines optimize the execution of such distributed workflows and allow the user to transparently find and use datasets and operators without worrying about the actual location of the Grid resources. The engine transparently moves data to the operators, and moves operators to the data (on the dozen trusted SciFlo nodes). SciFlo also deploys a variety of Data Grid services to: query datasets in space and time, locate & retrieve on-line data granules, provide on-the-fly variable and spatial subsetting, and perform pairwise instrument matchups for A-Train datasets. These services are combined into efficient workflows to assemble the desired large-scale, merged climate datasets. SciFlo is currently being applied in several large climate studies: comparisons of aerosol optical depth between MODIS, MISR, AERONET ground network, and U. Michigan's IMPACT aerosol transport model; characterization of long-term biases in microwave and infrared instruments (AIRS, MLS) by comparisons to GPS temperature retrievals accurate to 0.1 K; and construction of a decade-long, multi-sensor water vapor climatology stratified by classified cloud scene by bringing together datasets from AIRS/AMSU, AMSR-E, MLS, MODIS, and CloudSat (NASA MEASUREs grant, Fetzer PI). The presentation will discuss the SciFlo technologies, their application in these distributed workflows, and the many challenges encountered in assembling and analyzing these massive datasets.
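Finding space/time matchups between instrument swaths can be illustrated with a brute-force sketch. It assumes each observation is a (latitude, longitude, time-in-minutes) tuple, a spherical Earth for the haversine distance, and hypothetical tolerances `max_km` and `max_minutes`. It is not SciFlo's matchup operator, and a production system would use a spatial index rather than comparing every pair.

```python
import math
from typing import List, Optional, Tuple

# (latitude_deg, longitude_deg, time_minutes) for one footprint or grid cell.
Obs = Tuple[float, float, float]

EARTH_RADIUS_KM = 6371.0  # mean radius; spherical-Earth assumption


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points on a spherical Earth."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def matchup(a: List[Obs], b: List[Obs], max_km: float,
            max_minutes: float) -> List[Tuple[int, Optional[int]]]:
    """Pair each observation in a with the closest b observation inside both tolerances."""
    pairs = []
    for i, (lat, lon, t) in enumerate(a):
        best_j, best_d = None, float("inf")
        for j, (lat2, lon2, t2) in enumerate(b):
            if abs(t - t2) > max_minutes:
                continue  # outside the time window
            d = haversine_km(lat, lon, lat2, lon2)
            if d <= max_km and d < best_d:
                best_j, best_d = j, d
        pairs.append((i, best_j))  # None means no matchup was found
    return pairs


swath_a = [(10.0, 20.0, 0.0), (50.0, 50.0, 0.0)]
swath_b = [(10.05, 20.0, 5.0), (10.0, 20.0, 120.0), (-30.0, 0.0, 0.0)]
pairs = matchup(swath_a, swath_b, max_km=25.0, max_minutes=30.0)
# pairs == [(0, 0), (1, None)]
```

In the example, the co-located point 120 minutes later is rejected by the time window, so the first observation matches the nearby point about 5.6 km away instead.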
Assembling Large, Multi-Sensor Climate Datasets Using the SciFlo Grid Workflow System
NASA Astrophysics Data System (ADS)
Wilson, B.; Manipon, G.; Xing, Z.; Fetzer, E.
2009-04-01
NASA's Earth Observing System (EOS) is an ambitious facility for studying global climate change. The mandate now is to combine measurements from the instruments on the "A-Train" platforms (AIRS, AMSR-E, MODIS, MISR, MLS, and CloudSat) and other Earth probes to enable large-scale studies of climate change over periods of years to decades. However, moving from predominantly single-instrument studies to a multi-sensor, measurement-based model for long-duration analysis of important climate variables presents serious challenges for large-scale data mining and data fusion. For example, one might want to compare temperature and water vapor retrievals from one instrument (AIRS) to another instrument (MODIS), and to a model (ECMWF), stratify the comparisons using a classification of the "cloud scenes" from CloudSat, and repeat the entire analysis over years of AIRS data. To perform such an analysis, one must discover & access multiple datasets from remote sites, find the space/time "matchups" between instrument swaths and model grids, understand the quality flags and uncertainties for retrieved physical variables, assemble merged datasets, and compute fused products for further scientific and statistical analysis. To meet these large-scale challenges, we are utilizing a Grid computing and dataflow framework, named SciFlo, in which we are deploying a set of versatile and reusable operators for data query, access, subsetting, co-registration, mining, fusion, and advanced statistical analysis. SciFlo is a semantically-enabled ("smart") Grid Workflow system that ties together a peer-to-peer network of computers into an efficient engine for distributed computation. The SciFlo workflow engine enables scientists to do multi-instrument Earth Science by assembling remotely-invokable Web Services (SOAP or http GET URLs), native executables, command-line scripts, and Python codes into a distributed computing flow.
A scientist visually authors the graph of operations in the VizFlow GUI, or uses a text editor to modify the simple XML workflow documents. The SciFlo client & server engines optimize the execution of such distributed workflows and allow the user to transparently find and use datasets and operators without worrying about the actual location of the Grid resources. The engine transparently moves data to the operators, and moves operators to the data (on the dozen trusted SciFlo nodes). SciFlo also deploys a variety of Data Grid services to: query datasets in space and time, locate & retrieve on-line data granules, provide on-the-fly variable and spatial subsetting, perform pairwise instrument matchups for A-Train datasets, and compute fused products. These services are combined into efficient workflows to assemble the desired large-scale, merged climate datasets. SciFlo is currently being applied in several large climate studies: comparisons of aerosol optical depth between MODIS, MISR, AERONET ground network, and U. Michigan's IMPACT aerosol transport model; characterization of long-term biases in microwave and infrared instruments (AIRS, MLS) by comparisons to GPS temperature retrievals accurate to 0.1 K; and construction of a decade-long, multi-sensor water vapor climatology stratified by classified cloud scene by bringing together datasets from AIRS/AMSU, AMSR-E, MLS, MODIS, and CloudSat (NASA MEASUREs grant, Fetzer PI). The presentation will discuss the SciFlo technologies, their application in these distributed workflows, and the many challenges encountered in assembling and analyzing these massive datasets.
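The idea of assembling operators into a dataflow graph and executing them in dependency order can be sketched with Python's standard `graphlib` (Python 3.9+). This is a single-process illustration under stated assumptions. The workflow is a Python dict rather than SciFlo's XML document, the operator names are hypothetical, and the "retrievals" are made-up toy numbers, not instrument data. SciFlo itself additionally distributes operators across Grid nodes.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Tuple

# Each node: (operator function, names of upstream nodes whose outputs it takes).
Workflow = Dict[str, Tuple[Callable, List[str]]]


def run_workflow(workflow: Workflow) -> Dict[str, object]:
    """Run every operator once, after all of its upstream operators have run."""
    graph = {name: set(deps) for name, (_, deps) in workflow.items()}
    results: Dict[str, object] = {}
    for name in TopologicalSorter(graph).static_order():
        func, deps = workflow[name]
        # Upstream outputs are passed positionally in the order they are listed.
        results[name] = func(*(results[d] for d in deps))
    return results


# Hypothetical operators on toy numbers (not real AIRS/MODIS retrievals).
toy_flow: Workflow = {
    "query_airs": (lambda: [280.0, 285.0, 290.0], []),
    "query_modis": (lambda: [281.0, 284.0, 291.0], []),
    "difference": (lambda a, m: [x - y for x, y in zip(a, m)],
                   ["query_airs", "query_modis"]),
    "mean_bias": (lambda d: sum(d) / len(d), ["difference"]),
}
out = run_workflow(toy_flow)
# out["difference"] == [-1.0, 1.0, -1.0]
```

Because the graph, not the code order, determines execution, adding a new operator only means adding one entry with its upstream names.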
Model medication management process in Australian nursing homes using business process modeling.
Qian, Siyu; Yu, Ping
2013-01-01
One reason end users avoid or reject health information systems is poor alignment of the system with healthcare workflow, likely caused by system designers' lack of a thorough understanding of healthcare processes. Therefore, understanding the healthcare workflow is the essential first step in designing optimal technologies that will enable care staff to complete the intended tasks faster and better. The frequent use of multiple or "high risk" medicines by older people in nursing homes has the potential to increase the medication error rate. To facilitate the design of information systems with the most potential to improve patient safety, this study aims to understand the medication management process in nursing homes using a business process modeling method. The paper presents the study design and preliminary findings from interviews with two registered nurses who were team leaders in two nursing homes. Although there were subtle differences in medication management between the two homes, the major medication management activities were similar. Further field observation will be conducted. Based on the data collected from observations, an as-is process model for medication management will be developed.
ERIC Educational Resources Information Center
Murray, Adam
2008-01-01
Designed to assist with the management of e-resources, electronic resource management (ERM) systems are time- and fund-consuming to purchase and maintain. Questions of system compatibility, data population, and workflow design/redesign can be difficult to answer; sometimes those answers are not what we'd prefer to hear. The two primary functions…
NASA Astrophysics Data System (ADS)
Verkaik, J.
2013-12-01
The Netherlands Hydrological Instrument (NHI) model predicts water demands in periods of drought, supporting the Dutch decision makers in taking operational as well as long-term decisions with respect to the water supply. Other applications of NHI are predicting fresh-salt interaction, nutrient loadings, and agriculture change. The NHI model consists of several coupled models: a saturated groundwater model (MODFLOW), an unsaturated groundwater model (MetaSWAP), a sub-catchment surface water model (MOZART), and a distribution network of surface waters model (DM/SOBEK). Each of these models requires specific, usually large, input data that may be the result of sophisticated schematization workflows. Input data can also depend on each other; for example, the precipitation data are input to the unsaturated zone model (cells) as well as to the surface water models (polygons). For efficient data management, we developed several Python tools so that the modeler or stakeholder can use the model in a user-friendly manner, and data are managed in a consistent, transparent and reproducible way. Two open source Python tools are presented here: FileSync, a data version control module for the workflow manager VisTrails, and the NHI model control script that uses FileSync. VisTrails is an open-source scientific workflow and provenance management system that provides support for simulations, data exploration and visualization. Since VisTrails does not directly support version control, we developed a version control module called FileSync. With this generic module, the user can synchronize data to and from their workflow through a dialog window. The FileSync dialog calls the FileSync script, which is command-line based and performs the actual data synchronization. This script allows the user to easily create a model repository, upload and download data, create releases and define scenarios.
The data synchronization approach applied here differs from systems such as Subversion or Git, since these systems do not perform well for large (binary) model data files. For this reason, a new concept of parameterization and data splitting has been implemented. Each file, or set of files, is uniquely labeled as a parameter, and the metadata for this parameter are maintained by Subversion. The metadata contain file hashes that identify the data content, and the location, reachable by FTP, where the actual bulk data are stored. The NHI model control script is a command-line driven Python script for pre-processing, running, and post-processing the NHI model and uses one single configuration file for all computational kernels. This configuration file is an easy-to-use, keyword-driven Windows INI file with separate sections for all the kernels. It also includes a FileSync data section where the user can specify version-controlled model data to be used as input. The NHI control script keeps all the data consistent during the pre-processing. Furthermore, this script is able to do model state handling when the NHI model is used for ensemble forecasting.
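The combination of a keyword-driven INI file with per-kernel sections and a data section whose entries are checked by content hash can be sketched with Python's `configparser` and `hashlib`. The layout is an assumption for illustration. The section names (`modflow`, `metaswap`, `filesync`), the keys, and the convention that each `filesync` key is both a parameter label and a local file name mapped to its SHA-256 hash are hypothetical, not the NHI control script's actual format.

```python
import configparser
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a possibly large binary file in chunks so it never sits fully in memory."""
    h = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def load_config(text: str) -> configparser.ConfigParser:
    """Parse a keyword-driven INI file with one section per computational kernel."""
    cfg = configparser.ConfigParser()
    cfg.read_string(text)
    return cfg


def stale_parameters(cfg: configparser.ConfigParser, data_dir: Path) -> list:
    """Return parameter labels whose local file is missing or whose hash differs.

    Hypothetical convention: each key in [filesync] is a file name in data_dir,
    and its value is the expected SHA-256 hex digest of that file's content.
    """
    stale = []
    for label, expected in cfg["filesync"].items():
        local = data_dir / label
        if not local.exists() or sha256_of(local) != expected:
            stale.append(label)
    return stale
```

Reading in fixed-size chunks keeps memory use flat even for the large binary model files the abstract mentions; stale labels would then be the ones to fetch from bulk storage.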
Performance Studies on Distributed Virtual Screening
Krüger, Jens; de la Garza, Luis; Kohlbacher, Oliver; Nagel, Wolfgang E.
2014-01-01
Virtual high-throughput screening (vHTS) is an invaluable method in modern drug discovery. It permits screening large datasets or databases of chemical structures for structures that may bind to a drug target. Virtual screening is typically performed by docking code, which often runs sequentially. Processing of huge vHTS datasets can be parallelized by chunking the data, because individual docking runs are independent of each other. The goal of this work is to find an optimal splitting that maximizes the speedup while considering overhead and the available cores on Distributed Computing Infrastructures (DCIs). We conducted thorough performance studies accounting not only for the runtime of the docking itself, but also for structure preparation. Performance studies were conducted via the workflow-enabled science gateway MoSGrid (Molecular Simulation Grid). As input we used benchmark datasets for protein kinases. Our performance studies show that docking workflows can be made to scale almost linearly up to 500 concurrent processes, even when distributed over large DCIs, thus accelerating vHTS campaigns significantly. PMID:25032219
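Because docking runs are independent, splitting reduces to chunking the ligand list and choosing a chunk count. The Python sketch below uses a deliberately simple, hypothetical cost model (a fixed per-chunk overhead plus a constant per-ligand docking time, with chunks executed in waves over the available cores); it is not the model used in the MoSGrid study and ignores structure preparation and queueing variability.

```python
import math
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_into_chunks(items: Sequence[T], n_chunks: int) -> List[List[T]]:
    """Split items into n_chunks contiguous chunks whose sizes differ by at most one."""
    n_chunks = max(1, min(n_chunks, len(items)))
    base, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        size = base + (1 if i < extra else 0)
        chunks.append(list(items[start:start + size]))
        start += size
    return chunks


def estimated_makespan(n_ligands: int, n_chunks: int, cores: int,
                       t_dock: float, t_overhead: float) -> float:
    """Toy model: each chunk pays a fixed overhead; chunks run in waves on the cores."""
    chunk_size = math.ceil(n_ligands / n_chunks)
    waves = math.ceil(n_chunks / cores)
    return waves * (t_overhead + chunk_size * t_dock)


def best_chunk_count(n_ligands: int, cores: int, t_dock: float, t_overhead: float) -> int:
    """Chunk count that minimizes the toy makespan estimate."""
    return min(
        range(1, n_ligands + 1),
        key=lambda k: estimated_makespan(n_ligands, k, cores, t_dock, t_overhead),
    )
```

Under this model, more chunks than cores adds extra waves of overhead, while fewer chunks than cores leaves cores idle, so the optimum tends to sit near the core count when overhead is non-negligible.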
Overcoming the Challenges of Implementing a Multi-Mission Distributed Workflow System
NASA Technical Reports Server (NTRS)
Sayfi, Elias; Cheng, Cecilia; Lee, Hyun; Patel, Rajesh; Takagi, Atsuya; Yu, Dan
2009-01-01
A multi-mission approach to solving the same problems for various projects is enticing. However, the multi-mission approach leads to the need to develop a configurable, adaptable and distributed system to meet unique project requirements. That, in turn, leads to a set of challenges ranging from handling synchronization issues to coming up with a smart design that allows the "unknowns" to be decided later. This paper discusses the challenges that the Multi-mission Automated Task Invocation Subsystem (MATIS) team has come up against while designing the distributed workflow system, and elaborates on the solutions that were implemented. The first challenge is to design an easily adaptable system that requires no code changes as a result of configuration changes. The number of formal deliveries is often limited because each delivery costs time and money. Changes such as a new sequence of programs to be called, or a new parameter value for a program that is being automated, should not result in code changes or redelivery.
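One way to meet the no-code-change requirement is to keep the program sequence and parameter values in data rather than in code. The sketch below assumes a hypothetical JSON configuration; the program names `calibrate` and `mosaic` and the configuration keys are invented for illustration and do not describe MATIS itself.

```python
import json
import shlex
import subprocess
from typing import List

# Hypothetical configuration: step order and parameter values are data, so
# reordering steps or changing a value requires no code change or redelivery.
EXAMPLE_CONFIG = """
{
  "steps": [
    {"program": "calibrate", "args": ["--gain", "{gain}"], "params": {"gain": "2.5"}},
    {"program": "mosaic", "args": ["--tiles", "{tiles}"], "params": {"tiles": "16"}}
  ]
}
"""


def build_commands(config_text: str) -> List[List[str]]:
    """Expand each configured step into an argv list, filling in its parameters."""
    config = json.loads(config_text)
    commands = []
    for step in config["steps"]:
        params = step.get("params", {})
        args = [arg.format(**params) for arg in step.get("args", [])]
        commands.append([step["program"], *args])
    return commands


def run_pipeline(config_text: str, dry_run: bool = True) -> None:
    """Run the steps in the configured order; with dry_run, only print them."""
    for argv in build_commands(config_text):
        if dry_run:
            print(shlex.join(argv))
        else:
            subprocess.run(argv, check=True)  # stop at the first failing step
```

Changing the order of the `steps` list or the value under `params` alters the invoked commands without touching the Python code.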
NASA Astrophysics Data System (ADS)
Gustafsson, C.; Nordström, F.; Persson, E.; Brynolfsson, J.; Olsson, L. E.
2017-04-01
Dosimetric errors in a magnetic resonance imaging (MRI)-only radiotherapy workflow may be caused by system-specific geometric distortion from MRI. The aim of this study was to evaluate the impact of this distortion on planned dose distribution and delineated structures for prostate patients. A method was developed in which computed tomography (CT) images were distorted using the MRI distortion field. The displacement map for an optimized MRI treatment planning sequence was measured using a dedicated phantom in a 3 T MRI system. To simulate the distortion aspects of a synthetic CT (electron density derived from MR images), the displacement map was applied to CT images, referred to as distorted CT images. A volumetric modulated arc prostate treatment plan was applied to the original CT and the distorted CT, creating a reference and a distorted CT dose distribution. By applying the inverse of the displacement map to the distorted CT dose distribution, a dose distribution in the same geometry as the original CT images was created. For 10 prostate cancer patients, the dose difference between the reference dose distribution and the inverse distorted CT dose distribution was analyzed in isodose level bins. The mean magnitude of the geometric distortion was 1.97 mm for the radial distance of 200-250 mm from isocenter. The mean percentage dose differences for all isodose level bins were ⩽0.02%, and the radiotherapy structure mean volume deviations were <0.2%. The method developed can quantify the dosimetric effects of MRI system-specific distortion in a prostate MRI-only radiotherapy workflow, separated from dosimetric effects originating from synthetic CT generation. No clinically relevant dose difference or structure deformation was found when 3D distortion correction and high acquisition bandwidth were used. The method could be used for any MRI sequence together with any anatomy of interest.
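The isodose-bin comparison can be sketched in a few lines of Python. The sketch assumes both dose distributions are already on the same voxel grid (which, in the study, is what applying the inverse displacement map achieves) and flattened into lists. It bins voxels by reference dose as a percentage of the reference maximum and, as an assumption about normalization, expresses each difference as a percentage of that maximum; the bin width and normalization used in the original analysis may differ.

```python
from collections import defaultdict
from typing import Dict, Sequence, Tuple


def dose_difference_by_isodose_bin(
    reference: Sequence[float],
    evaluated: Sequence[float],
    bin_width_percent: float = 10.0,
) -> Dict[Tuple[float, float], float]:
    """Mean dose difference (% of reference maximum) per isodose-level bin."""
    if len(reference) != len(evaluated) or not reference:
        raise ValueError("dose distributions must be non-empty and the same size")
    d_max = max(reference)
    if d_max <= 0:
        raise ValueError("reference dose maximum must be positive")
    n_bins = int(round(100.0 / bin_width_percent))
    sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for ref, ev in zip(reference, evaluated):
        level = 100.0 * ref / d_max  # isodose level of this voxel, in %
        index = min(int(level // bin_width_percent), n_bins - 1)  # 100 % joins top bin
        sums[index] += 100.0 * (ev - ref) / d_max
        counts[index] += 1
    return {
        (i * bin_width_percent, (i + 1) * bin_width_percent): sums[i] / counts[i]
        for i in sorted(counts)
    }
```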
Gustafsson, C; Nordström, F; Persson, E; Brynolfsson, J; Olsson, L E
2017-04-21
Autonomic Management of Application Workflows on Hybrid Computing Infrastructure
Kim, Hyunjoo; el-Khamra, Yaakoub; Rodero, Ivan; ...
2011-01-01
In this paper, we present a programming and runtime framework that enables the autonomic management of complex application workflows on hybrid computing infrastructures. The framework is designed to address system and application heterogeneity and dynamics to ensure that application objectives and constraints are satisfied. The need for such autonomic system and application management is becoming critical as computing infrastructures become increasingly heterogeneous, integrating different classes of resources from high-end HPC systems to commodity clusters and clouds. For example, the framework presented in this paper can be used to provision the appropriate mix of resources based on application requirements and constraints. The framework also monitors the system/application state and adapts the application and/or resources to respond to changing requirements or environment. To demonstrate the operation of the framework and to evaluate its ability, we employ a workflow for characterizing an oil reservoir, executed on a hybrid infrastructure composed of TeraGrid nodes and Amazon EC2 instances of various types. Specifically, we show how different application objectives such as acceleration, conservation and resilience can be effectively achieved while satisfying deadline and budget constraints, using an appropriate mix of dynamically provisioned resources. Our evaluations also demonstrate that public clouds can be used to complement and reinforce the scheduling and usage of traditional high performance computing infrastructure.
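To make "an appropriate mix of resources" concrete, the sketch below brute-forces instance counts for a few resource classes and keeps the cheapest combination that meets a deadline and a budget. Resource names, throughputs and prices are hypothetical, and the cost model (every provisioned instance is billed for the whole makespan, and work divides perfectly) is a simplifying assumption. The framework described above also monitors and adapts at runtime, which this static calculation does not attempt.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ResourceType:
    name: str
    tasks_per_hour: float  # throughput of one instance (hypothetical figure)
    cost_per_hour: float   # price of one instance per hour (hypothetical figure)


def cheapest_mix(
    resources: List[ResourceType],
    n_tasks: int,
    deadline_h: float,
    budget: float,
    max_per_type: int = 8,
) -> Optional[Dict[str, int]]:
    """Cheapest instance counts that finish n_tasks by the deadline within budget."""
    best: Optional[Dict[str, int]] = None
    best_cost = float("inf")
    for counts in itertools.product(range(max_per_type + 1), repeat=len(resources)):
        throughput = sum(c * r.tasks_per_hour for c, r in zip(counts, resources))
        if throughput == 0:
            continue
        hours = n_tasks / throughput  # assumes work divides perfectly
        if hours > deadline_h:
            continue
        cost = hours * sum(c * r.cost_per_hour for c, r in zip(counts, resources))
        if cost <= budget and cost < best_cost:
            best_cost = cost
            best = {r.name: c for r, c in zip(resources, counts) if c > 0}
    return best
```

Exhaustive search is only practical for a handful of resource classes; it is used here because it makes the deadline and budget constraints easy to read.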
Integrating the Allen Brain Institute Cell Types Database into Automated Neuroscience Workflow.
Stockton, David B; Santamaria, Fidel
2017-10-01
We developed software tools to download, extract features from, and organize the Cell Types Database from the Allen Brain Institute (ABI) in order to integrate its whole-cell patch-clamp characterization data into the automated modeling/data analysis cycle. To expand the potential user base we employed both Python and MATLAB. The basic set of tools downloads selected raw data and extracts cell, sweep, and spike features, using ABI's feature extraction code. To facilitate data manipulation we added a tool to build a local specialized database of raw data plus extracted features. Finally, to maximize automation, we extended our NeuroManager workflow automation suite to include these tools plus a separate investigation database. The extended suite allows the user to integrate ABI experimental and modeling data into an automated workflow deployed on heterogeneous computer infrastructures, from local servers, to high performance computing environments, to the cloud. Since our approach is focused on workflow procedures, our tools can be modified to interact with the increasing number of neuroscience databases being developed to cover all scales and properties of the nervous system.
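As an illustration of sweep-level feature extraction, the Python sketch below detects spikes as upward threshold crossings in a voltage trace and derives a few simple features. It is not ABI's feature extraction code, which computes a much richer feature set; the 0 mV threshold, the function names and the feature names are assumptions chosen for illustration.

```python
from typing import List, Optional, Sequence


def detect_spikes(voltage_mv: Sequence[float], dt_ms: float,
                  threshold_mv: float = 0.0) -> List[float]:
    """Spike times (ms): first sample at or above threshold after a sample below it."""
    times = []
    for i in range(1, len(voltage_mv)):
        if voltage_mv[i - 1] < threshold_mv <= voltage_mv[i]:
            times.append(i * dt_ms)
    return times


def sweep_features(voltage_mv: Sequence[float], dt_ms: float,
                   threshold_mv: float = 0.0) -> dict:
    """A few simple per-sweep features derived from the detected spike times."""
    spikes = detect_spikes(voltage_mv, dt_ms, threshold_mv)
    duration_s = len(voltage_mv) * dt_ms / 1000.0
    isis = [later - earlier for earlier, later in zip(spikes, spikes[1:])]
    latency: Optional[float] = spikes[0] if spikes else None
    return {
        "spike_count": len(spikes),
        "mean_rate_hz": len(spikes) / duration_s if duration_s else 0.0,
        "first_spike_latency_ms": latency,
        "mean_isi_ms": sum(isis) / len(isis) if isis else None,
    }
```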
Efficient Workflows for Curation of Heterogeneous Data Supporting Modeling of U-Nb Alloy Aging
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ward, Logan Timothy; Hackenberg, Robert Errol
These are slides from a presentation summarizing a graduate research associate's summer project. The following topics are covered in these slides: data challenges in materials, aging in U-Nb Alloys, Building an Aging Model, Different Phase Trans. in U-Nb, the Challenge, Storing Materials Data, Example Data Source, Organizing Data: What is a Schema?, What does an "XML Schema" look like?, Our Data Schema: Nice and Simple, Storing Data: Materials Data Curation System (MDCS), Problem with MDCS: Slow Data Entry, Getting Literature into MDCS, Staging Data in Excel Document, Final Result: MDCS Records, Analyzing Image Data, Process for Making TTT Diagram, Bottleneck Number 1: Image Analysis, Fitting a TTP Boundary, Fitting a TTP Curve: Comparable Results, How Does it Compare to Our Data?, Image Analysis Workflow, Curating Hardness Records, Hardness Data: Two Key Decisions, Before Peak Age? - Automation, Interactive Viz, Which Transformation?, Microstructure-Informed Model, Tracking the Entire Process, General Problem with Property Models, Pinyon: Toolkit for Managing Model Creation, Tracking Individual Decisions, Jupyter: Docs and Code in One File, Hardness Analysis Workflow, Workflow for Aging Models, and conclusions.
Galaxy tools and workflows for sequence analysis with applications in molecular plant pathology.
Cock, Peter J A; Grüning, Björn A; Paszkiewicz, Konrad; Pritchard, Leighton
2013-01-01
The Galaxy Project offers the popular web browser-based platform Galaxy for running bioinformatics tools and constructing simple workflows. Here, we present a broad collection of additional Galaxy tools for large scale analysis of gene and protein sequences. The motivating research theme is the identification of specific genes of interest in a range of non-model organisms, and our central example is the identification and prediction of "effector" proteins produced by plant pathogens in order to manipulate their host plant. This functional annotation of a pathogen's predicted capacity for virulence is a key step in translating sequence data into potential applications in plant pathology. This collection includes novel tools, and widely-used third-party tools such as NCBI BLAST+ wrapped for use within Galaxy. Individual bioinformatics software tools are typically available separately as standalone packages, or in online browser-based form. The Galaxy framework enables the user to combine these and other tools to automate organism scale analyses as workflows, without demanding familiarity with command line tools and scripting. Workflows created using Galaxy can be saved and are reusable, so they may be distributed within and between research groups, facilitating the construction of a set of standardised, reusable bioinformatic protocols. The Galaxy tools and workflows described in this manuscript are open source and freely available from the Galaxy Tool Shed (http://usegalaxy.org/toolshed or http://toolshed.g2.bx.psu.edu).
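As a small example of the kind of sequence filter such tools wrap, the sketch below flags proteins carrying an RxLR motif, a motif associated with many oomycete effectors, near the N-terminus. The 30 to 60 residue window is an assumed default, and real effector prediction would combine this with signal peptide prediction and other evidence; this is a hypothetical helper, not the implementation of any Galaxy tool from the collection.

```python
import re
from typing import Dict, List

# Zero-width lookahead so every start position is tested, including overlapping
# occurrences such as the two in "RALRALR".
RXLR = re.compile(r"(?=R.LR)")


def rxlr_candidates(proteins: Dict[str, str], window_start: int = 30,
                    window_end: int = 60) -> List[str]:
    """IDs of proteins with an RxLR motif starting at residues window_start..window_end (1-based)."""
    hits = []
    for protein_id, sequence in proteins.items():
        for match in RXLR.finditer(sequence.upper()):
            position = match.start() + 1  # convert 0-based index to residue number
            if window_start <= position <= window_end:
                hits.append(protein_id)
                break
    return hits
```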
Developing a National-Level Concept Dictionary for EHR Implementations in Kenya.
Keny, Aggrey; Wanyee, Steven; Kwaro, Daniel; Mulwa, Edwin; Were, Martin C
2015-01-01
The increasing adoption of Electronic Health Records (EHR) by developing countries comes with the need to develop common terminology standards to assure semantic interoperability. In Kenya, where the Ministry of Health has rolled out an EHR at 646 sites, several challenges have emerged, including variable dictionaries across implementations, inability to easily share data across systems, lack of expertise in dictionary management, lack of central coordination and custody of a terminology service, inadequately defined policies and processes, and insufficient infrastructure, among others. A Concept Working Group was constituted to address these challenges. The country settled on a common Kenya data dictionary, initially derived as a subset of the Columbia International eHealth Laboratory (CIEL)/Millennium Villages Project (MVP) dictionary. The initial dictionary scope largely focuses on clinical needs. Processes and policies around dictionary management are being guided by the framework developed by Bakhshi-Raiez et al. Technical and infrastructure-based approaches are also underway to streamline the workflow for dictionary management and distribution across implementations. Kenya's approach of adopting a comprehensive common dictionary can serve as a model for other countries in similar settings.
Rapid Assessment of Contaminants and Interferences in Mass Spectrometry Data Using Skyline
NASA Astrophysics Data System (ADS)
Rardin, Matthew J.
2018-04-01
Proper sample preparation in proteomic workflows is essential to the success of modern mass spectrometry experiments. Complex workflows often require reagents that are incompatible with MS analysis (e.g., detergents), necessitating a variety of sample cleanup procedures. Efforts to understand and mitigate sample contamination are a continual source of disruption with respect to both time and resources. To improve the ability to rapidly assess sample contamination from a diverse array of sources, I developed a molecular library in Skyline for rapid extraction of contaminant precursor signals using MS1 filtering. This contaminant template library is easily managed and can be modified for a diverse array of mass spectrometry sample preparation workflows. Utilization of this template allows rapid assessment of sample integrity and indicates potential sources of contamination.
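The core of MS1-based contaminant screening is matching observed precursor m/z values against a library within a mass tolerance. The Python sketch below shows that matching step with a ppm tolerance; the library names and m/z values are placeholders, not entries from the Skyline template described above, and Skyline's own chromatogram extraction is not reproduced.

```python
from typing import Dict, List, Sequence, Tuple

# Placeholder library: names and m/z values are illustrative, not real entries.
EXAMPLE_LIBRARY = {"contaminant_A": 500.0, "contaminant_B": 750.5}


def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def match_contaminants(
    observed: Sequence[float],
    library: Dict[str, float],
    tolerance_ppm: float = 10.0,
) -> List[Tuple[float, str, float]]:
    """Return (observed m/z, library name, ppm error) for every match within tolerance."""
    matches = []
    for mz in observed:
        for name, reference_mz in library.items():
            error = ppm_error(mz, reference_mz)
            if abs(error) <= tolerance_ppm:
                matches.append((mz, name, error))
    return matches
```

A relative (ppm) tolerance is used rather than a fixed Da window because instrument mass accuracy is usually specified in ppm and scales with m/z.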
Enhanced reproducibility of SADI web service workflows with Galaxy and Docker.
Aranguren, Mikel Egaña; Wilkinson, Mark D
2015-01-01
Semantic Web technologies have been widely applied in the life sciences, for example by data providers such as OpenLifeData and through web services frameworks such as SADI. The recently reported OpenLifeData2SADI project offers access to the vast OpenLifeData data store through SADI services. This article describes how to merge data retrieved from OpenLifeData2SADI with other SADI services using the Galaxy bioinformatics analysis platform, thus making this semantic data more amenable to complex analyses. This is demonstrated using a working example, which is made distributable and reproducible through a Docker image that includes SADI tools, along with the data and workflows that constitute the demonstration. The combination of Galaxy and Docker offers a solution for faithfully reproducing and sharing complex data retrieval and analysis workflows based on the SADI Semantic web service design patterns.
Riposan, Adina; Taylor, Ian; Owens, David R; Rana, Omer; Conley, Edward C
2007-01-01
In this paper we present mechanisms for imaging and spectral data discovery, as applied to the early detection of pathologic mechanisms underlying diabetic retinopathy in research and clinical trial scenarios. We discuss the Alchemist framework, built using a generic peer-to-peer architecture, supporting distributed database queries and complex search algorithms based on workflow. The Alchemist is a domain-independent search mechanism that can be applied to search and data discovery scenarios in many areas. We illustrate Alchemist's ability to perform complex searches composed as a collection of peer-to-peer overlays, Grid-based services and workflows, applied to image and spectral data discovery for the early detection and prevention of retinal disease and for investigational drug discovery. The Alchemist framework is built on top of decentralised technologies and uses industry standards such as Web services and SOAP for messaging.
Vahabzadeh, Massoud; Lin, Jia-Ling; Mezghanni, Mustapha; Epstein, David H.; Preston, Kenzie L.
2009-01-01
Issues: A challenge in treatment research is the necessity of adhering to protocol and regulatory strictures while maintaining flexibility to meet patients’ treatment needs and accommodate variations among protocols. Another challenge is the acquisition of large amounts of data in an occasionally hectic environment, along with provision of seamless methods for exporting, mining, and querying the data. Approach: We have automated several major functions of our outpatient treatment research clinic for studies in drug abuse and dependence. Here we describe three such specialized applications: the Automated Contingency Management (ACM) system for delivery of behavioral interventions, the Transactional Electronic Diary (TED) system for management of behavioral assessments, and the Protocol Workflow System (PWS) for computerized workflow automation and guidance of each participant’s daily clinic activities. These modules are integrated into our larger information system to enable data sharing in real time among authorized staff. Key Findings: ACM and TED have each permitted us to conduct research that was not previously possible. In addition, the time to data analysis at the end of each study is substantially shorter. With the implementation of the PWS, we have been able to manage a research clinic with an 80-patient capacity having an annual average of 18,000 patient-visits and 7,300 urine collections with a research staff of five. Finally, automated data management has considerably enhanced our ability to monitor and summarize participant-safety data for research oversight. Implications and conclusion: When developed in consultation with end users, automation in treatment-research clinics can enable more efficient operations, better communication among staff, and expansions in research methods. PMID:19320669
Pohjonen, Hanna; Ross, Peeter; Blickman, Johan G; Kamman, Richard
2007-01-01
Emerging technologies are transforming the workflows in healthcare enterprises. Computing grids and handheld mobile/wireless devices are providing clinicians with enterprise-wide access to all patient data and analysis tools on a pervasive basis. In this paper, emerging technologies are presented that provide computing grids and streaming-based access to image and data management functions, and system architectures that enable pervasive computing on a cost-effective basis. Finally, the implications of such technologies are investigated regarding the positive impacts on clinical workflows.
Workflow-Based Software Development Environment
NASA Technical Reports Server (NTRS)
Izygon, Michel E.
2013-01-01
The Software Developer's Assistant (SDA) helps software teams more efficiently and accurately conduct or execute software processes associated with NASA mission-critical software. SDA is a process enactment platform that guides software teams through project-specific standards, processes, and procedures. Software projects are decomposed into all of their required process steps or tasks, and each task is assigned to project personnel. SDA orchestrates the performance of work required to complete all process tasks in the correct sequence. The software then notifies team members when they may begin work on their assigned tasks and provides the tools, instructions, reference materials, and supportive artifacts that allow users to compliantly perform the work. A combination of technology components captures and enacts any software process used to support the software lifecycle. It creates an adaptive workflow environment that can be modified as needed. SDA achieves software process automation through a Business Process Management (BPM) approach to managing the software lifecycle for mission-critical projects. It contains five main parts: TieFlow (workflow engine), Business Rules (rules to alter process flow), Common Repository (storage for project artifacts, versions, history, schedules, etc.), SOA (interface to allow internal, GFE, or COTS tools integration), and the Web Portal Interface (collaborative web environment).
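The "correct sequence plus notification" behaviour can be sketched with Python's standard `graphlib.TopologicalSorter` (Python 3.9 or later). Task names and assignees are hypothetical, and the sketch treats every released task as finished immediately; it does not model TieFlow, business rules, or the repository.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def enact(tasks: Dict[str, Set[str]], assignees: Dict[str, str]) -> List[str]:
    """Release tasks in dependency order and record a notification for each."""
    sorter = TopologicalSorter(tasks)  # maps each task to its prerequisite tasks
    sorter.prepare()                    # raises CycleError if dependencies loop
    notifications = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        for task in ready:
            notifications.append(f"notify {assignees[task]}: '{task}' may begin")
        # Simplification: treat every released task as completed straight away.
        sorter.done(*ready)
    return notifications


# Hypothetical project decomposition and assignments.
EXAMPLE_TASKS = {
    "requirements": set(),
    "design": {"requirements"},
    "code": {"design"},
    "peer review": {"code"},
    "unit test": {"code"},
    "release": {"peer review", "unit test"},
}
EXAMPLE_ASSIGNEES = {
    "requirements": "ana", "design": "ben", "code": "chen",
    "peer review": "dev", "unit test": "eli", "release": "ana",
}
```

In a real engine, `done()` would be called when a team member reports completion, so independent tasks such as review and testing proceed in parallel while `release` waits for both.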
[Development of a medical equipment support information system based on PDF portable document].
Cheng, Jiangbo; Wang, Weidong
2010-07-01
Based on the organizational structure and management system of hospital medical engineering support, the medical engineering support workflow was integrated so that medical engineering data are collected effectively, accurately and comprehensively and kept in electronic archives. The workflow of medical equipment support work was analysed, and all work processes were recorded as portable electronic documents. Using XML middleware technology and an SQL Server database, process management, data calculation, submission, storage and other functions were implemented. Practical application shows that the medical equipment support information system optimizes the existing work process, making it standardized, digital, automatic, efficient, orderly and controllable. The medical equipment support information system based on portable electronic documents can effectively optimize and improve hospital medical engineering support work, improve performance, reduce costs, and provide complete and accurate digital data.
Humphrey, Clinton D; Tollefson, Travis T; Kriet, J David
2010-05-01
Facial plastic surgeons are accumulating massive digital image databases with the evolution of photodocumentation and widespread adoption of digital photography. Managing and maximizing the utility of these vast data repositories, or digital asset management (DAM), is a persistent challenge. Developing a DAM workflow that incorporates a file naming algorithm and metadata assignment will increase the utility of a surgeon's digital images. Copyright 2010 Elsevier Inc. All rights reserved.
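A file naming algorithm of the kind suggested above can be as simple as composing normalized fields into a fixed pattern. The pattern below (patient identifier, capture date, stage, view, sequence number) is a hypothetical convention, not one prescribed by the authors, and the metadata-assignment step is not shown.

```python
import re
from datetime import date


def _slug(text: str) -> str:
    """Lower-case text and replace runs of non-alphanumeric characters with '-'."""
    return re.sub(r"[^A-Za-z0-9]+", "-", text.strip()).strip("-").lower()


def photo_filename(patient_id: str, taken: date, stage: str, view: str,
                   sequence: int, extension: str = "jpg") -> str:
    """Hypothetical pattern: <patient>_<YYYYMMDD>_<stage>_<view>_<NN>.<ext>."""
    return (
        f"{_slug(patient_id)}_{taken:%Y%m%d}_{_slug(stage)}_"
        f"{_slug(view)}_{sequence:02d}.{extension}"
    )
```

Placing the patient identifier first and an ISO-style date second means a plain alphabetical directory listing groups images by patient and then orders them chronologically.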
Digital disruption 'syndromes'.
Sullivan, Clair; Staib, Andrew
2017-05-18
The digital transformation of hospitals in Australia is occurring rapidly in order to facilitate innovation and improve efficiency. Rapid transformation can cause temporary disruption of hospital workflows and staff as processes are adapted to the new digital workflows. The aim of this paper is to outline various types of digital disruption and some strategies for effective management. A large tertiary university hospital recently underwent a rapid, successful roll-out of an integrated electronic medical record (EMR). We observed this transformation and propose several digital disruption "syndromes" to assist with understanding and management during digital transformation: digital deceleration, digital transparency, digital hypervigilance, data discordance, digital churn and post-digital 'depression'. These 'syndromes' are defined and discussed in detail. Successful management of this temporary digital disruption is important to ensure a successful transition to a digital platform. What is known about this topic? Digital disruption is defined as the changes facilitated by digital technologies that occur at a pace and magnitude that disrupt established ways of value creation, social interactions, doing business and more generally our thinking. Increasing numbers of Australian hospitals are implementing digital solutions to replace traditional paper-based systems for patient care in order to create opportunities for improved care and efficiencies. Such large scale change has the potential to create transient disruption to workflows and staff. Managing this temporary disruption effectively is an important factor in the successful implementation of an EMR. What does this paper add? A large tertiary university hospital recently underwent a successful rapid roll-out of an integrated electronic medical record (EMR) to become Australia's largest digital hospital over a 3-week period. 
We observed and assisted with the management of several cultural, behavioural and operational forms of digital disruption, which led us to propose some digital disruption 'syndromes'. The definition and management of these 'syndromes' are discussed in detail. What are the implications for practitioners? Minimising the temporary effects of digital disruption in hospitals requires an understanding that these digital 'syndromes' are to be expected and actively managed during large-scale transformation.
Towards PCC for Concurrent and Distributed Systems (Work in Progress)
NASA Technical Reports Server (NTRS)
Henriksen, Anders S.; Filinski, Andrzej
2009-01-01
We outline some conceptual challenges in extending the PCC paradigm to a concurrent and distributed setting, and sketch a generalized notion of module correctness based on viewing communication contracts as economic games. The model supports compositional reasoning about modular systems and is meant to apply not only to certification of executable code, but also of organizational workflows.
Novak, Avrey; Nyflot, Matthew J; Ermoian, Ralph P; Jordan, Loucille E; Sponseller, Patricia A; Kane, Gabrielle M; Ford, Eric C; Zeng, Jing
2016-05-01
Radiation treatment planning involves a complex workflow that has multiple potential points of vulnerability. This study utilizes an incident reporting system to identify the origination and detection points of near-miss errors, in order to guide departmental safety improvement efforts. Previous studies have examined where errors arise, but have not examined where they are detected or applied a near-miss risk index (NMRI) to gauge their severity. From 3/2012 to 3/2014, 1897 incidents were analyzed from a departmental incident learning system. All incidents were prospectively reviewed weekly by a multidisciplinary team and assigned an NMRI score ranging from 0 to 4 reflecting potential harm to the patient (no potential harm to potential critical harm). Incidents were classified by point of incident origination and detection based on a 103-step workflow. The individual steps were divided among nine broad workflow categories (patient assessment, imaging for radiation therapy (RT) planning, treatment planning, pretreatment plan review, treatment delivery, on-treatment quality management, post-treatment completion, equipment/software quality management, and other). The average NMRI scores of incidents originating or detected within each broad workflow area were calculated. Additionally, out of 103 individual process steps, 35 were classified as safety barriers, the process steps whose primary function is to catch errors. The safety barriers which most frequently detected incidents were identified and analyzed. Finally, the distance between event origination and detection was explored by grouping events by the number of broad workflow areas they passed through before detection, and average NMRI scores were compared. Near-miss incidents most commonly originated within treatment planning (33%).
However, the incidents with the highest average NMRI scores originated during imaging for RT planning (NMRI = 2.0, average NMRI of all events = 1.5), specifically during the documentation of patient positioning and localization of the patient. Incidents were most frequently detected during treatment delivery (30%), and incidents identified at this point also had higher severity scores than other workflow areas (NMRI = 1.6). Incidents identified during on-treatment quality management were also more severe (NMRI = 1.7), and the specific process steps of reviewing portal and CBCT images tended to catch highest-severity incidents. On average, safety barriers caught 46% of all incidents, most frequently at physics chart review, therapist's chart check, and the review of portal images; however, most incidents that pass through a given safety barrier are of a type that the barrier is not designed to catch. Incident learning systems can be used to assess the most common points of error origination and detection in radiation oncology. This can help tailor safety improvement efforts and target the highest impact portions of the workflow. The most severe near-miss events tend to originate during simulation, with the most severe near-miss events detected at the time of patient treatment. Safety barriers can be improved to allow earlier detection of near-miss events.
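The per-area averages and barrier catch rates can be computed from incident records with a short script. The record fields, the example data, and the way "areas passed through" is counted (the distance between origin and detection positions in an ordered list of the sequential workflow areas) are assumptions for illustration; the study's own 103-step mapping is not reproduced.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List

# Sequential broad workflow areas (the non-sequential "equipment/software
# quality management" and "other" categories are deliberately left out).
WORKFLOW_ORDER = [
    "patient assessment",
    "imaging for RT planning",
    "treatment planning",
    "pretreatment plan review",
    "treatment delivery",
    "on-treatment quality management",
    "post-treatment completion",
]


@dataclass
class Incident:
    origin_area: str
    detection_area: str
    nmri: int                # 0 (no potential harm) to 4 (potential critical harm)
    caught_at_barrier: bool  # detected at a step designated as a safety barrier


def mean_nmri_by(incidents: List[Incident], key: str) -> Dict[str, float]:
    """Average NMRI grouped by 'origin_area' or 'detection_area'."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for incident in incidents:
        groups[getattr(incident, key)].append(incident.nmri)
    return {area: mean(scores) for area, scores in groups.items()}


def areas_passed(incident: Incident) -> int:
    """Number of sequential areas between origin and detection (ValueError if unlisted)."""
    return WORKFLOW_ORDER.index(incident.detection_area) - WORKFLOW_ORDER.index(incident.origin_area)


def barrier_catch_rate(incidents: List[Incident]) -> float:
    """Fraction of incidents detected at a safety barrier."""
    return sum(incident.caught_at_barrier for incident in incidents) / len(incidents)
```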
DOE Office of Scientific and Technical Information (OSTI.GOV)
Novak, Avrey; Nyflot, Matthew J.; Ermoian, Ralph P.
Purpose: Radiation treatment planning involves a complex workflow that has multiple potential points of vulnerability. This study utilizes an incident reporting system to identify the origination and detection points of near-miss errors, in order to guide departmental safety improvement efforts. Previous studies have examined where errors arise, but have not examined where they are detected or applied a near-miss risk index (NMRI) to gauge their severity. Methods: From 3/2012 to 3/2014, 1897 incidents were analyzed from a departmental incident learning system. All incidents were prospectively reviewed weekly by a multidisciplinary team and assigned an NMRI score ranging from 0 to 4 reflecting potential harm to the patient (no potential harm to potential critical harm). Incidents were classified by point of incident origination and detection based on a 103-step workflow. The individual steps were divided among nine broad workflow categories (patient assessment, imaging for radiation therapy (RT) planning, treatment planning, pretreatment plan review, treatment delivery, on-treatment quality management, post-treatment completion, equipment/software quality management, and other). The average NMRI scores of incidents originating or detected within each broad workflow area were calculated. Additionally, out of 103 individual process steps, 35 were classified as safety barriers, the process steps whose primary function is to catch errors. The safety barriers which most frequently detected incidents were identified and analyzed. Finally, the distance between event origination and detection was explored by grouping events by the number of broad workflow areas they passed through before detection, and average NMRI scores were compared. Results: Near-miss incidents most commonly originated within treatment planning (33%).
However, the incidents with the highest average NMRI scores originated during imaging for RT planning (NMRI = 2.0, average NMRI of all events = 1.5), specifically during the documentation of patient positioning and localization of the patient. Incidents were most frequently detected during treatment delivery (30%), and incidents identified at this point also had higher severity scores than other workflow areas (NMRI = 1.6). Incidents identified during on-treatment quality management were also more severe (NMRI = 1.7), and the specific process steps of reviewing portal and CBCT images tended to catch highest-severity incidents. On average, safety barriers caught 46% of all incidents, most frequently at physics chart review, therapist’s chart check, and the review of portal images; however, most incidents that pass through a given safety barrier are of a type that the barrier is not designed to catch. Conclusions: Incident learning systems can be used to assess the most common points of error origination and detection in radiation oncology. This can help tailor safety improvement efforts and target the highest impact portions of the workflow. The most severe near-miss events tend to originate during simulation, with the most severe near-miss events detected at the time of patient treatment. Safety barriers can be improved to allow earlier detection of near-miss events.
Several considerations with respect to the future of digital photography and photographic printing
NASA Astrophysics Data System (ADS)
Tuijn, Chris; Mahy, Marc F.
2000-12-01
Digital cameras are no longer exotic gadgets used by a privileged group of early adopters. More and more people realize that the digital solution has obvious advantages over the conventional film-based workflow. Claiming that prints on paper are no longer necessary in the digital workflow, however, would be similar to reviving the myth of the paperless office. People often still like to share their memories on paper, for a variety of reasons. Some hurdles must still be overcome to make the digital dream come true. In this paper, we give a survey of the different workflows in digital photography. The local, semi-local and Internet solutions will be discussed, as well as the preferred output systems for each of these solutions. Output systems immediately raise the question of appropriate color management solutions. In the second part of this paper, we discuss the major color management issues that appear in digital photography. A clear separation is made between the image acquisition and image rendering phases. After a brief survey of the different image restoration and enhancement techniques, we reflect on the ideal color exchange space: the enhanced image should be delivered in this exchange space, and from there the standard color management transformations can be applied to transfer the image to the native color space of the output device. We also discuss some color gamut characteristics and color management problems of different types of photographic printers that can occur during this conversion process.
Meta-manager: a requirements analysis.
Cook, J F; Rozenblit, J W; Chacko, A K; Martinez, R; Timboe, H L
1999-05-01
The digital imaging network-picture archiving and communications system (DIN-PACS) will be implemented at ten sites within the Great Plains Regional Medical Command (GPRMC). This network of PACS and teleradiology technology over a shared T1 network has opened the door to round-the-clock radiology coverage of all sites. However, the concept of a virtual radiology environment poses new issues for military medicine, and a new workflow management system must be developed. This workflow management system will allow us to resolve these issues efficiently, including quality of care, availability, severe capitation, and quality of the workforce. The design process for this management system must employ existing technology, operate over various telecommunication networks and protocols, be independent of platform operating systems, be flexible and scalable, and involve the end users for whom it is developed from the outset of the design process. Using the Unified Modeling Language (UML), the specifications for this new business management system were created jointly by the University of Arizona and the GPRMC. These specifications detail a management system operating in a Common Object Request Broker Architecture (CORBA) environment. In this presentation, we characterize the Meta-Manager management system, including aspects of intelligence, interfacility routing, fail-safe operations, and expected improvements in patient care and efficiency.
Software Defined Cyberinfrastructure
DOE Office of Scientific and Technical Information (OSTI.GOV)
Foster, Ian; Blaiszik, Ben; Chard, Kyle
Within and across thousands of science labs, researchers and students struggle to manage data produced in experiments, simulations, and analyses. Because research data lifecycle management processes are largely manual, much time is wasted, research results are often irreproducible, and data sharing and reuse remain rare. In response, we propose a new approach to data lifecycle management in which researchers are empowered to define the actions to be performed at individual storage systems when data are created or modified: actions such as analysis, transformation, copying, and publication. We term this approach software-defined cyberinfrastructure because users can implement powerful data management policies by deploying rules to local storage systems, much as software-defined networking allows users to configure networks by deploying rules to switches. We argue that this approach can enable a new class of responsive distributed storage infrastructure that will accelerate research innovation by allowing any researcher to associate data workflows with data sources, whether local or remote, for such purposes as data ingest, characterization, indexing, and sharing. We report on early experiments with this approach in the context of experimental science, in which a simple if-trigger-then-action (IFTA) notation is used to define rules.
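The if-trigger-then-action idea can be illustrated with a tiny in-process rule dispatcher: each rule names a storage event type, a path pattern, and an action to run when both match. This is a hypothetical sketch, not the authors' IFTA notation or any real storage-system API; the `Rule` class, the `dispatch` function, the event dictionary shape, and the string-returning actions are all assumptions made for illustration.

```python
from dataclasses import dataclass
from fnmatch import fnmatch
from typing import Callable, Dict, List

@dataclass
class Rule:
    """Hypothetical if-trigger-then-action rule deployed to a storage system."""
    trigger: str                  # event type, e.g. "created" or "modified"
    pattern: str                  # glob pattern the file path must match
    action: Callable[[str], str]  # what to do with the matching path

def dispatch(rules: List[Rule], event: Dict[str, str]) -> List[str]:
    """Run every rule whose trigger and path pattern match the event, in order."""
    results = []
    for rule in rules:
        if rule.trigger == event["type"] and fnmatch(event["path"], rule.pattern):
            results.append(rule.action(event["path"]))
    return results

# Hypothetical rules: index and archive new HDF5 files, publish edited CSVs.
rules = [
    Rule("created", "*.h5", lambda p: f"index {p}"),
    Rule("created", "*.h5", lambda p: f"copy {p} to archive"),
    Rule("modified", "*.csv", lambda p: f"publish {p}"),
]

print(dispatch(rules, {"type": "created", "path": "/data/run1.h5"}))
```

Keeping the trigger, condition, and action as separate fields mirrors the analogy to software-defined networking in the abstract: rules are data that can be deployed to a storage endpoint, and the endpoint only needs a generic matcher.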
MO-D-213-01: Workflow Monitoring for a High Volume Radiation Oncology Center
DOE Office of Scientific and Technical Information (OSTI.GOV)
Laub, S; Dunn, M; Galbreath, G
2015-06-15
Purpose: Implement a center-wide communication system that increases interdepartmental transparency and accountability while decreasing redundant work and treatment delays by actively monitoring the treatment planning workflow. Methods: Intake Management System (IMS), a program developed by ProCure Treatment Centers Inc., is a multi-function database that stores treatment planning process information. It was devised to work with the oncology information system (Mosaiq) to streamline interdepartmental workflow. Each step in the treatment planning process is visually represented, and timelines for completion of individual tasks are established within the software. The currently active step of each patient's planning process is highlighted either red or green according to whether the initially allocated amount of time for that step has passed. This information is displayed as a Treatment Planning Process Monitor (TPPM), which is shown on screens in the relevant departments throughout the center. The display also includes the individuals responsible for each task. IMS is driven by Mosaiq's quality checklist (QCL) functionality. Each step in the workflow is initiated by a Mosaiq user sending the responsible party a QCL assignment. IMS is connected to Mosaiq, and sending or completing a QCL updates the associated field in the TPPM to the appropriate status. Results: Approximately one patient a week is identified during the workflow process as needing a modified treatment start date or re-allocated resources to address the most urgent cases. Identifying a realistic timeline for planning each patient, and having multiple departments communicate their limitations and time constraints, allows quality plans to be developed and implemented without overburdening any one department.
Conclusion: Monitoring the progression of the treatment planning process has increased transparency between departments, which enables efficient communication. Having built-in timelines allows easy prioritization of tasks and resources and facilitates effective time management.
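The red/green rule described for the TPPM can be sketched as a comparison between the elapsed time of the active step and its allotted time. This is a hypothetical illustration, not the IMS implementation: the `PlanningStep` fields, the `status` and `monitor_rows` functions, and the example durations are assumptions, and the Mosaiq QCL events that start and complete steps are not modeled.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class PlanningStep:
    """Active step in a patient's planning process (hypothetical fields)."""
    patient: str
    step: str
    responsible: str
    started: datetime
    allotted: timedelta

def status(step: PlanningStep, now: datetime) -> str:
    """Return "red" once the allotted time has passed, otherwise "green"."""
    return "red" if now - step.started > step.allotted else "green"

def monitor_rows(steps, now):
    """Rows for a TPPM-style display: patient, step, responsible person, colour."""
    return [(s.patient, s.step, s.responsible, status(s, now)) for s in steps]

start = datetime(2015, 6, 1, 9, 0)
steps = [
    PlanningStep("Patient A", "contouring", "Physician 1", start, timedelta(days=2)),
    PlanningStep("Patient B", "dosimetry", "Dosimetrist 2", start, timedelta(hours=12)),
]
rows = monitor_rows(steps, now=datetime(2015, 6, 2, 9, 0))
for row in rows:
    print(row)
```

Including the responsible person in each row reflects the abstract's point that the display names who owns each task, which is what makes an overdue (red) step actionable.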
[Change in process management by implementing RIS, PACS and flat-panel detectors].
Imhof, H; Dirisamer, A; Fischer, H; Grampp, S; Heiner, L; Kaderk, M; Krestan, C; Kainberger, F
2002-05-01
Implementation of radiological information systems (RIS) and picture archiving and communication systems (PACS) results in significant changes to the workflow in a radiological department. An additional connection with flat-panel detectors further shortens the work process. RIS and PACS implementation alone reduces the complete workflow by 21-80%. With flat-panel technology, the image production process is shortened by a further 25-30%. The number of workflow steps decreases from the original 17 to 12 with the implementation of RIS and PACS, and to 5 with the integrated use of flat panels. These clearly recognizable workflow advantages require a corresponding financial investment. Several studies have shown that the capitalisation factor calculated over eight years is positive, with gains ranging from 5% to 25%. Whether the additional implementation of flat-panel detectors also yields a positive capitalisation over the years cannot yet be estimated exactly, because experience with them is still too short. The interfaces are particularly critical and require constant quality control. Our flat-panel detector system is fixed; special images, which account for about 3-5% of all cases, still require conventional film-screen or phosphor-plate systems. Full-spine and long-leg examinations cannot be performed with sufficient exactness. Without question, implementing an integrated RIS, PACS and flat-panel detector system requires excellent staff training because of the changes in workflow and other factors. The main benefits of such an integrated implementation are an increase in the quality of image and report data, easier handling (cassettes are almost no longer needed) and a substantial shortening of the workflow.
Thermal Remote Sensing with UAV-Based Workflows
NASA Astrophysics Data System (ADS)
Boesch, R.
2017-08-01
Climate change will have a significant influence on vegetation health and growth. Predictions of higher mean summer temperatures and prolonged summer droughts may pose a threat to agricultural areas and forest canopies. Rising canopy temperatures can be an indicator of plant stress because of the closure of stomata and a decrease in the transpiration rate. Thermal cameras have been available for decades, but are still often used only for single-image analysis, in oblique view, or for visual evaluation of video sequences. Remote sensing with a thermal camera can therefore be an important data source for understanding transpiration processes. Photogrammetric workflows allow thermal images to be processed in the same way as RGB data. However, the low spatial resolution of thermal cameras, significant optical distortion and typically low contrast require an adapted workflow. The temperature distribution in forest canopies is typically completely unknown and less distinct than in urban or industrial areas, where metal constructions and surfaces yield high contrast and sharp edge information. The aim of this paper is to investigate the influence of interior camera orientation, tie point matching and ground control points on the resulting accuracy of bundle adjustment and dense cloud generation with a typically used photogrammetric workflow for UAV-based thermal imagery in natural environments.
An automated workflow for parallel processing of large multiview SPIM recordings
Schmied, Christopher; Steinbach, Peter; Pietzsch, Tobias; Preibisch, Stephan; Tomancak, Pavel
2016-01-01
Summary: Selective Plane Illumination Microscopy (SPIM) allows developing organisms to be imaged in 3D at unprecedented temporal resolution over long periods of time. The resulting massive amounts of raw image data require extensive interactive processing in dedicated graphical user interface (GUI) applications. The consecutive processing steps can be easily automated, and the individual time points can be processed independently, which lends itself to trivial parallelization on a high performance computing (HPC) cluster. Here, we introduce an automated workflow for processing large multiview, multichannel, multi-illumination time-lapse SPIM data on a single workstation or in parallel on an HPC cluster. The pipeline relies on snakemake to resolve dependencies among consecutive processing steps and can be easily adapted to any cluster environment, processing SPIM data in a fraction of the time required to collect it. Availability and implementation: The code is distributed free and open source under the MIT license http://opensource.org/licenses/MIT. The source code can be downloaded from GitHub: https://github.com/mpicbg-scicomp/snakemake-workflows. Documentation is available at http://fiji.sc/Automated_workflow_for_parallel_Multiview_Reconstruction. Contact: schmied@mpi-cbg.de Supplementary information: Supplementary data are available at Bioinformatics online. PMID:26628585
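The parallelization pattern described above (steps within one time point run in order, while time points are independent) can be sketched with the Python standard library. This is not Snakemake and not the authors' pipeline: Snakemake infers dependencies from rule inputs and outputs, whereas this sketch hard-codes a two-step chain. The `register` and `fuse` functions are hypothetical placeholders for real processing steps, and a thread pool is used only to keep the example portable; CPU-bound image processing would more likely use `ProcessPoolExecutor` or cluster jobs.

```python
from concurrent.futures import ThreadPoolExecutor

def register(tp: int) -> str:
    # Hypothetical placeholder for multiview registration of one time point.
    return f"registered_t{tp}"

def fuse(registered: str) -> str:
    # Hypothetical placeholder for fusing the registered views of one time point.
    return registered.replace("registered", "fused")

def process_time_point(tp: int) -> str:
    """Steps within one time point run in order: fuse depends on register."""
    return fuse(register(tp))

def run_pipeline(time_points, workers: int = 4):
    # Time points are independent, so they are mapped in parallel;
    # Executor.map returns results in input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process_time_point, time_points))

print(run_pipeline(range(3)))
```

Because each time point is a self-contained unit of work, the same structure scales from a workstation thread pool to one cluster job per time point without changing the per-time-point logic.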
An automated workflow for parallel processing of large multiview SPIM recordings.
Schmied, Christopher; Steinbach, Peter; Pietzsch, Tobias; Preibisch, Stephan; Tomancak, Pavel
2016-04-01
Selective Plane Illumination Microscopy (SPIM) allows developing organisms to be imaged in 3D at unprecedented temporal resolution over long periods of time. The resulting massive amounts of raw image data require extensive interactive processing in dedicated graphical user interface (GUI) applications. The consecutive processing steps can be easily automated, and the individual time points can be processed independently, which lends itself to trivial parallelization on a high performance computing (HPC) cluster. Here, we introduce an automated workflow for processing large multiview, multichannel, multi-illumination time-lapse SPIM data on a single workstation or in parallel on an HPC cluster. The pipeline relies on snakemake to resolve dependencies among consecutive processing steps and can be easily adapted to any cluster environment, processing SPIM data in a fraction of the time required to collect it. Availability and implementation: The code is distributed free and open source under the MIT license http://opensource.org/licenses/MIT. The source code can be downloaded from GitHub: https://github.com/mpicbg-scicomp/snakemake-workflows. Documentation is available at http://fiji.sc/Automated_workflow_for_parallel_Multiview_Reconstruction. Contact: schmied@mpi-cbg.de Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.
Automatic Image Processing Workflow for the Keck/NIRC2 Vortex Coronagraph
NASA Astrophysics Data System (ADS)
Xuan, Wenhao; Cook, Therese; Ngo, Henry; Zawol, Zoe; Ruane, Garreth; Mawet, Dimitri
2018-01-01
The Keck/NIRC2 camera, equipped with the vortex coronagraph, is an instrument targeted at high-contrast imaging of extrasolar planets. To uncover a faint planet signal beneath the overwhelming starlight, we use the Vortex Image Processing (VIP) library, which carries out principal component analysis to model and remove the stellar point spread function. To bridge the gap between data acquisition and data reduction, we implement a workflow that 1) downloads, sorts, and processes data with VIP, 2) stores the analysis products in a database, and 3) displays the reduced images, contrast curves, and auxiliary information on a web interface. Both angular differential imaging and reference star differential imaging are implemented in the analysis module. A real-time version of the workflow runs during observations, allowing observers to make informed decisions about how to distribute time among targets, thereby optimizing science yield. The post-night version performs a standardized reduction after the observation, building up a valuable database that not only helps uncover new discoveries but also enables a statistical study of the instrument itself. We present the workflow and an examination of the contrast performance of the NIRC2 vortex with respect to factors including target star properties and observing conditions.
Falcão-Reis, Filipa; Correia, Manuel E
2010-01-01
With the advent of more sophisticated and comprehensive healthcare information systems, system builders are becoming more interested in patient interaction and in what patients can do to help improve their own health care. Information systems now play a crucial and fundamental role in hospital workflows, thus providing great opportunities to introduce and improve upon "patient empowerment" processes for the personalization and management of Electronic Health Records (EHRs). In this paper, we present scenarios for generic patient privacy control mechanisms based on the Extended OpenID (eOID), a user-centric digital identity provider previously developed by our group, which leverages a secured OpenID 2.0 infrastructure with the recently released Portuguese Citizen Card (CC) for secure authentication in a distributed health information environment. eOID also takes advantage of OAuth assertion-based mechanisms to implement patient-controlled, secure, qualified role-based access by third parties to the patient's EHR.
Enterprise-wide PACS: beyond radiology, an architecture to manage all medical images.
Bandon, David; Lovis, Christian; Geissbühler, Antoine; Vallée, Jean-Paul
2005-08-01
Picture archiving and communication systems (PACS) are intended to manage all medical images acquired within the hospital. To address the various situations encountered in the imaging specialties, the traditional architecture used for the radiology department has to evolve. We present our preliminary results toward an enterprise-wide PACS intended to support all kinds of image production in medicine, from biomolecular images to whole-body pictures. Our solution is based on an existing radiologic PACS from which images are distributed through an electronic patient record to all care facilities. This platform is enriched with a flexible integration framework supporting the Digital Imaging and Communications in Medicine (DICOM) and DICOM-XML formats. In addition, a highly customizable generic workflow engine is used to drive work processes. Echocardiography; hematology; ear, nose, and throat; and dermatology, including wound follow-up, are the first extensions implemented outside radiology. We also propose a global strategy for further developments based on three possible architectures for an enterprise-wide PACS.
NASA Astrophysics Data System (ADS)
Masek, J.; Rao, A.; Gao, F.; Davis, P.; Jackson, G.; Huang, C.; Weinstein, B.
2008-12-01
The Land Cover Change Community-based Processing and Analysis System (LC-ComPS) combines grid technology, existing science modules, and dynamic workflows to enable users to complete advanced land data processing on data available from local and distributed archives. Changes in land cover represent a direct link between human activities and the global environment, and in turn affect Earth's climate. Thus, characterizing land cover change has become a major goal for Earth observation science. Many science algorithms exist to generate new products (e.g., surface reflectance, change detection) used to study land cover change. The overall objective of LC-ComPS is to release to the land science community a set of tools and services that can be implemented as a flexible LC-ComPS to produce surface reflectance and land-cover change information with ground resolution on the order of Landsat-class instruments. This package includes software modules for pre-processing Landsat-type satellite imagery (calibration, atmospheric correction, orthorectification, precision registration, BRDF correction) and for performing land-cover change analysis, as well as pre-built workflow chains that automatically generate surface reflectance and land-cover change products based on user input. To meet the project objectives, the team created the infrastructure (i.e., a client-server system with graphical and machine interfaces) to expand the use of these existing science algorithm capabilities in a community with distributed, large data archives and processing centers. Because of the distributed nature of the user community, grid technology was chosen to unite the dispersed community resources. At that time, grid computing was not used consistently and operationally within the Earth science research community.
Therefore, there was a learning curve to configure and implement the underlying public key infrastructure (PKI) interfaces required for user authentication, secure file transfer and remote job execution on the grid network of machines. In addition, science support was needed to verify that the grid technology did not have any adverse effects on the science module outputs. Other open source, unproven technologies, such as a workflow package to manage jobs submitted by the user, were infused into the overall system with successful results. This presentation will discuss the basic capabilities of LC-ComPS, explain how the technology was infused, provide lessons learned from using and integrating the various technologies while developing and operating the system, and finally outline plans moving forward (maintenance and operations decisions) based on the experience to date.
geoKepler Workflow Module for Computationally Scalable and Reproducible Geoprocessing and Modeling
NASA Astrophysics Data System (ADS)
Cowart, C.; Block, J.; Crawl, D.; Graham, J.; Gupta, A.; Nguyen, M.; de Callafon, R.; Smarr, L.; Altintas, I.
2015-12-01
The NSF-funded WIFIRE project has developed an open-source, online geospatial workflow platform for unifying geoprocessing tools and models for fire and other geospatially dependent modeling applications. It is a product of WIFIRE's objective to build an end-to-end cyberinfrastructure for real-time and data-driven simulation, prediction and visualization of wildfire behavior. geoKepler includes a set of reusable GIS components, or actors, for the Kepler Scientific Workflow System (https://kepler-project.org). Actors exist for reading and writing GIS data in formats such as Shapefile, GeoJSON and KML, and for using OGC web services such as WFS. The actors also allow calling geoprocessing tools in other packages such as GDAL and GRASS. Kepler integrates functions from multiple platforms and file formats into one framework, thus enabling optimal GIS interoperability, model coupling, and scalability. Products of the GIS actors can be fed directly to models such as FARSITE and WRF. Kepler's ability to schedule and scale processes using Hadoop and Spark also makes geoprocessing extensible and computationally scalable. The reusable workflows in geoKepler can be made to run automatically when alerted by real-time environmental conditions. Here, we show breakthroughs in the speed of creating complex data for hazard assessments with this platform. We also demonstrate geoKepler workflows that use data assimilation to ingest real-time weather data into wildfire simulations, and that use data mining techniques to gain insight into environmental conditions affecting fire behavior. Existing machine learning tools and libraries such as R and MLlib are being leveraged for this purpose in Kepler, along with Kepler's Distributed Data Parallel (DDP) capability, which provides a framework for scalable processing.
geoKepler workflows can be executed via an IPython notebook as part of a JupyterHub at UC San Diego, for sharing and reporting the scientific analysis and results from various runs of geoKepler workflows. Communication between IPython and Kepler workflow executions is established through an IPython magic function for Kepler that we have implemented. In summary, geoKepler is an ecosystem that makes geospatial processing and analysis of any kind programmable, reusable, scalable and sharable.
A Drupal-Based Collaborative Framework for Science Workflows
NASA Astrophysics Data System (ADS)
Pinheiro da Silva, P.; Gandara, A.
2010-12-01
Cyber-infrastructure uses technical infrastructure to support the organizational practices and social norms of scientific teams that work together, or depend on each other, to conduct scientific research. Such cyber-infrastructure enables the sharing of information and data so that scientists can leverage knowledge and expertise through automation. Scientific workflow systems have been used to build automated scientific systems with which scientists conduct research and, as a result, create artifacts in support of scientific discoveries. These complex systems are often developed by teams of scientists who are located in different places, e.g., in distinct buildings, and sometimes in different time zones, e.g., at distinct national laboratories. The sharing of workflow specifications is currently supported by version control systems such as CVS or Subversion. Discussions about the design, improvement, and testing of these specifications, however, often happen elsewhere, e.g., through email and instant messaging. Carrying on a discussion about these specifications is challenging because comments and specifications are not necessarily connected. For instance, a person reading a comment about a given workflow specification may not be able to see the workflow, and even if the person can see it, they may not know which part of the workflow the comment applies to. In this paper, we discuss the design, implementation and use of CI-Server, a Drupal-based infrastructure that supports collaboration among both local and distributed teams of scientists using scientific workflows.
CI-Server has three primary goals: to enable information sharing by providing tools that scientists can use within their scientific research to process data, publish and share artifacts; to build community by providing tools that support discussions between scientists about artifacts used or created through scientific processes; and to leverage the knowledge collected within the artifacts and scientific collaborations to support scientific discoveries.
Muirhead, David; Aoun, Patricia; Powell, Michael; Juncker, Flemming; Mollerup, Jens
2010-08-01
The need for higher efficiency, maximum quality, and faster turnaround time is a continuous focus for anatomic pathology laboratories and drives changes in work scheduling, instrumentation, and management control systems. The objective was to determine the costs of generating routine, special, and immunohistochemical microscopic slides in a large, academic anatomic pathology laboratory using a top-down approach. The Pathology Economic Model Tool was used to analyze workflow processes at The Nebraska Medical Center's anatomic pathology laboratory. Data from the analysis were used to generate complete cost estimates, which included not only materials, consumables, and instrumentation but also specific labor and overhead components for each of the laboratory's subareas. The cost data generated by the Pathology Economic Model Tool were compared with cost estimates generated using relative value units. Despite the use of automated systems for different processes, the workflow in the laboratory was found to be relatively labor intensive. Compared with the Pathology Economic Model Tool, traditional relative value unit calculations significantly underestimated the effect of labor and overhead on per-slide costs. Specific workflow defects that contribute significantly to the cost per slide were identified. The cost of providing routine, special, and immunohistochemical slides may be significantly underestimated by traditional methods that rely on relative value units. Furthermore, a comprehensive analysis may identify specific workflow processes requiring improvement.
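The top-down idea can be illustrated by spreading all costs of a laboratory subarea (materials, labor, overhead) over its slide volume, which makes the weight of labor and overhead explicit. This is a minimal sketch under stated assumptions: the figures are invented for illustration, not results from the study, and the functions are hypothetical; they do not reproduce the Pathology Economic Model Tool.

```python
def cost_per_slide(materials: float, labor: float, overhead: float, slides: int) -> float:
    """Top-down estimate: all costs of a subarea divided by its slide volume."""
    if slides <= 0:
        raise ValueError("slide volume must be positive")
    return (materials + labor + overhead) / slides

def labor_overhead_share(materials: float, labor: float, overhead: float) -> float:
    """Fraction of total subarea cost attributable to labor and overhead."""
    return (labor + overhead) / (materials + labor + overhead)

# Hypothetical annual figures for one subarea (not data from the study).
routine = cost_per_slide(materials=120_000, labor=300_000, overhead=180_000, slides=200_000)
share = labor_overhead_share(materials=120_000, labor=300_000, overhead=180_000)
print(f"cost per slide: {routine:.2f}, labor+overhead share: {share:.0%}")
```

A per-slide estimate that counted only materials would, in this toy case, miss the larger labor and overhead portion, which is the kind of gap the abstract attributes to relative-value-unit calculations.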
Hans, Parminder K; Gray, Carolyn Steele; Gill, Ashlinder; Tiessen, James
2018-03-01
Aim: This qualitative study investigates how the Electronic Patient-Reported Outcome (ePRO) mobile application and portal system, designed to capture patient-reported measures to support self-management, affected primary care provider workflows. The Canadian health system is facing an ageing population that is living with chronic disease. Disruptive innovations such as mobile health technologies can help support the health system transformation needed to better meet the multifaceted needs of the complex care patient. However, there are challenges with implementing these technologies in primary care settings, in particular their effect on primary care provider workflows. Over a six-week period, interdisciplinary primary care providers (n=6) and their complex care patients (n=12) used the ePRO mobile application and portal to collaboratively set goals, manage care plans, and support self-management using patient-reported measures. Secondary thematic analysis of focus groups, training sessions, and issue tracker reports captured user experiences at a Toronto-area Family Health Team from October 2014 to January 2015. Findings: Key issues raised by providers included liability concerns associated with remote monitoring, increased documentation activities due to a lack of interoperability between the app and the electronic patient record, increased provider anxiety about the potential for the app to disrupt and infringe upon appointment time, and increased demands for patient engagement. Primary care providers reported that the app helped to focus care plans and to begin a collaborative conversation on goal-setting. However, throughout our investigation we found a high level of provider resistance, evidenced by consistent attempts to make the app fit existing workflows rather than adapting much of their own behaviour.
As health systems seek innovative and disruptive models to better serve this complex patient population, provider change resistance will need to be addressed. New models and technologies cannot be disruptive in an environment that is resisting change.
Miksch, Antje; Trieschmann, Johanna; Ose, Dominik; Rölz, Andreas; Heiderhoff, Marc; Szecsenyi, Joachim
2011-01-01
Effective implementation of disease management programmes (DMPs) in primary care practices often requires changes in practice workflows and responsibilities, as well as acceptance by the parties involved. Within the ELSID study (evaluation study of the DMP for diabetes mellitus type 2), the physicians' attitudes toward DMPs were obtained, and an optimised implementation of DMPs was developed by conducting a quality management cycle with primary care practice teams. The aim was to investigate which practice workflows have to be changed and what barriers to implementing these changes are perceived. A quality management cycle was conducted in 78 primary care practices in the two German federal states of Rheinland-Pfalz and Sachsen-Anhalt, using a structured analysis of the current state of DMP workflows, and the need for improvement was identified. Subsequently, an optimised workflow was developed and targets were agreed upon. After 6 months, the study team called to inquire about the current state of implementation and, if appropriate, the actual barriers to change. After 6 months, 71 practices had been interviewed by phone. Of these, 64 (90.1%) had agreed on at least one target (e.g., to purchase new instrumentation, to regularly discuss feedback reports, to set up a patient registry). On average, three targets had been formulated, and 2 out of 3 had been implemented in the meantime. In most cases, lack of time was given as the reason for non-implementation. The majority of surveyed practices perceived some need for improvement. However, sufficient resources (time, staff and money) are required to ensure efficient implementation of DMPs in primary care practices and their integration with routine processes. A redefinition of responsibilities for DMPs will strengthen the role of medical assistants and promote high-quality implementation of these programmes. Copyright © 2010. Published by Elsevier GmbH.
Implementation of a single sign-on system between practice, research and learning systems.
Purkayastha, Saptarshi; Gichoya, Judy W; Addepally, Siva Abhishek
2017-03-29
Multiple specialized electronic medical systems are used in the health enterprise. Each of these systems has its own user management, authentication and authorization process, which creates a complex web to navigate and use without a coherent process workflow. Users often have to remember multiple passwords and log in and out between systems, which disrupts their clinical workflow. Challenges also exist in managing permissions for various cadres of health care providers. This case report describes our experience implementing a single sign-on system between an electronic medical records system and a learning management system at a large academic institution, with an informatics department responsible for student education and a medical school affiliated with a hospital system that cares for patients and conducts research. At our institution, we use OpenMRS for research registry tracking of interventional radiology patients, as well as to provide access to medical records for students studying health informatics. To provide authentication across different users of the system with different permissions, we developed a Central Authentication Service (CAS) module for OpenMRS, released it under the Mozilla Public License, and deployed it for single sign-on across the academic enterprise. The module has been in use from August 2015 to the present, and we assessed the usability of the registry and education system before and after implementation of the CAS module. Fifty-four students and three researchers were interviewed. The module authenticates users with appropriate privileges in the medical records system, providing secure access with minimal disruption to their workflow. No password requests were sent, and users reported ease of use and a streamlined workflow. The project demonstrates that enterprise-wide single sign-on systems should be used in healthcare to reduce complexity such as "password hell" and to improve usability and user navigation.
We plan to extend this to work with other systems used in the health care enterprise.
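The abstract does not describe the module's internals. As a rough, hedged illustration of the standard CAS single sign-on exchange (an unauthenticated user is redirected to the CAS server's `/login` with a `service` parameter, CAS redirects back with a service ticket, and the application validates that ticket server-side via `/serviceValidate`), the Python sketch below builds the two URLs and parses a validation response. The server and service URLs are hypothetical placeholders, no network calls are made, and this is not the OpenMRS CAS module's code.

```python
import xml.etree.ElementTree as ET
from typing import Optional
from urllib.parse import urlencode

CAS_NS = {"cas": "http://www.yale.edu/tp/cas"}

# Hypothetical deployment values, for illustration only.
CAS_BASE = "https://cas.example.edu/cas"
SERVICE_URL = "https://openmrs.example.edu/openmrs/index.htm"


def login_url(cas_base: str, service: str) -> str:
    """Where the application redirects a user who has no session yet."""
    return f"{cas_base}/login?{urlencode({'service': service})}"


def validate_url(cas_base: str, service: str, ticket: str) -> str:
    """Back-channel URL the application calls to validate a service ticket."""
    query = urlencode({"service": service, "ticket": ticket})
    return f"{cas_base}/serviceValidate?{query}"


def parse_validation(xml_text: str) -> Optional[str]:
    """Return the authenticated username, or None if CAS rejected the ticket."""
    root = ET.fromstring(xml_text)
    user = root.find("cas:authenticationSuccess/cas:user", CAS_NS)
    if user is None or not user.text:
        return None
    return user.text.strip()


# Sample validation responses in the CAS XML format (hypothetical user).
SUCCESS = """<cas:serviceResponse xmlns:cas="http://www.yale.edu/tp/cas">
  <cas:authenticationSuccess><cas:user>jdoe</cas:user></cas:authenticationSuccess>
</cas:serviceResponse>"""
FAILURE = """<cas:serviceResponse xmlns:cas="http://www.yale.edu/tp/cas">
  <cas:authenticationFailure code="INVALID_TICKET">Ticket not recognized</cas:authenticationFailure>
</cas:serviceResponse>"""

print(login_url(CAS_BASE, SERVICE_URL))
print(parse_validation(SUCCESS), parse_validation(FAILURE))
```

Once the username is returned, the application would map it to its own roles and privileges; that mapping is application-specific and not shown here.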
Evaluation of Standardization of Transfer of Accountability between Inpatient Pharmacists.
Tsoi, Vivian; Dewhurst, Norman; Tom, Elaine
2018-01-01
A compelling body of evidence supports the notion that transfer of accountability (TOA) improves communication, continuity of care, and patient safety. TOA involves the transmission and receipt of information between clinicians at each transition of care. Without a notification system alerting pharmacists to patient transfers, pharmacists' ability to seek out and complete TOA may be hindered. A standardized policy and process for TOA, with an automated workflow, was implemented at the study hospital in 2015 to ensure consistency and timeliness of documentation by pharmacists. The objective of this study was to evaluate pharmacists' adherence to and satisfaction with the TOA policy and process. A retrospective audit was conducted using a random sample of individuals who were inpatients between June 2014 and February 2016. Transition points for TOA were identified, and the computerized pharmacy system was reviewed to determine whether TOA had been documented at each transition point. After the audit, an online survey was distributed to assess pharmacists' response to and satisfaction with the TOA policy and workflow. Before the TOA workflow was implemented, TOA documentation by pharmacists ranged from 11% (10/93) to 43% (48/111) of transitions. Eight months after implementation of the workflow, the rate of TOA documentation was 87% (68/78), exceeding the institution's target of 70%. Of the 32 pharmacists surveyed, most were satisfied with the TOA policy and agreed that the standardized workflow was simple to use, increased the number of TOAs provided and received, and improved the quality of completed TOAs. Respondents also indicated that the TOA workflow had improved patient care (mean score 4.09/5, standard deviation 0.64). The standardized TOA policy and process were well received by pharmacists, and resulted in consistent TOA documentation and a TOA documentation rate that exceeded the institutional target.
Cloud-Based Tools to Support High-Resolution Modeling (Invited)
NASA Astrophysics Data System (ADS)
Jones, N.; Nelson, J.; Swain, N.; Christensen, S.
2013-12-01
The majority of watershed models developed to support decision-making by water management agencies are simple, lumped-parameter models. Maturity in research codes and advances in computational power from multi-core processors on desktop machines, commercial cloud-computing resources, and supercomputers with thousands of cores have created new opportunities for employing more accurate, high-resolution distributed models for routine use in decision support. The barriers to using such models on a more routine basis include the massive amounts of spatial data that must be processed for each new scenario and the lack of efficient visualization tools. In this presentation we will review a current NSF-funded project called CI-WATER that is intended to overcome many of these roadblocks associated with high-resolution modeling. We are developing a suite of tools that will make it possible to deploy customized web-based apps for running custom scenarios for high-resolution models with minimal effort. These tools are based on a software stack that includes 52 North, MapServer, PostGIS, HTCondor, CKAN, and Python. This open source stack provides a simple scripting environment for quickly configuring new custom applications for running high-resolution models as geoprocessing workflows. The HTCondor component facilitates simple access to local distributed computers or commercial cloud resources when necessary for stochastic simulations. The CKAN framework provides a powerful suite of tools for hosting such workflows in a web-based environment that includes visualization tools and storage of model simulations in a database for archiving, querying, and sharing of model results. Prototype applications including land use change, snow melt, and burned area analysis will be presented. This material is based upon work supported by the National Science Foundation under Grant No. 1135482.
Whole genome sequencing: an efficient approach to ensuring food safety
NASA Astrophysics Data System (ADS)
Lakicevic, B.; Nastasijevic, I.; Dimitrijevic, M.
2017-09-01
Whole genome sequencing (WGS) is an effective, powerful tool that can be applied to a wide range of public health and food safety applications. A major difference between WGS and traditional typing techniques is that WGS allows all genes to be included in the analysis, instead of a well-defined subset of genes or variable intergenic regions. WGS can also facilitate understanding of the contamination and colonization routes of foodborne pathogens within the food production environment, and can afford efficient tracking of pathogens’ entry routes and distribution from farm to consumer. Tracking foodborne pathogens along the food processing, distribution, retail and consumer continuum is of the utmost importance for facilitating outbreak investigations and for rapid action in controlling and preventing foodborne outbreaks. Therefore, WGS will likely replace most of the numerous workflows used in public health laboratories to characterize foodborne pathogens with one consolidated, efficient workflow.
Data Management and Archiving - a Long Process
NASA Astrophysics Data System (ADS)
Gebauer, Petra; Bertelmann, Roland; Hasler, Tim; Kirchner, Ingo; Klump, Jens; Mettig, Nora; Peters-Kottig, Wolfgang; Rusch, Beate; Ulbricht, Damian
2014-05-01
Implementing policies for research data management, through to data archiving, at university institutions takes a long time. Even though, especially in the geosciences, most scientists are familiar with analyzing different sorts of data, presenting statistical results and writing publications, sometimes based on big data records, only some of them manage their data in a standardized manner. Much more often they have learned how to measure and generate large volumes of data than how to document these measurements and preserve them for the future. Changing staff and limited funding make this work more difficult, but it is essential in a progressively developing digital and networked world. Results from the project EWIG (in English: Developing workflow components for long-term archiving of research data in geosciences), funded by the Deutsche Forschungsgemeinschaft, will help address this issue. Together with the project partners Deutsches GeoForschungsZentrum Potsdam and Konrad-Zuse-Zentrum für Informationstechnik Berlin, a workflow to transfer continuously recorded data from a meteorological city monitoring network into a long-term archive was developed. This workflow includes quality assurance of the data, description of metadata, and tools to prepare data packages for long-term archiving. It will serve as an exemplary model for other institutions working with similar data. The development of this workflow is closely intertwined with the educational curriculum at the Institut für Meteorologie. Designing modules to run quality checks on meteorological time series measured every minute and preparing metadata are tasks in current bachelor's theses. Students will also test the usability of the generated working environment. Based on these experiences, a practical guideline for integrating research data management into curricula, for postgraduates as well as for younger students, will be one of the results of this project.
Especially at the beginning of the scientific career it is necessary to become familiar with all issues concerning data management. The outcomes of EWIG are intended to be generic enough to be easily adopted by other institutions. University lectures in meteorology were started to teach future scientific generations right from the start how to deal with all sorts of different data in a transparent way. The progress of the project EWIG can be followed on the web via ewig.gfz-potsdam.de
Knowledge Annotations in Scientific Workflows: An Implementation in Kepler
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gandara, Aida G.; Chin, George; Pinheiro Da Silva, Paulo
2011-07-20
Scientific research products are the result of long-term collaborations between teams. Scientific workflows are capable of helping scientists in many ways, including the collection of information as to how research was conducted; e.g. scientific workflow tools often collect and manage information about datasets used and data transformations. However, knowledge about why data was collected is rarely documented in scientific workflows. In this paper we describe a prototype system built to support the collection of scientific expertise that influences scientific analysis. Through evaluating a scientific research effort underway at Pacific Northwest National Laboratory, we identified features that would most benefit PNNL scientists in documenting how and why they conduct their research, making this information available to the entire team. The prototype system was built by enhancing the Kepler Scientific Workflow System to create knowledge-annotated scientific workflows and to publish them as semantic annotations.
Towards Dynamic Authentication in the Grid — Secure and Mobile Business Workflows Using GSet
NASA Astrophysics Data System (ADS)
Mangler, Jürgen; Schikuta, Erich; Witzany, Christoph; Jorns, Oliver; Ul Haq, Irfan; Wanek, Helmut
Until now, the research community has mainly focused on the technical aspects of Grid computing and neglected commercial issues. Recently, however, the community has begun to accept that the success of the Grid depends crucially on commercial exploitation. In our vision, Foster's and Kesselman's statement "The Grid is all about sharing." has to be extended by "... and making money out of it!". To realize this vision, the trustworthiness of the underlying technology needs to be ensured. This can be achieved by using gSET (Gridified Secure Electronic Transaction) as a basic technology for trust management and secure accounting in the presented Grid-based workflow. We present a framework, conceptually and technically, from the area of the Mobile Grid, which shows that the Grid infrastructure is a viable platform for commercially successful business workflows.
Clinic Workflow Simulations using Secondary EHR Data
Hribar, Michelle R.; Biermann, David; Read-Brown, Sarah; Reznick, Leah; Lombardi, Lorinna; Parikh, Mansi; Chamberlain, Winston; Yackel, Thomas R.; Chiang, Michael F.
2016-01-01
Clinicians today face increased patient loads, decreased reimbursements and potential negative productivity impacts of using electronic health records (EHR), but have little guidance on how to improve clinic efficiency. Discrete event simulation models are powerful tools for evaluating clinical workflow and improving efficiency, particularly when they are built from secondary EHR timing data. The purpose of this study is to demonstrate that these simulation models can be used for resource allocation decision making as well as for evaluating novel scheduling strategies in outpatient ophthalmology clinics. Key findings from this study are that: 1) secondary use of EHR timestamp data in simulation models represents clinic workflow, 2) simulations provide insight into the best allocation of resources in a clinic, 3) simulations provide critical information for schedule creation and decision making by clinic managers, and 4) simulation models built from EHR data are potentially generalizable. PMID:28269861
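As a minimal sketch of the kind of discrete event simulation described above (not the authors' model), the following Python code moves patients through a sequence of clinic stations, for example a technician workup followed by a physician exam, each staffed by a configurable number of people. The per-patient service times stand in for durations that, per the abstract, would be derived from EHR timestamps; here they are supplied directly and are hypothetical. Comparing mean waits under different staffing levels is one way such a model can inform resource allocation.

```python
import heapq
from collections import deque
from statistics import mean


def simulate_clinic(arrivals, durations, capacity):
    """Event-driven simulation of patients moving through clinic stations in order.

    arrivals:  arrival time (minutes) of each patient
    durations: per patient, the service time (minutes) at each station
    capacity:  number of staff at each station, e.g. [2, 1]
    Returns the mean total time patients spend waiting in queues.
    """
    busy = [0] * len(capacity)
    queues = [deque() for _ in capacity]
    waits = [0.0] * len(arrivals)
    queued_at = {}
    events = []  # min-heap of (time, tie_breaker, kind, patient, station)
    counter = 0

    def schedule(time, kind, patient, station):
        nonlocal counter
        heapq.heappush(events, (time, counter, kind, patient, station))
        counter += 1

    def start_service(time, patient, station):
        busy[station] += 1
        waits[patient] += time - queued_at[(patient, station)]
        schedule(time + durations[patient][station], "finish", patient, station)

    for patient, t in enumerate(arrivals):
        schedule(t, "arrive", patient, 0)

    while events:
        time, _, kind, patient, station = heapq.heappop(events)
        if kind == "arrive":
            queued_at[(patient, station)] = time
            if busy[station] < capacity[station]:
                start_service(time, patient, station)
            else:
                queues[station].append(patient)
        else:  # "finish": free the staff member, serve the next in line
            busy[station] -= 1
            if queues[station]:
                start_service(time, queues[station].popleft(), station)
            if station + 1 < len(capacity):
                schedule(time, "arrive", patient, station + 1)
    return mean(waits)


# Hypothetical schedule: three patients booked at the same time,
# 10 min with a technician, then 5 min with the physician.
arrivals = [0, 0, 0]
durations = [[10, 5], [10, 5], [10, 5]]
print(simulate_clinic(arrivals, durations, [1, 1]))  # one technician
print(simulate_clinic(arrivals, durations, [2, 1]))  # two technicians
```

A fuller model would sample service times from empirical EHR-derived distributions and average over many runs; the deterministic inputs here just keep the mechanics visible.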
BioPartsDB: a synthetic biology workflow web-application for education and research.
Stracquadanio, Giovanni; Yang, Kun; Boeke, Jef D; Bader, Joel S
2016-11-15
Synthetic biology has become a widely used technology, and expanding applications in research, education and industry require progress tracking for team-based DNA synthesis projects. Although some vendors are beginning to supply multi-kilobase sequence-verified constructs, synthesis workflows starting with short oligos remain important for cost savings and pedagogical benefit. We developed BioPartsDB as an open source, extendable workflow management system for synthetic biology projects, with entry points for oligos and larger DNA constructs and ending with sequence-verified clones. BioPartsDB is released under the MIT license and available for download at https://github.com/baderzone/biopartsdb. Additional documentation and video tutorials are available at https://github.com/baderzone/biopartsdb/wiki. An Amazon Web Services image is available from the AWS Marketplace (ami-a01d07c8). Contact: joel.bader@jhu.edu. © The Author 2016. Published by Oxford University Press.
Integrating pathology and radiology disciplines: an emerging opportunity?
2012-01-01
Pathology and radiology form the core of cancer diagnosis, yet the workflows of both specialties remain ad hoc and occur in separate "silos," with no direct linkage between their case accessioning and/or reporting systems, even when both departments belong to the same host institution. Because both radiologists' and pathologists' data are essential to making correct diagnoses and appropriate patient management and treatment decisions, this isolation of radiology and pathology workflows can be detrimental to the quality and outcomes of patient care. These detrimental effects underscore the need for pathology and radiology workflow integration and for systems that facilitate the synthesis of all data produced by both specialties. With the enormous technological advances currently occurring in both fields, the opportunity has emerged to develop an integrated diagnostic reporting system that supports both specialties and, therefore, improves the overall quality of patient care. PMID:22950414
Using AI and Semantic Web Technologies to attack Process Complexity in Open Systems
NASA Astrophysics Data System (ADS)
Thompson, Simon; Giles, Nick; Li, Yang; Gharib, Hamid; Nguyen, Thuc Duong
Recently, many vendors and groups have advocated using BPEL and WS-BPEL as a workflow language to encapsulate business logic. While encapsulating workflow and process logic in one place is a sensible architectural decision, the implementation of complex workflows suffers from the same problems that made hierarchical procedural programs difficult to manage and maintain. BPEL lacks constructs for logical modularity, such as the requirements construct from the STL [12] or the ability to adapt constructs like pure abstract classes for the same purpose. We describe a system that uses semantic web and agent concepts to implement an abstraction layer for BPEL based on the notion of Goals and service typing. AI planning was used to enable process engineers to create and validate systems that used services and goals as first-class concepts and compiled processes at run time for execution.
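The abstract does not detail the planner. As a deliberately simplified, hypothetical illustration of composing a process from typed services and a goal, the Python sketch below treats each service as consuming and producing named data types and runs a breadth-first forward search for the shortest sequence of services that makes the goal type available. Real AI planners, and the authors' system, are considerably richer; the service names and types here are invented for illustration.

```python
from collections import deque
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

# A service is typed by the data types it consumes and produces.
Service = Tuple[FrozenSet[str], FrozenSet[str]]  # (inputs, outputs)


def plan(services: Dict[str, Service], initial: Set[str], goal: str) -> Optional[List[str]]:
    """Breadth-first forward search for the shortest service sequence
    that makes `goal` available, starting from the `initial` data types."""
    start = frozenset(initial)
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, steps = frontier.popleft()
        if goal in state:
            return steps
        for name in sorted(services):  # sorted so plans are deterministic
            inputs, outputs = services[name]
            # Applicable if all inputs are available and it adds something new.
            if inputs <= state and not outputs <= state:
                nxt = state | outputs
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, steps + [name]))
    return None  # goal unreachable with these services


# Hypothetical service catalogue.
SERVICES = {
    "lookupCustomer": (frozenset({"CustomerId"}), frozenset({"Address"})),
    "checkCredit": (frozenset({"CustomerId"}), frozenset({"CreditOk"})),
    "scheduleDelivery": (frozenset({"Address", "CreditOk"}), frozenset({"DeliveryDate"})),
}

print(plan(SERVICES, {"CustomerId"}, "DeliveryDate"))
```

Because the plan is computed at run time from service types, adding or replacing a service changes the compiled process without editing a hand-written flow.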
Gough, Albert; Shun, Tongying; Taylor, D. Lansing; Schurdak, Mark
2016-01-01
Heterogeneity is well recognized as a common property of cellular systems that impacts biomedical research and the development of therapeutics and diagnostics. Several studies have shown that analysis of heterogeneity gives insight into the mechanisms of action of perturbagens, can be used to predict optimal combination therapies, and can quantify heterogeneity in tumors, where heterogeneity is believed to be associated with adaptation and resistance. Cytometry methods including high content screening (HCS), high throughput microscopy, flow cytometry, mass spectrometry imaging and digital pathology capture cell-level data for populations of cells. However, it is often assumed that the population response is normally distributed and therefore that the average adequately describes the results. A deeper understanding of the measurements and a more effective comparison of perturbagen effects require analysis that takes into account the distribution of the measurements, i.e. the heterogeneity. However, the reproducibility of heterogeneous data collected on different days and in different plates/slides has not previously been evaluated. Here we show that conventional assay quality metrics alone are not adequate for quality control of the heterogeneity in the data. To address this need, we demonstrate the use of the Kolmogorov-Smirnov statistic as a metric for monitoring the reproducibility of heterogeneity in an SAR screen and describe a workflow for quality control in heterogeneity analysis. One major challenge in high throughput biology is the evaluation and interpretation of heterogeneity in thousands of samples, such as compounds in a cell-based screen. In this study we also demonstrate that three previously reported heterogeneity indices capture the shapes of the distributions and provide a means to filter and browse big data sets of cellular distributions in order to compare and identify distributions of interest.
These metrics and methods are presented as a workflow for analysis of heterogeneity in large scale biology projects. PMID:26476369
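The two-sample Kolmogorov-Smirnov statistic mentioned above is the largest vertical gap between two empirical cumulative distribution functions, D = max over x of |F_a(x) - F_b(x)|. A minimal pure-Python sketch follows; in practice `scipy.stats.ks_2samp` computes the same statistic along with a p-value. The threshold in `flag_irreproducible` and the per-cell values are hypothetical placeholders, since the abstract does not state the cut-off used in the screen.

```python
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic D = max_x |F_a(x) - F_b(x)|."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    # Both empirical CDFs only change at observed values, so checking
    # every observation is enough to find the supremum.
    for x in a + b:
        fa = bisect_right(a, x) / len(a)  # fraction of a that is <= x
        fb = bisect_right(b, x) / len(b)  # fraction of b that is <= x
        d = max(d, abs(fa - fb))
    return d


def flag_irreproducible(run1, run2, threshold=0.2):
    """Flag a pair of replicate distributions whose KS distance exceeds a
    (hypothetical) threshold, e.g. the same control on two different days."""
    return ks_statistic(run1, run2) > threshold


# Hypothetical per-cell intensities for one control well on two days.
day1 = [0.9, 1.1, 1.0, 1.3, 0.8, 1.2]
day2 = [1.0, 1.2, 0.9, 1.4, 0.85, 1.25]
print(ks_statistic(day1, day2), flag_irreproducible(day1, day2))
```

Unlike a comparison of plate means, D responds to any change in distribution shape, which is why it suits quality control of heterogeneity rather than of averages.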
Workflow Optimization in Vertebrobasilar Occlusion
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kamper, Lars, E-mail: lars.kamper@helios-kliniken.de; Meyn, Hannes; Rybacki, Konrad
2012-06-15
Objective: In vertebrobasilar occlusion, rapid recanalization is the only substantial means of improving the prognosis. We introduced a standard operating procedure (SOP) for interventional therapy and analyzed its effects on interdisciplinary time management. Methods: Intrahospital time periods between hospital admission and neuroradiological intervention were retrospectively analyzed, together with the patients' outcome, before (n = 18) and after (n = 20) implementation of the SOP. Results: After implementation of the SOP, we observed a statistically significant improvement in postinterventional patient neurological status (p = 0.017). In addition, the mean time period from hospital admission until neuroradiological intervention decreased by 5:33 h. The recanalization rate increased from 72.2% to 80% after implementation of the SOP. Conclusion: Our results underscore the relevance of SOP implementation and analysis of time management for clinical workflow optimization. Both may raise awareness of the need for efficient interdisciplinary time management. This could explain the decreased time periods and improved postinterventional patient status after SOP implementation.
ERIC Educational Resources Information Center
Pardos, Zachary A.; Whyte, Anthony; Kao, Kevin
2016-01-01
In this paper, we address issues of transparency, modularity, and privacy with the introduction of an open source, web-based data repository and analysis tool tailored to the Massive Open Online Course community. The tool integrates data request/authorization and distribution workflow features and provides a simple analytics module upload…
Problem Management Module: An Innovative System to Improve Problem List Workflow
Hodge, Chad M.; Kuttler, Kathryn G.; Bowes, Watson A.; Narus, Scott P.
2014-01-01
Electronic problem lists are essential to modern health record systems, with a primary goal to serve as the repository of a patient’s current health issues. Additionally, coded problems can be used to drive downstream activities such as decision support, evidence-based medicine, billing, and cohort generation for research. Meaningful Use also requires use of a coded problem list. Over the course of three years, Intermountain Healthcare developed a problem management module (PMM) that provided innovative functionality to improve clinical workflow and boost problem list adoption, e.g. smart search, user customizable views, problem evolution, and problem timelines. In 23 months of clinical use, clinicians entered over 70,000 health issues, the percentage of free-text items dropped to 1.2%, completeness of problem list items increased by 14%, and more collaborative habits were initiated. PMID:25954372
DOE Office of Scientific and Technical Information (OSTI.GOV)
Harvey, Dustin Yewell
This document is a white paper marketing proposal for Echo™, a data analysis platform designed for efficient, robust, and scalable creation and execution of complex workflows. Echo’s analysis management system provides the ability to track, understand, and reproduce the workflows used to arrive at results and decisions. Echo improves on traditional scripted data analysis in MATLAB, Python, R, and other languages to allow analysts to make better use of their time. Additionally, the Echo platform provides a powerful data management and curation solution that allows analysts to quickly find, access, and consume datasets. After two years of development and a first release in early 2016, Echo is now available for use with many data types in a wide range of application domains. Echo provides tools that allow users to focus on data analysis and decisions with confidence that results are reported accurately.
Mobile task management tool that improves workflow of an acute general surgical service.
Foo, Elizabeth; McDonald, Rod; Savage, Earle; Floyd, Richard; Butler, Anthony; Rumball-Smith, Alistair; Connor, Saxon
2015-10-01
Understanding and being able to measure constraints within a health system is crucial if outcomes are to be improved. Current systems lack the ability to capture decision making with regard to tasks performed within a patient journey. The aim of this study was to assess the impact of a mobile task management tool on clinical workflow within an acute general surgical service by analysing data capture and usability of the application tool. The Cortex iOS application was developed to digitize patient flow and provide real-time visibility over clinical decision making and task performance. Study outcomes measured were workflow data capture for patient and staff events. Usability was assessed using an electronic survey. There were 449 unique patient journeys tracked with a total of 3072 patient events recorded. The results repository was accessed 7792 times. The participants reported that the application sped up decision making, reduced redundancy of work and improved team communication. The mode of the estimated time the application saved participants was 5-9 min/h of work. Of the 14 respondents, nine discarded their analogue methods of tracking tasks by the end of the study period. The introduction of a mobile task management system improved the working efficiency of junior clinical staff. The application allowed capture of data not previously available to hospital systems. In the future, such data will contribute to the accurate mapping of patient journeys through the health system. © 2015 Royal Australasian College of Surgeons.
MouseNet database: digital management of a large-scale mutagenesis project.
Pargent, W; Heffner, S; Schäble, K F; Soewarto, D; Fuchs, H; Hrabé de Angelis, M
2000-07-01
The Munich ENU Mouse Mutagenesis Screen is a large-scale mutant production, phenotyping, and mapping project. It encompasses two animal breeding facilities and a number of screening groups located in the general area of Munich. A central database is required to manage and process the immense amount of data generated by the mutagenesis project. This database, which we named MouseNet(c), runs on a Sybase platform and will eventually store and process all data from the entire project. In addition, the system comprises a portfolio of functions needed to support the workflow management of the core facility and the screening groups. MouseNet(c) will make all of the data available to the participating screening groups, and later to the international scientific community. MouseNet(c) will consist of three major software components:
* Animal Management System (AMS)
* Sample Tracking System (STS)
* Result Documentation System (RDS)
MouseNet(c) provides the following major advantages:
* being accessible from different client platforms via the Internet
* being a full-featured multi-user system (including access restriction and data locking mechanisms)
* relying on a professional RDBMS (relational database management system) which runs on a UNIX server platform
* supplying workflow functions and a variety of plausibility checks.
Night of the living color: horror scenarios in color management land
NASA Astrophysics Data System (ADS)
Lammens, Johan M.
1998-12-01
ICC-based color management is becoming increasingly feasible and is picking up support from all the major high-end design and pre-press applications as well as hardware manufacturers. In addition, the new sRGB standard is emerging as a way to effectively do 'color management for the masses', and is being supported by many leading manufacturers as well. While serious technical issues certainly remain to be addressed for both ICC and sRGB color management, the main problem users face today seems to be how to integrate all components of their workflow into a seamless system, and how to configure each component to work well with all the others. This paper takes a brief look at the history of color management from a workflow perspective, and attempts to analyze how composing and configuring a series of color conversions can become a terrifying nightmare. Some of the many ways to get the wrong results are briefly illustrated, as well as a few ways to get the right results. Finally, some technical recommendations are offered for how to improve the situation from a user's point of view.
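As one concrete, hedged illustration of how composing color conversions can go wrong, the Python sketch below applies the standard sRGB transfer function (IEC 61966-2-1) to a linear-light value. In a misconfigured pipeline where two components each assume the data has not yet been encoded, the encoding is applied twice and midtones come out noticeably too light. The two pipelines are hypothetical and are not taken from the paper.

```python
def srgb_encode(linear: float) -> float:
    """Linear-light value in [0, 1] -> sRGB-encoded value (IEC 61966-2-1)."""
    if linear <= 0.0031308:
        return 12.92 * linear
    return 1.055 * linear ** (1 / 2.4) - 0.055


def srgb_decode(encoded: float) -> float:
    """sRGB-encoded value in [0, 1] -> linear-light value (inverse of srgb_encode)."""
    if encoded <= 0.04045:
        return encoded / 12.92
    return ((encoded + 0.055) / 1.055) ** 2.4


def apply_pipeline(value, steps):
    """Run a value through a chain of conversion steps, in order."""
    for step in steps:
        value = step(value)
    return value


# Hypothetical pipelines: one component encodes (correct) vs. two components
# each encoding the same data (a common misconfiguration).
CORRECT = [srgb_encode]
MISCONFIGURED = [srgb_encode, srgb_encode]

linear_value = 0.2
print(apply_pipeline(linear_value, CORRECT))        # intended encoding
print(apply_pipeline(linear_value, MISCONFIGURED))  # noticeably lighter
```

Each step is individually correct; the error comes only from how the steps are composed, which is the kind of integration problem the abstract describes.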
Galaxy tools and workflows for sequence analysis with applications in molecular plant pathology
Grüning, Björn A.; Paszkiewicz, Konrad; Pritchard, Leighton
2013-01-01
The Galaxy Project offers the popular web browser-based platform Galaxy for running bioinformatics tools and constructing simple workflows. Here, we present a broad collection of additional Galaxy tools for large-scale analysis of gene and protein sequences. The motivating research theme is the identification of specific genes of interest in a range of non-model organisms, and our central example is the identification and prediction of “effector” proteins produced by plant pathogens in order to manipulate their host plant. This functional annotation of a pathogen’s predicted capacity for virulence is a key step in translating sequence data into potential applications in plant pathology. This collection includes novel tools, and widely-used third-party tools such as NCBI BLAST+ wrapped for use within Galaxy. Individual bioinformatics software tools are typically available separately as standalone packages, or in online browser-based form. The Galaxy framework enables the user to combine these and other tools to automate organism-scale analyses as workflows, without requiring familiarity with command line tools and scripting. Workflows created using Galaxy can be saved and are reusable, so they may be distributed within and between research groups, facilitating the construction of a set of standardised, reusable bioinformatic protocols. The Galaxy tools and workflows described in this manuscript are open source and freely available from the Galaxy Tool Shed (http://usegalaxy.org/toolshed or http://toolshed.g2.bx.psu.edu). PMID:24109552
AutoDrug: fully automated macromolecular crystallography workflows for fragment-based drug discovery
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tsai, Yingssu; Stanford University, 333 Campus Drive, Mudd Building, Stanford, CA 94305-5080; McPhillips, Scott E.
New software has been developed for automating the experimental and data-processing stages of fragment-based drug discovery at a macromolecular crystallography beamline. A new workflow-automation framework orchestrates beamline-control and data-analysis software while organizing results from multiple samples. AutoDrug is software based upon the scientific workflow paradigm that integrates the Stanford Synchrotron Radiation Lightsource macromolecular crystallography beamlines and third-party processing software to automate the crystallography steps of the fragment-based drug-discovery process. AutoDrug screens a cassette of fragment-soaked crystals, selects crystals for data collection based on screening results and user-specified criteria and determines optimal data-collection strategies. It then collects and processes diffraction data, performs molecular replacement using provided models and detects electron density that is likely to arise from bound fragments. All processes are fully automated, i.e. are performed without user interaction or supervision. Samples can be screened in groups corresponding to particular proteins, crystal forms and/or soaking conditions. A single AutoDrug run is limited only by the capacity of the sample-storage dewar at the beamline: currently 288 samples. AutoDrug was developed in conjunction with RestFlow, a new scientific workflow-automation framework. RestFlow simplifies the design of AutoDrug by managing the flow of data and the organization of results and by orchestrating the execution of computational pipeline steps. It also simplifies the execution and interaction of third-party programs and the beamline-control system. Modeling AutoDrug as a scientific workflow enables multiple variants that meet the requirements of different user groups to be developed and supported. A workflow tailored to mimic the crystallography stages comprising the drug-discovery pipeline of CoCrystal Discovery Inc. has been deployed and successfully demonstrated. This workflow was run once on the same 96 samples that the group had examined manually, and the workflow cycled successfully through all of the samples, collected data from the same samples that were selected manually and located the same peaks of unmodeled density in the resulting difference Fourier maps.
NASA Astrophysics Data System (ADS)
Foglini, Federica; Grande, Valentina; De Leo, Francesco; Mantovani, Simone; Ferraresi, Sergio
2017-04-01
EVER-EST offers a framework based on advanced services delivered both at the e-infrastructure and domain-specific level, with the objective of supporting each phase of the Earth Science Research and Information Lifecycle. It provides innovative e-research services to Earth Science user communities for communication, cross-validation and the sharing of knowledge and science outputs. The project follows a user-centric approach: real use cases taken from pre-selected Virtual Research Communities (VRC) covering different Earth Science research scenarios drive the implementation of the Virtual Research Environment (VRE) services and capabilities. The Sea Monitoring community is involved in the evaluation of the EVER-EST infrastructure. The community of potential users is wide and heterogeneous, including both multi-disciplinary scientists and national/international agencies and authorities (e.g. MPA directors, technicians from regional agencies like ARPA in Italy, and technicians working for the Ministry of the Environment) who are working to adopt better ways of measuring the quality of the environment. The scientific community has the main role of assessing the best criteria and indicators for defining Good Environmental Status (GES) in their own sub-regions, and of implementing methods, protocols and tools for monitoring the GES descriptors. According to the Marine Strategy Framework Directive (MSFD), the environmental status of marine waters is defined by 11 descriptors, which are associated with a proposed set of 29 criteria and 56 indicators. The objective of the Sea Monitoring VRC is to provide useful and applicable contributions to the evaluation of the descriptors D1 Biodiversity, D2 Non-indigenous species and D6 Seafloor Integrity (http://ec.europa.eu/environment/marine/good-environmental-status/index_en.htm). The main challenges for the community members are:
1. discovery of existing data and products distributed among different infrastructures;
2. sharing methodologies for GES evaluation and monitoring;
3. working on the same workflows and data;
4. adopting shared, powerful tools for data processing (e.g. software and servers).
The Sea Monitoring portal provides the VRC users with tools and services aimed at enhancing their ability to interoperate and share knowledge, experience and methods for GES assessment and monitoring, such as:
• digital information services for data management, exploitation and preservation (accessibility of heterogeneous data sources including associated documentation);
• e-collaboration services to communicate and share knowledge, ideas, protocols and workflows;
• e-learning services to facilitate the use of common workflows for assessing GES indicators;
• e-research services for workflow management, validation and verification, as well as visualization and interactive services.
The current study is co-financed by the European Union's Horizon 2020 research and innovation programme under the EVER-EST project (Grant Agreement No. 674907).
Science Gateways, Scientific Workflows and Open Community Software
NASA Astrophysics Data System (ADS)
Pierce, M. E.; Marru, S.
2014-12-01
Science gateways and scientific workflows occupy different ends of the spectrum of user-focused cyberinfrastructure. Gateways, sometimes called science portals, enable large numbers of users to take advantage of advanced computing resources (supercomputers, advanced storage systems, science clouds) by providing Web and desktop interfaces and supporting services. Scientific workflows, at the other end of the spectrum, support advanced uses of cyberinfrastructure, enabling "power users" to undertake computational experiments that are not easily done through the usual mechanisms (managing simulations across multiple sites, for example). Despite these different target communities, gateways and workflows share many similarities and can potentially be accommodated by the same software system. For example, pipelines to process InSAR imagery sets or to datamine GPS time series data are workflows. The results and the ability to make downstream products may be made available through a gateway, and power users may want to provide their own custom pipelines. In this abstract, we discuss our efforts to build an open source software system, Apache Airavata, that can accommodate both gateway and workflow use cases. Our approach is general, and we have applied the software to problems in a number of scientific domains. In this talk, we discuss our applications to usage scenarios specific to earth science, focusing on earthquake physics examples drawn from the QuakeSim.org and GeoGateway.org efforts. We also examine the role of the Apache Software Foundation's open community model as a way to build up common community codes that do not depend upon a single "owner" to sustain them. Pushing beyond open source software, we also see the need to provide gateways and workflow systems as cloud services. These services centralize operations, provide well-defined programming interfaces, scale elastically, and have global-scale fault tolerance.
We discuss our work providing Apache Airavata as a hosted service to provide these features.
CRAB3: Establishing a new generation of services for distributed analysis at CMS
NASA Astrophysics Data System (ADS)
Cinquilli, M.; Spiga, D.; Grandi, C.; Hernàndez, J. M.; Konstantinov, P.; Mascheroni, M.; Riahi, H.; Vaandering, E.
2012-12-01
In CMS Computing the highest priorities for analysis tools are improving the end users’ ability to produce and publish reliable samples and analysis results, and transitioning to a sustainable development and operations model. To achieve these goals CMS decided to incorporate analysis processing into the same framework as data and simulation processing. This strategy foresees that all workload tools (Tier0, Tier1, production, analysis) share a common core, with long-term maintainability and standardized operator interfaces. The re-engineered analysis workload manager, called CRAB3, makes use of newer technologies, such as RESTful web services and NoSQL databases, aiming to increase the scalability and reliability of the system. As opposed to CRAB2, in CRAB3 all work is centrally injected and managed in a global queue. A pool of agents, which can be geographically distributed, consumes work from the central services serving the user tasks. The new architecture of CRAB substantially changes the deployment model and operations activities. In this paper we present the implementation of CRAB3, emphasizing how the new architecture improves workflow automation and simplifies maintainability. In particular, we highlight the impact of the new design on daily operations.
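The central-queue model described above, in which work is injected once into a global queue and then consumed by a pool of agents, can be sketched in a few lines. This is a minimal illustration assuming in-process threads stand in for geographically distributed agents and a standard `queue.Queue` stands in for the central REST/NoSQL services; the helper names `inject_tasks`, `agent` and `run` are hypothetical and not part of CRAB3.

```python
import queue
import threading
from typing import List


def inject_tasks(global_queue: "queue.Queue[str]", tasks: List[str]) -> None:
    """Centrally inject user tasks into the global queue (hypothetical helper)."""
    for task in tasks:
        global_queue.put(task)


def agent(name: str, global_queue: "queue.Queue[str]", results: List[str],
          lock: threading.Lock) -> None:
    """One agent: keep pulling tasks from the shared queue until it is empty."""
    while True:
        try:
            task = global_queue.get_nowait()
        except queue.Empty:
            return
        # Placeholder for submitting the task to a batch/grid system and tracking it.
        with lock:
            results.append(f"{task} done by {name}")
        global_queue.task_done()


def run(tasks: List[str], n_agents: int = 3) -> List[str]:
    """Inject all tasks, then let a pool of agents drain the global queue."""
    global_queue: "queue.Queue[str]" = queue.Queue()
    inject_tasks(global_queue, tasks)
    results: List[str] = []
    lock = threading.Lock()
    workers = [threading.Thread(target=agent,
                                args=(f"agent-{i}", global_queue, results, lock))
               for i in range(n_agents)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return results


if __name__ == "__main__":
    for line in run([f"task-{i}" for i in range(5)]):
        print(line)
```

In this sketch agents only pull work, so adding capacity means starting more agents; the central queue never needs to know where they run.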
Quality data collection and management technology of aerospace complex product assembly process
NASA Astrophysics Data System (ADS)
Weng, Gang; Liu, Jianhua; He, Yongxi; Zhuang, Cunbo
2017-04-01
Aiming to solve the problems of difficult management and poor traceability of discrete assembly process quality data, a data collection and management method is proposed that takes the assembly process and the BOM as its core. The method includes a data collection approach based on workflow technology, a data model based on the BOM, and quality traceability for the assembly process. Finally, an assembly process quality data management system is developed, realizing effective control and management of quality information for the complex product assembly process.
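A data model that takes the BOM as its core can be illustrated with a small tree in which quality records attach to BOM nodes; tracing a defect then means walking from a part up to the final product. The sketch below assumes a simple parent-pointer BOM and hypothetical `BomNode` and `QualityRecord` types; it is not the system described in the paper.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class QualityRecord:
    step: str          # assembly process step that produced the record
    measurement: str   # free-text measurement description
    passed: bool


@dataclass
class BomNode:
    part_id: str
    parent: Optional["BomNode"] = None          # next level up in the BOM
    records: List[QualityRecord] = field(default_factory=list)


def trace_upward(node: Optional[BomNode]) -> List[str]:
    """Return the chain of part IDs from a component up to the top-level product."""
    chain: List[str] = []
    while node is not None:
        chain.append(node.part_id)
        node = node.parent
    return chain


def failed_steps(nodes: Dict[str, BomNode]) -> Dict[str, List[str]]:
    """Map each part that has a failed quality record to its failing process steps."""
    return {pid: [r.step for r in n.records if not r.passed]
            for pid, n in nodes.items()
            if any(not r.passed for r in n.records)}


if __name__ == "__main__":
    product = BomNode("PRODUCT-1")
    wing = BomNode("WING-A", parent=product)
    rivet = BomNode("RIVET-42", parent=wing)
    rivet.records.append(QualityRecord("riveting", "torque out of tolerance", passed=False))
    nodes = {n.part_id: n for n in (product, wing, rivet)}
    print(failed_steps(nodes))   # {'RIVET-42': ['riveting']}
    print(trace_upward(rivet))   # ['RIVET-42', 'WING-A', 'PRODUCT-1']
```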
Spjuth, Ola; Karlsson, Andreas; Clements, Mark; Humphreys, Keith; Ivansson, Emma; Dowling, Jim; Eklund, Martin; Jauhiainen, Alexandra; Czene, Kamila; Grönberg, Henrik; Sparén, Pär; Wiklund, Fredrik; Cheddad, Abbas; Pálsdóttir, Þorgerður; Rantalainen, Mattias; Abrahamsson, Linda; Laure, Erwin; Litton, Jan-Eric; Palmgren, Juni
2017-09-01
We provide an e-Science perspective on the workflow from risk factor discovery and classification of disease to evaluation of personalized intervention programs. As case studies, we use personalized prostate and breast cancer screenings. We describe an e-Science initiative in Sweden, e-Science for Cancer Prevention and Control (eCPC), which supports biomarker discovery and offers decision support for personalized intervention strategies. The generic eCPC contribution is a workflow with 4 nodes applied iteratively, and the concept of e-Science signifies systematic use of tools from the mathematical, statistical, data, and computer sciences. The eCPC workflow is illustrated through 2 case studies. For prostate cancer, an in-house personalized screening tool, the Stockholm-3 model (S3M), is presented as an alternative to prostate-specific antigen testing alone. S3M is evaluated in a trial setting and plans for rollout in the population are discussed. For breast cancer, new biomarkers based on breast density and molecular profiles are developed and the US multicenter Women Informed to Screen Depending on Measures (WISDOM) trial is referred to for evaluation. While current eCPC data management uses a traditional data warehouse model, we discuss eCPC-developed features of a coherent data integration platform. E-Science tools are a key part of an evidence-based process for personalized medicine. This paper provides a structured workflow from data and models to evaluation of new personalized intervention strategies. The importance of multidisciplinary collaboration is emphasized. Importantly, the generic concepts of the suggested eCPC workflow are transferable to other disease domains, although each disease will require tailored solutions. © The Author 2017. Published by Oxford University Press on behalf of the American Medical Informatics Association.
Schnabel, M; Mann, D; Efe, T; Schrappe, M; V Garrel, T; Gotzen, L; Schaeg, M
2004-10-01
The introduction of the German Diagnostic Related Groups (D-DRG) system requires redesigning administrative patient management strategies. Wrong coding leads to inaccurate grouping and endangers the reimbursement of treatment costs. This situation emphasizes the roles of documentation and coding as factors of economic success. The aims of this study were to assess the quantity and quality of initial documentation and coding (ICD-10 and OPS-301) and to find operative strategies to improve efficiency and strategic means to ensure optimal documentation and coding quality. In a prospective study, documentation and coding quality were evaluated in a standardized way by weekly assessment. Clinical data from 1385 inpatients were processed for initial correctness and quality of documentation and coding. Principal diagnoses were found to be accurate in 82.7% of cases, inexact in 7.1%, and wrong in 10.1%. Effects on financial returns occurred in 16%. Based on these findings, an optimized, interdisciplinary, and multiprofessional workflow on medical documentation, coding, and data control was developed. A workflow incorporating regular assessment of documentation and coding quality is required by the DRG system to ensure efficient accounting of hospital services. Interdisciplinary and multiprofessional cooperation is recognized to be an important factor in establishing an efficient workflow in medical documentation and coding.
Landman, Adam; Teich, Jonathan M; Pruitt, Peter; Moore, Samantha E; Theriault, Jennifer; Dorisca, Elizabeth; Harris, Sheila; Crim, Heidi; Lurie, Nicole; Goralnick, Eric
2015-07-01
Emergency department (ED) information systems are designed to support efficient and safe emergency care. These same systems often play a critical role in disasters to facilitate real-time situation awareness, information management, and communication. In this article, we describe one ED's experiences with ED information systems during the April 2013 Boston Marathon bombings. During postevent debriefings, staff shared that our ED information systems and workflow did not optimally support this incident; we found challenges with our unidentified patient naming convention, real-time situational awareness of patient location, and documentation of assessments, orders, and procedures. As a result, before our next mass gathering event, we changed our unidentified patient naming convention to more clearly distinguish multiple, simultaneous, unidentified patients. We also made changes to the disaster registration workflow and enhanced roles and responsibilities for updating electronic systems. Health systems should conduct disaster drills using their ED information systems to identify inefficiencies before an actual incident. ED information systems may require enhancements to better support disasters. Newer technologies, such as radiofrequency identification, could further improve disaster information management and communication but require careful evaluation and implementation into daily ED workflow. Copyright © 2014 American College of Emergency Physicians. Published by Elsevier Inc. All rights reserved.
NG6: Integrated next generation sequencing storage and processing environment.
Mariette, Jérôme; Escudié, Frédéric; Allias, Nicolas; Salin, Gérald; Noirot, Céline; Thomas, Sylvain; Klopp, Christophe
2012-09-09
Next generation sequencing platforms are now well established in sequencing centres and some laboratories. Upcoming smaller-scale machines such as the 454 Junior from Roche or the MiSeq from Illumina will increase the number of laboratories hosting a sequencer. In such a context, it is important to provide these teams with an easily manageable environment to store and process the produced reads. We describe a user-friendly information system able to manage large sets of sequencing data. It includes, on the one hand, a workflow environment already containing pipelines adapted to different input formats (sff, fasta, fastq and qseq), different sequencers (Roche 454, Illumina HiSeq) and various analyses (quality control, assembly, alignment, diversity studies, …) and, on the other hand, a secured web site giving access to the results. The connected user can download raw and processed data and browse through the analysis result statistics. The provided workflows can easily be modified or extended and new ones can be added. Ergatis is used as the workflow building, running and monitoring system. The analyses can be run locally or in a cluster environment using Sun Grid Engine. NG6 is a complete information system designed to answer the needs of a sequencing platform. It provides a user-friendly interface to process, store and download high-throughput sequencing data.
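As an example of the kind of quality-control step such pipelines run on fastq input, the sketch below parses four-line FASTQ records and computes each read's mean Phred score. It assumes Sanger/Illumina 1.8+ encoding (Phred+33) and unwrapped records; it is an illustrative stand-in, not an NG6 or Ergatis component.

```python
from typing import Iterable, Iterator, Tuple

PHRED_OFFSET = 33  # assumed encoding: Sanger / Illumina 1.8+ (Phred+33)


def read_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, sequence, quality) from unwrapped four-line FASTQ records."""
    it = (line.rstrip("\n") for line in lines)
    for header in it:
        if not header:
            continue  # tolerate blank lines between records
        try:
            seq, plus, qual = next(it), next(it), next(it)
        except StopIteration:
            raise ValueError(f"truncated record {header!r}") from None
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield header[1:], seq, qual


def mean_quality(qual: str) -> float:
    """Average Phred score of one read (0.0 for an empty read)."""
    if not qual:
        return 0.0
    return sum(ord(c) - PHRED_OFFSET for c in qual) / len(qual)


if __name__ == "__main__":
    records = ["@r1", "ACGT", "+", "IIII", "@r2", "ACGT", "+", "!!II"]
    for read_id, seq, qual in read_fastq(records):
        print(read_id, mean_quality(qual))  # r1 40.0, then r2 20.0
```

A real pipeline would typically also handle wrapped records and other quality encodings (e.g. the Phred+64 used by older Illumina qseq-era data), which this sketch deliberately ignores.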
Diabetes management using modern information and communication technologies and new care models.
Spanakis, Emmanouil G; Chiarugi, Franco; Kouroubali, Angelina; Spat, Stephan; Beck, Peter; Asanin, Stefan; Rosengren, Peter; Gergely, Tamas; Thestrup, Jesper
2012-10-04
Diabetes, a metabolic disorder, has reached epidemic proportions in developed countries. The disease has two main forms: type 1 and type 2. Disease management entails administration of insulin in combination with careful blood glucose monitoring (type 1) or involves the adjustment of diet and exercise level, the use of oral anti-diabetic drugs, and insulin administration to control blood sugar (type 2). State-of-the-art technologies have the potential to assist healthcare professionals, patients, and informal carers to better manage diabetes insulin therapy, help patients understand their disease, support self-management, and provide a safe environment by monitoring adverse and potentially life-threatening situations with appropriate crisis management. New care models incorporating advanced information and communication technologies have the potential to provide service platforms able to improve health care, personalization, inclusion, and empowerment of the patient, and to support diverse user preferences and needs in different countries. The REACTION project proposes to create a service-oriented architectural platform based on numerous individual services and implementing novel care models that can be deployed in different settings to perform patient monitoring, distributed decision support, health care workflow management, and clinical feedback provision. This paper presents the work performed in the context of the REACTION project focusing on the development of a health care service platform able to support diabetes management in different healthcare regimes, through clinical applications, such as monitoring of vital signs, feedback provision to the point of care, integrative risk assessment, and event and alarm handling. 
While moving towards the full implementation of the platform, three major areas of research and development have been identified and consequently approached: the first is related to glucose sensor technology and wearability, the second to the platform architecture, and the third to the implementation of the end-user services. The Glucose Management System, already developed within the REACTION project, is able to monitor a range of parameters from various sources, including glucose levels, nutritional intake, administered drugs, and the patient's insulin sensitivity, offering decision support for insulin dosing to professional caregivers on a mobile tablet platform that fulfills the needs of the users and supports medical workflow procedures in compliance with the Medical Device Directive requirements. Good control of diabetes, as well as increased emphasis on control of lifestyle factors, may reduce the risk profile of most complications and contribute to health improvement. The REACTION project aims to respond to these challenges by providing integrated, professional, management, and therapy services to diabetic patients in different health care regimes across Europe in an interoperable communication platform.
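The "event and alarm handling" service mentioned above can be illustrated with a rule that classifies glucose readings against thresholds and raises an event only when a reading moves into an out-of-range state. The thresholds below (54 and 70 mg/dL for low values, 180 mg/dL for high values) are assumptions borrowed from commonly used clinical target ranges, not values from the REACTION platform, and the sketch covers alarm detection only, not the insulin dosing support the abstract also describes.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Illustrative thresholds in mg/dL (assumed, not taken from REACTION).
SEVERE_LOW = 54
LOW = 70
HIGH = 180


@dataclass
class Reading:
    minute: int    # minutes since monitoring started
    mg_dl: float   # glucose concentration


def classify(value: float) -> str:
    """Map one glucose value to a coarse state."""
    if value < SEVERE_LOW:
        return "severe_low"
    if value < LOW:
        return "low"
    if value > HIGH:
        return "high"
    return "in_range"


def alarms(readings: List[Reading]) -> List[Tuple[int, str]]:
    """Raise an event only when the state changes into an out-of-range state."""
    events: List[Tuple[int, str]] = []
    previous = "in_range"
    for r in readings:
        state = classify(r.mg_dl)
        if state != previous and state != "in_range":
            events.append((r.minute, state))
        previous = state
    return events


if __name__ == "__main__":
    values = [100, 65, 60, 50, 120, 200]
    series = [Reading(5 * i, v) for i, v in enumerate(values)]
    print(alarms(series))  # [(5, 'low'), (15, 'severe_low'), (25, 'high')]
```

Suppressing repeated alarms for an unchanged state is one simple design choice for reducing alert fatigue; a deployed system would need clinically validated rules.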
Dollar, Daniel M; Gallagher, John; Glover, Janis; Marone, Regina Kenny; Crooker, Cynthia
2007-04-01
To support migration from print to electronic resources, the Cushing/Whitney Medical Library at Yale University reorganized its Technical Services Department to focus on managing electronic resources. The library hired consultants to help plan the changes and to present recommendations for integrating electronic resource management into every position. The library task force decided to focus initial efforts on the periodical collection. To free staff time to devote to electronic journals, most of the print subscriptions were switched to online only and new workflows were developed for e-journals. Staff learned new responsibilities such as activating e-journals, maintaining accurate holdings information in the online public access catalog and e-journals database ("electronic shelf reading"), updating the link resolver knowledgebase, and troubleshooting. All of the serials team members now spend significant amounts of time managing e-journals. The serials staff now spends its time managing the materials most important to the library's clientele (e-journals and databases). The team's proactive approach to maintenance work and rapid response to reported problems should improve patrons' experiences using e-journals. The library is taking advantage of new technologies such as an electronic resource management system, and library workflows and procedures will continue to evolve as technology changes.
Dollar, Daniel M.; Gallagher, John; Glover, Janis; Marone, Regina Kenny; Crooker, Cynthia
2007-01-01
Objective: To support migration from print to electronic resources, the Cushing/Whitney Medical Library at Yale University reorganized its Technical Services Department to focus on managing electronic resources. Methods: The library hired consultants to help plan the changes and to present recommendations for integrating electronic resource management into every position. The library task force decided to focus initial efforts on the periodical collection. To free staff time to devote to electronic journals, most of the print subscriptions were switched to online only and new workflows were developed for e-journals. Results: Staff learned new responsibilities such as activating e-journals, maintaining accurate holdings information in the online public access catalog and e-journals database (“electronic shelf reading”), updating the link resolver knowledgebase, and troubleshooting. All of the serials team members now spend significant amounts of time managing e-journals. Conclusions: The serials staff now spends its time managing the materials most important to the library's clientele (e-journals and databases). The team's proactive approach to maintenance work and rapid response to reported problems should improve patrons' experiences using e-journals. The library is taking advantage of new technologies such as an electronic resource management system, and library workflows and procedures will continue to evolve as technology changes. PMID:17443247
Tools, Techniques, and Training: Results of an E-Resources Troubleshooting Survey
ERIC Educational Resources Information Center
Rathmel, Angela; Mobley, Liisa; Pennington, Buddy; Chandler, Adam
2015-01-01
A primary role of any e-resources librarian or staff is troubleshooting electronic resources (e-resources). While much progress has been made in many areas of e-resources management (ERM) to understand the ERM lifecycle and to manage workflows, troubleshooting access remains a challenge. This collaborative study is the result of the well-received…
A Technology Solution Strengthens Comprehensive Environmental Management
2012-05-23
General Navigation; Chemical Approval Example; NEPA Coordination Example; Safety PPE Example; Summary. Marine Corps Support Facility ... coordination, completion and documentation through automated workflows of various business processes: Chemical Approval, NEPA Coordination, Safety ... Completion Diagram; Government Employee/MCMC; MCMC Chemical Manager; MCMC HS&E Specialist; IMO Chemical Safety Specialist; IMO Chemical Environmental
Checklist Manifesto for Electronic Resources: Getting Ready for the Fiscal Year and Beyond
ERIC Educational Resources Information Center
England, Lenore; Fu, Li; Miller, Stephen
2011-01-01
Organization of electronic resources workflow is critical in the increasingly complicated and complex world of library management. A simple organizational tool that can be readily applied to electronic resources management (ERM) is the use of checklists. Based on the principles discussed in The Checklist Manifesto: How to Get Things Right, the…
2009-01-01
Background In recent years, the genome biology community has expended considerable effort to confront the challenges of managing heterogeneous data in a structured and organized way, and has developed laboratory information management systems (LIMS) for both raw and processed data. On the other hand, electronic notebooks were developed to record and manage scientific data and to facilitate data sharing. Software that enables both management of large datasets and digital recording of laboratory procedures would serve a real need in laboratories using medium- and high-throughput techniques. Results We have developed iLAP (Laboratory data management, Analysis, and Protocol development), a workflow-driven information management system specifically designed to create and manage experimental protocols, and to analyze and share laboratory data. The system combines experimental protocol development, wizard-based data acquisition, and high-throughput data analysis into a single, integrated system. We demonstrate the power and the flexibility of the platform using a microscopy case study based on a combinatorial multiple fluorescence in situ hybridization (m-FISH) protocol and 3D-image reconstruction. iLAP is freely available under the open source license AGPL from http://genome.tugraz.at/iLAP/. Conclusion iLAP is a flexible and versatile information management system, which has the potential to close the gap between electronic notebooks and LIMS and can therefore be of great value for a broad scientific community. PMID:19941647
A Semi-Automated Workflow Solution for Data Set Publication
Vannan, Suresh; Beaty, Tammy W.; Cook, Robert B.; ...
2016-03-08
In order to address the need for published data, considerable effort has gone into formalizing the process of data publication. From funding agencies to publishers, data publication has rapidly become a requirement. Digital Object Identifiers (DOI) and data citations have enhanced the integration and availability of data. The challenge facing data publishers now is to deal with the increased number of publishable data products and, most importantly, the difficulties of publishing diverse data products into an online archive. The Oak Ridge National Laboratory Distributed Active Archive Center (ORNL DAAC), a NASA-funded data center, faces these challenges as it deals with data products created by individual investigators. This paper summarizes the challenges of curating data and provides a summary of a workflow solution that ORNL DAAC researchers and technical staff have created to deal with publication of the diverse data products. Finally, the workflow solution presented here is generic and can be applied to data from any scientific domain and data located at any data center.
CASAS: A tool for composing automatically and semantically astrophysical services
NASA Astrophysics Data System (ADS)
Louge, T.; Karray, M. H.; Archimède, B.; Knödlseder, J.
2017-07-01
Multiple astronomical datasets are available through the internet and the astrophysical Distributed Computing Infrastructure (DCI) called the Virtual Observatory (VO). Some scientific workflow technologies exist for retrieving and combining data from those sources. However, the selection of relevant services, the automation of workflow composition and the lack of user-friendly platforms remain a concern. This paper presents CASAS, a tool for semantic web services composition in astrophysics. The tool proposes automatic composition of astrophysical web services and brings semantics-based, automatic composition of workflows. It widens the choice of services and eases the use of heterogeneous services. Semantic web services composition relies on ontologies to elaborate the services composition; this work is based on the Astrophysical Services ONtology (ASON). ASON inherits most of its structure from the VO services' capabilities. Nevertheless, our approach is not limited to the VO and brings VO and non-VO services together without the need for premade recipes. CASAS is available for use through a simple web interface.
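The core idea of automatic composition, chaining services whose outputs satisfy the inputs of others until a requested result becomes reachable, can be sketched as forward chaining over the concepts already available. This is a simplified, syntactic stand-in: the service names and concept labels below are hypothetical, and semantic composition as described for CASAS relies on the ASON ontology rather than exact label matching.

```python
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

# Hypothetical service catalogue: name -> (required input concepts, produced output concepts)
Services = Dict[str, Tuple[FrozenSet[str], FrozenSet[str]]]


def compose(services: Services, available: Set[str], goal: str) -> Optional[List[str]]:
    """Forward-chain services until `goal` is produced; return the call order or None."""
    known = set(available)
    plan: List[str] = []
    progress = True
    while goal not in known and progress:
        progress = False
        for name, (inputs, outputs) in sorted(services.items()):
            # A service is usable once all its inputs are known and it adds something new.
            if name not in plan and inputs <= known and not outputs <= known:
                plan.append(name)
                known |= outputs
                progress = True
    return plan if goal in known else None


if __name__ == "__main__":
    services: Services = {
        "cone_search": (frozenset({"sky_position"}), frozenset({"source_list"})),
        "name_resolver": (frozenset({"object_name"}), frozenset({"sky_position"})),
        "spectrum_fetch": (frozenset({"source_list"}), frozenset({"spectrum"})),
    }
    print(compose(services, {"object_name"}, "spectrum"))
    # ['name_resolver', 'cone_search', 'spectrum_fetch']
```

Forward chaining can include services that are not strictly needed for the goal; pruning the plan backwards from the goal is omitted here to keep the sketch short.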
A Semi-Automated Workflow Solution for Data Set Publication
DOE Office of Scientific and Technical Information (OSTI.GOV)
Vannan, Suresh; Beaty, Tammy W.; Cook, Robert B.
In order to address the need for published data, considerable effort has gone into formalizing the process of data publication. From funding agencies to publishers, data publication has rapidly become a requirement. Digital Object Identifiers (DOI) and data citations have enhanced the integration and availability of data. The challenge facing data publishers now is to deal with the increased number of publishable data products and, most importantly, the difficulties of publishing diverse data products into an online archive. The Oak Ridge National Laboratory Distributed Active Archive Center (ORNL DAAC), a NASA-funded data center, faces these challenges as it deals with data products created by individual investigators. This paper summarizes the challenges of curating data and provides a summary of a workflow solution that ORNL DAAC researchers and technical staff have created to deal with publication of the diverse data products. Finally, the workflow solution presented here is generic and can be applied to data from any scientific domain and data located at any data center.
Identifying appropriate protected areas for endangered fern species under climate change.
Wang, Chun-Jing; Wan, Ji-Zhong; Zhang, Zhi-Xiang; Zhang, Gang-Min
2016-01-01
The management of protected areas (PAs) is widely used in the conservation of endangered plant species under climate change. However, studies that have identified appropriate PAs for endangered fern species are rare. To address this gap, we must develop a workflow to plan appropriate PAs for endangered fern species that will be further impacted by climate change. Here, we used endangered fern species in China as a case study, and we applied conservation planning software coupled with endangered fern species distribution data and distribution modeling to plan conservation areas with high priority protection needs under climate change. We identified appropriate PAs for endangered fern species under climate change based on the IUCN protected area categories (from Ia to VI) and planned additional PAs for endangered fern species. The high priority regions for protecting the endangered fern species were distributed throughout southern China. With decreasing temperature seasonality, the priority ranking of all endangered fern species is projected to increase in existing PAs. Accordingly, we need to establish conservation areas with low climate vulnerability in existing PAs and expand the conservation areas for endangered fern species in the high priority conservation regions.
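Conservation planning tools of the kind used here typically select a set of planning units that represents every target species at low cost. The greedy heuristic below is a simplified stand-in for such software (the abstract does not name the tool or its algorithm); the planning units, costs and species labels are hypothetical, and costs are assumed to be positive.

```python
from typing import Dict, List, Set


def greedy_reserve(units: Dict[str, Set[str]], cost: Dict[str, float],
                   species: Set[str]) -> List[str]:
    """Pick planning units until every species occurs in at least one selected unit.

    Each step takes the unit covering the most still-unprotected species per unit cost
    (ties broken by unit name). Costs must be positive.
    """
    remaining = set(species)
    chosen: List[str] = []
    while remaining:
        best = max((u for u in units if u not in chosen),
                   key=lambda u: (len(units[u] & remaining) / cost[u], u),
                   default=None)
        if best is None or not units[best] & remaining:
            raise ValueError(f"species not coverable: {sorted(remaining)}")
        chosen.append(best)
        remaining -= units[best]
    return chosen


if __name__ == "__main__":
    units = {"A": {"fern1", "fern2"}, "B": {"fern2", "fern3"}, "C": {"fern3"}}
    cost = {"A": 1.0, "B": 2.0, "C": 1.0}
    print(greedy_reserve(units, cost, {"fern1", "fern2", "fern3"}))  # ['A', 'C']
```

Greedy selection is fast but not guaranteed optimal; dedicated planning tools use other optimization strategies and can also weight units by climate vulnerability, as the abstract suggests.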
2011-02-01
Process Architecture Technology Analysis: Executive; UIMA as Executive; ... A.4: Flow Code in UIMA; ... UIMA; E.2
Wireless-PDA-controlled image workflow from PACS: the next trend in the health care enterprise?
NASA Astrophysics Data System (ADS)
Erberich, Stephan G.; Documet, Jorge; Zhou, Michael Z.; Cao, Fei; Liu, Brent J.; Mogel, Greg T.; Huang, H. K.
2003-05-01
Image workflow in today's Picture Archiving and Communication Systems (PACS) is controlled from fixed Display Workstations (DW) using proprietary control interfaces. Remote access to the Hospital Information System (HIS) and Radiology Information System (RIS) for urgent patient information retrieval either does not exist or is only gradually becoming available. This lack of remote access and workflow control is especially pronounced for medical images held in a department- or hospital-level PACS. As images become more complex and data sizes expand rapidly with new imaging techniques such as functional MRI, mammography and routine spiral CT, access and manageability become important issues. Long image downloads or incomplete work lists cannot be tolerated in a busy health care environment. In addition, the domain of the PACS is no longer limited to the imaging department; PACS is also being used in the ER and emergency care units. Prompt and secure access and manageability, not only for the radiologist but also for the physician, therefore become crucial to make optimal use of PACS in the health care enterprise of the new millennium. The purpose of this paper is to introduce a concept for, and an implementation of, remote access and workflow control of the PACS combining wireless, Internet and Internet2 technologies. A wireless device, the Personal Digital Assistant (PDA), is used to communicate with a PACS web server that acts as a gateway, controlling which commands the user may issue to the PACS server. The commands implemented for this test-bed are query/retrieve of the patient list and study list, including modality, examination, series and image selection, and pushing any list items to a selected DW on the PACS network.
NASA Astrophysics Data System (ADS)
Makatun, Dzmitry; Lauret, Jérôme; Rudová, Hana; Šumbera, Michal
2015-05-01
When running data-intensive applications on distributed computational resources, long I/O overheads may be observed as remotely stored data is accessed. Latency and bandwidth can become the major limiting factors for overall computation performance and can reduce the CPU/WallTime ratio through excessive I/O wait. Building on our previous research, we propose a constraint-programming-based planner that schedules computational jobs and data placements (transfers) in a distributed environment in order to optimize resource utilization and reduce the overall processing completion time. The optimization is achieved by ensuring that none of the resources (network links, data storage and CPUs) is oversaturated at any moment and that either (a) the data is pre-placed at the site where the job runs or (b) the jobs are scheduled where the data is already present. Such an approach eliminates the idle CPU cycles that occur when a job is waiting for I/O from a remote site, and could have wide application in the community. Our planner was evaluated in simulation using data extracted from log files of the batch and data management systems of the STAR experiment. The evaluation results and estimated performance improvements are discussed in this paper.
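The planner's two options, pre-placing data where a job runs or running the job where the data already is, can be illustrated with a tiny exhaustive search that assigns each job to a site and charges a transfer time whenever the job's input is not already there. This is a brute-force stand-in for the authors' constraint programming model, with hypothetical sites, dataset sizes and a single uniform bandwidth; it only prevents CPU slots from being oversubscribed and ignores link and storage contention.

```python
from itertools import product
from typing import Dict, Optional, Tuple


def plan(jobs: Dict[str, Tuple[str, float]],   # job -> (input dataset, CPU seconds)
         data_site: Dict[str, str],             # dataset -> site currently holding it
         size_gb: Dict[str, float],             # dataset -> size in GB
         slots: Dict[str, int],                 # site -> free CPU slots
         bandwidth_gb_s: float) -> Optional[Tuple[float, Dict[str, str]]]:
    """Try every job-to-site assignment; return (makespan, assignment) or None if infeasible."""
    names = sorted(jobs)
    sites = sorted(slots)
    best: Optional[Tuple[float, Dict[str, str]]] = None
    for choice in product(sites, repeat=len(names)):
        # Capacity constraint: never place more jobs on a site than it has free slots.
        if any(choice.count(s) > slots[s] for s in sites):
            continue
        finish = 0.0
        for job, site in zip(names, choice):
            dataset, cpu = jobs[job]
            # Pay a transfer only when the input is not already at the chosen site.
            transfer = 0.0 if data_site[dataset] == site else size_gb[dataset] / bandwidth_gb_s
            finish = max(finish, transfer + cpu)
        if best is None or finish < best[0]:
            best = (finish, dict(zip(names, choice)))
    return best


if __name__ == "__main__":
    jobs = {"j1": ("d1", 100.0), "j2": ("d2", 100.0), "j3": ("d1", 100.0)}
    result = plan(jobs, {"d1": "siteA", "d2": "siteB"}, {"d1": 50.0, "d2": 50.0},
                  {"siteA": 2, "siteB": 1}, bandwidth_gb_s=1.0)
    print(result)  # (100.0, {'j1': 'siteA', 'j2': 'siteB', 'j3': 'siteA'})
```

The search space grows as sites to the power of jobs, which is exactly why a constraint solver with propagation and pruning is needed at realistic scales.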
A workflow to investigate exposure and pharmacokinetic ...
Adverse outcome pathways (AOP) link known population outcomes to a molecular initiating event (MIE) that can be quantified using high-throughput in vitro methods. Practical application of AOPs in chemical-specific risk assessment requires consideration of exposure and absorption, distribution, metabolism, excretion (ADME) properties of chemicals. We developed a conceptual workflow to consider exposure and ADME properties in relationship to an MIE and demonstrated the utility of this workflow using a previously established AOP, acetylcholinesterase (AChE) inhibition. Thirty active chemicals found to inhibit AChE in the ToxCast™ assay were examined with respect to their exposure and absorption potentials, and their ability to cross the blood-brain barrier. Structural similarities of active compounds were compared against structures of inactive compounds to detect possible non-active parents that might have active metabolites. Fifty-two of the 1,029 inactive compounds exceeded a 75% similarity threshold with their nearest active neighbors. Excluding compounds that may not be absorbed, 22 could be potentially toxic following metabolism. The incorporation of exposure and ADME properties into the conceptual workflow resulted in prioritization of 20 out of 30 active compounds identified in an AChE inhibition assay for further analysis, along with identification of several inactive parent compounds of active metabolites. This qualitative approach can minimize co
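The similarity step, flagging inactive compounds whose nearest active neighbor exceeds a 75% similarity threshold as possible parents of active metabolites, can be sketched with Tanimoto similarity on binary fingerprints. The abstract does not state the similarity metric or fingerprint type, so Tanimoto on sets of "on" bits and the toy fingerprints below are assumptions.

```python
from typing import Dict, FrozenSet, List, Tuple


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def flag_possible_parents(inactive: Dict[str, FrozenSet[int]],
                          active: Dict[str, FrozenSet[int]],
                          threshold: float = 0.75) -> List[Tuple[str, str, float]]:
    """Return (inactive, nearest active, similarity) for inactives above the threshold."""
    if not active:
        return []
    flagged = []
    for name, fp in sorted(inactive.items()):
        nearest, score = max(((a, tanimoto(fp, afp)) for a, afp in sorted(active.items())),
                             key=lambda pair: pair[1])
        if score > threshold:
            flagged.append((name, nearest, score))
    return flagged


if __name__ == "__main__":
    active = {"A1": frozenset({1, 2, 3, 4, 5}), "A2": frozenset({1, 9, 10})}
    inactive = {"I1": frozenset({1, 2, 3, 4, 5, 6}), "I2": frozenset({1, 9})}
    print(flag_possible_parents(inactive, active))  # I1 flagged against A1 (5/6)
```

In practice fingerprints would come from a cheminformatics toolkit, and the absorption filter described in the abstract would be applied to the flagged list afterwards.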
The BioExtract Server: a web-based bioinformatic workflow platform
Lushbough, Carol M.; Jennewein, Douglas M.; Brendel, Volker P.
2011-01-01
The BioExtract Server (bioextract.org) is an open, web-based system designed to aid researchers in the analysis of genomic data by providing a platform for the creation of bioinformatic workflows. Scientific workflows are created within the system by recording tasks performed by the user. These tasks may include querying multiple, distributed data sources, saving query results as searchable data extracts, and executing local and web-accessible analytic tools. The series of recorded tasks can then be saved as a reproducible, sharable workflow available for subsequent execution with the original or modified inputs and parameter settings. Integrated data resources include interfaces to the National Center for Biotechnology Information (NCBI) nucleotide and protein databases, the European Molecular Biology Laboratory (EMBL-Bank) non-redundant nucleotide database, the Universal Protein Resource (UniProt), and the UniProt Reference Clusters (UniRef) database. The system offers access to numerous preinstalled, curated analytic tools and also provides researchers with the option of selecting computational tools from a large list of web services including the European Molecular Biology Open Software Suite (EMBOSS), BioMoby, and the Kyoto Encyclopedia of Genes and Genomes (KEGG). The system further allows users to integrate local command line tools residing on their own computers through a client-side Java applet. PMID:21546552
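The "record, then replay" model, in which each task a user performs is captured and the sequence is saved as a reusable workflow that can be re-run with the original or modified inputs and parameters, can be sketched as follows. The `Recorder` class and the toy sequence tasks are hypothetical, and the sketch assumes a linear chain in which each task consumes the previous task's output; BioExtract itself records server-side queries and tool runs rather than Python callables.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Step = Tuple[str, Callable[..., Any], Dict[str, Any]]


class Recorder:
    """Execute tasks immediately while recording them for later replay (hypothetical)."""

    def __init__(self) -> None:
        self.steps: List[Step] = []

    def run(self, name: str, func: Callable[..., Any], data: Any, **params: Any) -> Any:
        """Run one task on `data` and record its name, function and parameters."""
        self.steps.append((name, func, dict(params)))
        return func(data, **params)

    def replay(self, data: Any,
               overrides: Optional[Dict[str, Dict[str, Any]]] = None) -> Any:
        """Re-run the recorded chain on new input, optionally overriding per-step parameters."""
        overrides = overrides or {}
        for name, func, params in self.steps:
            data = func(data, **{**params, **overrides.get(name, {})})
        return data


def min_length(seqs: List[str], min_len: int) -> List[str]:
    """Toy task: keep sequences of at least `min_len` characters."""
    return [s for s in seqs if len(s) >= min_len]


def upper(seqs: List[str]) -> List[str]:
    """Toy task: normalize sequences to upper case."""
    return [s.upper() for s in seqs]


if __name__ == "__main__":
    rec = Recorder()
    kept = rec.run("filter", min_length, ["acgt", "ac", "ggcc"], min_len=3)
    print(rec.run("upper", upper, kept))                            # ['ACGT', 'GGCC']
    print(rec.replay(["aaaaa", "tt"]))                              # ['AAAAA']
    print(rec.replay(["aaaaa", "tt"], {"filter": {"min_len": 2}}))  # ['AAAAA', 'TT']
```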
Multi-level meta-workflows: new concept for regularly occurring tasks in quantum chemistry.
Arshad, Junaid; Hoffmann, Alexander; Gesing, Sandra; Grunzke, Richard; Krüger, Jens; Kiss, Tamas; Herres-Pawlis, Sonja; Terstyanszky, Gabor
2016-01-01
In quantum chemistry, many tasks recur frequently, e.g. geometry optimizations and benchmarking series. Here, workflows can help to reduce the time spent on manual job definition and output extraction. These workflows are executed on computing infrastructures and may require large computing and data resources. Scientific workflows hide these infrastructures and the resources needed to run them. Designing, implementing and testing these workflows requires significant effort and specific expertise. Many of these workflows are complex, monolithic entities that serve particular scientific experiments. Hence, they are not straightforward to modify, which makes them almost impossible to share. To address these issues we propose developing atomic workflows and embedding them in meta-workflows. Atomic workflows deliver a well-defined, research-domain-specific function. Publishing workflows in repositories enables workflow sharing within and among scientific communities. We formally specify atomic and meta-workflows in order to define the data structures used in repositories for uploading and sharing them. Additionally, we present a formal description focused on the orchestration of atomic workflows into meta-workflows. We investigated the operations that represent basic functionalities in quantum chemistry, developed the relevant atomic workflows and combined them into meta-workflows. Using these workflows, we defined the structure of the Quantum Chemistry workflow library and uploaded these workflows to the SHIWA Workflow Repository.
Graphical abstract: Meta-workflows and embedded workflows in the template representation.
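One way to picture the atomic/meta-workflow split is as typed ports plus links, with the meta-workflow deriving an execution order from those links. This Python sketch is an assumption for illustration, not the formal specification or the SHIWA repository schema; the class names and the geometry-optimization example are hypothetical:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class AtomicWorkflow:
    """A single domain-specific function with named input and output ports."""
    name: str
    inputs: Tuple[str, ...]
    outputs: Tuple[str, ...]


@dataclass
class MetaWorkflow:
    """Orchestrates embedded atomic workflows; each link connects an output
    port of one member to an input port of another."""
    name: str
    members: Dict[str, AtomicWorkflow] = field(default_factory=dict)
    links: List[Tuple[str, str, str, str]] = field(default_factory=list)

    def add(self, wf: AtomicWorkflow) -> None:
        self.members[wf.name] = wf

    def connect(self, src: str, out_port: str, dst: str, in_port: str) -> None:
        # Reject links to ports the atomic workflows do not declare.
        if out_port not in self.members[src].outputs:
            raise ValueError(f"{src} has no output {out_port!r}")
        if in_port not in self.members[dst].inputs:
            raise ValueError(f"{dst} has no input {in_port!r}")
        self.links.append((src, out_port, dst, in_port))

    def execution_order(self) -> List[str]:
        """Topological order of members (Kahn's algorithm); raises on cycles."""
        indegree = {name: 0 for name in self.members}
        successors: Dict[str, List[str]] = {name: [] for name in self.members}
        for src, _, dst, _ in self.links:
            successors[src].append(dst)
            indegree[dst] += 1
        ready = [n for n, d in indegree.items() if d == 0]
        order: List[str] = []
        while ready:
            node = ready.pop(0)
            order.append(node)
            for nxt in successors[node]:
                indegree[nxt] -= 1
                if indegree[nxt] == 0:
                    ready.append(nxt)
        if len(order) != len(self.members):
            raise ValueError("meta-workflow contains a cycle")
        return order
```

Linking a hypothetical geometry_opt workflow's optimized_geometry output to a frequencies workflow's geometry input yields the order ["geometry_opt", "frequencies"]; connecting a port that does not exist raises ValueError, the kind of check a formal interface description makes possible.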
Leaf LIMS: A Flexible Laboratory Information Management System with a Synthetic Biology Focus.
Craig, Thomas; Holland, Richard; D'Amore, Rosalinda; Johnson, James R; McCue, Hannah V; West, Anthony; Zulkower, Valentin; Tekotte, Hille; Cai, Yizhi; Swan, Daniel; Davey, Robert P; Hertz-Fowler, Christiane; Hall, Anthony; Caddick, Mark
2017-12-15
This paper presents Leaf LIMS, a flexible laboratory information management system (LIMS) designed to address the complexity of synthetic biology workflows. When the project began, no LIMS had been designed specifically for synthetic biology processes; most systems focused on either next-generation sequencing or biobanks and clinical sample handling. Leaf LIMS implements integrated project, item, and laboratory stock tracking, offering complete sample and construct genealogy, materials and lot tracking, and modular assay data capture. Hence, it enables highly configurable task-based workflows and supports data capture from project inception to completion. As such, in addition to supporting synthetic biology, it is well suited to many laboratory environments with multiple projects and users. The system is deployed as a web application through Docker and is provided under a permissive MIT license. It is freely available for download at https://leaflims.github.io .
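Sample and construct genealogy amounts to a directed acyclic graph of "derived from" links. A minimal Python sketch, assuming items are identified by plain strings; the Genealogy class and the item names are hypothetical and not Leaf LIMS's data model:

```python
from collections import deque
from typing import Dict, Set


class Genealogy:
    """Hypothetical registry mapping each item to the items it was derived from."""

    def __init__(self) -> None:
        self.parents: Dict[str, Set[str]] = {}

    def register(self, item: str, *derived_from: str) -> None:
        self.parents.setdefault(item, set()).update(derived_from)
        for parent in derived_from:
            self.parents.setdefault(parent, set())

    def ancestors(self, item: str) -> Set[str]:
        """All items this one was derived from, directly or indirectly (BFS)."""
        seen: Set[str] = set()
        queue = deque(self.parents.get(item, ()))
        while queue:
            current = queue.popleft()
            if current in seen:
                continue
            seen.add(current)
            queue.extend(self.parents.get(current, ()))
        return seen

    def descendants(self, item: str) -> Set[str]:
        """All items derived from this one, e.g. to trace a faulty lot."""
        return {other for other in self.parents if item in self.ancestors(other)}
```

For a construct built from a plasmid and an insert and then transformed into a strain, ancestors("strain_D") returns all three upstream items, and descendants("plasmid_A") lists everything that would be affected by a problem with that plasmid.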
Visualisation methods for large provenance collections in data-intensive collaborative platforms
NASA Astrophysics Data System (ADS)
Spinuso, Alessandro; Filgueira, Rosa; Atkinson, Malcolm; Gemuend, Andre
2016-04-01
This work investigates how to improve the visual representation of provenance information in the context of modern data-driven scientific research. It explores scenarios where data-intensive workflow systems serve communities of researchers within collaborative environments, supporting the sharing of data and methods and offering a variety of computing facilities, including HPC, HTC and Cloud. It focuses on big-data visualisation techniques that aim to produce comprehensive, interactive views of large and heterogeneous provenance data. The same approach applies to control-flow and data-flow workflows, or to combinations of the two. This flexibility is achieved by using the W3C PROV recommendation as a reference model, especially its workflow-oriented profiles such as D-PROV (Messier et al. 2013). Our implementation is based on the provenance records produced by the dispel4py data-intensive processing library (Filgueira et al. 2015). dispel4py is an open-source Python framework for describing abstract stream-based workflows for distributed data-intensive applications, developed during the VERCE project. dispel4py enables scientists to develop their methods and applications on their laptops and then run them at scale on a wide range of e-Infrastructures (Cloud, Cluster, etc.) without changes. Users can therefore focus on designing their workflows at an abstract level, describing actions, input and output streams, and how they are connected. The dispel4py system then maps these descriptions onto enactment platforms such as MPI, Storm and multiprocessing. It provides a mechanism that lets users determine which provenance information is collected and analyze it at runtime. For this work we consider alternative visualisation methods for provenance data, from infinite lists and localised interactive graphs to radial views.
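The kind of provenance this work visualises can be sketched in plain Python without dispel4py: each processing step records which data item it used and which one it generated, mirroring the W3C PROV used and wasGeneratedBy relations. run_stream and lineage are hypothetical helpers that process items one at a time; dispel4py's own API, its streaming enactment and its mapping to MPI or Storm are not reproduced here:

```python
import itertools
from typing import Any, Callable, Dict, List, Sequence, Tuple

Stage = Tuple[str, Callable[[Any], Any]]
ProvRecord = Dict[str, str]


def run_stream(values: Sequence[Any],
               stages: Sequence[Stage]) -> Tuple[List[Tuple[str, Any]], List[ProvRecord]]:
    """Push each input item through the stages in order and record, for every
    step, the data item the activity used and the one it generated."""
    ids = itertools.count()
    prov: List[ProvRecord] = []
    results: List[Tuple[str, Any]] = []
    for value in values:
        data_id = f"data:{next(ids)}"          # identifier of the raw input item
        for name, func in stages:
            activity = f"{name}:{next(ids)}"   # one activity per item per stage
            value = func(value)
            out_id = f"data:{next(ids)}"
            prov.append({"activity": activity, "used": data_id, "generated": out_id})
            data_id = out_id
        results.append((data_id, value))
    return results, prov


def lineage(prov: List[ProvRecord], data_id: str) -> Tuple[str, List[str]]:
    """Follow generated -> used links back to the original input; returns the
    input's id and the activities applied to it, oldest first."""
    by_output = {record["generated"]: record for record in prov}
    activities: List[str] = []
    while data_id in by_output:
        record = by_output[data_id]
        activities.append(record["activity"])
        data_id = record["used"]
    return data_id, activities[::-1]
```

With two stages, detrend (x - 1) and scale (x * 10), the input 5 becomes 40, and lineage() walks back from the final data item through both activities to the original input record.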
The radial-view technique has been explored successfully in many fields, from text data visualisation to genomics and social network analysis. Its adoption for provenance has been presented in the literature (Borkin et al. 2013) in the context of parent-child relationships across processes, constructed from control-flow information. Computer graphics research has focused on the advantages of this radial distribution of interlinked information and on ways to improve the visual efficiency and tunability of such representations, such as the Hierarchical Edge Bundles visualisation method (Holten et al. 2006), which reduces the visual clutter of highly connected structures by generating bundles. Our approach explores the potential of combining these methods. It serves environments where the size of the provenance collection, coupled with the diversity of the infrastructures and of the domain metadata, makes extrapolating usage trends extremely challenging. Such visualisation systems can engage groups of scientists, data providers and computational engineers by serving visual snapshots that highlight the relationships between an item and its connected processes. We will present examples of comprehensive views of the distribution of processing and data transfers during a workflow's execution on HPC, as well as of cross-workflow interactions and internal dynamics, the latter in the context of faceted searches over ranges of domain metadata values. These views are obtained from the analysis of real provenance data generated by the processing of seismic traces performed through the VERCE platform.
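A compact Python sketch of hierarchical edge bundling on a radial layout. Following Holten (2006), an edge between two leaves is routed through the control points of the tree path that joins them via their lowest common ancestor, and a bundling strength beta straightens that polygon towards the direct line. The layout rule (inner nodes at the mean angle of their children) and all names are assumptions, and the final spline through the control points is left out:

```python
import math
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]


def radial_layout(children: Dict[str, List[str]], root: str) -> Dict[str, Point]:
    """Leaves evenly spaced on the unit circle; inner nodes at a radius
    proportional to depth and at the mean angle of their children."""
    depth: Dict[str, int] = {}
    leaves: List[str] = []

    def walk(node: str, d: int) -> None:
        depth[node] = d
        kids = children.get(node, [])
        if not kids:
            leaves.append(node)
        for kid in kids:
            walk(kid, d + 1)

    walk(root, 0)
    max_depth = max(max(depth.values()), 1)
    angle = {leaf: 2 * math.pi * i / len(leaves) for i, leaf in enumerate(leaves)}

    def place(node: str) -> float:
        kids = children.get(node, [])
        if kids:
            # Naive mean; ignores wrap-around at 2*pi (acceptable for a sketch).
            angle[node] = sum(place(kid) for kid in kids) / len(kids)
        return angle[node]

    place(root)
    return {n: (depth[n] / max_depth * math.cos(angle[n]),
                depth[n] / max_depth * math.sin(angle[n])) for n in depth}


def tree_path(parent: Dict[str, Optional[str]], a: str, b: str) -> List[str]:
    """Nodes from a up to the lowest common ancestor (LCA) and down to b."""
    def to_root(node: Optional[str]) -> List[str]:
        chain = []
        while node is not None:
            chain.append(node)
            node = parent.get(node)
        return chain

    up_a, up_b = to_root(a), to_root(b)
    lca = next(n for n in up_a if n in up_b)
    return up_a[: up_a.index(lca) + 1] + up_b[: up_b.index(lca)][::-1]


def bundle_control_points(children: Dict[str, List[str]], root: str,
                          a: str, b: str, beta: float = 0.85) -> List[Point]:
    """Control polygon for edge a-b: beta=1 follows the hierarchy exactly,
    beta=0 collapses to evenly spaced points on the straight line a-b."""
    pos = radial_layout(children, root)
    parent: Dict[str, Optional[str]] = {kid: p for p, kids in children.items() for kid in kids}
    pts = [pos[n] for n in tree_path(parent, a, b)]
    if len(pts) < 2:
        return pts
    (x0, y0), (xn, yn) = pts[0], pts[-1]
    last = len(pts) - 1
    return [(beta * x + (1 - beta) * (x0 + i / last * (xn - x0)),
             beta * y + (1 - beta) * (y0 + i / last * (yn - y0)))
            for i, (x, y) in enumerate(pts)]
```

Values of beta between 0 and 1 trade the strength of bundling against visual clutter; the endpoints stay fixed for any beta.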
Potential of knowledge discovery using workflows implemented in the C3Grid
NASA Astrophysics Data System (ADS)
Engel, Thomas; Fink, Andreas; Ulbrich, Uwe; Schartner, Thomas; Dobler, Andreas; Fritzsch, Bernadette; Hiller, Wolfgang; Bräuer, Benny
2013-04-01
With the increasing number of climate simulations, reanalyses and observations, new infrastructures to search and analyse distributed data are necessary. In recent years, the Grid architecture has become an important technology for meeting these demands. For the German project "Collaborative Climate Community Data and Processing Grid" (C3Grid), computer scientists and meteorologists developed a system that offers its users a web interface to search and download climate data and to use implemented analysis tools (called workflows) to investigate them further. In this contribution, two workflows implemented in the C3Grid architecture are presented: the Cyclone Tracking (CT) and Stormtrack workflows. They serve as an example of how to perform numerous investigations of midlatitude winter storms on a large amount of analysis and climate model data without requiring insight into the data source or program code, and with only a low-to-moderate understanding of the theoretical background. CT is based on the work of Murray and Simmonds (1991) to identify and track local minima in the mean sea level pressure (MSLP) field of the selected dataset. Adjustable thresholds for the curvature of the isobars and for the minimum lifetime of a cyclone allow weak subtropical heat lows to be distinguished from stronger midlatitude cyclones, e.g. in the North Atlantic. The user receives the resulting track data, including statistics on track density, average central pressure, average central curvature, cyclogenesis and cyclolysis, as well as pre-built visualizations of these results. Stormtrack calculates the 2.5-6 day bandpass-filtered standard deviation of the geopotential height on a selected pressure level. Although this workflow needs much less computational effort than CT, it shows structures that are in good agreement with the track density of the CT workflow.
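The detection step of CT can be sketched in Python with NumPy: find grid points that are strict local minima of MSLP and keep only those whose pressure Laplacian, used here as the isobar-curvature measure in the spirit of Murray and Simmonds (1991), reaches a threshold. This is a simplified assumption: a uniform Cartesian grid in arbitrary units, no spherical geometry, no sub-grid refinement, and no tracking or minimum-lifetime filter; find_cyclone_centres is a hypothetical name:

```python
from typing import List, Tuple

import numpy as np


def find_cyclone_centres(mslp: np.ndarray, spacing: float,
                         min_laplacian: float) -> List[Tuple[int, int, float]]:
    """Return (row, col, laplacian) for interior grid points that are strict
    local minima of MSLP over their 8 neighbours and whose 5-point Laplacian
    (a proxy for isobar curvature) is at least min_laplacian."""
    centres = []
    rows, cols = mslp.shape
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            window = mslp[i - 1:i + 2, j - 1:j + 2]
            centre = mslp[i, j]
            # Strict minimum: the centre is the only cell equal to the window minimum.
            if centre > window.min() or np.count_nonzero(window == centre) > 1:
                continue
            lap = (mslp[i + 1, j] + mslp[i - 1, j] + mslp[i, j + 1]
                   + mslp[i, j - 1] - 4 * centre) / spacing ** 2
            # Weak, flat lows have a small Laplacian and are discarded.
            if lap >= min_laplacian:
                centres.append((i, j, float(lap)))
    return centres
```

On a flat 1010 hPa field containing a 990 hPa low and a 1009.5 hPa low, a threshold of 10 (hPa per squared grid unit) keeps only the deep low, whose Laplacian is 80, while the shallow low (Laplacian 2) is discarded; this is how a curvature threshold separates weak lows from strong cyclones.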
The extent to which changes in the mid-level tropospheric storm track are reflected in changes in the density and intensity of surface cyclones is also examined. A specific feature of C3Grid is the flexible Workflow Scheduling Service (WSS), which also allows automated nightly analysis runs of CT, Stormtrack, etc. with different input parameter sets. The statistical results of these workflows can afterwards be accumulated by a scheduled final analysis step, providing a tool for data-intensive analytics on the massive amounts of climate model data accessible through C3Grid. First tests of these automated analysis workflows show promising results for speeding up the investigation of high-volume modeling data. This example is relevant to a thorough analysis of future changes in storminess in Europe and is just one example of the potential of knowledge discovery using automated workflows implemented in the C3Grid architecture.
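The pattern of scheduled runs over several parameter sets followed by a final accumulation step can be sketched in Python. This does not use the WSS; nightly_runs and accumulate are hypothetical, and real runs would execute CT or Stormtrack on climate data rather than a toy function:

```python
from itertools import product
from statistics import mean
from typing import Any, Callable, Dict, Iterable, List, Tuple

ParamKey = Tuple[Tuple[str, Any], ...]


def nightly_runs(analysis: Callable[..., float], datasets: Iterable[str],
                 param_sets: List[Dict[str, Any]]) -> Dict[Tuple[str, ParamKey], float]:
    """One analysis run per (dataset, parameter set), as a scheduler might batch them."""
    results = {}
    for ds, params in product(datasets, param_sets):
        key = tuple(sorted(params.items()))  # hashable, order-independent
        results[(ds, key)] = analysis(ds, **params)
    return results


def accumulate(results: Dict[Tuple[str, ParamKey], float]) -> Dict[ParamKey, Dict[str, float]]:
    """Final step: pool the per-run statistics of each parameter set across datasets."""
    pooled: Dict[ParamKey, List[float]] = {}
    for (_, key), value in results.items():
        pooled.setdefault(key, []).append(value)
    return {key: {"runs": len(v), "mean": mean(v), "total": sum(v)}
            for key, v in pooled.items()}
```

With a toy analysis that divides a per-dataset count by a min_lifetime parameter, two datasets and two parameter sets produce four runs, and accumulate() pools them into one summary (run count, mean, total) per parameter set.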