A virtual data language and system for scientific workflow management in data grid environments
NASA Astrophysics Data System (ADS)
Zhao, Yong
With advances in scientific instrumentation and simulation, scientific data is growing fast in both size and analysis complexity. So-called Data Grids aim to provide high performance, distributed data analysis infrastructure for data- intensive sciences, where scientists distributed worldwide need to extract information from large collections of data, and to share both data products and the resources needed to produce and store them. However, the description, composition, and execution of even logically simple scientific workflows are often complicated by the need to deal with "messy" issues like heterogeneous storage formats and ad-hoc file system structures. We show how these difficulties can be overcome via a typed workflow notation called virtual data language, within which issues of physical representation are cleanly separated from logical typing, and by the implementation of this notation within the context of a powerful virtual data system that supports distributed execution. The resulting language and system are capable of expressing complex workflows in a simple compact form, enacting those workflows in distributed environments, monitoring and recording the execution processes, and tracing the derivation history of data products. We describe the motivation, design, implementation, and evaluation of the virtual data language and system, and the application of the virtual data paradigm in various science disciplines, including astronomy, cognitive neuroscience.
Provenance Storage, Querying, and Visualization in PBase
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kianmajd, Parisa; Ludascher, Bertram; Missier, Paolo
2015-01-01
We present PBase, a repository for scientific workflows and their corresponding provenance information that facilitates the sharing of experiments among the scientific community. PBase is interoperable since it uses ProvONE, a standard provenance model for scientific workflows. Workflows and traces are stored in RDF, and with the support of SPARQL and the tree cover encoding, the repository provides a scalable infrastructure for querying the provenance data. Furthermore, through its user interface, it is possible to: visualize workflows and execution traces; visualize reachability relations within these traces; issue SPARQL queries; and visualize query results.
Identifying impact of software dependencies on replicability of biomedical workflows.
Miksa, Tomasz; Rauber, Andreas; Mina, Eleni
2016-12-01
Complex data driven experiments form the basis of biomedical research. Recent findings warn that the context in which the software is run, that is the infrastructure and the third party dependencies, can have a crucial impact on the final results delivered by a computational experiment. This implies that in order to replicate the same result, not only the same data must be used, but also it must be run on an equivalent software stack. In this paper we present the VFramework that enables assessing replicability of workflows. It identifies whether any differences in software dependencies among two executions of the same workflow exist and whether they have impact on the produced results. We also conduct a case study in which we investigate the impact of software dependencies on replicability of Taverna workflows used in biomedical research of Huntington's disease. We re-execute analysed workflows in environments differing in operating system distribution and configuration. The results show that the VFramework can be used to identify the impact of software dependencies on the replicability of biomedical workflows. Furthermore, we observe that despite the fact that the workflows are executed in a controlled environment, they still depend on specific tools installed in the environment. The context model used by the VFramework improves the deficiencies of provenance traces and documents also such tools. Based on our findings we define guidelines for workflow owners that enable them to improve replicability of their workflows. Copyright © 2016 Elsevier Inc. All rights reserved.
NASA Astrophysics Data System (ADS)
Ferreira da Silva, R.; Filgueira, R.; Deelman, E.; Atkinson, M.
2016-12-01
We present Asterism, an open source data-intensive framework, which combines the Pegasus and dispel4py workflow systems. Asterism aims to simplify the effort required to develop data-intensive applications that run across multiple heterogeneous resources, without users having to: re-formulate their methods according to different enactment systems; manage the data distribution across systems; parallelize their methods; co-place and schedule their methods with computing resources; and store and transfer large/small volumes of data. Asterism's key element is to leverage the strengths of each workflow system: dispel4py allows developing scientific applications locally and then automatically parallelize and scale them on a wide range of HPC infrastructures with no changes to the application's code; Pegasus orchestrates the distributed execution of applications while providing portability, automated data management, recovery, debugging, and monitoring, without users needing to worry about the particulars of the target execution systems. Asterism leverages the level of abstractions provided by each workflow system to describe hybrid workflows where no information about the underlying infrastructure is required beforehand. The feasibility of Asterism has been evaluated using the seismic ambient noise cross-correlation application, a common data-intensive analysis pattern used by many seismologists. The application preprocesses (Phase1) and cross-correlates (Phase2) traces from several seismic stations. The Asterism workflow is implemented as a Pegasus workflow composed of two tasks (Phase1 and Phase2), where each phase represents a dispel4py workflow. Pegasus tasks describe the in/output data at a logical level, the data dependency between tasks, and the e-Infrastructures and the execution engine to run each dispel4py workflow. We have instantiated the workflow using data from 1000 stations from the IRIS services, and run it across two heterogeneous resources described as Docker containers: MPI (Container2) and Storm (Container3) clusters (Figure 1). Each dispel4py workflow is mapped to a particular execution engine, and data transfers between resources are automatically handled by Pegasus. Asterism is freely available online at http://github.com/dispel4py/pegasus_dispel4py.
NASA Astrophysics Data System (ADS)
Pan, Tianheng
2018-01-01
In recent years, the combination of workflow management system and Multi-agent technology is a hot research field. The problem of lack of flexibility in workflow management system can be improved by introducing multi-agent collaborative management. The workflow management system adopts distributed structure. It solves the problem that the traditional centralized workflow structure is fragile. In this paper, the agent of Distributed workflow management system is divided according to its function. The execution process of each type of agent is analyzed. The key technologies such as process execution and resource management are analyzed.
NASA Astrophysics Data System (ADS)
Filgueira, R.; Ferreira da Silva, R.; Deelman, E.; Atkinson, M.
2016-12-01
We present the Data-Intensive workflows as a Service (DIaaS) model for enabling easy data-intensive workflow composition and deployment on clouds using containers. DIaaS model backbone is Asterism, an integrated solution for running data-intensive stream-based applications on heterogeneous systems, which combines the benefits of dispel4py with Pegasus workflow systems. The stream-based executions of an Asterism workflow are managed by dispel4py, while the data movement between different e-Infrastructures, and the coordination of the application execution are automatically managed by Pegasus. DIaaS combines Asterism framework with Docker containers to provide an integrated, complete, easy-to-use, portable approach to run data-intensive workflows on distributed platforms. Three containers integrate the DIaaS model: a Pegasus node, and an MPI and an Apache Storm clusters. Container images are described as Dockerfiles (available online at http://github.com/dispel4py/pegasus_dispel4py), linked to Docker Hub for providing continuous integration (automated image builds), and image storing and sharing. In this model, all required software (workflow systems and execution engines) for running scientific applications are packed into the containers, which significantly reduces the effort (and possible human errors) required by scientists or VRE administrators to build such systems. The most common use of DIaaS will be to act as a backend of VREs or Scientific Gateways to run data-intensive applications, deploying cloud resources upon request. We have demonstrated the feasibility of DIaaS using the data-intensive seismic ambient noise cross-correlation application (Figure 1). The application preprocesses (Phase1) and cross-correlates (Phase2) traces from several seismic stations. The application is submitted via Pegasus (Container1), and Phase1 and Phase2 are executed in the MPI (Container2) and Storm (Container3) clusters respectively. Although both phases could be executed within the same environment, this setup demonstrates the flexibility of DIaaS to run applications across e-Infrastructures. In summary, DIaaS delivers specialized software to execute data-intensive applications in a scalable, efficient, and robust manner reducing the engineering time and computational cost.
Implementing bioinformatic workflows within the bioextract server
USDA-ARS?s Scientific Manuscript database
Computational workflows in bioinformatics are becoming increasingly important in the achievement of scientific advances. These workflows typically require the integrated use of multiple, distributed data sources and analytic tools. The BioExtract Server (http://bioextract.org) is a distributed servi...
Integrating prediction, provenance, and optimization into high energy workflows
DOE Office of Scientific and Technical Information (OSTI.GOV)
Schram, M.; Bansal, V.; Friese, R. D.
We propose a novel approach for efficient execution of workflows on distributed resources. The key components of this framework include: performance modeling to quantitatively predict workflow component behavior; optimization-based scheduling such as choosing an optimal subset of resources to meet demand and assignment of tasks to resources; distributed I/O optimizations such as prefetching; and provenance methods for collecting performance data. In preliminary results, these techniques improve throughput on a small Belle II workflow by 20%.
Wireless remote control clinical image workflow: utilizing a PDA for offsite distribution
NASA Astrophysics Data System (ADS)
Liu, Brent J.; Documet, Luis; Documet, Jorge; Huang, H. K.; Muldoon, Jean
2004-04-01
Last year we presented in RSNA an application to perform wireless remote control of PACS image distribution utilizing a handheld device such as a Personal Digital Assistant (PDA). This paper describes the clinical experiences including workflow scenarios of implementing the PDA application to route exams from the clinical PACS archive server to various locations for offsite distribution of clinical PACS exams. By utilizing this remote control application, radiologists can manage image workflow distribution with a single wireless handheld device without impacting their clinical workflow on diagnostic PACS workstations. A PDA application was designed and developed to perform DICOM Query and C-Move requests by a physician from a clinical PACS Archive to a CD-burning device for automatic burning of PACS data for the distribution to offsite. In addition, it was also used for convenient routing of historical PACS exams to the local web server, local workstations, and teleradiology systems. The application was evaluated by radiologists as well as other clinical staff who need to distribute PACS exams to offsite referring physician"s offices and offsite radiologists. An application for image workflow management utilizing wireless technology was implemented in a clinical environment and evaluated. A PDA application was successfully utilized to perform DICOM Query and C-Move requests from the clinical PACS archive to various offsite exam distribution devices. Clinical staff can utilize the PDA to manage image workflow and PACS exam distribution conveniently for offsite consultations by referring physicians and radiologists. This solution allows the radiologist to expand their effectiveness in health care delivery both within the radiology department as well as offisite by improving their clinical workflow.
Building asynchronous geospatial processing workflows with web services
NASA Astrophysics Data System (ADS)
Zhao, Peisheng; Di, Liping; Yu, Genong
2012-02-01
Geoscience research and applications often involve a geospatial processing workflow. This workflow includes a sequence of operations that use a variety of tools to collect, translate, and analyze distributed heterogeneous geospatial data. Asynchronous mechanisms, by which clients initiate a request and then resume their processing without waiting for a response, are very useful for complicated workflows that take a long time to run. Geospatial contents and capabilities are increasingly becoming available online as interoperable Web services. This online availability significantly enhances the ability to use Web service chains to build distributed geospatial processing workflows. This paper focuses on how to orchestrate Web services for implementing asynchronous geospatial processing workflows. The theoretical bases for asynchronous Web services and workflows, including asynchrony patterns and message transmission, are examined to explore different asynchronous approaches to and architecture of workflow code for the support of asynchronous behavior. A sample geospatial processing workflow, issued by the Open Geospatial Consortium (OGC) Web Service, Phase 6 (OWS-6), is provided to illustrate the implementation of asynchronous geospatial processing workflows and the challenges in using Web Services Business Process Execution Language (WS-BPEL) to develop them.
Optimization of tomographic reconstruction workflows on geographically distributed resources
Bicer, Tekin; Gursoy, Doga; Kettimuthu, Rajkumar; ...
2016-01-01
New technological advancements in synchrotron light sources enable data acquisitions at unprecedented levels. This emergent trend affects not only the size of the generated data but also the need for larger computational resources. Although beamline scientists and users have access to local computational resources, these are typically limited and can result in extended execution times. Applications that are based on iterative processing as in tomographic reconstruction methods require high-performance compute clusters for timely analysis of data. Here, time-sensitive analysis and processing of Advanced Photon Source data on geographically distributed resources are focused on. Two main challenges are considered: (i) modelingmore » of the performance of tomographic reconstruction workflows and (ii) transparent execution of these workflows on distributed resources. For the former, three main stages are considered: (i) data transfer between storage and computational resources, (i) wait/queue time of reconstruction jobs at compute resources, and (iii) computation of reconstruction tasks. These performance models allow evaluation and estimation of the execution time of any given iterative tomographic reconstruction workflow that runs on geographically distributed resources. For the latter challenge, a workflow management system is built, which can automate the execution of workflows and minimize the user interaction with the underlying infrastructure. The system utilizes Globus to perform secure and efficient data transfer operations. The proposed models and the workflow management system are evaluated by using three high-performance computing and two storage resources, all of which are geographically distributed. Workflows were created with different computational requirements using two compute-intensive tomographic reconstruction algorithms. Experimental evaluation shows that the proposed models and system can be used for selecting the optimum resources, which in turn can provide up to 3.13× speedup (on experimented resources). Furthermore, the error rates of the models range between 2.1 and 23.3% (considering workflow execution times), where the accuracy of the model estimations increases with higher computational demands in reconstruction tasks.« less
Optimization of tomographic reconstruction workflows on geographically distributed resources
Bicer, Tekin; Gürsoy, Doǧa; Kettimuthu, Rajkumar; De Carlo, Francesco; Foster, Ian T.
2016-01-01
New technological advancements in synchrotron light sources enable data acquisitions at unprecedented levels. This emergent trend affects not only the size of the generated data but also the need for larger computational resources. Although beamline scientists and users have access to local computational resources, these are typically limited and can result in extended execution times. Applications that are based on iterative processing as in tomographic reconstruction methods require high-performance compute clusters for timely analysis of data. Here, time-sensitive analysis and processing of Advanced Photon Source data on geographically distributed resources are focused on. Two main challenges are considered: (i) modeling of the performance of tomographic reconstruction workflows and (ii) transparent execution of these workflows on distributed resources. For the former, three main stages are considered: (i) data transfer between storage and computational resources, (i) wait/queue time of reconstruction jobs at compute resources, and (iii) computation of reconstruction tasks. These performance models allow evaluation and estimation of the execution time of any given iterative tomographic reconstruction workflow that runs on geographically distributed resources. For the latter challenge, a workflow management system is built, which can automate the execution of workflows and minimize the user interaction with the underlying infrastructure. The system utilizes Globus to perform secure and efficient data transfer operations. The proposed models and the workflow management system are evaluated by using three high-performance computing and two storage resources, all of which are geographically distributed. Workflows were created with different computational requirements using two compute-intensive tomographic reconstruction algorithms. Experimental evaluation shows that the proposed models and system can be used for selecting the optimum resources, which in turn can provide up to 3.13× speedup (on experimented resources). Moreover, the error rates of the models range between 2.1 and 23.3% (considering workflow execution times), where the accuracy of the model estimations increases with higher computational demands in reconstruction tasks. PMID:27359149
Optimization of tomographic reconstruction workflows on geographically distributed resources
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bicer, Tekin; Gursoy, Doga; Kettimuthu, Rajkumar
New technological advancements in synchrotron light sources enable data acquisitions at unprecedented levels. This emergent trend affects not only the size of the generated data but also the need for larger computational resources. Although beamline scientists and users have access to local computational resources, these are typically limited and can result in extended execution times. Applications that are based on iterative processing as in tomographic reconstruction methods require high-performance compute clusters for timely analysis of data. Here, time-sensitive analysis and processing of Advanced Photon Source data on geographically distributed resources are focused on. Two main challenges are considered: (i) modelingmore » of the performance of tomographic reconstruction workflows and (ii) transparent execution of these workflows on distributed resources. For the former, three main stages are considered: (i) data transfer between storage and computational resources, (i) wait/queue time of reconstruction jobs at compute resources, and (iii) computation of reconstruction tasks. These performance models allow evaluation and estimation of the execution time of any given iterative tomographic reconstruction workflow that runs on geographically distributed resources. For the latter challenge, a workflow management system is built, which can automate the execution of workflows and minimize the user interaction with the underlying infrastructure. The system utilizes Globus to perform secure and efficient data transfer operations. The proposed models and the workflow management system are evaluated by using three high-performance computing and two storage resources, all of which are geographically distributed. Workflows were created with different computational requirements using two compute-intensive tomographic reconstruction algorithms. Experimental evaluation shows that the proposed models and system can be used for selecting the optimum resources, which in turn can provide up to 3.13× speedup (on experimented resources). Furthermore, the error rates of the models range between 2.1 and 23.3% (considering workflow execution times), where the accuracy of the model estimations increases with higher computational demands in reconstruction tasks.« less
Scientific Workflows + Provenance = Better (Meta-)Data Management
NASA Astrophysics Data System (ADS)
Ludaescher, B.; Cuevas-Vicenttín, V.; Missier, P.; Dey, S.; Kianmajd, P.; Wei, Y.; Koop, D.; Chirigati, F.; Altintas, I.; Belhajjame, K.; Bowers, S.
2013-12-01
The origin and processing history of an artifact is known as its provenance. Data provenance is an important form of metadata that explains how a particular data product came about, e.g., how and when it was derived in a computational process, which parameter settings and input data were used, etc. Provenance information provides transparency and helps to explain and interpret data products. Other common uses and applications of provenance include quality control, data curation, result debugging, and more generally, 'reproducible science'. Scientific workflow systems (e.g. Kepler, Taverna, VisTrails, and others) provide controlled environments for developing computational pipelines with built-in provenance support. Workflow results can then be explained in terms of workflow steps, parameter settings, input data, etc. using provenance that is automatically captured by the system. Scientific workflows themselves provide a user-friendly abstraction of the computational process and are thus a form of ('prospective') provenance in their own right. The full potential of provenance information is realized when combining workflow-level information (prospective provenance) with trace-level information (retrospective provenance). To this end, the DataONE Provenance Working Group (ProvWG) has developed an extension of the W3C PROV standard, called D-PROV. Whereas PROV provides a 'least common denominator' for exchanging and integrating provenance information, D-PROV adds new 'observables' that described workflow-level information (e.g., the functional steps in a pipeline), as well as workflow-specific trace-level information ( timestamps for each workflow step executed, the inputs and outputs used, etc.) Using examples, we will demonstrate how the combination of prospective and retrospective provenance provides added value in managing scientific data. The DataONE ProvWG is also developing tools based on D-PROV that allow scientists to get more mileage from provenance metadata. DataONE is a federation of member nodes that store data and metadata for discovery and access. By enriching metadata with provenance information, search and reuse of data is enhanced, and the 'social life' of data (being the product of many workflow runs, different people, etc.) is revealed. We are currently prototyping a provenance repository (PBase) to demonstrate what can be achieved with advanced provenance queries. The ProvExplorer and ProPub tools support advanced ad-hoc querying and visualization of provenance as well as customized provenance publications (e.g., to address privacy issues, or to focus provenance to relevant details). In a parallel line of work, we are exploring ways to add provenance support to widely-used scripting platforms (e.g. R and Python) and then expose that information via D-PROV.
ERIC Educational Resources Information Center
Li, Wenhao
2011-01-01
Distributed workflow technology has been widely used in modern education and e-business systems. Distributed web applications have shown cross-domain and cooperative characteristics to meet the need of current distributed workflow applications. In this paper, the author proposes a dynamic and adaptive scheduling algorithm PCSA (Pre-Calculated…
Observing System Simulation Experiment (OSSE) for the HyspIRI Spectrometer Mission
NASA Technical Reports Server (NTRS)
Turmon, Michael J.; Block, Gary L.; Green, Robert O.; Hua, Hook; Jacob, Joseph C.; Sobel, Harold R.; Springer, Paul L.; Zhang, Qingyuan
2010-01-01
The OSSE software provides an integrated end-to-end environment to simulate an Earth observing system by iteratively running a distributed modeling workflow based on the HyspIRI Mission, including atmospheric radiative transfer, surface albedo effects, detection, and retrieval for agile exploration of the mission design space. The software enables an Observing System Simulation Experiment (OSSE) and can be used for design trade space exploration of science return for proposed instruments by modeling the whole ground truth, sensing, and retrieval chain and to assess retrieval accuracy for a particular instrument and algorithm design. The OSSE in fra struc ture is extensible to future National Research Council (NRC) Decadal Survey concept missions where integrated modeling can improve the fidelity of coupled science and engineering analyses for systematic analysis and science return studies. This software has a distributed architecture that gives it a distinct advantage over other similar efforts. The workflow modeling components are typically legacy computer programs implemented in a variety of programming languages, including MATLAB, Excel, and FORTRAN. Integration of these diverse components is difficult and time-consuming. In order to hide this complexity, each modeling component is wrapped as a Web Service, and each component is able to pass analysis parameterizations, such as reflectance or radiance spectra, on to the next component downstream in the service workflow chain. In this way, the interface to each modeling component becomes uniform and the entire end-to-end workflow can be run using any existing or custom workflow processing engine. The architecture lets users extend workflows as new modeling components become available, chain together the components using any existing or custom workflow processing engine, and distribute them across any Internet-accessible Web Service endpoints. The workflow components can be hosted on any Internet-accessible machine. This has the advantages that the computations can be distributed to make best use of the available computing resources, and each workflow component can be hosted and maintained by their respective domain experts.
Dispel4py: An Open-Source Python library for Data-Intensive Seismology
NASA Astrophysics Data System (ADS)
Filgueira, Rosa; Krause, Amrey; Spinuso, Alessandro; Klampanos, Iraklis; Danecek, Peter; Atkinson, Malcolm
2015-04-01
Scientific workflows are a necessary tool for many scientific communities as they enable easy composition and execution of applications on computing resources while scientists can focus on their research without being distracted by the computation management. Nowadays, scientific communities (e.g. Seismology) have access to a large variety of computing resources and their computational problems are best addressed using parallel computing technology. However, successful use of these technologies requires a lot of additional machinery whose use is not straightforward for non-experts: different parallel frameworks (MPI, Storm, multiprocessing, etc.) must be used depending on the computing resources (local machines, grids, clouds, clusters) where applications are run. This implies that for achieving the best applications' performance, users usually have to change their codes depending on the features of the platform selected for running them. This work presents dispel4py, a new open-source Python library for describing abstract stream-based workflows for distributed data-intensive applications. Special care has been taken to provide dispel4py with the ability to map abstract workflows to different platforms dynamically at run-time. Currently dispel4py has four mappings: Apache Storm, MPI, multi-threading and sequential. The main goal of dispel4py is to provide an easy-to-use tool to develop and test workflows in local resources by using the sequential mode with a small dataset. Later, once a workflow is ready for long runs, it can be automatically executed on different parallel resources. dispel4py takes care of the underlying mappings by performing an efficient parallelisation. Processing Elements (PE) represent the basic computational activities of any dispel4Py workflow, which can be a seismologic algorithm, or a data transformation process. For creating a dispel4py workflow, users only have to write very few lines of code to describe their PEs and how they are connected by using Python, which is widely supported on many platforms and is popular in many scientific domains, such as in geosciences. Once, a dispel4py workflow is written, a user only has to select which mapping they would like to use, and everything else (parallelisation, distribution of data) is carried on by dispel4py without any cost to the user. Among all dispel4py features we would like to highlight the following: * The PEs are connected by streams and not by writing to and reading from intermediate files, avoiding many IO operations. * The PEs can be stored into a registry. Therefore, different users can recombine PEs in many different workflows. * dispel4py has been enriched with a provenance mechanism to support runtime provenance analysis. We have adopted the W3C-PROV data model, which is accessible via a prototypal browser-based user interface and a web API. It supports the users with the visualisation of graphical products and offers combined operations to access and download the data, which may be selectively stored at runtime, into dedicated data archives. dispel4py has been already used by seismologists in the VERCE project to develop different seismic workflows. One of them is the Seismic Ambient Noise Cross-Correlation workflow, which preprocesses and cross-correlates traces from several stations. First, this workflow was tested on a local machine by using a small number of stations as input data. Later, it was executed on different parallel platforms (SuperMUC cluster, and Terracorrelator machine), automatically scaling up by using MPI and multiprocessing mappings and up to 1000 stations as input data. The results show that the dispel4py achieves scalable performance in both mappings tested on different parallel platforms.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Duro, Francisco Rodrigo; Garcia Blas, Javier; Isaila, Florin
This paper explores novel techniques for improving the performance of many-task workflows based on the Swift scripting language. We propose novel programmer options for automated distributed data placement and task scheduling. These options trigger a data placement mechanism used for distributing intermediate workflow data over the servers of Hercules, a distributed key-value store that can be used to cache file system data. We demonstrate that these new mechanisms can significantly improve the aggregated throughput of many-task workflows with up to 86x, reduce the contention on the shared file system, exploit the data locality, and trade off locality and load balance.
Visualisation methods for large provenance collections in data-intensive collaborative platforms
NASA Astrophysics Data System (ADS)
Spinuso, Alessandro; Fligueira, Rosa; Atkinson, Malcolm; Gemuend, Andre
2016-04-01
This work investigates improving the methods of visually representing provenance information in the context of modern data-driven scientific research. It explores scenarios where data-intensive workflows systems are serving communities of researchers within collaborative environments, supporting the sharing of data and methods, and offering a variety of computation facilities, including HPC, HTC and Cloud. It focuses on the exploration of big-data visualization techniques aiming at producing comprehensive and interactive views on top of large and heterogeneous provenance data. The same approach is applicable to control-flow and data-flow workflows or to combinations of the two. This flexibility is achieved using the W3C-PROV recommendation as a reference model, especially its workflow oriented profiles such as D-PROV (Messier et al. 2013). Our implementation is based on the provenance records produced by the dispel4py data-intensive processing library (Filgueira et al. 2015). dispel4py is an open-source Python framework for describing abstract stream-based workflows for distributed data-intensive applications, developed during the VERCE project. dispel4py enables scientists to develop their scientific methods and applications on their laptop and then run them at scale on a wide range of e-Infrastructures (Cloud, Cluster, etc.) without making changes. Users can therefore focus on designing their workflows at an abstract level, describing actions, input and output streams, and how they are connected. The dispel4py system then maps these descriptions to the enactment platforms, such as MPI, Storm, multiprocessing. It provides a mechanism which allows users to determine the provenance information to be collected and to analyze it at runtime. For this work we consider alternative visualisation methods for provenance data, from infinite lists and localised interactive graphs, to radial-views. The latter technique has been positively explored in many fields, from text data visualisation to genomics and social networking analysis. Its adoption for provenance has been presented in literature (Borkin et al. 2013) in the context of parent-child relationships across processes, constructed from control-flow information. Computer graphics research has focused on the advantage of this radial distribution of interlinked information and on ways to improve the visual efficiency and tunability of such representations, like the Hierarchical Edge Bundles visualisation method, (Holten et al. 2006), which aims at reducing visual clutter of highly connected structures via the generation of bundles. Our approach explores the potential of the combination of these methods. It serves environments where the size of the provenance collection, coupled with the diversity of the infrastructures and the domain metadata, make the extrapolation of usage trends extremely challenging. Applications of such visualisation systems can engage groups of scientists, data providers and computational engineers, by serving visual snapshots that highlight relationships between an item and its connected processes. We will present examples of comprehensive views on the distribution of processing and data transfers during a workflow's execution in HPC, as well as cross workflows interactions and internal dynamics. The latter in the context of faceted searches on domain metadata values-range. These are obtained from the analysis of real provenance data generated by the processing of seismic traces performed through the VERCE platform.
NASA Astrophysics Data System (ADS)
Kintsakis, Athanassios M.; Psomopoulos, Fotis E.; Symeonidis, Andreas L.; Mitkas, Pericles A.
Hermes introduces a new "describe once, run anywhere" paradigm for the execution of bioinformatics workflows in hybrid cloud environments. It combines the traditional features of parallelization-enabled workflow management systems and of distributed computing platforms in a container-based approach. It offers seamless deployment, overcoming the burden of setting up and configuring the software and network requirements. Most importantly, Hermes fosters the reproducibility of scientific workflows by supporting standardization of the software execution environment, thus leading to consistent scientific workflow results and accelerating scientific output.
Visualization and Analysis for Near-Real-Time Decision Making in Distributed Workflows
Pugmire, David; Kress, James; Choi, Jong; ...
2016-08-04
Data driven science is becoming increasingly more common, complex, and is placing tremendous stresses on visualization and analysis frameworks. Data sources producing 10GB per second (and more) are becoming increasingly commonplace in both simulation, sensor and experimental sciences. These data sources, which are often distributed around the world, must be analyzed by teams of scientists that are also distributed. Enabling scientists to view, query and interact with such large volumes of data in near-real-time requires a rich fusion of visualization and analysis techniques, middleware and workflow systems. Here, this paper discusses initial research into visualization and analysis of distributed datamore » workflows that enables scientists to make near-real-time decisions of large volumes of time varying data.« less
Vallefuoco, L; Sorrentino, R; Spalletti Cernia, D; Colucci, G; Portella, G
2012-12-01
The cobas p 630, a fully automated pre-analytical instrument for primary tube handling recently introduced to complete the Cobas(®) TaqMan systems portfolio, was evaluated in conjunction with: the COBAS(®) AmpliPrep/COBAS(®) TaqMan HBV Test, v2.0, COBAS(®) AmpliPrep/COBAS(®) TaqMan HCV Test, v1.0 and COBAS(®) AmpliPrep/COBAS(®) TaqMan HIV Test, v2.0. The instrument performance in transferring samples from primary to secondary tubes, its impact in improving COBAS(®) AmpliPrep/COBAS(®) TaqMan workflow and hands-on reduction and the risk of possible cross-contamination were assessed. Samples from 42 HBsAg positive, 42 HCV and 42 HIV antibody (Ab) positive patients as well as 21 healthy blood donors were processed with or without automated primary tubes. HIV, HCV and HBsAg positive samples showed a correlation index of 0.999, 0.987 and of 0.994, respectively. To assess for cross-contamination, high titer HBV DNA positive samples, HCV RNA and HIV RNA positive samples were distributed in the cobas p 630 in alternate tube positions, adjacent to negative control samples within the same rack. None of the healthy donor samples showed any reactivity. Based on these results, the cobas p 630 can improve workflow and sample tracing in laboratories performing molecular tests, and reduce turnaround time, errors, and risks. Copyright © 2012 Elsevier B.V. All rights reserved.
RESTFul based heterogeneous Geoprocessing workflow interoperation for Sensor Web Service
NASA Astrophysics Data System (ADS)
Yang, Chao; Chen, Nengcheng; Di, Liping
2012-10-01
Advanced sensors on board satellites offer detailed Earth observations. A workflow is one approach for designing, implementing and constructing a flexible and live link between these sensors' resources and users. It can coordinate, organize and aggregate the distributed sensor Web services to meet the requirement of a complex Earth observation scenario. A RESTFul based workflow interoperation method is proposed to integrate heterogeneous workflows into an interoperable unit. The Atom protocols are applied to describe and manage workflow resources. The XML Process Definition Language (XPDL) and Business Process Execution Language (BPEL) workflow standards are applied to structure a workflow that accesses sensor information and one that processes it separately. Then, a scenario for nitrogen dioxide (NO2) from a volcanic eruption is used to investigate the feasibility of the proposed method. The RESTFul based workflows interoperation system can describe, publish, discover, access and coordinate heterogeneous Geoprocessing workflows.
Design and implementation of workflow engine for service-oriented architecture
NASA Astrophysics Data System (ADS)
Peng, Shuqing; Duan, Huining; Chen, Deyun
2009-04-01
As computer network is developed rapidly and in the situation of the appearance of distribution specialty in enterprise application, traditional workflow engine have some deficiencies, such as complex structure, bad stability, poor portability, little reusability and difficult maintenance. In this paper, in order to improve the stability, scalability and flexibility of workflow management system, a four-layer architecture structure of workflow engine based on SOA is put forward according to the XPDL standard of Workflow Management Coalition, the route control mechanism in control model is accomplished and the scheduling strategy of cyclic routing and acyclic routing is designed, and the workflow engine which adopts the technology such as XML, JSP, EJB and so on is implemented.
Workflow based framework for life science informatics.
Tiwari, Abhishek; Sekhar, Arvind K T
2007-10-01
Workflow technology is a generic mechanism to integrate diverse types of available resources (databases, servers, software applications and different services) which facilitate knowledge exchange within traditionally divergent fields such as molecular biology, clinical research, computational science, physics, chemistry and statistics. Researchers can easily incorporate and access diverse, distributed tools and data to develop their own research protocols for scientific analysis. Application of workflow technology has been reported in areas like drug discovery, genomics, large-scale gene expression analysis, proteomics, and system biology. In this article, we have discussed the existing workflow systems and the trends in applications of workflow based systems.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wu, Chase Qishi; Zhu, Michelle Mengxia
The advent of large-scale collaborative scientific applications has demonstrated the potential for broad scientific communities to pool globally distributed resources to produce unprecedented data acquisition, movement, and analysis. System resources including supercomputers, data repositories, computing facilities, network infrastructures, storage systems, and display devices have been increasingly deployed at national laboratories and academic institutes. These resources are typically shared by large communities of users over Internet or dedicated networks and hence exhibit an inherent dynamic nature in their availability, accessibility, capacity, and stability. Scientific applications using either experimental facilities or computation-based simulations with various physical, chemical, climatic, and biological models featuremore » diverse scientific workflows as simple as linear pipelines or as complex as a directed acyclic graphs, which must be executed and supported over wide-area networks with massively distributed resources. Application users oftentimes need to manually configure their computing tasks over networks in an ad hoc manner, hence significantly limiting the productivity of scientists and constraining the utilization of resources. The success of these large-scale distributed applications requires a highly adaptive and massively scalable workflow platform that provides automated and optimized computing and networking services. This project is to design and develop a generic Scientific Workflow Automation and Management Platform (SWAMP), which contains a web-based user interface specially tailored for a target application, a set of user libraries, and several easy-to-use computing and networking toolkits for application scientists to conveniently assemble, execute, monitor, and control complex computing workflows in heterogeneous high-performance network environments. SWAMP will enable the automation and management of the entire process of scientific workflows with the convenience of a few mouse clicks while hiding the implementation and technical details from end users. Particularly, we will consider two types of applications with distinct performance requirements: data-centric and service-centric applications. For data-centric applications, the main workflow task involves large-volume data generation, catalog, storage, and movement typically from supercomputers or experimental facilities to a team of geographically distributed users; while for service-centric applications, the main focus of workflow is on data archiving, preprocessing, filtering, synthesis, visualization, and other application-specific analysis. We will conduct a comprehensive comparison of existing workflow systems and choose the best suited one with open-source code, a flexible system structure, and a large user base as the starting point for our development. Based on the chosen system, we will develop and integrate new components including a black box design of computing modules, performance monitoring and prediction, and workflow optimization and reconfiguration, which are missing from existing workflow systems. A modular design for separating specification, execution, and monitoring aspects will be adopted to establish a common generic infrastructure suited for a wide spectrum of science applications. We will further design and develop efficient workflow mapping and scheduling algorithms to optimize the workflow performance in terms of minimum end-to-end delay, maximum frame rate, and highest reliability. We will develop and demonstrate the SWAMP system in a local environment, the grid network, and the 100Gpbs Advanced Network Initiative (ANI) testbed. The demonstration will target scientific applications in climate modeling and high energy physics and the functions to be demonstrated include workflow deployment, execution, steering, and reconfiguration. Throughout the project period, we will work closely with the science communities in the fields of climate modeling and high energy physics including Spallation Neutron Source (SNS) and Large Hadron Collider (LHC) projects to mature the system for production use.« less
Virtual Sensor Web Architecture
NASA Astrophysics Data System (ADS)
Bose, P.; Zimdars, A.; Hurlburt, N.; Doug, S.
2006-12-01
NASA envisions the development of smart sensor webs, intelligent and integrated observation network that harness distributed sensing assets, their associated continuous and complex data sets, and predictive observation processing mechanisms for timely, collaborative hazard mitigation and enhanced science productivity and reliability. This paper presents Virtual Sensor Web Infrastructure for Collaborative Science (VSICS) Architecture for sustained coordination of (numerical and distributed) model-based processing, closed-loop resource allocation, and observation planning. VSICS's key ideas include i) rich descriptions of sensors as services based on semantic markup languages like OWL and SensorML; ii) service-oriented workflow composition and repair for simple and ensemble models; event-driven workflow execution based on event-based and distributed workflow management mechanisms; and iii) development of autonomous model interaction management capabilities providing closed-loop control of collection resources driven by competing targeted observation needs. We present results from initial work on collaborative science processing involving distributed services (COSEC framework) that is being extended to create VSICS.
Evaluating progressive-rendering algorithms in appearance design tasks.
Jiawei Ou; Karlik, Ondrej; Křivánek, Jaroslav; Pellacini, Fabio
2013-01-01
Progressive rendering is becoming a popular alternative to precomputational approaches to appearance design. However, progressive algorithms create images exhibiting visual artifacts at early stages. A user study investigated these artifacts' effects on user performance in appearance design tasks. Novice and expert subjects performed lighting and material editing tasks with four algorithms: random path tracing, quasirandom path tracing, progressive photon mapping, and virtual-point-light rendering. Both the novices and experts strongly preferred path tracing to progressive photon mapping and virtual-point-light rendering. None of the participants preferred random path tracing to quasirandom path tracing or vice versa; the same situation held between progressive photon mapping and virtual-point-light rendering. The user workflow didn’t differ significantly with the four algorithms. The Web Extras include a video showing how four progressive-rendering algorithms converged (at http://youtu.be/ck-Gevl1e9s), the source code used, and other supplementary materials.
Integrating Remote and Social Sensing Data for a Scenario on Secure Societies in Big Data Platform
NASA Astrophysics Data System (ADS)
Albani, Sergio; Lazzarini, Michele; Koubarakis, Manolis; Taniskidou, Efi Karra; Papadakis, George; Karkaletsis, Vangelis; Giannakopoulos, George
2016-08-01
In the framework of the Horizon 2020 project BigDataEurope (Integrating Big Data, Software & Communities for Addressing Europe's Societal Challenges), a pilot for the Secure Societies Societal Challenge was designed considering the requirements coming from relevant stakeholders. The pilot is focusing on the integration in a Big Data platform of data coming from remote and social sensing.The information on land changes coming from the Copernicus Sentinel 1A sensor (Change Detection workflow) is integrated with information coming from selected Twitter and news agencies accounts (Event Detection workflow) in order to provide the user with multiple sources of information.The Change Detection workflow implements a processing chain in a distributed parallel manner, exploiting the Big Data capabilities in place; the Event Detection workflow implements parallel and distributed social media and news agencies monitoring as well as suitable mechanisms to detect and geo-annotate the related events.
Tigres Workflow Library: Supporting Scientific Pipelines on HPC Systems
Hendrix, Valerie; Fox, James; Ghoshal, Devarshi; ...
2016-07-21
The growth in scientific data volumes has resulted in the need for new tools that enable users to operate on and analyze data on large-scale resources. In the last decade, a number of scientific workflow tools have emerged. These tools often target distributed environments, and often need expert help to compose and execute the workflows. Data-intensive workflows are often ad-hoc, they involve an iterative development process that includes users composing and testing their workflows on desktops, and scaling up to larger systems. In this paper, we present the design and implementation of Tigres, a workflow library that supports the iterativemore » workflow development cycle of data-intensive workflows. Tigres provides an application programming interface to a set of programming templates i.e., sequence, parallel, split, merge, that can be used to compose and execute computational and data pipelines. We discuss the results of our evaluation of scientific and synthetic workflows showing Tigres performs with minimal template overheads (mean of 13 seconds over all experiments). We also discuss various factors (e.g., I/O performance, execution mechanisms) that affect the performance of scientific workflows on HPC systems.« less
Tigres Workflow Library: Supporting Scientific Pipelines on HPC Systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hendrix, Valerie; Fox, James; Ghoshal, Devarshi
The growth in scientific data volumes has resulted in the need for new tools that enable users to operate on and analyze data on large-scale resources. In the last decade, a number of scientific workflow tools have emerged. These tools often target distributed environments, and often need expert help to compose and execute the workflows. Data-intensive workflows are often ad-hoc, they involve an iterative development process that includes users composing and testing their workflows on desktops, and scaling up to larger systems. In this paper, we present the design and implementation of Tigres, a workflow library that supports the iterativemore » workflow development cycle of data-intensive workflows. Tigres provides an application programming interface to a set of programming templates i.e., sequence, parallel, split, merge, that can be used to compose and execute computational and data pipelines. We discuss the results of our evaluation of scientific and synthetic workflows showing Tigres performs with minimal template overheads (mean of 13 seconds over all experiments). We also discuss various factors (e.g., I/O performance, execution mechanisms) that affect the performance of scientific workflows on HPC systems.« less
Reproducible Large-Scale Neuroimaging Studies with the OpenMOLE Workflow Management System.
Passerat-Palmbach, Jonathan; Reuillon, Romain; Leclaire, Mathieu; Makropoulos, Antonios; Robinson, Emma C; Parisot, Sarah; Rueckert, Daniel
2017-01-01
OpenMOLE is a scientific workflow engine with a strong emphasis on workload distribution. Workflows are designed using a high level Domain Specific Language (DSL) built on top of Scala. It exposes natural parallelism constructs to easily delegate the workload resulting from a workflow to a wide range of distributed computing environments. OpenMOLE hides the complexity of designing complex experiments thanks to its DSL. Users can embed their own applications and scale their pipelines from a small prototype running on their desktop computer to a large-scale study harnessing distributed computing infrastructures, simply by changing a single line in the pipeline definition. The construction of the pipeline itself is decoupled from the execution context. The high-level DSL abstracts the underlying execution environment, contrary to classic shell-script based pipelines. These two aspects allow pipelines to be shared and studies to be replicated across different computing environments. Workflows can be run as traditional batch pipelines or coupled with OpenMOLE's advanced exploration methods in order to study the behavior of an application, or perform automatic parameter tuning. In this work, we briefly present the strong assets of OpenMOLE and detail recent improvements targeting re-executability of workflows across various Linux platforms. We have tightly coupled OpenMOLE with CARE, a standalone containerization solution that allows re-executing on a Linux host any application that has been packaged on another Linux host previously. The solution is evaluated against a Python-based pipeline involving packages such as scikit-learn as well as binary dependencies. All were packaged and re-executed successfully on various HPC environments, with identical numerical results (here prediction scores) obtained on each environment. Our results show that the pair formed by OpenMOLE and CARE is a reliable solution to generate reproducible results and re-executable pipelines. A demonstration of the flexibility of our solution showcases three neuroimaging pipelines harnessing distributed computing environments as heterogeneous as local clusters or the European Grid Infrastructure (EGI).
Reproducible Large-Scale Neuroimaging Studies with the OpenMOLE Workflow Management System
Passerat-Palmbach, Jonathan; Reuillon, Romain; Leclaire, Mathieu; Makropoulos, Antonios; Robinson, Emma C.; Parisot, Sarah; Rueckert, Daniel
2017-01-01
OpenMOLE is a scientific workflow engine with a strong emphasis on workload distribution. Workflows are designed using a high level Domain Specific Language (DSL) built on top of Scala. It exposes natural parallelism constructs to easily delegate the workload resulting from a workflow to a wide range of distributed computing environments. OpenMOLE hides the complexity of designing complex experiments thanks to its DSL. Users can embed their own applications and scale their pipelines from a small prototype running on their desktop computer to a large-scale study harnessing distributed computing infrastructures, simply by changing a single line in the pipeline definition. The construction of the pipeline itself is decoupled from the execution context. The high-level DSL abstracts the underlying execution environment, contrary to classic shell-script based pipelines. These two aspects allow pipelines to be shared and studies to be replicated across different computing environments. Workflows can be run as traditional batch pipelines or coupled with OpenMOLE's advanced exploration methods in order to study the behavior of an application, or perform automatic parameter tuning. In this work, we briefly present the strong assets of OpenMOLE and detail recent improvements targeting re-executability of workflows across various Linux platforms. We have tightly coupled OpenMOLE with CARE, a standalone containerization solution that allows re-executing on a Linux host any application that has been packaged on another Linux host previously. The solution is evaluated against a Python-based pipeline involving packages such as scikit-learn as well as binary dependencies. All were packaged and re-executed successfully on various HPC environments, with identical numerical results (here prediction scores) obtained on each environment. Our results show that the pair formed by OpenMOLE and CARE is a reliable solution to generate reproducible results and re-executable pipelines. A demonstration of the flexibility of our solution showcases three neuroimaging pipelines harnessing distributed computing environments as heterogeneous as local clusters or the European Grid Infrastructure (EGI). PMID:28381997
NASA Astrophysics Data System (ADS)
Tomlin, M. C.; Jenkyns, R.
2015-12-01
Ocean Networks Canada (ONC) collects data from observatories in the northeast Pacific, Salish Sea, Arctic Ocean, Atlantic Ocean, and land-based sites in British Columbia. Data are streamed, collected autonomously, or transmitted via satellite from a variety of instruments. The Software Engineering group at ONC develops and maintains Oceans 2.0, an in-house software system that acquires and archives data from sensors, and makes data available to scientists, the public, government and non-government agencies. The Oceans 2.0 workflow tool was developed by ONC to manage a large volume of tasks and processes required for instrument installation, recovery and maintenance activities. Since 2013, the workflow tool has supported 70 expeditions and grown to include 30 different workflow processes for the increasing complexity of infrastructures at ONC. The workflow tool strives to keep pace with an increasing heterogeneity of sensors, connections and environments by supporting versioning of existing workflows, and allowing the creation of new processes and tasks. Despite challenges in training and gaining mutual support from multidisciplinary teams, the workflow tool has become invaluable in project management in an innovative setting. It provides a collective place to contribute to ONC's diverse projects and expeditions and encourages more repeatable processes, while promoting interactions between the multidisciplinary teams who manage various aspects of instrument development and the data they produce. The workflow tool inspires documentation of terminologies and procedures, and effectively links to other tools at ONC such as JIRA, Alfresco and Wiki. Motivated by growing sensor schemes, modes of collecting data, archiving, and data distribution at ONC, the workflow tool ensures that infrastructure is managed completely from instrument purchase to data distribution. It integrates all areas of expertise and helps fulfill ONC's mandate to offer quality data to users.
NASA Astrophysics Data System (ADS)
Memon, Shahbaz; Vallot, Dorothée; Zwinger, Thomas; Neukirchen, Helmut
2017-04-01
Scientific communities generate complex simulations through orchestration of semi-structured analysis pipelines which involves execution of large workflows on multiple, distributed and heterogeneous computing and data resources. Modeling ice dynamics of glaciers requires workflows consisting of many non-trivial, computationally expensive processing tasks which are coupled to each other. From this domain, we present an e-Science use case, a workflow, which requires the execution of a continuum ice flow model and a discrete element based calving model in an iterative manner. Apart from the execution, this workflow also contains data format conversion tasks that support the execution of ice flow and calving by means of transition through sequential, nested and iterative steps. Thus, the management and monitoring of all the processing tasks including data management and transfer of the workflow model becomes more complex. From the implementation perspective, this workflow model was initially developed on a set of scripts using static data input and output references. In the course of application usage when more scripts or modifications introduced as per user requirements, the debugging and validation of results were more cumbersome to achieve. To address these problems, we identified a need to have a high-level scientific workflow tool through which all the above mentioned processes can be achieved in an efficient and usable manner. We decided to make use of the e-Science middleware UNICORE (Uniform Interface to Computing Resources) that allows seamless and automated access to different heterogenous and distributed resources which is supported by a scientific workflow engine. Based on this, we developed a high-level scientific workflow model for coupling of massively parallel High-Performance Computing (HPC) jobs: a continuum ice sheet model (Elmer/Ice) and a discrete element calving and crevassing model (HiDEM). In our talk we present how the use of a high-level scientific workflow middleware enables reproducibility of results more convenient and also provides a reusable and portable workflow template that can be deployed across different computing infrastructures. Acknowledgements This work was kindly supported by NordForsk as part of the Nordic Center of Excellence (NCoE) eSTICC (eScience Tools for Investigating Climate Change at High Northern Latitudes) and the Top-level Research Initiative NCoE SVALI (Stability and Variation of Arctic Land Ice).
A lightweight, flow-based toolkit for parallel and distributed bioinformatics pipelines
2011-01-01
Background Bioinformatic analyses typically proceed as chains of data-processing tasks. A pipeline, or 'workflow', is a well-defined protocol, with a specific structure defined by the topology of data-flow interdependencies, and a particular functionality arising from the data transformations applied at each step. In computer science, the dataflow programming (DFP) paradigm defines software systems constructed in this manner, as networks of message-passing components. Thus, bioinformatic workflows can be naturally mapped onto DFP concepts. Results To enable the flexible creation and execution of bioinformatics dataflows, we have written a modular framework for parallel pipelines in Python ('PaPy'). A PaPy workflow is created from re-usable components connected by data-pipes into a directed acyclic graph, which together define nested higher-order map functions. The successive functional transformations of input data are evaluated on flexibly pooled compute resources, either local or remote. Input items are processed in batches of adjustable size, all flowing one to tune the trade-off between parallelism and lazy-evaluation (memory consumption). An add-on module ('NuBio') facilitates the creation of bioinformatics workflows by providing domain specific data-containers (e.g., for biomolecular sequences, alignments, structures) and functionality (e.g., to parse/write standard file formats). Conclusions PaPy offers a modular framework for the creation and deployment of parallel and distributed data-processing workflows. Pipelines derive their functionality from user-written, data-coupled components, so PaPy also can be viewed as a lightweight toolkit for extensible, flow-based bioinformatics data-processing. The simplicity and flexibility of distributed PaPy pipelines may help users bridge the gap between traditional desktop/workstation and grid computing. PaPy is freely distributed as open-source Python code at http://muralab.org/PaPy, and includes extensive documentation and annotated usage examples. PMID:21352538
A lightweight, flow-based toolkit for parallel and distributed bioinformatics pipelines.
Cieślik, Marcin; Mura, Cameron
2011-02-25
Bioinformatic analyses typically proceed as chains of data-processing tasks. A pipeline, or 'workflow', is a well-defined protocol, with a specific structure defined by the topology of data-flow interdependencies, and a particular functionality arising from the data transformations applied at each step. In computer science, the dataflow programming (DFP) paradigm defines software systems constructed in this manner, as networks of message-passing components. Thus, bioinformatic workflows can be naturally mapped onto DFP concepts. To enable the flexible creation and execution of bioinformatics dataflows, we have written a modular framework for parallel pipelines in Python ('PaPy'). A PaPy workflow is created from re-usable components connected by data-pipes into a directed acyclic graph, which together define nested higher-order map functions. The successive functional transformations of input data are evaluated on flexibly pooled compute resources, either local or remote. Input items are processed in batches of adjustable size, all flowing one to tune the trade-off between parallelism and lazy-evaluation (memory consumption). An add-on module ('NuBio') facilitates the creation of bioinformatics workflows by providing domain specific data-containers (e.g., for biomolecular sequences, alignments, structures) and functionality (e.g., to parse/write standard file formats). PaPy offers a modular framework for the creation and deployment of parallel and distributed data-processing workflows. Pipelines derive their functionality from user-written, data-coupled components, so PaPy also can be viewed as a lightweight toolkit for extensible, flow-based bioinformatics data-processing. The simplicity and flexibility of distributed PaPy pipelines may help users bridge the gap between traditional desktop/workstation and grid computing. PaPy is freely distributed as open-source Python code at http://muralab.org/PaPy, and includes extensive documentation and annotated usage examples.
Prototype of Kepler Processing Workflows For Microscopy And Neuroinformatics
Astakhov, V.; Bandrowski, A.; Gupta, A.; Kulungowski, A.W.; Grethe, J.S.; Bouwer, J.; Molina, T.; Rowley, V.; Penticoff, S.; Terada, M.; Wong, W.; Hakozaki, H.; Kwon, O.; Martone, M.E.; Ellisman, M.
2016-01-01
We report on progress of employing the Kepler workflow engine to prototype “end-to-end” application integration workflows that concern data coming from microscopes deployed at the National Center for Microscopy Imaging Research (NCMIR). This system is built upon the mature code base of the Cell Centered Database (CCDB) and integrated rule-oriented data system (IRODS) for distributed storage. It provides integration with external projects such as the Whole Brain Catalog (WBC) and Neuroscience Information Framework (NIF), which benefit from NCMIR data. We also report on specific workflows which spawn from main workflows and perform data fusion and orchestration of Web services specific for the NIF project. This “Brain data flow” presents a user with categorized information about sources that have information on various brain regions. PMID:28479932
SHIWA Services for Workflow Creation and Sharing in Hydrometeorolog
NASA Astrophysics Data System (ADS)
Terstyanszky, Gabor; Kiss, Tamas; Kacsuk, Peter; Sipos, Gergely
2014-05-01
Researchers want to run scientific experiments on Distributed Computing Infrastructures (DCI) to access large pools of resources and services. To run these experiments requires specific expertise that they may not have. Workflows can hide resources and services as a virtualisation layer providing a user interface that researchers can use. There are many scientific workflow systems but they are not interoperable. To learn a workflow system and create workflows may require significant efforts. Considering these efforts it is not reasonable to expect that researchers will learn new workflow systems if they want to run workflows developed in other workflow systems. To overcome it requires creating workflow interoperability solutions to allow workflow sharing. The FP7 'Sharing Interoperable Workflow for Large-Scale Scientific Simulation on Available DCIs' (SHIWA) project developed the Coarse-Grained Interoperability concept (CGI). It enables recycling and sharing workflows of different workflow systems and executing them on different DCIs. SHIWA developed the SHIWA Simulation Platform (SSP) to implement the CGI concept integrating three major components: the SHIWA Science Gateway, the workflow engines supported by the CGI concept and DCI resources where workflows are executed. The science gateway contains a portal, a submission service, a workflow repository and a proxy server to support the whole workflow life-cycle. The SHIWA Portal allows workflow creation, configuration, execution and monitoring through a Graphical User Interface using the WS-PGRADE workflow system as the host workflow system. The SHIWA Repository stores the formal description of workflows and workflow engines plus executables and data needed to execute them. It offers a wide-range of browse and search operations. To support non-native workflow execution the SHIWA Submission Service imports the workflow and workflow engine from the SHIWA Repository. This service either invokes locally or remotely pre-deployed workflow engines or submits workflow engines with the workflow to local or remote resources to execute workflows. The SHIWA Proxy Server manages certificates needed to execute the workflows on different DCIs. Currently SSP supports sharing of ASKALON, Galaxy, GWES, Kepler, LONI Pipeline, MOTEUR, Pegasus, P-GRADE, ProActive, Triana, Taverna and WS-PGRADE workflows. Further workflow systems can be added to the simulation platform as required by research communities. The FP7 'Building a European Research Community through Interoperable Workflows and Data' (ER-flow) project disseminates the achievements of the SHIWA project to build workflow user communities across Europe. ER-flow provides application supports to research communities within (Astrophysics, Computational Chemistry, Heliophysics and Life Sciences) and beyond (Hydrometeorology and Seismology) to develop, share and run workflows through the simulation platform. The simulation platform supports four usage scenarios: creating and publishing workflows in the repository, searching and selecting workflows in the repository, executing non-native workflows and creating and running meta-workflows. The presentation will outline the CGI concept, the SHIWA Simulation Platform, the ER-flow usage scenarios and how the Hydrometeorology research community runs simulations on SSP.
An access control model with high security for distributed workflow and real-time application
NASA Astrophysics Data System (ADS)
Han, Ruo-Fei; Wang, Hou-Xiang
2007-11-01
The traditional mandatory access control policy (MAC) is regarded as a policy with strict regulation and poor flexibility. The security policy of MAC is so compelling that few information systems would adopt it at the cost of facility, except some particular cases with high security requirement as military or government application. However, with the increasing requirement for flexibility, even some access control systems in military application have switched to role-based access control (RBAC) which is well known as flexible. Though RBAC can meet the demands for flexibility but it is weak in dynamic authorization and consequently can not fit well in the workflow management systems. The task-role-based access control (T-RBAC) is then introduced to solve the problem. It combines both the advantages of RBAC and task-based access control (TBAC) which uses task to manage permissions dynamically. To satisfy the requirement of system which is distributed, well defined with workflow process and critically for time accuracy, this paper will analyze the spirit of MAC, introduce it into the improved T&RBAC model which is based on T-RBAC. At last, a conceptual task-role-based access control model with high security for distributed workflow and real-time application (A_T&RBAC) is built, and its performance is simply analyzed.
A Workflow to Investigate Exposure and Pharmacokinetic ...
Background: Adverse outcome pathways (AOPs) link adverse effects in individuals or populations to a molecular initiating event (MIE) that can be quantified using in vitro methods. Practical application of AOPs in chemical-specific risk assessment requires incorporation of knowledge on exposure, along with absorption, distribution, metabolism, and excretion (ADME) properties of chemicals.Objectives: We developed a conceptual workflow to examine exposure and ADME properties in relation to an MIE. The utility of this workflow was evaluated using a previously established AOP, acetylcholinesterase (AChE) inhibition.Methods: Thirty chemicals found to inhibit human AChE in the ToxCast™ assay were examined with respect to their exposure, absorption potential, and ability to cross the blood–brain barrier (BBB). Structures of active chemicals were compared against structures of 1,029 inactive chemicals to detect possible parent compounds that might have active metabolites.Results: Application of the workflow screened 10 “low-priority” chemicals of 30 active chemicals. Fifty-two of the 1,029 inactive chemicals exhibited a similarity threshold of ≥ 75% with their nearest active neighbors. Of these 52 compounds, 30 were excluded due to poor absorption or distribution. The remaining 22 compounds may inhibit AChE in vivo either directly or as a result of metabolic activation.Conclusions: The incorporation of exposure and ADME properties into the conceptual workflow e
Common Data Models and Efficient Reproducible Workflows for Distributed Ocean Model Skill Assessment
NASA Astrophysics Data System (ADS)
Signell, R. P.; Snowden, D. P.; Howlett, E.; Fernandes, F. A.
2014-12-01
Model skill assessment requires discovery, access, analysis, and visualization of information from both sensors and models, and traditionally has been possible only by a few experts. The US Integrated Ocean Observing System (US-IOOS) consists of 17 Federal Agencies and 11 Regional Associations that produce data from various sensors and numerical models; exactly the information required for model skill assessment. US-IOOS is seeking to develop documented skill assessment workflows that are standardized, efficient, and reproducible so that a much wider community can participate in the use and assessment of model results. Standardization requires common data models for observational and model data. US-IOOS relies on the CF Conventions for observations and structured grid data, and on the UGRID Conventions for unstructured (e.g. triangular) grid data. This allows applications to obtain only the data they require in a uniform and parsimonious way using web services: OPeNDAP for model output and OGC Sensor Observation Service (SOS) for observed data. Reproducibility is enabled with IPython Notebooks shared on GitHub (http://github.com/ioos). These capture the entire skill assessment workflow, including user input, search, access, analysis, and visualization, ensuring that workflows are self-documenting and reproducible by anyone, using free software. Python packages for common data models are Pyugrid and the British Met Office Iris package. Python packages required to run the workflows (pyugrid, pyoos, and the British Met Office Iris package) are also available on GitHub and on Binstar.org so that users can run scenarios using the free Anaconda Python distribution. Hosted services such as Wakari enable anyone to reproduce these workflows for free, without installing any software locally, using just their web browser. We are also experimenting with Wakari Enterprise, which allows multi-user access from a web browser to an IPython Server running where large quantities of model output reside, increasing the efficiency. The open development and distribution of these workflows, and the software on which they depend, is an educational resource for those new to the field and a center of focus where practitioners can contribute new software and ideas.
Automated quality control in a file-based broadcasting workflow
NASA Astrophysics Data System (ADS)
Zhang, Lina
2014-04-01
Benefit from the development of information and internet technologies, television broadcasting is transforming from inefficient tape-based production and distribution to integrated file-based workflows. However, no matter how many changes have took place, successful broadcasting still depends on the ability to deliver a consistent high quality signal to the audiences. After the transition from tape to file, traditional methods of manual quality control (QC) become inadequate, subjective, and inefficient. Based on China Central Television's full file-based workflow in the new site, this paper introduces an automated quality control test system for accurate detection of hidden troubles in media contents. It discusses the system framework and workflow control when the automated QC is added. It puts forward a QC criterion and brings forth a QC software followed this criterion. It also does some experiments on QC speed by adopting parallel processing and distributed computing. The performance of the test system shows that the adoption of automated QC can make the production effective and efficient, and help the station to achieve a competitive advantage in the media market.
NASA Astrophysics Data System (ADS)
Vho, Alice; Bistacchi, Andrea
2015-04-01
A quantitative analysis of fault-rock distribution is of paramount importance for studies of fault zone architecture, fault and earthquake mechanics, and fluid circulation along faults at depth. Here we present a semi-automatic workflow for fault-rock mapping on a Digital Outcrop Model (DOM). This workflow has been developed on a real case of study: the strike-slip Gole Larghe Fault Zone (GLFZ). It consists of a fault zone exhumed from ca. 10 km depth, hosted in granitoid rocks of Adamello batholith (Italian Southern Alps). Individual seismogenic slip surfaces generally show green cataclasites (cemented by the precipitation of epidote and K-feldspar from hydrothermal fluids) and more or less well preserved pseudotachylytes (black when well preserved, greenish to white when altered). First of all, a digital model for the outcrop is reconstructed with photogrammetric techniques, using a large number of high resolution digital photographs, processed with VisualSFM software. By using high resolution photographs the DOM can have a much higher resolution than with LIDAR surveys, up to 0.2 mm/pixel. Then, image processing is performed to map the fault-rock distribution with the ImageJ-Fiji package. Green cataclasites and epidote/K-feldspar veins can be quite easily separated from the host rock (tonalite) using spectral analysis. Particularly, band ratio and principal component analysis have been tested successfully. The mapping of black pseudotachylyte veins is more tricky because the differences between the pseudotachylyte and biotite spectral signature are not appreciable. For this reason we have tested different morphological processing tools aimed at identifying (and subtracting) the tiny biotite grains. We propose a solution based on binary images involving a combination of size and circularity thresholds. Comparing the results with manually segmented images, we noticed that major problems occur only when pseudotachylyte veins are very thin and discontinuous. After having tested and refined the image analysis processing for some typical images, we have recorded a macro with ImageJ-Fiji allowing to process all the images for a given DOM. As a result, the three different types of rocks can be semi-automatically mapped on large DOMs using a simple and efficient procedure. This allows to develop quantitative analyses of fault rock distribution and thickness, fault trace roughness/curvature and length, fault zone architecture, and alteration halos due to hydrothermal fluid-rock interaction. To improve our workflow, additional or different morphological operators could be integrated in our procedure to yield a better resolution on small and thin pseudotachylyte veins (e.g. perimeter/area ratio).
The Archive Solution for Distributed Workflow Management Agents of the CMS Experiment at LHC
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kuznetsov, Valentin; Fischer, Nils Leif; Guo, Yuyi
The CMS experiment at the CERN LHC developed the Workflow Management Archive system to persistently store unstructured framework job report documents produced by distributed workflow management agents. In this paper we present its architecture, implementation, deployment, and integration with the CMS and CERN computing infrastructures, such as central HDFS and Hadoop Spark cluster. The system leverages modern technologies such as a document oriented database and the Hadoop eco-system to provide the necessary flexibility to reliably process, store, and aggregatemore » $$\\mathcal{O}$$(1M) documents on a daily basis. We describe the data transformation, the short and long term storage layers, the query language, along with the aggregation pipeline developed to visualize various performance metrics to assist CMS data operators in assessing the performance of the CMS computing system.« less
The Archive Solution for Distributed Workflow Management Agents of the CMS Experiment at LHC
Kuznetsov, Valentin; Fischer, Nils Leif; Guo, Yuyi
2018-03-19
The CMS experiment at the CERN LHC developed the Workflow Management Archive system to persistently store unstructured framework job report documents produced by distributed workflow management agents. In this paper we present its architecture, implementation, deployment, and integration with the CMS and CERN computing infrastructures, such as central HDFS and Hadoop Spark cluster. The system leverages modern technologies such as a document oriented database and the Hadoop eco-system to provide the necessary flexibility to reliably process, store, and aggregatemore » $$\\mathcal{O}$$(1M) documents on a daily basis. We describe the data transformation, the short and long term storage layers, the query language, along with the aggregation pipeline developed to visualize various performance metrics to assist CMS data operators in assessing the performance of the CMS computing system.« less
Knowledge Extraction and Semantic Annotation of Text from the Encyclopedia of Life
Thessen, Anne E.; Parr, Cynthia Sims
2014-01-01
Numerous digitization and ontological initiatives have focused on translating biological knowledge from narrative text to machine-readable formats. In this paper, we describe two workflows for knowledge extraction and semantic annotation of text data objects featured in an online biodiversity aggregator, the Encyclopedia of Life. One workflow tags text with DBpedia URIs based on keywords. Another workflow finds taxon names in text using GNRD for the purpose of building a species association network. Both workflows work well: the annotation workflow has an F1 Score of 0.941 and the association algorithm has an F1 Score of 0.885. Existing text annotators such as Terminizer and DBpedia Spotlight performed well, but require some optimization to be useful in the ecology and evolution domain. Important future work includes scaling up and improving accuracy through the use of distributional semantics. PMID:24594988
Moenninghoff, Christoph; Umutlu, Lale; Kloeters, Christian; Ringelstein, Adrian; Ladd, Mark E; Sombetzki, Antje; Lauenstein, Thomas C; Forsting, Michael; Schlamann, Marc
2013-06-01
Workflow efficiency and workload of radiological technologists (RTs) were compared in head examinations performed with two 1.5 T magnetic resonance (MR) scanners equipped with or without an automated user interface called "day optimizing throughput" (Dot) workflow engine. Thirty-four patients with known intracranial pathology were examined with a 1.5 T MR scanner with Dot workflow engine (Siemens MAGNETOM Aera) and with a 1.5 T MR scanner with conventional user interface (Siemens MAGNETOM Avanto) using four standardized examination protocols. The elapsed time for all necessary work steps, which were performed by 11 RTs within the total examination time, was compared for each examination at both MR scanners. The RTs evaluated the user-friendliness of both scanners by a questionnaire. Normality of distribution was checked for all continuous variables by use of the Shapiro-Wilk test. Normally distributed variables were analyzed by Student's paired t-test, otherwise Wilcoxon signed-rank test was used to compare means. Total examination time of MR examinations performed with Dot engine was reduced from 24:53 to 20:01 minutes (P < .001) and the necessary RT intervention decreased by 61% (P < .001). The Dot engine's automated choice of MR protocols was significantly better assessed by the RTs than the conventional user interface (P = .001). According to this preliminary study, the Dot workflow engine is a time-saving user assistance software, which decreases the RTs' effort significantly and may help to automate neuroradiological examinations for a higher workflow efficiency. Copyright © 2013 AUR. Published by Elsevier Inc. All rights reserved.
Biowep: a workflow enactment portal for bioinformatics applications.
Romano, Paolo; Bartocci, Ezio; Bertolini, Guglielmo; De Paoli, Flavio; Marra, Domenico; Mauri, Giancarlo; Merelli, Emanuela; Milanesi, Luciano
2007-03-08
The huge amount of biological information, its distribution over the Internet and the heterogeneity of available software tools makes the adoption of new data integration and analysis network tools a necessity in bioinformatics. ICT standards and tools, like Web Services and Workflow Management Systems (WMS), can support the creation and deployment of such systems. Many Web Services are already available and some WMS have been proposed. They assume that researchers know which bioinformatics resources can be reached through a programmatic interface and that they are skilled in programming and building workflows. Therefore, they are not viable to the majority of unskilled researchers. A portal enabling these to take profit from new technologies is still missing. We designed biowep, a web based client application that allows for the selection and execution of a set of predefined workflows. The system is available on-line. Biowep architecture includes a Workflow Manager, a User Interface and a Workflow Executor. The task of the Workflow Manager is the creation and annotation of workflows. These can be created by using either the Taverna Workbench or BioWMS. Enactment of workflows is carried out by FreeFluo for Taverna workflows and by BioAgent/Hermes, a mobile agent-based middleware, for BioWMS ones. Main workflows' processing steps are annotated on the basis of their input and output, elaboration type and application domain by using a classification of bioinformatics data and tasks. The interface supports users authentication and profiling. Workflows can be selected on the basis of users' profiles and can be searched through their annotations. Results can be saved. We developed a web system that support the selection and execution of predefined workflows, thus simplifying access for all researchers. The implementation of Web Services allowing specialized software to interact with an exhaustive set of biomedical databases and analysis software and the creation of effective workflows can significantly improve automation of in-silico analysis. Biowep is available for interested researchers as a reference portal. They are invited to submit their workflows to the workflow repository. Biowep is further being developed in the sphere of the Laboratory of Interdisciplinary Technologies in Bioinformatics - LITBIO.
Biowep: a workflow enactment portal for bioinformatics applications
Romano, Paolo; Bartocci, Ezio; Bertolini, Guglielmo; De Paoli, Flavio; Marra, Domenico; Mauri, Giancarlo; Merelli, Emanuela; Milanesi, Luciano
2007-01-01
Background The huge amount of biological information, its distribution over the Internet and the heterogeneity of available software tools makes the adoption of new data integration and analysis network tools a necessity in bioinformatics. ICT standards and tools, like Web Services and Workflow Management Systems (WMS), can support the creation and deployment of such systems. Many Web Services are already available and some WMS have been proposed. They assume that researchers know which bioinformatics resources can be reached through a programmatic interface and that they are skilled in programming and building workflows. Therefore, they are not viable to the majority of unskilled researchers. A portal enabling these to take profit from new technologies is still missing. Results We designed biowep, a web based client application that allows for the selection and execution of a set of predefined workflows. The system is available on-line. Biowep architecture includes a Workflow Manager, a User Interface and a Workflow Executor. The task of the Workflow Manager is the creation and annotation of workflows. These can be created by using either the Taverna Workbench or BioWMS. Enactment of workflows is carried out by FreeFluo for Taverna workflows and by BioAgent/Hermes, a mobile agent-based middleware, for BioWMS ones. Main workflows' processing steps are annotated on the basis of their input and output, elaboration type and application domain by using a classification of bioinformatics data and tasks. The interface supports users authentication and profiling. Workflows can be selected on the basis of users' profiles and can be searched through their annotations. Results can be saved. Conclusion We developed a web system that support the selection and execution of predefined workflows, thus simplifying access for all researchers. The implementation of Web Services allowing specialized software to interact with an exhaustive set of biomedical databases and analysis software and the creation of effective workflows can significantly improve automation of in-silico analysis. Biowep is available for interested researchers as a reference portal. They are invited to submit their workflows to the workflow repository. Biowep is further being developed in the sphere of the Laboratory of Interdisciplinary Technologies in Bioinformatics – LITBIO. PMID:17430563
2007-09-01
Motion URL: http://www.blackberry.com/products/blackberry/index.shtml Software Name: Bricolage Company: Bricolage URL: http://www.bricolage.cc...Workflow Customizable control over editorial content. Bricolage Bricolage Feature Description Software Company Workflow Allows development...content for Nuxeo Collaborative Portal projects. Nuxeo Workspace Add, edit, delete, content through web interface. Bricolage Bricolage
Zugaj, D; Chenet, A; Petit, L; Vaglio, J; Pascual, T; Piketty, C; Bourdes, V
2018-02-04
Currently, imaging technologies that can accurately assess or provide surrogate markers of the human cutaneous microvessel network are limited. Dynamic optical coherence tomography (D-OCT) allows the detection of blood flow in vivo and visualization of the skin microvasculature. However, image processing is necessary to correct images, filter artifacts, and exclude irrelevant signals. The objective of this study was to develop a novel image processing workflow to enhance the technical capabilities of D-OCT. Single-center, vehicle-controlled study including healthy volunteers aged 18-50 years. A capsaicin solution was applied topically on the subject's forearm to induce local inflammation. Measurements of capsaicin-induced increase in dermal blood flow, within the region of interest, were performed by laser Doppler imaging (LDI) (reference method) and D-OCT. Sixteen subjects were enrolled. A good correlation was shown between D-OCT and LDI, using the image processing workflow. Therefore, D-OCT offers an easy-to-use alternative to LDI, with good repeatability, new robust morphological features (dermal-epidermal junction localization), and quantification of the distribution of vessel size and changes in this distribution induced by capsaicin. The visualization of the vessel network was improved through bloc filtering and artifact removal. Moreover, the assessment of vessel size distribution allows a fine analysis of the vascular patterns. The newly developed image processing workflow enhances the technical capabilities of D-OCT for the accurate detection and characterization of microcirculation in the skin. A direct clinical application of this image processing workflow is the quantification of the effect of topical treatment on skin vascularization. © 2018 The Authors. Skin Research and Technology Published by John Wiley & Sons Ltd.
Extension of specification language for soundness and completeness of service workflow
NASA Astrophysics Data System (ADS)
Viriyasitavat, Wattana; Xu, Li Da; Bi, Zhuming; Sapsomboon, Assadaporn
2018-05-01
A Service Workflow is an aggregation of distributed services to fulfill specific functionalities. With ever increasing available services, the methodologies for the selections of the services against the given requirements become main research subjects in multiple disciplines. A few of researchers have contributed to the formal specification languages and the methods for model checking; however, existing methods have the difficulties to tackle with the complexity of workflow compositions. In this paper, we propose to formalize the specification language to reduce the complexity of the workflow composition. To this end, we extend a specification language with the consideration of formal logic, so that some effective theorems can be derived for the verification of syntax, semantics, and inference rules in the workflow composition. The logic-based approach automates compliance checking effectively. The Service Workflow Specification (SWSpec) has been extended and formulated, and the soundness, completeness, and consistency of SWSpec applications have been verified; note that a logic-based SWSpec is mandatory for the development of model checking. The application of the proposed SWSpec has been demonstrated by the examples with the addressed soundness, completeness, and consistency.
Wang, Ximing; Liu, Brent J; Martinez, Clarisa; Zhang, Xuejun; Winstein, Carolee J
2015-01-01
Imaging based clinical trials can benefit from a solution to efficiently collect, analyze, and distribute multimedia data at various stages within the workflow. Currently, the data management needs of these trials are typically addressed with custom-built systems. However, software development of the custom- built systems for versatile workflows can be resource-consuming. To address these challenges, we present a system with a workflow engine for imaging based clinical trials. The system enables a project coordinator to build a data collection and management system specifically related to study protocol workflow without programming. Web Access to DICOM Objects (WADO) module with novel features is integrated to further facilitate imaging related study. The system was initially evaluated by an imaging based rehabilitation clinical trial. The evaluation shows that the cost of the development of system can be much reduced compared to the custom-built system. By providing a solution to customize a system and automate the workflow, the system will save on development time and reduce errors especially for imaging clinical trials. PMID:25870169
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pugmire, David; Kress, James; Choi, Jong
Data driven science is becoming increasingly more common, complex, and is placing tremendous stresses on visualization and analysis frameworks. Data sources producing 10GB per second (and more) are becoming increasingly commonplace in both simulation, sensor and experimental sciences. These data sources, which are often distributed around the world, must be analyzed by teams of scientists that are also distributed. Enabling scientists to view, query and interact with such large volumes of data in near-real-time requires a rich fusion of visualization and analysis techniques, middleware and workflow systems. Here, this paper discusses initial research into visualization and analysis of distributed datamore » workflows that enables scientists to make near-real-time decisions of large volumes of time varying data.« less
ScyFlow: An Environment for the Visual Specification and Execution of Scientific Workflows
NASA Technical Reports Server (NTRS)
McCann, Karen M.; Yarrow, Maurice; DeVivo, Adrian; Mehrotra, Piyush
2004-01-01
With the advent of grid technologies, scientists and engineers are building more and more complex applications to utilize distributed grid resources. The core grid services provide a path for accessing and utilizing these resources in a secure and seamless fashion. However what the scientists need is an environment that will allow them to specify their application runs at a high organizational level, and then support efficient execution across any given set or sets of resources. We have been designing and implementing ScyFlow, a dual-interface architecture (both GUT and APT) that addresses this problem. The scientist/user specifies the application tasks along with the necessary control and data flow, and monitors and manages the execution of the resulting workflow across the distributed resources. In this paper, we utilize two scenarios to provide the details of the two modules of the project, the visual editor and the runtime workflow engine.
Solutions for Mining Distributed Scientific Data
NASA Astrophysics Data System (ADS)
Lynnes, C.; Pham, L.; Graves, S.; Ramachandran, R.; Maskey, M.; Keiser, K.
2007-12-01
Researchers at the University of Alabama in Huntsville (UAH) and the Goddard Earth Sciences Data and Information Services Center (GES DISC) are working on approaches and methodologies facilitating the analysis of large amounts of distributed scientific data. Despite the existence of full-featured analysis tools, such as the Algorithm Development and Mining (ADaM) toolkit from UAH, and data repositories, such as the GES DISC, that provide online access to large amounts of data, there remain obstacles to getting the analysis tools and the data together in a workable environment. Does one bring the data to the tools or deploy the tools close to the data? The large size of many current Earth science datasets incurs significant overhead in network transfer for analysis workflows, even with the advanced networking capabilities that are available between many educational and government facilities. The UAH and GES DISC team are developing a capability to define analysis workflows using distributed services and online data resources. We are developing two solutions for this problem that address different analysis scenarios. The first is a Data Center Deployment of the analysis services for large data selections, orchestrated by a remotely defined analysis workflow. The second is a Data Mining Center approach of providing a cohesive analysis solution for smaller subsets of data. The two approaches can be complementary and thus provide flexibility for researchers to exploit the best solution for their data requirements. The Data Center Deployment of the analysis services has been implemented by deploying ADaM web services at the GES DISC so they can access the data directly, without the need of network transfers. Using the Mining Workflow Composer, a user can define an analysis workflow that is then submitted through a Web Services interface to the GES DISC for execution by a processing engine. The workflow definition is composed, maintained and executed at a distributed location, but most of the actual services comprising the workflow are available local to the GES DISC data repository. Additional refinements will ultimately provide a package that is easily implemented and configured at additional data centers for analysis of additional science data sets. Enhancements to the ADaM toolkit allow the staging of distributed data wherever the services are deployed, to support a Data Mining Center that can provide additional computational resources, large storage of output, easier addition and updates to available services, and access to data from multiple repositories. The Data Mining Center case provides researchers more flexibility to quickly try different workflow configurations and refine the process, using smaller amounts of data that may likely be transferred from distributed online repositories. This environment is sufficient for some analyses, but can also be used as an initial sandbox to test and refine a solution before staging the execution at a Data Center Deployment. Detection of airborne dust both over water and land in MODIS imagery using mining services for both solutions will be presented. The dust detection is just one possible example of the mining and analysis capabilities the proposed mining services solutions will provide to the science community. More information about the available services and the current status of this project is available at http://www.itsc.uah.edu/mws/
Flexible workflow sharing and execution services for e-scientists
NASA Astrophysics Data System (ADS)
Kacsuk, Péter; Terstyanszky, Gábor; Kiss, Tamas; Sipos, Gergely
2013-04-01
The sequence of computational and data manipulation steps required to perform a specific scientific analysis is called a workflow. Workflows that orchestrate data and/or compute intensive applications on Distributed Computing Infrastructures (DCIs) recently became standard tools in e-science. At the same time the broad and fragmented landscape of workflows and DCIs slows down the uptake of workflow-based work. The development, sharing, integration and execution of workflows is still a challenge for many scientists. The FP7 "Sharing Interoperable Workflow for Large-Scale Scientific Simulation on Available DCIs" (SHIWA) project significantly improved the situation, with a simulation platform that connects different workflow systems, different workflow languages, different DCIs and workflows into a single, interoperable unit. The SHIWA Simulation Platform is a service package, already used by various scientific communities, and used as a tool by the recently started ER-flow FP7 project to expand the use of workflows among European scientists. The presentation will introduce the SHIWA Simulation Platform and the services that ER-flow provides based on the platform to space and earth science researchers. The SHIWA Simulation Platform includes: 1. SHIWA Repository: A database where workflows and meta-data about workflows can be stored. The database is a central repository to discover and share workflows within and among communities . 2. SHIWA Portal: A web portal that is integrated with the SHIWA Repository and includes a workflow executor engine that can orchestrate various types of workflows on various grid and cloud platforms. 3. SHIWA Desktop: A desktop environment that provides similar access capabilities than the SHIWA Portal, however it runs on the users' desktops/laptops instead of a portal server. 4. Workflow engines: the ASKALON, Galaxy, GWES, Kepler, LONI Pipeline, MOTEUR, Pegasus, P-GRADE, ProActive, Triana, Taverna and WS-PGRADE workflow engines are already integrated with the execution engine of the SHIWA Portal. Other engines can be added when required. Through the SHIWA Portal one can define and run simulations on the SHIWA Virtual Organisation, an e-infrastructure that gathers computing and data resources from various DCIs, including the European Grid Infrastructure. The Portal via third party workflow engines provides support for the most widely used academic workflow engines and it can be extended with other engines on demand. Such extensions translate between workflow languages and facilitate the nesting of workflows into larger workflows even when those are written in different languages and require different interpreters for execution. Through the workflow repository and the portal lonely scientists and scientific collaborations can share and offer workflows for reuse and execution. Given the integrated nature of the SHIWA Simulation Platform the shared workflows can be executed online, without installing any special client environment and downloading workflows. The FP7 "Building a European Research Community through Interoperable Workflows and Data" (ER-flow) project disseminates the achievements of the SHIWA project and use these achievements to build workflow user communities across Europe. ER-flow provides application supports to research communities within and beyond the project consortium to develop, share and run workflows with the SHIWA Simulation Platform.
Letzel, Thomas; Bayer, Anne; Schulz, Wolfgang; Heermann, Alexandra; Lucke, Thomas; Greco, Giorgia; Grosse, Sylvia; Schüssler, Walter; Sengl, Manfred; Letzel, Marion
2015-10-01
A large number of anthropogenic trace contaminants such as pharmaceuticals, their human metabolites and further transformation products (TPs) enter wastewater treatment plants on a daily basis. A mixture of known, expected, and unknown molecules are discharged into the receiving aquatic environment because only partial elimination occurs for many of these chemicals during physical, biological and chemical treatment processes. In this study, an array of LC-MS methods from three collaborating laboratories was applied to detect and identify anthropogenic trace contaminants and their TPs in different waters. Starting with theoretical predictions of TPs, an efficient workflow using the combination of target, suspected-target and non-target strategies for the identification of these TPs in the environment was developed. These techniques and strategies were applied to study anti-hypertensive drugs from the sartan group (i.e., candesartan, eprosartan, irbesartan, olmesartan, and valsartan). Degradation experiments were performed in lab-scale wastewater treatment plants, and a screening workflow including an inter-laboratory approach was used for the identification of transformation products in the effluent samples. Subsequently, newly identified compounds were successfully analyzed in effluents of real wastewater treatment plants and river waters. Copyright © 2015 Elsevier Ltd. All rights reserved.
A semi-automated workflow for biodiversity data retrieval, cleaning, and quality control
Mathew, Cherian; Obst, Matthias; Vicario, Saverio; Haines, Robert; Williams, Alan R.; de Jong, Yde; Goble, Carole
2014-01-01
Abstract The compilation and cleaning of data needed for analyses and prediction of species distributions is a time consuming process requiring a solid understanding of data formats and service APIs provided by biodiversity informatics infrastructures. We designed and implemented a Taverna-based Data Refinement Workflow which integrates taxonomic data retrieval, data cleaning, and data selection into a consistent, standards-based, and effective system hiding the complexity of underlying service infrastructures. The workflow can be freely used both locally and through a web-portal which does not require additional software installations by users. PMID:25535486
Towards Exascale Seismic Imaging and Inversion
NASA Astrophysics Data System (ADS)
Tromp, J.; Bozdag, E.; Lefebvre, M. P.; Smith, J. A.; Lei, W.; Ruan, Y.
2015-12-01
Post-petascale supercomputers are now available to solve complex scientific problems that were thought unreachable a few decades ago. They also bring a cohort of concerns tied to obtaining optimum performance. Several issues are currently being investigated by the HPC community. These include energy consumption, fault resilience, scalability of the current parallel paradigms, workflow management, I/O performance and feature extraction with large datasets. In this presentation, we focus on the last three issues. In the context of seismic imaging and inversion, in particular for simulations based on adjoint methods, workflows are well defined.They consist of a few collective steps (e.g., mesh generation or model updates) and of a large number of independent steps (e.g., forward and adjoint simulations of each seismic event, pre- and postprocessing of seismic traces). The greater goal is to reduce the time to solution, that is, obtaining a more precise representation of the subsurface as fast as possible. This brings us to consider both the workflow in its entirety and the parts comprising it. The usual approach is to speedup the purely computational parts based on code optimization in order to reach higher FLOPS and better memory management. This still remains an important concern, but larger scale experiments show that the imaging workflow suffers from severe I/O bottlenecks. Such limitations occur both for purely computational data and seismic time series. The latter are dealt with by the introduction of a new Adaptable Seismic Data Format (ASDF). Parallel I/O libraries, namely HDF5 and ADIOS, are used to drastically reduce the cost of disk access. Parallel visualization tools, such as VisIt, are able to take advantage of ADIOS metadata to extract features and display massive datasets. Because large parts of the workflow are embarrassingly parallel, we are investigating the possibility of automating the imaging process with the integration of scientific workflow management tools, specifically Pegasus.
Workflow management in large distributed systems
NASA Astrophysics Data System (ADS)
Legrand, I.; Newman, H.; Voicu, R.; Dobre, C.; Grigoras, C.
2011-12-01
The MonALISA (Monitoring Agents using a Large Integrated Services Architecture) framework provides a distributed service system capable of controlling and optimizing large-scale, data-intensive applications. An essential part of managing large-scale, distributed data-processing facilities is a monitoring system for computing facilities, storage, networks, and the very large number of applications running on these systems in near realtime. All this monitoring information gathered for all the subsystems is essential for developing the required higher-level services—the components that provide decision support and some degree of automated decisions—and for maintaining and optimizing workflow in large-scale distributed systems. These management and global optimization functions are performed by higher-level agent-based services. We present several applications of MonALISA's higher-level services including optimized dynamic routing, control, data-transfer scheduling, distributed job scheduling, dynamic allocation of storage resource to running jobs and automated management of remote services among a large set of grid facilities.
CyberShake: Running Seismic Hazard Workflows on Distributed HPC Resources
NASA Astrophysics Data System (ADS)
Callaghan, S.; Maechling, P. J.; Graves, R. W.; Gill, D.; Olsen, K. B.; Milner, K. R.; Yu, J.; Jordan, T. H.
2013-12-01
As part of its program of earthquake system science research, the Southern California Earthquake Center (SCEC) has developed a simulation platform, CyberShake, to perform physics-based probabilistic seismic hazard analysis (PSHA) using 3D deterministic wave propagation simulations. CyberShake performs PSHA by simulating a tensor-valued wavefield of Strain Green Tensors, and then using seismic reciprocity to calculate synthetic seismograms for about 415,000 events per site of interest. These seismograms are processed to compute ground motion intensity measures, which are then combined with probabilities from an earthquake rupture forecast to produce a site-specific hazard curve. Seismic hazard curves for hundreds of sites in a region can be used to calculate a seismic hazard map, representing the seismic hazard for a region. We present a recently completed PHSA study in which we calculated four CyberShake seismic hazard maps for the Southern California area to compare how CyberShake hazard results are affected by different SGT computational codes (AWP-ODC and AWP-RWG) and different community velocity models (Community Velocity Model - SCEC (CVM-S4) v11.11 and Community Velocity Model - Harvard (CVM-H) v11.9). We present our approach to running workflow applications on distributed HPC resources, including systems without support for remote job submission. We show how our approach extends the benefits of scientific workflows, such as job and data management, to large-scale applications on Track 1 and Leadership class open-science HPC resources. We used our distributed workflow approach to perform CyberShake Study 13.4 on two new NSF open-science HPC computing resources, Blue Waters and Stampede, executing over 470 million tasks to calculate physics-based hazard curves for 286 locations in the Southern California region. For each location, we calculated seismic hazard curves with two different community velocity models and two different SGT codes, resulting in over 1100 hazard curves. We will report on the performance of this CyberShake study, four times larger than previous studies. Additionally, we will examine the challenges we face applying these workflow techniques to additional open-science HPC systems and discuss whether our workflow solutions continue to provide value to our large-scale PSHA calculations.
NASA Astrophysics Data System (ADS)
Peer, Regina; Peer, Siegfried; Sander, Heike; Marsolek, Ingo; Koller, Wolfgang; Pappert, Dirk; Hierholzer, Johannes
2002-05-01
If new technology is introduced into medical practice it must prove to make a difference. However traditional approaches of outcome analysis failed to show a direct benefit of PACS on patient care and economical benefits are still in debate. A participatory process analysis was performed to compare workflow in a film based hospital and a PACS environment. This included direct observation of work processes, interview of involved staff, structural analysis and discussion of observations with staff members. After definition of common structures strong and weak workflow steps were evaluated. With a common workflow structure in both hospitals, benefits of PACS were revealed in workflow steps related to image reporting with simultaneous image access for ICU-physicians and radiologists, archiving of images as well as image and report distribution. However PACS alone is not able to cover the complete process of 'radiography for intensive care' from ordering of an image till provision of the final product equals image + report. Interference of electronic workflow with analogue process steps such as paper based ordering reduces the potential benefits of PACS. In this regard workflow modeling proved to be very helpful for the evaluation of complex work processes linking radiology and the ICU.
Detecting distant homologies on protozoans metabolic pathways using scientific workflows.
da Cruz, Sérgio Manuel Serra; Batista, Vanessa; Silva, Edno; Tosta, Frederico; Vilela, Clarissa; Cuadrat, Rafael; Tschoeke, Diogo; Dávila, Alberto M R; Campos, Maria Luiza Machado; Mattoso, Marta
2010-01-01
Bioinformatics experiments are typically composed of programs in pipelines manipulating an enormous quantity of data. An interesting approach for managing those experiments is through workflow management systems (WfMS). In this work we discuss WfMS features to support genome homology workflows and present some relevant issues for typical genomic experiments. Our evaluation used Kepler WfMS to manage a real genomic pipeline, named OrthoSearch, originally defined as a Perl script. We show a case study detecting distant homologies on trypanomatids metabolic pathways. Our results reinforce the benefits of WfMS over script languages and point out challenges to WfMS in distributed environments.
An Adaptable Seismic Data Format for Modern Scientific Workflows
NASA Astrophysics Data System (ADS)
Smith, J. A.; Bozdag, E.; Krischer, L.; Lefebvre, M.; Lei, W.; Podhorszki, N.; Tromp, J.
2013-12-01
Data storage, exchange, and access play a critical role in modern seismology. Current seismic data formats, such as SEED, SAC, and SEG-Y, were designed with specific applications in mind and are frequently a major bottleneck in implementing efficient workflows. We propose a new modern parallel format that can be adapted for a variety of seismic workflows. The Adaptable Seismic Data Format (ASDF) features high-performance parallel read and write support and the ability to store an arbitrary number of traces of varying sizes. Provenance information is stored inside the file so that users know the origin of the data as well as the precise operations that have been applied to the waveforms. The design of the new format is based on several real-world use cases, including earthquake seismology and seismic interferometry. The metadata is based on the proven XML schemas StationXML and QuakeML. Existing time-series analysis tool-kits are easily interfaced with this new format so that seismologists can use robust, previously developed software packages, such as ObsPy and the SAC library. ADIOS, netCDF4, and HDF5 can be used as the underlying container format. At Princeton University, we have chosen to use ADIOS as the container format because it has shown superior scalability for certain applications, such as dealing with big data on HPC systems. In the context of high-performance computing, we have implemented ASDF into the global adjoint tomography workflow on Oak Ridge National Laboratory's supercomputer Titan.
Dynamic reusable workflows for ocean science
Signell, Richard; Fernandez, Filipe; Wilcox, Kyle
2016-01-01
Digital catalogs of ocean data have been available for decades, but advances in standardized services and software for catalog search and data access make it now possible to create catalog-driven workflows that automate — end-to-end — data search, analysis and visualization of data from multiple distributed sources. Further, these workflows may be shared, reused and adapted with ease. Here we describe a workflow developed within the US Integrated Ocean Observing System (IOOS) which automates the skill-assessment of water temperature forecasts from multiple ocean forecast models, allowing improved forecast products to be delivered for an open water swim event. A series of Jupyter Notebooks are used to capture and document the end-to-end workflow using a collection of Python tools that facilitate working with standardized catalog and data services. The workflow first searches a catalog of metadata using the Open Geospatial Consortium (OGC) Catalog Service for the Web (CSW), then accesses data service endpoints found in the metadata records using the OGC Sensor Observation Service (SOS) for in situ sensor data and OPeNDAP services for remotely-sensed and model data. Skill metrics are computed and time series comparisons of forecast model and observed data are displayed interactively, leveraging the capabilities of modern web browsers. The resulting workflow not only solves a challenging specific problem, but highlights the benefits of dynamic, reusable workflows in general. These workflows adapt as new data enters the data system, facilitate reproducible science, provide templates from which new scientific workflows can be developed, and encourage data providers to use standardized services. As applied to the ocean swim event, the workflow exposed problems with two of the ocean forecast products which led to improved regional forecasts once errors were corrected. While the example is specific, the approach is general, and we hope to see increased use of dynamic notebooks across the geoscience domains.
Towards a Unified Architecture for Data-Intensive Seismology in VERCE
NASA Astrophysics Data System (ADS)
Klampanos, I.; Spinuso, A.; Trani, L.; Krause, A.; Garcia, C. R.; Atkinson, M.
2013-12-01
Modern seismology involves managing, storing and processing large datasets, typically geographically distributed across organisations. Performing computational experiments using these data generates more data, which in turn have to be managed, further analysed and frequently be made available within or outside the scientific community. As part of the EU-funded project VERCE (http://verce.eu), we research and develop a number of use-cases, interfacing technologies to satisfy the data-intensive requirements of modern seismology. Our solution seeks to support: (1) familiar programming environments to develop and execute experiments, in particular via Python/ObsPy, (2) a unified view of heterogeneous computing resources, public or private, through the adoption of workflows, (3) monitoring the experiments and validating the data products at varying granularities, via a comprehensive provenance system, (4) reproducibility of experiments and consistency in collaboration, via a shared registry of processing units and contextual metadata (computing resources, data, etc.) Here, we provide a brief account of these components and their roles in the proposed architecture. Our design integrates heterogeneous distributed systems, while allowing researchers to retain current practices and control data handling and execution via higher-level abstractions. At the core of our solution lies the workflow language Dispel. While Dispel can be used to express workflows at fine detail, it may also be used as part of meta- or job-submission workflows. User interaction can be provided through a visual editor or through custom applications on top of parameterisable workflows, which is the approach VERCE follows. According to our design, the scientist may use versions of Dispel/workflow processing elements offered by the VERCE library or override them introducing custom scientific code, using ObsPy. This approach has the advantage that, while the scientist uses a familiar tool, the resulting workflow can be executed on a number of underlying stream-processing engines, such as STORM or OGSA-DAI, transparently. While making efficient use of arbitrarily distributed resources and large data-sets is of priority, such processing requires adequate provenance tracking and monitoring. Hiding computation and orchestration details via a workflow system, allows us to embed provenance harvesting where appropriate without impeding the user's regular working patterns. Our provenance model is based on the W3C PROV standard and can provide information of varying granularity regarding execution, systems and data consumption/production. A video demonstrating a prototype provenance exploration tool can be found at http://bit.ly/15t0Fz0. Keeping experimental methodology and results open and accessible, as well as encouraging reproducibility and collaboration, is of central importance to modern science. As our users are expected to be based at different geographical locations, to have access to different computing resources and to employ customised scientific codes, the use of a shared registry of workflow components, implementations, data and computing resources is critical.
Wong, Stephen T C; Tjandra, Donny; Wang, Huili; Shen, Weimin
2003-09-01
Few information systems today offer a flexible means to define and manage the automated part of radiology processes, which provide clinical imaging services for the entire healthcare organization. Even fewer of them provide a coherent architecture that can easily cope with heterogeneity and inevitable local adaptation of applications and can integrate clinical and administrative information to aid better clinical, operational, and business decisions. We describe an innovative enterprise architecture of image information management systems to fill the needs. Such a system is based on the interplay of production workflow management, distributed object computing, Java and Web techniques, and in-depth domain knowledge in radiology operations. Our design adapts the approach of "4+1" architectural view. In this new architecture, PACS and RIS become one while the user interaction can be automated by customized workflow process. Clinical service applications are implemented as active components. They can be reasonably substituted by applications of local adaptations and can be multiplied for fault tolerance and load balancing. Furthermore, the workflow-enabled digital radiology system would provide powerful query and statistical functions for managing resources and improving productivity. This paper will potentially lead to a new direction of image information management. We illustrate the innovative design with examples taken from an implemented system.
Krämer, Lisa; Jäger, Christian; Trezzi, Jean-Pierre; Jacobs, Doris M; Hiller, Karsten
2018-02-14
Currently, changes in metabolic fluxes following consumption of stable isotope-enriched foods are usually limited to the analysis of postprandial kinetics of glucose. Kinetic information on a larger diversity of metabolites is often lacking, mainly due to the marginal percentage of fully isotopically enriched plant material in the administered food product, and hence, an even weaker 13 C enrichment in downstream plasma metabolites. Therefore, we developed an analytical workflow to determine weak 13 C enrichments of diverse plasma metabolites with conventional gas chromatography-mass spectrometry (GC-MS). The limit of quantification was increased by optimizing (1) the metabolite extraction from plasma, (2) the GC-MS measurement, and (3) most importantly, the computational data processing. We applied our workflow to study the catabolic dynamics of 13 C-enriched wheat bread in three human subjects. For that purpose, we collected time-resolved human plasma samples at 16 timepoints after the consumption of 13 C-labeled bread and quantified 13 C enrichment of 12 metabolites (glucose, lactate, alanine, glycine, serine, citrate, glutamate, glutamine, valine, isoleucine, tyrosine, and threonine). Based on isotopomer specific analysis, we were able to distinguish catabolic profiles of starch and protein hydrolysis. More generally, our study highlights that conventional GC-MS equipment is sufficient to detect isotope traces below 1% if an appropriate data processing is integrated.
Krämer, Lisa; Jäger, Christian; Jacobs, Doris M.; Hiller, Karsten
2018-01-01
Currently, changes in metabolic fluxes following consumption of stable isotope-enriched foods are usually limited to the analysis of postprandial kinetics of glucose. Kinetic information on a larger diversity of metabolites is often lacking, mainly due to the marginal percentage of fully isotopically enriched plant material in the administered food product, and hence, an even weaker 13C enrichment in downstream plasma metabolites. Therefore, we developed an analytical workflow to determine weak 13C enrichments of diverse plasma metabolites with conventional gas chromatography-mass spectrometry (GC-MS). The limit of quantification was increased by optimizing (1) the metabolite extraction from plasma, (2) the GC-MS measurement, and (3) most importantly, the computational data processing. We applied our workflow to study the catabolic dynamics of 13C-enriched wheat bread in three human subjects. For that purpose, we collected time-resolved human plasma samples at 16 timepoints after the consumption of 13C-labeled bread and quantified 13C enrichment of 12 metabolites (glucose, lactate, alanine, glycine, serine, citrate, glutamate, glutamine, valine, isoleucine, tyrosine, and threonine). Based on isotopomer specific analysis, we were able to distinguish catabolic profiles of starch and protein hydrolysis. More generally, our study highlights that conventional GC-MS equipment is sufficient to detect isotope traces below 1% if an appropriate data processing is integrated. PMID:29443915
It's All About the Data: Workflow Systems and Weather
NASA Astrophysics Data System (ADS)
Plale, B.
2009-05-01
Digital data is fueling new advances in the computational sciences, particularly geospatial research as environmental sensing grows more practical through reduced technology costs, broader network coverage, and better instruments. e-Science research (i.e., cyberinfrastructure research) has responded to data intensive computing with tools, systems, and frameworks that support computationally oriented activities such as modeling, analysis, and data mining. Workflow systems support execution of sequences of tasks on behalf of a scientist. These systems, such as Taverna, Apache ODE, and Kepler, when built as part of a larger cyberinfrastructure framework, give the scientist tools to construct task graphs of execution sequences, often through a visual interface for connecting task boxes together with arcs representing control flow or data flow. Unlike business processing workflows, scientific workflows expose a high degree of detail and control during configuration and execution. Data-driven science imposes unique needs on workflow frameworks. Our research is focused on two issues. The first is the support for workflow-driven analysis over all kinds of data sets, including real time streaming data and locally owned and hosted data. The second is the essential role metadata/provenance collection plays in data driven science, for discovery, determining quality, for science reproducibility, and for long-term preservation. The research has been conducted over the last 6 years in the context of cyberinfrastructure for mesoscale weather research carried out as part of the Linked Environments for Atmospheric Discovery (LEAD) project. LEAD has pioneered new approaches for integrating complex weather data, assimilation, modeling, mining, and cyberinfrastructure systems. Workflow systems have the potential to generate huge volumes of data. Without some form of automated metadata capture, either metadata description becomes largely a manual task that is difficult if not impossible under high-volume conditions, or the searchability and manageability of the resulting data products is disappointingly low. The provenance of a data product is a record of its lineage, or trace of the execution history that resulted in the product. The provenance of a forecast model result, e.g., captures information about the executable version of the model, configuration parameters, input data products, execution environment, and owner. Provenance enables data to be properly attributed and captures critical parameters about the model run so the quality of the result can be ascertained. Proper provenance is essential to providing reproducible scientific computing results. Workflow languages used in science discovery are complete programming languages, and in theory can support any logic expressible by a programming language. The execution environments supporting the workflow engines, on the other hand, are subject to constraints on physical resources, and hence in practice the workflow task graphs used in science utilize relatively few of the cataloged workflow patterns. It is important to note that these workflows are executed on demand, and are executed once. Into this context is introduced the need for science discovery that is responsive to real time information. If we can use simple programming models and abstractions to make scientific discovery involving real-time data accessible to specialists who share and utilize data across scientific domains, we bring science one step closer to solving the largest of human problems.
The View from a Few Hundred Feet : A New Transparent and Integrated Workflow for UAV-collected Data
NASA Astrophysics Data System (ADS)
Peterson, F. S.; Barbieri, L.; Wyngaard, J.
2015-12-01
Unmanned Aerial Vehicles (UAVs) allow scientists and civilians to monitor earth and atmospheric conditions in remote locations. To keep up with the rapid evolution of UAV technology, data workflows must also be flexible, integrated, and introspective. Here, we present our data workflow for a project to assess the feasibility of detecting threshold levels of methane, carbon-dioxide, and other aerosols by mounting consumer-grade gas analysis sensors on UAV's. Particularly, we highlight our use of Project Jupyter, a set of open-source software tools and documentation designed for developing "collaborative narratives" around scientific workflows. By embracing the GitHub-backed, multi-language systems available in Project Jupyter, we enable interaction and exploratory computation while simultaneously embracing distributed version control. Additionally, the transparency of this method builds trust with civilians and decision-makers and leverages collaboration and communication to resolve problems. The goal of this presentation is to provide a generic data workflow for scientific inquiries involving UAVs and to invite the participation of the AGU community in its improvement and curation.
NASA Astrophysics Data System (ADS)
Hermans, Thomas; Nguyen, Frédéric; Caers, Jef
2015-07-01
In inverse problems, investigating uncertainty in the posterior distribution of model parameters is as important as matching data. In recent years, most efforts have focused on techniques to sample the posterior distribution with reasonable computational costs. Within a Bayesian context, this posterior depends on the prior distribution. However, most of the studies ignore modeling the prior with realistic geological uncertainty. In this paper, we propose a workflow inspired by a Popper-Bayes philosophy that data should first be used to falsify models, then only be considered for matching. We propose a workflow consisting of three steps: (1) in defining the prior, we interpret multiple alternative geological scenarios from literature (architecture of facies) and site-specific data (proportions of facies). Prior spatial uncertainty is modeled using multiple-point geostatistics, where each scenario is defined using a training image. (2) We validate these prior geological scenarios by simulating electrical resistivity tomography (ERT) data on realizations of each scenario and comparing them to field ERT in a lower dimensional space. In this second step, the idea is to probabilistically falsify scenarios with ERT, meaning that scenarios which are incompatible receive an updated probability of zero while compatible scenarios receive a nonzero updated belief. (3) We constrain the hydrogeological model with hydraulic head and ERT using a stochastic search method. The workflow is applied to a synthetic and a field case studies in an alluvial aquifer. This study highlights the importance of considering and estimating prior uncertainty (without data) through a process of probabilistic falsification.
Lunga, Dalton D.; Yang, Hsiuhan Lexie; Reith, Andrew E.; ...
2018-02-06
Satellite imagery often exhibits large spatial extent areas that encompass object classes with considerable variability. This often limits large-scale model generalization with machine learning algorithms. Notably, acquisition conditions, including dates, sensor position, lighting condition, and sensor types, often translate into class distribution shifts introducing complex nonlinear factors and hamper the potential impact of machine learning classifiers. Here, this article investigates the challenge of exploiting satellite images using convolutional neural networks (CNN) for settlement classification where the class distribution shifts are significant. We present a large-scale human settlement mapping workflow based-off multiple modules to adapt a pretrained CNN to address themore » negative impact of distribution shift on classification performance. To extend a locally trained classifier onto large spatial extents areas we introduce several submodules: First, a human-in-the-loop element for relabeling of misclassified target domain samples to generate representative examples for model adaptation; second, an efficient hashing module to minimize redundancy and noisy samples from the mass-selected examples; and third, a novel relevance ranking module to minimize the dominance of source example on the target domain. The workflow presents a novel and practical approach to achieve large-scale domain adaptation with binary classifiers that are based-off CNN features. Experimental evaluations are conducted on areas of interest that encompass various image characteristics, including multisensors, multitemporal, and multiangular conditions. Domain adaptation is assessed on source–target pairs through the transfer loss and transfer ratio metrics to illustrate the utility of the workflow.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lunga, Dalton D.; Yang, Hsiuhan Lexie; Reith, Andrew E.
Satellite imagery often exhibits large spatial extent areas that encompass object classes with considerable variability. This often limits large-scale model generalization with machine learning algorithms. Notably, acquisition conditions, including dates, sensor position, lighting condition, and sensor types, often translate into class distribution shifts introducing complex nonlinear factors and hamper the potential impact of machine learning classifiers. Here, this article investigates the challenge of exploiting satellite images using convolutional neural networks (CNN) for settlement classification where the class distribution shifts are significant. We present a large-scale human settlement mapping workflow based-off multiple modules to adapt a pretrained CNN to address themore » negative impact of distribution shift on classification performance. To extend a locally trained classifier onto large spatial extents areas we introduce several submodules: First, a human-in-the-loop element for relabeling of misclassified target domain samples to generate representative examples for model adaptation; second, an efficient hashing module to minimize redundancy and noisy samples from the mass-selected examples; and third, a novel relevance ranking module to minimize the dominance of source example on the target domain. The workflow presents a novel and practical approach to achieve large-scale domain adaptation with binary classifiers that are based-off CNN features. Experimental evaluations are conducted on areas of interest that encompass various image characteristics, including multisensors, multitemporal, and multiangular conditions. Domain adaptation is assessed on source–target pairs through the transfer loss and transfer ratio metrics to illustrate the utility of the workflow.« less
Web-Accessible Scientific Workflow System for Performance Monitoring
DOE Office of Scientific and Technical Information (OSTI.GOV)
Roelof Versteeg; Roelof Versteeg; Trevor Rowe
2006-03-01
We describe the design and implementation of a web accessible scientific workflow system for environmental monitoring. This workflow environment integrates distributed, automated data acquisition with server side data management and information visualization through flexible browser based data access tools. Component technologies include a rich browser-based client (using dynamic Javascript and HTML/CSS) for data selection, a back-end server which uses PHP for data processing, user management, and result delivery, and third party applications which are invoked by the back-end using webservices. This environment allows for reproducible, transparent result generation by a diverse user base. It has been implemented for several monitoringmore » systems with different degrees of complexity.« less
NASA Astrophysics Data System (ADS)
Santhana Vannan, S. K.; Ramachandran, R.; Deb, D.; Beaty, T.; Wright, D.
2017-12-01
This paper summarizes the workflow challenges of curating and publishing data produced from disparate data sources and provides a generalized workflow solution to efficiently archive data generated by researchers. The Oak Ridge National Laboratory Distributed Active Archive Center (ORNL DAAC) for biogeochemical dynamics and the Global Hydrology Resource Center (GHRC) DAAC have been collaborating on the development of a generalized workflow solution to efficiently manage the data publication process. The generalized workflow presented here are built on lessons learned from implementations of the workflow system. Data publication consists of the following steps: Accepting the data package from the data providers, ensuring the full integrity of the data files. Identifying and addressing data quality issues Assembling standardized, detailed metadata and documentation, including file level details, processing methodology, and characteristics of data files Setting up data access mechanisms Setup of the data in data tools and services for improved data dissemination and user experience Registering the dataset in online search and discovery catalogues Preserving the data location through Digital Object Identifiers (DOI) We will describe the steps taken to automate, and realize efficiencies to the above process. The goals of the workflow system are to reduce the time taken to publish a dataset, to increase the quality of documentation and metadata, and to track individual datasets through the data curation process. Utilities developed to achieve these goal will be described. We will also share metrics driven value of the workflow system and discuss the future steps towards creation of a common software framework.
Patterson, Emily S.; Lowry, Svetlana Z.; Ramaiah, Mala; Gibbons, Michael C.; Brick, David; Calco, Robert; Matton, Greg; Miller, Anne; Makar, Ellen; Ferrer, Jorge A.
2015-01-01
Introduction: Human factors workflow analyses in healthcare settings prior to technology implemented are recommended to improve workflow in ambulatory care settings. In this paper we describe how insights from a workflow analysis conducted by NIST were implemented in a software prototype developed for a Veteran’s Health Administration (VHA) VAi2 innovation project and associated lessons learned. Methods: We organize the original recommendations and associated stages and steps visualized in process maps from NIST and the VA’s lessons learned from implementing the recommendations in the VAi2 prototype according to four stages: 1) before the patient visit, 2) during the visit, 3) discharge, and 4) visit documentation. NIST recommendations to improve workflow in ambulatory care (outpatient) settings and process map representations were based on reflective statements collected during one-hour discussions with three physicians. The development of the VAi2 prototype was conducted initially independently from the NIST recommendations, but at a midpoint in the process development, all of the implementation elements were compared with the NIST recommendations and lessons learned were documented. Findings: Story-based displays and templates with default preliminary order sets were used to support scheduling, time-critical notifications, drafting medication orders, and supporting a diagnosis-based workflow. These templates enabled customization to the level of diagnostic uncertainty. Functionality was designed to support cooperative work across interdisciplinary team members, including shared documentation sessions with tracking of text modifications, medication lists, and patient education features. Displays were customized to the role and included access for consultants and site-defined educator teams. Discussion: Workflow, usability, and patient safety can be enhanced through clinician-centered design of electronic health records. The lessons learned from implementing NIST recommendations to improve workflow in ambulatory care using an EHR provide a first step in moving from a billing-centered perspective on how to maintain accurate, comprehensive, and up-to-date information about a group of patients to a clinician-centered perspective. These recommendations point the way towards a “patient visit management system,” which incorporates broader notions of supporting workload management, supporting flexible flow of patients and tasks, enabling accountable distributed work across members of the clinical team, and supporting dynamic tracking of steps in tasks that have longer time distributions. PMID:26290887
REMORA: a pilot in the ocean of BioMoby web-services.
Carrere, Sébastien; Gouzy, Jérôme
2006-04-01
Emerging web-services technology allows interoperability between multiple distributed architectures. Here, we present REMORA, a web server implemented according to the BioMoby web-service specifications, providing life science researchers with an easy-to-use workflow generator and launcher, a repository of predefined workflows and a survey system. Jerome.Gouzy@toulouse.inra.fr The REMORA web server is freely available at http://bioinfo.genopole-toulouse.prd.fr/remora, sources are available upon request from the authors.
Reproducible Bioconductor workflows using browser-based interactive notebooks and containers.
Almugbel, Reem; Hung, Ling-Hong; Hu, Jiaming; Almutairy, Abeer; Ortogero, Nicole; Tamta, Yashaswi; Yeung, Ka Yee
2018-01-01
Bioinformatics publications typically include complex software workflows that are difficult to describe in a manuscript. We describe and demonstrate the use of interactive software notebooks to document and distribute bioinformatics research. We provide a user-friendly tool, BiocImageBuilder, that allows users to easily distribute their bioinformatics protocols through interactive notebooks uploaded to either a GitHub repository or a private server. We present four different interactive Jupyter notebooks using R and Bioconductor workflows to infer differential gene expression, analyze cross-platform datasets, process RNA-seq data and KinomeScan data. These interactive notebooks are available on GitHub. The analytical results can be viewed in a browser. Most importantly, the software contents can be executed and modified. This is accomplished using Binder, which runs the notebook inside software containers, thus avoiding the need to install any software and ensuring reproducibility. All the notebooks were produced using custom files generated by BiocImageBuilder. BiocImageBuilder facilitates the publication of workflows with a point-and-click user interface. We demonstrate that interactive notebooks can be used to disseminate a wide range of bioinformatics analyses. The use of software containers to mirror the original software environment ensures reproducibility of results. Parameters and code can be dynamically modified, allowing for robust verification of published results and encouraging rapid adoption of new methods. Given the increasing complexity of bioinformatics workflows, we anticipate that these interactive software notebooks will become as necessary for documenting software methods as traditional laboratory notebooks have been for documenting bench protocols, and as ubiquitous. © The Author 2017. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com
Eleven quick tips for architecting biomedical informatics workflows with cloud computing.
Cole, Brian S; Moore, Jason H
2018-03-01
Cloud computing has revolutionized the development and operations of hardware and software across diverse technological arenas, yet academic biomedical research has lagged behind despite the numerous and weighty advantages that cloud computing offers. Biomedical researchers who embrace cloud computing can reap rewards in cost reduction, decreased development and maintenance workload, increased reproducibility, ease of sharing data and software, enhanced security, horizontal and vertical scalability, high availability, a thriving technology partner ecosystem, and much more. Despite these advantages that cloud-based workflows offer, the majority of scientific software developed in academia does not utilize cloud computing and must be migrated to the cloud by the user. In this article, we present 11 quick tips for architecting biomedical informatics workflows on compute clouds, distilling knowledge gained from experience developing, operating, maintaining, and distributing software and virtualized appliances on the world's largest cloud. Researchers who follow these tips stand to benefit immediately by migrating their workflows to cloud computing and embracing the paradigm of abstraction.
Eleven quick tips for architecting biomedical informatics workflows with cloud computing
Moore, Jason H.
2018-01-01
Cloud computing has revolutionized the development and operations of hardware and software across diverse technological arenas, yet academic biomedical research has lagged behind despite the numerous and weighty advantages that cloud computing offers. Biomedical researchers who embrace cloud computing can reap rewards in cost reduction, decreased development and maintenance workload, increased reproducibility, ease of sharing data and software, enhanced security, horizontal and vertical scalability, high availability, a thriving technology partner ecosystem, and much more. Despite these advantages that cloud-based workflows offer, the majority of scientific software developed in academia does not utilize cloud computing and must be migrated to the cloud by the user. In this article, we present 11 quick tips for architecting biomedical informatics workflows on compute clouds, distilling knowledge gained from experience developing, operating, maintaining, and distributing software and virtualized appliances on the world’s largest cloud. Researchers who follow these tips stand to benefit immediately by migrating their workflows to cloud computing and embracing the paradigm of abstraction. PMID:29596416
SISYPHUS: A high performance seismic inversion factory
NASA Astrophysics Data System (ADS)
Gokhberg, Alexey; Simutė, Saulė; Boehm, Christian; Fichtner, Andreas
2016-04-01
In the recent years the massively parallel high performance computers became the standard instruments for solving the forward and inverse problems in seismology. The respective software packages dedicated to forward and inverse waveform modelling specially designed for such computers (SPECFEM3D, SES3D) became mature and widely available. These packages achieve significant computational performance and provide researchers with an opportunity to solve problems of bigger size at higher resolution within a shorter time. However, a typical seismic inversion process contains various activities that are beyond the common solver functionality. They include management of information on seismic events and stations, 3D models, observed and synthetic seismograms, pre-processing of the observed signals, computation of misfits and adjoint sources, minimization of misfits, and process workflow management. These activities are time consuming, seldom sufficiently automated, and therefore represent a bottleneck that can substantially offset performance benefits provided by even the most powerful modern supercomputers. Furthermore, a typical system architecture of modern supercomputing platforms is oriented towards the maximum computational performance and provides limited standard facilities for automation of the supporting activities. We present a prototype solution that automates all aspects of the seismic inversion process and is tuned for the modern massively parallel high performance computing systems. We address several major aspects of the solution architecture, which include (1) design of an inversion state database for tracing all relevant aspects of the entire solution process, (2) design of an extensible workflow management framework, (3) integration with wave propagation solvers, (4) integration with optimization packages, (5) computation of misfits and adjoint sources, and (6) process monitoring. The inversion state database represents a hierarchical structure with branches for the static process setup, inversion iterations, and solver runs, each branch specifying information at the event, station and channel levels. The workflow management framework is based on an embedded scripting engine that allows definition of various workflow scenarios using a high-level scripting language and provides access to all available inversion components represented as standard library functions. At present the SES3D wave propagation solver is integrated in the solution; the work is in progress for interfacing with SPECFEM3D. A separate framework is designed for interoperability with an optimization module; the workflow manager and optimization process run in parallel and cooperate by exchanging messages according to a specially designed protocol. A library of high-performance modules implementing signal pre-processing, misfit and adjoint computations according to established good practices is included. Monitoring is based on information stored in the inversion state database and at present implements a command line interface; design of a graphical user interface is in progress. The software design fits well into the common massively parallel system architecture featuring a large number of computational nodes running distributed applications under control of batch-oriented resource managers. The solution prototype has been implemented on the "Piz Daint" supercomputer provided by the Swiss Supercomputing Centre (CSCS).
Singh, Dadabhai T; Trehan, Rahul; Schmidt, Bertil; Bretschneider, Timo
2008-01-01
Preparedness for a possible global pandemic caused by viruses such as the highly pathogenic influenza A subtype H5N1 has become a global priority. In particular, it is critical to monitor the appearance of any new emerging subtypes. Comparative phyloinformatics can be used to monitor, analyze, and possibly predict the evolution of viruses. However, in order to utilize the full functionality of available analysis packages for large-scale phyloinformatics studies, a team of computer scientists, biostatisticians and virologists is needed--a requirement which cannot be fulfilled in many cases. Furthermore, the time complexities of many algorithms involved leads to prohibitive runtimes on sequential computer platforms. This has so far hindered the use of comparative phyloinformatics as a commonly applied tool in this area. In this paper the graphical-oriented workflow design system called Quascade and its efficient usage for comparative phyloinformatics are presented. In particular, we focus on how this task can be effectively performed in a distributed computing environment. As a proof of concept, the designed workflows are used for the phylogenetic analysis of neuraminidase of H5N1 isolates (micro level) and influenza viruses (macro level). The results of this paper are hence twofold. Firstly, this paper demonstrates the usefulness of a graphical user interface system to design and execute complex distributed workflows for large-scale phyloinformatics studies of virus genes. Secondly, the analysis of neuraminidase on different levels of complexity provides valuable insights of this virus's tendency for geographical based clustering in the phylogenetic tree and also shows the importance of glycan sites in its molecular evolution. The current study demonstrates the efficiency and utility of workflow systems providing a biologist friendly approach to complex biological dataset analysis using high performance computing. In particular, the utility of the platform Quascade for deploying distributed and parallelized versions of a variety of computationally intensive phylogenetic algorithms has been shown. Secondly, the analysis of the utilized H5N1 neuraminidase datasets at macro and micro levels has clearly indicated a pattern of spatial clustering of the H5N1 viral isolates based on geographical distribution rather than temporal or host range based clustering.
A Foundation for Enterprise Imaging: HIMSS-SIIM Collaborative White Paper.
Roth, Christopher J; Lannum, Louis M; Persons, Kenneth R
2016-10-01
Care providers today routinely obtain valuable clinical multimedia with mobile devices, scope cameras, ultrasound, and many other modalities at the point of care. Image capture and storage workflows may be heterogeneous across an enterprise, and as a result, they often are not well incorporated in the electronic health record. Enterprise Imaging refers to a set of strategies, initiatives, and workflows implemented across a healthcare enterprise to consistently and optimally capture, index, manage, store, distribute, view, exchange, and analyze all clinical imaging and multimedia content to enhance the electronic health record. This paper is intended to introduce Enterprise Imaging as an important initiative to clinical and informatics leadership, and outline its key elements of governance, strategy, infrastructure, common multimedia content, acquisition workflows, enterprise image viewers, and image exchange services.
Katherine Philpott, M; Stanciu, Cristina E; Kwon, Ye Jin; Bustamante, Eduardo E; Greenspoon, Susan A; Ehrhardt, Christopher J
2017-07-01
The goal of this study was to survey optical and biochemical variation in cell populations deposited onto a surface through touch or contact and identify specific features that may be used to distinguish and then sort cell populations from separate contributors in a trace biological mixture. Although we were not able to detect meaningful biochemical variation in touch samples deposited by different contributors through preliminary antibody surveys, we did observe distinct differences in red autofluorescence emissions (650-670 nm), with as much as a tenfold difference in mean fluorescence intensities observed between certain pairs of donors. Results indicate that the level of red autofluorescence in touch samples can be influenced by a donor's contact with specific material prior to handling the substrate from which cells were collected. In particular, we observed increased red autofluorescence in cells deposited subsequent to handling laboratory gloves, plant material, and certain types of marker ink, which could be easily visualized microscopically or using flow cytometry, and persisted after hand washing. To test whether these observed optical differences could potentially be used as the basis for a cell separation workflow, a controlled two-person touch mixture was separated into two fractions via fluorescence-activated cell sorting (FACS) using gating criteria based on intensity of 650-670 nm emissions and then subjected to DNA analysis. Genetic analysis of the sorted fractions provided partial DNA profiles that were consistent with separation of individual contributors from the mixture suggesting that variation in autofluorescence signatures, even if driven by extrinsic factors, may nonetheless be a useful means of isolating contributors to some touch mixtures. Graphical Abstract Conceptual workflow diagram. Trace biological mixtures containing cells from multiple individuals are analyzed by flow cytometry. Cells are then physically separated into two populations based on intensity of red autofluorescence using Fluorescence Activated Cell Sorting. Each isolated cell fraction is subjected to DNA analysis resulting in a DNA profile for each contributor.
NASA Astrophysics Data System (ADS)
Laban, Shaban; El-Desouky, Aly
2014-05-01
To achieve a rapid, simple and reliable parallel processing of different types of tasks and big data processing on any compute cluster, a lightweight messaging-based distributed applications processing and workflow execution framework model is proposed. The framework is based on Apache ActiveMQ and Simple (or Streaming) Text Oriented Message Protocol (STOMP). ActiveMQ , a popular and powerful open source persistence messaging and integration patterns server with scheduler capabilities, acts as a message broker in the framework. STOMP provides an interoperable wire format that allows framework programs to talk and interact between each other and ActiveMQ easily. In order to efficiently use the message broker a unified message and topic naming pattern is utilized to achieve the required operation. Only three Python programs and simple library, used to unify and simplify the implementation of activeMQ and STOMP protocol, are needed to use the framework. A watchdog program is used to monitor, remove, add, start and stop any machine and/or its different tasks when necessary. For every machine a dedicated one and only one zoo keeper program is used to start different functions or tasks, stompShell program, needed for executing the user required workflow. The stompShell instances are used to execute any workflow jobs based on received message. A well-defined, simple and flexible message structure, based on JavaScript Object Notation (JSON), is used to build any complex workflow systems. Also, JSON format is used in configuration, communication between machines and programs. The framework is platform independent. Although, the framework is built using Python the actual workflow programs or jobs can be implemented by any programming language. The generic framework can be used in small national data centres for processing seismological and radionuclide data received from the International Data Centre (IDC) of the Preparatory Commission for the Comprehensive Nuclear-Test-Ban Treaty Organization (CTBTO). Also, it is possible to extend the use of the framework in monitoring the IDC pipeline. The detailed design, implementation,conclusion and future work of the proposed framework will be presented.
Mission Assurance in a Distributed Environment
2009-06-01
Notation ( BPMN ) – Graphical representation of business processes in a workflow • Unified Modeling Language (UML) – Use standard UML diagrams to model the system – Component, sequence, activity diagrams
Developing science gateways for drug discovery in a grid environment.
Pérez-Sánchez, Horacio; Rezaei, Vahid; Mezhuyev, Vitaliy; Man, Duhu; Peña-García, Jorge; den-Haan, Helena; Gesing, Sandra
2016-01-01
Methods for in silico screening of large databases of molecules increasingly complement and replace experimental techniques to discover novel compounds to combat diseases. As these techniques become more complex and computationally costly we are faced with an increasing problem to provide the research community of life sciences with a convenient tool for high-throughput virtual screening on distributed computing resources. To this end, we recently integrated the biophysics-based drug-screening program FlexScreen into a service, applicable for large-scale parallel screening and reusable in the context of scientific workflows. Our implementation is based on Pipeline Pilot and Simple Object Access Protocol and provides an easy-to-use graphical user interface to construct complex workflows, which can be executed on distributed computing resources, thus accelerating the throughput by several orders of magnitude.
Data distribution method of workflow in the cloud environment
NASA Astrophysics Data System (ADS)
Wang, Yong; Wu, Junjuan; Wang, Ying
2017-08-01
Cloud computing for workflow applications provides the required high efficiency calculation and large storage capacity and it also brings challenges to the protection of trade secrets and other privacy data. Because of privacy data will cause the increase of the data transmission time, this paper presents a new data allocation algorithm based on data collaborative damage degree, to improve the existing data allocation strategy? Safety and public cloud computer algorithm depends on the private cloud; the static allocation method in the initial stage only to the non-confidential data division to improve the original data, in the operational phase will continue to generate data to dynamically adjust the data distribution scheme. The experimental results show that the improved method is effective in reducing the data transmission time.
Intelligent services for discovery of complex geospatial features from remote sensing imagery
NASA Astrophysics Data System (ADS)
Yue, Peng; Di, Liping; Wei, Yaxing; Han, Weiguo
2013-09-01
Remote sensing imagery has been commonly used by intelligence analysts to discover geospatial features, including complex ones. The overwhelming volume of routine image acquisition requires automated methods or systems for feature discovery instead of manual image interpretation. The methods of extraction of elementary ground features such as buildings and roads from remote sensing imagery have been studied extensively. The discovery of complex geospatial features, however, is still rather understudied. A complex feature, such as a Weapon of Mass Destruction (WMD) proliferation facility, is spatially composed of elementary features (e.g., buildings for hosting fuel concentration machines, cooling towers, transportation roads, and fences). Such spatial semantics, together with thematic semantics of feature types, can be used to discover complex geospatial features. This paper proposes a workflow-based approach for discovery of complex geospatial features that uses geospatial semantics and services. The elementary features extracted from imagery are archived in distributed Web Feature Services (WFSs) and discoverable from a catalogue service. Using spatial semantics among elementary features and thematic semantics among feature types, workflow-based service chains can be constructed to locate semantically-related complex features in imagery. The workflows are reusable and can provide on-demand discovery of complex features in a distributed environment.
Realizing the Living Paper using the ProvONE Model for Reproducible Research
NASA Astrophysics Data System (ADS)
Jones, M. B.; Jones, C. S.; Ludäscher, B.; Missier, P.; Walker, L.; Slaughter, P.; Schildhauer, M.; Cuevas-Vicenttín, V.
2015-12-01
Science has advanced through traditional publications that codify research results as a permenant part of the scientific record. But because publications are static and atomic, researchers can only cite and reference a whole work when building on prior work of colleagues. The open source software model has demonstrated a new approach in which strong version control in an open environment can nurture an open ecosystem of software. Developers now commonly fork and extend software giving proper credit, with less repetition, and with confidence in the relationship to original software. Through initiatives like 'Beyond the PDF', an analogous model has been imagined for open science, in which software, data, analyses, and derived products become first class objects within a publishing ecosystem that has evolved to be finer-grained and is realized through a web of linked open data. We have prototyped a Living Paper concept by developing the ProvONE provenance model for scientific workflows, with prototype deployments in DataONE. ProvONE promotes transparency and openness by describing the authenticity, origin, structure, and processing history of research artifacts and by detailing the steps in computational workflows that produce derived products. To realize the Living Paper, we decompose scientific papers into their constituent products and publish these as compound objects in the DataONE federation of archival repositories. Each individual finding and sub-product of a reseach project (such as a derived data table, a workflow or script, a figure, an image, or a finding) can be independently stored, versioned, and cited. ProvONE provenance traces link these fine-grained products within and across versions of a paper, and across related papers that extend an original analysis. This allows for open scientific publishing in which researchers extend and modify findings, creating a dynamic, evolving web of results that collectively represent the scientific enterprise. The Living Paper provides detailed metadata for properly interpreting and verifying individual research findings, for tracing the origin of ideas, for launching new lines of inquiry, and for implementing transitive credit for research and engineering.
GUEST EDITOR'S INTRODUCTION: Guest Editor's introduction
NASA Astrophysics Data System (ADS)
Chrysanthis, Panos K.
1996-12-01
Computer Science Department, University of Pittsburgh, Pittsburgh, PA 15260, USA This special issue focuses on current efforts to represent and support workflows that integrate information systems and human resources within a business or manufacturing enterprise. Workflows may also be viewed as an emerging computational paradigm for effective structuring of cooperative applications involving human users and access to diverse data types not necessarily maintained by traditional database management systems. A workflow is an automated organizational process (also called business process) which consists of a set of activities or tasks that need to be executed in a particular controlled order over a combination of heterogeneous database systems and legacy systems. Within workflows, tasks are performed cooperatively by either human or computational agents in accordance with their roles in the organizational hierarchy. The challenge in facilitating the implementation of workflows lies in developing efficient workflow management systems. A workflow management system (also called workflow server, workflow engine or workflow enactment system) provides the necessary interfaces for coordination and communication among human and computational agents to execute the tasks involved in a workflow and controls the execution orderings of tasks as well as the flow of data that these tasks manipulate. That is, the workflow management system is responsible for correctly and reliably supporting the specification, execution, and monitoring of workflows. The six papers selected (out of the twenty-seven submitted for this special issue of Distributed Systems Engineering) address different aspects of these three functional components of a workflow management system. In the first paper, `Correctness issues in workflow management', Kamath and Ramamritham discuss the important issue of correctness in workflow management that constitutes a prerequisite for the use of workflows in the automation of the critical organizational/business processes. In particular, this paper examines the issues of execution atomicity and failure atomicity, differentiating between correctness requirements of system failures and logical failures, and surveys techniques that can be used to ensure data consistency in workflow management systems. While the first paper is concerned with correctness assuming transactional workflows in which selective transactional properties are associated with individual tasks or the entire workflow, the second paper, `Scheduling workflows by enforcing intertask dependencies' by Attie et al, assumes that the tasks can be either transactions or other activities involving legacy systems. This second paper describes the modelling and specification of conditions involving events and dependencies among tasks within a workflow using temporal logic and finite state automata. It also presents a scheduling algorithm that enforces all stated dependencies by executing at any given time only those events that are allowed by all the dependency automata and in an order as specified by the dependencies. In any system with decentralized control, there is a need to effectively cope with the tension that exists between autonomy and consistency requirements. In `A three-level atomicity model for decentralized workflow management systems', Ben-Shaul and Heineman focus on the specific requirement of enforcing failure atomicity in decentralized, autonomous and interacting workflow management systems. Their paper describes a model in which each workflow manager must be able to specify the sequence of tasks that comprise an atomic unit for the purposes of correctness, and the degrees of local and global atomicity for the purpose of cooperation with other workflow managers. The paper also discusses a realization of this model in which treaties and summits provide an agreement mechanism, while underlying transaction managers are responsible for maintaining failure atomicity. The fourth and fifth papers are experience papers describing a workflow management system and a large scale workflow application, respectively. Schill and Mittasch, in `Workflow management systems on top of OSF DCE and OMG CORBA', describe a decentralized workflow management system and discuss its implementation using two standardized middleware platforms, namely, OSF DCE and OMG CORBA. The system supports a new approach to workflow management, introducing several new concepts such as data type management for integrating various types of data and quality of service for various services provided by servers. A problem common to both database applications and workflows is the handling of missing and incomplete information. This is particularly pervasive in an `electronic market' with a huge number of retail outlets producing and exchanging volumes of data, the application discussed in `Information flow in the DAMA project beyond database managers: information flow managers'. Motivated by the need for a method that allows a task to proceed in a timely manner if not all data produced by other tasks are available by its deadline, Russell et al propose an architectural framework and a language that can be used to detect, approximate and, later on, to adjust missing data if necessary. The final paper, `The evolution towards flexible workflow systems' by Nutt, is complementary to the other papers and is a survey of issues and of work related to both workflow and computer supported collaborative work (CSCW) areas. In particular, the paper provides a model and a categorization of the dimensions which workflow management and CSCW systems share. Besides summarizing the recent advancements towards efficient workflow management, the papers in this special issue suggest areas open to investigation and it is our hope that they will also provide the stimulus for further research and development in the area of workflow management systems.
A patient workflow management system built on guidelines.
Dazzi, L.; Fassino, C.; Saracco, R.; Quaglini, S.; Stefanelli, M.
1997-01-01
To provide high quality, shared, and distributed medical care, clinical and organizational issues need to be integrated. This work describes a methodology for developing a Patient Workflow Management System, based on a detailed model of both the medical work process and the organizational structure. We assume that the medical work process is represented through clinical practice guidelines, and that an ontological description of the organization is available. Thus, we developed tools 1) for acquiring the medical knowledge contained into a guideline, 2) to translate the derived formalized guideline into a computational formalism, precisely a Petri Net, 3) to maintain different representation levels. The high level representation guarantees that the Patient Workflow follows the guideline prescriptions, while the low level takes into account the specific organization characteristics and allow allocating resources for managing a specific patient in daily practice. PMID:9357606
NASA Astrophysics Data System (ADS)
Lengyel, F.; Yang, P.; Rosenzweig, B.; Vorosmarty, C. J.
2012-12-01
The Northeast Regional Earth System Model (NE-RESM, NSF Award #1049181) integrates weather research and forecasting models, terrestrial and aquatic ecosystem models, a water balance/transport model, and mesoscale and energy systems input-out economic models developed by interdisciplinary research team from academia and government with expertise in physics, biogeochemistry, engineering, energy, economics, and policy. NE-RESM is intended to forecast the implications of planning decisions on the region's environment, ecosystem services, energy systems and economy through the 21st century. Integration of model components and the development of cyberinfrastructure for interacting with the system is facilitated with the integrated Rule Oriented Data System (iRODS), a distributed data grid that provides archival storage with metadata facilities and a rule-based workflow engine for automating and auditing scientific workflows.
Distributed Data Integration Infrastructure
DOE Office of Scientific and Technical Information (OSTI.GOV)
Critchlow, T; Ludaescher, B; Vouk, M
The Internet is becoming the preferred method for disseminating scientific data from a variety of disciplines. This can result in information overload on the part of the scientists, who are unable to query all of the relevant sources, even if they knew where to find them, what they contained, how to interact with them, and how to interpret the results. A related issue is keeping up with current trends in information technology often taxes the end-user's expertise and time. Thus instead of benefiting from this information rich environment, scientists become experts on a small number of sources and technologies, usemore » them almost exclusively, and develop a resistance to innovations that can enhance their productivity. Enabling information based scientific advances, in domains such as functional genomics, requires fully utilizing all available information and the latest technologies. In order to address this problem we are developing a end-user centric, domain-sensitive workflow-based infrastructure, shown in Figure 1, that will allow scientists to design complex scientific workflows that reflect the data manipulation required to perform their research without an undue burden. We are taking a three-tiered approach to designing this infrastructure utilizing (1) abstract workflow definition, construction, and automatic deployment, (2) complex agent-based workflow execution and (3) automatic wrapper generation. In order to construct a workflow, the scientist defines an abstract workflow (AWF) in terminology (semantics and context) that is familiar to him/her. This AWF includes all of the data transformations, selections, and analyses required by the scientist, but does not necessarily specify particular data sources. This abstract workflow is then compiled into an executable workflow (EWF, in our case XPDL) that is then evaluated and executed by the workflow engine. This EWF contains references to specific data source and interfaces capable of performing the desired actions. In order to provide access to the largest number of resources possible, our lowest level utilizes automatic wrapper generation techniques to create information and data wrappers capable of interacting with the complex interfaces typical in scientific analysis. The remainder of this document outlines our work in these three areas, the impact our work has made, and our plans for the future.« less
The Ophidia Stack: Toward Large Scale, Big Data Analytics Experiments for Climate Change
NASA Astrophysics Data System (ADS)
Fiore, S.; Williams, D. N.; D'Anca, A.; Nassisi, P.; Aloisio, G.
2015-12-01
The Ophidia project is a research effort on big data analytics facing scientific data analysis challenges in multiple domains (e.g. climate change). It provides a "datacube-oriented" framework responsible for atomically processing and manipulating scientific datasets, by providing a common way to run distributive tasks on large set of data fragments (chunks). Ophidia provides declarative, server-side, and parallel data analysis, jointly with an internal storage model able to efficiently deal with multidimensional data and a hierarchical data organization to manage large data volumes. The project relies on a strong background on high performance database management and On-Line Analytical Processing (OLAP) systems to manage large scientific datasets. The Ophidia analytics platform provides several data operators to manipulate datacubes (about 50), and array-based primitives (more than 100) to perform data analysis on large scientific data arrays. To address interoperability, Ophidia provides multiple server interfaces (e.g. OGC-WPS). From a client standpoint, a Python interface enables the exploitation of the framework into Python-based eco-systems/applications (e.g. IPython) and the straightforward adoption of a strong set of related libraries (e.g. SciPy, NumPy). The talk will highlight a key feature of the Ophidia framework stack: the "Analytics Workflow Management System" (AWfMS). The Ophidia AWfMS coordinates, orchestrates, optimises and monitors the execution of multiple scientific data analytics and visualization tasks, thus supporting "complex analytics experiments". Some real use cases related to the CMIP5 experiment will be discussed. In particular, with regard to the "Climate models intercomparison data analysis" case study proposed in the EU H2020 INDIGO-DataCloud project, workflows related to (i) anomalies, (ii) trend, and (iii) climate change signal analysis will be presented. Such workflows will be distributed across multiple sites - according to the datasets distribution - and will include intercomparison, ensemble, and outlier analysis. The two-level workflow solution envisioned in INDIGO (coarse grain for distributed tasks orchestration, and fine grain, at the level of a single data analytics cluster instance) will be presented and discussed.
Nuclear microscopy in trace-element biology — from cellular studies to the clinic
NASA Astrophysics Data System (ADS)
Lindh, Ulf
1993-05-01
The concentration and distribution of trace and major elements in cells are of great interest in cell biology. PIXE can provide elemental concentrations in the bulk of cells or organelles as other bulk techniques such as atomic absorption spectrophotometry and nuclear activation analysis. Supplementary information, perhaps more exciting, on the intracellular distributions of trace elements can be provided using nuclear microscopy. Intracellular distributions of trace elements in normal and malignant cells are presented. The toxicity of mercury and cadmium can be prevented by supplementation of the essential trace element selenium. Some results from an experimental animal model are discussed. The intercellular distribution of major and trace elements in isolated blood cells, as revealed by nuclear microscopy, provides useful clinical information. Examples are given concerning inflammatory connective-tissue diseases and the chronic fatigue syndrome.
NASA Langley Atmospheric Science Data Center (ASDC) Experience with Aircraft Data
NASA Astrophysics Data System (ADS)
Perez, J.; Sorlie, S.; Parker, L.; Mason, K. L.; Rinsland, P.; Kusterer, J.
2011-12-01
Over the past decade the NASA Langley ASDC has archived and distributed a variety of aircraft mission data sets. These datasets posed unique challenges for archiving from the rigidity of the archiving system and formats to the lack of metadata. The ASDC developed a state-of-the-art data archive and distribution system to serve the atmospheric sciences data provider and researcher communities. The system, called Archive - Next Generation (ANGe), is designed with a distributed, multi-tier, serviced-based, message oriented architecture enabling new methods for searching, accessing, and customizing data. The ANGe system provides the ease and flexibility to ingest and archive aircraft data through an ad hoc workflow or to develop a new workflow to suit the providers needs. The ASDC will describe the challenges encountered in preparing aircraft data for archiving and distribution. The ASDC is currently providing guidance to the DISCOVER-AQ (Deriving Information on Surface Conditions from Column and Vertically Resolved Observations Relevant to Air Quality) Earth Venture-1 project on developing collection, granule, and browse metadata as well as supporting the ADAM (Airborne Data For Assessing Models) site.
Address tracing for parallel machines
NASA Technical Reports Server (NTRS)
Stunkel, Craig B.; Janssens, Bob; Fuchs, W. Kent
1991-01-01
Recently implemented parallel system address-tracing methods based on several metrics are surveyed. The issues specific to collection of traces for both shared and distributed memory parallel computers are highlighted. Five general categories of address-trace collection methods are examined: hardware-captured, interrupt-based, simulation-based, altered microcode-based, and instrumented program-based traces. The problems unique to shared memory and distributed memory multiprocessors are examined separately.
Assembling Large, Multi-Sensor Climate Datasets Using the SciFlo Grid Workflow System
NASA Astrophysics Data System (ADS)
Wilson, B. D.; Manipon, G.; Xing, Z.; Fetzer, E.
2008-12-01
NASA's Earth Observing System (EOS) is the world's most ambitious facility for studying global climate change. The mandate now is to combine measurements from the instruments on the A-Train platforms (AIRS, AMSR-E, MODIS, MISR, MLS, and CloudSat) and other Earth probes to enable large-scale studies of climate change over periods of years to decades. However, moving from predominantly single-instrument studies to a multi-sensor, measurement-based model for long-duration analysis of important climate variables presents serious challenges for large-scale data mining and data fusion. For example, one might want to compare temperature and water vapor retrievals from one instrument (AIRS) to another instrument (MODIS), and to a model (ECMWF), stratify the comparisons using a classification of the cloud scenes from CloudSat, and repeat the entire analysis over years of AIRS data. To perform such an analysis, one must discover & access multiple datasets from remote sites, find the space/time matchups between instruments swaths and model grids, understand the quality flags and uncertainties for retrieved physical variables, and assemble merged datasets for further scientific and statistical analysis. To meet these large-scale challenges, we are utilizing a Grid computing and dataflow framework, named SciFlo, in which we are deploying a set of versatile and reusable operators for data query, access, subsetting, co-registration, mining, fusion, and advanced statistical analysis. SciFlo is a semantically-enabled ("smart") Grid Workflow system that ties together a peer-to-peer network of computers into an efficient engine for distributed computation. The SciFlo workflow engine enables scientists to do multi-instrument Earth Science by assembling remotely-invokable Web Services (SOAP or http GET URLs), native executables, command-line scripts, and Python codes into a distributed computing flow. A scientist visually authors the graph of operation in the VizFlow GUI, or uses a text editor to modify the simple XML workflow documents. The SciFlo client & server engines optimize the execution of such distributed workflows and allow the user to transparently find and use datasets and operators without worrying about the actual location of the Grid resources. The engine transparently moves data to the operators, and moves operators to the data (on the dozen trusted SciFlo nodes). SciFlo also deploys a variety of Data Grid services to: query datasets in space and time, locate & retrieve on-line data granules, provide on-the-fly variable and spatial subsetting, and perform pairwise instrument matchups for A-Train datasets. These services are combined into efficient workflows to assemble the desired large-scale, merged climate datasets. SciFlo is currently being applied in several large climate studies: comparisons of aerosol optical depth between MODIS, MISR, AERONET ground network, and U. Michigan's IMPACT aerosol transport model; characterization of long-term biases in microwave and infrared instruments (AIRS, MLS) by comparisons to GPS temperature retrievals accurate to 0.1 degrees Kelvin; and construction of a decade-long, multi-sensor water vapor climatology stratified by classified cloud scene by bringing together datasets from AIRS/AMSU, AMSR-E, MLS, MODIS, and CloudSat (NASA MEASUREs grant, Fetzer PI). The presentation will discuss the SciFlo technologies, their application in these distributed workflows, and the many challenges encountered in assembling and analyzing these massive datasets.
Assembling Large, Multi-Sensor Climate Datasets Using the SciFlo Grid Workflow System
NASA Astrophysics Data System (ADS)
Wilson, B.; Manipon, G.; Xing, Z.; Fetzer, E.
2009-04-01
NASA's Earth Observing System (EOS) is an ambitious facility for studying global climate change. The mandate now is to combine measurements from the instruments on the "A-Train" platforms (AIRS, AMSR-E, MODIS, MISR, MLS, and CloudSat) and other Earth probes to enable large-scale studies of climate change over periods of years to decades. However, moving from predominantly single-instrument studies to a multi-sensor, measurement-based model for long-duration analysis of important climate variables presents serious challenges for large-scale data mining and data fusion. For example, one might want to compare temperature and water vapor retrievals from one instrument (AIRS) to another instrument (MODIS), and to a model (ECMWF), stratify the comparisons using a classification of the "cloud scenes" from CloudSat, and repeat the entire analysis over years of AIRS data. To perform such an analysis, one must discover & access multiple datasets from remote sites, find the space/time "matchups" between instruments swaths and model grids, understand the quality flags and uncertainties for retrieved physical variables, assemble merged datasets, and compute fused products for further scientific and statistical analysis. To meet these large-scale challenges, we are utilizing a Grid computing and dataflow framework, named SciFlo, in which we are deploying a set of versatile and reusable operators for data query, access, subsetting, co-registration, mining, fusion, and advanced statistical analysis. SciFlo is a semantically-enabled ("smart") Grid Workflow system that ties together a peer-to-peer network of computers into an efficient engine for distributed computation. The SciFlo workflow engine enables scientists to do multi-instrument Earth Science by assembling remotely-invokable Web Services (SOAP or http GET URLs), native executables, command-line scripts, and Python codes into a distributed computing flow. A scientist visually authors the graph of operation in the VizFlow GUI, or uses a text editor to modify the simple XML workflow documents. The SciFlo client & server engines optimize the execution of such distributed workflows and allow the user to transparently find and use datasets and operators without worrying about the actual location of the Grid resources. The engine transparently moves data to the operators, and moves operators to the data (on the dozen trusted SciFlo nodes). SciFlo also deploys a variety of Data Grid services to: query datasets in space and time, locate & retrieve on-line data granules, provide on-the-fly variable and spatial subsetting, perform pairwise instrument matchups for A-Train datasets, and compute fused products. These services are combined into efficient workflows to assemble the desired large-scale, merged climate datasets. SciFlo is currently being applied in several large climate studies: comparisons of aerosol optical depth between MODIS, MISR, AERONET ground network, and U. Michigan's IMPACT aerosol transport model; characterization of long-term biases in microwave and infrared instruments (AIRS, MLS) by comparisons to GPS temperature retrievals accurate to 0.1 degrees Kelvin; and construction of a decade-long, multi-sensor water vapor climatology stratified by classified cloud scene by bringing together datasets from AIRS/AMSU, AMSR-E, MLS, MODIS, and CloudSat (NASA MEASUREs grant, Fetzer PI). The presentation will discuss the SciFlo technologies, their application in these distributed workflows, and the many challenges encountered in assembling and analyzing these massive datasets.
SMITH: a LIMS for handling next-generation sequencing workflows
2014-01-01
Background Life-science laboratories make increasing use of Next Generation Sequencing (NGS) for studying bio-macromolecules and their interactions. Array-based methods for measuring gene expression or protein-DNA interactions are being replaced by RNA-Seq and ChIP-Seq. Sequencing is generally performed by specialized facilities that have to keep track of sequencing requests, trace samples, ensure quality and make data available according to predefined privileges. An integrated tool helps to troubleshoot problems, to maintain a high quality standard, to reduce time and costs. Commercial and non-commercial tools called LIMS (Laboratory Information Management Systems) are available for this purpose. However, they often come at prohibitive cost and/or lack the flexibility and scalability needed to adjust seamlessly to the frequently changing protocols employed. In order to manage the flow of sequencing data produced at the Genomic Unit of the Italian Institute of Technology (IIT), we developed SMITH (Sequencing Machine Information Tracking and Handling). Methods SMITH is a web application with a MySQL server at the backend. Wet-lab scientists of the Centre for Genomic Science and database experts from the Politecnico of Milan in the context of a Genomic Data Model Project developed SMITH. The data base schema stores all the information of an NGS experiment, including the descriptions of all protocols and algorithms used in the process. Notably, an attribute-value table allows associating an unconstrained textual description to each sample and all the data produced afterwards. This method permits the creation of metadata that can be used to search the database for specific files as well as for statistical analyses. Results SMITH runs automatically and limits direct human interaction mainly to administrative tasks. SMITH data-delivery procedures were standardized making it easier for biologists and analysts to navigate the data. Automation also helps saving time. The workflows are available through an API provided by the workflow management system. The parameters and input data are passed to the workflow engine that performs de-multiplexing, quality control, alignments, etc. Conclusions SMITH standardizes, automates, and speeds up sequencing workflows. Annotation of data with key-value pairs facilitates meta-analysis. PMID:25471934
SMITH: a LIMS for handling next-generation sequencing workflows.
Venco, Francesco; Vaskin, Yuriy; Ceol, Arnaud; Muller, Heiko
2014-01-01
Life-science laboratories make increasing use of Next Generation Sequencing (NGS) for studying bio-macromolecules and their interactions. Array-based methods for measuring gene expression or protein-DNA interactions are being replaced by RNA-Seq and ChIP-Seq. Sequencing is generally performed by specialized facilities that have to keep track of sequencing requests, trace samples, ensure quality and make data available according to predefined privileges. An integrated tool helps to troubleshoot problems, to maintain a high quality standard, to reduce time and costs. Commercial and non-commercial tools called LIMS (Laboratory Information Management Systems) are available for this purpose. However, they often come at prohibitive cost and/or lack the flexibility and scalability needed to adjust seamlessly to the frequently changing protocols employed. In order to manage the flow of sequencing data produced at the Genomic Unit of the Italian Institute of Technology (IIT), we developed SMITH (Sequencing Machine Information Tracking and Handling). SMITH is a web application with a MySQL server at the backend. Wet-lab scientists of the Centre for Genomic Science and database experts from the Politecnico of Milan in the context of a Genomic Data Model Project developed SMITH. The data base schema stores all the information of an NGS experiment, including the descriptions of all protocols and algorithms used in the process. Notably, an attribute-value table allows associating an unconstrained textual description to each sample and all the data produced afterwards. This method permits the creation of metadata that can be used to search the database for specific files as well as for statistical analyses. SMITH runs automatically and limits direct human interaction mainly to administrative tasks. SMITH data-delivery procedures were standardized making it easier for biologists and analysts to navigate the data. Automation also helps saving time. The workflows are available through an API provided by the workflow management system. The parameters and input data are passed to the workflow engine that performs de-multiplexing, quality control, alignments, etc. SMITH standardizes, automates, and speeds up sequencing workflows. Annotation of data with key-value pairs facilitates meta-analysis.
Performance Studies on Distributed Virtual Screening
Krüger, Jens; de la Garza, Luis; Kohlbacher, Oliver; Nagel, Wolfgang E.
2014-01-01
Virtual high-throughput screening (vHTS) is an invaluable method in modern drug discovery. It permits screening large datasets or databases of chemical structures for those structures binding possibly to a drug target. Virtual screening is typically performed by docking code, which often runs sequentially. Processing of huge vHTS datasets can be parallelized by chunking the data because individual docking runs are independent of each other. The goal of this work is to find an optimal splitting maximizing the speedup while considering overhead and available cores on Distributed Computing Infrastructures (DCIs). We have conducted thorough performance studies accounting not only for the runtime of the docking itself, but also for structure preparation. Performance studies were conducted via the workflow-enabled science gateway MoSGrid (Molecular Simulation Grid). As input we used benchmark datasets for protein kinases. Our performance studies show that docking workflows can be made to scale almost linearly up to 500 concurrent processes distributed even over large DCIs, thus accelerating vHTS campaigns significantly. PMID:25032219
Overcoming the Challenges of Implementing a Multi-Mission Distributed Workflow System
NASA Technical Reports Server (NTRS)
Sayfi, Elias; Cheng, Cecilia; Lee, Hyun; Patel, Rajesh; Takagi, Atsuya; Yu, Dan
2009-01-01
A multi-mission approach to solving the same problems for various projects is enticing. However, the multi-mission approach leads to the need to develop a configurable, adaptable and distributed system to meet unique project requirements. That, in turn, leads to a set of challenges varying from handling synchronization issues to coming up with a smart design that allows the "unknowns" to be decided later. This paper discusses the challenges that the Multi-mission Automated Task Invocation Subsystem (MATIS) team has come up against while designing the distributed workflow system, as well as elaborates on the solutions that were implemented. The first is to design an easily adaptable system that requires no code changes as a result of configuration changes. The number of formal deliveries is often limited because each delivery costs time and money. Changes such as the sequence of programs being called, a change of a parameter value in the program that is being automated should not result in code changes or redelivery.
Fuhrer, Gregory J.; Cain, Daniel J.; McKenzie, Stuart W.; Rinella, Joseph F.; Crawford, J. Kent; Skach, Kenneth A.; Hornberger, Michelle I.; Gannett, Marshall W.
1999-01-01
The report describes the distribution of trace elements in sediment, water, and aquatic biota in the Yakima River basin, Washington. Trace elements were determined from streambed sediment, suspended sediment, filtered and unfiltered water samples, aquatic insects, clams, fish livers, and fish fillets between 1987 and 1991. The distribution of trace elements in these media was related to local geology and anthropogenic sources. Additionally, annual and instantaneous loads were estimated for trace elements associated with suspended sediment and trace elements in filtered water samples. Trace elements also were screened against U.S. Environmental Protection Agency guidelines established for the protection of human health and aquatic life.
NASA Astrophysics Data System (ADS)
Gustafsson, C.; Nordström, F.; Persson, E.; Brynolfsson, J.; Olsson, L. E.
2017-04-01
Dosimetric errors in a magnetic resonance imaging (MRI) only radiotherapy workflow may be caused by system specific geometric distortion from MRI. The aim of this study was to evaluate the impact on planned dose distribution and delineated structures for prostate patients, originating from this distortion. A method was developed, in which computer tomography (CT) images were distorted using the MRI distortion field. The displacement map for an optimized MRI treatment planning sequence was measured using a dedicated phantom in a 3 T MRI system. To simulate the distortion aspects of a synthetic CT (electron density derived from MR images), the displacement map was applied to CT images, referred to as distorted CT images. A volumetric modulated arc prostate treatment plan was applied to the original CT and the distorted CT, creating a reference and a distorted CT dose distribution. By applying the inverse of the displacement map to the distorted CT dose distribution, a dose distribution in the same geometry as the original CT images was created. For 10 prostate cancer patients, the dose difference between the reference dose distribution and inverse distorted CT dose distribution was analyzed in isodose level bins. The mean magnitude of the geometric distortion was 1.97 mm for the radial distance of 200-250 mm from isocenter. The mean percentage dose differences for all isodose level bins, were ⩽0.02% and the radiotherapy structure mean volume deviations were <0.2%. The method developed can quantify the dosimetric effects of MRI system specific distortion in a prostate MRI only radiotherapy workflow, separated from dosimetric effects originating from synthetic CT generation. No clinically relevant dose difference or structure deformation was found when 3D distortion correction and high acquisition bandwidth was used. The method could be used for any MRI sequence together with any anatomy of interest.
Gustafsson, C; Nordström, F; Persson, E; Brynolfsson, J; Olsson, L E
2017-04-21
Dosimetric errors in a magnetic resonance imaging (MRI) only radiotherapy workflow may be caused by system specific geometric distortion from MRI. The aim of this study was to evaluate the impact on planned dose distribution and delineated structures for prostate patients, originating from this distortion. A method was developed, in which computer tomography (CT) images were distorted using the MRI distortion field. The displacement map for an optimized MRI treatment planning sequence was measured using a dedicated phantom in a 3 T MRI system. To simulate the distortion aspects of a synthetic CT (electron density derived from MR images), the displacement map was applied to CT images, referred to as distorted CT images. A volumetric modulated arc prostate treatment plan was applied to the original CT and the distorted CT, creating a reference and a distorted CT dose distribution. By applying the inverse of the displacement map to the distorted CT dose distribution, a dose distribution in the same geometry as the original CT images was created. For 10 prostate cancer patients, the dose difference between the reference dose distribution and inverse distorted CT dose distribution was analyzed in isodose level bins. The mean magnitude of the geometric distortion was 1.97 mm for the radial distance of 200-250 mm from isocenter. The mean percentage dose differences for all isodose level bins, were ⩽0.02% and the radiotherapy structure mean volume deviations were <0.2%. The method developed can quantify the dosimetric effects of MRI system specific distortion in a prostate MRI only radiotherapy workflow, separated from dosimetric effects originating from synthetic CT generation. No clinically relevant dose difference or structure deformation was found when 3D distortion correction and high acquisition bandwidth was used. The method could be used for any MRI sequence together with any anatomy of interest.
Galaxy tools and workflows for sequence analysis with applications in molecular plant pathology.
Cock, Peter J A; Grüning, Björn A; Paszkiewicz, Konrad; Pritchard, Leighton
2013-01-01
The Galaxy Project offers the popular web browser-based platform Galaxy for running bioinformatics tools and constructing simple workflows. Here, we present a broad collection of additional Galaxy tools for large scale analysis of gene and protein sequences. The motivating research theme is the identification of specific genes of interest in a range of non-model organisms, and our central example is the identification and prediction of "effector" proteins produced by plant pathogens in order to manipulate their host plant. This functional annotation of a pathogen's predicted capacity for virulence is a key step in translating sequence data into potential applications in plant pathology. This collection includes novel tools, and widely-used third-party tools such as NCBI BLAST+ wrapped for use within Galaxy. Individual bioinformatics software tools are typically available separately as standalone packages, or in online browser-based form. The Galaxy framework enables the user to combine these and other tools to automate organism scale analyses as workflows, without demanding familiarity with command line tools and scripting. Workflows created using Galaxy can be saved and are reusable, so may be distributed within and between research groups, facilitating the construction of a set of standardised, reusable bioinformatic protocols. The Galaxy tools and workflows described in this manuscript are open source and freely available from the Galaxy Tool Shed (http://usegalaxy.org/toolshed or http://toolshed.g2.bx.psu.edu).
Enhanced reproducibility of SADI web service workflows with Galaxy and Docker.
Aranguren, Mikel Egaña; Wilkinson, Mark D
2015-01-01
Semantic Web technologies have been widely applied in the life sciences, for example by data providers such as OpenLifeData and through web services frameworks such as SADI. The recently reported OpenLifeData2SADI project offers access to the vast OpenLifeData data store through SADI services. This article describes how to merge data retrieved from OpenLifeData2SADI with other SADI services using the Galaxy bioinformatics analysis platform, thus making this semantic data more amenable to complex analyses. This is demonstrated using a working example, which is made distributable and reproducible through a Docker image that includes SADI tools, along with the data and workflows that constitute the demonstration. The combination of Galaxy and Docker offers a solution for faithfully reproducing and sharing complex data retrieval and analysis workflows based on the SADI Semantic web service design patterns.
Riposan, Adina; Taylor, Ian; Owens, David R; Rana, Omer; Conley, Edward C
2007-01-01
In this paper we present mechanisms for imaging and spectral data discovery, as applied to the early detection of pathologic mechanisms underlying diabetic retinopathy in research and clinical trial scenarios. We discuss the Alchemist framework, built using a generic peer-to-peer architecture, supporting distributed database queries and complex search algorithms based on workflow. The Alchemist is a domain-independent search mechanism that can be applied to search and data discovery scenarios in many areas. We illustrate Alchemist's ability to perform complex searches composed as a collection of peer-to-peer overlays, Grid-based services and workflows, e.g. applied to image and spectral data discovery, as applied to the early detection and prevention of retinal disease and investigational drug discovery. The Alchemist framework is built on top of decentralised technologies and uses industry standards such as Web services and SOAP for messaging.
NASA Astrophysics Data System (ADS)
Delventhal, D.; Schultz, D.; Diaz Velez, J. C.
2017-10-01
IceProd is a data processing and management framework developed by the IceCube Neutrino Observatory for processing of Monte Carlo simulations, detector data, and data driven analysis. It runs as a separate layer on top of grid and batch systems. This is accomplished by a set of daemons which process job workflow, maintaining configuration and status information on the job before, during, and after processing. IceProd can also manage complex workflow DAGs across distributed computing grids in order to optimize usage of resources. IceProd has recently been rewritten to increase its scaling capabilities, handle user analysis workflows together with simulation production, and facilitate the integration with 3rd party scheduling tools. IceProd 2, the second generation of IceProd, has been running in production for several months now. We share our experience setting up the system and things we’ve learned along the way.
VisTrails SAHM: visualization and workflow management for species habitat modeling
Morisette, Jeffrey T.; Jarnevich, Catherine S.; Holcombe, Tracy R.; Talbert, Colin B.; Ignizio, Drew A.; Talbert, Marian; Silva, Claudio; Koop, David; Swanson, Alan; Young, Nicholas E.
2013-01-01
The Software for Assisted Habitat Modeling (SAHM) has been created to both expedite habitat modeling and help maintain a record of the various input data, pre- and post-processing steps and modeling options incorporated in the construction of a species distribution model through the established workflow management and visualization VisTrails software. This paper provides an overview of the VisTrails:SAHM software including a link to the open source code, a table detailing the current SAHM modules, and a simple example modeling an invasive weed species in Rocky Mountain National Park, USA.
Towards PCC for Concurrent and Distributed Systems (Work in Progress)
NASA Technical Reports Server (NTRS)
Henriksen, Anders S.; Filinski, Andrzej
2009-01-01
We outline some conceptual challenges in extending the PCC paradigm to a concurrent and distributed setting, and sketch a generalized notion of module correctness based on viewing communication contracts as economic games. The model supports compositional reasoning about modular systems and is meant to apply not only to certification of executable code, but also of organizational workflows.
Thermal Remote Sensing with Uav-Based Workflows
NASA Astrophysics Data System (ADS)
Boesch, R.
2017-08-01
Climate change will have a significant influence on vegetation health and growth. Predictions of higher mean summer temperatures and prolonged summer draughts may pose a threat to agriculture areas and forest canopies. Rising canopy temperatures can be an indicator of plant stress because of the closure of stomata and a decrease in the transpiration rate. Thermal cameras are available for decades, but still often used for single image analysis, only in oblique view manner or with visual evaluations of video sequences. Therefore remote sensing using a thermal camera can be an important data source to understand transpiration processes. Photogrammetric workflows allow to process thermal images similar to RGB data. But low spatial resolution of thermal cameras, significant optical distortion and typically low contrast require an adapted workflow. Temperature distribution in forest canopies is typically completely unknown and less distinct than for urban or industrial areas, where metal constructions and surfaces yield high contrast and sharp edge information. The aim of this paper is to investigate the influence of interior camera orientation, tie point matching and ground control points on the resulting accuracy of bundle adjustment and dense cloud generation with a typically used photogrammetric workflow for UAVbased thermal imagery in natural environments.
An automated workflow for parallel processing of large multiview SPIM recordings
Schmied, Christopher; Steinbach, Peter; Pietzsch, Tobias; Preibisch, Stephan; Tomancak, Pavel
2016-01-01
Summary: Selective Plane Illumination Microscopy (SPIM) allows to image developing organisms in 3D at unprecedented temporal resolution over long periods of time. The resulting massive amounts of raw image data requires extensive processing interactively via dedicated graphical user interface (GUI) applications. The consecutive processing steps can be easily automated and the individual time points can be processed independently, which lends itself to trivial parallelization on a high performance computing (HPC) cluster. Here, we introduce an automated workflow for processing large multiview, multichannel, multiillumination time-lapse SPIM data on a single workstation or in parallel on a HPC cluster. The pipeline relies on snakemake to resolve dependencies among consecutive processing steps and can be easily adapted to any cluster environment for processing SPIM data in a fraction of the time required to collect it. Availability and implementation: The code is distributed free and open source under the MIT license http://opensource.org/licenses/MIT. The source code can be downloaded from github: https://github.com/mpicbg-scicomp/snakemake-workflows. Documentation can be found here: http://fiji.sc/Automated_workflow_for_parallel_Multiview_Reconstruction. Contact: schmied@mpi-cbg.de Supplementary information: Supplementary data are available at Bioinformatics online. PMID:26628585
An automated workflow for parallel processing of large multiview SPIM recordings.
Schmied, Christopher; Steinbach, Peter; Pietzsch, Tobias; Preibisch, Stephan; Tomancak, Pavel
2016-04-01
Selective Plane Illumination Microscopy (SPIM) allows to image developing organisms in 3D at unprecedented temporal resolution over long periods of time. The resulting massive amounts of raw image data requires extensive processing interactively via dedicated graphical user interface (GUI) applications. The consecutive processing steps can be easily automated and the individual time points can be processed independently, which lends itself to trivial parallelization on a high performance computing (HPC) cluster. Here, we introduce an automated workflow for processing large multiview, multichannel, multiillumination time-lapse SPIM data on a single workstation or in parallel on a HPC cluster. The pipeline relies on snakemake to resolve dependencies among consecutive processing steps and can be easily adapted to any cluster environment for processing SPIM data in a fraction of the time required to collect it. The code is distributed free and open source under the MIT license http://opensource.org/licenses/MIT The source code can be downloaded from github: https://github.com/mpicbg-scicomp/snakemake-workflows Documentation can be found here: http://fiji.sc/Automated_workflow_for_parallel_Multiview_Reconstruction : schmied@mpi-cbg.de Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.
Automatic Image Processing Workflow for the Keck/NIRC2 Vortex Coronagraph
NASA Astrophysics Data System (ADS)
Xuan, Wenhao; Cook, Therese; Ngo, Henry; Zawol, Zoe; Ruane, Garreth; Mawet, Dimitri
2018-01-01
The Keck/NIRC2 camera, equipped with the vortex coronagraph, is an instrument targeted at the high contrast imaging of extrasolar planets. To uncover a faint planet signal from the overwhelming starlight, we utilize the Vortex Image Processing (VIP) library, which carries out principal component analysis to model and remove the stellar point spread function. To bridge the gap between data acquisition and data reduction, we implement a workflow that 1) downloads, sorts, and processes data with VIP, 2) stores the analysis products into a database, and 3) displays the reduced images, contrast curves, and auxiliary information on a web interface. Both angular differential imaging and reference star differential imaging are implemented in the analysis module. A real-time version of the workflow runs during observations, allowing observers to make educated decisions about time distribution on different targets, hence optimizing science yield. The post-night version performs a standardized reduction after the observation, building up a valuable database that not only helps uncover new discoveries, but also enables a statistical study of the instrument itself. We present the workflow, and an examination of the contrast performance of the NIRC2 vortex with respect to factors including target star properties and observing conditions.
geoKepler Workflow Module for Computationally Scalable and Reproducible Geoprocessing and Modeling
NASA Astrophysics Data System (ADS)
Cowart, C.; Block, J.; Crawl, D.; Graham, J.; Gupta, A.; Nguyen, M.; de Callafon, R.; Smarr, L.; Altintas, I.
2015-12-01
The NSF-funded WIFIRE project has developed an open-source, online geospatial workflow platform for unifying geoprocessing tools and models for for fire and other geospatially dependent modeling applications. It is a product of WIFIRE's objective to build an end-to-end cyberinfrastructure for real-time and data-driven simulation, prediction and visualization of wildfire behavior. geoKepler includes a set of reusable GIS components, or actors, for the Kepler Scientific Workflow System (https://kepler-project.org). Actors exist for reading and writing GIS data in formats such as Shapefile, GeoJSON, KML, and using OGC web services such as WFS. The actors also allow for calling geoprocessing tools in other packages such as GDAL and GRASS. Kepler integrates functions from multiple platforms and file formats into one framework, thus enabling optimal GIS interoperability, model coupling, and scalability. Products of the GIS actors can be fed directly to models such as FARSITE and WRF. Kepler's ability to schedule and scale processes using Hadoop and Spark also makes geoprocessing ultimately extensible and computationally scalable. The reusable workflows in geoKepler can be made to run automatically when alerted by real-time environmental conditions. Here, we show breakthroughs in the speed of creating complex data for hazard assessments with this platform. We also demonstrate geoKepler workflows that use Data Assimilation to ingest real-time weather data into wildfire simulations, and for data mining techniques to gain insight into environmental conditions affecting fire behavior. Existing machine learning tools and libraries such as R and MLlib are being leveraged for this purpose in Kepler, as well as Kepler's Distributed Data Parallel (DDP) capability to provide a framework for scalable processing. geoKepler workflows can be executed via an iPython notebook as a part of a Jupyter hub at UC San Diego for sharing and reporting of the scientific analysis and results from various runs of geoKepler workflows. The communication between iPython and Kepler workflow executions is established through an iPython magic function for Kepler that we have implemented. In summary, geoKepler is an ecosystem that makes geospatial processing and analysis of any kind programmable, reusable, scalable and sharable.
Inda, Márcia A; van Batenburg, Marinus F; Roos, Marco; Belloum, Adam S Z; Vasunin, Dmitry; Wibisono, Adianto; van Kampen, Antoine H C; Breit, Timo M
2008-08-08
Chromosome location is often used as a scaffold to organize genomic information in both the living cell and molecular biological research. Thus, ever-increasing amounts of data about genomic features are stored in public databases and can be readily visualized by genome browsers. To perform in silico experimentation conveniently with this genomics data, biologists need tools to process and compare datasets routinely and explore the obtained results interactively. The complexity of such experimentation requires these tools to be based on an e-Science approach, hence generic, modular, and reusable. A virtual laboratory environment with workflows, workflow management systems, and Grid computation are therefore essential. Here we apply an e-Science approach to develop SigWin-detector, a workflow-based tool that can detect significantly enriched windows of (genomic) features in a (DNA) sequence in a fast and reproducible way. For proof-of-principle, we utilize a biological use case to detect regions of increased and decreased gene expression (RIDGEs and anti-RIDGEs) in human transcriptome maps. We improved the original method for RIDGE detection by replacing the costly step of estimation by random sampling with a faster analytical formula for computing the distribution of the null hypothesis being tested and by developing a new algorithm for computing moving medians. SigWin-detector was developed using the WS-VLAM workflow management system and consists of several reusable modules that are linked together in a basic workflow. The configuration of this basic workflow can be adapted to satisfy the requirements of the specific in silico experiment. As we show with the results from analyses in the biological use case on RIDGEs, SigWin-detector is an efficient and reusable Grid-based tool for discovering windows enriched for features of a particular type in any sequence of values. Thus, SigWin-detector provides the proof-of-principle for the modular e-Science based concept of integrative bioinformatics experimentation.
A Drupal-Based Collaborative Framework for Science Workflows
NASA Astrophysics Data System (ADS)
Pinheiro da Silva, P.; Gandara, A.
2010-12-01
Cyber-infrastructure is built from utilizing technical infrastructure to support organizational practices and social norms to provide support for scientific teams working together or dependent on each other to conduct scientific research. Such cyber-infrastructure enables the sharing of information and data so that scientists can leverage knowledge and expertise through automation. Scientific workflow systems have been used to build automated scientific systems used by scientists to conduct scientific research and, as a result, create artifacts in support of scientific discoveries. These complex systems are often developed by teams of scientists who are located in different places, e.g., scientists working in distinct buildings, and sometimes in different time zones, e.g., scientist working in distinct national laboratories. The sharing of these specifications is currently supported by the use of version control systems such as CVS or Subversion. Discussions about the design, improvement, and testing of these specifications, however, often happen elsewhere, e.g., through the exchange of email messages and IM chatting. Carrying on a discussion about these specifications is challenging because comments and specifications are not necessarily connected. For instance, the person reading a comment about a given workflow specification may not be able to see the workflow and even if the person can see the workflow, the person may not specifically know to which part of the workflow a given comments applies to. In this paper, we discuss the design, implementation and use of CI-Server, a Drupal-based infrastructure, to support the collaboration of both local and distributed teams of scientists using scientific workflows. CI-Server has three primary goals: to enable information sharing by providing tools that scientists can use within their scientific research to process data, publish and share artifacts; to build community by providing tools that support discussions between scientists about artifacts used or created through scientific processes; and to leverage the knowledge collected within the artifacts and scientific collaborations to support scientific discoveries.
Scalable and Interactive Segmentation and Visualization of Neural Processes in EM Datasets
Jeong, Won-Ki; Beyer, Johanna; Hadwiger, Markus; Vazquez, Amelio; Pfister, Hanspeter; Whitaker, Ross T.
2011-01-01
Recent advances in scanning technology provide high resolution EM (Electron Microscopy) datasets that allow neuroscientists to reconstruct complex neural connections in a nervous system. However, due to the enormous size and complexity of the resulting data, segmentation and visualization of neural processes in EM data is usually a difficult and very time-consuming task. In this paper, we present NeuroTrace, a novel EM volume segmentation and visualization system that consists of two parts: a semi-automatic multiphase level set segmentation with 3D tracking for reconstruction of neural processes, and a specialized volume rendering approach for visualization of EM volumes. It employs view-dependent on-demand filtering and evaluation of a local histogram edge metric, as well as on-the-fly interpolation and ray-casting of implicit surfaces for segmented neural structures. Both methods are implemented on the GPU for interactive performance. NeuroTrace is designed to be scalable to large datasets and data-parallel hardware architectures. A comparison of NeuroTrace with a commonly used manual EM segmentation tool shows that our interactive workflow is faster and easier to use for the reconstruction of complex neural processes. PMID:19834227
Evaluation of Standardization of Transfer of Accountability between Inpatient Pharmacists.
Tsoi, Vivian; Dewhurst, Norman; Tom, Elaine
2018-01-01
A compelling body of evidence supports the notion that transfer of accountability (TOA) improves communication, continuity of care, and patient safety. TOA involves the transmission and receipt of information between clinicians at each transition of care. Without a notification system alerting pharmacists to patient transfers, pharmacists' ability to seek out and complete TOA may be hindered. A standardized policy and process for TOA, with automated workflow, was implemented at the study hospital in 2015, to ensure consistency and timeliness of documentation by pharmacists. To evaluate pharmacists' adherence to and satisfaction with the TOA policy and process. A retrospective audit was conducted, using a random sample of individuals who were inpatients between June 2014 and February 2016. Transition points for TOA were identified, and the computerized pharmacy system was reviewed to determine whether TOA had been documented at each transition point. After the audit, an online survey was distributed to assess pharmacists' response to and satisfaction with the TOA policy and workflow. Before the TOA workflow was implemented, TOA documentation by pharmacists ranged from 11% (10/93) to 43% (48/111) of transitions. Eight months after implementation of the workflow, the rate of TOA documentation was 87% (68/78), exceeding the institution's target of 70%. Of the 32 pharmacists surveyed, most were satisfied with the TOA policy and agreed that the standardized workflow was simple to use, increased the number of TOAs provided and received, and improved the quality of completed TOAs. Respondents also indicated that the TOA workflow had improved patient care (mean score 4.09/5, standard deviation 0.64). The standardized TOA policy and process were well received by pharmacists, and resulted in consistent TOA documentation and a TOA documentation rate that exceeded the institutional target.
Whole genome sequencing: an efficient approach to ensuring food safety
NASA Astrophysics Data System (ADS)
Lakicevic, B.; Nastasijevic, I.; Dimitrijevic, M.
2017-09-01
Whole genome sequencing is an effective, powerful tool that can be applied to a wide range of public health and food safety applications. A major difference between WGS and the traditional typing techniques is that WGS allows all genes to be included in the analysis, instead of a well-defined subset of genes or variable intergenic regions. Also, the use of WGS can facilitate the understanding of contamination/colonization routes of foodborne pathogens within the food production environment, and can also afford efficient tracking of pathogens’ entry routes and distribution from farm-to-consumer. Tracking foodborne pathogens in the food processing-distribution-retail-consumer continuum is of the utmost importance for facilitation of outbreak investigations and rapid action in controlling/preventing foodborne outbreaks. Therefore, WGS likely will replace most of the numerous workflows used in public health laboratories to characterize foodborne pathogens into one consolidated, efficient workflow.
Provenance for Runtime Workflow Steering and Validation in Computational Seismology
NASA Astrophysics Data System (ADS)
Spinuso, A.; Krischer, L.; Krause, A.; Filgueira, R.; Magnoni, F.; Muraleedharan, V.; David, M.
2014-12-01
Provenance systems may be offered by modern workflow engines to collect metadata about the data transformations at runtime. If combined with effective visualisation and monitoring interfaces, these provenance recordings can speed up the validation process of an experiment, suggesting interactive or automated interventions with immediate effects on the lifecycle of a workflow run. For instance, in the field of computational seismology, if we consider research applications performing long lasting cross correlation analysis and high resolution simulations, the immediate notification of logical errors and the rapid access to intermediate results, can produce reactions which foster a more efficient progress of the research. These applications are often executed in secured and sophisticated HPC and HTC infrastructures, highlighting the need for a comprehensive framework that facilitates the extraction of fine grained provenance and the development of provenance aware components, leveraging the scalability characteristics of the adopted workflow engines, whose enactment can be mapped to different technologies (MPI, Storm clusters, etc). This work looks at the adoption of W3C-PROV concepts and data model within a user driven processing and validation framework for seismic data, supporting also computational and data management steering. Validation needs to balance automation with user intervention, considering the scientist as part of the archiving process. Therefore, the provenance data is enriched with community-specific metadata vocabularies and control messages, making an experiment reproducible and its description consistent with the community understandings. Moreover, it can contain user defined terms and annotations. The current implementation of the system is supported by the EU-Funded VERCE (http://verce.eu). It provides, as well as the provenance generation mechanisms, a prototypal browser-based user interface and a web API built on top of a NoSQL storage technology, experimenting ways to ensure a rapid and flexible access to the lineage traces. It supports the users with the visualisation of graphical products and offers combined operations to access and download the data which may be selectively stored at runtime, into dedicated data archives.
Gough, Albert; Shun, Tongying; Taylor, D. Lansing; Schurdak, Mark
2016-01-01
Heterogeneity is well recognized as a common property of cellular systems that impacts biomedical research and the development of therapeutics and diagnostics. Several studies have shown that analysis of heterogeneity: gives insight into mechanisms of action of perturbagens; can be used to predict optimal combination therapies; and to quantify heterogeneity in tumors where heterogeneity is believed to be associated with adaptation and resistance. Cytometry methods including high content screening (HCS), high throughput microscopy, flow cytometry, mass spec imaging and digital pathology capture cell level data for populations of cells. However it is often assumed that the population response is normally distributed and therefore that the average adequately describes the results. A deeper understanding of the results of the measurements and more effective comparison of perturbagen effects requires analysis that takes into account the distribution of the measurements, i.e. the heterogeneity. However, the reproducibility of heterogeneous data collected on different days, and in different plates/slides has not previously been evaluated. Here we show that conventional assay quality metrics alone are not adequate for quality control of the heterogeneity in the data. To address this need, we demonstrate the use of the Kolmogorov-Smirnov statistic as a metric for monitoring the reproducibility of heterogeneity in an SAR screen, describe a workflow for quality control in heterogeneity analysis. One major challenge in high throughput biology is the evaluation and interpretation of heterogeneity in thousands of samples, such as compounds in a cell-based screen. In this study we also demonstrate that three heterogeneity indices previously reported, capture the shapes of the distributions and provide a means to filter and browse big data sets of cellular distributions in order to compare and identify distributions of interest. These metrics and methods are presented as a workflow for analysis of heterogeneity in large scale biology projects. PMID:26476369
NASA Astrophysics Data System (ADS)
Okaya, D.; Deelman, E.; Maechling, P.; Wong-Barnum, M.; Jordan, T. H.; Meyers, D.
2007-12-01
Large scientific collaborations, such as the SCEC Petascale Cyberfacility for Physics-based Seismic Hazard Analysis (PetaSHA) Project, involve interactions between many scientists who exchange ideas and research results. These groups must organize, manage, and make accessible their community materials of observational data, derivative (research) results, computational products, and community software. The integration of scientific workflows as a paradigm to solve complex computations provides advantages of efficiency, reliability, repeatability, choices, and ease of use. The underlying resource needed for a scientific workflow to function and create discoverable and exchangeable products is the construction, tracking, and preservation of metadata. In the scientific workflow environment there is a two-tier structure of metadata. Workflow-level metadata and provenance describe operational steps, identity of resources, execution status, and product locations and names. Domain-level metadata essentially define the scientific meaning of data, codes and products. To a large degree the metadata at these two levels are separate. However, between these two levels is a subset of metadata produced at one level but is needed by the other. This crossover metadata suggests that some commonality in metadata handling is needed. SCEC researchers are collaborating with computer scientists at SDSC, the USC Information Sciences Institute, and Carnegie Mellon Univ. in order to perform earthquake science using high-performance computational resources. A primary objective of the "PetaSHA" collaboration is to perform physics-based estimations of strong ground motion associated with real and hypothetical earthquakes located within Southern California. Construction of 3D earth models, earthquake representations, and numerical simulation of seismic waves are key components of these estimations. Scientific workflows are used to orchestrate the sequences of scientific tasks and to access distributed computational facilities such as the NSF TeraGrid. Different types of metadata are produced and captured within the scientific workflows. One workflow within PetaSHA ("Earthworks") performs a linear sequence of tasks with workflow and seismological metadata preserved. Downstream scientific codes ingest these metadata produced by upstream codes. The seismological metadata uses attribute-value pairing in plain text; an identified need is to use more advanced handling methods. Another workflow system within PetaSHA ("Cybershake") involves several complex workflows in order to perform statistical analysis of ground shaking due to thousands of hypothetical but plausible earthquakes. Metadata management has been challenging due to its construction around a number of legacy scientific codes. We describe difficulties arising in the scientific workflow due to the lack of this metadata and suggest corrective steps, which in some cases include the cultural shift of domain science programmers coding for metadata.
ERIC Educational Resources Information Center
Pardos, Zachary A.; Whyte, Anthony; Kao, Kevin
2016-01-01
In this paper, we address issues of transparency, modularity, and privacy with the introduction of an open source, web-based data repository and analysis tool tailored to the Massive Open Online Course community. The tool integrates data request/authorization and distribution workflow features as well as provides a simple analytics module upload…
Galaxy tools and workflows for sequence analysis with applications in molecular plant pathology
Grüning, Björn A.; Paszkiewicz, Konrad; Pritchard, Leighton
2013-01-01
The Galaxy Project offers the popular web browser-based platform Galaxy for running bioinformatics tools and constructing simple workflows. Here, we present a broad collection of additional Galaxy tools for large scale analysis of gene and protein sequences. The motivating research theme is the identification of specific genes of interest in a range of non-model organisms, and our central example is the identification and prediction of “effector” proteins produced by plant pathogens in order to manipulate their host plant. This functional annotation of a pathogen’s predicted capacity for virulence is a key step in translating sequence data into potential applications in plant pathology. This collection includes novel tools, and widely-used third-party tools such as NCBI BLAST+ wrapped for use within Galaxy. Individual bioinformatics software tools are typically available separately as standalone packages, or in online browser-based form. The Galaxy framework enables the user to combine these and other tools to automate organism scale analyses as workflows, without demanding familiarity with command line tools and scripting. Workflows created using Galaxy can be saved and are reusable, so may be distributed within and between research groups, facilitating the construction of a set of standardised, reusable bioinformatic protocols. The Galaxy tools and workflows described in this manuscript are open source and freely available from the Galaxy Tool Shed (http://usegalaxy.org/toolshed or http://toolshed.g2.bx.psu.edu). PMID:24109552
A Scientific Data Provenance Harvester for Distributed Applications
DOE Office of Scientific and Technical Information (OSTI.GOV)
Stephan, Eric G.; Raju, Bibi; Elsethagen, Todd O.
Data provenance provides a way for scientists to observe how experimental data originates, conveys process history, and explains influential factors such as experimental rationale and associated environmental factors from system metrics measured at runtime. The US Department of Energy Office of Science Integrated end-to-end Performance Prediction and Diagnosis for Extreme Scientific Workflows (IPPD) project has developed a provenance harvester that is capable of collecting observations from file based evidence typically produced by distributed applications. To achieve this, file based evidence is extracted and transformed into an intermediate data format inspired in part by W3C CSV on the Web recommendations, calledmore » the Harvester Provenance Application Interface (HAPI) syntax. This syntax provides a general means to pre-stage provenance into messages that are both human readable and capable of being written to a provenance store, Provenance Environment (ProvEn). HAPI is being applied to harvest provenance from climate ensemble runs for Accelerated Climate Modeling for Energy (ACME) project funded under the U.S. Department of Energy’s Office of Biological and Environmental Research (BER) Earth System Modeling (ESM) program. ACME informally provides provenance in a native form through configuration files, directory structures, and log files that contain success/failure indicators, code traces, and performance measurements. Because of its generic format, HAPI is also being applied to harvest tabular job management provenance from Belle II DIRAC scheduler relational database tables as well as other scientific applications that log provenance related information.« less
Documet, Jorge; Liu, Brent J; Documet, Luis; Huang, H K
2006-07-01
This paper describes a picture archiving and communication system (PACS) tool based on Web technology that remotely manages medical images between a PACS archive and remote destinations. Successfully implemented in a clinical environment and also demonstrated for the past 3 years at the conferences of various organizations, including the Radiological Society of North America, this tool provides a very practical and simple way to manage a PACS, including off-site image distribution and disaster recovery. The application is robust and flexible and can be used on a standard PC workstation or a Tablet PC, but more important, it can be used with a personal digital assistant (PDA). With a PDA, the Web application becomes a powerful wireless and mobile image management tool. The application's quick and easy-to-use features allow users to perform Digital Imaging and Communications in Medicine (DICOM) queries and retrievals with a single interface, without having to worry about the underlying configuration of DICOM nodes. In addition, this frees up dedicated PACS workstations to perform their specialized roles within the PACS workflow. This tool has been used at Saint John's Health Center in Santa Monica, California, for 2 years. The average number of queries per month is 2,021, with 816 C-MOVE retrieve requests. Clinical staff members can use PDAs to manage image workflow and PACS examination distribution conveniently for off-site consultations by referring physicians and radiologists and for disaster recovery. This solution also improves radiologists' effectiveness and efficiency in health care delivery both within radiology departments and for off-site clinical coverage.
NASA Technical Reports Server (NTRS)
Podwysocki, M. H.
1974-01-01
Two study areas in a cratonic platform underlain by flat-lying sedimentary rocks were analyzed to determine if a quantitative relationship exists between fracture trace patterns and their frequency distributions and subsurface structural closures which might contain petroleum. Fracture trace lengths and frequency (number of fracture traces per unit area) were analyzed by trend surface analysis and length frequency distributions also were compared to a standard Gaussian distribution. Composite rose diagrams of fracture traces were analyzed using a multivariate analysis method which grouped or clustered the rose diagrams and their respective areas on the basis of the behavior of the rays of the rose diagram. Analysis indicates that the lengths of fracture traces are log-normally distributed according to the mapping technique used. Fracture trace frequency appeared higher on the flanks of active structures and lower around passive reef structures. Fracture trace log-mean lengths were shorter over several types of structures, perhaps due to increased fracturing and subsequent erosion. Analysis of rose diagrams using a multivariate technique indicated lithology as the primary control for the lower grouping levels. Groupings at higher levels indicated that areas overlying active structures may be isolated from their neighbors by this technique while passive structures showed no differences which could be isolated.
A SPECT system simulator built on the SolidWorks TM 3D-Design package.
Li, Xin; Furenlid, Lars R
2014-08-17
We have developed a GPU-accelerated SPECT system simulator that integrates into instrument-design workflow [1]. This simulator includes a gamma-ray tracing module that can rapidly propagate gamma-ray photons through arbitrary apertures modeled by SolidWorks TM -created stereolithography (.STL) representations with a full complement of physics cross sections [2, 3]. This software also contains a scintillation detector simulation module that can model a scintillation detector with arbitrary scintillation crystal shape and light-sensor arrangement. The gamma-ray tracing module enables us to efficiently model aperture and detector crystals in SolidWorks TM and save them as STL file format, then load the STL-format model into this module to generate list-mode results of interacted gamma-ray photon information (interaction positions and energies) inside the detector crystals. The Monte-Carlo scintillation detector simulation module enables us to simulate how scintillation photons get reflected, refracted and absorbed inside a scintillation detector, which contributes to more accurate simulation of a SPECT system.
Surinova, Silvia; Hüttenhain, Ruth; Chang, Ching-Yun; Espona, Lucia; Vitek, Olga; Aebersold, Ruedi
2013-08-01
Targeted proteomics based on selected reaction monitoring (SRM) mass spectrometry is commonly used for accurate and reproducible quantification of protein analytes in complex biological mixtures. Strictly hypothesis-driven, SRM assays quantify each targeted protein by collecting measurements on its peptide fragment ions, called transitions. To achieve sensitive and accurate quantitative results, experimental design and data analysis must consistently account for the variability of the quantified transitions. This consistency is especially important in large experiments, which increasingly require profiling up to hundreds of proteins over hundreds of samples. Here we describe a robust and automated workflow for the analysis of large quantitative SRM data sets that integrates data processing, statistical protein identification and quantification, and dissemination of the results. The integrated workflow combines three software tools: mProphet for peptide identification via probabilistic scoring; SRMstats for protein significance analysis with linear mixed-effect models; and PASSEL, a public repository for storage, retrieval and query of SRM data. The input requirements for the protocol are files with SRM traces in mzXML format, and a file with a list of transitions in a text tab-separated format. The protocol is especially suited for data with heavy isotope-labeled peptide internal standards. We demonstrate the protocol on a clinical data set in which the abundances of 35 biomarker candidates were profiled in 83 blood plasma samples of subjects with ovarian cancer or benign ovarian tumors. The time frame to realize the protocol is 1-2 weeks, depending on the number of replicates used in the experiment.
A Semi-Automated Workflow Solution for Data Set Publication
Vannan, Suresh; Beaty, Tammy W.; Cook, Robert B.; ...
2016-03-08
In order to address the need for published data, considerable effort has gone into formalizing the process of data publication. From funding agencies to publishers, data publication has rapidly become a requirement. Digital Object Identifiers (DOI) and data citations have enhanced the integration and availability of data. The challenge facing data publishers now is to deal with the increased number of publishable data products and most importantly the difficulties of publishing diverse data products into an online archive. The Oak Ridge National Laboratory Distributed Active Archive Center (ORNL DAAC), a NASA-funded data center, faces these challenges as it deals withmore » data products created by individual investigators. This paper summarizes the challenges of curating data and provides a summary of a workflow solution that ORNL DAAC researcher and technical staffs have created to deal with publication of the diverse data products. Finally, the workflow solution presented here is generic and can be applied to data from any scientific domain and data located at any data center.« less
CASAS: A tool for composing automatically and semantically astrophysical services
NASA Astrophysics Data System (ADS)
Louge, T.; Karray, M. H.; Archimède, B.; Knödlseder, J.
2017-07-01
Multiple astronomical datasets are available through internet and the astrophysical Distributed Computing Infrastructure (DCI) called Virtual Observatory (VO). Some scientific workflow technologies exist for retrieving and combining data from those sources. However selection of relevant services, automation of the workflows composition and the lack of user-friendly platforms remain a concern. This paper presents CASAS, a tool for semantic web services composition in astrophysics. This tool proposes automatic composition of astrophysical web services and brings a semantics-based, automatic composition of workflows. It widens the services choice and eases the use of heterogeneous services. Semantic web services composition relies on ontologies for elaborating the services composition; this work is based on Astrophysical Services ONtology (ASON). ASON had its structure mostly inherited from the VO services capacities. Nevertheless, our approach is not limited to the VO and brings VO plus non-VO services together without the need for premade recipes. CASAS is available for use through a simple web interface.
A Semi-Automated Workflow Solution for Data Set Publication
DOE Office of Scientific and Technical Information (OSTI.GOV)
Vannan, Suresh; Beaty, Tammy W.; Cook, Robert B.
In order to address the need for published data, considerable effort has gone into formalizing the process of data publication. From funding agencies to publishers, data publication has rapidly become a requirement. Digital Object Identifiers (DOI) and data citations have enhanced the integration and availability of data. The challenge facing data publishers now is to deal with the increased number of publishable data products and most importantly the difficulties of publishing diverse data products into an online archive. The Oak Ridge National Laboratory Distributed Active Archive Center (ORNL DAAC), a NASA-funded data center, faces these challenges as it deals withmore » data products created by individual investigators. This paper summarizes the challenges of curating data and provides a summary of a workflow solution that ORNL DAAC researcher and technical staffs have created to deal with publication of the diverse data products. Finally, the workflow solution presented here is generic and can be applied to data from any scientific domain and data located at any data center.« less
2011-02-01
Process Architecture Technology Analysis: Executive .............................................. 15 UIMA as Executive...44 A.4: Flow Code in UIMA ......................................................................................................... 46... UIMA ................................................................................................................................ 57 E.2
Neuroimaging Study Designs, Computational Analyses and Data Provenance Using the LONI Pipeline
Dinov, Ivo; Lozev, Kamen; Petrosyan, Petros; Liu, Zhizhong; Eggert, Paul; Pierce, Jonathan; Zamanyan, Alen; Chakrapani, Shruthi; Van Horn, John; Parker, D. Stott; Magsipoc, Rico; Leung, Kelvin; Gutman, Boris; Woods, Roger; Toga, Arthur
2010-01-01
Modern computational neuroscience employs diverse software tools and multidisciplinary expertise to analyze heterogeneous brain data. The classical problems of gathering meaningful data, fitting specific models, and discovering appropriate analysis and visualization tools give way to a new class of computational challenges—management of large and incongruous data, integration and interoperability of computational resources, and data provenance. We designed, implemented and validated a new paradigm for addressing these challenges in the neuroimaging field. Our solution is based on the LONI Pipeline environment [3], [4], a graphical workflow environment for constructing and executing complex data processing protocols. We developed study-design, database and visual language programming functionalities within the LONI Pipeline that enable the construction of complete, elaborate and robust graphical workflows for analyzing neuroimaging and other data. These workflows facilitate open sharing and communication of data and metadata, concrete processing protocols, result validation, and study replication among different investigators and research groups. The LONI Pipeline features include distributed grid-enabled infrastructure, virtualized execution environment, efficient integration, data provenance, validation and distribution of new computational tools, automated data format conversion, and an intuitive graphical user interface. We demonstrate the new LONI Pipeline features using large scale neuroimaging studies based on data from the International Consortium for Brain Mapping [5] and the Alzheimer's Disease Neuroimaging Initiative [6]. User guides, forums, instructions and downloads of the LONI Pipeline environment are available at http://pipeline.loni.ucla.edu. PMID:20927408
A workflow to investigate exposure and pharmacokinetic ...
Adverse outcome pathways (AOP) link known population outcomes to a molecular initiating event (MIE) that can be quantified using high-throughput in vitro methods. Practical application of AOPs in chemical-specific risk assessment requires consideration of exposure and absorption, distribution, metabolism, excretion (ADME) properties of chemicals. We developed a conceptual workflow to consider exposure and ADME properties in relationship to an MIE and demonstrated the utility of this workflow using a previously established AOP, acetylcholinesterase (AChE) inhibition. Thirty active chemicals found to inhibit AChE in the ToxCastTM assay were examined with respect to their exposure and absorption potentials, and their ability to cross the blood-brain barrier. Structural similarities of active compounds were compared against structures of inactive compounds to detect possible non-active parents that might have active metabolites. Fifty-two of the 1,029 inactive compounds exhibited a similarity threshold above 75% with their nearest active neighbors. Excluding compounds that may not be absorbed, 22 could be potentially toxic following metabolism. The incorporation of exposure and ADME properties into the conceptual workflow resulted in prioritization of 20 out of 30 active compounds identified in an AChE inhibition assay for further analysis, along with identification of several inactive parent compounds of active metabolites. This qualitative approach can minimize co
The BioExtract Server: a web-based bioinformatic workflow platform
Lushbough, Carol M.; Jennewein, Douglas M.; Brendel, Volker P.
2011-01-01
The BioExtract Server (bioextract.org) is an open, web-based system designed to aid researchers in the analysis of genomic data by providing a platform for the creation of bioinformatic workflows. Scientific workflows are created within the system by recording tasks performed by the user. These tasks may include querying multiple, distributed data sources, saving query results as searchable data extracts, and executing local and web-accessible analytic tools. The series of recorded tasks can then be saved as a reproducible, sharable workflow available for subsequent execution with the original or modified inputs and parameter settings. Integrated data resources include interfaces to the National Center for Biotechnology Information (NCBI) nucleotide and protein databases, the European Molecular Biology Laboratory (EMBL-Bank) non-redundant nucleotide database, the Universal Protein Resource (UniProt), and the UniProt Reference Clusters (UniRef) database. The system offers access to numerous preinstalled, curated analytic tools and also provides researchers with the option of selecting computational tools from a large list of web services including the European Molecular Biology Open Software Suite (EMBOSS), BioMoby, and the Kyoto Encyclopedia of Genes and Genomes (KEGG). The system further allows users to integrate local command line tools residing on their own computers through a client-side Java applet. PMID:21546552
Multi-level meta-workflows: new concept for regularly occurring tasks in quantum chemistry.
Arshad, Junaid; Hoffmann, Alexander; Gesing, Sandra; Grunzke, Richard; Krüger, Jens; Kiss, Tamas; Herres-Pawlis, Sonja; Terstyanszky, Gabor
2016-01-01
In Quantum Chemistry, many tasks are reoccurring frequently, e.g. geometry optimizations, benchmarking series etc. Here, workflows can help to reduce the time of manual job definition and output extraction. These workflows are executed on computing infrastructures and may require large computing and data resources. Scientific workflows hide these infrastructures and the resources needed to run them. It requires significant efforts and specific expertise to design, implement and test these workflows. Many of these workflows are complex and monolithic entities that can be used for particular scientific experiments. Hence, their modification is not straightforward and it makes almost impossible to share them. To address these issues we propose developing atomic workflows and embedding them in meta-workflows. Atomic workflows deliver a well-defined research domain specific function. Publishing workflows in repositories enables workflow sharing inside and/or among scientific communities. We formally specify atomic and meta-workflows in order to define data structures to be used in repositories for uploading and sharing them. Additionally, we present a formal description focused at orchestration of atomic workflows into meta-workflows. We investigated the operations that represent basic functionalities in Quantum Chemistry, developed the relevant atomic workflows and combined them into meta-workflows. Having these workflows we defined the structure of the Quantum Chemistry workflow library and uploaded these workflows in the SHIWA Workflow Repository.Graphical AbstractMeta-workflows and embedded workflows in the template representation.
Potential of knowledge discovery using workflows implemented in the C3Grid
NASA Astrophysics Data System (ADS)
Engel, Thomas; Fink, Andreas; Ulbrich, Uwe; Schartner, Thomas; Dobler, Andreas; Fritzsch, Bernadette; Hiller, Wolfgang; Bräuer, Benny
2013-04-01
With the increasing number of climate simulations, reanalyses and observations, new infrastructures to search and analyse distributed data are necessary. In recent years, the Grid architecture became an important technology to fulfill these demands. For the German project "Collaborative Climate Community Data and Processing Grid" (C3Grid) computer scientists and meteorologists developed a system that offers its users a webinterface to search and download climate data and use implemented analysis tools (called workflows) to further investigate them. In this contribution, two workflows that are implemented in the C3Grid architecture are presented: the Cyclone Tracking (CT) and Stormtrack workflow. They shall serve as an example on how to perform numerous investigations on midlatitude winterstorms on a large amount of analysis and climate model data without having an insight into the data source, program code and a low-to-moderate understanding of the theortical background. CT is based on the work of Murray and Simmonds (1991) to identify and track local minima in the mean sea level pressure (MSLP) field of the selected dataset. Adjustable thresholds for the curvature of the isobars as well as the minimum lifetime of a cyclone allow the distinction of weak subtropical heat low systems and stronger midlatitude cyclones e.g. in the Northern Atlantic. The user gets the resulting track data including statistics about the track density, average central pressure, average central curvature, cyclogenesis and cyclolysis as well as pre-built visualizations of these results. Stormtrack calculates the 2.5-6 day bandpassfiltered standard deviation of the geopotential height on a selected pressure level. Although this workflow needs much less computational effort compared to CT it shows structures that are in good agreement with the track density of the CT workflow. To what extent changes in the mid-level tropospheric storm track are reflected in trough density and intensity alteration of surface cyclones. A specific feature of C3Grid is the flexible Workflow Scheduling Service (WSS) which also allows for automated nightly analysis runs of CT, Stormtrack, etc. with different input parameter sets. The statistical results of these workflows can be accumulated afterwards by a scheduled final analysis step, thereby providing a tool for data intensive analytics for the massive amounts of climate model data accessible through C3Grid. First tests with these automated analysis workflows show promising results to speed up the investigation of high volume modeling data. This example is relevant to the thorough analysis of future changes in storminess in Europe and is just one example of the potential of knowledge discovery using automated workflows implemented in the C3Grid architecture.
NASA Astrophysics Data System (ADS)
Smith, J. P.; Muller, A. C.
2013-05-01
Predicting the fate and distribution of anthropogenic-sourced trace metals in riverine and estuarine systems is challenging due to multiple and varying source functions and dynamic physiochemical conditions. Between July 2011 and November 2012, sediment and water column samples were collected from over 20 sites in the tidal-fresh Potomac River estuary, Washington, DC near the outfall of the Blue Plains Advanced Wastewater Treatment Plant (BPWTP) for measurement of select trace metals. Field observations of water column parameters (conductivity, temperature, pH, turbidity) were also made at each sampling site. Trace metal concentrations were normalized to the "background" composition of the river determined from control sites in order to investigate the distribution BPWTP-sourced in local Potomac River receiving waters. Temporal differences in the observed distribution of trace metals were attributed to changes in the relative contribution of metals from different sources (wastewater, riverine, other) coupled with differences in the physiochemical conditions of the water column. Results show that normalizing near-source concentrations to the background composition of the water body and also to key environmental parameters can aid in predicting the fate and distribution of anthropogenic-sourced trace metals in dynamic riverine and estuarine systems like the tidal-fresh Potomac River.
High-Performance Compute Infrastructure in Astronomy: 2020 Is Only Months Away
NASA Astrophysics Data System (ADS)
Berriman, B.; Deelman, E.; Juve, G.; Rynge, M.; Vöckler, J. S.
2012-09-01
By 2020, astronomy will be awash with as much as 60 PB of public data. Full scientific exploitation of such massive volumes of data will require high-performance computing on server farms co-located with the data. Development of this computing model will be a community-wide enterprise that has profound cultural and technical implications. Astronomers must be prepared to develop environment-agnostic applications that support parallel processing. The community must investigate the applicability and cost-benefit of emerging technologies such as cloud computing to astronomy, and must engage the Computer Science community to develop science-driven cyberinfrastructure such as workflow schedulers and optimizers. We report here the results of collaborations between a science center, IPAC, and a Computer Science research institute, ISI. These collaborations may be considered pathfinders in developing a high-performance compute infrastructure in astronomy. These collaborations investigated two exemplar large-scale science-driver workflow applications: 1) Calculation of an infrared atlas of the Galactic Plane at 18 different wavelengths by placing data from multiple surveys on a common plate scale and co-registering all the pixels; 2) Calculation of an atlas of periodicities present in the public Kepler data sets, which currently contain 380,000 light curves. These products have been generated with two workflow applications, written in C for performance and designed to support parallel processing on multiple environments and platforms, but with different compute resource needs: the Montage image mosaic engine is I/O-bound, and the NASA Star and Exoplanet Database periodogram code is CPU-bound. Our presentation will report cost and performance metrics and lessons-learned for continuing development. Applicability of Cloud Computing: Commercial Cloud providers generally charge for all operations, including processing, transfer of input and output data, and for storage of data, and so the costs of running applications vary widely according to how they use resources. The cloud is well suited to processing CPU-bound (and memory bound) workflows such as the periodogram code, given the relatively low cost of processing in comparison with I/O operations. I/O-bound applications such as Montage perform best on high-performance clusters with fast networks and parallel file-systems. Science-driven Cyberinfrastructure: Montage has been widely used as a driver application to develop workflow management services, such as task scheduling in distributed environments, designing fault tolerance techniques for job schedulers, and developing workflow orchestration techniques. Running Parallel Applications Across Distributed Cloud Environments: Data processing will eventually take place in parallel distributed across cyber infrastructure environments having different architectures. We have used the Pegasus Work Management System (WMS) to successfully run applications across three very different environments: TeraGrid, OSG (Open Science Grid), and FutureGrid. Provisioning resources across different grids and clouds (also referred to as Sky Computing), involves establishing a distributed environment, where issues of, e.g, remote job submission, data management, and security need to be addressed. This environment also requires building virtual machine images that can run in different environments. Usually, each cloud provides basic images that can be customized with additional software and services. In most of our work, we provisioned compute resources using a custom application, called Wrangler. Pegasus WMS abstracts the architectures of the compute environments away from the end-user, and can be considered a first-generation tool suitable for scientists to run their applications on disparate environments.
Making Sense of Complexity with FRE, a Scientific Workflow System for Climate Modeling (Invited)
NASA Astrophysics Data System (ADS)
Langenhorst, A. R.; Balaji, V.; Yakovlev, A.
2010-12-01
A workflow is a description of a sequence of activities that is both precise and comprehensive. Capturing the workflow of climate experiments provides a record which can be queried or compared, and allows reproducibility of the experiments - sometimes even to the bit level of the model output. This reproducibility helps to verify the integrity of the output data, and enables easy perturbation experiments. GFDL's Flexible Modeling System Runtime Environment (FRE) is a production-level software project which defines and implements building blocks of the workflow as command line tools. The scientific, numerical and technical input needed to complete the workflow of an experiment is recorded in an experiment description file in XML format. Several key features add convenience and automation to the FRE workflow: ● Experiment inheritance makes it possible to define a new experiment with only a reference to the parent experiment and the parameters to override. ● Testing is a basic element of the FRE workflow: experiments define short test runs which are verified before the main experiment is run, and a set of standard experiments are verified with new code releases. ● FRE is flexible enough to support short runs with mere megabytes of data, to high-resolution experiments that run on thousands of processors for months, producing terabytes of output data. Experiments run in segments of model time; after each segment, the state is saved and the model can be checkpointed at that level. Segment length is defined by the user, but the number of segments per system job is calculated to fit optimally in the batch scheduler requirements. FRE provides job control across multiple segments, and tools to monitor and alter the state of long-running experiments. ● Experiments are entered into a Curator Database, which stores query-able metadata about the experiment and the experiment's output. ● FRE includes a set of standardized post-processing functions as well as the ability to incorporate user-level functions. FRE post-processing can take us all the way to the preparing of graphical output for a scientific audience, and publication of data on a public portal. ● Recent FRE development includes incorporating a distributed workflow to support remote computing.
Integrate Data into Scientific Workflows for Terrestrial Biosphere Model Evaluation through Brokers
NASA Astrophysics Data System (ADS)
Wei, Y.; Cook, R. B.; Du, F.; Dasgupta, A.; Poco, J.; Huntzinger, D. N.; Schwalm, C. R.; Boldrini, E.; Santoro, M.; Pearlman, J.; Pearlman, F.; Nativi, S.; Khalsa, S.
2013-12-01
Terrestrial biosphere models (TBMs) have become integral tools for extrapolating local observations and process-level understanding of land-atmosphere carbon exchange to larger regions. Model-model and model-observation intercomparisons are critical to understand the uncertainties within model outputs, to improve model skill, and to improve our understanding of land-atmosphere carbon exchange. The DataONE Exploration, Visualization, and Analysis (EVA) working group is evaluating TBMs using scientific workflows in UV-CDAT/VisTrails. This workflow-based approach promotes collaboration and improved tracking of evaluation provenance. But challenges still remain. The multi-scale and multi-discipline nature of TBMs makes it necessary to include diverse and distributed data resources in model evaluation. These include, among others, remote sensing data from NASA, flux tower observations from various organizations including DOE, and inventory data from US Forest Service. A key challenge is to make heterogeneous data from different organizations and disciplines discoverable and readily integrated for use in scientific workflows. This presentation introduces the brokering approach taken by the DataONE EVA to fill the gap between TBMs' evaluation scientific workflows and cross-organization and cross-discipline data resources. The DataONE EVA started the development of an Integrated Model Intercomparison Framework (IMIF) that leverages standards-based discovery and access brokers to dynamically discover, access, and transform (e.g. subset and resampling) diverse data products from DataONE, Earth System Grid (ESG), and other data repositories into a format that can be readily used by scientific workflows in UV-CDAT/VisTrails. The discovery and access brokers serve as an independent middleware that bridge existing data repositories and TBMs evaluation scientific workflows but introduce little overhead to either component. In the initial work, an OpenSearch-based discovery broker is leveraged to provide a consistent mechanism for data discovery. Standards-based data services, including Open Geospatial Consortium (OGC) Web Coverage Service (WCS) and THREDDS are leveraged to provide on-demand data access and transformations through the data access broker. To ease the adoption of broker services, a package of broker client VisTrails modules have been developed to be easily plugged into scientific workflows. The initial IMIF has been successfully tested in selected model evaluation scenarios involved in the NASA-funded Multi-scale Synthesis and Terrestrial Model Intercomparison Project (MsTMIP).
Workflow Challenges of Enterprise Imaging: HIMSS-SIIM Collaborative White Paper.
Towbin, Alexander J; Roth, Christopher J; Bronkalla, Mark; Cram, Dawn
2016-10-01
With the advent of digital cameras, there has been an explosion in the number of medical specialties using images to diagnose or document disease and guide interventions. In many specialties, these images are not added to the patient's electronic medical record and are not distributed so that other providers caring for the patient can view them. As hospitals begin to develop enterprise imaging strategies, they have found that there are multiple challenges preventing the implementation of systems to manage image capture, image upload, and image management. This HIMSS-SIIM white paper will describe the key workflow challenges related to enterprise imaging and offer suggestions for potential solutions to these challenges.
SolTrace | Concentrating Solar Power | NREL
NREL packaged distribution or from source code at the SolTrace open source project website. NREL Publications Support FAQs SolTrace open source project The code uses Monte-Carlo ray-tracing methodology. The -tracing capabilities. With the release of the SolTrace open source project, the software has adopted
NASA Astrophysics Data System (ADS)
Orlov, M. Yu; Lukachev, S. V.; Anisimov, V. M.
2018-01-01
The position of combustion chamber between compressor and turbine and combined action of these elements imply that the working processes of all these elements are interconnected. One of the main requirements of the combustion chamber is the formation of the desirable temperature field at the turbine inlet, which can realize necessary durability of nozzle assembly and blade wheel of the first stage of high-pressure turbine. The method of integrated simulation of combustion chamber and neighboring nodes (compressor and turbine) was developed. On the first stage of the study, this method was used to investigate the influence of non-uniformity of flow distribution, occurred after compressor blades on combustion chamber workflow. The goal of the study is to assess the impact of non-uniformity of flow distribution after the compressor on the parameters before the turbine. The calculation was carried out in a transient case for some operation mode of the engine. The simulation showed that the inclusion of compressor has an effect on combustion chamber workflow and allows us to determine temperature field at the turbine inlet and assesses its durability more accurately. In addition, the simulation with turbine showed the changes in flow velocity distribution and pressure in combustion chamber.
Trace Elements in Ovaries: Measurement and Physiology.
Ceko, Melanie J; O'Leary, Sean; Harris, Hugh H; Hummitzsch, Katja; Rodgers, Raymond J
2016-04-01
Traditionally, research in the field of trace element biology and human and animal health has largely depended on epidemiological methods to demonstrate involvement in biological processes. These studies were typically followed by trace element supplementation trials or attempts at identification of the biochemical pathways involved. With the discovery of biological molecules that contain the trace elements, such as matrix metalloproteinases containing zinc (Zn), cytochrome P450 enzymes containing iron (Fe), and selenoproteins containing selenium (Se), much of the current research focuses on these molecules, and, hence, only indirectly on trace elements themselves. This review focuses largely on two synchrotron-based x-ray techniques: X-ray absorption spectroscopy and x-ray fluorescence imaging that can be used to identify the in situ speciation and distribution of trace elements in tissues, using our recent studies of bovine ovaries, where the distribution of Fe, Se, Zn, and bromine were determined. It also discusses the value of other techniques, such as inductively coupled plasma mass spectrometry, used to garner information about the concentrations and elemental state of the trace elements. These applications to measure trace elemental distributions in bovine ovaries at high resolutions provide new insights into possible roles for trace elements in the ovary. © 2016 by the Society for the Study of Reproduction, Inc.
Metzner, Ralf; Schneider, Heike Ursula; Breuer, Uwe; Thorpe, Michael Robert; Schurr, Ulrich; Schroeder, Walter Heinz
2010-01-01
Fluxes of mineral nutrients in the xylem are strongly influenced by interactions with the surrounding stem tissues and are probably regulated by them. Toward a mechanistic understanding of these interactions, we applied stable isotope tracers of magnesium, potassium, and calcium continuously to the transpiration stream of cut bean (Phaseolus vulgaris) shoots to study their radial exchange at the cell and tissue level with stem tissues between pith and phloem. For isotope localization, we combined sample preparation with secondary ion mass spectrometry in a completely cryogenic workflow. After 20 min of application, tracers were readily detectable to various degrees in all tissues. The xylem parenchyma near the vessels exchanged freely with the vessels, its nutrient elements reaching a steady state of strong exchange with elements in the vessels within 20 min, mainly via apoplastic pathways. A slow exchange between vessels and cambium and phloem suggested that they are separated from the xylem, parenchyma, and pith, possibly by an apoplastic barrier to diffusion for nutrients (as for carbohydrates). There was little difference in these distributions when tracers were applied directly to intact xylem via a microcapillary, suggesting that xylem tension had little effect on radial exchange of these nutrients and that their movement was mainly diffusive. PMID:19965970
Zhu, Zongmin; Xue, Junhui; Deng, Yuzhen; Chen, Lin; Liu, Jiangfeng
2016-04-15
Based on geochemical and magnetic approaches, the distribution, sources, and health risk of trace metals in surface sediments from a seashore tourist city were investigated. A significant correlation was found between magnetic susceptibility (χ) and trace metals, which suggested that levels of trace metals in the sediments can be effectively depicted by the magnetic approach. The spatial distribution of χ and trace metals matched well with the city layout with relatively higher values being found in the port and busy tourist areas. This result, together with enrichment factors (EFs) and Tomlinson pollution load index (PLI) of metals, suggested that the influence of human activities on the coastal environment was noticeable. Principal component analysis (PCA) indicated that trace metals in the sediments were derived from both anthropogenic and natural sources. Noncarcinogenic risk assessment showed that there was no potential health risk of exposure to metals by means of ingestion or inhalation. Copyright © 2016 Elsevier Ltd. All rights reserved.
Are memory traces localized or distributed?
Thompson, R F
1991-01-01
Evidence supports the view that "memory traces" are formed in the hippocampus and in the cerebellum in classical conditioning of discrete behavioral responses (e.g. eyeblink conditioning). In the hippocampus, learning results in long-lasting increases in excitability of pyramidal neurons that appear to be localized to these neurons (i.e. changes in membrane properties and receptor function). However, these learning-altered pyramidal neurons are distributed widely throughout CA3 and CA1. Although it plays a key role in certain aspects of classical conditioning, the hippocampus is not necessary for learning and memory of the basic conditioned responses. The cerebellum and its associated brain stem circuitry, on the other hand, does appear to be essential (necessary and sufficient) for learning and memory of the conditioned response. Evidence to date is most consistent with a localized trace in the interpositus nucleus and multiple localized traces in cerebellar cortex, each involving relatively large ensembles of neurons. Perhaps "procedural" memory traces are relatively localized and "declarative" traces more widely distributed.
Agile parallel bioinformatics workflow management using Pwrake.
Mishima, Hiroyuki; Sasaki, Kensaku; Tanaka, Masahiro; Tatebe, Osamu; Yoshiura, Koh-Ichiro
2011-09-08
In bioinformatics projects, scientific workflow systems are widely used to manage computational procedures. Full-featured workflow systems have been proposed to fulfil the demand for workflow management. However, such systems tend to be over-weighted for actual bioinformatics practices. We realize that quick deployment of cutting-edge software implementing advanced algorithms and data formats, and continuous adaptation to changes in computational resources and the environment are often prioritized in scientific workflow management. These features have a greater affinity with the agile software development method through iterative development phases after trial and error.Here, we show the application of a scientific workflow system Pwrake to bioinformatics workflows. Pwrake is a parallel workflow extension of Ruby's standard build tool Rake, the flexibility of which has been demonstrated in the astronomy domain. Therefore, we hypothesize that Pwrake also has advantages in actual bioinformatics workflows. We implemented the Pwrake workflows to process next generation sequencing data using the Genomic Analysis Toolkit (GATK) and Dindel. GATK and Dindel workflows are typical examples of sequential and parallel workflows, respectively. We found that in practice, actual scientific workflow development iterates over two phases, the workflow definition phase and the parameter adjustment phase. We introduced separate workflow definitions to help focus on each of the two developmental phases, as well as helper methods to simplify the descriptions. This approach increased iterative development efficiency. Moreover, we implemented combined workflows to demonstrate modularity of the GATK and Dindel workflows. Pwrake enables agile management of scientific workflows in the bioinformatics domain. The internal domain specific language design built on Ruby gives the flexibility of rakefiles for writing scientific workflows. Furthermore, readability and maintainability of rakefiles may facilitate sharing workflows among the scientific community. Workflows for GATK and Dindel are available at http://github.com/misshie/Workflows.
Agile parallel bioinformatics workflow management using Pwrake
2011-01-01
Background In bioinformatics projects, scientific workflow systems are widely used to manage computational procedures. Full-featured workflow systems have been proposed to fulfil the demand for workflow management. However, such systems tend to be over-weighted for actual bioinformatics practices. We realize that quick deployment of cutting-edge software implementing advanced algorithms and data formats, and continuous adaptation to changes in computational resources and the environment are often prioritized in scientific workflow management. These features have a greater affinity with the agile software development method through iterative development phases after trial and error. Here, we show the application of a scientific workflow system Pwrake to bioinformatics workflows. Pwrake is a parallel workflow extension of Ruby's standard build tool Rake, the flexibility of which has been demonstrated in the astronomy domain. Therefore, we hypothesize that Pwrake also has advantages in actual bioinformatics workflows. Findings We implemented the Pwrake workflows to process next generation sequencing data using the Genomic Analysis Toolkit (GATK) and Dindel. GATK and Dindel workflows are typical examples of sequential and parallel workflows, respectively. We found that in practice, actual scientific workflow development iterates over two phases, the workflow definition phase and the parameter adjustment phase. We introduced separate workflow definitions to help focus on each of the two developmental phases, as well as helper methods to simplify the descriptions. This approach increased iterative development efficiency. Moreover, we implemented combined workflows to demonstrate modularity of the GATK and Dindel workflows. Conclusions Pwrake enables agile management of scientific workflows in the bioinformatics domain. The internal domain specific language design built on Ruby gives the flexibility of rakefiles for writing scientific workflows. Furthermore, readability and maintainability of rakefiles may facilitate sharing workflows among the scientific community. Workflows for GATK and Dindel are available at http://github.com/misshie/Workflows. PMID:21899774
Esdar, Moritz; Hübner, Ursula; Liebe, Jan-David; Hüsers, Jens; Thye, Johannes
2017-01-01
Clinical information logistics is a construct that aims to describe and explain various phenomena of information provision to drive clinical processes. It can be measured by the workflow composite score, an aggregated indicator of the degree of IT support in clinical processes. This study primarily aimed to investigate the yet unknown empirical patterns constituting this construct. The second goal was to derive a data-driven weighting scheme for the constituents of the workflow composite score and to contrast this scheme with a literature based, top-down procedure. This approach should finally test the validity and robustness of the workflow composite score. Based on secondary data from 183 German hospitals, a tiered factor analytic approach (confirmatory and subsequent exploratory factor analysis) was pursued. A weighting scheme, which was based on factor loadings obtained in the analyses, was put into practice. We were able to identify five statistically significant factors of clinical information logistics that accounted for 63% of the overall variance. These factors were "flow of data and information", "mobility", "clinical decision support and patient safety", "electronic patient record" and "integration and distribution". The system of weights derived from the factor loadings resulted in values for the workflow composite score that differed only slightly from the score values that had been previously published based on a top-down approach. Our findings give insight into the internal composition of clinical information logistics both in terms of factors and weights. They also allowed us to propose a coherent model of clinical information logistics from a technical perspective that joins empirical findings with theoretical knowledge. Despite the new scheme of weights applied to the calculation of the workflow composite score, the score behaved robustly, which is yet another hint of its validity and therefore its usefulness. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Kumar, A.; Ramanathan, A.; Mathukumalli, B. K. P.; Datta, D. K.
2014-12-01
The distribution, enrichment and ecotoxocity potential of Bangladesh part of Sundarban mangrove was investigated for eight trace metals (As, Cd, Cr, Cu, Fe, Mn, Pb and Zn) using sediment quality assessment indices. The average concentration of trace metals in the sediments exceeded the crustal abundance suggesting sources other than natural in origin. Additionally, the trace metals profile may be a reflection of socio-economic development in the vicinity of Sundarban which further attributes trace metals abundance to the anthropogenic inputs. Geoaccumulation index suggests moderately polluted sediment quality w.r.t. Ni and As and background concentrations for Al, Fe, Mn, Cu, Zn, Pb, Co, As and Cd. Contamination factor analysis suggested low contamination by Zn, Cr, Co and Cd, moderate by Fe, Mn, Cu and Pb while Ni and As show considerable and high contamination, respectively. Enrichment factors for Ni, Pb and As suggests high contamination from either biota or anthropogenic inputs besides natural enrichment. As per the three sediment quality guidelines, Fe, Mn, Cu, Ni, Co and As would be more of a concern with respect to ecotoxicological risk in the Sundarban mangroves. The correlation between various physiochemical variables and trace metals suggested significant role of fine grained particles (clay) in trace metal distribution whereas owing to low organic carbon content in the region the organic complexation may not be playing significant role in trace metal distribution in the Sundarban mangroves.
The biogeochemical distribution of trace elements in the Indian Ocean
NASA Astrophysics Data System (ADS)
Saager, Paul M.
1994-06-01
The present review deals with the distributions of dissolved trace metals in the Indian Ocean in relation with biological, chemical and hydrographic processes. The literature data-base is extremely limited and almost no information is available on particle processes and input and output processes of trace metals in the Indian Ocean basin and therefore much research is needed to expand our understanding of the marine chemistries of most trace metals. An area of special interest for future research is the Arabian Sea. The local conditions (upwelling induced productivity, restricted bottom water circulation and suboxic intermediate waters) create a natural laboratory for studying trace metal chemistry.
A novel approach to optimize workflow in grid-based teleradiology applications.
Yılmaz, Ayhan Ozan; Baykal, Nazife
2016-01-01
This study proposes an infrastructure with a reporting workflow optimization algorithm (RWOA) in order to interconnect facilities, reporting units and radiologists on a single access interface, to increase the efficiency of the reporting process by decreasing the medical report turnaround time and to increase the quality of medical reports by determining the optimum match between the inspection and radiologist in terms of subspecialty, workload and response time. Workflow centric network architecture with an enhanced caching, querying and retrieving mechanism is implemented by seamlessly integrating Grid Agent and Grid Manager to conventional digital radiology systems. The inspection and radiologist attributes are modelled using a hierarchical ontology structure. Attribute preferences rated by radiologists and technical experts are formed into reciprocal matrixes and weights for entities are calculated utilizing Analytic Hierarchy Process (AHP). The assignment alternatives are processed by relation-based semantic matching (RBSM) and Integer Linear Programming (ILP). The results are evaluated based on both real case applications and simulated process data in terms of subspecialty, response time and workload success rates. Results obtained using simulated data are compared with the outcomes obtained by applying Round Robin, Shortest Queue and Random distribution policies. The proposed algorithm is also applied to a real case teleradiology application process data where medical reporting workflow was performed based on manual assignments by the chief radiologist for 6225 inspections. RBSM gives the highest subspecialty success rate and integrating ILP with RBSM ratings as RWOA provides a better response time and workload distribution success rate. RWOA based image delivery also prevents bandwidth, storage or hardware related stuck and latencies. When compared with a real case teleradiology application where inspection assignments were performed manually, the proposed solution was found to increase the experience success rate by 13.25%, workload success rate by 63.76% and response time success rate by 120%. The total response time in the real case application data was improved by 22.39%. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
ASCEM Data Brower (ASCEMDB) v0.8
DOE Office of Scientific and Technical Information (OSTI.GOV)
ROMOSAN, ALEXANDRU
Data management tool designed for the Advanced Simulation Capability for Environmental Management (ASCEM) framework. Distinguishing features of this gateway include: (1) handling of complex geometry data, (2) advance selection mechanism, (3) state of art rendering of spatiotemporal data records, and (4) seamless integration with a distributed workflow engine.
Data partitioning enables the use of standard SOAP Web Services in genome-scale workflows.
Sztromwasser, Pawel; Puntervoll, Pål; Petersen, Kjell
2011-07-26
Biological databases and computational biology tools are provided by research groups around the world, and made accessible on the Web. Combining these resources is a common practice in bioinformatics, but integration of heterogeneous and often distributed tools and datasets can be challenging. To date, this challenge has been commonly addressed in a pragmatic way, by tedious and error-prone scripting. Recently however a more reliable technique has been identified and proposed as the platform that would tie together bioinformatics resources, namely Web Services. In the last decade the Web Services have spread wide in bioinformatics, and earned the title of recommended technology. However, in the era of high-throughput experimentation, a major concern regarding Web Services is their ability to handle large-scale data traffic. We propose a stream-like communication pattern for standard SOAP Web Services, that enables efficient flow of large data traffic between a workflow orchestrator and Web Services. We evaluated the data-partitioning strategy by comparing it with typical communication patterns on an example pipeline for genomic sequence annotation. The results show that data-partitioning lowers resource demands of services and increases their throughput, which in consequence allows to execute in-silico experiments on genome-scale, using standard SOAP Web Services and workflows. As a proof-of-principle we annotated an RNA-seq dataset using a plain BPEL workflow engine.
Théron, Laëtitia; Centeno, Delphine; Coudy-Gandilhon, Cécile; Pujos-Guillot, Estelle; Astruc, Thierry; Rémond, Didier; Barthelemy, Jean-Claude; Roche, Frédéric; Feasson, Léonard; Hébraud, Michel; Béchet, Daniel; Chambon, Christophe
2016-10-26
Mass spectrometry imaging (MSI) is a powerful tool to visualize the spatial distribution of molecules on a tissue section. The main limitation of MALDI-MSI of proteins is the lack of direct identification. Therefore, this study focuses on a MSI~LC-MS/MS-LF workflow to link the results from MALDI-MSI with potential peak identification and label-free quantitation, using only one tissue section. At first, we studied the impact of matrix deposition and laser ablation on protein extraction from the tissue section. Then, we did a back-correlation of the m / z of the proteins detected by MALDI-MSI to those identified by label-free quantitation. This allowed us to compare the label-free quantitation of proteins obtained in LC-MS/MS with the peak intensities observed in MALDI-MSI. We managed to link identification to nine peaks observed by MALDI-MSI. The results showed that the MSI~LC-MS/MS-LF workflow (i) allowed us to study a representative muscle proteome compared to a classical bottom-up workflow; and (ii) was sparsely impacted by matrix deposition and laser ablation. This workflow, performed as a proof-of-concept, suggests that a single tissue section can be used to perform MALDI-MSI and protein extraction, identification, and relative quantitation.
Théron, Laëtitia; Centeno, Delphine; Coudy-Gandilhon, Cécile; Pujos-Guillot, Estelle; Astruc, Thierry; Rémond, Didier; Barthelemy, Jean-Claude; Roche, Frédéric; Feasson, Léonard; Hébraud, Michel; Béchet, Daniel; Chambon, Christophe
2016-01-01
Mass spectrometry imaging (MSI) is a powerful tool to visualize the spatial distribution of molecules on a tissue section. The main limitation of MALDI-MSI of proteins is the lack of direct identification. Therefore, this study focuses on a MSI~LC-MS/MS-LF workflow to link the results from MALDI-MSI with potential peak identification and label-free quantitation, using only one tissue section. At first, we studied the impact of matrix deposition and laser ablation on protein extraction from the tissue section. Then, we did a back-correlation of the m/z of the proteins detected by MALDI-MSI to those identified by label-free quantitation. This allowed us to compare the label-free quantitation of proteins obtained in LC-MS/MS with the peak intensities observed in MALDI-MSI. We managed to link identification to nine peaks observed by MALDI-MSI. The results showed that the MSI~LC-MS/MS-LF workflow (i) allowed us to study a representative muscle proteome compared to a classical bottom-up workflow; and (ii) was sparsely impacted by matrix deposition and laser ablation. This workflow, performed as a proof-of-concept, suggests that a single tissue section can be used to perform MALDI-MSI and protein extraction, identification, and relative quantitation. PMID:28248242
NASA Astrophysics Data System (ADS)
Thomer, A.
2017-12-01
Data provenance - the record of the varied processes that went into the creation of a dataset, as well as the relationships between resulting data objects - is necessary to support the reusability, reproducibility and reliability of earth science data. In sUAS-based research, capturing provenance can be particularly challenging because of the breadth and distributed nature of the many platforms used to collect, process and analyze data. In any given project, multiple drones, controllers, computers, software systems, sensors, cameras, imaging processing algorithms and data processing workflows are used over sometimes long periods of time. These platforms and processing result in dozens - if not hundreds - of data products in varying stages of readiness-for-analysis and sharing. Provenance tracking mechanisms are needed to make the relationships between these many data products explicit, and therefore more reusable and shareable. In this talk, I discuss opportunities and challenges in tracking provenance in sUAS-based research, and identify gaps in current workflow-capture technologies. I draw on prior work conducted as part of the IMLS-funded Site-Based Data Curation project in which we developed methods of documenting in and ex silico (that is, computational and non-computation) workflows, and demonstrate this approaches applicability to research with sUASes. I conclude with a discussion of ontologies and other semantic technologies that have potential application in sUAS research.
Gold, Peter O.; Cowgill, Eric; Kreylos, Oliver; Gold, Ryan D.
2012-01-01
Three-dimensional (3D) slip vectors recorded by displaced landforms are difficult to constrain across complex fault zones, and the uncertainties associated with such measurements become increasingly challenging to assess as landforms degrade over time. We approach this problem from a remote sensing perspective by using terrestrial laser scanning (TLS) and 3D structural analysis. We have developed an integrated TLS data collection and point-based analysis workflow that incorporates accurate assessments of aleatoric and epistemic uncertainties using experimental surveys, Monte Carlo simulations, and iterative site reconstructions. Our scanning workflow and equipment requirements are optimized for single-operator surveying, and our data analysis process is largely completed using new point-based computing tools in an immersive 3D virtual reality environment. In a case study, we measured slip vector orientations at two sites along the rupture trace of the 1954 Dixie Valley earthquake (central Nevada, United States), yielding measurements that are the first direct constraints on the 3D slip vector for this event. These observations are consistent with a previous approximation of net extension direction for this event. We find that errors introduced by variables in our survey method result in <2.5 cm of variability in components of displacement, and are eclipsed by the 10–60 cm epistemic errors introduced by reconstructing the field sites to their pre-erosion geometries. Although the higher resolution TLS data sets enabled visualization and data interactivity critical for reconstructing the 3D slip vector and for assessing uncertainties, dense topographic constraints alone were not sufficient to significantly narrow the wide (<26°) range of allowable slip vector orientations that resulted from accounting for epistemic uncertainties.
The Gaussian Laser Angular Distribution in HYDRA's 3D Laser Ray Trace Package
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sepke, Scott M.
In this note, the angular distribution of rays launched by the 3D LZR ray trace package is derived for Gaussian beams (npower==2) with bm model=3±. Beams with bm model=+3 have a nearly at distribution, and beams with bm model=-3 have a nearly linear distribution when the spot size is large compared to the wavelength.
Pegasus Workflow Management System: Helping Applications From Earth and Space
NASA Astrophysics Data System (ADS)
Mehta, G.; Deelman, E.; Vahi, K.; Silva, F.
2010-12-01
Pegasus WMS is a Workflow Management System that can manage large-scale scientific workflows across Grid, local and Cloud resources simultaneously. Pegasus WMS provides a means for representing the workflow of an application in an abstract XML form, agnostic of the resources available to run it and the location of data and executables. It then compiles these workflows into concrete plans by querying catalogs and farming computations across local and distributed computing resources, as well as emerging commercial and community cloud environments in an easy and reliable manner. Pegasus WMS optimizes the execution as well as data movement by leveraging existing Grid and cloud technologies via a flexible pluggable interface and provides advanced features like reusing existing data, automatic cleanup of generated data, and recursive workflows with deferred planning. It also captures all the provenance of the workflow from the planning stage to the execution of the generated data, helping scientists to accurately measure performance metrics of their workflow as well as data reproducibility issues. Pegasus WMS was initially developed as part of the GriPhyN project to support large-scale high-energy physics and astrophysics experiments. Direct funding from the NSF enabled support for a wide variety of applications from diverse domains including earthquake simulation, bacterial RNA studies, helioseismology and ocean modeling. Earthquake Simulation: Pegasus WMS was recently used in a large scale production run in 2009 by the Southern California Earthquake Centre to run 192 million loosely coupled tasks and about 2000 tightly coupled MPI style tasks on National Cyber infrastructure for generating a probabilistic seismic hazard map of the Southern California region. SCEC ran 223 workflows over a period of eight weeks, using on average 4,420 cores, with a peak of 14,540 cores. A total of 192 million files were produced totaling about 165TB out of which 11TB of data was saved. Astrophysics: The Laser Interferometer Gravitational-Wave Observatory (LIGO) uses Pegasus WMS to search for binary inspiral gravitational waves. A month of LIGO data requires many thousands of jobs, running for days on hundreds of CPUs on the LIGO Data Grid (LDG) and Open Science Grid (OSG). Ocean Temperature Forecast: Researchers at the Jet Propulsion Laboratory are exploring Pegasus WMS to run ocean forecast ensembles of the California coastal region. These models produce a number of daily forecasts for water temperature, salinity, and other measures. Helioseismology: The Solar Dynamics Observatory (SDO) is NASA's most important solar physics mission of this coming decade. Pegasus WMS is being used to analyze the data from SDO, which will be predominantly used to learn about solar magnetic activity and to probe the internal structure and dynamics of the Sun with helioseismology. Bacterial RNA studies: SIPHT is an application in bacterial genomics, which predicts sRNA (small non-coding RNAs)-encoding genes in bacteria. This project currently provides a web-based interface using Pegasus WMS at the backend to facilitate large-scale execution of the workflows on varied resources and provide better notifications of task/workflow completion.
Motion/imagery secure cloud enterprise architecture analysis
NASA Astrophysics Data System (ADS)
DeLay, John L.
2012-06-01
Cloud computing with storage virtualization and new service-oriented architectures brings a new perspective to the aspect of a distributed motion imagery and persistent surveillance enterprise. Our existing research is focused mainly on content management, distributed analytics, WAN distributed cloud networking performance issues of cloud based technologies. The potential of leveraging cloud based technologies for hosting motion imagery, imagery and analytics workflows for DOD and security applications is relatively unexplored. This paper will examine technologies for managing, storing, processing and disseminating motion imagery and imagery within a distributed network environment. Finally, we propose areas for future research in the area of distributed cloud content management enterprises.
The Symbiotic Relationship between Scientific Workflow and Provenance (Invited)
NASA Astrophysics Data System (ADS)
Stephan, E.
2010-12-01
The purpose of this presentation is to describe the symbiotic nature of scientific workflows and provenance. We will also discuss the current trends and real world challenges facing these two distinct research areas. Although motivated differently, the needs of the international science communities are the glue that binds this relationship together. Understanding and articulating the science drivers to these communities is paramount as these technologies evolve and mature. Originally conceived for managing business processes, workflows are now becoming invaluable assets in both computational and experimental sciences. These reconfigurable, automated systems provide essential technology to perform complex analyses by coupling together geographically distributed disparate data sources and applications. As a result, workflows are capable of higher throughput in a shorter amount of time than performing the steps manually. Today many different workflow products exist; these could include Kepler and Taverna or similar products like MeDICI, developed at PNNL, that are standardized on the Business Process Execution Language (BPEL). Provenance, originating from the French term Provenir “to come from”, is used to describe the curation process of artwork as art is passed from owner to owner. The concept of provenance was adopted by digital libraries as a means to track the lineage of documents while standards such as the DublinCore began to emerge. In recent years the systems science community has increasingly expressed the need to expand the concept of provenance to formally articulate the history of scientific data. Communities such as the International Provenance and Annotation Workshop (IPAW) have formalized a provenance data model. The Open Provenance Model, and the W3C is hosting a provenance incubator group featuring the Proof Markup Language. Although both workflows and provenance have risen from different communities and operate independently, their mutual success is tied together, forming a symbiotic relationship where research and development advances in one effort can provide tremendous benefits to the other. For example, automating provenance extraction within scientific applications is still a relatively new concept; the workflow engine provides the framework to capture application specific operations, inputs, and resulting data. It provides a description of the process history and data flow by wrapping workflow components around the applications and data sources. On the other hand, a lack of cooperation between workflows and provenance can inhibit usefulness of both to science. Blindly tracking the execution history without having a true understanding of what kinds of questions end users may have makes the provenance indecipherable to the target users. Over the past nine years PNNL has been actively involved in provenance research in support of computational chemistry, molecular dynamics, biology, hydrology, and climate. PNNL has also been actively involved in efforts by the international community to develop open standards for provenance and the development of architectures to support provenance capture, storage, and querying. This presentation will provide real world use cases of how provenance and workflow can be leveraged and implemented to meet different needs and the challenges that lie ahead.
Castillo, Andreina I; Nelson, Andrew D L; Haug-Baltzell, Asher K; Lyons, Eric
2018-01-01
Abstract Integrated platforms for storage, management, analysis and sharing of large quantities of omics data have become fundamental to comparative genomics. CoGe (https://genomevolution.org/coge/) is an online platform designed to manage and study genomic data, enabling both data- and hypothesis-driven comparative genomics. CoGe’s tools and resources can be used to organize and analyse both publicly available and private genomic data from any species. Here, we demonstrate the capabilities of CoGe through three example workflows using 17 Plasmodium genomes as a model. Plasmodium genomes present unique challenges for comparative genomics due to their rapidly evolving and highly variable genomic AT/GC content. These example workflows are intended to serve as templates to help guide researchers who would like to use CoGe to examine diverse aspects of genome evolution. In the first workflow, trends in genome composition and amino acid usage are explored. In the second, changes in genome structure and the distribution of synonymous (Ks) and non-synonymous (Kn) substitution values are evaluated across species with different levels of evolutionary relatedness. In the third workflow, microsyntenic analyses of multigene families’ genomic organization are conducted using two Plasmodium-specific gene families—serine repeat antigen, and cytoadherence-linked asexual gene—as models. In general, these example workflows show how to achieve quick, reproducible and shareable results using the CoGe platform. We were able to replicate previously published results, as well as leverage CoGe’s tools and resources to gain additional insight into various aspects of Plasmodium genome evolution. Our results highlight the usefulness of the CoGe platform, particularly in understanding complex features of genome evolution. Database URL: https://genomevolution.org/coge/
Handler, Michael; Schier, Peter P; Fritscher, Karl D; Raudaschl, Patrik; Johnson Chacko, Lejo; Glueckert, Rudolf; Saba, Rami; Schubert, Rainer; Baumgarten, Daniel; Baumgartner, Christian
2017-01-01
Our sense of balance and spatial orientation strongly depends on the correct functionality of our vestibular system. Vestibular dysfunction can lead to blurred vision and impaired balance and spatial orientation, causing a significant decrease in quality of life. Recent studies have shown that vestibular implants offer a possible treatment for patients with vestibular dysfunction. The close proximity of the vestibular nerve bundles, the facial nerve and the cochlear nerve poses a major challenge to targeted stimulation of the vestibular system. Modeling the electrical stimulation of the vestibular system allows for an efficient analysis of stimulation scenarios previous to time and cost intensive in vivo experiments. Current models are based on animal data or CAD models of human anatomy. In this work, a (semi-)automatic modular workflow is presented for the stepwise transformation of segmented vestibular anatomy data of human vestibular specimens to an electrical model and subsequently analyzed. The steps of this workflow include (i) the transformation of labeled datasets to a tetrahedra mesh, (ii) nerve fiber anisotropy and fiber computation as a basis for neuron models, (iii) inclusion of arbitrary electrode designs, (iv) simulation of quasistationary potential distributions, and (v) analysis of stimulus waveforms on the stimulation outcome. Results obtained by the workflow based on human datasets and the average shape of a statistical model revealed a high qualitative agreement and a quantitatively comparable range compared to data from literature, respectively. Based on our workflow, a detailed analysis of intra- and extra-labyrinthine electrode configurations with various stimulation waveforms and electrode designs can be performed on patient specific anatomy, making this framework a valuable tool for current optimization questions concerning vestibular implants in humans.
Li, Zhenlong; Yang, Chaowei; Jin, Baoxuan; Yu, Manzhu; Liu, Kai; Sun, Min; Zhan, Matthew
2015-01-01
Geoscience observations and model simulations are generating vast amounts of multi-dimensional data. Effectively analyzing these data are essential for geoscience studies. However, the tasks are challenging for geoscientists because processing the massive amount of data is both computing and data intensive in that data analytics requires complex procedures and multiple tools. To tackle these challenges, a scientific workflow framework is proposed for big geoscience data analytics. In this framework techniques are proposed by leveraging cloud computing, MapReduce, and Service Oriented Architecture (SOA). Specifically, HBase is adopted for storing and managing big geoscience data across distributed computers. MapReduce-based algorithm framework is developed to support parallel processing of geoscience data. And service-oriented workflow architecture is built for supporting on-demand complex data analytics in the cloud environment. A proof-of-concept prototype tests the performance of the framework. Results show that this innovative framework significantly improves the efficiency of big geoscience data analytics by reducing the data processing time as well as simplifying data analytical procedures for geoscientists. PMID:25742012
Li, Zhenlong; Yang, Chaowei; Jin, Baoxuan; Yu, Manzhu; Liu, Kai; Sun, Min; Zhan, Matthew
2015-01-01
Geoscience observations and model simulations are generating vast amounts of multi-dimensional data. Effectively analyzing these data are essential for geoscience studies. However, the tasks are challenging for geoscientists because processing the massive amount of data is both computing and data intensive in that data analytics requires complex procedures and multiple tools. To tackle these challenges, a scientific workflow framework is proposed for big geoscience data analytics. In this framework techniques are proposed by leveraging cloud computing, MapReduce, and Service Oriented Architecture (SOA). Specifically, HBase is adopted for storing and managing big geoscience data across distributed computers. MapReduce-based algorithm framework is developed to support parallel processing of geoscience data. And service-oriented workflow architecture is built for supporting on-demand complex data analytics in the cloud environment. A proof-of-concept prototype tests the performance of the framework. Results show that this innovative framework significantly improves the efficiency of big geoscience data analytics by reducing the data processing time as well as simplifying data analytical procedures for geoscientists.
[Multimodal document management in radiotherapy].
Fahrner, H; Kirrmann, S; Röhner, F; Schmucker, M; Hall, M; Heinemann, F
2013-12-01
After incorporating treatment planning and the organisational model of treatment planning in the operating schedule system (BAS, "Betriebsablaufsystem"), complete document qualities were embedded in the digital environment. The aim of this project was to integrate all documents independent of their source (paper-bound or digital) and to make content from the BAS available in a structured manner. As many workflow steps as possible should be automated, e.g. assigning a document to a patient in the BAS. Additionally it must be guaranteed that at all times it could be traced who, when, how and from which source documents were imported into the departmental system. Furthermore work procedures should be changed that the documentation conducted either directly in the departmental system or from external systems can be incorporated digitally and paper document can be completely avoided (e.g. documents such as treatment certificate, treatment plans or documentation). It was a further aim, if possible, to automate the removal of paper documents from the departmental work flow, or even to make such paper documents superfluous. In this way patient letters for follow-up appointments should automatically generated from the BAS. Similarly patient record extracts in the form of PDF files should be enabled, e.g. for controlling purposes. The available document qualities were analysed in detail by a multidisciplinary working group (BAS-AG) and after this examination and assessment of the possibility of modelling in our departmental workflow (BAS) they were transcribed into a flow diagram. The gathered specifications were implemented in a test environment by the clinical and administrative IT group of the department of radiation oncology and subsequent to a detailed analysis introduced into clinical routine. The department has succeeded under the conditions of the aforementioned criteria to embed all relevant documents in the departmental workflow via continuous processes. Since the completion of the concepts and the implementation in our test environment 15,000 documents were introduced into the departmental workflow following routine approval. Furthermore approximately 5000 appointment letters for patient aftercare per year were automatically generated by the BAS. In addition patient record extracts in the form of PDF files for the medical services of the healthcare insurer can be generated.
NASA Astrophysics Data System (ADS)
Wiemker, Rafael; Sevenster, Merlijn; MacMahon, Heber; Li, Feng; Dalal, Sandeep; Tahmasebi, Amir; Klinder, Tobias
2017-03-01
The imaging biomarkers EmphysemaPresence and NoduleSpiculation are crucial inputs for most models aiming to predict the risk of indeterminate pulmonary nodules detected at CT screening. To increase reproducibility and to accelerate screening workflow it is desirable to assess these biomarkers automatically. Validation on NLST images indicates that standard histogram measures are not sufficient to assess EmphysemaPresence in screenees. However, automatic scoring of bulla-resembling low attenuation areas can achieve agreement with experts with close to 80% sensitivity and specificity. NoduleSpiculation can be automatically assessed with similar accuracy. We find a dedicated spiculi tracing score to slightly outperform generic combinations of texture features with classifiers.
Horsfall, M; Spiff, A I
2002-09-01
The distribution of trace metals in sediments of the lower reaches of the New Calabar River, Nigeria was evaluated together with the partitioning of their chemical species between five geochemical phases. Samplings were made in five zones at the lower reaches of the New Calaber River. All the trace metals were determined by AAS after selective chemical extractions and concentrations given in microg gm(-1) (dry weight basis). The average total concentrations found for trace metals in the sediment were ( mean +/- rsd.) Pb: 41.6 +/- 0.29, Zn: 31.60 +/- 0.42, Cd: 12.80 +/- 0.92, Co: 92 +/- 0.25, Cu: 25.5 +/- 0.65 and Ni: 3.2 +/- 0.25. Maxima and minima concentrations are inconsistent with previous studies in other rivers of this region. Spatial distribution revealed that the sources of trace metals into the river appeared to be of non-point. Five contamination indices were applied in studying the partitioning of the trace metals in the sediment. These indices provided bases for ascertaining the potential environmental risk of trace metals in the river system. The results denote high partition levels in the more mobile and more dangerous phases.
NASA Astrophysics Data System (ADS)
Gleber, Sophie-Charlotte; Weinhausen, Britta; Köster, Sarah; Ward, Jesse; Vine, David; Finney, Lydia; Vogt, Stefan
2013-10-01
The distribution, binding and release of trace elements on soil colloids determine matter transport through the soil matrix, and necessitates an aqueous environment and short length and time scales for their study. However, not many microscopy techniques allow for that. We previously showed hard x-ray fluorescence microscopy capabilities to image aqueous colloidal soil samples [1]. As this technique provides attogram sensitivity for transition elements like Cu, Zn, and other geochemically relevant trace elements at sub micrometer spatial resolution (currently down to 150 nm at 2-ID-E [2]; below 50nm at Bionanoprobe, cf. G.Woloschak et al, this volume) combined with the capability to penetrate tens of micrometer of water, it is ideally suited for imaging the elemental content of soil colloids. To address the question of binding and release processes of trace elements on the surface of soil colloids, we developed a microfluidics based XRF flow cytometer, and expanded the applied methods of hard x-ray fluorescence microscopy towards three dimensional imaging. Here, we show (a) the 2-D imaged distributions of Si, K and Fe on soil colloids of Pseudogley samples; (b) how the trace element distribution is a dynamic, pH-dependent process; and (c) x-ray tomographic applications to render the trace elemental distributions in 3-D. We conclude that the approach presented here shows the remarkable potential to image and quantitate elemental distributions from samles within their natural aqueous microenvironment, particularly important in the environmental, medical, and biological sciences.
Distributed information system architecture for Primary Health Care.
Grammatikou, M; Stamatelopoulos, F; Maglaris, B
2000-01-01
We present a distributed architectural framework for Primary Health Care (PHC) Centres. Distribution is handled through the introduction of the Roaming Electronic Health Care Record (R-EHCR) and the use of local caching and incremental update of a global index. The proposed architecture is designed to accommodate a specific PHC workflow model. Finally, we discuss a pilot implementation in progress, which is based on CORBA and web-based user interfaces. However, the conceptual architecture is generic and open to other middleware approaches like the DHE or HL7.
Combating Stovepipes: Implementing Workflow in uPortal
ERIC Educational Resources Information Center
Askren, Mark; Arseniev, Marina
2004-01-01
With a new wave of campus IT solutions comes a familiar challenge: How does one encourage and help build distributed applications without being overrun by stovepipe solutions? Anticipating significant enrollment growth (5 percent per year) for the next several years, double-digit annual growth in funded research, and little chance that resources…
Many watershed models simulate overland and instream microbial fate and transport, but few provide loading rates on land surfaces and point sources to the waterbody network. This paper describes the underlying equations for microbial loading rates associated with 1) land-applied ...
Desktop Publishing in the University: Current Progress, Future Visions.
ERIC Educational Resources Information Center
Smith, Thomas W.
1989-01-01
Discussion of the workflow involved in desktop publishing focuses on experiences at the College of Engineering at the University of Wisconsin at Madison. Highlights include cost savings and productivity gains in page layout and composition; editing, translation, and revision issues; printing and distribution; and benefits to the reader. (LRW)
USDA-ARS?s Scientific Manuscript database
Many watershed models simulate overland and instream microbial fate and transport, but few provide loading rates on land surfaces and point sources to the waterbody network. This paper describes the underlying equations for microbial loading rates associated with 1) land-applied manure on undevelope...
Lu, Xinyan
2016-01-01
There is a clear requirement for enhancing laboratory information management during early absorption, distribution, metabolism and excretion (ADME) screening. The application of a commercial laboratory information management system (LIMS) is limited by complexity, insufficient flexibility, high costs and extended timelines. An improved custom in-house LIMS for ADME screening was developed using Excel. All Excel templates were generated through macros and formulae, and information flow was streamlined as much as possible. This system has been successfully applied in task generation, process control and data management, with a reduction in both labor time and human error rates. An Excel-based LIMS can provide a simple, flexible and cost/time-saving solution for improving workflow efficiencies in early ADME screening.
Data Provenance Hybridization Supporting Extreme-Scale Scientific WorkflowApplications
DOE Office of Scientific and Technical Information (OSTI.GOV)
Elsethagen, Todd O.; Stephan, Eric G.; Raju, Bibi
As high performance computing (HPC) infrastructures continue to grow in capability and complexity, so do the applications that they serve. HPC and distributed-area computing (DAC) (e.g. grid and cloud) users are looking increasingly toward workflow solutions to orchestrate their complex application coupling, pre- and post-processing needs To gain insight and a more quantitative understanding of a workflow’s performance our method includes not only the capture of traditional provenance information, but also the capture and integration of system environment metrics helping to give context and explanation for a workflow’s execution. In this paper, we describe IPPD’s provenance management solution (ProvEn) andmore » its hybrid data store combining both of these data provenance perspectives.« less
Big Data Provenance: Challenges, State of the Art and Opportunities.
Wang, Jianwu; Crawl, Daniel; Purawat, Shweta; Nguyen, Mai; Altintas, Ilkay
2015-01-01
Ability to track provenance is a key feature of scientific workflows to support data lineage and reproducibility. The challenges that are introduced by the volume, variety and velocity of Big Data, also pose related challenges for provenance and quality of Big Data, defined as veracity. The increasing size and variety of distributed Big Data provenance information bring new technical challenges and opportunities throughout the provenance lifecycle including recording, querying, sharing and utilization. This paper discusses the challenges and opportunities of Big Data provenance related to the veracity of the datasets themselves and the provenance of the analytical processes that analyze these datasets. It also explains our current efforts towards tracking and utilizing Big Data provenance using workflows as a programming model to analyze Big Data.
Methodology for processing pressure traces used as inputs for combustion analyses in diesel engines
NASA Astrophysics Data System (ADS)
Rašić, Davor; Vihar, Rok; Žvar Baškovič, Urban; Katrašnik, Tomaž
2017-05-01
This study proposes a novel methodology for designing an optimum equiripple finite impulse response (FIR) filter for processing in-cylinder pressure traces of a diesel internal combustion engine, which serve as inputs for high-precision combustion analyses. The proposed automated workflow is based on an innovative approach of determining the transition band frequencies and optimum filter order. The methodology is based on discrete Fourier transform analysis, which is the first step to estimate the location of the pass-band and stop-band frequencies. The second step uses short-time Fourier transform analysis to refine the estimated aforementioned frequencies. These pass-band and stop-band frequencies are further used to determine the most appropriate FIR filter order. The most widely used existing methods for estimating the FIR filter order are not effective in suppressing the oscillations in the rate- of-heat-release (ROHR) trace, thus hindering the accuracy of combustion analyses. To address this problem, an innovative method for determining the order of an FIR filter is proposed in this study. This method is based on the minimization of the integral of normalized signal-to-noise differences between the stop-band frequency and the Nyquist frequency. Developed filters were validated using spectral analysis and calculation of the ROHR. The validation results showed that the filters designed using the proposed innovative method were superior compared with those using the existing methods for all analyzed cases. Highlights • Pressure traces of a diesel engine were processed by finite impulse response (FIR) filters with different orders • Transition band frequencies were determined with an innovative method based on discrete Fourier transform and short-time Fourier transform • Spectral analyses showed deficiencies of existing methods in determining the FIR filter order • A new method of determining the FIR filter order for processing pressure traces was proposed • The efficiency of the new method was demonstrated by spectral analyses and calculations of rate-of-heat-release traces
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chin, George; Sivaramakrishnan, Chandrika; Critchlow, Terence J.
2011-07-04
A drawback of existing scientific workflow systems is the lack of support to domain scientists in designing and executing their own scientific workflows. Many domain scientists avoid developing and using workflows because the basic objects of workflows are too low-level and high-level tools and mechanisms to aid in workflow construction and use are largely unavailable. In our research, we are prototyping higher-level abstractions and tools to better support scientists in their workflow activities. Specifically, we are developing generic actors that provide abstract interfaces to specific functionality, workflow templates that encapsulate workflow and data patterns that can be reused and adaptedmore » by scientists, and context-awareness mechanisms to gather contextual information from the workflow environment on behalf of the scientist. To evaluate these scientist-centered abstractions on real problems, we apply them to construct and execute scientific workflows in the specific domain area of groundwater modeling and analysis.« less
DEWEY: the DICOM-enabled workflow engine system.
Erickson, Bradley J; Langer, Steve G; Blezek, Daniel J; Ryan, William J; French, Todd L
2014-06-01
Workflow is a widely used term to describe the sequence of steps to accomplish a task. The use of workflow technology in medicine and medical imaging in particular is limited. In this article, we describe the application of a workflow engine to improve workflow in a radiology department. We implemented a DICOM-enabled workflow engine system in our department. We designed it in a way to allow for scalability, reliability, and flexibility. We implemented several workflows, including one that replaced an existing manual workflow and measured the number of examinations prepared in time without and with the workflow system. The system significantly increased the number of examinations prepared in time for clinical review compared to human effort. It also met the design goals defined at its outset. Workflow engines appear to have value as ways to efficiently assure that complex workflows are completed in a timely fashion.
Tavaxy: integrating Taverna and Galaxy workflows with cloud computing support.
Abouelhoda, Mohamed; Issa, Shadi Alaa; Ghanem, Moustafa
2012-05-04
Over the past decade the workflow system paradigm has evolved as an efficient and user-friendly approach for developing complex bioinformatics applications. Two popular workflow systems that have gained acceptance by the bioinformatics community are Taverna and Galaxy. Each system has a large user-base and supports an ever-growing repository of application workflows. However, workflows developed for one system cannot be imported and executed easily on the other. The lack of interoperability is due to differences in the models of computation, workflow languages, and architectures of both systems. This lack of interoperability limits sharing of workflows between the user communities and leads to duplication of development efforts. In this paper, we present Tavaxy, a stand-alone system for creating and executing workflows based on using an extensible set of re-usable workflow patterns. Tavaxy offers a set of new features that simplify and enhance the development of sequence analysis applications: It allows the integration of existing Taverna and Galaxy workflows in a single environment, and supports the use of cloud computing capabilities. The integration of existing Taverna and Galaxy workflows is supported seamlessly at both run-time and design-time levels, based on the concepts of hierarchical workflows and workflow patterns. The use of cloud computing in Tavaxy is flexible, where the users can either instantiate the whole system on the cloud, or delegate the execution of certain sub-workflows to the cloud infrastructure. Tavaxy reduces the workflow development cycle by introducing the use of workflow patterns to simplify workflow creation. It enables the re-use and integration of existing (sub-) workflows from Taverna and Galaxy, and allows the creation of hybrid workflows. Its additional features exploit recent advances in high performance cloud computing to cope with the increasing data size and complexity of analysis.The system can be accessed either through a cloud-enabled web-interface or downloaded and installed to run within the user's local environment. All resources related to Tavaxy are available at http://www.tavaxy.org.
Fernández-de-Manúel, Laura; Díaz-Díaz, Covadonga; Jiménez-Carretero, Daniel; Torres, Miguel; Montoya, María C
2017-05-01
Embryonic stem cells (ESCs) can be established as permanent cell lines, and their potential to differentiate into adult tissues has led to widespread use for studying the mechanisms and dynamics of stem cell differentiation and exploring strategies for tissue repair. Imaging live ESCs during development is now feasible due to advances in optical imaging and engineering of genetically encoded fluorescent reporters; however, a major limitation is the low spatio-temporal resolution of long-term 3-D imaging required for generational and neighboring reconstructions. Here, we present the ESC-Track (ESC-T) workflow, which includes an automated cell and nuclear segmentation and tracking tool for 4-D (3-D + time) confocal image data sets as well as a manual editing tool for visual inspection and error correction. ESC-T automatically identifies cell divisions and membrane contacts for lineage tree and neighborhood reconstruction and computes quantitative features from individual cell entities, enabling analysis of fluorescence signal dynamics and tracking of cell morphology and motion. We use ESC-T to examine Myc intensity fluctuations in the context of mouse ESC (mESC) lineage and neighborhood relationships. ESC-T is a powerful tool for evaluation of the genealogical and microenvironmental cues that maintain ESC fitness.
Lowering the Barrier to Reproducible Research by Publishing Provenance from Common Analytical Tools
NASA Astrophysics Data System (ADS)
Jones, M. B.; Slaughter, P.; Walker, L.; Jones, C. S.; Missier, P.; Ludäscher, B.; Cao, Y.; McPhillips, T.; Schildhauer, M.
2015-12-01
Scientific provenance describes the authenticity, origin, and processing history of research products and promotes scientific transparency by detailing the steps in computational workflows that produce derived products. These products include papers, findings, input data, software products to perform computations, and derived data and visualizations. The geosciences community values this type of information, and, at least theoretically, strives to base conclusions on computationally replicable findings. In practice, capturing detailed provenance is laborious and thus has been a low priority; beyond a lab notebook describing methods and results, few researchers capture and preserve detailed records of scientific provenance. We have built tools for capturing and publishing provenance that integrate into analytical environments that are in widespread use by geoscientists (R and Matlab). These tools lower the barrier to provenance generation by automating capture of critical information as researchers prepare data for analysis, develop, test, and execute models, and create visualizations. The 'recordr' library in R and the `matlab-dataone` library in Matlab provide shared functions to capture provenance with minimal changes to normal working procedures. Researchers can capture both scripted and interactive sessions, tag and manage these executions as they iterate over analyses, and then prune and publish provenance metadata and derived products to the DataONE federation of archival repositories. Provenance traces conform to the ProvONE model extension of W3C PROV, enabling interoperability across tools and languages. The capture system supports fine-grained versioning of science products and provenance traces. By assigning global identifiers such as DOIs, reseachers can cite the computational processes used to reach findings. And, finally, DataONE has built a web portal to search, browse, and clearly display provenance relationships between input data, the software used to execute analyses and models, and derived data and products that arise from these computations. This provenance is vital to interpretation and understanding of science, and provides an audit trail that researchers can use to understand and replicate computational workflows in the geosciences.
Airborne measurements of NO, NO2, and NO(sub y) as related to NASA's TRACE-A field program
NASA Technical Reports Server (NTRS)
Bradshaw, John; Sandholm, Scott
1995-01-01
The Georgia Tech group's effort on NASA's GTE program and TRACE-A field mission primarily involved analysis and interpretation of the measurement data base obtained during the TRACE-A field campaign. These investigations focused on the distribution of ozone and ozone precursors over the south Atlantic and nearby continental regions of Africa and South Africa. The Transport and Atmospheric Chemistry near the Equator-Atlantic (TRACE-A) Mission was designed with the goal of investigating tropospheric trace gas distributions, sources, and photochemical state over the southern Atlantic. Major scientific issues related to N(x)O(y) tropospheric chemistry addressed in this program included: (1) what controls the tropospheric ozone budget over the southern Atlantic? (2) What are the spatial distributions of CO, CO2, NO, NO2, NO(sub y), O3, NMHC, H2O3, etc. over the southern Atlantic? (3) How does long range transport of long-lived NO(y) compounds affect the more reactive NO(x) budget in southern Atlantic troposphere?
Distribution of trace elements in the coastal sea sediments of Maslinica Bay, Croatia
NASA Astrophysics Data System (ADS)
Mikulic, Nenad; Orescanin, Visnja; Elez, Loris; Pavicic, Ljiljana; Pezelj, Durdica; Lovrencic, Ivanka; Lulic, Stipe
2008-02-01
Spatial distributions of trace elements in the coastal sea sediments and water of Maslinica Bay (Southern Adriatic), Croatia and possible changes in marine flora and foraminifera communities due to pollution were investigated. Macro, micro and trace elements’ distributions in five granulometric fractions were determined for each sediment sample. Bulk sediment samples were also subjected to leaching tests. Elemental concentrations in sediments, sediment extracts and seawater were measured by source excited energy dispersive X-ray fluorescence (EDXRF). Concentrations of the elements Cr, Cu, Zn, and Pb in bulk sediment samples taken in the Maslinica Bay were from 2.1 to over six times enriched when compared with the background level determined for coarse grained carbonate sediments. A low degree of trace elements leaching determined for bulk sediments pointed to strong bonding of trace elements to sediment mineral phases. The analyses of marine flora pointed to higher eutrophication, which disturbs the balance between communities and natural habitats.
Inferring Clinical Workflow Efficiency via Electronic Medical Record Utilization
Chen, You; Xie, Wei; Gunter, Carl A; Liebovitz, David; Mehrotra, Sanjay; Zhang, He; Malin, Bradley
2015-01-01
Complexity in clinical workflows can lead to inefficiency in making diagnoses, ineffectiveness of treatment plans and uninformed management of healthcare organizations (HCOs). Traditional strategies to manage workflow complexity are based on measuring the gaps between workflows defined by HCO administrators and the actual processes followed by staff in the clinic. However, existing methods tend to neglect the influences of EMR systems on the utilization of workflows, which could be leveraged to optimize workflows facilitated through the EMR. In this paper, we introduce a framework to infer clinical workflows through the utilization of an EMR and show how such workflows roughly partition into four types according to their efficiency. Our framework infers workflows at several levels of granularity through data mining technologies. We study four months of EMR event logs from a large medical center, including 16,569 inpatient stays, and illustrate that over approximately 95% of workflows are efficient and that 80% of patients are on such workflows. At the same time, we show that the remaining 5% of workflows may be inefficient due to a variety of factors, such as complex patients. PMID:26958173
Workflow management systems in radiology
NASA Astrophysics Data System (ADS)
Wendler, Thomas; Meetz, Kirsten; Schmidt, Joachim
1998-07-01
In a situation of shrinking health care budgets, increasing cost pressure and growing demands to increase the efficiency and the quality of medical services, health care enterprises are forced to optimize or complete re-design their processes. Although information technology is agreed to potentially contribute to cost reduction and efficiency improvement, the real success factors are the re-definition and automation of processes: Business Process Re-engineering and Workflow Management. In this paper we discuss architectures for the use of workflow management systems in radiology. We propose to move forward from information systems in radiology (RIS, PACS) to Radiology Management Systems, in which workflow functionality (process definitions and process automation) is implemented through autonomous workflow management systems (WfMS). In a workflow oriented architecture, an autonomous workflow enactment service communicates with workflow client applications via standardized interfaces. In this paper, we discuss the need for and the benefits of such an approach. The separation of workflow management system and application systems is emphasized, and the consequences that arise for the architecture of workflow oriented information systems. This includes an appropriate workflow terminology, and the definition of standard interfaces for workflow aware application systems. Workflow studies in various institutions have shown that most of the processes in radiology are well structured and suited for a workflow management approach. Numerous commercially available Workflow Management Systems (WfMS) were investigated, and some of them, which are process- oriented and application independent, appear suitable for use in radiology.
speaq 2.0: A complete workflow for high-throughput 1D NMR spectra processing and quantification.
Beirnaert, Charlie; Meysman, Pieter; Vu, Trung Nghia; Hermans, Nina; Apers, Sandra; Pieters, Luc; Covaci, Adrian; Laukens, Kris
2018-03-01
Nuclear Magnetic Resonance (NMR) spectroscopy is, together with liquid chromatography-mass spectrometry (LC-MS), the most established platform to perform metabolomics. In contrast to LC-MS however, NMR data is predominantly being processed with commercial software. Meanwhile its data processing remains tedious and dependent on user interventions. As a follow-up to speaq, a previously released workflow for NMR spectral alignment and quantitation, we present speaq 2.0. This completely revised framework to automatically analyze 1D NMR spectra uses wavelets to efficiently summarize the raw spectra with minimal information loss or user interaction. The tool offers a fast and easy workflow that starts with the common approach of peak-picking, followed by grouping, thus avoiding the binning step. This yields a matrix consisting of features, samples and peak values that can be conveniently processed either by using included multivariate statistical functions or by using many other recently developed methods for NMR data analysis. speaq 2.0 facilitates robust and high-throughput metabolomics based on 1D NMR but is also compatible with other NMR frameworks or complementary LC-MS workflows. The methods are benchmarked using a simulated dataset and two publicly available datasets. speaq 2.0 is distributed through the existing speaq R package to provide a complete solution for NMR data processing. The package and the code for the presented case studies are freely available on CRAN (https://cran.r-project.org/package=speaq) and GitHub (https://github.com/beirnaert/speaq).
speaq 2.0: A complete workflow for high-throughput 1D NMR spectra processing and quantification
Pieters, Luc; Covaci, Adrian
2018-01-01
Nuclear Magnetic Resonance (NMR) spectroscopy is, together with liquid chromatography-mass spectrometry (LC-MS), the most established platform to perform metabolomics. In contrast to LC-MS however, NMR data is predominantly being processed with commercial software. Meanwhile its data processing remains tedious and dependent on user interventions. As a follow-up to speaq, a previously released workflow for NMR spectral alignment and quantitation, we present speaq 2.0. This completely revised framework to automatically analyze 1D NMR spectra uses wavelets to efficiently summarize the raw spectra with minimal information loss or user interaction. The tool offers a fast and easy workflow that starts with the common approach of peak-picking, followed by grouping, thus avoiding the binning step. This yields a matrix consisting of features, samples and peak values that can be conveniently processed either by using included multivariate statistical functions or by using many other recently developed methods for NMR data analysis. speaq 2.0 facilitates robust and high-throughput metabolomics based on 1D NMR but is also compatible with other NMR frameworks or complementary LC-MS workflows. The methods are benchmarked using a simulated dataset and two publicly available datasets. speaq 2.0 is distributed through the existing speaq R package to provide a complete solution for NMR data processing. The package and the code for the presented case studies are freely available on CRAN (https://cran.r-project.org/package=speaq) and GitHub (https://github.com/beirnaert/speaq). PMID:29494588
A data distributed parallel algorithm for ray-traced volume rendering
NASA Technical Reports Server (NTRS)
Ma, Kwan-Liu; Painter, James S.; Hansen, Charles D.; Krogh, Michael F.
1993-01-01
This paper presents a divide-and-conquer ray-traced volume rendering algorithm and a parallel image compositing method, along with their implementation and performance on the Connection Machine CM-5, and networked workstations. This algorithm distributes both the data and the computations to individual processing units to achieve fast, high-quality rendering of high-resolution data. The volume data, once distributed, is left intact. The processing nodes perform local ray tracing of their subvolume concurrently. No communication between processing units is needed during this locally ray-tracing process. A subimage is generated by each processing unit and the final image is obtained by compositing subimages in the proper order, which can be determined a priori. Test results on both the CM-5 and a group of networked workstations demonstrate the practicality of our rendering algorithm and compositing method.
Redesign of Library Workflows: Experimental Models for Electronic Resource Description.
ERIC Educational Resources Information Center
Calhoun, Karen
This paper explores the potential for and progress of a gradual transition from a highly centralized model for cataloging to an iterative, collaborative, and broadly distributed model for electronic resource description. The purpose is to alert library managers to some experiments underway and to help them conceptualize new methods for defining,…
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bush, Brian W; Brunhart-Lupo, Nicholas J; Gruchalla, Kenny M
This brochure describes a system dynamics simulation (SD) framework that supports an end-to-end analysis workflow that is optimized for deployment on ESIF facilities(Peregrine and the Insight Center). It includes (I) parallel and distributed simulation of SD models, (ii) real-time 3D visualization of running simulations, and (iii) comprehensive database-oriented persistence of simulation metadata, inputs, and outputs.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bush, Brian W; Brunhart-Lupo, Nicholas J; Gruchalla, Kenny M
This presentation describes a system dynamics simulation (SD) framework that supports an end-to-end analysis workflow that is optimized for deployment on ESIF facilities(Peregrine and the Insight Center). It includes (I) parallel and distributed simulation of SD models, (ii) real-time 3D visualization of running simulations, and (iii) comprehensive database-oriented persistence of simulation metadata, inputs, and outputs.
Bioinformatics workflows and web services in systems biology made easy for experimentalists.
Jimenez, Rafael C; Corpas, Manuel
2013-01-01
Workflows are useful to perform data analysis and integration in systems biology. Workflow management systems can help users create workflows without any previous knowledge in programming and web services. However the computational skills required to build such workflows are usually above the level most biological experimentalists are comfortable with. In this chapter we introduce workflow management systems that reuse existing workflows instead of creating them, making it easier for experimentalists to perform computational tasks.
The VERCE platform: Enabling Computational Seismology via Streaming Workflows and Science Gateways
NASA Astrophysics Data System (ADS)
Spinuso, Alessandro; Filgueira, Rosa; Krause, Amrey; Matser, Jonas; Casarotti, Emanuele; Magnoni, Federica; Gemund, Andre; Frobert, Laurent; Krischer, Lion; Atkinson, Malcolm
2015-04-01
The VERCE project is creating an e-Science platform to facilitate innovative data analysis and coding methods that fully exploit the wealth of data in global seismology. One of the technologies developed within the project is the Dispel4Py python library, which allows to describe abstract stream-based workflows for data-intensive applications and to execute them in a distributed environment. At runtime Dispel4Py is able to map workflow descriptions dynamically onto a number of computational resources (Apache Storm clusters, MPI powered clusters, and shared-memory multi-core machines, single-core machines), setting it apart from other workflow frameworks. Therefore, Dispel4Py enables scientists to focus on their computation instead of being distracted by details of the computing infrastructure they use. Among the workflows developed with Dispel4Py in VERCE, we mention here those for Seismic Ambient Noise Cross-Correlation and MISFIT calculation, which address two data-intensive problems that are common in computational seismology. The former, also called Passive Imaging, allows the detection of relative seismic-wave velocity variations during the time of recording, to be associated with the stress-field changes that occurred in the test area. The MISFIT instead, takes as input the synthetic seismograms generated from HPC simulations for a certain Earth model and earthquake and, after a preprocessing stage, compares them with real observations in order to foster subsequent model updates and improvement (Inversion). The VERCE Science Gateway exposes the MISFIT calculation workflow as a service, in combination with the simulation phase. Both phases can be configured, controlled and monitored by the user via a rich user interface which is integrated within the gUSE Science Gateway framework, hiding the complexity of accessing third parties data services, security mechanisms and enactment on the target resources. Thanks to a modular extension to the Dispel4Py framework, the system collects provenance data adopting the W3C-PROV data model. Provenance recordings can be explored and analysed at run time for rapid diagnostic and workflow steering, or later for further validation and comparisons across runs. We will illustrate the interactive services of the gateway and the capabilities of the produced metadata, coupled with the VERCE data management layer based on iRODS. The Cross-Correlation workflow was evaluated on SuperMUC, a supercomputing cluster at the Leibniz Supercomputing Centre in Munich, with 155,656 processor cores in 9400 compute nodes. SuperMUC is based on the Intel Xeon architecture consisting of 18 Thin Node Islands and one Fat Node Island. This work has only had access to the Thin Node Islands, which contain Sandy Bridge nodes, each having 16 cores and 32 GB of memory. In the evaluations we used 1000 stations, and we applied two types of methods (whiten and non-whiten) for pre-processing the data. The workflow was tested on a varying number of cores (16, 32, 64, 128, and 256 cores) using the MPI mapping of Dispel4Py. The results show that Dispel4Py is able to improve the performance by increasing the number of cores without changing the description of the workflow.
A workflow learning model to improve geovisual analytics utility
Roth, Robert E; MacEachren, Alan M; McCabe, Craig A
2011-01-01
Introduction This paper describes the design and implementation of the G-EX Portal Learn Module, a web-based, geocollaborative application for organizing and distributing digital learning artifacts. G-EX falls into the broader context of geovisual analytics, a new research area with the goal of supporting visually-mediated reasoning about large, multivariate, spatiotemporal information. Because this information is unprecedented in amount and complexity, GIScientists are tasked with the development of new tools and techniques to make sense of it. Our research addresses the challenge of implementing these geovisual analytics tools and techniques in a useful manner. Objectives The objective of this paper is to develop and implement a method for improving the utility of geovisual analytics software. The success of software is measured by its usability (i.e., how easy the software is to use?) and utility (i.e., how useful the software is). The usability and utility of software can be improved by refining the software, increasing user knowledge about the software, or both. It is difficult to achieve transparent usability (i.e., software that is immediately usable without training) of geovisual analytics software because of the inherent complexity of the included tools and techniques. In these situations, improving user knowledge about the software through the provision of learning artifacts is as important, if not more so, than iterative refinement of the software itself. Therefore, our approach to improving utility is focused on educating the user. Methodology The research reported here was completed in two steps. First, we developed a model for learning about geovisual analytics software. Many existing digital learning models assist only with use of the software to complete a specific task and provide limited assistance with its actual application. To move beyond task-oriented learning about software use, we propose a process-oriented approach to learning based on the concept of scientific workflows. Second, we implemented an interface in the G-EX Portal Learn Module to demonstrate the workflow learning model. The workflow interface allows users to drag learning artifacts uploaded to the G-EX Portal onto a central whiteboard and then annotate the workflow using text and drawing tools. Once completed, users can visit the assembled workflow to get an idea of the kind, number, and scale of analysis steps, view individual learning artifacts associated with each node in the workflow, and ask questions about the overall workflow or individual learning artifacts through the associated forums. An example learning workflow in the domain of epidemiology is provided to demonstrate the effectiveness of the approach. Results/Conclusions In the context of geovisual analytics, GIScientists are not only responsible for developing software to facilitate visually-mediated reasoning about large and complex spatiotemporal information, but also for ensuring that this software works. The workflow learning model discussed in this paper and demonstrated in the G-EX Portal Learn Module is one approach to improving the utility of geovisual analytics software. While development of the G-EX Portal Learn Module is ongoing, we expect to release the G-EX Portal Learn Module by Summer 2009. PMID:21983545
A workflow learning model to improve geovisual analytics utility.
Roth, Robert E; Maceachren, Alan M; McCabe, Craig A
2009-01-01
INTRODUCTION: This paper describes the design and implementation of the G-EX Portal Learn Module, a web-based, geocollaborative application for organizing and distributing digital learning artifacts. G-EX falls into the broader context of geovisual analytics, a new research area with the goal of supporting visually-mediated reasoning about large, multivariate, spatiotemporal information. Because this information is unprecedented in amount and complexity, GIScientists are tasked with the development of new tools and techniques to make sense of it. Our research addresses the challenge of implementing these geovisual analytics tools and techniques in a useful manner. OBJECTIVES: The objective of this paper is to develop and implement a method for improving the utility of geovisual analytics software. The success of software is measured by its usability (i.e., how easy the software is to use?) and utility (i.e., how useful the software is). The usability and utility of software can be improved by refining the software, increasing user knowledge about the software, or both. It is difficult to achieve transparent usability (i.e., software that is immediately usable without training) of geovisual analytics software because of the inherent complexity of the included tools and techniques. In these situations, improving user knowledge about the software through the provision of learning artifacts is as important, if not more so, than iterative refinement of the software itself. Therefore, our approach to improving utility is focused on educating the user. METHODOLOGY: The research reported here was completed in two steps. First, we developed a model for learning about geovisual analytics software. Many existing digital learning models assist only with use of the software to complete a specific task and provide limited assistance with its actual application. To move beyond task-oriented learning about software use, we propose a process-oriented approach to learning based on the concept of scientific workflows. Second, we implemented an interface in the G-EX Portal Learn Module to demonstrate the workflow learning model. The workflow interface allows users to drag learning artifacts uploaded to the G-EX Portal onto a central whiteboard and then annotate the workflow using text and drawing tools. Once completed, users can visit the assembled workflow to get an idea of the kind, number, and scale of analysis steps, view individual learning artifacts associated with each node in the workflow, and ask questions about the overall workflow or individual learning artifacts through the associated forums. An example learning workflow in the domain of epidemiology is provided to demonstrate the effectiveness of the approach. RESULTS/CONCLUSIONS: In the context of geovisual analytics, GIScientists are not only responsible for developing software to facilitate visually-mediated reasoning about large and complex spatiotemporal information, but also for ensuring that this software works. The workflow learning model discussed in this paper and demonstrated in the G-EX Portal Learn Module is one approach to improving the utility of geovisual analytics software. While development of the G-EX Portal Learn Module is ongoing, we expect to release the G-EX Portal Learn Module by Summer 2009.
BioVeL: a virtual laboratory for data analysis and modelling in biodiversity science and ecology.
Hardisty, Alex R; Bacall, Finn; Beard, Niall; Balcázar-Vargas, Maria-Paula; Balech, Bachir; Barcza, Zoltán; Bourlat, Sarah J; De Giovanni, Renato; de Jong, Yde; De Leo, Francesca; Dobor, Laura; Donvito, Giacinto; Fellows, Donal; Guerra, Antonio Fernandez; Ferreira, Nuno; Fetyukova, Yuliya; Fosso, Bruno; Giddy, Jonathan; Goble, Carole; Güntsch, Anton; Haines, Robert; Ernst, Vera Hernández; Hettling, Hannes; Hidy, Dóra; Horváth, Ferenc; Ittzés, Dóra; Ittzés, Péter; Jones, Andrew; Kottmann, Renzo; Kulawik, Robert; Leidenberger, Sonja; Lyytikäinen-Saarenmaa, Päivi; Mathew, Cherian; Morrison, Norman; Nenadic, Aleksandra; de la Hidalga, Abraham Nieva; Obst, Matthias; Oostermeijer, Gerard; Paymal, Elisabeth; Pesole, Graziano; Pinto, Salvatore; Poigné, Axel; Fernandez, Francisco Quevedo; Santamaria, Monica; Saarenmaa, Hannu; Sipos, Gergely; Sylla, Karl-Heinz; Tähtinen, Marko; Vicario, Saverio; Vos, Rutger Aldo; Williams, Alan R; Yilmaz, Pelin
2016-10-20
Making forecasts about biodiversity and giving support to policy relies increasingly on large collections of data held electronically, and on substantial computational capability and capacity to analyse, model, simulate and predict using such data. However, the physically distributed nature of data resources and of expertise in advanced analytical tools creates many challenges for the modern scientist. Across the wider biological sciences, presenting such capabilities on the Internet (as "Web services") and using scientific workflow systems to compose them for particular tasks is a practical way to carry out robust "in silico" science. However, use of this approach in biodiversity science and ecology has thus far been quite limited. BioVeL is a virtual laboratory for data analysis and modelling in biodiversity science and ecology, freely accessible via the Internet. BioVeL includes functions for accessing and analysing data through curated Web services; for performing complex in silico analysis through exposure of R programs, workflows, and batch processing functions; for on-line collaboration through sharing of workflows and workflow runs; for experiment documentation through reproducibility and repeatability; and for computational support via seamless connections to supporting computing infrastructures. We developed and improved more than 60 Web services with significant potential in many different kinds of data analysis and modelling tasks. We composed reusable workflows using these Web services, also incorporating R programs. Deploying these tools into an easy-to-use and accessible 'virtual laboratory', free via the Internet, we applied the workflows in several diverse case studies. We opened the virtual laboratory for public use and through a programme of external engagement we actively encouraged scientists and third party application and tool developers to try out the services and contribute to the activity. Our work shows we can deliver an operational, scalable and flexible Internet-based virtual laboratory to meet new demands for data processing and analysis in biodiversity science and ecology. In particular, we have successfully integrated existing and popular tools and practices from different scientific disciplines to be used in biodiversity and ecological research.
Tavaxy: Integrating Taverna and Galaxy workflows with cloud computing support
2012-01-01
Background Over the past decade the workflow system paradigm has evolved as an efficient and user-friendly approach for developing complex bioinformatics applications. Two popular workflow systems that have gained acceptance by the bioinformatics community are Taverna and Galaxy. Each system has a large user-base and supports an ever-growing repository of application workflows. However, workflows developed for one system cannot be imported and executed easily on the other. The lack of interoperability is due to differences in the models of computation, workflow languages, and architectures of both systems. This lack of interoperability limits sharing of workflows between the user communities and leads to duplication of development efforts. Results In this paper, we present Tavaxy, a stand-alone system for creating and executing workflows based on using an extensible set of re-usable workflow patterns. Tavaxy offers a set of new features that simplify and enhance the development of sequence analysis applications: It allows the integration of existing Taverna and Galaxy workflows in a single environment, and supports the use of cloud computing capabilities. The integration of existing Taverna and Galaxy workflows is supported seamlessly at both run-time and design-time levels, based on the concepts of hierarchical workflows and workflow patterns. The use of cloud computing in Tavaxy is flexible, where the users can either instantiate the whole system on the cloud, or delegate the execution of certain sub-workflows to the cloud infrastructure. Conclusions Tavaxy reduces the workflow development cycle by introducing the use of workflow patterns to simplify workflow creation. It enables the re-use and integration of existing (sub-) workflows from Taverna and Galaxy, and allows the creation of hybrid workflows. Its additional features exploit recent advances in high performance cloud computing to cope with the increasing data size and complexity of analysis. The system can be accessed either through a cloud-enabled web-interface or downloaded and installed to run within the user's local environment. All resources related to Tavaxy are available at http://www.tavaxy.org. PMID:22559942
Cloern, James E.; Cole, Brian E.; Caffrey, J.M.
1996-01-01
In this report, we focus on selection of an “optimum” station configuration for the channel of San Francisco Bay for vertical profiling of water quality. Our analysis is based on the monthly cruises conducted by the USGS under the auspices of the Regional Monitoring Program for Trace Substances (Caffrey et al. 1994; SFEI 1994). The underlying rationale for undertaking the analysis is that the distribution of trace substances is structured, at least in part, by the same forces acting on water quality parameters. This must be true to some extent, as trace substance concentrations are partially dependent on water quality characteristics such as salinity. On the other hand, the quantitative importance of these parameters in accounting for overall variability in individual trace substances is unknown. Furthermore, trace substances have their own unique sources, and these sources may dominate their distribution.
SoS Notebook: An Interactive Multi-Language Data Analysis Environment.
Peng, Bo; Wang, Gao; Ma, Jun; Leong, Man Chong; Wakefield, Chris; Melott, James; Chiu, Yulun; Du, Di; Weinstein, John N
2018-05-22
Complex bioinformatic data analysis workflows involving multiple scripts in different languages can be difficult to consolidate, share, and reproduce. An environment that streamlines the entire processes of data collection, analysis, visualization and reporting of such multi-language analyses is currently lacking. We developed Script of Scripts (SoS) Notebook, a web-based notebook environment that allows the use of multiple scripting language in a single notebook, with data flowing freely within and across languages. SoS Notebook enables researchers to perform sophisticated bioinformatic analysis using the most suitable tools for different parts of the workflow, without the limitations of a particular language or complications of cross-language communications. SoS Notebook is hosted at http://vatlab.github.io/SoS/ and is distributed under a BSD license. bpeng@mdanderson.org.
Big Data Provenance: Challenges, State of the Art and Opportunities
Wang, Jianwu; Crawl, Daniel; Purawat, Shweta; Nguyen, Mai; Altintas, Ilkay
2017-01-01
Ability to track provenance is a key feature of scientific workflows to support data lineage and reproducibility. The challenges that are introduced by the volume, variety and velocity of Big Data, also pose related challenges for provenance and quality of Big Data, defined as veracity. The increasing size and variety of distributed Big Data provenance information bring new technical challenges and opportunities throughout the provenance lifecycle including recording, querying, sharing and utilization. This paper discusses the challenges and opportunities of Big Data provenance related to the veracity of the datasets themselves and the provenance of the analytical processes that analyze these datasets. It also explains our current efforts towards tracking and utilizing Big Data provenance using workflows as a programming model to analyze Big Data. PMID:29399671
Kwf-Grid workflow management system for Earth science applications
NASA Astrophysics Data System (ADS)
Tran, V.; Hluchy, L.
2009-04-01
In this paper, we present workflow management tool for Earth science applications in EGEE. The workflow management tool was originally developed within K-wf Grid project for GT4 middleware and has many advanced features like semi-automatic workflow composition, user-friendly GUI for managing workflows, knowledge management. In EGEE, we are porting the workflow management tool to gLite middleware for Earth science applications K-wf Grid workflow management system was developed within "Knowledge-based Workflow System for Grid Applications" under the 6th Framework Programme. The workflow mangement system intended to - semi-automatically compose a workflow of Grid services, - execute the composed workflow application in a Grid computing environment, - monitor the performance of the Grid infrastructure and the Grid applications, - analyze the resulting monitoring information, - capture the knowledge that is contained in the information by means of intelligent agents, - and finally to reuse the joined knowledge gathered from all participating users in a collaborative way in order to efficiently construct workflows for new Grid applications. Kwf Grid workflow engines can support different types of jobs (e.g. GRAM job, web services) in a workflow. New class of gLite job has been added to the system, allows system to manage and execute gLite jobs in EGEE infrastructure. The GUI has been adapted to the requirements of EGEE users, new credential management servlet is added to portal. Porting K-wf Grid workflow management system to gLite would allow EGEE users to use the system and benefit from its avanced features. The system is primarly tested and evaluated with applications from ES clusters.
Sampling solution traces for the problem of sorting permutations by signed reversals
2012-01-01
Background Traditional algorithms to solve the problem of sorting by signed reversals output just one optimal solution while the space of all optimal solutions can be huge. A so-called trace represents a group of solutions which share the same set of reversals that must be applied to sort the original permutation following a partial ordering. By using traces, we therefore can represent the set of optimal solutions in a more compact way. Algorithms for enumerating the complete set of traces of solutions were developed. However, due to their exponential complexity, their practical use is limited to small permutations. A partial enumeration of traces is a sampling of the complete set of traces and can be an alternative for the study of distinct evolutionary scenarios of big permutations. Ideally, the sampling should be done uniformly from the space of all optimal solutions. This is however conjectured to be ♯P-complete. Results We propose and evaluate three algorithms for producing a sampling of the complete set of traces that instead can be shown in practice to preserve some of the characteristics of the space of all solutions. The first algorithm (RA) performs the construction of traces through a random selection of reversals on the list of optimal 1-sequences. The second algorithm (DFALT) consists in a slight modification of an algorithm that performs the complete enumeration of traces. Finally, the third algorithm (SWA) is based on a sliding window strategy to improve the enumeration of traces. All proposed algorithms were able to enumerate traces for permutations with up to 200 elements. Conclusions We analysed the distribution of the enumerated traces with respect to their height and average reversal length. Various works indicate that the reversal length can be an important aspect in genome rearrangements. The algorithms RA and SWA show a tendency to lose traces with high average reversal length. Such traces are however rare, and qualitatively our results show that, for testable-sized permutations, the algorithms DFALT and SWA produce distributions which approximate the reversal length distributions observed with a complete enumeration of the set of traces. PMID:22704580
Dinov, Ivo D.; Petrosyan, Petros; Liu, Zhizhong; Eggert, Paul; Zamanyan, Alen; Torri, Federica; Macciardi, Fabio; Hobel, Sam; Moon, Seok Woo; Sung, Young Hee; Jiang, Zhiguo; Labus, Jennifer; Kurth, Florian; Ashe-McNalley, Cody; Mayer, Emeran; Vespa, Paul M.; Van Horn, John D.; Toga, Arthur W.
2013-01-01
The volume, diversity and velocity of biomedical data are exponentially increasing providing petabytes of new neuroimaging and genetics data every year. At the same time, tens-of-thousands of computational algorithms are developed and reported in the literature along with thousands of software tools and services. Users demand intuitive, quick and platform-agnostic access to data, software tools, and infrastructure from millions of hardware devices. This explosion of information, scientific techniques, computational models, and technological advances leads to enormous challenges in data analysis, evidence-based biomedical inference and reproducibility of findings. The Pipeline workflow environment provides a crowd-based distributed solution for consistent management of these heterogeneous resources. The Pipeline allows multiple (local) clients and (remote) servers to connect, exchange protocols, control the execution, monitor the states of different tools or hardware, and share complete protocols as portable XML workflows. In this paper, we demonstrate several advanced computational neuroimaging and genetics case-studies, and end-to-end pipeline solutions. These are implemented as graphical workflow protocols in the context of analyzing imaging (sMRI, fMRI, DTI), phenotypic (demographic, clinical), and genetic (SNP) data. PMID:23975276
DOE Office of Scientific and Technical Information (OSTI.GOV)
Copps, Kevin D.
The Sandia Analysis Workbench (SAW) project has developed and deployed a production capability for SIERRA computational mechanics analysis workflows. However, the electrical analysis workflow capability requirements have only been demonstrated in early prototype states, with no real capability deployed for analysts’ use. This milestone aims to improve the electrical analysis workflow capability (via SAW and related tools) and deploy it for ongoing use. We propose to focus on a QASPR electrical analysis calibration workflow use case. We will include a number of new capabilities (versus today’s SAW), such as: 1) support for the XYCE code workflow component, 2) data managementmore » coupled to electrical workflow, 3) human-in-theloop workflow capability, and 4) electrical analysis workflow capability deployed on the restricted (and possibly classified) network at Sandia. While far from the complete set of capabilities required for electrical analysis workflow over the long term, this is a substantial first step toward full production support for the electrical analysts.« less
Haston, Elspeth; Cubey, Robert; Pullan, Martin; Atkins, Hannah; Harris, David J
2012-01-01
Digitisation programmes in many institutes frequently involve disparate and irregular funding, diverse selection criteria and scope, with different members of staff managing and operating the processes. These factors have influenced the decision at the Royal Botanic Garden Edinburgh to develop an integrated workflow for the digitisation of herbarium specimens which is modular and scalable to enable a single overall workflow to be used for all digitisation projects. This integrated workflow is comprised of three principal elements: a specimen workflow, a data workflow and an image workflow.The specimen workflow is strongly linked to curatorial processes which will impact on the prioritisation, selection and preparation of the specimens. The importance of including a conservation element within the digitisation workflow is highlighted. The data workflow includes the concept of three main categories of collection data: label data, curatorial data and supplementary data. It is shown that each category of data has its own properties which influence the timing of data capture within the workflow. Development of software has been carried out for the rapid capture of curatorial data, and optical character recognition (OCR) software is being used to increase the efficiency of capturing label data and supplementary data. The large number and size of the images has necessitated the inclusion of automated systems within the image workflow.
MIDG-Emerging grid technologies for multi-site preclinical molecular imaging research communities.
Lee, Jasper; Documet, Jorge; Liu, Brent; Park, Ryan; Tank, Archana; Huang, H K
2011-03-01
Molecular imaging is the visualization and identification of specific molecules in anatomy for insight into metabolic pathways, tissue consistency, and tracing of solute transport mechanisms. This paper presents the Molecular Imaging Data Grid (MIDG) which utilizes emerging grid technologies in preclinical molecular imaging to facilitate data sharing and discovery between preclinical molecular imaging facilities and their collaborating investigator institutions to expedite translational sciences research. Grid-enabled archiving, management, and distribution of animal-model imaging datasets help preclinical investigators to monitor, access and share their imaging data remotely, and promote preclinical imaging facilities to share published imaging datasets as resources for new investigators. The system architecture of the Molecular Imaging Data Grid is described in a four layer diagram. A data model for preclinical molecular imaging datasets is also presented based on imaging modalities currently used in a molecular imaging center. The MIDG system components and connectivity are presented. And finally, the workflow steps for grid-based archiving, management, and retrieval of preclincial molecular imaging data are described. Initial performance tests of the Molecular Imaging Data Grid system have been conducted at the USC IPILab using dedicated VMware servers. System connectivity, evaluated datasets, and preliminary results are presented. The results show the system's feasibility, limitations, direction of future research. Translational and interdisciplinary research in medicine is increasingly interested in cellular and molecular biology activity at the preclinical levels, utilizing molecular imaging methods on animal models. The task of integrated archiving, management, and distribution of these preclinical molecular imaging datasets at preclinical molecular imaging facilities is challenging due to disparate imaging systems and multiple off-site investigators. A Molecular Imaging Data Grid design, implementation, and initial evaluation is presented to demonstrate the secure and novel data grid solution for sharing preclinical molecular imaging data across the wide-area-network (WAN).
Systems engineering implementation in the preliminary design phase of the Giant Magellan Telescope
NASA Astrophysics Data System (ADS)
Maiten, J.; Johns, M.; Trancho, G.; Sawyer, D.; Mady, P.
2012-09-01
Like many telescope projects today, the 24.5-meter Giant Magellan Telescope (GMT) is truly a complex system. The primary and secondary mirrors of the GMT are segmented and actuated to support two operating modes: natural seeing and adaptive optics. GMT is a general-purpose telescope supporting multiple science instruments operated in those modes. GMT is a large, diverse collaboration and development includes geographically distributed teams. The need to implement good systems engineering processes for managing the development of systems like GMT becomes imperative. The management of the requirements flow down from the science requirements to the component level requirements is an inherently difficult task in itself. The interfaces must also be negotiated so that the interactions between subsystems and assemblies are well defined and controlled. This paper will provide an overview of the systems engineering processes and tools implemented for the GMT project during the preliminary design phase. This will include requirements management, documentation and configuration control, interface development and technical risk management. Because of the complexity of the GMT system and the distributed team, using web-accessible tools for collaboration is vital. To accomplish this GMTO has selected three tools: Cognition Cockpit, Xerox Docushare, and Solidworks Enterprise Product Data Management (EPDM). Key to this is the use of Cockpit for managing and documenting the product tree, architecture, error budget, requirements, interfaces, and risks. Additionally, drawing management is accomplished using an EPDM vault. Docushare, a documentation and configuration management tool is used to manage workflow of documents and drawings for the GMT project. These tools electronically facilitate collaboration in real time, enabling the GMT team to track, trace and report on key project metrics and design parameters.
Bonastre, Alberto; Ors, Rafael
2017-01-01
Monitoring is one of the best ways to evaluate the behavior of computer systems. When the monitored system is a distributed system—such as a wireless sensor network (WSN)—the monitoring operation must also be distributed, providing a distributed trace for further analysis. The temporal sequence of occurrence of the events registered by the distributed monitoring platform (DMP) must be correctly established to provide cause-effect relationships between them, so the logs obtained in different monitor nodes must be synchronized. Many of synchronization mechanisms applied to DMPs consist in adjusting the internal clocks of the nodes to the same value as a reference time. However, these mechanisms can create an incoherent event sequence. This article presents a new method to achieve global synchronization of the traces obtained in a DMP. It is based on periodic synchronization signals that are received by the monitor nodes and logged along with the recorded events. This mechanism processes all traces and generates a global post-synchronized trace by scaling all times registered proportionally according with the synchronization signals. It is intended to be a simple but efficient offline mechanism. Its application in a WSN-DMP demonstrates that it guarantees a correct ordering of the events, avoiding the aforementioned issues. PMID:29295494
Navia, Marlon; Campelo, José Carlos; Bonastre, Alberto; Ors, Rafael
2017-12-23
Monitoring is one of the best ways to evaluate the behavior of computer systems. When the monitored system is a distributed system-such as a wireless sensor network (WSN)-the monitoring operation must also be distributed, providing a distributed trace for further analysis. The temporal sequence of occurrence of the events registered by the distributed monitoring platform (DMP) must be correctly established to provide cause-effect relationships between them, so the logs obtained in different monitor nodes must be synchronized. Many of synchronization mechanisms applied to DMPs consist in adjusting the internal clocks of the nodes to the same value as a reference time. However, these mechanisms can create an incoherent event sequence. This article presents a new method to achieve global synchronization of the traces obtained in a DMP. It is based on periodic synchronization signals that are received by the monitor nodes and logged along with the recorded events. This mechanism processes all traces and generates a global post-synchronized trace by scaling all times registered proportionally according with the synchronization signals. It is intended to be a simple but efficient offline mechanism. Its application in a WSN-DMP demonstrates that it guarantees a correct ordering of the events, avoiding the aforementioned issues.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Veiga, Catarina; Janssens, Guillaume; Teng, Ching-Ling
2016-05-01
Purpose: An adaptive proton therapy workflow using cone beam computed tomography (CBCT) is proposed. It consists of an online evaluation of a fast range-corrected dose distribution based on a virtual CT (vCT) scan. This can be followed by more accurate offline dose recalculation on the vCT scan, which can trigger a rescan CT (rCT) for replanning. Methods and Materials: The workflow was tested retrospectively for 20 consecutive lung cancer patients. A diffeomorphic Morphon algorithm was used to generate the lung vCT by deforming the average planning CT onto the CBCT scan. An additional correction step was applied to account formore » anatomic modifications that cannot be modeled by deformation alone. A set of clinical indicators for replanning were generated according to the water equivalent thickness (WET) and dose statistics and compared with those obtained on the rCT scan. The fast dose approximation consisted of warping the initial planned dose onto the vCT scan according to the changes in WET. The potential under- and over-ranges were assessed as a variation in WET at the target's distal surface. Results: The range-corrected dose from the vCT scan reproduced clinical indicators similar to those of the rCT scan. The workflow performed well under different clinical scenarios, including atelectasis, lung reinflation, and different types of tumor response. Between the vCT and rCT scans, we found a difference in the measured 95% percentile of the over-range distribution of 3.4 ± 2.7 mm. The limitations of the technique consisted of inherent uncertainties in deformable registration and the drawbacks of CBCT imaging. The correction step was adequate when gross errors occurred but could not recover subtle anatomic or density changes in tumors with complex topology. Conclusions: A proton therapy workflow based on CBCT provided clinical indicators similar to those using rCT for patients with lung cancer with considerable anatomic changes.« less
Selenium deficiency risk predicted to increase under future climate change
Jones, Gerrad D.; Droz, Boris; Greve, Peter; Gottschalk, Pia; Poffet, Deyan; McGrath, Steve P.; Seneviratne, Sonia I.; Smith, Pete; Winkel, Lenny H. E.
2017-01-01
Deficiencies of micronutrients, including essential trace elements, affect up to 3 billion people worldwide. The dietary availability of trace elements is determined largely by their soil concentrations. Until now, the mechanisms governing soil concentrations have been evaluated in small-scale studies, which identify soil physicochemical properties as governing variables. However, global concentrations of trace elements and the factors controlling their distributions are virtually unknown. We used 33,241 soil data points to model recent (1980–1999) global distributions of Selenium (Se), an essential trace element that is required for humans. Worldwide, up to one in seven people have been estimated to have low dietary Se intake. Contrary to small-scale studies, soil Se concentrations were dominated by climate–soil interactions. Using moderate climate-change scenarios for 2080–2099, we predicted that changes in climate and soil organic carbon content will lead to overall decreased soil Se concentrations, particularly in agricultural areas; these decreases could increase the prevalence of Se deficiency. The importance of climate–soil interactions to Se distributions suggests that other trace elements with similar retention mechanisms will be similarly affected by climate change. PMID:28223487
Selenium deficiency risk predicted to increase under future climate change.
Jones, Gerrad D; Droz, Boris; Greve, Peter; Gottschalk, Pia; Poffet, Deyan; McGrath, Steve P; Seneviratne, Sonia I; Smith, Pete; Winkel, Lenny H E
2017-03-14
Deficiencies of micronutrients, including essential trace elements, affect up to 3 billion people worldwide. The dietary availability of trace elements is determined largely by their soil concentrations. Until now, the mechanisms governing soil concentrations have been evaluated in small-scale studies, which identify soil physicochemical properties as governing variables. However, global concentrations of trace elements and the factors controlling their distributions are virtually unknown. We used 33,241 soil data points to model recent (1980-1999) global distributions of Selenium (Se), an essential trace element that is required for humans. Worldwide, up to one in seven people have been estimated to have low dietary Se intake. Contrary to small-scale studies, soil Se concentrations were dominated by climate-soil interactions. Using moderate climate-change scenarios for 2080-2099, we predicted that changes in climate and soil organic carbon content will lead to overall decreased soil Se concentrations, particularly in agricultural areas; these decreases could increase the prevalence of Se deficiency. The importance of climate-soil interactions to Se distributions suggests that other trace elements with similar retention mechanisms will be similarly affected by climate change.
FAST COGNITIVE AND TASK ORIENTED, ITERATIVE DATA DISPLAY (FACTOID)
2017-06-01
approaches. As a result, the following assumptions guided our efforts in developing modeling and descriptive metrics for evaluation purposes...Application Evaluation . Our analytic workflow for evaluation is to first provide descriptive statistics about applications across metrics (performance...distributions for evaluation purposes because the goal of evaluation is accurate description , not inference (e.g., prediction). Outliers depicted
ERIC Educational Resources Information Center
Friedrich, Jon M.
2014-01-01
Engaging freshman and sophomore students in meaningful scientific research is challenging because of their developing skill set and their necessary time commitments to regular classwork. A project called the Chondrule Analysis Project was initiated to engage first- and second-year students in an initial research experience and also accomplish…
Millard, Pierre; Massou, Stéphane; Portais, Jean-Charles; Létisse, Fabien
2014-10-21
Mass spectrometry (MS) is widely used for isotopic studies of metabolism in which detailed information about biochemical processes is obtained from the analysis of isotope incorporation into metabolites. The biological value of such experiments is dependent on the accuracy of the isotopic measurements. Using MS, isotopologue distributions are measured from the quantitative analysis of isotopic clusters. These measurements are prone to various biases, which can occur during the experimental workflow and/or MS analysis. The lack of relevant standards limits investigations of the quality of the measured isotopologue distributions. To meet that need, we developed a complete theoretical and experimental framework for the biological production of metabolites with fully controlled and predictable labeling patterns. This strategy is valid for different isotopes and different types of metabolisms and organisms, and was applied to two model microorganisms, Pichia augusta and Escherichia coli, cultivated on (13)C-labeled methanol and acetate as sole carbon source, respectively. The isotopic composition of the substrates was designed to obtain samples in which the isotopologue distribution of all the metabolites should give the binomial coefficients found in Pascal's triangle. The strategy was validated on a liquid chromatography-tandem mass spectrometry (LC-MS/MS) platform by quantifying the complete isotopologue distributions of different intracellular metabolites, which were in close agreement with predictions. This strategy can be used to evaluate entire experimental workflows (from sampling to data processing) or different analytical platforms in the context of isotope labeling experiments.
Scientific Data Management (SDM) Center for Enabling Technologies. 2007-2012
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ludascher, Bertram; Altintas, Ilkay
Over the past five years, our activities have both established Kepler as a viable scientific workflow environment and demonstrated its value across multiple science applications. We have published numerous peer-reviewed papers on the technologies highlighted in this short paper and have given Kepler tutorials at SC06,SC07,SC08,and SciDAC 2007. Our outreach activities have allowed scientists to learn best practices and better utilize Kepler to address their individual workflow problems. Our contributions to advancing the state-of-the-art in scientific workflows have focused on the following areas. Progress in each of these areas is described in subsequent sections. Workflow development. The development of amore » deeper understanding of scientific workflows "in the wild" and of the requirements for support tools that allow easy construction of complex scientific workflows; Generic workflow components and templates. The development of generic actors (i.e.workflow components and processes) which can be broadly applied to scientific problems; Provenance collection and analysis. The design of a flexible provenance collection and analysis infrastructure within the workflow environment; and, Workflow reliability and fault tolerance. The improvement of the reliability and fault-tolerance of workflow environments.« less
JTSA: an open source framework for time series abstractions.
Sacchi, Lucia; Capozzi, Davide; Bellazzi, Riccardo; Larizza, Cristiana
2015-10-01
The evaluation of the clinical status of a patient is frequently based on the temporal evolution of some parameters, making the detection of temporal patterns a priority in data analysis. Temporal abstraction (TA) is a methodology widely used in medical reasoning for summarizing and abstracting longitudinal data. This paper describes JTSA (Java Time Series Abstractor), a framework including a library of algorithms for time series preprocessing and abstraction and an engine to execute a workflow for temporal data processing. The JTSA framework is grounded on a comprehensive ontology that models temporal data processing both from the data storage and the abstraction computation perspective. The JTSA framework is designed to allow users to build their own analysis workflows by combining different algorithms. Thanks to the modular structure of a workflow, simple to highly complex patterns can be detected. The JTSA framework has been developed in Java 1.7 and is distributed under GPL as a jar file. JTSA provides: a collection of algorithms to perform temporal abstraction and preprocessing of time series, a framework for defining and executing data analysis workflows based on these algorithms, and a GUI for workflow prototyping and testing. The whole JTSA project relies on a formal model of the data types and of the algorithms included in the library. This model is the basis for the design and implementation of the software application. Taking into account this formalized structure, the user can easily extend the JTSA framework by adding new algorithms. Results are shown in the context of the EU project MOSAIC to extract relevant patterns from data coming related to the long term monitoring of diabetic patients. The proof that JTSA is a versatile tool to be adapted to different needs is given by its possible uses, both as a standalone tool for data summarization and as a module to be embedded into other architectures to select specific phenotypes based on TAs in a large dataset. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
Total-reflection X-ray fluorescence studies of trace elements in biomedical samples
NASA Astrophysics Data System (ADS)
Kubala-Kukuś, A.; Braziewicz, J.; Pajek, M.
2004-08-01
Application of the total-reflection X-ray fluorescence (TXRF) analysis in the studies of trace element contents in biomedical samples is discussed in the following aspects: (i) a nature of trace element concentration distributions, (ii) censoring approach to the detection limits, and (iii) a comparison of two sets of censored data. The paper summarizes the recent results achieved in this topics, in particular, the lognormal, or more general logstable, nature of concentration distribution of trace elements, the random left-censoring and the Kaplan-Meier approach accounting for detection limits and, finally, the application of the logrank test to compare the censored concentrations measured for two groups. These new aspects, which are of importance for applications of the TXRF in different fields, are discussed here in the context of TXRF studies of trace element in various samples of medical interest.
Haston, Elspeth; Cubey, Robert; Pullan, Martin; Atkins, Hannah; Harris, David J
2012-01-01
Abstract Digitisation programmes in many institutes frequently involve disparate and irregular funding, diverse selection criteria and scope, with different members of staff managing and operating the processes. These factors have influenced the decision at the Royal Botanic Garden Edinburgh to develop an integrated workflow for the digitisation of herbarium specimens which is modular and scalable to enable a single overall workflow to be used for all digitisation projects. This integrated workflow is comprised of three principal elements: a specimen workflow, a data workflow and an image workflow. The specimen workflow is strongly linked to curatorial processes which will impact on the prioritisation, selection and preparation of the specimens. The importance of including a conservation element within the digitisation workflow is highlighted. The data workflow includes the concept of three main categories of collection data: label data, curatorial data and supplementary data. It is shown that each category of data has its own properties which influence the timing of data capture within the workflow. Development of software has been carried out for the rapid capture of curatorial data, and optical character recognition (OCR) software is being used to increase the efficiency of capturing label data and supplementary data. The large number and size of the images has necessitated the inclusion of automated systems within the image workflow. PMID:22859881
Ethylene Trace-gas Techniques for High-speed Flows
NASA Technical Reports Server (NTRS)
Davis, David O.; Reichert, Bruce A.
1994-01-01
Three applications of the ethylene trace-gas technique to high-speed flows are described: flow-field tracking, air-to-air mixing, and bleed mass-flow measurement. The technique involves injecting a non-reacting gas (ethylene) into the flow field and measuring the concentration distribution in a downstream plane. From the distributions, information about flow development, mixing, and mass-flow rates can be dtermined. The trace-gas apparatus and special considerations for use in high-speed flow are discussed. A description of each application, including uncertainty estimates is followed by a demonstrative example.
Qin, Yuan; Michalowski, Andreas; Weber, Rudolf; Yang, Sen; Graf, Thomas; Ni, Xiaowu
2012-11-19
Ray-tracing is the commonly used technique to calculate the absorption of light in laser deep-penetration welding or drilling. Since new lasers with high brilliance enable small capillaries with high aspect ratios, diffraction might become important. To examine the applicability of the ray-tracing method, we studied the total absorptance and the absorbed intensity of polarized beams in several capillary geometries. The ray-tracing results are compared with more sophisticated simulations based on physical optics. The comparison shows that the simple ray-tracing is applicable to calculate the total absorptance in triangular grooves and in conical capillaries but not in rectangular grooves. To calculate the distribution of the absorbed intensity ray-tracing fails due to the neglected interference, diffraction, and the effects of beam propagation in the capillaries with sub-wavelength diameter. If diffraction is avoided e.g. with beams smaller than the entrance pupil of the capillary or with very shallow capillaries, the distribution of the absorbed intensity calculated by ray-tracing corresponds to the local average of the interference pattern found by physical optics.
Temporal bone bank: complying with European Union directives on human tissue and cells.
Van Rompaey, Vincent; Vandamme, Wouter; Muylle, Ludo; Van de Heyning, Paul H
2012-06-01
Availability of allograft tympano-ossicular systems (ATOS) provides unique reconstructive capabilities, allowing more radical removal of middle ear pathology. To provide ATOS, the University of Antwerp Temporal Bone Bank (UATB) was established in 1988. ATOS use was stopped in many countries because of safety issues concerning human tissue transplantation. Our objective was to maintain an ATOS tissue bank complying with European Union (EU) directives on human tissues and cells. The guidelines of the Belgian Superior Health Council, including EU directive requirements, were rigorously applied to UATB infrastructure, workflow protocols and activity. Workflow protocols were updated and an internal audit was performed to check and improve consistency with established quality systems and changing legislations. The Belgian Federal Agency of Medicines and Health Products performed an inspection to examine compliance with national legislatives and EU directives on human tissues and cells. A sample of important procedures was meticulously examined in its workflow setting next to assessment of the infrastructure and personnel. Results are reported on infrastructure, personnel, administrative workflow, procurement, preparation, processing, distribution, internal audit and inspection by the competent authority. Donors procured: 2006, 93 (45.1%); 2007, 64 (20.6%); 2008, 56 (13.1%); 2009, 79 (6.9%). The UATB was approved by the Minister of Health without critical or important shortcomings. The Ministry accords registration each time for 2 years. An ATOS tissue bank complying with EU regulations on human allografts is feasible and critical to assure that the patient receives tissue, which is safe, individually checked and prepared in a suitable environment.
Adaptation of Decoy Fusion Strategy for Existing Multi-Stage Search Workflows
NASA Astrophysics Data System (ADS)
Ivanov, Mark V.; Levitsky, Lev I.; Gorshkov, Mikhail V.
2016-09-01
A number of proteomic database search engines implement multi-stage strategies aiming at increasing the sensitivity of proteome analysis. These approaches often employ a subset of the original database for the secondary stage of analysis. However, if target-decoy approach (TDA) is used for false discovery rate (FDR) estimation, the multi-stage strategies may violate the underlying assumption of TDA that false matches are distributed uniformly across the target and decoy databases. This violation occurs if the numbers of target and decoy proteins selected for the second search are not equal. Here, we propose a method of decoy database generation based on the previously reported decoy fusion strategy. This method allows unbiased TDA-based FDR estimation in multi-stage searches and can be easily integrated into existing workflows utilizing popular search engines and post-search algorithms.
RISA: Remote Interface for Science Analysis
NASA Astrophysics Data System (ADS)
Gabriel, C.; Ibarra, A.; de La Calle, I.; Salgado, J.; Osuna, P.; Tapiador, D.
2008-08-01
The Scientific Analysis System (SAS) is the package for interactive and pipeline data reduction of all XMM-Newton data. Freely distributed by ESA to run under many different operating systems, the SAS has been used by almost every one of the 1600 refereed scientific publications obtained so far from the mission. We are developing RISA, the Remote Interface for Science Analysis, which makes it possible to run SAS through fully configurable web service workflows, enabling observers to access and analyse data making use of all of the existing SAS functionalities, without any installation/download of software/data. The workflows run primarily but not exclusively on the ESAC Grid, which offers scalable processing resources, directly connected to the XMM-Newton Science Archive. A first project internal version of RISA was issued in May 2007, a public release is expected already within this year.
NASA Astrophysics Data System (ADS)
Clempner, Julio B.
2017-01-01
This paper presents a novel analytical method for soundness verification of workflow nets and reset workflow nets, using the well-known stability results of Lyapunov for Petri nets. We also prove that the soundness property is decidable for workflow nets and reset workflow nets. In addition, we provide evidence of several outcomes related with properties such as boundedness, liveness, reversibility and blocking using stability. Our approach is validated theoretically and by a numerical example related to traffic signal-control synchronisation.
Monitoring data transfer latency in CMS computing operations
Bonacorsi, Daniele; Diotalevi, Tommaso; Magini, Nicolo; ...
2015-12-23
During the first LHC run, the CMS experiment collected tens of Petabytes of collision and simulated data, which need to be distributed among dozens of computing centres with low latency in order to make efficient use of the resources. While the desired level of throughput has been successfully achieved, it is still common to observe transfer workflows that cannot reach full completion in a timely manner due to a small fraction of stuck files which require operator intervention.For this reason, in 2012 the CMS transfer management system, PhEDEx, was instrumented with a monitoring system to measure file transfer latencies, andmore » to predict the completion time for the transfer of a data set. The operators can detect abnormal patterns in transfer latencies while the transfer is still in progress, and monitor the long-term performance of the transfer infrastructure to plan the data placement strategy.Based on the data collected for one year with the latency monitoring system, we present a study on the different factors that contribute to transfer completion time. As case studies, we analyze several typical CMS transfer workflows, such as distribution of collision event data from CERN or upload of simulated event data from the Tier-2 centres to the archival Tier-1 centres. For each workflow, we present the typical patterns of transfer latencies that have been identified with the latency monitor.We identify the areas in PhEDEx where a development effort can reduce the latency, and we show how we are able to detect stuck transfers which need operator intervention. Lastly, we propose a set of metrics to alert about stuck subscriptions and prompt for manual intervention, with the aim of improving transfer completion times.« less
Monitoring data transfer latency in CMS computing operations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bonacorsi, Daniele; Diotalevi, Tommaso; Magini, Nicolo
During the first LHC run, the CMS experiment collected tens of Petabytes of collision and simulated data, which need to be distributed among dozens of computing centres with low latency in order to make efficient use of the resources. While the desired level of throughput has been successfully achieved, it is still common to observe transfer workflows that cannot reach full completion in a timely manner due to a small fraction of stuck files which require operator intervention.For this reason, in 2012 the CMS transfer management system, PhEDEx, was instrumented with a monitoring system to measure file transfer latencies, andmore » to predict the completion time for the transfer of a data set. The operators can detect abnormal patterns in transfer latencies while the transfer is still in progress, and monitor the long-term performance of the transfer infrastructure to plan the data placement strategy.Based on the data collected for one year with the latency monitoring system, we present a study on the different factors that contribute to transfer completion time. As case studies, we analyze several typical CMS transfer workflows, such as distribution of collision event data from CERN or upload of simulated event data from the Tier-2 centres to the archival Tier-1 centres. For each workflow, we present the typical patterns of transfer latencies that have been identified with the latency monitor.We identify the areas in PhEDEx where a development effort can reduce the latency, and we show how we are able to detect stuck transfers which need operator intervention. Lastly, we propose a set of metrics to alert about stuck subscriptions and prompt for manual intervention, with the aim of improving transfer completion times.« less
Walsh, Kristin E; Chui, Michelle Anne; Kieser, Mara A; Williams, Staci M; Sutter, Susan L; Sutter, John G
2011-01-01
To explore community pharmacy technician workflow change after implementation of an automated robotic prescription-filling device. At an independent community pharmacy in rural Mayville, WI, pharmacy technicians were observed before and 3 months after installation of an automated robotic prescription-filling device. The main outcome measures were sequences and timing of technician workflow steps, workflow interruptions, automation surprises, and workarounds. Of the 77 and 80 observations made before and 3 months after robot installation, respectively, 17 different workflow sequences were observed before installation and 38 after installation. Average prescription filling time was reduced by 40 seconds per prescription with use of the robot. Workflow interruptions per observation increased from 1.49 to 1.79 (P = 0.11), and workarounds increased from 10% to 36% after robot use. Although automated prescription-filling devices can increase efficiency, workflow interruptions and workarounds may negate that efficiency. Assessing changes in workflow and sequencing of tasks that may result from the use of automation can help uncover opportunities for workflow policy and procedure redesign.
USDA-ARS?s Scientific Manuscript database
Two micrometeorological techniques for measuring trace gas emission rates from distributed area sources were evaluated using a variety of synthetic area sources. The accuracy of the vertical radial plume mapping (VRPM) and the backward Lagrangian (bLS) techniques with an open-path optical spectrosco...
Raef, A.
2009-01-01
The recent proliferation of the 3D reflection seismic method into the near-surface area of geophysical applications, especially in response to the emergence of the need to comprehensively characterize and monitor near-surface carbon dioxide sequestration in shallow saline aquifers around the world, justifies the emphasis on cost-effective and robust quality control and assurance (QC/QA) workflow of 3D seismic data preprocessing that is suitable for near-surface applications. The main purpose of our seismic data preprocessing QC is to enable the use of appropriate header information, data that are free of noise-dominated traces, and/or flawed vertical stacking in subsequent processing steps. In this article, I provide an account of utilizing survey design specifications, noise properties, first breaks, and normal moveout for rapid and thorough graphical QC/QA diagnostics, which are easy to apply and efficient in the diagnosis of inconsistencies. A correlated vibroseis time-lapse 3D-seismic data set from a CO2-flood monitoring survey is used for demonstrating QC diagnostics. An important by-product of the QC workflow is establishing the number of layers for a refraction statics model in a data-driven graphical manner that capitalizes on the spatial coverage of the 3D seismic data. ?? China University of Geosciences (Wuhan) and Springer-Verlag GmbH 2009.
Zhu, Ying; Dou, Maowei; Piehowski, Paul D; Liang, Yiran; Wang, Fangjun; Chu, Rosalie K; Chrisler, Will; Smith, Jordan N; Schwarz, Kaitlynn C; Shen, Yufeng; Shukla, Anil K; Moore, Ronald J; Smith, Richard D; Qian, Wei-Jun; Kelly, Ryan T
2018-06-24
Current mass spectrometry (MS)-based proteomics approaches are ineffective for mapping protein expression in tissue sections with high spatial resolution due to the limited overall sensitivity of conventional workflows. Here we report an integrated and automated method to advance spatially resolved proteomics by seamlessly coupling laser capture microdissection (LCM) with a recently developed nanoliter-scale sample preparation system termed nanoPOTS (Nanodroplet Processing in One pot for Trace Samples). The workflow is enabled by prepopulating nanowells with DMSO, which serves as a sacrificial capture liquid for microdissected tissues. The DMSO droplets efficiently collect laser-pressure catapulted LCM tissues as small as 20 µm in diameter with success rates >87%. We also demonstrate that tissue treatment with DMSO can significantly improve proteome coverage, likely due to its ability to dissolve lipids from tissue and enhance protein extraction efficiency. The LCM-nanoPOTS platform was able to identify 180, 695, and 1827 protein groups on average from 12-µm-thick rat brain cortex tissue sections with diameters of 50, 100, and 200 µm, respectively. We also analyzed 100-µm-diameter sections corresponding to 10-18 cells from three different regions of rat brain and comparatively quantified ~1000 proteins, demonstrating the potential utility for high-resolution spatially resolved mapping of protein expression in tissues. Published under license by The American Society for Biochemistry and Molecular Biology, Inc.
Generic worklist handler for workflow-enabled products
NASA Astrophysics Data System (ADS)
Schmidt, Joachim; Meetz, Kirsten; Wendler, Thomas
1999-07-01
Workflow management (WfM) is an emerging field of medical information technology. It appears as a promising key technology to model, optimize and automate processes, for the sake of improved efficiency, reduced costs and improved patient care. The Application of WfM concepts requires the standardization of architectures and interfaces. A component of central interest proposed in this report is a generic work list handler: A standardized interface between a workflow enactment service and application system. Application systems with embedded work list handlers will be called 'Workflow Enabled Application Systems'. In this paper we discus functional requirements of work list handlers, as well as their integration into workflow architectures and interfaces. To lay the foundation for this specification, basic workflow terminology, the fundamentals of workflow management and - later in the paper - the available standards as defined by the Workflow Management Coalition are briefly reviewed.
NASA Astrophysics Data System (ADS)
Kumlander, Deniss
The globalization of companies operations and competitor between software vendors demand improving quality of delivered software and decreasing the overall cost. The same in fact introduce a lot of problem into software development process as produce distributed organization breaking the co-location rule of modern software development methodologies. Here we propose a reformulation of the ambassador position increasing its productivity in order to bridge communication and workflow gap by managing the entire communication process rather than concentrating purely on the communication result.
A soil sampling reference site: the challenge in defining reference material for sampling.
de Zorzi, Paolo; Barbizzi, Sabrina; Belli, Maria; Fajgelj, Ales; Jacimovic, Radojko; Jeran, Zvonka; Sansone, Umberto; van der Perk, Marcel
2008-11-01
In the frame of the international SOILSAMP project, funded and coordinated by the Italian Environmental Protection Agency, an agricultural area was established as a reference site suitable for performing soil sampling inter-comparison exercises. The reference site was characterized for trace element content in soil, in terms of the spatial and temporal variability of their mass fraction. Considering that the behaviour of long-lived radionuclides in soil can be expected to be similar to that of some stable trace elements and that the distribution of these trace elements in soil can simulate the distribution of radionuclides, the reference site characterised in term of trace elements, can be also used to compare the soil sampling strategies developed for radionuclide investigations.
Pronk, Sander; Pouya, Iman; Lundborg, Magnus; Rotskoff, Grant; Wesén, Björn; Kasson, Peter M; Lindahl, Erik
2015-06-09
Computational chemistry and other simulation fields are critically dependent on computing resources, but few problems scale efficiently to the hundreds of thousands of processors available in current supercomputers-particularly for molecular dynamics. This has turned into a bottleneck as new hardware generations primarily provide more processing units rather than making individual units much faster, which simulation applications are addressing by increasingly focusing on sampling with algorithms such as free-energy perturbation, Markov state modeling, metadynamics, or milestoning. All these rely on combining results from multiple simulations into a single observation. They are potentially powerful approaches that aim to predict experimental observables directly, but this comes at the expense of added complexity in selecting sampling strategies and keeping track of dozens to thousands of simulations and their dependencies. Here, we describe how the distributed execution framework Copernicus allows the expression of such algorithms in generic workflows: dataflow programs. Because dataflow algorithms explicitly state dependencies of each constituent part, algorithms only need to be described on conceptual level, after which the execution is maximally parallel. The fully automated execution facilitates the optimization of these algorithms with adaptive sampling, where undersampled regions are automatically detected and targeted without user intervention. We show how several such algorithms can be formulated for computational chemistry problems, and how they are executed efficiently with many loosely coupled simulations using either distributed or parallel resources with Copernicus.
Workflow as a Service in the Cloud: Architecture and Scheduling Algorithms.
Wang, Jianwu; Korambath, Prakashan; Altintas, Ilkay; Davis, Jim; Crawl, Daniel
2014-01-01
With more and more workflow systems adopting cloud as their execution environment, it becomes increasingly challenging on how to efficiently manage various workflows, virtual machines (VMs) and workflow execution on VM instances. To make the system scalable and easy-to-extend, we design a Workflow as a Service (WFaaS) architecture with independent services. A core part of the architecture is how to efficiently respond continuous workflow requests from users and schedule their executions in the cloud. Based on different targets, we propose four heuristic workflow scheduling algorithms for the WFaaS architecture, and analyze the differences and best usages of the algorithms in terms of performance, cost and the price/performance ratio via experimental studies.
Knowledge-based modelling of historical surfaces using lidar data
NASA Astrophysics Data System (ADS)
Höfler, Veit; Wessollek, Christine; Karrasch, Pierre
2016-10-01
Currently in archaeological studies digital elevation models are mainly used especially in terms of shaded reliefs for the prospection of archaeological sites. Hesse (2010) provides a supporting software tool for the determination of local relief models during the prospection using LiDAR scans. Furthermore the search for relicts from WW2 is also in the focus of his research. In James et al. (2006) the determined contour lines were used to reconstruct locations of archaeological artefacts such as buildings. This study is much more and presents an innovative workflow of determining historical high resolution terrain surfaces using recent high resolution terrain models and sedimentological expert knowledge. Based on archaeological field studies (Franconian Saale near Bad Neustadt in Germany) the sedimentological analyses shows that archaeological interesting horizon and geomorphological expert knowledge in combination with particle size analyses (Koehn, DIN ISO 11277) are useful components for reconstructing surfaces of the early Middle Ages. Furthermore the paper traces how it is possible to use additional information (extracted from a recent digital terrain model) to support the process of determination historical surfaces. Conceptual this research is based on methodology of geomorphometry and geo-statistics. The basic idea is that the working procedure is based on the different input data. One aims at tracking the quantitative data and the other aims at processing the qualitative data. Thus, the first quantitative data were available for further processing, which were later processed with the qualitative data to convert them to historical heights. In the final stage of the workflow all gathered information are stored in a large data matrix for spatial interpolation using the geostatistical method of Kriging. Besides the historical surface, the algorithm also provides a first estimation of accuracy of the modelling. The presented workflow is characterized by a high flexibility and the opportunity to include new available data in the process at any time.
Standardizing clinical trials workflow representation in UML for international site comparison.
de Carvalho, Elias Cesar Araujo; Jayanti, Madhav Kishore; Batilana, Adelia Portero; Kozan, Andreia M O; Rodrigues, Maria J; Shah, Jatin; Loures, Marco R; Patil, Sunita; Payne, Philip; Pietrobon, Ricardo
2010-11-09
With the globalization of clinical trials, a growing emphasis has been placed on the standardization of the workflow in order to ensure the reproducibility and reliability of the overall trial. Despite the importance of workflow evaluation, to our knowledge no previous studies have attempted to adapt existing modeling languages to standardize the representation of clinical trials. Unified Modeling Language (UML) is a computational language that can be used to model operational workflow, and a UML profile can be developed to standardize UML models within a given domain. This paper's objective is to develop a UML profile to extend the UML Activity Diagram schema into the clinical trials domain, defining a standard representation for clinical trial workflow diagrams in UML. Two Brazilian clinical trial sites in rheumatology and oncology were examined to model their workflow and collect time-motion data. UML modeling was conducted in Eclipse, and a UML profile was developed to incorporate information used in discrete event simulation software. Ethnographic observation revealed bottlenecks in workflow: these included tasks requiring full commitment of CRCs, transferring notes from paper to computers, deviations from standard operating procedures, and conflicts between different IT systems. Time-motion analysis revealed that nurses' activities took up the most time in the workflow and contained a high frequency of shorter duration activities. Administrative assistants performed more activities near the beginning and end of the workflow. Overall, clinical trial tasks had a greater frequency than clinic routines or other general activities. This paper describes a method for modeling clinical trial workflow in UML and standardizing these workflow diagrams through a UML profile. In the increasingly global environment of clinical trials, the standardization of workflow modeling is a necessary precursor to conducting a comparative analysis of international clinical trials workflows.
Standardizing Clinical Trials Workflow Representation in UML for International Site Comparison
de Carvalho, Elias Cesar Araujo; Jayanti, Madhav Kishore; Batilana, Adelia Portero; Kozan, Andreia M. O.; Rodrigues, Maria J.; Shah, Jatin; Loures, Marco R.; Patil, Sunita; Payne, Philip; Pietrobon, Ricardo
2010-01-01
Background With the globalization of clinical trials, a growing emphasis has been placed on the standardization of the workflow in order to ensure the reproducibility and reliability of the overall trial. Despite the importance of workflow evaluation, to our knowledge no previous studies have attempted to adapt existing modeling languages to standardize the representation of clinical trials. Unified Modeling Language (UML) is a computational language that can be used to model operational workflow, and a UML profile can be developed to standardize UML models within a given domain. This paper's objective is to develop a UML profile to extend the UML Activity Diagram schema into the clinical trials domain, defining a standard representation for clinical trial workflow diagrams in UML. Methods Two Brazilian clinical trial sites in rheumatology and oncology were examined to model their workflow and collect time-motion data. UML modeling was conducted in Eclipse, and a UML profile was developed to incorporate information used in discrete event simulation software. Results Ethnographic observation revealed bottlenecks in workflow: these included tasks requiring full commitment of CRCs, transferring notes from paper to computers, deviations from standard operating procedures, and conflicts between different IT systems. Time-motion analysis revealed that nurses' activities took up the most time in the workflow and contained a high frequency of shorter duration activities. Administrative assistants performed more activities near the beginning and end of the workflow. Overall, clinical trial tasks had a greater frequency than clinic routines or other general activities. Conclusions This paper describes a method for modeling clinical trial workflow in UML and standardizing these workflow diagrams through a UML profile. In the increasingly global environment of clinical trials, the standardization of workflow modeling is a necessary precursor to conducting a comparative analysis of international clinical trials workflows. PMID:21085484
NASA Technical Reports Server (NTRS)
Lynnes, C.
2014-01-01
Federated Giovanni is a NASA-funded ACCESS project to extend the scope of the GES DISC Giovanni online analysis tool to 4 other Distributed Active Archive Centers within EOSDIS: OBPG, LP-DAAC, MODAPS and PO.DAAC. As such, it represents a significant instance of sharing technology across the DAACs. We also touch on several sub-areas that are also sharable, such as Giovanni URLs, workflows and OGC-accessible services.
Ergatis: a web interface and scalable software system for bioinformatics workflows
Orvis, Joshua; Crabtree, Jonathan; Galens, Kevin; Gussman, Aaron; Inman, Jason M.; Lee, Eduardo; Nampally, Sreenath; Riley, David; Sundaram, Jaideep P.; Felix, Victor; Whitty, Brett; Mahurkar, Anup; Wortman, Jennifer; White, Owen; Angiuoli, Samuel V.
2010-01-01
Motivation: The growth of sequence data has been accompanied by an increasing need to analyze data on distributed computer clusters. The use of these systems for routine analysis requires scalable and robust software for data management of large datasets. Software is also needed to simplify data management and make large-scale bioinformatics analysis accessible and reproducible to a wide class of target users. Results: We have developed a workflow management system named Ergatis that enables users to build, execute and monitor pipelines for computational analysis of genomics data. Ergatis contains preconfigured components and template pipelines for a number of common bioinformatics tasks such as prokaryotic genome annotation and genome comparisons. Outputs from many of these components can be loaded into a Chado relational database. Ergatis was designed to be accessible to a broad class of users and provides a user friendly, web-based interface. Ergatis supports high-throughput batch processing on distributed compute clusters and has been used for data management in a number of genome annotation and comparative genomics projects. Availability: Ergatis is an open-source project and is freely available at http://ergatis.sourceforge.net Contact: jorvis@users.sourceforge.net PMID:20413634
Ribeiro, C; Couto, C; Ribeiro, A R; Maia, A S; Santos, M; Tiritan, M E; Pinto, E; Almeida, A A
2018-10-15
The present study evaluated the content and distribution of several trace elements (Li, Be, Al, V, Cr, Co, Ni, Cu, Zn, Se, Mo, Ag, Cd, Sb, Ba, Tl, Pb, and U) in the Douro River estuary. For that, three matrices were collected (water, sediments and native local flora) to assess the extent of contamination by these elements in this estuarine ecosystem. Results showed their occurrence in estuarine water and sediments, but significant differences were recorded on the concentration levels and pattern of distribution among both matrices and sampling points. Generally, the levels of trace elements were higher in the sediments than in the respective estuarine water. Nonetheless, no correlation among trace elements was determined between water and sediments, except for Cd. Al was the trace element found at highest concentration at both sediments and water followed by Zn. Pollution indices such as geo-accumulation (I geo ), enrichment factor (EF) and contamination factor (CF) were determined to understand the levels and sources of trace elements pollution. I geo showed strong contamination by anthropogenic activities for Li, Al, V, Cr, Ni, Cu, Zn, Ba and Pb at all sampling points while EF and CF demonstrated severe enrichment and contamination by Se, Sb and Pb. Levels of trace elements were compared to acceptable values for aquatic organisms and Sediment Quality Guidelines. The concentration of some trace elements, namely Al, Pb and Cu, were higher than those considered acceptable, with potential negative impact on local living organisms. Nevertheless, permissible values for all trace elements are still not available, demonstrating that further studies are needed in order to have a complete assessment of environmental risk. Furthermore, the occurrence and possible accumulation of trace elements by local plant species and macroalgae were investigated as well as their potential use as bioindicators of local pollution and for phytoremediation purposes. Copyright © 2018 Elsevier B.V. All rights reserved.
Horizontal and vertical variability of soil properties in a trace element contaminated area
NASA Astrophysics Data System (ADS)
Burgos, Pilar; Madejón, Engracia; Pérez-de-Mora, Alfredo; Cabrera, Francisco
2008-02-01
The spatial distribution of some soil chemical properties and trace element contents of a plot affected by the Aznalcóllar mine spill were investigated using statistical and geostatistical methods to assess the extent of soil contamination. Total and EDTA-extractable soil trace element concentrations and total S content showed great variability and high coefficients of variation in the three examined depths. Soil in the plot was found to be significantly contaminated by As, Cd, Cu, Pb and Zn within a wide range of pH. Total trace element concentrations at all depths (0-60 cm) were much higher than background values of non-affected soil, indicating that despite the clean-up operations, the concentration of trace elements in the experimental plot was still high. The spatial distribution of the different variables was estimated by kriging to design contour maps. These maps allowed the identification of specific zones with high metal concentrations and low pH values corresponding to spots of residual sludge. Moreover, kriged maps showed distinct spatial distribution and hence different behaviour for the elements considered. This information may be applied to optimise remediation strategies in highly and moderately contaminated areas.
NASA Astrophysics Data System (ADS)
Woiwode, Wolfgang; Oelhaf, Hermann; Dörnbrack, Andreas; Bramberger, Martina; Diekmann, Christopher; Friedl-Vallon, Felix; Höpfner, Michael; Hoor, Peter; Johansson, Sören; Krause, Jens; Kunkel, Daniel; Orphal, Johannes; Preusse, Peter; Ruhnke, Roland; Schlage, Romy; Schröter, Jennifer; Sinnhuber, Björn-Martin; Ungermann, Jörn; Zahn, Andreas
2017-04-01
Tropopause folds are known of enabling efficient exchange of trace constituents between the stratosphere and troposphere. In particular, the modification of the vertical distributions of radiatively important H2O and other reactive trace gases associated with tropopause folds is relevant for accurate model simulations of the upper troposphere and lower stratosphere composition. During the POLSTRACC/GW-LCYCLE/SALSA flight on 12 January 2016, the HALO (High Altitude LOng range) aircraft crossed twice an extended tropopause fold in the vicinity of the Arctic polar vortex. At the same time, the ECMWF operational analysis shows that the meteorological scenario probed above Italy was accompanied by wide-spread gravity wave activity induced by north-westerly winds. Using high spectral resolution limb-observations by the GLORIA (Gimballed Limb Observer for Radiance Imaging of the Atmosphere) spectrometer aboard HALO and associated observations, we investigate the vertical distributions of H2O, O3, temperature, and associated parameters across the tropopause fold. In combination with a high-resolution simulation by the ICON-ART (ICOsahedral Nonhydrostatic- Aerosol and Reactive Trace gases) model, we search for indications for irreversible trace gas exchange between the stratosphere and troposphere and the potential influence of gravity waves.
Ion microprobe mass analysis of plagioclase from 'non-mare' lunar samples
NASA Technical Reports Server (NTRS)
Meyer, C., Jr.; Anderson, D. H.; Bradley, J. G.
1974-01-01
The ion microprobe was used to measure the composition and distribution of trace elements in lunar plagioclase, and these analyses are used as criteria in determining the possible origins of some nonmare lunar samples. The Apollo 16 samples with metaclastic texture and high-bulk trace-element contents contain plagioclase clasts with extremely low trace-element contents. These plagioclase inclusions represent unequilibrated relicts of anorthositic, noritic, or troctolitic rocks that have been intermixed as a rock flour into the KREEP-rich matrix of these samples. All of the plagioclase-rich inclusions which were analyzed in the KREEP-rich Apollo 14 breccias were found to be rich in trace elements. This does not seem to be consistent with the interpretation that the Apollo 14 samples represent a pre-Imbrium regolith, because such an ancient regolith should have contained many plagioclase clasts with low trace-element contents more typical of plagioclase from the pre-Imbrium crust. Ion-microprobe analyses for Ba and Sr in large plagioclase phenocrysts in 14310 and 68415 are consistent with the bulk compositions of these rocks and with the known distribution coefficients for these elements. The distribution coefficient for Li (basaltic liquid/plagioclase) was measured to be about 2.
Ross, Joshua V; Black, Andrew J
2015-09-01
Antiviral prophylaxis forms a significant component of health management plans for many countries around the world. A number of studies have shown that the delays typically encountered in distributing these antivirals to households, following the first infectious case, can result in their efficacy being severely reduced. Here, we investigate the use of contact tracing as a method to reduce the delays and hence mitigate the reduction in efficacy of antivirals. We assess the usefulness of contact tracing in terms of the probability of a major outbreak. It is found, with parameter distributions appropriate to the 2009 H1N1 pandemic and distributions reflecting commonly experienced delays, that standard contact tracing renders an outbreak impossible approximately one in five times compared with approximately one in ten times in its absence. A contact-tracing efficiency of 50% would see further improvements with an outbreak being impossible approximately one in four times, and a reduction of the median probability of a major outbreak from 0.41 to below 0.27. © The authors 2014. Published by Oxford University Press on behalf of the Institute of Mathematics and its Applications. All rights reserved.
Quantitative workflow based on NN for weighting criteria in landfill suitability mapping
NASA Astrophysics Data System (ADS)
Abujayyab, Sohaib K. M.; Ahamad, Mohd Sanusi S.; Yahya, Ahmad Shukri; Ahmad, Siti Zubaidah; Alkhasawneh, Mutasem Sh.; Aziz, Hamidi Abdul
2017-10-01
Our study aims to introduce a new quantitative workflow that integrates neural networks (NNs) and multi criteria decision analysis (MCDA). Existing MCDA workflows reveal a number of drawbacks, because of the reliance on human knowledge in the weighting stage. Thus, new workflow presented to form suitability maps at the regional scale for solid waste planning based on NNs. A feed-forward neural network employed in the workflow. A total of 34 criteria were pre-processed to establish the input dataset for NN modelling. The final learned network used to acquire the weights of the criteria. Accuracies of 95.2% and 93.2% achieved for the training dataset and testing dataset, respectively. The workflow was found to be capable of reducing human interference to generate highly reliable maps. The proposed workflow reveals the applicability of NN in generating landfill suitability maps and the feasibility of integrating them with existing MCDA workflows.
Distributed trace using central performance counter memory
Satterfield, David L; Sexton, James C
2013-10-22
A plurality of processing cores, are central storage unit having at least memory connected in a daisy chain manner, forming a daisy chain ring layout on an integrated chip. At least one of the plurality of processing cores places trace data on the daisy chain connection for transmitting the trace data to the central storage unit, and the central storage unit detects the trace data and stores the trace data in the memory co-located in with the central storage unit.
Distributed trace using central performance counter memory
Satterfield, David L.; Sexton, James C.
2013-01-22
A plurality of processing cores, are central storage unit having at least memory connected in a daisy chain manner, forming a daisy chain ring layout on an integrated chip. At least one of the plurality of processing cores places trace data on the daisy chain connection for transmitting the trace data to the central storage unit, and the central storage unit detects the trace data and stores the trace data in the memory co-located in with the central storage unit.
Workflow as a Service in the Cloud: Architecture and Scheduling Algorithms
Wang, Jianwu; Korambath, Prakashan; Altintas, Ilkay; Davis, Jim; Crawl, Daniel
2017-01-01
With more and more workflow systems adopting cloud as their execution environment, it becomes increasingly challenging on how to efficiently manage various workflows, virtual machines (VMs) and workflow execution on VM instances. To make the system scalable and easy-to-extend, we design a Workflow as a Service (WFaaS) architecture with independent services. A core part of the architecture is how to efficiently respond continuous workflow requests from users and schedule their executions in the cloud. Based on different targets, we propose four heuristic workflow scheduling algorithms for the WFaaS architecture, and analyze the differences and best usages of the algorithms in terms of performance, cost and the price/performance ratio via experimental studies. PMID:29399237
Scientific Data Management (SDM) Center for Enabling Technologies. Final Report, 2007-2012
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ludascher, Bertram; Altintas, Ilkay
Our contributions to advancing the State of the Art in scientific workflows have focused on the following areas: Workflow development; Generic workflow components and templates; Provenance collection and analysis; and, Workflow reliability and fault tolerance.
Walsh, Kristin E.; Chui, Michelle Anne; Kieser, Mara A.; Williams, Staci M.; Sutter, Susan L.; Sutter, John G.
2012-01-01
Objective To explore community pharmacy technician workflow change after implementation of an automated robotic prescription-filling device. Methods At an independent community pharmacy in rural Mayville, WI, pharmacy technicians were observed before and 3 months after installation of an automated robotic prescription-filling device. The main outcome measures were sequences and timing of technician workflow steps, workflow interruptions, automation surprises, and workarounds. Results Of the 77 and 80 observations made before and 3 months after robot installation, respectively, 17 different workflow sequences were observed before installation and 38 after installation. Average prescription filling time was reduced by 40 seconds per prescription with use of the robot. Workflow interruptions per observation increased from 1.49 to 1.79 (P = 0.11), and workarounds increased from 10% to 36% after robot use. Conclusion Although automated prescription-filling devices can increase efficiency, workflow interruptions and workarounds may negate that efficiency. Assessing changes in workflow and sequencing of tasks that may result from the use of automation can help uncover opportunities for workflow policy and procedure redesign. PMID:21896459
DNAseq Workflow in a Diagnostic Context and an Example of a User Friendly Implementation.
Wolf, Beat; Kuonen, Pierre; Dandekar, Thomas; Atlan, David
2015-01-01
Over recent years next generation sequencing (NGS) technologies evolved from costly tools used by very few, to a much more accessible and economically viable technology. Through this recently gained popularity, its use-cases expanded from research environments into clinical settings. But the technical know-how and infrastructure required to analyze the data remain an obstacle for a wider adoption of this technology, especially in smaller laboratories. We present GensearchNGS, a commercial DNAseq software suite distributed by Phenosystems SA. The focus of GensearchNGS is the optimal usage of already existing infrastructure, while keeping its use simple. This is achieved through the integration of existing tools in a comprehensive software environment, as well as custom algorithms developed with the restrictions of limited infrastructures in mind. This includes the possibility to connect multiple computers to speed up computing intensive parts of the analysis such as sequence alignments. We present a typical DNAseq workflow for NGS data analysis and the approach GensearchNGS takes to implement it. The presented workflow goes from raw data quality control to the final variant report. This includes features such as gene panels and the integration of online databases, like Ensembl for annotations or Cafe Variome for variant sharing.
NASA Astrophysics Data System (ADS)
McCarthy, Ann
2006-01-01
The ICC Workflow WG serves as the bridge between ICC color management technologies and use of those technologies in real world color production applications. ICC color management is applicable to and is used in a wide range of color systems, from highly specialized digital cinema color special effects to high volume publications printing to home photography. The ICC Workflow WG works to align ICC technologies so that the color management needs of these diverse use case systems are addressed in an open, platform independent manner. This report provides a high level summary of the ICC Workflow WG objectives and work to date, focusing on the ways in which workflow can impact image quality and color systems performance. The 'ICC Workflow Primitives' and 'ICC Workflow Patterns and Dimensions' workflow models are covered in some detail. Consider the questions, "How much of dissatisfaction with color management today is the result of 'the wrong color transformation at the wrong time' and 'I can't get to the right conversion at the right point in my work process'?" Put another way, consider how image quality through a workflow can be negatively affected when the coordination and control level of the color management system is not sufficient.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Teplyakova, Ludmila, E-mail: lat168@mail.ru; Koneva, Nina, E-mail: koneva@mail.ru; Kunitsyna, Tatyana, E-mail: kma11061990@mail.ru
2016-01-15
The slip trace pattern of Ni{sub 3}Fe alloy single crystals with the short range order oriented for a single slip were investigated on replica at different stages of deformation using the transmission diffraction electron microscopy method. The connection of staging with the formation of slip trace pattern and the change of its parameters were established. The number of local areas where two or more slip systems work is increased with the change of stages. In these conditions the character of slip localization in the primary slip system is changed from the packets to the homogeneous distribution. The distributions of themore » distances between slip traces and the shear power in slip traces were plotted. The correlation between the average value of the shear power in the primary slip traces and the average distance between them was revealed in this work. It was established that the rates of the average value growth of the relative local shear and the shear power in the slip traces reach the largest values at the transition stage.« less
Radiology information system: a workflow-based approach.
Zhang, Jinyan; Lu, Xudong; Nie, Hongchao; Huang, Zhengxing; van der Aalst, W M P
2009-09-01
Introducing workflow management technology in healthcare seems to be prospective in dealing with the problem that the current healthcare Information Systems cannot provide sufficient support for the process management, although several challenges still exist. The purpose of this paper is to study the method of developing workflow-based information system in radiology department as a use case. First, a workflow model of typical radiology process was established. Second, based on the model, the system could be designed and implemented as a group of loosely coupled components. Each component corresponded to one task in the process and could be assembled by the workflow management system. The legacy systems could be taken as special components, which also corresponded to the tasks and were integrated through transferring non-work- flow-aware interfaces to the standard ones. Finally, a workflow dashboard was designed and implemented to provide an integral view of radiology processes. The workflow-based Radiology Information System was deployed in the radiology department of Zhejiang Chinese Medicine Hospital in China. The results showed that it could be adjusted flexibly in response to the needs of changing process, and enhance the process management in the department. It can also provide a more workflow-aware integration method, comparing with other methods such as IHE-based ones. The workflow-based approach is a new method of developing radiology information system with more flexibility, more functionalities of process management and more workflow-aware integration. The work of this paper is an initial endeavor for introducing workflow management technology in healthcare.
UBioLab: a web-LABoratory for Ubiquitous in-silico experiments.
Bartocci, E; Di Berardini, M R; Merelli, E; Vito, L
2012-03-01
The huge and dynamic amount of bioinformatic resources (e.g., data and tools) available nowadays in Internet represents a big challenge for biologists -for what concerns their management and visualization- and for bioinformaticians -for what concerns the possibility of rapidly creating and executing in-silico experiments involving resources and activities spread over the WWW hyperspace. Any framework aiming at integrating such resources as in a physical laboratory has imperatively to tackle -and possibly to handle in a transparent and uniform way- aspects concerning physical distribution, semantic heterogeneity, co-existence of different computational paradigms and, as a consequence, of different invocation interfaces (i.e., OGSA for Grid nodes, SOAP for Web Services, Java RMI for Java objects, etc.). The framework UBioLab has been just designed and developed as a prototype following the above objective. Several architectural features -as those ones of being fully Web-based and of combining domain ontologies, Semantic Web and workflow techniques- give evidence of an effort in such a direction. The integration of a semantic knowledge management system for distributed (bioinformatic) resources, a semantic-driven graphic environment for defining and monitoring ubiquitous workflows and an intelligent agent-based technology for their distributed execution allows UBioLab to be a semantic guide for bioinformaticians and biologists providing (i) a flexible environment for visualizing, organizing and inferring any (semantics and computational) "type" of domain knowledge (e.g., resources and activities, expressed in a declarative form), (ii) a powerful engine for defining and storing semantic-driven ubiquitous in-silico experiments on the domain hyperspace, as well as (iii) a transparent, automatic and distributed environment for correct experiment executions.
Landfors, Mattias; Philip, Philge; Rydén, Patrik; Stenberg, Per
2011-01-01
Genome-wide analysis of gene expression or protein binding patterns using different array or sequencing based technologies is now routinely performed to compare different populations, such as treatment and reference groups. It is often necessary to normalize the data obtained to remove technical variation introduced in the course of conducting experimental work, but standard normalization techniques are not capable of eliminating technical bias in cases where the distribution of the truly altered variables is skewed, i.e. when a large fraction of the variables are either positively or negatively affected by the treatment. However, several experiments are likely to generate such skewed distributions, including ChIP-chip experiments for the study of chromatin, gene expression experiments for the study of apoptosis, and SNP-studies of copy number variation in normal and tumour tissues. A preliminary study using spike-in array data established that the capacity of an experiment to identify altered variables and generate unbiased estimates of the fold change decreases as the fraction of altered variables and the skewness increases. We propose the following work-flow for analyzing high-dimensional experiments with regions of altered variables: (1) Pre-process raw data using one of the standard normalization techniques. (2) Investigate if the distribution of the altered variables is skewed. (3) If the distribution is not believed to be skewed, no additional normalization is needed. Otherwise, re-normalize the data using a novel HMM-assisted normalization procedure. (4) Perform downstream analysis. Here, ChIP-chip data and simulated data were used to evaluate the performance of the work-flow. It was found that skewed distributions can be detected by using the novel DSE-test (Detection of Skewed Experiments). Furthermore, applying the HMM-assisted normalization to experiments where the distribution of the truly altered variables is skewed results in considerably higher sensitivity and lower bias than can be attained using standard and invariant normalization methods. PMID:22132175
Cornett, Alex; Kuziemsky, Craig
2015-01-01
Implementing team based workflows can be complex because of the scope of providers involved and the extent of information exchange and communication that needs to occur. While a workflow may represent the ideal structure of communication that needs to occur, information issues and contextual factors may impact how the workflow is implemented in practice. Understanding these issues will help us better design systems to support team based workflows. In this paper we use a case study of palliative sedation therapy (PST) to model a PST workflow and then use it to identify purposes of communication, information issues and contextual factors that impact them. We then suggest how our findings could inform health information technology (HIT) design to support team based communication workflows.
An Automated Design Framework for Multicellular Recombinase Logic.
Guiziou, Sarah; Ulliana, Federico; Moreau, Violaine; Leclere, Michel; Bonnet, Jerome
2018-05-18
Tools to systematically reprogram cellular behavior are crucial to address pressing challenges in manufacturing, environment, or healthcare. Recombinases can very efficiently encode Boolean and history-dependent logic in many species, yet current designs are performed on a case-by-case basis, limiting their scalability and requiring time-consuming optimization. Here we present an automated workflow for designing recombinase logic devices executing Boolean functions. Our theoretical framework uses a reduced library of computational devices distributed into different cellular subpopulations, which are then composed in various manners to implement all desired logic functions at the multicellular level. Our design platform called CALIN (Composable Asynchronous Logic using Integrase Networks) is broadly accessible via a web server, taking truth tables as inputs and providing corresponding DNA designs and sequences as outputs (available at http://synbio.cbs.cnrs.fr/calin ). We anticipate that this automated design workflow will streamline the implementation of Boolean functions in many organisms and for various applications.
Brown, David M. L.; Cho, Herman; de Jong, Wibe A.
2016-02-09
Here, the testing of theoretical models with experimental data is an integral part of the scientific method, and a logical place to search for new ways of stimulating scientific productivity. Often experiment/theory comparisons may be viewed as a workflow comprised of well-defined, rote operations distributed over several distinct computers, as exemplified by the way in which predictions from electronic structure theories are evaluated with results from spectroscopic experiments. For workflows such as this, which may be laborious and time consuming to perform manually, software that could orchestrate the operations and transfer results between computers in a seamless and automated fashionmore » would offer major efficiency gains. Such tools also promise to alter how researchers interact with data outside their field of specialization by, e.g., making raw experimental results more accessible to theorists, and the outputs of theoretical calculations more readily comprehended by experimentalists.« less
IceProd 2: A Next Generation Data Analysis Framework for the IceCube Neutrino Observatory
NASA Astrophysics Data System (ADS)
Schultz, D.
2015-12-01
We describe the overall structure and new features of the second generation of IceProd, a data processing and management framework. IceProd was developed by the IceCube Neutrino Observatory for processing of Monte Carlo simulations, detector data, and analysis levels. It runs as a separate layer on top of grid and batch systems. This is accomplished by a set of daemons which process job workflow, maintaining configuration and status information on the job before, during, and after processing. IceProd can also manage complex workflow DAGs across distributed computing grids in order to optimize usage of resources. IceProd is designed to be very light-weight; it runs as a python application fully in user space and can be set up easily. For the initial completion of this second version of IceProd, improvements have been made to increase security, reliability, scalability, and ease of use.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Brown, David M. L.; Cho, Herman; de Jong, Wibe A.
Here, the testing of theoretical models with experimental data is an integral part of the scientific method, and a logical place to search for new ways of stimulating scientific productivity. Often experiment/theory comparisons may be viewed as a workflow comprised of well-defined, rote operations distributed over several distinct computers, as exemplified by the way in which predictions from electronic structure theories are evaluated with results from spectroscopic experiments. For workflows such as this, which may be laborious and time consuming to perform manually, software that could orchestrate the operations and transfer results between computers in a seamless and automated fashionmore » would offer major efficiency gains. Such tools also promise to alter how researchers interact with data outside their field of specialization by, e.g., making raw experimental results more accessible to theorists, and the outputs of theoretical calculations more readily comprehended by experimentalists.« less
A Workflow to Improve the Alignment of Prostate Imaging with Whole-mount Histopathology.
Yamamoto, Hidekazu; Nir, Dror; Vyas, Lona; Chang, Richard T; Popert, Rick; Cahill, Declan; Challacombe, Ben; Dasgupta, Prokar; Chandra, Ashish
2014-08-01
Evaluation of prostate imaging tests against whole-mount histology specimens requires accurate alignment between radiologic and histologic data sets. Misalignment results in false-positive and -negative zones as assessed by imaging. We describe a workflow for three-dimensional alignment of prostate imaging data against whole-mount prostatectomy reference specimens and assess its performance against a standard workflow. Ethical approval was granted. Patients underwent motorized transrectal ultrasound (Prostate Histoscanning) to generate a three-dimensional image of the prostate before radical prostatectomy. The test workflow incorporated steps for axial alignment between imaging and histology, size adjustments following formalin fixation, and use of custom-made parallel cutters and digital caliper instruments. The control workflow comprised freehand cutting and assumed homogeneous block thicknesses at the same relative angles between pathology and imaging sections. Thirty radical prostatectomy specimens were histologically and radiologically processed, either by an alignment-optimized workflow (n = 20) or a control workflow (n = 10). The optimized workflow generated tissue blocks of heterogeneous thicknesses but with no significant drifting in the cutting plane. The control workflow resulted in significantly nonparallel blocks, accurately matching only one out of four histology blocks to their respective imaging data. The image-to-histology alignment accuracy was 20% greater in the optimized workflow (P < .0001), with higher sensitivity (85% vs. 69%) and specificity (94% vs. 73%) for margin prediction in a 5 × 5-mm grid analysis. A significantly better alignment was observed in the optimized workflow. Evaluation of prostate imaging biomarkers using whole-mount histology references should include a test-to-reference spatial alignment workflow. Copyright © 2014 AUR. Published by Elsevier Inc. All rights reserved.
Conceptual-level workflow modeling of scientific experiments using NMR as a case study
Verdi, Kacy K; Ellis, Heidi JC; Gryk, Michael R
2007-01-01
Background Scientific workflows improve the process of scientific experiments by making computations explicit, underscoring data flow, and emphasizing the participation of humans in the process when intuition and human reasoning are required. Workflows for experiments also highlight transitions among experimental phases, allowing intermediate results to be verified and supporting the proper handling of semantic mismatches and different file formats among the various tools used in the scientific process. Thus, scientific workflows are important for the modeling and subsequent capture of bioinformatics-related data. While much research has been conducted on the implementation of scientific workflows, the initial process of actually designing and generating the workflow at the conceptual level has received little consideration. Results We propose a structured process to capture scientific workflows at the conceptual level that allows workflows to be documented efficiently, results in concise models of the workflow and more-correct workflow implementations, and provides insight into the scientific process itself. The approach uses three modeling techniques to model the structural, data flow, and control flow aspects of the workflow. The domain of biomolecular structure determination using Nuclear Magnetic Resonance spectroscopy is used to demonstrate the process. Specifically, we show the application of the approach to capture the workflow for the process of conducting biomolecular analysis using Nuclear Magnetic Resonance (NMR) spectroscopy. Conclusion Using the approach, we were able to accurately document, in a short amount of time, numerous steps in the process of conducting an experiment using NMR spectroscopy. The resulting models are correct and precise, as outside validation of the models identified only minor omissions in the models. In addition, the models provide an accurate visual description of the control flow for conducting biomolecular analysis using NMR spectroscopy experiment. PMID:17263870
FAST: A fully asynchronous and status-tracking pattern for geoprocessing services orchestration
NASA Astrophysics Data System (ADS)
Wu, Huayi; You, Lan; Gui, Zhipeng; Gao, Shuang; Li, Zhenqiang; Yu, Jingmin
2014-09-01
Geoprocessing service orchestration (GSO) provides a unified and flexible way to implement cross-application, long-lived, and multi-step geoprocessing service workflows by coordinating geoprocessing services collaboratively. Usually, geoprocessing services and geoprocessing service workflows are data and/or computing intensive. The intensity feature may make the execution process of a workflow time-consuming. Since it initials an execution request without blocking other interactions on the client side, an asynchronous mechanism is especially appropriate for GSO workflows. Many critical problems remain to be solved in existing asynchronous patterns for GSO including difficulties in improving performance, status tracking, and clarifying the workflow structure. These problems are a challenge when orchestrating performance efficiency, making statuses instantly available, and constructing clearly structured GSO workflows. A Fully Asynchronous and Status-Tracking (FAST) pattern that adopts asynchronous interactions throughout the whole communication tier of a workflow is proposed for GSO. The proposed FAST pattern includes a mechanism that actively pushes the latest status to clients instantly and economically. An independent proxy was designed to isolate the status tracking logic from the geoprocessing business logic, which assists the formation of a clear GSO workflow structure. A workflow was implemented in the FAST pattern to simulate the flooding process in the Poyang Lake region. Experimental results show that the proposed FAST pattern can efficiently tackle data/computing intensive geoprocessing tasks. The performance of all collaborative partners was improved due to the asynchronous mechanism throughout communication tier. A status-tracking mechanism helps users retrieve the latest running status of a GSO workflow in an efficient and instant way. The clear structure of the GSO workflow lowers the barriers for geospatial domain experts and model designers to compose asynchronous GSO workflows. Most importantly, it provides better support for locating and diagnosing potential exceptions.
Conceptual-level workflow modeling of scientific experiments using NMR as a case study.
Verdi, Kacy K; Ellis, Heidi Jc; Gryk, Michael R
2007-01-30
Scientific workflows improve the process of scientific experiments by making computations explicit, underscoring data flow, and emphasizing the participation of humans in the process when intuition and human reasoning are required. Workflows for experiments also highlight transitions among experimental phases, allowing intermediate results to be verified and supporting the proper handling of semantic mismatches and different file formats among the various tools used in the scientific process. Thus, scientific workflows are important for the modeling and subsequent capture of bioinformatics-related data. While much research has been conducted on the implementation of scientific workflows, the initial process of actually designing and generating the workflow at the conceptual level has received little consideration. We propose a structured process to capture scientific workflows at the conceptual level that allows workflows to be documented efficiently, results in concise models of the workflow and more-correct workflow implementations, and provides insight into the scientific process itself. The approach uses three modeling techniques to model the structural, data flow, and control flow aspects of the workflow. The domain of biomolecular structure determination using Nuclear Magnetic Resonance spectroscopy is used to demonstrate the process. Specifically, we show the application of the approach to capture the workflow for the process of conducting biomolecular analysis using Nuclear Magnetic Resonance (NMR) spectroscopy. Using the approach, we were able to accurately document, in a short amount of time, numerous steps in the process of conducting an experiment using NMR spectroscopy. The resulting models are correct and precise, as outside validation of the models identified only minor omissions in the models. In addition, the models provide an accurate visual description of the control flow for conducting biomolecular analysis using NMR spectroscopy experiment.
Scientific Workflows and the Sensor Web for Virtual Environmental Observatories
NASA Astrophysics Data System (ADS)
Simonis, I.; Vahed, A.
2008-12-01
Virtual observatories mature from their original domain and become common practice for earth observation research and policy building. The term Virtual Observatory originally came from the astronomical research community. Here, virtual observatories provide universal access to the available astronomical data archives of space and ground-based observatories. Further on, as those virtual observatories aim at integrating heterogeneous ressources provided by a number of participating organizations, the virtual observatory acts as a coordinating entity that strives for common data analysis techniques and tools based on common standards. The Sensor Web is on its way to become one of the major virtual observatories outside of the astronomical research community. Like the original observatory that consists of a number of telescopes, each observing a specific part of the wave spectrum and with a collection of astronomical instruments, the Sensor Web provides a multi-eyes perspective on the current, past, as well as future situation of our planet and its surrounding spheres. The current view of the Sensor Web is that of a single worldwide collaborative, coherent, consistent and consolidated sensor data collection, fusion and distribution system. The Sensor Web can perform as an extensive monitoring and sensing system that provides timely, comprehensive, continuous and multi-mode observations. This technology is key to monitoring and understanding our natural environment, including key areas such as climate change, biodiversity, or natural disasters on local, regional, and global scales. The Sensor Web concept has been well established with ongoing global research and deployment of Sensor Web middleware and standards and represents the foundation layer of systems like the Global Earth Observation System of Systems (GEOSS). The Sensor Web consists of a huge variety of physical and virtual sensors as well as observational data, made available on the Internet at standardized interfaces. All data sets and sensor communication follow well-defined abstract models and corresponding encodings, mostly developed by the OGC Sensor Web Enablement initiative. Scientific progress is currently accelerated by an emerging new concept called scientific workflows, which organize and manage complex distributed computations. A scientific workflow represents and records the highly complex processes that a domain scientist typically would follow in exploration, discovery and ultimately, transformation of raw data to publishable results. The challenge is now to integrate the benefits of scientific workflows with those provided by the Sensor Web in order to leverage all resources for scientific exploration, problem solving, and knowledge generation. Scientific workflows for the Sensor Web represent the next evolutionary step towards efficient, powerful, and flexible earth observation frameworks and platforms. Those platforms support the entire process from capturing data, sharing and integrating, to requesting additional observations. Multiple sites and organizations will participate on single platforms and scientists from different countries and organizations interact and contribute to large-scale research projects. Simultaneously, the data- and information overload becomes manageable, as multiple layers of abstraction will free scientists to deal with underlying data-, processing or storage peculiarities. The vision are automated investigation and discovery mechanisms that allow scientists to pose queries to the system, which in turn would identify potentially related resources, schedules processing tasks and assembles all parts in workflows that may satisfy the query.
Earth Science Mining Web Services
NASA Astrophysics Data System (ADS)
Pham, L. B.; Lynnes, C. S.; Hegde, M.; Graves, S.; Ramachandran, R.; Maskey, M.; Keiser, K.
2008-12-01
To allow scientists further capabilities in the area of data mining and web services, the Goddard Earth Sciences Data and Information Services Center (GES DISC) and researchers at the University of Alabama in Huntsville (UAH) have developed a system to mine data at the source without the need of network transfers. The system has been constructed by linking together several pre-existing technologies: the Simple Scalable Script-based Science Processor for Measurements (S4PM), a processing engine at the GES DISC; the Algorithm Development and Mining (ADaM) system, a data mining toolkit from UAH that can be configured in a variety of ways to create customized mining processes; ActiveBPEL, a workflow execution engine based on BPEL (Business Process Execution Language); XBaya, a graphical workflow composer; and the EOS Clearinghouse (ECHO). XBaya is used to construct an analysis workflow at UAH using ADaM components, which are also installed remotely at the GES DISC, wrapped as Web Services. The S4PM processing engine searches ECHO for data using space-time criteria, staging them to cache, allowing the ActiveBPEL engine to remotely orchestrates the processing workflow within S4PM. As mining is completed, the output is placed in an FTP holding area for the end user. The goals are to give users control over the data they want to process, while mining data at the data source using the server's resources rather than transferring the full volume over the internet. These diverse technologies have been infused into a functioning, distributed system with only minor changes to the underlying technologies. The key to this infusion is the loosely coupled, Web- Services based architecture: All of the participating components are accessible (one way or another) through (Simple Object Access Protocol) SOAP-based Web Services.
Earth Science Mining Web Services
NASA Technical Reports Server (NTRS)
Pham, Long; Lynnes, Christopher; Hegde, Mahabaleshwa; Graves, Sara; Ramachandran, Rahul; Maskey, Manil; Keiser, Ken
2008-01-01
To allow scientists further capabilities in the area of data mining and web services, the Goddard Earth Sciences Data and Information Services Center (GES DISC) and researchers at the University of Alabama in Huntsville (UAH) have developed a system to mine data at the source without the need of network transfers. The system has been constructed by linking together several pre-existing technologies: the Simple Scalable Script-based Science Processor for Measurements (S4PM), a processing engine at he GES DISC; the Algorithm Development and Mining (ADaM) system, a data mining toolkit from UAH that can be configured in a variety of ways to create customized mining processes; ActiveBPEL, a workflow execution engine based on BPEL (Business Process Execution Language); XBaya, a graphical workflow composer; and the EOS Clearinghouse (ECHO). XBaya is used to construct an analysis workflow at UAH using ADam components, which are also installed remotely at the GES DISC, wrapped as Web Services. The S4PM processing engine searches ECHO for data using space-time criteria, staging them to cache, allowing the ActiveBPEL engine to remotely orchestras the processing workflow within S4PM. As mining is completed, the output is placed in an FTP holding area for the end user. The goals are to give users control over the data they want to process, while mining data at the data source using the server's resources rather than transferring the full volume over the internet. These diverse technologies have been infused into a functioning, distributed system with only minor changes to the underlying technologies. The key to the infusion is the loosely coupled, Web-Services based architecture: All of the participating components are accessible (one way or another) through (Simple Object Access Protocol) SOAP-based Web Services.
NASA Astrophysics Data System (ADS)
Bastrakova, I.; Car, N.
2017-12-01
Geoscience Australia (GA) is recognised and respected as the National Repository and steward of multiple nationally significance data collections that provides geoscience information, services and capability to the Australian Government, industry and stakeholders. Internally, this brings a challenge of managing large volume (11 PB) of diverse and highly complex data distributed through a significant number of catalogues, applications, portals, virtual laboratories, and direct downloads from multiple locations. Externally, GA is facing constant changer in the Government regulations (e.g. open data and archival laws), growing stakeholder demands for high quality and near real-time delivery of data and products, and rapid technological advances enabling dynamic data access. Traditional approach to citing static data and products cannot satisfy increasing demands for the results from scientific workflows, or items within the workflows to be open, discoverable, thrusted and reproducible. Thus, citation of data, products, codes and applications through the implementation of provenance records is being implemented. This approach involves capturing the provenance of many GA processes according to a standardised data model and storing it, as well as metadata for the elements it references, in a searchable set of systems. This provides GA with ability to cite workflows unambiguously as well as each item within each workflow, including inputs and outputs and many other registered components. Dynamic objects can therefore be referenced flexibly in relation to their generation process - a dataset's metadata indicates where to obtain its provenance from - meaning the relevant facts of its dynamism need not be crammed into a single citation object with a single set of attributes. This allows for simple citations, similar to traditional static document citations such as references in journals, to be used for complex dynamic data and other objects such as software code.
Mix or un-mix? Trace element segregation from a heterogeneous mantle, simulated.
NASA Astrophysics Data System (ADS)
Katz, R. F.; Keller, T.; Warren, J. M.; Manley, G.
2016-12-01
Incompatible trace-element concentrations vary in mid-ocean ridge lavas and melt inclusions by an order of magnitude or more, even in samples from the same location. This variability has been attributed to channelised melt flow [Spiegelman & Kelemen, 2003], which brings enriched, low-degree melts to the surface in relative isolation from depleted inter-channel melts. We re-examine this hypothesis using a new melting-column model that incorporates mantle volatiles [Keller & Katz 2016]. Volatiles cause a deeper onset of channelisation: their corrosivity is maximum at the base of the silicate melting regime. We consider how source heterogeneity and melt transport shape trace-element concentrations in basaltic lavas. We use both equilibrium and non-equilibrium formulations [Spiegelman 1996]. In particular, we evaluate the effect of melt transport on probability distributions of trace element concentration, comparing the inflow distribution in the mantle with the outflow distribution in the magma. Which features of melt transport preserve, erase or overprint input correlations between elements? To address this we consider various hypotheses about mantle heterogeneity, allowing for spatial structure in major components, volatiles and trace elements. Of interest are the roles of wavelength, amplitude, and correlation of heterogeneity fields. To investigate how different modes of melt transport affect input distributions, we compare melting models that produce either shallow or deep channelisation, or none at all.References:Keller & Katz (2016). The Role of Volatiles in Reactive Melt Transport in the Asthenosphere. Journal of Petrology, http://doi.org/10.1093/petrology/egw030. Spiegelman (1996). Geochemical consequences of melt transport in 2-D: The sensitivity of trace elements to mantle dynamics. Earth and Planetary Science Letters, 139, 115-132. Spiegelman & Kelemen (2003). Extreme chemical variability as a consequence of channelized melt transport. Geochemistry Geophysics Geosystems, http://doi.org/10.1029/2002GC000336
Distribution and speciation of trace elements in iron and manganese oxide cave deposits
DOE Office of Scientific and Technical Information (OSTI.GOV)
Frierdich, Andrew J.; Catalano, Jeffrey G.
2012-10-24
Fe and Mn oxide minerals control the distribution and speciation of heavy metals and trace elements in soils and aquatic systems through chemical mechanisms involving adsorption, incorporation, and electron transfer. The Pautler Cave System in Southwest Illinois, an analog to other temperate carbonate-hosted karst systems, contains Fe and Mn oxide minerals that form in multiple depositional environments and have high concentrations of associated trace elements. Synchrotron-based micro-scanning X-ray fluorescence ({mu}-SXRF) shows unique spatial distributions of Fe, Mn, and trace elements in mineral samples. Profile maps of Mn oxide cave stream pebble coatings show Fe- and As-rich laminations, indicating dynamic redoxmore » conditions in the cave stream. {mu}-SXRF maps demonstrate that Ni, Cu, and Zn correlate primarily with Mn whereas As correlates with both Mn and Fe; As is more enriched in the Fe phase. Zn is concentrated in the periphery of Mn oxide stream pebble coatings, and may be an indication of recent anthropogenic surface activity. X-ray absorption fine structure spectroscopy measurements reveal that As(V) occurs as surface complexes on Mn and Fe oxides whereas Zn(II) associated with Mn oxides is adsorbed to the basal planes of phyllomanganates in a tetrahedral coordination. Co(III) and Se(IV) are also observed to be associated with Mn oxides. The observation of Fe, Mn, and trace element banding in Mn oxide cave stream pebble coatings suggests that these materials are sensitive to and document aqueous redox conditions, similar to ferromanganese nodules in soils and in marine and freshwater sediments. Furthermore, speciation and distribution measurements indicate that these minerals scavenge trace elements and limit the transport of micronutrients and contaminants in karst aquifer systems while also potentially recording changes in anthropogenic surface activity and land-use.« less
Ivkovic, Sinisa; Simonovic, Janko; Tijanic, Nebojsa; Davis-Dusenbery, Brandi; Kural, Deniz
2016-01-01
As biomedical data has become increasingly easy to generate in large quantities, the methods used to analyze it have proliferated rapidly. Reproducible and reusable methods are required to learn from large volumes of data reliably. To address this issue, numerous groups have developed workflow specifications or execution engines, which provide a framework with which to perform a sequence of analyses. One such specification is the Common Workflow Language, an emerging standard which provides a robust and flexible framework for describing data analysis tools and workflows. In addition, reproducibility can be furthered by executors or workflow engines which interpret the specification and enable additional features, such as error logging, file organization, optimizations1 to computation and job scheduling, and allow for easy computing on large volumes of data. To this end, we have developed the Rabix Executor a , an open-source workflow engine for the purposes of improving reproducibility through reusability and interoperability of workflow descriptions. PMID:27896971
Kaushik, Gaurav; Ivkovic, Sinisa; Simonovic, Janko; Tijanic, Nebojsa; Davis-Dusenbery, Brandi; Kural, Deniz
2017-01-01
As biomedical data has become increasingly easy to generate in large quantities, the methods used to analyze it have proliferated rapidly. Reproducible and reusable methods are required to learn from large volumes of data reliably. To address this issue, numerous groups have developed workflow specifications or execution engines, which provide a framework with which to perform a sequence of analyses. One such specification is the Common Workflow Language, an emerging standard which provides a robust and flexible framework for describing data analysis tools and workflows. In addition, reproducibility can be furthered by executors or workflow engines which interpret the specification and enable additional features, such as error logging, file organization, optim1izations to computation and job scheduling, and allow for easy computing on large volumes of data. To this end, we have developed the Rabix Executor, an open-source workflow engine for the purposes of improving reproducibility through reusability and interoperability of workflow descriptions.
The future of scientific workflows
DOE Office of Scientific and Technical Information (OSTI.GOV)
Deelman, Ewa; Peterka, Tom; Altintas, Ilkay
Today’s computational, experimental, and observational sciences rely on computations that involve many related tasks. The success of a scientific mission often hinges on the computer automation of these workflows. In April 2015, the US Department of Energy (DOE) invited a diverse group of domain and computer scientists from national laboratories supported by the Office of Science, the National Nuclear Security Administration, from industry, and from academia to review the workflow requirements of DOE’s science and national security missions, to assess the current state of the art in science workflows, to understand the impact of emerging extreme-scale computing systems on thosemore » workflows, and to develop requirements for automated workflow management in future and existing environments. This article is a summary of the opinions of over 50 leading researchers attending this workshop. We highlight use cases, computing systems, workflow needs and conclude by summarizing the remaining challenges this community sees that inhibit large-scale scientific workflows from becoming a mainstream tool for extreme-scale science.« less
NASA Astrophysics Data System (ADS)
Wang, Ximing; Martinez, Clarisa; Wang, Jing; Liu, Ye; Liu, Brent
2014-03-01
Clinical trials usually have a demand to collect, track and analyze multimedia data according to the workflow. Currently, the clinical trial data management requirements are normally addressed with custom-built systems. Challenges occur in the workflow design within different trials. The traditional pre-defined custom-built system is usually limited to a specific clinical trial and normally requires time-consuming and resource-intensive software development. To provide a solution, we present a user customizable imaging informatics-based intelligent workflow engine system for managing stroke rehabilitation clinical trials with intelligent workflow. The intelligent workflow engine provides flexibility in building and tailoring the workflow in various stages of clinical trials. By providing a solution to tailor and automate the workflow, the system will save time and reduce errors for clinical trials. Although our system is designed for clinical trials for rehabilitation, it may be extended to other imaging based clinical trials as well.
The standard-based open workflow system in GeoBrain (Invited)
NASA Astrophysics Data System (ADS)
Di, L.; Yu, G.; Zhao, P.; Deng, M.
2013-12-01
GeoBrain is an Earth science Web-service system developed and operated by the Center for Spatial Information Science and Systems, George Mason University. In GeoBrain, a standard-based open workflow system has been implemented to accommodate the automated processing of geospatial data through a set of complex geo-processing functions for advanced production generation. The GeoBrain models the complex geoprocessing at two levels, the conceptual and concrete. At the conceptual level, the workflows exist in the form of data and service types defined by ontologies. The workflows at conceptual level are called geo-processing models and cataloged in GeoBrain as virtual product types. A conceptual workflow is instantiated into a concrete, executable workflow when a user requests a product that matches a virtual product type. Both conceptual and concrete workflows are encoded in Business Process Execution Language (BPEL). A BPEL workflow engine, called BPELPower, has been implemented to execute the workflow for the product generation. A provenance capturing service has been implemented to generate the ISO 19115-compliant complete product provenance metadata before and after the workflow execution. The generation of provenance metadata before the workflow execution allows users to examine the usability of the final product before the lengthy and expensive execution takes place. The three modes of workflow executions defined in the ISO 19119, transparent, translucent, and opaque, are available in GeoBrain. A geoprocessing modeling portal has been developed to allow domain experts to develop geoprocessing models at the type level with the support of both data and service/processing ontologies. The geoprocessing models capture the knowledge of the domain experts and are become the operational offering of the products after a proper peer review of models is conducted. An automated workflow composition has been experimented successfully based on ontologies and artificial intelligence technology. The GeoBrain workflow system has been used in multiple Earth science applications, including the monitoring of global agricultural drought, the assessment of flood damage, the derivation of national crop condition and progress information, and the detection of nuclear proliferation facilities and events.
A characterization of workflow management systems for extreme-scale applications
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ferreira da Silva, Rafael; Filgueira, Rosa; Pietri, Ilia
We present that the automation of the execution of computational tasks is at the heart of improving scientific productivity. Over the last years, scientific workflows have been established as an important abstraction that captures data processing and computation of large and complex scientific applications. By allowing scientists to model and express entire data processing steps and their dependencies, workflow management systems relieve scientists from the details of an application and manage its execution on a computational infrastructure. As the resource requirements of today’s computational and data science applications that process vast amounts of data keep increasing, there is a compellingmore » case for a new generation of advances in high-performance computing, commonly termed as extreme-scale computing, which will bring forth multiple challenges for the design of workflow applications and management systems. This paper presents a novel characterization of workflow management systems using features commonly associated with extreme-scale computing applications. We classify 15 popular workflow management systems in terms of workflow execution models, heterogeneous computing environments, and data access methods. Finally, the paper also surveys workflow applications and identifies gaps for future research on the road to extreme-scale workflows and management systems.« less
A characterization of workflow management systems for extreme-scale applications
Ferreira da Silva, Rafael; Filgueira, Rosa; Pietri, Ilia; ...
2017-02-16
We present that the automation of the execution of computational tasks is at the heart of improving scientific productivity. Over the last years, scientific workflows have been established as an important abstraction that captures data processing and computation of large and complex scientific applications. By allowing scientists to model and express entire data processing steps and their dependencies, workflow management systems relieve scientists from the details of an application and manage its execution on a computational infrastructure. As the resource requirements of today’s computational and data science applications that process vast amounts of data keep increasing, there is a compellingmore » case for a new generation of advances in high-performance computing, commonly termed as extreme-scale computing, which will bring forth multiple challenges for the design of workflow applications and management systems. This paper presents a novel characterization of workflow management systems using features commonly associated with extreme-scale computing applications. We classify 15 popular workflow management systems in terms of workflow execution models, heterogeneous computing environments, and data access methods. Finally, the paper also surveys workflow applications and identifies gaps for future research on the road to extreme-scale workflows and management systems.« less
Scheduling Multilevel Deadline-Constrained Scientific Workflows on Clouds Based on Cost Optimization
Malawski, Maciej; Figiela, Kamil; Bubak, Marian; ...
2015-01-01
This paper presents a cost optimization model for scheduling scientific workflows on IaaS clouds such as Amazon EC2 or RackSpace. We assume multiple IaaS clouds with heterogeneous virtual machine instances, with limited number of instances per cloud and hourly billing. Input and output data are stored on a cloud object store such as Amazon S3. Applications are scientific workflows modeled as DAGs as in the Pegasus Workflow Management System. We assume that tasks in the workflows are grouped into levels of identical tasks. Our model is specified using mathematical programming languages (AMPL and CMPL) and allows us to minimize themore » cost of workflow execution under deadline constraints. We present results obtained using our model and the benchmark workflows representing real scientific applications in a variety of domains. The data used for evaluation come from the synthetic workflows and from general purpose cloud benchmarks, as well as from the data measured in our own experiments with Montage, an astronomical application, executed on Amazon EC2 cloud. We indicate how this model can be used for scenarios that require resource planning for scientific workflows and their ensembles.« less
Metaworkflows and Workflow Interoperability for Heliophysics
NASA Astrophysics Data System (ADS)
Pierantoni, Gabriele; Carley, Eoin P.
2014-06-01
Heliophysics is a relatively new branch of physics that investigates the relationship between the Sun and the other bodies of the solar system. To investigate such relationships, heliophysicists can rely on various tools developed by the community. Some of these tools are on-line catalogues that list events (such as Coronal Mass Ejections, CMEs) and their characteristics as they were observed on the surface of the Sun or on the other bodies of the Solar System. Other tools offer on-line data analysis and access to images and data catalogues. During their research, heliophysicists often perform investigations that need to coordinate several of these services and to repeat these complex operations until the phenomena under investigation are fully analyzed. Heliophysicists combine the results of these services; this service orchestration is best suited for workflows. This approach has been investigated in the HELIO project. The HELIO project developed an infrastructure for a Virtual Observatory for Heliophysics and implemented service orchestration using TAVERNA workflows. HELIO developed a set of workflows that proved to be useful but lacked flexibility and re-usability. The TAVERNA workflows also needed to be executed directly in TAVERNA workbench, and this forced all users to learn how to use the workbench. Within the SCI-BUS and ER-FLOW projects, we have started an effort to re-think and re-design the heliophysics workflows with the aim of fostering re-usability and ease of use. We base our approach on two key concepts, that of meta-workflows and that of workflow interoperability. We have divided the produced workflows in three different layers. The first layer is Basic Workflows, developed both in the TAVERNA and WS-PGRADE languages. They are building blocks that users compose to address their scientific challenges. They implement well-defined Use Cases that usually involve only one service. The second layer is Science Workflows usually developed in TAVERNA. They- implement Science Cases (the definition of a scientific challenge) by composing different Basic Workflows. The third and last layer,Iterative Science Workflows, is developed in WSPGRADE. It executes sub-workflows (either Basic or Science Workflows) as parameter sweep jobs to investigate Science Cases on large multiple data sets. So far, this approach has proven fruitful for three Science Cases of which one has been completed and two are still being tested.
Globus | Informatics Technology for Cancer Research (ITCR)
Globus software services provide secure cancer research data transfer, synchronization, and sharing in distributed environments at large scale. These services can be integrated into applications and research data gateways, leveraging Globus identity management, single sign-on, search, and authorization capabilities. Globus Genomics integrates Globus with the Galaxy genomics workflow engine and Amazon Web Services to enable cancer genomics analysis that can elastically scale compute resources with demand.
Ho, Jonhan; Aridor, Orly; Glinski, David W.; Saylor, Christopher D.; Pelletier, Joseph P.; Selby, Dale M.; Davis, Steven W.; Lancia, Nicholas; Gerlach, Christopher B.; Newberry, Jonathan; Anthony, Leslie; Pantanowitz, Liron; Parwani, Anil V.
2013-01-01
Background: Advances in digital pathology are accelerating integration of this technology into anatomic pathology (AP). To optimize implementation and adoption of digital pathology systems within a large healthcare organization, initial assessment of both end user (pathologist) needs and organizational infrastructure are required. Contextual inquiry is a qualitative, user-centered tool for collecting, interpreting, and aggregating such detailed data about work practices that can be employed to help identify specific needs and requirements. Aim: Using contextual inquiry, the objective of this study was to identify the unique work practices and requirements in AP for the United States (US) Air Force Medical Service (AFMS) that had to be targeted in order to support their transition to digital pathology. Subjects and Methods: A pathology-centered observer team conducted 1.5 h interviews with a total of 24 AFMS pathologists and histology lab personnel at three large regional centers and one smaller peripheral AFMS pathology center using contextual inquiry guidelines. Findings were documented as notes and arranged into a hierarchal organization of common themes based on user-provided data, defined as an affinity diagram. These data were also organized into consolidated graphic models that characterized AFMS pathology work practices, structure, and requirements. Results: Over 1,200 recorded notes were grouped into an affinity diagram composed of 27 third-level, 10 second-level, and five main-level (workflow and workload distribution, quality, communication, military culture, and technology) categories. When combined with workflow and cultural models, the findings revealed that AFMS pathologists had needs that were unique to their military setting, when compared to civilian pathologists. These unique needs included having to serve a globally distributed patient population, transient staff, but a uniform information technology (IT) structure. Conclusions: The contextual inquiry method helped reveal similarities and key differences with civilian pathologists. Such an analysis helped identify specific instances that would benefit from implementing digital pathology in a military environment. Employing digital pathology to facilitate workload distribution, secondary consultations, and quality assurance (over-reads) could help the AFMS deliver more accurate, efficient, and timely AP services at a global level. PMID:24392246
Wu, Danny T Y; Smart, Nikolas; Ciemins, Elizabeth L; Lanham, Holly J; Lindberg, Curt; Zheng, Kai
2017-01-01
To develop a workflow-supported clinical documentation system, it is a critical first step to understand clinical workflow. While Time and Motion studies has been regarded as the gold standard of workflow analysis, this method can be resource consuming and its data may be biased due to the cognitive limitation of human observers. In this study, we aimed to evaluate the feasibility and validity of using EHR audit trail logs to analyze clinical workflow. Specifically, we compared three known workflow changes from our previous study with the corresponding EHR audit trail logs of the study participants. The results showed that EHR audit trail logs can be a valid source for clinical workflow analysis, and can provide an objective view of clinicians' behaviors, multi-dimensional comparisons, and a highly extensible analysis framework.
Schweitzer, M; Lasierra, N; Hoerbst, A
2015-01-01
Increasing the flexibility from a user-perspective and enabling a workflow based interaction, facilitates an easy user-friendly utilization of EHRs for healthcare professionals' daily work. To offer such versatile EHR-functionality, our approach is based on the execution of clinical workflows by means of a composition of semantic web-services. The backbone of such architecture is an ontology which enables to represent clinical workflows and facilitates the selection of suitable services. In this paper we present the methods and results after running observations of diabetes routine consultations which were conducted in order to identify those workflows and the relation among the included tasks. Mentioned workflows were first modeled by BPMN and then generalized. As a following step in our study, interviews will be conducted with clinical personnel to validate modeled workflows.
MolabIS--an integrated information system for storing and managing molecular genetics data.
Truong, Cong V C; Groeneveld, Linn F; Morgenstern, Burkhard; Groeneveld, Eildert
2011-10-31
Long-term sample storage, tracing of data flow and data export for subsequent analyses are of great importance in genetics studies. Therefore, molecular labs do need a proper information system to handle an increasing amount of data from different projects. We have developed a molecular labs information management system (MolabIS). It was implemented as a web-based system allowing the users to capture original data at each step of their workflow. MolabIS provides essential functionality for managing information on individuals, tracking samples and storage locations, capturing raw files, importing final data from external files, searching results, accessing and modifying data. Further important features are options to generate ready-to-print reports and convert sequence and microsatellite data into various data formats, which can be used as input files in subsequent analyses. Moreover, MolabIS also provides a tool for data migration. MolabIS is designed for small-to-medium sized labs conducting Sanger sequencing and microsatellite genotyping to store and efficiently handle a relative large amount of data. MolabIS not only helps to avoid time consuming tasks but also ensures the availability of data for further analyses. The software is packaged as a virtual appliance which can run on different platforms (e.g. Linux, Windows). MolabIS can be distributed to a wide range of molecular genetics labs since it was developed according to a general data model. Released under GPL, MolabIS is freely available at http://www.molabis.org.
MolabIS - An integrated information system for storing and managing molecular genetics data
2011-01-01
Background Long-term sample storage, tracing of data flow and data export for subsequent analyses are of great importance in genetics studies. Therefore, molecular labs do need a proper information system to handle an increasing amount of data from different projects. Results We have developed a molecular labs information management system (MolabIS). It was implemented as a web-based system allowing the users to capture original data at each step of their workflow. MolabIS provides essential functionality for managing information on individuals, tracking samples and storage locations, capturing raw files, importing final data from external files, searching results, accessing and modifying data. Further important features are options to generate ready-to-print reports and convert sequence and microsatellite data into various data formats, which can be used as input files in subsequent analyses. Moreover, MolabIS also provides a tool for data migration. Conclusions MolabIS is designed for small-to-medium sized labs conducting Sanger sequencing and microsatellite genotyping to store and efficiently handle a relative large amount of data. MolabIS not only helps to avoid time consuming tasks but also ensures the availability of data for further analyses. The software is packaged as a virtual appliance which can run on different platforms (e.g. Linux, Windows). MolabIS can be distributed to a wide range of molecular genetics labs since it was developed according to a general data model. Released under GPL, MolabIS is freely available at http://www.molabis.org. PMID:22040322
Shao, Shuai; Hu, Bifeng; Fu, Zhiyi; Wang, Jiayu; Lou, Ge; Zhou, Yue; Jin, Bin; Li, Yan; Shi, Zhou
2018-06-12
Trace elements pollution has attracted a lot of attention worldwide. However, it is difficult to identify and apportion the sources of multiple element pollutants over large areas because of the considerable spatial complexity and variability in the distribution of trace elements in soil. In this study, we collected total of 2051 topsoil (0⁻20 cm) samples, and analyzed the general pollution status of soils from the Yangtze River Delta, Southeast China. We applied principal component analysis (PCA), a finite mixture distribution model (FMDM), and geostatistical tools to identify and quantitatively apportion the sources of seven kinds of trace elements (chromium (Cr), cadmium (Cd), mercury (Hg), copper (Cu), zinc (Zn), nickel (Ni), and arsenic (As)) in soil. The PCA results indicated that the trace elements in soil in the study area were mainly from natural, multi-pollutant and industrial sources. The FMDM also fitted three sub log-normal distributions. The results from the two models were quite similar: Cr, As, and Ni were mainly from natural sources caused by parent material weathering; Cd, Cu, and Zu were mainly from mixed sources, with a considerable portion from anthropogenic activities such as traffic pollutants, domestic garbage, and agricultural inputs, and Hg was mainly from industrial wastes and pollutants.
Rates of zinc and trace metal release from dissolving sphalerite at pH 2.0-4.0
Stanton, M.R.; Gemery-Hill, P. A.; Shanks, Wayne C.; Taylor, C.D.
2008-01-01
High-Fe and low-Fe sphalerite samples were reacted under controlled pH conditions to determine nonoxidative rates of release of Zn and trace metals from the solid-phase. The release (solubilization) of trace metals from dissolving sphalerite to the aqueous phase can be characterized by a kinetic distribution coefficient, (Dtr), which is defined as [(Rtr/X(tr)Sph)/(RZn/X(Zn) Sph)], where R is the trace metal or Zn release rate, and X is the mole fraction of the trace metal or Zn in sphalerite. This coefficient describes the relationship of the sphalerite dissolution rate to the trace metal mole fraction in the solid and its aqueous concentration. The distribution was used to determine some controls on metal release during the dissolution of sphalerite. Departures from the ideal Dtr of 1.0 suggest that some trace metals may be released via different pathways or that other processes (e.g., adsorption, solubility of trace minerals such as galena) affect the observed concentration of metals. Nonoxidative sphalerite dissolution (mediated by H+) is characterized by a "fast" stage in the first 24-30 h, followed by a "slow" stage for the remainder of the reaction. Over the pH range 2.0-4.0, and for similar extent of reaction (reaction time), sphalerite composition, and surface area, the rates of release of Zn, Fe, Cd, Cu, Mn and Pb from sphalerite generally increase with lower pH. Zinc and Fe exhibit the fastest rates of release, Mn and Pb have intermediate rates of release, and Cd and Cu show the slowest rates of release. The largest variations in metal release rates occur at pH 2.0. At pH 3.0 and 4.0, release rates show less variation and appear less dependent on the metal abundance in the solid. For the same extent of reaction (100 h), rates of Zn release range from 1.53 ?? 10-11 to 5.72 ?? 10-10 mol/m2/s; for Fe, the range is from 4.59 ?? 10-13 to 1.99 ?? 10-10 mol/m2/s. Trace metal release rates are generally 1-5 orders of magnitude slower than the Zn or Fe rates. Results indicate that the distributions of Fe and Cd are directly related to the rate of sphalerite dissolution throughout the reaction at pH 3.0 and 4.0 because these two elements substitute readily into sphalerite. These two metals are likely to be more amenable to usage in predictive acid dissolution models because of this behavior. The Pb distribution shows no strong relation to sphalerite dissolution and appears to be controlled by pH-dependent solubility, most likely related to trace amounts of galena. The distribution of Cu is similar to that of Fe but is the most-dependent of all metals on its mole fraction ratio (Zn:Cu) in sphalerite. The Mn distributions suggest an increase in the rate of Mn release relative to sphalerite dissolution occurs in low Mn samples as pH increases. The Mn distribution in high Mn samples is nearly independent of pH and sphalerite dissolution at pH 2.0 but shows a dependence on these two parameters at higher pH (3.0-4.0).
Distribution of Cr, Pb, Cd, Zn, Fe and Mn in Lake Victoria sediments, East Africa
DOE Office of Scientific and Technical Information (OSTI.GOV)
Onyari, J.M.; Wandiga, S.O.
1989-06-01
The presence of many metals at trace or ultra-trace levels in the human environment has received increased global attention. Sediments as a sink for pollutants are widely recognized pollution sources and diagenesis and biochemical transformations within the sediment may mobilize pollutants posing a threat to a wider biological community. The natural (background) concentrations of heavy metals in lake sediments can be estimated either by analysis of surface sediments in non-polluted regions or by analysis of core samples antedating modern pollution. The distribution pattern of heavy metals in tropical freshwater systems has been little studied. The authors found increased concentrations ofmore » lead and other trace metals in Lake Victoria. Thus this study was initiated in order to further investigate the distribution patterns of lead and other metals in Lake Victoria.« less
Tracing Success: Graphical Methods for Analysing Successful Collaborative Problem Solving
ERIC Educational Resources Information Center
Joiner, Richard; Issroff, Kim
2003-01-01
The aim of this paper is to evaluate the use of trace diagrams for analysing collaborative problem solving. The paper describes a study where trace diagrams were used to analyse joint navigation in a virtual environment. Ten pairs of undergraduates worked together on a distributed virtual task to collect five flowers using two bees with each…
NASA Astrophysics Data System (ADS)
Langhammer, Jakub; Lendzioch, Theodora; Mirijovsky, Jakub
2016-04-01
Granulometric analysis represents a traditional, important and for the description of sedimentary material substantial method with various applications in sedimentology, hydrology and geomorphology. However, the conventional granulometric field survey methods are time consuming, laborious, costly and are invasive to the surface being sampled, which can be limiting factor for their applicability in protected areas.. The optical granulometry has recently emerged as an image analysis technique, enabling non-invasive survey, employing semi-automated identification of clasts from calibrated digital imagery, taken on site by conventional high resolution digital camera and calibrated frame. The image processing allows detection and measurement of mixed size natural grains, their sorting and quantitative analysis using standard granulometric approaches. Despite known limitations, the technique today presents reliable tool, significantly easing and speeding the field survey in fluvial geomorphology. However, the nature of such survey has still limitations in spatial coverage of the sites and applicability in research at multitemporal scale. In our study, we are presenting novel approach, based on fusion of two image analysis techniques - optical granulometry and UAV-based photogrammetry, allowing to bridge the gap between the needs of high resolution structural information for granulometric analysis and spatially accurate and data coverage. We have developed and tested a workflow that, using UAV imaging platform enabling to deliver seamless, high resolution and spatially accurate imagery of the study site from which can be derived the granulometric properties of the sedimentary material. We have set up a workflow modeling chain, providing (i) the optimum flight parameters for UAV imagery to balance the two key divergent requirements - imagery resolution and seamless spatial coverage, (ii) the workflow for the processing of UAV acquired imagery by means of the optical granulometry and (iii) the workflow for analysis of spatial distribution and temporal changes of granulometric properties across the point bar. The proposed technique was tested on a case study of an active point bar of mid-latitude mountain stream at Sumava mountains, Czech Republic, exposed to repeated flooding. The UAV photogrammetry was used to acquire very high resolution imagery to build high-precision digital terrain models and orthoimage. The orthoimage was then analyzed using the digital optical granulometric tool BaseGrain. This approach allowed us (i) to analyze the spatial distribution of the grain size in a seamless transects over an active point bar and (ii) to assess the multitemporal changes of granulometric properties of the point bar material resulting from flooding. The tested framework prove the applicability of the proposed method for granulometric analysis with accuracy comparable with field optical granulometry. The seamless nature of the data enables to study spatial distribution of granulometric properties across the study sites as well as the analysis of multitemporal changes, resulting from repeated imaging.
Liebe, J D; Hübner, U; Straede, M C; Thye, J
2015-01-01
Availability and usage of individual IT applications have been studied intensively in the past years. Recently, IT support of clinical processes is attaining increasing attention. The underlying construct that describes the IT support of clinical workflows is clinical information logistics. This construct needs to be better understood, operationalised and measured. It is therefore the aim of this study to propose and develop a workflow composite score (WCS) for measuring clinical information logistics and to examine its quality based on reliability and validity analyses. We largely followed the procedural model of MacKenzie and colleagues (2011) for defining and conceptualising the construct domain, for developing the measurement instrument, assessing the content validity, pretesting the instrument, specifying the model, capturing the data and computing the WCS and testing the reliability and validity. Clinical information logistics was decomposed into the descriptors data and information, function, integration and distribution, which embraced the framework validated by an analysis of the international literature. This framework was refined selecting representative clinical processes. We chose ward rounds, pre- and post-surgery processes and discharge as sample processes that served as concrete instances for the measurements. They are sufficiently complex, represent core clinical processes and involve different professions, departments and settings. The score was computed on the basis of data from 183 hospitals of different size, ownership, location and teaching status. Testing the reliability and validity yielded encouraging results: the reliability was high with r(split-half) = 0.89, the WCS discriminated between groups; the WCS correlated significantly and moderately with two EHR models and the WCS received good evaluation results by a sample of chief information officers (n = 67). These findings suggest the further utilisation of the WCS. As the WCS does not assume ideal workflows as a gold standard but measures IT support of clinical workflows according to validated descriptors a high portability of the WCS to other hospitals in other countries is very likely. The WCS will contribute to a better understanding of the construct clinical information logistics.
A three-level atomicity model for decentralized workflow management systems
NASA Astrophysics Data System (ADS)
Ben-Shaul, Israel Z.; Heineman, George T.
1996-12-01
A workflow management system (WFMS) employs a workflow manager (WM) to execute and automate the various activities within a workflow. To protect the consistency of data, the WM encapsulates each activity with a transaction; a transaction manager (TM) then guarantees the atomicity of activities. Since workflows often group several activities together, the TM is responsible for guaranteeing the atomicity of these units. There are scalability issues, however, with centralized WFMSs. Decentralized WFMSs provide an architecture for multiple autonomous WFMSs to interoperate, thus accommodating multiple workflows and geographically-dispersed teams. When atomic units are composed of activities spread across multiple WFMSs, however, there is a conflict between global atomicity and local autonomy of each WFMS. This paper describes a decentralized atomicity model that enables workflow administrators to specify the scope of multi-site atomicity based upon the desired semantics of multi-site tasks in the decentralized WFMS. We describe an architecture that realizes our model and execution paradigm.
NASA Astrophysics Data System (ADS)
Mittempergher, Silvia; Vho, Alice; Bistacchi, Andrea
2016-04-01
A quantitative analysis of fault-rock distribution in outcrops of exhumed fault zones is of fundamental importance for studies of fault zone architecture, fault and earthquake mechanics, and fluid circulation. We present a semi-automatic workflow for fault-rock mapping on a Digital Outcrop Model (DOM), developed on the Gole Larghe Fault Zone (GLFZ), a well exposed strike-slip fault in the Adamello batholith (Italian Southern Alps). The GLFZ has been exhumed from ca. 8-10 km depth, and consists of hundreds of individual seismogenic slip surfaces lined by green cataclasites (crushed wall rocks cemented by the hydrothermal epidote and K-feldspar) and black pseudotachylytes (solidified frictional melts, considered as a marker for seismic slip). A digital model of selected outcrop exposures was reconstructed with photogrammetric techniques, using a large number of high resolution digital photographs processed with VisualSFM software. The resulting DOM has a resolution up to 0.2 mm/pixel. Most of the outcrop was imaged using images each one covering a 1 x 1 m2 area, while selected structural features, such as sidewall ripouts or stepovers, were covered with higher-resolution images covering 30 x 40 cm2 areas.Image processing algorithms were preliminarily tested using the ImageJ-Fiji package, then a workflow in Matlab was developed to process a large collection of images sequentially. Particularly in detailed 30 x 40 cm images, cataclasites and hydrothermal veins were successfully identified using spectral analysis in RGB and HSV color spaces. This allows mapping the network of cataclasites and veins which provided the pathway for hydrothermal fluid circulation, and also the volume of mineralization, since we are able to measure the thickness of cataclasites and veins on the outcrop surface. The spectral signature of pseudotachylyte veins is indistinguishable from that of biotite grains in the wall rock (tonalite), so we tested morphological analysis tools to discriminate them with respect to biotite. In higher resolution images this could be performed using circularity and size thresholds, however this could not be easily implemented in an automated procedure since the thresholds must be varied by the interpreter almost for each image. In 1 x 1 m images the resolution is generally too low to distinguish cataclasite and pseudotachylyte, so most of the time fault rocks were treated together. For this analysis we developed a fully automated workflow that, after applying noise correction, classification and skeletonization algorithms, returns labeled edge images of fault segments together with vector polylines associated to edge properties. Vector and edge properties represent a useful format to perform further quantitative analysis, for instance for classifying fault segments based on structural criteria, detect continuous fault traces, and detect the kind of termination of faults/fractures. This approach allows to collect statistically relevant datasets useful for further quantitative structural analysis.
A Tool Supporting Collaborative Data Analytics Workflow Design and Management
NASA Astrophysics Data System (ADS)
Zhang, J.; Bao, Q.; Lee, T. J.
2016-12-01
Collaborative experiment design could significantly enhance the sharing and adoption of the data analytics algorithms and models emerged in Earth science. Existing data-oriented workflow tools, however, are not suitable to support collaborative design of such a workflow, to name a few, to support real-time co-design; to track how a workflow evolves over time based on changing designs contributed by multiple Earth scientists; and to capture and retrieve collaboration knowledge on workflow design (discussions that lead to a design). To address the aforementioned challenges, we have designed and developed a technique supporting collaborative data-oriented workflow composition and management, as a key component toward supporting big data collaboration through the Internet. Reproducibility and scalability are two major targets demanding fundamental infrastructural support. One outcome of the project os a software tool, supporting an elastic number of groups of Earth scientists to collaboratively design and compose data analytics workflows through the Internet. Instead of recreating the wheel, we have extended an existing workflow tool VisTrails into an online collaborative environment as a proof of concept.
Ray tracing the Wigner distribution function for optical simulations
NASA Astrophysics Data System (ADS)
Mout, Marco; Wick, Michael; Bociort, Florian; Petschulat, Joerg; Urbach, Paul
2018-01-01
We study a simulation method that uses the Wigner distribution function to incorporate wave optical effects in an established framework based on geometrical optics, i.e., a ray tracing engine. We use the method to calculate point spread functions and show that it is accurate for paraxial systems but produces unphysical results in the presence of aberrations. The cause of these anomalies is explained using an analytical model.
QMachine: commodity supercomputing in web browsers.
Wilkinson, Sean R; Almeida, Jonas S
2014-06-09
Ongoing advancements in cloud computing provide novel opportunities in scientific computing, especially for distributed workflows. Modern web browsers can now be used as high-performance workstations for querying, processing, and visualizing genomics' "Big Data" from sources like The Cancer Genome Atlas (TCGA) and the International Cancer Genome Consortium (ICGC) without local software installation or configuration. The design of QMachine (QM) was driven by the opportunity to use this pervasive computing model in the context of the Web of Linked Data in Biomedicine. QM is an open-sourced, publicly available web service that acts as a messaging system for posting tasks and retrieving results over HTTP. The illustrative application described here distributes the analyses of 20 Streptococcus pneumoniae genomes for shared suffixes. Because all analytical and data retrieval tasks are executed by volunteer machines, few server resources are required. Any modern web browser can submit those tasks and/or volunteer to execute them without installing any extra plugins or programs. A client library provides high-level distribution templates including MapReduce. This stark departure from the current reliance on expensive server hardware running "download and install" software has already gathered substantial community interest, as QM received more than 2.2 million API calls from 87 countries in 12 months. QM was found adequate to deliver the sort of scalable bioinformatics solutions that computation- and data-intensive workflows require. Paradoxically, the sandboxed execution of code by web browsers was also found to enable them, as compute nodes, to address critical privacy concerns that characterize biomedical environments.
Electric Power Distribution System Model Simplification Using Segment Substitution
DOE Office of Scientific and Technical Information (OSTI.GOV)
Reiman, Andrew P.; McDermott, Thomas E.; Akcakaya, Murat
Quasi-static time-series (QSTS) simulation is used to simulate the behavior of distribution systems over long periods of time (typically hours to years). The technique involves repeatedly solving the load-flow problem for a distribution system model and is useful for distributed energy resource (DER) planning. When a QSTS simulation has a small time step and a long duration, the computational burden of the simulation can be a barrier to integration into utility workflows. One way to relieve the computational burden is to simplify the system model. The segment substitution method of simplifying distribution system models introduced in this paper offers modelmore » bus reduction of up to 98% with a simplification error as low as 0.2% (0.002 pu voltage). In contrast to existing methods of distribution system model simplification, which rely on topological inspection and linearization, the segment substitution method uses black-box segment data and an assumed simplified topology.« less
Jones, Ryan T; Handsfield, Lydia; Read, Paul W; Wilson, David D; Van Ausdal, Ray; Schlesinger, David J; Siebers, Jeffrey V; Chen, Quan
2015-01-01
The clinical challenge of radiation therapy (RT) for painful bone metastases requires clinicians to consider both treatment efficacy and patient prognosis when selecting a radiation therapy regimen. The traditional RT workflow requires several weeks for common palliative RT schedules of 30 Gy in 10 fractions or 20 Gy in 5 fractions. At our institution, we have created a new RT workflow termed "STAT RAD" that allows clinicians to perform computed tomographic (CT) simulation, planning, and highly conformal single fraction treatment delivery within 2 hours. In this study, we evaluate the safety and feasibility of the STAT RAD workflow. A failure mode and effects analysis (FMEA) was performed on the STAT RAD workflow, including development of a process map, identification of potential failure modes, description of the cause and effect, temporal occurrence, and team member involvement in each failure mode, and examination of existing safety controls. A risk probability number (RPN) was calculated for each failure mode. As necessary, workflow adjustments were then made to safeguard failure modes of significant RPN values. After workflow alterations, RPN numbers were again recomputed. A total of 72 potential failure modes were identified in the pre-FMEA STAT RAD workflow, of which 22 met the RPN threshold for clinical significance. Workflow adjustments included the addition of a team member checklist, changing simulation from megavoltage CT to kilovoltage CT, alteration of patient-specific quality assurance testing, and allocating increased time for critical workflow steps. After these modifications, only 1 failure mode maintained RPN significance; patient motion after alignment or during treatment. Performing the FMEA for the STAT RAD workflow before clinical implementation has significantly strengthened the safety and feasibility of STAT RAD. The FMEA proved a valuable evaluation tool, identifying potential problem areas so that we could create a safer workflow. Copyright © 2015 American Society for Radiation Oncology. Published by Elsevier Inc. All rights reserved.
Seamless Provenance Representation and Use in Collaborative Science Scenarios
NASA Astrophysics Data System (ADS)
Missier, P.; Ludaescher, B.; Bowers, S.; Altintas, I.; Anand, M. K.; Dey, S.; Sarkar, A.; Shrestha, B.; Goble, C.
2010-12-01
The notion of sharing scientific data has only recently begun to gain ground in science, where data is still considered a private asset. There is growing evidence, however, that the benefits of scientific collaboration through early data sharing during the course of a science project may outgrow the risk of losing exclusive ownership of the data. As exemplar success stories are making the headlines[1], principles of effective information sharing have become the subject of e-science research. In particular, any piece of published data should be self-describing, to the extent necessary for consumers to determine its suitability for reuse in their own projects. This is accomplished by associating a body of formally specified and machine-processable metadata to the data. When data is produced and reused by independent groups, however, metadata interoperability issues emerge. This is the case for provenance, a form of metadata that describes the history of a data product, Y. Provenance is typically expressed as a graph-structured set of dependencies that account for the sequence of computational or interactive steps that led to Y, often starting from some primary, observational data. Traversing dependency graphs is one of the mechanisms used to answer questions on data reliability. In the context of the NSF DataONE project[2], we have been studying issues of provenance interoperability in scientific collaboration scenarios. Consider a first scientist, Alice, who publishes a data product X along with its provenance, and a second scientist who further transforms X into a new product Y, also along with its provenance. A third scientist, who is interested in Y, expects to be able to trace Y's history up to the inputs used by Alice. This is only possible, however, if provenance accumulates into a single, uniform graph that can be seamlessly traversed. This becomes problematic when provenance is captured using different tools and computational models (i.e. workflow systems), as well as when data is published and reused using mechanisms that are not provenance-aware. In this presentation we discuss requirements for ensuring provenance-aware data publishing and reuse, and describe the design and implementation of a prototype toolkit that involves two specific, and broadly used, workflow models, Kepler [3] and Taverna [4]. The implementation is expected to be adopted as part of DataONE's investigators' toolkit, in support of its mission of large-scale data preservation. Refs. [1]Sharing of Data Leads to Progress on Alzheimer’s, G. Kolata, NYT, 8/12/2010 [2]http://www.dataone.org [3]Ludaescher B., Altintas I. et al. Scientific Workflow Management and the Kepler System. Special Issue: Workflow in Grid Systems. Concurrency and Computation: Practice & Experience 18(10): 1039-1065, 2006 [4]D. Hull, K. Wolstencroft, R. Stevens, C. Goble, M. R. Pocock, P. Li, T. Oinn. Taverna: a tool for building and running workflows of services. Nucl. Acids Res. 34: W729-W732, 2006
Federal Register 2010, 2011, 2012, 2013, 2014
2011-11-21
... Defense Federal Acquisition Regulation Supplement; Updates to Wide Area WorkFlow (DFARS Case 2011-D027... Wide Area WorkFlow (WAWF) and TRICARE Encounter Data System (TEDS). WAWF, which electronically... civil emergencies, when access to Wide Area WorkFlow by those contractors is not feasible; (4) Purchases...
An Auto-management Thesis Program WebMIS Based on Workflow
NASA Astrophysics Data System (ADS)
Chang, Li; Jie, Shi; Weibo, Zhong
An auto-management WebMIS based on workflow for bachelor thesis program is given in this paper. A module used for workflow dispatching is designed and realized using MySQL and J2EE according to the work principle of workflow engine. The module can automatively dispatch the workflow according to the date of system, login information and the work status of the user. The WebMIS changes the management from handwork to computer-work which not only standardizes the thesis program but also keeps the data and documents clean and consistent.
Piper, David Z.; Skorupa, J.P.; Presser, T.S.; Hardy, M.A.; Hamilton, S.J.; Huebner, M.; Gulbrandsen, R.A.
2000-01-01
Major-element oxides and trace elements in the Phosphoria Formation at the Hot Springs Mine, Idaho were determined by a series of techniques. In this report, we examine the distribution of trace elements between the different solid components aluminosilicates, apatite, organic matter, opal, calcite, and dolomite that largely make up the rocks. High concentrations of several trace elements throughout the deposit, for example, As, Cd, Se, Tl, and U, at this and previously examined sites have raised concern about their introduction into the environment via weathering and the degree to which mining and the disposal of mined waste rock from this deposit might be accelerating that process. The question addressed here is how might the partitioning of trace elements between these solid host components influence the introduction of trace elements into ground water, surface water, and eventually biota, via weathering? In the case of Se, it is partitioned into components that are quite labile under the oxidizing conditions of subaerial weathering. As a result, it is widely distributed throughout the environment. Its concentration exceeds the level of concern for protection of wildlife at virtually every trophic level.
Bravo, Sandra; García-Ordiales, Efrén; García-Navarro, Francisco Jesús; Amorós, José Ángel; Pérez-de-Los-Reyes, Caridad; Jiménez-Ballesta, Raimundo; Esbrí, José María; García-Noguero, Eva María; Higueras, Pablo
2017-09-07
Castilla-La Mancha (central Spain) is a region characterized by significant agricultural production aimed at high-quality food products such as wine and olive oil. The quality of agricultural products depends directly on the soil quality. Soil geochemistry, including dispersion maps and the recognition of baselines and anomalies of various origins, is the most important tool to assess soil quality. With this objective, 200 soil samples were taken from agricultural areas distributed among the different geological domains present in the region. Analysis of these samples included evaluation of edaphological parameters (reactivity, electrical conductivity, organic matter content) and the geochemistry of major and trace elements by X-ray fluorescence. The dataset obtained was statistically analyzed for major elements and, in the case of trace elements, was normalized with respect to Al and analyzed using the relative cumulative frequency (RCF) distribution method. Furthermore, the geographic distribution of analytical data was characterized and analyzed using the kriging technique, with a correspondence found between major and trace elements in the different geologic domains of the region as well as with the most important mining areas. The results show an influence of the clay fraction present in the soil, which acts as a repository for trace elements. On the basis of the results, of the possible elements related with clay that could be used for normalization, Al was selected as the most suitable, followed by Fe, Mn, and Ti. Reference values estimated using this methodology were lower than those estimated in previous studies.
CosApps: Simulate gravitational lensing through ray tracing and shear calculation
NASA Astrophysics Data System (ADS)
Coss, David
2017-12-01
Cosmology Applications (CosApps) provides tools to simulate gravitational lensing using two different techniques, ray tracing and shear calculation. The tool ray_trace_ellipse calculates deflection angles on a grid for light passing a deflecting mass distribution. Using MPI, ray_trace_ellipse may calculate deflection in parallel across network connected computers, such as cluster. The program physcalc calculates the gravitational lensing shear using the relationship of convergence and shear, described by a set of coupled partial differential equations.
Chen, Ting; Jin, Yiying; Qiu, Xiaopeng; Chen, Xin
2015-03-01
Using laboratory experiments, the authors investigated the impact of dry-heat and moist-heat treatment processes on hazardous trace elements (As, Hg, Cd, Cr, and Pb) in food waste and explored their distribution patterns for three waste components: oil, aqueous, and solid components. The results indicated that an insignificant reduction of hazardous trace elements in heat-treated waste-0.61-14.29% after moist-heat treatment and 4.53-12.25% after dry-heat treatment-and a significant reduction in hazardous trace elements (except for Hg without external addition) after centrifugal dehydration (P < 0.5). Moreover, after heat treatment, over 90% of the hazardous trace elements in the waste were detected in the aqueous and solid components, whereas only a trace amount of hazardous trace elements was detected in the oil component (<0.01%). In addition, results indicated that heat treatment process did not significantly reduce the concentration of hazardous trace elements in food waste, but the separation process for solid and aqueous components, such as centrifugal dehydration, could reduce the risk considerably. Finally, combined with the separation technology for solid and liquid components, dry-heat treatment is superior to moist-heat treatment on the removal of external water-soluble ionic hazardous trace elements. An insignificant reduction of hazardous trace elements in heat-treated waste showed that heat treatment does not reduce trace elements contamination in food waste considerably, whereas the separation process for solid and aqueous components, such as centrifugal dehydration, could reduce the risk significantly. Moreover, combined with the separation technology for solid and liquid components, dry-heat treatment is superior to moist-heat treatment for the removal of external water-soluble ionic hazardous trace elements, by exploring distribution patterns of trace elements in three waste components: oil, aqueous, and solid components.
Big Data Smart Socket (BDSS): a system that abstracts data transfer habits from end users.
Watts, Nicholas A; Feltus, Frank A
2017-02-15
The ability to centralize and store data for long periods on an end user's computational resources is increasingly difficult for many scientific disciplines. For example, genomics data is increasingly large and distributed, and the data needs to be moved into workflow execution sites ranging from lab workstations to the cloud. However, the typical user is not always informed on emerging network technology or the most efficient methods to move and share data. Thus, the user defaults to using inefficient methods for transfer across the commercial internet. To accelerate large data transfer, we created a tool called the Big Data Smart Socket (BDSS) that abstracts data transfer methodology from the user. The user provides BDSS with a manifest of datasets stored in a remote storage repository. BDSS then queries a metadata repository for curated data transfer mechanisms and optimal path to move each of the files in the manifest to the site of workflow execution. BDSS functions as a standalone tool or can be directly integrated into a computational workflow such as provided by the Galaxy Project. To demonstrate applicability, we use BDSS within a biological context, although it is applicable to any scientific domain. BDSS is available under version 2 of the GNU General Public License at https://github.com/feltus/BDSS . ffeltus@clemson.edu. © The Author 2016. Published by Oxford University Press.
Big Data Smart Socket (BDSS): a system that abstracts data transfer habits from end users
Watts, Nicholas A.
2017-01-01
Motivation: The ability to centralize and store data for long periods on an end user’s computational resources is increasingly difficult for many scientific disciplines. For example, genomics data is increasingly large and distributed, and the data needs to be moved into workflow execution sites ranging from lab workstations to the cloud. However, the typical user is not always informed on emerging network technology or the most efficient methods to move and share data. Thus, the user defaults to using inefficient methods for transfer across the commercial internet. Results: To accelerate large data transfer, we created a tool called the Big Data Smart Socket (BDSS) that abstracts data transfer methodology from the user. The user provides BDSS with a manifest of datasets stored in a remote storage repository. BDSS then queries a metadata repository for curated data transfer mechanisms and optimal path to move each of the files in the manifest to the site of workflow execution. BDSS functions as a standalone tool or can be directly integrated into a computational workflow such as provided by the Galaxy Project. To demonstrate applicability, we use BDSS within a biological context, although it is applicable to any scientific domain. Availability and Implementation: BDSS is available under version 2 of the GNU General Public License at https://github.com/feltus/BDSS. Contact: ffeltus@clemson.edu PMID:27797780
Niedzwiecki, Megan M; Austin, Christine; Remark, Romain; Merad, Miriam; Gnjatic, Sacha; Estrada-Gutierrez, Guadalupe; Espejel-Nuñez, Aurora; Borboa-Olivares, Hector; Guzman-Huerta, Mario; Wright, Rosalind J; Wright, Robert O; Arora, Manish
2016-04-01
Fetal exposure to essential and toxic metals can influence life-long health trajectories. The placenta regulates chemical transmission from maternal circulation to the fetus and itself exhibits a complex response to environmental stressors. The placenta can thus be a useful matrix to monitor metal exposures and stress responses in utero, but strategies to explore the biologic effects of metal mixtures in this organ are not well-developed. In this proof-of-concept study, we used laser ablation-inductively coupled plasma-mass spectrometry (LA-ICP-MS) to measure the distributions of multiple metals in placental tissue from a low-birth-weight pregnancy, and we developed an approach to identify the components of metal mixtures that colocalized with biological response markers. Our novel workflow, which includes custom-developed software tools and algorithms for spatial outlier identification and background subtraction in multidimensional elemental image stacks, enables rapid image processing and seamless integration of data from elemental imaging and immunohistochemistry. Using quantitative spatial statistics, we identified distinct patterns of metal accumulation at sites of inflammation. Broadly, our multiplexed approach can be used to explore the mechanisms mediating complex metal exposures and biologic responses within placentae and other tissue types. Our LA-ICP-MS image processing workflow can be accessed through our interactive R Shiny application 'shinyImaging', which is available at or through our laboratory's website, .
Neubert, Sebastian; Göde, Bernd; Gu, Xiangyu; Stoll, Norbert; Thurow, Kerstin
2017-04-01
Modern business process management (BPM) is increasingly interesting for laboratory automation. End-to-end workflow automation and improved top-level systems integration for information technology (IT) and automation systems are especially prominent objectives. With the ISO Standard Business Process Model and Notation (BPMN) 2.X, a system-independent and interdisciplinary accepted graphical process control notation is provided, allowing process analysis, while also being executable. The transfer of BPM solutions to structured laboratory automation places novel demands, for example, concerning the real-time-critical process and systems integration. The article discusses the potential of laboratory execution systems (LESs) for an easier implementation of the business process management system (BPMS) in hierarchical laboratory automation. In particular, complex application scenarios, including long process chains based on, for example, several distributed automation islands and mobile laboratory robots for a material transport, are difficult to handle in BPMSs. The presented approach deals with the displacement of workflow control tasks into life science specialized LESs, the reduction of numerous different interfaces between BPMSs and subsystems, and the simplification of complex process modelings. Thus, the integration effort for complex laboratory workflows can be significantly reduced for strictly structured automation solutions. An example application, consisting of a mixture of manual and automated subprocesses, is demonstrated by the presented BPMS-LES approach.
Le-Thi-Thu, Huong; Casanola-Martín, Gerardo M; Marrero-Ponce, Yovani; Rescigno, Antonio; Abad, Concepcion; Khan, Mahmud Tareq Hassan
2014-01-01
The tyrosinase is a bifunctional, copper-containing enzyme widely distributed in the phylogenetic tree. This enzyme is involved in the production of melanin and some other pigments in humans, animals and plants, including skin pigmentations in mammals, and browning process in plants and vegetables. Therefore, enzyme inhibitors has been under the attention of the scientist community, due to its broad applications in food, cosmetic, agricultural and medicinal fields, to avoid the undesirable effects of abnormal melanin overproduction. However, the research of novel chemical with antityrosinase activity demands the use of more efficient tools to speed up the tyrosinase inhibitors discovery process. This chapter is focused in the different components of a predictive modeling workflow for the identification and prioritization of potential new compounds with activity against the tyrosinase enzyme. In this case, two structure chemical libraries Spectrum Collection and Drugbank are used in this attempt to combine different virtual screening data mining techniques, in a sequential manner helping to avoid the usually expensive and time consuming traditional methods. Some of the sequential steps summarize here comprise the use of drug-likeness filters, similarity searching, classification and potency QSAR multiclassifier systems, modeling molecular interactions systems, and similarity/diversity analysis. Finally, the methodologies showed here provide a rational workflow for virtual screening hit analysis and selection as a promissory drug discovery strategy for use in target identification phase.
Structuring clinical workflows for diabetes care: an overview of the OntoHealth approach.
Schweitzer, M; Lasierra, N; Oberbichler, S; Toma, I; Fensel, A; Hoerbst, A
2014-01-01
Electronic health records (EHRs) play an important role in the treatment of chronic diseases such as diabetes mellitus. Although the interoperability and selected functionality of EHRs are already addressed by a number of standards and best practices, such as IHE or HL7, the majority of these systems are still monolithic from a user-functionality perspective. The purpose of the OntoHealth project is to foster a functionally flexible, standards-based use of EHRs to support clinical routine task execution by means of workflow patterns and to shift the present EHR usage to a more comprehensive integration concerning complete clinical workflows. The goal of this paper is, first, to introduce the basic architecture of the proposed OntoHealth project and, second, to present selected functional needs and a functional categorization regarding workflow-based interactions with EHRs in the domain of diabetes. A systematic literature review regarding attributes of workflows in the domain of diabetes was conducted. Eligible references were gathered and analyzed using a qualitative content analysis. Subsequently, a functional workflow categorization was derived from diabetes-specific raw data together with existing general workflow patterns. This paper presents the design of the architecture as well as a categorization model which makes it possible to describe the components or building blocks within clinical workflows. The results of our study lead us to identify basic building blocks, named as actions, decisions, and data elements, which allow the composition of clinical workflows within five identified contexts. The categorization model allows for a description of the components or building blocks of clinical workflows from a functional view.
Structuring Clinical Workflows for Diabetes Care
Lasierra, N.; Oberbichler, S.; Toma, I.; Fensel, A.; Hoerbst, A.
2014-01-01
Summary Background Electronic health records (EHRs) play an important role in the treatment of chronic diseases such as diabetes mellitus. Although the interoperability and selected functionality of EHRs are already addressed by a number of standards and best practices, such as IHE or HL7, the majority of these systems are still monolithic from a user-functionality perspective. The purpose of the OntoHealth project is to foster a functionally flexible, standards-based use of EHRs to support clinical routine task execution by means of workflow patterns and to shift the present EHR usage to a more comprehensive integration concerning complete clinical workflows. Objectives The goal of this paper is, first, to introduce the basic architecture of the proposed OntoHealth project and, second, to present selected functional needs and a functional categorization regarding workflow-based interactions with EHRs in the domain of diabetes. Methods A systematic literature review regarding attributes of workflows in the domain of diabetes was conducted. Eligible references were gathered and analyzed using a qualitative content analysis. Subsequently, a functional workflow categorization was derived from diabetes-specific raw data together with existing general workflow patterns. Results This paper presents the design of the architecture as well as a categorization model which makes it possible to describe the components or building blocks within clinical workflows. The results of our study lead us to identify basic building blocks, named as actions, decisions, and data elements, which allow the composition of clinical workflows within five identified contexts. Conclusions The categorization model allows for a description of the components or building blocks of clinical workflows from a functional view. PMID:25024765
RAY-UI: A powerful and extensible user interface for RAY
DOE Office of Scientific and Technical Information (OSTI.GOV)
Baumgärtel, P., E-mail: peter.baumgaertel@helmholtz-berlin.de; Erko, A.; Schäfers, F.
2016-07-27
The RAY-UI project started as a proof-of-concept for an interactive and graphical user interface (UI) for the well-known ray tracing software RAY [1]. In the meantime, it has evolved into a powerful enhanced version of RAY that will serve as the platform for future development and improvement of associated tools. The software as of today supports nearly all sophisticated simulation features of RAY. Furthermore, it delivers very significant usability and work efficiency improvements. Beamline elements can be quickly added or removed in the interactive sequence view. Parameters of any selected element can be accessed directly and in arbitrary order. Withmore » a single click, parameter changes can be tested and new simulation results can be obtained. All analysis results can be explored interactively right after ray tracing by means of powerful integrated image viewing and graphing tools. Unlimited image planes can be positioned anywhere in the beamline, and bundles of image planes can be created for moving the plane along the beam to identify the focus position with live updates of the simulated results. In addition to showing the features and workflow of RAY-UI, we will give an overview of the underlying software architecture as well as examples for use and an outlook for future developments.« less
NASA Astrophysics Data System (ADS)
de Macedo, Isadora A. S.; da Silva, Carolina B.; de Figueiredo, J. J. S.; Omoboya, Bode
2017-01-01
Wavelet estimation as well as seismic-to-well tie procedures are at the core of every seismic interpretation workflow. In this paper we perform a comparative study of wavelet estimation methods for seismic-to-well tie. Two approaches to wavelet estimation are discussed: a deterministic estimation, based on both seismic and well log data, and a statistical estimation, based on predictive deconvolution and the classical assumptions of the convolutional model, which provides a minimum-phase wavelet. Our algorithms, for both wavelet estimation methods introduce a semi-automatic approach to determine the optimum parameters of deterministic wavelet estimation and statistical wavelet estimation and, further, to estimate the optimum seismic wavelets by searching for the highest correlation coefficient between the recorded trace and the synthetic trace, when the time-depth relationship is accurate. Tests with numerical data show some qualitative conclusions, which are probably useful for seismic inversion and interpretation of field data, by comparing deterministic wavelet estimation and statistical wavelet estimation in detail, especially for field data example. The feasibility of this approach is verified on real seismic and well data from Viking Graben field, North Sea, Norway. Our results also show the influence of the washout zones on well log data on the quality of the well to seismic tie.
Modelling and analysis of workflow for lean supply chains
NASA Astrophysics Data System (ADS)
Ma, Jinping; Wang, Kanliang; Xu, Lida
2011-11-01
Cross-organisational workflow systems are a component of enterprise information systems which support collaborative business process among organisations in supply chain. Currently, the majority of workflow systems is developed in perspectives of information modelling without considering actual requirements of supply chain management. In this article, we focus on the modelling and analysis of the cross-organisational workflow systems in the context of lean supply chain (LSC) using Petri nets. First, the article describes the assumed conditions of cross-organisation workflow net according to the idea of LSC and then discusses the standardisation of collaborating business process between organisations in the context of LSC. Second, the concept of labelled time Petri nets (LTPNs) is defined through combining labelled Petri nets with time Petri nets, and the concept of labelled time workflow nets (LTWNs) is also defined based on LTPNs. Cross-organisational labelled time workflow nets (CLTWNs) is then defined based on LTWNs. Third, the article proposes the notion of OR-silent CLTWNS and a verifying approach to the soundness of LTWNs and CLTWNs. Finally, this article illustrates how to use the proposed method by a simple example. The purpose of this research is to establish a formal method of modelling and analysis of workflow systems for LSC. This study initiates a new perspective of research on cross-organisational workflow management and promotes operation management of LSC in real world settings.
UBioLab: a web-laboratory for ubiquitous in-silico experiments.
Bartocci, Ezio; Cacciagrano, Diletta; Di Berardini, Maria Rita; Merelli, Emanuela; Vito, Leonardo
2012-07-09
The huge and dynamic amount of bioinformatic resources (e.g., data and tools) available nowadays in Internet represents a big challenge for biologists –for what concerns their management and visualization– and for bioinformaticians –for what concerns the possibility of rapidly creating and executing in-silico experiments involving resources and activities spread over the WWW hyperspace. Any framework aiming at integrating such resources as in a physical laboratory has imperatively to tackle –and possibly to handle in a transparent and uniform way– aspects concerning physical distribution, semantic heterogeneity, co-existence of different computational paradigms and, as a consequence, of different invocation interfaces (i.e., OGSA for Grid nodes, SOAP for Web Services, Java RMI for Java objects, etc.). The framework UBioLab has been just designed and developed as a prototype following the above objective. Several architectural features –as those ones of being fully Web-based and of combining domain ontologies, Semantic Web and workflow techniques– give evidence of an effort in such a direction. The integration of a semantic knowledge management system for distributed (bioinformatic) resources, a semantic-driven graphic environment for defining and monitoring ubiquitous workflows and an intelligent agent-based technology for their distributed execution allows UBioLab to be a semantic guide for bioinformaticians and biologists providing (i) a flexible environment for visualizing, organizing and inferring any (semantics and computational) "type" of domain knowledge (e.g., resources and activities, expressed in a declarative form), (ii) a powerful engine for defining and storing semantic-driven ubiquitous in-silico experiments on the domain hyperspace, as well as (iii) a transparent, automatic and distributed environment for correct experiment executions.
myExperiment: a repository and social network for the sharing of bioinformatics workflows
Goble, Carole A.; Bhagat, Jiten; Aleksejevs, Sergejs; Cruickshank, Don; Michaelides, Danius; Newman, David; Borkum, Mark; Bechhofer, Sean; Roos, Marco; Li, Peter; De Roure, David
2010-01-01
myExperiment (http://www.myexperiment.org) is an online research environment that supports the social sharing of bioinformatics workflows. These workflows are procedures consisting of a series of computational tasks using web services, which may be performed on data from its retrieval, integration and analysis, to the visualization of the results. As a public repository of workflows, myExperiment allows anybody to discover those that are relevant to their research, which can then be reused and repurposed to their specific requirements. Conversely, developers can submit their workflows to myExperiment and enable them to be shared in a secure manner. Since its release in 2007, myExperiment currently has over 3500 registered users and contains more than 1000 workflows. The social aspect to the sharing of these workflows is facilitated by registered users forming virtual communities bound together by a common interest or research project. Contributors of workflows can build their reputation within these communities by receiving feedback and credit from individuals who reuse their work. Further documentation about myExperiment including its REST web service is available from http://wiki.myexperiment.org. Feedback and requests for support can be sent to bugs@myexperiment.org. PMID:20501605
Novak, Laurie L; Johnson, Kevin B; Lorenzi, Nancy M
2010-01-01
The objective of this review was to describe methods used to study and model workflow. The authors included studies set in a variety of industries using qualitative, quantitative and mixed methods. Of the 6221 matching abstracts, 127 articles were included in the final corpus. The authors collected data from each article on researcher perspective, study type, methods type, specific methods, approaches to evaluating quality of results, definition of workflow and dependent variables. Ethnographic observation and interviews were the most frequently used methods. Long study durations revealed the large time commitment required for descriptive workflow research. The most frequently discussed technique for evaluating quality of study results was triangulation. The definition of the term “workflow” and choice of methods for studying workflow varied widely across research areas and researcher perspectives. The authors developed a conceptual framework of workflow-related terminology for use in future research and present this model for use by other researchers. PMID:20442143
Digitization workflows for flat sheets and packets of plants, algae, and fungi1
Nelson, Gil; Sweeney, Patrick; Wallace, Lisa E.; Rabeler, Richard K.; Allard, Dorothy; Brown, Herrick; Carter, J. Richard; Denslow, Michael W.; Ellwood, Elizabeth R.; Germain-Aubrey, Charlotte C.; Gilbert, Ed; Gillespie, Emily; Goertzen, Leslie R.; Legler, Ben; Marchant, D. Blaine; Marsico, Travis D.; Morris, Ashley B.; Murrell, Zack; Nazaire, Mare; Neefus, Chris; Oberreiter, Shanna; Paul, Deborah; Ruhfel, Brad R.; Sasek, Thomas; Shaw, Joey; Soltis, Pamela S.; Watson, Kimberly; Weeks, Andrea; Mast, Austin R.
2015-01-01
Effective workflows are essential components in the digitization of biodiversity specimen collections. To date, no comprehensive, community-vetted workflows have been published for digitizing flat sheets and packets of plants, algae, and fungi, even though latest estimates suggest that only 33% of herbarium specimens have been digitally transcribed, 54% of herbaria use a specimen database, and 24% are imaging specimens. In 2012, iDigBio, the U.S. National Science Foundation’s (NSF) coordinating center and national resource for the digitization of public, nonfederal U.S. collections, launched several working groups to address this deficiency. Here, we report the development of 14 workflow modules with 7–36 tasks each. These workflows represent the combined work of approximately 35 curators, directors, and collections managers representing more than 30 herbaria, including 15 NSF-supported plant-related Thematic Collections Networks and collaboratives. The workflows are provided for download as Portable Document Format (PDF) and Microsoft Word files. Customization of these workflows for specific institutional implementation is encouraged. PMID:26421256
gProcess and ESIP Platforms for Satellite Imagery Processing over the Grid
NASA Astrophysics Data System (ADS)
Bacu, Victor; Gorgan, Dorian; Rodila, Denisa; Pop, Florin; Neagu, Gabriel; Petcu, Dana
2010-05-01
The Environment oriented Satellite Data Processing Platform (ESIP) is developed through the SEE-GRID-SCI (SEE-GRID eInfrastructure for regional eScience) co-funded by the European Commission through FP7 [1]. The gProcess Platform [2] is a set of tools and services supporting the development and the execution over the Grid of the workflow based processing, and particularly the satelite imagery processing. The ESIP [3], [4] is build on top of the gProcess platform by adding a set of satellite image processing software modules and meteorological algorithms. The satellite images can reveal and supply important information on earth surface parameters, climate data, pollution level, weather conditions that can be used in different research areas. Generally, the processing algorithms of the satellite images can be decomposed in a set of modules that forms a graph representation of the processing workflow. Two types of workflows can be defined in the gProcess platform: abstract workflow (PDG - Process Description Graph), in which the user defines conceptually the algorithm, and instantiated workflow (iPDG - instantiated PDG), which is the mapping of the PDG pattern on particular satellite image and meteorological data [5]. The gProcess platform allows the definition of complex workflows by combining data resources, operators, services and sub-graphs. The gProcess platform is developed for the gLite middleware that is available in EGEE and SEE-GRID infrastructures [6]. gProcess exposes the specific functionality through web services [7]. The Editor Web Service retrieves information on available resources that are used to develop complex workflows (available operators, sub-graphs, services, supported resources, etc.). The Manager Web Service deals with resources management (uploading new resources such as workflows, operators, services, data, etc.) and in addition retrieves information on workflows. The Executor Web Service manages the execution of the instantiated workflows on the Grid infrastructure. In addition, this web service monitors the execution and generates statistical data that are important to evaluate performances and to optimize execution. The Viewer Web Service allows access to input and output data. To prove and to validate the utility of the gProcess and ESIP platforms there were developed the GreenView and GreenLand applications. The GreenView related functionality includes the refinement of some meteorological data such as temperature, and the calibration of the satellite images based on field measurements. The GreenLand application performs the classification of the satellite images by using a set of vegetation indices. The gProcess and ESIP platforms are used as well in GiSHEO project [8] to support the processing of Earth Observation data over the Grid in eGLE (GiSHEO eLearning Environment). Experiments of performance assessment were conducted and they have revealed that the workflow-based execution could improve the execution time of a satellite image processing algorithm [9]. It is not a reliable solution to execute all the workflow nodes on different machines. The execution of some nodes can be more time consuming and they will be performed in a longer time than other nodes. The total execution time will be affected because some nodes will slow down the execution. It is important to correctly balance the workflow nodes. Based on some optimization strategy the workflow nodes can be grouped horizontally, vertically or in a hybrid approach. In this way, those operators will be executed on one machine and also the data transfer between workflow nodes will be lower. The dynamic nature of the Grid infrastructure makes it more exposed to the occurrence of failures. These failures can occur at worker node, services availability, storage element, etc. Currently gProcess has support for some basic error prevention and error management solutions. In future, some more advanced error prevention and management solutions will be integrated in the gProcess platform. References [1] SEE-GRID-SCI Project, http://www.see-grid-sci.eu/ [2] Bacu V., Stefanut T., Rodila D., Gorgan D., Process Description Graph Composition by gProcess Platform. HiPerGRID - 3rd International Workshop on High Performance Grid Middleware, 28 May, Bucharest. Proceedings of CSCS-17 Conference, Vol.2., ISSN 2066-4451, pp. 423-430, (2009). [3] ESIP Platform, http://wiki.egee-see.org/index.php/JRA1_Commonalities [4] Gorgan D., Bacu V., Rodila D., Pop Fl., Petcu D., Experiments on ESIP - Environment oriented Satellite Data Processing Platform. SEE-GRID-SCI User Forum, 9-10 Dec 2009, Bogazici University, Istanbul, Turkey, ISBN: 978-975-403-510-0, pp. 157-166 (2009). [5] Radu, A., Bacu, V., Gorgan, D., Diagrammatic Description of Satellite Image Processing Workflow. Workshop on Grid Computing Applications Development (GridCAD) at the SYNASC Symposium, 28 September 2007, Timisoara, IEEE Computer Press, ISBN 0-7695-3078-8, 2007, pp. 341-348 (2007). [6] Gorgan D., Bacu V., Stefanut T., Rodila D., Mihon D., Grid based Satellite Image Processing Platform for Earth Observation Applications Development. IDAACS'2009 - IEEE Fifth International Workshop on "Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications", 21-23 September, Cosenza, Italy, IEEE Published in Computer Press, 247-252 (2009). [7] Rodila D., Bacu V., Gorgan D., Integration of Satellite Image Operators as Workflows in the gProcess Application. Proceedings of ICCP2009 - IEEE 5th International Conference on Intelligent Computer Communication and Processing, 27-29 Aug, 2009 Cluj-Napoca. ISBN: 978-1-4244-5007-7, pp. 355-358 (2009). [8] GiSHEO consortium, Project site, http://gisheo.info.uvt.ro [9] Bacu V., Gorgan D., Graph Based Evaluation of Satellite Imagery Processing over Grid. ISPDC 2008 - 7th International Symposium on Parallel and Distributed Computing, July 1-5, 2008, Krakow, Poland. IEEE Computer Society 2008, ISBN: 978-0-7695-3472-5, pp. 147-154.
Disruption of Radiologist Workflow.
Kansagra, Akash P; Liu, Kevin; Yu, John-Paul J
2016-01-01
The effect of disruptions has been studied extensively in surgery and emergency medicine, and a number of solutions-such as preoperative checklists-have been implemented to enforce the integrity of critical safety-related workflows. Disruptions of the highly complex and cognitively demanding workflow of modern clinical radiology have only recently attracted attention as a potential safety hazard. In this article, we describe the variety of disruptions that arise in the reading room environment, review approaches that other specialties have taken to mitigate workflow disruption, and suggest possible solutions for workflow improvement in radiology. Copyright © 2015 Mosby, Inc. All rights reserved.
Barista: A Framework for Concurrent Speech Processing by USC-SAIL
Can, Doğan; Gibson, James; Vaz, Colin; Georgiou, Panayiotis G.; Narayanan, Shrikanth S.
2016-01-01
We present Barista, an open-source framework for concurrent speech processing based on the Kaldi speech recognition toolkit and the libcppa actor library. With Barista, we aim to provide an easy-to-use, extensible framework for constructing highly customizable concurrent (and/or distributed) networks for a variety of speech processing tasks. Each Barista network specifies a flow of data between simple actors, concurrent entities communicating by message passing, modeled after Kaldi tools. Leveraging the fast and reliable concurrency and distribution mechanisms provided by libcppa, Barista lets demanding speech processing tasks, such as real-time speech recognizers and complex training workflows, to be scheduled and executed on parallel (and/or distributed) hardware. Barista is released under the Apache License v2.0. PMID:27610047
Barista: A Framework for Concurrent Speech Processing by USC-SAIL.
Can, Doğan; Gibson, James; Vaz, Colin; Georgiou, Panayiotis G; Narayanan, Shrikanth S
2014-05-01
We present Barista, an open-source framework for concurrent speech processing based on the Kaldi speech recognition toolkit and the libcppa actor library. With Barista, we aim to provide an easy-to-use, extensible framework for constructing highly customizable concurrent (and/or distributed) networks for a variety of speech processing tasks. Each Barista network specifies a flow of data between simple actors, concurrent entities communicating by message passing, modeled after Kaldi tools. Leveraging the fast and reliable concurrency and distribution mechanisms provided by libcppa, Barista lets demanding speech processing tasks, such as real-time speech recognizers and complex training workflows, to be scheduled and executed on parallel (and/or distributed) hardware. Barista is released under the Apache License v2.0.
Distribution of trace metals in anchialine caves of Adriatic Sea, Croatia
NASA Astrophysics Data System (ADS)
Cuculić, Vlado; Cukrov, Neven; Kwokal, Željko; Mlakar, Marina
2011-11-01
This study presents results of the first comprehensive research on ecotoxic trace metals (Cd, Pb, Cu and Zn) in aquatic anchialine ecosystems. Data show the influence of hydrological and geological characteristics on trace metals in highly stratified anchialine water columns. Distribution of Cd, Pb, Cu and Zn in two anchialine water bodies, Bjejajka Cave and Lenga Pit in the Mljet National park, Croatia were investigated seasonally from 2006 to 2010. Behaviour and concentrations of dissolved and total trace metals in stratified water columns and metal contents in sediment, carbonate rocks and soil of the anchialine environment were evaluated. Trace metals and dissolved organic carbon (DOC) concentrations in both anchialine water columns were significantly elevated compared to adjacent seawater. Zn and Cu concentrations were the highest in the Lenga Pit water column and sediment. Elevated concentrations of Zn, Pb and Cu in Bjejajka Cave were mainly terrigenous. Significantly elevated concentrations of cadmium (up to 0.3 μg L -1) were found in the water column of Bjejajka cave, almost two orders of magnitude higher compared to nearby surface seawater. Laboratory analysis revealed that bat guano was the major source of cadmium in Bjejajka Cave. Cadmium levels in Lenga Pit, which lacks accumulations of bat guano, were 20-fold lower. Moreover, low metal amounts in carbonate rocks in both caves, combined with mineral leaching experiments, revealed that carbonates play a minor role as a source of metals in both water columns. We observed two types of vertical distribution pattern of cadmium in the stratified anchialine Bjejajka Cave water column. At lower salinities, non-conservative behaviour was characterized by strong desorption and enrichment of dissolved phase while, at salinities above 20, Cd behaved conservatively and its dissolved concentration decreased. Conservative behaviour of Cu, Pb, Zn and DOC was observed throughout the water column. After heavy rains, Cd showed reduced concentration and uniform vertical distribution, suggesting a non-terrestrial origin. Under the same conditions, concentrations of total and dissolved Pb, Cu, Zn and DOC were significantly elevated. Variations of trace metal vertical distributions in anchialine water columns were caused by large inputs of fresh water (extraordinary rainy events), and were not influenced by seasonal changes.
DeLong, Stephen B.; Donnellan, Andrea; Ponti, Daniel J.; Rubin, Ron S.; Lienkaemper, James J.; Prentice, Carol S.; Dawson, Timothy E.; Seitz, Gordon G.; Schwartz, David P.; Hudnut, Kenneth W.; Rosa, Carla M.; Pickering, Alexandra J; Parker, Jay W.
2016-01-01
The Mw 6.0 South Napa earthquake of 24 August 2014 caused slip on several active fault strands within the West Napa Fault Zone (WNFZ). Field mapping identified 12.5 km of surface rupture. These field observations, near-field geodesy and space geodesy, together provide evidence for more than ~30 km of surface deformation with a relatively complex distribution across a number of subparallel lineaments. Along a ~7 km section north of the epicenter, the surface rupture is confined to a single trace that cuts alluvial deposits, reoccupying a low-slope scarp. The rupture continued northward onto at least four other traces through subparallel ridges and valleys. Postseismic slip exceeded coseismic slip along much of the southern part of the main rupture trace with total slip 1 year postevent approaching 0.5 m at locations where only a few centimeters were measured the day of the earthquake. Analysis of airborne interferometric synthetic aperture radar data provides slip distributions along fault traces, indicates connectivity and extent of secondary traces, and confirms that postseismic slip only occurred on the main trace of the fault, perhaps indicating secondary structures ruptured as coseismic triggered slip. Previous mapping identified the WNFZ as a zone of distributed faulting, and this was generally borne out by the complex 2014 rupture pattern. Implications for hazard analysis in similar settings include the need to consider the possibility of complex surface rupture in areas of complex topography, especially where multiple potentially Quaternary-active fault strands can be mapped.
Workflow-Oriented Cyberinfrastructure for Sensor Data Analytics
NASA Astrophysics Data System (ADS)
Orcutt, J. A.; Rajasekar, A.; Moore, R. W.; Vernon, F.
2015-12-01
Sensor streams comprise an increasingly large part of Earth Science data. Analytics based on sensor data require an easy way to perform operations such as acquisition, conversion to physical units, metadata linking, sensor fusion, analysis and visualization on distributed sensor streams. Furthermore, embedding real-time sensor data into scientific workflows is of growing interest. We have implemented a scalable networked architecture that can be used to dynamically access packets of data in a stream from multiple sensors, and perform synthesis and analysis across a distributed network. Our system is based on the integrated Rule Oriented Data System (irods.org), which accesses sensor data from the Antelope Real Time Data System (brtt.com), and provides virtualized access to collections of data streams. We integrate real-time data streaming from different sources, collected for different purposes, on different time and spatial scales, and sensed by different methods. iRODS, noted for its policy-oriented data management, brings to sensor processing features and facilities such as single sign-on, third party access control lists ( ACLs), location transparency, logical resource naming, and server-side modeling capabilities while reducing the burden on sensor network operators. Rich integrated metadata support also makes it straightforward to discover data streams of interest and maintain data provenance. The workflow support in iRODS readily integrates sensor processing into any analytical pipeline. The system is developed as part of the NSF-funded Datanet Federation Consortium (datafed.org). APIs for selecting, opening, reaping and closing sensor streams are provided, along with other helper functions to associate metadata and convert sensor packets into NetCDF and JSON formats. Near real-time sensor data including seismic sensors, environmental sensors, LIDAR and video streams are available through this interface. A system for archiving sensor data and metadata in NetCDF format has been implemented and will be demonstrated at AGU.
SU-C-213-03: Custom 3D Printed Boluses for Radiation Therapy
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhao, B; Yang, M; Yan, Y
2015-06-15
Purpose: To develop a clinical workflow and to commission the process of creating custom 3d printed boluses for radiation therapy. Methods: We designed a workflow to create custom boluses using a commercial 3D printer. Contours of several patients were deformably mapped to phantoms where the test bolus contours were designed. Treatment plans were created on the phantoms following our institutional planning guideline. The DICOM file of the bolus contours were then converted to stereoLithography (stl) file for the 3d printer. The boluses were printed on a commercial 3D printer using polylactic acid (PLA) material. Custom printing parameters were optimized inmore » order to meet the requirement of bolus composition. The workflow was tested on multiple anatomical sites such as skull, nose and chest wall. The size of boluses varies from 6×9cm2 to 12×25cm2. To commission the process, basic CT and dose properties of the printing materials were measured in photon and electron beams and compared against water and soft superflab bolus. Phantoms were then scanned to confirm the placement of custom boluses. Finally dose distributions with rescanned CTs were compared with those computer-generated boluses. Results: The relative electron density(1.08±0.006) of the printed boluses resemble those of liquid tap water(1.04±0.004). The dosimetric properties resemble those of liquid tap water(1.04±0.004). The dosimetric properties were measured at dmax with an ion chamber in electron and photon open beams. Compared with solid water and soft bolus, the output difference was within 1% for the 3D printer material. The printed boluses fit well to the phantom surfaces on CT scans. The dose distribution and DVH based on the printed boluses match well with those based on TPS generated boluses. Conclusion: 3d printing provides a cost effective and convenient solution for patient-specific boluses in radiation therapy.« less
Workflows for microarray data processing in the Kepler environment.
Stropp, Thomas; McPhillips, Timothy; Ludäscher, Bertram; Bieda, Mark
2012-05-17
Microarray data analysis has been the subject of extensive and ongoing pipeline development due to its complexity, the availability of several options at each analysis step, and the development of new analysis demands, including integration with new data sources. Bioinformatics pipelines are usually custom built for different applications, making them typically difficult to modify, extend and repurpose. Scientific workflow systems are intended to address these issues by providing general-purpose frameworks in which to develop and execute such pipelines. The Kepler workflow environment is a well-established system under continual development that is employed in several areas of scientific research. Kepler provides a flexible graphical interface, featuring clear display of parameter values, for design and modification of workflows. It has capabilities for developing novel computational components in the R, Python, and Java programming languages, all of which are widely used for bioinformatics algorithm development, along with capabilities for invoking external applications and using web services. We developed a series of fully functional bioinformatics pipelines addressing common tasks in microarray processing in the Kepler workflow environment. These pipelines consist of a set of tools for GFF file processing of NimbleGen chromatin immunoprecipitation on microarray (ChIP-chip) datasets and more comprehensive workflows for Affymetrix gene expression microarray bioinformatics and basic primer design for PCR experiments, which are often used to validate microarray results. Although functional in themselves, these workflows can be easily customized, extended, or repurposed to match the needs of specific projects and are designed to be a toolkit and starting point for specific applications. These workflows illustrate a workflow programming paradigm focusing on local resources (programs and data) and therefore are close to traditional shell scripting or R/BioConductor scripting approaches to pipeline design. Finally, we suggest that microarray data processing task workflows may provide a basis for future example-based comparison of different workflow systems. We provide a set of tools and complete workflows for microarray data analysis in the Kepler environment, which has the advantages of offering graphical, clear display of conceptual steps and parameters and the ability to easily integrate other resources such as remote data and web services.
The geographic distribution of trace elements in the environment: the REGARDS study.
Rembert, Nicole; He, Ka; Judd, Suzanne E; McClure, Leslie A
2017-02-01
Research on trace elements and the effects of their ingestion on human health is often seen in scientific literature. However, little research has been done on the distribution of trace elements in the environment and their impact on health. This paper examines what characteristics among participants in the Reasons for Geographic and Racial Differences in Stroke (REGARDS) study are associated with levels of environmental exposure to arsenic, magnesium, mercury, and selenium. Demographic information from REGARDS participants was combined with trace element concentration data from the US Geochemical Survey (USGS). Each trace element was characterized as either low (magnesium and selenium) or high (arsenic and mercury) exposure. Associations between demographic characteristics and trace element concentrations were analyzed with unadjusted and adjusted logistic regression models. Individuals who reside in the Stroke Belt have lower odds of high exposure (4th quartile) to arsenic (OR 0.33, CI 0.31, 0.35) and increased exposure to mercury (OR 0.65, CI 0.62, 0.70) than those living outside of these areas, while the odds of low exposure to trace element concentrations were increased for magnesium (OR 5.48, CI 5.05, 5.95) and selenium (OR 2.37, CI 2.22, 2.54). We found an association between levels of trace elements in the environment and geographic region of residence, among other factors. Future studies are needed to further examine this association and determine whether or not these differences may be related to geographic variation in disease.
High Resolution Numerical Simulations and Diagnostic Studies Atmospheric Transport During TRACE-A
NASA Technical Reports Server (NTRS)
Krishnamurti, T. N.; Fuelberg, Henry
1995-01-01
The enclosed publications constitute our final report. These are publications completed in referred journals. The completed work includes: Meteorological conditions associated with vertical distributions of aerosols off the west coast of Africa. TRACE-A trajectory intercomparison 2. Isentropic and kinematic methods. Chemical characteristics of tropospheric air over the tropical South Atlantic Ocean: Relationship to trajectory history. Passive Tracer Transport Relevant to the TRACE-A Experiment. The meteorological environment of the tropospheric ozone maximum over the tropical south Atlantic Ocean. : Influence of a middle-latitude cyclone on tropospheric ozone distributions during a period of TRACE-A. All of this work provided the meteorological background for the Transport of Atmospheric Chemistry near the Equator (TRACE-A) project sponsored by Dr,, Joe McNeal of NASA. Our principal findings are that bio-mass burning sources from Africa and South America did contribute to the accumulation of tropospheric ozone over the tropical Atlantic Ocean. Other findings relate to circulations, advections of Ozone and other plausible sources for the TOMS based ozone maximum over the Atlantic Ocean.
Aircraft measurements of trace gases between Japan and Singapore in October of 1993, 1996, and 1997
NASA Astrophysics Data System (ADS)
Matsueda, Hidekazu; Inoue, Hisayuki Y.
Carbon dioxide (CO2), methane (CH4), and carbon monoxide (CO) mixing ratios were measured in discrete air samples from aircraft between Japan and Singapore in October. The mixing ratios of all trace gases at 9-12 km were enhanced over the South China Sea in 1997 compared with those in 1993 and 1996. Vertical distributions of all trace gases over Singapore in 1997 also showed largely elevated mixing ratios at all altitudes. These distributions indicate a wide outflow of trace gases from intense biomass burning in the southeast Asia regions in the very strong El Niño year. The enhanced trace gases showed a strong linear correlation between CH4 and CO, and between CO and CO2, with the regression slopes of 0.051 (ΔCH4 ppb/ΔCOppb) and 0.089 (ΔCOppb/ΔCO2ppb). The emission ratios are characteristic of fires with relatively lower combustion efficiency from the tropical rain forest and peat lands in Kalimantan and Sumatra of Indonesia.
Deploying and sharing U-Compare workflows as web services.
Kontonatsios, Georgios; Korkontzelos, Ioannis; Kolluru, Balakrishna; Thompson, Paul; Ananiadou, Sophia
2013-02-18
U-Compare is a text mining platform that allows the construction, evaluation and comparison of text mining workflows. U-Compare contains a large library of components that are tuned to the biomedical domain. Users can rapidly develop biomedical text mining workflows by mixing and matching U-Compare's components. Workflows developed using U-Compare can be exported and sent to other users who, in turn, can import and re-use them. However, the resulting workflows are standalone applications, i.e., software tools that run and are accessible only via a local machine, and that can only be run with the U-Compare platform. We address the above issues by extending U-Compare to convert standalone workflows into web services automatically, via a two-click process. The resulting web services can be registered on a central server and made publicly available. Alternatively, users can make web services available on their own servers, after installing the web application framework, which is part of the extension to U-Compare. We have performed a user-oriented evaluation of the proposed extension, by asking users who have tested the enhanced functionality of U-Compare to complete questionnaires that assess its functionality, reliability, usability, efficiency and maintainability. The results obtained reveal that the new functionality is well received by users. The web services produced by U-Compare are built on top of open standards, i.e., REST and SOAP protocols, and therefore, they are decoupled from the underlying platform. Exported workflows can be integrated with any application that supports these open standards. We demonstrate how the newly extended U-Compare enhances the cross-platform interoperability of workflows, by seamlessly importing a number of text mining workflow web services exported from U-Compare into Taverna, i.e., a generic scientific workflow construction platform.
Deploying and sharing U-Compare workflows as web services
2013-01-01
Background U-Compare is a text mining platform that allows the construction, evaluation and comparison of text mining workflows. U-Compare contains a large library of components that are tuned to the biomedical domain. Users can rapidly develop biomedical text mining workflows by mixing and matching U-Compare’s components. Workflows developed using U-Compare can be exported and sent to other users who, in turn, can import and re-use them. However, the resulting workflows are standalone applications, i.e., software tools that run and are accessible only via a local machine, and that can only be run with the U-Compare platform. Results We address the above issues by extending U-Compare to convert standalone workflows into web services automatically, via a two-click process. The resulting web services can be registered on a central server and made publicly available. Alternatively, users can make web services available on their own servers, after installing the web application framework, which is part of the extension to U-Compare. We have performed a user-oriented evaluation of the proposed extension, by asking users who have tested the enhanced functionality of U-Compare to complete questionnaires that assess its functionality, reliability, usability, efficiency and maintainability. The results obtained reveal that the new functionality is well received by users. Conclusions The web services produced by U-Compare are built on top of open standards, i.e., REST and SOAP protocols, and therefore, they are decoupled from the underlying platform. Exported workflows can be integrated with any application that supports these open standards. We demonstrate how the newly extended U-Compare enhances the cross-platform interoperability of workflows, by seamlessly importing a number of text mining workflow web services exported from U-Compare into Taverna, i.e., a generic scientific workflow construction platform. PMID:23419017
Johnson, Kevin B; Lorenzi, Nancy M
2011-01-01
Objective The goal of this study was to develop an in-depth understanding of how a health information exchange (HIE) fits into clinical workflow at multiple clinical sites. Materials and Methods The ethnographic qualitative study was conducted over a 9-month period in six emergency departments (ED) and eight ambulatory clinics in Memphis, Tennessee, USA. Data were collected using direct observation, informal interviews during observation, and formal semi-structured interviews. The authors observed for over 180 h, during which providers used the exchange 130 times. Results HIE-related workflow was modeled for each ED site and ambulatory clinic group and substantial site-to-site workflow differences were identified. Common patterns in HIE-related workflow were also identified across all sites, leading to the development of two role-based workflow models: nurse based and physician based. The workflow elements framework was applied to the two role-based patterns. An in-depth description was developed of how providers integrated HIE into existing clinical workflow, including prompts for HIE use. Discussion Workflow differed substantially among sites, but two general role-based HIE usage models were identified. Although providers used HIE to improve continuity of patient care, patient–provider trust played a significant role. Types of information retrieved related to roles, with nurses seeking to retrieve recent hospitalization data and more open-ended usage by nurse practitioners and physicians. User and role-specific customization to accommodate differences in workflow and information needs may increase the adoption and use of HIE. Conclusion Understanding end users' perspectives towards HIE technology is crucial to the long-term success of HIE. By applying qualitative methods, an in-depth understanding of HIE usage was developed. PMID:22003156
Workflows in bioinformatics: meta-analysis and prototype implementation of a workflow generator.
Garcia Castro, Alexander; Thoraval, Samuel; Garcia, Leyla J; Ragan, Mark A
2005-04-07
Computational methods for problem solving need to interleave information access and algorithm execution in a problem-specific workflow. The structures of these workflows are defined by a scaffold of syntactic, semantic and algebraic objects capable of representing them. Despite the proliferation of GUIs (Graphic User Interfaces) in bioinformatics, only some of them provide workflow capabilities; surprisingly, no meta-analysis of workflow operators and components in bioinformatics has been reported. We present a set of syntactic components and algebraic operators capable of representing analytical workflows in bioinformatics. Iteration, recursion, the use of conditional statements, and management of suspend/resume tasks have traditionally been implemented on an ad hoc basis and hard-coded; by having these operators properly defined it is possible to use and parameterize them as generic re-usable components. To illustrate how these operations can be orchestrated, we present GPIPE, a prototype graphic pipeline generator for PISE that allows the definition of a pipeline, parameterization of its component methods, and storage of metadata in XML formats. This implementation goes beyond the macro capacities currently in PISE. As the entire analysis protocol is defined in XML, a complete bioinformatic experiment (linked sets of methods, parameters and results) can be reproduced or shared among users. http://if-web1.imb.uq.edu.au/Pise/5.a/gpipe.html (interactive), ftp://ftp.pasteur.fr/pub/GenSoft/unix/misc/Pise/ (download). From our meta-analysis we have identified syntactic structures and algebraic operators common to many workflows in bioinformatics. The workflow components and algebraic operators can be assimilated into re-usable software components. GPIPE, a prototype implementation of this framework, provides a GUI builder to facilitate the generation of workflows and integration of heterogeneous analytical tools.
Provenance-Powered Automatic Workflow Generation and Composition
NASA Astrophysics Data System (ADS)
Zhang, J.; Lee, S.; Pan, L.; Lee, T. J.
2015-12-01
In recent years, scientists have learned how to codify tools into reusable software modules that can be chained into multi-step executable workflows. Existing scientific workflow tools, created by computer scientists, require domain scientists to meticulously design their multi-step experiments before analyzing data. However, this is oftentimes contradictory to a domain scientist's daily routine of conducting research and exploration. We hope to resolve this dispute. Imagine this: An Earth scientist starts her day applying NASA Jet Propulsion Laboratory (JPL) published climate data processing algorithms over ARGO deep ocean temperature and AMSRE sea surface temperature datasets. Throughout the day, she tunes the algorithm parameters to study various aspects of the data. Suddenly, she notices some interesting results. She then turns to a computer scientist and asks, "can you reproduce my results?" By tracking and reverse engineering her activities, the computer scientist creates a workflow. The Earth scientist can now rerun the workflow to validate her findings, modify the workflow to discover further variations, or publish the workflow to share the knowledge. In this way, we aim to revolutionize computer-supported Earth science. We have developed a prototyping system to realize the aforementioned vision, in the context of service-oriented science. We have studied how Earth scientists conduct service-oriented data analytics research in their daily work, developed a provenance model to record their activities, and developed a technology to automatically generate workflow starting from user behavior and adaptability and reuse of these workflows for replicating/improving scientific studies. A data-centric repository infrastructure is established to catch richer provenance to further facilitate collaboration in the science community. We have also established a Petri nets-based verification instrument for provenance-based automatic workflow generation and recommendation.
Support for Taverna workflows in the VPH-Share cloud platform.
Kasztelnik, Marek; Coto, Ernesto; Bubak, Marian; Malawski, Maciej; Nowakowski, Piotr; Arenas, Juan; Saglimbeni, Alfredo; Testi, Debora; Frangi, Alejandro F
2017-07-01
To address the increasing need for collaborative endeavours within the Virtual Physiological Human (VPH) community, the VPH-Share collaborative cloud platform allows researchers to expose and share sequences of complex biomedical processing tasks in the form of computational workflows. The Taverna Workflow System is a very popular tool for orchestrating complex biomedical & bioinformatics processing tasks in the VPH community. This paper describes the VPH-Share components that support the building and execution of Taverna workflows, and explains how they interact with other VPH-Share components to improve the capabilities of the VPH-Share platform. Taverna workflow support is delivered by the Atmosphere cloud management platform and the VPH-Share Taverna plugin. These components are explained in detail, along with the two main procedures that were developed to enable this seamless integration: workflow composition and execution. 1) Seamless integration of VPH-Share with other components and systems. 2) Extended range of different tools for workflows. 3) Successful integration of scientific workflows from other VPH projects. 4) Execution speed improvement for medical applications. The presented workflow integration provides VPH-Share users with a wide range of different possibilities to compose and execute workflows, such as desktop or online composition, online batch execution, multithreading, remote execution, etc. The specific advantages of each supported tool are presented, as are the roles of Atmosphere and the VPH-Share plugin within the VPH-Share project. The combination of the VPH-Share plugin and Atmosphere engenders the VPH-Share infrastructure with far more flexible, powerful and usable capabilities for the VPH-Share community. As both components can continue to evolve and improve independently, we acknowledge that further improvements are still to be developed and will be described. Copyright © 2017 Elsevier B.V. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Adachi, Satoshi; Toda, Mikito; Kubotani, Hiroto
The fixed-trace ensemble of random complex matrices is the fundamental model that excellently describes the entanglement in the quantum states realized in a coupled system by its strongly chaotic dynamical evolution [see H. Kubotani, S. Adachi, M. Toda, Phys. Rev. Lett. 100 (2008) 240501]. The fixed-trace ensemble fully takes into account the conservation of probability for quantum states. The present paper derives for the first time the exact analytical formula of the one-body distribution function of singular values of random complex matrices in the fixed-trace ensemble. The distribution function of singular values (i.e. Schmidt eigenvalues) of a quantum state ismore » so important since it describes characteristics of the entanglement in the state. The derivation of the exact analytical formula utilizes two recent achievements in mathematics, which appeared in 1990s. The first is the Kaneko theory that extends the famous Selberg integral by inserting a hypergeometric type weight factor into the integrand to obtain an analytical formula for the extended integral. The second is the Petkovsek-Wilf-Zeilberger theory that calculates definite hypergeometric sums in a closed form.« less
The New Worlds Observer: The Astrophysics Strategic Mission Concept Study
2009-08-01
of galaxies and galaxy clusters • Tracing the cosmic evolution of dark energy • Mapping the distribution of dark matter • Characterization of the...imaging of these fields will be used to map the distribution of dark matter us- ing the distortions of galaxy images produced by weak gravitational...dedicated to specific science goals such as mapping dark matter , tracing dark energy, or prob- ing star formation in the local Universe. In the dif
Damkliang, Kasikrit; Tandayya, Pichaya; Sangket, Unitsa; Pasomsub, Ekawat
2016-11-28
At the present, coding sequence (CDS) has been discovered and larger CDS is being revealed frequently. Approaches and related tools have also been developed and upgraded concurrently, especially for phylogenetic tree analysis. This paper proposes an integrated automatic Taverna workflow for the phylogenetic tree inferring analysis using public access web services at European Bioinformatics Institute (EMBL-EBI) and Swiss Institute of Bioinformatics (SIB), and our own deployed local web services. The workflow input is a set of CDS in the Fasta format. The workflow supports 1,000 to 20,000 numbers in bootstrapping replication. The workflow performs the tree inferring such as Parsimony (PARS), Distance Matrix - Neighbor Joining (DIST-NJ), and Maximum Likelihood (ML) algorithms of EMBOSS PHYLIPNEW package based on our proposed Multiple Sequence Alignment (MSA) similarity score. The local web services are implemented and deployed into two types using the Soaplab2 and Apache Axis2 deployment. There are SOAP and Java Web Service (JWS) providing WSDL endpoints to Taverna Workbench, a workflow manager. The workflow has been validated, the performance has been measured, and its results have been verified. Our workflow's execution time is less than ten minutes for inferring a tree with 10,000 replicates of the bootstrapping numbers. This paper proposes a new integrated automatic workflow which will be beneficial to the bioinformaticians with an intermediate level of knowledge and experiences. All local services have been deployed at our portal http://bioservices.sci.psu.ac.th.
Damkliang, Kasikrit; Tandayya, Pichaya; Sangket, Unitsa; Pasomsub, Ekawat
2016-03-01
At the present, coding sequence (CDS) has been discovered and larger CDS is being revealed frequently. Approaches and related tools have also been developed and upgraded concurrently, especially for phylogenetic tree analysis. This paper proposes an integrated automatic Taverna workflow for the phylogenetic tree inferring analysis using public access web services at European Bioinformatics Institute (EMBL-EBI) and Swiss Institute of Bioinformatics (SIB), and our own deployed local web services. The workflow input is a set of CDS in the Fasta format. The workflow supports 1,000 to 20,000 numbers in bootstrapping replication. The workflow performs the tree inferring such as Parsimony (PARS), Distance Matrix - Neighbor Joining (DIST-NJ), and Maximum Likelihood (ML) algorithms of EMBOSS PHYLIPNEW package based on our proposed Multiple Sequence Alignment (MSA) similarity score. The local web services are implemented and deployed into two types using the Soaplab2 and Apache Axis2 deployment. There are SOAP and Java Web Service (JWS) providing WSDL endpoints to Taverna Workbench, a workflow manager. The workflow has been validated, the performance has been measured, and its results have been verified. Our workflow's execution time is less than ten minutes for inferring a tree with 10,000 replicates of the bootstrapping numbers. This paper proposes a new integrated automatic workflow which will be beneficial to the bioinformaticians with an intermediate level of knowledge and experiences. The all local services have been deployed at our portal http://bioservices.sci.psu.ac.th.
Use of portable X-ray fluorescence spectroscopy and geostatistics for health risk assessment.
Yang, Meng; Wang, Cheng; Yang, Zhao-Ping; Yan, Nan; Li, Feng-Ying; Diao, Yi-Wei; Chen, Min-Dong; Li, Hui-Ming; Wang, Jin-Hua; Qian, Xin
2018-05-30
Laboratory analysis of trace metals using inductively coupled plasma (ICP) spectroscopy is not cost effective, and the complex spatial distribution of soil trace metals makes their spatial analysis and prediction problematic. Thus, for the health risk assessment of exposure to trace metals in soils, portable X-ray fluorescence (PXRF) spectroscopy was used to replace ICP spectroscopy for metal analysis, and robust geostatistical methods were used to identify spatial outliers in trace metal concentrations and to map trace metal distributions. A case study was carried out around an industrial area in Nanjing, China. The results showed that PXRF spectroscopy provided results for trace metal (Cu, Ni, Pb and Zn) levels comparable to ICP spectroscopy. The results of the health risk assessment showed that Ni posed a higher non-carcinogenic risk than Cu, Pb and Zn, indicating a higher priority of concern than the other elements. Sampling locations associated with adverse health effects were identified as 'hotspots', and high-risk areas were delineated from risk maps. These 'hotspots' and high-risk areas were in close proximity to and downwind from petrochemical plants, indicating the dominant role of industrial activities as the major sources of trace metals in soils. The approach used in this study could be adopted as a cost-effective methodology for screening 'hotspots' and priority areas of concern for cost-efficient health risk management. Copyright © 2018 Elsevier Inc. All rights reserved.
A Model of Workflow Composition for Emergency Management
NASA Astrophysics Data System (ADS)
Xin, Chen; Bin-ge, Cui; Feng, Zhang; Xue-hui, Xu; Shan-shan, Fu
The common-used workflow technology is not flexible enough in dealing with concurrent emergency situations. The paper proposes a novel model for defining emergency plans, in which workflow segments appear as a constituent part. A formal abstraction, which contains four operations, is defined to compose workflow segments under constraint rule. The software system of the business process resources construction and composition is implemented and integrated into Emergency Plan Management Application System.
A Framework for Modeling Workflow Execution by an Interdisciplinary Healthcare Team.
Kezadri-Hamiaz, Mounira; Rosu, Daniela; Wilk, Szymon; Kuziemsky, Craig; Michalowski, Wojtek; Carrier, Marc
2015-01-01
The use of business workflow models in healthcare is limited because of insufficient capture of complexities associated with behavior of interdisciplinary healthcare teams that execute healthcare workflows. In this paper we present a novel framework that builds on the well-founded business workflow model formalism and related infrastructures and introduces a formal semantic layer that describes selected aspects of team dynamics and supports their real-time operationalization.
Evolution of user analysis on the grid in ATLAS
NASA Astrophysics Data System (ADS)
Dewhurst, A.; Legger, F.; ATLAS Collaboration
2017-10-01
More than one thousand physicists analyse data collected by the ATLAS experiment at the Large Hadron Collider (LHC) at CERN through 150 computing facilities around the world. Efficient distributed analysis requires optimal resource usage and the interplay of several factors: robust grid and software infrastructures, and system capability to adapt to different workloads. The continuous automatic validation of grid sites and the user support provided by a dedicated team of expert shifters have been proven to provide a solid distributed analysis system for ATLAS users. Typical user workflows on the grid, and their associated metrics, are discussed. Measurements of user job performance and typical requirements are also shown.
Design and implementation of a secure workflow system based on PKI/PMI
NASA Astrophysics Data System (ADS)
Yan, Kai; Jiang, Chao-hui
2013-03-01
As the traditional workflow system in privilege management has the following weaknesses: low privilege management efficiency, overburdened for administrator, lack of trust authority etc. A secure workflow model based on PKI/PMI is proposed after studying security requirements of the workflow systems in-depth. This model can achieve static and dynamic authorization after verifying user's ID through PKC and validating user's privilege information by using AC in workflow system. Practice shows that this system can meet the security requirements of WfMS. Moreover, it can not only improve system security, but also ensures integrity, confidentiality, availability and non-repudiation of the data in the system.
HEPCloud, a New Paradigm for HEP Facilities: CMS Amazon Web Services Investigation
Holzman, Burt; Bauerdick, Lothar A. T.; Bockelman, Brian; ...
2017-09-29
Historically, high energy physics computing has been performed on large purpose-built computing systems. These began as single-site compute facilities, but have evolved into the distributed computing grids used today. Recently, there has been an exponential increase in the capacity and capability of commercial clouds. Cloud resources are highly virtualized and intended to be able to be flexibly deployed for a variety of computing tasks. There is a growing interest among the cloud providers to demonstrate the capability to perform large-scale scientific computing. In this paper, we discuss results from the CMS experiment using the Fermilab HEPCloud facility, which utilized bothmore » local Fermilab resources and virtual machines in the Amazon Web Services Elastic Compute Cloud. We discuss the planning, technical challenges, and lessons learned involved in performing physics workflows on a large-scale set of virtualized resources. Additionally, we will discuss the economics and operational efficiencies when executing workflows both in the cloud and on dedicated resources.« less
NASA Astrophysics Data System (ADS)
Gleason, J. L.; Hillyer, T. N.; Wilkins, J.
2012-12-01
The CERES Science Team integrates data from 5 CERES instruments onboard the Terra, Aqua and NPP missions. The processing chain fuses CERES observations with data from 19 other unique sources. The addition of CERES Flight Model 5 (FM5) onboard NPP, coupled with ground processing system upgrades further emphasizes the need for an automated job-submission utility to manage multiple processing streams concurrently. The operator-driven, legacy-processing approach relied on manually staging data from magnetic tape to limited spinning disk attached to a shared memory architecture system. The migration of CERES production code to a distributed, cluster computing environment with approximately one petabyte of spinning disk containing all precursor input data products facilitates the development of a CERES-specific, automated workflow manager. In the cluster environment, I/O is the primary system resource in contention across jobs. Therefore, system load can be maximized with a throttling workload manager. This poster discusses a Java and Perl implementation of an automated job management tool tailored for CERES processing.
HEPCloud, a New Paradigm for HEP Facilities: CMS Amazon Web Services Investigation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Holzman, Burt; Bauerdick, Lothar A. T.; Bockelman, Brian
Historically, high energy physics computing has been performed on large purpose-built computing systems. These began as single-site compute facilities, but have evolved into the distributed computing grids used today. Recently, there has been an exponential increase in the capacity and capability of commercial clouds. Cloud resources are highly virtualized and intended to be able to be flexibly deployed for a variety of computing tasks. There is a growing interest among the cloud providers to demonstrate the capability to perform large-scale scientific computing. In this paper, we discuss results from the CMS experiment using the Fermilab HEPCloud facility, which utilized bothmore » local Fermilab resources and virtual machines in the Amazon Web Services Elastic Compute Cloud. We discuss the planning, technical challenges, and lessons learned involved in performing physics workflows on a large-scale set of virtualized resources. Additionally, we will discuss the economics and operational efficiencies when executing workflows both in the cloud and on dedicated resources.« less
Process improvement methodologies uncover unexpected gaps in stroke care.
Kuner, Anthony D; Schemmel, Andrew J; Pooler, B Dustin; Yu, John-Paul J
2018-01-01
Background The diagnosis and treatment of acute stroke requires timed and coordinated effort across multiple clinical teams. Purpose To analyze the frequency and temporal distribution of emergent stroke evaluations (ESEs) to identify potential contributory workflow factors that may delay the initiation and subsequent evaluation of emergency department stroke patients. Material and Methods A total of 719 sentinel ESEs with concurrent neuroimaging were identified over a 22-month retrospective time period. Frequency data were tabulated and odds ratios calculated. Results Of all ESEs, 5% occur between 01:00 and 07:00. ESEs were most frequent during the late morning and early afternoon hours (10:00-14:00). Unexpectedly, there was a statistically significant decline in the frequency of ESEs that occur at the 14:00 time point. Conclusion Temporal analysis of ESEs in the emergency department allowed us to identify an unexpected decrease in ESEs and through process improvement methodologies (Lean and Six Sigma) and identify potential workflow elements contributing to this observation.
Li, Rui; Bing, Haijian; Wu, Yanhong; Zhou, Jun; Xiang, Zhongxiang
2018-02-01
The aim of this study is to reveal the effects of regional human activity on trace metal accumulation in remote alpine ecosystems under long-distance atmospheric transport. Trace metals (Cd, Pb, and Zn) in soils of the Mt. Luoji, Southwest China, were investigated along a large altitudinal gradient [2200-3850 m above sea level (a.s.l.)] to elaborate the key factors controlling their distribution by Pb isotopic composition and statistical models. The concentrations of Cd, Pb, and Zn in the surface soils (O and A horizons) were relatively low at the altitudes of 3500-3700 m a.s.l. The enrichment factors of trace metals in the surface soils increased with altitude. After normalization for soil organic matter, the concentrations of Cd still increased with altitude, whereas those of Pb and Zn did not show a clear altitudinal trend. The effects of vegetation and cold trapping (CTE) (pollutant enrichment by decreasing temperature with increasing altitude) mainly determined the distribution of Cd and Pb in the O horizon, whereas CTE and bedrock weathering (BW) controlled that of Zn. In the A horizon, the distribution of Cd and Pb depended on the vegetation regulation, whereas that of Zn was mainly related to BW. Human activity, including ores mining and fossil fuels combustion, increased the trace metal deposition in the surface soils. The anthropogenic percentage of Cd, Pb, and Zn quantified 92.4, 67.8, and 42.9% in the O horizon, and 74.5, 33.9, and 24.9% in the A horizon, respectively. The anthropogenic metals deposited at the high altitudes of Mt. Luoji reflected the impact of long-range atmospheric transport on this remote alpine ecosystem from southern and southwestern regions.
NASA Astrophysics Data System (ADS)
Leibovici, D. G.; Pourabdollah, A.; Jackson, M.
2011-12-01
Experts and decision-makers use or develop models to monitor global and local changes of the environment. Their activities require the combination of data and processing services in a flow of operations and spatial data computations: a geospatial scientific workflow. The seamless ability to generate, re-use and modify a geospatial scientific workflow is an important requirement but the quality of outcomes is equally much important [1]. Metadata information attached to the data and processes, and particularly their quality, is essential to assess the reliability of the scientific model that represents a workflow [2]. Managing tools, dealing with qualitative and quantitative metadata measures of the quality associated with a workflow, are, therefore, required for the modellers. To ensure interoperability, ISO and OGC standards [3] are to be adopted, allowing for example one to define metadata profiles and to retrieve them via web service interfaces. However these standards need a few extensions when looking at workflows, particularly in the context of geoprocesses metadata. We propose to fill this gap (i) at first through the provision of a metadata profile for the quality of processes, and (ii) through providing a framework, based on XPDL [4], to manage the quality information. Web Processing Services are used to implement a range of metadata analyses on the workflow in order to evaluate and present quality information at different levels of the workflow. This generates the metadata quality, stored in the XPDL file. The focus is (a) on the visual representations of the quality, summarizing the retrieved quality information either from the standardized metadata profiles of the components or from non-standard quality information e.g., Web 2.0 information, and (b) on the estimated qualities of the outputs derived from meta-propagation of uncertainties (a principle that we have introduced [5]). An a priori validation of the future decision-making supported by the outputs of the workflow once run, is then provided using the meta-propagated qualities, obtained without running the workflow [6], together with the visualization pointing out the need to improve the workflow with better data or better processes on the workflow graph itself. [1] Leibovici, DG, Hobona, G Stock, K Jackson, M (2009) Qualifying geospatial workfow models for adaptive controlled validity and accuracy. In: IEEE 17th GeoInformatics, 1-5 [2] Leibovici, DG, Pourabdollah, A (2010a) Workflow Uncertainty using a Metamodel Framework and Metadata for Data and Processes. OGC TC/PC Meetings, September 2010, Toulouse, France [3] OGC (2011) www.opengeospatial.org [4] XPDL (2008) Workflow Process Definition Interface - XML Process Definition Language.Workflow Management Coalition, Document WfMC-TC-1025, 2008 [5] Leibovici, DG Pourabdollah, A Jackson, M (2011) Meta-propagation of Uncertainties for Scientific Workflow Management in Interoperable Spatial Data Infrastructures. In: Proceedings of the European Geosciences Union (EGU2011), April 2011, Austria [6] Pourabdollah, A Leibovici, DG Jackson, M (2011) MetaPunT: an Open Source tool for Meta-Propagation of uncerTainties in Geospatial Processing. In: Proceedings of OSGIS2011, June 2011, Nottingham, UK
Braun, Christopher L.; Wilson, Jennifer T.; Van Metre, Peter C.; Weakland, Rhonda J.; Fosness, Ryan L.; Williams, Marshall L.
2012-01-01
Fifty subsamples from 15 cores were analyzed for major and trace elements. Concentrations of trace elements were low, with respect to sediment quality guidelines, in most cores. Typically, major and trace element concentrations were lower in the subsamples collected from the Snake River compared to those collected from the Clearwater River, the confluence of the Snake and Clearwater Rivers, and Lower Granite Reservoir. Generally, lower concentrations of major and trace elements were associated with coarser sediments (larger than 0.0625 millimeter) and higher concentrations of major and trace elements were associated with finer sediments (smaller than 0.0625 millimeter).
DOMstudio: an integrated workflow for Digital Outcrop Model reconstruction and interpretation
NASA Astrophysics Data System (ADS)
Bistacchi, Andrea
2015-04-01
Different Remote Sensing technologies, including photogrammetry and LIDAR, allow collecting 3D dataset that can be used to create 3D digital representations of outcrop surfaces, called Digital Outcrop Models (DOM), or sometimes Virtual Outcrop Models (VOM). Irrespective of the Remote Sensing technique used, DOMs can be represented either by photorealistic point clouds (PC-DOM) or textured surfaces (TS-DOM). The first are datasets composed of millions of points with XYZ coordinates and RGB colour, whilst the latter are triangulated surfaces onto which images of the outcrop have been mapped or "textured" (applying a tech-nology originally developed for movies and videogames). Here we present a workflow that allows exploiting in an integrated and efficient, yet flexible way, both kinds of dataset: PC-DOMs and TS-DOMs. The workflow is composed of three main steps: (1) data collection and processing, (2) interpretation, and (3) modelling. Data collection can be performed with photogrammetry, LIDAR, or other techniques. The quality of photogrammetric datasets obtained with Structure From Motion (SFM) techniques has shown a tremendous improvement over the past few years, and this is becoming the more effective way to collect DOM datasets. The main advantages of photogrammetry over LIDAR are represented by the very simple and lightweight field equipment (a digital camera), and by the arbitrary spatial resolution, that can be increased simply getting closer to the out-crop or by using a different lens. It must be noted that concerns about the precision of close-range photogrammetric surveys, that were justified in the past, are no more a problem if modern software and acquisition schemas are applied. In any case, LIDAR is a well-tested technology and it is still very common. Irrespective of the data collection technology, the output will be a photorealistic point cloud and a collection of oriented photos, plus additional imagery in special projects (e.g. infrared images). This dataset can be used as-is (PC-DOM), or a 3D triangulated surface can be interpolated from the point cloud, and images can be used to associate a texture to this surface (TS-DOM). In the DOMstudio workflow we use both PC-DOMs and TS-DOMs. Particularly, the latter are obtained projecting the original images onto the triangulated surface, without any downsampling, thus retaining the original resolution and quality of images collected in the field. In the DOMstudio interpretation step, PC-DOM is considered the best option for fracture analysis in outcrops where facets corresponding to fractures are present. This allows obtaining orientation statistics (e.g. stereoplots, Fisher statistics, etc.) directly from a point cloud where, for each point, the unit vector normal to the outcrop surface has been calculated. A recent development in this kind of processing is represented by the possibility to automatically select (segment) subset point clouds representing single fracture surfaces, which can be used for studies on fracture length, spacing, etc., allowing to obtain parameters like the frequency-length distribution, P21, etc. PC-DOM interpretation can be combined or complemented, depending on the outcrop morphology, with an interpretation carried out on a TS-DOM in terms of traces, which are the linear intersection of "geological" surfaces (fractures, faults, bedding, etc.) with the outcrop surface. This kind of interpretation is very well suited for outcrops with smooth surfaces, and can be performed either by manual picking, or by applying image analysis techniques on the images associated with the DOM. In this case, a huge mass of data, with very high resolution, can be collected very effectively. If we consider applications like lithological or mineral map-ping, TS-DOM datasets are the only suitable support. Finally, the DOMstudio workflow produces output in formats that are compatible with all common geomodelling packages (e.g. Gocad/Skua, Petrel, Move), allowing to directly use the quantitative data collected on DOMs to generate and calibrate geological, structural, or geostatistical models. I will present examples of applications including hydrocarbon reservoir analogue studies, studies on fault zone architecture, lithological mapping on sedimentary and metamorphic rocks, and studies on the surface of planets and small bodies in the Solar System.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tan, J; Pompos, A; Jiang, S
Purpose: To put forth an innovative clinical paradigm for weekly chart checking so that treatment status is periodically checked accurately and efficiently. This study also aims to help optimize the chart checking clinical workflow in a busy radiation therapy clinic. Methods: It is mandated by the Texas Administrative code to check patient charts of radiation therapy once a week or every five fractions, however it varies drastically among institutions in terms of when and how it is done. Some do it every day, but a lot of efforts are wasted on opening ineligible charts; some do it on a fixedmore » day but the distribution of intervals between subsequent checks is not optimal. To establish an optimal chart checking procedure, a new paradigm was developed to achieve 1) charts are checked more accurately and more efficiently; 2) charts are checked on optimal days without any miss; 3) workload is evened out throughout a week when multiple physicists are involved. All active charts will be accessed by querying the R&V system. Priority is assigned to each chart based on the number of days before the next due date followed by sorting and workload distribution steps. New charts are also taken into account when distributing the workload so it is reasonably even throughout the week. Results: Our clinical workflow became more streamlined and smooth. In addition, charts get checked in a more timely fashion so that errors would get caught earlier should they occur. Conclusion: We developed a new weekly chart checking diagram. It helps physicists check charts in a timely manner, saves their time in busy clinics, and consequently reduces possible errors.« less
NASA Astrophysics Data System (ADS)
Huang, T.; Alarcon, C.; Quach, N. T.
2014-12-01
Capture, curate, and analysis are the typical activities performed at any given Earth Science data center. Modern data management systems must be adaptable to heterogeneous science data formats, scalable to meet the mission's quality of service requirements, and able to manage the life-cycle of any given science data product. Designing a scalable data management doesn't happen overnight. It takes countless hours of refining, refactoring, retesting, and re-architecting. The Horizon data management and workflow framework, developed at the Jet Propulsion Laboratory, is a portable, scalable, and reusable framework for developing high-performance data management and product generation workflow systems to automate data capturing, data curation, and data analysis activities. The NASA's Physical Oceanography Distributed Active Archive Center (PO.DAAC)'s Data Management and Archive System (DMAS) is its core data infrastructure that handles capturing and distribution of hundreds of thousands of satellite observations each day around the clock. DMAS is an application of the Horizon framework. The NASA Global Imagery Browse Services (GIBS) is NASA's Earth Observing System Data and Information System (EOSDIS)'s solution for making high-resolution global imageries available to the science communities. The Imagery Exchange (TIE), an application of the Horizon framework, is a core subsystem for GIBS responsible for data capturing and imagery generation automation to support the EOSDIS' 12 distributed active archive centers and 17 Science Investigator-led Processing Systems (SIPS). This presentation discusses our ongoing effort in refining, refactoring, retesting, and re-architecting the Horizon framework to enable data-intensive science and its applications.
QMachine: commodity supercomputing in web browsers
2014-01-01
Background Ongoing advancements in cloud computing provide novel opportunities in scientific computing, especially for distributed workflows. Modern web browsers can now be used as high-performance workstations for querying, processing, and visualizing genomics’ “Big Data” from sources like The Cancer Genome Atlas (TCGA) and the International Cancer Genome Consortium (ICGC) without local software installation or configuration. The design of QMachine (QM) was driven by the opportunity to use this pervasive computing model in the context of the Web of Linked Data in Biomedicine. Results QM is an open-sourced, publicly available web service that acts as a messaging system for posting tasks and retrieving results over HTTP. The illustrative application described here distributes the analyses of 20 Streptococcus pneumoniae genomes for shared suffixes. Because all analytical and data retrieval tasks are executed by volunteer machines, few server resources are required. Any modern web browser can submit those tasks and/or volunteer to execute them without installing any extra plugins or programs. A client library provides high-level distribution templates including MapReduce. This stark departure from the current reliance on expensive server hardware running “download and install” software has already gathered substantial community interest, as QM received more than 2.2 million API calls from 87 countries in 12 months. Conclusions QM was found adequate to deliver the sort of scalable bioinformatics solutions that computation- and data-intensive workflows require. Paradoxically, the sandboxed execution of code by web browsers was also found to enable them, as compute nodes, to address critical privacy concerns that characterize biomedical environments. PMID:24913605
Worklist handling in workflow-enabled radiological application systems
NASA Astrophysics Data System (ADS)
Wendler, Thomas; Meetz, Kirsten; Schmidt, Joachim; von Berg, Jens
2000-05-01
For the next generation integrated information systems for health care applications, more emphasis has to be put on systems which, by design, support the reduction of cost, the increase inefficiency and the improvement of the quality of services. A substantial contribution to this will be the modeling. optimization, automation and enactment of processes in health care institutions. One of the perceived key success factors for the system integration of processes will be the application of workflow management, with workflow management systems as key technology components. In this paper we address workflow management in radiology. We focus on an important aspect of workflow management, the generation and handling of worklists, which provide workflow participants automatically with work items that reflect tasks to be performed. The display of worklists and the functions associated with work items are the visible part for the end-users of an information system using a workflow management approach. Appropriate worklist design and implementation will influence user friendliness of a system and will largely influence work efficiency. Technically, in current imaging department information system environments (modality-PACS-RIS installations), a data-driven approach has been taken: Worklist -- if present at all -- are generated from filtered views on application data bases. In a future workflow-based approach, worklists will be generated by autonomous workflow services based on explicit process models and organizational models. This process-oriented approach will provide us with an integral view of entire health care processes or sub- processes. The paper describes the basic mechanisms of this approach and summarizes its benefits.
Powers, Christina M; Mills, Karmann A; Morris, Stephanie A; Klaessig, Fred; Gaheen, Sharon; Lewinski, Nastassja
2015-01-01
Summary There is a critical opportunity in the field of nanoscience to compare and integrate information across diverse fields of study through informatics (i.e., nanoinformatics). This paper is one in a series of articles on the data curation process in nanoinformatics (nanocuration). Other articles in this series discuss key aspects of nanocuration (temporal metadata, data completeness, database integration), while the focus of this article is on the nanocuration workflow, or the process of identifying, inputting, and reviewing nanomaterial data in a data repository. In particular, the article discusses: 1) the rationale and importance of a defined workflow in nanocuration, 2) the influence of organizational goals or purpose on the workflow, 3) established workflow practices in other fields, 4) current workflow practices in nanocuration, 5) key challenges for workflows in emerging fields like nanomaterials, 6) examples to make these challenges more tangible, and 7) recommendations to address the identified challenges. Throughout the article, there is an emphasis on illustrating key concepts and current practices in the field. Data on current practices in the field are from a group of stakeholders active in nanocuration. In general, the development of workflows for nanocuration is nascent, with few individuals formally trained in data curation or utilizing available nanocuration resources (e.g., ISA-TAB-Nano). Additional emphasis on the potential benefits of cultivating nanomaterial data via nanocuration processes (e.g., capability to analyze data from across research groups) and providing nanocuration resources (e.g., training) will likely prove crucial for the wider application of nanocuration workflows in the scientific community. PMID:26425437
Rusk, Brian; Koenig, Alan; Lowers, Heather
2011-01-01
Cathodoluminescent (CL) textures in quartz reveal successive histories of the physical and chemical fluctuations that accompany crystal growth. Such CL textures reflect trace element concentration variations that can be mapped by electron microprobe or laser ablation-inductively coupled plasma-mass spectrometry (LA-ICP-MS). Trace element maps in hydrothermal quartz from four different ore deposit types (Carlin-type Au, epithermal Ag, porphyry-Cu, and MVT Pb-Zn) reveal correlations among trace elements and between trace element concentrations and CL textures. The distributions of trace elements reflect variations in the physical and chemical conditions of quartz precipitation. These maps show that Al is the most abundant trace element in hydrothermal quartz. In crystals grown at temperatures below 300 °C, Al concentrations may vary by up to two orders of magnitude between adjacent growth zones, with no evidence for diffusion. The monovalent cations Li, Na, and K, where detectable, always correlate with Al, with Li being the most abundant of the three. In most samples, Al is more abundant than the combined total of the monovalent cations; however, in the MVT sample, molar Al/Li ratios are ~0.8. Antimony is present in concentrations up to ~120 ppm in epithermal quartz (~200–300 °C), but is not detectable in MVT, Carlin, or porphyry-Cu quartz. Concentrations of Sb do not correlate consistently with those of other trace elements or with CL textures. Titanium is only abundant enough to be mapped in quartz from porphyry-type ore deposits that precipitate at temperatures above ~400 °C. In such quartz, Ti concentration correlates positively with CL intensity, suggesting a causative relationship. In contrast, in quartz from other deposit types, there is no consistent correlation between concentrations of any trace element and CL intensity fluctuations.
Enhancing and Customizing Laboratory Information Systems to Improve/Enhance Pathologist Workflow.
Hartman, Douglas J
2015-06-01
Optimizing pathologist workflow can be difficult because it is affected by many variables. Surgical pathologists must complete many tasks that culminate in a final pathology report. Several software systems can be used to enhance/improve pathologist workflow. These include voice recognition software, pre-sign-out quality assurance, image utilization, and computerized provider order entry. Recent changes in the diagnostic coding and the more prominent role of centralized electronic health records represent potential areas for increased ways to enhance/improve the workflow for surgical pathologists. Additional unforeseen changes to the pathologist workflow may accompany the introduction of whole-slide imaging technology to the routine diagnostic work. Copyright © 2015 Elsevier Inc. All rights reserved.
Enhancing and Customizing Laboratory Information Systems to Improve/Enhance Pathologist Workflow.
Hartman, Douglas J
2016-03-01
Optimizing pathologist workflow can be difficult because it is affected by many variables. Surgical pathologists must complete many tasks that culminate in a final pathology report. Several software systems can be used to enhance/improve pathologist workflow. These include voice recognition software, pre-sign-out quality assurance, image utilization, and computerized provider order entry. Recent changes in the diagnostic coding and the more prominent role of centralized electronic health records represent potential areas for increased ways to enhance/improve the workflow for surgical pathologists. Additional unforeseen changes to the pathologist workflow may accompany the introduction of whole-slide imaging technology to the routine diagnostic work. Copyright © 2016 Elsevier Inc. All rights reserved.
Collaborative Science Using Web Services and the SciFlo Grid Dataflow Engine
NASA Astrophysics Data System (ADS)
Wilson, B. D.; Manipon, G.; Xing, Z.; Yunck, T.
2006-12-01
The General Earth Science Investigation Suite (GENESIS) project is a NASA-sponsored partnership between the Jet Propulsion Laboratory, academia, and NASA data centers to develop a new suite of Web Services tools to facilitate multi-sensor investigations in Earth System Science. The goal of GENESIS is to enable large-scale, multi-instrument atmospheric science using combined datasets from the AIRS, MODIS, MISR, and GPS sensors. Investigations include cross-comparison of spaceborne climate sensors, cloud spectral analysis, study of upper troposphere-stratosphere water transport, study of the aerosol indirect cloud effect, and global climate model validation. The challenges are to bring together very large datasets, reformat and understand the individual instrument retrievals, co-register or re-grid the retrieved physical parameters, perform computationally-intensive data fusion and data mining operations, and accumulate complex statistics over months to years of data. To meet these challenges, we have developed a Grid computing and dataflow framework, named SciFlo, in which we are deploying a set of versatile and reusable operators for data access, subsetting, registration, mining, fusion, compression, and advanced statistical analysis. SciFlo leverages remote Web Services, called via Simple Object Access Protocol (SOAP) or REST (one-line) URLs, and the Grid Computing standards (WS-* &Globus Alliance toolkits), and enables scientists to do multi-instrument Earth Science by assembling reusable Web Services and native executables into a distributed computing flow (tree of operators). The SciFlo client &server engines optimize the execution of such distributed data flows and allow the user to transparently find and use datasets and operators without worrying about the actual location of the Grid resources. In particular, SciFlo exploits the wealth of datasets accessible by OpenGIS Consortium (OGC) Web Mapping Servers & Web Coverage Servers (WMS/WCS), and by Open Data Access Protocol (OpenDAP) servers. The scientist injects a distributed computation into the Grid by simply filling out an HTML form or directly authoring the underlying XML dataflow document, and results are returned directly to the scientist's desktop. Once an analysis has been specified for a chunk or day of data, it can be easily repeated with different control parameters or over months of data. Recently, the Earth Science Information Partners (ESIP) Federation sponsored a collaborative activity in which several ESIP members advertised their respective WMS/WCS and SOAP services, developed some collaborative science scenarios for atmospheric and aerosol science, and then choreographed services from multiple groups into demonstration workflows using the SciFlo engine and a Business Process Execution Language (BPEL) workflow engine. For several scenarios, the same collaborative workflow was executed in three ways: using hand-coded scripts, by executing a SciFlo document, and by executing a BPEL workflow document. We will discuss the lessons learned from this activity, the need for standardized interfaces (like WMS/WCS), the difficulty in agreeing on even simple XML formats and interfaces, and further collaborations that are being pursued.
NASA Astrophysics Data System (ADS)
Bertin, Stephane; Friedrich, Heide; Delmas, Patrice; Chan, Edwin; Gimel'farb, Georgy
2015-03-01
Grain-scale monitoring of fluvial morphology is important for the evaluation of river system dynamics. Significant progress in remote sensing and computer performance allows rapid high-resolution data acquisition, however, applications in fluvial environments remain challenging. Even in a controlled environment, such as a laboratory, the extensive acquisition workflow is prone to the propagation of errors in digital elevation models (DEMs). This is valid for both of the common surface recording techniques: digital stereo photogrammetry and terrestrial laser scanning (TLS). The optimisation of the acquisition process, an effective way to reduce the occurrence of errors, is generally limited by the use of commercial software. Therefore, the removal of evident blunders during post processing is regarded as standard practice, although this may introduce new errors. This paper presents a detailed evaluation of a digital stereo-photogrammetric workflow developed for fluvial hydraulic applications. The introduced workflow is user-friendly and can be adapted to various close-range measurements: imagery is acquired with two Nikon D5100 cameras and processed using non-proprietary "on-the-job" calibration and dense scanline-based stereo matching algorithms. Novel ground truth evaluation studies were designed to identify the DEM errors, which resulted from a combination of calibration errors, inaccurate image rectifications and stereo-matching errors. To ensure optimum DEM quality, we show that systematic DEM errors must be minimised by ensuring a good distribution of control points throughout the image format during calibration. DEM quality is then largely dependent on the imagery utilised. We evaluated the open access multi-scale Retinex algorithm to facilitate the stereo matching, and quantified its influence on DEM quality. Occlusions, inherent to any roughness element, are still a major limiting factor to DEM accuracy. We show that a careful selection of the camera-to-object and baseline distance reduces errors in occluded areas and that realistic ground truths help to quantify those errors.
MBAT: a scalable informatics system for unifying digital atlasing workflows.
Lee, Daren; Ruffins, Seth; Ng, Queenie; Sane, Nikhil; Anderson, Steve; Toga, Arthur
2010-12-22
Digital atlases provide a common semantic and spatial coordinate system that can be leveraged to compare, contrast, and correlate data from disparate sources. As the quality and amount of biological data continues to advance and grow, searching, referencing, and comparing this data with a researcher's own data is essential. However, the integration process is cumbersome and time-consuming due to misaligned data, implicitly defined associations, and incompatible data sources. This work addressing these challenges by providing a unified and adaptable environment to accelerate the workflow to gather, align, and analyze the data. The MouseBIRN Atlasing Toolkit (MBAT) project was developed as a cross-platform, free open-source application that unifies and accelerates the digital atlas workflow. A tiered, plug-in architecture was designed for the neuroinformatics and genomics goals of the project to provide a modular and extensible design. MBAT provides the ability to use a single query to search and retrieve data from multiple data sources, align image data using the user's preferred registration method, composite data from multiple sources in a common space, and link relevant informatics information to the current view of the data or atlas. The workspaces leverage tool plug-ins to extend and allow future extensions of the basic workspace functionality. A wide variety of tool plug-ins were developed that integrate pre-existing as well as newly created technology into each workspace. Novel atlasing features were also developed, such as supporting multiple label sets, dynamic selection and grouping of labels, and synchronized, context-driven display of ontological data. MBAT empowers researchers to discover correlations among disparate data by providing a unified environment for bringing together distributed reference resources, a user's image data, and biological atlases into the same spatial or semantic context. Through its extensible tiered plug-in architecture, MBAT allows researchers to customize all platform components to quickly achieve personalized workflows.
Kilian, Norbert; Henning, Tilo; Plitzner, Patrick; Müller, Andreas; Güntsch, Anton; Stöver, Ben C.; Müller, Kai F.; Berendsohn, Walter G.; Borsch, Thomas
2015-01-01
We present the model and implementation of a workflow that blazes a trail in systematic biology for the re-usability of character data (data on any kind of characters of pheno- and genotypes of organisms) and their additivity from specimen to taxon level. We take into account that any taxon characterization is based on a limited set of sampled individuals and characters, and that consequently any new individual and any new character may affect the recognition of biological entities and/or the subsequent delimitation and characterization of a taxon. Taxon concepts thus frequently change during the knowledge generation process in systematic biology. Structured character data are therefore not only needed for the knowledge generation process but also for easily adapting characterizations of taxa. We aim to facilitate the construction and reproducibility of taxon characterizations from structured character data of changing sample sets by establishing a stable and unambiguous association between each sampled individual and the data processed from it. Our workflow implementation uses the European Distributed Institute of Taxonomy Platform, a comprehensive taxonomic data management and publication environment to: (i) establish a reproducible connection between sampled individuals and all samples derived from them; (ii) stably link sample-based character data with the metadata of the respective samples; (iii) record and store structured specimen-based character data in formats allowing data exchange; (iv) reversibly assign sample metadata and character datasets to taxa in an editable classification and display them and (v) organize data exchange via standard exchange formats and enable the link between the character datasets and samples in research collections, ensuring high visibility and instant re-usability of the data. The workflow implemented will contribute to organizing the interface between phylogenetic analysis and revisionary taxonomic or monographic work. Database URL: http://campanula.e-taxonomy.net/ PMID:26424081
Dawn: A Simulation Model for Evaluating Costs and Tradeoffs of Big Data Science Architectures
NASA Astrophysics Data System (ADS)
Cinquini, L.; Crichton, D. J.; Braverman, A. J.; Kyo, L.; Fuchs, T.; Turmon, M.
2014-12-01
In many scientific disciplines, scientists and data managers are bracing for an upcoming deluge of big data volumes, which will increase the size of current data archives by a factor of 10-100 times. For example, the next Climate Model Inter-comparison Project (CMIP6) will generate a global archive of model output of approximately 10-20 Peta-bytes, while the upcoming next generation of NASA decadal Earth Observing instruments are expected to collect tens of Giga-bytes/day. In radio-astronomy, the Square Kilometre Array (SKA) will collect data in the Exa-bytes/day range, of which (after reduction and processing) around 1.5 Exa-bytes/year will be stored. The effective and timely processing of these enormous data streams will require the design of new data reduction and processing algorithms, new system architectures, and new techniques for evaluating computation uncertainty. Yet at present no general software tool or framework exists that will allow system architects to model their expected data processing workflow, and determine the network, computational and storage resources needed to prepare their data for scientific analysis. In order to fill this gap, at NASA/JPL we have been developing a preliminary model named DAWN (Distributed Analytics, Workflows and Numerics) for simulating arbitrary complex workflows composed of any number of data processing and movement tasks. The model can be configured with a representation of the problem at hand (the data volumes, the processing algorithms, the available computing and network resources), and is able to evaluate tradeoffs between different possible workflows based on several estimators: overall elapsed time, separate computation and transfer times, resulting uncertainty, and others. So far, we have been applying DAWN to analyze architectural solutions for 4 different use cases from distinct science disciplines: climate science, astronomy, hydrology and a generic cloud computing use case. This talk will present preliminary results and discuss how DAWN can be evolved into a powerful tool for designing system architectures for data intensive science.
High volcanic seismic b-values: Real or artefacts?
NASA Astrophysics Data System (ADS)
Roberts, Nick; Bell, Andrew; Main, Ian G.
2015-04-01
The b-value of the Gutenberg-Richter distribution quantifies the relative proportion of large to small magnitude earthquakes in a catalogue, in turn related to the population of fault rupture areas and the average slip or stress drop. Accordingly the b-value is an important parameter to consider when evaluating seismic catalogues as it has the potential to provide insight into the temporal or spatial evolution of the system, such as fracture development or changes in the local stress regime. The b-value for tectonic seismicity is commonly found to be close to 1, whereas much higher b-values are frequently reported for volcanic and induced seismicity. Understanding these differences is important for understanding the processes controlling earthquake occurrence in different settings. However, it is possible that anomalously high b-values could arise from small sample sizes, under-estimated completeness magnitudes, or other poorly applied methodologies. Therefore, it is important to establish a rigorous workflow for analyzing these datasets. Here we examine the frequency-magnitude distributions of volcanic earthquake catalogues in order to determine the significance of apparently high b-values. We first derive a workflow for computing the completeness magnitude of a seismic catalogue, using synthetic catalogues of varying shape, size, and known b-value. We find the best approach involves a combination of three methods: 'Maximum Curvature', 'b-value stability', and the 'Goodness-of-Fit test'. To calculate a reliable b-value with an error ≤0.25, the maximum curvature method is preferred for a 'sharp-peaked' discrete distribution. For a catalogue with a broader peak the b-value stability method is the most reliable with the Goodness-of-Fit test being an acceptable backup if the b-value stability method fails. We apply this workflow to earthquake catalogues from El Hierro (2011-2013) and Mt Etna (1999-2013) volcanoes. In general, we find the b-value to be equal to or slightly greater than 1. However, reliable high b-values of 1.5-2.4 at El Hierro and 1.5-1.8 at Mt Etna are observed for restricted time periods. We argue that many of the almost axiomatically 'high' b-values reported in the literature for volcanic and induced seismicity may be attributable to biases introduced by the methods of inference used and/or the relatively small sample sizes often available.
Can high seismic b-values be explained solely by poorly applied methodology?
NASA Astrophysics Data System (ADS)
Roberts, Nick; Bell, Andrew; Main, Ian
2015-04-01
The b-value of the Gutenberg-Richter distribution quantifies the relative proportion of large to small magnitude earthquakes in a catalogue, in turn related to the population of fault rupture areas and the average slip or stress drop. Accordingly the b-value is an important parameter to consider when evaluating seismic catalogues as it has the potential to provide insight into the temporal or spatial evolution of the system, such as fracture development or changes in the local stress regime. The b-value for tectonic seismicity is commonly found to be close to 1, whereas much higher b-values are frequently reported for volcanic and induced seismicity. Understanding these differences is important for understanding the processes controlling earthquake occurrence in different settings. However, it is possible that anomalously high b-values could arise from small sample sizes, under-estimated completeness magnitudes, or other poorly applied methodologies. Therefore, it is important to establish a rigorous workflow for analyzing these datasets. Here we examine the frequency-magnitude distributions of volcanic earthquake catalogues in order to determine the significance of apparently high b-values. We first derive a workflow for computing the completeness magnitude of a seismic catalogue, using synthetic catalogues of varying shape, size, and known b-value. We find the best approach involves a combination of three methods: 'Maximum Curvature', 'b-value stability', and the 'Goodness-of-Fit test'. To calculate a reliable b-value with an error ≤0.25, the maximum curvature method is preferred for a 'sharp-peaked' discrete distribution. For a catalogue with a broader peak the b-value stability method is the most reliable with the Goodness-of-Fit test being an acceptable backup if the b-value stability method fails. We apply this workflow to earthquake catalogues from El Hierro (2011-2013) and Mt Etna (1999-2013) volcanoes. In general, we find the b-value to be equal to or slightly greater than 1, however, reliably high b-values are reported in both catalogues. We argue that many of the almost axiomatically 'high' b-values reported in the literature for volcanic and induced seismicity may be attributable to biases introduced by the methods of inference used and/or the relatively small sample sizes often available. This new methodology, although focused towards volcanic catalogues, is applicabale to all seismic catalogues.
Spatial Data Exploring by Satellite Image Distributed Processing
NASA Astrophysics Data System (ADS)
Mihon, V. D.; Colceriu, V.; Bektas, F.; Allenbach, K.; Gvilava, M.; Gorgan, D.
2012-04-01
Our society needs and environmental predictions encourage the applications development, oriented on supervising and analyzing different Earth Science related phenomena. Satellite images could be explored for discovering information concerning land cover, hydrology, air quality, and water and soil pollution. Spatial and environment related data could be acquired by imagery classification consisting of data mining throughout the multispectral bands. The process takes in account a large set of variables such as satellite image types (e.g. MODIS, Landsat), particular geographic area, soil composition, vegetation cover, and generally the context (e.g. clouds, snow, and season). All these specific and variable conditions require flexible tools and applications to support an optimal search for the appropriate solutions, and high power computation resources. The research concerns with experiments on solutions of using the flexible and visual descriptions of the satellite image processing over distributed infrastructures (e.g. Grid, Cloud, and GPU clusters). This presentation highlights the Grid based implementation of the GreenLand application. The GreenLand application development is based on simple, but powerful, notions of mathematical operators and workflows that are used in distributed and parallel executions over the Grid infrastructure. Currently it is used in three major case studies concerning with Istanbul geographical area, Rioni River in Georgia, and Black Sea catchment region. The GreenLand application offers a friendly user interface for viewing and editing workflows and operators. The description involves the basic operators provided by GRASS [1] library as well as many other image related operators supported by the ESIP platform [2]. The processing workflows are represented as directed graphs giving the user a fast and easy way to describe complex parallel algorithms, without having any prior knowledge of any programming language or application commands. Also this Web application does not require any kind of install for what the house-hold user is concerned. It is a remote application which may be accessed over the Internet. Currently the GreenLand application is available through the BSC-OS Portal provided by the enviroGRIDS FP7 project [3]. This presentation aims to highlight the challenges and issues of flexible description of the Grid based processing of satellite images, interoperability with other software platforms available in the portal, as well as the particular requirements of the Black Sea related use cases.
STSE: Spatio-Temporal Simulation Environment Dedicated to Biology.
Stoma, Szymon; Fröhlich, Martina; Gerber, Susanne; Klipp, Edda
2011-04-28
Recently, the availability of high-resolution microscopy together with the advancements in the development of biomarkers as reporters of biomolecular interactions increased the importance of imaging methods in molecular cell biology. These techniques enable the investigation of cellular characteristics like volume, size and geometry as well as volume and geometry of intracellular compartments, and the amount of existing proteins in a spatially resolved manner. Such detailed investigations opened up many new areas of research in the study of spatial, complex and dynamic cellular systems. One of the crucial challenges for the study of such systems is the design of a well stuctured and optimized workflow to provide a systematic and efficient hypothesis verification. Computer Science can efficiently address this task by providing software that facilitates handling, analysis, and evaluation of biological data to the benefit of experimenters and modelers. The Spatio-Temporal Simulation Environment (STSE) is a set of open-source tools provided to conduct spatio-temporal simulations in discrete structures based on microscopy images. The framework contains modules to digitize, represent, analyze, and mathematically model spatial distributions of biochemical species. Graphical user interface (GUI) tools provided with the software enable meshing of the simulation space based on the Voronoi concept. In addition, it supports to automatically acquire spatial information to the mesh from the images based on pixel luminosity (e.g. corresponding to molecular levels from microscopy images). STSE is freely available either as a stand-alone version or included in the linux live distribution Systems Biology Operational Software (SB.OS) and can be downloaded from http://www.stse-software.org/. The Python source code as well as a comprehensive user manual and video tutorials are also offered to the research community. We discuss main concepts of the STSE design and workflow. We demonstrate it's usefulness using the example of a signaling cascade leading to formation of a morphological gradient of Fus3 within the cytoplasm of the mating yeast cell Saccharomyces cerevisiae. STSE is an efficient and powerful novel platform, designed for computational handling and evaluation of microscopic images. It allows for an uninterrupted workflow including digitization, representation, analysis, and mathematical modeling. By providing the means to relate the simulation to the image data it allows for systematic, image driven model validation or rejection. STSE can be scripted and extended using the Python language. STSE should be considered rather as an API together with workflow guidelines and a collection of GUI tools than a stand alone application. The priority of the project is to provide an easy and intuitive way of extending and customizing software using the Python language.
NASA Astrophysics Data System (ADS)
Wohlgemuth-Ueberwasser, Cora C.; Viljoen, Fanus; Petersen, Sven; Vorster, Clarisa
2015-06-01
The key for understanding the trace metal inventory of currently explored VHMS deposits lies in the understanding of trace element distribution during the formation of these deposits on the seafloor. Recrystallization processes already occurring at the seafloor might liberate trace elements to later hydrothermal alteration and removement. To investigate the distribution and redistribution of trace elements we analyzed sulfide minerals from 27 black smoker samples derived from three different seafloor hydrothermal fields: the ultramafic-hosted Logatchev hydrothermal field on the Mid-Atlantic Ridge, the basaltic-hosted Turtle Pits field on the mid-atlantic ridge, and the felsic-hosted PACMANUS field in the Manus basin (Papua New Guinea). The sulfide samples were analyzed by mineral liberation analyser for the modal abundances of sulfide minerals, by electron microprobe for major elements and by laser ablation-inductively coupled plasma-mass spectrometry for As, Sb, Se, Te, and Au. The samples consist predominantly of chalcopyrite, sphalerite, pyrite, galena and minor isocubanite as well as inclusions of tetrahedrite-tennantite. Laser ablation spectra were used to evaluate the solubility limits of trace elements in different sulfide minerals at different textures. The solubility of As, Sb, and Au in pyrite decreases with increasing degree of recrystallization. When solubility limits are reached these elements occur as inclusions in the different sulfide phases or they are expelled from the mineral phase. Most ancient VHMS deposits represent felsic or bimodal felsic compositions. Samples from the felsic-hosted PACMANUS hydrothermal field at the Pual ridge (Papua New Guinea) show high concentrations of Pb, As, Sb, Bi, Hg, and Te, which is likely the result of an additional trace element contribution derived from magmatic volatiles. Co-precipitating pyrite and chalcopyrite are characterized by equal contents of Te, while chalcopyrite that replaced pyrite (presumably during black smoker growth) is enriched in Te relative to pyrite. These higher Te concentrations may be related to higher fluid temperature.
NASA Astrophysics Data System (ADS)
Swetnam, T. L.; Pelletier, J. D.; Merchant, N.; Callahan, N.; Lyons, E.
2015-12-01
Earth science is making rapid advances through effective utilization of large-scale data repositories such as aerial LiDAR and access to NSF-funded cyberinfrastructures (e.g. the OpenTopography.org data portal, iPlant Collaborative, and XSEDE). Scaling analysis tasks that are traditionally developed using desktops, laptops or computing clusters to effectively leverage national and regional scale cyberinfrastructure pose unique challenges and barriers to adoption. To address some of these challenges in Fall 2014 an 'Applied Cyberinfrastructure Concepts' a project-based learning course (ISTA 420/520) at the University of Arizona focused on developing scalable models of 'Effective Energy and Mass Transfer' (EEMT, MJ m-2 yr-1) for use by the NSF Critical Zone Observatories (CZO) project. EEMT is a quantitative measure of the flux of available energy to the critical zone, and its computation involves inputs that have broad applicability (e.g. solar insolation). The course comprised of 25 students with varying level of computational skills and with no prior domain background in the geosciences, collaborated with domain experts to develop the scalable workflow. The original workflow relying on open-source QGIS platform on a laptop was scaled to effectively utilize cloud environments (Openstack), UA Campus HPC systems, iRODS, and other XSEDE and OSG resources. The project utilizes public data, e.g. DEMs produced by OpenTopography.org and climate data from Daymet, which are processed using GDAL, GRASS and SAGA and the Makeflow and Work-queue task management software packages. Students were placed into collaborative groups to develop the separate aspects of the project. They were allowed to change teams, alter workflows, and design and develop novel code. The students were able to identify all necessary dependencies, recompile source onto the target execution platforms, and demonstrate a functional workflow, which was further improved upon by one of the group leaders over Spring 2015. All of the code, documentation and workflow description are currently available on GitHub and a public data portal is in development. We present a case study of how students reacted to the challenge of a real science problem, their interactions with end-users, what went right, and what could be done better in the future.
Electric Power Distribution System Model Simplification Using Segment Substitution
Reiman, Andrew P.; McDermott, Thomas E.; Akcakaya, Murat; ...
2017-09-20
Quasi-static time-series (QSTS) simulation is used to simulate the behavior of distribution systems over long periods of time (typically hours to years). The technique involves repeatedly solving the load-flow problem for a distribution system model and is useful for distributed energy resource (DER) planning. When a QSTS simulation has a small time step and a long duration, the computational burden of the simulation can be a barrier to integration into utility workflows. One way to relieve the computational burden is to simplify the system model. The segment substitution method of simplifying distribution system models introduced in this paper offers modelmore » bus reduction of up to 98% with a simplification error as low as 0.2% (0.002 pu voltage). Finally, in contrast to existing methods of distribution system model simplification, which rely on topological inspection and linearization, the segment substitution method uses black-box segment data and an assumed simplified topology.« less
Electric Power Distribution System Model Simplification Using Segment Substitution
DOE Office of Scientific and Technical Information (OSTI.GOV)
Reiman, Andrew P.; McDermott, Thomas E.; Akcakaya, Murat
Quasi-static time-series (QSTS) simulation is used to simulate the behavior of distribution systems over long periods of time (typically hours to years). The technique involves repeatedly solving the load-flow problem for a distribution system model and is useful for distributed energy resource (DER) planning. When a QSTS simulation has a small time step and a long duration, the computational burden of the simulation can be a barrier to integration into utility workflows. One way to relieve the computational burden is to simplify the system model. The segment substitution method of simplifying distribution system models introduced in this paper offers modelmore » bus reduction of up to 98% with a simplification error as low as 0.2% (0.002 pu voltage). Finally, in contrast to existing methods of distribution system model simplification, which rely on topological inspection and linearization, the segment substitution method uses black-box segment data and an assumed simplified topology.« less
Long Creek Creek Mine Drainage Study: South Fork Reservation: Final Report
To characterize water quality in streams affected by historical mining it is necessary to determine the seasonal and spatial distribution patterns of trace metals concentrations. Identification of these patterns is used to identify the trace metals that are of ecological concern ...
Distributed Trust Management for Validating SLA Choreographies
NASA Astrophysics Data System (ADS)
Haq, Irfan Ul; Alnemr, Rehab; Paschke, Adrian; Schikuta, Erich; Boley, Harold; Meinel, Christoph
For business workflow automation in a service-enriched environment such as a grid or a cloud, services scattered across heterogeneous Virtual Organizations (VOs) can be aggregated in a producer-consumer manner, building hierarchical structures of added value. In order to preserve the supply chain, the Service Level Agreements (SLAs) corresponding to the underlying choreography of services should also be incrementally aggregated. This cross-VO hierarchical SLA aggregation requires validation, for which a distributed trust system becomes a prerequisite. Elaborating our previous work on rule-based SLA validation, we propose a hybrid distributed trust model. This new model is based on Public Key Infrastructure (PKI) and reputation-based trust systems. It helps preventing SLA violations by identifying violation-prone services at service selection stage and actively contributes in breach management at the time of penalty enforcement.
Long, H.K.; Farrar, J.W.
1993-01-01
This report presents the results of the U.S. Geological Survey's analytical evaluation program for seven standard reference samples--T-123 (trace constituents), T-125 (trace constituents), M-126 (major constituents), N-38 (nutrients), N-39 (Nutrients), P-20 (precipitation-low ionic strength), and Hg-16 (mercury)--that were distributed in April 1993 to 175 laboratories registered in the U.S. Geological Survey sponsored interlaboratory testing program. Analytical data received from 131 of the laboratories were evaluated with respect to: overall laboratory performance and relative laboratory performance for each analyte in the 7 reference samples. Results of these evaluations are presented in tabular form. Also presented are tables and graphs summarizing the analytical data provided by each laboratory for each analyte in the seven standard reference samples. The most probable value for each analyte was determined using nonparametric statistics.
NASA Astrophysics Data System (ADS)
Weihusen, Andreas; Ritter, Felix; Kröger, Tim; Preusser, Tobias; Zidowitz, Stephan; Peitgen, Heinz-Otto
2007-03-01
Image guided radiofrequency (RF) ablation has taken a significant part in the clinical routine as a minimally invasive method for the treatment of focal liver malignancies. Medical imaging is used in all parts of the clinical workflow of an RF ablation, incorporating treatment planning, interventional targeting and result assessment. This paper describes a software application, which has been designed to support the RF ablation workflow under consideration of the requirements of clinical routine, such as easy user interaction and a high degree of robust and fast automatic procedures, in order to keep the physician from spending too much time at the computer. The application therefore provides a collection of specialized image processing and visualization methods for treatment planning and result assessment. The algorithms are adapted to CT as well as to MR imaging. The planning support contains semi-automatic methods for the segmentation of liver tumors and the surrounding vascular system as well as an interactive virtual positioning of RF applicators and a concluding numerical estimation of the achievable heat distribution. The assessment of the ablation result is supported by the segmentation of the coagulative necrosis and an interactive registration of pre- and post-interventional image data for the comparison of tumor and necrosis segmentation masks. An automatic quantification of surface distances is performed to verify the embedding of the tumor area into the thermal lesion area. The visualization methods support representations in the commonly used orthogonal 2D view as well as in 3D scenes.
A data management and publication workflow for a large-scale, heterogeneous sensor network.
Jones, Amber Spackman; Horsburgh, Jeffery S; Reeder, Stephanie L; Ramírez, Maurier; Caraballo, Juan
2015-06-01
It is common for hydrology researchers to collect data using in situ sensors at high frequencies, for extended durations, and with spatial distributions that produce data volumes requiring infrastructure for data storage, management, and sharing. The availability and utility of these data in addressing scientific questions related to water availability, water quality, and natural disasters relies on effective cyberinfrastructure that facilitates transformation of raw sensor data into usable data products. It also depends on the ability of researchers to share and access the data in useable formats. In this paper, we describe a data management and publication workflow and software tools for research groups and sites conducting long-term monitoring using in situ sensors. Functionality includes the ability to track monitoring equipment inventory and events related to field maintenance. Linking this information to the observational data is imperative in ensuring the quality of sensor-based data products. We present these tools in the context of a case study for the innovative Urban Transitions and Aridregion Hydrosustainability (iUTAH) sensor network. The iUTAH monitoring network includes sensors at aquatic and terrestrial sites for continuous monitoring of common meteorological variables, snow accumulation and melt, soil moisture, surface water flow, and surface water quality. We present the overall workflow we have developed for effectively transferring data from field monitoring sites to ultimate end-users and describe the software tools we have deployed for storing, managing, and sharing the sensor data. These tools are all open source and available for others to use.
Purdue ionomics information management system. An integrated functional genomics platform.
Baxter, Ivan; Ouzzani, Mourad; Orcun, Seza; Kennedy, Brad; Jandhyala, Shrinivas S; Salt, David E
2007-02-01
The advent of high-throughput phenotyping technologies has created a deluge of information that is difficult to deal with without the appropriate data management tools. These data management tools should integrate defined workflow controls for genomic-scale data acquisition and validation, data storage and retrieval, and data analysis, indexed around the genomic information of the organism of interest. To maximize the impact of these large datasets, it is critical that they are rapidly disseminated to the broader research community, allowing open access for data mining and discovery. We describe here a system that incorporates such functionalities developed around the Purdue University high-throughput ionomics phenotyping platform. The Purdue Ionomics Information Management System (PiiMS) provides integrated workflow control, data storage, and analysis to facilitate high-throughput data acquisition, along with integrated tools for data search, retrieval, and visualization for hypothesis development. PiiMS is deployed as a World Wide Web-enabled system, allowing for integration of distributed workflow processes and open access to raw data for analysis by numerous laboratories. PiiMS currently contains data on shoot concentrations of P, Ca, K, Mg, Cu, Fe, Zn, Mn, Co, Ni, B, Se, Mo, Na, As, and Cd in over 60,000 shoot tissue samples of Arabidopsis (Arabidopsis thaliana), including ethyl methanesulfonate, fast-neutron and defined T-DNA mutants, and natural accession and populations of recombinant inbred lines from over 800 separate experiments, representing over 1,000,000 fully quantitative elemental concentrations. PiiMS is accessible at www.purdue.edu/dp/ionomics.
Big Data Challenges in Global Seismic 'Adjoint Tomography' (Invited)
NASA Astrophysics Data System (ADS)
Tromp, J.; Bozdag, E.; Krischer, L.; Lefebvre, M.; Lei, W.; Smith, J.
2013-12-01
The challenge of imaging Earth's interior on a global scale is closely linked to the challenge of handling large data sets. The related iterative workflow involves five distinct phases, namely, 1) data gathering and culling, 2) synthetic seismogram calculations, 3) pre-processing (time-series analysis and time-window selection), 4) data assimilation and adjoint calculations, 5) post-processing (pre-conditioning, regularization, model update). In order to implement this workflow on modern high-performance computing systems, a new seismic data format is being developed. The Adaptable Seismic Data Format (ASDF) is designed to replace currently used data formats with a more flexible format that allows for fast parallel I/O. The metadata is divided into abstract categories, such as "source" and "receiver", along with provenance information for complete reproducibility. The structure of ASDF is designed keeping in mind three distinct applications: earthquake seismology, seismic interferometry, and exploration seismology. Existing time-series analysis tool kits, such as SAC and ObsPy, can be easily interfaced with ASDF so that seismologists can use robust, previously developed software packages. ASDF accommodates an automated, efficient workflow for global adjoint tomography. Manually managing the large number of simulations associated with the workflow can rapidly become a burden, especially with increasing numbers of earthquakes and stations. Therefore, it is of importance to investigate the possibility of automating the entire workflow. Scientific Workflow Management Software (SWfMS) allows users to execute workflows almost routinely. SWfMS provides additional advantages. In particular, it is possible to group independent simulations in a single job to fit the available computational resources. They also give a basic level of fault resilience as the workflow can be resumed at the correct state preceding a failure. Some of the best candidates for our particular workflow are Kepler and Swift, and the latter appears to be the most serious candidate for a large-scale workflow on a single supercomputer, remaining sufficiently simple to accommodate further modifications and improvements.
Hettne, Kristina M; Dharuri, Harish; Zhao, Jun; Wolstencroft, Katherine; Belhajjame, Khalid; Soiland-Reyes, Stian; Mina, Eleni; Thompson, Mark; Cruickshank, Don; Verdes-Montenegro, Lourdes; Garrido, Julian; de Roure, David; Corcho, Oscar; Klyne, Graham; van Schouwen, Reinout; 't Hoen, Peter A C; Bechhofer, Sean; Goble, Carole; Roos, Marco
2014-01-01
One of the main challenges for biomedical research lies in the computer-assisted integrative study of large and increasingly complex combinations of data in order to understand molecular mechanisms. The preservation of the materials and methods of such computational experiments with clear annotations is essential for understanding an experiment, and this is increasingly recognized in the bioinformatics community. Our assumption is that offering means of digital, structured aggregation and annotation of the objects of an experiment will provide necessary meta-data for a scientist to understand and recreate the results of an experiment. To support this we explored a model for the semantic description of a workflow-centric Research Object (RO), where an RO is defined as a resource that aggregates other resources, e.g., datasets, software, spreadsheets, text, etc. We applied this model to a case study where we analysed human metabolite variation by workflows. We present the application of the workflow-centric RO model for our bioinformatics case study. Three workflows were produced following recently defined Best Practices for workflow design. By modelling the experiment as an RO, we were able to automatically query the experiment and answer questions such as "which particular data was input to a particular workflow to test a particular hypothesis?", and "which particular conclusions were drawn from a particular workflow?". Applying a workflow-centric RO model to aggregate and annotate the resources used in a bioinformatics experiment, allowed us to retrieve the conclusions of the experiment in the context of the driving hypothesis, the executed workflows and their input data. The RO model is an extendable reference model that can be used by other systems as well. The Research Object is available at http://www.myexperiment.org/packs/428 The Wf4Ever Research Object Model is available at http://wf4ever.github.io/ro.
van der Heijden, Suzanne; de Oliveira, Susanne Juel; Kampmann, Marie-Louise; Børsting, Claus; Morling, Niels
2017-11-01
The Precision ID Identity Panel was used to type 109 Somali individuals in order to obtain allele frequencies for the Somali population. These frequencies were used to establish a Somali HID-SNP database, which will be used for the biostatistic calculations in family and immigration cases. Genotypes obtained with the Precision ID Identity Panel were found to be almost in complete concordance with genotypes obtained with the SNPforID PCR-SBE-CE assay. In seven SNP loci, silent alleles were identified, of which most were previously described in the literature. The project also set out to compare different AmpliSeq™ workflows to investigate the possibility of using automated library building in forensic genetic case work. In order to do so, the SNP typing of the Somalis was performed using three different workflows: 1) manual library building and sequencing on the Ion PGM™, 2) automated library building using the Biomek ® 3000 and sequencing on the Ion PGM™, and 3) automated library building using the Ion Chef™ and sequencing on the Ion S5™. AmpliSeq™ workflows were compared based on coverage, locus balance, noise, and heterozygote balance. Overall, the Ion Chef™/Ion S5™ workflow was found to give the best results and required least hands-on time in the laboratory. However, the Ion Chef™/Ion S5™ workflow was also the most expensive. The number of libraries that may be constructed in one Ion Chef™ library building run was limited to eight, which is too little for high throughput workflows. The Biomek ® 3000/Ion PGM™ workflow was found to perform similarly to the manual/Ion PGM™ workflow. This argues for the use of automated library building in forensic genetic case work. Automated library building decreases the workload of the laboratory staff, decreases the risk of pipetting errors, and simplifies the daily workflow in forensic genetic laboratories. Copyright © 2017 Elsevier B.V. All rights reserved.
From the desktop to the grid: scalable bioinformatics via workflow conversion.
de la Garza, Luis; Veit, Johannes; Szolek, Andras; Röttig, Marc; Aiche, Stephan; Gesing, Sandra; Reinert, Knut; Kohlbacher, Oliver
2016-03-12
Reproducibility is one of the tenets of the scientific method. Scientific experiments often comprise complex data flows, selection of adequate parameters, and analysis and visualization of intermediate and end results. Breaking down the complexity of such experiments into the joint collaboration of small, repeatable, well defined tasks, each with well defined inputs, parameters, and outputs, offers the immediate benefit of identifying bottlenecks, pinpoint sections which could benefit from parallelization, among others. Workflows rest upon the notion of splitting complex work into the joint effort of several manageable tasks. There are several engines that give users the ability to design and execute workflows. Each engine was created to address certain problems of a specific community, therefore each one has its advantages and shortcomings. Furthermore, not all features of all workflow engines are royalty-free -an aspect that could potentially drive away members of the scientific community. We have developed a set of tools that enables the scientific community to benefit from workflow interoperability. We developed a platform-free structured representation of parameters, inputs, outputs of command-line tools in so-called Common Tool Descriptor documents. We have also overcome the shortcomings and combined the features of two royalty-free workflow engines with a substantial user community: the Konstanz Information Miner, an engine which we see as a formidable workflow editor, and the Grid and User Support Environment, a web-based framework able to interact with several high-performance computing resources. We have thus created a free and highly accessible way to design workflows on a desktop computer and execute them on high-performance computing resources. Our work will not only reduce time spent on designing scientific workflows, but also make executing workflows on remote high-performance computing resources more accessible to technically inexperienced users. We strongly believe that our efforts not only decrease the turnaround time to obtain scientific results but also have a positive impact on reproducibility, thus elevating the quality of obtained scientific results.
ACTRIS Aerosol, Clouds and Trace Gases Research Infrastructure
NASA Astrophysics Data System (ADS)
Pappalardo, Gelsomina
2018-04-01
The Aerosols, Clouds and Trace gases Research Infrastructure (ACTRIS) is a distributed infrastructure dedicated to high-quality observation of aerosols, clouds, trace gases and exploration of their interactions. It will deliver precision data, services and procedures regarding the 4D variability of clouds, short-lived atmospheric species and the physical, optical and chemical properties of aerosols to improve the current capacity to analyse, understand and predict past, current and future evolution of the atmospheric environment.
A scientific workflow framework for (13)C metabolic flux analysis.
Dalman, Tolga; Wiechert, Wolfgang; Nöh, Katharina
2016-08-20
Metabolic flux analysis (MFA) with (13)C labeling data is a high-precision technique to quantify intracellular reaction rates (fluxes). One of the major challenges of (13)C MFA is the interactivity of the computational workflow according to which the fluxes are determined from the input data (metabolic network model, labeling data, and physiological rates). Here, the workflow assembly is inevitably determined by the scientist who has to consider interacting biological, experimental, and computational aspects. Decision-making is context dependent and requires expertise, rendering an automated evaluation process hardly possible. Here, we present a scientific workflow framework (SWF) for creating, executing, and controlling on demand (13)C MFA workflows. (13)C MFA-specific tools and libraries, such as the high-performance simulation toolbox 13CFLUX2, are wrapped as web services and thereby integrated into a service-oriented architecture. Besides workflow steering, the SWF features transparent provenance collection and enables full flexibility for ad hoc scripting solutions. To handle compute-intensive tasks, cloud computing is supported. We demonstrate how the challenges posed by (13)C MFA workflows can be solved with our approach on the basis of two proof-of-concept use cases. Copyright © 2015 Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Liu, Jia; Liu, Longli; Xue, Yong; Dong, Jing; Hu, Yingcui; Hill, Richard; Guang, Jie; Li, Chi
2017-01-01
Workflow for remote sensing quantitative retrieval is the ;bridge; between Grid services and Grid-enabled application of remote sensing quantitative retrieval. Workflow averts low-level implementation details of the Grid and hence enables users to focus on higher levels of application. The workflow for remote sensing quantitative retrieval plays an important role in remote sensing Grid and Cloud computing services, which can support the modelling, construction and implementation of large-scale complicated applications of remote sensing science. The validation of workflow is important in order to support the large-scale sophisticated scientific computation processes with enhanced performance and to minimize potential waste of time and resources. To research the semantic correctness of user-defined workflows, in this paper, we propose a workflow validation method based on tacit knowledge research in the remote sensing domain. We first discuss the remote sensing model and metadata. Through detailed analysis, we then discuss the method of extracting the domain tacit knowledge and expressing the knowledge with ontology. Additionally, we construct the domain ontology with Protégé. Through our experimental study, we verify the validity of this method in two ways, namely data source consistency error validation and parameters matching error validation.
BPELPower—A BPEL execution engine for geospatial web services
NASA Astrophysics Data System (ADS)
Yu, Genong (Eugene); Zhao, Peisheng; Di, Liping; Chen, Aijun; Deng, Meixia; Bai, Yuqi
2012-10-01
The Business Process Execution Language (BPEL) has become a popular choice for orchestrating and executing workflows in the Web environment. As one special kind of scientific workflow, geospatial Web processing workflows are data-intensive, deal with complex structures in data and geographic features, and execute automatically with limited human intervention. To enable the proper execution and coordination of geospatial workflows, a specially enhanced BPEL execution engine is required. BPELPower was designed, developed, and implemented as a generic BPEL execution engine with enhancements for executing geospatial workflows. The enhancements are especially in its capabilities in handling Geography Markup Language (GML) and standard geospatial Web services, such as the Web Processing Service (WPS) and the Web Feature Service (WFS). BPELPower has been used in several demonstrations over the decade. Two scenarios were discussed in detail to demonstrate the capabilities of BPELPower. That study showed a standard-compliant, Web-based approach for properly supporting geospatial processing, with the only enhancement at the implementation level. Pattern-based evaluation and performance improvement of the engine are discussed: BPELPower directly supports 22 workflow control patterns and 17 workflow data patterns. In the future, the engine will be enhanced with high performance parallel processing and broad Web paradigms.
A Scientific Workflow System for Satellite Data Processing with Real-Time Monitoring
NASA Astrophysics Data System (ADS)
Nguyen, Minh Duc
2018-02-01
This paper provides a case study on satellite data processing, storage, and distribution in the space weather domain by introducing the Satellite Data Downloading System (SDDS). The approach proposed in this paper was evaluated through real-world scenarios and addresses the challenges related to the specific field. Although SDDS is used for satellite data processing, it can potentially be adapted to a wide range of data processing scenarios in other fields of physics.
NASA Astrophysics Data System (ADS)
Zhongqin, G.; Chen, Y.
2017-12-01
Abstract Quickly identify the spatial distribution of landslides automatically is essential for the prevention, mitigation and assessment of the landslide hazard. It's still a challenging job owing to the complicated characteristics and vague boundary of the landslide areas on the image. The high resolution remote sensing image has multi-scales, complex spatial distribution and abundant features, the object-oriented image classification methods can make full use of the above information and thus effectively detect the landslides after the hazard happened. In this research we present a new semi-supervised workflow, taking advantages of recent object-oriented image analysis and machine learning algorithms to quick locate the different origins of landslides of some areas on the southwest part of China. Besides a sequence of image segmentation, feature selection, object classification and error test, this workflow ensemble the feature selection and classifier selection. The feature this study utilized were normalized difference vegetation index (NDVI) change, textural feature derived from the gray level co-occurrence matrices (GLCM), spectral feature and etc. The improvement of this study shows this algorithm significantly removes some redundant feature and the classifiers get fully used. All these improvements lead to a higher accuracy on the determination of the shape of landslides on the high resolution remote sensing image, in particular the flexibility aimed at different kinds of landslides.
Architectures Toward Reusable Science Data Systems
NASA Astrophysics Data System (ADS)
Moses, J. F.
2014-12-01
Science Data Systems (SDS) comprise an important class of data processing systems that support product generation from remote sensors and in-situ observations. These systems enable research into new science data products, replication of experiments and verification of results. NASA has been building ground systems for satellite data processing since the first Earth observing satellites launched and is continuing development of systems to support NASA science research, NOAA's weather satellites and USGS's Earth observing satellite operations. The basic data processing workflows and scenarios continue to be valid for remote sensor observations research as well as for the complex multi-instrument operational satellite data systems being built today. System functions such as ingest, product generation and distribution need to be configured and performed in a consistent and repeatable way with an emphasis on scalability. This paper will examine the key architectural elements of several NASA satellite data processing systems currently in operation and under development that make them suitable for scaling and reuse. Examples of architectural elements that have become attractive include virtual machine environments, standard data product formats, metadata content and file naming, workflow and job management frameworks, data acquisition, search, and distribution protocols. By highlighting key elements and implementation experience the goal is to recognize architectures that will outlast their original application and be readily adaptable for new applications. Concepts and principles are explored that lead to sound guidance for SDS developers and strategists.
Architectures Toward Reusable Science Data Systems
NASA Technical Reports Server (NTRS)
Moses, John
2015-01-01
Science Data Systems (SDS) comprise an important class of data processing systems that support product generation from remote sensors and in-situ observations. These systems enable research into new science data products, replication of experiments and verification of results. NASA has been building systems for satellite data processing since the first Earth observing satellites launched and is continuing development of systems to support NASA science research and NOAAs Earth observing satellite operations. The basic data processing workflows and scenarios continue to be valid for remote sensor observations research as well as for the complex multi-instrument operational satellite data systems being built today. System functions such as ingest, product generation and distribution need to be configured and performed in a consistent and repeatable way with an emphasis on scalability. This paper will examine the key architectural elements of several NASA satellite data processing systems currently in operation and under development that make them suitable for scaling and reuse. Examples of architectural elements that have become attractive include virtual machine environments, standard data product formats, metadata content and file naming, workflow and job management frameworks, data acquisition, search, and distribution protocols. By highlighting key elements and implementation experience we expect to find architectures that will outlast their original application and be readily adaptable for new applications. Concepts and principles are explored that lead to sound guidance for SDS developers and strategists.
Performance of an Automated Versus a Manual Whole-Body Magnetic Resonance Imaging Workflow.
Stocker, Daniel; Finkenstaedt, Tim; Kuehn, Bernd; Nanz, Daniel; Klarhoefer, Markus; Guggenberger, Roman; Andreisek, Gustav; Kiefer, Berthold; Reiner, Caecilia S
2018-04-24
The aim of this study was to evaluate the performance of an automated workflow for whole-body magnetic resonance imaging (WB-MRI), which reduces user interaction compared with the manual WB-MRI workflow. This prospective study was approved by the local ethics committee. Twenty patients underwent WB-MRI for myopathy evaluation on a 3 T MRI scanner. Ten patients (7 women; age, 52 ± 13 years; body weight, 69.9 ± 13.3 kg; height, 173 ± 9.3 cm; body mass index, 23.2 ± 3.0) were examined with a prototypical automated WB-MRI workflow, which automatically segments the whole body, and 10 patients (6 women; age, 35.9 ± 12.4 years; body weight, 72 ± 21 kg; height, 169.2 ± 10.4 cm; body mass index, 24.9 ± 5.6) with a manual scan. Overall image quality (IQ; 5-point scale: 5, excellent; 1, poor) and coverage of the study volume were assessed by 2 readers for each sequence (coronal T2-weighted turbo inversion recovery magnitude [TIRM] and axial contrast-enhanced T1-weighted [ce-T1w] gradient dual-echo sequence). Interreader agreement was evaluated with intraclass correlation coefficients. Examination time, number of user interactions, and MR technicians' acceptance rating (1, highest; 10, lowest) was compared between both groups. Total examination time was significantly shorter for automated WB-MRI workflow versus manual WB-MRI workflow (30.0 ± 4.2 vs 41.5 ± 3.4 minutes, P < 0.0001) with significantly shorter planning time (2.5 ± 0.8 vs 14.0 ± 7.0 minutes, P < 0.0001). Planning took 8% of the total examination time with automated versus 34% with manual WB-MRI workflow (P < 0.0001). The number of user interactions with automated WB-MRI workflow was significantly lower compared with manual WB-MRI workflow (10.2 ± 4.4 vs 48.2 ± 17.2, P < 0.0001). Planning efforts were rated significantly lower by the MR technicians for the automated WB-MRI workflow than for the manual WB-MRI workflow (2.20 ± 0.92 vs 4.80 ± 2.39, respectively; P = 0.005). Overall IQ was similar between automated and manual WB-MRI workflow (TIRM: 4.00 ± 0.94 vs 3.45 ± 1.19, P = 0.264; ce-T1w: 4.20 ± 0.88 vs 4.55 ± .55, P = 0.423). Interreader agreement for overall IQ was excellent for TIRM and ce-T1w with an intraclass correlation coefficient of 0.95 (95% confidence interval, 0.86-0.98) and 0.88 (95% confidence interval, 0.70-0.95). Incomplete coverage of the thoracic compartment in the ce-T1w sequence occurred more often in the automated WB-MRI workflow (P = 0.008) for reader 2. No other significant differences in the study volume coverage were found. In conclusion, the automated WB-MRI scanner workflow showed a significant reduction of the examination time and the user interaction compared with the manual WB-MRI workflow. Image quality and the coverage of the study volume were comparable in both groups.
Carter, L.F.; Anderholm, S.K.
1997-01-01
The occurrence and distribution of contaminants in aquatic systems are major components of the National Water-Quality Assessment (NAWQA) Program. Bed-sediment samples were collected at 18 sites in the Rio Grande Valley study unit between September 1992 and March 1993 to characterize the geographic distribution of organic compounds, including chlorinated insecticides, polychlorinated biphenyls (PCB's), and other chlorinated hydrocarbons, and also trace elements. Two-millimeter-size- fraction sediment was analyzed for organic compounds and less than 63-micron-size-fraction sediment was analyzed for trace elements. Concentrations of p,p'-DDE were detected in 33 percent of the bed-sediment samples. With the exception of DDT-related compounds, no other organochlorine insecticides or polychlorinated biphenyls were detected in samples of bed sediment. Whole-body fish samples were collected at 11 of the bed- sediment sites and analyzed for organic compounds. Organic compounds were reported more frequently in samples of fish, and more types of organic compounds were found in whole-body fish samples than in bed-sediment samples. Concentrations of p,p'-DDE were detected in 91 percent of whole-body fish samples. Polychlorinated biphenyls, cis-chlordane, trans-chlordane, trans- nonachlor, and hexachlorobenzene were other organic compounds detected in whole-body samples of fish from at least one site. Because of the extent of mineralized areas in the Rio Grande Basin arsenic, cadmium, copper, lead, mercury, selenium, and zinc concentrations in bed-sediment samples could represent natural conditions at most sites. However, a combination of natural conditions and human activities appears to be associated with elevated trace-element concentrations in the bed-sediment sample from the site Rio Grande near Creede, Colorado, because this sample exceeded the background trace-element concentrations calculated for this study. Fish-liver samples were collected at 12 of the bed-sediment sites and analyzed for trace elements. Certain trace elements were detected at higher concentrations in fish-liver samples than in bed-sediment samples from the same site. Both bed-sediment and fish-tissue samples are necessary for a complete environmental assessment of the occurrence and distribution of trace elements.
Trace element distribution in the rat cerebellum
NASA Astrophysics Data System (ADS)
Kwiatek, W. M.; Long, G. J.; Pounds, J. G.; Reuhl, K. R.; Hanson, A. L.; Jones, K. W.
1990-04-01
Spatial distributions and concentrations of trace elements (TE) in the brain are important because TE perform catalytic and structural functions in enzymes which regulate brain function and development. We have investigated the distributions of TE in rat cerebellum. Structures were sectioned and analyzed by the Synchrotron Radiation Induced X-ray Emission (SRIXE) method using the NSLS X-26 white-light microprobe facility. Advantages important for TE analysis of biological specimens with X-ray microscopy include short time of measurement, high brightness and flux, good spatial resolution, multielemental detection, good sensitivity, and nondestructive irradiation. Trace elements were measured in thin rat brain sections of 20 μm thickness. The analyses were performed on sample volumes as small as 0.2 nl with Minimum Detectable Limits (MDL) of 50 ppb wet weight for Fe, 100 ppb wet weight for Cu, and Zn, and 1 ppm wet weight for Pb. The distribution of TE in the molecular cell layer, granule cell layer and fiber tract of rat cerebella was investigated. Both point analyses and two-dimensional semiquantitative mapping of the TE distribution in a section were used. All analyzed elements were observed in each structure of the cerebellum except mercury which was not observed in granule cell layer or fiber tract. This approach permits an exacting correlation of the TE distribution in complex structure with the diet, toxic elements, and functional status of the animal.
NASA Astrophysics Data System (ADS)
Koelmel, Jeremy P.; Kroeger, Nicholas M.; Gill, Emily L.; Ulmer, Candice Z.; Bowden, John A.; Patterson, Rainey E.; Yost, Richard A.; Garrett, Timothy J.
2017-05-01
Untargeted omics analyses aim to comprehensively characterize biomolecules within a biological system. Changes in the presence or quantity of these biomolecules can indicate important biological perturbations, such as those caused by disease. With current technological advancements, the entire genome can now be sequenced; however, in the burgeoning fields of lipidomics, only a subset of lipids can be identified. The recent emergence of high resolution tandem mass spectrometry (HR-MS/MS), in combination with ultra-high performance liquid chromatography, has resulted in an increased coverage of the lipidome. Nevertheless, identifications from MS/MS are generally limited by the number of precursors that can be selected for fragmentation during chromatographic elution. Therefore, we developed the software IE-Omics to automate iterative exclusion (IE), where selected precursors using data-dependent topN analyses are excluded in sequential injections. In each sequential injection, unique precursors are fragmented until HR-MS/MS spectra of all ions above a user-defined intensity threshold are acquired. IE-Omics was applied to lipidomic analyses in Red Cross plasma and substantia nigra tissue. Coverage of the lipidome was drastically improved using IE. When applying IE-Omics to Red Cross plasma and substantia nigra lipid extracts in positive ion mode, 69% and 40% more molecular identifications were obtained, respectively. In addition, applying IE-Omics to a lipidomics workflow increased the coverage of trace species, including odd-chained and short-chained diacylglycerides and oxidized lipid species. By increasing the coverage of the lipidome, applying IE to a lipidomics workflow increases the probability of finding biomarkers and provides additional information for determining etiology of disease.
2012-01-01
We propose a tripartite biochemical mechanism for memory. Three physiologic components are involved, namely, the neuron (individual and circuit), the surrounding neural extracellular matrix, and the various trace metals distributed within the matrix. The binding of a metal cation affects a corresponding nanostructure (shrinking, twisting, expansion) and dielectric sensibility of the chelating node (address) within the matrix lattice, sensed by the neuron. The neural extracellular matrix serves as an electro-elastic lattice, wherein neurons manipulate multiple trace metals (n > 10) to encode, store, and decode coginive information. The proposed mechanism explains brains low energy requirements and high rates of storage capacity described in multiples of Avogadro number (NA = 6 × 1023). Supportive evidence correlates memory loss to trace metal toxicity or deficiency, or breakdown in the delivery/transport of metals to the matrix, or its degradation. Inherited diseases revolving around dysfunctional trace metal metabolism and memory dysfunction, include Alzheimer's disease (Al, Zn, Fe), Wilson’s disease (Cu), thalassemia (Fe), and autism (metallothionein). The tripartite mechanism points to the electro-elastic interactions of neurons with trace metals distributed within the neural extracellular matrix, as the molecular underpinning of “synaptic plasticity” affecting short-term memory, long-term memory, and forgetting. PMID:23050060
Bing, Haijian; Wu, Yanhong; Zhou, Jun; Li, Rui; Luo, Ji; Yu, Dong
2016-04-07
Trace metals adsorbed onto fine particles can be transported long distances and ultimately deposited in Polar Regions via the cold condensation effect. This study indicated the possible sources of silver (Ag), cadmium (Cd), copper (Cu), lead (Pb), antimony (Sb) and zinc (Zn) in soils on the eastern slope of Mt. Gongga, eastern Tibetan Plateau, and deciphered the effects of vegetation and mountain cold condensation on their distributions with elevation. The metal concentrations in the soils were comparable to other mountains worldwide except the remarkably high concentrations of Cd. Trace metals with high enrichment in the soils were influenced from anthropogenic contributions. Spatially, the concentrations of Cu and Zn in the surface horizons decreased from 2000 to 3700 m a.s.l., and then increased with elevation, whereas other metals were notably enriched in the mid-elevation area (approximately 3000 m a.s.l.). After normalization for soil organic carbon, high concentrations of Cd, Pb, Sb and Zn were observed above the timberline. Our results indicated the importance of vegetation in trace metal accumulation in an alpine ecosystem and highlighted the mountain cold trapping effect on trace metal deposition sourced from long-range atmospheric transport.
Srichandan, Suchismita; Panigrahy, R C; Baliarsingh, S K; Rao B, Srinivasa; Pati, Premalata; Sahu, Biraja K; Sahu, K C
2016-10-15
Concentrations of trace metals such as iron (Fe), copper (Cu), zinc (Zn), cobalt (Co), nickel (Ni), manganese (Mn), lead (Pb), cadmium (Cd), chromium (Cr), arsenic (As), vanadium (V), and selenium (Se) were determined in seawater and zooplankton from the surface waters off Rushikulya estuary, north-western Bay of Bengal. During the study period, the concentration of trace metals in seawater and zooplankton showed significant spatio-temporal variation. Cu and Co levels in seawater mostly remained non-detectable. Other elements were found at higher concentrations and exhibited marked variations. The rank order distribution of trace metals in terms of their average concentration in seawater was observed as Fe>Ni>Mn>Pb>As>Zn>Cr>V>Se>Cd while in zooplankton it was Fe>Mn>Cd>As>Pb>Ni>Cr>Zn>V>Se. The bioaccumulation factor (BAF) of Fe was highest followed by Zn and the lowest value was observed with Ni. Results of correlation analysis discerned positive affinity and good relationship among the majority of the trace metals, both in seawater and zooplankton suggesting their strong affinity and coexistence. Copyright © 2016 Elsevier Ltd. All rights reserved.
Integrated workflows for spiking neuronal network simulations
Antolík, Ján; Davison, Andrew P.
2013-01-01
The increasing availability of computational resources is enabling more detailed, realistic modeling in computational neuroscience, resulting in a shift toward more heterogeneous models of neuronal circuits, and employment of complex experimental protocols. This poses a challenge for existing tool chains, as the set of tools involved in a typical modeler's workflow is expanding concomitantly, with growing complexity in the metadata flowing between them. For many parts of the workflow, a range of tools is available; however, numerous areas lack dedicated tools, while integration of existing tools is limited. This forces modelers to either handle the workflow manually, leading to errors, or to write substantial amounts of code to automate parts of the workflow, in both cases reducing their productivity. To address these issues, we have developed Mozaik: a workflow system for spiking neuronal network simulations written in Python. Mozaik integrates model, experiment and stimulation specification, simulation execution, data storage, data analysis and visualization into a single automated workflow, ensuring that all relevant metadata are available to all workflow components. It is based on several existing tools, including PyNN, Neo, and Matplotlib. It offers a declarative way to specify models and recording configurations using hierarchically organized configuration files. Mozaik automatically records all data together with all relevant metadata about the experimental context, allowing automation of the analysis and visualization stages. Mozaik has a modular architecture, and the existing modules are designed to be extensible with minimal programming effort. Mozaik increases the productivity of running virtual experiments on highly structured neuronal networks by automating the entire experimental cycle, while increasing the reliability of modeling studies by relieving the user from manual handling of the flow of metadata between the individual workflow stages. PMID:24368902
Guitton, Yann; Tremblay-Franco, Marie; Le Corguillé, Gildas; Martin, Jean-François; Pétéra, Mélanie; Roger-Mele, Pierrick; Delabrière, Alexis; Goulitquer, Sophie; Monsoor, Misharl; Duperier, Christophe; Canlet, Cécile; Servien, Rémi; Tardivel, Patrick; Caron, Christophe; Giacomoni, Franck; Thévenot, Etienne A
2017-12-01
Metabolomics is a key approach in modern functional genomics and systems biology. Due to the complexity of metabolomics data, the variety of experimental designs, and the multiplicity of bioinformatics tools, providing experimenters with a simple and efficient resource to conduct comprehensive and rigorous analysis of their data is of utmost importance. In 2014, we launched the Workflow4Metabolomics (W4M; http://workflow4metabolomics.org) online infrastructure for metabolomics built on the Galaxy environment, which offers user-friendly features to build and run data analysis workflows including preprocessing, statistical analysis, and annotation steps. Here we present the new W4M 3.0 release, which contains twice as many tools as the first version, and provides two features which are, to our knowledge, unique among online resources. First, data from the four major metabolomics technologies (i.e., LC-MS, FIA-MS, GC-MS, and NMR) can be analyzed on a single platform. By using three studies in human physiology, alga evolution, and animal toxicology, we demonstrate how the 40 available tools can be easily combined to address biological issues. Second, the full analysis (including the workflow, the parameter values, the input data and output results) can be referenced with a permanent digital object identifier (DOI). Publication of data analyses is of major importance for robust and reproducible science. Furthermore, the publicly shared workflows are of high-value for e-learning and training. The Workflow4Metabolomics 3.0 e-infrastructure thus not only offers a unique online environment for analysis of data from the main metabolomics technologies, but it is also the first reference repository for metabolomics workflows. Copyright © 2017 Elsevier Ltd. All rights reserved.
Integrated workflows for spiking neuronal network simulations.
Antolík, Ján; Davison, Andrew P
2013-01-01
The increasing availability of computational resources is enabling more detailed, realistic modeling in computational neuroscience, resulting in a shift toward more heterogeneous models of neuronal circuits, and employment of complex experimental protocols. This poses a challenge for existing tool chains, as the set of tools involved in a typical modeler's workflow is expanding concomitantly, with growing complexity in the metadata flowing between them. For many parts of the workflow, a range of tools is available; however, numerous areas lack dedicated tools, while integration of existing tools is limited. This forces modelers to either handle the workflow manually, leading to errors, or to write substantial amounts of code to automate parts of the workflow, in both cases reducing their productivity. To address these issues, we have developed Mozaik: a workflow system for spiking neuronal network simulations written in Python. Mozaik integrates model, experiment and stimulation specification, simulation execution, data storage, data analysis and visualization into a single automated workflow, ensuring that all relevant metadata are available to all workflow components. It is based on several existing tools, including PyNN, Neo, and Matplotlib. It offers a declarative way to specify models and recording configurations using hierarchically organized configuration files. Mozaik automatically records all data together with all relevant metadata about the experimental context, allowing automation of the analysis and visualization stages. Mozaik has a modular architecture, and the existing modules are designed to be extensible with minimal programming effort. Mozaik increases the productivity of running virtual experiments on highly structured neuronal networks by automating the entire experimental cycle, while increasing the reliability of modeling studies by relieving the user from manual handling of the flow of metadata between the individual workflow stages.
Efficient calculation of luminance variation of a luminaire that uses LED light sources
NASA Astrophysics Data System (ADS)
Goldstein, Peter
2007-09-01
Many luminaires have an array of LEDs that illuminate a lenslet-array diffuser in order to create the appearance of a single, extended source with a smooth luminance distribution. Designing such a system is challenging because luminance calculations for a lenslet array generally involve tracing millions of rays per LED, which is computationally intensive and time-consuming. This paper presents a technique for calculating an on-axis luminance distribution by tracing only one ray per LED per lenslet. A multiple-LED system is simulated with this method, and with Monte Carlo ray-tracing software for comparison. Accuracy improves, and computation time decreases by at least five orders of magnitude with this technique, which has applications in LED-based signage, displays, and general illumination.
Text mining meets workflow: linking U-Compare with Taverna
Kano, Yoshinobu; Dobson, Paul; Nakanishi, Mio; Tsujii, Jun'ichi; Ananiadou, Sophia
2010-01-01
Summary: Text mining from the biomedical literature is of increasing importance, yet it is not easy for the bioinformatics community to create and run text mining workflows due to the lack of accessibility and interoperability of the text mining resources. The U-Compare system provides a wide range of bio text mining resources in a highly interoperable workflow environment where workflows can very easily be created, executed, evaluated and visualized without coding. We have linked U-Compare to Taverna, a generic workflow system, to expose text mining functionality to the bioinformatics community. Availability: http://u-compare.org/taverna.html, http://u-compare.org Contact: kano@is.s.u-tokyo.ac.jp Supplementary information: Supplementary data are available at Bioinformatics online. PMID:20709690
Trace-fossil and storm-deposit relationships of San Carlos formation, west Texas
DOE Office of Scientific and Technical Information (OSTI.GOV)
Metz, C.L.; Bednarski, S.P.
1986-05-01
Two distinct assemblages of trace fossils are preserved in the storm deposits in delta-front facies of the Upper Cretaceous San Carlos Formation, west Texas. The assemblages represent two widely differing responses to storm deposition and sediment-trace-fossil relationships, indicating that other environmental parameters, probably water depth and oxygen levels, influenced trace-fossil distribution within the San Carlos delta front. Evidence of the storm-deposited nature of the sandstones includes a scoured basal contact, planar to hummocky cross-stratification, and a upper contact that is either ripple marked or is gradational with overlying shales.
The distributed production system of the SuperB project: description and results
NASA Astrophysics Data System (ADS)
Brown, D.; Corvo, M.; Di Simone, A.; Fella, A.; Luppi, E.; Paoloni, E.; Stroili, R.; Tomassetti, L.
2011-12-01
The SuperB experiment needs large samples of MonteCarlo simulated events in order to finalize the detector design and to estimate the data analysis performances. The requirements are beyond the capabilities of a single computing farm, so a distributed production model capable of exploiting the existing HEP worldwide distributed computing infrastructure is needed. In this paper we describe the set of tools that have been developed to manage the production of the required simulated events. The production of events follows three main phases: distribution of input data files to the remote site Storage Elements (SE); job submission, via SuperB GANGA interface, to all available remote sites; output files transfer to CNAF repository. The job workflow includes procedures for consistency checking, monitoring, data handling and bookkeeping. A replication mechanism allows storing the job output on the local site SE. Results from 2010 official productions are reported.
wft4galaxy: a workflow testing tool for galaxy.
Piras, Marco Enrico; Pireddu, Luca; Zanetti, Gianluigi
2017-12-01
Workflow managers for scientific analysis provide a high-level programming platform facilitating standardization, automation, collaboration and access to sophisticated computing resources. The Galaxy workflow manager provides a prime example of this type of platform. As compositions of simpler tools, workflows effectively comprise specialized computer programs implementing often very complex analysis procedures. To date, no simple way to automatically test Galaxy workflows and ensure their correctness has appeared in the literature. With wft4galaxy we offer a tool to bring automated testing to Galaxy workflows, making it feasible to bring continuous integration to their development and ensuring that defects are detected promptly. wft4galaxy can be easily installed as a regular Python program or launched directly as a Docker container-the latter reducing installation effort to a minimum. Available at https://github.com/phnmnl/wft4galaxy under the Academic Free License v3.0. marcoenrico.piras@crs4.it. © The Author 2017. Published by Oxford University Press.
Workflow4Metabolomics: a collaborative research infrastructure for computational metabolomics
Giacomoni, Franck; Le Corguillé, Gildas; Monsoor, Misharl; Landi, Marion; Pericard, Pierre; Pétéra, Mélanie; Duperier, Christophe; Tremblay-Franco, Marie; Martin, Jean-François; Jacob, Daniel; Goulitquer, Sophie; Thévenot, Etienne A.; Caron, Christophe
2015-01-01
Summary: The complex, rapidly evolving field of computational metabolomics calls for collaborative infrastructures where the large volume of new algorithms for data pre-processing, statistical analysis and annotation can be readily integrated whatever the language, evaluated on reference datasets and chained to build ad hoc workflows for users. We have developed Workflow4Metabolomics (W4M), the first fully open-source and collaborative online platform for computational metabolomics. W4M is a virtual research environment built upon the Galaxy web-based platform technology. It enables ergonomic integration, exchange and running of individual modules and workflows. Alternatively, the whole W4M framework and computational tools can be downloaded as a virtual machine for local installation. Availability and implementation: http://workflow4metabolomics.org homepage enables users to open a private account and access the infrastructure. W4M is developed and maintained by the French Bioinformatics Institute (IFB) and the French Metabolomics and Fluxomics Infrastructure (MetaboHUB). Contact: contact@workflow4metabolomics.org PMID:25527831
Workflow4Metabolomics: a collaborative research infrastructure for computational metabolomics.
Giacomoni, Franck; Le Corguillé, Gildas; Monsoor, Misharl; Landi, Marion; Pericard, Pierre; Pétéra, Mélanie; Duperier, Christophe; Tremblay-Franco, Marie; Martin, Jean-François; Jacob, Daniel; Goulitquer, Sophie; Thévenot, Etienne A; Caron, Christophe
2015-05-01
The complex, rapidly evolving field of computational metabolomics calls for collaborative infrastructures where the large volume of new algorithms for data pre-processing, statistical analysis and annotation can be readily integrated whatever the language, evaluated on reference datasets and chained to build ad hoc workflows for users. We have developed Workflow4Metabolomics (W4M), the first fully open-source and collaborative online platform for computational metabolomics. W4M is a virtual research environment built upon the Galaxy web-based platform technology. It enables ergonomic integration, exchange and running of individual modules and workflows. Alternatively, the whole W4M framework and computational tools can be downloaded as a virtual machine for local installation. http://workflow4metabolomics.org homepage enables users to open a private account and access the infrastructure. W4M is developed and maintained by the French Bioinformatics Institute (IFB) and the French Metabolomics and Fluxomics Infrastructure (MetaboHUB). contact@workflow4metabolomics.org. © The Author 2014. Published by Oxford University Press.
A practical workflow for making anatomical atlases for biological research.
Wan, Yong; Lewis, A Kelsey; Colasanto, Mary; van Langeveld, Mark; Kardon, Gabrielle; Hansen, Charles
2012-01-01
The anatomical atlas has been at the intersection of science and art for centuries. These atlases are essential to biological research, but high-quality atlases are often scarce. Recent advances in imaging technology have made high-quality 3D atlases possible. However, until now there has been a lack of practical workflows using standard tools to generate atlases from images of biological samples. With certain adaptations, CG artists' workflow and tools, traditionally used in the film industry, are practical for building high-quality biological atlases. Researchers have developed a workflow for generating a 3D anatomical atlas using accessible artists' tools. They used this workflow to build a mouse limb atlas for studying the musculoskeletal system's development. This research aims to raise the awareness of using artists' tools in scientific research and promote interdisciplinary collaborations between artists and scientists. This video (http://youtu.be/g61C-nia9ms) demonstrates a workflow for creating an anatomical atlas.
PANORAMA: An approach to performance modeling and diagnosis of extreme-scale workflows
DOE Office of Scientific and Technical Information (OSTI.GOV)
Deelman, Ewa; Carothers, Christopher; Mandal, Anirban
Here we report that computational science is well established as the third pillar of scientific discovery and is on par with experimentation and theory. However, as we move closer toward the ability to execute exascale calculations and process the ensuing extreme-scale amounts of data produced by both experiments and computations alike, the complexity of managing the compute and data analysis tasks has grown beyond the capabilities of domain scientists. Therefore, workflow management systems are absolutely necessary to ensure current and future scientific discoveries. A key research question for these workflow management systems concerns the performance optimization of complex calculation andmore » data analysis tasks. The central contribution of this article is a description of the PANORAMA approach for modeling and diagnosing the run-time performance of complex scientific workflows. This approach integrates extreme-scale systems testbed experimentation, structured analytical modeling, and parallel systems simulation into a comprehensive workflow framework called Pegasus for understanding and improving the overall performance of complex scientific workflows.« less
de Bruin, Jeroen S; Adlassnig, Klaus-Peter; Leitich, Harald; Rappelsberger, Andrea
2018-01-01
Evidence-based clinical guidelines have a major positive effect on the physician's decision-making process. Computer-executable clinical guidelines allow for automated guideline marshalling during a clinical diagnostic process, thus improving the decision-making process. Implementation of a digital clinical guideline for the prevention of mother-to-child transmission of hepatitis B as a computerized workflow, thereby separating business logic from medical knowledge and decision-making. We used the Business Process Model and Notation language system Activiti for business logic and workflow modeling. Medical decision-making was performed by an Arden-Syntax-based medical rule engine, which is part of the ARDENSUITE software. We succeeded in creating an electronic clinical workflow for the prevention of mother-to-child transmission of hepatitis B, where institution-specific medical decision-making processes could be adapted without modifying the workflow business logic. Separation of business logic and medical decision-making results in more easily reusable electronic clinical workflows.
CamBAfx: Workflow Design, Implementation and Application for Neuroimaging
Ooi, Cinly; Bullmore, Edward T.; Wink, Alle-Meije; Sendur, Levent; Barnes, Anna; Achard, Sophie; Aspden, John; Abbott, Sanja; Yue, Shigang; Kitzbichler, Manfred; Meunier, David; Maxim, Voichita; Salvador, Raymond; Henty, Julian; Tait, Roger; Subramaniam, Naresh; Suckling, John
2009-01-01
CamBAfx is a workflow application designed for both researchers who use workflows to process data (consumers) and those who design them (designers). It provides a front-end (user interface) optimized for data processing designed in a way familiar to consumers. The back-end uses a pipeline model to represent workflows since this is a common and useful metaphor used by designers and is easy to manipulate compared to other representations like programming scripts. As an Eclipse Rich Client Platform application, CamBAfx's pipelines and functions can be bundled with the software or downloaded post-installation. The user interface contains all the workflow facilities expected by consumers. Using the Eclipse Extension Mechanism designers are encouraged to customize CamBAfx for their own pipelines. CamBAfx wraps a workflow facility around neuroinformatics software without modification. CamBAfx's design, licensing and Eclipse Branding Mechanism allow it to be used as the user interface for other software, facilitating exchange of innovative computational tools between originating labs. PMID:19826470
PANORAMA: An approach to performance modeling and diagnosis of extreme-scale workflows
Deelman, Ewa; Carothers, Christopher; Mandal, Anirban; ...
2015-07-14
Here we report that computational science is well established as the third pillar of scientific discovery and is on par with experimentation and theory. However, as we move closer toward the ability to execute exascale calculations and process the ensuing extreme-scale amounts of data produced by both experiments and computations alike, the complexity of managing the compute and data analysis tasks has grown beyond the capabilities of domain scientists. Therefore, workflow management systems are absolutely necessary to ensure current and future scientific discoveries. A key research question for these workflow management systems concerns the performance optimization of complex calculation andmore » data analysis tasks. The central contribution of this article is a description of the PANORAMA approach for modeling and diagnosing the run-time performance of complex scientific workflows. This approach integrates extreme-scale systems testbed experimentation, structured analytical modeling, and parallel systems simulation into a comprehensive workflow framework called Pegasus for understanding and improving the overall performance of complex scientific workflows.« less
Talkoot Portals: Discover, Tag, Share, and Reuse Collaborative Science Workflows
NASA Astrophysics Data System (ADS)
Wilson, B. D.; Ramachandran, R.; Lynnes, C.
2009-05-01
A small but growing number of scientists are beginning to harness Web 2.0 technologies, such as wikis, blogs, and social tagging, as a transformative way of doing science. These technologies provide researchers easy mechanisms to critique, suggest and share ideas, data and algorithms. At the same time, large suites of algorithms for science analysis are being made available as remotely-invokable Web Services, which can be chained together to create analysis workflows. This provides the research community an unprecedented opportunity to collaborate by sharing their workflows with one another, reproducing and analyzing research results, and leveraging colleagues' expertise to expedite the process of scientific discovery. However, wikis and similar technologies are limited to text, static images and hyperlinks, providing little support for collaborative data analysis. A team of information technology and Earth science researchers from multiple institutions have come together to improve community collaboration in science analysis by developing a customizable "software appliance" to build collaborative portals for Earth Science services and analysis workflows. The critical requirement is that researchers (not just information technologists) be able to build collaborative sites around service workflows within a few hours. We envision online communities coming together, much like Finnish "talkoot" (a barn raising), to build a shared research space. Talkoot extends a freely available, open source content management framework with a series of modules specific to Earth Science for registering, creating, managing, discovering, tagging and sharing Earth Science web services and workflows for science data processing, analysis and visualization. Users will be able to author a "science story" in shareable web notebooks, including plots or animations, backed up by an executable workflow that directly reproduces the science analysis. New services and workflows of interest will be discoverable using tag search, and advertised using "service casts" and "interest casts" (Atom feeds). Multiple science workflow systems will be plugged into the system, with initial support for UAH's Mining Workflow Composer and the open-source Active BPEL engine, and JPL's SciFlo engine and the VizFlow visual programming interface. With the ability to share and execute analysis workflows, Talkoot portals can be used to do collaborative science in addition to communicate ideas and results. It will be useful for different science domains, mission teams, research projects and organizations. Thus, it will help to solve the "sociological" problem of bringing together disparate groups of researchers, and the technical problem of advertising, discovering, developing, documenting, and maintaining inter-agency science workflows. The presentation will discuss the goals of and barriers to Science 2.0, the social web technologies employed in the Talkoot software appliance (e.g. CMS, social tagging, personal presence, advertising by feeds, etc.), illustrate the resulting collaborative capabilities, and show early prototypes of the web interfaces (e.g. embedded workflows).
Improving adherence to the Epic Beacon ambulatory workflow.
Chackunkal, Ellen; Dhanapal Vogel, Vishnuprabha; Grycki, Meredith; Kostoff, Diana
2017-06-01
Computerized physician order entry has been shown to significantly improve chemotherapy safety by reducing the number of prescribing errors. Epic's Beacon Oncology Information System of computerized physician order entry and electronic medication administration was implemented in Henry Ford Health System's ambulatory oncology infusion centers on 9 November 2013. Since that time, compliance to the infusion workflow had not been assessed. The objective of this study was to optimize the current workflow and improve the compliance to this workflow in the ambulatory oncology setting. This study was a retrospective, quasi-experimental study which analyzed the composite workflow compliance rate of patient encounters from 9 to 23 November 2014. Based on this analysis, an intervention was identified and implemented in February 2015 to improve workflow compliance. The primary endpoint was to compare the composite compliance rate to the Beacon workflow before and after a pharmacy-initiated intervention. The intervention, which was education of infusion center staff, was initiated by ambulatory-based, oncology pharmacists and implemented by a multi-disciplinary team of pharmacists and nurses. The composite compliance rate was then reassessed for patient encounters from 2 to 13 March 2015 in order to analyze the effects of the determined intervention on compliance. The initial analysis in November 2014 revealed a composite compliance rate of 38%, and data analysis after the intervention revealed a statistically significant increase in the composite compliance rate to 83% ( p < 0.001). This study supports a pharmacist-initiated educational intervention can improve compliance to an ambulatory, oncology infusion workflow.
Exploring Dental Providers’ Workflow in an Electronic Dental Record Environment
Schwei, Kelsey M; Cooper, Ryan; Mahnke, Andrea N.; Ye, Zhan
2016-01-01
Summary Background A workflow is defined as a predefined set of work steps and partial ordering of these steps in any environment to achieve the expected outcome. Few studies have investigated the workflow of providers in a dental office. It is important to understand the interaction of dental providers with the existing technologies at point of care to assess breakdown in the workflow which could contribute to better technology designs. Objective The study objective was to assess electronic dental record (EDR) workflows using time and motion methodology in order to identify breakdowns and opportunities for process improvement. Methods A time and motion methodology was used to study the human-computer interaction and workflow of dental providers with an EDR in four dental centers at a large healthcare organization. A data collection tool was developed to capture the workflow of dental providers and staff while they interacted with an EDR during initial, planned, and emergency patient visits, and at the front desk. Qualitative and quantitative analysis was conducted on the observational data. Results Breakdowns in workflow were identified while posting charges, viewing radiographs, e-prescribing, and interacting with patient scheduler. EDR interaction time was significantly different between dentists and dental assistants (6:20 min vs. 10:57 min, p = 0.013) and between dentists and dental hygienists (6:20 min vs. 9:36 min, p = 0.003). Conclusions On average, a dentist spent far less time than dental assistants and dental hygienists in data recording within the EDR. PMID:27437058
Workflow continuity--moving beyond business continuity in a multisite 24-7 healthcare organization.
Kolowitz, Brian J; Lauro, Gonzalo Romero; Barkey, Charles; Black, Harry; Light, Karen; Deible, Christopher
2012-12-01
As hospitals move towards providing in-house 24 × 7 services, there is an increasing need for information systems to be available around the clock. This study investigates one organization's need for a workflow continuity solution that provides around the clock availability for information systems that do not provide highly available services. The organization investigated is a large multifacility healthcare organization that consists of 20 hospitals and more than 30 imaging centers. A case analysis approach was used to investigate the organization's efforts. The results show an overall reduction in downtimes where radiologists could not continue their normal workflow on the integrated Picture Archiving and Communications System (PACS) solution by 94 % from 2008 to 2011. The impact of unplanned downtimes was reduced by 72 % while the impact of planned downtimes was reduced by 99.66 % over the same period. Additionally more than 98 h of radiologist impact due to a PACS upgrade in 2008 was entirely eliminated in 2011 utilizing the system created by the workflow continuity approach. Workflow continuity differs from high availability and business continuity in its design process and available services. Workflow continuity only ensures that critical workflows are available when the production system is unavailable due to scheduled or unscheduled downtimes. Workflow continuity works in conjunction with business continuity and highly available system designs. The results of this investigation revealed that this approach can add significant value to organizations because impact on users is minimized if not eliminated entirely.
NASA Astrophysics Data System (ADS)
Okyay, U.; Glennie, C. L.; Khan, S.
2017-12-01
Owing to the advent of terrestrial laser scanners (TLS), high-density point cloud data has become increasingly available to the geoscience research community. Research groups have started producing their own point clouds for various applications, gradually shifting their emphasis from obtaining the data towards extracting more and meaningful information from the point clouds. Extracting fracture properties from three-dimensional data in a (semi-)automated manner has been an active area of research in geosciences. Several studies have developed various processing algorithms for extracting only planar surfaces. In comparison, (semi-)automated identification of fracture traces at the outcrop scale, which could be used for mapping fracture distribution have not been investigated frequently. Understanding the spatial distribution and configuration of natural fractures is of particular importance, as they directly influence fluid-flow through the host rock. Surface roughness, typically defined as the deviation of a natural surface from a reference datum, has become an important metric in geoscience research, especially with the increasing density and accuracy of point clouds. In the study presented herein, a surface roughness model was employed to identify fracture traces and their distribution on an ophiolite outcrop in Oman. Surface roughness calculations were performed using orthogonal distance regression over various grid intervals. The results demonstrated that surface roughness could identify outcrop-scale fracture traces from which fracture distribution and density maps can be generated. However, considering outcrop conditions and properties and the purpose of the application, the definition of an adequate grid interval for surface roughness model and selection of threshold values for distribution maps are not straightforward and require user intervention and interpretation.
NASA Astrophysics Data System (ADS)
Iwakoshi, Takehisa; Hirota, Osamu
2014-10-01
This study will test an interpretation in quantum key distribution (QKD) that trace distance between the distributed quantum state and the ideal mixed state is a maximum failure probability of the protocol. Around 2004, this interpretation was proposed and standardized to satisfy both of the key uniformity in the context of universal composability and operational meaning of the failure probability of the key extraction. However, this proposal has not been verified concretely yet for many years while H. P. Yuen and O. Hirota have thrown doubt on this interpretation since 2009. To ascertain this interpretation, a physical random number generator was employed to evaluate key uniformity in QKD. In this way, we calculated statistical distance which correspond to trace distance in quantum theory after a quantum measurement is done, then we compared it with the failure probability whether universal composability was obtained. As a result, the degree of statistical distance of the probability distribution of the physical random numbers and the ideal uniformity was very large. It is also explained why trace distance is not suitable to guarantee the security in QKD from the view point of quantum binary decision theory.
78 FR 22880 - Agency Information Collection Activities; Proposed Collection; Comment Request
Federal Register 2010, 2011, 2012, 2013, 2014
2013-04-17
... between Health IT and Ambulatory Care Workflow Redesign.'' In accordance with the Paperwork Reduction Act... Understand the Relationship between Health IT and Ambulatory Care Workflow Redesign. The Agency for... Methods to Better Understand the Relationship between Health IT and Ambulatory Care Workflow Redesign...
NASA Technical Reports Server (NTRS)
Keitz, E. L.
1978-01-01
Contained in this volume is material of a supportive nature not considered to be of sufficient importance to be included in the other two previous volumes. This material is of two types:(1) information and numerical evaluations used in the development of mission evaluations for stratospheric trace constituent measurement;and (2) various spatial and temporal distributions for those stratospheric trace species having sufficient measurements available to warrant their presentation.
Particulate Trace Element Cycling in a Diatom Bloom at Station ALOHA
NASA Astrophysics Data System (ADS)
Weisend, R.; Morton, P. L.; Landing, W. M.; Fitzsimmons, J. N.; Hayes, C. T.; Boyle, E. A.
2014-12-01
Phytoplankton in oligotrophic marine deserts depend on remote sources to supply trace nutrients. To examine these sources, marine particulate matter samples from the central North Pacific (Station ALOHA) were collected during the July-August 2012 HOE-DYLAN cruises and analyzed for a suite of trace (e.g., Fe, Mn) and major (e.g. Al, P) elements. Daily surface SPM samples were examined for evidence of atmospheric deposition and biological uptake, while five vertical profiles were examined for evidence of surface vertical export and subsurface horizontal transport from nearby sources (e.g., margin sediments, hydrothermal plumes). Maxima in surface particulate P (a biological tracer) corresponded with a diatom bloom, and surprisingly also coincided with maxima in particulate Al (typically a tracer for lithogenic inputs). The surface particulate Al distributions likely result from the adsorption of dissolved Al onto diatom silica frustules, not from atmospheric dust deposition. In addition, a subsurface maximum in particulate Al and P was observed four days later at 75m, possibly resulting from vertical export of the surface diatom bloom. The distributions of other bioactive trace elements (e.g. Cd, Co, Cu) will be presented in the context of the diatom bloom and other biological, chemical and physical features. A second, complementary poster is also being presented which examines the cycling of trace elements in lithogenic particles (Morton et al., "Trace Element Cycling in Lithogenic Particles at Station ALOHA").
DOE Office of Scientific and Technical Information (OSTI.GOV)
Not Available
This report contains papers on the following topics: NREN Security Issues: Policies and Technologies; Layer Wars: Protect the Internet with Network Layer Security; Electronic Commission Management; Workflow 2000 - Electronic Document Authorization in Practice; Security Issues of a UNIX PEM Implementation; Implementing Privacy Enhanced Mail on VMS; Distributed Public Key Certificate Management; Protecting the Integrity of Privacy-enhanced Electronic Mail; Practical Authorization in Large Heterogeneous Distributed Systems; Security Issues in the Truffles File System; Issues surrounding the use of Cryptographic Algorithms and Smart Card Applications; Smart Card Augmentation of Kerberos; and An Overview of the Advanced Smart Card Access Control System.more » Selected papers were processed separately for inclusion in the Energy Science and Technology Database.« less
Imaging trace element distributions in single organelles and subcellular features
NASA Astrophysics Data System (ADS)
Kashiv, Yoav; Austin, Jotham R.; Lai, Barry; Rose, Volker; Vogt, Stefan; El-Muayed, Malek
2016-02-01
The distributions of chemical elements within cells are of prime importance in a wide range of basic and applied biochemical research. An example is the role of the subcellular Zn distribution in Zn homeostasis in insulin producing pancreatic beta cells and the development of type 2 diabetes mellitus. We combined transmission electron microscopy with micro- and nano-synchrotron X-ray fluorescence to image unequivocally for the first time, to the best of our knowledge, the natural elemental distributions, including those of trace elements, in single organelles and other subcellular features. Detected elements include Cl, K, Ca, Co, Ni, Cu, Zn and Cd (which some cells were supplemented with). Cell samples were prepared by a technique that minimally affects the natural elemental concentrations and distributions, and without using fluorescent indicators. It could likely be applied to all cell types and provide new biochemical insights at the single organelle level not available from organelle population level studies.
Partial transpose of random quantum states: Exact formulas and meanders
NASA Astrophysics Data System (ADS)
Fukuda, Motohisa; Śniady, Piotr
2013-04-01
We investigate the asymptotic behavior of the empirical eigenvalues distribution of the partial transpose of a random quantum state. The limiting distribution was previously investigated via Wishart random matrices indirectly (by approximating the matrix of trace 1 by the Wishart matrix of random trace) and shown to be the semicircular distribution or the free difference of two free Poisson distributions, depending on how dimensions of the concerned spaces grow. Our use of Wishart matrices gives exact combinatorial formulas for the moments of the partial transpose of the random state. We find three natural asymptotic regimes in terms of geodesics on the permutation groups. Two of them correspond to the above two cases; the third one turns out to be a new matrix model for the meander polynomials. Moreover, we prove the convergence to the semicircular distribution together with its extreme eigenvalues under weaker assumptions, and show large deviation bound for the latter.
Spectral Bayesian Knowledge Tracing
ERIC Educational Resources Information Center
Falakmasir, Mohammad; Yudelson, Michael; Ritter, Steve; Koedinger, Ken
2015-01-01
Bayesian Knowledge Tracing (BKT) has been in wide use for modeling student skill acquisition in Intelligent Tutoring Systems (ITS). BKT tracks and updates student's latent mastery of a skill as a probability distribution of a binary variable. BKT does so by accounting for observed student successes in applying the skill correctly, where success is…
Military Interoperable Digital Hospital Testbed (MIDHT)
2015-12-01
and Analyze the resulting technological impact on medication errors, pharmacists ’ productivity, nurse satisfactions/workflow and patient...medication errors, pharmacists productivity, nurse satisfactions/workflow and patient satisfaction. 1.1.1 Pharmacy Robotics Implementation...1.2 Research and analyze the resulting technological impact on medication errors, pharmacist productivity, nurse satisfaction/workflow and patient
Context-aware workflow management of mobile health applications.
Salden, Alfons; Poortinga, Remco
2006-01-01
We propose a medical application management architecture that allows medical (IT) experts readily designing, developing and deploying context-aware mobile health (m-health) applications or services. In particular, we elaborate on how our application workflow management architecture enables chaining, coordinating, composing, and adapting context-sensitive medical application components such that critical Quality of Service (QoS) and Quality of Context (QoC) requirements typical for m-health applications or services can be met. This functional architectural support requires learning modules for distilling application-critical selection of attention and anticipation models. These models will help medical experts constructing and adjusting on-the-fly m-health application workflows and workflow strategies. We illustrate our context-aware workflow management paradigm for a m-health data delivery problem, in which optimal communication network configurations have to be determined.
Duro, Francisco Rodrigo; Blas, Javier Garcia; Isaila, Florin; ...
2016-10-06
The increasing volume of scientific data and the limited scalability and performance of storage systems are currently presenting a significant limitation for the productivity of the scientific workflows running on both high-performance computing (HPC) and cloud platforms. Clearly needed is better integration of storage systems and workflow engines to address this problem. This paper presents and evaluates a novel solution that leverages codesign principles for integrating Hercules—an in-memory data store—with a workflow management system. We consider four main aspects: workflow representation, task scheduling, task placement, and task termination. As a result, the experimental evaluation on both cloud and HPC systemsmore » demonstrates significant performance and scalability improvements over existing state-of-the-art approaches.« less
Workflow technology: the new frontier. How to overcome the barriers and join the future.
Shefter, Susan M
2006-01-01
Hospitals are catching up to the business world in the introduction of technology systems that support professional practice and workflow. The field of case management is highly complex and interrelates with diverse groups in diverse locations. The last few years have seen the introduction of Workflow Technology Tools, which can improve the quality and efficiency of discharge planning by the case manager. Despite the availability of these wonderful new programs, many case managers are hesitant to adopt the new technology and workflow. For a myriad of reasons, a computer-based workflow system can seem like a brick wall. This article discusses, from a practitioner's point of view, how professionals can gain confidence and skill to get around the brick wall and join the future.
An Integrated Framework for Parameter-based Optimization of Scientific Workflows.
Kumar, Vijay S; Sadayappan, P; Mehta, Gaurang; Vahi, Karan; Deelman, Ewa; Ratnakar, Varun; Kim, Jihie; Gil, Yolanda; Hall, Mary; Kurc, Tahsin; Saltz, Joel
2009-01-01
Data analysis processes in scientific applications can be expressed as coarse-grain workflows of complex data processing operations with data flow dependencies between them. Performance optimization of these workflows can be viewed as a search for a set of optimal values in a multi-dimensional parameter space. While some performance parameters such as grouping of workflow components and their mapping to machines do not a ect the accuracy of the output, others may dictate trading the output quality of individual components (and of the whole workflow) for performance. This paper describes an integrated framework which is capable of supporting performance optimizations along multiple dimensions of the parameter space. Using two real-world applications in the spatial data analysis domain, we present an experimental evaluation of the proposed framework.
Workflow and Electronic Health Records in Small Medical Practices
Ramaiah, Mala; Subrahmanian, Eswaran; Sriram, Ram D; Lide, Bettijoyce B
2012-01-01
This paper analyzes the workflow and implementation of electronic health record (EHR) systems across different functions in small physician offices. We characterize the differences in the offices based on the levels of computerization in terms of workflow, sources of time delay, and barriers to using EHR systems to support the entire workflow. The study was based on a combination of questionnaires, interviews, in situ observations, and data collection efforts. This study was not intended to be a full-scale time-and-motion study with precise measurements but was intended to provide an overview of the potential sources of delays while performing office tasks. The study follows an interpretive model of case studies rather than a large-sample statistical survey of practices. To identify time-consuming tasks, workflow maps were created based on the aggregated data from the offices. The results from the study show that specialty physicians are more favorable toward adopting EHR systems than primary care physicians are. The barriers to adoption of EHR systems by primary care physicians can be attributed to the complex workflows that exist in primary care physician offices, leading to nonstandardized workflow structures and practices. Also, primary care physicians would benefit more from EHR systems if the systems could interact with external entities. PMID:22737096
Steitz, Bryan D; Weinberg, Stuart T; Danciu, Ioana; Unertl, Kim M
2016-01-01
Healthcare team members in emergency department contexts have used electronic whiteboard solutions to help manage operational workflow for many years. Ambulatory clinic settings have highly complex operational workflow, but are still limited in electronic assistance to communicate and coordinate work activities. To describe and discuss the design, implementation, use, and ongoing evolution of a coordination and collaboration tool supporting ambulatory clinic operational workflow at Vanderbilt University Medical Center (VUMC). The outpatient whiteboard tool was initially designed to support healthcare work related to an electronic chemotherapy order-entry application. After a highly successful initial implementation in an oncology context, a high demand emerged across the organization for the outpatient whiteboard implementation. Over the past 10 years, developers have followed an iterative user-centered design process to evolve the tool. The electronic outpatient whiteboard system supports 194 separate whiteboards and is accessed by over 2800 distinct users on a typical day. Clinics can configure their whiteboards to support unique workflow elements. Since initial release, features such as immunization clinical decision support have been integrated into the system, based on requests from end users. The success of the electronic outpatient whiteboard demonstrates the usefulness of an operational workflow tool within the ambulatory clinic setting. Operational workflow tools can play a significant role in supporting coordination, collaboration, and teamwork in ambulatory healthcare settings.
NASA Astrophysics Data System (ADS)
Srinivasan, V.; Yiwen, X.; Ellis, A.; Christensen, A.; Borkiewic, K.; Cox, D.; Hart, J.; Long, S.; Marshall-Colon, A.
2016-12-01
The distribution of absorbed solar radiation in the photosynthetically active region wavelength (PAR) within plant canopies plays a critical role in determining photosynthetic carbon uptake and its associated transpiration. The vertical distribution of leaf area, leaf angles, leaf absorptivity and reflectivity within the canopy, affect the distribution of PAR absorbed throughout the canopy. While the upper canopy sunlit leaves absorb most of the incoming PAR and hence contribute most towards total canopy carbon uptake, the lower canopy shaded leaves which receive mostly lower intensity diffuse PAR make significant contributions towards plant carbon uptake. Most detailed vegetation models use a 1-D vertical multi-layer approach to model the sunlight and shaded canopy leaf fractions, and quantify the direct and diffuse radiation absorbed by the respective leaf fractions. However, this approach is only applicable under canopy closure conditions, and furthermore it fails to accurately capture the effects of diurnally varying leaf angle distributions in some plant canopies. Here, we show by using a 3-D ray tracing model which uses an explicit 3-D canopy structure that enforces no conditions about canopy closure, that the effects of diurnal variation of canopy leaf angle distributions better match with observed data. Our comparative analysis performed on soybean crop canopies between 3-D ray tracing model and the multi-layer model shows that the distribution of absorbed direct PAR is not exponential while, the distribution of absorbed diffuse PAR radiation within plant canopies is exponential. These results show the multi-layer model to significantly over-predict canopy PAR absorbed, and in turn significantly overestimate photosynthetic carbon uptake by up to 13% and canopy transpiration by 7% under mid-day sun conditions as verified through our canopy chamber experiments. Our results indicate that current detailed 1-D multi-layer canopy radiation attenuation models significantly over predict canopy radiation absorption and its associated canopy photosynthetic and transpiration fluxes, and use of a 3-D ray tracing model provides more realistic predictions of leaf canopy integrated fluxes of carbon and water.
He, Mei; Ke, Cai-Huan; Wang, Wen-Xiong
2010-03-24
In current human health risk assessment, the maximum acceptable concentrations of contaminants in food are mostly based on the total concentrations. However, the total concentration of contaminants may not always reflect the available amount. Bioaccessibility determination is thus required to improve the risk assessment of contaminants. This study used an in vitro digestion model to assess the bioaccessibility of several trace elements (As, Cd, Cu, Fe, Se, and Zn) in the muscles of two farmed marine fish species (seabass Lateolabrax japonicus and red seabream Pagrosomus major ) of different body sizes. The total concentrations and subcellular distributions of these trace elements in fish muscles were also determined. Bioaccessibility of these trace elements was generally high (>45%), and the lowest bioaccessibility was observed for Fe. Cooking processes, including boiling, steaming, frying, and grilling, generally decreased the bioaccessibility of these trace elements, especially for Cu and Zn. The influences of frying and grilling were greater than those of boiling and steaming. The relationship of bioaccessibility and total concentration varied with the elements. A positive correlation was found for As and Cu and a negative correlation for Fe, whereas no correlation was found for Cd, Se, and Zn. A significant positive relationship was demonstrated between the bioaccessibility and the elemental partitioning in the heat stable protein fraction and in the trophically available fraction, and a negative correlation was observed between the bioaccessibility and the elemental partitioning in metal-rich granule fraction. Subcellular distribution may thus affect the bioaccessibility of metals and should be considered in the risk assessment for seafood safety.
El-Mufleh, Amelène; Béchet, Béatrice; Basile-Doelsch, Isabelle; Geffroy-Rodier, Claude; Gaudin, Anne; Ruban, Véronique
2014-01-01
Sediment management from stormwater infiltration basins represents a real environmental and economic issue for stakeholders due to the pollution load and important tonnages of these by-products. To reduce the sediment volumes to treat, organic and metal micropollutant-bearing phases should be identified. A combination of density fractionation procedure and microanalysis techniques was used to evaluate the distribution of polycyclic aromatic hydrocarbons (PAHs) and trace metals (Cd, Cr, Cu, Ni, Pb, and Zn) within variable density fractions for three urban stormwater basin sediments. The results confirm that PAHs are found in the lightest fractions (d < 1.9, 1.9 < d < 2.3 g cm(-3)) whereas trace metals are equally distributed within the light, intermediary, and highest fractions (d < 1.9, 1.9 < d < 2.3, 2.3 < d < 2.6, and d > 2.8 g cm(-3)) and are mostly in the 2.3 < d < 2.6 g cm(-3) fraction. The characterization of the five fractions by global analyses and microanalysis techniques (XRD and MEB-EDX) allowed us to identify pollutant-bearing phases. PAHs are bound to the organic matter (OM) and trace metals to OM, clays, carbonates and dense particles. Moreover, the microanalysis study underlines that OM is the main constituent responsible for the aggregation, particularly for microaggregation. In terms of sediment management, it was shown that density fractionation is not suitable for trace metals but could be adapted to separate PAH-enriched phases.
Texas Solar Collaboration Action Plan
DOE Office of Scientific and Technical Information (OSTI.GOV)
Winland, Chris
2013-02-14
Texas Solar Collaboration Permitting and Interconenction Process Improvement Action Plan. San Antonio-specific; Investigate feasibility of using electronic signatures; Investigate feasibility of enabling other online permitting processes (e.g., commercial); Assess need for future document management and workflow/notification IT improvements; Update Information Bulletin 153 regarding City requirements and processes for PV; Educate contractors and public on CPS Energy’s new 2013 solar program processes; Continue to discuss “downtown grid” interconnection issues and identify potential solutions; Consider renaming Distributed Energy Resources (DER); and Continue to participate in collaborative actions.
NASA Astrophysics Data System (ADS)
Gell, Guenther
2000-05-01
1971/72 began the implementation of a computerized radiological documentation system as the Department of Radiology of the University of Graz, which developed over the years into a full RIS. 1985 started a scientific cooperation with SIEMENS to develop a PACS. The two systems were linked and evolved into a highly integrated RIS-PACS for the state wide hospital system in Styria. During its lifetime the RIS, originally implemented in FORTRAN on a UNIVAC 494 mainframe migrated to a PDP15, on to a PDP11, then VAX and Alphas. The flexible original record structure with variable length fields and the powerful retrieval language were retained. The data acquisition part with the user interface was rewritten several times and many service programs have been added. During our PACS cooperation many ideas like the folder concept or functionalities of the GUI have been designed and tested and were then implemented in the SIENET product. The actual RIS/PACS supports the whole workflow in the Radiology Department. It is installed in a 2.300 bed university hospital and the smaller hospitals of the State of Styria. Modalities from different vendors are connected via DICOM to the RIS (modality worklist) and to the PACS. PACSubsystems from other vendors have been integrated. Images are distributed to referring clinics and for teleconsultation and image processing and reports are available on line to all connected hospitals. We spent great efforts to guarantee optimal support of the workflow and to ensure an enhanced cost/benefit ratio for each user (class). Another special feature is selective image distribution. Using the high level retrieval language individual filters can be constructed easily to implement any image distribution policy agreed upon by radiologists and referring clinicians.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Xie, J; Wang, J; Peng, J
Purpose: To implement an entire workflow quality assurance (QA) process in the radiotherapy department and to reduce the error rates of radiotherapy based on the entire workflow management in the developing country. Methods: The entire workflow QA process management starts from patient registration to the end of last treatment including all steps through the entire radiotherapy process. Error rate of chartcheck is used to evaluate the the entire workflow QA process. Two to three qualified senior medical physicists checked the documents before the first treatment fraction of every patient. Random check of the treatment history during treatment was also performed.more » A total of around 6000 patients treatment data before and after implementing the entire workflow QA process were compared from May, 2014 to December, 2015. Results: A systemic checklist was established. It mainly includes patient’s registration, treatment plan QA, information exporting to OIS(Oncology Information System), documents of treatment QAand QA of the treatment history. The error rate derived from the chart check decreases from 1.7% to 0.9% after our the entire workflow QA process. All checked errors before the first treatment fraction were corrected as soon as oncologist re-confirmed them and reinforce staff training was accordingly followed to prevent those errors. Conclusion: The entire workflow QA process improved the safety, quality of radiotherapy in our department and we consider that our QA experience can be applicable for the heavily-loaded radiotherapy departments in developing country.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lynch, Vickie E.; Borreguero, Jose M.; Bhowmik, Debsindhu
Graphical abstract: - Highlights: • An automated workflow to optimize force-field parameters. • Used the workflow to optimize force-field parameter for a system containing nanodiamond and tRNA. • The mechanism relies on molecular dynamics simulation and neutron scattering experimental data. • The workflow can be generalized to any other experimental and simulation techniques. - Abstract: Large-scale simulations and data analysis are often required to explain neutron scattering experiments to establish a connection between the fundamental physics at the nanoscale and data probed by neutrons. However, to perform simulations at experimental conditions it is critical to use correct force-field (FF) parametersmore » which are unfortunately not available for most complex experimental systems. In this work, we have developed a workflow optimization technique to provide optimized FF parameters by comparing molecular dynamics (MD) to neutron scattering data. We describe the workflow in detail by using an example system consisting of tRNA and hydrophilic nanodiamonds in a deuterated water (D{sub 2}O) environment. Quasi-elastic neutron scattering (QENS) data show a faster motion of the tRNA in the presence of nanodiamond than without the ND. To compare the QENS and MD results quantitatively, a proper choice of FF parameters is necessary. We use an efficient workflow to optimize the FF parameters between the hydrophilic nanodiamond and water by comparing to the QENS data. Our results show that we can obtain accurate FF parameters by using this technique. The workflow can be generalized to other types of neutron data for FF optimization, such as vibrational spectroscopy and spin echo.« less
Decaf: Decoupled Dataflows for In Situ High-Performance Workflows
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dreher, M.; Peterka, T.
Decaf is a dataflow system for the parallel communication of coupled tasks in an HPC workflow. The dataflow can perform arbitrary data transformations ranging from simply forwarding data to complex data redistribution. Decaf does this by allowing the user to allocate resources and execute custom code in the dataflow. All communication through the dataflow is efficient parallel message passing over MPI. The runtime for calling tasks is entirely message-driven; Decaf executes a task when all messages for the task have been received. Such a messagedriven runtime allows cyclic task dependencies in the workflow graph, for example, to enact computational steeringmore » based on the result of downstream tasks. Decaf includes a simple Python API for describing the workflow graph. This allows Decaf to stand alone as a complete workflow system, but Decaf can also be used as the dataflow layer by one or more other workflow systems to form a heterogeneous task-based computing environment. In one experiment, we couple a molecular dynamics code with a visualization tool using the FlowVR and Damaris workflow systems and Decaf for the dataflow. In another experiment, we test the coupling of a cosmology code with Voronoi tessellation and density estimation codes using MPI for the simulation, the DIY programming model for the two analysis codes, and Decaf for the dataflow. Such workflows consisting of heterogeneous software infrastructures exist because components are developed separately with different programming models and runtimes, and this is the first time that such heterogeneous coupling of diverse components was demonstrated in situ on HPC systems.« less
Structured recording of intraoperative surgical workflows
NASA Astrophysics Data System (ADS)
Neumuth, T.; Durstewitz, N.; Fischer, M.; Strauss, G.; Dietz, A.; Meixensberger, J.; Jannin, P.; Cleary, K.; Lemke, H. U.; Burgert, O.
2006-03-01
Surgical Workflows are used for the methodical and scientific analysis of surgical interventions. The approach described here is a step towards developing surgical assist systems based on Surgical Workflows and integrated control systems for the operating room of the future. This paper describes concepts and technologies for the acquisition of Surgical Workflows by monitoring surgical interventions and their presentation. Establishing systems which support the Surgical Workflow in operating rooms requires a multi-staged development process beginning with the description of these workflows. A formalized description of surgical interventions is needed to create a Surgical Workflow. This description can be used to analyze and evaluate surgical interventions in detail. We discuss the subdivision of surgical interventions into work steps regarding different levels of granularity and propose a recording scheme for the acquisition of manual surgical work steps from running interventions. To support the recording process during the intervention, we introduce a new software architecture. Core of the architecture is our Surgical Workflow editor that is intended to deal with the manifold, complex and concurrent relations during an intervention. Furthermore, a method for an automatic generation of graphs is shown which is able to display the recorded surgical work steps of the interventions. Finally we conclude with considerations about extensions of our recording scheme to close the gap to S-PACS systems. The approach was used to record 83 surgical interventions from 6 intervention types from 3 different surgical disciplines: ENT surgery, neurosurgery and interventional radiology. The interventions were recorded at the University Hospital Leipzig, Germany and at the Georgetown University Hospital, Washington, D.C., USA.
Mulware, Stephen Juma
2015-01-01
The properties of many biological materials often depend on the spatial distribution and concentration of the trace elements present in a matrix. Scientists have over the years tried various techniques including classical physical and chemical analyzing techniques each with relative level of accuracy. However, with the development of spatially sensitive submicron beams, the nuclear microprobe techniques using focused proton beams for the elemental analysis of biological materials have yielded significant success. In this paper, the basic principles of the commonly used microprobe techniques of STIM, RBS, and PIXE for trace elemental analysis are discussed. The details for sample preparation, the detection, and data collection and analysis are discussed. Finally, an application of the techniques to analysis of corn roots for elemental distribution and concentration is presented.
Quantifying Bell nonlocality with the trace distance
NASA Astrophysics Data System (ADS)
Brito, S. G. A.; Amaral, B.; Chaves, R.
2018-02-01
Measurements performed on distant parts of an entangled quantum state can generate correlations incompatible with classical theories respecting the assumption of local causality. This is the phenomenon known as quantum nonlocality that, apart from its fundamental role, can also be put to practical use in applications such as cryptography and distributed computing. Clearly, developing ways of quantifying nonlocality is an important primitive in this scenario. Here, we propose to quantify the nonlocality of a given probability distribution via its trace distance to the set of classical correlations. We show that this measure is a monotone under the free operations of a resource theory and, furthermore, that it can be computed efficiently with a linear program. We put our framework to use in a variety of relevant Bell scenarios also comparing the trace distance to other standard measures in the literature.
Cukrov, Neven; Cmuk, Petra; Mlakar, Marina; Omanović, Dario
2008-08-01
The spatial distribution of dissolved and total trace metals (Zn, Cd, Pb and Cu) in the Krka River (partly located in the Krka National Park) has been studied using a "clean" sampling, handling and analysis technique. Differential pulse anodic stripping voltammetry (DPASV) with a hanging mercury drop electrode (HMDE) has been used for trace metal analysis. The Krka River has been divided into the upper and lower flow region with respect to the metals concentration and main physico-chemical parameters. A significant increase in trace metal concentration as the result of the untreated waste water discharge downstream of Knin Town has been registered in the upper flow region. Due to a specific characteristic of the Krka, the so-called self-purification process, a decrease in the elevated trace metals concentration from the water column takes place at numerous small lakes formed by tufa barriers (at the end of the upper flow region). The clean groundwater input at the beginning of the lower flow region additionally contributes to the observed decrease in trace metals concentration in the Krka, maintaining them at a very low level in the remaining region of fresh-water flow. The determined median total concentrations were zinc 120-7400 ng l(-1), cadmium 3-8 ng l(-1), lead 11-250 ng l(-1) and copper 110-440 ng l(-1). Karst rivers, such as the Krka River, with extremely low natural concentrations of trace metals are highly sensitive to the anthropogenic influence. Therefore, such aquatic systems require implementation of strict protection regimes in the entire catchments area.
Talkoot Portals: Discover, Tag, Share, and Reuse Collaborative Science Workflows (Invited)
NASA Astrophysics Data System (ADS)
Wilson, B. D.; Ramachandran, R.; Lynnes, C.
2009-12-01
A small but growing number of scientists are beginning to harness Web 2.0 technologies, such as wikis, blogs, and social tagging, as a transformative way of doing science. These technologies provide researchers easy mechanisms to critique, suggest and share ideas, data and algorithms. At the same time, large suites of algorithms for science analysis are being made available as remotely-invokable Web Services, which can be chained together to create analysis workflows. This provides the research community an unprecedented opportunity to collaborate by sharing their workflows with one another, reproducing and analyzing research results, and leveraging colleagues’ expertise to expedite the process of scientific discovery. However, wikis and similar technologies are limited to text, static images and hyperlinks, providing little support for collaborative data analysis. A team of information technology and Earth science researchers from multiple institutions have come together to improve community collaboration in science analysis by developing a customizable “software appliance” to build collaborative portals for Earth Science services and analysis workflows. The critical requirement is that researchers (not just information technologists) be able to build collaborative sites around service workflows within a few hours. We envision online communities coming together, much like Finnish “talkoot” (a barn raising), to build a shared research space. Talkoot extends a freely available, open source content management framework with a series of modules specific to Earth Science for registering, creating, managing, discovering, tagging and sharing Earth Science web services and workflows for science data processing, analysis and visualization. Users will be able to author a “science story” in shareable web notebooks, including plots or animations, backed up by an executable workflow that directly reproduces the science analysis. New services and workflows of interest will be discoverable using tag search, and advertised using “service casts” and “interest casts” (Atom feeds). Multiple science workflow systems will be plugged into the system, with initial support for UAH’s Mining Workflow Composer and the open-source Active BPEL engine, and JPL’s SciFlo engine and the VizFlow visual programming interface. With the ability to share and execute analysis workflows, Talkoot portals can be used to do collaborative science in addition to communicate ideas and results. It will be useful for different science domains, mission teams, research projects and organizations. Thus, it will help to solve the “sociological” problem of bringing together disparate groups of researchers, and the technical problem of advertising, discovering, developing, documenting, and maintaining inter-agency science workflows. The presentation will discuss the goals of and barriers to Science 2.0, the social web technologies employed in the Talkoot software appliance (e.g. CMS, social tagging, personal presence, advertising by feeds, etc.), illustrate the resulting collaborative capabilities, and show early prototypes of the web interfaces (e.g. embedded workflows).
Optimizing high performance computing workflow for protein functional annotation.
Stanberry, Larissa; Rekepalli, Bhanu; Liu, Yuan; Giblock, Paul; Higdon, Roger; Montague, Elizabeth; Broomall, William; Kolker, Natali; Kolker, Eugene
2014-09-10
Functional annotation of newly sequenced genomes is one of the major challenges in modern biology. With modern sequencing technologies, the protein sequence universe is rapidly expanding. Newly sequenced bacterial genomes alone contain over 7.5 million proteins. The rate of data generation has far surpassed that of protein annotation. The volume of protein data makes manual curation infeasible, whereas a high compute cost limits the utility of existing automated approaches. In this work, we present an improved and optmized automated workflow to enable large-scale protein annotation. The workflow uses high performance computing architectures and a low complexity classification algorithm to assign proteins into existing clusters of orthologous groups of proteins. On the basis of the Position-Specific Iterative Basic Local Alignment Search Tool the algorithm ensures at least 80% specificity and sensitivity of the resulting classifications. The workflow utilizes highly scalable parallel applications for classification and sequence alignment. Using Extreme Science and Engineering Discovery Environment supercomputers, the workflow processed 1,200,000 newly sequenced bacterial proteins. With the rapid expansion of the protein sequence universe, the proposed workflow will enable scientists to annotate big genome data.
Optimizing high performance computing workflow for protein functional annotation
Stanberry, Larissa; Rekepalli, Bhanu; Liu, Yuan; Giblock, Paul; Higdon, Roger; Montague, Elizabeth; Broomall, William; Kolker, Natali; Kolker, Eugene
2014-01-01
Functional annotation of newly sequenced genomes is one of the major challenges in modern biology. With modern sequencing technologies, the protein sequence universe is rapidly expanding. Newly sequenced bacterial genomes alone contain over 7.5 million proteins. The rate of data generation has far surpassed that of protein annotation. The volume of protein data makes manual curation infeasible, whereas a high compute cost limits the utility of existing automated approaches. In this work, we present an improved and optmized automated workflow to enable large-scale protein annotation. The workflow uses high performance computing architectures and a low complexity classification algorithm to assign proteins into existing clusters of orthologous groups of proteins. On the basis of the Position-Specific Iterative Basic Local Alignment Search Tool the algorithm ensures at least 80% specificity and sensitivity of the resulting classifications. The workflow utilizes highly scalable parallel applications for classification and sequence alignment. Using Extreme Science and Engineering Discovery Environment supercomputers, the workflow processed 1,200,000 newly sequenced bacterial proteins. With the rapid expansion of the protein sequence universe, the proposed workflow will enable scientists to annotate big genome data. PMID:25313296
NASA Astrophysics Data System (ADS)
Rajchl, Martin; Abhari, Kamyar; Stirrat, John; Ukwatta, Eranga; Cantor, Diego; Li, Feng P.; Peters, Terry M.; White, James A.
2014-03-01
Multi-center trials provide the unique ability to investigate novel techniques across a range of geographical sites with sufficient statistical power, the inclusion of multiple operators determining feasibility under a wider array of clinical environments and work-flows. For this purpose, we introduce a new means of distributing pre-procedural cardiac models for image-guided interventions across a large scale multi-center trial. In this method, a single core facility is responsible for image processing, employing a novel web-based interface for model visualization and distribution. The requirements for such an interface, being WebGL-based, are minimal and well within the realms of accessibility for participating centers. We then demonstrate the accuracy of our approach using a single-center pacemaker lead implantation trial with generic planning models.
Semiautomatic mapping of permafrost in the Yukon Flats, Alaska
NASA Astrophysics Data System (ADS)
Gulbrandsen, Mats Lundh; Minsley, Burke J.; Ball, Lyndsay B.; Hansen, Thomas Mejer
2016-12-01
Thawing of permafrost due to global warming can have major impacts on hydrogeological processes, climate feedback, arctic ecology, and local environments. To understand these effects and processes, it is crucial to know the distribution of permafrost. In this study we exploit the fact that airborne electromagnetic (AEM) data are sensitive to the distribution of permafrost and demonstrate how the distribution of permafrost in the Yukon Flats, Alaska, is mapped in an efficient (semiautomatic) way, using a combination of supervised and unsupervised (machine) learning algorithms, i.e., Smart Interpretation and K-means clustering. Clustering is used to sort unfrozen and frozen regions, and Smart Interpretation is used to predict the depth of permafrost based on expert interpretations. This workflow allows, for the first time, a quantitative and objective approach to efficiently map permafrost based on large amounts of AEM data.
Semiautomatic mapping of permafrost in the Yukon Flats, Alaska
Gulbrandsen, Mats Lundh; Minsley, Burke J.; Ball, Lyndsay B.; Hansen, Thomas Mejer
2016-01-01
Thawing of permafrost due to global warming can have major impacts on hydrogeological processes, climate feedback, arctic ecology, and local environments. To understand these effects and processes, it is crucial to know the distribution of permafrost. In this study we exploit the fact that airborne electromagnetic (AEM) data are sensitive to the distribution of permafrost and demonstrate how the distribution of permafrost in the Yukon Flats, Alaska, is mapped in an efficient (semiautomatic) way, using a combination of supervised and unsupervised (machine) learning algorithms, i.e., Smart Interpretation and K-means clustering. Clustering is used to sort unfrozen and frozen regions, and Smart Interpretation is used to predict the depth of permafrost based on expert interpretations. This workflow allows, for the first time, a quantitative and objective approach to efficiently map permafrost based on large amounts of AEM data.
Miniature modified Faraday cup for micro electron beams
Teruya, Alan T.; Elmer, John W.; Palmer, Todd A.; Walton, Chris C.
2008-05-27
A micro beam Faraday cup assembly includes a refractory metal layer with an odd number of thin, radially positioned traces in this refractory metal layer. Some of the radially positioned traces are located at the edge of the micro modified Faraday cup body and some of the radially positioned traces are located in the central portion of the micro modified Faraday cup body. Each set of traces is connected to a separate data acquisition channel to form multiple independent diagnostic networks. The data obtained from the two diagnostic networks are combined and inputted into a computed tomography algorithm to reconstruct the beam shape, size, and power density distribution.
Miller, Ronald L.; McPherson, Benjamin F.
2001-01-01
Trace elements and organic contaminants in bottom-sediment samples collected from 10 sites on the Barron River Canal and from one site on the Turner River in October 1998 had patterns of distribution that indicated different sources. At some sites on the Barron River Canal, lead, copper, and zinc, normalized to aluminum, exceeded limits normally considered as background and may be enriched by human activities. Polynuclear aromatic hydrocarbons and p-cresol, normalized against organic carbon, had patterns of distribution that indicated local sources of input from a road or vehicular traffic or from an old creosote wood treatment facility. Phthalate esters and the traces elements arsenic, cadmium, and zinc were more widely distributed with the highest normalized concentrations occurring at the Turner River background site, probably due to the high percentage of fine sediment (74% less than 63 micrometers) and high organic carbon concentration (42%) at that site and the binding effect of organic carbon on trace elements and trace organic compounds. Low concentrations of pesticides or pesticide degradation products were detected in bottom sediment (DDD and DDE, each less than 3.5 µg/kg) and water (9 pesticides, each less than 0.06 µ/L), primarily in the northern reach of the Barron River Canal where agriculture is a likely source. Although a few contaminants approached criteria that would indicate adverse effects on aquatic life, none exceeded the criteria, but the potential synergistic effects of mixtures of contaminants found at most sites are not included in the criteria.
New Approaches to Quantifying Transport Model Error in Atmospheric CO2 Simulations
NASA Technical Reports Server (NTRS)
Ott, L.; Pawson, S.; Zhu, Z.; Nielsen, J. E.; Collatz, G. J.; Gregg, W. W.
2012-01-01
In recent years, much progress has been made in observing CO2 distributions from space. However, the use of these observations to infer source/sink distributions in inversion studies continues to be complicated by difficulty in quantifying atmospheric transport model errors. We will present results from several different experiments designed to quantify different aspects of transport error using the Goddard Earth Observing System, Version 5 (GEOS-5) Atmospheric General Circulation Model (AGCM). In the first set of experiments, an ensemble of simulations is constructed using perturbations to parameters in the model s moist physics and turbulence parameterizations that control sub-grid scale transport of trace gases. Analysis of the ensemble spread and scales of temporal and spatial variability among the simulations allows insight into how parameterized, small-scale transport processes influence simulated CO2 distributions. In the second set of experiments, atmospheric tracers representing model error are constructed using observation minus analysis statistics from NASA's Modern-Era Retrospective Analysis for Research and Applications (MERRA). The goal of these simulations is to understand how errors in large scale dynamics are distributed, and how they propagate in space and time, affecting trace gas distributions. These simulations will also be compared to results from NASA's Carbon Monitoring System Flux Pilot Project that quantified the impact of uncertainty in satellite constrained CO2 flux estimates on atmospheric mixing ratios to assess the major factors governing uncertainty in global and regional trace gas distributions.
NASA Astrophysics Data System (ADS)
Sandler, A.; Brenner, I. B.; Halicz, L.
1988-02-01
Waters of the northern watershed of Lake Kineret, sampled during the period 1978 1983, were analyzed for their major and trace element contents. The trace element concentrations of the major water sources of the watershed (the Dan and Banias springs) represent background values. After emergence, the waters are subjected to human activity. In crossing the populated and cultivated Hula Basin in man-made canals, the major and trace element contents increase. In comparison to the trace element concentrations, those of the major elements have narrow ranges and small temporal fluctuations. Trace element concentrations varied by 3 orders of magnitude, and temporal variations were large but not neccessarily seasonal. Point sources of trace elements were urban effluents, fish pond wastes, and peat soil drainage. The trace element concentrations decrease in the waters of the last segment of the Jordan River. All measured trace elements were below the criteria levels established by regulatory agencies. Several, however, were of the same order of magnitude. Addition of wastes from enhanced recycling, and morphologic modification of the final course of the Jordan River could result in increase in the trace element concentrations in the water.
COSMOS: Python library for massively parallel workflows
Gafni, Erik; Luquette, Lovelace J.; Lancaster, Alex K.; Hawkins, Jared B.; Jung, Jae-Yoon; Souilmi, Yassine; Wall, Dennis P.; Tonellato, Peter J.
2014-01-01
Summary: Efficient workflows to shepherd clinically generated genomic data through the multiple stages of a next-generation sequencing pipeline are of critical importance in translational biomedical science. Here we present COSMOS, a Python library for workflow management that allows formal description of pipelines and partitioning of jobs. In addition, it includes a user interface for tracking the progress of jobs, abstraction of the queuing system and fine-grained control over the workflow. Workflows can be created on traditional computing clusters as well as cloud-based services. Availability and implementation: Source code is available for academic non-commercial research purposes. Links to code and documentation are provided at http://lpm.hms.harvard.edu and http://wall-lab.stanford.edu. Contact: dpwall@stanford.edu or peter_tonellato@hms.harvard.edu. Supplementary information: Supplementary data are available at Bioinformatics online. PMID:24982428
OSG-GEM: Gene Expression Matrix Construction Using the Open Science Grid.
Poehlman, William L; Rynge, Mats; Branton, Chris; Balamurugan, D; Feltus, Frank A
2016-01-01
High-throughput DNA sequencing technology has revolutionized the study of gene expression while introducing significant computational challenges for biologists. These computational challenges include access to sufficient computer hardware and functional data processing workflows. Both these challenges are addressed with our scalable, open-source Pegasus workflow for processing high-throughput DNA sequence datasets into a gene expression matrix (GEM) using computational resources available to U.S.-based researchers on the Open Science Grid (OSG). We describe the usage of the workflow (OSG-GEM), discuss workflow design, inspect performance data, and assess accuracy in mapping paired-end sequencing reads to a reference genome. A target OSG-GEM user is proficient with the Linux command line and possesses basic bioinformatics experience. The user may run this workflow directly on the OSG or adapt it to novel computing environments.
OSG-GEM: Gene Expression Matrix Construction Using the Open Science Grid
Poehlman, William L.; Rynge, Mats; Branton, Chris; Balamurugan, D.; Feltus, Frank A.
2016-01-01
High-throughput DNA sequencing technology has revolutionized the study of gene expression while introducing significant computational challenges for biologists. These computational challenges include access to sufficient computer hardware and functional data processing workflows. Both these challenges are addressed with our scalable, open-source Pegasus workflow for processing high-throughput DNA sequence datasets into a gene expression matrix (GEM) using computational resources available to U.S.-based researchers on the Open Science Grid (OSG). We describe the usage of the workflow (OSG-GEM), discuss workflow design, inspect performance data, and assess accuracy in mapping paired-end sequencing reads to a reference genome. A target OSG-GEM user is proficient with the Linux command line and possesses basic bioinformatics experience. The user may run this workflow directly on the OSG or adapt it to novel computing environments. PMID:27499617
Using Kepler for Tool Integration in Microarray Analysis Workflows.
Gan, Zhuohui; Stowe, Jennifer C; Altintas, Ilkay; McCulloch, Andrew D; Zambon, Alexander C
Increasing numbers of genomic technologies are leading to massive amounts of genomic data, all of which requires complex analysis. More and more bioinformatics analysis tools are being developed by scientist to simplify these analyses. However, different pipelines have been developed using different software environments. This makes integrations of these diverse bioinformatics tools difficult. Kepler provides an open source environment to integrate these disparate packages. Using Kepler, we integrated several external tools including Bioconductor packages, AltAnalyze, a python-based open source tool, and R-based comparison tool to build an automated workflow to meta-analyze both online and local microarray data. The automated workflow connects the integrated tools seamlessly, delivers data flow between the tools smoothly, and hence improves efficiency and accuracy of complex data analyses. Our workflow exemplifies the usage of Kepler as a scientific workflow platform for bioinformatics pipelines.
COSMOS: Python library for massively parallel workflows.
Gafni, Erik; Luquette, Lovelace J; Lancaster, Alex K; Hawkins, Jared B; Jung, Jae-Yoon; Souilmi, Yassine; Wall, Dennis P; Tonellato, Peter J
2014-10-15
Efficient workflows to shepherd clinically generated genomic data through the multiple stages of a next-generation sequencing pipeline are of critical importance in translational biomedical science. Here we present COSMOS, a Python library for workflow management that allows formal description of pipelines and partitioning of jobs. In addition, it includes a user interface for tracking the progress of jobs, abstraction of the queuing system and fine-grained control over the workflow. Workflows can be created on traditional computing clusters as well as cloud-based services. Source code is available for academic non-commercial research purposes. Links to code and documentation are provided at http://lpm.hms.harvard.edu and http://wall-lab.stanford.edu. dpwall@stanford.edu or peter_tonellato@hms.harvard.edu. Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press.
Niazkhani, Zahra; Pirnejad, Habibollah; Berg, Marc; Aarts, Jos
2009-01-01
Previous studies have shown the importance of workflow issues in the implementation of CPOE systems and patient safety practices. To understand the impact of CPOE on clinical workflow, we developed a conceptual framework and conducted a literature search for CPOE evaluations between 1990 and June 2007. Fifty-one publications were identified that disclosed mixed effects of CPOE systems. Among the frequently reported workflow advantages were the legible orders, remote accessibility of the systems, and the shorter order turnaround times. Among the frequently reported disadvantages were the time-consuming and problematic user-system interactions, and the enforcement of a predefined relationship between clinical tasks and between providers. Regarding the diversity of findings in the literature, we conclude that more multi-method research is needed to explore CPOE's multidimensional and collective impact on especially collaborative workflow.
Load-sensitive dynamic workflow re-orchestration and optimisation for faster patient healthcare.
Meli, Christopher L; Khalil, Ibrahim; Tari, Zahir
2014-01-01
Hospital waiting times are considerably long, with no signs of reducing any-time soon. A number of factors including population growth, the ageing population and a lack of new infrastructure are expected to further exacerbate waiting times in the near future. In this work, we show how healthcare services can be modelled as queueing nodes, together with healthcare service workflows, such that these workflows can be optimised during execution in order to reduce patient waiting times. Services such as X-ray, computer tomography, and magnetic resonance imaging often form queues, thus, by taking into account the waiting times of each service, the workflow can be re-orchestrated and optimised. Experimental results indicate average waiting time reductions are achievable by optimising workflows using dynamic re-orchestration. Crown Copyright © 2013. Published by Elsevier Ireland Ltd. All rights reserved.
Scientific workflows as productivity tools for drug discovery.
Shon, John; Ohkawa, Hitomi; Hammer, Juergen
2008-05-01
Large pharmaceutical companies annually invest tens to hundreds of millions of US dollars in research informatics to support their early drug discovery processes. Traditionally, most of these investments are designed to increase the efficiency of drug discovery. The introduction of do-it-yourself scientific workflow platforms has enabled research informatics organizations to shift their efforts toward scientific innovation, ultimately resulting in a possible increase in return on their investments. Unlike the handling of most scientific data and application integration approaches, researchers apply scientific workflows to in silico experimentation and exploration, leading to scientific discoveries that lie beyond automation and integration. This review highlights some key requirements for scientific workflow environments in the pharmaceutical industry that are necessary for increasing research productivity. Examples of the application of scientific workflows in research and a summary of recent platform advances are also provided.
Participatory Action Research and Its Meanings: Vivencia, Praxis, Conscientization
ERIC Educational Resources Information Center
Glassman, Michael; Erdem, Gizem
2014-01-01
This article traces the development of the "second" and arguably more well-known "genre" of participatory action research (PAR). The article argues that the origins of PAR are highly distributed and cannot really be traced back to the ideas of a single person or even a single group of researchers. Instead, the development of…
Bockman, R S; Repo, M A; Warrell, R P; Pounds, J G; Schidlovsky, G; Gordon, B M; Jones, K W
1990-01-01
Gallium nitrate, a drug that inhibits calcium release from bone, has been proven a safe and effective treatment for the accelerated bone resorption associated with cancer. Though bone is a target organ for gallium, the kinetics, sites, and effects of gallium accumulation in bone are not known. We have used synchrotron x-ray microscopy to map the distribution of trace levels of gallium in bone. After short-term in vivo administration of gallium nitrate to rats, trace (nanogram) amounts of gallium preferentially localized to the metabolically active regions in the metaphysis as well as the endosteal and periosteal surfaces of diaphyseal bone, regions where new bone formation and modeling were occurring. The amounts measured were well below the levels known to be cytotoxic. Iron and zinc, trace elements normally found in bone, were decreased in amount after in vivo administration of gallium. These studies represent a first step toward understanding the mechanism(s) of action of gallium in bone by suggesting the possible cellular, structural, and elemental "targets" of gallium. Images PMID:2349224
Farrar, Jerry W.
1999-01-01
This report presents the results of the U.S. Geological Survey's analytical evaluation program for seven standard reference samples -- T-155 (trace constituents), M-148 (major constituents), N-59 (nutrient constituents), N-60 (nutrient constituents), P-31 (low ionic strength constituents), GWT-4 (ground-water trace constituents), and Hg- 27 (mercury) -- which were distributed in September 1998 to 162 laboratories enrolled in the U.S. Geological Survey sponsored interlaboratory testing program. Analytical data that were received from 136 of the laboratories were evaluated with respect to overall laboratory performance and relative laboratory performance for each analyte in the seven reference samples. Results of these evaluations are presented in tabular form. Also presented are tables and graphs summarizing the analytical data provided by each laboratory for each analyte in the seven standard reference samples. The most probable value for each analyte was determined using nonparametric statistics.
Long, H. Keith; Farrar, Jerry W.
1995-01-01
This report presents the results of the U.S. Geological Survey's analytical evaluation program for 7 standard reference samples--T-131 (trace constituents), T-133 (trace constituents), M-132 (major constituents), N-43 (nutrients), N-44 (nutrients), P-23 (low ionic strength), and Hg-19 (mercury). The samples were distributed in October 1994 to 131 laboratories registered in the U.S. Geological Survey sponsored interlaboratory testing program. Analytical data that were received from 121 of the laboratories were evaluated with respect to: overall laboratory performance and relative laboratory performance for each analyte in the seven reference samples. Results of these evaluations are presented in tabular form. Also presented are tables and graphs summarizing the analytical data provided by each laboratory for each analyte in the seven standard reference samples. The most probable value for each analyte was determined using nonparametric statistics.
NASA Astrophysics Data System (ADS)
Frins, Erna; Bobrowski, Nicole; Platt, Ulrich; Wagner, Thomas
2006-08-01
A novel experimental procedure to measure the near-surface distribution of atmospheric trace gases by using passive multiaxis differential absorption optical spectroscopy (MAX-DOAS) is proposed. The procedure consists of pointing the receiving telescope of the spectrometer to nonreflecting surfaces or to bright targets placed at known distances from the measuring device, which are illuminated by sunlight. We show that the partial trace gas absorptions between the top of the atmosphere and the target can be easily removed from the measured total absorption. Thus it is possible to derive the average concentration of trace gases such as NO2, HCHO, SO2, H2O, Glyoxal, BrO, and others along the line of sight between the instrument and the target similar to the well-known long-path DOAS observations (but with much less expense). If tomographic arrangements are used, even two- or three-dimensional trace gas distributions can be retrieved. The basic assumptions of the proposed method are confirmed by test measurements taken across the city of Heidelberg.
Schlesinger, Joseph J; Burdick, Kendall; Baum, Sarah; Bellomy, Melissa; Mueller, Dorothee; MacDonald, Alistair; Chern, Alex; Chrouser, Kristin; Burger, Christie
2018-03-01
The concept of clinical workflow borrows from management and leadership principles outside of medicine. The only way to rethink clinical workflow is to understand the neuroscience principles that underlie attention and vigilance. With any implementation to improve practice, there are human factors that can promote or impede progress. Modulating the environment and working as a team to take care of patients is paramount. Clinicians must continually rethink clinical workflow, evaluate progress, and understand that other industries have something to offer. Then, novel approaches can be implemented to take the best care of patients. Copyright © 2017 Elsevier Inc. All rights reserved.
Microbial Influences on Trace Metal Cycling in a Meromictic Lake, Fayetteville Green Lake, NY
NASA Astrophysics Data System (ADS)
Zerkle, A. L.; House, C.; Kump, L.
2002-12-01
Microorganisms can exist in aquatic environments at very high cell densities of up to 1011 cells/L, and can accumulate significant quantities of trace metals. Bacteria actively take up bioactive trace metals, including Fe, Zn, Mn, Co, Ni, Cu, and Mo, which function as catalytic centers in metalloproteins and metal-activated enzymes involved in virtually all cellular functions. In addition, bacteria may catalyze the release of trace metals from inorganic substrates by processes such as the reduction of iron and manganese oxides, suggesting that trace metal distributions within a natural environment dominated by microbial processes may be controlled primarily by microbial ecology. Fayetteville Green Lake (FGL), NY, is a permanently stratified meromictic lake that has a well-oxygenated surface water mass (mixolimnion) overlying a relatively stagnant, anoxic deep water mass (monimolimnion). A chemocline separates the water masses at around 20m depth, where oxygen concentrations decrease and sulfate and methane concentrations increase. In addition, previous studies have indicated that trace metals such as V, Cr, Co, Mn, and Fe reach elevated concentrations at the chemocline. Using fluorescent in situ hybridization (FISH) of FGL samples from depths of up to 40m with bacterial and archaeal probes, we have shown that fluctuating redox conditions within the FGL water column correlate with significant variations in the composition and distribution of microbial populations with depth. The mixolimnion is dominated by Eubacteria, with increasing concentrations of Archaea in the lower anoxic zone. Increases in microbial cell densities coincide with increases in trace metals at the chemocline, suggesting microbial activity may be responsible for trace metal release at this boundary. 16S rRNA PCR cloning techniques are currently being used to identify dominant microbial populations at various levels within the FGL water column. Future studies will focus on the potential for these dominant microorganisms to influence trace metal cycling and bioavailability in the FGL water column.
Raghu, V
2013-12-01
Biogeochemical characteristics of the cattle are dealt based on the observations made in Ayurveda in the light of modern scientific developments in applied environmental geochemistry. The biogeochemical characteristics of certain important ecological components and animal products of the stall-fed animals were studied. For this purpose, a dairy farm of Tirumala-Tirupati Devasthanams, a religious organization in Tirupati, Chittoor District, Andhra Pradesh was selected. This study is intended to trace out the trace element interactions in the ecological components (soil, water, fodder, feed) of the stall-fed animals and their output components viz. dung, urine and milk. Physical, physico-chemical properties and certain trace elements were determined for composite samples of ecological components and dung, urine, and milk of stall-fed animals. The variations in the distribution of pH and EC of urine and milk reflect the variations in their physico-chemical or hydro-chemical properties. As mentioned in Ayurveda, not only the properties of milk but also the properties of dung and urine reflect their diet and conditions of their habitat. Even though the diet is the same, the cows of different breeds yield milk of variable physical, physico-chemical properties and trace element composition which can be attributed to their body colour, substantiating Ayurveda.
Bing, Haijian; Wu, Yanhong; Zhou, Jun; Li, Rui; Luo, Ji; Yu, Dong
2016-01-01
Trace metals adsorbed onto fine particles can be transported long distances and ultimately deposited in Polar Regions via the cold condensation effect. This study indicated the possible sources of silver (Ag), cadmium (Cd), copper (Cu), lead (Pb), antimony (Sb) and zinc (Zn) in soils on the eastern slope of Mt. Gongga, eastern Tibetan Plateau, and deciphered the effects of vegetation and mountain cold condensation on their distributions with elevation. The metal concentrations in the soils were comparable to other mountains worldwide except the remarkably high concentrations of Cd. Trace metals with high enrichment in the soils were influenced from anthropogenic contributions. Spatially, the concentrations of Cu and Zn in the surface horizons decreased from 2000 to 3700 m a.s.l., and then increased with elevation, whereas other metals were notably enriched in the mid-elevation area (approximately 3000 m a.s.l.). After normalization for soil organic carbon, high concentrations of Cd, Pb, Sb and Zn were observed above the timberline. Our results indicated the importance of vegetation in trace metal accumulation in an alpine ecosystem and highlighted the mountain cold trapping effect on trace metal deposition sourced from long-range atmospheric transport. PMID:27052807
Kolker, A.; Wooden, J.L.; Persing, H.M.; Zielinski, R.A.
2000-01-01
The distribution of Cr and other trace metals of environmental interest in a range of widely used U.S. coals was investigated using the Stanford-USGS SHRIMP-RG ion microprobe . Using the oxygen ion source, concentrations of Cr (11 to 176 ppm), V (23 to 248 ppm), Mn (2 to 149 ppm), Ni (2 to 30 ppm), and 13 other elements were determined in illite/smectite, a group of clay minerals commonly present in coal. The results confirm previous indirect or semi-quantitative determinations indicating illite/smectite to be an important host of these metals. Calibration was achieved using doped aluminosilicate-glass synthetic standards and glasses prepared from USGS rock standards. Grains for analysis were identified optically, and confirmed by 1) precursory electron microprobe analysis and wavelength-dispersive compositional mapping, and 2) SHRIMP-RG major element data obtained concurrently with trace element results. Follow-up investigations will focus on the distribution of As and other elements that are more effectively ionized with the cesium primary beam currently being tested.
NASA Technical Reports Server (NTRS)
Fishman, Jack; Al-Saadi, Jassim A.; Neil, Doreen O.; Creilson, John K.; Severance, Kurt; Thomason, Larry W.; Edwards, David R.
2008-01-01
When the first observations of a tropospheric trace gas were obtained in the 1980s, carbon monoxide enhancements from tropical biomass burning dominated the observed features. In 2005, an active remote-sensing system to provide detailed information on the vertical distribution of aerosols and clouds was launched, and again, one of the most imposing features observed was the presence of emissions from tropical biomass burning. This paper presents a brief overview of space-borne observations of the distribution of trace gases and aerosols and how tropical biomass burning, primarily in the Southern Hemisphere, has provided an initially surprising picture of the distribution of these species and how they have evolved from prevailing transport patterns in that hemisphere. We also show how interpretation of these observations has improved significantly as a result of the improved capability of trajectory modeling in recent years and how information from this capability has provided additional insight into previous measurements form satellites. Key words: pollution; biomass burning; aerosols; tropical trace gas emissions; Southern Hemisphere; carbon monoxide.
Cui, Xing; Okayasu, Ryuichi
2008-12-01
The arsenic accumulation, distribution and influences on metallothionein-1 (MT-1) expression and other trace elements in various organs were examined in rats orally exposed to sodium arsenate (iAs(V)). Rats received a dose of 0, 1, 10 and 100ppm of iAs(V) in drinking water daily for 4- and 16-weeks. Arsenic seems to be distributed in all of the tissues, and was accumulated relatively higher in the spleen, lung and kidney compared to the liver, and much lower in skin and cerebrum. High dose of iAs(V)-exposure significantly increased the concentration of copper in the kidney, but did not influence other trace elements such as zinc and manganese in the liver. The mRNA expression of MT-1 was dose-dependently increased by iAs(V)-exposure in the liver whereas it was decreased in the kidney. These data indicate that arsenic is widely distributed and significantly accumulated in various organs and influences on other trace elements, and also modulates MT-1 expression in the liver and kidney.
Lee, Howard; Chapiro, Julius; Schernthaner, Rüdiger; Duran, Rafael; Wang, Zhijun; Gorodetski, Boris; Geschwind, Jean-François; Lin, MingDe
2015-04-01
The objective of this study was to demonstrate that an intra-arterial liver therapy clinical research database system is a more workflow efficient and robust tool for clinical research than a spreadsheet storage system. The database system could be used to generate clinical research study populations easily with custom search and retrieval criteria. A questionnaire was designed and distributed to 21 board-certified radiologists to assess current data storage problems and clinician reception to a database management system. Based on the questionnaire findings, a customized database and user interface system were created to perform automatic calculations of clinical scores including staging systems such as the Child-Pugh and Barcelona Clinic Liver Cancer, and facilitates data input and output. Questionnaire participants were favorable to a database system. The interface retrieved study-relevant data accurately and effectively. The database effectively produced easy-to-read study-specific patient populations with custom-defined inclusion/exclusion criteria. The database management system is workflow efficient and robust in retrieving, storing, and analyzing data. Copyright © 2015 AUR. Published by Elsevier Inc. All rights reserved.
Multi-modality molecular imaging: pre-clinical laboratory configuration
NASA Astrophysics Data System (ADS)
Wu, Yanjun; Wellen, Jeremy W.; Sarkar, Susanta K.
2006-02-01
In recent years, the prevalence of in vivo molecular imaging applications has rapidly increased. Here we report on the construction of a multi-modality imaging facility in a pharmaceutical setting that is expected to further advance existing capabilities for in vivo imaging of drug distribution and the interaction with their target. The imaging instrumentation in our facility includes a microPET scanner, a four wavelength time-domain optical imaging scanner, a 9.4T/30cm MRI scanner and a SPECT/X-ray CT scanner. An electronics shop and a computer room dedicated to image analysis are additional features of the facility. The layout of the facility was designed with a central animal preparation room surrounded by separate laboratory rooms for each of the major imaging modalities to accommodate the work-flow of simultaneous in vivo imaging experiments. This report will focus on the design of and anticipated applications for our microPET and optical imaging laboratory spaces. Additionally, we will discuss efforts to maximize the daily throughput of animal scans through development of efficient experimental work-flows and the use of multiple animals in a single scanning session.
Vu, Trung N; Valkenborg, Dirk; Smets, Koen; Verwaest, Kim A; Dommisse, Roger; Lemière, Filip; Verschoren, Alain; Goethals, Bart; Laukens, Kris
2011-10-20
Nuclear magnetic resonance spectroscopy (NMR) is a powerful technique to reveal and compare quantitative metabolic profiles of biological tissues. However, chemical and physical sample variations make the analysis of the data challenging, and typically require the application of a number of preprocessing steps prior to data interpretation. For example, noise reduction, normalization, baseline correction, peak picking, spectrum alignment and statistical analysis are indispensable components in any NMR analysis pipeline. We introduce a novel suite of informatics tools for the quantitative analysis of NMR metabolomic profile data. The core of the processing cascade is a novel peak alignment algorithm, called hierarchical Cluster-based Peak Alignment (CluPA). The algorithm aligns a target spectrum to the reference spectrum in a top-down fashion by building a hierarchical cluster tree from peak lists of reference and target spectra and then dividing the spectra into smaller segments based on the most distant clusters of the tree. To reduce the computational time to estimate the spectral misalignment, the method makes use of Fast Fourier Transformation (FFT) cross-correlation. Since the method returns a high-quality alignment, we can propose a simple methodology to study the variability of the NMR spectra. For each aligned NMR data point the ratio of the between-group and within-group sum of squares (BW-ratio) is calculated to quantify the difference in variability between and within predefined groups of NMR spectra. This differential analysis is related to the calculation of the F-statistic or a one-way ANOVA, but without distributional assumptions. Statistical inference based on the BW-ratio is achieved by bootstrapping the null distribution from the experimental data. The workflow performance was evaluated using a previously published dataset. Correlation maps, spectral and grey scale plots show clear improvements in comparison to other methods, and the down-to-earth quantitative analysis works well for the CluPA-aligned spectra. The whole workflow is embedded into a modular and statistically sound framework that is implemented as an R package called "speaq" ("spectrum alignment and quantitation"), which is freely available from http://code.google.com/p/speaq/.
NASA Astrophysics Data System (ADS)
Kim, S.; Guenther, A. B.; Seco, R.; Gu, D.; Jeong, D.; Sanchez, D.; Brune, W. H.; Blake, D. R.; Armin, W.; Ahn, J. Y.; Lee, Y.; Kim, D.; Shin, H.; Jung, J.; Kim, D. S.; Lee, M.; Lee, G.
2017-12-01
During the KORUS-AQ field campaign in 2016, various platforms were utilized to characterize emission, chemical transformation, and removal of trace gases and fine particles. One may consider that the Seoul Metropolitan Area, where was the main study area, is a relatively small metropolitan in physical size wise but it is an extremely dense metropolitan area with various anthropogenic and natural emission sources. Therefore, the comprehensive understanding of various emission sources and complicated photochemistry within the boundary layer of the megacity should be preceded to precisely evaluate the impacts of megacity to global air quality and climate. In this context, we will present a detailed analysis of trace gas distributions over the Seoul Metropolitan Area. The focus will be a dataset collected at the Taehwa Research Forest, a downwind forest for fresh and aged pollution plumes. The trace gas reactivity also known as OH reactivity will be presented by comparing with a city center research site-the Olympic Park supersite. The DC-8 aircraft dataset will be presented to examine the evolution of anthropogenic pollution and the amplification of photochemistry from biogenic volatile organic compound emissions. Eventually, we expect that the three dimensional analysis of the distributions of atmospheric reactivity will provide an important snapshot on a complex nature of trace gas distribution in the Megacity planetary boundary layer.
Mirel, Barbara; Eichinger, Felix; Keller, Benjamin J; Kretzler, Matthias
2011-03-21
Bioinformatics visualization tools are often not robust enough to support biomedical specialists’ complex exploratory analyses. Tools need to accommodate the workflows that scientists actually perform for specific translational research questions. To understand and model one of these workflows, we conducted a case-based, cognitive task analysis of a biomedical specialist’s exploratory workflow for the question: What functional interactions among gene products of high throughput expression data suggest previously unknown mechanisms of a disease? From our cognitive task analysis four complementary representations of the targeted workflow were developed. They include: usage scenarios, flow diagrams, a cognitive task taxonomy, and a mapping between cognitive tasks and user-centered visualization requirements. The representations capture the flows of cognitive tasks that led a biomedical specialist to inferences critical to hypothesizing. We created representations at levels of detail that could strategically guide visualization development, and we confirmed this by making a trial prototype based on user requirements for a small portion of the workflow. Our results imply that visualizations should make available to scientific users “bundles of features†consonant with the compositional cognitive tasks purposefully enacted at specific points in the workflow. We also highlight certain aspects of visualizations that: (a) need more built-in flexibility; (b) are critical for negotiating meaning; and (c) are necessary for essential metacognitive support.
Managing and Communicating Operational Workflow
Weinberg, Stuart T.; Danciu, Ioana; Unertl, Kim M.
2016-01-01
Summary Background Healthcare team members in emergency department contexts have used electronic whiteboard solutions to help manage operational workflow for many years. Ambulatory clinic settings have highly complex operational workflow, but are still limited in electronic assistance to communicate and coordinate work activities. Objective To describe and discuss the design, implementation, use, and ongoing evolution of a coordination and collaboration tool supporting ambulatory clinic operational workflow at Vanderbilt University Medical Center (VUMC). Methods The outpatient whiteboard tool was initially designed to support healthcare work related to an electronic chemotherapy order-entry application. After a highly successful initial implementation in an oncology context, a high demand emerged across the organization for the outpatient whiteboard implementation. Over the past 10 years, developers have followed an iterative user-centered design process to evolve the tool. Results The electronic outpatient whiteboard system supports 194 separate whiteboards and is accessed by over 2800 distinct users on a typical day. Clinics can configure their whiteboards to support unique workflow elements. Since initial release, features such as immunization clinical decision support have been integrated into the system, based on requests from end users. Conclusions The success of the electronic outpatient whiteboard demonstrates the usefulness of an operational workflow tool within the ambulatory clinic setting. Operational workflow tools can play a significant role in supporting coordination, collaboration, and teamwork in ambulatory healthcare settings. PMID:27081407
Feliubadaló, Lídia; Lopez-Doriga, Adriana; Castellsagué, Ester; del Valle, Jesús; Menéndez, Mireia; Tornero, Eva; Montes, Eva; Cuesta, Raquel; Gómez, Carolina; Campos, Olga; Pineda, Marta; González, Sara; Moreno, Victor; Brunet, Joan; Blanco, Ignacio; Serra, Eduard; Capellá, Gabriel; Lázaro, Conxi
2013-01-01
Next-generation sequencing (NGS) is changing genetic diagnosis due to its huge sequencing capacity and cost-effectiveness. The aim of this study was to develop an NGS-based workflow for routine diagnostics for hereditary breast and ovarian cancer syndrome (HBOCS), to improve genetic testing for BRCA1 and BRCA2. A NGS-based workflow was designed using BRCA MASTR kit amplicon libraries followed by GS Junior pyrosequencing. Data analysis combined Variant Identification Pipeline freely available software and ad hoc R scripts, including a cascade of filters to generate coverage and variant calling reports. A BRCA homopolymer assay was performed in parallel. A research scheme was designed in two parts. A Training Set of 28 DNA samples containing 23 unique pathogenic mutations and 213 other variants (33 unique) was used. The workflow was validated in a set of 14 samples from HBOCS families in parallel with the current diagnostic workflow (Validation Set). The NGS-based workflow developed permitted the identification of all pathogenic mutations and genetic variants, including those located in or close to homopolymers. The use of NGS for detecting copy-number alterations was also investigated. The workflow meets the sensitivity and specificity requirements for the genetic diagnosis of HBOCS and improves on the cost-effectiveness of current approaches. PMID:23249957