Sample records for high throughput workflow

  1. High-Throughput Industrial Coatings Research at The Dow Chemical Company.

    PubMed

    Kuo, Tzu-Chi; Malvadkar, Niranjan A; Drumright, Ray; Cesaretti, Richard; Bishop, Matthew T

    2016-09-12

    At The Dow Chemical Company, high-throughput research is an active area for developing new industrial coatings products. Using the principles of automation (i.e., using robotic instruments), parallel processing (i.e., preparing, processing, and evaluating samples in parallel), and miniaturization (i.e., reducing sample size), high-throughput tools for synthesizing, formulating, and applying coating compositions have been developed at Dow. In addition, high-throughput workflows for measuring various coating properties, such as cure speed, hardness development, scratch resistance, impact toughness, resin compatibility, pot-life, and surface defects, among others, have also been developed in-house. These workflows correlate well with the traditional coatings tests, but they do not necessarily mimic those tests. The use of such high-throughput workflows in combination with smart experimental designs allows accelerated discovery and commercialization.

  2. Multivariate Analysis of High Through-Put Adhesively Bonded Single Lap Joints: Experimental and Workflow Protocols

    DTIC Science & Technology

    2016-06-01

    ARL-TR-7696, June 2016. US Army Research Laboratory. Multivariate Analysis of High Through-Put Adhesively Bonded Single Lap Joints: Experimental and Workflow Protocols, by Robert E Jensen, Daniel C DeSchepper, and David P Flanagan. Approved for public release; distribution is unlimited.

  3. High throughput workflow for coacervate formation and characterization in shampoo systems.

    PubMed

    Kalantar, T H; Tucker, C J; Zalusky, A S; Boomgaard, T A; Wilson, B E; Ladika, M; Jordan, S L; Li, W K; Zhang, X; Goh, C G

    2007-01-01

    Cationic cellulosic polymers find wide utility as benefit agents in shampoo. Deposition of these polymers onto hair has been shown to mend split-ends, improve appearance and wet combing, as well as provide controlled delivery of insoluble actives. The deposition is thought to be enhanced by the formation of a polymer/surfactant complex that phase-separates from the bulk solution upon dilution. A standard method exists to characterize coacervate formation upon dilution, but the test is prohibitive in both time and material. We have developed a semi-automated high throughput workflow to characterize the coacervate-forming behavior of different shampoo formulations. A procedure that allows testing of real-use shampoo dilutions without first formulating a complete shampoo was identified. This procedure was adapted to a Tecan liquid handler by optimizing the parameters for liquid dispensing as well as for mixing. The high throughput workflow enabled preparation and testing of hundreds of formulations with different types and levels of cationic cellulosic polymers and surfactants, and for each formulation a haze diagram was constructed. Optimal formulations and their dilutions that give substantial coacervate formation (determined by haze measurements) were identified. Results from this high throughput workflow were shown to reproduce standard haze and bench-top turbidity measurements, and this workflow has the advantages of using less material and allowing more variables to be tested with significant time savings.
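
    A haze diagram of the kind described above can be assembled programmatically once the plate-reader output is available. The sketch below is a minimal illustration only: the polymer levels, dilution factors, and haze readings are hypothetical placeholders, and the script simply organizes readings into a grid and reports, for each formulation, the dilution giving maximum haze (i.e., strongest coacervate formation).

    # Minimal sketch: build a haze diagram from plate-reader data (hypothetical values).
    import numpy as np

    polymer_levels = [0.1, 0.25, 0.5, 1.0]    # wt% cationic cellulosic polymer (assumed)
    dilution_factors = [2, 5, 10, 20, 40]     # fold dilution with water (assumed)

    # haze[i, j] = turbidity reading for polymer_levels[i] at dilution_factors[j]
    rng = np.random.default_rng(0)
    haze = rng.uniform(0, 100, size=(len(polymer_levels), len(dilution_factors)))

    for i, level in enumerate(polymer_levels):
        j_max = int(np.argmax(haze[i]))
        print(f"{level:>5.2f} wt% polymer: max haze {haze[i, j_max]:5.1f} "
              f"at {dilution_factors[j_max]}x dilution")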

  4. Leveraging the Power of High Performance Computing for Next Generation Sequencing Data Analysis: Tricks and Twists from a High Throughput Exome Workflow

    PubMed Central

    Wonczak, Stephan; Thiele, Holger; Nieroda, Lech; Jabbari, Kamel; Borowski, Stefan; Sinha, Vishal; Gunia, Wilfried; Lang, Ulrich; Achter, Viktor; Nürnberg, Peter

    2015-01-01

    Next generation sequencing (NGS) has been a great success and is now a standard method of research in the life sciences. With this technology, dozens of whole genomes or hundreds of exomes can be sequenced in rather short time, producing huge amounts of data. Complex bioinformatics analyses are required to turn these data into scientific findings. In order to run these analyses fast, automated workflows implemented on high performance computers are state of the art. While providing sufficient compute power and storage to meet the NGS data challenge, high performance computing (HPC) systems require special care when utilized for high throughput processing. This is especially true if the HPC system is shared by different users. Here, stability, robustness and maintainability are as important for automated workflows as speed and throughput. To achieve all of these aims, dedicated solutions have to be developed. In this paper, we present the tricks and twists that we utilized in the implementation of our exome data processing workflow. It may serve as a guideline for other high throughput data analysis projects using a similar infrastructure. The code implementing our solutions is provided in the supporting information files. PMID:25942438
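
    The abstract stresses stability, robustness, and maintainability of automated workflows on shared HPC systems. One common measure in that spirit (a generic illustration, not necessarily the authors' implementation) is to wrap each pipeline step in a retry loop with logging so that transient node or filesystem failures do not abort the whole exome analysis. The step commands below are placeholders.

    # Minimal sketch of a retry wrapper for workflow steps on a shared cluster.
    # The step commands are placeholders, not the authors' actual pipeline.
    import logging
    import subprocess
    import time

    logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")

    def run_step(cmd, retries=3, wait_s=60):
        """Run a shell command, retrying on failure with a fixed back-off."""
        for attempt in range(1, retries + 1):
            result = subprocess.run(cmd, shell=True)
            if result.returncode == 0:
                logging.info("step succeeded: %s", cmd)
                return
            logging.warning("attempt %d/%d failed (rc=%d): %s",
                            attempt, retries, result.returncode, cmd)
            time.sleep(wait_s)
        raise RuntimeError(f"step failed after {retries} attempts: {cmd}")

    if __name__ == "__main__":
        run_step("echo aligning reads")    # placeholder for e.g. an alignment step
        run_step("echo calling variants")  # placeholder for a variant-calling step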

  5. OSG-GEM: Gene Expression Matrix Construction Using the Open Science Grid.

    PubMed

    Poehlman, William L; Rynge, Mats; Branton, Chris; Balamurugan, D; Feltus, Frank A

    2016-01-01

    High-throughput DNA sequencing technology has revolutionized the study of gene expression while introducing significant computational challenges for biologists. These computational challenges include access to sufficient computer hardware and functional data processing workflows. Both these challenges are addressed with our scalable, open-source Pegasus workflow for processing high-throughput DNA sequence datasets into a gene expression matrix (GEM) using computational resources available to U.S.-based researchers on the Open Science Grid (OSG). We describe the usage of the workflow (OSG-GEM), discuss workflow design, inspect performance data, and assess accuracy in mapping paired-end sequencing reads to a reference genome. A target OSG-GEM user is proficient with the Linux command line and possesses basic bioinformatics experience. The user may run this workflow directly on the OSG or adapt it to novel computing environments.
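
    The end product of such a workflow is a gene expression matrix in which rows are genes and columns are samples. The final merge step can be pictured with a short pandas sketch that combines hypothetical per-sample count files (gene<TAB>count) into one matrix; this illustrates the GEM concept and is not the OSG-GEM code itself, and the directory and file-name pattern are assumptions.

    # Minimal sketch: merge per-sample count files into a gene expression matrix (GEM).
    # File names and format (gene<TAB>count) are assumptions for illustration.
    import glob
    import os
    import pandas as pd

    def build_gem(count_dir="counts", pattern="*.counts.tsv"):
        columns = {}
        for path in sorted(glob.glob(os.path.join(count_dir, pattern))):
            sample = os.path.basename(path).split(".")[0]
            counts = pd.read_csv(path, sep="\t", header=None, index_col=0).iloc[:, 0]
            columns[sample] = counts
        gem = pd.DataFrame(columns)           # rows: genes, columns: samples
        return gem.fillna(0).astype(int)

    if __name__ == "__main__":
        gem = build_gem()
        gem.to_csv("gem.tsv", sep="\t")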

  6. OSG-GEM: Gene Expression Matrix Construction Using the Open Science Grid

    PubMed Central

    Poehlman, William L.; Rynge, Mats; Branton, Chris; Balamurugan, D.; Feltus, Frank A.

    2016-01-01

    High-throughput DNA sequencing technology has revolutionized the study of gene expression while introducing significant computational challenges for biologists. These computational challenges include access to sufficient computer hardware and functional data processing workflows. Both these challenges are addressed with our scalable, open-source Pegasus workflow for processing high-throughput DNA sequence datasets into a gene expression matrix (GEM) using computational resources available to U.S.-based researchers on the Open Science Grid (OSG). We describe the usage of the workflow (OSG-GEM), discuss workflow design, inspect performance data, and assess accuracy in mapping paired-end sequencing reads to a reference genome. A target OSG-GEM user is proficient with the Linux command line and possesses basic bioinformatics experience. The user may run this workflow directly on the OSG or adapt it to novel computing environments. PMID:27499617

  7. ToxCast Workflow: High-throughput screening assay data processing, analysis and management (SOT)

    EPA Science Inventory

    US EPA’s ToxCast program is generating data in high-throughput screening (HTS) and high-content screening (HCS) assays for thousands of environmental chemicals, for use in developing predictive toxicity models. Currently the ToxCast screening program includes over 1800 unique c...

  8. Combinatorial materials research applied to the development of new surface coatings VII: An automated system for adhesion testing

    NASA Astrophysics Data System (ADS)

    Chisholm, Bret J.; Webster, Dean C.; Bennett, James C.; Berry, Missy; Christianson, David; Kim, Jongsoo; Mayo, Bret; Gubbins, Nathan

    2007-07-01

    An automated, high-throughput adhesion workflow that enables pseudobarnacle adhesion and coating/substrate adhesion to be measured on coating patches arranged in an array format on 4 in. × 8 in. panels was developed. The adhesion workflow consists of the following process steps: (1) application of an adhesive to the coating array; (2) insertion of panels into a clamping device; (3) insertion of aluminum studs into the clamping device and onto coating surfaces, aligned with the adhesive; (4) curing of the adhesive; and (5) automated removal of the aluminum studs. Validation experiments comparing data generated using the automated, high-throughput workflow to data obtained using conventional, manual methods showed that the automated system allows for accurate ranking of relative coating adhesion performance.

  9. Integrating prediction, provenance, and optimization into high energy workflows

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Schram, M.; Bansal, V.; Friese, R. D.

    We propose a novel approach for efficient execution of workflows on distributed resources. The key components of this framework include: performance modeling to quantitatively predict workflow component behavior; optimization-based scheduling such as choosing an optimal subset of resources to meet demand and assignment of tasks to resources; distributed I/O optimizations such as prefetching; and provenance methods for collecting performance data. In preliminary results, these techniques improve throughput on a small Belle II workflow by 20%.
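
    The framework described above couples performance prediction with optimization-based scheduling. A toy illustration of one such policy, chosen here for simplicity rather than taken from the paper, is sketched below: each task's predicted runtime on each resource drives a greedy assignment to the resource that can finish it earliest. Task names, resource names, and runtimes are invented.

    # Toy sketch: assign each task to the resource predicted to finish it earliest.
    # predicted_runtime[task][resource] -> seconds (hypothetical model output).
    predicted_runtime = {
        "simulate_A": {"cluster1": 120, "cluster2": 90},
        "simulate_B": {"cluster1": 60,  "cluster2": 100},
        "merge":      {"cluster1": 30,  "cluster2": 25},
    }

    free_at = {"cluster1": 0.0, "cluster2": 0.0}   # time each resource becomes free

    schedule = []
    for task, runtimes in predicted_runtime.items():
        # Pick the resource with the earliest predicted finish time for this task.
        resource = min(runtimes, key=lambda r: free_at[r] + runtimes[r])
        start = free_at[resource]
        finish = start + runtimes[resource]
        free_at[resource] = finish
        schedule.append((task, resource, start, finish))

    for task, resource, start, finish in schedule:
        print(f"{task:12s} -> {resource}  start={start:6.1f}s  finish={finish:6.1f}s")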

  10. Automated Purification of Recombinant Proteins: Combining High-throughput with High Yield

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lin, Chiann Tso; Moore, Priscilla A.; Auberry, Deanna L.

    2006-05-01

    Protein crystallography, mapping protein interactions and other approaches of current functional genomics require not only purifying large numbers of proteins but also obtaining sufficient yield and homogeneity for downstream high-throughput applications. There is a need for the development of robust automated high-throughput protein expression and purification processes to meet these requirements. We developed and compared two alternative workflows for automated purification of recombinant proteins based on expression of bacterial genes in Escherichia coli: first, a filtration separation protocol based on expression in 800 ml E. coli cultures followed by filtration purification using Ni²⁺-NTA™ Agarose (Qiagen); second, a smaller scale magnetic separation method based on expression in 25 ml cultures of E. coli followed by 96-well purification on MagneHis™ Ni²⁺ Agarose (Promega). Both workflows provided comparable average yields of about 8 µg of purified protein per unit of OD at 600 nm of bacterial culture. We discuss advantages and limitations of the automated workflows, which can provide proteins more than 90% pure in the range of 100 µg – 45 mg per purification run, as well as strategies for optimization of these protocols.
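
    The reported average yield of roughly 8 µg of purified protein per OD600 unit of culture lets one estimate the expected output of each workflow scale. The back-of-the-envelope sketch below makes that arithmetic concrete; the culture densities and the interpretation of an "OD unit" as OD600 × culture volume in mL are assumptions, not values from the report.

    # Back-of-the-envelope yield estimate: ~8 ug purified protein per OD600 unit of culture.
    # "OD600 unit" is assumed to mean OD600 x culture volume in mL; densities are assumed.
    YIELD_UG_PER_OD_UNIT = 8.0

    def estimated_yield_ug(od600, volume_ml):
        return YIELD_UG_PER_OD_UNIT * od600 * volume_ml

    for name, od600, volume_ml in [("filtration, 800 mL culture", 2.0, 800.0),
                                   ("magnetic, 25 mL culture",    2.0, 25.0)]:
        total_ug = estimated_yield_ug(od600, volume_ml)
        print(f"{name}: ~{total_ug/1000:.1f} mg purified protein (assumed OD600 = {od600})")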

  11. 20180311 - Differential Gene Expression and Concentration-Response Modeling Workflow for High-Throughput Transcriptomic (HTTr) Data: Results From MCF7 Cells (SOT)

    EPA Science Inventory

    Increasing efficiency and declining cost of generating whole transcriptome profiles has made high-throughput transcriptomics a practical option for chemical bioactivity screening. The resulting data output provides information on the expression of thousands of genes and is amenab...

  12. Differential Gene Expression and Concentration-Response Modeling Workflow for High-Throughput Transcriptomic (HTTr) Data: Results From MCF7 Cells

    EPA Science Inventory

    Increasing efficiency and declining cost of generating whole transcriptome profiles has made high-throughput transcriptomics a practical option for chemical bioactivity screening. The resulting data output provides information on the expression of thousands of genes and is amenab...

  13. A high throughput geocomputing system for remote sensing quantitative retrieval and a case study

    NASA Astrophysics Data System (ADS)

    Xue, Yong; Chen, Ziqiang; Xu, Hui; Ai, Jianwen; Jiang, Shuzheng; Li, Yingjie; Wang, Ying; Guang, Jie; Mei, Linlu; Jiao, Xijuan; He, Xingwei; Hou, Tingting

    2011-12-01

    The quality and accuracy of remote sensing instruments have improved significantly; however, rapid processing of large-scale remote sensing data has become the bottleneck for remote sensing quantitative retrieval applications. Remote sensing quantitative retrieval is a data-intensive computing application and a core problem in high-throughput computation. The remote sensing quantitative retrieval Grid workflow is a high-level core component of the remote sensing Grid, used to support the modeling, reconstruction, and implementation of large-scale, complex applications of remote sensing science. In this paper, we study a middleware component of the remote sensing Grid: a dynamic Grid workflow based on the remote sensing quantitative retrieval application on a Grid platform. We designed a novel architecture for the remote sensing Grid workflow. According to this architecture, we constructed the Remote Sensing Information Service Grid Node (RSSN) with Condor. We developed graphical user interface (GUI) tools to compose remote sensing processing Grid workflows, and took aerosol optical depth (AOD) retrieval as an example. The case study showed that significant improvement in system performance could be achieved with this implementation. The results also give a perspective on the potential of applying Grid workflow practices to remote sensing quantitative retrieval problems using commodity-class PCs.

  14. A workflow to investigate exposure and pharmacokinetic influences on high-throughput in vitro chemical screening based on adverse outcome pathways, OpenTox USA 2015 Poster

    EPA Science Inventory

    Adverse outcome pathways (AOP) link known population outcomes to a molecular initiating event (MIE) that can be quantified using high-throughput in vitro methods. Practical application of AOPs in chemical-specific risk assessment requires consideration of exposure and absorption,...

  15. Hydrogen storage materials discovery via high throughput ball milling and gas sorption.

    PubMed

    Li, Bin; Kaye, Steven S; Riley, Conor; Greenberg, Doron; Galang, Daniel; Bailey, Mark S

    2012-06-11

    The lack of a high capacity hydrogen storage material is a major barrier to the implementation of the hydrogen economy. To accelerate discovery of such materials, we have developed a high-throughput workflow for screening of hydrogen storage materials in which candidate materials are synthesized and characterized via highly parallel ball mills and volumetric gas sorption instruments, respectively. The workflow was used to identify mixed imides with significantly enhanced absorption rates relative to Li2Mg(NH)2. The most promising material, 2LiNH2:MgH2 + 5 atom % LiBH4 + 0.5 atom % La, exhibits the best balance of absorption rate, capacity, and cycle-life, absorbing >4 wt % H2 in 1 h at 120 °C after 11 absorption-desorption cycles.

  16. ToxCast Data Generation: Chemical Workflow

    EPA Pesticide Factsheets

    This page describes the process EPA follows to select chemicals, procure chemicals, register chemicals, conduct a quality review of the chemicals, and prepare the chemicals for high-throughput screening.

  17. Development of a High-Throughput Ion-Exchange Resin Characterization Workflow.

    PubMed

    Liu, Chun; Dermody, Daniel; Harris, Keith; Boomgaard, Thomas; Sweeney, Jeff; Gisch, Daryl; Goltz, Bob

    2017-06-12

    A novel high-throughput (HTR) ion-exchange (IEX) resin workflow has been developed for characterizing the ion exchange equilibrium of commercial and experimental IEX resins for a range of applications where the water environment differs from site to site. Because of its much higher throughput, design of experiment (DOE) methodology can be easily applied to study the effects of multiple factors on resin performance. Two case studies are presented to illustrate the efficacy of the combined HTR workflow and DOE method. In case study one, a series of anion exchange resins were screened for selective removal of NO₃⁻ and NO₂⁻ in water environments consisting of multiple other anions, varied pH, and ionic strength. A response surface model (RSM) is developed to statistically correlate the resin performance with the water composition and predict the best resin candidate. In case study two, the same HTR workflow and DOE method were applied to screen different cation exchange resins for the selective removal of Mg²⁺, Ca²⁺, and Ba²⁺ from high total dissolved salt (TDS) water. A master DOE model including all of the cation exchange resins is created to predict divalent cation removal by different IEX resins under specific conditions, from which the best resin candidates can be identified. The successful adoption of the HTR workflow and DOE method for studying the ion exchange behavior of IEX resins can significantly reduce the resources and time needed to address industry and application needs.
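
    A response surface model of the kind mentioned above is typically a second-order polynomial fitted to the designed experiments. The sketch below is a generic illustration with synthetic data and illustrative factor names (pH, ionic strength, removal efficiency), not the study's actual model, fitted with scikit-learn.

    # Generic response-surface sketch: quadratic model of resin performance vs. water chemistry.
    # Data are synthetic; factor names are illustrative, not from the study.
    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures

    rng = np.random.default_rng(1)
    pH = rng.uniform(4, 9, 40)
    ionic_strength = rng.uniform(0.01, 0.5, 40)          # mol/L
    X = np.column_stack([pH, ionic_strength])

    # Synthetic "removal efficiency" with curvature and an interaction term.
    y = 80 - 2*(pH - 6.5)**2 - 60*ionic_strength + 5*pH*ionic_strength + rng.normal(0, 2, 40)

    rsm = make_pipeline(PolynomialFeatures(degree=2, include_bias=False), LinearRegression())
    rsm.fit(X, y)

    print("R^2 on training data:", round(rsm.score(X, y), 3))
    print("Predicted removal at pH 7, I = 0.1 mol/L:",
          round(float(rsm.predict([[7.0, 0.1]])[0]), 1))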

  18. Machine learning in computational biology to accelerate high-throughput protein expression.

    PubMed

    Sastry, Anand; Monk, Jonathan; Tegel, Hanna; Uhlen, Mathias; Palsson, Bernhard O; Rockberg, Johan; Brunk, Elizabeth

    2017-08-15

    The Human Protein Atlas (HPA) enables the simultaneous characterization of thousands of proteins across various tissues to pinpoint their spatial location in the human body. This has been achieved through transcriptomics and high-throughput immunohistochemistry-based approaches, where over 40 000 unique human protein fragments have been expressed in E. coli. These datasets enable quantitative tracking of entire cellular proteomes and present new avenues for understanding molecular-level properties influencing expression and solubility. Combining computational biology and machine learning identifies protein properties that hinder the HPA high-throughput antibody production pipeline. We predict protein expression and solubility with accuracies of 70% and 80%, respectively, based on a subset of key properties (aromaticity, hydropathy and isoelectric point). We guide the selection of protein fragments based on these characteristics to optimize high-throughput experimentation. We present the machine learning workflow as a series of IPython notebooks hosted on GitHub (https://github.com/SBRG/Protein_ML). The workflow can be used as a template for analysis of further expression and solubility datasets. Supplementary data are available at Bioinformatics online.
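
    The key sequence properties named in the abstract (aromaticity, hydropathy, isoelectric point) can be computed directly with Biopython and fed to an off-the-shelf classifier. The sketch below shows that idea in miniature with made-up fragment sequences and labels; it is not the authors' GitHub workflow.

    # Minimal sketch: sequence-derived features (aromaticity, hydropathy, pI) for
    # expression/solubility classification. Sequences and labels are made up.
    from Bio.SeqUtils.ProtParam import ProteinAnalysis
    from sklearn.ensemble import RandomForestClassifier

    fragments = {
        "frag1": "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ",
        "frag2": "MWWFWPLLLLGSAYAWWRRPLLVVIIF",
        "frag3": "MSTNPKPQRKTKRNTNRRPQDVKFPGG",
        "frag4": "MDEEEDGAGAEESGQPRSFLRLNELSG",
    }
    labels = [1, 0, 1, 0]   # 1 = expressed/soluble, 0 = not (hypothetical)

    def features(seq):
        pa = ProteinAnalysis(seq)
        return [pa.aromaticity(), pa.gravy(), pa.isoelectric_point()]

    X = [features(seq) for seq in fragments.values()]
    clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, labels)
    print(clf.predict([features("MAHHHHHHSSGRENLYFQGIDPFT")]))   # score a new fragment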

  19. Essential attributes identified in the design of a Laboratory Information Management System for a high throughput siRNA screening laboratory.

    PubMed

    Grandjean, Geoffrey; Graham, Ryan; Bartholomeusz, Geoffrey

    2011-11-01

    In recent years, high-throughput screening operations have become a critical application in functional and translational research. Although a seemingly unmanageable amount of data is generated by these high-throughput, large-scale techniques, through careful planning an effective Laboratory Information Management System (LIMS) can be developed and implemented in order to streamline all phases of a workflow. Just as important as data mining and analysis procedures at the end of complex processes is the tracking of the individual steps of the applications that generate such data. Ultimately, the use of a customized LIMS will enable users to extract meaningful results from large datasets while trusting the robustness of their assays. To illustrate the design of a custom LIMS, a practical example is provided that highlights the important aspects of designing a LIMS to effectively manage all aspects of an siRNA screening service. This system incorporates inventory management, control of workflow, data handling, and interaction with investigators, statisticians and administrators. All these modules are regulated in a synchronous manner within the LIMS.

  20. Searching for microbial protein over-expression in a complex matrix using automated high throughput MS-based proteomics tools.

    PubMed

    Akeroyd, Michiel; Olsthoorn, Maurien; Gerritsma, Jort; Gutker-Vermaas, Diana; Ekkelkamp, Laurens; van Rij, Tjeerd; Klaassen, Paul; Plugge, Wim; Smit, Ed; Strupat, Kerstin; Wenzel, Thibaut; van Tilborg, Marcel; van der Hoeven, Rob

    2013-03-10

    In the discovery of new enzymes genomic and cDNA expression libraries containing thousands of differential clones are generated to obtain biodiversity. These libraries need to be screened for the activity of interest. Removing so-called empty and redundant clones significantly reduces the size of these expression libraries and therefore speeds up new enzyme discovery. Here, we present a sensitive, generic workflow for high throughput screening of successful microbial protein over-expression in microtiter plates containing a complex matrix based on mass spectrometry techniques. MALDI-LTQ-Orbitrap screening followed by principal component analysis and peptide mass fingerprinting was developed to obtain a throughput of ∼12,000 samples per week. Alternatively, a UHPLC-MS(2) approach including MS(2) protein identification was developed for microorganisms with a complex protein secretome with a throughput of ∼2000 samples per week. TCA-induced protein precipitation enhanced by addition of bovine serum albumin is used for protein purification prior to MS detection. We show that this generic workflow can effectively reduce large expression libraries from fungi and bacteria to their minimal size by detection of successful protein over-expression using MS. Copyright © 2012 Elsevier B.V. All rights reserved.
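
    Principal component analysis of the MALDI spectra, used in this screen to separate clones with successful over-expression from empty or redundant ones, can be pictured with a small scikit-learn sketch on a synthetic intensity matrix (one row per well, one column per m/z bin). The data, the spiked peak, and the outlier rule are placeholders for illustration only.

    # Sketch: PCA on a (wells x m/z bins) intensity matrix to flag over-expressing clones.
    # Intensities are synthetic placeholders, not real MALDI data.
    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(2)
    n_wells, n_bins = 96, 500
    spectra = rng.normal(10, 1, size=(n_wells, n_bins))   # background signal
    spectra[:10, 100:110] += 25                            # 10 wells "over-express" a peptide

    scores = PCA(n_components=2).fit_transform(spectra)

    # Wells far from the bulk along PC1 are candidate over-expressers (crude placeholder rule).
    dev = np.abs(scores[:, 0] - np.median(scores[:, 0]))
    hits = np.where(dev > 3 * np.median(dev))[0]
    print("candidate over-expressing wells:", hits.tolist())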

  1. High precision quantification of human plasma proteins using the automated SISCAPA Immuno-MS workflow.

    PubMed

    Razavi, Morteza; Leigh Anderson, N; Pope, Matthew E; Yip, Richard; Pearson, Terry W

    2016-09-25

    Efficient robotic workflows for trypsin digestion of human plasma and subsequent antibody-mediated peptide enrichment (the SISCAPA method) were developed with the goal of improving assay precision and throughput for multiplexed protein biomarker quantification. First, an 'addition only' tryptic digestion protocol was simplified from classical methods, eliminating the need for sample cleanup, while improving reproducibility, scalability and cost. Second, methods were developed to allow multiplexed enrichment and quantification of peptide surrogates of protein biomarkers representing a very broad range of concentrations and widely different molecular masses in human plasma. The total workflow coefficients of variation (including the 3 sequential steps of digestion, SISCAPA peptide enrichment and mass spectrometric analysis) for 5 proteotypic peptides measured in 6 replicates of each of 6 different samples repeated over 6 days averaged 3.4% within-run and 4.3% across all runs. An experiment to identify sources of variation in the workflow demonstrated that MRM measurement and tryptic digestion steps each had average CVs of ∼2.7%. Because of the high purity of the peptide analytes enriched by antibody capture, the liquid chromatography step is minimized and in some cases eliminated altogether, enabling throughput levels consistent with requirements of large biomarker and clinical studies. Copyright © 2016 Elsevier B.V. All rights reserved.
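
    The within-run and total coefficients of variation quoted above are standard summary statistics. The sketch below shows how such CVs could be computed from a replicate table (six replicates on each of six days, with invented values), purely to make the metric concrete; it is not the authors' analysis code.

    # Sketch: within-run and overall CV (%) from replicate peptide measurements.
    # The measurement table is invented for illustration.
    import numpy as np

    rng = np.random.default_rng(3)
    n_days, n_reps = 6, 6
    # measurements[d, r] = peptide response for replicate r on day d
    measurements = rng.normal(loc=100.0, scale=3.5, size=(n_days, n_reps))

    def cv_percent(x):
        return 100.0 * np.std(x, ddof=1) / np.mean(x)

    within_run = np.mean([cv_percent(day) for day in measurements])   # average per-day CV
    overall = cv_percent(measurements.ravel())                        # CV across all runs

    print(f"mean within-run CV: {within_run:.1f}%")
    print(f"overall CV:         {overall:.1f}%")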

  2. A plug-and-play pathway refactoring workflow for natural product research in Escherichia coli and Saccharomyces cerevisiae.

    PubMed

    Ren, Hengqian; Hu, Pingfan; Zhao, Huimin

    2017-08-01

    Pathway refactoring serves as an invaluable synthetic biology tool for natural product discovery, characterization, and engineering. However, the complicated and laborious molecular biology techniques involved largely hinder its application in natural product research, especially in a high-throughput manner. Here we report a plug-and-play pathway refactoring workflow for high-throughput, flexible pathway construction and expression in both Escherichia coli and Saccharomyces cerevisiae. Biosynthetic genes were first cloned into pre-assembled helper plasmids with promoters and terminators, resulting in a series of expression cassettes. These expression cassettes were further assembled using a Golden Gate reaction to generate fully refactored pathways. The inclusion of spacer plasmids in this system not only increases the flexibility for refactoring pathways with different numbers of genes, but also facilitates gene deletion and replacement. As proof of concept, a total of 96 pathways for combinatorial carotenoid biosynthesis were built successfully. This workflow should be generally applicable to different classes of natural products produced by various organisms. Biotechnol. Bioeng. 2017;114: 1847-1854. © 2017 Wiley Periodicals, Inc.
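
    The combinatorial pathway set follows directly from enumerating part choices for each expression cassette. A small sketch of that enumeration is given below; the promoter and gene names are generic placeholders (the carotenoid gene names are used only as labels), and the resulting count reflects the placeholder part lists rather than the 96 designs built in the study.

    # Sketch: enumerate combinatorial pathway designs from cassette part choices.
    # Part names are generic placeholders, not those used in the study.
    from itertools import product

    promoters = ["P1", "P2", "P3", "P4"]     # promoter options per cassette (assumed)
    genes = ["crtE", "crtB", "crtI"]         # carotenoid pathway genes (labels only)
    terminator = "T1"

    designs = []
    for promoter_choice in product(promoters, repeat=len(genes)):
        cassettes = [f"{p}-{g}-{terminator}" for p, g in zip(promoter_choice, genes)]
        designs.append(" | ".join(cassettes))

    print(len(designs), "pathway designs")   # 4^3 = 64 with these placeholder parts
    print(designs[0])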

  3. A Proteomic Workflow Using High-Throughput De Novo Sequencing Towards Complementation of Genome Information for Improved Comparative Crop Science.

    PubMed

    Turetschek, Reinhard; Lyon, David; Desalegn, Getinet; Kaul, Hans-Peter; Wienkoop, Stefanie

    2016-01-01

    The proteomic study of non-model organisms, such as many crop plants, is challenging due to the lack of comprehensive genome information. Changing environmental conditions require the study and selection of adapted cultivars. Mutations inherent to cultivars hamper protein identification and thus considerably complicate qualitative and quantitative comparison in large-scale systems biology approaches. With this workflow, cultivar-specific mutations are detected from high-throughput comparative MS analyses by extracting sequence polymorphisms with de novo sequencing. Stringent criteria are suggested to filter for high-confidence mutations. Subsequently, these polymorphisms complement the initially used database, which is then ready to use with any preferred database search algorithm. In our example, we thereby identified 26 specific mutations in two cultivars of Pisum sativum and achieved an increased number (17%) of peptide spectrum matches.

  4. Purdue ionomics information management system. An integrated functional genomics platform.

    PubMed

    Baxter, Ivan; Ouzzani, Mourad; Orcun, Seza; Kennedy, Brad; Jandhyala, Shrinivas S; Salt, David E

    2007-02-01

    The advent of high-throughput phenotyping technologies has created a deluge of information that is difficult to deal with without the appropriate data management tools. These data management tools should integrate defined workflow controls for genomic-scale data acquisition and validation, data storage and retrieval, and data analysis, indexed around the genomic information of the organism of interest. To maximize the impact of these large datasets, it is critical that they are rapidly disseminated to the broader research community, allowing open access for data mining and discovery. We describe here a system that incorporates such functionalities developed around the Purdue University high-throughput ionomics phenotyping platform. The Purdue Ionomics Information Management System (PiiMS) provides integrated workflow control, data storage, and analysis to facilitate high-throughput data acquisition, along with integrated tools for data search, retrieval, and visualization for hypothesis development. PiiMS is deployed as a World Wide Web-enabled system, allowing for integration of distributed workflow processes and open access to raw data for analysis by numerous laboratories. PiiMS currently contains data on shoot concentrations of P, Ca, K, Mg, Cu, Fe, Zn, Mn, Co, Ni, B, Se, Mo, Na, As, and Cd in over 60,000 shoot tissue samples of Arabidopsis (Arabidopsis thaliana), including ethyl methanesulfonate, fast-neutron and defined T-DNA mutants, and natural accession and populations of recombinant inbred lines from over 800 separate experiments, representing over 1,000,000 fully quantitative elemental concentrations. PiiMS is accessible at www.purdue.edu/dp/ionomics.

  5. New Tools For Understanding Microbial Diversity Using High-throughput Sequence Data

    NASA Astrophysics Data System (ADS)

    Knight, R.; Hamady, M.; Liu, Z.; Lozupone, C.

    2007-12-01

    High-throughput sequencing techniques such as 454 are straining the limits of tools traditionally used to build trees, choose OTUs, and perform other essential sequencing tasks. We have developed a workflow for phylogenetic analysis of large-scale sequence data sets that combines existing tools, such as the Arb phylogeny package and the NAST multiple sequence alignment tool, with new methods for choosing and clustering OTUs and for performing phylogenetic community analysis with UniFrac. This talk discusses the cyberinfrastructure we are developing to support the human microbiome project, and the application of these workflows to analyze very large data sets that contrast the gut microbiota with a range of physical environments. These tools will ultimately help to define core and peripheral microbiomes in a range of environments, and will allow us to understand the physical and biotic factors that contribute most to differences in microbial diversity.

  6. Optimizing CyberShake Seismic Hazard Workflows for Large HPC Resources

    NASA Astrophysics Data System (ADS)

    Callaghan, S.; Maechling, P. J.; Juve, G.; Vahi, K.; Deelman, E.; Jordan, T. H.

    2014-12-01

    The CyberShake computational platform is a well-integrated collection of scientific software and middleware that calculates 3D simulation-based probabilistic seismic hazard curves and hazard maps for the Los Angeles region. Currently each CyberShake model comprises about 235 million synthetic seismograms from about 415,000 rupture variations computed at 286 sites. CyberShake integrates large-scale parallel and high-throughput serial seismological research codes into a processing framework in which early stages produce files used as inputs by later stages. Scientific workflow tools are used to manage the jobs, data, and metadata. The Southern California Earthquake Center (SCEC) developed the CyberShake platform using USC High Performance Computing and Communications systems and open-science NSF resources. CyberShake calculations were migrated to the NSF Track 1 system NCSA Blue Waters when it became operational in 2013, via an interdisciplinary team approach including domain scientists, computer scientists, and middleware developers. Due to the excellent performance of Blue Waters and CyberShake software optimizations, we reduced the makespan (a measure of wallclock time-to-solution) of a CyberShake study from 1467 to 342 hours. We will describe the technical enhancements behind this improvement, including judicious introduction of new GPU software, improved scientific software components, increased workflow-based automation, and Blue Waters-specific workflow optimizations. Our CyberShake performance improvements highlight the benefits of scientific workflow tools. The CyberShake workflow software stack includes the Pegasus Workflow Management System (Pegasus-WMS, which includes Condor DAGMan), HTCondor, and Globus GRAM, with Pegasus-mpi-cluster managing the high-throughput tasks on the HPC resources. The workflow tools handle data management, automatically transferring about 13 TB back to SCEC storage. We will present performance metrics from the most recent CyberShake study, executed on Blue Waters. We will compare the performance of CPU and GPU versions of our large-scale parallel wave propagation code, AWP-ODC-SGT. Finally, we will discuss how these enhancements have enabled SCEC to move forward with plans to increase the CyberShake simulation frequency to 1.0 Hz.

  7. Purdue Ionomics Information Management System. An Integrated Functional Genomics Platform1[C][W][OA

    PubMed Central

    Baxter, Ivan; Ouzzani, Mourad; Orcun, Seza; Kennedy, Brad; Jandhyala, Shrinivas S.; Salt, David E.

    2007-01-01

    The advent of high-throughput phenotyping technologies has created a deluge of information that is difficult to deal with without the appropriate data management tools. These data management tools should integrate defined workflow controls for genomic-scale data acquisition and validation, data storage and retrieval, and data analysis, indexed around the genomic information of the organism of interest. To maximize the impact of these large datasets, it is critical that they are rapidly disseminated to the broader research community, allowing open access for data mining and discovery. We describe here a system that incorporates such functionalities developed around the Purdue University high-throughput ionomics phenotyping platform. The Purdue Ionomics Information Management System (PiiMS) provides integrated workflow control, data storage, and analysis to facilitate high-throughput data acquisition, along with integrated tools for data search, retrieval, and visualization for hypothesis development. PiiMS is deployed as a World Wide Web-enabled system, allowing for integration of distributed workflow processes and open access to raw data for analysis by numerous laboratories. PiiMS currently contains data on shoot concentrations of P, Ca, K, Mg, Cu, Fe, Zn, Mn, Co, Ni, B, Se, Mo, Na, As, and Cd in over 60,000 shoot tissue samples of Arabidopsis (Arabidopsis thaliana), including ethyl methanesulfonate, fast-neutron and defined T-DNA mutants, and natural accession and populations of recombinant inbred lines from over 800 separate experiments, representing over 1,000,000 fully quantitative elemental concentrations. PiiMS is accessible at www.purdue.edu/dp/ionomics. PMID:17189337

  8. Developing science gateways for drug discovery in a grid environment.

    PubMed

    Pérez-Sánchez, Horacio; Rezaei, Vahid; Mezhuyev, Vitaliy; Man, Duhu; Peña-García, Jorge; den-Haan, Helena; Gesing, Sandra

    2016-01-01

    Methods for in silico screening of large databases of molecules increasingly complement and replace experimental techniques to discover novel compounds to combat diseases. As these techniques become more complex and computationally costly we are faced with an increasing problem to provide the research community of life sciences with a convenient tool for high-throughput virtual screening on distributed computing resources. To this end, we recently integrated the biophysics-based drug-screening program FlexScreen into a service, applicable for large-scale parallel screening and reusable in the context of scientific workflows. Our implementation is based on Pipeline Pilot and Simple Object Access Protocol and provides an easy-to-use graphical user interface to construct complex workflows, which can be executed on distributed computing resources, thus accelerating the throughput by several orders of magnitude.

  9. Toward Streamlined Identification of Dioxin-like Compounds in Environmental Samples through Integration of Suspension Bioassay.

    PubMed

    Xiao, Hongxia; Brinkmann, Markus; Thalmann, Beat; Schiwy, Andreas; Große Brinkhaus, Sigrid; Achten, Christine; Eichbaum, Kathrin; Gembé, Carolin; Seiler, Thomas-Benjamin; Hollert, Henner

    2017-03-21

    Effect-directed analysis (EDA) is a powerful strategy to identify biologically active compounds in environmental samples. However, in current EDA studies, fractionation and handling procedures are laborious, consist of multiple evaporation steps, and thus bear the risk of contamination and decreased recoveries of the target compounds. The low resulting throughput has been one of the major bottlenecks of EDA. Here, we propose a high-throughput EDA (HT-EDA) work-flow combining reversed phase high-performance liquid chromatography fractionation of samples into 96-well microplates, followed by toxicity assessment in the micro-EROD bioassay with the wild-type rat hepatoma H4IIE cells, and chemical analysis of bioactive fractions. The approach was evaluated using single substances, binary mixtures, and extracts of sediment samples collected at the Three Gorges Reservoir, Yangtze River, China, as well as the rivers Rhine and Elbe, Germany. Selected bioactive fractions were analyzed by highly sensitive gas chromatography-atmospheric pressure laser ionization-time-of-flight-mass spectrometry. In addition, we optimized the work-flow by seeding previously adapted suspension-cultured H4IIE cells directly into the microplate used for fractionation, which makes any transfers of fractionated samples unnecessary. The proposed HT-EDA work-flow simplifies the procedure for wider application in ecotoxicology and environmental routine programs.

  10. Parallel Workflow for High-Throughput (>1,000 Samples/Day) Quantitative Analysis of Human Insulin-Like Growth Factor 1 Using Mass Spectrometric Immunoassay

    PubMed Central

    Oran, Paul E.; Trenchevska, Olgica; Nedelkov, Dobrin; Borges, Chad R.; Schaab, Matthew R.; Rehder, Douglas S.; Jarvis, Jason W.; Sherma, Nisha D.; Shen, Luhui; Krastins, Bryan; Lopez, Mary F.; Schwenke, Dawn C.; Reaven, Peter D.; Nelson, Randall W.

    2014-01-01

    Insulin-like growth factor 1 (IGF1) is an important biomarker for the management of growth hormone disorders. Recently there has been rising interest in deploying mass spectrometric (MS) methods of detection for measuring IGF1. However, widespread clinical adoption of any MS-based IGF1 assay will require increased throughput and speed to justify the costs of analyses, and robust industrial platforms that are reproducible across laboratories. Presented here is an MS-based quantitative IGF1 assay with performance rating of >1,000 samples/day, and a capability of quantifying IGF1 point mutations and posttranslational modifications. The throughput of the IGF1 mass spectrometric immunoassay (MSIA) benefited from a simplified sample preparation step, IGF1 immunocapture in a tip format, and high-throughput MALDI-TOF MS analysis. The Limit of Detection and Limit of Quantification of the resulting assay were 1.5 μg/L and 5 μg/L, respectively, with intra- and inter-assay precision CVs of less than 10%, and good linearity and recovery characteristics. The IGF1 MSIA was benchmarked against commercially available IGF1 ELISA via Bland-Altman method comparison test, resulting in a slight positive bias of 16%. The IGF1 MSIA was employed in an optimized parallel workflow utilizing two pipetting robots and MALDI-TOF-MS instruments synced into one-hour phases of sample preparation, extraction and MSIA pipette tip elution, MS data collection, and data processing. Using this workflow, high-throughput IGF1 quantification of 1,054 human samples was achieved in approximately 9 hours. This rate of assaying is a significant improvement over existing MS-based IGF1 assays, and is on par with that of the enzyme-based immunoassays. Furthermore, a mutation was detected in ∼1% of the samples (SNP: rs17884626, creating an A→T substitution at position 67 of the IGF1), demonstrating the capability of IGF1 MSIA to detect point mutations and posttranslational modifications. PMID:24664114

  11. Solar fuels photoanode materials discovery by integrating high-throughput theory and experiment

    DOE PAGES

    Yan, Qimin; Yu, Jie; Suram, Santosh K.; ...

    2017-03-06

    The limited number of known low-band-gap photoelectrocatalytic materials poses a significant challenge for the generation of chemical fuels from sunlight. Here, using high-throughput ab initio theory with experiments in an integrated workflow, we find eight ternary vanadate oxide photoanodes in the target band-gap range (1.2-2.8 eV). Detailed analysis of these vanadate compounds reveals the key role of VO₄ structural motifs and electronic band-edge character in efficient photoanodes, initiating a genome for such materials and paving the way for a broadly applicable high-throughput-discovery and materials-by-design feedback loop. Considerably expanding the number of known photoelectrocatalysts for water oxidation, our study establishes ternary metal vanadates as a prolific class of photoanode materials for generation of chemical fuels from sunlight and demonstrates our high-throughput theory-experiment pipeline as a prolific approach to materials discovery.
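
    Selecting candidates in the 1.2-2.8 eV target window from a table of computed band gaps is the simplest part of such a theory-experiment pipeline. A schematic filter is sketched below; the compositions and gap values are invented placeholders, not results from the study.

    # Schematic filter: keep candidate photoanodes with computed band gaps in 1.2-2.8 eV.
    # Compositions and gap values are invented for illustration.
    candidates = {
        "AVO4_1": 2.4,
        "AVO4_2": 3.3,
        "AVO4_3": 1.9,
        "AVO4_4": 0.9,
    }

    GAP_MIN_EV, GAP_MAX_EV = 1.2, 2.8

    hits = {name: gap for name, gap in candidates.items() if GAP_MIN_EV <= gap <= GAP_MAX_EV}
    for name, gap in sorted(hits.items(), key=lambda kv: kv[1]):
        print(f"{name}: {gap:.1f} eV -> forward to experimental screening")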

  12. Solar fuels photoanode materials discovery by integrating high-throughput theory and experiment

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yan, Qimin; Yu, Jie; Suram, Santosh K.

    The limited number of known low-band-gap photoelectrocatalytic materials poses a significant challenge for the generation of chemical fuels from sunlight. Here, using high-throughput ab initio theory with experiments in an integrated workflow, we find eight ternary vanadate oxide photoanodes in the target band-gap range (1.2-2.8 eV). Detailed analysis of these vanadate compounds reveals the key role of VO₄ structural motifs and electronic band-edge character in efficient photoanodes, initiating a genome for such materials and paving the way for a broadly applicable high-throughput-discovery and materials-by-design feedback loop. Considerably expanding the number of known photoelectrocatalysts for water oxidation, our study establishes ternary metal vanadates as a prolific class of photoanode materials for generation of chemical fuels from sunlight and demonstrates our high-throughput theory-experiment pipeline as a prolific approach to materials discovery.

  13. Recent developments in software tools for high-throughput in vitro ADME support with high-resolution MS.

    PubMed

    Paiva, Anthony; Shou, Wilson Z

    2016-08-01

    The last several years have seen the rapid adoption of high-resolution MS (HRMS) for bioanalytical support of high-throughput in vitro ADME profiling. Many capable software tools have been developed and refined to process quantitative HRMS bioanalysis data for ADME samples with excellent performance. Additionally, new software applications specifically designed for quan/qual soft spot identification workflows using HRMS have greatly enhanced the quality and efficiency of the structure elucidation process for high-throughput metabolite ID in early in vitro ADME profiling. Finally, novel approaches in data acquisition and compression, as well as tools for transferring, archiving and retrieving HRMS data, are being continuously refined to tackle the issue of the large data file sizes typical of HRMS analyses.

  14. Validation of high-throughput single cell analysis methodology.

    PubMed

    Devonshire, Alison S; Baradez, Marc-Olivier; Morley, Gary; Marshall, Damian; Foy, Carole A

    2014-05-01

    High-throughput quantitative polymerase chain reaction (qPCR) approaches enable profiling of multiple genes in single cells, bringing new insights to complex biological processes and offering opportunities for single cell-based monitoring of cancer cells and stem cell-based therapies. However, workflows with well-defined sources of variation are required for clinical diagnostics and testing of tissue-engineered products. In a study of neural stem cell lines, we investigated the performance of lysis, reverse transcription (RT), preamplification (PA), and nanofluidic qPCR steps at the single cell level in terms of efficiency, precision, and limit of detection. We compared protocols using a separate lysis buffer with cell capture directly in RT-PA reagent. The two methods were found to have similar lysis efficiencies, whereas the direct RT-PA approach showed improved precision. Digital PCR was used to relate preamplified template copy numbers to Cq values and reveal where low-quality signals may affect the analysis. We investigated the impact of calibration and data normalization strategies as a means of minimizing the impact of inter-experimental variation on gene expression values and found that both approaches can improve data comparability. This study provides validation and guidance for the application of high-throughput qPCR workflows for gene expression profiling of single cells. Copyright © 2014 Elsevier Inc. All rights reserved.
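
    The digital-PCR calibration described above amounts to fitting a standard curve relating Cq to template copy number; from its slope one also obtains the amplification efficiency. A generic sketch with synthetic dilution-series values follows, to make those relationships concrete (it is not the authors' analysis).

    # Sketch: qPCR standard curve (Cq vs. log10 copies) and amplification efficiency.
    # Dilution-series values are synthetic.
    import numpy as np

    copies = np.array([1e1, 1e2, 1e3, 1e4, 1e5])
    cq     = np.array([33.1, 29.8, 26.4, 23.1, 19.7])   # measured Cq per dilution (synthetic)

    slope, intercept = np.polyfit(np.log10(copies), cq, 1)
    efficiency = 10 ** (-1.0 / slope) - 1.0              # ideal doubling gives ~1.0 (100%)

    print(f"slope = {slope:.2f}, intercept = {intercept:.1f}")
    print(f"amplification efficiency ~ {100*efficiency:.0f}%")

    # Convert an unknown sample's Cq back to an estimated copy number via the curve.
    cq_unknown = 24.5
    copies_unknown = 10 ** ((cq_unknown - intercept) / slope)
    print(f"Cq {cq_unknown} -> ~{copies_unknown:.0f} template copies")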

  15. speaq 2.0: A complete workflow for high-throughput 1D NMR spectra processing and quantification.

    PubMed

    Beirnaert, Charlie; Meysman, Pieter; Vu, Trung Nghia; Hermans, Nina; Apers, Sandra; Pieters, Luc; Covaci, Adrian; Laukens, Kris

    2018-03-01

    Nuclear Magnetic Resonance (NMR) spectroscopy is, together with liquid chromatography-mass spectrometry (LC-MS), the most established platform to perform metabolomics. In contrast to LC-MS however, NMR data is predominantly being processed with commercial software. Meanwhile its data processing remains tedious and dependent on user interventions. As a follow-up to speaq, a previously released workflow for NMR spectral alignment and quantitation, we present speaq 2.0. This completely revised framework to automatically analyze 1D NMR spectra uses wavelets to efficiently summarize the raw spectra with minimal information loss or user interaction. The tool offers a fast and easy workflow that starts with the common approach of peak-picking, followed by grouping, thus avoiding the binning step. This yields a matrix consisting of features, samples and peak values that can be conveniently processed either by using included multivariate statistical functions or by using many other recently developed methods for NMR data analysis. speaq 2.0 facilitates robust and high-throughput metabolomics based on 1D NMR but is also compatible with other NMR frameworks or complementary LC-MS workflows. The methods are benchmarked using a simulated dataset and two publicly available datasets. speaq 2.0 is distributed through the existing speaq R package to provide a complete solution for NMR data processing. The package and the code for the presented case studies are freely available on CRAN (https://cran.r-project.org/package=speaq) and GitHub (https://github.com/beirnaert/speaq).
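
    speaq 2.0 itself is distributed as an R package; the peak-picking-then-grouping idea it implements can nevertheless be sketched generically in Python with SciPy. The sketch below uses synthetic spectra and a simple ppm-tolerance grouping rule purely to illustrate the concept; it does not reproduce the package's wavelet-based method.

    # Generic sketch of peak picking followed by cross-sample grouping (not the speaq algorithm).
    # Spectra are synthetic; grouping is a simple ppm-tolerance rule for illustration.
    import numpy as np
    from scipy.signal import find_peaks

    rng = np.random.default_rng(4)
    ppm = np.linspace(0, 10, 2000)

    def synthetic_spectrum(shift):
        signal = np.zeros_like(ppm)
        for center in (1.3 + shift, 3.7 + shift, 7.2 + shift):
            signal += np.exp(-((ppm - center) ** 2) / (2 * 0.01 ** 2))
        return signal + rng.normal(0, 0.01, ppm.size)

    spectra = [synthetic_spectrum(s) for s in (0.0, 0.005, -0.004)]   # 3 samples, slight misalignment

    # Peak picking per sample, then grouping peaks that fall within 0.02 ppm of each other.
    all_peaks = []
    for sample_id, spec in enumerate(spectra):
        idx, _ = find_peaks(spec, height=0.3)
        all_peaks += [(ppm[i], sample_id) for i in idx]

    all_peaks.sort()
    groups, current = [], [all_peaks[0]]
    for peak in all_peaks[1:]:
        if peak[0] - current[-1][0] <= 0.02:
            current.append(peak)
        else:
            groups.append(current)
            current = [peak]
    groups.append(current)

    for g in groups:
        samples_seen = len({sample_id for _, sample_id in g})
        print(f"feature at ~{np.mean([p for p, _ in g]):.2f} ppm seen in {samples_seen} spectra")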

  16. speaq 2.0: A complete workflow for high-throughput 1D NMR spectra processing and quantification

    PubMed Central

    Pieters, Luc; Covaci, Adrian

    2018-01-01

    Nuclear Magnetic Resonance (NMR) spectroscopy is, together with liquid chromatography-mass spectrometry (LC-MS), the most established platform to perform metabolomics. In contrast to LC-MS however, NMR data is predominantly being processed with commercial software. Meanwhile its data processing remains tedious and dependent on user interventions. As a follow-up to speaq, a previously released workflow for NMR spectral alignment and quantitation, we present speaq 2.0. This completely revised framework to automatically analyze 1D NMR spectra uses wavelets to efficiently summarize the raw spectra with minimal information loss or user interaction. The tool offers a fast and easy workflow that starts with the common approach of peak-picking, followed by grouping, thus avoiding the binning step. This yields a matrix consisting of features, samples and peak values that can be conveniently processed either by using included multivariate statistical functions or by using many other recently developed methods for NMR data analysis. speaq 2.0 facilitates robust and high-throughput metabolomics based on 1D NMR but is also compatible with other NMR frameworks or complementary LC-MS workflows. The methods are benchmarked using a simulated dataset and two publicly available datasets. speaq 2.0 is distributed through the existing speaq R package to provide a complete solution for NMR data processing. The package and the code for the presented case studies are freely available on CRAN (https://cran.r-project.org/package=speaq) and GitHub (https://github.com/beirnaert/speaq). PMID:29494588

  17. Nanosurveyor: a framework for real-time data processing

    DOE PAGES

    Daurer, Benedikt J.; Krishnan, Hari; Perciano, Talita; ...

    2017-01-31

    Background: The ever-improving brightness of accelerator-based sources is enabling novel observations and discoveries with faster frame rates, larger fields of view, higher resolution, and higher dimensionality. Results: Here we present an integrated software/algorithmic framework designed to capitalize on high-throughput experiments through efficient kernels and load-balanced workflows that are scalable by design. We describe the streamlined processing pipeline for ptychography data analysis. Conclusions: The pipeline provides throughput, compression, and resolution as well as rapid feedback to the microscope operators.

  18. Microreactor Cells for High-Throughput X-ray Absorption Spectroscopy

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Beesley, Angela; Tsapatsaris, Nikolaos; Weiher, Norbert

    2007-01-19

    High-throughput experimentation has been applied to X-ray absorption spectroscopy as a novel route for increasing research productivity in the catalysis community. Suitable instrumentation has been developed for the rapid determination of the local structure of the metal component of precursors for supported catalysts. An automated analytical workflow was implemented that is much faster than traditional individual spectrum analysis. It allows the generation of structural data in quasi-real time. We describe initial results obtained from the automated high-throughput (HT) data reduction and analysis of a sample library implemented in the industry-standard 96-well plate format. The results show that a fully automated HT-XAS technology based on existing industry standards is feasible and useful for the rapid elucidation of the geometric and electronic structure of materials.

  19. Data partitioning enables the use of standard SOAP Web Services in genome-scale workflows.

    PubMed

    Sztromwasser, Pawel; Puntervoll, Pål; Petersen, Kjell

    2011-07-26

    Biological databases and computational biology tools are provided by research groups around the world, and made accessible on the Web. Combining these resources is a common practice in bioinformatics, but integration of heterogeneous and often distributed tools and datasets can be challenging. To date, this challenge has been commonly addressed in a pragmatic way, by tedious and error-prone scripting. Recently, however, a more reliable technique has been identified and proposed as the platform to tie together bioinformatics resources, namely Web Services. In the last decade, Web Services have spread widely in bioinformatics and earned the status of a recommended technology. However, in the era of high-throughput experimentation, a major concern regarding Web Services is their ability to handle large-scale data traffic. We propose a stream-like communication pattern for standard SOAP Web Services that enables efficient flow of large data traffic between a workflow orchestrator and Web Services. We evaluated the data-partitioning strategy by comparing it with typical communication patterns on an example pipeline for genomic sequence annotation. The results show that data partitioning lowers the resource demands of services and increases their throughput, which in turn makes it possible to execute in silico experiments at genome scale using standard SOAP Web Services and workflows. As a proof of principle, we annotated an RNA-seq dataset using a plain BPEL workflow engine.
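
    The data-partitioning pattern described above, streaming a large dataset to a request-response service in manageable chunks, can be illustrated independently of SOAP with a short generator-based sketch. The chunk size and the annotate_chunk() function are placeholders standing in for the real Web Service invocation.

    # Sketch of the data-partitioning pattern: stream a large sequence set to a
    # request-response service in fixed-size chunks. annotate_chunk() is a placeholder
    # standing in for the real (e.g. SOAP) service call.
    from typing import Iterable, Iterator, List

    def partition(items: Iterable[str], chunk_size: int) -> Iterator[List[str]]:
        chunk = []
        for item in items:
            chunk.append(item)
            if len(chunk) == chunk_size:
                yield chunk
                chunk = []
        if chunk:
            yield chunk

    def annotate_chunk(sequences: List[str]) -> List[str]:
        # Placeholder: in the real workflow this would be one Web Service request.
        return [f"{seq}\tannotated" for seq in sequences]

    def annotate_genome(sequences: Iterable[str], chunk_size: int = 100) -> Iterator[str]:
        for chunk in partition(sequences, chunk_size):
            yield from annotate_chunk(chunk)   # service sees small requests, caller sees a stream

    if __name__ == "__main__":
        fake_sequences = (f"contig_{i}" for i in range(250))
        print(sum(1 for _ in annotate_genome(fake_sequences)), "records annotated")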

  20. How Can We Better Detect Unauthorized GMOs in Food and Feed Chains?

    PubMed

    Fraiture, Marie-Alice; Herman, Philippe; De Loose, Marc; Debode, Frédéric; Roosens, Nancy H

    2017-06-01

    Current GMO detection systems have limited abilities to detect unauthorized genetically modified organisms (GMOs). Here, we propose a new workflow, based on next-generation sequencing (NGS) technology, to overcome this problem. In providing information about DNA sequences, this high-throughput workflow can distinguish authorized and unauthorized GMOs by strengthening the tools commonly used by enforcement laboratories with the help of NGS technology. In addition, thanks to its massive sequencing capacity, this workflow could be used to monitor GMOs present in the food and feed chain. In view of its potential implementation by enforcement laboratories, we discuss this innovative approach, its current limitations, and its sustainability of use over time. Copyright © 2017 Elsevier Ltd. All rights reserved.

  1. A Workflow to Investigate Exposure and Pharmacokinetic Influences on High-Throughput in Vitro Chemical Screening Based on Adverse Outcome Pathways

    EPA Science Inventory

    Background: Adverse outcome pathways (AOPs) link adverse effects in individuals or populations to a molecular initiating event (MIE) that can be quantified using in vitro methods. Practical application of AOPs in chemical-specific risk assessment requires incorporation of knowled...

  2. High-throughput Titration of Luciferase-expressing Recombinant Viruses

    PubMed Central

    Garcia, Vanessa; Krishnan, Ramya; Davis, Colin; Batenchuk, Cory; Le Boeuf, Fabrice; Abdelbary, Hesham; Diallo, Jean-Simon

    2014-01-01

    Standard plaque assays to determine infectious viral titers can be time consuming, are not amenable to a high volume of samples, and cannot be done with viruses that do not form plaques. As an alternative to plaque assays, we have developed a high-throughput titration method that allows for the simultaneous titration of a high volume of samples in a single day. This approach involves infection of the samples with a Firefly luciferase tagged virus, transfer of the infected samples onto an appropriate permissive cell line, subsequent addition of luciferin, reading of plates in order to obtain luminescence readings, and finally the conversion from luminescence to viral titers. The assessment of cytotoxicity using a metabolic viability dye can be easily incorporated in the workflow in parallel and provide valuable information in the context of a drug screen. This technique provides a reliable, high-throughput method to determine viral titers as an alternative to a standard plaque assay. PMID:25285536
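
    The final conversion from luminescence readings to infectious titers relies on a standard curve prepared from a virus stock of known titer. A generic log-log interpolation sketch is shown below to make that step concrete; the standard-curve and sample values are invented and do not come from the protocol.

    # Sketch: convert luminescence readings to viral titers via a log-log standard curve.
    # Standard-curve and sample values are invented for illustration.
    import numpy as np

    # Standard curve: known titers (PFU/mL) and their measured luminescence (RLU).
    std_titer = np.array([1e3, 1e4, 1e5, 1e6, 1e7])
    std_rlu   = np.array([2.1e2, 1.9e3, 2.2e4, 2.0e5, 1.8e6])

    slope, intercept = np.polyfit(np.log10(std_rlu), np.log10(std_titer), 1)

    def rlu_to_titer(rlu):
        """Interpolate an unknown sample's titer from its luminescence reading."""
        return 10 ** (slope * np.log10(rlu) + intercept)

    for rlu in (5.0e3, 7.5e4):
        print(f"{rlu:9.1f} RLU -> ~{rlu_to_titer(rlu):.2e} PFU/mL")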

  3. [Weighted gene co-expression network analysis in biomedicine research].

    PubMed

    Liu, Wei; Li, Li; Ye, Hua; Tu, Wei

    2017-11-25

    High-throughput biological technologies are now widely applied in biology and medicine, allowing scientists to monitor thousands of parameters simultaneously in a specific sample. However, it is still an enormous challenge to mine useful information from high-throughput data. The emergence of network biology provides deeper insights into complex bio-system and reveals the modularity in tissue/cellular networks. Correlation networks are increasingly used in bioinformatics applications. Weighted gene co-expression network analysis (WGCNA) tool can detect clusters of highly correlated genes. Therefore, we systematically reviewed the application of WGCNA in the study of disease diagnosis, pathogenesis and other related fields. First, we introduced principle, workflow, advantages and disadvantages of WGCNA. Second, we presented the application of WGCNA in disease, physiology, drug, evolution and genome annotation. Then, we indicated the application of WGCNA in newly developed high-throughput methods. We hope this review will help to promote the application of WGCNA in biomedicine research.

  4. Scrambled eggs: A highly sensitive molecular diagnostic workflow for Fasciola species specific detection from faecal samples.

    PubMed

    Calvani, Nichola Eliza Davies; Windsor, Peter Andrew; Bush, Russell David; Šlapeta, Jan

    2017-09-01

    Fasciolosis, due to Fasciola hepatica and Fasciola gigantica, is a re-emerging zoonotic parasitic disease of worldwide importance. Human and animal infections are commonly diagnosed by the traditional sedimentation and faecal egg-counting technique. However, this technique is time-consuming and prone to sensitivity errors when a large number of samples must be processed or if the operator lacks sufficient experience. Additionally, diagnosis can only be made once the 12-week pre-patent period has passed. Recently, a commercially available coprological antigen ELISA has enabled detection of F. hepatica prior to the completion of the pre-patent period, providing earlier diagnosis and increased throughput, although species differentiation is not possible in areas of parasite sympatry. Real-time PCR offers the combined benefits of highly sensitive species differentiation for medium to large sample sizes. However, no molecular diagnostic workflow currently exists for the identification of Fasciola spp. in faecal samples. A new molecular diagnostic workflow for the highly-sensitive detection and quantification of Fasciola spp. in faecal samples was developed. The technique involves sedimenting and pelleting the samples prior to DNA isolation in order to concentrate the eggs, followed by disruption by bead-beating in a benchtop homogeniser to ensure access to DNA. Although both the new molecular workflow and the traditional sedimentation technique were sensitive and specific, the new molecular workflow enabled faster sample throughput in medium to large epidemiological studies, and provided the additional benefit of speciation. Further, good correlation (R2 = 0.74-0.76) was observed between the real-time PCR values and the faecal egg count (FEC) using the new molecular workflow for all herds and sampling periods. Finally, no effect of storage in 70% ethanol was detected on sedimentation and DNA isolation outcomes; enabling transport of samples from endemic to non-endemic countries without the requirement of a complete cold chain. The commercially-available ELISA displayed poorer sensitivity, even after adjustment of the positive threshold (65-88%), compared to the sensitivity (91-100%) of the new molecular diagnostic workflow. Species-specific assays for sensitive detection of Fasciola spp. enable ante-mortem diagnosis in both human and animal settings. This includes Southeast Asia where there are potentially many undocumented human cases and where post-mortem examination of production animals can be difficult. The new molecular workflow provides a sensitive and quantitative diagnostic approach for the rapid testing of medium to large sample sizes, potentially superseding the traditional sedimentation and FEC technique and enabling surveillance programs in locations where animal and human health funding is limited.

  5. A Novel Two-Step Hierarchical Quantitative Structure-Activity Relationship Modeling Workflow for Predicting Acute Toxicity of Chemicals in Rodents

    EPA Science Inventory

    Background: Accurate prediction of in vivo toxicity from in vitro testing is a challenging problem. Large public–private consortia have been formed with the goal of improving chemical safety assessment by means of high-throughput screening. Methods and results: A database co...

  6. Arbovirus Detection in Insect Vectors by Rapid, High-Throughput Pyrosequencing

    DTIC Science & Technology

    2010-11-09

    large contigs that by BLAST had their best hit to an rRNA of various fungal origins (including the genera Penicillium and Aspergillus) and in all five... Bioinformatic workflows need to be streamlined for non-expert users. For this approach to ever become part of the public health arsenal, our calculations

  7. An open workflow to generate “MS Ready” structures and improve non-targeted mass spectrometry (ACS Fall 1 of 3)

    EPA Science Inventory

    High-throughput non-targeted analyses (NTA) rely on chemical reference databases for tentative identification of observed chemical features. Many of these databases and online resources incorporate chemical structure data not in a form that is readily observed by mass spectromet...

  8. Droplet microfluidic technology for single-cell high-throughput screening.

    PubMed

    Brouzes, Eric; Medkova, Martina; Savenelli, Neal; Marran, Dave; Twardowski, Mariusz; Hutchison, J Brian; Rothberg, Jonathan M; Link, Darren R; Perrimon, Norbert; Samuels, Michael L

    2009-08-25

    We present a droplet-based microfluidic technology that enables high-throughput screening of single mammalian cells. This integrated platform allows for the encapsulation of single cells and reagents in independent aqueous microdroplets (1 pL to 10 nL volumes) dispersed in an immiscible carrier oil, and enables the digital manipulation of these reactors at very high throughput. Here, we validate a full droplet screening workflow by conducting a droplet-based cytotoxicity screen. To perform this screen, we first developed a droplet viability assay that permits the quantitative scoring of cell viability and growth within intact droplets. Next, we demonstrated the high viability of encapsulated human monocytic U937 cells over a period of 4 days. Finally, we developed an optically coded droplet library enabling the identification of each droplet's composition during the assay read-out. Using the integrated droplet technology, we screened a drug library for its cytotoxic effect against U937 cells. Taken together, our droplet microfluidic platform is modular, robust, uses no moving parts, and has a wide range of potential applications including high-throughput single-cell analyses, combinatorial screening, and facilitating small-sample analyses.
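
    Single-cell encapsulation in such droplet platforms is commonly described by Poisson loading statistics. The sketch below is background illustration rather than anything from the paper; it estimates the fractions of empty, single-cell and multi-cell droplets for an assumed mean occupancy.

```python
from math import exp, factorial

def poisson_pmf(k: int, lam: float) -> float:
    """Probability that a droplet contains exactly k cells under Poisson loading."""
    return lam**k * exp(-lam) / factorial(k)

lam = 0.3  # assumed mean number of cells per droplet (dilute loading)
p0, p1 = poisson_pmf(0, lam), poisson_pmf(1, lam)
print(f"empty droplets: {p0:.1%}, single-cell droplets: {p1:.1%}, "
      f"multi-cell droplets: {1 - p0 - p1:.1%}")
```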

  9. MG-RAST version 4-lessons learned from a decade of low-budget ultra-high-throughput metagenome analysis.

    PubMed

    Meyer, Folker; Bagchi, Saurabh; Chaterji, Somali; Gerlach, Wolfgang; Grama, Ananth; Harrison, Travis; Paczian, Tobias; Trimble, William L; Wilke, Andreas

    2017-09-26

    As technologies change, MG-RAST is adapting. Newly available software is being included to improve accuracy and performance. As a computational service constantly running large-volume scientific workflows, MG-RAST is the right location to perform benchmarking and implement algorithmic or platform improvements, in many cases involving trade-offs between specificity, sensitivity and run-time cost. The work in [Glass EM, Dribinsky Y, Yilmaz P, et al. ISME J 2014;8:1-3] is an example; we use existing, well-studied data sets as gold standards representing different environments and different technologies to evaluate any changes to the pipeline. Currently, we use well-understood data sets in MG-RAST as a platform for benchmarking. The use of artificial data sets for pipeline performance optimization has not added value, as these data sets do not present the same challenges as real-world data sets. In addition, the MG-RAST team welcomes suggestions for improvements of the workflow. We are currently working on versions 4.02 and 4.1, both of which contain significant input from the community and our partners; they will enable double barcoding and stronger inferences supported by longer-read technologies, and will increase throughput while maintaining sensitivity by using Diamond and SortMeRNA. On the technical platform side, the MG-RAST team intends to support the Common Workflow Language as a standard to specify bioinformatics workflows, both to facilitate development and to enable efficient high-performance implementation of the community's data analysis tasks. Published by Oxford University Press on behalf of Entomological Society of America 2017. This work is written by US Government employees and is in the public domain in the US.

  10. Automated high throughput microscale antibody purification workflows for accelerating antibody discovery

    PubMed Central

    Luan, Peng; Lee, Sophia; Paluch, Maciej; Kansopon, Joe; Viajar, Sharon; Begum, Zahira; Chiang, Nancy; Nakamura, Gerald; Hass, Philip E.; Wong, Athena W.; Lazar, Greg A.

    2018-01-01

    To rapidly find “best-in-class” antibody therapeutics, it has become essential to develop high-throughput (HTP) processes that allow rapid assessment of antibodies for functional and molecular properties. Consequently, it is critical to have access to sufficient amounts of high-quality antibody to carry out accurate and quantitative characterization. We have developed automated workflows using liquid-handling systems to conduct affinity-based purification in either batch or tip-column mode. Here, we demonstrate the capability to purify >2000 antibodies per day from microscale (1 mL) cultures. Our optimized, automated process for human IgG1 purification using MabSelect SuRe resin achieves ∼70% recovery over a wide range of antibody loads, up to 500 µg. This HTP process works well for hybridoma-derived antibodies that can be purified by MabSelect SuRe resin. For rat IgG2a, which is often encountered in hybridoma cultures and is challenging to purify via an HTP process, we established automated purification with GammaBind Plus resin. Using these HTP purification processes, we can efficiently recover sufficient amounts of antibodies from mammalian transient or hybridoma cultures with quality comparable to conventional column purification. PMID:29494273

  11. Image-based computational quantification and visualization of genetic alterations and tumour heterogeneity

    PubMed Central

    Zhong, Qing; Rüschoff, Jan H.; Guo, Tiannan; Gabrani, Maria; Schüffler, Peter J.; Rechsteiner, Markus; Liu, Yansheng; Fuchs, Thomas J.; Rupp, Niels J.; Fankhauser, Christian; Buhmann, Joachim M.; Perner, Sven; Poyet, Cédric; Blattner, Miriam; Soldini, Davide; Moch, Holger; Rubin, Mark A.; Noske, Aurelia; Rüschoff, Josef; Haffner, Michael C.; Jochum, Wolfram; Wild, Peter J.

    2016-01-01

    Recent large-scale genome analyses of human tissue samples have uncovered a high degree of genetic alterations and tumour heterogeneity in most tumour entities, independent of morphological phenotypes and histopathological characteristics. Assessment of genetic copy-number variation (CNV) and tumour heterogeneity by fluorescence in situ hybridization (ISH) provides additional tissue morphology at single-cell resolution, but it is labour intensive with limited throughput and high inter-observer variability. We present an integrative method combining bright-field dual-colour chromogenic and silver ISH assays with an image-based computational workflow (ISHProfiler), for accurate detection of molecular signals, high-throughput evaluation of CNV, expressive visualization of multi-level heterogeneity (cellular, inter- and intra-tumour heterogeneity), and objective quantification of heterogeneous genetic deletions (PTEN) and amplifications (19q12, HER2) in diverse human tumours (prostate, endometrial, ovarian and gastric), using various tissue sizes and different scanners, with unprecedented throughput and reproducibility. PMID:27052161
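
    As a rough illustration of the per-cell signal counting such an image-based workflow performs, the Python sketch below thresholds a synthetic tile, labels connected components and counts spots above a size cutoff. It is not the ISHProfiler implementation; the image, threshold and size cutoff are placeholders.

```python
import numpy as np
from scipy import ndimage as ndi

# Synthetic grayscale tile standing in for a chromogenic ISH image (placeholder data).
rng = np.random.default_rng(1)
tile = rng.normal(0.1, 0.02, size=(128, 128))
for y, x in [(30, 40), (31, 41), (90, 100), (60, 70)]:   # fake ISH dots
    tile[y - 1:y + 2, x - 1:x + 2] += 0.8

# Threshold, label connected components, and count spots above a size cutoff,
# i.e. the kind of per-cell copy-number signal an ISH quantification workflow tallies.
mask = tile > 0.5
labels, n = ndi.label(mask)
sizes = ndi.sum(mask, labels, index=range(1, n + 1))
spots = int((sizes >= 4).sum())
print(f"detected {spots} ISH signals in the tile")
```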

  12. Image-based computational quantification and visualization of genetic alterations and tumour heterogeneity.

    PubMed

    Zhong, Qing; Rüschoff, Jan H; Guo, Tiannan; Gabrani, Maria; Schüffler, Peter J; Rechsteiner, Markus; Liu, Yansheng; Fuchs, Thomas J; Rupp, Niels J; Fankhauser, Christian; Buhmann, Joachim M; Perner, Sven; Poyet, Cédric; Blattner, Miriam; Soldini, Davide; Moch, Holger; Rubin, Mark A; Noske, Aurelia; Rüschoff, Josef; Haffner, Michael C; Jochum, Wolfram; Wild, Peter J

    2016-04-07

    Recent large-scale genome analyses of human tissue samples have uncovered a high degree of genetic alterations and tumour heterogeneity in most tumour entities, independent of morphological phenotypes and histopathological characteristics. Assessment of genetic copy-number variation (CNV) and tumour heterogeneity by fluorescence in situ hybridization (ISH) provides additional tissue morphology at single-cell resolution, but it is labour intensive with limited throughput and high inter-observer variability. We present an integrative method combining bright-field dual-colour chromogenic and silver ISH assays with an image-based computational workflow (ISHProfiler), for accurate detection of molecular signals, high-throughput evaluation of CNV, expressive visualization of multi-level heterogeneity (cellular, inter- and intra-tumour heterogeneity), and objective quantification of heterogeneous genetic deletions (PTEN) and amplifications (19q12, HER2) in diverse human tumours (prostate, endometrial, ovarian and gastric), using various tissue sizes and different scanners, with unprecedented throughput and reproducibility.

  13. Chipster: user-friendly analysis software for microarray and other high-throughput data.

    PubMed

    Kallio, M Aleksi; Tuimala, Jarno T; Hupponen, Taavi; Klemelä, Petri; Gentile, Massimiliano; Scheinin, Ilari; Koski, Mikko; Käki, Janne; Korpelainen, Eija I

    2011-10-14

    The growth of high-throughput technologies such as microarrays and next generation sequencing has been accompanied by active research in data analysis methodology, producing new analysis methods at a rapid pace. While most of the newly developed methods are freely available, their use requires substantial computational skills. In order to enable non-programming biologists to benefit from the method development in a timely manner, we have created the Chipster software. Chipster (http://chipster.csc.fi/) brings a powerful collection of data analysis methods within the reach of bioscientists via its intuitive graphical user interface. Users can analyze and integrate different data types such as gene expression, miRNA and aCGH. The analysis functionality is complemented with rich interactive visualizations, allowing users to select datapoints and create new gene lists based on these selections. Importantly, users can save the performed analysis steps as reusable, automatic workflows, which can also be shared with other users. Being a versatile and easily extendable platform, Chipster can be used for microarray, proteomics and sequencing data. In this article we describe its comprehensive collection of analysis and visualization tools for microarray data using three case studies. Chipster is a user-friendly analysis software for high-throughput data. Its intuitive graphical user interface enables biologists to access a powerful collection of data analysis and integration tools, and to visualize data interactively. Users can collaborate by sharing analysis sessions and workflows. Chipster is open source, and the server installation package is freely available.

  14. Chipster: user-friendly analysis software for microarray and other high-throughput data

    PubMed Central

    2011-01-01

    Background The growth of high-throughput technologies such as microarrays and next generation sequencing has been accompanied by active research in data analysis methodology, producing new analysis methods at a rapid pace. While most of the newly developed methods are freely available, their use requires substantial computational skills. In order to enable non-programming biologists to benefit from the method development in a timely manner, we have created the Chipster software. Results Chipster (http://chipster.csc.fi/) brings a powerful collection of data analysis methods within the reach of bioscientists via its intuitive graphical user interface. Users can analyze and integrate different data types such as gene expression, miRNA and aCGH. The analysis functionality is complemented with rich interactive visualizations, allowing users to select datapoints and create new gene lists based on these selections. Importantly, users can save the performed analysis steps as reusable, automatic workflows, which can also be shared with other users. Being a versatile and easily extendable platform, Chipster can be used for microarray, proteomics and sequencing data. In this article we describe its comprehensive collection of analysis and visualization tools for microarray data using three case studies. Conclusions Chipster is a user-friendly analysis software for high-throughput data. Its intuitive graphical user interface enables biologists to access a powerful collection of data analysis and integration tools, and to visualize data interactively. Users can collaborate by sharing analysis sessions and workflows. Chipster is open source, and the server installation package is freely available. PMID:21999641

  15. Pre-amplification in the context of high-throughput qPCR gene expression experiment.

    PubMed

    Korenková, Vlasta; Scott, Justin; Novosadová, Vendula; Jindřichová, Marie; Langerová, Lucie; Švec, David; Šídová, Monika; Sjöback, Robert

    2015-03-11

    With the introduction of the first high-throughput qPCR instrument on the market, it became possible to perform thousands of reactions in a single run, compared to the previous hundreds. In a high-throughput reaction, only limited volumes of highly concentrated cDNA or DNA samples can be added. This limitation can be addressed by pre-amplification, which has become part of the high-throughput experimental workflow. Here, we focused our attention on the limits of the specific target pre-amplification reaction and propose an optimal, general setup for a gene expression experiment using the BioMark instrument (Fluidigm). To evaluate different pre-amplification factors, the following conditions were combined: four human blood samples from healthy donors and five transcripts with high to low expression levels; each cDNA sample was pre-amplified for four cycle numbers (15, 18, 21, and 24) and at five concentrations (equivalent to 0.078 ng, 0.32 ng, 1.25 ng, 5 ng, and 20 ng of total RNA). Factors identified as critical for the success of cDNA pre-amplification were the number of pre-amplification cycles, total RNA concentration, and type of gene. The selected pre-amplification reactions were further tested for optimal Cq distribution in a BioMark Array. The following concentrations combined with pre-amplification cycles were optimal for good-quality samples: 20 ng of total RNA with 15 cycles of pre-amplification, 20x and 40x diluted; and 5 ng and 20 ng of total RNA with 18 cycles of pre-amplification, both 20x and 40x diluted. We set upper limits for the bulk gene expression experiment using the gene expression Dynamic Array and provide an easy-to-obtain measure of pre-amplification success. We also show that the variability introduced by pre-amplification into the reverse transcription-qPCR workflow is lower than the variability caused by the reverse transcription step.
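
    A small sketch of the arithmetic behind these conditions: assuming ideal doubling per pre-amplification cycle, the expected Cq shift after amplification and dilution can be estimated as below. The 20x dilution and 100% efficiency are assumptions used only to show the calculation.

```python
import math

# Conditions from the study design: pre-amplification cycles tested on the cDNA samples.
cycles = [15, 18, 21, 24]

def expected_cq_shift(n_cycles: int, dilution: int = 20, efficiency: float = 1.0) -> float:
    """Expected Cq decrease after pre-amplification followed by dilution,
    assuming perfect doubling per cycle (an idealisation)."""
    gain = (1 + efficiency) ** n_cycles          # fold amplification
    return math.log2(gain / dilution)            # net Cq shift vs. unamplified cDNA

for c in cycles:
    print(f"{c} cycles, 20x diluted: ~{expected_cq_shift(c):.1f} Cq earlier than input")
```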

  16. HTSstation: a web application and open-access libraries for high-throughput sequencing data analysis.

    PubMed

    David, Fabrice P A; Delafontaine, Julien; Carat, Solenne; Ross, Frederick J; Lefebvre, Gregory; Jarosz, Yohan; Sinclair, Lucas; Noordermeer, Daan; Rougemont, Jacques; Leleu, Marion

    2014-01-01

    The HTSstation analysis portal is a suite of simple web forms coupled to modular analysis pipelines for various applications of High-Throughput Sequencing including ChIP-seq, RNA-seq, 4C-seq and re-sequencing. HTSstation offers biologists the possibility to rapidly investigate their HTS data using an intuitive web application with heuristically pre-defined parameters. A number of open-source software components have been implemented and can be used to build, configure and run HTS analysis pipelines reactively. In addition, our programming framework allows developers to design their own workflows and integrate additional third-party software. The HTSstation web application is accessible at http://htsstation.epfl.ch.

  17. HTSstation: A Web Application and Open-Access Libraries for High-Throughput Sequencing Data Analysis

    PubMed Central

    David, Fabrice P. A.; Delafontaine, Julien; Carat, Solenne; Ross, Frederick J.; Lefebvre, Gregory; Jarosz, Yohan; Sinclair, Lucas; Noordermeer, Daan; Rougemont, Jacques; Leleu, Marion

    2014-01-01

    The HTSstation analysis portal is a suite of simple web forms coupled to modular analysis pipelines for various applications of High-Throughput Sequencing including ChIP-seq, RNA-seq, 4C-seq and re-sequencing. HTSstation offers biologists the possibility to rapidly investigate their HTS data using an intuitive web application with heuristically pre-defined parameters. A number of open-source software components have been implemented and can be used to build, configure and run HTS analysis pipelines reactively. In addition, our programming framework allows developers to design their own workflows and integrate additional third-party software. The HTSstation web application is accessible at http://htsstation.epfl.ch. PMID:24475057

  18. The Development of a High-Throughput/Combinatorial Workflow for the Study of Porous Polymer Networks

    DTIC Science & Technology

    2012-04-05

    ...in polymer network cross-link density, poragen composition, poragen level, and cure temperature. A total of 216 unique compositions were prepared. Changes in opacity of the blends as they cured allowed for the identification of the compositional and process variables that enabled the production of porous networks. Keywords: high...

  19. CyTOF workflow: differential discovery in high-throughput high-dimensional cytometry datasets

    PubMed Central

    Nowicka, Malgorzata; Krieg, Carsten; Weber, Lukas M.; Hartmann, Felix J.; Guglietta, Silvia; Becher, Burkhard; Levesque, Mitchell P.; Robinson, Mark D.

    2017-01-01

    High-dimensional mass and flow cytometry (HDCyto) experiments have become a method of choice for high-throughput interrogation and characterization of cell populations. Here, we present an R-based pipeline for differential analyses of HDCyto data, largely based on Bioconductor packages. We computationally define cell populations using FlowSOM clustering, and facilitate an optional but reproducible strategy for manual merging of algorithm-generated clusters. Our workflow offers different analysis paths, including association of cell-type abundance with a phenotype or changes in signaling markers within specific subpopulations, or differential analyses of aggregated signals. Importantly, the differential analyses we show are based on regression frameworks where the HDCyto data are the response; thus, we are able to model arbitrary experimental designs, such as those with batch effects, paired designs and so on. In particular, we apply generalized linear mixed models to analyses of cell population abundance or cell-population-specific analyses of signaling markers, allowing overdispersion in cell counts or aggregated signals across samples to be appropriately modeled. To support the formal statistical analyses, we encourage exploratory data analysis at every step, including quality control (e.g. multi-dimensional scaling plots), reporting of clustering results (dimensionality reduction, heatmaps with dendrograms) and differential analyses (e.g. plots of aggregated signals). PMID:28663787
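
    The published pipeline is R/Bioconductor-based; as a rough Python analogue of the abundance-regression idea only, the sketch below fits a binomial GLM of per-sample cluster counts on condition using statsmodels. The counts are invented, and the random-effects part of the mixed models is deliberately omitted, so this shows only the fixed-effects core of the idea.

```python
import numpy as np
import statsmodels.api as sm

# Toy per-sample counts for one FlowSOM-style cluster (values invented):
# cells falling in the cluster out of each sample's total gated cells.
cluster_cells = np.array([120, 150, 135, 310, 290, 340])
total_cells = np.array([5000, 5200, 4800, 5100, 4900, 5300])
condition = np.array([0, 0, 0, 1, 1, 1])          # 0 = reference, 1 = stimulated

# Binomial GLM of cluster abundance on condition. The published CyTOF workflow
# uses generalized linear *mixed* models in R; random effects (e.g. patient)
# are omitted here, so this is only an illustration of the regression idea.
endog = np.column_stack([cluster_cells, total_cells - cluster_cells])
exog = sm.add_constant(condition)
fit = sm.GLM(endog, exog, family=sm.families.Binomial()).fit()
print(fit.summary().tables[1])                     # log-odds change per condition
```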

  20. Small RNA Library Preparation Method for Next-Generation Sequencing Using Chemical Modifications to Prevent Adapter Dimer Formation.

    PubMed

    Shore, Sabrina; Henderson, Jordana M; Lebedev, Alexandre; Salcedo, Michelle P; Zon, Gerald; McCaffrey, Anton P; Paul, Natasha; Hogrefe, Richard I

    2016-01-01

    For most sample types, the automation of RNA and DNA sample preparation workflows enables high throughput next-generation sequencing (NGS) library preparation. Greater adoption of small RNA (sRNA) sequencing has been hindered by high sample input requirements and inherent ligation side products formed during library preparation. These side products, known as adapter dimer, are very similar in size to the tagged library. Most sRNA library preparation strategies thus employ a gel purification step to isolate tagged library from adapter dimer contaminants. At very low sample inputs, adapter dimer side products dominate the reaction and limit the sensitivity of this technique. Here we address the need for improved specificity of sRNA library preparation workflows with a novel library preparation approach that uses modified adapters to suppress adapter dimer formation. This workflow allows for lower sample inputs and elimination of the gel purification step, which in turn allows for an automatable sRNA library preparation protocol.

  1. Scrambled eggs: A highly sensitive molecular diagnostic workflow for Fasciola species specific detection from faecal samples

    PubMed Central

    Calvani, Nichola Eliza Davies; Windsor, Peter Andrew; Bush, Russell David

    2017-01-01

    Background Fasciolosis, due to Fasciola hepatica and Fasciola gigantica, is a re-emerging zoonotic parasitic disease of worldwide importance. Human and animal infections are commonly diagnosed by the traditional sedimentation and faecal egg-counting technique. However, this technique is time-consuming and prone to sensitivity errors when a large number of samples must be processed or if the operator lacks sufficient experience. Additionally, diagnosis can only be made once the 12-week pre-patent period has passed. Recently, a commercially available coprological antigen ELISA has enabled detection of F. hepatica prior to the completion of the pre-patent period, providing earlier diagnosis and increased throughput, although species differentiation is not possible in areas of parasite sympatry. Real-time PCR offers the combined benefits of highly sensitive species differentiation for medium to large sample sizes. However, no molecular diagnostic workflow currently exists for the identification of Fasciola spp. in faecal samples. Methodology/Principal findings A new molecular diagnostic workflow for the highly-sensitive detection and quantification of Fasciola spp. in faecal samples was developed. The technique involves sedimenting and pelleting the samples prior to DNA isolation in order to concentrate the eggs, followed by disruption by bead-beating in a benchtop homogeniser to ensure access to DNA. Although both the new molecular workflow and the traditional sedimentation technique were sensitive and specific, the new molecular workflow enabled faster sample throughput in medium to large epidemiological studies, and provided the additional benefit of speciation. Further, good correlation (R2 = 0.74–0.76) was observed between the real-time PCR values and the faecal egg count (FEC) using the new molecular workflow for all herds and sampling periods. Finally, no effect of storage in 70% ethanol was detected on sedimentation and DNA isolation outcomes; enabling transport of samples from endemic to non-endemic countries without the requirement of a complete cold chain. The commercially-available ELISA displayed poorer sensitivity, even after adjustment of the positive threshold (65–88%), compared to the sensitivity (91–100%) of the new molecular diagnostic workflow. Conclusions/Significance Species-specific assays for sensitive detection of Fasciola spp. enable ante-mortem diagnosis in both human and animal settings. This includes Southeast Asia where there are potentially many undocumented human cases and where post-mortem examination of production animals can be difficult. The new molecular workflow provides a sensitive and quantitative diagnostic approach for the rapid testing of medium to large sample sizes, potentially superseding the traditional sedimentation and FEC technique and enabling surveillance programs in locations where animal and human health funding is limited. PMID:28915255

  2. Droplet-based microfluidic analysis and screening of single plant cells.

    PubMed

    Yu, Ziyi; Boehm, Christian R; Hibberd, Julian M; Abell, Chris; Haseloff, Jim; Burgess, Steven J; Reyna-Llorens, Ivan

    2018-01-01

    Droplet-based microfluidics has been used to facilitate high-throughput analysis of individual prokaryote and mammalian cells. However, there is a scarcity of similar workflows applicable to rapid phenotyping of plant systems where phenotyping analyses typically are time-consuming and low-throughput. We report on-chip encapsulation and analysis of protoplasts isolated from the emergent plant model Marchantia polymorpha at processing rates of >100,000 cells per hour. We use our microfluidic system to quantify the stochastic properties of a heat-inducible promoter across a population of transgenic protoplasts to demonstrate its potential for assessing gene expression activity in response to environmental conditions. We further demonstrate on-chip sorting of droplets containing YFP-expressing protoplasts from wild type cells using dielectrophoresis force. This work opens the door to droplet-based microfluidic analysis of plant cells for applications ranging from high-throughput characterisation of DNA parts to single-cell genomics to selection of rare plant phenotypes.

  3. Strain Library Imaging Protocol for high-throughput, automated single-cell microscopy of large bacterial collections arrayed on multiwell plates.

    PubMed

    Shi, Handuo; Colavin, Alexandre; Lee, Timothy K; Huang, Kerwyn Casey

    2017-02-01

    Single-cell microscopy is a powerful tool for studying gene functions using strain libraries, but it suffers from throughput limitations. Here we describe the Strain Library Imaging Protocol (SLIP), which is a high-throughput, automated microscopy workflow for large strain collections that requires minimal user involvement. SLIP involves transferring arrayed bacterial cultures from multiwell plates onto large agar pads using inexpensive replicator pins and automatically imaging the resulting single cells. The acquired images are subsequently reviewed and analyzed by custom MATLAB scripts that segment single-cell contours and extract quantitative metrics. SLIP yields rich data sets on cell morphology and gene expression that illustrate the function of certain genes and the connections among strains in a library. For a library arrayed on 96-well plates, image acquisition can be completed within 4 min per plate.
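
    SLIP's image analysis is done with custom MATLAB scripts; as a minimal Python stand-in for the segment-and-measure step, the sketch below thresholds a synthetic frame, labels cells and extracts simple per-cell metrics. The image, threshold and morphology values are placeholders, not the published scripts.

```python
import numpy as np
from scipy import ndimage as ndi

# Synthetic "phase-contrast" frame with a few bright rod-like cells (placeholder data).
rng = np.random.default_rng(2)
frame = rng.normal(0.2, 0.05, size=(256, 256))
frame[100:104, 50:90] += 1.0     # fake cell 1
frame[180:184, 120:170] += 1.0   # fake cell 2

# Threshold, label, and pull simple per-cell metrics (area, bounding-box length),
# the same kind of morphology read-out single-cell segmentation scripts extract.
mask = ndi.binary_opening(frame > 0.7, iterations=1)
labels, n_cells = ndi.label(mask)
for obj_slice, cell_id in zip(ndi.find_objects(labels), range(1, n_cells + 1)):
    area = int((labels[obj_slice] == cell_id).sum())
    length = max(s.stop - s.start for s in obj_slice)
    print(f"cell {cell_id}: area={area} px, length={length} px")
```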

  4. Extending the Derek-Meteor Workflow to Predict Chemical-Toxicity Space Impacted by Metabolism: Application to ToxCast and Tox21 Chemical Inventories

    EPA Science Inventory

    A central aim of EPA’s ToxCast project is to use in vitro high-throughput screening (HTS) profiles to build predictive models of in vivo toxicity. Where assays lack metabolic capability, such efforts may need to anticipate the role of metabolic activation (or deactivation). A wo...

  5. High-throughput bioconjugation for enhanced 193 nm photodissociation via droplet phase initiated ion/ion chemistry using a front-end dual spray reactor.

    PubMed

    Cotham, Victoria C; Shaw, Jared B; Brodbelt, Jennifer S

    2015-09-15

    Fast online chemical derivatization of peptides with an aromatic label for enhanced 193 nm ultraviolet photodissociation (UVPD) is demonstrated using a dual electrospray reactor implemented on the front-end of a linear ion trap (LIT) mass spectrometer. The reactor facilitates the intersection of protonated peptides with a second population of chromogenic 4-formyl-1,3-benzenedisulfonic acid (FBDSA) anions to promote real-time formation of ion/ion complexes at atmospheric pressure. Subsequent collisional activation of the ion/ion intermediate results in Schiff base formation generated via reaction between a primary amine in the peptide cation and the aldehyde moiety of the FBDSA anion. Utilizing 193 nm UVPD as the subsequent activation step in the MS(3) workflow results in acquisition of greater primary sequence information relative to conventional collision induced dissociation (CID). Furthermore, Schiff-base-modified peptides exhibit on average a 20% increase in UVPD efficiency compared to their unmodified counterparts. Due to the efficiency of covalent labeling achieved with the dual spray reactor, we demonstrate that this strategy can be integrated into a high-throughput LC-MS(n) workflow for rapid derivatization of peptide mixtures.

  6. Biologically Relevant Heterogeneity: Metrics and Practical Insights.

    PubMed

    Gough, Albert; Stern, Andrew M; Maier, John; Lezon, Timothy; Shun, Tong-Ying; Chennubhotla, Chakra; Schurdak, Mark E; Haney, Steven A; Taylor, D Lansing

    2017-03-01

    Heterogeneity is a fundamental property of biological systems at all scales that must be addressed in a wide range of biomedical applications, including basic biomedical research, drug discovery, diagnostics, and the implementation of precision medicine. There are a number of published approaches to characterizing heterogeneity in cells in vitro and in tissue sections. However, there are no generally accepted approaches for the detection and quantitation of heterogeneity that can be applied in a relatively high-throughput workflow. This review and perspective emphasizes the experimental methods that capture multiplexed cell-level data, as well as the need for standard metrics of the spatial, temporal, and population components of heterogeneity. A recommendation is made for the adoption of a set of three heterogeneity indices that can be implemented in any high-throughput workflow to optimize the decision-making process. In addition, a pairwise mutual information method is suggested as an approach to characterizing the spatial features of heterogeneity, especially in tissue-based imaging. Furthermore, metrics for temporal heterogeneity are in the early stages of development. Example studies indicate that the analysis of functional phenotypic heterogeneity can be exploited to guide decisions in the interpretation of biomedical experiments, drug discovery, diagnostics, and the design of optimal therapeutic strategies for individual patients.
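
    To make the idea of a population-heterogeneity index concrete, the sketch below computes a Rao-style quadratic entropy (the expected difference between two randomly drawn cells) for a homogeneous and a two-state population. This is an illustrative metric in the spirit of the review, not the exact indices it recommends, and the per-cell data are simulated.

```python
import numpy as np

def quadratic_entropy(values: np.ndarray, n_pairs: int = 50_000, seed: int = 0) -> float:
    """Rao-style quadratic entropy: the expected absolute difference between two
    randomly drawn cells, estimated by random pair sampling. Larger values mean
    a more heterogeneous population. Illustrative only; not the review's exact metrics."""
    rng = np.random.default_rng(seed)
    i = rng.integers(0, len(values), size=n_pairs)
    j = rng.integers(0, len(values), size=n_pairs)
    return float(np.abs(values[i] - values[j]).mean())

rng = np.random.default_rng(3)
homogeneous = rng.normal(100, 5, size=2000)                       # one tight expression state
two_state = np.concatenate([rng.normal(60, 5, 1000), rng.normal(160, 5, 1000)])

print(f"homogeneous population: {quadratic_entropy(homogeneous):6.1f}")
print(f"two-state population:   {quadratic_entropy(two_state):6.1f}")
```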

  7. BioInfra.Prot: A comprehensive proteomics workflow including data standardization, protein inference, expression analysis and data publication.

    PubMed

    Turewicz, Michael; Kohl, Michael; Ahrens, Maike; Mayer, Gerhard; Uszkoreit, Julian; Naboulsi, Wael; Bracht, Thilo; Megger, Dominik A; Sitek, Barbara; Marcus, Katrin; Eisenacher, Martin

    2017-11-10

    The analysis of high-throughput mass spectrometry-based proteomics data must address the specific challenges of this technology. To this end, the comprehensive proteomics workflow offered by the de.NBI service center BioInfra.Prot provides indispensable components for the computational and statistical analysis of this kind of data. These components include tools and methods for spectrum identification and protein inference, protein quantification, expression analysis, as well as data standardization and data publication. All methods of the workflow that address these tasks are state-of-the-art or cutting edge. As shown in previous publications, each of these methods is adequate for its specific task and gives competitive results. However, the methods included in the workflow are continuously reviewed, updated and improved to adapt to new scientific developments. All of these components and methods are available as stand-alone BioInfra.Prot services or as a complete workflow. Since BioInfra.Prot provides manifold fast communication channels for access to all components of the workflow (e.g., via the BioInfra.Prot ticket system: bioinfraprot@rub.de), users can easily benefit from this service and get support from experts. Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.

  8. A cognitive task analysis of a visual analytic workflow: Exploring molecular interaction networks in systems biology.

    PubMed

    Mirel, Barbara; Eichinger, Felix; Keller, Benjamin J; Kretzler, Matthias

    2011-03-21

    Bioinformatics visualization tools are often not robust enough to support biomedical specialists’ complex exploratory analyses. Tools need to accommodate the workflows that scientists actually perform for specific translational research questions. To understand and model one of these workflows, we conducted a case-based, cognitive task analysis of a biomedical specialist’s exploratory workflow for the question: What functional interactions among gene products of high throughput expression data suggest previously unknown mechanisms of a disease? From our cognitive task analysis four complementary representations of the targeted workflow were developed. They include: usage scenarios, flow diagrams, a cognitive task taxonomy, and a mapping between cognitive tasks and user-centered visualization requirements. The representations capture the flows of cognitive tasks that led a biomedical specialist to inferences critical to hypothesizing. We created representations at levels of detail that could strategically guide visualization development, and we confirmed this by making a trial prototype based on user requirements for a small portion of the workflow. Our results imply that visualizations should make available to scientific users “bundles of features” consonant with the compositional cognitive tasks purposefully enacted at specific points in the workflow. We also highlight certain aspects of visualizations that: (a) need more built-in flexibility; (b) are critical for negotiating meaning; and (c) are necessary for essential metacognitive support.

  9. Proteomic Workflows for Biomarker Identification Using Mass Spectrometry — Technical and Statistical Considerations during Initial Discovery

    PubMed Central

    Orton, Dennis J.; Doucette, Alan A.

    2013-01-01

    Identification of biomarkers capable of differentiating between pathophysiological states of an individual is a laudable goal in the field of proteomics. Protein biomarker discovery generally employs high throughput sample characterization by mass spectrometry (MS), being capable of identifying and quantifying thousands of proteins per sample. While MS-based technologies have rapidly matured, the identification of truly informative biomarkers remains elusive, with only a handful of clinically applicable tests stemming from proteomic workflows. This underlying lack of progress is attributed in large part to erroneous experimental design, biased sample handling, as well as improper statistical analysis of the resulting data. This review will discuss in detail the importance of experimental design and provide some insight into the overall workflow required for biomarker identification experiments. Proper balance between the degree of biological vs. technical replication is required for confident biomarker identification. PMID:28250400

  10. A ChIP-Seq Data Analysis Pipeline Based on Bioconductor Packages.

    PubMed

    Park, Seung-Jin; Kim, Jong-Hwan; Yoon, Byung-Ha; Kim, Seon-Young

    2017-03-01

    Nowadays, huge volumes of chromatin immunoprecipitation-sequencing (ChIP-Seq) data are generated to increase the knowledge on DNA-protein interactions in the cell, and accordingly, many tools have been developed for ChIP-Seq analysis. Here, we provide an example of a streamlined workflow for ChIP-Seq data analysis composed of only four packages in Bioconductor: dada2, QuasR, mosaics, and ChIPseeker. 'dada2' performs trimming of the high-throughput sequencing data. 'QuasR' and 'mosaics' perform quality control and mapping of the input reads to the reference genome and peak calling, respectively. Finally, 'ChIPseeker' performs annotation and visualization of the called peaks. This workflow runs well independently of operating systems (e.g., Windows, Mac, or Linux) and processes the input fastq files into various results in one run. R code is available at github: https://github.com/ddhb/Workflow_of_Chipseq.git.

  11. A ChIP-Seq Data Analysis Pipeline Based on Bioconductor Packages

    PubMed Central

    Park, Seung-Jin; Kim, Jong-Hwan; Yoon, Byung-Ha; Kim, Seon-Young

    2017-01-01

    Nowadays, huge volumes of chromatin immunoprecipitation-sequencing (ChIP-Seq) data are generated to increase the knowledge on DNA-protein interactions in the cell, and accordingly, many tools have been developed for ChIP-Seq analysis. Here, we provide an example of a streamlined workflow for ChIP-Seq data analysis composed of only four packages in Bioconductor: dada2, QuasR, mosaics, and ChIPseeker. ‘dada2’ performs trimming of the high-throughput sequencing data. ‘QuasR’ and ‘mosaics’ perform quality control and mapping of the input reads to the reference genome and peak calling, respectively. Finally, ‘ChIPseeker’ performs annotation and visualization of the called peaks. This workflow runs well independently of operating systems (e.g., Windows, Mac, or Linux) and processes the input fastq files into various results in one run. R code is available at github: https://github.com/ddhb/Workflow_of_Chipseq.git. PMID:28416945

  12. High-throughput bioinformatics with the Cyrille2 pipeline system

    PubMed Central

    Fiers, Mark WEJ; van der Burgt, Ate; Datema, Erwin; de Groot, Joost CW; van Ham, Roeland CHJ

    2008-01-01

    Background Modern omics research involves the application of high-throughput technologies that generate vast volumes of data. These data need to be pre-processed, analyzed and integrated with existing knowledge through the use of diverse sets of software tools, models and databases. The analyses are often interdependent and chained together to form complex workflows or pipelines. Given the volume of the data used and the multitude of computational resources available, specialized pipeline software is required to make high-throughput analysis of large-scale omics datasets feasible. Results We have developed a generic pipeline system called Cyrille2. The system is modular in design and consists of three functionally distinct parts: 1) a web based, graphical user interface (GUI) that enables a pipeline operator to manage the system; 2) the Scheduler, which forms the functional core of the system and which tracks what data enters the system and determines what jobs must be scheduled for execution, and; 3) the Executor, which searches for scheduled jobs and executes these on a compute cluster. Conclusion The Cyrille2 system is an extensible, modular system, implementing the stated requirements. Cyrille2 enables easy creation and execution of high throughput, flexible bioinformatics pipelines. PMID:18269742
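
    As a toy illustration of the Scheduler/Executor split described above (jobs become runnable as soon as the data they need exist), a minimal Python sketch follows. The job names and the data-dependency model are invented and do not reflect Cyrille2's actual interfaces.

```python
from collections import deque

# Toy pipeline: each job lists the datasets it needs and the ones it produces.
jobs = {
    "trim":     {"needs": {"raw_reads"}, "makes": {"trimmed"}},
    "assemble": {"needs": {"trimmed"}, "makes": {"contigs"}},
    "annotate": {"needs": {"contigs", "reference_db"}, "makes": {"annotation"}},
}
available = {"raw_reads", "reference_db"}          # data already in the system

# Scheduler: a job is queued as soon as all of its inputs exist.
# Executor: "runs" it and registers the outputs, which may unlock further jobs.
pending = deque(jobs)
stalled = 0
while pending and stalled < len(pending):
    name = pending.popleft()
    if jobs[name]["needs"] <= available:
        print(f"executing {name}")
        available |= jobs[name]["makes"]
        stalled = 0
    else:
        pending.append(name)                       # inputs not ready yet; retry later
        stalled += 1
```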

  13. High-Throughput Sequencing and Metagenomics: Moving Forward in the Culture-Independent Analysis of Food Microbial Ecology

    PubMed Central

    2013-01-01

    Following recent trends in environmental microbiology, food microbiology has benefited from the advances in molecular biology and adopted novel strategies to detect, identify, and monitor microbes in food. An in-depth study of the microbial diversity in food can now be achieved by using high-throughput sequencing (HTS) approaches after direct nucleic acid extraction from the sample to be studied. In this review, the workflow of applying culture-independent HTS to food matrices is described. The current scenario and future perspectives of HTS uses to study food microbiota are presented, and the decision-making process leading to the best choice of working conditions to fulfill the specific needs of food research is described. PMID:23475615

  14. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Garzoglio, Gabriele

    The Fermilab Grid and Cloud Computing Department and the KISTI Global Science experimental Data hub Center propose a joint project. The goals are to enable scientific workflows of stakeholders to run on multiple cloud resources by use of (a) Virtual Infrastructure Automation and Provisioning, (b) Interoperability and Federation of Cloud Resources, and (c) High-Throughput Fabric Virtualization. This is a matching-fund project in which Fermilab and KISTI will contribute equal resources.

  15. A High-throughput Assay for mRNA Silencing in Primary Cortical Neurons in vitro with Oligonucleotide Therapeutics.

    PubMed

    Alterman, Julia F; Coles, Andrew H; Hall, Lauren M; Aronin, Neil; Khvorova, Anastasia; Didiot, Marie-Cécile

    2017-08-20

    Primary neurons represent an ideal cellular system for the identification of therapeutic oligonucleotides for the treatment of neurodegenerative diseases. However, due to the sensitive nature of primary cells, the transfection of small interfering RNAs (siRNA) using classical methods is laborious and often shows low efficiency. Recent progress in oligonucleotide chemistry has enabled the development of stabilized and hydrophobically modified small interfering RNAs (hsiRNAs). This new class of oligonucleotide therapeutics shows extremely efficient self-delivery properties and supports potent and durable effects in vitro and in vivo. We have developed a high-throughput in vitro assay to identify and test hsiRNAs in primary neuronal cultures. To simply, rapidly, and accurately quantify the mRNA silencing of hundreds of hsiRNAs, we use the QuantiGene 2.0 quantitative gene expression assay. This high-throughput, 96-well plate-based assay can quantify mRNA levels directly from sample lysate. Here, we describe a method to prepare short-term cultures of mouse primary cortical neurons in a 96-well plate format for high-throughput testing of oligonucleotide therapeutics. This method supports the testing of hsiRNA libraries and the identification of potential therapeutics within just two weeks. We detail methodologies of our high-throughput assay workflow from primary neuron preparation to data analysis. This method can help identify oligonucleotide therapeutics for treatment of various neurological diseases.
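
    As an illustration of the read-out step, the sketch below normalizes target-gene signals to a housekeeping gene per well and expresses treated wells as percent of the untreated control. The well values, gene layout and replicate structure are invented; this is not the assay's prescribed analysis.

```python
import numpy as np

# Invented raw signals from a 96-well bDNA-style read-out (target and housekeeping gene)
# for untreated control wells and wells treated with one hsiRNA at one dose.
target = {"untreated": np.array([8200, 7900, 8500]), "hsiRNA": np.array([2100, 1900, 2300])}
housekeeping = {"untreated": np.array([5100, 4900, 5200]), "hsiRNA": np.array([5000, 5150, 4950])}

# Normalize target to housekeeping per well, then express treated wells
# as a percentage of the mean untreated ratio (100% = no silencing).
ratio = {k: target[k] / housekeeping[k] for k in target}
percent_remaining = 100 * ratio["hsiRNA"] / ratio["untreated"].mean()
print(f"mRNA remaining: {percent_remaining.mean():.0f}% ± {percent_remaining.std(ddof=1):.0f}%")
```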

  16. Support and Development of Workflow Protocols for High Throughput Single-Lap-Joint Testing-Experimental

    DTIC Science & Technology

    2013-04-01

    ...preparation, and presence of an overflow fillet for a high-strength epoxy and ductile methacrylate adhesive. A unique feature of this study was the use of untrained GEMS (Gains in the Education of Mathematics and Science) ... as part of expanding adhesive joint test configurations as part of the GEMS program. Subject terms: single lap joint, adhesion, aluminum, epoxy

  17. A Fully Automated High-Throughput Flow Cytometry Screening System Enabling Phenotypic Drug Discovery.

    PubMed

    Joslin, John; Gilligan, James; Anderson, Paul; Garcia, Catherine; Sharif, Orzala; Hampton, Janice; Cohen, Steven; King, Miranda; Zhou, Bin; Jiang, Shumei; Trussell, Christopher; Dunn, Robert; Fathman, John W; Snead, Jennifer L; Boitano, Anthony E; Nguyen, Tommy; Conner, Michael; Cooke, Mike; Harris, Jennifer; Ainscow, Ed; Zhou, Yingyao; Shaw, Chris; Sipes, Dan; Mainquist, James; Lesley, Scott

    2018-05-01

    The goal of high-throughput screening is to enable screening of compound libraries in an automated manner to identify quality starting points for optimization. This often involves screening a large diversity of compounds in an assay that preserves a connection to the disease pathology. Phenotypic screening is a powerful tool for drug identification, in that assays can be run without prior understanding of the target and with primary cells that closely mimic the therapeutic setting. Advanced automation and high-content imaging have enabled many complex assays, but these are still relatively slow and low throughput. To address this limitation, we have developed an automated workflow that is dedicated to processing complex phenotypic assays for flow cytometry. The system can achieve a throughput of 50,000 wells per day, resulting in a fully automated platform that enables robust phenotypic drug discovery. Over the past 5 years, this screening system has been used for a variety of drug discovery programs, across many disease areas, with many molecules advancing quickly into preclinical development and into the clinic. This report will highlight a diversity of approaches that automated flow cytometry has enabled for phenotypic drug discovery.

  18. Biologically Relevant Heterogeneity: Metrics and Practical Insights

    PubMed Central

    Gough, A; Stern, AM; Maier, J; Lezon, T; Shun, T-Y; Chennubhotla, C; Schurdak, ME; Haney, SA; Taylor, DL

    2017-01-01

    Heterogeneity is a fundamental property of biological systems at all scales that must be addressed in a wide range of biomedical applications including basic biomedical research, drug discovery, diagnostics and the implementation of precision medicine. There are a number of published approaches to characterizing heterogeneity in cells in vitro and in tissue sections. However, there are no generally accepted approaches for the detection and quantitation of heterogeneity that can be applied in a relatively high throughput workflow. This review and perspective emphasizes the experimental methods that capture multiplexed cell level data, as well as the need for standard metrics of the spatial, temporal and population components of heterogeneity. A recommendation is made for the adoption of a set of three heterogeneity indices that can be implemented in any high throughput workflow to optimize the decision-making process. In addition, a pairwise mutual information method is suggested as an approach to characterizing the spatial features of heterogeneity, especially in tissue-based imaging. Furthermore, metrics for temporal heterogeneity are in the early stages of development. Example studies indicate that the analysis of functional phenotypic heterogeneity can be exploited to guide decisions in the interpretation of biomedical experiments, drug discovery, diagnostics and the design of optimal therapeutic strategies for individual patients. PMID:28231035

  19. High-throughput 96-well solvent mediated sonic blending synthesis and on-plate solid/solution stability characterization of pharmaceutical cocrystals.

    PubMed

    Luu, Van; Jona, Janan; Stanton, Mary K; Peterson, Matthew L; Morrison, Henry G; Nagapudi, Karthik; Tan, Helming

    2013-01-30

    A 96-well high-throughput cocrystal screening workflow has been developed consisting of solvent-mediated sonic blending synthesis and on-plate solid/solution stability characterization by XRPD. A strategy of cocrystallization screening in selected blend solvents including water mixtures is proposed to not only manipulate solubility of the cocrystal components but also differentiate physical stability of the cocrystal products. Caffeine-oxalic acid and theophylline-oxalic acid cocrystals were prepared and evaluated in relation to saturation levels of the cocrystal components and stability of the cocrystal products in anhydrous and hydrous solvents. AMG 517 was screened with a number of coformers, and solid/solution stability of the resulting cocrystals on the 96-well plate was investigated. A stability trend was observed and confirmed that cocrystals comprised of lower aqueous solubility coformers tended to be more stable in water. Furthermore, cocrystals which could be isolated under hydrous solvent blending condition exhibited superior physical stability to those which could only be obtained under anhydrous condition. This integrated HTS workflow provides an efficient route in an API-sparing approach to screen and identify cocrystal candidates with proper solubility and solid/solution stability properties. Copyright © 2012 Elsevier B.V. All rights reserved.

  20. PANGEA: pipeline for analysis of next generation amplicons

    PubMed Central

    Giongo, Adriana; Crabb, David B; Davis-Richardson, Austin G; Chauliac, Diane; Mobberley, Jennifer M; Gano, Kelsey A; Mukherjee, Nabanita; Casella, George; Roesch, Luiz FW; Walts, Brandon; Riva, Alberto; King, Gary; Triplett, Eric W

    2010-01-01

    High-throughput DNA sequencing can identify organisms and describe population structures in many environmental and clinical samples. Current technologies generate millions of reads in a single run, requiring extensive computational strategies to organize, analyze and interpret those sequences. A series of bioinformatics tools for high-throughput sequencing analysis, including preprocessing, clustering, database matching and classification, have been compiled into a pipeline called PANGEA. The PANGEA pipeline was written in Perl and can be run on Mac OSX, Windows or Linux. With PANGEA, sequences obtained directly from the sequencer can be processed quickly to provide the files needed for sequence identification by BLAST and for comparison of microbial communities. Two different sets of bacterial 16S rRNA sequences were used to show the efficiency of this workflow. The first set of 16S rRNA sequences is derived from various soils from Hawaii Volcanoes National Park. The second set is derived from stool samples collected from diabetes-resistant and diabetes-prone rats. The workflow described here allows the investigator to quickly assess libraries of sequences on personal computers with customized databases. PANGEA is provided for users as individual scripts for each step in the process or as a single script where all processes, except the χ2 step, are joined into one program called the ‘backbone’. PMID:20182525

  1. PANGEA: pipeline for analysis of next generation amplicons.

    PubMed

    Giongo, Adriana; Crabb, David B; Davis-Richardson, Austin G; Chauliac, Diane; Mobberley, Jennifer M; Gano, Kelsey A; Mukherjee, Nabanita; Casella, George; Roesch, Luiz F W; Walts, Brandon; Riva, Alberto; King, Gary; Triplett, Eric W

    2010-07-01

    High-throughput DNA sequencing can identify organisms and describe population structures in many environmental and clinical samples. Current technologies generate millions of reads in a single run, requiring extensive computational strategies to organize, analyze and interpret those sequences. A series of bioinformatics tools for high-throughput sequencing analysis, including pre-processing, clustering, database matching and classification, have been compiled into a pipeline called PANGEA. The PANGEA pipeline was written in Perl and can be run on Mac OSX, Windows or Linux. With PANGEA, sequences obtained directly from the sequencer can be processed quickly to provide the files needed for sequence identification by BLAST and for comparison of microbial communities. Two different sets of bacterial 16S rRNA sequences were used to show the efficiency of this workflow. The first set of 16S rRNA sequences is derived from various soils from Hawaii Volcanoes National Park. The second set is derived from stool samples collected from diabetes-resistant and diabetes-prone rats. The workflow described here allows the investigator to quickly assess libraries of sequences on personal computers with customized databases. PANGEA is provided for users as individual scripts for each step in the process or as a single script where all processes, except the chi(2) step, are joined into one program called the 'backbone'.

  2. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Duro, Francisco Rodrigo; Garcia Blas, Javier; Isaila, Florin

    This paper explores novel techniques for improving the performance of many-task workflows based on the Swift scripting language. We propose novel programmer options for automated distributed data placement and task scheduling. These options trigger a data placement mechanism used for distributing intermediate workflow data over the servers of Hercules, a distributed key-value store that can be used to cache file system data. We demonstrate that these new mechanisms can significantly improve the aggregated throughput of many-task workflows by up to 86x, reduce the contention on the shared file system, exploit data locality, and trade off locality and load balance.
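
    A toy sketch of the caching idea (not Hercules' actual API): intermediate workflow data are served from an in-memory key-value store, with the shared file system used only as a fallback on a miss. The class, keys and path below are hypothetical.

```python
from pathlib import Path

class IntermediateDataCache:
    """Toy stand-in for a key-value cache in front of a shared file system:
    writes go to memory, reads fall back to the file system on a miss."""

    def __init__(self, shared_fs_root: str):
        self.store: dict[str, bytes] = {}
        self.root = Path(shared_fs_root)

    def put(self, key: str, data: bytes) -> None:
        self.store[key] = data                      # intermediate data stays off the shared FS

    def get(self, key: str) -> bytes:
        if key in self.store:                       # cache hit: no file-system traffic
            return self.store[key]
        return (self.root / key).read_bytes()       # miss: fall back to the shared FS

cache = IntermediateDataCache("/tmp/shared_fs")     # path is illustrative
cache.put("task42/output.dat", b"intermediate bytes")
print(cache.get("task42/output.dat"))
```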

  3. Solid-Phase Extraction Strategies to Surmount Body Fluid Sample Complexity in High-Throughput Mass Spectrometry-Based Proteomics

    PubMed Central

    Bladergroen, Marco R.; van der Burgt, Yuri E. M.

    2015-01-01

    For large-scale and standardized applications in mass spectrometry- (MS-) based proteomics automation of each step is essential. Here we present high-throughput sample preparation solutions for balancing the speed of current MS-acquisitions and the time needed for analytical workup of body fluids. The discussed workflows reduce body fluid sample complexity and apply for both bottom-up proteomics experiments and top-down protein characterization approaches. Various sample preparation methods that involve solid-phase extraction (SPE) including affinity enrichment strategies have been automated. Obtained peptide and protein fractions can be mass analyzed by direct infusion into an electrospray ionization (ESI) source or by means of matrix-assisted laser desorption ionization (MALDI) without further need of time-consuming liquid chromatography (LC) separations. PMID:25692071

  4. UCLA's Molecular Screening Shared Resource: enhancing small molecule discovery with functional genomics and new technology.

    PubMed

    Damoiseaux, Robert

    2014-05-01

    The Molecular Screening Shared Resource (MSSR) offers a comprehensive range of leading-edge high throughput screening (HTS) services including drug discovery, chemical and functional genomics, and novel methods for nano and environmental toxicology. The MSSR is an open access environment with investigators from UCLA as well as from the entire globe. Industrial clients are equally welcome as are non-profit entities. The MSSR is a fee-for-service entity and does not retain intellectual property. In conjunction with the Center for Environmental Implications of Nanotechnology, the MSSR is unique in its dedicated and ongoing efforts towards high throughput toxicity testing of nanomaterials. In addition, the MSSR engages in technology development eliminating bottlenecks from the HTS workflow and enabling novel assays and readouts currently not available.

  5. Improving clinical laboratory efficiency: a time-motion evaluation of the Abbott m2000 RealTime and Roche COBAS AmpliPrep/COBAS TaqMan PCR systems for the simultaneous quantitation of HIV-1 RNA and HCV RNA.

    PubMed

    Amendola, Alessandra; Coen, Sabrina; Belladonna, Stefano; Pulvirenti, F Renato; Clemens, John M; Capobianchi, M Rosaria

    2011-08-01

    Diagnostic laboratories need automation that facilitates efficient processing and workflow management to meet today's challenges for expanding services and reducing cost, yet maintaining the highest levels of quality. Processing efficiency of two commercially available automated systems for quantifying HIV-1 and HCV RNA, Abbott m2000 system and Roche COBAS Ampliprep/COBAS TaqMan 96 (docked) systems (CAP/CTM), was evaluated in a mid/high throughput workflow laboratory using a representative daily workload of 24 HCV and 72 HIV samples. Three test scenarios were evaluated: A) one run with four batches on the CAP/CTM system, B) two runs on the Abbott m2000 and C) one run using the Abbott m2000 maxCycle feature (maxCycle) for co-processing these assays. Cycle times for processing, throughput and hands-on time were evaluated. Overall processing cycle time was 10.3, 9.1 and 7.6 h for Scenarios A), B) and C), respectively. Total hands-on time for each scenario was, in order, 100.0 (A), 90.3 (B) and 61.4 min (C). The interface of an automated analyzer to the laboratory workflow, notably system set up for samples and reagents and clean up functions, are as important as the automation capability of the analyzer for the overall impact to processing efficiency and operator hands-on time.
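
    A quick back-of-the-envelope conversion of the reported figures into per-sample numbers, assuming the 96-sample daily workload stated above; this arithmetic is illustrative and not part of the study's own analysis.

```python
# Reported figures: 96 samples (24 HCV + 72 HIV) per daily run.
scenarios = {
    "A: CAP/CTM, four batches":         {"cycle_h": 10.3, "hands_on_min": 100.0},
    "B: m2000, two runs":               {"cycle_h": 9.1,  "hands_on_min": 90.3},
    "C: m2000, maxCycle co-processing": {"cycle_h": 7.6,  "hands_on_min": 61.4},
}
samples = 24 + 72

for name, t in scenarios.items():
    throughput = samples / t["cycle_h"]
    print(f"{name}: {throughput:.1f} samples/h, "
          f"{t['hands_on_min'] / samples:.2f} min hands-on per sample")
```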

  6. Xi-cam: a versatile interface for data visualization and analysis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pandolfi, Ronald J.; Allan, Daniel B.; Arenholz, Elke

    Xi-cam is an extensible platform for data management, analysis and visualization. Xi-cam aims to provide a flexible and extensible approach to synchrotron data treatment as a solution to rising demands for high-volume/high-throughput processing pipelines. The core of Xi-cam is an extensible plugin-based graphical user interface platform which provides users with an interactive interface to processing algorithms. Plugins are available for SAXS/WAXS/GISAXS/GIWAXS, tomography and NEXAFS data. With Xi-cam's `advanced' mode, data processing steps are designed as a graph-based workflow, which can be executed live, locally or remotely. Remote execution utilizes high-performance computing or de-localized resources, allowing for the effective reduction of high-throughput data. Xi-cam's plugin-based architecture targets cross-facility and cross-technique collaborative development, in support of multi-modal analysis. Xi-cam is open-source and cross-platform, and available for download on GitHub.
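
    The graph-based workflow idea can be illustrated with a small dependency graph whose nodes are processing steps executed in topological order. This is an illustrative sketch, not the Xi-cam API; the load/normalize/reduce functions are placeholders for real detector-frame processing steps.

        # Illustrative sketch (not Xi-cam's API): execute a small graph-based
        # processing workflow by walking the dependency graph in topological order.
        from graphlib import TopologicalSorter  # Python 3.9+

        def load(_):        return [1.0, 2.0, 4.0]         # stand-in for detector frames
        def normalize(x):   return [v / max(x) for v in x]
        def reduce_1d(x):   return sum(x) / len(x)         # stand-in for azimuthal reduction

        # Each node names the upstream nodes whose output it consumes.
        graph = {"load": set(), "normalize": {"load"}, "reduce": {"normalize"}}
        ops   = {"load": load, "normalize": normalize, "reduce": reduce_1d}

        results = {}
        for node in TopologicalSorter(graph).static_order():
            upstream = [results[d] for d in graph[node]]
            results[node] = ops[node](upstream[0] if upstream else None)

        print(results["reduce"])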

  7. Xi-cam: a versatile interface for data visualization and analysis

    DOE PAGES

    Pandolfi, Ronald J.; Allan, Daniel B.; Arenholz, Elke; ...

    2018-05-31

    Xi-cam is an extensible platform for data management, analysis and visualization. Xi-cam aims to provide a flexible and extensible approach to synchrotron data treatment as a solution to rising demands for high-volume/high-throughput processing pipelines. The core of Xi-cam is an extensible plugin-based graphical user interface platform which provides users with an interactive interface to processing algorithms. Plugins are available for SAXS/WAXS/GISAXS/GIWAXS, tomography and NEXAFS data. With Xi-cam's `advanced' mode, data processing steps are designed as a graph-based workflow, which can be executed live, locally or remotely. Remote execution utilizes high-performance computing or de-localized resources, allowing for the effective reduction of high-throughput data. Xi-cam's plugin-based architecture targets cross-facility and cross-technique collaborative development, in support of multi-modal analysis. Xi-cam is open-source and cross-platform, and available for download on GitHub.

  8. GlycoExtractor: a web-based interface for high throughput processing of HPLC-glycan data.

    PubMed

    Artemenko, Natalia V; Campbell, Matthew P; Rudd, Pauline M

    2010-04-05

    Recently, an automated high-throughput HPLC platform has been developed that can be used to fully sequence and quantify low concentrations of N-linked sugars released from glycoproteins, supported by an experimental database (GlycoBase) and analytical tools (autoGU). However, commercial packages that support the operation of HPLC instruments and data storage lack platforms for the extraction of large volumes of data. The lack of resources and agreed formats in glycomics is now a major limiting factor that restricts the development of bioinformatic tools and automated workflows for high-throughput HPLC data analysis. GlycoExtractor is a web-based tool that interfaces with a commercial HPLC database/software solution to facilitate the extraction of large volumes of processed glycan profile data (peak number, peak areas, and glucose unit values). The tool allows the user to export a series of sample sets to a set of file formats (XML, JSON, and CSV) rather than a collection of disconnected files. This approach not only reduces the amount of manual refinement required to export data into a suitable format for data analysis but also opens the field to new approaches for high-throughput data interpretation and storage, including biomarker discovery and validation and monitoring of online bioprocessing conditions for next generation biotherapeutics.
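
    A minimal sketch of the kind of batch export this enables, writing a set of processed glycan-profile records (peak number, peak area, glucose-unit value) to JSON and CSV; the field names are illustrative and not GlycoExtractor's actual schema.

        # Sketch: export a batch of processed glycan-profile records to JSON and CSV
        # so downstream analysis sees one structured file rather than many disconnected ones.
        import csv, json

        profiles = [
            {"sample": "IgG_01", "peak": 1, "area": 12.4, "gu": 5.8},
            {"sample": "IgG_01", "peak": 2, "area": 33.1, "gu": 6.9},
            {"sample": "IgG_02", "peak": 1, "area": 10.7, "gu": 5.8},
        ]

        with open("profiles.json", "w") as fh:
            json.dump(profiles, fh, indent=2)

        with open("profiles.csv", "w", newline="") as fh:
            writer = csv.DictWriter(fh, fieldnames=["sample", "peak", "area", "gu"])
            writer.writeheader()
            writer.writerows(profiles)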

  9. Rapid analysis and exploration of fluorescence microscopy images.

    PubMed

    Pavie, Benjamin; Rajaram, Satwik; Ouyang, Austin; Altschuler, Jason M; Steininger, Robert J; Wu, Lani F; Altschuler, Steven J

    2014-03-19

    Despite rapid advances in high-throughput microscopy, quantitative image-based assays still pose significant challenges. While a variety of specialized image analysis tools are available, most traditional image-analysis-based workflows have steep learning curves (for fine tuning of analysis parameters) and result in long turnaround times between imaging and analysis. In particular, cell segmentation, the process of identifying individual cells in an image, is a major bottleneck in this regard. Here we present an alternate, cell-segmentation-free workflow based on PhenoRipper, an open-source software platform designed for the rapid analysis and exploration of microscopy images. The pipeline presented here is optimized for immunofluorescence microscopy images of cell cultures and requires minimal user intervention. Within half an hour, PhenoRipper can analyze data from a typical 96-well experiment and generate image profiles. Users can then visually explore their data, perform quality control on their experiment, verify responses to perturbations and check reproducibility of replicates. This facilitates a rapid feedback cycle between analysis and experiment, which is crucial during assay optimization. This protocol is useful not just as a first-pass analysis for quality control but may also be used as an end-to-end solution, especially for screening. The workflow described here scales to large data sets such as those generated by high-throughput screens, and has been shown to group experimental conditions by phenotype accurately over a wide range of biological systems. The PhenoBrowser interface provides an intuitive framework to explore the phenotypic space and relate image properties to biological annotations. Taken together, the protocol described here will lower the barriers to adopting quantitative analysis of image-based screens.
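
    The segmentation-free idea can be sketched roughly as follows: tile each image into fixed-size blocks, summarize the blocks, and compare wells by the resulting profiles. This is an illustrative sketch, not PhenoRipper's actual algorithm; the random arrays stand in for fluorescence images.

        # Sketch: segmentation-free image profiling by tiling images into blocks,
        # summarizing block intensities, and comparing per-well profiles.
        import numpy as np

        def image_profile(img, block=32, bins=8):
            """Histogram of block-mean intensities as a crude image profile."""
            h, w = img.shape
            means = [img[i:i + block, j:j + block].mean()
                     for i in range(0, h - block + 1, block)
                     for j in range(0, w - block + 1, block)]
            hist, _ = np.histogram(means, bins=bins, range=(0.0, 1.0), density=True)
            return hist

        rng = np.random.default_rng(0)
        well_a = image_profile(rng.random((256, 256)))          # stand-in image, condition A
        well_b = image_profile(rng.random((256, 256)) * 0.5)    # stand-in image, dimmer condition B
        print("profile distance:", np.linalg.norm(well_a - well_b))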

  10. A case study for cloud based high throughput analysis of NGS data using the globus genomics system

    DOE PAGES

    Bhuvaneshwar, Krithika; Sulakhe, Dinanath; Gauba, Robinder; ...

    2015-01-01

    Next generation sequencing (NGS) technologies produce massive amounts of data requiring a powerful computational infrastructure, high quality bioinformatics software, and skilled personnel to operate the tools. We present a case study of a practical solution to this data management and analysis challenge that simplifies terabyte scale data handling and provides advanced tools for NGS data analysis. These capabilities are implemented using the “Globus Genomics” system, which is an enhanced Galaxy workflow system made available as a service that offers users the capability to process and transfer data easily, reliably and quickly to address end-to-end NGS analysis requirements. The Globus Genomics system is built on Amazon's cloud computing infrastructure. The system takes advantage of elastic scaling of compute resources to run multiple workflows in parallel and it also helps meet the scale-out analysis needs of modern translational genomics research.

  11. A case study for cloud based high throughput analysis of NGS data using the globus genomics system

    PubMed Central

    Bhuvaneshwar, Krithika; Sulakhe, Dinanath; Gauba, Robinder; Rodriguez, Alex; Madduri, Ravi; Dave, Utpal; Lacinski, Lukasz; Foster, Ian; Gusev, Yuriy; Madhavan, Subha

    2014-01-01

    Next generation sequencing (NGS) technologies produce massive amounts of data requiring a powerful computational infrastructure, high quality bioinformatics software, and skilled personnel to operate the tools. We present a case study of a practical solution to this data management and analysis challenge that simplifies terabyte scale data handling and provides advanced tools for NGS data analysis. These capabilities are implemented using the “Globus Genomics” system, which is an enhanced Galaxy workflow system made available as a service that offers users the capability to process and transfer data easily, reliably and quickly to address end-to-end NGS analysis requirements. The Globus Genomics system is built on Amazon's cloud computing infrastructure. The system takes advantage of elastic scaling of compute resources to run multiple workflows in parallel and it also helps meet the scale-out analysis needs of modern translational genomics research. PMID:26925205

  12. Determination of equilibrium dissociation constants for recombinant antibodies by high-throughput affinity electrophoresis.

    PubMed

    Pan, Yuchen; Sackmann, Eric K; Wypisniak, Karolina; Hornsby, Michael; Datwani, Sammy S; Herr, Amy E

    2016-12-23

    High-quality immunoreagents enhance the performance and reproducibility of immunoassays and, in turn, the quality of both biological and clinical measurements. High quality recombinant immunoreagents are generated using antibody-phage display. One metric of antibody quality - the binding affinity - is quantified through the dissociation constant (KD) of each recombinant antibody and the target antigen. To characterize the KD of recombinant antibodies and target antigen, we introduce affinity electrophoretic mobility shift assays (EMSAs) in a high-throughput format suitable for small volume samples. A microfluidic card comprised of free-standing polyacrylamide gel (fsPAG) separation lanes supports 384 concurrent EMSAs in 30 s using a single power source. Sample is dispensed onto the microfluidic EMSA card by acoustic droplet ejection (ADE), which reduces EMSA variability compared to sample dispensing using manual or pin tools. The KD for each of a six-member fragment antigen-binding fragment library is reported using ~25-fold less sample mass and ~5-fold less time than conventional heterogeneous assays. Given the form factor and performance of this micro- and mesofluidic workflow, we have developed a sample-sparing, high-throughput, solution-phase alternative for biomolecular affinity characterization.
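
    A minimal sketch of the affinity estimation itself: fitting a 1:1 binding isotherm to the fraction of probe shifted as a function of titrated antigen concentration. The data points are synthetic and the single-site isotherm is a standard model, not necessarily the exact fitting procedure used in the paper.

        # Sketch: estimate an equilibrium dissociation constant (KD) by fitting a
        # 1:1 binding isotherm to mobility-shift data (fraction bound vs. antigen).
        import numpy as np
        from scipy.optimize import curve_fit

        def fraction_bound(antigen_nM, kd_nM):
            return antigen_nM / (kd_nM + antigen_nM)

        antigen = np.array([1, 3, 10, 30, 100, 300, 1000], dtype=float)   # nM (synthetic)
        shifted = np.array([0.05, 0.13, 0.33, 0.58, 0.80, 0.91, 0.97])    # fraction shifted

        (kd_fit,), cov = curve_fit(fraction_bound, antigen, shifted, p0=[50.0])
        print(f"KD ~ {kd_fit:.1f} nM (SE {np.sqrt(cov[0, 0]):.1f})")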

  13. Determination of equilibrium dissociation constants for recombinant antibodies by high-throughput affinity electrophoresis

    PubMed Central

    Pan, Yuchen; Sackmann, Eric K.; Wypisniak, Karolina; Hornsby, Michael; Datwani, Sammy S.; Herr, Amy E.

    2016-01-01

    High-quality immunoreagents enhance the performance and reproducibility of immunoassays and, in turn, the quality of both biological and clinical measurements. High quality recombinant immunoreagents are generated using antibody-phage display. One metric of antibody quality – the binding affinity – is quantified through the dissociation constant (KD) of each recombinant antibody and the target antigen. To characterize the KD of recombinant antibodies and target antigen, we introduce affinity electrophoretic mobility shift assays (EMSAs) in a high-throughput format suitable for small volume samples. A microfluidic card comprised of free-standing polyacrylamide gel (fsPAG) separation lanes supports 384 concurrent EMSAs in 30 s using a single power source. Sample is dispensed onto the microfluidic EMSA card by acoustic droplet ejection (ADE), which reduces EMSA variability compared to sample dispensing using manual or pin tools. The KD for each of a six-member fragment antigen-binding fragment library is reported using ~25-fold less sample mass and ~5-fold less time than conventional heterogeneous assays. Given the form factor and performance of this micro- and mesofluidic workflow, we have developed a sample-sparing, high-throughput, solution-phase alternative for biomolecular affinity characterization. PMID:28008969

  14. Efficient high-throughput biological process characterization: Definitive screening design with the ambr250 bioreactor system.

    PubMed

    Tai, Mitchell; Ly, Amanda; Leung, Inne; Nayar, Gautam

    2015-01-01

    The burgeoning pipeline for new biologic drugs has increased the need for high-throughput process characterization to efficiently use process development resources. Breakthroughs in highly automated and parallelized upstream process development have led to technologies such as the 250-mL automated mini bioreactor (ambr250™) system. Furthermore, developments in modern design of experiments (DoE) have promoted the use of definitive screening design (DSD) as an efficient method to combine factor screening and characterization. Here we utilize the 24-bioreactor ambr250™ system with 10-factor DSD to demonstrate a systematic experimental workflow to efficiently characterize an Escherichia coli (E. coli) fermentation process for recombinant protein production. The generated process model is further validated by laboratory-scale experiments and shows how the strategy is useful for quality by design (QbD) approaches to control strategies for late-stage characterization. © 2015 American Institute of Chemical Engineers.
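
    A rough sketch of the analysis side of such a design, assuming a 4-factor definitive screening design (fold-over pairs plus a center run) and a model with main effects and pure quadratic terms fitted by least squares; the run matrix and titer values are illustrative, not the study's 10-factor ambr250 design or data.

        # Sketch: fit intercept + main effects + quadratic terms to responses from a
        # small definitive-screening-design-style run set with coded (-1/0/+1) settings.
        import numpy as np

        X = np.array([[ 0,  1,  1,  1], [ 0, -1, -1, -1],
                      [ 1,  0,  1, -1], [-1,  0, -1,  1],
                      [ 1, -1,  0,  1], [-1,  1,  0, -1],
                      [ 1,  1, -1,  0], [-1, -1,  1,  0],
                      [ 0,  0,  0,  0]], dtype=float)            # fold-over pairs + center run
        y = np.array([8.2, 5.1, 7.4, 6.0, 9.0, 4.8, 7.9, 5.5, 6.6])   # e.g. titer (g/L), illustrative

        design = np.column_stack([np.ones(len(X)), X, X**2])     # intercept | linear | quadratic
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        labels = ["intercept"] + [f"x{i}" for i in range(1, 5)] + [f"x{i}^2" for i in range(1, 5)]
        for name, c in zip(labels, coef):
            print(f"{name:9s} {c:+.2f}")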

  15. Comparison of manual and automated AmpliSeq™ workflows in the typing of a Somali population with the Precision ID Identity Panel.

    PubMed

    van der Heijden, Suzanne; de Oliveira, Susanne Juel; Kampmann, Marie-Louise; Børsting, Claus; Morling, Niels

    2017-11-01

    The Precision ID Identity Panel was used to type 109 Somali individuals in order to obtain allele frequencies for the Somali population. These frequencies were used to establish a Somali HID-SNP database, which will be used for the biostatistical calculations in family and immigration cases. Genotypes obtained with the Precision ID Identity Panel were found to be almost in complete concordance with genotypes obtained with the SNPforID PCR-SBE-CE assay. In seven SNP loci, silent alleles were identified, of which most were previously described in the literature. The project also set out to compare different AmpliSeq™ workflows to investigate the possibility of using automated library building in forensic genetic casework. In order to do so, the SNP typing of the Somalis was performed using three different workflows: 1) manual library building and sequencing on the Ion PGM™, 2) automated library building using the Biomek® 3000 and sequencing on the Ion PGM™, and 3) automated library building using the Ion Chef™ and sequencing on the Ion S5™. AmpliSeq™ workflows were compared based on coverage, locus balance, noise, and heterozygote balance. Overall, the Ion Chef™/Ion S5™ workflow was found to give the best results and required the least hands-on time in the laboratory. However, the Ion Chef™/Ion S5™ workflow was also the most expensive. The number of libraries that may be constructed in one Ion Chef™ library building run was limited to eight, which is too few for high-throughput workflows. The Biomek® 3000/Ion PGM™ workflow was found to perform similarly to the manual/Ion PGM™ workflow. This argues for the use of automated library building in forensic genetic casework. Automated library building decreases the workload of the laboratory staff, decreases the risk of pipetting errors, and simplifies the daily workflow in forensic genetic laboratories. Copyright © 2017 Elsevier B.V. All rights reserved.
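
    Two of the comparison metrics mentioned above can be computed directly from per-locus allele read counts, as in the following sketch; the counts and the 0.2 heterozygote-call threshold are illustrative, not the study's values.

        # Sketch: locus balance and heterozygote balance from allele read counts.
        counts = {                      # locus -> (reads for allele 1, reads for allele 2)
            "rs1000": (480, 455),       # heterozygote
            "rs2000": (910, 3),         # homozygote (allele-2 reads are noise)
            "rs3000": (260, 240),       # heterozygote
        }

        total_per_locus = {loc: a + b for loc, (a, b) in counts.items()}
        mean_cov = sum(total_per_locus.values()) / len(total_per_locus)

        # Locus balance: coverage of each locus relative to the mean coverage.
        locus_balance = {loc: cov / mean_cov for loc, cov in total_per_locus.items()}

        # Heterozygote balance: lower/higher allele read ratio for heterozygous calls
        # (here, calls where the minor allele carries more than 20% of the reads).
        het_balance = {loc: min(a, b) / max(a, b)
                       for loc, (a, b) in counts.items() if min(a, b) / (a + b) > 0.2}

        print(locus_balance)
        print(het_balance)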

  16. Computational methods for evaluation of cell-based data assessment--Bioconductor.

    PubMed

    Le Meur, Nolwenn

    2013-02-01

    Recent advances in miniaturization and automation of technologies have enabled high-throughput screening of cell-based assays, bringing along new challenges in data analysis. Automation, standardization, and reproducibility have become requirements for qualitative research. The Bioconductor community has worked in that direction, proposing several R packages to handle high-throughput data, including flow cytometry (FCM) experiments. Altogether, these packages cover the main steps of an FCM analysis workflow, that is, data management, quality assessment, normalization, outlier detection, automated gating, cluster labeling, and feature extraction. Additionally, the open-source philosophy of R and Bioconductor, which offers room for new development, continuously drives research and improvement of these analysis methods, especially in the field of clustering and data mining. This review presents the principal FCM packages currently available in R and Bioconductor, their advantages and their limits. Copyright © 2012 Elsevier Ltd. All rights reserved.

  17. PIXiE: an algorithm for automated ion mobility arrival time extraction and collision cross section calculation using global data association

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ma, Jian; Casey, Cameron P.; Zheng, Xueyun

    Motivation: Drift tube ion mobility spectrometry (DTIMS) is increasingly implemented in high throughput omics workflows, and new informatics approaches are necessary for processing the associated data. To automatically extract arrival times for molecules measured by DTIMS coupled with mass spectrometry and compute their associated collisional cross sections (CCS), we created the PNNL Ion Mobility Cross Section Extractor (PIXiE). The primary application presented for this algorithm is the extraction of information necessary to create a reference library containing accurate masses, DTIMS arrival times and CCSs for use in high throughput omics analyses. Results: We demonstrate the utility of this approach by automatically extracting arrival times and calculating the associated CCSs for a set of endogenous metabolites and xenobiotics. The PIXiE-generated CCS values were identical to those calculated by hand and within error of those calculated using commercially available instrument vendor software.

  18. TCGA Workflow: Analyze cancer genomics and epigenomics data using Bioconductor packages

    PubMed Central

    Bontempi, Gianluca; Ceccarelli, Michele; Noushmehr, Houtan

    2016-01-01

    Biotechnological advances in sequencing have led to an explosion of publicly available data via large international consortia such as The Cancer Genome Atlas (TCGA), The Encyclopedia of DNA Elements (ENCODE), and The NIH Roadmap Epigenomics Mapping Consortium (Roadmap). These projects have provided unprecedented opportunities to interrogate the epigenome of cultured cancer cell lines as well as normal and tumor tissues with high genomic resolution. The Bioconductor project offers more than 1,000 open-source software and statistical packages to analyze high-throughput genomic data. However, most packages are designed for specific data types (e.g. expression, epigenetics, genomics) and there is no one comprehensive tool that provides a complete integrative analysis of the resources and data provided by all three public projects. A need to create an integration of these different analyses was recently proposed. In this workflow, we provide a series of biologically focused integrative analyses of different molecular data. We describe how to download, process and prepare TCGA data and, by harnessing several key Bioconductor packages, we describe how to extract biologically meaningful genomic and epigenomic data. Using Roadmap and ENCODE data, we provide a work plan to identify biologically relevant functional epigenomic elements associated with cancer. To illustrate our workflow, we analyzed two types of brain tumors: low-grade glioma (LGG) versus high-grade glioma (glioblastoma multiforme or GBM). This workflow introduces the following Bioconductor packages: AnnotationHub, ChIPSeeker, ComplexHeatmap, pathview, ELMER, GAIA, MINET, RTCGAToolbox, TCGAbiolinks. PMID:28232861

  19. TCGA Workflow: Analyze cancer genomics and epigenomics data using Bioconductor packages.

    PubMed

    Silva, Tiago C; Colaprico, Antonio; Olsen, Catharina; D'Angelo, Fulvio; Bontempi, Gianluca; Ceccarelli, Michele; Noushmehr, Houtan

    2016-01-01

    Biotechnological advances in sequencing have led to an explosion of publicly available data via large international consortia such as The Cancer Genome Atlas (TCGA), The Encyclopedia of DNA Elements (ENCODE), and The NIH Roadmap Epigenomics Mapping Consortium (Roadmap). These projects have provided unprecedented opportunities to interrogate the epigenome of cultured cancer cell lines as well as normal and tumor tissues with high genomic resolution. The Bioconductor project offers more than 1,000 open-source software and statistical packages to analyze high-throughput genomic data. However, most packages are designed for specific data types (e.g. expression, epigenetics, genomics) and there is no one comprehensive tool that provides a complete integrative analysis of the resources and data provided by all three public projects. A need to create an integration of these different analyses was recently proposed. In this workflow, we provide a series of biologically focused integrative analyses of different molecular data. We describe how to download, process and prepare TCGA data and, by harnessing several key Bioconductor packages, we describe how to extract biologically meaningful genomic and epigenomic data. Using Roadmap and ENCODE data, we provide a work plan to identify biologically relevant functional epigenomic elements associated with cancer. To illustrate our workflow, we analyzed two types of brain tumors: low-grade glioma (LGG) versus high-grade glioma (glioblastoma multiforme or GBM). This workflow introduces the following Bioconductor packages: AnnotationHub, ChIPSeeker, ComplexHeatmap, pathview, ELMER, GAIA, MINET, RTCGAToolbox, TCGAbiolinks.

  20. RNA isolation from mammalian cells using porous polymer monoliths: an approach for high-throughput automation.

    PubMed

    Chatterjee, Anirban; Mirer, Paul L; Zaldivar Santamaria, Elvira; Klapperich, Catherine; Sharon, Andre; Sauer-Budge, Alexis F

    2010-06-01

    The life science and healthcare communities have been redefining the importance of ribonucleic acid (RNA) through the study of small molecule RNA (in RNAi/siRNA technologies), micro RNA (in cancer research and stem cell research), and mRNA (gene expression analysis for biologic drug targets). Research in this field increasingly requires efficient and high-throughput isolation techniques for RNA. Currently, several commercial kits are available for isolating RNA from cells. Although the quality and quantity of RNA yielded from these kits is sufficiently good for many purposes, limitations exist in terms of extraction efficiency from small cell populations and the ability to automate the extraction process. Traditionally, automating a process decreases the cost and personnel time while simultaneously increasing the throughput and reproducibility. As the RNA field matures, new methods for automating its extraction, especially from low cell numbers and in high throughput, are needed to achieve these improvements. The technology presented in this article is a step toward this goal. The method is based on a solid-phase extraction technology using a porous polymer monolith (PPM). A novel cell lysis approach and a larger binding surface throughout the PPM extraction column ensure a high yield from small starting samples, increasing sensitivity and reducing indirect costs in cell culture and sample storage. The method ensures a fast and simple procedure for RNA isolation from eukaryotic cells, with a high yield both in terms of quality and quantity. The technique is amenable to automation and streamlined workflow integration, with possible miniaturization of the sample handling process making it suitable for high-throughput applications.

  1. Dashboard visualizations: Supporting real-time throughput decision-making.

    PubMed

    Franklin, Amy; Gantela, Swaroop; Shifarraw, Salsawit; Johnson, Todd R; Robinson, David J; King, Brent R; Mehta, Amit M; Maddow, Charles L; Hoot, Nathan R; Nguyen, Vickie; Rubio, Adriana; Zhang, Jiajie; Okafor, Nnaemeka G

    2017-07-01

    Providing timely and effective care in the emergency department (ED) requires the management of individual patients as well as the flow and demands of the entire department. Strategic changes to work processes, such as adding a flow coordination nurse or a physician in triage, have demonstrated improvements in throughput times. However, such global strategic changes do not address the real-time, often opportunistic workflow decisions of individual clinicians in the ED. We believe that real-time representation of the status of the entire emergency department and each patient within it through information visualizations will better support clinical decision-making in-the-moment and provide for rapid intervention to improve ED flow. This notion is based on previous work where we found that clinicians' workflow decisions were often based on an in-the-moment local perspective, rather than a global perspective. Here, we discuss the challenges of designing and implementing visualizations for ED through a discussion of the development of our prototype Throughput Dashboard and the potential it holds for supporting real-time decision-making. Copyright © 2017. Published by Elsevier Inc.

  2. First installation of a dual-room IVR-CT system in the emergency room.

    PubMed

    Wada, Daiki; Nakamori, Yasushi; Kanayama, Shuji; Maruyama, Shuhei; Kawada, Masahiro; Iwamura, Hiromu; Hayakawa, Koichi; Saito, Fukuki; Kuwagata, Yasuyuki

    2018-03-05

    Computed tomography (CT) embedded in the emergency room has gained importance in the early diagnostic phase of trauma care. In 2011, we implemented a new trauma workflow concept with a sliding CT scanner system with interventional radiology features (IVR-CT) that allows CT examination and emergency therapeutic intervention without relocating the patient, which we call the Hybrid emergency room (Hybrid ER). In the Hybrid ER, all life-saving procedures, CT examination, damage control surgery, and transcatheter arterial embolisation can be performed on the same table. Although the trauma workflow realized in the Hybrid ER may reduce mortality in severe trauma, the Hybrid ER can potentially affect the efficiency of other in/outpatient diagnostic workflows because one room is occupied by one severely injured patient undergoing both emergency trauma care and CT scanning for long periods. In July 2017, we implemented a new trauma workflow concept with a dual-room sliding CT scanner system with interventional radiology features (dual-room IVR-CT) to increase patient throughput. When we perform emergency surgery or interventional radiology for a severely injured or ill patient in the Hybrid ER, the sliding CT scanner moves to the adjacent CT suite, and we can perform CT scanning of another in/outpatient. We believe that dual-room IVR-CT can contribute to the improvement of both the survival of severely injured or ill patients and patient throughput.

  3. Commercialization of microfluidic devices.

    PubMed

    Volpatti, Lisa R; Yetisen, Ali K

    2014-07-01

    Microfluidic devices offer automation and high-throughput screening, and operate at low volumes of consumables. Although microfluidics has the potential to reduce turnaround times and costs for analytical devices, particularly in medical, veterinary, and environmental sciences, this enabling technology has had limited diffusion into consumer products. This article analyzes the microfluidics market, identifies issues, and highlights successful commercialization strategies. Addressing niche markets and establishing compatibility with existing workflows will accelerate market penetration. Copyright © 2014 Elsevier Ltd. All rights reserved.

  4. Optimization of subculture and DNA extraction steps within the whole genome sequencing workflow for source tracking of Salmonella enterica and Listeria monocytogenes.

    PubMed

    Gimonet, Johan; Portmann, Anne-Catherine; Fournier, Coralie; Baert, Leen

    2018-06-16

    This work shows that reducing the incubation time to 4-5 h when preparing a culture for DNA extraction, followed by automated DNA extraction, can shorten the hands-on time, reduce the turnaround time by 30%, and increase throughput while maintaining WGS quality, as assessed by high-quality single nucleotide polymorphism analysis. Copyright © 2018. Published by Elsevier B.V.

  5. HTAPP: High-Throughput Autonomous Proteomic Pipeline

    PubMed Central

    Yu, Kebing; Salomon, Arthur R.

    2011-01-01

    Recent advances in the speed and sensitivity of mass spectrometers and in analytical methods, the exponential acceleration of computer processing speeds, and the availability of genomic databases from an array of species and protein information databases have led to a deluge of proteomic data. The development of a lab-based automated proteomic software platform for the automated collection, processing, storage, and visualization of expansive proteomic datasets is critically important. The high-throughput autonomous proteomic pipeline (HTAPP) described here is designed from the ground up to provide critically important flexibility for diverse proteomic workflows and to streamline the total analysis of a complex proteomic sample. This tool is comprised of software that controls the acquisition of mass spectral data along with automation of post-acquisition tasks such as peptide quantification, clustered MS/MS spectral database searching, statistical validation, and data exploration within a user-configurable lab-based relational database. The software design of HTAPP focuses on accommodating diverse workflows and providing missing software functionality to a wide range of proteomic researchers to accelerate the extraction of biological meaning from immense proteomic data sets. Although individual software modules in our integrated technology platform may have some similarities to existing tools, the true novelty of the approach described here is in the synergistic and flexible combination of these tools to provide an integrated and efficient analysis of proteomic samples. PMID:20336676
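
    A much-reduced sketch of the post-acquisition side of such a pipeline: queueing per-run tasks and persisting peptide quantification results in a lab-local relational database (SQLite here); the schema, task list and values are illustrative placeholders, not HTAPP's actual design.

        # Sketch: minimal post-acquisition pipeline that processes queued MS runs
        # and stores peptide quantification results in a local relational database.
        import sqlite3

        def quantify(run_id):
            # Stand-in for peptide quantification on the acquired spectra.
            return [("PEPTIDER", 1.8e6), ("SAMPLEK", 9.2e5)]

        conn = sqlite3.connect("proteomics.db")
        conn.execute("""CREATE TABLE IF NOT EXISTS peptide_quant (
                            run_id TEXT, sequence TEXT, intensity REAL)""")

        task_queue = ["run_001", "run_002"]          # acquisitions awaiting processing
        for run_id in task_queue:
            rows = [(run_id, seq, inten) for seq, inten in quantify(run_id)]
            conn.executemany("INSERT INTO peptide_quant VALUES (?, ?, ?)", rows)
        conn.commit()

        for row in conn.execute("SELECT * FROM peptide_quant"):
            print(row)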

  6. iLAP: a workflow-driven software for experimental protocol development, data acquisition and analysis

    PubMed Central

    2009-01-01

    Background: In recent years, the genome biology community has expended considerable effort to confront the challenges of managing heterogeneous data in a structured and organized way and developed laboratory information management systems (LIMS) for both raw and processed data. On the other hand, electronic notebooks were developed to record and manage scientific data, and facilitate data-sharing. Software that enables both management of large datasets and digital recording of laboratory procedures would serve a real need in laboratories using medium and high-throughput techniques. Results: We have developed iLAP (Laboratory data management, Analysis, and Protocol development), a workflow-driven information management system specifically designed to create and manage experimental protocols, and to analyze and share laboratory data. The system combines experimental protocol development, wizard-based data acquisition, and high-throughput data analysis into a single, integrated system. We demonstrate the power and the flexibility of the platform using a microscopy case study based on a combinatorial multiple fluorescence in situ hybridization (m-FISH) protocol and 3D-image reconstruction. iLAP is freely available under the open source license AGPL from http://genome.tugraz.at/iLAP/. Conclusion: iLAP is a flexible and versatile information management system, which has the potential to close the gap between electronic notebooks and LIMS and can therefore be of great value for a broad scientific community. PMID:19941647

  7. PIXiE: an algorithm for automated ion mobility arrival time extraction and collision cross section calculation using global data association

    PubMed Central

    Ma, Jian; Casey, Cameron P.; Zheng, Xueyun; Ibrahim, Yehia M.; Wilkins, Christopher S.; Renslow, Ryan S.; Thomas, Dennis G.; Payne, Samuel H.; Monroe, Matthew E.; Smith, Richard D.; Teeguarden, Justin G.; Baker, Erin S.; Metz, Thomas O.

    2017-01-01

    Motivation: Drift tube ion mobility spectrometry coupled with mass spectrometry (DTIMS-MS) is increasingly implemented in high throughput omics workflows, and new informatics approaches are necessary for processing the associated data. To automatically extract arrival times for molecules measured by DTIMS at multiple electric fields and compute their associated collisional cross sections (CCS), we created the PNNL Ion Mobility Cross Section Extractor (PIXiE). The primary application presented for this algorithm is the extraction of data that can then be used to create a reference library of experimental CCS values for use in high throughput omics analyses. Results: We demonstrate the utility of this approach by automatically extracting arrival times and calculating the associated CCSs for a set of endogenous metabolites and xenobiotics. The PIXiE-generated CCS values were within error of those calculated using commercially available instrument vendor software. Availability and implementation: PIXiE is an open-source tool, freely available on Github. The documentation, source code of the software, and a GUI can be found at https://github.com/PNNL-Comp-Mass-Spec/PIXiE and the source code of the backend workflow library used by PIXiE can be found at https://github.com/PNNL-Comp-Mass-Spec/IMS-Informed-Library. Contact: erin.baker@pnnl.gov or thomas.metz@pnnl.gov Supplementary information: Supplementary data are available at Bioinformatics online. PMID:28505286
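
    The CCS computation itself typically rests on the Mason-Schamp relation between reduced mobility and cross section; the sketch below applies that standard relation for a singly charged ion in nitrogen. The numerical inputs are illustrative, and this is not PIXiE's source code.

        # Sketch: convert a measured reduced mobility (K0) to a collision cross
        # section with the Mason-Schamp equation, the standard drift-tube IMS relation.
        import math

        def ccs_angstrom2(k0_cm2_Vs, charge, ion_mass_da, gas_mass_da=28.0134, temp_K=298.0):
            """Mason-Schamp CCS (Angstrom^2) from reduced mobility K0 (cm^2 V^-1 s^-1)."""
            kB = 1.380649e-23             # Boltzmann constant, J/K
            e  = 1.602176634e-19          # elementary charge, C
            N0 = 2.6867811e25             # gas number density at 273.15 K and 1 atm, m^-3
            da = 1.66053907e-27           # kg per dalton
            mu = ion_mass_da * gas_mass_da / (ion_mass_da + gas_mass_da) * da   # reduced mass, kg
            k0 = k0_cm2_Vs * 1e-4         # cm^2/(V s) -> m^2/(V s)
            omega_m2 = (3 * charge * e / (16 * N0)) * math.sqrt(
                2 * math.pi / (mu * kB * temp_K)) / k0
            return omega_m2 * 1e20        # m^2 -> Angstrom^2

        # Singly charged 322.05 Da ion drifting in nitrogen buffer gas (illustrative values).
        print(f"CCS ~ {ccs_angstrom2(1.37, charge=1, ion_mass_da=322.05):.1f} A^2")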

  8. PIXiE: an algorithm for automated ion mobility arrival time extraction and collision cross section calculation using global data association.

    PubMed

    Ma, Jian; Casey, Cameron P; Zheng, Xueyun; Ibrahim, Yehia M; Wilkins, Christopher S; Renslow, Ryan S; Thomas, Dennis G; Payne, Samuel H; Monroe, Matthew E; Smith, Richard D; Teeguarden, Justin G; Baker, Erin S; Metz, Thomas O

    2017-09-01

    Drift tube ion mobility spectrometry coupled with mass spectrometry (DTIMS-MS) is increasingly implemented in high throughput omics workflows, and new informatics approaches are necessary for processing the associated data. To automatically extract arrival times for molecules measured by DTIMS at multiple electric fields and compute their associated collisional cross sections (CCS), we created the PNNL Ion Mobility Cross Section Extractor (PIXiE). The primary application presented for this algorithm is the extraction of data that can then be used to create a reference library of experimental CCS values for use in high throughput omics analyses. We demonstrate the utility of this approach by automatically extracting arrival times and calculating the associated CCSs for a set of endogenous metabolites and xenobiotics. The PIXiE-generated CCS values were within error of those calculated using commercially available instrument vendor software. PIXiE is an open-source tool, freely available on Github. The documentation, source code of the software, and a GUI can be found at https://github.com/PNNL-Comp-Mass-Spec/PIXiE and the source code of the backend workflow library used by PIXiE can be found at https://github.com/PNNL-Comp-Mass-Spec/IMS-Informed-Library. Contact: erin.baker@pnnl.gov or thomas.metz@pnnl.gov. Supplementary data are available at Bioinformatics online. © The Author(s) 2017. Published by Oxford University Press.

  9. A Metric and Workflow for Quality Control in the Analysis of Heterogeneity in Phenotypic Profiles and Screens

    PubMed Central

    Gough, Albert; Shun, Tongying; Taylor, D. Lansing; Schurdak, Mark

    2016-01-01

    Heterogeneity is well recognized as a common property of cellular systems that impacts biomedical research and the development of therapeutics and diagnostics. Several studies have shown that analysis of heterogeneity gives insight into mechanisms of action of perturbagens, can be used to predict optimal combination therapies, and can quantify heterogeneity in tumors, where heterogeneity is believed to be associated with adaptation and resistance. Cytometry methods including high content screening (HCS), high throughput microscopy, flow cytometry, mass spectrometry imaging and digital pathology capture cell-level data for populations of cells. However, it is often assumed that the population response is normally distributed and therefore that the average adequately describes the results. A deeper understanding of the results of the measurements and more effective comparison of perturbagen effects requires analysis that takes into account the distribution of the measurements, i.e. the heterogeneity. However, the reproducibility of heterogeneous data collected on different days, and in different plates/slides, has not previously been evaluated. Here we show that conventional assay quality metrics alone are not adequate for quality control of the heterogeneity in the data. To address this need, we demonstrate the use of the Kolmogorov-Smirnov statistic as a metric for monitoring the reproducibility of heterogeneity in an SAR screen and describe a workflow for quality control in heterogeneity analysis. One major challenge in high throughput biology is the evaluation and interpretation of heterogeneity in thousands of samples, such as compounds in a cell-based screen. In this study we also demonstrate that three previously reported heterogeneity indices capture the shapes of the distributions and provide a means to filter and browse big data sets of cellular distributions in order to compare and identify distributions of interest. These metrics and methods are presented as a workflow for analysis of heterogeneity in large scale biology projects. PMID:26476369
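
    A minimal sketch of the core QC check: comparing the cell-level distribution of a feature between two replicate plates with the two-sample Kolmogorov-Smirnov statistic (synthetic data stand in for per-cell measurements).

        # Sketch: quantify reproducibility of a cell-level feature distribution
        # between replicate plates with the two-sample Kolmogorov-Smirnov test.
        import numpy as np
        from scipy.stats import ks_2samp

        rng = np.random.default_rng(1)
        plate_day1 = rng.normal(loc=1.00, scale=0.3, size=2000)   # per-cell intensities, day 1
        plate_day2 = rng.normal(loc=1.05, scale=0.3, size=2000)   # replicate plate, day 2

        stat, pval = ks_2samp(plate_day1, plate_day2)
        print(f"KS statistic = {stat:.3f}, p = {pval:.3g}")
        # A large KS statistic between replicates flags poor reproducibility of the
        # full distribution even when plate means (and conventional metrics) agree.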

  10. Recent advances in high-throughput QCL-based infrared microspectral imaging (Conference Presentation)

    NASA Astrophysics Data System (ADS)

    Rowlette, Jeremy A.; Fotheringham, Edeline; Nichols, David; Weida, Miles J.; Kane, Justin; Priest, Allen; Arnone, David B.; Bird, Benjamin; Chapman, William B.; Caffey, David B.; Larson, Paul; Day, Timothy

    2017-02-01

    The field of infrared spectral imaging and microscopy is advancing rapidly due in large measure to the recent commercialization of the first high-throughput, high-spatial-definition quantum cascade laser (QCL) microscope. Having speed, resolution and noise performance advantages while also eliminating the need for cryogenic cooling, its introduction has established a clear path to translating the well-established diagnostic capability of infrared spectroscopy into clinical and pre-clinical histology, cytology and hematology workflows. Demand for even higher throughput while maintaining high-spectral fidelity and low-noise performance continues to drive innovation in QCL-based spectral imaging instrumentation. In this talk, we will present for the first time, recent technological advances in tunable QCL photonics which have led to an additional 10X enhancement in spectral image data collection speed while preserving the high spectral fidelity and SNR exhibited by the first generation of QCL microscopes. This new approach continues to leverage the benefits of uncooled microbolometer focal plane array cameras, which we find to be essential for ensuring both reproducibility of data across instruments and achieving the high-reliability needed in clinical applications. We will discuss the physics underlying these technological advancements as well as the new biomedical applications these advancements are enabling, including automated whole-slide infrared chemical imaging on clinically relevant timescales.

  11. High Resolution Melting (HRM) for High-Throughput Genotyping-Limitations and Caveats in Practical Case Studies.

    PubMed

    Słomka, Marcin; Sobalska-Kwapis, Marta; Wachulec, Monika; Bartosz, Grzegorz; Strapagiel, Dominik

    2017-11-03

    High resolution melting (HRM) is a convenient method for gene scanning as well as genotyping of individual and multiple single nucleotide polymorphisms (SNPs). This rapid, simple, closed-tube, homogenous, and cost-efficient approach has the capacity for high specificity and sensitivity, while allowing easy transition to high-throughput scale. In this paper, we provide examples from our laboratory practice of some problematic issues which can affect the performance and data analysis of HRM results, especially with regard to reference curve-based targeted genotyping. We present those examples in order of the typical experimental workflow, and discuss the crucial significance of the respective experimental errors and limitations for the quality and analysis of results. The experimental details which have a decisive impact on correct execution of a HRM genotyping experiment include type and quality of DNA source material, reproducibility of isolation method and template DNA preparation, primer and amplicon design, automation-derived preparation and pipetting inconsistencies, as well as physical limitations in melting curve distinction for alternative variants and careful selection of samples for validation by sequencing. We provide a case-by-case analysis and discussion of actual problems we encountered and solutions that should be taken into account by researchers newly attempting HRM genotyping, especially in a high-throughput setup.

  12. High Resolution Melting (HRM) for High-Throughput Genotyping—Limitations and Caveats in Practical Case Studies

    PubMed Central

    Słomka, Marcin; Sobalska-Kwapis, Marta; Wachulec, Monika; Bartosz, Grzegorz

    2017-01-01

    High resolution melting (HRM) is a convenient method for gene scanning as well as genotyping of individual and multiple single nucleotide polymorphisms (SNPs). This rapid, simple, closed-tube, homogenous, and cost-efficient approach has the capacity for high specificity and sensitivity, while allowing easy transition to high-throughput scale. In this paper, we provide examples from our laboratory practice of some problematic issues which can affect the performance and data analysis of HRM results, especially with regard to reference curve-based targeted genotyping. We present those examples in order of the typical experimental workflow, and discuss the crucial significance of the respective experimental errors and limitations for the quality and analysis of results. The experimental details which have a decisive impact on correct execution of a HRM genotyping experiment include type and quality of DNA source material, reproducibility of isolation method and template DNA preparation, primer and amplicon design, automation-derived preparation and pipetting inconsistencies, as well as physical limitations in melting curve distinction for alternative variants and careful selection of samples for validation by sequencing. We provide a case-by-case analysis and discussion of actual problems we encountered and solutions that should be taken into account by researchers newly attempting HRM genotyping, especially in a high-throughput setup. PMID:29099791

  13. KNIME4NGS: a comprehensive toolbox for next generation sequencing analysis.

    PubMed

    Hastreiter, Maximilian; Jeske, Tim; Hoser, Jonathan; Kluge, Michael; Ahomaa, Kaarin; Friedl, Marie-Sophie; Kopetzky, Sebastian J; Quell, Jan-Dominik; Mewes, H Werner; Küffner, Robert

    2017-05-15

    Analysis of Next Generation Sequencing (NGS) data requires the processing of large datasets by chaining various tools with complex input and output formats. In order to automate data analysis, we propose to standardize NGS tasks into modular workflows. This simplifies reliable handling and processing of NGS data, and corresponding solutions become substantially more reproducible and easier to maintain. Here, we present a documented, Linux-based toolbox of 42 processing modules that are combined to construct workflows facilitating a variety of tasks such as DNAseq and RNAseq analysis. We also describe important technical extensions. The high throughput executor (HTE) helps to increase the reliability and to reduce manual interventions when processing complex datasets. We also provide a dedicated binary manager that assists users in obtaining the modules' executables and keeping them up to date. As the basis for this actively developed toolbox we use the workflow management software KNIME. See http://ibisngs.github.io/knime4ngs for nodes and user manual (GPLv3 license). Contact: robert.kueffner@helmholtz-muenchen.de. Supplementary data are available at Bioinformatics online.

  14. A workflow to investigate exposure and pharmacokinetic ...

    EPA Pesticide Factsheets

    Adverse outcome pathways (AOP) link known population outcomes to a molecular initiating event (MIE) that can be quantified using high-throughput in vitro methods. Practical application of AOPs in chemical-specific risk assessment requires consideration of exposure and absorption, distribution, metabolism, excretion (ADME) properties of chemicals. We developed a conceptual workflow to consider exposure and ADME properties in relationship to an MIE and demonstrated the utility of this workflow using a previously established AOP, acetylcholinesterase (AChE) inhibition. Thirty active chemicals found to inhibit AChE in the ToxCast™ assay were examined with respect to their exposure and absorption potentials, and their ability to cross the blood-brain barrier. Structural similarities of active compounds were compared against structures of inactive compounds to detect possible non-active parents that might have active metabolites. Fifty-two of the 1,029 inactive compounds exhibited a similarity threshold above 75% with their nearest active neighbors. Excluding compounds that may not be absorbed, 22 could be potentially toxic following metabolism. The incorporation of exposure and ADME properties into the conceptual workflow resulted in prioritization of 20 out of 30 active compounds identified in an AChE inhibition assay for further analysis, along with identification of several inactive parent compounds of active metabolites. This qualitative approach can minimize co
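
    A rough sketch of the nearest-active similarity screen described above, flagging inactive compounds whose similarity to the closest active compound exceeds the 75% threshold; RDKit Morgan fingerprints and Tanimoto similarity are assumed here as stand-ins for whatever descriptor set the study actually used, and the two organophosphate SMILES are illustrative, not the study's compound lists.

        # Sketch: flag inactive parent compounds structurally close to an active compound,
        # i.e. candidates that may act through an active metabolite.
        from rdkit import Chem, DataStructs
        from rdkit.Chem import AllChem

        actives   = {"paraoxon-like":  "CCOP(=O)(OCC)Oc1ccc(cc1)[N+](=O)[O-]"}   # direct AChE inhibitor
        inactives = {"parathion-like": "CCOP(=S)(OCC)Oc1ccc(cc1)[N+](=O)[O-]"}   # inactive parent

        def fingerprint(smiles):
            mol = Chem.MolFromSmiles(smiles)
            return AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=2048)

        active_fps = [fingerprint(smi) for smi in actives.values()]

        for name, smi in inactives.items():
            nearest = max(DataStructs.TanimotoSimilarity(fingerprint(smi), fp) for fp in active_fps)
            verdict = "candidate active-metabolite parent" if nearest > 0.75 else "below threshold"
            print(f"{name}: nearest-active similarity {nearest:.2f} -> {verdict}")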

  15. Low-Cost, High-Throughput Sequencing of DNA Assemblies Using a Highly Multiplexed Nextera Process.

    PubMed

    Shapland, Elaine B; Holmes, Victor; Reeves, Christopher D; Sorokin, Elena; Durot, Maxime; Platt, Darren; Allen, Christopher; Dean, Jed; Serber, Zach; Newman, Jack; Chandran, Sunil

    2015-07-17

    In recent years, next-generation sequencing (NGS) technology has greatly reduced the cost of sequencing whole genomes, whereas the cost of sequence verification of plasmids via Sanger sequencing has remained high. Consequently, industrial-scale strain engineers either limit the number of designs or take short cuts in quality control. Here, we show that over 4000 plasmids can be completely sequenced in one Illumina MiSeq run for less than $3 each (15× coverage), which is a 20-fold reduction over using Sanger sequencing (2× coverage). We reduced the volume of the Nextera tagmentation reaction by 100-fold and developed an automated workflow to prepare thousands of samples for sequencing. We also developed software to track the samples and associated sequence data and to rapidly identify correctly assembled constructs having the fewest defects. As DNA synthesis and assembly become a centralized commodity, this NGS quality control (QC) process will be essential to groups operating high-throughput pipelines for DNA construction.
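
    The "fewest defects" triage step can be sketched as a simple ranking of per-construct variant counts derived from the read alignments; the counts below are illustrative placeholders, not output of the pipeline described.

        # Sketch: for each design, pick the physical clone whose assembled sequence
        # shows the fewest defects (mismatches/indels vs. the intended design).
        defects = {
            "design_A": {"clone_1": 0, "clone_2": 3, "clone_3": 1},
            "design_B": {"clone_1": 2, "clone_2": 2},
        }

        for design, clones in defects.items():
            best_clone, n_defects = min(clones.items(), key=lambda kv: kv[1])
            status = "accept" if n_defects == 0 else "review"
            print(f"{design}: best clone {best_clone} with {n_defects} defect(s) -> {status}")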

  16. pGlyco 2.0 enables precision N-glycoproteomics with comprehensive quality control and one-step mass spectrometry for intact glycopeptide identification.

    PubMed

    Liu, Ming-Qi; Zeng, Wen-Feng; Fang, Pan; Cao, Wei-Qian; Liu, Chao; Yan, Guo-Quan; Zhang, Yang; Peng, Chao; Wu, Jian-Qiang; Zhang, Xiao-Jin; Tu, Hui-Jun; Chi, Hao; Sun, Rui-Xiang; Cao, Yong; Dong, Meng-Qiu; Jiang, Bi-Yun; Huang, Jiang-Ming; Shen, Hua-Li; Wong, Catherine C L; He, Si-Min; Yang, Peng-Yuan

    2017-09-05

    The precise and large-scale identification of intact glycopeptides is a critical step in glycoproteomics. Owing to the complexity of glycosylation, the current overall throughput, data quality and accessibility of intact glycopeptide identification lag behind those in routine proteomic analyses. Here, we propose a workflow for the precise high-throughput identification of intact N-glycopeptides at the proteome scale using stepped-energy fragmentation and a dedicated search engine. pGlyco 2.0 conducts comprehensive quality control including false discovery rate evaluation at all three levels of matches to glycans, peptides and glycopeptides, improving the current level of accuracy of intact glycopeptide identification. The N-glycoproteome of samples metabolically labeled with 15N/13C was analyzed quantitatively and utilized to validate the glycopeptide identification, which could be used as a novel benchmark pipeline to compare different search engines. Finally, we report a large-scale glycoproteome dataset consisting of 10,009 distinct site-specific N-glycans on 1988 glycosylation sites from 955 glycoproteins in five mouse tissues. Protein glycosylation is a heterogeneous post-translational modification that generates greater proteomic diversity that is difficult to analyze. Here the authors describe pGlyco 2.0, a workflow for the precise one-step identification of intact N-glycopeptides at the proteome scale.

  17. DDBJ read annotation pipeline: a cloud computing-based pipeline for high-throughput analysis of next-generation sequencing data.

    PubMed

    Nagasaki, Hideki; Mochizuki, Takako; Kodama, Yuichi; Saruhashi, Satoshi; Morizaki, Shota; Sugawara, Hideaki; Ohyanagi, Hajime; Kurata, Nori; Okubo, Kousaku; Takagi, Toshihisa; Kaminuma, Eli; Nakamura, Yasukazu

    2013-08-01

    High-performance next-generation sequencing (NGS) technologies are advancing genomics and molecular biological research. However, the immense amount of sequence data requires computational skills and suitable hardware resources that are a challenge to molecular biologists. The DNA Data Bank of Japan (DDBJ) of the National Institute of Genetics (NIG) has initiated a cloud computing-based analytical pipeline, the DDBJ Read Annotation Pipeline (DDBJ Pipeline), for a high-throughput annotation of NGS reads. The DDBJ Pipeline offers a user-friendly graphical web interface and processes massive NGS datasets using decentralized processing by NIG supercomputers currently free of charge. The proposed pipeline consists of two analysis components: basic analysis for reference genome mapping and de novo assembly and subsequent high-level analysis of structural and functional annotations. Users may smoothly switch between the two components in the pipeline, facilitating web-based operations on a supercomputer for high-throughput data analysis. Moreover, public NGS reads of the DDBJ Sequence Read Archive located on the same supercomputer can be imported into the pipeline through the input of only an accession number. This proposed pipeline will facilitate research by utilizing unified analytical workflows applied to the NGS data. The DDBJ Pipeline is accessible at http://p.ddbj.nig.ac.jp/.

  18. DDBJ Read Annotation Pipeline: A Cloud Computing-Based Pipeline for High-Throughput Analysis of Next-Generation Sequencing Data

    PubMed Central

    Nagasaki, Hideki; Mochizuki, Takako; Kodama, Yuichi; Saruhashi, Satoshi; Morizaki, Shota; Sugawara, Hideaki; Ohyanagi, Hajime; Kurata, Nori; Okubo, Kousaku; Takagi, Toshihisa; Kaminuma, Eli; Nakamura, Yasukazu

    2013-01-01

    High-performance next-generation sequencing (NGS) technologies are advancing genomics and molecular biological research. However, the immense amount of sequence data requires computational skills and suitable hardware resources that are a challenge to molecular biologists. The DNA Data Bank of Japan (DDBJ) of the National Institute of Genetics (NIG) has initiated a cloud computing-based analytical pipeline, the DDBJ Read Annotation Pipeline (DDBJ Pipeline), for a high-throughput annotation of NGS reads. The DDBJ Pipeline offers a user-friendly graphical web interface and processes massive NGS datasets using decentralized processing by NIG supercomputers currently free of charge. The proposed pipeline consists of two analysis components: basic analysis for reference genome mapping and de novo assembly and subsequent high-level analysis of structural and functional annotations. Users may smoothly switch between the two components in the pipeline, facilitating web-based operations on a supercomputer for high-throughput data analysis. Moreover, public NGS reads of the DDBJ Sequence Read Archive located on the same supercomputer can be imported into the pipeline through the input of only an accession number. This proposed pipeline will facilitate research by utilizing unified analytical workflows applied to the NGS data. The DDBJ Pipeline is accessible at http://p.ddbj.nig.ac.jp/. PMID:23657089

  19. Closha: bioinformatics workflow system for the analysis of massive sequencing data.

    PubMed

    Ko, GunHwan; Kim, Pan-Gyu; Yoon, Jongcheol; Han, Gukhee; Park, Seong-Jin; Song, Wangho; Lee, Byungwook

    2018-02-19

    While next-generation sequencing (NGS) costs have fallen in recent years, the cost and complexity of computation remain substantial obstacles to the use of NGS in bio-medical care and genomic research. The rapidly increasing amounts of data available from the new high-throughput methods have made data processing infeasible without automated pipelines. The integration of data and analytic resources into workflow systems provides a solution to the problem by simplifying the task of data analysis. To address this challenge, we developed a cloud-based workflow management system, Closha, to provide fast and cost-effective analysis of massive genomic data. We implemented complex workflows making optimal use of high-performance computing clusters. Closha allows users to create multi-step analyses using drag and drop functionality and to modify the parameters of pipeline tools. Users can also import the Galaxy pipelines into Closha. Closha is a hybrid system that enables users to use both analysis programs providing traditional tools and MapReduce-based big data analysis programs simultaneously in a single pipeline. Thus, the execution of analytics algorithms can be parallelized, speeding up the whole process. We also developed a high-speed data transmission solution, KoDS, to transmit a large amount of data at a fast rate. KoDS has a file transfer speed of up to 10 times that of normal FTP and HTTP. The computer hardware for Closha is 660 CPU cores and 800 TB of disk storage, enabling 500 jobs to run at the same time. Closha is a scalable, cost-effective, and publicly available web service for large-scale genomic data analysis. Closha supports the reliable and highly scalable execution of sequencing analysis workflows in a fully automated manner. Closha provides a user-friendly interface to all genomic scientists to try to derive accurate results from NGS platform data. The Closha cloud server is freely available for use from http://closha.kobic.re.kr/ .

  20. Text mining for the biocuration workflow.

    PubMed

    Hirschman, Lynette; Burns, Gully A P C; Krallinger, Martin; Arighi, Cecilia; Cohen, K Bretonnel; Valencia, Alfonso; Wu, Cathy H; Chatr-Aryamontri, Andrew; Dowell, Karen G; Huala, Eva; Lourenço, Anália; Nash, Robert; Veuthey, Anne-Lise; Wiegers, Thomas; Winter, Andrew G

    2012-01-01

    Molecular biology has become heavily dependent on biological knowledge encoded in expert curated biological databases. As the volume of biological literature increases, biocurators need help in keeping up with the literature; (semi-) automated aids for biocuration would seem to be an ideal application for natural language processing and text mining. However, to date, there have been few documented successes for improving biocuration throughput using text mining. Our initial investigations took place for the workshop on 'Text Mining for the BioCuration Workflow' at the third International Biocuration Conference (Berlin, 2009). We interviewed biocurators to obtain workflows from eight biological databases. This initial study revealed high-level commonalities, including (i) selection of documents for curation; (ii) indexing of documents with biologically relevant entities (e.g. genes); and (iii) detailed curation of specific relations (e.g. interactions); however, the detailed workflows also showed many variabilities. Following the workshop, we conducted a survey of biocurators. The survey identified biocurator priorities, including the handling of full text indexed with biological entities and support for the identification and prioritization of documents for curation. It also indicated that two-thirds of the biocuration teams had experimented with text mining and almost half were using text mining at that time. Analysis of our interviews and survey provide a set of requirements for the integration of text mining into the biocuration workflow. These can guide the identification of common needs across curated databases and encourage joint experimentation involving biocurators, text mining developers and the larger biomedical research community.

  1. Analog to digital workflow improvement: a quantitative study.

    PubMed

    Wideman, Catherine; Gallet, Jacqueline

    2006-01-01

    This study tracked a radiology department's conversion from a Kodak Amber analog system to a Kodak DirectView DR 5100 digital system. Through the use of ProModel Optimization Suite, a workflow simulation software package, significant quantitative information was derived from workflow process data measured before and after the change to a digital system. Once the digital room was fully operational and the radiology staff comfortable with the new system, average patient examination time was reduced from 9.24 to 5.28 min, indicating that a higher patient throughput could be achieved. Compared to the analog system, chest examination time for modality-specific activities was reduced by 43%. The repeat-examination rate also decreased, from 9.5% with the analog system to 8% with the digital system. The study indicated that it is possible to quantitatively study clinical workflow and productivity using commercially available software.

  2. MetaGenSense: A web-application for analysis and exploration of high throughput sequencing metagenomic data

    PubMed Central

    Denis, Jean-Baptiste; Vandenbogaert, Mathias; Caro, Valérie

    2016-01-01

    The detection and characterization of emerging infectious agents has been a continuing public health concern. High Throughput Sequencing (HTS) or Next-Generation Sequencing (NGS) technologies have proven to be promising approaches for efficient and unbiased detection of pathogens in complex biological samples, providing access to comprehensive analyses. As NGS approaches typically yield millions of putatively representative reads per sample, efficient data management and visualization resources have become mandatory. Usually, those resources are implemented through a dedicated Laboratory Information Management System (LIMS), solely to provide perspective on the available information. We developed an easily deployable web interface that facilitates the management and bioinformatics analysis of metagenomics data samples. It was engineered to run associated and dedicated Galaxy workflows for the detection and, where possible, classification of pathogens. The web application allows easy interaction with existing Galaxy metagenomic workflows and facilitates the organization, exploration and aggregation of the most relevant sample-specific sequences among millions of genomic sequences, allowing users to determine their relative abundance and associate them with the most closely related organism or pathogen. The user-friendly Django-based interface associates the user's input data and its metadata with a set of bio-IT resources (a Galaxy instance, plus sufficient storage and grid computing power). Galaxy is used to handle and analyze the user's input data, from loading and indexing through mapping, assembly and database searches. Interaction between our application and Galaxy is ensured by the BioBlend library, which gives API-based access to Galaxy's main features. Metadata about samples and runs, as well as the workflow results, are stored in the LIMS. For metagenomic classification and exploration purposes, we show, as a proof of concept, that intuitive exploratory tools, such as Krona for representing taxonomic classification, can be integrated very easily. Following Galaxy's model, the interface enables the sharing of scientific results with fellow team members. PMID:28451381
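
    Since the abstract names BioBlend as the glue between the Django interface and Galaxy, a minimal Python sketch of that style of integration may help readers; the server URL, API key, workflow name, and input file below are hypothetical placeholders, not values from MetaGenSense itself.

      # Minimal BioBlend sketch: upload a sample's reads and invoke a pre-built
      # Galaxy workflow. All names and credentials are hypothetical placeholders.
      from bioblend.galaxy import GalaxyInstance

      gi = GalaxyInstance(url="https://galaxy.example.org", key="YOUR_API_KEY")

      # Create a fresh history to hold this sample's inputs and results.
      history = gi.histories.create_history(name="sample_042_metagenomics")

      # Upload the raw reads into that history.
      upload = gi.tools.upload_file("sample_042.fastq.gz", history["id"])
      dataset_id = upload["outputs"][0]["id"]

      # Look up the pre-built pathogen-detection workflow by name.
      workflow = next(w for w in gi.workflows.get_workflows()
                      if w["name"] == "pathogen_detection")

      # Map the uploaded dataset onto the workflow's first input and launch it.
      inputs = {"0": {"src": "hda", "id": dataset_id}}
      invocation = gi.workflows.invoke_workflow(
          workflow["id"], inputs=inputs, history_id=history["id"])
      print("Invocation state:", invocation["state"])

    A driver along these lines can then poll the invocation and register the resulting datasets and metadata in the LIMS, mirroring the flow the abstract describes.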

  3. MetaGenSense: A web-application for analysis and exploration of high throughput sequencing metagenomic data.

    PubMed

    Correia, Damien; Doppelt-Azeroual, Olivia; Denis, Jean-Baptiste; Vandenbogaert, Mathias; Caro, Valérie

    2015-01-01

    The detection and characterization of emerging infectious agents has been a continuing public health concern. High Throughput Sequencing (HTS) or Next-Generation Sequencing (NGS) technologies have proven to be promising approaches for efficient and unbiased detection of pathogens in complex biological samples, providing access to comprehensive analyses. As NGS approaches typically yield millions of putatively representative reads per sample, efficient data management and visualization resources have become mandatory. Usually, those resources are implemented through a dedicated Laboratory Information Management System (LIMS), solely to provide perspective on the available information. We developed an easily deployable web interface that facilitates the management and bioinformatics analysis of metagenomics data samples. It was engineered to run associated and dedicated Galaxy workflows for the detection and, where possible, classification of pathogens. The web application allows easy interaction with existing Galaxy metagenomic workflows and facilitates the organization, exploration and aggregation of the most relevant sample-specific sequences among millions of genomic sequences, allowing users to determine their relative abundance and associate them with the most closely related organism or pathogen. The user-friendly Django-based interface associates the user's input data and its metadata with a set of bio-IT resources (a Galaxy instance, plus sufficient storage and grid computing power). Galaxy is used to handle and analyze the user's input data, from loading and indexing through mapping, assembly and database searches. Interaction between our application and Galaxy is ensured by the BioBlend library, which gives API-based access to Galaxy's main features. Metadata about samples and runs, as well as the workflow results, are stored in the LIMS. For metagenomic classification and exploration purposes, we show, as a proof of concept, that intuitive exploratory tools, such as Krona for representing taxonomic classification, can be integrated very easily. Following Galaxy's model, the interface enables the sharing of scientific results with fellow team members.

  4. The Hemiptera (Insecta) of Canada: Constructing a Reference Library of DNA Barcodes

    PubMed Central

    Gwiazdowski, Rodger A.; Foottit, Robert G.; Maw, H. Eric L.; Hebert, Paul D. N.

    2015-01-01

    DNA barcode reference libraries linked to voucher specimens create new opportunities for high-throughput identification and taxonomic re-evaluations. This study provides a DNA barcode library for about 45% of the recognized species of Canadian Hemiptera, and the publicly available R workflow used for its generation. The current library is based on the analysis of 20,851 specimens including 1849 species belonging to 628 genera and 64 families. These individuals were assigned to 1867 Barcode Index Numbers (BINs), sequence clusters that often coincide with species recognized through prior taxonomy. Museum collections were a key source of identified specimens, but we also employed high-throughput collection methods that generated large numbers of unidentified specimens. Many of these specimens represented novel BINs that were subsequently identified by taxonomists, adding barcode coverage for additional species. Our analyses based on both approaches include 94 species not listed in the most recent Canadian checklist, representing a potential 3% increase in the fauna. We discuss the development of our workflow in the context of prior DNA barcode library construction projects, emphasizing the importance of delineating a set of reference specimens to aid investigations in cases of nomenclatural and DNA barcode discordance. The identification of each specimen in the reference set can be annotated on the Barcode of Life Data System (BOLD), allowing experts to highlight questionable identifications; annotations can be added by any registered user of BOLD, and instructions for this are provided. PMID:25923328

  5. CellSegm - a MATLAB toolbox for high-throughput 3D cell segmentation

    PubMed Central

    2013-01-01

    The application of fluorescence microscopy in cell biology often generates a huge amount of imaging data. Automated whole-cell segmentation of such data enables the detection and analysis of individual cells, where manual delineation is often time-consuming or practically infeasible. Furthermore, compared to manual analysis, automation normally has a higher degree of reproducibility. CellSegm, the software presented in this work, is a MATLAB-based command-line software toolbox providing automated whole-cell segmentation of images of surface-stained cells acquired by fluorescence microscopy. It has options for both fully automated and semi-automated cell segmentation. The major algorithmic steps are: (i) smoothing, (ii) Hessian-based ridge enhancement, (iii) marker-controlled watershed segmentation, and (iv) feature-based classification of cell candidates. Using a wide selection of image recordings and code snippets, we demonstrate that CellSegm can detect various types of surface-stained cells in 3D. After detection and outlining of individual cells, the cell candidates can be subjected to software-based analysis, specified and programmed by the end-user, or analyzed by other software tools. A segmentation of tissue samples with appropriate characteristics is also shown to be resolvable in CellSegm. The command-line interface of CellSegm facilitates scripting of the separate tools, all implemented in MATLAB, offering a high degree of flexibility and tailored workflows for the end-user. The modularity and scripting capabilities of CellSegm enable automated workflows and quantitative analysis of microscopic data, suited for high-throughput image-based screening. PMID:23938087
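
    CellSegm itself is a MATLAB toolbox, but the four algorithmic steps listed above translate readily to other image-processing stacks. The Python/scikit-image sketch below is only an illustrative analogue of steps (i)-(iv) under assumed parameters, not the toolbox's actual implementation.

      # Illustrative analogue of the CellSegm pipeline using scikit-image:
      # smoothing -> ridge enhancement -> marker-controlled watershed -> size filter.
      import numpy as np
      from scipy import ndimage as ndi
      from skimage import filters, measure, segmentation

      def segment_surface_stained(image):
          """Segment surface-stained cells in a 2D image (or one 3D slice)."""
          # (i) Smoothing to suppress noise.
          smoothed = filters.gaussian(image, sigma=2)

          # (ii) Hessian-based ridge enhancement of the stained cell surfaces
          # (Sato's tubeness filter stands in for this step).
          ridges = filters.sato(smoothed, black_ridges=False)

          # Markers: eroded cell interiors, i.e. regions of low ridge response.
          interior = ridges < filters.threshold_otsu(ridges)
          markers, _ = ndi.label(ndi.binary_erosion(interior, iterations=3))

          # (iii) Marker-controlled watershed on the ridge image.
          labels = segmentation.watershed(ridges, markers)

          # (iv) Crude feature-based filtering of cell candidates by area.
          keep = {r.label for r in measure.regionprops(labels)
                  if 200 < r.area < 20000}
          return np.where(np.isin(labels, list(keep)), labels, 0)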

  6. CellSegm - a MATLAB toolbox for high-throughput 3D cell segmentation.

    PubMed

    Hodneland, Erlend; Kögel, Tanja; Frei, Dominik Michael; Gerdes, Hans-Hermann; Lundervold, Arvid

    2013-08-09

    The application of fluorescence microscopy in cell biology often generates a huge amount of imaging data. Automated whole-cell segmentation of such data enables the detection and analysis of individual cells, where manual delineation is often time-consuming or practically infeasible. Furthermore, compared to manual analysis, automation normally has a higher degree of reproducibility. CellSegm, the software presented in this work, is a MATLAB-based command-line software toolbox providing automated whole-cell segmentation of images of surface-stained cells acquired by fluorescence microscopy. It has options for both fully automated and semi-automated cell segmentation. The major algorithmic steps are: (i) smoothing, (ii) Hessian-based ridge enhancement, (iii) marker-controlled watershed segmentation, and (iv) feature-based classification of cell candidates. Using a wide selection of image recordings and code snippets, we demonstrate that CellSegm can detect various types of surface-stained cells in 3D. After detection and outlining of individual cells, the cell candidates can be subjected to software-based analysis, specified and programmed by the end-user, or analyzed by other software tools. A segmentation of tissue samples with appropriate characteristics is also shown to be resolvable in CellSegm. The command-line interface of CellSegm facilitates scripting of the separate tools, all implemented in MATLAB, offering a high degree of flexibility and tailored workflows for the end-user. The modularity and scripting capabilities of CellSegm enable automated workflows and quantitative analysis of microscopic data, suited for high-throughput image-based screening.

  7. Electron capture dissociation in a branched radio-frequency ion trap.

    PubMed

    Baba, Takashi; Campbell, J Larry; Le Blanc, J C Yves; Hager, James W; Thomson, Bruce A

    2015-01-06

    We have developed a high-throughput electron capture dissociation (ECD) device coupled to a quadrupole time-of-flight mass spectrometer using a novel branched radio-frequency ion trap architecture. With this device, a low-energy electron beam can be injected orthogonally into the analytical ion beam with independent control of both the ion and electron beams. While ions and electrons can interact in a "flow-through" mode, we observed a large enhancement in ECD efficiency by introducing a short ion trapping period at the region where the ion and electron beams intersect. This simultaneous trapping mode still provides up to five ECD spectra per second while operating in an information-dependent acquisition workflow. Coupled to liquid chromatography (LC), this LC-ECD workflow provides good sequence coverage for both trypsin and Lys-C digests of bovine serum albumin, yielding ECD spectra for doubly charged precursor ions with very good efficiency.

  8. Characterising and correcting batch variation in an automated direct infusion mass spectrometry (DIMS) metabolomics workflow.

    PubMed

    Kirwan, J A; Broadhurst, D I; Davidson, R L; Viant, M R

    2013-06-01

    Direct infusion mass spectrometry (DIMS)-based untargeted metabolomics measures many hundreds of metabolites in a single experiment. While every effort is made to reduce within-experiment analytical variation in untargeted metabolomics, unavoidable sources of measurement error are introduced. This is particularly true for large-scale multi-batch experiments, necessitating the development of robust workflows that minimise batch-to-batch variation. Here, we conducted a purpose-designed, eight-batch DIMS metabolomics study using nanoelectrospray (nESI) Fourier transform ion cyclotron resonance mass spectrometric analyses of mammalian heart extracts. First, we characterised the intrinsic analytical variation of this approach to determine whether our existing workflows are fit for purpose when applied to a multi-batch investigation. Batch-to-batch variation was readily observed across the 7-day experiment, both in terms of its absolute measurement using quality control (QC) and biological replicate samples, as well as its adverse impact on our ability to discover significant metabolic information within the data. Subsequently, we developed and implemented a computational workflow that includes total-ion-current filtering, QC-robust spline batch correction and spectral cleaning, and provide conclusive evidence that this workflow reduces analytical variation and increases the proportion of significant peaks. We report an overall analytical precision of 15.9%, measured as the median relative standard deviation (RSD) for the technical replicates of the biological samples, across eight batches and 7 days of measurements. When compared against the FDA guidelines for biomarker studies, which specify an RSD of <20% as an acceptable level of precision, we conclude that our new workflows are fit for purpose for large-scale, high-throughput nESI DIMS metabolomics studies.
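
    As a schematic illustration of the two computations at the heart of such a workflow -- a QC-based spline correction of drift and the median relative standard deviation used to judge precision -- the Python sketch below may be useful. It is a simplified analogue under assumed inputs, not the authors' implementation, and the array names are hypothetical.

      # Schematic QC-spline drift correction and median RSD for one peak.
      # Inputs are hypothetical NumPy arrays; QC injections are assumed to be
      # sorted by run order.
      import numpy as np
      from scipy.interpolate import UnivariateSpline

      def qc_spline_correct(intensity, run_order, is_qc):
          """Correct one peak's intensities using a spline fitted to QC samples."""
          spline = UnivariateSpline(run_order[is_qc], intensity[is_qc], k=3)
          trend = spline(run_order)          # predicted drift at every injection
          return intensity * np.median(intensity[is_qc]) / trend

      def median_rsd(values, replicate_ids):
          """Median relative standard deviation (%) over technical replicate groups."""
          rsds = []
          for rep in np.unique(replicate_ids):
              vals = values[replicate_ids == rep]
              rsds.append(100 * np.std(vals, ddof=1) / np.mean(vals))
          return np.median(rsds)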

  9. qPortal: A platform for data-driven biomedical research.

    PubMed

    Mohr, Christopher; Friedrich, Andreas; Wojnar, David; Kenar, Erhan; Polatkan, Aydin Can; Codrea, Marius Cosmin; Czemmel, Stefan; Kohlbacher, Oliver; Nahnsen, Sven

    2018-01-01

    Modern biomedical research aims at drawing biological conclusions from large, highly complex biological datasets. It has become common practice to make extensive use of high-throughput technologies that produce large amounts of heterogeneous data. In addition to ever-improving accuracy, methods are getting faster and cheaper, resulting in a steadily increasing need for scalable data management and easily accessible means of analysis. We present qPortal, a platform providing users with an intuitive way to manage and analyze quantitative biological data. The backend leverages a variety of concepts and technologies, such as relational databases, data stores, data models and means of data transfer, as well as front-end solutions to give users access to data management and easy-to-use analysis options. Users are empowered to conduct their experiments from the experimental design to the visualization of their results through the platform. Here, we illustrate the feature-rich portal by simulating a biomedical study based on publicly available data. We demonstrate the software's strength in supporting the entire project life cycle. The software supports project design and registration, empowers users to do all-digital project management, and provides the means to perform analysis. We compare our approach to Galaxy, one of the most widely used scientific workflow and analysis platforms in computational biology. Application of both systems to a small case study shows the differences between a data-driven approach (qPortal) and a workflow-driven approach (Galaxy). qPortal, a one-stop-shop solution for biomedical projects, offers up-to-date analysis pipelines, quality control workflows, and visualization tools. Through intensive user interactions, appropriate data models have been developed. These models build the foundation of our biological data management system and provide possibilities to annotate data, query metadata for statistics, and support future re-analysis on high-performance computing systems via coupling of workflow management systems. The integration of project and data management as well as workflow resources in one place presents clear advantages over existing solutions.

  10. Evaluation of the NanoCHIP® Gastrointestinal Panel (GIP) Test for Simultaneous Detection of Parasitic and Bacterial Enteric Pathogens in Fecal Specimens

    PubMed Central

    Ken Dror, Shifra; Pavlotzky, Elsa; Barak, Mira

    2016-01-01

    Infectious gastroenteritis is a global health problem associated with high morbidity and mortality rates. Rapid and accurate diagnosis is crucial to allow appropriate and timely treatment. Current laboratory stool testing has a long turnaround time (TAT) and demands highly qualified personnel and multiple techniques. The need for high throughput and the number of possible enteric pathogens compel the implementation of a molecular approach using multiplex technology, without compromising performance requirements. In this work we evaluated the feasibility of the NanoCHIP® Gastrointestinal Panel (GIP) (Savyon Diagnostics, Ashdod, IL), a molecular microarray-based screening test, for use in the routine workflow of our laboratory, a large outpatient microbiology laboratory. The NanoCHIP® GIP test provides simultaneous detection of nine major enteric bacteria and parasites: Campylobacter spp., Salmonella spp., Shigella spp., Giardia sp., Cryptosporidium spp., Entamoeba histolytica, Entamoeba dispar, Dientamoeba fragilis, and Blastocystis spp. The required high throughput was obtained by the NanoCHIP® detection system together with the MagNA Pure 96 DNA purification system (Roche Diagnostics Ltd., Switzerland). This combined system demonstrated higher sensitivity and detection yield than the conventional methods in both retrospective and prospective samples. The identification of multiple parasites and bacteria in a single test also increased the efficiency of detecting mixed infections and reduced hands-on time and workload. In conclusion, the combination of these two automated systems addresses the laboratory's needs by improving laboratory workflow and turnaround time and minimizing human error, and it can be efficiently integrated into the routine work of the laboratory. PMID:27447173

  11. Initial steps towards a production platform for DNA sequence analysis on the grid.

    PubMed

    Luyf, Angela C M; van Schaik, Barbera D C; de Vries, Michel; Baas, Frank; van Kampen, Antoine H C; Olabarriaga, Silvia D

    2010-12-14

    Bioinformatics is confronted with a new data explosion due to the availability of high-throughput DNA sequencers. Data storage and analysis become a problem on local servers, so a switch to other IT infrastructures is needed. Grid and workflow technology can help to handle the data more efficiently, as well as facilitate collaborations. However, interfaces to grids are often unfriendly to novice users. In this study we reused a platform that was developed in the VL-e project for the analysis of medical images. Data transfer, workflow execution and job monitoring are operated from one graphical interface. We developed workflows for two sequence alignment tools (BLAST and BLAT) as a proof of concept. The analysis time was significantly reduced. All workflows and executables are available to the members of the Dutch Life Science Grid and the VL-e Medical virtual organizations. All components are open source and can be transported to other grid infrastructures. The availability of in-house expertise and tools facilitates the usage of grid resources by new users. Our first results indicate that this is a practical, powerful and scalable solution to address the capacity and collaboration issues raised by the deployment of next-generation sequencers. We currently adopt this methodology on a daily basis for DNA sequencing and other applications. More information and source code are available via http://www.bioinformaticslaboratory.nl/

  12. Spatially-Resolved Proteomics: Rapid Quantitative Analysis of Laser Capture Microdissected Alveolar Tissue Samples

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Clair, Geremy; Piehowski, Paul D.; Nicola, Teodora

    Global proteomics approaches allow characterization of whole tissue lysates to an impressive depth. However, it is now increasingly recognized that to better understand the complexity of multicellular organisms, global protein profiling of specific spatially defined regions/substructures of tissues (i.e. spatially-resolved proteomics) is essential. Laser capture microdissection (LCM) enables microscopic isolation of defined regions of tissues, preserving crucial spatial information. However, current proteomics workflows entail several manual sample preparation steps and are challenged by the microscopic, mass-limited samples generated by LCM, which impact measurement robustness, quantification, and throughput. Here, we coupled LCM with a fully automated sample preparation workflow that, with a single manual step, allows protein extraction, tryptic digestion, peptide cleanup and LC-MS/MS analysis of proteomes from microdissected tissues. Benchmarking against the current state of the art in ultrasensitive global proteomic analysis, our approach demonstrated significant improvements in quantification and throughput. Using our LCM-SNaPP proteomics approach, we characterized, to a depth of more than 3,400 proteins, the ontogeny of protein changes during normal lung development in laser capture microdissected alveolar tissue containing ~4,000 cells per sample. Importantly, the data revealed quantitative changes for 350 low-abundance transcription factors and signaling molecules, confirming earlier transcript-level observations and defining seven modules of coordinated transcription factor/signaling molecule expression patterns, suggesting that a complex network of temporal regulatory control directs normal lung development, with epigenetic regulation fine-tuning prenatal developmental processes. Our LCM-proteomics approach facilitates efficient, spatially-resolved, ultrasensitive global proteomics analyses in high throughput that will be enabling for several clinical and biological applications.

  13. A real-time phenotyping framework using machine learning for plant stress severity rating in soybean.

    PubMed

    Naik, Hsiang Sing; Zhang, Jiaoping; Lofquist, Alec; Assefa, Teshale; Sarkar, Soumik; Ackerman, David; Singh, Arti; Singh, Asheesh K; Ganapathysubramanian, Baskar

    2017-01-01

    Phenotyping is a critical component of plant research. Accurate and precise trait collection, when integrated with genetic tools, can greatly accelerate the rate of genetic gain in crop improvement. However, efficient and automatic phenotyping of traits across large populations is a challenge, which is further exacerbated by the necessity of sampling multiple environments and growing replicated trials. A promising approach is to leverage current advances in imaging technology, data analytics and machine learning to enable automated and fast phenotyping and subsequent decision support. In this context, the workflow for phenotyping (image capture → data storage and curation → trait extraction → machine learning/classification → models/apps for decision support) has to be carefully designed and efficiently executed to minimize resource usage and maximize utility. We illustrate such an end-to-end phenotyping workflow for the case of plant stress severity phenotyping in soybean, with a specific focus on the rapid and automatic assessment of iron deficiency chlorosis (IDC) severity on thousands of field plots. We showcase this analytics framework by extracting IDC features from a set of ~4500 unique canopies representing a diverse germplasm base with different levels of IDC, and subsequently training a variety of classification models to predict plant stress severity. The best classifier is then deployed as a smartphone app for rapid and real-time severity rating in the field. We investigated 10 different classification approaches, with the best classifier being a hierarchical classifier with a mean per-class accuracy of ~96%. We construct a phenotypically meaningful 'population canopy graph', connecting the automatically extracted canopy trait features with plant stress severity rating. We incorporated this image capture → image processing → classification workflow into a smartphone app that enables automated real-time evaluation of IDC scores using digital images of the canopy. We expect this high-throughput framework to help increase the rate of genetic gain by providing a robust, extendable framework for other abiotic and biotic stresses. We further envision this workflow embedded onto a high-throughput phenotyping ground vehicle and unmanned aerial system that will allow real-time, automated stress trait detection and quantification for plant research, breeding and stress scouting applications.
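
    The trait-extraction-to-classification step of such a workflow reduces, in code, to fitting a classifier on a plot-by-feature matrix and reporting mean per-class accuracy. The sketch below is a generic scikit-learn illustration with hypothetical input files and a stand-in random forest; the study's best-performing model was a hierarchical classifier.

      # Generic "trait matrix -> stress severity classifier" sketch (scikit-learn).
      # Feature/label files and the model choice are illustrative placeholders.
      import numpy as np
      from sklearn.ensemble import RandomForestClassifier
      from sklearn.metrics import balanced_accuracy_score
      from sklearn.model_selection import train_test_split

      X = np.load("canopy_features.npy")   # one row of extracted traits per plot
      y = np.load("idc_severity.npy")      # IDC severity rating per plot

      X_train, X_test, y_train, y_test = train_test_split(
          X, y, test_size=0.2, stratify=y, random_state=0)

      clf = RandomForestClassifier(n_estimators=300, random_state=0)
      clf.fit(X_train, y_train)

      # Balanced accuracy corresponds to the mean per-class accuracy quoted above.
      print("Mean per-class accuracy:",
            balanced_accuracy_score(y_test, clf.predict(X_test)))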

  14. BioImageXD: an open, general-purpose and high-throughput image-processing platform.

    PubMed

    Kankaanpää, Pasi; Paavolainen, Lassi; Tiitta, Silja; Karjalainen, Mikko; Päivärinne, Joacim; Nieminen, Jonna; Marjomäki, Varpu; Heino, Jyrki; White, Daniel J

    2012-06-28

    BioImageXD puts open-source computer science tools for three-dimensional visualization and analysis into the hands of all researchers, through a user-friendly graphical interface tuned to the needs of biologists. BioImageXD has no restrictive licenses or undisclosed algorithms and enables publication of precise, reproducible and modifiable workflows. It allows simple construction of processing pipelines and should enable biologists to perform challenging analyses of complex processes. We demonstrate its performance in a study of integrin clustering in response to selected inhibitors.

  15. A fluorescence high throughput screening method for the detection of reactive electrophiles as potential skin sensitizers

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Avonto, Cristina; Chittiboyina, Amar G.; Rua, Diego

    2015-12-01

    Skin sensitization is an important toxicological end-point in the risk assessment of chemical allergens. Because of the complexity of the biological mechanisms associated with skin sensitization, integrated approaches combining different chemical, biological and in silico methods are recommended to replace conventional animal tests. Chemical methods are intended to characterize the potential of a sensitizer to induce earlier molecular initiating events. The presence of an electrophilic mechanistic domain is considered one of the essential chemical features to covalently bind the biological target and induce further haptenation processes. Current in chemico assays rely on the quantification of unreacted model nucleophiles after incubation with the candidate sensitizer. In the current study, a new fluorescence-based method, the 'HTS-DCYA assay', is proposed. The assay aims at the identification of reactive electrophiles based on their chemical reactivity toward a model fluorescent thiol. The reaction workflow enabled the development of a High Throughput Screening (HTS) method to directly quantify the reaction adducts. The reaction conditions have been optimized to minimize solubility issues and oxidative side reactions and to increase the throughput of the assay while minimizing the reaction time, which are common issues with existing methods. Thirty-six chemicals previously classified with LLNA, DPRA or KeratinoSens™ were tested as a proof of concept. Preliminary results gave an estimated 82% accuracy, 78% sensitivity, and 90% specificity, comparable to other in chemico methods such as Cys-DPRA. In addition to validated chemicals, six natural products were analyzed and a prediction of their sensitization potential is presented for the first time. Highlights: • A novel fluorescence-based method to detect electrophilic sensitizers is proposed. • A model fluorescent thiol was used to directly quantify the reaction products. • A discussion of the reaction workflow and critical parameters is presented. • The method could provide a useful tool to complement existing chemical assays.

  16. Fully Automated Sample Preparation for Ultrafast N-Glycosylation Analysis of Antibody Therapeutics.

    PubMed

    Szigeti, Marton; Lew, Clarence; Roby, Keith; Guttman, Andras

    2016-04-01

    There is a growing demand in the biopharmaceutical industry for high-throughput, large-scale N-glycosylation profiling of therapeutic antibodies in all phases of product development, but especially during clone selection when hundreds of samples should be analyzed in a short period of time to assure their glycosylation-based biological activity. Our group has recently developed a magnetic bead-based protocol for N-glycosylation analysis of glycoproteins to alleviate the hard-to-automate centrifugation and vacuum-centrifugation steps of the currently used protocols. Glycan release, fluorophore labeling, and cleanup were all optimized, resulting in a <4 h magnetic bead-based process with excellent yield and good repeatability. This article demonstrates the next level of this work by automating all steps of the optimized magnetic bead-based protocol from endoglycosidase digestion, through fluorophore labeling and cleanup with high-throughput sample processing in 96-well plate format, using an automated laboratory workstation. Capillary electrophoresis analysis of the fluorophore-labeled glycans was also optimized for rapid (<3 min) separation to accommodate the high-throughput processing of the automated sample preparation workflow. Ultrafast N-glycosylation analyses of several commercially relevant antibody therapeutics are also shown and compared to their biosimilar counterparts, addressing the biological significance of the differences. © 2015 Society for Laboratory Automation and Screening.

  17. ChemHTPS - A virtual high-throughput screening program suite for the chemical and materials sciences

    NASA Astrophysics Data System (ADS)

    Afzal, Mohammad Atif Faiz; Evangelista, William; Hachmann, Johannes

    The discovery of new compounds, materials, and chemical reactions with exceptional properties is key to addressing the grand challenges in innovation, energy, and sustainability. This process can be dramatically accelerated by means of the virtual high-throughput screening (HTPS) of large-scale candidate libraries. The resulting data can further be used to study the underlying structure-property relationships and thus facilitate rational design capability. This approach has been used extensively for many years in the drug discovery community. However, the lack of openly available virtual HTPS tools limits the use of these techniques in various other applications such as photovoltaics, optoelectronics, and catalysis. Thus, we developed ChemHTPS, a general-purpose, comprehensive, and user-friendly suite that allows users to efficiently perform large in silico modeling studies and high-throughput analyses in these applications. ChemHTPS also includes a massively parallel molecular library generator, which offers a multitude of options to customize and restrict the scope of the enumerated chemical space and thus tailor it to the demands of specific applications. To streamline the non-combinatorial exploration of chemical space, we incorporate genetic algorithms into the framework. In addition to implementing smarter algorithms, we also focus on ease of use, workflow, and code integration to make this technology more accessible to the community.

  18. The ChIP-exo Method: Identifying Protein-DNA Interactions with Near Base Pair Precision.

    PubMed

    Perreault, Andrea A; Venters, Bryan J

    2016-12-23

    Chromatin immunoprecipitation (ChIP) is an indispensable tool in the fields of epigenetics and gene regulation that isolates specific protein-DNA interactions. ChIP coupled to high throughput sequencing (ChIP-seq) is commonly used to determine the genomic location of proteins that interact with chromatin. However, ChIP-seq is hampered by relatively low mapping resolution of several hundred base pairs and high background signal. The ChIP-exo method is a refined version of ChIP-seq that substantially improves upon both resolution and noise. The key distinction of the ChIP-exo methodology is the incorporation of lambda exonuclease digestion in the library preparation workflow to effectively footprint the left and right 5' DNA borders of the protein-DNA crosslink site. The ChIP-exo libraries are then subjected to high throughput sequencing. The resulting data can be leveraged to provide unique and ultra-high resolution insights into the functional organization of the genome. Here, we describe the ChIP-exo method that we have optimized and streamlined for mammalian systems and next-generation sequencing-by-synthesis platform.

  19. HTSeq--a Python framework to work with high-throughput sequencing data.

    PubMed

    Anders, Simon; Pyl, Paul Theodor; Huber, Wolfgang

    2015-01-15

    A large choice of tools exists for many standard tasks in the analysis of high-throughput sequencing (HTS) data. However, once a project deviates from standard workflows, custom scripts are needed. We present HTSeq, a Python library to facilitate the rapid development of such scripts. HTSeq offers parsers for many common data formats in HTS projects, as well as classes to represent data, such as genomic coordinates, sequences, sequencing reads, alignments, gene model information and variant calls, and provides data structures that allow for querying via genomic coordinates. We also present htseq-count, a tool developed with HTSeq that preprocesses RNA-Seq data for differential expression analysis by counting the overlap of reads with genes. HTSeq is released as an open-source software under the GNU General Public Licence and available from http://www-huber.embl.de/HTSeq or from the Python Package Index at https://pypi.python.org/pypi/HTSeq. © The Author 2014. Published by Oxford University Press.
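
    The kind of custom script HTSeq is designed for can be sketched in a few lines; the example below counts uniquely assignable reads per gene, following the pattern shown in the HTSeq documentation, with placeholder file names.

      # Counting reads per gene with HTSeq, in the spirit of htseq-count.
      # "annotation.gtf" and "sample.bam" are placeholder file names.
      import collections
      import HTSeq

      # Map exon intervals to gene IDs in a GenomicArrayOfSets.
      exons = HTSeq.GenomicArrayOfSets("auto", stranded=False)
      for feature in HTSeq.GFF_Reader("annotation.gtf"):
          if feature.type == "exon":
              exons[feature.iv] += feature.attr["gene_id"]

      # Assign each alignment to a gene if it overlaps exactly one gene's exons.
      counts = collections.Counter()
      for aln in HTSeq.BAM_Reader("sample.bam"):
          if not aln.aligned:
              counts["_unmapped"] += 1
              continue
          gene_ids = set()
          for iv, step_set in exons[aln.iv].steps():
              gene_ids |= step_set
          if len(gene_ids) == 1:
              counts[gene_ids.pop()] += 1
          elif gene_ids:
              counts["_ambiguous"] += 1
          else:
              counts["_no_feature"] += 1

      for gene, n in counts.most_common(10):
          print(gene, n)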

  20. Xi-cam: Flexible High Throughput Data Processing for GISAXS

    NASA Astrophysics Data System (ADS)

    Pandolfi, Ronald; Kumar, Dinesh; Venkatakrishnan, Singanallur; Sarje, Abinav; Krishnan, Hari; Pellouchoud, Lenson; Ren, Fang; Fournier, Amanda; Jiang, Zhang; Tassone, Christopher; Mehta, Apurva; Sethian, James; Hexemer, Alexander

    With increasing capabilities and data demands at GISAXS beamlines, supporting software is under development to handle larger data rates, volumes, and processing needs. We aim to provide a flexible and extensible approach to GISAXS data treatment as a solution to these rising needs. Xi-cam is the CAMERA platform for data management, analysis, and visualization. The core of Xi-cam is an extensible plugin-based GUI platform which provides users an interactive interface to processing algorithms. Plugins are available for SAXS/GISAXS data and data-series visualization, as well as forward modeling and simulation through HipGISAXS. With Xi-cam's advanced mode, data processing steps are designed as a graph-based workflow, which can be executed locally or remotely. Remote execution utilizes HPC or de-localized resources, allowing for effective reduction of high-throughput data. Xi-cam is open-source and cross-platform. The processing algorithms in Xi-cam include parallel CPU and GPU processing optimizations, and also take advantage of external processing packages such as pyFAI. Xi-cam is available for download online.

  1. Scalable and High-Throughput Execution of Clinical Quality Measures from Electronic Health Records using MapReduce and the JBoss® Drools Engine

    PubMed Central

    Peterson, Kevin J.; Pathak, Jyotishman

    2014-01-01

    Automated execution of electronic Clinical Quality Measures (eCQMs) from electronic health records (EHRs) on large patient populations remains a significant challenge, and the testability, interoperability, and scalability of measure execution are critical. The High Throughput Phenotyping (HTP; http://phenotypeportal.org) project aligns with these goals by using the standards-based HL7 Health Quality Measures Format (HQMF) and Quality Data Model (QDM) for measure specification, as well as Common Terminology Services 2 (CTS2) for semantic interpretation. The HQMF/QDM representation is automatically transformed into a JBoss® Drools workflow, enabling horizontal scalability via clustering and MapReduce algorithms. Using Project Cypress, automated verification metrics can then be produced. Our results show linear scalability for nine executed 2014 Center for Medicare and Medicaid Services (CMS) eCQMs for eligible professionals and hospitals for >1,000,000 patients, and verified execution correctness of 96.4% based on Project Cypress test data of 58 eCQMs. PMID:25954459
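
    The HTP platform itself executes HQMF/QDM measures as JBoss Drools rules, but the reason MapReduce helps is simply that each patient record can be evaluated independently and the per-patient results summed. The toy Python sketch below illustrates only that map/reduce split, with entirely hypothetical patient records and measure criteria.

      # Toy map/reduce over patient records for an eCQM-style numerator/denominator
      # count; records and criteria are hypothetical, purely to show why measure
      # execution parallelizes (the HTP platform itself uses HQMF/QDM + Drools).
      from collections import Counter
      from functools import reduce
      from multiprocessing import Pool

      def map_patient(patient):
          """Emit denominator/numerator flags for a single patient."""
          in_denom = 18 <= patient["age"] <= 75 and patient["has_diabetes"]
          in_numer = in_denom and patient["last_hba1c"] < 8.0
          return Counter(denominator=int(in_denom), numerator=int(in_numer))

      if __name__ == "__main__":
          patients = [
              {"age": 54, "has_diabetes": True, "last_hba1c": 7.1},
              {"age": 67, "has_diabetes": True, "last_hba1c": 9.4},
              {"age": 33, "has_diabetes": False, "last_hba1c": 5.2},
          ]
          with Pool() as pool:
              totals = reduce(lambda a, b: a + b,
                              pool.map(map_patient, patients), Counter())
          print(totals)   # Counter({'denominator': 2, 'numerator': 1})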

  2. The ESFRI Instruct Core Centre Frankfurt: automated high-throughput crystallization suited for membrane proteins and more.

    PubMed

    Thielmann, Yvonne; Koepke, Juergen; Michel, Hartmut

    2012-06-01

    Structure determination of membrane proteins and membrane protein complexes is still a very challenging field. To facilitate work on membrane proteins, the Core Centre follows a strategy that comprises four labs for protein analytics and crystal handling, covering mass spectrometry, calorimetry, crystallization and X-ray diffraction. This general workflow is presented, and 20% of the operating time of all systems is provided to the European structural biology community within the ESFRI Instruct program. A description of the crystallization service offered at the Core Centre is given, with detailed information on screening strategy, the screens used, and changes made to adapt high throughput for membrane proteins. Our aim is to continually develop the Core Centre towards the use of more efficient methods. This strategy might also include the ability to automate all steps from crystallization trials to crystal screening; here we look ahead to how this aim might be realized at the Core Centre.

  3. The iPlant Collaborative: Cyberinfrastructure for Enabling Data to Discovery for the Life Sciences.

    PubMed

    Merchant, Nirav; Lyons, Eric; Goff, Stephen; Vaughn, Matthew; Ware, Doreen; Micklos, David; Antin, Parker

    2016-01-01

    The iPlant Collaborative provides life science research communities access to comprehensive, scalable, and cohesive computational infrastructure for data management; identity management; collaboration tools; and cloud, high-performance, high-throughput computing. iPlant provides training, learning material, and best practice resources to help all researchers make the best use of their data, expand their computational skill set, and effectively manage their data and computation when working as distributed teams. iPlant's platform permits researchers to easily deposit and share their data and deploy new computational tools and analysis workflows, allowing the broader community to easily use and reuse those data and computational analyses.

  4. GiA Roots: software for the high throughput analysis of plant root system architecture.

    PubMed

    Galkovskyi, Taras; Mileyko, Yuriy; Bucksch, Alexander; Moore, Brad; Symonova, Olga; Price, Charles A; Topp, Christopher N; Iyer-Pascuzzi, Anjali S; Zurek, Paul R; Fang, Suqin; Harer, John; Benfey, Philip N; Weitz, Joshua S

    2012-07-26

    Characterizing root system architecture (RSA) is essential to understanding the development and function of vascular plants. Identifying RSA-associated genes also represents an underexplored opportunity for crop improvement. Software tools are needed to accelerate the pace at which quantitative traits of RSA are estimated from images of root networks. We have developed GiA Roots (General Image Analysis of Roots), a semi-automated software tool designed specifically for the high-throughput analysis of root system images. GiA Roots includes user-assisted algorithms to distinguish root from background and a fully automated pipeline that extracts dozens of root system phenotypes. Quantitative information on each phenotype, along with intermediate steps for full reproducibility, is returned to the end-user for downstream analysis. GiA Roots has a GUI front end and a command-line interface for interweaving the software into large-scale workflows. GiA Roots can also be extended to estimate novel phenotypes specified by the end-user. We demonstrate the use of GiA Roots on a set of 2393 images of rice roots representing 12 genotypes from the species Oryza sativa. We validate trait measurements against prior analyses of this image set that demonstrated that RSA traits are likely heritable and associated with genotypic differences. Moreover, we demonstrate that GiA Roots is extensible and an end-user can add functionality so that GiA Roots can estimate novel RSA traits. In summary, we show that the software can function as an efficient tool as part of a workflow to move from large numbers of root images to downstream analysis.

  5. HiCTMap: Detection and analysis of chromosome territory structure and position by high-throughput imaging.

    PubMed

    Jowhar, Ziad; Gudla, Prabhakar R; Shachar, Sigal; Wangsa, Darawalee; Russ, Jill L; Pegoraro, Gianluca; Ried, Thomas; Raznahan, Armin; Misteli, Tom

    2018-06-01

    The spatial organization of chromosomes in the nuclear space is an extensively studied field that relies on measurements of structural features and 3D positions of chromosomes with high precision and robustness. However, no tools are currently available to image and analyze chromosome territories in a high-throughput format. Here, we have developed High-throughput Chromosome Territory Mapping (HiCTMap), a method for the robust and rapid analysis of 2D and 3D chromosome territory positioning in mammalian cells. HiCTMap is a high-throughput imaging-based chromosome detection method which enables routine analysis of chromosome structure and nuclear position. Using an optimized FISH staining protocol in a 384-well plate format in conjunction with a bespoke automated image analysis workflow, HiCTMap faithfully detects chromosome territories and their position in 2D and 3D in a large population of cells per experimental condition. We apply this novel technique to visualize chromosomes 18, X, and Y in male and female primary human skin fibroblasts, and show accurate detection of the correct number of chromosomes in the respective genotypes. Given the ability to visualize and quantitatively analyze large numbers of nuclei, we use HiCTMap to measure chromosome territory area and volume with high precision and determine the radial position of chromosome territories using either centroid or equidistant-shell analysis. The HiCTMap protocol is also compatible with RNA FISH as demonstrated by simultaneous labeling of X chromosomes and Xist RNA in female cells. We suggest HiCTMap will be a useful tool for routine precision mapping of chromosome territories in a wide range of cell types and tissues. Published by Elsevier Inc.

  6. Three-dimensional Imaging and Scanning: Current and Future Applications for Pathology

    PubMed Central

    Farahani, Navid; Braun, Alex; Jutt, Dylan; Huffman, Todd; Reder, Nick; Liu, Zheng; Yagi, Yukako; Pantanowitz, Liron

    2017-01-01

    Imaging is vital for the assessment of physiologic and phenotypic details. In the past, biomedical imaging was heavily reliant on analog, low-throughput methods, which would produce two-dimensional images. However, newer, digital, and high-throughput three-dimensional (3D) imaging methods, which rely on computer vision and computer graphics, are transforming the way biomedical professionals practice. 3D imaging has been useful in diagnostic, prognostic, and therapeutic decision-making for the medical and biomedical professions. Herein, we summarize current imaging methods that enable optimal 3D histopathologic reconstruction: Scanning, 3D scanning, and whole slide imaging. Briefly mentioned are emerging platforms, which combine robotics, sectioning, and imaging in their pursuit to digitize and automate the entire microscopy workflow. Finally, both current and emerging 3D imaging methods are discussed in relation to current and future applications within the context of pathology. PMID:28966836

  7. Illumina GA IIx & HiSeq 2000 Production Sequencing and QC Analysis Pipelines at the DOE Joint Genome Institute

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Daum, Christopher; Zane, Matthew; Han, James

    2011-01-31

    The U.S. Department of Energy (DOE) Joint Genome Institute's (JGI) Production Sequencing group is committed to the generation of high-quality genomic DNA sequence to support the mission areas of renewable energy generation, global carbon management, and environmental characterization and clean-up. Within the JGI's Production Sequencing group, a robust Illumina Genome Analyzer and HiSeq pipeline has been established. Optimization of the sequencer pipelines has been ongoing, with the aim of continual process improvement of the laboratory workflow, reducing operational costs and project cycle times, increasing sample throughput, and improving the overall quality of the sequence generated. A sequence QC analysis pipeline has been implemented to automatically generate read- and assembly-level quality metrics. The foremost of these optimization projects, along with sequencing and operational strategies, throughput numbers, and sequencing quality results, will be presented.

  8. Using the iPlant collaborative discovery environment.

    PubMed

    Oliver, Shannon L; Lenards, Andrew J; Barthelson, Roger A; Merchant, Nirav; McKay, Sheldon J

    2013-06-01

    The iPlant Collaborative is an academic consortium whose mission is to develop an informatics and social infrastructure to address the "grand challenges" in plant biology. Its cyberinfrastructure supports the computational needs of the research community and facilitates solving major challenges in plant science. The Discovery Environment provides a powerful and rich graphical interface to the iPlant Collaborative cyberinfrastructure by creating an accessible virtual workbench that enables all levels of expertise, ranging from students to traditional biology researchers and computational experts, to explore, analyze, and share their data. By providing access to iPlant's robust data-management system and high-performance computing resources, the Discovery Environment also creates a unified space in which researchers can access scalable tools. Researchers can use available Applications (Apps) to execute analyses on their data, as well as customize or integrate their own tools to better meet the specific needs of their research. These Apps can also be used in workflows that automate more complicated analyses. This module describes how to use the main features of the Discovery Environment, using bioinformatics workflows for high-throughput sequence data as examples. © 2013 by John Wiley & Sons, Inc.

  9. Characterizing Phage Genomes for Therapeutic Applications

    PubMed Central

    Philipson, Casandra W.; Voegtly, Logan J.; Lueder, Matthew R.; Long, Kyle A.; Rice, Gregory K.; Frey, Kenneth G.; Biswas, Biswajit; Cer, Regina Z.; Hamilton, Theron; Bishop-Lilly, Kimberly A.

    2018-01-01

    Multi-drug resistance is increasing at alarming rates. The efficacy of phage therapy, treating bacterial infections with bacteriophages alone or in combination with traditional antibiotics, has been demonstrated in emergency cases in the United States and in other countries; however, it remains to be approved for wide-spread use in the US. One limiting factor is a lack of guidelines for assessing the genomic safety of phage candidates. We present the phage characterization workflow used by our team to generate data for submitting phages to the Food and Drug Administration (FDA) for authorized use. Essential analysis checkpoints and warnings are detailed for obtaining high-quality genomes, excluding undesirable candidates, rigorously assessing a phage genome for safety, and evaluating sequencing contamination. This workflow has been developed in accordance with community standards for high-throughput sequencing of viral genomes as well as principles for ideal phages used for therapy. The feasibility and utility of the pipeline are demonstrated on two new phage genomes that meet all safety criteria. We propose these guidelines as a minimum standard for phages being submitted to the FDA for review as investigational new drug candidates. PMID:29642590

  10. Text mining for the biocuration workflow

    PubMed Central

    Hirschman, Lynette; Burns, Gully A. P. C; Krallinger, Martin; Arighi, Cecilia; Cohen, K. Bretonnel; Valencia, Alfonso; Wu, Cathy H.; Chatr-Aryamontri, Andrew; Dowell, Karen G.; Huala, Eva; Lourenço, Anália; Nash, Robert; Veuthey, Anne-Lise; Wiegers, Thomas; Winter, Andrew G.

    2012-01-01

    Molecular biology has become heavily dependent on biological knowledge encoded in expert curated biological databases. As the volume of biological literature increases, biocurators need help in keeping up with the literature; (semi-) automated aids for biocuration would seem to be an ideal application for natural language processing and text mining. However, to date, there have been few documented successes for improving biocuration throughput using text mining. Our initial investigations took place for the workshop on ‘Text Mining for the BioCuration Workflow’ at the third International Biocuration Conference (Berlin, 2009). We interviewed biocurators to obtain workflows from eight biological databases. This initial study revealed high-level commonalities, including (i) selection of documents for curation; (ii) indexing of documents with biologically relevant entities (e.g. genes); and (iii) detailed curation of specific relations (e.g. interactions); however, the detailed workflows also showed many variabilities. Following the workshop, we conducted a survey of biocurators. The survey identified biocurator priorities, including the handling of full text indexed with biological entities and support for the identification and prioritization of documents for curation. It also indicated that two-thirds of the biocuration teams had experimented with text mining and almost half were using text mining at that time. Analysis of our interviews and survey provide a set of requirements for the integration of text mining into the biocuration workflow. These can guide the identification of common needs across curated databases and encourage joint experimentation involving biocurators, text mining developers and the larger biomedical research community. PMID:22513129

  11. FluxCTTX: A LIMS-based tool for management and analysis of cytotoxicity assays data

    PubMed Central

    2015-01-01

    Background Cytotoxicity assays have been used by researchers to screen for cytotoxicity in compound libraries. Researchers can either look for cytotoxic compounds or screen "hits" from initial high-throughput drug screens for unwanted cytotoxic effects before investing in their development as a pharmaceutical. These assays may be used as an alternative to animal experimentation and are becoming increasingly important in modern laboratories. However, the execution of these assays at large scale and in different laboratories requires, among other things, the management of protocols, reagents, cell lines and the data produced, which can be a challenge. The management of all this information is greatly improved by the use of computational tools that save time and guarantee quality. However, a tool designed specifically for this task in cytotoxicity assays is not yet available. Results In this work, we have used a workflow-based LIMS -- the Flux system -- and the Together Workflow Editor as a framework to develop FluxCTTX, a tool for management of data from cytotoxicity assays performed at different laboratories. The core of this work is a workflow that represents all stages of the assay and has been developed and uploaded in Flux. This workflow models the activities of cytotoxicity assays performed as described in the OECD 129 Guidance Document. Conclusions FluxCTTX presents a solution for the management of data produced by cytotoxicity assays performed in interlaboratory comparisons. Its adoption will help guarantee the quality of activities in the process of cytotoxicity testing and enforce the use of Good Laboratory Practices (GLP). Furthermore, the workflow developed is complete and can be adapted to other contexts and different tests for the management of other types of data. PMID:26696462

  12. High-throughput image analysis of tumor spheroids: a user-friendly software application to measure the size of spheroids automatically and accurately.

    PubMed

    Chen, Wenjin; Wong, Chung; Vosburgh, Evan; Levine, Arnold J; Foran, David J; Xu, Eugenia Y

    2014-07-08

    The increasing number of applications of three-dimensional (3D) tumor spheroids as an in vitro model for drug discovery requires their adaptation to large-scale screening formats in every step of a drug screen, including large-scale image analysis. Currently there is no ready-to-use, free image analysis software that meets this large-scale format. Most existing methods involve manually drawing the length and width of the imaged 3D spheroids, which is a tedious and time-consuming process. This study presents a high-throughput image analysis software application - SpheroidSizer - which measures the major and minor axial lengths of imaged 3D tumor spheroids automatically and accurately, calculates the volume of each individual 3D tumor spheroid, and then outputs the results in two different forms in spreadsheets for easy manipulation in subsequent data analysis. The main advantage of this software is its powerful image analysis engine adapted for large numbers of images; it provides a high-throughput computation and quality-control workflow. The estimated time to process 1,000 images is about 15 min on a minimally configured laptop, or around 1 min on a multi-core performance workstation. The graphical user interface (GUI) is also designed for easy quality control, and users can manually override the computer results. The key method used in this software is adapted from the active contour algorithm, also known as Snakes, which is especially suitable for images with uneven illumination and noisy backgrounds that often plague automated image processing in high-throughput screens. The complementary "Manual Initialize" and "Hand Draw" tools give SpheroidSizer the flexibility to deal with various types of spheroids and images of diverse quality. This high-throughput image analysis software remarkably reduces labor and speeds up the analysis process. Implementing this software is beneficial for 3D tumor spheroids to become a routine in vitro model for drug screens in industry and academia.
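
    For readers who want to experiment with the underlying idea, the active contour ("Snakes") step can be reproduced with off-the-shelf tools. The sketch below is a rough scikit-image analogue, assuming a recent scikit-image in which active_contour takes (row, column) coordinates; it is not SpheroidSizer's own implementation, and the image path and parameters are placeholders.

      # Rough scikit-image analogue of the Snakes step: outline one spheroid and
      # report its major/minor axis lengths. Paths and parameters are placeholders.
      import numpy as np
      from skimage import color, filters, io, measure
      from skimage.segmentation import active_contour

      image = color.rgb2gray(io.imread("spheroid.png"))
      smoothed = filters.gaussian(image, sigma=3)

      # Initialize the snake as a circle around the image centre.
      t = np.linspace(0, 2 * np.pi, 200)
      r0, c0 = np.array(image.shape) / 2
      init = np.column_stack([r0 + 0.45 * image.shape[0] * np.sin(t),
                              c0 + 0.45 * image.shape[1] * np.cos(t)])

      snake = active_contour(smoothed, init, alpha=0.015, beta=10, gamma=0.001)

      # Rasterize the contour and read axis lengths from region properties.
      mask = measure.grid_points_in_poly(image.shape, snake)
      props = measure.regionprops(mask.astype(int))[0]
      print("major axis:", props.major_axis_length,
            "minor axis:", props.minor_axis_length)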

  13. Focus: a robust workflow for one-dimensional NMR spectral analysis.

    PubMed

    Alonso, Arnald; Rodríguez, Miguel A; Vinaixa, Maria; Tortosa, Raül; Correig, Xavier; Julià, Antonio; Marsal, Sara

    2014-01-21

    One-dimensional ¹H NMR represents one of the most commonly used analytical techniques in metabolomic studies. The increase in the number of samples analyzed, as well as technical improvements in instrumentation and spectral acquisition, demand increasingly accurate and efficient high-throughput data processing workflows. We present FOCUS, an integrated and innovative methodology that provides a complete data analysis workflow for one-dimensional NMR-based metabolomics. This tool allows users to easily obtain an NMR peak feature matrix ready for chemometric analysis, as well as metabolite identification scores for each peak that greatly simplify the biological interpretation of the results. Algorithm development has focused on solving the critical difficulties that appear at each data processing step and that can dramatically affect the quality of the results. As well as method integration, simplicity has been one of the main objectives in FOCUS development, requiring very little user input to perform accurate peak alignment, peak picking, and metabolite identification. The new spectral alignment algorithm, RUNAS, allows peak alignment with no need for a reference spectrum, and therefore reduces the bias introduced by other alignment approaches. Spectral alignment has been tested against previous methodologies, obtaining substantial improvements for moderately or highly unaligned spectra. Metabolite identification has also been significantly improved by using positional and correlation peak patterns matched against a reference metabolite panel. Furthermore, the complete workflow has been tested using NMR data sets from 60 human urine samples and 120 aqueous liver extracts, reaching a successful identification of 42 metabolites from the two data sets. The open-source software implementation of this methodology is available at http://www.urr.cat/FOCUS.

  14. Adaptation to high throughput batch chromatography enhances multivariate screening.

    PubMed

    Barker, Gregory A; Calzada, Joseph; Herzer, Sibylle; Rieble, Siegfried

    2015-09-01

    High throughput process development offers unique approaches to explore complex process design spaces with relatively low material consumption. Batch chromatography is one technique that can be used to screen chromatographic conditions in a 96-well plate. Typical batch chromatography workflows examine variations in buffer conditions or compare multiple resins in a given process, as opposed to assessing protein loading conditions in combination with other factors. A modification to the batch chromatography paradigm is described here where experimental planning, programming, and a staggered loading approach increase the multivariate space that can be explored with a liquid handling system. The iterative batch chromatography (IBC) approach is described, which treats every well in a 96-well plate as an individual experiment, wherein protein loading conditions can be varied alongside other factors such as wash and elution buffer conditions. As all of these factors are explored in the same experiment, the interactions between them are characterized and the number of follow-up confirmatory experiments is reduced. This in turn improves statistical power and throughput. Two examples of the IBC method are shown and the impact of the load conditions is assessed in combination with the other factors explored. Copyright © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
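
    A sketch of how the 96 wells of one plate can each be assigned their own combination of load, wash, and elution conditions, in the spirit of treating every well as an individual experiment; the factor names and levels below are invented for illustration and are not taken from the study.

        from itertools import product

        loads_mg_ml = [10, 20, 40, 60]                      # 4 protein load levels (assumed)
        wash_ph     = [5.0, 6.0, 7.0, 8.0]                  # 4 wash pH levels (assumed)
        elution_mM  = [50, 100, 200, 300, 400, 500]         # 6 elution salt levels (assumed)

        wells = [f"{row}{col}" for row in "ABCDEFGH" for col in range(1, 13)]   # 96 wells
        conditions = product(loads_mg_ml, wash_ph, elution_mM)                  # 4*4*6 = 96 combinations
        design = dict(zip(wells, conditions))
        print(design["A1"], design["H12"])                  # first and last well conditions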

  15. Discovery of an α-Amino C–H Arylation Reaction Using the Strategy of Accelerated Serendipity

    PubMed Central

    McNally, Andrew; Prier, Christopher K.; MacMillan, David W. C.

    2012-01-01

    Serendipity has long been a welcome yet elusive phenomenon in the advancement of chemistry. We sought to exploit serendipity as a means of rapidly identifying unanticipated chemical transformations. By using a high-throughput, automated workflow and evaluating a large number of random reactions, we have discovered a photoredox-catalyzed C–H arylation reaction for the construction of benzylic amines, an important structural motif within pharmaceutical compounds that is not readily accessed via simple substrates. The reaction directly couples tertiary amines with cyanoaromatics under mild and operationally trivial conditions. PMID:22116882

  16. Performance Studies on Distributed Virtual Screening

    PubMed Central

    Krüger, Jens; de la Garza, Luis; Kohlbacher, Oliver; Nagel, Wolfgang E.

    2014-01-01

    Virtual high-throughput screening (vHTS) is an invaluable method in modern drug discovery. It permits screening large datasets or databases of chemical structures for those structures binding possibly to a drug target. Virtual screening is typically performed by docking code, which often runs sequentially. Processing of huge vHTS datasets can be parallelized by chunking the data because individual docking runs are independent of each other. The goal of this work is to find an optimal splitting maximizing the speedup while considering overhead and available cores on Distributed Computing Infrastructures (DCIs). We have conducted thorough performance studies accounting not only for the runtime of the docking itself, but also for structure preparation. Performance studies were conducted via the workflow-enabled science gateway MoSGrid (Molecular Simulation Grid). As input we used benchmark datasets for protein kinases. Our performance studies show that docking workflows can be made to scale almost linearly up to 500 concurrent processes distributed even over large DCIs, thus accelerating vHTS campaigns significantly. PMID:25032219
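
    A minimal sketch of the data-chunking idea: split a ligand list into pieces that can be docked independently and in parallel. The dock_chunk function is a placeholder for a real docking call, and the chunk count and worker pool are arbitrary, so this is not the MoSGrid workflow itself.

        from concurrent.futures import ProcessPoolExecutor

        def chunk(items, n_chunks):
            """Split a list into n_chunks parts of nearly equal size."""
            k, m = divmod(len(items), n_chunks)
            return [items[i * k + min(i, m):(i + 1) * k + min(i + 1, m)] for i in range(n_chunks)]

        def dock_chunk(ligands):
            # Placeholder: a real workflow would invoke the docking code for each ligand here.
            return [(lig, len(lig) % 10) for lig in ligands]          # dummy scores

        if __name__ == "__main__":
            ligands = [f"ligand_{i:05d}" for i in range(10000)]
            with ProcessPoolExecutor(max_workers=8) as pool:
                results = [hit for part in pool.map(dock_chunk, chunk(ligands, 64)) for hit in part]
            print(len(results))                                       # 10000 docking scores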

  17. Bioconductor Workflow for Microbiome Data Analysis: from raw reads to community analyses

    PubMed Central

    Callahan, Ben J.; Sankaran, Kris; Fukuyama, Julia A.; McMurdie, Paul J.; Holmes, Susan P.

    2016-01-01

    High-throughput sequencing of PCR-amplified taxonomic markers (like the 16S rRNA gene) has enabled a new level of analysis of complex bacterial communities known as microbiomes. Many tools exist to quantify and compare abundance levels or OTU composition of communities in different conditions. The sequencing reads have to be denoised and assigned to the closest taxa from a reference database. Common approaches use a notion of 97% similarity and normalize the data by subsampling to equalize library sizes. In this paper, we show that statistical models allow more accurate abundance estimates. By providing a complete workflow in R, we enable the user to do sophisticated downstream statistical analyses, whether parametric or nonparametric. We provide examples of using the R packages dada2, phyloseq, DESeq2, ggplot2 and vegan to filter, visualize and test microbiome data. We also provide examples of supervised analyses using random forests and nonparametric testing using community networks and the ggnetwork package. PMID:27508062

  18. Anima: Modular Workflow System for Comprehensive Image Data Analysis

    PubMed Central

    Rantanen, Ville; Valori, Miko; Hautaniemi, Sampsa

    2014-01-01

    Modern microscopes produce vast amounts of image data, and computational methods are needed to analyze and interpret these data. Furthermore, a single image analysis project may require tens or hundreds of analysis steps, starting from data import and pre-processing to segmentation and statistical analysis, and ending with visualization and reporting. To manage such large-scale image data analysis projects, we present here a modular workflow system called Anima. Anima is designed for comprehensive and efficient image data analysis development, and it contains several features that are crucial in high-throughput image data analysis: programming language independence, batch processing, easily customized data processing, interoperability with other software via application programming interfaces, and advanced multivariate statistical analysis. The utility of Anima is shown with two case studies: testing different algorithms developed on different imaging platforms, and automated prediction of live/dead C. elegans worms by integrating several analysis environments. Anima is fully open source and available with documentation at www.anduril.org/anima. PMID:25126541

  19. Reproducible, high-throughput synthesis of colloidal nanocrystals for optimization in multidimensional parameter space.

    PubMed

    Chan, Emory M; Xu, Chenxu; Mao, Alvin W; Han, Gang; Owen, Jonathan S; Cohen, Bruce E; Milliron, Delia J

    2010-05-12

    While colloidal nanocrystals hold tremendous potential for both enhancing fundamental understanding of materials scaling and enabling advanced technologies, progress in both realms can be inhibited by the limited reproducibility of traditional synthetic methods and by the difficulty of optimizing syntheses over a large number of synthetic parameters. Here, we describe an automated platform for the reproducible synthesis of colloidal nanocrystals and for the high-throughput optimization of physical properties relevant to emerging applications of nanomaterials. This robotic platform enables precise control over reaction conditions while performing workflows analogous to those of traditional flask syntheses. We demonstrate control over the size, size distribution, kinetics, and concentration of reactions by synthesizing CdSe nanocrystals with 0.2% coefficient of variation in the mean diameters across an array of batch reactors and over multiple runs. Leveraging this precise control along with high-throughput optical and diffraction characterization, we effectively map multidimensional parameter space to tune the size and polydispersity of CdSe nanocrystals, to maximize the photoluminescence efficiency of CdTe nanocrystals, and to control the crystal phase and maximize the upconverted luminescence of lanthanide-doped NaYF(4) nanocrystals. On the basis of these demonstrative examples, we conclude that this automated synthesis approach will be of great utility for the development of diverse colloidal nanomaterials for electronic assemblies, luminescent biological labels, electroluminescent devices, and other emerging applications.

  20. A novel spectral library workflow to enhance protein identifications.

    PubMed

    Li, Haomin; Zong, Nobel C; Liang, Xiangbo; Kim, Allen K; Choi, Jeong Ho; Deng, Ning; Zelaya, Ivette; Lam, Maggie; Duan, Huilong; Ping, Peipei

    2013-04-09

    The innovations in mass spectrometry-based investigations in proteome biology enable systematic characterization of molecular details in pathophysiological phenotypes. However, the process of delineating large-scale raw proteomic datasets into a biological context requires high-throughput data acquisition and processing. A spectral library search engine makes use of previously annotated experimental spectra as references for subsequent spectral analyses. This workflow delivers many advantages, including elevated analytical efficiency and specificity as well as reduced demands on computational capacity. In this study, we created a spectral matching engine to address challenges commonly associated with a library search workflow. In particular, an improved sliding dot product algorithm that is robust to systematic drifts of mass measurement in spectra is introduced. Furthermore, a noise management protocol distinguishes spectral correlation attributable to noise from that attributable to peptide fragments. It enables greater separation between target spectral matches and false matches, thereby suppressing the possibility of propagating inaccurate peptide annotations from library spectra to query spectra. Moreover, preservation of original spectra also accommodates user contributions to further enhance the quality of the library. Collectively, this search engine supports reproducible data analyses using curated references, thereby broadening the accessibility of proteomics resources to biomedical investigators. This article is part of a Special Issue entitled: From protein structures to clinical applications. Copyright © 2013 Elsevier B.V. All rights reserved.
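
    An illustrative sketch of the sliding dot product idea: compare two binned spectra over a small range of bin offsets and keep the best normalized match. The binning, shift range, and normalization are assumptions for illustration and this is not the search engine's actual implementation.

        import numpy as np

        def sliding_dot(query, library, max_shift=5):
            """Best normalized dot product between two equal-length binned spectra,
            evaluated over bin shifts of -max_shift..+max_shift to tolerate mass drift."""
            q = query / (np.linalg.norm(query) + 1e-12)
            r = library / (np.linalg.norm(library) + 1e-12)
            best_score, best_shift = 0.0, 0
            for s in range(-max_shift, max_shift + 1):
                score = float(np.dot(np.roll(q, s), r))
                if score > best_score:
                    best_score, best_shift = score, s
            return best_score, best_shift

        lib = np.zeros(100)
        lib[[10, 40, 70]] = [1.0, 0.5, 0.8]                 # toy library spectrum
        qry = np.roll(lib, 2)                               # same spectrum, drifted by 2 bins
        print(sliding_dot(qry, lib))                        # approximately (1.0, -2)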

  1. New hardware and workflows for semi-automated correlative cryo-fluorescence and cryo-electron microscopy/tomography.

    PubMed

    Schorb, Martin; Gaechter, Leander; Avinoam, Ori; Sieckmann, Frank; Clarke, Mairi; Bebeacua, Cecilia; Bykov, Yury S; Sonnen, Andreas F-P; Lihl, Reinhard; Briggs, John A G

    2017-02-01

    Correlative light and electron microscopy allows features of interest defined by fluorescence signals to be located in an electron micrograph of the same sample. Rare dynamic events or specific objects can be identified, targeted and imaged by electron microscopy or tomography. To combine it with structural studies using cryo-electron microscopy or tomography, fluorescence microscopy must be performed while maintaining the specimen vitrified at liquid-nitrogen temperatures and in a dry environment during imaging and transfer. Here we present instrumentation, software and an experimental workflow that improves the ease of use, throughput and performance of correlated cryo-fluorescence and cryo-electron microscopy. The new cryo-stage incorporates a specially modified high-numerical aperture objective lens and provides a stable and clean imaging environment. It is combined with a transfer shuttle for contamination-free loading of the specimen. Optimized microscope control software allows automated acquisition of the entire specimen area by cryo-fluorescence microscopy. The software also facilitates direct transfer of the fluorescence image and associated coordinates to the cryo-electron microscope for subsequent fluorescence-guided automated imaging. Here we describe these technological developments and present a detailed workflow, which we applied for automated cryo-electron microscopy and tomography of various specimens. Copyright © 2016 The Authors. Published by Elsevier Inc. All rights reserved.

  2. NG6: Integrated next generation sequencing storage and processing environment.

    PubMed

    Mariette, Jérôme; Escudié, Frédéric; Allias, Nicolas; Salin, Gérald; Noirot, Céline; Thomas, Sylvain; Klopp, Christophe

    2012-09-09

    Next generation sequencing platforms are now well established in sequencing centres and some laboratories. Upcoming smaller scale machines such as the 454 Junior from Roche or the MiSeq from Illumina will increase the number of laboratories hosting a sequencer. In such a context, it is important to provide these teams with an easily manageable environment to store and process the produced reads. We describe a user-friendly information system able to manage large sets of sequencing data. It includes, on the one hand, a workflow environment already containing pipelines adapted to different input formats (sff, fasta, fastq and qseq), different sequencers (Roche 454, Illumina HiSeq) and various analyses (quality control, assembly, alignment, diversity studies,…) and, on the other hand, a secured web site giving access to the results. The connected user will be able to download raw and processed data and browse through the analysis result statistics. The provided workflows can easily be modified or extended and new ones can be added. Ergatis is used as a workflow building, running and monitoring system. The analyses can be run locally or in a cluster environment using Sun Grid Engine. NG6 is a complete information system designed to answer the needs of a sequencing platform. It provides a user-friendly interface to process, store and download high-throughput sequencing data.

  3. Modular, Antibody-free Time-Resolved LRET Kinase Assay Enabled by Quantum Dots and Tb3+-sensitizing Peptides

    NASA Astrophysics Data System (ADS)

    Cui, Wei; Parker, Laurie L.

    2016-07-01

    Fluorescent drug screening assays are essential for tyrosine kinase inhibitor discovery. Here we demonstrate a flexible, antibody-free TR-LRET kinase assay strategy that is enabled by the combination of streptavidin-coated quantum dot (QD) acceptors and biotinylated, Tb3+ sensitizing peptide donors. By exploiting the spectral features of Tb3+ and QD, and the high binding affinity of the streptavidin-biotin interaction, we achieved multiplexed detection of kinase activity in a modular fashion without requiring additional covalent labeling of each peptide substrate. This strategy is compatible with high-throughput screening, and should be adaptable to the rapidly changing workflows and targets involved in kinase inhibitor discovery.

  4. The iPlant Collaborative: Cyberinfrastructure for Enabling Data to Discovery for the Life Sciences

    PubMed Central

    Merchant, Nirav; Lyons, Eric; Goff, Stephen; Vaughn, Matthew; Ware, Doreen; Micklos, David; Antin, Parker

    2016-01-01

    The iPlant Collaborative provides life science research communities access to comprehensive, scalable, and cohesive computational infrastructure for data management; identity management; collaboration tools; and cloud, high-performance, high-throughput computing. iPlant provides training, learning material, and best practice resources to help all researchers make the best use of their data, expand their computational skill set, and effectively manage their data and computation when working as distributed teams. iPlant’s platform permits researchers to easily deposit and share their data and deploy new computational tools and analysis workflows, allowing the broader community to easily use and reuse those data and computational analyses. PMID:26752627

  5. Impact of digital radiography on clinical workflow.

    PubMed

    May, G A; Deer, D D; Dackiewicz, D

    2000-05-01

    It is commonly accepted that digital radiography (DR) improves workflow and patient throughput compared with traditional film radiography or computed radiography (CR). DR eliminates the film development step and the time to acquire the image from a CR reader. In addition, the wide dynamic range of DR is such that the technologist can perform the quality-control (QC) step directly at the modality in a few seconds, rather than having to transport the newly acquired image to a centralized QC station for review. Furthermore, additional workflow efficiencies can be achieved with DR by employing tight radiology information system (RIS) integration. In the DR imaging environment, this provides for patient demographic information to be automatically downloaded from the RIS to populate the DR Digital Imaging and Communications in Medicine (DICOM) image header. To learn more about this workflow efficiency improvement, we performed a comparative study of workflow steps under three different conditions: traditional film/screen x-ray, DR without RIS integration (ie, manual entry of patient demographics), and DR with RIS integration. This study was performed at the Cleveland Clinic Foundation (Cleveland, OH) using a newly acquired amorphous silicon flat-panel DR system from Canon Medical Systems (Irvine, CA). Our data show that DR without RIS results in substantial workflow savings over traditional film/screen practice. There is an additional 30% reduction in total examination time using DR with RIS integration.

  6. Defect Genome of Cubic Perovskites for Fuel Cell Applications

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Balachandran, Janakiraman; Lin, Lianshan; Anchell, Jonathan S.

    Heterogeneities such as point defects, inherent to material systems, can profoundly influence material functionalities critical for numerous energy applications. This influence in principle can be identified and quantified through development of large defect data sets, which we call the defect genome, employing high-throughput ab initio calculations. However, high-throughput screening of material models with point defects dramatically increases the computational complexity and chemical search space, creating major impediments toward developing a defect genome. In this paper, we overcome these impediments by employing computationally tractable ab initio models driven by highly scalable workflows to study the formation and interaction of various point defects (e.g., O vacancies, H interstitials, and Y substitutional dopants) in over 80 cubic perovskites for potential proton-conducting ceramic fuel cell (PCFC) applications. The resulting defect data sets identify several promising perovskite compounds that can exhibit high proton conductivity. Furthermore, the data sets also enable us to identify and explain insightful and novel correlations among defect energies, material identities, and defect-induced local structural distortions. Finally, such defect data sets and resultant correlations are necessary to build statistical machine learning models, which are required to accelerate discovery of new materials.

  7. Defect Genome of Cubic Perovskites for Fuel Cell Applications

    DOE PAGES

    Balachandran, Janakiraman; Lin, Lianshan; Anchell, Jonathan S.; ...

    2017-10-10

    Heterogeneities such as point defects, inherent to material systems, can profoundly influence material functionalities critical for numerous energy applications. This influence in principle can be identified and quantified through development of large defect data sets, which we call the defect genome, employing high-throughput ab initio calculations. However, high-throughput screening of material models with point defects dramatically increases the computational complexity and chemical search space, creating major impediments toward developing a defect genome. In this paper, we overcome these impediments by employing computationally tractable ab initio models driven by highly scalable workflows to study the formation and interaction of various point defects (e.g., O vacancies, H interstitials, and Y substitutional dopants) in over 80 cubic perovskites for potential proton-conducting ceramic fuel cell (PCFC) applications. The resulting defect data sets identify several promising perovskite compounds that can exhibit high proton conductivity. Furthermore, the data sets also enable us to identify and explain insightful and novel correlations among defect energies, material identities, and defect-induced local structural distortions. Finally, such defect data sets and resultant correlations are necessary to build statistical machine learning models, which are required to accelerate discovery of new materials.
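
    For orientation, the central quantity tabulated in such defect data sets is typically the defect formation energy. A standard form for a neutral defect X, stated here in its textbook form rather than quoted from the paper, is

        E_f[X] \;=\; E_{\mathrm{tot}}[X] \;-\; E_{\mathrm{tot}}[\mathrm{perfect}] \;-\; \sum_i n_i \,\mu_i ,

    where E_tot[X] and E_tot[perfect] are the total energies of the defective and pristine supercells, n_i is the number of atoms of species i added (positive) or removed (negative) when forming the defect, and \mu_i is the corresponding chemical potential; for charged defects, a term q(E_F + \epsilon_{VBM}) and a finite-size correction are usually added.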

  8. Absolute quantification of prion protein (90-231) using stable isotope-labeled chymotryptic peptide standards in a LC-MRM AQUA workflow.

    PubMed

    Sturm, Robert; Sheynkman, Gloria; Booth, Clarissa; Smith, Lloyd M; Pedersen, Joel A; Li, Lingjun

    2012-09-01

    Substantial evidence indicates that the disease-associated conformer of the prion protein (PrP(TSE)) constitutes the etiologic agent in prion diseases. These diseases affect multiple mammalian species. PrP(TSE) has the ability to convert the conformation of the normal prion protein (PrP(C)) into a β-sheet rich form resistant to proteinase K digestion. Common immunological techniques lack the sensitivity to detect PrP(TSE) at subfemtomole levels, whereas animal bioassays, cell culture, and in vitro conversion assays offer higher sensitivity but lack the high-throughput the immunological assays offer. Mass spectrometry is an attractive alternative to the above assays as it offers high-throughput, direct measurement of a protein's signature peptide, often with subfemtomole sensitivities. Although a liquid chromatography-multiple reaction monitoring (LC-MRM) method has been reported for PrP(TSE), the chemical composition and lack of amino acid sequence conservation of the signature peptide may compromise its accuracy and make it difficult to apply to multiple species. Here, we demonstrate that an alternative protease (chymotrypsin) can produce signature peptides suitable for a LC-MRM absolute quantification (AQUA) experiment. The new method offers several advantages, including: (1) a chymotryptic signature peptide lacking chemically active residues (Cys, Met) that can confound assay accuracy; (2) low attomole limits of detection and quantitation (LOD and LOQ); and (3) a signature peptide retaining the same amino acid sequence across most mammals naturally susceptible to prion infection as well as important laboratory models. To the authors' knowledge, this is the first report on the use of a non-tryptic peptide in a LC-MRM AQUA workflow.
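
    For context, the quantitative core of an AQUA experiment is a simple ratio, written here in its generic form rather than with values from this study:

        C_{\mathrm{analyte}} \;=\; \frac{A_{\mathrm{light}}}{A_{\mathrm{heavy}}} \times C_{\mathrm{heavy\ standard}} ,

    where A_light and A_heavy are the integrated MRM peak areas of the endogenous (light) signature peptide and the spiked stable isotope-labeled (heavy) standard, and C_heavy standard is the known amount of standard added to the digest.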

  9. Absolute quantification of prion protein (90-231) using stable isotope-labeled chymotryptic peptide standards in a LC-MRM AQUA workflow

    PubMed Central

    Sturm, Robert; Kreitinger, Gloria; Booth, Clarissa; Smith, Lloyd; Pedersen, Joel; Li, Lingjun

    2012-01-01

    Substantial evidence indicates that the disease-associated conformer of the prion protein (PrPTSE) constitutes the etiological agent in prion diseases. These diseases affect multiple mammalian species. PrPTSE has the ability to convert the conformation of the normal prion protein (PrPC) into a β-sheet rich form resistant to proteinase K digestion. Common immunological techniques lack the sensitivity to detect PrPTSE at sub-femtomole levels while animal bioassays, cell culture, and in vitro conversion assays offer ultrasensitivity but lack the high-throughput the immunological assays offer. Mass spectrometry is an attractive alternative to the above assays as it offers high-throughput, direct measurement of a protein’s signature peptide, often with sub-femtomole sensitivities. Although a liquid chromatography-multiple reaction monitoring (LC-MRM) method has been reported for PrPTSE, the chemical composition and lack of amino acid sequence conservation of the signature peptide may compromise its accuracy and make it difficult to apply to multiple species. Here, we demonstrate that an alternative protease (chymotrypsin) can produce signature peptides suitable for a LC-MRM absolute quantification (AQUA) experiment. The new method offers several advantages, including: (1) a chymotryptic signature peptide lacking chemically active residues (Cys, Met) that can confound assay accuracy; (2) low attomole limits of detection and quantitation (LOD and LOQ); and (3) a signature peptide retaining the same amino acid sequence across most mammals naturally susceptible to prion infection as well as important laboratory models. To the authors’ knowledge, this is the first report of the use of a non-tryptic peptide in a LC-MRM AQUA workflow. PMID:22714949

  10. Absolute Quantification of Prion Protein (90-231) Using Stable Isotope-Labeled Chymotryptic Peptide Standards in a LC-MRM AQUA Workflow

    NASA Astrophysics Data System (ADS)

    Sturm, Robert; Sheynkman, Gloria; Booth, Clarissa; Smith, Lloyd M.; Pedersen, Joel A.; Li, Lingjun

    2012-09-01

    Substantial evidence indicates that the disease-associated conformer of the prion protein (PrPTSE) constitutes the etiologic agent in prion diseases. These diseases affect multiple mammalian species. PrPTSE has the ability to convert the conformation of the normal prion protein (PrPC) into a β-sheet rich form resistant to proteinase K digestion. Common immunological techniques lack the sensitivity to detect PrPTSE at subfemtomole levels, whereas animal bioassays, cell culture, and in vitro conversion assays offer higher sensitivity but lack the high-throughput the immunological assays offer. Mass spectrometry is an attractive alternative to the above assays as it offers high-throughput, direct measurement of a protein's signature peptide, often with subfemtomole sensitivities. Although a liquid chromatography-multiple reaction monitoring (LC-MRM) method has been reported for PrPTSE, the chemical composition and lack of amino acid sequence conservation of the signature peptide may compromise its accuracy and make it difficult to apply to multiple species. Here, we demonstrate that an alternative protease (chymotrypsin) can produce signature peptides suitable for a LC-MRM absolute quantification (AQUA) experiment. The new method offers several advantages, including: (1) a chymotryptic signature peptide lacking chemically active residues (Cys, Met) that can confound assay accuracy; (2) low attomole limits of detection and quantitation (LOD and LOQ); and (3) a signature peptide retaining the same amino acid sequence across most mammals naturally susceptible to prion infection as well as important laboratory models. To the authors' knowledge, this is the first report on the use of a non-tryptic peptide in a LC-MRM AQUA workflow.

  11. Differential Expression and Functional Analysis of High-Throughput -Omics Data Using Open Source Tools.

    PubMed

    Kebschull, Moritz; Fittler, Melanie Julia; Demmer, Ryan T; Papapanou, Panos N

    2017-01-01

    Today, -omics analyses, including the systematic cataloging of messenger RNA and microRNA sequences or DNA methylation patterns in a cell population, organ, or tissue sample, allow for an unbiased, comprehensive genome-level analysis of complex diseases, offering a large advantage over earlier "candidate" gene or pathway analyses. A primary goal in the analysis of these high-throughput assays is the detection of those features among several thousand that differ between different groups of samples. In the context of oral biology, our group has successfully utilized -omics technology to identify key molecules and pathways in different diagnostic entities of periodontal disease. A major issue when inferring biological information from high-throughput -omics studies is the fact that the sheer volume of high-dimensional data generated by contemporary technology is not appropriately analyzed using common statistical methods employed in the biomedical sciences. In this chapter, we outline a robust and well-accepted bioinformatics workflow for the initial analysis of -omics data generated using microarrays or next-generation sequencing technology using open-source tools. Starting with quality control measures and necessary preprocessing steps for data originating from different -omics technologies, we next outline a differential expression analysis pipeline that can be used for data from both microarray and sequencing experiments, and offers the possibility to account for random or fixed effects. Finally, we present an overview of the possibilities for a functional analysis of the obtained data.
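
    A generic sketch of the differential-expression step described above, assuming a preprocessed feature-by-sample matrix; it uses a simple per-feature t-test with Benjamini-Hochberg correction purely for illustration, whereas the chapter itself relies on established open-source -omics tools, and the toy data are invented.

        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(1)
        control = rng.normal(8, 1, size=(1000, 5))          # 1000 features x 5 control samples
        disease = rng.normal(8, 1, size=(1000, 5))          # 1000 features x 5 disease samples
        disease[:50] += 2                                   # first 50 features truly up-regulated

        t, p = stats.ttest_ind(disease, control, axis=1)    # per-feature two-sample test
        m = len(p)
        order = np.argsort(p)                               # Benjamini-Hochberg adjustment
        q = np.empty(m)
        q[order] = np.minimum.accumulate((p[order] * m / np.arange(1, m + 1))[::-1])[::-1]
        print(int(np.sum(q < 0.05)), "features called differential at 5% FDR")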

  12. The essential roles of chemistry in high-throughput screening triage

    PubMed Central

    Dahlin, Jayme L; Walters, Michael A

    2015-01-01

    It is increasingly clear that academic high-throughput screening (HTS) and virtual HTS triage suffers from a lack of scientists trained in the art and science of early drug discovery chemistry. Many recent publications report the discovery of compounds by screening that are most likely artifacts or promiscuous bioactive compounds, and these results are not placed into the context of previous studies. For HTS to be most successful, it is our contention that there must exist an early partnership between biologists and medicinal chemists. Their combined skill sets are necessary to design robust assays and efficient workflows that will weed out assay artifacts, false positives, promiscuous bioactive compounds and intractable screening hits, efforts that ultimately give projects a better chance at identifying truly useful chemical matter. Expertise in medicinal chemistry, cheminformatics and purification sciences (analytical chemistry) can enhance the post-HTS triage process by quickly removing these problematic chemotypes from consideration, while simultaneously prioritizing the more promising chemical matter for follow-up testing. It is only when biologists and chemists collaborate effectively that HTS can manifest its full promise. PMID:25163000

  13. Single-cell genomics for the masses

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tringe, Susannah G.

    In this issue of Nature Biotechnology, Lan et al. describe a new tool in the toolkit for studying uncultivated microbial communities, enabling orders of magnitude higher single-cell genome throughput than previous methods. This is achieved by a complex droplet microfluidics workflow encompassing steps from physical cell isolation through genome sequencing, producing tens of thousands of low-coverage genomes from individual cells.

  14. Single-cell genomics for the masses

    DOE PAGES

    Tringe, Susannah G.

    2017-07-12

    In this issue of Nature Biotechnology, Lan et al. describe a new tool in the toolkit for studying uncultivated microbial communities, enabling orders of magnitude higher single-cell genome throughput than previous methods. This is achieved by a complex droplet microfluidics workflow encompassing steps from physical cell isolation through genome sequencing, producing tens of thousands of low-coverage genomes from individual cells.

  15. COMPASS: a suite of pre- and post-search proteomics software tools for OMSSA

    PubMed Central

    Wenger, Craig D.; Phanstiel, Douglas H.; Lee, M. Violet; Bailey, Derek J.; Coon, Joshua J.

    2011-01-01

    Here we present the Coon OMSSA Proteomic Analysis Software Suite (COMPASS): a free and open-source software pipeline for high-throughput analysis of proteomics data, designed around the Open Mass Spectrometry Search Algorithm. We detail a synergistic set of tools for protein database generation, spectral reduction, peptide false discovery rate analysis, peptide quantitation via isobaric labeling, protein parsimony and protein false discovery rate analysis, and protein quantitation. We strive for maximum ease of use, utilizing graphical user interfaces and working with data files in the original instrument vendor format. Results are stored in plain text comma-separated values files, which are easy to view and manipulate with a text editor or spreadsheet program. We illustrate the operation and efficacy of COMPASS through the use of two LC–MS/MS datasets. The first is a dataset of a highly annotated mixture of standard proteins and manually validated contaminants that exhibits the identification workflow. The second is a dataset of yeast peptides, labeled with isobaric stable isotope tags and mixed in known ratios, to demonstrate the quantitative workflow. For these two datasets, COMPASS performs equivalently or better than the current de facto standard, the Trans-Proteomic Pipeline. PMID:21298793
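
    As a small illustration of the peptide false discovery rate step mentioned above, the sketch below computes a target-decoy FDR estimate at a score cutoff; the formula (decoy hits divided by target hits) is the common convention rather than necessarily COMPASS's exact implementation, and the numbers are invented.

        def fdr_at_threshold(scores, is_decoy, threshold):
            """Target-decoy FDR estimate: (# decoy hits) / (# target hits) at or above a score cutoff."""
            targets = sum(1 for s, d in zip(scores, is_decoy) if s >= threshold and not d)
            decoys = sum(1 for s, d in zip(scores, is_decoy) if s >= threshold and d)
            return decoys / targets if targets else 0.0

        scores = [25.0] * 10000                                    # toy: all matches share one score
        is_decoy = [False] * 9800 + [True] * 200                   # 9800 target and 200 decoy hits
        print(round(fdr_at_threshold(scores, is_decoy, 20.0), 4))  # 200 / 9800 = 0.0204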

  16. Poor quality drugs: grand challenges in high throughput detection, countrywide sampling, and forensics in developing countries†

    PubMed Central

    Fernandez, Facundo M.; Hostetler, Dana; Powell, Kristen; Kaur, Harparkash; Green, Michael D.; Mildenhall, Dallas C.; Newton, Paul N.

    2012-01-01

    Throughout history, poor quality medicines have been a persistent problem, with periodical crises in the supply of antimicrobials, such as fake cinchona bark in the 1600s and fake quinine in the 1800s. Regrettably, this problem seems to have grown in the last decade, especially afflicting unsuspecting patients and those seeking medicines via on-line pharmacies. Here we discuss some of the challenges related to the fight against poor quality drugs, and counterfeits in particular, with an emphasis on the analytical tools available, their relative performance, and the necessary workflows needed for distinguishing between genuine, substandard, degraded and counterfeit medicines. PMID:21107455

  17. Simulation Modeling to Compare High-Throughput, Low-Iteration Optimization Strategies for Metabolic Engineering

    PubMed Central

    Heinsch, Stephen C.; Das, Siba R.; Smanski, Michael J.

    2018-01-01

    Increasing the final titer of a multi-gene metabolic pathway can be viewed as a multivariate optimization problem. While numerous multivariate optimization algorithms exist, few are specifically designed to accommodate the constraints posed by genetic engineering workflows. We present a strategy for optimizing expression levels across an arbitrary number of genes that requires few design-build-test iterations. We compare the performance of several optimization algorithms on a series of simulated expression landscapes. We show that optimal experimental design parameters depend on the degree of landscape ruggedness. This work provides a theoretical framework for designing and executing numerical optimization on multi-gene systems. PMID:29535690
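
    A toy sketch of the simulation idea: generate a random expression landscape for a small pathway and run one low-iteration strategy (a one-gene-at-a-time coordinate sweep) on it. The landscape generator, the number of genes and levels, and the strategy are all illustrative assumptions rather than the paper's models.

        import numpy as np

        rng = np.random.default_rng(42)
        levels, n_genes = 8, 3                                     # 8 expression levels per gene, 3 genes
        landscape = rng.random((levels,) * n_genes)                # toy titer for every level combination

        def coordinate_search(landscape, start, rounds=3):
            """Vary one gene at a time and keep its best level; repeat for a few rounds."""
            point = list(start)
            for _ in range(rounds):
                for gene in range(len(point)):
                    candidates = [tuple(point[:gene] + [lvl] + point[gene + 1:]) for lvl in range(levels)]
                    point[gene] = max(candidates, key=lambda c: landscape[c])[gene]
            return tuple(point), landscape[tuple(point)]

        best_point, best_titer = coordinate_search(landscape, start=(0, 0, 0))
        print(best_point, round(float(best_titer), 3), "| global optimum:", round(float(landscape.max()), 3))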

  18. An open-source computational and data resource to analyze digital maps of immunopeptidomes

    DOE PAGES

    Caron, Etienne; Espona, Lucia; Kowalewski, Daniel J.; ...

    2015-07-08

    We present a novel mass spectrometry-based high-throughput workflow and an open-source computational and data resource to reproducibly identify and quantify HLA-associated peptides. Collectively, the resources support the generation of HLA allele-specific peptide assay libraries consisting of consensus fragment ion spectra, and the analysis of quantitative digital maps of HLA peptidomes generated from a range of biological sources by SWATH mass spectrometry (MS). This study represents the first community-based effort to develop a robust platform for the reproducible and quantitative measurement of the entire repertoire of peptides presented by HLA molecules, an essential step towards the design of efficient immunotherapies.

  19. High-throughput and multiplexed regeneration buffer scouting for affinity-based interactions.

    PubMed

    Geuijen, Karin P M; Schasfoort, Richard B; Wijffels, Rene H; Eppink, Michel H M

    2014-06-01

    Affinity-based analyses on biosensors depend partly on regeneration between measurements. Regeneration is performed with a buffer that efficiently breaks all interactions between ligand and analyte while maintaining the active binding site of the ligand. We demonstrated regeneration buffer scouting using the combination of a continuous flow microspotter with a surface plasmon resonance imaging platform to simultaneously test 48 different regeneration buffers on a single biosensor. Optimal regeneration conditions are found within hours, and the approach consumes only small amounts of buffer, analyte, and ligand. This workflow can be applied to any ligand that is coupled through amine, thiol, or streptavidin immobilization. Copyright © 2014 Elsevier Inc. All rights reserved.

  20. File formats commonly used in mass spectrometry proteomics.

    PubMed

    Deutsch, Eric W

    2012-12-01

    The application of mass spectrometry (MS) to the analysis of proteomes has enabled the high-throughput identification and abundance measurement of hundreds to thousands of proteins per experiment. However, the formidable informatics challenge associated with analyzing MS data has required a wide variety of data file formats to encode the complex data types associated with MS workflows. These formats encompass the encoding of input instruction for instruments, output products of the instruments, and several levels of information and results used by and produced by the informatics analysis tools. A brief overview of the most common file formats in use today is presented here, along with a discussion of related topics.

  1. High-throughput SISCAPA quantitation of peptides from human plasma digests by ultrafast, liquid chromatography-free mass spectrometry.

    PubMed

    Razavi, Morteza; Frick, Lauren E; LaMarr, William A; Pope, Matthew E; Miller, Christine A; Anderson, N Leigh; Pearson, Terry W

    2012-12-07

    We investigated the utility of an SPE-MS/MS platform in combination with a modified SISCAPA workflow for chromatography-free MRM analysis of proteotypic peptides in digested human plasma. This combination of SISCAPA and SPE-MS/MS technology allows sensitive, MRM-based quantification of peptides from plasma digests with a sample cycle time of ∼7 s, a 300-fold improvement over typical MRM analyses with analysis times of 30-40 min that use liquid chromatography upstream of MS. The optimized system includes capture and enrichment to near purity of target proteotypic peptides using rigorously selected, high affinity, antipeptide monoclonal antibodies and reduction of background peptides using a novel treatment of magnetic bead immunoadsorbents. Using this method, we have successfully quantitated LPS-binding protein and mesothelin (concentrations of ∼5000 ng/mL and ∼10 ng/mL, respectively) in human plasma. The method eliminates the need for upstream liquid-chromatography and can be multiplexed, thus facilitating quantitative analysis of proteins, including biomarkers, in large sample sets. The method is ideal for high-throughput biomarker validation after affinity enrichment and has the potential for applications in clinical laboratories.

  2. Process-driven information management system at a biotech company: concept and implementation.

    PubMed

    Gobbi, Alberto; Funeriu, Sandra; Ioannou, John; Wang, Jinyi; Lee, Man-Ling; Palmer, Chris; Bamford, Bob; Hewitt, Robin

    2004-01-01

    While established pharmaceutical companies have chemical information systems in place to manage their compounds and the associated data, new startup companies need to implement these systems from scratch. Decisions made early in the design phase usually have long lasting effects on the expandability, maintenance effort, and costs associated with the information management system. Careful analysis of work and data flows, both inter- and intradepartmental, and identification of existing dependencies between activities are important. This knowledge is required to implement an information management system, which enables the research community to work efficiently by avoiding redundant registration and processing of data and by timely provision of the data whenever needed. This paper first presents the workflows existing at Anadys, then ARISE, the research information management system developed in-house at Anadys. ARISE was designed to support the preclinical drug discovery process and covers compound registration, analytical quality control, inventory management, high-throughput screening, lower throughput screening, and data reporting.

  3. High-throughput automated microfluidic sample preparation for accurate microbial genomics

    PubMed Central

    Kim, Soohong; De Jonghe, Joachim; Kulesa, Anthony B.; Feldman, David; Vatanen, Tommi; Bhattacharyya, Roby P.; Berdy, Brittany; Gomez, James; Nolan, Jill; Epstein, Slava; Blainey, Paul C.

    2017-01-01

    Low-cost shotgun DNA sequencing is transforming the microbial sciences. Sequencing instruments are so effective that sample preparation is now the key limiting factor. Here, we introduce a microfluidic sample preparation platform that integrates the key steps in going from cells to sequencing library for up to 96 samples and reduces DNA input requirements 100-fold while maintaining or improving data quality. The general-purpose microarchitecture we demonstrate supports workflows with arbitrary numbers of reaction and clean-up or capture steps. By reducing the sample quantity requirements, we enabled low-input (∼10,000 cells) whole-genome shotgun (WGS) sequencing of Mycobacterium tuberculosis and soil micro-colonies with superior results. We also leveraged the enhanced throughput to sequence ∼400 clinical Pseudomonas aeruginosa libraries and demonstrate excellent single-nucleotide polymorphism detection performance that explained phenotypically observed antibiotic resistance. Fully integrated lab-on-chip sample preparation overcomes technical barriers to enable broader deployment of genomics across many basic research and translational applications. PMID:28128213

  4. The emerging process of Top Down mass spectrometry for protein analysis: biomarkers, protein-therapeutics, and achieving high throughput†

    PubMed Central

    Kellie, John F.; Tran, John C.; Lee, Ji Eun; Ahlf, Dorothy R.; Thomas, Haylee M.; Ntai, Ioanna; Catherman, Adam D.; Durbin, Kenneth R.; Zamdborg, Leonid; Vellaichamy, Adaikkalam; Thomas, Paul M.

    2011-01-01

    Top Down mass spectrometry (MS) has emerged as an alternative to common Bottom Up strategies for protein analysis. In the Top Down approach, intact proteins are fragmented directly in the mass spectrometer to achieve both protein identification and characterization, even capturing information on combinatorial post-translational modifications. Just in the past two years, Top Down MS has seen incremental advances in instrumentation and dedicated software, and has also experienced a major boost from refined separations of whole proteins in complex mixtures that have both high recovery and reproducibility. Combined with steadily advancing commercial MS instrumentation and data processing, a high-throughput workflow covering intact proteins and polypeptides up to 70 kDa is directly visible in the near future. PMID:20711533

  5. Applications of pathology-assisted image analysis of immunohistochemistry-based biomarkers in oncology.

    PubMed

    Shinde, V; Burke, K E; Chakravarty, A; Fleming, M; McDonald, A A; Berger, A; Ecsedy, J; Blakemore, S J; Tirrell, S M; Bowman, D

    2014-01-01

    Immunohistochemistry-based biomarkers are commonly used to understand target inhibition in key cancer pathways in preclinical models and clinical studies. Automated slide-scanning and advanced high-throughput image analysis software technologies have evolved into a routine methodology for quantitative analysis of immunohistochemistry-based biomarkers. Alongside the traditional pathology H-score based on physical slides, the pathology world is welcoming digital pathology and advanced quantitative image analysis, which have enabled tissue- and cellular-level analysis. An automated workflow was implemented that includes automated staining, slide-scanning, and image analysis methodologies to explore biomarkers involved in 2 cancer targets: Aurora A and NEDD8-activating enzyme (NAE). The 2 workflows highlight the evolution of our immunohistochemistry laboratory and the different needs and requirements of each biological assay. Skin biopsies obtained from MLN8237 (Aurora A inhibitor) phase 1 clinical trials were evaluated for mitotic and apoptotic index, while mitotic index and defects in chromosome alignment and spindles were assessed in tumor biopsies to demonstrate Aurora A inhibition. Additionally, in both preclinical xenograft models and an acute myeloid leukemia phase 1 trial of the NAE inhibitor MLN4924, development of a novel image algorithm enabled measurement of downstream pathway modulation upon NAE inhibition. In the highlighted studies, developing a biomarker strategy based on automated image analysis solutions enabled project teams to confirm target and pathway inhibition and understand downstream outcomes of target inhibition with increased throughput and quantitative accuracy. These case studies demonstrate a strategy that combines a pathologist's expertise with automated image analysis to support oncology drug discovery and development programs.

  6. Industrial methodology for process verification in research (IMPROVER): toward systems biology verification

    PubMed Central

    Meyer, Pablo; Hoeng, Julia; Rice, J. Jeremy; Norel, Raquel; Sprengel, Jörg; Stolle, Katrin; Bonk, Thomas; Corthesy, Stephanie; Royyuru, Ajay; Peitsch, Manuel C.; Stolovitzky, Gustavo

    2012-01-01

    Motivation: Analyses and algorithmic predictions based on high-throughput data are essential for the success of systems biology in academic and industrial settings. Organizations, such as companies and academic consortia, conduct large multi-year scientific studies that entail the collection and analysis of thousands of individual experiments, often over many physical sites and with internal and outsourced components. To extract maximum value, the interested parties need to verify the accuracy and reproducibility of data and methods before the initiation of such large multi-year studies. However, systematic and well-established verification procedures do not exist for automated collection and analysis workflows in systems biology, which could lead to inaccurate conclusions. Results: We present here a review of the current state of systems biology verification and a detailed methodology to address its shortcomings. This methodology, named ‘Industrial Methodology for Process Verification in Research’ or IMPROVER, consists of evaluating a research program by dividing a workflow into smaller building blocks that are individually verified. The verification of each building block can be done internally by members of the research program or externally by ‘crowd-sourcing’ to an interested community. www.sbvimprover.com Implementation: This methodology could become the preferred choice to verify systems biology research workflows that are becoming increasingly complex and sophisticated in industrial and academic settings. Contact: gustavo@us.ibm.com PMID:22423044

  7. Flexible End2End Workflow Automation of Hit-Discovery Research.

    PubMed

    Holzmüller-Laue, Silke; Göde, Bernd; Thurow, Kerstin

    2014-08-01

    The article considers a new approach to more complex laboratory automation at the workflow layer. The authors propose the automation of end2end workflows. Combining all relevant subprocesses, whether automated or performed manually, and independently of the organizational unit in which they run, results in end2end processes that include all result dependencies. The end2end approach focuses not only on the classical experiments in synthesis or screening, but also on auxiliary processes such as the production and storage of chemicals, cell culturing, and maintenance, as well as preparatory activities and analyses of experiments. Furthermore, connecting control flow and data flow in the same process model reduces the effort of data transfer between the involved systems, including the necessary data transformations. This end2end laboratory automation can be realized effectively with modern methods of business process management (BPM). The approach is based on the new standard for process-modeling notation, Business Process Model and Notation 2.0. In drug discovery, several scientific disciplines act together with manifold modern methods, technologies, and a wide range of automated instruments for the discovery and design of target-based drugs. The article discusses the novel BPM-based automation concept with an implemented example of a high-throughput screening of previously synthesized compound libraries. © 2014 Society for Laboratory Automation and Screening.

  8. Towards Clinical Molecular Diagnosis of Inherited Cardiac Conditions: A Comparison of Bench-Top Genome DNA Sequencers

    PubMed Central

    Wilkinson, Samuel L.; John, Shibu; Walsh, Roddy; Novotny, Tomas; Valaskova, Iveta; Gupta, Manu; Game, Laurence; Barton, Paul J R.; Cook, Stuart A.; Ware, James S.

    2013-01-01

    Background Molecular genetic testing is recommended for diagnosis of inherited cardiac disease, to guide prognosis and treatment, but access is often limited by cost and availability. Recently introduced high-throughput bench-top DNA sequencing platforms have the potential to overcome these limitations. Methodology/Principal Findings We evaluated two next-generation sequencing (NGS) platforms for molecular diagnostics. The protein-coding regions of six genes associated with inherited arrhythmia syndromes were amplified from 15 human samples using parallelised multiplex PCR (Access Array, Fluidigm), and sequenced on the MiSeq (Illumina) and Ion Torrent PGM (Life Technologies). Overall, 97.9% of the target was sequenced adequately for variant calling on the MiSeq, and 96.8% on the Ion Torrent PGM. Regions missed tended to be of high GC-content, and most were problematic for both platforms. Variant calling was assessed using 107 variants detected using Sanger sequencing: within adequately sequenced regions, variant calling on both platforms was highly accurate (Sensitivity: MiSeq 100%, PGM 99.1%. Positive predictive value: MiSeq 95.9%, PGM 95.5%). At the time of the study the Ion Torrent PGM had a lower capital cost and individual runs were cheaper and faster. The MiSeq had a higher capacity (requiring fewer runs), with reduced hands-on time and simpler laboratory workflows. Both provide significant cost and time savings over conventional methods, even allowing for adjunct Sanger sequencing to validate findings and sequence exons missed by NGS. Conclusions/Significance MiSeq and Ion Torrent PGM both provide accurate variant detection as part of a PCR-based molecular diagnostic workflow, and provide alternative platforms for molecular diagnosis of inherited cardiac conditions. Though there were performance differences at this throughput, platforms differed primarily in terms of cost, scalability, protocol stability and ease of use. Compared with current molecular genetic diagnostic tests for inherited cardiac arrhythmias, these NGS approaches are faster, less expensive, and yet more comprehensive. PMID:23861798

  9. Alignment-free inference of hierarchical and reticulate phylogenomic relationships.

    PubMed

    Bernard, Guillaume; Chan, Cheong Xin; Chan, Yao-Ban; Chua, Xin-Yi; Cong, Yingnan; Hogan, James M; Maetschke, Stefan R; Ragan, Mark A

    2017-06-30

    We are amidst an ongoing flood of sequence data arising from the application of high-throughput technologies, and a concomitant fundamental revision in our understanding of how genomes evolve individually and within the biosphere. Workflows for phylogenomic inference must accommodate data that are not only much larger than before, but often more error prone and perhaps misassembled, or not assembled in the first place. Moreover, genomes of microbes, viruses and plasmids evolve not only by tree-like descent with modification but also by incorporating stretches of exogenous DNA. Thus, next-generation phylogenomics must address computational scalability while rethinking the nature of orthogroups, the alignment of multiple sequences and the inference and comparison of trees. New phylogenomic workflows have begun to take shape based on so-called alignment-free (AF) approaches. Here, we review the conceptual foundations of AF phylogenetics for the hierarchical (vertical) and reticulate (lateral) components of genome evolution, focusing on methods based on k-mers. We reflect on what seems to be successful, and on where further development is needed. © The Author 2017. Published by Oxford University Press.
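
    One concrete example of a simple k-mer-based AF measure, assuming two short DNA strings for illustration: the Jaccard distance between their k-mer sets. Real AF workflows operate on far larger k-mer collections and use more refined statistics, so this is only a sketch.

        def kmers(seq, k=8):
            """Set of all overlapping k-mers in a DNA sequence."""
            return {seq[i:i + k] for i in range(len(seq) - k + 1)}

        def jaccard_distance(seq_a, seq_b, k=8):
            """Alignment-free distance: 1 - |intersection| / |union| of the two k-mer sets."""
            a, b = kmers(seq_a, k), kmers(seq_b, k)
            return 1.0 - len(a & b) / len(a | b)

        genome_1 = "ACGTACGTTTGACGTAGCATGCATGCAATTGCAGT"
        genome_2 = "ACGTACGTTTGACGTAGCATGCATGGAATTGCAGT"   # single substitution relative to genome_1
        print(round(jaccard_distance(genome_1, genome_2, k=8), 3))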

  10. MODULAR ANALYTICS: A New Approach to Automation in the Clinical Laboratory.

    PubMed

    Horowitz, Gary L; Zaman, Zahur; Blanckaert, Norbert J C; Chan, Daniel W; Dubois, Jeffrey A; Golaz, Olivier; Mensi, Noury; Keller, Franz; Stolz, Herbert; Klingler, Karl; Marocchi, Alessandro; Prencipe, Lorenzo; McLawhon, Ronald W; Nilsen, Olaug L; Oellerich, Michael; Luthe, Hilmar; Orsonneau, Jean-Luc; Richeux, Gérard; Recio, Fernando; Roldan, Esther; Rymo, Lars; Wicktorsson, Anne-Charlotte; Welch, Shirley L; Wieland, Heinrich; Grawitz, Andrea Busse; Mitsumaki, Hiroshi; McGovern, Margaret; Ng, Katherine; Stockmann, Wolfgang

    2005-01-01

    MODULAR ANALYTICS (Roche Diagnostics) (MODULAR ANALYTICS, Elecsys and Cobas Integra are trademarks of a member of the Roche Group) represents a new approach to automation for the clinical chemistry laboratory. It consists of a control unit, a core unit with a bidirectional multitrack rack transportation system, and three distinct kinds of analytical modules: an ISE module, a P800 module (44 photometric tests, throughput of up to 800 tests/h), and a D2400 module (16 photometric tests, throughput up to 2400 tests/h). MODULAR ANALYTICS allows customised configurations for various laboratory workloads. The performance and practicability of MODULAR ANALYTICS were evaluated in an international multicentre study at 16 sites. Studies included precision, accuracy, analytical range, carry-over, and workflow assessment. More than 700 000 results were obtained during the course of the study. Median between-day CVs were typically less than 3% for clinical chemistries and less than 6% for homogeneous immunoassays. Median recoveries for nearly all standardised reference materials were within 5% of assigned values. Method comparisons versus current existing routine instrumentation were clinically acceptable in all cases. During the workflow studies, the work from three to four single workstations was transferred to MODULAR ANALYTICS, which offered over 100 possible methods, with reduction in sample splitting, handling errors, and turnaround time. Typical sample processing time on MODULAR ANALYTICS was less than 30 minutes, an improvement from the current laboratory systems. By combining multiple analytic units in flexible ways, MODULAR ANALYTICS met diverse laboratory needs and offered improvement in workflow over current laboratory situations. It increased overall efficiency while maintaining (or improving) quality.

  11. MODULAR ANALYTICS: A New Approach to Automation in the Clinical Laboratory

    PubMed Central

    Zaman, Zahur; Blanckaert, Norbert J. C.; Chan, Daniel W.; Dubois, Jeffrey A.; Golaz, Olivier; Mensi, Noury; Keller, Franz; Stolz, Herbert; Klingler, Karl; Marocchi, Alessandro; Prencipe, Lorenzo; McLawhon, Ronald W.; Nilsen, Olaug L.; Oellerich, Michael; Luthe, Hilmar; Orsonneau, Jean-Luc; Richeux, Gérard; Recio, Fernando; Roldan, Esther; Rymo, Lars; Wicktorsson, Anne-Charlotte; Welch, Shirley L.; Wieland, Heinrich; Grawitz, Andrea Busse; Mitsumaki, Hiroshi; McGovern, Margaret; Ng, Katherine; Stockmann, Wolfgang

    2005-01-01

    MODULAR ANALYTICS (Roche Diagnostics) (MODULAR ANALYTICS, Elecsys and Cobas Integra are trademarks of a member of the Roche Group) represents a new approach to automation for the clinical chemistry laboratory. It consists of a control unit, a core unit with a bidirectional multitrack rack transportation system, and three distinct kinds of analytical modules: an ISE module, a P800 module (44 photometric tests, throughput of up to 800 tests/h), and a D2400 module (16 photometric tests, throughput up to 2400 tests/h). MODULAR ANALYTICS allows customised configurations for various laboratory workloads. The performance and practicability of MODULAR ANALYTICS were evaluated in an international multicentre study at 16 sites. Studies included precision, accuracy, analytical range, carry-over, and workflow assessment. More than 700 000 results were obtained during the course of the study. Median between-day CVs were typically less than 3% for clinical chemistries and less than 6% for homogeneous immunoassays. Median recoveries for nearly all standardised reference materials were within 5% of assigned values. Method comparisons versus current existing routine instrumentation were clinically acceptable in all cases. During the workflow studies, the work from three to four single workstations was transferred to MODULAR ANALYTICS, which offered over 100 possible methods, with reduction in sample splitting, handling errors, and turnaround time. Typical sample processing time on MODULAR ANALYTICS was less than 30 minutes, an improvement from the current laboratory systems. By combining multiple analytic units in flexible ways, MODULAR ANALYTICS met diverse laboratory needs and offered improvement in workflow over current laboratory situations. It increased overall efficiency while maintaining (or improving) quality. PMID:18924721

  12. SciDAC-Data: Enabling Data Driven Modeling of Exascale Computing

    DOE PAGES

    Mubarak, Misbah; Ding, Pengfei; Aliaga, Leo; ...

    2017-11-23

    Here, the SciDAC-Data project is a DOE-funded initiative to analyze and exploit two decades of information and analytics that have been collected by the Fermilab data center on the organization, movement, and consumption of high energy physics (HEP) data. The project analyzes the analysis patterns and data organization that have been used by NOvA, MicroBooNE, MINERvA, CDF, D0, and other experiments to develop realistic models of HEP analysis workflows and data processing. The SciDAC-Data project aims to provide both realistic input vectors and corresponding output data that can be used to optimize and validate simulations of HEP analysis. These simulations are designed to address questions of data handling, cache optimization, and workflow structures that are the prerequisites for modern HEP analysis chains to be mapped and optimized to run on the next generation of leadership-class exascale computing facilities. We present the use of a subset of the SciDAC-Data distributions, acquired from analysis of approximately 71,000 HEP workflows run on the Fermilab data center and corresponding to over 9 million individual analysis jobs, as the input to detailed queuing simulations that model the expected data consumption and caching behaviors of the work running in high performance computing (HPC) and high throughput computing (HTC) environments. In particular we describe how the Sequential Access via Metadata (SAM) data-handling system in combination with the dCache/Enstore-based data archive facilities has been used to develop radically different models for analyzing the HEP data. We also show how the simulations may be used to assess the impact of design choices in archive facilities.
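
    The abstract does not reproduce the simulation itself, but the core idea, replaying a workload drawn from empirical workflow distributions through a cache model in front of the archive, can be sketched as below. The Zipf-like popularity distribution, cache sizes, and file counts are placeholders rather than the SciDAC-Data inputs.

```python
import random
from collections import OrderedDict

def simulate_cache(accesses, cache_size):
    """Replay a stream of file accesses through an LRU cache and report the
    hit rate, a crude proxy for load taken off the tape archive."""
    cache, hits = OrderedDict(), 0
    for f in accesses:
        if f in cache:
            hits += 1
            cache.move_to_end(f)
        else:
            cache[f] = True
            if len(cache) > cache_size:
                cache.popitem(last=False)   # evict least recently used
    return hits / len(accesses)

# Placeholder workload: file popularity drawn from a heavy-tailed (Zipf-like)
# distribution, standing in for the empirical workflow distributions.
random.seed(0)
files = [f"file_{i}" for i in range(10_000)]
weights = [1.0 / (i + 1) for i in range(len(files))]
workload = random.choices(files, weights=weights, k=200_000)
for size in (100, 1_000, 5_000):
    print(size, round(simulate_cache(workload, size), 3))
```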

  13. SciDAC-Data: Enabling Data Driven Modeling of Exascale Computing

    NASA Astrophysics Data System (ADS)

    Mubarak, Misbah; Ding, Pengfei; Aliaga, Leo; Tsaris, Aristeidis; Norman, Andrew; Lyon, Adam; Ross, Robert

    2017-10-01

    The SciDAC-Data project is a DOE-funded initiative to analyze and exploit two decades of information and analytics that have been collected by the Fermilab data center on the organization, movement, and consumption of high energy physics (HEP) data. The project analyzes the analysis patterns and data organization that have been used by NOvA, MicroBooNE, MINERvA, CDF, D0, and other experiments to develop realistic models of HEP analysis workflows and data processing. The SciDAC-Data project aims to provide both realistic input vectors and corresponding output data that can be used to optimize and validate simulations of HEP analysis. These simulations are designed to address questions of data handling, cache optimization, and workflow structures that are the prerequisites for modern HEP analysis chains to be mapped and optimized to run on the next generation of leadership-class exascale computing facilities. We present the use of a subset of the SciDAC-Data distributions, acquired from analysis of approximately 71,000 HEP workflows run on the Fermilab data center and corresponding to over 9 million individual analysis jobs, as the input to detailed queuing simulations that model the expected data consumption and caching behaviors of the work running in high performance computing (HPC) and high throughput computing (HTC) environments. In particular we describe how the Sequential Access via Metadata (SAM) data-handling system in combination with the dCache/Enstore-based data archive facilities has been used to develop radically different models for analyzing the HEP data. We also show how the simulations may be used to assess the impact of design choices in archive facilities.

  14. SciDAC-Data: Enabling Data Driven Modeling of Exascale Computing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mubarak, Misbah; Ding, Pengfei; Aliaga, Leo

    Here, the SciDAC-Data project is a DOE-funded initiative to analyze and exploit two decades of information and analytics that have been collected by the Fermilab data center on the organization, movement, and consumption of high energy physics (HEP) data. The project analyzes the analysis patterns and data organization that have been used by NOvA, MicroBooNE, MINERvA, CDF, D0, and other experiments to develop realistic models of HEP analysis workflows and data processing. The SciDAC-Data project aims to provide both realistic input vectors and corresponding output data that can be used to optimize and validate simulations of HEP analysis. These simulations are designed to address questions of data handling, cache optimization, and workflow structures that are the prerequisites for modern HEP analysis chains to be mapped and optimized to run on the next generation of leadership-class exascale computing facilities. We present the use of a subset of the SciDAC-Data distributions, acquired from analysis of approximately 71,000 HEP workflows run on the Fermilab data center and corresponding to over 9 million individual analysis jobs, as the input to detailed queuing simulations that model the expected data consumption and caching behaviors of the work running in high performance computing (HPC) and high throughput computing (HTC) environments. In particular we describe how the Sequential Access via Metadata (SAM) data-handling system in combination with the dCache/Enstore-based data archive facilities has been used to develop radically different models for analyzing the HEP data. We also show how the simulations may be used to assess the impact of design choices in archive facilities.

  15. Quantitative, multiplexed workflow for deep analysis of human blood plasma and biomarker discovery by mass spectrometry.

    PubMed

    Keshishian, Hasmik; Burgess, Michael W; Specht, Harrison; Wallace, Luke; Clauser, Karl R; Gillette, Michael A; Carr, Steven A

    2017-08-01

    Proteomic characterization of blood plasma is of central importance to clinical proteomics and particularly to biomarker discovery studies. The vast dynamic range and high complexity of the plasma proteome have, however, proven to be serious challenges and have often led to unacceptable tradeoffs between depth of coverage and sample throughput. We present an optimized sample-processing pipeline for analysis of the human plasma proteome that provides greatly increased depth of detection, improved quantitative precision and much higher sample analysis throughput as compared with prior methods. The process includes abundant protein depletion, isobaric labeling at the peptide level for multiplexed relative quantification and ultra-high-performance liquid chromatography coupled to accurate-mass, high-resolution tandem mass spectrometry analysis of peptides fractionated off-line by basic pH reversed-phase (bRP) chromatography. The overall reproducibility of the process, including immunoaffinity depletion, is high, with a process replicate coefficient of variation (CV) of <12%. Using isobaric tags for relative and absolute quantitation (iTRAQ) 4-plex, >4,500 proteins are detected and quantified per patient sample on average, with two or more peptides per protein and starting from as little as 200 μl of plasma. The approach can be multiplexed up to 10-plex using tandem mass tags (TMT) reagents, further increasing throughput, albeit with some decrease in the number of proteins quantified. In addition, we provide a rapid protocol for analysis of nonfractionated depleted plasma samples analyzed in 10-plex. This provides ∼600 quantified proteins for each of the ten samples in ∼5 h of instrument time.

  16. A practical data processing workflow for multi-OMICS projects.

    PubMed

    Kohl, Michael; Megger, Dominik A; Trippler, Martin; Meckel, Hagen; Ahrens, Maike; Bracht, Thilo; Weber, Frank; Hoffmann, Andreas-Claudius; Baba, Hideo A; Sitek, Barbara; Schlaak, Jörg F; Meyer, Helmut E; Stephan, Christian; Eisenacher, Martin

    2014-01-01

    Multi-OMICS approaches aim at integrating quantitative data obtained for different biological molecules in order to understand their interrelation and the functioning of larger systems. This paper deals with several data integration and data processing issues that frequently occur in this context. To this end, the data processing workflow within the PROFILE project is presented, a multi-OMICS project that aims at the identification of novel biomarkers and the development of new therapeutic targets for seven important liver diseases. Furthermore, a software tool called CrossPlatformCommander is sketched, which facilitates several steps of the proposed workflow in a semi-automatic manner. Application of the software is demonstrated for the detection of novel biomarkers, their ranking and their annotation with existing knowledge, using corresponding Transcriptomics and Proteomics data sets obtained from patients suffering from hepatocellular carcinoma. Additionally, a linear regression analysis of Transcriptomics versus Proteomics data is presented and its performance assessed. It was shown that a simple linear regression analysis is not sufficient to capture the profound relations between Transcriptomics and Proteomics data, and that alternative statistical approaches need to be implemented and evaluated. The integration of multivariate variable selection and classification approaches is intended for further development of the software. Although this paper focuses only on the combination of data obtained from quantitative Proteomics and Transcriptomics experiments, several of the approaches and data integration steps are also applicable to other OMICS technologies. Keeping specific restrictions in mind, the suggested workflow (or at least parts of it) may be used as a template for similar projects that make use of different high-throughput techniques. This article is part of a Special Issue entitled: Computational Proteomics in the Post-Identification Era. Guest Editors: Martin Eisenacher and Christian Stephan. Copyright © 2013 Elsevier B.V. All rights reserved.
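
    As an illustration of the Transcriptomics-versus-Proteomics regression step described above, the sketch below fits a simple linear model to paired (hypothetical) log-scale abundances and reports R²; it is not taken from CrossPlatformCommander.

```python
import numpy as np
from scipy import stats

# Hypothetical paired measurements for the same genes (log2 scale).
rng = np.random.default_rng(1)
transcript = rng.normal(8.0, 2.0, size=500)
protein = 0.4 * transcript + rng.normal(0.0, 1.5, size=500)  # weak coupling

slope, intercept, r, p, stderr = stats.linregress(transcript, protein)
print(f"slope={slope:.2f}  R^2={r**2:.2f}  p={p:.2e}")
# A low R^2 here mirrors the paper's conclusion: a simple linear fit does not
# fully capture the relation between transcript and protein abundance.
```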

  17. PLAStiCC: Predictive Look-Ahead Scheduling for Continuous dataflows on Clouds

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kumbhare, Alok; Simmhan, Yogesh; Prasanna, Viktor K.

    2014-05-27

    Scalable stream processing and continuous dataflow systems are gaining traction with the rise of big data due to the need for processing high velocity data in near real time. Unlike batch processing systems such as MapReduce and workflows, static scheduling strategies fall short for continuous dataflows due to the variations in the input data rates and the need for sustained throughput. The elastic resource provisioning of cloud infrastructure is valuable to meet the changing resource needs of such continuous applications. However, multi-tenant cloud resources introduce yet another dimension of performance variability that impacts the application’s throughput. In this paper we propose PLAStiCC, an adaptive scheduling algorithm that balances resource cost and application throughput using a prediction-based look-ahead approach. It addresses not only variations in the input data rates but also in the underlying cloud infrastructure. In addition, we also propose several simpler static scheduling heuristics that operate in the absence of an accurate performance prediction model. These static and adaptive heuristics are evaluated through extensive simulations using performance traces obtained from public and private IaaS clouds. Our results show an improvement of up to 20% in the overall profit as compared to the reactive adaptation algorithm.
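
    A minimal sketch of the prediction-based look-ahead idea: at each step, pick the number of provisioned VMs that maximises predicted profit (value of processed messages minus resource cost) over a short forecast horizon. The forecast, prices, and capacities below are invented placeholders, not the PLAStiCC algorithm itself.

```python
def lookahead_schedule(predicted_rates, vm_capacity, vm_cost, value_per_msg,
                       max_vms, horizon=3):
    """Greedy look-ahead: for each time step, choose the VM count that maximises
    predicted profit over the next `horizon` steps of the input-rate forecast."""
    plan = []
    for t in range(len(predicted_rates)):
        window = predicted_rates[t:t + horizon]
        best_n, best_profit = 1, float("-inf")
        for n in range(1, max_vms + 1):
            processed = sum(min(rate, n * vm_capacity) for rate in window)
            profit = processed * value_per_msg - n * vm_cost * len(window)
            if profit > best_profit:
                best_n, best_profit = n, profit
        plan.append(best_n)
    return plan

# Placeholder forecast of input rates (messages/s) with a burst in the middle.
forecast = [200, 250, 900, 1100, 950, 300, 220]
print(lookahead_schedule(forecast, vm_capacity=300, vm_cost=50,
                         value_per_msg=0.3, max_vms=6))
```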

  18. Workflow efficiency of two 1.5 T MR scanners with and without an automated user interface for head examinations.

    PubMed

    Moenninghoff, Christoph; Umutlu, Lale; Kloeters, Christian; Ringelstein, Adrian; Ladd, Mark E; Sombetzki, Antje; Lauenstein, Thomas C; Forsting, Michael; Schlamann, Marc

    2013-06-01

    Workflow efficiency and workload of radiological technologists (RTs) were compared in head examinations performed with two 1.5 T magnetic resonance (MR) scanners equipped with or without an automated user interface called the "day optimizing throughput" (Dot) workflow engine. Thirty-four patients with known intracranial pathology were examined with a 1.5 T MR scanner with the Dot workflow engine (Siemens MAGNETOM Aera) and with a 1.5 T MR scanner with a conventional user interface (Siemens MAGNETOM Avanto) using four standardized examination protocols. The elapsed time for all necessary work steps, which were performed by 11 RTs within the total examination time, was compared for each examination at both MR scanners. The RTs evaluated the user-friendliness of both scanners with a questionnaire. Normality of distribution was checked for all continuous variables by use of the Shapiro-Wilk test. Normally distributed variables were analyzed by Student's paired t-test; otherwise, the Wilcoxon signed-rank test was used to compare means. The total examination time of MR examinations performed with the Dot engine was reduced from 24:53 to 20:01 minutes (P < .001), and the necessary RT intervention decreased by 61% (P < .001). The Dot engine's automated choice of MR protocols was rated significantly better by the RTs than the conventional user interface (P = .001). According to this preliminary study, the Dot workflow engine is time-saving user-assistance software that decreases the RTs' effort significantly and may help to automate neuroradiological examinations for higher workflow efficiency. Copyright © 2013 AUR. Published by Elsevier Inc. All rights reserved.
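
    The statistical recipe used here (Shapiro-Wilk normality check, then a paired t-test or a fallback to the Wilcoxon signed-rank test) is straightforward to express in code; the sketch below assumes two hypothetical arrays of per-patient examination times in minutes.

```python
import numpy as np
from scipy import stats

def compare_paired(times_a, times_b, alpha=0.05):
    """Paired comparison of examination times: Shapiro-Wilk on the differences,
    then Student's paired t-test if normal, otherwise Wilcoxon signed-rank."""
    diffs = np.asarray(times_a) - np.asarray(times_b)
    _, p_normal = stats.shapiro(diffs)
    if p_normal >= alpha:
        _, p = stats.ttest_rel(times_a, times_b)
        return "paired t-test", p
    _, p = stats.wilcoxon(times_a, times_b)
    return "wilcoxon signed-rank", p

# Hypothetical per-patient total examination times (minutes), not study data.
rng = np.random.default_rng(7)
conventional = rng.normal(24.9, 2.0, size=34)
dot_engine = conventional - rng.normal(4.8, 1.0, size=34)
print(compare_paired(conventional, dot_engine))
```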

  19. File Formats Commonly Used in Mass Spectrometry Proteomics*

    PubMed Central

    Deutsch, Eric W.

    2012-01-01

    The application of mass spectrometry (MS) to the analysis of proteomes has enabled the high-throughput identification and abundance measurement of hundreds to thousands of proteins per experiment. However, the formidable informatics challenge associated with analyzing MS data has required a wide variety of data file formats to encode the complex data types associated with MS workflows. These formats encompass the encoding of input instruction for instruments, output products of the instruments, and several levels of information and results used by and produced by the informatics analysis tools. A brief overview of the most common file formats in use today is presented here, along with a discussion of related topics. PMID:22956731

  20. Progress on the Fabric for Frontier Experiments Project at Fermilab

    NASA Astrophysics Data System (ADS)

    Box, Dennis; Boyd, Joseph; Dykstra, Dave; Garzoglio, Gabriele; Herner, Kenneth; Kirby, Michael; Kreymer, Arthur; Levshina, Tanya; Mhashilkar, Parag; Sharma, Neha

    2015-12-01

    The FabrIc for Frontier Experiments (FIFE) project is an ambitious, major-impact initiative within the Fermilab Scientific Computing Division designed to lead the computing model for Fermilab experiments. FIFE is a collaborative effort between experimenters and computing professionals to design and develop integrated computing models for experiments of varying needs and infrastructure. The major focus of the FIFE project is the development, deployment, and integration of Open Science Grid solutions for high throughput computing, data management, database access and collaboration within experiment. To accomplish this goal, FIFE has developed workflows that utilize Open Science Grid sites along with dedicated and commercial cloud resources. The FIFE project has made significant progress integrating into experiment computing operations several services including new job submission services, software and reference data distribution through CVMFS repositories, flexible data transfer client, and access to opportunistic resources on the Open Science Grid. The progress with current experiments and plans for expansion with additional projects will be discussed. FIFE has taken a leading role in the definition of the computing model for Fermilab experiments, aided in the design of computing for experiments beyond Fermilab, and will continue to define the future direction of high throughput computing for future physics experiments worldwide.

  1. PipeCraft: Flexible open-source toolkit for bioinformatics analysis of custom high-throughput amplicon sequencing data.

    PubMed

    Anslan, Sten; Bahram, Mohammad; Hiiesalu, Indrek; Tedersoo, Leho

    2017-11-01

    High-throughput sequencing methods have become a routine analysis tool in the environmental sciences as well as in the public and private sectors. These methods provide vast amounts of data, which need to be analysed in several steps. Although the bioinformatics analysis may be performed with several public tools, many analytical pipelines offer too few options for the optimal analysis of more complicated or customized designs. Here, we introduce PipeCraft, a flexible and handy bioinformatics pipeline with a user-friendly graphical interface that links several public tools for analysing amplicon sequencing data. Users are able to customize the pipeline by selecting the most suitable tools and options to process raw sequences from Illumina, Pacific Biosciences, Ion Torrent and Roche 454 sequencing platforms. We describe the design and options of PipeCraft and evaluate its performance by analysing data sets from three different sequencing platforms. We demonstrate that PipeCraft is able to process large data sets within 24 hr. The graphical user interface and the automated links between various bioinformatics tools enable easy customization of the workflow. All analytical steps and options are recorded in log files and are easily traceable. © 2017 John Wiley & Sons Ltd.

  2. Microfluidic Imaging Flow Cytometry by Asymmetric-detection Time-stretch Optical Microscopy (ATOM).

    PubMed

    Tang, Anson H L; Lai, Queenie T K; Chung, Bob M F; Lee, Kelvin C M; Mok, Aaron T Y; Yip, G K; Shum, Anderson H C; Wong, Kenneth K Y; Tsia, Kevin K

    2017-06-28

    Scaling the number of measurable parameters, which allows for multidimensional data analysis and thus higher-confidence statistical results, has been the main trend in the advanced development of flow cytometry. Notably, adding high-resolution imaging capabilities allows for the complex morphological analysis of cellular/sub-cellular structures. This is not possible with standard flow cytometers. However, it is valuable for advancing our knowledge of cellular functions and can benefit life science research, clinical diagnostics, and environmental monitoring. Incorporating imaging capabilities into flow cytometry compromises the assay throughput, primarily due to the limitations on speed and sensitivity in the camera technologies. To overcome this speed or throughput challenge facing imaging flow cytometry while preserving the image quality, asymmetric-detection time-stretch optical microscopy (ATOM) has been demonstrated to enable high-contrast, single-cell imaging with sub-cellular resolution, at an imaging throughput as high as 100,000 cells/s. Based on the imaging concept of conventional time-stretch imaging, which relies on all-optical image encoding and retrieval through the use of ultrafast broadband laser pulses, ATOM further advances imaging performance by enhancing the image contrast of unlabeled/unstained cells. This is achieved by accessing the phase-gradient information of the cells, which is spectrally encoded into single-shot broadband pulses. Hence, ATOM is particularly advantageous in high-throughput measurements of single-cell morphology and texture - information indicative of cell types, states, and even functions. Ultimately, this could become a powerful imaging flow cytometry platform for the biophysical phenotyping of cells, complementing the current state-of-the-art biochemical-marker-based cellular assay. This work describes a protocol to establish the key modules of an ATOM system (from optical frontend to data processing and visualization backend), as well as the workflow of imaging flow cytometry based on ATOM, using human cells and micro-algae as the examples.

  3. Global connectivity of hub residues in Oncoprotein structures encodes genetic factors dictating personalized drug response to targeted Cancer therapy

    NASA Astrophysics Data System (ADS)

    Soundararajan, Venky; Aravamudan, Murali

    2014-12-01

    The efficacy and mechanisms of therapeutic action are largely described by atomic bonds and interactions local to drug binding sites. Here we introduce global connectivity analysis as a high-throughput computational assay of therapeutic action - inspired by the Google page rank algorithm that unearths the most "globally connected" websites from the information-dense world wide web (WWW). We execute short timescale (30 ps) molecular dynamics simulations with high sampling frequency (0.01 ps) to identify amino acid residue hubs whose global connectivity dynamics are characteristic of the ligand or mutation associated with the target protein. We find that unexpected allosteric hubs - up to 20 Å from the ATP binding site, but within 5 Å of the phosphorylation site - encode the Gibbs free energy of inhibition (ΔG_inhibition) for select protein kinase-targeted cancer therapeutics. We further find that clinically relevant somatic cancer mutations implicated in both drug resistance and personalized drug sensitivity can be predicted in a high-throughput fashion. Our results establish global connectivity analysis as a potent assay of protein functional modulation. This sets the stage for unearthing disease-causal exome mutations and motivates forecast of clinical drug response on a patient-by-patient basis. We suggest incorporation of structure-guided genetic inference assays into pharmaceutical and healthcare Oncology workflows.
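
    A hedged sketch of the underlying idea, a PageRank-style score on a residue interaction network whose edges are weighted by contact frequency across MD frames, is shown below using networkx; the residue names and weights are invented and this is not the authors' implementation.

```python
import networkx as nx

# Hypothetical residue contact network: nodes are residue IDs, edge weights
# are contact frequencies averaged over MD frames (placeholder values).
contacts = [
    ("K72", "E91", 0.95), ("K72", "D104", 0.80), ("E91", "T315", 0.40),
    ("T315", "D104", 0.70), ("T315", "Y253", 0.65), ("Y253", "K72", 0.30),
]
G = nx.Graph()
G.add_weighted_edges_from(contacts)

# PageRank-style global connectivity: residues with high scores are candidate
# "hub" residues whose connectivity shifts with ligand binding or mutation.
scores = nx.pagerank(G, alpha=0.85, weight="weight")
for residue, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{residue}: {score:.3f}")
```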

  4. SECIMTools: a suite of metabolomics data analysis tools.

    PubMed

    Kirpich, Alexander S; Ibarra, Miguel; Moskalenko, Oleksandr; Fear, Justin M; Gerken, Joseph; Mi, Xinlei; Ashrafi, Ali; Morse, Alison M; McIntyre, Lauren M

    2018-04-20

    Metabolomics has the promise to transform the area of personalized medicine with the rapid development of high-throughput technology for untargeted analysis of metabolites. Open-access, easy-to-use analytic tools that are broadly accessible to the biological community need to be developed. While the technology used in metabolomics varies, most metabolomics studies have a set of features identified. Galaxy is an open-access platform that enables scientists at all levels to interact with big data. Galaxy promotes reproducibility by saving histories and enabling the sharing of workflows among scientists. SECIMTools (SouthEast Center for Integrated Metabolomics) is a set of Python applications that are available both as standalone tools and wrapped for use in Galaxy. The suite includes a comprehensive set of quality control metrics (retention time window evaluation and various peak evaluation tools), visualization techniques (hierarchical cluster heatmap, principal component analysis, modular modularity clustering), basic statistical analysis methods (partial least squares - discriminant analysis, analysis of variance, t-test, Kruskal-Wallis non-parametric test), advanced classification methods (random forest, support vector machines), and advanced variable selection tools (least absolute shrinkage and selection operator (LASSO) and Elastic Net). SECIMTools leverages the Galaxy platform and enables integrated workflows for metabolomics data analysis made from building blocks designed for easy use and interpretability. Standard data formats and a set of utilities allow arbitrary linkages between tools to encourage novel workflow designs. The Galaxy framework enables future data integration for metabolomics studies with other omics data.

  5. Bidirectional Retroviral Integration Site PCR Methodology and Quantitative Data Analysis Workflow.

    PubMed

    Suryawanshi, Gajendra W; Xu, Song; Xie, Yiming; Chou, Tom; Kim, Namshin; Chen, Irvin S Y; Kim, Sanggu

    2017-06-14

    Integration Site (IS) assays are a critical component of the study of retroviral integration sites and their biological significance. In recent retroviral gene therapy studies, IS assays, in combination with next-generation sequencing, have been used as a cell-tracking tool to characterize clonal stem cell populations sharing the same IS. For the accurate comparison of repopulating stem cell clones within and across different samples, the detection sensitivity, data reproducibility, and high-throughput capacity of the assay are among the most important assay qualities. This work provides a detailed protocol and data analysis workflow for bidirectional IS analysis. The bidirectional assay can simultaneously sequence both upstream and downstream vector-host junctions. Compared to conventional unidirectional IS sequencing approaches, the bidirectional approach significantly improves IS detection rates and the characterization of integration events at both ends of the target DNA. The data analysis pipeline described here accurately identifies and enumerates identical IS sequences through multiple steps of comparison that map IS sequences onto the reference genome and determine sequencing errors. Using an optimized assay procedure, we have recently published the detailed repopulation patterns of thousands of Hematopoietic Stem Cell (HSC) clones following transplant in rhesus macaques, demonstrating for the first time the precise time point of HSC repopulation and the functional heterogeneity of HSCs in the primate system. The following protocol describes the step-by-step experimental procedure and data analysis workflow that accurately identifies and quantifies identical IS sequences.
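
    One core step of the data analysis pipeline, collapsing reads that map to the same vector-host junction into a single integration site and counting reads per clone, can be sketched as follows; the coordinate tolerance and the input tuples are hypothetical choices, not the published pipeline.

```python
from collections import Counter

def collapse_integration_sites(mapped_reads, tolerance=5):
    """Group mapped junction coordinates (chrom, strand, position) that fall
    within `tolerance` bp of an already-seen site, absorbing small mapping and
    sequencing errors, and count supporting reads per clone."""
    clones = Counter()
    seen = {}  # (chrom, strand) -> list of representative positions
    for chrom, strand, pos in mapped_reads:
        reps = seen.setdefault((chrom, strand), [])
        for rep in reps:
            if abs(pos - rep) <= tolerance:
                clones[(chrom, strand, rep)] += 1
                break
        else:
            reps.append(pos)
            clones[(chrom, strand, pos)] += 1
    return clones

# Hypothetical mapped junctions: two reads support the same clone on chr2.
reads = [("chr2", "+", 1_204_331), ("chr2", "+", 1_204_333), ("chr11", "-", 88_420_115)]
print(collapse_integration_sites(reads))
```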

  6. DIANA-microT web server v5.0: service integration into miRNA functional analysis workflows.

    PubMed

    Paraskevopoulou, Maria D; Georgakilas, Georgios; Kostoulas, Nikos; Vlachos, Ioannis S; Vergoulis, Thanasis; Reczko, Martin; Filippidis, Christos; Dalamagas, Theodore; Hatzigeorgiou, A G

    2013-07-01

    MicroRNAs (miRNAs) are small endogenous RNA molecules that regulate gene expression through mRNA degradation and/or translation repression, affecting many biological processes. DIANA-microT web server (http://www.microrna.gr/webServer) is dedicated to miRNA target prediction/functional analysis, and it has been widely used by the scientific community since its initial launch in 2009. DIANA-microT v5.0, the new version of the microT server, has been significantly enhanced with an improved target prediction algorithm, DIANA-microT-CDS. It has been updated to incorporate miRBase version 18 and Ensembl version 69. The in silico-predicted miRNA-gene interactions in Homo sapiens, Mus musculus, Drosophila melanogaster and Caenorhabditis elegans exceed 11 million in total. The web server was completely redesigned to host a series of sophisticated workflows, which can be used directly from the on-line web interface, enabling users without the necessary bioinformatics infrastructure to perform advanced multi-step functional miRNA analyses. For instance, one available pipeline performs miRNA target prediction using different thresholds and meta-analysis statistics, followed by pathway enrichment analysis. DIANA-microT web server v5.0 also supports a complete integration with the Taverna Workflow Management System (WMS), using the in-house developed DIANA-Taverna Plug-in. This plug-in provides ready-to-use modules for miRNA target prediction and functional analysis, which can be used to form advanced high-throughput analysis pipelines.

  7. DIANA-microT web server v5.0: service integration into miRNA functional analysis workflows

    PubMed Central

    Paraskevopoulou, Maria D.; Georgakilas, Georgios; Kostoulas, Nikos; Vlachos, Ioannis S.; Vergoulis, Thanasis; Reczko, Martin; Filippidis, Christos; Dalamagas, Theodore; Hatzigeorgiou, A.G.

    2013-01-01

    MicroRNAs (miRNAs) are small endogenous RNA molecules that regulate gene expression through mRNA degradation and/or translation repression, affecting many biological processes. DIANA-microT web server (http://www.microrna.gr/webServer) is dedicated to miRNA target prediction/functional analysis, and it has been widely used by the scientific community since its initial launch in 2009. DIANA-microT v5.0, the new version of the microT server, has been significantly enhanced with an improved target prediction algorithm, DIANA-microT-CDS. It has been updated to incorporate miRBase version 18 and Ensembl version 69. The in silico-predicted miRNA–gene interactions in Homo sapiens, Mus musculus, Drosophila melanogaster and Caenorhabditis elegans exceed 11 million in total. The web server was completely redesigned to host a series of sophisticated workflows, which can be used directly from the on-line web interface, enabling users without the necessary bioinformatics infrastructure to perform advanced multi-step functional miRNA analyses. For instance, one available pipeline performs miRNA target prediction using different thresholds and meta-analysis statistics, followed by pathway enrichment analysis. DIANA-microT web server v5.0 also supports a complete integration with the Taverna Workflow Management System (WMS), using the in-house developed DIANA-Taverna Plug-in. This plug-in provides ready-to-use modules for miRNA target prediction and functional analysis, which can be used to form advanced high-throughput analysis pipelines. PMID:23680784

  8. Unparalleled sample treatment throughput for proteomics workflows relying on ultrasonic energy.

    PubMed

    Jorge, Susana; Araújo, J E; Pimentel-Santos, F M; Branco, Jaime C; Santos, Hugo M; Lodeiro, Carlos; Capelo, J L

    2018-02-01

    We report on the new microplate horn ultrasonic device as a powerful tool to speed up proteomics workflows with unparalleled throughput. 96 complex proteomes were digested at the same time in 4 min. Variables such as ultrasonication time, ultrasonication amplitude, and protein to enzyme ratio were optimized. The "classic" method relying on overnight protein digestion (12 h) and the sonoreactor-based method were also employed for comparative purposes. We found the protein digestion efficiency to be homogeneously distributed over the entire microplate horn surface using the following conditions: 4 min sonication time and 25% amplitude. Using this approach, patients with lymphoma and myeloma were classified using principal component analysis and a 2D gel-mass spectrometry based approach. Furthermore, we demonstrate the excellent performance of MALDI-mass spectrometry based profiling as a fast way to classify patients with rheumatoid arthritis, systemic lupus erythematosus, and ankylosing spondylitis. Finally, the speed and simplicity of this method were demonstrated by clustering 90 individuals: patients with knee osteoarthritis (30), patients with a prosthesis (30, control group), and healthy individuals (30) with no history of joint disease. Overall, the new approach allows a disease to be profiled in just one week while complying with the minimalism rules outlined by Halls. Copyright © 2017 Elsevier B.V. All rights reserved.

  9. A massive parallel sequencing workflow for diagnostic genetic testing of mismatch repair genes

    PubMed Central

    Hansen, Maren F; Neckmann, Ulrike; Lavik, Liss A S; Vold, Trine; Gilde, Bodil; Toft, Ragnhild K; Sjursen, Wenche

    2014-01-01

    The purpose of this study was to develop a massive parallel sequencing (MPS) workflow for diagnostic analysis of mismatch repair (MMR) genes using the GS Junior system (Roche). A pathogenic variant in one of the four MMR genes (MLH1, PMS2, MSH6, and MSH2) is the cause of Lynch syndrome (LS), which mainly predisposes to colorectal cancer. We used an amplicon-based sequencing method allowing specific and preferential amplification of the MMR genes, including PMS2, of which several pseudogenes exist. The amplicons were pooled at different ratios to obtain coverage uniformity and maximize the throughput of a single GS Junior run. In total, 60 previously identified and distinct variants (substitutions and indels) were sequenced by MPS and successfully detected. The heterozygote detection range was from 19% to 63% and dependent on sequence context and coverage. We were able to distinguish between false-positive and true-positive calls in homopolymeric regions by cross-sample comparison and evaluation of flow signal distributions. In addition, we filtered variants according to a predefined status, which facilitated variant annotation. Our study shows that implementation of MPS in routine diagnostics of LS can accelerate sample throughput and reduce costs without compromising sensitivity, compared to Sanger sequencing. PMID:24689082

  10. Development of a Kinetic Assay for Late Endosome Movement.

    PubMed

    Esner, Milan; Meyenhofer, Felix; Kuhn, Michael; Thomas, Melissa; Kalaidzidis, Yannis; Bickle, Marc

    2014-08-01

    Automated imaging screens are performed mostly on fixed and stained samples to simplify the workflow and increase throughput. Some processes, such as the movement of cells and organelles or measuring membrane integrity and potential, can be measured only in living cells. Developing such assays to screen large compound or RNAi collections is challenging in many respects. Here, we develop a live-cell high-content assay for tracking endocytic organelles in medium throughput. We evaluate the added value of measuring kinetic parameters compared with measuring static parameters solely. We screened 2000 compounds in U-2 OS cells expressing Lamp1-GFP to label late endosomes. All hits have phenotypes in both static and kinetic parameters. However, we show that the kinetic parameters enable better discrimination of the mechanisms of action. Most of the compounds cause a decrease of motility of endosomes, but we identify several compounds that increase endosomal motility. In summary, we show that kinetic data help to better discriminate phenotypes and thereby obtain more subtle phenotypic clustering. © 2014 Society for Laboratory Automation and Screening.
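
    The kinetic readout that distinguishes this assay from static measurements comes down to per-organelle track statistics such as mean speed; a minimal sketch on hypothetical (x, y) tracks is shown below (coordinates and sampling interval are invented, not screen data).

```python
import numpy as np

def mean_track_speed(track, dt):
    """Mean speed (µm/s) of one organelle track given (x, y) positions in µm
    sampled every `dt` seconds."""
    track = np.asarray(track, dtype=float)
    step_lengths = np.linalg.norm(np.diff(track, axis=0), axis=1)
    return step_lengths.sum() / (dt * len(step_lengths))

# Hypothetical tracks: a mobile and a near-stationary Lamp1-positive endosome.
mobile = [(0.0, 0.0), (0.4, 0.1), (0.9, 0.3), (1.5, 0.2)]
static = [(5.0, 5.0), (5.05, 5.02), (5.02, 4.98), (5.04, 5.01)]
for name, track in [("mobile", mobile), ("static", static)]:
    print(name, round(mean_track_speed(track, dt=2.0), 3), "µm/s")
```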

  11. Automation and workflow considerations for embedding Digimarc Barcodes at scale

    NASA Astrophysics Data System (ADS)

    Rodriguez, Tony; Haaga, Don; Calhoon, Sean

    2015-03-01

    The Digimarc® Barcode is a digital watermark applied to packages and variable data labels that carries GS1 standard GTIN-14 data traditionally carried by a 1-D barcode. The Digimarc Barcode can be read with smartphones and imaging-based barcode readers commonly used in grocery and retail environments. Using smartphones, consumers can engage with products, and retailers can materially increase the speed of check-out, increasing store margins and providing a better experience for shoppers. Internal testing has shown an average 53% increase in scanning throughput, enabling hundreds of millions of dollars in cost savings [1] for retailers when deployed at scale. To get to scale, the process of embedding a digital watermark must be automated and integrated within existing workflows. Creating the tools and processes to do so represents a new challenge for the watermarking community. This paper presents a description and an analysis of the workflow implemented by Digimarc to deploy the Digimarc Barcode at scale. An overview of the tools created and lessons learned during the introduction of the technology to the market are provided.

  12. MetaNET--a web-accessible interactive platform for biological metabolic network analysis.

    PubMed

    Narang, Pankaj; Khan, Shawez; Hemrom, Anmol Jaywant; Lynn, Andrew Michael

    2014-01-01

    Metabolic reactions have been extensively studied and compiled over the last century. These have provided a theoretical base to implement models, simulations of which are used to identify drug targets and optimize metabolic throughput at a systemic level. While tools for the perturbation of metabolic networks are available, their applications are limited and restricted, as they require varied dependencies and often a commercial platform for full functionality. We have developed MetaNET, an open-source, user-friendly, platform-independent and web-accessible resource consisting of several pre-defined workflows for metabolic network analysis. MetaNET is a web-accessible platform that incorporates a range of functions which can be combined to produce different simulations related to metabolic networks. These include (i) optimization of an objective function for the wild-type strain and gene/catalyst/reaction knock-out/knock-down analysis using flux balance analysis; (ii) flux variability analysis; (iii) chemical species participation; (iv) identification of cycles and extreme paths; and (v) choke-point reaction analysis to facilitate identification of potential drug targets. The platform is built using custom scripts along with the open-source Galaxy workflow system and the Systems Biology Research Tool as components. Pre-defined workflows are available for common processes, and an exhaustive list of over 50 functions is provided for user-defined workflows. MetaNET, available at http://metanet.osdd.net, provides a user-friendly, rich interface allowing the analysis of genome-scale metabolic networks under various genetic and environmental conditions. The framework permits the storage of previous results, the ability to repeat analyses and share results with other users over the internet, and the ability to run different tools simultaneously using pre-defined and user-created custom workflows.
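
    Flux balance analysis, the operation behind MetaNET's wild-type and knock-out simulations, reduces to a linear program: maximise an objective flux subject to steady-state mass balance S·v = 0 and flux bounds. A toy sketch with a made-up three-reaction network (not a MetaNET model) follows.

```python
import numpy as np
from scipy.optimize import linprog

# Toy stoichiometric matrix S (metabolites x reactions) and flux bounds.
# Reactions: R1 uptake, R2 conversion, R3 biomass (objective).
S = np.array([[ 1, -1,  0],    # metabolite A
              [ 0,  1, -1]])   # metabolite B
bounds = [(0, 10), (0, 10), (0, 10)]

def fba(knockout=None):
    """Maximise flux through R3 subject to S v = 0; optionally knock out a
    reaction by clamping its flux bounds to zero."""
    b = list(bounds)
    if knockout is not None:
        b[knockout] = (0, 0)
    c = [0, 0, -1]              # linprog minimises, so negate the objective
    res = linprog(c, A_eq=S, b_eq=np.zeros(S.shape[0]), bounds=b, method="highs")
    return -res.fun

print("wild-type biomass flux:", fba())        # expect 10
print("R2 knock-out biomass flux:", fba(1))    # expect 0
```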

  13. Updates in metabolomics tools and resources: 2014-2015.

    PubMed

    Misra, Biswapriya B; van der Hooft, Justin J J

    2016-01-01

    Data processing and interpretation represent the most challenging and time-consuming steps in high-throughput metabolomic experiments, regardless of the analytical platforms (MS or NMR spectroscopy based) used for data acquisition. Improved machinery in metabolomics generates increasingly complex datasets that create the need for more and better processing and analysis software and in silico approaches to understand the resulting data. However, a comprehensive source of information describing the utility of the most recently developed and released metabolomics resources--in the form of tools, software, and databases--is currently lacking. Thus, here we provide an overview of freely available and open-source tools, algorithms, and frameworks to make both upcoming and established metabolomics researchers aware of the recent developments, in an attempt to advance and facilitate data processing workflows in their metabolomics research. The major topics include tools and resources for data processing, data annotation, and data visualization in MS- and NMR-based metabolomics. Most of the tools described in this review are dedicated to untargeted metabolomics workflows; however, some more specialized tools are described as well. All tools and resources described, including their analytical and computational platform dependencies, are summarized in an overview table. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  14. Data-driven discovery of new Dirac semimetal materials

    NASA Astrophysics Data System (ADS)

    Yan, Qimin; Chen, Ru; Neaton, Jeffrey

    In recent years, a significant amount of materials property data from high-throughput computations based on density functional theory (DFT) and the application of database technologies have enabled the rise of data-driven materials discovery. In this work, we initiate the extension of the data-driven materials discovery framework to the realm of topological semimetals, in order to accelerate the discovery of novel Dirac semimetals. We implement currently available workflows and develop new ones to data-mine the Materials Project database for novel Dirac semimetals with desirable band structures and symmetry-protected topological properties. This data-driven effort relies on the successful development of several automatic data generation and analysis tools, including a workflow for the automatic identification of topological invariants and pattern recognition techniques to find specific features in a massive number of computed band structures. Utilizing this approach, we successfully identified more than 15 novel Dirac point and Dirac nodal line systems that had not previously been theoretically predicted or experimentally identified. This work is supported by the Materials Project Predictive Modeling Center through the U.S. Department of Energy, Office of Basic Energy Sciences, Materials Sciences and Engineering Division, under Contract No. DE-AC02-05CH11231.

  15. An Intelligent Automation Platform for Rapid Bioprocess Design.

    PubMed

    Wu, Tianyi; Zhou, Yuhong

    2014-08-01

    Bioprocess development is very labor intensive, requiring many experiments to characterize each unit operation in the process sequence to achieve product safety and process efficiency. Recent advances in microscale biochemical engineering have led to automated experimentation. A process design workflow is implemented sequentially in which (1) a liquid-handling system performs high-throughput wet lab experiments, (2) standalone analysis devices detect the data, and (3) specific software is used for data analysis and experiment design given the user's inputs. We report an intelligent automation platform that integrates these three activities to enhance the efficiency of such a workflow. A multiagent intelligent architecture has been developed incorporating agent communication to perform the tasks automatically. The key contribution of this work is the automation of data analysis and experiment design and also the ability to generate scripts to run the experiments automatically, allowing the elimination of human involvement. A first-generation prototype has been established and demonstrated through lysozyme precipitation process design. All procedures in the case study have been fully automated through an intelligent automation platform. The realization of automated data analysis and experiment design, and automated script programming for experimental procedures has the potential to increase lab productivity. © 2013 Society for Laboratory Automation and Screening.

  16. The Impact of Normalization Methods on RNA-Seq Data Analysis

    PubMed Central

    Zyprych-Walczak, J.; Szabelska, A.; Handschuh, L.; Górczak, K.; Klamecka, K.; Figlerowicz, M.; Siatkowski, I.

    2015-01-01

    High-throughput sequencing technologies, such as the Illumina Hi-seq, are powerful new tools for investigating a wide range of biological and medical problems. Massive and complex data sets produced by the sequencers create a need for development of statistical and computational methods that can tackle the analysis and management of data. The data normalization is one of the most crucial steps of data processing and this process must be carefully considered as it has a profound effect on the results of the analysis. In this work, we focus on a comprehensive comparison of five normalization methods related to sequencing depth, widely used for transcriptome sequencing (RNA-seq) data, and their impact on the results of gene expression analysis. Based on this study, we suggest a universal workflow that can be applied for the selection of the optimal normalization procedure for any particular data set. The described workflow includes calculation of the bias and variance values for the control genes, sensitivity and specificity of the methods, and classification errors as well as generation of the diagnostic plots. Combining the above information facilitates the selection of the most appropriate normalization method for the studied data sets and determines which methods can be used interchangeably. PMID:26176014
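
    One element of the suggested workflow, scoring each normalization method by the bias and variance of control genes, can be sketched generically; the count matrix, the control-gene set, and the two normalizations shown (total-count and upper-quartile scaling) are illustrative stand-ins for the five methods compared in the paper.

```python
import numpy as np

def total_count(counts):
    """Scale each sample (column) to the mean library size."""
    libsize = counts.sum(axis=0)
    return counts / libsize * libsize.mean()

def upper_quartile(counts):
    """Scale each sample by its 75th percentile of nonzero counts."""
    uq = np.array([np.percentile(col[col > 0], 75) for col in counts.T])
    return counts / uq * uq.mean()

def control_gene_scores(norm_counts, control_idx):
    """Bias: mean absolute deviation of control genes from their overall mean.
    Variance: mean across-sample variance of the controls after normalization."""
    ctrl = np.log2(norm_counts[control_idx] + 1)
    bias = np.abs(ctrl - ctrl.mean()).mean()
    variance = ctrl.var(axis=1).mean()
    return bias, variance

# Illustrative count matrix: 1000 genes x 6 samples, first 20 genes as controls.
rng = np.random.default_rng(3)
counts = rng.negative_binomial(5, 0.01, size=(1000, 6)).astype(float)
for name, method in [("total count", total_count), ("upper quartile", upper_quartile)]:
    print(name, control_gene_scores(method(counts), np.arange(20)))
```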

  17. An Intelligent Automation Platform for Rapid Bioprocess Design

    PubMed Central

    Wu, Tianyi

    2014-01-01

    Bioprocess development is very labor intensive, requiring many experiments to characterize each unit operation in the process sequence to achieve product safety and process efficiency. Recent advances in microscale biochemical engineering have led to automated experimentation. A process design workflow is implemented sequentially in which (1) a liquid-handling system performs high-throughput wet lab experiments, (2) standalone analysis devices detect the data, and (3) specific software is used for data analysis and experiment design given the user’s inputs. We report an intelligent automation platform that integrates these three activities to enhance the efficiency of such a workflow. A multiagent intelligent architecture has been developed incorporating agent communication to perform the tasks automatically. The key contribution of this work is the automation of data analysis and experiment design and also the ability to generate scripts to run the experiments automatically, allowing the elimination of human involvement. A first-generation prototype has been established and demonstrated through lysozyme precipitation process design. All procedures in the case study have been fully automated through an intelligent automation platform. The realization of automated data analysis and experiment design, and automated script programming for experimental procedures has the potential to increase lab productivity. PMID:24088579

  18. Ergatis: a web interface and scalable software system for bioinformatics workflows

    PubMed Central

    Orvis, Joshua; Crabtree, Jonathan; Galens, Kevin; Gussman, Aaron; Inman, Jason M.; Lee, Eduardo; Nampally, Sreenath; Riley, David; Sundaram, Jaideep P.; Felix, Victor; Whitty, Brett; Mahurkar, Anup; Wortman, Jennifer; White, Owen; Angiuoli, Samuel V.

    2010-01-01

    Motivation: The growth of sequence data has been accompanied by an increasing need to analyze data on distributed computer clusters. The use of these systems for routine analysis requires scalable and robust software for data management of large datasets. Software is also needed to simplify data management and make large-scale bioinformatics analysis accessible and reproducible to a wide class of target users. Results: We have developed a workflow management system named Ergatis that enables users to build, execute and monitor pipelines for computational analysis of genomics data. Ergatis contains preconfigured components and template pipelines for a number of common bioinformatics tasks such as prokaryotic genome annotation and genome comparisons. Outputs from many of these components can be loaded into a Chado relational database. Ergatis was designed to be accessible to a broad class of users and provides a user friendly, web-based interface. Ergatis supports high-throughput batch processing on distributed compute clusters and has been used for data management in a number of genome annotation and comparative genomics projects. Availability: Ergatis is an open-source project and is freely available at http://ergatis.sourceforge.net Contact: jorvis@users.sourceforge.net PMID:20413634

  19. PeptideDepot: flexible relational database for visual analysis of quantitative proteomic data and integration of existing protein information.

    PubMed

    Yu, Kebing; Salomon, Arthur R

    2009-12-01

    Recently, dramatic progress has been achieved in expanding the sensitivity, resolution, mass accuracy, and scan rate of mass spectrometers able to fragment and identify peptides through MS/MS. Unfortunately, this enhanced ability to acquire proteomic data has not been accompanied by a concomitant increase in the availability of flexible tools allowing users to rapidly assimilate, explore, and analyze this data and adapt to various experimental workflows with minimal user intervention. Here we fill this critical gap by providing a flexible relational database called PeptideDepot for organization of expansive proteomic data sets, collation of proteomic data with available protein information resources, and visual comparison of multiple quantitative proteomic experiments. Our software design, built upon the synergistic combination of a MySQL database for safe warehousing of proteomic data with a FileMaker-driven graphical user interface for flexible adaptation to diverse workflows, enables proteomic end-users to directly tailor the presentation of proteomic data to the unique analysis requirements of the individual proteomics lab. PeptideDepot may be deployed as an independent software tool or integrated directly with our high throughput autonomous proteomic pipeline used in the automated acquisition and post-acquisition analysis of proteomic data.

  20. Next-generation sequencing facilitates quantitative analysis of wild-type and Nrl−/− retinal transcriptomes

    PubMed Central

    Brooks, Matthew J.; Rajasimha, Harsha K.; Roger, Jerome E.

    2011-01-01

    Purpose Next-generation sequencing (NGS) has revolutionized systems-based analysis of cellular pathways. The goals of this study are to compare NGS-derived retinal transcriptome profiling (RNA-seq) to microarray and quantitative reverse transcription polymerase chain reaction (qRT–PCR) methods and to evaluate protocols for optimal high-throughput data analysis. Methods Retinal mRNA profiles of 21-day-old wild-type (WT) and neural retina leucine zipper knockout (Nrl−/−) mice were generated by deep sequencing, in triplicate, using Illumina GAIIx. The sequence reads that passed quality filters were analyzed at the transcript isoform level with two methods: Burrows–Wheeler Aligner (BWA) followed by ANOVA, and TopHat followed by Cufflinks. qRT–PCR validation was performed using TaqMan and SYBR Green assays. Results Using an optimized data analysis workflow, we mapped about 30 million sequence reads per sample to the mouse genome (build mm9) and identified 16,014 transcripts in the retinas of WT and Nrl−/− mice with the BWA workflow and 34,115 transcripts with the TopHat workflow. RNA-seq data confirmed stable expression of 25 known housekeeping genes, and 12 of these were validated with qRT–PCR. RNA-seq data had a linear relationship with qRT–PCR for more than four orders of magnitude and a goodness of fit (R²) of 0.8798. Approximately 10% of the transcripts showed differential expression between the WT and Nrl−/− retina, with a fold change ≥1.5 and p value <0.05. Altered expression of 25 genes was confirmed with qRT–PCR, demonstrating the high degree of sensitivity of the RNA-seq method. Hierarchical clustering of differentially expressed genes uncovered several as yet uncharacterized genes that may contribute to retinal function. Data analysis with the BWA and TopHat workflows revealed a significant overlap yet provided complementary insights in transcriptome profiling. Conclusions Our study represents the first detailed analysis of retinal transcriptomes, with biologic replicates, generated by RNA-seq technology. The optimized data analysis workflows reported here should provide a framework for comparative investigations of expression profiles. Our results show that NGS offers a comprehensive and more accurate quantitative and qualitative evaluation of mRNA content within a cell or tissue. We conclude that RNA-seq based transcriptome characterization would expedite genetic network analyses and permit the dissection of complex biologic functions. PMID:22162623
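
    The differential-expression filter reported here (fold change ≥1.5 and p value <0.05 between WT and Nrl−/−) is easy to express once per-transcript values are in hand; the sketch below uses hypothetical triplicate expression matrices and Welch's t-test purely for illustration, rather than the ANOVA and Cufflinks tests used in the study.

```python
import numpy as np
from scipy import stats

def differential_transcripts(wt, ko, fold=1.5, alpha=0.05):
    """Return indices of transcripts with symmetric fold change >= `fold` and
    p < alpha. `wt` and `ko` are (transcripts x replicates) expression matrices."""
    fc = (ko.mean(axis=1) + 1e-9) / (wt.mean(axis=1) + 1e-9)
    _, pvals = stats.ttest_ind(wt, ko, axis=1, equal_var=False)
    keep = (np.maximum(fc, 1.0 / fc) >= fold) & (pvals < alpha)
    return np.where(keep)[0], fc[keep], pvals[keep]

# Hypothetical FPKM-like values: 5 transcripts x 3 replicates per genotype.
rng = np.random.default_rng(5)
wt = rng.normal(100, 5, size=(5, 3))
ko = wt * np.array([[1.0], [2.0], [0.5], [1.1], [3.0]]) + rng.normal(0, 5, size=(5, 3))
idx, fc, p = differential_transcripts(wt, ko)
print(idx, np.round(fc, 2), np.round(p, 4))
```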

  1. Microfluidics for Single-Cell Genetic Analysis

    PubMed Central

    Thompson, A. M.; Paguirigan, A. L.; Kreutz, J. E.; Radich, J. P.; Chiu, D. T.

    2014-01-01

    The ability to correlate single-cell genetic information to cellular phenotypes will provide the kind of detailed insight into human physiology and disease pathways that is not possible to infer from bulk cell analysis. Microfluidic technologies are attractive for single-cell manipulation due to precise handling and low risk of contamination. Additionally, microfluidic single-cell techniques can allow for high-throughput and detailed genetic analyses that increase accuracy and decreases reagent cost compared to bulk techniques. Incorporating these microfluidic platforms into research and clinical laboratory workflows can fill an unmet need in biology, delivering the highly accurate, highly informative data necessary to develop new therapies and monitor patient outcomes. In this perspective, we describe the current and potential future uses of microfluidics at all stages of single-cell genetic analysis, including cell enrichment and capture, single-cell compartmentalization and manipulation, and detection and analyses. PMID:24789374

  2. Localization-based super-resolution imaging meets high-content screening.

    PubMed

    Beghin, Anne; Kechkar, Adel; Butler, Corey; Levet, Florian; Cabillic, Marine; Rossier, Olivier; Giannone, Gregory; Galland, Rémi; Choquet, Daniel; Sibarita, Jean-Baptiste

    2017-12-01

    Single-molecule localization microscopy techniques have proven to be essential tools for quantitatively monitoring biological processes at unprecedented spatial resolution. However, these techniques are very low throughput and are not yet compatible with fully automated, multiparametric cellular assays. This shortcoming is primarily due to the huge amount of data generated during imaging and the lack of software for automation and dedicated data mining. We describe an automated quantitative single-molecule-based super-resolution methodology that operates in standard multiwell plates and uses analysis based on high-content screening and data-mining software. The workflow is compatible with fixed- and live-cell imaging and allows extraction of quantitative data like fluorophore photophysics, protein clustering or dynamic behavior of biomolecules. We demonstrate that the method is compatible with high-content screening using 3D dSTORM and DNA-PAINT based super-resolution microscopy as well as single-particle tracking.

  3. TAMEE: data management and analysis for tissue microarrays.

    PubMed

    Thallinger, Gerhard G; Baumgartner, Kerstin; Pirklbauer, Martin; Uray, Martina; Pauritsch, Elke; Mehes, Gabor; Buck, Charles R; Zatloukal, Kurt; Trajanoski, Zlatko

    2007-03-07

    With the introduction of tissue microarrays (TMAs) researchers can investigate gene and protein expression in tissues on a high-throughput scale. TMAs generate a wealth of data calling for extended, high level data management. Enhanced data analysis and systematic data management are required for traceability and reproducibility of experiments and provision of results in a timely and reliable fashion. Robust and scalable applications have to be utilized, which allow secure data access, manipulation and evaluation for researchers from different laboratories. TAMEE (Tissue Array Management and Evaluation Environment) is a web-based database application for the management and analysis of data resulting from the production and application of TMAs. It facilitates storage of production and experimental parameters, of images generated throughout the TMA workflow, and of results from core evaluation. Database content consistency is achieved using structured classifications of parameters. This allows the extraction of high quality results for subsequent biologically-relevant data analyses. Tissue cores in the images of stained tissue sections are automatically located and extracted and can be evaluated using a set of predefined analysis algorithms. Additional evaluation algorithms can be easily integrated into the application via a plug-in interface. Downstream analysis of results is facilitated via a flexible query generator. We have developed an integrated system tailored to the specific needs of research projects using high density TMAs. It covers the complete workflow of TMA production, experimental use and subsequent analysis. The system is freely available for academic and non-profit institutions from http://genome.tugraz.at/Software/TAMEE.

  4. Computational databases, pathway and cheminformatics tools for tuberculosis drug discovery

    PubMed Central

    Ekins, Sean; Freundlich, Joel S.; Choi, Inhee; Sarker, Malabika; Talcott, Carolyn

    2010-01-01

    We are witnessing the growing menace of both increasing cases of drug-sensitive and drug-resistant Mycobacterium tuberculosis strains and the challenge to produce the first new tuberculosis (TB) drug in well over 40 years. The TB community, having invested in extensive high-throughput screening efforts, is faced with the question of how to optimally leverage this data in order to move from a hit to a lead to a clinical candidate and potentially a new drug. Complementing this approach, yet conducted on a much smaller scale, cheminformatic techniques have been leveraged and are herein reviewed. We suggest these computational approaches should be more optimally integrated in a workflow with experimental approaches to accelerate TB drug discovery. PMID:21129975

  5. Case study: impact of technology investment on lead discovery at Bristol-Myers Squibb, 1998-2006.

    PubMed

    Houston, John G; Banks, Martyn N; Binnie, Alastair; Brenner, Stephen; O'Connell, Jonathan; Petrillo, Edward W

    2008-01-01

    We review strategic approaches taken over an eight-year period at BMS to implement new high-throughput approaches to lead discovery. Investments in compound management infrastructure and chemistry library production capability allowed significant growth in the size, diversity and quality of the BMS compound collection. Screening platforms were upgraded with robust automated technology to support miniaturized assay formats, while workflows and information handling technologies were streamlined for improved performance. These technology changes drove the need for a supporting organization in which critical engineering, informatics and scientific skills were more strongly represented. Taken together, these investments led to significant improvements in speed and productivity as well as a greater impact of screening campaigns on the initiation of new drug discovery programs.

  6. An open-source computational and data resource to analyze digital maps of immunopeptidomes

    PubMed Central

    Caron, Etienne; Espona, Lucia; Kowalewski, Daniel J; Schuster, Heiko; Ternette, Nicola; Alpízar, Adán; Schittenhelm, Ralf B; Ramarathinam, Sri H; Lindestam Arlehamn, Cecilia S; Chiek Koh, Ching; Gillet, Ludovic C; Rabsteyn, Armin; Navarro, Pedro; Kim, Sangtae; Lam, Henry; Sturm, Theo; Marcilla, Miguel; Sette, Alessandro; Campbell, David S; Deutsch, Eric W; Moritz, Robert L; Purcell, Anthony W; Rammensee, Hans-Georg; Stevanovic, Stefan; Aebersold, Ruedi

    2015-01-01

    We present a novel mass spectrometry-based high-throughput workflow and an open-source computational and data resource to reproducibly identify and quantify HLA-associated peptides. Collectively, the resources support the generation of HLA allele-specific peptide assay libraries consisting of consensus fragment ion spectra, and the analysis of quantitative digital maps of HLA peptidomes generated from a range of biological sources by SWATH mass spectrometry (MS). This study represents the first community-based effort to develop a robust platform for the reproducible and quantitative measurement of the entire repertoire of peptides presented by HLA molecules, an essential step towards the design of efficient immunotherapies. DOI: http://dx.doi.org/10.7554/eLife.07661.001 PMID:26154972

  7. Microfluidic-Mass Spectrometry Interfaces for Translational Proteomics.

    PubMed

    Pedde, R Daniel; Li, Huiyan; Borchers, Christoph H; Akbari, Mohsen

    2017-10-01

    Interfacing mass spectrometry (MS) with microfluidic chips (μchip-MS) holds considerable potential to transform a clinician's toolbox, providing translatable methods for the early detection, diagnosis, monitoring, and treatment of noncommunicable diseases by streamlining and integrating laborious sample preparation workflows on high-throughput, user-friendly platforms. Overcoming the limitations of competitive immunoassays - currently the gold standard in clinical proteomics - μchip-MS can provide unprecedented access to complex proteomic assays having high sensitivity and specificity, but without the labor, costs, and complexities associated with conventional MS sample processing. This review surveys recent μchip-MS systems for clinical applications and examines their emerging role in streamlining the development and translation of MS-based proteomic assays by alleviating many of the challenges that currently inhibit widespread clinical adoption. Crown Copyright © 2017. Published by Elsevier Ltd. All rights reserved.

  8. Advancing a High Throughput Glycotope-centric Glycomics Workflow Based on nanoLC-MS2-product Dependent-MS3 Analysis of Permethylated Glycans.

    PubMed

    Hsiao, Cheng-Te; Wang, Po-Wei; Chang, Hua-Chien; Chen, Yen-Ying; Wang, Shui-Hua; Chern, Yijuang; Khoo, Kay-Hooi

    2017-12-01

    The intrinsic nature of glycosylation, namely nontemplate encoded, stepwise elongation and termination with a diverse range of isomeric glyco-epitopes (glycotopes), translates into ambiguity in most cases of mass spectrometry (MS)-based glycomic mapping. It is arguable whether one needs to delineate every single glycomic entity, which may be counterproductive. Instead, one should focus on identifying as many structural features as possible that would collectively define the glycomic characteristics of a cell or tissue, and how these may change in response to self-programmed development, immuno-activation, and malignant transformation. We have been pursuing this line of analytical strategy that homes in on identifying the terminal sulfo-, sialyl, and/or fucosylated glycotopes by comprehensive nanoLC-MS2-product dependent-MS3 analysis of permethylated glycans, in conjunction with development of a data mining computational tool, GlyPick, to enable an automated, high throughput, semi-quantitative glycotope-centric glycomic mapping amenable to even nonexperts. We demonstrate in this work that diagnostic MS2 ions can be relied on to inform the presence of specific glycotopes, whereas their possible isomeric identities can be resolved at the MS3 level. Both MS2 and associated MS3 data can be acquired exhaustively and processed automatically by GlyPick. The high acquisition speed, resolution, and mass accuracy afforded by the Orbitrap Fusion MS system now allow a sensible spectral count and/or summed ion intensity-based glycome-wide glycotope quantification. We report here the technical aspects, reproducibility and optimization of such an analytical approach, which uses the same acidic reverse phase C18 nanoLC conditions fully compatible with proteomic analysis to allow rapid, hassle-free switching. We further show how this workflow is particularly effective when applied to larger, multiply sialylated and fucosylated N-glycans derived from mouse brain. The complexity of their terminal glycotopes, including variants of fucosylated and disialylated type 1 and 2 chains, would otherwise not be adequately delineated by any conventional LC-MS/MS analysis. © 2017 by The American Society for Biochemistry and Molecular Biology, Inc.
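
    The GlyPick pipeline itself is not shown here; the following is a minimal, illustrative sketch (not the authors' code) of the idea behind spectral-count-based glycotope quantification: flag MS2 spectra that contain a diagnostic fragment ion and tally counts per glycotope. The m/z values, tolerance and spectra below are placeholders.

```python
# Toy sketch (not GlyPick): count MS2 spectra that contain a diagnostic fragment
# ion, as a stand-in for spectral-count-based glycotope quantification.
# The spectra, tolerance and diagnostic m/z values are illustrative placeholders.

def contains_fragment(peaks, target_mz, tol_ppm=20.0):
    """Return True if any (mz, intensity) peak matches target_mz within tol_ppm."""
    tol = target_mz * tol_ppm / 1e6
    return any(abs(mz - target_mz) <= tol for mz, _ in peaks)

def spectral_counts(ms2_spectra, diagnostic_ions):
    """ms2_spectra: list of peak lists; diagnostic_ions: {glycotope name: fragment m/z}."""
    counts = {name: 0 for name in diagnostic_ions}
    for peaks in ms2_spectra:
        for name, mz in diagnostic_ions.items():
            if contains_fragment(peaks, mz):
                counts[name] += 1
    return counts

# Made-up spectra; 204.087 is a commonly cited HexNAc oxonium-type m/z, used here
# only as an example of a diagnostic ion.
spectra = [[(204.087, 1e5), (512.2, 3e4)], [(803.3, 2e4)]]
print(spectral_counts(spectra, {"HexNAc-related": 204.087}))
```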

  9. A fluorescence high throughput screening method for the detection of reactive electrophiles as potential skin sensitizers.

    PubMed

    Avonto, Cristina; Chittiboyina, Amar G; Rua, Diego; Khan, Ikhlas A

    2015-12-01

    Skin sensitization is an important toxicological end-point in the risk assessment of chemical allergens. Because of the complexity of the biological mechanisms associated with skin sensitization, integrated approaches combining different chemical, biological and in silico methods are recommended to replace conventional animal tests. Chemical methods are intended to characterize the potential of a sensitizer to induce the early molecular initiating events. The presence of an electrophilic mechanistic domain is considered one of the essential chemical features to covalently bind to the biological target and induce further haptenation processes. Current in chemico assays rely on the quantification of unreacted model nucleophiles after incubation with the candidate sensitizer. In the current study, a new fluorescence-based method, the 'HTS-DCYA assay', is proposed. The assay aims at the identification of reactive electrophiles based on their chemical reactivity toward a model fluorescent thiol. The reaction workflow enabled the development of a High Throughput Screening (HTS) method to directly quantify the reaction adducts. The reaction conditions were optimized to minimize solubility issues and oxidative side reactions (common issues with existing methods) and to increase the throughput of the assay while reducing reaction time. Thirty-six chemicals previously classified with LLNA, DPRA or KeratinoSens™ were tested as a proof of concept. Preliminary results gave an estimated 82% accuracy, 78% sensitivity, and 90% specificity, comparable to other in chemico methods such as Cys-DPRA. In addition to validated chemicals, six natural products were analyzed and a prediction of their sensitization potential is presented for the first time. Copyright © 2015 Elsevier Inc. All rights reserved.
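
    For reference, the reported performance figures follow directly from a confusion matrix of binary assay calls against the reference classifications. A minimal sketch with hypothetical counts (chosen only to be of the same order as the reported values) is shown below.

```python
# Minimal sketch: accuracy, sensitivity and specificity from binary predicted vs.
# reference sensitizer classifications. The counts are hypothetical and only
# chosen to be of the same order as the figures reported in the abstract.

def confusion_counts(reference, predicted):
    tp = sum(r and p for r, p in zip(reference, predicted))
    tn = sum((not r) and (not p) for r, p in zip(reference, predicted))
    fp = sum((not r) and p for r, p in zip(reference, predicted))
    fn = sum(r and (not p) for r, p in zip(reference, predicted))
    return tp, tn, fp, fn

reference = [True] * 20 + [False] * 10                        # reference calls (hypothetical split)
predicted = [True] * 16 + [False] * 4 + [False] * 9 + [True]  # assay calls (hypothetical)

tp, tn, fp, fn = confusion_counts(reference, predicted)
print("accuracy   ", (tp + tn) / (tp + tn + fp + fn))
print("sensitivity", tp / (tp + fn))
print("specificity", tn / (tn + fp))
```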

  10. CloVR: a virtual machine for automated and portable sequence analysis from the desktop using cloud computing.

    PubMed

    Angiuoli, Samuel V; Matalka, Malcolm; Gussman, Aaron; Galens, Kevin; Vangala, Mahesh; Riley, David R; Arze, Cesar; White, James R; White, Owen; Fricke, W Florian

    2011-08-30

    Next-generation sequencing technologies have decentralized sequence acquisition, increasing the demand for new bioinformatics tools that are easy to use, portable across multiple platforms, and scalable for high-throughput applications. Cloud computing platforms provide on-demand access to computing infrastructure over the Internet and can be used in combination with custom-built virtual machines that distribute pre-packaged, pre-configured software. We describe the Cloud Virtual Resource, CloVR, a new desktop application for push-button automated sequence analysis that can utilize cloud computing resources. CloVR is implemented as a single portable virtual machine (VM) that provides several automated analysis pipelines for microbial genomics, including 16S, whole genome and metagenome sequence analysis. The CloVR VM runs on a personal computer, utilizes local computer resources and requires minimal installation, addressing key challenges in deploying bioinformatics workflows. In addition, CloVR supports use of remote cloud computing resources to improve performance for large-scale sequence processing. In a case study, we demonstrate the use of CloVR to automatically process next-generation sequencing data on multiple cloud computing platforms. The CloVR VM and associated architecture lower the barrier of entry for utilizing complex analysis protocols on both local single- and multi-core computers and cloud systems for high throughput data processing.

  11. Progress on the FabrIc for Frontier Experiments project at Fermilab

    DOE PAGES

    Box, Dennis; Boyd, Joseph; Dykstra, Dave; ...

    2015-12-23

    The FabrIc for Frontier Experiments (FIFE) project is an ambitious, major-impact initiative within the Fermilab Scientific Computing Division designed to lead the computing model for Fermilab experiments. FIFE is a collaborative effort between experimenters and computing professionals to design and develop integrated computing models for experiments of varying needs and infrastructure. The major focus of the FIFE project is the development, deployment, and integration of Open Science Grid solutions for high throughput computing, data management, database access and collaboration within an experiment. To accomplish this goal, FIFE has developed workflows that utilize Open Science Grid sites along with dedicated and commercial cloud resources. The FIFE project has made significant progress integrating into experiment computing operations several services including new job submission services, software and reference data distribution through CVMFS repositories, a flexible data transfer client, and access to opportunistic resources on the Open Science Grid. The progress with current experiments and plans for expansion with additional projects will be discussed. FIFE has taken a leading role in the definition of the computing model for Fermilab experiments, aided in the design of computing for experiments beyond Fermilab, and will continue to define the future direction of high throughput computing for future physics experiments worldwide.

  12. The FIFE Project at Fermilab

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Box, D.; Boyd, J.; Di Benedetto, V.

    2016-01-01

    The FabrIc for Frontier Experiments (FIFE) project is an initiative within the Fermilab Scientific Computing Division designed to steer the computing model for non-LHC Fermilab experiments across multiple physics areas. FIFE is a collaborative effort between experimenters and computing professionals to design and develop integrated computing models for experiments of varying size, needs, and infrastructure. The major focus of the FIFE project is the development, deployment, and integration of solutions for high throughput computing, data management, database access and collaboration management within an experiment. To accomplish this goal, FIFE has developed workflows that utilize Open Science Grid compute sites along with dedicated and commercial cloud resources. The FIFE project has made significant progress integrating into experiment computing operations several services including a common job submission service, software and reference data distribution through CVMFS repositories, flexible and robust data transfer clients, and access to opportunistic resources on the Open Science Grid. The progress with current experiments and plans for expansion with additional projects will be discussed. FIFE has taken the leading role in defining the computing model for Fermilab experiments, aided in the design of experiments beyond those hosted at Fermilab, and will continue to define the future direction of high throughput computing for future physics experiments worldwide.

  13. Using CyberShake Workflows to Manage Big Seismic Hazard Data on Large-Scale Open-Science HPC Resources

    NASA Astrophysics Data System (ADS)

    Callaghan, S.; Maechling, P. J.; Juve, G.; Vahi, K.; Deelman, E.; Jordan, T. H.

    2015-12-01

    The CyberShake computational platform, developed by the Southern California Earthquake Center (SCEC), is an integrated collection of scientific software and middleware that performs 3D physics-based probabilistic seismic hazard analysis (PSHA) for Southern California. CyberShake integrates large-scale and high-throughput research codes to produce probabilistic seismic hazard curves for individual locations of interest and hazard maps for an entire region. A recent CyberShake calculation produced about 500,000 two-component seismograms for each of 336 locations, resulting in over 300 million synthetic seismograms in a Los Angeles-area probabilistic seismic hazard model. CyberShake calculations require a series of scientific software programs. Early computational stages produce data used as inputs by later stages, so we describe CyberShake calculations using a workflow definition language. Scientific workflow tools automate and manage the input and output data and enable remote job execution on large-scale HPC systems. To satisfy the requests of broad impact users of CyberShake data, such as seismologists, utility companies, and building code engineers, we successfully completed CyberShake Study 15.4 in April and May 2015, calculating a 1 Hz urban seismic hazard map for Los Angeles. We distributed the calculation between the NSF Track 1 system NCSA Blue Waters, the DOE Leadership-class system OLCF Titan, and USC's Center for High Performance Computing. This study ran for over 5 weeks, burning about 1.1 million node-hours and producing over half a petabyte of data. The CyberShake Study 15.4 results doubled the maximum simulated seismic frequency from 0.5 Hz to 1.0 Hz as compared to previous studies, representing a factor of 16 increase in computational complexity. We will describe how our workflow tools supported splitting the calculation across multiple systems. We will explain how we modified CyberShake software components, including GPU implementations and migrating from file-based communication to MPI messaging, to greatly reduce the I/O demands and node-hour requirements of CyberShake. We will also present performance metrics from CyberShake Study 15.4, and discuss challenges that producers of Big Data on open-science HPC resources face moving forward.
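
    The abstract describes computational stages whose outputs feed later stages, managed with workflow tools. As a generic, hedged illustration only (not the actual CyberShake or Pegasus workflow definition), such dependencies can be expressed as a directed acyclic graph and executed in topological order; the stage names below are illustrative.

```python
# Generic toy illustration of a workflow DAG: later stages consume the outputs
# of earlier ones, so an execution order follows from a topological sort.
# Stage names are illustrative only, not the actual CyberShake job names.
from graphlib import TopologicalSorter

dag = {
    "velocity_mesh": set(),
    "strain_green_tensors": {"velocity_mesh"},
    "synthetic_seismograms": {"strain_green_tensors"},
    "hazard_curve": {"synthetic_seismograms"},
}

order = list(TopologicalSorter(dag).static_order())
print(order)  # stages in a dependency-respecting execution order
```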

  14. MicroGen: a MIAME compliant web system for microarray experiment information and workflow management.

    PubMed

    Burgarella, Sarah; Cattaneo, Dario; Pinciroli, Francesco; Masseroli, Marco

    2005-12-01

    Improvements in bio-nano-technologies and biomolecular techniques have led to increasing production of high-throughput experimental data. Spotted cDNA microarray is one of the most widely used technologies, used in single research laboratories and in biotechnology service facilities. Although they are routinely performed, spotted microarray experiments are complex procedures entailing several experimental steps and actors with different technical skills and roles. During an experiment, the involved actors, who may be located at different sites, need to access and share specific experiment information according to their roles. Furthermore, complete information describing all experimental steps must be collected in an orderly fashion to allow subsequent correct interpretation of experimental results. We developed MicroGen, a web system for managing information and workflow in the production pipeline of spotted microarray experiments. It consists of a core multi-database system able to store all data completely characterizing different spotted microarray experiments according to the Minimum Information About Microarray Experiments (MIAME) standard, and of an intuitive and user-friendly web interface able to support the collaborative work required among multidisciplinary actors and roles involved in spotted microarray experiment production. MicroGen supports six types of user roles: the researcher who designs and requests the experiment, the spotting operator, the hybridisation operator, the image processing operator, the system administrator, and the generic public user who can access the unrestricted part of the system to get information about MicroGen services. MicroGen represents a MIAME-compliant information system that enables managing workflow and supporting collaborative work in spotted microarray experiment production.

  15. Observing Clonal Dynamics across Spatiotemporal Axes: A Prelude to Quantitative Fitness Models for Cancer.

    PubMed

    McPherson, Andrew W; Chan, Fong Chun; Shah, Sohrab P

    2018-02-01

    The ability to accurately model evolutionary dynamics in cancer would allow for prediction of progression and response to therapy. As a prelude to quantitative understanding of evolutionary dynamics, researchers must gather observations of in vivo tumor evolution. High-throughput genome sequencing now provides the means to profile the mutational content of evolving tumor clones from patient biopsies. Together with the development of models of tumor evolution, reconstructing evolutionary histories of individual tumors generates hypotheses about the dynamics of evolution that produced the observed clones. In this review, we provide a brief overview of the concepts involved in predicting evolutionary histories, and provide a workflow based on bulk and targeted-genome sequencing. We then describe the application of this workflow to time series data obtained for transformed and progressed follicular lymphomas (FL), and contrast the observed evolutionary dynamics between these two subtypes. We next describe results from a spatial sampling study of high-grade serous (HGS) ovarian cancer, propose mechanisms of disease spread based on the observed clonal mixtures, and provide examples of diversification through subclonal acquisition of driver mutations and convergent evolution. Finally, we state implications of the techniques discussed in this review as a necessary but insufficient step on the path to predictive modelling of disease dynamics. Copyright © 2018 Cold Spring Harbor Laboratory Press; all rights reserved.

  16. Improved ligand geometries in crystallographic refinement using AFITT in PHENIX

    DOE PAGES

    Janowski, Pawel A.; Moriarty, Nigel W.; Kelley, Brian P.; ...

    2016-08-31

    Modern crystal structure refinement programs rely on geometry restraints to overcome the challenge of a low data-to-parameter ratio. While the classical Engh and Huber restraints work well for standard amino-acid residues, the chemical complexity of small-molecule ligands presents a particular challenge. Most current approaches either limit ligand restraints to those that can be readily described in the Crystallographic Information File (CIF) format, thus sacrificing chemical flexibility and energetic accuracy, or they employ protocols that substantially lengthen the refinement time, potentially hindering rapid automated refinement workflows. PHENIX–AFITT refinement uses a full molecular-mechanics force field for user-selected small-molecule ligands during refinement, eliminating the potentially difficult problem of finding or generating high-quality geometry restraints. It is fully integrated with a standard refinement protocol and requires practically no additional steps from the user, making it ideal for high-throughput workflows. PHENIX–AFITT refinements also handle multiple ligands in a single model, alternate conformations and covalently bound ligands. Here, the results of combining AFITT and the PHENIX software suite on a data set of 189 protein–ligand PDB structures are presented. Refinements using PHENIX–AFITT significantly reduce ligand conformational energy and lead to improved geometries without detriment to the fit to the experimental data. Finally, for the data presented, PHENIX–AFITT refinements result in more chemically accurate models for small-molecule ligands.

  17. ACToR Chemical Structure processing using Open Source ...

    EPA Pesticide Factsheets

    ACToR (Aggregated Computational Toxicology Resource) is a centralized database repository developed by the National Center for Computational Toxicology (NCCT) at the U.S. Environmental Protection Agency (EPA). Free and open source tools were used to compile toxicity data from over 1,950 public sources. ACToR contains chemical structure information and toxicological data for over 558,000 unique chemicals. The database primarily includes data from NCCT research programs, in vivo toxicity data from ToxRef, human exposure data from ExpoCast, high-throughput screening data from ToxCast and high quality chemical structure information from the EPA DSSTox program. The DSSTox database is a chemical structure inventory for the NCCT programs and currently has about 16,000 unique structures. Included are also data from PubChem, ChemSpider, USDA, FDA, NIH and several other public data sources. ACToR has been a resource to various international and national research groups. Most of our recent efforts on ACToR are focused on improving the structural identifiers and Physico-Chemical properties of the chemicals in the database. Organizing this huge collection of data and improving the chemical structure quality of the database has posed some major challenges. Workflows have been developed to process structures, calculate chemical properties and identify relationships between CAS numbers. The Structure processing workflow integrates web services (PubChem and NIH NCI Cactus) to d
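
    As a hedged illustration of the kind of web-service call such a structure-processing workflow might make (not the ACToR code itself), the sketch below queries PubChem's public PUG REST interface for a SMILES string and InChIKey; the endpoint layout reflects the published PUG REST conventions and should be verified against the current service, and the CAS number is only an example identifier.

```python
# Minimal sketch: look up structure identifiers (SMILES, InChIKey) for a name or
# CAS number via PubChem PUG REST. Endpoint layout follows the public PUG REST
# documentation as understood here; adjust if the service has changed.
import json
import urllib.parse
import urllib.request

def pubchem_structure(identifier):
    url = (
        "https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/name/"
        f"{urllib.parse.quote(identifier)}/property/CanonicalSMILES,InChIKey/JSON"
    )
    with urllib.request.urlopen(url, timeout=30) as resp:
        data = json.load(resp)
    return data["PropertyTable"]["Properties"][0]

print(pubchem_structure("50-00-0"))  # formaldehyde, identified here by CAS number
```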

  18. Recent development in software and automation tools for high-throughput discovery bioanalysis.

    PubMed

    Shou, Wilson Z; Zhang, Jun

    2012-05-01

    Bioanalysis with LC-MS/MS has been established as the method of choice for quantitative determination of drug candidates in biological matrices in drug discovery and development. The LC-MS/MS bioanalytical support for drug discovery, especially for early discovery, often requires high-throughput (HT) analysis of large numbers of samples (hundreds to thousands per day) generated from many structurally diverse compounds (tens to hundreds per day) with a very quick turnaround time, in order to provide important activity and liability data to move discovery projects forward. Another important consideration for discovery bioanalysis is its fit-for-purpose quality requirement depending on the particular experiments being conducted at this stage, and it is usually not as stringent as those required in bioanalysis supporting drug development. These aforementioned attributes of HT discovery bioanalysis made it an ideal candidate for using software and automation tools to eliminate manual steps, remove bottlenecks, improve efficiency and reduce turnaround time while maintaining adequate quality. In this article we will review various recent developments that facilitate automation of individual bioanalytical procedures, such as sample preparation, MS/MS method development, sample analysis and data review, as well as fully integrated software tools that manage the entire bioanalytical workflow in HT discovery bioanalysis. In addition, software tools supporting the emerging high-resolution accurate MS bioanalytical approach are also discussed.

  19. Global connectivity of hub residues in Oncoprotein structures encodes genetic factors dictating personalized drug response to targeted Cancer therapy

    PubMed Central

    Soundararajan, Venky; Aravamudan, Murali

    2014-01-01

    The efficacy and mechanisms of therapeutic action are largely described by atomic bonds and interactions local to drug binding sites. Here we introduce global connectivity analysis as a high-throughput computational assay of therapeutic action – inspired by the Google PageRank algorithm that unearths the most "globally connected" websites from the information-dense world wide web (WWW). We execute short timescale (30 ps) molecular dynamics simulations with high sampling frequency (0.01 ps) to identify amino acid residue hubs whose global connectivity dynamics are characteristic of the ligand or mutation associated with the target protein. We find that unexpected allosteric hubs – up to 20Å from the ATP binding site, but within 5Å of the phosphorylation site – encode the Gibbs free energy of inhibition (ΔG_inhibition) for select protein kinase-targeted cancer therapeutics. We further find that clinically relevant somatic cancer mutations implicated in both drug resistance and personalized drug sensitivity can be predicted in a high-throughput fashion. Our results establish global connectivity analysis as a potent assay of protein functional modulation. This sets the stage for unearthing disease-causal exome mutations and motivates forecasting of clinical drug response on a patient-by-patient basis. We suggest incorporation of structure-guided genetic inference assays into pharmaceutical and healthcare oncology workflows. PMID:25465236
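
    The authors compute global connectivity from molecular dynamics trajectories; the sketch below illustrates only the PageRank-style ranking step on a small, hand-made residue contact graph (residue names and weights are hypothetical), assuming the networkx package.

```python
# Illustrative only: rank residues by a PageRank-style "global connectivity"
# score on a small, hand-made residue contact graph. The contacts stand in for
# weights that would be derived from molecular dynamics frames.
import networkx as nx

contacts = [                 # (residue_i, residue_j, contact weight) -- hypothetical
    ("K72", "E91", 0.9),
    ("E91", "D166", 0.7),
    ("D166", "T197", 0.8),   # T197 plays the "phosphorylation site" role in this toy graph
    ("K72", "T197", 0.2),
]

G = nx.Graph()
G.add_weighted_edges_from(contacts)

ranks = nx.pagerank(G, weight="weight")
for residue, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(residue, round(score, 3))
```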

  20. End-to-end workflow for finite element analysis of tumor treating fields in glioblastomas

    NASA Astrophysics Data System (ADS)

    Timmons, Joshua J.; Lok, Edwin; San, Pyay; Bui, Kevin; Wong, Eric T.

    2017-11-01

    Tumor Treating Fields (TTFields) therapy is an approved modality of treatment for glioblastoma. Patient anatomy-based finite element analysis (FEA) has the potential to reveal not only how these fields affect tumor control but also how to improve efficacy. While the automated tools for segmentation speed up the generation of FEA models, multi-step manual corrections are required, including removal of disconnected voxels, incorporation of unsegmented structures and the addition of 36 electrodes plus gel layers matching the TTFields transducers. Existing approaches are also not scalable for the high throughput analysis of large patient volumes. A semi-automated workflow was developed to prepare FEA models for TTFields mapping in the human brain. Magnetic resonance imaging (MRI) pre-processing, segmentation, electrode and gel placement, and post-processing were all automated. The material properties of each tissue were applied to their corresponding mask in silico using COMSOL Multiphysics (COMSOL, Burlington, MA, USA). The fidelity of the segmentations with and without post-processing was compared against the full semi-automated segmentation workflow approach using Dice coefficient analysis. The average relative differences for the electric fields generated by COMSOL were calculated in addition to observed differences in electric field-volume histograms. Furthermore, the mesh file formats in MPHTXT and NASTRAN were also compared using the differences in the electric field-volume histogram. The Dice coefficient was lower for auto-segmentation without post-processing than with it, indicating that post-processing converges on the manually corrected model. Only a marginal relative difference was found between electric field maps from models with and without manual correction, and a clear advantage of using the NASTRAN mesh file format was identified. The software and workflow outlined in this article may be used to accelerate the investigation of TTFields in glioblastoma patients by facilitating the creation of FEA models derived from patient MRI datasets.
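
    The Dice coefficient used to compare segmentations is straightforward to compute; a minimal sketch follows, with random binary volumes standing in for the automated and manually corrected masks.

```python
# Minimal sketch of the Dice coefficient between two binary segmentation masks.
# Random arrays stand in for automated vs. manually corrected segmentations.
import numpy as np

def dice(mask_a, mask_b):
    a = mask_a.astype(bool)
    b = mask_b.astype(bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        return 1.0  # both masks empty: define as perfect agreement
    return 2.0 * np.logical_and(a, b).sum() / denom

rng = np.random.default_rng(0)
auto = rng.random((64, 64, 64)) > 0.5
manual = auto.copy()
manual[:8] = ~manual[:8]          # perturb a slab to mimic manual corrections
print(round(dice(auto, manual), 3))
```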

  1. Integrative workflows for metagenomic analysis

    PubMed Central

    Ladoukakis, Efthymios; Kolisis, Fragiskos N.; Chatziioannou, Aristotelis A.

    2014-01-01

    The rapid evolution of sequencing technologies, described by the term Next Generation Sequencing (NGS), has revolutionized metagenomic analysis. These technologies combine high-throughput analytical protocols with delicate measuring techniques to discover, properly assemble and map allelic sequences to the correct genomes, achieving particularly high yields at only a fraction of the cost of traditional processes (i.e., Sanger). From a bioinformatic perspective, each sequencing experiment generates many gigabytes of data, making data management, and even storage, critical bottlenecks in the overall analytical endeavor. The complexity is further aggravated by the versatility of the available processing steps, represented by the numerous bioinformatic tools that are essential for each analytical task in order to fully unveil the genetic content of a metagenomic dataset. These disparate tasks range from simple, yet non-trivial, quality control of raw data to exceptionally complex protein annotation procedures, requiring a high level of expertise for their proper application and for the neat implementation of the whole workflow. Furthermore, a bioinformatic analysis of this scale requires substantial computational resources, making the use of cloud computing infrastructures the only realistic solution. In this review article we discuss the different integrative bioinformatic solutions available, which address the aforementioned issues, by critically assessing the available automated pipelines for data management, quality control, and annotation of metagenomic data across the major sequencing technologies and applications. PMID:25478562

  2. High-throughput methods for characterizing the mechanical properties of coatings

    NASA Astrophysics Data System (ADS)

    Siripirom, Chavanin

    The characterization of mechanical properties in a combinatorial and high-throughput workflow has been a bottleneck that reduced the speed of the materials development process. High-throughput characterization of the mechanical properties was applied in this research in order to reduce the amount of sample handling and to accelerate the output. A puncture tester was designed and built to evaluate the toughness of materials using an innovative template design coupled with automation. The test is in the form of a circular free-film indentation. A single template contains 12 samples which are tested in a rapid serial approach. Next, the operational principles of a novel parallel dynamic mechanical-thermal analysis instrument were analyzed in detail for potential sources of errors. The test uses a model of a circular bilayer fixed-edge plate deformation. A total of 96 samples can be analyzed simultaneously, which provides a tremendous increase in efficiency compared with a conventional dynamic test. The modulus values determined by the system showed considerable variation; sources of error were identified and improvements to the system were made. A finite element analysis was used to analyze the accuracy given by the closed-form solution with respect to testing geometries, such as thicknesses of the samples. Good control of the sample thickness proved crucial to the accuracy and precision of the output. An attempt was then made to correlate the high-throughput experiments with conventional coating testing methods. Automated nanoindentation in dynamic mode was found to provide information on the near-surface modulus and could potentially correlate with the pendulum hardness test using the loss tangent component. Lastly, surface characterization of stratified siloxane-polyurethane coatings was carried out with X-ray photoelectron spectroscopy, Rutherford backscattering spectroscopy, transmission electron microscopy, and nanoindentation. The siloxane component segregates to the surface during curing. The distribution of siloxane as a function of thickness into the sample showed differences depending on the formulation parameters. The coatings with higher siloxane content near the surface were those found to perform well in field tests.

  3. A web-based rapid assessment tool for production publishing solutions

    NASA Astrophysics Data System (ADS)

    Sun, Tong

    2010-02-01

    Solution assessment is a critical first step in understanding and measuring the business process efficiency enabled by an integrated solution package. However, assessing the effectiveness of any solution is usually a very expensive and time-consuming task that requires substantial domain knowledge, collecting and understanding the specific customer operational context, defining validation scenarios and estimating the expected performance and operational cost. This paper presents an intelligent web-based tool that can rapidly assess any given solution package for production publishing workflows via a simulation engine and create a report for various estimated performance metrics (e.g. throughput, turnaround time, resource utilization) and operational cost. By integrating the digital publishing workflow ontology and an activity-based costing model with a Petri-net based workflow simulation engine, this web-based tool allows users to quickly evaluate any potential digital publishing solutions side-by-side within their desired operational contexts, and provides a low-cost and rapid assessment for organizations before committing any purchase. This tool also benefits solution providers by shortening sales cycles, establishing trustworthy customer relationships and supplementing professional assessment services with a proven quantitative simulation and estimation technology.

  4. Leveraging advances in biology to design biomaterials

    NASA Astrophysics Data System (ADS)

    Darnell, Max; Mooney, David J.

    2017-12-01

    Biomaterials have dramatically increased in functionality and complexity, allowing unprecedented control over the cells that interact with them. From these engineering advances arises the prospect of improved biomaterial-based therapies, yet practical constraints favour simplicity. Tools from the biology community are enabling high-resolution and high-throughput bioassays that, if incorporated into a biomaterial design framework, could help achieve unprecedented functionality while minimizing the complexity of designs by identifying the most important material parameters and biological outputs. However, to avoid data explosions and to effectively match the information content of an assay with the goal of the experiment, material screens and bioassays must be arranged in specific ways. By borrowing methods to design experiments and workflows from the bioprocess engineering community, we outline a framework for the incorporation of next-generation bioassays into biomaterials design to effectively optimize function while minimizing complexity. This framework can inspire biomaterials designs that maximize functionality and translatability.

  5. OpenMS: a flexible open-source software platform for mass spectrometry data analysis.

    PubMed

    Röst, Hannes L; Sachsenberg, Timo; Aiche, Stephan; Bielow, Chris; Weisser, Hendrik; Aicheler, Fabian; Andreotti, Sandro; Ehrlich, Hans-Christian; Gutenbrunner, Petra; Kenar, Erhan; Liang, Xiao; Nahnsen, Sven; Nilse, Lars; Pfeuffer, Julianus; Rosenberger, George; Rurik, Marc; Schmitt, Uwe; Veit, Johannes; Walzer, Mathias; Wojnar, David; Wolski, Witold E; Schilling, Oliver; Choudhary, Jyoti S; Malmström, Lars; Aebersold, Ruedi; Reinert, Knut; Kohlbacher, Oliver

    2016-08-30

    High-resolution mass spectrometry (MS) has become an important tool in the life sciences, contributing to the diagnosis and understanding of human diseases, elucidating biomolecular structural information and characterizing cellular signaling networks. However, the rapid growth in the volume and complexity of MS data makes transparent, accurate and reproducible analysis difficult. We present OpenMS 2.0 (http://www.openms.de), a robust, open-source, cross-platform software specifically designed for the flexible and reproducible analysis of high-throughput MS data. The extensible OpenMS software implements common mass spectrometric data processing tasks through a well-defined application programming interface in C++ and Python and through standardized open data formats. OpenMS additionally provides a set of 185 tools and ready-made workflows for common mass spectrometric data processing tasks, which enable users to perform complex quantitative mass spectrometric analyses with ease.
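
    OpenMS also ships Python bindings (pyOpenMS); a minimal sketch of loading an mzML file and summarizing its spectra is shown below, assuming pyOpenMS is installed and a local file named sample.mzML exists.

```python
# Minimal pyOpenMS sketch: load an mzML file and summarize its spectra.
# Assumes the pyOpenMS bindings are installed and a local file "sample.mzML".
from pyopenms import MSExperiment, MzMLFile

exp = MSExperiment()
MzMLFile().load("sample.mzML", exp)

# Count MS1 spectra.
ms1_count = 0
for i in range(exp.getNrSpectra()):
    if exp.getSpectrum(i).getMSLevel() == 1:
        ms1_count += 1
print("spectra:", exp.getNrSpectra(), "MS1 spectra:", ms1_count)

# Peaks of the first spectrum as parallel m/z and intensity arrays.
mz, intensity = exp.getSpectrum(0).get_peaks()
print("peaks in first spectrum:", len(mz))
```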

  6. ATAQS: A computational software tool for high throughput transition optimization and validation for selected reaction monitoring mass spectrometry

    PubMed Central

    2011-01-01

    Background Since its inception, proteomics has essentially operated in a discovery mode with the goal of identifying and quantifying the maximal number of proteins in a sample. Increasingly, proteomic measurements are also supporting hypothesis-driven studies, in which a predetermined set of proteins is consistently detected and quantified in multiple samples. Selected reaction monitoring (SRM) is a targeted mass spectrometric technique that supports the detection and quantification of specific proteins in complex samples at high sensitivity and reproducibility. Here, we describe ATAQS, an integrated software platform that supports all stages of targeted, SRM-based proteomics experiments including target selection, transition optimization and post-acquisition data analysis. This software will significantly facilitate the use of targeted proteomic techniques and contribute to the generation of highly sensitive, reproducible and complete datasets that are particularly critical for the discovery and validation of targets in hypothesis-driven studies in systems biology. Results We introduce a new open source software pipeline, ATAQS (Automated and Targeted Analysis with Quantitative SRM), which consists of a number of modules that collectively support the SRM assay development workflow for targeted proteomic experiments (project management; generation of protein, peptide and transition lists; and validation of peptide detection by SRM). ATAQS provides a flexible pipeline for end-users by allowing the workflow to start or end at any point of the pipeline, and for computational biologists, by enabling the easy extension of Java algorithm classes for their own algorithm plug-in or connection via an external web site. This integrated system supports all steps in an SRM-based experiment and provides a user-friendly GUI that can be run on any operating system that allows the installation of the Mozilla Firefox web browser. Conclusions Targeted proteomics via SRM is a powerful new technique that enables the reproducible and accurate identification and quantification of sets of proteins of interest. ATAQS is the first open-source software that supports all steps of the targeted proteomics workflow. ATAQS also provides software API (Application Program Interface) documentation that enables the addition of new algorithms to each of the workflow steps. The software, installation guide and sample dataset can be found at http://tools.proteomecenter.org/ATAQS/ATAQS.html PMID:21414234

  7. A Domain Analysis Model for eIRB Systems: Addressing the Weak Link in Clinical Research Informatics

    PubMed Central

    He, Shan; Narus, Scott P.; Facelli, Julio C.; Lau, Lee Min; Botkin, Jefferey R.; Hurdle, John F.

    2014-01-01

    Institutional Review Boards (IRBs) are a critical component of clinical research and can become a significant bottleneck due to the dramatic increase, in both volume and complexity of clinical research. Despite the interest in developing clinical research informatics (CRI) systems and supporting data standards to increase clinical research efficiency and interoperability, informatics research in the IRB domain has not attracted much attention in the scientific community. The lack of standardized and structured application forms across different IRBs causes inefficient and inconsistent proposal reviews and cumbersome workflows. These issues are even more prominent in multi-institutional clinical research that is rapidly becoming the norm. This paper proposes and evaluates a domain analysis model for electronic IRB (eIRB) systems, paving the way for streamlined clinical research workflow via integration with other CRI systems and improved IRB application throughput via computer-assisted decision support. PMID:24929181

  8. Processing MALDI mass spectra to improve mass spectral direct tissue analysis

    NASA Astrophysics Data System (ADS)

    Norris, Jeremy L.; Cornett, Dale S.; Mobley, James A.; Andersson, Malin; Seeley, Erin H.; Chaurand, Pierre; Caprioli, Richard M.

    2007-02-01

    Profiling and imaging biological specimens using MALDI mass spectrometry has significant potential to contribute to our understanding and diagnosis of disease. The technique is efficient and high-throughput providing a wealth of data about the biological state of the sample from a very simple and direct experiment. However, in order for these techniques to be put to use for clinical purposes, the approaches used to process and analyze the data must improve. This study examines some of the existing tools to baseline subtract, normalize, align, and remove spectral noise for MALDI data, comparing the advantages of each. A preferred workflow is presented that can be easily implemented for data in ASCII format. The advantages of using such an approach are discussed for both molecular profiling and imaging mass spectrometry.
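
    As a hedged, toy illustration of two of the preprocessing steps discussed (not the specific tools compared in the study), the sketch below applies a crude rolling-minimum baseline estimate and total-ion-current normalization to a synthetic spectrum.

```python
# Toy sketch of two MALDI preprocessing steps: a simple rolling-minimum baseline
# estimate followed by total-ion-current (TIC) normalization. Synthetic data is
# used for illustration; this is not one of the algorithms benchmarked in the study.
import numpy as np

def rolling_min_baseline(intensities, window=50):
    """Crude baseline estimate: minimum intensity within a sliding window."""
    n = len(intensities)
    baseline = np.empty(n)
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        baseline[i] = intensities[lo:hi].min()
    return baseline

def tic_normalize(intensities):
    total = intensities.sum()
    return intensities / total if total > 0 else intensities

mz = np.linspace(1000, 20000, 5000)
raw = np.random.default_rng(1).random(5000) + np.linspace(5, 0, 5000)  # drifting baseline
corrected = raw - rolling_min_baseline(raw)
normalized = tic_normalize(np.clip(corrected, 0, None))
print(round(normalized.sum(), 6))  # 1.0 after TIC normalization
```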

  9. Analyzing microtomography data with Python and the scikit-image library.

    PubMed

    Gouillart, Emmanuelle; Nunez-Iglesias, Juan; van der Walt, Stéfan

    2017-01-01

    The exploration and processing of images is a vital aspect of the scientific workflows of many X-ray imaging modalities. Users require tools that combine interactivity, versatility, and performance. scikit-image is an open-source image processing toolkit for the Python language that supports a large variety of file formats and is compatible with 2D and 3D images. The toolkit exposes a simple programming interface, with thematic modules grouping functions according to their purpose, such as image restoration, segmentation, and measurements. scikit-image users benefit from a rich scientific Python ecosystem that contains many powerful libraries for tasks such as visualization or machine learning. scikit-image combines a gentle learning curve, versatile image processing capabilities, and the scalable performance required for the high-throughput analysis of X-ray imaging data.
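
    A short example of the kind of pipeline described (smoothing, thresholding, labeling and measuring), run on a synthetic image rather than real microtomography data, might look as follows.

```python
# Short scikit-image example: denoise, threshold, label and measure objects in a
# synthetic 2D image standing in for a reconstructed tomography slice.
import numpy as np
from skimage import filters, measure

rng = np.random.default_rng(0)
image = np.zeros((128, 128))
image[30:60, 30:60] = 1.0           # two synthetic "grains"
image[80:110, 70:120] = 1.0
image += 0.2 * rng.standard_normal(image.shape)

smoothed = filters.gaussian(image, sigma=1)
threshold = filters.threshold_otsu(smoothed)
labels = measure.label(smoothed > threshold)

for region in measure.regionprops(labels):
    print(region.label, region.area, region.centroid)
```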

  10. Optimized knock-in of point mutations in zebrafish using CRISPR/Cas9.

    PubMed

    Prykhozhij, Sergey V; Fuller, Charlotte; Steele, Shelby L; Veinotte, Chansey J; Razaghi, Babak; Robitaille, Johane M; McMaster, Christopher R; Shlien, Adam; Malkin, David; Berman, Jason N

    2018-06-14

    We have optimized point mutation knock-ins into zebrafish genomic sites using clustered regularly interspaced palindromic repeats (CRISPR)/Cas9 reagents and single-stranded oligodeoxynucleotides. The efficiency of knock-ins was assessed by a novel application of allele-specific polymerase chain reaction and confirmed by high-throughput sequencing. Anti-sense asymmetric oligo design was found to be the most successful optimization strategy. However, cut site proximity to the mutation and phosphorothioate oligo modifications also greatly improved knock-in efficiency. A previously unrecognized risk of off-target trans knock-ins was identified that we obviated through the development of a workflow for correct knock-in detection. Together these strategies greatly facilitate the study of human genetic diseases in zebrafish, with additional applicability to enhance CRISPR-based approaches in other animal model systems.

  11. Single-cell genomic profiling of acute myeloid leukemia for clinical use: A pilot study

    PubMed Central

    Yan, Benedict; Hu, Yongli; Ban, Kenneth H.K.; Tiang, Zenia; Ng, Christopher; Lee, Joanne; Tan, Wilson; Chiu, Lily; Tan, Tin Wee; Seah, Elaine; Ng, Chin Hin; Chng, Wee-Joo; Foo, Roger

    2017-01-01

    Although bulk high-throughput genomic profiling studies have led to a significant increase in the understanding of cancer biology, there is increasing awareness that bulk profiling approaches do not completely elucidate tumor heterogeneity. Single-cell genomic profiling enables the distinction of tumor heterogeneity, and may improve clinical diagnosis through the identification and characterization of putative subclonal populations. In the present study, the challenges associated with a single-cell genomics profiling workflow for clinical diagnostics were investigated. Single-cell RNA-sequencing (RNA-seq) was performed on 20 cells from an acute myeloid leukemia bone marrow sample. Putative blasts were identified based on their gene expression profiles and principal component analysis was performed to identify outlier cells. Variant calling was performed on the single-cell RNA-seq data. The present pilot study demonstrates a proof of concept for clinical single-cell genomic profiling. The recognized limitations include significant stochastic RNA loss and the relatively low throughput of the current proposed platform. Although the results of the present study are promising, further technological advances and protocol optimization are necessary for single-cell genomic profiling to be clinically viable. PMID:28454300
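
    As a hedged sketch of the outlier-identification step (not the study's exact procedure), the example below runs PCA on a random cells-by-genes matrix standing in for single-cell expression profiles and flags cells far from the centroid; the 3-sigma rule is an illustrative choice.

```python
# Minimal sketch: PCA on a cells-by-genes expression matrix and a simple
# distance-based outlier flag. Random counts stand in for single-cell RNA-seq
# profiles; the 3-sigma cutoff is an illustrative choice, not the study's method.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
expression = rng.poisson(5, size=(20, 500)).astype(float)   # 20 cells x 500 genes
log_expr = np.log1p(expression)

coords = PCA(n_components=2).fit_transform(log_expr)
dist = np.linalg.norm(coords - coords.mean(axis=0), axis=1)
outliers = np.where(dist > dist.mean() + 3 * dist.std())[0]
print("outlier cell indices:", outliers)
```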

  12. A high-throughput Sanger strategy for human mitochondrial genome sequencing

    PubMed Central

    2013-01-01

    Background A population reference database of complete human mitochondrial genome (mtGenome) sequences is needed to enable the use of mitochondrial DNA (mtDNA) coding region data in forensic casework applications. However, the development of entire mtGenome haplotypes to forensic data quality standards is difficult and laborious. A Sanger-based amplification and sequencing strategy that is designed for automated processing, yet routinely produces high quality sequences, is needed to facilitate high-volume production of these mtGenome data sets. Results We developed a robust 8-amplicon Sanger sequencing strategy that regularly produces complete, forensic-quality mtGenome haplotypes in the first pass of data generation. The protocol works equally well on samples representing diverse mtDNA haplogroups and DNA input quantities ranging from 50 pg to 1 ng, and can be applied to specimens of varying DNA quality. The complete workflow was specifically designed for implementation on robotic instrumentation, which increases throughput and reduces both the opportunities for error inherent to manual processing and the cost of generating full mtGenome sequences. Conclusions The described strategy will assist efforts to generate complete mtGenome haplotypes which meet the highest data quality expectations for forensic genetic and other applications. Additionally, high-quality data produced using this protocol can be used to assess mtDNA data developed using newer technologies and chemistries. Further, the amplification strategy can be used to enrich for mtDNA as a first step in sample preparation for targeted next-generation sequencing. PMID:24341507

  13. SWARM : a scientific workflow for supporting Bayesian approaches to improve metabolic models.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Shi, X.; Stevens, R.; Mathematics and Computer Science

    2008-01-01

    With the exponential growth of complete genome sequences, the analysis of these sequences is becoming a powerful approach to build genome-scale metabolic models. These models can be used to study individual molecular components and their relationships, and eventually study cells as systems. However, constructing genome-scale metabolic models manually is time-consuming and labor-intensive. This property of manual model-building process causes the fact that much fewer genome-scale metabolic models are available comparing to hundreds of genome sequences available. To tackle this problem, we design SWARM, a scientific workflow that can be utilized to improve genome-scale metabolic models in high-throughput fashion. SWARM dealsmore » with a range of issues including the integration of data across distributed resources, data format conversions, data update, and data provenance. Putting altogether, SWARM streamlines the whole modeling process that includes extracting data from various resources, deriving training datasets to train a set of predictors and applying Bayesian techniques to assemble the predictors, inferring on the ensemble of predictors to insert missing data, and eventually improving draft metabolic networks automatically. By the enhancement of metabolic model construction, SWARM enables scientists to generate many genome-scale metabolic models within a short period of time and with less effort.« less

  14. PeptideDepot: Flexible Relational Database for Visual Analysis of Quantitative Proteomic Data and Integration of Existing Protein Information

    PubMed Central

    Yu, Kebing; Salomon, Arthur R.

    2010-01-01

    Recently, dramatic progress has been achieved in expanding the sensitivity, resolution, mass accuracy, and scan rate of mass spectrometers able to fragment and identify peptides through tandem mass spectrometry (MS/MS). Unfortunately, this enhanced ability to acquire proteomic data has not been accompanied by a concomitant increase in the availability of flexible tools allowing users to rapidly assimilate, explore, and analyze this data and adapt to a variety of experimental workflows with minimal user intervention. Here we fill this critical gap by providing a flexible relational database called PeptideDepot for organization of expansive proteomic data sets, collation of proteomic data with available protein information resources, and visual comparison of multiple quantitative proteomic experiments. Our software design, built upon the synergistic combination of a MySQL database for safe warehousing of proteomic data with a FileMaker-driven graphical user interface for flexible adaptation to diverse workflows, enables proteomic end-users to directly tailor the presentation of proteomic data to the unique analysis requirements of the individual proteomics lab. PeptideDepot may be deployed as an independent software tool or integrated directly with our High Throughput Autonomous Proteomic Pipeline (HTAPP) used in the automated acquisition and post-acquisition analysis of proteomic data. PMID:19834895

  15. A global sensitivity analysis approach for morphogenesis models.

    PubMed

    Boas, Sonja E M; Navarro Jimenez, Maria I; Merks, Roeland M H; Blom, Joke G

    2015-11-21

    Morphogenesis is a developmental process in which cells organize into shapes and patterns. Complex, non-linear and multi-factorial models with images as output are commonly used to study morphogenesis. It is difficult to understand the relation between the uncertainty in the input and the output of such 'black-box' models, giving rise to the need for sensitivity analysis tools. In this paper, we introduce a workflow for a global sensitivity analysis approach to study the impact of single parameters and the interactions between them on the output of morphogenesis models. To demonstrate the workflow, we used a published, well-studied model of vascular morphogenesis. The parameters of this cellular Potts model (CPM) represent cell properties and behaviors that drive the mechanisms of angiogenic sprouting. The global sensitivity analysis correctly identified the dominant parameters in the model, consistent with previous studies. Additionally, the analysis provided information on the relative impact of single parameters and of interactions between them. This is very relevant because interactions of parameters impede the experimental verification of the predicted effect of single parameters. The parameter interactions, although of low impact, also provided new insights into the mechanisms of in silico sprouting. Finally, the analysis indicated that the model could be reduced by one parameter. We propose global sensitivity analysis as an alternative approach to study the mechanisms of morphogenesis. Comparison of the ranking of the impact of the model parameters to knowledge derived from experimental data and from manipulation experiments can help to falsify models and to find the operative mechanisms in morphogenesis. The workflow is applicable to all 'black-box' models, including high-throughput in vitro models in which output measures are affected by a set of experimental perturbations.
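
    The variance-based (Sobol) step of such a global sensitivity analysis can be sketched with the SALib package on a cheap toy function standing in for the expensive morphogenesis model; the parameter names and bounds below are illustrative, and the toy function replaces the image-derived output measure.

```python
# Generic variance-based (Sobol) sensitivity analysis with SALib on a toy
# function; a stand-in for an expensive morphogenesis model whose scalar output
# measure would normally be computed from simulated images.
import numpy as np
from SALib.sample import saltelli
from SALib.analyze import sobol

problem = {
    "num_vars": 3,
    "names": ["adhesion", "chemotaxis", "proliferation"],  # illustrative names
    "bounds": [[0.0, 1.0]] * 3,
}

X = saltelli.sample(problem, 1024)                 # parameter samples
Y = X[:, 0] + 2.0 * X[:, 1] * X[:, 2]              # toy model output
Si = sobol.analyze(problem, Y)

for name, s1, st in zip(problem["names"], Si["S1"], Si["ST"]):
    print(f"{name}: first-order {s1:.2f}, total {st:.2f}")
```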

  16. A Method for Label-Free, Differential Top-Down Proteomics.

    PubMed

    Ntai, Ioanna; Toby, Timothy K; LeDuc, Richard D; Kelleher, Neil L

    2016-01-01

    Biomarker discovery in translational research has relied heavily on labeled and label-free quantitative bottom-up proteomics. Here, we describe a new approach to biomarker studies that utilizes high-throughput top-down proteomics and is the first to offer whole protein characterization and relative quantitation within the same experiment. Using yeast as a model, we report procedures for a label-free approach to quantify the relative abundance of intact proteins ranging from 0 to 30 kDa in two different states. In this chapter, we describe the integrated methodology for the large-scale profiling and quantitation of the intact proteome by liquid chromatography-mass spectrometry (LC-MS) without the need for metabolic or chemical labeling. This recent advance for quantitative top-down proteomics is best implemented with a robust and highly controlled sample preparation workflow before data acquisition on a high-resolution mass spectrometer, and the application of a hierarchical linear statistical model to account for the multiple levels of variance contained in quantitative proteomic comparisons of samples for basic and clinical research.

  17. Development of the automated circulating tumor cell recovery system with microcavity array.

    PubMed

    Negishi, Ryo; Hosokawa, Masahito; Nakamura, Seita; Kanbara, Hisashige; Kanetomo, Masafumi; Kikuhara, Yoshihito; Tanaka, Tsuyoshi; Matsunaga, Tadashi; Yoshino, Tomoko

    2015-05-15

    Circulating tumor cells (CTCs) are well recognized as a useful biomarker for cancer diagnosis and a potential target of drug discovery for metastatic cancer. Efficient and precise recovery of extremely low concentrations of CTCs from blood is required to increase the detection sensitivity. Here, an automated system equipped with a microcavity array (MCA) was demonstrated for highly efficient and reproducible CTC recovery. The use of the MCA allows selective recovery of cancer cells from whole blood on the basis of differences in size between tumor and blood cells. Intra- and inter-assay comparisons revealed that the automated system achieved efficiency and reproducibility equal to that of the assay performed manually by a well-trained operator. Under the optimized assay workflow, the automated system allows efficient and precise cell recovery for non-small cell lung cancer cells spiked into whole blood. The automated CTC recovery system will contribute to high-throughput analysis in further clinical studies on large cohorts of cancer patients. Copyright © 2014 Elsevier B.V. All rights reserved.

  18. Automated phase mapping with AgileFD and its application to light absorber discovery in the V–Mn–Nb oxide system

    DOE PAGES

    Suram, Santosh K.; Xue, Yexiang; Bai, Junwen; ...

    2016-11-21

    Rapid construction of phase diagrams is a central tenet of combinatorial materials science with accelerated materials discovery efforts often hampered by challenges in interpreting combinatorial X-ray diffraction data sets, which we address by developing AgileFD, an artificial intelligence algorithm that enables rapid phase mapping from a combinatorial library of X-ray diffraction patterns. AgileFD models alloying-based peak shifting through a novel expansion of convolutional nonnegative matrix factorization, which not only improves the identification of constituent phases but also maps their concentration and lattice parameter as a function of composition. By incorporating Gibbs’ phase rule into the algorithm, physically meaningful phase maps are obtained with unsupervised operation, and more refined solutions are attained by injecting expert knowledge of the system. The algorithm is demonstrated through investigation of the V–Mn–Nb oxide system where decomposition of eight oxide phases, including two with substantial alloying, provides the first phase map for this pseudoternary system. This phase map enables interpretation of high-throughput band gap data, leading to the discovery of new solar light absorbers and the alloying-based tuning of the direct-allowed band gap energy of MnV2O6. Lastly, the open-source family of AgileFD algorithms can be implemented into a broad range of high throughput workflows to accelerate materials discovery.
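
    A minimal sketch of decomposing a library of diffraction patterns into a small set of non-negative basis patterns with scikit-learn's plain NMF. AgileFD extends this idea with shift-invariant (convolutional) factors and Gibbs' phase rule constraints; the synthetic data here are placeholders, not combinatorial XRD measurements.

```python
# Plain NMF as a simplified stand-in for AgileFD-style phase mapping.
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
n_samples, n_angles, n_phases = 50, 300, 3
basis = rng.random((n_phases, n_angles))                 # "pure phase" patterns
weights = rng.random((n_samples, n_phases))              # per-composition phase fractions
D = weights @ basis + 0.01 * rng.random((n_samples, n_angles))  # measured library

model = NMF(n_components=n_phases, init="nndsvd", max_iter=500)
activations = model.fit_transform(D)   # per-sample phase concentrations (W)
patterns = model.components_           # recovered basis patterns (H), D ≈ W @ H
print(activations.shape, patterns.shape)
```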

  19. CloVR: A virtual machine for automated and portable sequence analysis from the desktop using cloud computing

    PubMed Central

    2011-01-01

    Background Next-generation sequencing technologies have decentralized sequence acquisition, increasing the demand for new bioinformatics tools that are easy to use, portable across multiple platforms, and scalable for high-throughput applications. Cloud computing platforms provide on-demand access to computing infrastructure over the Internet and can be used in combination with custom-built virtual machines to distribute pre-packaged, pre-configured software. Results We describe the Cloud Virtual Resource, CloVR, a new desktop application for push-button automated sequence analysis that can utilize cloud computing resources. CloVR is implemented as a single portable virtual machine (VM) that provides several automated analysis pipelines for microbial genomics, including 16S, whole genome and metagenome sequence analysis. The CloVR VM runs on a personal computer, utilizes local computer resources and requires minimal installation, addressing key challenges in deploying bioinformatics workflows. In addition, CloVR supports use of remote cloud computing resources to improve performance for large-scale sequence processing. In a case study, we demonstrate the use of CloVR to automatically process next-generation sequencing data on multiple cloud computing platforms. Conclusion The CloVR VM and associated architecture lowers the barrier of entry for utilizing complex analysis protocols on both local single- and multi-core computers and cloud systems for high throughput data processing. PMID:21878105

  20. PChopper: high throughput peptide prediction for MRM/SRM transition design.

    PubMed

    Afzal, Vackar; Huang, Jeffrey T-J; Atrih, Abdel; Crowther, Daniel J

    2011-08-15

    The use of selective reaction monitoring (SRM)-based LC-MS/MS analysis for the quantification of phosphorylation stoichiometry has been increasing rapidly. At the same time, the number of sites that can be monitored in a single LC-MS/MS experiment is also increasing. The manual processes associated with running these experiments have highlighted the need for computational assistance to quickly design MRM/SRM candidates. PChopper has been developed to predict peptides that can be produced via enzymatic protein digest; this includes single-enzyme digests and combinations of enzymes. It also allows digests to be simulated in 'batch' mode and can combine information from these simulated digests to suggest the most appropriate enzyme(s) to use. PChopper also allows users to define the characteristics of their target peptides and can automatically identify phosphorylation sites that may be of interest. Two application end points are available for interacting with the system: the first is a web-based graphical tool, and the second is an HTTP REST API endpoint. A service-oriented architecture was used to rapidly develop a system that can consume and expose several services. A graphical tool was built to provide an easy-to-follow workflow that allows scientists to quickly and easily identify the enzymes required to produce multiple peptides in parallel via enzymatic digests in a high-throughput manner.
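
    A minimal sketch of the kind of in-silico enzymatic digest such a tool simulates, using the classic trypsin rule (cleave after K/R, not before P) with optional missed cleavages. This illustrates the general idea only; it is not the PChopper algorithm, and the example sequence is arbitrary.

```python
# Simulated tryptic digest with missed cleavages.
import re

def tryptic_digest(sequence, max_missed_cleavages=1, min_length=6):
    # cleavage sites: after K or R, unless followed by P
    cut_sites = [0] + [m.end() for m in re.finditer(r"[KR](?!P)", sequence)] + [len(sequence)]
    peptides = []
    for i in range(len(cut_sites) - 1):
        for j in range(i + 1, min(i + 2 + max_missed_cleavages, len(cut_sites))):
            peptide = sequence[cut_sites[i]:cut_sites[j]]
            if len(peptide) >= min_length:
                peptides.append(peptide)
    return peptides

print(tryptic_digest("MKWVTFISLLFLFSSAYSRGVFRRDAHKSEVAHRFK"))
```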

  1. Molecular detection of fungal pathogens in clinical specimens by 18S rDNA high-throughput screening in comparison to ITS PCR and culture.

    PubMed

    Wagner, K; Springer, B; Pires, V P; Keller, P M

    2018-05-03

    The rising incidence of invasive fungal infections and the expanding spectrum of fungal pathogens make early and accurate identification of the causative pathogen a daunting task. Diagnostics using molecular markers enable rapid identification of fungi, offer new insights into infectious disease dynamics, and open new possibilities for infectious disease control and prevention. We performed a retrospective study using clinical specimens (N = 233) from patients with suspected fungal infection previously subjected to culture and/or internal transcribed spacer (ITS) PCR. We used these specimens to evaluate a high-throughput screening method for fungal detection using automated DNA extraction (QIASymphony), fungal ribosomal small subunit (18S) rDNA RT-PCR and amplicon sequencing. Fungal sequences were compared with sequences from the curated, commercially available SmartGene IDNS database for pathogen identification. Concordance between 18S rDNA RT-PCR and culture results was 91%, and congruence between 18S rDNA RT-PCR and ITS PCR results was 94%. In addition, 18S rDNA RT-PCR and Sanger sequencing detected fungal pathogens in culture-negative (N = 13) and ITS PCR-negative specimens (N = 12) from patients with a clinically confirmed fungal infection. Our results support the use of the 18S rDNA RT-PCR diagnostic workflow for rapid and accurate identification of fungal pathogens in clinical specimens.

  2. Identification of ER-000444793, a Cyclophilin D-independent inhibitor of mitochondrial permeability transition, using a high-throughput screen in cryopreserved mitochondria

    PubMed Central

    Briston, Thomas; Lewis, Sian; Koglin, Mumta; Mistry, Kavita; Shen, Yongchun; Hartopp, Naomi; Katsumata, Ryosuke; Fukumoto, Hironori; Duchen, Michael R.; Szabadkai, Gyorgy; Staddon, James M.; Roberts, Malcolm; Powney, Ben

    2016-01-01

    Growing evidence suggests persistent mitochondrial permeability transition pore (mPTP) opening is a key pathophysiological event in cell death underlying a variety of diseases. While it has long been clear that the mPTP is a druggable target, current agents are limited by off-target effects and low therapeutic efficacy. Therefore, identification and development of novel inhibitors is necessary. To rapidly screen large compound libraries for novel mPTP modulators, a method was exploited to cryopreserve large batches of functionally active mitochondria from cells and tissues. The cryopreserved mitochondria maintained respiratory coupling and ATP synthesis, Ca2+ uptake and transmembrane potential. A high-throughput screen (HTS) using an assay of Ca2+-induced mitochondrial swelling in the cryopreserved mitochondria identified ER-000444793, a potent inhibitor of mPTP opening. Further evaluation using assays of Ca2+-induced membrane depolarisation and Ca2+ retention capacity also indicated that ER-000444793 acted as an inhibitor of the mPTP. ER-000444793 neither affected cyclophilin D (CypD) enzymatic activity nor displaced CsA from the CypD protein, suggesting a mechanism independent of CypD inhibition. Here we identified a novel, CypD-independent inhibitor of the mPTP. The screening approach and compound described provide a workflow and an additional tool to aid the search for novel mPTP modulators and to help understand its molecular nature. PMID:27886240

  3. Automated phase mapping with AgileFD and its application to light absorber discovery in the V–Mn–Nb oxide system

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Suram, Santosh K.; Xue, Yexiang; Bai, Junwen

    Rapid construction of phase diagrams is a central tenet of combinatorial materials science with accelerated materials discovery efforts often hampered by challenges in interpreting combinatorial X-ray diffraction data sets, which we address by developing AgileFD, an artificial intelligence algorithm that enables rapid phase mapping from a combinatorial library of X-ray diffraction patterns. AgileFD models alloying-based peak shifting through a novel expansion of convolutional nonnegative matrix factorization, which not only improves the identification of constituent phases but also maps their concentration and lattice parameter as a function of composition. By incorporating Gibbs’ phase rule into the algorithm, physically meaningful phase maps are obtained with unsupervised operation, and more refined solutions are attained by injecting expert knowledge of the system. The algorithm is demonstrated through investigation of the V–Mn–Nb oxide system where decomposition of eight oxide phases, including two with substantial alloying, provides the first phase map for this pseudoternary system. This phase map enables interpretation of high-throughput band gap data, leading to the discovery of new solar light absorbers and the alloying-based tuning of the direct-allowed band gap energy of MnV2O6. Lastly, the open-source family of AgileFD algorithms can be implemented into a broad range of high throughput workflows to accelerate materials discovery.

  4. SeqTrim: a high-throughput pipeline for pre-processing any type of sequence read

    PubMed Central

    2010-01-01

    Background High-throughput automated sequencing has enabled an exponential growth rate of sequencing data. This requires increasing sequence quality and reliability in order to avoid database contamination with artefactual sequences. The arrival of pyrosequencing compounds this problem and necessitates customisable pre-processing algorithms. Results SeqTrim has been implemented both as a Web and as a standalone command line application. Already-published and newly-designed algorithms have been included to identify sequence inserts, to remove low quality, vector, adaptor, low complexity and contaminant sequences, and to detect chimeric reads. The availability of several input and output formats allows its inclusion in sequence processing workflows. Due to its specific algorithms, SeqTrim outperforms other pre-processors implemented as Web services or standalone applications. It performs equally well with sequences from EST libraries, SSH libraries, genomic DNA libraries and pyrosequencing reads and does not lead to over-trimming. Conclusions SeqTrim is an efficient pipeline designed for pre-processing of any type of sequence read, including next-generation sequencing. It is easily configurable and provides a friendly interface that allows users to know what happened with sequences at every pre-processing stage, and to verify pre-processing of an individual sequence if desired. The recommended pipeline reveals more information about each sequence than previously described pre-processors and can discard more sequencing or experimental artefacts. PMID:20089148
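
    A minimal sketch of two common read pre-processing steps (adaptor clipping and 3' quality trimming) on a single FASTQ-style record. SeqTrim's actual algorithms (vector, contaminant and chimera detection, insert identification) are far broader; the adaptor sequence and read below are illustrative placeholders.

```python
# Toy adaptor clipping and 3' quality trimming of one read.
PHRED_OFFSET = 33

def clip_adaptor(seq, qual, adaptor="AGATCGGAAGAGC"):
    idx = seq.find(adaptor)
    return (seq[:idx], qual[:idx]) if idx != -1 else (seq, qual)

def quality_trim(seq, qual, min_q=20):
    # trim from the 3' end until a base with quality >= min_q is found
    scores = [ord(c) - PHRED_OFFSET for c in qual]
    end = len(seq)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], qual[:end]

seq  = "ACGTACGTAGATCGGAAGAGCTTTT"
qual = "IIIIIIIIIIIIIIIIIIIII####"
seq, qual = clip_adaptor(seq, qual)
seq, qual = quality_trim(seq, qual)
print(seq, qual)
```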

  5. Language workbench user interfaces for data analysis

    PubMed Central

    Benson, Victoria M.

    2015-01-01

    Biological data analysis is frequently performed with command line software. While this practice provides considerable flexibility for computationally savvy individuals, such as investigators trained in bioinformatics, this also creates a barrier to the widespread use of data analysis software by investigators trained as biologists and/or clinicians. Workflow systems such as Galaxy and Taverna have been developed to provide generic user interfaces that can wrap command line analysis software. These solutions are useful for problems that can be solved with workflows, and that do not require specialized user interfaces. However, some types of analyses can benefit from custom user interfaces. For instance, developing biomarker models from high-throughput data is a type of analysis that can be expressed more succinctly with specialized user interfaces. Here, we show how Language Workbench (LW) technology can be used to model the biomarker development and validation process. We developed a language that models the concepts of Dataset, Endpoint, Feature Selection Method and Classifier. These high-level language concepts map directly to abstractions that analysts who develop biomarker models are familiar with. We found that user interfaces developed in the Meta-Programming System (MPS) LW provide convenient means to configure a biomarker development project, to train models and view the validation statistics. We discuss several advantages of developing user interfaces for data analysis with a LW, including increased interface consistency, portability and extension by language composition. The language developed during this experiment is distributed as an MPS plugin (available at http://campagnelab.org/software/bdval-for-mps/). PMID:25755929
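
    A minimal sketch of the Dataset / Endpoint / Feature Selection Method / Classifier abstractions mentioned above, expressed with a scikit-learn pipeline rather than an MPS language workbench; the synthetic dataset is a placeholder for a real high-throughput dataset with a clinical endpoint.

```python
# Biomarker-model pipeline sketch: feature selection + classifier with cross-validation.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

# "Dataset" with a binary "Endpoint": 100 samples, 500 features, 10 informative
X, y = make_classification(n_samples=100, n_features=500, n_informative=10, random_state=0)

biomarker_model = Pipeline([
    ("feature_selection", SelectKBest(f_classif, k=20)),  # Feature Selection Method
    ("classifier", LogisticRegression(max_iter=1000)),    # Classifier
])

# cross-validated estimate of endpoint prediction performance
scores = cross_val_score(biomarker_model, X, y, cv=5, scoring="roc_auc")
print("Mean AUC:", scores.mean())
```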

  6. An experience of the introduction of a blood bank automation system (Ortho AutoVue Innova) in a regional acute hospital.

    PubMed

    Cheng, Yuk Wah; Wilkinson, Jenny M

    2015-08-01

    This paper reports on an evaluation of the introduction of a blood bank automation system (Ortho AutoVue(®) Innova) in a hospital blood bank, considering performance and workflow compared with manual methods. The turnaround time was found to be 45% faster than the manual method. The concordance rate was 100% for both ABO/Rh(D) typing and antibody screening in both systems, and there was no significant difference in detection sensitivity for clinically significant antibodies. The Ortho AutoVue(®) Innova automated blood banking system streamlined routine pre-transfusion testing in the hospital blood bank with high throughput and with sensitivity and reliability equivalent to the conventional manual method. Copyright © 2015 Elsevier Ltd. All rights reserved.

  7. On-Line Electrochemical Reduction of Disulfide Bonds: Improved FTICR-CID and -ETD Coverage of Oxytocin and Hepcidin

    NASA Astrophysics Data System (ADS)

    Nicolardi, Simone; Giera, Martin; Kooijman, Pieter; Kraj, Agnieszka; Chervet, Jean-Pierre; Deelder, André M.; van der Burgt, Yuri E. M.

    2013-12-01

    Particularly in the field of middle- and top-down peptide and protein analysis, disulfide bridges can severely hinder fragmentation and thus impede sequence analysis (coverage). Here we present an on-line/electrochemistry/ESI-FTICR-MS approach, which was applied to the analysis of the primary structure of oxytocin, containing one disulfide bridge, and of hepcidin, containing four disulfide bridges. The presented workflow provided up to 80 % (on-line) conversion of disulfide bonds in both peptides. With minimal sample preparation, such reduction resulted in a higher number of peptide backbone cleavages upon CID or ETD fragmentation, and thus yielded improved sequence coverage. The cycle times, including electrode recovery, were rapid and, therefore, might very well be coupled with liquid chromatography for protein or peptide separation, which has great potential for high-throughput analysis.

  8. A Primer on Infectious Disease Bacterial Genomics

    PubMed Central

    Petkau, Aaron; Knox, Natalie; Graham, Morag; Van Domselaar, Gary

    2016-01-01

    SUMMARY The number of large-scale genomics projects is increasing due to the availability of affordable high-throughput sequencing (HTS) technologies. The use of HTS for bacterial infectious disease research is attractive because one whole-genome sequencing (WGS) run can replace multiple assays for bacterial typing, molecular epidemiology investigations, and more in-depth pathogenomic studies. The computational resources and bioinformatics expertise required to accommodate and analyze the large amounts of data pose new challenges for researchers embarking on genomics projects for the first time. Here, we present a comprehensive overview of a bacterial genomics project from beginning to end, with a particular focus on the planning and computational requirements for HTS data, and provide a general understanding of the analytical concepts needed to develop a workflow that will meet the objectives and goals of HTS projects. PMID:28590251

  9. Progress in digital color workflow understanding in the International Color Consortium (ICC) Workflow WG

    NASA Astrophysics Data System (ADS)

    McCarthy, Ann

    2006-01-01

    The ICC Workflow WG serves as the bridge between ICC color management technologies and use of those technologies in real world color production applications. ICC color management is applicable to and is used in a wide range of color systems, from highly specialized digital cinema color special effects to high volume publications printing to home photography. The ICC Workflow WG works to align ICC technologies so that the color management needs of these diverse use case systems are addressed in an open, platform independent manner. This report provides a high level summary of the ICC Workflow WG objectives and work to date, focusing on the ways in which workflow can impact image quality and color systems performance. The 'ICC Workflow Primitives' and 'ICC Workflow Patterns and Dimensions' workflow models are covered in some detail. Consider the questions, "How much of dissatisfaction with color management today is the result of 'the wrong color transformation at the wrong time' and 'I can't get to the right conversion at the right point in my work process'?" Put another way, consider how image quality through a workflow can be negatively affected when the coordination and control level of the color management system is not sufficient.

  10. Workflow and maintenance characteristics of five automated laboratory instruments for the diagnosis of sexually transmitted infections.

    PubMed

    Ratnam, Sam; Jang, Dan; Gilchrist, Jodi; Smieja, Marek; Poirier, Andre; Hatchette, Todd; Flandin, Jean-Frederic; Chernesky, Max

    2014-07-01

    The choice of a suitable automated system for a diagnostic laboratory depends on various factors. Comparative workflow studies provide quantifiable and objective metrics to determine hands-on time during specimen handling and processing, reagent preparation, return visits and maintenance, and test turnaround time and throughput. Using objective time study techniques, workflow characteristics for processing 96 and 192 tests were determined on m2000 RealTime (Abbott Molecular), Viper XTR (Becton Dickinson), cobas 4800 (Roche Molecular Diagnostics), Tigris (Hologic Gen-Probe), and Panther (Hologic Gen-Probe) platforms using second-generation assays for Chlamydia trachomatis and Neisseria gonorrhoeae. A combination of operational and maintenance steps requiring manual labor showed that Panther had the shortest overall hands-on times and Viper XTR the longest. Both Panther and Tigris showed greater efficiency whether 96 or 192 tests were processed. Viper XTR and Panther had the shortest times to results and m2000 RealTime the longest. Sample preparation and loading time was the shortest for Panther and longest for cobas 4800. Mandatory return visits were required only for m2000 RealTime and cobas 4800 when 96 tests were processed, and both required substantially more hands-on time than the other systems due to increased numbers of return visits when 192 tests were processed. These results show that there are substantial differences in the amount of labor required to operate each system. Assay performance, instrumentation, testing capacity, workflow, maintenance, and reagent costs should be considered in choosing a system. Copyright © 2014, American Society for Microbiology. All Rights Reserved.

  11. An automated workflow for enhancing microbial bioprocess optimization on a novel microbioreactor platform

    PubMed Central

    2012-01-01

    Background High-throughput methods are widely used for strain screening, effectively resulting in binary information regarding high or low productivity. Nevertheless, achieving quantitative and scalable parameters for fast bioprocess development is much more challenging, especially for heterologous protein production. Here, the nature of the foreign protein makes it impossible to predict, e.g., the best expression construct, secretion signal peptide, inductor concentration, induction time, temperature and substrate feed rate in fed-batch operation, to name only a few. Therefore, a high number of systematic experiments are necessary to elucidate the best conditions for heterologous expression of each new protein of interest. Results To increase the throughput in bioprocess development, we used a microtiter plate based cultivation system (Biolector), which was fully integrated into a liquid-handling platform enclosed in laminar airflow housing. This automated cultivation platform was used for optimization of the secretory production of a cutinase from Fusarium solani pisi with Corynebacterium glutamicum. The online monitoring of biomass, dissolved oxygen and pH in each of the microtiter plate wells makes it possible to trigger sampling or dosing events with the pipetting robot, allowing reliable selection of the best-performing cutinase producers. In addition to this, further automated methods like media optimization and induction profiling were developed and validated. All biological and bioprocess parameters were optimized exclusively at microtiter plate scale, and the results scaled well to 1 L and 20 L stirred tank bioreactor scale. Conclusions The optimization of heterologous protein expression in microbial systems currently requires extensive testing of biological and bioprocess engineering parameters. This can be efficiently boosted by using a microtiter plate cultivation setup embedded into a liquid-handling system, providing more throughput by parallelization and automation. Due to improved statistics by replicate cultivations, automated downstream analysis, and scalable process information, this setup has superior performance compared to standard microtiter plate cultivation. PMID:23113930
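
    A minimal sketch of the threshold-triggered dosing logic described above, assuming a polling loop over wells. The read_biomass() and induce_well() functions are hypothetical stand-ins for the online readout and the liquid-handler dosing call; no vendor API or actual inducer concentration is implied.

```python
# Threshold-triggered induction during an automated microtiter-plate cultivation (sketch).
import random

def read_biomass(well):
    # placeholder: simulated backscatter/biomass reading for the well
    return random.uniform(0, 50)

def induce_well(well, inducer_mM):
    # placeholder: would issue a dosing command to the pipetting robot
    print(f"Inducing {well} with {inducer_mM} mM inducer")

INDUCTION_THRESHOLD = 30.0   # biomass signal at which induction is triggered (assumed value)
induced = set()

for cycle in range(10):                       # one pass per measurement cycle
    for well in ("A1", "A2", "A3"):
        if well in induced:
            continue
        if read_biomass(well) >= INDUCTION_THRESHOLD:
            induce_well(well, inducer_mM=0.5)
            induced.add(well)
```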

  12. Reverse engineering biomolecular systems using -omic data: challenges, progress and opportunities.

    PubMed

    Quo, Chang F; Kaddi, Chanchala; Phan, John H; Zollanvari, Amin; Xu, Mingqing; Wang, May D; Alterovitz, Gil

    2012-07-01

    Recent advances in high-throughput biotechnologies have led to rapidly growing research interest in reverse engineering of biomolecular systems (REBMS). 'Data-driven' approaches, i.e. data mining, can be used to extract patterns from large volumes of biochemical data at molecular-level resolution, while 'design-driven' approaches, i.e. systems modeling, can be used to simulate emergent system properties. Consequently, both data- and design-driven approaches applied to -omic data may lead to novel insights in reverse engineering biological systems that could not be expected before using low-throughput platforms. However, there exist several challenges in this fast-growing field of reverse engineering biomolecular systems: (i) to integrate heterogeneous biochemical data for data mining, (ii) to combine top-down and bottom-up approaches for systems modeling and (iii) to validate system models experimentally. In addition to reviewing progress made by the community and opportunities encountered in addressing these challenges, we explore the emerging field of synthetic biology, which is an exciting approach to validate and analyze theoretical system models directly through experimental synthesis, i.e. analysis-by-synthesis. The ultimate goal is to address the present and future challenges in reverse engineering biomolecular systems (REBMS) using an integrated workflow of data mining, systems modeling and synthetic biology.

  13. The LabTube - a novel microfluidic platform for assay automation in laboratory centrifuges.

    PubMed

    Kloke, A; Fiebach, A R; Zhang, S; Drechsel, L; Niekrawietz, S; Hoehl, M M; Kneusel, R; Panthel, K; Steigert, J; von Stetten, F; Zengerle, R; Paust, N

    2014-05-07

    Assay automation is the key for successful transformation of modern biotechnology into routine workflows. Yet, it requires considerable investment in processing devices and auxiliary infrastructure, which is not cost-efficient for laboratories with low or medium sample throughput or point-of-care testing. To close this gap, we present the LabTube platform, which is based on assay-specific disposable cartridges for processing in laboratory centrifuges. LabTube cartridges comprise interfaces for sample loading and downstream applications and fluidic unit operations for release of prestored reagents, mixing, and solid phase extraction. Process control is achieved by a centrifugally-actuated ballpen mechanism. To demonstrate the workflow and functionality of the LabTube platform, we show two LabTube automated sample preparation assays from laboratory routines: DNA extractions from whole blood and purification of His-tagged proteins. Equal DNA and protein yields were observed compared to manual reference runs, while LabTube automation could significantly reduce the hands-on time to one minute per extraction.

  14. Image analysis tools and emerging algorithms for expression proteomics

    PubMed Central

    English, Jane A.; Lisacek, Frederique; Morris, Jeffrey S.; Yang, Guang-Zhong; Dunn, Michael J.

    2012-01-01

    Since their origins in academic endeavours in the 1970s, computational analysis tools have matured into a number of established commercial packages that underpin research in expression proteomics. In this paper we describe the image analysis pipeline for the established 2-D Gel Electrophoresis (2-DE) technique of protein separation, and by first covering signal analysis for Mass Spectrometry (MS), we also explain the current image analysis workflow for the emerging high-throughput ‘shotgun’ proteomics platform of Liquid Chromatography coupled to MS (LC/MS). The bioinformatics challenges for both methods are illustrated and compared, whilst existing commercial and academic packages and their workflows are described from both a user’s and a technical perspective. Attention is given to the importance of sound statistical treatment of the resultant quantifications in the search for differential expression. Despite wide availability of proteomics software, a number of challenges have yet to be overcome regarding algorithm accuracy, objectivity and automation, generally due to deterministic spot-centric approaches that discard information early in the pipeline, propagating errors. We review recent advances in signal and image analysis algorithms in 2-DE, MS, LC/MS and Imaging MS. Particular attention is given to wavelet techniques, automated image-based alignment and differential analysis in 2-DE, Bayesian peak mixture models and functional mixed modelling in MS, and group-wise consensus alignment methods for LC/MS. PMID:21046614

  15. Drug discovery using very large numbers of patents. General strategy with extensive use of match and edit operations

    NASA Astrophysics Data System (ADS)

    Robson, Barry; Li, Jin; Dettinger, Richard; Peters, Amanda; Boyer, Stephen K.

    2011-05-01

    A patent database of 6.7 million compounds generated by a very high performance computer (Blue Gene) requires new techniques for exploitation when extensive use of chemical similarity is involved. Such exploitation includes the taxonomic classification of chemical themes and data mining to assess mutual information between themes and companies. Importantly, we also launch candidates that evolve by "natural selection", selected on failure to partially match the patent database and on their ability to bind to the protein target appropriately, assessed by simulation on Blue Gene. An unusual feature of our method is that algorithms and workflows rely on dynamic interaction between match-and-edit instructions, which in practice are regular expressions. Similarity testing by these uses SMILES strings and, less frequently, graph or connectivity representations. Examining how this performs in high throughput, we note that chemical similarity and novelty are human concepts that largely have meaning by utility in specific contexts. For some purposes, mutual information involving chemical themes might be a better concept.
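
    A minimal sketch of text-level "match and edit" operations on SMILES strings with regular expressions, in the spirit of the match-and-edit instructions described above. This is purely textual and not chemistry-aware; the pattern, substitution and tiny compound list are illustrative assumptions, not the authors' patterns or data.

```python
# Regular-expression match and edit over SMILES strings (textual illustration only).
import re

smiles_db = [
    "CC(=O)Oc1ccccc1C(=O)O",   # aspirin
    "CCO",                      # ethanol
    "Oc1ccccc1",                # phenol
]

ring_pattern = re.compile(r"c1ccccc1")   # unsubstituted aromatic ring as a raw text motif
matches = [s for s in smiles_db if ring_pattern.search(s)]

# "edit" operation: swap the matched motif for a fluorinated ring in each hit
edited = [ring_pattern.sub("c1ccc(F)cc1", s, count=1) for s in matches]
print(matches)
print(edited)
```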

  16. Signalling maps in cancer research: construction and data analysis

    PubMed Central

    Kondratova, Maria; Sompairac, Nicolas; Barillot, Emmanuel; Zinovyev, Andrei

    2018-01-01

    Abstract Generation and usage of high-quality molecular signalling network maps can be augmented by standardizing notations, establishing curation workflows and application of computational biology methods to exploit the knowledge contained in the maps. In this manuscript, we summarize the major aims and challenges of assembling information in the form of comprehensive maps of molecular interactions. Mainly, we share our experience gained while creating the Atlas of Cancer Signalling Network. In the step-by-step procedure, we describe the map construction process and suggest solutions for map complexity management by introducing a hierarchical modular map structure. In addition, we describe the NaviCell platform, a computational technology using Google Maps API to explore comprehensive molecular maps similar to geographical maps and explain the advantages of semantic zooming principles for map navigation. We also provide the outline to prepare signalling network maps for navigation using the NaviCell platform. Finally, several examples of cancer high-throughput data analysis and visualization in the context of comprehensive signalling maps are presented. PMID:29688383

  17. A versatile pipeline for the multi-scale digital reconstruction and quantitative analysis of 3D tissue architecture

    PubMed Central

    Morales-Navarrete, Hernán; Segovia-Miranda, Fabián; Klukowski, Piotr; Meyer, Kirstin; Nonaka, Hidenori; Marsico, Giovanni; Chernykh, Mikhail; Kalaidzidis, Alexander; Zerial, Marino; Kalaidzidis, Yannis

    2015-01-01

    A prerequisite for the systems biology analysis of tissues is an accurate digital three-dimensional reconstruction of tissue structure based on images of markers covering multiple scales. Here, we designed a flexible pipeline for the multi-scale reconstruction and quantitative morphological analysis of tissue architecture from microscopy images. Our pipeline includes newly developed algorithms that address specific challenges of thick dense tissue reconstruction. Our implementation allows for a flexible workflow, scalable to high-throughput analysis and applicable to various mammalian tissues. We applied it to the analysis of liver tissue and extracted quantitative parameters of sinusoids, bile canaliculi and cell shapes, recognizing different liver cell types with high accuracy. Using our platform, we uncovered an unexpected zonation pattern of hepatocytes with different size, nuclei and DNA content, thus revealing new features of liver tissue organization. The pipeline also proved effective to analyse lung and kidney tissue, demonstrating its generality and robustness. DOI: http://dx.doi.org/10.7554/eLife.11214.001 PMID:26673893

  18. Bacterial cell identification in differential interference contrast microscopy images.

    PubMed

    Obara, Boguslaw; Roberts, Mark A J; Armitage, Judith P; Grau, Vicente

    2013-04-23

    Microscopy image segmentation lays the foundation for shape analysis, motion tracking, and classification of biological objects. Despite its importance, automated segmentation remains challenging for several widely used non-fluorescence, interference-based microscopy imaging modalities, for example differential interference contrast (DIC) microscopy, which plays an important role in modern bacterial cell biology. Therefore, new advances in the field require the development of tools, technologies and workflows to extract and exploit information from interference-based imaging data so as to achieve new fundamental biological insights and understanding. We have developed and evaluated a high-throughput image analysis and processing approach to detect and characterize bacterial cells and chemotaxis proteins. Its performance was evaluated using differential interference contrast and fluorescence microscopy images of Rhodobacter sphaeroides. Results demonstrate that the proposed approach provides a fast and robust method for detection and analysis of spatial relationships between bacterial cells and their chemotaxis proteins.
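
    A minimal sketch of the generic segmentation skeleton (smooth, threshold, label, measure) using scikit-image on a synthetic image. DIC images typically require more specialised pre-processing than simple Otsu thresholding, so this only illustrates the overall workflow shape, not the authors' method.

```python
# Smooth -> threshold -> label -> measure, on a synthetic image (scikit-image assumed).
import numpy as np
from skimage import filters, measure

rng = np.random.default_rng(0)
image = rng.normal(0.1, 0.02, (128, 128))
image[30:40, 50:80] += 0.3           # one synthetic "cell"
image[80:90, 20:45] += 0.3           # another

smoothed = filters.gaussian(image, sigma=1)
mask = smoothed > filters.threshold_otsu(smoothed)
labels = measure.label(mask)

for region in measure.regionprops(labels):
    print(f"object {region.label}: area={region.area}, centroid={region.centroid}")
```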

  19. Current and future perspectives on the development ...

    EPA Pesticide Factsheets

    Safety-related problems continue to be one of the major reasons for attrition in drug development. Non-testing approaches to predict toxicity could form part of the solution. This review provides a perspective on the current status of non-testing approaches available for the prediction of different toxicity endpoints. A framework for the development, evaluation and assessment of (Q)SARs is presented together with several examples. A workflow for performing read-across predictions within category and analogue approaches is presented and its shortcomings discussed. In light of the advances in high throughput (HT) approaches and of constructs such as adverse outcome pathways (AOPs) coming on-line to help interpret such HT data, the ways in which non-testing approaches are developed are also evolving. We discuss what the future of these approaches might look like and outline how their integration could be useful in screening toxicity for drug development. Invited review article for CRT for a special issue.

  20. Analysis of protein stability and ligand interactions by thermal shift assay.

    PubMed

    Huynh, Kathy; Partch, Carrie L

    2015-02-02

    Purification of recombinant proteins for biochemical assays and structural studies is time-consuming and presents inherent difficulties that depend on the optimization of protein stability. The use of dyes to monitor thermal denaturation of proteins with sensitive fluorescence detection enables rapid and inexpensive determination of protein stability using real-time PCR instruments. By screening a wide range of solution conditions and additives in a 96-well format, the thermal shift assay easily identifies conditions that significantly enhance the stability of recombinant proteins. The same approach can be used as an initial low-cost screen to discover new protein-ligand interactions by capitalizing on increases in protein stability that typically occur upon ligand binding. This unit presents a methodological workflow for small-scale, high-throughput thermal denaturation of recombinant proteins in the presence of SYPRO Orange dye. Copyright © 2015 John Wiley & Sons, Inc.
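
    A minimal sketch of extracting a melting temperature (Tm) from a thermal shift melt curve as the temperature at the maximum of the first derivative of the fluorescence signal. The sigmoidal curve below is simulated, not real assay data, and the approach is a generic illustration rather than this unit's full protocol.

```python
# Estimate Tm from a simulated melt curve via the first derivative.
import numpy as np

temps = np.linspace(25, 95, 141)                 # degrees C
tm_true, slope = 52.0, 1.5
fluorescence = 1.0 / (1.0 + np.exp(-(temps - tm_true) / slope))   # simulated melt curve
fluorescence += np.random.default_rng(0).normal(0, 0.01, temps.size)

dF_dT = np.gradient(fluorescence, temps)         # derivative of signal w.r.t. temperature
tm_estimate = temps[np.argmax(dF_dT)]
print(f"Estimated Tm: {tm_estimate:.1f} °C")
```

    Comparing Tm values across buffer conditions or in the presence and absence of a ligand then gives the thermal shift used to rank stabilizing conditions or putative binders.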

  1. Proteomics of Plant Pathogenic Fungi

    PubMed Central

    González-Fernández, Raquel; Prats, Elena; Jorrín-Novo, Jesús V.

    2010-01-01

    Plant pathogenic fungi cause important yield losses in crops. In order to develop efficient and environmentally friendly crop protection strategies, molecular studies of the fungal biological cycle, virulence factors, and interaction with its host are necessary. For that reason, several approaches have been performed using both classical genetic, cell biology, and biochemistry approaches and modern, holistic, high-throughput omic techniques. This work briefly overviews the tools available for studying plant pathogenic fungi and focuses mainly on MS-based proteomics analysis, based on original papers published up to December 2009. At a methodological level, different steps in a proteomic workflow experiment are discussed. Separate sections are devoted to fungal descriptive (intracellular, subcellular, extracellular) and differential expression proteomics and interactomics. From the work published we can conclude that proteomics, in combination with other techniques, constitutes a powerful tool for providing important information about pathogenicity and virulence factors, thus opening up new possibilities for crop disease diagnosis and crop protection. PMID:20589070

  2. Proteomics of plant pathogenic fungi.

    PubMed

    González-Fernández, Raquel; Prats, Elena; Jorrín-Novo, Jesús V

    2010-01-01

    Plant pathogenic fungi cause important yield losses in crops. In order to develop efficient and environmentally friendly crop protection strategies, molecular studies of the fungal biological cycle, virulence factors, and interaction with its host are necessary. For that reason, several approaches have been performed using both classical genetic, cell biology, and biochemistry approaches and modern, holistic, high-throughput omic techniques. This work briefly overviews the tools available for studying plant pathogenic fungi and focuses mainly on MS-based proteomics analysis, based on original papers published up to December 2009. At a methodological level, different steps in a proteomic workflow experiment are discussed. Separate sections are devoted to fungal descriptive (intracellular, subcellular, extracellular) and differential expression proteomics and interactomics. From the work published we can conclude that proteomics, in combination with other techniques, constitutes a powerful tool for providing important information about pathogenicity and virulence factors, thus opening up new possibilities for crop disease diagnosis and crop protection.

  3. Development of a Multiplexed Liquid Chromatography Multiple-Reaction-Monitoring Mass Spectrometry (LC-MRM/MS) Method for Evaluation of Salivary Proteins as Oral Cancer Biomarkers*

    PubMed Central

    Chen, Hsiao-Wei; Wu, Chun-Feng; Chu, Lichieh Julie; Chiang, Wei-Fang; Wu, Chih-Ching; Yu, Jau-Song; Tsai, Cheng-Han; Liang, Kung-Hao; Chang, Yu-Sun; Wu, Maureen; Ou Yang, Wei-Ting

    2017-01-01

    Multiple (selected) reaction monitoring (MRM/SRM) of peptides is a growing technology for target protein quantification because it is more robust, precise, accurate, high-throughput, and multiplex-capable than antibody-based techniques. The technique has been applied clinically to the large-scale quantification of multiple target proteins in different types of fluids. However, previous MRM-based studies have placed less focus on sample-preparation workflow and analytical performance in the precise quantification of proteins in saliva, a noninvasively sampled body fluid. In this study, we evaluated the analytical performance of a simple and robust multiple reaction monitoring (MRM)-based targeted proteomics approach incorporating liquid chromatography with mass spectrometry detection (LC-MRM/MS). This platform was used to quantitatively assess the biomarker potential of a group of 56 salivary proteins that have previously been associated with human cancers. To further enhance the development of this technology for assay of salivary samples, we optimized the workflow for salivary protein digestion and evaluated quantification performance, robustness and technical limitations in analyzing clinical samples. Using a clinically well-characterized cohort of two independent clinical sample sets (total n = 119), we quantitatively characterized these protein biomarker candidates in saliva specimens from controls and oral squamous cell carcinoma (OSCC) patients. The results clearly showed a significant elevation of most targeted proteins in saliva samples from OSCC patients compared with controls. Overall, this platform was capable of assaying the most highly multiplexed panel of salivary protein biomarkers, highlighting the clinical utility of MRM in oral cancer biomarker research. PMID:28235782

  4. Investigating performance variability of processing, exploitation, and dissemination using a socio-technical systems analysis approach

    NASA Astrophysics Data System (ADS)

    Danczyk, Jennifer; Wollocko, Arthur; Farry, Michael; Voshell, Martin

    2016-05-01

    Data collection processes supporting Intelligence, Surveillance, and Reconnaissance (ISR) missions have recently undergone a technological transition driven by investment in sensor platforms. Various agencies have made these investments to increase the resolution, duration, and quality of data collection and to provide more relevant and recent data to warfighters. However, while sensor improvements have increased the volume of high-resolution data, they often fail to improve situational awareness and actionable intelligence for the warfighter because efficient Processing, Exploitation, and Dissemination (PED) and filtering methods for mission-relevant information needs are lacking. The volume of collected ISR data often overwhelms manual and automated processes in modern analysis enterprises, resulting in underexploited data and insufficient or missing answers to information requests. The outcome is a significant breakdown in the analytical workflow. To cope with this data overload, many intelligence organizations have sought to re-organize their general staffing requirements and workflows to enhance team communication and coordination, with hopes of exploiting as much high-value data as possible and understanding the value of actionable intelligence well before its relevance has passed. Through this effort we have taken a scholarly approach to this problem by studying the evolution of Processing, Exploitation, and Dissemination, with a specific focus on the Army's most recent evolutions, using the Functional Resonance Analysis Method. This method investigates socio-technical processes by analyzing their intended functions and aspects to determine performance variabilities. Gaps are identified, and recommendations about force structure and future R&D priorities to increase the throughput of the intelligence enterprise are discussed.

  5. Evaluating multiplexed next-generation sequencing as a method in palynology for mixed pollen samples.

    PubMed

    Keller, A; Danner, N; Grimmer, G; Ankenbrand, M; von der Ohe, K; von der Ohe, W; Rost, S; Härtel, S; Steffan-Dewenter, I

    2015-03-01

    The identification of pollen plays an important role in ecology, palaeo-climatology, honey quality control and other areas. Currently, expert knowledge and reference collections are essential to identify pollen origin through light microscopy. Pollen identification through molecular sequencing and DNA barcoding has been proposed as an alternative approach, but the assessment of mixed pollen samples originating from multiple plant species is still a tedious and error-prone task. Next-generation sequencing has been proposed to avoid this hindrance. In this study we assessed mixed pollen samples through next-generation sequencing of amplicons from the highly variable, species-specific internal transcribed spacer 2 region of nuclear ribosomal DNA. Further, we developed a bioinformatic workflow to analyse these high-throughput data with a newly created reference database. To evaluate the feasibility, we compared results from classical identification based on light microscopy from the same samples with our sequencing results. We assessed in total 16 mixed pollen samples, 14 of which originated from honeybee colonies and two from solitary bee nests. The sequencing technique resulted in higher taxon richness (deeper assignments and more identified taxa) compared to light microscopy. Abundance estimations from sequencing data were significantly correlated with abundances counted through light microscopy. Simulation analyses of taxon specificity and sensitivity indicate that 96% of taxa present in the database are correctly identifiable at the genus level and 70% at the species level. Next-generation sequencing thus presents a useful and efficient workflow to identify pollen at the genus and species level without requiring specialised palynological expert knowledge. © 2014 German Botanical Society and The Royal Botanical Society of the Netherlands.
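
    A minimal sketch of the kind of comparison reported above: correlating per-taxon abundance estimates from sequencing read counts with pollen counts from light microscopy using a rank correlation. The numbers are illustrative placeholders, not data from the study.

```python
# Rank correlation between sequencing-based and microscopy-based abundance estimates.
from scipy.stats import spearmanr

read_counts       = [1520, 340, 85, 12, 960, 47]   # reads assigned per taxon (placeholder)
microscopy_counts = [310,   75, 20,  4, 180, 15]   # pollen grains counted per taxon (placeholder)

rho, p_value = spearmanr(read_counts, microscopy_counts)
print(f"Spearman rho = {rho:.2f}, p = {p_value:.3f}")
```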

  6. Novel Phenotype-Genotype Correlations of Restrictive Cardiomyopathy With Myosin-Binding Protein C (MYBPC3) Gene Mutations Tested by Next-Generation Sequencing.

    PubMed

    Wu, Wei; Lu, Chao-Xia; Wang, Yi-Ning; Liu, Fang; Chen, Wei; Liu, Yong-Tai; Han, Ye-Chen; Cao, Jian; Zhang, Shu-Yang; Zhang, Xue

    2015-07-10

    MYBPC3 dysfunctions have been proven to induce dilated cardiomyopathy, hypertrophic cardiomyopathy, and/or left ventricular noncompaction; however, the genotype-phenotype correlation between MYBPC3 and restrictive cardiomyopathy (RCM) has not been established. The newly developed next-generation sequencing method is capable of broad genomic DNA sequencing with high throughput and can help explore novel correlations between genetic variants and cardiomyopathies. A proband from a multigenerational family with 3 live patients and 1 unrelated patient with clinical diagnoses of RCM underwent a next-generation sequencing workflow based on a custom AmpliSeq panel, including 64 candidate pathogenic genes for cardiomyopathies, on the Ion Personal Genome Machine high-throughput sequencing benchtop instrument. The selected panel contained a total of 64 genes that were reportedly associated with inherited cardiomyopathies. All patients fulfilled strict criteria for RCM with clinical characteristics, echocardiography, and/or cardiac magnetic resonance findings. The multigenerational family with 3 adult RCM patients carried an identical nonsense MYBPC3 mutation, and the unrelated patient carried a missense mutation in the MYBPC3 gene. All of these results were confirmed by the Sanger sequencing method. This study demonstrated that MYBPC3 gene mutations, revealed by next-generation sequencing, were associated with familial and sporadic RCM patients. It is suggested that the next-generation sequencing platform with a selected panel provides a highly efficient approach for molecular diagnosis of hereditary and idiopathic RCM and helps build new genotype-phenotype correlations. © 2015 The Authors. Published on behalf of the American Heart Association, Inc., by Wiley Blackwell.

  7. The Symbiotic Relationship between Scientific Workflow and Provenance (Invited)

    NASA Astrophysics Data System (ADS)

    Stephan, E.

    2010-12-01

    The purpose of this presentation is to describe the symbiotic nature of scientific workflows and provenance. We will also discuss the current trends and real world challenges facing these two distinct research areas. Although motivated differently, the needs of the international science communities are the glue that binds this relationship together. Understanding and articulating the science drivers to these communities is paramount as these technologies evolve and mature. Originally conceived for managing business processes, workflows are now becoming invaluable assets in both computational and experimental sciences. These reconfigurable, automated systems provide essential technology to perform complex analyses by coupling together geographically distributed disparate data sources and applications. As a result, workflows are capable of higher throughput in a shorter amount of time than performing the steps manually. Today many different workflow products exist; these include Kepler and Taverna or similar products like MeDICI, developed at PNNL, that are standardized on the Business Process Execution Language (BPEL). Provenance, originating from the French term provenir, "to come from", is used to describe the curation process of artwork as art is passed from owner to owner. The concept of provenance was adopted by digital libraries as a means to track the lineage of documents while standards such as Dublin Core began to emerge. In recent years the systems science community has increasingly expressed the need to expand the concept of provenance to formally articulate the history of scientific data. Communities such as the International Provenance and Annotation Workshop (IPAW) have formalized a provenance data model, the Open Provenance Model, and the W3C is hosting a provenance incubator group featuring the Proof Markup Language. Although both workflows and provenance have risen from different communities and operate independently, their mutual success is tied together, forming a symbiotic relationship where research and development advances in one effort can provide tremendous benefits to the other. For example, automating provenance extraction within scientific applications is still a relatively new concept; the workflow engine provides the framework to capture application specific operations, inputs, and resulting data. It provides a description of the process history and data flow by wrapping workflow components around the applications and data sources. On the other hand, a lack of cooperation between workflows and provenance can inhibit the usefulness of both to science. Blindly tracking the execution history without having a true understanding of what kinds of questions end users may have makes the provenance indecipherable to the target users. Over the past nine years PNNL has been actively involved in provenance research in support of computational chemistry, molecular dynamics, biology, hydrology, and climate. PNNL has also been actively involved in efforts by the international community to develop open standards for provenance and the development of architectures to support provenance capture, storage, and querying. This presentation will provide real world use cases of how provenance and workflow can be leveraged and implemented to meet different needs and the challenges that lie ahead.

  8. Quantitative Proteomics of Sleep-Deprived Mouse Brains Reveals Global Changes in Mitochondrial Proteins

    PubMed Central

    Li, Tie-Mei; Zhang, Ju-en; Lin, Rui; Chen, She; Luo, Minmin; Dong, Meng-Qiu

    2016-01-01

    Sleep is a ubiquitous, tightly regulated, and evolutionarily conserved behavior observed in almost all animals. Prolonged sleep deprivation can be fatal, indicating that sleep is a physiological necessity. However, little is known about its core function. To gain insight into this mystery, we used advanced quantitative proteomics technology to survey the global changes in brain protein abundance. Aiming to gain a comprehensive profile, our proteomics workflow included filter-aided sample preparation (FASP), which increased the coverage of membrane proteins; tandem mass tag (TMT) labeling, for relative quantitation; and high resolution, high mass accuracy, high throughput mass spectrometry (MS). In total, we obtained the relative abundance ratios of 9888 proteins encoded by 6070 genes. Interestingly, we observed significant enrichment for mitochondrial proteins among the differentially expressed proteins. This finding suggests that sleep deprivation strongly affects signaling pathways that govern either energy metabolism or responses to mitochondrial stress. Additionally, the differentially-expressed proteins are enriched in pathways implicated in age-dependent neurodegenerative diseases, including Parkinson’s, Huntington’s, and Alzheimer’s, hinting at possible connections between sleep loss, mitochondrial stress, and neurodegeneration. PMID:27684481

  9. Scientist-Centered Workflow Abstractions via Generic Actors, Workflow Templates, and Context-Awareness for Groundwater Modeling and Analysis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chin, George; Sivaramakrishnan, Chandrika; Critchlow, Terence J.

    2011-07-04

    A drawback of existing scientific workflow systems is the lack of support to domain scientists in designing and executing their own scientific workflows. Many domain scientists avoid developing and using workflows because the basic objects of workflows are too low-level and high-level tools and mechanisms to aid in workflow construction and use are largely unavailable. In our research, we are prototyping higher-level abstractions and tools to better support scientists in their workflow activities. Specifically, we are developing generic actors that provide abstract interfaces to specific functionality, workflow templates that encapsulate workflow and data patterns that can be reused and adapted by scientists, and context-awareness mechanisms to gather contextual information from the workflow environment on behalf of the scientist. To evaluate these scientist-centered abstractions on real problems, we apply them to construct and execute scientific workflows in the specific domain area of groundwater modeling and analysis.

  10. On-Chip, Amplification-Free Quantification of Nucleic Acid for Point-of-Care Diagnosis

    NASA Astrophysics Data System (ADS)

    Yen, Tony Minghung

    This dissertation demonstrates three physical device concepts to overcome limitations in point-of-care quantification of nucleic acids. Enabling sensitive, high throughput nucleic acid quantification on a chip, outside of the hospital and centralized laboratory setting, is crucial for improving pathogen detection and cancer diagnosis and prognosis. Among existing platforms, microarrays have the advantages of being amplification free, low in instrument cost, and high throughput, but are generally less sensitive compared to sequencing and PCR assays. To bridge this performance gap, this dissertation presents theoretical and experimental progress to develop a platform nucleic acid quantification technology that is drastically more sensitive than current microarrays while compatible with microarray architecture. The first device concept explores on-chip nucleic acid enrichment by natural evaporation of a nucleic acid solution droplet. Using a micro-patterned super-hydrophobic black silicon array device, evaporative enrichment is coupled with a nano-liter droplet self-assembly workflow to produce 50 aM concentration sensitivity, 6 orders of dynamic range, and rapid hybridization time at under 5 minutes. The second device concept focuses on improving target copy number sensitivity, instead of concentration sensitivity. A comprehensive microarray physical model taking into account molecular transport, electrostatic intermolecular interactions, and reaction kinetics is considered to guide device optimization. Device pattern size and target copy number are optimized based on model prediction to achieve maximal hybridization efficiency. At a 100-µm pattern size, a quantum leap in detection limit of 570 copies is achieved using the black silicon array device with a self-assembled pico-liter droplet workflow. Despite its merits, evaporative enrichment on the black silicon device suffers from the coffee-ring effect at the 100-µm pattern size and is thus not compatible with clinical patient samples. The third device concept utilizes an integrated optomechanical laser system and a Cytop microarray device to reverse the coffee-ring effect during evaporative enrichment at the 100-µm pattern size. This method, named "laser-induced differential evaporation", is expected to enable a 570-copy detection limit for clinical samples in the near future. While the work is ongoing as of the writing of this dissertation, a clear research plan is in place to implement this method on the microarray platform toward clinical sample testing for disease applications and future commercialization.

  11. Optimizing high performance computing workflow for protein functional annotation.

    PubMed

    Stanberry, Larissa; Rekepalli, Bhanu; Liu, Yuan; Giblock, Paul; Higdon, Roger; Montague, Elizabeth; Broomall, William; Kolker, Natali; Kolker, Eugene

    2014-09-10

    Functional annotation of newly sequenced genomes is one of the major challenges in modern biology. With modern sequencing technologies, the protein sequence universe is rapidly expanding. Newly sequenced bacterial genomes alone contain over 7.5 million proteins. The rate of data generation has far surpassed that of protein annotation. The volume of protein data makes manual curation infeasible, whereas a high compute cost limits the utility of existing automated approaches. In this work, we present an improved and optimized automated workflow to enable large-scale protein annotation. The workflow uses high performance computing architectures and a low complexity classification algorithm to assign proteins into existing clusters of orthologous groups of proteins. On the basis of the Position-Specific Iterative Basic Local Alignment Search Tool, the algorithm ensures at least 80% specificity and sensitivity of the resulting classifications. The workflow utilizes highly scalable parallel applications for classification and sequence alignment. Using Extreme Science and Engineering Discovery Environment supercomputers, the workflow processed 1,200,000 newly sequenced bacterial proteins. With the rapid expansion of the protein sequence universe, the proposed workflow will enable scientists to annotate big genome data.

  12. Optimizing high performance computing workflow for protein functional annotation

    PubMed Central

    Stanberry, Larissa; Rekepalli, Bhanu; Liu, Yuan; Giblock, Paul; Higdon, Roger; Montague, Elizabeth; Broomall, William; Kolker, Natali; Kolker, Eugene

    2014-01-01

    Functional annotation of newly sequenced genomes is one of the major challenges in modern biology. With modern sequencing technologies, the protein sequence universe is rapidly expanding. Newly sequenced bacterial genomes alone contain over 7.5 million proteins. The rate of data generation has far surpassed that of protein annotation. The volume of protein data makes manual curation infeasible, whereas a high compute cost limits the utility of existing automated approaches. In this work, we present an improved and optimized automated workflow to enable large-scale protein annotation. The workflow uses high-performance computing architectures and a low-complexity classification algorithm to assign proteins into existing clusters of orthologous groups of proteins. On the basis of the Position-Specific Iterative Basic Local Alignment Search Tool, the algorithm ensures at least 80% specificity and sensitivity of the resulting classifications. The workflow utilizes highly scalable parallel applications for classification and sequence alignment. Using Extreme Science and Engineering Discovery Environment supercomputers, the workflow processed 1,200,000 newly sequenced bacterial proteins. With the rapid expansion of the protein sequence universe, the proposed workflow will enable scientists to annotate big genome data. PMID:25313296

  13. Coupling of a continuum ice sheet model and a discrete element calving model using a scientific workflow system

    NASA Astrophysics Data System (ADS)

    Memon, Shahbaz; Vallot, Dorothée; Zwinger, Thomas; Neukirchen, Helmut

    2017-04-01

    Scientific communities generate complex simulations through orchestration of semi-structured analysis pipelines, which involve the execution of large workflows on multiple, distributed and heterogeneous computing and data resources. Modeling ice dynamics of glaciers requires workflows consisting of many non-trivial, computationally expensive processing tasks which are coupled to each other. From this domain, we present an e-Science use case, a workflow, which requires the execution of a continuum ice flow model and a discrete element based calving model in an iterative manner. Apart from the execution, this workflow also contains data format conversion tasks that support the execution of ice flow and calving by means of transition through sequential, nested and iterative steps. Thus, the management and monitoring of all the processing tasks, including data management and transfer of the workflow model, becomes more complex. From the implementation perspective, this workflow model was initially developed on a set of scripts using static data input and output references. As the application evolved and more scripts or modifications were introduced to meet user requirements, debugging and validation of results became increasingly cumbersome. To address these problems, we identified a need for a high-level scientific workflow tool through which all the above-mentioned processes can be achieved in an efficient and usable manner. We decided to make use of the e-Science middleware UNICORE (Uniform Interface to Computing Resources), which allows seamless and automated access to different heterogeneous and distributed resources and is supported by a scientific workflow engine. Based on this, we developed a high-level scientific workflow model for coupling of massively parallel High-Performance Computing (HPC) jobs: a continuum ice sheet model (Elmer/Ice) and a discrete element calving and crevassing model (HiDEM). In our talk we present how the use of a high-level scientific workflow middleware makes reproducibility of results more convenient and also provides a reusable and portable workflow template that can be deployed across different computing infrastructures. Acknowledgements This work was kindly supported by NordForsk as part of the Nordic Center of Excellence (NCoE) eSTICC (eScience Tools for Investigating Climate Change at High Northern Latitudes) and the Top-level Research Initiative NCoE SVALI (Stability and Variation of Arctic Land Ice).
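
    The iterative coupling the abstract describes, alternating a continuum ice-flow step with a discrete calving step and converting data formats in between, follows a simple loop structure. The sketch below is a schematic of that control flow with placeholder functions; it is not the Elmer/Ice, HiDEM, or UNICORE interface.

```python
# Schematic of the iterative continuum/discrete coupling loop described above.
# All functions are hypothetical placeholders, not the actual Elmer/Ice,
# HiDEM, or UNICORE APIs.
def run_ice_flow(geometry: dict) -> dict:
    """Placeholder continuum ice-flow step: returns an updated flow state."""
    return {"geometry": geometry, "velocity": "field"}

def to_particles(continuum_state: dict) -> dict:
    """Placeholder format conversion from continuum mesh to discrete particles."""
    return {"particles": continuum_state}

def run_calving(particle_state: dict) -> dict:
    """Placeholder discrete-element calving/crevassing step."""
    return {"particles": particle_state, "calved": True}

def to_mesh(particle_state: dict) -> dict:
    """Placeholder conversion back to the continuum geometry."""
    return {"front_position": "updated"}

geometry = {"front_position": "initial"}
for step in range(3):                 # iterate: flow -> convert -> calve -> convert
    continuum = run_ice_flow(geometry)
    particles = to_particles(continuum)
    calved = run_calving(particles)
    geometry = to_mesh(calved)
print(geometry)
```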

  14. A practical workflow for making anatomical atlases for biological research.

    PubMed

    Wan, Yong; Lewis, A Kelsey; Colasanto, Mary; van Langeveld, Mark; Kardon, Gabrielle; Hansen, Charles

    2012-01-01

    The anatomical atlas has been at the intersection of science and art for centuries. These atlases are essential to biological research, but high-quality atlases are often scarce. Recent advances in imaging technology have made high-quality 3D atlases possible. However, until now there has been a lack of practical workflows using standard tools to generate atlases from images of biological samples. With certain adaptations, CG artists' workflow and tools, traditionally used in the film industry, are practical for building high-quality biological atlases. Researchers have developed a workflow for generating a 3D anatomical atlas using accessible artists' tools. They used this workflow to build a mouse limb atlas for studying the musculoskeletal system's development. This research aims to raise the awareness of using artists' tools in scientific research and promote interdisciplinary collaborations between artists and scientists. This video (http://youtu.be/g61C-nia9ms) demonstrates a workflow for creating an anatomical atlas.

  15. Isolation and characterization of circulating tumor cells using a novel workflow combining the CellSearch® system and the CellCelector™.

    PubMed

    Neumann, Martin Horst Dieter; Schneck, Helen; Decker, Yvonne; Schömer, Susanne; Franken, André; Endris, Volker; Pfarr, Nicole; Weichert, Wilko; Niederacher, Dieter; Fehm, Tanja; Neubauer, Hans

    2017-01-01

    Circulating tumor cells (CTC) are rare cells which have left the primary tumor to enter the bloodstream. Although only a small CTC subgroup is capable of extravasating, the presence of CTCs is associated with an increased risk of metastasis and a shorter overall survival. Understanding the heterogeneous CTC biology will optimize treatment decisions and will thereby improve patient outcome. For this, robust workflows for detection and isolation of CTCs are urgently required. Here, we present a workflow to characterize CTCs by combining the advantages of both the CellSearch® and the CellCelector™ micromanipulation system. CTCs were isolated from CellSearch® cartridges using the CellCelector™ system and were deposited into PCR tubes for subsequent molecular analysis (whole genome amplification (WGA) and massive parallel multigene sequencing). By a CellCelector™ screen we reidentified 97% of CellSearch® SKBR-3 cells. Furthermore, we isolated 97% of CellSearch®-proven patient CTCs using the CellCelector™ system. Therein, we found an almost perfect correlation of R² = 0.98 (Spearman's rho correlation, n = 20, p < 0.00001) between the CellSearch® CTC count (n = 271) and the CellCelector™ detected CTCs (n = 252). Isolated CTCs were analyzed by WGA and massive parallel multigene sequencing. In total, single nucleotide polymorphisms (SNPs) could be detected in 50 genes in seven CTCs, 12 MCF-7, and 3 T47D cells, respectively. Taken together, CTC quantification via the CellCelector™ system ensures a comprehensive detection of CTCs preidentified by the CellSearch® system. Moreover, the isolation of CTCs after CellSearch® using the CellCelector™ system guarantees CTC enrichment without any contaminants, enabling subsequent high throughput genomic analyses on single cell level. © 2016 American Institute of Chemical Engineers Biotechnol. Prog., 33:125-132, 2017. © 2016 American Institute of Chemical Engineers.
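
    The agreement statistic quoted above (a Spearman correlation across 20 paired CTC counts) can be reproduced for any pair of count vectors with a few lines of SciPy; the counts below are invented for illustration and are not the study data.

```python
# Spearman rank correlation between two CTC counting methods.
# The counts are fabricated for illustration; they are not the
# CellSearch/CellCelector data from the study.
from scipy.stats import spearmanr

cellsearch_counts   = [3, 7, 0, 25, 12, 5, 1, 40, 9, 2]
cellcelector_counts = [3, 6, 0, 24, 11, 5, 1, 38, 9, 2]

rho, p_value = spearmanr(cellsearch_counts, cellcelector_counts)
print(f"Spearman rho = {rho:.3f}, p = {p_value:.2g}")
```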

  16. Multi-modality molecular imaging: pre-clinical laboratory configuration

    NASA Astrophysics Data System (ADS)

    Wu, Yanjun; Wellen, Jeremy W.; Sarkar, Susanta K.

    2006-02-01

    In recent years, the prevalence of in vivo molecular imaging applications has rapidly increased. Here we report on the construction of a multi-modality imaging facility in a pharmaceutical setting that is expected to further advance existing capabilities for in vivo imaging of drug distribution and the interaction with their target. The imaging instrumentation in our facility includes a microPET scanner, a four wavelength time-domain optical imaging scanner, a 9.4T/30cm MRI scanner and a SPECT/X-ray CT scanner. An electronics shop and a computer room dedicated to image analysis are additional features of the facility. The layout of the facility was designed with a central animal preparation room surrounded by separate laboratory rooms for each of the major imaging modalities to accommodate the work-flow of simultaneous in vivo imaging experiments. This report will focus on the design of and anticipated applications for our microPET and optical imaging laboratory spaces. Additionally, we will discuss efforts to maximize the daily throughput of animal scans through development of efficient experimental work-flows and the use of multiple animals in a single scanning session.

  17. From the desktop to the grid: scalable bioinformatics via workflow conversion.

    PubMed

    de la Garza, Luis; Veit, Johannes; Szolek, Andras; Röttig, Marc; Aiche, Stephan; Gesing, Sandra; Reinert, Knut; Kohlbacher, Oliver

    2016-03-12

    Reproducibility is one of the tenets of the scientific method. Scientific experiments often comprise complex data flows, selection of adequate parameters, and analysis and visualization of intermediate and end results. Breaking down the complexity of such experiments into the joint collaboration of small, repeatable, well-defined tasks, each with well-defined inputs, parameters, and outputs, offers immediate benefits such as identifying bottlenecks and pinpointing sections that could benefit from parallelization. Workflows rest upon the notion of splitting complex work into the joint effort of several manageable tasks. There are several engines that give users the ability to design and execute workflows. Each engine was created to address certain problems of a specific community; therefore, each has its advantages and shortcomings. Furthermore, not all features of all workflow engines are royalty-free, an aspect that could potentially drive away members of the scientific community. We have developed a set of tools that enables the scientific community to benefit from workflow interoperability. We developed a platform-free structured representation of parameters, inputs, and outputs of command-line tools in so-called Common Tool Descriptor documents. We have also overcome the shortcomings and combined the features of two royalty-free workflow engines with a substantial user community: the Konstanz Information Miner, an engine which we see as a formidable workflow editor, and the Grid and User Support Environment, a web-based framework able to interact with several high-performance computing resources. We have thus created a free and highly accessible way to design workflows on a desktop computer and execute them on high-performance computing resources. Our work will not only reduce time spent on designing scientific workflows, but also make executing workflows on remote high-performance computing resources more accessible to technically inexperienced users. We strongly believe that our efforts not only decrease the turnaround time to obtain scientific results but also have a positive impact on reproducibility, thus elevating the quality of obtained scientific results.

  18. Corn and sorghum phenotyping using a fixed-wing UAV-based remote sensing system

    NASA Astrophysics Data System (ADS)

    Shi, Yeyin; Murray, Seth C.; Rooney, William L.; Valasek, John; Olsenholler, Jeff; Pugh, N. Ace; Henrickson, James; Bowden, Ezekiel; Zhang, Dongyan; Thomasson, J. Alex

    2016-05-01

    Recent development of unmanned aerial systems has created opportunities in automation of field-based high-throughput phenotyping by lowering flight operational cost and complexity and allowing flexible re-visit time and higher image resolution than satellite or manned airborne remote sensing. In this study, flights were conducted over corn and sorghum breeding trials in College Station, Texas, with a fixed-wing unmanned aerial vehicle (UAV) carrying two multispectral cameras and a high-resolution digital camera. The objectives were to establish the workflow and investigate the ability of UAV-based remote sensing to automate data collection of plant traits to develop genetic and physiological models. Most important among these traits were plant height and number of plants, which are currently collected manually at high labor cost. Vegetation indices were calculated for each breeding cultivar from mosaicked and radiometrically calibrated multi-band imagery in order to be correlated with ground-measured plant heights, populations and yield across high genetic-diversity breeding cultivars. Growth curves were profiled with the aerially measured time-series height and vegetation index data. The next step of this study will be to investigate the correlations between aerial measurements and ground truth measured manually in the field and from lab tests.
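
    Vegetation indices of the kind mentioned above are simple band ratios computed per plot from the calibrated multispectral mosaic. The sketch below computes NDVI, used here only as a representative index since the abstract does not name which indices were calculated, from red and near-infrared reflectance arrays.

```python
# NDVI computed from calibrated red and near-infrared reflectance bands.
# NDVI is used as a representative vegetation index; the study does not
# specify which indices were calculated.
import numpy as np

red = np.array([[0.08, 0.10], [0.12, 0.09]])   # per-pixel reflectance within one plot
nir = np.array([[0.45, 0.50], [0.40, 0.48]])

ndvi = (nir - red) / (nir + red + 1e-9)          # small epsilon avoids divide-by-zero
plot_mean_ndvi = ndvi.mean()                     # one value per breeding plot
print(ndvi)
print(plot_mean_ndvi)
```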

  19. Negative Electron Transfer Dissociation Sequencing of 3-O-Sulfation-Containing Heparan Sulfate Oligosaccharides

    NASA Astrophysics Data System (ADS)

    Wu, Jiandong; Wei, Juan; Hogan, John D.; Chopra, Pradeep; Joshi, Apoorva; Lu, Weigang; Klein, Joshua; Boons, Geert-Jan; Lin, Cheng; Zaia, Joseph

    2018-03-01

    Among dissociation methods, negative electron transfer dissociation (NETD) has been proven the most useful for glycosaminoglycan (GAG) sequencing because it produces informative fragmentation, a low degree of sulfate losses, high sensitivity, and translatability to multiple instrument types. The challenge, however, is to distinguish positional sulfation. In particular, NETD has been reported to fail to differentiate 4-O- versus 6-O-sulfation in chondroitin sulfate decasaccharide. This raised the concern of whether NETD is able to differentiate the rare 3-O-sulfation from predominant 6-O-sulfation in heparan sulfate (HS) oligosaccharides. Here, we report that NETD generates highly informative spectra that differentiate sites of O-sulfation on glucosamine residues, enabling structural characterizations of synthetic HS isomers containing 3-O-sulfation. Further, lyase-resistant 3-O-sulfated tetrasaccharides from natural sources were successfully sequenced. Notably, for all of the oligosaccharides in this study, the successful sequencing is based on NETD tandem mass spectra of commonly observed deprotonated precursor ions without derivatization or metal cation adduction, simplifying the experimental workflow and data interpretation. These results demonstrate the potential of NETD as a sensitive analytical tool for detailed, high-throughput structural analysis of highly sulfated GAGs.

  20. A high-throughput system for high-quality tomographic reconstruction of large datasets at Diamond Light Source

    PubMed Central

    Atwood, Robert C.; Bodey, Andrew J.; Price, Stephen W. T.; Basham, Mark; Drakopoulos, Michael

    2015-01-01

    Tomographic datasets collected at synchrotrons are becoming very large and complex, and, therefore, need to be managed efficiently. Raw images may have high pixel counts, and each pixel can be multidimensional and associated with additional data such as those derived from spectroscopy. In time-resolved studies, hundreds of tomographic datasets can be collected in sequence, yielding terabytes of data. Users of tomographic beamlines are drawn from various scientific disciplines, and many are keen to use tomographic reconstruction software that does not require a deep understanding of reconstruction principles. We have developed Savu, a reconstruction pipeline that enables users to rapidly reconstruct data to consistently create high-quality results. Savu is designed to work in an ‘orthogonal’ fashion, meaning that data can be converted between projection and sinogram space throughout the processing workflow as required. The Savu pipeline is modular and allows processing strategies to be optimized for users' purposes. In addition to the reconstruction algorithms themselves, it can include modules for identification of experimental problems, artefact correction, general image processing and data quality assessment. Savu is open source, open licensed and ‘facility-independent’: it can run on standard cluster infrastructure at any institution. PMID:25939626
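
    The 'orthogonal' processing this abstract describes amounts to viewing the same 3D data block either as projections (one 2D image per rotation angle) or as sinograms (one angle-by-detector-column slice per detector row). Assuming the data are stored as an (angle, row, column) NumPy array, an assumption for illustration rather than Savu's actual data model, the conversion is an axis swap:

```python
# Switching between projection space and sinogram space for a tomography
# stack assumed to be stored as (angle, detector_row, detector_column).
# This illustrates the general idea, not Savu's internal data model.
import numpy as np

n_angles, n_rows, n_cols = 180, 4, 6
projections = np.random.rand(n_angles, n_rows, n_cols)

# One sinogram per detector row: shape (n_rows, n_angles, n_cols)
sinograms = np.transpose(projections, (1, 0, 2))

print(projections[:, 2, :].shape)  # sinogram of row 2, sliced from projection space
print(sinograms[2].shape)          # the same slice, taken from sinogram space
assert np.array_equal(projections[:, 2, :], sinograms[2])
```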

  1. Improving data workflow systems with cloud services and use of open data for bioinformatics research.

    PubMed

    Karim, Md Rezaul; Michel, Audrey; Zappa, Achille; Baranov, Pavel; Sahay, Ratnesh; Rebholz-Schuhmann, Dietrich

    2017-04-16

    Data workflow systems (DWFSs) enable bioinformatics researchers to combine components for data access and data analytics, and to share the final data analytics approach with their collaborators. Increasingly, such systems have to cope with large-scale data, such as full genomes (about 200 GB each), public fact repositories (about 100 TB of data) and 3D imaging data at even larger scales. As moving the data becomes cumbersome, the DWFS needs to embed its processes into a cloud infrastructure, where the data are already hosted. As the standardized public data play an increasingly important role, the DWFS needs to comply with Semantic Web technologies. This advancement to DWFS would reduce overhead costs and accelerate the progress in bioinformatics research based on large-scale data and public resources, as researchers would require less specialized IT knowledge for the implementation. Furthermore, the high data growth rates in bioinformatics research drive the demand for parallel and distributed computing, which then imposes a need for scalability and high-throughput capabilities onto the DWFS. As a result, requirements for data sharing and access to public knowledge bases suggest that compliance of the DWFS with Semantic Web standards is necessary. In this article, we will analyze the existing DWFS with regard to their capabilities toward public open data use as well as large-scale computational and human interface requirements. We untangle the parameters for selecting a preferable solution for bioinformatics research with particular consideration to using cloud services and Semantic Web technologies. Our analysis leads to research guidelines and recommendations toward the development of future DWFS for the bioinformatics research community. © The Author 2017. Published by Oxford University Press.

  2. Development of a Multiplexed Liquid Chromatography Multiple-Reaction-Monitoring Mass Spectrometry (LC-MRM/MS) Method for Evaluation of Salivary Proteins as Oral Cancer Biomarkers.

    PubMed

    Chen, Yi-Ting; Chen, Hsiao-Wei; Wu, Chun-Feng; Chu, Lichieh Julie; Chiang, Wei-Fang; Wu, Chih-Ching; Yu, Jau-Song; Tsai, Cheng-Han; Liang, Kung-Hao; Chang, Yu-Sun; Wu, Maureen; Ou Yang, Wei-Ting

    2017-05-01

    Multiple (selected) reaction monitoring (MRM/SRM) of peptides is a growing technology for target protein quantification because it is more robust, precise, accurate, high-throughput, and multiplex-capable than antibody-based techniques. The technique has been applied clinically to the large-scale quantification of multiple target proteins in different types of fluids. However, previous MRM-based studies have placed less focus on sample-preparation workflow and analytical performance in the precise quantification of proteins in saliva, a noninvasively sampled body fluid. In this study, we evaluated the analytical performance of a simple and robust multiple reaction monitoring (MRM)-based targeted proteomics approach incorporating liquid chromatography with mass spectrometry detection (LC-MRM/MS). This platform was used to quantitatively assess the biomarker potential of a group of 56 salivary proteins that have previously been associated with human cancers. To further enhance the development of this technology for assay of salivary samples, we optimized the workflow for salivary protein digestion and evaluated quantification performance, robustness and technical limitations in analyzing clinical samples. Using a clinically well-characterized cohort of two independent clinical sample sets (total n = 119), we quantitatively characterized these protein biomarker candidates in saliva specimens from controls and oral squamous cell carcinoma (OSCC) patients. The results clearly showed a significant elevation of most targeted proteins in saliva samples from OSCC patients compared with controls. Overall, this platform was capable of assaying the most highly multiplexed panel of salivary protein biomarkers, highlighting the clinical utility of MRM in oral cancer biomarker research. © 2017 by The American Society for Biochemistry and Molecular Biology, Inc.

  3. Technology platform development for targeted plasma metabolites in human heart failure.

    PubMed

    Chan, Cy X'avia; Khan, Anjum A; Choi, Jh Howard; Ng, Cm Dominic; Cadeiras, Martin; Deng, Mario; Ping, Peipei

    2013-01-01

    Heart failure is a multifactorial disease associated with staggeringly high morbidity and mortality. Recently, alterations of multiple metabolites have been implicated in heart failure; however, the lack of an effective technology platform to assess these metabolites has limited our understanding of how they contribute to this disease phenotype. We have successfully developed a new workflow combining specific sample preparation with tandem mass spectrometry that enables us to extract most of the targeted metabolites. Nineteen metabolites were chosen for their biological relevance to heart failure, including extracellular matrix remodeling, inflammation, insulin resistance, renal dysfunction, and cardioprotection against ischemic injury. In this report, we systematically engineered, optimized and refined a protocol applicable to human plasma samples; this study contributes to the methodology development with respect to deproteinization, incubation, reconstitution, and detection with mass spectrometry. The deproteinization step was optimized with 20% methanol/ethanol at a plasma:solvent ratio of 1:3. Subsequently, an incubation step was implemented that markedly enhanced the metabolite signals and the number of metabolite peaks detected by mass spectrometry in both positive and negative modes. With respect to the reconstitution step, 0.1% formic acid was chosen as the reconstitution solvent over 6.5 mM ammonium bicarbonate because a comparable number of metabolite peaks was detected in both solvents, yet the signal in the former was higher. By adapting this finalized protocol, we were able to retrieve 13 out of 19 targeted metabolites from human plasma. We have successfully devised a simple albeit effective workflow for the targeted plasma metabolites relevant to human heart failure. This will be employed in tandem with a high-throughput liquid chromatography mass spectrometry platform to validate and characterize these potential metabolic biomarkers for diagnostic and therapeutic development for heart failure patients.

  4. A general concept for consistent documentation of computational analyses

    PubMed Central

    Müller, Fabian; Nordström, Karl; Lengauer, Thomas; Schulz, Marcel H.

    2015-01-01

    The ever-growing amount of data in the field of life sciences demands standardized ways of high-throughput computational analysis. This standardization requires a thorough documentation of each step in the computational analysis to enable researchers to understand and reproduce the results. However, due to the heterogeneity in software setups and the high rate of change during tool development, reproducibility is hard to achieve. One reason is that there is no common agreement in the research community on how to document computational studies. In many cases, simple flat files or other unstructured text documents are provided by researchers as documentation, which are often missing software dependencies, versions and sufficient documentation to understand the workflow and parameter settings. As a solution we suggest a simple and modest approach for documenting and verifying computational analysis pipelines. We propose a two-part scheme that defines a computational analysis using a Process and an Analysis metadata document, which jointly describe all necessary details to reproduce the results. In this design we separate the metadata specifying the process from the metadata describing an actual analysis run, thereby reducing the effort of manual documentation to an absolute minimum. Our approach is independent of a specific software environment, results in human readable XML documents that can easily be shared with other researchers and allows an automated validation to ensure consistency of the metadata. Because our approach has been designed with little to no assumptions concerning the workflow of an analysis, we expect it to be applicable in a wide range of computational research fields. Database URL: http://deep.mpi-inf.mpg.de/DAC/cmds/pub/pyvalid.zip PMID:26055099
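
    The two-part scheme this abstract proposes, one document describing the reusable process and one describing a concrete analysis run, can be illustrated with a few lines of Python that emit minimal XML. The element and attribute names below are invented for illustration; they are not the schema defined by the authors.

```python
# Sketch of the two-part documentation idea described above: one document
# describing the reusable process, one describing a concrete analysis run.
# Element and attribute names are invented, not the authors' schema.
import xml.etree.ElementTree as ET

process = ET.Element("process", name="read_alignment", version="1.0")
ET.SubElement(process, "tool", name="bwa", version="0.7.17")
ET.SubElement(process, "parameter", name="threads", default="4")

analysis = ET.Element("analysis", process="read_alignment")
ET.SubElement(analysis, "input", path="sample_R1.fastq.gz")
ET.SubElement(analysis, "parameter", name="threads", value="8")
ET.SubElement(analysis, "output", path="sample.bam")

print(ET.tostring(process, encoding="unicode"))
print(ET.tostring(analysis, encoding="unicode"))
```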

  5. Encapsulating model complexity and landscape-scale analyses of state-and-transition simulation models: an application of ecoinformatics and juniper encroachment in sagebrush steppe ecosystems

    USGS Publications Warehouse

    O'Donnell, Michael

    2015-01-01

    State-and-transition simulation modeling relies on knowledge of vegetation composition and structure (states) that describe community conditions, mechanistic feedbacks such as fire that can affect vegetation establishment, and ecological processes that drive community conditions as well as the transitions between these states. However, as the need for modeling larger and more complex landscapes increases, a more advanced awareness of computing resources becomes essential. The objectives of this study include identifying challenges of executing state-and-transition simulation models, identifying common bottlenecks of computing resources, developing a workflow and software that enable parallel processing of Monte Carlo simulations, and identifying the advantages and disadvantages of different computing resources. To address these objectives, this study used the ApexRMS® SyncroSim software and embarrassingly parallel tasks of Monte Carlo simulations on a single multicore computer and on distributed computing systems. The results demonstrated that state-and-transition simulation models scale best in distributed computing environments, such as high-throughput and high-performance computing, because these environments distribute the workloads across many compute nodes, thereby supporting analysis of larger landscapes, higher spatial resolution vegetation products, and more complex models. Using a case study and five different computing environments, the best result (high-throughput computing versus serial computation) showed a decrease of approximately 96.6% in computing time. With a single multicore compute node (the lowest-ranked result), computing time decreased by 81.8% relative to serial computation. These results provide insight into the tradeoffs of using different computing resources when research necessitates advanced integration of ecoinformatics incorporating large and complicated data inputs and models.
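
    "Embarrassingly parallel" here means each Monte Carlo replicate is independent, so replicates can simply be farmed out to workers. Below is a minimal sketch of that pattern on a single multicore node; the simulation itself is a trivial placeholder, not the SyncroSim model from the study.

```python
# Embarrassingly parallel Monte Carlo replicates on a multicore node.
# run_replicate is a trivial placeholder for a state-and-transition
# simulation; it is not the SyncroSim model used in the study.
import random
from multiprocessing import Pool

def run_replicate(seed: int) -> float:
    """One independent Monte Carlo replicate: fraction of cells that transition."""
    rng = random.Random(seed)
    transitions = sum(rng.random() < 0.3 for _ in range(10_000))
    return transitions / 10_000

if __name__ == "__main__":
    seeds = range(100)                       # 100 independent replicates
    with Pool() as pool:                     # scattered across available cores
        results = pool.map(run_replicate, seeds)
    print(sum(results) / len(results))       # aggregate across replicates
```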

  6. Optimization and validation of sample preparation for metagenomic sequencing of viruses in clinical samples.

    PubMed

    Lewandowska, Dagmara W; Zagordi, Osvaldo; Geissberger, Fabienne-Desirée; Kufner, Verena; Schmutz, Stefan; Böni, Jürg; Metzner, Karin J; Trkola, Alexandra; Huber, Michael

    2017-08-08

    Sequence-specific PCR is the most common approach for virus identification in diagnostic laboratories. However, as specific PCR only detects pre-defined targets, novel virus strains or viruses not included in routine test panels will be missed. Recent advances in high-throughput sequencing allow virus-sequence-independent identification of entire virus populations in clinical samples, yet standardized protocols are needed to allow broad application in clinical diagnostics. Here, we describe a comprehensive sample preparation protocol for high-throughput metagenomic virus sequencing using random amplification of total nucleic acids from clinical samples. In order to optimize metagenomic sequencing for application in virus diagnostics, we tested different enrichment and amplification procedures on plasma samples spiked with RNA and DNA viruses. A protocol including filtration, nuclease digestion, and random amplification of RNA and DNA in separate reactions provided the best results, allowing reliable recovery of viral genomes and a good correlation of the relative number of sequencing reads with the virus input. We further validated our method by sequencing a multiplexed viral pathogen reagent containing a range of human viruses from different virus families. Our method proved successful in detecting the majority of the included viruses with high read numbers and compared well to other protocols in the field validated against the same reference reagent. Our sequencing protocol works not only with plasma but also with other clinical samples such as urine and throat swabs. The workflow for virus metagenomic sequencing that we established proved successful in detecting a variety of viruses in different clinical samples. Our protocol supplements existing virus-specific detection strategies, providing opportunities to identify atypical and novel viruses commonly not accounted for in routine diagnostic panels.

  7. Immunoglobulin G (IgG) Fab glycosylation analysis using a new mass spectrometric high-throughput profiling method reveals pregnancy-associated changes.

    PubMed

    Bondt, Albert; Rombouts, Yoann; Selman, Maurice H J; Hensbergen, Paul J; Reiding, Karli R; Hazes, Johanna M W; Dolhain, Radboud J E M; Wuhrer, Manfred

    2014-11-01

    The N-linked glycosylation of the constant fragment (Fc) of immunoglobulin G has been shown to change during pathological and physiological events and to strongly influence antibody inflammatory properties. In contrast, little is known about Fab-linked N-glycosylation, carried by ∼ 20% of IgG. Here we present a high-throughput workflow to analyze Fab and Fc glycosylation of polyclonal IgG purified from 5 μl of serum. We were able to detect and quantify 37 different N-glycans by means of MALDI-TOF-MS analysis in reflectron positive mode using a novel linkage-specific derivatization of sialic acid. This method was applied to 174 samples of a pregnancy cohort to reveal Fab glycosylation features and their change with pregnancy. Data analysis revealed marked differences between Fab and Fc glycosylation, especially in the levels of galactosylation and sialylation, incidence of bisecting GlcNAc, and presence of high mannose structures, which were all higher in the Fab portion than the Fc, whereas Fc showed higher levels of fucosylation. Additionally, we observed several changes during pregnancy and after delivery. Fab N-glycan sialylation was increased and bisection was decreased relative to postpartum time points, and nearly complete galactosylation of Fab glycans was observed throughout. Fc glycosylation changes were similar to results described before, with increased galactosylation and sialylation and decreased bisection during pregnancy. We expect that the parallel analysis of IgG Fab and Fc, as set up in this paper, will be important for unraveling roles of these glycans in (auto)immunity, which may be mediated via recognition by human lectins or modulation of antigen binding. © 2014 by The American Society for Biochemistry and Molecular Biology, Inc.

  8. Immunoglobulin G (IgG) Fab Glycosylation Analysis Using a New Mass Spectrometric High-throughput Profiling Method Reveals Pregnancy-associated Changes*

    PubMed Central

    Bondt, Albert; Rombouts, Yoann; Selman, Maurice H. J.; Hensbergen, Paul J.; Reiding, Karli R.; Hazes, Johanna M. W.; Dolhain, Radboud J. E. M.; Wuhrer, Manfred

    2014-01-01

    The N-linked glycosylation of the constant fragment (Fc) of immunoglobulin G has been shown to change during pathological and physiological events and to strongly influence antibody inflammatory properties. In contrast, little is known about Fab-linked N-glycosylation, carried by ∼20% of IgG. Here we present a high-throughput workflow to analyze Fab and Fc glycosylation of polyclonal IgG purified from 5 μl of serum. We were able to detect and quantify 37 different N-glycans by means of MALDI-TOF-MS analysis in reflectron positive mode using a novel linkage-specific derivatization of sialic acid. This method was applied to 174 samples of a pregnancy cohort to reveal Fab glycosylation features and their change with pregnancy. Data analysis revealed marked differences between Fab and Fc glycosylation, especially in the levels of galactosylation and sialylation, incidence of bisecting GlcNAc, and presence of high mannose structures, which were all higher in the Fab portion than the Fc, whereas Fc showed higher levels of fucosylation. Additionally, we observed several changes during pregnancy and after delivery. Fab N-glycan sialylation was increased and bisection was decreased relative to postpartum time points, and nearly complete galactosylation of Fab glycans was observed throughout. Fc glycosylation changes were similar to results described before, with increased galactosylation and sialylation and decreased bisection during pregnancy. We expect that the parallel analysis of IgG Fab and Fc, as set up in this paper, will be important for unraveling roles of these glycans in (auto)immunity, which may be mediated via recognition by human lectins or modulation of antigen binding. PMID:25004930

  9. A robust, high-throughput method for computing maize ear, cob, and kernel attributes automatically from images.

    PubMed

    Miller, Nathan D; Haase, Nicholas J; Lee, Jonghyun; Kaeppler, Shawn M; de Leon, Natalia; Spalding, Edgar P

    2017-01-01

    Grain yield of the maize plant depends on the sizes, shapes, and numbers of ears and the kernels they bear. An automated pipeline that can measure these components of yield from easily-obtained digital images is needed to advance our understanding of this globally important crop. Here we present three custom algorithms designed to compute such yield components automatically from digital images acquired by a low-cost platform. One algorithm determines the average space each kernel occupies along the cob axis using a sliding-window Fourier transform analysis of image intensity features. A second counts individual kernels removed from ears, including those in clusters. A third measures each kernel's major and minor axis after a Bayesian analysis of contour points identifies the kernel tip. Dimensionless ear and kernel shape traits that may interrelate yield components are measured by principal components analysis of contour point sets. Increased objectivity and speed compared to typical manual methods are achieved without loss of accuracy as evidenced by high correlations with ground truth measurements and simulated data. Millimeter-scale differences among ear, cob, and kernel traits that ranged more than 2.5-fold across a diverse group of inbred maize lines were resolved. This system for measuring maize ear, cob, and kernel attributes is being used by multiple research groups as an automated Web service running on community high-throughput computing and distributed data storage infrastructure. Users may create their own workflow using the source code that is staged for download on a public repository. © 2016 The Authors. The Plant Journal published by Society for Experimental Biology and John Wiley & Sons Ltd.
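
    The first algorithm mentioned above infers the average space each kernel occupies from the dominant spatial frequency of an intensity profile taken along the cob axis. A minimal sketch of that idea on a synthetic 1D intensity signal is given below; the signal and window settings are illustrative assumptions, not the authors' implementation.

```python
# Estimating average kernel spacing from the dominant spatial frequency of
# an intensity profile along the cob axis. The synthetic signal and window
# settings are illustrative; this is not the authors' pipeline.
import numpy as np

pixels_per_kernel = 25
axis = np.arange(1000)                                      # pixel positions along the cob
profile = 1 + np.sin(2 * np.pi * axis / pixels_per_kernel)  # bright/dark kernel pattern

window = profile[200:712]                                   # one sliding-window position
spectrum = np.abs(np.fft.rfft(window - window.mean()))      # mean removal suppresses DC
freqs = np.fft.rfftfreq(window.size, d=1.0)                 # cycles per pixel

dominant = freqs[np.argmax(spectrum)]
print(f"estimated kernel spacing: {1 / dominant:.1f} pixels")  # close to the true 25-pixel spacing
```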

  10. LightAssembler: fast and memory-efficient assembly algorithm for high-throughput sequencing reads.

    PubMed

    El-Metwally, Sara; Zakaria, Magdi; Hamza, Taher

    2016-11-01

    The deluge of current sequenced data has exceeded Moore's Law, more than doubling every 2 years since the next-generation sequencing (NGS) technologies were invented. Accordingly, we will be able to generate more and more data with high speed at fixed cost, but lack the computational resources to store, process and analyze it. With error-prone high-throughput NGS reads and genomic repeats, the assembly graph contains a massive amount of redundant nodes and branching edges. Most assembly pipelines require this large graph to reside in memory to start their workflows, which is intractable for mammalian genomes. Resource-efficient genome assemblers combine both the power of advanced computing techniques and innovative data structures to encode the assembly graph efficiently in a computer memory. LightAssembler is a lightweight assembly algorithm designed to be executed on a desktop machine. It uses a pair of cache-oblivious Bloom filters, one holding a uniform sample of [Formula: see text]-spaced sequenced [Formula: see text]-mers and the other holding [Formula: see text]-mers classified as likely correct, using a simple statistical test. LightAssembler contains a light implementation of the graph traversal and simplification modules that achieves comparable assembly accuracy and contiguity to other competing tools. Our method reduces the memory usage by [Formula: see text] compared to the resource-efficient assemblers using benchmark datasets from GAGE and Assemblathon projects. While LightAssembler can be considered a gap-based sequence assembler, different gap sizes result in an almost constant assembly size and genome coverage. https://github.com/SaraEl-Metwally/LightAssembler CONTACT: sarah_almetwally4@mans.edu.eg. Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
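
    A Bloom filter of the kind this abstract relies on is a bit array plus several hash functions; membership queries can yield false positives but never false negatives, which is what makes the structure so memory-frugal for k-mer screening. The sketch below illustrates the data structure only; it is not LightAssembler's cache-oblivious implementation.

```python
# Minimal Bloom filter for k-mer membership queries. This illustrates the
# data structure only; it is not LightAssembler's cache-oblivious design.
import hashlib

class BloomFilter:
    def __init__(self, size_bits: int = 1 << 20, num_hashes: int = 3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item: str):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))

kmers = BloomFilter()
for kmer in ("ACGTACGTACGTACGTACGTA", "TTTTACGTACGTACGTACGTA"):
    kmers.add(kmer)
print("ACGTACGTACGTACGTACGTA" in kmers)  # True
print("GGGGACGTACGTACGTACGTA" in kmers)  # almost certainly False (small false-positive rate)
```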

  11. Managing and Communicating Operational Workflow: Designing and Implementing an Electronic Outpatient Whiteboard.

    PubMed

    Steitz, Bryan D; Weinberg, Stuart T; Danciu, Ioana; Unertl, Kim M

    2016-01-01

    Healthcare team members in emergency department contexts have used electronic whiteboard solutions to help manage operational workflow for many years. Ambulatory clinic settings have highly complex operational workflow, but are still limited in electronic assistance to communicate and coordinate work activities. To describe and discuss the design, implementation, use, and ongoing evolution of a coordination and collaboration tool supporting ambulatory clinic operational workflow at Vanderbilt University Medical Center (VUMC). The outpatient whiteboard tool was initially designed to support healthcare work related to an electronic chemotherapy order-entry application. After a highly successful initial implementation in an oncology context, a high demand emerged across the organization for the outpatient whiteboard implementation. Over the past 10 years, developers have followed an iterative user-centered design process to evolve the tool. The electronic outpatient whiteboard system supports 194 separate whiteboards and is accessed by over 2800 distinct users on a typical day. Clinics can configure their whiteboards to support unique workflow elements. Since initial release, features such as immunization clinical decision support have been integrated into the system, based on requests from end users. The success of the electronic outpatient whiteboard demonstrates the usefulness of an operational workflow tool within the ambulatory clinic setting. Operational workflow tools can play a significant role in supporting coordination, collaboration, and teamwork in ambulatory healthcare settings.

  12. Proposal for a common nomenclature for fragment ions in mass spectra of lipids

    PubMed Central

    Hartler, Jürgen; Christiansen, Klaus; Gallego, Sandra F.; Peng, Bing; Ahrends, Robert

    2017-01-01

    Advances in mass spectrometry-based lipidomics have in recent years prompted efforts to standardize the annotation of the vast number of lipid molecules that can be detected in biological systems. These efforts have focused on cataloguing, naming and drawing chemical structures of intact lipid molecules, but have provided no guidelines for annotation of lipid fragment ions detected using tandem and multi-stage mass spectrometry, albeit these fragment ions are mandatory for structural elucidation and high confidence lipid identification, especially in high throughput lipidomics workflows. Here we propose a nomenclature for the annotation of lipid fragment ions, describe its implementation and present a freely available web application, termed ALEX123 lipid calculator, that can be used to query a comprehensive database featuring curated lipid fragmentation information for more than 430,000 potential lipid molecules from 47 lipid classes covering five lipid categories. We note that the nomenclature is generic, extendable to stable isotope-labeled lipid molecules and applicable to automated annotation of fragment ions detected by most contemporary lipidomics platforms, including LC-MS/MS-based routines. PMID:29161304

  13. Proposal for a common nomenclature for fragment ions in mass spectra of lipids.

    PubMed

    Pauling, Josch K; Hermansson, Martin; Hartler, Jürgen; Christiansen, Klaus; Gallego, Sandra F; Peng, Bing; Ahrends, Robert; Ejsing, Christer S

    2017-01-01

    Advances in mass spectrometry-based lipidomics have in recent years prompted efforts to standardize the annotation of the vast number of lipid molecules that can be detected in biological systems. These efforts have focused on cataloguing, naming and drawing chemical structures of intact lipid molecules, but have provided no guidelines for annotation of lipid fragment ions detected using tandem and multi-stage mass spectrometry, albeit these fragment ions are mandatory for structural elucidation and high confidence lipid identification, especially in high throughput lipidomics workflows. Here we propose a nomenclature for the annotation of lipid fragment ions, describe its implementation and present a freely available web application, termed ALEX123 lipid calculator, that can be used to query a comprehensive database featuring curated lipid fragmentation information for more than 430,000 potential lipid molecules from 47 lipid classes covering five lipid categories. We note that the nomenclature is generic, extendable to stable isotope-labeled lipid molecules and applicable to automated annotation of fragment ions detected by most contemporary lipidomics platforms, including LC-MS/MS-based routines.

  14. Native Mass Spectrometry, Ion mobility, and Collision-Induced Unfolding Categorize Malaria Antigen/Antibody Binding

    NASA Astrophysics Data System (ADS)

    Huang, Yining; Salinas, Nichole D.; Chen, Edwin; Tolia, Niraj H.; Gross, Michael L.

    2017-09-01

    Plasmodium vivax Duffy Binding Protein (PvDBP) is a promising vaccine candidate for P. vivax malaria. Recently, we reported the epitopes on PvDBP region II (PvDBP-II) for three inhibitory monoclonal antibodies (2D10, 2H2, and 2C6). In this communication, we describe the combination of native mass spectrometry and ion mobility (IM) with collision-induced unfolding (CIU) to study the conformation and stabilities of three malarial antigen-antibody complexes. These complexes, when collisionally activated, undergo conformational changes that depend on the location of the epitope. CIU patterns for PvDBP-II in complex with antibodies 2D10 and 2H2 are highly similar, indicating comparable binding topology and stability. A different CIU fingerprint is observed for PvDBP-II/2C6, indicating that 2C6 binds to PvDBP-II on an epitope different from 2D10 and 2H2. This work supports the use of CIU as a means of classifying antigen-antibody complexes by their epitope maps in a high-throughput screening workflow.

  15. Workflow continuity--moving beyond business continuity in a multisite 24-7 healthcare organization.

    PubMed

    Kolowitz, Brian J; Lauro, Gonzalo Romero; Barkey, Charles; Black, Harry; Light, Karen; Deible, Christopher

    2012-12-01

    As hospitals move towards providing in-house 24 × 7 services, there is an increasing need for information systems to be available around the clock. This study investigates one organization's need for a workflow continuity solution that provides around-the-clock availability for information systems that do not provide highly available services. The organization investigated is a large multifacility healthcare organization that consists of 20 hospitals and more than 30 imaging centers. A case analysis approach was used to investigate the organization's efforts. The results show a 94% overall reduction from 2008 to 2011 in downtime during which radiologists could not continue their normal workflow on the integrated Picture Archiving and Communications System (PACS) solution. The impact of unplanned downtimes was reduced by 72% while the impact of planned downtimes was reduced by 99.66% over the same period. Additionally, more than 98 h of radiologist impact due to a PACS upgrade in 2008 was entirely eliminated in 2011 utilizing the system created by the workflow continuity approach. Workflow continuity differs from high availability and business continuity in its design process and available services. Workflow continuity only ensures that critical workflows are available when the production system is unavailable due to scheduled or unscheduled downtimes. Workflow continuity works in conjunction with business continuity and highly available system designs. The results of this investigation revealed that this approach can add significant value to organizations because impact on users is minimized if not eliminated entirely.

  16. Objectifying user critique. A means of continuous quality assurance for physician discharge letter composition.

    PubMed

    Oschem, M; Mahler, V; Prokosch, H U

    2011-01-01

    The aim of this study is to objectify user critique rendering it usable for quality assurance. Based on formative and summative evaluation results we strive to promote software improvements; in our case, the physician discharge letter composition process at the Department of Dermatology, University Hospital Erlangen, Germany. We developed a novel six-step approach to objectify user critique: 1) acquisition of user critique using subjectivist methods, 2) creation of a workflow model, 3) definition of hypothesis and indicators, 4) measuring of indicators, 5) analyzing results, 6) optimization of the system regarding both subjectivist and objectivist evaluation results. In particular, we derived indicators and workflows directly from user critique/narratives. The identified indicators were mapped onto workflow activities, creating a link between user critique and the evaluated system. Users criticized a new discharge letter system as "too slow" and "too labor-intensive" in comparison with the previously used system. In a stepwise approach we collected subjective user critique, derived a comprehensive process model including deviations and deduced a set of five indicators for objectivist evaluation: processing time, system-related waiting time, number of mouse clicks, number of keyboard inputs, and throughput time. About 3500 measurements were performed to compare the workflow steps of both systems across 20 discharge letters. Although the difference of the mean total processing time between both systems was statistically insignificant (2011.7 s vs. 1971.5 s; p = 0.457), we detected a significant difference in waiting times (101.8 s vs. 37.2 s; p < 0.001) and number of user interactions (77 vs. 69; p < 0.001) in favor of the old system, thus objectifying user critique. Our six-step approach enables objectification of user critique, resulting in objective values for continuous quality assurance. To our knowledge no previous study in medical informatics mapped user critique onto workflow steps. Subjectivist analysis prompted us to use the indicator system-related waiting time for the objectivist study, which was rarely done before. We consider combining subjectivist and objectivist methods as a key point of our approach. Future work will concentrate on automated measurement of indicators.

  17. PANTHER. Pattern ANalytics To support High-performance Exploitation and Reasoning.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Czuchlewski, Kristina Rodriguez; Hart, William E.

    Sandia has approached the analysis of big datasets with an integrated methodology that uses computer science, image processing, and human factors to exploit critical patterns and relationships in large datasets despite the variety and rapidity of information. The work is part of a three-year LDRD Grand Challenge called PANTHER (Pattern ANalytics To support High-performance Exploitation and Reasoning). To maximize data analysis capability, Sandia pursued scientific advances across three key technical domains: (1) geospatial-temporal feature extraction via image segmentation and classification; (2) geospatial-temporal analysis capabilities tailored to identify and process new signatures more efficiently; and (3) domain-relevant models of human perception and cognition informing the design of analytic systems. Our integrated results include advances in geographical information systems (GIS) in which we discover activity patterns in noisy, spatial-temporal datasets using geospatial-temporal semantic graphs. We employed computational geometry and machine learning to allow us to extract and predict spatial-temporal patterns and outliers from large aircraft and maritime trajectory datasets. We automatically extracted static and ephemeral features from real, noisy synthetic aperture radar imagery for ingestion into a geospatial-temporal semantic graph. We worked with analysts and investigated analytic workflows to (1) determine how experiential knowledge evolves and is deployed in high-demand, high-throughput visual search workflows, and (2) better understand visual search performance and attention. Through PANTHER, Sandia's fundamental rethinking of key aspects of geospatial data analysis permits the extraction of much richer information from large amounts of data. The project results enable analysts to examine mountains of historical and current data that would otherwise go untouched, while also gaining meaningful, measurable, and defensible insights into overlooked relationships and patterns. The capability is directly relevant to the nation's nonproliferation remote-sensing activities and has broad national security applications for military and intelligence-gathering organizations.

  18. Development of forensic-quality full mtGenome haplotypes: success rates with low template specimens.

    PubMed

    Just, Rebecca S; Scheible, Melissa K; Fast, Spence A; Sturk-Andreaggi, Kimberly; Higginbotham, Jennifer L; Lyons, Elizabeth A; Bush, Jocelyn M; Peck, Michelle A; Ring, Joseph D; Diegoli, Toni M; Röck, Alexander W; Huber, Gabriela E; Nagl, Simone; Strobl, Christina; Zimmermann, Bettina; Parson, Walther; Irwin, Jodi A

    2014-05-01

    Forensic mitochondrial DNA (mtDNA) testing requires appropriate, high quality reference population data for estimating the rarity of questioned haplotypes and, in turn, the strength of the mtDNA evidence. Available reference databases (SWGDAM, EMPOP) currently include information from the mtDNA control region; however, novel methods that quickly and easily recover mtDNA coding region data are becoming increasingly available. Though these assays promise to both facilitate the acquisition of mitochondrial genome (mtGenome) data and maximize the general utility of mtDNA testing in forensics, the appropriate reference data and database tools required for their routine application in forensic casework are lacking. To address this deficiency, we have undertaken an effort to: (1) increase the large-scale availability of high-quality entire mtGenome reference population data, and (2) improve the information technology infrastructure required to access/search mtGenome data and employ them in forensic casework. Here, we describe the application of a data generation and analysis workflow to the development of more than 400 complete, forensic-quality mtGenomes from low DNA quantity blood serum specimens as part of a U.S. National Institute of Justice funded reference population databasing initiative. We discuss the minor modifications made to a published mtGenome Sanger sequencing protocol to maintain a high rate of throughput while minimizing manual reprocessing with these low template samples. The successful use of this semi-automated strategy on forensic-like samples provides practical insight into the feasibility of producing complete mtGenome data in a routine casework environment, and demonstrates that large (>2kb) mtDNA fragments can regularly be recovered from high quality but very low DNA quantity specimens. Further, the detailed empirical data we provide on the amplification success rates across a range of DNA input quantities will be useful moving forward as PCR-based strategies for mtDNA enrichment are considered for targeted next-generation sequencing workflows. Copyright © 2014 The Authors. Published by Elsevier Ireland Ltd.. All rights reserved.

  19. Quantitative workflow based on NN for weighting criteria in landfill suitability mapping

    NASA Astrophysics Data System (ADS)

    Abujayyab, Sohaib K. M.; Ahamad, Mohd Sanusi S.; Yahya, Ahmad Shukri; Ahmad, Siti Zubaidah; Alkhasawneh, Mutasem Sh.; Aziz, Hamidi Abdul

    2017-10-01

    Our study aims to introduce a new quantitative workflow that integrates neural networks (NNs) and multi-criteria decision analysis (MCDA). Existing MCDA workflows reveal a number of drawbacks because of their reliance on human knowledge in the weighting stage. Thus, a new NN-based workflow is presented to form suitability maps at the regional scale for solid waste planning. A feed-forward neural network is employed in the workflow. A total of 34 criteria were pre-processed to establish the input dataset for NN modelling. The final trained network is used to acquire the weights of the criteria. Accuracies of 95.2% and 93.2% were achieved for the training dataset and testing dataset, respectively. The workflow was found to be capable of reducing human intervention while generating highly reliable maps. The proposed workflow reveals the applicability of NNs in generating landfill suitability maps and the feasibility of integrating them with existing MCDA workflows.
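
    One simple way to realize the idea above, training a feed-forward network on labeled sites and then reading criterion weights off the trained model, is sketched below with scikit-learn and a connection-weight style importance measure. The data, network size, and importance formula are illustrative assumptions, not the authors' configuration.

```python
# Sketch: train a small feed-forward network on criterion values and derive
# per-criterion weights from the learned connection weights. The data, network
# size, and importance formula are illustrative, not the study's setup.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.random((500, 5))                       # 500 sites x 5 criteria (the study used 34)
y = (0.5 * X[:, 0] + 0.3 * X[:, 3] + 0.2 * rng.random(500) > 0.5).astype(int)

net = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0).fit(X, y)

# Connection-weight style importance: |input->hidden| x |hidden->output|
w_in, w_out = np.abs(net.coefs_[0]), np.abs(net.coefs_[1])
importance = (w_in @ w_out).ravel()
weights = importance / importance.sum()        # normalized criterion weights
print(np.round(weights, 3))
```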

  20. An end-to-end workflow for engineering of biological networks from high-level specifications.

    PubMed

    Beal, Jacob; Weiss, Ron; Densmore, Douglas; Adler, Aaron; Appleton, Evan; Babb, Jonathan; Bhatia, Swapnil; Davidsohn, Noah; Haddock, Traci; Loyall, Joseph; Schantz, Richard; Vasilev, Viktor; Yaman, Fusun

    2012-08-17

    We present a workflow for the design and production of biological networks from high-level program specifications. The workflow is based on a sequence of intermediate models that incrementally translate high-level specifications into DNA samples that implement them. We identify algorithms for translating between adjacent models and implement them as a set of software tools, organized into a four-stage toolchain: Specification, Compilation, Part Assignment, and Assembly. The specification stage begins with a Boolean logic computation specified in the Proto programming language. The compilation stage uses a library of network motifs and cellular platforms, also specified in Proto, to transform the program into an optimized Abstract Genetic Regulatory Network (AGRN) that implements the programmed behavior. The part assignment stage assigns DNA parts to the AGRN, drawing the parts from a database for the target cellular platform, to create a DNA sequence implementing the AGRN. Finally, the assembly stage computes an optimized assembly plan to create the DNA sequence from available part samples, yielding a protocol for producing a sample of engineered plasmids with robotics assistance. Our workflow is the first to automate the production of biological networks from a high-level program specification. Furthermore, the workflow's modular design allows the same program to be realized on different cellular platforms simply by swapping workflow configurations. We validated our workflow by specifying a small-molecule sensor-reporter program and verifying the resulting plasmids in both HEK 293 mammalian cells and in E. coli bacterial cells.

  1. High-volume workflow management in the ITN/FBI system

    NASA Astrophysics Data System (ADS)

    Paulson, Thomas L.

    1997-02-01

    The Identification Tasking and Networking (ITN) Federal Bureau of Investigation system will manage the processing of more than 70,000 submissions per day. The workflow manager controls the routing of each submission through a combination of automated and manual processing steps whose exact sequence is dynamically determined by the results at each step. For most submissions, one or more of the steps involve the visual comparison of fingerprint images. The ITN workflow manager is implemented within a scalable client/server architecture. The paper describes the key aspects of the ITN workflow manager design which allow the high volume of daily processing to be successfully accomplished.
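
    A minimal sketch of result-driven routing of the kind described, in which each step's outcome selects the next step; the step names and thresholds are invented for illustration and do not reflect the actual ITN design.

```python
# Hypothetical steps: each returns the name of the next step (or "done").
def quality_check(sub):
    return "auto_match" if sub["image_quality"] >= 0.8 else "manual_review"

def auto_match(sub):
    return "done" if sub["score"] >= 0.95 else "manual_review"

def manual_review(sub):
    return "done"

STEPS = {"quality_check": quality_check, "auto_match": auto_match, "manual_review": manual_review}

def route(submission, start="quality_check"):
    # Walk the submission through the workflow, letting each result pick the next step.
    step, path = start, []
    while step != "done":
        path.append(step)
        step = STEPS[step](submission)
    return path

print(route({"image_quality": 0.9, "score": 0.7}))
# -> ['quality_check', 'auto_match', 'manual_review']
```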

  2. Metabonomics and its role in amino acid nutrition research.

    PubMed

    He, Qinghua; Yin, Yulong; Zhao, Feng; Kong, Xiangfeng; Wu, Guoyao; Ren, Pingping

    2011-06-01

    Metabonomics combines metabolic profiling and multivariate data analysis to facilitate the high-throughput analysis of metabolites in biological samples. This technique has been developed as a powerful analytical tool and hence has found successful widespread applications in many areas of bioscience. Metabonomics has also become an important part of systems biology. As a sensitive and powerful method, metabonomics can quantitatively measure subtle dynamic perturbations of metabolic pathways in organisms due to changes in pathophysiological, nutritional, and epigenetic states. Therefore, metabonomics holds great promise to enhance our understanding of the complex relationship between amino acids and metabolism to define the roles for dietary amino acids in maintaining health and the development of disease. Such a technique also aids in the studies of functions, metabolic regulation, safety, and individualized requirements of amino acids. Here, we highlight the common workflow of metabonomics and some of the applications to amino acid nutrition research to illustrate the great potential of this exciting new frontier in bioscience.
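
    A common first step in such a metabonomic workflow is unsupervised multivariate analysis of the metabolite intensity matrix. The sketch below illustrates this with principal component analysis on a synthetic matrix; the sample sizes, feature count, and simulated treatment effect are all hypothetical.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Hypothetical matrix: 40 samples x 200 metabolite features (e.g. binned NMR or LC-MS intensities).
rng = np.random.default_rng(1)
X = rng.lognormal(mean=0.0, sigma=1.0, size=(40, 200))
group = np.array([0] * 20 + [1] * 20)     # e.g. control vs amino-acid-supplemented diet
X[group == 1, :10] *= 1.8                 # simulate a treatment effect on 10 metabolites

# Scale features, then project onto the first two principal components.
scores = PCA(n_components=2).fit_transform(StandardScaler().fit_transform(X))
for g in (0, 1):
    print(f"group {g}: mean PC1 = {scores[group == g, 0].mean():.2f}")
```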

  3. Proteomic analysis of formalin-fixed paraffin embedded tissue by MALDI imaging mass spectrometry

    PubMed Central

    Casadonte, Rita; Caprioli, Richard M

    2012-01-01

    Archived formalin-fixed paraffin-embedded (FFPE) tissue collections represent a valuable informational resource for proteomic studies. Multiple FFPE core biopsies can be assembled in a single block to form tissue microarrays (TMAs). We describe a protocol for analyzing proteins in FFPE TMAs using matrix-assisted laser desorption/ionization (MALDI) imaging mass spectrometry (IMS). The workflow incorporates an antigen retrieval step following deparaffinization, in situ trypsin digestion, matrix application and then mass spectrometry signal acquisition. Direct analysis of FFPE TMA tissue using IMS allows multiple tissue samples to be analyzed in a single experiment without extraction and purification of proteins. The advantages of high speed and throughput, easy sample handling and excellent reproducibility make this technology a favorable approach for the proteomic analysis of clinical research cohorts with large sample numbers. For example, TMA analysis of 300 FFPE cores would typically require 6 h of total time through data acquisition, not including data analysis. PMID:22011652

  4. Managing and Communicating Operational Workflow

    PubMed Central

    Weinberg, Stuart T.; Danciu, Ioana; Unertl, Kim M.

    2016-01-01

    Summary Background Healthcare team members in emergency department contexts have used electronic whiteboard solutions to help manage operational workflow for many years. Ambulatory clinic settings have highly complex operational workflow, but are still limited in electronic assistance to communicate and coordinate work activities. Objective To describe and discuss the design, implementation, use, and ongoing evolution of a coordination and collaboration tool supporting ambulatory clinic operational workflow at Vanderbilt University Medical Center (VUMC). Methods The outpatient whiteboard tool was initially designed to support healthcare work related to an electronic chemotherapy order-entry application. After a highly successful initial implementation in an oncology context, a high demand emerged across the organization for the outpatient whiteboard implementation. Over the past 10 years, developers have followed an iterative user-centered design process to evolve the tool. Results The electronic outpatient whiteboard system supports 194 separate whiteboards and is accessed by over 2800 distinct users on a typical day. Clinics can configure their whiteboards to support unique workflow elements. Since initial release, features such as immunization clinical decision support have been integrated into the system, based on requests from end users. Conclusions The success of the electronic outpatient whiteboard demonstrates the usefulness of an operational workflow tool within the ambulatory clinic setting. Operational workflow tools can play a significant role in supporting coordination, collaboration, and teamwork in ambulatory healthcare settings. PMID:27081407

  5. Tavaxy: integrating Taverna and Galaxy workflows with cloud computing support.

    PubMed

    Abouelhoda, Mohamed; Issa, Shadi Alaa; Ghanem, Moustafa

    2012-05-04

    Over the past decade the workflow system paradigm has evolved as an efficient and user-friendly approach for developing complex bioinformatics applications. Two popular workflow systems that have gained acceptance by the bioinformatics community are Taverna and Galaxy. Each system has a large user-base and supports an ever-growing repository of application workflows. However, workflows developed for one system cannot be imported and executed easily on the other. The lack of interoperability is due to differences in the models of computation, workflow languages, and architectures of both systems. This lack of interoperability limits sharing of workflows between the user communities and leads to duplication of development efforts. In this paper, we present Tavaxy, a stand-alone system for creating and executing workflows based on using an extensible set of re-usable workflow patterns. Tavaxy offers a set of new features that simplify and enhance the development of sequence analysis applications: It allows the integration of existing Taverna and Galaxy workflows in a single environment, and supports the use of cloud computing capabilities. The integration of existing Taverna and Galaxy workflows is supported seamlessly at both run-time and design-time levels, based on the concepts of hierarchical workflows and workflow patterns. The use of cloud computing in Tavaxy is flexible, where the users can either instantiate the whole system on the cloud, or delegate the execution of certain sub-workflows to the cloud infrastructure. Tavaxy reduces the workflow development cycle by introducing the use of workflow patterns to simplify workflow creation. It enables the re-use and integration of existing (sub-) workflows from Taverna and Galaxy, and allows the creation of hybrid workflows. Its additional features exploit recent advances in high performance cloud computing to cope with the increasing data size and complexity of analysis. The system can be accessed either through a cloud-enabled web-interface or downloaded and installed to run within the user's local environment. All resources related to Tavaxy are available at http://www.tavaxy.org.

  6. Estimating differential expression from multiple indicators

    PubMed Central

    Ilmjärv, Sten; Hundahl, Christian Ansgar; Reimets, Riin; Niitsoo, Margus; Kolde, Raivo; Vilo, Jaak; Vasar, Eero; Luuk, Hendrik

    2014-01-01

    Despite the advent of high-throughput sequencing, microarrays remain central in current biomedical research. Conventional microarray analysis pipelines apply data reduction before the estimation of differential expression, which is likely to render the estimates susceptible to noise from signal summarization and reduce statistical power. We present a probe-level framework, which capitalizes on the high number of concurrent measurements to provide more robust differential expression estimates. The framework naturally extends to various experimental designs and target categories (e.g. transcripts, genes, genomic regions) as well as small sample sizes. Benchmarking in relation to popular microarray and RNA-sequencing data-analysis pipelines indicated high and stable performance on the Microarray Quality Control dataset and in a cell-culture model of hypoxia. Experimental data exhibiting long-range epigenetic silencing of gene expression were used to demonstrate the efficacy of detecting differential expression of genomic regions, a level of analysis not embraced by conventional workflows. Finally, we designed and conducted an experiment to identify hypothermia-responsive genes in terms of monotonic time-response. As a novel insight, hypothermia-dependent up-regulation of multiple genes of two major antioxidant pathways was identified and verified by quantitative real-time PCR. PMID:24586062
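
    As a hedged illustration of the probe-level idea (computing statistics per probe and aggregating afterwards, rather than summarizing first), the following sketch uses synthetic intensities and a simple t-test; it is not the authors' framework.

```python
import numpy as np
from scipy import stats

# Hypothetical probe-level log-intensities: 12 probes per gene, 2 genes, 6 arrays (3 vs 3).
rng = np.random.default_rng(2)
probes_per_gene, n_genes = 12, 2
log_int = rng.normal(8, 0.4, size=(probes_per_gene * n_genes, 6))
log_int[:probes_per_gene, 3:] += 1.0    # gene 1 up-regulated in condition B

gene_ids = np.repeat([f"gene{i + 1}" for i in range(n_genes)], probes_per_gene)

# Probe-level statistics: one t-test per probe (condition B vs condition A).
t, p = stats.ttest_ind(log_int[:, 3:], log_int[:, :3], axis=1)

# Aggregate the concurrent probe-level statistics per gene instead of summarizing first.
for g in np.unique(gene_ids):
    sel = gene_ids == g
    print(g, "median probe t =", round(float(np.median(t[sel])), 2))
```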

  7. A scientific workflow framework for (13)C metabolic flux analysis.

    PubMed

    Dalman, Tolga; Wiechert, Wolfgang; Nöh, Katharina

    2016-08-20

    Metabolic flux analysis (MFA) with (13)C labeling data is a high-precision technique to quantify intracellular reaction rates (fluxes). One of the major challenges of (13)C MFA is the interactivity of the computational workflow according to which the fluxes are determined from the input data (metabolic network model, labeling data, and physiological rates). Here, the workflow assembly is inevitably determined by the scientist who has to consider interacting biological, experimental, and computational aspects. Decision-making is context dependent and requires expertise, rendering an automated evaluation process hardly possible. Here, we present a scientific workflow framework (SWF) for creating, executing, and controlling on demand (13)C MFA workflows. (13)C MFA-specific tools and libraries, such as the high-performance simulation toolbox 13CFLUX2, are wrapped as web services and thereby integrated into a service-oriented architecture. Besides workflow steering, the SWF features transparent provenance collection and enables full flexibility for ad hoc scripting solutions. To handle compute-intensive tasks, cloud computing is supported. We demonstrate how the challenges posed by (13)C MFA workflows can be solved with our approach on the basis of two proof-of-concept use cases. Copyright © 2015 Elsevier B.V. All rights reserved.
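
    To illustrate the service-wrapping pattern, the sketch below exposes a placeholder flux-simulation function over HTTP with Flask; the endpoint, payload format, and the simulate_labeling stand-in are assumptions for illustration, not the actual 13CFLUX2 web services.

```python
# Requires Flask >= 2.0 for the app.post shortcut.
from flask import Flask, request, jsonify

app = Flask(__name__)

def simulate_labeling(fluxes):
    # Placeholder for a call into a 13C MFA simulator; returns a dummy cost value.
    return {"cost": sum(v * v for v in fluxes.values())}

@app.post("/simulate")
def simulate():
    # Accept a JSON map of flux values, e.g. {"v1": 1.2, "v2": 0.4}, and return the result.
    fluxes = request.get_json()
    return jsonify(simulate_labeling(fluxes))

if __name__ == "__main__":
    app.run(port=8080)
```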

  8. Using EHR audit trail logs to analyze clinical workflow: A case study from community-based ambulatory clinics.

    PubMed

    Wu, Danny T Y; Smart, Nikolas; Ciemins, Elizabeth L; Lanham, Holly J; Lindberg, Curt; Zheng, Kai

    2017-01-01

    Understanding clinical workflow is a critical first step toward developing a workflow-supported clinical documentation system. While time-and-motion studies have been regarded as the gold standard of workflow analysis, this method can be resource consuming and its data may be biased by the cognitive limitations of human observers. In this study, we aimed to evaluate the feasibility and validity of using EHR audit trail logs to analyze clinical workflow. Specifically, we compared three known workflow changes from our previous study with the corresponding EHR audit trail logs of the study participants. The results showed that EHR audit trail logs can be a valid source for clinical workflow analysis, and can provide an objective view of clinicians' behaviors, multi-dimensional comparisons, and a highly extensible analysis framework.
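
    A minimal sketch of the kind of log analysis this implies, reconstructing each clinician's ordered action sequence and active time from audit-trail rows; the column names and event codes are hypothetical, and real EHR audit schemas differ.

```python
import pandas as pd

# Hypothetical audit-trail extract: one row per EHR action (user, timestamp, action code).
log = pd.DataFrame({
    "user":   ["rn01", "rn01", "rn01", "md02", "md02"],
    "time":   pd.to_datetime(["2016-01-04 09:00", "2016-01-04 09:02", "2016-01-04 09:10",
                              "2016-01-04 09:01", "2016-01-04 09:03"]),
    "action": ["open_chart", "review_labs", "write_note", "open_chart", "order_entry"],
})

# Reconstruct each clinician's ordered action sequence.
sequences = log.sort_values("time").groupby("user")["action"].apply(list)
print(sequences.to_dict())

# Total elapsed time between a user's first and last logged action, in seconds.
durations = (log.sort_values("time").groupby("user")["time"]
                .apply(lambda t: t.diff().dt.total_seconds().sum()))
print(durations.to_dict())
```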

  9. Disruption of Radiologist Workflow.

    PubMed

    Kansagra, Akash P; Liu, Kevin; Yu, John-Paul J

    2016-01-01

    The effect of disruptions has been studied extensively in surgery and emergency medicine, and a number of solutions-such as preoperative checklists-have been implemented to enforce the integrity of critical safety-related workflows. Disruptions of the highly complex and cognitively demanding workflow of modern clinical radiology have only recently attracted attention as a potential safety hazard. In this article, we describe the variety of disruptions that arise in the reading room environment, review approaches that other specialties have taken to mitigate workflow disruption, and suggest possible solutions for workflow improvement in radiology. Copyright © 2015 Mosby, Inc. All rights reserved.

  10. Imaging industry expectations for compressed sensing in MRI

    NASA Astrophysics Data System (ADS)

    King, Kevin F.; Kanwischer, Adriana; Peters, Rob

    2015-09-01

    Compressed sensing requires compressible data, incoherent acquisition and a nonlinear reconstruction algorithm to force creation of a compressible image consistent with the acquired data. MRI images are compressible using various transforms (commonly total variation or wavelets). Incoherent acquisition of MRI data by appropriate selection of pseudo-random or non-Cartesian locations in k-space is straightforward. Increasingly, commercial scanners are sold with enough computing power to enable iterative reconstruction in reasonable times. Therefore integration of compressed sensing into commercial MRI products and clinical practice is beginning. MRI frequently requires the tradeoff of spatial resolution, temporal resolution and volume of spatial coverage to obtain reasonable scan times. Compressed sensing improves scan efficiency and reduces the need for this tradeoff. Benefits to the user will include shorter scans, greater patient comfort, better image quality, more contrast types per patient slot, the enabling of previously impractical applications, and higher throughput. Challenges to vendors include deciding which applications to prioritize, guaranteeing diagnostic image quality, maintaining acceptable usability and workflow, and acquisition and reconstruction algorithm details. Application choice depends on which customer needs the vendor wants to address. The changing healthcare environment is putting cost and productivity pressure on healthcare providers. The improved scan efficiency of compressed sensing can help alleviate some of this pressure. Image quality is strongly influenced by image compressibility and acceleration factor, which must be appropriately limited. Usability and workflow concerns include reconstruction time and user interface friendliness and response. Reconstruction times are limited to about one minute for acceptable workflow. The user interface should be designed to optimize workflow and minimize additional customer training. Algorithm concerns include the decision of which algorithms to implement as well as the problem of optimal setting of adjustable parameters. It will take imaging vendors several years to work through these challenges and provide solutions for a wide range of applications.
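
    To make the reconstruction idea concrete, the toy sketch below recovers a sparse 1-D signal from incoherently undersampled Fourier ("k-space") samples with iterative soft-thresholding (ISTA); the sizes, sparsity level, and regularization weight are arbitrary, and clinical reconstructions use wavelet or total-variation sparsity with multicoil data.

```python
import numpy as np

rng = np.random.default_rng(3)
n, k, m = 256, 8, 64                         # signal length, sparsity, number of samples
x_true = np.zeros(n)
x_true[rng.choice(n, k, replace=False)] = rng.normal(size=k)

# Incoherent sampling: keep m pseudo-randomly chosen k-space locations.
mask = np.zeros(n, bool)
mask[rng.choice(n, m, replace=False)] = True
y = np.fft.fft(x_true, norm="ortho")[mask]   # acquired k-space samples

def A(x):                                    # forward model: undersampled DFT
    return np.fft.fft(x, norm="ortho")[mask]

def At(v):                                   # adjoint: zero-fill and inverse DFT
    z = np.zeros(n, complex)
    z[mask] = v
    return np.fft.ifft(z, norm="ortho")

x, lam, step = np.zeros(n), 0.02, 1.0
for _ in range(200):
    # Gradient step on the data-consistency term (real part, since the signal is real),
    # followed by soft-thresholding to promote sparsity.
    x = x + step * At(y - A(x)).real
    x = np.sign(x) * np.maximum(np.abs(x) - lam * step, 0.0)

print("relative reconstruction error:", np.linalg.norm(x - x_true) / np.linalg.norm(x_true))
```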

  11. Jenkins-CI, an Open-Source Continuous Integration System, as a Scientific Data and Image-Processing Platform.

    PubMed

    Moutsatsos, Ioannis K; Hossain, Imtiaz; Agarinis, Claudia; Harbinski, Fred; Abraham, Yann; Dobler, Luc; Zhang, Xian; Wilson, Christopher J; Jenkins, Jeremy L; Holway, Nicholas; Tallarico, John; Parker, Christian N

    2017-03-01

    High-throughput screening generates large volumes of heterogeneous data that require a diverse set of computational tools for management, processing, and analysis. Building integrated, scalable, and robust computational workflows for such applications is challenging but highly valuable. Scientific data integration and pipelining facilitate standardized data processing, collaboration, and reuse of best practices. We describe how Jenkins-CI, an "off-the-shelf," open-source, continuous integration system, is used to build pipelines for processing images and associated data from high-content screening (HCS). Jenkins-CI provides numerous plugins for standard compute tasks, and its design allows the quick integration of external scientific applications. Using Jenkins-CI, we integrated CellProfiler, an open-source image-processing platform, with various HCS utilities and a high-performance Linux cluster. The platform is web-accessible, facilitates access and sharing of high-performance compute resources, and automates previously cumbersome data and image-processing tasks. Imaging pipelines developed using the desktop CellProfiler client can be managed and shared through a centralized Jenkins-CI repository. Pipelines and managed data are annotated to facilitate collaboration and reuse. Limitations with Jenkins-CI (primarily around the user interface) were addressed through the selection of helper plugins from the Jenkins-CI community.
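
    As a flavor of the kind of batch step such a Jenkins-CI job can drive, the sketch below runs CellProfiler headless over plate folders from Python; the paths, pipeline file, and command-line flags are assumptions to be checked against a local installation.

```python
import subprocess
from pathlib import Path

# Hypothetical build step a Jenkins-CI job could execute: run CellProfiler headless on
# each plate folder. Flag names (-c headless, -r run, -p pipeline, -i/-o folders) are
# typical of CellProfiler's command line, but verify them against your installed version.
PIPELINE = Path("pipelines/nuclei_count.cppipe")   # hypothetical pipeline file
PLATES = Path("/data/hcs/plates")                  # hypothetical input root
OUT = Path("/data/hcs/results")                    # hypothetical output root

for plate in sorted(PLATES.iterdir()):
    out_dir = OUT / plate.name
    out_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(
        ["cellprofiler", "-c", "-r", "-p", str(PIPELINE), "-i", str(plate), "-o", str(out_dir)],
        check=True,
    )
    print("finished", plate.name)
```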

  12. Jenkins-CI, an Open-Source Continuous Integration System, as a Scientific Data and Image-Processing Platform

    PubMed Central

    Moutsatsos, Ioannis K.; Hossain, Imtiaz; Agarinis, Claudia; Harbinski, Fred; Abraham, Yann; Dobler, Luc; Zhang, Xian; Wilson, Christopher J.; Jenkins, Jeremy L.; Holway, Nicholas; Tallarico, John; Parker, Christian N.

    2016-01-01

    High-throughput screening generates large volumes of heterogeneous data that require a diverse set of computational tools for management, processing, and analysis. Building integrated, scalable, and robust computational workflows for such applications is challenging but highly valuable. Scientific data integration and pipelining facilitate standardized data processing, collaboration, and reuse of best practices. We describe how Jenkins-CI, an “off-the-shelf,” open-source, continuous integration system, is used to build pipelines for processing images and associated data from high-content screening (HCS). Jenkins-CI provides numerous plugins for standard compute tasks, and its design allows the quick integration of external scientific applications. Using Jenkins-CI, we integrated CellProfiler, an open-source image-processing platform, with various HCS utilities and a high-performance Linux cluster. The platform is web-accessible, facilitates access and sharing of high-performance compute resources, and automates previously cumbersome data and image-processing tasks. Imaging pipelines developed using the desktop CellProfiler client can be managed and shared through a centralized Jenkins-CI repository. Pipelines and managed data are annotated to facilitate collaboration and reuse. Limitations with Jenkins-CI (primarily around the user interface) were addressed through the selection of helper plugins from the Jenkins-CI community. PMID:27899692

  13. Quantifying Golgi structure using EM: combining volume-SEM and stereology for higher throughput.

    PubMed

    Ferguson, Sophie; Steyer, Anna M; Mayhew, Terry M; Schwab, Yannick; Lucocq, John Milton

    2017-06-01

    Investigating organelles such as the Golgi complex depends increasingly on high-throughput quantitative morphological analyses from multiple experimental or genetic conditions. Light microscopy (LM) has been an effective tool for screening but fails to reveal fine details of Golgi structures such as vesicles, tubules and cisternae. Electron microscopy (EM) has sufficient resolution but traditional transmission EM (TEM) methods are slow and inefficient. Newer volume scanning EM (volume-SEM) methods now have the potential to speed up 3D analysis by automated sectioning and imaging. However, they produce large arrays of sections and/or images, which require labour-intensive 3D reconstruction for quantitation on limited cell numbers. Here, we show that the information storage, digital waste and workload involved in using volume-SEM can be reduced substantially using sampling-based stereology. Using the Golgi as an example, we describe how Golgi populations can be sensed quantitatively using single random slices and how accurate quantitative structural data on Golgi organelles of individual cells can be obtained using only 5-10 sections/images taken from a volume-SEM series (thereby sensing population parameters and cell-cell variability). The approach will be useful in techniques such as correlative LM and EM (CLEM) where small samples of cells are treated and where there may be variable responses. For Golgi study, we outline a series of stereological estimators that are suited to these analyses and suggest workflows, which have the potential to enhance the speed and relevance of data acquisition in volume-SEM.
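
    As a flavor of sampling-based stereology, the sketch below applies a generic Cavalieri-style point-counting estimate to a handful of systematically sampled sections; the numbers are invented and the estimator is illustrative rather than one of the specific Golgi estimators outlined in the paper.

```python
# Cavalieri-style estimate: volume ~= section spacing * area per test point * total point count.
section_spacing_um = 0.5                              # distance between sampled sections (t)
area_per_point_um2 = 0.04                             # grid area associated with each test point (a/p)
points_hitting_golgi = [12, 15, 9, 14, 11, 13, 10]    # counts on 7 sampled sections (hypothetical)

volume_um3 = section_spacing_um * area_per_point_um2 * sum(points_hitting_golgi)
print(f"estimated Golgi volume per cell: {volume_um3:.2f} um^3")
```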

  14. An 18S rRNA Workflow for Characterizing Protists in Sewage, with a Focus on Zoonotic Trichomonads.

    PubMed

    Maritz, Julia M; Rogers, Krysta H; Rock, Tara M; Liu, Nicole; Joseph, Susan; Land, Kirkwood M; Carlton, Jane M

    2017-11-01

    Microbial eukaryotes (protists) are important components of terrestrial and aquatic environments, as well as animal and human microbiomes. Their relationships with metazoa range from mutualistic to parasitic and zoonotic (i.e., transmissible between humans and animals). Despite their ecological importance, our knowledge of protists in urban environments lags behind that of bacteria, largely due to a lack of experimentally validated high-throughput protocols that produce accurate estimates of protist diversity while minimizing non-protist DNA representation. We optimized protocols for detecting zoonotic protists in raw sewage samples, with a focus on trichomonad taxa. First, we investigated the utility of two commonly used variable regions of the 18S rRNA marker gene, V4 and V9, by amplifying and Sanger sequencing 23 different eukaryotic species, including 16 protist species such as Cryptosporidium parvum, Giardia intestinalis, Toxoplasma gondii, and species of trichomonad. Next, we optimized wet-lab methods for sample processing and Illumina sequencing of both regions from raw sewage collected from a private apartment building in New York City. Our results show that both regions are effective at identifying several zoonotic protists that may be present in sewage. A combination of small extractions (1 mL volumes) performed on the same day as sample collection, and the incorporation of a vertebrate blocking primer, is ideal to detect protist taxa of interest and combat the effects of metazoan DNA. We expect that the robust, standardized methods presented in our workflow will be applicable to investigations of protists in other environmental samples, and will help facilitate large-scale investigations of protistan diversity.

  15. Neurophysiological analytics for all! Free open-source software tools for documenting, analyzing, visualizing, and sharing using electronic notebooks.

    PubMed

    Rosenberg, David M; Horn, Charles C

    2016-08-01

    Neurophysiology requires an extensive workflow of information analysis routines, which often includes incompatible proprietary software, introducing limitations based on financial costs, transfer of data between platforms, and the ability to share. An ecosystem of free open-source software exists to fill these gaps, including thousands of analysis and plotting packages written in Python and R, which can be implemented in a sharable and reproducible format, such as the Jupyter electronic notebook. This tool chain can largely replace current routines by importing data, producing analyses, and generating publication-quality graphics. An electronic notebook like Jupyter allows these analyses, along with documentation of procedures, to display locally or remotely in an internet browser, which can be saved as an HTML, PDF, or other file format for sharing with team members and the scientific community. The present report illustrates these methods using data from electrophysiological recordings of the musk shrew vagus, a model system to investigate gut-brain communication, for example, in cancer chemotherapy-induced emesis. We show methods for spike sorting (including statistical validation), spike train analysis, and analysis of compound action potentials in notebooks. Raw data and code are available from notebooks in data supplements or from an executable online version, which replicates all analyses without installing software, an implementation of reproducible research. This demonstrates the promise of combining disparate analyses into one platform, along with the ease of sharing this work. In an age of diverse, high-throughput computational workflows, this methodology can increase efficiency, transparency, and the collaborative potential of neurophysiological research. Copyright © 2016 the American Physiological Society.
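
    For example, a threshold-based spike-detection step of the kind such notebooks document might look like the sketch below; the trace is synthetic and the threshold rule is a common convention, not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(4)
fs = 20000                                    # sampling rate (Hz)
trace = rng.normal(0, 1, fs * 2)              # 2 s of background noise (hypothetical recording)
spike_starts = rng.choice(trace.size - 40, 30, replace=False)
for t in spike_starts:
    trace[t:t + 20] += np.hanning(20) * 8.0   # add stereotyped spike waveforms

# Robust noise-based threshold (median absolute deviation scaled to a standard deviation).
thr = 5 * np.median(np.abs(trace)) / 0.6745

# Count rising threshold crossings as detected spikes.
crossings = np.flatnonzero((trace[1:] > thr) & (trace[:-1] <= thr))
print(f"threshold = {thr:.2f}, detected {crossings.size} spikes (expected ~30)")
```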

  16. Neurophysiological analytics for all! Free open-source software tools for documenting, analyzing, visualizing, and sharing using electronic notebooks

    PubMed Central

    2016-01-01

    Neurophysiology requires an extensive workflow of information analysis routines, which often includes incompatible proprietary software, introducing limitations based on financial costs, transfer of data between platforms, and the ability to share. An ecosystem of free open-source software exists to fill these gaps, including thousands of analysis and plotting packages written in Python and R, which can be implemented in a sharable and reproducible format, such as the Jupyter electronic notebook. This tool chain can largely replace current routines by importing data, producing analyses, and generating publication-quality graphics. An electronic notebook like Jupyter allows these analyses, along with documentation of procedures, to display locally or remotely in an internet browser, which can be saved as an HTML, PDF, or other file format for sharing with team members and the scientific community. The present report illustrates these methods using data from electrophysiological recordings of the musk shrew vagus—a model system to investigate gut-brain communication, for example, in cancer chemotherapy-induced emesis. We show methods for spike sorting (including statistical validation), spike train analysis, and analysis of compound action potentials in notebooks. Raw data and code are available from notebooks in data supplements or from an executable online version, which replicates all analyses without installing software—an implementation of reproducible research. This demonstrates the promise of combining disparate analyses into one platform, along with the ease of sharing this work. In an age of diverse, high-throughput computational workflows, this methodology can increase efficiency, transparency, and the collaborative potential of neurophysiological research. PMID:27098025

  17. De novo transcriptome assembly for a non-model species, the blood-sucking bug Triatoma brasiliensis, a vector of Chagas disease.

    PubMed

    Marchant, A; Mougel, F; Almeida, C; Jacquin-Joly, E; Costa, J; Harry, M

    2015-04-01

    High-throughput sequencing (HTS) provides new research opportunities for work on non-model organisms, such as differential expression studies between populations exposed to different environmental conditions. However, such transcriptomic studies first require the production of a reference assembly. The choice of sampling procedure, sequencing strategy and assembly workflow is crucial. To develop a reliable reference transcriptome for Triatoma brasiliensis, the major Chagas disease vector in Northeastern Brazil, different de novo assembly protocols were generated using various datasets and software. Both 454 and Illumina sequencing technologies were applied to RNA extracted from antennae and mouthparts from single or pooled individuals. The 454 library yielded 278 Mb. Fifteen Illumina libraries were constructed and yielded nearly 360 million RNA-seq single reads and 46 million RNA-seq paired-end reads for nearly 45 Gb. For the 454 reads, we used three assemblers, Newbler, CAP3 and/or MIRA and for the Illumina reads, the Trinity assembler. Ten assembly workflows were compared using these programs separately or in combination. To compare the assemblies obtained, quantitative and qualitative criteria were used, including contig length, N50, contig number and the percentage of chimeric contigs. Completeness of the assemblies was estimated using the CEGMA pipeline. The best assembly (57,657 contigs, completeness of 80%, <1% chimeric contigs) was a hybrid assembly, leading us to recommend the use of (1) a single individual with large representation of biological tissues, (2) merging both long reads and short paired-end Illumina reads, (3) several assemblers in order to combine the specific advantages of each.
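
    N50, one of the comparison criteria listed above, is the length of the shortest contig in the minimal set of longest contigs that together cover at least half of the total assembly; a small sketch with made-up contig lengths:

```python
def n50(contig_lengths):
    """Smallest contig length such that contigs of at least that length
    cover >= 50% of the total assembly size."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length

# Hypothetical contig lengths (bp) from two candidate assemblies.
assembly_a = [12000, 8000, 5000, 3000, 1500, 900, 600]
assembly_b = [6000, 5500, 5200, 5100, 4800, 2400]
print("N50 A:", n50(assembly_a), "N50 B:", n50(assembly_b))
```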

  18. Strategy to Identify and Test Putative Light-Sensitive Non-Opsin G-Protein-Coupled Receptors: A Case Study.

    PubMed

    Faggionato, Davide; Serb, Jeanne M

    2017-08-01

    The rise of high-throughput RNA sequencing (RNA-seq) and de novo transcriptome assembly has had a transformative impact on how we identify and study genes in the phototransduction cascade of non-model organisms. But the advantage provided by the nearly automated annotation of RNA-seq transcriptomes may at the same time hinder the possibility for gene discovery and the discovery of new gene functions. For example, standard functional annotation based on domain homology to known protein families can only confirm group membership, not identify the emergence of new biochemical function. In this study, we show the importance of developing a strategy that circumvents the limitations of semiautomated annotation and apply this workflow to photosensitivity as a means to discover non-opsin photoreceptors. We hypothesize that non-opsin G-protein-coupled receptor (GPCR) proteins may have chromophore-binding lysines in locations that differ from opsin. Here, we provide the first case study describing non-opsin light-sensitive GPCRs based on tissue-specific RNA-seq data of the common bay scallop Argopecten irradians (Lamarck, 1819). Using a combination of sequence analysis and three-dimensional protein modeling, we identified two candidate proteins. We tested their photochemical properties and provide evidence showing that these two proteins incorporate 11-cis and/or all-trans retinal and react to light photochemically. Based on this case study, we demonstrate that there is potential for the discovery of new light-sensitive GPCRs, and we have developed a workflow that starts from RNA-seq assemblies to the discovery of new non-opsin, GPCR-based photopigments.

  19. Unified Software Solution for Efficient SPR Data Analysis in Drug Research

    PubMed Central

    Dahl, Göran; Steigele, Stephan; Hillertz, Per; Tigerström, Anna; Egnéus, Anders; Mehrle, Alexander; Ginkel, Martin; Edfeldt, Fredrik; Holdgate, Geoff; O’Connell, Nichole; Kappler, Bernd; Brodte, Annette; Rawlins, Philip B.; Davies, Gareth; Westberg, Eva-Lotta; Folmer, Rutger H. A.; Heyse, Stephan

    2016-01-01

    Surface plasmon resonance (SPR) is a powerful method for obtaining detailed molecular interaction parameters. Modern instrumentation with its increased throughput has enabled routine screening by SPR in hit-to-lead and lead optimization programs, and SPR has become a mainstream drug discovery technology. However, the processing and reporting of SPR data in drug discovery are typically performed manually, which is both time-consuming and tedious. Here, we present the workflow concept, design and experiences with a software module relying on a single, browser-based software platform for the processing, analysis, and reporting of SPR data. The efficiency of this concept lies in the immediate availability of end results: data are processed and analyzed upon loading the raw data file, allowing the user to immediately quality control the results. Once completed, the user can automatically report those results to data repositories for corporate access and quickly generate printed reports or documents. The software module has resulted in a very efficient and effective workflow through saved time and improved quality control. We discuss these benefits and show how this process defines a new benchmark in the drug discovery industry for the handling, interpretation, visualization, and sharing of SPR data. PMID:27789754

  20. Tavaxy: Integrating Taverna and Galaxy workflows with cloud computing support

    PubMed Central

    2012-01-01

    Background Over the past decade the workflow system paradigm has evolved as an efficient and user-friendly approach for developing complex bioinformatics applications. Two popular workflow systems that have gained acceptance by the bioinformatics community are Taverna and Galaxy. Each system has a large user-base and supports an ever-growing repository of application workflows. However, workflows developed for one system cannot be imported and executed easily on the other. The lack of interoperability is due to differences in the models of computation, workflow languages, and architectures of both systems. This lack of interoperability limits sharing of workflows between the user communities and leads to duplication of development efforts. Results In this paper, we present Tavaxy, a stand-alone system for creating and executing workflows based on using an extensible set of re-usable workflow patterns. Tavaxy offers a set of new features that simplify and enhance the development of sequence analysis applications: It allows the integration of existing Taverna and Galaxy workflows in a single environment, and supports the use of cloud computing capabilities. The integration of existing Taverna and Galaxy workflows is supported seamlessly at both run-time and design-time levels, based on the concepts of hierarchical workflows and workflow patterns. The use of cloud computing in Tavaxy is flexible, where the users can either instantiate the whole system on the cloud, or delegate the execution of certain sub-workflows to the cloud infrastructure. Conclusions Tavaxy reduces the workflow development cycle by introducing the use of workflow patterns to simplify workflow creation. It enables the re-use and integration of existing (sub-) workflows from Taverna and Galaxy, and allows the creation of hybrid workflows. Its additional features exploit recent advances in high performance cloud computing to cope with the increasing data size and complexity of analysis. The system can be accessed either through a cloud-enabled web-interface or downloaded and installed to run within the user's local environment. All resources related to Tavaxy are available at http://www.tavaxy.org. PMID:22559942

  1. The MG-RAST Metagenomics Database and Portal in 2015

    DOE PAGES

    Wilke, Andreas; Bischof, Jared; Gerlach, Wolfgang; ...

    2015-12-09

    MG-RAST (http://metagenomics.anl.gov) is an open-submission data portal for processing, analyzing, sharing and disseminating metagenomic datasets. Currently, the system hosts over 200 000 datasets and is continuously updated. The volume of submissions has increased 4-fold over the past 24 months, now averaging 4 terabase pairs per month. In addition to several new features, we report changes to the analysis workflow and the technologies used to scale the pipeline up to the required throughput levels. Lastly, to show possible uses for the data from MG-RAST, we present several examples integrating data and analyses from MG-RAST into popular third-party analysis or sequence alignment tools.

  2. Internal validation of the GlobalFiler™ Express PCR Amplification Kit for the direct amplification of reference DNA samples on a high-throughput automated workflow.

    PubMed

    Flores, Shahida; Sun, Jie; King, Jonathan; Budowle, Bruce

    2014-05-01

    The GlobalFiler™ Express PCR Amplification Kit uses 6-dye fluorescent chemistry to enable multiplexing of 21 autosomal STRs, 1 Y-STR, 1 Y-indel and the sex-determining marker amelogenin. The kit is specifically designed for processing reference DNA samples in a high-throughput manner. Validation studies were conducted to assess the performance and define the limitations of this direct amplification kit for typing blood and buccal reference DNA samples on various punchable collection media. Studies included thermal cycling sensitivity, reproducibility, precision, sensitivity of detection, minimum detection threshold, system contamination, stochastic threshold and concordance. Results showed that optimal amplification and injection parameters for a 1.2 mm punch from blood and buccal samples were 27 and 28 cycles, respectively, combined with a 12 s injection on an ABI 3500xL Genetic Analyzer. Minimum detection thresholds were set at 100 and 120 RFUs for 27 and 28 cycles, respectively, and it was suggested that data from positive amplification controls provided a better threshold representation. Stochastic thresholds were set at 250 and 400 RFUs for 27 and 28 cycles, respectively, as stochastic effects increased with cycle number. The minimum amount of input DNA resulting in a full profile was 0.5 ng; however, the optimum range determined was 2.5-10 ng. Profile quality from the GlobalFiler™ Express Kit and the previously validated AmpFlSTR(®) Identifiler(®) Direct Kit was comparable. The validation data support that reliable DNA typing results from reference DNA samples can be obtained using the GlobalFiler™ Express PCR Amplification Kit. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.

  3. Virtual High-Throughput Screening To Identify Novel Activin Antagonists

    PubMed Central

    Zhu, Jie; Mishra, Rama K.; Schiltz, Gary E.; Makanji, Yogeshwar; Scheidt, Karl A.; Mazar, Andrew P.; Woodruff, Teresa K.

    2015-01-01

    Activin belongs to the TGFβ superfamily, which is associated with several disease conditions, including cancer-related cachexia, preterm labor with delivery, and osteoporosis. Targeting activin and its related signaling pathways holds promise as a therapeutic approach to these diseases. A small-molecule ligand-binding groove was identified in the interface between the two activin βA subunits and was used for a virtual high-throughput in silico screening of the ZINC database to identify hits. Thirty-nine compounds without significant toxicity were tested in two well-established activin assays: FSHβ transcription and HepG2 cell apoptosis. This screening workflow resulted in two lead compounds: NUCC-474 and NUCC-555. These potential activin antagonists were then shown to inhibit activin A-mediated cell proliferation in ex vivo ovary cultures. In vivo testing showed that our most potent compound (NUCC-555) caused a dose-dependent decrease in FSH levels in ovariectomized mice. The Blitz competition binding assay confirmed target binding of NUCC-555 to the activin A:ActRII complex, which disrupts the complex's binding with ALK4-ECD-Fc in a dose-dependent manner. NUCC-555 also specifically binds to activin A compared with another TGFβ superfamily member, myostatin (GDF8). These data demonstrate a new in silico-based strategy for identifying small-molecule activin antagonists. Our approach is the first to identify a first-in-class small-molecule antagonist of activin binding to ALK4, which opens a completely new approach to inhibiting the activity of TGFβ receptor superfamily members. In addition, the lead compound can serve as a starting point for lead optimization toward the goal of a compound that may be effective in activin-mediated diseases. PMID:26098096

  4. Application of high-throughput sequencing to whole rabies viral genome characterisation and its use for phylogenetic re-evaluation of a raccoon strain incursion into the province of Ontario.

    PubMed

    Nadin-Davis, Susan A; Colville, Adam; Trewby, Hannah; Biek, Roman; Real, Leslie

    2017-03-15

    Raccoon rabies remains a serious public health problem throughout much of the eastern seaboard of North America due to the urban nature of the reservoir host and the many challenges inherent in multi-jurisdictional efforts to administer co-ordinated and comprehensive wildlife rabies control programmes. Better understanding of the mechanisms of spread of rabies virus can play a significant role in guiding such control efforts. To facilitate a detailed molecular epidemiological study of raccoon rabies virus movements across eastern North America, we developed a methodology to efficiently determine whole genome sequences of hundreds of viral samples. The workflow combines the generation of a limited number of overlapping amplicons covering the complete viral genome and use of high throughput sequencing technology. The value of this approach is demonstrated through a retrospective phylogenetic analysis of an outbreak of raccoon rabies which occurred in the province of Ontario between 1999 and 2005. As demonstrated by the number of single nucleotide polymorphisms detected, whole genome sequence data were far more effective than single gene sequences in discriminating between samples and this facilitated the generation of more robust and informative phylogenies that yielded insights into the spatio-temporal pattern of viral spread. With minor modification this approach could be applied to other rabies virus variants thereby facilitating greatly improved phylogenetic inference and thus better understanding of the spread of this serious zoonotic disease. Such information will inform the most appropriate strategies for rabies control in wildlife reservoirs. Crown Copyright © 2017. Published by Elsevier B.V. All rights reserved.

  5. Automated High-Throughput Permethylation for Glycosylation Analysis of Biologics Using MALDI-TOF-MS.

    PubMed

    Shubhakar, Archana; Kozak, Radoslaw P; Reiding, Karli R; Royle, Louise; Spencer, Daniel I R; Fernandes, Daryl L; Wuhrer, Manfred

    2016-09-06

    Monitoring glycoprotein therapeutics for changes in glycosylation throughout the drug's life cycle is vital, as glycans significantly modulate the stability, biological activity, serum half-life, safety, and immunogenicity. Biopharma companies are increasingly adopting Quality by Design (QbD) frameworks for measuring, optimizing, and controlling drug glycosylation. Permethylation of glycans prior to analysis by matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF-MS) is a valuable tool for glycan characterization and for screening of large numbers of samples in QbD drug realization. However, the existing protocols for manual permethylation and liquid-liquid extraction (LLE) steps are labor intensive and are thus not practical for high-throughput (HT) studies. Here we present a glycan permethylation protocol, based on 96-well microplates, that has been developed into a kit suitable for HT work. The workflow is largely automated using a liquid handling robot and includes N-glycan release, enrichment of N-glycans, permethylation, and LLE. The kit has been validated according to industry analytical performance guidelines and applied to characterize biopharmaceutical samples, including IgG4 monoclonal antibodies (mAbs) and recombinant human erythropoietin (rhEPO). The HT permethylation enabled glycan characterization and relative quantitation with minimal side reactions: the MALDI-TOF-MS profiles obtained were in good agreement with hydrophilic liquid interaction chromatography (HILIC) and ultrahigh performance liquid chromatography (UHPLC) data. Automated permethylation and extraction of 96 glycan samples was achieved in less than 5 h and automated data acquisition on MALDI-TOF-MS took on average less than 1 min per sample. This automated and HT glycan preparation and permethylation showed to be convenient, fast, and reliable and can be applied for drug glycan profiling and clinical glycan biomarker studies.

  6. A characterization of workflow management systems for extreme-scale applications

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ferreira da Silva, Rafael; Filgueira, Rosa; Pietri, Ilia

    The automation of the execution of computational tasks is at the heart of improving scientific productivity. Over recent years, scientific workflows have been established as an important abstraction that captures the data processing and computation of large and complex scientific applications. By allowing scientists to model and express entire data processing steps and their dependencies, workflow management systems relieve scientists from the details of an application and manage its execution on a computational infrastructure. As the resource requirements of today’s computational and data science applications that process vast amounts of data keep increasing, there is a compelling case for a new generation of advances in high-performance computing, commonly termed extreme-scale computing, which will bring forth multiple challenges for the design of workflow applications and management systems. This paper presents a novel characterization of workflow management systems using features commonly associated with extreme-scale computing applications. We classify 15 popular workflow management systems in terms of workflow execution models, heterogeneous computing environments, and data access methods. Finally, the paper also surveys workflow applications and identifies gaps for future research on the road to extreme-scale workflows and management systems.

  7. A characterization of workflow management systems for extreme-scale applications

    DOE PAGES

    Ferreira da Silva, Rafael; Filgueira, Rosa; Pietri, Ilia; ...

    2017-02-16

    The automation of the execution of computational tasks is at the heart of improving scientific productivity. Over recent years, scientific workflows have been established as an important abstraction that captures the data processing and computation of large and complex scientific applications. By allowing scientists to model and express entire data processing steps and their dependencies, workflow management systems relieve scientists from the details of an application and manage its execution on a computational infrastructure. As the resource requirements of today’s computational and data science applications that process vast amounts of data keep increasing, there is a compelling case for a new generation of advances in high-performance computing, commonly termed extreme-scale computing, which will bring forth multiple challenges for the design of workflow applications and management systems. This paper presents a novel characterization of workflow management systems using features commonly associated with extreme-scale computing applications. We classify 15 popular workflow management systems in terms of workflow execution models, heterogeneous computing environments, and data access methods. Finally, the paper also surveys workflow applications and identifies gaps for future research on the road to extreme-scale workflows and management systems.

  8. Phaedra, a protocol-driven system for analysis and validation of high-content imaging and flow cytometry.

    PubMed

    Cornelissen, Frans; Cik, Miroslav; Gustin, Emmanuel

    2012-04-01

    High-content screening has brought new dimensions to cellular assays by generating rich data sets that characterize cell populations in great detail and detect subtle phenotypes. To derive relevant, reliable conclusions from these complex data, it is crucial to have informatics tools supporting quality control, data reduction, and data mining. These tools must reconcile the complexity of advanced analysis methods with the user-friendliness demanded by the user community. After review of existing applications, we realized the possibility of adding innovative new analysis options. Phaedra was developed to support workflows for drug screening and target discovery, interact with several laboratory information management systems, and process data generated by a range of techniques including high-content imaging, multicolor flow cytometry, and traditional high-throughput screening assays. The application is modular and flexible, with an interface that can be tuned to specific user roles. It offers user-friendly data visualization and reduction tools for HCS but also integrates Matlab for custom image analysis and the Konstanz Information Miner (KNIME) framework for data mining. Phaedra features efficient JPEG2000 compression and full drill-down functionality from dose-response curves down to individual cells, with exclusion and annotation options, cell classification, statistical quality controls, and reporting.

  9. IBEX: An open infrastructure software platform to facilitate collaborative work in radiomics

    PubMed Central

    Zhang, Lifei; Fried, David V.; Fave, Xenia J.; Hunter, Luke A.; Court, Laurence E.

    2015-01-01

    Purpose: Radiomics, which is the high-throughput extraction and analysis of quantitative image features, has been shown to have considerable potential to quantify the tumor phenotype. However, at present, a lack of software infrastructure has impeded the development of radiomics and its applications. Therefore, the authors developed the imaging biomarker explorer (IBEX), an open infrastructure software platform that flexibly supports common radiomics workflow tasks such as multimodality image data import and review, development of feature extraction algorithms, model validation, and consistent data sharing among multiple institutions. Methods: The IBEX software package was developed using the MATLAB and C/C++ programming languages. The software architecture deploys the modern model-view-controller, unit testing, and function handle programming concepts to isolate each quantitative imaging analysis task, to validate if their relevant data and algorithms are fit for use, and to plug in new modules. On one hand, IBEX is self-contained and ready to use: it has implemented common data importers, common image filters, and common feature extraction algorithms. On the other hand, IBEX provides an integrated development environment on top of MATLAB and C/C++, so users are not limited to its built-in functions. In the IBEX developer studio, users can plug in, debug, and test new algorithms, extending IBEX’s functionality. IBEX also supports quality assurance for data and feature algorithms: image data, regions of interest, and feature algorithm-related data can be reviewed, validated, and/or modified. More importantly, two key elements in collaborative workflows, the consistency of data sharing and the reproducibility of calculation results, are embedded in the IBEX workflow: image data, feature algorithms, and model validation including newly developed ones from different users can be easily and consistently shared so that results can be more easily reproduced between institutions. Results: Researchers with a variety of technical skill levels, including radiation oncologists, physicists, and computer scientists, have found the IBEX software to be intuitive, powerful, and easy to use. IBEX can be run on any computer with the Windows operating system and 1 GB RAM. The authors fully validated the implementation of all importers, preprocessing algorithms, and feature extraction algorithms. Windows version 1.0 beta of stand-alone IBEX and IBEX’s source code can be downloaded. Conclusions: The authors successfully implemented IBEX, an open infrastructure software platform that streamlines common radiomics workflow tasks. Its transparency, flexibility, and portability can greatly accelerate the pace of radiomics research and pave the way toward successful clinical translation. PMID:25735289

  10. IBEX: an open infrastructure software platform to facilitate collaborative work in radiomics.

    PubMed

    Zhang, Lifei; Fried, David V; Fave, Xenia J; Hunter, Luke A; Yang, Jinzhong; Court, Laurence E

    2015-03-01

    Radiomics, which is the high-throughput extraction and analysis of quantitative image features, has been shown to have considerable potential to quantify the tumor phenotype. However, at present, a lack of software infrastructure has impeded the development of radiomics and its applications. Therefore, the authors developed the imaging biomarker explorer (IBEX), an open infrastructure software platform that flexibly supports common radiomics workflow tasks such as multimodality image data import and review, development of feature extraction algorithms, model validation, and consistent data sharing among multiple institutions. The IBEX software package was developed using the MATLAB and C/C++ programming languages. The software architecture deploys the modern model-view-controller, unit testing, and function handle programming concepts to isolate each quantitative imaging analysis task, to validate if their relevant data and algorithms are fit for use, and to plug in new modules. On one hand, IBEX is self-contained and ready to use: it has implemented common data importers, common image filters, and common feature extraction algorithms. On the other hand, IBEX provides an integrated development environment on top of MATLAB and C/C++, so users are not limited to its built-in functions. In the IBEX developer studio, users can plug in, debug, and test new algorithms, extending IBEX's functionality. IBEX also supports quality assurance for data and feature algorithms: image data, regions of interest, and feature algorithm-related data can be reviewed, validated, and/or modified. More importantly, two key elements in collaborative workflows, the consistency of data sharing and the reproducibility of calculation results, are embedded in the IBEX workflow: image data, feature algorithms, and model validation including newly developed ones from different users can be easily and consistently shared so that results can be more easily reproduced between institutions. Researchers with a variety of technical skill levels, including radiation oncologists, physicists, and computer scientists, have found the IBEX software to be intuitive, powerful, and easy to use. IBEX can be run on any computer with the Windows operating system and 1 GB RAM. The authors fully validated the implementation of all importers, preprocessing algorithms, and feature extraction algorithms. Windows version 1.0 beta of stand-alone IBEX and IBEX's source code can be downloaded. The authors successfully implemented IBEX, an open infrastructure software platform that streamlines common radiomics workflow tasks. Its transparency, flexibility, and portability can greatly accelerate the pace of radiomics research and pave the way toward successful clinical translation.
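
    As a hedged illustration of the kind of feature-extraction module such a platform hosts, the sketch below computes a few first-order radiomic features from a masked region of interest; the image, mask, and feature set are synthetic stand-ins, not IBEX's implementation.

```python
import numpy as np

# Hypothetical CT sub-volume (Hounsfield-unit-like values) and a hypothetical tumour mask.
rng = np.random.default_rng(5)
image = rng.normal(40, 10, size=(64, 64, 16))
roi = np.zeros_like(image, dtype=bool)
roi[20:40, 20:40, 4:12] = True

voxels = image[roi]                      # intensities inside the region of interest
hist, _ = np.histogram(voxels, bins=64)
p = hist / hist.sum()

# A few first-order features commonly reported in radiomics studies.
features = {
    "mean": voxels.mean(),
    "std": voxels.std(),
    "skewness": ((voxels - voxels.mean()) ** 3).mean() / voxels.std() ** 3,
    "entropy": -np.sum(p[p > 0] * np.log2(p[p > 0])),
}
print({k: round(float(v), 3) for k, v in features.items()})
```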

  11. Standardizing clinical trials workflow representation in UML for international site comparison.

    PubMed

    de Carvalho, Elias Cesar Araujo; Jayanti, Madhav Kishore; Batilana, Adelia Portero; Kozan, Andreia M O; Rodrigues, Maria J; Shah, Jatin; Loures, Marco R; Patil, Sunita; Payne, Philip; Pietrobon, Ricardo

    2010-11-09

    With the globalization of clinical trials, a growing emphasis has been placed on the standardization of the workflow in order to ensure the reproducibility and reliability of the overall trial. Despite the importance of workflow evaluation, to our knowledge no previous studies have attempted to adapt existing modeling languages to standardize the representation of clinical trials. Unified Modeling Language (UML) is a computational language that can be used to model operational workflow, and a UML profile can be developed to standardize UML models within a given domain. This paper's objective is to develop a UML profile to extend the UML Activity Diagram schema into the clinical trials domain, defining a standard representation for clinical trial workflow diagrams in UML. Two Brazilian clinical trial sites in rheumatology and oncology were examined to model their workflow and collect time-motion data. UML modeling was conducted in Eclipse, and a UML profile was developed to incorporate information used in discrete event simulation software. Ethnographic observation revealed bottlenecks in workflow: these included tasks requiring full commitment of CRCs, transferring notes from paper to computers, deviations from standard operating procedures, and conflicts between different IT systems. Time-motion analysis revealed that nurses' activities took up the most time in the workflow and contained a high frequency of shorter duration activities. Administrative assistants performed more activities near the beginning and end of the workflow. Overall, clinical trial tasks had a greater frequency than clinic routines or other general activities. This paper describes a method for modeling clinical trial workflow in UML and standardizing these workflow diagrams through a UML profile. In the increasingly global environment of clinical trials, the standardization of workflow modeling is a necessary precursor to conducting a comparative analysis of international clinical trials workflows.

  12. Standardizing Clinical Trials Workflow Representation in UML for International Site Comparison

    PubMed Central

    de Carvalho, Elias Cesar Araujo; Jayanti, Madhav Kishore; Batilana, Adelia Portero; Kozan, Andreia M. O.; Rodrigues, Maria J.; Shah, Jatin; Loures, Marco R.; Patil, Sunita; Payne, Philip; Pietrobon, Ricardo

    2010-01-01

    Background With the globalization of clinical trials, a growing emphasis has been placed on the standardization of the workflow in order to ensure the reproducibility and reliability of the overall trial. Despite the importance of workflow evaluation, to our knowledge no previous studies have attempted to adapt existing modeling languages to standardize the representation of clinical trials. Unified Modeling Language (UML) is a computational language that can be used to model operational workflow, and a UML profile can be developed to standardize UML models within a given domain. This paper's objective is to develop a UML profile to extend the UML Activity Diagram schema into the clinical trials domain, defining a standard representation for clinical trial workflow diagrams in UML. Methods Two Brazilian clinical trial sites in rheumatology and oncology were examined to model their workflow and collect time-motion data. UML modeling was conducted in Eclipse, and a UML profile was developed to incorporate information used in discrete event simulation software. Results Ethnographic observation revealed bottlenecks in workflow: these included tasks requiring full commitment of CRCs, transferring notes from paper to computers, deviations from standard operating procedures, and conflicts between different IT systems. Time-motion analysis revealed that nurses' activities took up the most time in the workflow and contained a high frequency of shorter duration activities. Administrative assistants performed more activities near the beginning and end of the workflow. Overall, clinical trial tasks had a greater frequency than clinic routines or other general activities. Conclusions This paper describes a method for modeling clinical trial workflow in UML and standardizing these workflow diagrams through a UML profile. In the increasingly global environment of clinical trials, the standardization of workflow modeling is a necessary precursor to conducting a comparative analysis of international clinical trials workflows. PMID:21085484

  13. Barriers to critical thinking: workflow interruptions and task switching among nurses.

    PubMed

    Cornell, Paul; Riordan, Monica; Townsend-Gervis, Mary; Mobley, Robin

    2011-10-01

    Nurses are increasingly called upon to engage in critical thinking. However, current workflow inhibits this goal with frequent task switching and unpredictable demands. To assess workflow's cognitive impact, nurses were observed at 2 hospitals with different patient loads and acuity levels. Workflow on a medical/surgical and pediatric oncology unit was observed, recording tasks, tools, collaborators, and locations. Nineteen nurses were observed for a total of 85.2 hours. Tasks were short with a mean duration of 62.4 and 81.6 seconds on the 2 units. More than 50% of the recorded tasks were less than 30 seconds in length. An analysis of task sequence revealed few patterns and little pairwise repetition. Performance on specific tasks differed between the 2 units, but the character of the workflow was highly similar. The nonrepetitive flow and high amount of switching indicate nurses experience a heavy cognitive load with little uninterrupted time. This implies that nurses rarely have the conditions necessary for critical thinking.

  14. Towards a Scalable and Adaptive Application Support Platform for Large-Scale Distributed E-Sciences in High-Performance Network Environments

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wu, Chase Qishi; Zhu, Michelle Mengxia

    The advent of large-scale collaborative scientific applications has demonstrated the potential for broad scientific communities to pool globally distributed resources to produce unprecedented data acquisition, movement, and analysis. System resources including supercomputers, data repositories, computing facilities, network infrastructures, storage systems, and display devices have been increasingly deployed at national laboratories and academic institutes. These resources are typically shared by large communities of users over the Internet or dedicated networks and hence exhibit an inherent dynamic nature in their availability, accessibility, capacity, and stability. Scientific applications using either experimental facilities or computation-based simulations with various physical, chemical, climatic, and biological models feature diverse scientific workflows, as simple as linear pipelines or as complex as directed acyclic graphs, which must be executed and supported over wide-area networks with massively distributed resources. Application users oftentimes need to manually configure their computing tasks over networks in an ad hoc manner, significantly limiting the productivity of scientists and constraining the utilization of resources. The success of these large-scale distributed applications requires a highly adaptive and massively scalable workflow platform that provides automated and optimized computing and networking services. This project aims to design and develop a generic Scientific Workflow Automation and Management Platform (SWAMP), which contains a web-based user interface specially tailored for a target application, a set of user libraries, and several easy-to-use computing and networking toolkits for application scientists to conveniently assemble, execute, monitor, and control complex computing workflows in heterogeneous high-performance network environments. SWAMP will enable the automation and management of the entire process of scientific workflows with the convenience of a few mouse clicks while hiding the implementation and technical details from end users. In particular, we will consider two types of applications with distinct performance requirements: data-centric and service-centric applications. For data-centric applications, the main workflow task involves large-volume data generation, cataloging, storage, and movement, typically from supercomputers or experimental facilities to a team of geographically distributed users; for service-centric applications, the main focus of the workflow is on data archiving, preprocessing, filtering, synthesis, visualization, and other application-specific analysis. We will conduct a comprehensive comparison of existing workflow systems and choose the best-suited one with open-source code, a flexible system structure, and a large user base as the starting point for our development. Based on the chosen system, we will develop and integrate new components including a black-box design of computing modules, performance monitoring and prediction, and workflow optimization and reconfiguration, which are missing from existing workflow systems. A modular design separating specification, execution, and monitoring aspects will be adopted to establish a common generic infrastructure suited for a wide spectrum of science applications. We will further design and develop efficient workflow mapping and scheduling algorithms to optimize workflow performance in terms of minimum end-to-end delay, maximum frame rate, and highest reliability. We will develop and demonstrate the SWAMP system in a local environment, the grid network, and the 100 Gbps Advanced Network Initiative (ANI) testbed. The demonstration will target scientific applications in climate modeling and high energy physics, and the functions to be demonstrated include workflow deployment, execution, steering, and reconfiguration. Throughout the project period, we will work closely with the science communities in the fields of climate modeling and high energy physics, including the Spallation Neutron Source (SNS) and Large Hadron Collider (LHC) projects, to mature the system for production use.
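
    As a rough illustration of the minimum end-to-end delay objective mentioned above, the following Python sketch computes the critical-path length of a small workflow DAG under the simplifying assumption of unlimited parallel resources. The task names and durations are invented for the example; this is not SWAMP code.

        # Illustrative sketch only: the minimum achievable end-to-end delay of a workflow
        # DAG is its critical-path length, one of the objectives a mapping/scheduling
        # algorithm would optimize. Tasks and durations below are hypothetical.
        from functools import lru_cache

        # workflow DAG: task -> (estimated duration in seconds, list of downstream tasks)
        WORKFLOW = {
            "acquire":   (120, ["filter", "catalog"]),
            "filter":    (300, ["analyze"]),
            "catalog":   ( 60, ["analyze"]),
            "analyze":   (600, ["visualize"]),
            "visualize": ( 90, []),
        }

        @lru_cache(maxsize=None)
        def earliest_finish(task):
            """Longest (critical) path from `task` to a sink, i.e. its completion time
            assuming unlimited parallel resources."""
            duration, successors = WORKFLOW[task]
            if not successors:
                return duration
            return duration + max(earliest_finish(s) for s in successors)

        if __name__ == "__main__":
            sources = set(WORKFLOW) - {s for _, succ in WORKFLOW.values() for s in succ}
            delay = max(earliest_finish(t) for t in sources)
            print(f"minimum end-to-end delay: {delay} s")  # 120 + 300 + 600 + 90 = 1110 s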

  15. Optimizing MRI Logistics: Prospective Analysis of Performance, Efficiency, and Patient Throughput.

    PubMed

    Beker, Kevin; Garces-Descovich, Alejandro; Mangosing, Jason; Cabral-Goncalves, Ines; Hallett, Donna; Mortele, Koenraad J

    2017-10-01

    The objective of this study is to optimize MRI logistics through evaluation of MRI workflow and analysis of performance, efficiency, and patient throughput in a tertiary care academic center. For 2 weeks, workflow data from two outpatient MRI scanners were prospectively collected and stratified by value added to the process (i.e., value-added time, business value-added time, or non-value-added time). Two separate time cycles were measured: the actual MRI process cycle as well as the complete length of patient stay in the department. In addition, the impact and frequency of delays across all observations were measured. A total of 305 MRI examinations were evaluated, including body (34.1%), neurologic (28.9%), musculoskeletal (21.0%), and breast examinations (16.1%). The MRI process cycle lasted a mean of 50.97 ± 24.4 (SD) minutes per examination; the mean non-value-added time was 13.21 ± 18.77 minutes (25.87% of the total process cycle time). The mean length-of-stay cycle was 83.51 ± 33.63 minutes; the mean non-value-added time was 24.33 ± 24.84 minutes (29.14% of the total patient stay). The delay with the highest frequency (5.57%) was IV or port placement, which had a mean delay of 22.82 minutes. The delay with the greatest impact on time was MRI arthrography, for which joint injection of contrast medium was necessary but not accounted for in the schedule (mean delay, 42.2 minutes; frequency, 1.64%). Of 305 patients, 34 (11.15%) did not arrive at or before their scheduled time. Non-value-added time represents approximately one-third of the total MRI process cycle and patient length of stay. Identifying specific delays may expedite the application of targeted improvement strategies, potentially increasing revenue, efficiency, and overall patient satisfaction.
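
    The value-stream arithmetic used above (for example, 13.21 of 50.97 minutes, roughly 26% of the process cycle, being non-value-added) can be reproduced with a few lines of code. The Python sketch below uses made-up segment data and hypothetical category labels, not the study's dataset.

        # Illustrative sketch only: classify observed time segments by value category and
        # compute the non-value-added fraction of the total cycle. Data are invented.
        segments = [
            ("patient screening",     "business_value_added",  6.0),   # minutes
            ("waiting for IV access", "non_value_added",      13.2),
            ("scanner acquisition",   "value_added",          28.5),
            ("image quality check",   "business_value_added",  3.3),
        ]

        totals = {}
        for _, category, minutes in segments:
            totals[category] = totals.get(category, 0.0) + minutes

        cycle = sum(totals.values())
        nva_fraction = totals.get("non_value_added", 0.0) / cycle
        print(f"process cycle: {cycle:.1f} min; non-value-added: {100 * nva_fraction:.1f}%")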

  16. Developments in SPR Fragment Screening.

    PubMed

    Chavanieu, Alain; Pugnière, Martine

    2016-01-01

    Fragment-based approaches have played an increasing role alongside high-throughput screening in drug discovery for 15 years. The label-free biosensor technology based on surface plasmon resonance (SPR) is now sensitive and informative enough to serve during primary screens and validation steps. In this review, the authors discuss the role of SPR in fragment screening. After a brief description of the underlying principles of the technique and the main device developments, they evaluate the advantages and adaptations of SPR for fragment-based drug discovery. SPR can also be applied to challenging targets such as membrane receptors and enzymes. The high level of immobilization of the protein target and its stability are key points for a relevant screen and can be optimized using oriented immobilized proteins and regenerable sensors. Furthermore, to decrease the rate of false negatives, a selectivity test may be performed in parallel on the main target with its binding site mutated or blocked by a low off-rate ligand. Fragment-based drug design, integrated in a rational workflow led by SPR, will thus have a predominant role in the next wave of drug discovery, which could be greatly enhanced by further improvements in SPR devices.

  17. An Integrated Chemical Environment to Support 21st-Century Toxicology.

    PubMed

    Bell, Shannon M; Phillips, Jason; Sedykh, Alexander; Tandon, Arpit; Sprankle, Catherine; Morefield, Stephen Q; Shapiro, Andy; Allen, David; Shah, Ruchir; Maull, Elizabeth A; Casey, Warren M; Kleinstreuer, Nicole C

    2017-05-25

    Summary: Access to high-quality reference data is essential for the development, validation, and implementation of in vitro and in silico approaches that reduce and replace the use of animals in toxicity testing. Currently, these data must often be pooled from a variety of disparate sources to efficiently link a set of assay responses and model predictions to an outcome or hazard classification. To provide a central access point for these purposes, the National Toxicology Program Interagency Center for the Evaluation of Alternative Toxicological Methods developed the Integrated Chemical Environment (ICE) web resource. The ICE data integrator allows users to retrieve and combine data sets and to develop hypotheses through data exploration. Open-source computational workflows and models will be available for download and application to local data. ICE currently includes curated in vivo test data, reference chemical information, in vitro assay data (including Tox21™/ToxCast™ high-throughput screening data), and in silico model predictions. Users can query these data collections focusing on end points of interest such as acute systemic toxicity, endocrine disruption, skin sensitization, and many others. ICE is publicly accessible at https://ice.ntp.niehs.nih.gov. https://doi.org/10.1289/EHP1759.

  18. An Integrated Chemical Environment to Support 21st-Century Toxicology

    PubMed Central

    Bell, Shannon M.; Phillips, Jason; Sedykh, Alexander; Tandon, Arpit; Sprankle, Catherine; Morefield, Stephen Q.; Shapiro, Andy; Allen, David; Shah, Ruchir; Maull, Elizabeth A.; Casey, Warren M.

    2017-01-01

    Summary: Access to high-quality reference data is essential for the development, validation, and implementation of in vitro and in silico approaches that reduce and replace the use of animals in toxicity testing. Currently, these data must often be pooled from a variety of disparate sources to efficiently link a set of assay responses and model predictions to an outcome or hazard classification. To provide a central access point for these purposes, the National Toxicology Program Interagency Center for the Evaluation of Alternative Toxicological Methods developed the Integrated Chemical Environment (ICE) web resource. The ICE data integrator allows users to retrieve and combine data sets and to develop hypotheses through data exploration. Open-source computational workflows and models will be available for download and application to local data. ICE currently includes curated in vivo test data, reference chemical information, in vitro assay data (including Tox21™/ToxCast™ high-throughput screening data), and in silico model predictions. Users can query these data collections focusing on end points of interest such as acute systemic toxicity, endocrine disruption, skin sensitization, and many others. ICE is publicly accessible at https://ice.ntp.niehs.nih.gov. https://doi.org/10.1289/EHP1759 PMID:28557712

  19. Here and now: the intersection of computational science, quantum-mechanical simulations, and materials science

    NASA Astrophysics Data System (ADS)

    Marzari, Nicola

    The last 30 years have seen the steady and exhilarating development of powerful quantum-simulation engines for extended systems, dedicated to the solution of the Kohn-Sham equations of density-functional theory, often augmented by density-functional perturbation theory, many-body perturbation theory, time-dependent density-functional theory, dynamical mean-field theory, and quantum Monte Carlo. Their implementation on massively parallel architectures, now also leveraging GPUs and accelerators, has started a massive effort in the first-principles prediction of many, often complex, materials properties, leading the way to the exascale through the combination of HPC (high-performance computing) and HTC (high-throughput computing). Challenges and opportunities abound: complementing hardware and software investments and design; developing the materials informatics infrastructure needed to encode knowledge into complex protocols and workflows of calculations; managing and curating data; resisting the complacency of believing that we have already reached the predictive accuracy needed for materials design, or a robust level of verification of the different quantum engines. In this talk I will provide an overview of these challenges, with the ultimate prize being the computational understanding, prediction, and design of properties and performance for novel or complex materials and devices.

  20. sRNAtoolboxVM: Small RNA Analysis in a Virtual Machine.

    PubMed

    Gómez-Martín, Cristina; Lebrón, Ricardo; Rueda, Antonio; Oliver, José L; Hackenberg, Michael

    2017-01-01

    High-throughput sequencing (HTS) data for small RNAs (noncoding RNA molecules that are 20-250 nucleotides in length) can now be routinely generated by minimally equipped wet laboratories; however, the bottleneck in HTS-based research has now shifted to the analysis of such huge amounts of data. One of the reasons is that many analysis types require a Linux environment, but dedicated computers, system administrators, and bioinformaticians entail additional costs that often cannot be afforded by small to mid-sized groups or laboratories. Web servers are an alternative that can be used if the data are not subject to privacy issues (which is very often an important concern with medical data). However, in any case they are less flexible than stand-alone programs, limiting the number of workflows and analysis types that can be carried out. We show in this protocol how virtual machines can be used to overcome these problems and limitations. sRNAtoolboxVM is a virtual machine that can be executed on all common operating systems through virtualization programs like VirtualBox or VMware, providing the user with a large number of preinstalled programs like sRNAbench for small RNA analysis without the need to maintain additional servers and/or operating systems.

  1. Text mining meets workflow: linking U-Compare with Taverna

    PubMed Central

    Kano, Yoshinobu; Dobson, Paul; Nakanishi, Mio; Tsujii, Jun'ichi; Ananiadou, Sophia

    2010-01-01

    Summary: Text mining from the biomedical literature is of increasing importance, yet it is not easy for the bioinformatics community to create and run text mining workflows due to the lack of accessibility and interoperability of the text mining resources. The U-Compare system provides a wide range of bio text mining resources in a highly interoperable workflow environment where workflows can very easily be created, executed, evaluated and visualized without coding. We have linked U-Compare to Taverna, a generic workflow system, to expose text mining functionality to the bioinformatics community. Availability: http://u-compare.org/taverna.html, http://u-compare.org Contact: kano@is.s.u-tokyo.ac.jp Supplementary information: Supplementary data are available at Bioinformatics online. PMID:20709690

  2. The Evolution of Chemical High-Throughput Experimentation To Address Challenging Problems in Pharmaceutical Synthesis.

    PubMed

    Krska, Shane W; DiRocco, Daniel A; Dreher, Spencer D; Shevlin, Michael

    2017-12-19

    The structural complexity of pharmaceuticals presents a significant challenge to modern catalysis. Many published methods that work well on simple substrates often fail when attempts are made to apply them to complex drug intermediates. The use of high-throughput experimentation (HTE) techniques offers a means to overcome this fundamental challenge by facilitating the rational exploration of large arrays of catalysts and reaction conditions in a time- and material-efficient manner. Initial forays into the use of HTE in our laboratories for solving chemistry problems centered around screening of chiral precious-metal catalysts for homogeneous asymmetric hydrogenation. The success of these early efforts in developing efficient catalytic steps for late-stage development programs motivated the desire to increase the scope of this approach to encompass other high-value catalytic chemistries. Doing so, however, required significant advances in reactor and workflow design and automation to enable the effective assembly and agitation of arrays of heterogeneous reaction mixtures and retention of volatile solvents under a wide range of temperatures. Associated innovations in high-throughput analytical chemistry techniques greatly increased the efficiency and reliability of these methods. These evolved HTE techniques have been utilized extensively to develop highly innovative catalysis solutions to the most challenging problems in large-scale pharmaceutical synthesis. Starting with Pd- and Cu-catalyzed cross-coupling chemistry, subsequent efforts expanded to other valuable modern synthetic transformations such as chiral phase-transfer catalysis, photoredox catalysis, and C-H functionalization. As our experience and confidence in HTE techniques matured, we envisioned their application beyond problems in process chemistry to address the needs of medicinal chemists. Here the problem of reaction generality is felt most acutely, and HTE approaches should prove broadly enabling. However, the quantities of both time and starting materials available for chemistry troubleshooting in this space generally are severely limited. Adapting to these needs led us to invest in smaller predefined arrays of transformation-specific screening "kits" and push the boundaries of miniaturization in chemistry screening, culminating in the development of "nanoscale" reaction screening carried out in 1536-well plates. Grappling with the problem of generality also inspired the exploration of cheminformatics-driven HTE approaches such as the Chemistry Informer Libraries. These next-generation HTE methods promise to empower chemists to run orders of magnitude more experiments and enable "big data" informatics approaches to reaction design and troubleshooting. With these advances, HTE is poised to revolutionize how chemists across both industry and academia discover new synthetic methods, develop them into tools of broad utility, and apply them to problems of practical significance.

  3. Multiplexed transcriptome analysis to detect ALK, ROS1 and RET rearrangements in lung cancer

    PubMed Central

    Rogers, Toni-Maree; Arnau, Gisela Mir; Ryland, Georgina L.; Huang, Stephen; Lira, Maruja E.; Emmanuel, Yvette; Perez, Omar D.; Irwin, Darryl; Fellowes, Andrew P.; Wong, Stephen Q.; Fox, Stephen B.

    2017-01-01

    ALK, ROS1 and RET gene fusions are important predictive biomarkers for tyrosine kinase inhibitors in lung cancer. Currently, the gold standard method for gene fusion detection is Fluorescence In Situ Hybridization (FISH) and, while highly sensitive and specific, it is also labour intensive, subjective in analysis, and unable to screen a large number of gene fusions. Recent developments in high-throughput transcriptome-based methods may provide a suitable alternative to FISH as they are compatible with multiplexing and diagnostic workflows. However, the concordance between these different methods compared with FISH has not been evaluated. In this study we compared the results from three transcriptome-based platforms (Nanostring Elements, Agena LungFusion panel and ThermoFisher NGS fusion panel) to those obtained from ALK, ROS1 and RET FISH on 51 clinical specimens. Overall agreement of results ranged from 86–96% depending on the platform used. While all platforms were highly sensitive, both the Agena panel and Thermo Fisher NGS fusion panel reported minor fusions that were not detectable by FISH. Our proof-of-principle study illustrates that transcriptome-based analyses are sensitive and robust methods for detecting actionable gene fusions in lung cancer and could provide a practical alternative to FISH testing in the diagnostic setting. PMID:28181564

  4. Hyperspectral imaging applied to microbial categorization in an automated microbiology workflow

    NASA Astrophysics Data System (ADS)

    Leroux, Denis F.; Midahuen, Rony; Perrin, Guillaume; Pescatore, Jeremie; Imbaud, Pierre

    2015-07-01

    Hyperspectral imaging (HSI) is being evaluated as a pre-selection tool to categorize and localize populations of microbial colonies directly on their culture medium, in order to facilitate the microbiology workflow downstream of the incubation step. The categorization criteria were here limited to diffuse radiance spectra acquired mostly in the visible region, between 400 and 900 nm. Although the diffuse radiance signal is much broader than that acquired using vibrational techniques such as Raman and IR spectroscopy, and is limited to chromophores absorbing in the visible region, it can be acquired very quickly, allowing hyperspectral imaging of large objects (i.e., Petri dishes) with throughputs compatible with the needs of a clinical laboratory workflow. Moreover, additional cost reduction could be achieved using application-specific multispectral systems. Furthermore, recent research has shown that good discrimination power, at the species level, can be achieved at least for a small number of species. In our work, we test different culture media, with and without strong light absorption in the visible region, and report categorization results obtained when selecting end-member spectra according to a multi-parametric study (colonies, agar type). Categorization results (e.g., at the species level) are presented for two types of supervised classification algorithms, depending on whether they deliver subpixel fractional abundance information (linear spectral unmixing) or not (Spectral Angle Mapping (SAM) and Euclidean Distance (ED)). Interestingly, the performance of the two classes of algorithms differs dramatically, a trend which is not always observed. An interpretation is proposed on the basis of the agar interference and the spectral purity of the end-member spectra.
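
    For readers unfamiliar with the Spectral Angle Mapping classifier referenced above, the following Python sketch assigns each pixel spectrum to the end-member with which it forms the smallest spectral angle. It is an illustration with synthetic spectra, not the authors' implementation, and all names and values are invented.

        # Illustrative sketch only: Spectral Angle Mapper (SAM) classification of pixel
        # spectra against end-member spectra. End-members and pixels are synthetic.
        import numpy as np

        def spectral_angle(a, b):
            """Angle (radians) between two spectra; insensitive to illumination scaling."""
            cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
            return np.arccos(np.clip(cos, -1.0, 1.0))

        def sam_classify(pixels, endmembers):
            """pixels: (n, bands); endmembers: dict name -> (bands,). Returns list of labels."""
            names = list(endmembers)
            angles = np.array([[spectral_angle(p, endmembers[n]) for n in names] for p in pixels])
            return [names[i] for i in angles.argmin(axis=1)]

        if __name__ == "__main__":
            bands = np.linspace(400, 900, 50)                       # nm, visible/NIR range
            endmembers = {
                "species_A": np.exp(-((bands - 550) / 60.0) ** 2),  # synthetic reflectance peak
                "species_B": np.exp(-((bands - 700) / 60.0) ** 2),
                "agar":      np.full_like(bands, 0.3),              # flat background medium
            }
            rng = np.random.default_rng(0)
            pixels = np.array([endmembers["species_A"] * 0.8 + rng.normal(0, 0.02, bands.size),
                               endmembers["agar"] * 1.1 + rng.normal(0, 0.02, bands.size)])
            print(sam_classify(pixels, endmembers))                 # expected: species_A, agar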

  5. Monitoring data transfer latency in CMS computing operations

    DOE PAGES

    Bonacorsi, Daniele; Diotalevi, Tommaso; Magini, Nicolo; ...

    2015-12-23

    During the first LHC run, the CMS experiment collected tens of Petabytes of collision and simulated data, which need to be distributed among dozens of computing centres with low latency in order to make efficient use of the resources. While the desired level of throughput has been successfully achieved, it is still common to observe transfer workflows that cannot reach full completion in a timely manner due to a small fraction of stuck files which require operator intervention. For this reason, in 2012 the CMS transfer management system, PhEDEx, was instrumented with a monitoring system to measure file transfer latencies, and to predict the completion time for the transfer of a data set. The operators can detect abnormal patterns in transfer latencies while the transfer is still in progress, and monitor the long-term performance of the transfer infrastructure to plan the data placement strategy. Based on the data collected for one year with the latency monitoring system, we present a study on the different factors that contribute to transfer completion time. As case studies, we analyze several typical CMS transfer workflows, such as distribution of collision event data from CERN or upload of simulated event data from the Tier-2 centres to the archival Tier-1 centres. For each workflow, we present the typical patterns of transfer latencies that have been identified with the latency monitor. We identify the areas in PhEDEx where a development effort can reduce the latency, and we show how we are able to detect stuck transfers which need operator intervention. Lastly, we propose a set of metrics to alert about stuck subscriptions and prompt for manual intervention, with the aim of improving transfer completion times.
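
    A simple way to picture the stuck-transfer detection described above is sketched below in Python: unfinished files whose pending time far exceeds the typical completion time of already-finished files get flagged for operator attention. The records, threshold rule, and function names are hypothetical illustrations, not PhEDEx's actual metrics.

        # Illustrative sketch only: flag potentially "stuck" files in an ongoing transfer
        # from per-file latency records. The rule and data are invented for the example.
        from dataclasses import dataclass
        from statistics import median

        @dataclass
        class FileTransfer:
            name: str
            elapsed_hours: float   # time since the file entered the transfer queue
            done: bool

        def flag_stuck(files, factor=10.0, min_hours=24.0):
            """A file is flagged if it is unfinished and has been pending far longer than
            the median completion time of files that already finished."""
            finished = [f.elapsed_hours for f in files if f.done]
            baseline = median(finished) if finished else min_hours
            threshold = max(min_hours, factor * baseline)
            return [f.name for f in files if not f.done and f.elapsed_hours > threshold]

        if __name__ == "__main__":
            files = [FileTransfer("evt_001.root", 1.5, True),
                     FileTransfer("evt_002.root", 2.0, True),
                     FileTransfer("evt_003.root", 60.0, False),   # likely needs operator action
                     FileTransfer("evt_004.root", 3.0, False)]
            print(flag_stuck(files))   # ['evt_003.root']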

  6. Monitoring data transfer latency in CMS computing operations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bonacorsi, Daniele; Diotalevi, Tommaso; Magini, Nicolo

    During the first LHC run, the CMS experiment collected tens of Petabytes of collision and simulated data, which need to be distributed among dozens of computing centres with low latency in order to make efficient use of the resources. While the desired level of throughput has been successfully achieved, it is still common to observe transfer workflows that cannot reach full completion in a timely manner due to a small fraction of stuck files which require operator intervention. For this reason, in 2012 the CMS transfer management system, PhEDEx, was instrumented with a monitoring system to measure file transfer latencies, and to predict the completion time for the transfer of a data set. The operators can detect abnormal patterns in transfer latencies while the transfer is still in progress, and monitor the long-term performance of the transfer infrastructure to plan the data placement strategy. Based on the data collected for one year with the latency monitoring system, we present a study on the different factors that contribute to transfer completion time. As case studies, we analyze several typical CMS transfer workflows, such as distribution of collision event data from CERN or upload of simulated event data from the Tier-2 centres to the archival Tier-1 centres. For each workflow, we present the typical patterns of transfer latencies that have been identified with the latency monitor. We identify the areas in PhEDEx where a development effort can reduce the latency, and we show how we are able to detect stuck transfers which need operator intervention. Lastly, we propose a set of metrics to alert about stuck subscriptions and prompt for manual intervention, with the aim of improving transfer completion times.

  7. The Electrolyte Genome project: A big data approach in battery materials discovery

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Qu, Xiaohui; Jain, Anubhav; Rajput, Nav Nidhi

    2015-06-01

    We present a high-throughput infrastructure for the automated calculation of molecular properties with a focus on battery electrolytes. The infrastructure is largely open-source and handles both practical aspects (input file generation, output file parsing, and information management) as well as more complex problems (structure matching, salt complex generation, and failure recovery). Using this infrastructure, we have computed the ionization potential (IP) and electron affinities (EA) of 4830 molecules relevant to battery electrolytes (encompassing almost 55,000 quantum mechanics calculations) at the B3LYP/6-31+G(*) level. We describe automated workflows for computing redox potential, dissociation constant, and salt-molecule binding complex structure generation. We present routines for automatic recovery from calculation errors, which brings the failure rate from 9.2% to 0.8% for the QChem DFT code. Automated algorithms to check duplication between two arbitrary molecules and structures are described. We present benchmark data on basis sets and functionals on the G2-97 test set; one finding is that an IP/EA calculation method that combines PBE geometry optimization and B3LYP energy evaluation requires less computational cost and yields nearly identical results as compared to a full B3LYP calculation, and could be suitable for the calculation of large molecules. Our data indicate that among the 8 functionals tested, XYGJ-OS and B3LYP are the two best functionals to predict IP/EA, with an RMSE of 0.12 and 0.27 eV, respectively. Application of our automated workflow to a large set of quinoxaline derivative molecules shows that functional group effects and substitution position effects can be separated for the IP/EA of quinoxaline derivatives, and the most sensitive position is different for IP and EA. Published by Elsevier B.V.
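
    The failure-recovery idea mentioned above, retrying failed calculations with error-specific fixes, can be sketched as a small control loop. In the Python example below, the job submitter, error names, and remedies are all hypothetical stand-ins, not the authors' actual QChem error handler.

        # Illustrative sketch only: an automated error-recovery loop of the kind used to
        # push calculation failure rates down in high-throughput workflows.
        import random

        REMEDIES = {
            "scf_not_converged": {"scf_algorithm": "diis_gdm", "max_scf_cycles": 200},
            "geometry_diverged": {"restart_from_last_geometry": True, "step_size": 0.5},
        }

        def run_dft_job(molecule, settings):
            """Stand-in for submitting one DFT calculation; fails randomly to mimic real runs."""
            if random.random() < 0.3:
                return {"ok": False, "error": random.choice(list(REMEDIES))}
            return {"ok": True, "energy_hartree": -1234.567}

        def run_with_recovery(molecule, settings, max_attempts=4):
            for attempt in range(1, max_attempts + 1):
                result = run_dft_job(molecule, settings)
                if result["ok"]:
                    return result
                error = result["error"]
                if error not in REMEDIES:
                    break                                   # unknown failure: flag for human review
                settings = {**settings, **REMEDIES[error]}  # apply the known remedy and retry
                print(f"attempt {attempt}: {error}, retrying with {REMEDIES[error]}")
            raise RuntimeError(f"calculation failed after {max_attempts} attempts")

        if __name__ == "__main__":
            random.seed(1)
            print(run_with_recovery("quinoxaline", {"functional": "B3LYP", "basis": "6-31+G*"}))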

  8. Biomedical text mining and its applications in cancer research.

    PubMed

    Zhu, Fei; Patumcharoenpol, Preecha; Zhang, Cheng; Yang, Yang; Chan, Jonathan; Meechai, Asawin; Vongsangnak, Wanwipa; Shen, Bairong

    2013-04-01

    Cancer is a malignant disease that has caused millions of human deaths. Its study has a long history of well over 100 years. There have been an enormous number of publications on cancer research. This integrated but unstructured biomedical text is of great value for cancer diagnostics, treatment, and prevention. The immense body and rapid growth of biomedical text on cancer have led to the appearance of a large number of text mining techniques aimed at extracting novel knowledge from scientific text. Biomedical text mining on cancer research is computationally automatic and high-throughput in nature. However, it is error-prone due to the complexity of natural language processing. In this review, we introduce the basic concepts underlying text mining and examine some frequently used algorithms, tools, and data sets, as well as assessing how much these algorithms have been utilized. We then discuss the current state-of-the-art text mining applications in cancer research and we also provide some resources for cancer text mining. With the development of systems biology, researchers tend to understand complex biomedical systems from a systems biology viewpoint. Thus, the full utilization of text mining to facilitate cancer systems biology research is fast becoming a major concern. To address this issue, we describe the general workflow of text mining in cancer systems biology and each phase of the workflow. We hope that this review can (i) provide a useful overview of the current work in this field; (ii) help researchers to choose text mining tools and datasets; and (iii) highlight how to apply text mining to assist cancer systems biology research. Copyright © 2012 Elsevier Inc. All rights reserved.

  9. Integration of EGA secure data access into Galaxy.

    PubMed

    Hoogstrate, Youri; Zhang, Chao; Senf, Alexander; Bijlard, Jochem; Hiltemann, Saskia; van Enckevort, David; Repo, Susanna; Heringa, Jaap; Jenster, Guido; J A Fijneman, Remond; Boiten, Jan-Willem; A Meijer, Gerrit; Stubbs, Andrew; Rambla, Jordi; Spalding, Dylan; Abeln, Sanne

    2016-01-01

    High-throughput molecular profiling techniques are routinely generating vast amounts of data for translational medicine studies. Secure access-controlled systems are needed to manage, store, transfer and distribute these data due to their personally identifiable nature. The European Genome-phenome Archive (EGA) was created to facilitate access to, and management of, long-term archives of bio-molecular data. Each data provider is responsible for ensuring a Data Access Committee is in place to grant access to data stored in the EGA. Moreover, the transfer of data during upload and download is encrypted. ELIXIR, a European research infrastructure for life-science data, initiated a project (2016 Human Data Implementation Study) to understand and document the ELIXIR requirements for secure management of controlled-access data. As part of this project, a full ecosystem was designed to connect archived raw experimental molecular profiling data with interpreted data and the computational workflows, using the CTMM Translational Research IT (CTMM-TraIT) infrastructure http://www.ctmm-trait.nl as an example. Here we present the first outcomes of this project, a framework to enable the download of EGA data to a Galaxy server in a secure way. Galaxy provides an intuitive user interface for molecular biologists and bioinformaticians to run and design data analysis workflows. More specifically, we developed a tool, ega_download_streamer, that can download data securely from EGA into a Galaxy server, which can subsequently be further processed. This tool will allow a user within the browser to run an entire analysis containing sensitive data from EGA, and to make this analysis available for other researchers in a reproducible manner, as shown with a proof of concept study. The tool ega_download_streamer is available in the Galaxy tool shed: https://toolshed.g2.bx.psu.edu/view/yhoogstrate/ega_download_streamer.

  10. Integration of EGA secure data access into Galaxy

    PubMed Central

    Hoogstrate, Youri; Zhang, Chao; Senf, Alexander; Bijlard, Jochem; Hiltemann, Saskia; van Enckevort, David; Repo, Susanna; Heringa, Jaap; Jenster, Guido; Fijneman, Remond J.A.; Boiten, Jan-Willem; A. Meijer, Gerrit; Stubbs, Andrew; Rambla, Jordi; Spalding, Dylan; Abeln, Sanne

    2016-01-01

    High-throughput molecular profiling techniques are routinely generating vast amounts of data for translational medicine studies. Secure access-controlled systems are needed to manage, store, transfer and distribute these data due to their personally identifiable nature. The European Genome-phenome Archive (EGA) was created to facilitate access to, and management of, long-term archives of bio-molecular data. Each data provider is responsible for ensuring a Data Access Committee is in place to grant access to data stored in the EGA. Moreover, the transfer of data during upload and download is encrypted. ELIXIR, a European research infrastructure for life-science data, initiated a project (2016 Human Data Implementation Study) to understand and document the ELIXIR requirements for secure management of controlled-access data. As part of this project, a full ecosystem was designed to connect archived raw experimental molecular profiling data with interpreted data and the computational workflows, using the CTMM Translational Research IT (CTMM-TraIT) infrastructure http://www.ctmm-trait.nl as an example. Here we present the first outcomes of this project, a framework to enable the download of EGA data to a Galaxy server in a secure way. Galaxy provides an intuitive user interface for molecular biologists and bioinformaticians to run and design data analysis workflows. More specifically, we developed a tool, ega_download_streamer, that can download data securely from EGA into a Galaxy server, which can subsequently be further processed. This tool will allow a user within the browser to run an entire analysis containing sensitive data from EGA, and to make this analysis available for other researchers in a reproducible manner, as shown with a proof of concept study. The tool ega_download_streamer is available in the Galaxy tool shed: https://toolshed.g2.bx.psu.edu/view/yhoogstrate/ega_download_streamer. PMID:28232859

  11. Mass Spectrometry of Human Leukocyte Antigen Class I Peptidomes Reveals Strong Effects of Protein Abundance and Turnover on Antigen Presentation*

    PubMed Central

    Bassani-Sternberg, Michal; Pletscher-Frankild, Sune; Jensen, Lars Juhl; Mann, Matthias

    2015-01-01

    HLA class I molecules reflect the health state of cells to cytotoxic T cells by presenting a repertoire of endogenously derived peptides. However, the extent to which the proteome shapes the peptidome is still largely unknown. Here we present a high-throughput mass-spectrometry-based workflow that allows stringent and accurate identification of thousands of such peptides and direct determination of binding motifs. Applying the workflow to seven cancer cell lines and primary cells yielded more than 22,000 unique HLA peptides across different allelic binding specificities. By computing a score representing the HLA-I sampling density, we show a strong link between protein abundance and HLA presentation (p < 0.0001). When analyzing overpresented proteins – those with at least fivefold higher density score than expected for their abundance – we noticed that they are degraded almost 3 h faster than similar but nonpresented proteins (top 20% abundance class; median half-life 20.8 h versus 23.6 h, p < 0.0001). This validates protein degradation as an important factor for HLA presentation. Ribosomal, mitochondrial respiratory chain, and nucleosomal proteins are particularly well presented. Taking a set of proteins associated with cancer, we compared the predicted immunogenicity of previously validated T-cell epitopes with other peptides from these proteins in our data set. The validated epitopes indeed tend to have higher immunogenicity scores than the other detected HLA peptides. Remarkably, we identified five mutated peptides from a human colon cancer cell line, which have very recently been predicted to be HLA-I binders. Altogether, we demonstrate the usefulness of combining MS analysis with immunogenicity prediction for identifying, ranking, and selecting peptides for therapeutic use. PMID:25576301
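
    A toy version of the sampling-density idea can clarify the scoring logic: compare the number of HLA peptides observed per protein with the number expected from that protein's share of total abundance. The Python sketch below uses a made-up scoring rule and invented data; it is not the authors' exact method.

        # Illustrative sketch only: a simple "sampling density"-style score relating
        # detected HLA peptides per protein to protein abundance. Rule and data invented.
        import numpy as np

        def sampling_density(peptide_counts, abundances):
            """log2 ratio of observed peptide count to the count expected from abundance alone,
            where the expectation is proportional to each protein's share of total abundance."""
            counts = np.asarray(peptide_counts, dtype=float)
            abund = np.asarray(abundances, dtype=float)
            expected = counts.sum() * abund / abund.sum()
            return np.log2((counts + 1) / (expected + 1))   # +1 pseudocounts for zero-peptide proteins

        if __name__ == "__main__":
            proteins = ["RPL13", "NDUFA4", "ALB", "TTN"]
            counts = [12, 9, 1, 0]                 # detected HLA peptides per protein (invented)
            abundances = [5e6, 2e6, 8e6, 1e6]      # e.g., iBAQ-style abundance estimates (invented)
            for p, s in zip(proteins, sampling_density(counts, abundances)):
                print(f"{p}: density score {s:+.2f}")   # positive = overpresented for its abundance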

  12. Genomics Virtual Laboratory: A Practical Bioinformatics Workbench for the Cloud

    PubMed Central

    Afgan, Enis; Sloggett, Clare; Goonasekera, Nuwan; Makunin, Igor; Benson, Derek; Crowe, Mark; Gladman, Simon; Kowsar, Yousef; Pheasant, Michael; Horst, Ron; Lonie, Andrew

    2015-01-01

    Background Analyzing high throughput genomics data is a complex and compute intensive task, generally requiring numerous software tools and large reference data sets, tied together in successive stages of data transformation and visualisation. A computational platform enabling best practice genomics analysis ideally meets a number of requirements, including: a wide range of analysis and visualisation tools, closely linked to large user and reference data sets; workflow platform(s) enabling accessible, reproducible, portable analyses, through a flexible set of interfaces; highly available, scalable computational resources; and flexibility and versatility in the use of these resources to meet demands and expertise of a variety of users. Access to an appropriate computational platform can be a significant barrier to researchers, as establishing such a platform requires a large upfront investment in hardware, experience, and expertise. Results We designed and implemented the Genomics Virtual Laboratory (GVL) as a middleware layer of machine images, cloud management tools, and online services that enable researchers to build arbitrarily sized compute clusters on demand, pre-populated with fully configured bioinformatics tools, reference datasets and workflow and visualisation options. The platform is flexible in that users can conduct analyses through web-based (Galaxy, RStudio, IPython Notebook) or command-line interfaces, and add/remove compute nodes and data resources as required. Best-practice tutorials and protocols provide a path from introductory training to practice. The GVL is available on the OpenStack-based Australian Research Cloud (http://nectar.org.au) and the Amazon Web Services cloud. The principles, implementation and build process are designed to be cloud-agnostic. Conclusions This paper provides a blueprint for the design and implementation of a cloud-based Genomics Virtual Laboratory. We discuss scope, design considerations and technical and logistical constraints, and explore the value added to the research community through the suite of services and resources provided by our implementation. PMID:26501966

  13. Genomics Virtual Laboratory: A Practical Bioinformatics Workbench for the Cloud.

    PubMed

    Afgan, Enis; Sloggett, Clare; Goonasekera, Nuwan; Makunin, Igor; Benson, Derek; Crowe, Mark; Gladman, Simon; Kowsar, Yousef; Pheasant, Michael; Horst, Ron; Lonie, Andrew

    2015-01-01

    Analyzing high throughput genomics data is a complex and compute intensive task, generally requiring numerous software tools and large reference data sets, tied together in successive stages of data transformation and visualisation. A computational platform enabling best practice genomics analysis ideally meets a number of requirements, including: a wide range of analysis and visualisation tools, closely linked to large user and reference data sets; workflow platform(s) enabling accessible, reproducible, portable analyses, through a flexible set of interfaces; highly available, scalable computational resources; and flexibility and versatility in the use of these resources to meet demands and expertise of a variety of users. Access to an appropriate computational platform can be a significant barrier to researchers, as establishing such a platform requires a large upfront investment in hardware, experience, and expertise. We designed and implemented the Genomics Virtual Laboratory (GVL) as a middleware layer of machine images, cloud management tools, and online services that enable researchers to build arbitrarily sized compute clusters on demand, pre-populated with fully configured bioinformatics tools, reference datasets and workflow and visualisation options. The platform is flexible in that users can conduct analyses through web-based (Galaxy, RStudio, IPython Notebook) or command-line interfaces, and add/remove compute nodes and data resources as required. Best-practice tutorials and protocols provide a path from introductory training to practice. The GVL is available on the OpenStack-based Australian Research Cloud (http://nectar.org.au) and the Amazon Web Services cloud. The principles, implementation and build process are designed to be cloud-agnostic. This paper provides a blueprint for the design and implementation of a cloud-based Genomics Virtual Laboratory. We discuss scope, design considerations and technical and logistical constraints, and explore the value added to the research community through the suite of services and resources provided by our implementation.

  14. A deep transcriptomic resource for the copepod crustacean Labidocera madurae: A potential indicator species for assessing near shore ecosystem health

    PubMed Central

    Christie, Andrew E.; Sommer, Stephanie A.; Cieslak, Matthew C.; Hartline, Daniel K.; Lenz, Petra H.

    2017-01-01

    Coral reef ecosystems of many sub-tropical and tropical marine coastal environments have suffered significant degradation from anthropogenic sources. Research to inform management strategies that mitigate stressors and promote a healthy ecosystem has focused on the ecology and physiology of coral reefs and associated organisms. Few studies focus on the surrounding pelagic communities, which are equally important to ecosystem function. Zooplankton, often dominated by small crustaceans such as copepods, is an important food source for invertebrates and fishes, especially larval fishes. The reef-associated zooplankton includes a sub-neustonic copepod family that could serve as an indicator species for the community. Here, we describe the generation of a de novo transcriptome for one such copepod, Labidocera madurae, a pontellid from an intensively-studied coral reef ecosystem, Kāne‘ohe Bay, Oahu, Hawai‘i. The transcriptome was assembled using high-throughput sequence data obtained from whole organisms. It comprised 211,002 unique transcripts, including 72,391 with coding regions. It was assessed for quality and completeness using multiple workflows. Benchmarking Universal Single-Copy Orthologs (BUSCO) analysis identified transcripts for 88% of expected eukaryotic core proteins. Targeted gene-discovery analyses included searches for transcripts coding full-length “giant” proteins (>4,000 amino acids), proteins and splice variants of voltage-gated sodium channels, and proteins involved in the circadian signaling pathway. Four different reference transcriptomes were generated and compared for the detection of differential gene expression between copepodites and adult females; 6,229 genes were consistently identified as differentially expressed between the two regardless of reference. Automated bioinformatics analyses and targeted manual gene curation suggest that the de novo assembled L. madurae transcriptome is of high quality and completeness. This transcriptome provides a new resource for assessing the global physiological status of a planktonic species inhabiting a coral reef ecosystem that is subjected to multiple anthropogenic stressors. The workflows provide a template for generating and assessing transcriptomes in other non-model species. PMID:29065152

  15. A deep transcriptomic resource for the copepod crustacean Labidocera madurae: A potential indicator species for assessing near shore ecosystem health.

    PubMed

    Roncalli, Vittoria; Christie, Andrew E; Sommer, Stephanie A; Cieslak, Matthew C; Hartline, Daniel K; Lenz, Petra H

    2017-01-01

    Coral reef ecosystems of many sub-tropical and tropical marine coastal environments have suffered significant degradation from anthropogenic sources. Research to inform management strategies that mitigate stressors and promote a healthy ecosystem has focused on the ecology and physiology of coral reefs and associated organisms. Few studies focus on the surrounding pelagic communities, which are equally important to ecosystem function. Zooplankton, often dominated by small crustaceans such as copepods, is an important food source for invertebrates and fishes, especially larval fishes. The reef-associated zooplankton includes a sub-neustonic copepod family that could serve as an indicator species for the community. Here, we describe the generation of a de novo transcriptome for one such copepod, Labidocera madurae, a pontellid from an intensively-studied coral reef ecosystem, Kāne'ohe Bay, Oahu, Hawai'i. The transcriptome was assembled using high-throughput sequence data obtained from whole organisms. It comprised 211,002 unique transcripts, including 72,391 with coding regions. It was assessed for quality and completeness using multiple workflows. Benchmarking Universal Single-Copy Orthologs (BUSCO) analysis identified transcripts for 88% of expected eukaryotic core proteins. Targeted gene-discovery analyses included searches for transcripts coding full-length "giant" proteins (>4,000 amino acids), proteins and splice variants of voltage-gated sodium channels, and proteins involved in the circadian signaling pathway. Four different reference transcriptomes were generated and compared for the detection of differential gene expression between copepodites and adult females; 6,229 genes were consistently identified as differentially expressed between the two regardless of reference. Automated bioinformatics analyses and targeted manual gene curation suggest that the de novo assembled L. madurae transcriptome is of high quality and completeness. This transcriptome provides a new resource for assessing the global physiological status of a planktonic species inhabiting a coral reef ecosystem that is subjected to multiple anthropogenic stressors. The workflows provide a template for generating and assessing transcriptomes in other non-model species.

  16. Multifunctional poly(methacrylate) polyplex libraries: A platform for gene delivery inspired by nature.

    PubMed

    Favretto, M E; Krieg, A; Schubert, S; Schubert, U S; Brock, R

    2015-07-10

    Polymer-based gene delivery systems have enormous potential in biomedicine, but their efficiency is often limited by poor biocompatibility. Poly(methacrylate)s (PMAs) are an interesting class of polymers that allow exploration of structure-activity relationships of polymer functionalities for polyplex formation in oligonucleotide delivery. Here, we synthesized and tested a library of PMA polymers containing functional groups contributing to the different steps of gene delivery, from oligonucleotide complexation to cellular internalization and endosomal escape. By variation of the molar ratios of the individual building blocks, the physicochemical properties of the polymers and polyplexes were fine-tuned to reduce toxicity as well as to increase activity of the polyplexes. To further enhance transfection efficiency, a cell-penetrating peptide (CPP)-like functionality was introduced on the polymeric backbone. With the ability to synthesize large libraries of polymers in parallel, we also developed a workflow for mid-to-high-throughput screening, focusing first on safety parameters that are accessible by high-throughput approaches, such as blood compatibility and toxicity towards host cells, and only at a later stage on more laborious tests of the ability to deliver oligonucleotides. To arrive at a better understanding of the molecular basis of activity, the effect of the presence of heparan sulfates on the surface of host cells was assessed, and the mechanism of cell entry and intracellular trafficking was investigated for those polymers that showed a suitable pharmacological profile. Following endocytic uptake, rapid endosomal release occurred. Interestingly, the presence of heparan sulfates on the cell surface had a negative impact on the activity of those polyplexes that were sensitive to decomplexation by heparin in solution. In summary, the screening approach identified two polymers which form polyplexes with high stability and transfection capacity exceeding that of poly(ethylene imine), also in the presence of serum. Copyright © 2015 Elsevier B.V. All rights reserved.

  17. High-throughput neuroimaging-genetics computational infrastructure

    PubMed Central

    Dinov, Ivo D.; Petrosyan, Petros; Liu, Zhizhong; Eggert, Paul; Hobel, Sam; Vespa, Paul; Woo Moon, Seok; Van Horn, John D.; Franco, Joseph; Toga, Arthur W.

    2014-01-01

    Many contemporary neuroscientific investigations face significant challenges in terms of data management, computational processing, data mining, and results interpretation. These four pillars define the core infrastructure necessary to plan, organize, orchestrate, validate, and disseminate novel scientific methods, computational resources, and translational healthcare findings. Data management includes protocols for data acquisition, archival, query, transfer, retrieval, and aggregation. Computational processing involves the necessary software, hardware, and networking infrastructure required to handle large amounts of heterogeneous neuroimaging, genetics, clinical, and phenotypic data and meta-data. Data mining refers to the process of automatically extracting data features, characteristics and associations, which are not readily visible by human exploration of the raw dataset. Results interpretation includes scientific visualization, community validation, and reproducibility of findings. In this manuscript we describe the novel high-throughput neuroimaging-genetics computational infrastructure available at the Institute for Neuroimaging and Informatics (INI) and the Laboratory of Neuro Imaging (LONI) at the University of Southern California (USC). INI and LONI include ultra-high-field and standard-field MRI brain scanners along with an imaging-genetics database for storing the complete provenance of the raw and derived data and meta-data. In addition, the institute provides a large number of software tools for image and shape analysis, mathematical modeling, genomic sequence processing, and scientific visualization. A unique feature of this architecture is the Pipeline environment, which integrates data management, processing, transfer, and visualization. Through its client-server architecture, the Pipeline environment provides a graphical user interface for designing, executing, monitoring, validating, and disseminating complex protocols that utilize diverse suites of software tools and web-services. These pipeline workflows are represented as portable XML objects which transfer the execution instructions and user specifications from the client user machine to remote pipeline servers for distributed computing. Using Alzheimer's and Parkinson's data, we provide several examples of translational applications using this infrastructure. PMID:24795619

  18. Preparation of Low-Input and Ligation-Free ChIP-seq Libraries Using Template-Switching Technology.

    PubMed

    Bolduc, Nathalie; Lehman, Alisa P; Farmer, Andrew

    2016-10-10

    Chromatin immunoprecipitation (ChIP) followed by high-throughput sequencing (ChIP-seq) has become the gold standard for mapping of transcription factors and histone modifications throughout the genome. However, for ChIP experiments involving few cells or targeting low-abundance transcription factors, the small amount of DNA recovered makes ligation of adapters very challenging. In this unit, we describe a ChIP-seq workflow that can be applied to small cell numbers, including a robust single-tube and ligation-free method for preparation of sequencing libraries from sub-nanogram amounts of ChIP DNA. An example ChIP protocol is first presented, resulting in selective enrichment of DNA-binding proteins and cross-linked DNA fragments immobilized on beads via an antibody bridge. This is followed by a protocol for fast and easy cross-linking reversal and DNA recovery. Finally, we describe a fast, ligation-free library preparation protocol, featuring DNA SMART technology, resulting in samples ready for Illumina sequencing. Copyright © 2016 John Wiley & Sons, Inc.

  19. LipidMiner: A Software for Automated Identification and Quantification of Lipids from Multiple Liquid Chromatography-Mass Spectrometry Data Files

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Meng, Da; Zhang, Qibin; Gao, Xiaoli

    2014-04-30

    We have developed a tool for automated, high-throughput analysis of LC-MS/MS data files, which greatly simplifies LC-MS based lipidomics analysis. Our results showed that LipidMiner is accurate and comprehensive in identification and quantification of lipid molecular species. In addition, the workflow implemented in LipidMiner is not limited to identification and quantification of lipids. If a suitable metabolite library is implemented in the library matching module, LipidMiner could be reconfigured as a tool for general metabolomics data analysis. It is of note that LipidMiner currently is limited to singly charged ions, although this is adequate for the purpose of lipidomics since lipids are rarely multiply charged [14], even for the polyphosphoinositides. LipidMiner also processes only the file format generated by Thermo mass spectrometers, i.e., the .RAW format. In the future, we plan to accommodate file formats generated by mass spectrometers from other predominant instrument vendors to make this tool more universal.
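
    The library-matching step mentioned above compares observed features against a lipid library; a common way to do this for singly charged ions is to match measured m/z values to theoretical values within a mass-accuracy tolerance. The following sketch illustrates only that general approach; the tolerance, adducts, and library entries are assumptions, not LipidMiner defaults.

```python
def ppm_error(observed_mz, theoretical_mz):
    """Mass accuracy in parts per million."""
    return 1e6 * (observed_mz - theoretical_mz) / theoretical_mz

def match_features(features, library, tolerance_ppm=10.0):
    """Match observed singly charged [M+H]+ features to a lipid library.

    `features` is a list of observed m/z values; `library` maps lipid names
    to theoretical m/z values. Both the tolerance and the library content
    are illustrative assumptions, not LipidMiner defaults.
    """
    matches = []
    for mz in features:
        for name, theo in library.items():
            error = ppm_error(mz, theo)
            if abs(error) <= tolerance_ppm:
                matches.append((mz, name, round(error, 2)))
    return matches

library = {"PC(34:1) [M+H]+": 760.5851, "PE(36:2) [M+H]+": 744.5538}
print(match_features([760.5855, 744.5600], library))
```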

  20. Nanopublications for exposing experimental data in the life-sciences: a Huntington's Disease case study.

    PubMed

    Mina, Eleni; Thompson, Mark; Kaliyaperumal, Rajaram; Zhao, Jun; der Horst, van Eelke; Tatum, Zuotian; Hettne, Kristina M; Schultes, Erik A; Mons, Barend; Roos, Marco

    2015-01-01

    Data from high throughput experiments often produce far more results than can ever appear in the main text or tables of a single research article. In these cases, the majority of new associations are often archived either as supplemental information in an arbitrary format or in publisher-independent databases that can be difficult to find. These data are not only lost from scientific discourse, but are also elusive to automated search, retrieval and processing. Here, we use the nanopublication model to make scientific assertions that were concluded from a workflow analysis of Huntington's Disease data machine-readable, interoperable, and citable. We followed the nanopublication guidelines to semantically model our assertions as well as their provenance metadata and authorship. We demonstrate interoperability by linking nanopublication provenance to the Research Object model. These results indicate that nanopublications can provide an incentive for researchers to expose data that is interoperable and machine-readable for future use and preservation for which they can get credits for their effort. Nanopublications can have a leading role into hypotheses generation offering opportunities to produce large-scale data integration.
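
    A nanopublication bundles a single assertion with its provenance and publication information as separate named graphs. The sketch below mirrors that three-part structure with plain Python dictionaries purely for illustration; the URIs and the example triple are placeholders, and real nanopublications are expressed as RDF named graphs rather than Python objects.

```python
# A minimal, illustrative representation of a nanopublication's three named
# graphs (assertion, provenance, publication info). All URIs are hypothetical
# placeholders; the example association is not a real finding.
nanopub = {
    "assertion": [
        ("ex:GENE_X", "ex:isAssociatedWith", "ex:HuntingtonsDisease"),
    ],
    "provenance": [
        ("ex:assertion", "prov:wasDerivedFrom", "ex:workflow-run-42"),
        ("ex:assertion", "prov:wasGeneratedBy", "ex:analysis-workflow"),
    ],
    "publication_info": [
        ("ex:nanopub", "dct:creator", "orcid:0000-0000-0000-0000"),
        ("ex:nanopub", "dct:created", "2015-01-01"),
    ],
}

for graph, triples in nanopub.items():
    print(graph)
    for s, p, o in triples:
        print(f"  {s} {p} {o}")
```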

  1. Standardization in synthetic biology: an engineering discipline coming of age.

    PubMed

    Decoene, Thomas; De Paepe, Brecht; Maertens, Jo; Coussement, Pieter; Peters, Gert; De Maeseneire, Sofie L; De Mey, Marjan

    2018-08-01

    Leaping DNA read-and-write technologies, and extensive automation and miniaturization are radically transforming the field of biological experimentation by providing the tools that enable the cost-effective high-throughput required to address the enormous complexity of biological systems. However, standardization of the synthetic biology workflow has not kept abreast with dwindling technical and resource constraints, leading, for example, to the collection of multi-level and multi-omics large data sets that end up disconnected or remain under- or even unexploited. In this contribution, we critically evaluate the various efforts, and the (limited) success thereof, in order to introduce standards for defining, designing, assembling, characterizing, and sharing synthetic biology parts. The causes for this success or the lack thereof, as well as possible solutions to overcome these, are discussed. Akin to other engineering disciplines, extensive standardization will undoubtedly speed-up and reduce the cost of bioprocess development. In this respect, further implementation of synthetic biology standards will be crucial for the field in order to redeem its promise, i.e. to enable predictable forward engineering.

  2. Analytical workflow profiling gene expression in murine macrophages

    PubMed Central

    Nixon, Scott E.; González-Peña, Dianelys; Lawson, Marcus A.; McCusker, Robert H.; Hernandez, Alvaro G.; O’Connor, Jason C.; Dantzer, Robert; Kelley, Keith W.

    2015-01-01

    Comprehensive and simultaneous analysis of all genes in a biological sample is a capability of RNA-Seq technology. Analysis of the entire transcriptome benefits from summarization of genes at the functional level. As a cellular response of interest not previously explored with RNA-Seq, peritoneal macrophages from mice under two conditions (control and immunologically challenged) were analyzed for gene expression differences. Quantification of individual transcripts modeled RNA-Seq read distribution and uncertainty (using a Beta Negative Binomial distribution); transcripts were then tested for differential expression (False Discovery Rate-adjusted p-value < 0.05). Enrichment of functional categories utilized the list of differentially expressed genes. A total of 2079 differentially expressed transcripts representing 1884 genes were detected. The 92 enriched categories from Gene Ontology Biological Processes and Molecular Functions and from KEGG pathways were grouped into 6 clusters. Clusters included defense and inflammatory response (Enrichment Score = 11.24) and ribosomal activity (Enrichment Score = 17.89). Our work provides a context to the fine detail of individual gene expression differences in murine peritoneal macrophages during immunological challenge with high throughput RNA-Seq. PMID:25708305
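
    Differential expression above was called at an FDR-adjusted p-value below 0.05. The sketch below shows a standard Benjamini-Hochberg adjustment, which is one common way to obtain such FDR-adjusted p-values; it is not necessarily the exact implementation used in the study, and the example p-values are made up.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (FDR)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity of the adjusted values.
    for rank_from_end, idx in enumerate(reversed(order)):
        rank = m - rank_from_end            # 1-based rank of this p-value
        value = p_values[idx] * m / rank
        running_min = min(running_min, value)
        adjusted[idx] = min(running_min, 1.0)
    return adjusted

p = [0.001, 0.04, 0.03, 0.2, 0.5]          # hypothetical per-transcript p-values
adj = benjamini_hochberg(p)
significant = [i for i, q in enumerate(adj) if q < 0.05]
print(adj, significant)
```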

  3. Acoustic Sample Deposition MALDI-MS (ASD-MALDI-MS): A Novel Process Flow for Quality Control Screening of Compound Libraries.

    PubMed

    Chin, Jefferson; Wood, Elizabeth; Peters, Grace S; Drexler, Dieter M

    2016-02-01

    In the early stages of drug discovery, high-throughput screening (HTS) of compound libraries against pharmaceutical targets is a common method to identify potential lead molecules. For these HTS campaigns to be efficient and successful, continuous quality control of the compound collection is necessary and crucial. However, the large number of compound samples and the limited sample amount pose unique challenges. Presented here is a proof-of-concept study for a novel process flow for the quality control screening of small-molecule compound libraries that consumes only minimal amounts of samples and affords compound-specific molecular data. This process employs an acoustic sample deposition (ASD) technique for the offline sample preparation by depositing nanoliter volumes in an array format onto microscope glass slides followed by matrix-assisted laser desorption/ionization mass spectrometric (MALDI-MS) analysis. An initial study of a 384-compound array employing the ASD-MALDI-MS workflow resulted in a 75% first-pass positive identification rate with an analysis time of <1 s per sample. © 2015 Society for Laboratory Automation and Screening.

  4. Experimental evaluation of a flexible I/O architecture for accelerating workflow engines in ultrascale environments

    DOE PAGES

    Duro, Francisco Rodrigo; Blas, Javier Garcia; Isaila, Florin; ...

    2016-10-06

    The increasing volume of scientific data and the limited scalability and performance of storage systems are currently presenting a significant limitation for the productivity of the scientific workflows running on both high-performance computing (HPC) and cloud platforms. Clearly needed is better integration of storage systems and workflow engines to address this problem. This paper presents and evaluates a novel solution that leverages codesign principles for integrating Hercules—an in-memory data store—with a workflow management system. We consider four main aspects: workflow representation, task scheduling, task placement, and task termination. As a result, the experimental evaluation on both cloud and HPC systems demonstrates significant performance and scalability improvements over existing state-of-the-art approaches.
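
    One of the codesign aspects named above is task placement, where tasks are ideally scheduled close to the data held in the in-memory store. The sketch below shows a generic locality-aware placement heuristic for illustration only; it is not the Hercules or workflow-engine scheduler, and the node and dataset names are hypothetical.

```python
def place_task(task_inputs, node_data, nodes):
    """Pick the node holding the most of a task's input data (locality-aware).

    `node_data` maps node name -> set of dataset ids held in its in-memory
    store. This greedy heuristic is illustrative only and unrelated to the
    actual Hercules/workflow-engine scheduler.
    """
    def overlap(node):
        return len(task_inputs & node_data.get(node, set()))
    return max(nodes, key=overlap)

nodes = ["node-a", "node-b", "node-c"]
node_data = {"node-a": {"d1"}, "node-b": {"d1", "d2"}, "node-c": set()}
print(place_task({"d1", "d2"}, node_data, nodes))  # -> node-b
```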

  5. Workflow technology: the new frontier. How to overcome the barriers and join the future.

    PubMed

    Shefter, Susan M

    2006-01-01

    Hospitals are catching up to the business world in the introduction of technology systems that support professional practice and workflow. The field of case management is highly complex and interrelates with diverse groups in diverse locations. The last few years have seen the introduction of Workflow Technology Tools, which can improve the quality and efficiency of discharge planning by the case manager. Despite the availability of these wonderful new programs, many case managers are hesitant to adopt the new technology and workflow. For a myriad of reasons, a computer-based workflow system can seem like a brick wall. This article discusses, from a practitioner's point of view, how professionals can gain confidence and skill to get around the brick wall and join the future.

  6. Standardization and quality management in next-generation sequencing.

    PubMed

    Endrullat, Christoph; Glökler, Jörn; Franke, Philipp; Frohme, Marcus

    2016-09-01

    DNA sequencing continues to evolve quickly even after > 30 years. Many new platforms suddenly appeared and former established systems have vanished in almost the same manner. Since establishment of next-generation sequencing devices, this progress gains momentum due to the continually growing demand for higher throughput, lower costs and better quality of data. In consequence of this rapid development, standardized procedures and data formats as well as comprehensive quality management considerations are still scarce. Here, we listed and summarized current standardization efforts and quality management initiatives from companies, organizations and societies in form of published studies and ongoing projects. These comprise on the one hand quality documentation issues like technical notes, accreditation checklists and guidelines for validation of sequencing workflows. On the other hand, general standard proposals and quality metrics are developed and applied to the sequencing workflow steps with the main focus on upstream processes. Finally, certain standard developments for downstream pipeline data handling, processing and storage are discussed in brief. These standardization approaches represent a first basis for continuing work in order to prospectively implement next-generation sequencing in important areas such as clinical diagnostics, where reliable results and fast processing is crucial. Additionally, these efforts will exert a decisive influence on traceability and reproducibility of sequence data.

  7. The combined positive impact of Lean methodology and Ventana Symphony autostainer on histology lab workflow

    PubMed Central

    2010-01-01

    Background Histologic samples all funnel through the H&E microtomy staining area. Here manual processes intersect with semi-automated processes creating a bottleneck. We compare alternate work processes in anatomic pathology primarily in the H&E staining work cell. Methods We established a baseline measure of H&E process impact on personnel, information management and sample flow from historical workload and production data and direct observation. We compared this to performance after implementing initial Lean process modifications, including workstation reorganization, equipment relocation and workflow levelling, and the Ventana Symphony stainer to assess the impact on productivity in the H&E staining work cell. Results Average time from gross station to assembled case decreased by 2.9 hours (12%). Total process turnaround time (TAT) exclusive of processor schedule changes decreased 48 minutes/case (4%). Mean quarterly productivity increased 8.5% with the new methods. Process redesign reduced the number of manual steps from 219 to 182, a 17% reduction. Specimen travel distance was reduced from 773 ft/case to 395 ft/case (49%) overall, and from 92 to 53 ft/case in the H&E cell (42% improvement). Conclusions Implementation of Lean methods in the H&E work cell of histology can result in improved productivity, improved through-put and case availability parameters including TAT. PMID:20181123

  8. Optimization of tomographic reconstruction workflows on geographically distributed resources

    DOE PAGES

    Bicer, Tekin; Gursoy, Doga; Kettimuthu, Rajkumar; ...

    2016-01-01

    New technological advancements in synchrotron light sources enable data acquisitions at unprecedented levels. This emergent trend affects not only the size of the generated data but also the need for larger computational resources. Although beamline scientists and users have access to local computational resources, these are typically limited and can result in extended execution times. Applications that are based on iterative processing as in tomographic reconstruction methods require high-performance compute clusters for timely analysis of data. Here, time-sensitive analysis and processing of Advanced Photon Source data on geographically distributed resources are focused on. Two main challenges are considered: (i) modeling of the performance of tomographic reconstruction workflows and (ii) transparent execution of these workflows on distributed resources. For the former, three main stages are considered: (i) data transfer between storage and computational resources, (ii) wait/queue time of reconstruction jobs at compute resources, and (iii) computation of reconstruction tasks. These performance models allow evaluation and estimation of the execution time of any given iterative tomographic reconstruction workflow that runs on geographically distributed resources. For the latter challenge, a workflow management system is built, which can automate the execution of workflows and minimize the user interaction with the underlying infrastructure. The system utilizes Globus to perform secure and efficient data transfer operations. The proposed models and the workflow management system are evaluated by using three high-performance computing and two storage resources, all of which are geographically distributed. Workflows were created with different computational requirements using two compute-intensive tomographic reconstruction algorithms. Experimental evaluation shows that the proposed models and system can be used for selecting the optimum resources, which in turn can provide up to 3.13× speedup (on experimented resources). Furthermore, the error rates of the models range between 2.1 and 23.3% (considering workflow execution times), where the accuracy of the model estimations increases with higher computational demands in reconstruction tasks.
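
    The performance model above decomposes a workflow's execution time into data transfer, queue wait, and computation. The sketch below illustrates how such a model can be used to pick the fastest site; the formula is a simplification and every parameter value is hypothetical, whereas the published models are fitted to measured transfer rates, queue statistics, and reconstruction costs.

```python
def estimate_runtime(data_gb, bandwidth_gbps, queue_wait_s, work_units, rate_units_per_s):
    """Total time = transfer + queue wait + computation (all in seconds).

    Parameter values used below are hypothetical; real models would be fitted
    to measured network, scheduler, and reconstruction performance.
    """
    transfer = data_gb * 8.0 / bandwidth_gbps      # seconds to move the dataset
    compute = work_units / rate_units_per_s        # seconds of reconstruction work
    return transfer + queue_wait_s + compute

sites = {
    "cluster-A": estimate_runtime(200, 10.0, 600, 5_000, 4.0),
    "cluster-B": estimate_runtime(200, 1.0, 60, 5_000, 8.0),
}
best = min(sites, key=sites.get)
print(sites, "->", best)
```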

  9. Optimization of tomographic reconstruction workflows on geographically distributed resources

    PubMed Central

    Bicer, Tekin; Gürsoy, Doǧa; Kettimuthu, Rajkumar; De Carlo, Francesco; Foster, Ian T.

    2016-01-01

    New technological advancements in synchrotron light sources enable data acquisitions at unprecedented levels. This emergent trend affects not only the size of the generated data but also the need for larger computational resources. Although beamline scientists and users have access to local computational resources, these are typically limited and can result in extended execution times. Applications that are based on iterative processing as in tomographic reconstruction methods require high-performance compute clusters for timely analysis of data. Here, time-sensitive analysis and processing of Advanced Photon Source data on geographically distributed resources are focused on. Two main challenges are considered: (i) modeling of the performance of tomographic reconstruction workflows and (ii) transparent execution of these workflows on distributed resources. For the former, three main stages are considered: (i) data transfer between storage and computational resources, (ii) wait/queue time of reconstruction jobs at compute resources, and (iii) computation of reconstruction tasks. These performance models allow evaluation and estimation of the execution time of any given iterative tomographic reconstruction workflow that runs on geographically distributed resources. For the latter challenge, a workflow management system is built, which can automate the execution of workflows and minimize the user interaction with the underlying infrastructure. The system utilizes Globus to perform secure and efficient data transfer operations. The proposed models and the workflow management system are evaluated by using three high-performance computing and two storage resources, all of which are geographically distributed. Workflows were created with different computational requirements using two compute-intensive tomographic reconstruction algorithms. Experimental evaluation shows that the proposed models and system can be used for selecting the optimum resources, which in turn can provide up to 3.13× speedup (on experimented resources). Moreover, the error rates of the models range between 2.1 and 23.3% (considering workflow execution times), where the accuracy of the model estimations increases with higher computational demands in reconstruction tasks. PMID:27359149

  10. Optimization of tomographic reconstruction workflows on geographically distributed resources

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bicer, Tekin; Gursoy, Doga; Kettimuthu, Rajkumar

    New technological advancements in synchrotron light sources enable data acquisitions at unprecedented levels. This emergent trend affects not only the size of the generated data but also the need for larger computational resources. Although beamline scientists and users have access to local computational resources, these are typically limited and can result in extended execution times. Applications that are based on iterative processing as in tomographic reconstruction methods require high-performance compute clusters for timely analysis of data. Here, time-sensitive analysis and processing of Advanced Photon Source data on geographically distributed resources are focused on. Two main challenges are considered: (i) modeling of the performance of tomographic reconstruction workflows and (ii) transparent execution of these workflows on distributed resources. For the former, three main stages are considered: (i) data transfer between storage and computational resources, (ii) wait/queue time of reconstruction jobs at compute resources, and (iii) computation of reconstruction tasks. These performance models allow evaluation and estimation of the execution time of any given iterative tomographic reconstruction workflow that runs on geographically distributed resources. For the latter challenge, a workflow management system is built, which can automate the execution of workflows and minimize the user interaction with the underlying infrastructure. The system utilizes Globus to perform secure and efficient data transfer operations. The proposed models and the workflow management system are evaluated by using three high-performance computing and two storage resources, all of which are geographically distributed. Workflows were created with different computational requirements using two compute-intensive tomographic reconstruction algorithms. Experimental evaluation shows that the proposed models and system can be used for selecting the optimum resources, which in turn can provide up to 3.13× speedup (on experimented resources). Furthermore, the error rates of the models range between 2.1 and 23.3% (considering workflow execution times), where the accuracy of the model estimations increases with higher computational demands in reconstruction tasks.

  11. Strategic and Operational Plan for Integrating Transcriptomics ...

    EPA Pesticide Factsheets

    Plans for incorporating high-throughput transcriptomics into the current high-throughput screening activities at NCCT; the details are in the attached slide presentation, given at the OECD meeting on June 23, 2016.

  12. High-Throughput Experimental Approach Capabilities | Materials Science |

    Science.gov Websites

    NREL's high-throughput experimental approach capabilities include combinatorial sputtering chambers, for example Combi-5 for nitride and oxynitride sputtering and a chamber for oxysulfide sputtering, among several others. [Website snippet; the remaining text, including the referenced figure, is not recoverable.]

  13. Using Analytics to Support Petabyte-Scale Science on the NASA Earth Exchange (NEX)

    NASA Astrophysics Data System (ADS)

    Votava, P.; Michaelis, A.; Ganguly, S.; Nemani, R. R.

    2014-12-01

    NASA Earth Exchange (NEX) is a data, supercomputing and knowledge collaboratory that houses NASA satellite, climate and ancillary data where a focused community can come together to address large-scale challenges in Earth sciences. Analytics within NEX occurs at several levels - data, workflows, science and knowledge. At the data level, we are focusing on collecting and analyzing any information that is relevant to efficient acquisition, processing and management of data at the smallest granularity, such as files or collections. This includes processing and analyzing all local and many external metadata that are relevant to data quality, size, provenance, usage and other attributes. This then helps us better understand usage patterns and improve efficiency of data handling within NEX. When large-scale workflows are executed on NEX, we capture information that is relevant to processing and that can be analyzed in order to improve efficiencies in job scheduling, resource optimization, or data partitioning that would improve processing throughput. At this point we also collect data provenance as well as basic statistics of intermediate and final products created during the workflow execution. These statistics and metrics form basic process and data QA that, when combined with analytics algorithms, helps us identify issues early in the production process. We have already seen impact in some petabyte-scale projects, such as global Landsat processing, where we were able to reduce processing times from days to hours and enhance process monitoring and QA. While the focus so far has been mostly on support of NEX operations, we are also building a web-based infrastructure that enables users to perform direct analytics on science data - such as climate predictions or satellite data. Finally, as one of the main goals of NEX is knowledge acquisition and sharing, we began gathering and organizing information that associates users and projects with data, publications, locations and other attributes that can then be analyzed as a part of the NEX knowledge graph and used to greatly improve advanced search capabilities. Overall, we see data analytics at all levels as an important part of NEX as we are continuously seeking improvements in data management, workflow processing, use of resources, usability and science acceleration.

  14. MO-B-BRB-01: Optimize Treatment Planning Process in Clinical Environment

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Feng, W.

    The radiotherapy treatment planning process has evolved over the years with innovations in treatment planning, treatment delivery and imaging systems. Treatment modality and simulation technologies are also rapidly improving and affecting the planning process. For example, image-guided radiation therapy has been widely adopted for patient setup, leading to margin reduction and isocenter repositioning after simulation. Stereotactic body radiation therapy (SBRT) and radiosurgery (SRS) have gradually become the standard of care for many treatment sites, which demand a higher throughput for the treatment plans even if the number of treatments per day remains the same. Finally, simulation, planning and treatment are traditionally sequential events. However, with emerging adaptive radiotherapy, they are becoming more tightly intertwined, leading to iterative processes. Enhanced efficiency of planning is therefore becoming more critical and poses a serious challenge to the treatment planning process; Lean Six Sigma approaches are being utilized increasingly to balance the competing needs for speed and quality. In this symposium we will discuss the treatment planning process and illustrate effective techniques for managing workflow. Topics will include: Planning techniques: (a) beam placement, (b) dose optimization, (c) plan evaluation, (d) export to RVS. Planning workflow: (a) import images, (b) image fusion, (c) contouring, (d) plan approval, (e) plan check, (f) chart check, (g) sequential and iterative process. Influence of upstream and downstream operations: (a) simulation, (b) immobilization, (c) motion management, (d) QA, (e) IGRT, (f) treatment delivery, (g) SBRT/SRS, (h) adaptive planning. Reduction of delay between planning steps with Lean systems due to (a) communication, (b) limited resources, (c) contouring, (d) plan approval, (e) treatment. Optimizing planning processes: (a) contour validation, (b) consistent planning protocol, (c) protocol/template sharing, (d) semi-automatic plan evaluation, (e) quality checklist for error prevention, (f) iterative process, (g) balance of speed and quality. Learning Objectives: Gain familiarity with the workflow of the modern treatment planning process. Understand the scope and challenges of managing modern treatment planning processes. Gain familiarity with Lean Six Sigma approaches and their implementation in the treatment planning workflow.

  15. MO-B-BRB-00: Optimizing the Treatment Planning Process

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    NONE

    The radiotherapy treatment planning process has evolved over the years with innovations in treatment planning, treatment delivery and imaging systems. Treatment modality and simulation technologies are also rapidly improving and affecting the planning process. For example, image-guided radiation therapy has been widely adopted for patient setup, leading to margin reduction and isocenter repositioning after simulation. Stereotactic body radiation therapy (SBRT) and radiosurgery (SRS) have gradually become the standard of care for many treatment sites, which demand a higher throughput for the treatment plans even if the number of treatments per day remains the same. Finally, simulation, planning and treatment are traditionally sequential events. However, with emerging adaptive radiotherapy, they are becoming more tightly intertwined, leading to iterative processes. Enhanced efficiency of planning is therefore becoming more critical and poses a serious challenge to the treatment planning process; Lean Six Sigma approaches are being utilized increasingly to balance the competing needs for speed and quality. In this symposium we will discuss the treatment planning process and illustrate effective techniques for managing workflow. Topics will include: Planning techniques: (a) beam placement, (b) dose optimization, (c) plan evaluation, (d) export to RVS. Planning workflow: (a) import images, (b) image fusion, (c) contouring, (d) plan approval, (e) plan check, (f) chart check, (g) sequential and iterative process. Influence of upstream and downstream operations: (a) simulation, (b) immobilization, (c) motion management, (d) QA, (e) IGRT, (f) treatment delivery, (g) SBRT/SRS, (h) adaptive planning. Reduction of delay between planning steps with Lean systems due to (a) communication, (b) limited resources, (c) contouring, (d) plan approval, (e) treatment. Optimizing planning processes: (a) contour validation, (b) consistent planning protocol, (c) protocol/template sharing, (d) semi-automatic plan evaluation, (e) quality checklist for error prevention, (f) iterative process, (g) balance of speed and quality. Learning Objectives: Gain familiarity with the workflow of the modern treatment planning process. Understand the scope and challenges of managing modern treatment planning processes. Gain familiarity with Lean Six Sigma approaches and their implementation in the treatment planning workflow.

  16. MO-B-BRB-03: Systems Engineering Tools for Treatment Planning Process Optimization in Radiation Medicine

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kapur, A.

    The radiotherapy treatment planning process has evolved over the years with innovations in treatment planning, treatment delivery and imaging systems. Treatment modality and simulation technologies are also rapidly improving and affecting the planning process. For example, image-guided radiation therapy has been widely adopted for patient setup, leading to margin reduction and isocenter repositioning after simulation. Stereotactic body radiation therapy (SBRT) and radiosurgery (SRS) have gradually become the standard of care for many treatment sites, which demand a higher throughput for the treatment plans even if the number of treatments per day remains the same. Finally, simulation, planning and treatment are traditionally sequential events. However, with emerging adaptive radiotherapy, they are becoming more tightly intertwined, leading to iterative processes. Enhanced efficiency of planning is therefore becoming more critical and poses a serious challenge to the treatment planning process; Lean Six Sigma approaches are being utilized increasingly to balance the competing needs for speed and quality. In this symposium we will discuss the treatment planning process and illustrate effective techniques for managing workflow. Topics will include: Planning techniques: (a) beam placement, (b) dose optimization, (c) plan evaluation, (d) export to RVS. Planning workflow: (a) import images, (b) image fusion, (c) contouring, (d) plan approval, (e) plan check, (f) chart check, (g) sequential and iterative process. Influence of upstream and downstream operations: (a) simulation, (b) immobilization, (c) motion management, (d) QA, (e) IGRT, (f) treatment delivery, (g) SBRT/SRS, (h) adaptive planning. Reduction of delay between planning steps with Lean systems due to (a) communication, (b) limited resources, (c) contouring, (d) plan approval, (e) treatment. Optimizing planning processes: (a) contour validation, (b) consistent planning protocol, (c) protocol/template sharing, (d) semi-automatic plan evaluation, (e) quality checklist for error prevention, (f) iterative process, (g) balance of speed and quality. Learning Objectives: Gain familiarity with the workflow of the modern treatment planning process. Understand the scope and challenges of managing modern treatment planning processes. Gain familiarity with Lean Six Sigma approaches and their implementation in the treatment planning workflow.

  17. MO-B-BRB-02: Maintain the Quality of Treatment Planning for Time-Constraint Cases

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chang, J.

    The radiotherapy treatment planning process has evolved over the years with innovations in treatment planning, treatment delivery and imaging systems. Treatment modality and simulation technologies are also rapidly improving and affecting the planning process. For example, image-guided radiation therapy has been widely adopted for patient setup, leading to margin reduction and isocenter repositioning after simulation. Stereotactic body radiation therapy (SBRT) and radiosurgery (SRS) have gradually become the standard of care for many treatment sites, which demand a higher throughput for the treatment plans even if the number of treatments per day remains the same. Finally, simulation, planning and treatment are traditionally sequential events. However, with emerging adaptive radiotherapy, they are becoming more tightly intertwined, leading to iterative processes. Enhanced efficiency of planning is therefore becoming more critical and poses a serious challenge to the treatment planning process; Lean Six Sigma approaches are being utilized increasingly to balance the competing needs for speed and quality. In this symposium we will discuss the treatment planning process and illustrate effective techniques for managing workflow. Topics will include: Planning techniques: (a) beam placement, (b) dose optimization, (c) plan evaluation, (d) export to RVS. Planning workflow: (a) import images, (b) image fusion, (c) contouring, (d) plan approval, (e) plan check, (f) chart check, (g) sequential and iterative process. Influence of upstream and downstream operations: (a) simulation, (b) immobilization, (c) motion management, (d) QA, (e) IGRT, (f) treatment delivery, (g) SBRT/SRS, (h) adaptive planning. Reduction of delay between planning steps with Lean systems due to (a) communication, (b) limited resources, (c) contouring, (d) plan approval, (e) treatment. Optimizing planning processes: (a) contour validation, (b) consistent planning protocol, (c) protocol/template sharing, (d) semi-automatic plan evaluation, (e) quality checklist for error prevention, (f) iterative process, (g) balance of speed and quality. Learning Objectives: Gain familiarity with the workflow of the modern treatment planning process. Understand the scope and challenges of managing modern treatment planning processes. Gain familiarity with Lean Six Sigma approaches and their implementation in the treatment planning workflow.

  18. wft4galaxy: a workflow testing tool for galaxy.

    PubMed

    Piras, Marco Enrico; Pireddu, Luca; Zanetti, Gianluigi

    2017-12-01

    Workflow managers for scientific analysis provide a high-level programming platform facilitating standardization, automation, collaboration and access to sophisticated computing resources. The Galaxy workflow manager provides a prime example of this type of platform. As compositions of simpler tools, workflows effectively comprise specialized computer programs implementing often very complex analysis procedures. To date, no simple way to automatically test Galaxy workflows and ensure their correctness has appeared in the literature. With wft4galaxy we offer a tool to bring automated testing to Galaxy workflows, making it feasible to bring continuous integration to their development and ensuring that defects are detected promptly. wft4galaxy can be easily installed as a regular Python program or launched directly as a Docker container, the latter reducing installation effort to a minimum. Available at https://github.com/phnmnl/wft4galaxy under the Academic Free License v3.0. marcoenrico.piras@crs4.it. © The Author 2017. Published by Oxford University Press.
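
    The core idea behind automated workflow testing is to run a workflow on known inputs and compare its outputs against expected reference files. The sketch below illustrates only that general regression-testing pattern; it is not the wft4galaxy API, which drives Galaxy directly and supports richer comparators than a strict checksum, and the file names are hypothetical.

```python
import hashlib
from pathlib import Path

def file_digest(path):
    """SHA-256 of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

def check_outputs(produced, expected):
    """Compare produced workflow outputs against expected reference files.

    Illustrates the general idea of regression-testing a workflow run; it is
    not the wft4galaxy interface, which offers richer per-output comparators.
    """
    return [name for name, path in produced.items()
            if file_digest(path) != file_digest(expected[name])]

# Hypothetical file layout for one test case.
produced = {"filtered.vcf": "run_output/filtered.vcf"}
expected = {"filtered.vcf": "test_data/expected/filtered.vcf"}
paths = list(produced.values()) + list(expected.values())
if all(Path(p).exists() for p in paths):
    print(check_outputs(produced, expected) or "all outputs match")
else:
    print("reference files not present; this is only a sketch")
```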

  19. AstroGrid: Taverna in the Virtual Observatory.

    NASA Astrophysics Data System (ADS)

    Benson, K. M.; Walton, N. A.

    This paper reports on the implementation of the Taverna workbench by AstroGrid, a tool for designing and executing workflows of tasks in the Virtual Observatory. The workflow approach helps astronomers perform complex task sequences with little technical effort. The visual approach to workflow construction streamlines highly complex analyses over public and private data while using computational resources as modest as a desktop computer. Some integration issues and future work are discussed in this article.

  20. Safety and feasibility of STAT RAD: Improvement of a novel rapid tomotherapy-based radiation therapy workflow by failure mode and effects analysis.

    PubMed

    Jones, Ryan T; Handsfield, Lydia; Read, Paul W; Wilson, David D; Van Ausdal, Ray; Schlesinger, David J; Siebers, Jeffrey V; Chen, Quan

    2015-01-01

    The clinical challenge of radiation therapy (RT) for painful bone metastases requires clinicians to consider both treatment efficacy and patient prognosis when selecting a radiation therapy regimen. The traditional RT workflow requires several weeks for common palliative RT schedules of 30 Gy in 10 fractions or 20 Gy in 5 fractions. At our institution, we have created a new RT workflow termed "STAT RAD" that allows clinicians to perform computed tomographic (CT) simulation, planning, and highly conformal single fraction treatment delivery within 2 hours. In this study, we evaluate the safety and feasibility of the STAT RAD workflow. A failure mode and effects analysis (FMEA) was performed on the STAT RAD workflow, including development of a process map, identification of potential failure modes, description of the cause and effect, temporal occurrence, and team member involvement in each failure mode, and examination of existing safety controls. A risk probability number (RPN) was calculated for each failure mode. As necessary, workflow adjustments were then made to safeguard failure modes of significant RPN values. After these workflow alterations, RPNs were recomputed. A total of 72 potential failure modes were identified in the pre-FMEA STAT RAD workflow, of which 22 met the RPN threshold for clinical significance. Workflow adjustments included the addition of a team member checklist, changing simulation from megavoltage CT to kilovoltage CT, alteration of patient-specific quality assurance testing, and allocating increased time for critical workflow steps. After these modifications, only 1 failure mode maintained RPN significance: patient motion after alignment or during treatment. Performing the FMEA for the STAT RAD workflow before clinical implementation has significantly strengthened the safety and feasibility of STAT RAD. The FMEA proved a valuable evaluation tool, identifying potential problem areas so that we could create a safer workflow. Copyright © 2015 American Society for Radiation Oncology. Published by Elsevier Inc. All rights reserved.
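
    In a conventional FMEA, the RPN for each failure mode is the product of its severity, occurrence, and detectability scores, and modes above a chosen threshold are prioritized for mitigation. The sketch below shows that standard calculation; the scores, descriptions, and threshold are illustrative assumptions rather than values from the STAT RAD analysis.

```python
def risk_priority_number(severity, occurrence, detectability):
    """Classic FMEA risk priority number (each factor typically scored 1-10)."""
    return severity * occurrence * detectability

failure_modes = [
    # (description, severity, occurrence, detectability) - illustrative scores only
    ("Patient motion after alignment", 8, 4, 6),
    ("Wrong CT protocol selected", 7, 2, 2),
]

THRESHOLD = 100  # assumed cut-off for "clinically significant" modes
for name, s, o, d in failure_modes:
    rpn = risk_priority_number(s, o, d)
    flag = "REVIEW" if rpn >= THRESHOLD else "ok"
    print(f"{name}: RPN={rpn} [{flag}]")
```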

  1. JMS: An Open Source Workflow Management System and Web-Based Cluster Front-End for High Performance Computing.

    PubMed

    Brown, David K; Penkler, David L; Musyoka, Thommas M; Bishop, Özlem Tastan

    2015-01-01

    Complex computational pipelines are becoming a staple of modern scientific research. Often these pipelines are resource intensive and require days of computing time. In such cases, it makes sense to run them over high performance computing (HPC) clusters where they can take advantage of the aggregated resources of many powerful computers. In addition to this, researchers often want to integrate their workflows into their own web servers. In these cases, software is needed to manage the submission of jobs from the web interface to the cluster and then return the results once the job has finished executing. We have developed the Job Management System (JMS), a workflow management system and web interface for high performance computing (HPC). JMS provides users with a user-friendly web interface for creating complex workflows with multiple stages. It integrates this workflow functionality with the resource manager, a tool that is used to control and manage batch jobs on HPC clusters. As such, JMS combines workflow management functionality with cluster administration functionality. In addition, JMS provides developer tools including a code editor and the ability to version tools and scripts. JMS can be used by researchers from any field to build and run complex computational pipelines and provides functionality to include these pipelines in external interfaces. JMS is currently being used to house a number of bioinformatics pipelines at the Research Unit in Bioinformatics (RUBi) at Rhodes University. JMS is an open-source project and is freely available at https://github.com/RUBi-ZA/JMS.
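
    A workflow manager such as the one described above ultimately hands each stage to the cluster's resource manager as a batch job, with later stages depending on earlier ones. The sketch below shows that pattern generically using SLURM's sbatch; it is not JMS code, it assumes sbatch is available on the PATH, and the script names are hypothetical.

```python
import shutil
import subprocess

def submit(script_path, depends_on=None):
    """Submit a batch script with sbatch, optionally after another job succeeds.

    Generic SLURM sketch, not JMS internals; assumes `sbatch --parsable`
    prints the new job id on success.
    """
    cmd = ["sbatch", "--parsable"]
    if depends_on:
        cmd.append(f"--dependency=afterok:{depends_on}")
    cmd.append(script_path)
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout.strip()

# Chain two stages of a workflow: stage 2 starts only if stage 1 succeeds.
if shutil.which("sbatch"):
    job1 = submit("stage1_preprocess.sh")
    job2 = submit("stage2_analyse.sh", depends_on=job1)
    print("submitted:", job1, job2)
else:
    print("sbatch not found; this sketch assumes a SLURM cluster")
```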

  2. JMS: An Open Source Workflow Management System and Web-Based Cluster Front-End for High Performance Computing

    PubMed Central

    Brown, David K.; Penkler, David L.; Musyoka, Thommas M.; Bishop, Özlem Tastan

    2015-01-01

    Complex computational pipelines are becoming a staple of modern scientific research. Often these pipelines are resource intensive and require days of computing time. In such cases, it makes sense to run them over high performance computing (HPC) clusters where they can take advantage of the aggregated resources of many powerful computers. In addition to this, researchers often want to integrate their workflows into their own web servers. In these cases, software is needed to manage the submission of jobs from the web interface to the cluster and then return the results once the job has finished executing. We have developed the Job Management System (JMS), a workflow management system and web interface for high performance computing (HPC). JMS provides users with a user-friendly web interface for creating complex workflows with multiple stages. It integrates this workflow functionality with the resource manager, a tool that is used to control and manage batch jobs on HPC clusters. As such, JMS combines workflow management functionality with cluster administration functionality. In addition, JMS provides developer tools including a code editor and the ability to version tools and scripts. JMS can be used by researchers from any field to build and run complex computational pipelines and provides functionality to include these pipelines in external interfaces. JMS is currently being used to house a number of bioinformatics pipelines at the Research Unit in Bioinformatics (RUBi) at Rhodes University. JMS is an open-source project and is freely available at https://github.com/RUBi-ZA/JMS. PMID:26280450

  3. Superior Cross-Species Reference Genes: A Blueberry Case Study

    PubMed Central

    Die, Jose V.; Rowland, Lisa J.

    2013-01-01

    The advent of affordable Next Generation Sequencing technologies has had a major impact on studies of many crop species, where access to genomic technologies and genome-scale data sets has been extremely limited until now. The recent development of genomic resources in blueberry will enable the application of high throughput gene expression approaches that should relatively quickly increase our understanding of blueberry physiology. These studies, however, require a highly accurate and robust workflow and the identification of reference genes with high expression stability for correct target gene normalization. To create a set of superior reference genes for blueberry expression analyses, we mined a publicly available transcriptome data set from blueberry for orthologs to a set of Arabidopsis genes that showed the most stable expression in a developmental series. In total, the expression stability of 13 putative reference genes was evaluated by qPCR, and a set of new references with high stability values across a developmental series in fruits and floral buds of blueberry was identified. We also demonstrated the need to use at least two, preferably three, reference genes to avoid inconsistencies in results, even when superior reference genes are used. The new references identified here provide a valuable resource for accurate normalization of gene expression in Vaccinium spp. and may be useful for other members of the Ericaceae family as well. PMID:24058469
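
    When two or more reference genes are used, qPCR normalization is commonly done geNorm-style: the target gene's relative quantity is divided by the geometric mean of the reference genes' quantities. The sketch below shows that calculation with made-up quantities; it is not data from the blueberry study.

```python
from math import prod

def geometric_mean(values):
    return prod(values) ** (1.0 / len(values))

def normalize(target_quantity, reference_quantities):
    """Divide a target gene's relative quantity by the geometric mean of the
    reference genes' quantities (a geNorm-style normalization factor).

    The quantities below are hypothetical relative expression values.
    """
    return target_quantity / geometric_mean(reference_quantities)

sample = {"target": 12.0, "refs": [1.1, 0.9, 1.05]}
print(round(normalize(sample["target"], sample["refs"]), 3))
```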

  4. BPELPower—A BPEL execution engine for geospatial web services

    NASA Astrophysics Data System (ADS)

    Yu, Genong (Eugene); Zhao, Peisheng; Di, Liping; Chen, Aijun; Deng, Meixia; Bai, Yuqi

    2012-10-01

    The Business Process Execution Language (BPEL) has become a popular choice for orchestrating and executing workflows in the Web environment. As one special kind of scientific workflow, geospatial Web processing workflows are data-intensive, deal with complex structures in data and geographic features, and execute automatically with limited human intervention. To enable the proper execution and coordination of geospatial workflows, a specially enhanced BPEL execution engine is required. BPELPower was designed, developed, and implemented as a generic BPEL execution engine with enhancements for executing geospatial workflows. The enhancements are especially in its capabilities in handling Geography Markup Language (GML) and standard geospatial Web services, such as the Web Processing Service (WPS) and the Web Feature Service (WFS). BPELPower has been used in several demonstrations over the decade. Two scenarios were discussed in detail to demonstrate the capabilities of BPELPower. That study showed a standard-compliant, Web-based approach for properly supporting geospatial processing, with the only enhancement at the implementation level. Pattern-based evaluation and performance improvement of the engine are discussed: BPELPower directly supports 22 workflow control patterns and 17 workflow data patterns. In the future, the engine will be enhanced with high performance parallel processing and broad Web paradigms.

  5. Aligning HST Images to Gaia: A Faster Mosaicking Workflow

    NASA Astrophysics Data System (ADS)

    Bajaj, V.

    2017-11-01

    We present a fully programmatic workflow for aligning HST images using the high-quality astrometry provided by Gaia Data Release 1. Code provided in a Jupyter Notebook works through this procedure, including parsing the data to determine the query area parameters, querying Gaia for the coordinate catalog, and using the catalog as the reference catalog for TweakReg. This workflow greatly simplifies the normally time-consuming process of aligning HST images, especially those taken as part of mosaics.
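
    The programmatic alignment workflow described above queries the Gaia archive for a reference catalog covering the image footprint and then feeds it to TweakReg. The sketch below shows a cone-search-style ADQL query with astroquery's Gaia module as one way to retrieve such a catalog; it assumes astroquery is installed, the footprint values are hypothetical, and the exact query and table (gaiadr1.gaia_source here) may differ from the published notebook.

```python
# Minimal sketch of retrieving a Gaia reference catalogue for alignment.
# Assumes the astroquery package is installed and network access to the
# Gaia archive; column/table names follow the Gaia DR1 archive schema.
from astroquery.gaia import Gaia

ra_center, dec_center, radius_deg = 150.1, 2.2, 0.1  # hypothetical image footprint

query = f"""
SELECT ra, dec, phot_g_mean_mag
FROM gaiadr1.gaia_source
WHERE 1 = CONTAINS(POINT('ICRS', ra, dec),
                   CIRCLE('ICRS', {ra_center}, {dec_center}, {radius_deg}))
"""
job = Gaia.launch_job(query)
catalog = job.get_results()
# The resulting table of sky coordinates can be written out and supplied to
# TweakReg as the reference catalogue for image alignment.
print(len(catalog))
```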

  6. Medication Management: The Macrocognitive Workflow of Older Adults With Heart Failure

    PubMed Central

    2016-01-01

    Background Older adults with chronic disease struggle to manage complex medication regimens. Health information technology has the potential to improve medication management, but only if it is based on a thorough understanding of the complexity of medication management workflow as it occurs in natural settings. Prior research reveals that patient work related to medication management is complex, cognitive, and collaborative. Macrocognitive processes are theorized as how people individually and collaboratively think in complex, adaptive, and messy nonlaboratory settings supported by artifacts. Objective The objective of this research was to describe and analyze the work of medication management by older adults with heart failure, using a macrocognitive workflow framework. Methods We interviewed and observed 61 older patients along with 30 informal caregivers about self-care practices including medication management. Descriptive qualitative content analysis methods were used to develop categories, subcategories, and themes about macrocognitive processes used in medication management workflow. Results We identified 5 high-level macrocognitive processes affecting medication management—sensemaking, planning, coordination, monitoring, and decision making—and 15 subprocesses. Data revealed workflow as occurring in a highly collaborative, fragile system of interacting people, artifacts, time, and space. Process breakdowns were common and patients had little support for macrocognitive workflow from current tools. Conclusions Macrocognitive processes affected medication management performance. Describing and analyzing this performance produced recommendations for technology supporting collaboration and sensemaking, decision making and problem detection, and planning and implementation. PMID:27733331

  7. Medication Management: The Macrocognitive Workflow of Older Adults With Heart Failure.

    PubMed

    Mickelson, Robin S; Unertl, Kim M; Holden, Richard J

    2016-10-12

    Older adults with chronic disease struggle to manage complex medication regimens. Health information technology has the potential to improve medication management, but only if it is based on a thorough understanding of the complexity of medication management workflow as it occurs in natural settings. Prior research reveals that patient work related to medication management is complex, cognitive, and collaborative. Macrocognitive processes are theorized as how people individually and collaboratively think in complex, adaptive, and messy nonlaboratory settings supported by artifacts. The objective of this research was to describe and analyze the work of medication management by older adults with heart failure, using a macrocognitive workflow framework. We interviewed and observed 61 older patients along with 30 informal caregivers about self-care practices including medication management. Descriptive qualitative content analysis methods were used to develop categories, subcategories, and themes about macrocognitive processes used in medication management workflow. We identified 5 high-level macrocognitive processes affecting medication management (sensemaking, planning, coordination, monitoring, and decision making) and 15 subprocesses. Data revealed workflow as occurring in a highly collaborative, fragile system of interacting people, artifacts, time, and space. Process breakdowns were common and patients had little support for macrocognitive workflow from current tools. Macrocognitive processes affected medication management performance. Describing and analyzing this performance produced recommendations for technology supporting collaboration and sensemaking, decision making and problem detection, and planning and implementation.

  8. Experimental evaluation of a flexible I/O architecture for accelerating workflow engines in ultrascale environments

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Duro, Francisco Rodrigo; Blas, Javier Garcia; Isaila, Florin

    The increasing volume of scientific data and the limited scalability and performance of storage systems are currently presenting a significant limitation for the productivity of the scientific workflows running on both high-performance computing (HPC) and cloud platforms. Clearly needed is better integration of storage systems and workflow engines to address this problem. This paper presents and evaluates a novel solution that leverages codesign principles for integrating Hercules—an in-memory data store—with a workflow management system. We consider four main aspects: workflow representation, task scheduling, task placement, and task termination. As a result, the experimental evaluation on both cloud and HPC systems demonstrates significant performance and scalability improvements over existing state-of-the-art approaches.

  9. Taverna: a tool for building and running workflows of services

    PubMed Central

    Hull, Duncan; Wolstencroft, Katy; Stevens, Robert; Goble, Carole; Pocock, Mathew R.; Li, Peter; Oinn, Tom

    2006-01-01

    Taverna is an application that eases the use and integration of the growing number of molecular biology tools and databases available on the web, especially web services. It allows bioinformaticians to construct workflows or pipelines of services to perform a range of different analyses, such as sequence analysis and genome annotation. These high-level workflows can integrate many different resources into a single analysis. Taverna is available freely under the terms of the GNU Lesser General Public License (LGPL). PMID:16845108

  10. Streamlining workflow and automation to accelerate laboratory scale protein production.

    PubMed

    Konczal, Jennifer; Gray, Christopher H

    2017-05-01

    Protein production facilities are often required to produce diverse arrays of proteins for demanding methodologies including crystallography, NMR, ITC and other reagent intensive techniques. It is common for these teams to find themselves a bottleneck in the pipeline of ambitious projects. This pressure to deliver has resulted in the evolution of many novel methods to increase capacity and throughput at all stages in the pipeline for generation of recombinant proteins. This review aims to describe current and emerging options to accelerate the success of protein production in Escherichia coli. We emphasize technologies that have been evaluated and implemented in our laboratory, including innovative molecular biology and expression vectors, small-scale expression screening strategies and the automation of parallel and multidimensional chromatography. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.

  11. High-throughput high-volume nuclear imaging for preclinical in vivo compound screening.

    PubMed

    Macholl, Sven; Finucane, Ciara M; Hesterman, Jacob; Mather, Stephen J; Pauplis, Rachel; Scully, Deirdre; Sosabowski, Jane K; Jouannot, Erwan

    2017-12-01

    Preclinical single-photon emission computed tomography (SPECT)/CT imaging studies are hampered by low throughput, hence are found typically within small volume feasibility studies. Here, imaging and image analysis procedures are presented that allow profiling of a large volume of radiolabelled compounds within a reasonably short total study time. Particular emphasis was put on quality control (QC) and on fast and unbiased image analysis. 2-3 His-tagged proteins were simultaneously radiolabelled by 99m Tc-tricarbonyl methodology and injected intravenously (20 nmol/kg; 100 MBq; n = 3) into patient-derived xenograft (PDX) mouse models. Whole-body SPECT/CT images of 3 mice simultaneously were acquired 1, 4, and 24 h post-injection, extended to 48 h and/or by 0-2 h dynamic SPECT for pre-selected compounds. Organ uptake was quantified by automated multi-atlas and manual segmentations. Data were plotted automatically, quality controlled and stored on a collaborative image management platform. Ex vivo uptake data were collected semi-automatically and analysis performed as for imaging data. >500 single animal SPECT images were acquired for 25 proteins over 5 weeks, eventually generating >3500 ROI and >1000 items of tissue data. SPECT/CT images clearly visualized uptake in tumour and other tissues even at 48 h post-injection. Intersubject uptake variability was typically 13% (coefficient of variation, COV). Imaging results correlated well with ex vivo data. The large data set of tumour, background and systemic uptake/clearance data from 75 mice for 25 compounds allows identification of compounds of interest. The number of animals required was reduced considerably by longitudinal imaging compared to dissection experiments. All experimental work and analyses were accomplished within 3 months expected to be compatible with drug development programmes. QC along all workflow steps, blinding of the imaging contract research organization to compound properties and automation provide confidence in the data set. Additional ex vivo data were useful as a control but could be omitted from future studies in the same centre. For even larger compound libraries, radiolabelling could be expedited and the number of imaging time points adapted to increase weekly throughput. Multi-atlas segmentation could be expanded via SPECT/MRI; however, this would require an MRI-compatible mouse hotel. Finally, analysis of nuclear images of radiopharmaceuticals in clinical trials may benefit from the automated analysis procedures developed.
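
    Intersubject variability above is reported as a coefficient of variation, i.e. the standard deviation of organ uptake across animals divided by the mean. The sketch below shows that calculation for a single compound and tissue; the uptake values are hypothetical.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """COV (%) = sample standard deviation / mean * 100."""
    return 100.0 * stdev(values) / mean(values)

# Hypothetical tumour uptake values for the three mice dosed with one compound.
tumour_uptake = [5.2, 6.0, 5.6]
print(round(coefficient_of_variation(tumour_uptake), 1))
```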

  12. Biomining active cellulases from a mining bioremediation system.

    PubMed

    Mewis, Keith; Armstrong, Zachary; Song, Young C; Baldwin, Susan A; Withers, Stephen G; Hallam, Steven J

    2013-09-20

    Functional metagenomics has emerged as a powerful method for gene model validation and enzyme discovery from natural and human engineered ecosystems. Here we report development of a high-throughput functional metagenomic screen incorporating bioinformatic and biochemical analyses features. A fosmid library containing 6144 clones sourced from a mining bioremediation system was screened for cellulase activity using 2,4-dinitrophenyl β-cellobioside, a previously proven cellulose model substrate. Fifteen active clones were recovered and fully sequenced revealing 9 unique clones with the ability to hydrolyse 1,4-β-D-glucosidic linkages. Transposon mutagenesis identified genes belonging to glycoside hydrolase (GH) 1, 3, or 5 as necessary for mediating this activity. Reference trees for GH 1, 3, and 5 families were generated from sequences in the CAZy database for automated phylogenetic analysis of fosmid end and active clone sequences revealing known and novel cellulase encoding genes. Active cellulase genes recovered in functional screens were subcloned into inducible high copy plasmids, expressed and purified to determine enzymatic properties including thermostability, pH optima, and substrate specificity. The workflow described here provides a general paradigm for recovery and characterization of microbially derived genes and gene products based on genetic logic and contemporary screening technologies developed for model organismal systems. Copyright © 2013 The Authors. Published by Elsevier B.V. All rights reserved.

  13. Mass Spectrometry Based Lipidomics: An Overview of Technological Platforms

    PubMed Central

    Köfeler, Harald C.; Fauland, Alexander; Rechberger, Gerald N.; Trötzmüller, Martin

    2012-01-01

    One decade after the genomic and the proteomic life science revolution, new ‘omics’ fields are emerging. The metabolome encompasses the entirety of small molecules, most often end products of a catalytic process regulated by genes and proteins, with the lipidome being its fat-soluble subdivision. Within recent years, lipids are more and more regarded not only as energy storage compounds but also as interactive players in various cellular regulation cycles and thus attract rising interest in the bio-medical community. The field of lipidomics is, on one hand, fuelled by analytical technology advances, particularly mass spectrometry and chromatography, but on the other hand new biological questions also drive analytical technology developments. Compared to fairly standardized genomic or proteomic high-throughput protocols, the high degree of molecular heterogeneity adds a special analytical challenge to lipidomic analysis. In this review, we will take a closer look at various mass spectrometric platforms for lipidomic analysis. We will focus on the advantages and limitations of various experimental setups like ‘shotgun lipidomics’, liquid chromatography-mass spectrometry (LC-MS) and matrix assisted laser desorption ionization-time of flight (MALDI-TOF) based approaches. We will also examine available software packages for data analysis, which nowadays is in fact the rate limiting step for most ‘omics’ workflows. PMID:24957366

  14. An integrative computational approach for prioritization of genomic variants

    DOE PAGES

    Dubchak, Inna; Balasubramanian, Sandhya; Wang, Sheng; ...

    2014-12-15

    An essential step in the discovery of molecular mechanisms contributing to disease phenotypes and efficient experimental planning is the development of weighted hypotheses that estimate the functional effects of sequence variants discovered by high-throughput genomics. With the increasing specialization of bioinformatics resources, creating analytical workflows that seamlessly integrate data and bioinformatics tools developed by multiple groups becomes inevitable. Here we present a case study of the use of a distributed analytical environment integrating four complementary specialized resources, namely the Lynx platform, VISTA RViewer, the Developmental Brain Disorders Database (DBDB), and the RaptorX server, for the identification of high-confidence candidate genes contributing to the pathogenesis of spina bifida. The analysis resulted in the prediction and validation of deleterious mutations in the SLC19A placental transporter in mothers of the affected children, which cause narrowing of the outlet channel and therefore lead to a reduced folate permeation rate. The described approach also enabled correct identification of several genes previously shown to contribute to the pathogenesis of spina bifida, and suggestion of additional genes for experimental validation. This study demonstrates that the seamless integration of bioinformatics resources enables fast and efficient prioritization and characterization of genomic factors and molecular networks contributing to the phenotypes of interest.

  15. Mass spectrometry based lipidomics: an overview of technological platforms.

    PubMed

    Köfeler, Harald C; Fauland, Alexander; Rechberger, Gerald N; Trötzmüller, Martin

    2012-01-05

    One decade after the genomic and the proteomic life science revolution, new 'omics' fields are emerging. The metabolome encompasses the entirety of small molecules, most often end products of a catalytic process regulated by genes and proteins, with the lipidome being its fat-soluble subdivision. Within recent years, lipids are more and more regarded not only as energy storage compounds but also as interactive players in various cellular regulation cycles and thus attract rising interest in the bio-medical community. The field of lipidomics is, on one hand, fuelled by analytical technology advances, particularly mass spectrometry and chromatography, but on the other hand new biological questions also drive analytical technology developments. Compared to fairly standardized genomic or proteomic high-throughput protocols, the high degree of molecular heterogeneity adds a special analytical challenge to lipidomic analysis. In this review, we will take a closer look at various mass spectrometric platforms for lipidomic analysis. We will focus on the advantages and limitations of various experimental setups like 'shotgun lipidomics', liquid chromatography-mass spectrometry (LC-MS) and matrix assisted laser desorption ionization-time of flight (MALDI-TOF) based approaches. We will also examine available software packages for data analysis, which nowadays is in fact the rate limiting step for most 'omics' workflows.

  16. Operations and Data Processing for the Planck Low-Frequency Instrument: Design Strategies and Practical Experience

    NASA Astrophysics Data System (ADS)

    Pasian, F.; Zacchei, A.; Frailis, M.; Galeotta, S.; Maris, M.; Tavagnacco, D.; Vuerli, C.; Tuerler, M.; Rohlfs, R.; Morisset, N.; Meharga, M.; Ensslin, T. A.; Knoche, J.; Gregorio, A.; Maino, D.; Mennella, A.; Tomasi, M.; Cuttaia, F.; Morgante, G.; Terenzi, L.; Maggio, G.; Gasparo, F.; Franceschi, E.

    2012-09-01

    Planck is an ESA mission launched in May 2009, which is mapping the microwave sky in nine frequencies and accurately measuring the anisotropies of the Cosmic Microwave Background (CMB) with its complement of two instruments (HFI and LFI), covering respectively the far infrared and the radio domains. The operations and data processing of the Planck instruments are carried out by Data Processing Centers, one for each instrument. The DPCs need to support both a day-by-day quasi-real-time calibration workflow and high-throughput pipelines for a high-volume data flow. The LFI DPC has been designed to be a centralized facility built by geographically distributed institutions, in a funding scenario based on multiple funding agencies and, in most cases, on a fixed budget in the presence of launch delays. A strategy for managing effectively the distributed and collaborative software development and maintenance has been developed, based on the use of open source and off-the-shelf software, and on the reuse of systems developed ad-hoc for other missions. Product and quality assurance has been supported throughout development, integration and testing. The effectiveness of the design choices has been proven by the readiness of the system at launch time and by the extremely smooth operations phase.

  17. High Throughput PBTK: Open-Source Data and Tools for ...

    EPA Pesticide Factsheets

    Presentation on High Throughput PBTK at the PBK Modelling in Risk Assessment meeting in Ispra, Italy.

  18. Foundations of data-intensive science: Technology and practice for high throughput, widely distributed, data management and analysis systems

    NASA Astrophysics Data System (ADS)

    Johnston, William; Ernst, M.; Dart, E.; Tierney, B.

    2014-04-01

    Today's large-scale science projects involve world-wide collaborations that depend on moving massive amounts of data from an instrument to potentially thousands of computing and storage systems at hundreds of collaborating institutions in order to accomplish their science. This is true for ATLAS and CMS at the LHC, and it is true for the climate sciences, Belle-II at the KEK collider, genome sciences, the SKA radio telescope, and ITER, the international fusion energy experiment. DOE's Office of Science has been collecting science discipline and instrument requirements for network-based data management and analysis for more than a decade. As a result, certain key issues are seen across essentially all science disciplines that rely on the network for significant data transfer, even if the data quantities are modest compared to projects like the LHC experiments. These issues are what this talk will address, namely: 1. Optical signal transport advances enabling 100 Gb/s circuits that span the globe on optical fiber, with each fiber carrying 100 such channels; 2. Network router and switch requirements to support high-speed international data transfer; 3. Data transport (TCP is still the norm) requirements to support high-speed international data transfer (e.g. error-free transmission); 4. Network monitoring and testing techniques and infrastructure to maintain the required error-free operation of the many R&E networks involved in international collaborations; 5. Operating system evolution to support very high-speed network I/O; 6. New network architectures and services in the LAN (campus) and WAN networks to support data-intensive science; 7. Data movement and management techniques and software that can maximize the throughput on the network connections between distributed data handling systems; and 8. New approaches to widely distributed workflow systems that can support the data movement and analysis required by the science. All of these areas must be addressed to enable large-scale, widely distributed data analysis systems, and the experience of the LHC can be applied to other scientific disciplines. In particular, specific analogies to the SKA will be cited in the talk.
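
    Points 3 and 4 of the list above are easiest to appreciate with a little arithmetic. The sketch below is illustrative rather than taken from the talk: it computes the TCP buffer implied by the bandwidth-delay product of a long-haul 100 Gb/s circuit and the single-stream throughput ceiling given by the Mathis et al. loss-rate approximation; the link speed, round-trip time and loss rates are example values.

        # Illustrative arithmetic: why error-free paths and large buffers matter
        # for long-haul TCP. Uses the bandwidth-delay product and the Mathis et al.
        # approximation  throughput ~ (MSS / RTT) / sqrt(loss_rate).
        from math import sqrt

        link_bps = 100e9      # 100 Gb/s circuit (example)
        rtt_s = 0.100         # ~100 ms intercontinental round-trip time (example)
        mss_bits = 1460 * 8   # standard Ethernet maximum segment size

        # Buffer needed to keep the pipe full (bandwidth-delay product)
        bdp_bytes = link_bps * rtt_s / 8
        print(f"Required TCP window/buffer: {bdp_bytes / 1e6:.0f} MB")

        # Loss-limited throughput of a single stream at various packet-loss rates
        for p in (1e-4, 1e-6, 1e-8):
            tput_bps = (mss_bits / rtt_s) / sqrt(p)
            print(f"loss {p:.0e}: ~{tput_bps / 1e9:.3f} Gb/s per stream")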

  19. Hierarchical classification strategy for Phenotype extraction from epidermal growth factor receptor endocytosis screening.

    PubMed

    Cao, Lu; Graauw, Marjo de; Yan, Kuan; Winkel, Leah; Verbeek, Fons J

    2016-05-03

    Endocytosis is regarded as a mechanism of attenuating epidermal growth factor receptor (EGFR) signalling and of receptor degradation. Increasing evidence shows that breast cancer progression is associated with a defect in EGFR endocytosis. In order to find related ribonucleic acid (RNA) regulators in this process, high-throughput imaging with fluorescent markers is used to visualize the complex EGFR endocytosis process. Subsequently, a dedicated automatic image and data analysis system is developed and applied to extract phenotype measurements and distinguish different developmental episodes from the large volume of images acquired through high-throughput imaging. For the image analysis, a phenotype measurement quantifies the important image information into distinct features or measurements. Therefore, the manner in which prominent measurements are chosen to represent the dynamics of the EGFR process becomes a crucial step for the identification of the phenotype. In the subsequent data analysis, classification is used to categorize each observation by making use of all prominent measurements obtained from image analysis. A well-constructed classification strategy therefore raises the performance of the image and data analysis system. In this paper, we illustrate an integrated analysis method for EGFR signalling through image analysis of microscopy images. Sophisticated wavelet-based texture measurements are used to obtain a good description of the characteristic stages in EGFR signalling. A hierarchical classification strategy is designed to improve the recognition of phenotypic episodes of EGFR during endocytosis. Different strategies for normalization, feature selection and classification are evaluated. The results of the performance assessment clearly demonstrate that our hierarchical classification scheme combined with a selected set of features provides a notable improvement in the temporal analysis of EGFR endocytosis. Moreover, it is shown that the addition of the wavelet-based texture features contributes to this improvement. Our workflow can be applied in drug discovery to analyze defective EGFR endocytosis processes.
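
    The two-stage idea behind a hierarchical classification strategy can be prototyped in a few lines. The sketch below uses scikit-learn with random placeholder features and an invented two-level class structure; it is not the wavelet-based feature set or the exact hierarchy of the study, only an illustration of routing samples through a coarse classifier and refining one branch with a dedicated second classifier.

        # Sketch of a two-stage hierarchical classification strategy (illustrative
        # placeholder data, not the study's wavelet features or class hierarchy).
        import numpy as np
        from sklearn.ensemble import RandomForestClassifier

        rng = np.random.default_rng(0)
        X = rng.normal(size=(300, 16))            # stand-in phenotype measurements
        coarse_y = rng.integers(0, 2, size=300)   # e.g. membrane-bound vs internalised
        fine_y = rng.integers(0, 3, size=300)     # finer endocytic episode within class 1

        # Stage 1: coarse separation of phenotypic episodes
        coarse_clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, coarse_y)

        # Stage 2: refine only the samples routed to coarse class 1
        mask = coarse_y == 1
        fine_clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X[mask], fine_y[mask])

        def predict_hierarchical(x):
            """Route a sample through the coarse classifier, then refine if needed."""
            coarse = coarse_clf.predict(x.reshape(1, -1))[0]
            if coarse == 0:
                return ("coarse_class_0", None)
            return ("coarse_class_1", fine_clf.predict(x.reshape(1, -1))[0])

        print(predict_hierarchical(X[0]))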

  20. Exploring Pandora's Box: Potential and Pitfalls of Low Coverage Genome Surveys for Evolutionary Biology

    PubMed Central

    Leese, Florian; Mayer, Christoph; Agrawal, Shobhit; Dambach, Johannes; Dietz, Lars; Doemel, Jana S.; Goodall-Copstake, William P.; Held, Christoph; Jackson, Jennifer A.; Lampert, Kathrin P.; Linse, Katrin; Macher, Jan N.; Nolzen, Jennifer; Raupach, Michael J.; Rivera, Nicole T.; Schubart, Christoph D.; Striewski, Sebastian; Tollrian, Ralph; Sands, Chester J.

    2012-01-01

    High throughput sequencing technologies are revolutionizing genetic research. With this “rise of the machines”, genomic sequences can be obtained even for unknown genomes within a short time and for reasonable costs. This has enabled evolutionary biologists studying genetically unexplored species to identify molecular markers or genomic regions of interest (e.g. micro- and minisatellites, mitochondrial and nuclear genes) by sequencing only a fraction of the genome. However, when using such datasets from non-model species, it is possible that DNA from non-target contaminant species such as bacteria, viruses, fungi, or other eukaryotic organisms may complicate the interpretation of the results. In this study we analysed 14 genomic pyrosequencing libraries of aquatic non-model taxa from four major evolutionary lineages. We quantified the amount of suitable micro- and minisatellites, mitochondrial genomes, known nuclear genes and transposable elements and searched for contamination from various sources using bioinformatic approaches. Our results show that in all sequence libraries with estimated coverage of about 0.02–25%, many appropriate micro- and minisatellites, mitochondrial gene sequences and nuclear genes from different KEGG (Kyoto Encyclopedia of Genes and Genomes) pathways could be identified and characterized. These can serve as markers for phylogenetic and population genetic analyses. A central finding of our study is that several genomic libraries suffered from different biases owing to non-target DNA or mobile elements. In particular, viruses, bacteria or eukaryote endosymbionts contributed significantly (up to 10%) to some of the libraries analysed. If not identified as such, genetic markers developed from high-throughput sequencing data for non-model organisms may bias evolutionary studies or fail completely in experimental tests. In conclusion, our study demonstrates the enormous potential of low-coverage genome survey sequences and suggests bioinformatic analysis workflows. The results also advise a more sophisticated filtering for problematic sequences and non-target genome sequences prior to developing markers. PMID:23185309
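
    One of the marker classes surveyed above, simple sequence repeats (micro-/minisatellites), can be located with nothing more than a regular expression, which gives a feel for the kind of scan involved. The toy sketch below only finds perfect di- and trinucleotide repeats; real surveys use dedicated repeat finders and similarity-search-based contamination screening, and the repeat-count threshold here is arbitrary.

        # Toy sketch: find perfect di-/trinucleotide repeats (SSRs) in a contig.
        # Threshold and example sequence are arbitrary; real pipelines use
        # dedicated repeat finders plus contamination screening.
        import re

        SSR_RE = re.compile(r"(([ACGT]{2,3})\2{5,})")  # motif repeated >= 6 times

        def find_ssrs(seq):
            """Yield (motif, start, total_length) for each perfect repeat found."""
            for m in SSR_RE.finditer(seq.upper()):
                yield m.group(2), m.start(), len(m.group(1))

        contig = "TTTGAACACACACACACACACAGGTATCATCATCATCATCATCATCGGGC"
        for motif, start, length in find_ssrs(contig):
            print(f"motif {motif} at position {start}, {length} bp")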

  1. Use of Threshold of Toxicological Concern (TTC) with High ...

    EPA Pesticide Factsheets

    Although progress has been made with HTS (high throughput screening) in profiling biological activity (e.g., EPA’s ToxCast™), challenges arise in interpreting HTS results in the context of adversity and in converting HTS assay concentrations to equivalent human doses for the broad domain of commodity chemicals. Here, we propose using TTC as a risk screening method to evaluate exposure ranges derived from NHANES for 7968 chemicals. Because the well-established TTC approach uses hazard values derived from in vivo toxicity data, relevance to adverse effects is robust. We compared the conservative TTC (non-cancer) value of 90 μg/day (1.5 μg/kg/day) (Kroes et al., Fd Chem Toxicol, 2004) to quantitative exposure predictions of the upper 95% credible interval (UCI) of median daily exposures for 7968 chemicals in 10 different demographic groups (Wambaugh et al., Environ Sci Technol. 48:12760-7, 2014). Results indicate: (1) none of the median values of the credible interval of exposure for any chemical in any demographic group was above the TTC; and (2) fewer than 5% of chemicals had a UCI that exceeded the TTC for any group. However, these median exposure predictions do not cover highly exposed (e.g., occupational) populations. Additionally, we propose an expanded risk-based screening workflow that comprises a TTC decision tree that includes screening compounds for structural alerts for DNA reactivity, OPs and carbamates as well as a comparison with bioactivity-based margins of
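
    The core screening comparison described above reduces to checking each chemical's predicted exposure against a fixed threshold. The sketch below is a minimal illustration (not EPA's implementation): it flags chemicals whose upper 95% credible interval (UCI) of predicted median exposure exceeds the conservative non-cancer TTC of 1.5 μg/kg/day; the input file and its column names are assumptions.

        # Minimal sketch of TTC-based risk screening: flag chemicals whose exposure
        # UCI exceeds the non-cancer TTC. Input layout is assumed for illustration.
        import pandas as pd

        TTC_UG_PER_KG_DAY = 1.5  # Kroes et al. (2004) non-cancer threshold

        exposures = pd.read_csv("exposure_predictions.csv")
        # assumed columns: chemical_id, demographic_group, uci_exposure (ug/kg/day)

        exposures["exceeds_ttc"] = exposures["uci_exposure"] > TTC_UG_PER_KG_DAY
        per_chemical = exposures.groupby("chemical_id")["exceeds_ttc"].any()

        n_exceed = int(per_chemical.sum())
        print(f"{n_exceed} of {len(per_chemical)} chemicals "
              f"({100 * per_chemical.mean():.1f}%) exceed the TTC in at least one group")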

  2. Workflow Optimization for Tuning Prostheses with High Input Channel

    DTIC Science & Technology

    2017-10-01

    Award number: W81XWH-16-1-0767. Title: Workflow Optimization for Tuning Prostheses with High Input Channel. Principal investigator: Daniel Merrill. Distribution: unlimited. Abstract fragment: "... of Specific Aim 1 by driving a commercially available two-DoF wrist and single-DoF hand. The high-level control system will provide analog signals ..." The views, opinions and/or findings contained in this report are those of the author(s) and should not be construed as an official Department ...

  3. Next-Generation Climate Modeling Science Challenges for Simulation, Workflow and Analysis Systems

    NASA Astrophysics Data System (ADS)

    Koch, D. M.; Anantharaj, V. G.; Bader, D. C.; Krishnan, H.; Leung, L. R.; Ringler, T.; Taylor, M.; Wehner, M. F.; Williams, D. N.

    2016-12-01

    We will present two examples of current and future high-resolution climate-modeling research that are challenging existing simulation run-time I/O, model-data movement, storage and publishing, and analysis. In each case, we will consider lessons learned as current workflow systems are broken by these large-data science challenges, as well as strategies to repair or rebuild the systems. First we consider the science and workflow challenges to be posed by the CMIP6 multi-model HighResMIP, involving around a dozen modeling groups performing quarter-degree simulations, in 3-member ensembles for 100 years, with high-frequency (1-6 hourly) diagnostics, which is expected to generate over 4PB of data. An example of science derived from these experiments will be to study how resolution affects the ability of models to capture extreme-events such as hurricanes or atmospheric rivers. Expected methods to transfer (using parallel Globus) and analyze (using parallel "TECA" software tools) HighResMIP data for such feature-tracking by the DOE CASCADE project will be presented. A second example will be from the Accelerated Climate Modeling for Energy (ACME) project, which is currently addressing challenges involving multiple century-scale coupled high resolution (quarter-degree) climate simulations on DOE Leadership Class computers. ACME is anticipating production of over 5PB of data during the next 2 years of simulations, in order to investigate the drivers of water cycle changes, sea-level-rise, and carbon cycle evolution. The ACME workflow, from simulation to data transfer, storage, analysis and publication will be presented. Current and planned methods to accelerate the workflow, including implementing run-time diagnostics, and implementing server-side analysis to avoid moving large datasets will be presented.

  4. High-Performance Computational Analysis of Glioblastoma Pathology Images with Database Support Identifies Molecular and Survival Correlates.

    PubMed

    Kong, Jun; Wang, Fusheng; Teodoro, George; Cooper, Lee; Moreno, Carlos S; Kurc, Tahsin; Pan, Tony; Saltz, Joel; Brat, Daniel

    2013-12-01

    In this paper, we present a novel framework for microscopic image analysis of nuclei, data management, and high performance computation to support translational research involving nuclear morphometry features, molecular data, and clinical outcomes. Our image analysis pipeline consists of nuclei segmentation and feature computation facilitated by high performance computing with coordinated execution in multi-core CPUs and Graphics Processing Units (GPUs). All data derived from image analysis are managed in a spatial relational database supporting highly efficient scientific queries. We applied our image analysis workflow to 159 glioblastomas (GBM) from The Cancer Genome Atlas dataset. With integrative studies, we found statistics of four specific nuclear features were significantly associated with patient survival. Additionally, we correlated nuclear features with molecular data and found interesting results that support pathologic domain knowledge. We found that Proneural subtype GBMs had the smallest mean of nuclear Eccentricity and the largest means of nuclear Extent and MinorAxisLength. We also found gene expressions of stem cell marker MYC and cell proliferation marker MKI67 were correlated with nuclear features. To complement and inform pathologists of relevant diagnostic features, we queried the most representative nuclear instances from each patient population based on genetic and transcriptional classes. Our results demonstrate that specific nuclear features carry prognostic significance and associations with transcriptional and genetic classes, highlighting the potential of high throughput pathology image analysis as a complementary approach to human-based review and translational research.
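
    The feature-computation step named above (nuclear Eccentricity, Extent and MinorAxisLength) maps directly onto standard region properties. The sketch below extracts those features from a labelled nuclei mask with scikit-image; the input mask file is an assumed placeholder, and the segmentation, GPU acceleration and spatial-database layers of the full pipeline are not reproduced.

        # Minimal sketch of nuclear morphometry feature extraction from a labelled
        # segmentation mask (placeholder input; segmentation itself not shown).
        import numpy as np
        import pandas as pd
        from skimage import measure

        labels = np.load("nuclei_labels.npy")  # integer-labelled nuclei mask (assumed)

        props = measure.regionprops_table(
            labels,
            properties=("label", "eccentricity", "extent", "minor_axis_length", "area"),
        )
        features = pd.DataFrame(props)

        # Per-slide summary statistics of the kind correlated with survival above
        print(features[["eccentricity", "extent", "minor_axis_length"]].mean())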

  5. Business intelligence for the radiologist: making your data work for you.

    PubMed

    Cook, Tessa S; Nagy, Paul

    2014-12-01

    Although it remains absent from most programs today, business intelligence (BI) has become an integral part of modern radiology practice management. BI facilitates the transition away from lack of understanding about a system and the data it produces toward incrementally more sophisticated comprehension of what has happened, could happen, and should happen. The individual components that make up BI are common across industries and include data extraction and transformation, process analysis and improvement, outcomes measures, performance assessment, graphical dashboarding, alerting, workflow analysis, and scenario modeling. As in other fields, these components can be directly applied in radiology to improve workflow, throughput, safety, efficacy, outcomes, and patient satisfaction. When approaching the subject of BI in radiology, it is important to know what data are available in your various electronic medical records, as well as where and how they are stored. In addition, it is critical to verify that the data actually represent what you think they do. Finally, it is critical for success to identify the features and limitations of the BI tools you choose to use and to plan your practice modifications on the basis of collected data. It is equally important to remember that BI plays a critical role in continuous process improvement; whichever BI tools you choose should be flexible to grow and evolve with your practice. Published by Elsevier Inc.

  6. Application of ToxCast High-Throughput Screening and ...

    EPA Pesticide Factsheets

    Slide presentation at the SETAC annual meeting on High-Throughput Screening and Modeling Approaches to Identify Steroidogenesis Disruptors.

  7. Integrated workflows for spiking neuronal network simulations

    PubMed Central

    Antolík, Ján; Davison, Andrew P.

    2013-01-01

    The increasing availability of computational resources is enabling more detailed, realistic modeling in computational neuroscience, resulting in a shift toward more heterogeneous models of neuronal circuits, and employment of complex experimental protocols. This poses a challenge for existing tool chains, as the set of tools involved in a typical modeler's workflow is expanding concomitantly, with growing complexity in the metadata flowing between them. For many parts of the workflow, a range of tools is available; however, numerous areas lack dedicated tools, while integration of existing tools is limited. This forces modelers to either handle the workflow manually, leading to errors, or to write substantial amounts of code to automate parts of the workflow, in both cases reducing their productivity. To address these issues, we have developed Mozaik: a workflow system for spiking neuronal network simulations written in Python. Mozaik integrates model, experiment and stimulation specification, simulation execution, data storage, data analysis and visualization into a single automated workflow, ensuring that all relevant metadata are available to all workflow components. It is based on several existing tools, including PyNN, Neo, and Matplotlib. It offers a declarative way to specify models and recording configurations using hierarchically organized configuration files. Mozaik automatically records all data together with all relevant metadata about the experimental context, allowing automation of the analysis and visualization stages. Mozaik has a modular architecture, and the existing modules are designed to be extensible with minimal programming effort. Mozaik increases the productivity of running virtual experiments on highly structured neuronal networks by automating the entire experimental cycle, while increasing the reliability of modeling studies by relieving the user from manual handling of the flow of metadata between the individual workflow stages. PMID:24368902
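
    The declarative, hierarchically organized configuration idea can be illustrated without the framework itself. The sketch below is not the Mozaik API: it simply shows, in plain Python with invented parameter names, a base model specification being overridden by an experiment-specific one, with the merged parameters kept as provenance metadata alongside the results.

        # Illustration of hierarchically organised, declarative configuration
        # (invented parameter names; NOT the Mozaik API).
        import json

        def merge(base, override):
            """Recursively merge an override dict into a base dict."""
            out = dict(base)
            for key, value in override.items():
                if isinstance(value, dict) and isinstance(out.get(key), dict):
                    out[key] = merge(out[key], value)
                else:
                    out[key] = value
            return out

        base_cfg = {
            "model": {"n_excitatory": 800, "n_inhibitory": 200, "synapse": {"weight": 0.01}},
            "recording": {"variables": ["spikes"]},
            "simulation": {"duration_ms": 1000.0, "timestep_ms": 0.1},
        }
        experiment_cfg = {
            "model": {"synapse": {"weight": 0.02}},
            "recording": {"variables": ["spikes", "v"]},
        }

        params = merge(base_cfg, experiment_cfg)
        print(json.dumps(params, indent=2))  # stored with the results as metadata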

  8. Create, run, share, publish, and reference your LC-MS, FIA-MS, GC-MS, and NMR data analysis workflows with the Workflow4Metabolomics 3.0 Galaxy online infrastructure for metabolomics.

    PubMed

    Guitton, Yann; Tremblay-Franco, Marie; Le Corguillé, Gildas; Martin, Jean-François; Pétéra, Mélanie; Roger-Mele, Pierrick; Delabrière, Alexis; Goulitquer, Sophie; Monsoor, Misharl; Duperier, Christophe; Canlet, Cécile; Servien, Rémi; Tardivel, Patrick; Caron, Christophe; Giacomoni, Franck; Thévenot, Etienne A

    2017-12-01

    Metabolomics is a key approach in modern functional genomics and systems biology. Due to the complexity of metabolomics data, the variety of experimental designs, and the multiplicity of bioinformatics tools, providing experimenters with a simple and efficient resource to conduct comprehensive and rigorous analysis of their data is of utmost importance. In 2014, we launched the Workflow4Metabolomics (W4M; http://workflow4metabolomics.org) online infrastructure for metabolomics built on the Galaxy environment, which offers user-friendly features to build and run data analysis workflows including preprocessing, statistical analysis, and annotation steps. Here we present the new W4M 3.0 release, which contains twice as many tools as the first version, and provides two features which are, to our knowledge, unique among online resources. First, data from the four major metabolomics technologies (i.e., LC-MS, FIA-MS, GC-MS, and NMR) can be analyzed on a single platform. By using three studies in human physiology, alga evolution, and animal toxicology, we demonstrate how the 40 available tools can be easily combined to address biological issues. Second, the full analysis (including the workflow, the parameter values, the input data and output results) can be referenced with a permanent digital object identifier (DOI). Publication of data analyses is of major importance for robust and reproducible science. Furthermore, the publicly shared workflows are of high-value for e-learning and training. The Workflow4Metabolomics 3.0 e-infrastructure thus not only offers a unique online environment for analysis of data from the main metabolomics technologies, but it is also the first reference repository for metabolomics workflows. Copyright © 2017 Elsevier Ltd. All rights reserved.
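
    Because W4M is built on Galaxy, shared workflows can in principle be driven programmatically through the standard Galaxy API. The sketch below uses the BioBlend client to upload a data set and invoke a workflow on a Galaxy server; the server URL, API key, workflow name and input mapping are placeholders rather than W4M specifics.

        # Sketch of invoking a shared Galaxy workflow via BioBlend. URL, key,
        # workflow name and input mapping are placeholders, not W4M specifics.
        from bioblend.galaxy import GalaxyInstance

        gi = GalaxyInstance(url="https://galaxy.example.org", key="YOUR_API_KEY")

        history = gi.histories.create_history(name="LC-MS preprocessing run")
        upload = gi.tools.upload_file("intensities.tsv", history["id"])
        dataset_id = upload["outputs"][0]["id"]

        workflow = gi.workflows.get_workflows(name="example-lcms-preprocessing")[0]
        inputs = {"0": {"src": "hda", "id": dataset_id}}  # map to workflow step 0

        invocation = gi.workflows.invoke_workflow(
            workflow["id"], inputs=inputs, history_id=history["id"]
        )
        print("Invocation id:", invocation["id"])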

  9. Integrated workflows for spiking neuronal network simulations.

    PubMed

    Antolík, Ján; Davison, Andrew P

    2013-01-01

    The increasing availability of computational resources is enabling more detailed, realistic modeling in computational neuroscience, resulting in a shift toward more heterogeneous models of neuronal circuits, and employment of complex experimental protocols. This poses a challenge for existing tool chains, as the set of tools involved in a typical modeler's workflow is expanding concomitantly, with growing complexity in the metadata flowing between them. For many parts of the workflow, a range of tools is available; however, numerous areas lack dedicated tools, while integration of existing tools is limited. This forces modelers to either handle the workflow manually, leading to errors, or to write substantial amounts of code to automate parts of the workflow, in both cases reducing their productivity. To address these issues, we have developed Mozaik: a workflow system for spiking neuronal network simulations written in Python. Mozaik integrates model, experiment and stimulation specification, simulation execution, data storage, data analysis and visualization into a single automated workflow, ensuring that all relevant metadata are available to all workflow components. It is based on several existing tools, including PyNN, Neo, and Matplotlib. It offers a declarative way to specify models and recording configurations using hierarchically organized configuration files. Mozaik automatically records all data together with all relevant metadata about the experimental context, allowing automation of the analysis and visualization stages. Mozaik has a modular architecture, and the existing modules are designed to be extensible with minimal programming effort. Mozaik increases the productivity of running virtual experiments on highly structured neuronal networks by automating the entire experimental cycle, while increasing the reliability of modeling studies by relieving the user from manual handling of the flow of metadata between the individual workflow stages.

  10. An architecture model for multiple disease management information systems.

    PubMed

    Chen, Lichin; Yu, Hui-Chu; Li, Hao-Chun; Wang, Yi-Van; Chen, Huang-Jen; Wang, I-Ching; Wang, Chiou-Shiang; Peng, Hui-Yu; Hsu, Yu-Ling; Chen, Chi-Huang; Chuang, Lee-Ming; Lee, Hung-Chang; Chung, Yufang; Lai, Feipei

    2013-04-01

    Disease management is a program which attempts to overcome the fragmentation of the healthcare system and improve the quality of care. Many studies have proven the effectiveness of disease management. However, case managers spend the majority of their time on documentation and on coordinating the members of the care team. They need a tool to support their daily practice and to optimize the inefficient workflow. Several discussions have indicated that information technology plays an important role in the era of disease management. Whereas applications have been developed, it is inefficient to develop an information system for each disease management program individually. The aim of this research is to support the work of disease management, reform the inefficient workflow, and propose an architecture model that enhances reusability and saves time in information system development. The proposed architecture model has been successfully implemented in two disease management information systems, and the result was evaluated through reusability analysis, time-consumption analysis, pre- and post-implementation workflow analysis, and a user questionnaire survey. The reusability of the proposed model was high, less than half of the time was consumed, and the workflow was improved. The overall user response is positive. The supportiveness during daily workflow is high. The system empowers the case managers with better information and leads to better decision making.

  11. The high throughput biomedicine unit at the institute for molecular medicine Finland: high throughput screening meets precision medicine.

    PubMed

    Pietiainen, Vilja; Saarela, Jani; von Schantz, Carina; Turunen, Laura; Ostling, Paivi; Wennerberg, Krister

    2014-05-01

    The High Throughput Biomedicine (HTB) unit at the Institute for Molecular Medicine Finland FIMM was established in 2010 to serve as a national and international academic screening unit providing access to state of the art instrumentation for chemical and RNAi-based high throughput screening. The initial focus of the unit was multiwell plate based chemical screening and high content microarray-based siRNA screening. However, over the first four years of operation, the unit has moved to a more flexible service platform where both chemical and siRNA screening is performed at different scales primarily in multiwell plate-based assays with a wide range of readout possibilities with a focus on ultraminiaturization to allow for affordable screening for the academic users. In addition to high throughput screening, the equipment of the unit is also used to support miniaturized, multiplexed and high throughput applications for other types of research such as genomics, sequencing and biobanking operations. Importantly, with the translational research goals at FIMM, an increasing part of the operations at the HTB unit is being focused on high throughput systems biological platforms for functional profiling of patient cells in personalized and precision medicine projects.

  12. High Throughput Screening For Hazard and Risk of Environmental Contaminants

    EPA Science Inventory

    High throughput toxicity testing provides detailed mechanistic information on the concentration response of environmental contaminants in numerous potential toxicity pathways. High throughput screening (HTS) has several key advantages: (1) expense orders of magnitude less than an...

  13. Requirements for Workflow-Based EHR Systems - Results of a Qualitative Study.

    PubMed

    Schweitzer, Marco; Lasierra, Nelia; Hoerbst, Alexander

    2016-01-01

    Today's high-quality healthcare delivery strongly relies on efficient electronic health records (EHR). These EHR systems, or healthcare IT systems in general, are usually developed in a static manner according to a given workflow. Hence, they are not flexible enough to enable access to EHR data and to execute individual actions within a consultation. This paper reports on requirements identified by experts in the domain of diabetes mellitus for the design of a system supporting dynamic workflows to serve personalization within a medical activity. Requirements were collected by means of expert interviews. These interviews completed a triangulation approach conducted to gather requirements for workflow-based EHR interactions. The data from the interviews were analyzed through a qualitative approach, resulting in a set of requirements enhancing EHR functionality from the user's perspective. Requirements were classified according to four categories: (1) process-related requirements, (2) information needs, (3) required functions, (4) non-functional requirements. Workflow-related requirements were identified which should be considered when developing and deploying EHR systems.

  14. Cloud-based bioinformatics workflow platform for large-scale next-generation sequencing analyses

    PubMed Central

    Liu, Bo; Madduri, Ravi K; Sotomayor, Borja; Chard, Kyle; Lacinski, Lukasz; Dave, Utpal J; Li, Jianqiang; Liu, Chunchen; Foster, Ian T

    2014-01-01

    Due to the upcoming data deluge of genome data, the need for storing and processing large-scale genome data, easy access to biomedical analyses tools, efficient data sharing and retrieval has presented significant challenges. The variability in data volume results in variable computing and storage requirements, therefore biomedical researchers are pursuing more reliable, dynamic and convenient methods for conducting sequencing analyses. This paper proposes a Cloud-based bioinformatics workflow platform for large-scale next-generation sequencing analyses, which enables reliable and highly scalable execution of sequencing analyses workflows in a fully automated manner. Our platform extends the existing Galaxy workflow system by adding data management capabilities for transferring large quantities of data efficiently and reliably (via Globus Transfer), domain-specific analyses tools preconfigured for immediate use by researchers (via user-specific tools integration), automatic deployment on Cloud for on-demand resource allocation and pay-as-you-go pricing (via Globus Provision), a Cloud provisioning tool for auto-scaling (via HTCondor scheduler), and the support for validating the correctness of workflows (via semantic verification tools). Two bioinformatics workflow use cases as well as performance evaluation are presented to validate the feasibility of the proposed approach. PMID:24462600
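
    Data movement in such a platform is delegated to Globus Transfer. As a minimal illustration of that step, independent of the platform itself, the sketch below submits and monitors a transfer with the globus-sdk Python client; the access token, endpoint UUIDs and file paths are placeholders, and the Globus Auth flow that yields the token is omitted.

        # Minimal sketch of a Globus Transfer submission with globus-sdk.
        # Token, endpoint UUIDs and paths are placeholders; auth flow omitted.
        import globus_sdk

        TRANSFER_TOKEN = "..."                     # obtained via a Globus Auth flow
        SRC_ENDPOINT = "source-endpoint-uuid"
        DST_ENDPOINT = "destination-endpoint-uuid"

        tc = globus_sdk.TransferClient(
            authorizer=globus_sdk.AccessTokenAuthorizer(TRANSFER_TOKEN)
        )

        tdata = globus_sdk.TransferData(
            tc, SRC_ENDPOINT, DST_ENDPOINT,
            label="NGS run upload", sync_level="checksum",
        )
        tdata.add_item("/sequencer/run42/sample.fastq.gz", "/staging/sample.fastq.gz")

        task = tc.submit_transfer(tdata)
        tc.task_wait(task["task_id"], timeout=3600, polling_interval=30)
        print("Transfer status:", tc.get_task(task["task_id"])["status"])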

  15. Cloud-based bioinformatics workflow platform for large-scale next-generation sequencing analyses.

    PubMed

    Liu, Bo; Madduri, Ravi K; Sotomayor, Borja; Chard, Kyle; Lacinski, Lukasz; Dave, Utpal J; Li, Jianqiang; Liu, Chunchen; Foster, Ian T

    2014-06-01

    Due to the upcoming data deluge of genome data, the need for storing and processing large-scale genome data, easy access to biomedical analyses tools, efficient data sharing and retrieval has presented significant challenges. The variability in data volume results in variable computing and storage requirements, therefore biomedical researchers are pursuing more reliable, dynamic and convenient methods for conducting sequencing analyses. This paper proposes a Cloud-based bioinformatics workflow platform for large-scale next-generation sequencing analyses, which enables reliable and highly scalable execution of sequencing analyses workflows in a fully automated manner. Our platform extends the existing Galaxy workflow system by adding data management capabilities for transferring large quantities of data efficiently and reliably (via Globus Transfer), domain-specific analyses tools preconfigured for immediate use by researchers (via user-specific tools integration), automatic deployment on Cloud for on-demand resource allocation and pay-as-you-go pricing (via Globus Provision), a Cloud provisioning tool for auto-scaling (via HTCondor scheduler), and the support for validating the correctness of workflows (via semantic verification tools). Two bioinformatics workflow use cases as well as performance evaluation are presented to validate the feasibility of the proposed approach. Copyright © 2014 Elsevier Inc. All rights reserved.

  16. Big Data Challenges in Global Seismic 'Adjoint Tomography' (Invited)

    NASA Astrophysics Data System (ADS)

    Tromp, J.; Bozdag, E.; Krischer, L.; Lefebvre, M.; Lei, W.; Smith, J.

    2013-12-01

    The challenge of imaging Earth's interior on a global scale is closely linked to the challenge of handling large data sets. The related iterative workflow involves five distinct phases, namely, 1) data gathering and culling, 2) synthetic seismogram calculations, 3) pre-processing (time-series analysis and time-window selection), 4) data assimilation and adjoint calculations, 5) post-processing (pre-conditioning, regularization, model update). In order to implement this workflow on modern high-performance computing systems, a new seismic data format is being developed. The Adaptable Seismic Data Format (ASDF) is designed to replace currently used data formats with a more flexible format that allows for fast parallel I/O. The metadata is divided into abstract categories, such as "source" and "receiver", along with provenance information for complete reproducibility. The structure of ASDF is designed keeping in mind three distinct applications: earthquake seismology, seismic interferometry, and exploration seismology. Existing time-series analysis tool kits, such as SAC and ObsPy, can be easily interfaced with ASDF so that seismologists can use robust, previously developed software packages. ASDF accommodates an automated, efficient workflow for global adjoint tomography. Manually managing the large number of simulations associated with the workflow can rapidly become a burden, especially with increasing numbers of earthquakes and stations. Therefore, it is of importance to investigate the possibility of automating the entire workflow. Scientific Workflow Management Software (SWfMS) allows users to execute workflows almost routinely. SWfMS provides additional advantages. In particular, it is possible to group independent simulations in a single job to fit the available computational resources. They also give a basic level of fault resilience as the workflow can be resumed at the correct state preceding a failure. Some of the best candidates for our particular workflow are Kepler and Swift, and the latter appears to be the most serious candidate for a large-scale workflow on a single supercomputer, remaining sufficiently simple to accommodate further modifications and improvements.
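
    A concrete feel for ASDF is given by the pyasdf/ObsPy tooling mentioned above. The sketch below assembles a small ASDF file holding one event, its observed waveforms and the station metadata; the file names are placeholders, and a production adjoint-tomography workflow would perform this step in parallel over many events and stations.

        # Minimal sketch: build an ASDF file with pyasdf and ObsPy.
        # File names are placeholders.
        import obspy
        from pyasdf import ASDFDataSet

        ds = ASDFDataSet("event_0001.h5", compression="gzip-3")

        catalog = obspy.read_events("event_0001.quakeml")      # event metadata
        ds.add_quakeml(catalog)

        waveforms = obspy.read("event_0001_waveforms.mseed")   # observed seismograms
        inventory = obspy.read_inventory("stations.xml")       # receiver metadata

        ds.add_waveforms(waveforms, tag="raw_observed", event_id=catalog[0])
        ds.add_stationxml(inventory)

        print(ds)  # summarises events, stations and waveform tags in the file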

  17. Multi-level meta-workflows: new concept for regularly occurring tasks in quantum chemistry.

    PubMed

    Arshad, Junaid; Hoffmann, Alexander; Gesing, Sandra; Grunzke, Richard; Krüger, Jens; Kiss, Tamas; Herres-Pawlis, Sonja; Terstyanszky, Gabor

    2016-01-01

    In Quantum Chemistry, many tasks reoccur frequently, e.g. geometry optimizations, benchmarking series, etc. Here, workflows can help to reduce the time spent on manual job definition and output extraction. These workflows are executed on computing infrastructures and may require large computing and data resources. Scientific workflows hide these infrastructures and the resources needed to run them. It requires significant effort and specific expertise to design, implement and test these workflows. Many of these workflows are complex and monolithic entities that can be used for particular scientific experiments. Hence, their modification is not straightforward, which makes it almost impossible to share them. To address these issues we propose developing atomic workflows and embedding them in meta-workflows. Atomic workflows deliver a well-defined, research-domain-specific function. Publishing workflows in repositories enables workflow sharing inside and/or among scientific communities. We formally specify atomic and meta-workflows in order to define data structures to be used in repositories for uploading and sharing them. Additionally, we present a formal description focused on the orchestration of atomic workflows into meta-workflows. We investigated the operations that represent basic functionalities in Quantum Chemistry, developed the relevant atomic workflows and combined them into meta-workflows. Having these workflows, we defined the structure of the Quantum Chemistry workflow library and uploaded these workflows to the SHIWA Workflow Repository. Graphical Abstract: Meta-workflows and embedded workflows in the template representation.
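
    The atomic-versus-meta-workflow split can be pictured with a schematic example, shown below in plain Python rather than as a SHIWA/WS-PGRADE workflow definition. Each stub function stands in for one atomic workflow delivering a single, well-defined quantum-chemistry function, and the meta-workflow orchestrates them over a set of structures and methods; the function names and returned numbers are invented.

        # Schematic only: atomic workflows as stub functions, orchestrated by a
        # meta-workflow. Names and values are invented for illustration.
        def geometry_optimization(structure: str) -> str:
            """Atomic workflow: return an optimised geometry (stub)."""
            return structure + "_optimised"

        def single_point_energy(structure: str, method: str) -> float:
            """Atomic workflow: return an energy for one structure/method pair (stub)."""
            return -76.4 if method == "DFT" else -76.2   # placeholder numbers

        def benchmark_meta_workflow(structures, methods):
            """Meta-workflow: optimise every structure, then benchmark all methods."""
            results = {}
            for s in structures:
                optimised = geometry_optimization(s)
                results[s] = {m: single_point_energy(optimised, m) for m in methods}
            return results

        print(benchmark_meta_workflow(["water", "ammonia"], ["HF", "DFT"]))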

  18. The radiologist's workflow environment: evaluation of disruptors and potential implications.

    PubMed

    Yu, John-Paul J; Kansagra, Akash P; Mongan, John

    2014-06-01

    Workflow interruptions in the health care delivery environment are a major contributor to medical errors and have been extensively studied within numerous hospital settings, including the nursing environment and the operating room, along with their effects on physician workflow. Less understood, though, is the role of interruptions in other highly specialized clinical domains and subspecialty services, such as diagnostic radiology. The workflow of the on-call radiologist, in particular, is especially susceptible to disruption by telephone calls and other modes of physician-to-physician communication. Herein, the authors describe their initial efforts to quantify the degree of interruption experienced by on-call radiologists and examine its potential implications in patient safety and overall clinical care. Copyright © 2014 American College of Radiology. Published by Elsevier Inc. All rights reserved.

  19. High Throughput Transcriptomics: From screening to pathways

    EPA Science Inventory

    The EPA ToxCast effort has screened thousands of chemicals across hundreds of high-throughput in vitro screening assays. The project is now leveraging high-throughput transcriptomic (HTTr) technologies to substantially expand its coverage of biological pathways. The first HTTr sc...

  20. Quantitative description on structure-property relationships of Li-ion battery materials for high-throughput computations

    NASA Astrophysics Data System (ADS)

    Wang, Youwei; Zhang, Wenqing; Chen, Lidong; Shi, Siqi; Liu, Jianjun

    2017-12-01

    Li-ion batteries are a key technology for addressing the global challenge of clean renewable energy and environment pollution. Their contemporary applications, for portable electronic devices, electric vehicles, and large-scale power grids, stimulate the development of high-performance battery materials with high energy density, high power, good safety, and long lifetime. High-throughput calculations provide a practical strategy to discover new battery materials and optimize currently known material performances. Most cathode materials screened by the previous high-throughput calculations cannot meet the requirement of practical applications because only capacity, voltage and volume change of bulk were considered. It is important to include more structure-property relationships, such as point defects, surface and interface, doping and metal-mixture and nanosize effects, in high-throughput calculations. In this review, we established quantitative description of structure-property relationships in Li-ion battery materials by the intrinsic bulk parameters, which can be applied in future high-throughput calculations to screen Li-ion battery materials. Based on these parameterized structure-property relationships, a possible high-throughput computational screening flow path is proposed to obtain high-performance battery materials.

  1. Quantitative description on structure-property relationships of Li-ion battery materials for high-throughput computations.

    PubMed

    Wang, Youwei; Zhang, Wenqing; Chen, Lidong; Shi, Siqi; Liu, Jianjun

    2017-01-01

    Li-ion batteries are a key technology for addressing the global challenge of clean renewable energy and environment pollution. Their contemporary applications, for portable electronic devices, electric vehicles, and large-scale power grids, stimulate the development of high-performance battery materials with high energy density, high power, good safety, and long lifetime. High-throughput calculations provide a practical strategy to discover new battery materials and optimize currently known material performances. Most cathode materials screened by the previous high-throughput calculations cannot meet the requirement of practical applications because only capacity, voltage and volume change of bulk were considered. It is important to include more structure-property relationships, such as point defects, surface and interface, doping and metal-mixture and nanosize effects, in high-throughput calculations. In this review, we established quantitative description of structure-property relationships in Li-ion battery materials by the intrinsic bulk parameters, which can be applied in future high-throughput calculations to screen Li-ion battery materials. Based on these parameterized structure-property relationships, a possible high-throughput computational screening flow path is proposed to obtain high-performance battery materials.

  2. High-throughput Isolation and Characterization of Untagged Membrane Protein Complexes: Outer Membrane Complexes of Desulfovibrio vulgaris

    PubMed Central

    2012-01-01

    Cell membranes represent the “front line” of cellular defense and the interface between a cell and its environment. To determine the range of proteins and protein complexes that are present in the cell membranes of a target organism, we have utilized a “tagless” process for the system-wide isolation and identification of native membrane protein complexes. As an initial subject for study, we have chosen the Gram-negative sulfate-reducing bacterium Desulfovibrio vulgaris. With this tagless methodology, we have identified about two-thirds of the outer membrane-associated proteins anticipated. Approximately three-fourths of these appear to form homomeric complexes. Statistical and machine-learning methods used to analyze data compiled over multiple experiments revealed networks of additional protein–protein interactions providing insight into heteromeric contacts made between proteins across this region of the cell. Taken together, these results establish a D. vulgaris outer membrane protein data set that will be essential for the detection and characterization of environment-driven changes in the outer membrane proteome and in the modeling of stress response pathways. The workflow utilized here should be effective for the global characterization of membrane protein complexes in a wide range of organisms. PMID:23098413

  3. Data management in large-scale collaborative toxicity studies: how to file experimental data for automated statistical analysis.

    PubMed

    Stanzel, Sven; Weimer, Marc; Kopp-Schneider, Annette

    2013-06-01

    High-throughput screening approaches are carried out for the toxicity assessment of a large number of chemical compounds. In such large-scale in vitro toxicity studies several hundred or thousand concentration-response experiments are conducted. The automated evaluation of concentration-response data using statistical analysis scripts saves time and yields more consistent results in comparison to data analysis performed by the use of menu-driven statistical software. Automated statistical analysis requires that concentration-response data are available in a standardised data format across all compounds. To obtain consistent data formats, a standardised data management workflow must be established, including guidelines for data storage, data handling and data extraction. In this paper two procedures for data management within large-scale toxicological projects are proposed. Both procedures are based on Microsoft Excel files as the researcher's primary data format and use a computer programme to automate the handling of data files. The first procedure assumes that data collection has not yet started whereas the second procedure can be used when data files already exist. Successful implementation of the two approaches into the European project ACuteTox is illustrated. Copyright © 2012 Elsevier Ltd. All rights reserved.
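
    The kind of automation described above amounts to collecting the researchers' Excel files into one standardized table that analysis scripts can consume. The sketch below is a minimal illustration of that step; the directory layout, sheet name and column names are assumptions, not the ACuteTox conventions.

        # Minimal sketch: consolidate per-lab Excel files into one standardised
        # long-format table for scripted concentration-response analysis.
        # Directory, sheet and column names are assumed for illustration.
        import glob
        import pandas as pd

        frames = []
        for path in glob.glob("raw_data/*.xlsx"):
            df = pd.read_excel(path, sheet_name="results")
            # expected columns: compound, concentration_uM, response_pct
            df["source_file"] = path
            frames.append(df)

        combined = pd.concat(frames, ignore_index=True)

        # Basic consistency checks before handing the table to analysis scripts
        assert {"compound", "concentration_uM", "response_pct"} <= set(combined.columns)
        assert combined["concentration_uM"].gt(0).all()

        combined.to_csv("standardised_concentration_response.csv", index=False)
        print(f"Wrote {len(combined)} rows from {len(frames)} files")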

  4. Parallel Application Performance on Two Generations of Intel Xeon HPC Platforms

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chang, Christopher H.; Long, Hai; Sides, Scott

    2015-10-15

    Two next-generation node configurations hosting the Haswell microarchitecture were tested with a suite of microbenchmarks and application examples, and compared with a current Ivy Bridge production node on NREL's Peregrine high-performance computing cluster. A primary conclusion from this study is that the additional cores are of little value to individual task performance: limitations to application parallelism, or resource contention among concurrently running but independent tasks, limits effective utilization of these added cores. Hyperthreading generally impacts throughput negatively, but can improve performance in the absence of detailed attention to runtime workflow configuration. The observations offer some guidance to procurement of future HPC systems at NREL. First, raw core count must be balanced with available resources, particularly memory bandwidth. Balance-of-system will determine value more than processor capability alone. Second, hyperthreading continues to be largely irrelevant to the workloads that are commonly seen, and were tested here, at NREL. Finally, perhaps the most impactful enhancement to productivity might occur through enabling multiple concurrent jobs per node. Given the right type and size of workload, more may be achieved by doing many slow things at once, than fast things in order.

  5. Diagnostic Applications of Next Generation Sequencing in Immunogenetics and Molecular Oncology

    PubMed Central

    Grumbt, Barbara; Eck, Sebastian H.; Hinrichsen, Tanja; Hirv, Kaimo

    2013-01-01

    Summary With the introduction of next generation sequencing (NGS) technologies, remarkable new diagnostic applications have been established in daily routine. Implementation of NGS in clinical diagnostics is challenging, but definite advantages and new diagnostic possibilities make the switch to the technology inevitable. In addition to the higher sequencing capacity, clonal sequencing of single molecules, multiplexing of samples, higher diagnostic sensitivity, workflow miniaturization, and cost benefits are some of the valuable features of the technology. After the recent advances, NGS has emerged as a proven alternative to classical Sanger sequencing in the typing of human leukocyte antigens (HLA). By virtue of the clonal amplification of single DNA molecules, ambiguous typing results can be avoided. Simultaneously, a higher sample throughput can be achieved by tagging of DNA molecules with multiplex identifiers and pooling of PCR products before sequencing. In our experience, up to 380 samples can be typed for HLA-A, -B, and -DRB1 at high resolution in every sequencing run. In molecular oncology, NGS shows a markedly increased sensitivity in comparison to conventional Sanger sequencing and is developing into the standard diagnostic tool for the detection of somatic mutations in cancer cells, with great impact on personalized treatment of patients. PMID:23922545

  6. Open source libraries and frameworks for biological data visualisation: a guide for developers.

    PubMed

    Wang, Rui; Perez-Riverol, Yasset; Hermjakob, Henning; Vizcaíno, Juan Antonio

    2015-04-01

    Recent advances in high-throughput experimental techniques have led to an exponential increase in both the size and the complexity of the data sets commonly studied in biology. Data visualisation is increasingly used as the key to unlock this data, going from hypothesis generation to model evaluation and tool implementation. It is becoming more and more the heart of bioinformatics workflows, enabling scientists to reason and communicate more effectively. In parallel, there has been a corresponding trend towards the development of related software, which has triggered the maturation of different visualisation libraries and frameworks. For bioinformaticians, scientific programmers and software developers, the main challenge is to pick out the most fitting one(s) to create clear, meaningful and integrated data visualisation for their particular use cases. In this review, we introduce a collection of open source or free to use libraries and frameworks for creating data visualisation, covering the generation of a wide variety of charts and graphs. We will focus on software written in Java, JavaScript or Python. We truly believe this software offers the potential to turn tedious data into exciting visual stories. © 2014 The Authors. PROTEOMICS published by Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.

  7. A Quasi-Dynamic Approach to modelling Hydrodynamic Focusing

    NASA Astrophysics Data System (ADS)

    Kommajosula, Aditya; Xu, Songzhe; Wu, Chueh-Yu; di Carlo, Dino; Ganapathysubramanian, Baskar; ComPM Lab Team; Di Carlo Lab Collaboration

    2016-11-01

    We examine a particle's tendency at different spatial locations to shift/rotate towards the equilibrium location, by constrained simulation. Although studies in the past have used this procedure in conjunction with FSI methods to great effect, the current work in 2D explores an alternative approach by utilizing a modified trust-region-based root-finding algorithm to solve for the particle position and velocities at equilibrium, using "snapshots" of finite-element solutions to the steady-state Navier-Stokes equations iteratively over a computational domain attached to the particle reference frame. Through an assortment of test cases comprising circular and non-circular particle geometries, the incorporation of stability theory for dynamical systems is demonstrated as a way to locate the final focusing location and velocities. The results are compared with previous experimental/numerical reports and found to be in close agreement. For an illustrative case, the computational times of the current workflow and of its transient counterpart differ by roughly a thousand-fold. The current framework is formulated in 2D for 3 degrees of freedom and will be extended to 3D. This framework potentially allows for quick, high-throughput parametric-space studies of equilibrium scaling laws.
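
    The root-finding idea, locating the particle state at which the hydrodynamic force and torque residuals vanish, can be sketched with a generic solver. In the snippet below a toy algebraic residual stands in for the steady finite-element Navier-Stokes "snapshot" evaluations of the actual workflow, and SciPy's 'hybr' solver (a MINPACK trust-region-style method) stands in for the modified trust-region algorithm of the study; all numbers are invented.

        # Schematic sketch: solve for the equilibrium state (lateral position y,
        # translational velocity u, angular velocity w) where a toy force/torque
        # residual vanishes. The residual is a stand-in for FE flow snapshots.
        import numpy as np
        from scipy.optimize import root

        def residual(state):
            y, u, w = state
            f_lift = (y - 0.6) * (1.0 + 0.5 * y**2)       # zero at y = 0.6
            f_drag = u - (1.0 - 0.3 * (y - 0.6) ** 2)     # zero at u = 1.0 there
            torque = w - 0.2 * u                          # zero at w = 0.2 there
            return np.array([f_lift, f_drag, torque])

        sol = root(residual, x0=[0.3, 0.8, 0.0], method="hybr")
        print("converged:", sol.success, "equilibrium state:", sol.x)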

  8. Open source libraries and frameworks for biological data visualisation: A guide for developers

    PubMed Central

    Wang, Rui; Perez-Riverol, Yasset; Hermjakob, Henning; Vizcaíno, Juan Antonio

    2015-01-01

    Recent advances in high-throughput experimental techniques have led to an exponential increase in both the size and the complexity of the data sets commonly studied in biology. Data visualisation is increasingly used as the key to unlock this data, going from hypothesis generation to model evaluation and tool implementation. It is becoming more and more the heart of bioinformatics workflows, enabling scientists to reason and communicate more effectively. In parallel, there has been a corresponding trend towards the development of related software, which has triggered the maturation of different visualisation libraries and frameworks. For bioinformaticians, scientific programmers and software developers, the main challenge is to pick out the most fitting one(s) to create clear, meaningful and integrated data visualisation for their particular use cases. In this review, we introduce a collection of open source or free to use libraries and frameworks for creating data visualisation, covering the generation of a wide variety of charts and graphs. We will focus on software written in Java, JavaScript or Python. We truly believe this software offers the potential to turn tedious data into exciting visual stories. PMID:25475079

  9. Data processing has major impact on the outcome of quantitative label-free LC-MS analysis.

    PubMed

    Chawade, Aakash; Sandin, Marianne; Teleman, Johan; Malmström, Johan; Levander, Fredrik

    2015-02-06

    High-throughput multiplexed protein quantification using mass spectrometry is steadily increasing in popularity, with the two major techniques being data-dependent acquisition (DDA) and targeted acquisition using selected reaction monitoring (SRM). However, both techniques involve extensive data processing, which can be performed by a multitude of different software solutions. Analysis of quantitative LC-MS/MS data is mainly performed in three major steps: processing of raw data, normalization, and statistical analysis. To evaluate the impact of data processing steps, we developed two new benchmark data sets, one each for DDA and SRM, with samples consisting of a long-range dilution series of synthetic peptides spiked in a total cell protein digest. The generated data were processed by eight different software workflows and three postprocessing steps. The results show that the choice of the raw data processing software and the postprocessing steps play an important role in the final outcome. Also, the linear dynamic range of the DDA data could be extended by an order of magnitude through feature alignment and a charge state merging algorithm proposed here. Furthermore, the benchmark data sets are made publicly available for further benchmarking and software developments.
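
    As a rough illustration of the kind of evaluation such a benchmark enables, the sketch below assumes a hypothetical dilution series of measured intensities for one spiked peptide and estimates how far the response stays linear; the numbers are invented and this is not the authors' pipeline.

    ```python
    import numpy as np

    # Hypothetical dilution series: spiked peptide amounts (fmol) and the
    # intensities reported by one processing workflow (arbitrary units).
    spiked = np.array([0.1, 0.3, 1, 3, 10, 30, 100, 300, 1000], dtype=float)
    measured = np.array([4.1, 4.4, 9.8, 31, 105, 310, 990, 3100, 10100], dtype=float)

    log_x, log_y = np.log2(spiked), np.log2(measured)

    def linear_range_points(log_x, log_y, slope_tol=0.2):
        """How many of the highest-concentration points still respond linearly.

        Walks down from the top of the series and keeps extending the range
        while the local slope stays within slope_tol of the ideal slope of 1.
        """
        count = 2
        for i in range(len(log_x) - 2, -1, -1):
            local_slope = (log_y[i + 1] - log_y[i]) / (log_x[i + 1] - log_x[i])
            if abs(local_slope - 1.0) > slope_tol:
                break
            count = len(log_x) - i
        return count

    n_linear = linear_range_points(log_x, log_y)
    slope, intercept = np.polyfit(log_x[-n_linear:], log_y[-n_linear:], 1)
    print(f"linear over {n_linear} points, slope {slope:.2f} "
          f"({log_x[-1] - log_x[-n_linear]:.1f} log2 units of dynamic range)")
    ```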

  10. Crystal Nucleation Using Surface-Energy-Modified Glass Substrates.

    PubMed

    Nordquist, Kyle A; Schaab, Kevin M; Sha, Jierui; Bond, Andrew H

    2017-08-02

    Systematic surface energy modifications to glass substrates can induce nucleation and improve crystallization outcomes for small molecule active pharmaceutical ingredients (APIs) and proteins. A comparatively broad probe for function is presented in which various APIs, proteins, organic solvents, aqueous media, surface energy motifs, crystallization methods, form factors, and flat and convex surface energy modifications were examined. Replicate studies (n ≥ 6) have demonstrated an average reduction in crystallization onset times of 52(4)% (alternatively 52 ± 4%) for acetylsalicylic acid from 91% isopropyl alcohol using two very different techniques: bulk cooling to 0 °C using flat surface energy modifications or microdomain cooling to 4 °C from the interior of a glass capillary having convex surface energy modifications that were immersed in the solution. For thaumatin and bovine pancreatic trypsin, a 32(2)% reduction in crystallization onset times was demonstrated in vapor diffusion experiments (n ≥ 15). Nucleation site arrays have been engineered onto form factors frequently used in crystallization screening, including microscope slides, vials, and 96- and 384-well high-throughput screening plates. Nucleation using surface energy modifications on the vessels that contain the solutes to be crystallized adds a layer of useful variables to crystallization studies without requiring significant changes to workflows or instrumentation.

  11. SHIWA Services for Workflow Creation and Sharing in Hydrometeorology

    NASA Astrophysics Data System (ADS)

    Terstyanszky, Gabor; Kiss, Tamas; Kacsuk, Peter; Sipos, Gergely

    2014-05-01

    Researchers want to run scientific experiments on Distributed Computing Infrastructures (DCI) to access large pools of resources and services. Running these experiments requires specific expertise that they may not have. Workflows can hide resources and services as a virtualisation layer, providing a user interface that researchers can use. There are many scientific workflow systems, but they are not interoperable. Learning a workflow system and creating workflows can require significant effort, so it is not reasonable to expect that researchers will learn new workflow systems just to run workflows developed in other workflow systems. Overcoming this requires workflow interoperability solutions that allow workflow sharing. The FP7 'Sharing Interoperable Workflow for Large-Scale Scientific Simulation on Available DCIs' (SHIWA) project developed the Coarse-Grained Interoperability (CGI) concept, which enables recycling and sharing workflows of different workflow systems and executing them on different DCIs. SHIWA developed the SHIWA Simulation Platform (SSP) to implement the CGI concept, integrating three major components: the SHIWA Science Gateway, the workflow engines supported by the CGI concept, and the DCI resources where workflows are executed. The science gateway contains a portal, a submission service, a workflow repository and a proxy server to support the whole workflow life-cycle. The SHIWA Portal allows workflow creation, configuration, execution and monitoring through a Graphical User Interface, using the WS-PGRADE workflow system as the host workflow system. The SHIWA Repository stores the formal description of workflows and workflow engines plus the executables and data needed to execute them. It offers a wide range of browse and search operations. To support non-native workflow execution, the SHIWA Submission Service imports the workflow and workflow engine from the SHIWA Repository. This service either invokes locally or remotely pre-deployed workflow engines, or submits workflow engines together with the workflow to local or remote resources for execution. The SHIWA Proxy Server manages the certificates needed to execute the workflows on different DCIs. Currently SSP supports sharing of ASKALON, Galaxy, GWES, Kepler, LONI Pipeline, MOTEUR, Pegasus, P-GRADE, ProActive, Triana, Taverna and WS-PGRADE workflows. Further workflow systems can be added to the simulation platform as required by research communities. The FP7 'Building a European Research Community through Interoperable Workflows and Data' (ER-flow) project disseminates the achievements of the SHIWA project to build workflow user communities across Europe. ER-flow provides application support to research communities within the project (Astrophysics, Computational Chemistry, Heliophysics and Life Sciences) and beyond it (Hydrometeorology and Seismology) to develop, share and run workflows through the simulation platform. The simulation platform supports four usage scenarios: creating and publishing workflows in the repository, searching and selecting workflows in the repository, executing non-native workflows, and creating and running meta-workflows. The presentation will outline the CGI concept, the SHIWA Simulation Platform, the ER-flow usage scenarios and how the Hydrometeorology research community runs simulations on SSP.
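
    To make the usage scenarios concrete, the sketch below walks through "search the repository, then execute a non-native workflow via the submission service" against entirely hypothetical REST endpoints and field names; the real SHIWA services expose their own interfaces, so every URL and key here is an assumption for illustration only.

    ```python
    import requests

    # Hypothetical endpoints standing in for the SHIWA Repository and
    # Submission Service; treat every URL and field name as an assumption.
    REPO = "https://shiwa-repo.example.org/api"
    SUBMIT = "https://shiwa-submit.example.org/api"

    # 1. Search the repository for shared workflows by keyword and engine.
    hits = requests.get(f"{REPO}/workflows",
                        params={"keyword": "flood-forecast", "engine": "Taverna"}).json()
    workflow = hits[0]  # pick the first match

    # 2. Ask the submission service to run this non-native workflow: it imports
    #    the workflow plus a matching (pre-deployed or submitted) engine and
    #    executes it on the chosen DCI, with credentials handled by the proxy.
    job = requests.post(f"{SUBMIT}/jobs", json={
        "workflow_id": workflow["id"],
        "engine": workflow["engine"],
        "dci": "EGI-ARC",
        "inputs": {"catchment": "danube-upper", "lead_time_h": 48},
    }).json()

    print("submitted:", job["id"], job["status"])
    ```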

  12. High Throughput Experimental Materials Database

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zakutayev, Andriy; Perkins, John; Schwarting, Marcus

    The mission of the High Throughput Experimental Materials Database (HTEM DB) is to enable discovery of new materials with useful properties by releasing large amounts of high-quality experimental data to the public. The HTEM DB contains information about materials obtained from high-throughput experiments at the National Renewable Energy Laboratory (NREL).

  13. Decaf: Decoupled Dataflows for In Situ High-Performance Workflows

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dreher, M.; Peterka, T.

    Decaf is a dataflow system for the parallel communication of coupled tasks in an HPC workflow. The dataflow can perform arbitrary data transformations ranging from simply forwarding data to complex data redistribution. Decaf does this by allowing the user to allocate resources and execute custom code in the dataflow. All communication through the dataflow is efficient parallel message passing over MPI. The runtime for calling tasks is entirely message-driven; Decaf executes a task when all messages for the task have been received. Such a message-driven runtime allows cyclic task dependencies in the workflow graph, for example, to enact computational steering based on the result of downstream tasks. Decaf includes a simple Python API for describing the workflow graph. This allows Decaf to stand alone as a complete workflow system, but Decaf can also be used as the dataflow layer by one or more other workflow systems to form a heterogeneous task-based computing environment. In one experiment, we couple a molecular dynamics code with a visualization tool using the FlowVR and Damaris workflow systems and Decaf for the dataflow. In another experiment, we test the coupling of a cosmology code with Voronoi tessellation and density estimation codes using MPI for the simulation, the DIY programming model for the two analysis codes, and Decaf for the dataflow. Such workflows consisting of heterogeneous software infrastructures exist because components are developed separately with different programming models and runtimes, and this is the first time that such heterogeneous coupling of diverse components was demonstrated in situ on HPC systems.
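
    Decaf's actual Python API is not reproduced here; the sketch below is a generic stand-in showing the shape of such a graph description (tasks and dataflow links, each owning a block of MPI ranks), with every field name an assumption rather than Decaf syntax.

    ```python
    # Generic sketch of a coupled in situ workflow described as a graph:
    # producer and consumer tasks plus explicit dataflow links that own
    # their own MPI ranks. Field names are illustrative, not Decaf's real API.
    workflow = {
        "nodes": [
            {"name": "simulation",  "start_proc": 0,  "nprocs": 64, "func": "md_run"},
            {"name": "tessellate",  "start_proc": 80, "nprocs": 16, "func": "voronoi"},
            {"name": "density_est", "start_proc": 96, "nprocs": 8,  "func": "density"},
        ],
        "edges": [
            # dataflow links can transform/redistribute data between tasks
            {"source": "simulation", "target": "tessellate",
             "start_proc": 64, "nprocs": 16, "redistribute": "contiguous"},
            {"source": "tessellate", "target": "density_est",
             "start_proc": 64, "nprocs": 16, "redistribute": "round_robin"},
        ],
    }

    def total_ranks(wf):
        """Total MPI ranks the workflow needs (tasks plus dataflow links)."""
        parts = wf["nodes"] + wf["edges"]
        return max(p["start_proc"] + p["nprocs"] for p in parts)

    print("mpirun -n", total_ranks(workflow))
    ```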

  14. 20180311 - High Throughput Transcriptomics: From screening to pathways (SOT 2018)

    EPA Science Inventory

    The EPA ToxCast effort has screened thousands of chemicals across hundreds of high-throughput in vitro screening assays. The project is now leveraging high-throughput transcriptomic (HTTr) technologies to substantially expand its coverage of biological pathways. The first HTTr sc...

  15. Evaluation of Sequencing Approaches for High-Throughput Transcriptomics - (BOSC)

    EPA Science Inventory

    Whole-genome in vitro transcriptomics has shown the capability to identify mechanisms of action and estimates of potency for chemical-mediated effects in a toxicological framework, but with limited throughput and high cost. The generation of high-throughput global gene expression...

  16. Growth-altering microbial interactions are responsive to chemical context

    PubMed Central

    2017-01-01

    Microbial interactions are ubiquitous in nature, and are equally as relevant to human wellbeing as the identities of the interacting microbes. However, microbial interactions are difficult to measure and characterize. Furthermore, there is growing evidence that they are not fixed, but dependent on environmental context. We present a novel workflow for inferring microbial interactions that integrates semi-automated image analysis with a colony stamping mechanism, with the overall effect of improving throughput and reproducibility of colony interaction assays. We apply our approach to infer interactions among bacterial species associated with the normal lung microbiome, and how those interactions are altered by the presence of benzo[a]pyrene, a carcinogenic compound found in cigarettes. We found that the presence of this single compound changed the interaction network, demonstrating that microbial interactions are indeed dynamic and responsive to local chemical context. PMID:28319121
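
    A toy version of the inference step, assuming hypothetical colony-area measurements from the image-analysis stage: the effect of a neighbour on a focal species is scored as the log-ratio of paired versus solitary growth, computed separately for each chemical condition. The numbers are made up for illustration.

    ```python
    import math

    # Hypothetical colony areas (mm^2) from the image-analysis step:
    # species A grown alone vs stamped next to species B, in control
    # medium and with benzo[a]pyrene added.
    areas = {
        "control":        {("A", None): 12.0, ("A", "B"): 7.5},
        "benzo[a]pyrene": {("A", None): 11.5, ("A", "B"): 12.8},
    }

    def interaction_score(cond):
        """log2 fold-change of A's growth caused by neighbour B in a condition.

        Negative = B inhibits A, positive = B promotes A, ~0 = no interaction.
        """
        alone = areas[cond][("A", None)]
        paired = areas[cond][("A", "B")]
        return math.log2(paired / alone)

    for cond in areas:
        print(f"{cond:>15}: A<-B effect = {interaction_score(cond):+.2f} log2")
    # A sign flip between conditions is the kind of context-dependent
    # interaction change the workflow is designed to detect.
    ```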

  17. A mix-and-read drop-based in vitro two-hybrid method for screening high-affinity peptide binders

    PubMed Central

    Cui, Naiwen; Zhang, Huidan; Schneider, Nils; Tao, Ye; Asahara, Haruichi; Sun, Zhiyi; Cai, Yamei; Koehler, Stephan A.; de Greef, Tom F. A.; Abbaspourrad, Alireza; Weitz, David A.; Chong, Shaorong

    2016-01-01

    Drop-based microfluidics have recently become a novel tool by providing a stable linkage between phenotype and genotype for high throughput screening. However, use of drop-based microfluidics for screening high-affinity peptide binders has not been demonstrated due to the lack of a sensitive functional assay that can detect single DNA molecules in drops. To address this sensitivity issue, we introduced in vitro two-hybrid system (IVT2H) into microfluidic drops and developed a streamlined mix-and-read drop-IVT2H method to screen a random DNA library. Drop-IVT2H was based on the correlation between the binding affinity of two interacting protein domains and transcriptional activation of a fluorescent reporter. A DNA library encoding potential peptide binders was encapsulated with IVT2H such that single DNA molecules were distributed in individual drops. We validated drop-IVT2H by screening a three-random-residue library derived from a high-affinity MDM2 inhibitor PMI. The current drop-IVT2H platform is ideally suited for affinity screening of small-to-medium-sized libraries (10³–10⁶). It can obtain hits within a single day while consuming minimal amounts of reagents. Drop-IVT2H simplifies and accelerates the drop-based microfluidics workflow for screening random DNA libraries, and represents a novel alternative method for protein engineering and in vitro directed protein evolution. PMID:26940078

  18. High Throughput Determination of Critical Human Dosing Parameters (SOT)

    EPA Science Inventory

    High throughput toxicokinetics (HTTK) is a rapid approach that uses in vitro data to estimate TK for hundreds of environmental chemicals. Reverse dosimetry (i.e., reverse toxicokinetics or RTK) based on HTTK data converts high throughput in vitro toxicity screening (HTS) data int...

  19. High Throughput Determinations of Critical Dosing Parameters (IVIVE workshop)

    EPA Science Inventory

    High throughput toxicokinetics (HTTK) is an approach that allows for rapid estimations of TK for hundreds of environmental chemicals. HTTK-based reverse dosimetry (i.e, reverse toxicokinetics or RTK) is used in order to convert high throughput in vitro toxicity screening (HTS) da...

  20. Optimization of high-throughput nanomaterial developmental toxicity testing in zebrafish embryos

    EPA Science Inventory

    Nanomaterial (NM) developmental toxicities are largely unknown. With an extensive variety of NMs available, high-throughput screening methods may be of value for initial characterization of potential hazard. We optimized a zebrafish embryo test as an in vivo high-throughput assay...

  1. Realizing the promise of reverse phase protein arrays for clinical, translational, and basic research: a workshop report: the RPPA (Reverse Phase Protein Array) society.

    PubMed

    Akbani, Rehan; Becker, Karl-Friedrich; Carragher, Neil; Goldstein, Ted; de Koning, Leanne; Korf, Ulrike; Liotta, Lance; Mills, Gordon B; Nishizuka, Satoshi S; Pawlak, Michael; Petricoin, Emanuel F; Pollard, Harvey B; Serrels, Bryan; Zhu, Jingchun

    2014-07-01

    Reverse phase protein array (RPPA) technology introduced a miniaturized "antigen-down" or "dot-blot" immunoassay suitable for quantifying the relative, semi-quantitative or quantitative (if a well-accepted reference standard exists) abundance of total protein levels and post-translational modifications across a variety of biological samples including cultured cells, tissues, and body fluids. The recent evolution of RPPA combined with more sophisticated sample handling, optical detection, quality control, and better quality affinity reagents provides exquisite sensitivity and high sample throughput at a reasonable cost per sample. This facilitates large-scale multiplex analysis of multiple post-translational markers across samples from in vitro, preclinical, or clinical samples. The technical power of RPPA is stimulating the application and widespread adoption of RPPA methods within academic, clinical, and industrial research laboratories. Advances in RPPA technology now offer scientists the opportunity to quantify protein analytes with high precision, sensitivity, throughput, and robustness. As a result, adopters of RPPA technology have recognized critical success factors for useful and maximum exploitation of RPPA technologies, including the following: preservation and optimization of pre-analytical sample quality, application of validated high-affinity and specific antibody (or other protein affinity) detection reagents, dedicated informatics solutions to ensure accurate and robust quantification of protein analytes, and quality-assured procedures and data analysis workflows compatible with application within regulated clinical environments. In 2011, 2012, and 2013, the first three Global RPPA workshops were held in the United States, Europe, and Japan, respectively. These workshops provided an opportunity for RPPA laboratories, vendors, and users to share and discuss results, the latest technology platforms, best practices, and future challenges and opportunities. The outcomes of the workshops included a number of key opportunities to advance the RPPA field and provide added benefit to existing and future participants in the RPPA research community. The purpose of this report is to share and disseminate, as a community, current knowledge and future directions of the RPPA technology. © 2014 by The American Society for Biochemistry and Molecular Biology, Inc.

  2. CANEapp: a user-friendly application for automated next generation transcriptomic data analysis.

    PubMed

    Velmeshev, Dmitry; Lally, Patrick; Magistri, Marco; Faghihi, Mohammad Ali

    2016-01-13

    Next generation sequencing (NGS) technologies are indispensable for molecular biology research, but data analysis represents the bottleneck in their application. Users need to be familiar with computer terminal commands, the Linux environment, and various software tools and scripts. Analysis workflows have to be optimized and experimentally validated to extract biologically meaningful data. Moreover, as larger datasets are being generated, their analysis requires use of high-performance servers. To address these needs, we developed CANEapp (application for Comprehensive automated Analysis of Next-generation sequencing Experiments), a unique suite that combines a Graphical User Interface (GUI) and an automated server-side analysis pipeline that is platform-independent, making it suitable for any server architecture. The GUI runs on a PC or Mac and seamlessly connects to the server to provide full GUI control of RNA-sequencing (RNA-seq) project analysis. The server-side analysis pipeline contains a framework that is implemented on a Linux server through completely automated installation of software components and reference files. Analysis with CANEapp is also fully automated and performs differential gene expression analysis and novel noncoding RNA discovery through alternative workflows (Cuffdiff and R packages edgeR and DESeq2). We compared CANEapp to other similar tools, and it significantly improves on previous developments. We experimentally validated CANEapp's performance by applying it to data derived from different experimental paradigms and confirming the results with quantitative real-time PCR (qRT-PCR). CANEapp adapts to any server architecture by effectively using available resources and thus handles large amounts of data efficiently. CANEapp performance has been experimentally validated on various biological datasets. CANEapp is available free of charge at http://psychiatry.med.miami.edu/research/laboratory-of-translational-rna-genomics/CANE-app . We believe that CANEapp will serve both biologists with no computational experience and bioinformaticians as a simple, timesaving but accurate and powerful tool to analyze large RNA-seq datasets and will provide foundations for future development of integrated and automated high-throughput genomics data analysis tools. Due to its inherently standardized pipeline and combination of automated analysis and platform-independence, CANEapp is ideal for large-scale collaborative RNA-seq projects between different institutions and research groups.

  3. Agile parallel bioinformatics workflow management using Pwrake.

    PubMed

    Mishima, Hiroyuki; Sasaki, Kensaku; Tanaka, Masahiro; Tatebe, Osamu; Yoshiura, Koh-Ichiro

    2011-09-08

    In bioinformatics projects, scientific workflow systems are widely used to manage computational procedures. Full-featured workflow systems have been proposed to fulfil the demand for workflow management. However, such systems tend to be over-weighted for actual bioinformatics practices. We realize that quick deployment of cutting-edge software implementing advanced algorithms and data formats, and continuous adaptation to changes in computational resources and the environment are often prioritized in scientific workflow management. These features have a greater affinity with the agile software development method through iterative development phases after trial and error. Here, we show the application of a scientific workflow system Pwrake to bioinformatics workflows. Pwrake is a parallel workflow extension of Ruby's standard build tool Rake, the flexibility of which has been demonstrated in the astronomy domain. Therefore, we hypothesize that Pwrake also has advantages in actual bioinformatics workflows. We implemented the Pwrake workflows to process next generation sequencing data using the Genomic Analysis Toolkit (GATK) and Dindel. GATK and Dindel workflows are typical examples of sequential and parallel workflows, respectively. We found that in practice, actual scientific workflow development iterates over two phases, the workflow definition phase and the parameter adjustment phase. We introduced separate workflow definitions to help focus on each of the two developmental phases, as well as helper methods to simplify the descriptions. This approach increased iterative development efficiency. Moreover, we implemented combined workflows to demonstrate modularity of the GATK and Dindel workflows. Pwrake enables agile management of scientific workflows in the bioinformatics domain. The internal domain specific language design built on Ruby gives the flexibility of rakefiles for writing scientific workflows. Furthermore, readability and maintainability of rakefiles may facilitate sharing workflows among the scientific community. Workflows for GATK and Dindel are available at http://github.com/misshie/Workflows.

  4. Agile parallel bioinformatics workflow management using Pwrake

    PubMed Central

    2011-01-01

    Background In bioinformatics projects, scientific workflow systems are widely used to manage computational procedures. Full-featured workflow systems have been proposed to fulfil the demand for workflow management. However, such systems tend to be over-weighted for actual bioinformatics practices. We realize that quick deployment of cutting-edge software implementing advanced algorithms and data formats, and continuous adaptation to changes in computational resources and the environment are often prioritized in scientific workflow management. These features have a greater affinity with the agile software development method through iterative development phases after trial and error. Here, we show the application of a scientific workflow system Pwrake to bioinformatics workflows. Pwrake is a parallel workflow extension of Ruby's standard build tool Rake, the flexibility of which has been demonstrated in the astronomy domain. Therefore, we hypothesize that Pwrake also has advantages in actual bioinformatics workflows. Findings We implemented the Pwrake workflows to process next generation sequencing data using the Genomic Analysis Toolkit (GATK) and Dindel. GATK and Dindel workflows are typical examples of sequential and parallel workflows, respectively. We found that in practice, actual scientific workflow development iterates over two phases, the workflow definition phase and the parameter adjustment phase. We introduced separate workflow definitions to help focus on each of the two developmental phases, as well as helper methods to simplify the descriptions. This approach increased iterative development efficiency. Moreover, we implemented combined workflows to demonstrate modularity of the GATK and Dindel workflows. Conclusions Pwrake enables agile management of scientific workflows in the bioinformatics domain. The internal domain specific language design built on Ruby gives the flexibility of rakefiles for writing scientific workflows. Furthermore, readability and maintainability of rakefiles may facilitate sharing workflows among the scientific community. Workflows for GATK and Dindel are available at http://github.com/misshie/Workflows. PMID:21899774

  5. Spatial tuning of acoustofluidic pressure nodes by altering net sonic velocity enables high-throughput, efficient cell sorting

    DOE PAGES

    Jung, Seung-Yong; Notton, Timothy; Fong, Erika; ...

    2015-01-07

    Particle sorting using acoustofluidics has enormous potential but widespread adoption has been limited by complex device designs and low throughput. Here, we report high-throughput separation of particles and T lymphocytes (600 μL min⁻¹) by altering the net sonic velocity to reposition acoustic pressure nodes in a simple two-channel device. Finally, the approach is generalizable to other microfluidic platforms for rapid, high-throughput analysis.

  6. Quantitative description on structure–property relationships of Li-ion battery materials for high-throughput computations

    PubMed Central

    Wang, Youwei; Zhang, Wenqing; Chen, Lidong; Shi, Siqi; Liu, Jianjun

    2017-01-01

    Li-ion batteries are a key technology for addressing the global challenge of clean renewable energy and environmental pollution. Their contemporary applications, for portable electronic devices, electric vehicles, and large-scale power grids, stimulate the development of high-performance battery materials with high energy density, high power, good safety, and long lifetime. High-throughput calculations provide a practical strategy to discover new battery materials and optimize currently known material performances. Most cathode materials screened by the previous high-throughput calculations cannot meet the requirement of practical applications because only capacity, voltage and volume change of the bulk were considered. It is important to include more structure–property relationships, such as point defects, surface and interface, doping and metal-mixture and nanosize effects, in high-throughput calculations. In this review, we established a quantitative description of structure–property relationships in Li-ion battery materials in terms of intrinsic bulk parameters, which can be applied in future high-throughput calculations to screen Li-ion battery materials. Based on these parameterized structure–property relationships, a possible high-throughput computational screening flow path is proposed to obtain high-performance battery materials. PMID:28458737
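
    As a simple illustration of a first-pass bulk screen of the kind the review argues should be extended with further structure–property relationships, the sketch below filters hypothetical candidate materials on voltage window, capacity, and volume change; all values and thresholds are invented.

    ```python
    # Hypothetical computed bulk descriptors for candidate cathode materials.
    candidates = [
        {"formula": "LiMO2-a",  "voltage_V": 3.9, "capacity_mAh_g": 180, "vol_change_pct": 3.1},
        {"formula": "LiMO2-b",  "voltage_V": 4.8, "capacity_mAh_g": 210, "vol_change_pct": 9.7},
        {"formula": "LiMPO4-c", "voltage_V": 3.4, "capacity_mAh_g": 155, "vol_change_pct": 6.5},
    ]

    def passes_bulk_screen(m, v_window=(3.0, 4.5), min_capacity=150, max_vol_change=8.0):
        """First-pass screen on bulk properties only (voltage, capacity, volume change).

        The review's point is that this is necessary but not sufficient: defects,
        surfaces/interfaces, doping and nanosize effects should be screened too.
        """
        return (v_window[0] <= m["voltage_V"] <= v_window[1]
                and m["capacity_mAh_g"] >= min_capacity
                and m["vol_change_pct"] <= max_vol_change)

    shortlist = [m["formula"] for m in candidates if passes_bulk_screen(m)]
    print("bulk-screen shortlist:", shortlist)
    ```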

  7. SHERPA: an image segmentation and outline feature extraction tool for diatoms and other objects

    PubMed Central

    2014-01-01

    Background Light microscopic analysis of diatom frustules is widely used both in basic and applied research, notably taxonomy, morphometrics, water quality monitoring and paleo-environmental studies. In these applications, usually large numbers of frustules need to be identified and/or measured. Although there is a need for automation in these applications, and image processing and analysis methods supporting these tasks have previously been developed, they did not become widespread in diatom analysis. While methodological reports for a wide variety of methods for image segmentation, diatom identification and feature extraction are available, no single implementation combining a subset of these into a readily applicable workflow accessible to diatomists exists. Results The newly developed tool SHERPA offers a versatile image processing workflow focused on the identification and measurement of object outlines, handling all steps from image segmentation over object identification to feature extraction, and providing interactive functions for reviewing and revising results. Special attention was given to ease of use, applicability to a broad range of data and problems, and supporting high throughput analyses with minimal manual intervention. Conclusions Tested with several diatom datasets from different sources and of various compositions, SHERPA proved its ability to successfully analyze large amounts of diatom micrographs depicting a broad range of species. SHERPA is unique in combining the following features: application of multiple segmentation methods and selection of the one giving the best result for each individual object; identification of shapes of interest based on outline matching against a template library; quality scoring and ranking of resulting outlines supporting quick quality checking; extraction of a wide range of outline shape descriptors widely used in diatom studies and elsewhere; minimizing the need for, but enabling manual quality control and corrections. Although primarily developed for analyzing images of diatom valves originating from automated microscopy, SHERPA can also be useful for other object detection, segmentation and outline-based identification problems. PMID:24964954
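
    A rough scikit-image-based illustration of the "apply multiple segmentation methods and keep the best result per object" idea follows; the quality score and outline extraction here are deliberately simplistic placeholders, not SHERPA's actual template-matching and scoring algorithms.

    ```python
    import numpy as np
    from skimage import filters, measure

    def best_outline(image):
        """Try several thresholding methods and keep the best-scoring object.

        'Best' is scored very simply (largest, most solid object); SHERPA uses
        template matching and richer quality scores, so treat this as a sketch.
        """
        best = None
        for threshold in (filters.threshold_otsu, filters.threshold_li,
                          filters.threshold_yen):
            mask = image > threshold(image)
            labels = measure.label(mask)
            regions = measure.regionprops(labels)
            if not regions:
                continue
            region = max(regions, key=lambda r: r.area)
            score = region.area * region.solidity          # crude quality score
            if best is None or score > best[0]:
                # outline of the winning object, for shape-descriptor extraction
                obj = (labels == region.label).astype(float)
                contour = measure.find_contours(obj, 0.5)[0]
                best = (score, region, contour)
        return best

    # Tiny synthetic "frustule" image for demonstration.
    img = np.zeros((64, 64))
    yy, xx = np.mgrid[:64, :64]
    img[((yy - 32) / 20) ** 2 + ((xx - 32) / 10) ** 2 < 1] = 1.0
    img += 0.05 * np.random.default_rng(0).normal(size=img.shape)

    score, region, contour = best_outline(img)
    print(f"area={region.area}, eccentricity={region.eccentricity:.2f}, "
          f"outline points={len(contour)}")
    ```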

  8. Systematically linking tranSMART, Galaxy and EGA for reusing human translational research data

    PubMed Central

    Zhang, Chao; Bijlard, Jochem; Staiger, Christine; Scollen, Serena; van Enckevort, David; Hoogstrate, Youri; Senf, Alexander; Hiltemann, Saskia; Repo, Susanna; Pipping, Wibo; Bierkens, Mariska; Payralbe, Stefan; Stringer, Bas; Heringa, Jaap; Stubbs, Andrew; Bonino Da Silva Santos, Luiz Olavo; Belien, Jeroen; Weistra, Ward; Azevedo, Rita; van Bochove, Kees; Meijer, Gerrit; Boiten, Jan-Willem; Rambla, Jordi; Fijneman, Remond; Spalding, J. Dylan; Abeln, Sanne

    2017-01-01

    The availability of high-throughput molecular profiling techniques has provided more accurate and informative data for regular clinical studies. Nevertheless, complex computational workflows are required to interpret these data. Over the past years, the data volume has been growing explosively, requiring robust human data management to organise and integrate the data efficiently. For this reason, we set up an ELIXIR implementation study, together with the Translational research IT (TraIT) programme, to design a data ecosystem that is able to link raw and interpreted data. In this project, the data from the TraIT Cell Line Use Case (TraIT-CLUC) are used as a test case for this system. Within this ecosystem, we use the European Genome-phenome Archive (EGA) to store raw molecular profiling data; tranSMART to collect interpreted molecular profiling data and clinical data for corresponding samples; and Galaxy to store, run and manage the computational workflows. We can integrate these data by linking their repositories systematically. To showcase our design, we have structured the TraIT-CLUC data, which contain a variety of molecular profiling data types, for storage in both tranSMART and EGA. The metadata provided allows referencing between tranSMART and EGA, fulfilling the cycle of data submission and discovery; we have also designed a data flow from EGA to Galaxy, enabling reanalysis of the raw data in Galaxy. In this way, users can select patient cohorts in tranSMART, trace them back to the raw data and perform (re)analysis in Galaxy. Our conclusion is that the majority of metadata does not necessarily need to be stored (redundantly) in both databases, but that instead FAIR persistent identifiers should be available for well-defined data ontology levels: study, data access committee, physical sample, data sample and raw data file. This approach will pave the way for the stable linkage and reuse of data. PMID:29123641
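
    A minimal sketch of the linkage idea, assuming hypothetical accession values: one persistent identifier per ontology level named in the paper (study, data access committee, physical sample, data sample, raw data file), which is what lets tranSMART, EGA and Galaxy reference one another without duplicating metadata. The accession formats mimic EGA's style but the values are invented.

    ```python
    # Hypothetical linkage record: one persistent identifier per ontology level.
    linkage = {
        "study":                 "EGAS00001000001",       # registered study
        "data_access_committee": "EGAC00001000001",       # DAC controlling access
        "physical_sample":       "TRAIT-CLUC-SAMPLE-042", # biobank sample ID
        "data_sample":           "EGAN00001000123",       # sample-level data record
        "raw_data_file":         "EGAF00001000456",       # raw file reanalysable in Galaxy
    }

    def transmart_to_galaxy(cohort_samples, linkage_table):
        """Trace a tranSMART cohort back to raw EGA files for (re)analysis in Galaxy."""
        return [linkage_table[s]["raw_data_file"]
                for s in cohort_samples if s in linkage_table]

    # Usage: a cohort selected in tranSMART resolves to the raw files to re-run.
    table = {"TRAIT-CLUC-SAMPLE-042": linkage}
    print(transmart_to_galaxy(["TRAIT-CLUC-SAMPLE-042"], table))
    ```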

  9. Systematically linking tranSMART, Galaxy and EGA for reusing human translational research data.

    PubMed

    Zhang, Chao; Bijlard, Jochem; Staiger, Christine; Scollen, Serena; van Enckevort, David; Hoogstrate, Youri; Senf, Alexander; Hiltemann, Saskia; Repo, Susanna; Pipping, Wibo; Bierkens, Mariska; Payralbe, Stefan; Stringer, Bas; Heringa, Jaap; Stubbs, Andrew; Bonino Da Silva Santos, Luiz Olavo; Belien, Jeroen; Weistra, Ward; Azevedo, Rita; van Bochove, Kees; Meijer, Gerrit; Boiten, Jan-Willem; Rambla, Jordi; Fijneman, Remond; Spalding, J Dylan; Abeln, Sanne

    2017-01-01

    The availability of high-throughput molecular profiling techniques has provided more accurate and informative data for regular clinical studies. Nevertheless, complex computational workflows are required to interpret these data. Over the past years, the data volume has been growing explosively, requiring robust human data management to organise and integrate the data efficiently. For this reason, we set up an ELIXIR implementation study, together with the Translational research IT (TraIT) programme, to design a data ecosystem that is able to link raw and interpreted data. In this project, the data from the TraIT Cell Line Use Case (TraIT-CLUC) are used as a test case for this system. Within this ecosystem, we use the European Genome-phenome Archive (EGA) to store raw molecular profiling data; tranSMART to collect interpreted molecular profiling data and clinical data for corresponding samples; and Galaxy to store, run and manage the computational workflows. We can integrate these data by linking their repositories systematically. To showcase our design, we have structured the TraIT-CLUC data, which contain a variety of molecular profiling data types, for storage in both tranSMART and EGA. The metadata provided allows referencing between tranSMART and EGA, fulfilling the cycle of data submission and discovery; we have also designed a data flow from EGA to Galaxy, enabling reanalysis of the raw data in Galaxy. In this way, users can select patient cohorts in tranSMART, trace them back to the raw data and perform (re)analysis in Galaxy. Our conclusion is that the majority of metadata does not necessarily need to be stored (redundantly) in both databases, but that instead FAIR persistent identifiers should be available for well-defined data ontology levels: study, data access committee, physical sample, data sample and raw data file. This approach will pave the way for the stable linkage and reuse of data.

  10. SHERPA: an image segmentation and outline feature extraction tool for diatoms and other objects.

    PubMed

    Kloster, Michael; Kauer, Gerhard; Beszteri, Bánk

    2014-06-25

    Light microscopic analysis of diatom frustules is widely used both in basic and applied research, notably taxonomy, morphometrics, water quality monitoring and paleo-environmental studies. In these applications, usually large numbers of frustules need to be identified and/or measured. Although there is a need for automation in these applications, and image processing and analysis methods supporting these tasks have previously been developed, they did not become widespread in diatom analysis. While methodological reports for a wide variety of methods for image segmentation, diatom identification and feature extraction are available, no single implementation combining a subset of these into a readily applicable workflow accessible to diatomists exists. The newly developed tool SHERPA offers a versatile image processing workflow focused on the identification and measurement of object outlines, handling all steps from image segmentation over object identification to feature extraction, and providing interactive functions for reviewing and revising results. Special attention was given to ease of use, applicability to a broad range of data and problems, and supporting high throughput analyses with minimal manual intervention. Tested with several diatom datasets from different sources and of various compositions, SHERPA proved its ability to successfully analyze large amounts of diatom micrographs depicting a broad range of species. SHERPA is unique in combining the following features: application of multiple segmentation methods and selection of the one giving the best result for each individual object; identification of shapes of interest based on outline matching against a template library; quality scoring and ranking of resulting outlines supporting quick quality checking; extraction of a wide range of outline shape descriptors widely used in diatom studies and elsewhere; minimizing the need for, but enabling manual quality control and corrections. Although primarily developed for analyzing images of diatom valves originating from automated microscopy, SHERPA can also be useful for other object detection, segmentation and outline-based identification problems.

  11. High-throughput screening (HTS) and modeling of the retinoid ...

    EPA Pesticide Factsheets

    Presentation at the Retinoids Review 2nd workshop in Brussels, Belgium, on the application of high-throughput screening and modeling to the retinoid system.

  12. Evaluating High Throughput Toxicokinetics and Toxicodynamics for IVIVE (WC10)

    EPA Science Inventory

    High-throughput screening (HTS) generates in vitro data for characterizing potential chemical hazard. TK models are needed to allow in vitro to in vivo extrapolation (IVIVE) to real world situations. The U.S. EPA has created a public tool (R package “httk” for high throughput tox...
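
    The httk R package implements the TK modeling itself; purely to illustrate the reverse-dosimetry arithmetic it supports, here is a plain Python restatement (an in vitro AC50 divided by the modeled steady-state plasma concentration per unit dose gives an oral-equivalent dose), with made-up numbers.

    ```python
    def oral_equivalent_dose(ac50_uM, css_uM_per_mgkgday):
        """Reverse dosimetry: convert an in vitro AC50 to an external dose.

        css_uM_per_mgkgday is the modeled steady-state plasma concentration
        produced by a constant 1 mg/kg/day exposure (the quantity TK models
        such as those in the httk R package estimate from in vitro data).
        """
        return ac50_uM / css_uM_per_mgkgday  # mg/kg/day

    # Hypothetical numbers for one chemical and one in vitro assay:
    ac50 = 3.0   # µM, concentration producing half-maximal assay response
    css = 1.5    # µM steady-state plasma concentration at 1 mg/kg/day (modeled)
    dose = oral_equivalent_dose(ac50, css)
    print(f"oral equivalent dose ≈ {dose:.1f} mg/kg/day")
    # Comparing this dose with exposure estimates gives a margin-of-exposure
    # style prioritization for the chemical.
    ```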

  13. High-throughput RAD-SNP genotyping for characterization of sugar beet genotypes

    USDA-ARS?s Scientific Manuscript database

    High-throughput SNP genotyping provides a rapid way of developing resourceful set of markers for delineating the genetic architecture and for effective species discrimination. In the presented research, we demonstrate a set of 192 SNPs for effective genotyping in sugar beet using high-throughput mar...

  14. Alginate Immobilization of Metabolic Enzymes (AIME) for High-Throughput Screening Assays (SOT)

    EPA Science Inventory

    Alginate Immobilization of Metabolic Enzymes (AIME) for High-Throughput Screening Assays. DE DeGroot, RS Thomas, and SO Simmons. National Center for Computational Toxicology, US EPA, Research Triangle Park, NC, USA. The EPA’s ToxCast program utilizes a wide variety of high-throughput s...

  15. A quantitative literature-curated gold standard for kinase-substrate pairs

    PubMed Central

    2011-01-01

    We describe the Yeast Kinase Interaction Database (KID, http://www.moseslab.csb.utoronto.ca/KID/), which contains high- and low-throughput data relevant to phosphorylation events. KID includes 6,225 low-throughput and 21,990 high-throughput interactions, from greater than 35,000 experiments. By quantitatively integrating these data, we identified 517 high-confidence kinase-substrate pairs that we consider a gold standard. We show that this gold standard can be used to assess published high-throughput datasets, suggesting that it will enable similar rigorous assessments in the future. PMID:21492431

  16. Outlook for Development of High-throughput Cryopreservation for Small-bodied Biomedical Model Fishes

    PubMed Central

    Tiersch, Terrence R.; Yang, Huiping; Hu, E.

    2011-01-01

    With the development of genomic research technologies, comparative genome studies among vertebrate species are becoming commonplace for human biomedical research. Fish offer unlimited versatility for biomedical research. Extensive studies are done using these fish models, yielding tens of thousands of specific strains and lines, and the number is increasing every day. Thus, high-throughput sperm cryopreservation is urgently needed to preserve these genetic resources. Although high-throughput processing has been widely applied for sperm cryopreservation in livestock for decades, application in biomedical model fishes is still in the concept-development stage because of the limited sample volumes and the biological characteristics of fish sperm. High-throughput processing in livestock was developed based on advances made in the laboratory and was scaled up for increased processing speed, capability for mass production, and uniformity and quality assurance. Cryopreserved germplasm combined with high-throughput processing constitutes an independent industry encompassing animal breeding, preservation of genetic diversity, and medical research. Currently, there is no specifically engineered system available for high-throughput processing of cryopreserved germplasm for aquatic species. This review discusses the concepts and needs for high-throughput technology for model fishes, proposes approaches for technical development, and overviews future directions of this approach. PMID:21440666

  17. Web-video-mining-supported workflow modeling for laparoscopic surgeries.

    PubMed

    Liu, Rui; Zhang, Xiaoli; Zhang, Hao

    2016-11-01

    As quality assurance is of strong concern in advanced surgeries, intelligent surgical systems are expected to have knowledge such as the knowledge of the surgical workflow model (SWM) to support their intuitive cooperation with surgeons. For generating a robust and reliable SWM, a large amount of training data is required. However, training data collected by physically recording surgery operations is often limited and data collection is time-consuming and labor-intensive, severely influencing knowledge scalability of the surgical systems. The objective of this research is to solve the knowledge scalability problem in surgical workflow modeling with a low cost and labor efficient way. A novel web-video-mining-supported surgical workflow modeling (webSWM) method is developed. A novel video quality analysis method based on topic analysis and sentiment analysis techniques is developed to select high-quality videos from abundant and noisy web videos. A statistical learning method is then used to build the workflow model based on the selected videos. To test the effectiveness of the webSWM method, 250 web videos were mined to generate a surgical workflow for the robotic cholecystectomy surgery. The generated workflow was evaluated by 4 web-retrieved videos and 4 operation-room-recorded videos, respectively. The evaluation results (video selection consistency n-index ≥0.60; surgical workflow matching degree ≥0.84) proved the effectiveness of the webSWM method in generating robust and reliable SWM knowledge by mining web videos. With the webSWM method, abundant web videos were selected and a reliable SWM was modeled in a short time with low labor cost. Satisfied performances in mining web videos and learning surgery-related knowledge show that the webSWM method is promising in scaling knowledge for intelligent surgical systems. Copyright © 2016 Elsevier B.V. All rights reserved.

  18. Widening the adoption of workflows to include human and human-machine scientific processes

    NASA Astrophysics Data System (ADS)

    Salayandia, L.; Pinheiro da Silva, P.; Gates, A. Q.

    2010-12-01

    Scientific workflows capture knowledge in the form of technical recipes to access and manipulate data that help scientists manage and reuse established expertise to conduct their work. Libraries of scientific workflows are being created in particular fields, e.g., Bioinformatics, where, combined with cyber-infrastructure environments that provide on-demand access to data and tools, they result in powerful workbenches for scientists of those communities. The focus in these particular fields, however, has been more on automating rather than documenting scientific processes. As a result, technical barriers have impeded a wider adoption of scientific workflows by scientific communities that do not rely as heavily on cyber-infrastructure and computing environments. Semantic Abstract Workflows (SAWs) are introduced to widen the applicability of workflows as a tool to document scientific recipes or processes. SAWs intend to capture a scientist’s perspective about the process of how she or he would collect, filter, curate, and manipulate data to create the artifacts that are relevant to her/his work. In contrast, scientific workflows describe the process from the point of view of how technical methods and tools are used to conduct the work. By focusing on a higher level of abstraction that is closer to a scientist’s understanding, SAWs effectively capture the controlled vocabularies that reflect a particular scientific community, as well as the types of datasets and methods used in a particular domain. From there on, SAWs provide the flexibility to adapt to different environments to carry out the recipes or processes. These environments range from manual fieldwork to highly technical cyber-infrastructure environments, such as those already supported by scientific workflows. Two cases, one from Environmental Science and another from Geophysics, are presented as illustrative examples.

  19. A Proposed Set of Metrics to Reduce Patient Safety Risk From Within the Anatomic Pathology Laboratory

    PubMed Central

    Banks, Peter; Brown, Richard; Laslowski, Alex; Daniels, Yvonne; Branton, Phil; Carpenter, John; Zarbo, Richard; Forsyth, Ramses; Liu, Yan-hui; Kohl, Shane; Diebold, Joachim; Masuda, Shinobu; Plummer, Tim

    2017-01-01

    Background: Anatomic pathology laboratory workflow consists of 3 major specimen handling processes. Among the workflow are preanalytic, analytic, and postanalytic phases that contain multistep subprocesses with great impact on patient care. A worldwide representation of experts came together to create a system of metrics, as a basis for laboratories worldwide, to help them evaluate and improve specimen handling to reduce patient safety risk. Method: Members of the Initiative for Anatomic Pathology Laboratory Patient Safety (IAPLPS) pooled their extensive expertise to generate a list of metrics highlighting processes with high and low risk for adverse patient outcomes. Results: Our group developed a universal, comprehensive list of 47 metrics for patient specimen handling in the anatomic pathology laboratory. Steps within the specimen workflow sequence are categorized as high or low risk. In general, steps associated with the potential for specimen misidentification correspond to the high-risk grouping and merit greater focus within quality management systems. Primarily workflow measures related to operational efficiency can be considered low risk. Conclusion: Our group intends to advance the widespread use of these metrics in anatomic pathology laboratories to reduce patient safety risk and improve patient care with development of best practices and interlaboratory error reporting programs. PMID:28340232

  20. A Proposed Set of Metrics to Reduce Patient Safety Risk From Within the Anatomic Pathology Laboratory.

    PubMed

    Banks, Peter; Brown, Richard; Laslowski, Alex; Daniels, Yvonne; Branton, Phil; Carpenter, John; Zarbo, Richard; Forsyth, Ramses; Liu, Yan-Hui; Kohl, Shane; Diebold, Joachim; Masuda, Shinobu; Plummer, Tim; Dennis, Eslie

    2017-05-01

    Anatomic pathology laboratory workflow consists of 3 major specimen handling processes. Among the workflow are preanalytic, analytic, and postanalytic phases that contain multistep subprocesses with great impact on patient care. A worldwide representation of experts came together to create a system of metrics, as a basis for laboratories worldwide, to help them evaluate and improve specimen handling to reduce patient safety risk. Members of the Initiative for Anatomic Pathology Laboratory Patient Safety (IAPLPS) pooled their extensive expertise to generate a list of metrics highlighting processes with high and low risk for adverse patient outcomes. Our group developed a universal, comprehensive list of 47 metrics for patient specimen handling in the anatomic pathology laboratory. Steps within the specimen workflow sequence are categorized as high or low risk. In general, steps associated with the potential for specimen misidentification correspond to the high-risk grouping and merit greater focus within quality management systems. Primarily workflow measures related to operational efficiency can be considered low risk. Our group intends to advance the widespread use of these metrics in anatomic pathology laboratories to reduce patient safety risk and improve patient care with development of best practices and interlaboratory error reporting programs. © American Society for Clinical Pathology 2017.

  1. An access control model with high security for distributed workflow and real-time application

    NASA Astrophysics Data System (ADS)

    Han, Ruo-Fei; Wang, Hou-Xiang

    2007-11-01

    The traditional mandatory access control policy (MAC) is regarded as a policy with strict regulation and poor flexibility. The security policy of MAC is so restrictive that few information systems would adopt it at the cost of convenience, except in particular cases with high security requirements such as military or government applications. However, with the increasing requirement for flexibility, even some access control systems in military applications have switched to role-based access control (RBAC), which is well known for its flexibility. Though RBAC can meet the demands for flexibility, it is weak in dynamic authorization and consequently cannot fit well in workflow management systems. Task-role-based access control (T-RBAC) is then introduced to solve the problem. It combines the advantages of RBAC and task-based access control (TBAC), which uses tasks to manage permissions dynamically. To satisfy the requirements of a system that is distributed, organized around well-defined workflow processes, and critical with respect to time accuracy, this paper analyzes the spirit of MAC and introduces it into an improved T&RBAC model based on T-RBAC. Finally, a conceptual task-role-based access control model with high security for distributed workflow and real-time applications (A_T&RBAC) is built, and its performance is briefly analyzed.
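
    An illustrative access check in this spirit is sketched below: it combines the MAC idea (label dominance, i.e. a subject's clearance must dominate the object's classification) with task-role-based authorization (a permission is only valid while the corresponding workflow task is active). The labels, roles, and rules are simplified assumptions, not the paper's formal A_T&RBAC model.

    ```python
    LEVELS = {"unclassified": 0, "confidential": 1, "secret": 2, "top_secret": 3}

    # role -> {task: set of operations permitted while that task is active}
    ROLE_TASK_PERMS = {
        "analyst": {"review_report": {"read"}},
        "officer": {"approve_order": {"read", "write"}},
    }

    ACTIVE_TASKS = {"review_report"}   # set by the workflow engine at runtime

    def authorized(subject, operation, obj):
        """Grant access only if MAC dominance AND an active task-role permission hold."""
        # MAC part: no read-up (subject clearance must dominate the object label).
        if LEVELS[subject["clearance"]] < LEVELS[obj["label"]]:
            return False
        # T-RBAC part: the role must hold the permission through a currently
        # active workflow task (dynamic authorization).
        task_perms = ROLE_TASK_PERMS.get(subject["role"], {})
        return any(task in ACTIVE_TASKS and operation in ops
                   for task, ops in task_perms.items())

    alice = {"role": "analyst", "clearance": "secret"}
    report = {"label": "confidential"}
    print(authorized(alice, "read", report))    # True while review_report is active
    print(authorized(alice, "write", report))   # False: role/task grants read only
    ```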

  2. High-throughput measurements of biochemical responses using the plate::vision multimode 96 minilens array reader.

    PubMed

    Huang, Kuo-Sen; Mark, David; Gandenberger, Frank Ulrich

    2006-01-01

    The plate::vision is a high-throughput multimode reader capable of reading absorbance, fluorescence, fluorescence polarization, time-resolved fluorescence, and luminescence. Its performance has been shown to be quite comparable with other readers. When the reader is integrated into the plate::explorer, an ultrahigh-throughput screening system with event-driven software and parallel plate-handling devices, it becomes possible to run complicated assays with kinetic readouts in high-density microtiter plate formats for high-throughput screening. For the past 5 years, we have used the plate::vision and the plate::explorer to run screens and have generated more than 30 million data points. Their throughput, performance, and robustness have speeded up our drug discovery process greatly.

  3. j5 v2.8.4

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hillson, Nathan

    j5 automates and optimizes the design of the molecular biological process of cloning/constructing DNA. j5 enables users to benefit from (combinatorial) multi-part scar-less SLIC, Gibson, CPEC, Golden Gate assembly, or variants thereof, for which automation software does not currently exist, without the intense labor currently associated with the process. j5 inputs a list of the DNA sequences to be assembled, along with a Genbank, FASTA, jbei-seq, or SBOL v1.1 format sequence file for each DNA source. Given the list of DNA sequences to be assembled, j5 first determines the cost-minimizing assembly strategy for each part (direct synthesis, PCR/SOE, or oligo-embedding), designs DNA oligos with Primer3, adds flanking homology sequences (SLIC, Gibson, and CPEC; optimized with Primer3 for CPEC) or optimized overhang sequences (Golden Gate) to the oligos and direct synthesis pieces, and utilizes BLAST to check against oligo mis-priming and assembly piece incompatibility events. After identifying DNA oligos that are already contained within a local collection for reuse, the program estimates the total cost of direct synthesis and new oligos to be ordered. In the instance that j5 identifies putative assembly piece incompatibilities (multiple pieces with high flanking sequence homology), the program suggests hierarchical subassemblies where possible. The program outputs a comma-separated value (CSV) file, viewable via Excel or other spreadsheet software, that contains assembly design information (such as the PCR/SOE reactions to perform, their anticipated sizes and sequences, etc.) as well as a properly annotated genbank file containing the sequence resulting from the assembly, and appends the local oligo library with the oligos to be ordered. j5 condenses multiple independent assembly projects into 96-well format for high-throughput liquid-handling robotics platforms, and generates configuration files for the PR-PR biology-friendly robot programming language. j5 thus provides a new way to design DNA assembly procedures much more productively and efficiently, not only in terms of time, but also in terms of cost. To a large extent, however, j5 does not allow people to do something that could not be done before by hand given enough time and effort. An exception to this is that, since the very act of using j5 to design the DNA assembly process standardizes the experimental details and workflow, j5 enables a single person to concurrently perform the independent DNA construction tasks of an entire group of researchers. Currently, this is not readily possible, since separate researchers employ disparate design strategies and workflows, and furthermore, their designs and workflows are very infrequently fully captured in an electronic format which is conducive to automation.
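
    A toy version of the per-part decision j5 automates is sketched below: choosing among direct synthesis, PCR amplification, and oligo embedding by estimated cost. All prices and length rules are invented for illustration, and the real program additionally handles flanking homology, overhang design, oligo reuse, and mis-priming checks.

    ```python
    def cheapest_strategy(part_len_bp, have_template=True,
                          synthesis_per_bp=0.09, oligo_per_nt=0.05,
                          pcr_fixed=6.0, pcr_oligo_nt=2 * 30):
        """Pick the cost-minimizing way to obtain one assembly part (toy model).

        Very short parts can be embedded directly in the assembly oligos,
        parts with an available template can be PCR amplified, and anything
        can be synthesized directly; all costs here are placeholders.
        """
        options = {"direct_synthesis": part_len_bp * synthesis_per_bp}
        if have_template:
            options["pcr"] = pcr_fixed + pcr_oligo_nt * oligo_per_nt
        if part_len_bp <= 60:
            options["oligo_embedding"] = part_len_bp * oligo_per_nt
        strategy = min(options, key=options.get)
        return strategy, round(options[strategy], 2)

    for name, length, template in [("promoter", 40, False),
                                   ("gene", 1200, True),
                                   ("linker", 900, False)]:
        print(name, cheapest_strategy(length, template))
    ```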

  4. Combining transrectal ultrasound and CT for image-guided adaptive brachytherapy of cervical cancer: Proof of concept.

    PubMed

    Nesvacil, Nicole; Schmid, Maximilian P; Pötter, Richard; Kronreif, Gernot; Kirisits, Christian

    To investigate the feasibility of a treatment planning workflow for three-dimensional image-guided cervix cancer brachytherapy, combining volumetric transrectal ultrasound (TRUS) for target definition with CT for dose optimization to organs at risk (OARs), for settings with no access to MRI. A workflow for TRUS/CT-based volumetric treatment planning was developed, based on a customized system including ultrasound probe, stepper unit, and software for image volume acquisition. A full TRUS/CT-based workflow was simulated in a clinical case and compared with MR- or CT-only delineation. High-risk clinical target volume was delineated on TRUS, and OARs were delineated on CT. Manually defined tandem/ring applicator positions on TRUS and CT were used as a reference for rigid registration of the image volumes. Treatment plan optimization for TRUS target and CT organ volumes was performed and compared to MRI and CT target contours. TRUS/CT-based contouring, applicator reconstruction, image fusion, and treatment planning were feasible, and the full workflow could be successfully demonstrated. The TRUS/CT plan fulfilled all clinical planning aims. Dose-volume histogram evaluation of the TRUS/CT-optimized plan (high-risk clinical target volume D90, OAR D2cm³) on different image modalities showed good agreement between dose values reported for TRUS/CT and MRI-only reference contours and large deviations for CT-only target parameters. A TRUS/CT-based workflow for full three-dimensional image-guided cervix brachytherapy treatment planning seems feasible and may be clinically comparable to MRI-based treatment planning. Further development to solve challenges with applicator definition in the TRUS volume is required before systematic applicability of this workflow. Copyright © 2016 American Brachytherapy Society. Published by Elsevier Inc. All rights reserved.
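
    For readers unfamiliar with the dose-volume parameters mentioned, the sketch below computes D90 (minimum dose to the hottest 90% of the target volume) and D2cm³ (minimum dose to the most irradiated 2 cm³ of an organ at risk) from per-voxel doses; the voxel doses and volumes are made up and this is not the planning system's code.

    ```python
    import numpy as np

    def d_percent(doses_gy, percent):
        """Dx%: minimum dose received by the hottest x% of the structure volume."""
        hottest = np.sort(doses_gy)[::-1]
        n = int(np.ceil(percent / 100.0 * hottest.size))
        return hottest[n - 1]

    def d_volume(doses_gy, voxel_cc, volume_cc):
        """DVcm³: minimum dose received by the most irradiated V cm³ of the organ."""
        hottest = np.sort(doses_gy)[::-1]
        n = int(np.ceil(volume_cc / voxel_cc))
        return hottest[min(n, hottest.size) - 1]

    rng = np.random.default_rng(1)
    ctv_doses = rng.normal(9.0, 1.5, size=4000).clip(min=0)      # hypothetical Gy per voxel
    bladder_doses = rng.normal(4.0, 1.0, size=12000).clip(min=0)
    voxel_cc = 0.01                                              # hypothetical voxel volume

    print(f"HR-CTV D90    = {d_percent(ctv_doses, 90):.2f} Gy")
    print(f"Bladder D2cm3 = {d_volume(bladder_doses, voxel_cc, 2.0):.2f} Gy")
    ```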

  5. It's All About the Data: Workflow Systems and Weather

    NASA Astrophysics Data System (ADS)

    Plale, B.

    2009-05-01

    Digital data is fueling new advances in the computational sciences, particularly geospatial research, as environmental sensing grows more practical through reduced technology costs, broader network coverage, and better instruments. e-Science research (i.e., cyberinfrastructure research) has responded to data-intensive computing with tools, systems, and frameworks that support computationally oriented activities such as modeling, analysis, and data mining. Workflow systems support execution of sequences of tasks on behalf of a scientist. These systems, such as Taverna, Apache ODE, and Kepler, when built as part of a larger cyberinfrastructure framework, give the scientist tools to construct task graphs of execution sequences, often through a visual interface for connecting task boxes together with arcs representing control flow or data flow. Unlike business process workflows, scientific workflows expose a high degree of detail and control during configuration and execution. Data-driven science imposes unique needs on workflow frameworks. Our research is focused on two issues. The first is support for workflow-driven analysis over all kinds of data sets, including real-time streaming data and locally owned and hosted data. The second is the essential role metadata/provenance collection plays in data-driven science, for discovery, for determining quality, for science reproducibility, and for long-term preservation. The research has been conducted over the last 6 years in the context of cyberinfrastructure for mesoscale weather research carried out as part of the Linked Environments for Atmospheric Discovery (LEAD) project. LEAD has pioneered new approaches for integrating complex weather data, assimilation, modeling, mining, and cyberinfrastructure systems. Workflow systems have the potential to generate huge volumes of data. Without some form of automated metadata capture, either metadata description becomes a largely manual task that is difficult if not impossible under high-volume conditions, or the searchability and manageability of the resulting data products is disappointingly low. The provenance of a data product is a record of its lineage, or trace of the execution history that resulted in the product. The provenance of a forecast model result, for example, captures information about the executable version of the model, configuration parameters, input data products, execution environment, and owner. Provenance enables data to be properly attributed and captures critical parameters about the model run so the quality of the result can be ascertained. Proper provenance is essential to providing reproducible scientific computing results. Workflow languages used in science discovery are complete programming languages, and in theory can support any logic expressible by a programming language. The execution environments supporting the workflow engines, on the other hand, are subject to constraints on physical resources, and hence in practice the workflow task graphs used in science utilize relatively few of the cataloged workflow patterns. It is important to note that these workflows are executed on demand, and are executed once. Into this context is introduced the need for science discovery that is responsive to real-time information. If we can use simple programming models and abstractions to make scientific discovery involving real-time data accessible to specialists who share and utilize data across scientific domains, we bring science one step closer to solving the largest of human problems.
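
    The abstract stresses automated capture of provenance: model version, configuration parameters, input products, execution environment, and owner. A minimal sketch of what such a record might look like for one forecast task is shown below; the field names and example values are hypothetical and do not reflect the LEAD provenance schema.

      # Minimal sketch of an automatically captured provenance record for one
      # workflow task. Field names and example values are hypothetical.
      from dataclasses import dataclass, field, asdict
      from datetime import datetime, timezone
      import json

      @dataclass
      class ProvenanceRecord:
          product_id: str          # identifier of the derived data product
          activity: str            # task that produced it
          executable_version: str  # version of the model or tool that ran
          parameters: dict         # configuration used for the run
          inputs: list             # identifiers of input data products
          execution_host: str      # where the task executed
          owner: str               # who initiated the workflow
          started_at: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())

      record = ProvenanceRecord(
          product_id="forecast/2009-05-01T00Z/run42",
          activity="wrf_forecast",
          executable_version="WRF-ARW 3.0.1",
          parameters={"grid_km": 2.0, "forecast_hours": 12},
          inputs=["radar/level2/KTLX/2009-05-01T00Z", "nam/analysis/2009-05-01T00Z"],
          execution_host="compute-node-17.example.org",
          owner="example_user",
      )
      print(json.dumps(asdict(record), indent=2))   # stored alongside the data product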

  6. TCP Throughput Profiles Using Measurements over Dedicated Connections

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Rao, Nageswara S.; Liu, Qiang; Sen, Satyabrata

    Wide-area data transfers in high-performance computing infrastructures are increasingly being carried over dynamically provisioned dedicated network connections that provide high capacities with no competing traffic. We present extensive TCP throughput measurements and time traces over a suite of physical and emulated 10 Gbps connections with 0-366 ms round-trip times (RTTs). Contrary to the general expectation, they show significant statistical and temporal variations, in addition to the overall dependencies on the congestion control mechanism, buffer size, and the number of parallel streams. We analyze several throughput profiles that have highly desirable concave regions wherein the throughput decreases slowly with RTTs, in stark contrast to the convex profiles predicted by various TCP analytical models. We present a generic throughput model that abstracts the ramp-up and sustainment phases of TCP flows, which provides insights into qualitative trends observed in measurements across TCP variants: (i) slow-start followed by well-sustained throughput leads to concave regions; (ii) large buffers and multiple parallel streams expand the concave regions in addition to improving the throughput; and (iii) stable throughput dynamics, indicated by a smoother Poincaré map and smaller Lyapunov exponents, lead to wider concave regions. These measurements and analytical results together enable us to select a TCP variant and its parameters for a given connection to achieve high throughput with statistical guarantees.
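
    The generic model mentioned in the abstract separates a ramp-up phase from a sustainment phase. A deliberately simplified sketch of that idea (not the authors' formulation) computes the mean throughput of a fixed-duration transfer from a ramp-up time that grows with RTT plus a sustained peak rate; when ramp-up is short relative to the transfer and the peak rate is well sustained, throughput falls only slowly with RTT.

      # Simplified two-phase throughput sketch (illustrative; not the paper's exact model):
      # a ramp-up phase whose duration grows with RTT, then sustainment near peak rate.
      def mean_throughput(rtt_s, duration_s=60.0, peak_gbps=10.0,
                          rampup_rtts=20.0, sustain_frac=0.95):
          """Average rate (Gbps) over a transfer lasting duration_s seconds.
          rampup_rtts: RTTs spent ramping up (depends on the congestion control variant).
          sustain_frac: fraction of peak rate held afterwards (buffers/parallel streams help)."""
          t_ramp = min(rampup_rtts * rtt_s, duration_s)
          ramp_volume = 0.5 * peak_gbps * sustain_frac * t_ramp            # roughly linear ramp
          sustain_volume = peak_gbps * sustain_frac * (duration_s - t_ramp)
          return (ramp_volume + sustain_volume) / duration_s

      example_rtts_ms = [0.4, 11.6, 45.6, 91.2, 183.0, 366.0]   # example values in the 0-366 ms range
      print([round(mean_throughput(r / 1000.0), 2) for r in example_rtts_ms])
      # With a short ramp-up relative to the transfer duration and good sustainment,
      # the average rate falls only slowly with RTT; poor sustainment or long ramp-up
      # erodes throughput at large RTTs, mirroring the qualitative trends in the paper.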

  7. Enhancing high throughput toxicology - development of putative adverse outcome pathways linking US EPA ToxCast screening targets to relevant apical hazards.

    EPA Science Inventory

    High throughput toxicology programs, such as ToxCast and Tox21, have provided biological effects data for thousands of chemicals at multiple concentrations. Compared to traditional, whole-organism approaches, high throughput assays are rapid and cost-effective, yet they generall...

  8. Evaluation of High-Throughput Chemical Exposure Models via Analysis of Matched Environmental and Biological Media Measurements

    EPA Science Inventory

    The U.S. EPA, under its ExpoCast program, is developing high-throughput near-field modeling methods to estimate human chemical exposure and to provide real-world context to high-throughput screening (HTS) hazard data. These novel modeling methods include reverse methods to infer ...

  9. The development of a general purpose ARM-based processing unit for the ATLAS TileCal sROD

    NASA Astrophysics Data System (ADS)

    Cox, M. A.; Reed, R.; Mellado, B.

    2015-01-01

    After Phase-II upgrades in 2022, the data output from the LHC ATLAS Tile Calorimeter will increase significantly. ARM processors are common in mobile devices due to their low cost, low energy consumption and high performance. It is proposed that a cost-effective, high data throughput Processing Unit (PU) can be developed by using several consumer ARM processors in a cluster configuration to allow aggregated processing performance and data throughput while maintaining minimal software design difficulty for the end-user. This PU could be used for a variety of high-level functions on the high-throughput raw data, such as spectral analysis and histograms, to detect possible issues in the detector at a low level. High-throughput I/O interfaces are not typical in consumer ARM systems-on-chip, but high data throughput capabilities are feasible via the novel use of PCI-Express as the I/O interface to the ARM processors. An overview of the PU is given, and the results of performance and throughput testing of four different ARM Cortex systems-on-chip are presented.

  10. A patient workflow management system built on guidelines.

    PubMed Central

    Dazzi, L.; Fassino, C.; Saracco, R.; Quaglini, S.; Stefanelli, M.

    1997-01-01

    To provide high quality, shared, and distributed medical care, clinical and organizational issues need to be integrated. This work describes a methodology for developing a Patient Workflow Management System, based on a detailed model of both the medical work process and the organizational structure. We assume that the medical work process is represented through clinical practice guidelines, and that an ontological description of the organization is available. Thus, we developed tools 1) to acquire the medical knowledge contained in a guideline, 2) to translate the formalized guideline into a computational formalism, namely a Petri net, and 3) to maintain different representation levels. The high-level representation guarantees that the patient workflow follows the guideline prescriptions, while the low-level representation takes into account the specific characteristics of the organization and allows allocating resources for managing a specific patient in daily practice. PMID:9357606
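
    The guideline-to-Petri-net translation described above can be pictured with a toy marked Petri net in which a transition fires when all of its input places hold tokens. The sketch below is illustrative only; it is not the authors' formalism or tooling, and the guideline steps are hypothetical.

      # Toy Petri net: places hold tokens, a transition fires when all of its input
      # places are marked, consuming one token from each input and producing one in
      # each output. Illustrative only; guideline steps are hypothetical.
      class PetriNet:
          def __init__(self, marking):
              self.marking = dict(marking)      # place -> token count
              self.transitions = {}             # name -> (input places, output places)

          def add_transition(self, name, inputs, outputs):
              self.transitions[name] = (inputs, outputs)

          def enabled(self, name):
              inputs, _ = self.transitions[name]
              return all(self.marking.get(p, 0) > 0 for p in inputs)

          def fire(self, name):
              if not self.enabled(name):
                  raise ValueError(f"transition {name!r} is not enabled")
              inputs, outputs = self.transitions[name]
              for p in inputs:
                  self.marking[p] -= 1
              for p in outputs:
                  self.marking[p] = self.marking.get(p, 0) + 1

      # Hypothetical guideline fragment: order a lab test, then review the result.
      net = PetriNet({"patient_admitted": 1})
      net.add_transition("order_lab_test", ["patient_admitted"], ["test_pending"])
      net.add_transition("review_result", ["test_pending"], ["therapy_decision"])
      net.fire("order_lab_test")
      net.fire("review_result")
      print(net.marking)   # {'patient_admitted': 0, 'test_pending': 0, 'therapy_decision': 1}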

  11. [Current applications of high-throughput DNA sequencing technology in antibody drug research].

    PubMed

    Yu, Xin; Liu, Qi-Gang; Wang, Ming-Rong

    2012-03-01

    Since the publication in 2005 of a high-throughput DNA sequencing technology based on PCR carried out in oil emulsions, high-throughput DNA sequencing platforms have evolved into a robust technology for sequencing genomes and diverse DNA libraries. Antibody libraries with vast numbers of members currently serve as a foundation for discovering novel antibody drugs, and high-throughput DNA sequencing makes it possible to rapidly identify functional antibody variants with desired properties. Herein we review current applications of high-throughput DNA sequencing technology in the analysis of antibody library diversity, sequencing of CDR3 regions, identification of potent antibodies based on sequence frequency, discovery of functional genes, and combination with various display technologies, so as to provide an alternative approach to the discovery and development of antibody drugs.
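
    One application listed above, identification of potent antibodies based on sequence frequency, amounts in its simplest form to counting how often each CDR3 (or clone) occurs among sequencing reads, typically before and after selection rounds. A minimal sketch with made-up sequences:

      # Minimal sketch: rank candidate antibody clones by CDR3 read frequency.
      # The sequences are made up; a real pipeline would first extract CDR3s from reads.
      from collections import Counter

      reads_cdr3 = [
          "CARDYYGSGSYFDYW", "CARDYYGSGSYFDYW", "CARGGTFDYW",
          "CARDYYGSGSYFDYW", "CARGGTFDYW", "CAKDRGYSSGWYFDVW",
      ]
      counts = Counter(reads_cdr3)
      total = sum(counts.values())
      for cdr3, n in counts.most_common():
          print(f"{cdr3}\t{n}\t{n / total:.2%}")
      # Clones whose frequency rises across selection rounds are prioritized for
      # expression and functional testing.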

  12. On-line characterization of monoclonal antibody variants by liquid chromatography-mass spectrometry operating in a two-dimensional format.

    PubMed

    Alvarez, Melissa; Tremintin, Guillaume; Wang, Jennifer; Eng, Marian; Kao, Yung-Hsiang; Jeong, Justin; Ling, Victor T; Borisov, Oleg V

    2011-12-01

    Recombinant monoclonal antibodies (MAbs) have become one of the most rapidly growing classes of biotherapeutics in the treatment of human disease. MAbs are highly heterogeneous proteins, thereby requiring a battery of analytical technologies for their characterization. However, incompatibility between separation and subsequent detection is often encountered. Here we demonstrate the utility of a generic on-line liquid chromatography-mass spectrometry (LC-MS) method operated in a two-dimensional format toward the rapid characterization of MAb charge and size variants. Using a single chromatographic system capable of running two independent gradients, up to six fractions of interest from an ion exchange (IEC) or size exclusion (SEC) separation can be identified by trapping and desalting the fractions onto a series of reversed phase trap cartridges with subsequent on-line analysis by mass spectrometry. Analysis of poorly resolved and low-level peaks in the IEC or SEC profile was facilitated by preconcentrating fractions on the traps using multiple injections. An on-line disulfide reduction step was successfully incorporated into the workflow, allowing more detailed characterization of modified MAbs by providing chain-specific information. The system is fully automated, thereby enabling high-throughput analysis with minimal sample handling. This technology provides rapid data turnaround time, a much needed feature during product characterization and development of multiple biotherapeutic proteins. Copyright © 2011 Elsevier Inc. All rights reserved.

  13. OpenMS - A platform for reproducible analysis of mass spectrometry data.

    PubMed

    Pfeuffer, Julianus; Sachsenberg, Timo; Alka, Oliver; Walzer, Mathias; Fillbrunn, Alexander; Nilse, Lars; Schilling, Oliver; Reinert, Knut; Kohlbacher, Oliver

    2017-11-10

    In recent years, several mass spectrometry-based omics technologies have emerged to investigate qualitative and quantitative changes within thousands of biologically active components such as proteins, lipids and metabolites. The research enabled through these methods potentially contributes to the diagnosis and pathophysiology of human diseases as well as to the clarification of structures and interactions between biomolecules. Simultaneously, technological advances in the field of mass spectrometry, leading to an ever-increasing amount of data, demand high standards of efficiency, accuracy and reproducibility from analysis software. This article presents the current state and ongoing developments in OpenMS, a versatile open-source framework aimed at enabling reproducible analyses of high-throughput mass spectrometry data. It provides implementations of frequently occurring processing operations on MS data through a clean application programming interface in C++ and Python. A collection of 185 tools and ready-made workflows for typical MS-based experiments enables convenient analyses for non-developers and facilitates reproducible research without losing flexibility. OpenMS will continue to increase its ease of use for developers as well as users with improved continuous integration/deployment strategies, regular training sessions with updated training materials and multiple sources of support. The active developer community ensures the incorporation of new features to support state-of-the-art research. Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.
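
    OpenMS exposes its processing operations through C++ and Python programming interfaces. As a small usage example (assuming the pyOpenMS package is installed; "example.mzML" is a placeholder file name, not data from the article), raw spectra can be loaded and inspected like this:

      # Small pyOpenMS example: load an mzML file and summarize its MS1 spectra.
      # "example.mzML" is a placeholder path, not data from the article.
      from pyopenms import MSExperiment, MzMLFile

      exp = MSExperiment()
      MzMLFile().load("example.mzML", exp)

      print("spectra:", exp.getNrSpectra())
      for spectrum in exp:
          if spectrum.getMSLevel() == 1:
              mz, intensity = spectrum.get_peaks()
              print(f"RT {spectrum.getRT():.1f} s: {len(mz)} peaks")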

  14. LipidFrag: Improving reliability of in silico fragmentation of lipids and application to the Caenorhabditis elegans lipidome

    PubMed Central

    Neumann, Steffen; Schmitt-Kopplin, Philippe

    2017-01-01

    Lipid identification is a major bottleneck in high-throughput lipidomics studies. However, tools for the analysis of lipid tandem MS spectra are rather limited. While comparison against spectra in reference libraries is one of the preferred methods, these libraries are far from complete. In order to improve identification rates, the in silico fragmentation tool MetFrag was combined with Lipid Maps and lipid-class-specific classifiers which calculate probabilities for lipid class assignments. The resulting LipidFrag workflow was trained and evaluated on different commercially available lipid standard materials, measured with data-dependent UPLC-Q-ToF-MS/MS acquisition. The automatic analysis was compared against manual MS/MS spectra interpretation. With the lipid-class-specific models, identification of true positives was improved, removing up to 56% of false-positive results, especially in cases where candidate lipids from different lipid classes had similar MetFrag scores. This LipidFrag approach was then applied to MS/MS spectra of lipid extracts of the nematode Caenorhabditis elegans. Fragments explained by LipidFrag match known fragmentation pathways, e.g., neutral losses of lipid headgroups and fatty acid side chain fragments. Based on prediction models trained on standard lipid materials, high probabilities for correct annotations were achieved, which makes LipidFrag a good choice for automated lipid data analysis and reliability testing of lipid identifications. PMID:28278196
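
    LipidFrag couples MetFrag's in silico fragmentation scores with lipid-class-specific classifiers. In rough outline (an illustrative re-ranking sketch, not the published scoring scheme), a candidate's fragmentation score can be weighted by the classifier's probability that the precursor belongs to that candidate's lipid class:

      # Illustrative re-ranking sketch (not the published LipidFrag scoring scheme):
      # weight each candidate's in silico fragmentation score by the classifier's
      # probability that the spectrum belongs to that candidate's lipid class.
      candidates = [
          {"name": "PC(16:0/18:1)", "lipid_class": "PC", "frag_score": 0.81},
          {"name": "PE(19:0/18:1)", "lipid_class": "PE", "frag_score": 0.79},
          {"name": "SM(d18:1/16:0)", "lipid_class": "SM", "frag_score": 0.40},
      ]
      class_probability = {"PC": 0.90, "PE": 0.15, "SM": 0.05}   # hypothetical classifier output

      for c in candidates:
          c["combined"] = c["frag_score"] * class_probability.get(c["lipid_class"], 0.0)

      for c in sorted(candidates, key=lambda c: c["combined"], reverse=True):
          print(f'{c["name"]}: {c["combined"]:.3f}')
      # Candidates from unlikely classes drop in rank even when raw fragmentation scores
      # are similar, which is how class-specific models remove false positives.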

  15. Characterization of human plasma proteome dynamics using deuterium oxide.

    PubMed

    Wang, Ding; Liem, David A; Lau, Edward; Ng, Dominic C M; Bleakley, Brian J; Cadeiras, Martin; Deng, Mario C; Lam, Maggie P Y; Ping, Peipei

    2014-08-01

    High-throughput quantification of human protein turnover via in vivo administration of deuterium oxide (2H2O) is a powerful new approach to examine potential disease mechanisms. Its immediate clinical translation is contingent upon characterizations of the safety and hemodynamic effects of in vivo administration of 2H2O to human subjects. We recruited ten healthy human subjects with a broad demographic variety to evaluate the safety, feasibility, efficacy, and reproducibility of 2H2O intake for studying protein dynamics. We designed a protocol where each subject orally consumed weight-adjusted doses of 70% 2H2O daily for 14 days to enrich body water and proteins with deuterium. Plasma proteome dynamics was measured using a high-resolution MS method we recently developed. This protocol was successfully applied in ten human subjects to characterize the endogenous turnover rates of 542 human plasma proteins, the largest such human dataset to date. Throughout the study, we did not detect physiological effects or signs of discomfort from 2H2O consumption. Our investigation supports the utility of a 2H2O intake protocol that is safe, accessible, and effective for clinical investigations of large-scale human protein turnover dynamics. This workflow shows promising clinical translational value for examining plasma protein dynamics in human diseases. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
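
    Turnover rates in such labeling studies are commonly obtained by fitting the time course of deuterium enrichment of each protein to a first-order rise-to-plateau model. The sketch below uses that generic model with made-up data points; it is not the exact fitting procedure of the cited work.

      # Generic first-order labeling model for protein turnover (illustrative, made-up data):
      # enrichment(t) = plateau * (1 - exp(-k * t)), where k is the turnover rate constant.
      import numpy as np
      from scipy.optimize import curve_fit

      def enrichment(t, plateau, k):
          return plateau * (1.0 - np.exp(-k * t))

      days = np.array([0.0, 1.0, 3.0, 7.0, 10.0, 14.0])
      measured = np.array([0.000, 0.012, 0.031, 0.050, 0.056, 0.059])   # hypothetical enrichment

      (plateau, k), _ = curve_fit(enrichment, days, measured, p0=(0.06, 0.2))
      print(f"rate constant k = {k:.3f} per day, half-life = {np.log(2.0) / k:.1f} days")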

  16. Characterization of human plasma proteome dynamics using deuterium oxide

    PubMed Central

    Wang, Ding; Liem, David A; Lau, Edward; Ng, Dominic CM; Bleakley, Brian J; Cadeiras, Martin; Deng, Mario C; Lam, Maggie PY; Ping, Peipei

    2016-01-01

    Purpose: High-throughput quantification of human protein turnover via in vivo administration of deuterium oxide (2H2O) is a powerful new approach to examine potential disease mechanisms. Its immediate clinical translation is contingent upon characterizations of the safety and hemodynamic effects of in vivo administration of 2H2O to human subjects. Experimental design: We recruited 10 healthy human subjects with a broad demographic variety to evaluate the safety, feasibility, efficacy, and reproducibility of 2H2O intake for studying protein dynamics. We designed a protocol where each subject orally consumed weight-adjusted doses of 70% 2H2O daily for 14 days to enrich body water and proteins with deuterium. Plasma proteome dynamics was measured using a high-resolution MS method we recently developed. Results: This protocol was successfully applied in 10 human subjects to characterize the endogenous turnover rates of 542 human plasma proteins, the largest such human dataset to date. Throughout the study, we did not detect physiological effects or signs of discomfort from 2H2O consumption. Conclusions and clinical relevance: Our investigation supports the utility of a 2H2O intake protocol that is safe, accessible, and effective for clinical investigations of large-scale human protein turnover dynamics. This workflow shows promising clinical translational value for examining plasma protein dynamics in human diseases. PMID:24946186

  17. Towards unsupervised polyaromatic hydrocarbons structural assignment from SA-TIMS-FTMS data.

    PubMed

    Benigni, Paolo; Marin, Rebecca; Fernandez-Lima, Francisco

    2015-10-01

    With the advent of high resolution ion mobility analyzers and their coupling to ultrahigh resolution mass spectrometers, there is a need to further develop a theoretical workflow capable of correlating experimental accurate mass and mobility measurements with tridimensional candidate structures. In the present work, a general workflow is described for unsupervised tridimensional structural assignment based on accurate mass measurements, mobility measurements, in silico 2D-3D structure generation, and theoretical mobility calculations. In particular, the potential of this workflow will be shown for the analysis of polyaromatic hydrocarbons from Coal Tar SRM 1597a using selected accumulation-trapped ion mobility spectrometry (SA-TIMS) coupled to Fourier transform-ion cyclotron resonance mass spectrometry (FT-ICR MS). The proposed workflow can be adapted to different IMS scenarios, can utilize different collisional cross-section calculators, and has the potential to include MSn and IMSn measurements for faster and more accurate tridimensional structural assignment.
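
    The unsupervised assignment step can be thought of as filtering in silico candidate structures by agreement between measured and theoretical values. A toy sketch of that filter is shown below; the tolerances and candidate values are hypothetical and are not the published workflow parameters.

      # Toy candidate filter: keep structures whose theoretical m/z and collisional
      # cross section (CCS) lie within tolerances of the measured values.
      # Tolerances and candidate values are hypothetical.
      candidates = [
          {"name": "pyrene",        "mz": 202.0777, "ccs_A2": 138.9},
          {"name": "fluoranthene",  "mz": 202.0777, "ccs_A2": 142.5},
          {"name": "unrelated ion", "mz": 202.1200, "ccs_A2": 150.0},
      ]
      measured_mz, measured_ccs = 202.0779, 139.2
      mz_tol_ppm, ccs_tol_pct = 2.0, 1.0

      def matches(c):
          ppm_err = abs(c["mz"] - measured_mz) / measured_mz * 1e6
          ccs_err = abs(c["ccs_A2"] - measured_ccs) / measured_ccs * 100.0
          return ppm_err <= mz_tol_ppm and ccs_err <= ccs_tol_pct

      print([c["name"] for c in candidates if matches(c)])   # -> ['pyrene']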

  18. NOW: A Workflow Language for Orchestration in Nomadic Networks

    NASA Astrophysics Data System (ADS)

    Philips, Eline; van der Straeten, Ragnhild; Jonckers, Viviane

    Existing workflow languages for nomadic or mobile ad hoc networks do not offer adequate support for dealing with the volatile connections inherent to these environments. Services residing on mobile devices are exposed to (temporary) network failures, which should be considered the rule rather than the exception. This paper proposes a nomadic workflow language built on top of an ambient-oriented programming language which supports dynamic service discovery and communication primitives resilient to network failures. Our proposed language provides high level workflow abstractions for control flow and supports rich network and service failure detection and handling through compensating actions. Moreover, we introduce a powerful variable binding mechanism which enables dynamic data flow between services in a nomadic environment. By adding this extra layer of abstraction on top of an ambient-oriented programming language, the application programmer is offered a flexible way to develop applications for nomadic networks.
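
    The failure-handling idea described above, invoking a compensating action when a service over a volatile connection stops responding, can be sketched in ordinary code. NOW itself provides this as language-level workflow abstractions on top of an ambient-oriented language; the Python below only illustrates the compensation pattern, with hypothetical services and timeouts.

      # Compensation pattern sketch with hypothetical services; NOW offers this as
      # workflow-level abstractions rather than application code like the below.
      import concurrent.futures

      def call_service(invoke, timeout_s, compensate):
          """Invoke a remote service; on timeout or failure run the compensating action."""
          with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
              future = pool.submit(invoke)
              try:
                  return future.result(timeout=timeout_s)
              except (concurrent.futures.TimeoutError, OSError):
                  return compensate()

      def reserve_printer_nearby():            # primary service on a mobile device
          raise OSError("peer unreachable")    # simulate a dropped ad hoc connection

      def queue_job_for_later():               # compensating action
          return "job queued until a printer rejoins the network"

      print(call_service(reserve_printer_nearby, timeout_s=2.0, compensate=queue_job_for_later))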

  19. Reproducible Tissue Homogenization and Protein Extraction for Quantitative Proteomics Using MicroPestle-Assisted Pressure-Cycling Technology.

    PubMed

    Shao, Shiying; Guo, Tiannan; Gross, Vera; Lazarev, Alexander; Koh, Ching Chiek; Gillessen, Silke; Joerger, Markus; Jochum, Wolfram; Aebersold, Ruedi

    2016-06-03

    The reproducible and efficient extraction of proteins from biopsy samples for quantitative analysis is a critical step in biomarker and translational research. Recently, we described a method consisting of pressure-cycling technology (PCT) and sequential windowed acquisition of all theoretical fragment ions-mass spectrometry (SWATH-MS) for the rapid quantification of thousands of proteins from biopsy-size tissue samples. As an improvement of the method, we have incorporated the PCT-MicroPestle into the PCT-SWATH workflow. The PCT-MicroPestle is a novel, miniaturized, disposable mechanical tissue homogenizer that fits directly into the microTube sample container. We optimized the pressure-cycling conditions for tissue lysis with the PCT-MicroPestle and benchmarked the performance of the system against the conventional PCT-MicroCap method using mouse liver, heart, brain, and human kidney tissues as test samples. The data indicate that digestion of the PCT-MicroPestle-extracted proteins yielded 20-40% more MS-ready peptide mass from all tissues tested, with comparable reproducibility, relative to the conventional PCT method. Subsequent SWATH-MS analysis identified a higher number of biologically informative proteins from a given sample. In conclusion, we have developed a new device that can be seamlessly integrated into the PCT-SWATH workflow, leading to increased sample throughput and improved reproducibility at both the protein extraction and proteomic analysis levels when applied to the quantitative proteomic analysis of biopsy-level samples.

  20. Autonomous driving in NMR.

    PubMed

    Perez, Manuel

    2017-01-01

    The automatic analysis of NMR data has been a much-desired endeavour for the last six decades, as is the case with any other analytical technique. This need for automation has only grown as advances in hardware, pulse sequences, and automation have opened new research areas to NMR and increased the throughput of data. Fully automatic analysis is a worthy, albeit hard, challenge, but in a world of artificial intelligence, instant communication and big data, it seems that this particular fight is happening with only one technique at a time (be it NMR, MS, IR, UV or any other), when the reality of most laboratories is that several types of analytical instrumentation are present. Data aggregation, verification and elucidation using complementary techniques (e.g. MS and NMR) is a desirable outcome to pursue, although a time-consuming one if performed manually; hence, the use of automation to perform the heavy lifting for users is required to make the approach attractive for scientists. Many of the decisions and workflows that could be implemented under automation will depend on two-way communication with databases that understand analytical data, because it is desirable not only to query these databases but also to grow them in as automatic a manner as possible. How these databases are designed and set up, and how the data inside them are classified, will determine what workflows can be implemented. Copyright © 2016 John Wiley & Sons, Ltd.
